Textbooks

Traditional textbooks do not exist for a class like this. Instead we’ll be using a number of inexpensive or free paperback/electronic books that cover different aspects of the course. They are all well worth owning. In addition, we will be using numerous free web-based resources.

At the bottom of this page I’ll list the books I’ve used at some point in the past - but you do NOT need them.

Required texts for Fall 2024

Several of the texts have free versions available online. All these books also have official websites from which you can buy print, PDF, or eBooks. Of course, you can also find them at numerous places on the web. I’ve listed approximate pricing from checking a few of the online booksellers.

Introduction to Statistical Learning (with Applications in R) (Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani) (ISLR)

This is a FREE text that does a great job of explaining the main statistical learning techniques at an accessible mathematical level. You can download a free PDF from the link above. They’ve even created a whole set of video lectures accompanying the book - https://www.r-bloggers.com/in-depth-introduction-to-machine-learning-in-15-hours-of-expert-videos/.

R for Everyone (2nd Edition) (Jared Lander) (RforE)

This provides an accessible, modern and thorough introduction to the world of the R statistical computing platform. I’ve used this book the past three years.

R for Data Science (2e) (Wickham, Cetinkaya-Rundel and Grolemund) (r4ds)

The second edition of this book was released in summer of 2023. It’s a pretty big overhaul and it’s freely available online.

Hadley Wickham, the lead author, is one of the giants in the R community. He has created some of the most widely used R packages and has had a tremendous influence on the use of R for data science.

Practical Data Science with R (2ed) (Nina Zumel and John Mount) (PDSwR)

This is a newish book (first edition in 2014, with the 2nd edition out just a few years ago) that does just what the title suggests. It is structured around typical business analytics or data science projects and covers the main statistical learning techniques along with tons of practical advice on doing data science projects.

A Whirlwind Tour of Python (Jake VanderPlas) (WToP)

One of JVP’s contributions is this very nice, concise intro to programming in Python.

Python Data Science Handbook (Jake VanderPlas) (PDSH)

This is another newish book, written by a scientist who has been a big contributor to the Python data science world. This book covers the main essentials for doing data science work in Python.

More good books (NOT required)

[PfDA] Python for Data Analysis - https://wesmckinney.com/book/

Wes McKinney. Free online, but print is also available.

This is a somewhat more advanced book on using Python for data analysis. It was written by the developer of the hugely popular Python package, pandas. In addition to thorough coverage of pandas, it covers numpy, IPython, and even an intro to the Python language. This is the 3rd edition, which came out in 2022.

A few years ago we used the following book along with RforE and PDSwR. It’s not required for the class this year but I do highly recommend it for those interested in more advanced web scraping and other data wrangling tasks. See the description below.

[DWwP] Data Wrangling with Python - http://shop.oreilly.com/product/0636920032861.do

Jacqueline Kazil & Katharine Jarmul ~$30 new + ebook, less for used

Finally, a problem-driven book that introduces the Python language as it’s needed to solve the problems at hand. Tons of practical advice and written in a style that matches how this work is really done - lots of trying stuff and partially succeeding and then trying other stuff … (repeat till happy). I believe this is a great way to learn to be an effective programmer and both get useful things done and have fun while doing it.

[DDS] Doing Data Science: Straight Talk from the Frontline - http://shop.oreilly.com/product/0636920028529.do

Cathy O’Neil & Rachel Schutt ~$25

This book is more of a collection of chapters written by the authors and various data science practitioners. It’s very readable and full of insights on the practice of data science.

[PCfB] Practical Computing for Biologists - http://practicalcomputing.org/

Steve Haddock & Casey Dunn ~$40-55

I fell in love with this book immediately and found myself wishing that someone would write a similar book for business. It is aimed at scientists who realize that they need to get better at computing to deal with all the data they need to process and analyze. That sounds like many business analysts. It’s Mac and Linux based and is crammed full of useful information on text files, using the command line, regular expressions, shell scripts, Python programming, dealing with image files, relational databases and even working with physical data collection devices.