Intro to PCDA course
Welcome to my Practical Computing for Data Analytics (PCDA) class. We’ll do several things in the first week of class:
overview of the field of business analytics / data science
course overview and logistics
get some hands-on experience with some of the technology we’ll use in the course
start learning how to use the Linux shell for basic file management and for combining Linux commands to accomplish simple analytical tasks
Objectives
Through this module you will:
explore the syllabus and course websites so that you know how this course will operate,
preview some of the types of things you’ll learn and the activities you’ll do in this course,
begin to get hands-on experience with some of the technical computing tools used in this course,
and be ready to learn all kinds of cool business analytics things.
Readings
We’ll start using the Linux bash shell during the first week of class, so you might as well start learning the basics now.
For now, read Section 1 of the Software Carpentry tutorial entitled The Unix Shell. In Week 2 we’ll cover the material in Sections 1-4, so feel free to skim those if you want a head start.
See the Explore section below for additional Linux shell related resources.
Downloads
Download_Session01_Intro.zip - just a zip version of the same compressed archive
There will always be one or more “Download” files for each class. Each one is a compressed archive containing all the files we’ll need for the session. In the Windows world, this would usually be a .zip file. However, in the Linux world, we often use “gzipped tarballs”, which have a .tar.gz extension. We’ll extract these in our Linux virtual machine (as part of our Week 1 intro), though you can certainly extract them in Windows as well using the free utility 7-Zip.
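A “gzipped tarball” is two things stacked together: a tar archive (many files bundled into one file) that has then been compressed with gzip. The minimal sketch below, assuming Python 3.9+ and only its standard-library `tarfile` module, builds a tiny tarball and extracts it again to show that round trip. The folder and file names (`Session01`, `readme.txt`) and the helper functions `make_tarball` and `extract_tarball` are hypothetical placeholders, not the actual course download, and this is just an illustration of the format rather than the way we’ll extract files in the VM.

```python
import tarfile
import tempfile
from pathlib import Path


def make_tarball(src_dir: Path, archive: Path) -> None:
    """Bundle src_dir into a gzip-compressed tar archive (a "tarball")."""
    # Mode "w:gz" writes a tar archive and gzip-compresses it in one step.
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the folder name.
        tar.add(src_dir, arcname=src_dir.name)


def extract_tarball(archive: Path, dest: Path) -> list[str]:
    """Extract every member of a .tar.gz into dest; return the member names."""
    dest.mkdir(parents=True, exist_ok=True)
    # Mode "r:gz" reads a gzip-compressed tar archive.
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        # Newer Python releases provide a "data" filter that rejects unsafe
        # members (e.g. absolute paths or ".." components) during extraction.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names


# Round trip in a throwaway directory; folder and file names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "Session01"
    src.mkdir()
    (src / "readme.txt").write_text("hello\n")

    archive = root / "Session01.tar.gz"
    make_tarball(src, archive)
    names = extract_tarball(archive, root / "extracted")
    content = (root / "extracted" / "Session01" / "readme.txt").read_text()

print(sorted(names))  # ['Session01', 'Session01/readme.txt']
print(content, end="")  # hello
```

The `:gz` suffix in the mode string is what adds the gzip layer; opening with plain `"w"` would produce an uncompressed `.tar` file instead.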
Activities
Note
Our SBA web server has some issues that sometimes lead to problems loading our course webpages or my faculty home page. If this happens, you can usually fix the problem by clearing your browser cache and reloading the page.
Or, you can use one of these alternative links: https://pcda.misken.org or https://mis5470.netlify.app.
Overview of pcda class
I’ll present an overview of this class as well as the general topic of data science / business analytics.
Warning
If you are using the VM, do NOT watch the screencasts from within the pcda VM. Watch them from a browser opened in your host OS (i.e. Windows or Mac).
Class logistics
Together, the “Course welcome video” and the “Week 1 Welcome Video” (both available via Moodle) cover all of the course logistics. If you haven’t watched them yet, please do so ASAP. Also, read the syllabus carefully (again, on Moodle). Finally, review the first two Announcements I made in Moodle.
The pcda computing appliance
We’ll discuss the factors that led to the pcda appliance:
the whys and whats of Linux
the whys and whats of R and Python
open source facilitates contributed packages with the latest and greatest statistical techniques, bug fixes, domain-specific tools, etc.
free, as in speech and as in beer
efficiency of the command line and scripts vs. a GUI
reproducible analysis/research
If you haven’t already, go through the screencasts and instructions on the pcda VM page, which cover installation and give an overview of VirtualBox and the Lubuntu desktop. The screencasts below are from Fall 2020, but nothing has changed except the name of the VM.
SCREENCAST: Intro to the pcda VM (11:36)
Preview of data science with R and RStudio
You’ll get your first look at these tools and walk through a typical analysis project that involves building and comparing predictive models. This previews much of what this course is about.
Preview of Python and Anaconda
We’ll just take a quick look so that those who are curious can start tinkering. We’ll be learning Python later in the semester.
SCREENCAST: Preview of Python (10:59)
Explore (OPTIONAL)
A few more Linux shell tutorials that I’ve found useful are:
Note
In the “Learn Enough Command Line to be Dangerous…” tutorial, there are two nice boxes describing the “magic of computers” and “technical sophistication”. READ THEM.
This section will typically have links related to the topic, … or not. Have fun exploring and learning more.
Are “super nerds” killing baseball? This article raises some thought-provoking issues about analytics in sports.
Hurricane models: it’s not just one model
Advice for constructing an online portfolio for analytics job seekers - Q&A on Quora. Another Quora thread discusses the types of classes one might take to learn data science.
Getting started in data science
Short blog post. No hype. Good advice. For another dose of advice, check out this podcast from TalkPython on paths to a data science career.