Overview

Welcome

Welcome to the courseware for ITSE-1302 Computer Programming: Scientific Python 1 at Austin Community College in Austin, TX.

The college website for this course is: http://www.austincc.edu/baldwin/ .

Course description

Official description

As of April 2018, the catalog description for this course reads as follows:

This course is an introduction to (scientific) computer programming including design, development, testing, implementation, and documentation. It may include: Python Fundamentals, functions, data structures, classes, objects, statistical programming, data visualization with MatPlotLib and Programming with the NumPy library.

Note that the word (scientific) was inadvertently omitted from the catalog description.

Requisites: COSC 1336 or departmental approval. Completing this requisite before taking the course is recommended but not required.

Note, this is not a programming course for beginners. If your knowledge of Python programming is not at or above a level consistent with the successful completion of COSC 1336, you may have difficulty succeeding in this course.

Practical description

As a practical matter, this course assumes that you are already skilled in many of the items listed in the official course description. This course and the two follow-on courses are designed to take you to the next level by helping you learn how to use the various components of the Python scientific computing ecosystem, which is described in the Scipy Lecture Notes.

The "how" and the "why" of data science

The purpose of this course and two follow-on courses is to teach you how to work as a programmer in the technology area known generally as data science and analytics. Note however that these courses are designed to teach you the "how" but not the "why" of data science and analytics. For example, this course will teach you how to create a histogram for a dataset but won't necessarily teach you why you might need to create a histogram for the dataset.
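To make the histogram example concrete, here is a minimal sketch of the "how": it counts the values of a dataset that fall into equal-width bins and prints a text histogram. The sample data, the seed, and the helper name bin_counts are made up for illustration. The sketch uses only the Python standard library so that it runs anywhere; in the course itself you would more likely let Matplotlib's hist function do the binning and the drawing.

```python
import random


def bin_counts(data, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    low, high = min(data), max(data)
    width = (high - low) / num_bins
    if width == 0:
        # All values are identical, so everything lands in the first bin.
        return [len(data)] + [0] * (num_bins - 1)
    counts = [0] * num_bins
    for value in data:
        # The largest value belongs in the last bin, not one past it.
        index = min(int((value - low) / width), num_bins - 1)
        counts[index] += 1
    return counts


random.seed(0)  # fixed seed so the made-up sample is repeatable
sample = [random.gauss(50, 10) for _ in range(200)]  # hypothetical dataset

counts = bin_counts(sample, 8)
for i, count in enumerate(counts):
    # One asterisk per value in the bin.
    print(f"bin {i}: {'*' * count}")
```

The "why" (what the shape of the histogram tells you about the data) is the part that this sketch, like the course, leaves to you.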

In order to be hired as a data science programmer, you will probably need to know the "why" in addition to the "how", and it will be your responsibility to learn the "why" on your own. The best way to learn it is to read and watch as many online articles and videos on data science and analytics as you can find. (The section titled Data science links contains links to many useful online data science resources. An Internet search will undoubtedly reveal other useful resources that are not included in that list, such as the numerous websites containing data science interview questions.) The webpage at Awesome Data Science - Motivation also provides a long list of links to data science resources.

In order to be hired as a data science programmer, you will probably also need to create a portfolio that demonstrates your knowledge of the how and why of data science. For that, you will need some data. The web page at Awesome Data Science - Data Sets provides links to many data sets, some of which are freely available.

Course structure

No conventional textbook

This course does not use a conventional paper or electronic textbook. Instead, this online study guide and the Blackboard learning management system will guide you through a variety of free online resources on topics that you will need to learn in order to succeed in the course. Some of those resources will be freely available with no requirement to register or enroll, such as the video tutorials at Graphing in Python with Matplotlib. Other resources will also be free but will require you to register and/or enroll, such as the Udacity statistics course titled Intro to Descriptive Statistics.

This study guide will not knowingly recommend a resource for which you are required to pay a fee. If you find a recommended resource for which there is a required fee, please notify your instructor so that the recommendation for that resource can be removed from the study guide and replaced by another free resource.

Online resources

As you can see from the list of useful resources in the section titled Data science links, there is no shortage of online resources available in this area. You will probably have more difficulty deciding which resources you should concentrate on than you will have in finding resources on which to concentrate. This study guide will make recommendations for online resources in each competency area, but ultimately it will be up to you to choose and concentrate on the resources that best fit your learning style (video versus text for example).

This study guide will also provide a large number of sample programs and exercises for each competency.

In addition to the sample programs and exercises provided for each competency, recommended resources from edX and Udacity will also provide many quizzes and programming exercises to help you learn and retain the material.

Four major units

The course is structured into four major units -- one review unit and three competency units. Assessments such as assignments, quizzes, and tests will be administered through Blackboard. Some of the free online resources will also include graded assessments such as programming exercises and tests. You are encouraged to take advantage of those programming exercises and tests to enhance your ability to learn and retain the material. However, grades and credits associated with those resources will not be integrated into your grade for this course. Your grade for this course will be based solely on your grades on assignments and tests administered by your ACC instructor through Blackboard.

The four units included in this course are:

The Jupyter Notebook

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text.
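It may help to know that a notebook document is stored on disk as a JSON text file made up of a list of cells, where each cell holds either narrative text (Markdown) or code. The following sketch builds a tiny two-cell notebook as a Python dictionary, converts it to JSON, and reads it back. The cell contents are made up for illustration, and the sketch assumes version 4 of the notebook file format.

```python
import json

# A hypothetical two-cell notebook: one narrative cell and one code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Narrative text\n", "An equation: $e = mc^2$"],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print(2 + 2)"],
        },
    ],
}

# This JSON text is what would be saved in a file with the ipynb extension.
text = json.dumps(notebook, indent=1)

# Reading it back shows the same cells, in order.
loaded = json.loads(text)
summary = [(cell["cell_type"], "".join(cell["source"])) for cell in loaded["cells"]]
for cell_type, source in summary:
    print(f"[{cell_type}] {source!r}")
```

When you view a notebook in a browser, the Markdown cells are rendered as formatted text and the code cells are shown along with whatever output they produced when they were last run.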

You won't be expected to install and use a Jupyter Notebook for any of your programming efforts in this course. However, many of the online resources, including some of the resources developed specifically for this course, are presented in the format of a Jupyter Notebook. Therefore, you will need to know what it is and how to interpret what you are seeing to understand those resources. A quick (approximately 19 minutes) tour of the following three videos should provide that knowledge.

Installing and running Scientific Python

Installation

See Installing packages for information on getting the Python Scientific Computing Ecosystem up and running on your computer.

In order to run the sample code provided in this study guide, you will need to either build your own homebrew system or download and install a Python distribution that supports the Python Scientific Computing Ecosystem, such as Anaconda or WinPython. To get up and running quickly, I recommend downloading and installing one of the distributions listed under Installing packages. However, there are also some advantages to building your own homebrew system.

Of the distributions listed on that page, for those students running Windows, I recommend WinPython. My main reason for recommending WinPython is that it doesn't actually require a Windows installation. You simply need to unzip the distribution onto a hard disk, USB drive, or even a USB memory stick to get up and running. Among other things, this means that you can carry the USB drive or USB memory stick into an ACC lab, plug it in, and start running without having to depend on lab personnel to install the correct Python distribution in the lab. You can also update to new versions when they are released without depending on lab personnel to install the new versions.

As of December 2017, I am running WinPython-32bit-3.6.2.0Qt5 very successfully on a USB drive named M:. I downloaded, unzipped, and copied the distribution into a folder on that drive named ProgramFilesWithNoSpaces.

I routinely swap the USB drive among the computer in my office at the college, the desktop computer at my house, and a laptop computer that I use for a variety of purposes in a variety of locations. I am able to start a new Jupyter Notebook project on any of those computers, swap to a different computer, and continue working on the project in a completely seamless manner.

I have also copied the distribution to a USB memory stick and have successfully run both Jupyter Notebook and stand-alone Python from there as well. The only change required was to point to the copy of the distribution on the memory stick (disk N:) instead of the copy on the USB drive (disk M:). (As might be expected, execution was somewhat slower on the memory stick than on the USB drive.)

Running Jupyter Notebook

To start Jupyter Notebook, I execute the following command at a command prompt:

"M:\ProgramFilesWithNoSpaces\WinPython-32bit-3.6.2.0Qt5\Jupyter Notebook.exe"

Running stand-alone Python

To run Python in a stand-alone mode, I execute a batch file containing the following commands in the folder containing the file named Proj01.py (where Proj01.py is the name of the Python script that I need to run):

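@echo off
rem Add the WinPython interpreter folder to the search path for this session.
path=%path%;"M:\ProgramFilesWithNoSpaces\WinPython-32bit-3.6.2.0Qt5\python-3.6.2"
rem Run the script, then keep the window open so that the output can be read.
python Proj01.py
pause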
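As a quick check that the batch file is picking up the intended interpreter, a script along the following lines could serve as a first Proj01.py. This is only a sketch; the function name describe_interpreter is made up for illustration. It reports the version, location, and word size of the Python that is actually running.

```python
import platform
import sys


def describe_interpreter():
    """Return basic facts about the Python interpreter running this script."""
    return {
        "version": platform.python_version(),
        "executable": sys.executable,
        # sys.maxsize exceeds 2**32 only on a 64-bit build of Python.
        "bits": 64 if sys.maxsize > 2**32 else 32,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If the path in the batch file is correct, the executable that is reported should be inside the python-3.6.2 folder of the WinPython distribution rather than in some other Python installation on the machine.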

File and directory structure

The top-level file and directory structure of WinPython-32bit-3.6.2.0Qt5 on my machine is as follows:

<DIR> notebooks
<DIR> python-3.6.2
<DIR> scripts
<DIR> settings
<DIR> tools

IDLEX (Python GUI).exe
IPython Qt Console.exe
Jupyter Lab.exe
Jupyter Notebook.exe
Qt Designer.exe
Spyder reset.exe
Spyder.exe
WinPython Command Prompt.exe
WinPython Control Panel.exe
WinPython Interpreter.exe
WinPython Powershell Prompt.exe

I mention this here mainly to point out the folder named notebooks. As you learned in the above videos, when you start Jupyter Notebook, you see a file management system in the Jupyter Notebook home page. By default, that file management system is rooted in the notebooks folder. In other words, when you create a new document using that file management system, you are creating a file with an extension of ipynb in a tree structure rooted in the folder named notebooks. If you upgrade WinPython to a new version, or upgrade from a 32-bit version to a 64-bit version, you can simply copy the folder named notebooks into your new version to carry previous work forward into the new version.
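Copying the notebooks folder can be done by hand in a file manager, but it can also be scripted. The following sketch uses temporary folders with made-up names in place of real WinPython folders so that it can be run safely anywhere; the helper names list_notebooks and carry_forward are hypothetical. It assumes Python 3.8 or later, because the dirs_exist_ok argument of shutil.copytree is not available in earlier versions.

```python
import shutil
import tempfile
from pathlib import Path


def list_notebooks(notebooks_dir):
    """Return the relative paths of all ipynb files under notebooks_dir."""
    root = Path(notebooks_dir)
    return sorted(p.relative_to(root) for p in root.rglob("*.ipynb"))


def carry_forward(old_notebooks, new_notebooks):
    """Copy the old notebooks tree into the new distribution's notebooks folder."""
    # dirs_exist_ok=True (Python 3.8+) allows copying into an existing folder.
    shutil.copytree(old_notebooks, new_notebooks, dirs_exist_ok=True)


# Demonstration with temporary stand-ins for the old and new distributions.
with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp) / "old-distribution" / "notebooks"
    new = Path(tmp) / "new-distribution" / "notebooks"
    (old / "unit1").mkdir(parents=True)
    new.mkdir(parents=True)
    (old / "unit1" / "Histogram.ipynb").write_text("{}")

    carry_forward(old, new)
    copied = list_notebooks(new)
    print(copied)
```

To use this for a real upgrade, you would replace the temporary folders with the notebooks folders of your old and new distributions.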

Jupyter Notebook has an automatic backup feature. However, you will probably also want to include the notebooks folder in your normal daily, weekly, or periodic backup routine once you actually start creating documents in Jupyter Notebook.
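If you don't already have a backup routine, a dated zip archive of the notebooks folder is one simple option. The following sketch uses the standard shutil module and, as a demonstration, backs up a temporary folder rather than a real one. The function name back_up_notebooks and the archive naming scheme are made up for illustration.

```python
import shutil
import tempfile
import zipfile
from datetime import date
from pathlib import Path


def back_up_notebooks(notebooks_dir, backup_dir):
    """Zip the notebooks folder into backup_dir and return the archive path."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    # Put today's date in the file name, for example notebooks-20180424.zip.
    base_name = backup_dir / f"notebooks-{date.today():%Y%m%d}"
    archive = shutil.make_archive(str(base_name), "zip", root_dir=str(notebooks_dir))
    return Path(archive)


# Demonstration with a temporary stand-in for the real notebooks folder.
with tempfile.TemporaryDirectory() as tmp:
    notebooks = Path(tmp) / "notebooks"
    notebooks.mkdir()
    (notebooks / "Proj01.ipynb").write_text("{}")

    archive = back_up_notebooks(notebooks, Path(tmp) / "backups")
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
    print(archive.name, names)
```

Keep the backup folder outside the notebooks folder (ideally on a different drive) so that the archive does not end up inside the folder that it is backing up.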

Documentation

As is the case with most modern library-oriented programming languages, you will probably need to make frequent reference to the library documentation. Matplotlib documentation is located here. NumPy and SciPy documentation is located here.

This is an example of a Matplotlib documentation page. You can search the Matplotlib documentation here. A Matplotlib documentation index, which includes a "Quick search" box, is located here.

Data science links

This section contains links to useful online data science resources.

Housekeeping material

Author: Prof. Richard G. Baldwin
Affiliation: Professor of Computer Information Technology at Austin Community College in Austin, TX.
File: Overview.htm
Revised: 04/24/18
Copyright 2018 Richard G. Baldwin

-end-