This notebook illustrates several operations that you can perform on a pandas DataFrame object.
The dataset was downloaded from Kaggle at https://www.kaggle.com/szrlee/stock-time-series-20050101-to-20171231
where it is listed as DJIA 30 Stock Time Series. The file used here contains price information for a single stock (ticker AABA) on the NASDAQ exchange for 3019 trading days.
import pandas as pd
%matplotlib inline
TestData01 = pd.read_csv('data/AABA_2006-01-01_to_2018-01-01.csv',
index_col='Date')
The CSV file read above is stored in a subdirectory named 'data', relative to the notebook's working directory.
Examine the first three records in the dataset.
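As a minimal sketch of what index_col='Date' does in the read_csv call above, the following reads a small in-memory CSV instead of the real file. The two rows of values are made up for illustration, and the Volume and Name columns are an assumption about the file layout rather than something shown in this notebook.

```python
import io
import pandas as pd

# Hypothetical sample data; the values are invented for illustration only.
csv_text = (
    "Date,Open,High,Low,Close,Volume,Name\n"
    "2006-01-03,10.0,12.0,9.0,11.0,100,AABA\n"
    "2006-01-04,11.0,13.0,10.0,12.0,200,AABA\n"
)

# index_col='Date' turns the Date column into the row index
# instead of leaving it as an ordinary data column.
sample = pd.read_csv(io.StringIO(csv_text), index_col='Date')

print(sample.index.name)
print(list(sample.columns))
```

With the index set this way, the Date values label the rows and no longer appear in the list of columns.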
TestData01.head(3)
Examine the last three records in the dataset.
TestData01.tail(3)
The dataset contains 3019 individual records beginning in January 2006 and ending in December 2017.
The stock prices for Open, High, Low, and Close for each day are shown in the columns with the corresponding names.
The trading date for each record is shown as the row index.
# Keep only the four price columns; a label slice includes both endpoints.
TestData01 = TestData01.loc[:,'Open':'Close']
TestData01.head(3)
TestData01.plot(figsize = (7,3))
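The column selection with .loc[:,'Open':'Close'] relies on label-based slicing, which includes both endpoints, unlike ordinary Python slicing. A small sketch on a hypothetical frame (the values and the Volume column are invented for illustration):

```python
import pandas as pd

# Hypothetical frame with one extra column after 'Close'.
df = pd.DataFrame(
    {
        'Open': [10.0, 11.0],
        'High': [12.0, 13.0],
        'Low': [9.0, 10.0],
        'Close': [11.0, 12.0],
        'Volume': [100, 200],
    },
    index=pd.Index(['2006-01-03', '2006-01-04'], name='Date'),
)

# All rows, and every column from 'Open' through 'Close' inclusive.
prices = df.loc[:, 'Open':'Close']

print(list(prices.columns))
```

The 'Close' column is kept and everything after it is dropped.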
The following code divides every numeric value in the dataset by two and stores the result in a new DataFrame.
TestData02 = TestData01/2
TestData02.head(3)
TestData02.plot(figsize = (7,3))
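Dividing a DataFrame by a scalar is applied element by element and returns a new object, so the original is left as it was. A minimal sketch with invented values:

```python
import pandas as pd

# Hypothetical values for illustration.
df = pd.DataFrame({'Open': [10.0, 11.0], 'Close': [11.0, 12.0]})

# The scalar is applied to every cell; df itself is not modified.
halved = df / 2

print(halved)
print(df)
```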
# Make an independent copy so the column offsets below do not affect TestData02.
TestData03 = TestData02.copy()
TestData03['Open']+=5
TestData03['Close']+=10
TestData03['High']+=15
TestData03.head(3)
TestData03.plot(figsize = (7,3))
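The per-column offsets above can be sketched on a small hypothetical frame. This sketch takes an explicit .copy() before modifying columns, which is one common way to make sure the source frame stays untouched and to avoid pandas warnings about assigning into a slice; the values are invented for illustration.

```python
import pandas as pd

# Hypothetical values for illustration.
base = pd.DataFrame({'Open': [5.0, 5.5], 'High': [6.0, 6.5]})

# An explicit copy is independent of 'base'.
shifted = base.copy()
shifted['Open'] += 5    # add 5 to every value in the Open column
shifted['High'] += 15   # add 15 to every value in the High column

print(shifted)
print(base)
```

Each += adds a constant to one column only, which moves that column's curve up in the plot without changing its shape.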
# Make an independent copy so the offset below does not affect TestData03.
TestData04 = TestData03.copy()
TestData04['High']+=5
TestData04.head(3)
TestData04.plot(figsize = (7,3))
# Keep only the High and Low columns.
TestData10 = TestData04.loc[:,'High':'Low']
TestData10.head(3)
TestData10.plot(figsize = (7,3))
# Compute the row-wise mean of High and Low, then append it as a column named 'Mean'.
meanSeries = TestData10.mean(axis=1)
meanSeries.name = 'Mean'
TestData11 = pd.concat([TestData10, meanSeries], axis=1)
TestData11.head(3)
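The mean and concat steps above can be sketched on a small hypothetical frame. Here mean(axis=1) averages across the columns of each row, and concat with axis=1 places the named Series alongside the existing columns, aligned on the row index. The values are invented for illustration.

```python
import pandas as pd

# Hypothetical High/Low values for illustration.
hl = pd.DataFrame(
    {'High': [12.0, 14.0], 'Low': [8.0, 10.0]},
    index=pd.Index(['2006-01-03', '2006-01-04'], name='Date'),
)

# axis=1 averages across columns, giving one value per row.
row_mean = hl.mean(axis=1)
row_mean.name = 'Mean'  # the Series name becomes the new column name

# axis=1 concatenates side by side, matching rows by index label.
combined = pd.concat([hl, row_mean], axis=1)

print(combined)
```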
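# Restrict rows to the label range 2006-01-03 through 2017-12-04 and keep columns High through Mean.
TestData12 = TestData11.loc['2006-01-03':'2017-12-04','High':'Mean']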
TestData12.plot(figsize = (7,3))
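The row and column selection above slices by label on both axes at once. A minimal sketch on a hypothetical frame whose index holds date strings in sorted order (the dates and values are invented for illustration):

```python
import pandas as pd

# Hypothetical frame with a sorted index of date strings.
df = pd.DataFrame(
    {
        'High': [1.0, 2.0, 3.0, 4.0],
        'Low': [0.5, 1.5, 2.5, 3.5],
        'Mean': [0.75, 1.75, 2.75, 3.75],
    },
    index=pd.Index(
        ['2006-01-03', '2006-01-04', '2006-01-05', '2006-01-06'],
        name='Date',
    ),
)

# Both the row slice and the column slice include their end labels.
subset = df.loc['2006-01-04':'2006-01-05', 'High':'Mean']

print(subset)
```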
The following code writes the contents of the DataFrame named TestData12 to an output CSV file named csvOutput.csv. It then loads the data from the file twice: first without specifying an index column, so Date comes back as an ordinary column under a default integer index, and then with index_col='Date', so Date is restored as the row index. Finally, it plots the reloaded DataFrame to confirm that it matches the plot shown above.
TestData12.to_csv('csvOutput.csv')
TestData13 = pd.read_csv('csvOutput.csv')
TestData13.head(3)
TestData12.to_csv('csvOutput.csv')
TestData13 = pd.read_csv('csvOutput.csv',index_col='Date')
TestData13.head(3)
TestData13.plot(figsize = (7,3))
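The difference between the two read_csv calls above can be sketched without touching the disk. This sketch assumes an in-memory string in place of csvOutput.csv (to_csv returns the CSV text when no path is given), and the values are invented for illustration.

```python
import io
import pandas as pd

# Hypothetical frame standing in for TestData12.
df = pd.DataFrame(
    {'High': [12.0, 14.0], 'Low': [8.0, 10.0], 'Mean': [10.0, 12.0]},
    index=pd.Index(['2006-01-03', '2006-01-04'], name='Date'),
)

# With no path argument, to_csv returns the CSV text as a string.
csv_text = df.to_csv()

# Without index_col, Date is read back as an ordinary column
# and the rows get a default integer index.
no_index = pd.read_csv(io.StringIO(csv_text))

# With index_col='Date', Date becomes the row index again.
with_index = pd.read_csv(io.StringIO(csv_text), index_col='Date')

print(list(no_index.columns))
print(list(with_index.columns))
```

Only the second form reproduces the layout of the frame that was written, which is why it is the one used for the confirming plot.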
The following code plots the contents of the DataFrame object in the form of a histogram, showing the three overlapping distributions.
TestData13.plot(kind ='hist',bins=100,figsize = (7,3))
The following code plots the contents of the DataFrame object as kernel density estimate (KDE) curves, showing the three overlapping distributions as smooth curves.
TestData13.plot(kind ='kde',figsize = (7,3))
Author: Prof. Richard G. Baldwin
Affiliation: Professor of Computer Information Technology at Austin Community College in Austin, TX.
File: PandasDataFrame02.html
Revised: 09/02/18
Copyright 2018 Richard G. Baldwin