This notebook focuses on the different ways to create a DataFrame object, of which there are at least four.
You can create a DataFrame object in any of the following four ways, and possibly other ways as well:
You have already seen numerous examples of this in previous notebooks, so it shouldn't be necessary to explain it further. It is worth mentioning, however, that there are other similar functions, such as read_excel and read_json, that also create DataFrame objects.
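As a quick reminder, here is a minimal sketch of two of these reader functions. The in-memory strings and the names csv_text, json_text, from_csv, and from_json are assumptions made for illustration; they stand in for real files, which you would normally reference by path. read_excel works along the same lines but needs an Excel engine installed, so it is left out here.

```python
import io

import pandas as pd

# Hypothetical in-memory data standing in for files on disk.
csv_text = "name,score\nTom,90\nDick,85\n"
json_text = '[{"name": "Tom", "score": 90}, {"name": "Dick", "score": 85}]'

# read_csv accepts a path or any file-like object and returns a DataFrame.
from_csv = pd.read_csv(io.StringIO(csv_text))

# read_json parses a JSON array of records into a DataFrame.
from_json = pd.read_json(io.StringIO(json_text))

print(type(from_csv).__name__, from_csv.shape)
print(type(from_json).__name__, from_json.shape)
```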
You have already seen numerous examples of this approach in previous notebooks, so it also shouldn't be necessary to explain this further.
You saw an example of this in an earlier notebook. Again, no further explanation should be necessary.
You have also seen a few examples of this approach in previous notebooks, but those examples were far from exhaustive with respect to the constructor arguments.
A good source of information on this topic can be found under Object Creation in 10 Minutes to pandas.
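The positional constructor arguments are, in order: data, index, columns, dtype, and copy.
The default values for the first four arguments are None. The default value for the last argument was False when this notebook was written; in pandas 1.3 and later, copy also defaults to None.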
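To see what the defaults mean in practice, here is a minimal sketch that calls the constructor once with no arguments and once with the data, index, columns, and dtype arguments given by keyword. The names empty and small, and the sample values, are assumptions made for illustration.

```python
import pandas as pd

# With every argument left at its default, the result is an empty DataFrame.
empty = pd.DataFrame()
print(empty.shape)  # (0, 0)

# The same constructor with data, index, columns, and dtype supplied by keyword.
small = pd.DataFrame(
    data=[[1, 2], [3, 4]],
    index=["r1", "r2"],
    columns=["a", "b"],
    dtype="float64",
)
print(small)
```

Passing the arguments by keyword, as the examples in this notebook do, avoids having to remember their positional order.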
As you can see, many combinations are possible. This notebook will illustrate two approaches that call the DataFrame constructor to create a new DataFrame object.
This approach constructs a DataFrame object by passing an ndarray of random numbers for the data, and passing lists of strings for the index and columns arguments. It accepts the default values for the arguments named dtype and copy.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
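# 6 rows by 4 columns of normally distributed random numbers
dta = np.random.randn(6, 4)
# Row labels and column labels
idx = ['Tom', 'Dick', 'Harry', 'Joe', 'Bill', 'Albert']
col = ['First', 'Second', 'Third', 'Fourth']
dataFrame01 = pd.DataFrame(data=dta,
                           index=idx,
                           columns=col)
# head(7) asks for up to 7 rows; the frame has only 6, so all are shown
dataFrame01.head(7)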
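The example above leaves dtype at its default, so the DataFrame keeps the float64 type of the ndarray. As a minimal sketch of what the dtype argument does, the variation below requests float32 instead. The seeded generator (used here only so the values are reproducible) and the names rng, values, and frame are assumptions that are not part of the original example.

```python
import numpy as np
import pandas as pd

# Seeded generator so the random values are the same on every run
# (an assumption; the original example calls np.random.randn).
rng = np.random.default_rng(seed=0)
values = rng.standard_normal((6, 4))

frame = pd.DataFrame(
    data=values,
    index=['Tom', 'Dick', 'Harry', 'Joe', 'Bill', 'Albert'],
    columns=['First', 'Second', 'Third', 'Fourth'],
    dtype=np.float32,  # overrides the float64 type inferred from the ndarray
)

print(frame.shape)
print(frame.dtypes)
```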
This approach uses a dictionary object to specify the data and the columns. Then it uses a list to specify the row index. It accepts the default values for the arguments named dtype and copy.
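Note that with older versions of pandas (before 0.23) or Python (before 3.6), this approach sorts the columns on the basis of column names, which may or may not be what you want. With newer versions, the columns keep the order in which the keys appear in the dictionary.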
# Dictionary keys become the column names; each list holds one column's data
dataFrame02 = pd.DataFrame({
    'Fruits': ['Peach', 'Pear', 'Apple', 'Orange'],
    'Boats': [1, 2, 3, 4],
    'Cars': [5, 6, 7, 8]},
    index=['Tom', 'Dick', 'Harry', 'Bill'])
dataFrame02.head(5)
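Because the column order you get from a dictionary can depend on the versions of pandas and Python in use, you may prefer to state the order explicitly. Here is a minimal sketch of two ways to do that: passing the columns argument to the constructor, and sorting the column labels afterward with sort_index. The names data, rows, explicit, and alphabetical are assumptions made for illustration.

```python
import pandas as pd

data = {
    'Fruits': ['Peach', 'Pear', 'Apple', 'Orange'],
    'Boats': [1, 2, 3, 4],
    'Cars': [5, 6, 7, 8],
}
rows = ['Tom', 'Dick', 'Harry', 'Bill']

# Passing columns= selects the dictionary keys and fixes their order.
explicit = pd.DataFrame(data, index=rows, columns=['Cars', 'Boats', 'Fruits'])
print(list(explicit.columns))  # ['Cars', 'Boats', 'Fruits']

# sort_index(axis=1) sorts the column labels alphabetically,
# whatever order the constructor produced.
alphabetical = pd.DataFrame(data, index=rows).sort_index(axis=1)
print(list(alphabetical.columns))  # ['Boats', 'Cars', 'Fruits']
```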
Author: Prof. Richard G. Baldwin
Affiliation: Professor of Computer Information Technology at Austin Community College in Austin, TX.
File: PandasDataFrame04.html
Revised: 09/02/18
Copyright 2018 Richard G. Baldwin