Wednesday, June 26, 2013

Create empty DataFrame in Pandas

It seems like it should be a simple thing: create an empty DataFrame in the Pandas Python Data Analysis Library. But if you want to create a DataFrame that

  • is empty (has no records)
  • has datatypes
  • has columns in a specific order

...i.e. the equivalent of SQL's CREATE TABLE, then it's not obvious how to do it in Pandas, and I wasn't able to find any one web page that laid it all out. The trick is to pass an empty NumPy ndarray with a structured dtype (a list of column name and type pairs) to the DataFrame constructor:

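import numpy as np
from pandas import DataFrame

df = DataFrame(np.zeros(0, dtype=[
    ('ProductID', 'i4'),       # 32-bit integer
    ('ProductName', 'S50')]))  # 50-byte string ('a50' is the older alias)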
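
To check that the result really behaves like an empty table with a schema, something along these lines should work. It is only a sketch: the name schema is mine, and the string field is spelled 'S50', the current NumPy name for the fixed-width byte string that 'a50' denoted.

```python
import numpy as np
import pandas as pd

# Hypothetical helper name: the list of (column, type) pairs, in order
schema = [('ProductID', 'i4'), ('ProductName', 'S50')]

# A zero-length structured array carries the column names and types
df = pd.DataFrame(np.zeros(0, dtype=schema))

print(len(df))           # number of rows
print(list(df.columns))  # column order follows the schema list
print(df.dtypes)         # per-column datatypes
```

The integer column should keep its 32-bit type; how the string column is reported may depend on the pandas version.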

Then, to insert a single record:

df = df.append({'ProductID':1234, 'ProductName':'Widget'})

UPDATE 2013-07-18: The append call above is missing a parameter. Appending a dict only works with ignore_index=True:

df = df.append({'ProductID':1234, 'ProductName':'Widget'},ignore_index=True)
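
On newer pandas the same insert has to be written differently, because DataFrame.append was deprecated in pandas 1.4 and removed in 2.0. A rough equivalent using pd.concat is sketched below. The name new_row and the astype step are my own additions, not part of the original recipe; the cast is there so the new row matches the empty frame's datatypes before the two are combined.

```python
import numpy as np
import pandas as pd

# Same empty frame as before ('S50' is the current spelling of 'a50')
df = pd.DataFrame(np.zeros(0, dtype=[('ProductID', 'i4'),
                                     ('ProductName', 'S50')]))

# Hypothetical name: a one-row frame holding the record to insert
new_row = pd.DataFrame([{'ProductID': 1234, 'ProductName': 'Widget'}])

# Cast the new row to the existing column datatypes (assumption: the
# goal is to keep the schema declared when the frame was created)
new_row = new_row.astype(df.dtypes.to_dict())

# ignore_index=True renumbers the rows 0..n-1, as it did with append
df = pd.concat([df, new_row], ignore_index=True)
print(df)
```

Building up a frame one row at a time is slow either way, so for more than a handful of records it is probably better to collect the dicts in a list and create the DataFrame once.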

1 comment:

Patrick Surry said...

thanks, this was helpful. you can also use pandas columns arg to name the columns, e.g.

columns = ['price', 'item']
pd.DataFrame(data=np.zeros((0,len(columns))), columns=columns)