How to create a DataFrame from dict with different lengths?

Creating a DataFrame from a dictionary whose values have different lengths is possible, but not by passing the dictionary directly to DataFrame.from_dict() with its default settings, which raises an error when the lengths differ. Instead, convert each value into a pandas Series first. The keys of the dictionary become the column names, the values become the row entries, and shorter columns are padded with NaN. This is useful when dealing with ragged or loosely structured data, since it turns the data into a DataFrame with a well-defined structure.


You can use the following basic syntax to create a pandas DataFrame from a dictionary whose entries have different lengths:

import pandas as pd

df = pd.DataFrame({key: pd.Series(value) for key, value in some_dict.items()})

This syntax converts each list in the dictionary into a pandas Series.

Because pandas aligns Series on their index when building a DataFrame, it fills in NaN values wherever a Series has no entry, which ensures that each column in the resulting DataFrame is the same length.

The following example shows how to use this syntax in practice.

Example: Create Pandas DataFrame from dict with Different Lengths

Suppose we have the following dictionary that contains entries with different lengths:

#create dictionary whose entries have different lengths
some_dict = dict(A=[2, 5, 5, 7, 8], B=[9, 3], C=[4, 4, 2])

#view dictionary
print(some_dict)

{'A': [2, 5, 5, 7, 8], 'B': [9, 3], 'C': [4, 4, 2]}

If we attempt to use the from_dict() function with its default settings to convert this dictionary into a pandas DataFrame, we’ll receive an error:

import pandas as pd

#attempt to create pandas DataFrame from dictionary
df = pd.DataFrame.from_dict(some_dict)

ValueError: All arrays must be of the same length

We receive an error that tells us all arrays in the dictionary must have the same length.
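
Before converting, it can help to confirm which entries are mismatched. The following is a minimal sketch using only built-in Python; the variable name lengths is purely illustrative and not part of pandas:

```python
# same dictionary as in the example above
some_dict = dict(A=[2, 5, 5, 7, 8], B=[9, 3], C=[4, 4, 2])

# map each key to the number of values it holds
lengths = {key: len(value) for key, value in some_dict.items()}
print(lengths)  # {'A': 5, 'B': 2, 'C': 3}

# more than one distinct length means the entries are ragged
print(len(set(lengths.values())) > 1)  # True
```

Any key whose length differs from the longest one will be padded with NaN once the dictionary is converted.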

To get around this error, we can use the following syntax to convert the dictionary into a DataFrame:

import pandas as pd

#create pandas DataFrame from dictionary
df = pd.DataFrame({key: pd.Series(value) for key, value in some_dict.items()})

#view DataFrame
print(df)

   A    B    C
0  2  9.0  4.0
1  5  3.0  4.0
2  5  NaN  2.0
3  7  NaN  NaN
4  8  NaN  NaN

Notice that we’re able to successfully create a pandas DataFrame and NaN values are filled in to ensure that each column is the same length.
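
As an alternative, from_dict() can handle entries of different lengths when the keys are treated as rows rather than columns. The sketch below assumes that approach: it builds the DataFrame with orient='index' and then transposes it so the keys become columns again. The name df_alt is only illustrative:

```python
import pandas as pd

# same dictionary as in the example above
some_dict = dict(A=[2, 5, 5, 7, 8], B=[9, 3], C=[4, 4, 2])

# orient='index' treats each key as a row label and pads short rows with missing values;
# .T transposes the result so that A, B and C become the column names
df_alt = pd.DataFrame.from_dict(some_dict, orient='index').T

print(df_alt)
```

One difference to be aware of: because the transpose mixes integer and float columns, column A is likely to come back as float here rather than int, so the two approaches are not guaranteed to produce identical dtypes.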

If you would like to replace these NaN values with other values (such as zero), you can use the replace() function as follows:

import numpy as np

#replace all NaNs with zeros
df.replace(np.nan, 0, inplace=True)

#view updated DataFrame
print(df)

   A    B    C
0  2  9.0  4.0
1  5  3.0  4.0
2  5  0.0  2.0
3  7  0.0  0.0
4  8  0.0  0.0

Notice that each NaN value has been replaced with zero.
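
If you would rather not import NumPy, fillna() is another way to get the same result. The sketch below also casts the columns back to integers, on the assumption that every original value is a whole number; the names df_filled and df_int are only illustrative:

```python
import pandas as pd

# rebuild the DataFrame from the example above
some_dict = dict(A=[2, 5, 5, 7, 8], B=[9, 3], C=[4, 4, 2])
df = pd.DataFrame({key: pd.Series(value) for key, value in some_dict.items()})

# fillna(0) returns a new DataFrame with every NaN replaced by zero
df_filled = df.fillna(0)

# columns B and C were float only because of the NaN padding,
# so they can be cast back to integers once the NaNs are gone
df_int = df_filled.astype(int)

print(df_int)
```

Unlike the inplace=True call above, this version leaves df untouched and returns new DataFrames, which can make it easier to compare the data before and after filling.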
