To merge on an object column and an int64 column, first convert one of the columns so that both share the same data type, then combine the two datasets with the pandas.merge() function, which joins two DataFrames on a particular column or set of columns. The merge only succeeds when the data types of the key columns are compatible.
One error you may encounter when using pandas is:
ValueError: You are trying to merge on int64 and object columns.
If you wish to proceed you should use pd.concat
This error occurs when you attempt to merge two pandas DataFrames but the column you’re merging on is an object in one DataFrame and an integer in the other DataFrame.
The following example shows how to fix this error in practice.
How to Reproduce the Error
Suppose we create the following two pandas DataFrames:
import pandas as pd

#create DataFrame
df1 = pd.DataFrame({'year': [2015, 2016, 2017, 2018, 2019, 2020, 2021],
                    'sales': [500, 534, 564, 671, 700, 840, 810]})

df2 = pd.DataFrame({'year': ['2015', '2016', '2017', '2018', '2019', '2020', '2021'],
                    'refunds': [31, 36, 40, 40, 43, 70, 62]})

#view DataFrames
print(df1)

   year  sales
0  2015    500
1  2016    534
2  2017    564
3  2018    671
4  2019    700
5  2020    840
6  2021    810

print(df2)

   year  refunds
0  2015       31
1  2016       36
2  2017       40
3  2018       40
4  2019       43
5  2020       70
6  2021       62
Now suppose we attempt to merge the two DataFrames:
#attempt to merge two DataFrames
big_df = df1.merge(df2, on='year', how='left')
ValueError: You are trying to merge on int64 and object columns.
If you wish to proceed you should use pd.concat
We receive a ValueError because the year variable in the first DataFrame is an integer but the year variable in the second DataFrame is an object.
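Before merging, it can help to confirm which key column has the unexpected type. The following is a minimal sketch that uses shortened versions of the DataFrames above; the variable names left_is_int and right_is_int are introduced here only for illustration.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

# shortened versions of the two DataFrames from the example
df1 = pd.DataFrame({'year': [2015, 2016, 2017], 'sales': [500, 534, 564]})
df2 = pd.DataFrame({'year': ['2015', '2016', '2017'], 'refunds': [31, 36, 40]})

# print the data type of the key column in each DataFrame
print(df1['year'].dtype)
print(df2['year'].dtype)

# check programmatically whether each key column holds integers
left_is_int = is_integer_dtype(df1['year'])
right_is_int = is_integer_dtype(df2['year'])

if left_is_int != right_is_int:
    print("The 'year' columns have different types; convert one before merging.")
```

Using is_integer_dtype rather than comparing against a specific type name keeps the check independent of how a given pandas version labels string columns.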
How to Fix the Error
The easiest way to fix this error is to simply convert the year variable in the second DataFrame to an integer and then perform the merge.
The following syntax shows how to do so:
#convert year variable in df2 to integer
df2['year'] = df2['year'].astype(int)
#merge two DataFrames
big_df = df1.merge(df2, on='year', how='left')
#view merged DataFrame
big_df
   year  sales  refunds
0  2015    500       31
1  2016    534       36
2  2017    564       40
3  2018    671       40
4  2019    700       43
5  2020    840       70
6  2021    810       62
Notice that we don’t receive any ValueError and we are able to successfully merge the two DataFrames into one.
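Note that astype(int) raises an error if the object column contains a value that cannot be parsed as an integer. One possible way to handle that case, sketched below, is to use pd.to_numeric() with errors='coerce' and drop the rows that fail to parse. The 'n/a' value is a made-up example of dirty data and is not part of the original dataset.

```python
import pandas as pd

df1 = pd.DataFrame({'year': [2015, 2016, 2017], 'sales': [500, 534, 564]})

# hypothetical dirty data: one year value is not a number
df2 = pd.DataFrame({'year': ['2015', '2016', 'n/a'], 'refunds': [31, 36, 40]})

# values that cannot be parsed become NaN instead of raising an error
df2['year'] = pd.to_numeric(df2['year'], errors='coerce')

# drop the rows that could not be parsed, then cast the key to an integer type
df2 = df2.dropna(subset=['year']).astype({'year': 'int64'})

# merge two DataFrames
big_df = df1.merge(df2, on='year', how='left')

print(big_df)
```

Because this is a left merge, the 2017 row from df1 is kept and its refunds value is missing, since the matching row in df2 was dropped.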