Specifying the dtype when importing a CSV file can improve performance, since pandas does not have to infer the type of each column. It also helps ensure that the data is imported correctly, because each type is stated explicitly rather than guessed. To do this, pass the dtype argument to the pd.read_csv() function with a dictionary that maps column names to their data types.
You can use the following basic syntax to specify the dtype of each column in a DataFrame when importing a CSV file into pandas:
df = pd.read_csv('my_data.csv', dtype = {'col1': str, 'col2': float, 'col3': int})
The dtype argument specifies the data type that each column should have when importing the CSV file into a pandas DataFrame.
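One case where the explicit type matters is a column of codes with leading zeros. The sketch below is only an illustration: the zip_code and sales columns are hypothetical and not part of this article's data, and the CSV text is inlined with io.StringIO so the snippet runs without a file.

```python
import io
import pandas as pd

# Hypothetical data (not from the article): zip codes with leading zeros
csv_text = "zip_code,sales\n00501,10\n02134,25\n"

# Without dtype, pandas infers zip_code as an integer and the zeros are lost
inferred = pd.read_csv(io.StringIO(csv_text))

# With dtype, the column is kept as text exactly as written in the file
typed = pd.read_csv(io.StringIO(csv_text), dtype={"zip_code": str})

print(inferred["zip_code"].tolist())
print(typed["zip_code"].tolist())
```

The first list holds integers, while the second keeps the original five-character strings.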
The following example shows how to use this syntax in practice.
Example: Specify dtypes when Importing CSV File into Pandas
Suppose we have the following CSV file called basketball_data.csv:
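The file holds three columns (team, points, rebounds) and six rows. To follow along, it can be recreated with a minimal sketch like the one below, where the values are taken from the printed output in this example:

```python
import pandas as pd

# Values reconstructed from the DataFrame printed in this example
data = pd.DataFrame({
    "team": ["A", "B", "C", "D", "E", "F"],
    "points": [22, 14, 29, 30, 22, 31],
    "rebounds": [10, 9, 6, 2, 9, 10],
})

# index=False keeps the row labels out of the file
data.to_csv("basketball_data.csv", index=False)
```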
If we import the CSV file using the read_csv() function, pandas will attempt to identify the data type for each column automatically:
import pandas as pd

#import CSV file
df = pd.read_csv('basketball_data.csv')

#view resulting DataFrame
print(df)

  team  points  rebounds
0    A      22        10
1    B      14         9
2    C      29         6
3    D      30         2
4    E      22         9
5    F      31        10

#view data type of each column
print(df.dtypes)

team        object
points       int64
rebounds     int64
dtype: object
From the output we can see that the columns in the DataFrame have the following data types:
- team: object
- points: int64
- rebounds: int64
However, we can use the dtype argument within the read_csv() function to specify the data types that each column should have:
import pandas as pd

#import CSV file and specify dtype of each column
df = pd.read_csv('basketball_data.csv',
                 dtype = {'team': str, 'points': float, 'rebounds': int})

#view resulting DataFrame
print(df)

  team  points  rebounds
0    A    22.0        10
1    B    14.0         9
2    C    29.0         6
3    D    30.0         2
4    E    22.0         9
5    F    31.0        10

#view data type of each column
print(df.dtypes)

team         object
points      float64
rebounds      int32
dtype: object
From the output we can see that the columns in the DataFrame have the following data types:
- team: object
- points: float64
- rebounds: int32
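These data types match the ones that we specified using the dtype argument: str is stored as object, float becomes float64, and int maps to the platform's default integer type, which is int32 on the system used here and int64 on many others.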
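If a specific width matters, the dtype can be given as a string such as 'int64' instead of the Python type. A minimal sketch, with the example data inlined through io.StringIO (an assumption made so the snippet runs without the file):

```python
import io
import pandas as pd

# Same rows as basketball_data.csv, inlined for a self-contained example
csv_text = (
    "team,points,rebounds\n"
    "A,22,10\nB,14,9\nC,29,6\nD,30,2\nE,22,9\nF,31,10\n"
)

# String dtype names fix the exact width of each numeric column
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"team": str, "points": "float64", "rebounds": "int64"},
)

print(df.dtypes)
```

With the width spelled out, the rebounds column should come back as int64 regardless of the operating system.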
Note that in this example, we specified the dtype for each column in the DataFrame. You can also specify the dtype for only some of the columns and let pandas infer the rest.
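To illustrate partial specification, here is a minimal sketch that sets only the points column and leaves team and rebounds to inference. The data is again inlined with io.StringIO as an assumption, so the snippet does not depend on the file being present:

```python
import io
import pandas as pd

# Same rows as basketball_data.csv, inlined for a self-contained example
csv_text = (
    "team,points,rebounds\n"
    "A,22,10\nB,14,9\nC,29,6\nD,30,2\nE,22,9\nF,31,10\n"
)

# Only points is listed; every other column is inferred by pandas
df_partial = pd.read_csv(io.StringIO(csv_text), dtype={"points": float})

print(df_partial.dtypes)
```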
Note: The complete documentation for the pandas read_csv() function is available in the official pandas API reference.