Data normalization rescales variables to a common scale so that they can be compared and analyzed together. In Python, the two most common approaches are scaling, which transforms numerical data to a specific range such as 0 to 1, and standardization, which converts data to have a mean of 0 and a standard deviation of 1. Feature engineering, which creates new features or modifies existing ones to better represent the data, is often applied alongside these methods. All of these can be implemented with libraries such as scikit-learn and pandas. Normalized data is generally better suited to analysis and modeling, particularly for methods that are sensitive to the scale of their inputs.
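To make the difference between scaling and standardization concrete, here is a minimal sketch that uses only Python's standard library. The function names `min_max_scale` and `standardize` and the sample values are illustrative, not taken from any library, and the sketch assumes the population standard deviation is the one wanted.

```python
import statistics

def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift and rescale values to have mean 0 and standard deviation 1.

    Assumption: the population standard deviation (pstdev) is used.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

sample = [13, 16, 19, 22, 23]  # hypothetical values for illustration

scaled = min_max_scale(sample)   # values now lie between 0 and 1
z_scores = standardize(sample)   # values now have mean 0 and std 1

print(scaled)
print(z_scores)
```

Scaling fixes the range of the output, while standardization fixes its mean and spread, so standardized values can fall outside the range 0 to 1.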
Normalize Data in Python
Often in statistics and machine learning, we normalize variables such that the range of the values is between 0 and 1.
The most common reason to normalize variables is when we conduct some type of multivariate analysis (i.e. we want to understand the relationship between several predictor variables and a response variable) and we want each variable to contribute equally to the analysis.
When variables are measured at different scales, they often do not contribute equally to the analysis. For example, if the values of one variable range from 0 to 100,000 and the values of another variable range from 0 to 100, the variable with the larger range will be given a larger weight in the analysis.
By normalizing the variables, we can be sure that each variable contributes equally to the analysis.
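The effect of differing scales can be seen in a small distance calculation. The sketch below uses made-up observations (income in dollars and age in years) and a hypothetical helper `min_max`; it is only meant to illustrate the point, not to represent any particular analysis.

```python
import math

# Hypothetical observations: (income in dollars, age in years)
a = (50_000, 25)
b = (51_000, 65)   # similar income to a, very different age
c = (60_000, 26)   # different income from a, similar age

# On the raw scale, the income difference dominates the Euclidean distance.
raw_ab = math.dist(a, b)
raw_ac = math.dist(a, c)

def min_max(points):
    """Rescale each coordinate to the range 0 to 1 across the given points."""
    columns = list(zip(*points))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple((v - lo) / (hi - lo) for v, lo, hi in zip(p, lows, highs))
        for p in points
    ]

na, nb, nc = min_max([a, b, c])

# After normalization, a large age gap counts about as much as a large income gap.
norm_ab = math.dist(na, nb)
norm_ac = math.dist(na, nc)

print(raw_ab, raw_ac)
print(norm_ab, norm_ac)
```

On the raw data, `c` looks roughly ten times farther from `a` than `b` does, purely because income is measured in larger units. After normalization the two distances are nearly the same.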
To normalize the values to be between 0 and 1, we can use the following formula:
x_norm = (x_i - x_min) / (x_max - x_min)
where:
- x_norm: The ith normalized value in the dataset
- x_i: The ith value in the dataset
- x_max: The maximum value in the dataset
- x_min: The minimum value in the dataset
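As a quick check of the formula, the following sketch applies it by hand in plain Python to a short list of numbers. The variable names are illustrative.

```python
values = [13, 16, 19, 22, 23, 38, 47, 56, 58, 63, 65, 70, 71]

x_min = min(values)  # smallest value in the dataset: 13
x_max = max(values)  # largest value in the dataset: 71

# Apply the formula to every value
normalized = [(x - x_min) / (x_max - x_min) for x in values]

# The second value works out to (16 - 13) / (71 - 13) = 3 / 58
print(normalized[1])
```

The smallest value always maps to 0 and the largest to 1, with every other value falling in between.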
The following examples show how to normalize one or more variables in Python.
Example 1: Normalize a NumPy Array
The following code shows how to normalize all values in a NumPy array:
    import numpy as np

    #create NumPy array
    data = np.array([[13, 16, 19, 22, 23, 38, 47, 56, 58, 63, 65, 70, 71]])

    #normalize all values in array
    data_norm = (data - data.min()) / (data.max() - data.min())

    #view normalized values
    data_norm

    array([[0.        , 0.05172414, 0.10344828, 0.15517241, 0.17241379,
            0.43103448, 0.5862069 , 0.74137931, 0.77586207, 0.86206897,
            0.89655172, 0.98275862, 1.        ]])
Each value in the normalized array is now between 0 and 1.
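Note that `data.min()` and `data.max()` are computed over the whole array. If a 2-D array holds several variables, one per column, you will usually want to normalize each column separately. A minimal sketch, assuming rows are observations and columns are variables (the array values are made up):

```python
import numpy as np

# Hypothetical 2-D array: each row is an observation, each column a variable
data = np.array([[1.0, 200.0],
                 [2.0, 400.0],
                 [3.0, 800.0]])

# axis=0 computes the minimum and maximum of each column
col_min = data.min(axis=0)
col_max = data.max(axis=0)

# Broadcasting applies each column's min and max to that column only
data_norm = (data - col_min) / (col_max - col_min)

print(data_norm)
```

Each column now runs from 0 to 1 on its own, regardless of the range of the other column.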
Example 2: Normalize All Variables in Pandas DataFrame
The following code shows how to normalize all variables in a pandas DataFrame:
    import pandas as pd

    #create DataFrame
    df = pd.DataFrame({'points': [25, 12, 15, 14, 19, 23, 25, 29],
                       'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                       'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

    #normalize values in every column
    df_norm = (df - df.min()) / (df.max() - df.min())

    #view normalized DataFrame
    df_norm

         points  assists  rebounds
    0  0.764706    0.125  0.857143
    1  0.000000    0.375  0.428571
    2  0.176471    0.375  0.714286
    3  0.117647    0.625  0.142857
    4  0.411765    1.000  0.142857
    5  0.647059    0.625  0.000000
    6  0.764706    0.625  0.571429
    7  1.000000    0.000  1.000000
Each value in every column is now between 0 and 1.
Example 3: Normalize Specific Variables in Pandas DataFrame
The following code shows how to normalize specific variables in a pandas DataFrame:

    import pandas as pd

    #create DataFrame
    df = pd.DataFrame({'points': [25, 12, 15, 14, 19, 23, 25, 29],
                       'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                       'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

    #define columns to normalize
    x = df.iloc[:, 0:2]

    #normalize values in first two columns only
    #(assign by column name so the columns can change from integer to float dtype)
    df[x.columns] = (x - x.min()) / (x.max() - x.min())

    #view normalized DataFrame
    df

         points  assists  rebounds
    0  0.764706    0.125        11
    1  0.000000    0.375         8
    2  0.176471    0.375        10
    3  0.117647    0.625         6
    4  0.411765    1.000         6
    5  0.647059    0.625         5
    6  0.764706    0.625         9
    7  1.000000    0.000        12
Notice that just the values in the first two columns are normalized.
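If you normalize columns often, the steps above can be wrapped in a small helper. The sketch below is one possible version, not a pandas built-in: the function name `min_max_normalize`, the `team_size` column, and the choice to set a constant column to 0.0 (to avoid dividing by zero) are all assumptions made for illustration.

```python
import pandas as pd

def min_max_normalize(df, columns):
    """Return a copy of df with the given columns rescaled to the range 0 to 1.

    Assumption: a column whose values are all equal is set to 0.0
    instead of dividing by zero.
    """
    result = df.copy()
    for col in columns:
        col_min = df[col].min()
        col_range = df[col].max() - col_min
        if col_range == 0:
            result[col] = 0.0
        else:
            result[col] = (df[col] - col_min) / col_range
    return result

df = pd.DataFrame({'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'team_size': [5, 5, 5, 5, 5, 5, 5, 5]})  # hypothetical constant column

df_norm = min_max_normalize(df, ['points', 'team_size'])
print(df_norm)
```

Because the helper works on a copy, the original DataFrame is left unchanged and can still be used for other calculations.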
Cite this article
stats writer (2024). How can data be normalized in Python?. PSYCHOLOGICAL SCALES. Retrieved from https://scales.arabpsychology.com/stats/how-can-data-be-normalized-in-python/
stats writer. "How can data be normalized in Python?." PSYCHOLOGICAL SCALES, 5 May. 2024, https://scales.arabpsychology.com/stats/how-can-data-be-normalized-in-python/.
