How to Normalize Data Between -1 and 1

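Normalizing data between -1 and 1 is a common practice in data science that puts all features on the same scale so they can be compared easily. This is done with min-max scaling: subtract the feature's minimum from each value, divide by the feature's range (maximum minus minimum), then multiply by 2 and subtract 1. Because the result always falls between -1 and 1, no feature dominates simply because its raw values are larger. Keep in mind that this method depends on the minimum and maximum, so a single extreme outlier can squeeze the remaining values into a narrow part of the range.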


To normalize the values in a dataset to be between -1 and 1, you can use the following formula:

zi = 2 * ((xi - xmin) / (xmax - xmin)) - 1

where:

  • zi: The ith normalized value in the dataset
  • xi: The ith value in the dataset
  • xmin: The minimum value in the dataset
  • xmax: The maximum value in the dataset
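The formula translates directly into a small function. This is a minimal Python sketch; `normalize_value` is a hypothetical helper name, not part of any library, and the guard against a zero range is an added assumption (the formula itself is undefined in that case).

```python
def normalize_value(x: float, x_min: float, x_max: float) -> float:
    """Scale x from [x_min, x_max] to [-1, 1] using
    zi = 2 * ((xi - xmin) / (xmax - xmin)) - 1."""
    if x_max == x_min:
        # The formula divides by (xmax - xmin), so a constant dataset cannot be scaled.
        raise ValueError("x_max must differ from x_min")
    return 2 * ((x - x_min) / (x_max - x_min)) - 1


# Minimum 13 and maximum 71, the values used in the example that follows
print(normalize_value(13, 13, 71))  # -1.0
print(normalize_value(71, 13, 71))  # 1.0
```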

For example, suppose we have a dataset whose first three values are 13, 16, and 19.

The minimum value in the dataset is 13 and the maximum value is 71.

To normalize the first value of 13, we would apply the formula shared earlier:

  • zi = 2 * ((xi - xmin) / (xmax - xmin)) - 1 = 2 * ((13 - 13) / (71 - 13)) - 1 = -1

To normalize the second value of 16, we would use the same formula:

  • zi = 2 * ((xi - xmin) / (xmax - xmin)) - 1 = 2 * ((16 - 13) / (71 - 13)) - 1 ≈ -0.897

To normalize the third value of 19, we would use the same formula:

  • zi = 2 * ((xi - xmin) / (xmax - xmin)) - 1 = 2 * ((19 - 13) / (71 - 13)) - 1 ≈ -0.793
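The three calculations above can be reproduced in a few lines of Python. This sketch hard-codes the minimum (13) and maximum (71) stated for the example and rounds to three decimals, as the text does.

```python
x_min, x_max = 13, 71  # minimum and maximum stated for the example dataset

results = {}
for x in (13, 16, 19):  # first, second, and third values from the example
    z = 2 * ((x - x_min) / (x_max - x_min)) - 1
    results[x] = round(z, 3)  # round to three decimals, as in the text

print(results)  # {13: -1.0, 16: -0.897, 19: -0.793}
```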

We can apply this same formula to every remaining value in the original dataset. Once that is done, each value in the normalized dataset lies between -1 and 1.

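Using this normalization method, the following statements will always be true, provided the dataset contains at least two distinct values (otherwise xmax - xmin is zero and the formula is undefined):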

  • The normalized value for the minimum value in the dataset will always be -1.
  • The normalized value for the maximum value in the dataset will always be 1.
  • The normalized values for all other values in the dataset will be between -1 and 1.
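These statements can be checked on a whole list at once. The sketch below uses only the four values named in the example (13, 16, 19, and 71), not the full dataset, and `normalize_list` is a hypothetical helper name.

```python
def normalize_list(values: list[float]) -> list[float]:
    """Min-max scale every value in `values` to the range [-1, 1]."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # All values are identical: (xmax - xmin) is zero and the formula is undefined.
        raise ValueError("need at least two distinct values")
    return [2 * ((x - x_min) / (x_max - x_min)) - 1 for x in values]


data = [13, 16, 19, 71]  # only the values named in the example
z = normalize_list(data)

assert z[data.index(min(data))] == -1.0  # the minimum maps to -1
assert z[data.index(max(data))] == 1.0   # the maximum maps to 1
assert all(-1 <= v <= 1 for v in z)      # every value lies in [-1, 1]
print([round(v, 3) for v in z])          # [-1.0, -0.897, -0.793, 1.0]
```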

When to Normalize Data

We often normalize variables when an analysis involves multiple variables measured on different scales and we want each variable to have the same range.

This prevents any one variable from being overly influential simply because of its scale, especially when the variables are measured in different units (e.g., one in inches and another in yards).
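With several variables, each one is normalized separately using its own minimum and maximum. A minimal sketch, assuming two hypothetical variables whose numbers are invented purely for illustration: one measured in inches and one in yards. After scaling, both span the same range regardless of their original units.

```python
def normalize_column(values):
    """Scale one variable to [-1, 1] using that variable's own min and max.

    Assumes the column has at least two distinct values.
    """
    lo, hi = min(values), max(values)
    return [2 * ((v - lo) / (hi - lo)) - 1 for v in values]


# Hypothetical data: two variables on very different scales and in different units
columns = {
    "height_in": [60, 66, 72],       # inches
    "distance_yd": [100, 175, 400],  # yards
}

normalized = {name: normalize_column(vals) for name, vals in columns.items()}
print(normalized)
# {'height_in': [-1.0, 0.0, 1.0], 'distance_yd': [-1.0, -0.5, 1.0]}
```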

Also note that the normalization method we used here is only one possible option.

In some cases, it makes sense to instead normalize variables between 0 and 1 or even between 0 and 100.
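The same min-max idea extends to any target range [a, b]: replace the factor 2 with (b - a) and the offset -1 with a. With a = -1 and b = 1 this reduces to the formula used above. This is a hedged sketch; `rescale` is a hypothetical helper name.

```python
def rescale(x, x_min, x_max, a=-1.0, b=1.0):
    """Map x from [x_min, x_max] to [a, b] with min-max scaling.

    With a=-1, b=1 this equals zi = 2 * ((xi - xmin) / (xmax - xmin)) - 1.
    """
    if x_max == x_min:
        raise ValueError("x_max must differ from x_min")
    return a + (b - a) * (x - x_min) / (x_max - x_min)


# The value 16 from the example (min 13, max 71) scaled to three target ranges
print(round(rescale(16, 13, 71), 3))              # -0.897 (range -1 to 1)
print(round(rescale(16, 13, 71, a=0, b=1), 3))    # 0.052  (range 0 to 1)
print(round(rescale(16, 13, 71, a=0, b=100), 3))  # 5.172  (range 0 to 100)
```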
