The ecdf() function in R constructs an empirical cumulative distribution function (ECDF) from a dataset. It takes a numeric vector as input and returns a step function that gives, for any value x, the proportion of data points less than or equal to x. The function jumps at each unique value in the vector. Plotting this step function is a useful way to visualize the distribution of a dataset and to compare it to a theoretical distribution. The object returned by ecdf() can also be passed to quantile() to compute sample quantiles, which makes it a handy tool for descriptive statistics and data analysis.
You can use the ecdf function in R to calculate and plot an empirical cumulative distribution function.
Here is the most common way to use this function:
#calculate empirical cumulative distribution function of data
p = ecdf(data)

#plot empirical cumulative distribution function
plot(p)
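To make it clearer what ecdf() returns, here is a minimal sketch on a small hand-made vector. The values and the names x and Fn are made up for illustration and are not part of the example that follows.

```r
# small illustrative vector (values chosen arbitrarily for this sketch)
x <- c(1, 2, 2, 3, 5)

# ecdf() returns a function, not a plot
Fn <- ecdf(x)

# proportion of values less than or equal to 2 (three of the five values)
Fn(2)
# [1] 0.6

# below the smallest value the ECDF is 0; at the largest value it is 1
Fn(0)
# [1] 0
Fn(5)
# [1] 1

# the jumps of the step function occur at the unique data values
knots(Fn)
# [1] 1 2 3 5
```

Because Fn is an ordinary R function, it can be evaluated at any number, including values that do not appear in the data.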
The following example shows how to use this function in practice.
Example: How to Use ecdf() Function in R
For this example, let’s create a vector of 1,000 random values that follow a standard normal distribution:
#make this example reproducible
set.seed(1)

#create vector of 1,000 random values that follow standard normal distribution
data = rnorm(1000)

#view first six values in vector
head(data)

[1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078 -0.8204684
We can use the ecdf function to calculate the empirical cumulative distribution function of this dataset and then use the plot function to visualize it:
#calculate empirical cumulative distribution function of data
p = ecdf(data)

#plot empirical cumulative distribution function
plot(p)
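The object p is also a function in its own right, so it can be evaluated directly. The sketch below recreates the same data so that it runs on its own. The evaluation points and the choice of quartiles are arbitrary picks for illustration.

```r
# recreate the data from the example so this block is self-contained
set.seed(1)
data <- rnorm(1000)
p <- ecdf(data)

# evaluate the ECDF at a few points: share of values <= -1, 0 and 1
p(c(-1, 0, 1))

# the same proportion computed by hand for x = 0
mean(data <= 0)

# sample quartiles computed from the ECDF object
quantile(p, probs = c(0.25, 0.5, 0.75))
```

For a sample drawn from a standard normal distribution, the value of p(0) should be close to 0.5, since about half of the values fall at or below zero.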
Note that you can also use the xlab, ylab and main arguments within the plot function to add an x-axis label, y-axis label and title to the plot, respectively:
#calculate empirical cumulative distribution function of data
p = ecdf(data)

#plot empirical cumulative distribution function with axis labels and title
plot(p, xlab='x', ylab='CDF', main='CDF of Data')
The x-axis displays the values from the dataset.
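The y-axis displays the value of the empirical cumulative distribution function, i.e. the proportion of values in the dataset that are less than or equal to the corresponding x value.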
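One way to compare the empirical CDF to a theoretical distribution is to draw the theoretical CDF on top of the plot. The sketch below assumes the standard normal distribution is the right reference, which holds here because the data were generated with rnorm(). The color, line width, plot title and the grid xs are arbitrary choices for illustration.

```r
# recreate the data from the example so this block is self-contained
set.seed(1)
data <- rnorm(1000)
p <- ecdf(data)

# plot the ECDF, then add the standard normal CDF as a red curve
plot(p, xlab = 'x', ylab = 'CDF', main = 'Empirical vs. Theoretical CDF')
curve(pnorm(x), add = TRUE, col = 'red', lwd = 2)

# largest vertical gap between the two curves on a grid of x values
xs <- seq(-4, 4, by = 0.01)
max(abs(p(xs) - pnorm(xs)))
```

If the data really follow the reference distribution, the two curves should nearly overlap and the largest gap should be small for a sample of this size.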
Additional Resources
How to Plot a Normal Distribution in R
A Guide to dnorm, pnorm, qnorm, and rnorm in R
How to Perform a Shapiro-Wilk Test for Normality in R