Kendall’s Tau

How to Calculate and Interpret Kendall’s Tau for Ranking Data

Kendall’s Tau, often referred to as the Kendall rank correlation coefficient, is a fundamental non-parametric measure in statistics. It quantifies the strength and direction of the association between two paired variables. Unlike Pearson’s correlation, which measures linear association and relies on normality for its standard significance tests, Kendall’s Tau makes no assumptions about the shape of the underlying data distribution, which makes it robust. It is especially useful when variables are measured on an ordinal scale, or when continuous data deviate substantially from a normal distribution, for example because of severe skewness or outliers.

The resulting Tau coefficient ranges from -1 to +1. A value of 0 indicates no monotonic association between the two variables (they may still be related in some non-monotonic way). Values approaching +1 indicate a strong, positive, concordant relationship, meaning the ranks of both variables increase together. Values nearing -1 indicate a strong, negative, discordant relationship, where an increase in the rank of one variable corresponds to a decrease in the rank of the other. Because it relies on ranks rather than absolute magnitudes, Kendall’s Tau describes the balance of concordance and discordance within the dataset. It is used across the social sciences, psychological research, finance, and environmental studies, where variables are often inherently ranked or violate strict parametric assumptions.


Understanding the Kendall Rank Correlation Coefficient

The primary role of Kendall’s Tau is to assess the degree of correspondence between the rankings of observations when comparing two different variables. It is fundamentally concerned with identifying whether pairs of observations are concordant or discordant. A pair is considered concordant if the ranking for the first variable agrees with the ranking for the second variable across the pair. Conversely, a pair is discordant if the rankings disagree. The calculation essentially quantifies the difference between the number of concordant pairs and the number of discordant pairs, normalized by the total number of possible pairs.
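To make the counting concrete, here is a minimal Python sketch. It is an illustration rather than a reference implementation: the helper names `count_pairs` and `tau_a` are hypothetical, the two judges’ rankings are made-up, and the result is normalized by all n(n - 1)/2 pairs, which is the tau-a form and does not correct for ties.

```python
from itertools import combinations


def count_pairs(x, y):
    """Count concordant and discordant pairs among all n(n-1)/2 pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1   # both variables order the pair the same way
        elif s < 0:
            discordant += 1   # the two variables disagree on the order
        # s == 0: pair tied on at least one variable, counted as neither
    return concordant, discordant


def tau_a(x, y):
    """Kendall's tau-a: (C - D) divided by the total number of pairs."""
    n = len(x)
    c, d = count_pairs(x, y)
    return (c - d) / (n * (n - 1) / 2)


# Hypothetical ranks given by two judges to the same five items.
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 3, 5, 4]
print(count_pairs(judge_1, judge_2))  # (8, 2)
print(tau_a(judge_1, judge_2))        # 0.6
```

A pair tied on either variable produces a zero product and is counted as neither concordant nor discordant, which is why variants such as tau-b adjust the denominator when ties are present.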

In population terms, Kendall’s Tau is the probability that a randomly chosen pair of observations is in the same order on both variables minus the probability that it is in opposite orders. Tau-b, the most commonly used variant, applies this idea to sample data and adjusts the denominator for tied ranks. This makes Tau especially useful when the relationship between the variables is expected to be monotonic, meaning it proceeds consistently in one general direction, though not necessarily along a straight line. Monotonicity is the key condition: as the ranks of one variable increase, the ranks of the other should consistently move in one direction (either upward or downward). The Tau value provides a clear, interpretable measure of this directional consistency.
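The tie adjustment in tau-b can be written as tau-b = (C - D) / sqrt((n0 - n1)(n0 - n2)), where C and D are the concordant and discordant pair counts, n0 = n(n - 1)/2 is the total number of pairs, and n1 and n2 count the pairs tied on the first and second variable. The sketch below computes this directly in Python; `tied_pairs` and `tau_b` are hypothetical helper names and the data are made-up.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def tied_pairs(values):
    """Pairs tied on one variable: sum of t*(t-1)/2 over each group of t equal values."""
    return sum(t * (t - 1) // 2 for t in Counter(values).values())


def tau_b(x, y):
    """Kendall's tau-b = (C - D) / sqrt((n0 - n1) * (n0 - n2))."""
    n = len(x)
    n0 = n * (n - 1) // 2                  # total number of pairs
    n1, n2 = tied_pairs(x), tied_pairs(y)
    if n0 == n1 or n0 == n2:
        raise ValueError("tau-b is undefined when a variable is constant")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1                # pairs tied on x or y count as neither
    return (concordant - discordant) / sqrt((n0 - n1) * (n0 - n2))


# Made-up data with one tied pair in each variable.
x = [1, 2, 2, 3]
y = [1, 2, 3, 3]
print(tau_b(x, y))  # 0.8  (C = 4, D = 0, n0 = 6, n1 = n2 = 1)
```

On these four observations tau-a would be 4/6, about 0.67; tau-b divides by the geometric mean of the untied pair counts instead, giving 0.8.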

While Pearson’s correlation coefficient measures the strength of a linear relationship, Kendall’s Tau measures the strength of the monotonic association. This subtle yet significant difference is why Tau is the preferred tool when analyzing data that is ranked (like preference scores or satisfaction scales) or data that is continuous but exhibits significant non-normality or sensitivity to outliers. It provides a robust and mathematically sound measure of association derived purely from rank order information, preserving the directional relationship while minimizing the impact of measurement deviations.
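A quick way to see the difference is a relationship that is perfectly monotonic but strongly curved. The following sketch, using made-up data and hypothetical helper names (`pearson`, `kendall_tau`), compares the two coefficients when y = e^x.

```python
from itertools import combinations
from math import exp, sqrt


def pearson(x, y):
    """Pearson's r from the usual sums of centred cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def kendall_tau(x, y):
    """Tau-a; equal to tau-b here because neither variable has ties."""
    n = len(x)
    c = d = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            c += 1
        elif s < 0:
            d += 1
    return (c - d) / (n * (n - 1) / 2)


x = list(range(1, 11))
y = [exp(v) for v in x]   # strictly increasing, but far from linear

print(kendall_tau(x, y))  # 1.0: every pair is concordant
print(pearson(x, y))      # about 0.72: penalized for the curvature
```

Tau reaches its maximum because the ordering agrees perfectly, while Pearson’s r stays well below 1 because the points do not lie on a straight line.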

Kendall’s Tau measures the relationship between two variables when at least one of them is ordinal, skewed, or contains outliers, or when their relationship is monotonic but non-linear.

For reference, Kendall’s Tau is also frequently referred to by the following names: the Kendall rank correlation coefficient, or more specifically, Kendall’s tau-b.


Critical Assumptions Governing Kendall’s Tau Analysis

Every statistical procedure operates within a framework of assumptions. Although Kendall’s Tau is classified as a non-parametric measure, which grants it freedom from strict requirements like normal distribution, it is not entirely assumption-free. Ensuring that the underlying data structure satisfies its core requirements is essential for guaranteeing that the results derived from the statistical procedure are accurate, reliable, and scientifically interpretable. Failure to meet these assumptions can lead to invalid statistical inference regarding the population relationship.

The crucial assumptions that must be satisfied prior to running a Kendall’s Tau correlation involve the level of measurement for the variables and the general form of their relationship:

  1. The variables must be measurable on either a continuous or ordinal scale.
  2. The relationship between the variables must exhibit monotonicity.

These conditions define when Kendall’s Tau is an appropriate and defensible measure of association. Researchers should check their data against these criteria before proceeding with the analysis to confirm the technique’s suitability.

Data Measurement: Continuous or Ordinal Variables

The first and most critical assumption dictates the level of measurement for the variables under investigation. Both variables must be measured at least at the ordinal data level. An ordinal variable is characterized by categories that possess a meaningful, intrinsic order or sequence, such as levels of agreement (e.g., Strongly Agree, Neutral, Strongly Disagree), or socioeconomic groupings. The primary feature of ordinal data is that while we know the order, the intervals or distances between the categories are not assumed to be equal or known.
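Because the distances between ordinal categories are unknown, the numeric codes assigned to them should not matter as long as their order is preserved. Kendall’s Tau respects this: any strictly increasing recoding leaves every pair’s concordance status unchanged. The sketch below checks this on hypothetical survey responses; the two coding schemes and the `tau_b` helper are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def tau_b(x, y):
    """Kendall's tau-b with the standard tie correction in the denominator."""
    n = len(x)
    n0 = n * (n - 1) // 2
    n1 = sum(t * (t - 1) // 2 for t in Counter(x).values())
    n2 = sum(t * (t - 1) // 2 for t in Counter(y).values())
    c = d = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            c += 1
        elif s < 0:
            d += 1
    return (c - d) / sqrt((n0 - n1) * (n0 - n2))


# Two order-preserving codings of the same Likert scale; the gaps differ wildly.
coding_a = {"Strongly Disagree": 1, "Disagree": 2, "Neutral": 3,
            "Agree": 4, "Strongly Agree": 5}
coding_b = {"Strongly Disagree": -10, "Disagree": 0, "Neutral": 1,
            "Agree": 2, "Strongly Agree": 100}

# Hypothetical answers from six respondents to two survey items.
item_1 = ["Agree", "Neutral", "Strongly Agree", "Disagree", "Agree", "Strongly Disagree"]
item_2 = ["Strongly Agree", "Neutral", "Agree", "Disagree", "Neutral", "Disagree"]

tau_1 = tau_b([coding_a[r] for r in item_1], [coding_a[r] for r in item_2])
tau_2 = tau_b([coding_b[r] for r in item_1], [coding_b[r] for r in item_2])
print(tau_1 == tau_2)  # True: only the order of the codes matters
```

The same check with Pearson’s r would generally fail, since Pearson uses the numeric gaps between codes that ordinal data do not define.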

Data measured at the continuous level (interval or ratio scales) is also perfectly acceptable. Continuous variables can theoretically take on any value within a given range, such as exact reaction times, body mass index, or standardized test scores. Kendall’s Tau becomes an especially valuable alternative to Pearson’s correlation when dealing with continuous data, particularly when the data distribution is severely non-normal or highly influenced by extreme scores or outliers. This robustness against extreme values is rooted in its reliance on ranks rather than the absolute magnitude of the scores, which prevents a single extreme observation from unduly skewing the entire coefficient.

The Requirement of a Monotonic Relationship

The second core assumption demands that the relationship between your two variables must be monotonic. Monotonicity implies a consistent direction in the association between the variables, even if the relationship is not perfectly straight (linear). This consistency means that as the values of one variable generally increase, the values of the second variable must either consistently increase (a positive monotonic relationship) or consistently decrease (a negative monotonic relationship). It is this general consistency in direction that Kendall’s Tau is designed to quantify.

Visually, plotting the scores of the two variables on a scatterplot helps confirm this assumption. In a positive monotonic relationship, the data points would consistently trend upwards and to the right. While the path might curve or flatten slightly, the overall trajectory must be maintained. If the relationship were non-monotonic—for example, if a variable increased to a peak and then began to decrease (an inverted U-shape)—the fundamental assumption of consistent direction would be violated, rendering the resulting Tau coefficient unreliable as a summary measure of association.
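The inverted-U case can be demonstrated numerically. In the sketch below (made-up data; `tau_b` is a hypothetical helper), y is completely determined by x, yet Tau comes out as exactly zero because the concordant pairs on the rising half are cancelled by the discordant pairs on the falling half.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def tau_b(x, y):
    """Kendall's tau-b with the standard tie correction in the denominator."""
    n = len(x)
    n0 = n * (n - 1) // 2
    n1 = sum(t * (t - 1) // 2 for t in Counter(x).values())
    n2 = sum(t * (t - 1) // 2 for t in Counter(y).values())
    c = d = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            c += 1
        elif s < 0:
            d += 1
    return (c - d) / sqrt((n0 - n1) * (n0 - n2))


x = list(range(11))
y = [-(v - 5) ** 2 for v in x]   # rises to a peak at x = 5, then falls symmetrically

print(tau_b(x, y))  # 0.0: 25 concordant and 25 discordant pairs cancel out
```

A Tau near zero here does not mean the variables are unrelated; it means the relationship has no single direction, which is exactly why the monotonicity check should come first.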

Monotonicity means that as one variable increases, the other consistently tends to move in a single direction: either increasing (positive) or decreasing (negative).

Verifying monotonicity is a crucial preliminary step, typically executed through graphical inspection. If the data visually confirms this consistent directional trend, the use of Kendall’s Tau is justified for measuring the correlation strength. If the relationship appears complex or non-directional, alternative measures of association or non-linear modeling techniques should be explored.


Strategic Scenarios for Employing Kendall’s Tau

Choosing the correct statistical test is paramount in generating trustworthy research findings. Kendall’s Tau should be the statistical tool of choice when the research design places emphasis on the relationship between two specific variables, and when the characteristics of the data preclude the use of strict parametric tests like Pearson’s correlation. This often occurs when dealing with non-normally distributed data or data based purely on rankings, particularly in fields such as psychology, education, and market research.

Researchers should prioritize the use of Kendall’s Tau in the following three key scenarios, which summarize its unique advantages over other correlation methods:

  1. The research objective is explicitly to understand the nature and strength of the relationship (correlation) between two variables.
  2. The variables of interest are classified as ordinal or are continuous data severely compromised by outliers.
  3. The analysis involves only two paired variables, focusing solely on their bivariate relationship.

Adhering to these criteria ensures that the statistical method aligns precisely with the research question and maximizes the validity of the correlation estimate.

Focus on Correlation and Association

If your primary goal is to determine how two variables move together—that is, whether they are associated, and how strongly—then correlation analysis is necessary. Correlation measures association, which is conceptually distinct from other common analytical goals. For example, testing for a difference involves comparing central tendencies across groups, while prediction aims to forecast one variable’s value based on others. Kendall’s Tau is strictly an associational statistic. It seeks to answer the fundamental question: to what extent does knowing the rank of Variable A correspond with the rank of Variable B?

The resulting Tau value provides a direct, quantifiable estimate of this rank association, allowing researchers to formally test the null hypothesis that the variables are unrelated in the population. Because it is calculated from concordant and discordant pairs, it indicates how closely the relative orderings of the two measures agree, with its sign showing whether that agreement is positive or negative.

Data Type Specificity: Ordinality and Outlier Management

As previously established, the strength of Kendall’s Tau stems from its ability to handle ordinal data effectively. Data based on ordered categories, such as levels of pain or ranking political candidates, are perfectly suited for this measure. However, its utility is perhaps most critical in continuous data analysis where assumptions for parametric statistics are violated, specifically due to the presence of statistical outliers.

When continuous variables contain severe outliers, using Pearson correlation can result in a highly inflated or misleading coefficient, as Pearson’s formula is highly sensitive to the magnitude of these extreme scores. By converting the data to ranks, Kendall’s Tau effectively mitigates the disproportionate influence of these extremes, offering a more stable and reliable estimate of the underlying population relationship. This makes Tau an invaluable tool in datasets derived from real-world observations, where minor measurement errors or naturally occurring extreme values are common.
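The following sketch illustrates the point with made-up data: replacing the largest score with an extreme value changes Pearson’s coefficient substantially but leaves Kendall’s Tau untouched, because the rank order of the observations does not change. The helpers `pearson` and `kendall_tau` are hypothetical names written out for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson's r from the usual sums of centred cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def kendall_tau(x, y):
    """Tau-a; equal to tau-b here because neither variable has ties."""
    n = len(x)
    c = d = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            c += 1
        elif s < 0:
            d += 1
    return (c - d) / (n * (n - 1) / 2)


x = list(range(1, 11))
y = [2, 1, 4, 3, 6, 5, 8, 7, 9, 10]   # made-up, strongly increasing data
y_outlier = y[:-1] + [1000]           # last score becomes an extreme value

print(pearson(x, y), pearson(x, y_outlier))         # roughly 0.95, then roughly 0.53
print(kendall_tau(x, y) == kendall_tau(x, y_outlier))  # True: ranks unchanged
```

Here the single extreme value drags Pearson’s r down even though it sits exactly where the trend predicts in rank terms; in other configurations an outlier can just as easily inflate r.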

Choosing the appropriate correlation method is crucial. If both variables are continuous, approximately normal, and free of severe outliers, the recommended approach is Pearson Correlation. If one variable is continuous and the other is a simple dichotomy (binary), the Point-Biserial Correlation is the correct method. Finally, for two nominal or categorical variables (with no inherent order), measures such as the Phi Coefficient or Cramér’s V are more appropriate.

Limitation to Bivariate Analysis

It is fundamental to the definition of Kendall’s Tau that it is strictly designed for bivariate analysis—meaning it can only be used to evaluate the correlation between exactly two variables at one time. The coefficient summarizes the pairwise association. While researchers can calculate a matrix of multiple pairwise Tau coefficients for a larger dataset, the core statistical operation itself remains confined to assessing the relationship between a single pair (Variable X and Variable Y). Researchers needing to explore complex relationships involving three or more variables simultaneously (such as examining partial correlation, where the effect of a third variable is controlled for) would need to employ extensions or alternative multivariate statistical models.
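A matrix of pairwise coefficients is simply the bivariate calculation repeated for every pair of columns, as the sketch below shows. The variable names, the data, and the `tau_matrix` helper are hypothetical; each cell is still an ordinary two-variable Tau-b and does not control for the other variables.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def tau_b(x, y):
    """Kendall's tau-b with the standard tie correction in the denominator."""
    n = len(x)
    n0 = n * (n - 1) // 2
    n1 = sum(t * (t - 1) // 2 for t in Counter(x).values())
    n2 = sum(t * (t - 1) // 2 for t in Counter(y).values())
    c = d = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            c += 1
        elif s < 0:
            d += 1
    return (c - d) / sqrt((n0 - n1) * (n0 - n2))


def tau_matrix(columns):
    """Pairwise tau-b for every pair of named columns (a dict of equal-length lists)."""
    names = list(columns)
    return {(a, b): tau_b(columns[a], columns[b]) for a in names for b in names}


# Hypothetical data: three ordinal variables measured on the same five people.
data = {
    "pain": [3, 1, 4, 2, 5],
    "fatigue": [2, 1, 5, 3, 4],
    "mood": [4, 5, 1, 3, 2],
}
for (a, b), t in tau_matrix(data).items():
    print(f"{a:>8} vs {b:<8} {t:+.2f}")
```

The matrix is symmetric with ones on the diagonal; controlling for a third variable would require a partial correlation method rather than reading values off this table.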


Illustrative Example of Kendall’s Tau Application

To concretely understand the application of this measure, consider a research scenario investigating socioeconomic factors:

Variable 1: Average Hours Worked per Week.
Variable 2: Annual Personal Income.

The objective of this study is to examine the relationship between the dedication of time to work and the resulting financial remuneration across a defined population sample. We are interested in determining if individuals who report higher hours worked generally report higher income, thus establishing a positive monotonic relationship. Data is collected from a representative group of participants covering both variables.

In most realistic economic datasets, variables like “Annual Income” are notoriously non-normally distributed; they tend to be heavily skewed, often containing significant extreme outliers (the exceptionally high earners). Because of this typical violation of the normality assumption required for Pearson correlation, Kendall’s Tau is chosen as the appropriate, robust analytical technique. After confirming the monotonic relationship via visual inspection, we calculate the Tau coefficient using the ranks of the collected data points.

The output of the analysis yields two primary values essential for interpretation: the Tau coefficient itself and the associated p-value. The Tau value, ranging from -1 to 1, indicates the direction and strength of the relationship. A positive Tau (e.g., +0.35) suggests a weak-to-moderate, positive relationship, implying that individuals ranked higher in hours worked generally also rank higher in income. The magnitude of Tau reflects how far agreement in ranking outweighs disagreement across all possible pairs: in the absence of ties, it equals the proportion of concordant pairs minus the proportion of discordant pairs.
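When there are no ties, the proportions of concordant and discordant pairs sum to one and Tau is their difference, so a reported Tau can be translated back into pair shares. A minimal sketch, with the hypothetical helper `pair_shares`:

```python
def pair_shares(tau):
    """Split a tie-free Kendall's tau into shares of concordant and discordant pairs.

    Assumes no ties, so P(concordant) + P(discordant) = 1
    and tau = P(concordant) - P(discordant).
    """
    if not -1.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [-1, 1]")
    concordant = (1 + tau) / 2
    return concordant, 1 - concordant


c, d = pair_shares(0.35)
print(f"concordant: {c:.3f}, discordant: {d:.3f}")  # concordant: 0.675, discordant: 0.325
```

For Tau = +0.35, about 67.5% of pairs are ranked in the same order on both variables and 32.5% in opposite orders.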

Crucially, the p-value determines the statistical significance of this observed correlation. The p-value is the probability of observing a correlation coefficient at least as extreme as the one calculated if, in reality, there were no association between hours worked and income in the population. Following conventional statistical standards, if the p-value is less than or equal to the significance level (commonly set at $\alpha = 0.05$), we reject the null hypothesis and conclude that the result is statistically significant, meaning the observed association is unlikely to be due to random chance alone.
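One common way to obtain a p-value is the large-sample normal approximation to the null distribution of S = C - D, whose variance under no association (and no ties) is n(n - 1)(2n + 5)/18. The sketch below applies it to made-up hours-and-income values; the function name `kendall_test` and the data are assumptions for illustration. Statistical packages may instead use an exact null distribution for small samples or a tie-corrected variance, so their p-values can differ from this approximation, which is rough for a sample as small as ten.

```python
from itertools import combinations
from math import erfc, sqrt


def kendall_test(x, y):
    """Tau-a, z statistic and two-sided p-value via the normal approximation.

    Assumes no ties; the null hypothesis is tau = 0 (no association).
    """
    n = len(x)
    s = 0                                    # S = C - D
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)         # +1 concordant, -1 discordant
    var_s = n * (n - 1) * (2 * n + 5) / 18   # variance of S under H0 (no ties)
    z = s / sqrt(var_s)
    p = erfc(abs(z) / sqrt(2))               # P(|Z| >= |z|) for a standard normal
    tau = s / (n * (n - 1) / 2)
    return tau, z, p


# Made-up values for ten participants (hours per week, income in thousands).
hours = [20, 25, 30, 35, 38, 40, 42, 45, 50, 55]
income = [18, 30, 26, 41, 39, 52, 47, 60, 58, 95]

tau, z, p = kendall_test(hours, income)
alpha = 0.05
print(f"tau = {tau:.3f}, z = {z:.2f}, p = {p:.4f}")
print("reject H0" if p <= alpha else "fail to reject H0")
```

The decision rule in the last line mirrors the convention described above: compare the p-value with the chosen significance level and reject the null hypothesis only when p is at or below it.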
