Chok, Nian Shong (2010) *Pearson's Versus Spearman's and Kendall's Correlation Coefficients for Continuous Data.* Master's Thesis, University of Pittsburgh.


## Abstract

The association between two variables is often of interest in data analysis and methodological research. Pearson's, Spearman's and Kendall's correlation coefficients are the most commonly used measures of monotone association, with the latter two usually suggested for non-normally distributed data. These three correlation coefficients can be represented as differently weighted averages of the same concordance indicators. The weighting used in Pearson's correlation coefficient could be preferable for reflecting monotone association in some types of continuous and not necessarily bivariate normal data.

In this work, I investigate the intrinsic ability of Pearson's, Spearman's and Kendall's correlation coefficients to affect the statistical power of tests for monotone association in continuous data. This investigation is important in many fields including Public Health, since it can lead to guidelines that help save health research resources by reducing the number of inconclusive studies and enabling the design of powerful studies with smaller sample sizes.

The statistical power can be affected by both the structure of the employed correlation coefficient and the type of test statistic. Hence, I standardize the comparison of the intrinsic properties of the correlation coefficients by using a permutation test that is applicable to all of them. In the simulation study, I consider four types of continuous bivariate distributions composed of pairs of normal, log-normal, double exponential and t distributions. These distributions enable modeling scenarios with different degrees of violation of normality with respect to skewness and kurtosis.

As a result of the simulation study, I demonstrate that Pearson's correlation coefficient could offer a substantial improvement in statistical power even for distributions with moderate skewness or excess kurtosis. Nonetheless, because of its known sensitivity to outliers, Pearson's correlation leads to a less powerful statistical test for distributions with extreme skewness or excess kurtosis (where datasets with outliers are more likely). In conclusion, the results of my investigation indicate that Pearson's correlation coefficient could have significant advantages for continuous non-normal data that do not have obvious outliers. Thus, the shape of the distribution should not be the sole reason for not using the Pearson product moment correlation coefficient.
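The abstract describes putting the three coefficients on an equal footing by testing each with the same permutation test. The following is a minimal sketch of that idea, not the thesis's own code: the function name `permutation_pvalue`, the `COEFFICIENTS` mapping, the two-sided alternative, and the use of randomly sampled (Monte Carlo) permutations are all assumptions made for illustration, since the abstract does not specify those details.

```python
import numpy as np
from scipy import stats


def permutation_pvalue(x, y, statistic, n_perm=2000, rng=None):
    """Two-sided Monte Carlo permutation p-value for 'no association'.

    Hypothetical helper for illustration. `statistic` is any function
    (x, y) -> correlation coefficient. Shuffling y breaks any pairing
    with x, so the shuffled coefficients approximate the null distribution.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = abs(statistic(x, y))
    exceed = 0
    for _ in range(n_perm):
        if abs(statistic(x, rng.permutation(y))) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


# The same test is applied to each coefficient; only the statistic changes.
COEFFICIENTS = {
    "pearson": lambda x, y: np.corrcoef(x, y)[0, 1],
    # Spearman's coefficient is Pearson's coefficient computed on ranks.
    "spearman": lambda x, y: np.corrcoef(stats.rankdata(x), stats.rankdata(y))[0, 1],
    "kendall": lambda x, y: stats.kendalltau(x, y)[0],
}


if __name__ == "__main__":
    # Illustrative data only: a skewed (log-normal) pair with shared signal.
    gen = np.random.default_rng(2010)
    z = gen.normal(size=30)
    x = np.exp(z + 0.8 * gen.normal(size=30))
    y = np.exp(z + 0.8 * gen.normal(size=30))
    for name, stat in COEFFICIENTS.items():
        print(name, permutation_pvalue(x, y, stat, n_perm=500, rng=1))
```

Repeating this over many simulated datasets and recording how often each p-value falls below the chosen significance level would give an empirical power estimate for each coefficient, which is the kind of comparison the abstract refers to.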


## Details

| Field | Value |
|---|---|
| Item Type | University of Pittsburgh ETD |
| Title | Pearson's Versus Spearman's and Kendall's Correlation Coefficients for Continuous Data |
| Status | Unpublished |
| Date | 24 September 2010 |
| Date Type | Completion |
| Defense Date | 26 May 2010 |
| Approval Date | 24 September 2010 |
| Submission Date | 09 June 2010 |
| Access Restriction | No restriction; Release the ETD for access worldwide immediately. |
| Patent pending | No |
| Institution | University of Pittsburgh |
| Thesis Type | Master's Thesis |
| Refereed | Yes |
| Degree | MS - Master of Science |
| URN | etd-06092010-123415 |
| Uncontrolled Keywords | Pearson product moment correlation coefficient |
| Schools and Programs | Graduate School of Public Health > Biostatistics |
| Date Deposited | 10 Nov 2011 14:46 |
| Last Modified | 13 Jun 2012 15:11 |
| Other ID | http://etd.library.pitt.edu/ETD/available/etd-06092010-123415/, etd-06092010-123415 |
