Pearson's Versus Spearman's and Kendall's Correlation Coefficients for Continuous Data

Chok, Nian Shong (2010) Pearson's Versus Spearman's and Kendall's Correlation Coefficients for Continuous Data. Master's Thesis, University of Pittsburgh.

    Abstract

    The association between two variables is often of interest in data analysis and methodological research. Pearson's, Spearman's and Kendall's correlation coefficients are the most commonly used measures of monotone association, with the latter two usually suggested for non-normally distributed data. These three correlation coefficients can be represented as differently weighted averages of the same concordance indicators. The weighting used in Pearson's correlation coefficient could be preferable for reflecting monotone association in some types of continuous and not necessarily bivariate normal data.

    In this work, I investigate the intrinsic ability of Pearson's, Spearman's and Kendall's correlation coefficients to affect the statistical power of tests for monotone association in continuous data. This investigation is important in many fields including Public Health, since it can lead to guidelines that help save health research resources by reducing the number of inconclusive studies and enabling the design of powerful studies with smaller sample sizes.

    The statistical power can be affected by both the structure of the employed correlation coefficient and the type of test statistic. Hence, I standardize the comparison of the intrinsic properties of the correlation coefficients by using a permutation test that is applicable to all of them. In the simulation study, I consider four types of continuous bivariate distributions composed of pairs of normal, log-normal, double exponential and t distributions. These distributions enable modeling of scenarios with different degrees of violation of normality with respect to skewness and kurtosis.

    As a result of the simulation study, I demonstrate that Pearson's correlation coefficient could offer a substantial improvement in statistical power even for distributions with moderate skewness or excess kurtosis. Nonetheless, because of its known sensitivity to outliers, Pearson's correlation leads to a less powerful statistical test for distributions with extreme skewness or excess kurtosis (where datasets with outliers are more likely). In conclusion, the results of my investigation indicate that Pearson's correlation coefficient could have significant advantages for continuous non-normal data that do not have obvious outliers. Thus, the shape of the distribution should not be the sole reason for not using the Pearson product moment correlation coefficient.
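
The two technical ideas in the abstract, a shared pairwise form for the three coefficients and a single permutation test applied to all of them, can be illustrated with a minimal Python sketch. It is not the thesis's code. The pairwise-score form below is one standard way to write the three coefficients and may differ from the exact weighting used in the thesis. The function names (`pairwise_corr`, `permutation_pvalue`), the two-sided p-value, and the defaults `n_perm=500` and `seed=0` are assumptions made for illustration. Ties are not handled, on the assumption that the data are continuous.

```python
import math
import random


def ranks(values):
    """Ranks 1..n; assumes continuous data, so ties are not averaged."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = float(rank)
    return out


def sign(d):
    """Return -1, 0 or 1 according to the sign of d."""
    return (d > 0) - (d < 0)


def pairwise_corr(x, y, score):
    """Correlation built from scores of pairwise differences.

    Each pair (i, j) contributes score(x_i - x_j) * score(y_i - y_j),
    which is positive for concordant pairs and negative for discordant ones.
    """
    num = sx = sy = 0.0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            a, b = score(x[i] - x[j]), score(y[i] - y[j])
            num += a * b
            sx += a * a
            sy += b * b
    return num / math.sqrt(sx * sy)


def pearson(x, y):
    # Raw differences: pairs are weighted by how far apart the values are.
    return pairwise_corr(x, y, lambda d: d)


def spearman(x, y):
    # Same form as Pearson's, applied to ranks instead of raw values.
    return pairwise_corr(ranks(x), ranks(y), lambda d: d)


def kendall(x, y):
    # Signs only: every pair gets the same weight.
    return pairwise_corr(x, y, sign)


def permutation_pvalue(x, y, stat, n_perm=500, seed=0):
    """Two-sided Monte Carlo permutation p-value for any statistic stat(x, y)."""
    rng = random.Random(seed)
    observed = abs(stat(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any association between x and y
        if abs(stat(x, y_perm)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Calling `permutation_pvalue(x, y, pearson)`, `permutation_pvalue(x, y, spearman)` and `permutation_pvalue(x, y, kendall)` on the same sample runs the same test procedure for all three coefficients, so any difference in rejection rates over repeated simulated samples comes from the coefficients themselves rather than from the choice of test.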


    Details

    Item Type: University of Pittsburgh ETD
    ETD Committee:
    Committee Type     Committee Member     Email
    Committee Chair    Bandos, Andriy       anb61@pitt.edu
    Committee Member   Vuga, Marike         vugam@edc.pitt.edu
    Committee Member   Anderson, Stewart    sja@nsabp.pitt.edu
    Title: Pearson's Versus Spearman's and Kendall's Correlation Coefficients for Continuous Data
    Status: Unpublished
    Date: 24 September 2010
    Date Type: Completion
    Defense Date: 26 May 2010
    Approval Date: 24 September 2010
    Submission Date: 09 June 2010
    Access Restriction: No restriction; The work is available for access worldwide immediately.
    Patent pending: No
    Institution: University of Pittsburgh
    Thesis Type: Master's Thesis
    Refereed: Yes
    Degree: MS - Master of Science
    URN: etd-06092010-123415
    Uncontrolled Keywords: Pearson product moment correlation coefficient
    Schools and Programs: Graduate School of Public Health > Biostatistics
    Date Deposited: 10 Nov 2011 14:46
    Last Modified: 13 Jun 2012 15:11
    Other ID: http://etd.library.pitt.edu/ETD/available/etd-06092010-123415/, etd-06092010-123415
