Mixtures of discrete and continuous variables: considerations for dimension reduction

Pleis, John (2018) Mixtures of discrete and continuous variables: considerations for dimension reduction. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

PDF (Submitted Version, 14MB)
Restricted to University of Pittsburgh users only until August 2019.

Abstract

In this dissertation, we examine mixtures of different types of data, the analytic challenges that such data can present, and some approaches for addressing those challenges. Specifically, we consider mixtures of continuous and discrete data. For the theoretical developments, we focus on the general location model (GLOM)-based methodology, which derives the joint probability distribution of continuous and discrete random variables as the product of conditional and marginal probability distributions. As we show, the general specification of this joint distribution is a finite mixture of Gaussian distributions. We consider both the univariate and multivariate cases. For the univariate case we first determine the distribution of the sample variance, and for the multivariate case we first determine the distribution of the sample covariance matrix. When the component distributions of the mixture have different variances (univariate) or covariance matrices (multivariate), analysis becomes more challenging. In such cases, we propose approximating the mixture density with a non-mixture density from the same parametric family (e.g., multivariate Gaussian). Finally, we present some extensions of this work to the field of dimension reduction.
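To make the idea of replacing a mixture density with a single non-mixture density concrete, the following is a minimal univariate sketch. It assumes moment matching (equating the mean and variance of the mixture and the single Gaussian) as the approximation rule; the abstract does not state which approximation the dissertation uses, so this is an illustration only. The function names, weights, means, and standard deviations are hypothetical.

```python
import math

def mixture_pdf(x, weights, means, sds):
    """Density of a finite Gaussian mixture at x.

    In a GLOM-style setup, weights[k] plays the role of the marginal
    probability of discrete cell k, and (means[k], sds[k]) describe the
    conditional Gaussian of the continuous variable given that cell.
    """
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        for w, m, s in zip(weights, means, sds)
    )

def moment_matched_gaussian(weights, means, sds):
    """Mean and variance of the single Gaussian matching the mixture's first two moments.

    Assumption: moment matching is used here for illustration; it is not
    necessarily the approximation proposed in the dissertation.
    """
    mean = sum(w * m for w, m in zip(weights, means))
    second_moment = sum(w * (s ** 2 + m ** 2) for w, m, s in zip(weights, means, sds))
    return mean, second_moment - mean ** 2

# Hypothetical two-cell example with unequal component variances.
weights = [0.5, 0.5]
means = [0.0, 2.0]
sds = [1.0, 2.0]

approx_mean, approx_var = moment_matched_gaussian(weights, means, sds)
print(approx_mean, approx_var)
```

The matched variance combines the within-component variances with the spread of the component means, which is why unequal component variances complicate any analysis built on a single-Gaussian assumption.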

Public Health Significance: Mixtures of continuous and discrete variables are fairly common in public health settings (e.g., genetics, health services research), but statistical methods for analyzing such data are far less developed and robust than methods for a single type of data (e.g., continuous). The methods developed in this dissertation could be used to extend inferential approaches to non-normal data, which are commonly seen in public health settings. For example, hypothesis testing of the proportionate contribution of eigenvalues could be adapted to mixtures of different types of data, and these methods could possibly be extended to high-dimensional data (e.g., genetics) by examining mixtures of singular Wishart distributions.


Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors:
Creators | Email | Pitt Username | ORCID
Pleis, John | pleis.jr@gmail.com | jrp85 |
ETD Committee:
Title | Member | Email Address | Pitt Username | ORCID
Committee Chair | Anderson, Stewart | sja@pitt.edu | sja |
Committee Member | Chang, Chung-Chou Ho | changj@pitt.edu | |
Committee Member | Jung, Sungkyu | sungkyu@pitt.edu | |
Committee Member | Kang, Chaeryon | crkang@pitt.edu | |
Date: 26 September 2018
Date Type: Publication
Defense Date: 30 May 2018
Approval Date: 26 September 2018
Submission Date: 24 July 2018
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 144
Institution: University of Pittsburgh
Schools and Programs: Graduate School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: mixture distributions; Wishart
Date Deposited: 26 Sep 2018 14:07
Last Modified: 26 Sep 2018 14:07
URI: http://d-scholarship.pitt.edu/id/eprint/35090
