Mixtures of discrete and continuous variables: considerations for dimension reduction

Pleis, John (2018) Mixtures of discrete and continuous variables: considerations for dimension reduction. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

PDF, Submitted Version (14MB)

Abstract

In this dissertation, we examine mixtures of different types of data, the analytic challenges that such data can present, and some approaches for addressing those challenges. Specifically, we consider mixtures of continuous and discrete data. For the theoretical developments that follow, we focus on the general location model (GLOM)-based methodology, which derives the joint probability distribution of continuous and discrete random variables as the product of conditional and marginal probability distributions. As we will show, the general specification of this joint distribution is a finite mixture of Gaussian distributions. We consider both the univariate and multivariate cases. For the univariate case we first determine the distribution of the sample variance, and for the multivariate case we first determine the distribution of the sample covariance matrix. When the component distributions of the mixture have different variances (univariate) or covariance matrices (multivariate), analysis becomes more challenging. In such cases, we propose approximating the mixture density with a non-mixture density from the same parametric family (e.g., multivariate Gaussian). Finally, we present some extensions of this work to the field of dimension reduction.
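
As a rough illustration of the approximation idea in the univariate case, the sketch below replaces a finite Gaussian mixture with the single Gaussian that has the same mean and variance. The abstract does not state which approximation criterion the dissertation uses, so moment matching is an assumption made here for illustration only, and the function name, weights, means, and variances are all hypothetical.

```python
from math import isclose


def moment_matched_gaussian(weights, means, variances):
    """Return (mean, variance) of the single Gaussian whose first two
    moments match those of a univariate Gaussian mixture.

    Assumption: moment matching stands in for the dissertation's
    (unstated) approximation criterion.
    """
    if not (len(weights) == len(means) == len(variances)):
        raise ValueError("weights, means and variances must have equal length")
    if not isclose(sum(weights), 1.0):
        raise ValueError("mixture weights must sum to 1")

    # Overall mean: weighted average of the component means.
    mean = sum(p * m for p, m in zip(weights, means))

    # Law of total variance: within-component plus between-component spread.
    within = sum(p * v for p, v in zip(weights, variances))
    between = sum(p * (m - mean) ** 2 for p, m in zip(weights, means))
    return mean, within + between


# Hypothetical example: a binary discrete variable W with P(W=0)=0.3 and
# P(W=1)=0.7, where the continuous variable is Gaussian within each level
# of W and the two levels have unequal variances.
weights = [0.3, 0.7]
means = [0.0, 2.0]
variances = [1.0, 4.0]

approx_mean, approx_var = moment_matched_gaussian(weights, means, variances)
print(approx_mean, approx_var)
```

The between-component term shows why unequal component means inflate the variance of the approximating density beyond the weighted average of the component variances.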

Public Health Significance: Mixtures of continuous and discrete variables are fairly common in public health settings (e.g., genetics, health services research), but statistical methods for analyzing such data are not nearly as well developed or robust as methods for a single type of data (e.g., continuous). The methods developed in this dissertation could be used to extend inferential approaches to the non-normal data commonly seen in public health settings. For example, hypothesis tests of the proportionate contribution of eigenvalues could be adapted to mixtures of different types of data, and these methods could possibly be extended to high-dimensional data (e.g., genetics) by examining mixtures of singular Wishart distributions.


Details

Item Type:

University of Pittsburgh ETD

Status:

Unpublished

Creators/Authors:

Creators	Email	Pitt Username	ORCID
Pleis, John	pleis.jr@gmail.com	jrp85

ETD Committee:

Title	Member	Email Address	Pitt Username
Committee Chair	Anderson, Stewart	sja@pitt.edu	sja
Committee Member	Chang, Chung-Chou Ho	changj@pitt.edu
Committee Member	Jung, Sungkyu	sungkyu@pitt.edu
Committee Member	Kang, Chaeryon	crkang@pitt.edu

Date:

26 September 2018

Date Type:

Publication

Defense Date:

30 May 2018

Approval Date:

26 September 2018

Submission Date:

24 July 2018

Access Restriction:

1 year -- Restrict access to University of Pittsburgh for a period of 1 year.

Number of Pages:

144

Institution:

University of Pittsburgh

Schools and Programs:

School of Public Health > Biostatistics

Degree:

PhD - Doctor of Philosophy

Thesis Type:

Doctoral Dissertation

Refereed:

Yes

Uncontrolled Keywords:

mixture distributions; Wishart

Date Deposited:

26 Sep 2018 14:07

Last Modified:

01 Sep 2019 05:15

URI:

http://d-scholarship.pitt.edu/id/eprint/35090
