Treelet Dimension Reduction of Diagnoses Among Intensive Care Unit Admissions

DiSanto, James (2021) Treelet Dimension Reduction of Diagnoses Among Intensive Care Unit Admissions. Master's Thesis, University of Pittsburgh. (Unpublished)

Submitted Version


Background: The objective of this thesis is to transform a large number of ICD-9-CM
diagnosis codes into a reduced number of variables, using treelet dimension reduction, and use this
resulting transformation in the prediction of clinical outcomes of in-hospital mortality, unplanned
re-admission, and hospital length of stay.

Data: We used International Classification of Diseases, 9th Revision, Clinical Modification
(ICD-9-CM) codes and patient demographic data (age, sex, insurance coverage) from the Medical
Information Mart for Intensive Care III (MIMIC-III) database, a prospective cohort study of 38,554
adults admitted to a single intensive care unit from 2001 to 2012.

Methods: We applied treelet dimension reduction to ICD-9-CM diagnosis codes (n=178,
>1% prevalence in the analytic cohort) to identify a transformed feature space of patient diagnoses
that we then used, with patient demographic data, to predict in-hospital mortality, unplanned
hospital re-admission, and length of hospital stay using logistic and negative binomial regression.
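
As an illustration of the treelet step described in the Methods, the following is a minimal Python sketch of the general treelet algorithm: at each level, the two most correlated active variables are merged by a local PCA (Jacobi) rotation, and only the higher-variance "sum" variable stays active. It is not the thesis's implementation. The function name `treelet_basis`, the `levels` argument, and the use of absolute correlation as the similarity measure are assumptions made for this example.

```python
import numpy as np

def treelet_basis(X, levels):
    """Hypothetical helper: return an orthonormal treelet basis and the
    indices of the variables still active after `levels` merges."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    levels = min(levels, p - 1)          # at most p - 1 merges are possible
    C = np.cov(X, rowvar=False)          # running covariance matrix
    B = np.eye(p)                        # running orthonormal basis
    active = list(range(p))              # indices of current "sum" variables

    for _ in range(levels):
        # Absolute correlation computed from the current covariance.
        d = np.sqrt(np.clip(np.diag(C), 1e-12, None))
        R = np.abs(C / np.outer(d, d))

        # Find the most correlated pair among the active variables.
        best, pair = -1.0, None
        for pos, i in enumerate(active):
            for j in active[pos + 1:]:
                if R[i, j] > best:
                    best, pair = R[i, j], (i, j)
        a, b = pair

        # Jacobi rotation angle that zeroes the covariance between a and b.
        theta = 0.5 * np.arctan2(2.0 * C[a, b], C[a, a] - C[b, b])
        c, s = np.cos(theta), np.sin(theta)
        J = np.eye(p)
        J[a, a], J[a, b], J[b, a], J[b, b] = c, -s, s, c

        C = J.T @ C @ J                  # covariance in the rotated coordinates
        B = B @ J                        # accumulate the rotation in the basis

        # Keep the higher-variance (sum) variable, retire the other.
        drop = b if C[a, a] >= C[b, b] else a
        active.remove(drop)

    return B, active
```

With a binary matrix of diagnosis indicators `X`, one possible reduced feature set is `X @ B[:, active]`. How the thesis chose the level and the number of retained features is not stated in this abstract.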

Results: Treelet dimension reduction for ICD-9-CM diagnosis codes identified reduced
feature spaces in prediction of in-hospital mortality, unplanned hospital re-admission, and length
of stay. The resulting treelet features for each clinical outcome, in addition to patient age, gender,
and payment method, demonstrate improved utility in predicting in-hospital mortality
(AUC=0.858) but limited accuracy in predicting hospital re-admission (AUC=0.661). Treelet
dimension reduction identifies a sparse number of ICD-9-CM diagnosis codes (107 of 178)
retained in the treelet features included in modelling of length of stay (RMSE=10.29).
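
For readers unfamiliar with the two metrics reported above, here is a minimal Python sketch of how AUC and RMSE are conventionally defined. The function names `auc` and `rmse` are hypothetical, and this is not the code used in the thesis.

```python
import numpy as np

def auc(y_true, scores):
    """Area under the ROC curve as the share of (positive, negative) pairs
    in which the positive case has the higher score; ties count one half."""
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y], s[~y]
    diff = pos[:, None] - neg[None, :]   # every positive vs. every negative
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / diff.size)

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, while RMSE is expressed in the units of the outcome being predicted.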

Public Health Significance: These analyses leverage a large, public database of critical
care admissions, generating predictive models of clinical outcomes using only patient
demographic and comorbidity diagnosis information. The presented analysis builds upon previous
work by applying the novel treelet dimension reduction model to diagnosis data in a dataset of
critical care admissions and demonstrates the utility of diagnosis code data alone in prediction of
clinical outcomes.



Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators | Email | Pitt Username | ORCID
DiSanto, James | jdd65@pitt.edu | jdd65 | 0000-0002-8815-2169
ETD Committee:
Title | Member | Email Address | Pitt Username | ORCID
Thesis Advisor | Buchanich, Jeanine | jeanine@pitt.edu | jeanine |
Committee Member | Carlson, Jenna | jnc315@pitt.edu | jnc315 |
Committee Member | Youk, Ada | ayouk@pitt.edu | ayouk |
Committee Member | Landsittel, Douglas | dpl12@pitt.edu | dpl12 |
Date: 19 January 2021
Date Type: Publication
Defense Date: 7 December 2020
Approval Date: 19 January 2021
Submission Date: 30 November 2020
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 153
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: MS - Master of Science
Thesis Type: Master's Thesis
Refereed: Yes
Uncontrolled Keywords: treelet, dimension reduction, diagnosis codes, generalized linear models
Date Deposited: 19 Jan 2021 19:46
Last Modified: 19 Jan 2022 06:15

