Utilization of Basic Demographic Data and Area-Level Data in Non-Small Cell Lung Cancer Stage Classification

Picone, Celeste (2024) Utilization of Basic Demographic Data and Area-Level Data in Non-Small Cell Lung Cancer Stage Classification. Master's Thesis, University of Pittsburgh. (Unpublished)



Background: The objective of this thesis is to utilize classification and regression trees (CART) to identify interactions and assess the predictive ability of basic demographic measures along with area-level measures in non-small cell lung cancer (NSCLC) staging (early vs late).
Data: Individual-level demographics (age, sex, race, insurance coverage, census tract) and lung cancer data (stage, subtype) were obtained from the Pennsylvania Cancer Registry. Census tract area-level measures (neighborhood deprivation index, radon readings, PM2.5 readings, greenspace area, total air cancer risk) were obtained from the U.S. Census Bureau and the U.S. Environmental Protection Agency.
Methods: We employed CART decision tree algorithms to analyze the ability of limited individual-level data with area-level data to predict the stage of NSCLC diagnoses in Allegheny County, Pennsylvania from 2015 to 2019.
Results: The CART algorithm identified seven of the nine original predictors as important in classifying NSCLC stage. Of the seven, three were individual-level demographic indicators (primary payer, race, and sex) and four were area-level indicators (radon levels, PM2.5 levels, neighborhood deprivation index, and greenspace). These indicators showed poor accuracy in predicting whether a patient was diagnosed with early- or late-stage NSCLC (area under the curve [AUC] < 0.60).
Public Health Significance: This thesis highlights the importance of high-quality, in-depth, individual-level patient data in cancer analysis and modeling. While area-level indicators of health are important in the prognosis of cancer, these factors alone are not enough to accurately predict a patient’s cancer stage. Cancer registries should strive to collect individual-level data above and beyond basic demographics.
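
Illustrative sketch: The abstract does not state the software or settings used in the thesis, so the following is only a minimal, assumption-laden illustration of the two ideas it names: a CART-style binary split chosen by Gini impurity, and the AUC used to judge early- vs late-stage discrimination. The function names (gini, best_split, auc) and the toy values are hypothetical and are not drawn from the Pennsylvania Cancer Registry or from the thesis.

```python
from itertools import product


def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 = early stage, 1 = late stage)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(values, labels):
    """Find the threshold t (split rule: x <= t) with the lowest weighted Gini impurity.

    Returns (None, parent_impurity) if no split improves on the unsplit node.
    """
    best = (None, gini(labels))
    for t in sorted(set(values))[:-1]:  # the largest value cannot split anything off
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


def auc(scores, labels):
    """AUC as the probability that a random positive case outscores a random negative one.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: one numeric area-level measure and a late-stage flag.
values = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 1, 0, 1, 1, 0, 1, 1]

threshold, impurity = best_split(values, labels)

# Score each case by the late-stage fraction in the leaf it falls into.
left_labels = [y for x, y in zip(values, labels) if x <= threshold]
right_labels = [y for x, y in zip(values, labels) if x > threshold]
left_rate = sum(left_labels) / len(left_labels)
right_rate = sum(right_labels) / len(right_labels)
leaf_scores = [left_rate if x <= threshold else right_rate for x in values]

toy_auc = auc(leaf_scores, labels)
print(threshold, round(impurity, 3), round(toy_auc, 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why a value below 0.60 is described above as poor accuracy. A full CART fit would apply the split search recursively across all predictors and prune the resulting tree; this sketch shows only a single split on one variable, and the AUC here is computed on the same toy cases used to choose the split.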




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Picone, Celeste (email: cmp158@pitt.edu; Pitt username: cmp158)
ETD Committee:
Committee Chair: Buchanich, Jeanine (email: jeanine@pitt.edu; Pitt username: jeanine)
Committee Member: Carlson, Jenna (email: jnc35@pitt.edu; Pitt username: jnc35)
Committee Member: Tipre, Meghan (email: MET169@pitt.edu; Pitt username: MET169)
Date: 16 May 2024
Date Type: Publication
Defense Date: 17 April 2024
Approval Date: 16 May 2024
Submission Date: 17 April 2024
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 49
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: MS - Master of Science
Thesis Type: Master's Thesis
Refereed: Yes
Uncontrolled Keywords: CART, machine learning, lung cancer, logistic regression, demographic, area-level, census tract
Date Deposited: 16 May 2024 20:22
Last Modified: 16 May 2024 20:22

