Picone, Celeste (2024) Utilization of Basic Demographic Data and Area-Level Data in Non-Small Cell Lung Cancer Stage Classification. Master's Thesis, University of Pittsburgh. (Unpublished)
Abstract
Background: This thesis uses classification and regression trees (CART) to identify interactions among basic demographic and area-level measures and to assess how well they predict non-small cell lung cancer (NSCLC) stage at diagnosis (early vs. late).
Data: Individual-level demographics (age, sex, race, insurance coverage, census tract) and lung cancer data (stage, subtype) were obtained from the Pennsylvania Cancer Registry. Census tract area-level measures (neighborhood deprivation index, radon readings, PM2.5 readings, greenspace area, total air cancer risk) were obtained from the U.S. Census Bureau and the U.S. Environmental Protection Agency.
Methods: We used CART decision tree algorithms to assess how well limited individual-level data, combined with area-level data, predict the stage of NSCLC diagnoses in Allegheny County, Pennsylvania, from 2015 to 2019.
Results: The CART algorithm identified seven of the nine original predictors as important in classifying NSCLC stage. Of the seven, three were individual-level demographic indicators (primary payer, race, and sex) and four were area-level indicators (radon levels, PM2.5 levels, neighborhood deprivation index, and greenspace). These indicators showed poor accuracy in predicting whether a patient was diagnosed with early- or late-stage NSCLC (area under the curve [AUC] < 0.60).
Public Health Significance: This thesis highlights the importance of high-quality, in-depth, individual-level patient data in cancer analysis and modeling. While area-level indicators of health are important in the prognosis of cancer, these factors alone are not enough to accurately predict a patient’s cancer stage. Cancer registries should strive to collect individual-level data beyond basic demographics.
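For readers unfamiliar with the two technical ideas named in the abstract, the following is a minimal illustrative sketch, not the thesis's implementation. It shows the core step CART repeats at each node (choosing the threshold that minimizes weighted Gini impurity) and one standard way to compute AUC (the fraction of late/early case pairs that a score ranks correctly). The function names and any data passed to them are hypothetical and do not come from the registry data described above.

```python
from typing import Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of binary labels (0 = early stage, 1 = late stage)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, weighted impurity) for the best split `value <= threshold`.

    This is the single-feature search CART performs at one node; a full tree
    repeats it over all features and recurses into the resulting child nodes.
    If no split improves on the unsplit node, the threshold is NaN.
    """
    n = len(labels)
    best = (float("nan"), gini(labels))
    # Splitting at the maximum value would leave the right side empty, so skip it.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Share of (late, early) pairs in which the late case has the higher score.

    Ties count as half a win. A value of 0.5 means the score ranks cases no
    better than chance; 1.0 means perfect separation.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each class")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Under this definition, the reported AUC below 0.60 means that a randomly chosen late-stage case received a higher predicted risk than a randomly chosen early-stage case less than 60% of the time, only modestly better than the 50% expected by chance.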
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Date: 16 May 2024
Date Type: Publication
Defense Date: 17 April 2024
Approval Date: 16 May 2024
Submission Date: 17 April 2024
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 49
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: MS - Master of Science
Thesis Type: Master's Thesis
Refereed: Yes
Uncontrolled Keywords: CART, machine learning, lung cancer, logistic regression, demographic, area-level, census tract
Date Deposited: 16 May 2024 20:22
Last Modified: 16 May 2024 20:22
URI: http://d-scholarship.pitt.edu/id/eprint/46150