Prediction of Preterm Birth in Southwestern PA using Classification Models: A Comparative Analysis

Pudasainy, Sabnum (2022) Prediction of Preterm Birth in Southwestern PA using Classification Models: A Comparative Analysis. Master's Thesis, University of Pittsburgh. (Unpublished)
Abstract

Background: Preterm birth is a global health burden and a leading cause of neonatal mortality and morbidity. This study aims to compare binary classification models that predict preterm birth and to identify the clinical, demographic, and environmental risk factors associated with it.

Methods: Data from 221,060 infants born between 2010 and 2020 to mothers residing in eight southwestern Pennsylvania counties (Allegheny, Armstrong, Beaver, Butler, Fayette, Greene, Washington, Westmoreland) were used. Covariates included the clinical and demographic features of the mother and the neonate, together with the mother's mean exposure to five air pollutants (carbon monoxide (CO), nitrogen dioxide (NO2), particulate matter (PM2.5), ozone (O3), and sulfur dioxide (SO2)) in her geocoded area of residence during gestation. Exploratory data analysis, including an Empirical Bayes approach, was conducted to better understand the covariates and the outcome (preterm birth). Three supervised machine learning techniques, Elastic Net (GLMNET), Support Vector Machine (SVM), and Random Forest, were then used to build prediction models, which were compared on performance metrics such as area under the curve (AUC), sensitivity, and specificity.

Results: Empirical Bayes estimates associated fewer prenatal visits (0-10) and residence in Allegheny County with a higher posterior mean probability of preterm birth. Among the three algorithms, Random Forest appeared to outperform GLMNET and SVM, with an AUC of 0.83 compared with 0.77 for both GLMNET and SVM. The most important predictors common to GLMNET and SVM were the total number of prenatal visits, the mother's race, and the mother's education. Random Forest additionally ranked mean pollutant exposures as its top features, along with the number of prenatal visits and residence in Allegheny County. Results from the Empirical Bayes exploration and the classification models were fairly consistent.

Public Health Significance: Accurate prediction of preterm birth facilitates early identification and treatment of at-risk mothers and enables targeted interventions to minimize infant mortality and morbidity, which would substantially benefit the community, the nation, and the healthcare system as a whole. The environmental factors identified here warrant further investigation.
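The Empirical Bayes exploration and the evaluation metrics mentioned above can be illustrated with a minimal Python sketch. It assumes a beta-binomial model whose Beta prior is fitted to group-level preterm proportions (for example, per county) by the method of moments. The abstract does not state which prior or fitting method the thesis used, so this is an illustrative swap-in, not the author's implementation. The function names and toy counts are hypothetical and are not study data. AUC is computed here as the probability that a randomly chosen preterm case receives a higher score than a randomly chosen term birth.

```python
from typing import Dict, Sequence, Tuple


def eb_posterior_means(counts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Hypothetical helper: beta-binomial Empirical Bayes posterior means.

    counts maps a group label (e.g. a county) to (preterm births, total births).
    ASSUMPTION: the Beta(alpha, beta) prior is fitted to the raw group
    proportions by the method of moments.
    """
    rates = [events / total for events, total in counts.values()]
    k = len(rates)
    if k < 2:
        raise ValueError("need at least two groups to fit the prior")
    mean = sum(rates) / k
    var = sum((r - mean) ** 2 for r in rates) / (k - 1)
    if var <= 0:
        raise ValueError("group rates have no spread; prior is undefined")
    # Beta moments: mean = a / (a + b), var = mean * (1 - mean) / (a + b + 1)
    strength = mean * (1 - mean) / var - 1  # estimated a + b
    if strength <= 0:
        raise ValueError("rates too dispersed for a method-of-moments Beta fit")
    alpha, beta = mean * strength, (1 - mean) * strength
    # The posterior mean shrinks each raw rate toward the pooled mean;
    # groups with fewer births are shrunk more than large ones.
    return {
        group: (events + alpha) / (total + alpha + beta)
        for group, (events, total) in counts.items()
    }


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(random positive scores above random negative); ties count 1/2.

    O(n_pos * n_neg): fine for illustration, too slow for a full birth cohort.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy counts, not study data: group C is small, so it is shrunk most.
    print(eb_posterior_means({"A": (10, 100), "B": (30, 100), "C": (2, 10)}))
    y_true = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5
    print(sensitivity_specificity(y_true, y_pred))  # (0.5, 0.5)
    print(auc(y_true, scores))  # 0.75
```

Fitting the prior to raw proportions ignores differences in group size; a fuller Empirical Bayes fit would maximize the beta-binomial marginal likelihood instead. For real data, library routines such as scikit-learn's `roc_auc_score` compute AUC far more efficiently than the pairwise loop above.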