
Machine learning to estimate the effects of road proximity on preterm birth

O'Connell, Brian (2022) Machine learning to estimate the effects of road proximity on preterm birth. Master's Thesis, University of Pittsburgh. (Unpublished)



Background: Preterm birth is a critical public health issue because babies born before 37 weeks of gestation have a much higher risk of chronic health problems. Infants born preterm also experience more immediate health complications because critical organs such as the lungs are not fully developed.

Methods: Logistic regression and three machine learning methods (neural networks, random forests, and extreme gradient boosted trees) were implemented to create models that predict whether a baby will be born preterm. These models were trained using the mothers’ age, ethnicity, socioeconomic status, education level, smoking status, pre-pregnancy weight, date of giving birth, county of birth, and number of prenatal visits. The models also adjusted for the father's race and ethnicity. These parental demographic variables were used together with an environmental measure, proximity to the nearest major road or train tracks, which served as an air quality metric, to determine the most important variables for predicting preterm birth.
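As a rough illustration of the baseline model named above, the sketch below fits a logistic regression by batch gradient descent using only the Python standard library. The data are synthetic, and the two features (standardized prenatal visits and road distance), their assumed effect sizes, and all function names are hypothetical choices made for this example. None of it is taken from the thesis, whose implementation is not described in this record.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_births(n, seed=0):
    """Hypothetical data: two standardized features and a 0/1 preterm label."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        visits = rng.gauss(0.0, 1.0)    # standardized number of prenatal visits
        distance = rng.gauss(0.0, 1.0)  # standardized distance to nearest road
        # Assumed (not estimated) effects: fewer visits and shorter
        # distance both raise the simulated risk of preterm birth.
        p = sigmoid(-2.0 - 0.8 * visits - 0.4 * distance)
        rows.append([visits, distance])
        labels.append(1 if rng.random() < p else 0)
    return rows, labels


def fit_logistic(rows, labels, lr=0.5, epochs=200):
    """Batch gradient descent on the mean log-loss; returns (intercept, weights)."""
    n, k = len(rows), len(rows[0])
    b, w = 0.0, [0.0] * k
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * k
        for x, y in zip(rows, labels):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - y
            grad_b += err
            for j in range(k):
                grad_w[j] += err * x[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w


def predict_proba(b, w, x):
    """Predicted probability of preterm birth for one feature vector."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


rows, labels = make_synthetic_births(2000)
intercept, weights = fit_logistic(rows, labels)
```

On this simulated data the fitted weights should come out negative for both features, mirroring the assumed effects. A real analysis would typically rely on an established statistics or machine learning library rather than a hand-written optimizer.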

Results: The random forest model performed best, with an area under the receiver operating characteristic (ROC) curve (AUC) of 0.731, and ranked the distance to the nearest road or train tracks as its most important variable. All models performed similarly well; logistic regression had the lowest AUC, at 0.657. Every model other than the random forest identified the number of prenatal visits as the most important variable.
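One way to read the AUC values reported above: the AUC equals the probability that a randomly chosen preterm birth receives a higher predicted score than a randomly chosen term birth, with ties counted as half. The sketch below computes it directly from that definition. The function name and the toy labels and scores are made up for illustration and are not results from the thesis.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win,
    which matches the area under the empirical ROC curve.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example with made-up scores (1 = preterm, 0 = term).
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

In the toy example, 9 of the 12 preterm/term pairs are ranked correctly, so `toy_auc` is 0.75. By the same reading, an AUC of 0.731 means that about 73% of such pairs are ordered correctly, while 0.5 corresponds to random ranking.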

Conclusion: Using machine learning methods, the number of prenatal visits and the distance to the nearest major road or train tracks were the most important variables in predicting preterm birth. Monitoring these variables can help public health officials and medical professionals give the best recommendations to expectant mothers.

Public Health Impact: This thesis will help discover nonlinear associations between demographic or environmental variables and preterm birth. Identifying these variables can help public health officials know where to focus their efforts when proposing legislation and implementing policies. Providing this information to doctors can also help them better inform and guide their patients through pregnancy.




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators | Email | Pitt Username | ORCID
O'Connell, Brian | brr99@pitt.edu | brr99 |
ETD Committee:
Title | Member | Email Address | Pitt Username | ORCID
Thesis Advisor | Youk, Ada | ayouk@pitt.edu | ayouk |
Committee Member | Carlson, Jenna | jnc35@pitt.edu | jnc35 |
Committee Member | Buchanich, Jeanine | jeanine@pitt.edu | jeanine |
Committee Member | Fabisiak, James | fabs@pitt.edu | fabs |
Date: 12 May 2022
Date Type: Publication
Defense Date: 25 April 2022
Approval Date: 12 May 2022
Submission Date: 28 April 2022
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 62
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: MS - Master of Science
Thesis Type: Master's Thesis
Refereed: Yes
Uncontrolled Keywords: Machine Learning, Neural Network, Random Forest, Extreme Gradient Boosting, Preterm Birth
Date Deposited: 12 May 2022 13:43
Last Modified: 12 May 2022 13:43

