
Evaluating and Improving the Viability of Machine Learning to Solve Chemical Problems

Folmsbee, Dakota Lee (2022) Evaluating and Improving the Viability of Machine Learning to Solve Chemical Problems. Doctoral Dissertation, University of Pittsburgh. (Unpublished)




While improvements in computer processing have allowed for increasingly fast quantum mechanical (QM) calculations, the need for alternative techniques to accelerate computational materials design continues to grow. Screening methods have addressed this by searching chemical space more efficiently, but because of the large number of calculations involved, they often rely on faster, albeit less accurate, methods for evaluation. Machine learning (ML) has shown promise as a surrogate for time-consuming QM calculations, such as density functional theory (DFT) and other first-principles methods, which would give these screening methods an evaluation approach that is both fast and accurate.

This work sets out to determine the viability of ML methods through multiple tests. First, thermally accessible conformers were ranked by energy to establish how well ML methods differentiate small energy differences compared with other established methods. The performance of ML methods was found to be equivalent to that of semi-empirical methods in both accuracy and evaluation time, a promising sign for future improvements of ML models. Next, ML's understanding of chemical physics was tested by analyzing the short- and long-range interactions that arise during bond compression and stretching, as well as the effect of steric hindrance on dihedral angles. This work demonstrated the extent to which the training set shapes the model, as deficiencies in short- and long-range interactions not represented in the training set became apparent when the models were tested. Additionally, the inclusion of torsion sampling in the ANI-2 training data illustrates why more robust training sets are needed for more accurate ML methods.
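The abstract does not state which metric was used to compare conformer rankings. As a minimal sketch, assuming agreement with a reference method (for example DFT) is scored with Spearman rank correlation plus a mean absolute error on relative energies, the helpers below show the idea. All function names are hypothetical, and the energies are illustrative placeholders, not results from the dissertation.

```python
from typing import List, Sequence


def relative_energies(energies: Sequence[float]) -> List[float]:
    """Shift energies so the lowest-energy conformer sits at zero."""
    e_min = min(energies)
    return [e - e_min for e in energies]


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j -> ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rank correlation is undefined when all values tie")
    return cov / (vx * vy) ** 0.5


def mean_abs_error(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean absolute deviation between two equal-length energy lists."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Hypothetical relative conformer energies (kcal/mol), NOT dissertation data.
reference = relative_energies([0.00, 0.45, 1.20, 2.10, 3.05])
ml_model = relative_energies([0.10, 0.60, 0.50, 2.00, 3.40])
print(f"Spearman rho = {spearman_rho(reference, ml_model):.2f}")
print(f"MAE = {mean_abs_error(reference, ml_model):.2f} kcal/mol")
```

Shifting to relative energies does not change the ranks; it matters only for the error metric, where absolute energies from different methods are not directly comparable. Here the hypothetical ML model swaps the second and third conformers, which lowers rho below 1 even though every energy is within about 1 kcal/mol.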

Current work on ML indicates a strong need for additional diversity in training data. Initial work comparing the torsional preferences of experimental crystallographic geometries with those of gas-phase computed conformers examines the possible use of QTDG, a quantum-based ETKDG, to generate conformers for expanding existing training sets. Future work on expanding data sets is crucial for ML performance, as ML methods are highly reliant on the scope of their training sets. Incomplete training sets that do not adequately represent chemical space diminish the applicability of ML to chemical problems.
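The abstract does not describe how torsional preferences were compared. As a minimal sketch, assuming each torsion is measured from the Cartesian coordinates of four atoms and that two sources (for example crystallographic and gas-phase computed conformers) are compared through normalized angle histograms, the hypothetical helpers below compute a signed dihedral angle and a histogram overlap score. This is not the ETKDG or QTDG implementation; ETKDG conformers could be produced separately (for example with RDKit's `EmbedMultipleConfs` and `ETKDGv3()` parameters) and their coordinates passed to `dihedral_deg`.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral_deg(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle p0-p1-p2-p3 in degrees, in [-180, 180]."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane (p0, p1, p2)
    n2 = _cross(b2, b3)  # normal of the plane (p1, p2, p3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def torsion_histogram(angles_deg: Sequence[float], n_bins: int = 36) -> List[float]:
    """Normalized histogram of torsion angles wrapped onto [-180, 180)."""
    if not angles_deg:
        raise ValueError("need at least one angle")
    width = 360.0 / n_bins
    counts = [0] * n_bins
    for angle in angles_deg:
        shifted = (angle + 180.0) % 360.0  # map onto [0, 360)
        # min() guards against float round-off landing exactly on n_bins.
        counts[min(int(shifted // width), n_bins - 1)] += 1
    return [c / len(angles_deg) for c in counts]


def histogram_overlap(p: Sequence[float], q: Sequence[float]) -> float:
    """Sum of bin-wise minima: 1.0 for identical, 0.0 for disjoint histograms."""
    return sum(min(a, b) for a, b in zip(p, q))


# Hypothetical torsion samples (degrees), NOT dissertation data.
crystal = [178.0, -177.0, 62.0, 65.0]
computed = [179.0, 175.0, -64.0, 63.0]
ov = histogram_overlap(torsion_histogram(crystal), torsion_histogram(computed))
print(f"histogram overlap = {ov:.2f}")
```

Fixed bins are a simplification: nearly identical angles that straddle a bin edge, including the ±180° seam (178° and -177° above), land in different bins. A circular kernel density estimate would give a smoother comparison at the cost of choosing a bandwidth.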




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Folmsbee, Dakota Lee (Email: dlf57@pitt.edu; Pitt Username: dlf57; ORCID: 0000-0002-4094-233X)
ETD Committee:
Committee Chair: Hutchison, Geoffrey R (Email: ghutchis@pitt.edu; Pitt Username: ghutchis; ORCID: 0000-0002-1757-1980)
Committee Member: Jordan, Kenneth (Email: jordan@pitt.edu; Pitt Username: jordan)
Committee Member: Liu, Peng (Email: pengliu@pitt.edu; Pitt Username: pengliu; ORCID: 0000-0002-8188-632X)
Committee Member: Koes, David R (Email: dkoes@pitt.edu; Pitt Username: dkoes; ORCID: 0000-0002-6892-6614)
Date: 6 June 2022
Date Type: Publication
Defense Date: 11 February 2022
Approval Date: 6 June 2022
Submission Date: 14 March 2022
Access Restriction: 1 year. Access restricted to the University of Pittsburgh for a period of 1 year.
Number of Pages: 138
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Chemistry
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: machine learning, quantum chemistry
Date Deposited: 06 Jun 2022 15:58
Last Modified: 06 Jun 2023 05:15
