Evaluating and Improving the Viability of Machine Learning to Solve Chemical Problems
Folmsbee, Dakota Lee (2022) Evaluating and Improving the Viability of Machine Learning to Solve Chemical Problems. Doctoral Dissertation, University of Pittsburgh. (Unpublished)
This is the latest version of this item.
Abstract
While improvements in computer processing have allowed for increasingly fast quantum mechanical (QM) calculations, the need for alternative techniques to accelerate computational materials design continues to grow. Screening methods have addressed this need by searching chemical space more efficiently, but they often rely on faster, albeit less accurate, evaluation methods because of the large number of calculations required. Machine learning (ML) has shown promise as a potential surrogate for time-consuming quantum mechanical calculations, such as density functional and other first-principles methods, which would give these screening methods a fast and accurate approach to evaluation. This work sets out to determine the viability of ML methods through multiple tests. First, thermally accessible conformations were ranked to establish ML's capacity to resolve small energy differences relative to other established methods. The performance of ML methods was found to be equivalent to that of semi-empirical methods in both accuracy and evaluation time, demonstrating promise for future improvements of ML models. Next, ML's understanding of chemical physics was tested by analyzing the short- and long-range interactions that occur with bond compression and stretching, as well as the effect of steric hindrance on dihedral angles. The work demonstrated the extent of the training set's influence on the model, as short- and long-range interactions not present in the set became apparent when testing the models. Additionally, the inclusion of torsion sampling in the ANI-2 training exemplifies why more robust training sets are needed for more accurate ML methods. Current work on ML indicates a strong need for additional diversity in training data.
Initial work comparing experimental crystallographic geometries with gas-phase computed conformer torsional preferences examines the possible use of a quantum-based ETKDG, QTDG, for generating future conformer training sets that expand existing ones. Future work on expanding data sets is crucial for ML performance, as ML methods rely heavily on the scope of the training set. Incomplete training sets that do not appropriately represent chemical space diminish the applicability of ML to solve chemical problems.