Ondich, Brett
(2023)
Using Quantum Chemical Features in a Neural Network to Improve Aqueous Solubility Prediction.
Master's Thesis, University of Pittsburgh.
(Unpublished)
Abstract
Aqueous solubility is a vital molecular property in numerous fields, such as drug discovery and material design. Accurate prediction of aqueous solubility can reduce the number of candidate molecules before experimental analysis. Shrinking the chemical search space streamlines the selection process, saving valuable time and resources. Recent developments have increased interest in using machine learning techniques to predict aqueous solubility computationally rather than measuring it experimentally. One such technique is the Molecular Attention Transformer (MAT). Transformers are a special case of graph neural networks (GNNs). GNNs take inputs in the form of graphs whose data are stored as nodes and edges, which can be thought of as atoms and bonds, respectively. An important aspect of building a GNN is determining which features to use as descriptors for the nodes and edges. This thesis investigates the effects of including quantum chemical data as node features in a GNN model. The hypothesis was that, with this quantum data, the model would better discriminate between highly similar compounds and predict their aqueous solubility more accurately. However, there was no significant improvement in model performance when the quantum data was included.
The accuracy of the quantum data was analyzed to determine whether the lack of improvement was due to the data or to the model. It was determined that the solvation models used to compute the quantum data could not produce data accurate enough for the model to benefit from the quantum features. Furthermore, a recently published model pretrained on quantum data was compared to the base model to determine whether including quantum features improves performance. The quantum model outperformed the base model, further indicating that including quantum features should improve model performance but requires high-quality quantum data.
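The graph representation described in the abstract can be illustrated with a minimal sketch. This is not the thesis's code or the MAT implementation: the `MolGraph` class, the four-element vocabulary, and the per-atom charge values are all hypothetical placeholders, chosen only to show how a quantum-derived value might be appended to each node's feature vector.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Illustrative element vocabulary (an assumption, not taken from the thesis).
ELEMENTS = ["H", "C", "N", "O"]


@dataclass
class MolGraph:
    atoms: List[str]                       # nodes: one element symbol per atom
    bonds: List[Tuple[int, int]]           # edges: pairs of atom indices
    quantum: Optional[List[float]] = None  # optional per-atom quantum feature


def one_hot(symbol: str) -> List[float]:
    """Encode an element symbol as a one-hot vector over ELEMENTS."""
    return [1.0 if symbol == e else 0.0 for e in ELEMENTS]


def node_features(g: MolGraph) -> List[List[float]]:
    """Build one feature vector per atom, appending the quantum value if given."""
    feats = [one_hot(a) for a in g.atoms]
    if g.quantum is not None:
        if len(g.quantum) != len(g.atoms):
            raise ValueError("need exactly one quantum value per atom")
        feats = [f + [q] for f, q in zip(feats, g.quantum)]
    return feats


def adjacency(g: MolGraph) -> List[List[int]]:
    """Build a symmetric adjacency matrix from the bond list."""
    n = len(g.atoms)
    adj = [[0] * n for _ in range(n)]
    for i, j in g.bonds:
        adj[i][j] = adj[j][i] = 1
    return adj


# Water as a three-node graph. The charges are made-up placeholders,
# not computed quantum chemical results.
water_base = MolGraph(atoms=["O", "H", "H"], bonds=[(0, 1), (0, 2)])
water_quantum = MolGraph(
    atoms=["O", "H", "H"],
    bonds=[(0, 1), (0, 2)],
    quantum=[-0.8, 0.4, 0.4],
)

base_feats = node_features(water_base)
quantum_feats = node_features(water_quantum)
```

In this sketch the base and quantum variants share the same atoms and bonds and differ only in the width of the node feature vectors, which mirrors the comparison described above: the same model evaluated with and without the extra quantum feature.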
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Date: 27 January 2023
Date Type: Publication
Defense Date: 6 December 2022
Approval Date: 27 January 2023
Submission Date: 12 December 2022
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 32
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Chemistry
Degree: MS - Master of Science
Thesis Type: Master's Thesis
Refereed: Yes
Uncontrolled Keywords: machine learning, deep learning, solubility prediction
Article Type: Research Article
Date Deposited: 27 Jan 2023 17:50
Last Modified: 27 Jan 2023 17:50
URI: http://d-scholarship.pitt.edu/id/eprint/43973