
Using Quantum Chemical Features in a Neural Network to Improve Aqueous Solubility Prediction

Ondich, Brett (2023) Using Quantum Chemical Features in a Neural Network to Improve Aqueous Solubility Prediction. Master's Thesis, University of Pittsburgh. (Unpublished)



Aqueous solubility is a vital molecular property in numerous fields, such as drug discovery and materials design. Accurate prediction of aqueous solubility can reduce the number of candidate molecules before experimental analysis. Shrinking the chemical search space streamlines the selection process, saving valuable time and resources. Recent developments have increased interest in using machine learning techniques to predict aqueous solubility computationally rather than measuring it experimentally. One such technique is the Molecular Attention Transformer (MAT). Transformers are a special case of graph neural networks (GNNs). GNNs take inputs in the form of graphs, with data stored on nodes and edges, which can be thought of as atoms and bonds, respectively. An important aspect of building a GNN is deciding which features to use as descriptors for the nodes and edges. This thesis investigates the effect of including quantum chemical data as node features in a GNN model. The hypothesis was that, with this quantum data included, the model would better discriminate between highly similar compounds and more accurately predict their aqueous solubility. However, including the quantum data produced no significant improvement in model performance.
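The idea of atoms as nodes, bonds as edges, and quantum data appended as extra node features can be sketched in a few lines. This is a minimal, standard-library-only illustration under stated assumptions: the element vocabulary, the degree feature, the descriptor name `partial_charge` and its values are hypothetical placeholders (the abstract does not say which quantum descriptors were used), and `message_pass` is a generic sum-aggregation update, not the MAT attention layer used in the thesis.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical element vocabulary for the one-hot atom-type feature.
ELEMENTS = ["C", "N", "O"]


def one_hot(symbol: str, vocab: Sequence[str]) -> List[float]:
    """Encode an element symbol as a one-hot vector over `vocab`."""
    return [1.0 if symbol == v else 0.0 for v in vocab]


def build_graph(
    atoms: Sequence[str],
    bonds: Sequence[Tuple[int, int]],
    quantum: Optional[Dict[str, Sequence[float]]] = None,
) -> Tuple[List[List[float]], List[List[int]]]:
    """Return (node_features, adjacency) for one molecule.

    Nodes are atoms and edges are bonds. Base node features are the
    element one-hot plus the atom's degree; each per-atom quantum
    descriptor in `quantum` (if given) is appended as one extra column.
    """
    n = len(atoms)
    adjacency = [[0] * n for _ in range(n)]
    for i, j in bonds:  # bonds are undirected edges
        adjacency[i][j] = adjacency[j][i] = 1

    features = []
    for i, symbol in enumerate(atoms):
        row = one_hot(symbol, ELEMENTS) + [float(sum(adjacency[i]))]
        if quantum is not None:
            for name in sorted(quantum):  # deterministic column order
                row.append(float(quantum[name][i]))
        features.append(row)
    return features, adjacency


def message_pass(
    features: List[List[float]], adjacency: List[List[int]]
) -> List[List[float]]:
    """One generic sum-aggregation step: h_i' = h_i + sum of bonded neighbours h_j."""
    updated = []
    for i, h_i in enumerate(features):
        agg = list(h_i)
        for j, bonded in enumerate(adjacency[i]):
            if bonded:
                agg = [a + b for a, b in zip(agg, features[j])]
        updated.append(agg)
    return updated


# Usage: heavy atoms of ethanol (C-C-O), hydrogens omitted.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
# Illustrative placeholder values, NOT computed quantum-chemical results.
quantum = {"partial_charge": [-0.2, 0.1, -0.6]}

x_base, adj = build_graph(atoms, bonds)
x_qm, _ = build_graph(atoms, bonds, quantum)
print(len(x_base[0]), len(x_qm[0]))  # 4 5
```

The only difference between the base and quantum-augmented inputs is the width of each node's feature row; the graph connectivity is unchanged, so any gain must come from the information carried in the extra columns.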
The accuracy of the quantum data was analyzed to determine whether the lack of improvement stemmed from the data or from the model. The analysis showed that the solvation models used to compute the quantum data could not produce data accurate enough for the model to benefit from the quantum features. Furthermore, a recently published model pretrained on quantum data was compared with the base model to determine whether including quantum features improves performance. The quantum model outperformed the base model, further indicating that quantum features should improve model performance, provided the quantum data are of sufficient quality.
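Whether a change in test error counts as "significant" depends on the statistical procedure, which the abstract does not specify. One common option is a paired bootstrap over the test molecules, which gives a confidence interval on the difference in root-mean-square error (RMSE) between a base model and a quantum-augmented model. The sketch below assumes RMSE as the metric (e.g. on log-solubility values); the function names are hypothetical, and this is not presented as the thesis's evaluation procedure.

```python
import math
import random
from typing import Sequence, Tuple


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between measured and predicted values."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def paired_bootstrap_rmse_diff(
    y_true: Sequence[float],
    pred_base: Sequence[float],
    pred_qm: Sequence[float],
    n_boot: int = 2000,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (observed, lower 2.5%, upper 97.5%) for RMSE(base) - RMSE(qm).

    A positive difference means the quantum-augmented model has lower error.
    Both models are scored on the same resampled test molecules (paired).
    """
    rng = random.Random(seed)
    n = len(y_true)
    observed = rmse(y_true, pred_base) - rmse(y_true, pred_qm)
    diffs = []
    for _ in range(n_boot):
        # Resample molecule indices with replacement, shared by both models.
        idx = [rng.randrange(n) for _ in range(n)]
        t = [y_true[i] for i in idx]
        diffs.append(
            rmse(t, [pred_base[i] for i in idx]) - rmse(t, [pred_qm[i] for i in idx])
        )
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return observed, lower, upper
```

If the interval from `lower` to `upper` contains zero, the test set does not support a significant difference between the two models; pairing the resamples removes the molecule-to-molecule difficulty that both models share, which makes the comparison more sensitive than comparing two independent RMSE intervals.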




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Ondich, Brett (Email: brett78931@gmail.com; Pitt Username: BJO20; ORCID: 0000-0002-2091-7364)
ETD Committee:
Committee Chair: Hutchison, Geoffrey (Email: geoffh@pitt.edu; Pitt Username: geoffh)
Committee Member: Jordan, Kenneth (Email: jordan@pitt.edu; Pitt Username: jordan)
Committee Member: Liu, Peng (Email: pengliu@pitt.edu; Pitt Username: pengliu)
Date: 27 January 2023
Date Type: Publication
Defense Date: 6 December 2022
Approval Date: 27 January 2023
Submission Date: 12 December 2022
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 32
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Chemistry
Degree: MS - Master of Science
Thesis Type: Master's Thesis
Refereed: Yes
Uncontrolled Keywords: machine learning, deep learning, solubility prediction
Article Type: Research Article
Date Deposited: 27 Jan 2023 17:50
Last Modified: 27 Jan 2023 17:50

