
Deep Learning Methods and Datasets for All-atom Protein Structure Prediction

King, Jonathan (2024) Deep Learning Methods and Datasets for All-atom Protein Structure Prediction. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



Over the past decade, significant progress has been made in protein structure prediction, largely thanks to tools like AlphaFold2. This dissertation tackles several lingering challenges in the field. First, we introduce SidechainNet, a dataset and toolkit designed to streamline the handling of protein sequence and structure data for machine learning and to make that data more accessible. It addresses the prevalent problem of collecting and organizing data effectively, especially in protein science, where data quality and availability can vary. Furthermore, leading methods have been observed to fall short in real-world tasks like molecular docking. To enhance the physical realism of predictions made by deep learning models, we implement potential energy as a loss function through OpenMM-Loss. This technique reduces potential energy and clashes in predicted structures, potentially making these predictions more viable for various applications. We also scrutinize AlphaFold2 with the aim of refining its sidechain modeling, a crucial aspect of drug discovery. Although we do not identify a significantly more accurate model, our analysis reveals comparable performance between ResNet and Transformer models in sidechain prediction tasks. In light of these results, we recommend that future work concentrate on more holistic approaches to sidechain modeling. Finally, we discuss potential future developments and extensions of our methods.




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: King, Jonathan
ETD Committee:
Committee Chair: VanDemark, Andrew
Thesis Advisor: Koes, David
Committee Member: Durrant, Jacob
Committee Member: Isayev,
Date: 5 March 2024
Date Type: Publication
Defense Date: 13 October 2023
Approval Date: 5 March 2024
Submission Date: 23 October 2023
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 103
Institution: University of Pittsburgh
Schools and Programs: School of Medicine > Computational and Systems Biology
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: protein structure prediction, machine learning, deep learning, structural biology, datasets, python, data science, biology, computational biology, alphafold
Date Deposited: 05 Mar 2024 17:43
Last Modified: 05 Mar 2024 17:43

