King, Jonathan
(2024)
Deep Learning Methods and Datasets for All-atom Protein Structure Prediction.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
Over the past decade, significant progress has been made in protein structure prediction, largely thanks to tools like AlphaFold2. This dissertation tackles several lingering challenges in this field. First, we introduce SidechainNet, a dataset and toolkit designed to streamline the handling of protein sequence and structure data for machine learning and to make such data more accessible. This initiative addresses the common difficulty of collecting and organizing data effectively, especially in protein science, where data quality and availability may vary. Second, leading methods have been observed to fall short in real-world tasks such as molecular docking. To make the predictions of deep learning models more physically realistic, we use potential energy as a loss function through OpenMM-Loss. This technique reduces the potential energy of predicted structures and the number of clashes in them, potentially making these predictions more suitable for a range of applications. Third, we scrutinize AlphaFold2 with the aim of improving its sidechain modeling, a crucial aspect of drug discovery. Although we do not identify a significantly more accurate model, our analysis shows that ResNet and Transformer models perform comparably on sidechain prediction tasks. In light of these results, we recommend that future work concentrate on more holistic approaches to sidechain modeling. Finally, we discuss potential future developments and extensions of our methods.
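The idea of using potential energy as a loss term can be illustrated with a minimal sketch. It assumes a toy soft-sphere repulsion, k(r0 - d)^2 for every atom pair closer than a cutoff r0, as a stand-in for the full force-field energy that OpenMM-Loss obtains from OpenMM. The function names, the cutoff r0 = 3.0, and the weight w_energy = 0.1 are hypothetical choices for illustration, not values or code from the dissertation. The sketch only computes scalar loss values; actual training would need an autodiff framework to backpropagate through the energy term.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def coordinate_mse(pred: Sequence[Coord], true: Sequence[Coord]) -> float:
    """Mean squared error over every x, y, z coordinate of every atom."""
    if len(pred) != len(true) or not pred:
        raise ValueError("pred and true must be non-empty and the same length")
    total = 0.0
    for p, t in zip(pred, true):
        total += sum((pi - ti) ** 2 for pi, ti in zip(p, t))
    return total / (3 * len(pred))


def clash_energy(coords: Sequence[Coord], r0: float = 3.0, k: float = 1.0) -> float:
    """Toy soft-sphere repulsion: k * (r0 - d)^2 for each atom pair with d < r0.

    Hypothetical stand-in for a force-field potential energy; this is NOT
    the OpenMM energy used by OpenMM-Loss.
    """
    energy = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            if d < r0:
                energy += k * (r0 - d) ** 2
    return energy


def energy_aware_loss(
    pred: Sequence[Coord],
    true: Sequence[Coord],
    w_energy: float = 0.1,  # hypothetical weight on the energy term
    r0: float = 3.0,        # hypothetical clash cutoff distance
) -> float:
    """Coordinate loss plus a weighted energy penalty on the prediction only."""
    return coordinate_mse(pred, true) + w_energy * clash_energy(pred, r0=r0)


if __name__ == "__main__":
    true = [(0.0, 0.0, 0.0), (3.5, 0.0, 0.0)]
    too_close = [(0.0, 0.0, 0.0), (2.5, 0.0, 0.0)]  # 1.0 off, and clashing
    too_far = [(0.0, 0.0, 0.0), (4.5, 0.0, 0.0)]    # 1.0 off, no clash
    print(coordinate_mse(too_close, true), coordinate_mse(too_far, true))
    print(energy_aware_loss(too_close, true), energy_aware_loss(too_far, true))
```

In the usage example both predictions are equally far from the reference under the coordinate loss alone, but the energy term penalizes the one that places two atoms closer than the cutoff, which is the kind of physically unrealistic prediction an energy-based loss is meant to discourage.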
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: King, Jonathan
Date: 5 March 2024
Date Type: Publication
Defense Date: 13 October 2023
Approval Date: 5 March 2024
Submission Date: 23 October 2023
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 103
Institution: University of Pittsburgh
Schools and Programs: School of Medicine > Computational and Systems Biology
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: protein structure prediction, machine learning, deep learning, structural biology, datasets, python, data science, biology, computational biology, alphafold
Date Deposited: 05 Mar 2024 17:43
Last Modified: 05 Mar 2024 17:43
URI: http://d-scholarship.pitt.edu/id/eprint/45461