
A comparative analysis to predict p53 activity using classification models

Setty, Priyanka (2019) A comparative analysis to predict p53 activity using classification models. Master's Thesis, University of Pittsburgh. (Unpublished)



Mutation studies of TP53, the gene encoding the tumor protein p53, have become increasingly common in cancer research as a way to understand the protein’s structural changes and their implications for tumor suppression. The protein is built from four identical chains of 393 amino acids each. This homo-tetrameric configuration of p53 plays an important role in suppressing tumors, so understanding its structure-function dynamics is important for understanding cancer development.
A p53 mutant dataset was obtained from the University of California, Irvine (UCI) Machine Learning Repository to infer the p53 protein’s ability to suppress tumors from its two-dimensional (2D) and three-dimensional (3D) structural features. The dataset consisted of 31,283 instances (observations) and 5,408 numerical features. The first 4,826 features were 2D structural features based on electrostatic and surface properties; the remaining 582 were 3D features derived from distance maps between mutant and wild-type p53. After selecting a subset of features that were statistically relevant in predicting the outcome (n=100), three classification algorithms, Logistic Regression (LR), Support Vector Machine (SVM), and Random Forest (RF), were fit to the data, with a cross-validation scheme used to tune the parameters for distinguishing active p53 mutants from inactive ones. Accuracy and area under the curve (AUC) were used to evaluate each classification model. Of the three algorithms, LR appeared to outperform SVM and RF, with accuracy ranging from 0.75 to 0.81 and AUC ranging from 0.75 to 0.88.
The LR model identified 2D features 60, 74, 49, 40, and 73 as highly important in predicting the activity of p53. The public health significance of this study is that it advances the understanding of p53, which is critical to tumor suppression, by helping to predict p53 activation using a set of structural features identified by simple classification models.
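
The two metrics the abstract uses to compare LR, SVM, and RF can be illustrated with a minimal sketch. It assumes that AUC means the area under the ROC curve, which the abstract does not spell out, and it computes that area through the equivalent pairwise-ranking definition. The function names, the 0.5 decision threshold, and the toy labels and scores are hypothetical; none of them come from the thesis or the UCI dataset.

```python
def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """ROC AUC via its ranking interpretation.

    Equals the probability that a randomly chosen positive (active, label 1)
    receives a higher score than a randomly chosen negative (inactive,
    label 0); tied scores count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration): 1 = active mutant, 0 = inactive.
y_true = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.8, 0.1]  # hypothetical predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

toy_accuracy = accuracy(y_true, y_pred)
toy_auc = roc_auc(y_true, scores)
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is one reason the two ranges reported for a model need not coincide.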




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Setty, Priyanka (Email: prs77@pitt.edu; Pitt Username: prs77)
ETD Committee:
Committee Chair: Buchanich
Committee Member: Carlson
Committee Member: Shaffer
Committee Member: Ramanathan
Date: 26 September 2019
Date Type: Publication
Defense Date: 29 July 2019
Approval Date: 26 September 2019
Submission Date: 21 July 2019
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 65
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: MS - Master of Science
Thesis Type: Master's Thesis
Refereed: Yes
Uncontrolled Keywords: p53 mutant, classification, p53 activity
Date Deposited: 26 Sep 2019 16:53
Last Modified: 01 Sep 2020 05:15

