Statistical methods for genetic risk confidence intervals, Bayesian disease risk prediction, and estimating mutation screening saturation

Shan, Ying (2016) Statistical methods for genetic risk confidence intervals, Bayesian disease risk prediction, and estimating mutation screening saturation. Doctoral Dissertation, University of Pittsburgh. (Unpublished)


Genetic information can be used to improve disease risk estimation as well as to estimate the number of genes influencing a trait. Here we explore these issues in three parts. 1) For an informed understanding of a disease risk prediction, the confidence interval of the risk estimate should be taken into account, but few previous studies have considered it. We propose an improved risk prediction model and an improved screening strategy that take the confidence intervals into account. Risk models are built with varying numbers of genetic risk variants known as single nucleotide polymorphisms (SNPs). When SNPs are added to the risk model in decreasing order of effect size, including those with smaller effects shifts the risk only modestly but widens the confidence intervals. A more appropriate risk prediction model should therefore exclude the small-effect SNPs. The newly proposed screening method is superior to the traditional one as evaluated by net benefit. 2) Many methods have been developed for associated SNP selection, SNP effect estimation, and risk prediction. A Bayesian method designed for continuous phenotypes, BayesR, shows good characteristics. Here, we developed an extension of BayesR (BayesRB) so that the method can be used for binary phenotypes. For SNP effect estimation, BayesRB shows unbiasedness for large-effect SNPs and sparseness for small-effect SNPs. It also performs well on risk prediction, but not on associated SNP selection. 3) When a recessive forward genetic screening study (RFGSS) is carried out to detect disease mutations, it is important to estimate the screening saturation so as to guide the screening strategy. Here, we develop a simulation-based "unseen species" method to estimate the screening saturation in an RFGSS. We simulated an RFGSS process based on a real study and compared our method to both nonparametric and parametric methods. The proposed method performs better than all the other methods except an existing "unseen species" method. The three newly proposed methods are helpful for constructing better risk prediction models and for estimating the number of disease-contributing genes. These methods can be applied to different disease studies and may contribute to public health.


Item Type: University of Pittsburgh ETD
Status: Unpublished
Creator: Shan, Ying (Email: yis29@pitt.edu; Pitt Username: YIS29)
ETD Committee:
Committee Chair: Weeks, Daniel E. (Email: weeks@pitt.edu; Pitt Username: WEEKS; ORCID: 0000-0001-9410-7228)
Committee Member: Feingold, Eleanor (Email: feingold@pitt.edu; Pitt Username: FEINGOLD)
Committee Member: Day, Roger S. (Email: day01@pitt.edu; Pitt Username: DAY01)
Committee Member: Park, Yong Seok (Email: yongpark@pitt.edu; Pitt Username: YONGPARK)
Committee Member: Chen, Wei (Email: wei.chen@chp.edu; Pitt Username: WEC47)
Date: 12 September 2016
Date Type: Publication
Defense Date: 12 July 2016
Approval Date: 12 September 2016
Submission Date: 14 July 2016
Access Restriction: 5 year -- Restrict access to University of Pittsburgh for a period of 5 years.
Number of Pages: 165
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Risk prediction, Confidence intervals, Bayesian models, Screening saturation
Date Deposited: 12 Sep 2016 16:05
Last Modified: 30 Jun 2022 15:22