
A pipeline for classifying close family relationships with dense SNP data and putative pedigree information

Zeng, Zhen (2015) A pipeline for classifying close family relationships with dense SNP data and putative pedigree information. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



When genome-wide association studies (GWAS) or sequencing studies are performed on family-based datasets, the genotype data can be used to check the structure of putative pedigrees. Even in datasets of putatively unrelated people, close relationships can often be detected using dense single-nucleotide polymorphism/variant (SNP/SNV) data.
A number of methods exist for finding relationships from dense genetic data, but all have limitations; in particular, they typically rely on average genetic sharing, which captures only a subset of the available information. We present a set of approaches for classifying relationships in GWAS or whole-genome sequencing datasets. We first propose an empirical method for detecting identity-by-descent (IBD) segments in close relative pairs using unphased dense SNP data and demonstrate how that information can assist in building a relationship classifier. We then develop a strategy that takes advantage of putative pedigree information to enhance classification accuracy. Our methods are tested and illustrated with two SNP array datasets from two distinct populations. With these new techniques, we propose classification pipelines for checking and identifying pair-wise relationships in datasets containing a large number of small pedigrees.
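To make the contrast between average sharing and segment-level sharing concrete, the following is a minimal sketch and not the dissertation's method. It assumes unphased genotypes coded as minor-allele counts (0, 1, 2) at SNPs ordered along one chromosome, computes the per-SNP identity-by-state (IBS) count for a pair, and reports maximal runs free of IBS0 SNPs as crude candidates for segments shared IBD. The function names and the `min_len` threshold are hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple


def ibs_states(g1: Sequence[int], g2: Sequence[int]) -> List[int]:
    """Per-SNP IBS count (0, 1, or 2 alleles shared) for unphased 0/1/2 genotypes."""
    return [2 - abs(a - b) for a, b in zip(g1, g2)]


def mean_ibs(states: Sequence[int]) -> float:
    """Average sharing: the single summary that many methods rely on."""
    return sum(states) / len(states)


def ibs0_free_runs(states: Sequence[int], min_len: int) -> List[Tuple[int, int]]:
    """Maximal runs of SNPs with no IBS0 site, as (start, end) with end exclusive.

    Assumption for illustration: a long run without opposite homozygotes is
    treated as a candidate region where the pair shares at least one
    haplotype IBD. Runs shorter than min_len SNPs are discarded.
    """
    runs = []
    start = None
    for i, s in enumerate(states):
        if s == 0:
            # An IBS0 SNP ends the current run, if one is open.
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
        elif start is None:
            start = i
    # Close a run that extends to the end of the chromosome.
    if start is not None and len(states) - start >= min_len:
        runs.append((start, len(states)))
    return runs


# Toy genotypes for two individuals at eight SNPs (made-up values).
person_a = [0, 1, 2, 2, 0, 1, 1, 2]
person_b = [0, 1, 1, 2, 2, 1, 0, 2]
states = ibs_states(person_a, person_b)
segments = ibs0_free_runs(states, min_len=3)
average = mean_ibs(states)
```

Two pairs can have the same `average` while differing in the number and length of `segments`, which is the kind of extra information a segment-based classifier can draw on.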
We also explore the performance of the pipeline on a whole exome sequencing dataset. Although the classifier based on SNP array data does not perform well on exome sequencing data, it can in principle be modified using new algorithm parameters and training data in order to achieve better performance.
Finally, we develop a method to reconstruct pedigrees from pair-wise relationship information. Our method can reconstruct core pedigrees with high accuracy, and pair-wise relationship inferences can be further improved during this process.
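As a hedged illustration of working from pair-wise relationship calls, the sketch below shows only a plausible preliminary step: grouping individuals into candidate families by linking every pair inferred to be related. It does not reproduce the reconstruction method developed in the dissertation, and the relationship labels and the `family_clusters` name are assumptions made for this example.

```python
from typing import Dict, Iterable, List, Tuple


def family_clusters(pairs: Iterable[Tuple[str, str, str]]) -> List[List[str]]:
    """Group individuals into clusters connected by any non-'unrelated' call.

    pairs holds (id_a, id_b, relationship) tuples; the label 'unrelated'
    is a hypothetical convention for pairs with no inferred relationship.
    """
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, rel in pairs:
        root_a, root_b = find(a), find(b)
        if rel != "unrelated" and root_a != root_b:
            parent[root_a] = root_b

    clusters: Dict[str, List[str]] = {}
    for person in list(parent):
        clusters.setdefault(find(person), []).append(person)
    return sorted(sorted(members) for members in clusters.values())


# Made-up pair-wise calls for six individuals.
calls = [
    ("A", "B", "parent-offspring"),
    ("B", "C", "full-sibling"),
    ("D", "E", "unrelated"),
    ("E", "F", "half-sibling"),
]
families = family_clusters(calls)
```

Each resulting cluster is a set of individuals that a pedigree reconstruction step could then try to arrange into a consistent family structure.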
Detecting close family relationships and reconstructing pedigrees are important in both population-based and family-based studies. Providing precise pedigrees and hidden relatedness information helps increase the accuracy and power of various genetic analyses and avoids false positive associations, making these studies more efficient in identifying the genetic basis of diseases. This is a crucial step on the path to developing better treatments and interventions and improving public health.




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators:
Zeng, Zhen (Email: zhz43@pitt.edu; Pitt Username: ZHZ43)
ETD Committee:
Committee Chair: Feingold, Eleanor (Email: feingold@pitt.edu; Pitt Username: FEINGOLD)
Committee Member: Weeks, Daniel E. (Email: weeks@pitt.edu; Pitt Username: WEEKS; ORCID: 0000-0001-9410-7228)
Committee Member: Tseng, George C. (Email: ctseng@pitt.edu; Pitt Username: CTSENG)
Committee Member: Chen,
Date: 28 September 2015
Date Type: Publication
Defense Date: 6 May 2015
Approval Date: 28 September 2015
Submission Date: 12 June 2015
Access Restriction: 2 year -- Restrict access to University of Pittsburgh for a period of 2 years.
Number of Pages: 88
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: family relationships, IBD, GWAS, sequencing studies, classification, pedigree reconstruction
Date Deposited: 28 Sep 2015 16:59
Last Modified: 30 Jun 2022 15:52

