Anderson, Kevin
(2022)
INCLUSION OF 48 PACIFIC ISLANDERS WITHIN A COSMOPOLITAN
REFERENCE PANEL IS SUFFICIENT FOR HIGH ACCURACY GENOTYPE
IMPUTATION OF SAMOANS.
Master's Thesis, University of Pittsburgh.
(Unpublished)
Abstract
Imputation is a computational method, commonly used in genome-wide association studies, for inferring genotypes based on prior knowledge of shared haplotype structure. Genotype frequencies not only play an important role in imputation but are also highly variable around the world, so it is crucial to adjust for population bias in genetic studies. Common methods for imputation involve the use of publicly available haplotype panels from 1000 Genomes, TOPMed, or other consortia. However, these panels contain data drawn mostly from individuals of European ancestry. Population isolates such as Polynesians benefit greatly in genotype accuracy when a population-specific haplotype reference panel is used. Here, I perform multiple imputations using the 1000 Genomes phase III reference panel and genome-wide data from 1285, 384, 96, 48, 24, and 1 Samoan on chromosomes 5 and 21 to determine how many fully sequenced individuals need to be included in study-specific haplotype panels to achieve accurate imputation. I also investigate the accuracy of these imputations on genotype frequencies of population-specific variants in the CREBRF and BTNL9 genes, which were previously determined to be associated with higher BMI and lower HDL levels, respectively. I demonstrate that incorporating 96 Samoans within the 1000 Genomes cosmopolitan panel yields accurate imputation of rare variants (minor allele frequency of 1%), and that 24 Samoans suffice for common variants (minor allele frequency greater than 5%). These results show that creating a study-specific reference panel by adding a small subset of individuals from a population isolate to a cosmopolitan panel is a cost-effective strategy for accurate imputation.
The ability to perform fine-mapping on rare population-specific variants will have broad public health implications, such as better understanding of genetic disease etiology and function and improved genetic literacy when focusing on these population isolates.
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Anderson, Kevin
Date: 29 April 2022
Defense Date: 22 April 2022
Approval Date: 10 May 2022
Submission Date: 29 April 2022
Access Restriction: 2 year -- Restrict access to University of Pittsburgh for a period of 2 years.
Number of Pages: 79
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Human Genetics
Degree: MS - Master of Science
Thesis Type: Master's Thesis
Refereed: Yes
Uncontrolled Keywords: word
Date Deposited: 10 May 2022 20:00
Last Modified: 30 Jun 2022 15:16
URI: http://d-scholarship.pitt.edu/id/eprint/42901