Gauthier, Christian (2023) Large-scale discovery and comparative genomics of Mycobacterium prophages with new computational tools. Doctoral Dissertation, University of Pittsburgh. (Unpublished)
Abstract
Hundreds of thousands of bacterial genomes have been sequenced, many of which likely contain integrated prophages derived from temperate bacteriophages. Prophages play key roles by influencing bacterial metabolism, pathogenicity, antibiotic resistance, and defense against phage attack. However, they are highly diverse, vary considerably even among related strains, and are difficult to computationally identify and extract precisely. Prophage diversity and variability also make them challenging to compare to already-characterized phage genomes, especially for large datasets. For my dissertation, I have developed a trio of computational tools aimed at addressing these needs. First, I developed a new pipeline that uses MMseqs2 to assemble phage genes into groups of global sequence homologs (phams), which are useful for examining the relationships between highly mosaic phage genomes. The pipeline allows rapid clustering of large gene datasets on consumer hardware, producing phams that are highly sensitive yet specific enough that genes in the same pham all likely share related functions. Second, I developed a new tool for prophage identification and extraction, which uses genomic architectural features that efficiently discriminate between phage and bacterial genomic regions, and targeted homology searches using phage gene phams to precisely extract prophage regions. I demonstrate that this tool is very fast and accurate in mycobacteria, and its speed allowed the prediction of prophages in over 30,000 Mycobacterium genomes; I also provide evidence that this method is likely to work well in at least some other bacterial genera. Third, I devised a new genome relatedness index using genes (phams) shared between pairs of phages and built a pipeline that uses this metric to rapidly cluster sets of phage genomes, with resulting clusters almost identical to those generated manually using primarily nucleotide sequence identity.
This allowed integration of over a thousand novel prophages into the extant mycobacteriophage dataset and comparison of these diverse groups of phages, revealing that the new prophages mostly form new cluster spaces, but some have interesting relationships with existing clusters. The tools I developed represent important steps that will allow researchers to better manage and make sense of the ever-increasing scale of the sequenced (pro)phage space.
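As a rough illustration of the third tool's idea (a relatedness index built from phams shared between pairs of genomes, followed by clustering), the sketch below may help. It is not the dissertation's method: the abstract does not give the formula, so the index here (the mean of the shared-pham fraction in each genome), the single-linkage clustering step, the 0.35 threshold, and all genome names and pham IDs are assumptions chosen for illustration.

```python
from itertools import combinations


def shared_pham_similarity(a, b):
    """Mean of the fractions of each genome's phams that also occur in the other.

    Assumed index for illustration; the dissertation's actual metric may differ.
    """
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    shared = len(a & b)
    return (shared / len(a) + shared / len(b)) / 2


def cluster_genomes(genome_phams, threshold=0.35):
    """Single-linkage clustering: link any pair whose similarity >= threshold.

    The threshold is a hypothetical value, not one taken from the source.
    """
    parent = {name: name for name in genome_phams}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for g1, g2 in combinations(genome_phams, 2):
        if shared_pham_similarity(genome_phams[g1], genome_phams[g2]) >= threshold:
            parent[find(g1)] = find(g2)

    clusters = {}
    for name in genome_phams:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy data: each genome is represented by the set of pham IDs its genes fall into.
genomes = {
    "phageA": {1, 2, 3, 4},
    "phageB": {1, 2, 3, 5},
    "prophageC": {6, 7, 8},
    "prophageD": {6, 7, 9, 10},
}
clusters = cluster_genomes(genomes)
print(clusters)
```

In this toy input the two phages share three of four phams and the two prophages share two, so each pair clears the assumed threshold while no phage links to a prophage. Comparing sets of pham IDs rather than nucleotide sequences is what keeps a pairwise comparison of this kind cheap.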
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Date: 25 January 2023
Date Type: Publication
Defense Date: 18 October 2022
Approval Date: 25 January 2023
Submission Date: 2 December 2022
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 326
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Biological Sciences
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Mycobacterium; phage; prophage; temperate; remote homology; diversity
Date Deposited: 25 Jan 2023 15:37
Last Modified: 25 Jan 2023 15:37
URI: http://d-scholarship.pitt.edu/id/eprint/43762