
Large-scale discovery and comparative genomics of Mycobacterium prophages with new computational tools

Gauthier, Christian (2023) Large-scale discovery and comparative genomics of Mycobacterium prophages with new computational tools. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



Hundreds of thousands of bacterial genomes have been sequenced, many of which likely contain integrated prophages derived from temperate bacteriophages. Prophages play key roles by influencing bacterial metabolism, pathogenicity, antibiotic resistance, and defense against phage attack. However, they are highly diverse, vary considerably even among related strains, and are difficult to computationally identify and extract precisely. Prophage diversity and variability also make them challenging to compare to already-characterized phage genomes, especially for large datasets. For my dissertation, I have developed a trio of computational tools aimed at addressing these needs. First, I developed a new pipeline that uses MMseqs2 to assemble phage genes into groups of global sequence homologs (phams) - useful for examining the relationships between highly mosaic phage genomes. The pipeline allows rapid clustering of large gene datasets on consumer hardware, producing phams that are highly sensitive yet specific enough that genes in the same pham all likely share related functions. Second, I developed a new tool for prophage identification and extraction, which uses genomic architectural features that efficiently discriminate between phage and bacterial genomic regions, and targeted homology searches using phage gene phams to precisely extract prophage regions. I demonstrate that this tool is very fast and accurate in Mycobacteria, and its speed allowed the prediction of prophages in over 30,000 Mycobacterium genomes; I also provide evidence that this method is likely to work well in at least some other bacterial genera. Third, I devised a new genome relatedness index using genes (phams) shared between pairs of phages and built a pipeline that uses this metric to rapidly cluster sets of phage genomes, with resulting clusters almost identical to those clusters that have been manually generated using primarily nucleotide sequence identity. 
This clustering pipeline allowed the integration of over a thousand novel prophages into the extant mycobacteriophage dataset and a comparison of these diverse groups of phages, revealing that the new prophages mostly form new cluster spaces, although some have interesting relationships with existing clusters. The tools I developed represent important steps that will allow researchers to better manage and make sense of the ever-increasing scale of the sequenced (pro)phage space.
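
The abstract does not define the genome relatedness index or the clustering procedure, so the following is only an illustrative sketch of the general idea of clustering genomes by shared phams. It assumes a Jaccard-style shared-pham fraction and simple single-linkage grouping; the function names, the 0.35 threshold, and the toy genomes are all hypothetical and are not taken from the dissertation.

```python
from itertools import combinations


def shared_pham_index(phams_a, phams_b):
    """Jaccard-style fraction of phams shared by two genomes.

    Assumption: this is a stand-in metric, not the index defined in the dissertation.
    """
    a, b = set(phams_a), set(phams_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def cluster_genomes(genome_phams, threshold=0.35):
    """Single-linkage clustering: two genomes end up in the same cluster when a
    chain of pairs scoring >= threshold connects them.

    The threshold value is a hypothetical placeholder.
    """
    parent = {name: name for name in genome_phams}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for g1, g2 in combinations(genome_phams, 2):
        if shared_pham_index(genome_phams[g1], genome_phams[g2]) >= threshold:
            parent[find(g1)] = find(g2)

    clusters = {}
    for name in genome_phams:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy input: genome name -> set of pham identifiers (made-up data).
toy_genomes = {
    "phiA": {1, 2, 3, 4},
    "phiB": {1, 2, 3, 5},
    "phiC": {7, 8, 9},
}
toy_clusters = cluster_genomes(toy_genomes)
```

In this toy input, phiA and phiB share three of five distinct phams and are grouped together, while phiC shares none and forms its own cluster.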




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creator: Gauthier, Christian
Creator Email: christian.gauthier@pitt.edu
Creator Pitt Username: chg60
Creator ORCID: 0000-0003-4407-0994
ETD Committee:
Committee Chair: Hatfull, Graham
Committee Member: Arndt, Karen
Committee Member: Peebles, Craig
Committee Member: VanDemark, Andrew
Committee Member: Hiller,
Date: 25 January 2023
Date Type: Publication
Defense Date: 18 October 2022
Approval Date: 25 January 2023
Submission Date: 2 December 2022
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 326
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Biological Sciences
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Mycobacterium; phage; prophage; temperate; remote homology; diversity
Date Deposited: 25 Jan 2023 15:37
Last Modified: 25 Jan 2023 15:37

