Yu, Ke (2024) TOWARDS DATA-EFFICIENT LEARNING FOR MEDICINE. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

This is the latest version of this item.

PDF (Final), 26MB
Restricted to University of Pittsburgh users only until 22 January 2025.


Deep learning has catalyzed significant advancements in the applications of artificial intelligence (AI) in medicine, extending across various modalities and data types, including small molecules, medical images, and electronic health records (EHRs). At the core of deep learning is representation learning, an automated process that uncovers patterns in data by mapping inputs to corresponding labels. However, the considerable cost associated with labeling medical data remains a major obstacle to the further development and implementation of deep learning algorithms for healthcare tasks.

In this thesis, we introduce three data-efficient learning algorithms, designed to capitalize on the abundance of existing medical data, which are predominantly unlabeled, semi-labeled or of multi-modal formats. Our first algorithm focuses on semi-supervised drug embedding and utilizes a medical knowledge base, specifically a drug taxonomy, as supervision to regularize the embedding space. This approach enables the localization of novel molecules within the context of drugs in the taxonomy, thereby facilitating inference of their pharmacological properties through retrieval of similar drugs from the embedding space. The second algorithm addresses self-supervised representation learning for three-dimensional (3D) medical images. By exploiting the recurrent and consistent anatomical structures found across different patient images, this method promotes the learning of anatomy-specific and disease-related features within the lung. Lastly, our third algorithm develops a weakly-supervised multi-modal representation learning framework for chest X-rays (CXR) and their corresponding radiology reports. By utilizing the rich contextual details embedded in reports, including the spatial and temporal relations between diseases and anatomical structures, the algorithm learns CXR representations that demonstrate effectiveness in disease detection, localization, and interval change classification. These proposed algorithms highlight the possibilities of effectively leveraging vast amounts of existing medical data, reducing the need for labor-intensive labeling and paving the way for scalable AI applications in healthcare.
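The retrieval step described for the first algorithm (placing a novel molecule in the embedding space and inferring its properties from nearby known drugs) can be illustrated with a minimal sketch. This is not the thesis implementation: the embeddings, drug names, class labels, cosine similarity measure, and majority-vote rule below are all assumptions chosen only to show the general idea of nearest-neighbor retrieval.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve_similar(query, drug_embeddings, k=3):
    """Return the k known drugs whose embeddings are closest to the query."""
    scored = [(cosine(query, vec), name) for name, vec in drug_embeddings.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(name, score) for score, name in scored[:k]]


def infer_class(query, drug_embeddings, drug_classes, k=3):
    """Guess a taxonomy class by majority vote over the k nearest drugs."""
    neighbors = retrieve_similar(query, drug_embeddings, k)
    votes = Counter(drug_classes[name] for name, _ in neighbors)
    return votes.most_common(1)[0][0]


# Toy data: hypothetical 3-dimensional embeddings and made-up class labels.
drug_embeddings = {
    "drug_a": [0.9, 0.1, 0.0],
    "drug_b": [0.8, 0.2, 0.1],
    "drug_c": [0.0, 0.9, 0.4],
    "drug_d": [0.1, 0.8, 0.5],
}
drug_classes = {
    "drug_a": "class_1",
    "drug_b": "class_1",
    "drug_c": "class_2",
    "drug_d": "class_2",
}

# Hypothetical embedding of a novel molecule.
novel_molecule = [0.85, 0.15, 0.05]

top_two = retrieve_similar(novel_molecule, drug_embeddings, k=2)
predicted = infer_class(novel_molecule, drug_embeddings, drug_classes, k=3)
print(top_two)
print(predicted)
```

In practice the embeddings would come from a trained model rather than hand-written vectors, and the choice of similarity measure and of k would depend on how the embedding space was regularized.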


Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Yu, Ke
ETD Committee:
Committee Chair: Batmanghelich, Kayhan; Email: batman@bu.edu; ORCID: 0000-0001-9893-9136
Committee CoChair: Visweswaran, Shyam; Email: shv3@pitt.edu; Pitt Username: shv3; ORCID: 0000-0002-2079-8684
Committee Member: Hauskrecht, Milos; Email: milos@pitt.edu; Pitt Username: milos; ORCID: 0000-0002-7818-0633
Committee Member: Koes, David; Email: dkoes@pitt.edu; Pitt Username: dkoes; ORCID: 0000-0002-6892-6614
Committee Member: Deible,
Date: 22 January 2024
Date Type: Publication
Defense Date: 7 August 2023
Approval Date: 22 January 2024
Submission Date: 28 November 2023
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 169
Institution: University of Pittsburgh
Schools and Programs: School of Computing and Information > Intelligent Systems Program
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Semi-supervised Learning, Self-supervised Learning, Weakly-supervised Learning, Positive Unlabeled Learning, Drug Embedding, Medical Imaging Analysis
Date Deposited: 22 Jan 2024 16:51
Last Modified: 22 Jan 2024 16:51

