
Investigation of a Data and Knowledge Driven System for Sequential Diagnosis

XUE, DIYANG (2023) Investigation of a Data and Knowledge Driven System for Sequential Diagnosis. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



Medical diagnosis is the process of determining the nature of a disease and distinguishing it from other similar diseases. A diagnostic error happens when a diagnosis is missed, inappropriately delayed, or inaccurate. Diagnostic error accounts for the most severe patient harm, the largest fraction of claims, and the highest total penalty payouts. One way to reduce diagnostic error is to use a computer-aided diagnostic (CAD) system to augment doctors’ diagnostic abilities. A growing number of machine learning algorithms have been applied to medical diagnosis and have achieved good performance. However, because most of these models are very complicated and their diagnostic process differs from physicians' workflow, physicians usually do not trust them.

My dissertation investigates how to combine electronic health record (EHR) data with medical knowledge to generate a sequential diagnostic system with clinical alignment, that is, one whose diagnostic process is in line with physicians' diagnostic process. The new system has two main characteristics: (1) it is data-driven, so that EHR data and machine learning algorithms can be used to develop a multi-label classification system; and (2) it is clinical knowledge-driven, so that valuable clinical diagnostic knowledge can be integrated into the system.

I have developed (1) a framework that can integrate pre-defined medical knowledge with disease patterns in EHR data for sequential diagnosis and (2) an algorithm that generates medical diagnostic trees that recommend diagnostic actions by considering clinical workflow, diagnostic accuracy, and misdiagnosis costs. Experiments show that the learned model has better clinical alignment, higher diagnostic accuracy, and lower misdiagnosis costs than baseline models developed using a traditional multi-label classification tree algorithm (ML-C4.5) and a deep reinforcement learning algorithm (deep Q-learning).




Item Type: University of Pittsburgh ETD
Status: Unpublished
ETD Committee:
Committee Chair: He, Daqing (email: dah44@pitt.edu; Pitt username: dah44)
Committee Member: Cooper, Gregory F. (email: gfc@pitt.edu; Pitt username: gfc)
Committee Member: Wagner, Michael M. (email: mmw1@pitt.edu; Pitt username: mmw1)
Committee Member: Frisch, Adam
Date: 10 January 2023
Date Type: Publication
Defense Date: 14 November 2022
Approval Date: 10 January 2023
Submission Date: 30 November 2022
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 181
Institution: University of Pittsburgh
Schools and Programs: School of Computing and Information > Intelligent Systems Program
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: sequential diagnosis, classification tree, reinforcement learning, domain knowledge, electronic health records
Date Deposited: 10 Jan 2023 16:16
Last Modified: 10 Jan 2023 16:16

