
Semantics Enhanced Deep Learning Medical Text Classifier

Posada Aguilar, Jose David (2018) Semantics Enhanced Deep Learning Medical Text Classifier. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

This is the latest version of this item.



Electronic health records (EHRs) contain a vast amount of data with the potential to support applications that improve patient outcomes and enhance the work of health care providers.
A major portion of this data is unstructured text in the form of clinical narratives. To use clinical text effectively, NLP tools have been developed and applied to numerous problems involving clinical decision support systems, cohort identification, and phenotyping, among others.
However, one of the main problems facing the development of NLP tools for the clinical domain is the lack of large annotated datasets. Variations in clinical language and report style are another major problem for clinical NLP. Because of these variations, NLP systems created with data from one institution exhibit significantly different performance when tested at a different institution.
One way to address the lack of large annotated datasets and variations in clinical language is the explicit incorporation of semantics into the development of clinical NLP tools. Semantics allow us to know the meaning of words, and thus help us account for language variations. In this work, we incorporate the semantics from ontologies into the loss function of a deep learning text classifier. Also, to specifically address the lack of large annotated datasets, we used a large unlabeled dataset, increasing the sample size as a result. To properly use such unlabeled data, we adapted a semi-supervised binary approach that uses the unlabeled dataset during training.
To the best of our knowledge we are the first to do so, and for that reason, this is one of the main theoretical contributions of this work. Also, by reducing the need for extensive annotations, we believe this work could enable researchers in clinical settings to embrace and leverage the full potential of clinical NLP tools given the reduced effort required to achieve the desired performance. Furthermore, all the methods in this work are designed as reproducible and extensible software tools that aid further biomedical informatics research in this area.




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creator: Posada Aguilar, Jose David
Email: joseposada@pitt.edu
Pitt Username: jdp76
ORCID: 0000-0003-3864-0241
ETD Committee:
Committee Chair: Tsui, Fuchiang (TSUIF@EMAIL.CHOP.EDU)
Thesis Advisor: Tsui, Fuchiang (TSUIF@EMAIL.CHOP.EDU)
Committee Member: Harkema,
Committee Member: Hochheiser,
Committee Member: Chennubhotla,
Date: 28 August 2018
Date Type: Publication
Defense Date: 23 July 2018
Approval Date: 28 August 2018
Submission Date: 10 August 2018
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 92
Institution: University of Pittsburgh
Schools and Programs: School of Medicine > Biomedical Informatics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Deep Learning, Ontology, Natural Language Processing, Psychiatric Reports, Machine Learning, Psychiatry
Date Deposited: 28 Aug 2018 19:17
Last Modified: 28 Aug 2019 05:15
