Posada Aguilar, Jose David
(2018)
Semantics Enhanced Deep Learning Medical Text Classifier.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
This is the latest version of this item.
Abstract
Electronic health records (EHR) contain a vast amount of data with the potential to support applications that improve patient outcomes and enhance the work of health care providers.
A major portion of this data resides in unstructured text in the form of clinical narratives. To use clinical text effectively, natural language processing (NLP) tools have been developed and applied to numerous problems involving clinical decision support systems, cohort identification, and phenotyping, among others.
However, one of the main problems facing the development of NLP tools for the clinical domain is the lack of large annotated datasets. Variations in clinical language and report style are another major problem for clinical NLP. Because of these variations, NLP systems created with data from one institution exhibit significantly different performance when tested at a different institution.
One way to address the lack of large annotated datasets and variations in clinical language is the explicit incorporation of semantics into the development of clinical NLP tools. Semantics allow us to know the meaning of words, and thus help us account for language variations. In this work, we incorporate the semantics from ontologies into the loss function of a deep learning text classifier. Also, to specifically address the lack of large annotated datasets, we used a large unlabeled dataset, increasing the sample size as a result. To properly use such unlabeled data, we adapted a semi-supervised binary approach that uses the unlabeled dataset during training.
To the best of our knowledge we are the first to do so, and for that reason, this is one of the main theoretical contributions of this work. Also, by reducing the need for extensive annotations, we believe this work could enable researchers in clinical settings to embrace and leverage the full potential of clinical NLP tools given the reduced effort required to achieve the desired performance. Furthermore, all the methods in this work are designed as reproducible and extensible software tools that aid further biomedical informatics research in this area.
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Posada Aguilar, Jose David
Date: 28 August 2018
Date Type: Publication
Defense Date: 23 July 2018
Approval Date: 28 August 2018
Submission Date: 10 August 2018
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 92
Institution: University of Pittsburgh
Schools and Programs: School of Medicine > Biomedical Informatics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Deep Learning, Ontology, Natural Language Processing, Psychiatric Reports, Machine Learning, Psychiatry
Date Deposited: 28 Aug 2018 19:17
Last Modified: 28 Aug 2019 05:15
URI: http://d-scholarship.pitt.edu/id/eprint/35266