
Ontology Enrichment from Free-text Clinical Documents: A Comparison of Alternative Approaches

Liu, Kaihong (2012) Ontology Enrichment from Free-text Clinical Documents: A Comparison of Alternative Approaches. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

While the biomedical informatics community widely acknowledges the utility of domain ontologies, many barriers to their effective use remain. One important requirement of domain ontologies is that they achieve a high degree of coverage of the domain's concepts and concept relationships. However, the development of these ontologies is typically a manual, time-consuming, and often error-prone process. Limited resources result in missing concepts and relationships, as well as difficulty in updating the ontology as domain knowledge changes. Methodologies developed in the fields of Natural Language Processing (NLP), Information Extraction (IE), Information Retrieval (IR), and Machine Learning (ML) provide techniques for automating the enrichment of ontologies from free-text documents. In this dissertation, I extended these methodologies to biomedical ontology development. First, I reviewed existing methodologies and systems developed in the fields of NLP, IR, and IE, and discussed how existing methods can benefit the development of biomedical ontologies. This review, which had not previously been conducted, was published in the Journal of Biomedical Informatics. Second, I compared the effectiveness of three methods representing two different approaches, the symbolic (the Hearst method) and the statistical (the Church and Lin methods), using clinical free-text documents. Third, I developed a methodological framework for Ontology Learning (OL) evaluation and comparison. This framework permits evaluation of both types of OL approach and of the three OL methods drawn from them. The significance of this work is as follows: 1) The results of the comparative study showed the potential of these methods for biomedical ontology enrichment. For the two targeted domains (NCIT and RadLex), the Hearst method yielded average new-concept acceptance rates of 21% and 11%, respectively. The Lin method produced a 74% acceptance rate for NCIT; the Church method, 53%. As a result of this study (published in the journal Methods of Information in Medicine), many suggested candidates have been incorporated into the NCIT; 2) The evaluation framework is flexible and general enough to analyze the performance of ontology enrichment methods in many domains, thus expediting the process of automation and minimizing the likelihood that key concepts and relationships will be missed as domain knowledge evolves.


Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Liu, Kaihong
ETD Committee:
Committee Chair: Crowley, Rebecca (Email: crowleyrs@upmc.edu; Pitt Username: REBECCAJ)
Committee Member: Chapman, Wendy W (Email: wec6@pitt.edu; Pitt Username: WEC6)
Committee Member: Hwa,
Committee Member: Hogan,
Date: 3 January 2012
Date Type: Publication
Defense Date: 21 July 2011
Approval Date: 3 January 2012
Submission Date: 19 December 2011
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 168
Institution: University of Pittsburgh
Schools and Programs: School of Medicine > Biomedical Informatics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Ontology Learning, Ontology Enrichment, Knowledge acquisition and extraction, Natural Language Processing, Information Extraction, Ontology Learning Evaluation
Date Deposited: 03 Jan 2012 15:32
Last Modified: 19 Dec 2016 14:38

