Semi-automated literature mining to identify putative biomarkers of disease from multiple biofluids

Jordan, R and Visweswaran, S and Gopalakrishnan, V (2014) Semi-automated literature mining to identify putative biomarkers of disease from multiple biofluids. Journal of Clinical Bioinformatics, 4 (1).

Background: Computational methods for mining of biomedical literature can be useful in augmenting manual searches of the literature using keywords for disease-specific biomarker discovery from biofluids. In this work, we develop and apply a semi-automated literature mining method to mine abstracts obtained from PubMed to discover putative biomarkers of breast and lung cancers in specific biofluids.
Methodology: A positive set of abstracts was defined by the terms 'breast cancer' and 'lung cancer' in conjunction with 14 separate 'biofluids' (bile, blood, breastmilk, cerebrospinal fluid, mucus, plasma, saliva, semen, serum, synovial fluid, stool, sweat, tears, and urine), while a negative set of abstracts was defined by the terms '(biofluid) NOT breast cancer' or '(biofluid) NOT lung cancer.' More than 5.3 million total abstracts were obtained from PubMed and examined for biomarker-disease-biofluid associations (34,296 positive and 2,653,396 negative for breast cancer; 28,355 positive and 2,595,034 negative for lung cancer). Biological entities such as genes and proteins were tagged using ABNER, and processed using Python scripts to produce a list of putative biomarkers. Z-scores were calculated, ranked, and used to determine significance of putative biomarkers found. Manual verification of relevant abstracts was performed to assess our method's performance.
Results: Biofluid-specific markers were identified from the literature, assigned relevance scores based on frequency of occurrence, and validated using known biomarker lists and/or databases for lung and breast cancer [NCBI's On-line Mendelian Inheritance in Man (OMIM), Cancer Gene annotation server for cancer genomics (CAGE), NCBI's Genes & Disease, NCI's Early Detection Research Network (EDRN), and others]. The specificity of each marker for a given biofluid was calculated, and the performance of our semi-automated literature mining method assessed for breast and lung cancer.
Conclusions: We developed a semi-automated process for determining a list of putative biomarkers for breast and lung cancer. New knowledge is presented in the form of biomarker lists; ranked, newly discovered biomarker-disease-biofluid relationships; and biomarker specificity across biofluids.
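
The abstract does not give the exact query syntax or the z-score formula, so the following is only a minimal sketch of how the positive/negative query pairs and the ranking step might look. It assumes a two-proportion z-test on how often a tagged entity appears in the positive versus the negative abstract set. The function names, the quoting and Boolean style of the queries, and the choice of test are all illustrative assumptions, not the authors' implementation.

```python
import math

# The 14 biofluids listed in the abstract.
BIOFLUIDS = [
    "bile", "blood", "breastmilk", "cerebrospinal fluid", "mucus", "plasma",
    "saliva", "semen", "serum", "synovial fluid", "stool", "sweat", "tears",
    "urine",
]


def build_queries(disease):
    """Return {biofluid: (positive_query, negative_query)} for one disease.

    The quoting and AND/NOT syntax are assumptions; the abstract only says the
    positive set combines the disease term with a biofluid and the negative set
    is '(biofluid) NOT <disease>'.
    """
    return {
        fluid: (f'"{disease}" AND "{fluid}"', f'"{fluid}" NOT "{disease}"')
        for fluid in BIOFLUIDS
    }


def two_proportion_z(pos_hits, pos_total, neg_hits, neg_total):
    """Z-score comparing an entity's frequency in positive vs. negative sets.

    Assumed statistic: pooled two-proportion z-test. Returns 0.0 when the
    standard error is zero (entity absent from, or present in, every abstract).
    """
    p_pos = pos_hits / pos_total
    p_neg = neg_hits / neg_total
    pooled = (pos_hits + neg_hits) / (pos_total + neg_total)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / pos_total + 1 / neg_total))
    return (p_pos - p_neg) / std_err if std_err > 0 else 0.0


def rank_entities(pos_counts, neg_counts, pos_total, neg_total):
    """Rank tagged entities by z-score, highest (most disease-enriched) first.

    pos_counts / neg_counts map an entity name to the number of abstracts in
    which a tagger (ABNER in the paper) found it.
    """
    scored = [
        (name, two_proportion_z(hits, pos_total, neg_counts.get(name, 0), neg_total))
        for name, hits in pos_counts.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Under these assumptions, `build_queries("breast cancer")` yields one query pair per biofluid, and `rank_entities` can be run once per disease and biofluid on the entity counts taken from the tagger output.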


Item Type: Article
Status: Published
Creators (Name; Email; Pitt Username; ORCID):
Jordan, R; rmj12@pitt.edu; RMJ12
Visweswaran, S; shv3@pitt.edu; SHV3
Gopalakrishnan, V; vanathi@pitt.edu; VANATHI
Date: 1 January 2014
Date Type: Publication
Journal or Publication Title: Journal of Clinical Bioinformatics
Volume: 4
Number: 1
DOI or Unique Handle: 10.1186/2043-9113-4-13
Schools and Programs: Dietrich School of Arts and Sciences > Intelligent Systems
School of Medicine > Biomedical Informatics
School of Medicine > Computational and Systems Biology
Refereed: Yes
Date Deposited: 21 Dec 2016 20:45
Last Modified: 30 Mar 2021 10:55