Semi-automated literature mining to identify putative biomarkers of disease from multiple biofluids

Jordan, R and Visweswaran, S and Gopalakrishnan, V (2014) Semi-automated literature mining to identify putative biomarkers of disease from multiple biofluids. Journal of Clinical Bioinformatics, 4 (1).

Abstract

Background: Computational methods for mining of biomedical literature can be useful in augmenting manual searches of the literature using keywords for disease-specific biomarker discovery from biofluids. In this work, we develop and apply a semi-automated literature mining method to mine abstracts obtained from PubMed to discover putative biomarkers of breast and lung cancers in specific biofluids.

Methodology: A positive set of abstracts was defined by the terms 'breast cancer' and 'lung cancer' in conjunction with 14 separate 'biofluids' (bile, blood, breastmilk, cerebrospinal fluid, mucus, plasma, saliva, semen, serum, synovial fluid, stool, sweat, tears, and urine), while a negative set of abstracts was defined by the terms '(biofluid) NOT breast cancer' or '(biofluid) NOT lung cancer.' More than 5.3 million total abstracts were obtained from PubMed and examined for biomarker-disease-biofluid associations (34,296 positive and 2,653,396 negative for breast cancer; 28,355 positive and 2,595,034 negative for lung cancer). Biological entities such as genes and proteins were tagged using ABNER, and processed using Python scripts to produce a list of putative biomarkers. Z-scores were calculated, ranked, and used to determine significance of putative biomarkers found. Manual verification of relevant abstracts was performed to assess our method's performance.

Results: Biofluid-specific markers were identified from the literature, assigned relevance scores based on frequency of occurrence, and validated using known biomarker lists and/or databases for lung and breast cancer [NCBI's On-line Mendelian Inheritance in Man (OMIM), Cancer Gene annotation server for cancer genomics (CAGE), NCBI's Genes & Disease, NCI's Early Detection Research Network (EDRN), and others]. The specificity of each marker for a given biofluid was calculated, and the performance of our semi-automated literature mining method assessed for breast and lung cancer.

Conclusions: We developed a semi-automated process for determining a list of putative biomarkers for breast and lung cancer. New knowledge is presented in the form of biomarker lists; ranked, newly discovered biomarker-disease-biofluid relationships; and biomarker specificity across biofluids.
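
The abstract does not give the exact query strings or the Z-score formula, so the following is only a minimal sketch of how the positive/negative search terms and a frequency-based ranking might look. It assumes a standard two-proportion Z-test in place of the authors' unstated scoring; the function names (`build_queries`, `two_proportion_z`, `rank_entities`) and the quoted-phrase query form are hypothetical and not taken from the paper.

```python
import math

# The 14 biofluids listed in the abstract.
BIOFLUIDS = [
    "bile", "blood", "breastmilk", "cerebrospinal fluid", "mucus",
    "plasma", "saliva", "semen", "serum", "synovial fluid",
    "stool", "sweat", "tears", "urine",
]


def build_queries(disease, biofluid):
    """Return (positive, negative) Boolean search strings.

    ASSUMPTION: the exact query syntax used by the authors is not stated;
    this follows the '(biofluid) NOT breast cancer' pattern in the abstract.
    """
    positive = f'"{disease}" AND "{biofluid}"'
    negative = f'"{biofluid}" NOT "{disease}"'
    return positive, negative


def two_proportion_z(pos_hits, pos_total, neg_hits, neg_total):
    """Two-proportion Z-score for how over-represented an entity is in the
    positive abstract set relative to the negative set.

    ASSUMPTION: a textbook pooled-variance test, not necessarily the
    statistic computed in the paper.
    """
    p_pos = pos_hits / pos_total
    p_neg = neg_hits / neg_total
    pooled = (pos_hits + neg_hits) / (pos_total + neg_total)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / pos_total + 1 / neg_total))
    if std_err == 0:
        return 0.0  # entity absent from (or present in) every abstract
    return (p_pos - p_neg) / std_err


def rank_entities(pos_counts, neg_counts, pos_total, neg_total):
    """Score every entity seen in the positive set; highest Z-score first."""
    scored = [
        (name, two_proportion_z(hits, pos_total, neg_counts.get(name, 0), neg_total))
        for name, hits in pos_counts.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Set sizes are the breast cancer totals quoted in the abstract; the entity
# names and per-entity counts below are invented purely for illustration.
POS_TOTAL, NEG_TOTAL = 34_296, 2_653_396
toy_pos = {"ENTITY_A": 500, "ENTITY_B": 10}
toy_neg = {"ENTITY_A": 1_000, "ENTITY_B": 5_000}
ranked = rank_entities(toy_pos, toy_neg, POS_TOTAL, NEG_TOTAL)
```

In this toy run an entity mentioned far more often in the positive set than in the negative set receives a large positive score and ranks first, while one that is comparatively rarer in the positive set receives a negative score. The entity tagging step (ABNER) is outside the scope of this sketch.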

Details

Item Type:

Article

Status:

Published

Creators/Authors:

Creators	Email	Pitt Username
Jordan, R	rmj12@pitt.edu	RMJ12
Visweswaran, S	shv3@pitt.edu	SHV3
Gopalakrishnan, V	vanathi@pitt.edu	VANATHI

Date:

1 January 2014

Date Type:

Publication

Journal or Publication Title:

Journal of Clinical Bioinformatics

Volume:

4

Number:

1

DOI or Unique Handle:

10.1186/2043-9113-4-13

Schools and Programs:

Dietrich School of Arts and Sciences > Intelligent Systems
School of Medicine > Biomedical Informatics
School of Medicine > Computational and Systems Biology

Refereed:

Yes

Date Deposited:

21 Dec 2016 20:45

Last Modified:

30 Mar 2021 10:55

URI:

http://d-scholarship.pitt.edu/id/eprint/29479
