Study of integrated heterogeneous data reveals prognostic power of gene expression for breast cancer survival

Neapolitan, RE and Jiang, X (2015) Study of integrated heterogeneous data reveals prognostic power of gene expression for breast cancer survival. PLoS ONE, 10 (2).

Full text: PDF, Published Version (4MB). Available under license: see the attached license file.
License: Plain Text (1kB).

Abstract

Background: Studies show that thousands of genes are associated with breast cancer prognosis. To make use of the available genetic data, efforts have been made to predict outcomes from gene expression data, and a number of commercial products have been developed. These products have the following shortcomings: 1) They use the Cox model for prediction, although the random survival forest (RSF) model has been shown to significantly outperform the Cox model. 2) No testing was done to see whether a complete set of clinical predictors could predict as well as the gene expression signatures.

Methodology/Findings: We address these shortcomings. The METABRIC data set concerns 1,981 breast cancer tumors. Its features include 21 clinical features, expression levels for 16,384 genes, and survival. We compare the survival prediction performance of the Cox model and the RSF model using both the clinical data and the gene expression data to their performance using only the clinical data. We obtain significantly better results when we use both clinical data and gene expression data for 5-year, 10-year, and 15-year survival prediction. When we replace the gene expression data with PAM50 subtype, our results are significant only for 5-year and 15-year prediction. We obtain significantly better results using the RSF model than using the Cox model. Finally, our results indicate that gene expression data alone may predict long-term survival.

Conclusions/Significance: Our results indicate that survival prediction using clinical data together with gene expression data improves on prediction using clinical data alone. We further conclude that the RSF model yields better survival prediction than the Cox model. These results are significant because, by incorporating more gene expression data with clinical features and using the RSF model, we could develop decision support systems that better utilize heterogeneous information to improve outcome prediction and decision making.


Details

Item Type: Article
Status: Published
Creators/Authors: Neapolitan, RE; Jiang, X (Email: xij6@pitt.edu, Pitt Username: XIJ6)
Contributors: Toland, Amanda Ewart (Editor)
Date: 27 February 2015
Date Type: Publication
Journal or Publication Title: PLoS ONE
Volume: 10
Number: 2
DOI or Unique Handle: 10.1371/journal.pone.0117658
Schools and Programs: School of Medicine > Biomedical Informatics
Refereed: Yes
Other ID: NLM PMC4344205
PubMed Central ID: PMC4344205
PubMed ID: 25723490
Date Deposited: 12 May 2015 18:11
Last Modified: 30 Mar 2021 10:55
URI: http://d-scholarship.pitt.edu/id/eprint/24098
