An efficient heuristic method for active feature acquisition and its application to protein-protein interaction prediction

Thahir, M and Sharma, T and Ganapathiraju, MK (2012) An efficient heuristic method for active feature acquisition and its application to protein-protein interaction prediction. BMC Proceedings, 6.

Abstract

Background: Machine learning approaches for classification either learn the pattern of each class in the feature space or learn a boundary that separates the feature space into classes. The features of the data instances are usually available, and only the class labels are unavailable. For example, when text documents are classified into topic categories, the words in the documents are the features and are readily available, whereas the topic is what is predicted. In some domains, however, obtaining features is resource-intensive, so not all features may be available. One example is protein-protein interaction (PPI) prediction, where not only the labels ('interacting' or 'non-interacting') but also some of the features are unavailable. It may be possible to obtain at least some of the missing features by carrying out a few experiments, as permitted by the available resources. If only a few experiments can be carried out to acquire missing features, which proteins should be studied, and which features of those proteins should be determined? From the perspective of machine learning for PPI prediction, the features worth acquiring are those that, when used to train the classifier, improve its accuracy the most. That is, the utility of a feature acquisition is measured by how much the acquired features contribute to improving the accuracy of the classifier. Active feature acquisition (AFA) is a strategy for preselecting such instance-feature combinations (i.e., protein and experiment combinations) for maximum utility. The goal of AFA is to create an optimal training set that would result in the best classifier, not to determine the best classification model itself.

Results: We present a heuristic method for active feature acquisition that calculates the utility of acquiring a missing feature. The heuristic takes into account the change in belief of the classification model induced by acquiring the feature under consideration. Compared with random selection of the proteins on which experiments are performed and of the type of experiment performed, the heuristic method reduces the number of experiments to as few as 40%. The most notable characteristic of this method is that it does not require retraining the classification model on every possible combination of instance, feature and feature-value tuples. For this reason, our method is far less computationally expensive than previous AFA strategies.

Conclusions: The results show that our heuristic method for AFA creates an optimal training set with far fewer acquired features than random acquisition. This shows the value of active feature acquisition in aiding protein-protein interaction prediction, where feature acquisition is costly. Compared with previous methods, the proposed method reduces computational cost while also achieving a better F-score. The proposed method is valuable because it points to a direction for AFA with far lower computational expense: for the first time, it removes the need to train a classifier for every combination of instance, feature and feature-value tuples, which would be impractical in several domains.
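
The abstract does not spell out the heuristic itself, so the following is only an illustrative sketch of the general idea of scoring a missing feature by the change in the model's belief without retraining. Everything specific here is an assumption rather than the authors' method: the fixed logistic model, the mean imputation of missing values, the use of observed values of a feature as candidate values, and all function names and toy data.

```python
import math

def predict_proba(weights, bias, x):
    """Belief of a fixed (already trained) logistic model that x is positive."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def acquisition_utilities(data, weights, bias):
    """Score every missing (instance, feature) cell by the expected absolute
    change in the model's belief if that cell were acquired.

    Assumptions (hypothetical, for illustration only): missing cells are None,
    each feature has at least one observed value, candidate values for a
    missing cell are the observed values of that feature in other instances,
    and the model is never retrained.
    """
    n_features = len(weights)
    pools = [[row[j] for row in data if row[j] is not None]
             for j in range(n_features)]
    means = [sum(pool) / len(pool) for pool in pools]
    utilities = {}
    for i, row in enumerate(data):
        # Current belief, with missing cells filled by the feature mean.
        base = [means[j] if v is None else v for j, v in enumerate(row)]
        p_now = predict_proba(weights, bias, base)
        for j, v in enumerate(row):
            if v is not None:
                continue
            shifts = []
            for candidate in pools[j]:
                trial = list(base)
                trial[j] = candidate
                shifts.append(abs(predict_proba(weights, bias, trial) - p_now))
            utilities[(i, j)] = sum(shifts) / len(shifts)
    return utilities

def select_top(utilities, budget):
    """Pick the `budget` instance-feature pairs with the highest utility."""
    return sorted(utilities, key=utilities.get, reverse=True)[:budget]

# Toy data: rows are instances (e.g. protein pairs), None marks a missing feature.
data = [
    [1.0, None, 0.5],
    [0.2, 0.9, None],
    [None, 0.1, 0.4],
    [0.8, 0.7, 0.6],
]
weights, bias = [2.0, -1.0, 0.0], 0.0  # hypothetical trained parameters
utilities = acquisition_utilities(data, weights, bias)
chosen = select_top(utilities, budget=2)
```

In this toy setup the third feature has zero weight, so acquiring it cannot change the model's belief and it is ranked last. The cost of scoring grows with the number of missing cells times the number of candidate values, and no classifier is trained inside the loop, which is the property the abstract emphasizes.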


Details

Item Type: Article
Status: Published
Creators/Authors:
Thahir, M
Sharma, T
Ganapathiraju, MK (Email: madhavi@pitt.edu; Pitt Username: MADHAVI)
Date: 13 November 2012
Date Type: Publication
Journal or Publication Title: BMC Proceedings
Volume: 6
DOI or Unique Handle: 10.1186/1753-6561-6-s7-s2
Schools and Programs: Dietrich School of Arts and Sciences > Intelligent Systems
School of Medicine > Biomedical Informatics
Refereed: Yes
Date Deposited: 30 Nov 2016 17:29
Last Modified: 30 Mar 2021 11:55
URI: http://d-scholarship.pitt.edu/id/eprint/29801
