Harpale, A and Yang, Y and Gopal, S and He, D and Yue, Z
(2010)
CiteData: A new multi-faceted dataset for evaluating personalized search performance.
In: UNSPECIFIED.
Abstract
Personalized search systems have evolved to utilize heterogeneous features including document hyperlinks, category labels in various taxonomies and social tags, in addition to the free text of the documents. Consequently, classifiers, PageRank algorithms and Collaborative Filtering methods are often used as intermediate steps in such personalized retrieval systems. Thorough comparative evaluation of such complex systems has been difficult due to the lack of appropriate publicly available datasets that provide such diverse feature sets. To remedy the situation, we have created CiteData, a new dataset for benchmark evaluations of personalized search performance that will be made publicly accessible. CiteData is a collection of academic articles extracted from the CiteULike and CiteSeer repositories, with rich feature sets such as authors, author affiliations, topic labels, social tags and citation information. We further supplement it with personalized queries and relevance judgments obtained from volunteer users. This paper starts with a discussion of the design criteria and characteristics of the CiteData dataset in comparison with current benchmark datasets, followed by a set of task-oriented empirical evaluations of popular algorithms in statistical classification, collaborative filtering and link analysis as intermediate steps for personalized search. Our results show a significant performance improvement of personalized approaches over unpersonalized approaches. We also observe that a meta personalized search engine that leverages information from multiple feature sources performs better than algorithms that use only one of the constituent feature sources. © 2010 ACM.