Document selection methodologies for efficient and effective learning-to-rank
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Boston, MA, USA
SESSION: Learning to rank II
Pages 468-475
Year of Publication: 2009
ISBN: 978-1-60558-483-6
Authors
Javed A. Aslam, Northeastern University, Boston, MA, USA
Evangelos Kanoulas, Northeastern University, Boston, MA, USA
Virgil Pavlu, Northeastern University, Boston, MA, USA
Stefan Savev, Northeastern University, Boston, MA, USA
Emine Yilmaz, Microsoft Research, Cambridge, United Kingdom
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1571941.1572022

ABSTRACT

Learning-to-rank has attracted great attention in the IR community. Much thought and research has been devoted to query-document feature extraction and to the development of sophisticated learning-to-rank algorithms. However, relatively little research has been conducted on how documents are selected for learning-to-rank data sets, or on how these choices affect the efficiency and effectiveness of learning-to-rank algorithms.

In this paper, we employ a number of document selection methodologies widely used in the context of evaluation: depth-k pooling, sampling (infAP, statAP), active learning (MTC), and on-line heuristics (hedge). Some of these methodologies, such as sampling and active learning, have been shown to lead to efficient and effective evaluation. We investigate whether they can also enable efficient and effective learning-to-rank, and we compare them with the document selection methodology used to create the LETOR datasets.
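As an illustration only, depth-k pooling, the simplest of these methodologies, can be sketched as follows: for each query, the set of documents selected for judging is the union of the top k documents returned by each participating system. The function name depth_k_pool, the runs mapping, and the toy document IDs are hypothetical; this is not the authors' implementation, and the sampling, MTC, and hedge methods are not shown.

```python
from typing import Dict, List, Set


def depth_k_pool(runs: Dict[str, List[str]], k: int) -> Set[str]:
    """Return the depth-k pool for one query.

    `runs` maps a (hypothetical) system name to its ranked list of
    document IDs, best first. The pool is the union of each list's
    top-k documents; lists shorter than k contribute all their documents.
    """
    pool: Set[str] = set()
    for ranked_docs in runs.values():
        pool.update(ranked_docs[:k])
    return pool


if __name__ == "__main__":
    # Hypothetical ranked lists from three systems for a single query.
    runs = {
        "sysA": ["d1", "d2", "d3", "d4"],
        "sysB": ["d2", "d5", "d1", "d6"],
        "sysC": ["d7", "d1", "d8", "d2"],
    }
    print(sorted(depth_k_pool(runs, k=2)))  # ['d1', 'd2', 'd5', 'd7']
```

Increasing k grows the pool, and with it the judging cost, which is the efficiency trade-off the selection methodologies above try to manage.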

Further, because these methodologies differ in nature, they construct training data sets with different properties, such as the proportion of relevant documents in the data or the similarity among the selected documents. We study how such properties affect the efficiency, effectiveness, and robustness of learning-to-rank collections.
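The two data-set properties mentioned above can be measured in several ways. A minimal sketch follows, assuming graded relevance labels where any label above 0 counts as relevant, and documents represented as sparse term-weight vectors compared by cosine similarity. The function names and the choice of cosine similarity are assumptions for illustration, not necessarily the measures used in the paper.

```python
import math
from itertools import combinations
from typing import Dict, List

# Hypothetical sparse representation: term -> weight.
Vector = Dict[str, float]


def relevant_proportion(labels: List[int]) -> float:
    """Fraction of selected documents judged relevant (label > 0)."""
    if not labels:
        return 0.0
    return sum(1 for y in labels if y > 0) / len(labels)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_pairwise_similarity(docs: List[Vector]) -> float:
    """Average cosine similarity over all unordered pairs of documents."""
    pairs = list(combinations(docs, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    print(relevant_proportion([1, 0, 0, 1]))  # 0.5
    docs = [{"a": 1.0}, {"a": 1.0}, {"b": 1.0}]
    print(round(mean_pairwise_similarity(docs), 3))  # 0.333
```

A selection method that favors documents near the top of many rankings would, under this sketch, tend to raise both numbers; whether that helps or hurts a learner is the kind of question the study addresses.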

