Document selection methodologies for efficient and effective learning-to-rank
Source
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval (series: Annual ACM Conference on Research and Development in Information Retrieval)
Boston, MA, USA
Session: Learning to rank II
Pages 468-475
Year of Publication: 2009
ISBN: 978-1-60558-483-6
Authors
Javed A. Aslam, Northeastern University, Boston, MA, USA
Evangelos Kanoulas, Northeastern University, Boston, MA, USA
Virgil Pavlu, Northeastern University, Boston, MA, USA
Stefan Savev, Northeastern University, Boston, MA, USA
Emine Yilmaz, Microsoft Research, Cambridge, United Kingdom
Bibliometrics
Downloads (6 Weeks): 55, Downloads (12 Months): 192, Citation Count: 0
ABSTRACT
Learning-to-rank has attracted great attention in the IR community. Much thought and research have gone into query-document feature extraction and the development of sophisticated learning-to-rank algorithms. However, relatively little research has been conducted on selecting documents for learning-to-rank data sets, or on the effect of these choices on the efficiency and effectiveness of learning-to-rank algorithms. In this paper, we employ a number of document selection methodologies widely used in the context of evaluation: depth-k pooling, sampling (infAP, statAP), active learning (MTC), and on-line heuristics (hedge). Certain methodologies, e.g., sampling and active learning, have been shown to lead to efficient and effective evaluation. We investigate whether they can also enable efficient and effective learning-to-rank. We compare them with the document selection methodology used to create the LETOR datasets. Further, the methodologies we use differ in nature, and thus they construct training data sets with different properties, such as the proportion of relevant documents in the data or the similarity among them. We study how such properties affect the efficiency, effectiveness, and robustness of learning-to-rank collections.
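Depth-k pooling, the simplest of the selection methodologies listed above, can be illustrated with a minimal sketch. It assumes the usual TREC-style definition: for each query, the documents selected for judging are the union of the top k documents returned by each participating system. The inputs are hypothetical: `runs` maps a system name to its ranked list of document IDs for one query, and `qrels` maps document IDs to relevance grades. The helper names `depth_k_pool` and `relevant_fraction` are illustrative and are not taken from the paper. The second helper computes one of the training-set properties the abstract mentions, the proportion of relevant documents among those selected.

```python
from typing import Dict, List, Set


def depth_k_pool(runs: Dict[str, List[str]], k: int) -> Set[str]:
    """Hypothetical helper: union of the top-k documents from each system's ranking for one query."""
    pool: Set[str] = set()
    for ranking in runs.values():
        # Slicing past the end of a shorter list is safe in Python.
        pool.update(ranking[:k])
    return pool


def relevant_fraction(pool: Set[str], qrels: Dict[str, int]) -> float:
    """Hypothetical helper: share of pooled documents with grade > 0.

    Documents missing from qrels are treated as non-relevant (an assumption).
    """
    if not pool:
        return 0.0
    relevant = sum(1 for doc in pool if qrels.get(doc, 0) > 0)
    return relevant / len(pool)


if __name__ == "__main__":
    # Toy rankings from two systems for a single query (made-up data).
    runs = {
        "sysA": ["d1", "d2", "d3", "d4"],
        "sysB": ["d2", "d5", "d1", "d6"],
    }
    qrels = {"d1": 1, "d2": 0, "d5": 2}
    pool = depth_k_pool(runs, k=2)
    print(sorted(pool))  # ['d1', 'd2', 'd5']
    print(relevant_fraction(pool, qrels))
```

Raising k enlarges the pool and typically lowers the proportion of relevant documents, since systems tend to rank relevant documents near the top. This is one reason different selection methods can yield training sets with quite different properties.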