ACM Home Page
Please provide us with feedback. Feedback
Top-k aggregation using intersections of ranked inputs
Full text PdfPdf (546 KB)
Source Web Search and Web Data Mining archive
Proceedings of the Second ACM International Conference on Web Search and Data Mining table of contents
Barcelona, Spain
SESSION: Web ranking table of contents
Pages 222-231  
Year of Publication: 2009
ISBN:978-1-60558-390-7
Authors
Ravi Kumar  Yahoo! Research, Sunnyvale, CA
Kunal Punera  Yahoo! Research, Sunnyvale, CA
Torsten Suel  Yahoo! Research, Sunnyvale, CA
Sergei Vassilvitskii  Yahoo! Research, Sunnyvale, CA
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
: Google
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
: Yahoo! Research
Microsoft : Microsoft
: Nokia
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 22,   Downloads (12 Months): 165,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1498759.1498830
What is a DOI?

ABSTRACT

There has been considerable past work on efficiently computing top k objects by aggregating information from multiple ranked lists of these objects. An important instance of this problem is query processing in search engines: One has to combine information from several different posting lists (rankings) of web pages (objects) to obtain the top k web pages to answer user queries. Two particularly well-studied approaches to achieve efficiency in top-k aggregation include early-termination algorithms (e.g., TA and NRA) and preaggregation of some of the input lists. However, there has been little work on a rigorous treatment of combining these approaches.

We generalize the TA and NRA algorithms to the case when preaggregated intersection lists are available in addition to the original lists. We show that our versions of TA and NRA continue to remain "instance optimal," a very strong optimality notion that is a highlight of the original TA and NRA algorithms. Using an index of millions of web pages and real-world search engine queries, we empirically characterize the performance gains offered by our new algorithms. We show that the practical benefits of intersection lists can be fully realized only with an early-termination algorithm.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
3
 
4
B. Bhattacharjee, S. Chawathe, V. Gopalakrishnan, P. Keleher, and B. Silaghi. Efficient peer-to-peer searches using result-caching. In Proc. 2nd IWP2PS, 2003.
5
 
6
 
7
8
 
9
 
10
 
11
D. Harman and G. Candela. Retrieving records from a gigabyte of text on a mini-computer using statistical ranking. JASIS, 41(8):581--589, 1990.
12
 
13
14
 
15
 
16
17
 
18
 
19
R. Schenkel, A. Broschart, S. Hwang, M. Theobald, and G. Weikum. Efficient text proximity search. In Proc. 14th SPIRE, pages 287--299, 2007.
 
20
A. Schrijver. Combinatorial Optimization. Springer, 2003.
21
 
22
23
 
24
I. H. Witten, A. Moffat, and T. C. Bell. Managing Gigabytes. Morgan Kaufmann, 1999.
 
25
 
26
 
27
J. Zhang and T. Suel. Optimized inverted list assignment in distributed search engine architectures. In Proc. 21st IPDPS, pages 1--10, 2007.

Collaborative Colleagues:
Ravi Kumar: colleagues
Kunal Punera: colleagues
Torsten Suel: colleagues
Sergei Vassilvitskii: colleagues