ACM Home Page
Please provide us with feedback. Feedback
Efficient and effective link analysis with precomputed salsa maps
Full text PdfPdf (493 KB)
Source
Conference on Information and Knowledge Management archive
Proceeding of the 17th ACM conference on Information and knowledge management table of contents
Napa Valley, California, USA
SESSION: IR: web search 1 table of contents
Pages 53-62  
Year of Publication: 2008
ISBN:978-1-59593-991-3
Authors
Marc Najork  Microsoft Research, Mountain View, CA, USA
Nick Craswell  Microsoft, Cambridge, United Kingdom
Sponsors
ACM: Association for Computing Machinery
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 10,   Downloads (12 Months): 151,   Citation Count: 3
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1458082.1458093
What is a DOI?

ABSTRACT

SALSA is a link-based ranking algorithm that takes the result set of a query as input, extends the set to include additional neighboring documents in the web graph, and performs a random walk on the induced subgraph. The stationary probability distribution of this random walk, used as a relevance score, is significantly more effective for ranking purposes than popular query-independent link-based ranking algorithms such as PageRank. Unfortunately, this requires significant effort at query-time, to access the link graph and compute the stationary probability distribution. In this paper, we explore whether it is possible to perform most of the computation off-line, prior to the arrival of any queries. The off-line phase of our approach computes a "score map" for each node in the web graph by performing a SALSA-like algorithm on the neighborhood of that node and retaining the scores of the most promising nodes in the neighborhood graph. The on-line phase takes the results to a query, retrieves the score map of each result, and returns for each result a score that is the sum of the matching scores from each score map. We evaluated this algorithm on a collection of about 28,000 queries with partially labeled results, and found that it is significantly more effective than PageRank, although not quite as effective as SALSA. We also studied the trade-off between ranking effectiveness and space requirements.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
 
3
S. Gollapudi, M. Najork and R. Panigrahy. Using Bloom filters to speed up HITS-like ranking algorithms. In Proc. of the 5th Workshop on Algorithms and Models for the Web Graph,pages 195--201, 2007.
4
 
5
 
6
7
 
8
G. Linden. Marissa Mayer at Web 2.0. Online at: http://glinden.blogspot.com/2006/11/marissa-mayer-at-web-20.html
 
9
 
10
F. McSherry and M. Najork. Computing information retrieval performance measures efficiently in the presence of tied scores. In 30th European Conference on Information Retrieval, pages 414--421, 2008.
11
12
 
13
L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: bringing order to the Web. Technical report, Stanford Digital Library Technologies Project, 1998.
 
14
H. Zaragoza, N. Craswell, M. Taylor, S. Saria, and S. Robertson. Microsoft Cambridge at TREC-13: Web and HARD tracks. In Proc. of the 13th Text Retrieval Conference, 2004.


Collaborative Colleagues:
Marc Najork: colleagues
Nick Craswell: colleagues