Query by document
Source: Proceedings of the Second ACM International Conference on Web Search and Data Mining
Barcelona, Spain
Session: Web search
Pages 34-43
Year of Publication: 2009
ISBN: 978-1-60558-390-7
Authors
Yin Yang, Computer Science, HKUST
Nilesh Bansal, University of Toronto
Wisam Dakka, Search Quality, Google Inc.
Panagiotis Ipeirotis, New York University
Nick Koudas, University of Toronto
Dimitris Papadias, Computer Science, HKUST
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
Google
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Yahoo! Research
Microsoft
Nokia
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1498759.1498806

ABSTRACT

We are experiencing an unprecedented increase in content contributed by users in forums such as blogs, social networking sites, and microblogging services. This content complements the content of web sites and traditional media outlets such as newspapers and news and financial streams. Given such a plethora of information, there is a pressing need to cross-reference information across textual services. For example, we often read a news item and wonder whether any blogs report related content, or vice versa.

In this paper, we present techniques to automate the process of cross-referencing online information content. We introduce methods that extract phrases from a given "query document" and submit them as queries to search interfaces, with the goal of retrieving content related to the query document. In particular, we consider two techniques for extracting and scoring key phrases. We also consider techniques to complement the extracted phrases with information from external sources such as Wikipedia, and we introduce an algorithm called RelevanceRank for this purpose.
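The abstract does not say how the two phrase-extraction and scoring techniques work. Purely as a point of reference, the following is a minimal sketch of one generic baseline, not necessarily either of the authors' techniques. It treats word n-grams that neither begin nor end with a stopword as candidate phrases, scores each candidate by TF-IDF against a background corpus, and returns the top-k phrases to be issued as search queries. The stopword list, the smoothed IDF formula, and the names `candidate_phrases` and `extract_query_phrases` are illustrative assumptions. RelevanceRank and the Wikipedia enrichment step are not modeled.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (an assumption; a real system would use a fuller list).
STOPWORDS = {"a", "an", "and", "are", "as", "at", "by", "for", "from", "in",
             "is", "it", "of", "on", "or", "that", "the", "to", "was", "with"}


def candidate_phrases(text, max_len=3):
    """Yield word n-grams (1..max_len) that neither start nor end with a stopword."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram)


def document_frequencies(corpus, max_len=3):
    """Count, for each phrase, how many background documents contain it."""
    df = Counter()
    for doc in corpus:
        df.update(set(candidate_phrases(doc, max_len)))
    return df


def extract_query_phrases(query_doc, corpus, k=5, max_len=3):
    """Hypothetical baseline: rank candidate phrases by TF-IDF and return the top k."""
    tf = Counter(candidate_phrases(query_doc, max_len))
    df = document_frequencies(corpus, max_len)
    n_docs = len(corpus)
    scores = {
        # Smoothed IDF: phrases that are rare in the background corpus score higher.
        phrase: count * math.log((1 + n_docs) / (1 + df[phrase]))
        for phrase, count in tf.items()
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [phrase for phrase, _ in ranked[:k]]


if __name__ == "__main__":
    background = ["Weather report for today.", "Weather report for tomorrow."]
    query_doc = "Solar eclipse, solar eclipse, solar eclipse. Weather report."
    print(extract_query_phrases(query_doc, background, k=3, max_len=2))
    # prints ['eclipse', 'solar', 'solar eclipse']
```

In this toy run, "weather report" appears in every background document, so its IDF is zero and it never becomes a query, while the repeated and otherwise unseen "solar eclipse" ranks near the top. A real system would use a large background corpus so that IDF reflects how common a phrase is on the web.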

We discuss both techniques in detail and present an experimental study that uses a large number of human judges from Amazon's Mechanical Turk service. Detailed experiments demonstrate the effectiveness and efficiency of the proposed techniques for automatically retrieving documents related to a query document.



