Web object retrieval
Source
International World Wide Web Conference
Proceedings of the 16th International Conference on World Wide Web
Banff, Alberta, Canada
Session: Data mining
Pages: 81-90
Year of Publication: 2007
ISBN: 978-1-59593-654-7
Authors
Zaiqing Nie, Microsoft Research Asia, Beijing, China
Yunxiao Ma, Microsoft Research Asia, Beijing, China
Shuming Shi, Microsoft Research Asia, Beijing, China
Ji-Rong Wen, Microsoft Research Asia, Beijing, China
Wei-Ying Ma, Microsoft Research Asia, Beijing, China
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 27, Downloads (12 Months): 202, Citation Count: 11
DOI: http://doi.acm.org/10.1145/1242572.1242584

ABSTRACT

The primary function of current Web search engines is essentially relevance ranking at the document level. However, a great deal of structured information about real-world objects is embedded in static Web pages and online Web databases, and document-level information retrieval can lead to highly inaccurate relevance ranking when answering object-oriented queries. In this paper, we propose a paradigm shift to enable searching at the object level. In traditional information retrieval models, documents are taken as the retrieval units and the content of a document is considered reliable. However, this reliability assumption no longer holds in the object retrieval context, where multiple copies of information about the same object typically exist. These copies may be inconsistent because of the varying quality of Web sites and the limited performance of current information extraction techniques. If we simply combine the noisy and inaccurate attribute information extracted from different sources, we may not achieve satisfactory retrieval performance. We therefore propose several language models for Web object retrieval: an unstructured object retrieval model, a structured object retrieval model, and a hybrid model with both structured and unstructured retrieval features. We test these models on a paper search engine and compare their performance. We conclude that the hybrid model is superior because it takes into account extraction errors at varying levels.
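The abstract names the three models without giving their formulas. As a rough illustration only, the following Python sketch assumes a standard query-likelihood setup. Each object is a list of records extracted from different sources, and each record is a dict of attribute fields. P(w|O) is a mixture of record models weighted by a per-source accuracy alpha_k. A hybrid record model then interpolates, with weight lam, between a field-weighted structured model and an unstructured whole-record model. The Dirichlet smoothing, the field weights, the accuracy values, lam, and every function name here are hypothetical choices made for this sketch. They are not the paper's definitions.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


class Background:
    """Collection model P(w|C) with add-one smoothing, so unseen words get nonzero mass."""

    def __init__(self, texts):
        self.counts = Counter(tok for text in texts for tok in tokenize(text))
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # one extra slot reserved for unseen words

    def prob(self, word):
        return (self.counts[word] + 1) / (self.total + self.vocab)


def dirichlet_prob(word, text, bg, mu=10.0):
    """P(w|D) = (tf(w,D) + mu * P(w|C)) / (|D| + mu); a small mu suits short toy fields."""
    toks = tokenize(text)
    return (toks.count(word) + mu * bg.prob(word)) / (len(toks) + mu)


def record_prob(word, record, field_weights, bg, lam):
    """Hypothetical hybrid record model:
    lam * sum_j gamma_j * P(w|field_j) + (1 - lam) * P(w|whole record).
    lam = 0 gives an unstructured model, lam = 1 a purely structured one.
    Assumes field_weights assigns a positive weight to at least one field."""
    norm = sum(field_weights.get(f, 0.0) for f in record)
    structured = sum(
        field_weights.get(f, 0.0) / norm * dirichlet_prob(word, text, bg)
        for f, text in record.items()
    )
    unstructured = dirichlet_prob(word, " ".join(record.values()), bg)
    return lam * structured + (1 - lam) * unstructured


def object_score(query, records, accuracies, field_weights, bg, lam=0.5):
    """log P(Q|O) with P(w|O) = sum_k alpha_k * P(w|R_k), where alpha_k are the
    per-source accuracies normalized to sum to 1 (hypothetical inputs)."""
    total = sum(accuracies)
    score = 0.0
    for word in tokenize(query):
        p = sum(
            a / total * record_prob(word, rec, field_weights, bg, lam)
            for a, rec in zip(accuracies, records)
        )
        score += math.log(p)
    return score


# Toy data: two noisy records of one paper object, and one unrelated object.
paper = [
    {"title": "web object retrieval", "venue": "www 2007"},
    {"title": "web object retrieval www", "venue": "banff"},  # sloppier extraction
]
other = [{"title": "arabic retrieval stemming", "venue": "trec 2002"}]
bg = Background([t for rec in paper + other for t in rec.values()])
weights = {"title": 0.7, "venue": 0.3}

for lam in (0.0, 0.5, 1.0):
    s_paper = object_score("object retrieval", paper, [0.9, 0.5], weights, bg, lam)
    s_other = object_score("object retrieval", other, [1.0], weights, bg, lam)
    print(f"lam={lam}: paper={s_paper:.3f} other={s_other:.3f}")
```

In this toy setup, an intermediate lam lets the whole-record term still match query words even when field segmentation is wrong (for example, "www" leaking into the second record's title). The field-level term rewards words found in the expected attribute. That trade-off is one plausible reading of why a hybrid could tolerate extraction errors, not a reproduction of the paper's model.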


