Improving pseudo-relevance feedback in web information retrieval using web page segmentation
Source: International World Wide Web Conference
Proceedings of the 12th International Conference on World Wide Web
Budapest, Hungary
Session: Information Retrieval
Pages: 11-18
Year of Publication: 2003
ISBN: 1-58113-680-3
Authors
Shipeng Yu  Peking University, Beijing, China
Deng Cai  Tsinghua University, Beijing, China
Ji-Rong Wen  Microsoft Research Asia, Beijing, China
Wei-Ying Ma  Microsoft Research Asia, Beijing, China
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 31, Downloads (12 Months): 181, Citation Count: 42
DOI: http://doi.acm.org/10.1145/775152.775155

ABSTRACT

In contrast to traditional document retrieval, a web page as a whole is not a good information unit to search, because it often contains multiple topics and much irrelevant information from the navigation, decoration, and interaction parts of the page. In this paper, we propose a VIsion-based Page Segmentation (VIPS) algorithm to detect the semantic content structure of a web page. Compared with simple DOM-based segmentation methods, our page segmentation scheme uses visual cues to obtain a better partition of a page at the semantic level. By using the VIPS algorithm to assist the selection of query expansion terms in pseudo-relevance feedback for web information retrieval, we achieve a 27% performance improvement on the Web Track dataset.
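The core idea in the abstract, drawing query expansion terms from the relevant blocks of top-ranked pages rather than from whole pages, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the VIPS algorithm or the authors' actual term-selection method: pages are assumed to be already segmented into text blocks (segmentation itself is not implemented), blocks are scored by simple query-term overlap, and expansion terms are chosen by raw frequency. The function names, the stopword list, and the parameters `top_blocks` and `num_terms` are hypothetical.

```python
from collections import Counter
import re

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "on", "for", "is", "with"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_block(block, query_terms):
    """Hypothetical block score: number of query-term occurrences in the block."""
    return sum(1 for tok in tokenize(block) if tok in query_terms)


def expansion_terms(query, pages, top_blocks=3, num_terms=5):
    """Pick expansion terms from the best-matching blocks of top-ranked pages.

    `pages` is a list of pages, each given as a list of block texts that a
    page segmenter (VIPS in the paper; not implemented here) has produced.
    """
    query_terms = set(tokenize(query))
    blocks = [block for page in pages for block in page]
    # Rank all blocks by query overlap; sorted() is stable, so ties keep input order.
    ranked = sorted(blocks, key=lambda b: score_block(b, query_terms), reverse=True)
    # Keep only top blocks that share at least one term with the query,
    # so navigation or decoration blocks with no overlap are discarded.
    selected = [b for b in ranked[:top_blocks] if score_block(b, query_terms) > 0]
    counts = Counter(
        tok
        for block in selected
        for tok in tokenize(block)
        if tok not in query_terms and tok not in STOPWORDS
    )
    # most_common() breaks count ties by first-encountered order.
    return [term for term, _ in counts.most_common(num_terms)]


if __name__ == "__main__":
    pages = [
        ["Home | About | Contact", "jaguar speed cat habitat rainforest jaguar"],
        ["Login Register Cart", "jaguar cat prey habitat"],
    ]
    print(expansion_terms("jaguar cat", pages, top_blocks=2, num_terms=3))
    # prints ['habitat', 'speed', 'rainforest']
```

In this toy example the navigation blocks ("Home | About | Contact", "Login Register Cart") share no terms with the query, so they never contribute expansion terms; whole-page feedback would instead mix such words into the term counts. A real system would replace the overlap score and raw frequency with a proper retrieval model and term-weighting scheme.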


