ABSTRACT
In contrast to traditional document retrieval, a web page as a whole is not a good information unit to search, because it often contains multiple topics and a lot of irrelevant information from the navigation, decoration, and interaction parts of the page. In this paper, we propose a VIsion-based Page Segmentation (VIPS) algorithm to detect the semantic content structure of a web page. Compared with simple DOM-based segmentation methods, our page segmentation scheme uses visual cues to obtain a better partition of a page at the semantic level. By using the VIPS algorithm to assist the selection of query expansion terms in pseudo-relevance feedback for web information retrieval, we achieve a 27% performance improvement on the Web Track dataset.
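As a rough illustration of the block-level pseudo-relevance feedback idea described above, the following minimal sketch picks expansion terms from the best-matching blocks instead of from whole pages. It does not implement VIPS: the blocks are assumed to be already segmented, and the scoring rule, the stopword list, and the names `score_block` and `expansion_terms` are hypothetical simplifications rather than the method used in the paper.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "on", "is"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_block(query_terms, block_text):
    """Toy relevance score: number of query-term occurrences in the block."""
    counts = Counter(tokenize(block_text))
    return sum(counts[t] for t in query_terms)


def expansion_terms(query, blocks, top_blocks=2, num_terms=3):
    """Select expansion terms from the top-scoring blocks (assumed pre-segmented)."""
    query_terms = set(tokenize(query))
    # Rank blocks, not whole pages, so navigation and footer text scores low.
    ranked = sorted(blocks, key=lambda b: score_block(query_terms, b), reverse=True)
    pool = Counter()
    for block in ranked[:top_blocks]:
        for term in tokenize(block):
            if term not in query_terms and term not in STOPWORDS:
                pool[term] += 1
    # Order by frequency, breaking ties alphabetically for determinism.
    ordered = sorted(pool.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ordered[:num_terms]]


if __name__ == "__main__":
    page_blocks = [
        "Home | Login | Contact us",
        "jaguar speed: the jaguar cat runs fast, a fast cat",
        "jaguar cars dealer prices",
        "Copyright notice footer",
    ]
    print(expansion_terms("jaguar", page_blocks))
```

In this toy setup the navigation and footer blocks never reach the feedback pool, which is the intuition behind drawing expansion terms from segments rather than from the full page.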