Webpage understanding: beyond page-level search
Source
ACM SIGMOD Record
Volume 37, Issue 4 (December 2008)
COLUMN: Special section on managing information extraction
Pages 48-54
Year of Publication: 2009
ISSN: 0163-5808
Authors
Zaiqing Nie, Microsoft Research Asia, Beijing, P. R. China
Ji-Rong Wen, Microsoft Research Asia, Beijing, P. R. China
Wei-Ying Ma, Microsoft Research Asia, Beijing, P. R. China
ABSTRACT
In this paper we introduce the webpage understanding problem, which consists of three subtasks: webpage segmentation, webpage structure labeling, and webpage text segmentation and labeling. The problem is motivated by the search applications we have been working on, including Microsoft Academic Search, Windows Live Product Search, and Renlifang Entity Relationship Search. We believe that integrated webpage understanding will be an important direction for future research in Web mining.