A graph-theoretic approach to webpage segmentation
Source: Proceedings of the 17th international conference on World Wide Web (International World Wide Web Conference), Beijing, China
Session: Search: corpus characterization and Search Perform
Pages: 377-386
Year of Publication: 2008
ISBN: 978-1-60558-085-2
Bibliometrics: Downloads (6 weeks): 9, Downloads (12 months): 146, Citation count: 1
ABSTRACT
We consider the problem of segmenting a webpage into visually and semantically cohesive pieces. Our approach formulates an appropriate optimization problem on weighted graphs, where the weights capture whether two nodes in the DOM tree should be placed together or apart in the segmentation. We present a learning framework that learns these weights from manually labeled data in a principled manner. Our work is a significant departure from previous heuristic and rule-based solutions to the segmentation problem. Our empirical analysis highlights several aspects of the framework, including variants of the optimization problem and the role of learning.
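The abstract does not give the objective in closed form. As a minimal sketch, assuming a correlation-clustering-style formulation (one natural reading of "weights that say whether two DOM nodes belong together or apart"), each pair of nodes carries a weight w in [0, 1], assumed here to be an already-learned probability that the pair shares a segment. Putting a pair in the same segment costs 1 - w and separating it costs w. The function names `segmentation_cost` and `pivot_segmentation` are hypothetical, and the randomized pivot heuristic is a standard correlation-clustering approximation swapped in for illustration, not necessarily the solver used by the authors.

```python
import random
from itertools import combinations


def pair_weight(weights, u, v):
    # Assumed: weights[(u, v)] is a learned probability that DOM nodes u and v
    # belong in the same segment; missing pairs default to a neutral 0.5.
    return weights.get((u, v), weights.get((v, u), 0.5))


def segmentation_cost(nodes, weights, labels):
    """Correlation-clustering cost of a segmentation (labels maps node -> segment id)."""
    cost = 0.0
    for u, v in combinations(nodes, 2):
        w = pair_weight(weights, u, v)
        # Together costs (1 - w); apart costs w.
        cost += (1.0 - w) if labels[u] == labels[v] else w
    return cost


def pivot_segmentation(nodes, weights, seed=0):
    """Randomized pivot heuristic for correlation clustering (illustrative choice)."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    remaining = list(nodes)
    labels = {}
    segment_id = 0
    while remaining:
        # Pick a random unassigned node as the pivot of a new segment.
        pivot = remaining.pop(rng.randrange(len(remaining)))
        labels[pivot] = segment_id
        unassigned = []
        for v in remaining:
            if pair_weight(weights, pivot, v) > 0.5:
                labels[v] = segment_id  # pivot pulls v into its segment
            else:
                unassigned.append(v)
        remaining = unassigned
        segment_id += 1
    return labels
```

For example, with hypothetical nodes `nav1`, `nav2`, `body1`, `body2`, where the two navigation nodes and the two body nodes have high pairwise weights and all cross pairs have low weights, the pivot heuristic groups each pair into its own segment, and that segmentation has a far lower cost than lumping everything into one segment. In a learned setting, the weights would come from a model trained on manually labeled segmentations; that step is outside this sketch.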