ACM Home Page
Please provide us with feedback. Feedback
Page-level template detection via isotonic smoothing
Full text PdfPdf (288 KB)
Source
International World Wide Web Conference archive
Proceedings of the 16th international conference on World Wide Web table of contents
Banff, Alberta, Canada
SESSION: Data mining table of contents
Pages: 61 - 70  
Year of Publication: 2007
ISBN:978-1-59593-654-7
Authors
Deepayan Chakrabarti  Yahoo! Research, Sunnyvale, CA
Ravi Kumar  Yahoo! Research, Sunnyvale, CA
Kunal Punera  University of Texas at Austin, Austin, TX
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 11,   Downloads (12 Months): 98,   Citation Count: 6
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1242572.1242582
What is a DOI?

ABSTRACT

We develop a novel framework for the page-level template detection problem. Our framework is built on two main ideas. The first is theautomatic generation of training data for a classifier that, given apage, assigns a templateness score to every DOM node of the page. The second is the global smoothing of these per-node classifier scores bysolving a regularized isotonic regression problem; the latter follows from a simple yet powerful abstraction of templateness on a page. Our extensive experiments on human-labeled test data show that our approachdetects templates effectively.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
3
 
4
 
5
6
 
7
8
 
9
B. Davison. Recognizing nepotistic links on the web. In AAAI-2000 Workshop on Artificial Intelligence for Web Search, pages 23--28, 2000.
 
10
11
 
12
L. Hubert and P. Arabie. Comparing partitions. J. Classification, 2:193--218, 1985.
13
 
14
15
16
 
17
G. Milligan and M. Cooper. A study of the comparability of external criteria for hierarchical cluster analysis. Multivariate Behavioral Research, 21(4):441--458, 1986.
 
18
 
19
T. Morton-Jones, P. Diggle, L. Parker, H.O. Dickinson, and K. Blinks. Additive isotonic regression models in epidemiology. Statistics in Medicine, 19(6):849--859, 2000.
 
20
P.M. Pardalos and G. Xue. Algorithms for a class of isotonic regression problems. Algorithmica, 23(3):211--222, 1999.
 
21
T. Robertson, F.T. Wright, and R.L. Dykstra. Order-Restrictied Statistical Inference. Wiley, 1988.
22
 
23
Q. Stout. Optimal algorithms for unimodal regression. Computing Science and Statistics, 32:348--355, 2000.
24
 
25
L. Yi and B. Liu. Web page cleaning for web mining through feature weighting. In Proc. 18th IJCAI, pages 43--50, 2003.
26
27


Collaborative Colleagues:
Deepayan Chakrabarti: colleagues
Ravi Kumar: colleagues
Kunal Punera: colleagues