Template detection via data mining and its applications
Source: International World Wide Web Conference
Proceedings of the 11th international conference on World Wide Web
Honolulu, Hawaii, USA
Session: Description and Analysis
Pages: 580-591
Year of Publication: 2002
ISBN: 1-58113-449-5
Authors
Ziv Bar-Yossef, University of California at Berkeley, Berkeley, CA
Sridhar Rajagopalan, IBM Almaden Research Center, San Jose, CA
Sponsors
ACM: Association for Computing Machinery
WWW'02
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 weeks): 7; Downloads (12 months): 94; Citation count: 48
DOI: http://doi.acm.org/10.1145/511446.511522

ABSTRACT

We formulate and propose the template detection problem, and suggest a practical solution for it based on counting frequent item sets. We show that the use of templates is pervasive on the web. We describe three principles, which characterize the assumptions made by hypertext information retrieval (IR) and data mining (DM) systems, and show that templates are a major source of violation of these principles. As a consequence, basic "pure" implementations of simple search algorithms coupled with template detection and elimination show surprising increases in precision at all levels of recall.
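The abstract's central intuition, that template content is material repeated across many pages of a site and can be found by counting how often it occurs, can be illustrated with a simplified support-counting sketch. This is not the authors' algorithm. It assumes each page has already been split into text blocks (segmentation is out of scope here), fingerprints each block with SHA-1 over whitespace- and case-normalized text, and flags a block as template if it appears on at least a `min_support` fraction of pages. The function names, the example pages, and the 0.5 default threshold are all hypothetical. Counting single blocks is the size-one case of frequent item set counting.

```python
import hashlib
from collections import Counter


def fingerprint(block: str) -> str:
    """Hash a block after normalizing case and whitespace (hypothetical helper)."""
    normalized = " ".join(block.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def detect_templates(pages: dict[str, list[str]], min_support: float = 0.5) -> set[str]:
    """Return fingerprints of blocks present on at least min_support of the pages.

    Hypothetical sketch: support is counted per page, so a block repeated
    twice on one page still contributes only 1 to its support.
    """
    support: Counter[str] = Counter()
    for blocks in pages.values():
        support.update({fingerprint(b) for b in blocks})
    threshold = min_support * len(pages)
    return {fp for fp, count in support.items() if count >= threshold}


def strip_templates(pages: dict[str, list[str]], templates: set[str]) -> dict[str, list[str]]:
    """Drop template blocks, keeping each page's remaining blocks in order."""
    return {
        url: [b for b in blocks if fingerprint(b) not in templates]
        for url, blocks in pages.items()
    }


# Hypothetical site: a shared navigation bar and footer around unique content.
pages = {
    "/a": ["Home | About | Contact", "Article about data mining", "Copyright 2002"],
    "/b": ["Home | About | Contact", "Article about web search", "Copyright 2002"],
    "/c": ["Home  |  About | Contact", "Notes on PageRank", "Copyright 2002"],
}
templates = detect_templates(pages, min_support=0.6)
cleaned = strip_templates(pages, templates)
print(cleaned["/a"])  # ['Article about data mining']
```

Here the navigation bar and the footer occur on all three pages and are flagged, while each article body occurs once and survives. A real system would also need a segmentation step and a way to tolerate near-duplicate (not just normalized-identical) blocks.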


