The connectivity sonar: detecting site functionality by structural patterns
Source: Conference on Hypertext and Hypermedia
Proceedings of the fourteenth ACM conference on Hypertext and hypermedia
Nottingham, UK
SESSION: Emergent web patterns
Pages: 38-47
Year of Publication: 2003
ISBN: 1-58113-704-4
Authors
Einat Amitay, IBM Research Labs, Haifa, Israel
David Carmel, IBM Research Labs, Haifa, Israel
Adam Darlow, IBM Research Labs, Haifa, Israel
Ronny Lempel, IBM Research Labs, Haifa, Israel
Aya Soffer, IBM Research Labs, Haifa, Israel
Sponsors
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/900051.900060

ABSTRACT

Web sites today serve many different functions, such as corporate sites, search engines, e-stores, and so forth. As sites are created for different purposes, their structure and connectivity characteristics vary. However, this research argues that sites of similar role exhibit similar structural patterns, as the functionality of a site naturally induces a typical hyperlinked structure and typical connectivity patterns to and from the rest of the Web. Thus, the functionality of Web sites is reflected in a set of structural and connectivity-based features that form a typical signature. In this paper, we automatically categorize sites into eight distinct functional classes, and highlight several search-engine related applications that could make immediate use of such technology. We purposely limit our categorization algorithms by tapping connectivity and structural data alone, making no use of any content analysis whatsoever. When applying two classification algorithms to a set of 202 sites of the eight defined functional categories, the algorithms correctly classified between 54.5% and 59% of the sites. On some categories, the precision of the classification exceeded 85%. An additional result of this work indicates that the structural signature can be used to detect spam rings and mirror sites, by clustering sites with almost identical signatures.
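
The last point of the abstract, grouping sites whose signatures are almost identical, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the features, the distance measure, or the clustering method, so the three feature values, the Euclidean distance, the 0.05 threshold, and the function names below are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import sqrt


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_near_identical(signatures, threshold=0.05):
    """Group sites whose signatures lie within `threshold` of each other.

    Single-link grouping via union-find: two sites share a cluster if a
    chain of near-identical pairs connects them. The threshold is an
    assumed value, not one taken from the paper.
    """
    parent = {site: site for site in signatures}

    def find(site):
        while parent[site] != site:
            parent[site] = parent[parent[site]]  # path halving
            site = parent[site]
        return site

    for s, t in combinations(signatures, 2):
        if distance(signatures[s], signatures[t]) <= threshold:
            parent[find(s)] = find(t)

    clusters = {}
    for site in signatures:
        clusters.setdefault(find(site), []).append(site)
    # Only groups of two or more sites are candidate mirrors or spam rings.
    return [sorted(group) for group in clusters.values() if len(group) > 1]


# Hypothetical signatures, each scaled to [0, 1]:
# (average out-degree, fraction of inter-site links, average page depth)
signatures = {
    "a.example": (0.20, 0.10, 0.50),
    "b.example": (0.21, 0.10, 0.50),
    "c.example": (0.80, 0.60, 0.10),
    "d.example": (0.20, 0.11, 0.51),
}
suspects = cluster_near_identical(signatures)
```

Here `suspects` holds one group containing `a.example`, `b.example`, and `d.example`, while `c.example` is left out because its signature is far from the others. The pairwise comparison is quadratic in the number of sites, which is acceptable for a collection of a few hundred sites such as the 202 mentioned above but would need indexing or hashing at Web scale.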

