Topical locality in the Web
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Athens, Greece
Pages: 272-279
Year of Publication: 2000
ISBN:1-58113-226-3
Author
Brian D. Davison  Department of Computer Science, Rutgers, The State University of New Jersey, New Brunswick, NJ
Sponsors
Athens University of Economics and Business
Greek Computer Society
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/345508.345597

ABSTRACT

Most web pages are linked to others with related content. This idea, combined with another that says that text in, and possibly around, HTML anchors describes the pages to which they point, is the foundation for a usable World-Wide Web. In this paper, we examine to what extent these ideas hold by empirically testing whether topical locality mirrors spatial locality of pages on the Web. In particular, we find that the likelihood of linked pages having similar textual content is high; that the similarity of sibling pages increases when the links from the parent are close together; that titles, descriptions, and anchor text represent at least part of the target page; and that anchor text may be a useful discriminator among unseen child pages. These results demonstrate the foundations necessary for the success of many web systems, including search engines, focused crawlers, linkage analyzers, and intelligent web agents.
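The abstract does not name the similarity measure used to compare page texts. As a minimal sketch, assuming TF-IDF-weighted cosine similarity over whitespace-tokenized text (an illustrative choice, not necessarily the paper's exact measure), the basic comparison of a page against a page it links to and against an unrelated page could look like this. The three page texts are hypothetical toy strings, not data from the study.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase and split on whitespace only; a real study
    # would also strip HTML markup and stopwords.
    return text.lower().split()


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build TF-IDF vectors (raw term frequency times log(N / df))."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical pages: a source page, a page it links to, and an unrelated page.
docs = [
    "search engine ranking uses link analysis",
    "link analysis improves search engine ranking",
    "recipes for baking sourdough bread at home",
]
vecs = tfidf_vectors(docs)
linked = cosine(vecs[0], vecs[1])     # source vs. linked page
unrelated = cosine(vecs[0], vecs[2])  # source vs. unrelated page
print(f"linked={linked:.3f} unrelated={unrelated:.3f}")
```

Topical locality, in this framing, predicts that averages of `linked` over many real link pairs exceed averages of `unrelated` over random pairs; the same `cosine` function could also score anchor text or titles against a target page's text.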


