Panorama: extending digital libraries with topical crawlers
Source: International Conference on Digital Libraries
Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries
Tucson, AZ, USA
SESSION: Crawling the web
Pages: 142 - 150  
Year of Publication: 2004
ISBN:1-58113-832-6
Authors
Gautam Pant  The University of Iowa, Iowa City, IA
Kostas Tsioutsiouliklis  NEC Laboratories America, Inc., Princeton, NJ
Judy Johnson  NEC Laboratories America, Inc., Princeton, NJ
C. Lee Giles  The Pennsylvania State University, University Park, PA
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/996350.996384

ABSTRACT

A large number of research, technical, and professional documents are available today in digital formats. Digital libraries are created to facilitate search and retrieval of the information in these documents. These libraries may span an entire area of interest (e.g., computer science) or be limited to documents within a small organization. While tools that index, classify, rank, and retrieve documents from such libraries are important, it would be worthwhile to complement these tools with information available on the Web. We propose one such technique that uses a topical crawler driven by the information extracted from a research document. The goal of the crawler is to harvest a collection of Web pages that are focused on the topical subspaces associated with the given document. The collection created through Web crawling is further processed using lexical and linkage analysis. The entire process is automated and uses machine learning techniques both to guide the crawler and to analyze the collection it fetches. A report generated at the end provides visual cues and information to the researcher.
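
The idea of a topical crawler driven by text from a document can be illustrated with a small sketch. This is not the authors' system: the abstract says machine learning guides the crawler, whereas the sketch below swaps in a plain best-first strategy that scores pages by cosine similarity to the topic text. The names `topical_crawl`, `fetch`, `TOY_WEB`, and `toy_fetch` are hypothetical, and the "Web" is an in-memory dictionary so the example runs without network access.

```python
import heapq
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercased bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def topical_crawl(seeds, topic_text, fetch, max_pages=10):
    """Best-first crawl: always expand the unvisited URL with the best score.

    `fetch(url)` is a hypothetical callable returning (page_text, out_links).
    Each link inherits the topical similarity of the page it was found on
    (an assumption of this sketch, not a detail from the abstract).
    """
    topic = term_vector(topic_text)
    frontier = [(-1.0, url) for url in seeds]  # max-heap via negated scores
    heapq.heapify(frontier)
    visited, collection = set(), []
    while frontier and len(collection) < max_pages:
        _, url = heapq.heappop(frontier)
        if url in visited:
            continue
        visited.add(url)
        text, links = fetch(url)
        score = cosine(topic, term_vector(text))
        collection.append((url, score))
        for link in links:
            if link not in visited:
                heapq.heappush(frontier, (-score, link))
    return collection


# Toy stand-in for the Web: url -> (page text, outgoing links).
TOY_WEB = {
    "seed": ("focused web crawler for digital libraries", ["a", "b"]),
    "a": ("cooking recipes and kitchen tips", ["c"]),
    "b": ("topical crawler digital libraries search", ["d"]),
    "c": ("more cooking recipes", []),
    "d": ("digital libraries crawler evaluation", []),
}


def toy_fetch(url):
    return TOY_WEB[url]


pages = topical_crawl(
    ["seed"], "topical crawler digital libraries", toy_fetch, max_pages=4
)
for url, score in pages:
    print(f"{url}: {score:.2f}")
```

With a budget of four pages, the crawl never fetches page "c", because the page that links to it scored zero against the topic and its links sink to the bottom of the frontier. A real crawler would replace `toy_fetch` with an HTTP fetcher and link extractor, and could replace the similarity score with a trained classifier.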


