Evaluating topic-driven web crawlers
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
New Orleans, Louisiana, United States
Pages: 241 - 249
Year of Publication: 2001
ISBN: 1-58113-331-6
Bibliometrics
Citation Count: 34
ABSTRACT
Due to limited bandwidth, storage, and computational resources, and to the dynamic nature of the Web, search engines cannot index every Web page, and even the covered portion of the Web cannot be monitored continuously for changes. Therefore it is essential to develop effective crawling strategies to prioritize the pages to be indexed. The issue is even more important for topic-specific search engines, where crawlers must make additional decisions based on the relevance of visited pages. However, it is difficult to evaluate alternative crawling strategies because relevant sets are unknown and the search space is changing. We propose three different methods to evaluate crawling strategies. We apply the proposed metrics to compare three topic-driven crawling algorithms based on similarity ranking, link analysis, and adaptive agents.
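To make the idea of a similarity-ranked, topic-driven crawl concrete, here is a minimal sketch of a best-first crawler. It is an illustration under stated assumptions, not the algorithm evaluated in the paper: the in-memory `TOY_WEB` dictionary stands in for the Web, and the names `cosine` and `best_first_crawl` are hypothetical. Each unvisited link is prioritized by the cosine similarity between the topic and the page on which the link was found.

```python
import heapq
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_first_crawl(web, seeds, topic, max_pages):
    """Visit up to max_pages pages, always expanding the most promising link.

    web is a hypothetical in-memory stand-in for the Web that maps
    url -> (text, outlinks). A link inherits the topic similarity of the
    page on which it was found.
    """
    topic_vec = Counter(topic.lower().split())
    # heapq is a min-heap, so scores are negated to pop the best link first.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        if url not in web:
            continue  # dead link: nothing to fetch
        text, outlinks = web[url]
        visited.append(url)
        score = cosine(topic_vec, Counter(text.lower().split()))
        for link in outlinks:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))
    return visited


# Toy link graph (assumption: invented purely for illustration).
TOY_WEB = {
    "seed": ("portal page about web search and cooking", ["a", "b"]),
    "a": ("topic driven web crawlers for search engines", ["c"]),
    "b": ("pasta recipes and cooking tips", ["d"]),
    "c": ("evaluating focused web crawlers", []),
    "d": ("more pasta recipes", []),
}

order = best_first_crawl(TOY_WEB, ["seed"], "web crawlers search", max_pages=4)
print(order)
```

In this toy graph the link to `c` is found on an on-topic page, so it is fetched before the off-topic page `b`, even though `b` was discovered earlier. A link-analysis or adaptive-agent crawler would replace only the scoring step.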
CITED BY 35
Gautam Pant , Kostas Tsioutsiouliklis , Judy Johnson , C. Lee Giles, Panorama: extending digital libraries with topical crawlers, Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries, June 07-11, 2004, Tucson, AZ, USA
Yida Wang , Jiang-Ming Yang , Wei Lai , Rui Cai , Lei Zhang , Wei-Ying Ma, Exploring traversal strategy for web forum crawling, Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, July 20-24, 2008, Singapore, Singapore
Guilherme T. de Assis , Alberto H. F. Laender , Altigran S. da Silva , Marcos André Gonçalves, The impact of term selection in genre-aware focused crawling, Proceedings of the 2008 ACM symposium on Applied computing, March 16-20, 2008, Fortaleza, Ceara, Brazil
Zhumin Chen , Jun Ma , Jingsheng Lei , Bo Yuan , Li Lian , Ling Song, A cross-language focused crawling algorithm based on multiple relevance prediction strategies, Computers & Mathematics with Applications, v.57 n.6, p.1057-1072, March, 2009
Jiang-Ming Yang , Rui Cai , Chunsong Wang , Hua Huang , Lei Zhang , Wei-Ying Ma, Incorporating site-level knowledge for incremental crawling of web forums: a list-wise strategy, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, June 28-July 01, 2009, Paris, France
INDEX TERMS
Primary Classification:
H. Information Systems
H.3 INFORMATION STORAGE AND RETRIEVAL
H.3.3 Information Search and Retrieval
Subjects: Search process
Additional Classification:
H. Information Systems
H.3 INFORMATION STORAGE AND RETRIEVAL
H.3.4 Systems and Software
Subjects: Performance evaluation (efficiency and effectiveness)
I. Computing Methodologies
I.2 ARTIFICIAL INTELLIGENCE
I.2.8 Problem Solving, Control Methods, and Search
Subjects: Graph and tree search strategies
General Terms: Algorithms, Measurement, Performance
Keywords: InfoSpiders, PageRank, Web information retrieval, best-first search, focused crawlers, performance metrics, topic driven crawling