ChangeDetector: a site-level monitoring tool for the WWW
Source
International World Wide Web Conference
Proceedings of the 11th international conference on World Wide Web
Honolulu, Hawaii, USA
SESSION: Description and Analysis
Pages: 570 - 579
Year of Publication: 2002
ISBN: 1-58113-449-5
Authors
Vijay Boyapati, WhizBang! Labs, Pittsburgh, PA
Kristie Chevrier, WhizBang! Labs, Pittsburgh, PA
Avi Finkel, WhizBang! Labs, Pittsburgh, PA
Natalie Glance, WhizBang! Labs, Pittsburgh, PA
Tom Pierce, WhizBang! Labs, Pittsburgh, PA
Robert Stockton, WhizBang! Labs, Pittsburgh, PA
Chip Whitmer, WhizBang! Labs, Pittsburgh, PA
ABSTRACT
This paper presents a new challenge for Web monitoring tools: to build a system that can monitor entire web sites effectively. Such a system could potentially be used to discover "silent news" hidden within corporate web sites. Examples of silent news include reorganizations in the executive team of a company or the retirement of a product line. ChangeDetector, an implemented prototype, addresses this challenge by incorporating a number of machine learning techniques. The principal backend components of ChangeDetector all rely on machine learning: intelligent crawling, page classification and entity-based change detection. Intelligent crawling enables ChangeDetector to selectively crawl the most relevant pages of very large sites. Classification allows change detection to be filtered by topic. Entity extraction over changed pages permits change detection to be filtered by semantic concepts, such as person names, dates, addresses, and phone numbers. Finally, the front end presents a flexible way for subscribers to interact with the database of detected changes to pinpoint those changes most likely to be of interest.
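As a rough illustration of what filtering changes by entity type could look like, the sketch below compares the entities found in two versions of a page and reports which ones were added or removed. It is not the authors' implementation: the abstract says ChangeDetector relies on machine learning for extraction, whereas this example substitutes two simple regular expressions as a stand-in. The names `ENTITY_PATTERNS`, `extract_entities`, `entity_changes`, and `matches_filter`, along with the sample page text, are all hypothetical.

```python
import re

# Assumption: hand-written regexes stand in for a learned entity extractor.
# They cover only US-style phone numbers and "Month D, YYYY" dates.
ENTITY_PATTERNS = {
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "date": re.compile(
        r"\b(?:January|February|March|April|May|June|July|August|"
        r"September|October|November|December) \d{1,2}, \d{4}\b"
    ),
}


def extract_entities(text):
    """Return a mapping from entity kind to the set of strings found in text."""
    return {kind: set(pattern.findall(text))
            for kind, pattern in ENTITY_PATTERNS.items()}


def entity_changes(old_text, new_text):
    """Report entities added or removed between two versions of a page."""
    old = extract_entities(old_text)
    new = extract_entities(new_text)
    changes = {}
    for kind in ENTITY_PATTERNS:
        added = new[kind] - old[kind]
        removed = old[kind] - new[kind]
        if added or removed:
            changes[kind] = {"added": sorted(added), "removed": sorted(removed)}
    return changes


def matches_filter(changes, wanted_kinds):
    """True if the change report involves any entity kind a subscriber asked for."""
    return any(kind in changes for kind in wanted_kinds)


# Hypothetical page versions: only the phone number differs.
old_page = "Contact our press office at 412-555-0100. Last updated March 3, 2002."
new_page = "Contact our press office at 412-555-0199. Last updated March 3, 2002."

changes = entity_changes(old_page, new_page)
print(changes)
print(matches_filter(changes, ["phone"]))  # subscriber interested in phone changes
print(matches_filter(changes, ["date"]))   # subscriber interested in date changes
```

Comparing sets of extracted entities, rather than raw text, means that edits which do not touch any entity produce an empty report, which is one plausible way to suppress uninteresting changes.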
CITED BY 9
D. C. Reis, P. B. Golgher, A. S. Silva, A. F. Laender, Automatic web news extraction using tree edit distance, Proceedings of the 13th international conference on World Wide Web, May 17-20, 2004, New York, NY, USA
Wei Tang, Kipp Jones, Ling Liu, Calton Pu, BizCQ: using continual queries to cope with changes in business information exchange, Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters, May 19-21, 2004, New York, NY, USA
N. Agrawal, R. Ananthanarayanan, R. Gupta, S. Joshi, R. Krishnapuram, S. Negi, The eShopmonitor: a comprehensive data extraction tool for monitoring web sites, IBM Journal of Research and Development, v.48 n.5/6, p.679-692, September/November 2004
Adam Jatowt, Yukiko Kawai, Satoshi Nakamura, Yutaka Kidawara, Katsumi Tanaka, Journey to the past: proposal of a framework for past web browser, Proceedings of the seventeenth conference on Hypertext and hypermedia, August 22-25, 2006, Odense, Denmark
Jie Han, Dingyi Han, Chenxi Lin, Hua-Jun Zeng, Zheng Chen, Yong Yu, Homepage live: automatic block tracing for web personalization, Proceedings of the 16th international conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada
Eytan Adar, Mira Dontcheva, James Fogarty, Daniel S. Weld, Zoetrope: interacting with the ephemeral web, Proceedings of the 21st annual ACM symposium on User interface software and technology, October 19-22, 2008, Monterey, CA, USA