ABSTRACT
The Web has become a worldwide repository of information that individuals, companies, and organizations use to solve or address various information problems. Many of these Web users employ automated agents to gather this information for them. Some assume that this approach represents a more sophisticated method of searching. However, there is little research investigating how Web agents search for online information. In this research, we first provide a classification of information agents using stages of information gathering, gathering approaches, and agent architecture. We then examine an implementation of one of the resulting classifications in detail, investigating how agents search for information on Web search engines, including the session, query, term, duration, and frequency of interactions. For this temporal study, we analyzed three data sets of queries and page views from agents interacting with the Excite and AltaVista search engines from 1997 to 2002, examining approximately 900,000 queries submitted by over 3,000 agents. Findings include: (1) agent sessions are extremely interactive, with sometimes hundreds of interactions per second; (2) agent queries are comparable to those of human searchers, with little use of query operators; (3) Web agents are searching for a relatively limited variety of information, wherein only 18% of the terms used are unique; and (4) the duration of agent-Web search engine interaction typically spans several hours. We discuss the implications for Web information agents and search engines.