Task-oriented world wide web retrieval by document type classification
Source
Conference on Information and Knowledge Management
Proceedings of the eighth international conference on Information and knowledge management
Kansas City, Missouri, United States
Pages: 109 - 113
Year of Publication: 1999
ISBN: 1-58113-146-1
Authors
Katsushi Matsuda
Human Media Res. Labs., NEC 8916-47, Takayama-cho, Ikoma, Nara, 630-0101 Japan
Toshikazu Fukushima
Human Media Res. Labs., NEC 8916-47, Takayama-cho, Ikoma, Nara, 630-0101 Japan
ABSTRACT
This paper proposes a novel approach to accurately searching Web pages for relevant information in problem solving by specifying a Web document category instead of the user's task. Accessing information from World Wide Web pages as an approach to problem solving has become commonplace. However, such a search is difficult with current search services, since these services only provide keyword-based search methods, which are equivalent to narrowing down the target references according to domains. Problem solving, though, usually involves both a domain and a task. Accordingly, our approach is based on problem solving tasks. To specify a user's problem solving task, we introduce the concept of document types that directly relate to the problem solving tasks; with this approach, users can easily designate problem solving tasks. We implemented the PageTypeSearch system based on our approach. The PageTypeSearch classifier assigns Web pages to document types by comparing the pages with the typical structural characteristics of each type. In experiments, we compare PageTypeSearch, which uses document type indices, with a conventional keyword-based search system. The average precision of the document type-based search is 88.9%, while the average precision of the keyword-based search is 31.2%. Moreover, the number of irrelevant references gathered by our system is about one-thirteenth that of traditional keyword-based search systems. By introducing the viewpoint of tasks, our approach achieves higher performance and has practical advantages for problem solving.
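The idea of classifying pages by structural characteristics and then restricting a keyword search to one document type can be illustrated with a minimal Python sketch. It is not the PageTypeSearch implementation: the abstract does not list the document types or the characteristics used, so the type names (`faq`, `link_collection`, `product_catalog`), the counted features, the thresholds, and the function names below are all hypothetical.

```python
from html.parser import HTMLParser


class StructureCounter(HTMLParser):
    """Counts start tags and question marks in the text of one page."""

    def __init__(self):
        super().__init__()
        self.tags = {}
        self.questions = 0

    def handle_starttag(self, tag, attrs):
        self.tags[tag] = self.tags.get(tag, 0) + 1

    def handle_data(self, data):
        self.questions += data.count("?")


def extract_features(html):
    """Return a small dict of structural features (assumed feature set)."""
    counter = StructureCounter()
    counter.feed(html)
    counter.close()
    return {
        "links": counter.tags.get("a", 0),
        "images": counter.tags.get("img", 0),
        "table_rows": counter.tags.get("tr", 0),
        "questions": counter.questions,
    }


# Hypothetical "typical structural characteristics": minimum counts per type.
TYPE_PROFILES = {
    "faq": {"questions": 3},
    "link_collection": {"links": 5},
    "product_catalog": {"images": 2, "table_rows": 3},
}


def classify(html):
    """Assign the type whose profile is fully met, else 'unknown'."""
    features = extract_features(html)
    scores = {}
    for doc_type, profile in TYPE_PROFILES.items():
        met = sum(1 for name, minimum in profile.items()
                  if features[name] >= minimum)
        scores[doc_type] = met / len(profile)
    best = max(scores, key=scores.get)  # first type wins on ties
    return best if scores[best] == 1.0 else "unknown"


def search(pages, keyword, doc_type):
    """Keyword match (domain) restricted to one document type (task)."""
    return [url for url, html in pages.items()
            if keyword.lower() in html.lower() and classify(html) == doc_type]
```

A query then names both a keyword and a document type, for example `search(pages, "printer", "faq")`, where `pages` maps URLs to HTML strings. A real system would classify pages once at indexing time rather than at query time; the sketch classifies on the fly only to stay short.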