ABSTRACT
Most information retrieval systems on the Internet rely primarily on similarity ranking algorithms based solely on term frequency statistics. Information quality is usually ignored, so documents are retrieved without regard to their quality. We present an approach that combines similarity-based ranking with quality ranking in centralized and distributed search environments. Six quality metrics were investigated: currency, availability, information-to-noise ratio, authority, popularity, and cohesiveness. Search effectiveness improved significantly when the currency, availability, information-to-noise ratio, and page cohesiveness metrics were incorporated in centralized search. The improvement was also significant when the availability, information-to-noise ratio, popularity, and cohesiveness metrics were incorporated in site selection. Finally, incorporating the popularity metric in information fusion resulted in a significant improvement. In summary, the results show that incorporating quality metrics can generally improve search effectiveness in both centralized and distributed search environments.
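The abstract does not state how the similarity ranking and the quality ranking are combined. As a minimal sketch only, the Python below assumes one plausible scheme: each document's similarity score is scaled by a weighted average of its quality metrics, each normalized to the range 0 to 1. The names `Document`, `QUALITY_WEIGHTS`, `quality_score`, `combined_score`, and `rank`, the equal weights, and the multiplicative form are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field

# Hypothetical equal weights for the four metrics the abstract reports as
# helpful in centralized search. The paper's actual weights are not given here.
QUALITY_WEIGHTS = {
    "currency": 0.25,
    "availability": 0.25,
    "information_to_noise": 0.25,
    "cohesiveness": 0.25,
}


@dataclass
class Document:
    doc_id: str
    similarity: float                            # term-frequency-based score
    quality: dict = field(default_factory=dict)  # metric name -> value in [0, 1]


def quality_score(doc, weights=QUALITY_WEIGHTS):
    """Weighted average of the document's quality metrics.

    A metric missing from doc.quality is treated as 0.0 (an assumption).
    """
    total = sum(weights.values())
    weighted = sum(w * doc.quality.get(name, 0.0) for name, w in weights.items())
    return weighted / total


def combined_score(doc, weights=QUALITY_WEIGHTS):
    """Assumed combination: similarity scaled by the quality score."""
    return doc.similarity * quality_score(doc, weights)


def rank(docs, weights=QUALITY_WEIGHTS):
    """Return documents sorted by combined score, best first."""
    return sorted(docs, key=lambda d: combined_score(d, weights), reverse=True)
```

Under this assumed scheme, a page with a high term-frequency match but poor quality metrics can rank below a moderately similar page of high quality, which is the behavior the abstract motivates.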
CITED BY 21
Rong Tang, Kwong Bor Ng, Tomek Strzalkowski, Paul B. Kantor, Automatically predicting information quality in news documents, Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003--short papers, p.97-99, May 27-June 01, 2003, Edmonton, Canada
Meiqun Hu, Ee-Peng Lim, Aixin Sun, Hady Wirawan Lauw, Ba-Quy Vuong, On improving wikipedia search using article quality, Proceedings of the 9th annual ACM international workshop on Web information and data management, November 09, 2007, Lisbon, Portugal
Jiwoon Jeon, W. Bruce Croft, Joon Ho Lee, Soyeon Park, A framework to predict the quality of answers with non-textual features, Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, August 06-11, 2006, Seattle, Washington, USA
Steven Bethard, Philipp Wetzler, Kirsten Butcher, James H. Martin, Tamara Sumner, Automatically characterizing resource quality for educational digital libraries, Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries, June 15-19, 2009, Austin, TX, USA
Philipp G. Wetzler, Steven Bethard, Kirsten Butcher, James H. Martin, Tamara Sumner, Automatically assessing resource quality for educational digital libraries, Proceedings of the 3rd workshop on Information credibility on the web, April 20, 2009, Madrid, Spain