| Distributed query sampling: a quality-conscious approach |
| Full text |
Pdf
(223 KB)
|
| Source
|
Annual ACM Conference on Research and Development in Information Retrieval
archive
Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
table of contents
Seattle, Washington, USA
SESSION: Distributed IR
table of contents
Pages: 340 - 347
Year of Publication: 2006
ISBN:1-59593-369-7
|
|
Authors
|
|
James Caverlee
|
Georgia Institute of Technology, Atlanta, GA
|
|
Ling Liu
|
Georgia Institute of Technology, Atlanta, GA
|
|
Joonsoo Bae
|
Chonbuk National University, Jeonju, Jeonbuk, South Korea
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 9, Downloads (12 Months): 89, Citation Count: 3
|
|
|
ABSTRACT
We present an adaptive distributed query-sampling framework that is quality-conscious for extracting high-quality text database samples. The framework divides the query-based sampling process into an initial seed sampling phase and a quality-aware iterative sampling phase. In the second phase the sampling process is dynamically scheduled based on estimated database size and quality parameters derived during the previous sampling process. The unique characteristic of our adaptive query-based sampling framework is its self-learning and self-configuring ability based on the overall quality of all text databases under consideration. We introduce three quality-conscious sampling schemes for estimating database quality, and our initial results show that the proposed framework supports higher-quality document sampling than existing approaches.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
E. Agichtein, P. Ipeirotis, and L. Gravano. Modeling query-based access to text databases. In WebDB, 2003.
|
| |
2
|
|
 |
3
|
|
 |
4
|
Jamie Callan , Margaret Connell , Aiqun Du, Automatic discovery of language models for text databases, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.479-490, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
| |
5
|
J. Callan et al. The effects of query-based sampling on automatic database selection algorithms. Technical Report CMU-LTI-00-162, CMU, 2000.
|
 |
6
|
James P. Callan , Zhihong Lu , W. Bruce Croft, Searching distributed collections with inference networks, Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval, p.21-28, July 09-13, 1995, Seattle, Washington, United States
[doi> 10.1145/215206.215328]
|
| |
7
|
W. W. Cohen and Y. Singer. Learning to query the Web. In AAAI Workshop on Internet-Based Info. Systems. 1996.
|
 |
8
|
Nick Craswell , Peter Bailey , David Hawking, Server selection on the World Wide Web, Proceedings of the fifth ACM conference on Digital libraries, p.37-46, June 02-07, 2000, San Antonio, Texas, United States
[doi> 10.1145/336597.336628]
|
 |
9
|
James C. French , Allison L. Powell , Jamie Callan , Charles L. Viles , Travis Emmitt , Kevin J. Prey , Yun Mou, Comparing the performance of database selection algorithms, Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, p.238-245, August 15-19, 1999, Berkeley, California, United States
[doi> 10.1145/312624.312684]
|
 |
10
|
|
| |
11
|
|
 |
12
|
|
| |
13
|
|
 |
14
|
|
 |
15
|
Panagiotis G. Ipeirotis , Luis Gravano , Mehran Sahami, Probe, count, and classify: categorizing hidden web databases, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.67-78, May 21-24, 2001, Santa Barbara, California, United States
|
| |
16
|
J. Lin. Divergence measures based on the shannon entropy. IEEE Trans. on Inf. Theory, 37(1):145--151, 1991.
|
 |
17
|
King-Lup Liu , Adrain Santoso , Clement Yu , Weiyi Meng, Discovering the representative of a search engine, Proceedings of the tenth international conference on Information and knowledge management, October 05-10, 2001, Atlanta, Georgia, USA
[doi> 10.1145/502585.502696]
|
 |
18
|
|
| |
19
|
|
 |
20
|
|
 |
21
|
|
| |
22
|
M. F. Porter. An algorithm for suffix stripping. Program, 14(3):130--137, 1980.
|
 |
23
|
Allison L. Powell , James C. French , Jamie Callan , Margaret Connell , Charles L. Viles, The impact of database selection on distributed searching, Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, p.232-239, July 24-28, 2000, Athens, Greece
[doi> 10.1145/345508.345584]
|
| |
24
|
|
 |
25
|
|
 |
26
|
|
| |
27
|
|
 |
28
|
|
|