Distributed query sampling: a quality-conscious approach
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Seattle, Washington, USA
SESSION: Distributed IR
Pages: 340 - 347  
Year of Publication: 2006
ISBN: 1-59593-369-7
Authors
James Caverlee  Georgia Institute of Technology, Atlanta, GA
Ling Liu  Georgia Institute of Technology, Atlanta, GA
Joonsoo Bae  Chonbuk National University, Jeonju, Jeonbuk, South Korea
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1148170.1148230

ABSTRACT

We present an adaptive, quality-conscious distributed query-sampling framework for extracting high-quality text database samples. The framework divides the query-based sampling process into an initial seed sampling phase and a quality-aware iterative sampling phase. In the second phase, the sampling process is dynamically scheduled based on estimated database size and quality parameters derived during the previous sampling process. The unique characteristic of our adaptive query-based sampling framework is its ability to learn and configure itself based on the overall quality of all text databases under consideration. We introduce three quality-conscious sampling schemes for estimating database quality, and our initial results show that the proposed framework supports higher-quality document sampling than existing approaches.
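
The two-phase structure described in the abstract can be pictured with a short sketch. This is not the authors' algorithm: the abstract does not give the scheduling formula or the three quality schemes, so the function names (`allocate_round`, `two_phase_schedule`), the proportional allocation rule, and the `score_fn` callback are all hypothetical stand-ins. The sketch only shows the general shape: every database first receives a fixed seed sample, and the remaining document budget is then handed out over several rounds in proportion to a per-database score that is re-evaluated after each round.

```python
from typing import Callable, Dict, List


def allocate_round(scores: Dict[str, float], budget: int) -> Dict[str, int]:
    """Split `budget` documents across databases in proportion to `scores`.

    Largest-remainder rounding keeps the shares summing exactly to `budget`.
    Proportional allocation is an assumption, not the paper's rule.
    """
    total = sum(scores.values())
    if total <= 0:
        # No usable scores: fall back to an even split.
        scores = {name: 1.0 for name in scores}
        total = float(len(scores))
    exact = {name: budget * s / total for name, s in scores.items()}
    shares = {name: int(x) for name, x in exact.items()}
    leftover = budget - sum(shares.values())
    # Give the leftover documents to the largest fractional parts.
    by_fraction = sorted(exact, key=lambda n: exact[n] - shares[n], reverse=True)
    for name in by_fraction[:leftover]:
        shares[name] += 1
    return shares


def two_phase_schedule(
    databases: List[str],
    seed_size: int,
    total_budget: int,
    rounds: int,
    score_fn: Callable[[str, int], float],
) -> Dict[str, int]:
    """Return how many documents to sample from each database.

    Phase 1 (seed): every database gets `seed_size` documents.
    Phase 2 (iterative): the rest of the budget is spread over `rounds`
    rounds; `score_fn(name, sampled_so_far)` is a hypothetical placeholder
    for a size/quality estimate derived from the sample collected so far.
    """
    sampled = {name: seed_size for name in databases}
    remaining = total_budget - seed_size * len(databases)
    if remaining < 0:
        raise ValueError("total_budget is too small for the seed phase")
    for r in range(rounds):
        round_budget = remaining // (rounds - r)
        scores = {name: score_fn(name, sampled[name]) for name in databases}
        for name, extra in allocate_round(scores, round_budget).items():
            sampled[name] += extra
        remaining -= round_budget
    return sampled


if __name__ == "__main__":
    # Toy scores: database "b" is treated as twice as valuable as the others.
    toy_scores = {"a": 1.0, "b": 2.0, "c": 1.0}
    plan = two_phase_schedule(
        ["a", "b", "c"], seed_size=10, total_budget=100, rounds=2,
        score_fn=lambda name, n: toy_scores[name],
    )
    print(plan)
```

In this toy run the scores are constant, so each round splits its budget the same way. In an adaptive setting `score_fn` would change as more documents are sampled, which is what lets later rounds shift budget toward databases whose estimates have improved.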



