ACM Home Page
Please provide us with feedback. Feedback
Topical query decomposition
Full text PdfPdf (449 KB)
Source
International Conference on Knowledge Discovery and Data Mining archive
Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
Las Vegas, Nevada, USA
SESSION: Research papers table of contents
Pages 52-60  
Year of Publication: 2008
ISBN:978-1-60558-193-4
Authors
Francesco Bonchi  Yahoo! Research, Barcelona, Spain
Carlos Castillo  Yahoo! Research, Barcelona, Spain
Debora Donato  Yahoo! Research, Barcelona, Spain
Aristides Gionis  Yahoo! Research, Barcelona, Spain
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 14,   Downloads (12 Months): 236,   Citation Count: 1
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1401890.1401902
What is a DOI?

ABSTRACT

We introduce the problem of query decomposition, where we are given a query and a document retrieval system, and we want to produce a small set of queries whose union of resulting documents corresponds approximately to that of the original query. Ideally, these queries should represent coherent, conceptually well-separated topics.

We provide an abstract formulation of the query decomposition problem, and we tackle it from two different perspectives. We first show how the problem can be instantiated as a specific variant of a set cover problem, for which we provide an efficient greedy algorithm. Next, we show how the same problem can be seen as a constrained clustering problem, with a very particular kind of constraint, i.e., clustering with predefined clusters. We develop a two-phase algorithm based on hierarchical agglomerative clustering followed by dynamic programming. Our experiments, conducted on a set of actual queries in a Web scale search engine, confirm the effectiveness of the proposed solutions.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
R. A. Baeza-Yates, C. A. Hurtado, and M. Mendoza. Query recommendation using query logs in search engines. In Proc. of EDBT Workshops, pp. 588--596, 2004.
 
3
4
5
 
6
 
7
V. Chvátal. A greedy heuristic for the set-covering problem. Mathematics of Operations Research, 4:233--235, 1979.
8
 
9
M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. of ACM Int. Conf. on Knowledge Discovery and Data Mining (KDD), 1996.
10
11
 
12
B. M. Fonseca, P. B. Golgher, E. S. de Moura, B. Pôssas, and N. Ziviani. Discovering search engine related queries using association rules. Journal of Web Engineering, 2(4), 2004.
13
14
 
15
16
17
 
18
19
20
 
21
 
22
 
23
K. Wagstaff, S. Basu, and I. Davidson. When is constrained clustering beneficial, and why? In Proc. of AAAAI'06, 2006.
 
24
25
26
27
28
29


Collaborative Colleagues:
Francesco Bonchi: colleagues
Carlos Castillo: colleagues
Debora Donato: colleagues
Aristides Gionis: colleagues