ACM Home Page
Please provide us with feedback. Feedback
Skip-and-prune: cosine-based top-k query processing for efficient context-sensitive document retrieval
Full text PdfPdf (1.12 MB)
Source
International Conference on Management of Data archive
Proceedings of the 35th SIGMOD international conference on Management of data table of contents
Providence, Rhode Island, USA
SESSION: Research session 3: information extraction table of contents
Pages 115-126  
Year of Publication: 2009
ISBN:978-1-60558-551-2
Authors
Jong Wook Kim  Arizona State University, Tempe, AZ, USA
K. Selçuk Candan  Arizona State University, Tempe, AZ, USA
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 71,   Downloads (12 Months): 291,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1559845.1559859
What is a DOI?

ABSTRACT

Keyword search and ranked retrieval together emerged as popular data access paradigms for various kinds of data, from web pages to XML and relational databases. A user can submit keywords without knowing much (sometimes nothing) about the complex structure underlying a data collection, yet the system can identify, rank, and return a set of relevant matches by exploiting statistics about the distribution and structure of the data. Keyword-based data models are also suitable for capturing user's search context in terms of weights associated to the keywords in the query. Given a search context, the data in the database can also be re-interpreted for semantically correct retrieval. This option, however, is often ignored as the cost of re-assessing the content in the database naively tends to be prohibitive. In this paper, we first argue that top-k query processing can help tackle this challenge by re-assessing only the relevant parts of the database, efficiently. A road-block in this process, however, is that most efficient implementations of top-k query processing assume that the scoring function is monotonic, whereas the cosine-based scoring function needed for re-interpretation of content based on user context is not. In this paper, we develop an efficient top-k query processing algorithm, skip-and-prune (SnP), which is able to process top-k queries under cosine-based non-monotonic scoring functions. We compare the use of proposed algorithm against the alternative implementations of the context-aware retrieval, including naive top-k, accumulator-based inverted files, and full-scan. The experiment results show that while being fast, naive top-k is not an effective solution due to the non-monotonicity of underlying scoring function. The proposed technique, SnP, however, matches the precision of accumulator-based inverted files and full-scan, yet it is orders of magnitude faster than these.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
S. Agrawal, S. Chaudhuri, and G. Das. DBXplorer:a system for keyword-based search over relational databases. In ICDE, 2002.
 
2
3
 
4
 
5
6
 
7
 
8
 
9
 
10
 
11
 
12
U. Guntzer, W. Balke, and W. Kiebling. Optimal Aggregation Algorithms for Middleware. In VLDB, 2000.
 
13
14
 
15
 
16
S. Nepal and M. V. Ramakrishna. Query processing issues in image(multimedia)databases. In ICDE, 1999.
17
 
18
P. Resnik. Semantic similarity in a taxonomy: an information-based measure and its application to problems of ambiguity in natural language. JAIR, 11:95--130, 1999.
 
19
S. E Robertson and K. S Jones. Relevance weighting of search terms. JASIS, 27(3):129--146, 1976.
20
21
 
22
23
 
24
 
25
 
26
P. Tsaparas, N. Koudas and T. Palpanas. Ranked join indices. In ICDE, 2003.
27
 
28
ACM Digital Library. http://portal.acm.org.
 
29
ScienceDirect. http://www.sciencedirect.com.

Collaborative Colleagues:
Jong Wook Kim: colleagues
K. Selçuk Candan: colleagues