| Relaxation in text search using taxonomies |
| Full text |
Pdf
(508 KB)
|
Source
|
Proceedings of the VLDB Endowment
archive
Volume 1 , Issue 1 (August 2008)
table of contents
SESSION: IR and forms
table of contents
Pages 672-683
Year of Publication: 2008
ISSN:2150-8097
|
|
Authors
|
|
Marcus Fontoura
|
Yahoo! Research, Sunnyvale, CA
|
|
Vanja Josifovski
|
Yahoo! Research, Sunnyvale, CA
|
|
Ravi Kumar
|
Yahoo! Research, Sunnyvale, CA
|
|
Christopher Olston
|
Yahoo! Research, Sunnyvale, CA
|
|
Andrew Tomkins
|
Yahoo! Research, Sunnyvale, CA
|
|
Sergei Vassilvitskii
|
Yahoo! Research, Sunnyvale, CA
|
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 9, Downloads (12 Months): 74, Citation Count: 1
|
|
|
ABSTRACT
In this paper we propose a novel document retrieval model in which text queries are augmented with multi-dimensional taxonomy restrictions. These restrictions may be relaxed at a cost to result quality. This new model may be applicable in many arenas, including multifaceted, product, and local search, where documents are augmented with hierarchical metadata such as topic or location. We present efficient algorithms for indexing and query processing in this new retrieval model. We decompose query processing into two sub-problems: first, an online search problem to determine the correct overall level of relaxation cost that must be incurred to generate the top k results; and second, a budgeted relaxation search problem in which all results at a particular relaxation cost must be produced at minimal cost. We show the latter problem is solvable exactly in two hierarchical dimensions, is NP-hard in three or more dimensions, but admits efficient approximation algorithms with provable guarantees. We present experimental results evaluating our algorithms on both synthetic and real data, showing order of magnitude improvements over the baseline algorithm.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
| |
3
|
|
 |
4
|
|
| |
5
|
|
 |
6
|
Andrei Z. Broder , David Carmel , Michael Herscovici , Aya Soffer , Jason Zien, Efficient query evaluation using a two-level retrieval process, Proceedings of the twelfth international conference on Information and knowledge management, November 03-08, 2003, New Orleans, LA, USA
[doi> 10.1145/956863.956944]
|
 |
7
|
|
| |
8
|
Doug Burdick , Prasad M. Deshpande , T. S. Jayram , Raghu Ramakrishnan , Shivakumar Vaithyanathan, OLAP over uncertain and imprecise data, Proceedings of the 31st international conference on Very large data bases, August 30-September 02, 2005, Trondheim, Norway
|
 |
9
|
|
 |
10
|
|
| |
11
|
|
 |
12
|
|
 |
13
|
Ronald Fagin , R. Guha , Ravi Kumar , Jasmine Novak , D. Sivakumar , Andrew Tomkins, Multi-structural databases, Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 13-15, 2005, Baltimore, Maryland
[doi> 10.1145/1065167.1065191]
|
| |
14
|
R. Fagin , Ph. Kolaitis , R. Kumar , J. Novak , D. Sivakumar , A. Tomkins, Efficient implementation of large-scale multi-structural databases, Proceedings of the 31st international conference on Very large data bases, August 30-September 02, 2005, Trondheim, Norway
|
| |
15
|
|
| |
16
|
|
 |
17
|
|
| |
18
|
Marcus Fontoura , Engene Shekita , Jason Y. Zien , Sridhar Rajagopalan , Andreas Neumann, High performance index build algorithms for intranet search engines, Proceedings of the Thirtieth international conference on Very large data bases, p.1122-1133, August 31-September 03, 2004, Toronto, Canada
|
| |
19
|
R. J. Fowler, M. Paterson, and S. L. Tanimoto. Optimal packing and covering in the plane are NP-complete. IPL, 12(3):133--137, 1981.
|
| |
20
|
|
| |
21
|
Jim Gray , Surajit Chaudhuri , Adam Bosworth , Andrew Layman , Don Reichart , Murali Venkatrao , Frank Pellow , Hamid Pirahesh, Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals, Data Mining and Knowledge Discovery, v.1 n.1, p.29-53, 1997
[doi> 10.1023/A:1009726021843]
|
| |
22
|
D. Gruhl , L. Chavet , D. Gibson , J. Meyer , P. Pattanayak , A. Tomkins , J. Zien, How to build a WebFountain: An architecture for very large-scale text analytics, IBM Systems Journal, v.43 n.1, p.64-77, January 2004
|
| |
23
|
|
| |
24
|
|
| |
25
|
|
| |
26
|
Varun Kacholia , Shashank Pandit , Soumen Chakrabarti , S. Sudarshan , Rushi Desai , Hrishikesh Karambelkar, Bidirectional expansion for keyword search on graph databases, Proceedings of the 31st international conference on Very large data bases, August 30-September 02, 2005, Trondheim, Norway
|
 |
27
|
Eser Kandogan , Rajasekar Krishnamurthy , Sriram Raghavan , Shivakumar Vaithyanathan , Huaiyu Zhu, Avatar semantic search: a database approach to information retrieval, Proceedings of the 2006 ACM SIGMOD international conference on Management of data, June 27-29, 2006, Chicago, IL, USA
[doi> 10.1145/1142473.1142591]
|
| |
28
|
|
| |
29
|
|
 |
30
|
Sergey Melnik , Sriram Raghavan , Beverly Yang , Hector Garcia-Molina, Building a distributed full-text index for the Web, Proceedings of the 10th international conference on World Wide Web, p.396-406, May 01-05, 2001, Hong Kong, Hong Kong
[doi> 10.1145/371920.372095]
|
| |
31
|
L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998.
|
| |
32
|
|
| |
33
|
I. Witten, A. Moffat, and T. Bell. Managing Gigabytes. Morgan Kaufmann, 1999.
|
 |
34
|
|
 |
35
|
Ka-Ping Yee , Kirsten Swearingen , Kevin Li , Marti Hearst, Faceted metadata for image search and browsing, Proceedings of the SIGCHI conference on Human factors in computing systems, April 05-10, 2003, Ft. Lauderdale, Florida, USA
[doi> 10.1145/642611.642681]
|
 |
36
|
|
|