| Dynamic faceted search for discovery-driven analysis |
Source
|
Conference on Information and Knowledge Management
Proceeding of the 17th ACM conference on Information and knowledge management
Napa Valley, California, USA
SESSION: DB: faceted search, web query results presentation
Pages 3-12
Year of Publication: 2008
ISBN:978-1-59593-991-3
|
|
Authors
|
|
Debabrata Dash
|
Carnegie Mellon University, Pittsburgh, PA, USA
|
|
Jun Rao
|
IBM Almaden Research Center, San Jose, CA, USA
|
|
Nimrod Megiddo
|
IBM Almaden Research Center, San Jose, CA, USA
|
|
Anastasia Ailamaki
|
Carnegie Mellon University, Pittsburgh, PA, USA and Ecole Polytechnique Fédérale de Lausanne
|
|
Guy Lohman
|
IBM Almaden Research Center, San Jose, CA, USA
|
|
ABSTRACT
We propose a dynamic faceted search system for discovery-driven analysis on data with both textual content and structured attributes. From a keyword query, we want to dynamically select a small set of "interesting" attributes and present aggregates on them to a user. Similar to work in OLAP exploration, we define "interestingness" as how surprising an aggregated value is, based on a given expectation. We make two new contributions by proposing a novel "navigational" expectation that's particularly useful in the context of faceted search, and a novel interestingness measure through judicious application of p-values. Through a user survey, we find the new expectation and interestingness metric quite effective. We develop an efficient dynamic faceted search system by improving a popular open source engine, Solr. Our system exploits compressed bitmaps for caching the posting lists in an inverted index, and a novel directory structure called a bitset tree for fast bitset intersection. We conduct a comprehensive experimental study on large real data sets and show that our engine performs 2 to 3 times faster than Solr.
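The abstract describes interestingness as how surprising an aggregated value is relative to an expectation, measured through p-values. The sketch below illustrates that general idea only. It assumes a simple binomial model in which each of the n documents matching a query falls under a facet value with an expected rate p0; this model, the function names, and the sample numbers are all illustrative assumptions, not the measure defined in the paper.

```python
from math import comb


def binomial_tail_p_value(k: int, n: int, p0: float) -> float:
    """Two-sided p-value for seeing k hits among n documents.

    Assumption (not from the paper): hits follow Binomial(n, p0),
    where p0 is the expected share of the facet value.
    """
    # Probability of each possible count 0..n under the expectation.
    pmf = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    lower = sum(pmf[: k + 1])  # P(X <= k)
    upper = sum(pmf[k:])       # P(X >= k)
    # Double the smaller tail and cap at 1.
    return min(1.0, 2 * min(lower, upper))


def rank_facet_values(observed: dict, expected_rate: dict, n: int) -> list:
    """Order facet values from most to least surprising (smallest p-value first)."""
    scored = [
        (value, binomial_tail_p_value(count, n, expected_rate[value]))
        for value, count in observed.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical example: 10 matching documents, two values of a "year" facet.
ranked = rank_facet_values(
    observed={"2007": 9, "2001": 5},
    expected_rate={"2007": 0.5, "2001": 0.5},
    n=10,
)
```

In this toy setup the value whose count departs most from its expected share receives the smallest p-value and is ranked first. Summing the full probability mass function is practical only for small n; a real system would need a cheaper tail computation.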