ABSTRACT
Content Management Systems (CMS) store enterprise data such as insurance claims, insurance policies, legal documents, and patent applications, or archival data, as in the case of digital libraries. Search over content supports information retrieval but gives users little insight into the data as a whole. A more analytical view is needed, one provided through aggregations, groupings, trends, pivot tables, charts, and so on. Multidimensional Content eXploration (MCX) is about effectively analyzing and exploring large amounts of content by combining keyword search with OLAP-style aggregation, navigation, and reporting. We focus on unstructured data, or more generally on documents or content with limited metadata, as typically encountered in CMS. We formally present how CMS content and metadata should be organized in a well-defined multidimensional structure so that sophisticated queries can be expressed and evaluated. The CMS metadata provide traditional OLAP static dimensions, which are combined with dynamic dimensions discovered from the analyzed keyword search result, as well as measures for document scores based on the link structure between the documents. In addition, we provide means for multidimensional content exploration through traditional OLAP rollup/drilldown operations on the static and dynamic dimensions, as well as solutions for multi-cube analysis and dynamic navigation of the content. We present our prototype, called DBPubs, which stores research publications as documents that can be searched and, most importantly, analyzed and explored. Finally, we present experimental results on the efficiency and effectiveness of our approach.