Multidimensional content eXploration
Source
Proceedings of the VLDB Endowment, Volume 1, Issue 1 (August 2008)
Session: IR and forms
Pages 660-671
Year of Publication: 2008
ISSN: 2150-8097
Authors
Alkis Simitsis  IBM Almaden Research Center, San Jose, CA
Akanksha Baid  Univ. Wisconsin Madison
Yannis Sismanis  IBM Almaden Research Center, San Jose, CA
Berthold Reinwald  IBM Almaden Research Center, San Jose, CA
DOI: http://doi.acm.org/10.1145/1453856.1453929

ABSTRACT

Content Management Systems (CMS) store enterprise data such as insurance claims, insurance policies, legal documents, and patent applications, as well as archival data such as the collections of digital libraries. Search over content supports information retrieval but gives users little insight into the data. A more analytical view is needed, based on analysis, aggregations, groupings, trends, pivot tables, charts, and so on. Multidimensional Content eXploration (MCX) is about effectively analyzing and exploring large amounts of content by combining keyword search with OLAP-style aggregation, navigation, and reporting. We focus on unstructured data, or more generally documents with limited metadata, as typically encountered in CMS. We formally present how CMS content and metadata should be organized in a well-defined multidimensional structure so that sophisticated queries can be expressed and evaluated. The CMS metadata provide traditional static OLAP dimensions, which are combined with dynamic dimensions discovered from the analyzed keyword search result, as well as with measures for document scores based on the link structure between the documents. In addition, we support multidimensional content exploration through traditional OLAP roll-up and drill-down operations on both static and dynamic dimensions, and we provide solutions for multi-cube analysis and dynamic navigation of the content. We present our prototype, DBPubs, which stores research publications as documents that can be searched and, most importantly, analyzed and explored. Finally, we present experimental results on the efficiency and effectiveness of our approach.
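To make the combination of keyword search and OLAP-style analysis more concrete, here is a minimal Python sketch of the general idea. It is not the DBPubs implementation, and every name in it is hypothetical. It assumes a toy corpus where "year" and "venue" stand in for CMS metadata (static dimensions) and "cites" stands in for links between documents. It also assumes plain boolean keyword matching, frequent terms in the search result as a stand-in for dynamic-dimension discovery, and in-link count (in-degree) as a stand-in for a link-based score measure. The abstract does not specify the paper's actual discovery or scoring methods.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus (assumption): "year"/"venue" play the role of CMS
# metadata (static dimensions); "cites" plays the role of inter-document links.
DOCS = [
    {"id": 1, "year": 2007, "venue": "VLDB", "text": "olap cube aggregation", "cites": []},
    {"id": 2, "year": 2008, "venue": "VLDB", "text": "search olap documents", "cites": [1]},
    {"id": 3, "year": 2008, "venue": "SIGMOD", "text": "search ranking", "cites": [1, 2]},
    {"id": 4, "year": 2008, "venue": "SIGMOD", "text": "search olap navigation", "cites": [2]},
    {"id": 5, "year": 2007, "venue": "VLDB", "text": "search olap documents", "cites": [1, 2, 4]},
]


def keyword_search(docs, keyword):
    """Boolean keyword match: keep documents whose text contains the keyword."""
    return [d for d in docs if keyword in d["text"].split()]


def link_scores(docs):
    """Assumed link-based score: in-degree (number of documents citing each one)."""
    scores = Counter()
    for d in docs:
        for target in d["cites"]:
            scores[target] += 1
    return scores


def discover_terms(result, keyword, k=2):
    """Assumed dynamic-dimension discovery: the k most frequent terms in the
    search result, excluding the query keyword; ties broken alphabetically."""
    counts = Counter(t for d in result for t in set(d["text"].split()) if t != keyword)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]


def static_dim(field):
    """Static dimension: read a metadata field of the document."""
    return lambda d: d[field]


def term_dim(term):
    """Dynamic dimension: does the document mention a discovered term?"""
    return lambda d: term in d["text"].split()


def aggregate(result, dims, scores):
    """Group the result by the given dimensions; measures are COUNT and SUM(score)."""
    cube = defaultdict(lambda: [0, 0])
    for d in result:
        cell = cube[tuple(dim(d) for dim in dims)]
        cell[0] += 1
        cell[1] += scores[d["id"]]  # Counter returns 0 for uncited documents
    return {key: tuple(val) for key, val in cube.items()}


if __name__ == "__main__":
    scores = link_scores(DOCS)
    result = keyword_search(DOCS, "search")
    terms = discover_terms(result, "search")
    print(terms)  # ['olap', 'documents']
    # Drill-down view: year x (mentions "olap"?)
    print(aggregate(result, [static_dim("year"), term_dim(terms[0])], scores))
    # {(2008, True): (2, 4), (2008, False): (1, 0), (2007, True): (1, 0)}
    # Roll-up view: year only
    print(aggregate(result, [static_dim("year")], scores))
    # {(2008,): (3, 4), (2007,): (1, 0)}
```

In this sketch, rolling up or drilling down simply means re-aggregating the same search result over fewer or more dimensions. Static and dynamic dimensions are both modeled as functions of a document, so they can be mixed freely in one grouping.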

