|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ABSTRACT
We study the problem of economical representation of subsets of structured sets, which are sets equipped with a set cover or a family of preorders. Given a structured set U, and a language L whose expressions define subsets of U, the problem of minimum description length in L (L-MDL) is: “given a subset V of U, find a shortest string in L that defines V.” Depending on the structure and the language, the MDL-problem is in general intractable. We study the complexity of the MDL-problem for various structures and show that certain specializations are tractable. The families of focus are hierarchy, linear order, and their multidimensional extensions; these are found in the context of statistical and OLAP databases. In the case of general OLAP databases, data organization is a mixture of multidimensionality, hierarchy, and ordering, which can also be viewed naturally as a cover-structured ordered set. Efficient algorithms are provided for the MDL-problem for hierarchical and linearly ordered structures, and we prove that the multidimensional extensions are NP-complete. Finally, we illustrate the application of the theory to summarization of large result sets and (multi) query optimization for ROLAP queries. REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
INDEX TERMS
Primary Classification:
Additional Classification:
General Terms:
Keywords:
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||