ACM Home Page
Please provide us with feedback. Feedback
Multi-way set enumeration in real-valued tensors
Full text PdfPdf (226 KB)
Source
International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the 2nd Workshop on Data Mining using Matrices and Tensors table of contents
Paris, France
Article No. 4  
Year of Publication: 2009
ISBN:978-1-60558-673-1
Authors
Elisabeth Georgii  MPI for Biological Cybernetics/Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany
Koji Tsuda  National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
Bernhard Schölkopf  MPI for Biological Cybernetics, Tübingen, Germany
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 15,   Downloads (12 Months): 26,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1581114.1581118
What is a DOI?

ABSTRACT

The analysis of n-ary relations receives attention in many different fields, for instance biology, web mining, and social studies. In the basic setting, there are n sets of instances, and each observation associates n instances, one from each set. A common approach to explore these n-way data is the search for n-set patterns. An n-set pattern consists of specific subsets of the n instance sets such that all possible n- ary associations between the corresponding instances are observed. This provides a higher-level view of the data, revealing associative relationships between groups of instances. Here, we generalize this approach in two respects. First, we tolerate missing observations to a certain degree, that means we are also interested in n-sets where most (although not all) of the possible combinations have been recorded in the data. Second, we take association weights into account. More precisely, we propose a method to enumerate all n- sets that satisfy a minimum threshold with respect to the average association weight. Non-observed associations obtain by default a weight of zero. Technically, we solve the enumeration task using a reverse search strategy, which allows for effective pruning of the search space. In addition, our algorithm provides a ranking of the solutions and can consider further constraints. We show experimental results on artificial and real-world data sets from different domains.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
M. Ashburner, C. A. Ball, J. A. Blake, D. Botstein et al. Gene ontology: tool for the unification of biology. the gene ontology consortium. Nat Genet, 25(1):25--29, 2000.
 
2
 
3
G. Bejerano, N. Friedman, and N. Tishby. Efficient exact p-value computation for small sample, sparse, and surprising categorical data. J Comput Biol, 11(5):867--86, 2004.
 
4
J. Besson, C. Robardet, L. D. Raedt, and J.-F. Boulicaut. Mining bi-sets in numerical data. In S. Dzeroski and J. Struyf, editors, KDID, volume 4747 of Lecture Notes in Computer Science, pages 11--23. Springer, 2006
 
5
 
6
L. Cerf, J. Besson, C. Robardet, and J.-F. Boulicaut. Data peeler: Contraint-based closed pattern mining in n-ary relations. In SDM, pages 37--48, 2008.
 
7
A. P. Gasch, P. T. Spellman, C. M. Kao, O. Carmel-Harel, M. B. Eisen, G. Storz, D. Botstein, and P. O. Brown. Genomic Expression Programs in the Response of Yeast Cells to Environmental Changes. Mol. Biol. Cell, 11(12):4241--4257, 2000.
 
8
 
9
 
10
 
11
 
12
13
 
14
B. Klimt and Y. Yang. The enron corpus: A new dataset for email classification research. In Machine Learning: ECML 2004, pages 217--226. Springer, 2004.
 
15
T. G. Kolda and B. W. Bader. Tensor decompositions and applications. Technical Report SAND 2007-6702, Sandia National Laboratories, Albuquerque, NM and Livermore, CA, 2007.
 
16
 
17
M. Koyuturk, W. Szpankowski, and A. Grama. Assessing significance of connectivity and conservation in protein interaction networks. J Comput Biol, 14(6):747--64, 2007.
 
18
 
19
R. Shamir, A. Maron-Katz, A. Tanay, C. Linhart et al. Expander - an integrative program suite for microarray data analysis. BMC Bioinformatics, 6(1):232, 2005.
20
 
21
T. Uno. An efficient algorithm for enumerating pseudo cliques. In Proceedings of ISAAC 2007, pages 402--414, 2007.
22
23
24

Collaborative Colleagues:
Elisabeth Georgii: colleagues
Koji Tsuda: colleagues
Bernhard Schölkopf: colleagues