|
ABSTRACT
The analysis of n-ary relations receives attention in many different fields, for instance biology, web mining, and social studies. In the basic setting, there are n sets of instances, and each observation associates n instances, one from each set. A common approach to explore these n-way data is the search for n-set patterns. An n-set pattern consists of specific subsets of the n instance sets such that all possible n- ary associations between the corresponding instances are observed. This provides a higher-level view of the data, revealing associative relationships between groups of instances. Here, we generalize this approach in two respects. First, we tolerate missing observations to a certain degree, that means we are also interested in n-sets where most (although not all) of the possible combinations have been recorded in the data. Second, we take association weights into account. More precisely, we propose a method to enumerate all n- sets that satisfy a minimum threshold with respect to the average association weight. Non-observed associations obtain by default a weight of zero. Technically, we solve the enumeration task using a reverse search strategy, which allows for effective pruning of the search space. In addition, our algorithm provides a ranking of the solutions and can consider further constraints. We show experimental results on artificial and real-world data sets from different domains.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
M. Ashburner, C. A. Ball, J. A. Blake, D. Botstein et al. Gene ontology: tool for the unification of biology. the gene ontology consortium. Nat Genet, 25(1):25--29, 2000.
|
| |
2
|
|
| |
3
|
G. Bejerano, N. Friedman, and N. Tishby. Efficient exact p-value computation for small sample, sparse, and surprising categorical data. J Comput Biol, 11(5):867--86, 2004.
|
| |
4
|
J. Besson, C. Robardet, L. D. Raedt, and J.-F. Boulicaut. Mining bi-sets in numerical data. In S. Dzeroski and J. Struyf, editors, KDID, volume 4747 of Lecture Notes in Computer Science, pages 11--23. Springer, 2006
|
| |
5
|
|
| |
6
|
L. Cerf, J. Besson, C. Robardet, and J.-F. Boulicaut. Data peeler: Contraint-based closed pattern mining in n-ary relations. In SDM, pages 37--48, 2008.
|
| |
7
|
A. P. Gasch, P. T. Spellman, C. M. Kao, O. Carmel-Harel, M. B. Eisen, G. Storz, D. Botstein, and P. O. Brown. Genomic Expression Programs in the Response of Yeast Cells to Environmental Changes. Mol. Biol. Cell, 11(12):4241--4257, 2000.
|
| |
8
|
|
| |
9
|
|
| |
10
|
|
| |
11
|
|
| |
12
|
|
 |
13
|
|
| |
14
|
B. Klimt and Y. Yang. The enron corpus: A new dataset for email classification research. In Machine Learning: ECML 2004, pages 217--226. Springer, 2004.
|
| |
15
|
T. G. Kolda and B. W. Bader. Tensor decompositions and applications. Technical Report SAND 2007-6702, Sandia National Laboratories, Albuquerque, NM and Livermore, CA, 2007.
|
| |
16
|
|
| |
17
|
M. Koyuturk, W. Szpankowski, and A. Grama. Assessing significance of connectivity and conservation in protein interaction networks. J Comput Biol, 14(6):747--64, 2007.
|
| |
18
|
|
| |
19
|
R. Shamir, A. Maron-Katz, A. Tanay, C. Linhart et al. Expander - an integrative program suite for microarray data analysis. BMC Bioinformatics, 6(1):232, 2005.
|
 |
20
|
|
| |
21
|
T. Uno. An efficient algorithm for enumerating pseudo cliques. In Proceedings of ISAAC 2007, pages 402--414, 2007.
|
 |
22
|
|
 |
23
|
Zhiping Zeng , Jianyong Wang , Lizhu Zhou , George Karypis, Coherent closed quasi-clique discovery from large dense graph databases, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
[doi> 10.1145/1150402.1150506]
|
 |
24
|
|
|