Detecting outlying properties of exceptional objects
Source
ACM Transactions on Database Systems (TODS)
Volume 34, Issue 1 (April 2009)
Article No. 7
Year of Publication: 2009
ISSN: 0362-5915
Authors
Fabrizio Angiulli  DEIS, Università della Calabria, Rende (CS), Italy
Fabio Fassetti  ICAR-CNR, Rende (CS), Italy
Luigi Palopoli  DEIS, Università della Calabria, Rende (CS), Italy
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 17,   Downloads (12 Months): 149,   Citation Count: 0
DOI: http://doi.acm.org/10.1145/1508857.1508864

ABSTRACT

Assume you are given a data population characterized by a certain number of attributes. Assume, moreover, that you are told that one of the individuals in this population is abnormal, but no reason whatsoever is given as to why this particular individual should be considered abnormal. In many cases, one is interested in discovering such reasons. This article addresses precisely this problem: discovering sets of attributes that account for the (a priori stated) abnormality of an individual within a given dataset. A criterion is presented to measure the abnormality of combinations of attribute values featured by the given abnormal individual with respect to the reference population. Each subset of attributes is intended to represent a “property” of individuals. We distinguish between global and local properties. Global properties are subsets of attributes explaining the given abnormality with respect to the entire data population. Local properties, instead, involve two subsets of attributes: the first justifies the abnormality within the subpopulation selected by the values that the exceptional individual takes on the attributes of the second. The problem of identifying abnormal properties together with their associated explanations is formally stated and analyzed. This formal characterization is then exploited to devise efficient algorithms for detecting both global and local forms of the most abnormal properties. The experimental evidence reported in the article shows that the algorithms are able both to mine meaningful information and to accomplish the computational task by examining a negligible fraction of the search space.
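The abstract does not spell out the abnormality criterion, so the following is only a minimal sketch of the global/local distinction, under stated assumptions: attributes are categorical, the abnormal individual is itself a member of the population, and the score used here (the fraction of individuals whose value combination on a property is strictly more frequent than the outlier's) is a hypothetical stand-in, not the article's measure. The names `outlyingness`, `best_global`, `best_local`, and `max_size` are likewise hypothetical. Unlike the article's algorithms, which are reported to examine a negligible fraction of the search space, this sketch enumerates attribute subsets exhaustively.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterator, List, Sequence, Tuple

# One individual: attribute name -> categorical value.
Row = Dict[str, str]


def project(row: Row, attrs: Sequence[str]) -> Tuple[str, ...]:
    """Values taken by `row` on the attributes in `attrs`."""
    return tuple(row[a] for a in attrs)


def outlyingness(population: List[Row], outlier: Row, prop: Sequence[str]) -> float:
    """HYPOTHETICAL score (not the article's criterion): the fraction of
    `population` whose value combination on `prop` is strictly more frequent
    than the outlier's own combination. 0.0 means nothing is more common."""
    counts = Counter(project(r, prop) for r in population)
    f_o = counts[project(outlier, prop)]
    return sum(c for c in counts.values() if c > f_o) / len(population)


def subsets(attrs: Sequence[str], max_size: int) -> Iterator[Tuple[str, ...]]:
    """All non-empty attribute subsets with at most `max_size` elements."""
    for k in range(1, max_size + 1):
        yield from combinations(attrs, k)


def best_global(population: List[Row], outlier: Row, attrs: Sequence[str], max_size: int = 2):
    """Global property: each subset S is scored against the whole population.
    Exhaustive search with no pruning; returns (score, S)."""
    return max(
        ((outlyingness(population, outlier, s), s) for s in subsets(attrs, max_size)),
        key=lambda t: t[0],
    )


def best_local(population: List[Row], outlier: Row, attrs: Sequence[str], max_size: int = 1):
    """Local property: explanation E selects the individuals that agree with
    the outlier on E; a disjoint property S is then scored inside that
    subpopulation. Exhaustive search with no pruning; returns (score, E, S)."""
    best = (0.0, (), ())
    for e in subsets(attrs, max_size):
        sub = [r for r in population if project(r, e) == project(outlier, e)]
        if not sub:  # only possible if the outlier is not in the population
            continue
        rest = [a for a in attrs if a not in e]
        for s in subsets(rest, max_size):
            score = outlyingness(sub, outlier, s)
            if score > best[0]:
                best = (score, e, s)
    return best


if __name__ == "__main__":
    population = [
        {"color": "red", "size": "S"},
        {"color": "red", "size": "S"},
        {"color": "red", "size": "S"},
        {"color": "blue", "size": "S"},  # the individual flagged as abnormal
        {"color": "red", "size": "L"},
    ]
    outlier = population[3]
    attrs = ["color", "size"]
    print(best_global(population, outlier, attrs))  # (0.8, ('color',))
    print(best_local(population, outlier, attrs))   # (0.75, ('size',), ('color',))
```

In this toy population, `color` is the most abnormal global property, while the local pair says the outlier's color is unusual among individuals of the same size. The cost of the exhaustive loops grows exponentially with the number of attributes, which is why bounding `max_size` matters here and why pruning the search space, as the article's algorithms are reported to do, is the crux of an efficient method.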


