Category detection using hierarchical mean shift
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Paris, France
Session: Research track papers
Pages: 847-856
Year of Publication: 2009
ISBN: 978-1-60558-495-9
Authors
Pavan Vatturi, Oregon State University, Corvallis, OR, USA
Weng-Keen Wong, Oregon State University, Corvallis, OR, USA
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1557019.1557112

ABSTRACT

Many applications in surveillance, monitoring, scientific discovery, and data cleaning require the identification of anomalies. Although many methods have been developed to identify statistically significant anomalies, a more difficult task is to identify anomalies that are both interesting and statistically significant. Category detection is an emerging area of machine learning that can help address this issue using a "human-in-the-loop" approach. In this interactive setting, the algorithm asks the user either to label a query data point under an existing category or to declare that it belongs to a previously undiscovered category. The goal of category detection is to bring to the user's attention a representative data point from each category in the data in as few queries as possible. In a data set with imbalanced categories, the main challenge is identifying the rare categories or anomalies; hence, the task is often referred to as rare category detection. We present a new approach to rare category detection based on hierarchical mean shift. In our approach, a hierarchy is created by repeatedly applying mean shift to the data with an increasing bandwidth. This hierarchy allows us to identify anomalies in the data set at different scales, which are then posed as queries to the user. The main advantage of this methodology over existing approaches is that it does not require any knowledge of data set properties such as the total number of categories or the prior probabilities of the categories. Results on real-world data sets show that our hierarchical mean shift approach performs consistently better than previous techniques.
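
The idea of building a hierarchy by rerunning mean shift at increasing bandwidths can be illustrated with a small sketch. This is not the authors' implementation: the Gaussian kernel, the rule that merges modes closer than half the bandwidth, the function names, and the toy 1-D data are all assumptions made for illustration, and the paper's criterion for choosing which clusters to query is not reproduced here.

```python
import math

def mean_shift(points, bandwidth, max_iter=200, tol=1e-7):
    """Move a copy of each point uphill on a Gaussian kernel density estimate."""
    modes = []
    for start in points:
        x = tuple(start)
        for _ in range(max_iter):
            # Gaussian weight of every data point relative to the current position.
            weights = [math.exp(-math.dist(x, p) ** 2 / (2 * bandwidth ** 2))
                       for p in points]
            total = sum(weights)
            new_x = tuple(sum(w * p[d] for w, p in zip(weights, points)) / total
                          for d in range(len(x)))
            moved = math.dist(x, new_x)
            x = new_x
            if moved < tol:
                break
        modes.append(x)
    return modes

def group_modes(modes, radius):
    """Assumed merge rule: points whose modes lie within `radius` share a label."""
    centers, labels = [], []
    for m in modes:
        for k, c in enumerate(centers):
            if math.dist(m, c) < radius:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels

def hierarchical_mean_shift(points, bandwidths):
    """Run mean shift on the data once per bandwidth, smallest bandwidth first."""
    levels = []
    for h in sorted(bandwidths):
        labels = group_modes(mean_shift(points, h), radius=h / 2)
        levels.append((h, labels))
    return levels

# Toy 1-D data: a cluster near 0, one isolated point at 2.5, a cluster near 5.
points = [(0.0,), (0.1,), (-0.1,), (0.05,), (-0.05,),
          (2.5,),
          (5.0,), (5.1,), (4.9,)]

levels = hierarchical_mean_shift(points, [0.3, 1.0, 4.0])
for h, labels in levels:
    print(f"bandwidth={h}: {len(set(labels))} clusters, labels={labels}")
```

On this toy data the isolated point forms its own cluster at the smallest bandwidth and is absorbed into a larger cluster as the bandwidth grows, which is the kind of scale-dependent structure the hierarchy is meant to expose.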

