[1] J. Anemueller et al. Biologically motivated audio-visual cue integration for object categorization. In CogSys, 2008.
[2] Z. Barzelay and Y. Schechner. Harmony in motion. In Proc. CVPR, pages 1--8, 2007.
[3] M.J. Beal et al. A graphical model for audiovisual object tracking. IEEE Trans. PAMI, 25(7):828--836, 2003.
[4] S. Birchfield. KLT: An Implementation of the Kanade-Lucas-Tomasi Feature Tracker. http://vision.stanford.edu/birch.
[5] S.F. Chang et al. Columbia University TRECVID-2005 video search and high-level feature extraction. In NIST TRECVID Workshop, Gaithersburg, MD, 2005.
[6] S.F. Chang et al. Large-scale multimodal semantic concept detection for consumer video. In ACM MIR, 2007.
[7] Y.X. Chen et al. Image categorization by learning and reasoning with regions. JMLR, 5:913--939, 2004.
[8] M. Cristani et al. Audio-visual event recognition in surveillance video sequences. IEEE Trans. Multimedia, 9(2):257--267, 2007.
[9] S. Chu et al. Environmental sound recognition using MP-based features. In Proc. ICASSP, pages 1--4, 2008.
[10] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Proc. CVPR, pages 886--893, 2005.
[11] D. Dementhon and D. Doermann. Video retrieval using spatial-temporal descriptors. In ACM Multimedia, 2003.
[12] Y. Deng and B.S. Manjunath. Unsupervised segmentation of color-texture regions in images and video. IEEE Trans. PAMI, 23(8):800--810, 2001.
[13] J. Friedman et al. Additive logistic regression: a statistical view of boosting. Ann. of Stat., 28(2):337--407, 2000.
[14] K. Grauman and T. Darrell. The pyramid match kernel: Discriminative classification with sets of image features. In Proc. ICCV, 2:1458--1465, 2005.
[15] B. Han et al. Incremental density approximation and kernel-based Bayesian filtering for object tracking. In Proc. CVPR, pages 638--644, 2004.
[16] J. Hershey and J. Movellan. Audio-vision: Using audio-visual synchrony to locate sounds. In NIPS, 1999.
[17] K. Iwano et al. Audio-visual speech recognition using lip information extracted from side-face images. EURASIP JASMP, 2007(1):4--4, 2007.
[18] A. Jepson et al. Robust online appearance models for visual tracking. IEEE Trans. PAMI, 25(10):1296--1311, 2003.
[19] R. Kaucic, B. Dalton, and A. Blake. Real-time lip tracking for audio-visual speech recognition applications. In Proc. ECCV, vol. 2, pages 376--387, 1996.
[20] R. Gribonval and S. Krstulovic. MPTK, the matching pursuit toolkit. http://mptk.irisa.fr/
[21] A. Loui et al. Kodak's consumer video benchmark data set: concept definition and annotation. In ACM SIGMM Int'l Workshop on MIR, pages 245--254, 2007.
[22] D. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91--110, 2004.
[23] B.D. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Proc. Imaging understanding workshop, pages 121--130, 1981.
[24] S. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Processing, 41(12):3397--3415, 1993.
[25] O. Maron et al. A framework for multiple-instance learning. In NIPS, 1998.
[26] J.C. Niebles et al. Extracting moving people from internet videos. In Proc. ECCV, pages 527--540, 2008.
[27] NIST. TREC Video Retrieval Evaluation (TRECVID). 2001 -- 2008. http://www-nlpir.nist.gov/projects/trecvid/
[28] J. Ogle and D. Ellis. Fingerprinting to identify repeated sound events in long-duration personal audio recordings. In Proc. ICASSP, pages I-233-236, 2007.
[29] F. Petitcolas. MPEG for MATLAB. http://www.petitcolas.net/fabien/software/mpeg
[30] J. Shi and C. Tomasi. Good features to track. In Proc. CVPR, pages 593--600, 1994.
[31] C. Stauffer and W.E.L. Grimson. Learning patterns of activity using real-time tracking. IEEE Trans. PAMI, 22(8):747--757, 2000.
[32] K. Tieu and P. Viola. Boosting image retrieval. IJCV, 56(1-2):228--235, 2000.
[33] V. Vapnik. Statistical learning theory. Wiley-Interscience, New York, 1998.
[34] X.G. Wang et al. Learning Semantic Scene Models by Trajectory Analysis. In Proc. ECCV, pages 110--123, 2006.
[35] Y. Wu et al. Multimodal information fusion for video concept detection. In Proc. ICIP, pages 2391--2394, 2004.
[36] C. Yang et al. Region-based image annotation using asymmetrical support vector machine-based multiple-instance learning. In Proc. CVPR, pages 2057--2063, 2006.
[37] G.Q. Zhao et al. Large head movement tracking using SIFT-based registration. In ACM Multimedia, 2007.
[38] H. Zhou et al. Object tracking using SIFT features and mean shift. Com. Vis. & Ima. Und., 113(3):345--352, 2009.
[39] J.C. Niebles et al. Extracting moving people from