Naming every individual in news video monologues
Source:
International Multimedia Conference
Proceedings of the 12th annual ACM international conference on Multimedia
New York, NY, USA
Session: Technical session 6: learning in multi-modal data
Pages: 580-587
Year of Publication: 2004
ISBN: 1-58113-893-8
Bibliometrics: Downloads (6 weeks): 1; Downloads (12 months): 18; Citation count: 2
ABSTRACT
Naming every person who appears in broadcast news video with a name detected in the video transcript improves access to news video content. In this paper, we approach this challenging problem with a statistical learning method. We explore two categories of information extracted from multiple video modalities: features, which help identify the true name of each person, and constraints, which capture the relationships among the names of different persons. The person-naming problem is formulated as a learning framework that predicts the most likely name for each person from the features and then refines these predictions using the constraints. Experiments on ABC World News Tonight and CNN Headline News videos show that this approach outperforms a non-learning alternative by a large margin.
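The two-stage idea described in the abstract (score candidate names for each person from features, then refine the predictions with constraints across persons) can be illustrated with a small sketch. This is not the authors' model. The logistic scorer, the two hypothetical features (whether the name appears in overlaid video text, and a normalized time gap between the name mention and the person's appearance), the weights, and the single constraint that two different persons may not receive the same name are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """One (person, name) pairing to be scored. All fields are hypothetical."""
    person: str            # e.g. an identifier for a face track
    name: str              # a name detected in the transcript
    features: list[float]  # assumed features: [name in overlay text (0/1), normalized time gap]


def score(features: list[float], weights: list[float], bias: float) -> float:
    """Assumed logistic model: probability that the name belongs to the person."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def predict_independently(candidates, weights, bias):
    """Stage 1: choose each person's highest-scoring name, ignoring constraints."""
    best = {}
    for c in candidates:
        p = score(c.features, weights, bias)
        if c.person not in best or p > best[c.person][1]:
            best[c.person] = (c.name, p)
    return {person: name for person, (name, _) in best.items()}


def refine_with_constraint(candidates, weights, bias):
    """Stage 2: greedy refinement under an assumed constraint that two
    different persons may not receive the same name."""
    scored = sorted(
        ((score(c.features, weights, bias), c.person, c.name) for c in candidates),
        reverse=True,
    )
    assignment, used_names = {}, set()
    for _, person, name in scored:
        # Accept the most confident remaining pairing whose person and name are both free.
        if person not in assignment and name not in used_names:
            assignment[person] = name
            used_names.add(name)
    return assignment


# Usage with made-up data: both faces initially prefer "Smith".
weights, bias = [2.0, -1.0], 0.0
candidates = [
    Candidate("face_A", "Smith", [1.0, 0.1]),
    Candidate("face_A", "Jones", [0.0, 0.5]),
    Candidate("face_B", "Smith", [1.0, 0.3]),
    Candidate("face_B", "Jones", [0.0, 0.2]),
]
print(predict_independently(candidates, weights, bias))   # {'face_A': 'Smith', 'face_B': 'Smith'}
print(refine_with_constraint(candidates, weights, bias))  # {'face_A': 'Smith', 'face_B': 'Jones'}
```

The greedy pass resolves the conflict in which both persons independently choose "Smith", but it is not guaranteed to find the assignment with the highest total score; an exact alternative under the same assumed constraint would treat refinement as a bipartite matching problem between persons and names.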