Role recognition in multiparty recordings using social affiliation networks and discrete distributions
Source: Proceedings of the 10th International Conference on Multimodal Interfaces, Chania, Crete, Greece
Session: Special session on social signal processing (oral session)
Pages: 29-36
Year of publication: 2008
ISBN: 978-1-60558-198-9
ABSTRACT
This paper presents an approach for recognizing roles in multiparty recordings. The approach has two major stages: extraction of Social Affiliation Networks (speaker diarization and representation of people in terms of their social interactions), and role recognition (application of discrete probability distributions to map people into roles). The experiments are performed on several corpora, including broadcast data and meeting recordings, for a total of roughly 90 hours of material. The results are satisfactory for the broadcast data (around 80 percent of the data time correctly labeled in terms of role), while they still need to be improved for the meeting recordings (around 45 percent of the data time correctly labeled). In both cases, the approach significantly outperforms chance.
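The two stages summarized above, plus the time-based evaluation measure, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: diarization output is assumed to be a list of (speaker, start, end) segments; the recording is split into a fixed number of equal-length time windows that play the role of "events" in the affiliation network; each speaker becomes a binary vector marking the windows in which they speak; and each role is modeled with independent per-window Bernoulli probabilities, one simple choice of discrete distribution. The function names, the Laplace smoothing, the priors and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def affiliation_network(segments, n_events):
    """Map each speaker to a binary vector over n_events equal time windows.

    Assumption (illustrative): the windows act as the 'events' of the
    affiliation network, and a speaker is affiliated with a window if any of
    their (speaker, start_sec, end_sec) segments overlaps it.
    """
    total = max(end for _, _, end in segments)
    width = total / n_events
    vectors = defaultdict(lambda: [0] * n_events)
    for spk, start, end in segments:
        first = int(start // width)
        last = min(math.ceil(end / width) - 1, n_events - 1)
        for j in range(first, last + 1):
            vectors[spk][j] = 1
    return dict(vectors)


def fit_bernoulli(vectors_by_role, alpha=1.0):
    """Per-role, per-window affiliation probabilities with Laplace smoothing.

    Assumption (illustrative): windows are independent Bernoulli variables.
    """
    params = {}
    for role, vecs in vectors_by_role.items():
        n, dim = len(vecs), len(vecs[0])
        params[role] = [
            (sum(v[j] for v in vecs) + alpha) / (n + 2 * alpha) for j in range(dim)
        ]
    return params


def log_likelihood(vec, probs):
    """Log-probability of a binary vector under independent Bernoullis."""
    return sum(math.log(p) if x else math.log(1.0 - p) for x, p in zip(vec, probs))


def assign_roles(vectors, params, priors):
    """Label each speaker with the role maximizing log prior + log-likelihood."""
    return {
        spk: max(
            params,
            key=lambda r: math.log(priors[r]) + log_likelihood(vec, params[r]),
        )
        for spk, vec in vectors.items()
    }


def time_weighted_accuracy(segments, predicted, reference):
    """Fraction of speaking time whose speaker was assigned the correct role."""
    total = correct = 0.0
    for spk, start, end in segments:
        total += end - start
        if predicted.get(spk) == reference[spk]:
            correct += end - start
    return correct / total


if __name__ == "__main__":
    # Toy 60-second recording: A opens and closes, G and H speak in between.
    segments = [("A", 0, 10), ("G", 10, 30), ("H", 30, 50), ("A", 50, 60)]
    vectors = affiliation_network(segments, n_events=6)
    # Hypothetical labeled training vectors for two roles.
    training = {
        "anchor": [[1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1]],
        "guest": [[0, 1, 1, 0, 0, 0], [0, 0, 1, 1, 1, 0], [0, 0, 0, 1, 1, 0]],
    }
    params = fit_bernoulli(training)
    priors = {"anchor": 0.4, "guest": 0.6}  # proportional to training counts
    predicted = assign_roles(vectors, params, priors)
    reference = {"A": "anchor", "G": "guest", "H": "guest"}
    print(predicted)  # {'A': 'anchor', 'G': 'guest', 'H': 'guest'}
    print(time_weighted_accuracy(segments, predicted, reference))  # 1.0
```

Weighting accuracy by speaking time rather than by number of speakers means that participants who talk longer count more toward the score, which is what "percent of the data time correctly labeled" describes. Whether the paper's actual features and distributions match this toy Bernoulli model is not stated in the abstract.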