A multimodal speaker detection and tracking system for teleconferencing
Source International Multimedia Conference archive
Proceedings of the tenth ACM international conference on Multimedia
Juan-les-Pins, France
DEMONSTRATION SESSION: Demonstration session 2
Pages: 427 - 428  
Year of Publication: 2002
ISBN: 1-58113-620-X
Authors
Billibon H. Yoshimi  IBM T. J. Watson Research Lab, Yorktown Heights, NY
Gopal S. Pingali  IBM T. J. Watson Research Lab, Hawthorne, NY
Sponsors
SIGGRAPH: ACM Special Interest Group on Computer Graphics and Interactive Techniques
SIGCOMM: ACM Special Interest Group on Data Communication
SIGMULTIMEDIA: ACM Special Interest Group on Multimedia
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/641007.641100

ABSTRACT

A serious problem in both audio and video conferencing facilities available today is the difficulty in determining who is speaking among a large number of participants. There is a strong need for developing meeting room infrastructure and teleconference facilities that improve the sense of presence and participation experienced in remote meetings. We present a distributed multimodal tracking system that uses multiple cameras and microphones to automatically select the current speaker among multiple meeting participants. The system actively obtains and transmits video showing a good view of the selected speaker. The tracking system is integrated into a web-based video conferencing application that connects seven meeting rooms around the globe. An important part of designing such a system is to determine sensor placement and configuration through systematic experiments in the actual rooms where the system is deployed.



