Photo-based question answering
Source
International Multimedia Conference
Proceedings of the 16th ACM International Conference on Multimedia
Vancouver, British Columbia, Canada
Session: Applications track A3: photo
Pages: 389-398
Year of Publication: 2008
ISBN: 978-1-60558-303-7
Authors
Tom Yeh, MIT EECS & CSAIL, Cambridge, MA, USA
John J. Lee, MIT EECS & CSAIL, Cambridge, MA, USA
Trevor Darrell, UC Berkeley & ICSI, Berkeley, CA, USA
Sponsors
ACM: Association for Computing Machinery
SIGMULTIMEDIA: ACM Special Interest Group on Multimedia
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 weeks): 16; Downloads (12 months): 200; Citation count: 0
DOI: http://doi.acm.org/10.1145/1459359.1459412

ABSTRACT

Photo-based question answering is a useful way of finding information about physical objects. Current question answering (QA) systems are text-based and can be difficult to use when a question involves an object with distinct visual features. A photo-based QA system lets the user refer to the object directly with a photo. We develop a three-layer system architecture for photo-based QA that brings together recent technical achievements in question answering and image matching. The first layer, template-based QA, matches a query photo to online images and extracts structured data from multimedia databases to answer questions about the photo. To simplify image matching, it uses the question text to filter images by category and keyword. The second layer, information retrieval QA, searches an internal repository of resolved photo-based questions to retrieve relevant answers. The third layer, human-computation QA, leverages community experts to handle the most difficult cases. A series of experiments on a pilot dataset of 30,000 images of books, movie DVD covers, grocery items, and landmarks demonstrates the technical feasibility of this architecture. We present three prototypes that show how photo-based QA can be built into an online album, a text-based QA system, and a mobile application.
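The abstract describes a cascade in which each layer handles only the questions the previous layer could not answer. The following is a minimal Python sketch of that control flow under loudly stated assumptions: every class and function name is hypothetical, each photo is reduced to a small global feature vector, cosine similarity stands in for the paper's image matching (which is not reproduced here), and the 0.9 thresholds are arbitrary. It is an illustration, not the authors' implementation.

```python
import math
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


# Hypothetical data types for illustration; not taken from the paper.
@dataclass
class Question:
    features: List[float]  # assumed global descriptor of the query photo
    text: str


@dataclass
class Item:
    category: str  # e.g. "book", "dvd", "grocery", "landmark"
    keywords: List[str]  # assumed lowercase
    features: List[float]
    facts: Dict[str, str] = field(default_factory=dict)  # structured data, e.g. {"author": ...}


@dataclass
class Resolved:
    features: List[float]  # photo of a previously answered question
    text: str
    answer: str


def tokens(text: str) -> Set[str]:
    """Lowercase word set with punctuation removed."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: List[float], b: List[float]) -> float:
    """Stand-in similarity score; NOT the paper's image matcher."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def template_layer(q: Question, catalog: List[Item], threshold: float = 0.9) -> Optional[str]:
    words = tokens(q.text)
    # Use the question text to narrow candidates by category or keyword before matching.
    candidates = [it for it in catalog if it.category in words or words & set(it.keywords)]
    if not candidates:
        return None
    best = max(candidates, key=lambda it: cosine(q.features, it.features))
    if cosine(q.features, best.features) < threshold:
        return None
    # Template step: return the structured field that the question names.
    for attr, value in best.facts.items():
        if attr in words:
            return value
    return None


def retrieval_layer(q: Question, repo: List[Resolved], threshold: float = 0.9) -> Optional[str]:
    words = tokens(q.text)
    best, best_overlap = None, 0
    for r in repo:
        if cosine(q.features, r.features) < threshold:
            continue  # photo shows a different object; skip
        overlap = len(words & tokens(r.text))  # number of shared question words
        if overlap > best_overlap:
            best, best_overlap = r, overlap
    return best.answer if best is not None else None


def answer(q: Question, catalog: List[Item], repo: List[Resolved],
           expert_queue: List[Question]) -> Tuple[str, Optional[str]]:
    """Try each automatic layer in order; fall through to human experts last."""
    result = template_layer(q, catalog)
    if result is not None:
        return "template", result
    result = retrieval_layer(q, repo)
    if result is not None:
        return "retrieval", result
    expert_queue.append(q)  # human-computation layer: answered later by experts
    return "human", None


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    catalog = [Item("book", ["novel"], [1.0, 0.0, 0.2], {"author": "Example Author"})]
    repo = [Resolved([0.1, 0.2, 1.0], "Where was this landmark built?", "Example answer")]
    queue: List[Question] = []
    print(answer(Question([0.98, 0.01, 0.2], "Who is the author of this book?"), catalog, repo, queue))
    # ('template', 'Example Author')
    print(answer(Question([0.1, 0.2, 1.0], "When was this landmark built?"), catalog, repo, queue))
    # ('retrieval', 'Example answer')
    print(answer(Question([0.5, 0.5, 0.5], "What is this plant?"), catalog, repo, queue))
    # ('human', None)
```

Ordering the layers from cheapest to most expensive means community experts see only the questions that neither automatic layer could resolve. The keyword filter in `template_layer` mirrors the abstract's use of the question text to shrink the set of images that must be matched.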


