ABSTRACT
In a corpus of jokes, a human might judge two documents to be the "same joke" even if characters, locations, and other details are varied. A given joke could be retold with an entirely different vocabulary while still keeping its identity. Since most retrieval systems consider documents to be related only when their word content is similar, we propose joke retrieval as a domain where standard language models may fail. Other meaning-centric domains include logic puzzles, proverbs, and recipes; in such domains, new techniques may be required for effective search. For jokes, a necessary component of any retrieval system is the ability to identify the "same joke," so we examine this task in both ranking and classification settings. We exploit the structure of jokes to develop two domain-specific alternatives to the "bag of words" document model. In the first, only the punch lines, or final sentences, are compared; in the second, certain categories of words (e.g., professions and countries) are tagged and treated as interchangeable. Each technique works well for some jokes but not others. By combining the methods using machine learning, we create a hybrid that achieves higher performance than any individual approach.
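The two alternative document models can be illustrated with a minimal sketch. It assumes a crude regex sentence splitter, small hypothetical category lexicons (the abstract names professions and countries as examples but gives no word lists), and plain cosine similarity over term counts. Cosine similarity is swapped in here for readability and is not necessarily the scoring function used in the paper. The helper names (`punchline`, `tag_categories`, `similarity_features`) are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical category lexicons: illustrative only, not the paper's lists.
CATEGORIES = {
    "PROFESSION": {"doctor", "lawyer", "priest", "engineer", "teacher"},
    "COUNTRY": {"france", "germany", "england", "ireland", "scotland"},
}


def sentences(text):
    """Split text into sentences after '.', '!' or '?' (a crude heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokens(text):
    """Lowercased word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def punchline(text):
    """Treat the final sentence of a joke as its punch line."""
    sents = sentences(text)
    return sents[-1] if sents else ""


def tag_categories(words):
    """Replace each word found in a category lexicon with its category tag,
    so that e.g. 'doctor' and 'lawyer' become interchangeable."""
    tagged = []
    for w in words:
        for tag, lexicon in CATEGORIES.items():
            if w in lexicon:
                tagged.append(tag)
                break
        else:  # no lexicon matched: keep the word itself
            tagged.append(w)
    return tagged


def cosine(a, b):
    """Cosine similarity between two bags of words given as token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def similarity_features(joke_a, joke_b):
    """Score a pair of jokes under the baseline and both alternative models."""
    return {
        "bag_of_words": cosine(tokens(joke_a), tokens(joke_b)),
        "punch_line": cosine(tokens(punchline(joke_a)), tokens(punchline(joke_b))),
        "tagged": cosine(tag_categories(tokens(joke_a)),
                         tag_categories(tokens(joke_b))),
    }
```

For two retellings that differ only in the profession ("A doctor walks into a bar..." versus "A lawyer walks into a bar..."), the tagged and punch-line scores are both 1.0, while the plain bag-of-words score falls below 1.0. The abstract states that the methods are combined using machine learning. A per-pair feature dictionary like this one is one plausible input to such a learner, but the abstract does not specify which learner or features the authors actually used.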