Generating summary keywords for emails using topics
Source
International Conference on Intelligent User Interfaces
Proceedings of the 13th International Conference on Intelligent User Interfaces
Gran Canaria, Spain
SESSION: Recommenders
Pages 199-206  
Year of Publication: 2008
ISBN:978-1-59593-987-6
Authors
Mark Dredze  University of Pennsylvania, Philadelphia, PA
Hanna M. Wallach  University of Cambridge, Cambridge, UK
Danny Puller  University of Pennsylvania, Philadelphia, PA
Fernando Pereira  University of Pennsylvania, Philadelphia, PA
Sponsors
SIGART: ACM Special Interest Group on Artificial Intelligence
AAAI: Association for the Advancement of Artificial Intelligence
SIGCHI: ACM Special Interest Group on Computer-Human Interaction
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1378773.1378800

ABSTRACT

Email summary keywords, used to concisely represent the gist of an email, can help users manage and prioritize large numbers of messages. We develop an unsupervised learning framework for selecting summary keywords from emails using latent representations of the underlying topics in a user's mailbox. This approach selects words that describe each message in the context of existing topics rather than simply selecting keywords based on a single message in isolation. We present and compare four methods for selecting summary keywords based on two well-known models for inferring latent topics: latent semantic analysis and latent Dirichlet allocation. The quality of the summary keywords is assessed by generating summaries for emails from twelve users in the Enron corpus. The summary keywords are then used in place of entire messages in two proxy tasks: automated foldering and recipient prediction. We also evaluate the extent to which summary keywords enhance the information already available in a typical email user interface by repeating the same tasks using email subject lines.
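The idea of choosing keywords in the context of the whole mailbox, rather than from one message in isolation, can be illustrated with a small sketch. This is not the authors' implementation and is not one of the four methods the paper compares. It assumes a toy whitespace tokenizer, raw word counts, and a plain truncated SVD as a stand-in for latent semantic analysis. The function name `lsa_summary_keywords`, its parameters, and the sample mailbox are all hypothetical.

```python
import numpy as np

def lsa_summary_keywords(mailbox, message_index, rank=2, top_k=3):
    """Rank the words of one message by their weight in a low-rank
    (LSA-style) reconstruction of the mailbox's message-by-word count matrix.
    Hypothetical helper for illustration only."""
    # Toy tokenizer (assumption): lowercase and split on whitespace.
    tokenized = [m.lower().split() for m in mailbox]
    vocab = sorted({w for toks in tokenized for w in toks})
    column = {w: j for j, w in enumerate(vocab)}

    # Message-by-word count matrix for the whole mailbox.
    counts = np.zeros((len(mailbox), len(vocab)))
    for i, toks in enumerate(tokenized):
        for w in toks:
            counts[i, column[w]] += 1.0

    # Truncated SVD: keep only the top `rank` latent dimensions.
    u, s, vt = np.linalg.svd(counts, full_matrices=False)
    k = min(rank, len(s))
    smoothed = (u[:, :k] * s[:k]) @ vt[:k]

    # Candidates are the words that actually occur in the message; they are
    # ordered by their smoothed weight, which reflects mailbox-wide structure.
    candidates = sorted(set(tokenized[message_index]))
    candidates.sort(key=lambda w: -smoothed[message_index, column[w]])
    return candidates[:top_k]

# Made-up messages, not drawn from the Enron corpus.
mailbox = [
    "gas pipeline contract meeting gas",
    "pipeline contract pricing review",
    "lunch friday team lunch",
    "team offsite friday schedule",
]
keywords = lsa_summary_keywords(mailbox, 0)
print(keywords)
```

With a small `rank`, words that co-occur with the message's content elsewhere in the mailbox are weighted up. With a `rank` at least as large as the number of messages, the reconstruction is exact and the ranking reduces to raw within-message counts, which is the single-message baseline the abstract contrasts against.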



