ABSTRACT
Email summary keywords, used to concisely represent the gist of an email, can help users manage and prioritize large numbers of messages. We develop an unsupervised learning framework for selecting summary keywords from emails using latent representations of the underlying topics in a user's mailbox. This approach selects words that describe each message in the context of existing topics rather than simply selecting keywords based on a single message in isolation. We present and compare four methods for selecting summary keywords based on two well-known models for inferring latent topics: latent semantic analysis and latent Dirichlet allocation. The quality of the summary keywords is assessed by generating summaries for emails from twelve users in the Enron corpus. The summary keywords are then used in place of entire messages in two proxy tasks: automated foldering and recipient prediction. We also evaluate the extent to which summary keywords enhance the information already available in a typical email user interface by repeating the same tasks using email subject lines.
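The abstract does not spell out the four selection methods. To illustrate what "selecting words in the context of existing topics" can mean, the sketch below assumes an LDA model has already been fitted to the user's mailbox, for example with a toolkit such as MALLET. That model provides per-topic word probabilities (phi) and the message's inferred topic proportions (theta). The sketch scores each word w in message d by p(w | d) = sum over topics k of theta[d,k] * phi[k,w] and keeps the top n. The helper names, the restriction of candidates to words that occur in the message, and the toy numbers are all hypothetical. This is not presented as any of the paper's four methods.

```python
from typing import Dict, List, Sequence


def topic_word_score(
    theta_d: Sequence[float],
    phi: Sequence[Dict[str, float]],
    word: str,
) -> float:
    """p(word | message) under the topic model: sum_k theta_d[k] * phi[k][word]."""
    # Words a topic has never seen get probability 0 in that topic.
    return sum(t * topic.get(word, 0.0) for t, topic in zip(theta_d, phi))


def summary_keywords(
    tokens: List[str],
    theta_d: Sequence[float],
    phi: Sequence[Dict[str, float]],
    n: int = 3,
) -> List[str]:
    """Return the n distinct message words with the highest topic-based score.

    Hypothetical helper: candidates are restricted to words that occur in the
    message, so the summary always uses the message's own vocabulary.
    """
    candidates = set(tokens)
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(candidates, key=lambda w: (-topic_word_score(theta_d, phi, w), w))
    return ranked[:n]


# Toy, hand-made model with two topics (illustrative numbers, not fitted to Enron).
phi = [
    {"gas": 0.30, "pipeline": 0.25, "contract": 0.10, "meeting": 0.05},
    {"meeting": 0.30, "friday": 0.20, "lunch": 0.15, "contract": 0.05},
]
theta = [0.8, 0.2]  # this message is mostly about topic 0
message = ["the", "gas", "pipeline", "contract", "meeting", "friday"]

print(summary_keywords(message, theta, phi, n=3))
```

With these toy numbers the sketch returns ['gas', 'pipeline', 'meeting']. "meeting" (score 0.10) narrowly outranks "contract" (0.09) because the message's small weight on the second topic, where "meeting" is common, adds to its score. A word's rank therefore depends on the mailbox-wide topics and not only on its count in the message.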