Contextual search and name disambiguation in email using graphs
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Seattle, Washington, USA
SESSION: Handling messages and finding experts
Pages: 27-34
Year of Publication: 2006
ISBN:1-59593-369-7
Authors
Einat Minkov  Carnegie Mellon University, Pittsburgh, PA
William W. Cohen  Carnegie Mellon University, Pittsburgh, PA
Andrew Y. Ng  Stanford University, Stanford, CA
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 weeks): 20; Downloads (12 months): 157; Citation count: 12
DOI: http://doi.acm.org/10.1145/1148170.1148179
ABSTRACT

Similarity measures for text have historically been an important tool for solving information retrieval problems. In many settings, however, documents are closely connected to other documents and to non-textual objects; for instance, email messages are connected to other messages via header information. In this paper we consider extended similarity metrics for documents and other objects embedded in graphs, computed via a lazy graph walk. We provide a detailed instantiation of this framework for email data, in which content, social networks, and a timeline are integrated in a structural graph. The framework is evaluated on two email-related problems: disambiguating names in email documents, and threading. We show that reranking schemes based on the graph-walk similarity measures often outperform baseline methods, and that further improvements can be obtained by using appropriate learning methods.
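To make the idea of graph-walk similarity concrete, here is a minimal sketch of a lazy walk over a tiny typed email graph, used to rank candidate persons for an ambiguous name. It is an illustration under stated assumptions, not the paper's implementation: the node types, edge labels, messages and person names are all hypothetical; the transition rule assumes an outgoing edge label is chosen uniformly and then a target uniformly among edges with that label (the paper may weight edge labels differently); and the walker stays in place with probability `gamma` for a fixed number of `steps`.

```python
from collections import defaultdict

# Hypothetical toy email graph as (source, edge_label, target) triples.
# Node ids carry their type as a prefix: msg:, person:, term:.
EDGES = [
    ("msg:m1", "sent_from", "person:andrew.smith"),
    ("msg:m1", "has_term", "term:andy"),
    ("msg:m2", "sent_from", "person:andy.jones"),
    ("msg:m2", "has_term", "term:andy"),
    ("msg:m2", "has_term", "term:budget"),
    ("msg:m3", "sent_to", "person:andrew.smith"),
    ("msg:m3", "has_term", "term:andy"),
]


def build_graph(edges):
    """Map node -> edge label -> list of neighbour nodes (with inverse edges)."""
    graph = defaultdict(lambda: defaultdict(list))
    for src, label, dst in edges:
        graph[src][label].append(dst)
        graph[dst][label + "_inv"].append(src)  # make every relation walkable both ways
    return graph


def lazy_walk(graph, query_nodes, steps=2, gamma=0.5):
    """Return the node distribution after `steps` lazy walk steps from the query."""
    dist = {q: 1.0 / len(query_nodes) for q in query_nodes}  # V0: uniform over query
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, p in dist.items():
            nxt[node] += gamma * p  # lazy step: stay in place with probability gamma
            labels = graph.get(node, {})
            if not labels:
                nxt[node] += (1 - gamma) * p  # no outgoing edges: keep the mass here
                continue
            for targets in labels.values():
                # Assumed weighting: uniform over labels, then uniform over targets.
                share = (1 - gamma) * p / len(labels) / len(targets)
                for t in targets:
                    nxt[t] += share
        dist = dict(nxt)
    return dist


def rank(dist, node_type):
    """Nodes of one type, sorted by walk probability (highest first)."""
    prefix = node_type + ":"
    hits = [(n, p) for n, p in dist.items() if n.startswith(prefix)]
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    g = build_graph(EDGES)
    # Which person does the term "andy" in message m3 refer to?
    scores = lazy_walk(g, ["msg:m3", "term:andy"], steps=2, gamma=0.5)
    for person, p in rank(scores, "person"):
        print(f"{person}\t{p:.4f}")
```

With `steps=2` and `gamma=0.5`, `person:andrew.smith` ends with 1/6 of the probability mass and `person:andy.jones` with 1/48, because Andrew is linked to the query message directly and to another "andy" message, while Andy Jones is reachable only through `msg:m2`. In the setting the abstract describes, such walk scores would then feed a reranking step with learned components; that step is not shown here.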
