ABSTRACT
Similarity measures for text have historically been an important tool for solving information retrieval problems. In many interesting settings, however, documents are closely connected to other documents, as well as to non-textual objects: for instance, email messages are connected to other messages via header information. In this paper we consider extended similarity metrics for documents and other objects embedded in graphs, facilitated via a lazy graph walk. We provide a detailed instantiation of this framework for email data, where content, social networks and a timeline are integrated in a structural graph. The suggested framework is evaluated for two email-related problems: disambiguating names in email documents, and threading. We show that reranking schemes based on the graph-walk similarity measures often outperform baseline methods, and that further improvements can be obtained by use of appropriate learning methods.
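As a rough illustration of the lazy graph walk mentioned above, the following sketch propagates probability mass from a start node over a small weighted graph. It is not taken from the paper: the function name `lazy_graph_walk`, the stay probability of 0.5, the fixed step count, the weight-proportional transitions, and the toy email graph are all assumptions made for this example.

```python
from collections import defaultdict


def lazy_graph_walk(graph, start, steps=3, stay_prob=0.5):
    """Return the node distribution after `steps` lazy-walk steps from `start`.

    graph: dict mapping node -> {neighbor: non-negative edge weight}.
    At each step the walker stays put with probability `stay_prob`
    (an assumed value), otherwise moves to a neighbor chosen in
    proportion to edge weight.
    """
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, mass in dist.items():
            neighbors = graph.get(node, {})
            total = sum(neighbors.values())
            if total == 0:
                # Dead end: all mass stays on this node.
                nxt[node] += mass
                continue
            nxt[node] += stay_prob * mass
            for nb, weight in neighbors.items():
                nxt[nb] += (1 - stay_prob) * mass * weight / total
        dist = dict(nxt)
    return dist


# Hypothetical toy graph: messages linked to people and to a shared term.
toy_graph = {
    "msg1": {"alice": 1.0, "term:budget": 1.0},
    "msg2": {"bob": 1.0, "term:budget": 1.0},
    "alice": {"msg1": 1.0},
    "bob": {"msg2": 1.0},
    "term:budget": {"msg1": 1.0, "msg2": 1.0},
}

scores = lazy_graph_walk(toy_graph, "alice", steps=2)
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In this sketch, the resulting distribution serves as a similarity score between the start node and every reachable node, so sorting by score gives a ranked list that a reranking step could then refine.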