Similarity measures for tracking information flow
Source: Conference on Information and Knowledge Management
Proceedings of the 14th ACM international conference on Information and knowledge management
Bremen, Germany
SESSION: Paper session IR-6 (information retrieval): IR models 1
Pages: 517-524
Year of Publication: 2005
ISBN: 1-59593-140-6
Authors
Donald Metzler  University of Massachusetts, Amherst, MA
Yaniv Bernstein  RMIT University, Melbourne, Australia
W. Bruce Croft  University of Massachusetts, Amherst, MA
Alistair Moffat  University of Melbourne, Melbourne, Australia
Justin Zobel  RMIT University, Melbourne, Australia
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Citation Count: 11
DOI: http://doi.acm.org/10.1145/1099554.1099695

ABSTRACT

Text similarity spans a spectrum, with broad topical similarity near one extreme and document identity at the other. Intermediate levels of similarity -- resulting from summarization, paraphrasing, copying, and stronger forms of topical relevance -- are useful for applications such as information flow analysis and question-answering tasks. In this paper, we explore mechanisms for measuring such intermediate kinds of similarity, focusing on the task of identifying where a particular piece of information originated. We consider both sentence-to-sentence and document-to-document comparison, and have incorporated these algorithms into RECAP, a prototype information flow analysis tool. Our experimental results with RECAP indicate that new mechanisms such as those we propose are likely to be more appropriate than existing methods for identifying the intermediate forms of similarity.

