DURIAN: a demo for near-duplicate detection
Source: dg.o, Vol. 151
Proceedings of the 2006 International Conference on Digital Government Research
San Diego, California
Session: System demonstrations
Page: 347
Year of Publication: 2006
Authors
Hui Yang  Carnegie Mellon University
Jamie Callan  Carnegie Mellon University
Stuart Shulman  University of Pittsburgh
Sponsor
NSF : National Science Foundation
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1146598.1146695

ABSTRACT

The recent move from paper to electronic public comments has made it much easier for individuals to customize form letters, but harder for agencies to identify substantive information, because many near-duplicate comments express the same viewpoint in slightly different language. Identifying exact- and near-duplicate texts, and recognizing the unique text within near-duplicate documents, is an important component of the data cleaning and integration processes for eRulemaking. This brief paper describes a demonstration of DURIAN (DUplicate Removal In lArge collectioN), a near-duplicate detection system that identifies and organizes near-duplicates for eRulemaking applications.
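To make the idea of a near-duplicate comment concrete, the following is a minimal sketch of one common technique: comparing documents by the Jaccard similarity of their word shingles (contiguous k-word sequences). It is an illustration only. The abstract does not describe DURIAN's actual algorithm, and the shingle size `k=3`, the `0.5` threshold, and the helper names `shingles`, `jaccard`, `is_near_duplicate`, and `unique_words` are hypothetical choices made for this example.

```python
def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of contiguous k-word shingles of a lowercased text."""
    words = text.lower().split()
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc1: str, doc2: str, k: int = 3, threshold: float = 0.5) -> bool:
    """Hypothetical rule: near-duplicates share at least `threshold` of their shingles."""
    return jaccard(shingles(doc1, k), shingles(doc2, k)) >= threshold


def unique_words(comment: str, form_letter: str) -> list[str]:
    """Naive bag-of-words diff: words in the comment that never occur in the form letter."""
    base = set(form_letter.lower().split())
    return [w for w in comment.lower().split() if w not in base]


form = "I urge the agency to protect clean water and reject this proposed rule"
custom = form + " because my town depends on it"

print(is_near_duplicate(form, custom))  # shares 11 of 17 shingles
print(unique_words(custom, form))       # the text the commenter added
```

Here the customized comment keeps every shingle of the form letter and adds a short personal reason, so it clears the hypothetical threshold, and the word-level diff surfaces the added text that an agency reviewer would most want to read. A production system would need something more scalable than pairwise comparison (for example, hashing-based candidate selection) and a more careful alignment than a bag-of-words diff to recover unique passages.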

