EPCI: extracting potentially copyright infringement texts from the web
Source
International World Wide Web Conference archive
Proceedings of the 16th International Conference on World Wide Web
Banff, Alberta, Canada
POSTER SESSION: Search
Pages: 1151 - 1152  
Year of Publication: 2007
ISBN: 978-1-59593-654-7
Authors
Takashi Tashiro  Waseda University
Takanori Ueda  Waseda University
Taisuke Hori  Waseda University
Yu Hirate  Waseda University
Hayato Yamana  Waseda University
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1242572.1242740

ABSTRACT

In this paper, we propose EPCI, a new system that extracts potentially copyright-infringing texts from the Web. EPCI works in four steps: (1) generating a set of queries from a given copyrighted seed text, (2) submitting every query to a search engine API, (3) gathering the result pages for each query in rank order, starting from the top, until the similarity between the seed text and a result page falls below a given threshold, and (4) merging all gathered pages and re-ranking them by their similarity to the seed text. Our experiments with 40 seed texts show that EPCI extracts, on average, 132 potentially copyright-infringing Web pages per seed text with 94% precision.
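
The four steps can be illustrated with a minimal Python sketch. The abstract does not specify how queries are built, which similarity measure is used, or which search engine API is called, so everything concrete here is an assumption made for illustration: queries are quoted five-word phrases cut from the seed text, similarity is Jaccard overlap of word 3-grams, the threshold default of 0.3 is arbitrary, and `search` is a hypothetical caller-supplied function that returns (URL, page text) pairs in rank order. Step 3 is read as "stop at the first result page that falls below the threshold".

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


def shingles(text: str, n: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams in `text` (assumed unit of comparison)."""
    words = text.lower().split()
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity over word 3-grams (an assumed measure, not EPCI's)."""
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def generate_queries(seed: str, width: int = 5, step: int = 5) -> List[str]:
    """Step 1: cut the seed text into quoted phrase queries (assumed scheme)."""
    words = seed.split()
    last_start = max(len(words) - width + 1, 1)
    return ['"' + " ".join(words[i:i + width]) + '"'
            for i in range(0, last_start, step)]


def extract_candidates(
    seed: str,
    search: Callable[[str], Iterable[Tuple[str, str]]],
    threshold: float = 0.3,
) -> List[Tuple[str, float]]:
    """Run steps 1-4 and return (url, similarity) pairs, most similar first."""
    best: Dict[str, float] = {}
    for query in generate_queries(seed):          # step 1
        for url, page_text in search(query):      # step 2: results in rank order
            score = similarity(seed, page_text)
            if score < threshold:                 # step 3: stop at first weak page
                break
            best[url] = max(score, best.get(url, 0.0))
    # Step 4: merge pages found by all queries and re-rank by similarity.
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```

In this sketch a page found by several queries is kept once with its highest score, which is one simple way to realize the merge in step 4; the original system may merge differently.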


