Extracting unstructured data from template generated web documents
Source: Conference on Information and Knowledge Management
Proceedings of the twelfth international conference on Information and knowledge management
New Orleans, LA, USA
SESSION: Poster papers - short papers
Pages: 512 - 515  
Year of Publication: 2003
ISBN:1-58113-723-0
Authors
Ling Ma  Illinois Institute of Technology
Nazli Goharian  Illinois Institute of Technology
Abdur Chowdhury  America Online Inc.
Misun Chung  Illinois Institute of Technology
Sponsors
ACM: Association for Computing Machinery
SIGMIS: ACM Special Interest Group on Management Information Systems
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 5,   Downloads (12 Months): 59,   Citation Count: 5
DOI: http://doi.acm.org/10.1145/956863.956961

ABSTRACT

We propose a novel approach that identifies web page templates and extracts the unstructured data. Extracting only the body of the page and eliminating the template increases retrieval precision for queries that would otherwise return irrelevant results. We believe that reducing the number of irrelevant results encourages users to return to a given site to search. Our experimental results on several different web sites and on the whole CNNfn collection demonstrate the feasibility of our approach.
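
The abstract does not spell out how templates are identified, so the following is only a minimal sketch of one common heuristic, not the authors' method: text segments that recur on most pages of a site are treated as template, and whatever remains on each page is treated as its body. The function names, the `min_fraction` threshold, and the sample pages are all hypothetical, chosen for illustration.

```python
from collections import Counter


def find_template_segments(pages, min_fraction=0.8):
    """Return the text segments that appear on at least min_fraction of pages.

    `pages` is a list of pages, each a list of text segments (assumed to be
    already split out of the HTML). This is an illustrative heuristic only.
    """
    counts = Counter()
    for page in pages:
        # Count each segment once per page so repeats within one page
        # do not inflate its cross-page frequency.
        counts.update(set(page))
    threshold = min_fraction * len(pages)
    return {segment for segment, n in counts.items() if n >= threshold}


def extract_body(page, template):
    """Keep only the segments of a page that are not part of the template."""
    return [segment for segment in page if segment not in template]


# Hypothetical sample data: three pages sharing a navigation bar and footer.
pages = [
    ["Home | Markets | Tech", "Stocks rally on earnings", "Copyright Example News"],
    ["Home | Markets | Tech", "Oil prices fall", "Copyright Example News"],
    ["Home | Markets | Tech", "Fed holds rates steady", "Copyright Example News"],
]

template = find_template_segments(pages)
bodies = [extract_body(page, template) for page in pages]
```

Indexing only the `bodies` rather than the full pages would keep a query such as "Markets" from matching every page through the shared navigation bar, which is the kind of irrelevant result the abstract aims to reduce.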



