Mining data records in Web pages
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Washington, D.C.
POSTER SESSION: Research track
Pages: 601-606
Year of Publication: 2003
ISBN: 1-58113-737-0
Authors
Bing Liu, University of Illinois at Chicago, Chicago, IL
Robert Grossman, University of Illinois at Chicago, Chicago, IL
Yanhong Zhai, University of Illinois at Chicago, Chicago, IL
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/956750.956826

ABSTRACT

A large amount of information on the Web is contained in regularly structured objects, which we call data records. Such data records are important because they often present the essential information of their host pages, e.g., lists of products or services. It is useful to mine such data records in order to extract information from them and provide value-added services. Existing automatic techniques are unsatisfactory because of their poor accuracy. In this paper, we propose a more effective technique to perform the task. The technique is based on two observations about data records on the Web and a string matching algorithm. The proposed technique is able to mine both contiguous and non-contiguous data records. Our experimental results show that the proposed technique outperforms existing techniques substantially.
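
The abstract does not spell out the two observations or the string matching algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that each child of some page element has been flattened into a sequence of tag names, that similarity is measured with a normalized Levenshtein edit distance, and that runs of adjacent, similar siblings are candidate data records. The function names, the sample tag sequences, and the 0.3 threshold are all hypothetical. The sketch covers only the contiguous case.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Edit distance scaled to [0, 1] by the longer sequence length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def similar_runs(siblings: List[List[str]],
                 threshold: float = 0.3) -> List[Tuple[int, int]]:
    """Return (start, end) ranges, end exclusive, of two or more adjacent
    siblings whose neighbouring tag sequences are within the threshold.
    The threshold value is an assumption chosen for illustration."""
    runs = []
    start = 0
    for i in range(1, len(siblings) + 1):
        at_end = i == len(siblings)
        if at_end or normalized_distance(siblings[i - 1], siblings[i]) > threshold:
            if i - start >= 2:
                runs.append((start, i))
            start = i
    return runs


# Hypothetical tag sequences for the children of one page element.
page_children = [
    ["h1"],
    ["tr", "td", "img", "td", "a"],
    ["tr", "td", "img", "td", "a"],
    ["tr", "td", "img", "td", "a", "b"],
    ["p", "hr"],
]
candidate_records = similar_runs(page_children)
```

Here `candidate_records` marks the three table-row-like siblings as one candidate region, while the heading and the trailing paragraph are left out because their tag sequences differ too much from their neighbours.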


