Extracting informative images from web news pages via imbalanced classification
Source
International Multimedia Conference
Proceedings of the 17th ACM International Conference on Multimedia
Beijing, China
SESSION: Multimedia grand challenge
Pages 1123-1124  
Year of Publication: 2009
ISBN:978-1-60558-608-3
Authors
Wei Gong  East China Normal University, Shanghai, China
Hangzai Luo  East China Normal University, Shanghai, China
Jianping Fan  UNC-Charlotte, Charlotte, USA
Sponsor
SIGMULTIMEDIA: ACM Special Interest Group on Multimedia
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1631272.1631529

ABSTRACT

In this paper we propose an imbalanced classification algorithm to extract informative images from web news pages. Our algorithm addresses this difficult problem with two approaches. First, we limit our dataset to a specific application area so that the patterns of informative images can be captured by existing classification algorithms. Second, we propose an automatic negative-sample filtering algorithm that eliminates most negative samples, so that the classification training data is rebalanced. Because most classification algorithms perform worse on imbalanced training data, this rebalancing improves overall performance significantly. In addition, our approach is inherently robust to new web sites and to style and layout changes of existing web sites.
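
The rebalancing idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which features or rules the authors' filter uses, so everything below is an assumption made for illustration: the `ImageSample` type, the size and aspect-ratio thresholds, and the toy data are all hypothetical. The sketch only shows how discarding obvious negatives before training changes the negative-to-positive ratio.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImageSample:
    """Hypothetical record for one image found on a news page."""
    width: int
    height: int
    informative: bool  # ground-truth label used for training


def is_obvious_negative(img: ImageSample,
                        min_side: int = 100,
                        max_aspect: float = 4.0) -> bool:
    """Assumed heuristic: tiny images (icons) and very elongated
    images (banners) are treated as non-informative without a classifier."""
    short_side = min(img.width, img.height)
    if short_side < min_side:
        return True
    aspect = max(img.width, img.height) / short_side
    return aspect > max_aspect


def filter_negatives(samples: List[ImageSample]) -> List[ImageSample]:
    """Keep only the samples that the heuristic cannot reject outright."""
    return [s for s in samples if not is_obvious_negative(s)]


def negative_ratio(samples: List[ImageSample]) -> float:
    """Number of negative samples per positive sample."""
    positives = sum(s.informative for s in samples)
    negatives = len(samples) - positives
    return negatives / positives if positives else float("inf")


# Toy data (invented): 2 informative photos among 18 non-informative images.
samples = (
    [ImageSample(400, 300, True)] * 2
    + [ImageSample(16, 16, False)] * 10    # icons
    + [ImageSample(728, 90, False)] * 6    # banner-shaped images
    + [ImageSample(300, 250, False)] * 2   # negatives the filter cannot reject
)

kept = filter_negatives(samples)
print("negatives per positive before filtering:", negative_ratio(samples))
print("negatives per positive after filtering: ", negative_ratio(kept))
```

The filtered set `kept` would then be passed to an ordinary classifier, which only has to separate the remaining, harder negatives from the informative images.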

