Web unit mining: finding and classifying subgraphs of web pages
Source: Conference on Information and Knowledge Management
Proceedings of the twelfth international conference on Information and knowledge management
New Orleans, LA, USA
SESSION: Knowledge management session 2: semantic web
Pages: 108 - 115  
Year of Publication: 2003
ISBN: 1-58113-723-0
Authors
Aixin Sun, Nanyang Technological University, Singapore
Ee-Peng Lim, Nanyang Technological University, Singapore
Sponsors
ACM: Association for Computing Machinery
SIGMIS: ACM Special Interest Group on Management Information Systems
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/956863.956885

ABSTRACT

In web classification, most researchers assume that the objects to classify are individual web pages from one or more web sites. In practice, the assumption is too restrictive since a web page itself may not always correspond to a concept instance of some semantic concept (or category) given to the classification task. In this paper, we want to relax this assumption and allow a concept instance to be represented by a subgraph of web pages or a set of web pages. We identify several new issues to be addressed when the assumption is removed, and formulate the web unit mining problem. We also propose an iterative web unit mining (iWUM) method that first finds subgraphs of web pages using some knowledge about web site structure. From these web subgraphs, web units are constructed and classified into semantic concepts (or categories) in an iterative manner. Our experiments using the WebKB dataset showed that iWUM improves the overall classification performance and works very well on the more structured parts of a web site.

