Entropy-based link analysis for mining web informative structures
Full text: PDF (564 KB)
Source: Conference on Information and Knowledge Management (CIKM)
Proceedings of the eleventh international conference on Information and knowledge management
McLean, Virginia, USA
Session: Web search 2
Pages: 574-581
Year of Publication: 2002
ISBN: 1-58113-492-4
Authors
Hung-Yu Kao  National Taiwan University, Taipei, Taiwan, ROC
Ming-Syan Chen  National Taiwan University, Taipei, Taiwan, ROC
Shian-Hua Lin  Academia Sinica, Taipei, Taiwan, ROC
Jan-Ming Ho  Academia Sinica, Taipei, Taiwan, ROC
Sponsors
SIGMIS: ACM Special Interest Group on Management Information Systems
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 weeks): 7; Downloads (12 months): 91; Citation count: 9
DOI: 10.1145/584792.584886 (http://doi.acm.org/10.1145/584792.584886)

ABSTRACT

In this paper, we study the problem of mining the informative structure of a news Web site that consists of thousands of hyperlinked documents. We define the informative structure of a news Web site as a set of index pages (also referred to as TOC, i.e., table-of-contents, pages) and a set of article pages linked from the TOC pages through informative links. The Hyperlink-Induced Topic Search (HITS) algorithm has been employed to analyze the authorities and hubs among pages. However, most content sites also contain extra hyperlinks, such as navigation panels, advertisements, and banners, to increase the added value of their Web pages. Because of the structure induced by these extra hyperlinks, HITS alone does not achieve good precision on this problem. To remedy this, we develop an algorithm that performs entropy-based Link Analysis on Mining Web Informative Structures, referred to as LAMIS. The key idea of LAMIS is to use information entropy to represent the amount of information carried by a link or a page in the link analysis. Experiments on several real news Web sites show that LAMIS achieves much higher precision and recall than heuristic methods and conventional link analysis methods.
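The abstract contrasts plain HITS with entropy-weighted link analysis but does not give the LAMIS formulas, so the following is only an illustrative sketch, not the authors' algorithm. It assumes that a link's informativeness can be approximated as 1 minus the normalized Shannon entropy of its anchor text's distribution over the linking pages: an anchor repeated on every page (a navigation panel or banner) gets a weight near 0, while an anchor found on a single TOC page gets a weight of 1. The functions `normalized_entropy`, `link_weights`, and `hits`, as well as the toy site, are hypothetical.

```python
import math
from collections import defaultdict


def normalized_entropy(counts):
    """Shannon entropy of a count distribution, scaled by log(n) into [0, 1].

    counts[i] is how often a feature (here, an anchor text) occurs on page i.
    Evenly spread over all n pages -> 1.0; confined to one page -> 0.0.
    """
    n, total = len(counts), sum(counts)
    if n <= 1 or total == 0:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return h / math.log(n)


def link_weights(links):
    """Hypothetical weighting: 1 - normalized entropy of each link's anchor text.

    links: list of (source_page, target_page, anchor_text) triples.
    Returns {(source, target): weight in [0, 1]}.
    """
    sources = sorted({s for s, _, _ in links})
    occurrences = defaultdict(lambda: defaultdict(int))  # anchor -> page -> count
    for s, _, anchor in links:
        occurrences[anchor][s] += 1
    weights = {}
    for s, t, anchor in links:
        counts = [occurrences[anchor][p] for p in sources]
        # Clamp tiny negative values caused by floating-point rounding.
        weights[(s, t)] = max(0.0, 1.0 - normalized_entropy(counts))
    return weights


def hits(edges, weights=None, iterations=50):
    """Power-iteration HITS; each edge's contribution is scaled by weights[edge] if given."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)

    def normalize(scores):
        # L2-normalize so the scores stay bounded across iterations.
        norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
        return {n: scores[n] / norm for n in nodes}

    for _ in range(iterations):
        new_auth = defaultdict(float)
        for s, t in edges:  # authority = weighted sum of hub scores pointing in
            new_auth[t] += (1.0 if weights is None else weights[(s, t)]) * hub[s]
        auth = normalize(new_auth)
        new_hub = defaultdict(float)
        for s, t in edges:  # hub = weighted sum of authority scores pointed to
            new_hub[s] += (1.0 if weights is None else weights[(s, t)]) * auth[t]
        hub = normalize(new_hub)
    return hub, auth


# Toy news site: every content page carries the same navigation links.
content = ["home", "toc_sports", "toc_world", "a1", "a2", "a3", "a4"]
links = [(p, "about", "About") for p in content]
links += [(p, "contact", "Contact") for p in content]
links += [
    ("home", "toc_sports", "Sports"), ("home", "toc_world", "World"),
    ("toc_sports", "a1", "Match report"), ("toc_sports", "a2", "Transfer news"),
    ("toc_world", "a3", "Election results"), ("toc_world", "a4", "Summit opens"),
]
edges = [(s, t) for s, t, _ in links]

_, plain_auth = hits(edges)
_, weighted_auth = hits(edges, link_weights(links))
print("plain HITS top authority:", max(plain_auth, key=plain_auth.get))
print("weighted top authority:", max(weighted_auth, key=weighted_auth.get))
```

In this toy example, plain HITS ranks a navigation target as the strongest authority because every page links to it. The entropy-weighted variant shifts authority to the TOC and article pages. This mirrors the failure mode the abstract describes, although the weighting scheme here is only an assumption.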

