PageCluster: Mining conceptual link hierarchies from Web log files for adaptive Web site navigation
Source: ACM Transactions on Internet Technology (TOIT)
Volume 4, Issue 2 (May 2004)
Pages: 185-208
Year of Publication: 2004
ISSN: 1533-5399
Authors
Jianhan Zhu  University of Ulster at Jordanstown, United Kingdom
Jun Hong  University of Ulster at Jordanstown, United Kingdom
John G. Hughes  University of Ulster at Jordanstown, United Kingdom
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 8,   Downloads (12 Months): 90,   Citation Count: 3
DOI: http://doi.acm.org/10.1145/990301.990305

ABSTRACT

User traversals on hyperlinks between Web pages can reveal semantic relationships between these pages. We use user traversals on hyperlinks as weights to measure semantic relationships between Web pages. On the basis of these weights, we propose a novel method to put Web pages on a Web site onto different conceptual levels in a link hierarchy. We develop a clustering algorithm called PageCluster, which clusters conceptually-related pages on each conceptual level of the link hierarchy based on their in-link and out-link similarities. Clusters are then used to construct a conceptual link hierarchy, which is visualized in a prototype called Online Navigation Explorer (ONE) for adaptive Web site navigation. Our experiments show that our method can put Web pages onto conceptual levels of a link hierarchy more accurately than both the breadth-first search method and the shortest-weighted-path method, and PageCluster can cluster conceptually-related pages more accurately than the bibliographic analysis method. Our user study also shows that the conceptual link hierarchy visualized in ONE can help users find information more effectively and efficiently as the task of finding information becomes less specific and involves more Web pages on multiple conceptual levels.
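
The abstract's pipeline (traversal counts as link weights, placement of pages on levels, clustering by in-link and out-link similarity) can be illustrated with a minimal sketch. It is not the authors' PageCluster algorithm: the abstract does not give the level-assignment rule or the similarity formula, so the sketch uses the breadth-first baseline the abstract names for levels and assumes cosine similarity over traversal-weighted link vectors. All function names and the toy sessions are hypothetical.

```python
from collections import Counter, deque
from math import sqrt


def traversal_weights(sessions):
    """Count how often each hyperlink (src, dst) is traversed across sessions."""
    weights = Counter()
    for pages in sessions:
        for src, dst in zip(pages, pages[1:]):
            weights[(src, dst)] += 1
    return weights


def bfs_levels(weights, root):
    """Baseline named in the abstract: level = breadth-first depth from the root."""
    out_links = {}
    for src, dst in weights:
        out_links.setdefault(src, []).append(dst)
    levels = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for nxt in out_links.get(page, []):
            if nxt not in levels:
                levels[nxt] = levels[page] + 1
                queue.append(nxt)
    return levels


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0) for k, w in u.items())
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def link_similarity(weights, a, b):
    """Assumed measure: mean of in-link and out-link cosine similarity."""
    in_a = {s: w for (s, d), w in weights.items() if d == a}
    in_b = {s: w for (s, d), w in weights.items() if d == b}
    out_a = {d: w for (s, d), w in weights.items() if s == a}
    out_b = {d: w for (s, d), w in weights.items() if s == b}
    return 0.5 * (cosine(in_a, in_b) + cosine(out_a, out_b))


# Toy sessions (hypothetical): each list is one user's page sequence.
sessions = [
    ["home", "research", "paper1"],
    ["home", "research", "paper2"],
    ["home", "teaching", "course1"],
    ["home", "research", "paper1"],
]
weights = traversal_weights(sessions)
levels = bfs_levels(weights, "home")
same_topic = link_similarity(weights, "paper1", "paper2")
other_topic = link_similarity(weights, "paper1", "course1")
```

In this toy data, "paper1" and "paper2" sit on the same level and share an in-link from "research", so they score higher than "paper1" and "course1", which share no links. A clustering step would then group pages on each level by such scores.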



