ABSTRACT
User traversals on hyperlinks between Web pages can reveal semantic relationships between those pages. We use these traversals as weights to measure the semantic relationships. On the basis of these weights, we propose a novel method for placing the pages of a Web site on different conceptual levels in a link hierarchy. We develop a clustering algorithm called PageCluster, which clusters conceptually related pages on each conceptual level of the link hierarchy based on their in-link and out-link similarities. The clusters are then used to construct a conceptual link hierarchy, which is visualized in a prototype called Online Navigation Explorer (ONE) for adaptive Web site navigation. Our experiments show that our method places Web pages on conceptual levels of a link hierarchy more accurately than both the breadth-first search method and the shortest-weighted-path method, and that PageCluster clusters conceptually related pages more accurately than the bibliographic analysis method. Our user study also shows that the conceptual link hierarchy visualized in ONE helps users find information more effectively and efficiently as the task of finding information becomes less specific and involves more Web pages on multiple conceptual levels.
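The idea of comparing pages by their traversal-weighted in-links and out-links can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the page names and traversal counts are invented, cosine similarity stands in for whatever similarity measure PageCluster actually uses, and the equal-weight average of the two similarities is an arbitrary choice.

```python
from math import sqrt

# Hypothetical traversal counts: (source page, target page) -> number of
# user traversals observed on that hyperlink. Invented for illustration.
traversals = {
    ("home", "products"): 120,
    ("home", "support"): 80,
    ("products", "laptops"): 60,
    ("products", "phones"): 50,
    ("support", "laptops"): 5,
    ("support", "faq"): 40,
}


def out_vector(page, weights):
    """Traversal weights on links leaving `page`, keyed by target page."""
    return {dst: w for (src, dst), w in weights.items() if src == page}


def in_vector(page, weights):
    """Traversal weights on links entering `page`, keyed by source page."""
    return {src: w for (src, dst), w in weights.items() if dst == page}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a page with no links on this side shares nothing
    return dot / (norm_u * norm_v)


def link_similarity(a, b, weights):
    """Assumed combination: plain average of in-link and out-link cosine."""
    sim_in = cosine(in_vector(a, weights), in_vector(b, weights))
    sim_out = cosine(out_vector(a, weights), out_vector(b, weights))
    return 0.5 * (sim_in + sim_out)


if __name__ == "__main__":
    print(link_similarity("products", "support", traversals))
    print(link_similarity("laptops", "phones", traversals))
```

In this toy data, "products" and "support" are reached only from "home", so their in-link similarity is maximal, while their out-links overlap only weakly through "laptops". A clustering step would group pages whose combined similarity is high; how PageCluster does that is not specified in the abstract.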