ABSTRACT
Clustering is a viable technique for dealing with the scaling issue for web documents, but it is known to be a complicated combinatorial optimization problem. Because it is hard to develop a generally applicable optimal algorithm for web document clustering and classification, a simulated annealing algorithm is developed. The web document classification problem is formulated as finding the best match between a web query and a hypothesized web object. The normalized term frequency and inverse document frequency (TF-IDF) coefficient is used as the measure of the match. Test beds are generated on-line during the search by transforming web sites. As a result, web sites can be clustered optimally in terms of the keyword vectors of their corresponding web documents.
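The abstract does not specify the objective function, the move set, or the cooling schedule, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a smoothed IDF, cosine similarity on L2-normalized TF-IDF vectors as the match measure, a fixed number of clusters k, single-document relabeling moves, and geometric cooling. All function names and parameter values are hypothetical.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalized TF-IDF dict per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumption: smoothed IDF, ln((1 + n) / (1 + df)) + 1.
        vec = {t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
               for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def match(u, v):
    """Cosine similarity of two L2-normalized sparse vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def cost(vectors, labels):
    """Negative sum of the norms of the per-cluster vector sums.

    For unit vectors this equals minus the total cosine similarity of
    each document to its cluster centroid, so lower is better.
    """
    sums = {}
    for vec, lab in zip(vectors, labels):
        acc = sums.setdefault(lab, Counter())
        for t, w in vec.items():
            acc[t] += w
    return -sum(math.sqrt(sum(w * w for w in acc.values()))
                for acc in sums.values())


def anneal(vectors, k, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Simulated annealing over cluster labels; returns the best state seen."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in vectors]
    current = cost(vectors, labels)
    best, best_cost = labels[:], current
    t = t0
    for _ in range(steps):
        i = rng.randrange(len(vectors))
        old = labels[i]
        labels[i] = rng.randrange(k)  # move: relabel one document
        candidate = cost(vectors, labels)
        delta = candidate - current
        # Accept improvements always, worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current = candidate
            if current < best_cost:
                best, best_cost = labels[:], current
        else:
            labels[i] = old  # reject: undo the move
        t *= cooling  # geometric cooling schedule
    return best, best_cost


docs = [
    "simulated annealing clustering search".split(),
    "annealing clustering optimization search".split(),
    "football match score goal".split(),
    "football goal league score".split(),
]
vectors = tfidf_vectors(docs)
labels, final_cost = anneal(vectors, k=2)
print(labels, round(final_cost, 3))
```

Recomputing the full cost on every move keeps the sketch short. For a realistic number of web documents one would instead update only the two affected cluster sums.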