ABSTRACT
This article uses data from the social bookmarking site del.icio.us to empirically examine the dynamics of collaborative tagging systems and to study how coherent categorization schemes emerge from unsupervised tagging by individual users. First, we study the formation of stable distributions in tagging systems, seen as an implicit form of “consensus” reached by the users of the system around the tags that best describe a resource. We show that the final tag frequencies for most resources converge to power-law distributions, and we propose an empirical method for examining the dynamics of the convergence process, based on the Kullback-Leibler divergence measure. The convergence analysis is performed both for the most frequently used tags at the top of the tag distributions and for the so-called long tail. Second, we study the information structures that emerge from collaborative tagging, namely tag correlation (or folksonomy) graphs. We show how community-based network techniques can be used to extract simple tag vocabularies from the tag correlation graphs by partitioning them into subsets of related tags. Furthermore, we show, for a specialized domain, that the shared vocabularies produced by collaborative tagging are richer than the vocabularies that can be extracted from large-scale query logs provided by a major search engine. Although the empirical analysis presented in this article is based on a set of tagging data obtained from del.icio.us, the methods developed are general, and the conclusions should be applicable to other websites that employ tagging.
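As a rough illustration of the convergence measure mentioned above, the following minimal sketch computes the Kullback-Leibler divergence between the tag distribution of a resource at an earlier point in time and its final distribution. It is not the article's procedure: the tag lists, the function names `tag_distribution` and `kl_divergence`, and the small additive smoothing constant (used only so that unseen tags do not produce a division by zero) are all assumptions made for the example.

```python
import math
from collections import Counter


def tag_distribution(tags, vocabulary, smoothing=1e-6):
    """Relative tag frequencies over a fixed vocabulary.

    The additive smoothing term is an assumption of this sketch; it keeps
    every probability strictly positive so the divergence stays finite.
    """
    counts = Counter(tags)
    total = len(tags) + smoothing * len(vocabulary)
    return [(counts[tag] + smoothing) / total for tag in vocabulary]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical tagging history of one resource at three points in time.
vocabulary = ["web", "design", "css"]
early_tags = ["web", "css"]
middle_tags = ["web"] * 3 + ["design"] * 2 + ["css"]
final_tags = ["web"] * 6 + ["design"] * 3 + ["css"]

final_dist = tag_distribution(final_tags, vocabulary)
early_divergence = kl_divergence(tag_distribution(early_tags, vocabulary), final_dist)
middle_divergence = kl_divergence(tag_distribution(middle_tags, vocabulary), final_dist)

# A shrinking divergence suggests that the distribution is stabilizing.
print(early_divergence, middle_divergence)
```

In this toy history the divergence from the final distribution is smaller at the middle snapshot than at the early one, which is the kind of trend a convergence analysis would look for across many resources.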
INDEX TERMS
Primary Classification:
H. Information Systems
H.5 INFORMATION INTERFACES AND PRESENTATION (I.7)
H.5.3 Group and Organization Interfaces
Subjects: Collaborative computing
Additional Classification:
H. Information Systems
H.1 MODELS AND PRINCIPLES
H.1.1 Systems and Information Theory
Subjects: Information theory
I. Computing Methodologies
I.2 ARTIFICIAL INTELLIGENCE
I.2.4 Knowledge Representation Formalisms and Methods
General Terms: Algorithms, Human Factors, Measurement
Keywords: Collaborative tagging, community identification algorithms, complex systems, del.icio.us, emergent semantics, graphical models, knowledge extraction, power laws, search engines