ABSTRACT
A topic taxonomy is an effective representation that describes salient features of virtual groups or online communities. A topic taxonomy consists of topic nodes. Each internal node is defined by its vertical path (i.e., ancestor and child nodes) and its horizontal list of attributes (or terms). In a text-dominant environment, a topic taxonomy can be used to flexibly describe a group's interests with varying granularity. However, because a taxonomy is static, it may fail to capture changes in a group's interests in a timely manner. This article addresses how to adapt a topic taxonomy to accumulated data that reflect changes in a group's interests, so as to achieve dynamic group profiling. We first discuss the issues related to topic taxonomies. We next formulate taxonomy adaptation as an optimization problem: finding the taxonomy that best fits the data. We then present a viable algorithm that can efficiently accomplish taxonomy adaptation. We conduct extensive experiments to evaluate our approach's efficacy for group profiling, compare the approach with some alternatives, and study its performance for dynamic group profiling. While pointing out various applications of taxonomy adaptation, we suggest some future work that can take advantage of burgeoning Web 2.0 services for online targeted marketing, connecting the dots in counterterrorism, and community tracking.
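As a rough illustration of the structure described in the abstract, the sketch below models a topic node with a vertical path (parent and children) and a horizontal list of terms. It is not the article's implementation: the class name `TopicNode`, the helper `nodes_at_depth`, and the sample topics and terms are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TopicNode:
    """One node of a topic taxonomy (hypothetical structure, for illustration only)."""
    name: str
    # Horizontal list of attributes (terms) that characterize this topic.
    terms: List[str] = field(default_factory=list)
    # Vertical links; parent is excluded from repr/eq to avoid infinite recursion.
    parent: Optional["TopicNode"] = field(default=None, repr=False, compare=False)
    children: List["TopicNode"] = field(default_factory=list)

    def add_child(self, child: "TopicNode") -> "TopicNode":
        """Attach child under this node and return the child."""
        child.parent = self
        self.children.append(child)
        return child

    def path(self) -> List[str]:
        """Names from the root down to this node (the vertical path)."""
        names = []
        node: Optional["TopicNode"] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]


def nodes_at_depth(root: TopicNode, depth: int) -> List[str]:
    """Topic names at a given depth; deeper levels give a finer-grained view."""
    level = [root]
    for _ in range(depth):
        level = [child for node in level for child in node.children]
    return [node.name for node in level]


# Assumed sample data, not taken from the article.
root = TopicNode("root")
sports = root.add_child(TopicNode("sports", ["game", "team"]))
politics = root.add_child(TopicNode("politics", ["election", "policy"]))
soccer = sports.add_child(TopicNode("soccer", ["goal", "league"]))

print(soccer.path())
print(nodes_at_depth(root, 1))
print(nodes_at_depth(root, 2))
```

Reading the taxonomy at depth 1 gives a coarse description of a group's interests, while depth 2 gives a finer one. Adapting the taxonomy to new data would amount to changing these parent-child links and term lists, which this sketch does not attempt.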