Enhancing clustering blog documents by utilizing author/reader comments
Source: ACM Southeast Regional Conference
Proceedings of the 45th Annual Southeast Regional Conference
Winston-Salem, North Carolina
Session: Papers
Pages: 94-99
Year of Publication: 2007
ISBN: 978-1-59593-629-5
Authors
Beibei Li  University of Kentucky, Lexington, KY
Shuting Xu  Virginia State University, Petersburg, VA
Jun Zhang  University of Kentucky, Lexington, KY
Sponsor
SIGAPP: ACM Special Interest Group on Applied Computing
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 10,   Downloads (12 Months): 130,   Citation Count: 5
DOI: http://doi.acm.org/10.1145/1233341.1233359

ABSTRACT

Blogs are a new internet phenomenon and a vast, ever-increasing information resource. Mining blog files for information is a very new research direction in data mining. Blog files differ from standard web files and may need specialized mining strategies. We propose to include the title, body, and comments of blog pages when clustering datasets built from blog documents. In particular, we argue that the author/reader comments of blog pages may have a greater discriminating effect in clustering blog documents. We constructed a word-page matrix by downloading blog pages from a well-known website and experimented with a k-means clustering algorithm, assigning different weights to the title, body, and comment parts. Our experimental results show that assigning a larger weight to the blog comments helps the k-means algorithm produce better clustering solutions. These results confirm our hypothesis that author/reader comments are very useful in discriminating blog files.
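
The weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the exact weighting scheme, the term representation, or the distance measure, so the weighted term counts, the unit-length normalization, the squared Euclidean distance, the toy pages, and every function name below are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def build_matrix(pages, weights):
    """Build a page-by-word matrix in which each section's word counts
    are multiplied by that section's weight (an assumed scheme)."""
    vocabulary = sorted(
        {w for p in pages for s in weights for w in p.get(s, "").lower().split()}
    )
    rows = []
    for page in pages:
        counts = Counter()
        for section, weight in weights.items():
            for word in page.get(section, "").lower().split():
                counts[word] += weight
        vec = [counts[w] for w in vocabulary]
        # Scale to unit length so long pages do not dominate (assumption).
        norm = math.sqrt(sum(v * v for v in vec))
        rows.append([v / norm for v in vec] if norm else vec)
    return vocabulary, rows


def kmeans(rows, k, iterations=20, seed=0):
    """Plain k-means with squared Euclidean distance; returns one label per row."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: each row goes to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))
            for row in rows
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy pages: titles and bodies are identical, only comments differ.
pages = [
    {"title": "daily post", "body": "today update news",
     "comments": "football goal match football"},
    {"title": "daily post", "body": "today update news",
     "comments": "match goal football team"},
    {"title": "daily post", "body": "today update news",
     "comments": "election vote senate election"},
    {"title": "daily post", "body": "today update news",
     "comments": "vote senate election policy"},
]

# A larger weight on comments, in the spirit of the abstract (values are illustrative).
weights = {"title": 1, "body": 1, "comments": 3}
vocabulary, rows = build_matrix(pages, weights)
labels = kmeans(rows, k=2)
print(labels)
```

In this toy setting the titles and bodies carry no distinguishing signal, so the comment words decide the grouping: the two sports pages share one cluster and the two politics pages share the other. Real blog data would need tokenization, stop-word handling, and an evaluation measure, none of which are specified in the abstract.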



