Blog search and mining in the business domain
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 2007 International Workshop on Domain Driven Data Mining
San Jose, California
Pages: 55-60
Year of Publication: 2007
ISBN: 978-1-59593-846-6
Authors
Yun Chen  Nanyang Technological University, Singapore
Flora S. Tsai  Nanyang Technological University, Singapore
Kap Luk Chan  Nanyang Technological University, Singapore
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1288552.1288560

ABSTRACT

Weblogs, or blogs, have rapidly gained popularity over the past few years. In particular, the growth of business blogs, which are written by or provide commentary on businesses and companies, opens up new opportunities for developing blog-specific search and mining techniques. In this paper, we propose probabilistic models for blog search and mining using two machine learning techniques, Latent Semantic Analysis (LSA) and Probabilistic Latent Semantic Analysis (PLSA). We implement the models on our database of business blogs, with the aim of achieving higher precision and recall. The probabilistic model is able to segment the business blogs into separate topic areas, which is useful for keyword detection in the blogosphere. We also study various term-weighting schemes and factor values in detail, which reveals interesting patterns in our database of business blogs. From our study, we can uncover domain-driven data mining techniques that can strengthen business intelligence in complex enterprise applications.
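
To make the topic segmentation mentioned in the abstract more concrete, the following is a minimal sketch of PLSA fitted by expectation-maximization on a tiny document-term count matrix. It is not the authors' implementation, which this page does not describe. The function names `plsa` and `normalize`, the toy matrix `counts`, the four-word vocabulary in the comments, and the defaults for topic count, iteration count, and seed are all assumptions made for illustration.

```python
import math
import random


def normalize(values):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]


def plsa(counts, n_topics, n_iters=30, seed=0):
    """Fit PLSA by EM on a document-term count matrix (list of lists).

    Assumes every document has at least one non-zero count.
    Returns P(z|d), P(w|z), and the log-likelihood seen at each iteration.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random positive initialisation (illustrative choice, not from the paper).
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]
    history = []
    for _ in range(n_iters):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        log_lik = 0.0
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                p_w_d = sum(joint)
                log_lik += n * math.log(p_w_d)
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    expected = n * joint[z] / p_w_d
                    new_z_d[d][z] += expected
                    new_w_z[z][w] += expected
        history.append(log_lik)
        # M-step: renormalise the expected counts into probabilities.
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z, history


# Hypothetical toy corpus: columns stand for "market", "stock",
# "product", "launch"; rows are four short blog posts.
counts = [
    [4, 3, 0, 0],
    [3, 5, 0, 1],
    [0, 0, 4, 3],
    [1, 0, 3, 4],
]
p_z_d, p_w_z, history = plsa(counts, n_topics=2)
```

Each row of `p_z_d` is a topic mixture for one post, and each row of `p_w_z` is a word distribution for one topic; the words with the highest probability in a topic are natural candidates for its keywords. The number of topics plays the role of the "factor value" in this sketch, and real use would start from term counts extracted from blog text rather than a hand-written matrix.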


