Mining newsgroups using networks arising from social behavior
Source: International World Wide Web Conference
Proceedings of the 12th international conference on World Wide Web
Budapest, Hungary
SESSION: Data mining
Pages: 529 - 535  
Year of Publication: 2003
ISBN:1-58113-680-3
Authors
Rakesh Agrawal  IBM Almaden Research Center, San Jose, CA
Sridhar Rajagopalan  IBM Almaden Research Center, San Jose, CA
Ramakrishnan Srikant  IBM Almaden Research Center, San Jose, CA
Yirong Xu  IBM Almaden Research Center, San Jose, CA
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/775152.775227

ABSTRACT

Recent advances in information retrieval over hyperlinked corpora have convincingly demonstrated that links carry less noisy information than text. We investigate the feasibility of applying link-based methods in new application domains. The specific application we consider is to partition authors into opposite camps within a given topic in the context of newsgroups. A typical newsgroup posting consists of one or more quoted lines from another posting followed by the opinion of the author. This social behavior gives rise to a network in which the vertices are individuals and the links represent "responded-to" relationships. An interesting characteristic of many newsgroups is that people more frequently respond to a message when they disagree than when they agree. This behavior is in sharp contrast to the WWW link graph, where linkage is an indicator of agreement or common interest. By analyzing the graph structure of the responses, we are able to effectively classify people into opposite camps. In contrast, methods based on statistical analysis of text yield low accuracy on such datasets because the vocabulary used by the two sides tends to be largely identical, and many newsgroup postings consist of relatively few words of text.
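
To make the idea in the abstract concrete, here is a minimal sketch of how a "responded-to" network could be split into two camps. It assumes that a reply signals disagreement, so a good split places as much reply weight as possible across the two camps (a maximum-cut objective). The greedy local search below is an illustrative stand-in, not necessarily the procedure used in the paper, and the names `build_response_graph` and `split_into_camps` are hypothetical.

```python
from collections import defaultdict


def build_response_graph(replies):
    """Count replies between each unordered pair of authors.

    `replies` is an iterable of (author, responded_to_author) pairs.
    Self-replies are ignored because they carry no cross-author signal.
    """
    weights = defaultdict(int)
    for author, target in replies:
        if author != target:
            weights[frozenset((author, target))] += 1
    return weights


def split_into_camps(weights):
    """Assign each author to camp 0 or 1 with a greedy max-cut local search.

    Assumption: an edge means "probably disagrees", so an author is moved
    whenever more of their reply weight stays inside their own camp than
    crosses to the other camp. Each move strictly increases the total
    weight crossing the cut, so the loop terminates. The result is a local
    optimum, not a guaranteed best partition.
    """
    nodes = sorted({name for edge in weights for name in edge})
    adjacency = defaultdict(dict)
    for edge, weight in weights.items():
        u, v = tuple(edge)
        adjacency[u][v] = weight
        adjacency[v][u] = weight

    side = {name: 0 for name in nodes}
    improved = True
    while improved:
        improved = False
        for name in nodes:
            same = sum(w for other, w in adjacency[name].items()
                       if side[other] == side[name])
            across = sum(w for other, w in adjacency[name].items()
                         if side[other] != side[name])
            if same > across:
                side[name] = 1 - side[name]
                improved = True
    return side
```

Calling `split_into_camps(build_response_graph(replies))` on a list of reply pairs returns a dictionary mapping each author to camp 0 or 1. Which camp receives which label is arbitrary; only the grouping is meaningful.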


