|
ABSTRACT
Recent advances in information retrieval over hyperlinked corpora have convincingly demonstrated that links carry less noisy information than text. We investigate the feasibility of applying link-based methods in new applications domains. The specific application we consider is to partition authors into opposite camps within a given topic in the context of newsgroups. A typical newsgroup posting consists of one or more quoted lines from another posting followed by the opinion of the author. This social behavior gives rise to a network in which the vertices are individuals and the links represent "responded-to" relationships. An interesting characteristic of many newsgroups is that people more frequently respond to a message when they disagree than when they agree. This behavior is in sharp contrast to the WWW link graph, where linkage is an indicator of agreement or common interest. By analyzing the graph structure of the responses, we are able to effectively classify people into opposite camps. In contrast, methods based on statistical analysis of text yield low accuracy on such datasets because the vocabulary used by the two sides tends to be largely identical, and many newsgroup postings consist of relatively few words of text.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
|
| |
3
|
|
| |
4
|
|
 |
5
|
|
 |
6
|
Soumen Chakrabarti , Byron Dom , Piotr Indyk, Enhanced hypertext categorization using hyperlinks, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.307-318, June 01-04, 1998, Seattle, Washington, United States
|
| |
7
|
|
 |
8
|
|
| |
9
|
|
| |
10
|
|
 |
11
|
|
| |
12
|
I. Good. The Estimation of Probabilities: An Essay on Modern Bayesian Methods. M.I.T. Press, 1965.
|
| |
13
|
|
 |
14
|
|
| |
15
|
R. M. Karp. Reducibility among combinatorial problems. In R. E. Miller and J. W. Thatcher, editors, Complexity of Computer Computations, pages 85--103. Plenum Press, New York, 1975.
|
| |
16
|
G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. Technical Report TR 95-035, University of Minnesota, Dept. of Computer Science, 1995.
|
 |
17
|
|
| |
18
|
B. W. Kernighan and S. Lin. An efficient heuristic procedure for paritioning graphs. The Bell System Technical Journal, pages 291--307, 1970.
|
| |
19
|
S. Milgram. The small world problem. Psychology Today, 2:60--67, 1967.
|
| |
20
|
T. Mitchell. The role of unlabeled data in supervised learning. In Proceedings of the Sixth International Colloquium on Cognitive Science, San Sebastian, Spain, 1999.
|
| |
21
|
|
| |
22
|
J. Neville and D. Jensen. Iterative classification in relational data. In Proc. AAAI-2000 Workshop on Learning Statistical Models from Relational Data. AAAI Press, 2000.
|
| |
23
|
Kamal Nigam , Andrew McCallum , Sebastian Thrun , Tom Mitchell, Learning to classify text from labeled and unlabeled documents, Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence, p.792-799, July 1998, Madison, Wisconsin, United States
|
| |
24
|
|
| |
25
|
|
 |
26
|
|
| |
27
|
|
| |
28
|
|
| |
29
|
|
| |
30
|
G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, 1949.
|
CITED BY 28
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Natalie Glance , Matthew Hurst , Kamal Nigam , Matthew Siegler , Robert Stockton , Takashi Tomokiyo, Deriving marketing intelligence from online discussion, Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 21-24, 2005, Chicago, Illinois, USA
|
|
|
R. Guha , Ravi Kumar , D. Sivakumar , Ravi Sundaram, Unweaving a web of documents, Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 21-24, 2005, Chicago, Illinois, USA
|
|
|
|
|
|
Deng Cai , Zheng Shao , Xiaofei He , Xifeng Yan , Jiawei Han, Mining hidden community in heterogeneous social networks, Proceedings of the 3rd international workshop on Link discovery, p.58-65, August 21-25, 2005, Chicago, Illinois
|
|
|
Christian Bird , Alex Gourley , Prem Devanbu , Michael Gertz , Anand Swaminathan, Mining email social networks, Proceedings of the 2006 international workshop on Mining software repositories, May 22-23, 2006, Shanghai, China
|
|
|
|
|
|
|
|
|
|
|
|
Xiaodan Song , Belle L. Tseng , Ching-Yung Lin , Ming-Ting Sun, Personalized recommendation driven by information flow, Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, August 06-11, 2006, Seattle, Washington, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Mary Helander , Rick Lawrence , Yan Liu , Claudia Perlich , Chandan Reddy , Saharon Rosset, Looking for great ideas: analyzing the innovation jam, Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis, p.66-73, August 12-12, 2007, San Jose, California
|
|
|
|
|
|
|
|
|
|
|