ABSTRACT
Activity and user engagement in social media such as web logs, wikis, online forums, and social networks have been increasing at unprecedented rates. Similar to social behavior in various human activities, user activity in social media indicates the existence of individuals who consistently drive or stimulate 'discussions' in the online world. Such individuals are considered 'starters' of online discussions, in contrast with 'followers', who primarily engage in discussions and follow them. In this paper, we formalize the notions of 'starters' and 'followers' in social media. Motivated by the challenging size of the available information related to online social behavior, we focus on developing random sampling approaches that achieve significant efficiency in identifying starters and followers. In our experimental section we use BlogScope, our social media warehousing platform under development at the University of Toronto. We demonstrate the scalability and accuracy of our sampling approaches using real data, establishing the practical utility of our techniques in a real social media warehousing environment.