|
ABSTRACT
We present an approach to leverage a small subset of a coherent community within a social network into a much larger, more representative sample. Our problem becomes identifying a small conductance subgraph containing many (but not necessarily all) members of the given seed set. Starting with an initial seed set representing a sample of a community, we seek to discover as much of the full community as possible. We present a general method for network community expansion, demonstrating that our methods work well in expanding communities in real world networks starting from small given seed groups (20 to 400 members). Our approach is marked by incremental expansion from the seeds with retrospective analysis to determine the ultimate boundaries of our community. We demonstrate how to increase the robustness of the general approach through bootstrapping multiple random partitions of the input set into seed and evaluation groups. We go beyond statistical comparisons against gold standards to careful subjective evaluations of our expanded communities. This process explains the causes of most disagreement between our expanded communities and our gold-standards—arguing that our expansion methods provide more reliable communities than can be extracted from reference sources/gazetteers such as Wikipedia.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
Lars Backstrom , Dan Huttenlocher , Jon Kleinberg , Xiangyang Lan, Group formation in large social networks: membership, growth, and evolution, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
[doi> 10.1145/1150402.1150412]
|
| |
3
|
|
| |
4
|
|
| |
5
|
Cami, A., Balakrishnan, H., Deo, N., and Dutton, R. 2006. On the complexity of finding optimal global alliances. J. Comb. Math. Comb. Comput. 58, 23--31.
|
| |
6
|
Cirasella, J. 2007. Google sets, google suggest, and google search history: Three more tools for the reference librarian's bag of tricks. Refer. Libr. 48.1.
|
| |
7
|
Clauset, A., Moore, C., and Newman, M. E. 2008. Hierarchical structure and the prediction of missing links in networks. Nature 453, 7191, 98--101.
|
| |
8
|
Favaron, O., Fricke, G., Goddard, W., Hedetniemi, S. M., Hedetniemi, S. T., Kristiansen, P., Laskar, R. C., and Skaggs, D. 2002. Offensive alliance graphs. Discussiones Mathematicae—Graph Theory.
|
| |
9
|
Fernau, H. and Raible, D. 2007. Alliances in graphs: a complexity-theoretic study. In Proceedings of the Software Seminar, vol. 2, J. van Leeuwen, G. F. Italiano, W. van der Hoek, C. Meinel, H. Sack, F. Plasil, and M. Bielikov, Eds. Institute of Computer Science AS CR, Prague, 61--70.
|
 |
10
|
Gary William Flake , Steve Lawrence , C. Lee Giles, Efficient identification of Web communities, Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, p.150-160, August 20-23, 2000, Boston, Massachusetts, United States
[doi> 10.1145/347090.347121]
|
| |
11
|
Flake, G. W., Tarjan, R. E., and Tsioutsiouliklis, K. 2004. Graph clustering and minimum cut trees. J. Internet Math. 1, 385--408.
|
| |
12
|
Ghahramani, Z. and Heller, K. A. 2005. Bayesian sets. In Proceedings of NIPS.
|
 |
13
|
David Gibson , Jon Kleinberg , Prabhakar Raghavan, Inferring Web communities from link topology, Proceedings of the ninth ACM conference on Hypertext and hypermedia : links, objects, time and space---structure in hypermedia systems: links, objects, time and space---structure in hypermedia systems, p.225-234, June 20-24, 1998, Pittsburgh, Pennsylvania, United States
[doi> 10.1145/276627.276652]
|
| |
14
|
Girvan, M. and Newman, M. E. J. 2002. Community structure in social and biological networks. Proc. Nati. Acad. Sci. 99, 7821--7826.
|
| |
15
|
Godbole, N., Srinivasaiah, M., and Skiena, S. 2007. Large-scale sentiment analysis for news and blogs. In Proceedings of the International Conference on Weblogs and Social Media (ICWSM'07).
|
 |
16
|
|
| |
17
|
Hopcroft, J., Khan, O., Kulis, B., and Selman, B. 2004. Tracking evolving communities in large linked networks. Proc. Natl. Acad. Sci. 101 Suppl 1, 5249--5253.
|
| |
18
|
Jamieson, L. H, Hedetniemi, S. T., and McRa, A. A. 2002. The algorithmic complexity of alliances in graphs. J. Combin. Math. Combin. Comput.
|
| |
19
|
Kernighan, B. W. and Lin, S. 1970. An efficient heuristic procedure for partitioning graphs. The Bell Syst. Tech. J. 49, 1, 291--307.
|
| |
20
|
Kil, J., Lloyd, L., and Skiena, S. 2005. Question answering with Lydia. In Proceedings of 14th Text Retrieval Conference (TREC'05).
|
 |
21
|
|
| |
22
|
Kossinets, G. and Watts, D. J. 2005. Empirical analysis of an evolving social network. Science 311, 88--90.
|
 |
23
|
|
| |
24
|
Lloyd, L., Kaulgud, P., and Skiena, S. 2006. Newspapers vs. blogs: Who gets the scoop? In Proceedings of the Conference on Computational Approaches to Analyzing Weblogs (AAAI-CAAW'06). AAAI Press, 117--124.
|
| |
25
|
Lloyd, L., Kechagias, D., and Skiena, S. 2005. Lydia: A system for large-scale news analysis. In Proceedings of the Conference on String Processing and Information Retrieval (SPIRE'05). Lecture Notes in Computer Science, Vol. 3772. 161--166.
|
| |
26
|
Lloyd, L., Mehler, A., and Skiena, S. 2006. Identifying co-referential names across large corpora. In Proceedings of the Conference on Combinatorial Pattern Matching (CPM'06).
|
| |
27
|
|
| |
28
|
Newman, M. E. J. 2004. Detecting community structure in networks. Europ. Phys. J. B 38, 321--330.
|
| |
29
|
Newman, M. E. J. and Girvan, M. 2004. Finding and evaluating community structure in networks. Phys. Rev. E 69, 026113.
|
| |
30
|
|
 |
31
|
|
 |
32
|
|
| |
33
|
Scott, J. 2000. Social Network Analysis: A Handbook. Sage Publications.
|
| |
34
|
Shafique, K. H. 2001. Partitioning a graph in alliances and its application to data clustering. Ph.D. thesis, School of Computer Science, University of Central Florida Orlando.
|
| |
35
|
|
| |
36
|
|
| |
37
|
Ward, C., Bautin, M., and Skiena, S. 2009. Identifying differences in news coverage between cultural/ethnic groups.
|
| |
38
|
Wu, F. and Huberman, B. A. 2004. Finding communities in linear time: a physics approach. Eur. Phys. J. B 38, 331--338.
|
CITED BY 2
|
|
Anurag Ambekar , Charles Ward , Jahangir Mohammed , Swapna Male , Steven Skiena, Name-ethnicity classification from open sources, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, June 28-July 01, 2009, Paris, France
|
|
|
|
|