ABSTRACT
Commercial software project managers design project organizational structure carefully, mindful of available skills, division of labour, geographical boundaries, etc. These organizational "cathedrals" are to be contrasted with the "bazaar-like" nature of Open Source Software (OSS) projects, which have no pre-designed organizational structure. Any structure that exists is dynamic, self-organizing, latent, and usually not explicitly stated. Still, in large, complex, successful OSS projects, we do expect that subcommunities will form spontaneously within the developer teams. Studying these subcommunities and their behaviour can shed light on how successful OSS projects self-organize. This phenomenon could well hold important lessons for how commercial software teams might be organized. Building on well-established techniques for detecting community structure in complex networks, we extract and study latent subcommunities from the email social networks of several projects: Apache HTTPD, Python, PostgreSQL, Perl, and Apache ANT. We then validate them against software development activity history. Our results show that subcommunities do indeed arise spontaneously within these projects as the projects evolve. These subcommunities manifest most strongly in technical discussions, and are significantly connected with collaboration behaviour.