Social Networks
What is Social Network Analysis?
Social Network Analysis (SNA) is the study of the relationships (links) between people (nodes) in a network. SNA combines elements of mathematical graph theory, sociology, and matrix algebra to product quantitative measures that describe patterns of association.
 Related Items
 Small world problem
 Small world experiment
 Six degrees of separation
 Scale free problem
 Power law distribution
 Rich get richer
 Robust yet fragile
 The strength of weak ties
 Guanxi in Chinese society
 The rule of 150
Models
 Random graph on Wikipedia
 A random graph is obtained by starting with a set of n vertices and adding edges between them at random. Most commonly studied is the ErdősRényi model, called G(n,p), in which each of the possible edges occurs independently with probability p. A closely related model, G(n,M) assigns equal probability to all graphs with exactly M edges.
 Properties of G(n, p):
 mean degree <k> = p(n1);
 average path length L ~ ln(n)/ln(<k>);
 clustering coefficient C = p = <k>/n « 1, which means that random graphs usually have a small clustering coefficient.
 possion degree distribution
 Despite their simplicity and power, the ER graphs fail to explain two important properties observed in realworld networks:
 By assuming a constant and independent probability of two nodes being connected, they do not account for local clustering and triadic closures. This fact is formally expressed by the ER graphs having a low clustering coefficient.
 They do not account for the formation of hubs. Formally, the degree distribution of ER graphs converges to a Poisson distribution, rather than a power law observed in most realworld, scalefree networks.
* Smallworld network on Wikipedia
 Watts and Strogatz (WS) model on Wikipedia
 Many empirical graphs are well modeled by smallworld networks. Social networks, the connectivity of the Internet, and gene networks all exhibit smallworld network characteristics.
 A certain category of smallworld networks were identified as a class of random graphs by Duncan Watts and Steven Strogatz in 1998. They noted that graphs could be classified according to their clustering coefficient and their meanshortest path length. While many random graphs exhibit a small meanshortest path (varying typically as the logarithm of the number of nodes) they also usually have a small clustering coefficient.
 Watts and Strogatz measured that in fact many realworld networks have a small shortest path but also a clustering coefficient significantly higher than expected by random chance. Watts and Strogatz proposed a simple model of random graphs with (i) a small average shortest path and (ii) a large clustering coefficient.
 Limitations
 The major limitation of the model is that it produces graphs that are homogeneous in degree. In contrast, real networks are often scalefree networks inhomogeneous in degree, having hubs and a scalefree degree distribution. Such networks are better described by the preferential attachment family of models, such as the Barabási–Albert (BA) model.
 The model also implies a fixed number of nodes and thus cannot be used to model network growth.
 Scalefree network on Wikipedia
 Barabási–Albert (BA) model on Wikipedia  Generalized scalefree model on Wikipedia
 Scalefree networks show a power law degree distribution like most real networks.
 Network growth and preferential attachment have been shown to create networks with power law degree distributions.
 Socialcircles network model on Wikipedia
Issues in Social Networks
 Advertising for social networks Ad Chap
 Social search on Wikipedia
 Visualizing online social networks pdf
Resources
Research People & Organization
 Jon Kleinberg  Cornell

 the author of Six Degrees: The Science of a Connected Age (W.W. Norton, 2003) and Small Worlds: The Dynamics of Networks between Order and Randomness (Princeton University Press, 1999).
 AlbertLászló Barabási  University of Notre Dame
 Scale fee network in Science @ 1999
 Christos Faloutsos  Jure Leskovec  CMU
 Lada Adamic  University of Michigan
 Joshua O'Madadhain  UCI PhD, now in Google, developer of JUNG
 His interests include machine learning, data mining, information retrieval, social network analysis, combinatorial optimization, and the design of data structures and algorithms. He is currently investigating the use of machine learning techniques to create predictive and descriptive models for eventbased network data.
 Academic Research Centers
 Center for Complex Network Research  AlbertLászló Barabási's lab in Univ. of Notre Dame
 Social Network Analysis  Program on Networked Governance, Harvard University
 CASOS Institute at Carnegie Mellon *NEW*
 UCI DataLab  Padhraic Smyth  UCI (JUNG developer)
 Corporate Research Centers
 Social Computing Group  Microsoft Research
Toolbox & Library
 JUNG  the Java Universal Network/Graph Framework – is a software library that provides a common and extendible language for the modeling, analysis, and visualization of data that can be represented as a graph or network. Developed by Scott White, Joshua O'Madadhain, Danyel Fisher, YanBiao Boey at UCI. [Open source, free, and has a wide variety of algorithms available. Recommended by Hongbo]
 libSNA is the premier open source library for conducting social network analysis (SNA) research. [Open source, free, few basic network statistics, in Python]
 UCINET software package. [Commercial, slow, and cannot be embedded into applications: you can't call UCINET in an enduser display]
 Pajek Pajek Wiki  Analysis and Visualization of Large Scale Networks, Program for Large Network Analysis
 Computer Programs for Social Network Analysis in INSNA.org
 Social network analysis software  on Wikipedia
 Computational Models and Social Network Tools in CASOS@CMU
Data Sets
 Data Sets in INSNA.org
 Network Analysis Data in CASOS@CMU
Metrics in social network analysis
 Centrality on Wikipeida  There are various measures of the centrality of a vertex within a graph that determine the relative importance of a vertex within the graph. There are four measures of centrality that are widely used in network analysis: degree centrality, betweenness, closeness, and eigenvector centrality.
 Degree centrality: The count of the number of ties to other actors in the network. If the network is directed (meaning that ties have direction), then we usually define two separate measures of degree centrality, namely indegree and outdegree. Indegree is a count of the number of ties directed to the node, and outdegree is the number of ties that the node directs to others. For positive relations such as friendship or advice, we normally interpret indegree as a form of popularity, and outdegree as gregariousness.
 Betweenness: Degree an individual lies between other individuals in the network; the extent to which a node is directly connected only to those other nodes that are not directly connected to each other; an intermediary; liaisons; bridges. Therefore, it's the number of people who a person is connected to indirectly through their direct links.
 Closeness: The degree an individual is near all other individuals in a network (directly or indirectly). It reflects the ability to access information through the “grapevine” of network members. Thus, closeness is the inverse of the sum of the shortest distances between each individual and every other person in the network.
 Eigenvector centrality: a measure of the importance of a node in a network. It assigns relative scores to all nodes in the network based on the principle that connections to nodes having a high score contribute more to the score of the node in question. Google's PageRank is a variant of the Eigenvector centrality measure.
 Flow betweenness centrality: The degree that a node contributes to sum of maximum flow between all pairs of nodes (not that node).
 Centralization on Wikipedia  The difference between the n of links for each node divided by maximum possible sum of differences. A centralized network will have many of its links dispersed around one or a few nodes, while a decentralized network is one in which there is little variation between the n of links each node possesses.
 Clustering coefficient on Wikipeida  A measure of the likelihood that two associates of a node are associates themselves. A higher clustering coefficient indicates a greater 'cliquishness'. The clustering coefficient for a vertex is the proportion of links between the vertices within its neighbourhood divided by the number of links that could possibly exist between them.
 Cohesion on Wikipedia  The degree to which actors are connected directly to each other by cohesive bonds. Groups are identified as ‘cliques’ if every actor is directly tied to every other actor, ‘social circles’ if there is less stringency of direct contact, which is imprecise, or as structurally cohesive blocks if precision is wanted.
 Structural cohesion  The minimum number of members who, if removed from a group, would disconnect the group. It is also useful to know that kcohesive graphs (or kcomponents) are always a subgraph of a kcore, although a kcore is not always kcohesive. A kcore is simply a subgraph in which all nodes have at least k neighbors but it need not even be connected.
^
 Density on Wikipedia  The degree a respondent's ties know one another/ proportion of ties among an individual's nominees. Network or globallevel density is the proportion of ties in a network relative to the total number possible (sparse versus dense networks).
 , denotes the number of edges, the number of vertices, the maximal density is 1, which is similar to the
 Path Length  The distances between pairs of nodes in the network.
 Average path length is the average of these distances between all pairs of nodes.
 Diameter  The maximum shortest path length
 Radiality  Degree an individual’s network reaches out into the network and provides novel information and influence
 Reach  The degree any member of a network can reach other members of the network.
 Structural equivalence on Wikipedia  Refers to the extent to which actors have a common set of linkages to other actors in the system. The actors don’t need to have any ties to each other to be structurally equivalent.
 Structural hole  Static holes that can be strategically filled by connecting one or more links to link together other points. Linked to ideas of social capital: if you link to two people who are not linked you can control their communication.
 Degree distribution on Wikipedia  Degree distribution gives the probability distribution of degrees in a network. Its use originates from the study of random graph by Paul Erdős and Alfréd Rényi, and it has become an important concept which describes the topology of complex networks. Powerlaw distribution by AlbertLászló Barabási.
 Size  the number of nodes
 Triadic Census  TriadicCensus is a standard social network tool that counts, for each of the different possible configurations of three vertices, the number of times that that configuration occurs in the given graph. 16 possible triads
 Structural Holes  Calculates some of the measures from Burt's text "Structural Holes: The Social Structure of Competition". In Structural Holes, Ron Burt (1992; 1995) describes a set of new measures based on ego networks. One key set of measures is concerned with the notion of redundancy. The general meaning of redundancy is clear: a person's ego network has redundancy to the extent that her contacts are connected to each other as well. Unpacking Burt's Redundancy Measures
References
Papers
Model of SmallWorld Networks and Search
 D. J. Watts and S. H. Strogatz. Collective dynamics of 'smallworld' networks. Nature, 393:440442 (1998).
 Networks must exist on a sliding scale between clustered and random, producing tightly knit groups of friends but also short paths that reach throughout the whole network. Proposed two concepts to measure the small world, which are HIGH clustering coefficient and SHORT average path length.
 J. Kleinberg. Navigation in a Small World. Nature 406(2000), 845.
 J. Kleinberg. The smallworld phenomenon: An algorithmic perspective. Proc. 32nd ACM Symposium on Theory of Computing, 2000.
 J. Kleinberg. SmallWorld Phenomena and the Dynamics of Information. Advances in Neural Information Processing Systems (NIPS) 14, 2001.
 D. J. Watts, P. S. Dodds, and M. E. J. Newman. Identity and search in social networks. Science, 296, 13021305 (2002).
 Oskar Sandberg and Ian Clarke. The Evolution of Navigable SmallWorld Networks. arxiv cs.DS/0607025, July 2006.
Scalefree network / Degree distribution
 R. Albert, H. Jeong, A.L. Barabási. Diameter of the world wide web. Nature 401, 130131 (1999).
 A.L. Barabási, R. Albert. Emergence of scaling in random networks. Science 286, 509512 (1999).
 A.L. Barabási, R. Albert, H. Jeong, G. Bianconi. Powerlaw distribution of the World Wide Web. Science 287, 2115 (2000).
Search a social network
 Lada A. Adamic, Rajan M. Lukose, Amit R. Puniyani, Bernardo A. Huberman. Search in PowerLaw Networks. Phys. Rev. E, 64 46135 (2001).
 P. S. Dodds, R. Muhamad, and D. J. Watts. An experimental study of search in global social networks. Science, 301, 827829 (2003).
 Lada A. Adamic and Eytan Adar. How to search a social network. Social Networks, 27(3):187203, July 2005.
 G. Kossinets and D. J. Watts. Empirical Analysis of Evolving Social Networks. Science, 311, 8890 (2006).
Good Survey papers on complex social networks
 R. Albert, A.LBarabási. Statistical mechanics of complex networks. Rev. Modern Phys. 74(1), 47–97 (2002).
 M. E. J. Newman. The structure and function of complex networks. SIAM Review, 2003, 45, 167256.
Good Related Courses
 The Structure of Information Networks by Kleinberg @ Cornell
 “The past decade has seen a convergence of social and technological networks, with systems such as the World Wide Web characterized by the interplay between rich information content, the millions of individuals and organizations who create it, and the technology that supports it. This course covers recent research on the structure and analysis of such networks, and on models that abstract their basic properties. Topics include combinatorial and probabilistic techniques for link analysis, centralized and decentralized search algorithms, network models based on random graphs, and connections with work in the social sciences.”
Good Related Tutorials
 Tools for large networks: Structure and diffusion in WWW2008 by Jure Leskovec and Christos Faloutsos
Books
 Robert A. Hanneman and Mark Riddle. Introduction to Social Network Methods. UC Riverside, 2005.
 Peter J. Carrington, John Scott, Stanley Wasserman. Models and Methods in Social Network Analysis. Cambridge University Press, 2005. on Amazon
 John P Scott. Social Network Analysis: A Handbook (Paperback). Sage Publications Ltd, 2000.
Articles
 Could it be a Big World After All? The `Six Degrees of Separation' Myth, J. Kleinfeld. Society, April 2002.
 Thoughts on the Social Graph, August, 2007.
 Google’s Marissa Mayer: Social search is the future, Interviewed by VentureBest
 Is Social Search Coming from Google?, February 1, 2008
 Social Search is Coming, February 1, 2008
 The Evolution of Social Networks by By Levy Cohen, Search Engine Watch, Mar 27, 2008
 Evolution of the social network by By Marc Cieslak, BBC Click, 29 March 2008
Videos & PPTs
 Duncan Watts and Kleinberg discuss the smallworld problem on NPR's Science Friday (8 August 2003).
 Challenges in Social Network Data: Processes, Privacy and Paradoxes by Jon Kleinberg on Videolectures
 Social networks and trust : NetTrust on Youtube
 Social Network Analysis: A Research Perspective by Danyel Fisher (Microsoft)ppt
Others
 Social Networks notes of Prof. Cosma Rohilla Shalizi
 Social Network Analysis Page of Prof. Tom Snijders