Index Show pagesource Old revisions Digg this! Del.Icio.Us Google bookmark

# Social Networks

What is Social Network Analysis?

Social Network Analysis (SNA) is the study of the relationships (links) between people (nodes) in a network. SNA combines elements of mathematical graph theory, sociology, and matrix algebra to product quantitative measures that describe patterns of association.

• Related Items
• Small world problem
• Small world experiment
• Six degrees of separation
• Scale free problem
• Power law distribution
• Rich get richer
• Robust yet fragile
• The strength of weak ties
• Guanxi in Chinese society
• The rule of 150

## Models

• Random graph on Wikipedia
• A random graph is obtained by starting with a set of n vertices and adding edges between them at random. Most commonly studied is the Erdős-Rényi model, called G(n,p), in which each of the possible edges occurs independently with probability p. A closely related model, G(n,M) assigns equal probability to all graphs with exactly M edges.
• Properties of G(n, p):
• mean degree <k> = p(n-1);
• average path length L ~ ln(n)/ln(<k>);
• clustering coefficient C = p = <k>/n « 1, which means that random graphs usually have a small clustering coefficient.
• possion degree distribution
• Despite their simplicity and power, the ER graphs fail to explain two important properties observed in real-world networks:
• By assuming a constant and independent probability of two nodes being connected, they do not account for local clustering and triadic closures. This fact is formally expressed by the ER graphs having a low clustering coefficient.
• They do not account for the formation of hubs. Formally, the degree distribution of ER graphs converges to a Poisson distribution, rather than a power law observed in most real-world, scale-free networks.

* Small-world network on Wikipedia

• Watts and Strogatz (WS) model on Wikipedia
• Many empirical graphs are well modeled by small-world networks. Social networks, the connectivity of the Internet, and gene networks all exhibit small-world network characteristics.
• A certain category of small-world networks were identified as a class of random graphs by Duncan Watts and Steven Strogatz in 1998. They noted that graphs could be classified according to their clustering coefficient and their mean-shortest path length. While many random graphs exhibit a small mean-shortest path (varying typically as the logarithm of the number of nodes) they also usually have a small clustering coefficient.
• Watts and Strogatz measured that in fact many real-world networks have a small shortest path but also a clustering coefficient significantly higher than expected by random chance. Watts and Strogatz proposed a simple model of random graphs with (i) a small average shortest path and (ii) a large clustering coefficient.
• Limitations
• The major limitation of the model is that it produces graphs that are homogeneous in degree. In contrast, real networks are often scale-free networks inhomogeneous in degree, having hubs and a scale-free degree distribution. Such networks are better described by the preferential attachment family of models, such as the Barabási–Albert (BA) model.
• The model also implies a fixed number of nodes and thus cannot be used to model network growth.
• Scale-free network on Wikipedia
• Barabási–Albert (BA) model on Wikipedia | Generalized scale-free model on Wikipedia
• Scale-free networks show a power law degree distribution like most real networks.
• Network growth and preferential attachment have been shown to create networks with power law degree distributions.

## Resources

### Research People & Organization

• Jon Kleinberg - Cornell
• the author of Six Degrees: The Science of a Connected Age (W.W. Norton, 2003) and Small Worlds: The Dynamics of Networks between Order and Randomness (Princeton University Press, 1999).
• Albert-László Barabási - University of Notre Dame
• Scale fee network in Science @ 1999
• Lada Adamic - University of Michigan
• Joshua O'Madadhain - UCI PhD, now in Google, developer of JUNG
• His interests include machine learning, data mining, information retrieval, social network analysis, combinatorial optimization, and the design of data structures and algorithms. He is currently investigating the use of machine learning techniques to create predictive and descriptive models for event-based network data.

### Toolbox & Library

• JUNG - the Java Universal Network/Graph Framework – is a software library that provides a common and extendible language for the modeling, analysis, and visualization of data that can be represented as a graph or network. Developed by Scott White, Joshua O'Madadhain, Danyel Fisher, Yan-Biao Boey at UCI. [Open source, free, and has a wide variety of algorithms available. Recommended by Hongbo]
• libSNA is the premier open source library for conducting social network analysis (SNA) research. [Open source, free, few basic network statistics, in Python]
• UCINET software package. [Commercial, slow, and cannot be embedded into applications: you can't call UCINET in an end-user display]
• Pajek Pajek Wiki - Analysis and Visualization of Large Scale Networks, Program for Large Network Analysis
• Social network analysis software - on Wikipedia

## Metrics in social network analysis

• Centrality on Wikipeida - There are various measures of the centrality of a vertex within a graph that determine the relative importance of a vertex within the graph. There are four measures of centrality that are widely used in network analysis: degree centrality, betweenness, closeness, and eigenvector centrality.
• Degree centrality: The count of the number of ties to other actors in the network. If the network is directed (meaning that ties have direction), then we usually define two separate measures of degree centrality, namely indegree and outdegree. Indegree is a count of the number of ties directed to the node, and outdegree is the number of ties that the node directs to others. For positive relations such as friendship or advice, we normally interpret indegree as a form of popularity, and outdegree as gregariousness.
• Betweenness: Degree an individual lies between other individuals in the network; the extent to which a node is directly connected only to those other nodes that are not directly connected to each other; an intermediary; liaisons; bridges. Therefore, it's the number of people who a person is connected to indirectly through their direct links.
• Closeness: The degree an individual is near all other individuals in a network (directly or indirectly). It reflects the ability to access information through the “grapevine” of network members. Thus, closeness is the inverse of the sum of the shortest distances between each individual and every other person in the network.
• Eigenvector centrality: a measure of the importance of a node in a network. It assigns relative scores to all nodes in the network based on the principle that connections to nodes having a high score contribute more to the score of the node in question. Google's PageRank is a variant of the Eigenvector centrality measure.
• Flow betweenness centrality: The degree that a node contributes to sum of maximum flow between all pairs of nodes (not that node).
• Centralization on Wikipedia - The difference between the n of links for each node divided by maximum possible sum of differences. A centralized network will have many of its links dispersed around one or a few nodes, while a decentralized network is one in which there is little variation between the n of links each node possesses.
• Clustering coefficient on Wikipeida - A measure of the likelihood that two associates of a node are associates themselves. A higher clustering coefficient indicates a greater 'cliquishness'. The clustering coefficient for a vertex is the proportion of links between the vertices within its neighbourhood divided by the number of links that could possibly exist between them.
• • • Cohesion on Wikipedia - The degree to which actors are connected directly to each other by cohesive bonds. Groups are identified as ‘cliques’ if every actor is directly tied to every other actor, ‘social circles’ if there is less stringency of direct contact, which is imprecise, or as structurally cohesive blocks if precision is wanted.
• Structural cohesion - The minimum number of members who, if removed from a group, would disconnect the group. It is also useful to know that k-cohesive graphs (or k-components) are always a subgraph of a k-core, although a k-core is not always k-cohesive. A k-core is simply a subgraph in which all nodes have at least k neighbors but it need not even be connected.

^

 Some illustrative examples are presented in the gallery below:  The 6-node ring in the graph has connectivity-2 or a level 2 of structural cohesion because the removal of two nodes is needed to disconnect it. A 6-node clique is a 5-component, structural cohesion 5.
• Density on Wikipedia - The degree a respondent's ties know one another/ proportion of ties among an individual's nominees. Network or global-level density is the proportion of ties in a network relative to the total number possible (sparse versus dense networks).
• , denotes the number of edges, the number of vertices, the maximal density is 1, which is similar to the • Path Length - The distances between pairs of nodes in the network.
• Average path length is the average of these distances between all pairs of nodes.
• Diameter - The maximum shortest path length
• Radiality - Degree an individual’s network reaches out into the network and provides novel information and influence
• Reach - The degree any member of a network can reach other members of the network.
• Structural equivalence on Wikipedia - Refers to the extent to which actors have a common set of linkages to other actors in the system. The actors don’t need to have any ties to each other to be structurally equivalent.
• Structural hole - Static holes that can be strategically filled by connecting one or more links to link together other points. Linked to ideas of social capital: if you link to two people who are not linked you can control their communication.
• Degree distribution on Wikipedia - Degree distribution gives the probability distribution of degrees in a network. Its use originates from the study of random graph by Paul Erdős and Alfréd Rényi, and it has become an important concept which describes the topology of complex networks. Power-law distribution by Albert-László Barabási.
• Size - the number of nodes
• Triadic Census - TriadicCensus is a standard social network tool that counts, for each of the different possible configurations of three vertices, the number of times that that configuration occurs in the given graph. 16 possible triads
• Structural Holes - Calculates some of the measures from Burt's text "Structural Holes: The Social Structure of Competition". In Structural Holes, Ron Burt (1992; 1995) describes a set of new measures based on ego networks. One key set of measures is concerned with the notion of redundancy. The general meaning of redundancy is clear: a person's ego network has redundancy to the extent that her contacts are connected to each other as well. Unpacking Burt's Redundancy Measures 