Degree distribution


In the study of graphs and networks, the degree of a node in a network is the number of connections it has to other nodes and the degree distribution is the probability distribution of these degrees over the whole network.

Definition

The degree of a node in a network is the number of connections or edges the node has to other nodes. If a network is directed, meaning that edges point in one direction from one node to another node, then nodes have two different degrees, the in-degree, which is the number of incoming edges, and the out-degree, which is the number of outgoing edges.
The degree distribution $P(k)$ of a network is then defined to be the fraction of nodes in the network with degree $k$. Thus if there are $n$ nodes in total in a network and $n_k$ of them have degree $k$, we have $P(k) = n_k/n$.
The same information is also sometimes presented as a cumulative degree distribution $C(k)$, the fraction of nodes with degree smaller than $k$, or as the complementary cumulative degree distribution $1 - C(k)$, the fraction of nodes with degree greater than or equal to $k$.
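The definitions above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names, the edge-list input format, and the five-node example graph are all assumptions made here, and the graph is taken to be undirected with nodes labeled 0 to n-1.

```python
from collections import Counter

def degree_distribution(edges, n):
    """Return {k: fraction of the n nodes with degree k} for an undirected graph."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree)  # counts[k] is n_k, the number of nodes with degree k
    return {k: counts[k] / n for k in sorted(counts)}

def complementary_cumulative(P):
    """Return {k: fraction of nodes with degree >= k} for k = 0..k_max."""
    k_max = max(P)
    return {k: sum(p for j, p in P.items() if j >= k) for k in range(k_max + 1)}

# Hypothetical example: a star centered on node 0 plus the extra edge 1-2.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]
P = degree_distribution(edges, n=5)
ccdf = complementary_cumulative(P)
```

Here node 0 has degree 4, nodes 1 and 2 have degree 2, and nodes 3 and 4 have degree 1, so two fifths of the nodes have degree 1, two fifths have degree 2, and one fifth has degree 4.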

Observed degree distributions

The degree distribution is important in studying both real networks, such as the Internet and social networks, and theoretical networks. The simplest network model, the (Erdős–Rényi) random graph, in which each pair of the $n$ nodes is independently connected with probability $p$, has a binomial distribution of degrees $k$:
$$P(k) = \binom{n-1}{k} p^k (1-p)^{n-1-k},$$
which approaches a Poisson distribution in the limit of large $n$ when the mean degree $\langle k \rangle = p(n-1)$ is held fixed. Most networks in the real world, however, have degree distributions very different from this. Most are highly right-skewed, meaning that a large majority of nodes have low degree but a small number, known as "hubs", have high degree. Some networks, notably the Internet, the World Wide Web, and some social networks, were argued to have degree distributions that approximately follow a power law, $P(k) \sim k^{-\gamma}$, where $\gamma$ is a constant. Such networks are called scale-free networks and have attracted particular attention for their structural and dynamical properties. However, more recent studies based on real-world data sets claim that, although most observed networks have fat-tailed degree distributions, they deviate from being scale-free.
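As a rough numerical illustration of the random-graph case, the sketch below evaluates the binomial degree distribution, taken here in its standard form P(k) = C(n-1, k) p^k (1-p)^(n-1-k), and compares it with a Poisson distribution of the same mean, which is the usual large-n approximation. The function names and the values n = 200 and p = 0.02 are assumptions chosen for the example.

```python
import math

def binomial_degree_pmf(n, p):
    """P(k) for k = 0..n-1 in a random graph with n nodes and edge probability p."""
    return [math.comb(n - 1, k) * p**k * (1 - p)**(n - 1 - k) for k in range(n)]

def poisson_pmf(mean, k_max):
    """Poisson probabilities for k = 0..k_max, built iteratively to avoid huge factorials."""
    pmf = [math.exp(-mean)]
    for k in range(1, k_max + 1):
        pmf.append(pmf[-1] * mean / k)
    return pmf

n, p = 200, 0.02  # hypothetical parameters
binom = binomial_degree_pmf(n, p)
mean_degree = sum(k * pk for k, pk in enumerate(binom))  # equals p * (n - 1)
poisson = poisson_pmf(mean_degree, n - 1)

# Largest pointwise difference between the two distributions.
max_gap = max(abs(b - q) for b, q in zip(binom, poisson))
```

Both distributions decay rapidly beyond the mean, which is why neither can reproduce the hubs seen in right-skewed real-world networks.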

Excess degree distribution

Excess degree distribution is the probability distribution, for a node reached by following an edge, of the number of other edges attached to that node. In other words, it is the distribution of outgoing links from a node reached by following a link.
Suppose a network has degree distribution $P(k)$. If we select a node and move to one of its neighbors, the probability that this neighbor has degree $k$ is not given by $P(k)$. The reason is that, in a heterogeneous network, following an edge is more likely to lead to a hub, because a node is reached with probability proportional to its degree. The probability that such a node has $k$ edges other than the one used to reach it is $q(k)$, and $k$ is called the excess degree of that node. In the configuration model, in which correlations between nodes are ignored and every node is assumed to be connected to any other node with the same probability, the excess degree distribution is:
$$q(k) = \frac{k+1}{\langle k \rangle} P(k+1),$$
where $\langle k \rangle$ is the mean degree of the model. It follows that the average degree of a neighbor of a node, $\langle k^2 \rangle / \langle k \rangle$, is greater than the average degree $\langle k \rangle$ of a node whenever the degrees are not all equal. In social networks, this means that your friends, on average, have more friends than you do. This is known as the friendship paradox. It can be shown that a network can have a giant component if its average excess degree is larger than one:
$$\sum_k k\, q(k) > 1 \;\Rightarrow\; \langle k^2 \rangle - 2 \langle k \rangle > 0.$$
Bear in mind that the last two equations hold only for the configuration model; to derive the excess degree distribution of a real-world network, degree correlations must also be taken into account.
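The following sketch works through the configuration-model relations for a small example. It assumes the standard form q(k) = (k + 1) P(k + 1) / <k> for the excess degree distribution; the function names and the three-valued degree distribution are hypothetical and chosen only to make the hub effect visible.

```python
def mean(dist):
    """Mean of a distribution given as {value: probability}."""
    return sum(k * p for k, p in dist.items())

def excess_degree_distribution(P):
    """q(k) = (k + 1) P(k + 1) / <k>, with P given as {degree: probability}."""
    mean_k = mean(P)
    # A node of degree k reached along an edge has k - 1 other edges.
    return {k - 1: k * p / mean_k for k, p in P.items() if k >= 1}

# Hypothetical heterogeneous distribution: mostly low-degree nodes and a few hubs.
P = {1: 0.6, 2: 0.3, 10: 0.1}
q = excess_degree_distribution(P)

mean_degree = mean(P)                    # <k>
second_moment = sum(k * k * p for k, p in P.items())  # <k^2>
mean_neighbor_degree = mean(q) + 1       # excess degree plus the edge we arrived by

# Giant component criterion: average excess degree larger than one.
has_giant_component = mean(q) > 1
```

In this example the mean degree is 2.2 while the mean degree of a neighbor is about 5.4, which is the friendship paradox in miniature, and the criterion agrees with the sign of <k^2> - 2<k>.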

Generating functions method

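Generating functions can be used to calculate different properties of random networks. Given the degree distribution and the excess degree distribution of some network, $P(k)$ and $q(k)$ respectively, it is possible to write two power series in the following forms:
$$G_0(x) = \sum_k P(k) x^k$$
and
$$G_1(x) = \sum_k q(k) x^k = \sum_k \frac{k}{\langle k \rangle} P(k) x^{k-1}.$$
$G_1(x)$ can also be obtained from derivatives of $G_0(x)$:
$$G_1(x) = \frac{G_0'(x)}{G_0'(1)}.$$
If we know the generating function $G_0(x)$ for a probability distribution $P(k)$, then we can recover the values of $P(k)$ by differentiating:
$$P(k) = \frac{1}{k!} \left. \frac{d^k G_0}{dx^k} \right|_{x=0}.$$
Some properties, e.g. the moments, can be easily calculated from $G_0(x)$ and its derivatives:
$$\langle k \rangle = G_0'(1),$$
$$\langle k^2 \rangle = G_0''(1) + G_0'(1).$$
And in general:
$$\langle k^m \rangle = \left[ \left( x \frac{d}{dx} \right)^m G_0(x) \right]_{x=1}.$$
For Poisson-distributed random networks, such as the ER graph, $G_1(x) = G_0(x)$, which is why the theory of random networks of this type is especially simple. The probability distributions for the 1st and 2nd-nearest neighbors are generated by the functions $G_0(x)$ and $G_0(G_1(x))$. By extension, the distribution of $m$-th neighbors is generated by:
$$G_0\bigl(G_1(\ldots G_1(x) \ldots)\bigr),$$
with $m-1$ iterations of the function $G_1$ acting on itself.
The average number of 1st neighbors, $c_1$, is $\langle k \rangle = G_0'(1)$, and the average number of 2nd neighbors is:
$$c_2 = \left[ \frac{d}{dx} G_0(G_1(x)) \right]_{x=1} = G_1'(1)\, G_0'(G_1(1)) = G_1'(1)\, G_0'(1) = G_0''(1).$$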
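For a degree distribution with finite support, the generating function G_0 is a polynomial whose coefficients are the probabilities P(k), so the quantities above can be checked numerically. The sketch below assumes the standard relations G_1(x) = G_0'(x)/G_0'(1), <k> = G_0'(1), <k^2> = G_0''(1) + G_0'(1) and c_2 = G_1'(1) G_0'(1). The helper names and the example distribution are hypothetical.

```python
def derivative(coeffs):
    """Coefficients of the derivative of sum_k coeffs[k] * x**k."""
    return [k * a for k, a in enumerate(coeffs)][1:]

def evaluate(coeffs, x):
    """Evaluate the polynomial at x using Horner's rule."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result

# Hypothetical degree distribution: P[k] is the probability of degree k.
P = [0.1, 0.3, 0.4, 0.2]

G0 = P                                   # G_0(x) = sum_k P(k) x^k
dG0 = derivative(G0)
mean_k = evaluate(dG0, 1.0)              # <k> = G_0'(1)
G1 = [a / mean_k for a in dG0]           # G_1(x) = G_0'(x) / G_0'(1)

d2G0 = derivative(dG0)
second_moment = evaluate(d2G0, 1.0) + mean_k   # <k^2> = G_0''(1) + G_0'(1)

c1 = mean_k                                    # average number of 1st neighbors
c2 = evaluate(derivative(G1), 1.0) * mean_k    # G_1'(1) G_0'(1), equal to G_0''(1)
```

The coefficients of G1 are the excess degree probabilities q(k), so they sum to one, and c2 computed from G1 matches G_0''(1) evaluated directly.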

Degree distribution for directed networks

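In a directed network, each node has some in-degree $k_{in}$ and some out-degree $k_{out}$, which are the numbers of links running into and out of that node respectively. If $P(k_{in}, k_{out})$ is the probability that a randomly chosen node has in-degree $k_{in}$ and out-degree $k_{out}$, then the generating function assigned to this joint probability distribution can be written with two variables $x$ and $y$ as:
$$\mathcal{G}(x, y) = \sum_{k_{in}, k_{out}} P(k_{in}, k_{out})\, x^{k_{in}} y^{k_{out}}.$$
Since every link in a directed network must leave some node and enter another, the net average number of links entering a node is zero. Therefore,
$$\langle k_{in} - k_{out} \rangle = \sum_{k_{in}, k_{out}} (k_{in} - k_{out})\, P(k_{in}, k_{out}) = 0,$$
which implies that the generating function must satisfy:
$$\left. \frac{\partial \mathcal{G}}{\partial x} \right|_{x,y=1} = \left. \frac{\partial \mathcal{G}}{\partial y} \right|_{x,y=1} = c,$$
where $c$ is the mean degree (both in and out) of the nodes in the network; $\langle k_{in} \rangle = \langle k_{out} \rangle = c$.
Using the function $\mathcal{G}(x, y)$, we can again find the generating functions for the in/out-degree distribution and the in/out-excess degree distribution, as before. $G_0^{in}(x)$ can be defined as the generating function for the number of arriving links at a randomly chosen node, and $G_1^{in}(x)$ as the generating function for the number of arriving links at a node reached by following a randomly chosen link. We can also define generating functions $G_0^{out}(y)$ and $G_1^{out}(y)$ for the number leaving such a node:
$$G_0^{in}(x) = \mathcal{G}(x, 1), \qquad G_1^{in}(x) = \frac{1}{c} \left. \frac{\partial \mathcal{G}}{\partial y} \right|_{y=1},$$
$$G_0^{out}(y) = \mathcal{G}(1, y), \qquad G_1^{out}(y) = \frac{1}{c} \left. \frac{\partial \mathcal{G}}{\partial x} \right|_{x=1}.$$
Here, the average number of 1st neighbors, $c$, previously introduced as $c_1$, is $\left. \frac{\partial \mathcal{G}}{\partial x} \right|_{x,y=1} = \left. \frac{\partial \mathcal{G}}{\partial y} \right|_{x,y=1}$, and the average number of 2nd neighbors reachable from a randomly chosen node is given by $c_2 = G_1'(1)\, G_0'(1) = \left. \frac{\partial^2 \mathcal{G}}{\partial x\, \partial y} \right|_{x,y=1}$. These are also the numbers of 1st and 2nd neighbors from which a random node can be reached, since these equations are manifestly symmetric in $x$ and $y$.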
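The directed case can be illustrated by building the joint in/out-degree distribution from an edge list and evaluating the derivatives of the two-variable generating function at x = y = 1, where they reduce to simple weighted sums. The function name and the four-node graph are hypothetical. The value of c_2 uses the standard expression, the mixed second derivative of the generating function, which is an expectation for random graphs with this joint distribution and not a count of second neighbors in this particular graph.

```python
from collections import Counter

def joint_degree_distribution(edges, n):
    """Return {(k_in, k_out): fraction of the n nodes with that degree pair}."""
    k_in = [0] * n
    k_out = [0] * n
    for source, target in edges:
        k_out[source] += 1
        k_in[target] += 1
    counts = Counter(zip(k_in, k_out))
    return {pair: count / n for pair, count in counts.items()}

# Hypothetical directed graph on 4 nodes, given as (source, target) pairs.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 0)]
P = joint_degree_distribution(edges, n=4)

# First partial derivatives at x = y = 1 are the mean in-degree and mean out-degree.
mean_in = sum(ki * p for (ki, ko), p in P.items())
mean_out = sum(ko * p for (ki, ko), p in P.items())

# Mixed second derivative at x = y = 1: sum of k_in * k_out * P(k_in, k_out).
c2 = sum(ki * ko * p for (ki, ko), p in P.items())
```

Because each of the five links contributes one unit of out-degree and one unit of in-degree, both means equal 5/4, as the constraint on the generating function requires.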