Reihaneh Rabbany, D. Eswaran, C. Faloutsos, A. Dubrawski


Abstract

Many real world applications include information on both attributes of individual entities as well as relations between them, while there exists an interplay between these attributes and relations. For example, in a typical social network, the similarity of individuals’ characteristics motivates them to form relations, a.k.a. social selection; whereas the characteristics of individuals may be affected by the characteristics of their relations, a.k.a. social influence. We can measure proclivity in networks by quantifying the correlation of nodal attributes and the structure [1]. Here, we are interested in a more fundamental study, to extend the basic statistics defined for graphs and draw parallels for the attributed graphs. More formally, an attributed graph is denoted by (A,X); where An×n is the adjacency matrix and encodes the relationships between the n nodes, and Xn×k is the attributes matrix –each row shows the feature vector of the corresponding node. Degree of a node encodes the number of its neighbors, computed as ki = ∑ j Aij . We can extend this notion to networks with binary attributes to the number of neighbors which share a particular attribute x, i.e. ki(x) = ∑ j Aijδ(Xj , x); where δ(Xj , x) = 1 iff node j has attribute x. Similar to the simple graphs, where the degree distribution is studied and showed to be heavy tail, here we can look at: 1) the degree distributions per attribute, 2) the joint probability distribution of any pair of attributes. Moreover, if we assume A(x1, x2) is the induced subgraph (or masked matrix of edges) with endpoints of values (x1, x2), i.e., A(x1, x2) = Aijδ(Xi, x1)δ(Xj , x2), then we can study and compare these distributions for the induced subgraph per each pair of attribute values. For example, Figure 1 shows the same general trend in the distribution of the original graph and the three possible induced subgraph.