Advanced Network Analysis

Advanced Network Analysis
ERGMs for Valued Networks
Olga Chyzh [www.olgachyzh.com]
1 / 48

Readings

Pavel N Krivitsky. Exponential-family random graph models for valued networks. Electronic Journal of Statistics, 6: 1110--1128, 2012.
Bruce A. Desmarais and Skyler J. Cranmer. Statistical inference for valued-edge networks: The generalized exponential random graph model. PloS One, 7(1), 2012.

2 / 48

OverviewMany networks of interest have valued rather than binary edges, e.g. trade network, friendships, romantic relationships.
Can generalize ERGMs to modeling networks with count or rank-ordered valued edges.
Similar properties and estimation approach.
3 / 48

The Sample Space

For binary ERGMs, the sample space (or support) $Y$ — the set of possible networks that can occur — is usually some subset of $2^{N}$ , the set of all possible ways in which relationships among the actors may occur.
For the sample space of valued ERGMs, we need to define $S$ , the set of possible values each relationship may take. For example, for count data, that’s $S = {0, 1, \dots, s}$ if the maximum count is $s$ and ${0, 1, \dots}$ if there is no a priori upper bound. Having specified that, $Y$ is defined as some subset of $S^{N}$ : the set of possible ways to assign to each relationship a value.

4 / 48

Estimation5 / 48

Valued ERGMs

The package ergm.count extends the ergm package to allow for modeling networks with valued edges. This is done by specifying the response argument with the name of the edge attribute to use as the response variable.
New concept: a reference distribution
- Need to think about how the values for connections we measure are distributed. The reference distribution specifies the model for the data before we add any ERGM terms.

help("ergm-references")

6 / 48

The Reference Distribution

7 / 48

Install Packages:

library(devtools)
install_github("ochyzh/networkdata")
#install.packages("ergm.count")
library(statnet)
library(ergm.count)
library(networkdata)

8 / 48

Load the Data

We are going to use Gade et al's data. Our dependent variable--- the valued network--- records the number of collaborations between rebel groups.

data(gadeData)
# data characs
actors = sort(unique(c(gadeData$Var1, gadeData$Var2)))
gadeData<-sort(gadeData)
gadeData$coopActions<-round(gadeData$coopActions^2)
#These are the dyadic variables. They
#must be in matrix form.
dyadVars = names(gadeData)[c(3,5:8)]
n = length(actors) ; p = length(dyadVars)

9 / 48

# create empty arr object for all dyad vars
dyadArray = array(0, 
    dim=c(n,n,p),
    dimnames=list(actors,actors,dyadVars)
    )
# loop through and fill in
for(param in dyadVars){
    for(i in 1:nrow(gadeData)){
        a1 = gadeData$Var1[i]
        a2 = gadeData$Var2[i]
        val =gadeData[i,param]
        dyadArray[a1,a2,param] = val
    }
}

10 / 48

# These are node-level variables.
nodeVars = names(gadeData)[9:11]
nodeData = unique(gadeData[,c('Var1',nodeVars)])
rownames(nodeData) = nodeData$Var1
nodeData = nodeData[actors,c(-1)]
# The DV must be a network object
net = as.network(
    dyadArray[,,'coopActions'], 
    directed=FALSE, loops=FALSE, 
    matrix.type='adjacency',
    ignore.eval = FALSE, 
  names.eval = "coopActions"
    )

11 / 48

Look at the first 10 rows and cols:

 as.matrix(net[1:10,1:10])

##       101st 13th 1st AALS AARB AASB AASG ADF AF AFB
## 101st     0    1   0    0    0    0    0   0  0   0
## 13th      1    0   0    0    0    0    0   0  0   0
## 1st       0    0   0    0    0    0    0   0  0   1
## AALS      0    0   0    0    0    0    0   0  0   0
## AARB      0    0   0    0    0    0    0   0  1   0
## AASB      0    0   0    0    0    0    0   1  0   0
## AASG      0    0   0    0    0    0    0   0  0   1
## ADF       0    0   0    0    0    1    0   0  0   0
## AF        0    0   0    0    1    0    0   0  0   0
## AFB       0    0   1    0    0    0    1   0  0   0

12 / 48

# Set node attributes
for(param in nodeVars){
    network::set.vertex.attribute(net, param, nodeData[,param])
}
# Set network attributes:
set.network.attribute(net,'loc.dyad',dyadArray[,,'loc.dyad'])
set.network.attribute(net,'spons.dyad',dyadArray[,,'spons.dyad'])
# We can view the attribute as a sociomatrix.
as.matrix(net, attrname = "coopActions")[1:10, 1:10]

##       101st 13th 1st AALS AARB AASB AASG ADF AF AFB
## 101st     0    1   0    0    0    0    0   0  0   0
## 13th      1    0   0    0    0    0    0   0  0   0
## 1st       0    0   0    0    0    0    0   0  0   1
## AALS      0    0   0    0    0    0    0   0  0   0
## AARB      0    0   0    0    0    0    0   0  2   0
## AASB      0    0   0    0    0    0    0   1  0   0
## AASG      0    0   0    0    0    0    0   0  0   1
## ADF       0    0   0    0    0    1    0   0  0   0
## AF        0    0   0    0    2    0    0   0  0   0
## AFB       0    0   1    0    0    0    1   0  0   0

13 / 48

Make a Network Graph with Valued Edges:

plot(net, edge.col = "black", usecurve = TRUE, 
     edge.curve = 0, edge.lwd=.25*dyadArray[,,"coopActions"], 
     displaylabels = TRUE)

14 / 48

Make a Network Graph with Valued Edges:

15 / 48

Estimate a Valued ERGM

m0 <- ergm(net ~ sum + 
    nodecov('averageId.node') +
    nodecov('size.node') + 
    nodecov('spons_actor.node') + 
    absdiff('averageId.node') +
    absdiff('size.node') +
    edgecov('loc.dyad') +
    edgecov('spons.dyad'), 
    response = "coopActions", reference = ~Poisson)
mcmc.diagnostics(m0)

16 / 48

Diagnostics

17 / 48

Table of Results

summary(m0)

## Call:
## ergm(formula = net ~ sum + nodecov("averageId.node") + nodecov("size.node") + 
##     nodecov("spons_actor.node") + absdiff("averageId.node") + 
##     absdiff("size.node") + edgecov("loc.dyad") + edgecov("spons.dyad"), 
##     response = "coopActions", reference = ~Poisson)
## 
## Monte Carlo Maximum Likelihood Results:
## 
##                               Estimate Std. Error MCMC % z value Pr(>|z|)    
## sum                          -8.012597   1.117417      0  -7.171  < 1e-04 ***
## nodecov.sum.averageId.node    0.564307   0.049284      0  11.450  < 1e-04 ***
## nodecov.sum.size.node         0.074916   0.005851      0  12.803  < 1e-04 ***
## nodecov.sum.spons_actor.node  0.748998   0.140239      0   5.341  < 1e-04 ***
## absdiff.sum.averageId.node   -0.179775   0.050796      0  -3.539 0.000401 ***
## absdiff.sum.size.node        -0.089858   0.010511      0  -8.549  < 1e-04 ***
## edgecov.sum.NULL              3.834232   1.094595      0   3.503 0.000460 ***
## edgecov.sum.NULL              0.597594   0.149731      0   3.991  < 1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance:    0.0  on 465  degrees of freedom
##  Residual Deviance: -707.2  on 457  degrees of freedom
##  
## Note that the null model likelihood and deviance are defined to be 0.
## This means that all likelihood-based inference (LRT, Analysis of
## Deviance, AIC, BIC, etc.) is only valid between models with the same
## reference distribution and constraints.
## 
## AIC: -691.2  BIC: -658.1  (Smaller is better. MC Std. Err. = 2.764)

18 / 48

Interpretation

Let's calculate the expected number of links between two average rebel groups:

Set all $x$ s to their mean values
Calculate the expected number as $e^{X^{T} β}$

#to get the mean values (in same order as in our model)
x_mean<-apply(data[,c(9,10,11,5,6,7,8)],2,mean)
exp(c(1,x_mean)%*%m0$coef) #expected number of collaborations

19 / 48

Your Turn

Calculate the expected number of links between two average rebel groups with no ideological differences:

20 / 48

Accounting for Network Dependencies21 / 48

Network Dependencies with Valued Edges

Goal: model endogenous network dependencies
Complication: valued edges obviate intuitive network measures, such as triangles and k-stars.
Need to re-conceptualize existing measures to adapt to the context of valued edges.

22 / 48

Individual Heterogeneity.

A count equivalent to $k$ -stars:

Actors may have different overall propensities to interact. This has been modeled using using degeneracy-prone terms like $k$ -star counts. With valued ERGMs, a more robust measure is:

$g_{a c t o r c o v .} g_{a c t o r c o v .} (y y) = \sum_{i \in N} \frac{1}{n - 2} \sum_{j, k \in Y_{i} \land j < k} (\sqrt{y y_{i, j} - \bar{\sqrt{y y}}}) (\sqrt{y y_{i, k} - \bar{\sqrt{y y}}})$

This is essentially a measure of covariance between the squared values of edges incident (originating) from the same actor. Implemented with the term nodecovar(transform="sqrt").

23 / 48

Triadic Closure

24 / 48

Triadic Closure

transitiveweights(twopath, combine, affect)

twopath---given $y y_{i, j}$ and $y y_{k, j}$ , how to compute the value for the two-path?
- "min"---the minimum of their values
- "geomean---geometric mean
combine---given the strength of the two-paths $y y_{i - > k - > j}$ for all $k \neq i, j$ , how to combine the values?
- "max"--- the strength of the strongest path
- "sum"---the sum of path strength
affect ---given the combined strength of the two-paths between $i$ and $j$ , how should they affect $Y Y_{i, j}$ ?
- "min"
- "geomean"

25 / 48

Transitiveweights, Two-Path

${Two-Path}_{m i n} (2, 5) = 2$
${Two-Path}_{g e o m e a n} (2, 5) = \sqrt{(2 * 5)} = \sqrt{10}$

26 / 48

Transitiveweights, Combine

${Combine}_{m a x} = m a x ({Two-path}_{23})$
${Combine}_{s u m} = s u m ({Two-path}_{23})$

27 / 48

Transitiveweights, Affect

For all pairs connected by two-paths:

${Affect}_{m i n} = m i n ({Combine}_{i j})$
${Affect}_{g e o m e a n} = g e o m e a n ({Combine}_{i j})$

28 / 48

Example

m1 <- ergm(net ~ sum + 
    nodecov('averageId.node') +
    nodecov('size.node') + 
    nodecov('spons_actor.node') + 
    absdiff('averageId.node') +
    absdiff('size.node') +
    edgecov('loc.dyad') +
    edgecov('spons.dyad')+ 
    transitiveweights("min","max","min"), 
    response = "coopActions", reference = ~Poisson)
par(mfrow = c(3,2))
mcmc.diagnostics(m1)

29 / 48

Diagnostics

30 / 48

Table of Results

summary(m1)

## Call:
## ergm(formula = net ~ sum + nodecov("averageId.node") + nodecov("size.node") + 
##     nodecov("spons_actor.node") + absdiff("averageId.node") + 
##     absdiff("size.node") + edgecov("loc.dyad") + edgecov("spons.dyad") + 
##     transitiveweights("min", "max", "min"), response = "coopActions", 
##     reference = ~Poisson)
## 
## Monte Carlo Maximum Likelihood Results:
## 
##                                Estimate Std. Error MCMC % z value Pr(>|z|)    
## sum                           -7.454511   1.063267      0  -7.011  < 1e-04 ***
## nodecov.sum.averageId.node     0.551644   0.049526      0  11.138  < 1e-04 ***
## nodecov.sum.size.node          0.067389   0.005726      0  11.770  < 1e-04 ***
## nodecov.sum.spons_actor.node   0.828172   0.132405      0   6.255  < 1e-04 ***
## absdiff.sum.averageId.node    -0.159866   0.044402      0  -3.600 0.000318 ***
## absdiff.sum.size.node         -0.078650   0.009654      0  -8.147  < 1e-04 ***
## edgecov.sum.NULL               3.805675   1.017857      0   3.739 0.000185 ***
## edgecov.sum.NULL               0.528179   0.126521      0   4.175  < 1e-04 ***
## transitiveweights.min.max.min -0.628822   0.046141      0 -13.628  < 1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance:    0.0  on 465  degrees of freedom
##  Residual Deviance: -810.9  on 456  degrees of freedom
##  
## Note that the null model likelihood and deviance are defined to be 0.
## This means that all likelihood-based inference (LRT, Analysis of
## Deviance, AIC, BIC, etc.) is only valid between models with the same
## reference distribution and constraints.
## 
## AIC: -792.9  BIC: -755.7  (Smaller is better. MC Std. Err. = 1.921)

31 / 48

Simulating Networks32 / 48

Simulating Networks

We can use the estimates from our model to simulate a network (just like with ERGMs). If the simulated networks look similar to the observed network, then our model has a good fit.

# Simulate from model fit:
simNets <- simulate(m1, nsim = 3)
# Define a plotting function:
plotSimNet = function(net, label){
    set.seed(6886)
    plot(net, edge.col = "black", usecurve = TRUE, 
     edge.curve = 0, edge.lwd=.1*dyadArray[,,"coopActions"], 
     displaylabels = TRUE)
    title(label) }

33 / 48

par(mfrow = c(2, 2))
# add actual network to list of sim nets
# for comparison
simNets[[4]] = net
labels = c(paste0("sim",1:3), 'actual')
lapply(1:length(simNets), function(i){
    plotSimNet(simNets[[i]], labels[i]) })

34 / 48

Observed vs. Simulated Networks

## [[1]]
## NULL
## 
## [[2]]
## NULL
## 
## [[3]]
## NULL
## 
## [[4]]
## NULL

35 / 48

With Fixed Coordinates:

#Plot the original network to get the layout:
set.seed(6886)
p<-plot(net, edge.col = "black", usecurve = TRUE, 
     edge.curve = 0, edge.lwd=.25*dyadArray[,,"coopActions"], 
     displaylabels = TRUE)

36 / 48

# Define a plotting function:
plotSimNet = function(net, label){
    set.seed(6886)
    plot(net, edge.col = "black", usecurve = TRUE, 
     edge.curve = 0, edge.lwd=.1*dyadArray[,,"coopActions"], 
     displaylabels = TRUE, coord=p)
    title(label) }

37 / 48

Observed vs. Simulated Networks (Fixed Coord.)

## [[1]]
## NULL
## 
## [[2]]
## NULL
## 
## [[3]]
## NULL
## 
## [[4]]
## NULL

38 / 48

Assessing Model Fit

In the above exercise, we compared our network to only 3 simulated networks. Ideally, we would like to compare it to more than 3. Since it's difficult to look at thousands of simulated networks on a graph, a way to compare our network to thousands of such simulated networks is by summarizing the characteristics of these simulated networks, such as the sum of edges or various other measures.

Notice that in the code below that, in addition to network statistics included in model m1, we can also summarize statistics that were not explicitly included in m1, such as nodecovar(transform="sqrt"). This is because our simulated networks may still exhibit vertex heterogeneity as a function of the modeled network properties (e.g. triangles), i.e. more triangles may also lead to more k-stars.

Also notice that I specified output="stats". Since I only care about network summaries, I am telling the function to NOT save the actual simulated networks, but only their summary statistics. This saves memory space.

39 / 48

# Simulate from model fit:
simNets1000 <- simulate(m1, monitor = ~ nodecovar(transform="sqrt"), 
    nsim = 1000, output = "stats")

40 / 48

Results of the Simulation

colnames(simNets1000)

##  [1] "sum"                           "nodecov.sum.averageId.node"   
##  [3] "nodecov.sum.size.node"         "nodecov.sum.spons_actor.node" 
##  [5] "absdiff.sum.averageId.node"    "absdiff.sum.size.node"        
##  [7] "edgecov.sum.loc.dyad"          "edgecov.sum.spons.dyad"       
##  [9] "transitiveweights.min.max.min" "nodecovar"

41 / 48

Plot the Summary of the Simulations

#How prevalent are k-stars in the observed network?
obsNet<-summary(net~sum+transitiveweights("min","max","min")+nodecovar(transform="sqrt"), 
                response = "coopActions")
par(mfrow = c(2, 2))
#1st col.=sum
plot(density(simNets1000[,1]), main="")
abline(v = obsNet[1])
title("Sum")
# 9th col. = transitiveweights
plot(density(simNets1000[,9]), main="")
abline(v = obsNet[2])
title("Transitive Weights")
# 10th col. = nodesqrtcovar
plot(density(simNets1000[,10]), main="")
abline(v = obsNet)
title("Nodesqrtvar")

42 / 48

Plot the Summary of the Simulations

43 / 48

Improve Model Specofication

m2 <- ergm(net ~ sum + 
    nodecov('averageId.node') +
    nodecov('size.node') + 
    nodecov('spons_actor.node') + 
    absdiff('averageId.node') +
    absdiff('size.node') +
    edgecov('loc.dyad') +
    edgecov('spons.dyad')+ 
    transitiveweights("min","max","min")+
    nodecovar(transform="sqrt"), 
    response = "coopActions", reference = ~Poisson)

44 / 48

Improve Model Specification

## Call:
## ergm(formula = net ~ sum + nodecov("averageId.node") + nodecov("size.node") + 
##     nodecov("spons_actor.node") + absdiff("averageId.node") + 
##     absdiff("size.node") + edgecov("loc.dyad") + edgecov("spons.dyad") + 
##     transitiveweights("min", "max", "min") + nodecovar(transform = "sqrt"), 
##     response = "coopActions", reference = ~Poisson)
## 
## Monte Carlo Maximum Likelihood Results:
## 
##                                Estimate Std. Error MCMC % z value Pr(>|z|)    
## sum                           -6.594598   1.084880      0  -6.079  < 1e-04 ***
## nodecov.sum.averageId.node     0.510330   0.047484      0  10.747  < 1e-04 ***
## nodecov.sum.size.node          0.057320   0.005981      0   9.584  < 1e-04 ***
## nodecov.sum.spons_actor.node   0.808021   0.126655      0   6.380  < 1e-04 ***
## absdiff.sum.averageId.node    -0.085453   0.036289      0  -2.355 0.018535 *  
## absdiff.sum.size.node         -0.052703   0.009241      0  -5.703  < 1e-04 ***
## edgecov.sum.NULL               3.580399   1.020037      0   3.510 0.000448 ***
## edgecov.sum.NULL               0.304451   0.124577      0   2.444 0.014530 *  
## transitiveweights.min.max.min -0.432002   0.066354      0  -6.511  < 1e-04 ***
## nodecovar                     -1.109736   0.242183      0  -4.582  < 1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance:    0.0  on 465  degrees of freedom
##  Residual Deviance: -832.7  on 455  degrees of freedom
##  
## Note that the null model likelihood and deviance are defined to be 0.
## This means that all likelihood-based inference (LRT, Analysis of
## Deviance, AIC, BIC, etc.) is only valid between models with the same
## reference distribution and constraints.
## 
## AIC: -812.7  BIC: -771.3  (Smaller is better. MC Std. Err. = 2.359)

45 / 48

Your Turn

Follow the steps we did for m1 to evaluate the fit of model m2. Start by simulating a small number of networks, plot them and compare them to the observed network. Then simulate 1000 networks and compare them to the observed network on the key network statistics. Does m2 have a better fit?

46 / 48

47 / 48

48 / 48

Readings

Pavel N Krivitsky. Exponential-family random graph models for valued networks. Electronic Journal of Statistics, 6: 1110--1128, 2012.

Bruce A. Desmarais and Skyler J. Cranmer. Statistical inference for valued-edge networks: The generalized exponential random graph model. PloS One, 7(1), 2012.

↑, ←, Pg Up, k	Go to previous slide
↓, →, Pg Dn, Space, j	Go to next slide
Home	Go to first slide
End	Go to last slide
Number + Return	Go to specific slide
b / m / f	Toggle blackout / mirrored / fullscreen mode
c	Clone slideshow
p	Toggle presenter mode
t	Restart the presentation timer
?, h	Toggle this help

Advanced Network Analysis

ERGMs for Valued Networks

Olga Chyzh [www.olgachyzh.com]

Readings

Overview

The Sample Space

Estimation

Valued ERGMs

The Reference Distribution

Install Packages:

Load the Data

Look at the first 10 rows and cols:

Make a Network Graph with Valued Edges:

Make a Network Graph with Valued Edges:

Estimate a Valued ERGM

Diagnostics

Table of Results

Interpretation

Your Turn

Accounting for Network Dependencies

Network Dependencies with Valued Edges

Individual Heterogeneity.

Triadic Closure

Triadic Closure

Transitiveweights, Two-Path

Transitiveweights, Combine

Transitiveweights, Affect

Example

Diagnostics

Table of Results

Simulating Networks

Simulating Networks

Observed vs. Simulated Networks

With Fixed Coordinates:

Observed vs. Simulated Networks (Fixed Coord.)

Assessing Model Fit

Results of the Simulation

Plot the Summary of the Simulations

Plot the Summary of the Simulations

Improve Model Specofication

Improve Model Specification

Your Turn

Readings

Help