Uncovering Places in Two-Mode Networks

Using Structural Equivalence to Study Affiliation Networks

Cécile Armand (Ecole Normale Supérieure de Lyon)

Abstract: This document presents an effective approach for handling two-mode networks, utilizing the concept of ‘place’ or structural equivalence as its fundamental framework. It primarily relies on the ‘Places’ and ‘igraph’ R packages. To illustrate this method, it employs an edge list representing students and their respective universities in the United States. The data source for this analysis is derived from the directory of an alumni club, specifically the American University Club of Shanghai, which was originally published in 1936. The document proceeds through four main steps: (1) identification of places from the edge list, (2) transformation of the list of places into a network of places, along with its transposed network of universities, (3) visualization and analysis of the network, including community detection, and (4) the introduction of a more flexible approach grounded in the concepts of regular equivalence or k-places.

Prerequisites: Basic notions of network analysis and the “tidyverse” suite.

Introduction

0.1 Context

Two-mode networks¹, i.e. networks that involve two different types of nodes, such as persons and organizations, represent a significant proportion of network analysis research in the humanities and social sciences. Indeed, it is not always possible to gather first-hand data on direct relationships, such as friendship or family ties. In many situations, social relations are mediated by a third party or have to be inferred from indirect ties, such as school attendance, co-participation in events, membership in clubs or corporate boards.

In this document, we interchangeably use the terms “vertex” (plural: “vertices”) and “node(s)” to refer to the network’s nodes. We alternatively employ the terms “edge(s)” and “tie(s)” to designate the network’s edges.

Analyzing two-mode networks raises significant challenges, which have been extensively described in specialized literature (Borgatti 2009), (Borgatti, Halgin 2011). Three major approaches have been commonly deployed. The first approach, which applies algorithms developed for one-mode networks, disregards the unique characteristics of two-mode data and introduces biases that have been discussed in previous works (Borgatti, Everett 1997). The second approach involves projecting the original two-mode network into two separate one-mode networks (Everett, Borgatti 2013). Depending on their interest, researchers typically focus on one projection and discard the other. However, this method has been shown to result in a loss of information and the creation of artificial clustering, which can introduce biases in the interpretation of the data (Newman et al. 2001), (Uzzi, Spiro 2005), (Zhou et al. 2007). A third approach, implemented notably in research on interlocks since the 1970s or in ecology, and more recently in other disciplines, maintains the bimodal structure of the studied network.

The place-based methodology we aim to introduce in this document offers a powerful alternative to the three mainstream approaches described above, as illustrated in Figure 1. First, it allows for a reduction of the network without sacrificing information. Second, it maintains the inherent duality property found in two-mode networks (Field et al. 2006).

Figure 1 - The place-based methodology

It is crucial to underscore that, in this context, the concept of place² should not be interpreted in a geographical sense. Originally introduced by sociologist Narciso Pizarro (Pizarro 2002), (Pizarro 2007), the concept of place instead takes inspiration from the notion of structural equivalence³ employed in network analysis since the 1970s. Within the framework of individuals affiliated with specific institutions, each ‘place’ denotes a group of individuals who share the exact same set of institutions. Put differently, individuals are considered part of the same ‘place’ if they are affiliated with the same institution or combination of institutions.
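To make the definition concrete, the following minimal sketch groups students by their exact set of universities using base R only. The toy data and all object names (toy, signature, toy_places) are hypothetical, and this is an illustration of the idea rather than the algorithm implemented in the Places package.

```r
# Toy edge list (hypothetical data, not taken from the AUC directory)
toy <- data.frame(
  Name = c("Ann", "Ann", "Bob", "Bob", "Cai", "Dee"),
  University = c("Yale", "Columbia", "Columbia", "Yale", "Harvard", "Harvard"),
  stringsAsFactors = FALSE
)

# One signature per student: the sorted, de-duplicated set of universities
signature <- vapply(
  split(toy$University, toy$Name),
  function(u) paste(sort(unique(u)), collapse = ";"),
  character(1)
)

# Students sharing the same signature are structurally equivalent:
# together they form one place
toy_places <- split(names(signature), signature)
toy_places
```

In this toy example, Ann and Bob fall into the same place (Columbia and Yale), while Cai and Dee form a second place defined by Harvard alone.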

A more flexible approach introduces a tolerance threshold, denoted as k. This allows for regular equivalence⁴ rather than strict structural equivalence. By setting a value for k, we allow for a certain degree of variation or difference between individuals’ affiliations. For example, if we set k = 1, we accept that two individuals may differ by one institution.
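The tolerance idea can be sketched by counting the institutions on which two individuals differ and comparing that count with k. The data and the helper n_differences() below are hypothetical, and the sketch only illustrates the comparison of two affiliation sets; it should not be read as the procedure used by the k-places function.

```r
# Hypothetical affiliation sets for three students
affiliations <- list(
  Ann = c("Columbia", "Yale"),
  Bob = c("Columbia", "Yale", "Harvard"),
  Cai = c("Stanford")
)

# Number of institutions that belong to one set but not to the other
n_differences <- function(a, b) {
  length(union(setdiff(a, b), setdiff(b, a)))
}

k <- 1
ann_bob <- n_differences(affiliations$Ann, affiliations$Bob)  # differ by Harvard only
ann_cai <- n_differences(affiliations$Ann, affiliations$Cai)  # differ by three institutions

ann_bob <= k  # TRUE: within the tolerance
ann_cai <= k  # FALSE: too different
```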

The concept of places is particularly relevant under the following conditions:

When the data consists of two-mode relational data, such as club membership, interlocking boards, or co-participation in events;
When there is multiple membership, meaning individuals are connected to more than one institution;
When the range of membership per individual is not excessively wide;
When the distribution of members across institutions is not heavily skewed.

The last two conditions are not strictly necessary, but they significantly facilitate the initial interpretation of places.
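Before looking for places, it may help to check these conditions on the edge list. The sketch below uses a hypothetical toy edge list with the same two columns as our data (Name and University) and base R only; the object names are illustrative.

```r
# Hypothetical edge list with the same structure as the tutorial data
toy <- data.frame(
  Name = c("Ann", "Ann", "Bob", "Bob", "Cai", "Dee", "Dee", "Dee"),
  University = c("Yale", "Columbia", "Columbia", "Yale",
                 "Harvard", "Harvard", "Yale", "Chicago"),
  stringsAsFactors = FALSE
)
toy <- unique(toy)  # drop duplicated element-set pairs

# Number of sets per element (range of membership)
sets_per_element <- table(toy$Name)
# Number of elements per set (distribution of members across institutions)
elements_per_set <- table(toy$University)

share_multiple <- mean(sets_per_element > 1)  # share of elements with multiple membership
share_multiple
range(sets_per_element)
sort(elements_per_set, decreasing = TRUE)
```

A high share of multiple membership, a narrow range of sets per element, and a reasonably even distribution of members across sets all suggest that places will be easy to interpret.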

While popular software tools like Cytoscape and Gephi do not provide built-in functions for place-based analysis, researchers can resort to the Places R package developed by Delio de Lucena at Sciences Po Toulouse. Although there are alternative approaches for addressing structural equivalence in R (such as netdiffuseR and concoR), the “Places” package is the only available library that specifically focuses on the detection and analysis of ‘places’.

0.2 Packages

This document relies on the following packages:

dplyr: a package for data manipulation, providing functions for filtering, selecting, mutating, summarizing, and arranging data frames in an efficient and readable manner.
kableExtra: used to enhance the display of data frames and make the data more legible.
Places: a package specifically designed to find places in two-mode data, developed by Delio de Lucena (Sciences Po Toulouse).
igraph: a reference package for building, analyzing and visualizing networks.

Note that visualization is not igraph’s main strength. Other packages such as tidygraph can be utilized for improving visual aspects, but this is not the core focus of this tutorial. Additionally, it may be helpful to export the edge lists for further exploration with network analysis software such as Gephi or Cytoscape, which enable greater interactivity.

0.3 Data

The example data used in this tutorial was created by the author from a directory of the American University Club of Shanghai published in 1936 (Shanghai 1936), (Armand 2024). The original dataset can be downloaded from Zenodo. It is freely accessible and open for reuse. In this tutorial, we shall use a simplified version of the original dataset, which we describe below.

The dataset is typically an edge list⁵ of individuals linked to the universities they attended. It contains 682 academic curricula distributed among the 418 members of the American University Club of Shanghai. Since the individuals may have obtained several degrees from different universities, they may appear in several rows. Each row refers to a distinct curriculum.

To load the data, we run the following line:

library(readr)

auc <- read_delim("data/auc.csv", delim = ";", escape_double = FALSE, col_types = cols(Nationality = col_factor(levels = c("Chinese", "Japanese", "Western")), Start_year = col_number(), End_year = col_number()), trim_ws = TRUE)

head(auc)

# A tibble: 6 × 7
  Name         Nationality University     Degree    Field    Start_year End_year
  <chr>        <fct>       <chr>          <chr>     <chr>         <dbl>    <dbl>
1 Ting_H.N.    Chinese     Pennsylvania   Bachelor  Arts           1915     1918
2 Inui_Kiyosue Japanese    Michigan       Bachelor  Arts           1906     1906
3 Inui_Kiyosue Japanese    Tokyo Imperial Doctorate Law            1897     1901
4 Yu_Leo W.    Chinese     Purdue         <NA>      <NA>           1925     1926
5 Yu_Leo W.    Chinese     Nebraska       Bachelor  Electri…       1925     1925
6 Yu_Leo W.    Chinese     Nevada         <NA>      <NA>           1922     1923

The names() function lists the columns contained in the data frame and the summary() function provides a summary description of the dataset:

names(auc)

[1] "Name"        "Nationality" "University"  "Degree"      "Field"      
[6] "Start_year"  "End_year"

The data frame includes the following columns:

Name: the student’s name
Nationality: the student’s national origin (Chinese, Western, Japanese)
University: the name of the university attended by the student
Degree: the nature of the academic degree obtained by the student
Field: the student’s major field of study
Start_year: the student’s year of enrollment or graduation
End_year: the year of graduation

summary(auc)

     Name             Nationality   University           Degree         
 Length:682         Chinese :401   Length:682         Length:682        
 Class :character   Japanese:  6   Class :character   Class :character  
 Mode  :character   Western :275   Mode  :character   Mode  :character  
                                                                        
                                                                        
                                                                        
                                                                        
    Field             Start_year      End_year   
 Length:682         Min.   :1883   Min.   :1883  
 Class :character   1st Qu.:1914   1st Qu.:1915  
 Mode  :character   Median :1920   Median :1921  
                    Mean   :1920   Mean   :1920  
                    3rd Qu.:1926   3rd Qu.:1926  
                    Max.   :1935   Max.   :1935  
                    NA's   :2      NA's   :1

The summary function provides useful information about the data. For example, it indicates that there are 401 curricula by Chinese students, 275 by Western students, and 6 by Japanese students. The time span of their studies ranges from 1883 to 1935, with the first degree obtained in 1883 and the last one in 1935, one year before the publication of the book.

0.4 Workflow

Figure 2 presents a tentative workflow for developing an effective place-based methodology, which comprises essential and optional modules. In this tutorial, we will focus on:

Detecting and analyzing places from two-mode data (2 and 3 on Figure 2)
Creating dual networks of places and sets from the detected places (4)
Basic network analysis and visualization (5)
Detecting communities in the dual network (6)
A brief introduction to regular equivalence and the k-places function.

Figure 2 - Standard workflow for a place-based analysis (interactive version: https://xmind.app/mindmap/places-in-two-mode-data-a-workflow/YX2g4H/?from=gallery#)

1 Extracting Places from the Two-Mode Network

The first section aims at detecting and analyzing places from the dataset of students and universities.

The initial step is to install the Places package from the author’s repository:

install.packages("http://lereps.sciencespo-toulouse.fr/IMG/gz/places_0.2.3.tar.gz", repos = NULL, type = "source")
library(Places)

It is important to emphasize that the data must be in a data frame format. One can use the class() function to check whether the data is in the proper format and the as.data.frame() function to make the necessary conversion, as shown below.

class(auc)

[1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"

auc <- as.data.frame(auc)

We can now apply the places() function, the key function for detecting places. It takes three arguments:

data: serves to specify the input dataset (e.g., the edge list of students and universities, “auc”). The input data must be an edge list.
col.elements: to select the source column, designated as “Elements” in the Places package terminology (in this specific case, the students).
col.sets: to select the target column, designated as “Sets” (i.e., the universities attended by the students).

Result1 <- places(data = auc, col.elements = "Name", col.sets = "University")

Cleaning data ... rows with empty cells and NAs will be removed

Rows removed: 0

Cleaning data ... duplicate rows will be removed

Duplicate rows removed: 72

There are 418 elements and 146 sets

Working ...

A total of 223 places have been identified

It is possible to shorten the code by omitting the argument names, as shown below.

Result1 <- places(auc, "Name", "University")

As indicated in the console, 223 unique places were found from the initial dataset of 418 students (Elements) and 146 universities (Sets).

The places() function returns a list object which contains three data frames:

The original two-column data frame, with an additional “Places” column containing the place labels.
A data frame containing information about places.
The network of places in a two-mode edgelist format.

The data frame containing information about places includes the following features:

PlaceNumber: contains the number of the place, ordered from the highest to the lowest number of sets.
PlaceLabel: the label of the place. Labels start with P, followed by the place number and, within parentheses, the number of elements in the place and the number of sets defining it, e.g. P001(1-4).
NbElements: the number of elements (students) contained in the place.
NbSets: the number of sets (universities) in the place.
PlaceDetail : This column contains important and detailed information about the places, including the names of all the elements within each place and the sets that define each place.

To enable further manipulation, we extract the key information in a data frame format. Additionally, we use kableExtra to enhance the table and make the data more legible. Only the first 6 rows are displayed below:

Result1_df <- as.data.frame(Result1$PlacesData) 

library(kableExtra)

kable(head(Result1_df), caption = "First 6 places") %>%
  kable_styling(bootstrap_options = "striped", full_width = T, position = "left")

First 6 places
PlaceNumber	PlaceLabel	NbElements	NbSets	PlaceDetail
1	P001(1-4)	1	4	{Lacy_Carleton} - {Columbia;Garrett Biblical Institute;Northwestern;Ohio Wesleyan}
2	P002(1-4)	1	4	{Luccock_Emory W.} - {McCormick Seminary;Northwestern;Wabash;Wooster}
3	P003(1-4)	1	4	{Ly_J .Usang} - {Columbia;Haverford;New York University;Pennsylvania}
4	P004(1-4)	1	4	{Pott_Francis L. Hawks} - {Columbia;General Theological Seminary;Trinity;University of Edinburgh}
5	P005(1-3)	1	3	{Chu_Fred M.C.} - {Chicago;Pratt Institute;Y.M.C.A. College}
6	P006(1-3)	1	3	{Chung_Elbert} - {Georgetown;Pennsylvania;Southern California}

kable() is a function from the knitr package used for creating tables in a nicely formatted way. kable_styling() is a function that sets the styling options for the table:

bootstrap_options = "striped" applies striped row styling, which is a common styling choice in tables.
full_width = T indicates that the table should take up the full width of its container.
position = "left" specifies the position of the table on the page, in this case, aligning it to the left.

In the following section, we will conduct a more in-depth examination of the attributes associated with the places.

1.1 Places attributes

We first explore how the students (Elements) are distributed among places using the table() function in base R:

table(Result1_df$NbElements)


  1   2   3   4   5   7  10  11  12  15  16  18 
179  15  12   4   1   3   1   1   1   2   2   2

We can utilize ggplot2 to visualize this distribution:

library(ggplot2)

ggplot(Result1_df, aes(x = NbElements)) +
  geom_histogram(binwidth = 1, fill = "blue", color = "black", alpha = 0.7) +
  labs(title = "Students by place",
       x = "Number of Students",
       y = "Frequency") +
  theme_minimal()

Explanation:

ggplot(Result1_df, aes(x = NbElements)): This initializes the ggplot object, specifying the data frame (Result1_df) and the aesthetic mapping (x = NbElements).
geom_histogram(binwidth = 1, fill = "blue", color = "black", alpha = 0.7): This adds the histogram layer to the plot. binwidth sets the width of the bins, fill and color set the fill and border colors, and alpha controls the transparency.
labs(title = "Students by place", x = "Number of Students", y = "Frequency"): This sets the title and axis labels.
theme_minimal(): This applies a minimal theme to the plot, but you can customize the theme based on your preferences.

The table and histogram above reveal that most places (179, or 80%) consist of unique trajectories centered on a single student. These places are perfectly aligned with individual students. This is an intriguing finding in itself. It suggests something significant about the structure of the network, which can be attributed to historical circumstances. Specifically, the prevalence of these idiosyncratic places reflects the widespread adoption of the elective system in American higher education institutions during the late 19th century.
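The share of single-student places can be recomputed directly from the counts printed by table(). In the sketch below, the counts are copied by hand from that output and the object names are illustrative; with the actual data, the same figures could be derived from Result1_df$NbElements.

```r
# Counts copied from the table() output: number of places by number of students
places_by_size <- c("1" = 179, "2" = 15, "3" = 12, "4" = 4, "5" = 1, "7" = 3,
                    "10" = 1, "11" = 1, "12" = 1, "15" = 2, "16" = 2, "18" = 2)

total_places <- sum(places_by_size)
share_single_places <- 100 * places_by_size[["1"]] / total_places

# Each place of size s contains s students, so the total number of students is
total_students <- sum(as.numeric(names(places_by_size)) * places_by_size)
share_single_students <- 100 * places_by_size[["1"]] / total_students

round(share_single_places, 1)    # 80.3
round(share_single_students, 1)  # 42.8
```

Single-student places thus account for about 80% of the 223 places, but for less than half of the 418 students, since the remaining students are concentrated in a small number of larger places.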

Similarly, we can explore the distribution of the universities (Sets) among places:

table(Result1_df$NbSets)


  1   2   3   4 
 79 119  21   4

library(ggplot2)

ggplot(Result1_df, aes(x = NbSets)) +
  geom_histogram(binwidth = 1, fill = "blue", color = "black", alpha = 0.7) +
  labs(title = "Universities by place",
       x = "Number of Universities",
       y = "Frequency") +
  theme_minimal()

Most places (198 out of 223) are defined by one or two universities, meaning that the majority of students attended at most two different universities. The 119 two-university places suggest that many students were relatively mobile during their studies and transferred to a different institution to complete their training. Only 25 places correspond to students who attended more than two universities: 21 places are defined by three universities and 4 places by four.

1.2 Most significant places

Beyond crude statistics, we want to gain deeper insights into the students and universities that define each place. The “PlaceDetail” column provides this information. Since examining all 223 places individually would be time-consuming, we can start by focusing on the most populated places, which include a minimum of 2 students and 2 colleges. In this approach, we choose to discard idiosyncratic places that involve only one student or unique curricula. Using the filter() function, we find 13 such places:

library(dplyr)
n2 <- Result1_df %>% 
  filter(NbElements > 1 & NbSets > 1)

kable(n2, caption = "The 13 most significant places") %>%
  kable_styling(bootstrap_options = "striped", full_width = T, position = "left")

The 13 most significant places
PlaceNumber	PlaceLabel	NbElements	NbSets	PlaceDetail
26	P026(4-2)	4	2	{Chu_Percy;Lee_Alfred S.;Liang_Louis K.L.;Sun_J.H.} - {Columbia;New York University}
27	P027(3-2)	3	2	{Au_Silwing P.C.;Yee_S.K.;Zee_Andrew} - {Chicago;Michigan}
28	P028(3-2)	3	2	{Chang_Ting-Chin;Hsueh_Wei Fan;Wong_Tse-Kong} - {Ohio State;Pennsylvania}
29	P029(3-2)	3	2	{Ho_Teh-Kuei;Sze_F.C.;Tsai_Thomas Wen-hsi} - {Harvard;Wisconsin}
30	P030(3-2)	3	2	{Huang_H.L.;Wang_K.P.;Welles_Henry H.} - {Columbia;Princeton}
31	P031(2-2)	2	2	{Chen_Kwan-Pu;Wong_I.K.} - {Pennsylvania;St. John’s University}
32	P032(2-2)	2	2	{Jen_Lemuel C.C.;West_Eric Ralph} - {California;George Washington}
33	P033(2-2)	2	2	{Lee_Shee-Mou;Parker_Frederick A.} - {Harvard;Massachusetts Institute of Technology}
34	P034(2-2)	2	2	{Lin_Peter Wei;Ma_Y.C.} - {Columbia;Yale}
35	P035(2-2)	2	2	{Lum_Joe W.;Wu_Jack Foy} - {Columbia;Stanford}
36	P036(2-2)	2	2	{Ngao_Sz-Chow;Speery_Henry M.} - {Columbia;Michigan}
37	P037(2-2)	2	2	{Sze_Ying Tse-yu;Zhen_M.S.} - {Columbia;Massachusetts Institute of Technology}
38	P038(2-2)	2	2	{Tsao_Y.S.;Yen_Fu-ching} - {Harvard;Yale}

In subsequent steps, it is recommended to carefully examine the list of places and their associated details, beginning with the most significant and progressively broadening the selection to encompass less populous places. It is beyond the scope of this tutorial to extensively describe the places found in this dataset, but I have contributed an in-depth research paper that demonstrates how places can be used to identify typical educational paths and track students who followed similar trajectories (Armand 2024). Interested readers are encouraged to consult this paper for further information.

It is important to emphasize that being part of the same place does not necessarily imply that the individuals actually met or physically interacted.

1.3 Typology of places

If the data includes qualitative attributes, these attributes can be used to further characterize the places and build a typology. In our example, for instance, we considered the students’ field of study and the time of graduation to establish the relative strength of places, as shown in the table below:

	SAME TIME	DIFFERENT TIME
SAME DISCIPLINE	TYPE A: Strong potential for regular interaction (4 places, 9%)	TYPE C: Potential for later collaboration (7 places, 16%)
DIFFERENT DISCIPLINE	TYPE B: Potential for extracurricular interaction (8 places, 18%)	TYPE D: Shared academic experience and cultural background (25 places, 57%)

Type A places represent the strongest potential for direct interaction, as they involve students who enrolled in the same programs at the same time, likely attending the same classes. In our population, there are four places of this type, all featuring students in sciences or engineering who graduated during or after World War I (P032, P033, P037, P173).
Type B places are characterized by potential interactions outside the classroom setting. While students in these places may not have enrolled in the same courses, their paths could have crossed on campus through various extracurricular activities. An illustrative example of a Type B place is represented by physician Yan Fuqing (顔福慶) (Yen Fu-ching) and businessman Cao Maoxiang (曹蝥祥) (Y.S. Tsao), who attended Yale and Harvard between 1909 and 1914 (P038).
Type C places refer to students who attended the same universities and graduated in the same disciplines but at different periods of time. Although these students did not have the opportunity to physically interact on campus, their shared educational background created a potential for future collaborations and intergenerational connections. For instance, economist Ma Yinchu (馬寅初) graduated from Yale and Columbia in 1910–1914, a decade before banker Lin Zhang (林障) completed his studies at the same institutions in 1920–1922 (P034).
Type D places represent the weakest form of association, bringing together students from different generations who pursued diverse academic disciplines at the same alma mater. For instance, place P161 exemplifies a Type D place where three graduates from Stanford University, spanning a period between 1905 and 1922, pursued distinct fields of study ranging from engineering to the humanities. These multigenerational and multidisciplinary places encompass a significant number of students, often exceeding ten individuals, who share a common educational institution.
Interested readers are encouraged to refer to our comprehensive research paper (Armand 2024), which offers an in-depth analysis of the different types of places.
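The logic of the two-by-two typology can be sketched as follows. The data frame coded and its columns same_discipline and same_time are hypothetical stand-ins for the manual coding described in this section; the sketch only shows how two binary attributes map onto the four types.

```r
# Hypothetical coding of four places on the two dimensions of the typology
coded <- data.frame(
  PlaceLabel = c("P_a", "P_b", "P_c", "P_d"),
  same_discipline = c(TRUE, FALSE, TRUE, FALSE),
  same_time = c(TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Map each combination of the two attributes onto a type
coded$Type <- with(coded, ifelse(
  same_discipline & same_time, "TypeA",
  ifelse(!same_discipline & same_time, "TypeB",
         ifelse(same_discipline & !same_time, "TypeC", "TypeD"))
))

coded
```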

This part is not directly reproducible because it depends on the intrinsic qualities of the dataset. Nevertheless, it is worth mentioning because the rationale can be adapted to other data and research questions. The places were manually coded based on the attributes of the universities and the students they encompassed. For students, the attributes considered were nationality, period of study, and the nature of their degree. For universities, the coding was based on the region in which they were located (East Coast, West, or Midwest).

While it is beyond the scope of this study to detail the methodology used for classifying places, we provide the typology below as an illustration, used to compute the number of places in each category. We hope this case study inspires other researchers to create their own typology of places, tailored to their research questions and data attributes:

# load data 
library(readr)
placetypo <- read_delim("data/placetypo.csv", 
    delim = "\t", escape_double = FALSE, 
    col_types = cols(...1 = col_skip()), 
    trim_ws = TRUE)

kable(head(placetypo)) %>%
  kable_styling(bootstrap_options = "striped", full_width = T, position = "left")

PlaceNumber	PlaceLabel	NbElements	NbSets	PlaceDetail	Nationality	discipline_variety	discipline_group	degree_variety	degree_highest	Region_variety	Region_code	Mobility	period_nbr	period_group	Type
33	P033(2-2)	2	2	{Lee_Shee-Mou;Parker_Frederick A.} - {Harvard;Massachusetts Institute of Technology}	Multinational	Same Discipline	Sciences	Different Degrees	Master	Same Region	EAST	INTRA	SYNC	1909-1918	TypeA
36	P036(2-2)	2	2	{Ngao_Sz-Chow;Speery_Henry M.} - {Columbia;Michigan}	Multinational	Same Discipline	Humanities	Different Degrees	Master	Different Regions	EM	INTER	SYNC	1919-1935	TypeA
37	P037(2-2)	2	2	{Sze_Ying Tse-yu;Zhen_M.S.} - {Columbia;Massachusetts Institute of Technology}	Chinese	Same Discipline	Sciences	Same Degrees	Master	Same Region	EAST	INTRA	SYNC	1909-1918	TypeA
173	P173(2-1)	2	1	{Harkson_US.;Miller_H.P.} - {Nebraska}	Non-Chinese	Same Discipline	Sciences	Same Degrees	Bachelor	Same Region	MID	NULL	SYNC	1909-1918	TypeA
26	P026(4-2)	4	2	{Chu_Percy;Lee_Alfred S.;Liang_Louis K.L.;Sun_J.H.} - {Columbia;New York University}	Chinese	Different Disciplines	Sci-Pro	Different Degrees	Master	Same Region	EAST	INTRA	SYNC	1919-1935	TypeB
27	P027(3-2)	3	2	{Au_Silwing P.C.;Yee_S.K.;Zee_Andrew} - {Chicago;Michigan}	Chinese	Different Disciplines	Sci-Pro	Different Degrees	Doctorate	Same Region	MID	INTRA	SYNC	1919-1935	TypeB

We count the number of places for each type using the group_by() and count() functions:

placetypo %>% group_by(Type) %>% count(sort = TRUE)

# A tibble: 4 × 2
# Groups:   Type [4]
  Type      n
  <chr> <int>
1 TypeD    25
2 TypeB     8
3 TypeC     7
4 TypeA     4
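The counts can be converted into percentages of the classified places. In the sketch below, the counts are copied by hand from the output above and the object names are illustrative.

```r
# Counts copied from the output above
type_counts <- c(TypeD = 25, TypeB = 8, TypeC = 7, TypeA = 4)

total_typed <- sum(type_counts)                        # 44 places were classified
type_share <- round(100 * type_counts / total_typed)   # percentages, rounded
type_share
```

The 44 classified places correspond to the places that contain more than one student (223 places minus the 179 single-student places).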

In the following section, we will demonstrate how to construct and analyze networks of places in order to investigate the structure and dynamics of two-mode networks, taking Sino-American alumni networks as a specific case.

2 Creating A Network of Places & A Network of Sets (Reduction)

2.1 Creating Networks

As highlighted earlier, the result of place detection includes an edge list of places linked by sets (designated as “Edgelist”). We can take advantage of this list to build a network of places linked by universities (Sets) and its transposed network of universities (Sets) linked by places. This involves performing a dual projection, referred to as “reduction” in Figure 1, but applied to the places and sets rather than the students and universities.

To build a network of places linked by universities, we begin by creating an adjacency matrix⁶ from the edgelist:

bimod<-table(Result1$Edgelist$Places, Result1$Edgelist$Set) 
PlacesMatrix<-bimod %*% t(bimod)
diag(PlacesMatrix)<-0

In essence, the chunk above builds the two-mode incidence matrix of places and universities as a contingency table (bimod), then multiplies this matrix by its transpose to obtain a square place-by-place matrix (PlacesMatrix). Each off-diagonal cell counts the universities shared by two places. Finally, the diagonal, which merely records the number of universities defining each place, is set to 0 to avoid self-loops. Below, we provide a detailed explanation of the code line by line:

table(Result1$Edgelist$Places, Result1$Edgelist$Set): This line uses the table function to create a contingency table (cross-tabulation) of the occurrences of each combination of values in the vectors Result1$Edgelist$Places and Result1$Edgelist$Set. The result, assigned to the variable bimod, is a two-dimensional table.
bimod %*% t(bimod): This line performs matrix multiplication. The %*% operator is used for matrix multiplication, and t(bimod) transposes the matrix bimod. The result is a matrix called PlacesMatrix that represents the cross-product of bimod and its transpose.
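A small worked example may clarify what the matrix product computes. The toy edge list below is hypothetical, with three places and three universities; the column names mirror those of Result1$Edgelist.

```r
# Hypothetical edge list of places and sets
toy_edges <- data.frame(
  Places = c("P1", "P1", "P2", "P3", "P3"),
  Set = c("Columbia", "Yale", "Columbia", "Yale", "Harvard"),
  stringsAsFactors = FALSE
)

# Incidence matrix: rows are places, columns are universities
bimod_toy <- table(toy_edges$Places, toy_edges$Set)

# Place-by-place matrix: each cell counts the universities shared by two places
places_toy <- bimod_toy %*% t(bimod_toy)
sets_per_place <- diag(places_toy)  # the diagonal holds the number of sets per place
diag(places_toy) <- 0               # remove self-loops

# Transposed product: each cell counts the places shared by two universities
sets_toy <- t(bimod_toy) %*% bimod_toy
diag(sets_toy) <- 0

places_toy
sets_toy
```

Here P1 and P2 are linked through Columbia, P1 and P3 through Yale, and P2 and P3 share no university, so the corresponding cell is 0.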

Next, we use the graph_from_adjacency_matrix() function included in the igraph package to transform the matrix into a network of places linked by universities (Net1):

library(igraph)
Net1 <-graph_from_adjacency_matrix(PlacesMatrix, mode="undirected", weighted = TRUE)

We apply the same method for building the transposed network of universities (Net2) :

bimod2<-table(Result1$Edgelist$Set, Result1$Edgelist$Places)
PlacesMat2<-bimod2 %*% t(bimod2)
diag(PlacesMat2)<-0

Net2<-graph_from_adjacency_matrix(PlacesMat2, mode="undirected", weighted = TRUE)

An alternative method of projection is provided below:

# Creating a network from the list of Places links
Net <- graph_from_data_frame(Result1$Edgelist, directed = FALSE)
## Transformation into a 2-mode network
V(Net)$type <- bipartite_mapping(Net)$type
# Projection
projNet <- bipartite_projection(Net, multiplicity = TRUE)
Net1 <- projNet$proj1  # Network of elements/places
Net2 <- projNet$proj2  # Network of sets (universities)
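In the chunk above, bipartite_mapping() infers the two node types from the structure of the graph. As a hedged alternative, the type can be assigned explicitly from the column each node comes from, which makes it unambiguous which projection contains the places. The sketch below uses a hypothetical toy edge list and assumes that place labels and set names never coincide.

```r
library(igraph)

# Hypothetical edge list of places and sets
edges <- data.frame(
  Places = c("P1", "P1", "P2", "P3"),
  Set = c("Columbia", "Yale", "Columbia", "Nebraska"),
  stringsAsFactors = FALSE
)

g <- graph_from_data_frame(edges, directed = FALSE)

# TRUE for sets (universities), FALSE for places
V(g)$type <- V(g)$name %in% edges$Set

proj <- bipartite_projection(g, multiplicity = TRUE)
places_net <- proj$proj1  # projection on the nodes of type FALSE (places)
sets_net <- proj$proj2    # projection on the nodes of type TRUE (universities)

V(places_net)$name
V(sets_net)$name
```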

If you wish to inspect your network visually, Cytoscape or Gephi might be helpful. In this case, you need to convert your igraph objects into edge lists and export them as comma-separated values (csv) files, as shown below.

# Convert igraph objects into edge lists (not run in this session)
  # edgelist1 <- as_edgelist(Net1)
  # edgelist2 <- as_edgelist(Net2)
# Export edge lists and node lists as csv files (not run in this session)
  # write.csv(edgelist1, "edgelist1.csv")
  # write.csv(Result1_df, "nodelist1.csv")
  # write.csv(edgelist2, "edgelist2.csv")
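Note that as_edgelist() returns only the two endpoint columns. If the edge weights are needed in Gephi or Cytoscape, one option is to export the edges with igraph's as_data_frame(), which keeps edge attributes. The sketch below uses a small hypothetical weighted network and writes to a temporary file; the object and file names are illustrative.

```r
library(igraph)

# Hypothetical weighted, undirected network of three places
adj <- matrix(c(0, 2, 0,
                2, 0, 1,
                0, 1, 0),
              nrow = 3,
              dimnames = list(c("P1", "P2", "P3"), c("P1", "P2", "P3")))
g <- graph_from_adjacency_matrix(adj, mode = "undirected", weighted = TRUE)

# Edge list with columns from, to and weight
edges_w <- igraph::as_data_frame(g, what = "edges")

# Export to a temporary csv file
out_file <- file.path(tempdir(), "edgelist_weighted.csv")
write.csv(edges_w, out_file, row.names = FALSE)
edges_w
```

The igraph:: prefix avoids any clash with functions of the same name in other loaded packages.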

2.2 Visualizing Networks

Let’s plot the network of places linked by universities:

plot(Net1, vertex.size = 5, 
     vertex.color = "orange", 
     vertex.label.color = "black", 
     vertex.label.cex = 0.3, 
     main="Network of places linked by universities")

In the network of places linked by sets (universities), isolated nodes refer to places with only one student who attended only one university, such as P200(1-1), which refers to N.E. Lurton, who studied exclusively at Benton University.

plot(Net2, vertex.size = 5, 
     vertex.color = "light blue", 
     vertex.label.color = "black", 
     vertex.label.cex = 0.3, 
     main="Network of universities linked by places")

In the network of sets (universities) linked by places, isolated nodes refer to universities attended by students who did not study at any other university.

The plot() function from the igraph package includes various arguments. The first argument is required, as it specifies the network object to be plotted. The other arguments are optional. In the above example, we specified the following arguments:

vertex.size: size of vertices (or nodes).
vertex.color: color of vertices.
vertex.label.color: color of vertex labels.
vertex.label.cex: size of vertex labels.
main: title for the graph.

We can remove the labels to improve legibility and adjust the node sizes according to the number of students in each place, to add meaningful information:

V(Net1)$size <- Result1_df$NbElements

plot(Net1, vertex.size = V(Net1)$size, 
     vertex.color = "orange", 
     vertex.label = NA,
     layout = layout_components,
     main="Network of places linked by universities")

For the graph of universities linked by places, we can adjust the node sizes according to the number of curricula recorded for each university, which approximates the number of students who attended it:

univ_count <- auc %>% group_by(University) %>% count()
V(Net2)$size <- univ_count$n/2

plot(Net2, vertex.size = V(Net2)$size, 
     vertex.color = "light blue", 
     vertex.label = NA, 
     layout = layout_components,
     main="Network of universities linked by places")
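Assigning attributes by position, as above, works only if the rows of the attribute table are in the same order as the vertices of the network. A more defensive variant, sketched below on hypothetical toy data, aligns the two by name with match(). It assumes that vertex names are identical to the labels stored in the attribute table.

```r
library(igraph)

# Hypothetical network whose vertices are not in alphabetical order
g <- graph_from_data_frame(
  data.frame(from = "P2", to = "P1", stringsAsFactors = FALSE),
  directed = FALSE,
  vertices = data.frame(name = c("P2", "P1", "P3"), stringsAsFactors = FALSE)
)

# Hypothetical attribute table, sorted by label
attrs <- data.frame(
  PlaceLabel = c("P1", "P2", "P3"),
  NbElements = c(10, 1, 4),
  stringsAsFactors = FALSE
)

# Align attributes with vertices by name rather than by position
V(g)$size <- attrs$NbElements[match(V(g)$name, attrs$PlaceLabel)]

data.frame(name = V(g)$name, size = V(g)$size)
```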

As evident from the graphs, the two networks are each made up of a large, densely connected component surrounded by a myriad of isolated nodes and smaller components, which refer to the singular curricula described in the previous section. To substantiate this preliminary visual exploration, it is recommended to turn to network metrics.

2.3 Applying One-Mode Network Metrics to Places and Sets

In network analysis, we usually distinguish between global metrics, which serve to characterize the overall structure of the network, and local metrics, which characterize the vertices and their relative position in the network.

2.3.1 Global metrics

There are many metrics to define the structure of networks. In the following, we focus on the most basic ones, which can be computed with the following functions from the igraph package:

summary(): provides summary statistics on the network (nature of the network, number of vertices and ties, attributes if applicable).
edge_density(): density of the graph.
count_components(): number of components.
components()$csize: size of each component.
table(E()$weight): frequency table of edge weights.

summary(Net1)

IGRAPH 84fe129 UNW- 223 1606 -- 
+ attr: name (v/c), size (v/n), weight (e/n)

summary(Net2)

IGRAPH 3c351af UNW- 146 197 -- 
+ attr: name (v/c), size (v/n), weight (e/n)

In both cases, the summary indicates that the networks are undirected (U), named (N), and weighted (W), which are the options chosen during the transformation. The network of places linked by universities (Net1) contains 223 vertices (places) and 1606 ties (shared universities). The network of universities linked by places (Net2) contains 146 vertices (universities) and 197 ties (shared places). Both networks carry two vertex attributes (name, and the size we set for plotting) and one edge attribute (weight).

edge_density(Net1)

[1] 0.06488102

edge_density(Net2)

[1] 0.01861124

The edge_density() function is useful mostly when comparing networks of similar order (with similar number of vertices). In our example, the network of places linked by universities is denser than the network of universities linked by places.
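As a cross-check, the two density values can be recomputed by hand from the vertex and tie counts reported by summary(). The sketch below assumes the usual definition of density for an undirected graph without loops, that is, the number of ties divided by the number of possible pairs of vertices; density_undirected() is a helper name introduced here for illustration.

```r
# Density of an undirected graph without loops: m / (n * (n - 1) / 2)
density_undirected <- function(n, m) m / (n * (n - 1) / 2)

d1 <- density_undirected(n = 223, m = 1606)  # network of places
d2 <- density_undirected(n = 146, m = 197)   # network of universities

print(c(Net1 = d1, Net2 = d2))
```

Both values agree with the output of edge_density() shown above.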

count_components(Net1)

[1] 39

count_components(Net2)

[1] 39

The two networks comprise 39 components each:

components(Net1)$csize

 [1] 184   1   2   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
[20]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
[39]   1

components(Net2)$csize

 [1] 105   3   2   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
[20]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
[39]   1

The output shows the size of each component. In the network of places, the largest component includes 184 vertices; the remaining components consist of one dyad and 37 isolated vertices. In the network of universities, the largest component includes 105 vertices; the other components include one triad, one dyad, and 36 isolated vertices.

Since the networks are weighted, it is interesting to examine the relative weight of ties by simply using the table() function in base R:

table(E(Net1)$weight)


   1    2 
1594   12

table(E(Net2)$weight)


  1   2   4 
190   6   1

The network of places consists mostly of simple ties (1,594), along with 12 ties with a weight of 2. These 12 ties represent pairs of places that share 2 sets (i.e., two universities in common). Similarly, the network of universities consists primarily of simple ties (190), along with 6 ties of weight 2 (6 pairs of universities sharing 2 places) and one tie of weight 4 (one pair of universities sharing 4 places).
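To see where these weights come from, the following base R sketch rebuilds them for four places only, using the university sets listed for P007, P085, P017 and P072 in section 4 (university labels are shortened here, which is an assumption about how they appear in the data). It is not the code used by the Places package, only an illustration of the principle: multiplying the place-by-university incidence matrix by its transpose counts shared universities, and the reverse product counts shared places.

```r
# University sets of four places, as described in the k-places section
sets <- list(
  "P007(1-3)" = c("California", "Columbia", "Pomona"),
  "P085(1-2)" = c("Columbia", "Pomona"),
  "P017(1-3)" = c("Chicago", "Columbia", "Denison"),
  "P072(1-2)" = c("Chicago", "Columbia")
)

universities <- sort(unique(unlist(sets)))

# Incidence matrix B: rows = places, columns = universities (1 = attended)
B <- t(sapply(sets, function(s) as.integer(universities %in% s)))
colnames(B) <- universities

# Places linked by universities: entry [i, j] counts shared universities
W_places <- B %*% t(B)
diag(W_places) <- 0

# Universities linked by places: entry [i, j] counts shared places
W_univ <- t(B) %*% B
diag(W_univ) <- 0

print(W_places)
print(W_univ)
```

Because only four places are included, the university weights in this sketch will not generally match those of the full network.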

Let’s find out which pairs these are by filtering the ties whose weight is greater than 1:

E(Net1)[weight > 1]

+ 12/1606 edges from 84fe129 (vertex names):
 [1] P003(1-4)--P016(1-3) P003(1-4)--P018(1-3) P003(1-4)--P026(4-2)
 [4] P003(1-4)--P121(1-2) P007(1-3)--P019(1-3) P007(1-3)--P085(1-2)
 [7] P015(1-3)--P031(2-2) P015(1-3)--P052(1-2) P016(1-3)--P018(1-3)
[10] P016(1-3)--P026(4-2) P017(1-3)--P072(1-2) P018(1-3)--P026(4-2)

E(Net2)[weight == 2]

+ 6/197 edges from 3c351af (vertex names):
[1] Columbia           --California           
[2] Columbia           --Pomona               
[3] Columbia           --Chicago              
[4] New York University--Pennsylvania         
[5] Pennsylvania       --Hawaii               
[6] Pennsylvania       --St. John's University

E(Net2)[weight == 4]

+ 1/197 edge from 3c351af (vertex names):
[1] Columbia--New York University

The results indicate that the most frequent circulations occurred between Columbia and New York University, which largely reflects their geographical proximity, as both universities are located in New York City. Other important ties link more distant universities, such as California and Columbia, or Hawaii and the University of Pennsylvania, which suggests that physical proximity was not the only factor accounting for the students’ mobility. Further investigation is required to understand the logic underlying these strong ties, but one advantage of network analysis is to point out connections that would otherwise remain unnoticed.
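For further inspection, the filtered ties can also be collected in a data frame and sorted by weight. The sketch below uses a hypothetical toy edge list rather than Net2; igraph::as_data_frame() is called with its namespace prefix to avoid any clash with a function of the same name in other loaded packages.

```r
library(igraph)

# Hypothetical toy edge list with weights
edges <- data.frame(from   = c("A", "A", "B", "C"),
                    to     = c("B", "C", "C", "D"),
                    weight = c(1, 2, 1, 4))
g <- graph_from_data_frame(edges, directed = FALSE)

# Export the ties, keep those with weight > 1, strongest first
edges_df <- igraph::as_data_frame(g, what = "edges")
strong <- edges_df[edges_df$weight > 1, ]
strong <- strong[order(-strong$weight), ]

print(strong)
```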

2.3.2 Local metrics

There are many metrics to measure the relative position of vertices in networks. In the following, we focus on the most popular centrality metrics, which can all be computed with the igraph package:

Degree: the number of ties a node has. It is the simplest measure of centrality. In the following, we use a normalized version of the measure in order to enable comparisons across networks built from different data structures. Since the projected networks are valued, it is also possible to calculate a weighted degree based on the intensity of the links. In the network of universities linked by places, for example, degree measures the number of other universities to which each university is tied through shared places. It can be interpreted as a measure of university popularity or the extent to which a university is actively connected to other institutions.
Eigenvector: a measure that takes into account not only the number of connections a node has, but also how well connected its neighbors are. It is a measure of the influence of a node in a network. In the network of universities, it quantifies the extent to which a university is connected to other influential universities. Universities with high eigenvector centrality are connected to other highly influential institutions, indicating their own influence and prestige.
Betweenness: the number of times a node acts as a bridge along the shortest path between two other nodes. In this sense, the more central a node is, the greater control it has over the flows that go through it. It is often considered a measure of brokerage, or the capacity of a node to mediate between other nodes. In the network of universities linked by places, it quantifies the university’s brokering power or the frequency with which it falls on the shortest path between other universities. Universities with high betweenness centrality play a crucial role in connecting different parts of the network and facilitating students’ exchanges between institutions.
Closeness: the inverse of the average length of the shortest paths between the node and all other nodes in the graph. In this sense, the more central a node is, the closer it is to all other nodes. It measures how quickly influence can spread from one university to another through direct or indirect connections. Universities with high closeness centrality are considered central in terms of being well-connected and easily reachable within the network. This is particularly relevant in the subnetwork of specialized institutions, such as the Massachusetts Institute of Technology.
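The four definitions above can be checked on a network small enough to verify by hand. The sketch below uses a hypothetical chain of five vertices (A to E), not the alumni data. The middle vertex C lies on the shortest paths of four pairs (A-D, A-E, B-D, B-E), and the sum of its distances to the other vertices is 2 + 1 + 1 + 2 = 6, so that its unnormalized closeness, as returned by closeness() with default settings, is 1/6.

```r
library(igraph)

# Hypothetical toy network: a chain of five vertices A - B - C - D - E
g <- graph_from_literal(A - B - C - D - E)

toy_metrics <- data.frame(
  degree      = degree(g, normalized = TRUE),   # share of possible ties
  eigenvector = eigen_centrality(g)$vector,     # scaled so that the maximum is 1
  betweenness = betweenness(g),                 # number of shortest paths through the node
  closeness   = closeness(g)                    # 1 / sum of distances to all other nodes
)

print(round(toy_metrics, 3))
```

In this chain, C ranks first on every metric, whereas the two end vertices A and E rank last.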

Since betweenness and closeness centralities are meaningful only in connected networks, we first need to extract the main component of each network. We can resort to the components() function to find the id number of the largest component in each network. In this specific case, the largest component is component n°1 in both Net1 and Net2 (it comes first in the csize output shown above). Then, we apply the induced_subgraph() function to extract these components:

# components(Net1) (not run in this session)
# components(Net2) (not run in this session)

Net1MC <- induced_subgraph(Net1,vids=components(Net1)$membership==1)
Net2MC <- induced_subgraph(Net2,vids=components(Net2)$membership==1)
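When the id of the largest component is not known in advance, it can be retrieved programmatically instead of being hard-coded. The following sketch shows one possible way of doing so on a hypothetical toy graph; the object names (comp, largest_id, main_comp) are introduced here for illustration only.

```r
library(igraph)

# Hypothetical toy graph: one triangle, one dyad and one isolated vertex
g <- graph_from_literal(A - B - C - A, D - E, F)

comp <- components(g)
largest_id <- which.max(comp$csize)   # id of the largest component

# Keep only the vertices that belong to the largest component
main_comp <- induced_subgraph(g, which(comp$membership == largest_id))

print(comp$csize)
print(V(main_comp)$name)
```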

The following code serves to compute the centrality metrics in the network of universities and compile them in a coherent data frame. We chose to normalize degree centrality to facilitate comparisons with other metrics and across networks.

Degree2 <- degree(Net2MC, normalized = TRUE) 
Eig2 <- eigen_centrality(Net2MC)$vector 
Betw2 <- betweenness(Net2MC)
univ_metrics <- cbind(Degree2, Eig2, Betw2) 
univ_metrics_df <- as.data.frame(univ_metrics)

head(univ_metrics_df %>% arrange(desc(Degree2)))

                      Degree2      Eig2     Betw2
Columbia            0.3750000 1.0000000 2191.8655
Pennsylvania        0.1923077 0.5089454  695.0678
Chicago             0.1923077 0.4721450 1338.0047
Harvard             0.1730769 0.3653417 1020.2555
California          0.1346154 0.4282910  476.5158
New York University 0.1153846 0.6514840  343.6654

The table presents the first six universities ranked by degree centrality. Columbia University clearly stands out, meaning that it was tied through shared places to the largest number of other universities. Columbia also has the highest eigenvector centrality, which means that it was connected to other well-connected universities. It also shows a high betweenness centrality score, meaning that it serves as an important bridge in the academic network.

We can visualize the relative importance of universities in the network by indexing the size of vertices on their centrality metrics. In the following example, we chose to make the size of vertices proportionate to their degree centrality:

V(Net2MC)$size <- degree(Net2MC)

plot(Net2MC,
     vertex.color="light blue",
     vertex.shape = "circle",
     vertex.size = V(Net2MC)$size/2, 
     vertex.label.color = "black", 
     vertex.label.cex = V(Net2MC)$size/100, 
     main="Network of universities",
     sub = "The size of vertices represents their degree centrality.")

In my article (Armand 2024), I demonstrated how centrality metrics can be utilized to categorize universities and to study the formation of university ranking based on their position in the networks. Interested readers are invited to consult this paper for further information.

Based on this initial investigation, it is evident that the networks of Sino-American alumni exhibit significant heterogeneity, consisting of various subgroups and communities that are connected with differing degrees of density. In the upcoming section, we will delve into the application of community detection to identify subgroups of more densely connected nodes within the two networks.

3 Communities of Places and Sets

The purpose of this section is twofold:

Substantively, to understand how academic communities took shape through the interconnection of students’ trajectories (readers can transpose the concepts of place and community to their own data and research questions).
Methodologically, to illustrate the duality of place-based networks and to demonstrate the value of jointly analyzing the network of places (elements) and its transposed network of universities (sets).

The igraph package offers various methods for detecting communities. In this document, we selected the Louvain algorithm (Blondel et al. 2008), one of the most popular methods for finding communities, especially but not exclusively in large networks.

To keep this document simple, we opted to concentrate on the initial outcome of the Louvain algorithm. Nevertheless, it is advisable to examine and contrast various algorithms and multiple iterations of the same algorithm. These variations can potentially yield distinct results, which might significantly influence the ultimate conclusions.

3.1 Finding Communities

To detect communities with the Louvain algorithm, we apply the cluster_louvain() function included in igraph. We continue to focus on the main component to avoid the detection of artificial clusters made up of isolated nodes:

set.seed(2024)
lvc1 <- cluster_louvain(Net1MC)
lvc2 <- cluster_louvain(Net2MC)

Let’s inspect the results:

print(lvc1)

IGRAPH clustering multi level, groups: 7, mod: 0.52
+ groups:
  $`1`
   [1] "P001(1-4)"  "P004(1-4)"  "P007(1-3)"  "P012(1-3)"  "P014(1-3)" 
   [6] "P016(1-3)"  "P017(1-3)"  "P018(1-3)"  "P019(1-3)"  "P026(4-2)" 
  [11] "P030(3-2)"  "P034(2-2)"  "P035(2-2)"  "P036(2-2)"  "P037(2-2)" 
  [16] "P048(1-2)"  "P056(1-2)"  "P061(1-2)"  "P070(1-2)"  "P072(1-2)" 
  [21] "P085(1-2)"  "P088(1-2)"  "P091(1-2)"  "P098(1-2)"  "P101(1-2)" 
  [26] "P114(1-2)"  "P119(1-2)"  "P128(1-2)"  "P132(1-2)"  "P136(1-2)" 
  [31] "P138(1-2)"  "P149(15-1)" "P169(2-1)"  "P180(1-1)"  "P221(1-1)" 
  
  $`2`
  + ... omitted several groups/vertices

print(lvc2)

IGRAPH clustering multi level, groups: 8, mod: 0.49
+ groups:
  $`1`
   [1] "Columbia"                     "Haverford"                   
   [3] "New York University"          "General Theological Seminary"
   [5] "Trinity"                      "University of Edinburgh"     
   [7] "Pomona"                       "Colorado"                    
   [9] "Denver"                       "Stanford"                    
  [11] "National (Manila)"            "Philippine"                  
  [13] "Bucknell"                     "Crozen Theological Seminary" 
  [15] "Wake Forest"                  "North Central College"       
  [17] "Rochester"                    "Butler"                      
  + ... omitted several groups/vertices

The algorithm found 7 communities of places and 8 communities of universities. The modularity scores (mod.)⁷ are quite satisfactory (0.52 and 0.49, respectively). As shown in the tables below, the size of communities ranges from 10 to 42 vertices in the network of places, and from 6 to 21 vertices in the network of universities.

table(sizes(lvc1))


10 20 25 26 35 42 
 1  1  1  2  1  1

table(sizes(lvc2))


 6 10 11 13 14 16 21 
 1  1  1  1  2  1  1
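As noted at the beginning of this section, it is advisable to compare several runs and several algorithms. The sketch below suggests one way to do this on a hypothetical toy graph (two cliques joined by a single tie), not on the alumni networks: Louvain is repeated under different seeds, and its first result is then compared with the walktrap algorithm using normalized mutual information (a value of 1 indicates identical partitions). The choice of walktrap and of five seeds is arbitrary.

```r
library(igraph)

# Hypothetical toy graph: two 5-cliques joined by a single bridging tie
g <- disjoint_union(make_full_graph(5), make_full_graph(5))
g <- add_edges(g, c(1, 6))

# Repeat Louvain under several seeds and record the outcome of each run
runs <- lapply(1:5, function(s) {
  set.seed(s)
  cluster_louvain(g)
})
run_summary <- data.frame(seed       = 1:5,
                          groups     = sapply(runs, length),
                          modularity = round(sapply(runs, modularity), 3))
print(run_summary)

# Contrast the first run with another algorithm (walktrap)
wt <- cluster_walktrap(g)
nmi <- compare(runs[[1]], wt, method = "nmi")
print(nmi)
```

On real data, runs that disagree on the number of groups or yield a low similarity score would signal that the partition should be interpreted with caution.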

In the next section, we will show how to visualize the communities.

3.2 Visualizing Communities

3.2.1 Network of Places

First, we create a group for each community and we set a different color for each group:

V(Net1MC)$group <- lvc1$membership
V(Net1MC)$color <- lvc1$membership

Next, we plot the communities using the plot() function from igraph:

plot(lvc1, Net1MC, vertex.label=NA,
     vertex.label.color = "black", 
     vertex.label.cex = 0.5, 
     vertex.size=1.8,
     main="Communities of places", 
     sub = "Louvain method")

On the clustered graph, black ties connect vertices within each group, whereas red ties link vertices across different communities.

3.2.2 Network of Sets (Universities)

Similarly, we create a group for each community of universities and we set a different color for each group:

V(Net2MC)$group <- lvc2$membership  # create a group for each community
V(Net2MC)$color <- lvc2$membership # node color reflects group membership

Next, we plot the communities using the plot() function from igraph:

plot(lvc2, Net2MC, vertex.label=NA,
     vertex.label.color = "black", 
     vertex.label.cex = 0.5, 
     vertex.size=3,
     main="Communities of universities", 
     sub = "Louvain method")

We need to acknowledge that these visualizations contain an overwhelming amount of information, which imposes significant limitations on their practical utility. To facilitate a meaningful interpretation of the results, it is preferable to extract and scrutinize each community separately.

3.3 Extracting Communities

The following code serves to retrieve the membership information contained in the results of community detection (lvc1$membership) in a coherent data frame. Additionally, we compute the size of communities and we join this data with the detailed description of the places.

place_clusters <- data.frame(lvc1$membership,
                          lvc1$names) %>% 
  group_by(lvc1.membership) %>% 
  add_tally() %>% # add size of clusters
  rename(PlaceLabel = lvc1.names, cluster_no = lvc1.membership, cluster_size = n) %>%
  select(cluster_no, cluster_size, PlaceLabel)

place_clusters <- inner_join(place_clusters, Result1_df, by = "PlaceLabel") 


kable(head(place_clusters), caption = "Communities of places (6 first places)") %>%
  kable_styling(bootstrap_options = "striped", full_width = T, position = "left")

Communities of places (6 first places)
cluster_no	cluster_size	PlaceLabel	PlaceNumber	NbElements	NbSets	PlaceDetail
1	35	P001(1-4)	1	1	4	{Lacy_Carleton} - {Columbia;Garrett Biblical Institute;Northwestern;Ohio Wesleyan}
2	25	P002(1-4)	2	1	4	{Luccock_Emory W.} - {McCormick Seminary;Northwestern;Wabash;Wooster}
3	26	P003(1-4)	3	1	4	{Ly_J .Usang} - {Columbia;Haverford;New York University;Pennsylvania}
1	35	P004(1-4)	4	1	4	{Pott_Francis L. Hawks} - {Columbia;General Theological Seminary;Trinity;University of Edinburgh}
4	20	P005(1-3)	5	1	3	{Chu_Fred M.C.} - {Chicago;Pratt Institute;Y.M.C.A. College}
3	26	P006(1-3)	6	1	3	{Chung_Elbert} - {Georgetown;Pennsylvania;Southern California}

We follow the same method for extracting community membership in the network of universities:

univ_clusters <- data.frame(lvc2$membership,
                        lvc2$names)  %>% 
  group_by(lvc2.membership) %>%  
  add_tally() %>% # add size of clusters
  rename(University = lvc2.names, cluster_no = lvc2.membership, 
         cluster_size = n) %>%
  select(cluster_no, cluster_size, University)


kable(head(univ_clusters), caption = "Communities of universities (6 first universities)") %>%
  kable_styling(bootstrap_options = "striped", full_width = T, position = "left")

Communities of universities (6 first universities)
cluster_no	cluster_size	University
1	21	Columbia
2	14	Garrett Biblical Institute
2	14	Northwestern
2	14	Ohio Wesleyan
2	14	McCormick Seminary
2	14	Wabash

In the following steps, we extract the communities of places as individual graphs:

gp1 <- induced_subgraph(Net1MC, V(Net1MC)$group==1)  
gp2 <- induced_subgraph(Net1MC, V(Net1MC)$group==2) 
gp3 <- induced_subgraph(Net1MC, V(Net1MC)$group==3) 
gp4 <- induced_subgraph(Net1MC, V(Net1MC)$group==4) 
gp5 <- induced_subgraph(Net1MC, V(Net1MC)$group==5) 
gp6 <- induced_subgraph(Net1MC, V(Net1MC)$group==6)
gp7 <- induced_subgraph(Net1MC, V(Net1MC)$group==7)

Similarly, we extract the communities of universities:

gu1 <- induced_subgraph(Net2MC, V(Net2MC)$group==1)  
gu2 <- induced_subgraph(Net2MC, V(Net2MC)$group==2) 
gu3 <- induced_subgraph(Net2MC, V(Net2MC)$group==3) 
gu4 <- induced_subgraph(Net2MC, V(Net2MC)$group==4) 
gu5 <- induced_subgraph(Net2MC, V(Net2MC)$group==5) 
gu6 <- induced_subgraph(Net2MC, V(Net2MC)$group==6)
gu7 <- induced_subgraph(Net2MC, V(Net2MC)$group==7)
gu8 <- induced_subgraph(Net2MC, V(Net2MC)$group==8)
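When the number of communities is large or not known in advance, the same extraction can be written as a loop that stores the subgraphs in a named list. This is only a sketch on a hypothetical toy graph with a made-up group attribute; the names subgraphs and groups are introduced here for illustration.

```r
library(igraph)

# Hypothetical toy graph: two triangles joined by one tie
g <- graph_from_literal(A - B - C - A, C - D, D - E - F - D)
V(g)$group <- c(1, 1, 1, 2, 2, 2)   # made-up community membership

# One induced subgraph per community, stored in a named list
groups <- sort(unique(V(g)$group))
subgraphs <- lapply(groups, function(k) {
  induced_subgraph(g, which(V(g)$group == k))
})
names(subgraphs) <- paste0("g", groups)

print(sapply(subgraphs, vcount))
```

Each community can then be accessed by name (e.g., subgraphs$g1) or processed in turn with lapply() or sapply().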

To illustrate the duality of place-based networks, we will plot the corresponding communities of places and universities to visually compare their structure.

3.4 Visual Comparisons

Using this method, three main categories of communities can be identified based on their topological structure:

a. Star-like communities are characterized by a prominent and highly influential university that attracts students from a wide range of smaller institutions. Columbia University serves as a prime example, having developed into a comprehensive institution with diverse academic offerings and exceptional resources for postgraduate research. Its reputation and extensive curriculum make it an attractive destination for students seeking a broad educational experience.

The two plots below compare the Columbia-centered community and the corresponding community of places. The hairball structure of the community of places is transposed into the star-like structure of the community of universities:

plot(gp1, vertex.label=V(gp1)$name, # labels taken from the subgraph itself
     vertex.label.color = "black", 
     vertex.label.cex = 0.5, 
     vertex.size= 5,
     main="Columbia community (places)")

plot(gu1, vertex.label=V(gu1)$name,
     vertex.label.color = "black", 
     vertex.label.cex = degree(gu1)*0.15, 
     vertex.size= degree(gu1)*1.5,
     main="Columbia community (universities)")

b. Chain-type communities can be observed in two scenarios. First, in specialized curricula like engineering and technical studies, institutions such as MIT and Purdue University form chains where student mobility is limited due to the specialized nature of their programs. Second, in cases when American students are dispersed across numerous peripheral institutions with little curricular coherence, chain structures also emerge, as illustrated by the Princeton community below:

plot(gp2, vertex.label=V(gp2)$name, # labels taken from the subgraph itself
     vertex.label.color = "black", 
     vertex.label.cex = 0.5, 
     vertex.size= 5,
     main="Princeton community (places)")

plot(gu2, vertex.label=V(gu2)$name,
     vertex.label.color = "black", 
     vertex.label.cex = degree(gu2)*0.15, 
     vertex.size= degree(gu2)*1.5, # node size proportionate to node degree (in cluster)
     main="Princeton community (universities)")

c. Hybrid structures consist of two equally central institutions that have established their own networks of feeder colleges. These institutions not only exchange students but also complement or compete in terms of program offerings. For example, Cornell and the University of California attracted students who navigated between the two due to their strong programs in science, engineering, and agriculture:

plot(gp5, vertex.label=V(gp5)$name, # labels taken from the subgraph itself
     vertex.label.color = "black", 
     vertex.label.cex = 0.5, 
     vertex.size= 5,
     main="Cornell community (places)")

plot(gu6, vertex.label=V(gu6)$name,
     vertex.label.color = "black", 
     vertex.label.cex = degree(gu6)*0.3, 
     vertex.size= degree(gu6)*2, # node size proportionate to node degree (in cluster)
     main="Cornell community (universities)")

For an in-depth analysis of university and place-based communities, please refer to my aforementioned paper (Armand 2024). In this paper, I have demonstrated how the joint analysis of communities of places and universities can be used to examine patterns of student mobility within select groups of universities.

These visual observations can be corroborated by retrieving the global metrics of each community. To maintain brevity in this document, we will not undertake this process here. Interested readers are encouraged to consult the comprehensive documentation produced by the author for further details.
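As a pointer for readers who wish to try this themselves, the sketch below computes a few global metrics for two hypothetical stand-ins, a star and a chain of six vertices, rather than for the actual communities. Degree centralization (centr_degree(), here with loops = FALSE) is one possible indicator among others for telling star-like from chain-type structures; it is an assumption of this sketch, not the method used in the paper.

```r
library(igraph)

# Hypothetical stand-ins for two extracted communities
comm_graphs <- list(
  star  = make_star(6, mode = "undirected"),
  chain = make_ring(6, circular = FALSE)
)

# Same set of global metrics for each community
community_metrics <- data.frame(
  vertices       = sapply(comm_graphs, vcount),
  ties           = sapply(comm_graphs, ecount),
  density        = sapply(comm_graphs, edge_density),
  centralization = sapply(comm_graphs, function(g) {
    centr_degree(g, loops = FALSE)$centralization
  })
)

print(round(community_metrics, 3))
```

The two toy graphs have the same number of ties and the same density, but the star is far more centralized than the chain, which is the contrast that the plots above convey visually.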

4 From Structural Equivalence to Regular Equivalence: The k-Places Function

The final section introduces the notion of regular equivalence as a more flexible approach to places or structural equivalence.

The Places package includes a kplaces() function which is specifically designed to identify regular equivalence patterns within two-mode networks. The kplaces() function is very similar to the place() function. It includes four main arguments:

data: the input data frame (auc).
col.elements: the name of the column of elements (e.g., students).
col.sets: the name of the column of sets (e.g., universities).
k: a natural number that indicates the tolerance threshold.

In the following example, we set k = 1, meaning that we tolerate only one difference among the universities attended by students:

Result2 <- kplaces(data = auc, col.elements = "Name", col.sets = "University", k = 1)
Result2 <- kplaces(auc, "Name", "University", 1) # shorter version, same results

From the initial edge list of 418 students and 146 universities, 219 places and 2 k-places (or “ambiguous cases”) were found.

The kplaces() function returns a list with four data frames:

(1) The original two-column data frame and the column “Places” with places labels.
(2) A data frame containing information about places and k-places.
(3) A data frame with the relation of places merged to k-places and the sets in common.
(4) The network of places in a two-mode edgelist format.

The data frame (2) containing information about places and k-places includes the following features:

PlaceLabel contains places and k-places labels. Place labels start with P, followed by the place number, the number of elements in the place and the number of sets defining the place. K-place labels start with P, followed by the k-place number, an *, the number of elements in the k-place, the number of sets, the number of sets in common, and the value of k.
NbElements contains the number of elements in the place or k-place.
NbSets contains the number of sets defining the place or k-place.
PlaceDetail contains the name of all the elements in the place or k-place and all the sets defining the place or k-place.

Let’s extract the information about places and k-places:

Result2_df <- as.data.frame(Result2$KPlacesData) 

kable(head(Result2_df), caption = "First 6 places/kplaces") %>%
  kable_styling(bootstrap_options = "striped", full_width = T, position = "left")

First 6 places/kplaces
PlaceLabel	NbElements	NbSets	PlaceDetail
P001(1-4)	1	4	{Lacy_Carleton} - {Columbia;Garrett Biblical Institute;Northwestern;Ohio Wesleyan}
P002(1-4)	1	4	{Luccock_Emory W.} - {McCormick Seminary;Northwestern;Wabash;Wooster}
P003(1-4)	1	4	{Ly_J .Usang} - {Columbia;Haverford;New York University;Pennsylvania}
P004(1-4)	1	4	{Pott_Francis L. Hawks} - {Columbia;General Theological Seminary;Trinity;University of Edinburgh}
P005(1-3)	1	3	{Chu_Fred M.C.} - {Chicago;Pratt Institute;Y.M.C.A. College}
P006(1-3)	1	3	{Chung_Elbert} - {Georgetown;Pennsylvania;Southern California}

Next, we focus on k-places and identify the sets they have in common:

Result2k_df <- as.data.frame(Result2$kPlaces) 

kable(Result2k_df, caption = "Kplaces, corresponding places and common sets") %>%
  kable_styling(bootstrap_options = "striped", full_width = T, position = "left")

Kplaces, corresponding places and common sets
	k_places	Places	Common_Sets
380	P007*(2-3-2-1)	P007(1-3)	Columbia,Pomona
384	P007*(2-3-2-1)	P085(1-2)	Columbia,Pomona
45	P017*(2-3-2-1)	P017(1-3)	Chicago,Columbia
87	P017*(2-3-2-1)	P072(1-2)	Chicago,Columbia

The 2 k-places identified each contain 2 elements (students) and 3 sets (universities). In each of them, the students have 2 sets in common and one difference:

P007*(2-3-2-1) includes F. Sec Fong and Edward Y.K. Kwong, who both attended Columbia University and Pomona College. They differ in that F. Sec Fong also attended the University of California, whereas Edward Y.K. Kwong did not. (The differences can be viewed by consulting the “PlaceDetail” column in the previous table.)
P017*(2-3-2-1) includes H.C.E. Liu and Jui-Ching Hsia, who both attended the University of Chicago and Columbia University. Additionally, H.C.E. Liu studied at Denison University, whereas Jui-Ching Hsia did not.
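The idea of tolerating one difference can be illustrated in base R with the four students just discussed. The sketch below is not the algorithm implemented in kplaces(), whose exact criterion may differ; it simply counts, for each pair of students, the universities attended by one but not by the other. Student and university labels are shortened, and n_differences() is a helper introduced here for illustration.

```r
# University sets of the four students discussed above (labels shortened)
attended <- list(
  "Fong"  = c("California", "Columbia", "Pomona"),
  "Kwong" = c("Columbia", "Pomona"),
  "Liu"   = c("Chicago", "Columbia", "Denison"),
  "Hsia"  = c("Chicago", "Columbia")
)

# Number of universities attended by one student but not by the other
n_differences <- function(a, b) length(union(a, b)) - length(intersect(a, b))

# Compare every pair of students
pairs <- t(combn(names(attended), 2))
diffs <- apply(pairs, 1, function(p) {
  n_differences(attended[[p[1]]], attended[[p[2]]])
})

print(data.frame(student_1 = pairs[, 1],
                 student_2 = pairs[, 2],
                 differences = diffs))
```

Only the pairs Fong-Kwong and Liu-Hsia differ by a single university, which is consistent with the two k-places reported for k = 1.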

5 Conclusion

This tutorial has laid the foundations for a standard workflow based on the “Places” R package, which can be reused and adapted for other research across diverse disciplinary fields. This method can be applied to virtually any type of nodes, not only human and social actors, but also objects, concepts, and other entities. Furthermore, it can be extended to multimodal networks involving more than two different types of nodes. For example, the places could encompass not only the universities attended by the students but also their place of birth or the institutions in which they were employed. Interested readers can refer to my article (Armand 2024), which provides a detailed demonstration of the multiple possible uses of place-based networks to advance historical findings. Specifically, this article explores how the network was used to identify typical trajectories among American and Chinese alumni, examine how these educational trajectories shaped future careers and collaborations, and trace the formation of places and the emergence of university rankings over time. Ultimately, we hope this paper will inspire innovative research based on this framework.

The curious reader can refer to the extensive documentation produced by the author on place detection, network projection, community detection, and place formation over time.

Acknowledgments: I am grateful to Delio de Lucena, the creator of the “Places” package, for his assistance and valuable suggestions for improving this documentation. I extend my thanks to Prof. Jiang Jie (Shanghai University) for providing me with a digital copy of the original source book from which the example dataset was created.

Bibliography

ARMAND, Cécile, 2024. Bonding minds, bridging nations: Sino-American alumni networks in the Era of Exclusion (1882-1936). In: HENRIOT, Christian and WU, Jen-shu (eds.), Modern China in Flux: Networks, Mobility, and Transformation. Berlin: De Gruyter. pp. 163‑239.

BLONDEL, Vincent D., GUILLAUME, Jean-Loup, LAMBIOTTE, Renaud and LEFEBVRE, Etienne, 2008. Fast unfolding of communities in large networks. In: Journal of Statistical Mechanics: Theory and Experiment [online]. October 2008. Vol. 2008, no. 10, p. P10008. [Accessed 28 October 2023]. DOI 10.1088/1742-5468/2008/10/P10008. Available at: https://dx.doi.org/10.1088/1742-5468/2008/10/P10008.

BORGATTI, Stephen P., 2009. Two-Mode Concepts in Social Network Analysis. In: Encyclopedia of complexity and system science. 2009. Vol. 6, pp. 8279‑8291.

BORGATTI, Stephen P. and EVERETT, Martin G., 1997. Network analysis of 2-mode data. In: Social Networks. 1997. Vol. 19, no. 3, pp. 243‑269.

BORGATTI, Stephen P. and HALGIN, Daniel S., 2011. Analyzing Affiliation Networks. In: The Sage Handbook of Social Network Analysis [online]. S.l.: SAGE Publications Ltd. [Accessed 21 August 2023]. ISBN 978-1-4462-9441-3. Available at: https://doi.org/10.4135/9781446294413.n28.

EVERETT, M. G. and BORGATTI, S. P., 2013. The dual-projection approach for two-mode networks. In: Social Networks [online]. 2013. Vol. 35, no. 2, pp. 204‑210. [Accessed 21 August 2023]. DOI 10.1016/j.socnet.2012.05.004. Available at: https://www.sciencedirect.com/science/article/pii/S0378873312000354.

FIELD, Sam, FRANK, Kenneth A., SCHILLER, Kathryn, RIEGLE-CRUMB, Catherine and MULLER, Chandra, 2006. Identifying positions from affiliation networks: Preserving the duality of people and events. In: Social Networks [online]. 2006. Vol. 28, no. 2, pp. 97‑123. [Accessed 21 August 2023]. DOI 10.1016/j.socnet.2005.04.005. Available at: https://www.sciencedirect.com/science/article/pii/S0378873305000341.

NEWMAN, M. E. J., STROGATZ, S. H. and WATTS, D. J., 2001. Random graphs with arbitrary degree distributions and their applications. In: Physical Review E [online]. July 2001. Vol. 64, no. 2, p. 026118. [Accessed 21 August 2023]. DOI 10.1103/PhysRevE.64.026118. Available at: http://arxiv.org/abs/cond-mat/0007235.

PIZARRO, Narciso, 2002. Appartenances, places et réseaux de places. La reproduction des processus sociaux et la génération d’un espace homogène pour la définition des structures sociales. In: Sociologie et sociétés. 2002. Vol. 31, no. 1, pp. 143‑161.

PIZARRO, Narciso, 2007. Structural Identity and Equivalence of Individuals in Social Networks. In: International Sociology. 2007. Vol. 22, no. 6, pp. 767‑792.

AMERICAN UNIVERSITY CLUB OF SHANGHAI, 1936. American University Men in China. Shanghai: Comacrib Press.

UZZI, B. and SPIRO, J., 2005. Collaboration and Creativity: The Small World Problem. In: American Journal of Sociology. 2005. Vol. 111, p. 447.

ZHOU, Tao, REN, Jie, MEDO, Matús and ZHANG, Yi-Cheng, 2007. Bipartite network projection and personal recommendation. In: Physical review. E, Statistical, nonlinear, and soft matter physics. October 2007. Vol. 76, p. 046115.

Glossary

Annexes

Info session

setting	value
version	R version 4.2.2 (2022-10-31)
os	Rocky Linux 8.10 (Green Obsidian)
system	x86_64, linux-gnu
ui	X11
language	(EN)
collate	fr_FR.UTF-8
ctype	fr_FR.UTF-8
tz	Europe/Paris
date	2024-10-03
pandoc	3.1.11 @ /usr/lib/rstudio-server/bin/quarto/bin/tools/x86_64/ (via rmarkdown)

package	ondiskversion	source
dplyr	1.1.4	CRAN (R 4.2.2)
ggplot2	3.4.0	CRAN (R 4.2.2)
igraph	1.3.5	CRAN (R 4.2.2)
kableExtra	1.3.4	CRAN (R 4.2.2)
Places	0.2.3	local
readr	2.1.5	CRAN (R 4.2.2)

Citation

Armand C (2024). “Uncovering Places in Two-Mode Networks.”, doi:10.48645/xxxxxx https://doi.org/10.48645/xxxxxx,, https://rzine.fr/publication_rzine/xxxxxxx/.

BibTeX:

@Misc{,
  title = {Uncovering Places in Two-Mode Networks},
  subtitle = {Using Structural Equivalence to Study Affiliation Networks},
  author = {Cécile Armand},
  doi = {10.48645/xxxxxx},
  url = {https://rzine.fr/publication_rzine/xxxxxxx/},
  keywords = {FOS: Other social sciences},
  language = {fr},
  publisher = {FR2007 CIST},
  year = {2024},
  copyright = {Creative Commons Attribution Share Alike 4.0 International},
}

Two-mode network: A specific kind of network that involves two different types of nodes, such as persons and organizations. Such networks are also referred to as affiliation networks or two-mode graphs.↩︎
Place: In a two-mode network, a place refers to an assemblage of type-1 nodes that are associated with the exact same set of type-2 nodes. For example, in an affiliation network linking students with the universities they attended, two or more students form a place if they attended the exact same set of one or more universities.↩︎
Structural equivalence: Two actors in a network are structurally equivalent if they have exactly the same ties to exactly the same other individual actors.↩︎
Regular equivalence: Two actors are regularly equivalent if they are equally related to equivalent others. That is, regular equivalence sets are composed of actors who have similar relations to members of other regular equivalence sets. It corresponds quite closely to the sociological concept of a role.↩︎
Edge list: An edge list is a data structure used in network analysis to represent a graph as a list of its edges.↩︎
Adjacency matrix: An adjacency matrix is a square matrix used to represent a finite graph. Its elements indicate whether pairs of vertices are adjacent or not in the graph.↩︎
Modularity score: In network analysis, the modularity score measures the quality of a partition of a network into communities, on a scale ranging from −0.5 to 1. It compares the density of ties within each community to the density that would be expected in a suitably defined random network (or other baseline). The higher the score, the more densely connected the nodes within communities are relative to that baseline.↩︎
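
To make the definitions of edge list, matrix representation and place more concrete, here is a minimal sketch in base R. It does not use the ‘Places’ package and is not the method applied in this document: the toy data, the university labels (`Univ1`, `Univ2`) and the object names (`edges`, `incidence`, `signature`, `places`) are all hypothetical and chosen for illustration only. Since the network is two-mode, the matrix built here is an incidence matrix (students in rows, universities in columns), the two-mode counterpart of an adjacency matrix.

```r
# Toy edge list (hypothetical data): one row per student-university tie
edges <- data.frame(
  student    = c("A", "A", "B", "B", "C", "D"),
  university = c("Univ1", "Univ2", "Univ1", "Univ2", "Univ2", "Univ2"),
  stringsAsFactors = FALSE
)

# Incidence matrix: 1 if the student attended the university, 0 otherwise
incidence <- unclass(table(edges$student, edges$university))

# Signature of each student: the sorted set of universities attended
signature <- vapply(
  split(edges$university, edges$student),
  function(u) paste(sort(unique(u)), collapse = " + "),
  character(1)
)

# Students sharing the exact same signature are structurally equivalent
# and therefore belong to the same place
places <- split(names(signature), signature)
print(places)
```

In this toy example, students A and B attended the same two universities and fall into one place, while C and D, who attended only `Univ2`, fall into another.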