Networks

The RELISON library allows the creation, reading, writing and manipulation of networks. Mathematically, a network is modelled as a graph, \(G = \langle \mathcal{U}, E \rangle\), where \(\mathcal{U}\) is the set of users in the network, and \(E \in \mathcal{U}^2\) is the set of links in the network. For each user \(u \in \mathcal{U}\), we define his neighborhood, \(\Gamma(u)\) as the set of people in the network sharing a link with him.

In order to use the basic graph structures, the following package must be included in the pom.xml file in your Maven project.

<dependency>
  <groupId>es.uam.eps.ir</groupId>
  <artifactId>RELISON-core</artifactId>
  <version>1.0.0</version>
</dependency>

Properties of a graph

The RELISON library allows different types of network, depending on its properties. We have to select three options:

Number of edges between users

The first option to consider in a network is the number of edges we allow between the same pair of users. If we consider many of them, we are talking about multigraphs, whereas, if we just consider one, we are using a simple graph.

Direction of the edges

The second consideration defines whether all the edges are reciprocal or not. If they are, there is no difference between the (u,v) and (v,u), and we are talking about an undirected network. This is the case of the Facebook friendship network, where a link has to be accepted by both users before it is created. In case the direction of the edges matters, we are talking about a directed network, as the Twitter follows network.

Direction of the edges

The third and final consideration is referred to the properties of the edges. Sometimes, it is possible to define a function \(w\), which assigns a weight to each of the edges in the network. This weight might define very different properties: the frequency of interaction between the users, the cost of travelling from one node to another, etc. If the network considers these weights, we consider it a weighted network, while an unweighed network just takes binary weights (weight equal to 1 if the edge exists, 0 otherwise).

In the table below, we include the basic interfaces for defining any of those graphs, depending on the properties of the network.

Basic classes for the definition of network graphs
Multigraph	Directed	Weighted	Class
			UndirectedUnweightedGraph
		✓	UndirectedWeightedGraph
	✓		DirectedUnweightedGraph
	✓	✓	DirectedWeightedGraph
✓			UndirectedUnweightedMultiGraph
✓		✓	UndirectedWeightedMultiGraph
✓	✓		DirectedUnweightedMultiGraph
✓	✓	✓	DirectedWeightedMultiGraph

Creation of empty graphs

When integrated in another Java library, the RELISON library provides two options for generating empty graphs.

The first option considers the straightforward use of the class constructors. RELISON provides an efficient implementation for each of the network types described above. The obtain this implementation, it is just necessary to add the prefix Fast to the interface names described earlier. No additional argument needs to be provided to the constructors. For instance, if we want to create a directed unweighted and simple graph, we would just create it as:

Graph<U> graph = new FastDirectedUnweightedGraph();

The second option uses graph generators: the EmptyGraphGenerator class builds empty simple graphs, whereas the EmptyMultiGraphGenerator class does the same for multigraphs. The constructor has to be configured with two parameters, indicating whether the graph is directed and/or weighted. Following the previous example, we would do the following:

Graph<U> graph;
boolean directed = true;
boolean weighted = false;

GraphGenerator<U> ggen = new EmptyGraphGenerator();
ggen.configure(directed, weighted);
graph = ggen.generate();

Reading / writing graphs

Most times, the networks which shall be used in the library are stored in an external file. RELISON provides classes for reading this type of networks. When we want to read the network, we use a GraphReader. This works as follows:

String file = "file.txt"; // the route of the file
boolean readWeights = true; // if we want to read the weights.
boolean readTypes = false; // if we want to read the types
GraphReader<U> greader; // we assume here that it has been configured
Graph<U> graph = greader.read(file, readWeights, readTypes);

To write a graph into a file, we use a GraphWriter. Once it is created, this works as follows:

String file = "file.txt"; // the route of the file
Graph<U> graph; // the graph to write
GraphWriter<U> gwriter; // the graph we want to write
boolean writeWeights = true; // if we want to write the edge weights.
boolean writeTypes = false; // if we want to write the edge types
gwriter.write(graph, file, writeWeights, writeTypes);

Basic format

The basic format prints, on each line, an edge of the network. The format of a line is the following (separated by a delimiter):

node1 node2 (weight edgeType)

where node1 and node2 are the identifiers of the nodes, weight is the weight value (a double value) and type contains an integer value classifying the edge. The last values are optional. The weight should only be provided when we want to read the weights of the network, and the types must be provided when the user wants to read them.

For instance, let’s suppose that we have a Twitter network with 10,000 users and 100,000 edges. The first two users have, as nicknames, “JavierSanzCruza” and “pcastells”, respectively, and the first follows the second. Then, if we use “,” as delimiters the file would look as:

JavierSanzCruza,pcastells,1.0
<...>

If we want to read graphs from this format, we use the TextGraphReader and TextMultiGraphReader classes, depending on whether we want to read a simple network or a multigraph. These classes receive the following arguments:

directed: true if the network is directed, false otherwise.
weighted: true if the network is weighted, false otherwise.
selfloops: true if we want to read edges from a node to itself, false otherwise.
delimiter: the delimiter separating the different fields in the file. In the main programs of RELISON, it is a tab.
uParser: a parser for reading the type of the nodes from text.

When we want to write graphs into this format, we use the TextGraphWriter class, which just receives the delimiter in the constructor.

We wanted to note here that this is the format available in the programs provided by the RELISON library, with the fields separated by tabs (the t character).

Pajek format

This format allows reading and writing networks in the Pajek format (more information in the following link ). These graphs have the following format (space separated):

*Vertices numVertices
vertexId1 "vertexLabel1"
vertexId2 "vertexLabel2"
<...>
vertexIdN "vertexLabelN"
*Edges numEdges
vertexId1.1. vertexId1.2. weight1
<...>

Here, for each node, we differentiate two value: the vertexId is a numerical value identifying the user, and the vertexLabel is the actual identifier of the user. For instance, let’s suppose that we have a Twitter network with 10,000 users and 100,000 edges. The first two users have, as nicknames, “JavierSanzCruza” and “pcastells”, respectively, and the first follows the second. Then, the Pajek file would be the following: .. code:

*Vertices 10000
1 "JavierSanzCruza"
2 "pcastells"
<...>
*Edges 100000
1 2 1.0
<...>

If we want to read graphs from this format, we use the PajekGraphReader class. This class receives the following arguments:

multigraph: true if the network is modelled after a multigraph, false otherwise.
directed: true if the network is directed, false otherwise.
weighted: true if the network is weighted, false otherwise.
selfloops: true if we want to read edges from a node to itself, false otherwise.
uParser: a parser for reading the type of the nodes from text.

When we want to write graphs into this format, we use the PajekGraphWriter class, which does not receive any argument in its constructor.

Differently from the basic format, this one does not allow reading the types of the edges.

Graph manipulation

The RELISON library provides methods for the manipulation of the network (adding nodes, edges, changing the weights of edges, etc.). All this methods are provided in the Graph interface, but we summarize them here.

Nodes

The simplest way to modify a network is to add or remove one of its edges.

Add nodes

If we want to add a node to the network, we use the following method:

boolean addNode(U user)

Arguments:

user: the identifier of the user.

Returns

If the node is added, it returns true. Otherwise, it returns false. A user can only be added once, so, if a node is added twice, the second time, this method will return false.

Remove nodes

If we want to remove a node from the network, we use:

boolean removeNode(U user)

Arguments:

user: the identifier of the user.

Returns

If the node is removed, it returns true. Otherwise, it returns false. If the user does not exist in the network, this method will return false.

Edges

The second group of elements that we can modify in a network is the group of edges in the network. In this case, we have several methods of interest.

Add edges

To add edges, we can consider several options. We include here the most complete one, although more of them can be seen on the reference of the Graph interface, here.

boolean addEdge(U orig, U dest, double weight, int type, boolean insertNodes)

Arguments:

orig: the first node of the edge.
dest: the second node of the edge.
weight: the weight of the edge.
type: the type of the edge.
insertNodes: true if we want to add the nodes to the network if they do not exist, false otherwise.

Returns

If the edge is added, it returns true. Otherwise, it returns false. In simple networks, an edge can only be added once.

Update edge weights

If we want to modify the weight of an edge, we have to use the following methods. In simple networks, we have to use:

boolean updateEdgeWeight(U orig, U dest, double newWeight)

Arguments:

orig: the first node of the edge.
dest: the second node of the edge.
newWeight: the new weight of the edge.

Returns

If the edge weight is updated, it returns true. If the edge does not exist, it cannot be updated.

In multigraphs, we use the following method instead (the previous one just updates the first created edge between the users):

boolean updateEdgeWeight(U orig, U dest, double newWeight, int idx)

Arguments:

orig: the first node of the edge.
dest: the second node of the edge.
newWeight: the new weight of the edge.
idx: the number of the edge between the users to modify.

Returns

If the edge weight is updated, it returns true. If the edge does not exist, it cannot be updated.

Update edge types

If we want to modify the type of an edge, we have to use the following methods. In simple networks, we have to use:

boolean updateEdgeType(U orig, U dest, int newType)

Arguments:

orig: the first node of the edge.
dest: the second node of the edge.
newType: the new type of the edge.

Returns

If the edge weight is updated, it returns true. If the edge does not exist, it cannot be updated.

In multigraphs, we use the following method instead (the previous one just updates the first created edge between the users):

boolean updateEdgeType(U orig, U dest, double newType, int idx)

Arguments:

orig: the first node of the edge.
dest: the second node of the edge.
newType: the new type of the edge.
idx: the number of the edge between the users to modify.

Returns

If the edge type is updated, it returns true. If the edge does not exist, it cannot be updated.

Remove edges

Finally, if we want to remove an edge, we have to use the following method:

boolean removeEdge(U orig, U dest)

Arguments:

orig: the first node of the edge.
dest: the second node of the edge.

Returns

If the edge weight is remove, it returns true. If the edge does not exist, it cannot be removed.

In multigraphs, we use the following method instead (the previous one just removes the first created edge between the users):

boolean removeEdge(U orig, U dest, int idx)

Arguments:

orig: the first node of the edge.
dest: the second node of the edge.
idx: the number of the edge between the users to remove.

Returns

If the edge type is updated, it returns true. If the edge does not exist, it cannot be updated.

Also, we provide a method to remove all the edges between two nodes in the multigraph:

boolean removeEdges(U orig, U dest)

Arguments:

orig: the first node of the edges.
dest: the second node of the edges.

Returns

If the edge weight is remove, it returns true. If there are not edges between the users, they cannot be removed.

Accessing the properties of a network

In addition to the methods for modifying the structure of a social network, RELISON also provides methods for accessing to the basic information of a network: the nodes in the network, his neighbors, the weights and types of the edges, etc. Here, we summarize how this can be done.

Users in the network

The first element we explain how to access are the nodes in the network. In order to access the complete set of nodes, we can use the following method:

Stream<U> getAllNodes()

Returns

An stream object containing the identifiers of all the nodes in the network.

If, instead of accessing to them, we just need to count them, we can use the following method of the graphs:

long getVertexCount()

Returns

The number of vertices (nodes, users) in the network graph.

Edge properties

The next element we can access are the edges. If we know the endpoints of the edge, we can access their different properties. We differentiate four methods here:

Edge existence

The first method allows us to know if there is a link between two users. The method signature is:

boolean containsEdge(U orig, U dest)

Arguments

orig: the first node.
dest: the second node.

Returns

true if there is (at least) an edge, false otherwise.

In multigraphs, there is another method which allows us to obtain the number of edges between a pair of users:

int getNumEdges(U orig, U dest)

Arguments

orig: the first node.
dest: the second node.

Returns

the number of edges between the two nodes.

Edge weight

We might want to access the weight of an edge. For this, in simple graphs, we use the following method:

double getEdgeWeight(U orig, U dest)

Arguments

orig: the first node.
dest: the second node.

Returns

the weight if it exists, NaN otherwise.

In the case of multigraphs, the previous method allows us to obtain the sum of all the edge weights. If we want to obtain the individual weights, we have another method:

List<Double> getEdgeWeights(U orig, U dest)

Arguments

orig: the first node.
dest: the second node.

Returns

a list containing all the edge weights if there is (at least) one edge between the users, null otherwise.

Edge types

We might want to access the type of an edge. For this, in simple graphs, we use the following method:

int getEdgeWeight(U orig, U dest)

Arguments

orig: the first node.
dest: the second node.

Returns

the type if it exists, -1 otherwise.

In the case of multigraphs, we have another method, which allows us to retrieve the weights of all the edges:

List<Integer> getEdgeTypes(U orig, U dest)

Arguments

orig: the first node.
dest: the second node.

Returns

a list containing all the edge types if there is (at least) one edge between the users, null otherwise.

Edge number

The last method just allows us to determine how many edges our network has:

long getEdgeCount()

Returns

the number of edges in the network.

Neighbors of a node

As accessing the edges in the network just by running over all the possible pairs of users and checking if the edge exists would be very slow, RELISON provides methods for accessing the neighborhood of a user. We explain them here.

Existence of neighbors

The first group of methods study whether a node has (or not) neighbors in the network. Although there are several methods for this, here we only explain the following one:

boolean hasNeighbors(U u, EdgeOrientation orient)

Arguments

u: the node to study.
orient: the orientation for selecting the neighbors (only useful in directed networks).
- IN: for identifying if the user has incoming neighbors.
- OUT: for identifying if the user has outgoing neighbors.
- UND: for identifying if the user has either incoming or outgoing neighbors.
- MUTUAL: for identifying if the user has neighbors who are both incoming and outgoing.

Returns

true if the node has neighbors, false otherwise.

Count

The second group of methods allows us to identify how many neighbors a user has.

int getNeighbourhoodSize(U u, EdgeOrientation orient)

Arguments

u: the node to study.
orient: the orientation for selecting the neighbors (only useful in directed networks).
- IN: for retrieving the incoming neighbors.
- OUT: for retrieving the outgoing neighbors.
- UND: for retrieving either incoming or outgoing neighbors.
- MUTUAL: for retrieving the neighbors who are both incoming and outgoing.

Returns

the number of neighbors of the user.

There is another method, that allows us to study the number of edges connected to a node. This value is the same as the previous one in the case of simple networks, but it changes when it comes to multigraphs:

int degree(U u, EdgeOrientation orient)

Arguments

u: the node to study.
orient: the orientation for selecting the neighbors (only useful in directed networks).
- IN: for retrieving the incoming neighbors.
- OUT: for retrieving the outgoing neighbors.
- UND: for retrieving either incoming or outgoing neighbors.
- MUTUAL: for retrieving the neighbors who are both incoming and outgoing.

Returns

the number of edges connected to the user (the degree), -1 if the user does not exist.

Neighbors

The third group of methods allows us to identify the actual neighbors of a user in the network. The method we introduce here is the following:

Stream<U> get(U u, EdgeOrientation orient)

Arguments

u: the node to study.
orient: the orientation for selecting the neighbors (only useful in directed networks).
- IN: for retrieving the incoming neighbors.
- OUT: for retrieving the outgoing neighbors.
- UND: for retrieving either incoming or outgoing neighbors.
- MUTUAL: for retrieving the neighbors who are both incoming and outgoing.

Returns

an stream containing the identifier of the networks.

Weights

The fourth group of methods allows us to identify the weights of the nodes. We can use the following method:

Stream<Weight<U, Double>> getNeighbourhoodWeights(U u, EdgeOrientation orient)

Arguments

u: the node to study.
orient: the orientation for selecting the neighbors (only useful in directed networks).
- IN: for retrieving the incoming neighbors.
- OUT: for retrieving the outgoing neighbors.
- UND: for retrieving either incoming or outgoing neighbors.
- MUTUAL: for retrieving the neighbors who are both incoming and outgoing.

Returns

an stream containing (user, weight) pairs. In multigraphs, a pair is included for each edge.

Additionally, in multigraphs, we can also use the following method:

Stream<Weight<U, Double>> getNeighbourhoodWeightsLists(U u, EdgeOrientation orient)

Arguments

u: the node to study.
orient: the orientation for selecting the neighbors (only useful in directed networks).
- IN: for retrieving the incoming neighbors.
- OUT: for retrieving the outgoing neighbors.
- UND: for retrieving either incoming or outgoing neighbors.
- MUTUAL: for retrieving the neighbors who are both incoming and outgoing.

Returns

an stream containing (user, list of weights) pairs.

Edge types

The fifth and last group of methods allows us to identify the weights of the nodes. We can use the following method:

Stream<Weight<U, Integer>> getNeighbourhoodTypes(U u, EdgeOrientation orient)

Arguments

u: the node to study.
orient: the orientation for selecting the neighbors (only useful in directed networks).
- IN: for retrieving the incoming neighbors.
- OUT: for retrieving the outgoing neighbors.
- UND: for retrieving either incoming or outgoing neighbors.
- MUTUAL: for retrieving the neighbors who are both incoming and outgoing.

Returns

an stream containing (user, type) pairs. In multigraphs, a pair is included for each edge.

Additionally, in multigraphs, we can also use the following method:

Stream<Weight<U, Integer>> getNeighbourhoodTypesLists(U u, EdgeOrientation orient)

Arguments

u: the node to study.
orient: the orientation for selecting the neighbors (only useful in directed networks).
- IN: for retrieving the incoming neighbors.
- OUT: for retrieving the outgoing neighbors.
- UND: for retrieving either incoming or outgoing neighbors.
- MUTUAL: for retrieving the neighbors who are both incoming and outgoing.

Returns

an stream containing (user, list of types) pairs.

Adjacency matrix

In addition to the graph object, we can represent the network using the adjacency matrix of the network. This matrix has in the \((u,v)\) coordinate, the weight of the edge between nodes \(u\) and \(v\). We can obtain this matrix using the following method:

double[][] getAdjacencyMatrix(EdgeOrientation orient)

Arguments

orient: the orientation for selecting the matrix (only useful in directed networks).
- IN: the \((u,v)\) coordinate contains the weight of the \((v,u)\) weight.
- OUT: the \((u,v)\) coordinate contains the weight of the \((u,v)\) weight (natural adjacency matrix)
- UND: the \((u,v)\) coordinate contains the \(w(u,v) + w(v,u)\).
- MUTUAL: the \((u,v)\) coordinate contains the \(w(u,v) + w(v,u)\) value if both edges exist, 0 otherwise.

Returns

an array containing the matrix.

We should note that the nodes in the network can be represented by any type of identifier (int, long, string, etc.). So, in order to map these identifiers to the indexes in the matrix, we can use the following method:

Index<U> getAdjacencyMatrixMap()

Returns

an index, mapping user identifiers to indexes in the (0, numNodes-1) rank.