Networks
The RELISON library allows the creation, reading, writing and manipulation of networks. Mathematically, a network is modelled as a graph, \(G = \langle \mathcal{U}, E \rangle\), where \(\mathcal{U}\) is the set of users in the network, and \(E \in \mathcal{U}^2\) is the set of links in the network. For each user \(u \in \mathcal{U}\), we define his neighborhood, \(\Gamma(u)\) as the set of people in the network sharing a link with him.
In order to use the basic graph structures, the following package must be included in the pom.xml file in your Maven project.
<dependency>
<groupId>es.uam.eps.ir</groupId>
<artifactId>RELISON-core</artifactId>
<version>1.0.0</version>
</dependency>
Properties of a graph
The RELISON library allows different types of network, depending on its properties. We have to select three options:
Number of edges between users
The first option to consider in a network is the number of edges we allow between the same pair of users. If we consider many of them, we are talking about multigraphs, whereas, if we just consider one, we are using a simple graph.
Direction of the edges
The second consideration defines whether all the edges are reciprocal or not. If they are, there is no difference between the (u,v)
and (v,u)
, and we are talking about an undirected network. This is the case of the Facebook friendship network, where a link has to be accepted by both users before it is created. In case the direction of the edges matters, we are talking about a directed network, as the Twitter follows network.
Direction of the edges
The third and final consideration is referred to the properties of the edges. Sometimes, it is possible to define a function \(w\), which assigns a weight to each of the edges in the network. This weight might define very different properties: the frequency of interaction between the users, the cost of travelling from one node to another, etc. If the network considers these weights, we consider it a weighted network, while an unweighed network just takes binary weights (weight equal to 1 if the edge exists, 0 otherwise).
In the table below, we include the basic interfaces for defining any of those graphs, depending on the properties of the network.
Multigraph |
Directed |
Weighted |
Class |
UndirectedUnweightedGraph |
|||
✓ |
UndirectedWeightedGraph |
||
✓ |
DirectedUnweightedGraph |
||
✓ |
✓ |
DirectedWeightedGraph |
|
✓ |
UndirectedUnweightedMultiGraph |
||
✓ |
✓ |
UndirectedWeightedMultiGraph |
|
✓ |
✓ |
DirectedUnweightedMultiGraph |
|
✓ |
✓ |
✓ |
DirectedWeightedMultiGraph |
Creation of empty graphs
When integrated in another Java library, the RELISON library provides two options for generating empty graphs.
The first option considers the straightforward use of the class constructors. RELISON provides an efficient implementation for each of the network types described above. The obtain this implementation, it is just necessary to add the prefix Fast
to the interface names
described earlier. No additional argument needs to be provided to the constructors. For instance, if we want to create a directed unweighted and simple graph, we would just create it as:
Graph<U> graph = new FastDirectedUnweightedGraph();
The second option uses graph generators: the EmptyGraphGenerator
class builds empty simple graphs, whereas the EmptyMultiGraphGenerator
class does the same for multigraphs. The constructor has to be configured with two parameters, indicating whether the graph is directed and/or weighted. Following the previous example, we would do the following:
Graph<U> graph;
boolean directed = true;
boolean weighted = false;
GraphGenerator<U> ggen = new EmptyGraphGenerator();
ggen.configure(directed, weighted);
graph = ggen.generate();
Reading / writing graphs
Most times, the networks which shall be used in the library are stored in an external file. RELISON provides classes for reading this type of networks. When we want to read the network, we use a GraphReader
. This works as follows:
String file = "file.txt"; // the route of the file
boolean readWeights = true; // if we want to read the weights.
boolean readTypes = false; // if we want to read the types
GraphReader<U> greader; // we assume here that it has been configured
Graph<U> graph = greader.read(file, readWeights, readTypes);
To write a graph into a file, we use a GraphWriter
. Once it is created, this works as follows:
String file = "file.txt"; // the route of the file
Graph<U> graph; // the graph to write
GraphWriter<U> gwriter; // the graph we want to write
boolean writeWeights = true; // if we want to write the edge weights.
boolean writeTypes = false; // if we want to write the edge types
gwriter.write(graph, file, writeWeights, writeTypes);
Basic format
The basic format prints, on each line, an edge of the network. The format of a line is the following (separated by a delimiter):
node1 node2 (weight edgeType)
where node1
and node2
are the identifiers of the nodes, weight
is the weight value (a double value) and type
contains an integer value classifying the edge. The last values are optional. The weight should only be provided when we want to read the weights of the network, and the types must be provided when the user wants to read them.
For instance, let’s suppose that we have a Twitter network with 10,000 users and 100,000 edges. The first two users have, as nicknames, “JavierSanzCruza” and “pcastells”, respectively, and the first follows the second. Then, if we use “,” as delimiters the file would look as:
JavierSanzCruza,pcastells,1.0
<...>
If we want to read graphs from this format, we use the TextGraphReader
and TextMultiGraphReader
classes, depending on whether we want to read a simple network or a multigraph. These classes receive the following arguments:
directed
: true if the network is directed, false otherwise.weighted
: true if the network is weighted, false otherwise.selfloops
: true if we want to read edges from a node to itself, false otherwise.delimiter
: the delimiter separating the different fields in the file. In the main programs of RELISON, it is a tab.uParser
: a parser for reading the type of the nodes from text.
When we want to write graphs into this format, we use the TextGraphWriter
class, which just receives the delimiter in the constructor.
We wanted to note here that this is the format available in the programs provided by the RELISON library, with the fields separated by tabs (the t
character).
Pajek format
This format allows reading and writing networks in the Pajek format (more information in the following link ). These graphs have the following format (space separated):
*Vertices numVertices
vertexId1 "vertexLabel1"
vertexId2 "vertexLabel2"
<...>
vertexIdN "vertexLabelN"
*Edges numEdges
vertexId1.1. vertexId1.2. weight1
<...>
Here, for each node, we differentiate two value: the vertexId
is a numerical value identifying the user, and the vertexLabel
is the actual identifier of the user. For instance, let’s suppose that we have a Twitter network with 10,000 users and 100,000 edges. The first two users have, as nicknames, “JavierSanzCruza” and “pcastells”, respectively, and the first follows the second. Then, the Pajek file would be the following:
.. code:
*Vertices 10000
1 "JavierSanzCruza"
2 "pcastells"
<...>
*Edges 100000
1 2 1.0
<...>
If we want to read graphs from this format, we use the PajekGraphReader
class. This class receives the following arguments:
multigraph
: true if the network is modelled after a multigraph, false otherwise.directed
: true if the network is directed, false otherwise.weighted
: true if the network is weighted, false otherwise.selfloops
: true if we want to read edges from a node to itself, false otherwise.uParser
: a parser for reading the type of the nodes from text.
When we want to write graphs into this format, we use the PajekGraphWriter
class, which does not receive any argument in its constructor.
Differently from the basic format, this one does not allow reading the types of the edges.
Graph manipulation
The RELISON library provides methods for the manipulation of the network (adding nodes, edges, changing the weights of edges, etc.). All this methods are provided in the Graph
interface, but we summarize them here.
Nodes
The simplest way to modify a network is to add or remove one of its edges.
Add nodes
If we want to add a node to the network, we use the following method:
boolean addNode(U user)
- Arguments:
user: the identifier of the user.
- Returns
If the node is added, it returns true. Otherwise, it returns false. A user can only be added once, so, if a node is added twice, the second time, this method will return false.
Remove nodes
If we want to remove a node from the network, we use:
boolean removeNode(U user)
- Arguments:
user: the identifier of the user.
- Returns
If the node is removed, it returns true. Otherwise, it returns false. If the user does not exist in the network, this method will return false.
Edges
The second group of elements that we can modify in a network is the group of edges in the network. In this case, we have several methods of interest.
Add edges
To add edges, we can consider several options. We include here the most complete one, although more of them can be seen on the reference of the Graph
interface, here.
boolean addEdge(U orig, U dest, double weight, int type, boolean insertNodes)
- Arguments:
orig
: the first node of the edge.dest
: the second node of the edge.weight
: the weight of the edge.type
: the type of the edge.insertNodes
: true if we want to add the nodes to the network if they do not exist, false otherwise.
- Returns
If the edge is added, it returns true. Otherwise, it returns false. In simple networks, an edge can only be added once.
Update edge weights
If we want to modify the weight of an edge, we have to use the following methods. In simple networks, we have to use:
boolean updateEdgeWeight(U orig, U dest, double newWeight)
- Arguments:
orig
: the first node of the edge.dest
: the second node of the edge.newWeight
: the new weight of the edge.
- Returns
If the edge weight is updated, it returns true. If the edge does not exist, it cannot be updated.
In multigraphs, we use the following method instead (the previous one just updates the first created edge between the users):
boolean updateEdgeWeight(U orig, U dest, double newWeight, int idx)
- Arguments:
orig
: the first node of the edge.dest
: the second node of the edge.newWeight
: the new weight of the edge.idx
: the number of the edge between the users to modify.
- Returns
If the edge weight is updated, it returns true. If the edge does not exist, it cannot be updated.
Update edge types
If we want to modify the type of an edge, we have to use the following methods. In simple networks, we have to use:
boolean updateEdgeType(U orig, U dest, int newType)
- Arguments:
orig
: the first node of the edge.dest
: the second node of the edge.newType
: the new type of the edge.
- Returns
If the edge weight is updated, it returns true. If the edge does not exist, it cannot be updated.
In multigraphs, we use the following method instead (the previous one just updates the first created edge between the users):
boolean updateEdgeType(U orig, U dest, double newType, int idx)
- Arguments:
orig
: the first node of the edge.dest
: the second node of the edge.newType
: the new type of the edge.idx
: the number of the edge between the users to modify.
- Returns
If the edge type is updated, it returns true. If the edge does not exist, it cannot be updated.
Remove edges
Finally, if we want to remove an edge, we have to use the following method:
boolean removeEdge(U orig, U dest)
- Arguments:
orig
: the first node of the edge.dest
: the second node of the edge.
- Returns
If the edge weight is remove, it returns true. If the edge does not exist, it cannot be removed.
In multigraphs, we use the following method instead (the previous one just removes the first created edge between the users):
boolean removeEdge(U orig, U dest, int idx)
- Arguments:
orig
: the first node of the edge.dest
: the second node of the edge.idx
: the number of the edge between the users to remove.
- Returns
If the edge type is updated, it returns true. If the edge does not exist, it cannot be updated.
Also, we provide a method to remove all the edges between two nodes in the multigraph:
boolean removeEdges(U orig, U dest)
- Arguments:
orig
: the first node of the edges.dest
: the second node of the edges.
- Returns
If the edge weight is remove, it returns true. If there are not edges between the users, they cannot be removed.
Accessing the properties of a network
In addition to the methods for modifying the structure of a social network, RELISON also provides methods for accessing to the basic information of a network: the nodes in the network, his neighbors, the weights and types of the edges, etc. Here, we summarize how this can be done.
Users in the network
The first element we explain how to access are the nodes in the network. In order to access the complete set of nodes, we can use the following method:
Stream<U> getAllNodes()
- Returns
An stream object containing the identifiers of all the nodes in the network.
If, instead of accessing to them, we just need to count them, we can use the following method of the graphs:
long getVertexCount()
- Returns
The number of vertices (nodes, users) in the network graph.
Edge properties
The next element we can access are the edges. If we know the endpoints of the edge, we can access their different properties. We differentiate four methods here:
Edge existence
The first method allows us to know if there is a link between two users. The method signature is:
boolean containsEdge(U orig, U dest)
- Arguments
orig
: the first node.dest
: the second node.
- Returns
true if there is (at least) an edge, false otherwise.
In multigraphs, there is another method which allows us to obtain the number of edges between a pair of users:
int getNumEdges(U orig, U dest)
- Arguments
orig
: the first node.dest
: the second node.
- Returns
the number of edges between the two nodes.
Edge weight
We might want to access the weight of an edge. For this, in simple graphs, we use the following method:
double getEdgeWeight(U orig, U dest)
- Arguments
orig
: the first node.dest
: the second node.
- Returns
the weight if it exists,
NaN
otherwise.
In the case of multigraphs, the previous method allows us to obtain the sum of all the edge weights. If we want to obtain the individual weights, we have another method:
List<Double> getEdgeWeights(U orig, U dest)
- Arguments
orig
: the first node.dest
: the second node.
- Returns
a list containing all the edge weights if there is (at least) one edge between the users, null otherwise.
Edge types
We might want to access the type of an edge. For this, in simple graphs, we use the following method:
int getEdgeWeight(U orig, U dest)
- Arguments
orig
: the first node.dest
: the second node.
- Returns
the type if it exists, -1 otherwise.
In the case of multigraphs, we have another method, which allows us to retrieve the weights of all the edges:
List<Integer> getEdgeTypes(U orig, U dest)
- Arguments
orig
: the first node.dest
: the second node.
- Returns
a list containing all the edge types if there is (at least) one edge between the users, null otherwise.
Edge number
The last method just allows us to determine how many edges our network has:
long getEdgeCount()
- Returns
the number of edges in the network.
Neighbors of a node
As accessing the edges in the network just by running over all the possible pairs of users and checking if the edge exists would be very slow, RELISON provides methods for accessing the neighborhood of a user. We explain them here.
Existence of neighbors
The first group of methods study whether a node has (or not) neighbors in the network. Although there are several methods for this, here we only explain the following one:
boolean hasNeighbors(U u, EdgeOrientation orient)
- Arguments
u
: the node to study.orient
: the orientation for selecting the neighbors (only useful in directed networks).IN
: for identifying if the user has incoming neighbors.OUT
: for identifying if the user has outgoing neighbors.UND
: for identifying if the user has either incoming or outgoing neighbors.MUTUAL
: for identifying if the user has neighbors who are both incoming and outgoing.
- Returns
true if the node has neighbors, false otherwise.
Count
The second group of methods allows us to identify how many neighbors a user has.
int getNeighbourhoodSize(U u, EdgeOrientation orient)
- Arguments
u
: the node to study.orient
: the orientation for selecting the neighbors (only useful in directed networks).IN
: for retrieving the incoming neighbors.OUT
: for retrieving the outgoing neighbors.UND
: for retrieving either incoming or outgoing neighbors.MUTUAL
: for retrieving the neighbors who are both incoming and outgoing.
- Returns
the number of neighbors of the user.
There is another method, that allows us to study the number of edges connected to a node. This value is the same as the previous one in the case of simple networks, but it changes when it comes to multigraphs:
int degree(U u, EdgeOrientation orient)
- Arguments
u
: the node to study.orient
: the orientation for selecting the neighbors (only useful in directed networks).IN
: for retrieving the incoming neighbors.OUT
: for retrieving the outgoing neighbors.UND
: for retrieving either incoming or outgoing neighbors.MUTUAL
: for retrieving the neighbors who are both incoming and outgoing.
- Returns
the number of edges connected to the user (the degree), -1 if the user does not exist.
Neighbors
The third group of methods allows us to identify the actual neighbors of a user in the network. The method we introduce here is the following:
Stream<U> get(U u, EdgeOrientation orient)
- Arguments
u
: the node to study.orient
: the orientation for selecting the neighbors (only useful in directed networks).IN
: for retrieving the incoming neighbors.OUT
: for retrieving the outgoing neighbors.UND
: for retrieving either incoming or outgoing neighbors.MUTUAL
: for retrieving the neighbors who are both incoming and outgoing.
- Returns
an stream containing the identifier of the networks.
Weights
The fourth group of methods allows us to identify the weights of the nodes. We can use the following method:
Stream<Weight<U, Double>> getNeighbourhoodWeights(U u, EdgeOrientation orient)
- Arguments
u
: the node to study.orient
: the orientation for selecting the neighbors (only useful in directed networks).IN
: for retrieving the incoming neighbors.OUT
: for retrieving the outgoing neighbors.UND
: for retrieving either incoming or outgoing neighbors.MUTUAL
: for retrieving the neighbors who are both incoming and outgoing.
- Returns
an stream containing (user, weight) pairs. In multigraphs, a pair is included for each edge.
Additionally, in multigraphs, we can also use the following method:
Stream<Weight<U, Double>> getNeighbourhoodWeightsLists(U u, EdgeOrientation orient)
- Arguments
u
: the node to study.orient
: the orientation for selecting the neighbors (only useful in directed networks).IN
: for retrieving the incoming neighbors.OUT
: for retrieving the outgoing neighbors.UND
: for retrieving either incoming or outgoing neighbors.MUTUAL
: for retrieving the neighbors who are both incoming and outgoing.
- Returns
an stream containing (user, list of weights) pairs.
Edge types
The fifth and last group of methods allows us to identify the weights of the nodes. We can use the following method:
Stream<Weight<U, Integer>> getNeighbourhoodTypes(U u, EdgeOrientation orient)
- Arguments
u
: the node to study.orient
: the orientation for selecting the neighbors (only useful in directed networks).IN
: for retrieving the incoming neighbors.OUT
: for retrieving the outgoing neighbors.UND
: for retrieving either incoming or outgoing neighbors.MUTUAL
: for retrieving the neighbors who are both incoming and outgoing.
- Returns
an stream containing (user, type) pairs. In multigraphs, a pair is included for each edge.
Additionally, in multigraphs, we can also use the following method:
Stream<Weight<U, Integer>> getNeighbourhoodTypesLists(U u, EdgeOrientation orient)
- Arguments
u
: the node to study.orient
: the orientation for selecting the neighbors (only useful in directed networks).IN
: for retrieving the incoming neighbors.OUT
: for retrieving the outgoing neighbors.UND
: for retrieving either incoming or outgoing neighbors.MUTUAL
: for retrieving the neighbors who are both incoming and outgoing.
- Returns
an stream containing (user, list of types) pairs.
Adjacency matrix
In addition to the graph object, we can represent the network using the adjacency matrix of the network. This matrix has in the \((u,v)\) coordinate, the weight of the edge between nodes \(u\) and \(v\). We can obtain this matrix using the following method:
double[][] getAdjacencyMatrix(EdgeOrientation orient)
- Arguments
orient
: the orientation for selecting the matrix (only useful in directed networks).IN
: the \((u,v)\) coordinate contains the weight of the \((v,u)\) weight.OUT
: the \((u,v)\) coordinate contains the weight of the \((u,v)\) weight (natural adjacency matrix)UND
: the \((u,v)\) coordinate contains the \(w(u,v) + w(v,u)\).MUTUAL
: the \((u,v)\) coordinate contains the \(w(u,v) + w(v,u)\) value if both edges exist, 0 otherwise.
- Returns
an array containing the matrix.
We should note that the nodes in the network can be represented by any type of identifier (int, long, string, etc.). So, in order to map these identifiers to the indexes in the matrix, we can use the following method:
Index<U> getAdjacencyMatrixMap()
- Returns
an index, mapping user identifiers to indexes in the (0, numNodes-1) rank.