Metrics
To evaluate link prediction approaches, it is common to rely on standard machine learning and classification metrics. We include the following ones:
Accuracy
The accuracy measures the fraction of correctly classified links (both positive and negative):

\[\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}\]

where \(TP\) is the number of true positives, \(FP\) is the number of false positives, \(TN\) is the number of true negatives, and \(FN\) is the number of false negatives.
Parameters
We have two options here (mutually exclusive):
cutoff
: the (maximum) number of predicted links to consider (all the remaining links shall be considered as negatively predicted links).

threshold
: the minimum score to consider as positive (all the remaining links shall be considered as negatively predicted links).
When both appear in the configuration file, they will be considered separately.
Configuration file
Accuracy:
  cutoff:
    type: int
    values: [1,5,10]
  threshold:
    type: double
    values: [0.2,0.5,1.0]
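The interplay between the two options can be sketched as follows. This is a minimal Python illustration, not the library's implementation; the function name and argument handling are assumptions:

```python
def accuracy(scores, labels, cutoff=None, threshold=None):
    """Fraction of correctly classified links: (TP + TN) / (TP + FP + TN + FN).

    scores: predicted score per candidate link; labels: 1 = real link, 0 = no link.
    Exactly one of `cutoff` / `threshold` decides which links count as positive.
    """
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    if cutoff is not None:
        # the top-`cutoff` ranked links are positive predictions, the rest negative
        pairs = [(1 if i < cutoff else 0, l) for i, (_, l) in enumerate(ranked)]
    else:
        # links scoring at least `threshold` are positive predictions
        pairs = [(1 if s >= threshold else 0, l) for s, l in ranked]
    correct = sum(1 for pred, l in pairs if pred == l)  # TP + TN
    return correct / len(pairs)

scores, labels = [0.9, 0.8, 0.4, 0.1], [1, 0, 1, 0]
print(accuracy(scores, labels, cutoff=2))       # 0.5
print(accuracy(scores, labels, threshold=0.3))  # 0.75
```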
Area under the ROC curve
The area under the receiver operating characteristic curve (AUC), as its name indicates, measures the area under the ROC curve, which plots the true positive rate as a function of the false positive rate.
Reference: T. Fawcett. An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861–874 (2006).
Configuration file
AUC:
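The AUC can also be computed without tracing the curve: it equals the probability that a randomly chosen true link receives a higher score than a randomly chosen non-link, with ties counting one half (see Fawcett, 2006). A minimal sketch, not the library's implementation:

```python
def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties = 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([0.9, 0.8, 0.4, 0.1], [1, 0, 1, 0]))  # 0.75
print(auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # 1.0 (perfect ranking)
```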
F1 Score
The F1 score combines precision and recall (see Precision and Recall, respectively) into a single value. It is the harmonic mean of the two measures:

\[F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2\,TP}{2\,TP + FP + FN}\]

where \(TP\) is the number of true positives, \(FP\) is the number of false positives, and \(FN\) is the number of false negatives.
Parameters
We have two options here (mutually exclusive):
cutoff
: the (maximum) number of predicted links to consider (all the remaining links shall be considered as negatively predicted links).

threshold
: the minimum score to consider as positive (all the remaining links shall be considered as negatively predicted links).
When both appear in the configuration file, they will be considered separately.
Configuration file
F1-score:
  cutoff:
    type: int
    values: [1,5,10]
  threshold:
    type: double
    values: [0.2,0.5,1.0]
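The relation between the three measures can be checked with a few lines of Python (a sketch with illustrative names, not the library's code): the harmonic mean of precision and recall reduces algebraically to the count-based form above.

```python
def f1_score(tp, fp, fn):
    # Harmonic mean of precision and recall; algebraically this
    # reduces to 2*TP / (2*TP + FP + FN).
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

print(f1_score(tp=2, fp=1, fn=1))  # 2/3
```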
Precision
The precision measures the proportion of correctly predicted links among those the algorithm has labeled as positive:

\[\text{Precision} = \frac{TP}{TP + FP}\]

where \(TP\) is the number of true positives and \(FP\) is the number of false positives.
Parameters
We have two options here (mutually exclusive):
cutoff
: the (maximum) number of predicted links to consider (all the remaining links shall be considered as negatively predicted links).

threshold
: the minimum score to consider as positive (all the remaining links shall be considered as negatively predicted links).
When both appear in the configuration file, they will be considered separately.
Configuration file
Precision:
  cutoff:
    type: int
    values: [1,5,10]
  threshold:
    type: double
    values: [0.2,0.5,1.0]
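With the cutoff option, this is the familiar precision@k: the fraction of real links among the k highest-scored predictions. A minimal sketch with assumed names, not the library's implementation:

```python
def precision_at(scores, labels, cutoff):
    # Fraction of real links (label 1) among the top-`cutoff` scored predictions.
    top = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)[:cutoff]
    return sum(l for _, l in top) / len(top)

print(precision_at([0.9, 0.8, 0.4, 0.1], [1, 0, 1, 0], cutoff=2))  # 0.5
```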
Recall
The recall measures the proportion of real links that the algorithm has correctly predicted as positive:

\[\text{Recall} = \frac{TP}{TP + FN}\]

where \(TP\) is the number of true positives and \(FN\) is the number of false negatives.
Parameters
We have two options here (mutually exclusive):
cutoff
: the (maximum) number of predicted links to consider (all the remaining links shall be considered as negatively predicted links).

threshold
: the minimum score to consider as positive (all the remaining links shall be considered as negatively predicted links).
When both appear in the configuration file, they will be considered separately.
Configuration file
Recall:
  cutoff:
    type: int
    values: [1,5,10]
  threshold:
    type: double
    values: [0.2,0.5,1.0]
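With the threshold option, recall is the fraction of real links whose score reaches the threshold. A minimal sketch with assumed names, not the library's implementation:

```python
def recall_at_threshold(scores, labels, threshold):
    # TP / (TP + FN): real links scoring >= threshold, over all real links.
    tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= threshold)
    return tp / sum(labels)

print(recall_at_threshold([0.9, 0.8, 0.4, 0.1], [1, 0, 1, 0], threshold=0.5))  # 0.5
```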