What is the F1 Score in Machine Learning (Python Example)

When it comes to evaluating the performance of a machine learning model, accuracy is often the first metric that comes to mind. However, accuracy can be misleading in certain situations, especially when dealing with imbalanced datasets. In such cases, F1 score can be a more reliable measure of a model’s effectiveness. In this article, we’ll take a closer look at what F1 score is and how it can be used in machine learning.

What is F1 Score?

The F1 score is a single number that summarizes a model’s performance on the positive class by combining precision and recall. Precision is the proportion of positive predictions that are actually positive, while recall is the proportion of actual positives in the dataset that the model correctly identifies. The F1 score is the harmonic mean of precision and recall, and it ranges from 0 (worst) to 1 (best):

F1 Score = 2 * (Precision * Recall) / (Precision + Recall)

By using the harmonic mean, F1 score puts more emphasis on the smaller of the two values. This means that a model will only achieve a high F1 score if both precision and recall are high. If one of these values is low, the F1 score will also be low, even if the other value is high.
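
To make the effect of the harmonic mean concrete, here is a minimal sketch that compares it with the ordinary arithmetic mean. The helper name `f1_from` and the precision/recall pairs are made up for illustration, and returning 0.0 when both inputs are zero is a convention assumed here.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical (precision, recall) pairs, chosen only for illustration
pairs = [(0.9, 0.9), (0.9, 0.5), (0.9, 0.1), (1.0, 0.0)]

for p, r in pairs:
    arithmetic = (p + r) / 2
    print(f"precision={p:.1f} recall={r:.1f} "
          f"arithmetic={arithmetic:.2f} f1={f1_from(p, r):.2f}")
```

With precision 0.9 and recall 0.1, the arithmetic mean is 0.50 while the F1 score is only 0.18, which is much closer to the weaker of the two values.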

When to Use F1 Score?

F1 score is especially useful in situations where the dataset is imbalanced, because accuracy can be misleading in such cases. For example, if a dataset contains 95% negative samples and only 5% positive samples, a model that predicts all samples as negative will have a high accuracy of 95%, even though it never identifies a single positive sample. Such a model has no true positives, so its recall is 0 and its F1 score is 0, which exposes the failure that accuracy hides.
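
The scenario above can be reproduced in a few lines of plain Python. This is a sketch on a made-up dataset of 100 samples; treating a ratio with a zero denominator as 0.0 is an assumption made here to keep the arithmetic defined.

```python
# Hypothetical dataset: 95 negatives (0) and 5 positives (1)
y_true = [0] * 95 + [1] * 5
# A trivial "model" that predicts negative for every sample
y_pred = [0] * 100

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

accuracy = (tp + tn) / len(y_true)
# Assumption: ratios with a zero denominator are reported as 0.0
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = (2 * precision * recall / (precision + recall)
      if (precision + recall) else 0.0)

print(f"Accuracy: {accuracy:.2f}")
print(f"F1 score: {f1:.2f}")
```

Accuracy comes out at 0.95 while the F1 score is 0.00, because the model produces no true positives at all.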

How to Calculate F1 Score?

Calculating F1 score requires first calculating precision and recall:

  • Precision: the number of true positive predictions divided by the number of true positive predictions plus the number of false positive predictions.
  • Recall: the number of true positive predictions divided by the number of true positive predictions plus the number of false negative predictions.

Once precision and recall have been calculated, F1 score can be calculated using the formula mentioned above.
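
The steps above can be sketched without any library. The function name `precision_recall_f1` and its `positive` parameter are hypothetical, introduced only for this illustration, and ratios with a zero denominator are reported as 0.0 by assumption. The labels are the same ones used in the scikit-learn example in this article, so the two results can be compared.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for one positive class.

    Hypothetical helper for illustration; a ratio with a zero
    denominator is reported as 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


y_true = [0, 1, 0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 0, 1, 1, 1]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"Precision: {precision}, Recall: {recall}, F1 score: {f1}")
```

Here there are 3 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 score are all 0.75.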

Python Code Examples

Calculating F1 Score using scikit-learn


from sklearn.metrics import f1_score

# True values
y_true = [0, 1, 0, 0, 1, 1, 0, 1]

# Predicted values
y_pred = [0, 1, 0, 0, 0, 1, 1, 1]

# Calculate F1 score
f1 = f1_score(y_true, y_pred)

print(f"F1 score: {f1}")

Output:

F1 score: 0.75

Useful Python Libraries for F1 Score

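  • scikit-learn: classification_report() prints precision, recall, and F1 score for each class
  • numpy: average() can combine per-class scores into one value, optionally with weights
  • pandas: crosstab() tabulates actual against predicted labels as a confusion matrix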

Datasets useful for F1 Score

breast_cancer (from scikit-learn)


from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
X = data.data
y = data.target

iris (from scikit-learn)


from sklearn.datasets import load_iris
data = load_iris()
X = data.data
y = data.target
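
Because iris has three classes, a single binary F1 score does not apply directly. One common approach is to compute an F1 score per class and then take the unweighted mean, usually called macro averaging. The sketch below does this in plain Python on made-up labels (not real iris predictions); the function names are hypothetical, and a class with no true positives, false positives, or false negatives is scored as 0.0 by assumption.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denominator = 2 * tp + fp + fn
    # 2*TP / (2*TP + FP + FN) is algebraically the same as 2*P*R / (P + R)
    return 2 * tp / denominator if denominator else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical three-class labels, for illustration only
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

for cls in (0, 1, 2):
    print(f"class {cls}: F1 = {f1_for_class(y_true, y_pred, cls):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

Macro averaging gives every class the same weight regardless of how many samples it has, so a poorly predicted rare class pulls the score down as much as a poorly predicted common one.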

Relevant entities

Entity | Property
Precision | The number of true positives divided by the sum of true positives and false positives
Recall | The number of true positives divided by the sum of true positives and false negatives
True positive | An instance that is truly positive and is classified as positive
False positive | An instance that is actually negative but is classified as positive
False negative | An instance that is actually positive but is classified as negative
True negative | An instance that is truly negative and is classified as negative

Important Concepts in F1 Score

  • Confusion matrix
  • Precision
  • Recall
  • Thresholds
  • Binary classification
  • Multiclass classification
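
Thresholds matter because many classifiers output a score or probability, and the cutoff used to turn that score into a positive or negative label changes precision, recall, and therefore the F1 score. The following sketch tries a few cutoffs on made-up scores and keeps the one with the highest F1 score. The data, the candidate thresholds, and the function name `f1_at_threshold` are all hypothetical.

```python
def f1_at_threshold(y_true, scores, threshold):
    """F1 score when every score >= threshold is labeled positive (1)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denominator = 2 * tp + fp + fn
    # 2*TP / (2*TP + FP + FN) is algebraically the same as 2*P*R / (P + R)
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical true labels and model scores, for illustration only
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.45, 0.6, 0.7, 0.9]
candidate_thresholds = [0.3, 0.5, 0.65, 0.8]

results = {t: f1_at_threshold(y_true, scores, t) for t in candidate_thresholds}
best_threshold = max(results, key=results.get)
best_f1 = results[best_threshold]

for t, f1 in results.items():
    print(f"threshold={t:.2f} F1={f1:.3f}")
print(f"best threshold: {best_threshold} (F1={best_f1:.3f})")
```

In practice the threshold should be chosen on a validation set rather than on the data used for the final evaluation, otherwise the reported F1 score will be optimistic.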

Conclusion

F1 score is an important metric to consider when evaluating the performance of a machine learning model, especially when dealing with imbalanced datasets. By taking both precision and recall into account, F1 score gives a more informative picture of a model’s effectiveness on the positive class than accuracy alone. As with any metric, it’s important to consider the specific needs of your project when deciding which metric to use.
