When working on machine learning projects, evaluating a model’s performance is a critical step. The ML-Report-Kit is a Python package that simplifies this process by automating the generation of evaluation metrics and reports. In this post, we’ll take a closer look at what ML-Report-Kit offers and how you can use it effectively.

Figure 1 - Precision-Recall Curve and a Confusion Matrix.

Introduction

ML-Report-Kit is designed to help data scientists and machine learning practitioners create comprehensive evaluation reports for supervised learning models. It provides a straightforward way to generate various metrics and visualizations that can aid in understanding model performance.

To use ML-Report-Kit, you first need to install it. You can do this via pip:

pip install ml-report-kit

Once installed, you can create a report in just a few lines:

from ml_report_kit import MLReport

report = MLReport(y_true, y_pred, y_pred_prob, class_names)
report.run(results_path="results")
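
For context, here is a minimal self-contained version with toy data. The three classes, labels, and probabilities below are invented purely for illustration, and the probability columns are assumed to follow the order of class_names:

from ml_report_kit import MLReport

# toy ground truth and predictions for a three-class problem
class_names = ["cat", "dog", "bird"]
y_true = ["cat", "dog", "bird", "dog", "cat"]
y_pred = ["cat", "dog", "dog", "dog", "cat"]

# one row of class probabilities per sample; columns assumed to
# follow the order of class_names
y_pred_prob = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.6, 0.3],
    [0.1, 0.8, 0.1],
    [0.9, 0.05, 0.05],
]

report = MLReport(y_true, y_pred, y_pred_prob, class_names)
report.run(results_path="results")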

Running this generates a report with various metrics and saves it in the results folder, containing:

  • Classification Report: Detailed metrics for each class, including precision, recall, and F1-score.
  • Confusion Matrix: A visual representation of true vs. predicted classifications.
  • Precision-Recall Curves: Graphs that show the trade-off between precision and recall at different thresholds (see the sketch after this list).
  • CSV Files: Data files containing detailed metric values for further analysis.
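
To make the precision-recall trade-off concrete, the underlying numbers for a single class can be reproduced directly with scikit-learn. This is a sketch of the general technique, not of the kit's internals, and the toy labels and scores are invented:

from sklearn.metrics import precision_recall_curve

# binary ground truth for one class (1 = positive) and the model's
# predicted probability for that class
y_true_binary = [1, 0, 1, 1, 0]
scores = [0.9, 0.4, 0.65, 0.8, 0.3]

# precision and recall at every candidate decision threshold
precision, recall, thresholds = precision_recall_curve(y_true_binary, scores)
for p, r, t in zip(precision, recall, thresholds):
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")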

Running ML-Report-Kit with k-fold cross-validation

This example demonstrates how to use ml-report-kit in a k-fold cross-validation scenario, generating reports for each individual fold as well as for the entire dataset. We’ll use the 20 Newsgroups dataset, a popular text classification benchmark, to illustrate the process.

Install the following packages

pip install ml-report-kit
pip install scikit-learn

Run the code

import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

from ml_report_kit import MLReport

# load the full 20 Newsgroups dataset and prepare stratified 3-fold splits
dataset = fetch_20newsgroups(subset='all', shuffle=True, random_state=42)
k_folds = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
folds = {}

# materialize the train/test documents and targets for each fold
for fold_nr, (train_index, test_index) in enumerate(k_folds.split(dataset.data, dataset.target)):
    x_train, x_test = np.array(dataset.data)[train_index], np.array(dataset.data)[test_index]
    y_train, y_test = np.array(dataset.target)[train_index], np.array(dataset.target)[test_index]
    folds[fold_nr] = {"x_train": x_train, "x_test": x_test, "y_train": y_train, "y_test": y_test}

# accumulators for a single report over all folds combined
all_y_true_label = []
all_y_pred_label = []
all_y_pred_prob = []

for fold_nr in folds.keys():
    # train a TF-IDF + logistic regression pipeline on this fold's training split
    clf = Pipeline([('tfidf', TfidfVectorizer()), ('clf', LogisticRegression(class_weight='balanced'))])
    clf.fit(folds[fold_nr]["x_train"], folds[fold_nr]["y_train"])
    y_pred = clf.predict(folds[fold_nr]["x_test"])
    y_pred_prob = clf.predict_proba(folds[fold_nr]["x_test"])
    # map integer targets back to human-readable class names
    y_true_label = [dataset.target_names[sample] for sample in folds[fold_nr]["y_test"]]
    y_pred_label = [dataset.target_names[sample] for sample in y_pred]
    
    # accumulate the results for all folds to generate a report for the entire dataset
    all_y_true_label.extend(y_true_label)
    all_y_pred_label.extend(y_pred_label)
    all_y_pred_prob.extend(list(y_pred_prob))
    
    # generate the report for the current fold
    report = MLReport(y_true_label, y_pred_label, y_pred_prob, dataset.target_names)
    report.run(results_path="results", fold_nr=fold_nr)

# generate the report for the entire dataset
ml_report = MLReport(all_y_true_label, all_y_pred_label, all_y_pred_prob, dataset.target_names, y_id=None)
ml_report.run(results_path="results", final_report=True)

This code generates reports for each fold and for the entire dataset, saving them in the results folder. The reports include:

  • classification reports with precision, recall, and F1-score for each class
  • confusion matrices in both text and image formats
    • confusion_matrix.png
    • confusion_matrix.txt
  • the precision-recall curve for each fold and for the entire dataset, as both raw CSV values and images
    • precision_recall_threshold_.csv
    • precision_recall_threshold_.png
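
As a quick sanity check, the aggregate per-class numbers can be compared against scikit-learn's own classification report, computed from the same accumulated labels that fed the final report above:

from sklearn.metrics import classification_report

# per-class precision, recall, and F1 over all folds combined
print(classification_report(all_y_true_label, all_y_pred_label, digits=3))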

Where to get ML-Report-Kit