Aim basics: using context and subplots to compare validation and test metrics

The validation and test metrics are used to compare the models on additional unseen data to verify how well they generalise.

Comparing validation and test metrics is a crucial step in ML experiments. ML researchers divide datasets into three subsets (train, validation and test) so they can test model performance at different stages.
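As a rough illustration of the three-way split, here is a minimal sketch that uses only the Python standard library. The function name split_dataset, the 10% fractions and the fixed seed are assumptions made for this example; they are not prescribed by Aim or by any particular framework.

```python
import random


def split_dataset(samples, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle samples and split them into train, validation and test lists."""
    items = list(samples)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(items)
    n_val = int(len(items) * val_fraction)
    n_test = int(len(items) * test_fraction)
    val = items[:n_val]
    test = items[n_val:n_val + n_test]
    train = items[n_val + n_test:]
    return train, val, test


# Hypothetical dataset: 100 integer sample ids.
train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

The three lists do not overlap, so the validation and test samples stay unseen during training.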

You train the model on the train subset and compute loss, accuracy and other metrics along the way to evaluate how well the training is going.

You can use the validation and test sets to test the model on additional unseen data to verify how well it generalises.

Models are usually run on the validation subset after each epoch. After training, models are tested on the test subset to verify final performance and generalisation. All of these metrics need to be collected and compared effectively.

Here is how to do that with Aim.

Using context to track metrics for different subsets

Use the aim.track context arguments to pass additional information about the metrics. You can use all context parameters to query, group and do other operations on top of the metrics.

import aim

# train loop
for epoch in range(num_epochs):
  for i, (images, labels) in enumerate(train_loader):
    # ... compute loss and acc for the current batch
    if i % 30 == 0:
      # track train metrics every 30 steps, tagged with subset='train'
      aim.track(loss.item(), name='loss', epoch=epoch, subset='train')
      aim.track(acc.item(), name='accuracy', epoch=epoch, subset='train')
  # calculate validation metrics at the end of each epoch
  # ...
  aim.track(loss.item(), name='loss', epoch=epoch, subset='val')
  aim.track(acc.item(), name='accuracy', epoch=epoch, subset='val')

# ...
# calculate test metrics once training has finished
# ...
aim.track(loss.item(), name='loss', subset='test')
aim.track(acc.item(), name='accuracy', subset='test')
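To make the idea of context more concrete, the sketch below mimics it with a tiny in-memory tracker. It is a hypothetical stand-in and not the Aim API: the track and select functions and the series dictionary are invented for illustration only. It shows how a metric name combined with context key-value pairs can identify separate series that are later filtered by context.

```python
from collections import defaultdict

# Hypothetical in-memory store; this is NOT how Aim stores metrics.
series = defaultdict(list)


def track(value, name, epoch=None, **context):
    """Append a value to the series identified by metric name plus context."""
    key = (name, tuple(sorted(context.items())))
    series[key].append((epoch, value))


def select(name, **where):
    """Return every series of a metric whose context matches all given pairs."""
    return {
        key: values
        for key, values in series.items()
        if key[0] == name
        and all(dict(key[1]).get(k) == v for k, v in where.items())
    }


track(0.9, name='loss', epoch=0, subset='train')
track(0.6, name='loss', epoch=1, subset='train')
track(0.8, name='loss', epoch=0, subset='val')
track(0.7, name='loss', epoch=1, subset='val')
track(0.75, name='loss', subset='test')
track(21.4, name='bleu', epoch=1, subset='val')

all_loss = select('loss')                # three series: train, val, test
val_loss = select('loss', subset='val')  # only the validation series
```

The same metric name yields one series per subset, which is what makes it possible to query and group by subset later.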

Once the training has run, execute aim up in your terminal to start the Aim UI.

Using subplots to compare test, val loss and bleu metrics

Note: We’ve used the bleu metric here instead of accuracy because we are looking at Neural Machine Translation experiments. But this works with every other metric too.

Let’s go step-by-step on how to break down lots of experiments using subplots.

Step 1. Explore the runs and the context table, play with the query language.

Explore the training runs

Step 2. Add the bleu metric to the Select input to query both metrics at the same time. Divide into subplots by metric.

Divide into subplots by metric

Step 3. Search by context.subset to show the loss and bleu metrics for both test and val. Then divide into subplots by context.subset as well, so the Aim UI shows test and val metrics on separate subplots for easier comparison.

Divide into subplots by context / subset
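The effect of dividing into subplots can be sketched in plain Python as a grouping operation. This is only an illustration of the idea, not how the Aim UI is implemented; the records, the run labels and the divide_into_subplots helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical tracked series: each record names a metric, a subset and a run.
records = [
    {'metric': 'loss', 'subset': 'val', 'run': 'run_a'},
    {'metric': 'loss', 'subset': 'test', 'run': 'run_a'},
    {'metric': 'bleu', 'subset': 'val', 'run': 'run_a'},
    {'metric': 'bleu', 'subset': 'test', 'run': 'run_a'},
    {'metric': 'loss', 'subset': 'val', 'run': 'run_b'},
    {'metric': 'loss', 'subset': 'test', 'run': 'run_b'},
    {'metric': 'bleu', 'subset': 'val', 'run': 'run_b'},
    {'metric': 'bleu', 'subset': 'test', 'run': 'run_b'},
]


def divide_into_subplots(items, by):
    """Group records into subplots keyed by the values of the given fields."""
    subplots = defaultdict(list)
    for item in items:
        key = tuple(item[field] for field in by)
        subplots[key].append(item['run'])
    return dict(subplots)


# Dividing by metric alone gives one subplot per metric.
by_metric = divide_into_subplots(records, by=('metric',))

# Dividing by metric and subset gives one subplot per (metric, subset) pair.
by_metric_and_subset = divide_into_subplots(records, by=('metric', 'subset'))

for key, runs in by_metric_and_subset.items():
    print(key, runs)
```

With two metrics and two subsets this produces four subplots, each containing one line per run.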

Now it’s easy and straightforward to compare all 4 metrics simultaneously and find the best version of the model.


Here is a full summary video of how to compare the validation and test metrics on the UI.

Learn More

If you find Aim useful, support us and star the project on GitHub. Join the Aim community and share more about your use-cases and how we can improve Aim to suit them.
