Oct 22, 2021

Aim’s foundations & why we’re building a TensorBoard alternative

Author: Gev Soghomonian

The origins of Aim

In the fateful summer of 2020, our friend mahnerak, a researcher at a non-profit lab, was hitting the limits of TensorBoard. He wasn’t going to send his training logs to a third-party cloud. Meanwhile, the hours he spent wrestling with TensorBoard kept him from focusing on his actual research. That’s how we decided to build a TensorBoard alternative.

Gor and I started hacking on an open-source library to store metrics and hyperparameters. Within a month, mahnerak was using Aim 1.0 instead of TensorBoard to track, store, search and group his metrics.

By fall 2020, Aim 2.0 launched as a free, open-source and self-hosted alternative to Weights and Biases, TensorBoard and MLflow. To our surprise, even r/MachineLearning loved it.

By spring 2021, mahnerak had co-authored the ACL-published paper WARP: Word-level Adversarial ReProgramming (code). By that point, Aim users had already contributed over 100 feature requests.

A scale problem

But Aim’s power users, who often do 5K+ runs, were hitting issues.

This came after over 250 pull requests, 1.2K GitHub stars and 200 feature requests: live updates, image tracking, distribution tracking… Aim 2.0 was hitting the limits of Aim 1.0’s design.

To support what comes next, we had to change the foundation now.

Launching Aim 3.0.0

An additional 317 pull requests later, we are excited to launch Aim v3.0.0!!!

The most important changes include:

A completely revamped UI

Home page and run detail page
Runs, metrics and params explorers
Bookmarks and Tags

A completely revamped Aim Python SDK

New and much more intuitive (but still quite vanilla) API to track your training runs
New, 10x faster embedded storage based on RocksDB. This will allow us to store virtually any type of AI metadata. By contrast, AimRecords, the previous storage, was designed for metrics and hyperparams only.

Enjoy the changes!

Performance improvements

Average run query execution time on ~2000 runs: 0.784s.
Average metrics query execution time on ~2000 runs with 6000 metrics: 1.552s.
The new UI runs smoothly with ~500 metrics displayed at the same time, with full Aim table interactions (for comparison, v2 stayed performant only up to 100 metrics).

Comparisons to familiar tools

TensorBoard

Training run comparison

Tracked params are first-class citizens in Aim, so you can search, group, and aggregate by params. Aim lets you deeply explore all the tracked data (metrics, params, images) in the UI.
With TensorBoard, users are forced to record those parameters in the training run name to be able to search and compare. This makes comparison super tedious and causes usability issues in the UI when there are many experiments and params. TensorBoard also doesn’t have features to group or aggregate metrics.
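
To make "search, group, and aggregate by params" concrete, here is a minimal sketch in plain Python. It does not use the Aim SDK or reflect how Aim stores data; the run records, the `final_loss` field, and the helper names `group_by_param` and `aggregate` are all hypothetical, chosen only to show why structured params are easier to work with than params encoded in a run name.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical run records: params are kept as structured data
# instead of being encoded into the run name.
runs = [
    {"name": "run-1", "params": {"lr": 0.01, "batch_size": 32}, "final_loss": 0.42},
    {"name": "run-2", "params": {"lr": 0.01, "batch_size": 64}, "final_loss": 0.38},
    {"name": "run-3", "params": {"lr": 0.001, "batch_size": 32}, "final_loss": 0.55},
    {"name": "run-4", "params": {"lr": 0.001, "batch_size": 64}, "final_loss": 0.51},
]


def group_by_param(runs, key):
    """Group runs by the value of one tracked param."""
    groups = defaultdict(list)
    for run in runs:
        groups[run["params"][key]].append(run)
    return dict(groups)


def aggregate(groups, metric):
    """Compute min, mean and max of a metric within each group."""
    summary = {}
    for value, members in groups.items():
        scores = [run[metric] for run in members]
        summary[value] = {"min": min(scores), "mean": mean(scores), "max": max(scores)}
    return summary


# Group by learning rate, then aggregate the final loss per group.
by_lr = aggregate(group_by_param(runs, "lr"), "final_loss")
for lr, stats in by_lr.items():
    print(lr, stats)
```

Switching the grouping to `batch_size` is a one-argument change, which is the kind of exploration that becomes painful when params only live in run names.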

Scalability

Aim can handle 1000s of training runs both on the backend and on the UI.
TensorBoard becomes really slow and hard to use when a few hundred training runs are queried / compared.

Aim will have the beloved TB visualizations

Embedding projector.
Neural network visualization.

MLflow

MLflow is an end-to-end ML lifecycle tool. Aim is focused on training tracking. The main differences between Aim and MLflow are in UI scalability and run comparison features.

Run comparison

Aim treats tracked parameters as first-class citizens. Users can query runs, metrics and images. Also, they can filter using the params.
MLflow does have search by tracked config. However, it has no features such as grouping, aggregation or subplotting by hyperparams.
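
Filtering by params can be pictured as applying a predicate over each run's tracked params. The snippet below is only an illustration in plain Python: the records and the `query` helper are hypothetical, and it shows neither Aim's nor MLflow's actual query syntax.

```python
# Hypothetical run records with structured params (not a real storage format).
runs = [
    {"name": "run-1", "params": {"lr": 0.01, "optimizer": "adam"}},
    {"name": "run-2", "params": {"lr": 0.001, "optimizer": "adam"}},
    {"name": "run-3", "params": {"lr": 0.1, "optimizer": "adam"}},
    {"name": "run-4", "params": {"lr": 0.01, "optimizer": "sgd"}},
]


def query(runs, predicate):
    """Return the runs whose params satisfy the given predicate."""
    return [run for run in runs if predicate(run["params"])]


# Select Adam runs with a learning rate of at least 0.01.
selected = query(runs, lambda p: p["optimizer"] == "adam" and p["lr"] >= 0.01)
names = [run["name"] for run in selected]
print(names)
```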

UI Scalability

The Aim UI can smoothly handle several thousand metrics at the same time, each with 1000s of steps. It may get shaky when you explore 1000s of metrics with 10000s of steps each. But we are constantly optimizing!
The MLflow UI becomes slow to use when there are a few hundred runs.

Weights and Biases

Hosted vs self-hosted

Weights and Biases is a hosted closed-source MLOps platform.
Aim is a self-hosted, free and open-source experiment tracking tool.

Aim Roadmap

With this version we are also publishing the Aim roadmap for the next 3 months.

This is a living document and we hope that the community will help us shape it towards supporting the most important use-cases.

We are also inviting community contributors to help us get there faster!

Why are we building Aim?

We started working on Aim with a strong belief that open source is in the DNA of AI software (2.0) development.

Existing open-source tools (TensorBoard, MLflow) are super-inspiring for us.

However, we see lots of improvements to be made, especially around issues like:

ability to handle 1000s of large-scale experiments
actionable, beautiful and performant visualizations
extensibility: how easy are the APIs for extension/democratization?

With this in mind, we are inspired to build beautiful and performant AI dev tools with great APIs.

Our mission…

Aim’s mission is to democratize AI dev tools. We believe that the best AI tools need to:

be open-source, open-data-format and community-driven
have great UI/UX, CLI and other interfaces for automation
be performant both on UI and data
be extensible, enabling ways to build around them for many use-cases

Thanks to

Ruben Karapetyan for being the first to believe in this project, spending lots of his time on it and setting the foundations for the beautiful UI.

Mahnerak for sharing his problems, continuously testing and coming up with better solutions for UX and features, and also for helping us build the next-gen storage for Aim.

Aim users Mohammad Elgaar and Vopani for continuous feedback on our work.

The contributors who have been relentlessly iterating over the course of the summer.

On to the next generation of ML tools!!

Join Us!

Join the Aim community, test Aim out, ask questions and help us build the future of AI tooling!

If you find Aim useful, drop by and star the repo ⭐