Think Less, Save More: Reducing Energy Footprint with Modular AI
- Istvan Benedek
- Mar 31
- 3 min read
Updated: Apr 1

In machine learning — and in human reasoning — we often make the basic but critical mistake of expecting a single model, a single pass, or a single act of inference to solve a deeply complex problem. This expectation is not only unrealistic but also computationally inefficient. It’s a design error rooted in a misunderstanding of how complex solutions should be built.
In the world of AI, computational demands are growing by the day — often by the hour. That’s why it truly matters how much energy we consume when training deep learning models. In my own development work, my desktop machine runs continuously, consuming roughly 14 kWh of electricity per day, which adds up to about 5 MWh per year. Ladies and Gents, that's a lot! With that level of energy usage, the quality and significance of the results I produce really have to count.
Monolithic Models = Wasted Time & Energy
When facing a high-dimensional problem, it’s tempting to throw everything into one massive neural network and hope it will “figure it out.” There are plenty of solutions out there where the final delivery contains nothing but one single, big, and (let's admit it) insufficient model.
But real AI development doesn’t work that way. Just like we don’t go from raw sensation to philosophical insight in a single cognitive leap, we shouldn’t expect our models to leap from input to understanding in one step.
Complex cognition emerges modularly, in layers of processing. The output of one model is often the starting point of another.
These intermediate representations are not waste — on the contrary, they are crucial assets, rich with structure, insight, and future reuse.
The Power of Reusability
Think of model outputs as thoughts, and checkpoints as mental waypoints. Once you’ve computed a useful feature representation (e.g., an embedding), why recompute it every time? Instead, cache it. Save it. Reuse it. This isn’t just a performance trick — it’s a fundamental architectural principle that saves time, compute, and energy.
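To make this concrete, here is a minimal sketch of that cache-or-compute pattern in PyTorch. The backbone, dataloader, and cache path below are placeholders I made up for illustration, not part of my actual setup.

```python
import os
import torch

def get_embeddings(backbone, dataloader, cache_path="embeddings.pt", device="cpu"):
    """Compute embeddings once with a frozen backbone and cache them to disk."""
    if os.path.exists(cache_path):
        # Reuse the stored "thought" instead of recomputing it every run.
        return torch.load(cache_path)

    backbone.eval()
    chunks = []
    with torch.no_grad():  # pure feature extraction, no gradients needed
        for images, _ in dataloader:
            chunks.append(backbone(images.to(device)).cpu())

    embeddings = torch.cat(chunks)
    torch.save(embeddings, cache_path)  # checkpoint the intermediate representation
    return embeddings
```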
Checkpointing the output of a model allows for:
Parallelism — multiple downstream tasks can operate independently
Specialization — you can train small, task-specific heads efficiently
Memory-based reasoning — revisit high-level abstractions without low-level reprocessing
In short, reusability beats reprocessing — and it shrinks your computational and electrical footprint.
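As a rough illustration of the specialization and parallelism points above: once embeddings are cached, small task-specific heads can be trained independently on top of them, without ever running the backbone again. Everything in this sketch (dimensions, labels, the two tasks) is invented for the example.

```python
import torch
import torch.nn as nn

# Placeholder data: in practice these would be the cached embeddings from above.
num_samples, emb_dim, num_classes = 1000, 512, 10
embeddings = torch.randn(num_samples, emb_dim)            # reused, not recomputed
class_labels = torch.randint(0, num_classes, (num_samples,))
reg_targets = torch.rand(num_samples, 1)

# Two independent, task-specific heads trained on the same frozen features.
heads = {
    "classification": (nn.Linear(emb_dim, num_classes), nn.CrossEntropyLoss(), class_labels),
    "regression": (nn.Linear(emb_dim, 1), nn.MSELoss(), reg_targets),
}

for name, (head, loss_fn, targets) in heads.items():
    optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
    for _ in range(5):  # a few quick passes; the backbone never runs again
        optimizer.zero_grad()
        loss = loss_fn(head(embeddings), targets)
        loss.backward()
        optimizer.step()
    print(f"{name} head trained on cached embeddings, loss {loss.item():.4f}")
```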
I'm starting to think that every time we freeze a model, it's a strong indicator of where modularity is naturally needed.
Think Like a Thinker
Human thought isn’t monolithic either. We don’t discard every conclusion after we reach it. At least, I certainly don’t. :)
We remember. We reference. We build new reasoning paths from old conclusions. So why should machine learning do any differently?
We need to save our models’ thoughts from time to time. And if we are smart about where we place our checkpoints, we get to rediscover the old principle of divide et impera. In the end, complex reasoning becomes a matter of reassembly.
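One way to read “saving thoughts” in code: checkpoint each stage of a pipeline at its natural boundary, so a later stage (or a parallel experiment) can pick up from there. The tiny feature extractor and file name below are made up for the sketch.

```python
import torch
import torch.nn as nn

# Hypothetical stage one: a small feature extractor whose conclusions we keep.
feature_extractor = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

# ... train stage one ...

# Save the model's "thought" at the modular boundary.
torch.save(feature_extractor.state_dict(), "stage1_features.ckpt")

# Later, or in another process: reload the conclusion and build on it,
# instead of re-deriving it from raw pixels.
feature_extractor.load_state_dict(torch.load("stage1_features.ckpt"))
downstream_head = nn.Linear(16, 1)  # only this part still needs training
```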
Motivation for This Article
I’ve been working on a complex problem for a while: transforming an existing classification model into a regression model with continuous float outputs, while satisfying additional constraints such as monotonicity and interpolative behavior.
After carefully working through the architecture, I was ready to swap out the classification head for a regression head — freezing the convolutional layers and training only the interpolation component.
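In code, that plan looks roughly like the sketch below. I’m using a stock torchvision ResNet as a stand-in for my actual architecture, and the regression head is just an illustration; the monotonicity and interpolation constraints would live in the head design or the loss, which is not shown here.

```python
import torch
import torch.nn as nn
from torchvision import models

# Stand-in backbone: a pretrained classifier whose convolutional layers we reuse.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the convolutional backbone so only the new head will be trained.
for param in model.parameters():
    param.requires_grad = False

# Swap the classification head for a small regression head with one float output.
model.fc = nn.Sequential(
    nn.Linear(model.fc.in_features, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
)

# Only the head's freshly created, trainable parameters go to the optimizer.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```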
That was the moment my steady stream of happiness vanished. One epoch with 250,000 training images and 50,000 validation images takes around 15 minutes.
That’s when it hit me: there’s a much smarter way to move forward from here…