
Think Less, Save More: Reducing Energy Footprint with Modular AI

  • Writer: Istvan Benedek
  • Mar 31
  • 3 min read

Updated: Apr 1




Generated by ChatGPT 4o

In machine learning — and in human reasoning — we often make the basic but critical mistake of expecting a single model, a single pass, or a single act of inference to solve a deeply complex problem. This expectation is not only unrealistic but also computationally inefficient. It’s a design error rooted in a misunderstanding of how complex solutions should be built.


In the world of AI, computational demands are growing by the day — often by the hour. That’s why it truly matters how much energy we consume when training deep learning models. In my own development work, my desktop machine runs continuously, consuming roughly 14 kWh of electricity per day, which adds up to about 5 MWh per year. Ladies and gents, that's a lot! With that level of energy usage, the quality and significance of the results I produce really have to count.


Monolithic Models = Wasted Time & Energy


When facing a high-dimensional problem, it’s tempting to throw everything into one massive neural network and hope it will “figure it out.” There are plenty of solutions out there where the final delivery contains just one single, big and, let's admit it, insufficient model.


But real AI development doesn’t work that way. Just like we don’t go from raw sensation to philosophical insight in a single cognitive leap, we shouldn’t expect our models to leap from input to understanding in one step.


Complex cognition emerges modularly, in layers of processing. The output of one model is often the starting point of another.


These intermediate representations are not waste — on the contrary, they are crucial assets, rich with structure, insight, and future reuse.



The Power of Reusability


Think of model outputs as thoughts, and checkpoints as mental waypoints. Once you’ve computed a useful feature representation (e.g., an embedding), why recompute it every time? Instead, cache it. Save it. Reuse it. This isn’t just a performance trick — it’s a fundamental architectural principle that saves time, energy, and electricity.

Checkpointing the output of a model allows for:

  • Parallelism — multiple downstream tasks can operate independently

  • Specialization — you can train small, task-specific heads efficiently

  • Memory-based reasoning — revisit high-level abstractions without low-level reprocessing

In short, reusability beats reprocessing — and it shrinks your computational and electrical footprint.
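
To make this concrete, here is a minimal PyTorch-flavoured sketch of the caching idea. The backbone, data loader and file name are placeholders rather than my actual pipeline: compute the embeddings once with a frozen backbone, save them to disk, and let any number of small, task-specific heads train straight from that cache.

```python
import torch
from torch import nn

# One-time pass: compute embeddings with a frozen backbone and cache them to disk.
# `backbone`, `loader`, and the file name are illustrative placeholders.
@torch.no_grad()
def cache_embeddings(backbone, loader, path="embeddings.pt"):
    backbone.eval()
    feats, labels = [], []
    for x, y in loader:
        feats.append(backbone(x))   # the "thought" we want to keep
        labels.append(y)
    torch.save((torch.cat(feats), torch.cat(labels)), path)

# Later, and for every downstream task: reuse the cache, never the backbone.
# Full-batch training here, purely for brevity.
def train_head(path="embeddings.pt", epochs=5, lr=1e-3):
    feats, labels = torch.load(path)
    head = nn.Linear(feats.shape[1], 1)            # small, task-specific head
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(head(feats).squeeze(), labels.float())
        loss.backward()
        opt.step()
    return head
```

The expensive pass happens exactly once; every downstream head, experiment or ablation afterwards pays only the cost of a tiny linear layer.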


I'm starting to think that every time we freeze a model, it's a strong indicator of where modularity is naturally needed.


Think Like a Thinker


Human thought isn’t monolithic either. We don’t discard every conclusion after we reach it. At least, I certainly don't. :)


We remember. We reference. We build new reasoning paths from old conclusions. So why should machine learning do any differently?


We need to save our models’ thoughts from time to time. And if we are smart about where we place our checkpoints, we get the chance to rediscover the principle of divide et impera. In the end, complex reasoning becomes a matter of reassembly.


Motivation of this Article


I’ve been working on a complex problem for a while: transforming an existing classification model into a regression model with continuous float outputs, while satisfying additional constraints such as monotonicity and interpolative behavior.


After carefully working through the architecture, I was ready to swap out the classification head for a regression head — freezing the convolutional layers and training only the interpolation component.
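
For readers who want to see what that swap looks like in practice, here is a hedged PyTorch sketch of the general pattern, using a stock ResNet-18 purely as a stand-in for my model: freeze the backbone, replace the classification head with a single-output regression head, and hand only the new head's parameters to the optimizer.

```python
import torch
from torch import nn
from torchvision import models

# Illustrative stand-in backbone; in my case this is my own trained classifier.
model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze every backbone parameter...
for param in model.parameters():
    param.requires_grad = False

# ...then replace the final classifier with a single-output regression head.
# The freshly created layer has requires_grad=True, so only it will learn.
model.fc = nn.Linear(model.fc.in_features, 1)

# Only the new head's parameters reach the optimizer.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
loss_fn = nn.MSELoss()
```

With the backbone frozen, each training step only updates the small head, which is exactly where checkpointed, reusable representations should pay off.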


That was the moment my steady stream of happiness vanished. One epoch with 250,000 training images and 50,000 validation images takes around 15 minutes.


That’s when it hit me: there’s a much smarter way to move forward from here…



