
Lean, Mean, Green Machines: Optimizing AI for Energy Efficiency


As AI technology advances, the industry must meet a rapidly growing demand for the electricity and water needed to run the servers powering this innovation. A standard DGX system, the gold standard for AI workloads, draws more than 10 kW of power. Big Tech is expected to purchase millions of these systems this year, with a combined power draw surpassing that of New York City. With this comes the responsibility to find sustainable ways to manage energy consumption, and researchers and engineers are already working on creative solutions to mitigate the environmental impact.

But the electricity needed to operate these computers is only part of the story. They generate a significant amount of heat and therefore require cooling, and removing that heat can consume roughly twice as much power as the computer itself. That 10 kW machine is therefore effectively drawing around 30 kW while in operation. These new servers will consume three times more electricity than the entire state of California used in 2022. To address this, server farms are exploring alternative cooling methods, such as water cooling, to reduce electricity consumption. While this approach shifts the resource burden, it also opens up opportunities to develop more efficient and environmentally friendly cooling technologies.
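To make those figures concrete, here is a quick back-of-the-envelope calculation using the numbers quoted above (the cooling multiplier is the rough figure cited in this article, not a measured value):

```python
# Rough estimate of effective power per AI server, using the figures above.
compute_power_kw = 10                        # a DGX-class system draws roughly 10 kW
cooling_power_kw = 2 * compute_power_kw      # cooling said to consume ~2x the compute power
effective_power_kw = compute_power_kw + cooling_power_kw
print(f"Effective draw per server: {effective_power_kw} kW")   # -> 30 kW
```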

The trade-off, however, is that this spends precious fresh water in order to save on electricity costs.

AI's power consumption is a growing concern and will only worsen. Is there a way to address this issue? Fortunately, researchers have already begun exploring more efficient ways to build and use AI. Model reuse, ReLoRA, MoE, and quantization are all promising techniques that could help.

With model reuse, existing models are retrained for new tasks rather than trained from scratch, saving time, energy, and resources while often improving performance. Meta and Mistral have both released reusable models and are leaders in this area.
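As a minimal sketch of the general idea (not the specific pipeline any of these companies use), the snippet below reuses a pretrained vision model and retrains only a small task-specific head; the 5-class task and the dummy batch are placeholders:

```python
# Sketch: model reuse (transfer learning). Instead of training from scratch,
# keep the pretrained backbone and retrain only a small task-specific head,
# which needs far less compute and energy.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # reuse pretrained weights
for p in model.parameters():
    p.requires_grad = False                      # freeze the expensive-to-train backbone
model.fc = nn.Linear(model.fc.in_features, 5)    # new head for a hypothetical 5-class task

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)       # only the head is updated
x, y = torch.randn(8, 3, 224, 224), torch.randint(0, 5, (8,))      # dummy batch for illustration
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
```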

LoRA and ReLoRA dramatically reduce the number of calculations required when retraining a model for a new purpose. This not only saves energy but also allows smaller, less power-hungry computers to be used. Instead of depending on high-energy systems such as NVIDIA's DGX, a single consumer graphics card is often sufficient for the retraining process.
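For illustration, here is a hedged sketch of a LoRA setup with the Hugging Face peft library (GPT-2 and the hyperparameters are placeholder choices; ReLoRA extends the same idea by periodically merging the adapters back into the base weights):

```python
# Sketch: LoRA adapters with the peft library. Only the small low-rank adapter
# matrices are trainable; the base model stays frozen, so retraining fits
# comfortably on a single consumer GPU.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")        # placeholder base model
config = LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16, lora_dropout=0.05,
                    target_modules=["c_attn"])             # GPT-2's attention projection
model = get_peft_model(base, config)
model.print_trainable_parameters()   # typically well under 1% of all parameters
```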

Mistral's recently released MoE (Mixture of Experts) models activate far fewer parameters per request than traditional dense models of comparable capability. Fewer active parameters mean fewer calculations and less energy consumption: the model only switches on the expert blocks it needs, much like turning off the lights in unused rooms. The result is a remarkable reduction in energy usage of around 65%.
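The toy layer below sketches the routing idea (it is illustrative only and far simpler than Mixtral's actual architecture): each input is sent to just the top two of eight experts, so the remaining experts do no work at all.

```python
# Toy sketch of Mixture-of-Experts routing: the layer holds many expert
# sub-networks, but each input is processed by only the top-k of them,
# so most parameters (and their compute) stay idle.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, dim=32, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.router = nn.Linear(dim, num_experts)
        self.top_k = top_k

    def forward(self, x):                        # x: (batch, dim)
        scores = self.router(x)                  # routing score per expert
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for sample in range(x.size(0)):          # only the chosen experts run
            for w, e in zip(weights[sample], idx[sample]):
                out[sample] += w * self.experts[int(e)](x[sample])
        return out

moe = TinyMoE()
print(moe(torch.randn(4, 32)).shape)             # torch.Size([4, 32])
```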

Quantization is a method that shrinks AI models without significantly affecting their performance. By quantizing a model, the number of bits used to represent each parameter is reduced, yielding a smaller model that can run on less powerful, more energy-efficient hardware. For example, a 40 billion parameter model would typically need a power-hungry GPU system such as a DGX to run effectively; after quantization, the same model can run on a low-power consumer GPU, such as those commonly found in laptops. Quantization may cause a slight drop in accuracy in some scenarios, but for many practical purposes the compromise is minimal or hardly noticeable: performance remains excellent while demanding only a fraction of the computing resources.
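A minimal numeric sketch of the idea is shown below; real toolchains (for example 8-bit or 4-bit model loading in common ML libraries) are more sophisticated, but the principle of trading a little precision for a much smaller footprint is the same:

```python
# Toy sketch of 8-bit quantization: map float32 weights to int8 plus one scale
# factor, shrinking storage 4x while reconstructing values with small error.
import numpy as np

weights = np.random.randn(1000).astype(np.float32)          # pretend model weights
scale = np.abs(weights).max() / 127                          # one scale per tensor
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
dequant = q.astype(np.float32) * scale

print("bytes:", weights.nbytes, "->", q.nbytes)              # 4000 -> 1000
print("max abs error:", np.abs(weights - dequant).max())
```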

In general, quantization makes AI models more efficient, compact, and environmentally friendly by reducing hardware requirements and energy consumption. It enables cutting-edge AI to run on everyday consumer devices without sacrificing accuracy where it matters, making it a key step towards scalable and sustainable AI.

As an example of what is possible, at smartR AI we repurposed a 47 billion parameter MoE model using these four methods, retraining it for a client on a server that draws less than 1 kW of power and finishing the process in only 10 hours. The client can now run the model on ordinary Apple Mac computers equipped with energy-efficient M2 silicon chips. We have also been privileged to use the supercomputer at EPCC, Edinburgh University, which cut training times substantially: we trained a model from scratch in about an hour.

As AI becomes more prevalent, we all need to think more proactively about its energy and water usage. Research into more efficient training and utilization methods is yielding promising results, but we also need to put these methods into practice: by integrating these techniques into our tool flows, we not only benefit our clients but also contribute to a more sustainable future for our planet.

External

This content is provided by an external author without editing by Finextra. It expresses the views and opinions of the author.
