
Lean, Mean, Green Machines: Optimizing AI for Energy Efficiency


As AI technology advances, the industry must meet a rapidly growing demand for the electricity and water needed to run the servers powering this innovation. A standard DGX system, the gold standard for AI workloads, draws more than 10 kW of power. Big Tech is expected to purchase millions of these systems this year, with a combined power draw surpassing that of New York City. With this comes the responsibility to find sustainable ways to manage energy consumption, and researchers and engineers are already working on creative solutions to mitigate the environmental impact.

But the electricity needed to operate these computers is only part of the story. They generate a significant amount of heat and therefore require cooling, and removing that heat can consume roughly twice as much power as the computer itself. That 10 kW machine is therefore effectively drawing around 30 kW while in operation. These new servers will consume three times more electricity than the entire state of California used in 2022. To address this, server farms are exploring alternative cooling methods, such as water cooling, to reduce electricity consumption. While this approach shifts the resource burden, it also opens up opportunities to develop more efficient and environmentally friendly cooling technologies.
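To make those figures concrete, here is a quick back-of-the-envelope calculation using the numbers quoted above (the cooling multiplier is the rough figure cited in this article, not a measured value):

```python
# Rough estimate of effective power per AI server, using the figures above.
compute_power_kw = 10                        # a DGX-class system draws roughly 10 kW
cooling_power_kw = 2 * compute_power_kw      # cooling said to consume ~2x the compute power
effective_power_kw = compute_power_kw + cooling_power_kw
print(f"Effective draw per server: {effective_power_kw} kW")   # -> 30 kW
```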

The trade-off, however, is that this spends precious fresh water in order to save on electricity costs.

AI's power consumption is a growing concern and will only worsen. Is there a way to address this issue? Fortunately, researchers have already begun exploring more efficient ways to build and use AI. Model reuse, ReLoRA, MoE, and quantization are all promising techniques that could help.

With model reuse, existing models are retrained for new tasks rather than trained from scratch, saving time, energy, and resources while often improving performance. Meta and Mistral have both released reusable models and are leaders in this area.
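As a minimal sketch of the general idea (not the specific pipeline any of these companies use), the snippet below reuses a pretrained vision model and retrains only a small task-specific head; the 5-class task and the dummy batch are placeholders:

```python
# Sketch: model reuse (transfer learning). Instead of training from scratch,
# keep the pretrained backbone and retrain only a small task-specific head,
# which needs far less compute and energy.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # reuse pretrained weights
for p in model.parameters():
    p.requires_grad = False                      # freeze the expensive-to-train backbone
model.fc = nn.Linear(model.fc.in_features, 5)    # new head for a hypothetical 5-class task

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)       # only the head is updated
x, y = torch.randn(8, 3, 224, 224), torch.randint(0, 5, (8,))      # dummy batch for illustration
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
```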

LoRA and ReLoRA dramatically reduce the number of calculations required when retraining a model for a new purpose. This not only saves energy but also allows smaller, less power-hungry computers to be used. Instead of depending on high-energy systems such as NVIDIA's DGX, a single consumer graphics card is often sufficient for the retraining process.
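For illustration, here is a hedged sketch of a LoRA setup with the Hugging Face peft library (GPT-2 and the hyperparameters are placeholder choices; ReLoRA extends the same idea by periodically merging the adapters back into the base weights):

```python
# Sketch: LoRA adapters with the peft library. Only the small low-rank adapter
# matrices are trainable; the base model stays frozen, so retraining fits
# comfortably on a single consumer GPU.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")        # placeholder base model
config = LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16, lora_dropout=0.05,
                    target_modules=["c_attn"])             # GPT-2's attention projection
model = get_peft_model(base, config)
model.print_trainable_parameters()   # typically well under 1% of all parameters
```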

Mistral's recently released MoE (Mixture of Experts) models activate far fewer parameters per request than traditional dense models of comparable capability. Fewer active parameters mean fewer calculations and less energy consumption: the model only switches on the expert blocks it needs, much like turning off the lights in unused rooms. The result is a remarkable reduction in energy usage of around 65%.
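The toy layer below sketches the routing idea (it is illustrative only and far simpler than Mixtral's actual architecture): each input is sent to just the top two of eight experts, so the remaining experts do no work at all.

```python
# Toy sketch of Mixture-of-Experts routing: the layer holds many expert
# sub-networks, but each input is processed by only the top-k of them,
# so most parameters (and their compute) stay idle.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, dim=32, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.router = nn.Linear(dim, num_experts)
        self.top_k = top_k

    def forward(self, x):                        # x: (batch, dim)
        scores = self.router(x)                  # routing score per expert
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for sample in range(x.size(0)):          # only the chosen experts run
            for w, e in zip(weights[sample], idx[sample]):
                out[sample] += w * self.experts[int(e)](x[sample])
        return out

moe = TinyMoE()
print(moe(torch.randn(4, 32)).shape)             # torch.Size([4, 32])
```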

Quantization is a method that shrinks AI models without significantly affecting their performance. By quantizing a model, the number of bits used to represent each parameter is reduced, yielding a smaller model that can run on less powerful, more energy-efficient hardware. For example, a 40 billion parameter model would typically need a power-hungry GPU system such as a DGX to run effectively; after quantization, the same model can run on a low-power consumer GPU, such as those commonly found in laptops. Quantization may cause a slight drop in accuracy in some scenarios, but for many practical purposes the compromise is minimal or hardly noticeable: performance remains excellent while demanding only a fraction of the computing resources.
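A minimal numeric sketch of the idea is shown below; real toolchains (for example 8-bit or 4-bit model loading in common ML libraries) are more sophisticated, but the principle of trading a little precision for a much smaller footprint is the same:

```python
# Toy sketch of 8-bit quantization: map float32 weights to int8 plus one scale
# factor, shrinking storage 4x while reconstructing values with small error.
import numpy as np

weights = np.random.randn(1000).astype(np.float32)          # pretend model weights
scale = np.abs(weights).max() / 127                          # one scale per tensor
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
dequant = q.astype(np.float32) * scale

print("bytes:", weights.nbytes, "->", q.nbytes)              # 4000 -> 1000
print("max abs error:", np.abs(weights - dequant).max())
```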

In general, quantization makes AI models more efficient, compact, and environmentally friendly by reducing hardware requirements and energy consumption. It enables cutting-edge AI to run on everyday consumer devices without sacrificing accuracy where it matters, making it a key step towards scalable and sustainable AI.

As an example of what is possible, at smartR AI we repurposed a 47 billion parameter MoE model using these four methods, retraining it for a client on a server that draws less than 1 kW of power and finishing the process in only 10 hours. The client can now run the model on ordinary Apple Mac computers equipped with energy-efficient M2 silicon chips. We have also been privileged to use the supercomputer at EPCC, Edinburgh University, which cut training times substantially: we trained a model from scratch in about an hour.

As AI becomes more prevalent, we all need to think more proactively about its energy and water usage. Research into more efficient training and utilization methods is yielding promising results, but we also need to put these methods into practice: by integrating these techniques into our tool flows, we not only benefit our clients but also contribute to a more sustainable future for our planet.

External

This content is provided by an external author without editing by Finextra. It expresses the views and opinions of the author.
