Efficiency in models, expansion in infrastructure: the paradox that is reshaping the logic of artificial intelligence.

By Alexandre Silveira Pupo

Over the past decade, we have witnessed a revolution in the development of Large Language Models (LLMs). While researchers and users are increasingly impressed by the emergence of ever-larger models with hundreds of billions of parameters, another facet of this reality points in the opposite direction: efficiency-focused approaches are generating results as relevant as, or even more relevant than, simply increasing model size. This phenomenon, in which less does not necessarily mean less capacity to generate results, signals a potential transformation in how the pillars of current Artificial Intelligence (AI) systems are conceived and developed.

The first pillar supporting reductions in the size of LLMs is quantization, an approach that transforms the representation of the neural network parameters that make up an LLM. Traditionally, the weights of a neural network are stored as 16-bit or 32-bit floating-point numbers. Quantization reduces the number of bits used to represent these numbers without a significant loss of precision in the network's outputs. In this way, the size of the model's representation can be reduced without a perceptible drop in the quality of the results generated by the LLM, as recent research on the subject shows [1, 2, 3]. The result is a reduction in the computational resources required and an increase in model performance, allowing LLMs to run on mobile devices and on low-power devices such as those of the Internet of Things (IoT).
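
As a minimal sketch of the idea (written in NumPy purely for illustration; production quantization schemes are more sophisticated, with per-channel scales and calibration, and the function names below are not from any specific library), the snippet maps a 32-bit floating-point weight matrix to 8-bit integers plus a single scale factor, cutting storage to roughly a quarter while keeping the reconstructed values close to the originals:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization of float32 weights to int8."""
    scale = np.abs(weights).max() / 127.0  # map the largest magnitude to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor for computation."""
    return q.astype(np.float32) * scale

# Toy example: a small weight matrix stored in 32-bit floats
w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_approx = dequantize(q, scale)
print("max absolute error:", np.abs(w - w_approx).max())
```

Running the sketch shows a maximum reconstruction error that is a small fraction of the weights' typical magnitude, which is the intuition behind why 8-bit (and often even 4-bit) quantization tends to preserve output quality.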

The second pillar is knowledge distillation, a process in which knowledge is transferred from a larger model (the "teacher") to a smaller model (the "student"). Distillation effectively shortens the smaller model's path to acquiring knowledge. This is achieved by using the larger model to generate, for example, synthetic training sets for the smaller one, whether by categorizing data in advance or by producing reference responses. One example was an experiment conducted by Google, in which a model with 770 million parameters achieved performance similar to that of a model with more than 500 billion parameters [4, 5].
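
The Google experiment relies on the teacher generating training data and rationales for the student; the classic formulation of distillation, due to Hinton and colleagues, instead trains the student to match the teacher's softened output distribution. The sketch below shows that classic loss in PyTorch; the temperature T and mixing weight alpha are illustrative hyperparameters, not values from the cited work:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Classic soft-label distillation: mix the hard-label loss with a
    KL term pushing the student toward the teacher's softened outputs."""
    # Hard-label term: ordinary cross-entropy against ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    # Soft-label term: KL divergence between temperature-softened distributions
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients are comparable to the hard term
    return alpha * hard + (1 - alpha) * soft

# Toy example: batch of 8 samples, 10-class output
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```

Either way, the student learns from a richer signal than hard labels alone, which is what allows a much smaller model to approach the teacher's behavior on the target tasks.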

The adoption of quantization and knowledge distillation, along with other techniques, can reduce the size of LLMs without a corresponding reduction in the quality of the results. This trend has been studied, and researchers have already proposed a kind of law on the subject, called the "law of densification" [6], in a vein similar to what Gordon Moore proposed [7] regarding the increase in component density in electronic circuits over time. In the case of LLMs, the researchers showed that the ratio between model capability and model size doubles approximately every 100 days, pointing to a path of exponential efficiency gains that, if maintained, could open an alternative route for the development of model-based AI technologies.
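
As a back-of-the-envelope illustration of what a 100-day doubling implies (assuming, for simplicity, that the trend is perfectly exponential, which the cited research does not guarantee), the snippet below computes the relative "capability density" after a given number of days:

```python
def capability_density(days: float, doubling_period_days: float = 100.0) -> float:
    """Relative capability per parameter after `days`, assuming it doubles
    every `doubling_period_days` (the article's ~100-day figure)."""
    return 2 ** (days / doubling_period_days)

# One year of progress under this assumption:
print(capability_density(365))  # ~12.6x denser
```

Under that assumption, a year of progress would let a model match today's quality with roughly one-twelfth of the parameters, which is why the trend, if it holds, matters as much as raw scaling.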

However, if there is an increase in the efficiency of the models even with size reductions, why are we witnessing a race to build ever larger data centers? 

Brazil, for example, projects attracting R$2 trillion in investments by 2035 to boost the technology sector [8]. It is estimated that the five largest American companies leading in AI and cloud computing plan to invest more than $350 billion in 2025 in the construction of data centers [9]. This paradoxical scenario, in which AI models are shrinking in size while multi-billion-dollar investments pour into ever-larger infrastructures, is proving to be one of the great contradictions of contemporary technological development.

This scenario has several explanations, some of which are interrelated. To begin with, increased efficiency in individual models may not reduce aggregate demand but rather amplify it. This phenomenon has roots in economics and was recently highlighted by the current CEO of Microsoft [10]. Jevons' Paradox, described in the 19th century by William Stanley Jevons, states, succinctly, that as a technology becomes more efficient and its use becomes cheaper, total demand for it tends to rise rather than fall.

In the case of AI technologies, the more efficient and cheaper they become, the more their use is democratized and the greater the demand. Smaller models can be used in more devices, in more places, and more frequently. Technologies that were once restricted to large corporations can now be accessed by independent researchers, small businesses, and end users. This democratization increases the number of applications, users, and model instances running simultaneously, demanding more centralized and distributed computing infrastructures. 

A second factor concerns the model creation processes. While ready-to-use models are decreasing in size, training new models at the forefront of knowledge continues to demand high computational power. AI research does not cease simply because it has become possible to use smaller models; rather, the opposite is true. This constant evolution generates demands for investment in research into even more advanced models capable of performing more complex tasks. 

A third, and perhaps most important, factor is the proliferation of specialized models. Instead of a single model with hundreds of billions or even trillions of parameters to perform tasks in multiple domains of knowledge, specialization in certain tasks ends up demanding several smaller models optimized for specific domains such as medicine, law, programming, finance, natural language processing, computer vision, robotics, among others. Each sector, each industry, each organization, and each individual user wants to have their own model trained with their respective datasets.  

Because of these factors, the infrastructure needs to support not only the large, cutting-edge reference models, but also an exponentially growing number of smaller models that are used simultaneously in different geographical, temporal, and functional contexts. 

This pattern of demand for computing infrastructure underpins forecasts that, by 2030, almost half of global data center capacity will be devoted to AI workloads [11]. In this context, Brazil, with an energy matrix predominantly based on renewable resources, is attracting ever-larger, billion-dollar investments in AI-focused data centers, such as those planned for Rio de Janeiro and Rio Grande do Sul [12].

Everything indicates that this paradox will persist. The future of AI technologies will still be based on large reference models that will advance the frontiers of knowledge, while day-to-day operations will depend on smaller models to solve specific problems faced by organizations and individual users. 

The tipping point for potentially overcoming this paradox seems to lie in the ability to combine, in the best possible way, the socio-technical elements that guarantee the efficiency and scalability of AI technologies to meet human demands in economic, social, environmental, and governance terms, even if this, to some extent, implies a slowdown in technological advances. 

*Alexandre Silveira Pupo is a researcher at ABES Think Tank, with a PhD and a Master of Science in Administration, specializations and professional development in Business Management and Information Security, and a degree in Computer Science.

Notice: The opinion presented in this article is the responsibility of its author and not of ABES - Brazilian Association of Software Companies.

Article originally published on the IT Forum website: https://itforum.com.br/colunas/modelos-ia-data-centers-aumentam/

 
