Comments by Bernd Greifeneder, Chief Technology Officer and Founder of Dynatrace

 

Prediction 1: Agent-based Artificial Intelligence (AI) unleashes a new era of systems complexity.

Agentic AI introduces a new level of interaction between systems: more powerful, but exponentially more difficult to manage. As agents coordinate tasks, exchange context, and trigger subsequent actions, even well-architected digital environments can spiral into unpredictable behavior. Most organizations are unprepared for this change. Without strong observability and consistent governance, these systems will become increasingly difficult to understand and control.

Imagine each AI agent acting autonomously based on instructions and inputs provided not only by humans, but also by various internal and third-party sources. A single customer interaction can trigger hundreds of background conversations between agents, each taking its own initiative. Roles change depending on the situation, and some agents may guide others.

Common scenarios illustrate how this works. When a vehicle detects a problem, agents specializing in specific tasks can verify customer information, assess service options, estimate timelines, and coordinate a solution. A travel assistance agent might do something similar, contacting agents who compare flights, check loyalty benefits, book transportation, and adjust plans in real time. In both cases, many agents work behind the scenes toward a single outcome, and interactions between them can multiply in unpredictable ways. Each agent still reports to a human or another agent, and accountability remains with human oversight. This exponential growth in communication between agents cannot be managed without observability.
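The compounding growth of agent-to-agent communication described above can be sketched with a toy calculation. This is illustrative only; the `fan_out` and `depth` parameters are hypothetical, not figures from the text:

```python
def total_messages(fan_out: int, depth: int) -> int:
    """Messages exchanged when each agent delegates to `fan_out` sub-agents
    across `depth` levels of delegation (a geometric series)."""
    return sum(fan_out ** level for level in range(1, depth + 1))

# Even modest delegation produces hundreds of background conversations
# from a single customer interaction:
print(total_messages(4, 4))  # 4 + 16 + 64 + 256 = 340
```

Even with each agent delegating to only four others, four levels of delegation already produce hundreds of conversations behind one customer request, which is why this traffic cannot be managed without observability.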

Organizations that adopt agentic AI without unified context and clear safeguards will face increasing costs, wasted time and resources, unpredictable behavior, and ever-greater risks. The challenge is no longer improving individual models, but managing the network of autonomous interactions that unfold in real time. In this next phase, observability is no longer a supporting function. It becomes the foundation for secure, scalable, and governable agent ecosystems.

Prediction 2: The path to autonomy begins with proven operational maturity.

Companies will take significant steps toward autonomous operations. Maturity, not ambition, will determine who succeeds. AI cannot act independently until the underlying systems, automation, and processes are stable, observable, and well understood. Agentic systems are already available, but a solid foundation must come first. The early stages of automation are crucial because they reveal gaps in data access, service performance, and the contextual signals on which AI relies. Only once these components are reliable and available in real time will supervised and autonomous operations take hold.

Most companies will follow a gradual process. They will start with preventive operations, in which AI identifies and resolves routine problems before they have an impact. Next comes guided automation, where AI proposes actions and humans review each step and decision. Only after trust is earned through repeatable, auditable results will full autonomy emerge, with AI operating within defined limits and escalating only when necessary.
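The staged progression above can be sketched as a simple promotion gate. This is a hypothetical model; the mode names and the `threshold` of audited results are illustrative, not from the text:

```python
from enum import Enum

class Mode(Enum):
    PREVENTIVE = 1   # AI resolves routine problems before they have an impact
    GUIDED = 2       # AI proposes actions; humans review each step
    AUTONOMOUS = 3   # AI acts within defined limits, escalating when necessary

def next_mode(current: Mode, audited_successes: int, threshold: int = 100) -> Mode:
    """Promote automation one level only after enough audited, repeatable results."""
    if current is Mode.AUTONOMOUS or audited_successes < threshold:
        return current
    return Mode(current.value + 1)

print(next_mode(Mode.GUIDED, 150).name)  # AUTONOMOUS
print(next_mode(Mode.GUIDED, 30).name)   # GUIDED
```

The point of the gate is that trust is earned through an audit trail, never assumed: autonomy is a promotion, not a default.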

The journey toward fully autonomous operations will be gradual. Organizations that invest now in proactive workflows and recommendation-enriched automation will be better positioned to safely and responsibly introduce autonomous capabilities.

Prediction 3: Resilience becomes the new benchmark for operational excellence.

Resilience will become the defining measure of digital performance. As systems become more distributed and interconnected, small failures can spread rapidly across applications, cloud regions, payment systems, and third-party services, causing serious damage to the business and its users. Leaders will not treat reliability, availability, security, and observability as separate practices. They will see them as a single requirement: a system's ability to handle disruptions, recover quickly, and maintain a consistent customer experience under stress.

Independent research commissioned by Dynatrace and conducted by FreedomPay reveals why this shift is accelerating. The findings show how fragile digital ecosystems have become and how quickly technical failures translate into customer disruptions and financial losses. In the UK, for example, payment disruptions put an estimated £1.6 billion in annual revenue at risk. In France, the figure rises to €1.9 billion. A single service problem can spread across connected systems and channels, demonstrating how closely intertwined modern operations have become.

Customers notice these failures immediately, and their patience is limited: many abandon the transaction if the problem persists for more than a few minutes. Yet the average interruption lasts more than an hour, meaning most of the damage is done before service is restored. Almost one in three customers says a single incident is enough to reduce their trust in a company, and younger, digitally native consumers are even more likely to abandon it permanently.

This environment demands a unified approach to resilience. Organizations need shared visibility into how services behave, how failures propagate, and how recovery impacts the customer journey. Resilience will be measured by how systems respond under stress, not just by performance when digital services function as expected.

Prediction 4: Reliability becomes the foundation of Artificial Intelligence progress.

Organizations will prioritize building foundations that make AI systems consistently reliable. The next phase of Artificial Intelligence progress will depend as much on deterministic fundamentals and factual signals as on the generative power of stochastic models. Companies are recognizing that creativity alone is insufficient: reliable AI requires both structured inputs and mechanisms that keep outputs trustworthy.

As previously mentioned, agentic systems add a new layer of complexity. As agents coordinate tasks, exchange context, and initiate subsequent actions, even a small misunderstanding can propagate throughout the system. Greater capability amplifies this effect: a powerful agent can accelerate results, but it can just as easily accelerate an error. This is how hallucination arises at system scale, not from a single faulty model, but from inaccuracies that accumulate in the interactions between agents. A deterministic foundation and end-to-end observability prevent this drift, ensuring that agents act on the same factual signals and remain accountable to human operators.
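The way small inaccuracies compound across agent hand-offs can be shown with simple arithmetic. The 98% per-step figure below is an assumed number for illustration, not a measurement from the text:

```python
def chain_accuracy(per_step_accuracy: float, steps: int) -> float:
    """Probability that every hand-off in an agent chain stays correct,
    assuming independent per-step error rates."""
    return per_step_accuracy ** steps

# Twenty hand-offs at 98% accuracy each leave only ~67% end-to-end correctness:
print(round(chain_accuracy(0.98, 20), 3))  # 0.668
```

A 2% error rate looks negligible at any single step, yet across a long chain of agent interactions it erodes a third of end-to-end reliability, which is exactly the system-scale hallucination the paragraph describes.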

Common situations demonstrate how this occurs. A vehicle that detects a problem can trigger agents that analyze customer data and vehicle status, identify service locations, assess schedules, estimate travel times, and plan the entire workflow to resolve the problem. In each case, many agents collaborate behind the scenes to produce a single result. Organizations that want transparent and reliable AI results will prioritize deterministic safeguards, allowing agentic systems to behave safely, act predictably, and collaborate clearly.

Prediction 5: Collaboration between humans and machines as an engine for growth.

In the coming year, the growth of agentic AI will lead to a new operational model in which humans define the goals and Artificial Intelligence handles well-defined execution. As systems gain more context and become capable of coordinated action, the human role will shift from task execution to establishing guidelines, providing instructions, and ensuring supervision. Organizations will rely on Artificial Intelligence to analyze relationships, identify risks, and initiate safe actions, while humans remain responsible for interpreting the results and making decisions based on them.

Agentic AI will behave much like a high-speed intern. Given clear goals, good tools, instructions, and the right context, it will deliver results at a speed teams will find difficult to match manually. But it will still need guidance. Humans will set the objective, interpret the results, and make decisions when intent is unclear or outcomes are ambiguous. If something goes wrong, the responsibility will remain with the human operator, not the system.

This operational model will help teams manage complexity in a more predictable way. Artificial Intelligence will take over repetitive or urgent tasks, and humans will focus on strategic decisions and system-level understanding. Growth in the age of agentic AI will come from organizations that combine human judgment with AI-driven execution in a transparent, governed manner aligned with business objectives.

Prediction 6: Convergence of AI and cloud teams

AI will cease to operate as an isolated discipline and will become a standard component of cloud-native software delivery. Teams will integrate Artificial Intelligence into digital services the same way they integrate databases or other essential systems. As a result, AI engineering, cloud engineering, Site Reliability Engineering (SRE), and security will converge into a shared operational model: common pipelines and principles, shared Service Level Objectives (SLOs), and unified responsibility across the entire lifecycle of AI-enabled services.
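One concrete artifact of a shared SLO is a common error budget that AI, cloud, SRE, and security teams all draw from. A minimal sketch, assuming a 99.9% availability target over a 30-day window (both numbers illustrative):

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, implied by an availability SLO over a window."""
    return window_days * 24 * 60 * (1.0 - slo_target)

# A 99.9% monthly availability SLO leaves roughly 43 minutes of error budget:
print(round(error_budget_minutes(0.999), 1))  # 43.2
```

When an AI-enabled service shares this budget with the rest of the stack, a flaky model endpoint consumes the same scarce minutes as a failing database, which is what forces the teams onto one operational model.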

This change reflects how modern software already behaves. Artificial Intelligence capabilities influence cost, latency, behavior, and compliance, and these effects span the entire stack. They cannot be monitored or governed in isolation. To operate reliably in production, AI must run within the same workflows, safeguards, and delivery pipelines used for the rest of the cloud-native system.

End-to-end observability becomes essential because what matters is the complete outcome for the user. The instructions agents receive, the actions they perform, the database calls they trigger, and the costs they incur all contribute to the overall user experience. Observability must track all these signals together and treat the AI components, application logic, and cloud infrastructure as an interconnected system. This eliminates the distinction between “AI observability” and traditional telemetry and creates a unified view that aligns with how customers experience the service.
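The unified view described above can be sketched with a hand-rolled trace structure. This is not a real telemetry API; the span names, attributes, and cost figure are all hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    """One unit of work in a single end-to-end trace (hand-rolled, illustrative)."""
    name: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

# One user request yields a single trace covering the AI call, the
# application logic, and the database call it triggers.
root = Span("checkout.request")
root.children.append(Span("llm.call", {"model": "example-model", "cost_usd": 0.004}))
root.children.append(Span("db.query", {"statement": "SELECT ...", "rows": 12}))

# Cost and behavior roll up to the request the customer actually experienced.
total_cost = sum(c.attributes.get("cost_usd", 0.0) for c in root.children)
print(len(root.children), total_cost)  # 2 0.004
```

Because the AI call and the database call live in the same trace, there is no separate "AI observability" silo: cost, latency, and behavior aggregate at the level of the customer-facing request.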

Organizations that adopt this model will treat AI as a first-class software component. Central teams will define use cases, establish common stacks, and enforce compliance through shared practices, while product teams will incorporate AI directly into their delivery pipelines. This practical convergence will allow companies to operate AI-based services with the same discipline and predictability as any other cloud-native system.
