
The Java Developer’s Dilemma: Part 3

28 October 2025 at 07:08
This is the final part of a three-part series by Markus Eisele. Part 1 can be found here, and Part 2 here.

In the first article we looked at the Java developer’s dilemma: the gap between flashy prototypes and the reality of enterprise production systems. In the second article we explored why new types of applications are needed, and how AI changes the shape of enterprise software. This article focuses on what those changes mean for architecture. If applications look different, the way we structure them has to change as well.

The Traditional Java Enterprise Stack

Enterprise Java applications have always been about structure. A typical system is built on a set of layers. At the bottom is persistence, often with JPA or JDBC. Business logic runs above that, enforcing rules and processes. On top sit REST or messaging endpoints that expose services to the outside world. Crosscutting concerns like transactions, security, and observability run through the stack. This model has proven durable. It has carried Java from the early servlet days to modern frameworks like Quarkus, Spring Boot, and Micronaut.

The success of this architecture comes from clarity. Each layer has a clear responsibility. The application is predictable and maintainable because you know where to add logic, where to enforce policies, and where to plug in monitoring. Adding AI does not remove these layers. But it does add new ones, because the behavior of AI doesn’t fit into the neat assumptions of deterministic software.

New Layers in AI-Infused Applications

AI changes the architecture by introducing layers that never existed in deterministic systems. Three of the most important are fuzzy validation, context-sensitive guardrails, and observability of model behavior. In practice you’ll encounter even more components, but validation and observability are the foundation that makes AI safe in production.

Validation and Guardrails

Traditional Java applications assume that inputs can be validated. You check whether a number is within range, whether a string is not empty, or whether a request matches a schema. Once validated, you process it deterministically. With AI outputs, this assumption no longer holds. A model might generate text that looks correct but is misleading, incomplete, or harmful. The system cannot blindly trust it.

This is where validation and guardrails come in. They form a new architectural layer between the model and the rest of the application. Guardrails can take different forms:

  • Schema validation: If you expect a JSON object with three fields, you must check that the model’s output matches that schema. A missing or malformed field should be treated as an error.
  • Policy checks: If your domain forbids certain outputs, such as exposing sensitive data, returning personal identifiers, or generating offensive content, policies must filter those out.
  • Range and type enforcement: If the model produces a numeric score, you need to confirm that the score is valid before passing it into your business logic.

Enterprises already know what happens when validation is missing. SQL injection, cross-site scripting, and other vulnerabilities have taught us that unchecked inputs are dangerous. AI outputs are another kind of untrusted input, even if they come from inside your own system. Treating them with suspicion is a requirement.

In Java, this layer can be built with familiar tools. You can write bean validation annotations, schema checks, or even custom CDI interceptors that run after each AI call. The important part is architectural: Validation must not be hidden in utility methods. It has to be a visible, explicit layer in the stack so that it can be maintained, evolved, and tested rigorously over time.
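To make that concrete, here is a minimal sketch of such a guardrail component using Jackson. The expected fields (`answer`, `confidence`, `sources`) are invented for the example; a real system would derive them from its own response contract.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.List;

// Illustrative guardrail: checks that a model response is valid JSON, contains the
// fields business logic expects, and keeps numeric values within range before the
// result is allowed downstream.
public class SchemaGuardrail {

    private static final List<String> REQUIRED_FIELDS = List.of("answer", "confidence", "sources");

    private final ObjectMapper mapper = new ObjectMapper();

    public JsonNode validate(String rawModelOutput) {
        final JsonNode node;
        try {
            node = mapper.readTree(rawModelOutput);
        } catch (Exception e) {
            throw new IllegalArgumentException("Model output is not valid JSON", e);
        }
        for (String field : REQUIRED_FIELDS) {
            if (!node.hasNonNull(field)) {
                throw new IllegalArgumentException("Model output is missing required field: " + field);
            }
        }
        double confidence = node.get("confidence").asDouble();
        if (confidence < 0.0 || confidence > 1.0) {
            throw new IllegalArgumentException("Confidence out of range: " + confidence);
        }
        return node;
    }
}
```

Because the guardrail is an explicit component rather than a buried utility method, it can be injected, swapped, and unit tested like any other service.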

Observability

Observability has always been critical in enterprise systems. Logs, metrics, and traces allow us to understand how applications behave in production. With AI, observability becomes even more important because behavior is not deterministic. A model might give different answers tomorrow than it does today. Without visibility, you cannot explain or debug why.

Observability for AI means more than logging a result. It requires:

  • Tracing prompts and responses: Capturing what was sent to the model and what came back, ideally with identifiers that link them to the original request
  • Recording context: Storing the data retrieved from vector databases or other sources so you know what influenced the model’s answer
  • Tracking cost and latency: Monitoring how often models are called, how long they take, and how much they cost
  • Detecting drift: Identifying when the quality of answers changes over time, which may indicate a model update or degraded performance on specific data

For Java developers, this maps to existing practice. We already integrate OpenTelemetry, structured logging frameworks, and metrics exporters like Micrometer. The difference is that now we need to apply those tools to AI-specific signals. A prompt is like an input event. A model response is like a downstream dependency. Observability becomes an additional layer that cuts through the stack, capturing the reasoning process itself.

Consider a Quarkus application that integrates with OpenTelemetry. You can create spans for each AI call; add attributes for the model name, token count, latency, and cache hits; and export those metrics to Grafana or another monitoring system. This makes AI behavior visible in the same dashboards your operations team already uses.
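A hedged sketch of what that can look like with the OpenTelemetry API as exposed by Quarkus. The span and attribute names are illustrative rather than an established convention, and `Assistant` is a hypothetical AI service interface standing in for whatever the application defines.

```java
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

// Hypothetical AI service interface used only for this sketch.
interface Assistant {
    String chat(String question);
}

@ApplicationScoped
public class ObservedAssistant {

    @Inject
    Tracer tracer;        // provided by the Quarkus OpenTelemetry extension

    @Inject
    Assistant assistant;

    public String ask(String question) {
        Span span = tracer.spanBuilder("ai.chat").startSpan();
        try (Scope scope = span.makeCurrent()) {
            long start = System.nanoTime();
            String answer = assistant.chat(question);
            span.setAttribute("ai.model", "gpt-4o-mini");   // illustrative attribute names
            span.setAttribute("ai.latency_ms", (System.nanoTime() - start) / 1_000_000);
            span.setAttribute("ai.response_chars", answer.length());
            return answer;
        } catch (RuntimeException e) {
            span.recordException(e);
            throw e;
        } finally {
            span.end();
        }
    }
}
```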

Mapping New Layers to Familiar Practices

The key insight is that these new layers do not replace the old ones. They extend them. Dependency injection still works. You should inject a guardrail component into a service the same way you inject a validator or logger. Fault tolerance libraries like MicroProfile Fault Tolerance or Resilience4j are still useful. You can wrap AI calls with time-outs, retries, and circuit breakers. Observability frameworks like Micrometer and OpenTelemetry are still relevant. You just point them at new signals.
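For example, a minimal sketch using MicroProfile Fault Tolerance annotations. The time-out, retry, and circuit-breaker values are placeholders that a real system would tune to its own latency and cost budget, and the model call itself is stubbed out.

```java
import java.time.temporal.ChronoUnit;
import jakarta.enterprise.context.ApplicationScoped;
import org.eclipse.microprofile.faulttolerance.CircuitBreaker;
import org.eclipse.microprofile.faulttolerance.Fallback;
import org.eclipse.microprofile.faulttolerance.Retry;
import org.eclipse.microprofile.faulttolerance.Timeout;

@ApplicationScoped
public class ResilientAnswerService {

    // Hypothetical wrapper around an AI call; the model client itself is not shown.
    @Timeout(value = 10, unit = ChronoUnit.SECONDS)   // fail fast if the model is slow
    @Retry(maxRetries = 2, delay = 500)               // retry transient failures
    @CircuitBreaker(requestVolumeThreshold = 10, failureRatio = 0.5, delay = 30_000)
    @Fallback(fallbackMethod = "cannedAnswer")
    public String ask(String question) {
        return callModel(question);                   // e.g. a LangChain4j AI service call
    }

    String cannedAnswer(String question) {
        return "I cannot answer that right now. Please try again later.";
    }

    private String callModel(String question) {
        // placeholder for the actual model invocation
        throw new UnsupportedOperationException("wire in your model client here");
    }
}
```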

By treating validation and observability as layers, not ad hoc patches, you maintain the same architectural discipline that has always defined enterprise Java. That discipline is what keeps systems maintainable when they grow and evolve. Teams know where to look when something fails, and they know how to extend the architecture without introducing brittle hacks.

An Example Flow

Imagine a REST endpoint that answers customer questions. The flow looks like this:

1. The request comes into the REST layer.
2. A context builder retrieves relevant documents from a vector store.
3. The prompt is assembled and sent to a local or remote model.
4. The result is passed through a guardrail layer that validates the structure and content.
5. Observability hooks record the prompt, context, and response for later analysis.
6. The validated result flows into business logic and is returned to the client.

This flow has clear layers. Each one can evolve independently. You can swap the vector store, upgrade the model, or tighten the guardrails without rewriting the whole system. That modularity is exactly what enterprise Java architectures have always valued.

A concrete example might be using LangChain4j in Quarkus. You define an AI service interface, annotate it with the model binding, and inject it into your resource class. Around that service you add a guardrail interceptor that enforces a schema using Jackson. You add an OpenTelemetry span that records the prompt and tokens used. None of this requires abandoning Java discipline. It’s the same stack thinking we’ve always used, now applied to AI.
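A condensed sketch of that pattern follows. Annotation and package names follow LangChain4j and the quarkus-langchain4j extension and may vary between versions; the system message and endpoint path are invented for the example.

```java
import dev.langchain4j.service.SystemMessage;
import io.quarkiverse.langchain4j.RegisterAiService;
import jakarta.inject.Inject;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;

// AI service interface: the framework generates the implementation and binds it
// to the configured model.
@RegisterAiService
interface SupportAssistant {

    @SystemMessage("You are a support assistant. Answer only from the provided context.")
    String answer(String question);
}

@Path("/support")
class SupportResource {

    @Inject
    SupportAssistant assistant;

    @POST
    public String ask(String question) {
        // guardrail and observability layers (see above) would wrap this call
        return assistant.answer(question);
    }
}
```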

Implications for Architects

For architects, the main implication is that AI doesn’t remove the need for structure. If anything, it increases it. Without clear boundaries, AI becomes a black box in the middle of the system. That’s not acceptable in an enterprise environment. By defining guardrails and observability as explicit layers, you make AI components as manageable as any other part of the stack.

This is where evaluation comes in: systematically measuring how an AI component behaves, using tests and monitoring that go beyond traditional correctness checks. Instead of expecting exact outputs, evaluations look at structure, boundaries, relevance, and compliance. They combine automated tests, curated prompts, and sometimes human review to build confidence that a system is behaving as intended. In enterprise settings, evaluation becomes a recurring activity rather than a one-time validation step.

Evaluation itself becomes an architectural concern that reaches beyond the models themselves. Hamel Husain describes evaluation as a first-class system, not an add-on. For Java developers, this means building evaluation into CI/CD, just as unit and integration tests are. Continuous evaluation of prompts, retrieval, and outputs becomes part of the deployment gate. This extends what we already do with integration testing suites.

This approach also helps with skills. Teams already know how to think in terms of layers, services, and crosscutting concerns. By framing AI integration in the same way, you lower the barrier to adoption. Developers can apply familiar practices to unfamiliar behavior. This is critical for staffing. Enterprises should not depend on a small group of AI specialists. They need large teams of Java developers who can apply their existing skills with only moderate retraining.

There is also a governance aspect. When regulators or auditors ask how your AI system works, you need to show more than a diagram with a “call LLM here” box. You need to show the validation layer that checks outputs, the guardrails that enforce policies, and the observability that records decisions. This is what turns AI from an experiment into a production system that can be trusted.

Looking Forward

The architectural shifts described here are only the beginning. More layers will emerge as AI adoption matures. We’ll see specialized and per-user caching layers to control cost, fine-grained access control to limit who can use which models, and new forms of testing to verify behavior. But the core lesson is clear: AI requires us to add structure, not remove it.

Java’s history gives us confidence. We’ve already navigated shifts from monoliths to distributed systems, from synchronous to reactive programming, and from on-premises to cloud. Each shift added layers and patterns. Each time, the ecosystem adapted. The arrival of AI is no different. It’s another step in the same journey.

For Java developers, the challenge is not to throw away what we know but to extend it. The shift is real, but it’s not alien. Java’s history of layered architectures, dependency injection, and crosscutting services gives us the tools to handle it. The result is not prototypes or one-off demos but applications that are reliable, auditable, and ready for the long lifecycles that enterprises demand.

In our book, Applied AI for Enterprise Java Development, we explore these architectural shifts in depth with concrete examples and patterns. From retrieval pipelines with Docling to guardrail testing and observability integration, we show how Java developers can take the ideas outlined here and turn them into production-ready systems.

The Java Developer’s Dilemma: Part 2

21 October 2025 at 07:17

This is the second of a three-part series by Markus Eisele. Part 1 can be found here. Stay tuned for part 3.

Many AI projects fail. The reason is often simple. Teams try to rebuild last decade’s applications but add AI on top: A CRM system with AI. A chatbot with AI. A search engine with AI. The pattern is the same: “X, but now with AI.” These projects usually look fine in a demo, but they rarely work in production. The problem is that AI doesn’t just extend old systems. It changes what applications are and how they behave. If we treat AI as a bolt-on, we miss the point.

What AI Changes in Application Design

Traditional enterprise applications are built around deterministic workflows. A service receives input, applies business logic, stores or retrieves data, and responds. If the input is the same, the output is the same. Reliability comes from predictability.

AI changes this model. Outputs are probabilistic. The same question asked twice may return two different answers. Results depend heavily on context and prompt structure. Applications now need to manage data retrieval, context building, and memory across interactions. They also need mechanisms to validate and control what comes back from a model. In other words, the application is no longer just code plus a database. It’s code plus a reasoning component with uncertain behavior. That shift makes “AI add-ons” fragile and points to a need for entirely new designs.

Defining AI-Infused Applications

AI-infused applications aren’t just old applications with smarter text boxes. They have new structural elements:

  • Context pipelines: Systems need to assemble inputs before passing them to a model. This often includes retrieval-augmented generation (RAG), where enterprise data is searched and embedded into the prompt, but also hierarchical, per-user memory.
  • Memory: Applications need to persist context across interactions. Without memory, conversations reset on every request. This memory may also need to be stored in different ways: in-process, midterm, and even long-term memory. Who wants to start every support conversation by repeating their name and purchased products?
  • Guardrails: Outputs must be checked, validated, and filtered. Otherwise, hallucinations or malicious responses leak into business workflows.
  • Agents: Complex tasks often require coordination. An agent can break down a request, call multiple tools, APIs, or even other agents, and assemble complex results, executed in parallel or synchronously. Instead of being workflow driven, agents are goal driven: They try to produce a result that satisfies a request. Business Process Model and Notation (BPMN) is turning toward goal- and context-oriented agent design.

These are not theoretical. They’re the building blocks we already see in modern AI systems. What’s important for Java developers is that they can be expressed as familiar architectural patterns: pipelines, services, and validation layers. That makes them approachable even though the underlying behavior is new.
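As a schematic illustration of the context-pipeline idea, the sketch below uses hypothetical `VectorStore` and `ChatModel` interfaces as stand-ins for what a vector database client and a library like LangChain4j would actually provide; only the shape of the pipeline is the point.

```java
import java.util.List;

// Hypothetical stand-in interfaces for this sketch.
interface VectorStore { List<String> findRelevant(String query, int maxResults); }
interface ChatModel { String generate(String prompt); }

// A context pipeline: retrieve relevant segments, assemble a prompt, call the model.
class ContextPipeline {

    private final VectorStore store;
    private final ChatModel model;

    ContextPipeline(VectorStore store, ChatModel model) {
        this.store = store;
        this.model = model;
    }

    String answer(String question) {
        List<String> context = store.findRelevant(question, 5);
        String prompt = "Answer using only this context:\n"
                + String.join("\n---\n", context)
                + "\n\nQuestion: " + question;
        return model.generate(prompt);
    }
}
```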

Models as Services, Not Applications

One foundational thought: AI models should not be part of the application binary. They are services. Whether they’re served from a local container, via vLLM, hosted by a cloud provider, or deployed on private infrastructure, the model is consumed through a service boundary. For enterprise Java developers, this is familiar territory. We have decades of experience consuming external services through fast protocols, handling retries, applying backpressure, and building resilience into service calls. We know how to build clients that survive transient errors, timeouts, and version mismatches. This experience is directly relevant when the “service” happens to be a model endpoint rather than a database or messaging broker.

By treating the model as a service, we avoid a major source of fragility. Applications can evolve independently of the model. If you need to swap a local Ollama model for a cloud-hosted GPT or an internal Jlama deployment, you change configuration, not business logic. This separation is one of the reasons enterprise Java is well positioned to build AI-infused systems.
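A minimal sketch of what that separation can look like with LangChain4j’s model abstractions; builder and class names may differ slightly between versions, and the endpoint, environment variable, and model names are placeholders.

```java
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.model.openai.OpenAiChatModel;

public class ModelFactory {

    // Business logic only ever sees the ChatLanguageModel interface; which
    // implementation gets constructed is a configuration decision, not a code change.
    public static ChatLanguageModel fromConfig(String provider) {
        if ("ollama".equals(provider)) {
            return OllamaChatModel.builder()
                    .baseUrl("http://localhost:11434")   // local Ollama endpoint (placeholder)
                    .modelName("llama3")
                    .build();
        }
        return OpenAiChatModel.builder()
                .apiKey(System.getenv("OPENAI_API_KEY")) // placeholder for your key management
                .modelName("gpt-4o-mini")
                .build();
    }
}
```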

Java Examples in Practice

The Java ecosystem is beginning to support these ideas with concrete tools that address enterprise-scale requirements rather than toy examples.

  • Retrieval-augmented generation (RAG): Context-driven retrieval is the most common pattern for grounding model answers in enterprise data. At scale this means structured ingestion of documents, PDFs, spreadsheets, and more into vector stores. Projects like Docling handle parsing and transformation, and LangChain4j provides the abstractions for embedding, retrieval, and ranking. Frameworks such as Quarkus then extend those concepts into production-ready services with dependency injection, configuration, and observability. The combination moves RAG from a demo pattern into a reliable enterprise feature.
  • LangChain4j as a standard abstraction: LangChain4j is emerging as a common layer across frameworks. It offers CDI integration for Jakarta EE and extensions for Quarkus but also supports Spring, Micronaut, and Helidon. Instead of writing fragile, low-level OpenAPI glue code for each provider, developers define AI services as interfaces and let the framework handle the wiring. This standardization is also beginning to cover agentic modules, so orchestration across multiple tools or APIs can be expressed in a framework-neutral way.
  • Cloud to on-prem portability: In enterprises, portability and control matter. Abstractions make it easier to switch between cloud-hosted providers and on-premises deployments. With LangChain4j, you can change configuration to point from a cloud LLM to a local Jlama model or Ollama instance without rewriting business logic. These abstractions also make it easier to use more and smaller domain-specific models and maintain consistent behavior across environments. For enterprises, this is critical to balancing innovation with control.

These examples show how Java frameworks are taking AI integration from low-level glue code toward reusable abstractions. The result is not only faster development but also better portability, testability, and long-term maintainability.

Testing AI-Infused Applications

Testing is where AI-infused applications diverge most sharply from traditional systems. In deterministic software, we write unit tests that confirm exact results. With AI, outputs vary, so testing has to adapt. The answer is not to stop testing but to broaden how we define it.

  • Unit tests: Deterministic parts of the system—context builders, validators, database queries—are still tested the same way. Guardrail logic, which enforces schema correctness or policy compliance, is also a strong candidate for unit tests.
  • Integration tests: AI models should be tested as opaque systems. You feed in a set of prompts and check that outputs meet defined boundaries: JSON is valid, responses contain required fields, values are within expected ranges.
  • Prompt testing: Enterprises need to track how prompts perform over time. Variation testing with slightly different inputs helps expose weaknesses. This should be automated and included in the CI pipeline, not left to ad hoc manual testing.

Because outputs are probabilistic, tests often look like assertions on structure, ranges, or presence of warning signs rather than exact matches. Hamel Husain stresses that specification-based testing with curated prompt sets is essential, and that evaluations should be problem-specific rather than generic. This aligns well with Java practices: We design integration tests around known inputs and expected boundaries, not exact strings. Over time, this produces confidence that the AI behaves within defined boundaries, even if specific sentences differ.
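As an illustration, a bounded-assertion test might look like the following sketch. The prompt, field names, and allowed sentiment values are invented for the example, and `callModel` would be wired to a local or mocked model in a real integration test.

```java
import static org.junit.jupiter.api.Assertions.assertTrue;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.junit.jupiter.api.Test;

class AnswerBoundaryTest {

    private final ObjectMapper mapper = new ObjectMapper();

    @Test
    void responseStaysWithinDefinedBoundaries() throws Exception {
        String raw = callModel("Summarize order 4711 as JSON with fields summary and sentiment.");

        JsonNode json = mapper.readTree(raw);                    // must be valid JSON
        assertTrue(json.hasNonNull("summary"), "summary field missing");
        assertTrue(json.hasNonNull("sentiment"), "sentiment field missing");

        String sentiment = json.get("sentiment").asText();
        assertTrue(sentiment.matches("positive|neutral|negative"), "unexpected sentiment value");
        assertTrue(json.get("summary").asText().length() < 1_000, "summary unreasonably long");
    }

    private String callModel(String prompt) {
        // wire this to the AI service under test (e.g. via Dev Services or a mock endpoint)
        return "{\"summary\":\"stub\",\"sentiment\":\"neutral\"}";
    }
}
```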

Collaboration with Data Science

Another dimension of testing is collaboration with data scientists. Models aren’t static. They can drift as training data changes or as providers update versions. Java teams cannot ignore this. We need methodologies that surface warning signs: sudden drops in accuracy on known inputs or unexpected changes in response style. Those signals need to be fed back into monitoring systems that span both the data science and the application side.

This requires closer collaboration between application developers and data scientists than most enterprises are used to. Developers must expose signals from production (logs, metrics, traces) to help data scientists diagnose drift. Data scientists must provide datasets and evaluation criteria that can be turned into automated tests. Without this feedback loop, drift goes unnoticed until it becomes a business incident.

Domain experts play a central role here. Looking back at Husain, he points out that automated metrics often fail to capture user-perceived quality. Java developers shouldn’t leave evaluation criteria to data scientists alone. Business experts need to help define what “good enough” means in their context. A clinical assistant has very different correctness criteria than a customer service bot. Without domain experts, AI-infused applications risk delivering the wrong things.

Guardrails and Sensitive Data

Guardrails belong under testing as well. For example, an enterprise system should never return personally identifiable information (PII) unless explicitly authorized. Tests must simulate cases where PII could be exposed and confirm that guardrails block those outputs. This is not optional. Even where PII handling is addressed as a best practice on the model training side, RAG and memory in particular carry a real risk of personally identifiable information crossing boundaries. Regulatory frameworks like GDPR and HIPAA already enforce strict requirements. Enterprises must prove that AI components respect these boundaries, and testing is the way to demonstrate it.

By treating guardrails as testable components, not ad hoc filters, we raise their reliability. Schema checks, policy enforcement, and PII filters should all have automated tests just like database queries or API endpoints. This reinforces the idea that AI is part of the application, not a mysterious bolt-on.
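A deliberately simple sketch of a testable PII guardrail follows. A single regex is nowhere near sufficient for production PII detection, but it shows how a guardrail becomes an ordinary unit-test target.

```java
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

import java.util.regex.Pattern;
import org.junit.jupiter.api.Test;

class PiiGuardrailTest {

    // Illustrative filter only: real deployments would use a proper PII detection
    // library or service rather than one email regex.
    static final Pattern EMAIL = Pattern.compile("[\\w.+-]+@[\\w.-]+\\.[A-Za-z]{2,}");

    static String redact(String modelOutput) {
        return EMAIL.matcher(modelOutput).replaceAll("[REDACTED]");
    }

    @Test
    void emailAddressesNeverReachTheClient() {
        String risky = "Please contact jane.doe@example.com for a refund.";
        String safe = redact(risky);

        assertFalse(EMAIL.matcher(safe).find(), "guardrail failed to strip an email address");
        assertTrue(safe.contains("[REDACTED]"));
    }
}
```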

Edge-Based Scenarios: Inference on the JVM

Not all AI workloads belong in the cloud. Latency, cost, and data sovereignty often demand local inference. This is especially true at the edge: in retail stores, factories, vehicles, or other environments where sending every request to a cloud service is impractical.

Java is starting to catch up here. Projects like Jlama allow language models to run directly inside the JVM. This makes it possible to deploy inference alongside existing Java applications without adding a separate Python or C++ runtime. The advantages are clear: lower latency, no external data transfer, and simpler integration with the rest of the enterprise stack. For developers, it also means you can test and debug everything inside one environment rather than juggling multiple languages and toolchains.

Edge-based inference is still new, but it points to a future where AI isn’t just a remote service you call. It becomes a local capability embedded into the same platform you already trust.

Performance and Numerics in Java

One reason Python became dominant in AI is its excellent math libraries like NumPy and SciPy. These libraries are backed by native C and C++ code, which delivers strong performance. Java has historically lacked first-rate numerics libraries of the same quality and ecosystem adoption. Libraries like ND4J (part of Deeplearning4j) exist, but they never reached the same critical mass.

That picture is starting to change. Project Panama is an important step. It gives Java developers efficient access to native libraries, GPUs, and accelerators without complex JNI code. Combined with ongoing work on vector APIs and Panama-based bindings, Java is becoming much more capable of running performance-sensitive tasks. This evolution matters because inference and machine learning won’t always be external services. In many cases, they’ll be libraries or models you want to embed directly in your JVM-based systems.

Why This Matters for Enterprises

Enterprises cannot afford to live in prototype mode. They need systems that run for years, can be supported by large teams, and fit into existing operational practices. AI-infused applications built in Java are well positioned for this. They are:

  • Closer to business logic: Running in the same environment as existing services
  • More auditable: Observable with the same tools already used for logs, metrics, and traces
  • Deployable across cloud and edge: Capable of running in centralized data centers or at the periphery, where latency and privacy matter

This is a different vision from “add AI to last decade’s application.” It’s about creating applications that only make sense because AI is at their core.

In Applied AI for Enterprise Java Development, we go deeper into these patterns. The book provides an overview of architectural concepts, shows how to implement them with real code, and explains how emerging standards like the Agent2Agent Protocol and Model Context Protocol fit in. The goal is to give Java developers a road map to move beyond demos and build applications that are robust, explainable, and ready for production.

The transformation isn’t about replacing everything we know. It’s about extending our toolbox. Java has adapted before, from servlets to EJBs to microservices. The arrival of AI is the next shift. The sooner we understand what these new types of applications look like, the sooner we can build systems that matter.

The Java Developer’s Dilemma: Part 1

30 September 2025 at 07:09
This is the first of a three-part series by Markus Eisele. Stay tuned for the follow-up posts.

AI is everywhere right now. Every conference, keynote, and internal meeting has someone showing a prototype powered by a large language model. It looks impressive. You ask a question, and the system answers in natural language. But if you are an enterprise Java developer, you probably have mixed feelings. You know how hard it is to build reliable systems that scale, comply with regulations, and run for years. You also know that what looks good in a demo often falls apart in production. That’s the dilemma we face. How do we make sense of AI and apply it to our world without giving up the qualities that made Java the standard for enterprise software?

The History of Java in the Enterprise

Java became the backbone of enterprise systems for a reason. It gave us strong typing, memory safety, portability across operating systems, and an ecosystem of frameworks that codified best practices. Whether you used Jakarta EE, Spring, or later, Quarkus and Micronaut, the goal was the same: build systems that are stable, predictable, and maintainable. Enterprises invested heavily because they knew Java applications would still be running years later with minimal surprises.

This history matters when we talk about AI. Java developers are used to deterministic behavior. If a method returns a result, you can rely on that result as long as your inputs are the same. Business processes depend on that predictability. AI does not work like that. Outputs are probabilistic. The same input might give different results. That alone challenges everything we know about enterprise software.

The Prototype Versus Production Gap

Most AI work today starts with prototypes. A team connects to an API, wires up a chat interface, and demonstrates a result. Prototypes are good for exploration. They aren’t good for production. Once you try to run them at scale, you discover problems.

Latency is one issue. A call to a remote model may take several seconds. That’s not acceptable in systems where a two-second delay feels like forever. Cost is another issue. Calling hosted models is not free, and repeated calls across thousands of users quickly add up. Security and compliance are even bigger concerns. Enterprises need to know where data goes, how it’s stored, and whether it leaks into a shared model. A quick demo rarely answers those questions.

The result is that many prototypes never make it into production. The gap between a demo and a production system is large, and most teams underestimate the effort required to close it.

Why This Matters for Java Developers

Java developers are often the ones who receive these prototypes and are asked to “make them real.” That means dealing with all the issues left unsolved. How do you handle unpredictable outputs? How do you log and monitor AI behavior? How do you validate responses before they reach downstream systems? These are not trivial questions.

At the same time, business stakeholders expect results. They see the promise of AI and want it integrated into existing platforms. The pressure to deliver is strong. The dilemma is that we cannot ignore AI, but we also cannot adopt it naively. Our responsibility is to bridge the gap between experimentation and production.

Where the Risks Show Up

Let’s make this concrete. Imagine an AI-powered customer support tool. The prototype connects a chat interface to a hosted LLM. It works in a demo with simple questions. Now imagine it deployed in production. A customer asks about account balances. The model hallucinates and invents a number. The system has just broken compliance rules. Or imagine a user submits malicious input and the model responds with something harmful. Suddenly you’re facing a security incident. These are real risks that go beyond “the model sometimes gets it wrong.”

For Java developers, this is the dilemma. We need to preserve the qualities we know matter: correctness, security, and maintainability. But we also need to embrace a new class of technologies that behave very differently from what we’re used to.

The Role of Java Standards and Frameworks

The good news is that the Java ecosystem is already moving to help. Standards and frameworks are emerging that make AI integration less of a wild west. The OpenAI API is becoming a de facto standard, providing a way to access models in a consistent form regardless of vendor. That means code you write today won’t be locked into a single provider. The Model Context Protocol (MCP) is another step, defining how tools and models can interact in a consistent way.

Frameworks are also evolving. Quarkus has extensions for LangChain4j, making it possible to define AI services as easily as you define REST endpoints. Spring has introduced Spring AI. These projects bring the discipline of dependency injection, configuration management, and testing into the AI space. In other words, they give Java developers familiar tools for unfamiliar problems.

The Standards Versus Speed Dilemma

A common argument against Java and enterprise standards is that they move too slowly. The AI world changes every month, with new models and APIs appearing at a pace that no standards body can match. At first glance, it looks like standards are a barrier to progress. The reality is different. In enterprise software, standards are not the anchors holding us back. They’re the foundation that makes long-term progress possible.

Standards define a shared vocabulary. They ensure that knowledge is transferable across projects and teams. If you hire a developer who knows JDBC, you can expect them to work with any database supported by the driver ecosystem. If you rely on Jakarta REST, you can swap frameworks or vendors without rewriting every service. This is not slow. This is what allows enterprises to move fast without constantly breaking things.

AI will be no different. Proprietary APIs and vendor-specific SDKs can get you started quickly, but they come with hidden costs. You risk locking yourself in to one provider, or building a system that only a small set of specialists understands. If those people leave, or if the vendor changes terms, you’re stuck. Standards avoid that trap. They make sure that today’s investment remains useful years from now.

Another advantage is the support horizon. Enterprises don’t think in terms of weeks or hackathon demos. They think in years. Standards bodies and established frameworks commit to supporting APIs and specifications over the long term. That stability is critical for applications that process financial transactions, manage healthcare data, or run supply chains. Without standards, every system becomes a one-off, fragile and dependent on whoever built it.

Java has shown this again and again. Servlets, CDI, JMS, JPA: These standards secured decades of business-critical development. They allowed millions of developers to build applications without reinventing core infrastructure. They also made it possible for vendors and open source projects to compete on quality, not just lock-in. The same will be true for AI. Emerging efforts like LangChain4j and the Java SDK for the Model Context Protocol or the Agent2Agent Protocol SDK will not slow us down. They’ll enable enterprises to adopt AI at scale, safely and sustainably.

In the end, speed without standards leads to short-lived prototypes. Standards with speed lead to systems that survive and evolve. Java developers should not see standards as a constraint. They should see them as the mechanism that allows us to bring AI into production, where it actually matters.

Performance and Numerics: Java’s Catching Up

One more part of the dilemma is performance. Python became the default language for AI not because of its syntax, but because of its libraries. NumPy, SciPy, PyTorch, and TensorFlow all rely on highly optimized C and C++ code. Python is mostly a frontend wrapper around these math kernels. Java, by contrast, has never had numerics libraries of the same adoption or depth. JNI made calling native code possible, but it was awkward and unsafe.

That is changing. The Foreign Function & Memory (FFM) API (JEP 454) makes it possible to call native libraries directly from Java without the boilerplate of JNI. It’s safer, faster, and easier to use. This opens the door for Java applications to integrate with the same optimized math libraries that power Python. Alongside FFM, the Vector API (JEP 508) introduces explicit support for SIMD operations on modern CPUs. It allows developers to write vectorized algorithms in Java that run efficiently across hardware platforms. Together, these features bring Java much closer to the performance profile needed for AI and machine learning workloads.
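As a small illustration of the Vector API, here is a SIMD dot product, the kind of kernel that underpins embedding similarity and inference workloads. It requires the `jdk.incubator.vector` module while the API is still incubating.

```java
// Compile and run with --add-modules jdk.incubator.vector
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

public class DotProduct {

    private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    // Vectorized dot product: processes SPECIES.length() floats per iteration,
    // then handles the remaining elements with a scalar tail loop.
    public static float dot(float[] a, float[] b) {
        float sum = 0f;
        int i = 0;
        int upper = SPECIES.loopBound(a.length);
        for (; i < upper; i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            sum += va.mul(vb).reduceLanes(VectorOperators.ADD);
        }
        for (; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```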

For enterprise architects, this matters because it changes the role of Java in AI systems. Java is no longer just an orchestration layer that calls external services. With projects like Jlama, models can run inside the JVM. With FFM and the Vector API, Java can take advantage of native math libraries and hardware acceleration. That means AI inference can move closer to where the data lives, whether in the data center or at the edge, while still benefiting from the standards and discipline of the Java ecosystem.

The Testing Dimension

Another part of the dilemma is testing. Enterprise systems are only trusted when they’re tested. Java has a long tradition of unit testing and integration testing, supported by standards and frameworks that every developer knows: JUnit, TestNG, Testcontainers, Jakarta EE testing harnesses, and more recently, Quarkus Dev Services for spinning up dependencies in integration tests. These practices are a core reason Java applications are considered production-grade. Hamel Husain’s work on evaluation frameworks is directly relevant here. He describes three levels of evaluation: unit tests, model/human evaluation, and production-facing A/B tests. For Java developers treating models as black boxes, the first two levels map neatly onto our existing practice: unit tests for deterministic components and black-box evaluations with curated prompts for system behavior.

AI-infused applications bring new challenges. How do you write a unit test for a model that gives slightly different answers each time? How do you validate that an AI component works correctly when the definition of “correct” is fuzzy? The answer is not to give up testing but to extend it.

At the unit level, you still test deterministic components around the AI service: context builders, data retrieval pipelines, validation, and guardrail logic. These remain classic unit test targets. For the AI service itself, you can use schema validation tests, golden datasets, and bounded assertions. For example, you may assert that the model returns valid JSON, contains required fields, or produces a result within an acceptable range. The exact words may differ, but the structure and boundaries must hold.

At the integration level, you can bring AI into the picture. Dev Services can spin up a local Ollama container or mock inference API for repeatable test runs. Testcontainers can manage vector databases like PostgreSQL with pgvector or Elasticsearch. Property-based testing libraries such as jqwik can generate varied inputs to expose edge cases in AI pipelines. These tools are already familiar to Java developers; they simply need to be applied to new targets.
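For instance, a property-based test with jqwik can hammer the deterministic prompt-assembly step with arbitrary inputs. The `buildPrompt` helper and its limits are hypothetical, chosen only to show the pattern.

```java
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

import net.jqwik.api.ForAll;
import net.jqwik.api.Property;
import net.jqwik.api.constraints.StringLength;

class PromptBuilderProperties {

    // Hypothetical deterministic component under test: it must always produce a
    // bounded, single-line prompt no matter what the user sends.
    static String buildPrompt(String userInput) {
        String sanitized = userInput.replaceAll("[\\r\\n]+", " ").trim();
        if (sanitized.length() > 500) {
            sanitized = sanitized.substring(0, 500);
        }
        return "Answer the customer question: " + sanitized;
    }

    @Property
    void promptIsAlwaysBoundedAndSingleLine(@ForAll @StringLength(max = 10_000) String input) {
        String prompt = buildPrompt(input);
        assertTrue(prompt.length() <= 540, "prompt exceeds the length budget");
        assertFalse(prompt.contains("\n"), "prompt must be a single line");
    }
}
```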

The key insight is that AI testing must complement, not replace, the testing discipline we already have. Enterprises cannot put untested AI into production and hope for the best. By extending unit and integration testing practices to AI-infused components, we give stakeholders the confidence that these systems behave within defined boundaries, even when individual model outputs are probabilistic.

This is where Java’s culture of testing becomes an advantage. Teams already expect comprehensive test coverage before deploying. Extending that mindset to AI ensures that these applications meet enterprise standards, not just demo requirements. Over time, testing patterns for AI outputs will mature into the same kind of de facto standards that JUnit brought to unit tests and Arquillian brought to integration tests. We should expect evaluation frameworks for AI-infused applications to become as normal as JUnit in the enterprise stack.

A Path Forward

So what should we do? The first step is to acknowledge that AI is not going away. Enterprises will demand it, and customers will expect it. The second step is to be realistic. Not every prototype deserves to become a product. We need to evaluate use cases carefully, ask whether AI adds real value, and design with risks in mind.

From there, the path forward looks familiar. Use standards to avoid lock-in. Use frameworks to manage complexity. Apply the same discipline you already use for transactions, messaging, and observability. The difference is that now you also need to handle probabilistic behavior. That means adding validation layers, monitoring AI outputs, and designing systems that fail gracefully when the model is wrong.

The Java developer’s dilemma is not about choosing whether to use AI. It’s about how to use it responsibly. We cannot treat AI like a library we drop into an application and forget about. We need to integrate it with the same care we apply to any critical system. The Java ecosystem is giving us the tools to do that. Our challenge is to learn quickly, apply those tools, and keep the qualities that made Java the enterprise standard in the first place.

This is the beginning of a larger conversation. In the next article we will look at new types of applications that emerge when AI is treated as a core part of the architecture, not just an add-on. That’s where the real transformation happens.
