AI CLI — Building a Production Newsletter Engine with Java, Spring Boot & LangChain4j

TL;DR

AI CLI is a Java 21 terminal application built on Spring Boot, PicoCLI, and LangChain4j that transforms files using LLMs. It powers three fully automated newsletters — researched, written, validated, and delivered entirely by AI agents, on a schedule, with zero human intervention.

This article walks you through the stack, the architecture, the production lessons, and the live output.

Why Build This?

You probably subscribe to a handful of newsletters. Some of them are great. Most are cluttered, irrelevant, and clearly optimized for someone else's agenda — not yours.

That was the starting point: what if the newsletter was built for you, by you, covering exactly what you care about?

AI CLI was born out of a simple idea — a personal information feed that is 100% in your control. No vendor lock-in. No algorithmic feed deciding what you should see. No attention-harvesting social media platform reducing your world to engagement bait.

Instead, you define exactly what you want through the power of natural language. You write prompt files in Markdown. You choose the LLM, the search engine, the data sources, and the output format. You own the code, the infrastructure, and the data.

With AI CLI, you can spin up a highly specialized newsletter in a matter of hours and run it indefinitely on a schedule via CI/CD — for example:

Morocco Run Radar: all verified running events near Casablanca for the next 60 days
IT Events Casablanca: tech meetups and conferences for developers in the city
Assistant Professor Jobs: academic job openings matching specific criteria

Each of these newsletters is a configuration, not a product. The engine is the same. The prompts are different.

And unlike any SaaS newsletter builder, you can swap the LLM, the embedding model, the search engine, or the delivery platform at any time — because the architecture was designed around pluggability, not dependency.

This is also the follow-up and natural evolution from JBang Meets Spring Boot & LangChain4j and Easy RAG — Using Embeddings in LangChain4j, where we built the foundations of chaining LLM calls and embedding data for context. This time, we went to production.

The Stack

Technology	Version	Role
Java	21	Runtime — Records, Virtual Threads, modern Stream/Optional APIs
Spring Boot	3.4.11	Application framework and dependency injection
PicoCLI	4.7.7	Command-line interface (type-safe argument parsing)
LangChain4j	1.11.0	AI/LLM orchestration — the main course
Playwright + Stealth4j	1.58 / 1.1.2	Browser automation for JS-rendered page crawling
MailerSend SDK	1.4.1	Email delivery
Commonmark	0.27.1	Markdown → HTML rendering (with GFM tables)
DuckDB	—	In-process vector store for RAG

The Beauty of Spring Boot + PicoCLI

Before talking about LangChain4j, let's appreciate the backbone: Spring Boot and PicoCLI working together.

PicoCLI gives us a declarative, type-safe CLI layer where every argument (--chat-model, --tools, --embedding-model, --search-engine) is parsed, validated, and converted before it reaches business logic.

Spring Boot then takes those parsed values and engineers the perfect ephemeral context for each execution. This is the key design insight: the Spring context is different for every run, because the beans loaded depend entirely on what the user requested via CLI flags.

Here is how the ChatModel bean is resolved at startup:

@Configuration
public class AIChatModelConfig {

    @Bean
    ChatModel chatModel(ApplicationArguments aa,
                    ProviderProperties providerProperties, Environment environment,
                    List<ChatModelFactory> factories) {

        ContextUtils.ParsingMainCommand parsingCommand = ContextUtils
                        .parseIntoArgs(new ContextUtils.ParsingMainCommand(), aa, environment);
        String finalModelName = parsingCommand.getMainArgs().getChatModel();

        return factories.stream()
                        .filter(factory -> factory.supports(finalModelName, providerProperties))
                        .findFirst()
                        .map(factory -> factory.create(finalModelName, providerProperties))
                        .orElseThrow(() -> new IllegalArgumentException(
                                        "Unsupported chat model: " + finalModelName));
    }
}

The same factory pattern is replicated for embedding models, vector stores, search engines, and scoring models. This means swapping from OpenAI GPT-5 to a local Ollama model is nothing more than changing a CLI flag — no code change, no reconfiguration, no rebuild. The factory resolves the right bean, and Spring wires it.
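The supports()/create() lookup can be tried out without Spring. The following is a minimal plain-Java sketch, assuming a hypothetical ModelFactory interface that returns a label instead of a real ChatModel; the prefix-matching rules are illustrative and not the project's actual matching logic.

```java
import java.util.List;

// Plain-Java sketch of the supports()/create() lookup; no Spring or LangChain4j involved.
class FactoryResolutionSketch {

    // Hypothetical stand-in for the article's ChatModelFactory.
    interface ModelFactory {
        boolean supports(String modelName);
        String create(String modelName); // returns a label instead of a real ChatModel
    }

    // Illustrative factories: the name prefixes are assumptions, not the project's rules.
    static final List<ModelFactory> FACTORIES = List.of(
            new ModelFactory() {
                public boolean supports(String name) { return name.startsWith("gpt-"); }
                public String create(String name) { return "openai:" + name; }
            },
            new ModelFactory() {
                public boolean supports(String name) { return name.startsWith("qwen"); }
                public String create(String name) { return "ollama:" + name; }
            });

    // The first factory that supports the name wins; otherwise fail fast.
    static String resolve(String modelName) {
        return FACTORIES.stream()
                .filter(factory -> factory.supports(modelName))
                .findFirst()
                .map(factory -> factory.create(modelName))
                .orElseThrow(() -> new IllegalArgumentException(
                        "Unsupported chat model: " + modelName));
    }

    public static void main(String[] args) {
        System.out.println(resolve("gpt-5"));
        System.out.println(resolve("qwen3:1.7b"));
    }
}
```

Because the lookup fails fast on an unknown name, a typo in the CLI flag surfaces at startup rather than midway through a run.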

[Diagram: factory resolution flow across all 5 component types]

Similarly, every tool (search, crawler, RAG, validation) is a Spring @Component guarded by a custom @Conditional annotation. If you don't ask for search, the WebSearchTool bean is never instantiated. This keeps the runtime lean and the configuration explicit.

--tools=search,rag,content_crawler,json_schema_validate

This single flag composes a different Spring context every time. You get the flexibility and versatility of a plugin architecture without writing a single plugin loader — because Spring's dependency injection is the plugin loader.
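To make the "flag composes the context" idea concrete outside Spring, here is a small sketch under stated assumptions: the catalog of suppliers is hypothetical and stands in for the bean definitions, and activate() plays the role that @Conditional plays in the real application.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;
import java.util.stream.Collectors;

class ToolActivationSketch {

    // Parses "--tools=search,rag" into a set of names; empty when the flag is absent.
    static Set<String> requestedTools(String[] args) {
        return Arrays.stream(args)
                .filter(arg -> arg.startsWith("--tools="))
                .flatMap(arg -> Arrays.stream(arg.substring("--tools=".length()).split(",")))
                .map(String::trim)
                .filter(name -> !name.isEmpty())
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    // Only suppliers whose name was requested are invoked,
    // so tools that were not requested are never constructed.
    static Map<String, Object> activate(Map<String, Supplier<Object>> catalog,
                                        Set<String> requested) {
        Map<String, Object> active = new LinkedHashMap<>();
        catalog.forEach((name, supplier) -> {
            if (requested.contains(name)) {
                active.put(name, supplier.get());
            }
        });
        return active;
    }

    public static void main(String[] args) {
        Map<String, Supplier<Object>> catalog = new LinkedHashMap<>();
        catalog.put("search", () -> "WebSearchTool instance");
        catalog.put("content_crawler", () -> "ContentCrawlerTool instance");
        Set<String> requested = requestedTools(new String[] {"--tools=search,rag"});
        System.out.println(activate(catalog, requested).keySet());
    }
}
```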

LangChain4j — The Main Course

LangChain4j is the orchestration layer that connects your Java code to the LLM world. The project uses 16 LangChain4j modules covering:

Chat models (OpenAI, Gemini, DeepSeek, Groq, Ollama)
Embedding models (OpenAI, Gemini, Voyage AI, Ollama, local ONNX)
Vector stores (DuckDB, Lucene, Qdrant)
Search engines (Tavily, Google Custom Search, DuckDuckGo)
RAG infrastructure (document splitting, content retrieval, query routing, reranking)
Agentic workflows (UntypedAgent, AgenticScope)

It's an impressive library that lets you go from "call an LLM" to "build a multi-agent pipeline with RAG, tool calling, and structured output" — all in pure Java.

Architecture Overview

The application follows a layered architecture, where each layer has a clear responsibility. [Diagram: layered architecture overview]

Four key design principles power this architecture:

Factory Pattern for AI Components — Every LLM component is resolved via a *Factory interface matched against CLI arguments.
Spring @Conditional for Tool Activation — Each tool bean is conditionally instantiated based on the --tools flag.
Optional<> Constructor Injection — Services accept optional dependencies (Optional<IngestionService>, Optional<WebSearchTool>) so that missing beans don't break the wiring.
Stage-Scoped Tools — Each prompt file can declare its own tools in YAML front matter. Tools are resolved per-stage, not globally.

From Multi-Stage Pipelines to Agentic Workflows

This section tells the evolution story — from simple chaining to production-grade agent orchestration.

Level 1: Simple Multi-Stage Prompt Pipeline

The simplest building block is the multi-stage prompt pipeline. You define a directory of numbered Markdown files, each containing a prompt.

[Diagram: pipeline flow overview]

A transformation directory looks like this:

transformations/moroccan_runners/
  ├── 1-research.md         # Stage 1: research via web search
  └── 2-presentation.md     # Stage 2: format into newsletter

Each file can optionally declare tools in YAML front matter:

---
tools: [search, search_social_media, content_crawler, rag, rerank,
        now, json_schema_validate, apify_instagram_scraper, distance]
---
# Prompt: Morocco Running Events Researcher (Phase 1)

You are an expert researcher tasked with finding running events in Morocco...

At runtime, these files are loaded, parsed, and chained together using Function::andThen:

public Function<String, String> build(AssistantRequest ar) {
    return ar.prompts().stream()
            .map(pd -> build(pd, ar))        // Build a GenericAssistant per stage
            .map(this::safeAssistantStep)     // Wrap with null-safety
            .reduce(Function.identity(), Function::andThen);  // Chain stages
}

Each stage gets its own isolated InMemoryChatMemoryStore — this is critical for defeating context pollution, where the tool-call noise from Stage 1 leaks into Stage 2 and causes hallucinations:

InMemoryChatMemoryStore isolatedMemoryStore = new InMemoryChatMemoryStore();
ChatMemory chatMemory = MessageWindowChatMemory.builder()
        .maxMessages(1000)
        .chatMemoryStore(isolatedMemoryStore)
        .build();

The output of Stage 1 becomes the input of Stage 2 — clean, focused, no residual tool-call artifacts.
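The chaining itself needs nothing beyond the JDK. Below is a minimal sketch in which each stage is a plain Function<String, String> rather than a GenericAssistant. The fallback in safeStep (pass the input through when a stage returns null or blank) is an assumption; the project's safeAssistantStep may behave differently.

```java
import java.util.List;
import java.util.function.Function;

class StageChainSketch {

    // Wraps a stage so that a null or blank result falls back to the stage's input.
    // This fallback policy is an assumption made for the sketch.
    static Function<String, String> safeStep(Function<String, String> stage) {
        return input -> {
            String output = stage.apply(input);
            return (output == null || output.isBlank()) ? input : output;
        };
    }

    // Folds the stages left to right: stage1.andThen(stage2).andThen(stage3)...
    static Function<String, String> chain(List<Function<String, String>> stages) {
        return stages.stream()
                .map(StageChainSketch::safeStep)
                .reduce(Function.identity(), Function::andThen);
    }

    public static void main(String[] args) {
        Function<String, String> research = in -> "FACTS(" + in + ")";
        Function<String, String> presentation = in -> "NEWSLETTER(" + in + ")";
        System.out.println(chain(List.of(research, presentation)).apply("running events"));
    }
}
```

Starting the reduction from Function.identity() means an empty stage list yields a pipeline that returns its input unchanged.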

Level 2: Agentic Workflows with `transform-agentic`

The multi-stage pipeline was powerful but had limitations: all stages shared the same flat Function<String, String> interface. There was no structured memory, no explicit input/output contracts, and no way for Stage 3 to directly read Stage 1's output without it being piped through Stage 2.

Enter transform-agentic — the evolution from chaining functions to orchestrating LangChain4j UntypedAgents via an AgenticScope (a shared state dictionary).

Each stage is now defined by a richer frontmatter contract:

---
name: editor-in-chief
description: Produces the editorial brief for downstream agents.
tools: [now, json_schema_validate]
input_keys: [input]
output_key: brief
---
You are the editor-in-chief for "Morocco Run Radar"...
---
Use {{input}} as the raw operator request and convert it into the editorial brief.

Key differences from the simple pipeline:

Explicit input_keys and output_key — each agent declares what it reads and what it writes
Separate system message and user message sections (separated by ---)
Named agents — each stage has a human-readable name for logging and debugging
State dictionary (AgenticScope) — agents communicate via a shared map, not piping strings
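A stage file in this format can be split with a few lines of string handling. The sketch below is deliberately naive and is not the project's parser: it assumes single-line "key: value" entries and bracketed flow-style lists, and the StagePrompt record is a hypothetical name. A real implementation would more likely use a YAML library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class StagePromptParserSketch {

    // Hypothetical holder for one parsed stage file.
    record StagePrompt(Map<String, String> meta, String systemMessage, String userMessage) {

        // Reads a bracketed list such as "[now, json_schema_validate]"; only for list-valued keys.
        List<String> list(String key) {
            String raw = meta.getOrDefault(key, "[]").trim();
            String inner = raw.substring(1, raw.length() - 1);
            return Arrays.stream(inner.split(","))
                    .map(String::trim)
                    .filter(item -> !item.isEmpty())
                    .toList();
        }
    }

    // Every line that is exactly "---" opens a new section:
    // section 0 = front matter, section 1 = system message, section 2 = user message.
    static StagePrompt parse(String file) {
        List<StringBuilder> sections = new ArrayList<>();
        for (String line : file.strip().split("\\R")) {
            if (line.strip().equals("---")) {
                sections.add(new StringBuilder());
            } else if (!sections.isEmpty()) {
                sections.get(sections.size() - 1).append(line).append('\n');
            }
        }
        if (sections.size() < 3) {
            throw new IllegalArgumentException("Expected front matter, system and user sections");
        }
        Map<String, String> meta = new LinkedHashMap<>();
        for (String line : sections.get(0).toString().split("\\R")) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                meta.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
            }
        }
        return new StagePrompt(meta,
                sections.get(1).toString().strip(),
                sections.get(2).toString().strip());
    }

    public static void main(String[] args) {
        String file = String.join("\n",
                "---", "name: editor-in-chief", "input_keys: [input]", "output_key: brief",
                "---", "You are the editor-in-chief.",
                "---", "Use {{input}} as the raw operator request.");
        StagePrompt prompt = parse(file);
        System.out.println(prompt.meta().get("name") + " reads " + prompt.list("input_keys"));
    }
}
```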

Here is how agents are built and composed:

// Build each agent from its frontmatter definition
var builder = AgenticServices.agentBuilder()
        .chatModel(chatModel)
        .chatMemory(chatMemory)          // isolated per stage
        .name(prompt.name())
        .description(prompt.description())
        .systemMessage(prompt.systemMessage())
        .userMessage(prompt.userMessage())
        .inputs(prompt.inputKeys().stream()
                .map(key -> new AgentArgument(String.class, key))
                .toArray(AgentArgument[]::new))
        .outputKey(prompt.outputKey())
        .tools(stageTools)
        .maxSequentialToolsInvocations(hardLimit);

// Compose all agents into a sequential workflow
return AgenticServices.sequenceBuilder()
        .name("transform-agentic-sequence")
        .subAgents(promptAgents.toArray())
        .outputKey(request.transformation().lastOutputKey())
        .build();

The real-world Moroccan Runners agentic pipeline is a 3-agent sequence:

Agent	Role	Input Key	Output Key	Tools
Editor-in-Chief	Produces the editorial brief	`input`	`brief`	`now`, `json_schema_validate`
Search Specialist	Researches and verifies events	`brief`	`research`	`search`, `crawler`, `apify`, `distance`, `rag`, `rerank`
Output Specialist	Formats the newsletter	`research`	`newsletter`	`markdown_validate`, `markdown_security`

[Diagram: state flowing through the AgenticScope]

Each agent reads from specific keys and writes to its designated output key. The state dictionary accumulates context across the pipeline — Stage 3 can read Stage 1's output directly without it being piped through Stage 2.

This is a significant leap from Function::andThen. Each agent operates with full awareness of its role, its inputs, and its outputs — and the state dictionary provides structured memory across the pipeline.
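The shared-state idea can be illustrated with a plain map. This sketch does not use LangChain4j's AgenticScope; the Agent record and runSequence method are hypothetical stand-ins that only mimic the read-keys/write-key contract described above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class ScopeSketch {

    // Hypothetical agent: reads its declared keys from the scope and writes one output key.
    record Agent(String name, List<String> inputKeys, String outputKey,
                 Function<Map<String, String>, String> body) {}

    // Runs agents in order; each one sees only the keys it declared as inputs.
    static Map<String, String> runSequence(List<Agent> agents, String input) {
        Map<String, String> scope = new LinkedHashMap<>();
        scope.put("input", input);
        for (Agent agent : agents) {
            Map<String, String> visible = new LinkedHashMap<>();
            for (String key : agent.inputKeys()) {
                if (!scope.containsKey(key)) {
                    throw new IllegalStateException(agent.name() + " needs missing key: " + key);
                }
                visible.put(key, scope.get(key));
            }
            scope.put(agent.outputKey(), agent.body().apply(visible));
        }
        return scope;
    }

    public static void main(String[] args) {
        List<Agent> agents = List.of(
                new Agent("editor", List.of("input"), "brief", s -> "brief: " + s.get("input")),
                new Agent("search", List.of("brief"), "research", s -> "events for " + s.get("brief")),
                // The third agent reads the first agent's output directly, not via the second.
                new Agent("output", List.of("brief", "research"), "newsletter",
                        s -> s.get("brief") + " | " + s.get("research")));
        System.out.println(runSequence(agents, "runs near Casablanca").get("newsletter"));
    }
}
```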

Tool System — Empowering the LLM

Tools are what turn an LLM from a text generator into an agent that can act on the world. AI CLI ships with 11 callable tools, each guarded by a @Conditional annotation so it only loads when requested.

[Diagram: tool activation flow]

How Spring `@Conditional` Actually Works

The @Conditional annotation is one of Spring's most powerful mechanisms, and it runs very early in the application lifecycle — during the bean definition phase, before any bean is actually instantiated.

Here's the contract: you implement org.springframework.context.annotation.Condition, which gives you a single method, matches(). Spring calls this method while processing your @Component classes and @Bean methods. If matches() returns false, the bean is never registered in the application context. It doesn't exist. No constructor is called, no dependencies are wired, no memory is allocated.

This is what our SearchEnabledCondition looks like:

public class SearchEnabledCondition implements Condition {

    @Override
    public boolean matches(ConditionContext context, AnnotatedTypeMetadata metadata) {

        // Access the already-parsed CLI arguments from Spring's bean factory
        ApplicationArguments aa = context.getBeanFactory()
                .getBean(ApplicationArguments.class);

        // Resolve which tools the user requested via --tools=...
        List<Tool> tools = ContextUtils.resolveRequestedTools(
                aa, context.getEnvironment());

        // Only activate if "search" or "rag_search" was requested
        return Optional.ofNullable(tools)
                .filter(l -> l.contains(Tool.SEARCH) || l.contains(Tool.RAG_SEARCH))
                .isPresent();
    }
}

Key things to notice:

The ConditionContext — Spring gives you access to the BeanFactory, the Environment, the ClassLoader, and the ResourceLoader. You can inspect anything about the current application state.
It runs before instantiation — this is not a runtime check. When matches() returns false, the WebSearchTool class is never constructed. This means its dependencies (the WebSearchEngine bean, for example) also don't need to exist.
CLI → Condition → Bean graph — PicoCLI parses --tools=search,rag, Spring stores the arguments as ApplicationArguments, and the Condition reads them to decide which beans to load. This is how a single CLI flag reshapes the entire Spring context.

You then annotate the tool with it:

@Component
@Conditional(SearchEnabledCondition.class)
public class WebSearchTool {
    // Only registered if --tools contains "search" or "rag_search"
}

Every tool in AI CLI follows this pattern. The result is that the bean graph is perfectly tailored to whatever the user requested — no unused beans, no wasted connections, no accidental API calls.

Tool	Description
`search`	Web search via Tavily, Google, or DuckDuckGo
`search_social_media`	Targeted `site:` queries for Instagram, Facebook, TikTok
`content_crawler`	Full-page extraction via Playwright + Stealth4j (JS-rendered pages)
`now`	Current date/time in a timezone
`json_schema_validate`	Draft 2020-12 JSON Schema validation with actionable diagnostics
`html_validate`	Newsletter/email HTML safety and compatibility checks
`markdown_validate`	GFM-aware Markdown syntax validation
`apify_instagram_scraper`	Instagram profile scraping via Apify actors
`apify_facebook_scraper`	Facebook page scraping via Apify actors
`distance`	Geocoding + Haversine distance between cities (OpenStreetMap Nominatim)
`markdown_security`	URL extraction + Google Web Risk scanning (malware, phishing)
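The distance tool in the table above is described as geocoding plus Haversine. The Haversine part is a self-contained formula, sketched below. The coordinates in main are approximate values chosen for illustration, and the mean Earth radius of 6371 km is the usual convention; the geocoding step via Nominatim is not shown.

```java
class HaversineSketch {

    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    // Great-circle distance between two points given in decimal degrees.
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // Approximate coordinates for Casablanca and Rabat (illustrative values).
        double km = distanceKm(33.5731, -7.5898, 34.0209, -6.8416);
        System.out.printf("Casablanca to Rabat: %.1f km%n", km);
    }
}
```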

Here is how the WebSearchTool is implemented — note the constructor injection, rate limiting check, query sanitization, and auto-ingestion into the RAG store:

@Component
@Conditional(SearchEnabledCondition.class)
public class WebSearchTool {

    private final WebSearchEngine webSearchEngine;
    private final Optional<IngestionService> ingestionService;
    private final ToolRateLimiter rateLimiter;

    // Constructor injection...

    @Tool("Performs a web search to find relevant information.")
    public List<WebSearchOrganicResult> search(String query) {
        var limitReached = rateLimiter.tryAcquire("search");
        if (limitReached.isPresent()) {
            return limitReached.get();
        }

        var sanitizedQuery = sanitize(query);
        var results = webSearchEngine.search(sanitizedQuery);

        // Auto-ingest into RAG store if IngestionService is available
        ingestionService.ifPresent(service ->
                service.ingestSearchResults(results.results()));

        return results.results();
    }
}

The auto-ingestion is the bridge between the Tool layer and the RAG layer — every search result is automatically embedded into the vector store for retrieval during the same run or future runs. This is the self-feeding loop that makes the system progressively more informed.

That said, this approach has a clear tradeoff: ingesting everything makes the RAG layer noisy over time. Not every search result is relevant, and low-quality pages dilute the store, making retrieval less precise. Some ideas for future releases:

Reranker as a pre-ingestion gate — the scoring model is already wired for retrieval. Running search results through it before ingesting and only keeping segments above a relevance threshold would filter out noise at the source.
TTL-based expiration — the ingestion_timestamp metadata is already stored. Adding a time-to-live filter at retrieval time would let stale entries (e.g., past events) naturally age out of the results.

For now, the current approach works well enough for weekly newsletters where the store is rebuilt frequently. But for long-running stores, smarter ingestion filtering will be necessary.
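Both ideas reduce to simple filters once a relevance score and an ingestion timestamp are available. The sketch below assumes a hypothetical Segment record that already carries a score from the reranker; the thresholds and the TTL are illustrative, not values used by the project.

```java
import java.time.Duration;
import java.util.List;

class IngestionFilterSketch {

    // Hypothetical view of a candidate segment: text, reranker score, ingestion time.
    record Segment(String text, double relevanceScore, long ingestionTimestampMillis) {}

    // Pre-ingestion gate: keep only segments whose score reaches the threshold.
    static List<Segment> gate(List<Segment> candidates, double minScore) {
        return candidates.stream()
                .filter(segment -> segment.relevanceScore() >= minScore)
                .toList();
    }

    // Retrieval-time TTL: drop segments ingested more than ttl before nowMillis.
    static List<Segment> fresh(List<Segment> stored, Duration ttl, long nowMillis) {
        return stored.stream()
                .filter(segment -> nowMillis - segment.ingestionTimestampMillis() <= ttl.toMillis())
                .toList();
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        List<Segment> candidates = List.of(
                new Segment("Rabat Marathon registration is open", 0.82, now),
                new Segment("Cookie banner text", 0.05, now));
        System.out.println(gate(candidates, 0.3).size() + " of " + candidates.size() + " kept");
    }
}
```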

The Infinite Tool-Loop Caveat

One of the hardest production lessons: LLMs obsessively retry failing tools. A confused model can enter an infinite loop of calling search("running events Morocco") hundreds of times, draining your API budget in minutes.

The ToolRateLimiter is the deterministic Java boundary that stops this:

@Component
public class ToolRateLimiter {

    private final ConcurrentHashMap<String, AtomicInteger> counters = new ConcurrentHashMap<>();

    public Optional<List<WebSearchOrganicResult>> tryAcquire(String toolName) {
        int limit = resolveLimit(toolName);
        AtomicInteger counter = counters.computeIfAbsent(toolName, k -> new AtomicInteger(0));

        if (counter.get() >= limit) {
            return Optional.of(Collections.singletonList(new WebSearchOrganicResult(
                    "SYSTEM", URI.create("https://system"), "LIMIT_REACHED",
                    "SYSTEM NOTIFICATION: You have reached the maximum number of allowed "
                    + toolName + " calls. Do not search again.")));
        }

        counter.incrementAndGet();
        return Optional.empty();
    }
}

Two levels of defense:

Soft limit (per-tool): The ToolRateLimiter returns a LIMIT_REACHED sentinel response — the LLM reads this as a signal to stop.
Hard limit (global): LangChain4j's maxSequentialToolsInvocations() throws a Java exception and forcefully terminates if the LLM keeps calling tools beyond the hard ceiling.

The soft limit gives the LLM a chance to wrap up with the results it already has; the hard limit is the kill switch.
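The two levels can be exercised together in a few lines. In this sketch the hard limit is emulated with a plain global counter, whereas the project delegates it to LangChain4j's maxSequentialToolsInvocations(); the class name, the String sentinel, and the limits are assumptions. Using a single incrementAndGet() keeps the check and the count atomic if tools are ever invoked concurrently.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class ToolLimitSketch {

    private final ConcurrentHashMap<String, AtomicInteger> perTool = new ConcurrentHashMap<>();
    private final AtomicInteger totalCalls = new AtomicInteger();
    private final int softLimit; // per tool: answered with a sentinel message
    private final int hardLimit; // global: answered with an exception

    ToolLimitSketch(int softLimit, int hardLimit) {
        this.softLimit = softLimit;
        this.hardLimit = hardLimit;
    }

    // Empty means "go ahead"; a present value is the sentinel to hand back to the LLM.
    Optional<String> tryAcquire(String toolName) {
        if (totalCalls.incrementAndGet() > hardLimit) {
            throw new IllegalStateException("Hard limit of " + hardLimit + " tool calls exceeded");
        }
        int used = perTool.computeIfAbsent(toolName, key -> new AtomicInteger()).incrementAndGet();
        return used > softLimit
                ? Optional.of("LIMIT_REACHED: do not call " + toolName + " again.")
                : Optional.empty();
    }

    public static void main(String[] args) {
        ToolLimitSketch limiter = new ToolLimitSketch(2, 4);
        for (int call = 1; call <= 4; call++) {
            System.out.println("call " + call + ": " + limiter.tryAcquire("search").orElse("allowed"));
        }
    }
}
```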

RAG — The Self-Feeding Intelligence Layer

RAG (Retrieval-Augmented Generation) in AI CLI is not a simple "embed some files and query them." It's a carefully designed two-phase pipeline that feeds itself.

Phase 1: Ingestion

Documents enter the embedding store through two paths:

Static data files (--data=docs/) — loaded at startup
Dynamic web search results — auto-ingested during execution by the WebSearchTool

Every document goes through the same pipeline:

Documents → ConfigurableDocumentSplitter → MetadataEnrichedTextSegmentTransformer → EmbeddingStoreIngestor → VectorStore

The MetadataEnrichedTextSegmentTransformer is where the magic happens. It doesn't just store the text — it enriches every segment with contextual labels:

@Component
public class MetadataEnrichedTextSegmentTransformer implements TextSegmentTransformer {

    @Override
    public TextSegment transform(TextSegment textSegment) {
        var metadata = textSegment.metadata().copy();
        String originalText = textSegment.text();

        // Store statistics and timestamps
        metadata.put("original_text", originalText);
        metadata.put("character_count", String.valueOf(originalText.length()));
        metadata.put("ingestion_timestamp", String.valueOf(System.currentTimeMillis()));

        // Contextualize: prepend filename or title to the embedded text
        String contextPrefix = "";
        if (metadata.containsKey("file_name")) {
            contextPrefix = metadata.getString("file_name") + "\n";
        } else if (metadata.containsKey("title")) {
            contextPrefix = "Title: " + metadata.getString("title") + "\n";
        }

        return TextSegment.from(contextPrefix + originalText, metadata);
    }
}

Why prepend the filename or title to the embedded text? Because embedding models retrieve based on semantic similarity, and a naked paragraph of text about "registration opens April 15" is meaningless without the context of which event it belongs to. The label grounds the embedding.

For web search results, the IngestionService also deduplicates by URL before ingesting: it queries the store with a metadata filter to check whether the URL has already been ingested:

private boolean exists(String url) {
    var request = EmbeddingSearchRequest.builder()
            .queryEmbedding(embeddingModel.embed(url).content())
            .filter(MetadataFilterBuilder.metadataKey("url").isEqualTo(url))
            .maxResults(1)
            .build();
    return !embeddingStore.search(request).matches().isEmpty();
}

Phase 2: Retrieval

When a stage requests RAG (via tools: [rag] in front matter), the ContentRetrievalService builds a DefaultRetrievalAugmentor:

Single retriever (just rag) → direct attachment for efficiency
Multiple retrievers (rag + rag_search) → LanguageModelQueryRouter with LLM-based routing and ROUTE_TO_ALL fallback
Reranking (when rerank is active) → ReRankingContentAggregator with a minScore(0.3) threshold

Reranking is essential in production. Embedding similarity alone returns many segments, but not all are relevant. The reranker (Voyage AI API or a local ONNX model) re-scores the results based on semantic relevance to the actual query, filtering out noise.
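The effect of a score threshold is easy to see with a toy scorer. The sketch below is not the Voyage AI or ONNX reranker: its score() is a crude word-overlap measure invented for illustration, and only the filter-then-sort shape and the 0.3 threshold mirror the text above.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RerankSketch {

    record Scored(String segment, double score) {}

    // Toy scorer (assumption): fraction of query words that also appear in the segment.
    static double score(String query, String segment) {
        Set<String> segmentWords =
                new HashSet<>(Arrays.asList(segment.toLowerCase().split("\\W+")));
        String[] queryWords = query.toLowerCase().split("\\W+");
        long hits = Arrays.stream(queryWords).filter(segmentWords::contains).count();
        return queryWords.length == 0 ? 0.0 : (double) hits / queryWords.length;
    }

    // Scores every segment, drops those below minScore, and returns the rest best-first.
    static List<Scored> rerank(String query, List<String> segments, double minScore) {
        return segments.stream()
                .map(segment -> new Scored(segment, score(query, segment)))
                .filter(scored -> scored.score() >= minScore)
                .sorted(Comparator.comparingDouble(Scored::score).reversed())
                .toList();
    }

    public static void main(String[] args) {
        List<String> segments = List.of(
                "Weather in Paris today",
                "Marathon training tips",
                "Casablanca marathon registration opens April 15");
        rerank("marathon Casablanca registration", segments, 0.3)
                .forEach(scored -> System.out.println(scored.score() + "  " + scored.segment()));
    }
}
```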

This two-phase architecture means the RAG layer is not static. It grows during every pipeline run as web search results are auto-ingested, and future queries benefit from the enriched store. It's a self-improving loop.

Output System & Newsletter Delivery

After the pipeline finishes, the result needs to go somewhere. The OutputServiceProvider dispatches to the right handler based on the --output-mode flag:

Mode	Handler	Flow
`new_file`	`NewFileOutputService`	Write to a new timestamped file
`replace_file`	`ReplaceFileOutputService`	Overwrite the input file
`mail`	`MailOutputService`	Markdown → HTML (Commonmark with GFM tables) → MailerSend email
`buttondown`	`ButtondownOutputService`	Raw Markdown → Buttondown newsletter API

[Diagram: output dispatch flow]

The mail flow is particularly interesting: Commonmark parses the Markdown, renders it to HTML with GFM table extensions, and the MailerSend SDK delivers it to configured recipients. All automated, all from a cron-triggered GitLab CI job.

The CI/CD pipeline for newsletters is straightforward. [Diagram: newsletter CI/CD pipeline]

Each newsletter is a separate CI job with its own transformation directory, tools, and delivery configuration. Adding a new newsletter is creating a new prompt directory and a new CI job — nothing else.

Production Realities — Hard-Won Lessons

This is the most important section. Building an LLM application is easy. Keeping it running reliably in production is hard.

Context Pollution

When multiple stages share the same chat memory, Stage 2 sees all of Stage 1's tool calls — including failed attempts, retries, and debugging noise. The LLM starts hallucinating based on stale tool-call artifacts.

Fix: Every stage gets its own isolated InMemoryChatMemoryStore. Stage 2 never sees Stage 1's conversations.

Tool Isolation

The generation agent (Stage 2) must never have access to web-scraping or search tools. If it does, it will try to "verify" its own output by searching, find contradictory results, and enter a loop of self-correction that produces garbage.

Fix: Front matter tool declarations per stage. The researcher has search, crawler, apify. The writer has markdown_validate. They never overlap.

Hybrid Predictability

LLMs are non-deterministic. But newsletters need consistent structure. The solution is deterministic Java guardrails around non-deterministic LLM output:

JSON Schema validation (json_schema_validate) enforces the exact structure of Stage 1's output
Markdown validation (markdown_validate) catches malformed formatting before delivery
Security scanning (markdown_security) checks every URL against Google Web Risk before the email goes out

The LLM is creative. Java is the enforcer.
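As one example of such a deterministic guardrail, the URL-extraction half of the security scan can be sketched with a regular expression. The pattern below is an illustrative assumption and not the project's implementation; the call to Google Web Risk is not shown.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MarkdownUrlSketch {

    // Matches http(s) URLs up to whitespace or a closing ), ], >, or quote character.
    private static final Pattern URL = Pattern.compile("https?://[^\\s)\\]>\"']+");

    // Returns each distinct URL once, in order of first appearance.
    static Set<String> extractUrls(String markdown) {
        Set<String> urls = new LinkedHashSet<>();
        Matcher matcher = URL.matcher(markdown);
        while (matcher.find()) {
            urls.add(matcher.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        String markdown = "- [Rabat Marathon](https://example.org/rabat)\n"
                + "- Details: <https://example.com/a?b=1>";
        extractUrls(markdown).forEach(System.out::println);
    }
}
```

Every URL returned here would then be checked before delivery, so a link never reaches a reader's inbox unscanned.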

The Refeed Loop

The best newsletter outputs are validated, curated, and structurally sound. By ingesting these outputs back into the vector database, future generations are improved — they can reference previous editions as examples of good formatting, successful event verification, and proper structure. The system improves itself.

Testing — The Quest for the Right Model

The Model Testing Journey

We tested multiple models across different providers and scenarios.

Tool calling turned out to be a bit of a challenge. The LLM doesn't just generate text — it needs to decide when to call tools, how to interpret the results, and when to stop. Not all models handle this well.

Some models would enter infinite tool loops, calling search 200 times with the same query. Others would skip the validation tool before returning their final answer. Some would ignore the JSON schema entirely.

We tested across OpenAI, Gemini, DeepSeek, Groq, and local Ollama models. For local zero-cost testing, Qwen 3 (1.7B) turned out to be well suited for the job. At 1.7 billion parameters it runs fast on local machines, and it was fairly consistent with tool calling — it follows structured prompts, calls tools in the right order, and respects validation cycles. It became the model powering our entire zero-cost test suite.

Zero-Cost Integration Testing

The project runs 43 integration tests — all via shell scripts, no unit tests. The entire test suite runs against a local stack that costs exactly $0:

Production (Paid)	Zero-Cost Alternative
OpenAI / DeepSeek chat	Ollama `qwen3:1.7b`
Voyage AI embeddings	Ollama `mxbai-embed-large`
Google Custom Search	DuckDuckGo / Stub engine
Qdrant Cloud vector store	DuckDB (in-process)
Voyage AI reranker	ONNX `ms-marco-mini-l6-v2`

This is possible because of the factory pattern. The same code, same tests, different beans. Swapping --chat-model=gpt-5 for --chat-model=qwen3:1.7b loads a completely different Spring context with zero code changes.

# Full zero-cost suite (43 tests)
./scripts/test_integration_ollama.sh

# Dedicated agentic workflow coverage
./scripts/test_transform_agentic_ollama.sh

The tests cover: basic transformations, RAG, data ingestion, search tools, social media search, content crawler (JS-rendered pages), tool execution limits, validation tools (JSON, HTML, Markdown), reranking (ONNX), and output modes.

Assertion Strategy: Structure Over Content

LLM output is non-deterministic, so exact-string assertions are a recipe for flaky tests. Instead, we assert on structure:

"Did it return valid JSON with the required fields?"
"Does the output contain at least one event?"
"Was the validation tool called before the final answer?"

This approach gives us reliable CI/CD without fighting non-determinism.
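The project's assertions live in shell scripts; the same idea is sketched here in Java to stay in one language. The rules below (one level-1 heading, at least one bullet containing a link) are illustrative assumptions about what a well-formed newsletter looks like, and a JSON check would normally rely on a JSON library rather than string matching.

```java
import java.util.regex.Pattern;

class StructureAssertionSketch {

    private static final Pattern TITLE = Pattern.compile("(?m)^# \\S");

    // True when the Markdown has at least one level-1 heading.
    static boolean hasTitle(String markdown) {
        return TITLE.matcher(markdown).find();
    }

    // Counts bullet lines that contain a Markdown link to an http(s) URL.
    static long eventCount(String markdown) {
        return markdown.lines()
                .filter(line -> line.strip().startsWith("- ") && line.contains("](http"))
                .count();
    }

    // Structural check: wording may vary between runs, the shape may not.
    static boolean looksValid(String markdown) {
        return hasTitle(markdown) && eventCount(markdown) >= 1;
    }

    public static void main(String[] args) {
        String runA = "# Morocco Run Radar\n\n- [Rabat Marathon](https://example.org/rabat)\n";
        String runB = "# Weekly Runs\n\nIntro.\n\n- [Agadir 10K](https://example.org/agadir)\n";
        System.out.println(looksValid(runA) && looksValid(runB));
    }
}
```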

The Real Output — Live Newsletters

This is not a demo. These newsletters run on a schedule and land in real inboxes.

Morocco Run Radar: Every week, the pipeline researches all upcoming running events in Morocco through web search, Instagram scraping, Facebook scraping, and official race websites. It computes the distance of each event from Casablanca, validates the data against a JSON schema, formats it into a Markdown newsletter, scans every URL for malware, and delivers it via email.

IT Events Casablanca: Targets tech professionals in Casablanca with upcoming meetups, conferences, and workshops.

Assistant Professor Jobs: Monitors academic job openings matching specific criteria and delivers a curated digest.

Each pipeline follows the same 2-stage (classic) or 3-agent (agentic) architecture. The prompt files are the only difference.

What's Next

The engine works, the newsletters ship, and the architecture holds up. But there is plenty of room to grow.

Multimodality

Right now, AI CLI operates in a text-only world. The LLM reads text prompts, searches text results, and produces text output. But the real world is not text-only — race organizers post flyers as images, trail maps are PDFs, and results are sometimes scanned documents. Introducing multimodal inputs (image understanding, PDF extraction) would let the pipeline process these richer sources directly rather than relying on whatever text happens to be on the webpage.

Streaming & Batch

The current pipeline runs synchronously — the LLM generates its full response before anything is returned. For long-running agentic workflows, streaming would provide real-time feedback and reduce perceived latency. On the other end of the spectrum, batch APIs would allow high-volume processing (e.g., ingesting hundreds of data sources at once) at reduced cost, since most providers offer batch endpoints at a discount.

Stateful Newsletters — Delta Reporting

Weekly readers lose engagement when they see the same 50 events repeated. The next evolution is giving the pipeline memory across runs by injecting the previous edition's structured JSON into the agentic scope. The agents can then highlight what changed: "3 new marathons added," "Rabat Marathon now sold out," or "Early bird registration ends tomorrow." This shifts the newsletter from a static list to a living update.
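At its core, delta reporting is a comparison between two snapshots. The sketch below assumes each edition can be reduced to a map from event name to status; that representation and the message wording are hypothetical, since the feature is described here only as a future direction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DeltaReportSketch {

    // Compares two edition snapshots (event name -> status) and lists what changed.
    static List<String> delta(Map<String, String> previous, Map<String, String> current) {
        List<String> changes = new ArrayList<>();
        // TreeMap gives a stable, alphabetical order regardless of the input map type.
        new TreeMap<>(current).forEach((name, status) -> {
            String before = previous.get(name);
            if (before == null) {
                changes.add("NEW: " + name);
            } else if (!before.equals(status)) {
                changes.add("CHANGED: " + name + " is now " + status);
            }
        });
        new TreeMap<>(previous).keySet().stream()
                .filter(name -> !current.containsKey(name))
                .forEach(name -> changes.add("REMOVED: " + name));
        return changes;
    }

    public static void main(String[] args) {
        Map<String, String> lastWeek = Map.of("Rabat Marathon", "open", "Agadir 10K", "open");
        Map<String, String> thisWeek = Map.of("Rabat Marathon", "sold out", "Casablanca Trail", "open");
        delta(lastWeek, thisWeek).forEach(System.out::println);
    }
}
```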

Smarter Ingestion

As discussed in the RAG section, blindly ingesting all search results makes the vector store noisy over time. Future iterations could introduce relevance-scored ingestion (using the reranker as a gate), TTL-based expiration, or even limiting ingestion to only the final validated output — so the RAG layer learns from curated content, not raw web noise.

Beyond Sequential — Other Workflow Patterns

It's worth being honest about what this engine is and what it isn't. Everything described in this article uses sequential workflows — agents execute one after another in a fixed order, passing state forward. This was the right choice for newsletters: there is a natural pipeline from research → editorial → formatting → delivery, and sequence gives you predictability and debuggability.

But sequential is not the only pattern, and it's not always the best one. LangChain4j supports several others that would suit different use cases:

Parallel workflows — multiple agents running concurrently. Useful when you have independent research tasks (e.g., one agent searches web, another scrapes social media, a third queries a database) and want to merge results at the end.
Loop workflows — self-correcting agents that iterate until a quality threshold is met. Instead of validating once and hoping, the agent retries with feedback until the output passes.
Supervisor workflows — a manager agent that dynamically decides which worker agents to call, in what order, and how many times. This is the most powerful pattern — it handles complex, branching tasks where the right next step depends on what was learned so far.
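The first two patterns can be approximated with JDK primitives alone. This sketch does not use LangChain4j's workflow builders: agents are modeled as plain suppliers and functions, and the method names are hypothetical. The supervisor pattern is left out because it depends on an LLM making the routing decision.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.Supplier;

class WorkflowPatternsSketch {

    // Parallel: start all independent tasks, then collect results in declaration order.
    static List<String> parallel(List<Supplier<String>> tasks) {
        List<CompletableFuture<String>> futures = tasks.stream()
                .map(task -> CompletableFuture.supplyAsync(task))
                .toList();
        return futures.stream().map(CompletableFuture::join).toList();
    }

    // Loop: refine the draft until the validator accepts it or maxIterations is reached.
    static String loop(String draft, Function<String, String> refine,
                       Predicate<String> valid, int maxIterations) {
        String current = draft;
        for (int i = 0; i < maxIterations && !valid.test(current); i++) {
            current = refine.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<Supplier<String>> research = List.of(
                () -> "web results", () -> "social results", () -> "database results");
        System.out.println(parallel(research));
        System.out.println(loop("draft", text -> text + "!", text -> text.endsWith("!!"), 5));
    }
}
```

The maxIterations bound in loop() plays the same role as the hard tool-call limit described earlier: a deterministic ceiling around a non-deterministic process.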

For a newsletter engine, sequential gives us exactly the control we need. But if the use case grows into something more complex — say, a real-time research assistant that needs to decide dynamically whether to search, crawl, or ask a follow-up question — a supervisor or loop pattern would be the right tool for the job. The factory-based architecture makes that transition straightforward: the workflow pattern is just another bean.

Downloads — See It in Action

Live Presentation

LangChain4j in Action: A Walkthrough from Basic Chaining to Agentic Workflows (PDF)

Newsletter Examples (Real Output)

Morocco Run Radar — Agentic Edition (April 2026)

IT Events Casablanca Newsletter (April 2026)

Assistant Professor Jobs Newsletter (April 2026)

13 April 2026

Iqbal´s DLQ Help
