Our initial use case was simply to prompt the LLM and chain transformations of a Markdown article. Here we will enhance the .topic file, which acts as a parent to the articles, with new articles and their summaries:
The idea is to generate a summary of each article, build a simple CSV index from those summaries, and use an embedding model to pass them as context to the LLM.
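Concretely, the index is a two-column, semicolon-separated CSV with one row per article. The rows below are made up purely for illustration; the real summaries are produced by the LLM at runtime:
filename;summary
Getting-Started.md;Explains how to set up the project and run a first prompt against the chat model.
Chaining-Prompts.md;Shows how to chain LLM transformations over a single Markdown article.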
This, alongside the original topic file, will allow us to continuously rebuild the index you see here:
The Process
We will focus on the green steps; the earlier ones are plain LLM prompting, which we covered in the previous article.
The Code
Setting the model properties for autowiring:
OpenAI
#OPENAI Chat Model
langchain4j.open-ai.chat-model.api-key=${OPENAI_API_KEY}
langchain4j.open-ai.chat-model.model-name=gpt-4o-mini
langchain4j.open-ai.chat-model.log-requests=true
langchain4j.open-ai.chat-model.log-responses=true
langchain4j.open-ai.chat-model.timeout=1h
#OPENAI Embedding Model
langchain4j.open-ai.embedding-model.api-key=${OPENAI_API_KEY}
langchain4j.open-ai.embedding-model.model-name=text-embedding-3-small
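The LangChain4j OpenAI starter reads these properties and registers the corresponding beans, so both models can be injected wherever we need them. A minimal sketch of what the wiring gives us (the class below is hypothetical, purely for illustration):
package com.iqbalaissaoui.config;

import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.embedding.EmbeddingModel;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class ModelWiringCheck {

    // backed by gpt-4o-mini, per the properties above
    @Autowired
    private ChatLanguageModel chatLanguageModel;

    // backed by text-embedding-3-small, per the properties above
    @Autowired
    private EmbeddingModel embeddingModel;

    public void print() {
        System.out.println("chat model = " + chatLanguageModel);
        System.out.println("embedding model = " + embeddingModel);
    }
}
With the models in place, the runner below validates the command-line argument and hands off to the service: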
package com.iqbalaissaoui.runners;
import com.iqbalaissaoui.services.XMLTopicRefinerService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.context.annotation.Profile;
import org.springframework.stereotype.Component;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import static com.iqbalaissaoui.assistants.WriterSideConstants.WRITERSIDE_TOPICS;
@Component
@Profile("single & xml")
public class XmlTopicGeneratorRunner implements CommandLineRunner {
@Autowired
private XMLTopicRefinerService xmlTopicRefinerService;
@Override
public void run(String... varargs) throws IOException {
System.out.println("XmlTopicGeneratorRunner.run");
// check that at least one argument (the topic file) was provided, or throw IllegalArgumentException
Optional.of(varargs)
.filter(args -> args.length == 0)
.ifPresent(args -> {
throw new IllegalArgumentException("Please provide a topic file as an argument");
});
// resolve the argument against the WriterSide topics directory
Path topic = WRITERSIDE_TOPICS.resolve(varargs[0]);
// check if the file exists or throw illegalarg
Optional.of(Files.exists(topic))
.filter(Boolean.FALSE::equals)
.ifPresent(b -> {
throw new IllegalArgumentException("The file does not exist");
});
// another check if the file is a topic file
Optional.of(topic)
.filter(path -> !path.getFileName().toString().endsWith(".topic"))
.ifPresent(path -> {
throw new IllegalArgumentException("The file is not a topic file");
});
xmlTopicRefinerService.refine(topic);
}
}
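Note that both the single and xml profiles must be active for this runner to be picked up. With the Spring Boot Maven plugin, one possible invocation (Home.topic being a placeholder for your own topic file) is: mvn spring-boot:run -Dspring-boot.run.profiles=single,xml -Dspring-boot.run.arguments=Home.topic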
The Service:
This is where we summarize the articles, build the index, and use the embedding model to provide context to the LLM:
generating the index file from the summaries
Here we run inference concurrently, via a parallel stream, to summarize all the articles and generate a simple CSV:
Summarizer AI Service
package com.iqbalaissaoui.assistants;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.spring.AiService;
@AiService
public interface WriterSideMarkDownSumerizer {
String SYSTEM_PROMPT = """
Summarize the key points of the content in a concise and clear manner, keeping the length suitable for the summary attribute.
Do not exceed 30 words and do not include new lines or special characters.
""";
@SystemMessage(SYSTEM_PROMPT)
String chat(String userMessage);
}
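LangChain4j's Spring Boot starter detects interfaces annotated with @AiService and registers a proxy implementation backed by the auto-configured chat model, which is why the service below can simply autowire WriterSideMarkDownSumerizer.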
index generation and inference
// create an index of the topics
// hinting to the LLM to use valid hrefs
Path index = Files.createTempFile("index", ".csv");
index.toFile().deleteOnExit();
Files.writeString(index, "filename;summary");
Files.writeString(index, System.lineSeparator());
List<String> indexLines = mds.stream()
.parallel()
.peek(md -> System.out.println("Processing file: " + md))
.map(p -> {
try {
return Map.entry(p.getFileName().toString(), writerSideMarkDownSumerizer.chat(Files.readString(p)));
} catch (IOException e) {
throw new RuntimeException(e);
}
}).map(e -> e.getKey() + ";" + e.getValue())
.toList();
indexLines.forEach(s -> {
try {
Files.writeString(index, s + System.lineSeparator(), StandardOpenOption.APPEND);
} catch (IOException e) {
throw new RuntimeException(e);
}
});
System.out.println("index = " + Files.readString(index));
//load the documents
List<Document> documents = new ArrayList<>();
//load index as well
documents.add(FileSystemDocumentLoader.loadDocument(index));
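Only the CSV index is ingested in this flow. If you also wanted the retriever to see the full article bodies, each Markdown file could be loaded as well; this optional extension (not part of the original flow) is a one-liner:
//optionally load the markdown articles themselves
mds.forEach(md -> documents.add(FileSystemDocumentLoader.loadDocument(md)));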
creating the embedding store and retriever
This is the snippet where we create the embedding store and retriever:
//create the embedding store
InMemoryEmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();
//ingest the documents with the autowired embedding model
EmbeddingStoreIngestor embeddingStoreIngestor = EmbeddingStoreIngestor.builder()
.embeddingModel(embeddingModel)
.embeddingStore(embeddingStore)
.build();
embeddingStoreIngestor.ingest(documents);
//create the content retriever
EmbeddingStoreContentRetriever embeddingStoreContentRetriever = EmbeddingStoreContentRetriever.builder()
.embeddingModel(embeddingModel)
.embeddingStore(embeddingStore)
.build();
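The retriever works out of the box with LangChain4j's defaults for how many segments it returns. For a larger index you may want to tune it; a minimal sketch, with arbitrary values to adjust to your corpus:
//optional: cap and filter what the retriever hands to the LLM
EmbeddingStoreContentRetriever tunedRetriever = EmbeddingStoreContentRetriever.builder()
        .embeddingModel(embeddingModel)
        .embeddingStore(embeddingStore)
        .maxResults(5)   // number of segments passed as context
        .minScore(0.6)   // drop weakly related segments
        .build();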
interface for the AI Service
This is a plain interface for the AI service. This time we create the implementation programmatically, rather than with the annotation style used in the previous article, so that we can pass in the content retriever backed by our embedding model:
package com.iqbalaissaoui.assistants;
public interface WriterSideXmlTopicGenerator {
String SYSTEM_PROMPT = """
You are an expert in JetBrains WriterSide, a technical documentation tool. Your task is to enhance XML.topic files by integrating references to Markdown (.md) articles provided as input.
### Instructions:
**1. Input:**
- An existing XML.topic file (if available).
- A set of Markdown (.md) files that need to be referenced.
**2. Task:**
- Ensure all provided Markdown files are referenced in the XML.topic file using every filename and every summary as content for the primary section
- Maintain proper XML.topic structure and formatting.
- Improve the existing XML.topic file by integrating missing references while ensuring logical organization.
**3. Output Requirements:**
- Produce a valid, well-formed XML.topic file.
- Ensure all Markdown files are correctly linked.
- Avoid duplicate references.
- Maintain consistent indentation and structure.
**4. Constraints:**
- Reference only the provided Markdown files—no external additions.
- Group related topics logically based on filenames or inferred context when needed.
""";
String chat(String userMessage);
}
inference with the embedding model
As the final step, we build the LangChain4j assistant programmatically, in contrast to the previous article, so that we can attach the chat memory and the content retriever backed by our embedding model:
//create the writer side xml topic generator
WriterSideXmlTopicGenerator writerSideXmlTopicGenerator = AiServices.builder(WriterSideXmlTopicGenerator.class)
.chatLanguageModel(chatLanguageModel)
.systemMessageProvider(
_ -> WriterSideXmlTopicGenerator.SYSTEM_PROMPT
)
.chatMemory(MessageWindowChatMemory.withMaxMessages(10))
.contentRetriever(embeddingStoreContentRetriever)
.build();
String input = Files.readString(parent);
String output = writerSideXmlTopicGenerator.chat(input);
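In the full service below, the returned XML is diffed against the original topic content with StringUtils.difference and then written back to the .topic file.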
Full Service Code
Putting the pieces together.
package com.iqbalaissaoui.services;
import com.iqbalaissaoui.assistants.WriterSideMarkDownSumerizer;
import com.iqbalaissaoui.assistants.WriterSideXmlTopicGenerator;
import com.iqbalaissaoui.utils.FilesService;
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;
import org.apache.commons.lang3.StringUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Profile;
import org.springframework.stereotype.Service;
import org.xml.sax.SAXException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathExpressionException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
@Service
@Profile("xml")
public class XMLTopicRefinerService {
@Autowired
private WriterSideMarkDownSumerizer writerSideMarkDownSumerizer;
@Autowired
private ChatLanguageModel chatLanguageModel;
@Autowired
private EmbeddingModel embeddingModel;
public void refine(Path parent) {
try {
List<Path> mds = FilesService.getMarkDownTopics(parent);
// create an index of the topics
// hinting to the LLM to use valid hrefs
Path index = Files.createTempFile("index", ".csv");
index.toFile().deleteOnExit();
Files.writeString(index, "filename;summary");
Files.writeString(index, System.lineSeparator());
List<String> indexLines = mds.stream()
.parallel()
.peek(md -> System.out.println("Processing file: " + md))
.map(p -> {
try {
return Map.entry(p.getFileName().toString(), writerSideMarkDownSumerizer.chat(Files.readString(p)));
} catch (IOException e) {
throw new RuntimeException(e);
}
}).map(e -> e.getKey() + ";" + e.getValue())
.toList();
indexLines.forEach(s -> {
try {
Files.writeString(index, s + System.lineSeparator(), StandardOpenOption.APPEND);
} catch (IOException e) {
throw new RuntimeException(e);
}
});
System.out.println("index = " + Files.readString(index));
//load the documents
List<Document> documents = new ArrayList<>();
//load index as well
documents.add(FileSystemDocumentLoader.loadDocument(index));
//create the embedding store
InMemoryEmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();
//ingest the documents with the autowired embedding model
EmbeddingStoreIngestor embeddingStoreIngestor = EmbeddingStoreIngestor.builder()
.embeddingModel(embeddingModel)
.embeddingStore(embeddingStore)
.build();
embeddingStoreIngestor.ingest(documents);
//create the content retriever
EmbeddingStoreContentRetriever embeddingStoreContentRetriever = EmbeddingStoreContentRetriever.builder()
.embeddingModel(embeddingModel)
.embeddingStore(embeddingStore)
.build();
//create the WriterSide xml topic generator
WriterSideXmlTopicGenerator writerSideXmlTopicGenerator = AiServices.builder(WriterSideXmlTopicGenerator.class)
.chatLanguageModel(chatLanguageModel)
.systemMessageProvider(
_ -> WriterSideXmlTopicGenerator.SYSTEM_PROMPT
)
.chatMemory(MessageWindowChatMemory.withMaxMessages(10))
.contentRetriever(embeddingStoreContentRetriever)
.build();
String input = Files.readString(parent);
String output = writerSideXmlTopicGenerator.chat(input);
Optional.of(StringUtils.difference(input, output))
.ifPresent(System.out::println);
Files.writeString(parent, output);
} catch (IOException | XPathExpressionException | ParserConfigurationException | SAXException e) {
throw new RuntimeException(e);
}
}
}
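FilesService.getMarkDownTopics(parent) is a small helper not shown in this article; it returns the Markdown articles to reference and, judging by the exceptions handled above, inspects XML along the way. If you want to reproduce the flow, a naive placeholder could simply list the .md files sitting next to the topic file (hypothetical implementation; it ignores whatever XML-based filtering the real helper performs, and only declares the XML exceptions so the catch block above still compiles):
package com.iqbalaissaoui.utils;

import org.xml.sax.SAXException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathExpressionException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

public class FilesService {

    // placeholder: collect every .md file in the same directory as the .topic file
    public static List<Path> getMarkDownTopics(Path parentTopic)
            throws IOException, XPathExpressionException, ParserConfigurationException, SAXException {
        try (Stream<Path> files = Files.list(parentTopic.getParent())) {
            return files
                    .filter(p -> p.getFileName().toString().endsWith(".md"))
                    .toList();
        }
    }
}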