Iqbal's DLQ Help

Simple OpenAI Whisper Java Client

whisperer is a Micronaut CLI that records microphone input until you press Enter, forwards the WAV file to the OpenAI Audio Transcriptions API, then prints and copies the transcript. The project lives in <your-repo>/examples/whisperer and targets macOS users who want a fast, keyboard-friendly dictation workflow.

Why Another Whisper Client?

  • pbcopy integration keeps hands on the keyboard—no manual copy/paste after dictation.

  • GraalVM native image builds let the CLI start instantly and access CoreAudio without a JVM startup delay.

  • The Micronaut runtime keeps the code modular while staying lightweight for a CLI tool.

Command Flow

The entry point is the Picocli-powered command in <your-repo>/examples/whisperer/src/main/java/com/iqbalaissaoui/WhispererCommand.java.

@Command(name = "whisperer",
         description = "Press-to-stop voice capture that sends audio to OpenAI's transcription API")
public class WhispererCommand implements Callable<Integer> {

    @Override
    public Integer call() throws Exception {
        Path tempFile = Files.createTempFile("whisperer-", ".wav");
        audioRecorder.start(tempFile);
        waitForEnter();                      // blocks until the user presses Enter
        audioRecorder.stop();
        audioRecorder.awaitCompletion();     // join the background WAV writer
        String transcript = transcriptionService.transcribe(tempFile);
        System.out.println(transcript);
        clipboardService.copyToClipboard(transcript);
        return 0;
    }
}

Key runtime behavior:

  • Forces java.home when missing so Java Sound can discover CoreAudio providers inside a native image.

  • Shows a "● recording" prompt, waits for Enter, and then joins the background writer thread before calling OpenAI.

  • Sends failures to stderr, prints the temp WAV path for inspection, and cleans up files after a successful clipboard copy.
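The java.home workaround from the first bullet can be sketched as follows. This is a hypothetical reconstruction for illustration: the helper name `ensureJavaHome` and the fallback to the JAVA_HOME environment variable are assumptions, not the project's exact code.

```java
// Sketch: in a GraalVM native image the java.home system property may be
// unset, which prevents Java Sound from locating its CoreAudio providers.
class JavaHomeFixup {

    /** Returns the effective java.home, falling back to $JAVA_HOME when the property is missing. */
    static String ensureJavaHome() {
        String home = System.getProperty("java.home");
        if (home == null || home.isBlank()) {
            // Assumption: JAVA_HOME points at a usable JDK/GraalVM distribution.
            String envHome = System.getenv("JAVA_HOME");
            if (envHome != null && !envHome.isBlank()) {
                System.setProperty("java.home", envHome);
                return envHome;
            }
        }
        return home;
    }

    public static void main(String[] args) {
        System.out.println("java.home = " + ensureJavaHome());
    }
}
```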

Capturing Microphone Audio

AudioRecorder walks through possible sample rates, channels, and endianness to find a working TargetDataLine. It streams audio to WAV using Java Sound’s AudioSystem.

outer:
for (AudioFormat candidate : candidateFormats()) {
    DataLine.Info info = new DataLine.Info(TargetDataLine.class, candidate);
    for (Mixer.Info mixerInfo : mixers) {
        Mixer mixer = AudioSystem.getMixer(mixerInfo);
        if (!mixer.isLineSupported(info)) {
            continue;
        }
        TargetDataLine candidateLine = (TargetDataLine) mixer.getLine(info);
        candidateLine.open(candidate);
        candidateLine.start();
        line = candidateLine;
        selectedFormat = candidate;
        break outer; // first working mixer/format pair wins
    }
}

Highlights:

  • Runs the WAV writer on a dedicated daemon thread so Picocli stays responsive.

  • When the mixer cannot satisfy the requested format, it reopens the line to discover the mixer’s default format.

  • Offers helpful diagnostics if no microphone is available or permissions are missing.
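A minimal sketch of what `candidateFormats()` might enumerate, assuming 16-bit signed PCM and a preference order from CD-quality mono down to a speech-friendly 16 kHz. The exact rates and ordering the real `AudioRecorder` probes are assumptions.

```java
import javax.sound.sampled.AudioFormat;
import java.util.ArrayList;
import java.util.List;

// Sketch: enumerate PCM formats from most to least desirable so the first
// supported TargetDataLine wins in the selection loop above.
class CandidateFormats {

    static List<AudioFormat> candidateFormats() {
        List<AudioFormat> formats = new ArrayList<>();
        for (float rate : new float[] {44_100f, 48_000f, 16_000f}) {
            for (int channels : new int[] {1, 2}) {
                for (boolean bigEndian : new boolean[] {false, true}) {
                    // 16-bit signed PCM is what the WAV writer expects.
                    formats.add(new AudioFormat(rate, 16, channels, true, bigEndian));
                }
            }
        }
        return formats;
    }

    public static void main(String[] args) {
        candidateFormats().forEach(System.out::println);
    }
}
```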

Talking to the OpenAI Audio API

TranscriptionService uses Micronaut’s HttpClient to POST a multipart request to /v1/audio/transcriptions.

MultipartBody body = MultipartBody.builder()
    .addPart("model", model)
    .addPart("response_format", "text")
    .addPart("prompt", prompt)
    .addPart("file", fileName, MediaType.of("audio/wav"), audioBytes)
    .build();

MutableHttpRequest<?> request = HttpRequest.POST(URI.create(API), body)
    .accept(MediaType.TEXT_PLAIN_TYPE)
    .contentType(MediaType.MULTIPART_FORM_DATA_TYPE)
    .header(HttpHeaders.AUTHORIZATION, "Bearer " + apiKey);

Notes:

  • Reads the binary WAV into memory for the multipart upload (typical mic captures are tiny, so this stays lightweight).

  • Accepts overrides via Micronaut properties: openai.model defaults to whisper-1, and openai.prompt defaults to Output in English.

  • Throws a descriptive IllegalStateException when OpenAI responds with non-2xx status codes to surface API errors in the CLI.
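The non-2xx handling in the last bullet boils down to a status check like the one below. `requireSuccess` is a hypothetical helper: the real service works against Micronaut's response types, but the decision it makes reduces to this.

```java
// Sketch: surface OpenAI API errors to the CLI as a descriptive exception.
class StatusCheck {

    /** Returns the body for 2xx responses; throws with status and body otherwise. */
    static String requireSuccess(int status, String body) {
        if (status < 200 || status >= 300) {
            throw new IllegalStateException(
                "OpenAI transcription failed with HTTP " + status + ": " + body);
        }
        return body;
    }

    public static void main(String[] args) {
        System.out.println(requireSuccess(200, "hello world"));
    }
}
```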

Clipboard Integration

ClipboardService shells out to pbcopy, letting the transcript land directly on the macOS clipboard. Any non-zero exit code triggers a warning, making it easy to diagnose if pbcopy is unavailable.
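A sketch of that shell-out, assuming a `ProcessBuilder`-based implementation. The helper is parameterized over the command name purely so it can be exercised on systems without pbcopy; the warning text and method names are assumptions, not the project's exact code.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Sketch: pipe text into a command's stdin and report its exit code.
class Pbcopy {

    static int pipeTo(String command, String text) {
        try {
            Process process = new ProcessBuilder(command).start();
            try (OutputStream stdin = process.getOutputStream()) {
                stdin.write(text.getBytes(StandardCharsets.UTF_8));
            } // closing stdin signals EOF so the command can finish
            return process.waitFor();
        } catch (IOException e) {
            System.err.println("warning: could not run " + command + ": " + e.getMessage());
            return -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }

    /** macOS clipboard copy; warns on any non-zero exit code. */
    static void copyToClipboard(String transcript) {
        int exit = pipeTo("pbcopy", transcript);
        if (exit != 0) {
            System.err.println("warning: pbcopy exited with code " + exit);
        }
    }
}
```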

Running Locally

  1. Export the OpenAI API key:

    export OPENAI_API_KEY=sk-...
  2. Launch the CLI in JVM mode:

    ./mvnw exec:java -Dexec.mainClass=com.iqbalaissaoui.WhispererCommand

    Speak, press Enter, and the transcript appears in both the terminal output and your clipboard.

Building the Native Image

The Maven build is preconfigured for GraalVM via -Dpackaging=native-image.

    ./mvnw -Dpackaging=native-image package
    ./target/whisperer

Tips:

  • Ensure JAVA_HOME or GRAALVM_HOME points to your GraalVM distribution before running the native binary; otherwise Java Sound cannot bootstrap.

  • Use mvn -Pnative package if you prefer Maven’s native profile instead of the packaging flag.

Testing Strategy

WhispererCommandTest runs Picocli inside a Micronaut context marked with Environment.TEST. The command short-circuits before touching the microphone, which keeps CI runs hermetic while still checking that Picocli wiring and the -v flag work.
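The short-circuit can be modeled as a simple environment guard. The names below (`isTestEnvironment`, the `"test"` environment id) are assumptions based on Micronaut's `Environment.TEST` constant; the real command checks its injected context rather than a plain set.

```java
import java.util.Set;

// Sketch: exit before any audio capture when running under the TEST environment,
// so CI never needs a microphone.
class TestGuard {

    static boolean isTestEnvironment(Set<String> activeEnvironments) {
        return activeEnvironments.contains("test");
    }

    static int run(Set<String> activeEnvironments) {
        if (isTestEnvironment(activeEnvironments)) {
            return 0; // hermetic: skip recording, transcription, and clipboard
        }
        // ... real recording flow would start here ...
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("exit=" + run(Set.of("test")));
    }
}
```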

macOS Automator Launcher

To avoid running the binary manually, create an Automator "Quick Action" that executes the bundled whisperer.sh script. Assign it a keyboard shortcut, and macOS will start whisperer and begin recording whenever you trigger the shortcut.

Challenges

  • Language drift: Whisper occasionally ignores the language you speak, especially in multilingual sessions, and eagerly translates English prompts into another language (or vice versa). The CLI already sends an openai.prompt with "Output in English" by default, but that is just a hint—Whisper may still decide to translate.

  • What helps: When you need a strict language, pass the language field in TranscriptionService (for English, en) or override the Micronaut property (./mvnw -Dopenai.language=en). You can also tighten the prompt—e.g., -Dopenai.prompt="Transcribe exactly what you hear in English."—and add a quick post-processing check in the command to detect non-English text and re-run with a more forceful prompt. Those adjustments keep Whisper focused on transcription instead of translation while still letting you fall back to its defaults when you want auto-language detection.
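As a sketch, the conditional language part could be assembled like this before it reaches the multipart builder. The `openai.language` property name and the `formFields` helper are assumptions for illustration; OpenAI's transcription endpoint does accept an optional ISO-639-1 `language` field alongside `model` and `prompt`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: build the transcription form fields, adding "language" only when
// the caller pinned one (null/blank means keep Whisper's auto-detection).
class TranscriptionForm {

    static Map<String, String> formFields(String model, String prompt, String language) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("model", model);
        fields.put("response_format", "text");
        fields.put("prompt", prompt);
        if (language != null && !language.isBlank()) {
            fields.put("language", language); // e.g. "en" forces English transcription
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(formFields("whisper-1", "Output in English", "en"));
    }
}
```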

21 October 2025