Simple OpenAI Whisper Java Client
whisperer is a Micronaut CLI that records microphone input until you press Enter, forwards the WAV file to the OpenAI Audio Transcriptions API, then prints and copies the transcript. The project lives in <your-repo>/examples/whisperer and targets macOS users who want a fast, keyboard-friendly dictation workflow.
Why Another Whisper Client?
- `pbcopy` integration keeps hands on the keyboard: no manual copy/paste after dictation.
- GraalVM native image builds let the CLI start instantly and access CoreAudio without a JVM startup delay.
- The Micronaut runtime keeps the code modular while staying lightweight for a CLI tool.
Command Flow
The entry point is the Picocli-powered command in <your-repo>/examples/whisperer/src/main/java/com/iqbalaissaoui/WhispererCommand.java.
Key runtime behavior:
- Forces `java.home` when missing so Java Sound can discover CoreAudio providers inside a native image.
- Shows a "● recording" prompt, waits for Enter, and then joins the background writer thread before calling OpenAI.
- Sends failures to stderr, prints the temp WAV path for inspection, and cleans up files after a successful clipboard copy.
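A minimal plain-JDK sketch of that prompt/wait/join sequence (simplified from the actual Picocli command; the class and method names here are illustrative, not the project's):

```java
import java.io.IOException;
import java.io.InputStream;

// Sketch of the "show prompt, wait for Enter, join the writer thread" flow.
public class RecordFlow {
    /** Blocks until a newline (or EOF) arrives on in, then stops and joins writer. */
    public static void waitForEnterThenJoin(InputStream in, Thread writer)
            throws IOException, InterruptedException {
        System.out.println("● recording  (press Enter to stop)");
        while (true) {                  // consume input until Enter is pressed
            int c = in.read();
            if (c == -1 || c == '\n') break;
        }
        writer.interrupt();             // signal the background WAV writer to finish
        writer.join();                  // wait until the file is fully flushed
    }
}
```

Joining before the OpenAI call is the important detail: it guarantees the WAV file is complete on disk before it is uploaded.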
Capturing Microphone Audio
`AudioRecorder` walks through possible sample rates, channels, and endianness to find a working `TargetDataLine`. It streams audio to WAV using Java Sound's `AudioSystem`.
Highlights:
- Runs the WAV writer on a dedicated daemon thread so Picocli stays responsive.
- When the mixer cannot satisfy the requested format, it reopens the line to discover the mixer's default format.
- Offers helpful diagnostics if no microphone is available or permissions are missing.
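The format-probing idea can be sketched like this (the exact candidate list and class names are assumptions, not the project's code):

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.TargetDataLine;
import java.util.ArrayList;
import java.util.List;

// Probe common PCM formats until the mixer accepts one.
public class FormatProbe {
    /** Candidate 16-bit PCM formats, highest quality first. */
    public static List<AudioFormat> candidates() {
        List<AudioFormat> formats = new ArrayList<>();
        for (float rate : new float[] {44_100f, 22_050f, 16_000f}) {
            for (int channels : new int[] {1, 2}) {
                for (boolean bigEndian : new boolean[] {false, true}) {
                    formats.add(new AudioFormat(rate, 16, channels, true, bigEndian));
                }
            }
        }
        return formats;
    }

    /** First candidate the default mixer claims to support, or null. */
    public static AudioFormat firstSupported() {
        for (AudioFormat f : candidates()) {
            if (AudioSystem.isLineSupported(new DataLine.Info(TargetDataLine.class, f))) {
                return f;
            }
        }
        return null; // no microphone line: caller should print diagnostics
    }
}
```

A `null` result is the "no microphone or missing permissions" case the diagnostics cover.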
Talking to the OpenAI Audio API
`TranscriptionService` uses Micronaut's `HttpClient` to POST a multipart request to `/v1/audio/transcriptions`.
- Reads the binary WAV into memory for the multipart upload (typical mic captures are tiny, so this stays lightweight).
- Accepts overrides via Micronaut properties: `openai.model` defaults to `whisper-1`, and `openai.prompt` defaults to `Output in English.`
- Throws a descriptive `IllegalStateException` when OpenAI responds with non-2xx status codes to surface API errors in the CLI.
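For illustration, here is a plain-JDK sketch of the multipart payload that request carries (the project itself builds this with Micronaut's client; the part names `model`, `prompt`, and `file` follow the OpenAI API, while the class and method names are invented):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hand-rolled multipart/form-data body for /v1/audio/transcriptions.
public class MultipartSketch {
    public static byte[] body(String boundary, byte[] wav, String model, String prompt)
            throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeField(out, boundary, "model", model);
        writeField(out, boundary, "prompt", prompt);
        // The file part carries the raw WAV bytes read into memory.
        out.write(("--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"file\"; filename=\"audio.wav\"\r\n"
                + "Content-Type: audio/wav\r\n\r\n").getBytes(StandardCharsets.UTF_8));
        out.write(wav);
        out.write(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    private static void writeField(ByteArrayOutputStream out, String boundary,
                                   String name, String value) throws IOException {
        out.write(("--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + name + "\"\r\n\r\n"
                + value + "\r\n").getBytes(StandardCharsets.UTF_8));
    }
}
```

Micronaut's `MultipartBody.builder()` produces the same wire format; the request also needs an `Authorization: Bearer $OPENAI_API_KEY` header.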
Clipboard Integration
`ClipboardService` shells out to `pbcopy`, letting the transcript land directly on the macOS clipboard. Any non-zero exit code triggers a warning, making it easy to diagnose if `pbcopy` is unavailable.
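The shell-out amounts to piping the transcript into the command's stdin. A sketch (the command is parameterized here so it can be exercised off-macOS; the real service hardcodes `pbcopy`):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Pipe text to an external command's stdin and surface a warning on failure.
public class ClipboardSketch {
    public static int copy(String text, List<String> command)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
        try (OutputStream stdin = p.getOutputStream()) {
            stdin.write(text.getBytes(StandardCharsets.UTF_8)); // transcript -> stdin
        }
        int exit = p.waitFor();
        if (exit != 0) {
            System.err.println("warning: " + command.get(0) + " exited with " + exit);
        }
        return exit;
    }
}
```

With `List.of("pbcopy")` this matches the behavior described above: exit 0 means the transcript is on the clipboard, anything else prints a warning.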
Running Locally
Export the OpenAI API key:

```shell
export OPENAI_API_KEY=sk-...
```

Launch the CLI in JVM mode:

```shell
./mvnw exec:exec -Dexec.mainClass=com.iqbalaissaoui.WhispererCommand
```

Speak, press Enter, and the transcript appears in both the terminal output and your clipboard.
Building the Native Image
The Maven build is preconfigured for GraalVM via `-Dpackaging=native-image`.
Tips:
- Ensure `JAVA_HOME` or `GRAALVM_HOME` points to your GraalVM distribution before running the native binary; otherwise Java Sound cannot bootstrap.
- Use `mvn -Pnative package` if you prefer Maven's native profile instead of the packaging flag.
Testing Strategy
`WhispererCommandTest` runs Picocli inside a Micronaut context marked with `Environment.TEST`. The command short-circuits before touching the microphone, which keeps CI runs hermetic while still checking that Picocli wiring and the `-v` flag work.
Source Code
macOS Automator Launcher
To avoid running the binary manually, create an Automator "Quick Action" that executes the bundled `whisperer.sh` script. Assign it a keyboard shortcut, and macOS will start whisperer and begin recording whenever you trigger the shortcut.
Challenges
Language drift: Whisper occasionally ignores the language you speak, especially in multilingual sessions, and eagerly translates English prompts into another language (or vice versa). The CLI already sends an `openai.prompt` of "Output in English" by default, but that is just a hint; Whisper may still decide to translate.

What helps: when you need a strict language, pass the `language` field in `TranscriptionService` (for English, `en`) or override the Micronaut property (`./mvnw -Dopenai.language=en`). You can also tighten the prompt, e.g. `-Dopenai.prompt="Transcribe exactly what you hear in English."`, and add a quick post-processing check in the command to detect non-English text and re-run with a more forceful prompt. Those adjustments keep Whisper focused on transcription instead of translation while still letting you fall back to its defaults when you want auto-language detection.
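That post-processing check can start as a simple character-class heuristic. This sketch is an assumption, not the project's code; it flags transcripts dominated by non-ASCII characters so the command could retry with a stricter prompt:

```java
// Rough heuristic: treat a transcript as likely non-English when more than
// half of its UTF-16 code units fall outside the ASCII range.
public class LanguageGuard {
    public static boolean looksNonEnglish(String transcript) {
        if (transcript == null || transcript.isBlank()) return false;
        long nonAscii = transcript.chars().filter(c -> c > 127).count();
        return nonAscii * 2 > transcript.length();
    }
}
```

This catches drift into non-Latin scripts (CJK, Cyrillic, Arabic) but not into mostly-ASCII languages like Spanish; for those, a language-detection library or a second Whisper call with `language=en` is the more reliable fallback.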