Voice Messages
Kimaki can transcribe Discord voice messages using AI-powered transcription that understands your project context for accurate file names and technical terms.
Overview
When you record a voice message in a Kimaki-linked channel:
- Upload completes: Discord processes the audio
- Kimaki detects the voice attachment
- Transcription starts using Gemini or OpenAI
- Project context is included for accuracy
- Text appears in the thread
- Session starts (unless it's a /queue message)
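The flow above can be sketched in TypeScript as follows. This is a minimal, hypothetical sketch, not Kimaki's code: the message shapes, the isQueued field, and the injected transcribe, postToThread, and startSession callbacks are all assumptions. The only external fact it relies on is Discord's IS_VOICE_MESSAGE message flag (1 << 13). The "project context" step is assumed to happen inside transcribe.

```typescript
// Hypothetical sketch of the voice-message flow; these shapes are NOT
// discord.js types and the callbacks are NOT Kimaki's real functions.

// Discord's IS_VOICE_MESSAGE message flag.
const IS_VOICE_MESSAGE = 1 << 13;

interface VoiceAttachment {
  url: string;
  contentType: string | null;
}

interface IncomingMessage {
  flags: number;
  attachments: VoiceAttachment[];
  isQueued: boolean; // hypothetical: true when sent via /queue
}

interface FlowDeps {
  transcribe: (audioUrl: string) => Promise<string>; // assumed to add project context
  postToThread: (text: string) => Promise<void>;
  startSession: (prompt: string) => Promise<void>;
}

// Returns the voice attachment, or undefined for ordinary messages.
function findVoiceAttachment(msg: IncomingMessage): VoiceAttachment | undefined {
  if ((msg.flags & IS_VOICE_MESSAGE) === 0) return undefined;
  return msg.attachments.find((a) => a.contentType?.startsWith("audio/") ?? false);
}

// Detect -> transcribe -> post text -> start a session unless queued.
// Resolves to the transcription, or undefined if there was no voice attachment.
async function handleVoiceMessage(
  msg: IncomingMessage,
  deps: FlowDeps,
): Promise<string | undefined> {
  const voice = findVoiceAttachment(msg);
  if (!voice) return undefined;
  const text = await deps.transcribe(voice.url);
  await deps.postToThread(text);
  if (!msg.isQueued) {
    await deps.startSession(text);
  }
  return text;
}
```

Injecting the side effects keeps the ordering (post first, then maybe start a session) easy to test without a Discord connection.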
Setup
Voice transcription requires an API key from either OpenAI or Google.
Choose a provider
- OpenAI (recommended): Uses Whisper for high-quality transcription
- Gemini: Google’s transcription with multimodal understanding
OpenAI is the preferred provider: if you have keys for both, configure OpenAI first.
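A minimal sketch of that preference, assuming the configured keys are available as a plain object. The TranscriptionKeys shape and the pickProvider name are hypothetical; how Kimaki actually stores keys is not covered on this page.

```typescript
// Hypothetical sketch: choose a transcription provider, preferring OpenAI.
type Provider = "openai" | "gemini";

interface TranscriptionKeys {
  openaiApiKey?: string; // hypothetical field names
  geminiApiKey?: string;
}

// OpenAI when its key is set, otherwise Gemini, otherwise null
// (null corresponds to the "No API key" case).
function pickProvider(keys: TranscriptionKeys): Provider | null {
  if (keys.openaiApiKey?.trim()) return "openai";
  if (keys.geminiApiKey?.trim()) return "gemini";
  return null;
}
```

Returning null instead of throwing lets the caller decide how to prompt the user for a key.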
Context-Aware Transcription
Kimaki enhances transcription accuracy by including your project's file tree as context. This helps with:
- File names (“auth utils dot TS” → auth-utils.ts)
- Function names from your codebase
- Project-specific terminology
- Path references
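To illustrate the idea, here is a hypothetical helper that turns project-relative paths into a context block for the transcription prompt. The prompt wording, the buildFileTreeContext name, and the maxPaths cap are assumptions; Kimaki's real prompt may look different.

```typescript
// Hypothetical sketch: format a file list as transcription context.
function buildFileTreeContext(paths: string[], maxPaths = 200): string {
  // Drop duplicates and sort so the output is stable between runs.
  const unique = Array.from(new Set(paths)).sort();
  const shown = unique.slice(0, maxPaths);
  const lines = [
    "Project files (use these spellings for file and path names):",
    ...shown.map((p) => `- ${p}`),
  ];
  // Note how many paths were left out instead of silently truncating.
  if (unique.length > shown.length) {
    lines.push(`- ... and ${unique.length - shown.length} more`);
  }
  return lines.join("\n");
}
```

Capping the list keeps the prompt small for large repositories while still giving the model the exact spellings it needs.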
Example
Without context:
“Open the off file and fix the error in line 42”
With project context:
“Open the auth.ts file and fix the error in line 42”
Session Context
For even better transcription, Kimaki also sends:
- Current session context (the last message in the thread)
- Last session context (the previous session's final state)
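The session context can be assembled in the same spirit. In this hypothetical sketch, the SessionContext field names and the section labels are assumptions; empty pieces are skipped, so a brand-new thread adds nothing to the prompt.

```typescript
// Hypothetical sketch: combine session context for a transcription request.
interface SessionContext {
  currentSessionLastMessage?: string; // last message in the thread
  lastSessionFinalState?: string; // previous session's final state
}

// Emit labelled sections only for the pieces that are non-empty.
function buildSessionContext(ctx: SessionContext): string {
  const sections: string[] = [];
  const current = ctx.currentSessionLastMessage?.trim();
  if (current) sections.push(`Current session (last message):\n${current}`);
  const previous = ctx.lastSessionFinalState?.trim();
  if (previous) sections.push(`Previous session (final state):\n${previous}`);
  return sections.join("\n\n");
}
```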
Implementation Details
The implementation is in discord/src/voice-handler.ts (lines 564-572).
Queue Messages
Voice messages work with the /queue command. As in the flow above, a /queue voice message is transcribed but does not start a session.
Permissions
Voice messages follow the same permission rules as text messages. You need one of:
- Server Owner
- Administrator
- Manage Server
- “Kimaki” role
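The rule above is an OR over four checks. This sketch assumes a hypothetical MemberInfo shape (with the permission bitfield already parsed to a number) and a case-insensitive role-name match; the bit values are Discord's ADMINISTRATOR (0x8) and MANAGE_GUILD (0x20) permission flags, the latter being "Manage Server" in the Discord UI.

```typescript
// Hypothetical sketch of the permission rule; not Kimaki's actual check.
const ADMINISTRATOR = 0x8; // 1 << 3
const MANAGE_GUILD = 0x20; // 1 << 5, shown as "Manage Server"

interface MemberInfo {
  userId: string;
  permissions: number; // guild permission bitfield (assumed already parsed)
  roleNames: string[];
}

function canUseKimaki(member: MemberInfo, guildOwnerId: string): boolean {
  if (member.userId === guildOwnerId) return true; // Server Owner
  if ((member.permissions & ADMINISTRATOR) !== 0) return true; // Administrator
  if ((member.permissions & MANAGE_GUILD) !== 0) return true; // Manage Server
  // "Kimaki" role, matched by name; case-insensitivity is an assumption.
  return member.roleNames.some((name) => name.toLowerCase() === "kimaki");
}
```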
Thread Naming
When a voice message starts a new thread, the transcribed text becomes the thread name (truncated to 80 characters). See discord/src/voice-handler.ts (lines 595-617).
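A hypothetical sketch of the thread-name step. The 80-character limit comes from the text above (Discord itself allows up to 100 characters for a thread name); collapsing whitespace, appending an ellipsis, and the "Voice message" fallback are assumptions.

```typescript
// Hypothetical sketch: derive a thread name from a transcription.
const MAX_THREAD_NAME = 80;

function threadNameFromTranscription(text: string, max = MAX_THREAD_NAME): string {
  // Collapse newlines and repeated spaces so the name reads as one line.
  const oneLine = text.replace(/\s+/g, " ").trim();
  if (oneLine.length === 0) return "Voice message"; // assumed fallback
  // Count code points so emoji and other astral characters are not split.
  const chars = Array.from(oneLine);
  if (chars.length <= max) return oneLine;
  return chars.slice(0, max - 1).join("").trimEnd() + "…";
}
```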
Audio Processing Pipeline
For real-time voice chat (when the bot joins voice channels):
Error Handling
If transcription fails:
- No API key: a button appears so you can set one via /login
- Network error: an error message with details
- Empty transcription: a “No speech detected” message
- Invalid format: an unsupported audio format error
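One way to model these failure cases is a discriminated union mapped to user-facing messages. The type names, the showLoginButton flag, and the message wording are hypothetical, apart from "No speech detected", which comes from the list above.

```typescript
// Hypothetical error model for the failure cases listed above.
type TranscriptionFailure =
  | { kind: "no-api-key" }
  | { kind: "network"; detail: string }
  | { kind: "empty" }
  | { kind: "unsupported-format"; contentType: string };

interface UserFacingError {
  text: string;
  showLoginButton: boolean; // render a button that points the user to /login
}

function describeFailure(f: TranscriptionFailure): UserFacingError {
  switch (f.kind) {
    case "no-api-key":
      return { text: "No transcription API key is set. Use /login to add one.", showLoginButton: true };
    case "network":
      return { text: `Transcription failed: ${f.detail}`, showLoginButton: false };
    case "empty":
      return { text: "No speech detected", showLoginButton: false };
    case "unsupported-format":
      return { text: `Unsupported audio format: ${f.contentType}`, showLoginButton: false };
    default: {
      // Compile-time exhaustiveness check: adding a new kind breaks the build here.
      const unreachable: never = f;
      throw new Error(`Unhandled failure: ${JSON.stringify(unreachable)}`);
    }
  }
}
```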
Supported Formats
Discord voice messages are sent as:
- OGG Opus (most common)
- WebM Opus
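A minimal format check for these two containers, assuming the attachment's MIME type is available. The function name and the choice to ignore MIME parameters such as "codecs=opus" are assumptions.

```typescript
// Hypothetical sketch: accept only OGG and WebM audio containers.
const SUPPORTED_AUDIO_TYPES = new Set(["audio/ogg", "audio/webm"]);

function isSupportedVoiceFormat(contentType: string | null | undefined): boolean {
  if (!contentType) return false;
  // Strip parameters and normalise case: "Audio/OGG; codecs=opus" -> "audio/ogg".
  const baseType = (contentType.split(";")[0] ?? "").trim().toLowerCase();
  return SUPPORTED_AUDIO_TYPES.has(baseType);
}
```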
Privacy
Voice messages are:
- Sent to OpenAI or Google APIs for transcription
- Not stored by Kimaki beyond the session
- Processed with the same privacy as text messages
If you’re concerned about audio privacy, use text messages or file attachments instead.
Related Features
- Text Messages - Standard text interaction
- File Attachments - Attach files to prompts
- Scheduled Tasks - Schedule voice reminders