Voice Input
Speech-to-text dictation with real-time transcription and optional AI-powered text correction, available in agent input bars and standalone Notes panels.
Overview
Voice Input lets you dictate into any agent's hybrid input bar or into a standalone Notes panel. Audio is streamed live to Deepgram for real-time transcription, with text appearing in the target surface as you speak. An optional AI correction layer (powered by OpenAI) can clean up the raw transcript in agent input bars, fixing punctuation, technical terms, and filler words before you send the command.
Two things are required to get started: a Deepgram API key for transcription, and microphone permission at the OS level. AI text correction is optional and requires a separate OpenAI API key.
Prerequisites
- Deepgram API key for speech-to-text transcription. Sign up at console.deepgram.com and create an API key. Transcription costs depend on the model: Nova-3 runs at $0.0077/min, Nova-2 at $0.0043/min.
- Microphone access granted at the OS level (macOS, Windows, or Linux).
- OpenAI API key (optional) for AI text correction and file reference resolution. Only needed if you want the AI correction layer on top of the raw transcription.
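To get a feel for what the per-minute prices above mean in practice, here is a small worked estimate. The function and dictionary names are illustrative only; the rates are the ones quoted in the prerequisites and may change on Deepgram's side.

```python
# Per-minute transcription prices quoted above, in USD.
PRICE_PER_MINUTE = {"nova-3": 0.0077, "nova-2": 0.0043}


def estimate_cost(model: str, minutes: float) -> float:
    """Return the estimated transcription cost in USD for `minutes` of audio."""
    return PRICE_PER_MINUTE[model] * minutes


# One hour of continuous dictation on each model.
for model in PRICE_PER_MINUTE:
    print(f"{model}: ${estimate_cost(model, 60):.3f} per hour")
```

At these rates an hour of dictation costs roughly $0.46 on Nova-3 and $0.26 on Nova-2.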
Setup
Enable Voice Input
- Open Settings > Integrations.
- In the Speech-to-Text section, toggle Voice Input on.
- Enter your Deepgram API key (starts with dg_). The key is validated when you save.
- Choose your language from the 10 supported options: English, Spanish, French, German, Japanese, Chinese, Korean, Portuguese, Italian, and Russian.
- Pick a transcription model. Nova-3 (default) is the latest and most accurate. Nova-2 is a stable, lower-cost alternative.
- Select a paragraph break strategy. Spoken commands (default, English only) lets you say "new paragraph" to insert paragraph breaks. Manual Enter requires you to press Enter instead. Non-English languages fall back to Manual Enter automatically.
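The fallback rule for paragraph breaks can be pictured as a small function. This is a sketch of the behaviour described above, not Daintree's code; the strategy names and the "en" language code are hypothetical identifiers chosen for the example.

```python
def effective_paragraph_strategy(language: str, selected: str) -> str:
    """Return the strategy that actually applies for a language.

    `selected` is either "spoken" or "manual" (hypothetical identifiers).
    Spoken commands are English only, so any other language falls back
    to manual Enter.
    """
    if selected == "spoken" and language != "en":
        return "manual"
    return selected
```

For example, `effective_paragraph_strategy("de", "spoken")` returns `"manual"`, while English keeps whichever strategy was selected.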
Grant Microphone Permission
Daintree needs microphone access from your operating system before it can capture audio. The settings panel shows your current permission status and provides instructions specific to your OS.
macOS
Open System Settings > Privacy & Security > Microphone and enable Daintree. If permission hasn't been requested yet, Daintree shows a Request button that triggers the system dialog.
If permission was previously denied, click Open System Settings in the Daintree settings panel to go directly to the Microphone privacy settings. Toggle Daintree on, then click Re-check back in Daintree.
Windows
Open Windows Settings > Privacy & security > Microphone and make sure desktop app access is allowed. Daintree shows a Request button if permission hasn't been determined yet.
If access was denied, click Open System Settings in the Daintree settings panel to jump to the Microphone privacy page. Enable access, then click Re-check in Daintree.
Linux
Grant microphone access through your system audio settings. Linux doesn't support the in-app Request button, so you'll need to configure permissions through your desktop environment or audio manager (e.g. PipeWire, PulseAudio).
Once permissions are granted externally, click Re-check in the Daintree settings panel to confirm access.
Starting and Stopping Dictation
There are three ways to start voice dictation:
- Mic button. Each agent panel has its own mic button in the hybrid input bar. Standalone Notes panels have a separate floating mic button at the bottom-right of the editor in Edit and Split modes. Click the button to start recording for that panel.
- Keyboard shortcut: Cmd+Shift+V on macOS, Ctrl+Shift+V on Windows/Linux. Toggles voice dictation for the focused panel, with one exception: when an agent input bar itself has focus, the same shortcut pastes clipboard content as plain text instead of toggling voice. Focused Notes panels have no such conflict, so the shortcut always toggles voice there.
- Action palette: search for "Toggle Voice Dictation" in the action palette.
While recording, the mic button shows an animated ring that pulses with your audio level. A square stop icon replaces the mic icon. Click it again or press the shortcut to stop recording. In agent input bars, pressing Enter also stops recording and triggers the wait-before-submit flow. In Notes panels, Enter just inserts a newline (see Dictation in Notes Panels).
Text appears in the input bar as you speak. Interim (in-progress) transcription shows at reduced opacity, then solidifies once Deepgram confirms the final text. A session moves through these states in order: idle, connecting (up to 10 seconds), recording, and finishing (up to 3 seconds to drain final text), after which it returns to idle.
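The session lifecycle can be read as a small state machine. The sketch below uses only the state names and time limits given above; the event names, and the assumption that a timeout sends the session back to idle, are illustrative and not taken from Daintree's implementation.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    CONNECTING = "connecting"
    RECORDING = "recording"
    FINISHING = "finishing"


# Time limits from the text above, in seconds.
TIMEOUT_SECONDS = {State.CONNECTING: 10, State.FINISHING: 3}

# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (State.IDLE, "start"): State.CONNECTING,
    (State.CONNECTING, "connected"): State.RECORDING,
    (State.CONNECTING, "timeout"): State.IDLE,  # assumption: give up and reset
    (State.RECORDING, "stop"): State.FINISHING,
    (State.FINISHING, "drained"): State.IDLE,
    (State.FINISHING, "timeout"): State.IDLE,  # assumption: stop waiting for final text
}


def next_state(state: State, event: str) -> State:
    """Return the next state, ignoring events that don't apply in `state`."""
    return TRANSITIONS.get((state, event), state)
```

Events that make no sense in the current state (for example "stop" while idle) leave the state unchanged, which keeps repeated clicks or key presses harmless in this model.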
Toolbar Indicator
When a recording session is active, a mic icon appears in the global toolbar with a pulsing accent dot and elapsed time in M:SS format. The tooltip shows which project and worktree is recording. Clicking the toolbar indicator focuses the panel where recording is happening, but it doesn't stop the session.
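The M:SS elapsed-time format is simple to reproduce. A minimal sketch, with a hypothetical function name:

```python
def format_elapsed(seconds: int) -> str:
    """Format a duration as M:SS, e.g. 75 seconds -> "1:15"."""
    minutes, secs = divmod(max(0, int(seconds)), 60)
    return f"{minutes}:{secs:02d}"
```

Minutes are not zero-padded and are not rolled over into hours in this sketch, so ten minutes is shown as `10:00`.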
Dictation in Notes Panels
Voice input is also available inside standalone Notes panels. Each panel has its own floating mic button at the bottom-right of the editor in Edit and Split modes. Preview mode has no editor mounted, so the button doesn't appear there, and it's also hidden while a conflict banner is active.
Transcribed text is inserted at the cursor position that was active when recording started. That lets you dictate into the middle of an existing note without jumping to the end.
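Cursor-anchored insertion can be modelled as a string splice. This sketch treats the note as a plain string and the cursor as a character offset, which is a simplification of whatever the editor does internally.

```python
def insert_at_cursor(note: str, cursor: int, transcript: str) -> tuple[str, int]:
    """Insert `transcript` at `cursor` and return (new_note, new_cursor).

    The cursor is clamped to the note's bounds so an out-of-range offset
    cannot corrupt the text.
    """
    cursor = max(0, min(cursor, len(note)))
    updated = note[:cursor] + transcript + note[cursor:]
    return updated, cursor + len(transcript)
```

Calling `insert_at_cursor("Hello world", 6, "brave ")` yields `("Hello brave world", 12)`: the existing text after the cursor is preserved and the cursor ends up just after the dictated words.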
Voice is only available in panels opened in the panel grid or dock. The Notes Palette modal (Cmd+Shift+N) has no mic button. To dictate into a note from the palette, select it and press Shift+Enter (grid) or Shift+Cmd+Enter (dock) to open it as a standalone panel first.
Pressing Enter while recording in a Notes panel simply inserts a newline. There's no submit concept in notes, so the wait-before-submit flow described below doesn't apply here.
Submitting While Recording
Pressing Enter while recording or while AI corrections are still in flight triggers the wait-before-submit flow:
- Daintree stops the recording session.
- A spinner overlay appears on the input bar, which becomes read-only.
- Daintree waits up to 10 seconds for any pending AI corrections to settle.
- Once all corrections are complete (or the timeout is reached), the text is submitted automatically.
Press Escape to cancel the wait and keep the text in the input bar for further editing.
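The waiting step in this flow is a bounded wait on pending work. Below is a sketch using Python's asyncio, where each pending correction is represented by a task; the function name and the task-based model are assumptions for illustration, and Escape handling is left out.

```python
import asyncio


async def wait_before_submit(corrections, timeout: float = 10.0) -> bool:
    """Wait for in-flight corrections, up to `timeout` seconds.

    `corrections` is a collection of asyncio tasks standing in for pending
    AI correction requests. Returns True if they all settled in time and
    False if the timeout was reached first. In both cases the caller would
    go on to submit the text.
    """
    if not corrections:
        return True
    done, pending = await asyncio.wait(corrections, timeout=timeout)
    return not pending


async def demo() -> bool:
    # One correction finishes quickly, the other outlives the (shortened) timeout.
    fast = asyncio.create_task(asyncio.sleep(0.01))
    slow = asyncio.create_task(asyncio.sleep(5))
    settled = await wait_before_submit({fast, slow}, timeout=0.1)
    slow.cancel()  # clean up the straggler
    return settled
```

`asyncio.wait` does not raise when the timeout expires; it simply reports which tasks are still pending, which matches the "submit anyway after the timeout" behaviour described above.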
Paragraph Breaks
Spoken Commands (Default, English Only)
With the spoken-command strategy selected, you can say formatting commands while dictating:
- "New paragraph", "next paragraph", or "start a new paragraph" inserts a blank line (\n\n).
- "New line" or "line break" inserts a single newline (\n).
These commands are handled by Deepgram's Dictation mode and stripped from the transcript. You can also press Enter to commit the current paragraph manually at any time.
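To make the effect of these commands concrete, the sketch below applies the same phrase-to-newline mapping locally with a regular expression. It only illustrates the resulting text; the real handling happens in Deepgram's Dictation mode, and this is not how Deepgram or Daintree implements it.

```python
import re

# Phrase patterns and the text they turn into, per the list above.
COMMANDS = [
    (r"start a new paragraph|new paragraph|next paragraph", "\n\n"),
    (r"new line|line break", "\n"),
]


def apply_spoken_commands(text: str) -> str:
    """Replace spoken formatting phrases with the newlines they stand for."""
    for pattern, replacement in COMMANDS:
        text = re.sub(
            rf"\s*\b(?:{pattern})\b[.,]?\s*",
            replacement,
            text,
            flags=re.IGNORECASE,
        )
    return text
```

With this mapping, "First point new paragraph second point" becomes "First point", a blank line, then "second point".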
Manual Enter
With the manual strategy, paragraph breaks are inserted by pressing Enter only. Spoken formatting commands are disabled. Choose this if you dictate in a non-English language or prefer explicit control over paragraph breaks.
AI Text Correction
The AI correction layer reviews the raw transcription and fixes common issues: punctuation, filler words (um, uh), technical term spelling, and homophones. It uses a confidence-based system where Deepgram's per-word confidence scores determine how much AI intervention is needed.
- Words with confidence below 0.8 are flagged as uncertain and prioritised for correction.
- If all words in a segment have confidence above 0.85, the AI call is skipped entirely. Good transcription doesn't need fixing.
- Text currently being corrected shows a green dotted underline in the input bar, then resolves in place once the correction arrives.
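The two thresholds above can be expressed as a short decision function. This is a sketch under the stated numbers only; the function name, the return shape, and the treatment of an empty segment are assumptions.

```python
UNCERTAIN_BELOW = 0.8  # words under this confidence are flagged
SKIP_ABOVE = 0.85      # if every word is above this, no AI call is made


def plan_correction(words: list[tuple[str, float]]) -> dict:
    """Decide whether to call the AI and which words to prioritise.

    `words` is a list of (word, confidence) pairs for one segment.
    An empty segment is treated as needing no correction (assumption).
    """
    if all(conf > SKIP_ABOVE for _, conf in words):
        return {"call_ai": False, "uncertain": []}
    uncertain = [word for word, conf in words if conf < UNCERTAIN_BELOW]
    return {"call_ai": True, "uncertain": uncertain}
```

Note the gap between the thresholds: a segment whose weakest word scores 0.83 still triggers a correction call, but no individual word is flagged as uncertain.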
Enable AI Correction
- In Settings > Integrations, scroll to the AI Text Correction section (visible once Voice Input is enabled).
- Toggle AI Text Correction on.
- Enter your OpenAI API key (starts with sk-).
- Choose a correction model. GPT-5 Mini (recommended) applies paragraph-level correction with higher quality. GPT-5 Nano is faster and lower cost, better suited for lower-latency correction. This is a single model selection that applies to all correction passes.
Custom Instructions
You can add project-specific correction rules in the Custom Instructions textarea. These are appended to the core correction prompt, so the AI applies them alongside its built-in rules. For example, you might write: "Always capitalise ProductName as one word" or "React component names should use PascalCase."
The Inspect core prompt toggle in settings lets you view the full base correction prompt (read-only). Your project name and custom dictionary terms are injected into this prompt automatically.
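One way to picture how the final prompt is put together is as a simple concatenation. The layout, labels, and function name below are purely hypothetical; the actual core prompt and its injection format are only visible through the Inspect core prompt toggle.

```python
def build_correction_prompt(
    core_prompt: str,
    project_name: str,
    dictionary: list[str],
    custom_instructions: str = "",
) -> str:
    """Assemble a correction prompt from its parts (illustrative layout only)."""
    parts = [core_prompt.strip(), f"Project name: {project_name}"]
    if dictionary:
        parts.append("Required terms: " + ", ".join(dictionary))
    if custom_instructions.strip():
        # Custom instructions come last, after the built-in rules.
        parts.append(custom_instructions.strip())
    return "\n\n".join(parts)
```

The point of the sketch is the ordering: custom instructions are appended after the built-in rules rather than replacing them.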
File Reference Resolution
When AI correction is enabled, Daintree can detect spoken file references and resolve them into @file links. Use natural phrases while dictating:
- "Link to the auth helper"
- "At file the button component"
- "Reference the user model"
- "Add file the config service"
- "Open the main layout"
Daintree's correction prompt detects these patterns and sends them to a file resolver, which searches the project file tree and uses AI to pick the best match. The resolved path appears as an @path/to/file reference that renders as a clickable file chip in the input bar.
If resolution fails, the text falls back to @?description so you can see what was being looked up and fix it manually.
This feature requires AI correction to be enabled and is toggled via Resolve file references in settings (on by default when your OpenAI key is configured).
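The resolve-or-fall-back behaviour can be sketched with a plain fuzzy match. Daintree uses AI to pick the best match; here `difflib` from the Python standard library stands in for that step, so the matching quality is not representative. Only the `@path` and `@?description` output shapes come from the text above.

```python
import difflib
from pathlib import PurePosixPath


def _normalise(name: str) -> str:
    """Lowercase and drop separators so "auth helper" matches "auth-helper"."""
    return name.lower().replace(" ", "").replace("-", "").replace("_", "")


def resolve_file_reference(description: str, file_tree: list[str], cutoff: float = 0.5) -> str:
    """Return "@path" for the closest file name, or "@?description" if none is close."""
    stems = {_normalise(PurePosixPath(path).stem): path for path in file_tree}
    match = difflib.get_close_matches(_normalise(description), list(stems), n=1, cutoff=cutoff)
    if match:
        return "@" + stems[match[0]]
    return "@?" + description
```

Given a tree containing `src/utils/auth-helper.ts`, the description "auth helper" resolves to `@src/utils/auth-helper.ts`, while an unrelated description such as "quantum flux" comes back as `@?quantum flux`.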
Custom Dictionary
The custom dictionary lets you add up to 100 domain-specific terms that Daintree sends to Deepgram as recognition hints. These terms boost transcription accuracy for project names, product names, technical vocabulary, and abbreviations that the base model might not recognise.
Add terms in Settings > Integrations under the Speech-to-Text section. Each term appears as a removable pill. These same terms are also injected into the AI correction prompt, where they're treated as highest-priority required substitutions.
Your project name is automatically included in the recognition hints, so there's no need to add it manually.
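As a rough model of how the hint list is assembled, the sketch below deduplicates the dictionary, caps it at 100 terms, and puts the project name first. Whether the project name counts toward the 100-term limit is not stated, so this sketch assumes it does not; the function name is hypothetical.

```python
MAX_TERMS = 100  # dictionary limit from the text above


def build_recognition_hints(project_name: str, dictionary: list[str]) -> list[str]:
    """Return the project name followed by up to MAX_TERMS unique dictionary terms."""
    cleaned = (term.strip() for term in dictionary)
    # dict.fromkeys keeps first-seen order while removing duplicates.
    unique = list(dict.fromkeys(term for term in cleaned if term and term != project_name))
    return [project_name] + unique[:MAX_TERMS]
```

For example, `build_recognition_hints("Daintree", ["Deepgram", "worktree", "Deepgram"])` returns `["Daintree", "Deepgram", "worktree"]`.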
Settings Reference
All voice input settings are in Settings > Integrations.
Speech-to-Text
| Setting | Values / Notes |
|---|---|
| Voice Input | Enabled / Disabled (default: Disabled) |
| Deepgram API Key | dg_... prefix, validated on save |
| Language | English, Spanish, French, German, Japanese, Chinese, Korean, Portuguese, Italian, Russian |
| Transcription Model | Nova-3 ($0.0077/min, default) or Nova-2 ($0.0043/min) |
| Paragraph Breaks | Spoken commands (default, English only) or Manual Enter |
| Custom Dictionary | Up to 100 terms, sent as Deepgram keyterm hints |
AI Text Correction
This section is only visible when Voice Input is enabled.
| Setting | Values / Notes |
|---|---|
| AI Text Correction | Enabled / Disabled (default: Disabled) |
| OpenAI API Key | sk-... prefix, validated on save |
| Correction Model | GPT-5 Mini (recommended, full paragraph) or GPT-5 Nano (faster, lower cost) |
| Resolve File References | Enabled by default when OpenAI key is configured |
| Custom Instructions | Free-form textarea for project-specific correction rules |
Keyboard Shortcuts
| Action | Shortcut |
|---|---|
| Toggle voice dictation (global) | Cmd+Shift+V / Ctrl+Shift+V |
| Paste as plain text (when input bar has focus) | Cmd+Shift+V / Ctrl+Shift+V |
| Commit paragraph / submit | Enter |
| Cancel voice wait-submit | Escape |
The Cmd+Shift+V shortcut is context-sensitive in agent input bars: it toggles voice when the bar is unfocused, and pastes clipboard content as plain text when it has focus. Notes panels don't intercept the shortcut, so it always toggles voice when a Notes panel is focused. See Keyboard Shortcuts for the full reference.
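The context rule for this shortcut reduces to a single branch on what currently has focus. The focus and action identifiers below are made up for the example.

```python
def resolve_shortcut(focus: str) -> str:
    """Return the action Cmd/Ctrl+Shift+V performs for a given focus target.

    `focus` is a hypothetical identifier such as "agent_input_bar",
    "agent_panel", or "notes_panel".
    """
    if focus == "agent_input_bar":
        return "paste_plain_text"
    return "toggle_voice"
```

Only a focused agent input bar changes the meaning of the shortcut; every other focus target, including a Notes panel, toggles voice.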
Troubleshooting
Mic button is hidden
The microphone button only appears when Voice Input is fully configured. Check that:
- Voice Input is enabled in Settings > Integrations.
- A valid Deepgram API key has been entered.
- Microphone permission is granted at the OS level.
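The checklist above amounts to three conditions that must all hold. A small sketch that reports which ones are missing (function names and message wording are illustrative):

```python
def missing_requirements(voice_enabled: bool, key_valid: bool, mic_granted: bool) -> list[str]:
    """Return a hint for each unmet requirement, in checklist order."""
    checks = [
        (voice_enabled, "Enable Voice Input in Settings > Integrations"),
        (key_valid, "Enter a valid Deepgram API key"),
        (mic_granted, "Grant microphone permission at the OS level"),
    ]
    return [hint for ok, hint in checks if not ok]


def mic_button_visible(voice_enabled: bool, key_valid: bool, mic_granted: bool) -> bool:
    """The button shows only when nothing is missing."""
    return not missing_requirements(voice_enabled, key_valid, mic_granted)
```

Working through the returned hints from top to bottom mirrors the order in which the settings panel asks for each item.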
"Connection timed out"
Daintree couldn't reach Deepgram within 10 seconds. Verify your internet connection and check that your Deepgram API key is valid and has available credits.
"Invalid API key"
Shown when saving settings if the API key format is incorrect or the key is no longer valid. Re-enter a valid key (Deepgram keys start with dg_, OpenAI keys start with sk-).
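A quick local check of the key prefix can rule out paste mistakes before saving. This sketch only tests the format described above; it cannot tell whether a key is still valid, which requires the validation Daintree performs on save. The provider identifiers are hypothetical.

```python
# Key prefixes as stated in this document.
KEY_PREFIXES = {"deepgram": "dg_", "openai": "sk-"}


def key_format_ok(provider: str, key: str) -> bool:
    """Return True if `key` has the expected prefix and something after it."""
    prefix = KEY_PREFIXES[provider]
    key = key.strip()
    return key.startswith(prefix) and len(key) > len(prefix)
```

A common failure this catches is pasting the OpenAI key into the Deepgram field or the other way round.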
Spoken paragraph commands aren't working
Spoken commands (like "new paragraph") only work when:
- The language is set to English.
- The paragraph break strategy is set to Spoken commands.
AI correction not running
Check that AI Text Correction is toggled on and a valid OpenAI API key is entered. If transcription confidence is consistently high (all words above 0.85), the correction call is intentionally skipped because the raw transcription doesn't need fixing.
File references showing as @?
The @?description format means file resolution failed for that reference. This can happen if the project file tree doesn't contain a close match, or if the description was too vague for the AI to resolve. Try using more specific descriptions when dictating file references.