Voice Input

Speech-to-text dictation with real-time transcription and optional AI-powered text correction, available in agent input bars and standalone Notes panels.

Updated Apr 18, 2026, 12:39 PM

Reviewed Apr 18, 2026, 12:39 PM

Overview

Voice Input lets you dictate into any agent's hybrid input bar or into a standalone Notes panel. Audio is streamed live to Deepgram for real-time transcription, with text appearing in the target surface as you speak. An optional AI correction layer (powered by OpenAI) can clean up the raw transcript in agent input bars, fixing punctuation, technical terms, and filler words before you send the command.

Two things are required to get started: a Deepgram API key for transcription, and microphone permission at the OS level. AI text correction is optional and requires a separate OpenAI API key.

Note

Voice Input is disabled by default. Enable it in Settings > Integrations under the Speech-to-Text section.

Prerequisites

Deepgram API key for speech-to-text transcription. Sign up at console.deepgram.com and create an API key. Transcription costs depend on the model: Nova-3 runs at $0.0077/min, Nova-2 at $0.0043/min.
Microphone access granted at the OS level (macOS, Windows, or Linux).
OpenAI API key (optional) for AI text correction and file reference resolution. Only needed if you want the AI correction layer on top of the raw transcription.

Setup

Enable Voice Input

Open Settings > Integrations.
In the Speech-to-Text section, toggle Voice Input on.
Enter your Deepgram API key (starts with dg_). The key is validated when you save.
Choose your language from the 10 supported options: English, Spanish, French, German, Japanese, Chinese, Korean, Portuguese, Italian, and Russian.
Pick a transcription model. Nova-3 (default) is the latest and most accurate. Nova-2 is a stable, lower-cost alternative.
Select a paragraph break strategy. Spoken commands (default, English only) lets you say "new paragraph" to insert paragraph breaks. Manual Enter requires you to press Enter instead. Non-English languages fall back to Manual Enter automatically.

Grant Microphone Permission

Daintree needs microphone access from your operating system before it can capture audio. The settings panel shows your current permission status and provides instructions specific to your OS.

Open System Settings > Privacy & Security > Microphone and enable Daintree. If permission hasn't been requested yet, Daintree shows a Request button that triggers the system dialog.

If permission was previously denied, click Open System Settings in the Daintree settings panel to go directly to the Microphone privacy settings. Toggle Daintree on, then click Re-check back in Daintree.

Warning

If the microphone button doesn't appear in the input bar, check that Voice Input is enabled in settings and that microphone permission has been granted. The button is hidden entirely when the feature isn't configured.

Starting and Stopping Dictation

There are three ways to start voice dictation:

Mic button. Each agent panel has its own mic button in the hybrid input bar. Standalone Notes panels have a separate floating mic button at the bottom-right of the editor in Edit and Split modes. Click the button to start recording for that panel.
Keyboard shortcut: Cmd+Shift+V on macOS, Ctrl+Shift+V on Windows/Linux. Toggles voice dictation for the focused panel. When an agent input bar has focus, the same shortcut pastes clipboard content as plain text instead of toggling voice. Focused Notes panels have no such conflict, so the shortcut always toggles voice there.
Action palette: search for "Toggle Voice Dictation" in the action palette.

While recording, the mic button shows an animated ring that pulses with your audio level. A square stop icon replaces the mic icon. Click it again or press the shortcut to stop recording. In agent input bars, pressing Enter also stops recording and triggers the wait-before-submit flow. In Notes panels, Enter just inserts a newline (see Dictation in Notes Panels).

Text appears in the input bar as you speak. Interim (in-progress) transcription shows at reduced opacity, then solidifies once Deepgram confirms the final text. The session goes through five states: idle, connecting (up to 10 seconds), recording, finishing (up to 3 seconds to drain final text), and then back to idle.

Toolbar Indicator

When a recording session is active, a mic icon appears in the global toolbar with a pulsing accent dot and elapsed time in M:SS format. The tooltip shows which project and worktree is recording. Clicking the toolbar indicator focuses the panel where recording is happening, but it doesn't stop the session.

Tip

The Cmd+Shift+V shortcut is context-sensitive in agent input bars. When an input bar has focus, the shortcut pastes clipboard content as plain text instead of toggling voice, so use the mic button directly to start recording from there. Notes panels don't intercept the shortcut, so it always toggles voice when a Notes panel is focused.

Dictation in Notes Panels

Voice input is also available inside standalone Notes panels. Each panel has its own floating mic button at the bottom-right of the editor in Edit and Split modes. Preview mode has no editor mounted, so the button doesn't appear there, and it's also hidden while a conflict banner is active.

Transcribed text is inserted at the cursor position that was active when recording started. That lets you dictate into the middle of an existing note without jumping to the end.

Voice is only available in panels opened in the panel grid or dock. The Notes Palette modal (Cmd+Shift+N) has no mic button. To dictate into a note from the palette, select it and press Shift+Enter (grid) or Shift+Cmd+Enter (dock) to open it as a standalone panel first.

Pressing Enter while recording in a Notes panel simply inserts a newline. There's no submit concept in notes, so the wait-before-submit flow described below doesn't apply here.

Warning

Two input bar features don't run when dictating into notes. AI text correction doesn't apply to note content, so the raw transcription is what you see. Spoken paragraph commands ("new paragraph", "new line") aren't processed either. Press Enter to add paragraph breaks manually.

Submitting While Recording

Pressing Enter while recording or while AI corrections are still in flight triggers the wait-before-submit flow:

Daintree stops the recording session.
A spinner overlay appears on the input bar, which becomes read-only.
Daintree waits up to 10 seconds for any pending AI corrections to settle.
Once all corrections are complete (or the timeout is reached), the text is submitted automatically.

Press Escape to cancel the wait and keep the text in the input bar for further editing.

Note

If corrections don't settle within 10 seconds, Daintree submits the text anyway. This safety valve ensures that pressing Enter never blocks indefinitely.

Paragraph Breaks

Spoken Commands (Default, English Only)

With the spoken-command strategy selected, you can say formatting commands while dictating:

"New paragraph", "next paragraph", or "start a new paragraph" inserts a blank line (\n\n).
"New line" or "line break" inserts a single newline (\n).

These commands are handled by Deepgram's Dictation mode and stripped from the transcript. You can also press Enter to commit the current paragraph manually at any time.

Note

Spoken paragraph commands only work in English. For all other languages, the strategy automatically falls back to manual Enter, regardless of what's selected in settings.

Manual Enter

With the manual strategy, paragraph breaks are inserted by pressing Enter only. Spoken formatting commands are disabled. Choose this if you dictate in a non-English language or prefer explicit control over paragraph breaks.

AI Text Correction

The AI correction layer reviews the raw transcription and fixes common issues: punctuation, filler words (um, uh), technical term spelling, and homophones. It uses a confidence-based system where Deepgram's per-word confidence scores determine how much AI intervention is needed.

Words with confidence below 0.8 are flagged as uncertain and prioritised for correction.
If all words in a segment have confidence above 0.85, the AI call is skipped entirely. Good transcription doesn't need fixing.
Text currently being corrected shows a green dotted underline in the input bar, then resolves in place once the correction arrives.

Note

Prompt caching keeps AI correction costs minimal. The system prompt is structured so that only the user message changes per request, allowing the provider to cache the fixed portion.

Enable AI Correction

In Settings > Integrations, scroll to the AI Text Correction section (visible once Voice Input is enabled).
Toggle AI Text Correction on.
Enter your OpenAI API key (starts with sk-).
Choose a correction model. GPT-5 Mini (recommended) applies paragraph-level correction with higher quality. GPT-5 Nano is faster and lower cost, better suited for lower-latency correction. This is a single model selection that applies to all correction passes.

Custom Instructions

You can add project-specific correction rules in the Custom Instructions textarea. These are appended to the core correction prompt, so the AI applies them alongside its built-in rules. For example, you might write: "Always capitalise ProductName as one word" or "React component names should use PascalCase."

The Inspect core prompt toggle in settings lets you view the full base correction prompt (read-only). Your project name and custom dictionary terms are injected into this prompt automatically.

File Reference Resolution

When AI correction is enabled, Daintree can detect spoken file references and resolve them into @file links. Use natural phrases while dictating:

"Link to the auth helper"
"At file the button component"
"Reference the user model"
"Add file the config service"
"Open the main layout"

Daintree's correction prompt detects these patterns and sends them to a file resolver, which searches the project file tree and uses AI to pick the best match. The resolved path appears as an @path/to/file reference that renders as a clickable file chip in the input bar.

If resolution fails, the text falls back to @?description so you can see what was being looked up and fix it manually.

This feature requires AI correction to be enabled and is toggled via Resolve file references in settings (on by default when your OpenAI key is configured).

Tip

Use natural phrases like "link to the auth helper" or "at the Button component". You don't need exact file names or paths. Daintree resolves the description against the project file tree.

Custom Dictionary

The custom dictionary lets you add up to 100 domain-specific terms that Daintree sends to Deepgram as recognition hints. These terms boost transcription accuracy for project names, product names, technical vocabulary, and abbreviations that the base model might not recognize.

Add terms in Settings > Integrations under the Speech-to-Text section. Each term appears as a removable pill. These same terms are also injected into the AI correction prompt, where they're treated as highest-priority required substitutions.

Your project name is automatically included in the recognition hints, so there's no need to add it manually.

Tip

Add technical names, internal package names, and unusual abbreviations your team uses. This directly improves transcription accuracy for those terms, both at the Deepgram level and in the AI correction pass.

Settings Reference

All voice input settings are in Settings > Integrations.

Speech-to-Text

Setting	Values / Notes
Voice Input	Enabled / Disabled (default: Disabled)
Deepgram API Key	`dg_...` prefix, validated on save
Language	English, Spanish, French, German, Japanese, Chinese, Korean, Portuguese, Italian, Russian
Transcription Model	Nova-3 ($0.0077/min, default) or Nova-2 ($0.0043/min)
Paragraph Breaks	Spoken commands (default, English only) or Manual Enter
Custom Dictionary	Up to 100 terms, sent as Deepgram keyterm hints

AI Text Correction

This section is only visible when Voice Input is enabled.

Setting	Values / Notes
AI Text Correction	Enabled / Disabled (default: Disabled)
OpenAI API Key	`sk-...` prefix, validated on save
Correction Model	GPT-5 Mini (recommended, full paragraph) or GPT-5 Nano (faster, lower cost)
Resolve File References	Enabled by default when OpenAI key is configured
Custom Instructions	Free-form textarea for project-specific correction rules

Keyboard Shortcuts

Action	Shortcut
Toggle voice dictation (global)	`Cmd`+`Shift`+`V` / `Ctrl`+`Shift`+`V`
Paste as plain text (when input bar has focus)	`Cmd`+`Shift`+`V` / `Ctrl`+`Shift`+`V`
Commit paragraph / submit	`Enter`
Cancel voice wait-submit	`Escape`

The Cmd+Shift+V shortcut is context-sensitive in agent input bars: it toggles voice when the bar is unfocused, and pastes clipboard content as plain text when it has focus. Notes panels don't intercept the shortcut, so it always toggles voice when a Notes panel is focused. See Keyboard Shortcuts for the full reference.

Troubleshooting

Mic button is hidden

The microphone button only appears when Voice Input is fully configured. Check that:

Voice Input is enabled in Settings > Integrations.
A valid Deepgram API key has been entered.
Microphone permission is granted at the OS level.

"Connection timed out"

Daintree couldn't reach Deepgram within 10 seconds. Verify your internet connection and check that your Deepgram API key is valid and has available credits.

"Invalid API key"

Shown when saving settings if the API key format is incorrect or the key is no longer valid. Re-enter a valid key (Deepgram keys start with dg_, OpenAI keys start with sk-).

Spoken paragraph commands aren't working

Spoken commands (like "new paragraph") only work when:

The language is set to English.
The paragraph break strategy is set to Spoken commands.

Note

Non-English languages fall back to manual Enter automatically, even if the Spoken commands strategy is selected. This is a limitation of the underlying Deepgram Dictation mode.

AI correction not running

Check that AI Text Correction is toggled on and a valid OpenAI API key is entered. If transcription confidence is consistently high (all words above 0.85), the correction call is intentionally skipped because the raw transcription doesn't need fixing.

File references showing as @?

The @?description format means file resolution failed for that reference. This can happen if the project file tree doesn't contain a close match, or if the description was too vague for the AI to resolve. Try using more specific descriptions when dictating file references.