Voice Channel (Phone)
Connect a phone number to your agent via Vapi — setup guide, voice modes, model options, and tips.
Tip: Your agent handles phone calls with the same tools, skills, credentials, and context it has in the dashboard. The only difference is the interface — voice instead of text.
Connecting Voice
The Voice channel uses Vapi to handle phone calls. Vapi manages the telephony layer (STT → LLM → TTS) while your agent provides the intelligence.
Step 1: Create a Vapi Account
- Go to vapi.ai and create an account
- In the Vapi dashboard, navigate to Dashboard → API Keys
- Copy your API key — you'll need it when connecting in Communa
Step 2: Add a Phone Number in Vapi
You need a phone number for callers to reach your agent:
- In the Vapi dashboard, go to Phone Numbers
- Click Add Phone Number
- Choose a provider — Vapi offers built-in numbers, or you can connect your own via Twilio or Vonage
- Follow the prompts to provision a number
Info: Vapi's built-in numbers are the easiest way to get started. For production use with specific area codes or international numbers, connect a Twilio or Vonage account.
Step 3: Connect in Communa
- Go to your agent → Channels tab
- Click Connect Channel → select the Voice tab
- Paste your Vapi API key
- Select a phone number from the dropdown (Communa fetches your available numbers automatically)
- Optionally configure:
- Greeting message — The first thing callers hear (default: "Hi! How can I help you today?")
- STT language — The language for speech-to-text recognition (default: English)
- Voice mode — Custom LLM or Managed (see below)
- Click Connect Voice
Communa automatically registers the webhook on your Vapi phone number — no manual webhook configuration needed (unlike WhatsApp).
Test Your Connection
Call the phone number shown on your connection card. Your agent should pick up, speak the greeting message, and be ready to converse.
Voice Modes
The Voice channel supports two modes that trade off between capability and latency:
Custom LLM (Default)
In Custom LLM mode, Vapi routes the conversation through your agent's full pipeline:
- Caller speaks → Vapi transcribes speech to text (STT)
- Text is sent to your agent's full agentic pipeline — same as dashboard chat
- Agent processes with complete tool access (sandbox, bash, web search, credentials, skills, datasets, email)
- Response text is sent back to Vapi → converted to speech (TTS) → played to caller
Pros: Full agent capabilities — everything the agent can do in the dashboard, it can do on a phone call.
Cons: Higher latency due to the full pipeline processing. Each response may take a few seconds, which can feel less conversational for simple Q&A.
Best for: Agents that need to perform real work during calls — looking up data, running scripts, accessing credentials, using skills.
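The round trip above can be sketched as a single handler: Vapi delivers the transcribed turn in an OpenAI-style chat request, and the agent's reply goes back in the same shape for TTS. The payload fields and the `run_agent` callback below are illustrative assumptions; Communa implements this bridge for you, so you never write this handler yourself.

```python
import time


def handle_custom_llm_turn(request_body: dict, run_agent) -> dict:
    """Take a chat-completions-style request from the telephony layer, run the
    agent's full pipeline on the latest caller turn, and wrap the reply so it
    can be spoken back via TTS. Shapes are illustrative, not Vapi's exact API."""
    messages = request_body["messages"]
    # The most recent caller utterance, as transcribed by STT
    latest_user_turn = next(
        m["content"] for m in reversed(messages) if m["role"] == "user"
    )
    # Full agentic pipeline: tools, skills, credentials all available here
    reply_text = run_agent(latest_user_turn)
    return {
        "object": "chat.completion",
        "created": int(time.time()),
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": reply_text},
                "finish_reason": "stop",
            }
        ],
    }


# Example turn: a stubbed agent answers a caller question
response = handle_custom_llm_turn(
    {"messages": [{"role": "user", "content": "What are your opening hours?"}]},
    run_agent=lambda text: "We are open nine to five, Monday to Friday.",
)
```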
Managed Mode
In Managed mode, Vapi runs the LLM natively with optimized turn-taking:
- Caller speaks → Vapi transcribes speech to text (STT)
- Vapi sends text directly to the selected LLM (no intermediary)
- LLM responds → Vapi converts to speech (TTS) → played to caller
- Agent capabilities are available via a synthetic `agent_action` tool that Vapi invokes when needed
Pros: Lower latency, more natural conversational flow. Smart endpointing detects when the caller is done speaking.
Cons: Agent capabilities are accessed through tool calls rather than natively. Each `agent_action` invocation adds latency, though only when tools are actually needed.
Best for: Conversational agents where fast turn-taking matters — customer support, scheduling, FAQ bots. The agent can still access its sandbox and tools when needed via agent_action.
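To make the `agent_action` pattern concrete, here is a hypothetical tool definition of the kind Managed mode could expose to the Vapi-hosted LLM: a single pass-through function that delegates work to the full agent pipeline. The exact schema Communa registers is not documented here, so treat every field below as an assumption illustrating the pattern.

```python
# Hypothetical synthetic tool definition (OpenAI function-calling schema style).
# One generic "escape hatch" tool lets a latency-optimized LLM reach the full
# agent pipeline only when a task actually requires it.
agent_action_tool = {
    "type": "function",
    "function": {
        "name": "agent_action",
        "description": (
            "Delegate a task to the full agent pipeline (sandbox, skills, "
            "credentials, datasets) and return the result to the call."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "task": {
                    "type": "string",
                    "description": "Natural-language description of the task",
                }
            },
            "required": ["task"],
        },
    },
}
```

The design choice mirrors the trade-off described above: fast native turns by default, with tool-call latency paid only on the turns that need real work.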
Model Options (Managed Mode)
When using Managed mode, you can choose which LLM powers the conversation:
OpenAI
| Model | Notes |
|---|---|
| GPT-5.4 | Latest flagship |
| GPT-5.4 Mini | Fast, cost-effective |
| GPT-5.4 Nano | Ultra-lightweight |
| GPT-5.2 | Previous generation |
| GPT-5.1 | Previous generation |
| GPT-5 | First GPT-5 release |
| GPT-5 Mini | Compact GPT-5 |
| GPT-5 Nano | Lightweight GPT-5 |
| o3 | Reasoning model |
| o4 Mini | Compact reasoning |
| GPT-4o | Reliable all-rounder |
| GPT-4o Mini | Fast and affordable |
Anthropic
| Model | Notes |
|---|---|
| Claude Sonnet 4 | Balanced performance |
| Claude Sonnet 4.5 | Enhanced capabilities |
| Claude Haiku 4.5 | Fast, cost-effective |
| Claude Opus 4 | Most capable |
| Claude 3.5 Sonnet | Previous generation |
| Claude 3.5 Haiku | Previous generation |
Google
| Model | Notes |
|---|---|
| Gemini 2.5 Flash | Fast and capable |
| Gemini 2.5 Pro | Most capable |
| Gemini 2.0 Flash | Previous generation |
| Gemini 1.5 Flash | Lightweight |
| Gemini 1.5 Pro | Previous generation |
Info: In Custom LLM mode, the model is determined by your agent's Chat settings — the same model used for dashboard conversations.
Voice Settings
The Settings tab includes a Voice section where you can customize the AI's behavior during phone calls.
Voice Processing Instructions
These are the primary instructions your agent receives when handling calls. They control:
- How the agent greets and interacts with callers
- When and how to use tools during calls
- Language preferences and multilingual behavior
- Conversation style and tone
The default instructions configure a natural, multilingual conversational style. You can customize them for your specific use case — for example, making the agent always respond in a specific language, follow a script, or prioritize certain tools.
Tip: Voice output rules (no markdown, spell out numbers, no emojis) are added automatically as guardrails. You don't need to include those in your custom instructions.
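As a rough illustration of what those automatic guardrails do, the sketch below normalizes a text reply for speech: markdown markers are stripped, emoji are dropped, and single digits are spelled out. The real guardrails run server-side and are more thorough; this is only to show the idea.

```python
import re

# Words for single digits; a real normalizer would handle larger numbers too
_SMALL_NUMBERS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def normalize_for_tts(text: str) -> str:
    """Prepare an agent reply for text-to-speech (illustrative only)."""
    text = re.sub(r"[*_`#]+", "", text)                   # markdown markers
    text = re.sub(r"[\U0001F300-\U0001FAFF]", "", text)   # common emoji range
    text = re.sub(                                        # lone digits -> words
        r"\b(\d)\b", lambda m: _SMALL_NUMBERS[m.group(1)], text
    )
    return text.strip()
```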
Reset to Defaults
If you've customized the instructions and want to start fresh, click the Reset button to restore the default voice processing instructions.
Features
Auto-Wake
When your agent is sleeping and a call comes in:
- The sandbox is automatically provisioned — no dashboard visit needed
- The greeting message plays while the sandbox warms up
- The agent is ready to handle the call with full capabilities
This means your agent is effectively always reachable by phone, even when its sandbox is shut down to save resources.
Greeting Message
The first thing callers hear when the call connects. Configurable during setup or in the connection settings. Supports personalization — if the caller's name is available (from caller ID), it's included automatically.
Speech-to-Text (STT)
Powered by Deepgram Nova-3. Configure the primary language during setup:
- English (default), Hebrew, Spanish, French, German, Arabic, and many more
- Language selection optimizes recognition accuracy for the primary spoken language
- The agent itself can respond in any language based on its instructions
Text-to-Speech (TTS)
Configurable voice provider and voice ID. Default: OpenAI Alloy — a natural, conversational voice. You can change the TTS provider and voice in the connection configuration.
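Put together, the STT and TTS settings amount to a small configuration block like the sketch below. The key names follow common Vapi assistant-config conventions (a `transcriber` section and a `voice` section), but treat the exact fields as assumptions; Communa manages this configuration through the connection UI.

```python
# Illustrative defaults for the Voice channel's speech settings
voice_channel_config = {
    "transcriber": {
        "provider": "deepgram",
        "model": "nova-3",
        "language": "en",    # primary STT language; tunes recognition accuracy
    },
    "voice": {
        "provider": "openai",
        "voiceId": "alloy",  # default natural, conversational voice
    },
}
```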
Call Duration
Default maximum: 30 minutes per call. Configurable in the connection settings. After the maximum duration, the call ends gracefully.
Call Recording
Enabled by default. Call recordings are captured by Vapi and available for review. Useful for quality assurance and training.
End-Call Function
The agent can hang up the call when appropriate — for example, after saying goodbye or when the caller's needs are fully addressed. This is handled automatically by Vapi's end-call function.
Silence Timeout
If neither party speaks for 30 seconds, the call ends automatically. This prevents abandoned calls from consuming resources.
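The two call limits above (maximum duration and silence timeout) can be summarized as a config fragment. The key names are assumptions shown for clarity; in practice you adjust both values in the connection settings.

```python
# Illustrative call-limit settings: 30-minute cap, 30-second silence timeout
call_limits = {
    "maxDurationSeconds": 30 * 60,  # call ends gracefully after 30 minutes
    "silenceTimeoutSeconds": 30,    # hang up after 30 s of mutual silence
}
```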
How Voice Differs from Text Channels
| Feature | Telegram / WhatsApp | Voice |
|---|---|---|
| Message format | Text + attachments | Spoken audio (STT/TTS) |
| File attachments | ✅ Photos, docs, videos | ❌ Audio only |
| `send_channel_message` | ✅ Used for outbound messages | ❌ Vapi handles audio delivery |
| Bot commands | ✅ Telegram: /start, /stop, /help | ❌ Not applicable |
| Group chats | ✅ Telegram groups | ❌ 1:1 calls only |
| Conversation history | Messages appear in dashboard chat | Call transcript appears after call ends |
| Latency | Near-instant text delivery | Depends on voice mode (Managed = low, Custom LLM = moderate) |
Tips & Best Practices
- Start with Managed mode for most use cases — it provides the most natural conversational experience. Switch to Custom LLM if you need the agent to perform complex tasks during calls.
- Test your greeting message — Call your agent and listen to the first impression. A good greeting sets the tone for the entire call.
- Keep voice instructions focused — Unlike text chat, callers can't scroll back. Instruct your agent to be concise and confirm understanding.
- Set the right STT language — If your callers primarily speak a non-English language, set the STT language accordingly for better recognition accuracy.
- Combine with other channels — An agent can handle phone calls during business hours and process Telegram/WhatsApp messages anytime. Use the voice channel for high-touch interactions and text channels for async communication.
- Monitor from the dashboard — While a call is in progress, you can observe the agent's actions in the dashboard chat in real time.
What's Next?
- Channels Overview — Shared channel features, auto-wake, and connection management
- Telegram Channel — Connect your agent via Telegram
- WhatsApp Channel — Connect your agent via WhatsApp
- Chat & Sandbox — The dashboard workspace for direct agent interaction