Voice Channel (Phone)
Connect a phone number to your agent via Vapi — setup guide, voice modes, model options, and tips.
Tip: Your agent handles phone calls with the same tools, skills, credentials, and context it has in the dashboard. The only difference is the interface — voice instead of text.
Connecting Voice
The Voice channel uses Vapi to handle phone calls. Vapi manages the telephony layer (STT → LLM → TTS) while your agent provides the intelligence.
Step 1: Create a Vapi Account
- Go to vapi.ai and create an account
- In the Vapi dashboard, navigate to Dashboard → API Keys
- Copy your API key — you'll need it when connecting in Communa
Step 2: Add a Phone Number in Vapi
You need a phone number for callers to reach your agent:
- In the Vapi dashboard, go to Phone Numbers
- Click Add Phone Number
- Choose a provider — Vapi offers built-in numbers, or you can connect your own via Twilio or Vonage
- Follow the prompts to provision a number
Info: Vapi's built-in numbers are the easiest way to get started. For production use with specific area codes or international numbers, connect a Twilio or Vonage account.
Step 3: Connect in Communa
- Go to your agent → Channels tab
- Click Connect Channel → select the Voice tab
- Paste your Vapi API key
- Select a phone number from the dropdown (Communa fetches your available numbers automatically)
- Optionally configure:
- Greeting message — The first thing callers hear (default: "Hi! How can I help you today?")
- STT language — The language for speech-to-text recognition (default: English)
- Voice mode — Custom LLM or Managed (see below)
- Click Connect Voice
Communa automatically registers the webhook on your Vapi phone number — no manual webhook configuration needed (unlike WhatsApp).
Test Your Connection
Call the phone number shown on your connection card. Your agent should pick up, speak the greeting message, and be ready to converse.
Voice Modes
The Voice channel supports two modes that trade off between capability and latency:
Custom LLM (Default)
In Custom LLM mode, Vapi routes the conversation through your agent's full pipeline:
- Caller speaks → Vapi transcribes speech to text (STT)
- Text is sent to your agent's full agentic pipeline — same as dashboard chat
- Agent processes with complete tool access (sandbox, bash, web search, credentials, skills, datasets, email)
- Response text is sent back to Vapi → converted to speech (TTS) → played to caller
Pros: Full agent capabilities — everything the agent can do in the dashboard, it can do on a phone call.
Cons: Higher latency due to the full pipeline processing. Each response may take a few seconds, which can feel less conversational for simple Q&A.
Best for: Agents that need to perform real work during calls — looking up data, running scripts, accessing credentials, using skills.
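The round trip above can be sketched as a single handler: Vapi delivers the transcribed turn in an OpenAI-style chat request, and the agent's reply goes back in the same shape for TTS. The payload fields and the `run_agent` callback below are illustrative assumptions; Communa implements this bridge for you, so you never write this handler yourself.

```python
import time


def handle_custom_llm_turn(request_body: dict, run_agent) -> dict:
    """Take a chat-completions-style request from the telephony layer, run the
    agent's full pipeline on the latest caller turn, and wrap the reply so it
    can be spoken back via TTS. Shapes are illustrative, not Vapi's exact API."""
    messages = request_body["messages"]
    # The most recent caller utterance, as transcribed by STT
    latest_user_turn = next(
        m["content"] for m in reversed(messages) if m["role"] == "user"
    )
    # Full agentic pipeline: tools, skills, credentials all available here
    reply_text = run_agent(latest_user_turn)
    return {
        "object": "chat.completion",
        "created": int(time.time()),
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": reply_text},
                "finish_reason": "stop",
            }
        ],
    }


# Example turn: a stubbed agent answers a caller question
response = handle_custom_llm_turn(
    {"messages": [{"role": "user", "content": "What are your opening hours?"}]},
    run_agent=lambda text: "We are open nine to five, Monday to Friday.",
)
```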
Managed Mode
In Managed mode, Vapi runs the LLM natively with optimized turn-taking:
- Caller speaks → Vapi transcribes speech to text (STT)
- Vapi sends text directly to the selected LLM (no intermediary)
- LLM responds → Vapi converts to speech (TTS) → played to caller
- Agent capabilities are available via a synthetic `agent_action` tool that Vapi invokes when needed
Pros: Lower latency, more natural conversational flow. Smart endpointing detects when the caller is done speaking.
Cons: Agent capabilities are accessed through tool calls rather than natively. Each `agent_action` invocation adds latency, though only when tools are actually needed.
Best for: Conversational agents where fast turn-taking matters — customer support, scheduling, FAQ bots. The agent can still access its sandbox and tools when needed via agent_action.
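To make the `agent_action` pattern concrete, here is a hypothetical tool definition of the kind Managed mode could expose to the Vapi-hosted LLM: a single pass-through function that delegates work to the full agent pipeline. The exact schema Communa registers is not documented here, so treat every field below as an assumption illustrating the pattern.

```python
# Hypothetical synthetic tool definition (OpenAI function-calling schema style).
# One generic "escape hatch" tool lets a latency-optimized LLM reach the full
# agent pipeline only when a task actually requires it.
agent_action_tool = {
    "type": "function",
    "function": {
        "name": "agent_action",
        "description": (
            "Delegate a task to the full agent pipeline (sandbox, skills, "
            "credentials, datasets) and return the result to the call."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "task": {
                    "type": "string",
                    "description": "Natural-language description of the task",
                }
            },
            "required": ["task"],
        },
    },
}
```

The design choice mirrors the trade-off described above: fast native turns by default, with tool-call latency paid only on the turns that need real work.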
Model Options (Managed Mode)
When using Managed mode, you can choose which LLM powers the conversation:
OpenAI
| Model | Notes |
|---|---|
| GPT-5.4 | Latest flagship |
| GPT-5.4 Mini | Fast, cost-effective |
| GPT-5.4 Nano | Ultra-lightweight |
| GPT-5.2 | Previous generation |
| GPT-5.1 | Previous generation |
| GPT-5 | First GPT-5 release |
| GPT-5 Mini | Compact GPT-5 |
| GPT-5 Nano | Lightweight GPT-5 |
| o3 | Reasoning model |
| o4 Mini | Compact reasoning |
| GPT-4o | Reliable all-rounder |
| GPT-4o Mini | Fast and affordable |
Anthropic
| Model | Notes |
|---|---|
| Claude Sonnet 4 | Balanced performance |
| Claude Sonnet 4.5 | Enhanced capabilities |
| Claude Haiku 4.5 | Fast, cost-effective |
| Claude Opus 4 | Most capable |
| Claude 3.5 Sonnet | Previous generation |
| Claude 3.5 Haiku | Previous generation |
Google
| Model | Notes |
|---|---|
| Gemini 2.5 Flash | Fast and capable |
| Gemini 2.5 Pro | Most capable |
| Gemini 2.0 Flash | Previous generation |
| Gemini 1.5 Flash | Lightweight |
| Gemini 1.5 Pro | Previous generation |
Info: In Custom LLM mode, the model is determined by your agent's Chat settings — the same model used for dashboard conversations.
Voice Settings
The Settings tab includes a Voice section where you can customize the AI's behavior during phone calls.
Voice Processing Instructions
These are the primary instructions your agent receives when handling calls. They control:
- How the agent greets and interacts with callers
- When and how to use tools during calls
- Language preferences and multilingual behavior
- Conversation style and tone
The default instructions configure a natural, multilingual conversational style. You can customize them for your specific use case — for example, making the agent always respond in a specific language, follow a script, or prioritize certain tools.
Tip: Voice output rules (no markdown, spell out numbers, no emojis) are added automatically as guardrails. You don't need to include those in your custom instructions.
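As a rough illustration of what those automatic guardrails do, the sketch below normalizes a text reply for speech: markdown markers are stripped, emoji are dropped, and single digits are spelled out. The real guardrails run server-side and are more thorough; this is only to show the idea.

```python
import re

# Words for single digits; a real normalizer would handle larger numbers too
_SMALL_NUMBERS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def normalize_for_tts(text: str) -> str:
    """Prepare an agent reply for text-to-speech (illustrative only)."""
    text = re.sub(r"[*_`#]+", "", text)                   # markdown markers
    text = re.sub(r"[\U0001F300-\U0001FAFF]", "", text)   # common emoji range
    text = re.sub(                                        # lone digits -> words
        r"\b(\d)\b", lambda m: _SMALL_NUMBERS[m.group(1)], text
    )
    return text.strip()
```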
Reset to Defaults
If you've customized the instructions and want to start fresh, click the Reset button to restore the default voice processing instructions.
Features
Auto-Wake
When your agent is sleeping and a call comes in:
- The sandbox is automatically provisioned — no dashboard visit needed
- The greeting message plays while the sandbox warms up
- The agent is ready to handle the call with full capabilities
This means your agent is effectively always reachable by phone, even when its sandbox is shut down to save resources.
Greeting Message
The first thing callers hear when the call connects. Configurable during setup or in the connection settings. Supports personalization — if the caller's name is available (from caller ID), it's included automatically.
Speech-to-Text (STT)
Powered by Deepgram Nova-3. Configure the primary language during setup:
- English (default), Hebrew, Spanish, French, German, Arabic, and many more
- Language selection optimizes recognition accuracy for the primary spoken language
- The agent itself can respond in any language based on its instructions
Text-to-Speech (TTS)
Configurable voice provider and voice ID. Default: OpenAI Alloy — a natural, conversational voice. You can change the TTS provider and voice in the connection configuration.
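Put together, the STT and TTS settings amount to a small configuration block like the sketch below. The key names follow common Vapi assistant-config conventions (a `transcriber` section and a `voice` section), but treat the exact fields as assumptions; Communa manages this configuration through the connection UI.

```python
# Illustrative defaults for the Voice channel's speech settings
voice_channel_config = {
    "transcriber": {
        "provider": "deepgram",
        "model": "nova-3",
        "language": "en",    # primary STT language; tunes recognition accuracy
    },
    "voice": {
        "provider": "openai",
        "voiceId": "alloy",  # default natural, conversational voice
    },
}
```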
Call Duration
Default maximum: 30 minutes per call. Configurable in the connection settings. After the maximum duration, the call ends gracefully.
Call Recording
Enabled by default. Call recordings are captured by Vapi and available for review. Useful for quality assurance and training.
End-Call Function
The agent can hang up the call when appropriate — for example, after saying goodbye or when the caller's needs are fully addressed. This is handled automatically by Vapi's end-call function.
Silence Timeout
If neither party speaks for 30 seconds, the call ends automatically. This prevents abandoned calls from consuming resources.
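The two call limits above (maximum duration and silence timeout) can be summarized as a config fragment. The key names are assumptions shown for clarity; in practice you adjust both values in the connection settings.

```python
# Illustrative call-limit settings: 30-minute cap, 30-second silence timeout
call_limits = {
    "maxDurationSeconds": 30 * 60,  # call ends gracefully after 30 minutes
    "silenceTimeoutSeconds": 30,    # hang up after 30 s of mutual silence
}
```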
How Voice Differs from Text Channels
| Feature | Telegram / WhatsApp | Voice |
|---|---|---|
| Message format | Text + attachments | Spoken audio (STT/TTS) |
| File attachments | ✅ Photos, docs, videos | ❌ Audio only |
| `send_channel_message` | ✅ Used for outbound messages | ❌ Vapi handles audio delivery |
| Bot commands | ✅ Telegram: /start, /stop, /help | ❌ Not applicable |
| Group chats | ✅ Telegram groups | ❌ 1:1 calls only |
| Conversation history | Messages appear in dashboard chat | Call transcript appears after call ends |
| Latency | Near-instant text delivery | Depends on voice mode (Managed = low, Custom LLM = moderate) |
Tips & Best Practices
- Start with Managed mode for most use cases — it provides the most natural conversational experience. Switch to Custom LLM if you need the agent to perform complex tasks during calls.
- Test your greeting message — Call your agent and listen to the first impression. A good greeting sets the tone for the entire call.
- Keep voice instructions focused — Unlike text chat, callers can't scroll back. Instruct your agent to be concise and confirm understanding.
- Set the right STT language — If your callers primarily speak a non-English language, set the STT language accordingly for better recognition accuracy.
- Combine with other channels — An agent can handle phone calls during business hours and process Telegram/WhatsApp messages anytime. Use the voice channel for high-touch interactions and text channels for async communication.
- Monitor from the dashboard — While a call is in progress, you can observe the agent's actions in the dashboard chat in real time.
What's Next?
- Channels Overview — Shared channel features, auto-wake, and connection management
- Telegram Channel — Connect your agent via Telegram
- WhatsApp Channel — Connect your agent via WhatsApp
- Chat & Sandbox — The dashboard workspace for direct agent interaction