Voice Channel (Phone)

Connect a phone number to your agent via Vapi — setup guide, voice modes, model options, and tips.

Tip: Your agent handles phone calls with the same tools, skills, credentials, and context it has in the dashboard. The only difference is the interface — voice instead of text.

Connecting Voice

The Voice channel uses Vapi to handle phone calls. Vapi manages the telephony layer (STT → LLM → TTS) while your agent provides the intelligence.

Step 1: Create a Vapi Account

  1. Go to vapi.ai and create an account
  2. In the Vapi dashboard, navigate to Dashboard → API Keys
  3. Copy your API key — you'll need it when connecting in Communa

Step 2: Add a Phone Number in Vapi

You need a phone number for callers to reach your agent:

  1. In the Vapi dashboard, go to Phone Numbers
  2. Click Add Phone Number
  3. Choose a provider — Vapi offers built-in numbers, or you can connect your own via Twilio or Vonage
  4. Follow the prompts to provision a number

Info: Vapi's built-in numbers are the easiest way to get started. For production use with specific area codes or international numbers, connect a Twilio or Vonage account.

Step 3: Connect in Communa

  1. Go to your agent → Channels tab
  2. Click Connect Channel → select the Voice tab
  3. Paste your Vapi API key
  4. Select a phone number from the dropdown (Communa fetches your available numbers automatically)
  5. Optionally configure:
    • Greeting message — The first thing callers hear (default: "Hi! How can I help you today?")
    • STT language — The language for speech-to-text recognition (default: English)
    • Voice mode — Custom LLM or Managed (see below)
  6. Click Connect Voice

Communa automatically registers the webhook on your Vapi phone number — no manual webhook configuration needed (unlike WhatsApp).
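Under the hood, registering the webhook is a single Vapi API call. A minimal Python sketch of what that request might look like — the endpoint path and the `server.url` field follow Vapi's public API, but treat them (and the helper itself) as assumptions rather than Communa's actual implementation:

```python
import json

VAPI_BASE = "https://api.vapi.ai"  # Vapi's public API base URL

def build_webhook_update(api_key: str, phone_number_id: str, webhook_url: str) -> dict:
    """Build the HTTP request that (hypothetically) points a Vapi phone
    number's server webhook at the agent. Send it with any HTTP client."""
    return {
        "method": "PATCH",
        "url": f"{VAPI_BASE}/phone-number/{phone_number_id}",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        # "server.url" is where Vapi delivers call events for this number
        "body": json.dumps({"server": {"url": webhook_url}}),
    }
```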

Test Your Connection

Call the phone number shown on your connection card. Your agent should pick up, speak the greeting message, and be ready to converse.


Voice Modes

The Voice channel supports two modes that trade off between capability and latency:

Custom LLM (Default)

In Custom LLM mode, Vapi routes the conversation through your agent's full pipeline:

  1. Caller speaks → Vapi transcribes speech to text (STT)
  2. Text is sent to your agent's full agentic pipeline — same as dashboard chat
  3. Agent processes with complete tool access (sandbox, bash, web search, credentials, skills, datasets, email)
  4. Response text is sent back to Vapi → converted to speech (TTS) → played to caller
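In Custom LLM mode, Vapi speaks the OpenAI chat-completions format to your server. The steps above can be sketched as a request/response mapping — `run_agent_pipeline` is a hypothetical stand-in for the agent's full pipeline, and the payload shape is an assumption based on the OpenAI-compatible format:

```python
def handle_custom_llm_request(payload: dict, run_agent_pipeline) -> dict:
    """Map an OpenAI-style chat request from Vapi to the agent's reply.

    `run_agent_pipeline` is a hypothetical callable standing in for the
    agent's full agentic pipeline (tools, skills, credentials, etc.).
    """
    # The caller's latest transcribed utterance is the last user message
    user_turns = [m["content"] for m in payload["messages"] if m["role"] == "user"]
    reply_text = run_agent_pipeline(user_turns[-1])

    # Respond in chat-completions shape so Vapi can TTS the content
    return {
        "choices": [
            {"message": {"role": "assistant", "content": reply_text}}
        ]
    }
```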

Pros: Full agent capabilities — everything the agent can do in the dashboard, it can do on a phone call.

Cons: Higher latency, since every turn runs through the full pipeline. Each response may take a few seconds, which can feel less conversational for simple Q&A.

Best for: Agents that need to perform real work during calls — looking up data, running scripts, accessing credentials, using skills.

Managed Mode

In Managed mode, Vapi runs the LLM natively with optimized turn-taking:

  1. Caller speaks → Vapi transcribes speech to text (STT)
  2. Vapi sends text directly to the selected LLM (no intermediary)
  3. LLM responds → Vapi converts to speech (TTS) → played to caller
  4. Agent capabilities are available via a synthetic agent_action tool that Vapi invokes when needed

Pros: Lower latency, more natural conversational flow. Smart endpointing detects when the caller is done speaking.

Cons: Agent capabilities are accessed through tool calls rather than being native, so turns that invoke agent_action incur extra latency; simple conversational turns are unaffected.

Best for: Conversational agents where fast turn-taking matters — customer support, scheduling, FAQ bots. The agent can still access its sandbox and tools when needed via agent_action.


Model Options (Managed Mode)

When using Managed mode, you can choose which LLM powers the conversation:

OpenAI

Model | Notes
GPT-5.4 | Latest flagship
GPT-5.4 Mini | Fast, cost-effective
GPT-5.4 Nano | Ultra-lightweight
GPT-5.2 | Previous generation
GPT-5.1 | Previous generation
GPT-5 | First GPT-5 release
GPT-5 Mini | Compact GPT-5
GPT-5 Nano | Lightweight GPT-5
o3 | Reasoning model
o4 Mini | Compact reasoning
GPT-4o | Reliable all-rounder
GPT-4o Mini | Fast and affordable

Anthropic

Model | Notes
Claude Sonnet 4 | Balanced performance
Claude Sonnet 4.5 | Enhanced capabilities
Claude Haiku 4.5 | Fast, cost-effective
Claude Opus 4 | Most capable
Claude 3.5 Sonnet | Previous generation
Claude 3.5 Haiku | Previous generation

Google

Model | Notes
Gemini 2.5 Flash | Fast and capable
Gemini 2.5 Pro | Most capable
Gemini 2.0 Flash | Previous generation
Gemini 1.5 Flash | Lightweight
Gemini 1.5 Pro | Previous generation

Info: In Custom LLM mode, the model is determined by your agent's Chat settings — the same model used for dashboard conversations.
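In Vapi's API, the Managed-mode model is selected on the assistant's model block. A hypothetical config fragment — the key names follow Vapi's documented schema and the model identifier is illustrative, so treat both as assumptions:

```python
# Hypothetical assistant config fragment selecting a Managed-mode model
managed_model_config = {
    "model": {
        "provider": "anthropic",      # or "openai", "google"
        "model": "claude-haiku-4.5",  # illustrative id: fast turn-taking for support calls
        "temperature": 0.4,
    }
}
```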


Voice Settings

The Settings tab includes a Voice section where you can customize the AI's behavior during phone calls.

Voice Processing Instructions

These are the primary instructions your agent receives when handling calls. They control:

  • How the agent greets and interacts with callers
  • When and how to use tools during calls
  • Language preferences and multilingual behavior
  • Conversation style and tone

The default instructions configure a natural, multilingual conversational style. You can customize them for your specific use case — for example, making the agent always respond in a specific language, follow a script, or prioritize certain tools.
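As an illustration, a hypothetical set of custom voice instructions for a scheduling agent (the business name and skill are made up) might read:

```
You are answering phone calls for Acme Dental.
- Always respond in English, even if the caller switches languages.
- Keep answers to one or two sentences; confirm understanding before acting.
- For booking requests, use the scheduling skill before quoting any times.
- If you cannot help, offer to take a message with the caller's name and number.
```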

Tip: Voice output rules (no markdown, spell out numbers, no emojis) are added automatically as guardrails. You don't need to include those in your custom instructions.

Reset to Defaults

If you've customized the instructions and want to start fresh, click the Reset button to restore the default voice processing instructions.


Features

Auto-Wake

When your agent is sleeping and a call comes in:

  1. The sandbox is automatically provisioned — no dashboard visit needed
  2. The greeting message plays while the sandbox warms up
  3. The agent is ready to handle the call with full capabilities

This means your agent is effectively always reachable by phone, even when its sandbox is shut down to save resources.

Greeting Message

The first thing callers hear when the call connects. Configurable during setup or in the connection settings. Supports personalization — if the caller's name is available (from caller ID), it's included automatically.

Speech-to-Text (STT)

Powered by Deepgram Nova-3. Configure the primary language during setup:

  • English (default), Hebrew, Spanish, French, German, Arabic, and many more
  • Language selection optimizes recognition accuracy for the primary spoken language
  • The agent itself can respond in any language based on its instructions
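In Vapi's assistant config, the transcriber block selects the STT provider, model, and language. A sketch for a Hebrew-first line — key names are assumptions based on Vapi's documented schema:

```python
# Hypothetical transcriber fragment for a Hebrew-first phone line
transcriber_config = {
    "transcriber": {
        "provider": "deepgram",
        "model": "nova-3",   # Deepgram Nova-3, as used by the Voice channel
        "language": "he",    # primary STT language (ISO 639-1 code)
    }
}
```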

Text-to-Speech (TTS)

Configurable voice provider and voice ID. Default: OpenAI Alloy — a natural, conversational voice. You can change the TTS provider and voice in the connection configuration.
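The TTS choice maps to the assistant's voice block — again, the key names below follow Vapi's public schema but should be treated as assumptions:

```python
# Hypothetical voice fragment mirroring the default (OpenAI Alloy)
voice_config = {
    "voice": {
        "provider": "openai",
        "voiceId": "alloy",
    }
}
```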

Call Duration

Default maximum: 30 minutes per call. Configurable in the connection settings. After the maximum duration, the call ends gracefully.

Call Recording

Enabled by default. Call recordings are captured by Vapi and available for review. Useful for quality assurance and training.

End-Call Function

The agent can hang up the call when appropriate — for example, after saying goodbye or when the caller's needs are fully addressed. This is handled automatically by Vapi's end-call function.

Silence Timeout

If neither party speaks for 30 seconds, the call ends automatically. This prevents abandoned calls from consuming resources.
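The call-behavior defaults described above (duration cap, recording, end-call, silence timeout) correspond to a handful of assistant-level fields in Vapi. A sketch mirroring those defaults — field names follow Vapi's public schema but are assumptions here:

```python
# Hypothetical assistant fragment mirroring the documented defaults
call_behavior = {
    "maxDurationSeconds": 30 * 60,   # 30-minute call cap
    "recordingEnabled": True,        # recordings kept for QA review
    "endCallFunctionEnabled": True,  # agent may hang up when appropriate
    "silenceTimeoutSeconds": 30,     # end abandoned calls after 30s of silence
}
```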


How Voice Differs from Text Channels

Feature | Telegram / WhatsApp | Voice
Message format | Text + attachments | Spoken audio (STT/TTS)
File attachments | ✅ Photos, docs, videos | ❌ Audio only
send_channel_message | ✅ Used for outbound messages | ❌ Vapi handles audio delivery
Bot commands | ✅ Telegram: /start, /stop, /help | ❌ Not applicable
Group chats | ✅ Telegram groups | ❌ 1:1 calls only
Conversation history | Messages appear in dashboard chat | Call transcript appears after call ends
Latency | Near-instant text delivery | Depends on voice mode (managed = low, custom-llm = moderate)

Tips & Best Practices

  • Start with Managed mode for most use cases — it provides the most natural conversational experience. Switch to Custom LLM if you need the agent to perform complex tasks during calls.
  • Test your greeting message — Call your agent and listen to the first impression. A good greeting sets the tone for the entire call.
  • Keep voice instructions focused — Unlike text chat, callers can't scroll back. Instruct your agent to be concise and confirm understanding.
  • Set the right STT language — If your callers primarily speak a non-English language, set the STT language accordingly for better recognition accuracy.
  • Combine with other channels — An agent can handle phone calls during business hours and process Telegram/WhatsApp messages anytime. Use the voice channel for high-touch interactions and text channels for async communication.
  • Monitor from the dashboard — While a call is in progress, you can observe the agent's actions in the dashboard chat in real time.

What's Next?