How VoiceHub Works

Core Models

VoiceHub operates as an orchestration layer over three core components in the voice AI pipeline:

  1. Transcriber (STT) — Converts audio input into text

  2. Language Model (LLM) — Interprets the text and generates intelligent responses

  3. Voice (TTS) — Converts the LLM’s response back into speech

These modules can be flexibly configured using top-tier providers:

  • STT: Deepgram, Gladia, Azure, etc.

  • LLM: OpenAI, Groq, Claude, Cohere, etc.

  • TTS: ElevenLabs, PlayHT, LMNT, etc.

VoiceHub handles orchestration, streaming, and optimization across the three components — ensuring real-time interaction, smooth latency, and seamless switching between providers.


The Voice-to-Voice Pipeline (Real-Time Streaming)

Step 1: Listen (Intake Raw Audio)

User speaks into their device (laptop, phone, etc.). Audio is streamed and recorded in real time. That audio is then transcribed by the selected STT engine into text.

Step 2: Understand (Run an LLM)

The transcribed text is sent to the selected LLM model. That model uses the agent’s prompt and context to generate a response.

Step 3: Speak (Text → Raw Audio)

The response text is passed to a TTS engine, which synthesizes speech audio and streams it back to the user.

🎯 All three steps are optimized for real-time execution, targeting end-to-end latency of <200–500ms, depending on configuration.


What Makes VoiceHub Unique

  • You can switch providers at each stage without writing custom glue code

  • We stream audio and text between stages for responsiveness down to ~200ms

  • Our routing system handles scaling, retrying, failover, and QoS behind the scenes


Built-in DQ Models

For teams focused on Arabic, English or Dutch support, VoiceHub offers in-house DataQueue models:

  • Optimized for MENA voice patterns

  • Fine-tuned for dialect-specific recognition and tone

  • Lower latency and higher reliability than many generic models

Use DQ Mode for:

  • Arabic-first customer service

  • Multilingual deployments with low setup overhead

  • Government, telco, or regulated deployments with regional preferences

You can switch to DQ Mode at any time from the Configuration panel.

Last updated