Text-to-Speech (TTS)

Convert text to natural-sounding audio using AWS Polly, Azure Neural TTS, OpenAI, and ElevenLabs voice engines, all accessed through a single unified API. Power IVR prompts, ringless voicemail (RVM) drops, outbound voice notifications, and AI voice bots with production-ready synthesis across 60+ languages and 300+ voices.

Overview

One API, Multiple TTS Engines

Rather than integrating each TTS vendor separately, MOBITELSMS provides a single REST endpoint that routes synthesis requests to the appropriate engine based on voice selection, cost preference, or quality requirements. Synthesised audio is stored in our high-speed distributed cache, so identical text-voice combinations are not synthesised twice, which reduces cost by up to 80% on high-volume use cases.
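As a rough illustration of what a call to a single synthesis endpoint might look like, the sketch below builds (but does not send) a JSON POST request. The URL, the field names `text`, `voice`, and `format`, the voice name, and the bearer-token header are all placeholders assumed for this example, not the documented MOBITELSMS API.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the URL from the real API reference.
TTS_URL = "https://api.example.com/v1/tts"


def build_tts_request(text: str, voice: str, audio_format: str = "mp3",
                      api_key: str = "YOUR_API_KEY") -> urllib.request.Request:
    """Build, without sending, a POST request for one synthesis job.

    The payload field names are assumptions made for illustration.
    """
    payload = {"text": text, "voice": voice, "format": audio_format}
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# "en-US-female-1" is a made-up voice name.
req = build_tts_request("Your appointment is confirmed.", voice="en-US-female-1")
```

Under this assumption the voice name alone would let the platform pick the engine, so the caller never addresses a vendor directly. Passing `req` to `urllib.request.urlopen` would send it once a real URL and key are in place.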

Features

TTS Platform Capabilities

Multi-Engine Support

Route to AWS Polly, Azure Neural TTS, OpenAI TTS, or ElevenLabs based on voice name, engine policy, or fallback rules. Automatic engine failover maintains availability if one provider has an outage.
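The failover behaviour described above can be pictured as trying engines in a preferred order until one succeeds. The following is a minimal sketch under that assumption. The engine adapters, the `EngineUnavailable` exception, and the ordering policy are hypothetical stand-ins, not the platform's actual routing logic.

```python
from typing import Callable, Dict, List, Tuple


class EngineUnavailable(Exception):
    """Raised by a (hypothetical) engine adapter when its provider is down."""


# An adapter takes (text, voice) and returns audio bytes.
Synth = Callable[[str, str], bytes]


def synthesize_with_failover(text: str, voice: str,
                             engines: Dict[str, Synth],
                             order: List[str]) -> Tuple[str, bytes]:
    """Try each engine in `order`; return the first successful result."""
    errors = {}
    for name in order:
        try:
            return name, engines[name](text, voice)
        except EngineUnavailable as exc:
            errors[name] = str(exc)  # remember why this engine was skipped
    raise RuntimeError(f"all engines failed: {errors}")


# Stand-in adapters: the first simulates an outage, the second succeeds.
def polly_down(text: str, voice: str) -> bytes:
    raise EngineUnavailable("simulated outage")


def azure_ok(text: str, voice: str) -> bytes:
    return f"audio:{voice}:{text}".encode("utf-8")


engine_used, audio = synthesize_with_failover(
    "Hello", "en-GB-1",
    engines={"polly": polly_down, "azure": azure_ok},
    order=["polly", "azure"],
)
```

Here `engine_used` ends up as `"azure"` because the first adapter raised. A production router would likely also need to map the requested voice to an equivalent voice on the fallback engine.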

60+ Languages

Synthesis in English, Spanish, French, German, Portuguese, Arabic, Mandarin, Japanese, and 50+ more languages. Neural voices deliver near-human prosody and natural pacing across all supported languages.

SSML Support

Full Speech Synthesis Markup Language support for fine-grained control over pronunciation, rate, pitch, volume, pauses, and phoneme substitution. Standardised SSML works consistently across all supported TTS engines.
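For readers new to SSML, the sketch below assembles a small prompt using two elements from the W3C SSML specification, `<break>` for a pause and `<prosody>` for speaking rate. Building the markup with an XML library rather than string concatenation keeps characters such as `&` correctly escaped. The function name and the prompt text are illustrative only.

```python
import xml.etree.ElementTree as ET


def build_prompt_ssml(greeting: str, detail: str) -> str:
    """Return an SSML document: greeting, a 500 ms pause, then slowed detail."""
    speak = ET.Element("speak")
    speak.text = greeting
    # Pause between the greeting and the menu option.
    ET.SubElement(speak, "break", {"time": "500ms"})
    # Speak the menu option at a slower rate.
    slow = ET.SubElement(speak, "prosody", {"rate": "slow"})
    slow.text = detail
    return ET.tostring(speak, encoding="unicode")


ssml = build_prompt_ssml("Welcome to Acme & Co.", "Press 1 for billing.")
```

The resulting string would be sent in place of plain text. How a request marks its input as SSML is not specified here and would need to be checked against the API reference.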

Audio Caching

Identical text-voice pairs are synthesised once and stored in our high-speed distributed cache with a configurable TTL. Cache hit rates above 80% are typical for IVR prompts, reducing both latency and provider API costs significantly.
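To make the caching idea concrete, here is a small in-memory sketch: a key derived from the text and voice, an expiry time per entry, and a synthesis call only on a miss. It is an assumption-laden illustration, not the hosted cache. The class and function names are invented, a dictionary stands in for the distributed store, and the output format is included in the key as an extra assumption so that different encodings do not collide.

```python
import hashlib
import time
from typing import Callable, Dict, Tuple


def cache_key(text: str, voice: str, audio_format: str) -> str:
    """Derive a stable key; the unit separator avoids ambiguous joins."""
    raw = "\x1f".join((voice, audio_format, text)).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


class TTLAudioCache:
    """In-memory stand-in for a distributed cache with per-entry expiry."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, bytes]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_synthesize(self, text: str, voice: str, audio_format: str,
                          synthesize: Callable[[str, str, str], bytes]) -> bytes:
        key = cache_key(text, voice, audio_format)
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:  # present and not expired
            self.hits += 1
            return entry[1]
        self.misses += 1
        audio = synthesize(text, voice, audio_format)
        self._store[key] = (now + self._ttl, audio)
        return audio


calls = []


def fake_synthesize(text: str, voice: str, audio_format: str) -> bytes:
    calls.append(text)  # record each real synthesis
    return b"audio-bytes"


cache = TTLAudioCache(ttl_seconds=3600)
for _ in range(5):
    cache.get_or_synthesize("Press 1 for sales.", "en-US-1", "ulaw", fake_synthesize)
hit_rate = cache.hits / (cache.hits + cache.misses)
```

Playing the same prompt five times triggers one synthesis and four cache hits, a hit rate of 0.8 in this toy run. Fixed IVR menus repeat far more often than that, which is why high hit rates are plausible for them.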

Output Formats

Receive audio as MP3, WAV (PCM, 8 or 16 kHz), OGG, or telephony-optimised ULAW/ALAW for direct injection into IVR systems. Streaming output is supported for real-time voice bot applications.
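A quick way to compare the uncompressed formats is to work out their raw payload sizes. The sketch below does that arithmetic for a 10-second mono prompt. It assumes 16-bit samples for PCM, which the format list does not state, and relies on G.711 ULAW/ALAW using 8-bit samples at 8 kHz. Container headers such as the WAV header are ignored.

```python
def audio_bytes(duration_s: float, sample_rate_hz: int,
                bytes_per_sample: int, channels: int = 1) -> int:
    """Raw sample payload in bytes, excluding any container header."""
    return int(duration_s * sample_rate_hz * bytes_per_sample * channels)


# 10-second mono prompt in each uncompressed format.
ulaw_8k = audio_bytes(10, 8000, 1)    # G.711 ULAW/ALAW: 8-bit samples
pcm_8k = audio_bytes(10, 8000, 2)     # assumed 16-bit PCM at 8 kHz
pcm_16k = audio_bytes(10, 16000, 2)   # assumed 16-bit PCM at 16 kHz

print(ulaw_8k, pcm_8k, pcm_16k)  # 80000 160000 320000
```

At 8,000 bytes per second, ULAW/ALAW is half the size of 16-bit PCM at the same sample rate and matches what many telephony trunks carry natively, which is why it suits direct playback into an IVR.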

Telecom Integration

Native integration with the MOBITELSMS IVR and RVM systems: synthesise prompts on demand during a live call, or pre-generate audio for voicemail drops. No external TTS API keys are needed when using our hosted service.

Technical Specifications