WebSockets

This API provides real-time text-to-speech conversion over WebSockets. You send text messages to the server and receive audio data back as it is generated.

Endpoint

The WebSocket endpoint for the Text-to-Speech API is:

wss://tts.snr.audio/v1/speech/ws
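
The example script later on this page passes the auth token and the desired audio format as the query parameters `token` and `output_formate` on this URL. The following is a minimal sketch of building that URL with proper escaping. The helper name `build_ws_uri` and its default format are hypothetical, and the parameter names are taken from the example script, not from a formal parameter reference.

```python
from urllib.parse import urlencode

WS_ENDPOINT = "wss://tts.snr.audio/v1/speech/ws"


def build_ws_uri(auth_token: str, output_format: str = "mp3_44100_128") -> str:
    """Return the WebSocket URI with URL-encoded query parameters."""
    # "output_formate" is spelled exactly as in the example script on this page.
    query = urlencode({"token": auth_token, "output_formate": output_format})
    return f"{WS_ENDPOINT}?{query}"
```

Encoding the query string this way avoids a malformed URL if the token contains characters such as `&` or spaces.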

Ideal Use Cases for the WebSockets Endpoint

The input text is streamed or arrives in chunks.
Word-to-audio alignment information is required.

Use our Streaming Endpoint instead when:

The entire input text is available up front.
You are prototyping or testing the API and want to convert text to speech quickly.

Protocol

The WebSocket API uses a bidirectional protocol that encodes all messages as JSON objects.

Streaming Input Text

The client sends text input to the server as JSON messages. Each message should contain the following fields:

{
    "text": "Your text here",
    "voice": "sarah",
    "similarity": 0.0,
    "expressiveness": 0.0,
    "pitch": 1.3,
    "speed": 1.0
}

text: The input text to be converted to speech.
voice: The voice to be used for the conversion. (e.g., “sarah”)
similarity: Similarity factor for voice settings.
expressiveness: Expressiveness factor for voice settings.
pitch: Pitch factor for voice settings.
speed: Speed factor for voice settings.
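
As an illustration of the message format above, the sketch below serializes one text chunk together with its voice settings. The function name `build_tts_message` is hypothetical, and the default values are simply copied from the sample message; they are not documented defaults of the API.

```python
import json


def build_tts_message(text, voice="sarah", similarity=0.0,
                      expressiveness=0.0, pitch=1.3, speed=1.0):
    """Serialize one text chunk and its voice settings as a JSON string."""
    payload = {
        "text": text,
        "voice": voice,
        "similarity": similarity,
        "expressiveness": expressiveness,
        "pitch": pitch,
        "speed": speed,
    }
    return json.dumps(payload)
```

The returned string can be passed directly to the `send` call of a WebSocket client.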

Streaming Output Audio

The server responds with messages of the following form, each carrying a chunk of audio data:

{
    "audio": "base64_encoded_audio_data",
    "isFinal": false
}

audio: Base64 encoded audio data.
isFinal: Indicates if the audio generation is complete.
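
A minimal sketch of handling one such message on the client side is shown below. The function name `parse_audio_message` is hypothetical. It also assumes that a message may arrive with an empty or missing `audio` field, which it treats as zero bytes of audio.

```python
import base64
import json


def parse_audio_message(message):
    """Return (audio_bytes, is_final) for one server message."""
    data = json.loads(message)
    # Treat a missing, null, or empty "audio" field as no audio (assumption).
    encoded = data.get("audio") or ""
    audio_bytes = base64.b64decode(encoded)
    return audio_bytes, bool(data.get("isFinal", False))
```

A receive loop would typically append `audio_bytes` to a buffer or pipe it to a player, and stop once `is_final` is true.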

Example Script

The following example demonstrates how to use the WebSocket API to convert text to speech in real-time using Python:

import asyncio
import base64
import json
import subprocess

import nltk        # sentence splitting; requires the NLTK Punkt tokenizer data
import websockets  # pip install websockets


async def text_to_speech_input_streaming(text_iterator):
    """Stream text chunks to the TTS WebSocket and play the returned audio."""
    # Values accepted by the output_formate query parameter.
    AVAILABLE_FORMATS = [
        "mp3_44100_192",
        "mp3_44100_128",
        "mp3_44100_64",
        "mp3_44100_32",
        "mp3_44100_16",
        "mp3_22050_192",
        "mp3_22050_128",
        "pcm_16000",
        "pcm_22050",
        "pcm_24000",
        "pcm_44100",
        "mulaw_8000"
    ]
    audio_encoding = "mulaw_8000"
    auth_token = "YOUR_AUTH_TOKEN"
    uri = f"wss://tts.snr.audio/v1/speech/ws?token={auth_token}&output_formate={audio_encoding}"
    async with websockets.connect(uri) as websocket:
        async def listen():
            """Receive audio chunks and stream them to the mpv player."""
            mpv_process = subprocess.Popen(
                ["mpv", "--no-cache", "--no-terminal", "--", "fd://0"],
                stdin=subprocess.PIPE, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
            )
            print("Started streaming audio")
            try:
                # Keep reading until the server marks the final chunk
                # or the connection closes.
                async for message in websocket:
                    data = json.loads(message)
                    if data.get("audio"):
                        mpv_process.stdin.write(base64.b64decode(data["audio"]))
                        mpv_process.stdin.flush()
                    if data.get("isFinal"):
                        break
            finally:
                # Closing stdin lets mpv finish playback and exit.
                mpv_process.stdin.close()
                mpv_process.wait()

        # Start receiving in the background while text is still being sent.
        listen_task = asyncio.create_task(listen())

        async for text in text_iterator:
            print(f"Sending {text!r} to SNR")
            payload = {
                "text": text,
                "voice": "sarah",
                "similarity": 0.0,
                "expressiveness": 0.0,
                "pitch": 1.3,
                "speed": 1.0
            }
            await websocket.send(json.dumps(payload))

        # Once all chunks are sent, send a final message whose text is a single space.
        await websocket.send(json.dumps({
            "text": " ",
            "voice": "sarah",
            "similarity": 0.0,
            "expressiveness": 0.0,
            "pitch": 1.3,
            "speed": 1.0
        }))

        # Wait until the listener has received and played all audio.
        await listen_task

async def chat_completion(text):
    """Split the text into sentences and stream them to the TTS endpoint."""
    sentences = nltk.sent_tokenize(text)

    async def text_iterator():
        for sentence in sentences:
            yield sentence

    await text_to_speech_input_streaming(text_iterator())


# Main execution
if __name__ == "__main__":
    user_query = '"Hi how are you ... this is raghuram i want to see how the world is working Hi how are you ... this is raghuram i want to see how the world is working"'
    asyncio.run(chat_completion(user_query))
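
The script above relies on NLTK only to split the input into sentences. Where installing NLTK and its tokenizer data is not practical, a rough stand-in is a regular-expression splitter. This is a simplification and not an equivalent: it splits after `.`, `!`, or `?` followed by whitespace, so it will mis-split abbreviations such as "Dr. Smith". The function name `split_sentences` is hypothetical.

```python
import re


def split_sentences(text):
    """Split text after ., ! or ? followed by whitespace; drop empty pieces."""
    pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    return [piece for piece in pieces if piece]
```

The resulting list can be used in place of the `nltk.sent_tokenize` result when feeding the text iterator.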
