Text to Speech
Websockets
This API provides real-time text-to-speech conversion using WebSockets. It allows you to send text messages and receive audio data back in real-time.
Endpoint
The WebSocket endpoint for the Text-to-Speech API is:
wss://tts.snr.audio/v1/speech/ws
Ideal Usecase for the Websockets Endpoint
- The input text is being streamed or in chunks.
- Word-to-audio alignment information is required.
Use our Streaming Endpoint wherever:
- The entire input text is available with you.
- Prototyping or testing the API, rapidly converting text to speech
Protocol
The WebSocket API uses a bidirectional protocol that encodes all messages as JSON objects.
Streaming Input Text
The client can send messages with text input to the server. The messages should contain the following fields:
{
"text": "Your text here",
"voice": "sarah",
"similarity": 0.0,
"expressiveness": 0.0,
"pitch": 1.3,
"speed": 1.0
}
text
: The input text to be converted to speech.voice
: The voice to be used for the conversion. (e.g., “sarah”)similarity
: Similarity factor for voice settings.expressiveness
: Expressiveness factor for voice settings.pitch
: Pitch factor for voice settings.speed
: Speed factor for voice settings.
Streaming Output Audio
The server responds with a message containing the audio data:
{
"audio": "base64_encoded_audio_data",
"isFinal": false
}
audio
: Base64 encoded audio data.isFinal
: Indicates if the audio generation is complete.
Example Script
The following example demonstrates how to use the WebSocket API to convert text to speech in real-time using Python:
import asyncio
import websockets
import json
import base64
import nltk
import subprocess
async def text_to_speech_input_streaming(text_iterator):
AVAILABLE_FORMATES = [
"mp3_44100_192",
"mp3_44100_128",
"mp3_44100_64",
"mp3_44100_32",
"mp3_44100_16",
"mp3_22050_192",
"mp3_22050_128",
"pcm_16000",
"pcm_22050",
"pcm_24000",
"pcm_44100",
"mulaw_8000"
]
audio_encoding = "mulaw_8000"
auth_token ="YOUR_AUTH_TOKEN"
uri = f"wss://tts.snr.audio/v1/speech/ws?token={auth_token}&output_formate={audio_encoding}"
async with websockets.connect(uri) as websocket:
async def listen():
message = await websocket.recv()
data = json.loads(message)
if data.get("audio"):
audio_data = base64.b64decode(data["audio"])
"""Stream audio data using mpv player."""
mpv_process = subprocess.Popen(
["mpv", "--no-cache", "--no-terminal", "--", "fd://0"],
stdin=subprocess.PIPE, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
)
print(f"Started streaming audio ")
#async for chunk in audio_data:
# if chunk:
mpv_process.stdin.write(audio_data)
mpv_process.stdin.flush()
if mpv_process.stdin:
mpv_process.stdin.close()
mpv_process.wait()
listen_task = asyncio.create_task(listen())
async for text in text_iterator:
print(f"sending {text} to SNR")
payload = {
"text": text,
"voice": "sarah",
"similarity": 0.0,
"expressiveness": 0.0,
"pitch": 1.3,
"speed": 1.0
}
await websocket.send(json.dumps(payload))
await websocket.send(json.dumps({
"text": " ",
"voice": "sarah",
"similarity": 0.0,
"expressiveness": 0.0,
"pitch": 1.3,
"speed": 1.0
}))
await listen_task
async def chat_completion(text):
response = nltk.sent_tokenize(text)
async def text_iterator():
for chunk in response:
yield chunk
await text_to_speech_input_streaming(text_iterator())
# Main execution
if __name__ == "__main__":
user_query = """\"Hi how are you ... this is raghuram i want to see how the world is working Hi how are you ... this is raghuram i want to see how the world is working\""""
asyncio.run(chat_completion(user_query))