v0.1.0 · Open-source realtime voice runtime

Ship voice AI that can change its mind.

Vona is the Rust runtime layer for voice-native products: fast, provider-neutral speech-to-speech infrastructure with backend adapters, local model provisioning, deterministic interruption tests, and transport-ready session orchestration.


Transports: HTTP · IPC · realtime

01 audio.frame.in · 20ms

02 backend.step · seamless

03 interruption.detected · barge-in

04 tool.context.injected · schema-ok

05 audio.frame.out · first=0ms

13 workspace crates · 6 provider surfaces · 2 local STS families · MIT-licensed open source

Runtime substrate

The layer between a demo and a real voice product.

Vona owns the reusable runtime boundary: sessions, frames, tools, fallback decisions, and backend-neutral contracts. Your product owns the experience, policy, and deployment.

Speech-to-speech contracts

Step-oriented and event-stream voice interfaces give hosted realtime APIs, local STS models, and sidecar transports one coherent Rust boundary.

AudioInputFrame · BackendStep · RealtimeVoiceOutput
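To make the step-oriented idea concrete, here is a minimal sketch of a host driving a backend through a single trait. The type names echo the ones listed above, but their fields, the `VoiceBackend` trait, the toy `EchoBackend`, and `run_frames` are hypothetical illustrations, not Vona's actual API.

```rust
// Hypothetical sketch: these type shapes are illustrative assumptions,
// not Vona's real definitions.

/// One chunk of captured user audio (assumed 16-bit mono PCM).
#[derive(Debug, Clone)]
pub struct AudioInputFrame {
    pub samples: Vec<i16>,
    pub duration_ms: u32,
}

/// Output a backend can emit during a step.
#[derive(Debug, Clone, PartialEq)]
pub enum RealtimeVoiceOutput {
    Audio(Vec<i16>),
    Transcript(String),
    TurnComplete,
}

/// Everything a backend produced for one input frame.
#[derive(Debug, Default)]
pub struct BackendStep {
    pub outputs: Vec<RealtimeVoiceOutput>,
}

/// Step-oriented contract: push one frame, get back whatever is ready.
/// Hosted, local, and sidecar backends could all implement this trait.
pub trait VoiceBackend {
    fn step(&mut self, frame: AudioInputFrame) -> BackendStep;
}

/// Toy backend: echoes each frame's audio and ends the turn once it has
/// seen `turn_ms` milliseconds of input.
pub struct EchoBackend {
    pub turn_ms: u32,
    elapsed_ms: u32,
}

impl EchoBackend {
    pub fn new(turn_ms: u32) -> Self {
        Self { turn_ms, elapsed_ms: 0 }
    }
}

impl VoiceBackend for EchoBackend {
    fn step(&mut self, frame: AudioInputFrame) -> BackendStep {
        self.elapsed_ms += frame.duration_ms;
        let mut outputs = vec![RealtimeVoiceOutput::Audio(frame.samples)];
        if self.elapsed_ms >= self.turn_ms {
            outputs.push(RealtimeVoiceOutput::TurnComplete);
            self.elapsed_ms = 0; // start counting the next turn
        }
        BackendStep { outputs }
    }
}

/// Host-side driver that depends only on the trait, never on a provider.
pub fn run_frames(
    backend: &mut dyn VoiceBackend,
    frames: Vec<AudioInputFrame>,
) -> Vec<RealtimeVoiceOutput> {
    frames
        .into_iter()
        .flat_map(|f| backend.step(f).outputs)
        .collect()
}
```

Because `run_frames` takes `&mut dyn VoiceBackend`, swapping the toy backend for a hosted or local one would not change the host code.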

Interruption-aware sessions

Measure time-to-first-audio, tool calls, interruption behavior, output frames after barge-in, and fallback decisions in tests before users discover the edge cases.
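A minimal sketch of what an interruption-aware check can look like, assuming a session is replayed from a fixed event list. The metric field names mirror the `mock_session` output further down this page; the `SessionEvent` enum and the `replay` function are hypothetical and not Vona's harness API.

```rust
// Hypothetical sketch: metric names mirror the mock_session output line,
// but the event enum and replay logic are illustrative assumptions.

/// Events a session might observe, in order.
#[derive(Debug, Clone, PartialEq)]
pub enum SessionEvent {
    /// Backend emitted an audio frame `at_ms` after session start.
    AudioOut { at_ms: u64 },
    ToolCall,
    /// The user started speaking over the assistant (barge-in).
    Interruption,
    /// The interrupted turn is over; later audio belongs to a new response.
    TurnEnd,
    Fallback,
}

#[derive(Debug, Default, PartialEq)]
pub struct SessionMetrics {
    pub time_to_first_audio_ms: Option<u64>,
    pub tool_calls: u32,
    pub interruptions: u32,
    pub fallback_count: u32,
    /// Frames played for the interrupted turn after barge-in; should stay 0.
    pub output_frames_after_interruption: u32,
}

/// Replays events deterministically and returns the metrics plus the
/// timestamps of frames that were actually played. With
/// `drop_after_barge_in`, audio from the interrupted turn is discarded.
pub fn replay(events: &[SessionEvent], drop_after_barge_in: bool) -> (SessionMetrics, Vec<u64>) {
    let mut m = SessionMetrics::default();
    let mut played = Vec::new();
    let mut interrupted = false;
    for ev in events {
        match ev {
            SessionEvent::AudioOut { at_ms } => {
                if interrupted {
                    if drop_after_barge_in {
                        continue; // stale frame from the cancelled turn
                    }
                    m.output_frames_after_interruption += 1; // leaked frame
                }
                if m.time_to_first_audio_ms.is_none() {
                    m.time_to_first_audio_ms = Some(*at_ms);
                }
                played.push(*at_ms);
            }
            SessionEvent::ToolCall => m.tool_calls += 1,
            SessionEvent::Interruption => {
                m.interruptions += 1;
                interrupted = true;
            }
            SessionEvent::TurnEnd => interrupted = false,
            SessionEvent::Fallback => m.fallback_count += 1,
        }
    }
    (m, played)
}
```

A release test would replay the same events on every run and assert `output_frames_after_interruption == 0`. Disabling the drop path makes the leak visible as a nonzero count instead of something a user hears.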

Transport ready

Run backends in-process, behind local HTTP, or over length-prefixed IPC without rewriting host application logic.
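Length-prefixed IPC places a fixed-width length header before each message, so the reader always knows where a message ends. The sketch below assumes a 4-byte big-endian `u32` prefix and a hypothetical 16 MiB per-frame cap. Vona's actual wire format may differ.

```rust
use std::io::{self, Read, Write};

/// Assumed upper bound so a corrupt prefix cannot trigger a huge allocation.
const MAX_FRAME_LEN: usize = 16 * 1024 * 1024;

/// Writes `payload` preceded by its length as a 4-byte big-endian u32.
pub fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    let len = u32::try_from(payload.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "frame too large"))?;
    w.write_all(&len.to_be_bytes())?;
    w.write_all(payload)?;
    w.flush()
}

/// Reads one frame. Returns Ok(None) if the stream ends before a full
/// 4-byte length prefix has been read.
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut len_buf = [0u8; 4];
    match r.read_exact(&mut len_buf) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(None),
        Err(e) => return Err(e),
    }
    let len = u32::from_be_bytes(len_buf) as usize;
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "frame length exceeds limit",
        ));
    }
    // The prefix tells us exactly how many payload bytes to wait for.
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;
    Ok(Some(payload))
}
```

The same two functions work over a pipe, a Unix socket, or an in-memory buffer because they only depend on `Read` and `Write`. That is the property that lets host logic stay unchanged when the transport moves.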

Tool context without glue sprawl

Skill registries, schema validation, audit events, and external context injection let voice systems coordinate with application tools while keeping runtime policy explicit.

SkillRegistry · ExternalContextEvent · FallbackPolicy
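The following minimal sketch shows the registry idea: check a model-proposed tool call against a schema of required arguments, and record an audit entry for every attempt. It is built around a hypothetical `SkillRegistry`. The `Skill`, `ArgType`, `Value`, and `AuditEvent` types are illustrative assumptions, and a production registry would more likely validate against JSON Schema.

```rust
use std::collections::BTreeMap;

/// Argument values a model might propose (illustrative subset).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Str(String),
    Num(f64),
    Bool(bool),
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ArgType {
    Str,
    Num,
    Bool,
}

fn type_of(v: &Value) -> ArgType {
    match v {
        Value::Str(_) => ArgType::Str,
        Value::Num(_) => ArgType::Num,
        Value::Bool(_) => ArgType::Bool,
    }
}

/// A registered skill: its name plus the arguments it requires.
pub struct Skill {
    pub name: String,
    pub required: Vec<(String, ArgType)>,
}

/// Audit entry recorded for every call attempt, accepted or not.
#[derive(Debug, PartialEq)]
pub enum AuditEvent {
    Accepted { skill: String },
    Rejected { skill: String, reason: String },
}

#[derive(Default)]
pub struct SkillRegistry {
    skills: BTreeMap<String, Skill>,
    pub audit: Vec<AuditEvent>,
}

impl SkillRegistry {
    pub fn register(&mut self, skill: Skill) {
        self.skills.insert(skill.name.clone(), skill);
    }

    /// Validates a proposed call against the skill's schema, logs the
    /// decision, and returns the rejection reason on failure.
    pub fn validate(&mut self, name: &str, args: &BTreeMap<String, Value>) -> Result<(), String> {
        let result = match self.skills.get(name) {
            None => Err(format!("unknown skill `{name}`")),
            Some(skill) => skill.required.iter().try_for_each(|(arg, ty)| match args.get(arg) {
                None => Err(format!("missing argument `{arg}`")),
                Some(v) if type_of(v) != *ty => Err(format!("argument `{arg}` has wrong type")),
                Some(_) => Ok(()),
            }),
        };
        self.audit.push(match &result {
            Ok(()) => AuditEvent::Accepted { skill: name.to_string() },
            Err(reason) => AuditEvent::Rejected {
                skill: name.to_string(),
                reason: reason.clone(),
            },
        });
        result
    }
}
```

Keeping validation and auditing in the registry, rather than in each tool handler, is what makes the runtime policy explicit. Every rejected call leaves the same kind of record, regardless of which tool the model tried to reach.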

Backend portability

Local when you need control. Cloud when you need reach.

Vona is designed for hybrid stacks: one app can explore local Moshi and Seamless-style paths, cloud realtime protocols, STT/TTS cascades, and sidecar deployments without tying its future to a single provider.

OpenAI Realtime: event-stream mapping

Gemini Live: realtime protocol surface

Azure Voice Live: voice and speech helpers

ElevenLabs: streaming TTS surface

Deepgram: Flux, listen, and Aura helpers

Moshi / Seamless: local STS families
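In practice, mixing local and cloud backends comes down to a selection rule that the host can read and test. The sketch below assumes a hypothetical `FallbackPolicy` that prefers one placement, skips unhealthy backends, optionally forbids cloud use, and reports whether it had to fall back. The struct shapes and backend name strings are illustrative, not Vona's API.

```rust
/// Where a backend runs (illustrative).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Placement {
    Local,
    Cloud,
}

#[derive(Debug, Clone)]
pub struct BackendCandidate {
    pub name: &'static str,
    pub placement: Placement,
    pub healthy: bool,
}

/// Hypothetical policy: prefer one placement, optionally forbid cloud.
pub struct FallbackPolicy {
    pub prefer: Placement,
    pub allow_cloud: bool,
}

#[derive(Debug, PartialEq)]
pub struct Selection {
    pub name: &'static str,
    /// True when the chosen backend is not of the preferred placement.
    pub fell_back: bool,
}

impl FallbackPolicy {
    pub fn select(&self, candidates: &[BackendCandidate]) -> Option<Selection> {
        // A candidate is usable if healthy and permitted by the cloud rule.
        let allowed = |c: &&BackendCandidate| {
            c.healthy && (self.allow_cloud || c.placement == Placement::Local)
        };
        // First choice: a usable backend with the preferred placement.
        if let Some(c) = candidates
            .iter()
            .filter(allowed)
            .find(|c| c.placement == self.prefer)
        {
            return Some(Selection { name: c.name, fell_back: false });
        }
        // Otherwise: the first usable backend in list order, flagged as a fallback.
        candidates
            .iter()
            .find(allowed)
            .map(|c| Selection { name: c.name, fell_back: true })
    }
}
```

Returning `fell_back` alongside the choice lets a session count fallbacks, similar to the `fallback_count` metric reported by the test harness, instead of switching providers silently.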

Release confidence

Test the voice moments that normally feel impossible to pin down.

Vona’s deterministic harnesses make the slippery parts of speech runtime behavior observable: event order, first audio, tool calls, interruption cleanup, fallback paths, and transport latency.

$ cargo run -p vona-test-harness --example mock_session --locked
session_id=mock-session-1
close_reason=BackendFinished
metrics time_to_first_audio_ms=Some(0) tool_calls=1 interruptions=1 fallback_count=0
output_frames_after_interruption=0
injected_events=1

$ bash scripts/release_gate.sh
[release-gate] Release gate PASSED

Start with the facade

One dependency. Opt-in surfaces.

Use the umbrella crate for applications, then enable only the adapters you actually ship. Depend on the individual workspace crates directly when you need finer-grained boundaries.


[dependencies]
vona = { version = "0.1.0", features = [
  "seamless",
  "moshi",
  "transport-local",
  "openai-realtime",
  "model-provisioning",
] }