v0.1.0 · Open-source realtime voice runtime

Ship voice AI that can change its mind.

Vona is the Rust runtime layer for voice-native products: fast, provider-neutral speech-to-speech infrastructure with backend adapters, local model provisioning, deterministic interruption tests, and transport-ready session orchestration.

vona session trace
transport HTTP IPC realtime
01 audio.frame.in 20ms
02 backend.step seamless
03 interruption.detected barge-in
04 tool.context.injected schema-ok
05 audio.frame.out first=0ms
13 workspace crates · 6 provider surfaces · 2 local STS families · MIT open source

Runtime substrate

The layer between a demo and a real voice product.

Vona owns the reusable runtime boundary: sessions, frames, tools, fallback decisions, and backend-neutral contracts. Your product owns the experience, policy, and deployment.

Speech-to-speech contracts

Step-oriented and event-stream voice interfaces give hosted realtime APIs, local speech-to-speech (STS) models, and sidecar transports one coherent Rust boundary.

AudioInputFrame BackendStep RealtimeVoiceOutput
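To make the step-oriented idea concrete, here is a minimal sketch of a host loop that only depends on a backend trait. The type names AudioInputFrame, BackendStep and RealtimeVoiceOutput come from this page, but their fields, the StepBackend trait, EchoBackend and run_session are hypothetical illustrations, not Vona's actual API.

```rust
// Hypothetical sketch: only the three type names come from the Vona docs;
// every field, variant, trait and function here is an assumption.

/// One chunk of input audio (assumed: 16-bit PCM samples plus a duration).
#[derive(Debug, Clone)]
pub struct AudioInputFrame {
    pub samples: Vec<i16>,
    pub duration_ms: u32,
}

/// Something a backend emits for the host to play or display (assumed shape).
#[derive(Debug, Clone, PartialEq)]
pub enum RealtimeVoiceOutput {
    Audio(Vec<i16>),
    Transcript(String),
}

/// Result of feeding one input frame to a backend (assumed shape).
#[derive(Debug, Clone, PartialEq)]
pub struct BackendStep {
    pub outputs: Vec<RealtimeVoiceOutput>,
    pub finished: bool,
}

/// Hypothetical step-oriented contract: the host pushes a frame, gets a step back.
pub trait StepBackend {
    fn step(&mut self, frame: &AudioInputFrame) -> BackendStep;
}

/// Toy backend that echoes input audio and finishes after `max_steps` frames.
pub struct EchoBackend {
    steps: usize,
    max_steps: usize,
}

impl EchoBackend {
    pub fn new(max_steps: usize) -> Self {
        Self { steps: 0, max_steps }
    }
}

impl StepBackend for EchoBackend {
    fn step(&mut self, frame: &AudioInputFrame) -> BackendStep {
        self.steps += 1;
        BackendStep {
            outputs: vec![RealtimeVoiceOutput::Audio(frame.samples.clone())],
            finished: self.steps >= self.max_steps,
        }
    }
}

/// Host loop written only against the trait, so any backend can be swapped in.
pub fn run_session<B: StepBackend>(
    backend: &mut B,
    frames: &[AudioInputFrame],
) -> Vec<RealtimeVoiceOutput> {
    let mut out = Vec::new();
    for frame in frames {
        let step = backend.step(frame);
        out.extend(step.outputs);
        if step.finished {
            break; // the backend signalled the end of the session
        }
    }
    out
}
```

Because run_session is generic over the trait, a local STS model, a hosted realtime API adapter, or a sidecar client could be dropped in without touching the loop, which is the point of a backend-neutral boundary.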

Interruption-aware sessions

Measure time-to-first-audio, tool calls, interruption behavior, output frames after barge-in, and fallback decisions in tests before users discover the edge cases.
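One way to read these measurements is as a pure function over an ordered event trace. The sketch below derives metrics named like the ones in the harness output on this page (time_to_first_audio_ms, tool_calls, interruptions, fallback_count, output_frames_after_interruption); the TraceEvent enum, the millisecond timestamps and compute_metrics itself are hypothetical, not Vona's implementation.

```rust
// Hypothetical sketch: TraceEvent and compute_metrics are assumptions; only
// the metric names mirror the harness output shown on this page.

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TraceEvent {
    AudioIn { at_ms: u64 },
    AudioOut { at_ms: u64 },
    ToolCall { at_ms: u64 },
    Interruption { at_ms: u64 },
    Fallback { at_ms: u64 },
}

#[derive(Debug, Default, PartialEq)]
pub struct SessionMetrics {
    pub time_to_first_audio_ms: Option<u64>,
    pub tool_calls: u32,
    pub interruptions: u32,
    pub fallback_count: u32,
    pub output_frames_after_interruption: u32,
}

/// Derive metrics from an ordered trace. Time-to-first-audio runs from the
/// first input frame to the first output frame. In this simplified sketch,
/// every output frame after an interruption counts as a barge-in leak.
pub fn compute_metrics(trace: &[TraceEvent]) -> SessionMetrics {
    let mut m = SessionMetrics::default();
    let mut first_in: Option<u64> = None;
    let mut interrupted = false;
    for event in trace {
        match *event {
            TraceEvent::AudioIn { at_ms } => {
                if first_in.is_none() {
                    first_in = Some(at_ms);
                }
            }
            TraceEvent::AudioOut { at_ms } => {
                if m.time_to_first_audio_ms.is_none() {
                    if let Some(start) = first_in {
                        m.time_to_first_audio_ms = Some(at_ms.saturating_sub(start));
                    }
                }
                if interrupted {
                    m.output_frames_after_interruption += 1;
                }
            }
            TraceEvent::ToolCall { .. } => m.tool_calls += 1,
            TraceEvent::Interruption { .. } => {
                m.interruptions += 1;
                interrupted = true;
            }
            TraceEvent::Fallback { .. } => m.fallback_count += 1,
        }
    }
    m
}
```

Keeping the computation separate from the clock means a test can feed a scripted trace and assert exact values, which is what makes these measurements deterministic. A real runtime would likely scope the post-interruption count to the interrupted response rather than the rest of the session.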

Transport ready

Run backends in-process, behind local HTTP, or over length-prefixed IPC without rewriting host application logic.
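Length-prefixed IPC means each message is sent as its byte length followed by the payload, so the reader always knows where one frame ends. The sketch below assumes a 4-byte big-endian u32 prefix and a 16 MiB safety cap; Vona's actual prefix width, byte order and limits are not stated here, and write_frame/read_frame are hypothetical helpers built only on std::io.

```rust
use std::io::{self, Read, Write};

// Assumed wire format: 4-byte big-endian length, then that many payload bytes.
// Vona's real prefix width, byte order and size limit may differ.
const MAX_FRAME_LEN: usize = 16 * 1024 * 1024; // hypothetical safety cap

/// Write one length-prefixed frame and flush it.
pub fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    let len = u32::try_from(payload.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "frame too large"))?;
    w.write_all(&len.to_be_bytes())?;
    w.write_all(payload)?;
    w.flush()
}

/// Read one frame. Returns Ok(None) if the stream ends before a full
/// 4-byte length prefix has been read.
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut len_buf = [0u8; 4];
    match r.read_exact(&mut len_buf) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(None),
        Err(e) => return Err(e),
    }
    let len = u32::from_be_bytes(len_buf) as usize;
    if len > MAX_FRAME_LEN {
        // Refuse to allocate for an implausible length from a corrupt peer.
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame exceeds cap"));
    }
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;
    Ok(Some(payload))
}
```

Since both helpers are generic over Read and Write, the same framing works over a pipe, a Unix socket, or an in-memory buffer in tests, which is what lets host logic stay the same across transports.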

Tool context without glue sprawl

Skill registries, schema validation, audit events, and external context injection let voice systems coordinate with application tools while keeping runtime policy explicit.

SkillRegistry ExternalContextEvent FallbackPolicy
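The combination of a skill registry, schema validation and audit events can be sketched as a registry that checks every tool call against a declared argument schema and logs the decision either way. SkillRegistry is named on this page; the register/validate methods, the ArgKind "schema", ArgValue and AuditEvent are hypothetical stand-ins (a real system would likely use JSON Schema rather than this toy type check).

```rust
use std::collections::HashMap;

// Hypothetical sketch: SkillRegistry is named in the Vona docs, but this API,
// the ArgKind "schema" and the AuditEvent shape are assumptions.

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ArgKind { Text, Number }

#[derive(Debug, Clone, PartialEq)]
pub enum ArgValue { Text(String), Number(f64) }

impl ArgValue {
    fn kind(&self) -> ArgKind {
        match self {
            ArgValue::Text(_) => ArgKind::Text,
            ArgValue::Number(_) => ArgKind::Number,
        }
    }
}

#[derive(Debug, PartialEq)]
pub enum AuditEvent {
    Accepted { skill: String },
    Rejected { skill: String, reason: String },
}

#[derive(Default)]
pub struct SkillRegistry {
    schemas: HashMap<String, HashMap<String, ArgKind>>,
    pub audit_log: Vec<AuditEvent>,
}

impl SkillRegistry {
    /// Declare a skill and the exact arguments it accepts.
    pub fn register(&mut self, skill: &str, schema: &[(&str, ArgKind)]) {
        let fields: HashMap<String, ArgKind> =
            schema.iter().map(|(k, v)| (k.to_string(), *v)).collect();
        self.schemas.insert(skill.to_string(), fields);
    }

    /// Validate a tool call and record an audit event whether it passes or not,
    /// so rejected calls stay visible to the host application.
    pub fn validate(&mut self, skill: &str, args: &HashMap<String, ArgValue>) -> Result<(), String> {
        let result = match self.schemas.get(skill) {
            None => Err(format!("unknown skill `{skill}`")),
            Some(schema) => check_args(schema, args),
        };
        let event = match &result {
            Ok(()) => AuditEvent::Accepted { skill: skill.to_string() },
            Err(reason) => AuditEvent::Rejected { skill: skill.to_string(), reason: reason.clone() },
        };
        self.audit_log.push(event);
        result
    }
}

/// Every declared argument must be present with the right kind; no extras allowed.
fn check_args(schema: &HashMap<String, ArgKind>, args: &HashMap<String, ArgValue>) -> Result<(), String> {
    for (name, kind) in schema {
        match args.get(name) {
            None => return Err(format!("missing argument `{name}`")),
            Some(v) if v.kind() != *kind => return Err(format!("argument `{name}` has wrong type")),
            Some(_) => {}
        }
    }
    if let Some(extra) = args.keys().find(|k| !schema.contains_key(*k)) {
        return Err(format!("unexpected argument `{extra}`"));
    }
    Ok(())
}
```

Validation and auditing live in the registry rather than in each tool, so the policy for what a voice model may call stays explicit and in one place.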
Backend portability

Local when you need control. Cloud when you need reach.

Vona is designed for mixed deployments: one app can explore local Moshi and Seamless-style paths, cloud realtime protocols, STT/TTS cascades, and sidecar deployments without hard-coding its future to a single provider.

OpenAI Realtime: event-stream mapping
Gemini Live: realtime protocol surface
Azure Voice Live: voice and speech helpers
ElevenLabs: streaming TTS surface
Deepgram: Flux, listen, Aura helpers
Moshi / Seamless: local STS families
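Switching between local and cloud backends usually comes down to an ordered fallback decision. The sketch below tries backends in a configured order and falls back only on transient failures; FallbackPolicy is a type name from this page, while the BackendKind and BackendError variants, the run method and its retry rule are hypothetical illustrations rather than Vona's actual policy.

```rust
// Hypothetical sketch of an ordered fallback policy. FallbackPolicy is named
// in the Vona docs; this shape, the variants and the retry rule are assumptions.

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BackendKind { LocalMoshi, OpenAiRealtime, SttTtsCascade }

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BackendError { Unavailable, Timeout, InvalidRequest }

pub struct FallbackPolicy {
    pub order: Vec<BackendKind>,
}

impl FallbackPolicy {
    /// Only transient failures trigger a fallback; a malformed request would
    /// fail on every backend, so it is returned immediately instead.
    fn should_fall_back(err: BackendError) -> bool {
        matches!(err, BackendError::Unavailable | BackendError::Timeout)
    }

    /// Try each backend in order. On success, return the backend used and the
    /// number of fallbacks taken (its position in the order).
    pub fn run<F>(&self, mut attempt: F) -> Result<(BackendKind, u32), BackendError>
    where
        F: FnMut(BackendKind) -> Result<(), BackendError>,
    {
        let mut last_err = BackendError::Unavailable; // reported if `order` is empty
        for (i, &backend) in self.order.iter().enumerate() {
            match attempt(backend) {
                Ok(()) => return Ok((backend, i as u32)),
                Err(e) if Self::should_fall_back(e) => last_err = e,
                Err(e) => return Err(e),
            }
        }
        Err(last_err)
    }
}
```

Returning the fallback count alongside the chosen backend lets a test assert values such as fallback_count directly instead of inferring them from logs.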
Release confidence

Test the voice moments that normally feel impossible to pin down.

Vona’s deterministic harnesses make the slippery parts of speech runtime behavior observable: event order, first audio, tool calls, interruption cleanup, fallback paths, and transport latency.

release gate
$ cargo run -p vona-test-harness --example mock_session --locked
session_id=mock-session-1
close_reason=BackendFinished
metrics time_to_first_audio_ms=Some(0) tool_calls=1 interruptions=1 fallback_count=0
output_frames_after_interruption=0
injected_events=1

$ bash scripts/release_gate.sh
[release-gate] Release gate PASSED

Start with the facade

One dependency. Opt-in surfaces.

Use the umbrella crate for applications, then enable only the adapters you actually ship. Depend on the individual workspace crates directly when you need finer-grained boundaries.

Cargo.toml
[dependencies]
vona = { version = "0.1.0", features = [
  "seamless",
  "moshi",
  "transport-local",
  "openai-realtime",
  "model-provisioning",
] }