Industry-leading Voice AI models. Straight from the source.

Modulate's Velma is the #1 model for transcription accuracy, deepfake detection, and conversation understanding. Now available as direct APIs — so you can build on the same intelligence that powers our enterprise platform.



Velma

Audio-native AI that understands conversations the way a human listener would, detecting intent, emotion, fraud signals, compliance risk, and more.

#1 for conversation understanding. Beats major LLMs on accuracy and cost.
Understands emotion from audio, not just word choice. 20+ emotions available.
Access 50 out-of-the-box behaviors and 100 templates, or build your own in plain English to fit your business needs.
Batch + real-time streaming.

Starting at $1.25 / hour
1,000 free credits



Transcribe

The most accurate speech-to-text API for real-world conversations. Handles interruptions, accents, overlapping speech, and noise that breaks typical systems.

#1 accuracy on real-world benchmarks including Earnings 22 and AMI
Up to 10× cheaper than Deepgram
Batch + real-time streaming
Diarization, emotion, accent detection included free
PII/PHI redaction (including redacted audio) for +$0.02/hr

Starting at $0.03 / hour
400 hours in free credits
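To show how the listed rates combine, here is a minimal cost estimator. It assumes the $0.03/hour "starting at" rate applies flat to every audio hour and that PII/PHI redaction adds $0.02/hour on top; volume pricing, free credits, and any minimums are not modeled, and the function name is hypothetical.

```python
# Hypothetical estimator: rates are the "starting at" figures on this page,
# assumed here to apply flat per hour of audio.
TRANSCRIBE_RATE_PER_HOUR = 0.03  # USD, Velma Transcribe starting rate
REDACTION_ADDON_PER_HOUR = 0.02  # USD, PII/PHI redaction add-on


def estimate_transcribe_cost(audio_hours: float, with_redaction: bool = False) -> float:
    """Return an estimated USD cost for the given number of audio hours."""
    if audio_hours < 0:
        raise ValueError("audio_hours must be non-negative")
    rate = TRANSCRIBE_RATE_PER_HOUR
    if with_redaction:
        rate += REDACTION_ADDON_PER_HOUR
    # Round to whole cents to absorb floating-point noise.
    return round(audio_hours * rate, 2)


if __name__ == "__main__":
    print(estimate_transcribe_cost(1000))                       # 30.0
    print(estimate_transcribe_cost(1000, with_redaction=True))  # 50.0
```

Under these assumptions, 1,000 hours of audio costs about $30, or about $50 with redaction enabled.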


Deepfake Detect

The #1 ranked deepfake detection model on 🤗 Hugging Face's Speech Deepfake Arena. Catches what others miss — including mid-call voice switches that gate-check systems are blind to.

1.1% equal error rate, less than half that of the next-best model
120× lower cost than the closest competitor
Works with just 3 seconds of audio
Segment-level scores, updated every 2 seconds

$0.25 / hour
1,000 free credits
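Equal error rate (EER) is the operating point where the miss rate (deepfakes passed as genuine) equals the false-alarm rate (genuine speech flagged as fake). The sketch below computes it generically by sweeping every observed score as a threshold. It illustrates the metric only; it is not Modulate's evaluation code, and the example scores are made up.

```python
from typing import Sequence


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate the EER by trying each observed score as the decision threshold.

    scores: higher means "more likely a deepfake".
    labels: 1 for deepfake, 0 for genuine speech.
    """
    fakes = [s for s, y in zip(scores, labels) if y == 1]
    reals = [s for s, y in zip(scores, labels) if y == 0]
    if not fakes or not reals:
        raise ValueError("need at least one deepfake and one genuine sample")

    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(scores)) + [float("inf")]:
        # A clip is flagged as a deepfake when its score is >= t.
        miss_rate = sum(s < t for s in fakes) / len(fakes)          # deepfakes let through
        false_alarm_rate = sum(s >= t for s in reals) / len(reals)  # genuine audio flagged
        gap = abs(miss_rate - false_alarm_rate)
        if gap < best_gap:
            best_gap = gap
            best_eer = (miss_rate + false_alarm_rate) / 2
    return best_eer


if __name__ == "__main__":
    print(equal_error_rate([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # 0.0, perfectly separated
```

Read this way, a 1.1% EER means that at the balanced threshold roughly 1.1% of deepfakes slip through and roughly 1.1% of genuine clips are flagged.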


Music & Speech Detect

Classify music and speech at the frame level, handling tricky content like music-with-vocals that single-label detectors get wrong.

Independent music and speech probabilities per frame, covering music-with-vocals, jingles, and background music under dialogue
Sub-200ms time-to-first-result on streaming
Built for ad-break detection, podcast segmentation, broadcast monitoring, and UGC moderation
Batch (REST) + streaming

$0.02 / hour
1,000 hours / month within base quota
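Because each frame carries two independent probabilities rather than one exclusive label, a frame can be both music and speech (music-with-vocals) or neither (silence). The sketch below shows one way a consumer might turn such per-frame pairs into labeled time segments, for example to find candidate ad breaks. The (p_music, p_speech) input format, the 0.5 threshold, and the 20 ms frame length are all hypothetical, not the API's documented output.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

FRAME_SECONDS = 0.02  # hypothetical frame length (20 ms)
THRESHOLD = 0.5       # hypothetical decision threshold


def frame_label(p_music: float, p_speech: float, threshold: float = THRESHOLD) -> str:
    """Map two independent probabilities to one of four labels."""
    music = p_music >= threshold
    speech = p_speech >= threshold
    if music and speech:
        return "music+speech"
    if music:
        return "music"
    if speech:
        return "speech"
    return "neither"


def to_segments(frames: Sequence[Tuple[float, float]]) -> List[Tuple[float, float, str]]:
    """Merge runs of identically labeled frames into (start_s, end_s, label) segments."""
    labels = [frame_label(m, s) for m, s in frames]
    segments: List[Tuple[float, float, str]] = []
    index = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        start = round(index * FRAME_SECONDS, 3)
        end = round((index + length) * FRAME_SECONDS, 3)
        segments.append((start, end, label))
        index += length
    return segments


if __name__ == "__main__":
    frames = [(0.9, 0.1), (0.9, 0.1), (0.8, 0.7), (0.1, 0.9), (0.1, 0.2)]
    for segment in to_segments(frames):
        print(segment)
```

A single-label detector would have to call the third frame either "music" or "speech"; keeping both flags lets downstream logic decide, for instance, that sung jingles still count as music for ad-break purposes.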

PII/PHI Redact

Transcribe audio and automatically redact sensitive personal and health information — replacing detected spans with entity-type tags in the transcript and silencing the matching audio ranges.

Replaces names, SSNs, PHI, and more with entity tags (e.g. [FIRSTNAME], [SSN], [PHI])
Returns both redacted transcript and silenced audio in one call
Multilingual, with speaker diarization included
Batch + real-time streaming

+$0.02 / hour add-on to Velma Transcribe
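The two outputs described above, a tagged transcript and silenced audio, can be illustrated locally once sensitive spans are known. This sketch assumes each detected span arrives as a character range in the transcript plus a time range in the audio; the `Span` type, its fields, and the raw-sample audio representation are hypothetical, and the detection step itself is out of scope.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Span:
    """Hypothetical detected entity: character range in text and time range in audio."""
    char_start: int
    char_end: int   # exclusive
    entity: str     # e.g. "FIRSTNAME", "SSN", "PHI"
    t_start: float  # seconds
    t_end: float    # seconds


def redact_text(text: str, spans: Sequence[Span]) -> str:
    """Replace each span with an [ENTITY] tag, right to left so earlier offsets stay valid."""
    for span in sorted(spans, key=lambda s: s.char_start, reverse=True):
        text = text[: span.char_start] + f"[{span.entity}]" + text[span.char_end :]
    return text


def silence_audio(samples: List[float], spans: Sequence[Span], sample_rate: int) -> List[float]:
    """Return a copy of the samples with every span's time range set to zero."""
    out = list(samples)
    for span in spans:
        first = max(0, int(span.t_start * sample_rate))
        last = min(len(out), int(span.t_end * sample_rate))
        for i in range(first, last):
            out[i] = 0.0
    return out


if __name__ == "__main__":
    text = "My name is Ana and my SSN is 123-45-6789."
    a, s = text.index("Ana"), text.index("123-45-6789")
    spans = [Span(a, a + 3, "FIRSTNAME", 1.0, 2.0), Span(s, s + 11, "SSN", 3.0, 4.0)]
    print(redact_text(text, spans))  # My name is [FIRSTNAME] and my SSN is [SSN].
```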


Not just another LLM wrapper.

Most voice AI APIs transcribe audio and hand the text to a language model. Context, tone, and everything else that makes a voice conversation meaningful are discarded at step one.

Velma is built differently. Our Ensemble Listening Model (ELM) processes audio natively — understanding conversations the way a human listener would, with full awareness of how something is said, not just the words.

The result is an API that's more accurate, more cost-efficient, and capable of outputs that text-first systems simply can't produce.

Learn more about the ELM architecture

Built for developers shipping production systems

Velma Transcribe is designed to integrate cleanly into modern infrastructure.

REST endpoints for batch transcription

Streaming endpoints for real-time transcription

Predictable structured output for downstream pipelines

Built for scalable high-throughput workloads

The Velma API is designed to work well with analytics stacks, search systems, and LLM-based workflows.
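Predictable structured output means downstream code can consume results without fragile text parsing. The sketch below groups diarized utterances by speaker before they go to a search index or an LLM prompt. The JSON shape (`utterances`, `speaker`, `start`, `text`) is a hypothetical example, not Velma's documented response schema; the docs define the real field names.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Hypothetical response body; the real schema is defined in the Velma API docs.
EXAMPLE_RESPONSE = """
{
  "utterances": [
    {"speaker": "A", "start": 0.0, "text": "Thanks for calling."},
    {"speaker": "B", "start": 1.8, "text": "Hi, I need to update my address."},
    {"speaker": "A", "start": 4.2, "text": "Sure, I can help with that."}
  ]
}
"""


def text_by_speaker(raw_json: str) -> Dict[str, List[str]]:
    """Group utterance text by speaker label, in time order."""
    payload = json.loads(raw_json)
    utterances = sorted(payload["utterances"], key=lambda u: u["start"])
    grouped: Dict[str, List[str]] = defaultdict(list)
    for utt in utterances:
        grouped[utt["speaker"]].append(utt["text"])
    return dict(grouped)


if __name__ == "__main__":
    for speaker, lines in text_by_speaker(EXAMPLE_RESPONSE).items():
        print(speaker, lines)
```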



Start free. Scale when you're ready.

Our API includes free credits to get you started — no card required, no sales call necessary. When you're ready to go to production, usage-based pricing means you pay for what you use.

Need volume, SLAs, or custom endpoints? We can help with that too.

Get Free API Access

Talk to Sales
