Industry-leading Voice AI models. Straight from the source.

Modulate's Velma is the #1 model for transcription accuracy, deepfake detection, and conversation understanding. Now available as direct APIs — so you can build on the same intelligence that powers our enterprise platform.

No credit card required.

Velma Transcribe

The most accurate speech-to-text API for real-world conversations. Handles interruptions, accents, overlapping speech, and noise that breaks typical systems.

  • #1 accuracy on real-world benchmarks including Earnings 22 and AMI

  • Up to 10× cheaper than Deepgram

  • Batch + real-time streaming

  • Diarization, emotion, and accent detection included free

  • PII/PHI redaction (including redacted audio) for +$0.02/hr
Starting at $0.03 / hour
400 hours in free credits
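
For a rough sense of what that pricing means at volume, here is a minimal sketch of the arithmetic. It assumes the free credits cover the first 400 audio hours (with or without redaction) and that the $0.03 starting rate applies flat to every hour after that; `estimate_cost` is a hypothetical helper, not part of the Velma API.

```python
from decimal import Decimal

BASE_RATE = Decimal("0.03")       # USD per audio hour (the "starting at" rate)
REDACTION_RATE = Decimal("0.02")  # USD per audio hour, PII/PHI redaction add-on
FREE_HOURS = Decimal("400")       # assumption: free credits cover 400 audio hours


def estimate_cost(audio_hours, redaction=False):
    """Estimate the USD cost of transcribing `audio_hours` of audio.

    Illustrative only: real billing may apply credits or tiers differently.
    """
    hours = Decimal(str(audio_hours))
    billable = max(Decimal("0"), hours - FREE_HOURS)
    rate = BASE_RATE + (REDACTION_RATE if redaction else Decimal("0"))
    return billable * rate


if __name__ == "__main__":
    print(estimate_cost(1000))                  # 600 billable hours at $0.03
    print(estimate_cost(1000, redaction=True))  # 600 billable hours at $0.05
```

Under these assumptions, 1,000 hours of audio comes to $18 without redaction and $30 with it.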
Deepfake Detect

The #1 ranked deepfake detection model on 🤗 Hugging Face's Speech Deepfake Arena. Catches what others miss — including mid-call voice switches that gate-check systems are blind to.

  • 1.1% equal error rate, less than half that of the next-best model

  • 120× lower cost than the closest competitor

  • Works with just 3 seconds of audio

  • Segment-level scores, updated every 2 seconds

$0.25 / hour
1,000 free credits
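
If equal error rate is unfamiliar: it is the error level at the decision threshold where real voices wrongly flagged as fake and fake voices wrongly passed as real occur at the same rate. The sketch below shows one simple way to estimate it from labeled scores. It is a generic illustration, not Modulate's evaluation code, and it assumes higher scores mean "more likely fake".

```python
def equal_error_rate(scores, labels):
    """Approximate the equal error rate (EER) of a detector.

    scores: detector outputs, higher = more likely fake (assumed convention)
    labels: 1 for fake audio, 0 for real audio
    """
    fakes = [s for s, y in zip(scores, labels) if y == 1]
    reals = [s for s, y in zip(scores, labels) if y == 0]
    if not fakes or not reals:
        raise ValueError("need at least one fake and one real sample")

    best_gap, best_eer = None, None
    for threshold in sorted(set(scores)):
        # Real audio flagged as fake at this threshold.
        false_accept = sum(s >= threshold for s in reals) / len(reals)
        # Fake audio that slips under this threshold.
        false_reject = sum(s < threshold for s in fakes) / len(fakes)
        gap = abs(false_accept - false_reject)
        if best_gap is None or gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, best_eer = gap, (false_accept + false_reject) / 2
    return best_eer


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    print(equal_error_rate(scores, labels))
```

A lower EER means fewer mistakes in both directions at once, which is why it is a common single-number summary for detection models.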
Voice Intelligence (Coming Soon)

Full voice intelligence via API — intent, emotion, fraud signals, compliance risk, policy violations, and more. Built on the same Ensemble Listening Model that powers Modulate's enterprise platform.

No fine-tuning. No LLM overhead. Auditable, structured outputs you can act on.
Music & Speech Detect

Classify music and speech at the frame level, handling tricky content like music-with-vocals that single-label detectors get wrong.

  • Independent music + speech probabilities per frame, covering music-with-vocals, jingles, and background music under dialogue

  • Sub-200ms time-to-first-result on streaming

  • Built for ad-break detection, podcast segmentation, broadcast monitoring, and UGC moderation

  • Batch (REST) + streaming

$0.02 / hour
1,000 hours / month within base quota
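
To see why independent probabilities matter, here is a minimal sketch of how a client might turn per-frame results into labeled segments. The `Frame` fields, the 0.5 threshold, and the label names are all assumptions for illustration; the actual response format is in the docs.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    # Hypothetical shape of one per-frame result.
    start_s: float
    end_s: float
    music_prob: float
    speech_prob: float


def label_frame(frame, threshold=0.5):
    """Label a frame from two independent probabilities (threshold is assumed)."""
    music = frame.music_prob >= threshold
    speech = frame.speech_prob >= threshold
    if music and speech:
        return "music+speech"  # e.g. vocals over a track, dialogue over a bed
    if music:
        return "music"
    if speech:
        return "speech"
    return "neither"


def merge_segments(frames, threshold=0.5):
    """Merge consecutive frames with the same label into (start, end, label)."""
    segments = []
    for frame in frames:
        label = label_frame(frame, threshold)
        if segments and segments[-1][2] == label:
            start, _, _ = segments[-1]
            segments[-1] = (start, frame.end_s, label)
        else:
            segments.append((frame.start_s, frame.end_s, label))
    return segments


if __name__ == "__main__":
    frames = [
        Frame(0.0, 1.0, 0.9, 0.1),
        Frame(1.0, 2.0, 0.8, 0.7),
        Frame(2.0, 3.0, 0.9, 0.9),
        Frame(3.0, 4.0, 0.1, 0.8),
    ]
    print(merge_segments(frames))
```

A single-label detector would have to pick one class for the middle two frames; with two independent scores, the overlap shows up as its own segment.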
PII/PHI Redact

Transcribe audio and automatically redact sensitive personal and health information — replacing detected spans with entity-type tags in the transcript and silencing the matching audio ranges.

  • Replaces names, SSNs, PHI, and more with entity tags (e.g. [FIRSTNAME], [SSN], [PHI])

  • Returns both redacted transcript and silenced audio in one call

  • Multilingual, with speaker diarization included

  • Batch + real-time streaming

+$0.02 / hour add-on to Velma Transcribe
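
Conceptually, redaction does two things: it swaps detected text spans for entity tags, and it zeroes out the matching stretches of audio. The sketch below illustrates both steps on toy data. It is not Modulate's implementation, and the span and range formats are assumptions made for the example.

```python
def redact_transcript(text, spans):
    """Replace character spans with entity tags.

    spans: non-overlapping (start_char, end_char, entity_type) tuples
           (a hypothetical format for this example)
    """
    pieces = []
    cursor = 0
    for start, end, entity in sorted(spans):
        pieces.append(text[cursor:start])  # keep text before the span
        pieces.append(f"[{entity}]")       # replace the span with its tag
        cursor = end
    pieces.append(text[cursor:])           # keep the remaining tail
    return "".join(pieces)


def silence_audio(samples, sample_rate, ranges_s):
    """Return a copy of `samples` with the given time ranges set to zero.

    ranges_s: (start_seconds, end_seconds) tuples
    """
    redacted = list(samples)
    for start_s, end_s in ranges_s:
        lo = max(0, int(start_s * sample_rate))
        hi = min(len(redacted), int(end_s * sample_rate))
        for i in range(lo, hi):
            redacted[i] = 0
    return redacted


if __name__ == "__main__":
    text = "My name is Jane and my SSN is 123-45-6789."
    name_at = text.index("Jane")
    ssn_at = text.index("123-45-6789")
    spans = [(name_at, name_at + 4, "FIRSTNAME"), (ssn_at, ssn_at + 11, "SSN")]
    print(redact_transcript(text, spans))
    print(silence_audio([1, 2, 3, 4, 5, 6, 7, 8], 4, [(0.5, 1.0)]))
```

With the API, both outputs come back from one call, so there is no need to align transcript spans and audio ranges yourself.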

Not just another LLM wrapper.

Most voice AI APIs transcribe audio and hand the text to a language model. Context, tone, and everything that makes a voice conversation meaningful gets discarded at step one.

Velma is built differently. Our Ensemble Listening Model (ELM) processes audio natively — understanding conversations the way a human listener would, with full awareness of how something is said, not just the words.

The result is an API that's more accurate, more cost-efficient, and capable of outputs that text-first systems simply can't produce.

Built for developers shipping production systems

Velma Transcribe is designed to integrate cleanly into modern infrastructure.

  • REST endpoints for batch transcription

  • Streaming endpoints for real-time transcription

  • Predictable structured output for downstream pipelines

  • Built for scalable high-throughput workloads

The Velma API is designed to work well with analytics stacks, search systems, and LLM-based workflows.

Read the docs

Start free. Scale when you're ready.

Our API includes free credits to get you started — no card required, no sales call necessary. When you're ready to go to production, usage-based pricing means you pay for what you use.

Need volume, SLAs, or custom endpoints? We can help with that too.