Deepgram Review
Deepgram is a voice AI API platform offering best-in-class speech-to-text, text-to-speech, and real-time voice agent APIs with sub-300ms latency. It is used by 200,000+ developers, and IBM has named it its voice AI infrastructure partner.
78
Updated 36d ago · Free plan
Best for
- Developers building voice-enabled applications, call center automation, or transcription pipelines
- AI teams building voice agents that need reliable STT and TTS in a single API
- Product teams adding real-time transcription to video conferencing or meeting tools
- Enterprises requiring on-premise or private cloud deployment with HIPAA compliance
- Startups building voice-first products who need cost-effective, scalable infrastructure
Skip this if…
- You are a non-technical user who needs a consumer transcription app rather than an API
- Your team's voice workflows are well served by an all-in-one no-code tool like Otter.ai
- Your project requires speech-to-text in 50+ languages, where Google or Azure may have better coverage
- You only need occasional one-off transcription rather than continuous API usage
What is Deepgram?
Deepgram is an API platform for voice AI. It offers three core products: speech-to-text that converts audio to text with industry-leading accuracy, text-to-speech that generates natural-sounding voices from text, and a Voice Agent API that combines STT, TTS, and LLM inference into a single endpoint for building conversational voice agents.
Founded in 2015 and headquartered in San Francisco, Deepgram built its own end-to-end deep learning models rather than relying on traditional speech recognition pipelines. The result is significantly lower latency and better accuracy than legacy providers, particularly on noisy audio and accented speech. In February 2026, IBM named Deepgram as its first voice AI partner, integrating Deepgram's APIs into IBM's enterprise AI stack.
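To make the speech-to-text product concrete, here is a minimal sketch of a pre-recorded transcription call using only the Python standard library. The endpoint URL, the `model` query parameter, the `Token` authorization scheme, and the response shape are assumptions based on Deepgram's public REST documentation and should be checked against the current API reference. The helper names (`build_transcription_request`, `extract_transcript`, `transcribe`) are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; confirm against the current API reference before use.
LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_transcription_request(
    audio: bytes, api_key: str, model: str = "nova-3"
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying raw WAV bytes."""
    query = urllib.parse.urlencode({"model": model})
    return urllib.request.Request(
        url=f"{LISTEN_URL}?{query}",
        data=audio,
        method="POST",
        headers={
            # Assumed auth scheme: "Token <key>".
            "Authorization": f"Token {api_key}",
            "Content-Type": "audio/wav",
        },
    )


def extract_transcript(response_body: dict) -> str:
    """Return the first transcript from a response with the assumed shape."""
    alternatives = response_body["results"]["channels"][0]["alternatives"]
    return alternatives[0]["transcript"]


def transcribe(path: str, api_key: str) -> str:
    """Send a local WAV file and return its transcript (makes a network call)."""
    with open(path, "rb") as audio_file:
        request = build_transcription_request(audio_file.read(), api_key)
    with urllib.request.urlopen(request, timeout=60) as response:
        return extract_transcript(json.load(response))
```

Building the request separately from sending it keeps the assumed parts (URL, headers, response shape) in two small functions that are easy to adjust if the real API differs.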
Nova-3 and accuracy benchmarks
Deepgram's Nova-3 model consistently ranks first or second in word error rate benchmarks across English audio types. On typical business audio (meetings, phone calls, podcasts), Nova-3 outperforms Google Speech-to-Text v2, AWS Transcribe, and OpenAI Whisper on both accuracy and latency.
The latency story is what separates Deepgram for real-time applications. Sub-300ms round-trip latency makes it viable for live conversation, whereas many competing services introduce delays that make voice agents feel unresponsive. For synchronous voice agent use cases, this is the most important technical differentiator.
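A latency claim like this is worth checking against your own traffic. The sketch below times arbitrary calls and summarizes the samples against a budget. The 300 ms figure is the one quoted in this review, not a measurement, and the function names are hypothetical. It uses a nearest-rank 95th percentile, which is one common convention among several.

```python
import math
import statistics
import time
from typing import Callable, Dict, List

LATENCY_BUDGET_MS = 300.0  # figure quoted in this review, not a measured value


def time_call_ms(call: Callable[[], object]) -> float:
    """Run `call` once and return its wall-clock duration in milliseconds."""
    start = time.perf_counter()
    call()
    return (time.perf_counter() - start) * 1000.0


def summarize_latency(
    samples_ms: List[float], budget_ms: float = LATENCY_BUDGET_MS
) -> Dict[str, object]:
    """Report median and nearest-rank p95, and whether p95 fits the budget."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # Nearest-rank percentile: the smallest value with at least 95% of samples at or below it.
    p95_index = math.ceil(0.95 * len(ordered)) - 1
    p95 = ordered[p95_index]
    return {
        "p50_ms": statistics.median(ordered),
        "p95_ms": p95,
        "within_budget": p95 <= budget_ms,
    }
```

Judging by p95 rather than the average matters for voice agents, because a handful of slow turns is enough to make a conversation feel unresponsive.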
Pricing and the Voice Agent API
The free tier provides $200 in API credits, which at the listed rates translates to roughly 46,500 minutes (about 775 hours) of Nova-3 transcription or about 44 hours of Voice Agent API usage. Pay-as-you-go rates start at $0.0043 per minute for Nova-3 speech-to-text and $0.015 per 1000 characters for text-to-speech.
The Voice Agent API is priced at $4.50 per hour and bundles STT, TTS, and LLM inference together. For teams building voice agents, this simplifies pricing to a single hourly rate rather than three separate API bills. Volume discounts are negotiable for enterprise workloads above a certain monthly spend.
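The rates quoted above can be turned into a rough budget estimate. This sketch hard-codes the figures from this review as constants. They are a snapshot, not an authoritative price list, and the function names are hypothetical.

```python
# Rates as quoted in this review; verify against the current pricing page.
STT_RATE_PER_MIN = 0.0043          # USD per minute, Nova-3 speech-to-text
TTS_RATE_PER_1K_CHARS = 0.015      # USD per 1000 characters, text-to-speech
VOICE_AGENT_RATE_PER_HOUR = 4.50   # USD per hour, bundled STT + TTS + LLM
FREE_CREDIT_USD = 200.0


def stt_cost(minutes: float) -> float:
    """Cost in USD of transcribing `minutes` of audio."""
    return minutes * STT_RATE_PER_MIN


def tts_cost(characters: int) -> float:
    """Cost in USD of synthesizing `characters` of text."""
    return characters / 1000 * TTS_RATE_PER_1K_CHARS


def voice_agent_cost(hours: float) -> float:
    """Cost in USD of `hours` of Voice Agent API usage."""
    return hours * VOICE_AGENT_RATE_PER_HOUR


def hours_covered_by_credit(credit_usd: float, rate_per_hour: float) -> float:
    """Hours of usage that `credit_usd` buys at a given hourly rate."""
    return credit_usd / rate_per_hour
```

For example, `hours_covered_by_credit(FREE_CREDIT_USD, STT_RATE_PER_MIN * 60)` gives the transcription hours the free credit buys at the quoted per-minute rate, and passing `VOICE_AGENT_RATE_PER_HOUR` instead gives the equivalent for voice agents.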
Community & Tutorials
What creators and developers are saying about Deepgram.
Building a Real-Time Voice Agent with Deepgram in 15 Minutes
Deepgram DevRel · tutorial
Pricing
- Free tier: $200 in API credits (approximately 46,500 minutes of Nova-3 audio at the pay-as-you-go rate)
- Pay-as-you-go: $0.0043/minute for Nova-3 STT and $0.015/1000 chars for TTS
- Voice Agent API: $4.50/hour, includes LLM costs
- Volume discounts and committed-use plans available at scale: Custom
Free and paid · Free plan available
Pros
- Industry-leading word error rate with Nova-3 model, outperforming Google and AWS in benchmarks
- Sub-300ms end-to-end latency makes it practical for real-time conversational AI applications
- Voice Agent API bundles STT, TTS, and LLM costs at $4.50/hour, simplifying budgeting
- $200 in free credits lowers the barrier to getting started, and 200,000+ developers already use the platform
- IBM partnership (February 2026) validates enterprise-grade reliability and compliance posture
- PartnerStack affiliate program for developers and agencies who build on the platform
- On-premise deployment available for healthcare, finance, and government use cases
Cons
- API-only product, no consumer-facing app for users who need simple file transcription
- Language support is narrower than Google Speech-to-Text or Azure Cognitive Services
- Pay-as-you-go pricing can become unpredictable for applications with variable traffic spikes
- Documentation and onboarding are developer-oriented, less accessible to non-technical evaluators
- The Voice Agent API's $4.50/hour rate includes LLM costs, but not for every LLM provider, so teams should check whether their preferred model is covered
Platforms
Web, API
Last verified: April 2, 2026
We may earn a commission at no extra cost to you.