Deepgram Review

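Deepgram is a voice AI API platform offering industry-leading speech-to-text, text-to-speech, and a real-time Voice Agent API, with end-to-end latency under 300 ms. It is used by more than 200,000 developers and has been selected by IBM as an official voice AI partner.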

Updated 36 days ago · Free tier

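Best for

  • Developers building voice applications, call center automation, or transcription pipelines
  • AI teams building voice agents that need reliable STT and TTS
  • Product teams adding real-time transcription to video conferencing tools
  • Enterprises that need HIPAA-compliant on-premises or private cloud deployment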

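Not a good fit for…

  • Non-technical users who need a consumer transcription app rather than an API
  • Teams whose voice workflows are already well served by an all-in-one tool such as Otter.ai
  • Projects that need speech-to-text in 50+ languages, where Google or Azure offer better coverage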

What is Deepgram?

Deepgram is an API platform for voice AI. It offers three core products: speech-to-text that converts audio to text with industry-leading accuracy, text-to-speech that generates natural-sounding voices from text, and a Voice Agent API that combines STT, TTS, and LLM inference into a single endpoint for building conversational voice agents. Founded in 2015 and headquartered in San Francisco, Deepgram built its own end-to-end deep learning models rather than relying on traditional speech recognition pipelines. The result is significantly lower latency and better accuracy than legacy providers, particularly on noisy audio and accented speech. In February 2026, IBM named Deepgram as its first voice AI partner, integrating Deepgram's APIs into IBM's enterprise AI stack.
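To make the speech-to-text product concrete, here is a minimal sketch of what a transcription call could look like from Python using only the standard library. The endpoint URL, the `Token` authorization scheme, the `model=nova-3` query parameter, and the response shape are assumptions based on Deepgram's public REST API as commonly documented; check the current API reference before relying on them. The function names are illustrative, not part of any SDK.

```python
import json
import urllib.request

# Assumed endpoint; verify against Deepgram's current API reference.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_transcription_request(api_key: str, audio_url: str,
                                model: str = "nova-3") -> urllib.request.Request:
    """Build (but do not send) a POST request to transcribe a hosted audio file."""
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        f"{DEEPGRAM_LISTEN_URL}?model={model}",
        data=body,
        headers={
            "Authorization": f"Token {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_transcript(response: dict) -> str:
    """Read the first transcript from a response shaped like
    results.channels[0].alternatives[0].transcript (assumed shape)."""
    return response["results"]["channels"][0]["alternatives"][0]["transcript"]


def transcribe(api_key: str, audio_url: str) -> str:
    """Send the request and return the transcript text (needs network access)."""
    request = build_transcription_request(api_key, audio_url)
    with urllib.request.urlopen(request, timeout=30) as reply:
        return extract_transcript(json.load(reply))
```

Keeping request construction and response parsing in separate functions means both can be checked offline; only `transcribe` needs a real API key and network access.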

Nova-3 and accuracy benchmarks

Deepgram's Nova-3 model consistently ranks first or second in word error rate (WER) benchmarks across English audio types. On typical business audio (meetings, phone calls, podcasts), Nova-3 outperforms Google Speech-to-Text v2, AWS Transcribe, and OpenAI Whisper on both accuracy and latency. Latency is what sets Deepgram apart for real-time applications: sub-300 ms round-trip latency makes live conversation viable, whereas many competing services introduce delays that make voice agents feel unresponsive. For synchronous voice agent use cases, this is the most important technical differentiator.
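Word error rate is the metric behind these rankings: the number of word-level substitutions, deletions, and insertions needed to turn the system's transcript into the reference, divided by the number of reference words. The sketch below is a generic implementation using word-level edit distance, useful for running your own comparison on your own audio. It is not Deepgram's benchmark methodology, and the lowercase-and-split normalization is a simplifying assumption (real benchmarks usually normalize punctuation and numbers as well).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]  # turning i reference words into nothing takes i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of four reference words.
    print(word_error_rate("the quick brown fox", "the quick fox"))
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words.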

Pricing and the Voice Agent API

The free tier provides $200 in API credits. At the pay-as-you-go rates quoted here, that covers roughly 46,500 minutes (about 775 hours) of Nova-3 transcription or about 44 hours of Voice Agent API usage. Pay-as-you-go rates start at $0.0043 per minute for Nova-3 speech-to-text and $0.015 per 1,000 characters for text-to-speech. The Voice Agent API is priced at $4.50 per hour and bundles STT, TTS, and LLM inference together. For teams building voice agents, this reduces pricing to a single hourly rate rather than three separate API bills. Volume discounts are negotiable for enterprise workloads above a certain monthly spend.
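The arithmetic behind these figures is easy to check. The sketch below takes the rates quoted in this section as constants and works out how far a given budget goes, plus a simple monthly estimate. The function names and the example workload are illustrative assumptions, and actual billing may differ (for example by model tier, streaming versus pre-recorded audio, or rounding rules).

```python
# Rates as quoted in this review; treat them as inputs, not as a price list.
FREE_CREDIT_USD = 200.00
STT_USD_PER_MINUTE = 0.0043
TTS_USD_PER_1000_CHARS = 0.015
VOICE_AGENT_USD_PER_HOUR = 4.50


def stt_hours(budget_usd: float) -> float:
    """Hours of Nova-3 transcription a budget covers at the per-minute rate."""
    return budget_usd / STT_USD_PER_MINUTE / 60


def tts_characters(budget_usd: float) -> float:
    """Characters of text-to-speech a budget covers."""
    return budget_usd / TTS_USD_PER_1000_CHARS * 1000


def voice_agent_hours(budget_usd: float) -> float:
    """Hours of Voice Agent API usage a budget covers."""
    return budget_usd / VOICE_AGENT_USD_PER_HOUR


def monthly_cost(stt_minutes: float, tts_chars: float, agent_hours: float) -> float:
    """Estimated monthly bill in USD for a mixed workload (hypothetical helper)."""
    return (stt_minutes * STT_USD_PER_MINUTE
            + tts_chars / 1000 * TTS_USD_PER_1000_CHARS
            + agent_hours * VOICE_AGENT_USD_PER_HOUR)


if __name__ == "__main__":
    print(f"STT hours on free credit:         {stt_hours(FREE_CREDIT_USD):.1f}")
    print(f"Voice Agent hours on free credit: {voice_agent_hours(FREE_CREDIT_USD):.1f}")
    print(f"Example month: ${monthly_cost(10_000, 500_000, 20):.2f}")
```

Running the estimate against a low-traffic and a high-traffic month gives a rough range for budgeting a pay-as-you-go workload.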

Community and tutorials

What creators and developers are saying about Deepgram.

Building a Real-Time Voice Agent with Deepgram in 15 Minutes

Deepgram DevRel · tutorial

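Pricing

The free tier includes $200 in API credits (about 775 hours of Nova-3 audio at the pay-as-you-go rate). Nova-3 STT pay-as-you-go starts at $0.0043/minute. The Voice Agent API costs $4.50/hour, including LLM costs.

Free and paid · Free tier available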

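Pros

  • Nova-3 delivers an industry-leading word error rate, ahead of Google and AWS
  • End-to-end latency under 300 ms, suitable for real-time conversational AI
  • The Voice Agent API bundles STT, TTS, and LLM costs at $4.50/hour
  • A base of 200,000+ developers and $200 in free credits lower the barrier to entry
  • The IBM partnership (February 2026) validates enterprise-grade reliability

Cons

  • API-only product, with no simple consumer-facing app for transcribing files
  • Narrower language support than Google Speech-to-Text or Azure
  • Pay-as-you-go pricing can be hard to budget for applications with highly variable traffic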

Platforms

Web, API
Last verified: April 2, 2026

We may earn a commission, at no extra cost to you.

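FAQ

What is Deepgram?
Deepgram is a voice AI API platform offering industry-leading speech-to-text, text-to-speech, and a real-time Voice Agent API, with end-to-end latency under 300 ms. It is used by more than 200,000 developers and has been selected by IBM as an official voice AI partner.
Does Deepgram have a free tier?
Yes. The free tier includes $200 in API credits (about 775 hours of Nova-3 audio at the pay-as-you-go rate). Nova-3 STT pay-as-you-go starts at $0.0043/minute. The Voice Agent API costs $4.50/hour, including LLM costs.
Who is Deepgram best for?
Deepgram is best for developers building voice applications, call center automation, or transcription pipelines; AI teams building voice agents that need reliable STT and TTS; product teams adding real-time transcription to video conferencing tools; and enterprises that need HIPAA-compliant on-premises or private cloud deployment.
Who should skip Deepgram?
Deepgram may not suit non-technical users who need a consumer transcription app rather than an API; teams whose voice workflows are already well served by an all-in-one tool such as Otter.ai; or projects that need speech-to-text in 50+ languages, where Google or Azure offer better coverage.
Does Deepgram have an API?
Yes, Deepgram provides an API for programmatic access.
Which platforms does Deepgram support?
Deepgram is available via the web and its API.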
