OpenAI Announces Realtime Voice API

bottlepalm · 2024-10-01T22:20:09 1727821209

Looks like even for the non-realtime API they're charging $200/M tokens for output audio. Their current TTS API is $15/M characters for output audio, which works out to about $60/M tokens if each token is around 4 characters. Then add in the manual piping to the 4o LLM, which is $15/M, and you get around $75/M total.

So going from $75/M to $200/M is a big premium for the convenience of one model and the quality of multimodal input/output. Will have to test and see if it's worth it.
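
For anyone who wants to rerun the numbers, here's a rough sketch of that arithmetic in Python. The prices are just the figures quoted above (not checked against a price list), the 4 characters per token is a guess, and the variable and function names are made up for illustration.

```python
# Back-of-the-envelope comparison using the figures quoted in the comment.
# None of these prices are verified; CHARS_PER_TOKEN is an assumption.
TTS_USD_PER_M_CHARS = 15.0           # quoted TTS price, per million characters
CHARS_PER_TOKEN = 4                  # assumed average characters per token
LLM_USD_PER_M_TOKENS = 15.0          # quoted 4o price, per million tokens
AUDIO_OUT_USD_PER_M_TOKENS = 200.0   # quoted audio output price, per million tokens


def tts_usd_per_m_tokens(usd_per_m_chars: float, chars_per_token: float) -> float:
    """Convert a per-million-characters price to a per-million-tokens price."""
    return usd_per_m_chars * chars_per_token


tts_cost = tts_usd_per_m_tokens(TTS_USD_PER_M_CHARS, CHARS_PER_TOKEN)
pipeline_cost = tts_cost + LLM_USD_PER_M_TOKENS
premium = AUDIO_OUT_USD_PER_M_TOKENS / pipeline_cost

print(f"TTS, per M tokens:       ${tts_cost:.0f}")
print(f"TTS + LLM, per M tokens: ${pipeline_cost:.0f}")
print(f"Audio output premium:    {premium:.2f}x")
```

If the real characters-per-token ratio is lower than 4, the TTS side gets cheaper per token and the premium gets bigger, so that one assumption matters a lot.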

Also, is there still no way to connect users directly to OpenAI? Like directly from a user's browser to OpenAI's servers, without the user having to supply their own API key? How does this work with the realtime API, which needs WebSockets? Do I need an intermediate proxy server for all my users' conversations? Seems like a waste of bandwidth, an unnecessary failure point, and a privacy problem. I hope I am wrong.

babyshake · 2024-10-01T17:46:34 1727804794

Is this currently only in the playground for DevDay attendees? Not seeing it on my end.