Real-time full duplex like OpenAI's GPT-4o is pretty expensive. Cascaded approaches (usually adding about 800ms-1 second of delay) are slower and lower quality, but very, very cheap. When I built this a year ago, I estimated the LLM + TTS + other serving costs to be less than the Twilio costs.
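For context, "cascaded" means chaining speech-to-text, an LLM, and text-to-speech per turn instead of one speech-native model. A rough, runnable sketch of where that 800ms-1s goes (the functions and stage timings below are made-up stand-ins, not real APIs):

```python
# Minimal sketch of one "turn" in a cascaded pipeline (STT -> LLM -> TTS).
# The stage functions and their latencies are illustrative stand-ins;
# the point is that the delays add up to the end-of-turn lag.
import time

def transcribe(audio: bytes) -> str:
    time.sleep(0.25)                 # stand-in for an STT call
    return "hello, is this a good time to talk?"

def complete(text: str) -> str:
    time.sleep(0.40)                 # stand-in for an LLM call
    return "Sure, go ahead."

def synthesize(text: str) -> bytes:
    time.sleep(0.25)                 # stand-in for a TTS call
    return b"\x00" * 16000           # fake PCM audio

def handle_turn(caller_audio: bytes) -> bytes:
    start = time.monotonic()
    reply_audio = synthesize(complete(transcribe(caller_audio)))
    print(f"end-of-turn delay: {time.monotonic() - start:.2f}s")  # ~0.9s
    return reply_audio

if __name__ == "__main__":
    handle_turn(b"")
```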
Which is why we need to adopt nuclear power, so we can run thousands of these and make it overwhelmingly likely that they pick up a bot instead of a person.