
OpenAI is bringing new transcription and voice-generating AI models to its API that the company claims improve on its previous releases.
For OpenAI, the models fit into its broader “agentic” vision: building automated systems that can independently accomplish tasks on behalf of users. The definition of “agent” may be in dispute, but OpenAI Head of Product Olivier Godement described one interpretation as a chatbot that can speak with a business’s customers.
“We’re going to see more and more agents pop up in the coming months,” Godement told TechCrunch during a briefing. “And so the general theme is helping customers and developers leverage agents that are useful, available, and accurate.”
OpenAI claims that its new text-to-speech model, “gpt-4o-mini-tts,” not only delivers more nuanced and realistic-sounding speech but is also more “steerable” than its previous-generation speech-synthesizing models. Developers can instruct gpt-4o-mini-tts on how to say things in natural language, for example, “speak like a mad scientist” or “use a serene voice, like a mindfulness teacher.”
Here is a sample of a “true crime”-style voice:
And here is a sample of a female “professional” voice:
Jeff Harris, a member of OpenAI’s product staff, told TechCrunch that the goal is to let developers tailor both the voice “experience” and “context.”
“In different contexts, you don’t just want a flat, monotonous voice,” Harris said. “If you’re in a customer support experience and you want the voice to be apologetic because it’s made a mistake, you can actually have the voice carry that emotion … Our big belief here is that developers and users want to control not just what is spoken, but how it’s spoken.”
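In practice, that steering is exposed through OpenAI’s Audio API. Below is a minimal sketch using the OpenAI Python SDK, assuming the existing speech endpoint accepts the new model name and a natural-language instructions parameter for steering; the voice name, input text, and output file are illustrative:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Steer delivery with a natural-language instruction (illustrative values).
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",
    input="I'm so sorry about the mix-up with your order. Let's get it fixed.",
    instructions="Speak in a warm, apologetic customer-support tone.",
) as response:
    response.stream_to_file("support_reply.mp3")
```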
As for OpenAI’s new speech-to-text models, “gpt-4o-transcribe” and “gpt-4o-mini-transcribe,” they effectively replace the company’s long-in-the-tooth Whisper transcription model. Trained on “diverse, high-quality audio datasets,” the new models can better capture accented and varied speech, OpenAI claims, even in chaotic environments.
They’re also less likely to hallucinate, Harris added. Whisper notoriously tended to fabricate words, and even whole passages, in conversations, introducing everything from racial commentary to imagined medical treatments into transcripts.
“[These] models are much improved versus Whisper on that front,” Harris said. “Making sure the models are accurate is completely essential to getting a reliable voice experience, and accurate [in this context] means that the models are hearing the words precisely [and] aren’t filling in details that they didn’t hear.”
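On the developer side, the new models slot into the same transcription endpoint that served Whisper. A minimal sketch with the OpenAI Python SDK, assuming the endpoint accepts the new model names the same way it accepts “whisper-1” (the file name is a placeholder):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Transcribe a local audio file; "interview.mp3" is a placeholder.
with open("interview.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",  # or "gpt-4o-mini-transcribe" for the smaller model
        file=audio_file,
    )

print(transcript.text)
```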
Your mileage may vary depending on the language being transcribed, however.
According to OpenAI’s internal benchmarks, gpt-4o-transcribe, the more accurate of the two transcription models, has a word error rate approaching 30% for Dravidian languages like Tamil, Telugu, Malayalam, and Kannada. That means three out of every 10 words from the model will differ from a human transcription in those languages.
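For context, word error rate is conventionally computed as the word-level edit distance (substitutions, insertions, and deletions) between the model’s output and a human reference transcript, divided by the number of reference words. A self-contained sketch of that calculation:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming (Levenshtein) edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

# Two substitutions across nine reference words: WER of about 0.22.
ref = "the quick brown fox jumps over the lazy dog"
hyp = "the quick brown fox jumped over a lazy dog"
print(word_error_rate(ref, hyp))
```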

In a break from tradition, OpenAI doesn’t plan to make its new transcription models openly available. The company has historically released new versions of Whisper for commercial use under an MIT license.
Harris said that gpt-4o-transcribe and gpt-4o-mini-transcribe are “much bigger than Whisper” and thus not good candidates for an open release.
“[They’re] not the kind of models that you can just run locally on your laptop, like Whisper,” he continued. “[W]e want to make sure that if we’re releasing things in open source, we’re doing it thoughtfully, and we have a model that’s really honed for that specific need. And we think that end-user devices are one of the most interesting cases for open-source models.”
Updated March 20, 2025, 11:54 a.m. PT to clarify the language around the benchmark results and update the charts with more recent word error rate figures.