ElevenLabs is launching its own speech-to-text model

ElevenLabs, an AI startup that just raised a $180 million mega funding round, has been known primarily for its audio generation prowess. The company has now taken its first step toward a standalone speech-to-text product with a model called Scribe.

The startup, valued at $3.3 billion, has helped many other organizations provide text-to-speech services through its vast library. However, the company is now looking to move into speech recognition and compete with the likes of Gladia, Speechmatics, AssemblyAI, Deepgram, and OpenAI’s Whisper models.

ElevenLabs’ Scribe model supports more than 99 languages at launch. The company places more than 25 languages in the model’s excellent accuracy category, where the word error rate is below 5%. This list includes English (97% claimed accuracy), French, German, Hindi, Indonesian, Japanese, Kannada, Malayalam, Polish, Portuguese, Spanish, and Vietnamese. Other languages fall into the high (5% to 10% word error rate), good (10% to 20% word error rate), and moderate (25% to 50%) accuracy categories.
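For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The sketch below is a minimal illustration of that standard definition, not ElevenLabs' evaluation code. The function names are hypothetical, the text normalization (lowercasing and whitespace splitting) is a simplifying assumption, and the handling of tier boundaries is a guess, since the article only gives the ranges.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    # Assumption: lowercase and split on whitespace; real benchmarks
    # usually apply more careful text normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def accuracy_tier(wer: float) -> str:
    """Map a WER fraction to the tiers quoted in the article.

    Boundary handling is an assumption. The quoted ranges leave
    20% to 25% and anything above 50% uncovered, so those values
    are reported as "unclassified".
    """
    if wer < 0.05:
        return "excellent"
    if wer <= 0.10:
        return "high"
    if wer <= 0.20:
        return "good"
    if 0.25 <= wer <= 0.50:
        return "moderate"
    return "unclassified"


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"WER: {wer:.1%}, tier: {accuracy_tier(wer)}")
```

In this toy example, one wrong word out of six gives a word error rate of about 16.7%, which would land in the "good" band. Published figures are averaged over large test sets, so a single sentence says little about a model.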

The company says the model outperformed Google Gemini 2.0 Flash and Whisper Large V3 across multiple languages in the FLEURS and Common Voice benchmark tests.

ElevenLabs built speech-to-text components for its AI conversational agent platform, released last year. However, this is the first time the company is releasing a standalone speech recognition model. In a conversation with TechCrunch last month, CEO Mati Staniszewski spoke about pursuing speech recognition models.

“We want to understand what is being said in your conversation better. We’re working on ways of not just creating content but also transcribing and understanding speech,” Staniszewski said. “Many people say that speech-to-text is a solved problem. But it is pretty bad for many languages. We think we can build better speech recognition models because we have in-house teams to annotate the data and respond quickly.”

The model also offers smart speaker diarization to tell you who is talking, word-level timestamps for accurate subtitles, and auto-tagging of sound events such as audience laughter. The startup is also offering customers a way to transcribe video content in its studio to add subtitles or captions.

Scribe currently works with prerecorded audio formats. The company says it will soon release a low-latency, real-time version of the model. This means it is not yet effective for taking live transcriptions or voice notes.

ElevenLabs is pricing Scribe at $0.40 per hour of audio. Although the rate is competitive, some rivals currently offer lower prices for audio transcription, with differences in some features.
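To put the quoted rate in perspective, the sketch below estimates the cost of transcribing a recording of a given length. It assumes simple pro-rata billing by the second, which the article does not confirm. The function and constant names are hypothetical, and actual invoices may use different rounding or minimum charges.

```python
from decimal import Decimal

# Rate quoted in the article: $0.40 per hour of audio.
PRICE_PER_HOUR_USD = Decimal("0.40")


def transcription_cost_usd(audio_seconds: int) -> Decimal:
    """Estimate the cost in USD, rounded to the nearest cent.

    Assumption: billing is pro rata by the second with no minimum charge.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    hours = Decimal(audio_seconds) / Decimal(3600)
    return (hours * PRICE_PER_HOUR_USD).quantize(Decimal("0.01"))


if __name__ == "__main__":
    # A 2.5-hour recording is 9,000 seconds.
    print(transcription_cost_usd(9000))
```

Under these assumptions, a 2.5-hour recording would cost about $1.00.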
