AI Sucks at Reading Clocks


Nowadays, artificial intelligence can create photorealistic images, write novels, do your homework and even predict protein structures. New research, however, reveals that it often fails at a very basic task: telling time.

Researchers at the University of Edinburgh tested the ability of seven well-known multimodal large language models (MLLMs), AI that can interpret and produce different types of media, to answer time-related questions based on images of clocks and calendars. Their study, released in April and currently hosted on the preprint server arXiv, shows that MLLMs struggle with these basic tasks.

"The ability to interpret and reason about time from visual inputs is critical for many real-world applications, ranging from event scheduling to autonomous systems," the researchers wrote in the study.

The team tested OpenAI's GPT-4o and GPT-o1; Google DeepMind's Gemini 2.0; Anthropic's Claude 3.5 Sonnet; Meta's Llama 3.2-11B-Vision-Instruct; Alibaba's Qwen2-VL-7B-Instruct; and ModelBest's MiniCPM-V-2.6. They fed the models a variety of analog clock images, including ones with Roman numerals, different dial colors and even a missing second hand, as well as 10 years' worth of calendar images.

For the clock images, the researchers asked the LLMs: What time is shown on the clock in the given image? For the calendar images, they asked simple questions such as What day of the week is New Year's Day? as well as harder ones such as What is the 153rd day of the year?
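For reference, the day-offset arithmetic behind the harder calendar question is trivial for conventional software. This minimal sketch (an illustration only, not part of the study) answers it with Python's standard `datetime` module:

```python
from datetime import date, timedelta

def nth_day_of_year(year: int, n: int) -> date:
    """Return the calendar date of the n-th day of the given year."""
    # Day 1 is January 1, so offset by n - 1 days from that anchor.
    return date(year, 1, 1) + timedelta(days=n - 1)

d = nth_day_of_year(2025, 153)
print(d, d.strftime("%A"))  # 2025-06-02 Monday
```

A date library resolves the question exactly; the models in the study instead had to recover the answer from pixels and then reason about offsets, which is where the errors crept in.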

"Analog clock reading and calendar understanding involve intricate cognitive steps: they demand fine-grained visual recognition (e.g., clock-hand position and day-cell layout) and non-trivial numerical reasoning (e.g., calculating day offsets)," the researchers explained.

Overall, the AI systems did not perform well. They read the time on analog clocks correctly less than 25% of the time. They struggled as much with clocks bearing Roman numerals or stylized hands as with clocks lacking a second hand, suggesting the problem stems from detecting the hands and interpreting the angles on the clock face, according to the researchers.
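The angle interpretation the models stumbled on is, again, simple geometry once the hand positions are known. A minimal sketch (my illustration of the underlying math, not the models' procedure): the minute hand sweeps 6 degrees per minute, while the hour hand moves 30 degrees per hour plus 0.5 degrees per minute of drift.

```python
def time_from_angles(hour_angle: float, minute_angle: float) -> tuple[int, int]:
    """Convert clock-hand angles (degrees clockwise from 12) to (hour, minute)."""
    # Minute hand: 360 degrees / 60 minutes = 6 degrees per minute.
    minute = round(minute_angle / 6) % 60
    # Hour hand: subtract the drift caused by elapsed minutes (0.5 deg/min),
    # then each whole hour accounts for 30 degrees.
    hour = round((hour_angle - minute * 0.5) / 30) % 12
    return hour, minute

print(time_from_angles(105.0, 180.0))  # hands at 3:30 -> (3, 30)
```

The hard part for the MLLMs, per the study, was not this arithmetic but reliably extracting the hand angles from the image in the first place.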

Google DeepMind's Gemini 2.0 scored highest on the clock task, while GPT-o1 was accurate on the calendar task about 80% of the time, a far better result than its competitors. Even the most successful MLLM, however, still got the calendar task wrong roughly 20% of the time.

"Most people can tell time and use calendars from an early age. Our findings highlight a significant gap in the ability of AI to carry out what are quite basic skills for people," study co-author Rohit Saxena, a PhD student at the University of Edinburgh's School of Informatics, says in a university statement. "If AI systems are to be successfully integrated into time-sensitive, real-world applications such as scheduling, automation and assistive technologies, these shortcomings must be addressed."

So while AI may be able to do your homework, don't rely on it to stick to any timetable.
