While programming and writing "Automating Boggle", I remember being surprised that popular AI models like GPT-4, capable of so many remarkable tasks, couldn't do something as "trivial" as reading rotated dice. I still regularly retry the OpenAI API on the task of recognising the labels on these dice, and the models always fail, often spectacularly. The prompt I use is the following:
"Given an image of a 4x4 grid of Boggle dice, extract and return the 16 letters from the grid, in a continuous string from top left to bottom right, left-to-right across each row. Account for any rotations of the letters. The output should be the correct sequence of letters as a continuous string, without any additional text, newlines, and formatting."
It often happens that I get more than 16 letters back, most of them not even present on the board. The OpenAI API documentation, under its Limitations section, states:
Rotation: The model may misinterpret rotated / upside-down text or images.
Instead of manually calling the API every once in a while to see whether an updated vision model can complete the task, I plan to write a GitHub Actions workflow that regularly calls the API, compares the output to the expected solution, and publishes the results on this website. I guess this will be my AI model benchmark: how long until these models can read rotated dice from an image, following the instructions in the prompt above?
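The check itself can stay tiny. Here is a sketch of the script such a workflow could run on a `schedule:` cron trigger; the board image, the expected string, and the model name are all placeholders, and it assumes the `read_board` helper sketched above is defined in the same module:

```python
import sys

# Hypothetical fixtures: a fixed photo of the board and its known solution.
BOARD_IMAGE = "board.jpg"
EXPECTED = "ABCDEFGHIJKLMNOP"  # placeholder for the real 16-letter answer

def check(model: str) -> bool:
    """Run one model against the board and report pass/fail."""
    answer = read_board(BOARD_IMAGE, model=model)
    ok = answer == EXPECTED
    print(f"{model}: got {answer!r} ({'PASS' if ok else 'FAIL'})")
    return ok

if __name__ == "__main__":
    # Exit non-zero on failure so the scheduled Actions run surfaces the
    # result, which can then be rendered on the website.
    sys.exit(0 if check("gpt-4o") else 1)
```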