MacDirectory magazine is the premiere creative lifestyle magazine for Apple enthusiasts featuring interviews, in-depth tech reviews, Apple news, insights, latest Apple patents, apps, market analysis, entertainment and more.
Issue link: https://digital.macdirectory.com/i/1505412
But when it comes to text and quantities, the associations must be incredibly accurate, since even minor imperfections are noticeable. Our brains can overlook slight deviations in a pencil’s tip, or a roof – but not as much when it comes to how a word is written, or the number of fingers on a hand.

As far as text-to-image models are concerned, text symbols are just combinations of lines and shapes. Since text comes in so many different styles – and since letters and numbers are used in seemingly endless arrangements – the model often won’t learn how to effectively reproduce text.

The main reason for this is insufficient training data. AI image generators require much more training data to accurately represent text and quantities than they do for other tasks.

The tragedy of AI hands

Issues also arise when dealing with smaller objects that require intricate details, such as hands. In training images, hands are often small, holding objects, or partially obscured by other elements. It becomes challenging for AI to associate the term “hand” with the exact representation of a human hand with five fingers.

Consequently, AI-generated hands often look misshapen, have too many or too few fingers, or are partially covered by objects such as sleeves or purses.

We see a similar issue when it comes to quantities. AI models lack a clear understanding of quantities, such as the abstract concept of “four”. As such, an image generator may respond to a prompt for “four apples” by drawing on what it has learned from myriad images featuring many different quantities of apples – and return an output with the incorrect amount. In other words, the huge diversity of associations within the training data impacts the accuracy of quantities in outputs.

Will AI ever be able to write and count?

It’s important to remember that text-to-image and text-to-video conversion is a relatively new concept in AI. Current generative platforms are “low-resolution” versions of what we can expect in the future.
With advancements being made in training processes and AI technology, future AI image generators will likely be much more capable of producing accurate visualisations.

It’s also worth noting most publicly accessible AI platforms don’t offer the highest level of capability. Generating accurate text and quantities demands highly optimised and tailored networks, so paid subscriptions to more advanced platforms will likely deliver better results.

[Image caption: AI-generated image produced in response to the prompt ‘KFC logo’. (Imagine AI)]