MacDirectory magazine is the premiere creative lifestyle magazine for Apple enthusiasts featuring interviews, in-depth tech reviews, Apple news, insights, latest Apple patents, apps, market analysis, entertainment and more.
Issue link: https://digital.macdirectory.com/i/1488864
Text-to-image

Just as people were starting to grapple with the consequences of GAN-generated deepfakes – including videos that show someone doing or saying something they didn’t – a new player emerged on the scene: text-to-image deepfakes. In this latest incarnation, a model is trained on a massive set of images, each captioned with a short text description. The model progressively corrupts each image until only visual noise remains, and then trains a neural network to reverse this corruption. By repeating this process hundreds of millions of times, the model learns to convert pure noise into a coherent image from any caption.

While GANs can only create images of a general category, text-to-image synthesis engines are more powerful. They can create nearly any image, including scenes of people and objects in specific and complex interactions – for instance, “The president of the United States burning classified documents while sitting around a bonfire on the beach during sunset.”

OpenAI’s text-to-image generator, DALL-E, took the internet by storm when it was unveiled on Jan. 5, 2021. A beta version of the tool was made available to 1 million users on July 20, 2022. Users around the world have found seemingly endless ways to prompt DALL-E, yielding delightful, bizarre and fantastical imagery. A wide range of people, however, from computer scientists to legal scholars and regulators, have pondered the potential misuses of the technology. Deepfakes have already been used to create nonconsensual pornography, commit small- and large-scale fraud, and fuel disinformation campaigns. These even more powerful image generators could add jet fuel to those misuses.

Three image generators, three different approaches

Aware of the potential abuses, Google declined to release its text-to-image technology.
OpenAI took a more open, yet still cautious, approach when it initially released its technology to only a few thousand users (myself included). The company also placed guardrails on allowable text prompts, including no nudity, hate, violence or identifiable persons. Over time, OpenAI has expanded access, lowered some guardrails and added more features, including the ability to semantically modify and edit real photographs.
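The "corrupt, then learn to reverse" training process described earlier can be made concrete with a toy sketch. The snippet below (a simplified illustration, not DALL-E's actual code) implements only the forward half of a diffusion model: it blends an image with Gaussian noise according to a noise schedule, so that early steps are mostly signal and the final step is almost pure noise. The schedule values and array sizes are illustrative assumptions; a real system would then train a neural network to undo each step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed noise schedule: per-step noise amounts (betas) grow linearly
# over T corruption steps; alpha_bars tracks how much original signal
# survives after t steps.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def corrupt(x0, t):
    """Forward diffusion step: mix the image x0 with Gaussian noise.

    At t = 0 the result is nearly the intact image; at t = T - 1 it is
    almost pure noise. Returns the noised image and the noise used,
    which is what the denoising network learns to predict.
    """
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return x_t, eps

# A toy "image": an 8x8 array of pixel values stands in for a photo.
x0 = rng.standard_normal((8, 8))

x_early, _ = corrupt(x0, 10)     # mostly the original image
x_late, _ = corrupt(x0, T - 1)   # visually indistinguishable from noise
```

Training would repeat this over a huge captioned image set, asking the network to recover `eps` from `x_t` and the caption; generation then runs the learned reversal from pure noise.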