Unlike the NFT craze of 2020, a fire that burned out quickly, the fear surrounding AI's rise seems, at least for now, to have given way to acceptance. OpenAI, one of the largest companies behind text-to-image AI, is now preparing to launch Sora, a program that generates video from text input. With its potential for hyper-realism and its continuous creativity, Sora will challenge the very legitimacy of video.
With AI constantly innovating and updating, the results we get to see are at first glance outstanding. But as the first iterations of DALL•E showed, text-to-image results can be nothing short of laughable due to their low quality, and people have yet to fully appreciate the role time plays in producing high-quality renderings. “We take inspiration from large language models which acquire generalist capabilities by training on internet-scale data,” OpenAI said of Sora.
Much like a text-to-image program, Sora is directed by thousands of data points and code fed in as input. From that input, the system runs the data through a visual encoder to produce the finished product. OpenAI has demonstrated Sora's potential on both its Instagram and its website. Text-to-video, much like text-to-image, can be simple, realistic, funny, or all three.
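The flow described above, a text prompt conditioning a system that decodes frames, can be sketched in miniature. To be clear, this is not OpenAI's code or API; it is a toy stand-in in which a hypothetical `encode_prompt` and `generate_video` mimic only the shape of the pipeline, with pseudo-random pixels in place of a real neural decoder.

```python
# Conceptual sketch only -- NOT Sora's implementation. A real system would
# produce a learned embedding and denoise latent spacetime patches; here we
# just derive a deterministic seed from the prompt and sample toy frames.
import hashlib
import random

def encode_prompt(prompt: str) -> int:
    """Toy 'text encoder': map the prompt to a deterministic seed."""
    return int(hashlib.sha256(prompt.encode()).hexdigest(), 16) % (2**32)

def generate_video(prompt: str, num_frames: int = 4, width: int = 2, height: int = 2):
    """Toy 'visual decoder': turn the encoded prompt into grayscale frames."""
    rng = random.Random(encode_prompt(prompt))
    return [
        [[rng.randint(0, 255) for _ in range(width)] for _ in range(height)]
        for _ in range(num_frames)
    ]

video = generate_video("a corgi surfing at sunset")
print(len(video))  # 4 frames
```

The one property worth noting even in this toy version: the same prompt always yields the same output, because everything downstream is conditioned on the encoded text.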
Text-to-image has been a creative aid for designers and artists, who have used it as a foundation for their work. AI allows an image and its essence to be captured, then adjusted toward realism. Video is the next step up, allowing for 3D renderings and further advancing design creation. Sora will even let users input already-generated images, such as ones from DALL•E, and create GIFs or videos from them.
Along with image-to-video generated content, Sora will also be able to merge two videos into a single flick that abides by the structure of the originals. And for the adventurous, a video can be input alongside a text prompt to generate and add new features, such as a background change or a different art style.
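The merging idea above can be illustrated with the simplest possible blend. This is not how Sora merges videos (OpenAI describes interpolating between clips with the model itself); the hypothetical `merge_videos` below is just a plain cross-fade over toy frames, to show what "one clip that draws on both originals" means at the pixel level.

```python
# Illustrative sketch, not Sora's method: cross-fade two equal-length clips.
# Each "video" is a list of frames; each frame is a flat list of grayscale pixels.

def merge_videos(a, b):
    """Blend video `a` into video `b`; frame i is a weighted average whose
    weight slides from all-a at the start to all-b at the end."""
    assert len(a) == len(b), "toy sketch assumes equal-length clips"
    n = len(a)
    merged = []
    for i in range(n):
        t = i / (n - 1) if n > 1 else 0.0  # blend weight: 0 -> a, 1 -> b
        frame = [round((1 - t) * pa + t * pb) for pa, pb in zip(a[i], b[i])]
        merged.append(frame)
    return merged

clip_a = [[0, 0], [0, 0], [0, 0]]              # three dark frames
clip_b = [[200, 200], [200, 200], [200, 200]]  # three bright frames
print(merge_videos(clip_a, clip_b))
# -> [[0, 0], [100, 100], [200, 200]]
```

A learned model does this in a far richer way, blending motion and content rather than raw pixels, but the sliding-weight structure is the intuition.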
“We believe the capabilities Sora has today demonstrate that continued scaling of video models is a promising path towards the development of capable simulators of the physical and digital world, and the objects, animals and people that live within them,” the company said.
Other AI elements, such as audio, have yet to be announced by OpenAI but should be expected, given the mass appeal AI music holds among social media users. Though text-to-image software like Midjourney has been extremely popular, the 3D element offered by Sora will allow every corner of a person, place, or thing to be vividly captured, a tool architects and designers may look into harnessing.