A year ago, before GPT-4 had even been released and while people were fretting about ChatGPT's hallucinations, no one would have thought something like this was possible. Yet it is already here: Sora, an AI model that creates realistic videos up to 60 seconds long, is currently astonishing and unsettling the entire industry. The text-to-video model outperforms everything published so far, clearly leaving AI video startup Runway and other competitors behind.
For now, Sora clips can only be viewed on OpenAI's social media channels and website. The model (still) has so many bugs and uncertainties that the world's most prominent AI startup is initially making it available only to security testers (the “red team”) and to a selected group of visual artists, designers and filmmakers. It is not yet clear what harm could result from careless use, especially in the 2024 super-election year.
What is Sora, how does it work, and what can and can't it do? A team of researchers from Lehigh University in Pennsylvania and Microsoft Research has analyzed Sora in detail in a new paper. Here are the answers to the most important questions:
What does Sora mean?
Sora means “sky” in Japanese. The developers chose the name because it “evokes the idea of limitless creative potential,” according to OpenAI.
How are the videos created?
Sora combines the transformer architecture used in GPT models with a diffusion model; the result is known as a diffusion transformer. “The model starts with an image filled with visual noise, iteratively denoises the image, and adds specific details according to a given text prompt. The generated video is created through a multi-stage refinement process, in which the video is brought closer to the desired content and quality at each step,” the researchers explain in the paper mentioned above.
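To make the quoted process concrete, here is a minimal Python sketch of DDPM-style iterative denoising. It is an illustration of the general diffusion technique, not Sora's implementation, which OpenAI has not published. Assumptions: a short list of numbers stands in for a video frame, the linear noise schedule values are illustrative, and the hypothetical `predict_noise` function is an oracle standing in for the trained, text-conditioned diffusion transformer (it already "knows" the target the prompt describes).

```python
import math
import random

def linear_schedule(steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM-style noise schedule (values are illustrative assumptions)."""
    betas = [beta_start + (beta_end - beta_start) * t / (steps - 1) for t in range(steps)]
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b  # cumulative product of (1 - beta)
        alpha_bars.append(prod)
    return betas, alpha_bars

def predict_noise(x_t, t, alpha_bars, target):
    """HYPOTHETICAL oracle standing in for the trained, text-conditioned model:
    returns the exact noise separating x_t from the clean target at step t."""
    ab = alpha_bars[t]
    return [(x - math.sqrt(ab) * x0) / math.sqrt(1.0 - ab) for x, x0 in zip(x_t, target)]

def distance(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def sample(target, steps=1000, seed=0):
    """Start from pure noise and denoise step by step toward `target`."""
    rng = random.Random(seed)
    betas, alpha_bars = linear_schedule(steps)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # "an image filled with visual noise"
    history = [distance(x, target)]
    for t in reversed(range(steps)):
        eps = predict_noise(x, t, alpha_bars, target)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # DDPM mean: subtract the predicted noise, then rescale
        mean = [(xi - coef * e) / math.sqrt(1.0 - betas[t]) for xi, e in zip(x, eps)]
        if t > 0:
            # add a little fresh noise on every step except the last
            x = [m + math.sqrt(betas[t]) * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
        history.append(distance(x, target))
    return x, history
```

Because the oracle predicts the noise exactly, the last step lands on the target; a real model only approximates the noise, so its output is a plausible new sample rather than a copy of anything. The `history` list records how far each intermediate result is from the target, mirroring the "multi-stage refinement" the researchers describe.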
@openai 👀 Prompt: A close-up of a woman's eye, with her iris appearing as the Earth. This video was created by our text-to-video model, Sora, without any editing. What would you like us to make next with Sora? *Sora is not yet available to the public. We are proactively sharing our research progress to learn from feedback and give the public a sense of what AI capabilities are on the horizon.
What kind of videos can Sora create?
Sora generates videos up to one minute long at resolutions of up to 1920x1080 (landscape) or 1080x1920 (portrait), and any aspect ratio in between; 1920x1080 is commonly known as Full HD. In terms of content, the following already works:
- Multiple characters
- Specific types of motion
- Detailed subjects and backgrounds
- An accurate representation of objects in the physical world
- Extending existing videos
- Simulated game environments
@openai Replying to @Movie Clip. Prompt: An alien blending in naturally with New York City, paranoia thriller style, 35mm film. This video was created by our text-to-video model, Sora, without any editing.
What is Sora based on?
Sora builds on Dall-E 3, OpenAI's text-to-image model, which also relies on a diffusion model trained to create an image from a jumble of random pixels. Sora takes this concept from still images to video. Whereas GPT-4 is a large language model (LLM), Sora is described as a large vision model (LVM). So it is not GPT-5 but an entirely separate AI model. However, GPT-4 is used to interpret user prompts.
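"Trained to create an image from a jumble of random pixels" refers to the standard diffusion training recipe: take a clean training image, mix in random noise up to a chosen level, and train the network to predict that noise. The Python sketch below shows only the data side of this recipe in DDPM style, under assumed schedule values; the neural network is left out, and OpenAI's exact training setup for Dall-E 3 or Sora is not described in the article.

```python
import math
import random

def cumulative_alphas(steps=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_bar[t] = product of (1 - beta) up to step t (linear beta schedule, assumed values)."""
    alpha_bars, prod = [], 1.0
    for t in range(steps):
        beta = beta_start + (beta_end - beta_start) * t / (steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def make_training_example(x0, t, alpha_bars, rng):
    """Turn a clean image x0 into its noisy version at step t.
    A model would then be trained to predict `noise` from (x_t, t, caption)."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    ab = alpha_bars[t]
    x_t = [math.sqrt(ab) * a + math.sqrt(1.0 - ab) * n for a, n in zip(x0, noise)]
    return x_t, noise

def mse(pred, target):
    """Training loss: mean squared error between predicted and true noise."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)
```

At the last step (`t = 999`) almost nothing of the original image is left, so a model that learns to undo every step can start from pure noise and still arrive at an image.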
Which industries could be disrupted?
AI video could disrupt anything related to video and film. Especially for short advertising clips, companies could in future use Sora and similar tools instead of shooting footage on location with cameras at great expense. In addition, there are several less obvious application areas:
- Education: Teachers could convert text lesson plans into videos to recapture students' attention and present complex material more easily.
- Gaming: Sora might be used to design 3D worlds for computer games in the future.
- Healthcare: Beyond generating videos, diffusion models such as Sora could help detect dynamic abnormalities in the body within complex video sequences, such as early cell apoptosis, the progression of skin lesions, or irregular human movements, which matter for early diagnosis and intervention strategies. At least, that is what the researchers believe.
- Robotics: Future robots could interpret their complex surroundings better with the help of Sora and similar models. It is no surprise, then, that OpenAI has invested in robotics startup Figure AI and agreed to collaborate with it.
@openai IYKYK Prompt: “An old man with gray hair and glasses is devouring a delicious cheeseburger. The bun is topped with sesame seeds, fresh spinach, a slice of cheese, and a golden brown beef patty. His eyes are closed in delight as he takes a bite. He wears a red jacket and appears to be sitting inside a fast food restaurant.” This video was created by our text-to-video model, Sora, without any editing.
What dangers does Sora pose?
It is no surprise that Sora has not yet been released to the public. Given how quickly content spreads on social media channels (TikTok, YouTube, etc.), the potential misuse of such tools (e.g., for fake videos of politicians) and AI videos so realistic and flawless that they cannot be exposed as fakes (not even with technical means) pose a very big risk. Hallucinations and biases that distort the generated videos cannot be ruled out either.
AI models can be blocked from certain uses; Dall-E 3, for example, cannot create images of real people. However, the researchers point out that “jailbreak” attacks, which exploit vulnerabilities to generate banned or harmful content, can never be ruled out. That's why OpenAI is currently working with red teamers who are actively testing Sora for exploitable vulnerabilities.
According to OpenAI, once Sora is available, it will not create content containing extreme violence, sexual content, hateful imagery, celebrity likenesses, or other people's intellectual property. OpenAI also plans to embed C2PA metadata so that Sora videos can be automatically identified as AI-generated. But anyone who simply films the video off a screen can easily get around this.
@openai Sora Sunday ❤️ Prompt: A tour of a museum with many paintings, sculptures and beautiful works of art in all forms. This video was created by our text-to-video model, Sora, without any editing.