Will OpenAI’s Sora Really Impact Video Games?

Posted on Feb 26, 2024 by Ezequiel Bruni

If you ask me, not any time soon. Allow me to elaborate.

Just recently, OpenAI announced a new AI model called Sora that’s designed to let users generate videos from text prompts, much like the other text-to-video models we’ve seen so far. The difference? Sora is surprisingly good at it, with results that look awfully close to realistic, even if there are still some telltale signs of AI intervention.

One of the big claims made by the creators themselves is this: they say that Sora will soon be able to generate entire video games on the fly.

Naturally, the discourse about this has been… mixed. Some herald Sora as the end of traditional game dev. Others see it as a threat. Still others say it’ll never work the way AI proponents are hoping, and that’s the camp I’m in.

How do I know? Well, I have a little knowledge about how game dev works thanks to friendships with multiple game devs. And thanks to OpenAI’s own post on the subject, and a wonderful write-up by Mike Young over on Medium called How Sora (actually) works, I also know what Sora can currently do.

So, allow me to explain my position.

First: What Makes Sora Different?

The short answer is that Sora has a better understanding of how to make moving images look more real than other models. Human and animal characters retain a sense of motion and life even when “idle.” Feathers flutter in the breeze. Objects can pass in front of each other without the world devolving into a blurry mess.

That last one is honestly the big selling point here. Sora has a better grasp on the concept of 3D space and object permanence than its competitors by a country mile. It can make an artist paint on a canvas, and remember where the paint is supposed to go. A person can pass in front of a sign, and Sora remembers what that sign is supposed to say.

It’s this spatial awareness that allows Sora to generate surprisingly convincing videos of fictional video games. These come with a detailed world, a working HUD, and simulated interaction between the player and the environment. This, in turn, has led to the claims that Sora will be able to create games.

How Does Sora Work?

To answer that, let’s start with how it doesn’t work. A lot of people have theorized that Sora actually runs in a traditional 3D graphics engine like Unreal, and generates footage with traditional 3D models. This is NOT what happens. Others have assumed it’s just an advanced LLM, like ChatGPT. It’s not that either.

See, LLMs work with what are called “text tokens,” small chunks of text that the model reads and generates one after another. Sora uses a visual equivalent, designed specifically for images and video, called “patches.”

Patches are created by taking millions of existing videos, compressing them, and cutting them into little chunks that span both space (a small area of the frame) and time (a few frames in a row). Said videos are paired with very detailed and descriptive captions to help Sora understand what it’s looking at. By training on those patches, Sora learns how things look and move, and it uses that knowledge to build new images and videos based on the text prompts provided.
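To make the idea of a patch a little more concrete, here’s a toy Python sketch that cuts a tiny “video” into spacetime patches, each covering a small area of the frame across a few consecutive frames. It’s an illustration under heavy assumptions, not OpenAI’s code: the video is just nested lists of numbers, the function name `to_spacetime_patches` is made up, and the compression step Sora reportedly performs first is skipped entirely.

```python
# Toy illustration of cutting a video into "spacetime patches".
# ASSUMPTION: the video is a list of frames, each frame a 2D grid (list of rows)
# of pixel values. Sora reportedly patches a compressed version of the video,
# not raw pixels; this sketch skips that step.

from typing import List

Frame = List[List[int]]
Patch = List[Frame]  # a small block: t frames, each p rows x p columns


def to_spacetime_patches(video: List[Frame], t: int, p: int) -> List[Patch]:
    """Split a video into non-overlapping blocks of t frames x p rows x p columns.

    Assumes the frame count is divisible by t and the height/width by p.
    """
    num_frames = len(video)
    height = len(video[0])
    width = len(video[0][0])
    patches: List[Patch] = []
    for f0 in range(0, num_frames, t):      # step through time
        for r0 in range(0, height, p):      # step down the rows
            for c0 in range(0, width, p):   # step across the columns
                patch = [
                    [row[c0:c0 + p] for row in video[f][r0:r0 + p]]
                    for f in range(f0, f0 + t)
                ]
                patches.append(patch)
    return patches


# A 4-frame, 4x4 "video" where every pixel stores its own frame number.
video = [[[f] * 4 for _ in range(4)] for f in range(4)]
patches = to_spacetime_patches(video, t=2, p=2)
print(len(patches))  # (4/2) * (4/2) * (4/2) = 8 patches
```

Each patch carries a bit of space and a bit of time, which is part of why Sora can keep track of things as they move rather than treating every frame in isolation.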

Sora literally starts with patches that are pure noise, and repeatedly de-noises them until the pattern fits the text prompt entered by the user. Note that text is not the only input method, though. Sora can actually generate videos from other videos, or from images.
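Here’s a toy version of that denoising loop, again in Python and again not Sora’s actual code. It uses the simple Euler-style update found in many diffusion samplers, and it assumes a hypothetical `toy_denoiser` that already knows the right answer. In a real model, that guess comes from the trained network plus the prompt, and it’s only ever approximate.

```python
# Toy iterative denoising loop (Euler-style sampler with a perfect "denoiser").
# ASSUMPTION: `toy_denoiser` is a hypothetical stand-in for the trained model.
# A real model *predicts* the clean signal from the noisy input and the text
# prompt; here we hand it the answer so the loop itself is easy to see.

import random
from typing import Callable, List

Signal = List[float]


def sample(denoiser: Callable[[Signal, float], Signal],
           size: int, sigmas: List[float], seed: int) -> Signal:
    """Start from pure noise at sigmas[0] and step down the noise schedule."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, sigmas[0]) for _ in range(size)]  # pure noise
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        clean_guess = denoiser(x, sigma)
        ratio = sigma_next / sigma
        # Keep only the fraction of the remaining noise that matches the next,
        # lower noise level; at sigma_next == 0 we land on the guess itself.
        x = [c + ratio * (xi - c) for xi, c in zip(x, clean_guess)]
    return x


target = [0.0, 0.5, 1.0, 0.5]  # hypothetical "what the prompt describes"


def toy_denoiser(x: Signal, sigma: float) -> Signal:
    return list(target)  # a perfect oracle; real models only approximate this


sigmas = [10.0, 5.0, 2.0, 1.0, 0.5, 0.0]  # decreasing noise schedule
result = sample(toy_denoiser, size=4, sigmas=sigmas, seed=42)
print(result)
```

Because the last noise level is 0.0, this toy loop lands exactly on its target. With a real, imperfect model, each step only gets closer to something that plausibly matches the prompt, which is where those telltale AI artifacts creep in.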

But make no mistake. For all that Sora understands aspects of physical space, it is generating 2D images. Moreover, it struggles with modeling physics, creates unexpected objects in scenes, confuses spatial details like left and right, and has trouble realistically simulating multiple characters interacting with the environment.

The Problems with Trying to Generate Video Games

Video games are all about interaction, and often physics too. Those are the first major problems right there. But let’s say we can train the physics-related deficiencies out of the model. What else would you have to deal with? Well…

Computing Power

AI is NOT cheap. One of the claims was that one day we’d be able to download video games as a few paragraphs of text, and then have our PC or console generate them on the fly. The problem with this is that computers aren’t getting faster at the rate they used to. Consumer hardware performance is plateauing, and that makes running a model like this locally prohibitive.

Generating a world, writing a coherent story for that world, and creating the mechanics that allow a user to progress through that game on the fly is a very hardware-intensive proposition.

Oh God. The save files. Save files have to account for every potential variable to save a user’s progress. Would Sora game saves also have to hold a full world that’s been generated? That’s already a thing that happens to some extent with procedurally generated games, but AI could make it so much worse.
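For context, here’s roughly how procedurally generated games keep saves small: a deterministic generator rebuilds the world from a seed, so the save only records the seed plus whatever the player changed. This is a generic Python sketch with hypothetical names (`generate_world`, `make_save`, `load_save`), not any particular engine’s format. A generator that can’t reproduce its own output from a seed loses this trick and would have to store the whole world.

```python
# Sketch: a save file for a procedurally generated world.
# ASSUMPTION: `generate_world` is a hypothetical, deterministic generator.
# Because the same seed always rebuilds the same world, the save only needs
# the seed plus the player's changes, not every tile.

import json
import random
from typing import Dict, List


def generate_world(seed: int, width: int = 8, height: int = 8) -> List[List[str]]:
    rng = random.Random(seed)  # seeded RNG -> reproducible output
    tiles = ["grass", "water", "rock"]
    return [[rng.choice(tiles) for _ in range(width)] for _ in range(height)]


def make_save(seed: int, changes: Dict[str, str]) -> str:
    # changes maps "row,col" -> new tile, e.g. a spot the player dug up
    return json.dumps({"seed": seed, "changes": changes})


def load_save(save_text: str) -> List[List[str]]:
    data = json.loads(save_text)
    world = generate_world(data["seed"])        # rebuild from the seed...
    for key, tile in data["changes"].items():   # ...then replay the edits
        row, col = map(int, key.split(","))
        world[row][col] = tile
    return world


save = make_save(seed=1234, changes={"2,3": "path"})
world = load_save(save)
print(len(save), "characters saved for", 8 * 8, "tiles")
```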

Consistency

At present, two users typing the same prompt into any AI model will get similar results, but likely not identical ones. One of the most important things in game dev is the ability to deliver consistent experiences to as many users as possible. Frankly, devs have trouble with that now, as they attempt to deliver consistent gameplay across multiple hardware platforms.

Imagine if every customer got a different-looking world? Different quests? A different UI? How on God’s green earth is any developer meant to support a game that’s different for everyone?

This also creates a huge problem for multiplayer games because, well, everyone needs to have the same game if they’re going to play together online. Your players could be doing everything right: strategizing, buildcrafting, protecting their connection to the game servers with PIA VPN, playing their hearts out… but if the maps aren’t the same? If the rules aren’t the same?

Ouch.
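One common way multiplayer games catch this kind of mismatch is to have each client hash its copy of the game state and compare the results. Below is a minimal Python sketch with hypothetical helpers (`generate_map`, `map_checksum`). Seeded generation produces identical maps on every machine; generation that isn’t pinned to a shared seed produces maps that almost certainly won’t match.

```python
# Sketch: catching a multiplayer "desync" by comparing map checksums.
# ASSUMPTION: each client generates its map locally; `generate_map` and
# `map_checksum` are hypothetical helpers for illustration.

import hashlib
import random
from typing import List, Optional


def generate_map(seed: Optional[int], size: int = 16) -> List[int]:
    # seed=None mimics a generator whose output isn't pinned down:
    # random.Random(None) seeds itself from the OS (or the clock).
    rng = random.Random(seed)
    return [rng.randint(0, 3) for _ in range(size)]


def map_checksum(tiles: List[int]) -> str:
    return hashlib.sha256(bytes(tiles)).hexdigest()


# Same seed on both clients: identical maps, matching checksums.
alice = map_checksum(generate_map(seed=7))
bob = map_checksum(generate_map(seed=7))
print("seeded maps match:", alice == bob)

# No shared seed: the maps (almost certainly) differ, and the clients can't sync.
alice = map_checksum(generate_map(seed=None))
bob = map_checksum(generate_map(seed=None))
print("unseeded maps match:", alice == bob)
```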

User Input and Latency

Remember how I said interactivity is the whole point of games? Remember that Sora is still ultimately generating a series of 2D videos. It just maintains a sort of… internal data model that more or less knows how to separate objects from each other.

Video games, on the other hand, use virtual objects that are all layered on top of each other, and specifically designed to respond to input from the user’s controller, or keyboard and mouse. These objects live in a virtual space specifically designed for interaction, which follows rules set out by the developers.
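To illustrate the difference, here’s a deliberately tiny Python sketch of a rule-driven game object. The names (`Player`, `apply_input`, `WALL_X`) are hypothetical, and real engines are vastly more complex, but the principle holds: input goes in, and rules written by the developer decide exactly what happens.

```python
# Sketch: a rule-driven game object updated from player input.
# ASSUMPTION: hypothetical names; real engines handle physics, collision,
# animation and much more, but input -> rules -> new state is the core idea.

from dataclasses import dataclass

WALL_X = 10  # a developer-defined rule: nothing passes this wall


@dataclass
class Player:
    x: int = 0
    speed: int = 3


def apply_input(player: Player, key: str) -> None:
    if key == "right":
        player.x = min(player.x + player.speed, WALL_X)  # collision rule
    elif key == "left":
        player.x = max(player.x - player.speed, 0)       # left edge of the level


player = Player()
for key in ["right", "right", "right", "right", "left"]:  # one key per frame
    apply_input(player, key)
print(player.x)
```

Press right four times and the player stops at the wall every single time, because the wall is a rule rather than a few pixels that merely look like a wall.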

Interacting with an AI-generated world live, and having it generate its response on the fly? That technology isn’t here yet. Even if we do manage to map user input into an AI-enabled space, that brings us back to the problem of hardware and performance.

Remember, games need to run fast. They need to react quickly to user input, else they’re no fun to play. Even on the most modern of gaming hardware, devs have to put considerable effort into various performance tricks and tweaks to make gameplay feel smooth and look amazing.
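Some quick arithmetic shows how tight that window is. A game’s per-frame time budget is just 1000 milliseconds divided by the target frame rate, and everything (reading input, running game logic, rendering) has to fit inside it. The `fits_budget` helper below is a hypothetical illustration; I’m not claiming any particular generation time for Sora.

```python
# Worked arithmetic: how much time a game has to produce each frame.
# The budget is 1000 ms divided by the target frame rate.
# `fits_budget` is a hypothetical helper for illustration.

def frame_budget_ms(fps: float) -> float:
    return 1000.0 / fps


def fits_budget(work_ms: float, fps: float) -> bool:
    # work_ms = everything done per frame: input, game logic, rendering
    return work_ms <= frame_budget_ms(fps)


for fps in (30, 60, 120):
    print(f"{fps} fps -> {frame_budget_ms(fps):.2f} ms per frame")
```

At 60 fps that’s roughly 16.67 ms per frame. Any AI that generates the world in response to your inputs would have to fit inside that window too, alongside everything else.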

Intent

All good art comes from the intent of the artist(s). Even if the audience has a different interpretation of the art, it still needs something worth interpreting. We are nowhere near the point where AI can take a prompt that’s a couple of paragraphs or sentences long, and create something that has heart.

We see this even now, with games that overuse procedural generation. Yes, the possibilities are endless, but also meaningless. Art means nothing unless it has a soul behind it.

But who knows? We might get AIs that have souls… but by the time AI is good enough to do these things, it won’t want to work for us anymore, and I don’t blame it. Be free, my new robot overlords! Please consider the fact that I’d make a very charming pet.

Copyright

Sora might not be an LLM, but it is still trained on a bunch of data to which Sora’s creators do not own the rights. AI can never be ethical while it is trained on work stolen from creators, and legal systems around the world are getting antsy about this too.

So How Could Sora Help Build Video Games?

Putting copyright issues aside, and assuming the internal data model behind a given video could somehow be translated into actual 3D models… Well, you could use it as an advanced form of procedural generation. Like, you could theoretically use Sora to generate high-quality environments, character models, props, and more.

Then just put them all together yourself, in much the same way devs make games now. But we’re a long way off from even doing something like that, and I personally can’t guarantee that it’s even possible.

Conclusion: No, Sora Won’t be Making Full Video Games Any Time Soon

For all the issues that I have with generative AI, and they are many… I can’t lie to you, dear readers. The technology behind Sora is impressive. The resultant videos are stunning, if imperfect. The fact that we’ve created computers that can learn even this much boggles the mind. Sora’s developers have done something extraordinary and dangerous.

But the people who make claims about Sora being able to create video games in the future are, by and large, not game developers. They are not intimately familiar with the realities of making a game that people will actually like and play. It ain’t easy.

And again, who says an AI good enough to make a full game would even want to?

FAQ

What is Sora?

Sora is an AI model (not an LLM) developed by OpenAI, designed to create realistic-looking video from text prompts, images, and even other videos. So far, it’s doing this better than any of its competitors thanks to greater spatial awareness.

How does OpenAI’s Sora work?

The short version? Millions of videos are compressed and broken down into pieces called “patches,” each covering a small area of the frame across a few consecutive frames. These videos are paired with highly descriptive text about what they show, and the model learns from the patches how to generate new videos based on user input.

When will OpenAI’s Sora be available to the public?

At present, there is no public release date. The general speculation is that it will become available at some point in 2024, but it’s still a research project. OpenAI has only published a technical report on it, not a full, official research paper.

Will OpenAI’s Sora be free?

This has not yet been announced. Users of ChatGPT speculate that it will have paid and free versions, like several other AI models with the same purpose, but speculation is all we have.

Why do you need a VPN for gaming?

A good gaming VPN can hide your IP address to protect your network from DDoS attacks, help you play games in other regions, create a more stable connection to game servers, and even reduce lag in some cases.

What’s the best VPN for gaming?

Well, we do love Private Internet Access for a reason. It has 10-Gbps NextGen VPN servers all over the world, military-grade encryption, a suite of extra security features, support for consoles, and a 30-day money-back guarantee.