Using AI for on-demand simulations - a.k.a. The Holodeck / Dreamatorium

I’ve recently been seeing a lot of noise around using generative AI tools to create virtual worlds. There have been some really incredible experiments, mainly using a generative AI tool to create a panorama or 360° photo/skybox, then using either a depth-estimation tool or manual adjustment to take it from 3DoF to 6DoF.

Here is a great example: https://twitter.com/ScottieFoxTTV/status/1627366581628960770?s=20

This gorgeous world was generated using an AI tool from Blockade Labs. It’s two skyboxes at different distances, the inner one with its alpha channel manually cut out, placed in a Unity scene; the video is a screen recording of the VR headset’s mirror display moving through it. If you look closely, you can see the two layers and tell that they’re monoscopic.

Here is another generated skybox (monoscopic 360) in WebXR: https://twitter.com/felix_trz/status/1632097476013617152?s=20

In this demo, they host the generated 360° images in a WebXR scene and let users generate new scenes from within a VR headset.
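
The Blockade Labs demo above was built in Unity, but the same layered-skybox trick is easy to picture on the web. Here’s a rough sketch of the idea in Three.js, assuming you already have the two generated textures (the file names are placeholders):

```ts
import * as THREE from "three";

// Two concentric "skybox" spheres, textured on their inside faces. The outer
// sphere is the distant backdrop; the inner sphere uses an alpha-cut texture
// so the backdrop shows through the gaps, which gives a cheap sense of depth
// as the headset moves between the layers. (Renderer/WebXR setup omitted.)
const scene = new THREE.Scene();
const loader = new THREE.TextureLoader();

const outerSky = new THREE.Mesh(
  new THREE.SphereGeometry(500, 64, 32),
  new THREE.MeshBasicMaterial({
    map: loader.load("sky_far.png"), // placeholder: generated 360 panorama
    side: THREE.BackSide,            // render the inside of the sphere
  })
);

const innerSky = new THREE.Mesh(
  new THREE.SphereGeometry(50, 64, 32),
  new THREE.MeshBasicMaterial({
    map: loader.load("sky_near.png"), // placeholder: foreground layer with alpha cutouts
    side: THREE.BackSide,
    transparent: true,                // respect the hand-cut alpha channel
  })
);

scene.add(outerSky, innerSky);
```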

You can imagine stringing a variety of similar AI tools together to give yourself a holodeck-esque experience. Imagine being in an XR compositor and saying the words “Holodeck, generate a laid-back tropical beach vacation at a resort for me on a sci-fi planet with lots of interesting people to talk to.” Behind the scenes, a pipeline of tools would kick in, roughly like this (a code sketch of the orchestration follows the list):

  1. ML for speech-to-text, so the words you speak can be parsed easily

  2. Prompt parser to identify different key phrases within the prompt to pass through different tools

  3. “laid-back” and “lots of interesting people to talk to” define the gameplay genre: relaxed, with no loss condition

  4. The following will be used in tandem with an AI trained on a database of meticulously tagged levels from various games and worlds, in addition to synthetic training data:

  • “tropical beach vacation” generates a map with a beautiful beach
  • “sci-fi planet” means the sand and trees and water and sky might be different shapes and colors than a beach on Earth
  • “at a resort” and “sci-fi” make the resort you’re staying at a mix of a modern tropical resort and sci-fi elements like metal walls and walkways, neon lights, and hovering furniture
  5. The generated world would then be parsed by a CV algorithm for segmentation. It would generate the navigation mesh and identify objects, materials, and interactables like doors, chairs, sand, and water. Each of those would be flagged by a purpose-built tool that could add the necessary shaders and physics if need be.

  6. Another pass could then populate the world with characters based on the above. Using an ML tool trained on existing highly-rated games, it would create and position spawn points for “utility NPCs” (front desk, surf rental), “main NPCs” (characters to chat with or befriend), and “background NPCs” (just to fill it out, background chatter, and some flavor text for small conversations).

  7. Another ML tool trained on lots of characters from literature, games, movies, and other media takes phrases like “sci-fi” and “tropical vacation”, along with knowledge of the level design and character placement, and creates a backstory for each character: an alien couple on their honeymoon, a retired space marine colonel on his first vacation in years after fighting in a galactic war, a few potential love interests of various genders and species.

  8. Another model takes those backstories and generates lots of potential dialogue and conversation starters with each of these characters (they will be able to respond in real-time and go off-script, but having a base will help them approach you if it makes sense to strike up the conversation).

  9. Another tool generates the character models based on their descriptions.

  10. Another tool auto-rigs them.

  11. Another tool assigns them personality weights based on their backstories.

  12. Another gives them facial animations which will reflect their personality weights and how your dialogue choices have affected them.

  13. Another tool decides which text-to-speech voice to assign to each character based on their species, visual design, and backstory.

  14. A tool will automatically place more complicated gameplay / interaction systems that are pre-built (not auto-generated) where they make sense. For example, it adds a surfing mechanic to the surf shack and a surfing nav mesh in the ocean, sets up a beach volleyball mechanic, and adds a mechanic where you can bond with characters by playing catch with a football.

  15. Finally, a tool will generate a variety of background music where appropriate and generate and apply sound effects for objects and scenes (noises when you move a chair, a door slamming if you close it too fast, ocean waves if you’re on the beach, etc.).
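
None of this exists as a single product today, so the following is a purely hypothetical orchestration sketch of the pipeline above. Every declared function is a stand-in for an ML tool or service that would have to exist, and a few steps (personality weights, facial animation) are collapsed for brevity:

```ts
// Hypothetical orchestration layer for the pipeline above. Every declared
// function is a stand-in for a tool that does not yet exist in this form;
// none of these are real APIs.

interface WorldSpec {
  genre: string;          // e.g. "laid-back, no loss condition"
  environment: string[];  // e.g. ["tropical beach", "sci-fi planet", "resort"]
  social: string[];       // e.g. ["lots of interesting people to talk to"]
}

type Level = object; // placeholder for whatever the generated level format would be
type Npc = object;

// Steps 1-3: speech-to-text, prompt parsing, genre interpretation.
declare function speechToText(audio: ArrayBuffer): Promise<string>;
declare function parsePrompt(prompt: string): Promise<WorldSpec>;
// Steps 4-5: level generation, then CV segmentation and tagging.
declare function generateLevel(spec: WorldSpec): Promise<Level>;
declare function segmentAndTag(level: Level): Promise<Level>;
// Steps 6-13: character placement, writing, modeling, rigging, voices.
declare function placeSpawnPoints(level: Level, spec: WorldSpec): Promise<object[]>;
declare function writeBackstory(spawn: object, spec: WorldSpec): Promise<string>;
declare function generateDialogue(backstory: string): Promise<string[]>;
declare function generateRiggedModel(backstory: string): Promise<object>;
declare function pickVoice(backstory: string): Promise<string>;
// Steps 14-15: pre-built gameplay systems, music, and sound effects.
declare function attachGameplaySystems(level: Level, spec: WorldSpec): Promise<Level>;
declare function attachMusicAndSfx(level: Level): Promise<Level>;

async function buildSimulation(audio: ArrayBuffer): Promise<{ world: Level; npcs: Npc[] }> {
  const prompt = await speechToText(audio);
  const spec = await parsePrompt(prompt);

  const level = await generateLevel(spec);
  const tagged = await segmentAndTag(level);

  // The per-character work is largely independent, so it can run in parallel.
  const spawns = await placeSpawnPoints(tagged, spec);
  const npcs: Npc[] = await Promise.all(
    spawns.map(async (spawn) => {
      const backstory = await writeBackstory(spawn, spec);
      const [dialogue, model, voice] = await Promise.all([
        generateDialogue(backstory),
        generateRiggedModel(backstory),
        pickVoice(backstory),
      ]);
      return { spawn, backstory, dialogue, model, voice };
    })
  );

  const world = await attachMusicAndSfx(await attachGameplaySystems(tagged, spec));
  return { world, npcs };
}
```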

Though that’s not every conceivable feature you may want in your simulation, it should hopefully give you an idea of how attainable this is starting to seem. Currently, there are ML tools for the majority of the above. For example, Mixamo can be used to auto-rig characters. Riffusion can be used to generate the background music. Any of the LLMs can be used to generate the scripts for the characters, both ahead of time and in real-time.
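
As a toy example of the character-scripting piece, here’s roughly how an ahead-of-time prompt to whichever LLM you pick might be assembled; `completeWithLLM` is a stand-in, not a real API:

```ts
// Stand-in for a call to any hosted or local LLM; not a real API.
declare function completeWithLLM(prompt: string): Promise<string>;

async function scriptNpc(theme: string, role: string): Promise<string> {
  // Ahead-of-time pass: generate a backstory plus a few conversation starters.
  // At runtime, the same backstory would stay in the model's context so the
  // character can go off-script and still remain consistent.
  const prompt = [
    `Setting: ${theme}.`,  // e.g. "a laid-back resort on a sci-fi tropical planet"
    `Character: ${role}.`, // e.g. "a retired space marine colonel on his first vacation in years"
    "Write a three-sentence backstory for this character,",
    "then three conversation starters they might use when",
    "the player walks past them on the beach.",
  ].join(" ");

  return completeWithLLM(prompt);
}
```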

Obviously, similar ideas have been done before, AI Dungeon for example. But nothing yet has been an actual interactive world or game generated entirely by ML in near real-time. Some tech demos have shown “infinite worlds” or other very lightly interactive experiences, but crucially they’re built from pre-existing seeds and building blocks, not truly generative. I very much think it will be possible in the near future.

However, I think there are a few roadblocks. Firstly, though I’ve been following the space closely, a few of the steps I mentioned above don’t yet have sufficient automated ML tools to guide the creation of the assets the simulation would need. Secondly, many of the tools are new and aren’t yet at the quality that I, or I believe most users, would want before a Holodeck-like program is worth trying or using regularly. Thirdly, they’re all disparate tools that run on different platforms and still require a lot of manual work to generate assets and bring them into a game engine. Finally, most of the ML tools are not openly available, nor do they expose APIs that would let you make parallel requests and stream the assets into your simulation as they become available. It may be very difficult to run all these models on a single high-end computer fast enough, even within this decade. However, with various optimizations, I could easily imagine even a standalone headset hitting various APIs that generate each of the needed assets in parallel and constructing the world for the user as the assets download, especially if the cloud services packaged the assets knowing how they would be used. This would enable entire interactive simulations custom-tailored to your desires for a few cents or dollars, as easily as using DreamStudio or Midjourney.
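
To make that last point concrete, here’s a minimal sketch of the client side, assuming each generator were exposed as its own endpoint; the URLs and the `addToScene` helper are hypothetical:

```ts
// Minimal sketch of a headset client kicking off every generator in parallel
// and adding each asset to the scene the moment its request finishes, rather
// than waiting for the whole world. Endpoint URLs and addToScene are made up.
declare function addToScene(kind: string, asset: unknown): void;

async function streamWorld(prompt: string): Promise<void> {
  const endpoints: Record<string, string> = {
    skybox: "https://example.com/generate/skybox",
    level: "https://example.com/generate/level",
    npcs: "https://example.com/generate/npcs",
    music: "https://example.com/generate/music",
  };

  // Fire all requests at once; handle each response as soon as it arrives.
  await Promise.all(
    Object.entries(endpoints).map(async ([kind, url]) => {
      const res = await fetch(url, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ prompt }),
      });
      addToScene(kind, await res.json());
    })
  );
}
```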

I think there’s some real magic to being able to just vocalize the world you want to be in, and then be an active part of it nearly instantly. I think this future is inevitable, whether it takes 2 more years or 20.