OpenAI, the AI startup co-founded by Elon Musk that is behind the popular text-to-image generator DALL-E, on Tuesday announced the release of its newest system, Point-E, which can produce 3D point clouds directly from text prompts. Whereas existing systems like Google's DreamFusion typically require several hours and multiple GPUs to generate their images, Point-E needs only a single GPU and a minute or two.
3D modeling is used across a variety of industries and applications. CGI effects for recent blockbusters, video games, virtual and augmented reality, NASA's lunar crater mapping missions, Google's heritage site preservation projects, and Meta's vision for the metaverse all rely on 3D modeling capabilities. However, creating photorealistic 3D images remains a resource- and time-intensive process, despite NVIDIA's work to automate object creation and Epic Games' RealityCapture mobile app, which lets anyone with an iOS phone scan real-world objects as 3D images.
Text-to-image systems such as OpenAI's DALL-E 2, Craiyon, Prisma Labs' Lensa, and Stable Diffusion have surged in popularity in recent years, and text-to-3D is an offshoot of that research. Unlike similar systems, "Point-E leverages a large corpus of (text, image) pairs, allowing it to follow diverse and complex prompts, while our image-to-3D model is trained on a smaller dataset of (image, 3D) pairs," the OpenAI research team, led by Alex Nichol, wrote in "Point-E: A System for Generating 3D Point Clouds from Complex Prompts," published last week. "To produce a 3D object from a text prompt, we first sample an image using the text-to-image model, and then sample a 3D object conditioned on the sampled image. Both of these steps can be performed in a number of seconds, and do not require expensive optimization procedures."
Given a text prompt, for example, "a cat eating a burrito," Point-E first generates a synthetic rendered view of a cat eating a burrito. That generated image is then run through a series of diffusion models to create the 3D, RGB point cloud of the initial image: first a coarse, 1,024-point cloud model, then a finer, 4,096-point one. "In practice, we assume that the image contains the relevant information from the text, and the point clouds are not explicitly conditioned on the text," the research team notes.
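To make the two-stage cascade concrete, here is a minimal sketch of its structure in Python. The sampler functions below are placeholders standing in for Point-E's actual diffusion models (which this sketch does not implement); only the shapes and the coarse-to-fine flow reflect the pipeline described above.

```python
import numpy as np

def sample_coarse_cloud(image_embedding: np.ndarray, n_points: int = 1024) -> np.ndarray:
    """Placeholder for the base diffusion model: returns an (n_points, 6)
    array of XYZ coordinates plus RGB colors, conditioned on the image."""
    rng = np.random.default_rng(0)
    xyz = rng.normal(size=(n_points, 3))   # stand-in geometry
    rgb = rng.uniform(size=(n_points, 3))  # stand-in colors in [0, 1]
    return np.concatenate([xyz, rgb], axis=1)

def upsample_cloud(coarse: np.ndarray, n_points: int = 4096) -> np.ndarray:
    """Placeholder for the upsampler diffusion model: densifies the coarse
    cloud to n_points by jittering existing points (the real model samples
    new points conditioned on both the image and the coarse cloud)."""
    rng = np.random.default_rng(1)
    n_new = n_points - len(coarse)
    idx = rng.integers(0, len(coarse), size=n_new)
    jitter = np.zeros((n_new, 6))
    jitter[:, :3] = rng.normal(scale=0.01, size=(n_new, 3))  # perturb XYZ only
    return np.concatenate([coarse, coarse[idx] + jitter], axis=0)

image_embedding = np.zeros(512)                # stand-in for the synthetic view
coarse = sample_coarse_cloud(image_embedding)  # coarse 1,024-point cloud
fine = upsample_cloud(coarse)                  # finer 4,096-point cloud
print(coarse.shape, fine.shape)                # (1024, 6) (4096, 6)
```

The key design point is that the second stage conditions on the first stage's output rather than regenerating from scratch, which is what keeps the whole pipeline fast enough to run in seconds.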
Each of those diffusion models was trained on "millions" of 3D models, all converted into a standardized format. "While our method performs worse on this evaluation than state-of-the-art techniques," the team concedes, "it produces samples in a small fraction of the time." If you want to try it out for yourself, OpenAI has published the project's open source code on GitHub.