Allen Institute for AI researchers propose PROCTHOR: a machine learning framework for the procedural generation of embodied AI environments


Fueled by large-scale training data, computer vision and natural language processing models have become far more capable. Recent models like CLIP, DALL-E, GPT-3, and Flamingo leverage large amounts of task-independent data to pre-train large neural networks that perform remarkably well. In comparison, the embodied AI (E-AI) research community primarily trains agents in simulators with far fewer scenes. Due to task complexity and the need for long planning horizons, the most successful E-AI models continue to overfit to a limited set of training scenes and, therefore, transfer poorly to unfamiliar environments.

Although E-AI simulators have become increasingly powerful in recent years, with support for physics, manipulators, object states, deformable objects, fluids, and real-world counterparts, scaling them up to tens of thousands of scenes has remained difficult. Existing E-AI environments are either designed by hand or obtained from 3D scans of real-world structures. The first method requires considerable effort from 3D designers, who must create 3D assets, arrange them sensibly within large spaces, and meticulously set up appropriate textures and lighting. The second involves moving specialized cameras through real-world locations, then stitching the resulting images together to create 3D reconstructions of the scenes.

Neither technique scales, so existing scene repositories cannot be extended by orders of magnitude. PROCTHOR, a framework built on AI2-THOR, is presented to procedurally generate fully interactive, physics-enabled environments for E-AI research. PROCTHOR can generate a wide and diverse range of floor plans that satisfy a given specification. To automatically populate each floor plan, it draws on a large asset library of 108 object types and 1,633 fully interactive instances, ensuring that object placements are physically plausible, natural, and realistic.
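To make the idea of physically feasible placement concrete, here is a minimal sketch in Python. It is not PROCTHOR's algorithm: it assumes a rectangular room, axis-aligned rectangular object footprints, and simple rejection sampling, and the names `Footprint` and `populate_room` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    """Axis-aligned rectangle occupied by one object on the floor (hypothetical type)."""
    name: str
    x: float
    y: float
    width: float
    depth: float

    def overlaps(self, other: "Footprint") -> bool:
        # Two rectangles overlap when their intervals intersect on both axes.
        return (self.x < other.x + other.width and other.x < self.x + self.width
                and self.y < other.y + other.depth and other.y < self.y + self.depth)


def populate_room(room_w, room_d, assets, seed=0, max_tries=200):
    """Place (name, width, depth) assets by rejection sampling.

    Illustrative assumption only: a candidate position is kept when the object
    lies inside the room and does not overlap anything already placed.
    Objects that cannot be placed within max_tries attempts are skipped.
    """
    rng = random.Random(seed)
    placed = []
    for name, w, d in assets:
        if w > room_w or d > room_d:
            continue  # the object can never fit in this room
        for _ in range(max_tries):
            candidate = Footprint(name,
                                  rng.uniform(0, room_w - w),
                                  rng.uniform(0, room_d - d),
                                  w, d)
            if not any(candidate.overlaps(p) for p in placed):
                placed.append(candidate)
                break
    return placed


if __name__ == "__main__":
    furniture = [("bed", 2.0, 1.6), ("sofa", 1.8, 0.9), ("table", 1.0, 1.0)]
    for item in populate_room(5.0, 4.0, furniture, seed=1):
        print(item)
```

A real generator would also account for object semantics (for example, which objects belong in which room type and which sit against walls), which this sketch leaves out.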

The intensity and tint of the lights in each scene can also be varied to reflect differences in interior lighting and time of day. Assets and larger structures, such as walls and doors, can be assigned different colors and textures drawn from sets of realistic colors and materials for each asset type. Together, the variety of layouts, assets, placements, and lighting yields an arbitrarily large collection of scenes, allowing PROCTHOR to scale orders of magnitude beyond the number of scenes currently supported by modern simulators. Additionally, PROCTHOR supports dynamic material randomization, in which specific colors and materials are re-sampled each time an environment is loaded into memory for training.
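The load-time randomization described above can be sketched roughly as follows. This is an illustrative assumption, not PROCTHOR's API: the scene is a plain dictionary, and the material lists, value ranges, and the `randomize_appearance` function are all hypothetical.

```python
import colorsys
import random

# Hypothetical material pools; a real system would use per-asset-type sets.
WALL_MATERIALS = ["white_paint", "beige_paint", "light_wood", "brick"]
DOOR_MATERIALS = ["oak", "walnut", "white_lacquer"]


def randomize_appearance(scene, rng):
    """Return a copy of `scene` with lighting and materials re-sampled.

    The layout is left untouched; only appearance changes, mimicking
    randomization applied each time a scene is loaded for training.
    """
    variant = dict(scene)  # shallow copy so the stored scene is not modified
    variant["light_intensity"] = rng.uniform(0.5, 1.5)  # assumed range
    # Sample a warm, lightly saturated tint and convert it to RGB in [0, 1].
    hue = rng.uniform(0.08, 0.17)
    saturation = rng.uniform(0.0, 0.3)
    variant["light_rgb"] = colorsys.hsv_to_rgb(hue, saturation, 1.0)
    variant["wall_material"] = rng.choice(WALL_MATERIALS)
    variant["door_material"] = rng.choice(DOOR_MATERIALS)
    return variant


if __name__ == "__main__":
    rng = random.Random(0)
    base_scene = {"layout": "house_0", "wall_material": "white_paint"}
    for _ in range(3):  # each simulated "load" yields a different look
        print(randomize_appearance(base_scene, rng))
```

Because the sampling happens at load time rather than at generation time, the same stored layout can present many different appearances over the course of training.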

ARCHITECTHOR is an artist-designed collection of ten high-quality, fully interactive 3D houses, intended solely as a test set for research in household settings. ARCHITECTHOR environments are more complete, diverse, and realistic than AI2-iTHOR and RoboTHOR scenes. Unlike scenes created from 3D scans, PROCTHOR scenes feature fully interactive objects that support multiple distinct states and can be physically manipulated by agents equipped with robotic arms. The researchers illustrate the ease of use and effectiveness of PROCTHOR by sampling a set of 10,000 houses, PROCTHOR-10K, with layouts ranging from modest one-room houses to larger 10-room houses.

Agents are trained on PROCTHOR-10K using minimal neural architectures (RGB input only, with no depth sensor, no explicit mapping, and no human task supervision) and achieve state-of-the-art results on a range of navigation and interaction benchmarks.

In summary, the contributions include PROCTHOR, a framework for high-performance procedural generation of arbitrarily many diverse and fully interactive simulated environments; ARCHITECTHOR, a new set of artist-designed 3D houses for E-AI evaluation; and state-of-the-art results across six E-AI benchmarks covering manipulation and navigation tasks, including strong zero-shot results. Ablation analysis demonstrates the benefits of scaling from 10 to 100 to 1K and then 10K scenes, and suggests that further gains may be obtained by using PROCTHOR to generate even larger sets of environments. PROCTHOR will soon be open-source, and the code used in this project will be made available. Until then, a Google Colab notebook is available for getting started with ProcTHOR-10K.

This article is a research summary written by Marktechpost Research Staff, based on the research paper 'ProcTHOR: Large-Scale Embodied AI Using Procedural Generation'. All credit for this research goes to the researchers on this project.

Sherry J. Basler