Reference Number: DEVCOM-099
Project Description
The Army seeks to discover novel mechanisms that provide context and insight into how individuals and teams understand complex real-world environments, enabling commanders and intelligent systems to communicate and adapt effectively, thereby improving joint human-system performance. The Spatiotemporal Analysis and Inference Toolkit (STAIT) was developed to support research and analysis of joint human-system performance. This project focuses on expanding key capabilities of the toolkit.

Modern XR devices, such as the HoloLens 2 and Quest Pro, have built-in Simultaneous Localization and Mapping (SLAM) capabilities that enable them to track their own position and build a real-time mesh of the environment. However, this data is often stored in proprietary formats and is not easily accessible or interoperable between devices. This project will focus on creating a tool to extract this environmental map data from a target XR device and convert it into a standardized format that can be incorporated into the STAIT “world data” layer. The team will develop a process to capture a photogrammetry recording, 3D mesh, or 3D point cloud generated by a device during a session, clean it, and store it in an open format (e.g., glTF, OBJ, or PLY) alongside the corresponding STAIT data.
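The cleaning step above can be illustrated without any device SDK in the loop. The following is a minimal, dependency-free sketch (function names and the threshold are hypothetical, not part of STAIT) of the typical first cleaning pass: splitting a triangle mesh into connected components and dropping small disconnected pieces, i.e., “floating artifacts.”

```python
# Sketch of the "cleaning" step: remove small disconnected pieces of a mesh.
# Function names and the min_faces threshold are illustrative, not STAIT API.
from collections import defaultdict

def split_components(faces):
    """Group face indices into connected components (shared-vertex adjacency)."""
    parent = {}  # union-find over vertex ids

    def find(v):
        while parent.setdefault(v, v) != v:
            parent[v] = parent[parent[v]]  # path compression
            v = parent[v]
        return v

    def union(a, b):
        parent[find(a)] = find(b)

    for f in faces:
        union(f[0], f[1])
        union(f[1], f[2])

    comps = defaultdict(list)
    for i, f in enumerate(faces):
        comps[find(f[0])].append(i)
    return list(comps.values())

def drop_floating(faces, min_faces=50):
    """Keep only components with at least `min_faces` triangles."""
    kept = []
    for comp in split_components(faces):
        if len(comp) >= min_faces:
            kept.extend(comp)
    return [faces[i] for i in sorted(kept)]
```

In practice a library such as Trimesh provides the same operation (`mesh.split()` plus concatenation of the surviving components), along with down-sampling and export to OBJ, PLY, or glTF.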
- Stretch goal #1: STAIT-Aligner – align each agent’s location and orientation data to the ‘world data’ layer using collected first-person video feeds and position data, enabling shared co-localization within a common world.
- Stretch goal #2: STAIT-Context – conduct semantic analysis on forward-facing video and eye-gaze data to update and label the 3D semantic mesh, identifying what agents are looking at and interacting with.
- Stretch goal #3: STAIT-Delta – detect when the underlying world data has changed. This supports both analyzing how the environment has evolved over time and updating the mesh in real time for shared situational understanding.
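At its core, stretch goal #1 amounts to estimating a rigid transform between corresponding point sets (e.g., landmarks seen by two devices). A minimal sketch of the classic Kabsch step, using NumPy, is shown below; the function name and call pattern are illustrative assumptions, not an existing STAIT interface.

```python
# Sketch of rigid alignment (Kabsch algorithm, no scale): find rotation R and
# translation t such that R @ src[i] + t ≈ dst[i] for corresponding points.
import numpy as np

def rigid_align(src, dst):
    """Least-squares rigid transform from src (N,3) onto dst (N,3)."""
    src_mean = src.mean(axis=0)
    dst_mean = dst.mean(axis=0)
    # Cross-covariance of the centered point sets
    H = (src - src_mean).T @ (dst - dst_mean)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection (det = -1) in the recovered rotation
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    t = dst_mean - R @ src_mean
    return R, t
```

A full STAIT-Aligner would first have to establish the point correspondences themselves (e.g., from visual features in the first-person video), which is the harder part of the problem; this sketch covers only the final geometric step.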
Motivation and Relevance
This project tackles a foundational problem for multi-agent, mixed-reality research. By creating a standardized representation of the environment map, it enables multiple agents using different devices to operate within a shared, common frame of reference. This is a prerequisite for analyzing inter-agent interactions in a shared space and for developing joint human-machine systems. This work would formalize the “world data” component of the STAIT hierarchy, significantly increasing the toolkit’s utility and scope.
Expected Outcome(s)
The MVP will be a set of scripts and a documented workflow for one target device (e.g., HoloLens 2):
- A tool to access and export the 3D spatial map generated by the device’s OS.
- A Python script to process the raw map data, including cleaning (e.g., removing floating artifacts) and down-sampling.
- A defined standard for storing the map data and linking it to the corresponding STAIT session data (e.g., via metadata in the file headers).
- A simple loader function within the STAIT Python library that can load both the agent data and its associated world map for synchronized analysis.
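One way to realize the metadata linkage and loader described above is a JSON sidecar file next to each exported mesh. Everything here, including the file layout, key names, and function names, is an assumption for illustration, not the actual STAIT library API.

```python
# Hypothetical sketch: link an exported world map to its STAIT session via a
# JSON sidecar, plus the matching loader. Not the real STAIT interface.
import json
from pathlib import Path

def write_world_metadata(mesh_path, session_id, device, out_dir):
    """Write a sidecar record tying a mesh file to its STAIT session."""
    mesh_path = Path(mesh_path)
    meta = {
        "session_id": session_id,
        "device": device,
        "mesh_file": mesh_path.name,
        "format": mesh_path.suffix.lstrip("."),
    }
    out = Path(out_dir) / (mesh_path.stem + ".world.json")
    out.write_text(json.dumps(meta, indent=2))
    return out

def load_world_metadata(meta_path):
    """Loader counterpart: return the metadata and the resolved mesh path."""
    meta_path = Path(meta_path)
    meta = json.loads(meta_path.read_text())
    mesh_path = meta_path.parent / meta["mesh_file"]
    return meta, mesh_path
```

Storing the link in a sidecar (rather than inside the mesh file) keeps the mesh readable by any standard 3D tool while still making the session association explicit and machine-readable.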
Technical Skills
- Python: For data processing and tool development.
- Device-Specific APIs: Willingness to learn how to interface with device SDKs (e.g., using C# for HoloLens, or the Quest developer tools). This may require some cross-language work.
- 3D File Formats: Understanding of common 3D file formats (OBJ, PLY, glTF).
- 3D Data Processing: Experience with libraries for mesh manipulation (e.g., Trimesh).
- Software Engineering: Good practices for documenting a data standard and workflow.

