Reference Number: DEVCOM-092
Project Description
As spatial computing technologies become increasingly prevalent on and off the battlefield, new tools are needed to enhance spatial reasoning. One application of spatial reasoning is shared situational awareness between humans and robots. Toward this end, ARL is conducting research and has collected multi-modal, multi-person visual-spatial data. We require data-fusion tools that can combine these data streams to produce a unified spatial scene. Specifically, we have video data from multiple POV cameras (HUDs) in the same spatial reference frame (spatial anchor). We need a tool that uses information about the camera frustum (head location and orientation) to place 2D visual objects at the correct 3D locations. This tool must also identify unique objects from multiple perspectives over time (e.g., recognize that an object viewed by Soldier 1 is the same object viewed by Soldier 2 at a later point in time).
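To make the geometry concrete, the sketch below back-projects a 2D detection into the shared world frame using the camera pose. It is a minimal illustration under stated assumptions, not a prescribed implementation: the pinhole intrinsics K, the camera-to-world rotation R, the camera center t, and the flat-ground depth assumption are all placeholders for whatever the HUD telemetry and environment model actually provide.

```python
import numpy as np

def pixel_to_world_ray(u, v, K, R, t):
    """Back-project pixel (u, v) into a world-frame viewing ray.

    Assumed inputs (illustrative, not the actual telemetry format):
    K: 3x3 pinhole intrinsics. R: 3x3 camera-to-world rotation derived
    from the head orientation. t: camera center (head location) in the
    shared spatial-anchor frame.
    """
    d_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # pixel to camera-frame ray
    d_world = R @ d_cam                               # rotate into the world frame
    return np.asarray(t, float), d_world / np.linalg.norm(d_world)

def place_on_ground_plane(origin, direction, ground_z=0.0):
    """Pin a detection to the plane z = ground_z, a crude stand-in for
    real depth (stereo, multi-view triangulation, or a scene mesh)."""
    if abs(direction[2]) < 1e-6:
        return None                      # ray parallel to the plane
    s = (ground_z - origin[2]) / direction[2]
    return origin + s * direction if s > 0 else None
```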
Motivation and Relevance
Dismounted Soldiers and UAS provide vastly different viewpoints. A shared representation of the environment across multiple humans and autonomous systems, when effectively communicated, expands each unit member's situational awareness beyond their immediate environment, helping to coordinate efforts and allocate resources appropriately. This project will explore building this shared representation offline using pre-collected video data. This capability can then be expanded to allow real-time communication leveraging AR technology and to include off-site human and AI agents with different functionalities.
Expected Outcome(s)
The technician will be supplied with two Soldiers' POV video feeds, their head locations and orientations, and video footage from a UAS as the heterogeneous unit searches a mock urban environment for targets (red boxes).
Through the integration of these data streams, the technician will provide:
- A 3D coordinate space of the environment in which target locations and other identified objects are plotted.
- The ability to identify targets in a video frame by ID, linked to their (x, y, z) locations in world coordinates (e.g., the ability to look at Soldier 1's video and Soldier 2's video and determine that they are looking at the same target; see the sketch after this list)
- A minimal method for validating this model (e.g., collecting ground-truth locations of 2-5 objects in the environment to test the accuracy of the 3D SSU)
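One way to approach the cross-view identification outcome is purely geometric: if detections from two Soldiers back-project to viewing rays that nearly intersect in the shared frame, they plausibly correspond to the same target, and the near-intersection point is its estimated world location. The sketch below is a minimal version of that idea under assumed inputs (world-frame rays as produced by a routine like pixel_to_world_ray above, and an illustrative 0.5 m tolerance); it also includes the simple ground-truth error metric the validation outcome suggests.

```python
import numpy as np

def closest_points_between_rays(o1, d1, o2, d2):
    """Closest points on two rays (origin o, unit direction d),
    using the standard least-squares solution for skew lines."""
    b = float(d1 @ d2)
    w = o1 - o2
    denom = 1.0 - b * b
    if denom < 1e-9:                      # near-parallel viewing rays
        return o1, o2
    s = (b * (d2 @ w) - (d1 @ w)) / denom
    u = ((d2 @ w) - b * (d1 @ w)) / denom
    return o1 + s * d1, o2 + u * d2

def same_target(ray_a, ray_b, max_gap=0.5):
    """Call two detections the same target if their rays pass within
    max_gap meters of each other (an illustrative tolerance); return
    the midpoint of closest approach as the (x, y, z) estimate."""
    p, q = closest_points_between_rays(*ray_a, *ray_b)
    gap = float(np.linalg.norm(p - q))
    return gap <= max_gap, (p + q) / 2.0

def validation_error(estimated, surveyed):
    """Mean Euclidean error of estimated target positions against
    surveyed ground-truth locations (e.g., the 2-5 test objects)."""
    return float(np.mean([np.linalg.norm(np.asarray(e) - np.asarray(g))
                          for e, g in zip(estimated, surveyed)]))
```

In practice, a geometric gate like this would likely be supplemented with appearance features and temporal consistency, since two distinct objects can happen to lie along nearly intersecting rays.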
Technical Skills
- Programming skills
- Experience in view synthesis pipelines or other related computer vision tools
- Software development

