Every pixel in every image of the real world should contain geographic (and time/date) information.
Whether satellite, aerial drone, autonomous vehicle, security camera, or cell phone camera, every sensor should attempt to tag its captured pixels with geospatial and temporal information: a grid coordinate (plus altitude) and a date and time. In the proliferating web of connected sensors and actuators, the Internet of Things (IoT), and in other dynamic, multi-sensor decision systems (e.g. the control system of an autonomous vehicle), the most valuable piece of information immediately after "what do I see?" is "where and when is it?"
From the perspective of an AI decision system deciding when to trigger an actuator, the raw, unprocessed, un-geo-registered pixels of a sensor are far less valuable than a simple dot on a map showing time, location, and a digital description of what that dot represents.
For example, in an autonomous fire-fighting decision system, everything from the drone's flight control system down to the motor that releases a drone-carried water bucket is triggered not by the raw pixels of the fire-detecting IR camera, but by the geo-shapefile (or KML/KMZ) feature detected as “fire/hotspot” and plotted on the system map.
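As a concrete illustration, the sketch below writes one such detection out as a KML Placemark: a time-stamped point whose name carries the digital description ("fire/hotspot"). The coordinates, altitude, timestamp, and output file name are hypothetical values chosen for the example; a real system would take them from the drone's geo-registration pipeline.

```python
# A minimal sketch of the "dot on a map" idea: serialize one hotspot
# detection as a KML document containing a single time-stamped Placemark.
# All numeric values and the file name below are hypothetical.
from xml.sax.saxutils import escape


def hotspot_placemark(lat, lon, alt_m, utc_iso, label="fire/hotspot"):
    """Return a KML document with one point placemark (KML uses lon,lat,alt order)."""
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>{escape(label)}</name>
    <TimeStamp><when>{utc_iso}</when></TimeStamp>
    <Point>
      <altitudeMode>absolute</altitudeMode>
      <coordinates>{lon},{lat},{alt_m}</coordinates>
    </Point>
  </Placemark>
</kml>
"""


with open("hotspot.kml", "w", encoding="utf-8") as f:
    f.write(hotspot_placemark(34.0736, -118.2417, 250.0, "2023-08-14T21:05:00Z"))
```

The same feature could just as easily be emitted as a shapefile record or a GeoJSON point; the essential payload the decision system consumes is the coordinate, the time, and the label.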
There are many methods to capture and encode this (geo-)spatial-temporal data: on-board GPS receivers (for sensor location), gyroscopes (for the 3-D orientation of the sensor during capture), and system clocks, whose readings can be embedded in the metadata of image and video files, as well as methods downstream of the sensor, such as comparing captured pixels to reference pixels with known geo-coordinates (e.g. pre-geo-registered satellite imagery/maps).
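As one example of the metadata path, the sketch below reads the standard EXIF GPS and capture-time tags that many cell phone cameras already write into their JPEGs. It assumes the Pillow imaging library and a file named capture.jpg (a hypothetical file name); a sensor that writes no such metadata would need one of the downstream geo-registration methods instead.

```python
# A minimal sketch, assuming the Pillow library and a JPEG whose camera
# wrote standard EXIF GPS and capture-time tags. "capture.jpg" is hypothetical.
from PIL import Image
from PIL.ExifTags import TAGS, GPSTAGS


def dms_to_degrees(dms, ref):
    """Convert EXIF degrees/minutes/seconds rationals to signed decimal degrees."""
    degrees = float(dms[0]) + float(dms[1]) / 60.0 + float(dms[2]) / 3600.0
    return -degrees if ref in ("S", "W") else degrees


def read_geo_time(path):
    """Return latitude, longitude, altitude, and capture time from EXIF metadata."""
    raw = Image.open(path)._getexif() or {}
    exif = {TAGS.get(tag, tag): value for tag, value in raw.items()}
    gps = {GPSTAGS.get(tag, tag): value for tag, value in exif.get("GPSInfo", {}).items()}
    lat = dms_to_degrees(gps["GPSLatitude"], gps["GPSLatitudeRef"]) if "GPSLatitude" in gps else None
    lon = dms_to_degrees(gps["GPSLongitude"], gps["GPSLongitudeRef"]) if "GPSLongitude" in gps else None
    return {
        "latitude": lat,
        "longitude": lon,
        "altitude_m": float(gps["GPSAltitude"]) if "GPSAltitude" in gps else None,
        "timestamp": exif.get("DateTimeOriginal"),
    }


print(read_geo_time("capture.jpg"))
```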