Training Data for Machine Learning GIS Applications

The intersection of GIS and machine learning is evolving and bringing forth new use cases and applications of ML. These applications, spanning both private and public sectors, are powered by large volumes of data captured by satellites, drones, cameras, LIDAR sensors, and more, which come together to provide a comprehensive view of the world. The volume and variety of data complicate their management and use.

As applications of ML for GIS become increasingly complex, it can become difficult to generate high-quality ground truth or training data for them. While it is widely known that training datasets often need to be quite large, it is less well known that these datasets increasingly need to be labeled either by subject matter experts or by personnel trained in a variety of different fields. Feature extraction, the process of extracting similar spectral, spatial, and texture attributes from geospatial imagery to power different use cases, is fundamental to creating these datasets.
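To make the idea of feature extraction concrete, here is a minimal sketch in Python. It is illustrative only and not a description of any particular production pipeline: the four-band pixel layout (red, green, blue, near-infrared), the function name extract_features, and the use of brightness variance as a stand-in for texture are all assumptions made for this example.

```python
from statistics import mean, pvariance
from typing import Dict, List, Tuple

# Assumed layout: each pixel is (red, green, blue, near_infrared) reflectance in [0, 1].
Pixel = Tuple[float, float, float, float]


def extract_features(patch: List[List[Pixel]]) -> Dict[str, float]:
    """Summarize one image patch as a few spectral and texture attributes."""
    pixels = [px for row in patch for px in row]
    red = [p[0] for p in pixels]
    nir = [p[3] for p in pixels]
    # Average of the visible bands, used here as a simple brightness value.
    brightness = [(p[0] + p[1] + p[2]) / 3 for p in pixels]
    return {
        "mean_red": mean(red),                         # spectral attribute
        "mean_nir": mean(nir),                         # spectral attribute
        "brightness_variance": pvariance(brightness),  # crude texture proxy
    }
```

A perfectly uniform patch yields a brightness variance of zero, while a patch mixing, say, rooftops and vegetation yields a larger value. Real workflows typically use richer descriptors, but the shape of the output, a small table of attributes per patch, is the same.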

Today there is a unique opportunity to create GIS training data using closely trained and managed data labelers and/or GIS technicians. Given the range of applications in the field, each use case requires a deep understanding of the type of data it needs, subject matter expertise, or a hybrid of the two.

Geospatial intelligence for the private and public sector

Geospatial intelligence provides geographic information and shows the distribution of items in a geographic space. It is now an essential tool for everything from national security to land use and planning to agriculture and a host of business and government functions. Use cases for geospatial applications cover a wide range of public and private sector activities, including land use planning, commercial and residential insurance, agriculture, national security, oil and gas exploration, and retail.

Take the case of insurance companies. GIS can provide them with the accurate location-based information they need for risk management. Location-related information, such as the location of assets and their proximity to hazards such as industrial areas or natural elements, is important for insurance companies developing risk profiles. Access to this information helps insurers make informed decisions.
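As a rough illustration of the proximity-to-hazard idea, the sketch below computes the great-circle distance between an insured asset and a list of hazard locations using the standard haversine formula. The function names, the (name, latitude, longitude) tuple layout, and the spherical-Earth radius are assumptions chosen for this example, not part of any insurer's actual method.

```python
import math
from typing import Iterable, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_hazard(
    asset: Tuple[float, float],
    hazards: Iterable[Tuple[str, float, float]],
) -> Tuple[str, float]:
    """Return (hazard_name, distance_km) for the hazard closest to the asset."""
    return min(
        ((name, haversine_km(asset[0], asset[1], lat, lon)) for name, lat, lon in hazards),
        key=lambda item: item[1],
    )
```

A risk profile could then use the returned distance as one input among many, for example by flagging assets within a chosen radius of an industrial site or river. The threshold itself is a business decision and is not shown here.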

Similarly, in agriculture, geospatial intelligence can complement farmers’ efforts by providing them with an overview of fields and crops. This data is useful for understanding yield spread, crop health, threats, or the availability of natural resources such as water bodies. All of this can help farmers or companies involved make the right decisions to improve yields and reduce time and effort.

In the public sector, there is huge innovation in geospatial intelligence. Defense departments in countries like the United States often use geographic data to assess security measures and plan military operations. Companies like Maxar Technologies have provided satellite images of the Russian-Ukrainian conflict, in near real time, to share information about the Russian advance with the world. Similarly, defense departments around the world use geospatial data and remote sensing to monitor enemy movements on the ground or in the air, detecting unidentified aircraft, spy drones, fighter jets, and so on.

Beyond that, geospatial data is also used by governments to monitor potential natural disasters such as floods and earthquakes. This is important for quickly planning rescue missions to reduce loss of life and property.

Faced with this array of applications, workflows that help scale skilled teams and provide onboarding training, project management, and quality assurance throughout the project must be created with the final deployment of the data and the ML model in mind.

Tools and techniques for GIS

For geospatial intelligence, data is collected through satellites, drones, and other aerial sources that capture everything in a specific geographic area, and the data annotation required varies depending on the end use case. At iMerit, project workflows and designs are tailored to accommodate the wide range of geospatial applications.

For example, in the case of insurance companies, iMerit uses image classification and 2D polygon annotation to capture building features such as windows, doors, garages, and swimming pools in order to estimate the insurance premium rate. In the case of the military, data capture is sophisticated and involves multiple methods and supporting technologies. Technologies like SONAR are also used to collect data. LIDAR data captures views of rooftops and can be used to create topographic maps showing different elevations and elevation patterns for different use cases.
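A 2D polygon annotation can be thought of as a label plus an ordered list of vertices. The sketch below is a hypothetical representation, not iMerit's actual annotation format: the class name PolygonAnnotation, its fields, and the metres_per_pixel parameter are assumptions for illustration. It uses the shoelace formula to turn an annotated outline, such as a swimming pool, into an area estimate.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PolygonAnnotation:
    """Hypothetical annotation record: a class label and pixel-space vertices."""
    label: str
    vertices: List[Tuple[float, float]]  # ordered (x, y) pixel coordinates

    def area_px(self) -> float:
        """Area in square pixels via the shoelace formula (simple polygons only)."""
        n = len(self.vertices)
        twice_area = 0.0
        for i in range(n):
            x1, y1 = self.vertices[i]
            x2, y2 = self.vertices[(i + 1) % n]  # wrap around to close the polygon
            twice_area += x1 * y2 - x2 * y1
        return abs(twice_area) / 2.0


def area_m2(annotation: PolygonAnnotation, metres_per_pixel: float) -> float:
    """Convert pixel area to square metres, assuming square pixels of known ground size."""
    return annotation.area_px() * metres_per_pixel ** 2
```

Downstream, attributes like the labeled feature type and its estimated area are the kind of structured output a pricing or risk model could consume. How they feed into a premium is outside the scope of this sketch.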

In the case of infrastructure inspection, drones equipped with RGB video and LIDAR sensors are sent to capture very high-fidelity infrastructure scans or models, which can be used to assess areas of weakness or identify components that must be replaced. For example, California is known to have aging energy infrastructure, and its power lines have been updated using geospatial intelligence. Texas is an example where infrastructure affected by snowstorms has been identified using drones and other aerial means and then restored.

The future of data solutions for geospatial intelligence

The amount of data available and collected has increased dramatically. So has the demand for higher-resolution and higher-quality data. Different types of data are emerging, such as Synthetic Aperture Radar (SAR), which is relatively new. We are also starting to see different ways of collecting data. The truly transformative one is LIDAR, which helps create highly accurate 3D representations or models of different aspects of the world.
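To show how LIDAR returns become an elevation model, here is a deliberately simple sketch that bins (x, y, z) points into square grid cells and keeps the highest z value in each cell. The function name, the dictionary-based grid, and the choice of maximum height per cell are assumptions for this example. Real pipelines add steps such as noise filtering and ground classification that are not shown.

```python
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres, assumed local coordinates
Cell = Tuple[int, int]              # (column, row) index of a grid cell


def rasterize_max_elevation(points: Iterable[Point], cell_size: float) -> Dict[Cell, float]:
    """Bin points into square cells and keep the highest z per cell."""
    grid: Dict[Cell, float] = {}
    for x, y, z in points:
        cell = (int(x // cell_size), int(y // cell_size))
        if cell not in grid or z > grid[cell]:
            grid[cell] = z
    return grid
```

Keeping the maximum per cell approximates a surface model that includes rooftops and tree canopy. Keeping the minimum instead would lean toward bare ground, which is closer to what a topographic map needs.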

Another interesting and increasingly common approach is the application of machine learning and computer vision to these types of data to automate some of the analysis that was typically done manually by GIS analysts.

GIS and machine learning will continue to evolve, bringing to the fore the unique use cases and resulting complexities in using data. This dynamism requires the training and skill development of data annotators to seamlessly deliver high-quality data to power systems based on artificial intelligence and machine learning. It is therefore vital for data annotation vendors to bring together technology, talent, and technique to deliver simplified, high-quality data that enables businesses and society to make informed decisions.

Sherry J. Basler