This Flight Plan is divided into two parts: Geospatial Intelligence and Machine Learning. The Sample Solution will detail how we combine these technologies in the DAIR Cloud.
Part 1
Geospatial Intelligence Overview
Best Practices
Tips and Traps
Resources
Part 2
Machine Learning Overview
Best Practices
Tips and Traps
Resources
Geospatial Intelligence Overview
What is geospatial intelligence?
Geospatial intelligence is a broad field that combines geospatial data with other types of data from sources like social, political, and environmental sciences.
The Intelligence Community defines geospatial intelligence as:
“…the use and analysis of geospatial information to assess geographically referenced activities on Earth.”
While often associated with national defense, geospatial intelligence is increasingly used by civilian and private-sector organizations in the telecommunications, transportation, public health and safety, and real estate industries, where it helps them improve their products and services and better serve their customers.
The basic principle is to organize and combine all available data around its geographical location on Earth and then use it to prepare products that can be used by planners, emergency responders, and decision makers.
In this BoosterPack, you will deploy an application that uses machine learning algorithms to analyze geospatial data and its relationship with a target variable (the entity we want to predict). The results are overlaid on a map to help visualize patterns and relationships within the geospatial data.
What value has it added to my business?
The merging of cloud computing with machine learning and satellite imagery has allowed us to gather more accurate global insights about everything from extreme weather and sea-level rise to water and air pollution. We collect and analyze geospatial data, then use machine learning algorithms to detect trends, patterns, and changes. This helps us translate environmental data into easy-to-digest information for potential clients in sectors like agriculture, health care, insurance, and government.
Best Practices
The data structure for this project requires three important dimensions:
spatial (where the attribute was measured),
temporal (when it was measured), and
the value/magnitude of the measured attribute/associated attributes.
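To make the three dimensions concrete, here is a minimal Python sketch of a single spatio-temporal record. The `Observation` class, its field names, and the example values are hypothetical illustrations, not the BoosterPack's actual schema; the sketch only shows one way to keep where, when, and what together while rejecting coordinates that cannot be mapped.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    """One spatio-temporal record: where, when, and what was measured."""

    latitude: float         # spatial: degrees north, -90 to 90
    longitude: float        # spatial: degrees east, -180 to 180
    observed_on: date       # temporal: when the attribute was measured
    value: float            # magnitude of the measured attribute
    attributes: tuple = ()  # associated attributes as (name, value) pairs

    def __post_init__(self):
        # Reject coordinates that cannot be placed on a map.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError(f"latitude out of range: {self.latitude}")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError(f"longitude out of range: {self.longitude}")


# Hypothetical example: a home sale price with one associated attribute.
sale = Observation(45.42, -75.70, date(2021, 6, 1), 512000.0, (("bedrooms", 3),))
```

Making the record frozen keeps an observation from being silently altered after it has been mapped or fed to a model.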
In most data frames that contain all three features, the spatial dimension typically takes the form of addresses, which must be converted into latitude and longitude coordinates before they can be overlaid on a map. This conversion is called geocoding, and several geocoding APIs can perform it for data that lacks associated latitude and longitude values. See the following resources for an overview of spatio-temporal data and geocoding for data analysis.
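The geocoding step could look roughly like the sketch below. It assumes records are plain dictionaries with an `address` key (a hypothetical layout) and takes the geocoder as a function argument, so any service that maps an address to a `(latitude, longitude)` pair, or `None` on failure, can be plugged in. The dictionary-backed `example_geocoder` and its coordinates are made up for illustration only.

```python
from typing import Callable, Optional, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)


def geocode_records(
    records: list[dict],
    geocode: Callable[[str], Optional[Coordinates]],
) -> tuple[list[dict], list[dict]]:
    """Attach latitude/longitude to records that only carry an address.

    Returns (geocoded, unresolved) so failed lookups can be reviewed
    instead of being silently dropped.
    """
    geocoded, unresolved = [], []
    for record in records:
        if "latitude" in record and "longitude" in record:
            geocoded.append(record)  # already has coordinates, no lookup needed
            continue
        coords = geocode(record["address"])
        if coords is None:
            unresolved.append(record)
        else:
            lat, lon = coords
            geocoded.append({**record, "latitude": lat, "longitude": lon})
    return geocoded, unresolved


# Hypothetical stand-in for a real geocoding API (address and coordinates are made up).
example_geocoder = {"1 Example Street": (45.0, -75.0)}.get
```

With the third-party geopy library, for instance, `Nominatim(user_agent=...).geocode(address)` returns a location object with `latitude` and `longitude` attributes (or `None`), which a small wrapper can adapt to this signature. Check the chosen service's usage policy and rate limits before geocoding large datasets.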
What is geocoding?
Source: arcgis.com
Spatiotemporal data analysis with chronological networks
Source: nature.com
Tips and Traps
The output of a geospatial analysis of spatio-temporal data should show change over both space and time, and should not be confused with a time-series model or analysis. If the data is treated as a purely time-series dataset, a cluster analysis of points within a spatial region may produce spurious output. Doing so can also limit the accuracy of predictions for variables that depend on other variables strongly affected by their spatial attributes.
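One way to keep both dimensions in the output is to aggregate observations per grid cell and per period, instead of collapsing everything into a single time series. The sketch below assumes observations arrive as `(latitude, longitude, year, value)` tuples and bins them into a fixed latitude/longitude grid; the grid approach and the 0.01-degree default are illustrative choices, not the method used in the Sample Solution.

```python
import math
from collections import defaultdict
from statistics import mean


def space_time_summary(observations, cell_size_deg=0.01):
    """Average values per (grid cell, year) so change is visible over space and time.

    observations: iterable of (latitude, longitude, year, value) tuples.
    cell_size_deg: grid resolution in degrees (0.01 degrees of latitude is about 1.1 km).
    """
    buckets = defaultdict(list)
    for lat, lon, year, value in observations:
        # Integer indices of the grid cell that contains this point.
        cell = (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))
        buckets[(cell, year)].append(value)
    return {key: mean(values) for key, values in buckets.items()}
```

Each key pairs a location with a period, so the same cell can be compared across years and different cells within the same year. A fixed degree grid gets narrower in the east-west direction toward the poles, so a projected grid may suit larger study areas better.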
For example, real estate prices depend on many factors. Plots of price changes over time give a one-dimensional view of what is happening temporally, but reveal little about how other factors contribute to that change. Spatial elements, such as proximity to public transportation and schools, and law-enforcement presence (patrol routes, police stations, or effective neighborhood watch organizations), can influence home prices relative to other neighborhoods. Factoring some or all of these spatial and temporal elements into the analysis can substantially improve home price predictions over time.
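The home-price example can be sketched as feature engineering: each sale gets a temporal feature (the year) and spatial features (distance to the nearest transit stop and school), computed with the haversine great-circle formula. The function and feature names are hypothetical, and a real model would likely use many more predictors.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def home_features(home, year, transit_stops, schools):
    """Build one feature row that mixes temporal and spatial predictors.

    home: (lat, lon); transit_stops and schools: lists of (lat, lon).
    Feature names are illustrative, not taken from the BoosterPack.
    """
    return {
        "year": year,  # temporal dimension
        "km_to_transit": min(haversine_km(*home, *stop) for stop in transit_stops),
        "km_to_school": min(haversine_km(*home, *school) for school in schools),
    }
```

Rows like these let a model learn how location shifts prices within the same year, which a price-over-time plot alone cannot show.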
Resources
Tutorials
The following is a non-exhaustive list of tutorials the author found most useful.
A research paper detailing the advancement of AI in geospatial applications
Machine Learning Overview
What is Machine Learning?
Machine learning is a branch of artificial intelligence (AI) and computer science that focuses on using data and algorithms to imitate the way humans learn, gradually improving in accuracy as it processes more data. Machine learning is a vital component of the growing field of data science.
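The idea of gradually improving with data can be illustrated with one of the simplest learning algorithms: fitting a straight line by gradient descent. This is a generic teaching sketch, not the model used in this BoosterPack; the recorded loss shows the error shrinking as the algorithm repeatedly adjusts its parameters to the data.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=1000):
    """Fit y ~ w*x + b by gradient descent on mean squared error (MSE).

    Returns (w, b, losses), where losses[i] is the MSE at the start of
    epoch i, before that epoch's parameter update.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    losses = []
    for _ in range(epochs):
        # Residuals of the current model on every training example.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        losses.append(sum(e * e for e in errors) / n)
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Step against the gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, losses
```

For points on the line y = 2x + 1, the fitted parameters approach w = 2 and b = 1. The default learning rate suits small, unscaled inputs like these; larger feature values usually need scaling or a smaller step size to avoid divergence.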
What value has it added to my business?
Using statistical methods, algorithms are trained to make classifications or predictions, uncovering key insights within data mining projects. These insights subsequently drive decision making within applications and businesses, ideally impacting key growth metrics.
The convergence of cloud computing with machine learning and satellite imagery means there is now a better way to gather global insights about everything from extreme weather and sea-level rise to water and air pollution. Harnessing geospatial data and then using machine learning algorithms to detect trends, patterns, and changes helps our business translate environmental data into easy-to-digest information for potential clients in sectors like agriculture, health care, insurance, and government.
Why choose machine learning over the alternatives?
We chose machine learning to leverage advanced pattern recognition, which can produce highly accurate predictions and forecasts of future trends.
Machine learning models can be built as modular, scalable components, which makes deployment at scale practical.
Best Practices
Technology infrastructure plays multiple roles in machine learning applications. One of the major tasks is to define how we gather, receive, and process new data. After that, we need to decide how to train and version our models. Finally, we must consider how to deploy the models in a production environment. Infrastructure plays a crucial role in all of these tasks, and you will probably spend more time working on your system's infrastructure than on the machine learning model itself.
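Versioning is often the least familiar of these tasks. A minimal sketch, assuming a file-based registry (a hypothetical layout, not a DAIR or BoosterPack convention), derives each version from a hash of the training data so that every stored model can be traced back to what it was trained on. Dedicated tools such as MLflow or DVC offer much richer versioning; this only shows the principle.

```python
import hashlib
import json
import pickle
from datetime import datetime, timezone
from pathlib import Path


def save_versioned_model(model, training_data, registry_dir):
    """Store a trained model under a version derived from its training data.

    The version is the first 12 hex characters of a SHA-256 digest of the
    JSON-serialized training data: identical data gives the same version,
    and any change to the data gives a new one.
    """
    payload = json.dumps(training_data, sort_keys=True).encode("utf-8")
    version = hashlib.sha256(payload).hexdigest()[:12]
    target = Path(registry_dir) / version
    target.mkdir(parents=True, exist_ok=True)
    (target / "model.pkl").write_bytes(pickle.dumps(model))
    metadata = {
        "version": version,
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "n_training_rows": len(training_data),
    }
    (target / "metadata.json").write_text(json.dumps(metadata, indent=2))
    return version


def load_model(registry_dir, version):
    """Load one specific model version from the registry."""
    # Only unpickle files you wrote yourself: pickle can execute arbitrary code.
    return pickle.loads((Path(registry_dir) / version / "model.pkl").read_bytes())
```

This scheme assumes one model per dataset; if several model configurations are trained on the same data, the configuration would also need to be part of the hashed payload so their versions do not collide.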
A microservices architecture can help you achieve modularity and scalability. Using technologies like Docker and Kubernetes, you can encapsulate separate parts of the system, make incremental improvements to each of them, and replace each component independently as necessary. Scaling with Kubernetes is also typically a low-effort process.
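To illustrate the kind of component a container would encapsulate, here is a minimal prediction microservice written as a standard-library WSGI application. The routes, the JSON payload shape, and the hard-coded linear model are hypothetical; the `/health` route is the kind of endpoint a Kubernetes liveness or readiness probe can call.

```python
import json

# Hypothetical model parameters; a real service would load a versioned artifact.
MODEL = {"w": 2.0, "b": 1.0}


def predict(x):
    """Apply the placeholder linear model."""
    return MODEL["w"] * x + MODEL["b"]


def app(environ, start_response):
    """Minimal WSGI service: GET /health, and POST /predict with {"x": <number>}."""
    path = environ.get("PATH_INFO")
    if path == "/health":
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"ok"]
    if path != "/predict" or environ.get("REQUEST_METHOD") != "POST":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = json.loads(environ["wsgi.input"].read(length))
        result, status = {"prediction": predict(float(body["x"]))}, "200 OK"
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, a missing "x", or a non-numeric value.
        result, status = {"error": str(exc)}, "400 Bad Request"
    payload = json.dumps(result).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(payload)))])
    return [payload]
```

Locally, `wsgiref.simple_server.make_server("0.0.0.0", 8000, app).serve_forever()` starts the service. The `wsgiref` server is intended for development, so a production container image would typically run the same `app` under a production WSGI server such as Gunicorn.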
Tips and Traps
When choosing an algorithm, always consider accuracy, training time, and ease of use. Many users put accuracy first, while beginners tend to focus on algorithms they know best.
When presented with a dataset, first focus on obtaining results, no matter what those results look like. Beginners tend to choose algorithms that are easy to implement and produce results quickly; this is fine as a first step in the process. Once you have results and are familiar with the data, you can spend more time on more sophisticated algorithms to strengthen your understanding of the data and further improve the results.
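The start-simple approach, together with the trade-off between accuracy and training time, can be made concrete by scoring a trivial baseline next to a slightly more sophisticated model. Both models below are illustrative standard-library stand-ins (a mean predictor and one-feature k-nearest-neighbors regression), not algorithms prescribed by this BoosterPack.

```python
import time
from statistics import mean


def mean_baseline(train):
    """Simplest possible model: always predict the training mean."""
    avg = mean(y for _, y in train)
    return lambda _x: avg


def knn_regressor(train, k=3):
    """k-nearest-neighbors regression on a single numeric feature."""
    data = list(train)  # "training" is just storing the examples

    def predict(x):
        nearest = sorted(data, key=lambda pair: abs(pair[0] - x))[:k]
        return mean(y for _, y in nearest)

    return predict


def evaluate(builder, train, test):
    """Return (mean absolute error on the test data, training time in seconds)."""
    start = time.perf_counter()
    model = builder(train)
    elapsed = time.perf_counter() - start
    mae = mean(abs(model(x) - y) for x, y in test)
    return mae, elapsed
```

A candidate that cannot beat the baseline's error is not yet learning anything useful from the data, however sophisticated it is.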
The best algorithm for your problem may not be the one with the highest reported accuracy, because an algorithm usually requires careful tuning and extensive training to reach its best performance.
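Tuning is where much of an algorithm's eventual performance comes from. A minimal sketch, using one-feature k-nearest-neighbors regression as an assumed example, selects the hyperparameter `k` by its error on a held-out validation set; the candidate values are arbitrary.

```python
from statistics import mean


def knn_predict(train, x, k):
    """Predict y for x as the mean target of the k nearest training points."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return mean(y for _, y in nearest)


def tune_k(train, validation, candidates=(1, 2, 3, 5, 8)):
    """Pick k by mean absolute error on a held-out validation set.

    Returns (best_k, {k: validation_mae}). Keep a separate test set untouched
    until the final evaluation so the reported score is not tuned to it.
    """
    scores = {
        k: mean(abs(knn_predict(train, x, k) - y) for x, y in validation)
        for k in candidates
        if k <= len(train)  # k cannot exceed the number of training points
    }
    best_k = min(scores, key=scores.get)
    return best_k, scores
```

Evaluating every candidate on the same validation split keeps the comparison fair; with noisy real data, cross-validation gives a more stable choice than a single split.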
The SAS blog is a free resource designed for beginner to intermediate data scientists, to help them identify and apply machine learning algorithms to the problems that interest them.
Docker is an open-source containerization platform. It enables developers to package applications into containers — standardized executable components combining application source code with the operating system (OS) libraries and dependencies required to run that code in any environment.
Dash is the original low-code framework for rapidly building data apps in Python. Written on top of Plotly.js and React.js, Dash is ideal for building and deploying data apps with customized user interfaces. It's particularly suited for anyone who works with data. Through a couple of simple patterns, Dash abstracts away all the technologies and protocols that are required to build a full-stack web app with interactive data visualization.
Got it? Now let us show you how we deployed it on the DAIR Cloud…
There is a substantial need for an open-source platform that combines a NoSQL database and its user interface with automated AI capabilities, is quick to deploy in the cloud, and supports geospatial data. This need is common to many types of businesses and industries. Such a platform enables users to record, edit, browse, and query spatio-temporal data to answer questions of where, when, how, what, and who, for informed decision-making.
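This section does not name the NoSQL database, so the sketch below only illustrates the semantics of a where-and-when query over documents stored as GeoJSON points. Document stores such as MongoDB provide geospatial operators (for example `$geoWithin`) that run this kind of filter server-side with an index; the in-memory loop and the field names `location` and `observed_on` are hypothetical.

```python
from datetime import date


def query_spatiotemporal(docs, bbox, start, end):
    """Return documents inside a bounding box and a date window (both inclusive).

    docs: dicts with a GeoJSON Point under "location" ([longitude, latitude] order)
          and a datetime.date under "observed_on" (hypothetical field names).
    bbox: (min_lon, min_lat, max_lon, max_lat), the GeoJSON bbox ordering.
    Boxes that cross the antimeridian are not handled in this sketch.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    hits = []
    for doc in docs:
        lon, lat = doc["location"]["coordinates"]
        in_box = min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
        in_window = start <= doc["observed_on"] <= end
        if in_box and in_window:
            hits.append(doc)
    return hits


# Hypothetical document with a made-up point and date.
example_doc = {
    "location": {"type": "Point", "coordinates": [-75.70, 45.42]},
    "observed_on": date(2021, 6, 1),
}
```

Note that GeoJSON puts longitude first; swapping the order is a common source of points landing in the wrong place on the map.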
This Sample Solution demonstrates how a modular and scalable cloud-based platform can be built to leverage the power of geospatial intelligence and machine learning. Once deployed, the basic infrastructure platform can be used as a foundation to build and deploy similar cloud-based platforms at scale.