As a data and analytics company that powers better decisions in agriculture, Arable takes data accuracy and improvement very seriously. In a newly published white paper, our Data Science team explains how we use machine learning (ML) to improve Arable’s core measurements, and how we validate their performance. The study found that, at one testing site, Arable was up to 60x more accurate in detecting rainfall amounts than a similar device. A synopsis of the report follows. Download the full white paper here.
The Case for Machine Learning in the Field
Arable’s Mark 2 is an all-in-one weather station and crop monitor that collects climate and plant data for actionable insights in all growing conditions. The device captures more than 40 plant and climate data streams using a robust sensor suite that includes three radiometers, dual 6-band spectrometers, and an acoustic disdrometer to measure rainfall. Because the device syncs to the cloud every 15 minutes, these insights provide a 360-degree view in real time, making the Mark 2 a powerful yet affordable tool for stakeholders across the agricultural spectrum.
Arable co-locates devices with sensors at research sites to train machine learning models with gold-standard data.
While the Mark 2 is designed with an eye toward accuracy and durability in the field, we also understood that the core measurements, as well as any derived agronomic features, would be enhanced by applying machine learning solutions. Machine learning broadly refers to the study of computer algorithms that build mathematical models based on sample data (known as “training data”) to make predictions. Arable’s use of ML focuses on the calibration of core measurements to improve accuracy and predictive analytics.
To validate our process, we co-located Mark 2 devices at sites with gold-standard research-grade reference instruments in North America, Europe, and Australia. This allows us to compare our measurements with local data to establish truly accurate metrics across a wide range of climate zones. To determine how Arable compares to similar devices in the market, we also co-located with two commercial-grade weather stations sold at comparable price points: the Davis Vantage Pro2 and METER ATMOS 41. We use the data collected from all of these research locations to align Mark 2 measurements with reference measurements over time.
Leveraging Arable’s Unique Cal/Val Network
To acquire the training data required to build Arable’s ML models, and testing data to validate their performance, we have invested heavily in collecting vast amounts of reference data. To date, we have gathered more than 70 million data points across 45 measurements at 36 different sites. Together, these 36 controlled research sites make up our proprietary calibration/validation network, or “Cal/Val Network” as we call it here at Arable.
The Cal/Val Network spans 36 sites across 11 climate zones.
This unique effort started in late 2018 with the collection of air temperature and rainfall data, and expanded in 2019 to include measurements across our full sensor suite. Today, our Cal/Val Network includes a wide range of institutions that use the high-accuracy, high-precision instruments that provide meteorological inputs for studies at the finest research facilities around the world. We now cover 11 Köppen-Geiger climate zones (see Figure 1) and work with renowned universities and organizations such as the NASA Goddard Space Flight Center, the National Renewable Energy Laboratory, and others that are members of AmeriFlux.
These alliances enable us to leverage in situ, high-resolution, research-grade datasets for Arable’s analytics and Mark 2 sensor calibrations. This allows us not only to update our devices regularly, but also to test new features, engineer hardware in shorter timelines, and jump-start novel analytics with third-party sensor integrations.
Map of Arable’s Cal/Val Network. Blue dots indicate the location of specific sites, with Köppen-Geiger climate zones differentiated by color.
The True Power of Machine Learning
The Cal/Val Network provides us with continuous data streams across multiple measurements, allowing us to identify any discrepancies between raw Mark 2 recordings and localized ground truth. This data is used to train ML models that “learn” the corrected or calibrated measurements coming from the gold-standard reference instruments. The ML models are then applied to incoming data in real time to obtain a more accurate and reliable core measurement base.
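As a rough illustration of the calibration idea described above, the sketch below fits a simple linear correction that maps raw device readings onto co-located reference readings, then applies it to held-out data. This is not Arable's actual model: the white paper describes ML models, while this example swaps in ordinary least squares for brevity. The data is synthetic, and the function names (fit_linear_calibration, apply_calibration, rmse) and the simulated sensor bias are assumptions made only for this example.

```python
import random


def fit_linear_calibration(raw, reference):
    """Ordinary least squares fit of: reference ~ slope * raw + offset."""
    n = len(raw)
    mean_x = sum(raw) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in raw)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(raw, reference))
    slope = sxy / sxx
    offset = mean_y - slope * mean_x
    return slope, offset


def apply_calibration(raw, slope, offset):
    """Apply the learned correction to new raw readings."""
    return [slope * x + offset for x in raw]


def rmse(predicted, truth):
    """Root-mean-square error between two equal-length sequences."""
    return (sum((p - t) ** 2 for p, t in zip(predicted, truth)) / len(truth)) ** 0.5


# Synthetic example (assumption): reference air temperature in degrees C,
# and a raw sensor with a made-up gain error, offset, and random noise.
rng = random.Random(0)
reference = [rng.uniform(5.0, 35.0) for _ in range(200)]
raw = [1.08 * t - 1.5 + rng.gauss(0.0, 0.3) for t in reference]

# Train on the first 150 co-located pairs, hold out the last 50 for testing.
train_raw, test_raw = raw[:150], raw[150:]
train_ref, test_ref = reference[:150], reference[150:]

slope, offset = fit_linear_calibration(train_raw, train_ref)
calibrated = apply_calibration(test_raw, slope, offset)

rmse_before = rmse(test_raw, test_ref)
rmse_after = rmse(calibrated, test_ref)
print(f"RMSE before calibration: {rmse_before:.3f}")
print(f"RMSE after calibration:  {rmse_after:.3f}")
```

Keeping a held-out portion of the co-located data separate from the training data mirrors the distinction the article draws between training data and testing data.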
Two more of Arable’s Cal/Val sites, on opposite sides of the globe.
This process is iterative in the sense that these models are updated on a regular basis as we collect more data. As the models improve with more training data, they can discern more complex patterns and, in turn, produce more accurate predictions. This is the true power of ML: we can keep improving results by adding data to the models and releasing new software without having to update the hardware, which is a flexible and sustainable way of building technologies. And as we collect more data through our Cal/Val Network, we build smarter models that more accurately assess weather and crop conditions.
To ensure the highest level of accuracy, we use our Cal/Val Network not only to build the models, but also to extensively test and confirm their continued performance. In addition, we use it to measure our device performance against similar commercial-grade weather stations. To learn more about our validation process, and see our Data Science team’s analysis, read the white paper here.
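To make the device-versus-device comparison concrete, here is a minimal sketch of how readings from several co-located stations could be scored against one shared reference instrument. The rainfall values are invented for illustration and are not results from the white paper. The device labels (device_a, device_b) and the error_summary helper are hypothetical, and the metrics shown (bias and mean absolute error) are common choices rather than a statement of what the white paper uses.

```python
def error_summary(measured, reference):
    """Return bias (mean error) and MAE (mean absolute error) vs. the reference."""
    errors = [m - r for m, r in zip(measured, reference)]
    n = len(errors)
    return {
        "bias": sum(errors) / n,
        "mae": sum(abs(e) for e in errors) / n,
    }


# Made-up daily rainfall totals in millimeters (assumption, not real data).
reference_mm = [0.0, 2.0, 5.0, 12.0, 0.5, 8.0]
readings_mm = {
    "device_a": [0.0, 2.5, 5.5, 12.5, 1.0, 8.5],
    "device_b": [0.0, 1.0, 3.0, 9.0, 0.0, 6.0],
}

# Score every device against the same reference series.
summaries = {
    name: error_summary(values, reference_mm)
    for name, values in readings_mm.items()
}

# List devices from lowest to highest mean absolute error.
for name, stats in sorted(summaries.items(), key=lambda item: item[1]["mae"]):
    print(f"{name}: bias={stats['bias']:+.2f} mm, MAE={stats['mae']:.2f} mm")
```

Scoring every device against the same reference series over the same period keeps the comparison fair, since all stations see identical weather.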
Stacy Basko is a writer at Arable. For more like this, visit our blog, The Plot.