# Estimating Wind Speeds

*- Modeling wind speeds at a certain location using the nearest known observations.*

### Introduction

Say you’re interested in finding out how much power a certain wind turbine generates at a given time. The primary deciding factor is then of course the wind speed. Modern turbines have quite accurate wind sensors built in (besides the generator itself, which also acts as a gigantic anemometer), but you might not be able to get access to that data.

One way of addressing this is to use the nearest known value. The Danish Meteorological Institute (DMI) makes the measurements of several weather stations available, along with their coordinates.^{1} We can then simply find the nearest weather station,^{2} and use that measurement. How well would this kind of model perform? And could we improve it?
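The nearest-station lookup can be sketched in a few lines of Python. This is only an illustration under some assumptions: the station list, its field names (`lat`, `lon`, `wind_ms`) and the readings are made up, and the Earth is treated as a sphere with a mean radius of 6371 km, as the haversine formula does.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_station(lat, lon, stations):
    """Return the station closest to the point (lat, lon)."""
    return min(stations, key=lambda s: haversine_km(lat, lon, s["lat"], s["lon"]))

# Hypothetical stations with made-up readings, for illustration only.
stations = [
    {"name": "A", "lat": 55.68, "lon": 12.57, "wind_ms": 6.2},
    {"name": "B", "lat": 56.16, "lon": 10.20, "wind_ms": 8.1},
    {"name": "C", "lat": 57.05, "lon": 9.92, "wind_ms": 9.4},
]

turbine_lat, turbine_lon = 56.0, 10.0  # hypothetical turbine position
estimate = nearest_station(turbine_lat, turbine_lon, stations)["wind_ms"]
print(estimate)
```

The estimate for the turbine is simply the reading of whichever station `nearest_station` returns.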

In this post we will both evaluate this type of model and see how visualizing our data points on a map can help us derive valuable insights about its shortcomings.

### Data Analysis

In Figure 1 we can see the different publicly available Danish weather stations that provide wind data.^{3} We have collected the data from these observation points for a couple of weeks, resulting in about 2000 readings.^{4}

Since we do not have a known wind measurement from a turbine to evaluate our simple model against, we can instead use each of our known weather stations as a mock turbine, excluding it from the set of observations and evaluating the model on the remaining data with the mock turbine data as truth labels.
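The mock-turbine evaluation could look roughly like the sketch below. It makes a few simplifying assumptions: it handles a single snapshot of readings rather than weeks of data, it measures the error as a mean *absolute* error, and it uses a toy one-dimensional coordinate (`x_km`) instead of real latitudes and longitudes. The function names and data are hypothetical.

```python
def leave_one_out_mae(stations, estimate):
    """Mean absolute error when each station in turn plays the mock turbine.

    `estimate(target, others)` must return a wind speed for `target`
    using only the readings in `others`.
    """
    errors = []
    for i, target in enumerate(stations):
        others = stations[:i] + stations[i + 1:]  # hold the mock turbine out
        errors.append(abs(estimate(target, others) - target["wind_ms"]))
    return sum(errors) / len(errors)

def nearest_estimate(target, others):
    """Nearest-station model on a toy 1-D coordinate (assumption)."""
    nearest = min(others, key=lambda s: abs(s["x_km"] - target["x_km"]))
    return nearest["wind_ms"]

# Made-up readings, for illustration only.
stations = [
    {"x_km": 0.0, "wind_ms": 5.0},
    {"x_km": 10.0, "wind_ms": 7.0},
    {"x_km": 25.0, "wind_ms": 6.0},
]

mae = leave_one_out_mae(stations, nearest_estimate)
print(round(mae, 2))
```

Passing the model in as a function keeps the evaluation loop unchanged when the estimator is swapped for a different one.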

Our metric for evaluation will be the mean error of wind speed in meters per second (m/s). There might be better suited metrics for evaluation, but we’ll leave that subject for another time. When running this evaluation we find that our model has a mean error of **1.85 m/s**.

Could we improve this somehow? What if we included more known values? We can expand our model to take the mean of the *k* nearest observations^{5} and then evaluate it for different values of *k*. Furthermore, we can exchange the simple average for a weighted average based on the distance to our mock turbine. The weight of each observation is calculated as the inverse of its distance, giving the nearer observations a greater impact.
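As a minimal sketch of the two averaging schemes, the function below takes observations as `(distance_km, wind_ms)` pairs and returns either the simple or the inverse-distance weighted mean of the *k* nearest ones. The function name, the input layout and the numbers are assumptions for illustration, and all distances are assumed to be strictly positive so that the inverse is defined.

```python
def knn_wind_estimate(observations, k, weighted=True):
    """Estimate wind speed from the k nearest (distance_km, wind_ms) pairs."""
    nearest = sorted(observations, key=lambda obs: obs[0])[:k]
    if not weighted:
        return sum(speed for _, speed in nearest) / len(nearest)
    # Inverse-distance weights: nearer observations count for more.
    # Assumes every distance is > 0.
    weights = [1.0 / dist for dist, _ in nearest]
    weighted_sum = sum(w * speed for w, (_, speed) in zip(weights, nearest))
    return weighted_sum / sum(weights)

# Made-up observations, for illustration only.
observations = [(40.0, 12.0), (10.0, 6.0), (20.0, 8.0)]

print(knn_wind_estimate(observations, k=2, weighted=False))
print(knn_wind_estimate(observations, k=2, weighted=True))
```

With these numbers the weighted estimate lands closer to the reading of the nearest observation than the simple mean does, which is the intended effect of the weighting.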

Figure 2 shows the mean errors for k in the range [1,50] calculated using both a simple and weighted mean.

As we can see, the weighted mean consistently yields lower errors than the simple mean, as we would expect. The lowest average error of **1.5 m/s** is found in the *weighted mean model using the six nearest observations*. Although this is a reduction in error of ~19% from our initial model, there is still a lot of room for improvement, which could be an indication that our model of just looking at distance to observations might be overly simplistic.
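For reference, the ~19% figure follows directly from the two mean errors reported above (1.85 m/s for the nearest-station model and 1.5 m/s for the weighted six-neighbor model):

$$
\frac{1.85 - 1.5}{1.85} = \frac{0.35}{1.85} \approx 0.19
$$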

If we click the individual observation points in Figure 1, we see both the latest weather data for each observation point, and the results of running the evaluation with that specific weather station as the mock turbine.

By clicking around a bit (try it!), we can sense some patterns in how well the models perform corresponding to the geographical context of the weather station. For example, observations near the coast do not seem to correlate that well to inland observations, even though they might be located relatively close to each other.

But grouping observations into “coast” or “inland” might not be sufficient, as we are not taking the wind direction into account. Presumably the western coast is windier than the eastern coast if the wind is coming from the west. And the topography! Hills and valleys and trees and buildings all affect the wind as well. Suddenly the complexity greatly increases, and our simple estimation of wind speeds is not so simple anymore.

We are not interested in developing a full meteorological model here, but it is always important to be aware of the limitations and simplifications of our model, so they can be taken into account when applying it.

### Conclusions

In this post we have evaluated a couple of models for estimating wind speeds at a given location based on known observations at other locations. We have observations from a series of weather stations spread throughout Denmark, and we evaluate our model by using one of these weather stations as a “mock turbine”, estimating the wind speed at this location from the remaining stations, and then iterating over the stations.

We found that the best overall accuracy was achieved when using a weighted average (with the inverse of the distance as weight) of the nearest 6 neighbors. The average error of this model was 1.5 m/s.

When plotting our different weather stations and their individual results on a map, we see a significant difference in the accuracies, which indicates that our method of only using distance as a measure of relevance is too simplistic. Looking at the data on a map allows us to see similarities between locations near a coast and inland, and between the different coasts, which might improve our model performance if included. These insights might have been hard to extract if only looking at the data in a spreadsheet.

So, we have evaluated a model and subsequently found some shortcomings in it, but where does that leave us? Can we still use it? Well, that depends on the application. Our model might still work fine if we just want to know if it is “windy” or not, and not necessarily exactly how windy. The same goes if we are only interested in locations where the nearest weather stations are in a similar context. It all depends.

1. Derived from this page on the DMI website.
2. Calculating distance using the haversine formula.
3. DMI also provides data from a bunch of stations in Greenland and in the North Sea, but we’ll leave them out for this experiment.
4. Available as JSON here.
5. Not to be confused with the KNN algorithm for classification and regression that finds the nearest neighbors in a vector space.