At the end of the last post, we introduced our prediction widget, a playground for getting a feel for how predictions work. It is a graphing tool that shows how different prediction models behave on a set of real samples.

It shows two linear models:

  • One Baume per week (or OPW), which is a useful rule of thumb;
  • X Baume per week (or XPW), which uses a line of best fit on the set of samples (aka linear regression).
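To make the two linear models concrete, here is a minimal Python sketch. It assumes that OPW extrapolates from the latest sample at exactly one Baume per week and that XPW fits an ordinary least-squares line through all samples. The function names and the three samples are hypothetical; they are not taken from the widget.

```python
from datetime import date, timedelta

def opw_harvest_date(samples, target):
    """One Baume per week: extrapolate from the latest sample at 1 Baume per 7 days."""
    last_day, last_baume = max(samples)  # tuples sort by date first
    days_needed = (target - last_baume) * 7.0
    return last_day + timedelta(days=round(days_needed))

def xpw_harvest_date(samples, target):
    """X Baume per week: least-squares line through all samples, solved for target."""
    origin = min(day for day, _ in samples)
    xs = [(day - origin).days for day, _ in samples]
    ys = [baume for _, baume in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # Baume gained per day
    intercept = mean_y - slope * mean_x    # fitted Baume on the origin day
    days_needed = (target - intercept) / slope
    return origin + timedelta(days=round(days_needed))

# Hypothetical weekly samples (date, Baume) for a block harvested at 13 Baume.
samples = [
    (date(2014, 2, 10), 8.0),
    (date(2014, 2, 17), 8.6),
    (date(2014, 2, 24), 9.4),
]
opw_day = opw_harvest_date(samples, target=13.0)
xpw_day = xpw_harvest_date(samples, target=13.0)
print("OPW:", opw_day, "XPW:", xpw_day)
```

In this made-up set the block gains about 0.7 Baume per week, so XPW lands later than OPW. With a faster-ripening block the order reverses.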

In a previous post, we discussed the linearity bias inherent in these models, which is important for vintage intake planning.

It also shows two non-linear models:

  • The Allawah model, developed in Allawah, NSW by Thoughtpool;
  • The Clayton model, developed in Clayton, VIC by CSIRO.

We provide illustrations further down this page; but first a quick word about these models.

The Allawah model gives very good results when precise and abundant sampling records are available for a block. In regular weather conditions it predicts harvest Baume more than three weeks in advance. The widget uses a sample set where the prediction is within 0.2 Baume 44 days in advance and does not change in the interval (we call that the “predictive horizon”). However, its predictive quality degrades for “low doc” (data-light) blocks, where it provides no particular advantage over OPW.
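One way to measure a predictive horizon is sketched below in Python. It is only one possible reading of the definition: the horizon is taken as the number of days before harvest from which every prediction stayed within a tolerance of the Baume actually reached. The function name and the history of predictions are hypothetical; only the 0.2 Baume tolerance, the 44 days and the harvest date of the 9th of April come from this post, and the year is arbitrary.

```python
from datetime import date

def predictive_horizon(history, actual_baume, harvest_day, tolerance=0.2):
    """Days before harvest from which every prediction stayed within tolerance.

    history: list of (day the prediction was made, predicted harvest Baume).
    Returns 0 if the latest prediction is itself outside the tolerance.
    """
    horizon_start = None
    for day, predicted in sorted(history):
        if abs(predicted - actual_baume) <= tolerance:
            if horizon_start is None:
                horizon_start = day   # start of an unbroken run of good predictions
        else:
            horizon_start = None      # a bad prediction resets the run
    if horizon_start is None:
        return 0
    return (harvest_day - horizon_start).days

# Made-up prediction history for a block that reached 13.0 Baume at harvest.
history = [
    (date(2014, 2, 10), 12.2),
    (date(2014, 2, 24), 13.1),
    (date(2014, 3, 10), 12.9),
    (date(2014, 3, 24), 13.1),
]
horizon = predictive_horizon(history, actual_baume=13.0, harvest_day=date(2014, 4, 9))
print(horizon, "days")
```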

The Clayton model gives sensible results even in “low doc” situations, which we’re pretty happy about, given this will be the starting point for many customers. The widget currently uses the Clayton-Variety model, which only uses the variety and the samples; it is essentially a starting point because not all blocks have ten years of historical weekly sampling records. It predicts low for late harvests (from late March onwards), so the default sample set does not show how useful Clayton really is.

Please note: both models are the result of “machine learning” processes and, a priori, have limited bias and a quantifiable tolerance at any point of the ripening curve (typically between 0.25 and 0.7 Baume). “Machine learning” is an appropriate name for what is at work here: at the beginning the system is perfectly ignorant, and it discovers the data and its trends as it collects block, variety and sample information.

So far, the machine has learnt from (give or take) 11,500 sample sets, from vintage 2000 onwards, for about twenty varieties and twenty Australian wine regions and 1,150 blocks. This is about 70,000 samples. This means that if the system doesn’t know much about your grape or your region (or their combination), it will attempt a “best guess” that can sometimes be less than perfect. Obviously, the more people using the system, the faster it learns.

 

A worked example

Let’s see how all models discover the harvest date. For information, we’re looking at a block that is normally harvested at 13 Baume.

On the 24th of February, our models predict this way (see Graph 1):

  • OPW predicts 13 Baume on the 22nd of March
  • XPW predicts 13 Baume on the 14th of March
  • Allawah predicts 13 Baume on the 5th of April. Later samples show 12.7 on the 4th of April and 13.4 on the 8th (see Graph 4). If we interpolate the value on the 5th, it would be about 12.9. Allawah already predicts the harvest date and will stick to it until harvest.
  • Clayton predicts a peak of 12.5 Baume on the 25th of April. We said earlier that Clayton predicts too low for late harvests; this is an example of it. That said, you can see that at this early stage its ripening curve is much better than OPW or XPW.
Graph 1: Predictions after two samples
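The estimate of about 12.9 Baume for the 5th of April in the Allawah bullet above can be reproduced with a straight-line interpolation between the two neighbouring samples. This is a minimal Python sketch; the function name is hypothetical and the year is arbitrary, since only the day differences matter.

```python
from datetime import date

def interpolate_baume(day, before, after):
    """Linear interpolation between two (date, Baume) samples."""
    (d0, b0), (d1, b1) = before, after
    fraction = (day - d0).days / (d1 - d0).days
    return b0 + fraction * (b1 - b0)

# Samples quoted in the text: 12.7 on the 4th of April, 13.4 on the 8th.
value = interpolate_baume(
    date(2014, 4, 5),
    before=(date(2014, 4, 4), 12.7),
    after=(date(2014, 4, 8), 13.4),
)
print(round(value, 1))
```

One day is a quarter of the four-day gap, so the estimate is 12.7 plus a quarter of 0.7, which is 12.875.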

 

A few days before the XPW predicted date, on the 10th of March our models predict this way (Graph 2):

  • OPW predicts 13 Baume on the 23rd of March. This is a change of one day, still 14 days too early.
  • XPW predicts 13 Baume on the 19th of March. This is a change of five days, still 18 days too early.
  • Allawah still predicts 13 Baume on the 5th of April
  • Clayton predicts a peak of 12.6 Baume on the 15th of April. Clayton is still “closer to the truth” than OPW and XPW.
Graph 2: Predictions after six samples
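The day counts in the OPW and XPW bullets above are plain date arithmetic, sketched here in Python. The reference date of the 6th of April is an assumption: the post does not state it, but it is the date that reproduces the 14-day and 18-day gaps quoted above. The year is arbitrary and the function name is hypothetical.

```python
from datetime import date

YEAR = 2014                    # assumed; only day differences matter
REFERENCE = date(YEAR, 4, 6)   # assumed date at which the block reaches 13 Baume

# (prediction made on the 24th of February, prediction made on the 10th of March)
predictions = {
    "OPW": (date(YEAR, 3, 22), date(YEAR, 3, 23)),
    "XPW": (date(YEAR, 3, 14), date(YEAR, 3, 19)),
}

def shift_and_gap(previous, current, reference):
    """Days the prediction moved, and days it still falls short of the reference."""
    return (current - previous).days, (reference - current).days

results = {
    name: shift_and_gap(previous, current, REFERENCE)
    for name, (previous, current) in predictions.items()
}
for name, (shift, gap) in results.items():
    print(f"{name}: moved {shift} day(s), still {gap} days too early")
```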

 

One day before the XPW predicted date, on the 18th of March our models predict this way (Graph 3):

  • OPW predicts 13 Baume on the 30th of March. This is a change of seven days, still 7 days too early.
  • XPW predicts 13 Baume on the 26th of March. This is a change of seven days, still 10 days too early.
  • Allawah still predicts 13 Baume on the 5th of April
  • Clayton predicts a peak of 12.4 Baume on the 30th of April. Clayton is still “closer to the truth” than XPW and is comparable in precision with OPW.
Graph 3: Predictions after eight samples

 

One day after OPW’s predicted date, on the 31st of March with a few more samples our models predict this way (Graph 4):

  • OPW predicts 13 Baume on the 6th of April. OPW is now “correct” with 6 days to go.
  • XPW predicts 13 Baume on the 4th of April. XPW is now “correct” with 6 days to go.
  • Allawah still predicts 13 Baume on the 5th of April
  • Clayton predicts a peak of 12.4 Baume on the 18th of April. Clayton’s “low prediction for late harvest” now makes it the least precise. Its “variety plus location” variant (release date mid-December 2014) will address this issue.
Graph 4: Predictions after ten samples

 

Eventually, on the harvest date (here the 9th of April), the graphs are just reporting what happened. With all their per-sample adjustments, all models now look roughly equivalent from a distance. The continuation of the plots is the only indication of how the models interpret the ripening phenomenon.

Graph 5: Predictions on delivery

Why is all this important?

While it’s easy to get carried away with the precision implied by the maths, the exercise is about giving vintage planners a useful predictive horizon, rather than trying to make the decisions about when to pick. If planners have a sufficiently long horizon that can be trusted, they can plan crushing and storage capacity, harvesters, transport, spraying and assessments (both technical and sensory) weeks in advance. With no disruptions, this means an efficient queue of grapes arriving for crushing. Even with disruptions (e.g. an adverse weather event), the adjustments required are easier to cope with.

The end result is better protection of the potential value of the grapes selected for harvest, and a more efficient harvest.