The prediction strategy we use depends on what data we already have about the block or vineyard of interest. There are three possibilities we deal with:
  • We have no prior data related to the variety grown on the block.
  • We have no prior data related to this particular block, but we do know about similar blocks.
  • We have a reasonable set of data about the variety and the region.
No history
Some blocks have no historical data to learn from: the variety may be “new” to the region in question, or even relatively rare in the country. In this case, we currently use the common rule of thumb of One Baume Per Week (OPW for short). In statistical terms this is a poor prediction model: its residuals are not normally distributed, which indicates a modelling bias (in other words, the model does not “really fit” the data). The rule becomes more useful as harvest approaches, but by then the planning horizon is short.
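As a rough illustration, the OPW rule amounts to a one-line projection from the latest sample. The sketch below is not our production code: the function name `opw_harvest_date`, its parameters and the target Baume used in the example are all assumptions made for the illustration.

```python
from datetime import date, timedelta


def opw_harvest_date(sample_date, current_baume, target_baume, baume_per_week=1.0):
    """Project a harvest date assuming a constant ripening speed.

    With the default of 1.0 Baume per week this is the OPW rule of thumb.
    All names here are hypothetical, chosen for the example only.
    """
    if baume_per_week <= 0:
        raise ValueError("baume_per_week must be positive")
    remaining = target_baume - current_baume
    if remaining <= 0:
        # The block is already at (or past) the target ripeness.
        return sample_date
    days_to_go = remaining / baume_per_week * 7
    return sample_date + timedelta(days=round(days_to_go))


# Example: a sample of 10.0 Baume on 1 February, aiming for 13.0 Baume.
print(opw_harvest_date(date(2024, 2, 1), 10.0, 13.0))
```

Because the speed is fixed at one Baume per week regardless of variety, region or season, any error in that assumption grows with the distance between the current and target Baume, which is why the rule is only dependable close to harvest.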

When there is only similar historical data to learn from
Some blocks have no historical data of their own, but there are many similar neighbouring blocks to draw on. In this case, we use regional/varietal linear regressions to estimate the ripening speed, expressed as X Baume Per Week (XPW for short). Using the samples in our possession, we calculated speeds ranging from 0.46 Baume per week (Cabernet Franc in Margaret River) to 1.45 Baume per week (Semillon in Clare Valley). Sadras and Petrie conducted similar research resulting in a list of 9 region/variety speeds (“Predicting the time course of grape ripening”, Australian Journal of Grape and Wine Research, Volume 18, Issue 1, pages 48–56, February 2012; registration required).
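To make the idea concrete, here is a minimal sketch of how a ripening speed can be estimated from samples with an ordinary least-squares line. It is an illustration only: the function name `fit_ripening_speed`, the input layout (days since the first sample, Baume reading) and the sample values are assumptions, and it ignores how samples from different blocks would be aligned before pooling.

```python
def fit_ripening_speed(samples):
    """Fit baume = intercept + slope * day by ordinary least squares.

    `samples` is a list of (days_since_first_sample, baume) pairs.
    Returns (baume_per_week, intercept). Names are hypothetical.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must cover more than one day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope_per_day = sxy / sxx
    intercept = mean_y - slope_per_day * mean_x
    # Convert the daily slope to the "X Baume Per Week" form.
    return slope_per_day * 7, intercept


# Made-up readings taken a week apart, for illustration only.
speed, start = fit_ripening_speed([(0, 8.0), (7, 8.5), (14, 9.0)])
print(round(speed, 2), round(start, 2))
```

The resulting speed can then stand in for the fixed 1.0 of the OPW rule when projecting a harvest date for a block that has no history of its own.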

This is still a sub-optimal prediction model: it has a linearity bias, which again shows up as residuals that are not normally distributed. It is nonetheless better than OPW, because XPW captures the ripening similarities between blocks and draws on a large number of samples, which helps cancel out noise. We have systematised this approach and now have a list of 140 regional/varietal speeds. The list grows as samples for more regions and varieties arrive. While it’s an improvement, it is no panacea: in addition to the linearity bias, it also suffers from a ‘constant weather’ bias. The data in our possession shows the same block can be harvested at its “historical date” give or take 5 weeks, which again shortens the planning horizon.
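One simple way to see a linearity bias is to look at the residuals of a straight-line fit: if the underlying curve is not a line, the residuals come in runs of the same sign instead of scattering randomly around zero. The sketch below uses made-up readings that slow down over time purely to illustrate the effect; the data and the function names are assumptions, not measurements of ours.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(xs, ys):
    """Observed minus fitted values for the least-squares line."""
    slope, intercept = linear_fit(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


weeks = [0, 1, 2, 3, 4]
baume = [6.0, 8.0, 9.5, 10.5, 11.0]  # hypothetical, curved on purpose

print([round(r, 2) for r in residuals(weeks, baume)])
# prints [-0.5, 0.25, 0.5, 0.25, -0.5]
```

The line overshoots at both ends and undershoots in the middle. A pattern like this, rather than random scatter, is the signature of a model whose shape does not match the data.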

In the next post, we discuss a couple of non-linear prediction methods, one of which is currently being adapted to provide a more useful way of dealing with the above “data-light” situations. We’ll update you when that is available.

For the stats buffs among you, Marc has put together an explanation of some of the issues with linear models, and how we correct statistical errors here.