- We have no prior data related to the variety grown on the block
- We have no prior data related to THIS block, but we do know about similar blocks
- We have a reasonable set of data about the variety and the region
When there is only similar historical data to learn from
Some blocks have no historical data of their own, but have many similar neighbouring blocks to learn from. In this case, we use regional/varietal linear regressions to estimate the ripening speed, expressed as the number X of Baume the grape gains per week (XPW for short). Using the samples in our possession, we calculated speeds ranging from 0.46 Baume per week (Cabernet Franc in Margaret River) to 1.45 Baume per week (Semillon in Clare Valley). Sadras and Petrie conducted similar research, producing a list of nine region/variety speeds (“Predicting the time course of grape ripening”, Australian Journal of Grape and Wine Research, Volume 18, Issue 1, pages 48–56, February 2012; registration required).
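The regression step can be sketched in Python. This is a minimal illustration, not our production code, and it rests on stated assumptions: each sample is a (region, variety, block, date, Baume) record, and blocks in the same region/variety group share one ripening speed but may start from different Baume levels, so each block gets its own intercept before the slope is pooled. The function names `xpw_by_group` and `weeks_to_target` and the example readings are hypothetical.

```python
from collections import defaultdict
from datetime import date


def xpw_by_group(samples):
    """Estimate one ripening speed (Baume per week) per (region, variety) group.

    `samples` is an iterable of (region, variety, block, sample_date, baume)
    tuples. Assumption: blocks in a group share one speed but may start at
    different Baume levels, so readings are centred within each block
    (a separate intercept per block) before the slope is pooled.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for region, variety, block, sample_date, baume in samples:
        weeks = sample_date.toordinal() / 7.0  # time axis measured in weeks
        groups[(region, variety)][block].append((weeks, baume))

    speeds = {}
    for key, blocks in groups.items():
        sxy = sxx = 0.0
        for points in blocks.values():
            mean_x = sum(x for x, _ in points) / len(points)
            mean_y = sum(y for _, y in points) / len(points)
            sxy += sum((x - mean_x) * (y - mean_y) for x, y in points)
            sxx += sum((x - mean_x) ** 2 for x, _ in points)
        if sxx > 0:  # at least one block needs two or more sampling dates
            speeds[key] = sxy / sxx  # least-squares slope, Baume per week
    return speeds


def weeks_to_target(current_baume, target_baume, xpw):
    """Linear projection of the weeks left until the block reaches target_baume."""
    return (target_baume - current_baume) / xpw


if __name__ == "__main__":
    # Illustrative readings only (hypothetical region, variety and blocks).
    readings = [
        ("Region X", "Variety Y", "A", date(2024, 2, 1), 10.0),
        ("Region X", "Variety Y", "A", date(2024, 2, 15), 12.2),
        ("Region X", "Variety Y", "B", date(2024, 2, 8), 11.0),
        ("Region X", "Variety Y", "B", date(2024, 2, 22), 12.8),
    ]
    xpw = xpw_by_group(readings)[("Region X", "Variety Y")]
    print(round(xpw, 3))  # prints 1.0
    print(round(weeks_to_target(12.0, 13.5, xpw), 1))  # prints 1.5
```

Giving each block its own intercept keeps differences in starting sugar levels from leaking into the shared slope; a single pooled regression over all readings would not. The straight-line projection in `weeks_to_target` is exactly where the linearity and constant-weather assumptions enter.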
This is still a sub-optimal prediction model: it has a linearity bias, which shows up as residuals whose variances are not normally distributed. It is nonetheless an improvement, because XPW captures the ripening similarities between blocks and draws on a large number of samples, which cancels out more of the noise. We have systematised this approach and now have a list of 140 regional/varietal speeds, which grows as samples for more regions and varieties arrive. Still, it is no panacea: in addition to the linearity bias, it suffers from a ‘constant weather’ bias, since a single speed implicitly assumes every season's weather, and hence its ripening rate, is the same. The data in our possession shows the same block can be harvested at a “historical date” give or take 5 weeks, which again shortens the planning horizon.
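Keeping a list of speeds up to date as new samples arrive does not have to mean refitting from scratch. One possible approach, sketched below and not necessarily how our system does it, keeps running sums per block and recomputes each group's slope on demand. It assumes blocks in a region/variety group share one speed but have their own intercepts, and that `week` is measured from a fixed reference date early in the season (an assumption that keeps the sums small and limits floating-point cancellation). The class `XpwTable` and its methods are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BlockSums:
    """Running sums of one block's (week, Baume) readings."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0


class XpwTable:
    """Hypothetical self-updating table of (region, variety) ripening speeds.

    Assumes blocks in a group share one speed (Baume per week) but have their
    own intercepts. `week` should be measured from a fixed date early in the
    season so the running sums stay small.
    """

    def __init__(self):
        # (region, variety) -> block id -> BlockSums
        self._sums = defaultdict(lambda: defaultdict(BlockSums))

    def add_sample(self, region, variety, block, week, baume):
        """Fold one new reading into the running sums; no refit needed."""
        s = self._sums[(region, variety)][block]
        s.n += 1
        s.sx += week
        s.sy += baume
        s.sxx += week * week
        s.sxy += week * baume

    def speed(self, region, variety):
        """Pooled within-block slope, or None if it cannot be estimated yet."""
        num = den = 0.0
        for s in self._sums.get((region, variety), {}).values():
            if s.n < 2:
                continue  # a single reading says nothing about speed
            # Centred sums for this block: sum((x - mean_x) * (y - mean_y)) etc.
            num += s.sxy - s.sx * s.sy / s.n
            den += s.sxx - s.sx * s.sx / s.n
        return num / den if den > 1e-12 else None
```

Each reading touches only five numbers, so a new region or variety simply appears in the table the first time its samples arrive, and its speed sharpens as more readings come in.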
In the next post, we discuss a couple of non-linear prediction methods, one of which is currently being adapted to provide a more useful way of dealing with the above “data-light” situations. We’ll update you when that is available.
For the stats buffs among you, Marc has put together an explanation of some of the issues with linear models, and how we correct statistical errors here.