The Weibull distribution is used to extrapolate from available data to gain an insight into the possible magnitude of rare events.
ISO standards provide it as an example (“…an appropriate extreme distribution such as Gumbel or Weibull…”), whilst DNV recommended practices are a little more specific (“For Peak over threshold (POT) and storm statistics analysis, a 2-parameter Weibull distribution or an exponential distribution is recommended”).
Whatever the standards and guidance documents may say, time and again I’ve heard Weibull referred to as the de facto probability distribution to use for extrapolation to rare events in metocean science. The question is, why? Jamie Hernon, a metocean specialist at ABPmer, took a dive into the statistical theory behind extreme value analysis to find out.
Selection of the appropriate probability model to use in describing low-probability events requires a good understanding of the preceding process in extreme value analysis – choosing what data you’re going to fit the model to.
Let’s imagine that we have a dataset containing 30 years’ worth of hourly wave height data from a hindcast, equating to around 263,000 observations. How should we select the data to fit our extrapolation model to? What we want is the ‘top-end tail’ of the data, i.e. the ‘extreme’ part of the dataset. Here are a few possible choices:
• Take annual maxima? What if there are five events in one year that are all bigger than the biggest event in the following year? Four of those events get wasted. Therefore ABPmer would generally reject this;
• Block maxima? This involves grouping the data into blocks of a particular length, for instance, months, and taking the highest value from each block. It’s better than annual maxima, but suffers from the same problem of being wasteful of data, where a few extreme events might occur in the same block;
• R-largest? This is similar to the above, but takes, for example, the top five events in each block. This would be called ‘5 largest’ selection. Not bad, but careful consideration would be required to select an r value that captures the necessary events, whilst rejecting those which fall outside the ‘extreme’ bracket.
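To make the block maxima and r-largest ideas concrete, here is a minimal sketch in Python. The data are synthetic stand-ins for an hourly wave-height hindcast, and the 730-hour block length (roughly one month) is an illustrative assumption, not a recommendation:

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic stand-in for 30 years of hourly significant wave height (metres).
hs = rng.weibull(1.5, size=30 * 365 * 24) * 2.0

# Group into fixed blocks of 730 hours (roughly one month each).
block_len = 730
n_blocks = hs.size // block_len
blocks = hs[: n_blocks * block_len].reshape(n_blocks, block_len)

# Block maxima: keep only the single largest value from each block.
block_maxima = blocks.max(axis=1)

# r-largest (here r = 5): keep the top five values from each block.
r = 5
r_largest = np.sort(blocks, axis=1)[:, -r:]
```

Note how much data is discarded either way: from 262,800 hourly observations, block maxima retains 360 values and ‘5 largest’ retains 1,800.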
Historically, when extreme value methods were in their infancy, the three methods above were favoured because computer memory limitations would have meant that dealing efficiently with 263,000 observations would have been tricky.
These days, with modern computing power, we can do better than this, and make use of all our data by using threshold excesses: pick a threshold, treat every event over that threshold as ‘extreme’, and model those.
Of course, this introduces further questions around what threshold to select and declustering, which are complicated enough topics to warrant an article in their own right. But, in terms of making best use of all the data available to us, the threshold-excess approach is the clear winner in nearly all metocean applications these days.
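As a rough illustration of what threshold selection and declustering look like in practice, the sketch below picks exceedances over the 95th percentile of a synthetic series, then applies a simple runs declustering. Both the percentile threshold and the 48-hour storm-independence window are assumptions chosen for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for an hourly wave-height series (metres).
hs = rng.weibull(1.5, size=100_000) * 2.0

# Illustrative threshold: the 95th percentile of the series.
u = np.quantile(hs, 0.95)
exceed_idx = np.flatnonzero(hs > u)

# Runs declustering: exceedances separated by fewer than 48 hours are
# treated as the same storm; keep only the peak of each cluster.
gaps = np.diff(exceed_idx) > 48
cluster_starts = np.r_[0, np.flatnonzero(gaps) + 1]
cluster_ends = np.r_[cluster_starts[1:], exceed_idx.size]
peaks = np.array(
    [hs[exceed_idx[s:e]].max() for s, e in zip(cluster_starts, cluster_ends)]
)
```

The declustered peaks are then the dataset taken forward to model fitting, on the grounds that separate storms can be treated as independent events.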
Once we have identified the data we will use, then it comes down to selecting which model family to use for the analysis. There are two possible choices here:
• Asymptotic models (the Generalised Extreme Value family of distributions, of which Weibull is one), appropriate for block maxima-type data, or;
• Threshold models (i.e. the Generalised Pareto Distribution), appropriate for modelling threshold excesses.
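Fitting a threshold model is straightforward with standard tools. This sketch fits a Generalised Pareto Distribution to excesses over an illustrative 95th-percentile threshold using `scipy.stats.genpareto`, and converts the fit into a 100-year return level; in a real analysis the exceedances would be declustered first, and the threshold chosen with care:

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(1)
# Synthetic stand-in for an hourly wave-height series (metres).
hs = rng.weibull(1.5, size=100_000) * 2.0

# Excesses over an illustrative threshold (the 95th percentile).
u = np.quantile(hs, 0.95)
excesses = hs[hs > u] - u

# Fit the GPD to the excesses; location is fixed at zero by construction.
shape, loc, scale = genpareto.fit(excesses, floc=0)

# 100-year return level: with lam exceedances per year on average, the
# 100-year event is the quantile exceeded once per 100 * lam events.
n_years = hs.size / (365.25 * 24)
lam = excesses.size / n_years
rl_100 = u + genpareto.ppf(1 - 1 / (100 * lam), shape, loc=0, scale=scale)
```

The same pipeline — threshold, excesses, GPD fit, return level — is the core of a peaks-over-threshold extreme value analysis.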
As we have enough computing power to be using threshold excesses as our dataset for extreme value modelling, I think it’s time to say goodbye to the Weibull Distribution, and hello to the Generalised Pareto Distribution, in metocean science.