using Pkg
Pkg.add("Extremes")
Pkg.add("Dates")
Pkg.add("DataFrames")
Pkg.add("Distributions")
Pkg.add("Gadfly")
using Extremes, Dates, DataFrames, Distributions, Gadfly
# Derived from blockMaxima.md
## Load the data
Loading the annual maximum sea-levels at Port Pirie:
```@example portpirie
data = Extremes.dataset("portpirie")
first(data,5)
```
The annual maxima can be shown as a function of the year with the Gadfly package:
```@example portpirie
set_default_plot_size(12cm, 8cm)
plot(data, x=:Year, y=:SeaLevel, Geom.line)
```
## Maximum likelihood inference
### GEV parameter estimation
The GEV parameter estimation with maximum likelihood is performed with the
[`gevfit`](@ref) function:
```@repl portpirie
fm = gevfit(data, :SeaLevel)
```
The approximate covariance matrix of the parameter estimates can be obtained
with the [`parametervar`](@ref) function:
```@repl portpirie
parametervar(fm)
```
Confidence intervals for the parameters are obtained with the [`cint`](@ref)
function:
```@repl portpirie
cint(fm, 0.95)
```
For instance, the shape parameter 95% confidence interval is as follows:
```@repl portpirie
cint(fm, 0.95)[3]
```
### Diagnostic plots
Several diagnostic plots for assessing the accuracy of the GEV model fitted to
the Port Pirie data are can be shown with the [`diagnosticplots`](@ref) function:
```@example portpirie
set_default_plot_size(21cm ,16cm)
diagnosticplots(fm)
```
The diagnostic plots consist in the probability plot (upper left panel), the
quantile plot (upper right panel), the density plot (lower left panel) and the
return level plot (lower right panel). These plots can be displayed separately
using respectively the [`probplot`](@ref), [`qqplot`](@ref), [`histplot`](@ref)
and [`returnlevelplot`](@ref) functions.
### Return level estimation
*T*-year return level estimate can be obtained using the [`returnlevel`](@ref)
function. For example, the 100-year return level for the Port Pirie block maxima
model is computed as follows:
```@repl portpirie
r = returnlevel(fm, 100)
```
The return level can be accessed as follows:
```@repl portpirie
r.value
```
The corresponding confidence interval can be computed with the [`cint`](@ref)
function:
```@repl portpirie
c = cint(r, 0.95)
```
To get the scalar return level in the stationary case, the following command can
be used:
```@repl portpirie
r.value[]
```
To get the scalar confidence interval in the stationary case, the following
command can be used:
```@repl portpirie
c[]
```
# derived from blockmaxima_ns.md
#### The location parameter varying as a linear function of the year
```@repl portpirie
fm₁ = gevfit(data, :SeaLevel, locationcovid = [:Year])
```
### Diagnostic plots
Several diagnostic plots for assessing the accuracy of the GEV model fitted to
the Port Pirie data can be shown with the [`diagnosticplots`](@ref) function:
```@example portpirie
set_default_plot_size(21cm ,16cm)
diagnosticplots(fm₁)
```
### Return level estimation
Since the model parameters vary in time, the quantiles also vary in time.
Therefore, a *T*-year return level can be estimated for each year. This set of
return levels are referred to as *effective return levels* as proposed by
Katz *et al.* (2002).
The 100-year effective return levels for the `fm₁` model can be computed using
the [`returnlevel`](@ref) function:
```@repl portpirie
r = returnlevel(fm₁, 100)
```
The return level vector can be accessed as follows:
```@repl portpirie
r.value
```
The corresponding confidence interval can be computed with the [`cint`](@ref)
function:
```@repl portpirie
c = cint(r, 0.95)
```
The effective return levels along with their confidence intervals can be plotted
as follows:
```@example portpirie
rmin = [c[i][1] for i in eachindex(c)]
rmax = [c[i][2] for i in eachindex(c)]
df = DataFrame(Year = data[:,:Year], r = r.value, rmin = rmin, rmax = rmax)
nothing # hide
```
```@example portpirie
set_default_plot_size(12cm, 9cm)
plot(df, x=:Year, y=:r, ymin=:rmin, ymax=rmax, Geom.line, Geom.ribbon,
Coord.cartesian(xmin=1920, xmax=2000), Guide.xticks(ticks=1920:20:2000),
Guide.ylabel("100-year Effective Return Level"))
```
# derived from TresholdExceedence.md
## Threshold Exceedance Model
The stationary [`ThresholdExceedance`](@ref) model is illustrated using the
daily rainfall accumulations at a location in south-west England from 1914 to
1962. This dataset was studied by Coles (2001) in Chapter 4. The daily rainfall
are assume **independent and identically distributed**.
```@setup rain
using Extremes, Dates, DataFrames, Distributions, Gadfly
```
## Load the data
Loading the daily precipitations:
```@example rain
data = Extremes.dataset("rain")
first(data,5)
```
Plotting the data using the Gadfly package:
```@example rain
set_default_plot_size(14cm ,8cm) # hide
plot(data, x=:Date, y=:Rainfall, Geom.point,
Theme(discrete_highlight_color=c->nothing))
```
## Threshold selection
A suitable threshold for the Peaks-Over-Threshold model can be chosen by
examining the mean residual life plot. The mean residual life is expected to be
a linear function of the threshold when the latter is high enough. The mean
residual life plot can be plotted with the [`mrlplot`](@ref) function:
```@example rain
set_default_plot_size(14cm ,8cm) # hide
mrlplot(data[:,:Rainfall])
```
As concluded by Coles (2001, Chapter 4), a reasonable threshold is 30 *mm*.
```@example rain
threshold = 30.0
nothing # hide
```
## Exceedances extraction
Parameter estimation of the Generalized Pareto distribution in *Extremes.jl* is
performed using the threshold exceedances previously extracted. The support of
the exceedances given in the fit function is therefore ``(0,∞)``.
For the *Rainfall* example, let's extract the threshold exceedances.
Identify first the threshold exceedances:
```@example rain
df_exceedance = filter(row -> row.Rainfall > threshold, data)
first(df_exceedance, 5)
```
Retrieve the exceedances above the threshold:
```@example rain
df_exceedance.Exceedance = df_exceedance.Rainfall .- threshold
select!(df_exceedance, :Date, :Exceedance)
first(df_exceedance, 5)
```
Alternative, as in TresholdExceedence.md:
```@example rain
df = filter(row -> row.Rainfall > threshold, data)
df[!,:Rainfall] = df[!,:Rainfall] .- threshold
rename!(df, :Rainfall => :Exceedance)
first(df, 5)
```
## Bayesian Inference
Most functions described in the previous sections also work in the Bayesian
context. To reproduce exactly the results, the seed should be fixed as follows:
```@example rain
import Random
Random.seed!(4786)
nothing #hide
```
### GP parameter estimation
The Bayesian GP parameter estimation is performed with the [`gpfitbayes`](@ref)
function:
```@repl rain
fm = gpfitbayes(df_exceedance, :Exceedance)
```
The MCMC sample of the scale and shape parameters can be extracted respectively
with the functions Extremes.scale and shape. The marginal chains for those
parameters can be plotted as follows:
```@example rain
set_default_plot_size(12cm ,10cm) #hide
p1 = plot(y = Extremes.scale(fm), Geom.line,
Guide.xlabel("Iteration"), Guide.ylabel("σ"))
p2 = plot(y = Extremes.shape(fm), Geom.line,
Guide.xlabel("Iteration"), Guide.ylabel("ξ"))
vstack(p1, p2)
```
Several diagnostic plots for assessing the accuracy of the fitted GP distribution
to the rainfall data are can be shown with the [`diagnosticplots`](@ref) function:
```@example rain
set_default_plot_size(21cm ,16cm)
diagnosticplots(fm)
```
The empirical covariance matrix of the parameters can be obtained with
[`parametervar`](@ref) as follows:
```@repl rain
parametervar(fm)
```
Credible intervals for the parameters are obtained with the [`cint`](@ref)
function:
```@repl rain
cint(fm, 0.95)
```
# derived from thresholdexceedance_ns.md
```@setup rainfall
using Extremes, DataFrames, Dates, Distributions, Gadfly
```
### Load the data
Loading the daily rainfall accumulations:
```@example rainfall
data = Extremes.dataset("rain")
first(data,5)
```
Extract the exceedances over the threshold of 30 *mm*:
```@example rainfall
threshold = 30.0
df = filter(row -> row.Rainfall > threshold, data)
df[!,:Exceedance] = df[:,:Rainfall] .- threshold
df[!,:Year] = year.(df[:,:Date])
set_default_plot_size(12cm, 8cm) # hide
plot(df, x=:Date, y=:Exceedance, Geom.point)
```
### Return level estimation
Since the model parameters vary in time, the quantiles also vary in time.
Therefore, a *T*-year return level can be estimated for each year. This set of
return levels are referred to as *effective return levels* as proposed by
Katz *et al.* (2002).
The 100-year effective return levels for the `fm₁` model can be computed using
the [`returnlevel`](@ref) function:
```@repl rainfall
nobs = size(data,1)
nobsperblock = 365
r = returnlevel(fm, threshold, nobs, nobsperblock, 100)
```
The effective return levels can be accessed as follows:
```@repl rainfall
r.value
```
The corresponding confidence interval can be computed with the [`cint`](@ref)
function:
```@repl rainfall
c = cint(r, 0.95)
```
# derived from Declustering.md
```@example wooster
data = Extremes.dataset("wooster")
set_default_plot_size(12cm, 8cm) # hide
plot(data, x=:Date, y=:Temperature, Geom.point)
```
```@example wooster
df = copy(data)
df[!,:Temperature] = -data[:,:Temperature]
filter!(row -> month(row.Date) ∈ (1,2,11,12), df)
plot(df, x=:Date, y=:Temperature, Geom.point)
```
## The runs method
Two cluster definitions are used in *Extremes.jl*. The first one is based on the
*runs method*: exceedances separated by fewer than ``r`` non-exceedances are
assumed to belong to the same cluster (see Chapter 10, Beirlant *et al.*, 2004).
The parameter ``r`` is generally referred to as the *runlength* parameter. When
``r = 0``, each exceedance forms a separate cluster.
When using the threshold of -10°F and the runlength of 4, 17 clusters are
idenfied:
```@example wooster
threshold = -10.0
cluster = getcluster(df[:,:Temperature], threshold, runlength=4)
nothing # hide
```
The first cluster can be retrieved as follows:
```@example wooster
first(cluster)
```
The GP distribution can be fitted on the cluster maximal exceedances (the cluster
maximum can be computed with the `maximum` method applied to the Cluster type):
```@repl wooster
z = maximum.(cluster)
```
## The two thresholds method
With the two thresholds method, a cluster of threshold exceedances is defined as
a streak of data higher than a second threshold ``u₂`` whose at least one data
is higher than a first threshold ``u₁``, where ``u₁ ≥ u₂``.
When using the ``u₁ = 0°F`` as the first threshold and ``u₂ = -10°F`` as the
second threshold, 11 clusters are idenfied:
```@example wooster
u₁ = -10
u₂ = -15
cluster = getcluster(df[:,:Temperature], u₁, u₂)
nothing # hide
```
The first cluster can be retrieved as follows:
```@example wooster
first(cluster)
```
The GP distribution can be fitted on the cluster maximal exceedances:
```@repl wooster
z = maximum.(cluster)
```
# Derived from blockmaxima_ns.md
### Load the data
Loading the annual maximum sea-levels at Fremantle:
```@example fremantle
data = Extremes.dataset("fremantle")
first(data,5)
```
The annual maxima can be plotted as function of the year:
```@example fremantle
set_default_plot_size(12cm, 8cm) # hide
plot(data, x=:Year, y=:SeaLevel, Geom.line,
Coord.cartesian(xmin=1895, xmax=1990), Guide.xticks(ticks=1895:10:1990))
```
and as function of the Southern Oscillation Index:
```@example fremantle
set_default_plot_size(12cm, 8cm) # hide
plot(data, x=:SOI, y=:SeaLevel, Geom.point)
```
## Maximum likelihood inference
### GEV parameter estimation
The GEV parameter estimation with maximum likelihood is performed with the
[`gevfit`](@ref) function. The parameter estimate vector
``\mathbf{θ̂} = (\mathbf{β̂₁},\, \mathbf{β̂₂},\, \mathbf{β̂₃})^\top`` is contained
in the field `θ̂` of the returned structure.
Several non-stationary model can be fitted.
#### The stationary model
```@repl fremantle
fm₀ = gevfit(data, :SeaLevel)
```
#### The location parameter varying as a linear function of the year
```@repl fremantle
fm₁ = gevfit(data, :SeaLevel, locationcovid = [:Year])
```
#### The location parameter varying as a linear function of the year and the SOI
```@repl fremantle
fm₂ = gevfit(data, :SeaLevel, locationcovid = [:Year, :SOI])
```
#### Both the location and logscale parameters varying as a linear function of
the year and the SOI
```@repl fremantle
fm₃ = gevfit(data, :SeaLevel, locationcovid = [:Year, :SOI],
logscalecovid = [:Year, :SOI])
```
As show by Coles (2001), the best model is the one where the location parameter
varies as a linear function of the year and the SOI, the `fm₂` in the present
section.
The vector of the parameter estimates (location scale and shape) can be extracted
with the function [`params`](@ref):
```@repl fremantle
params(fm₂)
```
The approximate covariance matrix of the parameter estimates for this model can
be obtained with the [`parametervar`](@ref) function:
```@repl fremantle
parametervar(fm₂)
```
Confidence intervals for the parameters are obtained with the [`cint`](@ref)
function:
```@repl fremantle
cint(fm₂, 0.95)
```
### Diagnostic plots
Several diagnostic plots for assessing the accuracy of the GEV model fitted to
the Fremantle data can be shown with the [`diagnosticplots`](@ref) function:
```@example fremantle
set_default_plot_size(21cm ,16cm)
diagnosticplots(fm₂)
```
The diagnostic plots consist in the residual probability plot (upper left panel),
the residual quantile plot (upper right panel) and the residual density plot
(lower left panel) of the standardized data (see Chapter 6 of Coles, 2001).
These plots can be displayed separately using respectively the
[`probplot`](@ref), [`qqplot`](@ref), [`histplot`](@ref) and
[`returnlevelplot`](@ref) functions.
### Return level estimation
Since the model parameters vary in time, the quantiles also vary in time.
Therefore, a *T*-year return level can be estimated for each year. This set of
return levels are referred to as *effective return levels* as proposed by
Katz *et al.* (2002).
The 100-year effective return levels for the `fm₂` model can be computed using
the [`returnlevel`](@ref) function:
```@repl fremantle
r = returnlevel(fm₂, 100)
```
The effective return levels can be accessed as follows:
```@repl fremantle
r.value
```
The corresponding confidence interval can be computed with the [`cint`](@ref)
function:
```@repl fremantle
c = cint(r, 0.95)
```
The effective return levels along with their confidence intervals can be plotted
as follows:
```@example fremantle
rmin = [c[i][1] for i in eachindex(c)]
rmax = [c[i][2] for i in eachindex(c)]
df = DataFrame(Year = data[:,:Year], r = r.value, rmin = rmin, rmax = rmax)
nothing # hide
```
```@example fremantle
set_default_plot_size(12cm, 9cm)
plot(df, x=:Year, y=:r, ymin=:rmin, ymax=rmax, Geom.line, Geom.ribbon,
Coord.cartesian(xmin = 1895, xmax = 1990),
Guide.xticks(ticks = 1895:10:1990),
Guide.ylabel("100-year Effective Return Level"))
```
# Derived from thresholdexceedance_ns.md
### Load the data
Loading the daily rainfall accumulations:
```@example rainfall
data = Extremes.dataset("rain")
first(data,5)
```
Extract the exceedances over the threshold of 30 *mm*:
```@example rainfall
threshold = 30.0
df = filter(row -> row.Rainfall > threshold, data)
df[!,:Exceedance] = df[:,:Rainfall] .- threshold
df[!,:Year] = year.(df[:,:Date])
```
## Bayesian Inference
Non-stationary parameter estimation can also be performed by the Bayesian
approach. To reproduce exactly the results, the seed should be fixed as follows:
```@example fremantle
import Random
Random.seed!(4786)
nothing #hide
```
### GP parameter estimation
The Bayesian GP parameter estimation is performed with the [`gpfitbayes`](@ref)
function:
```@repl rainfall
fm = gpfitbayes(df, :Exceedance, logscalecovid = [:Year])
```
The empirical covariance matrix of the parameters and the credible intervals can
be obtained with the functions [`parametervar`](@ref) and [`cint`](@ref),
respectively:
```@repl rainfall
parametervar(fm)
```
```@repl rainfall
cint(fm, 0.95)
```
### Return level estimation
The 100-year effective return level estimates can be obtained using the function
[`returnlevel`](@ref):
```@repl rainfall
nobs = size(data,1)
nobsperblock = 365
r = returnlevel(fm, threshold, nobs, nobsperblock, 100)
```
The corresponding 95% credible interval can be computed using the function
[`cint`](@ref):
```@repl rainfall
c = cint(r, 0.95)
```
The 100-year effective return levels along with their 95% credible intervals can
be illustrated as follows:
```@example rainfall
df_plot = DataFrame(Year=df.Year, ReturnLevel = vec(mean(r.value, dims=1)),
LowerBound = first.(c), UpperBound=last.(c))
set_default_plot_size(12cm, 8cm)
plot(df_plot, x=:Year, y=:ReturnLevel, Geom.line,
ymin=:LowerBound, ymax=:UpperBound, Geom.ribbon,
Guide.ylabel("100-year effective return level"))
```