This quick guide provides an overview of the core FEH functions that can be used to generate extreme flow estimates using single-site or pooling (gauged and ungauged) approaches. It is assumed that the reader has knowledge of the theory of these approaches; this guide is intended to demonstrate how to use the UKFE package rather than to explain the FEH statistical method.
The following table provides the key functions for implementing the FEH methods.
Function | Purpose
---|---
GetCDs/CDsXML | Get catchment descriptors using the NRFA gauge reference number or import them from XML files downloaded from the FEH Web Service
GetAM/AMImport | Get annual maximum data from sites suitable for pooling or from AM files
QuickResults | Provides default pooling results (gauged, ungauged & fake ungauged) directly from catchment descriptors
QMED | Estimates QMED from catchment descriptors with the option of zero, one or two donors
DonAdj | Provides a list of donor candidates based on proximity to the study catchment
Pool | Creates a pooling group with the option of excluding one or more sites
H2 | Calculates the heterogeneity measure for pooling groups
DiagPlots | Provides a number of diagnostic plots to compare the sites in a pooling group
Zdists | Calculates the Zdist statistic to determine the distribution with the best fit to the pooling group
PoolEst | Provides flood estimates from the pooling group
Uncertainty | Quantifies uncertainty for gauged (enhanced single-site) and ungauged pooling groups
GenLog/GEV/GenPareto/Gumbel/Kappa3 | Four functions for each distribution to fit the distribution and obtain flood frequency estimates and growth factors
POTextract/POTt | Functions for extracting peaks-over-threshold data
AnnualStat | Extracts annual statistics; the default is the maximum, but any statistic can be used, such as the sum or mean
We need to load the package before we can use any of the functions:
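# Load the UKFE package (install it first with install.packages("UKFE") if necessary)
library(UKFE)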
This section provides a brief overview of the FEH statistical method, and its core assumptions. Examples of how to apply the method can be found later.
The FEH statistical method comes under the broader term of 'regional frequency analysis' and employs the 'index flood' procedure for estimating peak flows at a single location on a river line. The procedure has two main steps, the results of which are multiplied together for the final estimate:

1. Estimate the index flood (in the FEH method, the index flood is QMED, the median annual maximum flow).
2. Calculate the growth curve (this provides growth factors as a function of return period and is a standardised model of the AMAX distribution).
Multiplying the index flood by the growth curve scales the standardised growth factor to produce the flow estimate for the catchment of interest. When the catchment is gauged, the index flood is estimated as the median of the observed annual maximum sample (assuming suitable data quality). When the catchment is ungauged, the index flood is estimated using a multiple regression equation, using catchment descriptors as predictors.
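In symbols, for a return period $T$, the estimate is $Q_T = \mathrm{QMED} \times z_T$, where $z_T$ is the growth factor (the notation used here is purely illustrative).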
The growth curve is estimated by pooling gauged data from catchments which are similar to the catchment of interest. A statistical distribution is assumed, and the averaged L-moment ratios of the AMAX samples in the pooling group are used to fit it and thereby estimate the growth curve.
There are numerous choices and possible adjustments which can be made for each step. The UKFE package, in general, provides default settings with no adjustments; however, these can be changed by the user.
There are several key underlying assumptions in FEH statistical analysis. As stated in Hosking and Wallis (1997), the index-flood procedure makes the following assumptions:
- Observations at any given site are identically distributed.
- Observations at any given site are serially independent.
- Observations at different sites are independent.
- Frequency distributions at different sites are identical apart from a scale factor.
- The mathematical form of the regional growth curve is correctly specified.
The first of these leads to the assumption of stationarity (i.e. observations of past flood events are assumed to represent the behaviour of future events). Where this is not the case, non-stationary methods can be applied. For example, non-stationary distributions can be fitted using other software such as the nonstat R package (Hecht and Zitzmann, 2025).
The QMED regression equation was calibrated using an approach similar to generalised least squares, which allows for covariance of the model residuals (errors). Before calibration, the observed QMEDs were logarithmically transformed, as were the catchment descriptors used as predictors. With this in mind, the assumptions of the QMED equation are as follows:
- The relationship between the transformed independent variables (predictors) and the logarithmically transformed dependent variable (QMED) is linear.
- Observations (observed QMEDs used in calibration) are independent of each other.
- Model and sampling errors are independent of each other.
- Model and sampling errors are normally distributed and have a mean of zero.
- Predictor variables are independent.
- The cross-correlation of model errors can be described by the distance between catchment centroids, and the form of the associated correlation matrix is known for the calibration process.
One of the first steps in any FEH analysis is to obtain catchment descriptors (CDs). See the ‘Data input’ article for information on functions for importing CDs.
Once you have some catchment descriptors, you can start to use many of the FEH-based functions. The full process, covered shortly, is encompassed in the `QuickResults()` function. This provides results from a default pooling group for gauged or ungauged catchments, automatically accounts for urbanisation, and provides a two-donor transfer if `gauged = FALSE`. The default distribution is the generalised logistic distribution. Further details and assumptions are provided in the function's help file.
First, we’ll obtain some CDs for a site suitable for pooling, allowing us to conduct both a gauged estimate and an as-if ungauged estimate (i.e., treating the gauged site as if no flow record were available, using only catchment descriptors). The following code then provides estimates and plots for both the gauged and as-if ungauged cases.
For the gauged case, the plot shows the default gauged growth curve for site 55002. The results from the default enhanced single-site analysis at gauge 55002 are printed to the console. This is a list with four elements. The first is a data frame with the following columns: return period (RP), peak flow estimates (Q) and growth factor estimates (GF). Two additional columns quantify the 95% confidence interval of the peak flow estimates. The second element is the estimated L-CV and L-skew (linear coefficients of variation and skewness). The third and fourth elements provide distribution parameters (location, scale and shape for the default generalised logistic distribution). The third provides distribution parameters for the growth curve, and the fourth provides distribution parameters for the frequency curve.
# Get catchment descriptors for NRFA site 55002 and save to an object called CDs.55002
CDs.55002 <- GetCDs(55002)
# Obtain estimates and plots for the gauged scenario using default settings
QuickResults(CDs.55002, gauged = TRUE)
#> [[1]]
#> RP Q GF lower95 upper95
#> 1 2 410.053 1.000 383.959 436.147
#> 2 5 494.926 1.207 455.250 534.602
#> 3 10 553.498 1.350 502.620 604.376
#> 4 20 614.227 1.498 550.099 678.355
#> 5 50 701.803 1.711 615.528 788.078
#> 6 75 744.017 1.814 645.744 842.290
#> 7 100 775.451 1.891 667.672 883.230
#> 8 200 856.755 2.089 722.066 991.444
#> 9 500 977.811 2.385 796.648 1158.974
#> 10 1000 1081.070 2.636 853.953 1308.187
#>
#> [[2]]
#> PooledLcv PooledLSkew
#> [1,] 0.1347467 0.1515658
#>
#> [[3]]
#> loc scale shape
#> 1 1 0.134 -0.152
#>
#> [[4]]
#> loc scale shape
#> 1 410 55 -0.151
The fake ungauged quick results estimate removes the site of interest from the pooling group and implements an ungauged assessment. Here, the plot shows the default fake ungauged growth curve for site 55002. The same outputs are printed to the console, but this time they provide the 68% confidence interval.
# Obtain estimates and plots for the fake ungauged scenario using default settings
QuickResults(CDs.55002, FUngauged = TRUE)
#> [[1]]
#> RP Q GF lower68 upper68
#> 1 2 562.017 1.000 562.017 562.017
#> 2 5 724.757 1.290 721.151 728.381
#> 3 10 838.057 1.491 824.048 852.304
#> 4 20 956.271 1.701 923.933 989.740
#> 5 50 1127.930 2.007 1054.140 1206.885
#> 6 75 1211.127 2.155 1112.146 1318.917
#> 7 100 1273.253 2.266 1153.309 1405.671
#> 8 200 1434.598 2.553 1252.924 1642.615
#> 9 500 1676.419 2.983 1386.616 2026.791
#> 10 1000 1884.038 3.352 1489.358 2383.308
#>
#> [[2]]
#> PooledLcv PooledLSkew
#> [1,] 0.1853155 0.1596821
#>
#> [[3]]
#> loc scale shape
#> 1 1 0.187 -0.16
#>
#> [[4]]
#> loc scale shape
#> 1 562 105 -0.16
The default setting for the `QuickResults()` function is to treat the site as ungauged, in which case, for a catchment with no gauge, the code can simply be run with only the first argument (the CDs) provided. If the site is not urban and the CDs are from a site suitable for pooling, the site itself will be included (so it won't be truly ungauged unless we use `FUngauged = TRUE`).
Note that if the default ungauged setting is used, then ESS (enhanced single-site) weighting will not be applied, even if the gauge of interest is at the top of the pooling group. The gauge will be included but won't be weighted according to the ESS method. The QMED is estimated as if ungauged; however, because the CDs are the same as those of the closest donor (which are derived from the gauged site), the QMED will end up being the same as the observed one.
There is a `QMED()` function which provides the ungauged QMED (median annual maximum flow) estimate from catchment descriptors. There is an option to adjust this using one or two donor sites. The simplest case is to provide catchment descriptors with no donor transfer:
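# Estimate QMED from the catchment descriptors for site 55002 with no donor transfer
QMED(CDs.55002)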
By default, this gives rural QMED results. If the site is urbanised (has an URBEXT2000 value greater than 0.03), there will be a warning and a suggestion to change the `UrbAdj` argument to `TRUE`.
To undertake a donor transfer (with the aim of improving the estimate), a list of donor site candidates is necessary. For this, there is the `DonAdj()` function, which provides the closest N gauges (default of 10) to the site (geographically, by catchment centroid) that are suitable for QMED, along with their catchment descriptors. It also provides the donor-adjusted result beside each one in the last column.
# Identify donor catchments based on the catchment descriptors for site 55002
DonAdj(CDs.55002)
#> AREA ALTBAR ASPBAR ASPVAR BFIHOST19 DPLBAR DPSBAR FARL FPEXT
#> 55002 1894.2575 297 128 0.08 0.421 95.63 134.4 0.967 0.0693
#> 55007 1283.4025 343 132 0.09 0.387 42.94 152.4 0.960 0.0412
#> 55016 358.9450 324 157 0.07 0.398 32.92 132.2 0.998 0.0428
#> 55013 125.8975 303 103 0.20 0.468 14.84 130.1 0.999 0.0382
#> 55012 246.6575 333 136 0.11 0.380 20.67 158.6 0.997 0.0419
#> 56003 62.6575 311 175 0.21 0.449 10.94 121.0 0.999 0.0268
#> 56013 63.5450 349 151 0.18 0.406 11.95 137.9 1.000 0.0257
#> 55014 202.5450 299 122 0.18 0.540 17.88 158.5 0.996 0.0646
#> 55004 73.0125 412 152 0.19 0.342 11.67 188.7 1.000 0.0284
#> 55023 4016.2550 227 127 0.10 0.496 130.34 116.1 0.979 0.0824
#> LDP PROPWET RMED.1H RMED.1D RMED.2D SAAR SAAR4170 SPRHOST URBEXT2000
#> 55002 157.73 0.50 10.2 42.4 56.1 1230 1269 39.67 0.0029
#> 55007 84.45 0.53 10.3 44.7 60.0 1386 1413 42.23 0.0022
#> 55016 62.29 0.49 9.6 36.2 48.5 1066 1129 40.33 0.0031
#> 55013 28.33 0.49 9.8 37.5 48.7 962 1086 34.31 0.0046
#> 55012 39.53 0.65 10.8 50.0 66.4 1627 1639 42.75 0.0020
#> 56003 21.79 0.53 10.4 43.1 56.6 1171 1257 35.20 0.0026
#> 56013 23.08 0.61 10.7 45.5 59.5 1299 1426 38.25 0.0003
#> 55014 31.70 0.49 9.8 36.1 47.3 977 1062 31.41 0.0009
#> 55004 23.04 0.65 11.4 52.8 70.0 1845 1913 46.25 0.0033
#> 55023 244.16 0.38 10.1 38.7 50.3 1010 1054 35.60 0.0072
#> QMED QMEDcd X Y QMEDfse N URBEXT1990 BFIHOST Dists
#> 55002 410.0530 575.18962 306152 255939 1.039511 49 0.0020 0.472 0.00000
#> 55007 514.5350 520.28951 298496 263086 1.058186 83 0.0010 0.426 10.47349
#> 55016 120.1240 130.51968 309100 270212 1.037415 48 0.0018 0.427 14.57427
#> 55013 27.7575 36.84552 323593 254543 1.055332 54 0.0030 0.553 17.49678
#> 55012 191.0210 180.89006 289404 250198 1.061544 54 0.0004 0.431 17.70465
#> 56003 23.4560 30.39078 302460 237118 1.132355 21 0.0009 0.529 19.17970
#> 56013 34.5900 40.46358 297636 238410 1.035393 51 0.0000 0.495 19.48815
#> 55014 26.5040 45.03533 324893 265277 1.094494 55 0.0023 0.593 20.93856
#> 55004 56.5420 80.86561 284963 252747 1.075714 45 0.0009 0.402 21.42808
#> 55023 492.1000 660.63796 326243 248368 1.050621 53 0.0067 0.542 21.47017
#> a QMED.adj
#> 55002 1.0000000 410.0530
#> 55007 0.3765023 572.7861
#> 55016 0.3440466 558.9970
#> 55013 0.3241613 524.7323
#> 55012 0.3228052 585.3972
#> 56003 0.3133672 530.3481
#> 56013 0.3114327 547.7699
#> 55014 0.3025056 489.9611
#> 55004 0.2995537 516.7287
#> 55023 0.2993012 526.6564
You can apply two donors by selecting the two favoured candidates from the list and including a vector of the site references in the `Don2` argument of the `QMED()` function. The following then provides a two-donor adjusted QMED (the associated weights are also provided in the output):
# Estimate QMED from the catchment descriptors for site 55002 using gauges with NRFA
# IDs 55007 and 55016 as donor sites
QMED(CDs.55002, Don2 = c(55007, 55016))
#> QMEDs.adj a1 a2
#> 1 562.0169 0.2906814 0.2401811
The user should be aware of the following specifics about the UKFE implementation of QMED estimation (note, however, that R is inherently flexible, so any of the code can be adapted):
- The `QMED()` function is written to use a maximum of two donors. Note, however, that the `QMEDDonEq()` function can be used with any number of donors in turn; the user would need to choose the donor weights.
- The `DonAdj()` function only provides candidate donor sites from the NRFA "suitable for QMED" dataset. However, the `QMEDDonEq()` function can be applied, which is entirely flexible regarding inputs.
- By default, the `QMED()` function uses the URBEXT2000 catchment descriptor without an urban expansion. There is an argument within the function called `uef`, which is set to `FALSE`. This can be set to `TRUE` for the urban expansion to be applied for the present year. There is also the `UEF()` function, which can be applied independently and for any year.
- Urban donors should be avoided as far as possible. However, if an urban donor is deemed to be the best choice, the ungauged QMED estimate of the donor (or donors) can be urban-adjusted by setting the `DonUrbAdj` argument to `TRUE`.
To create a pooling group, the `Pool()` function can be used. If it is for a gauged site, the site will be automatically included in the group, unless it is urban. If you wish to include an urban site, the `iug` argument can be set to `TRUE`. In that case, it is also standard practice to set the `DeUrb` argument to `TRUE`, to de-urbanise the subject site so that the estimate can undergo urban adjustment afterwards. To create and view the pooling group:
# Create a pooling group based on the catchment descriptors for site 55002 and
# save to an object called Pool.55002
Pool.55002 <- Pool(CDs = CDs.55002)
# View the pooling group
Pool.55002
#> AREA ALTBAR ASPBAR ASPVAR BFIHOST19 DPLBAR DPSBAR FARL FPEXT LDP
#> 55002 1894.257 297 128 0.08 0.421 95.63 134.4 0.967 0.0693 157.73
#> 8010 1745.842 495 13 0.05 0.422 51.13 158.7 0.938 0.0612 102.29
#> 76007 2276.000 248 329 0.08 0.493 62.12 103.3 0.971 0.0741 125.17
#> 54005 2026.770 249 99 0.10 0.444 67.88 133.6 0.977 0.0919 121.66
#> 12002 1833.213 447 82 0.07 0.455 66.91 169.4 0.980 0.0483 127.69
#> 21006 1505.540 358 77 0.07 0.444 43.88 191.9 0.963 0.0443 83.97
#> 23001 2172.360 286 83 0.09 0.342 55.02 93.7 0.961 0.0504 94.29
#> 76002 1374.845 284 330 0.08 0.499 62.71 124.8 0.955 0.0619 107.50
#> 76017 1371.697 284 330 0.08 0.499 61.17 124.9 0.955 0.0615 105.81
#> 12001 1380.062 512 93 0.07 0.455 60.44 185.7 0.976 0.0468 107.57
#> PROPWET SAAR SPRHOST URBEXT2000 QMED Lcv LSkew LKurt
#> 55002 0.50 1230 39.67 0.0029 410.0530 0.1179085 0.13450244 0.09947790
#> 8010 0.69 1194 46.42 0.0012 240.8780 0.1726428 0.09243711 0.08754162
#> 76007 0.64 1182 37.80 0.0082 639.3560 0.1986134 0.23353973 0.22855384
#> 54005 0.50 1147 38.49 0.0042 308.6500 0.1424610 0.05356933 0.12785352
#> 12002 0.58 1080 39.69 0.0016 556.4020 0.1551762 0.03231110 0.12110222
#> 21006 0.58 1164 38.34 0.0023 398.4495 0.1888078 0.19790768 0.15568335
#> 23001 0.51 1016 48.19 0.0026 837.9620 0.1639191 0.14451529 0.17419227
#> 76002 0.65 1272 36.88 0.0050 413.4200 0.1907121 0.30184716 0.23553495
#> 76017 0.65 1273 36.89 0.0049 481.8050 0.2811401 0.40201113 0.28327368
#> 12001 0.62 1108 40.27 0.0010 436.8090 0.2067674 0.07502250 0.19674386
#> L1 L2 N SDM Discordancy Discordant?
#> 55002 432.3641 50.97940 49 0.0000000 1.3987422 FALSE
#> 8010 248.8548 42.96299 71 0.2409683 1.6668700 FALSE
#> 76007 678.3431 134.72801 56 0.2741162 0.4760075 FALSE
#> 54005 313.7110 44.69157 71 0.3075738 0.4442087 FALSE
#> 12002 550.6405 85.44631 33 0.3546197 0.5305789 FALSE
#> 21006 422.4297 79.75803 62 0.4392073 0.3612837 FALSE
#> 23001 869.9198 142.59646 67 0.4649502 0.2743309 FALSE
#> 76002 466.2935 88.92779 39 0.4661871 1.0987304 FALSE
#> 76017 606.2199 170.43270 21 0.4702818 2.0994326 FALSE
#> 12001 427.5946 88.41262 77 0.5497881 1.6498151 FALSE
You may now wish to assess the group for heterogeneity and see which distribution fits best. The `H2()` function can be used to check for heterogeneity.
This group appears to be homogeneous, but if a group is heterogeneous, you may wish to use the `DiagPlots()` function to check for any outliers etc. You can also look at the discordancy measures, which can be found in the second-to-last column of the pooling group created.
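For example, a minimal sketch using the pooling group created above:

# Produce diagnostic plots to compare the sites in the pooling group
DiagPlots(Pool.55002)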
Let's pretend that site 8010 and site 76017 are to be removed (because they have the highest discordancy; in reality, a more thorough analysis would be undertaken). To remove the sites, we recreate the pooling group and use the `exclude` option. The function automatically adds the next site(s) with the lowest similarity distance measure if the total number of years of AMAX in the group has dropped below 500 (or whatever we set as the `N` argument in the `Pool()` function, which has a default of 500).
# Recreate the pooling group, excluding NRFA sites 8010 and 76017
Pool.55002 <- Pool(CDs = CDs.55002, exclude = c(8010, 76017))
# View the new pooling group
Pool.55002
#> AREA ALTBAR ASPBAR ASPVAR BFIHOST19 DPLBAR DPSBAR FARL FPEXT LDP
#> 55002 1894.257 297 128 0.08 0.421 95.63 134.4 0.967 0.0693 157.73
#> 76007 2276.000 248 329 0.08 0.493 62.12 103.3 0.971 0.0741 125.17
#> 54005 2026.770 249 99 0.10 0.444 67.88 133.6 0.977 0.0919 121.66
#> 12002 1833.213 447 82 0.07 0.455 66.91 169.4 0.980 0.0483 127.69
#> 21006 1505.540 358 77 0.07 0.444 43.88 191.9 0.963 0.0443 83.97
#> 23001 2172.360 286 83 0.09 0.342 55.02 93.7 0.961 0.0504 94.29
#> 76002 1374.845 284 330 0.08 0.499 62.71 124.8 0.955 0.0619 107.50
#> 12001 1380.062 512 93 0.07 0.455 60.44 185.7 0.976 0.0468 107.57
#> 8006 2852.390 460 17 0.06 0.438 87.36 156.9 0.959 0.0525 162.27
#> PROPWET SAAR SPRHOST URBEXT2000 QMED Lcv LSkew LKurt
#> 55002 0.50 1230 39.67 0.0029 410.0530 0.1179085 0.13450244 0.0994779
#> 76007 0.64 1182 37.80 0.0082 639.3560 0.1986134 0.23353973 0.2285538
#> 54005 0.50 1147 38.49 0.0042 308.6500 0.1424610 0.05356933 0.1278535
#> 12002 0.58 1080 39.69 0.0016 556.4020 0.1551762 0.03231110 0.1211022
#> 21006 0.58 1164 38.34 0.0023 398.4495 0.1888078 0.19790768 0.1556834
#> 23001 0.51 1016 48.19 0.0026 837.9620 0.1639191 0.14451529 0.1741923
#> 76002 0.65 1272 36.88 0.0050 413.4200 0.1907121 0.30184716 0.2355350
#> 12001 0.62 1108 40.27 0.0010 436.8090 0.2067674 0.07502250 0.1967439
#> 8006 0.63 1119 43.98 0.0013 500.7330 0.1761823 0.16587645 0.1238761
#> L1 L2 N SDM Discordancy Discordant?
#> 55002 432.3641 50.97940 49 0.0000000 1.4730236 FALSE
#> 76007 678.3431 134.72801 56 0.2741162 0.6909721 FALSE
#> 54005 313.7110 44.69157 71 0.3075738 0.7047585 FALSE
#> 12002 550.6405 85.44631 33 0.3546197 0.6614546 FALSE
#> 21006 422.4297 79.75803 62 0.4392073 0.9304440 FALSE
#> 23001 869.9198 142.59646 67 0.4649502 0.2671468 FALSE
#> 76002 466.2935 88.92779 39 0.4661871 1.4005348 FALSE
#> 12001 427.5946 88.41262 77 0.5497881 1.5316138 FALSE
#> 8006 523.8306 92.28970 71 0.6306813 1.3400518 FALSE
# Check for heterogeneity
H2(Pool.55002)
#> [1] "0.954" "Group is homogenous"
Once you are happy with the pooling group, you can check for the best distribution. The `Zdists()` function can be used to do this. It compares the generalised extreme value (GEV), generalised logistic (GenLog), Gumbel and Kappa3 distributions. The method behind it is outlined in Hosking and Wallis (1997). Note that it is not the same as the method detailed in Science Report SC050050 (Kjeldsen et al., 2008). The difference is described in the function's help file and is necessary to encompass the two-parameter Gumbel distribution.
# Calculate goodness-of-fit measure
Zdists(Pool.55002)
#> [[1]]
#> GEV GenLog Gumbel Kappa3
#> 1 1.36 -1.04 0.783 -0.246
#>
#> [[2]]
#> [1] "Kappa3 has the best fit"
The output is a list, where the first element contains the Z-scores for each distribution, and the second element states which distribution has the best fit. In this case, the Kappa3 distribution appears to fit best.
The L-CV and L-skew for each site are provided in the pooling group and are derived from the NRFA AMAX data. If the user has further information, for example an extra year of data providing a new annual maximum, the L-CV and L-skew can be calculated using the `Lcv()` and `LSkew()` functions. The user's data can be pasted in or read into the R environment. Data can be read in using a function such as `read.csv()`, or a column of data can be pasted in from Excel by running the following code and then pressing "Ctrl+V" in the console (assuming you have already copied the required data):
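# One way to do this (a base R sketch) is with scan(), which reads numeric values
# typed or pasted into the console until a blank line is entered
MyData <- scan()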
This creates a numeric vector called `MyData` that contains your flow data. The `Lcv()` and `LSkew()` functions take a numeric vector of flows as their only input, so you can then use `Lcv(MyData)` and `LSkew(MyData)` to derive the L-moment ratios.
The relevant results can be updated in the pooling group by applying the `LRatioChange()` function. For example, if you wanted a new pooling group with updated L-moment ratios for site 55002, the following code could be used:
# Update the pooling group to use user-supplied L-CV and L-skew values for site 55002
UpdatedPool <- LRatioChange(Pool.55002, SiteID = 55002, lcv = 0.187, lskew = 0.164)
# View the updated pooling group
UpdatedPool
#> AREA ALTBAR ASPBAR ASPVAR BFIHOST19 DPLBAR DPSBAR FARL FPEXT LDP
#> 55002 1894.257 297 128 0.08 0.421 95.63 134.4 0.967 0.0693 157.73
#> 76007 2276.000 248 329 0.08 0.493 62.12 103.3 0.971 0.0741 125.17
#> 54005 2026.770 249 99 0.10 0.444 67.88 133.6 0.977 0.0919 121.66
#> 12002 1833.213 447 82 0.07 0.455 66.91 169.4 0.980 0.0483 127.69
#> 21006 1505.540 358 77 0.07 0.444 43.88 191.9 0.963 0.0443 83.97
#> 23001 2172.360 286 83 0.09 0.342 55.02 93.7 0.961 0.0504 94.29
#> 76002 1374.845 284 330 0.08 0.499 62.71 124.8 0.955 0.0619 107.50
#> 12001 1380.062 512 93 0.07 0.455 60.44 185.7 0.976 0.0468 107.57
#> 8006 2852.390 460 17 0.06 0.438 87.36 156.9 0.959 0.0525 162.27
#> PROPWET SAAR SPRHOST URBEXT2000 QMED Lcv LSkew LKurt
#> 55002 0.50 1230 39.67 0.0029 410.0530 0.1870000 0.16400000 0.0994779
#> 76007 0.64 1182 37.80 0.0082 639.3560 0.1986134 0.23353973 0.2285538
#> 54005 0.50 1147 38.49 0.0042 308.6500 0.1424610 0.05356933 0.1278535
#> 12002 0.58 1080 39.69 0.0016 556.4020 0.1551762 0.03231110 0.1211022
#> 21006 0.58 1164 38.34 0.0023 398.4495 0.1888078 0.19790768 0.1556834
#> 23001 0.51 1016 48.19 0.0026 837.9620 0.1639191 0.14451529 0.1741923
#> 76002 0.65 1272 36.88 0.0050 413.4200 0.1907121 0.30184716 0.2355350
#> 12001 0.62 1108 40.27 0.0010 436.8090 0.2067674 0.07502250 0.1967439
#> 8006 0.63 1119 43.98 0.0013 500.7330 0.1761823 0.16587645 0.1238761
#> L1 L2 N SDM Discordancy Discordant?
#> 55002 432.3641 50.97940 49 0.0000000 1.4730236 FALSE
#> 76007 678.3431 134.72801 56 0.2741162 0.6909721 FALSE
#> 54005 313.7110 44.69157 71 0.3075738 0.7047585 FALSE
#> 12002 550.6405 85.44631 33 0.3546197 0.6614546 FALSE
#> 21006 422.4297 79.75803 62 0.4392073 0.9304440 FALSE
#> 23001 869.9198 142.59646 67 0.4649502 0.2671468 FALSE
#> 76002 466.2935 88.92779 39 0.4661871 1.4005348 FALSE
#> 12001 427.5946 88.41262 77 0.5497881 1.5316138 FALSE
#> 8006 523.8306 92.28970 71 0.6306813 1.3400518 FALSE
To derive the final estimates, the `PoolEst()` function can be applied to the pooling group. It is necessary to provide an input for the `QMED` argument. Therefore, we will first obtain a QMED estimate (in this case, we will use the data available from the NRFA). To estimate QMED, we can either use the `GetAM()` function and assess the data manually, or we can get it more directly from the `QMEDData` data frame embedded within the package (these values are calculated as the median of the AMAX data, rejected years excluded, from the NRFA Peak Flow Dataset). The following code provides the latter:
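# Get the observed QMED for NRFA site 55002 from the QMEDData data frame
GetQMED(55002)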
We can use functions within functions, so we can include the above directly in `PoolEst()`:
# Derive the growth curve and final flow estimates based on the pooling group and QMED
PoolEst(Pool.55002, QMED = GetQMED(55002), gauged = TRUE)
#> [[1]]
#> RP Q GF lower95 upper95
#> 1 2 410.053 1.000 383.959 436.147
#> 2 5 492.945 1.202 453.269 532.621
#> 3 10 549.716 1.341 498.838 600.594
#> 4 20 608.258 1.483 544.130 672.386
#> 5 50 692.178 1.688 605.903 778.453
#> 6 75 732.440 1.786 634.167 830.713
#> 7 100 762.347 1.859 654.568 870.126
#> 8 200 839.430 2.047 704.741 974.119
#> 9 500 953.549 2.325 772.386 1134.712
#> 10 1000 1050.346 2.561 823.229 1277.463
#>
#> [[2]]
#> PooledLcv PooledLSkew
#> [1,] 0.1320962 0.1445543
#>
#> [[3]]
#> loc scale shape
#> 1 1 0.132 -0.144
#>
#> [[4]]
#> loc scale shape
#> 1 410 54 -0.144
The results are printed to the console. This is a list with four elements. The first is a data frame with the following columns: return period (RP), peak flow estimates (Q) and growth factor estimates (GF), plus the confidence interval of the peak flow estimates. The interval is derived from the factorial standard error, which is calculated using methods detailed in Circulation edition 152 (the British Hydrological Society magazine). The second element is the estimated pooled L-CV and L-skew (linear coefficients of variation and skewness). The third and fourth elements provide parameters (location, scale, and shape) for the standardised and non-standardised distributions, respectively. If the RP argument has been supplied by the user rather than left as the default, then only the first two elements are returned.
The user should be aware of the following specifics about how UKFE employs the pooled method:
The default QMED FSE is set to 1.55, as recommended in Statistical procedures for flood frequency estimation (Flood Estimation Handbook, Volume 3, Section 13.9.2). This value represents the factorial standard error associated with the combined model and sampling uncertainty of the QMED model. You can update FSE if donor catchments are applied or if the QMED is estimated from a gauged sample.
The default distribution applied is the generalised logistic distribution.
The final FSE comprises the FSE associated with QMED estimation and growth curve estimation. The FSE associated with the growth curve quantifies sampling error (aleatoric uncertainty) only. The FSE associated with QMED includes model uncertainty. In general, the uncertainty associated with the modelling process or the hydrometric data collection is not included in the FSE. Many choices are made during the process, and deriving a range of results from a range of justifiable choices is a way to quantify some of the epistemic uncertainty.
By default, urban adjustment for the L-CV and L-skew is not applied. If the user thinks these adjustments are necessary, they can set the `UrbAdj` argument to `TRUE`.
The `EVPool()` function can be used to plot a growth curve (which should look like the one from the `QuickResults()` example above).
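For example, a sketch for the gauged pooling group created above (the `gauged` argument is assumed here to behave as in `PoolEst()`; check the function's help file):

# Plot the pooled growth curve for the gauged case
EVPool(Pool.55002, gauged = TRUE)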
For the ungauged catchment, the same process can be followed. First, we’ll exclude the site from our pooling group. We will assume we went through a rigorous pooling group analysis and concluded all was well:
# Recreate the pooling group, excluding NRFA gauge 55002 to make it an ungauged analysis
PoolUG.55002 <- Pool(CDs.55002, exclude = 55002)
# View the new pooling group
PoolUG.55002
#> AREA ALTBAR ASPBAR ASPVAR BFIHOST19 DPLBAR DPSBAR FARL FPEXT LDP
#> 8010 1745.842 495 13 0.05 0.422 51.13 158.7 0.938 0.0612 102.29
#> 76007 2276.000 248 329 0.08 0.493 62.12 103.3 0.971 0.0741 125.17
#> 54005 2026.770 249 99 0.10 0.444 67.88 133.6 0.977 0.0919 121.66
#> 12002 1833.213 447 82 0.07 0.455 66.91 169.4 0.980 0.0483 127.69
#> 21006 1505.540 358 77 0.07 0.444 43.88 191.9 0.963 0.0443 83.97
#> 23001 2172.360 286 83 0.09 0.342 55.02 93.7 0.961 0.0504 94.29
#> 76002 1374.845 284 330 0.08 0.499 62.71 124.8 0.955 0.0619 107.50
#> 76017 1371.697 284 330 0.08 0.499 61.17 124.9 0.955 0.0615 105.81
#> 12001 1380.062 512 93 0.07 0.455 60.44 185.7 0.976 0.0468 107.57
#> 8006 2852.390 460 17 0.06 0.438 87.36 156.9 0.959 0.0525 162.27
#> PROPWET SAAR SPRHOST URBEXT2000 QMED Lcv LSkew LKurt
#> 8010 0.69 1194 46.42 0.0012 240.8780 0.1726428 0.09243711 0.08754162
#> 76007 0.64 1182 37.80 0.0082 639.3560 0.1986134 0.23353973 0.22855384
#> 54005 0.50 1147 38.49 0.0042 308.6500 0.1424610 0.05356933 0.12785352
#> 12002 0.58 1080 39.69 0.0016 556.4020 0.1551762 0.03231110 0.12110222
#> 21006 0.58 1164 38.34 0.0023 398.4495 0.1888078 0.19790768 0.15568335
#> 23001 0.51 1016 48.19 0.0026 837.9620 0.1639191 0.14451529 0.17419227
#> 76002 0.65 1272 36.88 0.0050 413.4200 0.1907121 0.30184716 0.23553495
#> 76017 0.65 1273 36.89 0.0049 481.8050 0.2811401 0.40201113 0.28327368
#> 12001 0.62 1108 40.27 0.0010 436.8090 0.2067674 0.07502250 0.19674386
#> 8006 0.63 1119 43.98 0.0013 500.7330 0.1761823 0.16587645 0.12387605
#> L1 L2 N SDM Discordancy Discordant?
#> 8010 248.8548 42.96299 71 0.2409683 1.2961436 FALSE
#> 76007 678.3431 134.72801 56 0.2741162 0.4807292 FALSE
#> 54005 313.7110 44.69157 71 0.3075738 0.5870875 FALSE
#> 12002 550.6405 85.44631 33 0.3546197 0.5260091 FALSE
#> 21006 422.4297 79.75803 62 0.4392073 0.2908117 FALSE
#> 23001 869.9198 142.59646 67 0.4649502 0.4057173 FALSE
#> 76002 466.2935 88.92779 39 0.4661871 1.3895692 FALSE
#> 76017 606.2199 170.43270 21 0.4702818 2.2455011 FALSE
#> 12001 427.5946 88.41262 77 0.5497881 2.0897338 FALSE
#> 8006 523.8306 92.28970 71 0.6306813 0.6886975 FALSE
# Estimate QMED from the catchment descriptors with two donors (note that the QMED
# value needs to be extracted using `$QMEDs.adj` as there are three elements to the
# output when two donors are used and the PoolEst function expects just the QMED value)
CDsQmed.55002 <- QMED(CDs.55002, Don2 = c(55007, 55016))$QMEDs.adj
# Use the pooling group and the ungauged QMED estimate to provide the pooled estimates
Results55002 <- PoolEst(PoolUG.55002, QMED = CDsQmed.55002)
# Print the results
Results55002
#> [[1]]
#> RP Q GF lower68 upper68
#> 1 2 562.017 1.000 384.943 820.545
#> 2 5 724.757 1.290 493.939 1063.436
#> 3 10 838.057 1.491 564.417 1244.364
#> 4 20 956.271 1.701 632.831 1445.021
#> 5 50 1127.930 2.007 722.014 1762.052
#> 6 75 1211.127 2.155 761.744 1925.619
#> 7 100 1273.253 2.266 789.938 2052.280
#> 8 200 1434.598 2.553 858.167 2398.217
#> 9 500 1676.419 2.983 949.737 2959.114
#> 10 1000 1884.038 3.352 1020.108 3479.630
#>
#> [[2]]
#> PooledLcv PooledLSkew
#> [1,] 0.1853155 0.1596821
#>
#> [[3]]
#> loc scale shape
#> 1 1 0.187 -0.16
#>
#> [[4]]
#> loc scale shape
#> 1 562 105 -0.16
# Plot a growth curve for the pooling group
EVPool(PoolUG.55002)
The same results are provided as for the gauged case.
The quickest way to get results for a single sample is to use the `GEV()`, `GenLog()`, `Kappa3()`, `Gumbel()` and `GenPareto()` functions. For example, the following code can be used to obtain a 75-year return period estimate for an AMAX sample called `MyAMAX` using the generalised logistic distribution:
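# Estimate the 75-year flow from an AMAX sample using the generalised logistic
# distribution (MyAMAX is a placeholder for your own numeric vector of AMAX flows)
GenLogAM(MyAMAX, RP = 75)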
An annual maximum series can be obtained for sites suitable for pooling using the `GetAM()` or `AMImport()` function (see the 'Data input' article). We will retrieve and plot the AMAX for our site 55002:
# Extract the AMAX data for NRFA site 55002
AM.55002 <- GetAM(55002)
# View the head of the AMAX series
head(AM.55002)
#> Date Flow id
#> 1 1974-02-12 457.894 55002
#> 2 1975-01-22 410.053 55002
#> 3 1975-12-02 364.079 55002
#> 4 1977-02-03 286.690 55002
#> 5 1977-11-20 318.908 55002
#> 6 1978-12-14 397.771 55002
# Plot the AMAX data
AMplot(AM.55002)
The `AMplot()` function returns a bar plot of the AMAX series.
There are four functions for each of the GEV, GenLog, Gumbel, Kappa3, and GenPareto distributions. In the function names below, `XXX` should be replaced by the distribution prefix (`GEV`, `GenLog`, `Gumbel`, `Kappa3`, or `GenPareto`). For the generalised Pareto distribution, the first function is named `GenParetoPOT()` instead of `GenParetoAM()`.
- `XXXAM()` - estimates return levels or return periods directly from an AMAX (or POT for GenPareto) series.
- `XXXEst()` - estimates return levels or return periods from user-supplied parameters.
- `XXXGF()` - provides the growth factor as a function of L-CV, L-skew, and RP (the Gumbel version only requires the L-CV and RP).
- `XXXPars()` - estimates distribution parameters from an AMAX or POT series (using L-moments, or MLE in some cases).
We can look at the fit of the distributions using the `ERPlot()` or `EVPlot()` functions. By default, `ERPlot()` compares the percentage difference between the simulated results and the observed results for each rank of the data. Another option (see the `ERType` argument) compares the simulated flows for each rank of the sample with the observed flows of the same rank. For both plots, 500 simulated samples are used. The function's help file provides more information, including how to compare the distribution estimated from pooling with the observed data. This is an updated version of the approach described in Hammond (2019).
The `EVPlot()` function plots the extreme value frequency curve or growth curve with the observed sample points. The uncertainty is quantified by bootstrapping.
The following code generates an extreme rank plot and an extreme value plot using the GEV distribution:
# Generate an extreme rank plot for the AMAX data for NRFA site 55002 and the GEV
# distribution and set a user-defined title (using the 'main' argument)
ERPlot(AM.55002$Flow, dist = "GEV", main = "Extreme rank plot for NRFA site 55002: GEV")
# Generate an extreme value plot for the AMAX data for NRFA site 55002 and the GEV
# distribution and set a user-defined title (using the 'Title' argument)
EVPlot(AM.55002$Flow, dist = "GEV", Title = "Extreme value plot for NRFA site 55002: GEV")
As an example, we can estimate the 100-year flow and then the return period of the maximum observed flow. Make sure to select the "Flow" column.
# Estimate the 100-year flow directly from the AMAX data for NRFA site 55002 using
# the GEV distribution
GEVAM(AM.55002$Flow, RP = 100)
#> [1] 705.0645
# Estimate the return period of the maximum observed flow directly from the AMAX
# data using the GEV distribution
GEVAM(AM.55002$Flow, q = max(AM.55002$Flow))
#> [1] 83.23851
We can also estimate the parameters of the GEV from the data, using the default L-moments, and estimate quantiles as a function of return period (and vice versa) from user input parameters.
# Estimate the parameters of the GEV distribution from the AMAX data for NRFA site 55002
GEVPars55002 <- GEVPars(AM.55002$Flow)
# View the parameters
GEVPars55002
#> Loc Scale Shape
#> 1 391.8503 77.26544 0.05618528
# Estimate the 75-year flow using these GEV parameters
GEVEst(loc = GEVPars55002$Loc, scale = GEVPars55002$Scale, shape = GEVPars55002$Shape, RP = 75)
#> [1] 687.6577
If we derive L-moments for the site, we can then estimate growth factors. The `LMoments()` function does this. We can also calculate the L-CV and L-skew separately using the `Lcv()` and `LSkew()` functions.
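For example, the L-moments and L-moment ratios for the 55002 AMAX sample can be derived as follows (a minimal sketch):

# Calculate L-moments and L-moment ratios for the AMAX sample
LMoments(AM.55002$Flow)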
# Estimate the 75-year GEV growth factor from the L-moments for the AMAX data for
# NRFA site 55002
GEVGF(lcv = Lcv(AM.55002$Flow), lskew= LSkew(AM.55002$Flow), RP = 75)
#> [1] 1.63775
We can estimate the 75-year flow by combining this latter result with QMED (the median of the AMAX record).
# Estimate the 75-year flow by multiplying the growth factor by the median AMAX flow
GEVGF(lcv = Lcv(AM.55002$Flow), lskew= LSkew(AM.55002$Flow), RP = 75) * median(AM.55002$Flow)
#> [1] 671.5643
This gives a different 75-year flow estimate from estimating it directly using the GEV parameters because here we are incorporating QMED from the data.
A peaks-over-threshold (POT) series can also be obtained from a time series using the `POTextract()` function. A daily sample of the Thames at Kingston flow is included in the package for example purposes (the data frame `ThamesPQ`). The `POTextract()` function can be used on a single vector or on a data frame with dates in the first column and the time series of interest in the second. In the `ThamesPQ` data frame, the date is in the first column and the Thames flow is in the third. Therefore, the POT series can be extracted as follows:
# Extract a POT series from the Thames at Kingston daily mean flow data, selecting
# the first and third columns from the ThamesPQ data frame which contain the date
# and flow data respectively. The threshold is set as the 90th percentile.
POT.Thames <- POTextract(ThamesPQ[, c(1, 3)], thresh = 0.9)
#> [1] "Peaks per year: 1.867263"
# View the output
POT.Thames
#> Date peak
#> 3 2000-11-07 440.0
#> 25 2002-02-05 316.0
#> 34 2003-01-02 461.0
#> 40 2004-01-13 227.0
#> 41 2004-02-02 238.0
#> 43 2004-05-05 194.0
#> 56 2007-03-07 330.0
#> 58 2007-07-26 274.0
#> 59 2007-11-20 196.0
#> 62 2007-12-09 201.0
#> 64 2008-01-16 362.0
#> 66 2008-03-17 239.0
#> 67 2008-06-04 232.0
#> 68 2008-11-11 231.0
#> 69 2008-12-14 286.0
#> 72 2009-02-11 369.0
#> 77 2010-01-18 312.0
#> 82 2010-03-01 309.0
#> 83 2010-04-04 178.0
#> 84 2011-01-18 289.0
#> 85 2012-05-01 260.0
#> 87 2012-06-12 216.0
#> 88 2012-11-05 203.0
#> 91 2012-12-26 407.0
#> 106 2014-02-09 502.5
#> 110 2014-04-27 186.4
#> 111 2014-11-24 192.3
#> 113 2015-01-16 250.6
This produces a figure showing the daily mean flow data with peaks over the threshold shown as red circles. The blue line is the threshold, and the green line is the mean flow. Independent peaks are chosen above the threshold: peaks separated by a drop below the mean flow are considered independent. More details can be found in the function's help file (including other options for defining independence and how to rename the axis labels and plot title).
Note that you can also use the `POTt()` function for extracting peaks over a threshold. It works significantly faster but only has the option of separating peaks by a number of timesteps.
The POT approach is similar to the AMAX approach, except that we need to convert from the POT RP scale to the annual RP scale. To do so, the functions have a peaks-per-year (`ppy`) argument. The `POTextract()` function provides the number of peaks per year (in this case, 1.867). To estimate the associated 100-year daily mean flow peak, the GenPareto function can be used as follows:
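# Estimate the 100-year peak daily mean flow from the POT series, supplying the
# peaks-per-year rate reported by POTextract (usage assumed from the description
# above; see the GenParetoPOT help file)
GenParetoPOT(POT.Thames$peak, ppy = 1.867, RP = 100)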
Methods for quantifying aleatoric (sampling) uncertainty are provided here for three estimation methods: single-site, enhanced single-site (or gauged pooling) and ungauged pooling. The uncertainty quantification is predicated on forming a sampling distribution for the statistic of interest, and in all three cases it is done using bootstrapping. In the pooling approaches (gauged and ungauged), the whole pooling group is bootstrapped, and the weighted estimates are recalculated on every iteration.
Extract an annual maximum sample with the following code (here we reuse NRFA site 55002 from earlier):
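# Extract the AMAX sample for NRFA site 55002
AM.55002 <- GetAM(55002)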
The uncertainty for the 100-year flow can be estimated using the `Bootstrap()` function. The 95% confidence intervals are provided by default, but this can be adjusted using the `Conf` argument. The lower limit is provided in the 'Lower' column and the upper limit in the 'Upper' column.
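As a sketch (the `Stat` argument, which takes a function of the sample, is assumed here; see the `Bootstrap()` help file):

# Bootstrap the sampling distribution of the 100-year GEV estimate for site 55002
# (the Stat argument name is an assumption - check the help file)
Bootstrap(AM.55002$Flow, Stat = function(x) GEVAM(x, RP = 100))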
Note that the `Bootstrap()` function can also be applied for any function/statistic, and the sampling distribution developed can be returned.
The `Uncertainty()` function can be used for the pooled estimates to quantify the sampling uncertainty. The estimated QMED and the pooling group are required. Furthermore, the factorial standard error needs to be provided for the ungauged case, using the `fseQMED` argument (the default is 1.55, as recommended in Statistical procedures for flood frequency estimation (Flood Estimation Handbook, Volume 3, Section 13.9.2)). Note that by default the generalised logistic distribution is selected for the estimates, but this can be changed.
The following example, which estimates uncertainty for the ungauged pooled estimates for NRFA site 55002, uses the catchment descriptors object created earlier (`CDs.55002`). This provides the 95% confidence interval by default, and this can be changed using the `Conf` argument. The lower limit is provided in the 'Lower' column and the upper limit in the 'Upper' column. There is an option to plot the flood frequency curve (return level plot), which includes the confidence interval.
# Estimate QMED for site 55002 from catchment descriptors using two donors
QMEDEst <- QMED(CDs.55002, Don2 = c(55007, 55016))$QMEDs.adj
# Create an ungauged pooling group for site 55002
PoolUG.55002 <- Pool(CDs.55002, exclude = 55002)
# Estimate the uncertainty of the pooling results
Uncertainty(PoolUG.55002, qmed = QMEDEst)
#> RP Central Lower Upper
#> 1 2 562 250 1260
#> 2 5 725 311 1590
#> 3 10 838 351 1850
#> 4 20 956 393 2060
#> 5 50 1130 451 2630
#> 6 75 1210 478 2910
#> 7 100 1270 498 3170
#> 8 200 1430 541 3830
#> 9 500 1680 587 5180
#> 10 1000 1880 615 6750
The enhanced single-site case is similar to the above, but the `Gauged` argument is set to `TRUE`, and there is no need to set `qmed`; it is derived from the observed AMAX if `Gauged = TRUE`.
It is important to note that when `Gauged` equals `TRUE`, the top site in the pooling group is assumed to be the "gauged" site of interest. This will happen by default because the similarity distance measure should be zero, assuming that the "gauged" site is within the `NRFAData` data frame.
The following example uses the gauged pooling group we previously derived:
# Create a gauged pooling group for site 55002
Pool.55002 <- Pool(CDs = CDs.55002, exclude = c(8010, 76017))
The uncertainty for a range of return levels can then be quantified as follows:
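# Quantify the uncertainty of the enhanced single-site (gauged) pooled estimates;
# this call is implied by the description above (QMED is derived from the AMAX)
Uncertainty(Pool.55002, Gauged = TRUE)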
#> RP Central Lower Upper
#> 1 2 410 385 439
#> 2 5 493 457 531
#> 3 10 550 504 599
#> 4 20 608 551 669
#> 5 50 692 615 772
#> 6 75 732 645 825
#> 7 100 762 668 866
#> 8 200 839 721 980
#> 9 500 954 789 1150
#> 10 1000 1050 844 1300
For use outside of ‘R’, outputs can be saved as objects and then written to CSV files. For example, to save the peak flow estimates from earlier:
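# Write the pooled flow estimates (the first element of the results list created
# earlier) to a CSV file; the file name here is illustrative
write.csv(Results55002[[1]], "Results55002.csv", row.names = FALSE)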