This quick guide provides an overview of the core FEH functions that can be used to generate extreme flow estimates using single-site or pooling (gauged and ungauged) approaches. It is assumed that the reader has knowledge of the theory of these approaches; this guide is intended to demonstrate how to use the UKFE package rather than to explain the FEH statistical method.
The following table provides the key functions for implementing the FEH methods.
Function | Purpose
---|---
GetCDs/CDsXML | Get catchment descriptors using the NRFA gauge reference number or import them from XML files downloaded from the FEH Web Service
GetAM/AMImport | Get annual maximum data from sites suitable for pooling or from AM files
QuickResults | Provides default pooling results (gauged, ungauged & fake ungauged) directly from catchment descriptors
QMED | Estimates QMED from catchment descriptors with the option of zero, one or two donors
DonAdj | Provides a list of donor candidates based on proximity to the study catchment
Pool | Creates a pooling group with the option of excluding one or more sites
H2 | Calculates the heterogeneity measure for pooling groups
DiagPlots | Provides a number of diagnostic plots to compare the sites in a pooling group
Zdists | Calculates the Zdist statistic to determine the distribution with the best fit to the pooling group
PoolEst | Provides flood estimates from the pooling group
Uncertainty | Quantifies uncertainty for gauged (enhanced single-site) and ungauged pooling groups
GenLog/GEV/GenPareto/Gumbel/Kappa3 | Four functions for each distribution to fit the distribution and obtain flood frequency estimates and growth factors
POTextract/POTt | Functions for extracting peaks-over-threshold data
AnnualStat | Extracts annual statistics; the default is the maximum, but any statistic can be used, such as the sum or mean
We need to load the package before we can use any of the functions:
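# Load the UKFE package (install it first with install.packages("UKFE") if necessary)
library(UKFE)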
This section provides a brief overview of the FEH statistical method, and its core assumptions. Examples of how to apply the method can be found later.
The FEH statistical method comes under the broader term of 'regional frequency analysis' and employs the 'index flood' procedure for estimating peak flows at a single location on a river line. The procedure has two main steps, the results of which are multiplied together for the final estimate:

1. Estimate the index flood (in the FEH method, the index flood is QMED, the median annual maximum flow).
2. Calculate the growth curve (this provides growth factors as a function of return period and is a standardised model of the AMAX distribution).
Multiplying the index flood by the growth curve scales the standardised growth factor to produce the flow estimate for the catchment of interest. When the catchment is gauged, the index flood is estimated as the median of the observed annual maximum sample (assuming suitable data quality). When the catchment is ungauged, the index flood is estimated using a multiple regression equation, using catchment descriptors as predictors.
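In symbols, for a return period $T$, the estimate is $Q_T = \mathrm{QMED} \times z_T$, where $z_T$ is the growth factor (the notation used here is purely illustrative).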
The growth curve is estimated by pooling gauged data from catchments which are similar to the catchment of interest. A statistical distribution is assumed, and the averaged L-moment ratios of the AMAX samples in the pooling group are used to fit it and thereby estimate the growth curve.
There are numerous choices and possible adjustments which can be made for each step. The UKFE package, in general, provides default settings with no adjustments; however, these can be changed by the user.
There are several key underlying assumptions in FEH statistical analysis. As stated in Hosking and Wallis (1997), the index-flood procedure makes the following assumptions:
- Observations at any given site are identically distributed.
- Observations at any given site are serially independent.
- Observations at different sites are independent.
- Frequency distributions at different sites are identical apart from a scale factor.
- The mathematical form of the regional growth curve is correctly specified.
The first of these leads to the assumption of stationarity (i.e. observations of past flood events are assumed to represent the behaviour of future events). Where this is not the case, non-stationary methods can be applied. For example, non-stationary distributions can be fitted using other software such as the nonstat R package (Hecht and Zitzmann, 2025).
The QMED regression equation was calibrated using an approach similar to generalised least squares, which allows for covariance of the model residuals (errors). Before calibration, the observed QMEDs were logarithmically transformed, as were the catchment descriptors used as predictors. With this in mind, the assumptions of the QMED equation are as follows:
- The relationship between the transformed independent variables (predictors) and the logarithmically transformed dependent variable (QMED) is linear.
- Observations (observed QMEDs used in calibration) are independent of each other.
- Model and sampling errors are independent of each other.
- Model and sampling errors are normally distributed and have a mean of zero.
- Predictor variables are independent.
- The cross-correlation of model errors can be described by the distance between catchment centroids, and the form of the associated correlation matrix is known for the calibration process.
One of the first steps in any FEH analysis is to obtain catchment descriptors (CDs). See the ‘Data input’ article for information on functions for importing CDs.
Once you have some catchment descriptors, you can start to use many of the FEH-based functions. The full process, covered shortly, is encompassed in the `QuickResults()` function. This provides results from a default pooling group for gauged or ungauged catchments, automatically accounts for urbanisation, and provides a two-donor transfer if `gauged = FALSE`. The default distribution is the generalised logistic distribution. Further details and assumptions are provided in the function's help file.
First, we’ll obtain some CDs for a site suitable for pooling, allowing us to conduct both a gauged estimate and an as-if ungauged estimate (i.e., treating the gauged site as if no flow record were available, using only catchment descriptors). The following code then provides estimates and plots for both the gauged and as-if ungauged cases.
For the gauged case, the plot shows the default gauged growth curve for site 55002. The results from the default enhanced single-site analysis at gauge 55002 are printed to the console. This is a list with four elements. The first is a data frame with the following columns: return period (RP), peak flow estimates (Q) and growth factor estimates (GF). Two additional columns quantify the 95% confidence interval of the peak flow estimates. The second element is the estimated L-CV and L-skew (linear coefficients of variation and skewness). The third and fourth elements provide distribution parameters (location, scale and shape for the default generalised logistic distribution). The third provides distribution parameters for the growth curve, and the fourth provides distribution parameters for the frequency curve.
# Get catchment descriptors for NRFA site 55002 and save to an object called CDs.55002
CDs.55002 <- GetCDs(55002)
# Obtain estimates and plots for the gauged scenario using default settings
QuickResults(CDs.55002, gauged = TRUE)
#> [[1]]
#> RP Q GF lower95 upper95
#> 1 2 410.053 1.000 383.959 436.147
#> 2 5 494.926 1.207 455.250 534.602
#> 3 10 553.498 1.350 502.620 604.376
#> 4 20 614.227 1.498 550.099 678.355
#> 5 50 701.803 1.711 615.528 788.078
#> 6 75 744.017 1.814 645.744 842.290
#> 7 100 775.451 1.891 667.672 883.230
#> 8 200 856.755 2.089 722.066 991.444
#> 9 500 977.811 2.385 796.648 1158.974
#> 10 1000 1081.070 2.636 853.953 1308.187
#>
#> [[2]]
#> PooledLcv PooledLSkew
#> [1,] 0.1347467 0.1515658
#>
#> [[3]]
#> loc scale shape
#> 1 1 0.134 -0.152
#>
#> [[4]]
#> loc scale shape
#> 1 410 55 -0.151
The fake ungauged quick results estimate removes the site of interest from the pooling group and implements an ungauged assessment. Here, the plot shows the default fake ungauged growth curve for site 55002. The same outputs are printed to the console, but this time they provide the 68% confidence interval.
# Obtain estimates and plots for the fake ungauged scenario using default settings
QuickResults(CDs.55002, FUngauged = TRUE)
#> [[1]]
#> RP Q GF lower68 upper68
#> 1 2 562.017 1.000 562.017 562.017
#> 2 5 724.757 1.290 721.151 728.381
#> 3 10 838.057 1.491 824.048 852.304
#> 4 20 956.271 1.701 923.933 989.740
#> 5 50 1127.930 2.007 1054.140 1206.885
#> 6 75 1211.127 2.155 1112.146 1318.917
#> 7 100 1273.253 2.266 1153.309 1405.671
#> 8 200 1434.598 2.553 1252.924 1642.615
#> 9 500 1676.419 2.983 1386.616 2026.791
#> 10 1000 1884.038 3.352 1489.358 2383.308
#>
#> [[2]]
#> PooledLcv PooledLSkew
#> [1,] 0.1853155 0.1596821
#>
#> [[3]]
#> loc scale shape
#> 1 1 0.187 -0.16
#>
#> [[4]]
#> loc scale shape
#> 1 562 105 -0.16
The default setting for the `QuickResults()` function is to treat the site as ungauged, in which case, for a catchment with no gauge, the code can simply be run with only the first argument (the CDs) provided. If the site is not urban and the CDs are from a site suitable for pooling, the site itself will be included (so it won't be truly ungauged unless we use `FUngauged = TRUE`).
Note that if the default ungauged setting is used, then ESS (enhanced single-site) weighting will not be applied, even if the gauge of interest is at the top of the pooling group. The gauge will be included but won't be weighted according to the ESS method. The QMED is estimated as if ungauged; however, because the CDs are the same as those of the closest donor (which are derived from the gauged site), the QMED will end up being the same as the observed one.
There is a `QMED()` function which provides the ungauged QMED (median annual maximum flow) estimate from catchment descriptors. There is an option to adjust this using one or two donor sites. The simplest case is to provide catchment descriptors with no donor transfer:
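# Estimate QMED from the catchment descriptors for site 55002 with no donor transfer
QMED(CDs.55002)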
By default, this gives rural QMED results. If the site is urbanised (has an URBEXT2000 value greater than 0.03), there will be a warning and a suggestion to change the `UrbAdj` argument to `TRUE`.
To undertake a donor transfer (with the aim of improving the estimate), a list of donor site candidates is necessary. For this, there is the `DonAdj()` function, which provides the closest N gauges (default of 10) to the site (geographically, by catchment centroid) that are suitable for QMED, along with their catchment descriptors. It also provides the donor-adjusted result beside each one in the last column.
# Identify donor catchments based on the catchment descriptors for site 55002
DonAdj(CDs.55002)
#> AREA ALTBAR ASPBAR ASPVAR BFIHOST19 DPLBAR DPSBAR FARL FPEXT
#> 55002 1894.2575 297 128 0.08 0.421 95.63 134.4 0.967 0.0693
#> 55007 1283.4025 343 132 0.09 0.387 42.94 152.4 0.960 0.0412
#> 55016 358.9450 324 157 0.07 0.398 32.92 132.2 0.998 0.0428
#> 55013 125.8975 303 103 0.20 0.468 14.84 130.1 0.999 0.0382
#> 55012 246.6575 333 136 0.11 0.380 20.67 158.6 0.997 0.0419
#> 56003 62.6575 311 175 0.21 0.449 10.94 121.0 0.999 0.0268
#> 56013 63.5450 349 151 0.18 0.406 11.95 137.9 1.000 0.0257
#> 55014 202.5450 299 122 0.18 0.540 17.88 158.5 0.996 0.0646
#> 55004 73.0125 412 152 0.19 0.342 11.67 188.7 1.000 0.0284
#> 55023 4016.2550 227 127 0.10 0.496 130.34 116.1 0.979 0.0824
#> LDP PROPWET RMED.1H RMED.1D RMED.2D SAAR SAAR4170 SPRHOST URBEXT2000
#> 55002 157.73 0.50 10.2 42.4 56.1 1230 1269 39.67 0.0029
#> 55007 84.45 0.53 10.3 44.7 60.0 1386 1413 42.23 0.0022
#> 55016 62.29 0.49 9.6 36.2 48.5 1066 1129 40.33 0.0031
#> 55013 28.33 0.49 9.8 37.5 48.7 962 1086 34.31 0.0046
#> 55012 39.53 0.65 10.8 50.0 66.4 1627 1639 42.75 0.0020
#> 56003 21.79 0.53 10.4 43.1 56.6 1171 1257 35.20 0.0026
#> 56013 23.08 0.61 10.7 45.5 59.5 1299 1426 38.25 0.0003
#> 55014 31.70 0.49 9.8 36.1 47.3 977 1062 31.41 0.0009
#> 55004 23.04 0.65 11.4 52.8 70.0 1845 1913 46.25 0.0033
#> 55023 244.16 0.38 10.1 38.7 50.3 1010 1054 35.60 0.0072
#> QMED QMEDcd X Y QMEDfse N URBEXT1990 BFIHOST Dists
#> 55002 410.0530 575.18962 306152 255939 1.039511 49 0.0020 0.472 0.00000
#> 55007 514.5350 520.28951 298496 263086 1.058186 83 0.0010 0.426 10.47349
#> 55016 120.1240 130.51968 309100 270212 1.037415 48 0.0018 0.427 14.57427
#> 55013 27.7575 36.84552 323593 254543 1.055332 54 0.0030 0.553 17.49678
#> 55012 191.0210 180.89006 289404 250198 1.061544 54 0.0004 0.431 17.70465
#> 56003 23.4560 30.39078 302460 237118 1.132355 21 0.0009 0.529 19.17970
#> 56013 34.5900 40.46358 297636 238410 1.035393 51 0.0000 0.495 19.48815
#> 55014 26.5040 45.03533 324893 265277 1.094494 55 0.0023 0.593 20.93856
#> 55004 56.5420 80.86561 284963 252747 1.075714 45 0.0009 0.402 21.42808
#> 55023 492.1000 660.63796 326243 248368 1.050621 53 0.0067 0.542 21.47017
#> a QMED.adj
#> 55002 1.0000000 410.0530
#> 55007 0.3765023 572.7861
#> 55016 0.3440466 558.9970
#> 55013 0.3241613 524.7323
#> 55012 0.3228052 585.3972
#> 56003 0.3133672 530.3481
#> 56013 0.3114327 547.7699
#> 55014 0.3025056 489.9611
#> 55004 0.2995537 516.7287
#> 55023 0.2993012 526.6564
You can apply two donors by selecting the two favoured candidates from the list and including a vector of the site references in the `Don2` argument of the `QMED()` function. The following then provides a two-donor adjusted QMED (the associated weights are also provided in the output):
# Estimate QMED from the catchment descriptors for site 55002 using gauges with NRFA
# IDs 55007 and 55016 as donor sites
QMED(CDs.55002, Don2 = c(55007, 55016))
#> QMEDs.adj a1 a2
#> 1 562.0169 0.2906814 0.2401811
The user should be aware of the following specifics about the UKFE implementation of QMED estimation (note, however, that R is inherently flexible, so any of the code can be adapted):
- The `QMED()` function is written to use a maximum of two donors. Note, however, that the `QMEDDonEq()` function can be used with any number of donors in turn; the user would need to choose the donor weights.
- The `DonAdj()` function only provides candidate donor sites from the NRFA "suitable for QMED" dataset. However, the `QMEDDonEq()` function can be applied, which is entirely flexible regarding inputs.
- By default, the `QMED()` function uses the URBEXT2000 catchment descriptor without an urban expansion. There is an argument within the function called `uef`, which is set to `FALSE`. This can be set to `TRUE` for the urban expansion to be applied for the present year. There is also the `UEF()` function, which can be applied independently and for any year.
- Urban donors should be avoided as far as possible. However, if an urban donor is deemed to be the best choice, the ungauged QMED estimate of the donor (or donors) can be urban-adjusted by setting the `DonUrbAdj` argument to `TRUE`.
To create a pooling group, the `Pool()` function can be used. If it is for a gauged site, the site will be automatically included in the group, unless it is urban. If you wish to include an urban site, the `iug` argument can be set to `TRUE`. In that case, it is also standard practice to set the `DeUrb` argument to `TRUE`, to de-urbanise the subject site so that the estimate can undergo urban adjustment afterwards. To create and view the pooling group:
# Create a pooling group based on the catchment descriptors for site 55002 and
# save to an object called Pool.55002
Pool.55002 <- Pool(CDs = CDs.55002)
# View the pooling group
Pool.55002
#> AREA ALTBAR ASPBAR ASPVAR BFIHOST19 DPLBAR DPSBAR FARL FPEXT LDP
#> 55002 1894.257 297 128 0.08 0.421 95.63 134.4 0.967 0.0693 157.73
#> 8010 1745.842 495 13 0.05 0.422 51.13 158.7 0.938 0.0612 102.29
#> 76007 2276.000 248 329 0.08 0.493 62.12 103.3 0.971 0.0741 125.17
#> 54005 2026.770 249 99 0.10 0.444 67.88 133.6 0.977 0.0919 121.66
#> 12002 1833.213 447 82 0.07 0.455 66.91 169.4 0.980 0.0483 127.69
#> 21006 1505.540 358 77 0.07 0.444 43.88 191.9 0.963 0.0443 83.97
#> 23001 2172.360 286 83 0.09 0.342 55.02 93.7 0.961 0.0504 94.29
#> 76002 1374.845 284 330 0.08 0.499 62.71 124.8 0.955 0.0619 107.50
#> 76017 1371.697 284 330 0.08 0.499 61.17 124.9 0.955 0.0615 105.81
#> 12001 1380.062 512 93 0.07 0.455 60.44 185.7 0.976 0.0468 107.57
#> PROPWET SAAR SPRHOST URBEXT2000 QMED Lcv LSkew LKurt
#> 55002 0.50 1230 39.67 0.0029 410.0530 0.1179085 0.13450244 0.09947790
#> 8010 0.69 1194 46.42 0.0012 240.8780 0.1726428 0.09243711 0.08754162
#> 76007 0.64 1182 37.80 0.0082 639.3560 0.1986134 0.23353973 0.22855384
#> 54005 0.50 1147 38.49 0.0042 308.6500 0.1424610 0.05356933 0.12785352
#> 12002 0.58 1080 39.69 0.0016 556.4020 0.1551762 0.03231110 0.12110222
#> 21006 0.58 1164 38.34 0.0023 398.4495 0.1888078 0.19790768 0.15568335
#> 23001 0.51 1016 48.19 0.0026 837.9620 0.1639191 0.14451529 0.17419227
#> 76002 0.65 1272 36.88 0.0050 413.4200 0.1907121 0.30184716 0.23553495
#> 76017 0.65 1273 36.89 0.0049 481.8050 0.2811401 0.40201113 0.28327368
#> 12001 0.62 1108 40.27 0.0010 436.8090 0.2067674 0.07502250 0.19674386
#> L1 L2 N SDM Discordancy Discordant?
#> 55002 432.3641 50.97940 49 0.0000000 1.3987422 FALSE
#> 8010 248.8548 42.96299 71 0.2409683 1.6668700 FALSE
#> 76007 678.3431 134.72801 56 0.2741162 0.4760075 FALSE
#> 54005 313.7110 44.69157 71 0.3075738 0.4442087 FALSE
#> 12002 550.6405 85.44631 33 0.3546197 0.5305789 FALSE
#> 21006 422.4297 79.75803 62 0.4392073 0.3612837 FALSE
#> 23001 869.9198 142.59646 67 0.4649502 0.2743309 FALSE
#> 76002 466.2935 88.92779 39 0.4661871 1.0987304 FALSE
#> 76017 606.2199 170.43270 21 0.4702818 2.0994326 FALSE
#> 12001 427.5946 88.41262 77 0.5497881 1.6498151 FALSE
You may now wish to assess the group for heterogeneity and see which distribution fits best. The `H2()` function can be used to check for heterogeneity.
This group appears to be homogeneous, but if a group is heterogeneous, you may wish to use the `DiagPlots()` function to check for any outliers etc. You can also look at the discordancy measures, which can be found in the second-to-last column of the pooling group created.
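For example, a minimal sketch using the pooling group created above:

# Produce diagnostic plots to compare the sites in the pooling group
DiagPlots(Pool.55002)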
Let's pretend that site 8010 and site 76017 are to be removed (because they have the highest discordancy; in reality, a more thorough analysis would be undertaken). To remove the sites, we recreate the pooling group and use the `exclude` option. The function automatically adds the next site(s) with the lowest similarity distance measure if the total number of years of AMAX in the group has dropped below 500 (or whatever we set as the `N` argument in the `Pool()` function, which has a default of 500).
# Recreate the pooling group, excluding NRFA sites 8010 and 76017
Pool.55002 <- Pool(CDs = CDs.55002, exclude = c(8010, 76017))
# View the new pooling group
Pool.55002
#> AREA ALTBAR ASPBAR ASPVAR BFIHOST19 DPLBAR DPSBAR FARL FPEXT LDP
#> 55002 1894.257 297 128 0.08 0.421 95.63 134.4 0.967 0.0693 157.73
#> 76007 2276.000 248 329 0.08 0.493 62.12 103.3 0.971 0.0741 125.17
#> 54005 2026.770 249 99 0.10 0.444 67.88 133.6 0.977 0.0919 121.66
#> 12002 1833.213 447 82 0.07 0.455 66.91 169.4 0.980 0.0483 127.69
#> 21006 1505.540 358 77 0.07 0.444 43.88 191.9 0.963 0.0443 83.97
#> 23001 2172.360 286 83 0.09 0.342 55.02 93.7 0.961 0.0504 94.29
#> 76002 1374.845 284 330 0.08 0.499 62.71 124.8 0.955 0.0619 107.50
#> 12001 1380.062 512 93 0.07 0.455 60.44 185.7 0.976 0.0468 107.57
#> 8006 2852.390 460 17 0.06 0.438 87.36 156.9 0.959 0.0525 162.27
#> PROPWET SAAR SPRHOST URBEXT2000 QMED Lcv LSkew LKurt
#> 55002 0.50 1230 39.67 0.0029 410.0530 0.1179085 0.13450244 0.0994779
#> 76007 0.64 1182 37.80 0.0082 639.3560 0.1986134 0.23353973 0.2285538
#> 54005 0.50 1147 38.49 0.0042 308.6500 0.1424610 0.05356933 0.1278535
#> 12002 0.58 1080 39.69 0.0016 556.4020 0.1551762 0.03231110 0.1211022
#> 21006 0.58 1164 38.34 0.0023 398.4495 0.1888078 0.19790768 0.1556834
#> 23001 0.51 1016 48.19 0.0026 837.9620 0.1639191 0.14451529 0.1741923
#> 76002 0.65 1272 36.88 0.0050 413.4200 0.1907121 0.30184716 0.2355350
#> 12001 0.62 1108 40.27 0.0010 436.8090 0.2067674 0.07502250 0.1967439
#> 8006 0.63 1119 43.98 0.0013 500.7330 0.1761823 0.16587645 0.1238761
#> L1 L2 N SDM Discordancy Discordant?
#> 55002 432.3641 50.97940 49 0.0000000 1.4730236 FALSE
#> 76007 678.3431 134.72801 56 0.2741162 0.6909721 FALSE
#> 54005 313.7110 44.69157 71 0.3075738 0.7047585 FALSE
#> 12002 550.6405 85.44631 33 0.3546197 0.6614546 FALSE
#> 21006 422.4297 79.75803 62 0.4392073 0.9304440 FALSE
#> 23001 869.9198 142.59646 67 0.4649502 0.2671468 FALSE
#> 76002 466.2935 88.92779 39 0.4661871 1.4005348 FALSE
#> 12001 427.5946 88.41262 77 0.5497881 1.5316138 FALSE
#> 8006 523.8306 92.28970 71 0.6306813 1.3400518 FALSE
# Check for heterogeneity
H2(Pool.55002)
#> [1] "0.954" "Group is homogenous"
Once you are happy with the pooling group, you can check for the best distribution. The `Zdists()` function can be used to do this. It compares the generalised extreme value (GEV), generalised logistic (GenLog), Gumbel and Kappa3 distributions. The method behind it is outlined in Hosking and Wallis (1997). Note that it is not the same as the method detailed in Science Report SC050050 (Kjeldsen et al., 2008). The difference is described in the function's help file and is necessary to encompass the two-parameter Gumbel distribution.
# Calculate goodness-of-fit measure
Zdists(Pool.55002)
#> [[1]]
#> GEV GenLog Gumbel Kappa3
#> 1 1.36 -1.04 0.783 -0.246
#>
#> [[2]]
#> [1] "Kappa3 has the best fit"
The output is a list, where the first element contains the Z-scores for each distribution, and the second element states which distribution has the best fit. In this case, the Kappa3 distribution appears to fit best.
The L-CV and L-skew for each site are provided in the pooling group and are derived from the NRFA AMAX data. If the user has further information, for example an extra year of data providing a new annual maximum, the L-CV and L-skew can be calculated using the `Lcv()` and `LSkew()` functions. The user's data can be pasted in or read into the R environment. Data can be read in using a function such as `read.csv()`, or a column of data can be pasted in from Excel by running the following code and then pressing "Ctrl+V" in the console (assuming you have already copied the required data):
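# One way to do this (a base R sketch) is with scan(), which reads numeric values
# typed or pasted into the console until a blank line is entered
MyData <- scan()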
This creates a numeric vector called `MyData` that contains your flow data. The `Lcv()` and `LSkew()` functions take a numeric vector of flows as their only input, so you can then use `Lcv(MyData)` and `LSkew(MyData)` to derive the L-moment ratios.
The relevant results can be updated in the pooling group by applying the `LRatioChange()` function. For example, if you wanted a new pooling group with updated L-moment ratios for site 55002, the following code could be used:
# Update the pooling group to use user-supplied L-CV and L-skew values for site 55002
UpdatedPool <- LRatioChange(Pool.55002, SiteID = 55002, lcv = 0.187, lskew = 0.164)
# View the updated pooling group
UpdatedPool
#> AREA ALTBAR ASPBAR ASPVAR BFIHOST19 DPLBAR DPSBAR FARL FPEXT LDP
#> 55002 1894.257 297 128 0.08 0.421 95.63 134.4 0.967 0.0693 157.73
#> 76007 2276.000 248 329 0.08 0.493 62.12 103.3 0.971 0.0741 125.17
#> 54005 2026.770 249 99 0.10 0.444 67.88 133.6 0.977 0.0919 121.66
#> 12002 1833.213 447 82 0.07 0.455 66.91 169.4 0.980 0.0483 127.69
#> 21006 1505.540 358 77 0.07 0.444 43.88 191.9 0.963 0.0443 83.97
#> 23001 2172.360 286 83 0.09 0.342 55.02 93.7 0.961 0.0504 94.29
#> 76002 1374.845 284 330 0.08 0.499 62.71 124.8 0.955 0.0619 107.50
#> 12001 1380.062 512 93 0.07 0.455 60.44 185.7 0.976 0.0468 107.57
#> 8006 2852.390 460 17 0.06 0.438 87.36 156.9 0.959 0.0525 162.27
#> PROPWET SAAR SPRHOST URBEXT2000 QMED Lcv LSkew LKurt
#> 55002 0.50 1230 39.67 0.0029 410.0530 0.1870000 0.16400000 0.0994779
#> 76007 0.64 1182 37.80 0.0082 639.3560 0.1986134 0.23353973 0.2285538
#> 54005 0.50 1147 38.49 0.0042 308.6500 0.1424610 0.05356933 0.1278535
#> 12002 0.58 1080 39.69 0.0016 556.4020 0.1551762 0.03231110 0.1211022
#> 21006 0.58 1164 38.34 0.0023 398.4495 0.1888078 0.19790768 0.1556834
#> 23001 0.51 1016 48.19 0.0026 837.9620 0.1639191 0.14451529 0.1741923
#> 76002 0.65 1272 36.88 0.0050 413.4200 0.1907121 0.30184716 0.2355350
#> 12001 0.62 1108 40.27 0.0010 436.8090 0.2067674 0.07502250 0.1967439
#> 8006 0.63 1119 43.98 0.0013 500.7330 0.1761823 0.16587645 0.1238761
#> L1 L2 N SDM Discordancy Discordant?
#> 55002 432.3641 50.97940 49 0.0000000 1.4730236 FALSE
#> 76007 678.3431 134.72801 56 0.2741162 0.6909721 FALSE
#> 54005 313.7110 44.69157 71 0.3075738 0.7047585 FALSE
#> 12002 550.6405 85.44631 33 0.3546197 0.6614546 FALSE
#> 21006 422.4297 79.75803 62 0.4392073 0.9304440 FALSE
#> 23001 869.9198 142.59646 67 0.4649502 0.2671468 FALSE
#> 76002 466.2935 88.92779 39 0.4661871 1.4005348 FALSE
#> 12001 427.5946 88.41262 77 0.5497881 1.5316138 FALSE
#> 8006 523.8306 92.28970 71 0.6306813 1.3400518 FALSE
To derive the final estimates, the `PoolEst()` function can be applied to the pooling group. It is necessary to provide an input for the `QMED` argument. Therefore, we will first obtain a QMED estimate (in this case, we will use the data available from the NRFA). To estimate QMED, we can either use the `GetAM()` function and assess the data manually, or we can get it more directly from the `QMEDData` data frame embedded within the package (these values are calculated as the median of the AMAX data, rejected years excluded, from the NRFA Peak Flow Dataset). The following code provides the latter:
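# Get the observed QMED for NRFA site 55002 from the QMEDData data frame
GetQMED(55002)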
We can use functions within functions, so we can include the above directly in `PoolEst()`:
# Derive the growth curve and final flow estimates based on the pooling group and QMED
PoolEst(Pool.55002, QMED = GetQMED(55002), gauged = TRUE)
#> [[1]]
#> RP Q GF lower95 upper95
#> 1 2 410.053 1.000 383.959 436.147
#> 2 5 492.945 1.202 453.269 532.621
#> 3 10 549.716 1.341 498.838 600.594
#> 4 20 608.258 1.483 544.130 672.386
#> 5 50 692.178 1.688 605.903 778.453
#> 6 75 732.440 1.786 634.167 830.713
#> 7 100 762.347 1.859 654.568 870.126
#> 8 200 839.430 2.047 704.741 974.119
#> 9 500 953.549 2.325 772.386 1134.712
#> 10 1000 1050.346 2.561 823.229 1277.463
#>
#> [[2]]
#> PooledLcv PooledLSkew
#> [1,] 0.1320962 0.1445543
#>
#> [[3]]
#> loc scale shape
#> 1 1 0.132 -0.144
#>
#> [[4]]
#> loc scale shape
#> 1 410 54 -0.144
The results are printed to the console. This is a list with four elements. The first is a data frame with the following columns: return period (RP), peak flow estimates (Q) and growth factor estimates (GF), plus the confidence interval of the peak flow estimates. The interval is derived from the factorial standard error, which is calculated using methods detailed in Circulation edition 152 (the British Hydrological Society magazine). The second element is the estimated pooled L-CV and L-skew (linear coefficients of variation and skewness). The third and fourth elements provide parameters (location, scale, and shape) for the standardised and non-standardised distributions, respectively. If the RP argument has been supplied by the user rather than left as the default, then only the first two elements are returned.
The user should be aware of the following specifics about how UKFE employs the pooled method:
The default QMED FSE is set to 1.55, as recommended in Statistical procedures for flood frequency estimation (Flood Estimation Handbook, Volume 3, Section 13.9.2). This value represents the factorial standard error associated with the combined model and sampling uncertainty of the QMED model. You can update FSE if donor catchments are applied or if the QMED is estimated from a gauged sample.
The default distribution applied is the generalised logistic distribution.
The final FSE comprises the FSE associated with QMED estimation and growth curve estimation. The FSE associated with the growth curve quantifies sampling error (aleatoric uncertainty) only. The FSE associated with QMED includes model uncertainty. In general, the uncertainty associated with the modelling process or the hydrometric data collection is not included in the FSE. Many choices are made during the process, and deriving a range of results from a range of justifiable choices is a way to quantify some of the epistemic uncertainty.
By default, urban adjustment for the L-CV and L-skew is not applied. If the user thinks these adjustments are necessary, they can set the `UrbAdj` argument to `TRUE`.
The `EVPool()` function can be used to plot a growth curve (which should look like the one from the `QuickResults()` example above).
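For example, a sketch for the gauged pooling group created above (the `gauged` argument is assumed here to behave as in `PoolEst()`; check the function's help file):

# Plot the pooled growth curve for the gauged case
EVPool(Pool.55002, gauged = TRUE)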
For the ungauged catchment, the same process can be followed. First, we’ll exclude the site from our pooling group. We will assume we went through a rigorous pooling group analysis and concluded all was well:
# Recreate the pooling group, excluding NRFA gauge 55002 to make it an ungauged analysis
PoolUG.55002 <- Pool(CDs.55002, exclude = 55002)
# View the new pooling group
PoolUG.55002
#> AREA ALTBAR ASPBAR ASPVAR BFIHOST19 DPLBAR DPSBAR FARL FPEXT LDP
#> 8010 1745.842 495 13 0.05 0.422 51.13 158.7 0.938 0.0612 102.29
#> 76007 2276.000 248 329 0.08 0.493 62.12 103.3 0.971 0.0741 125.17
#> 54005 2026.770 249 99 0.10 0.444 67.88 133.6 0.977 0.0919 121.66
#> 12002 1833.213 447 82 0.07 0.455 66.91 169.4 0.980 0.0483 127.69
#> 21006 1505.540 358 77 0.07 0.444 43.88 191.9 0.963 0.0443 83.97
#> 23001 2172.360 286 83 0.09 0.342 55.02 93.7 0.961 0.0504 94.29
#> 76002 1374.845 284 330 0.08 0.499 62.71 124.8 0.955 0.0619 107.50
#> 76017 1371.697 284 330 0.08 0.499 61.17 124.9 0.955 0.0615 105.81
#> 12001 1380.062 512 93 0.07 0.455 60.44 185.7 0.976 0.0468 107.57
#> 8006 2852.390 460 17 0.06 0.438 87.36 156.9 0.959 0.0525 162.27
#> PROPWET SAAR SPRHOST URBEXT2000 QMED Lcv LSkew LKurt
#> 8010 0.69 1194 46.42 0.0012 240.8780 0.1726428 0.09243711 0.08754162
#> 76007 0.64 1182 37.80 0.0082 639.3560 0.1986134 0.23353973 0.22855384
#> 54005 0.50 1147 38.49 0.0042 308.6500 0.1424610 0.05356933 0.12785352
#> 12002 0.58 1080 39.69 0.0016 556.4020 0.1551762 0.03231110 0.12110222
#> 21006 0.58 1164 38.34 0.0023 398.4495 0.1888078 0.19790768 0.15568335
#> 23001 0.51 1016 48.19 0.0026 837.9620 0.1639191 0.14451529 0.17419227
#> 76002 0.65 1272 36.88 0.0050 413.4200 0.1907121 0.30184716 0.23553495
#> 76017 0.65 1273 36.89 0.0049 481.8050 0.2811401 0.40201113 0.28327368
#> 12001 0.62 1108 40.27 0.0010 436.8090 0.2067674 0.07502250 0.19674386
#> 8006 0.63 1119 43.98 0.0013 500.7330 0.1761823 0.16587645 0.12387605
#> L1 L2 N SDM Discordancy Discordant?
#> 8010 248.8548 42.96299 71 0.2409683 1.2961436 FALSE
#> 76007 678.3431 134.72801 56 0.2741162 0.4807292 FALSE
#> 54005 313.7110 44.69157 71 0.3075738 0.5870875 FALSE
#> 12002 550.6405 85.44631 33 0.3546197 0.5260091 FALSE
#> 21006 422.4297 79.75803 62 0.4392073 0.2908117 FALSE
#> 23001 869.9198 142.59646 67 0.4649502 0.4057173 FALSE
#> 76002 466.2935 88.92779 39 0.4661871 1.3895692 FALSE
#> 76017 606.2199 170.43270 21 0.4702818 2.2455011 FALSE
#> 12001 427.5946 88.41262 77 0.5497881 2.0897338 FALSE
#> 8006 523.8306 92.28970 71 0.6306813 0.6886975 FALSE
# Estimate QMED from the catchment descriptors with two donors (note that the QMED
# value needs to be extracted using `$QMEDs.adj` as there are three elements to the
# output when two donors are used and the PoolEst function expects just the QMED value)
CDsQmed.55002 <- QMED(CDs.55002, Don2 = c(55007, 55016))$QMEDs.adj
# Use the pooling group and the ungauged QMED estimate to provide the pooled estimates
Results55002 <- PoolEst(PoolUG.55002, QMED = CDsQmed.55002)
# Print the results
Results55002
#> [[1]]
#> RP Q GF lower68 upper68
#> 1 2 562.017 1.000 384.943 820.545
#> 2 5 724.757 1.290 493.939 1063.436
#> 3 10 838.057 1.491 564.417 1244.364
#> 4 20 956.271 1.701 632.831 1445.021
#> 5 50 1127.930 2.007 722.014 1762.052
#> 6 75 1211.127 2.155 761.744 1925.619
#> 7 100 1273.253 2.266 789.938 2052.280
#> 8 200 1434.598 2.553 858.167 2398.217
#> 9 500 1676.419 2.983 949.737 2959.114
#> 10 1000 1884.038 3.352 1020.108 3479.630
#>
#> [[2]]
#> PooledLcv PooledLSkew
#> [1,] 0.1853155 0.1596821
#>
#> [[3]]
#> loc scale shape
#> 1 1 0.187 -0.16
#>
#> [[4]]
#> loc scale shape
#> 1 562 105 -0.16
# Plot a growth curve for the pooling group
EVPool(PoolUG.55002)
The same results are provided as for the gauged case.
The quickest way to get results for a single sample is to use the `GEV()`, `GenLog()`, `Kappa3()`, `Gumbel()` and `GenPareto()` functions. For example, the following code can be used to obtain a 75-year return period estimate for an AMAX sample called `MyAMAX` using the generalised logistic distribution:
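# Estimate the 75-year flow from an AMAX sample using the generalised logistic
# distribution (MyAMAX is a placeholder for your own numeric vector of AMAX flows)
GenLogAM(MyAMAX, RP = 75)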
An annual maximum series can be obtained for sites suitable for pooling using the `GetAM()` or `AMImport()` function (see the 'Data input' article). We will retrieve and plot the AMAX for our site 55002:
# Extract the AMAX data for NRFA site 55002
AM.55002 <- GetAM(55002)
# View the head of the AMAX series
head(AM.55002)
#> Date Flow id
#> 1 1974-02-12 457.894 55002
#> 2 1975-01-22 410.053 55002
#> 3 1975-12-02 364.079 55002
#> 4 1977-02-03 286.690 55002
#> 5 1977-11-20 318.908 55002
#> 6 1978-12-14 397.771 55002
# Plot the AMAX data
AMplot(AM.55002)
The `AMplot()` function returns a bar plot of the AMAX series.
There are four functions for each of the GEV, GenLog, Gumbel, Kappa3, and GenPareto distributions. In the function names below, `XXX` should be replaced by the distribution prefix (`GEV`, `GenLog`, `Gumbel`, `Kappa3`, or `GenPareto`). For the generalised Pareto distribution, the first function is named `GenParetoPOT()` instead of `GenParetoAM()`.
- `XXXAM()` - estimates return levels or return periods directly from an AMAX (or POT for GenPareto) series.
- `XXXEst()` - estimates return levels or return periods from user-supplied parameters.
- `XXXGF()` - provides the growth factor as a function of L-CV, L-skew, and RP (the Gumbel version only requires the L-CV and RP).
- `XXXPars()` - estimates distribution parameters from an AMAX or POT series (using L-moments, or MLE in some cases).
We can look at the fit of the distributions using the `ERPlot()` or `EVPlot()` functions. By default, `ERPlot()` compares the percentage difference between the simulated results and the observed results for each rank of the data. Another option (see the `ERType` argument) compares the simulated flows for each rank of the sample with the observed flows of the same rank. For both plots, 500 simulated samples are used. The function's help file provides more information, including how to compare the distribution estimated from pooling with the observed data. This is an updated version of the approach described in Hammond (2019).
The `EVPlot()` function plots the extreme value frequency curve or growth curve with the observed sample points. The uncertainty is quantified by bootstrapping.
The following code generates an extreme rank plot and an extreme value plot using the GEV distribution:
# Generate an extreme rank plot for the AMAX data for NRFA site 55002 and the GEV
# distribution and set a user-defined title (using the 'main' argument)
ERPlot(AM.55002$Flow, dist = "GEV", main = "Extreme rank plot for NRFA site 55002: GEV")
# Generate an extreme value plot for the AMAX data for NRFA site 55002 and the GEV
# distribution and set a user-defined title (using the 'Title' argument)
EVPlot(AM.55002$Flow, dist = "GEV", Title = "Extreme value plot for NRFA site 55002: GEV")
As an example, we can estimate the 100-year flow and then the return period of the maximum observed flow. Make sure to select the "Flow" column.
# Estimate the 100-year flow directly from the AMAX data for NRFA site 55002 using
# the GEV distribution
GEVAM(AM.55002$Flow, RP = 100)
#> [1] 705.0645
# Estimate the return period of the maximum observed flow directly from the AMAX
# data using the GEV distribution
GEVAM(AM.55002$Flow, q = max(AM.55002$Flow))
#> [1] 83.23851
We can also estimate the parameters of the GEV from the data, using the default L-moments, and estimate quantiles as a function of return period (and vice versa) from user input parameters.
# Estimate the parameters of the GEV distribution from the AMAX data for NRFA site 55002
GEVPars55002 <- GEVPars(AM.55002$Flow)
# View the parameters
GEVPars55002
#> Loc Scale Shape
#> 1 391.8503 77.26544 0.05618528
# Estimate the 75-year flow using these GEV parameters
GEVEst(loc = GEVPars55002$Loc, scale = GEVPars55002$Scale, shape = GEVPars55002$Shape, RP = 75)
#> [1] 687.6577
If we derive L-moments for the site, we can then estimate growth factors. The `LMoments()` function does this. We can also calculate the L-CV and L-skew separately using the `Lcv()` and `LSkew()` functions.
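For example, the L-moments and L-moment ratios for the 55002 AMAX sample can be derived as follows (a minimal sketch):

# Calculate L-moments and L-moment ratios for the AMAX sample
LMoments(AM.55002$Flow)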
# Estimate the 75-year GEV growth factor from the L-moments for the AMAX data for
# NRFA site 55002
GEVGF(lcv = Lcv(AM.55002$Flow), lskew= LSkew(AM.55002$Flow), RP = 75)
#> [1] 1.63775
We can estimate the 75-year flow by combining this latter result with QMED (the median of the AMAX record).
# Estimate the 75-year flow by multiplying the growth factor by the median AMAX flow
GEVGF(lcv = Lcv(AM.55002$Flow), lskew= LSkew(AM.55002$Flow), RP = 75) * median(AM.55002$Flow)
#> [1] 671.5643
This gives a different 75-year flow estimate from estimating it directly using the GEV parameters because here we are incorporating QMED from the data.
A peaks-over-threshold (POT) series can also be obtained from a time series using the `POTextract()` function. A daily sample of the Thames at Kingston flow is included in the package for example purposes (the data frame `ThamesPQ`). The `POTextract()` function can be used on a single vector or on a data frame with dates in the first column and the time series of interest in the second. In the `ThamesPQ` data frame, the date is in the first column and the Thames flow is in the third. Therefore, the POT series can be extracted as follows:
# Extract a POT series from the Thames at Kingston daily mean flow data, selecting
# the first and third columns from the ThamesPQ data frame which contain the date
# and flow data respectively. The threshold is set as the 90th percentile.
POT.Thames <- POTextract(ThamesPQ[, c(1, 3)], thresh = 0.9)
#> [1] "Peaks per year: 1.867263"
# View the output
POT.Thames
#> Date peak
#> 3 2000-11-07 440.0
#> 25 2002-02-05 316.0
#> 34 2003-01-02 461.0
#> 40 2004-01-13 227.0
#> 41 2004-02-02 238.0
#> 43 2004-05-05 194.0
#> 56 2007-03-07 330.0
#> 58 2007-07-26 274.0
#> 59 2007-11-20 196.0
#> 62 2007-12-09 201.0
#> 64 2008-01-16 362.0
#> 66 2008-03-17 239.0
#> 67 2008-06-04 232.0
#> 68 2008-11-11 231.0
#> 69 2008-12-14 286.0
#> 72 2009-02-11 369.0
#> 77 2010-01-18 312.0
#> 82 2010-03-01 309.0
#> 83 2010-04-04 178.0
#> 84 2011-01-18 289.0
#> 85 2012-05-01 260.0
#> 87 2012-06-12 216.0
#> 88 2012-11-05 203.0
#> 91 2012-12-26 407.0
#> 106 2014-02-09 502.5
#> 110 2014-04-27 186.4
#> 111 2014-11-24 192.3
#> 113 2015-01-16 250.6
This produces a figure showing the daily mean flow data with peaks over the threshold shown as red circles. The blue line is the threshold, and the green line is the mean flow. Independent peaks are chosen above the threshold: peaks separated by a drop below the mean flow are considered independent. More details can be found in the function's help file (including other options for defining independence and how to rename the axis labels and plot title).
Note that you can also use the `POTt()` function for extracting peaks over a threshold. It works significantly faster but only has the option of separating peaks by a number of timesteps.
The POT approach is similar to the AMAX approach, except that we need to convert from the POT RP scale to the annual RP scale. To do so, the functions have a peaks-per-year (`ppy`) argument. The `POTextract()` function provides the number of peaks per year (in this case, 1.867). To estimate the associated 100-year daily mean flow peak, the GenPareto function can be used as follows:
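# Estimate the 100-year peak daily mean flow from the POT series, supplying the
# peaks-per-year rate reported by POTextract (usage assumed from the description
# above; see the GenParetoPOT help file)
GenParetoPOT(POT.Thames$peak, ppy = 1.867, RP = 100)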
Methods for quantifying aleatoric (sampling) uncertainty are provided here for three estimation methods: single-site, enhanced single-site (or gauged pooling) and ungauged pooling. The uncertainty quantification is predicated on forming a sampling distribution for the statistic of interest, and in all three cases it is done using bootstrapping. In the pooling approaches (gauged and ungauged), the whole pooling group is bootstrapped, and the weighted estimates are recalculated on every iteration.
Extract an annual maximum sample with the following code (here we reuse NRFA site 55002 from earlier):
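# Extract the AMAX sample for NRFA site 55002
AM.55002 <- GetAM(55002)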
The uncertainty for the 100-year flow can be estimated using the `Bootstrap()` function. The 95% confidence intervals are provided by default, but this can be adjusted using the `Conf` argument. The lower limit is provided in the 'Lower' column and the upper limit in the 'Upper' column.
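As a sketch (the `Stat` argument, which takes a function of the sample, is assumed here; see the `Bootstrap()` help file):

# Bootstrap the sampling distribution of the 100-year GEV estimate for site 55002
# (the Stat argument name is an assumption - check the help file)
Bootstrap(AM.55002$Flow, Stat = function(x) GEVAM(x, RP = 100))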
Note that the `Bootstrap()` function can also be applied for any function/statistic, and the sampling distribution developed can be returned.
The `Uncertainty()` function can be used for the pooled estimates to quantify the sampling uncertainty. The estimated QMED and the pooling group are required. Furthermore, the factorial standard error needs to be provided for the ungauged case, using the `fseQMED` argument (the default is 1.55, as recommended in Statistical procedures for flood frequency estimation (Flood Estimation Handbook, Volume 3, Section 13.9.2)). Note that by default the generalised logistic distribution is selected for the estimates, but this can be changed.
The following example, which estimates uncertainty for the ungauged pooled estimates for NRFA site 55002, uses the catchment descriptors object created earlier (`CDs.55002`). This provides the 95% confidence interval by default, and this can be changed using the `Conf` argument. The lower limit is provided in the 'Lower' column and the upper limit in the 'Upper' column. There is an option to plot the flood frequency curve (return level plot), which includes the confidence interval.
# Estimate QMED for site 55002 from catchment descriptors using two donors
QMEDEst <- QMED(CDs.55002, Don2 = c(55007, 55016))$QMEDs.adj
# Create an ungauged pooling group for site 55002
PoolUG.55002 <- Pool(CDs.55002, exclude = 55002)
# Estimate the uncertainty of the pooling results
Uncertainty(PoolUG.55002, qmed = QMEDEst)
#> RP Central Lower Upper
#> 1 2 562 250 1260
#> 2 5 725 311 1590
#> 3 10 838 351 1850
#> 4 20 956 393 2060
#> 5 50 1130 451 2630
#> 6 75 1210 478 2910
#> 7 100 1270 498 3170
#> 8 200 1430 541 3830
#> 9 500 1680 587 5180
#> 10 1000 1880 615 6750
The enhanced single-site case is similar to the above, but the `Gauged` argument is set to `TRUE`, and there is no need to set `qmed`; it is derived from the observed AMAX if `Gauged = TRUE`.
It is important to note that when `Gauged` equals `TRUE`, the top site in the pooling group is assumed to be the "gauged" site of interest. This will happen by default because the similarity distance measure should be zero, assuming that the "gauged" site is within the `NRFAData` data frame.
The following example uses the gauged pooling group we previously derived:
# Create a gauged pooling group for site 55002
Pool.55002 <- Pool(CDs = CDs.55002, exclude = c(8010, 76017))
The uncertainty for a range of return levels can then be quantified as follows:
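# Quantify the uncertainty of the enhanced single-site (gauged) pooled estimates;
# this call is implied by the description above (QMED is derived from the AMAX)
Uncertainty(Pool.55002, Gauged = TRUE)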
#> RP Central Lower Upper
#> 1 2 410 385 439
#> 2 5 493 457 531
#> 3 10 550 504 599
#> 4 20 608 551 669
#> 5 50 692 615 772
#> 6 75 732 645 825
#> 7 100 762 668 866
#> 8 200 839 721 980
#> 9 500 954 789 1150
#> 10 1000 1050 844 1300
For use outside of ‘R’, outputs can be saved as objects and then written to CSV files. For example, to save the peak flow estimates from earlier:
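# Write the pooled flow estimates (the first element of the results list created
# earlier) to a CSV file; the file name here is illustrative
write.csv(Results55002[[1]], "Results55002.csv", row.names = FALSE)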