---
title: "Data input"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Data-input}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Data within the UKFE package

The UKFE package includes several datasets that can be used in analyses. These are based on data from the National River Flow Archive (NRFA). A pre-processing script, found in the 'inst' folder of the package, converts each new release of the NRFA Peak Flow Dataset into data frames suitable for use within UKFE. UKFE is updated shortly after each release to use the latest data. Users can also input their own data.

The UKFE package contains five datasets. These are:

* `AMSP`: This contains annual maximum data from the NRFA for sites suitable for pooling. The data are taken from the AM files, with years classed as rejected removed. This is a data frame with three columns: the date, the annual maximum peak flow and the NRFA gauge ID.

* `NRFAData`: This contains catchment descriptors from the NRFA and calculated statistics for sites suitable for pooling. The statistics include L-moments, L-moment ratios, sample size and QMED, all derived from the AMAX data with rejected years removed. The user can temporarily edit this data frame.

* `QMEDData`: This contains catchment descriptors from the NRFA and calculated statistics for sites suitable for QMED or pooling. The statistics include QMED derived from the AMAX data with rejected years removed (`QMED`) and its factorial standard error (`QMEDfse`), QMED derived from catchment descriptors (`QMEDcd`), and the sample size of the AMAX data with rejected years removed (`N`). The user can temporarily edit this data frame.

* `ThamesPQ`: This contains daily flow and catchment rainfall for the Thames at Kingston catchment from 2000-10-01 to 2015-09-30. There are three columns containing date (`Date`), precipitation (`P`) and daily mean flow (`Q`). Dates are in the format YYYY-MM-DD, following the ISO 8601 international standard. The data are from the NRFA (gauge 39001).

* `UKOutline`: This contains the eastings and northings around the coastline of the UK. The data are sourced from https://environment.data.gov.uk/.

Each dataset has a help file (for example, `?QMEDData`). A dataset can be viewed by typing its name into the console, or saved to an object and viewed from there:

```{r setup}
# Load the package
library(UKFE)
```

```{r}
# Save the 'QMEDData' data frame within the UKFE package to an object within your R 
# environment
QMEDData <- QMEDData

# View the first rows of the data in the console
head(QMEDData)
```

Users can also supply their own data for analyses; however, AM files need to be in the same format as those from the NRFA. Catchment descriptors for ungauged sites can be imported from XML files; these should come from the FEH Web Service or the NRFA, or follow the same format.
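The L-moments and L-moment ratios held in `NRFAData` summarise each AMAX sample. As a rough illustration of what these statistics are, the sketch below computes the first four sample L-moments and the L-moment ratios from a vector of annual maxima, using the unbiased probability-weighted moment estimators. It is written in base R and is not the package's own implementation; the function name `LmomSketch()` and the example flows are hypothetical.

```{r}
# Hypothetical helper, not part of UKFE: sample L-moments and L-moment ratios
# of a vector of annual maxima, from unbiased probability-weighted moments
LmomSketch <- function(x) {
  x <- sort(x[!is.na(x)])  # order statistics, with missing values dropped
  n <- length(x)
  if (n < 4) stop("At least four values are needed for the fourth L-moment")
  j <- seq_len(n)
  # Unbiased probability-weighted moments b0 to b3
  b0 <- mean(x)
  b1 <- sum((j - 1) / (n - 1) * x) / n
  b2 <- sum((j - 1) * (j - 2) / ((n - 1) * (n - 2)) * x) / n
  b3 <- sum((j - 1) * (j - 2) * (j - 3) /
              ((n - 1) * (n - 2) * (n - 3)) * x) / n
  # L-moments as linear combinations of the probability-weighted moments
  L1 <- b0
  L2 <- 2 * b1 - b0
  L3 <- 6 * b2 - 6 * b1 + b0
  L4 <- 20 * b3 - 30 * b2 + 12 * b1 - b0
  c(L1 = L1, L2 = L2, L3 = L3, L4 = L4,
    LCV = L2 / L1, LSKEW = L3 / L2, LKURT = L4 / L2)
}

# Example with made-up annual maxima (m3/s)
LmomSketch(c(120, 95, 210, 150, 180, 88, 132, 175, 240, 110))
```

The ratios `LCV`, `LSKEW` and `LKURT` are L2/L1, L3/L2 and L4/L2 respectively, which makes them independent of the scale of the flows.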

# Functions within the UKFE package for importing data

This section sets out the functions available for importing data.

## Annual maximum data

An annual maximum series can be obtained for sites suitable for pooling using the `GetAM()` function, which extracts data from the `AMSP` data frame embedded in the UKFE package. For other AMAX series available from the NRFA Peak Flow Dataset, use either the `AMImport()` function or the `GetDataNRFA()` function (with `Type = "AMAX"`). `AMImport()` imports the data from the AM files and excludes the years classed as rejected, whereas `GetDataNRFA()` extracts the AMAX via the NRFA API. If you have a flow time series, the `AnnualStat()` function can be used to extract the water year AMAX (or any other annual statistic of interest). The following example uses `GetAM()`.

```{r fig.alt="Bar chart of annual maximum river flow. The x-axis shows years, and the y-axis shows peak flow in cubic metres per second. Each bar represents the highest flow in that year. The flows vary from year to year, with several notably high peaks in recent years."}
# Extract the AMAX data for NRFA site 55002 and save to an object called 'AM.55002'
AM.55002 <- GetAM(55002)

# View the head of the AMAX series
head(AM.55002)

# Plot the AMAX data
AMplot(AM.55002)
```

The `AMplot()` function returns a time series bar plot of the AMAX series.
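Where only a daily mean flow series is available, the idea behind a water-year maximum can be illustrated in base R. The sketch below assumes a data frame with `Date` and `Q` columns (the layout of `ThamesPQ`) and, as an assumption for this example, water years running from 1 October to 30 September, labelled by the calendar year in which they start. The function name `WaterYearMax()` is hypothetical; `AnnualStat()` remains the package's tool for this, with its own options and conventions.

```{r}
# Hypothetical helper, not part of UKFE: maximum daily flow in each water year,
# assuming water years run 1 October to 30 September and are labelled by the
# calendar year in which they start
WaterYearMax <- function(df) {
  dates <- as.Date(df$Date)
  yr <- as.integer(format(dates, "%Y"))
  mon <- as.integer(format(dates, "%m"))
  # October to December belong to the water year starting that calendar year;
  # January to September belong to the one that started the previous year
  wy <- ifelse(mon >= 10, yr, yr - 1)
  am <- tapply(df$Q, wy, max, na.rm = TRUE)
  data.frame(WaterYear = as.integer(names(am)), AMAX = as.numeric(am))
}

# Example with a synthetic daily series covering two water years
set.seed(1)
dates <- seq(as.Date("2000-10-01"), as.Date("2002-09-30"), by = "day")
flows <- data.frame(Date = dates, Q = rgamma(length(dates), shape = 2, rate = 0.1))
WaterYearMax(flows)
```

With UKFE loaded, `WaterYearMax(ThamesPQ)` would apply the same logic to the embedded Thames data. Note that a maximum of daily mean flows is generally lower than the instantaneous peak flows recorded in the NRFA AMAX series.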

## Catchment descriptors

Catchment descriptors (CDs) from the NRFA can be brought into the R environment using the `GetCDs()` function. For gauged sites that are suitable for pooling or QMED, these are extracted from the `QMEDData` data frame; otherwise, they are extracted using the NRFA API. Note that when they come from the NRFA API, some of the descriptors differ; for example, the gauge location is provided rather than the catchment centroid. A warning message is shown when this happens. The following example views the catchment descriptors for the gauge with NRFA ID 39001:

```{r}
# Extract and view catchment descriptors for NRFA gauge 39001
GetCDs(39001)
```

It is useful to store them as a named object for use with other functions. Assign them to a name using `<-`. For example:

```{r}
# Extract catchment descriptors for NRFA gauge 39001 and store in an object called 
# 'CDs.39001'
CDs.39001 <- GetCDs(39001)
```

Then, when you wish to view them, the object name `CDs.39001` can be entered into the console.

To derive CDs from an XML file for catchments that are not suitable for pooling or QMED, or are not gauged at all, use the `CDsXML()` function with the file path as its argument. On Windows, either change the backslashes in the path to forward slashes or write the path as a raw string (available from R 4.0.0): `r"{my\file\path}"`. For example, you can import descriptors downloaded from the FEH Web Service as follows:

```{r, eval = FALSE}
# Extract catchment descriptors from an XML file and store in an object called 
# 'CDs.MySite'
CDs.MySite <- CDsXML("C:/Data/FEH_Catchment_384200_458200.xml")

# As above but retaining backslashes in the file path
CDs.MySite <- CDsXML(r"{C:\Data\FEH_Catchment_384200_458200.xml}")
```

Or, if importing CDs from the NRFA Peak Flow Dataset:

```{r, eval = FALSE}
# Extract catchment descriptors from an XML file and store in an object called 
# 'CDs.27003' (forward slashes, because "\D" is not a valid escape in an R string)
CDs.27003 <- CDsXML("C:/Data/NRFAPeakFlow_v13-0-2/suitable-for-neither/027003.xml")
```
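The ways of writing a Windows path described above can be compared directly in base R. The sketch below uses the hypothetical path from the earlier example and shows that a raw string and a string with escaped backslashes give the same value, and that `gsub()` or `file.path()` produce the forward-slash form. Raw strings require R 4.0.0 or later.

```{r}
# Hypothetical Windows file path, written in equivalent ways
p.raw <- r"{C:\Data\FEH_Catchment_384200_458200.xml}"     # raw string (R >= 4.0.0)
p.escaped <- "C:\\Data\\FEH_Catchment_384200_458200.xml"  # each backslash escaped
identical(p.raw, p.escaped)

# Replace each backslash with a forward slash (fixed = TRUE: literal match)
p.forward <- gsub("\\", "/", p.raw, fixed = TRUE)
p.forward

# file.path() joins components with forward slashes
file.path("C:", "Data", "FEH_Catchment_384200_458200.xml")
```

Any of these forms can then be passed to `CDsXML()`.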

## Other hydrological data retrieval functions using APIs

There are several functions with names starting with `GetData` that extract data from the websites of different organisations using their APIs. These are: 

* `GetDataEA_QH()`: Extracts flow or level data from the Environment Agency's Hydrology Data Explorer.

* `GetDataEA_Rain()`: Extracts rainfall data from the Environment Agency's Hydrology Data Explorer.

* `GetDataMetOffice()`: Extracts regional mean (monthly, seasonal and annual) temperature or rainfall from the UK Met Office. Sunshine duration is also available.

* `GetDataNRFA()`: Extracts National River Flow Archive data (daily mean flow or catchment rainfall, AMAX, POT, gaugings and metadata).

* `GetDataSEPA_QH()`: Extracts flow or level data from the Scottish Environment Protection Agency.

* `GetDataSEPA_Rain()`: Extracts hourly rainfall data from the Scottish Environment Protection Agency.

Each function's help file contains examples of its use.

## QMED

The `GetQMED()` function retrieves QMED for a gauge from the `QMEDData` data frame, where it is derived from the AMAX data. If the gauge is not in that data frame, the function automatically imports the AMAX data using `GetAM()` and calculates the median.
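QMED is the median of the annual maximum series, so the fallback calculation described above amounts to taking the median of the AMAX flows. The sketch below shows that step in base R with made-up flows for a hypothetical gauge; it does not reproduce any other checks `GetQMED()` may carry out.

```{r}
# Made-up annual maximum flows (m3/s) for a hypothetical gauge
am.flows <- c(120, 95, 210, 150, 180, 88, 132, 175, 240, 110)

# QMED is the median annual maximum flow; with an even number of years,
# median() averages the two middle values of the sorted series
qmed <- median(am.flows)
qmed
```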