---
title: "A short introduction to climodr"
output:
  rmarkdown::html_vignette:
    toc: TRUE
    toc_depth: 3
    number_sections: TRUE
    highlight: tango
vignette: >
  %\VignetteIndexEntry{climodr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

Version 0.0.0.9003

# Introduction to climodr

Welcome to the climate modeler in R, climodr for short. This package uses point data from climate stations, spectral imagery and elevation models to automatically create ready-to-use climate maps.\
The idea of climodr is to deliver an easy-to-use method for creating high-quality climate maps, like the one we create in this vignette:

![Temperature Map produced with climodr](map_vignette.png)

Note: This example is built with dummy data and will not produce a good model; it is for educational purposes only.\
\
Let's take a look at the basic structure of climodr:

![Climodr Structure](Climodr_Structure.png)

Climodr is split into four steps:\
Environment, Pre-Processing, Processing and Plotting.\
With only these four steps climodr creates basic but robust climate models and climate maps, without making the workflow too complicated for the user.\
This vignette guides you through the package, explains its functions and gives you an idea of how to use climodr. The example contains a few dummy climate stations, a metadata file for the climate stations, a vector file containing our area of interest, a small multi-band satellite image and a digital elevation model (DEM). And this is everything one needs to run climodr!\
This example will create climate maps for the climate sensor Ta_200, which measures air temperature at a height of 2 m above ground.\

# Getting Started with climodr

The idea of climodr is to speed up climate modelling and make it easier to use. The package expects you to store all relevant input data in a single folder structure; it does the rest, following the example workflow provided in this vignette.\
The functions remain modifiable, so you can adjust the model workflow to your liking.

## Downloading climodr

To start with climodr, first download and install the package from CRAN.

```{r install, eval = FALSE}
# install climodr
install.packages("climodr")
```

You may be asked to install all packages climodr needs to execute its functions. Installing them is mandatory; otherwise climodr won't be able to execute its functions, as it draws on many of them in its comprehensive workflow.

You can also install the latest development version from the Environmental Informatics Lab (envima) at Marburg University from GitHub. To do so, you need [devtools](https://devtools.r-lib.org/) installed in your R. Once devtools is installed, you can add climodr with the following commands.

```{r install dev, eval = FALSE}
# install climodr (may take a moment, but shouldn't take longer than 5-10 minutes)
devtools::install_github(
  "https://github.com/envima/climodr.git",
  dependencies = TRUE,
  build_vignettes = TRUE)
```

## How to set up climodr

Setting up climodr requires just one step before you can get started. Attaching the package will report some conflicts originating from the tidyverse. This is completely fine, as the masked functions are not needed within climodr's workflow. You can still use them by addressing their package explicitly, e.g. `stats::filter()`.
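If you want to see exactly which functions are masked after attaching climodr, base R can list them for you. This is just a convenience check, nothing climodr-specific:

```{r check conflicts, eval = FALSE}
# list objects whose names clash between attached packages
conflicts(detail = TRUE)
```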
With the *envi.create()* function, you point to a path where the package should store all its data. There is also the *memfrac* argument, which sets the fraction of your RAM the terra package is allowed to use. By default this number is rather low, so raising it can speed up processing.

```{r setup env}
library(climodr)

# set up the environment for climodr
envrmt <- envi.create(tempdir(), memfrac = 0.8)

# load all the climodr example data for this vignette
clim.sample(envrmt = envrmt)

# remove everything in the global environment except our environment path list
rm(list = setdiff(ls(), "envrmt"))
```

Climodr then creates an environment with three main folders:\
- Input (for all necessary data the user must bring)\
- Output (for ready-to-use data created by climodr)\
- Workflow (for climodr to store data during the process)\

![Climodr Environment](Environment.png)

The Input directory is the place where all data that shall be used for modelling should be saved beforehand. It consists of four folders:\
- dep (dependencies, like a resolution image or metadata)\
- raster (raster data, work in progress)\
- tabular (tabular data, containing climate data from the climate stations)\
- vector (vector data, like the study area or climate station point data)\

See the [list of possible inputs](https://envima.github.io/climodr/unit02/unit02-02_prepcsv.html) for further details on what kind of input data can be used.\

The Output folder is the place where all final data created by the package is stored. It consists of three folders:\
- maps (basic ready-to-use maps)\
- predictions (plain prediction imagery)\
- statistics (performance of the predictions and other statistics)\

The Output directory contains all the ready-to-use data in basic formats, which should be publication-ready unless other requirements apply.

The Workflow directory contains all steps between the Input and the Output: models, test and training data, clean tabular data, and so on.

```
Note:
- Do not delete any of these folders, since climodr requires them to run properly!
- The higher you set the fraction of RAM climodr may use, the slower your PC will
  become for anything you run in parallel while climodr is working. With a
  fraction > 0.8 it can even be hard to use a browser alongside climodr.
```

# Pre-Processing

A small showcase dataset has been prepared for this package and ships with climodr. It is a small scene located in Hainich National Park in Germany, containing ten climate stations.\

```
Note: This is just an example with very few stations and a very small scene. It
will not result in a good model and is only used for educational purposes.
```

You don't have to start from raw, never pre-processed data. If your data corresponds to one of the levels of this pre-processing chain, you can step in at that stage. For now, make sure your data matches the pattern climodr produces in this workflow.
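To check what you have before starting, you can simply list the files in the environment folders. The element name `path_input` below is an assumption for illustration; inspect `names(envrmt)` to see the paths climodr created on your system:

```{r inspect input, eval = FALSE}
# see which paths climodr manages
names(envrmt)

# list everything currently in the input tree (path element name assumed)
list.files(envrmt$path_input, recursive = TRUE)
```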
## Prepare tabular data for processing

First, we have to prepare the raw tabular data for further use. The `prep.csv` function cleans up the data and removes all NA values.

```{r prep csv}
prep.csv(envrmt = envrmt, method = "proc", save_output = TRUE)

# check the created csv files
csv_files <- grep("_no_NAs.csv$", list.files(envrmt$path_tworkflow), value = TRUE)
csv_files
```

## Process tabular data to average values

Next, the data needs to be aggregated to the desired time steps.\
In this version you can aggregate data into "monthly" and "annual" values.\
The *rbind* argument stores all climate station data in one file. This step is recommended, since the data usually becomes much shorter after the temporal aggregation and is easier to process further this way.

```{r proc csv}
csv_data <- proc.csv(envrmt = envrmt,
                     method = "monthly",
                     rbind = TRUE,
                     save_output = TRUE)
head(csv_data)
```

## Spatial aggregation of tabular data

Next, the stations have to be located spatially in a coordinate system. This step is crucial for using the data in a modelling context.\

```{r spat csv}
csv_spat <- spat.csv(envrmt = envrmt,
                     method = "monthly",
                     des_file = "plot_description.csv",
                     save_output = TRUE)
head(csv_spat)
```

## Pre-process raster data for data extraction

Now that we have spatial points for our stations, we can continue with our raster data. The preferred *method* here is `"MB_Timeseries"`, which stands for **multi-band time series**. Use this method if you provide multiple single-band rasters or raster stacks with different time stamps (YYYYMMDD...) *in the file names* per scene. The function sorts them by date and crops the data to our study area.

```{r crop all}
crop.all(envrmt = envrmt,
         method = "MB_Timeseries",
         overwrite = TRUE)
```

One important step in climate modelling is to have good predictor variables, which we extract from our spatial raster imagery. In this case we have the 10 spectral bands from the spectral raster stack and one additional elevation layer. This data is already usable for prediction, but may not be sufficient. We can enhance our predictor set by creating additional spatial raster layers: spectral indices, calculated from combinations of the spectral bands, add further information. For example, we can calculate the NDVI, which indicates the presence or absence of chlorophyll and thus of vegetation. These layers can then be fed into our model as new predictor variables.\

Next, we calculate some basic indices to create more predictor variables for our models. The `vi` argument chooses the vegetation indices to create. You can either list the desired indices in a *vector*, or simply use `"all"` to generate all available indices. For more detailed information use `?calc.indices` in the console.

```{r calc indices}
calc.indices(envrmt = envrmt,
             vi = "all",
             bands = c("blue", "green", "red", "nir", "nirb",
                       "re1", "re2", "re3", "swir1", "swir2"),
             overwrite = TRUE)
```

## Finalize tabular data for modelling

Now that we have spatial points as well as raster data, we can extract additional predictor variables at the station points from the spatial raster data. For this we use the `fin.csv` function. It uses the positions we added to our climate station data with `spat.csv` to extract the raster values at these positions from every layer of the spatial raster data, and adds the values to each corresponding climate station. During the modelling steps this climate station csv file will be used to generate spatial points, which are then used to train our models.
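Conceptually, this extraction step works like `terra::extract()`. The following sketch is only an illustration with hypothetical file names, not climodr's actual implementation:

```{r extract sketch, eval = FALSE}
library(terra)

# hypothetical file names, for illustration only
predictors <- rast("predictor_stack.tif")  # spectral bands, indices, DEM
stations   <- vect("stations.gpkg")        # station points from spat.csv

# extract one value per raster layer at each station location
vals <- extract(predictors, stations)
head(vals)
```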
```
Reminder: You can check your data at any step. Just go into the Workflow folder
of the project directory you defined in the beginning with `envi.create` and
take a look at the data. Just make sure not to alter it, as this may cause
climodr to not run subsequent functions correctly.
```

```{r finalize csv}
csv_fin <- fin.csv(envrmt = envrmt,
                   method = "monthly",
                   save_output = TRUE)
head(csv_fin)
```

Now the data is ready for further modelling.

![Pre-processed data created by climodr](Preprocessing.png)

# Processing

At this point the spatial raster data and the climate station data are ready to use. If your data isn't, check out the 'Pre-Processing' chapter.

## Test for Autocorrelation

First, we test the data for autocorrelation. The evaluation vector contains all columns with the sensor data and predictor variables that will be tested. This creates the first outputs of the package: one tabular file per sensor, listing all columns that should be excluded from the modelling because they autocorrelate. It also creates a visualization of the autocorrelation, if `plot.corrplot` is set to *TRUE*.\

Note: The visualization can get quite cluttered when there are a lot of predictors.

```{r autocorr, warning = FALSE}
autocorr(
  envrmt = envrmt,
  method = "monthly",
  resp = 5,
  pred = c(8:23),
  plot.corrplot = TRUE,
  corrplot = "coef"
)
```

## Create climate models from spatial station data

![Processing steps during model workflow](Processing.png)

Now that all the necessary preparation is done, we can start modelling. This is by far the function with the most arguments. Here is a quick overview of what they do. For more detailed information take a look at the associated paper for this package at **(to be published)**.

`timespan` = Vector of the years to build models from (in this example 2017).\
`climresp` = Vector of the column numbers to create models for (in this example Ta_200).\
`classifier` = Vector of all model variants to be used. In this case:\
\
- random forest = "rf" \
- partial least squares = "pls" \
- neural networks = "nnet" \
- linear regression = "lm" \
`seed` = Number used to seed the random number generator, so that random draws become reproducible.\
`p` = Fraction of the full data randomly drawn as training data (see the sketch after this list).\
`folds` = Character or vector. Method to create spacetime folds: "all", "LLO", "LTO" or "LLTO".\
`mnote` = Character. "Model note", a short tag that distinguishes the different model runs in a project.\
`predrows` = Vector with the column numbers used as predictors.\
`tc_method` = Train control method. Default is cross-validation, "cv".\
`metric` = Summary metric used to select the optimal model. Default: root mean square error, "RMSE".\
`autocorrelation` = Logical. TRUE if the results of the autocorrelation test should be taken into account.\
`doParallel` = Logical. When TRUE, the model process parallelizes on all cores except two, so your PC will slow down a lot. Only recommended for PCs with at least 8 cores. Warning: your PC won't be able to process other tasks efficiently during parallelization.\
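To make the `seed` and `p` arguments concrete, here is a minimal sketch of the kind of reproducible train/test split they control, shown with caret. This is an assumption about the internals for illustration; climodr handles the split for you:

```{r split sketch, eval = FALSE}
library(caret)

set.seed(707)  # the same seed yields the same "random" split on every run

# hypothetical response vector; climodr uses the station data internally
y <- runif(100)

# p = 0.8: 80 % of the rows go into the training set
train_idx <- createDataPartition(y, p = 0.8, list = FALSE)
```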
Once all parameters are set, you can run the model workflow with `calc.model`:

```{r model, warning = FALSE}
calc.model(
  envrmt = envrmt,
  method = "monthly",
  timespan = c(2017),
  climresp = c(5),
  classifier = c("rf", "pls", "lm"),
  seed = 707,
  p = 0.8,
  folds = "LLO",
  mnote = "vignette",
  predrows = c(8:23),
  tc_method = "cv",
  metric = "RMSE",
  autocorrelation = TRUE,
  doParallel = FALSE)
```

Congratulations, you have created your first models using climodr!\
\
Climodr also creates an evaluation data frame, which is saved in the statistics folder. This performance information is later used to select the best models of this model run for prediction.

## Predictions

Next we can predict the scenes from our example with the climate models from our spatial station data, using the `climpred` function. `climpred` also calculates the area of applicability (AOA) if the `AOA` argument is set to *TRUE*. In this example the AOA will flag a lot of dissimilarities, because the sample data is dummy data and far from the reality of the new data. Make sure to use the same `mnote` as in your model run, so climodr predicts with the models you created in that run.\

```{r predict}
climpred(
  envrmt = envrmt,
  method = "monthly",
  mnote = "vignette",
  AOA = TRUE)
```

Let's look at the list of predictions:

```{r list predictions}
predlist <- list.files(envrmt$path_predictions,
                       pattern = ".tif",
                       recursive = TRUE)
head(predlist)
```

To make navigation and searching easier, the names of the predictions, like all other names created by climodr, follow a pattern. They always consist of: the *mnote* - the *sensor name* - the *date* of the scene - the *folds* used during the model run - the *model classifier* - and the word *prediction*.

# Plotting

The `climplot` function finally plots your predictions and saves the maps in the output folder. It uses the plotting functions of the `terra` package. These plots are very simple, but they contain all the essential information a map needs. You can create plots of your predictions like this:

```{r plot predictions}
climplot(
  envrmt = envrmt,
  mnote = "vignette",
  sensor = "Ta_200",
  aoa = TRUE,
  mapcolors = rev(heat.colors(50)),
  scale_position = "bottomleft",
  north_position = "topright"
)
```

In the end you receive a map like the one we saw in the beginning.

![Temperature Map produced with climodr](map_vignette_2.png)
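If you want more control over the layout than `climplot` offers, you can always load a prediction directly and plot it yourself with terra. A minimal sketch, reusing `predlist` from above (the index `[1]` is only an example):

```{r custom plot, eval = FALSE}
library(terra)

# load the first prediction from the list created above
pred <- rast(file.path(envrmt$path_predictions, predlist[1]))

# plot with a custom title and color ramp
plot(pred,
     main = "Custom prediction plot",
     col = rev(heat.colors(50)))
```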