---
title: "A short introduction to climodr"
output:
  rmarkdown::html_vignette:
    toc: TRUE
    toc_depth: 3
    number_sections: TRUE
    highlight: tango
vignette: >
  %\VignetteIndexEntry{climodr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

Version 0.0.0.9003

# Introduction to climodr

Welcome to the climate modeler in R, climodr for short. This package uses point data from climate stations, spectral imagery and elevation models to automatically create ready-to-use climate maps.\
The idea of climodr is to deliver an easy-to-use method for creating high-quality climate maps, like the one we create in this vignette:

![Temperature Map produced with climodr](map_vignette.png)

Note: This example is built with dummy data and will not produce a good model; it is for educational purposes only.\
\
Let's take a look at the basic structure of climodr:

![Climodr Structure](Climodr_Structure.png)

Climodr is split into four steps:\
Environment, Pre-Processing, Processing and Plotting.\
With only these four steps climodr creates basic but robust climate models and climate maps, without making the workflow too complicated for the user.\
This vignette guides you through the package, explains its functions and gives you an idea of how to use climodr. The example contains a few dummy climate stations, a metadata file for the climate stations, a vector file containing our area of interest, a small multi-band satellite image and a digital elevation model (DEM). And this is everything one needs to run climodr!\
This example will create climate maps for the climate sensor Ta_200, which measures air temperature at a height of 2 m above ground.\

# Getting Started with climodr

The idea of climodr is to speed up climate modelling and make it easier to use. The package expects you to store all relevant input data in a single folder structure; it does the rest, following the example workflow provided in this vignette.\
The functions remain modifiable, so you can adjust the model workflow to your liking.

## Downloading climodr

To start with climodr, first download and install the package from CRAN.

```{r install, eval = FALSE}
# install climodr
install.packages("climodr")
```

You may be asked to install all packages climodr needs to execute its functions. Installing them is mandatory; otherwise climodr won't be able to execute its functions, as it draws on many of them in its comprehensive workflow.

You can also install the latest development version from the Environmental Informatics Lab (envima) at Marburg University from GitHub. To do so, you need [devtools](https://devtools.r-lib.org/) installed in your R. Once devtools is installed, you can add climodr with the following commands.

```{r install dev, eval = FALSE}
# install climodr (may take a moment, but shouldn't take longer than 5-10 minutes)
devtools::install_github(
  "https://github.com/envima/climodr.git",
  dependencies = TRUE,
  build_vignettes = TRUE)
```

## How to set up climodr

Setting up climodr requires just one step before you can get started. Attaching the package will report some conflicts originating from the tidyverse. This is completely fine, as the masked functions are not needed within climodr's workflow. You can still use them by addressing their package explicitly, e.g. `stats::filter()`.
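If you want to see exactly which functions are masked after attaching climodr, base R can list them for you. This is just a convenience check, nothing climodr-specific:

```{r check conflicts, eval = FALSE}
# list objects whose names clash between attached packages
conflicts(detail = TRUE)
```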
With the *envi.create()* function, you point to a path where the package should store all its data. There is also the *memfrac* argument, which sets the fraction of your RAM the terra package is allowed to use. By default this number is rather low, so raising it can speed up processing.

```{r setup env}
library(climodr)

# set up the environment for climodr
envrmt <- envi.create(tempdir(), memfrac = 0.8)

# load all the climodr example data for this vignette
clim.sample(envrmt = envrmt)

# remove everything in the global environment except our environment path list
rm(list = setdiff(ls(), "envrmt"))
```

Climodr then creates an environment with three main folders:\
- Input (for all necessary data the user must bring)\
- Output (for ready-to-use data created by climodr)\
- Workflow (for climodr to store data during the process)\

![Climodr Environment](Environment.png)

The Input directory is the place where all data that shall be used for modelling should be saved beforehand. It consists of four folders:\
- dep (dependencies, like a resolution image or metadata)\
- raster (raster data, work in progress)\
- tabular (tabular data, containing climate data from the climate stations)\
- vector (vector data, like the study area or climate station point data)\

See the [list of possible inputs](https://envima.github.io/climodr/unit02/unit02-02_prepcsv.html) for further details on what kind of input data can be used.\

The Output folder is the place where all final data created by the package is stored. It consists of three folders:\
- maps (basic ready-to-use maps)\
- predictions (plain prediction imagery)\
- statistics (performance of the predictions and other statistics)\

The Output directory contains all the ready-to-use data in basic formats, which should be publication-ready unless other requirements apply.

The Workflow directory contains all steps between the Input and the Output: models, test and training data, clean tabular data, and so on.

```
Note:
- Do not delete any of these folders, since climodr requires them to run properly!
- The higher you set the fraction of RAM climodr may use, the slower your PC will
  become for anything you run in parallel while climodr is working. With a
  fraction > 0.8 it can even be hard to use a browser alongside climodr.
```

# Pre-Processing

A small showcase dataset has been prepared for this package and ships with climodr. It is a small scene located in Hainich National Park in Germany, containing ten climate stations.\

```
Note: This is just an example with very few stations and a very small scene. It
will not result in a good model and is only used for educational purposes.
```

You don't have to start from raw, never pre-processed data. If your data corresponds to one of the levels of this pre-processing chain, you can step in at that stage. For now, make sure your data matches the pattern climodr produces in this workflow.
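To check what you have before starting, you can simply list the files in the environment folders. The element name `path_input` below is an assumption for illustration; inspect `names(envrmt)` to see the paths climodr created on your system:

```{r inspect input, eval = FALSE}
# see which paths climodr manages
names(envrmt)

# list everything currently in the input tree (path element name assumed)
list.files(envrmt$path_input, recursive = TRUE)
```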
## Prepare tabular data for processing

First, we have to prepare the raw tabular data for further use. The `prep.csv` function cleans up the data and removes all NA values.

```{r prep csv}
prep.csv(envrmt = envrmt, method = "proc", save_output = TRUE)

# check the created csv files
csv_files <- grep("_no_NAs.csv$", list.files(envrmt$path_tworkflow), value = TRUE)
csv_files
```

## Process tabular data to average values

Next, the data needs to be aggregated to the desired time steps.\
In this version you can aggregate data into "monthly" and "annual" values.\
The *rbind* argument stores all climate station data in one file. This step is recommended, since the data usually becomes much shorter after the temporal aggregation and is easier to process further this way.

```{r proc csv}
csv_data <- proc.csv(envrmt = envrmt,
                     method = "monthly",
                     rbind = TRUE,
                     save_output = TRUE)
head(csv_data)
```

## Spatial aggregation of tabular data

Next, the stations have to be located spatially in a coordinate system. This step is crucial for using the data in a modelling context.\

```{r spat csv}
csv_spat <- spat.csv(envrmt = envrmt,
                     method = "monthly",
                     des_file = "plot_description.csv",
                     save_output = TRUE)
head(csv_spat)
```

## Pre-process raster data for data extraction

Now that we have spatial points for our stations, we can continue with our raster data. The preferred *method* here is `"MB_Timeseries"`, which stands for **multi-band time series**. Use this method if you provide multiple single-band rasters or raster stacks with different time stamps (YYYYMMDD...) *in the file names* per scene. The function sorts them by date and crops the data to our study area.

```{r crop all}
crop.all(envrmt = envrmt,
         method = "MB_Timeseries",
         overwrite = TRUE)
```

One important step in climate modelling is to have good predictor variables, which we extract from our spatial raster imagery. In this case we have the 10 spectral bands from the spectral raster stack and one additional elevation layer. This data is already usable for prediction, but may not be sufficient. We can enhance our predictor set by creating additional spatial raster layers: spectral indices, calculated from combinations of the spectral bands, add further information. For example, we can calculate the NDVI, which indicates the presence or absence of chlorophyll and thus of vegetation. These layers can then be fed into our model as new predictor variables.\

Next, we calculate some basic indices to create more predictor variables for our models. The `vi` argument chooses the vegetation indices to create. You can either list the desired indices in a *vector*, or simply use `"all"` to generate all available indices. For more detailed information use `?calc.indices` in the console.

```{r calc indices}
calc.indices(envrmt = envrmt,
             vi = "all",
             bands = c("blue", "green", "red", "nir", "nirb",
                       "re1", "re2", "re3", "swir1", "swir2"),
             overwrite = TRUE)
```

## Finalize tabular data for modelling

Now that we have spatial points as well as raster data, we can extract additional predictor variables at the station points from the spatial raster data. For this we use the `fin.csv` function. It uses the positions we added to our climate station data with `spat.csv` to extract the raster values at these positions from every layer of the spatial raster data, and adds the values to each corresponding climate station. During the modelling steps this climate station csv file will be used to generate spatial points, which are then used to train our models.
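Conceptually, this extraction step works like `terra::extract()`. The following sketch is only an illustration with hypothetical file names, not climodr's actual implementation:

```{r extract sketch, eval = FALSE}
library(terra)

# hypothetical file names, for illustration only
predictors <- rast("predictor_stack.tif")  # spectral bands, indices, DEM
stations   <- vect("stations.gpkg")        # station points from spat.csv

# extract one value per raster layer at each station location
vals <- extract(predictors, stations)
head(vals)
```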
```
Reminder: You can check your data at any step. Just go into the Workflow folder
of the project directory you defined in the beginning with `envi.create` and
take a look at the data. Just make sure not to alter it, as this may cause
climodr to not run subsequent functions correctly.
```

```{r finalize csv}
csv_fin <- fin.csv(envrmt = envrmt,
                   method = "monthly",
                   save_output = TRUE)
head(csv_fin)
```

Now the data is ready for further modelling.

![Pre-processed data created by climodr](Preprocessing.png)

# Processing

At this point the spatial raster data and the climate station data are ready to use. If your data isn't, check out the 'Pre-Processing' chapter.

## Test for Autocorrelation

First, we test the data for autocorrelation. The evaluation vector contains all columns with the sensor data and predictor variables that will be tested. This creates the first outputs of the package: one tabular file per sensor, listing all columns that should be excluded from the modelling because they autocorrelate. It also creates a visualization of the autocorrelation, if `plot.corrplot` is set to *TRUE*.\

Note: The visualization can get quite cluttered when there are a lot of predictors.

```{r autocorr, warning = FALSE}
autocorr(
  envrmt = envrmt,
  method = "monthly",
  resp = 5,
  pred = c(8:23),
  plot.corrplot = TRUE,
  corrplot = "coef"
)
```

## Create climate models from spatial station data

![Processing steps during model workflow](Processing.png)

Now that all the necessary preparation is done, we can start modelling. This is by far the function with the most arguments. Here is a quick overview of what they do. For more detailed information take a look at the associated paper for this package at **(to be published)**.

`timespan` = Vector of the years to build models from (in this example 2017).\
`climresp` = Vector of the column numbers to create models for (in this example Ta_200).\
`classifier` = Vector of all model variants to be used. In this case:\
\
- random forest = "rf" \
- partial least squares = "pls" \
- neural networks = "nnet" \
- linear regression = "lm" \
`seed` = Number used to seed the random number generator, so that random draws become reproducible.\
`p` = Fraction of the full data randomly drawn as training data (see the sketch after this list).\
`folds` = Character or vector. Method to create spacetime folds: "all", "LLO", "LTO" or "LLTO".\
`mnote` = Character. "Model note", a short tag that distinguishes the different model runs in a project.\
`predrows` = Vector with the column numbers used as predictors.\
`tc_method` = Train control method. Default is cross-validation, "cv".\
`metric` = Summary metric used to select the optimal model. Default: root mean square error, "RMSE".\
`autocorrelation` = Logical. TRUE if the results of the autocorrelation test should be taken into account.\
`doParallel` = Logical. When TRUE, the model process parallelizes on all cores except two, so your PC will slow down a lot. Only recommended for PCs with at least 8 cores. Warning: your PC won't be able to process other tasks efficiently during parallelization.\
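To make the `seed` and `p` arguments concrete, here is a minimal sketch of the kind of reproducible train/test split they control, shown with caret. This is an assumption about the internals for illustration; climodr handles the split for you:

```{r split sketch, eval = FALSE}
library(caret)

set.seed(707)  # the same seed yields the same "random" split on every run

# hypothetical response vector; climodr uses the station data internally
y <- runif(100)

# p = 0.8: 80 % of the rows go into the training set
train_idx <- createDataPartition(y, p = 0.8, list = FALSE)
```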
Once all parameters are set, you can run the model workflow with `calc.model`:

```{r model, warning = FALSE}
calc.model(
  envrmt = envrmt,
  method = "monthly",
  timespan = c(2017),
  climresp = c(5),
  classifier = c("rf", "pls", "lm"),
  seed = 707,
  p = 0.8,
  folds = "LLO",
  mnote = "vignette",
  predrows = c(8:23),
  tc_method = "cv",
  metric = "RMSE",
  autocorrelation = TRUE,
  doParallel = FALSE)
```

Congratulations, you have created your first models using climodr!\
\
Climodr also creates an evaluation data frame, which is saved in the statistics folder. This performance information is later used to select the best models of this model run for prediction.

## Predictions

Next we can predict the scenes from our example with the climate models from our spatial station data, using the `climpred` function. `climpred` also calculates the area of applicability (AOA) if the `AOA` argument is set to *TRUE*. In this example the AOA will flag a lot of dissimilarities, because the sample data is dummy data and far from the reality of the new data. Make sure to use the same `mnote` as in your model run, so climodr predicts with the models you created in that run.\

```{r predict}
climpred(
  envrmt = envrmt,
  method = "monthly",
  mnote = "vignette",
  AOA = TRUE)
```

Let's look at the list of predictions:

```{r list predictions}
predlist <- list.files(envrmt$path_predictions,
                       pattern = ".tif",
                       recursive = TRUE)
head(predlist)
```

To make navigation and searching easier, the names of the predictions, like all other names created by climodr, follow a pattern. They always consist of: the *mnote* - the *sensor name* - the *date* of the scene - the *folds* used during the model run - the *model classifier* - and the word *prediction*.

# Plotting

The `climplot` function finally plots your predictions and saves the maps in the output folder. It uses the plotting functions of the `terra` package. These plots are very simple, but they contain all the essential information a map needs. You can create plots of your predictions like this:

```{r plot predictions}
climplot(
  envrmt = envrmt,
  mnote = "vignette",
  sensor = "Ta_200",
  aoa = TRUE,
  mapcolors = rev(heat.colors(50)),
  scale_position = "bottomleft",
  north_position = "topright"
)
```

In the end you receive a map like the one we saw in the beginning.

![Temperature Map produced with climodr](map_vignette_2.png)
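If you want more control over the layout than `climplot` offers, you can always load a prediction directly and plot it yourself with terra. A minimal sketch, reusing `predlist` from above (the index `[1]` is only an example):

```{r custom plot, eval = FALSE}
library(terra)

# load the first prediction from the list created above
pred <- rast(file.path(envrmt$path_predictions, predlist[1]))

# plot with a custom title and color ramp
plot(pred,
     main = "Custom prediction plot",
     col = rev(heat.colors(50)))
```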