The forestry package is built on top of data.tree. A Node (tree) in
data.tree is an R6 object that is especially useful for hierarchical
data. forestry is a set of utility functions for reshaping or creating
tree objects and for working with nested data. Because a data.tree tree
can be converted to a list with as.list() and then to JSON with
toJSON(), the forestry package is particularly useful for producing a
tree in the specific format that a JSON object needs when building
htmlwidgets.
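As a minimal sketch of the tree-to-list-to-JSON path described above, the following builds a small tree with data.tree alone and serializes it. It uses jsonlite::toJSON() as the serializer, which is an assumption made for illustration; the tree, the node names, and the class field are made up as well.

```r
library(data.tree)
library(jsonlite)

# Build a small tree by hand (names and fields are illustrative only).
tree <- Node$new("tree1")
child_a <- tree$AddChild("A")
child_b <- tree$AddChild("B")
child_a$class <- "a"
child_b$class <- "b"

# Step 1: convert the tree to a nested list.
tree_list <- as.list(tree)

# Step 2: serialize the nested list to JSON.
tree_json <- toJSON(tree_list, auto_unbox = TRUE, pretty = TRUE)
print(tree_json)
```

Each child becomes a named entry of the list, so the shape of the JSON follows the shape of the tree. This is why reshaping the tree first is a convenient way to control the JSON that a widget receives.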
create_nodes() creates a Node object.
tree_name sets the name of the Node.
add_children_count sets the number of children to add to the Node; the
children are named in numerical order. To assign values to the nodes,
pass a vector as an additional named parameter. The parameter name
becomes the field name, and the values in the vector are assigned to
the children in order.
library(data.tree)
library(forestry)
new_node <- create_nodes(tree_name = "tree1", 
                         add_children_count = 3, 
                         class = c("A", "B", "C") )
print(new_node, "class")
#>   levelName class
#> 1     tree1      
#> 2      ¦--1     A
#> 3      ¦--2     B
#> 4      °--3     C
The fill_NA_level() function fills missing values across the desired
level with the desired value (0 by default). For example, new_node is a
tree with missing values in the hc field.
new_node <- create_nodes(tree_name = "tree1", 
                         add_children_count = 3, 
                         hc = c(1, 2, NA))
print(new_node, "hc")
#>   levelName hc
#> 1     tree1 NA
#> 2      ¦--1  1
#> 3      ¦--2  2
#> 4      °--3 NA
To apply fill_NA_level() to new_node, pass new_node as input_node, set
field_name to "hc", and set by_level = 2. The NA values in the hc field
at level 2 are then filled with 0.
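For comparison, the same effect can be sketched in plain data.tree. This is not the implementation of fill_NA_level(), only an illustration of the idea under the assumption that the field is set on every node at the chosen level. The tree is rebuilt by hand, and fill_with is simply a local variable here.

```r
library(data.tree)

# Illustrative tree: a root with three children, one missing hc value.
tree <- Node$new("tree1")
for (child_name in c("1", "2", "3")) tree$AddChild(child_name)
tree$Set(hc = c(NA, 1, 2, NA))  # values follow pre-order: root first

# Visit only level-2 nodes and replace a missing hc with the fill value.
fill_with <- 0
tree$Do(
  function(node) if (is.na(node$hc)) node$hc <- fill_with,
  filterFun = function(node) node$level == 2
)

print(tree, "hc")
```

The root sits at level 1, so its NA is left untouched, which matches the behavior shown for by_level = 2.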
result <- fill_NA_level(input_node = new_node, 
                        field_name = "hc", 
                        by_level = 2, 
                        fill_with = 0)
print(result, "hc")
#>   levelName hc
#> 1     tree1 NA
#> 2      ¦--1  1
#> 3      ¦--2  2
#> 4      °--3  0
create_tree() creates a new tree from a list. It appends each item of
the input list to the new tree under a numbered child. This is useful
when we convert a Node to a JSON array.
For instance, take test_node$children, which is a list of three
subtrees: groupA, groupB and groupC.
#> $groupA
#>    levelName
#> 1 groupA    
#> 2  ¦--Male  
#> 3  °--Female
#> 
#> $groupB
#>    levelName
#> 1 groupB    
#> 2  ¦--Male  
#> 3  °--Female
#> 
#> $groupC
#>    levelName
#> 1 groupC    
#> 2  ¦--Male  
#> 3  °--Female
create_tree() reshapes this list into a tree, new_tree, with each item
in test_node$children added under its own child. The index of each item
in the list is assigned as the name of that child.
library(data.tree)
test_node <- as.Node(test_df)
new_shape <- create_tree(test_node$children,"new_tree")
print(new_shape, "hc")
#>             levelName hc
#> 1  new_tree           NA
#> 2   ¦--1              NA
#> 3   ¦   °--groupA     NA
#> 4   ¦       ¦--Male   80
#> 5   ¦       °--Female 97
#> 6   ¦--2              NA
#> 7   ¦   °--groupB     NA
#> 8   ¦       ¦--Male   44
#> 9   ¦       °--Female 37
#> 10  °--3              NA
#> 11      °--groupC     NA
#> 12          ¦--Male   81
#> 13          °--Female 46
fix_items() creates a tree with a fixed set of children nodes from
another tree. It copies the existing fields to the new tree and fills
missing values with NA, similar to left joining the fixed set of
children onto the tree.
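The left-join analogy can be sketched in base R with a named vector standing in for the children and one of their fields. This is only an illustration of the idea, not how fix_items() is implemented; the variable names are hypothetical.

```r
# Existing children and their field values (illustrative).
existing_class <- c(B = "B1", C = "C1")

# The fixed set of children the result must contain.
fix_vector <- c("A", "B", "C", "D")

# Look up each required child by name; names that are absent yield NA,
# which mirrors a left join from fix_vector onto the existing children.
fixed_class <- setNames(existing_class[fix_vector], fix_vector)
print(fixed_class)
```

Every name in fix_vector appears exactly once in the result, and only the names that already existed keep a value.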
This function makes sure the tree has the desired children nodes.
For example, cell_node2 has only the children B and C.
cell_node2 <- Node$new("cell2")
cell_node2$AddChild("B")
cell_node2$AddChild("C")
cell_node2$Set(class = c(NA, "B1", "C1"))
print(cell_node2, "class")
#>   levelName class
#> 1     cell2      
#> 2      ¦--B    B1
#> 3      °--C    C1
Now we pass fix_vector = c("A", "B", "C", "D") and assign the result
to a new tree, cell_fixed_items. We can see that cell_fixed_items has
all of the nodes from fix_vector and still carries the fields from
cell_node2.
cell_fixed_items <- fix_items(fix_vector = c("A", "B", "C", "D"), 
                              input_node = cell_node2)
print(cell_fixed_items, "class")
#>   levelName class
#> 1     cell2      
#> 2      ¦--A      
#> 3      ¦--B    B1
#> 4      ¦--C    C1
#> 5      °--D
The children_sort() function sorts the children nodes into a desired
order. If some children nodes are not listed in input_order, the
mismatch_last parameter (default TRUE) controls whether those
mismatched children are placed at the bottom (TRUE) or at the top
(FALSE).
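The ordering rule can be sketched in base R on a plain vector of child names. The helper order_children() is hypothetical and only mirrors the described behavior; it is not part of forestry.

```r
# Hypothetical helper: order child names by input_order, placing any
# names that input_order does not mention last (or first).
order_children <- function(child_names, input_order, mismatch_last = TRUE) {
  matched <- intersect(input_order, child_names)   # follows input_order
  mismatched <- setdiff(child_names, input_order)  # keeps original order
  if (mismatch_last) c(matched, mismatched) else c(mismatched, matched)
}

child_names <- c("groupA", "groupB", "groupC")
input_order <- c("groupB", "groupA")

print(order_children(child_names, input_order))
print(order_children(child_names, input_order, mismatch_last = FALSE))
```

With mismatch_last = TRUE the unlisted groupC ends up after groupB and groupA; with FALSE it comes first.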
data(test_df)
test_node <- data.tree::as.Node(test_df)
sorted_node <- children_sort(
  input_node = test_node, 
  input_order = c("groupB", "groupA"),
  mismatch_last = TRUE)
print(sorted_node)
#>         levelName
#> 1  tree1         
#> 2   ¦--groupB    
#> 3   ¦   ¦--Male  
#> 4   ¦   °--Female
#> 5   ¦--groupA    
#> 6   ¦   ¦--Male  
#> 7   ¦   °--Female
#> 8   °--groupC    
#> 9       ¦--Male  
#> 10      °--Female
cumsum_across_level() computes the cumulative value of a field across a
level and stores it in the cumsum_number field.
In this example, it calculates the cumulative exercise_time across
level 3.
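As a rough sketch of what a cumulative sum across one level means, the following does the computation by hand in plain data.tree on a small illustrative tree. It is not the implementation of cumsum_across_level(); the tree and its values are assumptions chosen for the example.

```r
library(data.tree)

# Illustrative tree: Year -> quarters -> months, with a value per month.
monthly_values <- list(
  Q1 = c(Jan = 0.83, Feb = 0.31, Mar = 0.84),
  Q2 = c(Apr = 0.19)
)
year <- Node$new("Year")
for (quarter_name in names(monthly_values)) {
  quarter <- year$AddChild(quarter_name)
  for (month_name in names(monthly_values[[quarter_name]])) {
    month <- quarter$AddChild(month_name)
    month$exercise_time <- monthly_values[[quarter_name]][[month_name]]
  }
}

# Collect the level-3 nodes in tree order and take a running total.
level_nodes <- Traverse(year, filterFun = function(node) node$level == 3)
running_total <- cumsum(sapply(level_nodes, function(node) node$exercise_time))

# Store the running total on each node in a cumsum_number field.
for (i in seq_along(level_nodes)) {
  node <- level_nodes[[i]]
  node$cumsum_number <- running_total[[i]]
}

print(year, "exercise_time", "cumsum_number")
```

The running total carries over from one parent to the next, so Apr continues from the Q1 total instead of restarting at zero.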
data(exercise_df)
exercise_node <- as.Node(exercise_df)
test <- forestry::cumsum_across_level(input_node = exercise_node, 
                                      attri_name = "exercise_time", 
                                      level_num = 3)
print(test, "cumsum_number", "exercise_time", "level")
#>      levelName cumsum_number exercise_time level
#> 1  Year                   NA            NA     1
#> 2   ¦--Q1                 NA            NA     2
#> 3   ¦   ¦--Jan          0.83          0.83     3
#> 4   ¦   ¦--Feb          1.14          0.31     3
#> 5   ¦   °--Mar          1.98          0.84     3
#> 6   ¦--Q2                 NA            NA     2
#> 7   ¦   ¦--Apr          2.17          0.19     3
#> 8   ¦   ¦--May          2.18          0.01     3
#> 9   ¦   °--Jun          2.45          0.27     3
#> 10  ¦--Q3                 NA            NA     2
#> 11  ¦   ¦--Jul          2.56          0.11     3
#> 12  ¦   ¦--Aug          3.54          0.98     3
#> 13  ¦   °--Sep          4.30          0.76     3
#> 14  °--Q4                 NA            NA     2
#> 15      ¦--Oct          4.49          0.19     3
#> 16      ¦--Nov          5.25          0.76     3
#> 17      °--Dec          5.54          0.29     3
In addition, level_num = "All" computes the cumulative value across all
levels. Note that there should be no missing values in the relevant
levels when we apply cumsum_across_level(), so the example first
aggregates exercise_time up the tree with Aggregate().
data(exercise_df)
exercise_node <- as.Node(exercise_df)
# Fill each parent's exercise_time with the sum of its children's values.
exercise_node$Do(
  function(node) {
    node$exercise_time <- Aggregate(node,
                                    attribute = "exercise_time",
                                    aggFun = sum)
  },
  traversal = "post-order")
print(exercise_node, "exercise_time")
#>      levelName exercise_time
#> 1  Year                 5.54
#> 2   ¦--Q1               1.98
#> 3   ¦   ¦--Jan          0.83
#> 4   ¦   ¦--Feb          0.31
#> 5   ¦   °--Mar          0.84
#> 6   ¦--Q2               0.47
#> 7   ¦   ¦--Apr          0.19
#> 8   ¦   ¦--May          0.01
#> 9   ¦   °--Jun          0.27
#> 10  ¦--Q3               1.85
#> 11  ¦   ¦--Jul          0.11
#> 12  ¦   ¦--Aug          0.98
#> 13  ¦   °--Sep          0.76
#> 14  °--Q4               1.24
#> 15      ¦--Oct          0.19
#> 16      ¦--Nov          0.76
#> 17      °--Dec          0.29
exercise_node_test <- cumsum_across_level(input_node = exercise_node, 
                                          attri_name = "exercise_time", 
                                          level_num = "All")
print(exercise_node_test, "exercise_time", "cumsum_number", "level")
#>      levelName exercise_time cumsum_number level
#> 1  Year                 5.54            NA     1
#> 2   ¦--Q1               1.98          1.98     2
#> 3   ¦   ¦--Jan          0.83          0.83     3
#> 4   ¦   ¦--Feb          0.31          1.14     3
#> 5   ¦   °--Mar          0.84          1.98     3
#> 6   ¦--Q2               0.47          2.45     2
#> 7   ¦   ¦--Apr          0.19          2.17     3
#> 8   ¦   ¦--May          0.01          2.18     3
#> 9   ¦   °--Jun          0.27          2.45     3
#> 10  ¦--Q3               1.85          4.30     2
#> 11  ¦   ¦--Jul          0.11          2.56     3
#> 12  ¦   ¦--Aug          0.98          3.54     3
#> 13  ¦   °--Sep          0.76          4.30     3
#> 14  °--Q4               1.24          5.54     2
#> 15      ¦--Oct          0.19          4.49     3
#> 16      ¦--Nov          0.76          5.25     3
#> 17      °--Dec          0.29          5.54     3
The pre_get_array() function changes the numeric item names in a list
into a format that is compatible with the JSON array standard. As
mentioned earlier, to convert a tree to JSON we first save the tree as
a list using as.list() and then use htmlwidgets:::toJSON() to convert
the list to JSON data.
For example, new_node is a tree whose children nodes have numeric
names.
new_node <- create_nodes(tree_name = "tree1", 
                         add_children_count = 3, 
                         class = c("A", "B", "C"))
print(as.list(new_node))
#> $name
#> [1] "tree1"
#> 
#> $`1`
#> $`1`$class
#> [1] "A"
#> 
#> 
#> $`2`
#> $`2`$class
#> [1] "B"
#> 
#> 
#> $`3`
#> $`3`$class
#> [1] "C"
We can see that the children nodes are listed under their numeric
names. If we apply pre_get_array() to this list, the numeric names are
changed so that, after htmlwidgets:::toJSON(), the nodes are saved as a
JSON array instead of JSON objects.
new_node <- create_nodes(tree_name = "tree1", 
                         add_children_count = 3, 
                         class = c("A", "B", "C"))
print(pre_get_array(as.list(new_node)))
#> [[1]]
#> [1] "tree1"
#> 
#> [[2]]
#> [[2]]$class
#> [1] "A"
#> 
#> 
#> [[3]]
#> [[3]]$class
#> [1] "B"
#> 
#> 
#> [[4]]
#> [[4]]$class
#> [1] "C"
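The difference between the two list shapes can be seen by serializing a small list both ways. This sketch uses jsonlite::toJSON() in place of htmlwidgets:::toJSON(), and base unname() as a stand-in for dropping the names of a flat list; both are assumptions for illustration and neither is pre_get_array() itself.

```r
library(jsonlite)

# A flat list whose items carry numeric names, as in the output above.
named_children <- list(`1` = list(class = "A"), `2` = list(class = "B"))

# A named list serializes to a JSON object keyed by those names.
as_object <- toJSON(named_children, auto_unbox = TRUE)
print(as_object)

# An unnamed list serializes to a JSON array.
as_array <- toJSON(unname(named_children), auto_unbox = TRUE)
print(as_array)
```

The first call yields an object with the keys "1" and "2", while the second yields an array of two items, which is the shape many JavaScript libraries expect for a list of children.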