--- title: "Plotting pedigrees with ggPedigree()" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Plotting pedigrees with ggPedigree()} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 5.6, fig.height = 4 ) ``` # Introduction This vignette demonstrates pedigree visualization with `ggPedigree()` from `ggpedigree`. Pedigree plots are central to fields like human genetics, behavioral science, and genealogy, where understanding lineage and familial relationships informs diagnosis, analysis, and interpretation. `ggPedigree()` builds on `ggplot2` and `kinship2`, leveraging modular helpers from both `{BGmisc}` and `{ggpedigree}` to streamline plot generation while retaining full aesthetic flexibility. Unlike base R pedigree tools, this package returns a `ggplot2` object, meaning users can customize their output using familiar `ggplot2` syntax, themes, scales, and layering. Throughout this vignette, we walk through basic usage, layout control, status overlays, aesthetic customization, and faceting for multifamily visualization. We use two bundled example datasets (from `BGmisc`) to illustrate the package's capabilities. The first dataset, `potter`, contains a fictional wizarding family tree, while the second dataset, `hazard`, includes a multigenerational historical pedigree with affected/unaffected status. # Basic usage We begin by loading the required libraries. These include the main plotting package `ggpedigree`, helper utilities from `BGmisc`, and supporting tools for aesthetic and data manipulation. ```{r libraries, message=FALSE, warning=FALSE} library(ggpedigree) # ggPedigree lives here library(BGmisc) # helper utilities & example data library(ggplot2) # ggplot2 for plotting library(viridis) # viridis for color palettes library(tidyverse) # for data wrangling ``` The simplest usage requires a data frame and column names for family and individual IDs. Here’s a basic pedigree plot using the `potter` data: ```{r load-data, include=FALSE} # if you don't have the most recent version of BGmisc, you may need to install it first as a stop-gap I've added the data loading here data("potter") # load example data from BGmisc # if (!"twinID" %in% names(potter)) { # Add twinID and zygosity columns for demonstration purposes potter <- potter %>% mutate( twinID = case_when( name == "Fred Weasley" ~ 13, name == "George Weasley" ~ 12, TRUE ~ NA_real_ ), zygosity = case_when( name == "Fred Weasley" ~ "mz", name == "George Weasley" ~ "mz", TRUE ~ NA_character_ ) ) # } ``` ```{r basic-usage} ggPedigree(potter, famID = "famID", personID = "personID" ) ``` Behind the scenes, `ggPedigree()`: 1. reshapes the data by family (`ped2fam()`), 2. recodes sex (e.g., 0/1/NA) into semantic labels via `recodeSex()` 3. extracts a layout grid with `calculateCoordinates()` 4. builds connection segments for spouses, siblings, parents, and offspring with `calculateConnections()` The result is returned as a `ggplot2` object, which allows immediate post-processing with standard `ggplot` syntax. # Customizing Aesthetics Most appearance settings are managed through a `config` list, which allows for fine-tuned customization. If any options are omitted, they fall back to predefined defaults. For example, we can turn off sex-based fill color and assign specific colors to connection segments, while also changing the shapes of individuals in the pedigree: ```{r customize-aesthetics} ggPedigree( potter, famID = "famID", personID = "personID", config = list( code_male = 1, # Here, 1 = male, 0 = female sex_color_include = FALSE, point_size = 5, outline_multiplier = 1.15, line_width = 2.5, sex_shape_female = "💸", sex_shape_male = "🖤", segment_spouse_color = viridis_pal()(5)[1], segment_sibling_color = viridis_pal()(5)[2], segment_parent_color = viridis_pal()(5)[3], segment_offspring_color = viridis_pal()(5)[4], segment_mz_color = viridis_pal()(5)[5], # segment_linetype = 3, outline_include = TRUE, outline_color = "grey" # viridis_pal()(5)[5] ) ) ``` As with any `ggplot2` object, further theming and labeling can be applied using layers or themes. For instance: ```{r} ggPedigree(potter, famID = "famID", personID = "personID" ) + theme_bw(base_size = 12) ``` This can be particularly useful for matching house styles in academic publications or removing clutter for presentations. ## Labeling Individuals You can also label individuals in the pedigree using the `label_method` argument. The default is `geom_label_repel()`, which uses the `ggrepel` package to avoid overlapping labels. You can also use `geom_text()` for simpler labeling. Other configurable options include `label_col` for the label column, `label_text_angle` for text rotation, `label_nudge_y` for vertical adjustment, and `label_nudge_x` for horizontal adjustment. ```{r} ggPedigree( potter, famID = "famID", personID = "personID", config = list( label_col = "name", sex_color_palette = c("pink", "blue"), label_text_angle = -45, label_nudge_y = -.25, label_nudge_x = 0.45, label_method = "geom_text", # sex_color_palette = c("black", "black"), sex_color_include = TRUE ) ) ``` # Overlaying Status Information In many applied settings, pedigrees include binary phenotypes such as affected/unaffected status. To incorporate this, use the `status_column` argument to point to a column containing the relevant variable. ```{r} data("hazard") p <- ggPedigree( hazard, famID = "famID", personID = "ID", status_column = "affected", config = list( code_male = 0, sex_color_include = TRUE, status_code_affected = TRUE, status_code_unaffected = FALSE, status_shape_affected = 4 ) ) p ``` The `status_column` argument allows you to specify a column that identifies affected/unaffected individuals. This column should be a factor or character vector; if not, `ggPedigree()` will coerce it. The behavior differs depending on whether `sex_color_include` is also enabled: * If `sex_color_include = TRUE`, affected individuals are overlaid with a shape (controlled by `status_shape_affected`), preserving the fill color by sex. * If `sex_color_include = FALSE`, affected and unaffected individuals are instead colored by status, using a default fill scale. You can control how values are interpreted via the `affected` and `unaffected` parameters in `config`. By default, `TRUE` is considered affected and `FALSE` unaffected. ```{r} ggPedigree( hazard, famID = "famID", personID = "ID", status_column = "affected", config = list( code_male = 0, sex_color_include = FALSE, status_code_affected = TRUE, status_code_unaffected = FALSE, status_label_affected = "Infected", status_label_unaffected = "Not infected", status_legend_title = "Status" ) ) ``` ## Highlighting Specific Individuals You can also highlight specific individuals in the pedigree through other approaches, such as by creating a new column in your data frame. For example, if you want to highlight Harry Potter and Dudley Dursley, you can create a new column called `proband` and set it to `TRUE` for those individuals. You can specify the `status_shape_affected` to be a star shape (8) and use the `status_column` argument to indicate the probands. ```{r} df <- potter df <- df %>% mutate(proband = ifelse(name %in% c( "Harry Potter", "Dudley Dursley" ), TRUE, FALSE)) ggPedigree( df, famID = "famID", personID = "personID", status_column = "proband", config = list( sex_color_include = TRUE, status_include = TRUE, status_code_affected = TRUE, status_code_unaffected = FALSE, status_shape_affected = 8 # "✨" # star shape ) ) ``` ## Highlighting Harry Potter Alternatively, you can use what I've called focal filling. The `focal_fill` argument allows you to highlight one person and their relatives without modifying the data frame. This is particularly useful when you want to emphasize a specific individual, such as Harry Potter, in the pedigree plot. The `focal_fill_personID` argument allows you to specify the person ID to highlight, and the `focal_fill_include` argument determines whether to include this highlighting in the plot. Other arguments like `focal_fill_method`, `focal_fill_n_breaks`, and `focal_fill_scale_midpoint` control the appearance of the highlighted area. Notably, you can select focal_fill_component to a specific component, such as "additive", which will highlight the additive genetic relatives of the focal person. ```{r focal_fill} ggPedigree(potter, famID = "famID", personID = "personID", config = list( focal_fill_personID = 7, focal_fill_include = TRUE, # focal_fill_high_color = "yellow", # focal_fill_mid_color = "red", # focal_fill_low_color = "#0D082AFF", focal_fill_force_zero = TRUE, focal_fill_na_value = "black", focal_fill_scale_midpoint = 0.25, focal_fill_component = "additive", focal_fill_method = "gradient", focal_fill_n_breaks = NULL, focal_fill_legend_title = "Genetic Relatives \nof Harry Potter", # "additive", sex_color_include = FALSE ) # highlight Harry Potter # config = list(segment_mz_color = NA) # color for monozygotic twins ) ``` Here we highlight Harry Potter (ID 7) and his relatives in the pedigree plot. Other components can be highlighted by changing the `focal_fill_component` argument. For example, you can set it to "mitochondrial" to highlight mitochondrial relatives, or "x" to highlight X-linked relatives. The `focal_fill_method` can be set to "gradient" for a smooth gradient effect or "steps" for distinct color breaks. ```{r focal_fill_mitochondrial} m1 <- ggPedigree(potter, famID = "famID", personID = "personID", config = list( focal_fill_personID = 7, focal_fill_include = TRUE, focal_fill_high_color = "green", # focal_fill_mid_color = "white", focal_fill_low_color = "black", focal_fill_scale_midpoint = 0.55, focal_fill_component = "mitochondrial", focal_fill_method = "steps", focal_fill_n_breaks = 19, focal_fill_legend_show = FALSE, focal_fill_legend_title = "Mitochondrial Relatives \nof Harry Potter", sex_color_include = FALSE ) # highlight Harry Potter # config = list(segment_mz_color = NA) # color for monozygotic twins ) + ggplot2::guides(shape = "none") m2 <- ggPedigree(potter, famID = "famID", personID = "personID", config = list( focal_fill_personID = 8, focal_fill_include = TRUE, focal_fill_high_color = "orange", # focal_fill_mid_color = "white", focal_fill_low_color = "black", focal_fill_scale_midpoint = 0.55, focal_fill_component = "mitochondrial", focal_fill_method = "steps", focal_fill_n_breaks = 19, focal_fill_legend_show = FALSE, focal_fill_legend_title = "Mitochondrial Relatives \nof Ginny Weasley", sex_color_include = FALSE ) # highlight Harry Potter # config = list(segment_mz_color = NA) # color for monozygotic twins ) + ggplot2::guides(shape = "none") library(patchwork) # for combining plots m1 + m2 + plot_layout(ncol = 2) + plot_annotation(title = "Mitochondrial Relatives of Harry Potter and Ginny Weasley") ``` ```{r focal_fill_matID} ggPedigree(potter, famID = "famID", personID = "personID", config = list( focal_fill_personID = 8, focal_fill_include = TRUE, # focal_fill_mid_color = "white", focal_fill_low_color = "black", focal_fill_scale_midpoint = 0.55, focal_fill_component = "matID", focal_fill_method = "viridis_d", focal_fill_viridis_option = "turbo", focal_fill_n_breaks = 19, focal_fill_legend_show = FALSE, focal_fill_legend_title = "Mitochondrial Relatives", sex_color_include = FALSE, overlay_include = FALSE ) # highlight Harry Potter # config = list(segment_mz_color = NA) # color for monozygotic twins ) + ggplot2::guides(shape = "none") ``` # Multiple families in one graphic If you have multiple families in your data frame, you can use the `facet_wrap()` function from `ggplot2` to create separate plots for each family. This is useful if you want to compare the pedigrees of different families side by side. Note that you can use `scales = "free_x"` to allow each family to have its own x-axis scale. ```{r facet_wrap} p + facet_wrap(~famID, scales = "free_x") ``` # Adjusting Plot Appearance The layout appearance can be refined using `theme()` elements. This doesn't change the pedigree structure but helps declutter the visualization. ```{r} p + theme_bw(base_size = 12) + theme( panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line = element_line(colour = "black"), axis.text.x = element_blank(), axis.text.y = element_blank(), axis.ticks.x = element_blank(), axis.ticks.y = element_blank(), axis.title.x = element_blank(), axis.title.y = element_blank() ) + scale_color_viridis( option = "mako", discrete = TRUE, labels = c("Female", "Male", "Unknown") ) ``` # Advanced Examples: Large Pedigrees with self-loops Additional examples are provided as articles that demonstrate more complex pedigree plots. You can find these on the [ggpedigree website](https://r-computing-lab.github.io/ggpedigree/). These articles cover advanced topics such as: multiple families, self-loops, and other advanced features.