JohnsonKinaseData 0.99.0
Johnson et al. (2023) published substrate affinities for 303 human serine/threonine-specific kinases in the form of position-specific weight matrices (PWMs). The JohnsonKinaseData package provides access to these PWMs, along with basic functionality to match user-provided phosphosites against all kinase PWMs. The aim is to give the user a simple way of predicting kinase-substrate relationships based on PWM-phosphosite matching. These predictions can serve to infer kinase activity from differential phospho-proteomic data.
The JohnsonKinaseData package can be installed using the following code:
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("ExperimentHub")
BiocManager::install("JohnsonKinaseData")
The kinase PWMs can be accessed with the getKinasePWM() function. It returns a list with 303 human serine/threonine-specific PWMs.
library(JohnsonKinaseData)
pwms <- getKinasePWM()
#> see ?JohnsonKinaseData and browseVignettes('JohnsonKinaseData') for documentation
#> downloading 1 resources
#> retrieving 1 resource
#> loading from cache
head(names(pwms))
#> [1] "AAK1" "ACVR2A" "ACVR2B" "AKT1" "AKT2" "AKT3"
Each PWM is a numeric matrix with amino acids as rows and positions as columns. Matrix elements are log2-odds scores measuring differential affinity relative to a random frequency of amino acids (Johnson et al. 2023).
pwms[["PLK2"]]
#> -5 -4 -3 -2 -1 0
#> A -0.036821844 -0.277009455 -0.83856373 -0.4463446 -0.186229068 NA
#> C 0.009633819 -0.034899138 -0.24690897 0.4799548 -0.467333943 NA
#> D 0.549718451 0.795766948 0.82130204 1.6459783 1.329410671 NA
#> E 0.614756952 1.127897364 2.86862751 1.2354207 0.689388627 NA
#> F 0.449006639 0.078199920 -0.41273103 -0.9773836 -0.602963759 NA
#> G 0.326652391 -0.151522275 -0.77793738 -0.6106535 -0.767584829 NA
#> H 0.148478616 -0.172018427 -0.67807191 -0.3219281 0.214995135 NA
#> I -0.311864412 -0.172018427 -1.65154094 -0.8406292 -0.519941731 NA
#> K -0.469329925 -0.647467443 -1.77349147 -1.7345631 -0.656307931 NA
#> L -0.245197993 0.144568518 -0.71785677 0.3032255 -0.511690664 NA
#> M -0.248793390 -0.206894852 -0.38948891 0.3123167 -0.194955239 NA
#> N -0.065823218 0.002018361 -0.54077824 0.9076598 0.307545102 NA
#> P -0.066578437 -0.108114249 -1.05139915 -0.4418303 0.542703792 NA
#> Q -0.530739153 -0.241782116 -0.48096139 -0.1800049 -0.264477823 NA
#> R -0.528032212 -0.715485867 -1.58640592 -1.1059389 -0.339345148 NA
#> S -0.065823218 -0.172018427 -0.77793738 -0.4463446 -0.194955239 0.00000000
#> T -0.065823218 -0.172018427 -0.77793738 -0.4463446 -0.194955239 -0.09585422
#> V -0.401253684 -0.367545642 -1.89324968 -1.3562361 -0.152804813 NA
#> W -0.034160317 -0.140189435 -1.05799229 -1.1256358 -1.093879047 NA
#> Y 0.083383588 -0.242293983 -1.12217724 -0.5640514 -0.004045212 NA
#> s 0.059632160 0.750692249 0.06873959 0.1075540 0.101650076 NA
#> t 0.059632160 0.750692249 0.06873959 0.1075540 0.101650076 NA
#> y 0.707878133 0.679784089 0.26351522 -0.1321035 2.184534212 NA
#> 1 2 3 4
#> A -0.812485602 -0.109981413 -0.53574997 -0.33515312
#> C -0.310253562 0.145612247 0.00000000 0.04362448
#> D -0.942307133 1.124791311 1.17957474 0.98389654
#> E -0.201410261 1.154194325 1.37389873 1.13638828
#> F 1.906390375 -0.122334266 -0.21541226 -0.12610808
#> G -0.918660373 -0.888701547 -0.30329392 -0.24827921
#> H -0.671163536 -0.002165667 -0.13020754 -0.01785518
#> I 0.374065718 -0.042308229 -0.25963366 -0.03785821
#> K -1.145924538 -2.141143704 -1.48196851 -1.17755536
#> L 0.032665112 -0.500013836 -0.19379970 -0.02664588
#> M 0.833902077 0.008200014 -0.23463499 -0.20273795
#> N -0.818579360 -0.015082595 0.07710624 -0.20706138
#> P -2.650181828 -0.911044318 -0.71667083 0.10218779
#> Q 0.266756562 -0.411003598 -0.01873185 -0.18852897
#> R -0.532824877 -1.190338611 -1.33715648 -1.18082233
#> S -0.532824877 -0.109981413 -0.21541226 -0.12610808
#> T -0.532824877 -0.109981413 -0.21541226 -0.12610808
#> V -0.008682243 -0.249993850 -0.38571419 -0.85152138
#> W -0.550465037 0.385154897 0.11769504 0.30836088
#> Y 0.360757558 0.526569660 0.07546417 -0.04751733
#> s 0.412402175 1.196984664 1.25574242 1.70655265
#> t 0.412402175 1.196984664 1.25574242 1.70655265
#> y 0.490467444 3.461305904 1.53012070 1.85199884
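To read such a matrix, it can help to list the most favoured residue at each position. The sketch below does this on a small hand-made matrix with invented values; toy_pwm and top_residue() are illustrative names and not part of the package. The same call should work on any element of pwms, assuming it is a plain numeric matrix with row names as printed above.

```r
# Toy log2-odds matrix: residues as rows, positions as columns.
# Values are invented for illustration; NA marks residues that are
# not scored at the central position, as in the PWM printed above.
toy_pwm <- matrix(
  c(-0.4, -0.2,   NA, -0.8,
     1.6,  1.3,   NA, -0.9,
     1.2,  0.7,   NA, -0.2,
    -0.4, -0.2,  0.0, -0.5,
    -0.4, -0.2, -0.1, -0.5),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("A", "D", "E", "S", "T"), c("-2", "-1", "0", "1"))
)

# Hypothetical helper: name of the highest-scoring residue per position.
# which.max() skips NA entries, so the central column is handled too.
top_residue <- function(pwm) {
  apply(pwm, 2, function(col) names(which.max(col)))
}

top_residue(toy_pwm)
```

For the toy matrix this picks the acidic residues upstream of the acceptor, which mirrors the kind of preference visible in the PLK2 matrix.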
In addition to the 20 standard amino acids, phosphorylated serine, threonine and tyrosine residues (rows s, t and y) are included. These phosphorylated residues are distinct from the central phospho-acceptor (serine/threonine at position 0) and can have a strong impact on the affinity of a given kinase-substrate pair (phospho-priming).
The central phospho-acceptor site is located at position 0, and this column only measures the favorability of serine over threonine. The user can exclude this favorability measure by setting the parameter includeSTfavorability to FALSE, in which case the central position doesn't contribute to the PWM score.
pwms2 <- getKinasePWM(includeSTfavorability=FALSE)
#> see ?JohnsonKinaseData and browseVignettes('JohnsonKinaseData') for documentation
#> loading from cache
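The effect of dropping the serine/threonine favorability can be pictured with a toy central column. This is only a sketch of the idea, assuming that the exclusion amounts to a zero contribution at position 0; it does not show how getKinasePWM() implements the option internally, and the variable names are illustrative.

```r
# Central column of a toy PWM: only S and T carry a score
# (values taken from the PLK2 example above, rounded).
central <- c(S = 0.000, T = -0.096)

# Assumed effect of includeSTfavorability = FALSE: the central position
# adds nothing to the score, whichever acceptor is present.
central_excluded <- central
central_excluded[] <- 0

# Score difference between an S and a T acceptor, with and without
# the favorability measure.
diff_with    <- unname(central["S"] - central["T"])
diff_without <- unname(central_excluded["S"] - central_excluded["T"])
c(with = diff_with, without = diff_without)
```

With the favorability included, two otherwise identical sites can differ in score only because of their acceptor residue; with it excluded, they score the same.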
Phosphorylated peptides are often represented in one of two formats: (1) phosphorylated residues are indicated by an asterisk, as in SAGLLS*DEDC, or (2) phosphorylated residues are given by lower-case letters, as in SAGLLsDEDC. To unify the phosphosite representation for PWM matching, JohnsonKinaseData provides the function processPhosphopeptides(). It takes a character vector of phosphopeptides, aligns them to the central phospho-acceptor position, and pads and/or truncates the surrounding residues such that the processed site consists of 5 upstream residues, a central acceptor and 4 downstream residues. The central phospho-acceptor position is defined as the left closest position to the midpoint of the peptide given by floor(nchar(sites)/2)+1.
ppeps <- c("SAGLLS*DEDC", "GDS*ND", "EKGDSN__", "___LySDEDC", "EKGtS*N")
sites <- processPhosphopeptides(ppeps)
#> Warning in processPhosphopeptides(ppeps): No S/T at central phospho-acceptor
#> position.
sites
#> # A tibble: 5 × 3
#> sites processed acceptor
#> <chr> <chr> <chr>
#> 1 SAGLLS*DEDC SAGLLSDEDC S
#> 2 GDS*ND ___GDSND__ S
#> 3 EKGDSN__ _EKGDSN___ S
#> 4 ___LySDEDC ____LYSDED Y
#> 5 EKGtS*N __EKGTsN__ T
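The two input notations differ only in how a phosphorylated residue is marked, so one can be mapped onto the other with a regular expression. The following base R sketch is not the package's implementation; to_lowercase_notation() and midpoint() are hypothetical helpers that illustrate the conversion and the midpoint formula quoted above.

```r
# Hypothetical helper: turn "S*", "T*" and "Y*" into "s", "t" and "y".
# "\\L" lower-cases the captured residue (requires perl = TRUE).
to_lowercase_notation <- function(peptides) {
  gsub("([STY])\\*", "\\L\\1", peptides, perl = TRUE)
}

# Midpoint position as given in the text: floor(nchar(sites)/2)+1
midpoint <- function(peptides) floor(nchar(peptides) / 2) + 1

peps    <- c("SAGLLS*DEDC", "EKGtS*N")
unified <- to_lowercase_notation(peps)
unified
midpoint(unified)
```

In both examples the midpoint falls on a phosphorylated residue (the s in the first peptide and the t in the second), which agrees with the acceptors reported in the table above.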
If a peptide contains several phosphorylated residues, the option onlyCentralAcceptor controls how the acceptor position is selected. Setting onlyCentralAcceptor=FALSE returns all possible aligned phosphosites for a given input peptide. Note that in this case the output is no longer parallel to the input, since one peptide can yield several rows.
processPhosphopeptides("KART*LLS*DEC")
#> # A tibble: 1 × 3
#> sites processed acceptor
#> <chr> <chr> <chr>
#> 1 KART*LLS*DEC ARtLLSDEC_ S
processPhosphopeptides("KART*LLS*DEC", onlyCentralAcceptor=FALSE)
#> # A tibble: 2 × 3
#> sites processed acceptor
#> <chr> <chr> <chr>
#> 1 KART*LLS*DEC __KARTLLsD T
#> 2 KART*LLS*DEC ARtLLSDEC_ S
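The alignment step can be sketched in base R as a fixed window around each candidate acceptor. This is an illustration under stated assumptions rather than the package's code: align_site() is a hypothetical helper, the input is assumed to be in lower-case notation already, and resolving ties towards the left-most residue is an assumption of the sketch.

```r
# Hypothetical helper: cut a 10-residue window (5 upstream, acceptor,
# 4 downstream) around position `center`, padding with "_".
align_site <- function(peptide, center) {
  chars  <- strsplit(peptide, "")[[1]]
  padded <- c(rep("_", 5), chars, rep("_", 4))
  window <- padded[center:(center + 9)]
  window[6] <- toupper(window[6])  # central acceptor shown in upper case
  paste(window, collapse = "")
}

peptide <- "KARtLLsDEC"  # lower-case notation of "KART*LLS*DEC"
chars   <- strsplit(peptide, "")[[1]]
phospho <- which(chars %in% c("s", "t", "y"))

# One aligned site per phosphorylated residue
all_sites <- vapply(phospho, function(i) align_site(peptide, i), character(1))

# Only the phosphorylated residue closest to the midpoint;
# which.min() returns the first (left-most) position on ties.
mid     <- floor(nchar(peptide) / 2) + 1
central <- align_site(peptide, phospho[which.min(abs(phospho - mid))])

all_sites
central
```

For this peptide the sketch yields the same two processed sites as the output above, and the single central site matches the default result.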
Once peptides are processed to sites, the function scorePhosphosites() can be used to create a matrix of kinase-substrate match scores.
selected <- sites |>
    dplyr::filter(acceptor %in% c('S','T')) |>
    dplyr::pull(processed)
scores <- scorePhosphosites(pwms, selected)
dim(scores)
#> [1] 4 303
head(scores[,1:5])
#> AAK1 ACVR2A ACVR2B AKT1 AKT2
#> SAGLLSDEDC -6.794078 -0.1666423 0.3039018 -5.8821117 -4.7783302
#> ___GDSND__ -8.107231 -1.0652463 -0.6211398 -2.2011502 -1.7940957
#> _EKGDSN___ -8.274386 -1.5402977 -0.9296051 -0.6188352 -0.8554523
#> __EKGTsN__ -2.159839 0.7307256 0.8912120 -2.7357203 -1.2022251
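One common way to score a site against a PWM is to add up the log2-odds entries of the residues observed at each position. The sketch below illustrates this on a toy matrix with invented values. It assumes, without confirmation from the package source, that padding characters and NA entries simply contribute nothing; score_site() and toy_pwm are hypothetical names.

```r
# Toy log2-odds matrix (invented values): residues as rows, positions as columns
toy_pwm <- matrix(
  c( 0.5, -1.0,   NA,  0.2,
     1.5,  0.3,   NA, -0.4,
    -0.2,  0.0,  0.0,  0.1,
    -0.2,  0.0, -0.1,  0.1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "E", "S", "T"), c("-2", "-1", "0", "1"))
)

# Hypothetical helper: sum the matrix entries matched by a site.
score_site <- function(pwm, site) {
  residues <- strsplit(site, "")[[1]]
  stopifnot(length(residues) == ncol(pwm))
  keep <- residues %in% rownames(pwm)  # drops "_" padding
  # Index the matrix by (row, column) pairs, one pair per kept residue
  idx <- cbind(match(residues[keep], rownames(pwm)), which(keep))
  sum(pwm[idx], na.rm = TRUE)
}

score_site(toy_pwm, "EASA")  # 1.5 + (-1.0) + 0.0 + 0.2
score_site(toy_pwm, "_ATE")  # padding skipped: -1.0 + (-0.1) + (-0.4)
```

Because the entries are log2-odds, adding them corresponds to multiplying the underlying odds across positions.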
The PWM scoring can be parallelized by supplying a BiocParallelParam object to the BPPARAM argument.
scores <- scorePhosphosites(pwms, selected, BPPARAM=BiocParallel::SerialParam())
By default, the resulting score is the log2-odds score of the PWM. Alternatively, setting scoreType="percentile" returns the percentile rank of the log2-odds score. For each PWM, the rank is computed against a background score distribution obtained by matching that PWM to the 85,603 unique phosphosites published in Johnson et al. (2023).
scores <- scorePhosphosites(pwms, selected, scoreType="percentile")
#> see ?JohnsonKinaseData and browseVignettes('JohnsonKinaseData') for documentation
#> downloading 1 resources
#> retrieving 1 resource
#> loading from cache
head(scores[,1:5])
#> AAK1 ACVR2A ACVR2B AKT1 AKT2
#> SAGLLSDEDC 22.375586 79.73910 83.79933 14.73447 14.59609
#> ___GDSND__ 9.030272 67.08203 74.19169 64.63912 64.38251
#> _EKGDSN___ 7.927565 57.36739 69.80942 79.14942 74.56646
#> __EKGTsN__ 83.891454 87.55389 88.32535 57.76235 71.31353
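A percentile rank of this kind can be sketched with the empirical cumulative distribution function from base R. The background scores below are invented, percentile_rank() is a hypothetical helper, and the package's exact definition (for example its handling of ties) may differ from this sketch.

```r
# Invented background: log2-odds scores of one PWM against many sites
background <- c(-6, -4, -3, -1, 0, 2, 3, 5)

# Hypothetical helper: share of background scores that are less than
# or equal to the observed score, expressed in percent.
percentile_rank <- function(score, background) {
  100 * ecdf(background)(score)
}

percentile_rank(c(-5, 1, 6), background)
```

A site scoring above every background site gets a rank of 100, which makes scores comparable across kinases whose raw log2-odds ranges differ.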
Quantifying PWM matches by percentile rank was first described in Jaffe et al. (2001). It is also the matching score underlying the kinase activity predictions published in Johnson et al. (2023).
Note that these percentile ranks cannot account for phospho-priming, because non-central phosphorylated residues were missing from the background sites published in Johnson et al. (2023). That is, the score distributions derived from the background sites do not reflect the impact of phospho-priming.
sessionInfo()
#> R Under development (unstable) (2024-03-18 r86148)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] JohnsonKinaseData_0.99.0 BiocStyle_2.31.0
#>
#> loaded via a namespace (and not attached):
#> [1] KEGGREST_1.43.0 xfun_0.42 bslib_0.6.1
#> [4] Biobase_2.63.0 vctrs_0.6.5 tools_4.4.0
#> [7] generics_0.1.3 stats4_4.4.0 curl_5.2.1
#> [10] parallel_4.4.0 tibble_3.2.1 fansi_1.0.6
#> [13] AnnotationDbi_1.65.2 RSQLite_2.3.5 blob_1.2.4
#> [16] pkgconfig_2.0.3 checkmate_2.3.1 dbplyr_2.5.0
#> [19] S4Vectors_0.41.5 lifecycle_1.0.4 GenomeInfoDbData_1.2.11
#> [22] stringr_1.5.1 compiler_4.4.0 Biostrings_2.71.4
#> [25] codetools_0.2-19 GenomeInfoDb_1.39.9 htmltools_0.5.7
#> [28] sass_0.4.9 yaml_2.3.8 tidyr_1.3.1
#> [31] pillar_1.9.0 crayon_1.5.2 jquerylib_0.1.4
#> [34] BiocParallel_1.37.1 cachem_1.0.8 mime_0.12
#> [37] ExperimentHub_2.11.1 AnnotationHub_3.11.3 tidyselect_1.2.1
#> [40] digest_0.6.35 stringi_1.8.3 purrr_1.0.2
#> [43] dplyr_1.1.4 bookdown_0.38 BiocVersion_3.19.1
#> [46] fastmap_1.1.1 cli_3.6.2 magrittr_2.0.3
#> [49] utf8_1.2.4 withr_3.0.0 backports_1.4.1
#> [52] filelock_1.0.3 rappdirs_0.3.3 bit64_4.0.5
#> [55] rmarkdown_2.26 XVector_0.43.1 httr_1.4.7
#> [58] bit_4.0.5 png_0.1-8 memoise_2.0.1
#> [61] evaluate_0.23 knitr_1.45 IRanges_2.37.1
#> [64] BiocFileCache_2.11.1 rlang_1.1.3 glue_1.7.0
#> [67] DBI_1.2.2 BiocManager_1.30.22 BiocGenerics_0.49.1
#> [70] jsonlite_1.8.8 R6_2.5.1 zlibbioc_1.49.3
Johnson, Jared L., Tomer M. Yaron, Emily M. Huntsman, Alexander Kerelsky, Junho Song, Amit Regev, Ting-Yu Lin, et al. 2023. “An Atlas of Substrate Specificities for the Human Serine/Threonine Kinome.” Nature 613 (7945): 759–66. https://doi.org/10.1038/s41586-022-05575-3.