library(PnT)
library(plotly)
mylar <- readRDS('../inst/extdata/mylar_data.rds')This data represents a spectrum of light released by analyzing the elements in a sample. The first peak represents hydrogen in the sample, and the second peak represents other elements.
plot_ly(mylar, x = ~Ch, y = ~CPS,
type = 'scatter', mode = 'lines')To find these peaks, we can just use the following function:
find_peaks(mylar, 'Ch', 'CPS')
#> # A tibble: 2 × 2
#> X Y
#> <int> <dbl>
#> 1 217 1352
#> 2 323 3000Next, let’s see how to find peaks in more noisy data
X <- seq(0, 10, 0.01)
Y <- sin(X) + runif(length(X), min=-0.1, max=0.1)
data <- data.frame(X=X, Y=Y)
plot_ly(data, x = ~X, y = ~Y,
type = 'scatter', mode = 'lines')To find the peaks, we can just use the same function
find_peaks(data, 'X', 'Y')
#> # A tibble: 2 × 2
#> X Y
#> <dbl> <dbl>
#> 1 1.55 1.10
#> 2 7.73 1.09Now, if we have even more noisy data, the peak finder might need better filter parameters.
Y <- Y + runif(length(X), min=-0.1, max=0.1)
data <- data.frame(X=X, Y=Y)
find_peaks(data, 'X', 'Y')
#> # A tibble: 38 × 2
#> X Y
#> <dbl> <dbl>
#> 1 0.31 0.435
#> 2 0.71 0.830
#> 3 0.82 0.880
#> 4 1.26 1.07
#> 5 1.33 1.11
#> 6 1.42 1.13
#> 7 1.65 1.17
#> 8 1.9 1.08
#> 9 1.99 1.02
#> 10 2.22 0.902
#> # ℹ 28 more rowsThis data has too much noise for the filter. So, we can increase the amount of noise blocked by the filter using the parameter minYerror. This parameter has default value 0.1.
find_peaks(data, 'X', 'Y', minYerror=0.2)
#> # A tibble: 2 × 2
#> X Y
#> <dbl> <dbl>
#> 1 1.65 1.17
#> 2 7.65 1.14Another possibility is that we only want the thin, sharp peaks.
Y <- (7/(5+(X-5)^2)) + (0.05/(0.02+(X-8)^2)) + (0.03/(0.03+(X-1)^2))
data <- data.frame(X=X, Y=Y)
plot_ly(data, x = ~X, y = ~Y,
type = 'scatter', mode = 'lines')For example, say we only want the sharp peaks (the first and last), not the wide peak in the middle.
find_peaks(data, 'X', 'Y')
#> # A tibble: 3 × 2
#> X Y
#> <dbl> <dbl>
#> 1 1 1.33
#> 2 5 1.41
#> 3 8 3.00The usual function call does not only get those peaks. Instead, we use the parameter minSlope, which controls a filter testing the slope around peaks relative to the dimensions of the data. This argument is 0 by default. Increasing minSlope allows us to filter out smooth peaks in favor of sharp ones.
find_peaks(data, 'X', 'Y', minSlope = 1)
#> # A tibble: 2 × 2
#> X Y
#> <dbl> <dbl>
#> 1 1 1.33
#> 2 8 3.00If, instead, the outer peaks are just edge effects, and we only want the middle peak, we can use edgeFilter. In this case, setting edgeFilter to 0.25 filters out the first and last quarter (0.25) of the data.
find_peaks(data, 'X', 'Y', edgeFilter = 0.25)
#> # A tibble: 1 × 2
#> X Y
#> <dbl> <dbl>
#> 1 5 1.41logy <- function(figure){
layout(figure, yaxis=list(type="log"))
}Here, we see some real world examples where this package can be used.
Here, we have a data set to analyze: air quality data from a school.
fig <- plot_ly(school_data, x=~Energy, y=~Counts, type="scatter", mode="lines"); figTo find the prominent peaks, we can use the basic function call:
find_peaks(school_data, 'Energy', 'Counts')
#> # A tibble: 6 × 2
#> X Y
#> <dbl> <dbl>
#> 1 -0.0022 10642
#> 2 0.278 12839
#> 3 0.528 7317
#> 4 1.03 1748
#> 5 1.49 2128
#> 6 1.75 5361However, if we plot the data using a log plot, we see that many of the other peaks are not noise.
logy(fig)The main example of this is the peak at x = 6.4 keV. In the previous
plot, this peak was barely visible, but in this one, it is obviously not
just noise. By hovering over the peak with your mouse, you can see that
this peak has a height of about 60 from the rest of the graph. Using the
argument asFraction in conjunction with
minYerror, you can make sure this is the smallest peak
found.
find_peaks(school_data, 'Energy', 'Counts', asFraction = FALSE, minYerror = 60)
#> # A tibble: 13 × 2
#> X Y
#> <dbl> <dbl>
#> 1 -0.0022 10642
#> 2 0.278 12839
#> 3 0.528 7317
#> 4 0.718 595
#> 5 1.03 1748
#> 6 1.26 785
#> 7 1.49 2128
#> 8 1.75 5361
#> 9 2.31 494
#> 10 2.63 208
#> 11 3.31 362
#> 12 3.69 242
#> 13 6.40 81In this example, setting asFraction = FALSE allows you
to input the height of 60 you measured for the peak directly into the
parameter minYerror without having to divide it by the
total height of the data.
fig <- plot_ly(jadeite_control, x=~Ch, y=~CPS, type="scatter", mode="lines"); figAgain, we need to use a log plot to see the smaller peaks clearly.
logy(fig)We can see that the small peak at x = 1000 has a height of about 400. So, we can try what we did before to find the peaks of the graph.
find_peaks(jadeite_control, 'Ch', 'CPS', asFraction = FALSE, minYerror = 400)
#> # A tibble: 19 × 2
#> X Y
#> <dbl> <dbl>
#> 1 129 32702
#> 2 155 45298
#> 3 184 382405
#> 4 214 1048170
#> 5 285 7100
#> 6 323 8754
#> 7 366 9347
#> 8 403 27282
#> 9 425 24060
#> 10 452 58156
#> 11 491 12029
#> 12 551 9306
#> 13 608 4596
#> 14 662 10285
#> 15 722 3632
#> 16 781 26212
#> 17 862 4895
#> 18 915 1601
#> 19 990 1462However, what if the peaks at 285, 323, and 366 are actually noise? Say you want to filter out these peaks, and other small peaks in the area where most of the peaks are larger. In this case, we have to break up the peak finding into two separate ranges. First, we find the peaks in the range where the peaks are smaller:
smallerPeaks <- find_peaks(jadeite_control, 'Ch', 'CPS', asFraction = FALSE, minYerror = 400, ROI = c(500, 1050))
smallerPeaks
#> # A tibble: 8 × 2
#> X Y
#> <dbl> <dbl>
#> 1 551 9306
#> 2 608 4596
#> 3 662 10285
#> 4 722 3632
#> 5 781 26212
#> 6 862 4895
#> 7 915 1601
#> 8 990 1462Now, we can find the peaks in the other range:
largerPeaks <- find_peaks(jadeite_control, 'Ch', 'CPS', asFraction = FALSE, minYerror = 10000, ROI = c(0, 500))
largerPeaks
#> # A tibble: 5 × 2
#> X Y
#> <dbl> <dbl>
#> 1 129 32702
#> 2 155 45298
#> 3 184 382405
#> 4 214 1048170
#> 5 452 58156Here, the ROI argument determines the region searched by
the peak finder. If we want to combine these peaks into one data frame,
we can just use bind_rows.
dplyr::bind_rows(largerPeaks, smallerPeaks)
#> # A tibble: 13 × 2
#> X Y
#> <dbl> <dbl>
#> 1 129 32702
#> 2 155 45298
#> 3 184 382405
#> 4 214 1048170
#> 5 452 58156
#> 6 551 9306
#> 7 608 4596
#> 8 662 10285
#> 9 722 3632
#> 10 781 26212
#> 11 862 4895
#> 12 915 1601
#> 13 990 1462