Title: | Detecting Changes in Autocorrelated and Fluctuating Signals |
---|---|
Description: | Detect abrupt changes in time series with local fluctuations as a random walk process and autocorrelated noise as an AR(1) process. See Romano, G., Rigaill, G., Runge, V., Fearnhead, P. (2021) <doi:10.1080/01621459.2021.1909598>. |
Authors: | Gaetano Romano [aut, cre], Guillem Rigaill [aut], Vincent Runge [aut], Paul Fearnhead [aut] |
Maintainer: | Gaetano Romano <[email protected]> |
License: | GPL (>= 2) |
Version: | 3.3.3 |
Built: | 2025-02-24 05:18:40 UTC |
Source: | https://github.com/gtromano/decafs |
iteration of the least square criterion for a grid of the phi parameter
bestParameters(y, nbK = 10, type = "MAD", sdEta = TRUE)
bestParameters(y, nbK = 10, type = "MAD", sdEta = TRUE)
y |
A time-series obtained by the dataRWAR function |
nbK |
number of diff k elements to consider |
type |
type of robust variance estimator (MAD, S or Q) |
sdEta |
if sdEta = FALSE there is no random walk |
a list with an estimation of the best parameters for Eta2, Nu2 and phi
bestParameters(dataRWAR(10000, sdEta = 0.2, sdNu = 0.1, phi = 0.3, type = "rand1", nbSeg = 10)$y)
bestParameters(dataRWAR(10000, sdEta = 0.2, sdNu = 0.1, phi = 0.3, type = "rand1", nbSeg = 10)$y)
the least-square value
cost(v, sdEta, sdNu, phi)
cost(v, sdEta, sdNu, phi)
v |
the estimated variances of the diff k operator |
sdEta |
standard deviation in Random Walk |
sdNu |
standard deviation in AR(1) |
phi |
the autocorrelative AR(1) parameter |
the value of the sum of squares
Generate a Realization from the RWAR model (check the references for further details).
where
and
dataRWAR( n = 1000, sdEta = 0, sdNu = 1, phi = 0, type = c("none", "up", "updown", "rand1"), nbSeg = 20, jumpSize = 1 )
dataRWAR( n = 1000, sdEta = 0, sdNu = 1, phi = 0, type = c("none", "up", "updown", "rand1"), nbSeg = 20, jumpSize = 1 )
n |
The length of the sequence of observations. |
sdEta |
The standard deviation of the Random Walk Component on the signal drift |
sdNu |
The standard deviation of the Autocorrelated noise |
phi |
The autocorrelation parameter |
type |
Possible change scenarios for the jump structure (default: |
nbSeg |
Number of segments |
jumpSize |
Maximum magnitude of a change |
A list containing:
y
the data sequence,
signal
the underlying signal without the superimposed AR(1) noise,
changepoints
the changepoint locations
Romano, G., Rigaill, G., Runge, V., Fearnhead, P. Detecting Abrupt Changes in the Presence of Local Fluctuations and Autocorrelated Noise. arXiv preprint https://arxiv.org/abs/2005.01379 (2020).
library(ggplot2) set.seed(42) Y = dataRWAR(n = 1e3, phi = .5, sdEta = 3, sdNu = 1, jumpSize = 15, type = "updown", nbSeg = 5) y = Y$y ggplot(data.frame(t = 1:length(y), y), aes(x = t, y = y)) + geom_point() + geom_vline(xintercept = Y$changepoints, col = 4, lty = 3)
library(ggplot2) set.seed(42) Y = dataRWAR(n = 1e3, phi = .5, sdEta = 3, sdNu = 1, jumpSize = 15, type = "updown", nbSeg = 5) y = Y$y ggplot(data.frame(t = 1:length(y), y), aes(x = t, y = y)) + geom_point() + geom_vline(xintercept = Y$changepoints, col = 4, lty = 3)
This function generates a sequence of observation from a sinusoidal model with changes. This can be used as an example for model misspecification.
dataSinusoidal( n, amplitude = 1, frequency = 0.001, phase = 0, sd = 1, type = c("none", "up", "updown", "rand1"), nbSeg = 20, jumpSize = 1 )
dataSinusoidal( n, amplitude = 1, frequency = 0.001, phase = 0, sd = 1, type = c("none", "up", "updown", "rand1"), nbSeg = 20, jumpSize = 1 )
n |
The length of the sequence of observations. |
amplitude |
The amplitude of the sinusoid |
frequency |
The angular frequency of the sinusoid |
phase |
where the signal starts at time t = 0 |
sd |
standard deviation of the noise added on top of the signal |
type |
Possible change scenarios for the jump structure (default: |
nbSeg |
Number of segments |
jumpSize |
Maximum magnitude of a change |
A list containing:
y
the data sequence,
signal
the underlying signal without the noise,
changepoints
the changepoint locations
Y <- dataSinusoidal( 1e4, frequency = 1 / 1e3, amplitude = 10, type = "updown", jumpSize = 4, nbSeg = 4 ) res <- DeCAFS(Y$y) plot(res, col = "grey") lines(Y$signal, col = "blue", lwd = 2, lty = 2) abline(v = res$changepoints, col = 2) abline(v = Y$changepoints, col = 4, lty = 2)
Y <- dataSinusoidal( 1e4, frequency = 1 / 1e3, amplitude = 10, type = "updown", jumpSize = 4, nbSeg = 4 ) res <- DeCAFS(Y$y) plot(res, col = "grey") lines(Y$signal, col = "blue", lwd = 2, lty = 2) abline(v = res$changepoints, col = 2) abline(v = Y$changepoints, col = 4, lty = 2)
This function implements the DeCAFS algorithm to detect abrupt changes in mean of a univariate data stream in the presence of local fluctuations and auto-correlated noise.
It detects the changes under a penalised likelihood model where the data, , is
with an AR(1) process, and for
where at time if we do not have a change then
and
; whereas if we have a change then
and
.
DeCAFS estimates the change by minimising a cost equal to twice the negative log-likelihood of this model, with a penalty
for adding a change.
Note that the default DeCAFS behavior will assume the RWAR model, but fit on edge cases is still possible. For instance, should the user wish for DeCAFS to fit an AR model only with a piecewise constant signal, or similarly a model that just assumes random fluctuations in the signal, this can be specified within the initial parameter estimation, by setting the argument:
modelParam = estimateParameters(y, model = "AR")
. Similarly, to allow for negative autocorrelation estimation, set modelParam = estimateParameters(Y$y, phiLower = -1)
.
DeCAFS( data, beta = 2 * log(length(data)), modelParam = estimateParameters(data, warningMessage = warningMessage), penalties = NULL, warningMessage = TRUE )
DeCAFS( data, beta = 2 * log(length(data)), modelParam = estimateParameters(data, warningMessage = warningMessage), penalties = NULL, warningMessage = TRUE )
data |
A vector of observations y |
beta |
The l0 penalty. The default one is |
modelParam |
A list of 3 initial model parameters: |
penalties |
Can be used as an alternative to the model parameters, a list of 3 initial penalties: |
warningMessage |
When |
Returns an s3 object of class DeCAFSout where:
$changepoints
is the vector of change-point locations,
$signal
is the estimated signal without the auto-correlated noise,
$costFunction
is the optimal cost in form of piecewise quadratics at the end of the sequence,
$estimatedParameters
is a list of parameters estimates (if estimated, otherwise simply the initial modelParam
input),
$data
is the sequence of observations.
Romano, G., Rigaill, G., Runge, V., Fearnhead, P. (2021). Detecting Abrupt Changes in the Presence of Local Fluctuations and Autocorrelated Noise. Journal of the American Statistical Association. doi:10.1080/01621459.2021.1909598.
library(ggplot2) set.seed(42) Y <- dataRWAR(n = 1e3, phi = .5, sdEta = 1, sdNu = 3, jumpSize = 15, type = "updown", nbSeg = 5) y <- Y$y res = DeCAFS(y) ggplot(data.frame(t = 1:length(y), y), aes(x = t, y = y)) + geom_point() + geom_vline(xintercept = res$changepoints, color = "red") + geom_vline(xintercept = Y$changepoints, col = "blue", lty = 3)
library(ggplot2) set.seed(42) Y <- dataRWAR(n = 1e3, phi = .5, sdEta = 1, sdNu = 3, jumpSize = 15, type = "updown", nbSeg = 5) y <- Y$y res = DeCAFS(y) ggplot(data.frame(t = 1:length(y), y), aes(x = t, y = y)) + geom_point() + geom_vline(xintercept = res$changepoints, color = "red") + geom_vline(xintercept = Y$changepoints, col = "blue", lty = 3)
This function perform robust estimation of parameters in the Random Walk plus Autoregressive model using a method of moments estimator. To model the time-dependency DeCAFS relies on three parameters. These are sdEta
, the standard deviation of the drift (random fluctuations) in the signal, modeled as a Random Walk process, sdNu
, the standard deviation of the AR(1) noise process, and phi
, the autocorrelation parameter of the noise process.
The final estimation of the change locations is affected by the l0 penalty beta and the estimation of the process by those three initial parameters. Therefore, the choice of penalties for DeCAFS is important: where possible investigate resulting segmentations. Should the algorithm return a misspecified estimation of the signal, it might be good to constrain the estimation of the parameters to an edge case. This can be done through the argument model
. Alternatively, one could employ a range of penalties or tune these on training data. To manually specify different penalties, see DeCAFS()
documentation.
If unsure of which model is the most suited for a given sequence, see guidedModelSelection()
for guided model selection.
estimateParameters( y, model = c("RWAR", "AR", "RW"), K = 15, phiLower = 0, phiUpper = 0.999, sdEtaUpper = Inf, sdNuUpper = Inf, warningMessage = FALSE )
estimateParameters( y, model = c("RWAR", "AR", "RW"), K = 15, phiLower = 0, phiUpper = 0.999, sdEtaUpper = Inf, sdNuUpper = Inf, warningMessage = FALSE )
y |
A vector of observations |
model |
Constrain estimation to an edge case of the RWAR model. Defaults to |
K |
The number of K-lags differences of the data to run the robust estimation over. Default set at 15. |
phiLower |
Smallest value of the autocorrelation parameter. Default set at 0. |
phiUpper |
Highest value of the autocorrelation parameter. Default set at 0.99. |
sdEtaUpper |
Highest value of the RW standard deviation. Default set at Inf |
sdNuUpper |
Highest value of the AR(1) noise standard deviation. Default set at Inf |
warningMessage |
A message to warn the user when the automatic parameter estimation is employed. |
Returns a list of estimates that can be employed as an argument for parameter modelParam
to run DeCAFS()
. Those are:
sdEta
the SD of the drift (random fluctuations) in the signal,
sdNu
the SD of the AR(1) noise process,
phi
the autocorrelation parameter of the noise process.
set.seed(42) y <- dataRWAR(n = 1e3, phi = .5, sdEta = 1, sdNu = 3, jumpSize = 15, type = "updown", nbSeg = 5)$y estimateParameters(y)
set.seed(42) y <- dataRWAR(n = 1e3, phi = .5, sdEta = 1, sdNu = 3, jumpSize = 15, type = "updown", nbSeg = 5)$y estimateParameters(y)
Estimation of the variances for the diff k operator k = 1 to nbK
estimVar(y, nbK = 10, type = "MAD")
estimVar(y, nbK = 10, type = "MAD")
y |
A time-series obtained by the dataRWAR function |
nbK |
number of diff k elements to consider |
type |
type of robust variance estimator (MAD, S or Q) |
the vector varEst of estimated variances
estimVar(dataRWAR(1000, sdEta = 0.1, sdNu = 0.1, phi = 0.3, type = "rand1", nbSeg = 10)$y)
estimVar(dataRWAR(1000, sdEta = 0.1, sdNu = 0.1, phi = 0.3, type = "rand1", nbSeg = 10)$y)
Evaluation of the variances Eta2 and Nu2
evalEtaNu(v, phi, sdEta = TRUE)
evalEtaNu(v, phi, sdEta = TRUE)
v |
the estimated variances of the diff k operator |
phi |
the autocorrelative AR(1) parameter |
sdEta |
if sdEta = FALSE there is no random walk |
a list with an estimation of the variances Eta2 and Nu2
This function aids the user in selecting an appropriate model for a given sequence of observations.
The function goes an interactive visualization of different model fits for different choices of initial parameter estimators and l0 penalties (beta
).
At the end, a call to the DeCAFS function is printed, while a DeCAFS wrapper is provvided.
guidedModelSelection(data)
guidedModelSelection(data)
data |
A vector of observations y |
A function, being a wrapper of DeCAFS with the selected parameter estimators.
## Not run: y <- dataRWAR(1000, sdEta = 1, sdNu = 4, phi = .4, nbSeg = 4, jumpSize = 20, type = "updown")$y DeCAFSWrapper <- guidedModelSelection(y) ## End(Not run)
## Not run: y <- dataRWAR(1000, sdEta = 1, sdNu = 4, phi = .4, nbSeg = 4, jumpSize = 20, type = "updown")$y DeCAFSWrapper <- guidedModelSelection(y) ## End(Not run)
This data comes from lowering a probe into a bore-hole, and taking measurements of the rock structure as the probe is lowered. As the probe moves from one rock strata to another we expect to see an abrupt change in the signal from the measurements.
oilWell
oilWell
A numeric vector of 4050 obervations
Ruanaidh, Joseph JK O., and William J. Fitzgerald. Numerical Bayesian methods applied to signal processing. Springer Science & Business Media, 2012. doi:10.1007/978-1-4612-0717-7
# removing outliers n = length(oilWell) h = 32 med = rep(NA, n) for (i in 1:n) { index = max(1, i - h):min(n, i + h) med[i] = median(oilWell[index]) } residual = (oilWell - med) y = oilWell[abs(residual) < 8000] sigma = sqrt(var(residual[abs(residual) < 8000])) # running DeCAFS res <- DeCAFS(y/sigma) plot(res, xlab = "time", ylab = "y", type = "l") abline(v = res$changepoints, col = 4, lty = 3)
# removing outliers n = length(oilWell) h = 32 med = rep(NA, n) for (i in 1:n) { index = max(1, i - h):min(n, i + h) med[i] = median(oilWell[index]) } residual = (oilWell - med) y = oilWell[abs(residual) < 8000] sigma = sqrt(var(residual[abs(residual) < 8000])) # running DeCAFS res <- DeCAFS(y/sigma) plot(res, xlab = "time", ylab = "y", type = "l") abline(v = res$changepoints, col = 4, lty = 3)
DeCAFS output plotting method.
## S3 method for class 'DeCAFSout' plot(x, ...)
## S3 method for class 'DeCAFSout' plot(x, ...)
x |
the output object from a DeCAFS call |
... |
Additional graphical parameters to be passed down to the plot function |
An R plot
set.seed(42) Y <- dataRWAR(n = 1e3, phi = .5, sdEta = 1, sdNu = 3, jumpSize = 15, type = "updown", nbSeg = 5) res = DeCAFS(Y$y) plot(res, type = "l")
set.seed(42) Y <- dataRWAR(n = 1e3, phi = .5, sdEta = 1, sdNu = 3, jumpSize = 15, type = "updown", nbSeg = 5) res = DeCAFS(Y$y) plot(res, type = "l")
Generate a piecewise constant signal of a given length
scenarioGenerator( n, type = c("none", "up", "updown", "rand1"), nbSeg = 20, jumpSize = 1 )
scenarioGenerator( n, type = c("none", "up", "updown", "rand1"), nbSeg = 20, jumpSize = 1 )
n |
The length of the sequence of observations. |
type |
Possible change scenarios for the jump structure |
nbSeg |
Number of segments |
jumpSize |
Maximum magnitude of a change |
a sequence of N values for the piecewise constant signal
scenarioGenerator(1e3, "rand1")
scenarioGenerator(1e3, "rand1")