Title: | Statistical Testing for Functional Data |
---|---|
Description: | Implementation of two-sample comparison procedures based on median-based statistical tests for functional data, introduced in Smida et al (2022) <doi:10.1080/10485252.2022.2064997>. Other competitive state-of-the-art approaches proposed by Chakraborty and Chaudhuri (2015) <doi:10.1093/biomet/asu072>, Horvath et al (2013) <doi:10.1111/j.1467-9868.2012.01032.x> and Cuevas et al (2004) <doi:10.1016/j.csda.2003.10.021> are also included in the package, as well as procedures to run test result comparisons and power analysis using simulations. |
Authors: | Zaineb Smida [aut] |
Maintainer: | Ghislain Durif <[email protected]> |
License: | AGPL (>= 3) |
Version: | 1.0.3 |
Built: | 2025-02-18 04:11:32 UTC |
Source: | https://github.com/cran/funStatTest |
Computation of the different statistics defined in the package. See Smida et al (2022) for more details.
comp_stat(MatX, MatY, stat = c("mo", "med"))
MatX |
numeric matrix of dimension |
MatY |
numeric matrix of dimension |
stat |
character string or vector of character strings, names of the
statistics to compute, among |
For HKR statistics, only the values of the two statistics, namely HKR1 and HKR2, are returned, and not the eigenvalues (see stat_hkr() for more details).
List of named numeric values corresponding to the statistics listed in the stat input.
Zaineb Smida, Ghislain DURIF, Lionel Cucala
Zaineb Smida, Lionel Cucala, Ali Gannoun & Ghislain Durif (2022) A median test for functional data, Journal of Nonparametric Statistics, 34:2, 520-553, doi:10.1080/10485252.2022.2064997, hal-03658578
stat_mo(), stat_med(), stat_wmw(), stat_hkr(), stat_cff()
simu_data <- simul_data(
    n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10,
    delta_shape = "constant", distrib = "normal"
)
MatX <- simu_data$mat_sample1
MatY <- simu_data$mat_sample2
res <- comp_stat(MatX, MatY, stat = c("mo", "med", "wmw", "hkr", "cff"))
res
Computation of the p-values associated with any statistic described in the package, using the permutation method. See Smida et al (2022) for more details.
permut_pval(MatX, MatY, n_perm = 100, stat = c("mo", "med"), verbose = FALSE)
MatX |
numeric matrix of dimension |
MatY |
numeric matrix of dimension |
n_perm |
integer, number of permutations used to compute the p-values. |
stat |
character string or vector of character strings, names of the
statistics for which the p-values will be computed, among |
verbose |
boolean, if TRUE, enable verbosity. |
List of named numeric values corresponding to the p-values for each statistic listed in the stat input.
Zaineb Smida, Ghislain DURIF, Lionel Cucala
Zaineb Smida, Lionel Cucala, Ali Gannoun & Ghislain Durif (2022) A median test for functional data, Journal of Nonparametric Statistics, 34:2, 520-553, doi:10.1080/10485252.2022.2064997, hal-03658578
stat_mo(), stat_med(), stat_wmw(), stat_hkr(), stat_cff(), comp_stat()
# simulate small data for the example
simu_data <- simul_data(
    n_point = 20, n_obs1 = 4, n_obs2 = 5, c_val = 10,
    delta_shape = "constant", distrib = "normal"
)
MatX <- simu_data$mat_sample1
MatY <- simu_data$mat_sample2
res <- permut_pval(
    MatX, MatY, n_perm = 100,
    stat = c("mo", "med", "wmw", "hkr", "cff"),
    verbose = TRUE
)
res
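The general idea of a permutation p-value can be illustrated with a short base R sketch that does not depend on the package. It is only an illustration under stated assumptions: perm_pval_sketch() and mean_diff_stat() are hypothetical helpers, the statistic is a simple placeholder (not MO, MED, WMW, HKR or CFF), and the convention used for the p-value (share of permuted statistics at least as large as the observed one) is one common choice that may differ from what permut_pval() does internally.

```r
# Hypothetical helper: permutation p-value for a generic two-sample statistic.
# `stat_fun` is any function(MatX, MatY) returning a single number, where
# larger values indicate a stronger difference between the two samples.
perm_pval_sketch <- function(MatX, MatY, stat_fun, n_perm = 100) {
  n1 <- ncol(MatX)
  pooled <- cbind(MatX, MatY)          # trajectories are stored in columns
  n <- ncol(pooled)
  obs <- stat_fun(MatX, MatY)          # statistic on the observed split
  perm <- vapply(seq_len(n_perm), function(b) {
    idx <- sample.int(n)               # random relabelling of the trajectories
    stat_fun(pooled[, idx[seq_len(n1)], drop = FALSE],
             pooled[, idx[-seq_len(n1)], drop = FALSE])
  }, numeric(1))
  # share of permuted statistics at least as large as the observed one
  # (assumed convention, the package may use a different one)
  mean(perm >= obs)
}

# Placeholder statistic (NOT one of the package statistics):
# Euclidean norm of the difference between the pointwise sample means.
mean_diff_stat <- function(MatX, MatY) {
  sqrt(sum((rowMeans(MatX) - rowMeans(MatY))^2))
}

set.seed(1)
MatX <- matrix(rnorm(20 * 6), nrow = 20)            # 6 trajectories of length 20
MatY <- matrix(rnorm(20 * 7, mean = 3), nrow = 20)  # 7 shifted trajectories
pval <- perm_pval_sketch(MatX, MatY, mean_diff_stat, n_perm = 200)
pval
```

Because the second sample is strongly shifted, almost no random relabelling reproduces a statistic as large as the observed one, so the resulting p-value is close to zero.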
Graphical representation of simulated data
plot_simu(simu)
simu |
list, output of |
The ggplot2 graph of simulated trajectories.
Zaineb Smida, Ghislain DURIF, Lionel Cucala
# constant delta
simu_data <- simul_data(
    n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 5,
    delta_shape = "constant", distrib = "normal"
)
plot_simu(simu_data)
# linear delta
simu_data <- simul_data(
    n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 5,
    delta_shape = "linear", distrib = "normal"
)
plot_simu(simu_data)
# quadratic delta
simu_data <- simul_data(
    n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 5,
    delta_shape = "quadratic", distrib = "normal"
)
plot_simu(simu_data)
Computation of the statistical power (i.e. the probability of rejecting the null hypothesis when it is false) associated with any statistic described in the package, estimated from simulations with permutation-based p-value computations. See Smida et al (2022) for more details.
power_exp(
    n_simu = 100,
    alpha = 0.05,
    n_perm = 100,
    stat = c("mo", "med"),
    n_point = 100,
    n_obs1 = 50,
    n_obs2 = 50,
    c_val = 1,
    delta_shape = "constant",
    distrib = "normal",
    max_iter = 10000,
    verbose = FALSE
)
n_simu |
integer value, number of simulations to compute the statistical power. |
alpha |
numerical value, between 0 and 1, type I risk level to reject
the null hypothesis in the simulation. Default value is |
n_perm |
integer, number of permutations used to compute the p-values. |
stat |
character string or vector of character strings, names of the
statistics for which the p-values will be computed, among |
n_point |
integer value, number of points in the trajectory. |
n_obs1 |
integer value, number of trajectories in the first sample. |
n_obs2 |
integer value, number of trajectories in the second sample. |
c_val |
numeric value, level of divergence between the two samples. |
delta_shape |
character string, shape of the divergence between the
two samples, among |
distrib |
character string, type of probability distribution used to
simulate the data among |
max_iter |
integer, maximum number of iterations for the iterative simulation process. |
verbose |
boolean, if TRUE, enable verbosity. |
The c_val input argument should be strictly positive so that the null hypothesis does not hold when simulating the data (i.e. so that the two samples are not generated from the same probability distribution).
A list with the following elements:
power_res: a list of named numeric values corresponding to the power values for each statistic listed in the stat input.
pval_res: a list of named numeric values corresponding to the p-values for each simulation and each statistic listed in the stat input.
simu_config: information about the input parameters used for the simulation, including n_simu, c_val, distrib, delta_shape, n_point, n_obs1, n_obs2.
Zaineb Smida, Ghislain DURIF, Lionel Cucala
Zaineb Smida, Lionel Cucala, Ali Gannoun & Ghislain Durif (2022) A median test for functional data, Journal of Nonparametric Statistics, 34:2, 520-553, doi:10.1080/10485252.2022.2064997, hal-03658578
stat_mo(), stat_med(), stat_wmw(), stat_hkr(), stat_cff(), comp_stat()
# simulate a few small data sets for the example
res <- power_exp(
    n_simu = 20, alpha = 0.05, n_perm = 100,
    stat = c("mo", "med", "wmw", "hkr", "cff"),
    n_point = 25, n_obs1 = 4, n_obs2 = 5, c_val = 10,
    delta_shape = "constant", distrib = "normal",
    max_iter = 10000, verbose = FALSE
)
res$power_res
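The relation between per-simulation p-values and an empirical power value can be sketched in base R as follows. This is a minimal illustration and not the package code: the p-values are made up, and the rejection rule (p-value less than or equal to alpha) is an assumption, since the exact comparison used by power_exp() is not stated here.

```r
# Made-up p-values from 8 simulated data sets for two statistics
# (same shape as a list of named numeric vectors, one per statistic).
pval_res <- list(
  mo  = c(0.01, 0.03, 0.20, 0.04, 0.00, 0.02, 0.60, 0.01),
  med = c(0.01, 0.08, 0.30, 0.04, 0.02, 0.10, 0.70, 0.03)
)
alpha <- 0.05

# Empirical power: share of simulations in which the null hypothesis is
# rejected at level alpha (assumed rule: p-value <= alpha).
power_res <- lapply(pval_res, function(p) mean(p <= alpha))
power_res
```

With these made-up values, 6 of the 8 "mo" p-values and 4 of the 8 "med" p-values fall at or below 0.05, giving empirical powers of 0.75 and 0.5.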
Simulate n_obs1 trajectories of length n_point in the first sample and n_obs2 trajectories of length n_point in the second sample.
simul_data(
    n_point,
    n_obs1,
    n_obs2,
    c_val = 0,
    delta_shape = "constant",
    distrib = "normal",
    max_iter = 10000
)
n_point |
integer value, number of points in the trajectory. |
n_obs1 |
integer value, number of trajectories in the first sample. |
n_obs2 |
integer value, number of trajectories in the second sample. |
c_val |
numeric value, level of divergence between the two samples. |
delta_shape |
character string, shape of the divergence between the
two samples, among |
distrib |
character string, type of probability distribution used to
simulate the data among |
max_iter |
integer, maximum number of iterations for the iterative simulation process. |
A list with the following elements:
mat_sample1: numeric matrix of dimension n_point x n_obs1 containing n_obs1 trajectories (in columns) of size n_point (in rows) corresponding to sample 1.
mat_sample2: numeric matrix of dimension n_point x n_obs2 containing n_obs2 trajectories (in columns) of size n_point (in rows) corresponding to sample 2.
Zaineb Smida, Ghislain DURIF, Lionel Cucala
Zaineb Smida, Lionel Cucala, Ali Gannoun & Ghislain Durif (2022) A median test for functional data, Journal of Nonparametric Statistics, 34:2, 520-553, doi:10.1080/10485252.2022.2064997, hal-03658578
simu_data <- simul_data(
    n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10,
    delta_shape = "constant", distrib = "normal"
)
str(simu_data)
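The column-wise layout of the two matrices (one trajectory per column, one time point per row) can be checked on a mock object with the same shape. The sketch below does not call simul_data(): the list named mock is a hypothetical stand-in filled with standard normal noise, used only to show how trajectories are indexed.

```r
n_point <- 100
n_obs1 <- 50
n_obs2 <- 75

set.seed(42)
# Hypothetical stand-in with the same element names and dimensions as the
# documented output (the values themselves are plain Gaussian noise).
mock <- list(
  mat_sample1 = matrix(rnorm(n_point * n_obs1), nrow = n_point, ncol = n_obs1),
  mat_sample2 = matrix(rnorm(n_point * n_obs2), nrow = n_point, ncol = n_obs2)
)

dim(mock$mat_sample1)                 # n_point rows, n_obs1 columns
first_traj <- mock$mat_sample1[, 1]   # one trajectory is one column
length(first_traj)                    # n_point values

# Pointwise mean curve of the second sample: average over columns, per row.
mean_curve2 <- rowMeans(mock$mat_sample2)
length(mean_curve2)
```

Keeping trajectories in columns means that row-wise summaries such as rowMeans() give pointwise curves, while column indexing extracts individual trajectories.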
Simulate a trajectory of length n_point using a random generator associated with different probability distributions.
simul_traj(n_point, distrib = "normal", max_iter = 10000)
n_point |
integer value, number of points in the trajectory. |
distrib |
character string, type of probability distribution used to
simulate the data among |
max_iter |
integer, maximum number of iterations for the iterative simulation process. |
Vector of size n_point with the trajectory values.
Zaineb Smida, Ghislain DURIF, Lionel Cucala
Zaineb Smida, Lionel Cucala, Ali Gannoun & Ghislain Durif (2022) A median test for functional data, Journal of Nonparametric Statistics, 34:2, 520-553, doi:10.1080/10485252.2022.2064997, hal-03658578
simu_vec <- simul_traj(100)
plot(simu_vec, xlab = "point", ylab = "value")
The Cuevas-Febrero-Fraiman statistic defined in Cuevas et al (2004) (and noted CFF in Smida et al 2022) is computed to compare two sets of functional trajectories.
stat_cff(MatX, MatY)
MatX |
numeric matrix of dimension |
MatY |
numeric matrix of dimension |
numeric value corresponding to the CFF statistic value
Zaineb Smida, Ghislain DURIF, Lionel Cucala
Cuevas, A, Febrero, M, and Fraiman, R (2004) An anova test for functional data. Computational Statistics & Data Analysis, 47(1): 111–122. doi:10.1016/j.csda.2003.10.021
Zaineb Smida, Lionel Cucala, Ali Gannoun & Ghislain Durif (2022) A median test for functional data, Journal of Nonparametric Statistics, 34:2, 520-553, doi:10.1080/10485252.2022.2064997, hal-03658578
simu_data <- simul_data(
    n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10,
    delta_shape = "constant", distrib = "normal"
)
MatX <- simu_data$mat_sample1
MatY <- simu_data$mat_sample2
stat_cff(MatX, MatY)
The Horváth-Kokoszka-Reeder statistics defined in Horváth et al (2013) (and noted HKR1 and HKR2 in Smida et al 2022) are computed to compare two sets of functional trajectories.
stat_hkr(MatX, MatY)
MatX |
numeric matrix of dimension |
MatY |
numeric matrix of dimension |
A list with the following elements:
T1: numeric value corresponding to the HKR1 statistic value
T2: numeric value corresponding to the HKR2 statistic value
eigenval: numeric vector of eigenvalues from the empirical pooled covariance matrix of MatX and MatY (see Smida et al, 2022, for more details)
Zaineb Smida, Ghislain DURIF, Lionel Cucala
Horváth, L., Kokoszka, P., & Reeder, R. (2013). Estimation of the mean of functional time series and a two-sample problem. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 75(1), 103–122. doi:10.1111/j.1467-9868.2012.01032.x
Zaineb Smida, Lionel Cucala, Ali Gannoun & Ghislain Durif (2022) A median test for functional data, Journal of Nonparametric Statistics, 34:2, 520-553, doi:10.1080/10485252.2022.2064997, hal-03658578
simu_data <- simul_data(
    n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10,
    delta_shape = "constant", distrib = "normal"
)
MatX <- simu_data$mat_sample1
MatY <- simu_data$mat_sample2
stat_hkr(MatX, MatY)
The MED median statistic defined in Smida et al (2022) is computed to compare two sets of functional trajectories.
stat_med(MatX, MatY)
MatX |
numeric matrix of dimension |
MatY |
numeric matrix of dimension |
numeric value corresponding to the MED median statistic value
Zaineb Smida, Ghislain DURIF, Lionel Cucala
Zaineb Smida, Lionel Cucala, Ali Gannoun & Ghislain Durif (2022) A median test for functional data, Journal of Nonparametric Statistics, 34:2, 520-553, doi:10.1080/10485252.2022.2064997, hal-03658578
simu_data <- simul_data(
    n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10,
    delta_shape = "constant", distrib = "normal"
)
MatX <- simu_data$mat_sample1
MatY <- simu_data$mat_sample2
stat_med(MatX, MatY)
The MO median statistic defined in Smida et al (2022) is computed to compare two sets of functional trajectories.
stat_mo(MatX, MatY)
MatX |
numeric matrix of dimension |
MatY |
numeric matrix of dimension |
numeric value corresponding to the MO median statistic value
Zaineb Smida, Ghislain DURIF, Lionel Cucala
Zaineb Smida, Lionel Cucala, Ali Gannoun & Ghislain Durif (2022) A median test for functional data, Journal of Nonparametric Statistics, 34:2, 520-553, doi:10.1080/10485252.2022.2064997, hal-03658578
simu_data <- simul_data(
    n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10,
    delta_shape = "constant", distrib = "normal"
)
MatX <- simu_data$mat_sample1
MatY <- simu_data$mat_sample2
stat_mo(MatX, MatY)
The Wilcoxon-Mann-Whitney statistic defined in Chakraborty & Chaudhuri (2015) (and noted WMW in Smida et al 2022) is computed to compare two sets of functional trajectories.
stat_wmw(MatX, MatY)
MatX |
numeric matrix of dimension |
MatY |
numeric matrix of dimension |
numeric value corresponding to the WMW statistic value
Zaineb Smida, Ghislain DURIF, Lionel Cucala
Anirvan Chakraborty, Probal Chaudhuri, A Wilcoxon–Mann–Whitney-type test for infinite-dimensional data, Biometrika, Volume 102, Issue 1, March 2015, Pages 239–246, doi:10.1093/biomet/asu072
Zaineb Smida, Lionel Cucala, Ali Gannoun & Ghislain Durif (2022) A median test for functional data, Journal of Nonparametric Statistics, 34:2, 520-553, doi:10.1080/10485252.2022.2064997, hal-03658578
simu_data <- simul_data(
    n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10,
    delta_shape = "constant", distrib = "normal"
)
MatX <- simu_data$mat_sample1
MatY <- simu_data$mat_sample2
stat_wmw(MatX, MatY)