--- title: "Getting started with functional statistical testing" output: rmarkdown::html_vignette bibliography: "references.bib" csl: "journal-of-open-research-software.csl" vignette: > %\VignetteIndexEntry{Getting started with functional statistical testing} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r} library(funStatTest) ``` ```{r setup_seed} #| include: no # fix seed for simulations set.seed(123456) ``` The `funStatTest` package implements various statistics for two sample comparison testing regarding functional data. ## Authorship and license This package is developed by: - Zaineb Smida ([ORCID](https://orcid.org/0000-0002-9974-299X)) - Ghislain Durif ([link](https://gdurif.perso.math.cnrs.fr/)|[ORCID](https://orcid.org/0000-0003-2567-1401)) - Lionel Cucala ([link](https://imag.umontpellier.fr/~cucala/)) It implements statistics (and related experiments) introduced and used in [@smida2022]. ## Data simulation Here are some functions used to simulate clustered trajectories of functional data based on the Karhunen-Loève decomposition. The functional data simulation process is described in [@smida2022] (section 3.1). ### Simulate a single trajectory ```{r examples-simul_traj} simu_vec <- simul_traj(100) plot(simu_vec, xlab = "point", ylab = "value") ``` ### Simulate trajectories from two samples diverging by a delta ```{r examples-simulate_data} simu_data <- simul_data( n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10, delta_shape = "constant", distrib = "normal" ) str(simu_data) ``` ### Graphical representation of simulated data ```{r examples-plot_simu} # constant delta simu_data <- simul_data( n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 5, delta_shape = "constant", distrib = "normal" ) plot_simu(simu_data) # linear delta simu_data <- simul_data( n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 5, delta_shape = "linear", distrib = "normal" ) plot_simu(simu_data) # quadratic delta simu_data <- simul_data( n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 5, delta_shape = "quadratic", distrib = "normal" ) plot_simu(simu_data) ``` ## Statistics ### MO median statistic The $MO$ median statistic [@smida2022] is implemented in the `stat_mo()` function. ```{r examples-stat_mo} simu_data <- simul_data( n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10, delta_shape = "constant", distrib = "normal" ) MatX <- simu_data$mat_sample1 MatY <- simu_data$mat_sample2 stat_mo(MatX, MatY) ``` ### MED median statistic The $MED$ median statistic [@smida2022] is implemented in the `stat_med()` function. ```{r examples-stat_med} simu_data <- simul_data( n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10, delta_shape = "constant", distrib = "normal" ) MatX <- simu_data$mat_sample1 MatY <- simu_data$mat_sample2 stat_med(MatX, MatY) ``` ### WMW statistic The Wilcoxon-Mann-Whitney statistic [@chakraborty2015] (noted $WMW$ in [@smida2022]) is implemented in the `stat_wmw()` function. ```{r examples-stat_wmw} simu_data <- simul_data( n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10, delta_shape = "constant", distrib = "normal" ) MatX <- simu_data$mat_sample1 MatY <- simu_data$mat_sample2 stat_wmw(MatX, MatY) ``` ### HKR statistics The Horváth-Kokoszka-Reeder statistics [@horvath2013] (noted $HKR1$ and $HKR2$ in [@smida2022]) are implemented in the `stat_hkr()` function. ```{r examples-stat_hkr} simu_data <- simul_data( n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10, delta_shape = "constant", distrib = "normal" ) MatX <- simu_data$mat_sample1 MatY <- simu_data$mat_sample2 stat_hkr(MatX, MatY) ``` ### CFF statistic The Cuevas-Febrero-Fraiman statistic [@cuevas2004] (noted $CFF$ in [@smida2022]) is implemented in the `stat_cff()` function. ```{r examples-stat_cff} simu_data <- simul_data( n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10, delta_shape = "constant", distrib = "normal" ) MatX <- simu_data$mat_sample1 MatY <- simu_data$mat_sample2 stat_cff(MatX, MatY) ``` ### Compute multiple statistics The function `comp_stat()` allows to compute multiple statistics defined above in a single call on the same data. ```{r examples-comp_stat} simu_data <- simul_data( n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10, delta_shape = "constant", distrib = "normal" ) MatX <- simu_data$mat_sample1 MatY <- simu_data$mat_sample2 res <- comp_stat(MatX, MatY, stat = c("mo", "med", "wmw", "hkr", "cff")) res ``` ## Permutation-based computation of p-values P-values associated to the different statistics defined above can be computed with the permutation-based method as follow: ```{r examples-permut_pval} # simulate small data for the example simu_data <- simul_data( n_point = 20, n_obs1 = 4, n_obs2 = 5, c_val = 10, delta_shape = "constant", distrib = "normal" ) MatX <- simu_data$mat_sample1 MatY <- simu_data$mat_sample2 res <- permut_pval( MatX, MatY, n_perm = 100, stat = c("mo", "med", "wmw", "hkr", "cff"), verbose = TRUE) res ``` > :warning: computing p-values based on permutations may take some time (for large data or when using a large number of simulations. :warning: ## Simulation-based power analysis We use our simulation scheme with permutation-based p-values computation to run a power analysis to evaluate the different statistics. ```{r examples-power_exp} # simulate a few small data for the example res <- power_exp( n_simu = 20, alpha = 0.05, n_perm = 100, stat = c("mo", "med", "wmw", "hkr", "cff"), n_point = 25, n_obs1 = 4, n_obs2 = 5, c_val = 10, delta_shape = "constant", distrib = "normal", max_iter = 10000, verbose = FALSE ) res$power_res ``` ## References ::: {#refs} :::