---
title: "Getting started with functional statistical testing"
output: rmarkdown::html_vignette
bibliography: "references.bib"
csl: "journal-of-open-research-software.csl"
vignette: >
  %\VignetteIndexEntry{Getting started with functional statistical testing}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r}
library(funStatTest)
```

<!-- WARNING - This vignette is generated by {fusen} from dev/flat_package.Rmd: do not edit by hand -->


```{r setup_seed}
#| include: no

# fix seed for simulations
set.seed(123456)
```


The `funStatTest` package implements various statistics for two sample comparison testing regarding functional data.

## Authorship and license

This package is developed by:

- Zaineb Smida ([ORCID](https://orcid.org/0000-0002-9974-299X))
- Ghislain Durif ([link](https://gdurif.perso.math.cnrs.fr/)|[ORCID](https://orcid.org/0000-0003-2567-1401))
- Lionel Cucala ([link](https://imag.umontpellier.fr/~cucala/))

It implements statistics (and related experiments) introduced and used in [@smida2022].

## Data simulation

Here are some functions used to simulate clustered trajectories of functional data based on the Karhunen-Loève decomposition.

The functional data simulation process is described in [@smida2022] (section 3.1).

### Simulate a single trajectory


```{r examples-simul_traj}
simu_vec <- simul_traj(100)
plot(simu_vec, xlab = "point", ylab = "value")
```


### Simulate trajectories from two samples diverging by a delta


```{r examples-simulate_data}
simu_data <- simul_data(
    n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10, 
    delta_shape = "constant", distrib = "normal"
)
str(simu_data)
```


### Graphical representation of simulated data


```{r examples-plot_simu}
# constant delta
simu_data <- simul_data(
    n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 5, 
    delta_shape = "constant", distrib = "normal"
)
plot_simu(simu_data)
# linear delta
simu_data <- simul_data(
    n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 5, 
    delta_shape = "linear", distrib = "normal"
)
plot_simu(simu_data)
# quadratic delta
simu_data <- simul_data(
    n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 5, 
    delta_shape = "quadratic", distrib = "normal"
)
plot_simu(simu_data)
```


## Statistics

### MO median statistic

The $MO$ median statistic [@smida2022] is implemented in the `stat_mo()` function.


```{r examples-stat_mo}
simu_data <- simul_data(
    n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10, 
    delta_shape = "constant", distrib = "normal"
)

MatX <- simu_data$mat_sample1
MatY <- simu_data$mat_sample2

stat_mo(MatX, MatY)
```


### MED median statistic

The $MED$ median statistic [@smida2022] is implemented in the `stat_med()` function.


```{r examples-stat_med}
simu_data <- simul_data(
    n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10, 
    delta_shape = "constant", distrib = "normal"
)

MatX <- simu_data$mat_sample1
MatY <- simu_data$mat_sample2

stat_med(MatX, MatY)
```


### WMW statistic

The Wilcoxon-Mann-Whitney statistic [@chakraborty2015] (noted $WMW$ in [@smida2022]) is implemented in the `stat_wmw()` function.


```{r examples-stat_wmw}
simu_data <- simul_data(
    n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10, 
    delta_shape = "constant", distrib = "normal"
)

MatX <- simu_data$mat_sample1
MatY <- simu_data$mat_sample2

stat_wmw(MatX, MatY)
```


### HKR statistics

The Horváth-Kokoszka-Reeder statistics [@horvath2013] (noted $HKR1$ and $HKR2$ in [@smida2022]) are implemented in the `stat_hkr()` function.


```{r examples-stat_hkr}
simu_data <- simul_data(
    n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10, 
    delta_shape = "constant", distrib = "normal"
)

MatX <- simu_data$mat_sample1
MatY <- simu_data$mat_sample2

stat_hkr(MatX, MatY)
```


### CFF statistic

The Cuevas-Febrero-Fraiman statistic [@cuevas2004] (noted $CFF$ in [@smida2022]) is implemented in the `stat_cff()` function.


```{r examples-stat_cff}
simu_data <- simul_data(
    n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10, 
    delta_shape = "constant", distrib = "normal"
)

MatX <- simu_data$mat_sample1
MatY <- simu_data$mat_sample2

stat_cff(MatX, MatY)
```


### Compute multiple statistics

The function `comp_stat()` allows to compute multiple statistics defined above in a single call on the same data.


```{r examples-comp_stat}
simu_data <- simul_data(
    n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10, 
    delta_shape = "constant", distrib = "normal"
)

MatX <- simu_data$mat_sample1
MatY <- simu_data$mat_sample2

res <- comp_stat(MatX, MatY, stat = c("mo", "med", "wmw", "hkr", "cff"))
res
```


## Permutation-based computation of p-values

P-values associated to the different statistics defined above can be computed with the permutation-based method as follow:


```{r examples-permut_pval}
# simulate small data for the example
simu_data <- simul_data(
    n_point = 20, n_obs1 = 4, n_obs2 = 5, c_val = 10, 
    delta_shape = "constant", distrib = "normal"
)

MatX <- simu_data$mat_sample1
MatY <- simu_data$mat_sample2
res <- permut_pval(
    MatX, MatY, n_perm = 100, stat = c("mo", "med", "wmw", "hkr", "cff"), 
    verbose = TRUE)
res
```


> :warning: computing p-values based on permutations may take some time (for large data or when using a large number of simulations. :warning:

## Simulation-based power analysis

We use our simulation scheme with permutation-based p-values computation to run a power analysis to evaluate the different statistics.


```{r examples-power_exp}
# simulate a few small data for the example
res <- power_exp(
    n_simu = 20, alpha = 0.05, n_perm = 100, 
    stat = c("mo", "med", "wmw", "hkr", "cff"), 
    n_point = 25, n_obs1 = 4, n_obs2 = 5, c_val = 10, delta_shape = "constant", 
    distrib = "normal", max_iter = 10000, verbose = FALSE
)
res$power_res
```


## References

::: {#refs}
:::