Overview Functional Data Analysis of Spatial Metrics

Martin Emons1,2*, Fabian Scheipl3 and Mark D. Robinson1,2

1Department of Molecular Life Sciences, University of Zurich, Switzerland
2SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
3Department of Statistics, Ludwig-Maximilians-Universität München, Germany

*martin.emons@uzh.ch

10 October 2025

Abstract

spatialFDA is a package to calculate spatial statistics metrics and compare them with methods from functional data analysis. Here, we show how to perform a standard spatial analysis using spatialFDA.

Package

spatialFDA

1 Introduction
2 Installation
3 Getting started
- 3.1 Loading the data
- 3.2 Visualising the raw data
4 Spatial Inference

1 Introduction

This vignette serves as an overview how to use spatialFDA to perform functional data analysis on spatial statistics metrics. The main aim of this package is to detect differential spatial arrangements between cells in multi-sample/condition experiments. It combines the the estimation with the spatstat package with the inference of refund (Baddeley and Turner 2005; Baddeley, Rubak, and Turner 2015; Goldsmith et al. 2024).

This vignette is a brief overview version of the “Detailed Functional Data Analysis of Spatial Metrics” vignette. The content and code is in parts overlapping.

The use case is a dataset from Damond et al. (2019) which contains images from 12 human donors. The raw data is published under a CC-BY-4.0 License on Mendeley.

2 Installation

spatialFDA can be installed and loaded from Bioconductor as follows

if (!requireNamespace("BiocManager")) {
    install.packages("BiocManager")
}
BiocManager::install("spatialFDA")

library("spatialFDA")
library("dplyr")
library("ggplot2")
library("tidyr")
library("stringr")
library("dplyr")
library("patchwork")
library("SpatialExperiment")

set.seed(1234)

3 Getting started

In this vignette we will analyse a diabetes dataset acquired by imaging mass cytometry (IMC) as acquired by Damond et al. (2019). The dataset contains images from 12 human donors, 4 healthy and 8 with type 1 diabetes (T1D). With IMC, 35 markers were measured at single cell resolution (Damond et al. 2019).

3.1 Loading the data

The Damond et al. (2019) dataset is easily loaded from ExperimentHub via a small reader function .loadExample(). The entire dataset can be loaded by setting full = TRUE. For computational reasons, one can reduce to three patients as well by setting this flag to FALSE. We will subset the entire dataset to two samples per condition in order to have a multi-condition/multi-sample setting. The package offers multiple datatypes, we will use the SpatialExperiment (SPE) object (Righelli et al. 2022).

# retrieve example data from Damond et al. (2019)
spe <- .loadExample(full = TRUE)

spe <- subset(spe, ,patient_id %in% c(6089,6180,6126,6134,6228,6414))
# set cell types as factors
colData(spe)$cell_type <- as.factor(colData(spe)$cell_type)

3.2 Visualising the raw data

We can look at the fields of view (FOVs) of the diabetes dataset. To do so we extract the spatial coordinates, store them as a dataframe and add the colData from the SPE to this. We will look only at the first four FOVs of the healthy sample. We plot both the cell categories of all cells and then the cell types of secretory cells (\(\alpha, \beta\) and \(\delta\) cells) and T-cells (CD8+ and CD4+ T-cells).

df <- data.frame(spatialCoords(spe), colData(spe))

dfSub <- df %>%
    subset(image_name %in% c("E02", "E03", "E04", "E05"))

p <- ggplot(dfSub, aes(x = cell_x, y = cell_y, color = cell_category)) +
    geom_point(size= 0.5) +
    facet_wrap(~image_name) +
    theme(legend.title.size = 20, legend.text.size = 20) +
    xlab("x") +
    ylab("y") +
    labs(color = "cell category")+
    coord_equal() +
    theme_light()

dfSub <- dfSub %>%
    subset(cell_type %in% c("alpha", "beta", "delta", "Th", "Tc"))

q <- ggplot(dfSub, aes(x = cell_x, y = cell_y, color = cell_type)) +
    geom_point(size= 0.5) +
    facet_wrap(~image_name) +
    theme(legend.title.size = 20, legend.text.size = 20) +
    xlab("x") +
    ylab("y") +
    labs(color = "cell type") +
    coord_equal() +
    theme_light()
wrap_plots(list(p,q), widths = c(1,1), heights = c(1,1), nrow = 2, ncol = 1)

4 Spatial Inference

spatialFDA consists of two main steps, the calculation of the spatial statistics function per individual image and the comparison of these functions via functional data analysis (FDA). spatialFDA contains the convenience function spatialInference which streamlines the estimation explained in the “Detailed Functional Data Analysis of Spatial Metrics” vignette.

The first important step is to consider the sequences of radii to calculate the spatial statistics function over. (Baddeley, Rubak, and Turner 2015) recommend to take never a radius larger than \(1/4\) of the window length. We have implemented a heuristic of this measure as half of the minimum bounding radius over all images. The histogram indicates the variability in the bounding radii.

p <- rMaxHeuristic(spe,
subsetby = "image_number", marks = "cell_type"
)
p

From the plot we see that a maximum radius of \(50\) is justified across all images.

In order to perform the spatial inference, we have to pass the SpatialExperiment object, the cell types we want to analyse (selection), how we want to subset the data (subsetby), the spatial statistic function to compute (fun), which columns are the marks/cell types (marks), the radius domain to compute on (rSeq) and the edge correction to perform (correction).

Furthermore, for the functional data analysis part we can provide the sample_id if there are replicates (sample_id, will lead to a mixed effects model), the transformation to apply to the output of the spatial statistics metric (here, we apply a square root transformation to stabilise the variance), the image ID (image_id) as well as the conditions (conditions)

#relevel to have non-diabetic as the reference category
spe$patient_stage <- relevel(factor(spe$patient_stage), "Non-diabetic")

#run the spatial statistics inference
resLs <- crossSpatialInference(
    spe, 
    selection = c("alpha", "Tc"),
    fun = "Gcross", 
    marks = "cell_type",
    rSeq = seq(0, 50, length.out = 50), 
    correction = "rs",
    sample_id = "patient_id",
    transformation = "Fisher",
    eps = 0,
    delta = "minNnDist",
    family = mgcv::scat(link = "log"),
    image_id = "image_number", 
    condition = "patient_stage",
    ncores = 1,
    algorithm = "bam"
)

names(resLs)

We first consider the combination of cell types defined in selection as an overview bubble plot of the \(F\)-test results. This gives us a general indication of which cell types are co-localising and a notion of significance of the \(F\)-test.

p <- plotCrossHeatmap(resLs, coefficientsToPlot = c("conditionLong_duration(x)", "conditionOnset(x)"), QCThreshold = 0, QCMetric = "medianMinIntensity")
p <- p + theme(legend.position = "bottom") + guides(shape = "none") + 
  facet_wrap(~factor(coefficient, levels = c("conditionOnset(x)", "conditionLong_duration(x)"), labels = c("conditionOnset(x)" = "Onset", "conditionLong_duration(x)" = "Long-Duration")))

p

We notice that there is a strong co-localisation of cytoxic T-cells with pancreatic alpha cells in onset diabetes. We will consider the interaction of alpha cells and cytoxic T-cells in more detail.

res <- resLs$alpha_Tc
names(res)
#> [1] "metricRes"      "designmat"      "mdl"            "curveFittingQC"

The output is a list of four objects: The dataframe with the calculated spatial statistics curves from spatstat, the design matrix for the statistical inference, the output of the pffr function from refund, as well as the residual standard error per condition (Baddeley, Rubak, and Turner 2015; Baddeley and Turner 2005; Goldsmith et al. 2024; Scheipl, Staicu, and Greven 2015).

4.1 Spatial Statistics Curves

We can visualise the spatial statistics curves per image with the function plotMetricPerFov. Note that these curves are variance-stabilised, since we added a transformation parameter above.

#filtered and stabilised curves
metricRes <- res$metricRes

# create a unique plotting ID
metricRes$ID <- paste0(
    metricRes$patient_stage, "|", metricRes$patient_id
)
# change levels for plotting
metricRes$ID <- factor(metricRes$ID, levels = c("Non-diabetic|6126",
                                                "Non-diabetic|6134",
                                                "Onset|6228","Onset|6414",
                                                "Long-duration|6089",
                                                "Long-duration|6180"))
# plot metrics
plotMetricPerFov(metricRes, correction = "rs", x = "r",
                 imageId = "image_number", ID = "ID", ncol = 2)

We note that there is pronounced variability between the three conditions, between the patients as well as between individual images.

4.2 Functional Boxplot

In order to summarise the variability of the curves calculated above, we can use the functional boxplot. The functional boxplot is a generalisation of a standard boxplot, giving information about the median curve (black solid line), the 50% central region (area coloured in magenta), the minimum and maximum envelopes (blue solid lines) and outlier curves (red dashed lines). Here, the fbplot function from the fda package is used (Sun and Genton 2011; Ramsay 2024).

# create a unique ID per row in the dataframe
metricRes$ID <- paste0(
    metricRes$patient_stage, "x", metricRes$patient_id,
    "x", metricRes$image_number
)

collector <- plotFbPlot(metricRes, "r", "rs", "patient_stage")

The functional boxplot provides a convenient summary of the spatial statistics curves. We see that the median curve for onset samples plateaus at a higher level and that non-diabetic and long-duration functional boxplots are in general very similar. There are some outlier curves but since they are not far off the envelopes, we do not filter them out. In comparison to the raw spatial statistics curves we note that the transformations stabilise the variance along the domain \(r\)

4.3 Functional GAM

In the functional boxplot we got an overview of the variability of the spatial statistic curves and could describe differences qualitatively. In order to test these differences, we use inference via functional data analysis. The functional general additive model (GAM) provides a way to perform statistical inference on the spatial statistics functions and is implemented in the function pffr from refund (Scheipl, Staicu, and Greven 2015; Scheipl, Gertheiss, and Greven 2016; Goldsmith et al. 2024).

mdl <- res$mdl
mm <- res$designmat

summary(mdl)
#> 
#> Family: Scaled t(3,0.101) 
#> Link function: log 
#> 
#> Formula:
#> Y ~ conditionLong_duration + conditionOnset + s(patient_id, bs = "re")
#> <environment: 0x63ef5783d550>
#> 
#> Constant coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  -1.7747     0.1811  -9.801   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Smooth terms & functional coefficients:
#>                              edf Ref.df      F  p-value    
#> Intercept(x)              17.716 19.000 56.681  < 2e-16 ***
#> conditionLong_duration(x)  1.000  1.000  0.079 0.778357    
#> conditionOnset(x)          2.919  3.088  6.711 0.000227 ***
#> s(patient_id)             12.160 27.000 31.560  < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> R-sq.(adj) =  0.692   Deviance explained =   57%
#> fREML score =  22090  Scale est. = 1         n = 13992 (318 x 44)

The summary of the functional GAM provides information on the constant parameters and the functional coefficients. For the functional coefficients, an \(F\)-test over the entire domain is performed which answers the question if there is any difference between the reference curves (here “Non-diabetic”) and the conditions over the entire domain.

We note that there is a small non-significant difference in the \(G\) function between non-diabetic and long-duration T1D samples, but a strong difference between non-diabetic and onset T1D according to the model summary.

As the functional coefficients are functions of the domain \(r\) themselves, we can plot them too

plotLs <- lapply(colnames(mm), plotMdl, mdl = mdl,
                 shift = mdl$coefficients[["(Intercept)"]])
#> using seWithMean for  s(x.vec) .
#> using seWithMean for  s(x.vec) .
#> using seWithMean for  s(x.vec) .
wrap_plots(plotLs, nrow = 3, axes = 'collect')

From the functional coefficients we see that the difference betwenn Non-diabetic and Onset diabetic curves is at short distances and that the difference becomes less pronounced at higher distances. This means that the differential co-localisation between cytotoxic T-cells and alpha cells in the islet happens at shorter distances.

The point wise confidence bands are a limitation of this method and could be improved with either bootstrapping or continuous confidence bands (Liebl and Reimherr 2023).

4.4 Model evaluation

The functional GAMs provide coefficients that we could plot above. In order to judge the confidence we can have in these coefficients, we look to quantify the model fit. First, we look at the correlation and check the qq plot of the model residuals.

residuals(mdl) |> cor() |> filled.contour(levels = seq(-1, 1, l = 40))

residuals(mdl) |> cov() |> filled.contour()


qqnorm(mdl$residuals, pch = 16)
qqline(mdl$residuals)

In these model diagnostics, we note that there is still some variability in the residuals that is not considered by the model. The Q-Q plot indicates a good but not perfect model fit. The residuals show a considerable structure that is in line with the structure in the auto-covariance / correlation plots.

sessionInfo()
#> R version 4.5.1 Patched (2025-08-23 r88802)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] refund_0.1-37               imcdatasets_1.17.0         
#>  [3] cytomapper_1.21.0           EBImage_4.51.0             
#>  [5] SpatialExperiment_1.19.1    SingleCellExperiment_1.31.1
#>  [7] SummarizedExperiment_1.39.2 Biobase_2.69.1             
#>  [9] GenomicRanges_1.61.5        Seqinfo_0.99.2             
#> [11] IRanges_2.43.5              S4Vectors_0.47.4           
#> [13] BiocGenerics_0.55.3         generics_0.1.4             
#> [15] MatrixGenerics_1.21.0       matrixStats_1.5.0          
#> [17] patchwork_1.3.2             stringr_1.5.2              
#> [19] tidyr_1.3.1                 ggplot2_4.0.0              
#> [21] dplyr_1.1.4                 spatialFDA_1.1.21          
#> [23] BiocStyle_2.37.1           
#> 
#> loaded via a namespace (and not attached):
#>   [1] gamm4_0.2-7            splines_4.5.1          later_1.4.4           
#>   [4] bitops_1.0-9           filelock_1.0.3         tibble_3.3.0          
#>   [7] svgPanZoom_0.3.4       polyclip_1.10-7        deSolve_1.40          
#>  [10] lifecycle_1.0.4        httr2_1.2.1            Rdpack_2.6.4          
#>  [13] lattice_0.22-7         MASS_7.3-65            magrittr_2.0.4        
#>  [16] sass_0.4.10            rmarkdown_2.30         jquerylib_0.1.4       
#>  [19] yaml_2.3.10            httpuv_1.6.16          sp_2.2-0              
#>  [22] spatstat.sparse_3.1-0  minqa_1.2.8            DBI_1.2.3             
#>  [25] RColorBrewer_1.1-3     abind_1.4-8            purrr_1.1.0           
#>  [28] RCurl_1.98-1.17        pracma_2.4.4           rappdirs_0.3.3        
#>  [31] spatstat.utils_3.2-0   terra_1.8-70           goftest_1.2-3         
#>  [34] spatstat.random_3.4-2  svglite_2.2.1          codetools_0.2-20      
#>  [37] DelayedArray_0.35.3    tidyselect_1.2.1       raster_3.6-32         
#>  [40] farver_2.1.2           lme4_1.1-37            viridis_0.6.5         
#>  [43] BiocFileCache_2.99.6   spatstat.explore_3.5-3 jsonlite_2.0.0        
#>  [46] ks_1.15.1              systemfonts_1.3.1      tools_4.5.1           
#>  [49] Rcpp_1.1.0             glue_1.8.0             gridExtra_2.3         
#>  [52] SparseArray_1.9.1      mgcv_1.9-3             xfun_0.53             
#>  [55] HDF5Array_1.37.0       shinydashboard_0.7.3   withr_3.0.2           
#>  [58] BiocManager_1.30.26    fastmap_1.2.0          boot_1.3-32           
#>  [61] rhdf5filters_1.21.0    digest_0.6.37          R6_2.6.1              
#>  [64] mime_0.13              textshaping_1.0.4      colorspace_2.1-2      
#>  [67] tensor_1.5.1           jpeg_0.1-11            dichromat_2.0-0.1     
#>  [70] spatstat.data_3.1-8    RSQLite_2.4.3          h5mread_1.1.1         
#>  [73] pbs_1.1                httr_1.4.7             htmlwidgets_1.6.4     
#>  [76] S4Arrays_1.9.1         pkgconfig_2.0.3        gtable_0.3.6          
#>  [79] blob_1.2.4             S7_0.2.0               XVector_0.49.1        
#>  [82] pcaPP_2.0-5            htmltools_0.5.8.1      bookdown_0.45         
#>  [85] fftwtools_0.9-11       scales_1.4.0           png_0.1-8             
#>  [88] reformulas_0.4.1       spatstat.univar_3.1-4  knitr_1.50            
#>  [91] rjson_0.2.23           magic_1.6-1            nloptr_2.2.1          
#>  [94] nlme_3.1-168           curl_7.0.0             cachem_1.1.0          
#>  [97] rhdf5_2.53.6           RLRsim_3.1-8           BiocVersion_3.22.0    
#> [100] KernSmooth_2.23-26     parallel_4.5.1         fda_6.3.0             
#> [103] vipor_0.4.7            AnnotationDbi_1.71.1   pillar_1.11.1         
#> [106] grid_4.5.1             vctrs_0.6.5            promises_1.3.3        
#> [109] dbplyr_2.5.1           xtable_1.8-4           cluster_2.1.8.1       
#> [112] beeswarm_0.4.0         evaluate_1.0.5         tinytex_0.57          
#> [115] magick_2.9.0           mvtnorm_1.3-3          cli_3.6.5             
#> [118] locfit_1.5-9.12        compiler_4.5.1         rlang_1.1.6           
#> [121] crayon_1.5.3           grpreg_3.5.0           labeling_0.4.3        
#> [124] mclust_6.1.1           fds_1.8                ggbeeswarm_0.7.2      
#> [127] stringi_1.8.7          rainbow_3.8            viridisLite_0.4.2     
#> [130] deldir_2.0-4           BiocParallel_1.43.4    nnls_1.6              
#> [133] hdrcde_3.4             Biostrings_2.77.2      tiff_0.1-12           
#> [136] spatstat.geom_3.6-0    Matrix_1.7-4           ExperimentHub_2.99.5  
#> [139] bit64_4.6.0-1          Rhdf5lib_1.31.0        KEGGREST_1.49.2       
#> [142] shiny_1.11.1           AnnotationHub_3.99.6   rbibutils_2.3         
#> [145] memoise_2.0.1          bslib_0.9.0            bit_4.6.0

Baddeley, Adrian, Ege Rubak, and Rolf Turner. 2015. Spatial Point Patterns: Methodology and Applications with R. CRC press.

Baddeley, Adrian, and Rolf Turner. 2005. “spatstat: An R Package for Analyzing Spatial Point Patterns.” Journal of Statistical Software 12 (6): 1–42.

Damond, Nicolas, Stefanie Engler, Vito R. T. Zanotelli, Denis Schapiro, Clive H. Wasserfall, Irina Kusmartseva, Harry S. Nick, et al. 2019. “A Map of Human Type 1 Diabetes Progression by Imaging Mass Cytometry.” Cell Metabolism 29 (3): 755–768.e5.

Goldsmith, Jeff, Fabian Scheipl, Lei Huang, Julia Wrobel, Chongzhi Di, Jonathan Gellar, Jaroslaw Harezlak, et al. 2024. Refund: Regression with Functional Data.

Liebl, Dominik, and Matthew Reimherr. 2023. “Fast and Fair Simultaneous Confidence Bands for Functional Parameters.” Journal of the Royal Statistical Society Series B: Statistical Methodology 85 (3): 842–68.

Ramsay, James. 2024. Fda: Functional Data Analysis. https://CRAN.R-project.org/package=fda.

Righelli, Dario, Lukas M Weber, Helena L Crowell, Brenda Pardo, Leonardo Collado-Torres, Shila Ghazanfar, Aaron TL Lun, Stephanie C Hicks, and Davide Risso. 2022. “SpatialExperiment: Infrastructure for Spatially-Resolved Transcriptomics Data in R Using Bioconductor.” Bioinformatics 38 (11): 3128–31.

Scheipl, Fabian, Jan Gertheiss, and Sonja Greven. 2016. “Generalized Functional Additive Mixed Models.” Electronic Journal of Statistics 10 (1): 1455–92.

Scheipl, Fabian, Ana-Maria Staicu, and Sonja Greven. 2015. “Functional Additive Mixed Models.” Journal of Computational and Graphical Statistics 24 (2): 477–501.

Sun, Ying, and Marc G Genton. 2011. “Functional Boxplots.” Journal of Computational and Graphical Statistics.