AggregateCellMetadata - take cell-level metadata and collapse it down to sample-level
metadata for use in pseudobulk testing. This can be useful if you do not already have
metadata for each sample in the experiment, but these data are stored in single cell
metadata. Examples include the donor ID, a unique sample ID
(e.g. donorid_timepoint), age, and sex. Note these are all sample-level variables.
Single cell intrinsic variables like nUMI cannot be collapsed down; only variables that
take a single value within each sample (the samples are the columns of the pseudobulk data) can. Only include
metadata that you intend to adjust models for as covariates or random effects because all
variables will be referenced during count normalization and feature filtering.
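
The collapsing step described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the data frame, its column names, and the helper functions is_sample_level and collapse_to_samples are all hypothetical. It keeps only variables that take a single value within every sample, so a per-cell variable such as nUMI is dropped.

```r
# Toy cell-level metadata: one row per cell (all names and values are illustrative).
cell_md <- data.frame(
  sample = c("d1_t0", "d1_t0", "d1_t1", "d2_t0", "d2_t0"),
  donor  = c("d1", "d1", "d1", "d2", "d2"),
  age    = c(34, 34, 34, 51, 51),
  nUMI   = c(1200, 950, 1800, 700, 1500),
  stringsAsFactors = FALSE
)

# A variable is sample-level if it takes exactly one value within every sample.
is_sample_level <- function(x, sample) {
  all(tapply(x, sample, function(v) length(unique(v))) == 1)
}

# Hypothetical helper: keep sample-level columns, then reduce to one row per sample.
collapse_to_samples <- function(md, sample_column) {
  keep <- vapply(md, is_sample_level, logical(1), sample = md[[sample_column]])
  out <- unique(md[, keep, drop = FALSE])
  rownames(out) <- out[[sample_column]]
  out
}

sample_md <- collapse_to_samples(cell_md, "sample")
print(sample_md)
```

The rows of the result correspond to the samples, i.e. the columns of the pseudobulk data.
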
EnrichmentJaccard - using a gsea list and a LeadingEdgeIndexed result, compute the pairwise Jaccard index of leading edge genes within cell types. Saves a heatmap of modules for each cell type in savpath if saveplot = TRUE. Returns a gsea result dataframe with all cell types combined and each module annotated with its average within-cell-type Jaccard index and leading edge genes.
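
For readers unfamiliar with the Jaccard index, the following base R sketch shows the pairwise computation on leading edge gene sets for a single cell type. The gene sets and the function names jaccard and jaccard_matrix are hypothetical, and the averaging at the end (mean similarity of each module to the other modules) is an assumption about what an "average within cell type" index could mean, not a statement of how the package computes it.

```r
# Hypothetical leading edge genes for three enriched modules in one cell type.
leading_edge <- list(
  moduleA = c("IFI27", "ISG15", "MX1", "OAS1"),
  moduleB = c("ISG15", "MX1", "STAT1"),
  moduleC = c("CD3E", "CD3D")
)

# Jaccard index = size of the intersection / size of the union.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Pairwise Jaccard matrix over a named list of gene sets.
jaccard_matrix <- function(gene_sets) {
  n <- length(gene_sets)
  m <- matrix(0, n, n, dimnames = list(names(gene_sets), names(gene_sets)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      m[i, j] <- jaccard(gene_sets[[i]], gene_sets[[j]])
    }
  }
  m
}

jmat <- jaccard_matrix(leading_edge)

# Assumed summary: average similarity of each module to the other modules,
# excluding the diagonal (a module's similarity to itself is always 1).
avg_jaccard <- (rowSums(jmat) - diag(jmat)) / (nrow(jmat) - 1)
```

A matrix like jmat is the kind of object a heatmap of modules would display.
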
ExtractResult - convenience function to return statistics for downstream analysis functions such as FgseaList. Accepts a list of dream or lmFit results, either produced by those functions natively (use model.fit.list = list(fit)) or by scglmmr::RunVoomLimma and scglmmr::dreamMixedModel.
FgseaList - wrapper around fast gene set enrichment analysis with the fgsea R package https://bioconductor.org/packages/release/bioc/html/fgsea.html, applied to a list of ranks indexed by cell type.
FitDream - run mixed effects model on aggregated (summed) data using the method 'dream'
by Hoffman et al., Bioinformatics (2021), doi.org/10.1093/bioinformatics/btaa687. Fits
mixed model using lme4 with REML and voom weights.
FitLmer - a simpler version of FitLmerContrast that makes no assumptions about the structure of the underlying
data except that it can accommodate a mixed effects model formula and that there are multiple cell types to be
fitted separately. Fits a single cell mixed effects linear model. Designed to fit an aggregated gene module score.
Fits a model to each cell cluster to test a grouping, treatment or combination factor; returns the model fits
for maximum flexibility.
FitLmerContrast - for data with pre- and post-treatment samples and 2 response groups; within each cell type, contrast the
difference in fold change between groups, the baseline difference, and the fold change across groups of module scores.
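
The three contrasts named above can be written as weights on the four group-by-timepoint means. The base R sketch below only illustrates that arithmetic; the means are made up, the labels g0/g1 and pre/post are hypothetical, scores are assumed to be on a log scale (so a difference corresponds to a log fold change), and in practice such contrasts would be estimated from a fitted mixed model rather than from raw means.

```r
# Hypothetical mean module scores for each group x timepoint combination.
group_means <- c(g0_pre = 1.0, g0_post = 1.5, g1_pre = 1.2, g1_post = 2.7)

# Each row is one contrast; columns follow the order of group_means.
contrasts <- rbind(
  fold_change_difference = c( 1.0, -1.0, -1.0, 1.0),  # (g1_post - g1_pre) - (g0_post - g0_pre)
  baseline_difference    = c(-1.0,  0.0,  1.0, 0.0),  # g1_pre - g0_pre
  fold_change_across     = c(-0.5,  0.5, -0.5, 0.5)   # mean of (post - pre) over both groups
)
colnames(contrasts) <- names(group_means)

# Contrast estimates as a named vector.
estimates <- drop(contrasts %*% group_means)
print(estimates)
```
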
RunFgseaOnRankList - wrapper around fast gene set enrichment analysis with the fgsea R package https://bioconductor.org/packages/release/bioc/html/fgsea.html
TidySampleData - convert data from a PseudobulkList into a dataframe for each sample across cell types of the top differentially expressed genes for a contrast
calc_avg_module_zscore - calculate the average module z score for a list of modules on a PseudobulkList.
This is equivalent to the average z score method used in Kotliarov et al. Nature Med 2020.
The z score is calculated across both genes and samples.
It is adapted below to run on 'pseudobulk lists' (an average "averagemetacell.list" or a pseudobulk list
created by PseudobulkList). This small wrapper is called by AverageSampleModuleZscore.
Calculates a signature score for each cell type, BTM, and subject.
Function input = named list of modules, dataframe with subject as rows and genes as columns.
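
As a rough illustration of an average module z score, the base R sketch below takes the same kind of input (a named list of modules and a dataframe with subjects as rows and genes as columns). It is a simplified sketch under assumptions, not the package's code: the function name avg_module_zscore and the data are hypothetical, and it standardizes each gene across subjects before averaging over the genes in each module, which may differ in detail from the package's calculation.

```r
# Toy expression data: subjects as rows, genes as columns (values are made up).
expr <- data.frame(
  geneA = c(1, 2, 3),
  geneB = c(2, 4, 6),
  geneC = c(5, 5, 8),
  row.names = c("s1", "s2", "s3")
)

# Named list of modules; geneX is deliberately absent from the data.
modules <- list(mod1 = c("geneA", "geneB"), mod2 = c("geneC", "geneX"))

# Hypothetical helper: one score per subject (row) and module (column).
avg_module_zscore <- function(expr, modules) {
  sapply(modules, function(genes) {
    genes <- intersect(genes, colnames(expr))  # drop genes that were not measured
    z <- scale(expr[, genes, drop = FALSE])    # z score each gene across subjects
    rowMeans(z, na.rm = TRUE)                  # average over the module's genes
  })
}

scores <- avg_module_zscore(expr, modules)
print(scores)
```

Running this once per cell type on a pseudobulk list would give one subject-by-module score matrix for each cell type.
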