Consistency comparison of `MCube`, `CELINA`, and `C-SIDE` on the CRC datasets from different technologies

[1]:

set.seed(20250502)

library(Matrix)
library(spacexr)
library(MCube)
library(CELINA)
library(ggplot2)

[2]:

RAW_DATA_PATH <- "/import/home/share/zw/data/CRC"
DATA_PATH <- "/import/home/share/zw/pql/data/CRC"
RESULT_PATH <- "/import/home/share/zw/pql/results/CRC"

if (!dir.exists(file.path(RESULT_PATH, "Visium"))) {
    dir.create(file.path(RESULT_PATH, "Visium"), recursive = TRUE)
}
if (!dir.exists(file.path(RESULT_PATH, "VisiumHD"))) {
    dir.create(file.path(RESULT_PATH, "VisiumHD"), recursive = TRUE)
}
if (!dir.exists(file.path(RESULT_PATH, "comparison"))) {
    dir.create(file.path(RESULT_PATH, "comparison"), recursive = TRUE)
}

Load in scRNA-seq reference data

[23]:

library(Seurat)

FlexRef <- Read10X_h5(file.path(
    RAW_DATA_PATH, "sc", "HumanColonCancer_Flex_Multiplex_count_filtered_feature_bc_matrix.h5"
))
# MetaData <- readRDS(file.path(
#     RAW_DATA_PATH, "sc", "FlexSeuratV5_MetaData.rds"
# )) # See FlexSingleCell.R if not generated.

meta <- read.csv(file.path(
    RAW_DATA_PATH, "HumanColonCancer_VisiumHD/MetaData/SingleCell_MetaData.csv.gz"
))

KpIdents <- names(which(table(meta$Level2) > 25))
meta <- meta[meta$Level2 %in% KpIdents, ]
FlexRef <- FlexRef[, meta$Barcode]

CTRef <- meta$Level2
CTRef <- gsub("/", "_", CTRef)
CTRef <- as.factor(CTRef)
names(CTRef) <- meta$Barcode

reference <- Reference(FlexRef, CTRef, colSums(FlexRef))

Loading required package: SeuratObject

Loading required package: sp


Attaching package: ‘SeuratObject’


The following objects are masked from ‘package:base’:

    intersect, t


Warning message in check_UMI(nUMI, "Reference", require_2d = T, require_int = require_int, :
“Reference: some nUMI values are less than min_UMI = 100, and these cells will be removed. Optionally, you may lower the min_UMI parameter.”
Warning message in Reference(FlexRef, CTRef, colSums(FlexRef)):
“Reference: number of cells per cell type is 34844, larger than maximum allowable of 10000. Downsampling number of cells to: 10000”

We then compare the statistical power of MCube and CELINA on ST CRC datasets. We focus on the cancer associated fibroblasts (CAFs) and macrophages to study the tumor microenvironment (TME). In the tumor boundary region, CAFs are the most prominent cell type, while macrophages are consistently identified as the most abundant immune cell type.

[3]:

celltypes_tme <- c("CAF", "Macrophage")

Since there is no ground truth on cell-type-specific SVGs, we used the identified SVGs in the paired Visium data for verification.

10x Visium

Cell type deconvolution using `RCTD`

[ ]:

# counts <- Read10X_h5(file.path(
#     RAW_DATA_PATH, "visium",
#     "Visium_V2_Human_Colon_Cancer_P2_filtered_feature_bc_matrix.h5"
# ))

# coordinates <- as.data.frame(readr::read_csv(file.path(
#     RAW_DATA_PATH, "visium",
#     "spatial/tissue_positions.csv"
# )))
# rownames(coordinates)<-coordinates$barcode
# coordinates<-coordinates[colnames(counts), ]
# coordinates <- coordinates[, c(6, 5)]

# puck <- SpatialRNA(coordinates, counts, colSums(counts))

# myRCTD <- create.RCTD(puck, reference, max_cores = 32)
# myRCTD <- run.RCTD(myRCTD, doublet_mode = "full")
# saveRDS(myRCTD, file.path(RESULT_PATH, "Visium", "myRCTD.rds"))

[5]:

myRCTD <- readRDS(file.path(RESULT_PATH, "Visium", "myRCTD.rds"))

Cell-type-specific SVG identification using `MCube`

[ ]:

spot_names <- colnames(myRCTD@spatialRNA@counts)
weights_RCTD <- as.matrix(myRCTD@results$weights)
proportions_RCTD <- weights_RCTD / rowSums(weights_RCTD)
spot_effects_RCTD <- log(rowSums(weights_RCTD))
names(spot_effects_RCTD) <- rownames(weights_RCTD)

[ ]:

mcube_object <- createMCube(
    counts = t(as.matrix(myRCTD@originalSpatialRNA@counts[, spot_names])),
    coordinates = as.matrix(myRCTD@spatialRNA@coords),
    proportions = proportions_RCTD,
    library_sizes = myRCTD@spatialRNA@nUMI,
    reference = t(myRCTD@cell_type_info$info[[1]]),
    used_for_deconvolution = rownames(myRCTD@spatialRNA@counts),
    spot_effects = spot_effects_RCTD,
    celltype_test = celltypes_tme,
    celltype_threshold = 50
)
mcube_object <- mcubeFitNull(
    mcube_object,
    num_workers = 36, num_threads = 1
)
mcube_object <- mcubeTest(
    mcube_object,
    num_workers = 36, num_threads = 1, shared_memory = TRUE
)

saveRDS(
    mcube_object@pvalues,
    file = file.path(
        RESULT_PATH, "comparison",
        paste0("mcube_pvalues_visium", ".rds")
    )
)

Cell-type-specific SVG identification using `C-SIDE`

[23]:

myCSIDE <- run.CSIDE.nonparam(
    myRCTD,
    df = 15,
    weight_threshold = 0.8, cell_type_threshold = 20,
    doublet_mode = FALSE
)
saveRDS(
    myCSIDE@de_results$all_gene_list,
    file = file.path(
        RESULT_PATH, "comparison",
        paste0("cside_pvalues_visium", ".rds")
    )
)

choose_cell_types: detected the following cell types occuring at least cell_type_threshold times: c("CAF", "CD4 T cell", "CD8 Cytotoxic T cell", "Endothelial", "Enteric Glial", "Enterocyte", "Fibroblast", "Goblet", "Lymphatic Endothelial", "Macrophage", "Myofibroblast", "Neutrophil", "Pericytes", "Plasma", "Proliferating Immune II", "Smooth Muscle", "Tumor I", "Tumor III", "Tumor V", "vSM")


Warning: run.CSIDE.general: removing the following cell types due to insufficient counts per region. Consider lowering cell_type_threshold or proceeding with removed cell types. Cell types: Enteric Glial, Enterocyte, Fibroblast, Goblet, Lymphatic Endothelial, Myofibroblast, Neutrophil, Plasma, Proliferating Immune II, Smooth Muscle, Tumor I, Tumor V, vSM,

Warning message in choose_cell_types(myRCTD, barcodes, doublet_mode, cell_type_threshold, :
“choose_cell_types: removing cell types: CD8 Cytotoxic T cell because these cell types did not contain sufficient pixels passing total cell type weight of weight_threshold = 0.8. Consider removing this cell type or lowering weight_threshold”
run.CSIDE.general: running CSIDE with cell types CAF, CD4 T cell, Endothelial, Macrophage, Pericytes, Tumor III

run.CSIDE.general: configure params_to_test = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,

filter_genes: filtering genes based on threshold = 5e-05

set_global_Q_all: begin

set_global_Q_all: finished

Cell-type-specific SVG identification using `CELINA`

[ ]:

RhpcBLASctl::blas_set_num_threads(1)

celina_object <- Create_Celina_Object(
    celltype_mat = t(proportions_RCTD),
    gene_expression_mat = myRCTD@originalSpatialRNA@counts[, spot_names],
    location = as.matrix(myRCTD@spatialRNA@coords),
    covariates = NULL
)
celina_object <- preprocess_input(
    celina_object, # Celina object
    cell_types_to_test = celltypes_tme, # a vector of cell types to be used for testing
    scRNA_count = FlexRef, # a gene x cell expression matrix of reference scRNA-seq data
    sc_cell_type_labels = CTRef, # a vector of cell type labels for each cell in scRNA_count
    threshold = 5e-5
)
celina_object <- Calculate_Kernel(
    celina_object,
    approximation = FALSE
)
celina_object <- Testing_interaction_all(
    celina_object,
    celltype_to_test = celltypes_tme,
    method = "MM",
    num_cores = 36
)

pvalues <- celina_object@result[
    sapply(celina_object@result, function(x) ncol(x) > 1, simplify = TRUE)
]
rm(celina_object)

saveRDS(
    pvalues,
    file = file.path(
        RESULT_PATH, "comparison",
        paste0("celina_pvalues_visium", ".rds")
    )
)

Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 37.7 GiB”

10x Visium HD

Cell type deconvolution using `RCTD`

[ ]:

# library(Seurat)
# library(arrow)

# counts <- Read10X_h5(file.path(
#     RAW_DATA_PATH, "visium_hd",
#     "binned_outputs/square_016um/filtered_feature_bc_matrix.h5"
# ))

# coords <- read_parquet(
#     file.path(
#         RAW_DATA_PATH, "visium_hd",
#         "binned_outputs/square_016um/spatial/tissue_positions.parquet"
#     )
# )
# coords <- as.data.frame(coords)
# rownames(coords) <- coords$barcode
# coords <- coords[colnames(counts), ]
# coords <- coords[, c(6, 5)]

# nUMI <- colSums(counts)

# puck <- SpatialRNA(coords, counts, nUMI)
# barcodes <- colnames(puck@counts)

# myRCTDhd <- create.RCTD(puck, reference, max_cores = 32)
# myRCTD <- run.RCTD(myRCTD, doublet_mode = "doublet")

# saveRDS(
#     myRCTD,
#     file = file.path(
#         RESULT_PATH, "VisiumHD", "myRCTD_16um.rds"
#     )
# )

[27]:

myRCTD <- readRDS(file.path(RESULT_PATH, "VisiumHD", "myRCTD_16um.rds"))

Cell-type-specific SVG identification using `MCube` with different sample sizes

[28]:

weights_RCTD <- as.matrix(myRCTD@results$weights)
proportions_RCTD <- weights_RCTD / rowSums(weights_RCTD)
spot_effects_RCTD <- log(rowSums(weights_RCTD))
names(spot_effects_RCTD) <- rownames(weights_RCTD)
doublet_results_RCTD <- myRCTD@results$results_df

We compare the performances of MCube with different sample sizes on Visium HD data.

[29]:

sample_size_seq <- c(3000L, 4000L, 5000L, 6000L, 7000L)

[ ]:

for (celltype in celltypes_tme) {
    spots_used <- rownames(doublet_results_RCTD)[
        ((doublet_results_RCTD$spot_class == "singlet" |
            doublet_results_RCTD$spot_class == "doublet_uncertain") &
            doublet_results_RCTD$first_type == celltype
        ) |
            (doublet_results_RCTD$spot_class == "doublet_certain" &
                (doublet_results_RCTD$first_type == celltype |
                    doublet_results_RCTD$second_type == celltype))
    ]
    length(spots_used)

    for (sample_size in sample_size_seq) {
        spots_sample <- sample(spots_used, size = sample_size, replace = FALSE)

        # MCube
        mcube_object <- createMCube(
            counts = t(as.matrix(myRCTD@originalSpatialRNA@counts[, spots_sample])),
            coordinates = as.matrix(myRCTD@spatialRNA@coords[spots_sample, ]),
            proportions = proportions_RCTD[spots_sample, ],
            library_sizes = myRCTD@spatialRNA@nUMI[spots_sample],
            reference = t(myRCTD@cell_type_info$info[[1]]),
            used_for_deconvolution = rownames(myRCTD@spatialRNA@counts),
            spot_effects = spot_effects_RCTD[spots_sample],
            celltype_test = celltype,
            proportion_threshold = 0.01
        )
        mcube_object <- mcubeFitNull(
            mcube_object,
            num_workers = 36, num_threads = 1
        )
        mcube_object <- mcubeTest(
            mcube_object,
            num_workers = 36, num_threads = 1, shared_memory = TRUE
        )
        saveRDS(
            mcube_object@pvalues,
            file = file.path(
                RESULT_PATH, "comparison",
                paste0("mcube_pvalues_visiumhd_", celltype, "_", sample_size, ".rds")
            )
        )
        rm(mcube_object)

        # CELINA
        celina_object <- Create_Celina_Object(
            celltype_mat = t(proportions_RCTD[spots_sample, ]),
            gene_expression_mat = myRCTD@originalSpatialRNA@counts[, spots_sample],
            location = as.matrix(myRCTD@spatialRNA@coords[spots_sample, ]),
            covariates = NULL
        )
        celina_object <- preprocess_input(
            celina_object, # Celina object
            cell_types_to_test = celltypes_tme, # a vector of cell types to be used for testing
            scRNA_count = FlexRef, # a gene x cell expression matrix of reference scRNA-seq data
            sc_cell_type_labels = CTRef, # a vector of cell type labels for each cell in scRNA_count
            threshold = 5e-5
        )
        celina_object <- Calculate_Kernel(
            celina_object,
            approximation = FALSE
        )
        celina_object <- Testing_interaction_all(
            celina_object,
            celltype_to_test = celltypes_tme,
            method = "MM",
            num_cores = 36
        )
        pvalues <- celina_object@result[
            sapply(celina_object@result, function(x) ncol(x) > 1, simplify = TRUE)
        ]
        rm(celina_object)
        saveRDS(
            pvalues,
            file = file.path(
                RESULT_PATH, "comparison",
                paste0("celina_pvalues_visiumhd_", celltype, "_", sample_size, ".rds")
            )
        )

        # C-SIDE
        myCSIDE <- run.CSIDE.nonparam(
            myRCTD,
            df = 15, barcodes = spots_sample,
            weight_threshold = 0.8, doublet_mode = TRUE
        )
        saveRDS(
            myCSIDE@de_results$all_gene_list,
            file = file.path(
                RESULT_PATH, "comparison",
                paste0("cside_pvalues_visiumhd_", celltype, "_", sample_size, ".rds")
            )
        )
        rm(myCSIDE)
    }
}

choose_cell_types: detected the following cell types occuring at least cell_type_threshold times: c("CAF", "CD4 T cell", "Endothelial", "Macrophage", "Tumor III")


Warning: run.CSIDE.general: removing the following cell types due to insufficient counts per region. Consider lowering cell_type_threshold or proceeding with removed cell types. Cell types: CD4 T cell,

Warning message in choose_cell_types(myRCTD, barcodes, doublet_mode, cell_type_threshold, :
“choose_cell_types: removing cell types: Endothelial because these cell types did not contain sufficient pixels passing total cell type weight of weight_threshold = 0.8. Consider removing this cell type or lowering weight_threshold”
run.CSIDE.general: running CSIDE with cell types CAF, Macrophage, Tumor III

run.CSIDE.general: configure params_to_test = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,

filter_genes: filtering genes based on threshold = 5e-05

set_global_Q_all: begin

set_global_Q_all: finished

choose_cell_types: detected the following cell types occuring at least cell_type_threshold times: c("CAF", "CD4 T cell", "Endothelial", "Macrophage", "Plasma", "Proliferating Immune II", "Tumor III", "vSM")


Warning: run.CSIDE.general: removing the following cell types due to insufficient counts per region. Consider lowering cell_type_threshold or proceeding with removed cell types. Cell types: Plasma, Proliferating Immune II, vSM,

Warning message in choose_cell_types(myRCTD, barcodes, doublet_mode, cell_type_threshold, :
“choose_cell_types: removing cell types: CD4 T cell because these cell types did not contain sufficient pixels passing total cell type weight of weight_threshold = 0.8. Consider removing this cell type or lowering weight_thresholdchoose_cell_types: removing cell types: Endothelial because these cell types did not contain sufficient pixels passing total cell type weight of weight_threshold = 0.8. Consider removing this cell type or lowering weight_threshold”
run.CSIDE.general: running CSIDE with cell types CAF, Macrophage, Tumor III

run.CSIDE.general: configure params_to_test = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,

filter_genes: filtering genes based on threshold = 5e-05

set_global_Q_all: begin

set_global_Q_all: finished

choose_cell_types: detected the following cell types occuring at least cell_type_threshold times: c("CAF", "CD4 T cell", "Endothelial", "Macrophage", "Plasma", "Proliferating Immune II", "Tumor III", "vSM")


Warning: run.CSIDE.general: removing the following cell types due to insufficient counts per region. Consider lowering cell_type_threshold or proceeding with removed cell types. Cell types: CD4 T cell, Plasma, Proliferating Immune II, vSM,

run.CSIDE.general: running CSIDE with cell types CAF, Endothelial, Macrophage, Tumor III

run.CSIDE.general: configure params_to_test = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,

filter_genes: filtering genes based on threshold = 5e-05

set_global_Q_all: begin

set_global_Q_all: finished

choose_cell_types: detected the following cell types occuring at least cell_type_threshold times: c("CAF", "CD4 T cell", "CD8 Cytotoxic T cell", "Endothelial", "Macrophage", "Plasma", "Proliferating Immune II", "Tumor III", "vSM")


Warning: run.CSIDE.general: removing the following cell types due to insufficient counts per region. Consider lowering cell_type_threshold or proceeding with removed cell types. Cell types: CD8 Cytotoxic T cell, Plasma, Proliferating Immune II, vSM,

Warning message in choose_cell_types(myRCTD, barcodes, doublet_mode, cell_type_threshold, :
“choose_cell_types: removing cell types: CD4 T cell because these cell types did not contain sufficient pixels passing total cell type weight of weight_threshold = 0.8. Consider removing this cell type or lowering weight_threshold”
run.CSIDE.general: running CSIDE with cell types CAF, Endothelial, Macrophage, Tumor III

run.CSIDE.general: configure params_to_test = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,

filter_genes: filtering genes based on threshold = 5e-05

set_global_Q_all: begin

set_global_Q_all: finished

choose_cell_types: detected the following cell types occuring at least cell_type_threshold times: c("CAF", "CD4 T cell", "CD8 Cytotoxic T cell", "Endothelial", "Macrophage", "Plasma", "Proliferating Immune II", "Tumor III", "vSM")


Warning: run.CSIDE.general: removing the following cell types due to insufficient counts per region. Consider lowering cell_type_threshold or proceeding with removed cell types. Cell types: CD8 Cytotoxic T cell, Plasma, Proliferating Immune II, vSM,

Warning message in choose_cell_types(myRCTD, barcodes, doublet_mode, cell_type_threshold, :
“choose_cell_types: removing cell types: CD4 T cell because these cell types did not contain sufficient pixels passing total cell type weight of weight_threshold = 0.8. Consider removing this cell type or lowering weight_threshold”
run.CSIDE.general: running CSIDE with cell types CAF, Endothelial, Macrophage, Tumor III

run.CSIDE.general: configure params_to_test = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,

filter_genes: filtering genes based on threshold = 5e-05

set_global_Q_all: begin

set_global_Q_all: finished

choose_cell_types: detected the following cell types occuring at least cell_type_threshold times: c("CAF", "Endothelial", "Macrophage", "Plasma", "Tumor III", "vSM")


Warning: run.CSIDE.general: removing the following cell types due to insufficient counts per region. Consider lowering cell_type_threshold or proceeding with removed cell types. Cell types: Plasma, vSM,

Warning message in choose_cell_types(myRCTD, barcodes, doublet_mode, cell_type_threshold, :
“choose_cell_types: removing cell types: Endothelial because these cell types did not contain sufficient pixels passing total cell type weight of weight_threshold = 0.8. Consider removing this cell type or lowering weight_threshold”
run.CSIDE.general: running CSIDE with cell types CAF, Macrophage, Tumor III

run.CSIDE.general: configure params_to_test = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,

filter_genes: filtering genes based on threshold = 5e-05

set_global_Q_all: begin

set_global_Q_all: finished

choose_cell_types: detected the following cell types occuring at least cell_type_threshold times: c("CAF", "Endothelial", "Macrophage", "Pericytes", "Plasma", "Tumor III", "vSM")


Warning: run.CSIDE.general: removing the following cell types due to insufficient counts per region. Consider lowering cell_type_threshold or proceeding with removed cell types. Cell types: Pericytes, Plasma, vSM,

run.CSIDE.general: running CSIDE with cell types CAF, Endothelial, Macrophage, Tumor III

run.CSIDE.general: configure params_to_test = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,

filter_genes: filtering genes based on threshold = 5e-05

set_global_Q_all: begin

set_global_Q_all: finished

choose_cell_types: detected the following cell types occuring at least cell_type_threshold times: c("CAF", "Endothelial", "Macrophage", "Pericytes", "Plasma", "Tumor III", "vSM")


Warning: run.CSIDE.general: removing the following cell types due to insufficient counts per region. Consider lowering cell_type_threshold or proceeding with removed cell types. Cell types: Pericytes, Plasma, vSM,

run.CSIDE.general: running CSIDE with cell types CAF, Endothelial, Macrophage, Tumor III

run.CSIDE.general: configure params_to_test = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,

filter_genes: filtering genes based on threshold = 5e-05

set_global_Q_all: begin

set_global_Q_all: finished

choose_cell_types: detected the following cell types occuring at least cell_type_threshold times: c("CAF", "CD4 T cell", "Endothelial", "Fibroblast", "Macrophage", "Pericytes", "Plasma", "Proliferating Immune II", "Tumor III", "vSM")


Warning: run.CSIDE.general: removing the following cell types due to insufficient counts per region. Consider lowering cell_type_threshold or proceeding with removed cell types. Cell types: CD4 T cell, Fibroblast, Pericytes, Plasma, Proliferating Immune II, vSM,

run.CSIDE.general: running CSIDE with cell types CAF, Endothelial, Macrophage, Tumor III

run.CSIDE.general: configure params_to_test = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,

filter_genes: filtering genes based on threshold = 5e-05

set_global_Q_all: begin

set_global_Q_all: finished

Visualization of the comparison results

[3]:

celltypes_tme <- c("CAF", "Macrophage")

[4]:

mcube_pvalues_visium <- readRDS(
    file = file.path(
        RESULT_PATH, "comparison",
        paste0("mcube_pvalues_visium", ".rds")
    )
)[celltypes_tme]
sapply(mcube_pvalues_visium, nrow)

celina_pvalues_visium <- readRDS(
    file = file.path(
        RESULT_PATH, "comparison",
        paste0("celina_pvalues_visium", ".rds")
    )
)[celltypes_tme]
sapply(celina_pvalues_visium, nrow)

cside_pvalues_visium <- readRDS(
    file = file.path(
        RESULT_PATH, "comparison",
        paste0("cside_pvalues_visium", ".rds")
    )
)[celltypes_tme]
sapply(cside_pvalues_visium, nrow)

CAF: 2185
Macrophage: 2683

CAF: 5009
Macrophage: 4770

CAF: 1901
Macrophage: 1908

[5]:

# sample_size_seq <- c(3000L, 4000L, 5000L, 6000, 7000L)
# for (celltype in celltypes_tme) {
#     mcube_pvalues_df <- data.frame(
#         gene = rownames(mcube_pvalues_visium[[celltype]]),
#         Visium = mcube_pvalues_visium[[celltype]]$combined_pvalue
#     )
#     for (sample_size in sample_size_seq) {
#         mcube_pvalues_visiumhd <- readRDS(
#             file = file.path(
#                 RESULT_PATH, "comparison",
#                 paste0("mcube_pvalues_visiumhd_", celltype, "_", sample_size, ".rds")
#             )
#         )[[celltype]]
#         mcube_pvalues_df <- merge(
#             mcube_pvalues_df,
#             data.frame(
#                 gene = rownames(mcube_pvalues_visiumhd),
#                 pvalue = mcube_pvalues_visiumhd$combined_pvalue
#             ),
#             by = "gene"
#         )
#         colnames(mcube_pvalues_df)[ncol(mcube_pvalues_df)] <- paste0("Visium HD ", sample_size)
#     }

#     celina_pvalues_df <- data.frame(
#         gene = rownames(celina_pvalues_visium[[celltype]]),
#         Visium = celina_pvalues_visium[[celltype]]$CombinedPvals
#     )
#     for (sample_size in sample_size_seq) {
#         celina_pvalues_visiumhd <- readRDS(
#             file = file.path(
#                 RESULT_PATH, "comparison",
#                 paste0("celina_pvalues_visiumhd_", celltype, "_", sample_size, ".rds")
#             )
#         )[[celltype]]
#         celina_pvalues_df <- merge(
#             celina_pvalues_df,
#             data.frame(
#                 gene = rownames(celina_pvalues_visiumhd),
#                 pvalue = celina_pvalues_visiumhd$CombinedPvals
#             ),
#             by = "gene"
#         )
#         colnames(celina_pvalues_df)[ncol(celina_pvalues_df)] <- paste0("Visium HD ", sample_size)
#     }

#     overlapped_genes <- intersect(mcube_pvalues_df$gene, celina_pvalues_df$gene)
#     length(overlapped_genes)

#     mcube_pvalues_df <- mcube_pvalues_df[
#         mcube_pvalues_df$gene %in% overlapped_genes,
#     ]
#     celina_pvalues_df <- celina_pvalues_df[
#         celina_pvalues_df$gene %in% overlapped_genes,
#     ]

#     mcube_svgs_list <- lapply(
#         mcube_pvalues_df[, -1],
#         FUN = function(x) {
#             x <- p.adjust(x, method = "BH")
#             mcube_pvalues_df$gene[x < 0.05]
#         }
#     )
#     names(mcube_svgs_list) <- c(
#         "Visium", paste0("Visium HD ", sample_size_seq)
#     )
#     mcube_overlapped_svgs <- sapply(
#         mcube_svgs_list,
#         FUN = function(x) {
#             length(intersect(x, mcube_svgs_list[["Visium"]]))
#         },
#         simplify = TRUE
#     )
#     mcube_overlapped_svgs <- data.frame(
#         setting = names(mcube_overlapped_svgs),
#         number = mcube_overlapped_svgs
#     )
#     mcube_overlapped_svgs$proportion <-
#         mcube_overlapped_svgs$number / mcube_overlapped_svgs$number[1] * 100
#     mcube_overlapped_svgs$proportion[1] <- NA
#     mcube_overlapped_svgs$setting <- factor(
#         mcube_overlapped_svgs$setting,
#         levels = c(paste0("Visium HD ", sample_size_seq), "Visium")
#     )

#     celina_svgs_list <- lapply(
#         celina_pvalues_df[, -1],
#         FUN = function(x) {
#             x <- p.adjust(x, method = "BH")
#             celina_pvalues_df$gene[x < 0.05]
#         }
#     )
#     names(celina_svgs_list) <- c(
#         "Visium", paste0("Visium HD ", sample_size_seq)
#     )
#     celina_overlapped_svgs <- sapply(
#         celina_svgs_list,
#         FUN = function(x) {
#             length(intersect(x, celina_svgs_list[["Visium"]]))
#         },
#         simplify = TRUE
#     )
#     celina_overlapped_svgs <- data.frame(
#         setting = names(celina_overlapped_svgs),
#         number = celina_overlapped_svgs
#     )
#     celina_overlapped_svgs$proportion <-
#         celina_overlapped_svgs$number / celina_overlapped_svgs$number[1] * 100
#     celina_overlapped_svgs$proportion[1] <- NA
#     celina_overlapped_svgs$setting <- factor(
#         celina_overlapped_svgs$setting,
#         levels = c(paste0("Visium HD ", sample_size_seq), "Visium")
#     )

#     overlapped_svgs_comparison <- rbind(
#     data.frame(
#         mcube_overlapped_svgs,
#         method = "MMM"
#     ),
#     data.frame(
#         celina_overlapped_svgs,
#         method = "Celina"
#     )
# )
# overlapped_svgs_comparison$method <- factor(
#     overlapped_svgs_comparison$method,
#     levels = c("MMM", "Celina")
# )

# p <- ggplot(data = overlapped_svgs_comparison, aes(x = setting, y = number, group = method)) +
#     geom_bar(aes(fill = method), stat = "identity", position = "dodge") +
#     geom_text(
#         aes(label = ifelse(!is.na(proportion), paste0(round(proportion), "%"), "")),
#         position = position_dodge(width = 0.9), vjust = -0.5, size = 5
#     ) +
#     theme_classic() +
#     labs(
#         title = paste("Number of overlapped SVGs of", celltype),
#         x = NULL, y = NULL
#     ) +
#     theme(
#         plot.title = element_text(size = 20),
#         axis.text.x = element_text(size = 16,angle = 45, vjust = 0.5),
#         # axis.ticks.x = element_blank(),
#         axis.text.y = element_text(size = 16),
#         legend.title = element_blank(),
#         legend.text = element_text(size = 16),
#         legend.position = "bottom"
#     )
#     ggsave(
#         filename = file.path(
#             RESULT_PATH, "comparison",
#             paste0("comparison_overlapped_svgs_", celltype, ".pdf")
#         ),
#         plot = p
#     )
# }

Comparison beween `MCube` and `CELINA`

[6]:

sample_size_seq <- c(3000L, 4000L, 5000L, 6000L, 7000L)
overlapped_svgs_comparison <- data.frame()
for (celltype in celltypes_tme) {
    mcube_pvalues_df <- data.frame(
        gene = rownames(mcube_pvalues_visium[[celltype]]),
        Visium = mcube_pvalues_visium[[celltype]]$combined_pvalue
    )
    for (sample_size in sample_size_seq) {
        mcube_pvalues_visiumhd <- readRDS(
            file = file.path(
                RESULT_PATH, "comparison",
                paste0("mcube_pvalues_visiumhd_", celltype, "_", sample_size, ".rds")
            )
        )[[celltype]]
        mcube_pvalues_df <- merge(
            mcube_pvalues_df,
            data.frame(
                gene = rownames(mcube_pvalues_visiumhd),
                pvalue = mcube_pvalues_visiumhd$combined_pvalue
            ),
            by = "gene"
        )
        colnames(mcube_pvalues_df)[ncol(mcube_pvalues_df)] <- paste0("Visium HD ", sample_size)
    }

    celina_pvalues_df <- data.frame(
        gene = rownames(celina_pvalues_visium[[celltype]]),
        Visium = celina_pvalues_visium[[celltype]]$CombinedPvals
    )
    for (sample_size in sample_size_seq) {
        celina_pvalues_visiumhd <- readRDS(
            file = file.path(
                RESULT_PATH, "comparison",
                paste0("celina_pvalues_visiumhd_", celltype, "_", sample_size, ".rds")
            )
        )[[celltype]]
        celina_pvalues_df <- merge(
            celina_pvalues_df,
            data.frame(
                gene = rownames(celina_pvalues_visiumhd),
                pvalue = celina_pvalues_visiumhd$CombinedPvals
            ),
            by = "gene"
        )
        colnames(celina_pvalues_df)[ncol(celina_pvalues_df)] <- paste0("Visium HD ", sample_size)
    }

    overlapped_genes <- intersect(mcube_pvalues_df$gene, celina_pvalues_df$gene)
    message(celltype, ", overlapped genes: ", length(overlapped_genes))

    mcube_pvalues_df <- mcube_pvalues_df[
        mcube_pvalues_df$gene %in% overlapped_genes,
    ]
    celina_pvalues_df <- celina_pvalues_df[
        celina_pvalues_df$gene %in% overlapped_genes,
    ]

    mcube_svgs_list <- lapply(
        mcube_pvalues_df[, -1],
        FUN = function(x) {
            x <- p.adjust(x, method = "BH")
            mcube_pvalues_df$gene[x < 0.05]
        }
    )
    names(mcube_svgs_list) <- c(
        "Visium", paste0("Visium HD ", sample_size_seq)
    )
    message(celltype, ", MCube, ", length(mcube_svgs_list$Visium))

    mcube_overlapped_svgs <- sapply(
        mcube_svgs_list,
        FUN = function(x) {
            length(intersect(x, mcube_svgs_list[["Visium"]]))
        },
        simplify = TRUE
    )
    mcube_overlapped_svgs <- data.frame(
        sample_size = sample_size_seq,
        proportion = mcube_overlapped_svgs[-1] / mcube_overlapped_svgs[1]
    )

    celina_svgs_list <- lapply(
        celina_pvalues_df[, -1],
        FUN = function(x) {
            x <- p.adjust(x, method = "BH")
            celina_pvalues_df$gene[x < 0.05]
        }
    )
    names(celina_svgs_list) <- c(
        "Visium", paste0("Visium HD ", sample_size_seq)
    )
    message(celltype, ", CELINA, ", length(celina_svgs_list$Visium))

    celina_overlapped_svgs <- sapply(
        celina_svgs_list,
        FUN = function(x) {
            length(intersect(x, celina_svgs_list[["Visium"]]))
        },
        simplify = TRUE
    )
    celina_overlapped_svgs <- data.frame(
        sample_size = sample_size_seq,
        proportion = celina_overlapped_svgs[-1] / celina_overlapped_svgs[1]
    )

    message(
        celltype, ", Visium, ", "SVGs identified by both methods, ",
        length(intersect(
            mcube_svgs_list[["Visium"]], celina_svgs_list[["Visium"]]
        ))
    )

    overlapped_svgs_comparison <- rbind(
        overlapped_svgs_comparison,
        data.frame(
            mcube_overlapped_svgs,
            method = "MMM",
            celltype = celltype
        ),
        data.frame(
            celina_overlapped_svgs,
            method = "Celina",
            celltype = celltype
        )
    )
}
overlapped_svgs_comparison$method <- factor(
    overlapped_svgs_comparison$method,
    levels = c("MMM", "Celina")
)
head(overlapped_svgs_comparison)

CAF, overlapped genes: 1273

CAF, MCube, 797

CAF, CELINA, 498

CAF, Visium, SVGs identified by both methods, 412

Macrophage, overlapped genes: 648

Macrophage, MCube, 315

Macrophage, CELINA, 223

Macrophage, Visium, SVGs identified by both methods, 150

A data.frame: 6 × 4
	sample_size	proportion	method	celltype
	<int>	<dbl>	<fct>	<chr>
Visium HD 3000	3000	0.2710163	MMM	CAF
Visium HD 4000	4000	0.3161857	MMM	CAF
Visium HD 5000	5000	0.3676286	MMM	CAF
Visium HD 6000	6000	0.3776662	MMM	CAF
Visium HD 7000	7000	0.4291092	MMM	CAF
Visium HD 30001	3000	0.1445783	Celina	CAF

[7]:

p <- ggplot(data = overlapped_svgs_comparison, aes(x = sample_size, y = proportion, group = method)) +
    geom_point(aes(color = method, shape = method), size = 6) +
    geom_line(aes(color = method), linewidth = 1.5) +
    scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
    theme_classic() +
    labs(
        title = paste("Identified cell-type-specific SVGs in the Visium HD vs. Visium data"),
        x = "Number of sample bins in the Visium HD data", y = "Overlapped proportion"
    ) +
    facet_wrap(. ~ celltype, scales = "free_y") +
    theme(
        text = element_text(family = "Helvetica"),
        # plot.title = element_text(size = 20, hjust = 0.5),
        plot.title = element_blank(),
        axis.title = element_text(size = 20),
        axis.text = element_text(size = 16),
        strip.text = element_text(size = 16),
        legend.title = element_blank(),
        legend.text = element_text(size = 16),
        legend.position = "bottom",
        panel.spacing = unit(1, "lines")
    )
ggsave(
    filename = file.path(
        RESULT_PATH, "comparison",
        paste0("comparison_MMM_Celina_overlapped_svgs", ".pdf")
    ),
    plot = p, width = 9, height = 5
)
ggsave(
    filename = file.path(
        RESULT_PATH, "comparison",
        paste0("comparison_MMM_Celina_overlapped_svgs", ".png")
    ),
    plot = p, width = 9, height = 5
)
p

../../_images/analysis_CRC_CRC_MCube_CELINA_CSIDE_comparison_34_0.png

Comparison beween `MCube` and `C-SIDE`

[8]:

sample_size_seq <- c(3000L, 4000L, 5000L, 6000L, 7000L)
overlapped_svgs_comparison <- data.frame()
for (celltype in celltypes_tme) {
    mcube_pvalues_df <- data.frame(
        gene = rownames(mcube_pvalues_visium[[celltype]]),
        Visium = mcube_pvalues_visium[[celltype]]$combined_pvalue
    )
    for (sample_size in sample_size_seq) {
        mcube_pvalues_visiumhd <- readRDS(
            file = file.path(
                RESULT_PATH, "comparison",
                paste0("mcube_pvalues_visiumhd_", celltype, "_", sample_size, ".rds")
            )
        )[[celltype]]
        mcube_pvalues_df <- merge(
            mcube_pvalues_df,
            data.frame(
                gene = rownames(mcube_pvalues_visiumhd),
                pvalue = mcube_pvalues_visiumhd$combined_pvalue
            ),
            by = "gene"
        )
        colnames(mcube_pvalues_df)[ncol(mcube_pvalues_df)] <- paste0("Visium HD ", sample_size)
    }

    cside_pvalues_df <- data.frame(
        gene = rownames(cside_pvalues_visium[[celltype]]),
        Visium = cside_pvalues_visium[[celltype]]$p_val
    )
    for (sample_size in sample_size_seq) {
        cside_pvalues_visiumhd <- readRDS(
            file = file.path(
                RESULT_PATH, "comparison",
                paste0("cside_pvalues_visiumhd_", celltype, "_", sample_size, ".rds")
            )
        )[[celltype]]
        cside_pvalues_df <- merge(
            cside_pvalues_df,
            data.frame(
                gene = rownames(cside_pvalues_visiumhd),
                pvalue = cside_pvalues_visiumhd$p_val
            ),
            by = "gene"
        )
        colnames(cside_pvalues_df)[ncol(cside_pvalues_df)] <- paste0("Visium HD ", sample_size)
    }

    overlapped_genes <- intersect(mcube_pvalues_df$gene, cside_pvalues_df$gene)
    message(celltype, ", overlapped genes: ", length(overlapped_genes))

    mcube_pvalues_df <- mcube_pvalues_df[
        mcube_pvalues_df$gene %in% overlapped_genes,
    ]
    cside_pvalues_df <- cside_pvalues_df[
        cside_pvalues_df$gene %in% overlapped_genes,
    ]

    mcube_svgs_list <- lapply(
        mcube_pvalues_df[, -1],
        FUN = function(x) {
            x <- p.adjust(x, method = "BH")
            mcube_pvalues_df$gene[x < 0.05]
        }
    )
    names(mcube_svgs_list) <- c(
        "Visium", paste0("Visium HD ", sample_size_seq)
    )
    message(celltype, ", MCube, ", length(mcube_svgs_list$Visium))

    mcube_overlapped_svgs <- sapply(
        mcube_svgs_list,
        FUN = function(x) {
            length(intersect(x, mcube_svgs_list[["Visium"]]))
        },
        simplify = TRUE
    )
    mcube_overlapped_svgs <- data.frame(
        sample_size = sample_size_seq,
        proportion = mcube_overlapped_svgs[-1] / mcube_overlapped_svgs[1]
    )

    cside_svgs_list <- lapply(
        cside_pvalues_df[, -1],
        FUN = function(x) {
            x <- p.adjust(x, method = "BH")
            cside_pvalues_df$gene[x < 0.05]
        }
    )
    names(cside_svgs_list) <- c(
        "Visium", paste0("Visium HD ", sample_size_seq)
    )
    message(celltype, ", C-SIDE, ", length(cside_svgs_list$Visium))

    cside_overlapped_svgs <- sapply(
        cside_svgs_list,
        FUN = function(x) {
            length(intersect(x, cside_svgs_list[["Visium"]]))
        },
        simplify = TRUE
    )
    cside_overlapped_svgs <- data.frame(
        sample_size = sample_size_seq,
        proportion = cside_overlapped_svgs[-1] / cside_overlapped_svgs[1]
    )

    message(
        celltype, ", Visium, ", "SVGs identified by both methods, ",
        length(intersect(
            mcube_svgs_list[["Visium"]], celina_svgs_list[["Visium"]]
        ))
    )

    overlapped_svgs_comparison <- rbind(
        overlapped_svgs_comparison,
        data.frame(
            mcube_overlapped_svgs,
            method = "MMM",
            celltype = celltype
        ),
        data.frame(
            cside_overlapped_svgs,
            method = "C-SIDE",
            celltype = celltype
        )
    )
}
overlapped_svgs_comparison$method <- factor(
    overlapped_svgs_comparison$method,
    levels = c("MMM", "C-SIDE")
)
head(overlapped_svgs_comparison)

CAF, overlapped genes: 888

CAF, MCube, 601

CAF, C-SIDE, 214

CAF, Visium, SVGs identified by both methods, 40

Macrophage, overlapped genes: 997

Macrophage, MCube, 329

Macrophage, C-SIDE, 305

Macrophage, Visium, SVGs identified by both methods, 129

A data.frame: 6 × 4
	sample_size	proportion	method	celltype
	<int>	<dbl>	<fct>	<chr>
Visium HD 3000	3000	0.3227953	MMM	CAF
Visium HD 4000	4000	0.3643927	MMM	CAF
Visium HD 5000	5000	0.4159734	MMM	CAF
Visium HD 6000	6000	0.4509151	MMM	CAF
Visium HD 7000	7000	0.4941764	MMM	CAF
Visium HD 30001	3000	0.1635514	C-SIDE	CAF

[10]:

p <- ggplot(data = overlapped_svgs_comparison, aes(x = sample_size, y = proportion, group = method)) +
    geom_point(aes(color = method, shape = method), size = 6) +
    geom_line(aes(color = method), linewidth = 1.5) +
    scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
    theme_classic() +
    labs(
        title = paste("Identified cell-type-specific SVGs in the Visium HD vs. Visium data"),
        x = "Number of sample bins in the Visium HD data", y = "Overlapped proportion"
    ) +
    facet_wrap(. ~ celltype, scales = "free") +
    theme(
        text = element_text(family = "Helvetica"),
        plot.title = element_blank(),
        axis.title = element_text(size = 20),
        axis.text = element_text(size = 16),
        strip.text = element_text(size = 16),
        legend.title = element_blank(),
        legend.text = element_text(size = 16),
        legend.position = "bottom",
        panel.spacing = unit(1, "lines")
    )
ggsave(
    filename = file.path(
        RESULT_PATH, "comparison",
        paste0("comparison_MMM_CSIDE_overlapped_svgs", ".pdf")
    ),
    plot = p, width = 9, height = 5
)
ggsave(
    filename = file.path(
        RESULT_PATH, "comparison",
        paste0("comparison_MMM_CSIDE_overlapped_svgs", ".png")
    ),
    plot = p, width = 9, height = 5
)
p

../../_images/analysis_CRC_CRC_MCube_CELINA_CSIDE_comparison_37_0.png

[ ]:

Consistency comparison of MCube, CELINA, and C-SIDE on the CRC datasets from different technologies

Load in scRNA-seq reference data

10x Visium

Cell type deconvolution using RCTD

Cell-type-specific SVG identification using MCube

Cell-type-specific SVG identification using C-SIDE

Cell-type-specific SVG identification using CELINA

10x Visium HD

Cell type deconvolution using RCTD

Cell-type-specific SVG identification using MCube with different sample sizes

Visualization of the comparison results

Comparison beween MCube and CELINA

Comparison beween MCube and C-SIDE

Consistency comparison of `MCube`, `CELINA`, and `C-SIDE` on the CRC datasets from different technologies

Cell type deconvolution using `RCTD`

Cell-type-specific SVG identification using `MCube`

Cell-type-specific SVG identification using `C-SIDE`

Cell-type-specific SVG identification using `CELINA`

Cell type deconvolution using `RCTD`

Cell-type-specific SVG identification using `MCube` with different sample sizes

Comparison beween `MCube` and `CELINA`

Comparison beween `MCube` and `C-SIDE`