Consistency comparison of MCube, CELINA, and C-SIDE on the CRC datasets from different technologies

[1]:
set.seed(20250502)

library(Matrix)
library(spacexr)
library(MCube)
library(CELINA)
library(ggplot2)
[2]:
RAW_DATA_PATH <- "/import/home/share/zw/data/CRC"
DATA_PATH <- "/import/home/share/zw/pql/data/CRC"
RESULT_PATH <- "/import/home/share/zw/pql/results/CRC"

if (!dir.exists(file.path(RESULT_PATH, "Visium"))) {
    dir.create(file.path(RESULT_PATH, "Visium"), recursive = TRUE)
}
if (!dir.exists(file.path(RESULT_PATH, "VisiumHD"))) {
    dir.create(file.path(RESULT_PATH, "VisiumHD"), recursive = TRUE)
}
if (!dir.exists(file.path(RESULT_PATH, "comparison"))) {
    dir.create(file.path(RESULT_PATH, "comparison"), recursive = TRUE)
}

Load in scRNA-seq reference data

[23]:
library(Seurat)

FlexRef <- Read10X_h5(file.path(
    RAW_DATA_PATH, "sc", "HumanColonCancer_Flex_Multiplex_count_filtered_feature_bc_matrix.h5"
))
# MetaData <- readRDS(file.path(
#     RAW_DATA_PATH, "sc", "FlexSeuratV5_MetaData.rds"
# )) # See FlexSingleCell.R if not generated.

meta <- read.csv(file.path(
    RAW_DATA_PATH, "HumanColonCancer_VisiumHD/MetaData/SingleCell_MetaData.csv.gz"
))

KpIdents <- names(which(table(meta$Level2) > 25))
meta <- meta[meta$Level2 %in% KpIdents, ]
FlexRef <- FlexRef[, meta$Barcode]

CTRef <- meta$Level2
CTRef <- gsub("/", "_", CTRef)
CTRef <- as.factor(CTRef)
names(CTRef) <- meta$Barcode

reference <- Reference(FlexRef, CTRef, colSums(FlexRef))
Loading required package: SeuratObject

Loading required package: sp


Attaching package: ‘SeuratObject’


The following objects are masked from ‘package:base’:

    intersect, t


Warning message in check_UMI(nUMI, "Reference", require_2d = T, require_int = require_int, :
“Reference: some nUMI values are less than min_UMI = 100, and these cells will be removed. Optionally, you may lower the min_UMI parameter.”
Warning message in Reference(FlexRef, CTRef, colSums(FlexRef)):
“Reference: number of cells per cell type is 34844, larger than maximum allowable of 10000. Downsampling number of cells to: 10000”

We then compare the statistical power of MCube and CELINA on ST CRC datasets. We focus on the cancer associated fibroblasts (CAFs) and macrophages to study the tumor microenvironment (TME). In the tumor boundary region, CAFs are the most prominent cell type, while macrophages are consistently identified as the most abundant immune cell type.

[3]:
celltypes_tme <- c("CAF", "Macrophage")

Since there is no ground truth on cell-type-specific SVGs, we used the identified SVGs in the paired Visium data for verification.

10x Visium

Cell type deconvolution using RCTD

[ ]:
# counts <- Read10X_h5(file.path(
#     RAW_DATA_PATH, "visium",
#     "Visium_V2_Human_Colon_Cancer_P2_filtered_feature_bc_matrix.h5"
# ))

# coordinates <- as.data.frame(readr::read_csv(file.path(
#     RAW_DATA_PATH, "visium",
#     "spatial/tissue_positions.csv"
# )))
# rownames(coordinates)<-coordinates$barcode
# coordinates<-coordinates[colnames(counts), ]
# coordinates <- coordinates[, c(6, 5)]

# puck <- SpatialRNA(coordinates, counts, colSums(counts))

# myRCTD <- create.RCTD(puck, reference, max_cores = 32)
# myRCTD <- run.RCTD(myRCTD, doublet_mode = "full")
# saveRDS(myRCTD, file.path(RESULT_PATH, "Visium", "myRCTD.rds"))
[5]:
myRCTD <- readRDS(file.path(RESULT_PATH, "Visium", "myRCTD.rds"))

Cell-type-specific SVG identification using MCube

[ ]:
spot_names <- colnames(myRCTD@spatialRNA@counts)
weights_RCTD <- as.matrix(myRCTD@results$weights)
proportions_RCTD <- weights_RCTD / rowSums(weights_RCTD)
spot_effects_RCTD <- log(rowSums(weights_RCTD))
names(spot_effects_RCTD) <- rownames(weights_RCTD)
[ ]:
mcube_object <- createMCube(
    counts = t(as.matrix(myRCTD@originalSpatialRNA@counts[, spot_names])),
    coordinates = as.matrix(myRCTD@spatialRNA@coords),
    proportions = proportions_RCTD,
    library_sizes = myRCTD@spatialRNA@nUMI,
    reference = t(myRCTD@cell_type_info$info[[1]]),
    used_for_deconvolution = rownames(myRCTD@spatialRNA@counts),
    spot_effects = spot_effects_RCTD,
    celltype_test = celltypes_tme,
    celltype_threshold = 50
)
mcube_object <- mcubeFitNull(
    mcube_object,
    num_workers = 36, num_threads = 1
)
mcube_object <- mcubeTest(
    mcube_object,
    num_workers = 36, num_threads = 1, shared_memory = TRUE
)

saveRDS(
    mcube_object@pvalues,
    file = file.path(
        RESULT_PATH, "comparison",
        paste0("mcube_pvalues_visium", ".rds")
    )
)

Cell-type-specific SVG identification using C-SIDE

[23]:
myCSIDE <- run.CSIDE.nonparam(
    myRCTD,
    df = 15,
    weight_threshold = 0.8, cell_type_threshold = 20,
    doublet_mode = FALSE
)
saveRDS(
    myCSIDE@de_results$all_gene_list,
    file = file.path(
        RESULT_PATH, "comparison",
        paste0("cside_pvalues_visium", ".rds")
    )
)
choose_cell_types: detected the following cell types occuring at least cell_type_threshold times: c("CAF", "CD4 T cell", "CD8 Cytotoxic T cell", "Endothelial", "Enteric Glial", "Enterocyte", "Fibroblast", "Goblet", "Lymphatic Endothelial", "Macrophage", "Myofibroblast", "Neutrophil", "Pericytes", "Plasma", "Proliferating Immune II", "Smooth Muscle", "Tumor I", "Tumor III", "Tumor V", "vSM")


Warning: run.CSIDE.general: removing the following cell types due to insufficient counts per region. Consider lowering cell_type_threshold or proceeding with removed cell types. Cell types: Enteric Glial, Enterocyte, Fibroblast, Goblet, Lymphatic Endothelial, Myofibroblast, Neutrophil, Plasma, Proliferating Immune II, Smooth Muscle, Tumor I, Tumor V, vSM,

Warning message in choose_cell_types(myRCTD, barcodes, doublet_mode, cell_type_threshold, :
“choose_cell_types: removing cell types: CD8 Cytotoxic T cell because these cell types did not contain sufficient pixels passing total cell type weight of weight_threshold = 0.8. Consider removing this cell type or lowering weight_threshold”
run.CSIDE.general: running CSIDE with cell types CAF, CD4 T cell, Endothelial, Macrophage, Pericytes, Tumor III

run.CSIDE.general: configure params_to_test = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,

filter_genes: filtering genes based on threshold = 5e-05

set_global_Q_all: begin

set_global_Q_all: finished

Cell-type-specific SVG identification using CELINA

[ ]:
RhpcBLASctl::blas_set_num_threads(1)

celina_object <- Create_Celina_Object(
    celltype_mat = t(proportions_RCTD),
    gene_expression_mat = myRCTD@originalSpatialRNA@counts[, spot_names],
    location = as.matrix(myRCTD@spatialRNA@coords),
    covariates = NULL
)
celina_object <- preprocess_input(
    celina_object, # Celina object
    cell_types_to_test = celltypes_tme, # a vector of cell types to be used for testing
    scRNA_count = FlexRef, # a gene x cell expression matrix of reference scRNA-seq data
    sc_cell_type_labels = CTRef, # a vector of cell type labels for each cell in scRNA_count
    threshold = 5e-5
)
celina_object <- Calculate_Kernel(
    celina_object,
    approximation = FALSE
)
celina_object <- Testing_interaction_all(
    celina_object,
    celltype_to_test = celltypes_tme,
    method = "MM",
    num_cores = 36
)

pvalues <- celina_object@result[
    sapply(celina_object@result, function(x) ncol(x) > 1, simplify = TRUE)
]
rm(celina_object)

saveRDS(
    pvalues,
    file = file.path(
        RESULT_PATH, "comparison",
        paste0("celina_pvalues_visium", ".rds")
    )
)
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 37.7 GiB”

10x Visium HD

Cell type deconvolution using RCTD

[ ]:
# library(Seurat)
# library(arrow)

# counts <- Read10X_h5(file.path(
#     RAW_DATA_PATH, "visium_hd",
#     "binned_outputs/square_016um/filtered_feature_bc_matrix.h5"
# ))

# coords <- read_parquet(
#     file.path(
#         RAW_DATA_PATH, "visium_hd",
#         "binned_outputs/square_016um/spatial/tissue_positions.parquet"
#     )
# )
# coords <- as.data.frame(coords)
# rownames(coords) <- coords$barcode
# coords <- coords[colnames(counts), ]
# coords <- coords[, c(6, 5)]

# nUMI <- colSums(counts)

# puck <- SpatialRNA(coords, counts, nUMI)
# barcodes <- colnames(puck@counts)

# myRCTDhd <- create.RCTD(puck, reference, max_cores = 32)
# myRCTD <- run.RCTD(myRCTD, doublet_mode = "doublet")

# saveRDS(
#     myRCTD,
#     file = file.path(
#         RESULT_PATH, "VisiumHD", "myRCTD_16um.rds"
#     )
# )
[27]:
myRCTD <- readRDS(file.path(RESULT_PATH, "VisiumHD", "myRCTD_16um.rds"))

Cell-type-specific SVG identification using MCube with different sample sizes

[28]:
weights_RCTD <- as.matrix(myRCTD@results$weights)
proportions_RCTD <- weights_RCTD / rowSums(weights_RCTD)
spot_effects_RCTD <- log(rowSums(weights_RCTD))
names(spot_effects_RCTD) <- rownames(weights_RCTD)
doublet_results_RCTD <- myRCTD@results$results_df

We compare the performances of MCube with different sample sizes on Visium HD data.

[29]:
sample_size_seq <- c(3000L, 4000L, 5000L, 6000L, 7000L)
[ ]:
for (celltype in celltypes_tme) {
    spots_used <- rownames(doublet_results_RCTD)[
        ((doublet_results_RCTD$spot_class == "singlet" |
            doublet_results_RCTD$spot_class == "doublet_uncertain") &
            doublet_results_RCTD$first_type == celltype
        ) |
            (doublet_results_RCTD$spot_class == "doublet_certain" &
                (doublet_results_RCTD$first_type == celltype |
                    doublet_results_RCTD$second_type == celltype))
    ]
    length(spots_used)

    for (sample_size in sample_size_seq) {
        spots_sample <- sample(spots_used, size = sample_size, replace = FALSE)

        # MCube
        mcube_object <- createMCube(
            counts = t(as.matrix(myRCTD@originalSpatialRNA@counts[, spots_sample])),
            coordinates = as.matrix(myRCTD@spatialRNA@coords[spots_sample, ]),
            proportions = proportions_RCTD[spots_sample, ],
            library_sizes = myRCTD@spatialRNA@nUMI[spots_sample],
            reference = t(myRCTD@cell_type_info$info[[1]]),
            used_for_deconvolution = rownames(myRCTD@spatialRNA@counts),
            spot_effects = spot_effects_RCTD[spots_sample],
            celltype_test = celltype,
            proportion_threshold = 0.01
        )
        mcube_object <- mcubeFitNull(
            mcube_object,
            num_workers = 36, num_threads = 1
        )
        mcube_object <- mcubeTest(
            mcube_object,
            num_workers = 36, num_threads = 1, shared_memory = TRUE
        )
        saveRDS(
            mcube_object@pvalues,
            file = file.path(
                RESULT_PATH, "comparison",
                paste0("mcube_pvalues_visiumhd_", celltype, "_", sample_size, ".rds")
            )
        )
        rm(mcube_object)

        # CELINA
        celina_object <- Create_Celina_Object(
            celltype_mat = t(proportions_RCTD[spots_sample, ]),
            gene_expression_mat = myRCTD@originalSpatialRNA@counts[, spots_sample],
            location = as.matrix(myRCTD@spatialRNA@coords[spots_sample, ]),
            covariates = NULL
        )
        celina_object <- preprocess_input(
            celina_object, # Celina object
            cell_types_to_test = celltypes_tme, # a vector of cell types to be used for testing
            scRNA_count = FlexRef, # a gene x cell expression matrix of reference scRNA-seq data
            sc_cell_type_labels = CTRef, # a vector of cell type labels for each cell in scRNA_count
            threshold = 5e-5
        )
        celina_object <- Calculate_Kernel(
            celina_object,
            approximation = FALSE
        )
        celina_object <- Testing_interaction_all(
            celina_object,
            celltype_to_test = celltypes_tme,
            method = "MM",
            num_cores = 36
        )
        pvalues <- celina_object@result[
            sapply(celina_object@result, function(x) ncol(x) > 1, simplify = TRUE)
        ]
        rm(celina_object)
        saveRDS(
            pvalues,
            file = file.path(
                RESULT_PATH, "comparison",
                paste0("celina_pvalues_visiumhd_", celltype, "_", sample_size, ".rds")
            )
        )

        # C-SIDE
        myCSIDE <- run.CSIDE.nonparam(
            myRCTD,
            df = 15, barcodes = spots_sample,
            weight_threshold = 0.8, doublet_mode = TRUE
        )
        saveRDS(
            myCSIDE@de_results$all_gene_list,
            file = file.path(
                RESULT_PATH, "comparison",
                paste0("cside_pvalues_visiumhd_", celltype, "_", sample_size, ".rds")
            )
        )
        rm(myCSIDE)
    }
}
choose_cell_types: detected the following cell types occuring at least cell_type_threshold times: c("CAF", "CD4 T cell", "Endothelial", "Macrophage", "Tumor III")


Warning: run.CSIDE.general: removing the following cell types due to insufficient counts per region. Consider lowering cell_type_threshold or proceeding with removed cell types. Cell types: CD4 T cell,

Warning message in choose_cell_types(myRCTD, barcodes, doublet_mode, cell_type_threshold, :
“choose_cell_types: removing cell types: Endothelial because these cell types did not contain sufficient pixels passing total cell type weight of weight_threshold = 0.8. Consider removing this cell type or lowering weight_threshold”
run.CSIDE.general: running CSIDE with cell types CAF, Macrophage, Tumor III

run.CSIDE.general: configure params_to_test = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,

filter_genes: filtering genes based on threshold = 5e-05

set_global_Q_all: begin

set_global_Q_all: finished

choose_cell_types: detected the following cell types occuring at least cell_type_threshold times: c("CAF", "CD4 T cell", "Endothelial", "Macrophage", "Plasma", "Proliferating Immune II", "Tumor III", "vSM")


Warning: run.CSIDE.general: removing the following cell types due to insufficient counts per region. Consider lowering cell_type_threshold or proceeding with removed cell types. Cell types: Plasma, Proliferating Immune II, vSM,

Warning message in choose_cell_types(myRCTD, barcodes, doublet_mode, cell_type_threshold, :
“choose_cell_types: removing cell types: CD4 T cell because these cell types did not contain sufficient pixels passing total cell type weight of weight_threshold = 0.8. Consider removing this cell type or lowering weight_thresholdchoose_cell_types: removing cell types: Endothelial because these cell types did not contain sufficient pixels passing total cell type weight of weight_threshold = 0.8. Consider removing this cell type or lowering weight_threshold”
run.CSIDE.general: running CSIDE with cell types CAF, Macrophage, Tumor III

run.CSIDE.general: configure params_to_test = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,

filter_genes: filtering genes based on threshold = 5e-05

set_global_Q_all: begin

set_global_Q_all: finished

choose_cell_types: detected the following cell types occuring at least cell_type_threshold times: c("CAF", "CD4 T cell", "Endothelial", "Macrophage", "Plasma", "Proliferating Immune II", "Tumor III", "vSM")


Warning: run.CSIDE.general: removing the following cell types due to insufficient counts per region. Consider lowering cell_type_threshold or proceeding with removed cell types. Cell types: CD4 T cell, Plasma, Proliferating Immune II, vSM,

run.CSIDE.general: running CSIDE with cell types CAF, Endothelial, Macrophage, Tumor III

run.CSIDE.general: configure params_to_test = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,

filter_genes: filtering genes based on threshold = 5e-05

set_global_Q_all: begin

set_global_Q_all: finished

choose_cell_types: detected the following cell types occuring at least cell_type_threshold times: c("CAF", "CD4 T cell", "CD8 Cytotoxic T cell", "Endothelial", "Macrophage", "Plasma", "Proliferating Immune II", "Tumor III", "vSM")


Warning: run.CSIDE.general: removing the following cell types due to insufficient counts per region. Consider lowering cell_type_threshold or proceeding with removed cell types. Cell types: CD8 Cytotoxic T cell, Plasma, Proliferating Immune II, vSM,

Warning message in choose_cell_types(myRCTD, barcodes, doublet_mode, cell_type_threshold, :
“choose_cell_types: removing cell types: CD4 T cell because these cell types did not contain sufficient pixels passing total cell type weight of weight_threshold = 0.8. Consider removing this cell type or lowering weight_threshold”
run.CSIDE.general: running CSIDE with cell types CAF, Endothelial, Macrophage, Tumor III

run.CSIDE.general: configure params_to_test = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,

filter_genes: filtering genes based on threshold = 5e-05

set_global_Q_all: begin

set_global_Q_all: finished

choose_cell_types: detected the following cell types occuring at least cell_type_threshold times: c("CAF", "CD4 T cell", "CD8 Cytotoxic T cell", "Endothelial", "Macrophage", "Plasma", "Proliferating Immune II", "Tumor III", "vSM")


Warning: run.CSIDE.general: removing the following cell types due to insufficient counts per region. Consider lowering cell_type_threshold or proceeding with removed cell types. Cell types: CD8 Cytotoxic T cell, Plasma, Proliferating Immune II, vSM,

Warning message in choose_cell_types(myRCTD, barcodes, doublet_mode, cell_type_threshold, :
“choose_cell_types: removing cell types: CD4 T cell because these cell types did not contain sufficient pixels passing total cell type weight of weight_threshold = 0.8. Consider removing this cell type or lowering weight_threshold”
run.CSIDE.general: running CSIDE with cell types CAF, Endothelial, Macrophage, Tumor III

run.CSIDE.general: configure params_to_test = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,

filter_genes: filtering genes based on threshold = 5e-05

set_global_Q_all: begin

set_global_Q_all: finished

choose_cell_types: detected the following cell types occuring at least cell_type_threshold times: c("CAF", "Endothelial", "Macrophage", "Plasma", "Tumor III", "vSM")


Warning: run.CSIDE.general: removing the following cell types due to insufficient counts per region. Consider lowering cell_type_threshold or proceeding with removed cell types. Cell types: Plasma, vSM,

Warning message in choose_cell_types(myRCTD, barcodes, doublet_mode, cell_type_threshold, :
“choose_cell_types: removing cell types: Endothelial because these cell types did not contain sufficient pixels passing total cell type weight of weight_threshold = 0.8. Consider removing this cell type or lowering weight_threshold”
run.CSIDE.general: running CSIDE with cell types CAF, Macrophage, Tumor III

run.CSIDE.general: configure params_to_test = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,

filter_genes: filtering genes based on threshold = 5e-05

set_global_Q_all: begin

set_global_Q_all: finished

choose_cell_types: detected the following cell types occuring at least cell_type_threshold times: c("CAF", "Endothelial", "Macrophage", "Pericytes", "Plasma", "Tumor III", "vSM")


Warning: run.CSIDE.general: removing the following cell types due to insufficient counts per region. Consider lowering cell_type_threshold or proceeding with removed cell types. Cell types: Pericytes, Plasma, vSM,

run.CSIDE.general: running CSIDE with cell types CAF, Endothelial, Macrophage, Tumor III

run.CSIDE.general: configure params_to_test = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,

filter_genes: filtering genes based on threshold = 5e-05

set_global_Q_all: begin

set_global_Q_all: finished

choose_cell_types: detected the following cell types occuring at least cell_type_threshold times: c("CAF", "Endothelial", "Macrophage", "Pericytes", "Plasma", "Tumor III", "vSM")


Warning: run.CSIDE.general: removing the following cell types due to insufficient counts per region. Consider lowering cell_type_threshold or proceeding with removed cell types. Cell types: Pericytes, Plasma, vSM,

run.CSIDE.general: running CSIDE with cell types CAF, Endothelial, Macrophage, Tumor III

run.CSIDE.general: configure params_to_test = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,

filter_genes: filtering genes based on threshold = 5e-05

set_global_Q_all: begin

set_global_Q_all: finished

choose_cell_types: detected the following cell types occuring at least cell_type_threshold times: c("CAF", "CD4 T cell", "Endothelial", "Fibroblast", "Macrophage", "Pericytes", "Plasma", "Proliferating Immune II", "Tumor III", "vSM")


Warning: run.CSIDE.general: removing the following cell types due to insufficient counts per region. Consider lowering cell_type_threshold or proceeding with removed cell types. Cell types: CD4 T cell, Fibroblast, Pericytes, Plasma, Proliferating Immune II, vSM,

run.CSIDE.general: running CSIDE with cell types CAF, Endothelial, Macrophage, Tumor III

run.CSIDE.general: configure params_to_test = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,

filter_genes: filtering genes based on threshold = 5e-05

set_global_Q_all: begin

set_global_Q_all: finished

Visualization of the comparison results

[3]:
celltypes_tme <- c("CAF", "Macrophage")
[4]:
mcube_pvalues_visium <- readRDS(
    file = file.path(
        RESULT_PATH, "comparison",
        paste0("mcube_pvalues_visium", ".rds")
    )
)[celltypes_tme]
sapply(mcube_pvalues_visium, nrow)

celina_pvalues_visium <- readRDS(
    file = file.path(
        RESULT_PATH, "comparison",
        paste0("celina_pvalues_visium", ".rds")
    )
)[celltypes_tme]
sapply(celina_pvalues_visium, nrow)

cside_pvalues_visium <- readRDS(
    file = file.path(
        RESULT_PATH, "comparison",
        paste0("cside_pvalues_visium", ".rds")
    )
)[celltypes_tme]
sapply(cside_pvalues_visium, nrow)
CAF
2185
Macrophage
2683
CAF
5009
Macrophage
4770
CAF
1901
Macrophage
1908
[5]:
# sample_size_seq <- c(3000L, 4000L, 5000L, 6000, 7000L)
# for (celltype in celltypes_tme) {
#     mcube_pvalues_df <- data.frame(
#         gene = rownames(mcube_pvalues_visium[[celltype]]),
#         Visium = mcube_pvalues_visium[[celltype]]$combined_pvalue
#     )
#     for (sample_size in sample_size_seq) {
#         mcube_pvalues_visiumhd <- readRDS(
#             file = file.path(
#                 RESULT_PATH, "comparison",
#                 paste0("mcube_pvalues_visiumhd_", celltype, "_", sample_size, ".rds")
#             )
#         )[[celltype]]
#         mcube_pvalues_df <- merge(
#             mcube_pvalues_df,
#             data.frame(
#                 gene = rownames(mcube_pvalues_visiumhd),
#                 pvalue = mcube_pvalues_visiumhd$combined_pvalue
#             ),
#             by = "gene"
#         )
#         colnames(mcube_pvalues_df)[ncol(mcube_pvalues_df)] <- paste0("Visium HD ", sample_size)
#     }

#     celina_pvalues_df <- data.frame(
#         gene = rownames(celina_pvalues_visium[[celltype]]),
#         Visium = celina_pvalues_visium[[celltype]]$CombinedPvals
#     )
#     for (sample_size in sample_size_seq) {
#         celina_pvalues_visiumhd <- readRDS(
#             file = file.path(
#                 RESULT_PATH, "comparison",
#                 paste0("celina_pvalues_visiumhd_", celltype, "_", sample_size, ".rds")
#             )
#         )[[celltype]]
#         celina_pvalues_df <- merge(
#             celina_pvalues_df,
#             data.frame(
#                 gene = rownames(celina_pvalues_visiumhd),
#                 pvalue = celina_pvalues_visiumhd$CombinedPvals
#             ),
#             by = "gene"
#         )
#         colnames(celina_pvalues_df)[ncol(celina_pvalues_df)] <- paste0("Visium HD ", sample_size)
#     }

#     overlapped_genes <- intersect(mcube_pvalues_df$gene, celina_pvalues_df$gene)
#     length(overlapped_genes)

#     mcube_pvalues_df <- mcube_pvalues_df[
#         mcube_pvalues_df$gene %in% overlapped_genes,
#     ]
#     celina_pvalues_df <- celina_pvalues_df[
#         celina_pvalues_df$gene %in% overlapped_genes,
#     ]

#     mcube_svgs_list <- lapply(
#         mcube_pvalues_df[, -1],
#         FUN = function(x) {
#             x <- p.adjust(x, method = "BH")
#             mcube_pvalues_df$gene[x < 0.05]
#         }
#     )
#     names(mcube_svgs_list) <- c(
#         "Visium", paste0("Visium HD ", sample_size_seq)
#     )
#     mcube_overlapped_svgs <- sapply(
#         mcube_svgs_list,
#         FUN = function(x) {
#             length(intersect(x, mcube_svgs_list[["Visium"]]))
#         },
#         simplify = TRUE
#     )
#     mcube_overlapped_svgs <- data.frame(
#         setting = names(mcube_overlapped_svgs),
#         number = mcube_overlapped_svgs
#     )
#     mcube_overlapped_svgs$proportion <-
#         mcube_overlapped_svgs$number / mcube_overlapped_svgs$number[1] * 100
#     mcube_overlapped_svgs$proportion[1] <- NA
#     mcube_overlapped_svgs$setting <- factor(
#         mcube_overlapped_svgs$setting,
#         levels = c(paste0("Visium HD ", sample_size_seq), "Visium")
#     )

#     celina_svgs_list <- lapply(
#         celina_pvalues_df[, -1],
#         FUN = function(x) {
#             x <- p.adjust(x, method = "BH")
#             celina_pvalues_df$gene[x < 0.05]
#         }
#     )
#     names(celina_svgs_list) <- c(
#         "Visium", paste0("Visium HD ", sample_size_seq)
#     )
#     celina_overlapped_svgs <- sapply(
#         celina_svgs_list,
#         FUN = function(x) {
#             length(intersect(x, celina_svgs_list[["Visium"]]))
#         },
#         simplify = TRUE
#     )
#     celina_overlapped_svgs <- data.frame(
#         setting = names(celina_overlapped_svgs),
#         number = celina_overlapped_svgs
#     )
#     celina_overlapped_svgs$proportion <-
#         celina_overlapped_svgs$number / celina_overlapped_svgs$number[1] * 100
#     celina_overlapped_svgs$proportion[1] <- NA
#     celina_overlapped_svgs$setting <- factor(
#         celina_overlapped_svgs$setting,
#         levels = c(paste0("Visium HD ", sample_size_seq), "Visium")
#     )

#     overlapped_svgs_comparison <- rbind(
#     data.frame(
#         mcube_overlapped_svgs,
#         method = "MMM"
#     ),
#     data.frame(
#         celina_overlapped_svgs,
#         method = "Celina"
#     )
# )
# overlapped_svgs_comparison$method <- factor(
#     overlapped_svgs_comparison$method,
#     levels = c("MMM", "Celina")
# )

# p <- ggplot(data = overlapped_svgs_comparison, aes(x = setting, y = number, group = method)) +
#     geom_bar(aes(fill = method), stat = "identity", position = "dodge") +
#     geom_text(
#         aes(label = ifelse(!is.na(proportion), paste0(round(proportion), "%"), "")),
#         position = position_dodge(width = 0.9), vjust = -0.5, size = 5
#     ) +
#     theme_classic() +
#     labs(
#         title = paste("Number of overlapped SVGs of", celltype),
#         x = NULL, y = NULL
#     ) +
#     theme(
#         plot.title = element_text(size = 20),
#         axis.text.x = element_text(size = 16,angle = 45, vjust = 0.5),
#         # axis.ticks.x = element_blank(),
#         axis.text.y = element_text(size = 16),
#         legend.title = element_blank(),
#         legend.text = element_text(size = 16),
#         legend.position = "bottom"
#     )
#     ggsave(
#         filename = file.path(
#             RESULT_PATH, "comparison",
#             paste0("comparison_overlapped_svgs_", celltype, ".pdf")
#         ),
#         plot = p
#     )
# }

Comparison beween MCube and CELINA

[6]:
sample_size_seq <- c(3000L, 4000L, 5000L, 6000L, 7000L)
overlapped_svgs_comparison <- data.frame()
for (celltype in celltypes_tme) {
    mcube_pvalues_df <- data.frame(
        gene = rownames(mcube_pvalues_visium[[celltype]]),
        Visium = mcube_pvalues_visium[[celltype]]$combined_pvalue
    )
    for (sample_size in sample_size_seq) {
        mcube_pvalues_visiumhd <- readRDS(
            file = file.path(
                RESULT_PATH, "comparison",
                paste0("mcube_pvalues_visiumhd_", celltype, "_", sample_size, ".rds")
            )
        )[[celltype]]
        mcube_pvalues_df <- merge(
            mcube_pvalues_df,
            data.frame(
                gene = rownames(mcube_pvalues_visiumhd),
                pvalue = mcube_pvalues_visiumhd$combined_pvalue
            ),
            by = "gene"
        )
        colnames(mcube_pvalues_df)[ncol(mcube_pvalues_df)] <- paste0("Visium HD ", sample_size)
    }

    celina_pvalues_df <- data.frame(
        gene = rownames(celina_pvalues_visium[[celltype]]),
        Visium = celina_pvalues_visium[[celltype]]$CombinedPvals
    )
    for (sample_size in sample_size_seq) {
        celina_pvalues_visiumhd <- readRDS(
            file = file.path(
                RESULT_PATH, "comparison",
                paste0("celina_pvalues_visiumhd_", celltype, "_", sample_size, ".rds")
            )
        )[[celltype]]
        celina_pvalues_df <- merge(
            celina_pvalues_df,
            data.frame(
                gene = rownames(celina_pvalues_visiumhd),
                pvalue = celina_pvalues_visiumhd$CombinedPvals
            ),
            by = "gene"
        )
        colnames(celina_pvalues_df)[ncol(celina_pvalues_df)] <- paste0("Visium HD ", sample_size)
    }

    overlapped_genes <- intersect(mcube_pvalues_df$gene, celina_pvalues_df$gene)
    message(celltype, ", overlapped genes: ", length(overlapped_genes))

    mcube_pvalues_df <- mcube_pvalues_df[
        mcube_pvalues_df$gene %in% overlapped_genes,
    ]
    celina_pvalues_df <- celina_pvalues_df[
        celina_pvalues_df$gene %in% overlapped_genes,
    ]

    mcube_svgs_list <- lapply(
        mcube_pvalues_df[, -1],
        FUN = function(x) {
            x <- p.adjust(x, method = "BH")
            mcube_pvalues_df$gene[x < 0.05]
        }
    )
    names(mcube_svgs_list) <- c(
        "Visium", paste0("Visium HD ", sample_size_seq)
    )
    message(celltype, ", MCube, ", length(mcube_svgs_list$Visium))

    mcube_overlapped_svgs <- sapply(
        mcube_svgs_list,
        FUN = function(x) {
            length(intersect(x, mcube_svgs_list[["Visium"]]))
        },
        simplify = TRUE
    )
    mcube_overlapped_svgs <- data.frame(
        sample_size = sample_size_seq,
        proportion = mcube_overlapped_svgs[-1] / mcube_overlapped_svgs[1]
    )

    celina_svgs_list <- lapply(
        celina_pvalues_df[, -1],
        FUN = function(x) {
            x <- p.adjust(x, method = "BH")
            celina_pvalues_df$gene[x < 0.05]
        }
    )
    names(celina_svgs_list) <- c(
        "Visium", paste0("Visium HD ", sample_size_seq)
    )
    message(celltype, ", CELINA, ", length(celina_svgs_list$Visium))

    celina_overlapped_svgs <- sapply(
        celina_svgs_list,
        FUN = function(x) {
            length(intersect(x, celina_svgs_list[["Visium"]]))
        },
        simplify = TRUE
    )
    celina_overlapped_svgs <- data.frame(
        sample_size = sample_size_seq,
        proportion = celina_overlapped_svgs[-1] / celina_overlapped_svgs[1]
    )

    message(
        celltype, ", Visium, ", "SVGs identified by both methods, ",
        length(intersect(
            mcube_svgs_list[["Visium"]], celina_svgs_list[["Visium"]]
        ))
    )

    overlapped_svgs_comparison <- rbind(
        overlapped_svgs_comparison,
        data.frame(
            mcube_overlapped_svgs,
            method = "MMM",
            celltype = celltype
        ),
        data.frame(
            celina_overlapped_svgs,
            method = "Celina",
            celltype = celltype
        )
    )
}
overlapped_svgs_comparison$method <- factor(
    overlapped_svgs_comparison$method,
    levels = c("MMM", "Celina")
)
head(overlapped_svgs_comparison)
CAF, overlapped genes: 1273

CAF, MCube, 797

CAF, CELINA, 498

CAF, Visium, SVGs identified by both methods, 412

Macrophage, overlapped genes: 648

Macrophage, MCube, 315

Macrophage, CELINA, 223

Macrophage, Visium, SVGs identified by both methods, 150

A data.frame: 6 × 4
sample_sizeproportionmethodcelltype
<int><dbl><fct><chr>
Visium HD 300030000.2710163MMM CAF
Visium HD 400040000.3161857MMM CAF
Visium HD 500050000.3676286MMM CAF
Visium HD 600060000.3776662MMM CAF
Visium HD 700070000.4291092MMM CAF
Visium HD 3000130000.1445783CelinaCAF
[7]:
p <- ggplot(data = overlapped_svgs_comparison, aes(x = sample_size, y = proportion, group = method)) +
    geom_point(aes(color = method, shape = method), size = 6) +
    geom_line(aes(color = method), linewidth = 1.5) +
    scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
    theme_classic() +
    labs(
        title = paste("Identified cell-type-specific SVGs in the Visium HD vs. Visium data"),
        x = "Number of sample bins in the Visium HD data", y = "Overlapped proportion"
    ) +
    facet_wrap(. ~ celltype, scales = "free_y") +
    theme(
        text = element_text(family = "Helvetica"),
        # plot.title = element_text(size = 20, hjust = 0.5),
        plot.title = element_blank(),
        axis.title = element_text(size = 20),
        axis.text = element_text(size = 16),
        strip.text = element_text(size = 16),
        legend.title = element_blank(),
        legend.text = element_text(size = 16),
        legend.position = "bottom",
        panel.spacing = unit(1, "lines")
    )
ggsave(
    filename = file.path(
        RESULT_PATH, "comparison",
        paste0("comparison_MMM_Celina_overlapped_svgs", ".pdf")
    ),
    plot = p, width = 9, height = 5
)
ggsave(
    filename = file.path(
        RESULT_PATH, "comparison",
        paste0("comparison_MMM_Celina_overlapped_svgs", ".png")
    ),
    plot = p, width = 9, height = 5
)
p
../../_images/analysis_CRC_CRC_MCube_CELINA_CSIDE_comparison_34_0.png

Comparison beween MCube and C-SIDE

[8]:
sample_size_seq <- c(3000L, 4000L, 5000L, 6000L, 7000L)
overlapped_svgs_comparison <- data.frame()
for (celltype in celltypes_tme) {
    mcube_pvalues_df <- data.frame(
        gene = rownames(mcube_pvalues_visium[[celltype]]),
        Visium = mcube_pvalues_visium[[celltype]]$combined_pvalue
    )
    for (sample_size in sample_size_seq) {
        mcube_pvalues_visiumhd <- readRDS(
            file = file.path(
                RESULT_PATH, "comparison",
                paste0("mcube_pvalues_visiumhd_", celltype, "_", sample_size, ".rds")
            )
        )[[celltype]]
        mcube_pvalues_df <- merge(
            mcube_pvalues_df,
            data.frame(
                gene = rownames(mcube_pvalues_visiumhd),
                pvalue = mcube_pvalues_visiumhd$combined_pvalue
            ),
            by = "gene"
        )
        colnames(mcube_pvalues_df)[ncol(mcube_pvalues_df)] <- paste0("Visium HD ", sample_size)
    }

    cside_pvalues_df <- data.frame(
        gene = rownames(cside_pvalues_visium[[celltype]]),
        Visium = cside_pvalues_visium[[celltype]]$p_val
    )
    for (sample_size in sample_size_seq) {
        cside_pvalues_visiumhd <- readRDS(
            file = file.path(
                RESULT_PATH, "comparison",
                paste0("cside_pvalues_visiumhd_", celltype, "_", sample_size, ".rds")
            )
        )[[celltype]]
        cside_pvalues_df <- merge(
            cside_pvalues_df,
            data.frame(
                gene = rownames(cside_pvalues_visiumhd),
                pvalue = cside_pvalues_visiumhd$p_val
            ),
            by = "gene"
        )
        colnames(cside_pvalues_df)[ncol(cside_pvalues_df)] <- paste0("Visium HD ", sample_size)
    }

    overlapped_genes <- intersect(mcube_pvalues_df$gene, cside_pvalues_df$gene)
    message(celltype, ", overlapped genes: ", length(overlapped_genes))

    mcube_pvalues_df <- mcube_pvalues_df[
        mcube_pvalues_df$gene %in% overlapped_genes,
    ]
    cside_pvalues_df <- cside_pvalues_df[
        cside_pvalues_df$gene %in% overlapped_genes,
    ]

    mcube_svgs_list <- lapply(
        mcube_pvalues_df[, -1],
        FUN = function(x) {
            x <- p.adjust(x, method = "BH")
            mcube_pvalues_df$gene[x < 0.05]
        }
    )
    names(mcube_svgs_list) <- c(
        "Visium", paste0("Visium HD ", sample_size_seq)
    )
    message(celltype, ", MCube, ", length(mcube_svgs_list$Visium))

    mcube_overlapped_svgs <- sapply(
        mcube_svgs_list,
        FUN = function(x) {
            length(intersect(x, mcube_svgs_list[["Visium"]]))
        },
        simplify = TRUE
    )
    mcube_overlapped_svgs <- data.frame(
        sample_size = sample_size_seq,
        proportion = mcube_overlapped_svgs[-1] / mcube_overlapped_svgs[1]
    )

    cside_svgs_list <- lapply(
        cside_pvalues_df[, -1],
        FUN = function(x) {
            x <- p.adjust(x, method = "BH")
            cside_pvalues_df$gene[x < 0.05]
        }
    )
    names(cside_svgs_list) <- c(
        "Visium", paste0("Visium HD ", sample_size_seq)
    )
    message(celltype, ", C-SIDE, ", length(cside_svgs_list$Visium))

    cside_overlapped_svgs <- sapply(
        cside_svgs_list,
        FUN = function(x) {
            length(intersect(x, cside_svgs_list[["Visium"]]))
        },
        simplify = TRUE
    )
    cside_overlapped_svgs <- data.frame(
        sample_size = sample_size_seq,
        proportion = cside_overlapped_svgs[-1] / cside_overlapped_svgs[1]
    )

    message(
        celltype, ", Visium, ", "SVGs identified by both methods, ",
        length(intersect(
            mcube_svgs_list[["Visium"]], celina_svgs_list[["Visium"]]
        ))
    )

    overlapped_svgs_comparison <- rbind(
        overlapped_svgs_comparison,
        data.frame(
            mcube_overlapped_svgs,
            method = "MMM",
            celltype = celltype
        ),
        data.frame(
            cside_overlapped_svgs,
            method = "C-SIDE",
            celltype = celltype
        )
    )
}
overlapped_svgs_comparison$method <- factor(
    overlapped_svgs_comparison$method,
    levels = c("MMM", "C-SIDE")
)
head(overlapped_svgs_comparison)
CAF, overlapped genes: 888

CAF, MCube, 601

CAF, C-SIDE, 214

CAF, Visium, SVGs identified by both methods, 40

Macrophage, overlapped genes: 997

Macrophage, MCube, 329

Macrophage, C-SIDE, 305

Macrophage, Visium, SVGs identified by both methods, 129

A data.frame: 6 × 4
sample_sizeproportionmethodcelltype
<int><dbl><fct><chr>
Visium HD 300030000.3227953MMM CAF
Visium HD 400040000.3643927MMM CAF
Visium HD 500050000.4159734MMM CAF
Visium HD 600060000.4509151MMM CAF
Visium HD 700070000.4941764MMM CAF
Visium HD 3000130000.1635514C-SIDECAF
[10]:
p <- ggplot(data = overlapped_svgs_comparison, aes(x = sample_size, y = proportion, group = method)) +
    geom_point(aes(color = method, shape = method), size = 6) +
    geom_line(aes(color = method), linewidth = 1.5) +
    scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
    theme_classic() +
    labs(
        title = paste("Identified cell-type-specific SVGs in the Visium HD vs. Visium data"),
        x = "Number of sample bins in the Visium HD data", y = "Overlapped proportion"
    ) +
    facet_wrap(. ~ celltype, scales = "free") +
    theme(
        text = element_text(family = "Helvetica"),
        plot.title = element_blank(),
        axis.title = element_text(size = 20),
        axis.text = element_text(size = 16),
        strip.text = element_text(size = 16),
        legend.title = element_blank(),
        legend.text = element_text(size = 16),
        legend.position = "bottom",
        panel.spacing = unit(1, "lines")
    )
ggsave(
    filename = file.path(
        RESULT_PATH, "comparison",
        paste0("comparison_MMM_CSIDE_overlapped_svgs", ".pdf")
    ),
    plot = p, width = 9, height = 5
)
ggsave(
    filename = file.path(
        RESULT_PATH, "comparison",
        paste0("comparison_MMM_CSIDE_overlapped_svgs", ".png")
    ),
    plot = p, width = 9, height = 5
)
p
../../_images/analysis_CRC_CRC_MCube_CELINA_CSIDE_comparison_37_0.png
[ ]: