Applying RCTD and MCube to the 10x Visium CRC dataset

[1]:
set.seed(20250502)

library(Matrix)
library(ggplot2)

library(spacexr)
library(MCube)
[2]:
RAW_DATA_PATH <- "/import/home/share/zw/data/CRC"
DATA_PATH <- "/import/home/share/zw/pql/data/CRC"
RESULT_PATH <- "/import/home/share/zw/pql/results/CRC/Visium"

if (!dir.exists(file.path(RESULT_PATH))) {
    dir.create(file.path(RESULT_PATH), recursive = TRUE)
}

Cell type deconvolution using RCTD

[3]:
library(Seurat)

FlexRef <- Read10X_h5(file.path(
    RAW_DATA_PATH, "sc",
    "HumanColonCancer_Flex_Multiplex_count_filtered_feature_bc_matrix.h5"
))
# MetaData <- readRDS(file.path(
#     RAW_DATA_PATH, "sc", "FlexSeuratV5_MetaData.rds"
# )) # See FlexSingleCell.R if not generated.

meta <- read.csv(file.path(
    RAW_DATA_PATH, "HumanColonCancer_VisiumHD/MetaData/SingleCell_MetaData.csv.gz"
))

KpIdents <- names(which(table(meta$Level2) > 25))
meta <- meta[meta$Level2 %in% KpIdents, ]
FlexRef <- FlexRef[, meta$Barcode]

CTRef <- meta$Level2
CTRef <- gsub("/", "_", CTRef)
CTRef <- as.factor(CTRef)
names(CTRef) <- meta$Barcode

reference <- Reference(FlexRef, CTRef, colSums(FlexRef))
Loading required package: SeuratObject

Loading required package: sp


Attaching package: ‘SeuratObject’


The following objects are masked from ‘package:base’:

    intersect, t


Warning message in check_UMI(nUMI, "Reference", require_2d = T, require_int = require_int, :
“Reference: some nUMI values are less than min_UMI = 100, and these cells will be removed. Optionally, you may lower the min_UMI parameter.”
Warning message in Reference(FlexRef, CTRef, colSums(FlexRef)):
“Reference: number of cells per cell type is 34844, larger than maximum allowable of 10000. Downsampling number of cells to: 10000”
[4]:
counts <- Read10X_h5(file.path(
    RAW_DATA_PATH, "visium",
    "Visium_V2_Human_Colon_Cancer_P2_filtered_feature_bc_matrix.h5"
))

coordinates <- as.data.frame(readr::read_csv(file.path(
    RAW_DATA_PATH, "visium",
    "spatial/tissue_positions.csv"
)))
rownames(coordinates)<-coordinates$barcode
coordinates<-coordinates[colnames(counts), ]
coordinates <- coordinates[, c(6, 5)]

puck <- SpatialRNA(coordinates, counts, colSums(counts))
Rows: 14336 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): barcode
dbl (5): in_tissue, array_row, array_col, pxl_row_in_fullres, pxl_col_in_ful...

 Use `spec()` to retrieve the full column specification for this data.
 Specify the column types or set `show_col_types = FALSE` to quiet this message.
[5]:
myRCTD <- create.RCTD(puck, reference, max_cores = 12)
myRCTD <- run.RCTD(myRCTD, doublet_mode = "full")
saveRDS(myRCTD, file.path(RESULT_PATH, "myRCTD.rds"))
# myRCTD <- readRDS(file.path(RESULT_PATH, "myRCTD.rds"))
Begin: process_cell_type_info

process_cell_type_info: number of cells in reference: 186667

process_cell_type_info: number of genes in reference: 18082


              Adipocyte                     CAF              CD4 T cell
                     83                   10000                   10000
   CD8 Cytotoxic T cell             Endothelial           Enteric Glial
                  10000                    6940                    3680
             Enterocyte              Epithelial              Fibroblast
                   1739                    7432                   10000
                 Goblet   Lymphatic Endothelial              Macrophage
                  10000                    1095                   10000
                   Mast                Mature B                  mRegDC
                   1925                    8318                     404
          Myofibroblast          Neuroendocrine              Neutrophil
                    972                     529                    3138
                    pDC               Pericytes                  Plasma
                    119                    3046                   10000
Proliferating Immune II             QC_Filtered      SM Stress Response
                   2560                   10000                   10000
          Smooth Muscle                    Tuft                 Tumor I
                  10000                     101                   10000
               Tumor II               Tumor III                 Tumor V
                   8292                   10000                    7893
       Unknown III (SM)                     vSM
                   1784                    6617
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.3 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.3 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.3 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.0 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.3 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.3 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.3 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.1 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.3 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.3 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.3 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.3 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.3 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.1 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.3 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.1 GiB”
End: process_cell_type_info

create.RCTD: getting regression differentially expressed genes:

get_de_genes: Adipocyte found DE genes: 360

get_de_genes: CAF found DE genes: 217

get_de_genes: CD4 T cell found DE genes: 387

get_de_genes: CD8 Cytotoxic T cell found DE genes: 441

get_de_genes: Endothelial found DE genes: 335

get_de_genes: Enteric Glial found DE genes: 329

get_de_genes: Enterocyte found DE genes: 203

get_de_genes: Epithelial found DE genes: 71

get_de_genes: Fibroblast found DE genes: 287

get_de_genes: Goblet found DE genes: 311

get_de_genes: Lymphatic Endothelial found DE genes: 370

get_de_genes: Macrophage found DE genes: 355

get_de_genes: Mast found DE genes: 351

get_de_genes: Mature B found DE genes: 415

get_de_genes: mRegDC found DE genes: 463

get_de_genes: Myofibroblast found DE genes: 348

get_de_genes: Neuroendocrine found DE genes: 178

get_de_genes: Neutrophil found DE genes: 406

get_de_genes: pDC found DE genes: 494

get_de_genes: Pericytes found DE genes: 311

get_de_genes: Plasma found DE genes: 108

get_de_genes: Proliferating Immune II found DE genes: 311

get_de_genes: QC_Filtered found DE genes: 55

get_de_genes: SM Stress Response found DE genes: 400

get_de_genes: Smooth Muscle found DE genes: 469

get_de_genes: Tuft found DE genes: 392

get_de_genes: Tumor I found DE genes: 185

get_de_genes: Tumor II found DE genes: 245

get_de_genes: Tumor III found DE genes: 355

get_de_genes: Tumor V found DE genes: 392

get_de_genes: Unknown III (SM) found DE genes: 350

get_de_genes: vSM found DE genes: 259

get_de_genes: total DE genes: 4567

create.RCTD: getting platform effect normalization differentially expressed genes:

get_de_genes: Adipocyte found DE genes: 720

get_de_genes: CAF found DE genes: 375

get_de_genes: CD4 T cell found DE genes: 867

get_de_genes: CD8 Cytotoxic T cell found DE genes: 923

get_de_genes: Endothelial found DE genes: 654

get_de_genes: Enteric Glial found DE genes: 714

get_de_genes: Enterocyte found DE genes: 423

get_de_genes: Epithelial found DE genes: 193

get_de_genes: Fibroblast found DE genes: 549

get_de_genes: Goblet found DE genes: 688

get_de_genes: Lymphatic Endothelial found DE genes: 758

get_de_genes: Macrophage found DE genes: 673

get_de_genes: Mast found DE genes: 706

get_de_genes: Mature B found DE genes: 972

get_de_genes: mRegDC found DE genes: 821

get_de_genes: Myofibroblast found DE genes: 667

get_de_genes: Neuroendocrine found DE genes: 452

get_de_genes: Neutrophil found DE genes: 685

get_de_genes: pDC found DE genes: 1003

get_de_genes: Pericytes found DE genes: 587

get_de_genes: Plasma found DE genes: 190

get_de_genes: Proliferating Immune II found DE genes: 788

get_de_genes: QC_Filtered found DE genes: 187

get_de_genes: SM Stress Response found DE genes: 810

get_de_genes: Smooth Muscle found DE genes: 965

get_de_genes: Tuft found DE genes: 906

get_de_genes: Tumor I found DE genes: 561

get_de_genes: Tumor II found DE genes: 643

get_de_genes: Tumor III found DE genes: 983

get_de_genes: Tumor V found DE genes: 957

get_de_genes: Unknown III (SM) found DE genes: 851

get_de_genes: vSM found DE genes: 499

get_de_genes: total DE genes: 7329

fitBulk: decomposing bulk

chooseSigma: using initial Q_mat with sigma =  1

Likelihood value: 10788253.6984735

Sigma value:  0.84

Likelihood value: 10489224.530827

Sigma value:  0.69

Likelihood value: 10212884.670123

Sigma value:  0.61

Likelihood value: 10073157.488242

Sigma value:  0.53

Likelihood value: 9944652.72074451

Sigma value:  0.45

Likelihood value: 9834772.22367206

Sigma value:  0.37

Likelihood value: 9754837.1924562

Sigma value:  0.29

Likelihood value: 9722265.94164512

Sigma value:  0.29

Cell-type-specific SVG identification using MCube

[6]:
myRCTD <- readRDS(file.path(RESULT_PATH, "myRCTD.rds"))
spot_names <- colnames(myRCTD@spatialRNA@counts)
weights_RCTD <- as.matrix(myRCTD@results$weights)
proportions_RCTD <- weights_RCTD / rowSums(weights_RCTD)
spot_effects_RCTD <- log(rowSums(weights_RCTD))
names(spot_effects_RCTD) <- rownames(weights_RCTD)
[7]:
mcube_object <- createMCube(
    counts = t(as.matrix(myRCTD@originalSpatialRNA@counts[, spot_names])),
    coordinates = as.matrix(myRCTD@spatialRNA@coords),
    proportions = weights_RCTD / rowSums(weights_RCTD),
    library_sizes = myRCTD@spatialRNA@nUMI,
    reference = t(myRCTD@cell_type_info$info[[1]]),
    used_for_deconvolution = rownames(myRCTD@spatialRNA@counts),
    spot_effects = spot_effects_RCTD,
    celltype_threshold = 50
)
mcube_object <- mcubeFitNull(
    mcube_object,
    num_workers = 70, num_threads = 1
)
mcube_object <- mcubeTest(
    mcube_object,
    num_workers = 70, num_threads = 1, shared_memory = TRUE
)

saveRDS(
    mcube_object,
    file = file.path(
        RESULT_PATH,
        paste0("mcube", ".rds")
    )
)
The batch_id is not provided!
All spots are assumed to be from the same batch and share the same gene platform effects.

Select high-abundance cell types to analyze with proportion_threshold = 0.1 and celltype_threshold = 50.

mcubeFilterCellTypes: Cell type(s) CAF, Endothelial, Goblet, Macrophage, Plasma, Tumor III, vSM pass the threshold.

Cell type(s) Adipocyte, CD4 T cell, CD8 Cytotoxic T cell, Enteric Glial, Enterocyte, Epithelial, Fibroblast, Lymphatic Endothelial, Mast, Mature B, mRegDC, Myofibroblast, Neuroendocrine, Neutrophil, pDC, Pericytes, Proliferating Immune II, QC_Filtered, SM Stress Response, Smooth Muscle, Tuft, Tumor I, Tumor II, Tumor V, Unknown III (SM) has/have less than the minimum celltype_threshold = 50 with proportion_threshold = 0.1.
To include the above cell type(s), please reduce celltype_threshold or proportion_threshold.

Cell type(s) CAF, Endothelial, Goblet, Macrophage, Plasma, Tumor III, vSM will be analyzed.

Filter out lowly-expressed genes with gene_threshold = 5e-05.

mcubeFilterGenes: 5774 genes pass the threshold.

The platform effects are not provided and need to be estimated from data!

Select highly-expressed genes to analyze for each specific cell type with reference_threshold = 0.5.

mcubeFilterGenesCellType: Select 2511 genes to analyze for CAF.

mcubeFilterGenesCellType: Select 3439 genes to analyze for Endothelial.

mcubeFilterGenesCellType: Select 3593 genes to analyze for Goblet.

mcubeFilterGenesCellType: Select 2941 genes to analyze for Macrophage.

mcubeFilterGenesCellType: Select 1197 genes to analyze for Plasma.

mcubeFilterGenesCellType: Select 4222 genes to analyze for Tumor III.

mcubeFilterGenesCellType: Select 2740 genes to analyze for vSM.

Preprocessed data description: 4269 spots and 32 cell types in total. 4244 spots, 5774 genes, and 7 cell type(s) to analyze.

Number of physical cores: 72.

Number of workers: 70.

Number of thread(s) on BLAS per worker: 1.

mcubeKernel: length_scale is set as 0.100936633966367 for the Gaussian kernel.

mcubeKernel: length_scale is set as 0.142745956695525 for the Gaussian kernel.

mcubeKernel: length_scale is set as 0.100936633966367 for the Gaussian_transformed kernel.

mcubeKernel: length_scale is set as 0.142745956695525 for the Gaussian_transformed kernel.

Number of physical cores: 72.

Number of workers: 70.

Number of thread(s) on BLAS per worker: 1.

[ ]: