Applying RCTD and MCube to the 10x Visium CRC dataset
[1]:
set.seed(20250502)
library(Matrix)
library(ggplot2)
library(spacexr)
library(MCube)
[2]:
RAW_DATA_PATH <- "/import/home/share/zw/data/CRC"
DATA_PATH <- "/import/home/share/zw/pql/data/CRC"
RESULT_PATH <- "/import/home/share/zw/pql/results/CRC/Visium"
if (!dir.exists(file.path(RESULT_PATH))) {
dir.create(file.path(RESULT_PATH), recursive = TRUE)
}
Cell type deconvolution using RCTD
[3]:
library(Seurat)
FlexRef <- Read10X_h5(file.path(
RAW_DATA_PATH, "sc",
"HumanColonCancer_Flex_Multiplex_count_filtered_feature_bc_matrix.h5"
))
# MetaData <- readRDS(file.path(
# RAW_DATA_PATH, "sc", "FlexSeuratV5_MetaData.rds"
# )) # See FlexSingleCell.R if not generated.
meta <- read.csv(file.path(
RAW_DATA_PATH, "HumanColonCancer_VisiumHD/MetaData/SingleCell_MetaData.csv.gz"
))
KpIdents <- names(which(table(meta$Level2) > 25))
meta <- meta[meta$Level2 %in% KpIdents, ]
FlexRef <- FlexRef[, meta$Barcode]
CTRef <- meta$Level2
CTRef <- gsub("/", "_", CTRef)
CTRef <- as.factor(CTRef)
names(CTRef) <- meta$Barcode
reference <- Reference(FlexRef, CTRef, colSums(FlexRef))
Loading required package: SeuratObject
Loading required package: sp
Attaching package: ‘SeuratObject’
The following objects are masked from ‘package:base’:
intersect, t
Warning message in check_UMI(nUMI, "Reference", require_2d = T, require_int = require_int, :
“Reference: some nUMI values are less than min_UMI = 100, and these cells will be removed. Optionally, you may lower the min_UMI parameter.”
Warning message in Reference(FlexRef, CTRef, colSums(FlexRef)):
“Reference: number of cells per cell type is 34844, larger than maximum allowable of 10000. Downsampling number of cells to: 10000”
[4]:
counts <- Read10X_h5(file.path(
RAW_DATA_PATH, "visium",
"Visium_V2_Human_Colon_Cancer_P2_filtered_feature_bc_matrix.h5"
))
coordinates <- as.data.frame(readr::read_csv(file.path(
RAW_DATA_PATH, "visium",
"spatial/tissue_positions.csv"
)))
rownames(coordinates)<-coordinates$barcode
coordinates<-coordinates[colnames(counts), ]
coordinates <- coordinates[, c(6, 5)]
puck <- SpatialRNA(coordinates, counts, colSums(counts))
Rows: 14336 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): barcode
dbl (5): in_tissue, array_row, array_col, pxl_row_in_fullres, pxl_col_in_ful...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
[5]:
myRCTD <- create.RCTD(puck, reference, max_cores = 12)
myRCTD <- run.RCTD(myRCTD, doublet_mode = "full")
saveRDS(myRCTD, file.path(RESULT_PATH, "myRCTD.rds"))
# myRCTD <- readRDS(file.path(RESULT_PATH, "myRCTD.rds"))
Begin: process_cell_type_info
process_cell_type_info: number of cells in reference: 186667
process_cell_type_info: number of genes in reference: 18082
Adipocyte CAF CD4 T cell
83 10000 10000
CD8 Cytotoxic T cell Endothelial Enteric Glial
10000 6940 3680
Enterocyte Epithelial Fibroblast
1739 7432 10000
Goblet Lymphatic Endothelial Macrophage
10000 1095 10000
Mast Mature B mRegDC
1925 8318 404
Myofibroblast Neuroendocrine Neutrophil
972 529 3138
pDC Pericytes Plasma
119 3046 10000
Proliferating Immune II QC_Filtered SM Stress Response
2560 10000 10000
Smooth Muscle Tuft Tumor I
10000 101 10000
Tumor II Tumor III Tumor V
8292 10000 7893
Unknown III (SM) vSM
1784 6617
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.3 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.3 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.3 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.0 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.3 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.3 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.3 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.1 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.3 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.3 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.3 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.3 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.3 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.1 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.3 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 1.1 GiB”
End: process_cell_type_info
create.RCTD: getting regression differentially expressed genes:
get_de_genes: Adipocyte found DE genes: 360
get_de_genes: CAF found DE genes: 217
get_de_genes: CD4 T cell found DE genes: 387
get_de_genes: CD8 Cytotoxic T cell found DE genes: 441
get_de_genes: Endothelial found DE genes: 335
get_de_genes: Enteric Glial found DE genes: 329
get_de_genes: Enterocyte found DE genes: 203
get_de_genes: Epithelial found DE genes: 71
get_de_genes: Fibroblast found DE genes: 287
get_de_genes: Goblet found DE genes: 311
get_de_genes: Lymphatic Endothelial found DE genes: 370
get_de_genes: Macrophage found DE genes: 355
get_de_genes: Mast found DE genes: 351
get_de_genes: Mature B found DE genes: 415
get_de_genes: mRegDC found DE genes: 463
get_de_genes: Myofibroblast found DE genes: 348
get_de_genes: Neuroendocrine found DE genes: 178
get_de_genes: Neutrophil found DE genes: 406
get_de_genes: pDC found DE genes: 494
get_de_genes: Pericytes found DE genes: 311
get_de_genes: Plasma found DE genes: 108
get_de_genes: Proliferating Immune II found DE genes: 311
get_de_genes: QC_Filtered found DE genes: 55
get_de_genes: SM Stress Response found DE genes: 400
get_de_genes: Smooth Muscle found DE genes: 469
get_de_genes: Tuft found DE genes: 392
get_de_genes: Tumor I found DE genes: 185
get_de_genes: Tumor II found DE genes: 245
get_de_genes: Tumor III found DE genes: 355
get_de_genes: Tumor V found DE genes: 392
get_de_genes: Unknown III (SM) found DE genes: 350
get_de_genes: vSM found DE genes: 259
get_de_genes: total DE genes: 4567
create.RCTD: getting platform effect normalization differentially expressed genes:
get_de_genes: Adipocyte found DE genes: 720
get_de_genes: CAF found DE genes: 375
get_de_genes: CD4 T cell found DE genes: 867
get_de_genes: CD8 Cytotoxic T cell found DE genes: 923
get_de_genes: Endothelial found DE genes: 654
get_de_genes: Enteric Glial found DE genes: 714
get_de_genes: Enterocyte found DE genes: 423
get_de_genes: Epithelial found DE genes: 193
get_de_genes: Fibroblast found DE genes: 549
get_de_genes: Goblet found DE genes: 688
get_de_genes: Lymphatic Endothelial found DE genes: 758
get_de_genes: Macrophage found DE genes: 673
get_de_genes: Mast found DE genes: 706
get_de_genes: Mature B found DE genes: 972
get_de_genes: mRegDC found DE genes: 821
get_de_genes: Myofibroblast found DE genes: 667
get_de_genes: Neuroendocrine found DE genes: 452
get_de_genes: Neutrophil found DE genes: 685
get_de_genes: pDC found DE genes: 1003
get_de_genes: Pericytes found DE genes: 587
get_de_genes: Plasma found DE genes: 190
get_de_genes: Proliferating Immune II found DE genes: 788
get_de_genes: QC_Filtered found DE genes: 187
get_de_genes: SM Stress Response found DE genes: 810
get_de_genes: Smooth Muscle found DE genes: 965
get_de_genes: Tuft found DE genes: 906
get_de_genes: Tumor I found DE genes: 561
get_de_genes: Tumor II found DE genes: 643
get_de_genes: Tumor III found DE genes: 983
get_de_genes: Tumor V found DE genes: 957
get_de_genes: Unknown III (SM) found DE genes: 851
get_de_genes: vSM found DE genes: 499
get_de_genes: total DE genes: 7329
fitBulk: decomposing bulk
chooseSigma: using initial Q_mat with sigma = 1
Likelihood value: 10788253.6984735
Sigma value: 0.84
Likelihood value: 10489224.530827
Sigma value: 0.69
Likelihood value: 10212884.670123
Sigma value: 0.61
Likelihood value: 10073157.488242
Sigma value: 0.53
Likelihood value: 9944652.72074451
Sigma value: 0.45
Likelihood value: 9834772.22367206
Sigma value: 0.37
Likelihood value: 9754837.1924562
Sigma value: 0.29
Likelihood value: 9722265.94164512
Sigma value: 0.29
Cell-type-specific SVG identification using MCube
[6]:
myRCTD <- readRDS(file.path(RESULT_PATH, "myRCTD.rds"))
spot_names <- colnames(myRCTD@spatialRNA@counts)
weights_RCTD <- as.matrix(myRCTD@results$weights)
proportions_RCTD <- weights_RCTD / rowSums(weights_RCTD)
spot_effects_RCTD <- log(rowSums(weights_RCTD))
names(spot_effects_RCTD) <- rownames(weights_RCTD)
[7]:
mcube_object <- createMCube(
counts = t(as.matrix(myRCTD@originalSpatialRNA@counts[, spot_names])),
coordinates = as.matrix(myRCTD@spatialRNA@coords),
proportions = weights_RCTD / rowSums(weights_RCTD),
library_sizes = myRCTD@spatialRNA@nUMI,
reference = t(myRCTD@cell_type_info$info[[1]]),
used_for_deconvolution = rownames(myRCTD@spatialRNA@counts),
spot_effects = spot_effects_RCTD,
celltype_threshold = 50
)
mcube_object <- mcubeFitNull(
mcube_object,
num_workers = 70, num_threads = 1
)
mcube_object <- mcubeTest(
mcube_object,
num_workers = 70, num_threads = 1, shared_memory = TRUE
)
saveRDS(
mcube_object,
file = file.path(
RESULT_PATH,
paste0("mcube", ".rds")
)
)
The batch_id is not provided!
All spots are assumed to be from the same batch and share the same gene platform effects.
Select high-abundance cell types to analyze with proportion_threshold = 0.1 and celltype_threshold = 50.
mcubeFilterCellTypes: Cell type(s) CAF, Endothelial, Goblet, Macrophage, Plasma, Tumor III, vSM pass the threshold.
Cell type(s) Adipocyte, CD4 T cell, CD8 Cytotoxic T cell, Enteric Glial, Enterocyte, Epithelial, Fibroblast, Lymphatic Endothelial, Mast, Mature B, mRegDC, Myofibroblast, Neuroendocrine, Neutrophil, pDC, Pericytes, Proliferating Immune II, QC_Filtered, SM Stress Response, Smooth Muscle, Tuft, Tumor I, Tumor II, Tumor V, Unknown III (SM) has/have less than the minimum celltype_threshold = 50 with proportion_threshold = 0.1.
To include the above cell type(s), please reduce celltype_threshold or proportion_threshold.
Cell type(s) CAF, Endothelial, Goblet, Macrophage, Plasma, Tumor III, vSM will be analyzed.
Filter out lowly-expressed genes with gene_threshold = 5e-05.
mcubeFilterGenes: 5774 genes pass the threshold.
The platform effects are not provided and need to be estimated from data!
Select highly-expressed genes to analyze for each specific cell type with reference_threshold = 0.5.
mcubeFilterGenesCellType: Select 2511 genes to analyze for CAF.
mcubeFilterGenesCellType: Select 3439 genes to analyze for Endothelial.
mcubeFilterGenesCellType: Select 3593 genes to analyze for Goblet.
mcubeFilterGenesCellType: Select 2941 genes to analyze for Macrophage.
mcubeFilterGenesCellType: Select 1197 genes to analyze for Plasma.
mcubeFilterGenesCellType: Select 4222 genes to analyze for Tumor III.
mcubeFilterGenesCellType: Select 2740 genes to analyze for vSM.
Preprocessed data description: 4269 spots and 32 cell types in total. 4244 spots, 5774 genes, and 7 cell type(s) to analyze.
Number of physical cores: 72.
Number of workers: 70.
Number of thread(s) on BLAS per worker: 1.
mcubeKernel: length_scale is set as 0.100936633966367 for the Gaussian kernel.
mcubeKernel: length_scale is set as 0.142745956695525 for the Gaussian kernel.
mcubeKernel: length_scale is set as 0.100936633966367 for the Gaussian_transformed kernel.
mcubeKernel: length_scale is set as 0.142745956695525 for the Gaussian_transformed kernel.
Number of physical cores: 72.
Number of workers: 70.
Number of thread(s) on BLAS per worker: 1.
[ ]: