{ "cells": [ { "cell_type": "markdown", "id": "15e71d29", "metadata": {}, "source": [ "# Applying `RCTD` and `MCube` to the 10x Xenium CRC dataset" ] }, { "cell_type": "code", "execution_count": 1, "id": "833e8ca8-b82a-46ca-9df1-3c7c0723dad8", "metadata": { "vscode": { "languageId": "r" } }, "outputs": [], "source": [ "set.seed(20250502)\n", "\n", "library(Matrix)\n", "library(ggplot2)\n", "\n", "library(spacexr)\n", "library(MCube)" ] }, { "cell_type": "code", "execution_count": 2, "id": "82892b33", "metadata": { "vscode": { "languageId": "r" } }, "outputs": [], "source": [ "RAW_DATA_PATH <- \"/import/home/share/zw/data/CRC\"\n", "DATA_PATH <- \"/import/home/share/zw/pql/data/CRC\"\n", "RESULT_PATH <- \"/import/home/share/zw/pql/results/CRC\"\n", "\n", "if (!dir.exists(file.path(RESULT_PATH, \"Xenium\"))) {\n", " dir.create(file.path(RESULT_PATH, \"Xenium\"), recursive = TRUE)\n", "}" ] }, { "cell_type": "markdown", "id": "27ef0796", "metadata": {}, "source": [ "## Cell type deconvolution using `RCTD`" ] }, { "cell_type": "code", "execution_count": 3, "id": "88129183", "metadata": { "vscode": { "languageId": "r" } }, "outputs": [], "source": [ "# library(Seurat)\n", "\n", "# FlexRef <- Read10X_h5(file.path(\n", "# RAW_DATA_PATH, \"sc\", \"HumanColonCancer_Flex_Multiplex_count_filtered_feature_bc_matrix.h5\"\n", "# ))\n", "# # MetaData <- readRDS(file.path(\n", "# # RAW_DATA_PATH, \"sc\", \"FlexSeuratV5_MetaData.rds\"\n", "# # )) # See FlexSingleCell.R if not generated.\n", "\n", "# meta <- read.csv(file.path(\n", "# RAW_DATA_PATH, \"HumanColonCancer_VisiumHD/MetaData/SingleCell_MetaData.csv.gz\"\n", "# ))\n", "\n", "# KpIdents <- names(which(table(meta$Level2) > 25))\n", "# meta <- meta[meta$Level2 %in% KpIdents, ]\n", "# FlexRef <- FlexRef[, meta$Barcode]\n", "\n", "# CTRef <- meta$Level2\n", "# CTRef <- gsub(\"/\", \"_\", CTRef)\n", "# CTRef <- as.factor(CTRef)\n", "# names(CTRef) <- meta$Barcode\n", "\n", "# reference <- Reference(FlexRef, CTRef, 
colSums(FlexRef))" ] }, { "cell_type": "code", "execution_count": 4, "id": "fbdbcbe3", "metadata": { "vscode": { "languageId": "r" } }, "outputs": [], "source": [ "# counts <- as.data.frame(readr::read_csv(\n", "# file.path(DATA_PATH, \"Xenium\", \"xenium_p2_counts.csv\")\n", "# ))\n", "# rownames(counts) <- counts[, 1]\n", "# counts[, 1] <- NULL\n", "# # head(counts)\n", "\n", "# coordinates <- as.data.frame(readr::read_csv(\n", "# file.path(DATA_PATH, \"Xenium\", \"xenium_p2_coordinates.csv\")\n", "# ))\n", "# rownames(coordinates) <- coordinates[, 1]\n", "# coordinates[, 1] <- NULL\n", "# head(coordinates)\n", "# coordinates$x <- sum(range(coordinates$x)) - coordinates$x\n", "# # head(coordinates)\n", "\n", "# nUMI <- rowSums(counts)\n", "\n", "# puck <- SpatialRNA(coordinates, t(counts), nUMI)\n", "\n", "# myRCTD_xenium <- create.RCTD(puck, reference, max_cores = 8)\n", "# myRCTD_xenium <- run.RCTD(myRCTD_xenium, doublet_mode = \"doublet\")\n", "\n", "# saveRDS(\n", "# myRCTD_xenium,\n", "# file = file.path(\n", "# RESULT_PATH, \"Xenium\", \"myRCTD.rds\"\n", "# )\n", "# )" ] }, { "cell_type": "markdown", "id": "28cf38f8", "metadata": {}, "source": [ "## Cell-type-specific SVG identification using `MCube`" ] }, { "cell_type": "markdown", "id": "cdb454dc", "metadata": {}, "source": [ "Because the Xenium data are high-resolution, for each cell type of interest we restrict the analysis to bins that the `RCTD` results (doublet mode) confirm contain that cell type." 
] }, { "cell_type": "code", "execution_count": 5, "id": "8978ca00", "metadata": { "vscode": { "languageId": "r" } }, "outputs": [], "source": [ "myRCTD <- readRDS(file.path(RESULT_PATH, \"Xenium\", \"myRCTD.rds\"))\n", "weights_RCTD <- as.matrix(myRCTD@results$weights)\n", "proportions_RCTD <- weights_RCTD / rowSums(weights_RCTD)\n", "spot_effects_RCTD <- log(rowSums(weights_RCTD))\n", "names(spot_effects_RCTD) <- rownames(weights_RCTD)\n", "doublet_results_RCTD <- myRCTD@results$results_df" ] }, { "cell_type": "code", "execution_count": 6, "id": "2cf32693", "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "The batch_id is not provided!\n", "All spots are assumed to be from the same batch and share the same gene platform effects.\n", "\n", "Select high-abundance cell types to analyze with proportion_threshold = 0.01 and celltype_threshold = 100.\n", "\n", "mcubeFilterCellTypes: Cell type(s) CAF, CD4 T cell, CD8 Cytotoxic T cell, Endothelial, Enteric Glial, Fibroblast, Lymphatic Endothelial, Macrophage, Myofibroblast, Pericytes, Plasma, Proliferating Immune II, SM Stress Response, Smooth Muscle, Tumor III, Unknown III (SM), vSM pass the threshold.\n", "\n", "Cell type(s) CAF will be analyzed.\n", "\n", "Filter out lowly-expressed genes with gene_threshold = 5e-05.\n", "\n", "mcubeFilterGenes: 367 genes pass the threshold.\n", "\n", "The platform effects are not provided and need to be estimated from data!\n", "\n", "Select highly-expressed genes to analyze for each specific cell type with reference_threshold = 0.5.\n", "\n", "mcubeFilterGenesCellType: Select 52 genes to analyze for CAF.\n", "\n", "Preprocessed data description: 10000 spots and 32 cell types in total. 
10000 spots, 52 genes, and 1 cell type(s) to analyze.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "mcubeKernel: length_scale is set as 0.0256977473470701 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0363421028206638 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0256977473470701 for the Gaussian_transformed kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0363421028206638 for the Gaussian_transformed kernel.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "The batch_id is not provided!\n", "All spots are assumed to be from the same batch and share the same gene platform effects.\n", "\n", "Select high-abundance cell types to analyze with proportion_threshold = 0.01 and celltype_threshold = 100.\n", "\n", "mcubeFilterCellTypes: Cell type(s) CAF, CD4 T cell, Fibroblast, Macrophage, Myofibroblast, Pericytes, Plasma, Proliferating Immune II, Tumor III, Unknown III (SM) pass the threshold.\n", "\n", "Cell type(s) CD4 T cell will be analyzed.\n", "\n", "Filter out lowly-expressed genes with gene_threshold = 5e-05.\n", "\n", "mcubeFilterGenes: 392 genes pass the threshold.\n", "\n", "The platform effects are not provided and need to be estimated from data!\n", "\n", "Select highly-expressed genes to analyze for each specific cell type with reference_threshold = 0.5.\n", "\n", "mcubeFilterGenesCellType: Select 87 genes to analyze for CD4 T cell.\n", "\n", "Preprocessed data description: 4709 spots and 32 cell types in total. 
4709 spots, 87 genes, and 1 cell type(s) to analyze.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "mcubeKernel: length_scale is set as 0.0218096971930091 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0308435695616038 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0218096971930091 for the Gaussian_transformed kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0308435695616038 for the Gaussian_transformed kernel.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "The batch_id is not provided!\n", "All spots are assumed to be from the same batch and share the same gene platform effects.\n", "\n", "Select high-abundance cell types to analyze with proportion_threshold = 0.01 and celltype_threshold = 100.\n", "\n", "mcubeFilterCellTypes: Cell type(s) CAF, CD8 Cytotoxic T cell, Macrophage, Tumor III, Unknown III (SM), vSM pass the threshold.\n", "\n", "Cell type(s) CD8 Cytotoxic T cell will be analyzed.\n", "\n", "Filter out lowly-expressed genes with gene_threshold = 5e-05.\n", "\n", "mcubeFilterGenes: 398 genes pass the threshold.\n", "\n", "The platform effects are not provided and need to be estimated from data!\n", "\n", "Select highly-expressed genes to analyze for each specific cell type with reference_threshold = 0.5.\n", "\n", "mcubeFilterGenesCellType: Select 82 genes to analyze for CD8 Cytotoxic T cell.\n", "\n", "Preprocessed data description: 2194 spots and 32 cell types in total. 
2194 spots, 82 genes, and 1 cell type(s) to analyze.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "mcubeKernel: length_scale is set as 0.0376170009946775 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0531984729824751 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0376170009946775 for the Gaussian_transformed kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0531984729824751 for the Gaussian_transformed kernel.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "The batch_id is not provided!\n", "All spots are assumed to be from the same batch and share the same gene platform effects.\n", "\n", "Select high-abundance cell types to analyze with proportion_threshold = 0.01 and celltype_threshold = 100.\n", "\n", "mcubeFilterCellTypes: Cell type(s) CAF, CD4 T cell, Endothelial, Lymphatic Endothelial, Macrophage, Myofibroblast, Pericytes, Plasma, Proliferating Immune II, Tumor III, Unknown III (SM) pass the threshold.\n", "\n", "Cell type(s) Endothelial will be analyzed.\n", "\n", "Filter out lowly-expressed genes with gene_threshold = 5e-05.\n", "\n", "mcubeFilterGenes: 387 genes pass the threshold.\n", "\n", "The platform effects are not provided and need to be estimated from data!\n", "\n", "Select highly-expressed genes to analyze for each specific cell type with reference_threshold = 0.5.\n", "\n", "mcubeFilterGenesCellType: Select 62 genes to analyze for Endothelial.\n", "\n", "Preprocessed data description: 7473 spots and 32 cell types in total. 
7473 spots, 62 genes, and 1 cell type(s) to analyze.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "mcubeKernel: length_scale is set as 0.0131275601296658 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0185651735762417 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0131275601296658 for the Gaussian_transformed kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0185651735762417 for the Gaussian_transformed kernel.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "The batch_id is not provided!\n", "All spots are assumed to be from the same batch and share the same gene platform effects.\n", "\n", "Select high-abundance cell types to analyze with proportion_threshold = 0.01 and celltype_threshold = 100.\n", "\n", "mcubeFilterCellTypes: Cell type(s) Enteric Glial pass the threshold.\n", "\n", "Cell type(s) Enteric Glial will be analyzed.\n", "\n", "Filter out lowly-expressed genes with gene_threshold = 5e-05.\n", "\n", "mcubeFilterGenes: 391 genes pass the threshold.\n", "\n", "The platform effects are not provided and need to be estimated from data!\n", "\n", "Select highly-expressed genes to analyze for each specific cell type with reference_threshold = 0.5.\n", "\n", "mcubeFilterGenesCellType: Select 113 genes to analyze for Enteric Glial.\n", "\n", "Preprocessed data description: 193 spots and 32 cell types in total. 
193 spots, 113 genes, and 1 cell type(s) to analyze.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "mcubeKernel: length_scale is set as 0.0221289502402945 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0312950615509038 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0221289502402945 for the Gaussian_transformed kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0312950615509038 for the Gaussian_transformed kernel.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "The batch_id is not provided!\n", "All spots are assumed to be from the same batch and share the same gene platform effects.\n", "\n", "Select high-abundance cell types to analyze with proportion_threshold = 0.01 and celltype_threshold = 100.\n", "\n", "mcubeFilterCellTypes: Cell type(s) Enterocyte pass the threshold.\n", "\n", "Cell type(s) Enterocyte will be analyzed.\n", "\n", "Filter out lowly-expressed genes with gene_threshold = 5e-05.\n", "\n", "mcubeFilterGenes: 382 genes pass the threshold.\n", "\n", "The platform effects are not provided and need to be estimated from data!\n", "\n", "Select highly-expressed genes to analyze for each specific cell type with reference_threshold = 0.5.\n", "\n", "mcubeFilterGenesCellType: Select 304 genes to analyze for Enterocyte.\n", "\n", "Preprocessed data description: 1318 spots and 32 cell types in total. 
1318 spots, 304 genes, and 1 cell type(s) to analyze.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "mcubeKernel: length_scale is set as 0.0391229340076264 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0553281838734128 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0391229340076264 for the Gaussian_transformed kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0553281838734128 for the Gaussian_transformed kernel.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "The batch_id is not provided!\n", "All spots are assumed to be from the same batch and share the same gene platform effects.\n", "\n", "Select high-abundance cell types to analyze with proportion_threshold = 0.01 and celltype_threshold = 100.\n", "\n", "mcubeFilterCellTypes: Cell type(s) Fibroblast, Plasma pass the threshold.\n", "\n", "Cell type(s) Fibroblast will be analyzed.\n", "\n", "Filter out lowly-expressed genes with gene_threshold = 5e-05.\n", "\n", "mcubeFilterGenes: 394 genes pass the threshold.\n", "\n", "The platform effects are not provided and need to be estimated from data!\n", "\n", "Select highly-expressed genes to analyze for each specific cell type with reference_threshold = 0.5.\n", "\n", "mcubeFilterGenesCellType: Select 223 genes to analyze for Fibroblast.\n", "\n", "Preprocessed data description: 1173 spots and 32 cell types in total. 
1173 spots, 223 genes, and 1 cell type(s) to analyze.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "mcubeKernel: length_scale is set as 0.0434392838127099 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0614324243077083 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0434392838127099 for the Gaussian_transformed kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0614324243077083 for the Gaussian_transformed kernel.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "The batch_id is not provided!\n", "All spots are assumed to be from the same batch and share the same gene platform effects.\n", "\n", "Select high-abundance cell types to analyze with proportion_threshold = 0.01 and celltype_threshold = 100.\n", "\n", "mcubeFilterCellTypes: Cell type(s) Goblet, Plasma, Tumor I, Tumor II, Tumor III, Tumor V pass the threshold.\n", "\n", "Cell type(s) Goblet will be analyzed.\n", "\n", "Filter out lowly-expressed genes with gene_threshold = 5e-05.\n", "\n", "mcubeFilterGenes: 389 genes pass the threshold.\n", "\n", "The platform effects are not provided and need to be estimated from data!\n", "\n", "Select highly-expressed genes to analyze for each specific cell type with reference_threshold = 0.5.\n", "\n", "mcubeFilterGenesCellType: Select 168 genes to analyze for Goblet.\n", "\n", "Preprocessed data description: 10000 spots and 32 cell types in total. 
10000 spots, 168 genes, and 1 cell type(s) to analyze.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "mcubeKernel: length_scale is set as 0.0215845400740981 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0305251493103751 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0215845400740981 for the Gaussian_transformed kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0305251493103751 for the Gaussian_transformed kernel.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "The batch_id is not provided!\n", "All spots are assumed to be from the same batch and share the same gene platform effects.\n", "\n", "Select high-abundance cell types to analyze with proportion_threshold = 0.01 and celltype_threshold = 100.\n", "\n", "mcubeFilterCellTypes: Cell type(s) Lymphatic Endothelial pass the threshold.\n", "\n", "Cell type(s) Lymphatic Endothelial will be analyzed.\n", "\n", "Filter out lowly-expressed genes with gene_threshold = 5e-05.\n", "\n", "mcubeFilterGenes: 387 genes pass the threshold.\n", "\n", "The platform effects are not provided and need to be estimated from data!\n", "\n", "Select highly-expressed genes to analyze for each specific cell type with reference_threshold = 0.5.\n", "\n", "mcubeFilterGenesCellType: Select 170 genes to analyze for Lymphatic Endothelial.\n", "\n", "Preprocessed data description: 575 spots and 32 cell types in total. 
575 spots, 170 genes, and 1 cell type(s) to analyze.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "mcubeKernel: length_scale is set as 0.0252926114667311 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0357691541640844 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0252926114667311 for the Gaussian_transformed kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0357691541640844 for the Gaussian_transformed kernel.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "The batch_id is not provided!\n", "All spots are assumed to be from the same batch and share the same gene platform effects.\n", "\n", "Select high-abundance cell types to analyze with proportion_threshold = 0.01 and celltype_threshold = 100.\n", "\n", "mcubeFilterCellTypes: Cell type(s) CAF, CD4 T cell, CD8 Cytotoxic T cell, Endothelial, Enteric Glial, Fibroblast, Lymphatic Endothelial, Macrophage, mRegDC, Myofibroblast, Neutrophil, pDC, Pericytes, Plasma, Proliferating Immune II, Smooth Muscle, Tumor III, Tumor V, Unknown III (SM), vSM pass the threshold.\n", "\n", "Cell type(s) Macrophage will be analyzed.\n", "\n", "Filter out lowly-expressed genes with gene_threshold = 5e-05.\n", "\n", "mcubeFilterGenes: 384 genes pass the threshold.\n", "\n", "The platform effects are not provided and need to be estimated from data!\n", "\n", "Select highly-expressed genes to analyze for each specific cell type with reference_threshold = 0.5.\n", "\n", "mcubeFilterGenesCellType: Select 42 genes to analyze for Macrophage.\n", "\n", "Preprocessed data description: 10000 spots and 32 cell types in total. 
10000 spots, 42 genes, and 1 cell type(s) to analyze.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "mcubeKernel: length_scale is set as 0.0127138497858166 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0179800987970762 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0127138497858166 for the Gaussian_transformed kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0179800987970762 for the Gaussian_transformed kernel.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "The batch_id is not provided!\n", "All spots are assumed to be from the same batch and share the same gene platform effects.\n", "\n", "Select high-abundance cell types to analyze with proportion_threshold = 0.01 and celltype_threshold = 100.\n", "\n", "mcubeFilterCellTypes: Cell type(s) Myofibroblast, Tumor III pass the threshold.\n", "\n", "Cell type(s) Myofibroblast will be analyzed.\n", "\n", "Filter out lowly-expressed genes with gene_threshold = 5e-05.\n", "\n", "mcubeFilterGenes: 389 genes pass the threshold.\n", "\n", "The platform effects are not provided and need to be estimated from data!\n", "\n", "Select highly-expressed genes to analyze for each specific cell type with reference_threshold = 0.5.\n", "\n", "mcubeFilterGenesCellType: Select 108 genes to analyze for Myofibroblast.\n", "\n", "Preprocessed data description: 836 spots and 32 cell types in total. 
836 spots, 108 genes, and 1 cell type(s) to analyze.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "mcubeKernel: length_scale is set as 0.0424814860461052 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0600778937161653 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0424814860461052 for the Gaussian_transformed kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0600778937161653 for the Gaussian_transformed kernel.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "The batch_id is not provided!\n", "All spots are assumed to be from the same batch and share the same gene platform effects.\n", "\n", "Select high-abundance cell types to analyze with proportion_threshold = 0.01 and celltype_threshold = 100.\n", "\n", "mcubeFilterCellTypes: Cell type(s) Goblet, Neuroendocrine pass the threshold.\n", "\n", "Cell type(s) Neuroendocrine will be analyzed.\n", "\n", "Filter out lowly-expressed genes with gene_threshold = 5e-05.\n", "\n", "mcubeFilterGenes: 402 genes pass the threshold.\n", "\n", "The platform effects are not provided and need to be estimated from data!\n", "\n", "Select highly-expressed genes to analyze for each specific cell type with reference_threshold = 0.5.\n", "\n", "mcubeFilterGenesCellType: Select 124 genes to analyze for Neuroendocrine.\n", "\n", "Preprocessed data description: 578 spots and 32 cell types in total. 
578 spots, 124 genes, and 1 cell type(s) to analyze.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "mcubeKernel: length_scale is set as 0.0190330250810503 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0269167622026086 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0190330250810503 for the Gaussian_transformed kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0269167622026086 for the Gaussian_transformed kernel.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "The batch_id is not provided!\n", "All spots are assumed to be from the same batch and share the same gene platform effects.\n", "\n", "Select high-abundance cell types to analyze with proportion_threshold = 0.01 and celltype_threshold = 100.\n", "\n", "mcubeFilterCellTypes: Cell type(s) Macrophage, Neutrophil, Tumor III pass the threshold.\n", "\n", "Cell type(s) Neutrophil will be analyzed.\n", "\n", "Filter out lowly-expressed genes with gene_threshold = 5e-05.\n", "\n", "mcubeFilterGenes: 399 genes pass the threshold.\n", "\n", "The platform effects are not provided and need to be estimated from data!\n", "\n", "Select highly-expressed genes to analyze for each specific cell type with reference_threshold = 0.5.\n", "\n", "mcubeFilterGenesCellType: Select 157 genes to analyze for Neutrophil.\n", "\n", "Preprocessed data description: 1713 spots and 32 cell types in total. 
1713 spots, 157 genes, and 1 cell type(s) to analyze.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "mcubeKernel: length_scale is set as 0.0281073319876069 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0397497700989968 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0281073319876069 for the Gaussian_transformed kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0397497700989968 for the Gaussian_transformed kernel.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "The batch_id is not provided!\n", "All spots are assumed to be from the same batch and share the same gene platform effects.\n", "\n", "Select high-abundance cell types to analyze with proportion_threshold = 0.01 and celltype_threshold = 100.\n", "\n", "mcubeFilterCellTypes: Cell type(s) Endothelial, Macrophage, Myofibroblast, Pericytes, Tumor III pass the threshold.\n", "\n", "Cell type(s) Pericytes will be analyzed.\n", "\n", "Filter out lowly-expressed genes with gene_threshold = 5e-05.\n", "\n", "mcubeFilterGenes: 380 genes pass the threshold.\n", "\n", "The platform effects are not provided and need to be estimated from data!\n", "\n", "Select highly-expressed genes to analyze for each specific cell type with reference_threshold = 0.5.\n", "\n", "mcubeFilterGenesCellType: Select 95 genes to analyze for Pericytes.\n", "\n", "Preprocessed data description: 2832 spots and 32 cell types in total. 
2832 spots, 95 genes, and 1 cell type(s) to analyze.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "mcubeKernel: length_scale is set as 0.0181779695557007 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0257075310820772 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0181779695557007 for the Gaussian_transformed kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0257075310820772 for the Gaussian_transformed kernel.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "The batch_id is not provided!\n", "All spots are assumed to be from the same batch and share the same gene platform effects.\n", "\n", "Select high-abundance cell types to analyze with proportion_threshold = 0.01 and celltype_threshold = 100.\n", "\n", "mcubeFilterCellTypes: Cell type(s) CAF, Fibroblast, Macrophage, Plasma pass the threshold.\n", "\n", "Cell type(s) Plasma will be analyzed.\n", "\n", "Filter out lowly-expressed genes with gene_threshold = 5e-05.\n", "\n", "mcubeFilterGenes: 397 genes pass the threshold.\n", "\n", "The platform effects are not provided and need to be estimated from data!\n", "\n", "Select highly-expressed genes to analyze for each specific cell type with reference_threshold = 0.5.\n", "\n", "mcubeFilterGenesCellType: Select 84 genes to analyze for Plasma.\n", "\n", "Preprocessed data description: 3261 spots and 32 cell types in total. 
3261 spots, 84 genes, and 1 cell type(s) to analyze.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "mcubeKernel: length_scale is set as 0.0178435573652251 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0252346008268836 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0178435573652251 for the Gaussian_transformed kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0252346008268836 for the Gaussian_transformed kernel.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "The batch_id is not provided!\n", "All spots are assumed to be from the same batch and share the same gene platform effects.\n", "\n", "Select high-abundance cell types to analyze with proportion_threshold = 0.01 and celltype_threshold = 100.\n", "\n", "mcubeFilterCellTypes: Cell type(s) CAF, CD4 T cell, Macrophage, Plasma, Proliferating Immune II pass the threshold.\n", "\n", "Cell type(s) Proliferating Immune II will be analyzed.\n", "\n", "Filter out lowly-expressed genes with gene_threshold = 5e-05.\n", "\n", "mcubeFilterGenes: 390 genes pass the threshold.\n", "\n", "The platform effects are not provided and need to be estimated from data!\n", "\n", "Select highly-expressed genes to analyze for each specific cell type with reference_threshold = 0.5.\n", "\n", "mcubeFilterGenesCellType: Select 81 genes to analyze for Proliferating Immune II.\n", "\n", "Preprocessed data description: 2368 spots and 32 cell types in total. 
2368 spots, 81 genes, and 1 cell type(s) to analyze.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "mcubeKernel: length_scale is set as 0.014175889673722 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0200477354352823 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.014175889673722 for the Gaussian_transformed kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0200477354352823 for the Gaussian_transformed kernel.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "The batch_id is not provided!\n", "All spots are assumed to be from the same batch and share the same gene platform effects.\n", "\n", "Select high-abundance cell types to analyze with proportion_threshold = 0.01 and celltype_threshold = 100.\n", "\n", "mcubeFilterCellTypes: Cell type(s) Goblet, Tumor I pass the threshold.\n", "\n", "Cell type(s) Tumor I will be analyzed.\n", "\n", "Filter out lowly-expressed genes with gene_threshold = 5e-05.\n", "\n", "mcubeFilterGenes: 395 genes pass the threshold.\n", "\n", "The platform effects are not provided and need to be estimated from data!\n", "\n", "Select highly-expressed genes to analyze for each specific cell type with reference_threshold = 0.5.\n", "\n", "mcubeFilterGenesCellType: Select 184 genes to analyze for Tumor I.\n", "\n", "Preprocessed data description: 795 spots and 32 cell types in total. 
795 spots, 184 genes, and 1 cell type(s) to analyze.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "mcubeKernel: length_scale is set as 0.0256831607333036 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0363214742336461 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0256831607333036 for the Gaussian_transformed kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0363214742336461 for the Gaussian_transformed kernel.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "The batch_id is not provided!\n", "All spots are assumed to be from the same batch and share the same gene platform effects.\n", "\n", "Select high-abundance cell types to analyze with proportion_threshold = 0.01 and celltype_threshold = 100.\n", "\n", "mcubeFilterCellTypes: Cell type(s) Tumor III, Tumor V pass the threshold.\n", "\n", "Cell type(s) Tumor III will be analyzed.\n", "\n", "Filter out lowly-expressed genes with gene_threshold = 5e-05.\n", "\n", "mcubeFilterGenes: 382 genes pass the threshold.\n", "\n", "The platform effects are not provided and need to be estimated from data!\n", "\n", "Select highly-expressed genes to analyze for each specific cell type with reference_threshold = 0.5.\n", "\n", "mcubeFilterGenesCellType: Select 285 genes to analyze for Tumor III.\n", "\n", "Preprocessed data description: 10000 spots and 32 cell types in total. 
10000 spots, 285 genes, and 1 cell type(s) to analyze.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "mcubeKernel: length_scale is set as 0.0181905065023158 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.025725261002011 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0181905065023158 for the Gaussian_transformed kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.025725261002011 for the Gaussian_transformed kernel.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "The batch_id is not provided!\n", "All spots are assumed to be from the same batch and share the same gene platform effects.\n", "\n", "Select high-abundance cell types to analyze with proportion_threshold = 0.01 and celltype_threshold = 100.\n", "\n", "mcubeFilterCellTypes: Cell type(s) Tumor V pass the threshold.\n", "\n", "Cell type(s) Tumor V will be analyzed.\n", "\n", "Filter out lowly-expressed genes with gene_threshold = 5e-05.\n", "\n", "mcubeFilterGenes: 380 genes pass the threshold.\n", "\n", "The platform effects are not provided and need to be estimated from data!\n", "\n", "Select highly-expressed genes to analyze for each specific cell type with reference_threshold = 0.5.\n", "\n", "mcubeFilterGenesCellType: Select 126 genes to analyze for Tumor V.\n", "\n", "Preprocessed data description: 284 spots and 32 cell types in total. 
284 spots, 126 genes, and 1 cell type(s) to analyze.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "mcubeKernel: length_scale is set as 0.0189446316493579 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0267917550126845 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0189446316493579 for the Gaussian_transformed kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0267917550126845 for the Gaussian_transformed kernel.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "The batch_id is not provided!\n", "All spots are assumed to be from the same batch and share the same gene platform effects.\n", "\n", "Select high-abundance cell types to analyze with proportion_threshold = 0.01 and celltype_threshold = 100.\n", "\n", "mcubeFilterCellTypes: Cell type(s) Unknown III (SM) pass the threshold.\n", "\n", "Cell type(s) Unknown III (SM) will be analyzed.\n", "\n", "Filter out lowly-expressed genes with gene_threshold = 5e-05.\n", "\n", "mcubeFilterGenes: 379 genes pass the threshold.\n", "\n", "The platform effects are not provided and need to be estimated from data!\n", "\n", "Select highly-expressed genes to analyze for each specific cell type with reference_threshold = 0.5.\n", "\n", "mcubeFilterGenesCellType: Select 99 genes to analyze for Unknown III (SM).\n", "\n", "Preprocessed data description: 371 spots and 32 cell types in total. 
371 spots, 99 genes, and 1 cell type(s) to analyze.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "mcubeKernel: length_scale is set as 0.0683202388756594 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0966194084025271 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0683202388756594 for the Gaussian_transformed kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0966194084025271 for the Gaussian_transformed kernel.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "The batch_id is not provided!\n", "All spots are assumed to be from the same batch and share the same gene platform effects.\n", "\n", "Select high-abundance cell types to analyze with proportion_threshold = 0.01 and celltype_threshold = 100.\n", "\n", "mcubeFilterCellTypes: Cell type(s) Macrophage, vSM pass the threshold.\n", "\n", "Cell type(s) vSM will be analyzed.\n", "\n", "Filter out lowly-expressed genes with gene_threshold = 5e-05.\n", "\n", "mcubeFilterGenes: 382 genes pass the threshold.\n", "\n", "The platform effects are not provided and need to be estimated from data!\n", "\n", "Select highly-expressed genes to analyze for each specific cell type with reference_threshold = 0.5.\n", "\n", "mcubeFilterGenesCellType: Select 92 genes to analyze for vSM.\n", "\n", "Preprocessed data description: 802 spots and 32 cell types in total. 
802 spots, 92 genes, and 1 cell type(s) to analyze.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n", "mcubeKernel: length_scale is set as 0.0623277404375215 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0881447358388129 for the Gaussian kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0623277404375215 for the Gaussian_transformed kernel.\n", "\n", "mcubeKernel: length_scale is set as 0.0881447358388129 for the Gaussian_transformed kernel.\n", "\n", "Number of physical cores: 72.\n", "\n", "Number of workers: 35.\n", "\n", "Number of thread(s) on BLAS per worker: 2.\n", "\n" ] } ], "source": [ "# Cap on spots per cell type, and minimum summed proportion required to analyze a cell type.\n", "sample_size_max <- 10000\n", "celltype_threshold <- 100\n", "for (celltype in colnames(proportions_RCTD)) {\n", " # Keep spots that RCTD assigns to this cell type: singlets and uncertain doublets\n", " # by first type, certain doublets by either first or second type.\n", " spots_used <- rownames(doublet_results_RCTD)[\n", " ((doublet_results_RCTD$spot_class == \"singlet\" |\n", " doublet_results_RCTD$spot_class == \"doublet_uncertain\") &\n", " doublet_results_RCTD$first_type == celltype\n", " ) |\n", " (doublet_results_RCTD$spot_class == \"doublet_certain\" &\n", " (doublet_results_RCTD$first_type == celltype |\n", " doublet_results_RCTD$second_type == celltype))\n", " ]\n", "\n", " if (length(spots_used) > 0 && sum(proportions_RCTD[spots_used, celltype]) > celltype_threshold) {\n", " # Subsample without replacement when too many spots qualify.\n", " if (length(spots_used) > sample_size_max) {\n", " spots_used <- sample(spots_used, size = sample_size_max, replace = FALSE)\n", " }\n", "\n", " mcube_object <- createMCube(\n", " counts = t(as.matrix(myRCTD@originalSpatialRNA@counts[, spots_used])),\n", " coordinates = as.matrix(myRCTD@spatialRNA@coords[spots_used, ]),\n", " proportions = proportions_RCTD[spots_used, ],\n", " library_sizes = myRCTD@spatialRNA@nUMI[spots_used],\n", " reference = t(myRCTD@cell_type_info$info[[1]]),\n", " used_for_deconvolution = rownames(myRCTD@spatialRNA@counts),\n", " spot_effects = spot_effects_RCTD[spots_used],\n", " celltype_test = 
celltype,\n", " proportion_threshold = 0.01\n", " )\n", " mcube_object <- mcubeFitNull(\n", " mcube_object,\n", " num_workers = 35, num_threads = 2\n", " )\n", " mcube_object <- mcubeTest(\n", " mcube_object,\n", " num_workers = 35, num_threads = 2, shared_memory = TRUE\n", " )\n", "\n", " saveRDS(\n", " mcube_object,\n", " file = file.path(\n", " RESULT_PATH, \"Xenium\",\n", " paste0(\"mcube_\", celltype, \".rds\")\n", " )\n", " )\n", " }\n", "}" ] }, { "cell_type": "code", "execution_count": null, "id": "0b53a9d1", "metadata": { "vscode": { "languageId": "r" } }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "4.3.2" } }, "nbformat": 4, "nbformat_minor": 5 }