Applying MCube to multiple adult mouse brain datasets from different sources

[1]:
set.seed(20240709)

library(MCube)
library(ggplot2)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
 dplyr     1.1.4      readr     2.1.5
 forcats   1.0.0      stringr   1.5.1
 lubridate 1.9.4      tibble    3.2.1
 purrr     1.0.4      tidyr     1.3.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
 dplyr::filter() masks stats::filter()
 dplyr::lag()    masks stats::lag()
 Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
[2]:
DATA_PATH <- "/import/home/share/zw/pql/data/mouse_brain"
RESULT_PATH <- "/import/home/share/zw/pql/results/mouse_brain"

if (!dir.exists(file.path(RESULT_PATH, "visium_1"))) {
    dir.create(file.path(RESULT_PATH, "visium_1"), recursive = TRUE)
}
if (!dir.exists(file.path(RESULT_PATH, "visium_2"))) {
    dir.create(file.path(RESULT_PATH, "visium_2"), recursive = TRUE)
}
if (!dir.exists(file.path(RESULT_PATH, "ST_3D"))) {
    dir.create(file.path(RESULT_PATH, "ST_3D"), recursive = TRUE)
}

Two slices of the Visium dataset

Visium Slice 1

[3]:
reference_ST <- data.matrix(read.csv(
    file.path(DATA_PATH, "visium_1", "reference_ST.csv"),
    header = TRUE, row.names = 1, check.names = FALSE
))
dim(reference_ST)
num_celltypes <- nrow(reference_ST)

proportions_ST <- data.matrix(read.csv(
    file.path(DATA_PATH, "visium_1", paste0("prop_slice", 0, ".csv")),
    header = TRUE, row.names = 1, check.names = FALSE
))
dim(proportions_ST)

counts <- as.data.frame(readr::read_csv(
    file.path(DATA_PATH, "visium_1", "counts.csv")
))
rownames(counts) <- counts[, 1]
counts[, 1] <- NULL
counts <- data.matrix(counts)
dim(counts)

coordinates <- data.matrix(read.csv(
    file.path(DATA_PATH, "visium_1", "3D_coordinates.csv"),
    header = TRUE, row.names = 1, check.names = FALSE
))[, c("x", "y")]

shuffle_idx <- sample(1:nrow(coordinates), nrow(coordinates))
write.csv(shuffle_idx, file = file.path(DATA_PATH, "visium_1", "shuffle_idx.csv"), row.names = FALSE)
shuffle_idx <- read.csv(file.path(DATA_PATH, "visium_1", "shuffle_idx.csv"), header = TRUE)[, 1]
length(unique(shuffle_idx))

coordinates_shuffle <- coordinates[shuffle_idx, ]
rownames(coordinates_shuffle) <- rownames(coordinates)
write.csv(coordinates_shuffle, file = file.path(DATA_PATH, "visium_1", "coordinates_shuffle.csv"))
coordinates_shuffle <- data.matrix(read.csv(
    file.path(DATA_PATH, "visium_1", "coordinates_shuffle.csv"),
    header = TRUE, row.names = 1, check.names = FALSE
))

dim(coordinates)
dim(coordinates_shuffle)

spot_effects_ST <- data.matrix(read.csv(
    file.path(DATA_PATH, "visium_1", "spot_effects_ST.csv"),
    header = TRUE, row.names = 1, check.names = FALSE
))[, 1]

library_sizes_ST <- data.matrix(read.csv(
    file.path(DATA_PATH, "visium_1", "library_sizes_ST.csv"),
    header = TRUE, row.names = 1, check.names = FALSE
))[, 1]
  1. 59
  2. 6415
  1. 4168
  2. 59
New names:
 `` -> `...1`
Rows: 4168 Columns: 6416
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr    (1): ...1
dbl (6415): 0610040J01Rik, 0610043K17Rik, 1110002J07Rik, 1110004F10Rik, 1110...

 Use `spec()` to retrieve the full column specification for this data.
 Specify the column types or set `show_col_types = FALSE` to quiet this message.
  1. 4168
  2. 6415
4168
  1. 4168
  2. 2
  1. 4168
  2. 2
[4]:
# Real case

mcube_object_1 <- createMCube(
    counts = counts, coordinates = coordinates,
    proportions = proportions_ST, library_sizes = library_sizes_ST,
    covariates = NULL,
    reference = reference_ST,
    spot_effects = spot_effects_ST, platform_effects = NULL,
    project = "mouse_brain_visium_1"
)
mcube_object_1 <- mcubeFitNull(
    mcube_object_1,
    num_workers = 70, num_threads = 1
)
mcube_object_1 <- mcubeTest(
    mcube_object_1,
    num_workers = 70, num_threads = 1, shared_memory = TRUE
)

p <- mcubePlotPvalues(mcube_object_1@pvalues, "combined_pvalue", nrow = 2)
p <- p +
    labs(title = "10x Visium Slice 1") +
    theme(
        text = element_text(size = 16),
        plot.title = element_text(size = 20),
        axis.title = element_text(size = 20),
        axis.text.x = element_text(size = 16),
        axis.text.y = element_text(size = 16)
    )
p
ggsave(
    filename = file.path(
        RESULT_PATH, "visium_1",
        paste0("pvalue_qqplot.png")
    ),
    plot = p, width = 12, height = 7
)
The batch_id is not provided!
All spots are assumed to be from the same batch and share the same gene platform effects.

Select high-abundance cell types to analyze with proportion_threshold = 0.1 and celltype_threshold = 100.

mcubeFilterCellTypes: Cell type(s) Ext_Hpc_CA1, Ext_Hpc_DG1, Ext_L23, Ext_L5_1, Ext_Pir, Ext_Thal_1, Ext_Thal_2, Inh_1, Inh_4, Oligo_2 pass the threshold.

Cell type(s) Astro_AMY, Astro_AMY_CTX, Astro_CTX, Astro_HPC, Astro_HYPO, Astro_STR, Astro_THAL_hab, Astro_THAL_lat, Astro_THAL_med, Astro_WM, Endo, Ext_Amy_1, Ext_Amy_2, Ext_ClauPyr, Ext_Hpc_CA2, Ext_Hpc_CA3, Ext_Hpc_DG2, Ext_L25, Ext_L56, Ext_L5_2, Ext_L5_3, Ext_L6, Ext_L6B, Ext_Med, Ext_Unk_1, Ext_Unk_2, Ext_Unk_3, Inh_2, Inh_3, Inh_5, Inh_6, Inh_Lamp5, Inh_Meis2_1, Inh_Meis2_2, Inh_Meis2_3, Inh_Meis2_4, Inh_Pvalb, Inh_Sst, Inh_Vip, LowQ_1, LowQ_2, Micro, Nb_1, Nb_2, OPC_1, OPC_2, Oligo_1, Unk_1, Unk_2 has/have less than the minimum celltype_threshold = 100 with proportion_threshold = 0.1.
To include the above cell type(s), please reduce celltype_threshold or proportion_threshold.

Cell type(s) Ext_Hpc_CA1, Ext_Hpc_DG1, Ext_L23, Ext_L5_1, Ext_Pir, Ext_Thal_1, Ext_Thal_2, Inh_1, Inh_4, Oligo_2 will be analyzed.

Filter out lowly-expressed genes with gene_threshold = 5e-05.

mcubeFilterGenes: 2768 genes pass the threshold.

The platform effects are not provided and need to be estimated from data!

Select highly-expressed genes to analyze for each specific cell type with reference_threshold = 0.5.

mcubeFilterGenesCellType: Select 1114 genes to analyze for Ext_Hpc_CA1.

mcubeFilterGenesCellType: Select 1240 genes to analyze for Ext_Hpc_DG1.

mcubeFilterGenesCellType: Select 1226 genes to analyze for Ext_L23.

mcubeFilterGenesCellType: Select 1209 genes to analyze for Ext_L5_1.

mcubeFilterGenesCellType: Select 1289 genes to analyze for Ext_Pir.

mcubeFilterGenesCellType: Select 1182 genes to analyze for Ext_Thal_1.

mcubeFilterGenesCellType: Select 1331 genes to analyze for Ext_Thal_2.

mcubeFilterGenesCellType: Select 1501 genes to analyze for Inh_1.

mcubeFilterGenesCellType: Select 1479 genes to analyze for Inh_4.

mcubeFilterGenesCellType: Select 1315 genes to analyze for Oligo_2.

Preprocessed data description: 4168 spots and 59 cell types in total. 3428 spots, 2763 genes, and 10 cell type(s) to analyze.

Number of physical cores: 72.

Number of workers: 70.

Number of thread(s) on BLAS per worker: 1.

mcubeKernel: length_scale is set as 0.101059327818091 for the Gaussian kernel.

mcubeKernel: length_scale is set as 0.142919472004653 for the Gaussian kernel.

mcubeKernel: length_scale is set as 0.101059327818091 for the Gaussian_transformed kernel.

mcubeKernel: length_scale is set as 0.142919472004653 for the Gaussian_transformed kernel.

Number of physical cores: 72.

Number of workers: 70.

Number of thread(s) on BLAS per worker: 1.

../../_images/analysis_mouse_brain_mouse_brain_MCube_6_1.png
[5]:
saveRDS(
    mcube_object_1,
    file = file.path(
        RESULT_PATH, "visium_1",
        paste0("mcube", ".rds")
    )
)
# mcube_object_1 <- readRDS(
#     file = file.path(
#         RESULT_PATH, "visium_1",
#         paste0("mcube", ".rds")
#     )
# )
[6]:
# Negative control

mcube_object_null_1 <- mcube_object_1
mcube_object_null_1@coordinates <- coordinates_shuffle
mcube_object_null_1 <- mcubeTest(
    mcube_object_null_1,
    num_workers = 70, num_threads = 1, shared_memory = TRUE
)

p <- mcubePlotPvalues(
    mcube_object_null_1@pvalues, "combined_pvalue",
    nrow = 2, under_null = TRUE
)
p <- p +
    labs(title = "10x Visium slice 1") +
    theme(
        text = element_text(size = 16),
        plot.title = element_text(size = 20),
        axis.title = element_text(size = 20),
        axis.text.x = element_text(size = 16),
        axis.text.y = element_text(size = 16)
    )
p
ggsave(
    filename = file.path(
        RESULT_PATH, "visium_1",
        paste0("pvalue_qqplot_null.png")),
    plot = p, width = 12, height = 7
)
mcubeKernel: length_scale is set as 0.101059327818091 for the Gaussian kernel.

mcubeKernel: length_scale is set as 0.142919472004653 for the Gaussian kernel.

mcubeKernel: length_scale is set as 0.101059327818091 for the Gaussian_transformed kernel.

mcubeKernel: length_scale is set as 0.142919472004653 for the Gaussian_transformed kernel.

Number of physical cores: 72.

Number of workers: 70.

Number of thread(s) on BLAS per worker: 1.

../../_images/analysis_mouse_brain_mouse_brain_MCube_8_1.png
[7]:
pvalues_null_1 <- mcube_object_null_1@pvalues
rm(mcube_object_null_1)
saveRDS(
    pvalues_null_1,
    file = file.path(
        RESULT_PATH, "visium_1",
        paste0("mcube_pvalues_null", ".rds")
    )
)
# pvalues_null_1 <- readRDS(
#     file = file.path(
#         RESULT_PATH, "visium_1",
#         paste0("mcube_pvalues_null", ".rds")
#     )
# )

Visium Slice 2

[8]:
reference_ST <- data.matrix(read.csv(
    file.path(DATA_PATH, "visium_2", "reference_ST.csv"),
    header = TRUE, row.names = 1, check.names = FALSE
))
dim(reference_ST)
num_celltypes <- nrow(reference_ST)

proportions_ST <- data.matrix(read.csv(
    file.path(DATA_PATH, "visium_2", paste0("prop_slice", 0, ".csv")),
    header = TRUE, row.names = 1, check.names = FALSE
))
dim(proportions_ST)

counts <- as.data.frame(readr::read_csv(
    file.path(DATA_PATH, "visium_2", "counts.csv")
))
rownames(counts) <- counts[, 1]
counts[, 1] <- NULL
counts <- data.matrix(counts)
dim(counts)

coordinates <- data.matrix(read.csv(
    file.path(DATA_PATH, "visium_2", "3D_coordinates.csv"),
    header = TRUE, row.names = 1, check.names = FALSE
))[, c("x", "y")]

shuffle_idx <- sample(1:nrow(coordinates), nrow(coordinates))
write.csv(shuffle_idx, file = file.path(DATA_PATH, "visium_2", "shuffle_idx.csv"), row.names = FALSE)
shuffle_idx <- read.csv(file.path(DATA_PATH, "visium_2", "shuffle_idx.csv"), header = TRUE)[, 1]
length(unique(shuffle_idx))

coordinates_shuffle <- coordinates[shuffle_idx, ]
rownames(coordinates_shuffle) <- rownames(coordinates)
write.csv(coordinates_shuffle, file = file.path(DATA_PATH, "visium_2", "coordinates_shuffle.csv"))
coordinates_shuffle <- data.matrix(read.csv(
    file.path(DATA_PATH, "visium_2", "coordinates_shuffle.csv"),
    header = TRUE, row.names = 1, check.names = FALSE
))

dim(coordinates)
dim(coordinates_shuffle)

spot_effects_ST <- data.matrix(read.csv(
    file.path(DATA_PATH, "visium_2", "spot_effects_ST.csv"),
    header = TRUE, row.names = 1, check.names = FALSE
))[, 1]

library_sizes_ST <- data.matrix(read.csv(
    file.path(DATA_PATH, "visium_2", "library_sizes_ST.csv"),
    header = TRUE, row.names = 1, check.names = FALSE
))[, 1]
  1. 59
  2. 6415
  1. 3925
  2. 59
New names:
 `` -> `...1`
Rows: 3925 Columns: 6416
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr    (1): ...1
dbl (6415): 0610040J01Rik, 0610043K17Rik, 1110002J07Rik, 1110004F10Rik, 1110...

 Use `spec()` to retrieve the full column specification for this data.
 Specify the column types or set `show_col_types = FALSE` to quiet this message.
  1. 3925
  2. 6415
3925
  1. 3925
  2. 2
  1. 3925
  2. 2
[9]:
# Real case

mcube_object_2 <- createMCube(
    counts = counts, coordinates = coordinates,
    proportions = proportions_ST, library_sizes = library_sizes_ST,
    covariates = NULL,
    reference = reference_ST,
    spot_effects = spot_effects_ST, platform_effects = NULL,
    project = "mouse_brain_visium_2"
)
mcube_object_2 <- mcubeFitNull(
    mcube_object_2,
    num_workers = 70, num_threads = 1
)
mcube_object_2 <- mcubeTest(
    mcube_object_2,
    num_workers = 70, num_threads = 1, shared_memory = TRUE
)

p <- mcubePlotPvalues(mcube_object_2@pvalues, "combined_pvalue", nrow = 2)
p <- p +
    labs(title = "10x Visium Slice 2") +
    theme(
        text = element_text(size = 16),
        plot.title = element_text(size = 20),
        axis.title = element_text(size = 20),
        axis.text.x = element_text(size = 16),
        axis.text.y = element_text(size = 16)
    )
p
ggsave(
    filename = file.path(
        RESULT_PATH, "visium_2",
        paste0("pvalue_qqplot.png")),
    plot = p, width = 12, height = 7
)
The batch_id is not provided!
All spots are assumed to be from the same batch and share the same gene platform effects.

Select high-abundance cell types to analyze with proportion_threshold = 0.1 and celltype_threshold = 100.

mcubeFilterCellTypes: Cell type(s) Ext_Hpc_CA1, Ext_L23, Ext_L5_1, Ext_Thal_1, Ext_Thal_2, Inh_1, Inh_4, Oligo_2 pass the threshold.

Cell type(s) Astro_AMY, Astro_AMY_CTX, Astro_CTX, Astro_HPC, Astro_HYPO, Astro_STR, Astro_THAL_hab, Astro_THAL_lat, Astro_THAL_med, Astro_WM, Endo, Ext_Amy_1, Ext_Amy_2, Ext_ClauPyr, Ext_Hpc_CA2, Ext_Hpc_CA3, Ext_Hpc_DG1, Ext_Hpc_DG2, Ext_L25, Ext_L56, Ext_L5_2, Ext_L5_3, Ext_L6, Ext_L6B, Ext_Med, Ext_Pir, Ext_Unk_1, Ext_Unk_2, Ext_Unk_3, Inh_2, Inh_3, Inh_5, Inh_6, Inh_Lamp5, Inh_Meis2_1, Inh_Meis2_2, Inh_Meis2_3, Inh_Meis2_4, Inh_Pvalb, Inh_Sst, Inh_Vip, LowQ_1, LowQ_2, Micro, Nb_1, Nb_2, OPC_1, OPC_2, Oligo_1, Unk_1, Unk_2 has/have less than the minimum celltype_threshold = 100 with proportion_threshold = 0.1.
To include the above cell type(s), please reduce celltype_threshold or proportion_threshold.

Cell type(s) Ext_Hpc_CA1, Ext_L23, Ext_L5_1, Ext_Thal_1, Ext_Thal_2, Inh_1, Inh_4, Oligo_2 will be analyzed.

Filter out lowly-expressed genes with gene_threshold = 5e-05.

mcubeFilterGenes: 2767 genes pass the threshold.

The platform effects are not provided and need to be estimated from data!

Select highly-expressed genes to analyze for each specific cell type with reference_threshold = 0.5.

mcubeFilterGenesCellType: Select 1177 genes to analyze for Ext_Hpc_CA1.

mcubeFilterGenesCellType: Select 1311 genes to analyze for Ext_L23.

mcubeFilterGenesCellType: Select 1292 genes to analyze for Ext_L5_1.

mcubeFilterGenesCellType: Select 1229 genes to analyze for Ext_Thal_1.

mcubeFilterGenesCellType: Select 1400 genes to analyze for Ext_Thal_2.

mcubeFilterGenesCellType: Select 1572 genes to analyze for Inh_1.

mcubeFilterGenesCellType: Select 1557 genes to analyze for Inh_4.

mcubeFilterGenesCellType: Select 1351 genes to analyze for Oligo_2.

Preprocessed data description: 3925 spots and 59 cell types in total. 3024 spots, 2754 genes, and 8 cell type(s) to analyze.

Number of physical cores: 72.

Number of workers: 70.

Number of thread(s) on BLAS per worker: 1.

mcubeKernel: length_scale is set as 0.104577495815249 for the Gaussian kernel.

mcubeKernel: length_scale is set as 0.147894912900941 for the Gaussian kernel.

mcubeKernel: length_scale is set as 0.104577495815249 for the Gaussian_transformed kernel.

mcubeKernel: length_scale is set as 0.147894912900941 for the Gaussian_transformed kernel.

Number of physical cores: 72.

Number of workers: 70.

Number of thread(s) on BLAS per worker: 1.

../../_images/analysis_mouse_brain_mouse_brain_MCube_12_1.png
[10]:
saveRDS(
    mcube_object_2,
    file = file.path(
        RESULT_PATH, "visium_2",
        paste0("mcube", ".rds")
    )
)
# mcube_object_2 <- readRDS(
#     file = file.path(
#         RESULT_PATH, "visium_2",
#         paste0("mcube", ".rds")
#     )
# )
[11]:
# Negative control

mcube_object_null_2 <- mcube_object_2
mcube_object_null_2@coordinates <- coordinates_shuffle
mcube_object_null_2 <- mcubeTest(
    mcube_object_null_2,
    num_workers = 70, num_threads = 1, shared_memory = TRUE
)

p <- mcubePlotPvalues(
    mcube_object_null_2@pvalues, "combined_pvalue",
    nrow = 2, under_null = TRUE
)
p <- p +
    labs(title = "10x Visium Slice 2") +
    theme(
        text = element_text(size = 16),
        plot.title = element_text(size = 20),
        axis.title = element_text(size = 20),
        axis.text.x = element_text(size = 16),
        axis.text.y = element_text(size = 16)
    )
p
ggsave(
    filename = file.path(
        RESULT_PATH, "visium_2",
        paste0("pvalue_qqplot_null.png")),
    plot = p, width = 12, height = 7
)

pvalues_null_2 <- mcube_object_null_2@pvalues
rm(mcube_object_null_2)
saveRDS(
    pvalues_null_2,
    file = file.path(
        RESULT_PATH, "visium_2",
        paste0("mcube_pvalues_null", ".rds")
    )
)
# pvalues_null_2 <- readRDS(
#     file = file.path(
#         RESULT_PATH, "visium_2",
#         paste0("mcube_pvalues_null", ".rds")
#     )
# )
mcubeKernel: length_scale is set as 0.104577495815249 for the Gaussian kernel.

mcubeKernel: length_scale is set as 0.147894912900941 for the Gaussian kernel.

mcubeKernel: length_scale is set as 0.104577495815249 for the Gaussian_transformed kernel.

mcubeKernel: length_scale is set as 0.147894912900941 for the Gaussian_transformed kernel.

Number of physical cores: 72.

Number of workers: 70.

Number of thread(s) on BLAS per worker: 1.

../../_images/analysis_mouse_brain_mouse_brain_MCube_14_1.png

Comparing the results of the two slices

[12]:
mcube_object_1 <- readRDS(
    file = file.path(
        RESULT_PATH, "visium_1",
        paste0("mcube", ".rds")
    )
)
mcube_object_2 <- readRDS(
    file = file.path(
        RESULT_PATH, "visium_2",
        paste0("mcube", ".rds")
    )
)

pvalues_null_1 <- readRDS(
    file = file.path(
        RESULT_PATH, "visium_1",
        paste0("mcube_pvalues_null", ".rds")
    )
)
pvalues_null_2 <- readRDS(
    file = file.path(
        RESULT_PATH, "visium_2",
        paste0("mcube_pvalues_null", ".rds")
    )
)
[13]:
# For visualization
mcube_object_1@coordinates[, "y"] <-
    -mcube_object_1@coordinates[, "y"] + sum(range(mcube_object_1@coordinates[, "y"]))
mcube_object_2@coordinates[, "y"] <-
    -mcube_object_2@coordinates[, "y"] + sum(range(mcube_object_2@coordinates[, "y"]))

Cell type deconvolution results

[14]:
proportions_long_1 <- mcube_object_1@proportions[, mcube_object_1@celltype_test] %>%
  cbind(mcube_object_1@coordinates) %>%
  as.data.frame() %>%
  tidyr::pivot_longer(
    cols = -(x:y),
    names_to = "Cell type",
    values_to = "Proportion"
  )
# head(proportions_long_1)
proportions_long_2 <- mcube_object_2@proportions[, mcube_object_2@celltype_test] %>%
  cbind(mcube_object_2@coordinates) %>%
  as.data.frame() %>%
  tidyr::pivot_longer(
    cols = -(x:y),
    names_to = "Cell type",
    values_to = "Proportion"
  )
# head(proportions_long_2)

overlap_celltypes <- intersect(
    unique(proportions_long_1$`Cell type`),
    unique(proportions_long_2$`Cell type`)
)
overlap_celltypes <- sort(overlap_celltypes)
# overlap_celltypes

proportions_long <- rbind(
  cbind(
    proportions_long_1[proportions_long_1$`Cell type` %in% overlap_celltypes, ],
    Dataset = "Slice 1"
  ),
  cbind(
    proportions_long_2[proportions_long_2$`Cell type` %in% overlap_celltypes, ],
    Dataset = "Slice 2"
  )
)
# head(proportions_long)
[15]:
p <- ggplot(proportions_long, aes(x = x, y = y)) +
    geom_point(aes(color = Proportion), size = 1, alpha = 1) +
    scale_colour_gradientn(name = NULL, colors = pals::brewer.orrd(22)[3:22]) +
    scale_y_continuous(trans = "reverse") +
    coord_fixed(ratio = 1) +
    facet_grid(Dataset ~ `Cell type`) +
    labs(
        title = "Estimated cell type proportions by STitch3D",
        x = "x", y = "y"
    ) +
    theme_classic() +
    theme(
        text = element_text(family = "Helvetica"),
        plot.title = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        strip.text = element_text(size = 16),
        legend.title = element_blank(),
        legend.text = element_text(size = 16),
        legend.position = "right"
    )
ggsave(
    filename = file.path(RESULT_PATH, "proportions_comparison.pdf"),
    plot = p, width = 15, height = 4
)
ggsave(
    filename = file.path(RESULT_PATH, "proportions_comparison.png"),
    plot = p, width = 15, height = 4
)
p
../../_images/analysis_mouse_brain_mouse_brain_MCube_20_0.png

\(P\)-values

[16]:
pvalues_all <- merge(
    do.call(
        rbind,
        lapply(
            names(mcube_object_1@pvalues),
            FUN = function(x) {
                pvalues_x <- mcubeQQPlotDF(
                    mcube_object_1@pvalues[[x]],
                    minus_log10p_max = 17
                )
                pvalues_x$celltype <- x
                return(pvalues_x)
            }
        )
    ),
    do.call(
        rbind,
        lapply(
            names(mcube_object_2@pvalues),
            FUN = function(x) {
                pvalues_x <- mcubeQQPlotDF(
                    mcube_object_2@pvalues[[x]],
                    minus_log10p_max = 17
                )
                pvalues_x$celltype <- x
                return(pvalues_x)
            }
        )
    ),
    by = c("gene", "celltype"),
    suffixes = c("_slice_1", "_slice_2")
)

correlations <- pvalues_all %>%
    group_by(celltype) %>%
    summarize(
        correlation = cor(minus_log10p_slice_1, minus_log10p_slice_2, method = "pearson")
    )
[17]:
p <- ggplot(data = pvalues_all, aes(x = minus_log10p_slice_1, y = minus_log10p_slice_2)) +
    geom_point(aes(color = celltype), size = 5, alpha = 0.5) +
    geom_abline(intercept = 0, slope = 1, linetype = "dashed", linewidth = 1.5) +
    geom_text(
        data = correlations,
        mapping = aes(
            x = 4, y = 15,
            label = paste("r =", round(correlation, 2))
        ),
        size = 5
    ) +
    scale_x_continuous(limits = c(0, 17), oob = scales::oob_squish) +
    scale_y_continuous(limits = c(0, 17), oob = scales::oob_squish) +
    coord_fixed(ratio = 1) +
    facet_wrap(. ~ celltype, nrow = 1) +
    labs(
        title = expression(paste("-log"[10], plain("P"), " comparison")),
        x = "Slice 1", y = "Slice 2"
    ) +
    theme_classic() +
        theme(
            text = element_text(family = "Helvetica", size = 16),
            # plot.title = element_text(size = 20),
            plot.title = element_blank(),
            axis.title = element_text(size = 18),
            axis.text.x = element_text(size = 16),
            axis.text.y = element_text(size = 16),
            legend.position = "none"
        )
ggsave(
    filename = file.path(RESULT_PATH, paste0("pvalues_comparison", ".pdf")),
    plot = p, width = 15, height = 3
)
p
../../_images/analysis_mouse_brain_mouse_brain_MCube_23_0.png

Cell type Ext_Thal_1

[18]:
celltype <- "Ext_Thal_1"

p1 <- mcubePlotPropCellType(mcube_object_1@proportions, mcube_object_1@coordinates, celltype) +
    labs(title = paste(celltype, "in Slice 1")) +
    theme(
        text = element_text(size = 1, hjust = 0.5),
        plot.title = element_text(size = 20, hjust = 0.5),
        axis.title = element_blank(),
        axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks = element_blank(),
        legend.position = "none"
    )
ggsave(
    filename = file.path(RESULT_PATH, paste0("proportion_", celltype, "_slice_1", ".pdf")),
    plot = p1, width = 4, height = 4
)

p2 <- mcubePlotPropCellType(mcube_object_2@proportions, mcube_object_2@coordinates, celltype) +
    labs(title = paste(celltype, "in Slice 2")) +
    theme(
        text = element_text(size = 1, hjust = 0.5),
        plot.title = element_text(size = 20, hjust = 0.5),
        axis.title = element_blank(),
        axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks = element_blank(),
        legend.position = "none"
    )
ggsave(
    filename = file.path(RESULT_PATH, paste0("proportion_", celltype, "_slice_2", ".pdf")),
    plot = p2, width = 4, height = 4
)

p1
p2
../../_images/analysis_mouse_brain_mouse_brain_MCube_25_0.png
../../_images/analysis_mouse_brain_mouse_brain_MCube_25_1.png
[19]:
minus_log10p_max <- 5
pvalues_plot_df <- rbind(
    data.frame(
        mcubeQQPlotDF(pvalues_null_1[[celltype]], minus_log10p_max = 5),
        slice = "Slice 1",
        case = "Negative control"
    ),
    data.frame(
        mcubeQQPlotDF(pvalues_null_2[[celltype]], minus_log10p_max = 5),
        slice = "Slice 2",
        case = "Negative control"
    ),
    data.frame(
        mcubeQQPlotDF(mcube_object_1@pvalues[[celltype]], minus_log10p_max = 5),
        slice = "Slice 1",
        case = "Real case"
    ),
    data.frame(
        mcubeQQPlotDF(mcube_object_2@pvalues[[celltype]], minus_log10p_max = 5),
        slice = "Slice 2",
        case = "Real case"
    )
)
pvalues_plot_df$case <- factor(pvalues_plot_df$case, levels = c("Real case", "Negative control"))

head(pvalues_plot_df[(pvalues_plot_df$slice == "Slice 1") & (pvalues_plot_df$case == "Real case"), ])
head(pvalues_plot_df[(pvalues_plot_df$slice == "Slice 2") & (pvalues_plot_df$case == "Real case"), ])
A data.frame: 6 × 7
genepvaluepvalue_theoreticalminus_log10pminus_log10p_theoreticalslicecase
<chr><dbl><dbl><dbl><dbl><chr><fct>
1728Pcp4 2.163879e-730.000588235353.230449Slice 1Real case
1729Cplx1 2.607116e-570.001764705952.753328Slice 1Real case
1730Atp1b19.317878e-440.002941176552.531479Slice 1Real case
1731Nsmf 9.545463e-380.004117647152.385351Slice 1Real case
1732Snap258.652090e-350.005294117652.276206Slice 1Real case
1733Tcf7l21.218709e-270.006470588252.189056Slice 1Real case
A data.frame: 6 × 7
genepvaluepvalue_theoreticalminus_log10pminus_log10p_theoreticalslicecase
<chr><dbl><dbl><dbl><dbl><chr><fct>
2578Pcp4 9.857794e-630.000570125453.244030Slice 2Real case
2579Cplx1 8.096871e-560.001710376352.766908Slice 2Real case
2580Atp1b11.403904e-420.002850627152.545060Slice 2Real case
2581Nsmf 2.231085e-350.003990878052.398932Slice 2Real case
2582Snap259.386885e-350.005131128852.289787Slice 2Real case
2583Kif5a 3.131247e-250.006271379752.202637Slice 2Real case
[20]:
# demo_genes <- c("Pcp4", "Cplx1")
demo_genes <- c("Pcp4", "Nrxn3")
demo_genes_df <- pvalues_plot_df[(pvalues_plot_df$gene %in% demo_genes) & (pvalues_plot_df$case == "Real case"), ]
[21]:
expected_minus_log10p_lab <- expression(paste("Expected -log"[10], plain(P)))
observed_minus_log10p_lab <- expression(paste("Observed -log"[10], plain(P)))

p <- ggplot(data = pvalues_plot_df, aes(x = minus_log10p_theoretical, y = minus_log10p)) +
    geom_point(aes(color = case), size = 5, alpha = 0.5) +
    geom_abline(intercept = 0, slope = 1, linetype = "dashed", linewidth = 2) +
    geom_point(data = demo_genes_df, color = "dodgerblue", size = 5, alpha = 1) +
    ggrepel::geom_text_repel(
        data = demo_genes_df, aes(label = gene),
        color = "dodgerblue", size = 5, fontface = "bold.italic",
        direction = "y", vjust = 0, hjust = 1, point.size = 5
    ) +
    labs(
        title = celltype,
        x = expected_minus_log10p_lab, y = observed_minus_log10p_lab
    ) +
    facet_wrap(. ~ slice, nrow = 1) +
    theme_classic() +
    theme(
            text = element_text(family = "Helvetica", size = 16),
            plot.title = element_text(size = 20, hjust = 0.5),
            axis.title = element_text(size = 20),
            axis.text.x = element_text(size = 18),
            axis.text.y = element_text(size = 18),
            strip.text = element_text(size = 20),
            legend.title = element_blank(),
            legend.text = element_text(size = 20),
            legend.position = "bottom"
        )
ggsave(
    filename = file.path(RESULT_PATH, paste0("pvalue_qqplot_", celltype, ".pdf")),
    plot = p, width = 6, height = 5
)
p
../../_images/analysis_mouse_brain_mouse_brain_MCube_28_0.png
[22]:
for (gene in demo_genes) {
    p1 <- mcubePlotExprCellTypeBinary(
        mcube_object_1,
        celltype = celltype, gene = gene, background = TRUE,
        opacity_target = 1, opacity_background = 0.5,
    ) + guides(color = guide_legend(override.aes = list(size = 5))) +
        labs(title = bquote("Slice 1: "~ italic(.(gene)) ~ "in" ~ .(celltype))) +
        theme(
            text = element_text(family = "Helvetica", size = 16),
            plot.title = element_text(size = 20, hjust = 0.5),
            axis.text.x = element_blank(),
            axis.text.y = element_blank(),
            axis.ticks = element_blank(),
            legend.title = element_blank(),
            legend.text = element_text(size = 20),
            legend.position = "bottom"
        )
    ggsave(
        filename = file.path(RESULT_PATH, paste0(celltype, "_", gene, "_slice_1", ".pdf")),
        plot = p1, width = 4, height = 5
    )

    p2 <- mcubePlotExprCellTypeBinary(
        mcube_object_2,
        celltype = celltype, gene = gene, background = TRUE,
        opacity_target = 1, opacity_background = 0.5,
    ) + guides(color = guide_legend(override.aes = list(size = 5))) +
        labs(title = bquote("Slice 2: "~ italic(.(gene)) ~ "in" ~ .(celltype))) +
        theme(
            text = element_text(family = "Helvetica", size = 16),
            plot.title = element_text(size = 20, hjust = 0.5),
            axis.text.x = element_blank(),
            axis.text.y = element_blank(),
            axis.ticks = element_blank(),
            legend.title = element_blank(),
            legend.text = element_text(size = 20),
            legend.position = "bottom"
        )
    ggsave(
        filename = file.path(RESULT_PATH, paste0(celltype, "_", gene, "_slice_2", ".pdf")),
        plot = p2, width = 4, height = 5
    )
}

p1
p2
../../_images/analysis_mouse_brain_mouse_brain_MCube_29_0.png
../../_images/analysis_mouse_brain_mouse_brain_MCube_29_1.png

Cell type Inh_1

[23]:
celltype <- "Inh_1"

p1 <- mcubePlotPropCellType(mcube_object_1@proportions, mcube_object_1@coordinates, celltype) +
    labs(title = paste(celltype, "in Slice 1")) +
    theme(
        text = element_text(size = 1, hjust = 0.5),
        plot.title = element_text(size = 20, hjust = 0.5),
        axis.title = element_blank(),
        axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks = element_blank(),
        legend.position = "none"
    )
ggsave(
    filename = file.path(RESULT_PATH, paste0("proportion_", celltype, "_slice_1", ".pdf")),
    plot = p1, width = 4, height = 4
)

p2 <- mcubePlotPropCellType(mcube_object_2@proportions, mcube_object_2@coordinates, celltype) +
    labs(title = paste(celltype, "in Slice 2")) +
    theme(
        text = element_text(size = 1, hjust = 0.5),
        plot.title = element_text(size = 20, hjust = 0.5),
        axis.title = element_blank(),
        axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks = element_blank(),
        legend.position = "none"
    )
ggsave(
    filename = file.path(RESULT_PATH, paste0("proportion_", celltype, "_slice_2", ".pdf")),
    plot = p2, width = 4, height = 4
)

p1
p2
../../_images/analysis_mouse_brain_mouse_brain_MCube_31_0.png
../../_images/analysis_mouse_brain_mouse_brain_MCube_31_1.png
[24]:
minus_log10p_max <- 5
pvalues_plot_df <- rbind(
    data.frame(
        mcubeQQPlotDF(pvalues_null_1[[celltype]], minus_log10p_max = 5),
        slice = "Slice 1",
        case = "Negative control"
    ),
    data.frame(
        mcubeQQPlotDF(pvalues_null_2[[celltype]], minus_log10p_max = 5),
        slice = "Slice 2",
        case = "Negative control"
    ),
    data.frame(
        mcubeQQPlotDF(mcube_object_1@pvalues[[celltype]], minus_log10p_max = 5),
        slice = "Slice 1",
        case = "Real case"
    ),
    data.frame(
        mcubeQQPlotDF(mcube_object_2@pvalues[[celltype]], minus_log10p_max = 5),
        slice = "Slice 2",
        case = "Real case"
    )
)
pvalues_plot_df$case <- factor(pvalues_plot_df$case, levels = c("Real case", "Negative control"))

head(pvalues_plot_df[(pvalues_plot_df$slice == "Slice 1") & (pvalues_plot_df$case == "Real case"), ])
head(pvalues_plot_df[(pvalues_plot_df$slice == "Slice 2") & (pvalues_plot_df$case == "Real case"), ])
A data.frame: 6 × 7
genepvaluepvalue_theoreticalminus_log10pminus_log10p_theoreticalslicecase
<chr><dbl><dbl><dbl><dbl><chr><fct>
2053Pmch 3.432879e-2270.000497512453.303196Slice 1Real case
2054Gal 2.052655e-800.001492537352.826075Slice 1Real case
2055Vsnl1 1.337647e-640.002487562252.604226Slice 1Real case
2056Rasgrf2 1.621062e-580.003482587152.458098Slice 1Real case
2057Dlk1 1.103690e-550.004477611952.348954Slice 1Real case
2058Camk2d 2.890414e-550.005472636852.261803Slice 1Real case
A data.frame: 6 × 7
genepvaluepvalue_theoreticalminus_log10pminus_log10p_theoreticalslicecase
<chr><dbl><dbl><dbl><dbl><chr><fct>
3058Pmch 1.089767e-1700.000477554953.320977Slice 2Real case
3059Gal 1.707627e-680.001432664852.843855Slice 2Real case
3060Vsnl1 3.919878e-510.002387774652.622007Slice 2Real case
3061Dlk1 1.704307e-440.003342884452.475879Slice 2Real case
3062Atp2b4 4.548976e-430.004297994352.366734Slice 2Real case
3063Peg3 3.590220e-410.005253104152.279584Slice 2Real case
[25]:
# demo_genes <- c("Pmch", "Gal") # Camk2d, Slc32a1, Unc13c, Nsf
demo_genes <-c("Calb1", "Gnas")
demo_genes_df <- pvalues_plot_df[(pvalues_plot_df$gene %in% demo_genes) & (pvalues_plot_df$case == "Real case"), ]
[26]:
p <- ggplot(data = pvalues_plot_df, aes(x = minus_log10p_theoretical, y = minus_log10p)) +
    geom_point(aes(color = case), size = 5, alpha = 0.5) +
    geom_abline(intercept = 0, slope = 1, linetype = "dashed", linewidth = 2) +
    geom_point(data = demo_genes_df, color = "dodgerblue", size = 5, alpha = 1) +
    ggrepel::geom_text_repel(
        data = demo_genes_df, aes(label = gene),
        color = "dodgerblue", size = 5, fontface = "bold.italic",
        direction = "y", vjust = -0.5, hjust = 0.5, point.size = 5
    ) +
    labs(
        title = celltype,
        x = expected_minus_log10p_lab, y = observed_minus_log10p_lab
    ) +
    facet_wrap(. ~ slice, nrow = 1) +
    theme_classic() +
    theme(
            text = element_text(family = "Helvetica", size = 16),
            plot.title = element_text(size = 20, hjust = 0.5),
            axis.title = element_text(size = 20),
            axis.text.x = element_text(size = 18),
            axis.text.y = element_text(size = 18),
            strip.text = element_text(size = 20),
            legend.title = element_blank(),
            legend.text = element_text(size = 20),
            legend.position = "bottom"
        )
ggsave(
    filename = file.path(RESULT_PATH, paste0("pvalue_qqplot_", celltype, ".pdf")),
    plot = p, width = 6, height = 5
)
p
../../_images/analysis_mouse_brain_mouse_brain_MCube_34_0.png
[27]:
for (gene in demo_genes) {
    p1 <- mcubePlotExprCellTypeBinary(
        mcube_object_1,
        celltype = celltype, gene = gene, background = TRUE,
        opacity_target = 1, opacity_background = 0.5,
    ) + guides(color = guide_legend(override.aes = list(size = 5))) +
        labs(title = bquote("Slice 1: "~ italic(.(gene)) ~ "in" ~ .(celltype))) +
        theme(
            text = element_text(family = "Helvetica", size = 16),
            plot.title = element_text(size = 20, hjust = 0.5),
            axis.text.x = element_blank(),
            axis.text.y = element_blank(),
            axis.ticks = element_blank(),
            legend.title = element_blank(),
            legend.text = element_text(size = 20),
            legend.position = "bottom"
        )
    ggsave(
        filename = file.path(RESULT_PATH, paste0(celltype, "_", gene, "_slice_1", ".pdf")),
        plot = p1, width = 4, height = 5
    )

    p2 <- mcubePlotExprCellTypeBinary(
        mcube_object_2,
        celltype = celltype, gene = gene, background = TRUE,
        opacity_target = 1, opacity_background = 0.5,
    ) + guides(color = guide_legend(override.aes = list(size = 5))) +
        labs(title = bquote("Slice 2: "~ italic(.(gene)) ~ "in" ~ .(celltype))) +
        theme(
            text = element_text(family = "Helvetica", size = 16),
            plot.title = element_text(size = 20, hjust = 0.5),
            axis.text.x = element_blank(),
            axis.text.y = element_blank(),
            axis.ticks = element_blank(),
            legend.title = element_blank(),
            legend.text = element_text(size = 20),
            legend.position = "bottom"
        )
    ggsave(
        filename = file.path(RESULT_PATH, paste0(celltype, "_", gene, "_slice_2", ".pdf")),
        plot = p2, width = 4, height = 5
    )
}

p1
p2
../../_images/analysis_mouse_brain_mouse_brain_MCube_35_0.png
../../_images/analysis_mouse_brain_mouse_brain_MCube_35_1.png
[28]:
rm(mcube_object_1, mcube_object_2, pvalues_null_1, pvalues_null_2)

3D adult mouse brain model constructed from Spatial Transcriptomics dataset

[29]:
n_slices <- 35
slice_idx_seq <- c(1:n_slices)

reference_ST <- data.matrix(
    read.csv(file.path(DATA_PATH, "ST_3D", "reference_ST.csv"),
        header = TRUE, row.names = 1, check.names = FALSE
    )
)
dim(reference_ST)
num_celltypes <- nrow(reference_ST)

proportions_ST <- matrix(, nrow = 0, ncol = num_celltypes)
batch_id <- vector("character")
for(slice_idx in slice_idx_seq){
    proportions_i <- as.matrix(read.csv(
        file.path(
            DATA_PATH, "ST_3D",
            paste0("prop_slice", slice_idx - 1, ".csv")
        ),
        header = TRUE, row.names = 1, check.names = FALSE
    ))
    proportions_ST <- rbind(proportions_ST, proportions_i)
    batch_id <- c(batch_id, rep(paste0("slice", slice_idx - 1), nrow(proportions_i)))
}
dim(proportions_ST)
num_spots <- nrow(proportions_ST)
spots <- rownames(proportions_ST)
names(batch_id) <- spots


counts <- as.data.frame(readr::read_csv(
    file.path(DATA_PATH, "ST_3D", "counts.csv")
))
rownames(counts) <- counts[, 1]
counts[, 1] <- NULL
counts <- counts[spots, ]
counts <- data.matrix(counts)
dim(counts)

coordinates <- as.matrix(read.csv(
    file.path(DATA_PATH, "ST_3D", "3D_coordinates.csv"),
    header = TRUE, row.names = 1, check.names = FALSE
))

shuffle_idx <- sample(1:nrow(coordinates), nrow(coordinates))
write.csv(shuffle_idx, file = file.path(DATA_PATH, "ST_3D", "shuffle_idx.csv"), row.names = FALSE)
shuffle_idx <- read.csv(file.path(DATA_PATH, "ST_3D", "shuffle_idx.csv"), header = TRUE)[, 1]
length(unique(shuffle_idx))

coordinates_shuffle <- coordinates[shuffle_idx, ]
rownames(coordinates_shuffle) <- rownames(coordinates)
write.csv(coordinates_shuffle, file = file.path(DATA_PATH, "ST_3D", "3D_coordinates_shuffle.csv"))
coordinates_shuffle <- as.matrix(read.csv(
    file.path(DATA_PATH, "ST_3D", "3D_coordinates_shuffle.csv"),
    header = TRUE, row.names = 1, check.names = FALSE
))

coordinates <- coordinates[spots, ]
coordinates_shuffle <- coordinates_shuffle[spots, ]
dim(coordinates)
dim(coordinates_shuffle)

spot_effects_ST <- data.matrix(read.csv(
    file.path(DATA_PATH, "ST_3D", "spot_effects_ST.csv"),
    header = TRUE, row.names = 1, check.names = FALSE
))[spots, 1]

platform_effects_ST <- data.matrix(read.csv(
    file.path(DATA_PATH, "ST_3D", "platform_effects_ST.csv"),
    header = TRUE, row.names = 1, check.names = FALSE
))

library_sizes_ST <- data.matrix(read.csv(
    file.path(DATA_PATH, "ST_3D", "library_sizes_ST.csv"),
    header = TRUE, row.names = 1, check.names = FALSE
))[spots, 1]
  1. 59
  2. 6227
  1. 17086
  2. 59
New names:
 `` -> `...1`
Rows: 17086 Columns: 6228
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr    (1): ...1
dbl (6227): A230001M10Rik, A230006K03Rik, A230009B12Rik, A230065N10Rik, A230...

 Use `spec()` to retrieve the full column specification for this data.
 Specify the column types or set `show_col_types = FALSE` to quiet this message.
  1. 17086
  2. 6227
17086
  1. 17086
  2. 3
  1. 17086
  2. 3
[30]:
# Real case

mcube_object_3D <- createMCube(
    counts = counts, coordinates = coordinates,
    proportions = proportions_ST, library_sizes = library_sizes_ST,
    covariates = NULL, batch_id = batch_id,
    reference = reference_ST,
    spot_effects = spot_effects_ST, platform_effects = platform_effects_ST,
    project = "mouse_brain_3D"
)
mcube_object_3D <- mcubeFitNull(
    mcube_object_3D,
    num_workers = 70, num_threads = 1
)
mcube_object_3D <- mcubeTest(
    mcube_object_3D,
    num_workers = 70, num_threads = 1, shared_memory = TRUE
)

p <- mcubePlotPvalues(mcube_object_3D@pvalues, "combined_pvalue", nrow = 4)
p <- p +
    labs(title = "3D Spatial Transcriptomics") +
    theme(
        text = element_text(size = 16),
        plot.title = element_text(size = 20),
        axis.title = element_text(size = 20),
        axis.text.x = element_text(size = 16),
        axis.text.y = element_text(size = 16)
    )
ggsave(
    filename = file.path(
        RESULT_PATH, "ST_3D",
        paste0("pvalue_qqplot.png")
    ),
    plot = p, width = 12, height = 12
)
p
Select high-abundance cell types to analyze with proportion_threshold = 0.1 and celltype_threshold = 100.

mcubeFilterCellTypes: Cell type(s) Ext_Amy_2, Ext_Hpc_CA1, Ext_Hpc_CA3, Ext_Hpc_DG1, Ext_L23, Ext_L25, Ext_L56, Ext_L5_1, Ext_L5_2, Ext_L6, Ext_L6B, Ext_Med, Ext_Pir, Ext_Thal_1, Ext_Thal_2, Inh_1, Inh_2, Inh_4, Inh_5, Inh_Meis2_2, Inh_Meis2_3, Oligo_2 pass the threshold.

Cell type(s) Astro_AMY, Astro_AMY_CTX, Astro_CTX, Astro_HPC, Astro_HYPO, Astro_STR, Astro_THAL_hab, Astro_THAL_lat, Astro_THAL_med, Astro_WM, Endo, Ext_Amy_1, Ext_ClauPyr, Ext_Hpc_CA2, Ext_Hpc_DG2, Ext_L5_3, Ext_Unk_1, Ext_Unk_2, Ext_Unk_3, Inh_3, Inh_6, Inh_Lamp5, Inh_Meis2_1, Inh_Meis2_4, Inh_Pvalb, Inh_Sst, Inh_Vip, LowQ_1, LowQ_2, Micro, Nb_1, Nb_2, OPC_1, OPC_2, Oligo_1, Unk_1, Unk_2 has/have less than the minimum celltype_threshold = 100 with proportion_threshold = 0.1.
To include the above cell type(s), please reduce celltype_threshold or proportion_threshold.

Cell type(s) Ext_Amy_2, Ext_Hpc_CA1, Ext_Hpc_CA3, Ext_Hpc_DG1, Ext_L23, Ext_L25, Ext_L56, Ext_L5_1, Ext_L5_2, Ext_L6, Ext_L6B, Ext_Med, Ext_Pir, Ext_Thal_1, Ext_Thal_2, Inh_1, Inh_2, Inh_4, Inh_5, Inh_Meis2_2, Inh_Meis2_3, Oligo_2 will be analyzed.

Filter out lowly-expressed genes with gene_threshold = 5e-05.

mcubeFilterGenes: 2841 genes pass the threshold.

Select highly-expressed genes to analyze for each specific cell type with reference_threshold = 0.5.

mcubeFilterGenesCellType: Select 1091 genes to analyze for Ext_Amy_2.

mcubeFilterGenesCellType: Select 926 genes to analyze for Ext_Hpc_CA1.

mcubeFilterGenesCellType: Select 850 genes to analyze for Ext_Hpc_CA3.

mcubeFilterGenesCellType: Select 1113 genes to analyze for Ext_Hpc_DG1.

mcubeFilterGenesCellType: Select 1000 genes to analyze for Ext_L23.

mcubeFilterGenesCellType: Select 1049 genes to analyze for Ext_L25.

mcubeFilterGenesCellType: Select 1089 genes to analyze for Ext_L56.

mcubeFilterGenesCellType: Select 1000 genes to analyze for Ext_L5_1.

mcubeFilterGenesCellType: Select 982 genes to analyze for Ext_L5_2.

mcubeFilterGenesCellType: Select 1061 genes to analyze for Ext_L6.

mcubeFilterGenesCellType: Select 1126 genes to analyze for Ext_L6B.

mcubeFilterGenesCellType: Select 1073 genes to analyze for Ext_Med.

mcubeFilterGenesCellType: Select 1060 genes to analyze for Ext_Pir.

mcubeFilterGenesCellType: Select 1032 genes to analyze for Ext_Thal_1.

mcubeFilterGenesCellType: Select 1125 genes to analyze for Ext_Thal_2.

mcubeFilterGenesCellType: Select 1320 genes to analyze for Inh_1.

mcubeFilterGenesCellType: Select 1254 genes to analyze for Inh_2.

mcubeFilterGenesCellType: Select 1331 genes to analyze for Inh_4.

mcubeFilterGenesCellType: Select 1235 genes to analyze for Inh_5.

mcubeFilterGenesCellType: Select 1087 genes to analyze for Inh_Meis2_2.

mcubeFilterGenesCellType: Select 1062 genes to analyze for Inh_Meis2_3.

mcubeFilterGenesCellType: Select 1246 genes to analyze for Oligo_2.

Preprocessed data description: 17065 spots and 59 cell types in total. 16307 spots, 2840 genes, and 22 cell type(s) to analyze.

Number of physical cores: 72.

Number of workers: 70.

Number of thread(s) on BLAS per worker: 1.

mcubeKernel: length_scale is set as 0.15924412901695 for the Gaussian kernel.

mcubeKernel: length_scale is set as 0.225205206984061 for the Gaussian kernel.

mcubeKernel: length_scale is set as 0.15924412901695 for the Gaussian_transformed kernel.

mcubeKernel: length_scale is set as 0.225205206984061 for the Gaussian_transformed kernel.

Number of physical cores: 72.

Number of workers: 70.

Number of thread(s) on BLAS per worker: 1.

../../_images/analysis_mouse_brain_mouse_brain_MCube_39_1.png
[31]:
saveRDS(
    mcube_object_3D,
    file = file.path(
        RESULT_PATH, "ST_3D",
        paste0("mcube", ".rds")
    )
)
# mcube_object_3D <- readRDS(
#     file = file.path(
#         RESULT_PATH, "ST_3D",
#         paste0("mcube", ".rds")
#     )
# )
[32]:
# Negative control

mcube_object_null_3D <- mcube_object_3D
mcube_object_null_3D@coordinates <- as.matrix(coordinates_shuffle)
mcube_object_null_3D <- mcubeTest(
    mcube_object_null_3D,
    num_workers = 70, num_threads = 1, shared_memory = TRUE
)

pvalues_null_3D <- mcube_object_null_3D@pvalues
rm(mcube_object_null_3D)

p <- mcubePlotPvalues(pvalues_null_3D, "combined_pvalue", nrow = 4, under_null = TRUE)
p <- p +
    labs(title = "3D Spatial Transcriptomics") +
    theme(
        text = element_text(size = 16),
        plot.title = element_text(size = 20),
        axis.title = element_text(size = 20),
        axis.text.x = element_text(size = 16),
        axis.text.y = element_text(size = 16)
    )
ggsave(
    filename = file.path(
        RESULT_PATH, "ST_3D",
        paste0("p_value_qqplot_null.png")),
    plot = p, width = 12, height = 7
)
p
mcubeKernel: length_scale is set as 0.158383855293133 for the Gaussian kernel.

mcubeKernel: length_scale is set as 0.223988596216487 for the Gaussian kernel.

mcubeKernel: length_scale is set as 0.158383855293133 for the Gaussian_transformed kernel.

mcubeKernel: length_scale is set as 0.223988596216487 for the Gaussian_transformed kernel.

Number of physical cores: 72.

Number of workers: 70.

Number of thread(s) on BLAS per worker: 1.

../../_images/analysis_mouse_brain_mouse_brain_MCube_41_1.png
[33]:
saveRDS(
    pvalues_null_3D,
    file = file.path(
        RESULT_PATH, "ST_3D",
        paste0("mcube_pvalues_null", ".rds")
    )
)
# pvalues_null_3D <- readRDS(
#     file = file.path(
#         RESULT_PATH, "ST_3D",
#         paste0("mcube_pvalues_null", ".rds")
#     )
# )

Visualization

[34]:
mcube_object_3D <- readRDS(
    file = file.path(
        RESULT_PATH, "ST_3D",
        paste0("mcube", ".rds")
    )
)

pvalues_null_3D <- readRDS(
    file = file.path(
        RESULT_PATH, "ST_3D",
        paste0("mcube_pvalues_null", ".rds")
    )
)
[35]:
minus_log10p_max <- 5
demo_celltypes <- c("Ext_Thal_1", "Inh_1")
pvalues_plot_df <- data.frame()
for (celltype in demo_celltypes) {
    pvalues_plot_df <- rbind(
        pvalues_plot_df,
        data.frame(
            mcubeQQPlotDF(pvalues_null_3D[[celltype]], minus_log10p_max = 5),
            case = "Negative control",
            celltype = celltype
        ),
        data.frame(
            mcubeQQPlotDF(mcube_object_3D@pvalues[[celltype]], minus_log10p_max = 5),
            case = "Real case",
            celltype = celltype
        )
    )
}
pvalues_plot_df$case <- factor(pvalues_plot_df$case, levels = c("Real case", "Negative control"))
head(pvalues_plot_df[pvalues_plot_df$case == "Real case", ])
A data.frame: 6 × 7
genepvaluepvalue_theoreticalminus_log10pminus_log10p_theoreticalcasecelltype
<chr><dbl><dbl><dbl><dbl><fct><chr>
775Nsmf 1.045641e-630.000645994853.189771Real caseExt_Thal_1
776Pcp4 3.496110e-620.001937984552.712650Real caseExt_Thal_1
777Cplx1 2.413101e-430.003229974252.490801Real caseExt_Thal_1
778Atp2a2 7.523337e-360.004521963852.344673Real caseExt_Thal_1
779Gprasp16.810823e-310.005813953552.235528Real caseExt_Thal_1
780Rora 6.613392e-300.007105943252.148378Real caseExt_Thal_1
[36]:
demo_genes <- list(
    Ext_Thal_1 = c("Pcp4", "Nrxn3"),
    Inh_1 = c("Calb1", "Gnas")
)
demo_genes_df <- rbind(
    pvalues_plot_df[
        (pvalues_plot_df$celltype == demo_celltypes[1]) &
            (pvalues_plot_df$gene %in% demo_genes[[1]]) &
            (pvalues_plot_df$case == "Real case"),
    ],
    pvalues_plot_df[
        (pvalues_plot_df$celltype == demo_celltypes[2]) &
            (pvalues_plot_df$gene %in% demo_genes[[2]]) &
            (pvalues_plot_df$case == "Real case"),
    ]
)
demo_genes_df
A data.frame: 4 × 7
genepvaluepvalue_theoreticalminus_log10pminus_log10p_theoreticalcasecelltype
<chr><dbl><dbl><dbl><dbl><fct><chr>
776Pcp4 3.496110e-620.001937984552.712650Real caseExt_Thal_1
787Nrxn32.314395e-220.016149870851.791831Real caseExt_Thal_1
2488Calb11.305533e-570.000532481453.273696Real caseInh_1
2493Gnas 6.143010e-300.005857295052.232303Real caseInh_1
[37]:
expected_minus_log10p_lab <- expression(paste("Expected -log"[10], plain(P)))
observed_minus_log10p_lab <- expression(paste("Observed -log"[10], plain(P)))

p <- ggplot(data = pvalues_plot_df, aes(x = minus_log10p_theoretical, y = minus_log10p)) +
    geom_point(aes(color = case), size = 5, alpha = 0.5) +
    geom_abline(intercept = 0, slope = 1, linetype = "dashed", linewidth = 2) +
    geom_point(data = demo_genes_df, color = "dodgerblue", size = 5, alpha = 1) +
    ggrepel::geom_text_repel(
        data = demo_genes_df, aes(label = gene),
        color = "dodgerblue", size = 5, fontface = "bold.italic",
        direction = "y", vjust = 0, hjust = 1, point.size = 5
    ) +
    labs(
        title = "3D model reconstructed from the ST data",
        x = expected_minus_log10p_lab, y = observed_minus_log10p_lab
    ) +
    facet_wrap(. ~ celltype, nrow = 1) +
    theme_classic() +
    theme(
            text = element_text(family = "Helvetica", size = 16),
            plot.title = element_text(size = 20, hjust = 0.5),
            axis.title = element_text(size = 20),
            axis.text.x = element_text(size = 18),
            axis.text.y = element_text(size = 18),
            strip.text = element_text(size = 20),
            legend.title = element_blank(),
            legend.text = element_text(size = 20),
            legend.position = "bottom"
        )
ggsave(
    filename = file.path(RESULT_PATH, paste0("pvalue_qqplot_3D", ".pdf")),
    plot = p, width = 6, height = 5
)
p
../../_images/analysis_mouse_brain_mouse_brain_MCube_47_0.png
[38]:
# # For saving the plotly object as a static image with `kaleido`
# install.packages('reticulate')
# reticulate::install_miniconda()
# reticulate::conda_install('r-reticulate', 'python-kaleido')
# reticulate::conda_install('r-reticulate', 'plotly', channel = 'plotly')
# reticulate::use_miniconda('r-reticulate')
[39]:
celltype <- demo_celltypes[1]
p <- mcubePlotPropCellType3D(
    mcube_object_3D@proportions, mcube_object_3D@coordinates, celltype,
    spot_size = 1.5, opacity_background = 0.05,
    plotly_eye = list(x = 1.8, y = 0.5, z = 1.8)
)
plotly::save_image(
    p, file.path(RESULT_PATH, paste0("proportion_", celltype, "_3D", ".pdf")),
    width = 500, height = 500, scale = 5
)
p
[40]:
gene <- demo_genes[[demo_celltypes[1]]][1]
p <- mcubePlotExprCellType3D(
    mcube_object_3D, celltype, gene,
    spot_size = 1.5, opacity_target = 0.7, opacity_background = 0.05,
    plotly_eye = list(x = 1.8, y = 0.5, z = 1.8)
)
plotly::save_image(
    p, file.path(RESULT_PATH, paste0(celltype, "_", gene, "_3D", ".pdf")),
    width = 500, height = 500, scale = 5
)

gene <- demo_genes[[demo_celltypes[1]]][2]
p <- mcubePlotExprCellType3D(
    mcube_object_3D, celltype, gene,
    spot_size = 1.5, opacity_target = 0.7, opacity_background = 0.05,
    plotly_eye = list(x = 1.8, y = 0.5, z = 1.8)
)
plotly::save_image(
    p, file.path(RESULT_PATH, paste0(celltype, "_", gene, "_3D", ".pdf")),
    width = 500, height = 500, scale = 5
)
p
[41]:
celltype <- demo_celltypes[2]
p <- mcubePlotPropCellType3D(
    mcube_object_3D@proportions, mcube_object_3D@coordinates, celltype,
    spot_size = 1.5, opacity_background = 0.05,
    plotly_eye = list(x = -0.8, y = -1.2, z = 1.6)
)
plotly::save_image(
    p, file.path(RESULT_PATH, paste0("proportion_", celltype, "_3D", ".pdf")),
    width = 500, height = 500, scale = 5
)
p
[42]:
celltype <- demo_celltypes[2]
gene <- demo_genes[[demo_celltypes[2]]][1]
p <- mcubePlotExprCellType3D(
    mcube_object_3D, celltype, gene,
    spot_size = 1.5, opacity_target = 0.7, opacity_background = 0.05,
    plotly_eye = list(x = -0.8, y = -1.2, z = 1.6)
)
plotly::save_image(
    p, file.path(RESULT_PATH, paste0(celltype, "_", gene, "_3D", ".pdf")),
    width = 500, height = 500, scale = 5
)

gene <- demo_genes[[demo_celltypes[2]]][2]
p <- mcubePlotExprCellType3D(
    mcube_object_3D, celltype, gene,
    spot_size = 1.5, opacity_target = 0.7, opacity_background = 0.05,
    plotly_eye = list(x = -0.8, y = -1.2, z = 1.6)
)
plotly::save_image(
    p, file.path(RESULT_PATH, paste0(celltype, "_", gene, "_3D", ".pdf")),
    width = 500, height = 500, scale = 5
)
p
[43]:
# For 3D visualizations in the Allen Mouse Brain Common Coordinate Framework (CCFv3) using rgl and misc3d
demo_celltype_gene_pairs <- c(
    paste0(demo_celltypes[1], "_", demo_genes[[1]]),
    paste0(demo_celltypes[2], "_", demo_genes[[2]])
)
plot3D_data <- list(
    proportions = mcube_object_3D@proportions,
    coordinates = mcube_object_3D@coordinates,
    null_models = mcube_object_3D@null_models[demo_celltype_gene_pairs]
)
saveRDS(
    plot3D_data,
    file = file.path(
        RESULT_PATH, "ST_3D",
        paste0("plot3D_data", ".rds")
    )
)
[ ]: