Ideas / critique / comments? contact me on twitter or by email: mattmule AT gmail.com

Ming Tang tweeted:

1/ a question on CD4 mRNA vs protein. @CalebLareau I saw “CD4+ T cells express low levels of the CD4 transcript but very high levels of CD4 protein (Stoeckius et al., 2017)” in your paper 2/ is it based on sup figure 7 of Stoeckius et al. 2017? I used to think CD4 mRNA is not well captured in 10x

To investigate noise in CD4 mRNA, below I look at the bone marrow CITE-seq data generated by the Satija lab, distributed in the SeuratData package, and extract the CD4 mRNA and protein counts.

library(Seurat)
library(SeuratData)

data('bmcite')
s = bmcite
rm(bmcite)

# extract CD4 UMI counts for RNA and ADT (one value per cell)
# note: in Seurat v5 the `slot` argument is deprecated in favor of `layer`
prot = GetAssayData(object = s, slot = 'counts', assay = 'ADT')['CD4', ]
mRNA = GetAssayData(object = s, slot = 'counts', assay = 'RNA')['CD4', ]

d = cbind(
  s@meta.data, 
  CD4_PROT = prot, 
  CD4_mRNA = mRNA
  )

Since a major component of ADT noise in CITE-seq data is due to ambient capture of unbound ADTs, nearly ALL of the cells in this experiment are ‘positive’ for CD4 protein. For example, in raw UMI count space, what fraction of cells have at least one CD4 ADT count?

length(prot[prot > 0]) / length(prot)
## [1] 0.9999348

That’s all the cells except for 2.
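
A quick way to count the zero cells directly, rather than back-calculating from the fraction, is sketched below. The `toy_prot` vector is made up so the snippet runs on its own; on the real data you would substitute `prot`, and `sum(prot == 0)` should then give the 2 cells mentioned above.

```r
# Toy ADT count vector standing in for `prot` (values are made up for illustration)
toy_prot <- c(0, 12, 30, 950, 1100, 0, 25, 8)

# fraction of cells with at least one UMI:
# the mean of a logical vector is the proportion of TRUE values
frac_positive <- mean(toy_prot > 0)

# number of cells with exactly zero counts
n_zero <- sum(toy_prot == 0)

frac_positive
n_zero
```

`mean(x > 0)` is equivalent to `length(x[x > 0]) / length(x)` but avoids building the subset.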

The cell populations with the lowest levels of CD4 protein still have a mean of ~4.5 in log2 space, roughly 23 UMI counts (2^4.5 ≈ 22.6). The highest CD4 ADT expression is seen in the CD4 T cell clusters, which have about 1000 raw UMI counts. This is very different from the type of noise we see in mRNA data! We can also start to see that CD4 is actually not a very specific T cell marker. This is known: CD4 is ubiquitously expressed at lower levels on monocytes, as we can see below.

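library(ggplot2);  theme_set(theme_bw())
# boxplot of log2 CD4 ADT counts per cell type, ordered by mean log2 count
# (the few cells with 0 counts give log2(0) = -Inf and are dropped with a warning)
ggplot(d, aes(x = reorder(celltype.l2, log2(CD4_PROT)), y = log2(CD4_PROT))) +
  xlab("") +
  geom_boxplot(outlier.size = 0.1, size = 0.1) +
  # reference line at 10 ADT counts, as promised in the title
  geom_hline(yintercept = log2(10), color = 'black') +
  coord_flip() +
  ggtitle('black line = 10 ADT counts')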
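
The per-population numbers quoted above (mean log2 ADT count and the corresponding raw count) can also be tabulated rather than read off the plot. Below is a minimal sketch using base R `aggregate`. The `toy` data frame, its cell type labels, and its counts are all made up so the snippet is self-contained; with the real data you would use `d` instead, after handling the few cells where `CD4_PROT` is 0.

```r
# Toy data frame standing in for `d`; labels and counts are made up for illustration
toy <- data.frame(
  celltype.l2 = c("CD4 Naive", "CD4 Naive", "CD14 Mono", "CD14 Mono", "B cell", "B cell"),
  CD4_PROT    = c(900, 1100, 200, 260, 20, 26),
  CD4_mRNA    = c(1, 0, 2, 0, 0, 0)
)

# per-cell quantities: log2 protein counts and whether any CD4 mRNA was detected
toy$log2_prot <- log2(toy$CD4_PROT)
toy$mRNA_pos  <- toy$CD4_mRNA > 0

# per-population means: mean log2 protein and fraction of mRNA-positive cells
summary_tab <- aggregate(cbind(log2_prot, mRNA_pos) ~ celltype.l2, data = toy, FUN = mean)

# back-transform the mean log2 value to the count scale (this is the geometric mean)
summary_tab$approx_counts <- 2^summary_tab$log2_prot

# order from lowest to highest protein, matching the plot ordering
summary_tab[order(summary_tab$log2_prot), ]
```

Note that 2 raised to the mean of log2 counts is the geometric mean, which is usually somewhat lower than the arithmetic mean of the raw counts.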