CellChat/tutorial/CellChat_analysis_of_spatial_transcriptomics_data.Rmd at main · kimsjune/CellChat · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299

---
title: "CellChat inference and analysis of spatially proximal cell-cell communication from spatially resolved transcriptomics"
author: "Suoqin Jin"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
  html_document:
    toc: true
    theme: united
mainfont: Arial
vignette: >
  %\VignetteIndexEntry{CellChat inference and analysis of spatially proximal cell-cell communication from spatially resolved transcriptomics}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  root.dir = './'
)
#knitr::opts_chunk$set(eval = FALSE)
```


This vignette outlines the steps of inference, analysis and visualization of cell-cell communication network for **a single spatial transcriptomics dataset using CellChat**. We showcase CellChat’s application to spatial transcriptomics data by applying it to a mouse brain 10X visium dataset (https://www.10xgenomics.com/resources/datasets/mouse-brain-serial-section-1-sagittal-anterior-1-standard-1-0-0). Biological annotations of spots (i.e., cell group information) are predicted using Seurat (https://satijalab.org/seurat/articles/spatial_vignette.html).

CellChat requires gene expression and spatial location data of spots/cells as the user input and models the probability of cell-cell communication by integrating gene expression with spatial distance as well as prior knowledge of the interactions between signaling ligands, receptors and their cofactors.

Upon infering the intercellular communication network, CellChat's various functionality can be used for further data exploration, analysis, and visualization.


## Load the required libraries
```{r message=FALSE,warning=FALSE}
ptm = Sys.time()

library(CellChat)
library(patchwork)
options(stringsAsFactors = FALSE)
```

# Part I: Data input & processing and initialization of CellChat object

When inferring spatially-proximal cell-cell communication from spatially resolved transcriptomic data, user also should provide spatial coordinates/locations of spot/cell centroids. In addition, to filter out cell-cell communication beyond the maximum diffusion range of molecules (e.g., ~250μm), CellChat needs to compute the cell centroid-to-centroid distance in the unit of micrometers. Therefore, for spatial technologies that only provide spatial coordinates in pixels, CellChat converts spatial coordinates from pixels to micrometers by requiring users to input the conversion factor.

CellChat requires four user inputs:

* **`data.input` (Gene expression data of spots/cells)**: genes should be in rows with rownames and cells in columns with colnames. Normalized data (e.g., library-size normalization and then log-transformed with a pseudocount of 1) is required as input for CellChat analysis. If user provides count data, we provide a `normalizeData` function to account for library size and then do log-transformed.

* **`meta` (User assigned cell labels and samples labels)**: a data frame (rows are cells with rownames) consisting of cell information, which will be used for defining cell groups.  A column named `samples` should be provided for spatial transcriptomics analysis, which is useful for analyzing cell-cell communication by aggregating multiple samples/replicates. Of note, for comparison analysis across different conditions, users still need to create a CellChat object seperately for each condition.

* **`coordinates` (Spatial coordinates of spots/cells)**: a data frame in which each row gives the spatial coordinates/locations of each cell/spot centroid.

* **`spatial.factors` (Spatial factors of spatial distance)**: a data frame containing two distance factors `ratio` and `tol`, which is dependent on spatial transcriptomics technologies (and specific datasets).  (i) `ratio`: the conversion factor when converting spatial coordinates from Pixels or other units to Micrometers (i.e.,Microns). For example, setting `ratio = 0.18` indicates that 1 pixel equals 0.18um in the coordinates. (ii) `tol`: the tolerance factor to increase the robustness when comparing the center-to-center distance against the `interaction.range`. This can be the half value of cell/spot size in the unit of um. **Please check the [vignette of FAQ on applying CellChat to spatially resolved transcriptomics data](https://htmlpreview.github.io/?https://github.com/jinworks/CellChat/blob/master/tutorial/FAQ_on_applying_CellChat_to_spatial_transcriptomics_data.html) for detailed explanations and setting of different technologies of spatial transcriptomics data.**


When inferring contact-dependent or juxtacrine signaling by setting `contact.dependent = TRUE` in `computeCommunProb`, and using L-R pairs from `Cell-Cell Contact` signaling classified in `CellChatDB$interaction$annotation`, CellChat requires another one user input:

* **`contact.range`**: a value giving the interaction range (Unit: microns) to restrict the contact-dependent signaling. For spatial transcriptomics in a single-cell resolution, `contact.range` is approximately equal to the estimated cell diameter (i.e., the cell center-to-center distance), which means that contact-dependent or juxtacrine signaling can only happens when the two cells are contact to each other. Typically, `contact.range = 10`, which is a typical human cell size. However, for low-resolution spatial data such as 10X visium, it should be the cell center-to-center distance (i.e., `contact.range = 100` for 10X visium data). The function `computeCellDistance` can compute the center-to-center distance.

Instead of providing `contact.range`, users may alternatively provide a value of `contact.knn.k`, in order to restrict the contact-dependent signaling within the k-nearest neighbors (knn).


## Load data
```{r, fig.width=5,fig.height = 5, fig.wide = TRUE, fig.align = "center"}
# Here we load a Seurat object of 10X Visium mouse cortex data and its associated cell meta data
load("/Users/suoqinjin/Library/CloudStorage/OneDrive-Personal/works/CellChat/tutorial/visium_mouse_cortex_annotated_full.RData")
# show the image and annotated spots
color.use <- scPalette(nlevels(visium.brain)); names(color.use) <- levels(visium.brain)
Seurat::SpatialDimPlot(visium.brain, label = T, label.size = 3, cols = color.use)

# Prepare input data for CelChat analysis
data.input = Seurat::GetAssayData(visium.brain, slot = "data", assay = "SCT") # normalized data matrix

# define the meta data:
# a column named `samples` should be provided for spatial transcriptomics analysis, which is useful for analyzing cell-cell communication by aggregating multiple samples/replicates. Of note, for comparison analysis across different conditions, users still need to create a CellChat object seperately for each condition.
meta = data.frame(labels = Seurat::Idents(visium.brain), samples = "sample1", row.names = names(Seurat::Idents(visium.brain))) # manually create a dataframe consisting of the cell labels
meta$samples <- factor(meta$samples)
unique(meta$labels) # check the cell labels
unique(meta$samples) # check the sample labels

# load spatial transcriptomics information
# Spatial locations of spots from full (NOT high/low) resolution images are required. For 10X Visium, this information is in `tissue_positions.csv`.
spatial.locs = Seurat::GetTissueCoordinates(visium.brain, scale = NULL, cols = c("imagerow", "imagecol"))

# Spatial factors of spatial coordinates
# For 10X Visium, the conversion factor of converting spatial coordinates from Pixels to Micrometers can be computed as the ratio of the theoretical spot size (i.e., 65um) over the number of pixels that span the diameter of a theoretical spot size in the full-resolution image (i.e., 'spot_diameter_fullres' in pixels in the 'scalefactors_json.json' file).
# Of note, the 'spot_diameter_fullres' factor is different from the `spot` in Seurat object and thus users still need to get the value from the original json file.
scalefactors = jsonlite::fromJSON(txt = file.path("/Users/suoqinjin/Library/CloudStorage/OneDrive-Personal/works/CellChat/tutorial/spatial_imaging_data_visium-brain", 'scalefactors_json.json'))
spot.size = 65 # the theoretical spot size (um) in 10X Visium
conversion.factor = spot.size/scalefactors$spot_diameter_fullres
spatial.factors = data.frame(ratio = conversion.factor, tol = spot.size/2)

d.spatial <- computeCellDistance(coordinates = spatial.locs, ratio = spatial.factors$ratio, tol = spatial.factors$tol)
min(d.spatial[d.spatial!=0]) # this value should approximately equal 100um for 10X Visium data
```

## Create a CellChat object
**USERS can create a new CellChat object from a data matrix or Seurat.** If input is a Seurat object, the meta data in the object will be used by default and USER must provide `group.by` to define the cell groups. e.g, group.by = "ident" for the default cell identities in Seurat object.

**NB: If USERS load previously calculated CellChat object (version < 2.1.0), please update the object via `updateCellChat`**
```{r}
cellchat <- createCellChat(object = data.input, meta = meta, group.by = "labels",
                           datatype = "spatial", coordinates = spatial.locs, spatial.factors = spatial.factors)
cellchat
```

## Set the ligand-receptor interaction database
Before users can employ CellChat to infer cell-cell communication, they need to set the ligand-receptor interaction database and identify over-expressed ligands or receptors.

Our database CellChatDB is a manually curated database of literature-supported ligand-receptor interactions in both human and mouse. CellChatDB v2 contains ~3,300 validated molecular interactions, including ~40% of secrete autocrine/paracrine signaling interactions, ~17% of extracellular matrix (ECM)-receptor interactions, ~13% of cell-cell contact interactions and ~30% non-protein signaling. Compared to CellChatDB v1, CellChatDB v2 adds more than 1000 protein and non-protein interactions such as metabolic and synaptic signaling. It should be noted that for molecules that are not directly related to genes measured in scRNA-seq, CellChat v2 estimates the expression of ligands and receptors using those molecules’ key mediators or enzymes for potential communication mediated by non-proteins.

CellChatDB v2 also adds additional functional annotations of ligand-receptor pairs, such as UniProtKB keywords (including biological process, molecular function, functional class, disease, etc), subcellular location and relevance to neurotransmitter.

Users can update CellChatDB by adding their own curated ligand-receptor pairs. Please check the [tutorial on updating the ligand-receptor interaction database CellChatDB](https://htmlpreview.github.io/?https://github.com/jinworks/CellChat/blob/master/tutorial/Update-CellChatDB.html).

When analyzing human samples, use the database **`CellChatDB.human`**; when analyzing mouse samples, use the database **`CellChatDB.mouse`**. CellChatDB categorizes ligand-receptor pairs into different types, including “Secreted Signaling”, “ECM-Receptor”, “Cell-Cell Contact” and “Non-protein Signaling”. By default, the “Non-protein Signaling” are not used.

```{r, fig.width=6,fig.height = 2.5, fig.wide = TRUE, fig.align = "center"}
CellChatDB <- CellChatDB.mouse # use CellChatDB.human if running on human data
showDatabaseCategory(CellChatDB)
# Show the structure of the database
dplyr::glimpse(CellChatDB$interaction)

# use a subset of CellChatDB for cell-cell communication analysis
CellChatDB.use <- subsetDB(CellChatDB, search = "Secreted Signaling", key = "annotation") # use Secreted Signaling

# Only uses the Secreted Signaling from CellChatDB v1
#  CellChatDB.use <- subsetDB(CellChatDB, search = list(c("Secreted Signaling"), c("CellChatDB v1")), key = c("annotation", "version"))

# use all CellChatDB except for "Non-protein Signaling" for cell-cell communication analysis
# CellChatDB.use <- subsetDB(CellChatDB)


# use all CellChatDB for cell-cell communication analysis
# CellChatDB.use <- CellChatDB # simply use the default CellChatDB. We do not suggest to use it in this way because CellChatDB v2 includes "Non-protein Signaling" (i.e., metabolic and synaptic signaling) that can be only estimated from gene expression data.

# set the used database in the object
cellchat@DB <- CellChatDB.use
```

## Preprocessing the expression data for cell-cell communication analysis
To infer the cell state-specific communications, CellChat identifies over-expressed ligands or receptors in one cell group and then identifies over-expressed ligand-receptor interactions if either ligand or receptor are over-expressed.

We also provide a function to project gene expression data onto protein-protein interaction (PPI) network. Specifically, a diffusion process is used to smooth genes’ expression values based on their neighbors’ defined in a high-confidence experimentally validated protein-protein network. This function is useful when analyzing single-cell data with shallow sequencing depth because the projection reduces the dropout effects of signaling genes, in particular for possible zero expression of subunits of ligands/receptors. One might be concerned about the possible artifact introduced by this diffusion process, however, it will only introduce very weak communications. By default CellChat uses the raw data (i.e., `object@data.signaling`) instead of the projected data. To use the projected data, users should run the function `projectData` before running `computeCommunProb`, and then set `raw.use = FALSE` when running `computeCommunProb`.

```{r}
# subset the expression data of signaling genes for saving computation cost
cellchat <- subsetData(cellchat) # This step is necessary even if using the whole database
future::plan("multisession", workers = 4)
cellchat <- identifyOverExpressedGenes(cellchat)
cellchat <- identifyOverExpressedInteractions(cellchat, variable.both = F)

# project gene expression data onto PPI (Optional: when running it, USER should set `raw.use = FALSE` in the function `computeCommunProb()` in order to use the projected data)
# cellchat <- projectData(cellchat, PPI.mouse)
execution.time = Sys.time() - ptm
print(as.numeric(execution.time, units = "secs"))
```

# Part II: Inference of cell-cell communication network
CellChat infers the biologically significant cell-cell communication by assigning each interaction with a probability value and peforming a permutation test. CellChat models the probability of cell-cell communication by integrating gene expression with prior known knowledge of the interactions between signaling ligands, receptors and their cofactors using the law of mass action.

CAUTION: The number of inferred ligand-receptor pairs clearly depends on the **method for calculating the average gene expression per cell group**. By default, CellChat uses a statistically robust mean method called 'trimean', which produces fewer interactions than other methods. However, we find that CellChat performs well at predicting stronger interactions, which is very helpful for narrowing down on interactions for further experimental validations. In `computeCommunProb`, we provide an option for using other methods, such as 5% and 10% truncated mean, to calculating the average gene expression. Of note, 'trimean' approximates 25% truncated mean, implying that the average gene expression is zero if the percent of expressed cells in one group is less than 25%. To use 10% truncated mean, USER can set `type = "truncatedMean"` and `trim = 0.1`. To determine a proper value of trim, CellChat provides a function `computeAveExpr`, which can help to check the average expression of signaling genes of interest, e.g, `computeAveExpr(cellchat, features = c("CXCL12","CXCR4"), type =  "truncatedMean", trim = 0.1)`. Therefore, if well-known signaling pathways in the studied biological process are not predicted, users can try `truncatedMean` with lower values of `trim` to change the method for calculating the average gene expression per cell group.


## Compute the communication probability and infer cellular communication network
To quickly examine the inference results, USER can set `nboot = 20` in `computeCommunProb`. Then "pvalue < 0.05" means none of the permutation results are larger than the observed communication probability.

If well-known signaling pathways in the studied biological process are not predicted, USER can try `truncatedMean` with lower values of `trim` to change the method for calculating the average gene expression per cell group.

USERS may need to adjust the parameter `scale.distance` when working on data from other spatial transcriptomics technologies. Please check the documentation in detail via `?computeCommunProb`.

When inferring contact-dependent or juxtacrine signaling, users should provide a value of `contact.range` and set `contact.dependent = TRUE`. Briefly, users can set `contact.range = 10`, which is a typical human cell size. However, for low-resolution spatial data such as 10X visium, it should be the cell center-to-center distance (i.e., `contact.range = 100` for 10X visium data). Please check the [vignette of FAQ on applying CellChat to spatially resolved transcriptomics data](https://htmlpreview.github.io/?https://github.com/jinworks/CellChat/blob/master/tutorial/FAQ_on_applying_CellChat_to_spatial_transcriptomics_data.html) for detailed explanations. In this example, we did not use the L-R pairs from `Cell-Cell Contact` signaling, therefore we can set `contact.dependent = FALSE` and `contact.range = NULL`. But as an illustration, we use the following settings that lead to the same results.


```{r}
ptm = Sys.time()

cellchat <- computeCommunProb(cellchat, type = "truncatedMean", trim = 0.1,
                              distance.use = TRUE, interaction.range = 250, scale.distance = 0.01,
                              contact.dependent = TRUE, contact.range = 100)

```

Users can filter out the cell-cell communication if there are only few cells in certain cell groups. By default, the minimum number of cells required in each cell group for cell-cell communication is 10.
```{r}
cellchat <- filterCommunication(cellchat, min.cells = 10)
```

## Extract the inferred cellular communication network as a data frame
We provide a function `subsetCommunication` to easily access the inferred cell-cell communications of interest. For example,

* ```df.net <- subsetCommunication(cellchat)``` returns a data frame consisting of all the inferred cell-cell communications at the level of ligands/receptors. Set `slot.name = "netP"` to access the the inferred communications at the level of signaling pathways

* ```df.net <- subsetCommunication(cellchat, sources.use = c(1,2), targets.use = c(4,5))``` gives the inferred cell-cell communications sending from cell groups 1 and 2 to cell groups 4 and 5.

* ```df.net <- subsetCommunication(cellchat, signaling = c("WNT", "TGFb"))``` gives the inferred cell-cell communications mediated by signaling WNT and TGFb.


## Infer the cell-cell communication at a signaling pathway level
CellChat computes the communication probability on signaling pathway level by summarizing the communication probabilities of all ligands-receptors interactions associated with each signaling pathway.

NB: The inferred intercellular communication network of each ligand-receptor pair and each signaling pathway is stored in the slot 'net' and 'netP', respectively.

```{r}
cellchat <- computeCommunProbPathway(cellchat)
```

## Calculate the aggregated cell-cell communication network
We can calculate the aggregated cell-cell communication network by counting the number of links or summarizing the communication probability. USER can also calculate the aggregated network among a subset of cell groups by setting `sources.use` and `targets.use`.
```{r}
cellchat <- aggregateNet(cellchat)

execution.time = Sys.time() - ptm
print(as.numeric(execution.time, units = "secs"))
```

We can also visualize the aggregated cell-cell communication network. For example, showing the number of interactions or the total interaction strength (weights) between any two cell groups using circle plot or heatmap plot.
```{r, fig.width=10,fig.height = 5, fig.wide = TRUE, fig.align = "center"}

ptm = Sys.time()

groupSize <- as.numeric(table(cellchat@idents))
par(mfrow = c(1,2), xpd=TRUE)
netVisual_circle(cellchat@net$count, vertex.weight = rowSums(cellchat@net$count), weight.scale = T, label.edge= F, title.name = "Number of interactions")
netVisual_circle(cellchat@net$weight, vertex.weight = rowSums(cellchat@net$weight), weight.scale = T, label.edge= F, title.name = "Interaction weights/strength")
```
```{r, fig.width=3.5,fig.height = 3, fig.wide = TRUE, fig.align = "center"}
netVisual_heatmap(cellchat, measure = "count", color.heatmap = "Blues")
netVisual_heatmap(cellchat, measure = "weight", color.heatmap = "Blues")
```

# Part III: Visualization of cell-cell communication network
Upon infering the cell-cell communication network, CellChat provides various functionality for further data exploration, analysis, and visualization. Here we only showcase the `circle plot` and the new `spatial plot`.

**Visualization of cell-cell communication at different levels**: One can visualize the inferred communication network of signaling pathways using `netVisual_aggregate`, and visualize the inferred communication networks of individual L-R pairs associated with that signaling pathway using `netVisual_individual`.

Here we take input of one signaling pathway as an example. All the signaling pathways showing significant communications can be accessed by `cellchat@netP$pathways`.
```{r, fig.height = 5, fig.wide = TRUE, fig.align = "center"}
pathways.show <- c("IGF")
# Circle plot
par(mfrow=c(1,1), xpd = TRUE) # `xpd = TRUE` should be added to show the title
netVisual_aggregate(cellchat, signaling = pathways.show, layout = "circle")
# Spatial plot
par(mfrow=c(1,1))
netVisual_aggregate(cellchat, signaling = pathways.show, layout = "spatial", edge.width.max = 2, vertex.size.max = 1, alpha.image = 0.2, vertex.label.cex = 3.5)

execution.time = Sys.time() - ptm
print(as.numeric(execution.time, units = "secs"))
```

Compute and visualize the network centrality scores:
```{r, fig.height = 5, fig.wide = TRUE, fig.align = "center"}
# Compute the network centrality scores
cellchat <- netAnalysis_computeCentrality(cellchat, slot.name = "netP") # the slot 'netP' means the inferred intercellular communication network of signaling pathways
# Visualize the computed centrality scores using heatmap, allowing ready identification of major signaling roles of cell groups
par(mfrow=c(1,1))
netAnalysis_signalingRole_network(cellchat, signaling = pathways.show, width = 8, height = 2.5, font.size = 10)

# USER can show this information on the spatial transcriptomics when visualizing a signaling network, e.g., bigger circle indicates larger incoming signaling
par(mfrow=c(1,1))
netVisual_aggregate(cellchat, signaling = pathways.show, layout = "spatial", edge.width.max = 2, alpha.image = 0.2, vertex.weight = "incoming", vertex.size.max = 4, vertex.label.cex = 3.5)

```

Visualize gene expression distribution on tissue
```{r, fig.height = 5, fig.wide = TRUE, fig.align = "center", eval=TRUE}
# Take an input of a few genes
spatialFeaturePlot(cellchat, features = c("Igf1","Igf1r"), point.size = 0.8, color.heatmap = "Reds", direction = 1)

# Take an input of a ligand-receptor pair
spatialFeaturePlot(cellchat, pairLR.use = "IGF1_IGF1R", point.size = 0.5, do.binary = FALSE, cutoff = 0.05, enriched.only = F, color.heatmap = "Reds", direction = 1)

# Take an input of a ligand-receptor pair and show expression in binary
spatialFeaturePlot(cellchat, pairLR.use = "IGF1_IGF1R", point.size = 1, do.binary = TRUE, cutoff = 0.05, enriched.only = F, color.heatmap = "Reds", direction = 1)

```

**NB: Upon infering the intercellular communication network from spatial transcriptomics data, CellChat's various functionality can be used for further data exploration, analysis, and visualization. Please check other functionalities in the [basic tutorial of CellChat](https://htmlpreview.github.io/?https://github.com/jinworks/CellChat/blob/master/tutorial/CellChat-vignette.html)**


# Part V: Save the CellChat object
```{r eval=FALSE}
saveRDS(cellchat, file = "cellchat_visium_mouse_cortex.rds")
```

# Part VI: Explore the cell-cell communication through the Interactive CellChat Explorer

```{r eval=FALSE}
runCellChatApp(cellchat)
```

# Part VII: Application to different technologies of spatially-resolved transcriptomics
Please check the [vignette of FAQ on applying CellChat to spatially resolved transcriptomics data](https://htmlpreview.github.io/?https://github.com/jinworks/CellChat/blob/master/tutorial/FAQ_on_applying_CellChat_to_spatial_transcriptomics_data.html) for detailed setting of different technologies of spatial transcriptomics data.

```{r}
sessionInfo()
```