Skill v1.0.1
version: "1.0.1"
name: torbcellselection
description: Separates T and non-T cells or B and non-B cells from a mixed cell population. Uses either clonotype percentage from VDJ data, indicator gene expression (CD3 markers for T cells, CD19/CD20 for B cells), custom selector expressions, or k-means clustering for automatic selection.
TOrBCellSelection Process Configuration
Purpose
Separates T and non-T cells or B and non-B cells from a mixed cell population. Uses either clonotype percentage from VDJ data, indicator gene expression (CD3 markers for T cells, CD19/CD20 for B cells), custom selector expressions, or k-means clustering for automatic selection.
When to Use
- When dataset contains mixed cell types (T cells + other cell types, or B cells + other cell types)
- Before TCR-specific or BCR-specific analysis to isolate relevant cells
- After SeuratClusteringOfAllCells, to identify which clusters are T/B cells
- When scRNA-seq data includes scTCR-seq or scBCR-seq data
- DO NOT use if all cells in your dataset are already T/B cells
Configuration Structure
Process Enablement
[TOrBCellSelection]
cache = true  # Enable caching for this process
Input Specification
[TOrBCellSelection.in]
# Seurat object file (RDS/qs2 format) from SeuratClusteringOfAllCells
srtobj = ["SeuratClusteringOfAllCells"]
# Optional: Immune repertoire data file (RDS/qs2 format) from ScRepLoading
# Required unless ignore_vdj is set to true
immdata = ["ScRepLoading"]
Environment Variables
[TOrBCellSelection.envs]
# Whether to ignore VDJ information and use only marker gene expression
ignore_vdj = false
# Custom R expression to identify T/B cells
# Example: "Clonotype_Pct > 0.25" selects cells with >25% clonotype percentage
# Can use indicator genes: "Clonotype_Pct > 0.25 & CD3E > 0"
# If not provided, k-means clustering will be used
# Unset by default; TOML has no null value, so leave the key out to keep it unset
# selector = "Clonotype_Pct > 0.25"
# List of indicator genes for T/B cell identification
# For T cells: ["CD3E", "CD3D", "CD3G"] (positive markers)
# or include negative markers: ["CD3E", "CD19", "CD14"]
# For B cells: ["CD19", "MS4A1", "CD79A", "CD79B"]
indicator_genes = ["CD3E"]
# Parameters for k-means clustering (if selector not provided)
# Reference: https://rdrr.io/r/stats/kmeans.html
# Note: dots in argument names should be replaced with hyphens
kmeans = {nstart = 25}
Configuration Examples
Minimal Configuration (Default T Cell Markers)
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
What this does: Uses default CD3E marker + k-means clustering with VDJ data to automatically select T cell clusters.
T Cell Selection with Multiple CD3 Markers
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# Use all three CD3 markers for robust T cell identification
indicator_genes = ["CD3E", "CD3D", "CD3G"]
B Cell Selection (Default Markers)
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# Select B cells using CD19 and CD20 (MS4A1) markers
indicator_genes = ["CD19", "MS4A1"]
Selection by Clonotype Percentage Threshold
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# Select cells/clusters with >25% clonotype percentage as T/B cells
selector = "Clonotype_Pct > 0.25"
Selection Combined with Marker Expression
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# Select cells with high clonotype percentage AND CD3E expression
indicator_genes = ["CD3E"]
selector = "Clonotype_Pct > 0.25 & CD3E > 0"
Selection Without VDJ Data (Markers Only)
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
[TOrBCellSelection.envs]
# Ignore VDJ data, use only marker gene expression
ignore_vdj = true
# Need at least 2 markers for k-means when VDJ is ignored
# First gene must be a positive marker for selection
# (CD3E is positive for T cells)
indicator_genes = ["CD3E", "CD3D", "CD3G"]
B Cell Selection Without VDJ Data
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
[TOrBCellSelection.envs]
# Select B cells using markers only (no VDJ data)
ignore_vdj = true
indicator_genes = ["CD19", "MS4A1", "CD79A"]
Custom K-means Parameters
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
indicator_genes = ["CD3E", "CD3D", "CD3G"]
# Custom k-means parameters
# nstart: number of random starts for stability (default: 25)
# iter.max: maximum iterations (default: 10 in R)
# Note: hyphens instead of dots in key names
kmeans = {nstart = 50, iter-max = 20}
Common Patterns
Pattern 1: Standard T Cell Selection (with VDJ)
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# Robust T cell selection using all three CD3 markers
indicator_genes = ["CD3E", "CD3D", "CD3G"]
When to use: Typical TCR-seq analysis where T cells need to be separated from other cell types.
Pattern 2: Standard B Cell Selection (with VDJ)
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# B cell selection using CD19 and CD20 markers
indicator_genes = ["CD19", "MS4A1"]
When to use: BCR-seq analysis where B cells need to be separated from other cell types.
Pattern 3: High-Sensitivity T Cell Selection
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# Lower threshold to capture more T cells
selector = "Clonotype_Pct > 0.10 & CD3E > 0"
When to use: When you suspect low-quality VDJ data or want to capture borderline T cells.
Pattern 4: High-Specificity T Cell Selection
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# Higher threshold for clean T cell population
selector = "Clonotype_Pct > 0.50 & CD3E > 1"
When to use: When you want only the highest-confidence T cells (e.g., for clonal expansion analysis).
Pattern 5: Auto-Selection (K-means) with Multiple Markers
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# Let k-means determine T cell clusters automatically
# No selector = automatic selection
indicator_genes = ["CD3E", "CD3D", "CD3G"]
kmeans = {nstart = 50}
When to use: When you don't have a specific threshold in mind and want automatic unsupervised selection.
Dependencies
Upstream Processes
- SeuratClusteringOfAllCells: Provides clustered Seurat object with seurat_clusters metadata
- ScRepLoading: Provides VDJ data with clonotype information (unless ignore_vdj = true)
Downstream Processes
- SeuratClustering: Clusters the selected T/B cells for downstream analysis
- ScRepCombiningExpression: Combines selected cells with VDJ data
- ModuleScoreCalculator: Calculates module scores on selected cells
- Other TCR/BCR-specific processes (CDR3Clustering, TESSA, ClonalStats, etc.)
Workflow Integration
SeuratPreparing → SeuratClusteringOfAllCells → TOrBCellSelection → SeuratClustering → (downstream TCR/BCR analysis)
                                                       ↑
                                                 ScRepLoading
Selection Methods Explained
Method 1: K-means Clustering (Default)
When selector is not provided, TOrBCellSelection performs:
- Calculates average expression of indicator genes per cluster
- If VDJ data available: calculates clonotype percentage per cluster
- Performs k-means clustering (K=2) on [gene expressions + clonotype_pct]
- Selects cluster with higher clonotype percentage (or higher expression of first indicator gene if no VDJ)
Pros: Automatic, unsupervised, adapts to data
Cons: May select unexpected clusters if data is noisy
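The steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the process itself calls R's stats::kmeans with random restarts, whereas the sketch uses a small deterministic Lloyd's algorithm, and the function names (zscore, two_means, select_clusters) and the per-cluster table are hypothetical.

```python
import math


def zscore(values):
    """Scale values to mean 0 and sample SD 1 (constant columns become 0)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1)) if n > 1 else 0.0
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]


def two_means(points, seed_col, max_iter=10):
    """Lloyd's algorithm with K=2, seeded at the rows with the lowest and
    highest value in column seed_col (a deterministic stand-in for nstart)."""
    order = sorted(range(len(points)), key=lambda i: points[i][seed_col])
    centers = [list(points[order[0]]), list(points[order[-1]])]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_clusters(table, genes, use_vdj=True):
    """table maps cluster id -> {gene: mean expression, "Clonotype_Pct": float}."""
    clusters = sorted(table)
    features = list(genes) + (["Clonotype_Pct"] if use_vdj else [])
    # The group is chosen by clonotype percentage, or by the first gene without VDJ.
    key = "Clonotype_Pct" if use_vdj else genes[0]
    columns = [zscore([table[c][f] for c in clusters]) for f in features]
    points = [list(row) for row in zip(*columns)]
    labels = two_means(points, seed_col=features.index(key))

    def group_mean(k):
        vals = [table[c][key] for c, lab in zip(clusters, labels) if lab == k]
        return sum(vals) / len(vals) if vals else float("-inf")

    target = max((0, 1), key=group_mean)
    return [c for c, lab in zip(clusters, labels) if lab == target]


# Made-up per-cluster averages, for illustration only.
table = {
    "0": {"CD3E": 2.1, "CD3D": 1.8, "Clonotype_Pct": 0.82},
    "1": {"CD3E": 1.9, "CD3D": 1.6, "Clonotype_Pct": 0.75},
    "2": {"CD3E": 0.1, "CD3D": 0.0, "Clonotype_Pct": 0.03},
    "3": {"CD3E": 0.2, "CD3D": 0.1, "Clonotype_Pct": 0.05},
}
selected = select_clusters(table, ["CD3E", "CD3D"])
print(selected)
```

With well-separated groups like these, both the VDJ and the marker-only variants pick clusters 0 and 1. On noisier data the two groups can split in unexpected ways, which is why the k-means plot is worth checking.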
Method 2: Custom Selector Expression
Provide a custom R expression via selector:
- Can use any metadata column: Clonotype_Pct > 0.25
- Can combine with gene expression: Clonotype_Pct > 0.25 & CD3E > 0
- Can use complex logic: (Clonotype_Pct > 0.25 | CD3E > 1) & CD19 < 0.1
Pros: Full control, transparent selection criteria
Cons: Requires domain knowledge, need to test thresholds
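To make the effect of the three example expressions concrete, the sketch below mirrors them as Python predicates over a made-up per-cluster table. The process itself evaluates the selector as an R expression; the table values, the selectors dictionary, and apply_selector are hypothetical and exist only for this illustration.

```python
# Hypothetical per-cluster summary; values are invented for illustration.
clusters = [
    {"Cluster": "0", "Clonotype_Pct": 0.82, "CD3E": 2.1, "CD19": 0.0},
    {"Cluster": "1", "Clonotype_Pct": 0.30, "CD3E": 0.0, "CD19": 1.4},
    {"Cluster": "2", "Clonotype_Pct": 0.10, "CD3E": 1.5, "CD19": 0.05},
    {"Cluster": "3", "Clonotype_Pct": 0.02, "CD3E": 0.1, "CD19": 0.0},
]

# Python equivalents of the R selector expressions listed above.
selectors = {
    "Clonotype_Pct > 0.25": lambda r: r["Clonotype_Pct"] > 0.25,
    "Clonotype_Pct > 0.25 & CD3E > 0": lambda r: (
        r["Clonotype_Pct"] > 0.25 and r["CD3E"] > 0
    ),
    "(Clonotype_Pct > 0.25 | CD3E > 1) & CD19 < 0.1": lambda r: (
        (r["Clonotype_Pct"] > 0.25 or r["CD3E"] > 1) and r["CD19"] < 0.1
    ),
}


def apply_selector(rows, predicate):
    """Return the ids of the clusters for which the predicate is true."""
    return [r["Cluster"] for r in rows if predicate(r)]


results = {expr: apply_selector(clusters, pred) for expr, pred in selectors.items()}
for expr, kept in results.items():
    print(f"{expr!r} keeps clusters {kept}")
```

Cluster 1 shows why combining conditions matters: it passes the clonotype threshold alone but is dropped as soon as a CD3E or CD19 condition is added.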
Method 3: Marker-Only Selection (ignore_vdj)
Set ignore_vdj = true to use only marker genes:
- Useful when VDJ data is poor or missing
- Requires at least 2 indicator genes for k-means
- First gene in list must be positive marker for the target cell type
Pros: Works without VDJ data, robust marker-based selection
Cons: Requires good marker genes, may include non-clonal cells
Marker Gene Recommendations
T Cell Markers
Positive markers (expressed in T cells):
- CD3E: Core CD3 epsilon chain (most reliable)
- CD3D: Core CD3 delta chain
- CD3G: Core CD3 gamma chain
Negative markers (excluded from T cells):
- CD19: B cell marker
- MS4A1 (CD20): B cell marker
- CD14: Monocyte marker
- CD68: Macrophage marker
Recommended for T cells:
indicator_genes = ["CD3E", "CD3D", "CD3G"]
B Cell Markers
Positive markers (expressed in B cells):
- CD19: Pan-B cell marker (most reliable)
- MS4A1 (CD20): Mature B cell marker
- CD79A: B cell receptor component
- CD79B: B cell receptor component
Recommended for B cells:
indicator_genes = ["CD19", "MS4A1"]
Subtype-Specific Markers
For selecting specific T/B cell subtypes:
- T helper cells: CD4
- Cytotoxic T cells: CD8A, CD8B
- Regulatory T cells: FOXP3, IL2RA
- Memory B cells: CD27
- Plasma cells: CD38, SDC1 (CD138)
Validation Rules
Required Inputs
- srtobj must be specified (from SeuratClusteringOfAllCells)
- immdata required unless ignore_vdj = true
Marker Gene Validation
- Must provide at least 1 indicator gene
- If ignore_vdj = true, must provide at least 2 indicator genes
- First gene in indicator_genes must be a positive marker when using k-means without VDJ data
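The rules above can be checked before a run with a few lines of Python. This is a hypothetical pre-flight helper, not part of the process. Skipping the two-gene rule when a selector is set is an assumption based on the troubleshooting advice to "use a custom selector", and whether the first gene is a positive marker cannot be checked mechanically.

```python
def check_indicator_genes(indicator_genes, ignore_vdj=False, selector=None):
    """Return a list of problems; an empty list means the rules above hold."""
    problems = []
    if len(indicator_genes) < 1:
        problems.append("Provide at least 1 indicator gene.")
    # Assumption: the 2-gene rule only applies when k-means runs without VDJ data,
    # that is, when no custom selector is given.
    if ignore_vdj and selector is None and len(indicator_genes) < 2:
        problems.append(
            "You need at least 2 markers to perform k-means clustering "
            "with VDJ data being ignored"
        )
    return problems


print(check_indicator_genes(["CD3E"], ignore_vdj=True))
print(check_indicator_genes(["CD3E", "CD3D", "CD3G"], ignore_vdj=True))
```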
Selector Expression Validation
- selector must be a valid R expression
- Can reference: metadata columns (e.g., Clonotype_Pct), indicator genes (e.g., CD3E)
- Use R logical operators: & (and), | (or), ! (not)
K-means Parameter Validation
- kmeans must be a set of key-value pairs (an inline table such as {nstart = 50} in a TOML config)
- Valid keys: nstart, iter-max, algorithm, etc. (see stats::kmeans documentation)
- Dots in R argument names replaced with hyphens (e.g., iter.max → iter-max)
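The hyphen rule amounts to a simple key rename. The snippet below shows the mapping it describes; how the process performs the conversion internally is not shown in this document, and to_r_kmeans_args is a hypothetical name.

```python
def to_r_kmeans_args(kmeans_cfg):
    """Map config keys such as 'iter-max' to R argument names such as 'iter.max'."""
    return {key.replace("-", "."): value for key, value in kmeans_cfg.items()}


args = to_r_kmeans_args({"nstart": 50, "iter-max": 20})
print(args)  # {'nstart': 50, 'iter.max': 20}
```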
Troubleshooting
Issue: "No clonotype information found"
Cause: Barcode mismatch between scRNA-seq and VDJ data
Solution:
- Check barcode formats match in both datasets
- Verify ScRepLoading processed VDJ data correctly
- Try ignore_vdj = true to use marker genes only
Issue: "You need at least 2 markers to perform k-means clustering with VDJ data being ignored"
Cause: Using ignore_vdj = true with only 1 indicator gene
Solution: Add more indicator genes or use a custom selector
Issue: Selected cells are not what I expected
Cause: K-means selected the wrong cluster
Solution:
- Check the k-means plot in details/kmeans.png
- Adjust indicator_genes to include more robust markers
- Use custom selector instead of automatic selection
- Increase nstart in kmeans for more stable clustering (e.g., kmeans = {nstart = 50})
Issue: Too few or too many cells selected
Cause: Threshold too high or too low
Solution:
- Adjust selector threshold (e.g., Clonotype_Pct > 0.20 vs 0.30)
- Review the selection table in details/data.txt
- Check scatter plots in details/ directory for gene vs clonotype relationships
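Before re-running with a new threshold, it can help to estimate how many cells each candidate threshold would keep. The sketch below does this on invented per-cluster values. In practice the numbers would come from the Clonotype_Pct and Cluster_Size columns of details/data.txt, whose exact file layout is not specified here.

```python
# Hypothetical per-cluster values: (Clonotype_Pct, Cluster_Size).
clusters = {
    "0": (0.82, 1200),
    "1": (0.28, 800),
    "2": (0.22, 500),
    "3": (0.03, 1500),
}


def cells_kept(threshold):
    """Count cells in clusters whose Clonotype_Pct exceeds the threshold."""
    return sum(size for pct, size in clusters.values() if pct > threshold)


for threshold in (0.20, 0.25, 0.30):
    print(f"Clonotype_Pct > {threshold:.2f}: {cells_kept(threshold)} cells kept")
```

Because selection works on whole clusters, the count changes in steps: here, moving from 0.20 to 0.30 drops two entire clusters rather than trimming cells gradually.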
Issue: All cells selected as T cells (or none selected)
Cause: Poor VDJ data or incorrect marker genes
Solution:
- Verify VDJ data quality in ScRepLoading output
- Check if CD3E is actually expressed in your data
- Use ignore_vdj = true with robust marker genes
- Manually inspect expression plots before running selection
Output Files
Primary Output
- outfile: Seurat object (qs2 format) containing only selected T/B cells
  - Located at: {{in.srtobj | stem}}.qs
  - Contains all original metadata + subset of cells
Detailed Output Directory (details/)
- data.txt: Table of indicator gene expression and clonotype percentage per cluster
  - Shows: Cluster, indicator gene expression, Clonotype_Pct, Cluster_Size, is_selected
- kmeans.png: K-means clustering visualization (if k-means used)
- selected_cells_per_sample.png: Bar plot of selected cells per sample
- selected_cells_pie.png: Pie chart of selected vs other cells
- selected-cells.png: Dimension plots showing VDJ data and selected cells
- feature-plots.png: Feature plots of indicator genes
Report
Interactive HTML report with visualization of selection results and cell composition.
Common Use Cases
Use Case 1: TCR-seq Analysis of PBMC Data
# Standard TCR-seq workflow
[SeuratClusteringOfAllCells]
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
indicator_genes = ["CD3E", "CD3D", "CD3G"]
[SeuratClustering]
# Clustering of selected T cells
[CDR3Clustering]
[TESSA]
[ClonalStats]
Use Case 2: BCR-seq Analysis of Tumor-Infiltrating Lymphocytes
[SeuratClusteringOfAllCells]
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
immdata = ["ScRepLoading"]
[TOrBCellSelection.envs]
# Select B cells from TILs
indicator_genes = ["CD19", "MS4A1", "CD79A"]
selector = "CD19 > 0.5"
[SeuratClustering]
[CDR3Clustering]
[CellCellCommunication]
Use Case 3: RNA-only Data with T/B Cell Separation
[SeuratClusteringOfAllCells]
[TOrBCellSelection]
[TOrBCellSelection.in]
srtobj = ["SeuratClusteringOfAllCells"]
[TOrBCellSelection.envs]
# No VDJ data, use markers only
ignore_vdj = true
indicator_genes = ["CD3E", "CD3D", "CD3G"]
[SeuratClustering]
[ScFGSEA]
[CellCellCommunication]
Key Notes
- Not for Pure T/B Cell Populations: If all cells are already T or B cells, skip this process and use SeuratClustering directly.
- Cluster-Level Selection: Selection happens at the cluster level, not single-cell level. All cells in selected clusters are kept.
- Normalization: Gene expression values are normalized (mean=0, SD=1) before k-means clustering.
- Marker First: When using k-means without VDJ data, the first indicator gene must be a positive marker for your target cell type.
- Report Review: Always review the HTML report and plots in details/ to verify selection quality.
- Threshold Tuning: Start with default k-means, then adjust to custom selector if automatic selection is not satisfactory.
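The cluster-level note has a practical consequence that a small Python sketch makes visible. It assumes Clonotype_Pct is the fraction of cells in a cluster that carry clonotype information, which this document does not define explicitly. The cell records and the function name are hypothetical.

```python
from collections import defaultdict

# Made-up cells: cluster membership and whether a clonotype was detected.
cells = [
    {"barcode": "c1", "cluster": "0", "has_clonotype": True},
    {"barcode": "c2", "cluster": "0", "has_clonotype": True},
    {"barcode": "c3", "cluster": "0", "has_clonotype": False},
    {"barcode": "c4", "cluster": "1", "has_clonotype": False},
    {"barcode": "c5", "cluster": "1", "has_clonotype": False},
    {"barcode": "c6", "cluster": "1", "has_clonotype": True},
    {"barcode": "c7", "cluster": "1", "has_clonotype": False},
]


def clonotype_pct_by_cluster(cells):
    """Assumed definition: share of cells per cluster with a clonotype."""
    totals, hits = defaultdict(int), defaultdict(int)
    for cell in cells:
        totals[cell["cluster"]] += 1
        hits[cell["cluster"]] += int(cell["has_clonotype"])
    return {c: hits[c] / totals[c] for c in totals}


pct = clonotype_pct_by_cluster(cells)
selected = {c for c, p in pct.items() if p > 0.5}
# Every cell in a selected cluster is kept, whatever its own clonotype status.
kept = [cell["barcode"] for cell in cells if cell["cluster"] in selected]
print(kept)
```

Cell c3 has no clonotype but is kept because its cluster is selected, while c6 has a clonotype but is dropped with its cluster. This is the behavior to keep in mind when reviewing the selection plots.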