version: "1.0.1"
name: singlem
description: Use SingleM to profile metagenomes and genomes, generate OTU tables, and produce GTDB-based taxonomic profiles.
SingleM Taxonomic Profiling Skill
Overview
SingleM is a tool for profiling shotgun metagenomes (short- and long-read) by targeting 20-amino-acid "window" sequences within single-copy marker genes. It generates GTDB-based taxonomic profiles and is particularly strong at handling novel lineages.
The primary subcommand for taxonomic profiling is singlem pipe.
Skill corresponds to SingleM v0.21.3.
Installation
Conda (recommended)
    conda create -c conda-forge -c bioconda --override-channels \
        --name singlem 'singlem>=0.21.3'
    conda activate singlem

    # Download reference data (metapackage); required after conda install
    singlem data --output-directory /path/to/metapackage
Docker (includes reference data — no separate data download needed)
    docker pull wwood/singlem:0.21.3

    # Run pipe directly:
    docker run -v `pwd`:`pwd` wwood/singlem:0.21.3 pipe \
        --sequences `pwd`/my.fastq.gz -p `pwd`/my.profile.csv --threads 4
Singularity/Apptainer
    singularity pull docker://wwood/singlem:0.21.3
    singularity run -B `pwd`:`pwd` singlem_0.21.3.sif pipe \
        --sequences `pwd`/my.fastq.gz -p `pwd`/my.profile.csv --threads 4
Core Concepts
- OTU table: The intermediate output of `singlem pipe`. Contains per-marker-gene OTU sequences with their coverage/abundance across samples.
- Taxonomic profile (condensed profile): The final output summarising community composition. Generated from the OTU table via the `condense` algorithm, which uses trimmed means and expectation maximisation across 59 marker genes.
- Coverage: The expected per-base coverage of a genome with that OTU sequence. Derived from `num_hits`. The default minimum coverage to report in a taxonomic profile is 0.35× for reads, 0.1× for genomes.
- GTDB taxonomy: SingleM uses GTDB taxonomy strings (e.g. `Root; d__Bacteria; p__Proteobacteria; ...`).
SingleM Subcommands at a Glance
SingleM (and its phage-focused sibling Lyrebird) is a suite of subcommands. Most users only need pipe (and data once, to fetch reference data). The rest support downstream analysis, reference-data management, and package development.
Main tools
| Subcommand | Purpose |
|---|---|
| `singlem pipe` | Main workflow: profile reads/genomes → OTU table + GTDB taxonomic profile |
| `singlem data` | Download / verify the reference metapackage |
| `singlem summarise` | Mechanical transformations of pipe results (Krona, species-by-site tables, combining OTU tables, etc.) |
| `singlem renew` | Re-run taxonomy assignment on an existing archive OTU table against a new metapackage |
| `singlem supplement` | Add new genomes to a metapackage to create a custom reference |
| `singlem prokaryotic_fraction` | Estimate the bacterial/archaeal fraction (and average genome size) of a metagenome |
| `singlem appraise` | Assess how much of a metagenome is represented by a set of genomes/assemblies |
| `lyrebird data` | Download / verify the Lyrebird (phage) reference metapackage |
| `lyrebird pipe` | Profile dsDNA phages; same interface as `singlem pipe` |
Advanced / expert modes
| Subcommand | Purpose |
|---|---|
| `singlem condense` | Generate a taxonomic profile from an existing (archive) OTU table |
| `singlem makedb` | Build a searchable database (.sdb) from OTU tables |
| `singlem query` | Find sequences in a makedb database similar to query OTU sequences |
| `singlem seqs` | Choose the best window position within an HMM (step 1 of building a SingleM package) |
| `singlem create` | Create a SingleM package from a GraftM package + taxonomy (step 2 of package building) |
| `singlem regenerate` | Update an existing SingleM package with new sequences/taxonomy |
| `singlem metapackage` | Create (or `--describe`) a metapackage from individual SingleM packages |
| `lyrebird condense` | `condense` for Lyrebird (non-universal phage markers) |
| `lyrebird renew` | `renew` for Lyrebird archive OTU tables |
Generating a Taxonomic Profile
Basic usage — paired-end short reads
    singlem pipe \
        --forward sample_R1.fastq.gz \
        --reverse sample_R2.fastq.gz \
        --taxonomic-profile sample.profile.tsv \
        --threads 8
Single-end or unpaired reads
    singlem pipe \
        --sequences sample.fastq.gz \
        -p sample.profile.tsv \
        --threads 8
(-p is the short form of --taxonomic-profile)
Long reads (Nanopore ≥R10.4.1 or PacBio HiFi)
    singlem pipe \
        --sequences sample_nanopore.fastq.gz \
        -p sample.profile.tsv \
        --threads 8
Long reads use the same interface; SingleM auto-detects read length.
Multiple samples — combined in one run
    singlem pipe \
        --forward S1_R1.fq.gz S2_R1.fq.gz \
        --reverse S1_R2.fq.gz S2_R2.fq.gz \
        --otu-table all_samples.otu_table.csv \
        --taxonomic-profile all_samples.profile.tsv \
        --threads 16
For more than 100 samples, run each sample individually and combine the resulting OTU tables with `singlem summarise`.
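A minimal sketch of the per-sample approach, assuming a hypothetical naming scheme of `<sample>_R1.fq.gz` / `<sample>_R2.fq.gz`. The helper `print_pipe_commands` is not part of SingleM; it only prints one `singlem pipe` command per sample (a dry run), using flags that appear elsewhere in this document, so the commands can be reviewed before anything is executed. The thread count of 8 is an arbitrary choice.

```shell
# Print one `singlem pipe` command per sample without running anything.
# Assumption: forward reads are named <sample>_R1.fq.gz and the matching
# reverse reads <sample>_R2.fq.gz (a hypothetical naming scheme).
# Simple file names without spaces are assumed.
print_pipe_commands() {
    for r1 in "$@"; do
        sample="${r1%_R1.fq.gz}"   # strip the suffix to get the sample prefix
        printf 'singlem pipe --forward %s --reverse %s --otu-table %s --threads 8\n' \
            "$r1" "${sample}_R2.fq.gz" "${sample}.otu_table.csv"
    done
}
```

For example, `print_pipe_commands *_R1.fq.gz > commands.sh` writes the commands to a file for inspection. Once they have run, the per-sample OTU tables can be merged with `singlem summarise --input-otu-tables ... --output-otu-table ...`, as described under Downstream Analysis.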
Genome / assembly input
    # Single genome
    singlem pipe \
        --genome-fasta-files genome.fna \
        -p genome.profile.tsv

    # Many genomes from a directory
    singlem pipe \
        --genome-fasta-directory /path/to/genomes/ \
        --genome-fasta-extension fna \
        -p genomes.profile.tsv \
        --threads 16

    # From a file listing genome paths
    singlem pipe \
        --genome-fasta-list genomes.txt \
        -p genomes.profile.tsv \
        --threads 16
Genome mode uses different defaults: a lower `--min-taxon-coverage` (0.1 rather than 0.35) and a `--min-orf-length` of 300 bp.
Output Options
Also save an OTU table (--otu-table)
Useful for alpha/beta diversity metrics, ordination, and inspecting raw data (e.g. which marker genes fired, which OTU sequences were found). Compatible with singlem summarise and singlem appraise.
    singlem pipe \
        --forward sample_R1.fastq.gz \
        --reverse sample_R2.fastq.gz \
        --otu-table sample.otu_table.csv \
        --taxonomic-profile sample.profile.tsv \
        --threads 8
Save an archive OTU table (--archive-otu-table) — recommended for long-term archiving
The archive OTU table stores additional information (full sequence context, alignment data) needed to regenerate results without re-running the pipeline. It is the right format for two important downstream modes:
- `singlem condense`: re-derive the taxonomic profile from the archive OTU table (e.g. with different `--min-taxon-coverage` settings) without re-running `pipe`
- `singlem renew`: re-assign taxonomy against an updated metapackage without re-running `pipe`
    singlem pipe \
        --forward sample_R1.fastq.gz \
        --reverse sample_R2.fastq.gz \
        --archive-otu-table sample.archive.otu_table.json.gz \
        --taxonomic-profile sample.profile.tsv \
        --threads 8

    # Later: re-derive profile with different coverage threshold
    singlem condense \
        --input-archive-otu-tables sample.archive.otu_table.json.gz \
        --taxonomic-profile sample_recondensed.profile.tsv \
        --min-taxon-coverage 0.1

    # Later: re-assign taxonomy with a newer metapackage
    singlem renew \
        --input-archive-otu-table sample.archive.otu_table.json.gz \
        --taxonomic-profile sample_updated.profile.tsv \
        --metapackage /path/to/new_metapackage
Key Options
| Option | Description | Default |
|---|---|---|
| `--forward` / `-1` / `--reads` / `--sequences` | Forward or unpaired reads (FASTA/FASTQ, gzipped ok) | required |
| `--reverse` / `-2` | Reverse reads for paired-end | none |
| `--taxonomic-profile` / `-p` | Output taxonomic profile (TSV) | not set |
| `--otu-table` | Output OTU table (CSV) | not set |
| `--threads` | Number of CPU threads | 1 |
| `--metapackage` | Path to reference metapackage | default system metapackage |
| `--min-taxon-coverage` | Min coverage to report in profile | 0.35 (reads), 0.1 (genomes) |
| `--assignment-method` | Taxonomy assignment algorithm for OTUs | smafa_naive_then_diamond |
| `--genome-fasta-files` | Input genome FASTA(s) | none |
| `--genome-fasta-directory` / `-d` | Directory of genome FASTAs | none |
| `--genome-fasta-extension` | Extension for genome FASTAs | fna |
| `--genome-fasta-list` | File listing genome paths | none |
Output Format
Taxonomic profile (-p / --taxonomic-profile) — SingleM condensed format
Tab-separated file (.tsv) with three columns: sample, coverage, taxonomy.
    sample     coverage  taxonomy
    marine0.1  3.64      Root; d__Archaea
    marine0.1  0.02      Root; d__Bacteria
    marine0.1  0.56      Root; d__Archaea; p__Thermoproteota
    marine0.1  0.80      Root; d__Bacteria; p__Desulfobacterota
    marine0.1  2.17      Root; d__Bacteria; p__Proteobacteria
Key properties of the condensed format:
- Coverage is the estimated per-base read coverage attributed directly to that taxon; it is not inclusive of descendants. For example, `Root; d__Bacteria` (coverage 0.02) does not include the coverage from `p__Desulfobacterota` (0.80) or `p__Proteobacteria` (2.17); those are reported on their own lines.
- Every taxonomic level that has any coverage is listed as its own row. Higher-level rows (e.g. domain) represent reads that could not be assigned more specifically.
- Taxonomy strings follow GTDB conventions: `Root; d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; ...`
- Multiple samples can appear in a single file, distinguished by the `sample` column.
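Because each row's coverage excludes descendants, the total for a lineage has to be summed over the taxon and everything beneath it. The following is a minimal sketch of that sum using `awk`; the function name `filled_coverage` is hypothetical and not part of SingleM, and it assumes a tab-separated profile with a header row and the three columns described above.

```shell
# Sum the coverage of a taxon and all of its descendants for one sample.
# $1 = condensed profile (TSV), $2 = sample name, $3 = taxonomy string
filled_coverage() {
    awk -F'\t' -v sample="$2" -v taxon="$3" '
        NR == 1 { next }    # skip the header row
        # Keep the taxon itself, or any row whose taxonomy starts with "<taxon>; "
        $1 == sample && ($3 == taxon || index($3, taxon "; ") == 1) { total += $2 }
        END { printf "%.2f\n", total }
    ' "$1"
}
```

With the example profile above, `filled_coverage sample.profile.tsv marine0.1 "Root; d__Bacteria"` prints `2.99` (0.02 + 0.80 + 2.17).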
OTU table (--otu-table)
CSV with columns: gene, sample, sequence, num_hits, coverage, taxonomy
Important Caveats
- Use raw reads, not quality-trimmed reads. Quality trimming (e.g. Trimmomatic) can shorten reads below 100 bp, making them unusable. Adapter trimming is fine but unnecessary.
- Do not use assembled contigs as read input. Use `--genome-fasta-files` for assemblies/MAGs; `--sequences`/`--forward` is for raw reads only.
- Reference data required. After conda install, run `singlem data` before using `pipe`. Docker images include reference data.
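If reads have already been trimmed, a quick length check can show how many fall under the 100 bp figure mentioned in the first caveat. This is a minimal sketch, not part of SingleM: the helper `count_short_reads` is hypothetical, and it assumes plain four-line FASTQ records (no wrapped sequence lines) on standard input.

```shell
# Count reads shorter than a minimum length in uncompressed FASTQ on stdin.
# $1 = minimum length in bp (defaults to 100)
count_short_reads() {
    awk -v min="${1:-100}" '
        # In four-line FASTQ records, the sequence is the second line of each record
        NR % 4 == 2 { total++; if (length($0) < min) short++ }
        END { printf "%d of %d reads shorter than %d bp\n", short, total, min }
    '
}
```

For gzipped input, decompress on the fly: `gzip -dc sample.fastq.gz | count_short_reads`.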
Downstream Analysis
Convert condensed profile to other formats (singlem summarise)
singlem summarise transforms the condensed profile into several more analysis-friendly formats.
Krona chart (interactive HTML)
    singlem summarise \
        --input-taxonomic-profile sample.profile.tsv \
        --output-taxonomic-profile-krona sample.krona.html
Produces an interactive hierarchical chart viewable in any web browser. Can also be generated directly from pipe with --taxonomic-profile-krona.
Relative abundance species-by-site table
Outputs a taxon-by-sample matrix with relative abundance as percentages. Use --output-species-by-site-level to choose the taxonomic rank (domain, phylum, class, order, family, genus, or species):
    singlem summarise \
        --input-taxonomic-profile sample.profile.tsv \
        --output-species-by-site-relative-abundance sample.phylum.csv \
        --output-species-by-site-level phylum
Example output (one column per sample when multiple samples are present):
    taxonomy                                marine0.1
    unassigned                              50.9
    Root; d__Archaea; p__Thermoproteota     7.79
    Root; d__Bacteria; p__Desulfobacterota  11.13
    Root; d__Bacteria; p__Proteobacteria    30.18
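To make the percentages concrete, the sketch below recomputes phylum-level relative abundance directly from a condensed profile. It illustrates the arithmetic only and is not a description of how `singlem summarise` is implemented; the function name `phylum_relative_abundance` is hypothetical, and it assumes that coverage on rows above phylum level (Root or domain only) is counted as `unassigned`, which matches the example output.

```shell
# Recompute phylum-level relative abundance (%) for one sample.
# $1 = condensed profile (TSV), $2 = sample name
phylum_relative_abundance() {
    awk -F'\t' -v sample="$2" '
        NR == 1 || $1 != sample { next }    # skip the header and other samples
        {
            n = split($3, ranks, "; ")
            # Rows resolved only to Root or domain count as unassigned at phylum level;
            # deeper rows are rolled up to their phylum.
            key = (n >= 3) ? (ranks[1] "; " ranks[2] "; " ranks[3]) : "unassigned"
            cov[key] += $2
            total += $2
        }
        END {
            if (total == 0) exit
            for (key in cov) printf "%s\t%.2f\n", key, 100 * cov[key] / total
        }
    ' "$1" | sort
}
```

Run against the example profile from the Output Format section, `unassigned` comes out at 50.90% (3.64 + 0.02 out of a total coverage of 7.19) and Proteobacteria at 30.18%.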
To generate tables for all taxonomic levels at once, use a prefix:
    singlem summarise \
        --input-taxonomic-profile sample.profile.tsv \
        --output-species-by-site-relative-abundance-prefix myprefix
    # produces: myprefix-domain.tsv, myprefix-phylum.tsv, ..., myprefix-species.tsv
Long form with extra columns (filled coverage, relative abundance, level)
    singlem summarise \
        --input-taxonomic-profile sample.profile.tsv \
        --output-taxonomic-profile-with-extras sample.with_extras.tsv
Adds full_coverage (coverage including descendants), relative_abundance (%), and level columns:
    sample     coverage  full_coverage  relative_abundance  level   taxonomy
    marine0.1  0         7.19           100.0               root    Root
    marine0.1  3.64      4.20           58.41               domain  Root; d__Archaea
    marine0.1  0.02      2.99           41.59               domain  Root; d__Bacteria
    marine0.1  0.56      0.56           7.79                phylum  Root; d__Archaea; p__Thermoproteota
    marine0.1  0.80      0.80           11.13               phylum  Root; d__Bacteria; p__Desulfobacterota
    marine0.1  2.17      2.17           30.18               phylum  Root; d__Bacteria; p__Proteobacteria
Note: coverage here is unfilled (not including descendants); full_coverage is filled (sum of a taxon and all its descendants).
Estimate fraction of reads that are bacterial/archaeal (prokaryotic) rather than eukaryotic/phage/etc
    singlem pipe \
        --forward sample_R1.fq.gz --reverse sample_R2.fq.gz \
        -p sample.profile.tsv --threads 8

    singlem prokaryotic_fraction \
        --forward sample_R1.fq.gz --reverse sample_R2.fq.gz \
        -p sample.profile.tsv \
        > sample.prokaryotic_fraction.tsv
Re-profile with updated reference database (no re-running pipe)
Requires that the original run saved an --archive-otu-table.
    singlem renew \
        --input-archive-otu-table sample.archive.otu_table.json.gz \
        --taxonomic-profile sample_updated.profile.tsv \
        --metapackage /path/to/new_metapackage
renew also accepts --assignment-method, --threads, and --min-taxon-coverage, just like pipe.
Combine OTU tables from multiple separate runs
    singlem summarise \
        --input-otu-tables s1.otu_table.csv s2.otu_table.csv s3.otu_table.csv \
        --output-otu-table combined.otu_table.csv
Assess how much of a metagenome's prokaryotes have an associated genome/MAG (singlem appraise)
appraise compares OTU sequences from genomes and/or assemblies against those from the raw metagenome, reporting which lineages are represented and which are missing.
    singlem pipe --sequences raw.fq.gz --otu-table metagenome.otu_table.csv
    singlem pipe --genome-fasta-files my-genomes/*.fasta --otu-table genomes.otu_table.csv
    singlem appraise \
        --metagenome-otu-tables metagenome.otu_table.csv \
        --genome-otu-tables genomes.otu_table.csv
Useful extras:
- `--assembly-otu-tables`: appraise an assembly alongside (or instead of) binned genomes.
- `--imperfect`: match OTU sequences that are similar but not identical (e.g. to credit a genus-level representative); tune with `--sequence-identity`.
- `--plot appraise.svg`: render the appraisal visually (one rectangle per OTU sequence, sized by abundance).
- `--output-binned-otu-table` / `--output-unbinned-otu-table` / `--output-unaccounted-for-otu-table`: write OTU tables of the represented vs. missing populations.
Advanced & Expert Modes
These subcommands support custom reference data and lower-level analyses. Most users never need them.
Add genomes to a reference metapackage (singlem supplement)
Creates a new metapackage that includes your genomes, so future pipe runs can identify them. Taxonomy for the new genomes is assigned with GTDB-Tk (installed separately, with a version matching the metapackage's GTDB release) unless supplied via --taxonomy-file or --new-fully-defined-taxonomies.
    singlem supplement \
        --new-genome-fasta-files genome1.fna genome2.fna \
        --input-metapackage /path/to/metapackage \
        --output-metapackage supplemented.smpkg \
        --checkm2-quality-file checkm2_quality.tsv \
        --dereplicate-with-galah \
        --threads 8
A dereplication mode is required: either --dereplicate-with-galah (run galah at species level) or --no-dereplication (inputs are already dereplicated). A quality-filtering choice is also required: pass CheckM2 results with --checkm2-quality-file, or skip with --no-quality-filter (and optionally --no-taxon-genome-lengths if no CheckM2 file is supplied).
Build and query a SingleM database (singlem makedb / singlem query)
Useful for asking "is this OTU sequence (or anything similar) present in samples B, C, D?". .sdb is the conventional database extension.
    # Build a database from OTU tables
    singlem makedb \
        --otu-tables B.otu_table.csv C.otu_table.csv D.otu_table.csv \
        --db BCD.sdb

    # Find database sequences within a given divergence of query OTUs
    singlem query \
        --db BCD.sdb \
        --query-otu-table A.otu_table.csv \
        --max-divergence 3
query can also dump database contents filtered by sample (--sample-names), by taxonomy (--taxonomy Archaea), or in full (--dump).
Re-derive a profile from an OTU table (singlem condense)
condense turns an archive OTU table into a taxonomic profile. It is normally invoked implicitly by pipe's -p / --taxonomic-profile, but can be run standalone — e.g. to recompute a profile with a different --min-taxon-coverage without re-running pipe. See "Save an archive OTU table" under Output Options for an example.
Create or inspect a metapackage (singlem metapackage)
Assemble individual SingleM packages (.spkg) into a metapackage, or inspect an existing one with --describe.
    # Describe the contents of an existing metapackage
    singlem metapackage --metapackage /path/to/metapackage --describe

    # Create a metapackage from individual packages
    singlem metapackage \
        --singlem-packages pkg1.spkg pkg2.spkg \
        --metapackage new.smpkg \
        --nucleotide-sdb markers.sdb
Build SingleM packages from scratch (singlem seqs → create → regenerate)
Building a marker package is a multi-step expert workflow:
- `singlem seqs`: given an HMM-aligned FASTA, choose the best (most conserved) window position.
- `singlem create`: finalise a SingleM package from a GraftM package, a taxonomy file, and the window position from `seqs`.
- `singlem regenerate`: update an existing SingleM package with new sequences/taxonomy without rebuilding from scratch.
    # 1. Choose the window position within the HMM
    singlem seqs --alignment aligned.fasta --alignment-type aa --hmm marker.hmm

    # 2. Create the package using the hmm-position reported by step 1
    singlem create \
        --input-graftm-package marker.gpkg \
        --input-taxonomy marker_taxonomy.tsv \
        --hmm-position 25 \
        --target-domains Bacteria Archaea \
        --gene-description "Ribosomal protein S2" \
        --output-singlem-package marker.spkg
`--gene-description` is required; it is the free-form text shown by `singlem metapackage --describe`.
Phage Profiling (Lyrebird)
For dsDNA phage profiling, use the lyrebird command with the same interface:
    # Download lyrebird reference data
    lyrebird data --output-directory /path/to/lyrebird_metapackage

    lyrebird pipe \
        --forward sample_R1.fq.gz \
        --reverse sample_R2.fq.gz \
        -p sample.phage_profile.tsv \
        --threads 8
Lyrebird uses >500 phage marker genes and vConTACT3-based taxonomy (not GTDB).
Lyrebird also provides condense and renew for archive OTU tables, mirroring their SingleM counterparts but using a Lyrebird metapackage. Save an archive OTU table from lyrebird pipe with --archive-otu-table to use them:
    # Re-derive a phage profile from an archive OTU table
    lyrebird condense \
        --input-archive-otu-table sample.archive.otu_table.json.gz \
        -p sample.phage_profile.tsv

    # Re-assign phage taxonomy against an updated Lyrebird metapackage
    lyrebird renew \
        --input-archive-otu-table sample.archive.otu_table.json.gz \
        -p sample.updated.phage_profile.tsv \
        --metapackage /path/to/new_lyrebird_metapackage
Quick Reference — Most Common Commands
    # 1. Download reference data (once, after conda install)
    singlem data --output-directory ~/singlem_metapackage

    # 2. Profile paired-end metagenome (save archive OTU table for future re-use)
    singlem pipe \
        --forward sample_R1.fq.gz \
        --reverse sample_R2.fq.gz \
        --archive-otu-table sample.archive.otu_table.json.gz \
        --taxonomic-profile sample.profile.tsv \
        --threads 16

    # 3. View profile
    cat sample.profile.tsv

    # 4. Convert to Krona chart
    singlem summarise \
        --input-taxonomic-profiles sample.profile.tsv \
        --output-taxonomic-profile-krona sample.krona.html
Citation
If you use SingleM, please cite:
Ben J. Woodcroft et al. Comprehensive taxonomic identification of microbial species in metagenomic data using SingleM and Sandpiper. Nat Biotechnol (2025). https://doi.org/10.1038/s41587-025-02738-1