Skill v1.0.1

current · Automated scan: 100/100

wwood/singlem/docs

2 files

──Details

Published: May 26, 2026 at 01:35 AM

Content Hash: sha256:cd15524f2459a566...

Git SHA: 79850071d8e6

Bump Type: patch


version: "1.0.1"
name: singlem
description: Use SingleM to profile metagenomes and genomes, generate OTU tables, and produce GTDB-based taxonomic profiles.

SingleM Taxonomic Profiling Skill

Overview

SingleM is a tool for profiling shotgun metagenomes (short- and long-read) by targeting short "window" sequences, each 20 amino acids long, within single-copy marker genes. It generates GTDB-based taxonomic profiles and is particularly strong at handling novel lineages.

The primary subcommand for taxonomic profiling is singlem pipe.

This skill corresponds to SingleM v0.21.3.

Installation

Conda (recommended)

bash

conda create -c conda-forge -c bioconda --override-channels \
  --name singlem 'singlem>=0.21.3'
conda activate singlem
 
# Download reference data (metapackage) — required after conda install
singlem data --output-directory /path/to/metapackage

Docker (includes reference data — no separate data download needed)

bash

docker pull wwood/singlem:0.21.3
# Run pipe directly:
docker run -v `pwd`:`pwd` wwood/singlem:0.21.3 pipe \
  --sequences `pwd`/my.fastq.gz -p `pwd`/my.profile.csv --threads 4

Singularity/Apptainer

bash

singularity pull docker://wwood/singlem:0.21.3
singularity run -B `pwd`:`pwd` singlem_0.21.3.sif pipe \
  --sequences `pwd`/my.fastq.gz -p `pwd`/my.profile.csv --threads 4

Core Concepts

OTU table: The intermediate output of singlem pipe. Contains per-marker-gene OTU sequences with their coverage/abundance across samples.
Taxonomic profile (condensed profile): The final output summarising community composition. Generated from the OTU table via the condense algorithm, which uses trimmed means and expectation maximisation across 59 marker genes.
Coverage: The expected per-base coverage of a genome with that OTU sequence. Derived from num_hits. The default minimum coverage to report in a taxonomic profile is 0.35× for reads, 0.1× for genomes.
GTDB taxonomy: SingleM uses GTDB taxonomy strings (e.g. Root; d__Bacteria; p__Proteobacteria; ...).

SingleM Subcommands at a Glance

SingleM (and its phage-focused sibling Lyrebird) is a suite of subcommands. Most users only need pipe (and data once, to fetch reference data). The rest support downstream analysis, reference-data management, and package development.

Main tools

Subcommand	Purpose
`singlem pipe`	Main workflow: profile reads/genomes → OTU table + GTDB taxonomic profile
`singlem data`	Download / verify the reference metapackage
`singlem summarise`	Mechanical transformations of `pipe` results (Krona, species-by-site tables, combining OTU tables, etc.)
`singlem renew`	Re-run taxonomy assignment on an existing archive OTU table against a new metapackage
`singlem supplement`	Add new genomes to a metapackage to create a custom reference
`singlem prokaryotic_fraction`	Estimate the bacterial/archaeal fraction (and average genome size) of a metagenome
`singlem appraise`	Assess how much of a metagenome is represented by a set of genomes/assemblies
`lyrebird data`	Download / verify the Lyrebird (phage) reference metapackage
`lyrebird pipe`	Profile dsDNA phages — same interface as `singlem pipe`

Advanced / expert modes

Subcommand	Purpose
`singlem condense`	Generate a taxonomic profile from an existing (archive) OTU table
`singlem makedb`	Build a searchable database (`.sdb`) from OTU tables
`singlem query`	Find sequences in a `makedb` database similar to query OTU sequences
`singlem seqs`	Choose the best window position within an HMM (step 1 of building a SingleM package)
`singlem create`	Create a SingleM package from a GraftM package + taxonomy (step 2 of package building)
`singlem regenerate`	Update an existing SingleM package with new sequences/taxonomy
`singlem metapackage`	Create (or `--describe`) a metapackage from individual SingleM packages
`lyrebird condense`	`condense` for Lyrebird (non-universal phage markers)
`lyrebird renew`	`renew` for Lyrebird archive OTU tables

Generating a Taxonomic Profile

Basic usage — paired-end short reads

bash

singlem pipe \
  --forward sample_R1.fastq.gz \
  --reverse sample_R2.fastq.gz \
  --taxonomic-profile sample.profile.tsv \
  --threads 8

Single-end or unpaired reads

bash

singlem pipe \
  --sequences sample.fastq.gz \
  -p sample.profile.tsv \
  --threads 8

(-p is the short form of --taxonomic-profile)

Long reads (Nanopore ≥R10.4.1 or PacBio HiFi)

bash

singlem pipe \
  --sequences sample_nanopore.fastq.gz \
  -p sample.profile.tsv \
  --threads 8

Long reads use the same interface; SingleM auto-detects read length.

Multiple samples — combined in one run

bash

singlem pipe \
  --forward S1_R1.fq.gz S2_R1.fq.gz \
  --reverse S1_R2.fq.gz S2_R2.fq.gz \
  --otu-table all_samples.otu_table.csv \
  --taxonomic-profile all_samples.profile.tsv \
  --threads 16

For >100 samples, run each individually and combine OTU tables with singlem summarise.
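The run-individually-then-combine pattern can be sketched as a dry run that only prints the commands it would execute. The `plan_runs` helper, the `<sample>_R1.fq.gz` / `<sample>_R2.fq.gz` naming scheme, and the thread count are assumptions made for illustration; the flags are the ones used elsewhere in this document.

```shell
#!/bin/sh
set -eu

# Dry-run sketch: print one `singlem pipe` command per sample, then a single
# `singlem summarise` command that merges the per-sample OTU tables.
# plan_runs and the <sample>_R1/_R2.fq.gz file naming are hypothetical.
plan_runs() {
  tables=""
  for sample in "$@"; do
    printf 'singlem pipe --forward %s_R1.fq.gz --reverse %s_R2.fq.gz --otu-table %s.otu_table.csv --threads 8\n' \
      "$sample" "$sample" "$sample"
    tables="$tables $sample.otu_table.csv"
  done
  printf 'singlem summarise --input-otu-tables%s --output-otu-table combined.otu_table.csv\n' "$tables"
}

plan_runs S1 S2 S3
```

Once the printed commands look right, they can be piped to `sh` or handed to a job scheduler, one line per job.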

Genome / assembly input

bash

# Single genome
singlem pipe \
  --genome-fasta-files genome.fna \
  -p genome.profile.tsv
 
# Many genomes from a directory
singlem pipe \
  --genome-fasta-directory /path/to/genomes/ \
  --genome-fasta-extension fna \
  -p genomes.profile.tsv \
  --threads 16
 
# From a file listing genome paths
singlem pipe \
  --genome-fasta-list genomes.txt \
  -p genomes.profile.tsv \
  --threads 16

Genome mode uses different defaults: a lower --min-taxon-coverage (0.1 instead of 0.35) and a higher --min-orf-length (300 bp).

Output Options

Also save an OTU table (`--otu-table`)

Useful for alpha/beta diversity metrics, ordination, and inspecting raw data (e.g. which marker genes fired, which OTU sequences were found). Compatible with singlem summarise and singlem appraise.

bash

singlem pipe \
  --forward sample_R1.fastq.gz \
  --reverse sample_R2.fastq.gz \
  --otu-table sample.otu_table.csv \
  --taxonomic-profile sample.profile.tsv \
  --threads 8

Save an archive OTU table (`--archive-otu-table`) — recommended for long-term archiving

The archive OTU table stores additional information (full sequence context, alignment data) needed to regenerate results without re-running the pipeline. It is the right format for two important downstream modes:

`singlem condense` — re-derive the taxonomic profile from the archive OTU table (e.g. with different --min-taxon-coverage settings) without re-running pipe
`singlem renew` — re-assign taxonomy against an updated metapackage without re-running pipe

bash

singlem pipe \
  --forward sample_R1.fastq.gz \
  --reverse sample_R2.fastq.gz \
  --archive-otu-table sample.archive.otu_table.json.gz \
  --taxonomic-profile sample.profile.tsv \
  --threads 8
 
# Later: re-derive profile with different coverage threshold
singlem condense \
  --input-archive-otu-tables sample.archive.otu_table.json.gz \
  --taxonomic-profile sample_recondensed.profile.tsv \
  --min-taxon-coverage 0.1
 
# Later: re-assign taxonomy with a newer metapackage
singlem renew \
  --input-archive-otu-table sample.archive.otu_table.json.gz \
  --taxonomic-profile sample_updated.profile.tsv \
  --metapackage /path/to/new_metapackage

Key Options

Option	Description	Default
`--forward` / `-1` / `--reads` / `--sequences`	Forward or unpaired reads (FASTA/FASTQ, gzip-compressed accepted)	required for read input
`--reverse` / `-2`	Reverse reads for paired-end	—
`--taxonomic-profile` / `-p`	Output taxonomic profile (TSV)	not set
`--otu-table`	Output OTU table (CSV)	not set
`--threads`	Number of CPU threads	1
`--metapackage`	Path to reference metapackage	default system metapackage
`--min-taxon-coverage`	Min coverage to report in profile	0.35 (reads), 0.1 (genomes)
`--assignment-method`	Taxonomy assignment algorithm for OTUs	`smafa_naive_then_diamond`
`--genome-fasta-files`	Input genome FASTA(s)	—
`--genome-fasta-directory` / `-d`	Directory of genome FASTAs	—
`--genome-fasta-extension`	Extension for genome FASTAs	`fna`
`--genome-fasta-list`	File listing genome paths	—

Output Format

Taxonomic profile (`-p` / `--taxonomic-profile`) — SingleM condensed format

Tab-separated file (.tsv) with three columns: sample, coverage, taxonomy.

sample     coverage  taxonomy
marine0.1  3.64      Root; d__Archaea
marine0.1  0.02      Root; d__Bacteria
marine0.1  0.56      Root; d__Archaea; p__Thermoproteota
marine0.1  0.80      Root; d__Bacteria; p__Desulfobacterota
marine0.1  2.17      Root; d__Bacteria; p__Proteobacteria

Key properties of the condensed format:

Coverage is the estimated per-base read coverage attributed directly to that taxon — it is not inclusive of descendants. For example, Root; d__Bacteria (coverage 0.02) does not include the coverage from p__Desulfobacterota (0.80) or p__Proteobacteria (2.17); those are reported on their own lines.
Every taxonomic level that has any coverage is listed as its own row. Higher-level rows (e.g. domain) represent reads that could not be assigned more specifically.
Taxonomy strings follow GTDB conventions: Root; d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; ...
Multiple samples can appear in a single file, distinguished by the sample column.
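Because each row holds only the coverage assigned directly to that taxon, the total for a lineage has to be summed over the row and all of its descendants. The following is a minimal sketch of that sum using the example rows above; the `filled_coverage` helper is hypothetical and is not part of SingleM.

```shell
#!/bin/sh
set -eu

# Example condensed profile (rows copied from the table above).
profile=$(mktemp)
{
  printf 'sample\tcoverage\ttaxonomy\n'
  printf 'marine0.1\t3.64\tRoot; d__Archaea\n'
  printf 'marine0.1\t0.02\tRoot; d__Bacteria\n'
  printf 'marine0.1\t0.56\tRoot; d__Archaea; p__Thermoproteota\n'
  printf 'marine0.1\t0.80\tRoot; d__Bacteria; p__Desulfobacterota\n'
  printf 'marine0.1\t2.17\tRoot; d__Bacteria; p__Proteobacteria\n'
} > "$profile"

# filled_coverage FILE (hypothetical helper): for each row, add the row's own
# coverage to that of every row in the same sample whose taxonomy string
# extends it, i.e. its descendants.
filled_coverage() {
  awk -F'\t' '
    NR == 1 { next }                      # skip the header line
    { n++; s[n] = $1; c[n] = $2; t[n] = $3 }
    END {
      for (i = 1; i <= n; i++) {
        total = 0
        for (j = 1; j <= n; j++)
          if (s[j] == s[i] && (t[j] == t[i] || index(t[j], t[i] "; ") == 1))
            total += c[j]
        printf "%s\t%.2f\t%s\n", s[i], total, t[i]
      }
    }' "$1"
}

filled_coverage "$profile"
```

For the example data this gives 4.20 for d__Archaea (3.64 + 0.56) and 2.99 for d__Bacteria (0.02 + 0.80 + 2.17), while the phylum rows keep their own values because they have no descendants listed.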

OTU table (`--otu-table`)

CSV with columns: gene, sample, sequence, num_hits, coverage, taxonomy

Important Caveats

Use raw reads, not quality-trimmed reads. Quality trimming (e.g. Trimmomatic) can shorten reads below 100 bp, making them unusable. Adapter trimming is fine but unnecessary.
Do not use assembled contigs as read input. Use --genome-fasta-files for assemblies/MAGs; --sequences/--forward is for raw reads only.
Reference data required. After conda install, run singlem data before using pipe. Docker images include reference data.
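To see whether earlier trimming has already shortened a read set, the share of reads under 100 bp can be counted before profiling. This is a rough sketch on a synthetic two-read FASTQ; the `short_read_fraction` helper is hypothetical, it expects uncompressed four-line FASTQ records, and the 100 bp default simply mirrors the figure quoted in the caveat above.

```shell
#!/bin/sh
set -eu

# Build a tiny synthetic FASTQ: one 150 bp read and one 60 bp read.
fq=$(mktemp)
awk 'BEGIN {
  for (r = 1; r <= 2; r++) {
    len = (r == 1) ? 150 : 60
    seq = ""; qual = ""
    for (i = 0; i < len; i++) { seq = seq "A"; qual = qual "I" }
    printf "@read%d\n%s\n+\n%s\n", r, seq, qual
  }
}' > "$fq"

# short_read_fraction FILE [MIN_LENGTH] (hypothetical helper): count reads
# whose sequence line (line 2 of every 4-line record) is shorter than
# MIN_LENGTH, which defaults to 100.
short_read_fraction() {
  awk -v min="${2:-100}" '
    NR % 4 == 2 { n++; if (length($0) < min) s++ }
    END { if (n) printf "%d/%d reads shorter than %d bp\n", s, n, min }' "$1"
}

short_read_fraction "$fq"
```

A large fraction of short reads suggests the file was quality-trimmed, in which case the untrimmed reads are the better input.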

Downstream Analysis

Convert condensed profile to other formats (`singlem summarise`)

singlem summarise transforms the condensed profile into several more analysis-friendly formats.

Krona chart (interactive HTML)

bash

singlem summarise \
  --input-taxonomic-profile sample.profile.tsv \
  --output-taxonomic-profile-krona sample.krona.html

Produces an interactive hierarchical chart viewable in any web browser. Can also be generated directly from pipe with --taxonomic-profile-krona.

Relative abundance species-by-site table

Outputs a taxon-by-sample matrix with relative abundance as percentages. Use --output-species-by-site-level to choose the taxonomic rank (domain, phylum, class, order, family, genus, or species):

bash

singlem summarise \
  --input-taxonomic-profile sample.profile.tsv \
  --output-species-by-site-relative-abundance sample.phylum.csv \
  --output-species-by-site-level phylum

Example output (one column per sample when multiple samples are present):

taxonomy                                marine0.1
unassigned                              50.9
Root; d__Archaea; p__Thermoproteota     7.79
Root; d__Bacteria; p__Desulfobacterota  11.13
Root; d__Bacteria; p__Proteobacteria    30.18
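The percentages in this example can be reproduced by hand from the condensed profile, which helps when checking what "unassigned" means at a given rank. The sketch below assumes that coverage assigned above the chosen rank is what ends up in the unassigned row; the `relative_abundance` helper is hypothetical and is not how `singlem summarise` is implemented.

```shell
#!/bin/sh
set -eu

# Example condensed profile (rows copied from the Output Format section).
profile=$(mktemp)
{
  printf 'sample\tcoverage\ttaxonomy\n'
  printf 'marine0.1\t3.64\tRoot; d__Archaea\n'
  printf 'marine0.1\t0.02\tRoot; d__Bacteria\n'
  printf 'marine0.1\t0.56\tRoot; d__Archaea; p__Thermoproteota\n'
  printf 'marine0.1\t0.80\tRoot; d__Bacteria; p__Desulfobacterota\n'
  printf 'marine0.1\t2.17\tRoot; d__Bacteria; p__Proteobacteria\n'
} > "$profile"

# relative_abundance FILE DEPTH (hypothetical helper): DEPTH is the number of
# "; "-separated taxonomy fields at the target rank (Root = 1, domain = 2,
# phylum = 3). Rows at or below that rank are truncated to it and summed;
# rows above it are pooled as "unassigned". Output is a percentage of the
# sample's total coverage.
relative_abundance() {
  awk -F'\t' -v depth="$2" '
    NR == 1 { next }                      # skip the header line
    {
      n = split($3, part, "; ")
      if (n >= depth) {
        key = part[1]
        for (i = 2; i <= depth; i++) key = key "; " part[i]
      } else {
        key = "unassigned"
      }
      sum[$1 "\t" key] += $2
      total[$1] += $2
    }
    END {
      for (k in sum) {
        split(k, f, "\t")
        printf "%s\t%s\t%.2f\n", f[1], f[2], 100 * sum[k] / total[f[1]]
      }
    }' "$1" | sort
}

relative_abundance "$profile" 3
```

With a total coverage of 7.19, the domain-only rows (3.64 + 0.02) account for the 50.9% unassigned share, and the three phyla give 7.79%, 11.13% and 30.18%, matching the table above.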

To generate tables for all taxonomic levels at once, use a prefix:

bash

singlem summarise \
  --input-taxonomic-profile sample.profile.tsv \
  --output-species-by-site-relative-abundance-prefix myprefix
# produces: myprefix-domain.tsv, myprefix-phylum.tsv, ..., myprefix-species.tsv

Long form with extra columns (filled coverage, relative abundance, level)

bash

singlem summarise \
  --input-taxonomic-profile sample.profile.tsv \
  --output-taxonomic-profile-with-extras sample.with_extras.tsv

Adds full_coverage (coverage including descendants), relative_abundance (%), and level columns:

sample     coverage  full_coverage  relative_abundance  level   taxonomy
marine0.1  0         7.19           100.0               root    Root
marine0.1  3.64      4.20           58.41               domain  Root; d__Archaea
marine0.1  0.02      2.99           41.59               domain  Root; d__Bacteria
marine0.1  0.56      0.56           7.79                phylum  Root; d__Archaea; p__Thermoproteota
marine0.1  0.80      0.80           11.13               phylum  Root; d__Bacteria; p__Desulfobacterota
marine0.1  2.17      2.17           30.18               phylum  Root; d__Bacteria; p__Proteobacteria

Note: coverage here is unfilled (not including descendants); full_coverage is filled (sum of a taxon and all its descendants).

Estimate fraction of reads that are bacterial/archaeal (prokaryotic) rather than eukaryotic/phage/etc

bash

singlem pipe \
  --forward sample_R1.fq.gz --reverse sample_R2.fq.gz \
  -p sample.profile.tsv --threads 8
 
singlem prokaryotic_fraction \
  --forward sample_R1.fq.gz --reverse sample_R2.fq.gz \
  -p sample.profile.tsv \
  > sample.prokaryotic_fraction.tsv

Re-profile with updated reference database (no re-running pipe)

Requires that the original run saved an --archive-otu-table.

bash

singlem renew \
  --input-archive-otu-table sample.archive.otu_table.json.gz \
  --taxonomic-profile sample_updated.profile.tsv \
  --metapackage /path/to/new_metapackage

renew also accepts --assignment-method, --threads, and --min-taxon-coverage, just like pipe.

Combine OTU tables from multiple separate runs

bash

singlem summarise \
  --input-otu-tables s1.otu_table.csv s2.otu_table.csv s3.otu_table.csv \
  --output-otu-table combined.otu_table.csv

Assess how much of a metagenome's prokaryotes have an associated genome/MAG (`singlem appraise`)

appraise compares OTU sequences from genomes and/or assemblies against those from the raw metagenome, reporting which lineages are represented and which are missing.

bash

singlem pipe --sequences raw.fq.gz --otu-table metagenome.otu_table.csv
singlem pipe --genome-fasta-files my-genomes/*.fasta --otu-table genomes.otu_table.csv
singlem appraise \
  --metagenome-otu-tables metagenome.otu_table.csv \
  --genome-otu-tables genomes.otu_table.csv

Useful extras:

--assembly-otu-tables — appraise an assembly alongside (or instead of) binned genomes.
--imperfect — match OTU sequences that are similar but not identical (e.g. to credit a genus-level representative); tune with --sequence-identity.
--plot appraise.svg — render the appraisal visually (one rectangle per OTU sequence, sized by abundance).
--output-binned-otu-table / --output-unbinned-otu-table / --output-unaccounted-for-otu-table — write OTU tables of the represented vs. missing populations.

Advanced & Expert Modes

These subcommands support custom reference data and lower-level analyses. Most users never need them.

Add genomes to a reference metapackage (`singlem supplement`)

Creates a new metapackage that includes your genomes, so future pipe runs can identify them. Taxonomy for the new genomes is assigned with GTDB-Tk (installed separately, with a version matching the metapackage's GTDB release) unless supplied via --taxonomy-file or --new-fully-defined-taxonomies.

bash

singlem supplement \
  --new-genome-fasta-files genome1.fna genome2.fna \
  --input-metapackage /path/to/metapackage \
  --output-metapackage supplemented.smpkg \
  --checkm2-quality-file checkm2_quality.tsv \
  --dereplicate-with-galah \
  --threads 8

A dereplication mode is required: either --dereplicate-with-galah (run galah at species level) or --no-dereplication (inputs are already dereplicated). A quality-filtering choice is also required: pass CheckM2 results with --checkm2-quality-file, or skip with --no-quality-filter (and optionally --no-taxon-genome-lengths if no CheckM2 file is supplied).

Build and query a SingleM database (`singlem makedb` / `singlem query`)

Useful for asking "is this OTU sequence (or anything similar) present in samples B, C, D?". .sdb is the conventional database extension.

bash

# Build a database from OTU tables
singlem makedb \
  --otu-tables B.otu_table.csv C.otu_table.csv D.otu_table.csv \
  --db BCD.sdb
 
# Find database sequences within a given divergence of query OTUs
singlem query \
  --db BCD.sdb \
  --query-otu-table A.otu_table.csv \
  --max-divergence 3

query can also dump database contents filtered by sample (--sample-names), by taxonomy (--taxonomy Archaea), or in full (--dump).

Re-derive a profile from an OTU table (`singlem condense`)

condense turns an archive OTU table into a taxonomic profile. It is normally invoked implicitly by pipe's -p / --taxonomic-profile, but can be run standalone — e.g. to recompute a profile with a different --min-taxon-coverage without re-running pipe. See "Save an archive OTU table" under Output Options for an example.

Create or inspect a metapackage (`singlem metapackage`)

Assemble individual SingleM packages (.spkg) into a metapackage, or inspect an existing one with --describe.

bash

# Describe the contents of an existing metapackage
singlem metapackage --metapackage /path/to/metapackage --describe
 
# Create a metapackage from individual packages
singlem metapackage \
  --singlem-packages pkg1.spkg pkg2.spkg \
  --metapackage new.smpkg \
  --nucleotide-sdb markers.sdb

Build SingleM packages from scratch (`singlem seqs` → `create` → `regenerate`)

Building a marker package is a multi-step expert workflow:

`singlem seqs` — given an HMM-aligned FASTA, choose the best (most conserved) window position.
`singlem create` — finalise a SingleM package from a GraftM package, a taxonomy file, and the window position from seqs.
`singlem regenerate` — update an existing SingleM package with new sequences/taxonomy without rebuilding from scratch.

bash

# 1. Choose the window position within the HMM
singlem seqs --alignment aligned.fasta --alignment-type aa --hmm marker.hmm
 
# 2. Create the package using the hmm-position reported by step 1
singlem create \
  --input-graftm-package marker.gpkg \
  --input-taxonomy marker_taxonomy.tsv \
  --hmm-position 25 \
  --target-domains Bacteria Archaea \
  --gene-description "Ribosomal protein S2" \
  --output-singlem-package marker.spkg

--gene-description is required — it is the free-form text shown by singlem metapackage --describe.

Phage Profiling (Lyrebird)

For dsDNA phage profiling, use the lyrebird command with the same interface:

bash

# Download lyrebird reference data
lyrebird data --output-directory /path/to/lyrebird_metapackage
 
lyrebird pipe \
  --forward sample_R1.fq.gz \
  --reverse sample_R2.fq.gz \
  -p sample.phage_profile.tsv \
  --threads 8

Lyrebird uses >500 phage marker genes and vConTACT3-based taxonomy (not GTDB).

Lyrebird also provides condense and renew for archive OTU tables, mirroring their SingleM counterparts but using a Lyrebird metapackage. Save an archive OTU table from lyrebird pipe with --archive-otu-table to use them:

bash

# Re-derive a phage profile from an archive OTU table
lyrebird condense \
  --input-archive-otu-table sample.archive.otu_table.json.gz \
  -p sample.phage_profile.tsv
 
# Re-assign phage taxonomy against an updated Lyrebird metapackage
lyrebird renew \
  --input-archive-otu-table sample.archive.otu_table.json.gz \
  -p sample.updated.phage_profile.tsv \
  --metapackage /path/to/new_lyrebird_metapackage

Quick Reference — Most Common Commands

bash

# 1. Download reference data (once, after conda install)
singlem data --output-directory ~/singlem_metapackage
 
# 2. Profile paired-end metagenome (save archive OTU table for future re-use)
singlem pipe \
  --forward sample_R1.fq.gz \
  --reverse sample_R2.fq.gz \
  --archive-otu-table sample.archive.otu_table.json.gz \
  --taxonomic-profile sample.profile.tsv \
  --threads 16
 
# 3. View profile
cat sample.profile.tsv
 
# 4. Convert to Krona chart
singlem summarise \
  --input-taxonomic-profiles sample.profile.tsv \
  --output-taxonomic-profile-krona sample.krona.html

Citation

If you use SingleM, please cite:

Ben J. Woodcroft et al. Comprehensive taxonomic identification of microbial species in metagenomic data using SingleM and Sandpiper. Nat Biotechnol (2025). https://doi.org/10.1038/s41587-025-02738-1
