Classify a directory of bins
Metagenomic binning gives you one FASTA per putative genome. GVClass takes a directory of those bins and returns a taxonomy call and a quality table for each one. This is the recommended way to run the tool.
Prepare the input directory
Put one FASTA file per putative genome in a single directory. A file may hold several contigs that belong to the same genome; GVClass treats the whole file as one query. Use nucleotide FASTA (.fna) or protein FASTA (.faa).
For reliable giant-virus calls, mind the assembled length:
- Minimum supported: about 20 kb.
- Better reliability: at or above 30 kb.
- Preferred: at or above 50 kb.
Keep filenames clean. The filename becomes the query name, so avoid periods (.), semicolons (;), and colons (:) in the base name; use underscores (_) or hyphens (-) instead. For protein input, write each sequence header as filename|proteinid.
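The naming rule above can be checked before a run. The following is a minimal sketch, not part of GVClass: the helper names clean_stem and plan_renames are hypothetical, and it assumes the rule applies to the base name only, since the .fna or .faa extension has to keep its dot. It only proposes renames and does not touch any file.

```python
from pathlib import Path

# Characters the guide says to avoid in query names.
FORBIDDEN = set(".;:")
# Extensions named in this guide; anything else is ignored (assumption).
FASTA_SUFFIXES = {".fna", ".faa"}


def clean_stem(stem: str) -> str:
    """Replace each forbidden character in a base name with an underscore."""
    return "".join("_" if ch in FORBIDDEN else ch for ch in stem)


def plan_renames(bin_dir):
    """Return (old_path, new_path) pairs for FASTA files whose base name needs cleaning.

    Nothing is renamed here; the caller decides whether to apply the plan.
    """
    renames = []
    for path in sorted(Path(bin_dir).iterdir()):
        if path.suffix not in FASTA_SUFFIXES:
            continue
        new_stem = clean_stem(path.stem)
        if new_stem != path.stem:
            renames.append((path, path.with_name(new_stem + path.suffix)))
    return renames
```

To apply the plan, loop over the pairs and call old_path.rename(new_path). For protein input, remember that headers carry the filename as a prefix, so a renamed .faa file would also need its headers updated to match.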
Tip
Filter short contigs (below a few kb) out of each bin before you run. For giant viruses, prefer bins assembled to at least 50 kb; short fragments add noise and weaken the marker signal.
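The length guidance and the filtering tip can be sketched in a few lines of Python. This is an illustration under stated assumptions, not GVClass code: the function names are hypothetical, the 2,000 bp default stands in for "a few kb" and should be adjusted to your data, and the tier boundaries treat the guide's approximate 20, 30, and 50 kb figures as hard cutoffs. It is meant for nucleotide FASTA, where sequence length is measured in bases.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(in_path, out_path, min_len=2_000):
    """Copy contigs of at least min_len bp to out_path; return the total length kept.

    min_len=2_000 is an assumed stand-in for "a few kb".
    """
    total = 0
    with open(out_path, "w") as out:
        for header, seq in read_fasta(in_path):
            if len(seq) >= min_len:
                out.write(f">{header}\n{seq}\n")
                total += len(seq)
    return total


def length_tier(total_bp):
    """Map an assembled length in bp to the tiers listed in this guide."""
    if total_bp >= 50_000:
        return "preferred"
    if total_bp >= 30_000:
        return "better"
    if total_bp >= 20_000:
        return "minimum"
    return "below minimum"
```

Calling length_tier(filter_contigs("my_bins/bin_1.fna", "filtered/bin_1.fna")) would report where a bin lands after filtering; the paths here are placeholders.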
Run the classification
Run from the cloned repository so the launcher can find src/:
Here my_bins is your input directory, -o my_results is the output directory, and -t 32 sets the total thread budget. Omit -o and results go to <query_dir>_results (for this command, my_bins_results).
Note
To run from any directory, install the CLI wrapper or use the Apptainer wrapper on a cluster. See Configure the database for shared database setups and Run on an HPC cluster for batch submission.
Choose parallelism
Two flags control throughput:
- -t sets the total number of threads.
- -j sets how many bins run at once (workers). GVClass picks a worker count automatically when you leave -j unset.
For a directory of many small bins, more workers help; for a few large genomes, give each worker more threads. See Tune speed and accuracy for the trade-offs.
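The trade-off is simple arithmetic if you assume the total thread budget is split evenly across workers. That even split is an assumption made for illustration; this guide does not state how GVClass divides threads internally or how it picks a worker count when -j is unset. The function name threads_per_worker is hypothetical.

```python
def threads_per_worker(total_threads, workers):
    """Threads available to each worker under an assumed even split of the budget."""
    if total_threads < 1 or workers < 1:
        raise ValueError("total_threads and workers must both be at least 1")
    # Integer division; never report fewer than one thread per worker.
    return max(1, total_threads // workers)
```

Under this assumption, a budget of 32 threads with 8 workers leaves 4 threads per bin, which suits many small bins, while 2 workers leave 16 threads each, which suits a few large genomes.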
Read the results
GVClass writes a combined table for the whole run plus per-query files:
- gvclass_summary.tsv and gvclass_summary.csv: one row per bin with the taxonomy call and quality metrics.
- per-query files such as <query>.final_summary.tsv and <query>.tar.gz inside the output directory.
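For a quick look at the combined table, the tab-separated file can be loaded with Python's standard csv module. This is a minimal sketch: load_summary is a hypothetical helper, and it deliberately avoids naming any column, since the column layout is documented separately.

```python
import csv
from pathlib import Path


def load_summary(results_dir):
    """Return (column_names, rows) read from gvclass_summary.tsv in results_dir.

    Each row is a dict keyed by the column names found in the header line.
    """
    path = Path(results_dir) / "gvclass_summary.tsv"
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        rows = list(reader)
        return list(reader.fieldnames or []), rows
```

Calling columns, rows = load_summary("my_results") gives the header names and one dictionary per bin, so len(rows) should match the number of bins that were classified.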
See Output files and columns for every file and the full column layout. To turn the table into a curation decision, read Assess genome quality. For what happens between FASTA and taxonomy, see How GVClass works.