Configuration file

GVClass reads run settings from a YAML file. Command-line flags override file values, and file values override built-in defaults. The shipped file is config/gvclass_config.yaml.

File lookup and precedence

GVClass resolves the config in this order:

  1. The path passed to -c / --config.
  2. ./gvclass_config.yaml in the current working directory.
  3. config/gvclass_config.yaml in the repository.
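
The lookup order above can be sketched in a few lines of Python. This is an illustrative sketch, not GVClass's own code: the function name `resolve_config_path` and its parameters are hypothetical, and it assumes an explicit `-c` path is returned as given without an existence check.

```python
from pathlib import Path
from typing import Optional

CONFIG_NAME = "gvclass_config.yaml"


def resolve_config_path(
    cli_path: Optional[str], cwd: Path, repo_root: Path
) -> Optional[Path]:
    """Return the config file to use, following the documented lookup order."""
    if cli_path:
        # 1. An explicit -c / --config path takes priority.
        return Path(cli_path)
    candidates = [
        cwd / CONFIG_NAME,                   # 2. current working directory
        repo_root / "config" / CONFIG_NAME,  # 3. shipped file in the repository
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    # Nothing found: the caller would fall back to built-in defaults.
    return None
```

Returning `None` rather than raising keeps the "built-in defaults" fallback in the caller's hands, which is a design choice of this sketch only.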

Effective values are merged with this precedence:

| Source | Precedence |
| --- | --- |
| CLI flags (see cli.md) | highest |
| Config file | middle |
| Built-in defaults | lowest |
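
One way to picture the three-layer precedence is as a recursive overlay of nested dictionaries. The sketch below is an assumption about how such a merge could work, not a description of GVClass internals. The helper `merge_settings` is hypothetical, and mapping CLI flags onto nested keys such as `pipeline.tree_method` is done here purely for illustration.

```python
from copy import deepcopy
from typing import Any, Dict


def merge_settings(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay `override` on `base` without mutating either."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them whole.
            merged[key] = merge_settings(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Lowest precedence first; each later layer overrides the earlier ones.
defaults = {"pipeline": {"threads": 4, "tree_method": "veryfasttree"}}
file_values = {"pipeline": {"threads": 8}}
cli_values = {"pipeline": {"tree_method": "iqtree"}}

effective = merge_settings(merge_settings(defaults, file_values), cli_values)
print(effective)  # {'pipeline': {'threads': 8, 'tree_method': 'iqtree'}}
```

Merging per key, rather than per section, means a config file that sets only `threads` leaves every other `pipeline` default in place.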

Shipped configuration (gvclass-dev)

# Database configuration
database:
  path: resources
  download_url: https://dl.newlineages.com/gvclass/resources_v1_7_1.tar.gz
  download_version: v1.7.1
  download_sha256: 6f8ca4e0f61e094a7d05669e4024e07db9e3c1813fc07172e25113d362512c14
  expected_size: 2005

# Pipeline settings
pipeline:
  tree_method: veryfasttree
  iqtree_mode: fast
  mode_fast: true
  completeness_mode: novelty-aware
  sensitive_mode: true
  contigs_min_length: 10000
  threads: 4
  output_pattern: "{query_dir}_results"

# Genetic code settings
genetic_codes:
  codes: [0, 1, 4, 6, 11, 15, 29, 106, 129]
  improvement_threshold: 0.05

# Quality thresholds
quality:
  min_markers: 3
  min_length: 20000
  recommended_length: 50000

# Resource allocation
resources:
  memory_limit: 8
  task_timeout: 60

# Logging
logging:
  level: ERROR
  keep_temp: false

Note

On the main branch, the database block points at the Zenodo release asset instead of the dl.newlineages.com tunnel shown here. See "configure the database".

Keys

database

| Key | Default | Meaning |
| --- | --- | --- |
| path | resources | Database bundle location. Relative paths resolve from the repo root. |
| download_url | https://dl.newlineages.com/gvclass/resources_v1_7_1.tar.gz | Archive fetched for first-time setup and auto-update. |
| download_version | v1.7.1 | Pinned database version. An older installed DB_VERSION triggers a re-download. |
| download_sha256 | 6f8ca4e0...362512c14 | Checksum verified after download. |
| expected_size | 2005 | Expected archive size in MB, used for validation. |

The full checksum is 6f8ca4e0f61e094a7d05669e4024e07db9e3c1813fc07172e25113d362512c14. To override the database location, pass -d / --database or set the GVCLASS_DB environment variable. See "configure the database".
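
To check a manually downloaded archive against download_sha256, a streaming SHA-256 comparison is enough. The sketch below also shows one plausible reading of the version check. Both are assumptions: the function names are hypothetical, and the exact way GVClass compares DB_VERSION with download_version is not documented here, so a numeric dotted-version comparison is assumed.

```python
import hashlib
from pathlib import Path
from typing import Tuple


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so a multi-GB archive never sits fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_matches(path: Path, expected_sha256: str) -> bool:
    """Compare the file's digest with the configured checksum, ignoring case."""
    return sha256_of(path) == expected_sha256.lower()


def parse_version(tag: str) -> Tuple[int, ...]:
    """Turn a tag such as 'v1.7.1' into (1, 7, 1). Assumed format, not confirmed."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def needs_redownload(installed: str, pinned: str) -> bool:
    """True when the installed version is older than the pinned one."""
    return parse_version(installed) < parse_version(pinned)
```

Comparing integer tuples rather than raw strings avoids ranking a hypothetical `v1.10.0` below `v1.7.1`.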

pipeline

| Key | Default | Meaning |
| --- | --- | --- |
| tree_method | veryfasttree | Per-marker tree builder. One of veryfasttree, iqtree, fasttree (fasttree is an alias for veryfasttree). |
| iqtree_mode | fast | Species-tree IQ-TREE search. fast or ufboot. Per-marker trees always run --fast. |
| mode_fast | true | Skip order-level marker trees when true. |
| completeness_mode | novelty-aware | Completeness estimator surfaced in the summary. novelty-aware or legacy. |
| sensitive_mode | true | Use E=1e-5 cutoffs instead of GA model cutoffs. |
| contigs_min_length | 10000 | Minimum contig length (bp) when splitting files in --contigs mode. |
| threads | 4 | Total compute threads. |
| output_pattern | {query_dir}_results | Output directory name. {query_dir} is the basename of the query directory. |

To tune these settings against runtime, see "tune speed and accuracy".
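
Two of the pipeline keys have simple string semantics that a short sketch can make concrete: the fasttree alias for tree_method and the {query_dir} placeholder in output_pattern. The helper names below are hypothetical, and using `str.format` for the placeholder is an assumption about the substitution mechanism.

```python
from pathlib import Path

# fasttree is documented as an alias for veryfasttree.
TREE_METHOD_ALIASES = {"fasttree": "veryfasttree"}
VALID_TREE_METHODS = {"veryfasttree", "iqtree"}


def normalize_tree_method(value: str) -> str:
    """Resolve the alias and reject values outside the documented set."""
    method = TREE_METHOD_ALIASES.get(value, value)
    if method not in VALID_TREE_METHODS:
        raise ValueError(f"unknown tree_method: {value!r}")
    return method


def output_dir_name(pattern: str, query_dir: str) -> str:
    """Fill {query_dir} with the basename of the query directory."""
    # Path(...).name yields the last path component, even with a trailing slash.
    return pattern.format(query_dir=Path(query_dir).name)


print(normalize_tree_method("fasttree"))                             # veryfasttree
print(output_dir_name("{query_dir}_results", "/data/my_genomes/"))  # my_genomes_results
```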

genetic_codes

| Key | Default | Meaning |
| --- | --- | --- |
| codes | [0, 1, 4, 6, 11, 15, 29, 106, 129] | Genetic codes tested during gene calling. Code 0 is pyrodigal meta mode. |
| improvement_threshold | 0.05 | Minimum fractional improvement (5%) required to select an alternative code over meta. |
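
The improvement_threshold rule can be read as "keep meta mode unless another code is at least 5% better". The sketch below illustrates that reading only. The per-code score is a hypothetical stand-in (this page does not say which metric GVClass compares), and whether the real comparison is inclusive (>=) or strict (>) is also an assumption.

```python
from typing import Dict

META_CODE = 0  # code 0 is pyrodigal meta mode


def select_genetic_code(
    scores: Dict[int, float], improvement_threshold: float = 0.05
) -> int:
    """Pick the best-scoring code only if it beats meta by the threshold."""
    meta_score = scores[META_CODE]
    best_code = max(scores, key=scores.get)
    if best_code == META_CODE:
        return META_CODE
    if meta_score <= 0:
        # No meaningful baseline: accept any strictly better alternative.
        return best_code if scores[best_code] > meta_score else META_CODE
    improvement = (scores[best_code] - meta_score) / meta_score
    return best_code if improvement >= improvement_threshold else META_CODE


# Hypothetical scores: code 11 is 7.5% better than meta, so it is selected.
print(select_genetic_code({0: 0.80, 1: 0.82, 11: 0.86}))  # 11
# Here the best alternative is only 3.75% better, so meta is kept.
print(select_genetic_code({0: 0.80, 11: 0.83}))  # 0
```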

quality

| Key | Default | Meaning |
| --- | --- | --- |
| min_markers | 3 | Minimum markers required for a query. |
| min_length | 20000 | Minimum contig length (bp). Shorter .fna inputs are skipped unless --allow-short is set. |
| recommended_length | 50000 | Recommended minimum length (bp). |
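
The two length thresholds can be thought of as a three-way classification of each input. The sketch below is a guess at that logic for illustration. The function name and the status labels are hypothetical, and what GVClass actually does with inputs between min_length and recommended_length is not stated on this page.

```python
def length_status(
    length_bp: int,
    min_length: int = 20000,
    recommended_length: int = 50000,
    allow_short: bool = False,
) -> str:
    """Classify an input by length. The labels are illustrative only."""
    if length_bp < min_length and not allow_short:
        # Mirrors "shorter .fna inputs are skipped unless --allow-short is set".
        return "skipped"
    if length_bp < recommended_length:
        return "below_recommended"
    return "ok"


print(length_status(15000))                    # skipped
print(length_status(15000, allow_short=True))  # below_recommended
print(length_status(30000))                    # below_recommended
print(length_status(120000))                   # ok
```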

resources

| Key | Default | Meaning |
| --- | --- | --- |
| memory_limit | 8 | Memory limit per worker (GB). |
| task_timeout | 60 | Per-task timeout (minutes). |

logging

| Key | Default | Meaning |
| --- | --- | --- |
| level | ERROR | Log level. One of DEBUG, INFO, WARNING, ERROR, CRITICAL. |
| keep_temp | false | Keep intermediate files when true. |
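
The level names listed above match the standard level constants in Python's logging module, so a configured value can be validated and applied as sketched below. The function `configure_logging` and the logger name "gvclass" are assumptions for illustration, not confirmed GVClass internals.

```python
import logging

VALID_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


def configure_logging(level_name: str = "ERROR") -> int:
    """Validate a configured level name and apply it to a named logger."""
    name = level_name.upper()
    if name not in VALID_LEVELS:
        raise ValueError(f"level must be one of {VALID_LEVELS}, got {level_name!r}")
    level = getattr(logging, name)  # e.g. logging.ERROR == 40
    # "gvclass" is a hypothetical logger name used only in this sketch.
    logging.getLogger("gvclass").setLevel(level)
    return level


print(configure_logging("ERROR"))  # 40
```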