CoCoPyE is a fast tool for quality assessment of microbial genomes. It is able to reliably predict
completeness and contamination of bacterial and archaeal genomes. Additionally, it can provide a
taxonomic classification of the input.
Background: The classical approach to estimating quality indices relies solely on a relatively small number of universal single-copy genes. Because these classical markers cover only a small fraction of the whole genome, the quality assessment can be rather unreliable. Our method is based on a novel two-stage feature extraction and transformation scheme. It first performs a flexible extraction of genomic markers and then refines the marker-based estimates with a machine learning approach based on count-ratio histograms. In our simulation studies, CoCoPyE predicted quality indices more accurately than existing tools.
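To illustrate the classical approach described above, here is a minimal sketch of a simplified single-copy marker estimate. It is not CoCoPyE's method. Completeness is the share of expected markers found at least once, and contamination is the number of surplus copies relative to the marker set size. The function name `classical_estimates` and its inputs (a set of expected marker IDs and a list of detected marker hits) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def classical_estimates(marker_set: Iterable[str],
                        found_markers: Iterable[str]) -> Tuple[float, float]:
    """Hypothetical sketch: completeness and contamination (in percent)
    from universal single-copy marker genes.

    marker_set    -- marker IDs expected exactly once per genome
    found_markers -- marker IDs detected in the genome; a marker that
                     occurs in several copies appears several times
    """
    markers = set(marker_set)
    if not markers:
        raise ValueError("marker_set must not be empty")

    # Count hits only for markers that belong to the expected set.
    counts = Counter(m for m in found_markers if m in markers)

    # Completeness: fraction of expected markers seen at least once.
    present = len(counts)
    # Contamination: every copy beyond the first counts as surplus.
    extra_copies = sum(c - 1 for c in counts.values())

    completeness = 100.0 * present / len(markers)
    contamination = 100.0 * extra_copies / len(markers)
    return completeness, contamination


if __name__ == "__main__":
    # Four expected markers; "D" is missing and "B" occurs twice.
    print(classical_estimates(["A", "B", "C", "D"], ["A", "B", "B", "C"]))
```

With only four markers, the estimate changes in coarse steps of 25 percentage points per marker. This shows in miniature why a small marker set, covering little of the genome, can give unreliable quality estimates.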
Citing CoCoPyE
N. Birth, N. Leppich, J. Schirmacher, N. Andreae, R. Steinkamp, M. Blanke, P. Meinicke.
"CoCoPyE: feature engineering for learning and prediction of genome quality indices".
Preprint available on bioRxiv. https://doi.org/10.1101/2024.02.07.579156
Demo
Upload a FASTA file and let CoCoPyE calculate completeness and contamination.
Upload limit: 50MB
Results
The demo reports a completeness estimate, a contamination estimate, the prediction method used, and a taxonomy prediction.