
Cetacean genome assemblies and data sets are being generated at an ever-increasing rate, using a variety of genomic sequencing and assembly methods. To minimize redundancy and cost, we monitor multiple genome databases and publications to catalog genome resources. The availability of genomic data can both reduce the cost of completing a reference-quality genome and shift the priorities for generating genome assemblies of species with no public genomic resources.

The table below lists the genome assemblies currently available through public genome databases and genome project-oriented web sites. If we have missed a recent genome release, please let us know by emailing phillip.morin@noaa.gov. Additional details of some cetacean genome assemblies are available from the Vertebrate Genomes Project and DNAzoo.


Figure 1. Fragmentation and gene identification data for reference vs. draft genomes from the NCBI genomes database (June 2022). Data from reference genomes are shown in red: blue whale (Balaenoptera musculus), vaquita (Phocoena sinus), and bottlenose dolphin (Tursiops truncatus). A. Assembly contiguity metrics. Contigs are segments of contiguous, i.e. gapless, sequence. Scaffolds are sets of contigs that have been ordered and oriented using long-range mapping data such as optical maps and Hi-C; scaffolds therefore contain gaps. N50 is a length-weighted measure of contiguity rather than a simple average: 50% of all bases are contained in contigs (or scaffolds) of length N50 or longer. B. BUSCO scores represent the percentage of universal single-copy orthologs found in an annotated genome. Universal single-copy orthologs are genes that are present in a single copy in all or most genomes within a phylogenetic group. Complete versions of most of these genes can usually be identified in every genome within the group, represented by the % Complete score. However, some genes are fragmented (% Fragmented) or missing (% Missing = 100 - % Complete - % Fragmented). A high % Complete score indicates that a genome assembly is not missing a large amount of gene-coding sequence. (Figures adapted from Bukhman et al. 2022, in review.)
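The two summary statistics defined in the caption can be illustrated with a short Python sketch. This is not the code used to produce the figures; the function names `n50` and `busco_missing` and the example lengths are hypothetical, chosen only to show the arithmetic described above.

```python
def n50(lengths):
    """Return the N50 of a list of contig (or scaffold) lengths.

    N50 is the largest length L such that sequences of length >= L
    together contain at least 50% of all bases in the assembly.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached at least half of all bases
            return length


def busco_missing(pct_complete, pct_fragmented):
    """% Missing = 100 - % Complete - % Fragmented, as in the caption."""
    return 100.0 - pct_complete - pct_fragmented


# Made-up contig lengths (arbitrary units), for illustration only.
example_lengths = [2, 3, 4, 5, 6]
print(n50(example_lengths))       # half of the 20 total bases lie in contigs of length >= 5
print(busco_missing(95.0, 2.0))   # made-up BUSCO percentages
```

Because N50 is driven by the longest sequences, a few very long scaffolds can raise it substantially even when many short contigs remain, which is why contig and scaffold N50 are reported separately.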

Figure 1. A.

Figure 1. B.

