NAF

Nucleotide Archival Format

NAF is a format for storing DNA, RNA, and protein sequences. It is lossless, very compact, decompresses extremely fast, and does not require a reference genome. NAF is intended to replace gzipped FASTA and FASTQ for sequence data exchange and storage.

Features

  • Stores DNA, RNA or protein sequences, with or without qualities.
  • Compresses from FASTA or FASTQ.
  • Archive is a single file.
  • No limit on the number of stored sequences or on sequence length.
  • Supports IUPAC's ambiguous nucleotide codes.
  • Supports storing the case mask (lower/upper case).
  • Very fast decompression.
  • Even faster when decompressing only sequence or only names.
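
To picture how ambiguity codes and a case mask can be stored compactly, here is a minimal Python sketch. It relies on two assumptions made only for illustration: each IUPAC nucleotide code is mapped to a 4-bit set of possible bases (so two symbols fit in one byte), and the case mask is kept apart from the sequence as alternating run lengths. The code table, the mask layout, and all function names are hypothetical, and the sketch covers DNA letters only; the NAF format specification defines the actual encoding.

```python
# Illustrative 4-bit codes: each bit marks one possible base (A=1, C=2, G=4, T=8).
# Assumption for this example only; this is not NAF's actual code table.
IUPAC_BITS = {
    "A": 1, "C": 2, "G": 4, "T": 8,
    "R": 1 | 4,        # A or G
    "Y": 2 | 8,        # C or T
    "S": 2 | 4,        # C or G
    "W": 1 | 8,        # A or T
    "K": 4 | 8,        # G or T
    "M": 1 | 2,        # A or C
    "B": 2 | 4 | 8,    # not A
    "D": 1 | 4 | 8,    # not C
    "H": 1 | 2 | 8,    # not G
    "V": 1 | 2 | 4,    # not T
    "N": 15,           # any base
}
BITS_IUPAC = {bits: code for code, bits in IUPAC_BITS.items()}


def split_mask(seq):
    """Return (uppercased sequence, run lengths alternating upper/lower, upper first)."""
    runs = []
    current_lower = False
    count = 0
    for ch in seq:
        is_lower = ch.islower()
        if is_lower != current_lower:
            runs.append(count)
            current_lower = is_lower
            count = 0
        count += 1
    runs.append(count)
    return seq.upper(), runs


def apply_mask(seq, runs):
    """Restore the original case from an uppercase sequence and its run lengths."""
    out = []
    pos = 0
    lower = False
    for n in runs:
        chunk = seq[pos:pos + n]
        out.append(chunk.lower() if lower else chunk)
        pos += n
        lower = not lower
    return "".join(out)


def pack(seq):
    """Pack an uppercase sequence into bytes, two 4-bit codes per byte."""
    codes = [IUPAC_BITS[ch] for ch in seq]
    if len(codes) % 2:
        codes.append(0)  # pad the last byte; 0 is unused by the table above
    return bytes(first | (second << 4)
                 for first, second in zip(codes[0::2], codes[1::2]))


def unpack(data, length):
    """Unpack `length` symbols from bytes produced by pack()."""
    codes = []
    for byte in data:
        codes.append(byte & 0x0F)
        codes.append(byte >> 4)
    return "".join(BITS_IUPAC[c] for c in codes[:length])
```

In this sketch a 13-letter sequence packs into 7 bytes before any general-purpose compression is applied, and the case information travels separately, so the round trip `apply_mask(unpack(pack(upper), len(upper)), runs)` restores the original text exactly.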

Tools for working with the NAF format

  • ennaf (compressor / encoder)
  • unnaf (decompressor)

How to remember: After compressing your data with ennaf, you suddenly have enough space. However, if you decompress it back with unnaf, your space is again un-enough.

License

The NAF format and this website are in the public domain. The compressor and decompressor are open source under the zlib/libpng license, free for nearly any use.

Example benchmark

Test dataset: human genome (3.3 GB)

See benchmarks below for details and other datasets.


Benchmarks

FASTA:

FASTQ:

Text vs DNA mode

For a more systematic benchmark, please see Sequence Compression Benchmark.

Format

NAF aims to strike a balance between simplicity, strong compression, and fast decompression. NAF is based on several simple ideas:

Because NAF is a binary format, it cannot be manipulated directly with grep, head, and other text utilities. In this respect it differs from plain FASTA and FASTQ, but it is no different from gzipped FASTA and FASTQ, or from any other compressed format.
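
In practice, text-style filtering is still possible by decompressing on the fly and filtering the resulting stream, much as one would with gzipped files. The Python sketch below shows the idea. It assumes that `unnaf` is on the PATH and that running `unnaf file.naf` with no output option writes the decompressed text to standard output; please check `unnaf --help` for the options of your installed version. The function names `grep_stream`, `grep_naf`, and `print_matches` are hypothetical helpers, not part of the NAF tools.

```python
import re
import subprocess
import sys


def grep_stream(command, pattern):
    """Run `command` and yield the lines of its standard output that match `pattern`."""
    regex = re.compile(pattern)
    with subprocess.Popen(command, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            if regex.search(line):
                yield line.rstrip("\n")
    # Reached only when the whole stream was consumed; report a failed command.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, command)


def grep_naf(naf_path, pattern, unnaf="unnaf"):
    """Filter the decompressed text of a NAF archive line by line.

    Assumption: `unnaf <file>` writes decompressed text to standard output.
    """
    return grep_stream([unnaf, naf_path], pattern)


def print_matches(naf_path, pattern, out=None):
    """Print matching lines, for example all FASTA header lines with r'^>'."""
    out = sys.stdout if out is None else out
    for line in grep_naf(naf_path, pattern):
        print(line, file=out)
```

For example, `print_matches("genome.naf", r"^>")` would list the header lines of a hypothetical archive `genome.naf` without writing the decompressed file to disk.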

See NAF format specification for details.

Tools

The NAF compressor and decompressor are available on GitHub: https://github.com/KirillKryukov/naf.

Citation

If you use NAF, please cite:

Previously available at http://biorxiv.org/cgi/content/short/501130v2, doi: 10.1101/501130.

Users

News

October 1, 2019
NAF paper is officially published in Bioinformatics.
October 1, 2019
NAF tools version 1.1.0 are released.
February 25, 2019
NAF paper is online at Bioinformatics: btz144
January 17, 2019
NAF tools version 1.0.0 are released.

Contact

Any comments, suggestions, or requests are welcome. Please email kkryukov@gmail.com.