NAF is a format for storing DNA, RNA and protein sequences. It's lossless, very compact, has extremely fast decompression, and does not use a reference genome. NAF is intended to replace gzipped FASTA and FASTQ for sequence data exchange and storage.
How to remember: After compressing your data with ennaf, you suddenly have enough space. However, if you decompress it back with unnaf, your space is again un-enough.
The NAF format and this website are in the public domain. The compressor and decompressor are open source under the zlib/libpng license, free for nearly any use.
Test dataset: human genome (3.3 GB)
See benchmarks below for details and other datasets.
FASTA:
FASTQ:
For a more systematic benchmark, please see Sequence Compression Benchmark.
NAF aims to find a balance between simplicity, strong compression, and fast decompression. NAF is based on several simple ideas:
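Because NAF is a binary format, it cannot be manipulated directly with grep, head, and other text utilities. In this it differs from plain FASTA and FASTQ, but it is no different from gzipped FASTA, gzipped FASTQ, or any other compressed format: the data has to be decompressed before text tools can work on it.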
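To illustrate the point, here is a minimal Python sketch that uses gzipped FASTA as a stand-in, since it shares this limitation. It does not use NAF tooling: the two-record FASTA text and the `count_headers` helper are hypothetical, made up for the example. The compressed bytes are not text, so a grep-style header count only works after decompression.

```python
import gzip

# Hypothetical two-record FASTA, used only for illustration.
fasta_text = ">seq1 example\nACGTACGT\n>seq2 example\nTTGGCCAA\n"


def count_headers(text: str) -> int:
    """Count FASTA records, similar in spirit to `grep -c '^>'`."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


# Compress the text; the result is a binary gzip stream, not FASTA text.
compressed = gzip.compress(fasta_text.encode("ascii"))

# A gzip stream starts with the magic bytes 0x1f 0x8b rather than '>'.
starts_like_fasta = compressed.startswith(b">")
has_gzip_magic = compressed[:2] == b"\x1f\x8b"

# Text-style processing becomes possible only after decompression.
restored = gzip.decompress(compressed).decode("ascii")
header_count = count_headers(restored)

print("compressed stream looks like FASTA:", starts_like_fasta)
print("compressed stream has gzip magic:", has_gzip_magic)
print("headers found after decompression:", header_count)
```

The same pattern applies to NAF: decompress with unnaf first, then pass the resulting FASTA or FASTQ text to the usual text utilities. The exact command-line options are described in the NAF documentation.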
See NAF format specification for details.
The NAF compressor and decompressor are available on GitHub: https://github.com/KirillKryukov/naf.
If you use NAF, please cite:
Previously available at http://biorxiv.org/cgi/content/short/501130v2, doi: 10.1101/501130.
Any comments, suggestions, or requests are welcome. Please email kkryukov@gmail.com.