NSS Alignment Format

Description

NSS (Name-Sequence-Sequence) is a text format for sequence alignments that stores one alignment per line. For a pairwise alignment, each line has 3 tab-separated fields: the first field is the alignment name, and the second and third fields are the aligned sequences. The name can be empty, in which case the line begins with a tab character.

Example:

aln1	gataacaggcgtgaac	gataacaggtgtgaac
aln2	taa--aaaaaaaTAGACTCT	taacaaaaaaaaaaGACTCT

The format extends naturally to multiple alignments: each line then has N+1 fields, where N is the number of aligned sequences.
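
As an illustration of the "easy to parse" property, here is a minimal Python sketch of a reader that handles both the pairwise and the multiple-alignment case. The function name parse_nss is hypothetical and not part of any existing tool. Skipping blank lines and rejecting lines with fewer than two sequences are assumptions of this sketch; the format description does not specify them. The sketch does not check that the sequences on a line have equal length.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_nss(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (name, sequences) for each alignment line of NSS text."""
    for line_number, line in enumerate(lines, start=1):
        # Strip only the line terminator: a leading tab is meaningful,
        # because it marks an empty alignment name.
        line = line.rstrip("\r\n")
        if not line:
            continue  # assumption: blank lines are ignored
        name, *sequences = line.split("\t")
        if len(sequences) < 2:
            # assumption: an alignment needs at least two sequences
            raise ValueError(
                f"line {line_number}: expected a name and at least 2 sequences"
            )
        yield name, sequences


# Two alignments; the second one has an empty name.
example = "aln1\tgataacaggcgtgaac\tgataacaggtgtgaac\n\ttaa--aaa\ttaacaaaa\n"
records = list(parse_nss(example.splitlines()))
```

Because each line is a complete alignment, the reader can process a file one line at a time, for example by passing an open file object instead of a list of strings.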

Origin

I first saw this format supported by Dr. Yuichiro Hara's scripts. I then decided to document it and support it in my tools.

Discussion

The advantages of this format:

Compact
Easy to parse

Disadvantages of this format:

Each line can be very long, which can cause trouble for text editors.
In a plain-text view the aligned sites are far apart, making visual inspection impractical in a text viewer or editor.
A whole sequence has to be loaded before the beginning of the next sequence can be seen. This can be slow or otherwise problematic for very long sequences, such as complete chromosomes.
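
The visual-inspection problem can be worked around by reformatting a line for display. The following Python sketch cuts the sequences of one alignment into fixed-width blocks and stacks them, so that aligned sites end up directly above each other. The function name stacked_blocks and the default width of 60 are arbitrary choices for this illustration, not part of the format.

```python
from typing import List


def stacked_blocks(sequences: List[str], width: int = 60) -> List[List[str]]:
    """Cut aligned sequences into blocks of at most `width` columns."""
    longest = max((len(s) for s in sequences), default=0)
    blocks = []
    for start in range(0, longest, width):
        # One block holds the same column range from every sequence.
        blocks.append([s[start:start + width] for s in sequences])
    return blocks


line = "aln1\tgataacaggcgtgaac\tgataacaggtgtgaac"
name, *seqs = line.split("\t")
for block in stacked_blocks(seqs, width=8):
    print("\n".join(block))
    print()  # blank line between blocks
```

This still requires the whole line to be read first, so it does not help with the memory cost of very long sequences.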


	© 2014 Kirill Kryukov. This document has been placed in the public domain.