NSS Alignment Format
Description
NSS (Name-Sequence-Sequence) is a text format for sequence alignments that stores one alignment per line. For a pairwise alignment, each line has 3 tab-separated fields: the first field is the alignment name, and the second and third fields are the aligned sequences. The name can be empty, in which case the line begins with a tab character.
Example:
aln1	gataacaggcgtgaac	gataacaggtgtgaac
aln2	taa--aaaaaaaTAGACTCT	taacaaaaaaaaaaGACTCT
The format extends naturally to multiple alignments: each line then has N+1 fields, where N is the number of aligned sequences.
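The field layout described above can be read with a few lines of Python. This is only a minimal sketch: the function name parse_nss_line is hypothetical, and the check that all sequences have equal length is an assumption (the description implies it for aligned sequences but does not state it as a rule).

```python
def parse_nss_line(line):
    """Split one NSS line into (name, sequences).

    The name may be empty, in which case the line starts with a tab.
    Works for pairwise (3 fields) and multiple (N+1 fields) alignments.
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 3:
        raise ValueError("expected a name and at least two sequences")
    name, sequences = fields[0], fields[1:]
    # Assumption: aligned sequences must all have the same length.
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("aligned sequences differ in length")
    return name, sequences


name, seqs = parse_nss_line("aln1\tgataacaggcgtgaac\tgataacaggtgtgaac\n")
```

Splitting on the tab character alone keeps an empty name intact as an empty first field, so unnamed alignments need no special handling.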
Origin
I first saw this format supported by Dr. Yuichiro Hara's scripts. I then decided to document it and support it in my tools.
Discussion
Advantages of this format:
- Compact
- Easy to parse
Disadvantages of this format:
- Each line can be very long, which can cause trouble for text editors.
- In a plain text view, aligned sites are far from each other, making visual inspection impractical in a text viewer or editor.
- A whole sequence has to be read before the beginning of the next sequence is seen. This can be slow or otherwise problematic for very long sequences, such as complete chromosomes.
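The visual inspection problem can be worked around by re-displaying an alignment as stacked blocks, so that aligned sites sit above each other. The sketch below is one possible approach, not part of the format: the function name stacked_view and the default block width of 60 columns are arbitrary choices made for illustration.

```python
def stacked_view(sequences, width=60):
    """Return the aligned sequences as blocks of at most `width` columns.

    Within each block the sequences are printed one per line, so column i
    of every line corresponds to the same alignment site.
    """
    blocks = []
    for start in range(0, len(sequences[0]), width):
        blocks.append("\n".join(s[start:start + width] for s in sequences))
    # Separate blocks with an empty line.
    return "\n\n".join(blocks)


print(stacked_view(["taa--aaaaaaaTAGACTCT", "taacaaaaaaaaaaGACTCT"], width=10))
# taa--aaaaa
# taacaaaaaa
#
# aaTAGACTCT
# aaaaGACTCT
```

This still requires the whole line to be in memory, so it does not help with the last disadvantage listed above.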
© 2014 Kirill Kryukov. This document has been placed in the public domain.