Swift Neighbor-Joining

About

Swift Neighbor-Joining is a phylogeny reconstruction method invented by Naruya Saitou and described in Chapter 17 of his book "Introduction to Evolutionary Genomics", Second Edition.

This page is a Work-In-Progress collection of notes on implementing and testing this method. By Kirill Kryukov, March 2020.

Notes

Note: This is not a final version, and should be only used for testing.

Note 2: Performance of this code should not be taken as representative of Swift NJ, because it's not complete yet and possibly performance may improve with further development.

Source code

Installing

Unpacking and running "make" in its directory should be enough to build it on a Unix-like system. On Windows it can be built using cygwin.

Usage

It's a command line tool. By default it reads a distance matrix from standard input, and writes generated tree to standard output. Both streams should be normally redirected to read from / write to a file: swiftnj <distance.martix >tree.newick

With "--verbose" flag it writes a detailed log to the error stream. To save this log to a file, use command like: swiftnj --verbose <distance.martix >tree.newick 2>tree.log

File formats

Input distance matrix is a full square matrix, in PHYLIP format (PHYLIP). Each OTU name is 10 characters long (space-padded if necessary), and can not start with at space. Currently input parser is not yet hardened to malformed input, but this is planned in the future.

Output tree is in Newick format (Wikipedia, PHYLIP). Output tree is unrooted.

Algorithm notes

v.0.0.1. Implements the algorithm in a straightforward way, as described in the "Introduction to Evolutionary Genomics", Second Edition, by Naruya Saitou.

Sometimes the method was getting stuck after running out of pairs to join (all pairs were suppressed in earlier steps). Example test data. Therefore I added "unstucking" logic, which detects such occurrences and removes suppression from all pairs.

v.0.0.2. Significantly improves speed at the expense of memory consumption. Main change: It loads all OTU pairs into a flat array, then sorts it by distance. This allows quickly selecting the next pair with the smallest distance without searching through entire distance matrix.

In addition, v.0.0.2 organizes suppressed OTUs in a doubly linked list, which allows iterating through unsuppressed OTUs while skipping all suppressed OTUs.

v.0.0.3 Implemented unrooted tree output. Improved speed of sorting pairs. Improved speed of releasing pair suppression.

Evaluation

See benchmark at: Phylogeny Benchmark.

References