Phylogeny Benchmark
About
This is a small benchmark of tree-building tools on simulated DNA sequences.
Tested tools
Currently the benchmark includes tools implementing Neighbor-Joining (Saitou and Nei, 1987) or its variants. More methods may be covered in the future.
- BioNJ, 1997 (Gascuel, 1997; Homepage)
- Ninja 1.2.2, 2013 (Wheeler, 2009; Homepage)
- QuickTree 2.5, 2019 (Homepage, GitHub)
- RapidNJ 2.3.2, 2018 (Simonsen et al., 2008; Homepage)
- SwiftNJ 0.0.3 (Saitou, 2018; Homepage)
- Weighbor 1.2, 2001 (Bruno et al., 2000; Web archive)
All of these tools work in command line, accept PHYLIP-formatted distance matrices and produce trees in Newick format.
Method
Testing is done on randomly generated data, produced using Clockwork Evolver (http://kirill-kryukov.com/study/tools/clockwork-evolver/). The script produces aligned sequences and a reference tree. Distance matrix is constructed based on the alignment using Distance Matrix Builder (http://kirill-kryukov.com/study/tools/distance-matrix-builder/). Distances are converted to p-distances using divide-matrix-by-number.
Test procedure: For each number of OTUs N, a set of test data is generated (alignment, distance matrix and reference tree). Sequence length of 16,500 bp is used. Then all tree-building software is applied to construct trees. Time spent for tree construction is measured and recorded. Each tree is compared via Robinson-Foulds distance with the reference tree, using "raxml-ng" (Kozlov et al., 2019; GitHub). The procedure is repeated 20 times (each time with new test data) and the results are averaged for each tool. Then the benchmark moves to the next N.
Two other tools have been tried for measuring distance between trees, but both were found problematic. "hashrf" (Sul and Williams, 2007; Homepage) does not correctly measure distance between unrooted trees. "Ktreedist" (Soria-Carrasco et al., 2007; Homepage) can't measure distance between a rooted and an unrooted tree.
Testing is automated with the custom-made benchmark script.
Benchmark results
Weighbor is excluded after 500 OTUs, and BioNJ is excluded after 3,500 OTUs due to their slowness.
Calculation time, in seconds. Each number is an average from 20 runs.
Size | bionj | ninja | quicktree | rapidnj | swiftnj-0.0.3 | weighbor |
---|---|---|---|---|---|---|
5 | 0.026 | 0.277 | 0.120 | 0.129 | 0.114 | 0.068 |
6 | 0.026 | 0.275 | 0.105 | 0.110 | 0.102 | 0.069 |
7 | 0.024 | 0.271 | 0.066 | 0.066 | 0.069 | 0.063 |
8 | 0.023 | 0.269 | 0.063 | 0.068 | 0.068 | 0.066 |
9 | 0.023 | 0.275 | 0.064 | 0.067 | 0.066 | 0.065 |
10 | 0.023 | 0.276 | 0.068 | 0.066 | 0.065 | 0.069 |
20 | 0.028 | 0.292 | 0.111 | 0.128 | 0.100 | 0.079 |
30 | 0.028 | 0.281 | 0.123 | 0.152 | 0.118 | 0.102 |
40 | 0.028 | 0.299 | 0.117 | 0.181 | 0.120 | 0.137 |
50 | 0.032 | 0.314 | 0.123 | 0.173 | 0.116 | 0.195 |
60 | 0.035 | 0.309 | 0.135 | 0.179 | 0.112 | 0.277 |
70 | 0.037 | 0.331 | 0.129 | 0.188 | 0.117 | 0.379 |
80 | 0.039 | 0.336 | 0.126 | 0.182 | 0.118 | 0.550 |
90 | 0.044 | 0.343 | 0.125 | 0.194 | 0.117 | 0.723 |
100 | 0.048 | 0.345 | 0.128 | 0.207 | 0.121 | 0.953 |
200 | 0.091 | 0.397 | 0.149 | 0.234 | 0.125 | 6.200 |
300 | 0.176 | 0.430 | 0.200 | 0.289 | 0.142 | 20.461 |
400 | 0.315 | 0.475 | 0.262 | 0.333 | 0.169 | 48.348 |
500 | 0.516 | 0.550 | 0.357 | 0.372 | 0.191 | 91.524 |
600 | 0.765 | 0.606 | 0.482 | 0.421 | 0.215 | - |
700 | 1.121 | 0.673 | 0.640 | 0.474 | 0.252 | - |
800 | 1.546 | 0.758 | 0.794 | 0.480 | 0.261 | - |
900 | 2.068 | 0.835 | 0.984 | 0.466 | 0.265 | - |
1000 | 3.094 | 0.730 | 1.243 | 0.534 | 0.332 | - |
1500 | 11.012 | 1.021 | 3.428 | 0.896 | 0.628 | - |
2000 | 31.035 | 1.475 | 7.607 | 1.308 | 1.157 | - |
2500 | 71.580 | 2.108 | 14.564 | 1.784 | 1.727 | - |
3000 | 133.748 | 2.703 | 24.723 | 2.452 | 2.415 | - |
3500 | 242.768 | 3.332 | 38.142 | 3.126 | 3.203 | - |
4000 | - | 4.183 | 56.699 | 3.994 | 4.164 | - |
4500 | - | 5.054 | 80.319 | 4.933 | 5.248 | - |
5000 | - | 6.041 | 109.899 | 5.968 | 6.446 | - |
5500 | - | 6.937 | 145.015 | 7.239 | 7.895 | - |
6000 | - | 8.192 | 187.639 | 8.692 | 9.257 | - |
6500 | - | 9.487 | 237.991 | 10.266 | 11.125 | - |
7000 | - | 10.808 | 282.819 | 11.507 | 12.193 | - |
7500 | - | 14.758 | 342.903 | 13.097 | 13.992 | - |
8000 | - | 18.087 | 413.387 | 15.363 | 15.976 | - |
8500 | - | 48.144 | 499.528 | 16.952 | 18.173 | - |
9000 | - | 51.347 | 589.581 | 19.317 | 20.294 | - |
9500 | - | 56.832 | 736.976 | 22.130 | 23.720 | - |
10000 | - | 63.122 | 850.169 | 25.104 | 26.261 | - |
Topological distance (Robinson-Foulds metric) from the correct tree. Each number is an average from 20 runs.
Size | bionj | ninja | quicktree | rapidnj | swiftnj-0.0.3 | weighbor |
---|---|---|---|---|---|---|
5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
6 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
7 | 0.2 | 0.1 | 0.1 | 0.2 | 0.1 | 0.2 |
8 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
9 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
10 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
20 | 0.5 | 0.6 | 0.6 | 0.6 | 0.6 | 0.6 |
30 | 0.9 | 1.1 | 1.1 | 1.1 | 1.5 | 0.8 |
40 | 1.5 | 1.5 | 1.5 | 1.5 | 2.5 | 1.2 |
50 | 1.2 | 1.6 | 1.6 | 1.6 | 4.4 | 1.5 |
60 | 1.6 | 1.7 | 1.7 | 1.7 | 3.9 | 1.6 |
70 | 2.5 | 2.7 | 2.7 | 2.7 | 5.3 | 2.5 |
80 | 3.2 | 3.0 | 3.0 | 3.0 | 5.7 | 2.9 |
90 | 3.5 | 3.6 | 3.6 | 3.6 | 6.5 | 3.4 |
100 | 3.3 | 3.0 | 3.0 | 3.0 | 9.7 | 3.1 |
200 | 5.7 | 6.4 | 6.4 | 6.4 | 22.2 | 5.9 |
300 | 10.9 | 11.7 | 11.7 | 11.7 | 37.7 | 10.4 |
400 | 13.5 | 15.2 | 15.2 | 15.2 | 56.3 | 15.4 |
500 | 18.6 | 21.0 | 21.0 | 21.0 | 69.4 | 20.0 |
600 | 23.0 | 25.3 | 25.2 | 25.3 | 86.2 | - |
700 | 27.8 | 30.0 | 30.0 | 30.0 | 109.4 | - |
800 | 30.4 | 33.1 | 33.1 | 33.1 | 119.7 | - |
900 | 33.0 | 36.9 | 36.9 | 36.9 | 132.1 | - |
1000 | 37.9 | 43.6 | 43.7 | 43.7 | 152.1 | - |
1500 | 61.5 | 69.7 | 69.7 | 69.8 | 238.2 | - |
2000 | 78.1 | 85.5 | 85.6 | 85.6 | 319.0 | - |
2500 | 101.3 | 114.5 | 114.2 | 114.4 | 394.8 | - |
3000 | 126.7 | 141.4 | 141.0 | 141.2 | 494.7 | - |
3500 | 135.1 | 153.7 | 153.8 | 153.6 | 554.3 | - |
4000 | - | 186.5 | 186.4 | 186.8 | 649.6 | - |
4500 | - | 200.7 | 200.6 | 200.7 | 712.7 | - |
5000 | - | 233.0 | 233.1 | 233.2 | 829.7 | - |
5500 | - | 266.4 | 266.1 | 266.6 | 910.1 | - |
6000 | - | 286.4 | 286.7 | 286.1 | 984.8 | - |
6500 | - | 303.5 | 304.0 | 304.1 | 1066.0 | - |
7000 | - | 335.5 | 335.7 | 335.4 | 1155.0 | - |
7500 | - | 353.4 | 354.3 | 353.6 | 1250.1 | - |
8000 | - | 377.6 | 377.3 | 378.1 | 1326.5 | - |
8500 | - | 392.9 | 392.8 | 392.9 | 1413.4 | - |
9000 | - | 424.4 | 425.1 | 424.4 | 1488.1 | - |
9500 | - | 463.6 | 461.9 | 463.6 | 1582.3 | - |
10000 | - | 474.6 | 474.3 | 474.5 | 1679.7 | - |
References
- William J. Bruno, Nicholas D. Socci, and Aaron L. Halpern (2000) "Weighted Neighbor Joining: A Likelihood-Based Approach to Distance-Based Phylogeny Reconstruction" Molecular Biology and Evolution, 2000, 17(1), 189-197.
- Olivier Gascuel (1997) "BioNJ: an improved version of the NJ algorithm based on a simple model of sequence data" Molecular Biology and Evolution, 14, 685-695.
- Alexey M. Kozlov, Diego Darriba, Tomas Flouri, Benoit Morel, and Alexandros Stamatakis (2019) "RAxML-NG: A fast, scalable, and user-friendly tool for maximum likelihood phylogenetic inference" Bioinformatics, btz305, doi:10.1093/bioinformatics/btz305
- Naruya Saitou and Masatoshi Nei (1987) "The neighbor-joining method: a new method for reconstructing phylogenetic trees" Molecular Biology and Evolution, 4(4), 406-425.
- Naruya Saitou (2018) "Introduction to Evolutionary Genomics", Second Edition, Springer, doi:10.1007/978-3-319-92642-1, eBook ISBN: 978-3-319-92642-1.
- Martin Simonsen, Thomas Mailund and Christian N.S. Pedersen (2008) "Rapid Neighbour Joining" In Proceedings of the 8th Workshop in Algorithms in Bioinformatics (WABI), LNBI 5251, 113-122, Springer Verlag, October 2008. doi:10.1007/978-3-540-87361-7_10.
- Victor Soria-Carrasco, Gerard Talavera, Javier Igea, Jose Castresana (2007) "The K tree score: quantification of differences in the relative branch length and topology of phylogenetic trees" Bioinformatics, 23(21), 1 November 2007, 2954-2956.
- Seung-Jin Sul, Tiffani L. Williams (2007) "A randomized algorithm for comparing sets of phylogenetic trees", In Proc. Fifth Asia Pacific Bioinformatics Conference (APBC'07), pages 121-130.
- T.J. Wheeler (2009) "Large-scale neighbor-joining with NINJA" In S.L. Salzberg and T. Warnow (Eds.), Proceedings of the 9th Workshop on Algorithms in Bioinformatics. WABI 2009, pp. 375-389. Springer, Berlin.