FASTA Splitter


When sequence data is large, it often makes sense to analyze it in smaller chunks. This script divides a large FASTA file into a set of smaller, approximately equally sized files. It works with whole sequences and never splits a sequence in the middle.
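To keep every sequence whole, a splitter has to work on complete records rather than on raw lines. Below is a minimal Python sketch of that reading step. It assumes standard FASTA, where each record starts with a '>' header line followed by zero or more sequence lines. `read_fasta` is a hypothetical helper and is not part of the script.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining all sequence lines of a record.

    Because each record is yielded only once it is complete, a caller that
    groups these records into parts can never cut a sequence in the middle.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []  # header text without the '>'
        elif header is not None:
            chunks.append(line.strip())  # sequence line of the current record
    if header is not None:
        yield header, "".join(chunks)  # last record in the input
```

In this sketch, any lines that appear before the first header are ignored.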


2016-08-31 – Version 0.2.5:

2015-03-27 – Version 0.2.4:

Older news


Current version:

Old versions:

(Distributed under the zlib/libpng license, see the source file for details)


Usage: [options] <file>...
Options:
    --n-parts <N>              - Divide into <N> parts.
    --part-size <N>            - Divide into parts of size <N>.
    --measure (all|seq|count)  - Specify whether all data, sequence length, or number of sequences is used for determining part sizes ('all' by default).
    --line-length              - Set output sequence line length, 0 for single line (default: 60).
    --eol (dos|mac|unix)       - Choose end-of-line character ('unix' by default).
    --part-num-prefix T        - Put T before part number in file names (default: .part-).
    --out-dir                  - Specify output directory.
    --nopad                    - Don't pad part numbers with 0.
    --version                  - Show version.
    --help                     - Show help.
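The --line-length, --eol, --part-num-prefix and --nopad options control how the output is written. The Python sketch below is for illustration only. The function names are hypothetical, and the file-naming scheme is an assumption rather than something taken from the script: the prefix and part number are inserted before the file extension, and the number is zero-padded to the width of the total part count.

```python
import os

# End-of-line strings for the three --eol choices.
EOL = {"unix": "\n", "dos": "\r\n", "mac": "\r"}


def format_record(header: str, seq: str, line_length: int = 60,
                  eol: str = "unix") -> str:
    """Render one FASTA record; line_length 0 keeps the sequence on one line."""
    nl = EOL[eol]
    if line_length == 0:
        body = [seq] if seq else []
    else:
        body = [seq[i:i + line_length] for i in range(0, len(seq), line_length)]
    return ">" + header + nl + "".join(s + nl for s in body)


def part_file_name(path: str, part: int, n_parts: int,
                   prefix: str = ".part-", pad: bool = True) -> str:
    """Hypothetical naming: insert prefix + part number before the extension.

    With pad=True the number is zero-padded to the width of n_parts
    (assumed behaviour); pad=False mimics --nopad.
    """
    stem, ext = os.path.splitext(path)
    num = str(part).zfill(len(str(n_parts))) if pad else str(part)
    return stem + prefix + num + ext
```

For example, part_file_name("reads.fa", 3, 12) gives "reads.part-03.fa" under these assumptions.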

The script supports two strategies: dividing into a given number of parts (--n-parts <N>) and dividing into parts of a given size (--part-size <N>).
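Here is one way the two strategies could be implemented, sketched in Python over a list of per-record sizes. How each size is measured is left to the caller. The function names and the midpoint rule in `split_into_n` are assumptions made for illustration, and the script may balance its parts differently.

```python
from typing import List


def split_by_size(sizes: List[int], part_size: int) -> List[List[int]]:
    """Group record indices so that each part's total stays within part_size.

    A record larger than part_size still gets a part of its own, because
    records are never divided.
    """
    parts: List[List[int]] = []
    current: List[int] = []
    used = 0
    for i, size in enumerate(sizes):
        if current and used + size > part_size:
            parts.append(current)  # close the part before it overflows
            current, used = [], 0
        current.append(i)
        used += size
    if current:
        parts.append(current)
    return parts


def split_into_n(sizes: List[int], n_parts: int) -> List[List[int]]:
    """Group record indices into at most n_parts parts of similar total size.

    Each record goes to the part whose equal share of the total contains
    the record's midpoint. This needs the total size up front.
    """
    total = sum(sizes)
    parts: List[List[int]] = [[] for _ in range(n_parts)]
    offset = 0
    for i, size in enumerate(sizes):
        midpoint = offset + size / 2
        k = min(int(midpoint * n_parts / total), n_parts - 1) if total else 0
        parts[k].append(i)
        offset += size
    return [p for p in parts if p]  # drop parts that received no records
```

For example, four records of size 10 end up as [[0, 1], [2, 3]] with either split_by_size(sizes, 25) or split_into_n(sizes, 2).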

It is possible to specify both --n-parts <N> and --part-size <M>. In that case, no part will exceed <M> in size, and at most <N> parts will be written. This is useful for extracting a few parts from the beginning of a large FASTA file without processing the whole file.
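Combining the two limits allows a splitter to stream the input and stop early. The Python sketch below assumes that records arrive as (header, sequence) pairs, for example from a FASTA reader. `limited_parts` and `seq_length` are hypothetical names, and in this sketch part size is measured as sequence length only.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def seq_length(rec: Record) -> int:
    """Size of a record measured as sequence length (like --measure seq)."""
    return len(rec[1])


def limited_parts(
    records: Iterable[Record],
    part_size: int,
    max_parts: int,
    size: Callable[[Record], int] = seq_length,
) -> Iterator[List[Record]]:
    """Yield at most max_parts parts, each within part_size (unless a single
    record is larger), and stop reading input once the limit is reached."""
    if max_parts <= 0:
        return
    current: List[Record] = []
    used = 0
    emitted = 0
    for rec in records:
        s = size(rec)
        if current and used + s > part_size:
            yield current
            emitted += 1
            if emitted == max_parts:
                # The record that would open the next part has been read and
                # is dropped; nothing after it is read.
                return
            current, used = [], 0
        current.append(rec)
        used += s
    if current:
        yield current
```

Because this is a generator that consumes an iterator lazily, the rest of a large input is never read once the part limit is reached.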

The --measure option controls what is used to determine part sizes. With --measure count, the number of sequences is used to delimit parts. With --measure seq, sequence length in base pairs is used. With --measure all, the total size in bytes is used (including sequence names and end-of-line characters).
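The three measures can be expressed as a size function applied to each record. For 'all', this Python sketch assumes that the size is counted on the record as it would be written with the chosen line length and end-of-line string. The text above does not say whether the script measures input bytes or output bytes, so treat that choice as an assumption. `record_size` is a hypothetical name.

```python
def record_size(header: str, seq: str, measure: str = "all",
                line_length: int = 60, eol: str = "\n") -> int:
    """Size of one record under --measure count, seq or all (sketch)."""
    if measure == "count":
        return 1  # each sequence counts once
    if measure == "seq":
        return len(seq)  # sequence length in base pairs
    if measure == "all":
        # Assumed: bytes of the record as written: '>' + name + EOL,
        # then the sequence wrapped at line_length, each line ending in EOL.
        if line_length == 0:
            n_lines = 1 if seq else 0
        else:
            n_lines = -(-len(seq) // line_length)  # ceiling division
        name_bytes = len(header.encode("utf-8"))
        return 1 + name_bytes + len(eol) + len(seq) + n_lines * len(eol)
    raise ValueError("measure must be 'all', 'seq' or 'count'")
```

For example, record_size("s1", "ACGTACGT", "all", 4) returns 14, which is the length of ">s1\nACGT\nACGT\n".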


If you have any questions, comments or suggestions, please contact me.

  © 2012-2016 Kirill Kryukov
This page is available under the CC BY 3.0 License