FASTA Splitter

About

When sequence data is large it often makes sense to analyze it in smaller chunks. This script divides a large FASTA file into a set of smaller, approximately equally sized files. It works with whole sequences, never dividing a sequence in the middle.

News

2014-08-29 – Version 0.2.2:

2014-08-25 – Version 0.2.1:

Older news

Download

Current version:

Old versions:

(Distributed under the zlib/libpng license, see the source file for details)

Usage

Usage: fasta-splitter.pl [options] <file>...
Options:
    --n-parts <N> - Divide into <N> parts
    --part-size <N> - Divide into parts of size <N>
    --measure (all|seq|count) - Specify whether all data, sequence length, or
                                number of sequences is used for determining part
                                sizes ('all' by default).
    --line-length - Set output sequence line length, 0 for single line (default: 60).
    --eol (dos|mac|unix) - Choose end-of-line character ('unix' by default).
    --version - Show version.
    --help - Show help.

The script supports two strategies: dividing into a given number of parts (--n-parts <N>) and dividing into parts of a given size (--part-size <N>).
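To illustrate the first strategy, here is a minimal Python sketch of how consecutive sequences could be assigned to a fixed number of roughly equal parts without ever splitting a sequence. It is not taken from fasta-splitter.pl: the function name split_into_n_parts and the cumulative-threshold rule are assumptions made for illustration, and the script's actual balancing may differ.

```python
def split_into_n_parts(sizes, n_parts):
    """Group record indices into n_parts consecutive, roughly equal parts.

    sizes   -- list of per-record sizes (in whatever measure is chosen)
    n_parts -- number of parts to produce
    Hypothetical helper: the balancing rule below is an assumption,
    not the rule used by fasta-splitter.pl.
    """
    total = sum(sizes)
    parts = [[] for _ in range(n_parts)]
    running = 0  # total size of all records assigned so far
    part = 0     # index of the part currently being filled
    for index, size in enumerate(sizes):
        # Advance to the next part once the running total has reached
        # this part's cumulative share of the whole data set.
        while part < n_parts - 1 and running >= total * (part + 1) / n_parts:
            part += 1
        parts[part].append(index)  # whole records only, never split
        running += size
    return parts


if __name__ == "__main__":
    print(split_into_n_parts([10, 10, 10, 10], 2))
```

Because whole sequences are kept together, the parts are only approximately equal: one very long sequence can make its part larger than the others.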

It is possible to specify both --n-parts <N> and --part-size <M>. In that case the size of each part will not exceed <M>, and at most <N> parts will be written. This is useful for extracting a few parts from the beginning of a large FASTA file without processing the whole file.
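The combined behaviour can be pictured with the following Python sketch, which fills parts greedily up to a size limit and stops reading as soon as the requested number of parts is complete. The function name split_by_size and its parameters are hypothetical, and the treatment of a single sequence larger than the limit (it is placed alone in its own part) is an assumption, since this page does not describe that case.

```python
def split_by_size(records, part_size, max_parts=None):
    """Greedily group records into parts of at most part_size.

    records   -- iterable of (name, size) pairs, read lazily
    part_size -- size limit per part (plays the role of --part-size)
    max_parts -- optional cap on parts written (plays the role of --n-parts)
    Assumption: a record larger than part_size gets a part to itself.
    """
    parts = []
    current, current_size = [], 0
    for name, size in records:
        # Close the current part if adding this record would exceed the limit.
        if current and current_size + size > part_size:
            parts.append(current)
            if max_parts is not None and len(parts) >= max_parts:
                return parts  # stop early; the rest of the input is not read
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        parts.append(current)  # final, possibly smaller, part
    return parts


if __name__ == "__main__":
    demo = [("a", 40), ("b", 40), ("c", 40), ("d", 40), ("e", 40)]
    print(split_by_size(demo, part_size=100, max_parts=2))
```

Passing a generator as records means the input is consumed only as far as needed, which mirrors the idea of extracting the first few parts without processing the whole file.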

The --measure option controls what is used to determine part sizes. With --measure count, parts are delimited by the number of sequences. With --measure seq, sequence length in base pairs is used. With --measure all, the total size in bytes is used (including sequence names and end-of-line characters).
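As a rough illustration of the three measures, the Python sketch below computes the size of a single record under each of them. The helper record_size is hypothetical, and the byte accounting for 'all' (the '>' character, the name, the sequence wrapped at a given line length, and one end-of-line per line) is an assumption about what "total size in bytes" covers, not a description of the script's exact arithmetic.

```python
def record_size(name, sequence, measure="all", line_length=60, eol="\n"):
    """Size of one FASTA record under the given measure (hypothetical helper)."""
    if measure == "count":
        return 1  # every sequence counts as one unit
    if measure == "seq":
        return len(sequence)  # sequence letters only, no name or line breaks
    if measure == "all":
        # Assumption: header line is '>' + name + EOL, and the sequence is
        # wrapped at line_length with an EOL after every line (0 = one line).
        if line_length > 0:
            n_lines = -(-len(sequence) // line_length)  # ceiling division
        else:
            n_lines = 1 if sequence else 0
        header_bytes = 1 + len(name.encode("utf-8")) + len(eol)
        return header_bytes + len(sequence) + n_lines * len(eol)
    raise ValueError("measure must be 'all', 'seq' or 'count'")


if __name__ == "__main__":
    for m in ("count", "seq", "all"):
        print(m, record_size("seq1", "ACGT" * 30, measure=m))
```

With many short sequences, 'all' and 'seq' can differ noticeably because the names and line breaks make up a larger share of the file.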

Limitations

If you have any questions, comments or suggestions, please contact me.


  © 2012-2014 Kirill Kryukov
This page is available under the CC BY 3.0 License