FASTA Splitter

About

When sequence data is large it often makes sense to analyze it in smaller chunks. This script divides a large FASTA file into a set of smaller, approximately equally sized files.

This tool was made by Kirill Kryukov. It is shared with the hope that it can be useful, but without any warranties.

News

2014-02-14 – Version 0.2.0 is a rewrite with the following improvements:

This version has a new command-line format; please take note when upgrading from a previous version.

2012-03-02 – Version 0.1.1 - minor correction.

2012-02-01 – This page was created and version 0.1.0 was uploaded.

Download

Current version:

Old versions:

(Distributed under the zlib/libpng license, see the source file for details)

Usage

    Usage: fasta-splitter.pl [options] <file>...

    Options:
        --n-parts <N>              - Divide into <N> parts
        --part-size <N>            - Divide into parts of size <N>
        --measure (all|seq|count)  - Specify whether all data, sequence length, or
                                     number of sequences is used for determining
                                     part sizes ('all' by default).
        --line-length              - Set output sequence line length, 0 for single
                                     line (default: 60).
        --eol (dos|mac|unix)       - Choose end-of-line character ('unix' by default).
        --version                  - Show version.
        --help                     - Show help.
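To make the --line-length and --eol options concrete, here is a minimal Python sketch of how one output record could be formatted. It is an illustration only, not the script's own code (the script is written in Perl). The function name format_record and the EOL table are hypothetical, and the mapping of 'dos', 'mac' and 'unix' to CR LF, CR and LF is an assumption based on the usual meaning of those names.

```python
# Assumed meaning of the --eol values (not taken from the script's source).
EOL = {"dos": "\r\n", "mac": "\r", "unix": "\n"}


def format_record(name, sequence, line_length=60, eol="unix"):
    """Format one FASTA record with wrapped sequence lines.

    line_length <= 0 writes the whole sequence on a single line,
    mirroring the "0 for single line" wording of the usage text.
    """
    nl = EOL[eol]
    if line_length <= 0:
        chunks = [sequence]
    else:
        # Cut the sequence into pieces of at most line_length characters.
        chunks = [sequence[i:i + line_length]
                  for i in range(0, len(sequence), line_length)]
    return ">" + name + nl + "".join(chunk + nl for chunk in chunks)


if __name__ == "__main__":
    print(repr(format_record("s1", "ACGTACGTAC", line_length=4)))
    print(repr(format_record("s1", "ACGTACGTAC", line_length=0, eol="dos")))
```

With a line length of 4 the ten-letter sequence is written as three lines (ACGT, ACGT, AC); with a line length of 0 it stays on one line.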

The script supports two strategies: dividing into a given number of parts (--n-parts <N>) and dividing into parts of a given size (--part-size <N>).
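The difference between the two strategies can be sketched in Python as follows. This is an illustration under stated assumptions, not the algorithm the script actually uses: both functions are hypothetical, they work on a list of per-record sizes, they keep records in input order, and they never split a single record across parts.

```python
def split_by_part_size(sizes, part_size):
    """Greedy split: start a new part when the next record would overflow.

    Returns a list of parts, each a list of record indices.
    A record larger than part_size ends up alone in its own part
    (an assumption; records are never cut in this sketch).
    """
    parts, current, current_size = [], [], 0
    for index, size in enumerate(sizes):
        if current and current_size + size > part_size:
            parts.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        parts.append(current)
    return parts


def split_into_n_parts(sizes, n_parts):
    """Assign each record to one of n_parts by its position in the total.

    The midpoint of each record on the cumulative size axis decides
    which part it belongs to, so parts come out approximately equal.
    """
    total = sum(sizes)
    parts = [[] for _ in range(n_parts)]
    done = 0
    for index, size in enumerate(sizes):
        if total > 0:
            target = ((2 * done + size) * n_parts) // (2 * total)
        else:
            target = 0
        parts[min(n_parts - 1, target)].append(index)
        done += size
    return parts


if __name__ == "__main__":
    sizes = [10, 20, 30, 40]
    print(split_by_part_size(sizes, 50))  # [[0, 1], [2], [3]]
    print(split_into_n_parts(sizes, 2))   # [[0, 1, 2], [3]]
```

With a fixed part size the number of parts depends on the data; with a fixed number of parts the part size does.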

It is possible to specify both --n-parts <N> and --part-size <M>. In that case the size of each part will not exceed <M>, and at most <N> parts will be written. This is useful for extracting the first few parts of a large FASTA file without processing the whole file.
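The combined behaviour can be pictured with the following Python sketch. It is a hypothetical illustration, not the script's code: first_parts is an invented name, the input is an iterator of record sizes, records are assumed never to be split, and n_parts is assumed to be at least 1. The point is that reading stops as soon as the requested number of parts is complete.

```python
def first_parts(sizes, part_size, n_parts):
    """Collect at most n_parts parts, each holding at most part_size.

    `sizes` may be any iterable; it is consumed lazily, so records
    after the last requested part are never read.
    """
    parts, current, current_size = [], [], 0
    for index, size in enumerate(sizes):
        if current and current_size + size > part_size:
            parts.append(current)
            if len(parts) == n_parts:
                return parts  # enough parts written, stop reading
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        parts.append(current)
    return parts


if __name__ == "__main__":
    stream = iter([10, 20, 30, 40, 50, 60])
    print(first_parts(stream, part_size=50, n_parts=2))  # [[0, 1], [2]]
    print(list(stream))  # [50, 60] were never read
```

In this sketch the record that triggers the cut-off is read but not written; everything after it stays untouched in the input.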

The --measure option controls what is used to determine part sizes. With --measure count, the number of sequences delimits the parts. With --measure seq, sequence length in base pairs is used. With --measure all, total size in bytes is used (including sequence names and end-of-line characters).
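As a rough illustration of the three measures, the Python sketch below computes the size of a single record under each of them. It is not taken from the script: record_measure is a hypothetical helper, a record is assumed to be the list of its raw input lines (header first, line endings included), and the text is assumed to be plain ASCII so that characters and bytes coincide.

```python
def record_measure(lines, measure="all"):
    """Size of one FASTA record under the chosen measure.

    lines   -- raw lines of the record, header line first,
               each still carrying its end-of-line characters
    measure -- 'count', 'seq' or 'all'
    """
    if measure == "count":
        # Every record counts as one, whatever its length.
        return 1
    if measure == "seq":
        # Sequence letters only: skip the header, drop line endings.
        return sum(len(line.rstrip("\r\n")) for line in lines[1:])
    if measure == "all":
        # Everything: header, sequence and end-of-line characters.
        return sum(len(line) for line in lines)
    raise ValueError("unknown measure: " + measure)


if __name__ == "__main__":
    record = [">seq1 test\n", "ACGT\n", "AC\n"]
    for measure in ("count", "seq", "all"):
        print(measure, record_measure(record, measure))
    # count 1
    # seq 6
    # all 19
```

For this record the header contributes 11 bytes and the two sequence lines 5 and 3 bytes, so 'all' gives 19 while 'seq' gives only the 6 sequence letters.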

Limitations

If you have any questions, comments or suggestions, please contact me.


  © 2012-2014 Kirill Kryukov
This page is available under the CC BY 3.0 License