BLAST Parser

About

This is a Perl script for parsing BLAST output and converting it into a more compact form (example below). The purpose of this conversion is to save storage space and enable faster downstream analysis. Note, however, that all alignments are discarded.

This tool was made by Kirill Kryukov in the Saitou lab, NIG. I share it in the hope that it will be useful, but without any warranties.

The parser was tested with the default pairwise output format (-outfmt 0) of blastn, blastx and tblastx from the BLAST+ 2.2.25 package, and will probably work with other versions as well. Please report any issues or incompatibilities.

The space saving from using this parser varies greatly depending on the nature of the search (average alignment length, number of hits per database sequence, database sequence lengths, etc.). Typically the output shrinks about 5-fold, although in the worst cases the reduction is only about 3-fold (e.g., with a blastx search against the NR database). With blastn searches the reduction is often 10-fold or better.

Note: For most uses BLAST Compressor is a better choice, as it has a more complete parser and better compression.

News

2012-06-27 – Version 1.1.5. Improved compatibility with BLAST Compressor (with decompressed output).

2012-06-06 – Version 1.1.4. Improved compatibility with BLASTP and with ancient versions of BLAST. Added reporting the total number of hits.

2012-06-06 – Version 1.1.3. Improved compatibility with misformatted BLAST output.

2012-01-12 – Version 1.1.2 uploaded. Minor update, cosmetic changes.

Download

(Distributed under the zlib/libpng license, see the source file for details)

Usage

perl blast_parser.pl <blastoutput.txt >blastparsed.txt

You can also parse the BLAST output on the fly as it is generated, saving an enormous amount of disk space (but losing all alignments). Just append

| perl blast_parser.pl >blastparsed.txt

to the end of your search command instead of specifying the output file.
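As a rough illustration of the on-the-fly usage described above, a blastn search could be piped straight into the parser as sketched below. The function name blast_and_parse, the query file contigs.fasta and the database name human_genomic are placeholders chosen for this example, not part of the tool. The options -query, -db and -outfmt are standard BLAST+ options, and blast_parser.pl is assumed to be in the current directory.

```shell
#!/bin/sh
# Sketch: run a blastn search and parse its output on the fly,
# so the full pairwise BLAST output is never written to disk.
# Arguments (all hypothetical names supplied by the caller):
#   $1 - query FASTA file
#   $2 - BLAST database name
#   $3 - file to receive the parsed output
blast_and_parse() {
    # -outfmt 0 is the default pairwise format the parser was tested with.
    blastn -query "$1" -db "$2" -outfmt 0 | perl blast_parser.pl > "$3"
}

# Example call (placeholder file and database names):
# blast_and_parse contigs.fasta human_genomic blastparsed.txt
```

Because no output file is given to blastn, its report goes to standard output and only the parsed result is stored.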

Note: BLAST Parser is designed to work with local BLAST search output. Please don't try to use it on HTML output of online searches.

Example output

contig00004 (3377)
  ref|NT_022135.15|Hs2_22291 Homo sapiens chromosome 2 genomic contig, reference assembly (38390280)
    -1/+1 60(30,-,0) - 55(9e-004) 714..893 - 6489034..6489213
contig00057 (17399)
  ref|NT_034772.5|Hs5_34934 Homo sapiens chromosome 5 genomic contig, reference assembly (41199371)
    -2/+1 61(34,-,0) - 84(1e-011) 5945..6127 - 21229540..21229722
  ref|NT_022517.17|Hs3_22673 Homo sapiens chromosome 3 genomic contig, reference assembly (66080833)
    -2/-1 81(33,-,0) - 70(1e-007) 4694..4936 - 38107628..38107870
    -2/-3 48(25,-,0) - 60(1e-004) 4526..4669 - 38107065..38107208
  ref|NT_022184.14|Hs2_22340 Homo sapiens chromosome 2 genomic contig, reference assembly (68373980)
    -2/+3 50(21,-,0) - 54(0.009) 4487..4636 - 5324220..5324369

If you have any questions, comments or suggestions, please contact me.


  © 2011–2012 Kirill Kryukov
This page is available under the CC BY 3.0 License