BLAST Supercompressor

About

BLAST Compressor can make BLAST output over 10x smaller while preserving most of the data. Sometimes you don't need all of that data and want to compress even further. That is the purpose of this script: it takes the output of BLAST Compressor, keeps only some hits and some information about those hits, and writes the result in a more compact format.

This tool was made by Kirill Kryukov in Saitou lab, NIG. I share it with the hope that it can be useful, but without any warranties.

News

2012-04-19 – Page created, version 0.1.0 uploaded.

Download

(Distributed under the zlib/libpng license, see the source file for details)

Details

Which hits are stored and which are removed? - For each pair of query and database sequences, only a single hit is preserved: the one with the lowest e-value (or, among hits with the same e-value, the one with the highest score). Note: This script does not know about multi-part hits, so it is best used with blastn output.
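The selection rule above can be illustrated with a short Python sketch. This is not the code of blast_supercompressor.pl, and it does not parse the BLAST Compressor format: it assumes the hits have already been read into memory, and the `Hit` record and `best_hits` function are hypothetical names introduced only for this example. It also assumes that "score" in the tie-break means the bit-score.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Hit(NamedTuple):
    """Hypothetical in-memory record for one hit (not the script's own format)."""
    query_id: str
    db_id: str
    bit_score: float
    e_value: float


def best_hits(hits: Iterable[Hit]) -> List[Hit]:
    """Keep one hit per (query, database) pair.

    The kept hit has the lowest e-value; among hits with equal e-value,
    the one with the highest bit-score wins (assumed tie-break).
    """
    best: Dict[Tuple[str, str], Hit] = {}
    for hit in hits:
        key = (hit.query_id, hit.db_id)
        current = best.get(key)
        # Compare (e-value ascending, bit-score descending) as a tuple.
        if current is None or (hit.e_value, -hit.bit_score) < (
            current.e_value,
            -current.bit_score,
        ):
            best[key] = hit
    # Dicts preserve insertion order, so pairs come out in first-seen order.
    return list(best.values())


if __name__ == "__main__":
    example = [
        Hit("q1", "d1", 50.0, 1e-5),
        Hit("q1", "d1", 60.0, 1e-10),
        Hit("q1", "d1", 70.0, 1e-10),  # same e-value, higher score: kept
        Hit("q1", "d2", 30.0, 0.01),
    ]
    for hit in best_hits(example):
        print(hit.query_id, hit.db_id, hit.bit_score, hit.e_value)
```

Keying a dictionary on the (query, database) pair means the input does not need to be sorted or grouped beforehand, at the cost of holding one record per pair in memory.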

What information is stored about each hit? - 1. Query sequence ID. 2. Database sequence ID. 3. Bit-score. 4. E-value. Everything else is eliminated. (It's not hard to produce a different output format; please let me know if you have any requests.)

Usage

perl blast_supercompressor.pl <compact.txt >supercompact.txt

(You need a Perl interpreter to run this script.)

Example output (fragment)

" 1K.4 2-. / " 1G.8 2-- )1 | = 1F.0 8-- L 1D.2 3-, ,' )% &c #N Y $ 1B.4 1-+ +p 'r #} "a *

Color legend: query sequence ID, database sequence IDs, bit-score, e-value.

If you have any questions, comments or suggestions, please contact me.


  © 2012 Kirill Kryukov
This page is available under the CC BY 3.0 License