ORF Finder
About
This is a simple open reading frame (ORF) finder, made to demonstrate how ORF finding works and to offer flexible output formatting.
- Takes FASTA-formatted nucleotide sequences on standard input.
- Writes ORF sequences to standard output, in either FASTA or tab-separated format (name, tab, sequence).
- Supports vertebrate nuclear and mitochondrial genetic codes (more can be added easily).
- Can produce amino-acid or nucleotide ORF sequences.
- Can skip ORFs that are too short or too long.
- Can process both strands (default), only the forward strand, or only the reverse strand.
- Output ORF names can be configured using a user-specified pattern.
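The core ORF-finding steps above can be sketched as a six-frame translation. This is a minimal, hypothetical Python sketch, not the script's Perl code: it assumes an ORF is a maximal stretch of a reading frame without stop codons (no start codon required), and the names `find_orfs`, `translate`, and `transl_table` are illustrative. Table 2 is derived from table 1 using the four codons where the vertebrate mitochondrial code differs (TGA = Trp, ATA = Met, AGA and AGG = stop).

```python
from itertools import product

# Hypothetical sketch of six-frame ORF finding; not the script's actual Perl code.
# Assumption: an ORF is a maximal run of codons without a stop codon
# (no start codon is required).

BASES = "TCAG"
# NCBI translation table 1 (standard code, used for vertebrate nuclear genes),
# listed in TTT, TTC, TTA, TTG, TCT, ... order.
TABLE1_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE1 = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), TABLE1_AA)}
# Table 2 (vertebrate mitochondrial) differs from table 1 at four codons.
CODE2 = {**CODE1, "TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}
GENETIC_CODES = {1: CODE1, 2: CODE2}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq, code):
    """Translate complete codons; codons with other characters (e.g. N) become X."""
    return "".join(code.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def find_orfs(seq, transl_table=1, strands=("+", "-")):
    """Return (strand, frame, amino_acids) tuples for every stop-free stretch."""
    code = GENETIC_CODES[transl_table]
    seq = seq.upper()
    orfs = []
    for strand in strands:
        s = seq if strand == "+" else reverse_complement(seq)
        for frame in range(3):
            protein = translate(s[frame:], code)
            # Stop codons (*) delimit ORFs; drop empty pieces between adjacent stops.
            orfs.extend((strand, frame, piece) for piece in protein.split("*") if piece)
    return orfs
```

For example, `find_orfs("ATGAAATGA", strands=("+",))` returns `[('+', 0, 'MK'), ('+', 1, 'N'), ('+', 2, 'EM')]`; with `transl_table=2` the TGA in frame 0 is read as Trp, so that ORF becomes `MKW`.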
Download
- orf-finder-0.1.4.zip - shared under the zlib/libpng license.
Installing
No installation is needed; just unpack it anywhere. It is a Perl script, so its only dependency is a Perl interpreter.
Usage
orf-finder.pl [OPTIONS] <nucleotide.fa >orfs.fa
Options:
--transl-table 2 - Use the vertebrate mitochondrial genetic code (default is the vertebrate nuclear code).
--min-aa-length 10 - Only find ORFs that are at least 10 amino acids long (default: 1).
--max-aa-length 100000 - Only find ORFs that are at most 100000 amino acids long (default: 0 = no limit).
--amino-acid - Output amino-acid ORF sequences (default).
--nucleotide - Output nucleotide ORF sequences (default is amino-acid).
--use-only-forward-strand - Use only the forward strand of the input (default: both strands).
--use-only-reverse-strand - Use only the reverse strand of the input (default: both strands).
--out-format tabular - Produce tabular output (default is FASTA).
--name-format '{ACC}-{START}-{END} {COMMENT}' - One common ORF name format.
--name-format '{ACC}-{MIN}-{MAX}-{STRAND} {COMMENT}' - Another common ORF name format.
--num-x-to-split 10 - Split an ORF into multiple ORFs at runs of 10 or more Xs (default: 5).
--help - Show usage help.
--version - Show version.
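The splitting, length-filtering, naming, and output-format options can be pictured as post-processing steps applied to each translated ORF. The Python sketch below is hypothetical (the script itself is Perl, and its internals are not described here); the function names are illustrative, and it assumes {MIN}/{MAX} are the smaller/larger of the start and end coordinates and that {STRAND} is a single "+" or "-" character.

```python
import re

# Hypothetical post-processing helpers mirroring the documented options;
# not the script's actual code.


def split_on_x_runs(aa, num_x=5):
    """--num-x-to-split: split at runs of num_x or more consecutive X."""
    return [piece for piece in re.split("X{%d,}" % num_x, aa) if piece]


def passes_length(aa, min_len=1, max_len=0):
    """--min-aa-length / --max-aa-length; max_len == 0 means no upper limit."""
    return len(aa) >= min_len and (max_len == 0 or len(aa) <= max_len)


def format_name(pattern, acc, start, end, strand, comment=""):
    """--name-format: fill {ACC}, {START}, {END}, {MIN}, {MAX}, {STRAND}, {COMMENT}."""
    fields = {"ACC": acc, "START": start, "END": end,
              "MIN": min(start, end), "MAX": max(start, end),
              "STRAND": strand, "COMMENT": comment}
    # Unknown placeholders are left untouched.
    return re.sub(r"\{(\w+)\}",
                  lambda m: str(fields.get(m.group(1), m.group(0))), pattern)


def format_record(name, seq, out_format="fasta"):
    """--out-format: FASTA (default) or tabular (name, tab, sequence)."""
    if out_format == "tabular":
        return f"{name}\t{seq}"
    return f">{name}\n{seq}"
```

For example, `split_on_x_runs("MKXXXXXLV")` gives `["MK", "LV"]`, while four Xs are kept inside the ORF at the default threshold. `format_name("{ACC}-{MIN}-{MAX}-{STRAND} {COMMENT}", "seq1", 300, 100, "-", "demo")` gives `"seq1-100-300-- demo"`.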