ORF Finder

About

This is a simple ORF finder, made to demonstrate how ORF finding works and to offer flexible output formatting.

Reads FASTA-formatted nucleotide sequences from standard input.
Writes ORF sequences to standard output in either FASTA or tab-separated format (name, tab, sequence).
Supports vertebrate nuclear and mitochondrial genetic codes (more can be added easily).
Can produce amino-acid or nucleotide ORF sequences.
Can skip short ORFs.
Can process both strands, only the forward strand, or only the reverse strand.
Output ORF names can be configured using a user-specified pattern.
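
The idea behind the features above can be sketched in a few lines of Python. This is an illustration only, not the script's actual code. It assumes that an ORF is any stop-free stretch of a reading frame (the script's own definition may differ, for example by requiring a start codon), and the names find_orfs, translate and reverse_complement are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 1 (standard code), amino acids listed in TCAG codon order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; codons not in the table (e.g. with N) become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def find_orfs(seq, min_length=1, both_strands=True):
    """Return (strand, frame, amino_acid_offset, protein) for each ORF.

    Assumption: an ORF is a stop-free stretch of at least min_length residues.
    """
    seq = seq.upper()
    strands = [("+", seq)]
    if both_strands:
        strands.append(("-", reverse_complement(seq)))
    orfs = []
    for strand, s in strands:
        for frame in range(3):
            protein = translate(s[frame:])
            pos = 0  # offset of the current piece, in amino acids, within this frame
            for piece in protein.split("*"):
                if len(piece) >= min_length:
                    orfs.append((strand, frame, pos, piece))
                pos += len(piece) + 1  # skip past the stop codon
    return orfs


for orf in find_orfs("ATGAAATAGCCC", min_length=2, both_strands=False):
    print(orf)
```

Translating each of the three frames once and splitting on stop codons keeps the sketch short; the reverse strand is handled by running the same loop on the reverse complement.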

No installation is needed: just unpack it anywhere. It is a Perl script, so its only dependency is the Perl interpreter.

orf-finder.pl [OPTIONS] <nucleotide.fa >orfs.fa

Options:

--transl-table 2 - Use the vertebrate mitochondrial genetic code (default is the vertebrate nuclear code).
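
To see what switching the genetic code changes, the following Python sketch compares the two codes codon by codon. It assumes that the option value follows NCBI translation table numbering (1 for the standard nuclear code, 2 for the vertebrate mitochondrial code); the helper names are hypothetical and not part of the script.

```python
from itertools import product

BASES = "TCAG"
# Amino acids per codon in TCAG order, as listed for NCBI translation tables 1 and 2.
TABLES = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    2: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",
}


def codon_table(table_id):
    """Build a codon -> amino acid dictionary for the given table number."""
    codons = ("".join(c) for c in product(BASES, repeat=3))
    return dict(zip(codons, TABLES[table_id]))


def table_differences(a, b):
    """Return {codon: (amino acid in table a, amino acid in table b)} where they differ."""
    ta, tb = codon_table(a), codon_table(b)
    return {codon: (ta[codon], tb[codon]) for codon in ta if ta[codon] != tb[codon]}


for codon, (nuclear, mito) in table_differences(1, 2).items():
    print(codon, nuclear, "->", mito)
```

Only four codons differ: TGA codes for tryptophan instead of stop, ATA for methionine instead of isoleucine, and AGA and AGG become stop codons, so ORF boundaries can shift considerably between the two codes.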

--min-length 10 - Only report ORFs that are at least 10 amino acids long (default: 1).

--nucleotide - Output nucleotide ORF sequences (default is amino-acid).

--use-only-forward-strand - Use only the forward strand of the input (default: both strands).

--use-only-reverse-strand - Use only the reverse strand of the input (default: both strands).

--out-format tabular - Produce tabular output (default is FASTA).
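
The two output formats differ only in how each record is laid out. A minimal Python sketch, assuming that FASTA sequences are written on a single unwrapped line (the script may wrap them) and using a hypothetical format_record helper:

```python
def format_record(name, sequence, out_format="fasta"):
    """Format one ORF as a FASTA record or as a tab-separated line."""
    if out_format == "tabular":
        # name, a tab character, then the sequence
        return f"{name}\t{sequence}\n"
    # FASTA: header line starting with '>', then the sequence
    return f">{name}\n{sequence}\n"


print(format_record("seq1-31-120", "MKTLLV"), end="")
print(format_record("seq1-31-120", "MKTLLV", out_format="tabular"), end="")
```

The tabular form puts one ORF per line, which is convenient for tools such as cut, sort and awk.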

--name-format '{ACC}-{START}-{END} {COMMENT}' - One common ORF name format.

--name-format '{ACC}-{MIN}-{MAX}-{STRAND} {COMMENT}' - Another common ORF name format.
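
Name patterns like the two above can be expanded with a simple placeholder substitution. The Python sketch below is an illustration, not the script's code. The meanings of the placeholders are assumptions: ACC as the first word of the FASTA header, COMMENT as the rest of the header, START and END as ORF coordinates in the direction of translation, MIN and MAX as the smaller and larger coordinate, and STRAND as a strand marker. The format_orf_name function is hypothetical.

```python
import re


def format_orf_name(pattern, fields):
    """Replace each {KEY} in pattern with fields[KEY]; unknown keys are left as-is."""
    def substitute(match):
        key = match.group(1)
        return str(fields[key]) if key in fields else match.group(0)

    return re.sub(r"\{([A-Z]+)\}", substitute, pattern)


# Example values are made up for illustration.
fields = {
    "ACC": "seq1",
    "COMMENT": "example sequence",
    "START": 31,
    "END": 120,
    "MIN": 31,
    "MAX": 120,
    "STRAND": "+",
}

print(format_orf_name("{ACC}-{START}-{END} {COMMENT}", fields))
print(format_orf_name("{ACC}-{MIN}-{MAX}-{STRAND} {COMMENT}", fields))
```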

--num-x-to-split 10 - Split an ORF into multiple ORFs at runs of 10 or more Xs (XXXXXXXXXX or longer) (default: 5).
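
Splitting on runs of Xs can be sketched with a regular expression. This Python example is an illustration under assumptions: it takes the translated ORF as input, drops the X run itself, and keeps shorter X runs inside the ORF; the script's exact behaviour at these boundaries may differ. The split_on_x_runs function is hypothetical.

```python
import re


def split_on_x_runs(protein, num_x_to_split=5):
    """Split a protein at every run of at least num_x_to_split X residues."""
    pieces = re.split("X{%d,}" % num_x_to_split, protein)
    # Discard empty pieces that arise when a run sits at either end.
    return [piece for piece in pieces if piece]


print(split_on_x_runs("MKTXXXXXLLV"))          # run of 5 Xs: split with the default
print(split_on_x_runs("MKTXXXXXLLV", 10))      # run too short for a threshold of 10
```

Long X runs typically come from stretches of unknown bases (N) in the input, so splitting there avoids reporting one ORF that spans a gap.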