Query Sequence (*Mandatory)
Enter sequence below in FASTA format or plain text
Select Program
Program
Filters
Filter
Expect

Search Hints

Note
  • This search provides similarity (BLAST) and conservation (PASS) searches of the 31 algae sequences of Alga-PrAS, using the user's own sequences as queries.
  • BLAST raw data can be downloaded with the 'Text Download (Raw Data)' button.

BLASTP

Compares an amino acid query sequence against a protein sequence database.

BLASTX

Compares a nucleotide query sequence translated in all reading frames against a protein sequence database.
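To illustrate what "translated in all reading frames" means, here is a minimal Python sketch of a six-frame translation. It assumes the standard genetic code and is not the BLASTX implementation itself; the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate one frame; codons containing non-ACGT letters become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna: str) -> dict:
    """Return the three forward (+1..+3) and three reverse (-1..-3) frames."""
    dna = dna.upper().replace("U", "T")
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame_translations("ATGGCCTAA").items():
        print(name, protein)
```

Each of the six translated strings would then be compared against the protein database; "*" marks a stop codon.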
A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column. It is recommended that all lines of text be shorter than 80 characters in length.
An example sequence in FASTA format is:
>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
HFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPGPCVQRTYVACHIRSVIIWLETISKK
TYAPPREGHLECTSTVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRYKLVEITPIGF
APTEVRRYTGGHERQKRVPFVXXXXXXXXXXXXXXXXXXXXXXVQSQHLLAGILQQQKNL
LAAVEAQQQMLKLTIWGVK
Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions: lower-case letters are accepted and are mapped into upper-case; a single hyphen or dash can be used to represent a gap of indeterminate length; and in amino acid sequences, U and * are acceptable letters (see below). Before submitting a request, any numerical digits in the query sequence should either be removed or replaced by appropriate letter codes (e.g., N for unknown nucleic acid residue or X for unknown amino acid residue).
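The formatting rules above can be applied before submission. The following Python sketch reads FASTA or plain-text input, maps lower-case letters to upper-case, and replaces digits with an unknown-residue code. The function names and the choice to replace rather than remove digits are assumptions made for illustration; the server's own parser may behave differently.

```python
import re


def parse_fasta(text: str) -> list:
    """Parse FASTA (or plain-text) input into (description, sequence) pairs."""
    records = []
    description, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if description is not None or chunks:
                records.append((description or "", "".join(chunks)))
            description, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if description is not None or chunks:
        records.append((description or "", "".join(chunks)))
    return records


def normalize_sequence(sequence: str, unknown: str = "X") -> str:
    """Upper-case the sequence, drop whitespace, and replace digits.

    Pass unknown="N" for nucleic acid queries, or unknown="" to remove
    digits instead of replacing them.
    """
    sequence = re.sub(r"\s+", "", sequence).upper()
    return re.sub(r"\d", unknown, sequence)


if __name__ == "__main__":
    for desc, seq in parse_fasta(">query 1\nacgt12\nACGT\n"):
        print(desc, normalize_sequence(seq, unknown="N"))
```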
The nucleic acid codes supported are:
A  adenosine    R  G A (purine)        W  A T (weak)    N  A G C T (any)
C  cytidine     Y  T C (pyrimidine)    B  G T C         -  gap of indeterminate length
G  guanine      K  G T (keto)          D  G A T
T  thymidine    M  A C (amino)         H  A C T
U  uridine      S  G C (strong)        V  G C A
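As a sketch of how the ambiguity codes in this table can be used, the Python snippet below maps each code to the set of bases it stands for and checks whether two codes can denote the same base. Treating U as equivalent to T for matching is an assumption of this example, and the helper names are hypothetical. The gap symbol "-" is deliberately left out of the mapping.

```python
# Each nucleic acid code mapped to the bases it can represent.
# Assumption: U (uridine) is treated as T for the purpose of matching.
IUPAC_NUCLEOTIDES = {
    "A": frozenset("A"), "C": frozenset("C"), "G": frozenset("G"),
    "T": frozenset("T"), "U": frozenset("T"),
    "R": frozenset("GA"), "Y": frozenset("TC"), "K": frozenset("GT"),
    "M": frozenset("AC"), "S": frozenset("GC"), "W": frozenset("AT"),
    "B": frozenset("GTC"), "D": frozenset("GAT"), "H": frozenset("ACT"),
    "V": frozenset("GCA"), "N": frozenset("AGCT"),
}


def possible_bases(code: str) -> frozenset:
    """Return the bases a code stands for; raises KeyError for unknown codes."""
    return IUPAC_NUCLEOTIDES[code.upper()]


def codes_compatible(a: str, b: str) -> bool:
    """True if the two codes share at least one possible base."""
    return bool(possible_bases(a) & possible_bases(b))


if __name__ == "__main__":
    print(sorted(possible_bases("R")))
    print(codes_compatible("R", "Y"))
```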
For those programs that use amino acid query sequences (BLASTP and TBLASTN), the accepted amino acid codes are:
A  alanine                    H  histidine     Q  glutamine        Y  tyrosine
B  aspartate or asparagine    I  isoleucine    R  arginine         Z  glutamate or glutamine
C  cysteine                   K  lysine        S  serine           X  any
D  aspartate                  L  leucine       T  threonine        *  translation stop
E  glutamate                  M  methionine    U  selenocysteine   -  gap of indeterminate length
F  phenylalanine              N  asparagine    V  valine
G  glycine                    P  proline       W  tryptophan
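A query can be checked against the amino acid codes listed above before it is submitted. The Python sketch below reports characters outside that list and counts the ambiguous codes B, Z and X. The function name and the shape of the returned dictionary are hypothetical, and the server may apply its own checks.

```python
# Accepted amino acid codes, taken from the table above (no J or O).
AMINO_ACID_CODES = set("ABCDEFGHIKLMNPQRSTUVWXYZ*-")
AMBIGUOUS_CODES = set("BZX")


def check_protein_query(sequence: str) -> dict:
    """Summarize a protein query: length, unsupported characters, ambiguity."""
    seq = "".join(sequence.split()).upper()  # drop whitespace, map to upper-case
    invalid = [
        (position, ch)
        for position, ch in enumerate(seq, start=1)
        if ch not in AMINO_ACID_CODES
    ]
    ambiguous = sum(ch in AMBIGUOUS_CODES for ch in seq)
    return {"length": len(seq), "invalid": invalid, "ambiguous": ambiguous}


if __name__ == "__main__":
    print(check_protein_query("ELRLRYCAPAGFALLKCNDADYDG"))
    print(check_protein_query("MKJO1"))
```

Positions in the "invalid" list are 1-based, matching the residue numbering used in the search results.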
Mask off segments of the query sequence that have low compositional complexity, as determined by the SEG program of Wootton & Federhen (Computers and Chemistry, 1993) or, for BLASTN, by the DUST program of Tatusov and Lipman (in preparation). Filtering can eliminate statistically significant but biologically uninteresting reports from the BLAST output (e.g., hits against common acidic-, basic- or proline-rich regions), leaving the more biologically interesting regions of the query sequence available for specific matching against database sequences.
Filtering is only applied to the query sequence (or its translation products), not to database sequences. Default filtering is DUST for BLASTN, SEG for other programs.
It is not unusual for nothing at all to be masked by SEG, when applied to sequences in SWISS-PROT, so filtering should not be expected to always yield an effect. Furthermore, in some cases, sequences are masked in their entirety, indicating that the statistical significance of any matches reported against the unfiltered query sequence should be suspect.
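The idea of masking low-complexity segments can be illustrated with a much simplified sliding-window entropy filter. This Python sketch is not the SEG or DUST algorithm; the window length, entropy threshold and function names are illustrative assumptions only.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of a window."""
    total = len(window)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(window).values()
    )


def mask_low_complexity(sequence: str, window: int = 12,
                        threshold: float = 2.2, mask_char: str = "X") -> str:
    """Replace every residue that lies in a low-entropy window with mask_char."""
    sequence = sequence.upper()
    masked = list(sequence)
    for start in range(0, len(sequence) - window + 1):
        if shannon_entropy(sequence[start:start + window]) <= threshold:
            for i in range(start, start + window):
                masked[i] = mask_char
    return "".join(masked)


if __name__ == "__main__":
    print(mask_low_complexity("MKVLAAGIPPPPPPPPPPPPPPPPWTRQESDNHY"))
```

A sequence shorter than the window is returned unchanged, and a query made entirely of one residue is masked in full, which mirrors the two extreme cases described above.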
The statistical significance threshold for reporting matches against database sequences. The default value is 1e-5, meaning that about 0.00001 matches are expected to be found merely by chance, according to the stochastic model of Karlin and Altschul (1990). If the statistical significance (E-value) ascribed to a match is greater than the EXPECT threshold, the match will not be reported. Lower EXPECT thresholds are more stringent, leading to fewer chance matches being reported. Fractional values are acceptable.
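In the Karlin and Altschul model, the expected number of chance matches with raw score at least S is E = K * m * n * exp(-lambda * S), where m and n are the query and database lengths. The Python sketch below computes this value and applies an EXPECT threshold to a list of hits. The lambda and K values are illustrative placeholders, and the hit dictionaries are a hypothetical structure, not the server's output format.

```python
import math


def expect_value(score: float, query_len: int, db_len: int,
                 lam: float, k: float) -> float:
    """Karlin-Altschul expectation: E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * score)


def reportable(hits: list, threshold: float = 1e-5) -> list:
    """Keep hits whose E-value does not exceed the EXPECT threshold."""
    return [hit for hit in hits if hit["evalue"] <= threshold]


if __name__ == "__main__":
    # Placeholder statistical parameters, chosen only for illustration.
    lam, k = 0.267, 0.041
    hits = [
        {"id": "hit_a", "evalue": expect_value(120, 300, 5_000_000, lam, k)},
        {"id": "hit_b", "evalue": expect_value(40, 300, 5_000_000, lam, k)},
    ]
    for hit in reportable(hits):
        print(hit["id"], hit["evalue"])
```

Because E falls exponentially with the score, lowering the threshold removes the weakest matches first.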
The conserved protein region is calculated with the PASS tool, which uses the BLAST program to determine the N-terminal and C-terminal sites of a protein region that is conserved among diverse organisms. Such a region is related to structural and functional units.

Result example: 10–100(15, 20)
In this case, "10–100" shows that the conserved protein region spans amino acid residues 10 to 100 of the query sequence. If the query is a nucleic acid sequence, it is first translated into a protein sequence and the positions are still reported as amino acid residues. "(15, 20)" gives the numbers of hit sequences that were involved in deciding the N-terminal and C-terminal positions of the conserved protein region.
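A result string in this form can be split into its four numbers with a short Python sketch. It assumes that the first number in parentheses belongs to the N-terminal position and the second to the C-terminal position, and that the range separator is either an en dash or a hyphen; the class and function names are hypothetical.

```python
import re
from typing import NamedTuple


class ConservedRegion(NamedTuple):
    start: int        # first amino acid residue of the region
    end: int          # last amino acid residue of the region
    n_term_hits: int  # hits supporting the N-terminal position (assumed order)
    c_term_hits: int  # hits supporting the C-terminal position (assumed order)


# Accepts an en dash (U+2013) or a plain hyphen between the two positions.
RESULT_PATTERN = re.compile(
    r"^\s*(\d+)\s*[\u2013-]\s*(\d+)\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)\s*$"
)


def parse_pass_result(text: str) -> ConservedRegion:
    """Parse a result such as '10-100(15, 20)' into its components."""
    match = RESULT_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognized PASS result: {text!r}")
    return ConservedRegion(*(int(group) for group in match.groups()))


if __name__ == "__main__":
    region = parse_pass_result("10-100(15, 20)")
    print(region, "length:", region.end - region.start + 1)
```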
RIKEN Center for Sustainable Resource Science: Integrated Genome Informatics Research Unit