Phobius

A combined transmembrane topology and signal peptide predictor



Instructions


This server is for prediction of transmembrane topology and signal peptides from the amino acid sequence of a protein.

Phobius is described in:

PolyPhobius is described in:

The Phobius webserver is described in:

Please cite.

Input

The program takes proteins in FASTA format. It recognizes the 20 standard amino acids as well as B, Z, and X; these last three are all treated equally as unknown residues. Any other character is changed to X, so please make sure that the sequences are sensible proteins.

This is an example (one protein):

>Q8TCT8|PSL2_HUMAN you can have comments after the ID
MGPQRRLSPAGAALLWGFLLQLTAAQEAILHASGNGTTKDYCMLYNPYWTALPSTLENAT
SISLMNLTSTPLCNLSDIPPVGIKSKAVVVPWGSCHFLEKARIAQKGGAEAMLVVNNSVL
FPPSGNRSEFPDVKILIAFISYKDFRDMNQTLGDNITVKMYSPSWPNFDYTMVVIFVIAV
FTVALGGYWSGLVELENLKAVTTEDREMRKKKEEYLTFSPLTVVIFVVICCVMMVLLYFF
YKWLVYVMIAIFCIASAMSLYNCLAALIHKIPYGQCTIACRGKNMEVRLIFLSGLCIAVA
VVWAVFRNEDRWAWILQDILGIAFCLNLIKTLKLPNFKSCVILLGLLLLYDVFFVFITPF
ITKNGESIMVELAAGPFGNNEKLPVVIRVPKLIYFSVMSVCLMPVSILGFGDIIVPGLLI
AYCRRFDVQTGSSYIYYVSSTVAYAIGMILTFVVLVLMKKGQPALLYLVPCTLITASVVA
WRRKEMKKFWKGNSYQMMDHLDCATNEENPVISGEQIVQQ
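The input rules above can be checked before submission. The following is a minimal Python sketch, not Phobius's own code: the helper names `read_fasta`, `seq_id`, `normalize` and `unknown_fraction` are hypothetical, and upper-casing the input first is an assumption, since the instructions do not say how lower-case letters are handled.

```python
# Minimal sketch of the input rules described above; not Phobius's own code.
# Assumption: input is upper-cased before checking (not stated by Phobius).
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 amino acids
AMBIGUOUS = set("BZX")                  # recognized, but treated as unknown


def read_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA text."""
    records, header, chunks = [], None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def seq_id(header):
    """The ID is the first word of the header; the rest is a free-text comment."""
    words = header.split()
    return words[0] if words else ""


def normalize(seq):
    """Keep the 20 amino acids and B/Z/X; change any other character to X."""
    return "".join(c if c in STANDARD or c in AMBIGUOUS else "X" for c in seq.upper())


def unknown_fraction(seq):
    """Fraction of residues treated as unknown (B, Z or X) after normalization."""
    norm = normalize(seq)
    return sum(c in AMBIGUOUS for c in norm) / len(norm) if norm else 0.0


example = """>Q8TCT8|PSL2_HUMAN you can have comments after the ID
MGPQRRLSPAGAALLWGFLLQLTAAQEAILHASGNGTTKDYCMLYNPYWTALPSTLENAT
"""
for header, seq in read_fasta(example):
    print(seq_id(header), len(seq), unknown_fraction(seq))
```

A high `unknown_fraction` is a quick hint that a sequence may not be a sensible protein (for example, a nucleotide sequence pasted by mistake).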

How to run it

Either give the name of the local file containing your proteins in the top half of the window, or paste the sequence(s) into the lower part of the window. Then press 'Submit'. (It should also be possible to give a local file and paste sequences at the same time, if you really want to.)

Output

There are two output formats: long and short.

Long output format

For the long format (default), Phobius lists the locations of the predicted transmembrane helices, of the intervening loop regions, and of the signal peptide.

Here is an example:

ID   MTH_DROME
FT   SIGNAL        1     24       
FT   REGION        1      3       N-REGION.
FT   REGION        4     19       H-REGION.
FT   REGION       20     24       C-REGION.
FT   TOPO_DOM     25    218       NON CYTOPLASMIC.
FT   TRANSMEM    219    238       
FT   TOPO_DOM    239    249       CYTOPLASMIC.
FT   TRANSMEM    250    269       
FT   TOPO_DOM    270    280       NON CYTOPLASMIC.
FT   TRANSMEM    281    302       
FT   TOPO_DOM    303    321       CYTOPLASMIC.
FT   TRANSMEM    322    342       
FT   TOPO_DOM    343    371       NON CYTOPLASMIC.
FT   TRANSMEM    372    391       
FT   TOPO_DOM    392    421       CYTOPLASMIC.
FT   TRANSMEM    422    439       
FT   TOPO_DOM    440    450       NON CYTOPLASMIC.
FT   TRANSMEM    451    476       
FT   TOPO_DOM    477    514       CYTOPLASMIC.
//
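Because the long format is line-oriented (an `ID` line, one `FT` line per feature giving a key, a start and end position and an optional note, and `//` closing each entry), it is easy to read back into a program. The Python sketch below assumes exactly that layout with whitespace-separated columns; `Feature`, `parse_long`, `tm_segments` and `has_no_tm` are hypothetical names, not part of Phobius, and the second entry in the sample is made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Feature:
    key: str    # SIGNAL, REGION, TOPO_DOM or TRANSMEM
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    note: str   # e.g. "NON CYTOPLASMIC.", or "" when absent


def parse_long(text: str) -> Dict[str, List[Feature]]:
    """Map each sequence ID to its list of features, in output order."""
    entries: Dict[str, List[Feature]] = {}
    current = None
    for line in text.splitlines():
        if line.startswith("ID"):
            current = line.split()[1]
            entries[current] = []
        elif line.startswith("FT") and current is not None:
            # Split into FT, key, start, end and the optional rest-of-line note.
            parts = line.split(None, 4)
            note = parts[4].strip() if len(parts) > 4 else ""
            entries[current].append(Feature(parts[1], int(parts[2]), int(parts[3]), note))
        elif line.startswith("//"):
            current = None
    return entries


def tm_segments(features: List[Feature]) -> List[Tuple[int, int]]:
    """(start, end) of every predicted transmembrane helix."""
    return [(f.start, f.end) for f in features if f.key == "TRANSMEM"]


def has_no_tm(features: List[Feature]) -> bool:
    """True when no TRANSMEM feature is present, i.e. no membrane helices."""
    return not tm_segments(features)


# Trimmed version of the example above, plus a made-up entry without helices.
SAMPLE = """ID   MTH_DROME
FT   SIGNAL        1     24
FT   REGION        1      3       N-REGION.
FT   TOPO_DOM     25    218       NON CYTOPLASMIC.
FT   TRANSMEM    219    238
FT   TOPO_DOM    239    249       CYTOPLASMIC.
//
ID   SOLUBLE_EXAMPLE
FT   TOPO_DOM      1    150       CYTOPLASMIC.
//
"""

for seq, feats in parse_long(SAMPLE).items():
    print(seq, tm_segments(feats), has_no_tm(feats))
```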

If the whole sequence is labeled as cytoplasmic or non-cytoplasmic, the prediction is that it contains no membrane helices; it is not wise to interpret such a label as a prediction of the protein's location. The prediction gives the most probable location and orientation of transmembrane helices in the sequence. It is found by an algorithm called N-best (or 1-best in this case), which sums over all paths through the model that have the same location and direction of the helices.
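Why summing over paths matters can be shown by brute force on a toy example: the single most probable path through a model need not give the most probable labeling. This Python sketch is not the N-best algorithm itself, which searches for the best labeling without enumerating paths; the four-state model, its state names and its path probabilities are invented purely for illustration.

```python
from collections import defaultdict

# Toy model (invented): several hidden states may share one label.
# 'i' = cytoplasmic, 'o' = non-cytoplasmic, 'M' = membrane helix.
STATE_LABEL = {"in": "i", "out": "o", "M1": "M", "M2": "M"}

# Invented probabilities of every path through the model for a 3-residue chain.
PATH_PROB = {
    ("in", "in", "in"): 0.30,
    ("in", "M1", "out"): 0.25,
    ("in", "M2", "out"): 0.25,
    ("out", "M1", "in"): 0.20,
}


def labeling(path):
    """Map a state path to its label string, e.g. ('in', 'M1', 'out') -> 'iMo'."""
    return "".join(STATE_LABEL[state] for state in path)


def best_single_path(path_prob):
    """Most probable individual path (what a Viterbi decoder would return)."""
    return max(path_prob, key=path_prob.get)


def labeling_probs(path_prob):
    """Sum the probabilities of all paths that share the same labeling."""
    totals = defaultdict(float)
    for path, p in path_prob.items():
        totals[labeling(path)] += p
    return dict(totals)


def best_labeling(path_prob):
    """Most probable labeling, the target that the N-best/1-best search approximates."""
    totals = labeling_probs(path_prob)
    return max(totals, key=totals.get)


print(labeling(best_single_path(PATH_PROB)))  # iii (one path, p = 0.30)
print(best_labeling(PATH_PROB))               # iMo (two paths, p = 0.25 + 0.25)
```

Here the helix-containing labeling wins only because two different paths produce it; a decoder that looks at single paths would report no helix at all.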

Plot of probabilities

The plot shows, for each residue, the posterior probabilities of cytoplasmic, non-cytoplasmic, TM helix, and signal peptide. It can reveal possible weak TM helices that were not predicted, and it gives an idea of how certain each segment of the prediction is.

At the bottom of the plot (between -0.04 and 0) the N-best prediction is shown.

The plot is obtained by calculating the total probability that a residue belongs to a helix, to the cytoplasmic side, or to the non-cytoplasmic side, summed over all possible paths through the model. The plot and the prediction can sometimes seem contradictory, but that is because the plot shows probabilities for each residue, whereas the prediction is the overall most probable structure. The plot should therefore be seen as a complementary source of information.
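The apparent contradiction between plot and prediction is easy to reproduce on a toy case. The Python sketch below starts from invented labeling probabilities for a three-residue chain ('i', 'o' and 'M' for cytoplasmic, non-cytoplasmic and membrane), computes per-residue posteriors by summing over labelings, and compares the residue-by-residue maximum with the single most probable labeling. In a real HMM these sums are normally computed with the forward-backward algorithm rather than by enumeration; the function names are hypothetical.

```python
from collections import defaultdict

# Invented probabilities of complete labelings for a 3-residue chain (sum to 1).
LABELING_PROB = {"iMo": 0.35, "oMi": 0.33, "ooo": 0.32}


def posteriors(labeling_prob):
    """Per-residue label probabilities, summed over all labelings."""
    length = len(next(iter(labeling_prob)))
    post = [defaultdict(float) for _ in range(length)]
    for lab, p in labeling_prob.items():
        for k, label in enumerate(lab):
            post[k][label] += p
    return [dict(d) for d in post]


def per_residue_argmax(post):
    """Pick the most probable label at each residue independently."""
    return "".join(max(d, key=d.get) for d in post)


def most_probable_labeling(labeling_prob):
    """The overall most probable structure."""
    return max(labeling_prob, key=labeling_prob.get)


post = posteriors(LABELING_PROB)
print(per_residue_argmax(post))              # oMo
print(most_probable_labeling(LABELING_PROB)) # iMo
```

The residue-wise maximum "oMo" is not even a valid topology (a helix must cross to the other side), which is why the plot is best read alongside the prediction rather than instead of it.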

Short output format

In the short output format, one line is produced for each protein, with no graphics. Each line starts with the sequence identifier, followed by these fields:
  • "TM": the number of predicted transmembrane segments.
  • "SP": a Y/N indicator of whether a signal peptide was predicted.
  • "PREDICTION": the predicted topology of the protein.

For the example above, the short output would be:

    SEQENCE ID            TM SP PREDICTION
    MTH_DROME              7  Y n4-19c24/25o219-238i250-269o281-302i322-342o372-391i422-439o451-476i
    

The topology is given as the positions of the transmembrane helices, separated by 'i' if the loop between them is on the cytoplasmic side or 'o' if it is on the non-cytoplasmic side. A signal peptide is given by the position of its h-region, preceded by an 'n' and followed by a 'c', then by the positions of the last amino acid of the signal peptide and the first amino acid of the mature protein, separated by a '/'. In the above example, n4-19c24/25o219-238i250-269o281-302i322-342o372-391i422-439o451-476i means that the protein is predicted to contain a signal peptide with an h-region between positions 4 and 19 that is cleaved between positions 24 and 25. It is followed by a non-cytoplasmic loop and a TM segment between positions 219 and 238, which is followed by a cytoplasmic loop, and so on.
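The prediction string follows a small grammar: an optional signal-peptide prefix n<start>-<end>c<last>/<first>, then alternating loop sides ('i' or 'o') and TM ranges, ending with the side of the C-terminal loop. The Python sketch below assumes that grammar as described here, and additionally assumes that the sequence ID contains no whitespace; `parse_topology`, `parse_short_line` and `Topology` are hypothetical helpers that also cross-check the TM and SP columns against the string.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Optional signal-peptide prefix: n<h-start>-<h-end>c<last SP residue>/<first mature residue>
SP_RE = re.compile(r"n(\d+)-(\d+)c(\d+)/(\d+)")
# One loop side ('i' cytoplasmic, 'o' non-cytoplasmic), optionally followed by a TM range
LOOP_RE = re.compile(r"([io])(?:(\d+)-(\d+))?")


@dataclass
class Topology:
    h_region: Optional[Tuple[int, int]]  # None if no signal peptide
    cleavage: Optional[Tuple[int, int]]  # (last SP residue, first mature residue)
    tm_segments: List[Tuple[int, int]]   # 1-based, inclusive
    loop_sides: str                      # one 'i'/'o' per loop, N- to C-terminal


def parse_topology(s: str) -> Topology:
    h_region = cleavage = None
    pos = 0
    m = SP_RE.match(s)
    if m:
        h_region = (int(m.group(1)), int(m.group(2)))
        cleavage = (int(m.group(3)), int(m.group(4)))
        pos = m.end()
    tms: List[Tuple[int, int]] = []
    sides: List[str] = []
    while pos < len(s):
        t = LOOP_RE.match(s, pos)
        if t is None:
            raise ValueError(f"unexpected text at position {pos}: {s[pos:]!r}")
        is_last = t.end() == len(s)
        has_tm = t.group(2) is not None
        # Every loop except the C-terminal one must be followed by a TM range.
        if has_tm == is_last:
            raise ValueError(f"malformed topology string: {s!r}")
        sides.append(t.group(1))
        if has_tm:
            tms.append((int(t.group(2)), int(t.group(3))))
        pos = t.end()
    if not sides:
        raise ValueError(f"no loop side in topology string: {s!r}")
    return Topology(h_region, cleavage, tms, "".join(sides))


def parse_short_line(line: str) -> Tuple[str, Topology]:
    # Assumes four whitespace-separated fields: ID, TM, SP, PREDICTION.
    seq_id, tm, sp, prediction = line.split()
    topo = parse_topology(prediction)
    if int(tm) != len(topo.tm_segments):
        raise ValueError("TM count does not match the prediction string")
    # "Y" means a signal peptide was predicted; anything else is read as "no".
    if (sp == "Y") != (topo.cleavage is not None):
        raise ValueError("SP flag does not match the prediction string")
    return seq_id, topo


sid, topo = parse_short_line(
    "MTH_DROME  7  Y n4-19c24/25o219-238i250-269o281-302i322-342o372-391i422-439o451-476i"
)
print(sid, topo.cleavage, topo.tm_segments[0], topo.loop_sides)
```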

Constrained prediction

Sometimes the location of a certain region of a protein is known in advance; typically, antibody or fusion experiments can reveal such positions. In such cases, setting constraints on the prediction can greatly increase its accuracy. You can do so by entering positions under the appropriate localization as a space-separated list. Here is an example:

    Membrane:
    Cytoplasmic loop:
    Non-cytoplasmic loop:
    Check if sequence is known to contain a signal peptide

These settings would result in a Phobius prediction with amino acids 220-222, 380, and 460 in the membrane, amino acid 315 and the C-terminus in the cytoplasm, and a signal peptide.
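One way to picture what a constraint does: labelings that disagree with a known position are excluded, and the best of the remaining labelings is reported. The Python sketch below does this by brute force over invented labeling probabilities for a four-residue chain, using 1-based positions as in the form above; the `constraints` dictionary is a hypothetical stand-in for the form fields, and the real predictor presumably applies constraints within the model's decoding rather than by enumeration.

```python
# Invented labeling probabilities for a 4-residue chain (sum to 1).
# 'i' = cytoplasmic, 'o' = non-cytoplasmic, 'M' = membrane.
LABELING_PROB = {"iiii": 0.40, "iMoo": 0.25, "oMii": 0.20, "ooMi": 0.15}


def satisfies(labeling, constraints):
    """constraints maps a 1-based residue position to the required label."""
    return all(labeling[pos - 1] == label for pos, label in constraints.items())


def constrained_best(labeling_prob, constraints):
    """Best labeling among those consistent with the constraints, and its
    probability renormalized over the consistent labelings."""
    allowed = {lab: p for lab, p in labeling_prob.items() if satisfies(lab, constraints)}
    if not allowed:
        raise ValueError("no labeling is consistent with the constraints")
    best = max(allowed, key=allowed.get)
    return best, allowed[best] / sum(allowed.values())


print(constrained_best(LABELING_PROB, {}))                # unconstrained: 'iiii'
print(constrained_best(LABELING_PROB, {2: "M"}))          # residue 2 in the membrane: 'iMoo'
print(constrained_best(LABELING_PROB, {2: "M", 4: "i"}))  # ...and C-terminus cytoplasmic: 'oMii'
```

Each added constraint removes candidate topologies, so a single well-chosen experimental position can change which topology comes out on top.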
     

PolyPhobius Predictions - Including homologs

Since both transmembrane topology and signal peptides are features that are likely to be conserved during evolution, we have included an option to use information from homologs in the prediction. You can either let Phobius use NCBI's Blast server to search the nr (non-redundant) database for homologs and align them with Kalign, or provide your own alignment in aligned FASTA format. In the latter case, the final prediction is given only for the first sequence in the alignment.
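When you supply your own alignment, every sequence should have the same aligned length, and since the prediction refers to the first sequence, it helps to know which alignment columns correspond to which of its residues. The Python sketch below assumes '-' as the gap character and uses a minimal FASTA reader; `read_fasta` and `query_from_alignment` are hypothetical helpers, not part of PolyPhobius.

```python
def read_fasta(text):
    """Minimal FASTA reader returning (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def query_from_alignment(text, gap="-"):
    """Return (header, ungapped query, column map) for the first sequence.

    The column map gives, for each alignment column, the 1-based residue
    position in the query, or None where the query has a gap.
    """
    records = read_fasta(text)
    if not records:
        raise ValueError("empty alignment")
    if len({len(seq) for _, seq in records}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    header, aligned = records[0]
    column_map, pos = [], 0
    for ch in aligned:
        if ch == gap:
            column_map.append(None)
        else:
            pos += 1
            column_map.append(pos)
    return header, aligned.replace(gap, ""), column_map


aln = """>query
MK-LV
>homolog_1
MKALV
"""
print(query_from_alignment(aln))
```

Positions in the prediction refer to the ungapped first sequence, so the column map is what lets you relate them back to the alignment.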

The Blast retrieval is time-consuming, so we recommend using this method only for single sequences.

Licensing Phobius

Phobius is freely available for local installation for academic use. It runs on Unix platforms with Perl version 5.6 or later.
To download a copy, please fill in the online licensing form.

Getting help

Please contact: