A combined transmembrane topology and signal peptide predictor
Phobius is described in:
PolyPhobius is described in:
The Phobius webserver is described in:
This is an example (one protein):
>Q8TCT8|PSL2_HUMAN you can have comments after the ID
MGPQRRLSPAGAALLWGFLLQLTAAQEAILHASGNGTTKDYCMLYNPYWTALPSTLENAT
SISLMNLTSTPLCNLSDIPPVGIKSKAVVVPWGSCHFLEKARIAQKGGAEAMLVVNNSVL
FPPSGNRSEFPDVKILIAFISYKDFRDMNQTLGDNITVKMYSPSWPNFDYTMVVIFVIAV
FTVALGGYWSGLVELENLKAVTTEDREMRKKKEEYLTFSPLTVVIFVVICCVMMVLLYFF
YKWLVYVMIAIFCIASAMSLYNCLAALIHKIPYGQCTIACRGKNMEVRLIFLSGLCIAVA
VVWAVFRNEDRWAWILQDILGIAFCLNLIKTLKLPNFKSCVILLGLLLLYDVFFVFITPF
ITKNGESIMVELAAGPFGNNEKLPVVIRVPKLIYFSVMSVCLMPVSILGFGDIIVPGLLI
AYCRRFDVQTGSSYIYYVSSTVAYAIGMILTFVVLVLMKKGQPALLYLVPCTLITASVVA
WRRKEMKKFWKGNSYQMMDHLDCATNEENPVISGEQIVQQ
Here is an example:
ID   MTH_DROME
FT   SIGNAL        1     24
FT   REGION        1      3       N-REGION.
FT   REGION        4     19       H-REGION.
FT   REGION       20     24       C-REGION.
FT   TOPO_DOM     25    218       NON CYTOPLASMIC.
FT   TRANSMEM    219    238
FT   TOPO_DOM    239    249       CYTOPLASMIC.
FT   TRANSMEM    250    269
FT   TOPO_DOM    270    280       NON CYTOPLASMIC.
FT   TRANSMEM    281    302
FT   TOPO_DOM    303    321       CYTOPLASMIC.
FT   TRANSMEM    322    342
FT   TOPO_DOM    343    371       NON CYTOPLASMIC.
FT   TRANSMEM    372    391
FT   TOPO_DOM    392    421       CYTOPLASMIC.
FT   TRANSMEM    422    439
FT   TOPO_DOM    440    450       NON CYTOPLASMIC.
FT   TRANSMEM    451    476
FT   TOPO_DOM    477    514       CYTOPLASMIC.
//
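If you want to process the long output in a script, the feature lines can be read with a few lines of Python. This is only a minimal sketch: it assumes the columns are separated by whitespace as in the example above, and the names Feature and parse_long_output are made up for this illustration and are not part of Phobius. The record used here is an abridged copy of the example.

```python
from typing import List, NamedTuple, Optional


class Feature(NamedTuple):
    kind: str             # SIGNAL, REGION, TOPO_DOM or TRANSMEM
    start: int            # first residue, 1-based
    end: int              # last residue, inclusive
    note: Optional[str]   # e.g. "CYTOPLASMIC", or None if the line has no note


def parse_long_output(text: str) -> List[Feature]:
    """Collect the FT lines of one long-format record (hypothetical helper)."""
    features = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] != "FT":
            continue  # skip the ID line, the // terminator and blank lines
        # Everything after the two positions is the note; drop the final dot.
        note = " ".join(fields[4:]).rstrip(".") or None
        features.append(Feature(fields[1], int(fields[2]), int(fields[3]), note))
    return features


# Abridged from the MTH_DROME example above.
record = """\
ID   MTH_DROME
FT   SIGNAL        1     24
FT   REGION        1      3       N-REGION.
FT   TOPO_DOM     25    218       NON CYTOPLASMIC.
FT   TRANSMEM    219    238
FT   TOPO_DOM    239    249       CYTOPLASMIC.
//
"""

features = parse_long_output(record)
helices = [(f.start, f.end) for f in features if f.kind == "TRANSMEM"]
print(helices)
```

Splitting on whitespace rather than on fixed columns keeps the sketch tolerant of small differences in spacing.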
If the whole sequence is labeled as cytoplasmic or non cytoplasmic, the prediction is that it contains no membrane helices. It is not wise to interpret this as a prediction of location.
The prediction gives the most probable location and orientation of transmembrane helices in the sequence. It is found by an algorithm called N-best (or 1-best in this case), which sums over all paths through the model that have the same location and direction of the helices.
At the bottom of the plot (between -0.04 and 0) the N-best prediction is shown.
The plot is obtained by calculating, for each residue, the total probability that it belongs to a helix, the cytoplasmic side, or the non cytoplasmic side, summed over all possible paths through the model. Sometimes the plot and the prediction seem contradictory. That is because the plot shows probabilities for each residue, whereas the prediction is the overall most probable structure. The plot should therefore be seen as a complementary source of information.
"TM": The number of predicted transmembrane segments.
"SP": Y/N indicator if a signal peptide was predicted or not.
"PREDICTION": Predicted topology of the protein.
For the example above the short output would be:
SEQENCE ID                     TM SP PREDICTION
MTH_DROME                       7  Y n4-19c24/25o219-238i250-269o281-302i322-342o372-391i422-439o451-476i
The topology is given as the positions of the transmembrane helices, separated by 'i' if the loop is on the cytoplasmic side or 'o' if it is on the non cytoplasmic side. A signal peptide is given by the position of its h-region, enclosed by an n and a c, followed by the position of the last amino acid in the signal peptide and the first of the mature protein, separated by a /. In the example above, n4-19c24/25o219-238i250-269o281-302i322-342o372-391i422-439o451-476i means that the protein is predicted to contain a signal peptide with an h-region between positions 4 and 19 that is cleaved between positions 24 and 25. It is followed by a non cytoplasmic loop and a TM segment between positions 219 and 238, which is followed by a cytoplasmic loop, etc.
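The topology string is easy to take apart with regular expressions. The following Python sketch follows the description above. The function name parse_topology and the layout of the returned dictionary are assumptions made for this illustration only.

```python
import re
from typing import Any, Dict

# Optional signal peptide prefix: n<h-start>-<h-end>c<last SP residue>/<first mature residue>
SIGNAL_RE = re.compile(r"^n(\d+)-(\d+)c(\d+)/(\d+)")
# A loop letter ('i' or 'o') followed by the helix it precedes: <start>-<end>
HELIX_RE = re.compile(r"([io])(\d+)-(\d+)")


def parse_topology(pred: str) -> Dict[str, Any]:
    """Split a short-format PREDICTION string into its parts (hypothetical helper)."""
    result: Dict[str, Any] = {"signal": None, "helices": [], "c_terminus": None}
    rest = pred
    match = SIGNAL_RE.match(pred)
    if match:
        h_start, h_end, last_sp, first_mature = map(int, match.groups())
        result["signal"] = {
            "h_region": (h_start, h_end),
            "cleavage": (last_sp, first_mature),
        }
        rest = pred[match.end():]
    # Each helix is stored with the side of the loop that comes before it.
    for side, start, end in HELIX_RE.findall(rest):
        result["helices"].append((side, int(start), int(end)))
    # The final letter gives the side of the C-terminal loop.
    if rest and rest[-1] in "io":
        result["c_terminus"] = rest[-1]
    return result


example = ("n4-19c24/25o219-238i250-269o281-302i322-342"
           "o372-391i422-439o451-476i")
topology = parse_topology(example)
print(len(topology["helices"]), topology["signal"], topology["c_terminus"])
```

For the example string this finds seven helices, which agrees with the TM column of the short output.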
Sometimes the location of a certain region of a protein is known in advance.
Typically antibody or fusion experiments could reveal such positions. In such cases the
accuracy of the prediction is greatly increased by setting constraints on the prediction.
You can do so by entering the positions under the appropriate localization as a space-separated list.
Here is an example:
These settings would result in a Phobius prediction with amino acids 220-222, 380, and 460 in the membrane, amino acid 315 as well as the C-terminus in the cytoplasm, and a signal peptide.
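After a constrained run it can be useful to confirm that the returned topology really places the chosen positions where you asked. The Python sketch below does such a check after the fact. It is not how Phobius applies constraints internally. The segment list reuses the MTH_DROME long output purely as illustrative data (the text above does not say which protein the constraint example refers to), position 514 stands in for the C-terminus, and the names label_at and violations are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Segment = Tuple[str, int, int]  # (label, start, end), 1-based and inclusive

# Illustrative data: the segments of the MTH_DROME long output.
SEGMENTS: List[Segment] = [
    ("signal", 1, 24),
    ("non cytoplasmic", 25, 218),
    ("membrane", 219, 238),
    ("cytoplasmic", 239, 249),
    ("membrane", 250, 269),
    ("non cytoplasmic", 270, 280),
    ("membrane", 281, 302),
    ("cytoplasmic", 303, 321),
    ("membrane", 322, 342),
    ("non cytoplasmic", 343, 371),
    ("membrane", 372, 391),
    ("cytoplasmic", 392, 421),
    ("membrane", 422, 439),
    ("non cytoplasmic", 440, 450),
    ("membrane", 451, 476),
    ("cytoplasmic", 477, 514),
]


def label_at(segments: Sequence[Segment], position: int) -> str:
    """Return the label of the segment that covers the given position."""
    for label, start, end in segments:
        if start <= position <= end:
            return label
    raise ValueError(f"position {position} is outside the sequence")


def violations(segments: Sequence[Segment],
               constraints: Dict[str, List[int]]) -> List[Tuple[int, str, str]]:
    """List (position, wanted label, found label) for every unmet constraint."""
    unmet = []
    for wanted, positions in constraints.items():
        for position in positions:
            found = label_at(segments, position)
            if found != wanted:
                unmet.append((position, wanted, found))
    return unmet


constraints = {
    "membrane": [220, 221, 222, 380, 460],
    "cytoplasmic": [315, 514],  # 514 is the last residue, i.e. the C-terminus
}
print(violations(SEGMENTS, constraints))
```

An empty list means that every constrained position carries the requested label.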
Since both transmembrane topology and signal peptides are features that are likely to be conserved during evolution, we have included an option to use information from homologs in the prediction. You can either let Phobius use NCBI's Blast server to search the nr (non-redundant) database for homologs and Kalign to align them, or provide your own alignment in aligned FASTA format. In the latter case, the final prediction will be given only for the first sequence in the alignment.
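If you provide your own alignment, a quick sanity check before submitting can save a failed run. The Python sketch below assumes that aligned FASTA means every sequence has the same length once gap characters such as '-' are counted, and that the ID is the first word of the header line. The function names and the two toy sequences are invented for this illustration.

```python
from typing import List, Tuple


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Read FASTA text into (ID, sequence) pairs; the ID is the first header word."""
    records: List[Tuple[str, List[str]]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            records.append((words[0] if words else "", []))
        elif records:
            records[-1][1].append(line)
    return [(name, "".join(parts)) for name, parts in records]


def check_alignment(records: List[Tuple[str, str]]) -> str:
    """Check that all rows have equal length and return the ID of the first row."""
    if not records:
        raise ValueError("no sequences found")
    lengths = {len(sequence) for _, sequence in records}
    if len(lengths) != 1:
        raise ValueError("aligned sequences differ in length")
    # The prediction is reported for the first sequence in the alignment.
    return records[0][0]


# Toy alignment, not real data.
alignment = """\
>query an optional comment
MKT-AILLG
>homolog1
MKTLAI-LG
"""

print(check_alignment(read_fasta(alignment)))
```

Placing the sequence of interest first matters, because only the first sequence receives a prediction.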
The usage of the Blast retrieval is time consuming and we recommend you to only use the method for
Phobius is freely available for local installation for academic
use. It runs on Unix platforms with Perl version 5.6 or later.
To download a copy, please fill in the online licensing form.