|Year : 2018 | Volume : 2 | Issue : 2 | Page : 106-111|
Effectiveness assessment of protein drugs and vaccines through in Silico analysis
Department of Pharmaceutical Biotechnology, Faculty of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran
|Date of Web Publication||14-Jun-2018|
Dr. Tayebeh Farhadi
Department of Pharmaceutical Biotechnology, Faculty of Pharmacy, Shiraz University of Medical Sciences, P. O. Box 71345-1583, Shiraz
Source of Support: None, Conflict of Interest: None
Informatics studies are a useful tool for developing biologic-based therapeutics, including proteins, antibodies, and vaccines. In this context, bioinformatics studies can give more insight into the effectiveness of protein constructs that have been developed as biologic-based therapeutics. Such investigations require a variety of data processing and analysis steps based on existing knowledge of gene functions and molecular interactions. This review covers some in silico approaches for assessing the physicochemical properties, posttranslational modification, hydrophobic behavior, solvent accessibility, allergenicity, and other properties of protein drugs and vaccines. It also discusses in silico strategies that are applicable during codon optimization of a recombinant protein.
Keywords: In silico assessment, protein drugs, protein vaccines
|How to cite this article:|
Farhadi T. Effectiveness assessment of protein drugs and vaccines through in Silico analysis. Biomed Biotechnol Res J 2018;2:106-11
|How to cite this URL:|
Farhadi T. Effectiveness assessment of protein drugs and vaccines through in Silico analysis. Biomed Biotechnol Res J [serial online] 2018 [cited 2022 Jan 24];2:106-11. Available from: https://www.bmbtrj.org/text.asp?2018/2/2/106/234452
| Introduction|| |
Informatics studies are now a useful tool for developing biologic-based therapeutics, including proteins, antibodies, and vaccines. A variety of bioinformatics analyses and omics-based systems biology experiments can be used to address scientific questions about biologic-based therapeutics for different purposes. In this context, bioinformatics studies can give more insight into the effectiveness of protein constructs that have been developed as biologic-based therapeutics. Such investigations require a variety of data processing and analysis steps based on existing knowledge of gene functions and molecular interactions.
Wet-laboratory experiments are the best way to assess the effectiveness of a biologic-based therapeutic, but they can be very costly. Supporting evidence and systematic analysis are still needed to show that a therapeutic protein is effective. This review covers some in silico approaches for assessing such therapeutics against essential criteria, including physicochemical properties, posttranslational modification, hydrophobic behavior, solvent accessibility, allergenicity, and other properties. In addition, for recombinant therapeutic proteins, in silico strategies are applicable during codon optimization and reverse translation of a desired protein. These strategies are discussed in a part of this review.
| Evaluation of the Physicochemical Parameters|| |
Sequence statistics for a protein include amino acid composition, theoretical pI, instability index, aliphatic index, grand average of hydropathicity (GRAVY), in vitro and in vivo half-life, and molecular weight.
Software and online servers such as the ProtParam tool (http://web.expasy.org/protparam/) and the SOLpro server (http://scratch.proteomics.ics.uci.edu/) are available to predict these characteristics. ProtParam reports the physicochemical properties of uncharacterized proteins. SOLpro predicts the propensity of a protein to be soluble on overexpression in Escherichia coli. SOLpro runs a two-stage support vector machine (SVM) architecture based on multiple representations of the primary sequence. The overall accuracy of this tool is estimated at over 74% using multiple runs of tenfold cross-validation.
The pI is the pH at which a protein carries no net electric charge, although its surface still carries charged groups. A protein with pI <7 is acidic in nature, whereas a protein with a calculated pI >7 is categorized as basic. An estimate of the pI can be valuable when choosing buffer systems for purification. The instability index provides a prediction of the stability of a protein in vitro. A protein with an instability index ≤40 is predicted to be stable, and an instability index >40 predicts that the protein is unstable. The aliphatic index reflects the relative volume occupied by the aliphatic side chains of a protein (Ala, Val, Ile, and Leu), and a protein with a high aliphatic index is predicted to be highly thermostable. Proteins with GRAVY ≤0 are predicted to be hydrophilic and to interact well with the surrounding water molecules, whereas proteins with GRAVY >0 are predicted to be hydrophobic.
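Two of the indices described above can be computed directly from the sequence. The following is a minimal sketch, not the ProtParam implementation: it assumes the Kyte and Doolittle hydropathy scale for GRAVY and the commonly cited aliphatic index formula (mole percent of Ala, plus 2.9 times Val, plus 3.9 times the sum of Ile and Leu). The function names and the example peptide are hypothetical.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    seq = seq.upper()
    return sum(KD[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu.

    The coefficients 2.9 and 3.9 are the commonly cited relative side-chain
    volumes; treat them as an assumption of this sketch.
    """
    seq = seq.upper()
    n = len(seq)

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def hydropathy_class(seq):
    """Apply the GRAVY cut-off used in the text (<= 0 is hydrophilic)."""
    return "hydrophilic" if gravy(seq) <= 0 else "hydrophobic"


if __name__ == "__main__":
    peptide = "MKKLAVDE"  # hypothetical example sequence
    print(round(gravy(peptide), 3), round(aliphatic_index(peptide), 1),
          hydropathy_class(peptide))
```

The thresholds are only applied for GRAVY here; the instability index depends on a dipeptide weight table and is best taken from ProtParam itself.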
In previous studies, we used several web servers to predict the physicochemical characteristics of designed proteins, including a novel vaccine adjuvant candidate, a multiepitope peptide vaccine based on Omps of Klebsiella pneumoniae, a DNA vaccine based on MOMP of Chlamydia trachomatis, and a chimeric DNA vaccine against Salmonella enterica based on the SopB and GroEL proteins.
| Posttranslational Modification Analysis|| |
In biotechnology, a prokaryotic gene may be engineered for expression in a eukaryotic host for different therapeutic purposes, such as multiepitope and DNA vaccines. Bacterial proteins that are not modified posttranslationally in their organism of origin may nevertheless contain target motifs for such modifications in a eukaryotic host such as human cells. Posttranslational modifications of a protein have been shown to alter its physicochemical characteristics, such as stability, folding, activity, solubility, and interaction with a ligand. However, the impact of posttranslational modifications on the immunogenicity, antigenicity, and protective efficiency of an engineered prokaryotic gene expressed in the human body is not clear.
To predict whether a certain posttranslational modification is likely to occur in a protein, a number of web servers such as NetNGlyc, NetCGlyc, NetOGlyc, NetAcet, and NetPhos are accessible at http://www.cbs.dtu.dk/services/. All of these servers give neural network predictions. The NetNGlyc server predicts N-glycosylation sites in a protein expressed in human cells by examining the sequence context of Asn-Xaa-Ser/Thr motifs. The NetCGlyc and NetOGlyc web tools predict C-mannosylation sites and mucin-type GalNAc O-glycosylation sites, respectively, in proteins expressed in a mammalian cell. For a protein expressed in a eukaryotic cell, the NetAcet web tool predicts substrates of N-acetyltransferase A, and the NetPhos server predicts Ser, Thr, and Tyr phosphorylation sites.
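Before submitting a sequence to NetNGlyc, the candidate Asn-Xaa-Ser/Thr motifs can be listed with a simple pattern scan. This is only a sketch of the motif search, not the neural network prediction: it reports every sequon regardless of whether it would actually be glycosylated. It assumes the usual convention that Xaa is any residue except Pro, and the function name and example sequence are hypothetical.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr.
# The lookahead keeps overlapping sequons (for example "NNTS") from being missed.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(seq):
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(seq.upper())]


if __name__ == "__main__":
    example = "ANGSNPTNKT"  # hypothetical sequence
    print(find_sequons(example))
```

A listed position is only a candidate site; whether it is modified depends on the sequence context and the expression host, which is what the server predictions try to capture.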
| Calculation of Hydrophobic Regions|| |
Hydrophobicity is an important physicochemical characteristic and has a crucial role in the expression of foreign proteins in various host systems. Several software packages have therefore been developed to evaluate the hydrophobic behavior of protein sequences. For example, BioEdit offers different methods, including the Kyte and Doolittle algorithm. In the resulting profile diagram, peak and trough regions indicate hydrophobic and hydrophilic regions of the protein, respectively.
In one of our previously published articles, the hydrophobicity of a protein vaccine designed from Omps of K. pneumoniae was predicted using the Kyte and Doolittle algorithm ([Figure 1]; retrieved from the previously published article). In the figure, peak and trough regions show hydrophobicity and hydrophilicity, respectively.
|Figure 1: The hydrophobic profile of a designed vaccinal protein drawn utilizing the algorithm of Kyte and Doolittle. Size of window is 21. Well regions signify hydrophilicity and are antigenic regions. Regions above the threshold (0) are predicted to be hydrophobic regions|
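A hydropathy profile of the kind shown in Figure 1 can be sketched as a sliding-window average of Kyte and Doolittle values. This is an illustrative sketch rather than the BioEdit implementation: the window size of 21 and the threshold of 0 are taken from the figure caption, while the function names and the example sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=21):
    """Return (centre_position, mean_hydropathy) for each full window.

    Positions are 1-based and refer to the residue at the window centre.
    """
    seq = seq.upper()
    if window > len(seq):
        raise ValueError("window is longer than the sequence")
    scores = [KD[aa] for aa in seq]
    half = window // 2
    profile = []
    for start in range(len(seq) - window + 1):
        mean = sum(scores[start:start + window]) / window
        profile.append((start + half + 1, mean))
    return profile


def hydrophobic_positions(seq, window=21, threshold=0.0):
    """Window centres whose average lies above the threshold (peaks)."""
    return [pos for pos, mean in hydropathy_profile(seq, window)
            if mean > threshold]


if __name__ == "__main__":
    example = "MKKDEERSAVILLIVAAGLLVVKKDDEER"  # hypothetical sequence
    for position, value in hydropathy_profile(example, window=9):
        print(position, round(value, 2))
```

Shorter windows give a noisier profile that resolves surface-exposed stretches; longer windows, such as 21, smooth the profile and emphasize extended hydrophobic segments.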
| Solvent Accessibility Prediction|| |
The accessible surface area (ASA) is the exposed surface area of a protein (or residue) that a water molecule can access, and its values can be used to evaluate newly determined protein structures.
The Volume, Area, Dihedral Angle Reporter (VADAR) server (http://redpoll.pharmacy.ualberta.ca/vadar/) was developed to assess the solvent accessibility of residues. VADAR predicts the ASA of both the entire protein and individual residues. For each residue, it reports the residue ASA (RES ASA) and the fractional ASA (FRAC ASA). The value in the RES ASA column is the ASA in square angstroms, and the value in the FRAC ASA column is the fraction of the residue surface that is accessible.
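The relationship between the two columns can be illustrated with a short sketch. It assumes that the fractional ASA is the residue ASA divided by a reference ASA for the same residue type in a fully exposed state; the exact reference values used by VADAR are not given here, so the numbers below are hypothetical placeholders and the function name is invented for illustration.

```python
def fractional_asa(residue_asa, reference_asa):
    """Convert per-residue ASA (square angstroms) into fractional ASA.

    residue_asa   : list of (residue_label, residue_type, asa) tuples
    reference_asa : dict mapping residue_type to the ASA of that residue
                    type when fully exposed (caller-supplied assumption)
    Fractions are capped at 1.0 and rounded to two decimals.
    """
    result = []
    for label, res_type, asa in residue_asa:
        fraction = min(asa / reference_asa[res_type], 1.0)
        result.append((label, round(fraction, 2)))
    return result


if __name__ == "__main__":
    # Hypothetical placeholder values, not VADAR output or VADAR references.
    reference = {"LYS": 200.0, "LEU": 180.0}
    residues = [("Lys130", "LYS", 150.0), ("Leu12", "LEU", 18.0)]
    print(fractional_asa(residues, reference))
```

With these placeholder numbers the charged residue is mostly exposed and the aliphatic residue mostly buried, which is the pattern a FRAC ASA plot is usually read for.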
In 2017, we used the VADAR server to predict the ASA of a chimeric protein designed as a vaccine candidate against S. enterica. The RES ASA assessment showed that certain residues, such as Lys130, Lys159, Leu69, Gln52, Lys50, and Arg118, are highly accessible compared with the other residues in the protein construct. The result of the FRAC ASA prediction is shown in [Figure 2] (retrieved from the mentioned published article). In the figure, hydrophilic residues typically have a high fraction of ASA, while hydrophobic residues show a minor fraction of ASA.
|Figure 2: Value prediction of fractional accessible surface area (ranging from 0.00–1.00) of a chimeric protein. Outer (hydrophilic) and inner (hydrophobic) residues have a high and low fraction of accessible surface area, respectively. X-axis displays “residue no.” of the protein and Y-axis signifies “fractional accessible surface area” of the protein construct|
| Prediction of Protein Antigenicity|| |
A growing number of protein drugs have recently obtained approval for product registration or entered clinical trials. Such therapeutics are generally administered at high doses over a prolonged duration of therapy. This can lead to the development of protein-specific immune responses, such as antibodies, that reduce drug potency and effectiveness. Undesirable immune responses against protein therapeutics can also present safety risks to patients. Hence, proteins without immunodominant B- and/or T-cell epitopes, which act as antigenic determinants, are assumed to be weak immunogens and therefore suitable candidates for use as therapeutics. The epitopes present in a therapeutic protein can be identified and altered to decrease its immunogenicity.
For vaccinal proteins, epitope prediction has been used successfully to enhance immunogenicity and effectiveness. Vaccinal proteins that incorporate immunodominant B- and T-cell epitopes can induce broad humoral and cellular immune responses, leading to more efficient immune protection.
Determining the correct antigenic determinants experimentally is costly and lengthy, and valid bioinformatic methods can simplify the search for reliable epitopes.
Over the past 20 years, a number of computer algorithms have been developed and used to discover immunodominant epitopes within protein agents from different sources.
In the field of protein therapeutics, proteins have been screened for T- or B-cell epitopes to assess and reduce their possible immunogenicity. Such predictions make it possible to redesign proteins to be less immunogenic or to rank protein candidates at the preclinical phase of drug development.
In 2017, we submitted a computationally designed protein vaccine to valid, publicly available epitope prediction servers to predict its B- and T-cell epitopes. The results showed that the designed vaccine candidate contains B- and T-cell epitopes that could yield a high immune response. In 2015, we predicted the B- and T-cell epitopes of a computationally designed protein vaccine based on Omps of Klebsiella pneumoniae. The results showed that the construct contains immunodominant B-cell epitopes that can efficiently stimulate the humoral arm of the immune system.
A number of in silico methods are available to generate an overall antigenicity index for a protein. ANTIGENpro (http://scratch.proteomics.ics.uci.edu/) was developed as a sequence-based, alignment-free, and pathogen-independent predictor of protein antigenicity. In cross-validation experiments on the combined dataset, the accuracy of the predictor was estimated at 76%. The VaxiJen server (http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html) can also be used for alignment-free prediction of whole-protein antigenicity.
| Prediction of Protein Allergenicity|| |
The AlgPred web server (http://www.imtech.res.in/raghava/algpred/) can be used to assess the allergenicity of a protein. AlgPred predicts allergenicity based on the similarity of experimentally identified epitopes to any region of the query protein. The hybrid prediction method of AlgPred predicts protein allergenicity with high accuracy (85% at a threshold of −0.4).
In 2017, we predicted the allergenicity of the designed chimeric protein vaccine against S. enterica from its sequence-derived structural and physicochemical properties using the AlgPred (http://www.imtech.res.in/raghava/algpred) and APPEL (http://jing.cz3.nus.edu.sg/cgi-bin/APPEL) servers.
| Reverse Translation and Codon Optimization|| |
Codon optimization is a useful method to enhance the expression efficiency of a foreign gene (such as a DNA vaccine) in a host. It also helps to obtain optimum expression of a cloned gene in recombinant host cells. To improve transcriptional and translational efficacy, codon usage can be optimized by improving different parameters, including the overall guanine-cytosine (GC) content of the gene, the codon adaptation index (CAI), and the frequency distribution of codons. The ideal range of GC content, as a measure of transcriptional and translational efficacy, is 30%–70%. CAI estimates how well the codon usage of a particular DNA or RNA sequence is adapted to the codon usage bias of the host and has a maximal theoretical value of 1.0. A CAI ≥0.8 is considered suitable for high expression of a foreign gene in a different expression host. Codons with a frequency distribution value lower than 30% may decrease the expression efficiency.
The amino acid sequence of a foreign protein can be reverse translated into a DNA sequence using software and online tools. The DNA sequence can then be adapted to the codon usage of the host through codon optimization tools. JCAT (http://www.jcat.de) and OPTIMIZER are useful publicly available tools for reverse translation of proteins into DNA sequences and adaptation of those sequences to the codon usage of the desired host.
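The parameters above can be illustrated with a short sketch. It assumes the standard definition of CAI as the geometric mean of the relative adaptiveness of each codon, where relative adaptiveness is the count of a codon divided by the count of the most frequent synonymous codon in a reference set. The codon counts below are a hypothetical toy table covering only Lys and Glu, not real host data, and the function names are invented; this is not how JCAT or OPTIMIZER are implemented.

```python
import math

# Hypothetical toy reference table (codon counts in highly expressed genes).
TOY_COUNTS = {"AAA": 30, "AAG": 10, "GAA": 40, "GAG": 20}
# Synonymous codons per amino acid for the two residues in the toy table.
TOY_SYNONYMS = {"K": ["AAA", "AAG"], "E": ["GAA", "GAG"]}


def gc_content(dna):
    """Percentage of G and C bases in the sequence."""
    dna = dna.upper()
    return 100.0 * (dna.count("G") + dna.count("C")) / len(dna)


def relative_adaptiveness(counts, synonyms):
    """w = codon count / count of the most used synonymous codon."""
    weights = {}
    for codons in synonyms.values():
        best = max(counts[c] for c in codons)
        for codon in codons:
            weights[codon] = counts[codon] / best
    return weights


def cai(dna, weights):
    """Geometric mean of w over the codons present in the weight table.

    Assumes all weights are greater than zero.
    """
    dna = dna.upper()
    codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    scored = [weights[c] for c in codons if c in weights]
    return math.exp(sum(math.log(w) for w in scored) / len(scored))


def reverse_translate(protein, counts, synonyms):
    """Pick the most frequent codon for each residue (simplest strategy)."""
    return "".join(max(synonyms[aa], key=lambda c: counts[c])
                   for aa in protein.upper())


if __name__ == "__main__":
    w = relative_adaptiveness(TOY_COUNTS, TOY_SYNONYMS)
    optimized = reverse_translate("KEK", TOY_COUNTS, TOY_SYNONYMS)
    print(optimized, round(cai(optimized, w), 2), round(gc_content(optimized), 1))
```

Always choosing the most frequent codon maximizes CAI but can push GC content outside the desired range, which is one reason dedicated tools balance several parameters at once.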
| Other Properties|| |
Structural features of a protein, including transmembrane areas, disulfide bridges, and disordered regions, can be predicted using bioinformatics tools such as the PredictProtein server.
The local complexity of the amino acid sequence of a protein can be assessed with the SEG algorithm (http://ncbi.nih.gov/pub/seg/seg/). In a protein molecule, high compositional complexity is related to a large proportion of loops, which are generally more exposed than helices and sheets. In contrast, low compositional complexity corresponds to stretches of nonrandom, simple amino acid sequence within a protein. Low-complexity regions are usually abundant and are strongly associated with disordered areas in a protein.
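The idea of local compositional complexity can be sketched with a sliding-window Shannon entropy. This is a simplified stand-in, not the SEG algorithm itself, which uses its own complexity measure and a two-stage trigger and extension procedure. The window length of 12 and the cut-off of 2.2 bits are assumptions chosen for illustration, and the function names and example sequence are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Compositional entropy of a sequence segment in bits."""
    n = len(segment)
    return sum((count / n) * math.log2(n / count)
               for count in Counter(segment).values())


def complexity_profile(seq, window=12):
    """Return (start_position, entropy) for each window; positions are 1-based."""
    seq = seq.upper()
    return [(start + 1, shannon_entropy(seq[start:start + window]))
            for start in range(len(seq) - window + 1)]


def low_complexity_windows(seq, window=12, cutoff=2.2):
    """Start positions of windows whose entropy is at or below the cutoff."""
    return [pos for pos, h in complexity_profile(seq, window) if h <= cutoff]


if __name__ == "__main__":
    # Hypothetical construct with a repetitive Gly/Ser linker in the middle.
    example = "MKVLDERTAQWHGGGGSGGGGSGGGGSPLKDEIRTYAC"
    print(low_complexity_windows(example))
```

Repetitive linkers such as Gly/Ser repeats contain only two residue types, so windows inside them score at most 1 bit and fall well below the cut-off.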
In a previous study, we used the SEG algorithm to assess the local complexity of the amino acid sequence of the designed chimeric protein vaccine against S. enterica. The results showed that a high percentage of the residues of the chimeric protein had high sequence complexity, except for the regions that served as linkers. Regions of high compositional complexity have been reported to correspond to the antigenic regions of a protein sequence.
Electrostatic interactions affect protein flexibility, which is necessary for the movement of protein fragments relative to each other and to the surrounding environment. Hence, electrostatic energies play a significant role in protein structure and function. Prediction of electrostatic characteristics contributes to the recognition of functional regions of a protein at the solvent-accessible surface. As an example, we estimated and evaluated the electrostatic potential of the designed protein vaccine against S. enterica using the PCE web service (http://bioserv.rpbs.jussieu.fr/PCE). [Figure 3] (retrieved from the mentioned article) shows different views of the electrostatic potential distribution on the chimeric protein surface, ranging from −3.0 kcal/mol/e (red) to +3.0 kcal/mol/e (blue). According to the PCE analysis, positive electrostatic potential areas (blue) at the protein surface are mainly associated with values >2 kcal/mol/e, and negative electrostatic potential regions, mostly generated by negatively charged residues, are associated with values < −2 kcal/mol/e.
|Figure 3: Different views of electrostatic potential distributions on surface of a designed protein vaccine that was visualized through protein continuum electrostatics server. The potential distribution ranges from -3.0 kcal/mol/e (red) to +3.0 kcal/mol/e (blue)|
| Case Study and Example Works|| |
Among 3437 therapeutics approved by the FDA (in 2018), 277 (about 8.1%) are protein-based therapies (https://www.drugbank.ca/biotech_drugs). One important example is Aldesleukin, a recombinant protein used to treat adults with metastatic renal cell carcinoma (https://www.drugbank.ca/drugs/DB00041). Here, a detailed in silico methodology for predicting the effectiveness of this protein drug is provided as a case study. The sequence of Aldesleukin was retrieved from the UniProtKB database at http://www.uniprot.org/ (UniProt ID: P60568). The sequence was subjected to in silico assessment of pI, instability index, aliphatic index, and GRAVY. The antigenicity and allergenicity of the protein were also predicted. All assessments were performed using the computational methods described in the previous sections. The results of the in silico evaluations are presented in [Table 1]. According to these results, the protein drug has suitable physicochemical and immunological properties.
|Table 1: Effectiveness assessment of Aldesleukin using in silico methodology|
In recent years, researchers have shown interest in designing and assessing protein drugs and vaccines through in silico approaches. [Table 2] summarizes two previous works and their conclusions as examples of designing and evaluating vaccine proteins.
|Table 2: Summary and important findings of previous works in protein design field|
When an algorithm is developed to assess a particular property of a protein, it is essential to evaluate its accuracy using several real data sets. All techniques discussed in this review have been reported to have high accuracy.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
| References|| |
Ranjbar MM, Ghorban K, Alavian SM, Keyvani H, Dadmanesh M, Roayaei Ardakany A, et al. GB virus C/Hepatitis G virus envelope glycoprotein E2: Computational molecular features and immunoinformatics study. Hepat Mon 2013;13:e15342.
He Y. Vaccine adjuvant informatics: From data integration and analysis to rational vaccine adjuvant design. Front Immunol 2014;5:32.
Sayers S, Ulysse G, Xiang Z, He Y. Vaxjo: A web-based vaccine adjuvant database and its application for analysis of vaccine adjuvants and their uses in vaccine development. J Biomed Biotechnol 2012;2012:831486.
He Y, Rappuoli R, De Groot AS, Chen RT. Emerging vaccine informatics. J Biomed Biotechnol 2010;2010:218590.
Farhadi T, Hashemian SM. Constructing novel chimeric DNA vaccine against Salmonella enterica based on SopB and GroEL proteins: An in silico approach. J Pharm Investig 2017. [Doi: 10.1007/s40005-017-0360-6].
Farhadi T, Nezafat N, Ghasemi Y, Karimi Z, Hemmati S, Erfani N. Designing of complex multi-epitope peptide vaccine based on Omps of Klebsiella pneumoniae: An in silico approach. Int J Pept Res Ther 2015;21:325-41.
O'Hagan DT, Fox CB. New generation adjuvants – From empiricism to rational design. Vaccine 2015;33 Suppl 2:B14-20.
Magnan CN, Randall A, Baldi P. SOLpro: Accurate sequence-based prediction of protein solubility. Bioinformatics 2009;25:2200-7.
Rosenberg M 2nd. Protein Analysis and Purification: Bench top Techniques. Boston: Birkhauser; 2005.
Gasteiger E, Hoogland C, Gattiker A, Duvaud S, Wilkins MR, Appel RD, et al. Protein identification and analysis tools on the ExPASy server. In: Walker JM, editor. The Proteomics Protocols Handbook. Clifton, UK: Humana Press; 2005. p. 571-607.
Farhadi T, Ovchinnikov RS, Ranjbar MM. In silico designing of some agonists of toll-like receptor 5 as a novel vaccine adjuvant candidates. Netw Model Anal Health Inform Bioinforma 2016;5:31. [Doi: 10.1007/s13721-016-0138-1].
Farhadi T, Ranjbar MM. Designing and modeling of complex DNA vaccine based on MOMP of Chlamydia trachomatis: An in silico approach. Netw Model Anal Health Inform Bioinforma 2017;6:1. [Doi: 10.1007/s13721-016-0142-5].
Kowalczyk DW, Ertl HC. Immune responses to DNA vaccines. Cell Mol Life Sci 1999;55:751-70.
Balen B, Krsnik-Rasol M. N-glycosylation of recombinant therapeutic glycoproteins in plant systems. Food Technol Biotechnol 2007;45:1-10.
Dowling W, Thompson E, Badger C, Mellquist JL, Garrison AR, Smith JM, et al. Influences of glycosylation on antigenicity, immunogenicity, and protective efficacy of Ebola virus GP DNA vaccines. J Virol 2007;81:1821-37.
Cai CZ, Han LY, Ji ZL, Chen X, Chen YZ. SVM-Prot: Web-based support vector machine software for functional classification of a protein from its primary sequence. Nucleic Acids Res 2003;31:3692-7.
Steentoft C, Vakhrushev SY, Joshi HJ, Kong Y, Vester-Christensen MB, Schjoldager KT, et al. Precision mapping of the human O-GalNAc glycoproteome through simpleCell technology. EMBO J 2013;32:1478-88.
Julenius K. NetCGlyc 1.0: Prediction of mammalian C-mannosylation sites. Glycobiology 2007;17:868-76.
Kiemer L, Bendtsen JD, Blom N. NetAcet: Prediction of N-terminal acetylation sites. Bioinformatics 2005;21:1269-70.
Blom N, Gammeltoft S, Brunak S. Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. J Mol Biol 1999;294:1351-62.
Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. J Mol Biol 1982;157:105-32.
Willard L, Ranjan A, Zhang H, Monzavi H, Boyko RF, Sykes BD, et al. VADAR: A web server for quantitative evaluation of protein structure quality. Nucleic Acids Res 2003;31:3316-9.
Tangri S, Mothé BR, Eisenbraun J, Sidney J, Southwood S, Briggs K, et al. Rationally engineered therapeutic proteins with reduced immunogenicity. J Immunol 2005;174:3187-96.
Casadevall N, Nataf J, Viron B, Kolta A, Kiladjian JJ, Martin-Dupont P, et al. Pure red-cell aplasia and antierythropoietin antibodies in patients treated with recombinant erythropoietin. N Engl J Med 2002;346:469-75.
Gershon SK, Luksenburg H, Coté TR, Braun MM. Pure red-cell aplasia and recombinant erythropoietin. N Engl J Med 2002;346:1584-6.
Berzofsky JA, Ahlers JD, Belyakov IM. Strategies for designing and optimizing new generation vaccines. Nat Rev Immunol 2001;1:209-19.
De Groot AS, Berzofsky JA. From genome to vaccine – New immunoinformatics tools for vaccine design. Methods 2004;34:425-8.
He Y, Rappuoli R, De Groot AS, Chen RT. Vaccine informatics. J Biomed Biotechnol 2010; 2010. doi:10.1155/2010/765762.
De Groot AS, Sbai H, Aubin CS, McMurry J, Martin W. Immuno-informatics: Mining genomes for vaccine components. Immunol Cell Biol 2002;80:255-69.
Magnan CN, Baldi P. SSpro/ACCpro 5: Almost perfect prediction of protein secondary structure and relative solvent accessibility using profiles, machine learning and structural similarity. Bioinformatics 2014;30:2592-7.
Saha S, Raghava GP. AlgPred: Prediction of allergenic proteins and mapping of IgE epitopes. Nucleic Acids Res 2006;34:W202-9.
Sandhu KS, Pandey S, Maiti S, Pillai B. GASCO: Genetic algorithm simulation for codon optimization. In Silico
Sharp PM, Li WH. The codon adaptation index – A measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res 1987;15:1281-95.
Puigbò P, Guzmán E, Romeu A, Garcia-Vallvé S. OPTIMIZER: A web server for optimizing the codon usage of DNA sequences. Nucleic Acids Res 2007;35:W126-31.
Rost B, Yachdav G, Liu J. The PredictProtein server. Nucleic Acids Res 2004;32:W321-6.
Xiao S, Huang Y, Xiao Y. Local complexity of protein sequences. Int J Mod Phys 2003;14:1191-9.
Wan H, Wootton JC. A global compositional complexity measure for biological sequences: AT-rich and GC-rich genomes encode less complex proteins. Comput Chem 2000;24:71-94.
Dunker AK, Lawson JD, Brown CJ, Williams RM, Romero P, Oh JS, et al. Intrinsically disordered protein. J Mol Graph Model 2001;19:26-59.
Bashford D. Macroscopic electrostatic models for protonation states in proteins. Front Biosci 2004;9:1082-99.
Kukić P, Nielsen JE. Electrostatics in proteins and protein-ligand complexes. Future Med Chem 2010;2:647-66.
Novinrooz A, Zahraei Salehi T, Firouzi R, Arabshahi S, Derakhshandeh A. In silico design, expression, and purification of novel chimeric Escherichia coli O157:H7 OmPa fused to LTB protein in Escherichia coli. PLoS One 2017;12:e0173761.
Kianmehr A, Mohammadi HS, Shokrgozar MA, Omidinia E. In silico design and analysis of a new hyperglycosylated analog of erythropoietin to improve drug efficacy. Adv Biomed Res 2015;4:142.