Year : 2022  |  Volume : 6  |  Issue : 1  |  Page : 93-97

In silico approach for the identification of mirror repeats in selected operon genes of Escherichia coli strain K-12 substrain MG1655

Department of Microbiology and Biotechnology, School of Life Sciences, Starex University, Gurugram, Haryana, India

Date of Submission: 17-Jul-2021
Date of Acceptance: 13-Oct-2021
Date of Web Publication: 11-Mar-2022

Correspondence Address:
Dinesh Chandra Sharma
School of Life Sciences, Starex University, Gurugram, Haryana

Source of Support: None, Conflict of Interest: None

DOI: 10.4103/bbrj.bbrj_146_21


Background: Repeating elements in the genes and genomes of living organisms are associated with a variety of functions at the molecular level. Mirror repeats (MRs) are a distinctive type of repeat sequence that has been linked with H-DNA formation and with several neurological disorders, and other functional roles have also been reported. Methods: A manual bioinformatics-based approach, FASTA-parallel complement-BLAST, was used to identify MRs. The sequence of interest is downloaded in FASTA format, its parallel complement is generated, and the two sequences are then compared by BLAST. Using this approach, the present study identified MRs in the lac, trp, and ara operon genes of Escherichia coli str. K-12 substr. MG1655 (NC_000913.3). Results: MRs were frequently distributed in all of the analyzed operon genes and varied in length. In the lac, trp, and ara operons, the maximum number of MRs was found in the lacZ (61), trpE (40), and araE (41) genes, respectively. Conclusion: The frequent occurrence of both shorter and longer MRs in the analyzed genes hints that they may play significant roles in the genes and genomes of bacterial species. They may be useful for studying the evolutionary history of the living world. Studies of this type may also open new directions and tools in molecular biology research and support the development of new concepts for MR identification.

Keywords: BLAST, FASTA, H-DNA, mirror repeats, reverse complement

How to cite this article:
Yadav S, Yadav U, Sharma DC. In silico approach for the identification of mirror repeats in selected operon genes of Escherichia coli strain K-12 substrain MG1655. Biomed Biotechnol Res J 2022;6:93-7

How to cite this URL:
Yadav S, Yadav U, Sharma DC. In silico approach for the identification of mirror repeats in selected operon genes of Escherichia coli strain K-12 substrain MG1655. Biomed Biotechnol Res J [serial online] 2022 [cited 2022 Oct 1];6:93-7. Available from: https://www.bmbtrj.org/text.asp?2022/6/1/93/339359

  Introduction

The genome is the key regulatory organization of genes in all living organisms and controls evolutionary mechanisms at the cellular as well as the molecular level.[1] The genomes of prokaryotic organisms are less complex than those of eukaryotes.[2] Previous research has established that the human genome comprises both coding and noncoding parts.[3],[4] Both translated and nontranslated regions of the genome contain various repeating elements that contribute to essential genome functions and should not be neglected.[5],[6],[7] These repeating elements have been implicated in crucial genome-level activities of the cell.[8],[9],[10] Previous studies have shown that the different types of repeating elements, including simple sequence repeats, tandem repeats, transposable elements, LINE sequences, direct or indirect repeats, consensus sequences, insertion sequences, and palindrome sequences, each have their own significant roles in the genome [Figure 1].[11],[12]
Figure 1: Various types of repeat elements along with their general classification[12]


Among them, a special type of repeat sequence referred to as mirror repeats (MRs) has also been reported in genomes.[13] MRs are defined as repeats that have bilateral symmetry on the same strand. For example, in the sequence AGTTCGTTGCTTGA, the first half (AGTTCGT) is mirrored by the second half (TGCTTGA) about the center of symmetry on the same strand. MRs have been associated with various regulatory and functional roles in the genome. Their most common association is with H-DNA formation.[14] The formation of triplex DNA through non-Watson–Crick base pairing has also been attributed to the presence of MRs.[15] MRs have additionally been associated with neurological disorders.[16],[17] Various bioinformatics tools have already been described for identifying different types of repeats, including MRs.[18],[19] The present study focused on identifying MRs in the arabinose, tryptophan, and lac operon genes of Escherichia coli str. K-12 substr. MG1655 (NC_000913.3) using a manual bioinformatics-based approach. The approach, referred to as FASTA-parallel complement-BLAST (FPCB), uses public domain databases to identify MRs. The work was carried out on selected operon genes rather than on the complete genome of this E. coli strain because the genome is large and it is not feasible to handle the complete sequence in a manual bioinformatics-based analysis.
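
To make the definition concrete, the short Python sketch below checks the example sequence for bilateral symmetry on a single strand. The function name is_perfect_mirror_repeat is an illustrative choice and is not part of the FPCB method; the check only covers perfect repeats with no mismatches.

```python
def is_perfect_mirror_repeat(seq: str) -> bool:
    """Return True if seq reads the same left-to-right and right-to-left
    on one strand (illustrative helper, perfect repeats only)."""
    s = seq.upper()
    return len(s) > 1 and s == s[::-1]


example = "AGTTCGTTGCTTGA"  # example sequence from the text
half = len(example) // 2
left, right = example[:half], example[half:]

print(left, right)                                  # AGTTCGT TGCTTGA
print(right == left[::-1])                          # True: second half mirrors the first
print(is_perfect_mirror_repeat(example))            # True
print(is_perfect_mirror_repeat("AGTTCGTAGTTCGT"))   # False: a direct repeat, not a mirror
```

Note that the second half is the first half reversed, not complemented, which is what distinguishes a mirror repeat from an inverted repeat (palindrome in the double-stranded sense).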

  Methods

The methodology for MR identification is referred to as FPCB[20] and uses public domain databases[21] to identify MR sequences in any gene or genome sequence. It is a manual bioinformatics-based method in which MRs are identified by following a few essential steps [Figure 2]. The sequence of the gene, genome, or coding sequence (CDS) of interest is downloaded in FASTA format, its parallel complement is generated, and a BLAST analysis is then run between the query and subject sequences. A hit is counted as an MR only if the position numbers of the subject sequence are exactly the reverse of those of the query sequence. The present investigation used FPCB to identify MR sequences in the lac, trp, and ara operon genes of E. coli str. K-12 substr. MG1655.
Figure 2: Methodology of mirror repeat identification. The first step involves downloading the sequence in FASTA format, followed by generating its parallel complement and, as the final step, BLAST analysis


The steps involved in this process are as follows:

  1. Downloading the gene or genome sequence in FASTA format (Step 1)

    The sequence of the gene/CDS/genome of interest was downloaded in FASTA format from http://www.ncbi.nlm.nih.gov
  2. Making the parallel complement (Step 2)

    Once downloaded, the nucleotide sequence in FASTA format was pasted into the Reverse Complement program at http://www.bioinformatics.org/sms/rev
  3. MR identification (Step 3)

    The query and subject sequences in FASTA format were then subjected to a BLAST homology search at http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=MegaBlast&PROGRAM=blastn&BLAST_PROGRAMS=megaBlast&PAGE_TYPE=BlastSearch&BLAST_SPEC=blast2seq&QUERY=&SUBJECTS
  4. Analysis: If the position numbers are exactly reversed between the subject and query sequences, the hit is an MR.
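
The logic behind steps 2 to 4 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the web tools used in the study: the helper names are hypothetical, and the coordinates passed to positions_exactly_reversed stand in for the query and subject start/end values reported for a BLAST alignment. The sketch also suggests why the parallel complement is useful: its reverse complement is simply the original sequence read backwards, so an alignment against the opposite strand of the subject lines a segment up with its own mirror image.

```python
# Hypothetical helpers illustrating the FPCB idea; not the tools used in the study.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def parallel_complement(seq: str) -> str:
    """Complement every base but keep the original left-to-right order."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Complement every base and reverse the order (the opposite strand)."""
    return parallel_complement(seq)[::-1]


def positions_exactly_reversed(q_start: int, q_end: int,
                               s_start: int, s_end: int) -> bool:
    """Step 4 criterion: subject coordinates are the query coordinates reversed."""
    return (q_start, q_end) == (s_end, s_start)


query = "AGTTCGTTGCTTGA"               # example sequence from the Introduction
subject = parallel_complement(query)   # step 2
print(subject)                         # TCAAGCAACGAACT

# The opposite strand of the subject is the query read backwards.
print(reverse_complement(subject) == query[::-1])   # True

# Assumed coordinates for a hit spanning the whole example sequence.
print(positions_exactly_reversed(1, 14, 14, 1))     # True: counted as an MR
print(positions_exactly_reversed(1, 14, 20, 7))     # False: not an MR
```

In practice the coordinates would be read from the BLAST report for each alignment; the values above are made up for the example.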

  Results

The present investigation used a standard computational methodology and identified various MR sequences in the lac, trp, and ara operon genes of E. coli str. K-12 substr. MG1655 (see [Table 1],[Table 2],[Table 3] for gene descriptions). The MRs identified include both perfect and imperfect repeats and vary in length (for details of sequence type, length, and position in the genes, see the supplementary file). The maximum numbers of MRs identified in a single gene of the ara, lac, and trp operons were 41, 61, and 40, respectively [Figure 3],[Figure 4],[Figure 5].
Figure 3: Distribution of mirror repeats in arabinose operon genes. The maximum and minimum numbers of MRs were found in the araE (41) and araD (14) genes, respectively. Mirror repeats are also frequently distributed in the other genes of the arabinose operon

Figure 4: Distribution of mirror repeats in lac operon genes. The maximum and minimum numbers of MRs were found in the lacZ (61) and lacA (16) genes, respectively

Figure 5: Distribution of mirror repeats in tryptophan operon genes. The maximum and minimum numbers of MRs were found in the trpE (40) and trpL (0) genes, respectively

Table 1: Description of genes present in lac operon of Escherichia coli strain K-12 substrain MG1655

Table 2: Description of genes present in trp operon of Escherichia coli strain K-12 substrain MG1655

Table 3: Description of genes present in ara operon of Escherichia coli strain K-12 substrain MG1655


  Discussion

The presence of repetitive sequences in the genes and genomes of living organisms continues to attract researchers seeking to resolve their structural and functional roles in the cellular system.[22],[23],[24],[25] The present investigation found that the occurrence and distribution of MRs in the selected operon genes of this E. coli strain are similar to those of other repeat elements found in bacterial genomes.[26] Shorter MR sequences were observed more frequently than longer ones. The frequent occurrence of MRs hints that they may be associated with some important functional role. This study may also be helpful in bioinformatics and computational biology for the development of new tools for MR identification. MRs may further be useful for the characterization and classification of organisms based on MR profiling of their genomes. They may also be usable for therapeutic and diagnostic purposes, which can be assessed through in silico studies.[27],[28],[29]
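
As a pointer toward the kind of tool mentioned above, the following Python sketch scans a sequence directly for perfect MRs by expanding outward from every possible center of symmetry. It is an assumption-laden illustration and not the FPCB method used in this study: the function name, the min_len threshold, and the toy sequence are hypothetical, and imperfect MRs (which a BLAST alignment can report) are not covered, so counts from such a scan would not necessarily match those reported here.

```python
from collections import Counter


def find_perfect_mirror_repeats(seq: str, min_len: int = 6):
    """Return (start, end, subsequence) for the longest perfect mirror repeat
    around each center of symmetry; coordinates are 1-based and inclusive."""
    s = seq.upper()
    n = len(s)
    hits = []
    # 2n - 1 candidate centers: on a base (odd length) or between two bases (even length).
    for center in range(2 * n - 1):
        left = center // 2
        right = left + center % 2
        # Grow outward while the flanking bases are identical.
        while left >= 0 and right < n and s[left] == s[right]:
            left -= 1
            right += 1
        start, end = left + 1, right - 1  # last matching pair, 0-based inclusive
        if end - start + 1 >= min_len:
            hits.append((start + 1, end + 1, s[start:end + 1]))
    return hits


# Toy sequence: the example MR from the Introduction flanked by two bases on each side.
toy = "CC" + "AGTTCGTTGCTTGA" + "GG"
hits = find_perfect_mirror_repeats(toy, min_len=6)
print(hits)  # [(3, 16, 'AGTTCGTTGCTTGA')]

# Tally hits by length, e.g. to compare shorter and longer repeats.
length_counts = Counter(end - start + 1 for start, end, _ in hits)
print(dict(length_counts))  # {14: 1}
```

Only the longest repeat around each center is reported, so shorter repeats nested inside a longer one with the same center are not counted separately; a real tool would need to state how such cases, and imperfect repeats, are handled.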

  Conclusion

On the basis of the data obtained and their analysis, it is concluded that MRs are an essential part of the genome and may also be used in evolutionary studies.

Limitation of study

No limitation in the study samples.


The authors are thankful to Sh. Mohinder Singh Ji, Hon'ble Chancellor, Starex University, for providing the facilities to carry out this research. We also wish to acknowledge Dr. Vikash Bhardwaj for his help in designing the methodology figure for the present research.

Financial support and sponsorship


Conflicts of interest

There are no conflicts of interest.

Ethical Statement

This is to declare that no method was used unethically.

  References

1. Shapiro JA. Living organisms author their read-write genomes in evolution. Biology (Basel) 2017;6:42.
2. Friedman N, Rando OJ. Epigenomics and the structure of the living genome. Genome Res 2015;25:1482-90.
3. Clay O, Cacciò S, Zoubak S, Mouchiroud D, Bernardi G. Human coding and noncoding DNA: Compositional correlations. Mol Phylogenet Evol 1996;5:2-12.
4. Shukla R, Sharma DC, Pathak N, Bajpai P. Genomic DNA isolation from high polyphenolic content Grewia asiatica L. leaf without using liquid nitrogen. Iran J Sci Technol Trans A Sci 2018;42:347-51.
5. Slotkin RK. The case for not masking away repetitive DNA. Mob DNA 2018;9:1-4.
6. Gouveia JG, Wolf IR, Vilas-Boas LA, Heslop-Harrison JS, Schwarzacher T, Dias AL. Repetitive DNA in the catfish genome: rDNA, microsatellites, and Tc1-mariner transposon sequences in Imparfinis species (Siluriformes, Heptapteridae). J Hered 2017;108:650-7.
7. de Oliveira TD, Kretschmer R, Bertocchi NA, Degrandi TM, de Oliveira EH, Cioffi MB, et al. Genomic organization of repetitive DNA in woodpeckers (Aves, Piciformes): Implications for karyotype and ZW sex chromosome differentiation. PLoS One 2017;12:e0169987.
8. Huang TY, Chang CK, Kao YF, Chin CH, Ni CW, Hsu HY, et al. Parity-dependent hairpin configurations of repetitive DNA sequence promote slippage associated with DNA expansion. Proc Natl Acad Sci U S A 2017;114:9535-40.
9. Hall AC, Ostrowski LA, Pietrobon V, Mekhail K. Repetitive DNA loci and their modulation by the non-canonical nucleic acid structures R-loops and G-quadruplexes. Nucleus 2017;8:162-81.
10. Shapiro JA, von Sternberg R. Why repetitive DNA is essential to genome function. Biol Rev Camb Philos Soc 2005;80:227-50.
11. Paço A, Freitas R, Vieira-Da-Silva A. Conversion of DNA sequences: From a transposable element to a tandem repeat or to a gene. Genes (Basel) 2019;10:1-16.
12. Jurka J, Kapitonov VV, Kohany O, Jurka MV. Repetitive sequences in complex genomes: Structure and evolution. Annu Rev Genomics Hum Genet 2007;8:241-59.
13. Belland RJ. H-DNA formation by the coding repeat elements of neisserial opa genes. Mol Microbiol 1991;5:2351-60.
14. Mirkin SM, Frank-Kamenetskii MD. H-DNA and related structures. Annu Rev Biophys Biomol Struct 1994;23:541-76.
15. Taniguchi Y, Magata Y, Osuki T, Notomi R, Wang L, Okamura H, et al. Development of novel C-nucleoside analogues for the formation of antiparallel-type triplex DNA with duplex DNA that includes TA and dUA base pairs. Org Biomol Chem 2020;18:2845-51.
16. LeProust EM, Pearson CE, Sinden RR, Gao X. Unexpected formation of parallel duplex in GAA and TTC trinucleotide repeats of Friedreich's ataxia. J Mol Biol 2000;302:1063-80.
17. Heidenfelder BL, Makhov AM, Topal MD. Hairpin formation in Friedreich's ataxia triplet repeat expansion. J Biol Chem 2003;278:2425-31.
18. Novák P, Neumann P, Pech J, Steinhaisl J, Macas J. RepeatExplorer: A galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads. Bioinformatics 2013;29:792-3.
19. Saha S, Bridges S, Magbanua ZV, Peterson DG. Empirical comparison of ab initio repeat finding programs. Nucleic Acids Res 2008;36:2284-94.
20. Bhardwaj V, Swapni G, Sitaram M, Kulbhushan S. FPCB: A Simple and Swift Strategy for Mirror Repeat Identification. Preprint arXiv: 1312.3869; 2013. Available from: https://arxiv.org/abs/1312.3869v1. [Last accessed on 2021 May 02].
21. National Center for Biotechnology Information (NCBI). Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; 1988. Available from: http://www.ncbi.nlm.nih.gov. [Last accessed on 2021 May 02].
22. Machado CR, Glugoski L, Domit C, Pucci MB, Goldberg DW, Marinho LA, et al. Comparative cytogenetics of four sea turtle species (Cheloniidae): G-banding pattern and in situ localization of repetitive DNA units. Cytogenet Genome Res 2020;160:531-8.
23. Liehr T. Repetitive elements in humans. Int J Mol Sci 2021;22:2072.
24. Dias CA, Kuhn GC, Svartman M, Santos Júnior JE, Santos FR, Pinto CM, et al. Identification and characterization of repetitive DNA in the genus Didelphis Linnaeus, 1758 (Didelphimorphia, Didelphidae) and the use of satellite DNAs as phylogenetic markers. Genet Mol Biol 2021;44:e20200384.
25. Sato H, Das S, Singer RH, Vera M. Imaging of DNA and RNA in living eukaryotic cells to reveal spatiotemporal dynamics of gene expression. Annu Rev Biochem 2020;89:159-87.
26. Brazda V, Fojta M, Bowater RP. Structures and stability of simple DNA repeats from bacteria. Biochem J 2020;477:325-39.
27. Farhadi T. Effectiveness assessment of protein drugs and vaccines through in silico analysis. Biomed Biotechnol Res J 2018;2:106-11.
28. Satpathy R. In silico modeling and docking study of potential helicase (nonstructural proteins) inhibitors of novel coronavirus 2019 (severe acute respiratory syndrome coronavirus 2). Biomed Biotechnol Res J 2020;4:330-6.
29. Wankhade G, Kamble S, Deshmukh S, Jena L, Waghmare P, Harinath BC. Inhibition of mycobacterial CYP125 enzyme by sesamin and β-sitosterol: An in silico and in vitro study. Biomed Biotechnol Res J 2017;1:49-54.

