Patent application title: Engineered Cellular Pathways for Programmed Autoregulation of Differentiation
Inventors:
Ron Weiss (Princeton, NJ, US)
Ihor Lemischka (Princeton, NJ, US)
Priscilla E. M. Purnick (West Windsor, NJ, US)
Christoph Schaniel (New York, NY, US)
Miles Miller (Indianapolis, IN, US)
Patrick Guye (Rupperswil, CH)
IPC8 Class: AC12N1585FI
Publication date: 2017-02-16
Patent application number: 20170044570
Abstract:
The present invention provides compositions and methods for programming
mammalian cells to perform desired functions. In particular, the present
invention provides compositions and methods for programming stem cells to
differentiate into a desired cell type. A quorum sensing system that
regulates the expression of cell fate regulators is introduced into
mammalian host cells, such as stem cells. The quorum sensing system
generally comprises vectors that express the components of a bacterial
quorum sensing pathway, including a gene encoding a protein that catalyzes
the synthesis of an autoinducer and a gene encoding a regulatory partner
of the autoinducer, and vectors in which genes encoding cell fate
regulators are operably linked to a promoter induced by the
autoinducer/regulatory partner complex. The system can also comprise
vectors in which genes encoding additional cell fate regulators are
operably linked to a promoter that is induced by a factor synthesized in
response to a first stage of differentiation, so that a second stage of
differentiation is triggered.
Claims:
1-77. (canceled)
78. A composition comprising one or more mammalian vectors that comprise: a) a first nucleic acid sequence capable of producing a first cell fate regulator protein that is capable of inducing differentiation of a first cell type into a second cell type that expresses a protein marker, and b) a second nucleic acid sequence capable of producing a second cell fate regulator protein that is operably linked to a cell type specific promoter of said second cell type, and that is capable of inducing differentiation of said second cell type into a third cell type.
79. The composition of claim 78, wherein said one or more vectors further comprises c) a third nucleic acid sequence capable of producing a third cell fate regulator protein that is operably linked to a cell type specific promoter of said second cell type, and that is capable of inducing differentiation of said second cell type into said third cell type.
80. The composition of claim 78, wherein said composition is comprised in a mammalian cell, wherein said vectors are exogenous to said mammalian cell.
81. The composition of claim 78, wherein said first cell fate regulator protein is selected from the group consisting of Sox17, Gata4, Gata6, Pdx1, Ngn3, Nkx6.1, Nkx2.2, Fgf4, BRA, Wnt9, NCAD, CER, FoxA2, CxcR4, Hnf1B, Hnf4A, Hnf6, HlxB9, Pax4, Cgc, GHRL, SST, PPY, Activin, Fgf10, Cyc, RA, Ex4, DAPT, HGF and Igf1.
82. The composition of claim 79, wherein one or more of said second cell fate regulator protein and said third cell fate regulator protein is selected from the group consisting of Pdx1, Ngn3, Nkx6.1, Nkx2.2, Fgf4, BRA, Wnt9, NCAD, CER, FoxA2, CxcR4, Hnf1B, Hnf4A, Hnf6, HlxB9, Pax4, Cgc, GHRL, SST, PPY, Activin, Fgf10, Cyc, RA, Ex4, DAPT, HGF and Igf1.
83. The composition of claim 78, wherein one or more of said first cell type and said second cell type is an epiblast-like stem cell (ELSC) and said protein marker is selected from the group consisting of stage-specific embryonic antigen-4, stage-specific embryonic antigen-1, stage-specific embryonic antigen-3, and carcinoembryonic antigen cell adhesion molecule-1.
84. The composition of claim 83, wherein said second cell type is a myoblast cell, and said protein marker is dystrophin.
85. The composition of claim 83, wherein said second cell type is an adipocyte cell, and said protein marker is PPARγ.
86. The composition of claim 83, wherein said second cell type is an endoderm cell, and said protein marker is selected from the group consisting of Hnf3β, laminin B1, and α-fetoprotein (AFP).
87. The composition of claim 83, wherein said second cell type is an ectoderm cell and said protein marker is nestin.
88. The composition of claim 78, wherein said first cell fate regulator protein comprises Gata4.
89. The composition of claim 78, wherein said second cell fate regulator protein comprises Pdx1.
90. The composition of claim 78, wherein said second cell fate regulator protein comprises Ngn3 and Pdx1.
91-125. (canceled)
Description:
[0001] This application is a continuation of, and claims priority to,
co-pending U.S. application Ser. No. 14/174,475, filed on Feb. 6, 2014,
which is a continuation of U.S. application Ser. No. 12/312,197, filed on
Apr. 29, 2009, which is the U.S. National stage filing of PCT Application
No. PCT/US2007/023227, filed on Nov. 1, 2007, now abandoned, which claims
priority to U.S. provisional Patent Application Ser. No. 60/856,531 filed
on Nov. 3, 2006, now abandoned, and to U.S. provisional Patent
Application Ser. No. 60/905,483 filed on Mar. 7, 2007, now abandoned,
each of which is herein incorporated by reference in its entirety for all
purposes.
FIELD OF THE INVENTION
[0002] The present invention provides compositions and methods for programming mammalian cells to perform desired functions. In particular, the present invention provides compositions and methods for programming stem cells to differentiate into a desired cell type.
BACKGROUND OF THE INVENTION
[0003] Diabetes mellitus is a heterogeneous set of genetic abnormalities in the insulin-producing machinery, ranging from the body's inability to produce enough insulin to its inability to recognize and/or use insulin. Type I diabetes is an autoimmune disease that systematically destroys the insulin-producing β cells of the pancreas. Type II diabetes arises from various genetic abnormalities in the pancreas, and its onset is closely correlated with obesity. The current standard treatment for diabetes is to maintain insulin levels by monitoring blood glucose and diet, to provide exogenous doses of insulin when necessary, and, as the disease progresses, to treat its consequences, such as loss of circulation to the extremities, glaucoma, and sepsis [Couper et al., Medical Journal of Australia, 179(8):441-447, 2003]. More radical treatments include whole-organ transplants, islet cell transplants, or β cell transplants. Candidates for pancreatic transplantation are placed on a long waiting list for a suitable organ. Even when patients are fortunate enough to receive an allogeneic pancreas transplant, they must take immunosuppressants to prevent rejection of the graft. A recent trial of islet cell transplant therapy provided short-lived relief in most patients, but in a majority of the initially successful transplants the transplanted cells subsequently died or ceased to produce insulin [Shapiro et al., New England Journal of Medicine, 355(13):1318-1330, 2006]. Clearly, another approach is necessary to alleviate the problems caused by diabetes and to address the root cause of the disease.
[0004] Recent developments in genome technologies, tissue engineering and synthetic biology offer possibilities to establish highly accurate and robust approaches for predictable and controllable cell fate regulation both temporally and spatially. Stem cell research promises to revolutionize the way many inherited and acquired diseases are treated and will also provide unprecedented insights into fetal development and the etiology of numerous disorders [Hochedlinger et al., N Engl J Med, 349(3):275-286, July 2003; Weissman, Science, 287(5457):1442-1446, February 2000; Lagasse et al., Immunity, 14(4):425-436, April 2001; Reya et al., Nature, 414(6859):105-111, November 2001]. Mouse embryonic stem (mES) cells are an attractive platform for this research because they are amenable to extensive genetic manipulation. When introduced into the appropriate in vitro or in vivo contexts, mES cells contribute to all tissue types of adult mice, including the germ line [Nagy et al., Development, 110(3):815-821, November 1990].
[0005] Consequently, there has been much excitement about the potential of these cells as an unlimited source of differentiated cell populations for transplantation or other therapies. However promising, ES cell-based therapies depend on the ability to reliably and controllably produce the necessary mature cell populations. In addition, directed differentiation must be complete, given the tumorigenic potential of undifferentiated ES cells. With few exceptions, such directed production of desired cell populations has not yet been possible.
[0006] Current approaches towards tissue engineering and transplantation rely on carefully creating environments that induce cells to differentiate into desired tissues or organs. While these approaches have proven partially effective for certain applications, they are inherently limited because they rely on the innate cellular response to existing host conditions or exogenous cues. Often, naturally occurring host conditions are insufficient to trigger the correct differentiation pathways. In those instances, researchers have attempted to provide appropriate environmental cues using scaffolds and exogenous signals. However, it is often difficult, if not impossible, to create and maintain by such means the precise conditions that tissue regeneration requires.
[0007] What is needed in the art are systems and methods that can be used to cause stem cells to reliably differentiate into a desired cell type based on expression of genes introduced into the stem cells.
SUMMARY OF THE INVENTION
[0008] The present invention provides compositions and methods for programming mammalian cells to perform desired functions. In particular, the present invention provides compositions and methods for programming stem cells to differentiate into a desired cell type. Accordingly, in some embodiments, the present invention provides systems for directing differentiation of an undifferentiated cell type comprising: a) a first mammalian vector comprising a first gene encoding a protein that synthesizes an autoinducer; b) a second mammalian vector comprising a second gene encoding a regulatory protein that interacts with the autoinducer; c) a third mammalian vector comprising a promoter that binds to the regulatory protein in the presence of the autoinducer, the promoter operably linked to a third gene of interest encoding a first cell fate regulator. In some embodiments, the systems of the present invention further comprise a fourth mammalian vector comprising a promoter comprising a response element that binds to a regulatory protein produced in response to expression of the first cell fate regulator. The present invention is not limited to the use of any particular type of mammalian vector. Indeed, the use of a variety of mammalian vectors is contemplated, including, but not limited to lentiviral vectors, retroviral vectors, pseudotyped retroviral or lentiviral vectors, adenoviral vectors, AAV vectors, plasmids, artificial chromosomes, transposon vectors and the like.
[0009] In some embodiments, the first mammalian vector further comprises a promoter operably linked to the first gene. The present invention is not limited to the use of any particular promoter. Indeed, the use of a variety of promoters is contemplated. In some embodiments, the promoter is a repressible promoter. In some embodiments, the promoter is regulated by the lac repressor. In some embodiments, the promoter is an inducible promoter. In some embodiments, the inducible promoter is a Tet promoter. The present invention is not limited to vectors encoding proteins that synthesize any particular autoinducer. Indeed, the use of a variety of autoinducers is contemplated, including, but not limited to, 3OC6HSL, C4HSL, and 3OC14HSL. The present invention is not limited to the use of genes encoding any particular autoinducer synthase. Indeed, the use of several genes that encode proteins that catalyze the synthesis of autoinducers is contemplated, including, but not limited to, the LuxI, RhlI, and CinI proteins. The present invention is not limited to the use of any particular genes encoding LuxI, RhlI, or CinI. In some embodiments, the genes comprise codons that are optimized for expression in mammalian cells. In other embodiments, the genes utilized are at least 70%, 80%, 90%, or 95% identical to the wild-type LuxI, RhlI, or CinI genes.
[0010] In some preferred embodiments, the second mammalian vector comprises a gene encoding a regulatory protein that interacts with the autoinducer synthesized by the protein encoded by the gene of interest in the first vector. The present invention is not limited to the use of any particular regulatory protein. Indeed, the use of a variety of regulatory proteins is contemplated, including, but not limited to, LuxR, RhlR, and CinR. The present invention is not limited to the use of any particular genes encoding LuxR, RhlR, or CinR. In some embodiments, the genes comprise codons that are optimized for expression in mammalian cells. In other embodiments, the genes utilized are at least 70%, 80%, 90%, or 95% identical to the wild-type LuxR, RhlR, or CinR genes.
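The sequence-identity thresholds recited above (70%, 80%, 90%, or 95%) amount to a simple calculation once two sequences are aligned. The following Python sketch is illustrative only: it assumes the sequences have already been aligned to equal length (in practice an alignment tool would be run first), and the 15-nucleotide fragments are hypothetical, not actual LuxI, RhlI, CinI, LuxR, RhlR, or CinR sequences. The "optimized" fragment carries only synonymous codon changes, the kind that codon optimization introduces.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned positions that carry identical residues.

    Assumes both sequences are already aligned to the same length, with gaps
    written as '-'; any position involving a gap counts as a mismatch.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and pre-aligned to equal length")
    matches = sum(
        1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b and a != "-"
    )
    return 100.0 * matches / len(seq_a)


def meets_identity_threshold(seq_a: str, seq_b: str, threshold: float) -> bool:
    """True if the pair is at least `threshold` percent identical."""
    return percent_identity(seq_a, seq_b) >= threshold


# Hypothetical fragments: ATG ACC ATA ATG AAA versus a variant with the
# synonymous changes ACC->ACA (Thr), ATA->ATC (Ile), and AAA->AAG (Lys).
wild_type = "ATGACCATAATGAAA"
optimized = "ATGACAATCATGAAG"
for t in (70.0, 80.0, 90.0, 95.0):
    print(t, meets_identity_threshold(wild_type, optimized, t))
```

Because codon optimization changes nucleotides without changing the encoded protein, a heavily optimized gene can fall below a nucleotide-identity threshold while still encoding the wild-type protein.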
[0011] The present invention is not limited to the use of any particular promoter in the third vector of the system of the present invention. In some embodiments, the third mammalian vector comprises a promoter selected from the group consisting of lux and LasR promoters. In some preferred embodiments, the lux promoter comprises multiple repeats of the lux binding sequence, which binds the LuxR/autoinducer complex. Likewise, the vectors for use in the systems of the present invention may comprise promoters comprising multiple repeats of the binding sequences recognized by the RhlR/C4HSL and CinR/3OC14HSL autoinducer complexes. The present invention is not limited to the use of vectors encoding any particular cell fate regulator. In some embodiments, the first cell fate regulator is selected from the group consisting of Sox17, Gata4, and Gata6, and combinations thereof. In further embodiments, the vectors used in the systems of the present invention encode two or more of the cell fate regulators selected from the group consisting of Sox17, Gata4, Gata6, Pdx1, Ngn3, Nkx6.1, Nkx2.2, Fgf4, BRA, Wnt9, NCAD, CER, FoxA2, CxcR4, Hnf1B, Hnf4A, Hnf6, HlxB9, Pax4, Cgc, GHRL, SST, PPY, Activin, Fgf10, Cyc, RA, Ex4, DAPT, HGF and Igf1.
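Why the three-vector system makes differentiation depend on population size can be illustrated with a toy steady-state model. The Python sketch below is an illustration under stated assumptions, not the inventors' model: it assumes the autoinducer diffuses into a shared, well-mixed medium so that its level is proportional to cell number, that the regulatory protein (e.g. LuxR) is in excess, and that the autoinducer-responsive promoter follows a Hill function. All parameter values (`k_syn`, `k_deg`, `k_half`, `hill_n`, `threshold`) are hypothetical.

```python
# Toy steady-state model of the three-vector quorum sensing system.
# All parameter values are hypothetical and chosen only for illustration.

def autoinducer_level(n_cells: float, k_syn: float = 1.0, k_deg: float = 100.0) -> float:
    """Autoinducer level in a shared, well-mixed medium.

    Assumes each cell's synthase (e.g. LuxI) makes autoinducer at rate k_syn
    and the medium clears it with first-order rate k_deg, so at steady state
    the level is proportional to the number of cells.
    """
    return n_cells * k_syn / k_deg


def promoter_activity(autoinducer: float, k_half: float = 50.0, hill_n: float = 2.0) -> float:
    """Fraction (0..1) of maximal output from the autoinducer-responsive promoter.

    Assumes the regulatory protein (e.g. LuxR) is in excess and that activation
    by the regulator/autoinducer complex follows a Hill function.
    """
    return autoinducer**hill_n / (k_half**hill_n + autoinducer**hill_n)


def fate_regulator_on(n_cells: float, threshold: float = 0.5) -> bool:
    """True when the promoter driving the first cell fate regulator is at least half on."""
    return promoter_activity(autoinducer_level(n_cells)) >= threshold


for n in (100, 1_000, 5_000, 10_000, 50_000):
    activity = promoter_activity(autoinducer_level(n))
    print(f"{n:>6} cells: activity={activity:.3f} regulator_on={fate_regulator_on(n)}")
```

In this sketch a single cell, or a sparse culture, never accumulates enough autoinducer to switch on the cell fate regulator; only a sufficiently dense population does.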
[0012] In some embodiments of the systems of the present invention, the fourth mammalian vector further comprises a gene encoding a second cell fate regulator. The present invention is not limited to the use of any particular second cell fate regulator. In some embodiments, the second cell fate regulator is selected from the group consisting of Pdx1, Ngn3, Nkx6.1, Nkx2.2, Fgf4, BRA, Wnt9, NCAD, CER, FoxA2, CxcR4, Hnf1B, Hnf4A, Hnf6, HlxB9, Pax4, Cgc, GHRL, SST, PPY, Activin, Fgf10, Cyc, RA, Ex4, DAPT, HGF and Igf1. In some embodiments, the systems of the present invention comprise multiple vectors encoding one or more of the foregoing cell fate regulators. The fourth vector of the systems of the present invention is not limited to the use of any particular promoter. Indeed, a variety of stage-specific promoters may be utilized. In some embodiments, the promoter comprising a response element that binds to a regulatory protein produced in response to expression of the first cell fate regulator comprises an α-fetoprotein promoter.
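The two-stage program (a quorum-gated first cell fate regulator, then a stage-specific promoter driving the second) can be summarized as a per-cell state machine. This Python sketch is a simplification under explicit assumptions: it treats the α-fetoprotein promoter as active only in endoderm cells, treats the second stage as independent of quorum, and advances a cell by at most one stage per time step. The names `Stage`, `next_stage`, and `run` are hypothetical.

```python
from enum import Enum


class Stage(Enum):
    ES = "mES cell"
    ENDODERM = "endoderm"
    BETA = "beta cell"


def afp_promoter_active(stage: Stage) -> bool:
    # Assumption: the alpha-fetoprotein promoter is on only in endoderm cells.
    return stage is Stage.ENDODERM


def next_stage(stage: Stage, quorum_reached: bool) -> Stage:
    """Advance one cell by at most one stage per time step."""
    # Stage 1: the autoinducer-responsive promoter drives the first cell fate
    # regulator (e.g. Gata4 or Sox17) only once the population reaches quorum.
    if stage is Stage.ES and quorum_reached:
        return Stage.ENDODERM
    # Stage 2 (assumed independent of quorum): the stage-specific promoter
    # drives the second cell fate regulator (e.g. Pdx1 or Ngn3).
    if afp_promoter_active(stage):
        return Stage.BETA
    return stage


def run(quorum_reached: bool, steps: int = 3) -> list[Stage]:
    """Trajectory of a single cell starting as an undifferentiated mES cell."""
    history = [Stage.ES]
    for _ in range(steps):
        history.append(next_stage(history[-1], quorum_reached))
    return history


print([s.value for s in run(quorum_reached=False)])
print([s.value for s in run(quorum_reached=True)])
```

Without quorum the cell stays an mES cell; with quorum it becomes endoderm and then, through the stage-specific promoter, a β cell.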
[0013] In some embodiments, the systems of the present invention comprise vectors encoding additional proteins involved in a quorum sensing pathway. Accordingly, in some embodiments, the systems of the present invention comprise a fifth mammalian vector comprising a gene encoding an acyl carrier protein (ACP). In further embodiments, the systems of the present invention comprise a sixth mammalian vector comprising a gene encoding acyl-acyl carrier protein synthase (AAS). The present invention is not limited to vectors comprising any particular ACP or AAS genes. In some embodiments, the genes comprise codons optimized for expression in mammalian cells. In further embodiments, the genes are at least 70%, 80%, 90%, or 95% identical to the wild-type ACP or AAS genes.
[0014] In some embodiments, the systems of the present invention further comprise a separate vector comprising a gene encoding RhlR. In some embodiments, the systems of the present invention further comprise a separate mammalian vector comprising a gene encoding Growth Arrest Factor. In some embodiments, the systems of the present invention further comprise a mammalian vector comprising a gene encoding TetR. In some embodiments, the systems of the present invention further comprise a separate mammalian vector comprising a gene of interest encoding a protein selected from the group consisting of TetR, LacI, and CinI operably linked to a Mouse Insulin promoter.
[0015] In some embodiments, the present invention provides a culture of mammalian cells comprising one or more of the foregoing vectors of the system of the present invention. In some embodiments, the present invention provides a mammalian cell comprising one or more of the foregoing vectors of the system of the present invention. The present invention is not limited to the use of any particular type of mammalian cell. In some embodiments, the cell is a totipotent cell. In some embodiments, the cell is a multipotent cell. In some embodiments, the cell is a differentiated cell. In some embodiments, the cell or cell culture comprises cells selected from the group consisting of embryonic stem cells, adult stem cells, and cord blood stem cells. The present invention is not limited to mammalian cells of any particular species. In some embodiments, the cells are human, primate, mouse, cow, hamster, rat, pig, sheep, or goat cells. In some embodiments, the differentiated cells are β cells. In some embodiments, the cells produce insulin.
[0016] In some embodiments, the present invention provides methods of treating a patient comprising: providing a patient in need of treatment and the mammalian cells described in the foregoing paragraphs, and introducing the cells into the patient. In some embodiments, the patient is diabetic. In some embodiments, the mammalian cells produce insulin after introduction into the patient.
[0017] In some embodiments, the present invention provides methods of programming mammalian cells comprising: introducing a quorum sensing system into a mammalian cell, wherein the system causes production of an autoinducer molecule and a regulatory partner of the autoinducer; introducing an expression system for at least one cell fate regulator into the mammalian cell; culturing the cell under conditions such that the cell produces the autoinducer, wherein the autoinducer interacts with the regulatory partner to induce expression of the at least one cell fate regulator.
[0018] In some embodiments, the present invention provides methods of programming mammalian cells comprising 1) providing i) a mammalian cell, ii) a first vector encoding a protein that synthesizes an autoinducer molecule; iii) a second vector encoding a regulatory partner of the autoinducer; iv) a third vector encoding a gene of interest operably linked to a promoter activated by the interaction of the autoinducer and the regulatory partner; 2) introducing the vectors into the mammalian cell; 3) culturing the mammalian cell under conditions such that the regulatory partner and the autoinducer are synthesized and activate expression of the gene of interest.
[0019] In some embodiments, the present invention provides a mammalian cell comprising an exogenous gene selected from the group consisting of genes encoding LuxI, LuxR, RhlI, RhlR, CinI, and CinR. In some embodiments, the present invention provides a mammalian cell comprising an exogenous gene selected from the group consisting of Sox17, Gata4, and Gata6, wherein the exogenous gene is operably linked to a lux promoter. In some embodiments, the present invention provides a mammalian cell comprising an exogenous gene selected from the group consisting of Pdx1, Ngn3, Nkx6.1, Nkx2.2, Fgf4, BRA, Wnt9, NCAD, CER, FoxA2, CxcR4, Hnf1B, Hnf4A, Hnf6, HlxB9, Pax4, Cgc, GHRL, SST, PPY, Activin, Fgf10, Cyc, RA, Ex4, DAPT, HGF and Igf1, wherein the exogenous gene is operably linked to an endoderm-specific promoter. In some embodiments, the endoderm-specific promoter is the α-fetoprotein promoter.
In some embodiments, the present invention provides a mammalian cell comprising a quorum sensing pathway. In some embodiments, the quorum sensing pathway comprises one or more exogenous genes encoding LuxI, LuxR, acyl carrier protein, and acyl-acyl carrier protein synthase.
[0020] In some embodiments, the present invention provides a system for use in establishing tissue homeostasis in a differentiating cell system comprising mammalian vectors encoding at least one cell-cell communication system, wherein at least a first cell-cell communication system controls proliferation or differentiation of cells in said differentiating cell system. In some embodiments, the systems further comprise at least a second cell-cell communication system, wherein said first and second cell-cell communication systems interact to establish homeostasis in said differentiating cell system. In some embodiments, the differentiating cell system comprises differentiable cells and differentiated cells produced from said differentiable cells. In some embodiments, the homeostasis is characterized by the controlled proliferation of said differentiable cells comprising said at least two cell-cell communication systems and the controlled production of said differentiated cells. In further embodiments, the at least two cell-cell communication systems interact to establish tissue homeostasis in a differentiating cell system comprising differentiable cells and differentiated cells by sensing the number of differentiable cells via the first cell-cell communication system and the number of differentiated cells via the second cell-cell communication system, so that: when the number of differentiable cells is low, proliferation of the differentiable cells is induced; when the number of differentiable cells is high, proliferation of the differentiable cells is inhibited; when the number of differentiated cells is low, differentiation and proliferation of the differentiable cells are induced; and when the number of differentiated cells is high, differentiation of the differentiable cells is inhibited. The present invention is not limited to the use of any particular cell-cell communication systems. 
Indeed, the use of a variety of cell-cell communication systems is contemplated. In some embodiments, the at least one cell-cell communication system is a bacterial cell-cell communication system. In other embodiments, the at least one cell-cell communication system is selected from the group consisting of the LuxI, RhlI, and CinI cell-cell communication systems. The present invention is not limited to the use of any particular type of vector. Indeed, the use of a variety of vectors is contemplated. In some embodiments, the mammalian vectors are retroviral vectors. In further embodiments, the systems of the present invention further comprise a cell differentiation control system.
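The homeostasis logic of two sensing systems can be sketched as a simple rate model. In this Python sketch, which is a simplification under stated assumptions rather than the inventors' model, the first signal lowers per-cell proliferation as differentiable cells accumulate and the second lowers per-cell differentiation as differentiated cells accumulate. Differentiated cells are assumed to turn over at a fixed rate. The Hill-type rate laws and all parameter values are hypothetical, and the model captures the "differentiated cells low, proliferation induced" rule only indirectly, through depletion of differentiable cells.

```python
# Simplified two-signal feedback model of tissue homeostasis.
# s: differentiable cells (reported by the first communication system)
# d: differentiated cells (reported by the second communication system)
# All rate laws and parameter values are hypothetical.

def proliferation_rate(s: float, p_max: float = 1.0, k_s: float = 1000.0) -> float:
    """Per-cell proliferation rate; high when s is low, inhibited when s is high."""
    return p_max / (1.0 + s / k_s)


def differentiation_rate(d: float, d_max: float = 0.5, k_d: float = 1000.0) -> float:
    """Per-cell differentiation rate; high when d is low, inhibited when d is high."""
    return d_max / (1.0 + d / k_d)


def simulate(s0: float = 1000.0, d0: float = 0.0, turnover: float = 0.1,
             dt: float = 0.1, t_end: float = 2000.0) -> tuple[float, float]:
    """Forward-Euler integration; differentiated cells are lost at rate `turnover`."""
    s, d = s0, d0
    for _ in range(int(t_end / dt)):
        prolif = proliferation_rate(s)
        diff = differentiation_rate(d)
        ds = (prolif - diff) * s       # growth minus loss to differentiation
        dd = diff * s - turnover * d   # gain from differentiation minus turnover
        s, d = s + ds * dt, d + dd * dt
    return s, d


s_final, d_final = simulate()
print(f"differentiable: {s_final:.0f}, differentiated: {d_final:.0f}")
```

Because each population suppresses its own production, neither grows without bound; the sizes they settle at are set by the hypothetical constants `k_s`, `k_d`, and `turnover`.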
[0021] In some embodiments, the present invention provides a differentiable cell comprising the foregoing systems, wherein said differentiable cell is cultured in vitro. In further embodiments, the present invention provides a differentiable mammalian cell comprising at least a first cell-cell communication pathway, wherein said first cell-cell communication pathway is encoded by exogenous genes. In some embodiments, the cells further comprise at least a second cell-cell communication pathway. In some embodiments, the cell differentiates into a target differentiated cell. The present invention is not limited to any particular type of cell. Indeed, the use of a variety of cell types is contemplated. In some embodiments, the cell is a pluripotent, multipotent or totipotent cell. The present invention is not limited to the use of any particular cell-cell communication systems. Indeed, the use of a variety of cell-cell communication systems is contemplated. In some embodiments, the at least one cell-cell communication system is a bacterial cell-cell communication system. In other embodiments, the at least one cell-cell communication system is selected from the group consisting of the LuxI, RhlI, and CinI cell-cell communication systems. In some embodiments, the cell further comprises an exogenous cell differentiation pathway. In some embodiments, the first cell-cell communication pathway is a CinI/CinR cell-cell communication pathway. In some embodiments, the CinI/CinR cell-cell communication pathway comprises at least a first gene encoding CinI, a second gene encoding CinR, and a third gene of interest operably linked to a CinR/3OC14HSL inducible promoter. In some embodiments, the second cell-cell communication pathway is a RhlI/RhlR cell-cell communication pathway. 
In some embodiments, the RhlI/RhlR cell-cell communication pathway comprises at least a first gene encoding RhlI, a second gene encoding RhlR, and a third gene of interest operably linked to a RhlR/C4HSL inducible promoter. In some embodiments, the cell comprises a cell differentiation pathway. In some embodiments, the cell differentiation pathway comprises at least one exogenous gene encoding a cell fate regulator. The present invention is not limited to the use of any particular cell fate regulator. Indeed, the use of a variety of cell fate regulators is contemplated, including, but not limited to, Sox17, Gata4, Gata6, Pdx1, Ngn3, Nkx6.1, Nkx2.2, Fgf4, BRA, Wnt9, NCAD, CER, FoxA2, CxcR4, Hnf1B, Hnf4A, Hnf6, HlxB9, Pax4, Cgc, GHRL, SST, PPY, Activin, Fgf10, Cyc, RA, Ex4, DAPT, HGF and Igf1. In some embodiments, the third gene of interest operably linked to a RhlR/C4HSL inducible promoter encodes a protein that inhibits growth in a cell. In further embodiments, the protein that inhibits growth in a cell is Growth Arrest Factor. In some embodiments, the third gene of interest operably linked to a CinR/3OC14HSL inducible promoter encodes a repressor. In some embodiments, the repressor is the lambda repressor. In some embodiments, the first and second cell-cell communication pathways interact to control proliferation and differentiation of said mammalian cell. In some embodiments, the cell differentiation pathway comprises at least one gene encoding a protein that causes said cell to differentiate into a target differentiated cell. In some embodiments, the target differentiated cell is a beta cell. In some embodiments, the second cell-cell communication pathway is activated in said target differentiated cell and comprises at least one gene encoding a protein that inhibits the proliferation of undifferentiated cells comprising said first and second cell-cell communication pathways. 
In some embodiments, the cell differentiation pathway is activated in said target differentiated cell and comprises at least one gene encoding a protein that regulates expression of said at least one gene encoding a protein that causes undifferentiated cells comprising said first and second cell-cell communication pathways to differentiate into a target differentiated cell. In some embodiments, this regulatory gene encodes a repressor. In some embodiments, the repressor is the lambda repressor. In some embodiments, the cell is maintained in vitro.
[0022] The present invention further provides methods of controlling proliferation and differentiation of a differentiable cell comprising: a) providing: i) a differentiable cell; and ii) at least one cell-cell communication pathway; and b) introducing said at least one cell-cell communication pathway into said differentiable cell so that when said at least one cell-cell communication pathway is expressed in said differentiable cell, the proliferation and differentiation of said differentiable cell are controlled. In some embodiments, two cell-cell communication pathways are introduced into said differentiable cell, and said two cell-cell communication pathways interact to control proliferation and differentiation of said differentiable cell and proliferation of target differentiated cells that differentiate from said differentiable cell. In some embodiments, at least one of said two cell-cell communication pathways provides regulatory feedback on differentiation of said differentiable cells. The present invention is not limited to the use of any particular cell-cell communication pathway. Indeed, the use of a variety of cell-cell communication pathways is contemplated. In some embodiments, the at least one cell-cell communication pathway is a bacterial cell-cell communication system. In other embodiments, the at least one cell-cell communication pathway is selected from the group consisting of the LuxI, RhlI, and CinI cell-cell communication systems.
[0023] In some embodiments, the present invention provides methods of treating a subject comprising: a) providing a plurality of mammalian cells as described above; and b) introducing said cells into a subject under conditions such that the proliferation and differentiation of said cells is controlled to provide a source of differentiated target cells in said subject. In some embodiments, the subject is human.
[0024] In some embodiments, the present invention provides a symmetry breaking system for mammalian cells comprising a first vector encoding an activator operably linked to a promoter that is activated by the activator and repressed by a repressor, and a second vector encoding the repressor operably linked to a promoter responsive to said activator. In some embodiments, the present invention provides a population of cells comprising the symmetry breaking system, wherein the activator induces expression of both the activator and the repressor, and the repressor represses expression of the activator, so that at any given time only a portion of the cells within the population have high levels of expression of the activator as compared to the repressor.
[0025] In some embodiments, the present invention provides a cascade system for mammalian cells comprising a first vector encoding a first repressor operably linked to a promoter responsive to a first activator, a second vector encoding a second repressor operably linked to a promoter repressed by said first repressor, a third vector encoding a second activator operably linked to a promoter repressed by said second repressor, and a fourth vector encoding a repressor operably linked to a promoter that is activated by said second activator and repressed by a third repressor. In some embodiments, the present invention provides a population of cells comprising the cascade system.
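The wiring of the cascade can be checked with a Boolean steady-state evaluation. The following Python sketch ignores timing, leakiness, and graded expression (all simplifying assumptions) and treats each promoter as fully on or fully off. The key name `output_repressor` is a hypothetical label for the unnamed repressor encoded by the fourth vector.

```python
def cascade(first_activator: bool, third_repressor: bool) -> dict[str, bool]:
    """Steady-state Boolean evaluation of the four-vector cascade (timing ignored)."""
    first_repressor = first_activator        # promoter responsive to the first activator
    second_repressor = not first_repressor   # promoter repressed by the first repressor
    second_activator = not second_repressor  # promoter repressed by the second repressor
    # Fourth vector: promoter activated by the second activator and
    # repressed by a third repressor.
    output_repressor = second_activator and not third_repressor
    return {
        "first_repressor": first_repressor,
        "second_repressor": second_repressor,
        "second_activator": second_activator,
        "output_repressor": output_repressor,
    }


for a1 in (False, True):
    for r3 in (False, True):
        print(a1, r3, cascade(a1, r3))
```

Because the first and second repressors form a double inversion, the second activator follows the first activator at steady state. The output repressor is then produced only when the first activator is present and the third repressor is absent.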
[0026] In some embodiments, the present invention provides a toggle switch system for mammalian cells comprising a first vector encoding a first repressor operably linked to a promoter repressed by a second repressor and a second vector encoding said second repressor operably linked to a promoter repressed by said first repressor. In some embodiments, the first repressor is TetR and said second repressor is LacI. In some embodiments, the present invention provides cells comprising the toggle switch system.
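The bistability of the mutually repressing pair can be illustrated numerically. This Python sketch uses the standard dimensionless two-repressor model popularized by Gardner, Cantor, and Collins (2000) for their bacterial toggle switch, in which each repressor is made from a promoter modeled as a decreasing Hill function of the other repressor and is diluted at unit rate. The synthesis strength `alpha`, cooperativity `n`, and step size are hypothetical values, not measurements of a mammalian TetR/LacI construct.

```python
def simulate_toggle(u0: float, v0: float, alpha: float = 10.0, n: float = 2.0,
                    dt: float = 0.01, steps: int = 5000) -> tuple[float, float]:
    """Forward-Euler integration of a dimensionless two-repressor toggle.

    u: first repressor (e.g. TetR), made from a promoter repressed by v
    v: second repressor (e.g. LacI), made from a promoter repressed by u
    alpha (maximal synthesis) and n (cooperativity) are hypothetical values.
    """
    u, v = u0, v0
    for _ in range(steps):
        du = alpha / (1.0 + v**n) - u
        dv = alpha / (1.0 + u**n) - v
        u, v = u + du * dt, v + dv * dt
    return u, v


tetr_high = simulate_toggle(5.0, 0.0)  # start with more TetR
laci_high = simulate_toggle(0.0, 5.0)  # start with more LacI
print(tetr_high, laci_high)
```

With these parameters the symmetric state (u = v = 2) is unstable, so a cell that starts with an excess of either repressor settles in the corresponding stable state and remembers it.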
[0027] In some embodiments, the present invention provides a population of uncommitted and committed cells comprising: a) a cell population control module comprising a first cell-cell communication pathway; b) a cell commitment module comprising a symmetry breaking system and a second cell-cell communication system; and c) a cell differentiation module; wherein said cell population control module senses the concentration of uncommitted cells in the population via said first cell-cell communication system, and the cell commitment module senses the concentration of committed cells in the population via said second cell-cell communication system and controls which cells within the population are allowed to commit via said symmetry breaking system, so that when the concentration of uncommitted cells in the population is high, the concentration of committed cells in the population is low, and there are cells that are allowed to commit, said cell differentiation module is activated. In some embodiments, the cell commitment module further comprises a cascade system and a toggle switch system that interact with said symmetry breaking system and said second cell-cell communication system to control the commitment of cells within the population to differentiate. In some embodiments, the cells further comprise an apoptosis module. In some embodiments, the apoptosis module comprises a vector encoding a repressor operably linked to a tissue-specific promoter and a vector encoding an apoptosis gene operably linked to a promoter regulated by said repressor, so that said apoptosis gene is not expressed in a tissue wherein said tissue-specific promoter is active.
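The activation condition for the differentiation module and the logic of the apoptosis module can be written as two small predicates. In this Python sketch the population signals are expressed as hypothetical fractions (0 to 1), the thresholds `uncommitted_high` and `committed_low` are hypothetical, and the stated sufficient condition is treated as the only condition (an assumption). The class and function names are illustrative, not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PopulationSignals:
    uncommitted: float  # sensed via the first cell-cell communication system
    committed: float    # sensed via the second cell-cell communication system


def differentiation_module_on(signals: PopulationSignals, allowed_to_commit: bool,
                              uncommitted_high: float = 0.5,
                              committed_low: float = 0.2) -> bool:
    """Per-cell activation rule; the thresholds are hypothetical."""
    return (signals.uncommitted >= uncommitted_high
            and signals.committed < committed_low
            and allowed_to_commit)  # chosen by the symmetry breaking system


def apoptosis_gene_expressed(tissue_promoter_active: bool) -> bool:
    """The tissue-specific promoter makes a repressor that silences the apoptosis gene."""
    repressor_present = tissue_promoter_active
    return not repressor_present


print(differentiation_module_on(PopulationSignals(uncommitted=0.8, committed=0.1),
                                allowed_to_commit=True))
print(apoptosis_gene_expressed(tissue_promoter_active=False))
```

The apoptosis predicate shows the intended safety behavior: a cell that ends up outside the target tissue, where the tissue-specific promoter is silent, lacks the repressor and therefore expresses the apoptosis gene.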
[0028] In some embodiments, the present invention provides a fusion protein comprising a secretion signal, a cell penetrating polypeptide, and a trans-acting domain in operable combination, wherein at least two of said secretion signal, cell penetrating polypeptide, and trans-acting domain are from different proteins. The present invention is not limited to the use of any particular secretion signal. Indeed, the use of a variety of secretion signals is contemplated, including, but not limited to, IgG, tPA, serum albumin, lactoferrin, and growth hormone secretion signals. The present invention is not limited to the use of any particular cell penetrating polypeptide. Indeed, the use of a variety of cell penetrating polypeptides is contemplated, including, but not limited to, the TAT cell penetrating polypeptide and penetratin. The present invention is not limited to the use of any particular trans-acting domain. Indeed, the use of a variety of trans-acting domains is contemplated, including, but not limited to, zinc finger binding domains. In further embodiments, the present invention provides nucleic acids encoding the fusion proteins. In further embodiments, the present invention provides a vector comprising the nucleic acid encoding the fusion protein. In still further embodiments, the present invention provides a cell comprising the nucleic acid.
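The domain arrangement of such a fusion protein can be sketched at the sequence level. In this illustration the secretion signal is placed at the N-terminus, where secretion signals normally reside; placing the cell penetrating polypeptide before the trans-acting domain, the `GGGGS` linker, and the function name `assemble_fusion` are assumptions. The TAT (HIV-1 Tat residues 47-57) and penetratin sequences are the widely used published peptides, while the secretion signal and trans-acting domain are supplied by the caller; the sequences in the usage example are placeholders, not real domains.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

# Cell penetrating polypeptides named in the text.
CELL_PENETRATING_PEPTIDES = {
    "TAT": "YGRKKRRQRRR",              # HIV-1 Tat residues 47-57
    "penetratin": "RQIKIWFQNRRMKWKK",  # Antennapedia homeodomain, third helix
}


def _check(name: str, seq: str) -> str:
    """Upper-case a protein sequence and reject empty or non-standard input."""
    seq = seq.upper()
    bad = set(seq) - AMINO_ACIDS
    if not seq or bad:
        raise ValueError(f"{name}: empty or invalid residues {sorted(bad)}")
    return seq


def assemble_fusion(secretion_signal: str, cpp_name: str,
                    trans_acting_domain: str, linker: str = "GGGGS") -> str:
    """Join domains N- to C-terminus: signal, CPP, trans-acting domain.

    The order after the signal and the flexible linker are assumptions.
    """
    parts = [
        _check("secretion signal", secretion_signal),
        CELL_PENETRATING_PEPTIDES[cpp_name],
        _check("trans-acting domain", trans_acting_domain),
    ]
    return linker.join(parts)


if __name__ == "__main__":
    # Placeholder sequences for illustration only.
    print(assemble_fusion("MKWVTFISLL", "TAT", "PYKCPECGKSFS"))
```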
[0029] In some embodiments, the present invention provides a cell-cell communication system comprising: a) a first vector comprising a nucleic acid encoding a fusion protein comprising a secretion signal, a cell penetrating polypeptide, and a trans-acting domain in operable combination and b) a second vector comprising a promoter, comprising an element that binds said trans-acting domain, operably linked to a nucleic acid encoding a protein of interest. In further embodiments, the present invention provides a population of cells comprising the system.
DESCRIPTION OF THE FIGURES
[0030] FIG. 1A-1B: Autoregulated, quorum sensing-based two-step differentiation from mES cells to endoderm to β cells. (1a) Circuit diagram of the system. Details are in the text. Gata4/Sox17 activates the α-fetoprotein promoter (pAFP) indirectly by stimulating expression of AFP (dashed line). Likewise, Pdx1 and Ngn3 activate the Mouse Insulin Promoter (MIP) indirectly by stimulating insulin production (dashed lines). AFP production results in green fluorescence and insulin production results in red fluorescence. (1b) Progress of the system from a single cell to a collection of β cells. Because differentiation is gated by quorum sensing, a population of mES cells is necessary for differentiation into endoderm.
[0031] FIG. 2A-2C: (2a) 3OC6HSL synthesis by mammalian 293FT cells. Supernatant from the cells' growth medium was collected and filter-sterilized. Histograms show the fluorescence intensities of 3OC6HSL-sensitive bacterial cells grown in the filtered supernatant; these cells express green fluorescent protein in response to 3OC6HSL produced by the mammalian cells. (2b) 3OC6HSL detection by mammalian receiver cells. 293FT cells express DsRed constitutively, but express EGFP only upon induction with exogenous 3OC6HSL. (2c) Controlled stem cell differentiation with genetically inducible systems. Three different lentiviral genetic constructs were built, enabling exogenous chemical induction of differentiation into myoblasts or adipocytes, or of the maintenance of stem cell properties. Dox-induced expression of MyoD in mES cells caused visible changes to a myoblast morphology. In addition, these cells expressed dystrophin, a marker protein for myoblasts (data not shown). PPARγ expression resulted in an adipocyte morphology. In a separate experiment, induced expression of Nanog supplanted the need for a growth factor usually required to maintain the overall character of mES cells.
[0032] FIG. 3: Diagram of a four module system for cell proliferation and differentiation control.
[0033] FIG. 4A-4B: (4a) shRNAs are transduced into engineered mES cells, and the cells are allowed to differentiate into pancreatic β cells. (4b) Before differentiation, shRNAs will be uniformly present in mES cells. After differentiation, there will be different pools of cell types (e.g., undifferentiated mES cells, endodermal cells, or pancreatic β cells) depending on the effect of the transduced shRNAs. Comparing the resulting shRNA ratios with the shRNA ratios before differentiation will allow identification of genes necessary for differentiation.
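The ratio comparison described for FIG. 4B can be sketched as a log2 fold change of each shRNA's relative abundance between the pre-differentiation population and a differentiated pool. In this sketch, read counts per shRNA are assumed inputs, and the pseudocount, the depletion threshold of -1 (a two-fold drop), and the function names are hypothetical choices. An shRNA that knocks down a gene needed for differentiation would be expected to be depleted from the differentiated pool.

```python
import math


def log2_fold_changes(before, after, pseudocount=1.0):
    """log2(relative abundance after / relative abundance before) per shRNA.

    before, after: dicts mapping shRNA id -> read count. The pseudocount
    avoids division by zero for shRNAs that drop out entirely.
    """
    n = len(before)
    total_before = sum(before.values()) + pseudocount * n
    total_after = sum(after.get(k, 0) for k in before) + pseudocount * n
    changes = {}
    for shrna, count in before.items():
        freq_before = (count + pseudocount) / total_before
        freq_after = (after.get(shrna, 0) + pseudocount) / total_after
        changes[shrna] = math.log2(freq_after / freq_before)
    return changes


def depleted_shrnas(before, after, threshold=-1.0):
    """shRNAs whose relative abundance fell at least two-fold (threshold=-1)."""
    lfc = log2_fold_changes(before, after)
    return sorted(k for k, v in lfc.items() if v <= threshold)


if __name__ == "__main__":
    mes_pool = {"shA": 100, "shB": 100, "shC": 100}
    beta_pool = {"shA": 100, "shB": 100, "shC": 10}
    print(depleted_shrnas(mes_pool, beta_pool))
```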
[0034] FIG. 5A-5D provides the sequence (SEQ ID NO:01) for the LuxI vector-pLV-Hef1a-LuxIm-IRES2-DsRed2.
[0035] FIG. 6 provides a diagram of the vector of FIG. 5A-5D.
[0036] FIG. 7A-7E provides a description and sequence (SEQ ID NO:02) for the LuxR vector-pLV-Hef1a-p65H4LuxRFm-IRES2-DsRed2.
[0037] FIG. 8 provides a diagram of the vector of FIG. 7A-7E.
[0038] FIG. 9A-9E provides a description and sequence (SEQ ID NO:03) for the lux promoter vector for expression of Gata4/Sox17-pLV-minCMVLuxO7-IRES2-EGFP.
[0039] FIG. 10 provides a diagram of the vector of FIG. 9A-9E.
[0040] FIG. 11A-11E provides a description and sequence (SEQ ID NO:04) for the ACP vector-pLV-Hef1a-ACPm-IRES2-DsRed2.
[0041] FIG. 12 provides a diagram of the vector of FIG. 11A-11E.
[0042] FIG. 13A-13F provides a description and sequence (SEQ ID NO:05) for the AAS vector-pLV-Hef1a-AAS-IRES2-EGFP.
[0043] FIG. 14 provides a diagram of the vector of FIG. 13A-13F.
[0044] FIG. 15A-15E provides a description and sequence (SEQ ID NO:06) for the AFP promoter vector PDX1-pLV-AFP-Pdx1-IRES2-DsRed2.
[0045] FIG. 16 provides a diagram of the vector of FIG. 15A-15E.
[0046] FIG. 17A-17E provides a description and sequence (SEQ ID NO:07) for the AFP promoter vector Ngn3-pLV-AFP-Ngn3-IRES2-DsRed2.
[0047] FIG. 18 provides a diagram of the vector of FIG. 17A-17E.
[0048] FIG. 19A-19E provides a description and sequence (SEQ ID NO:08) for the AFP promoter vector for TetRKRAB-pLV-AFP-TetRKRAB-IRES2-DsRed2.
[0049] FIG. 20 provides a diagram of the vector of FIG. 19A-19E.
[0050] FIG. 21A-21E provides a description and sequence (SEQ ID NO:09) for the RhlI vector-pLV-Hef1a-RhlI-IRES2-DsRed2.
[0051] FIG. 22 provides a diagram of the vector of FIG. 21A-21E.
[0052] FIG. 23A-23H provides a diagram and sequence (SEQ ID NO:10) for the vector pLV-MIP-LacIKRAB-IRES2-EGFP.
[0053] FIG. 24A-24G provides a diagram and sequence (SEQ ID NO:11) for the vector pLV-MIP-IRES2-EGFP.
[0054] FIG. 25A-25G provides a diagram and sequence (SEQ ID NO:12) for the vector pLV-TetRKRAB-IRES2-Puro.
[0055] FIG. 26A-26B provides a diagram of a synthetic signaling circuit utilizing cell penetrating polypeptide elements.
[0056] FIG. 27A-27B provides a diagram of a synthetic signaling circuit utilizing cell penetrating polypeptide elements.
[0057] FIG. 28A-28G provides a plasmid and sequence for P_A1_R1/A1.
[0058] FIG. 29A-29G provides a plasmid and sequence for p_tetO/LacI.
[0059] FIG. 30A-30F provides a plasmid and sequence for P_RhlI_R3/A2.
[0060] FIG. 31A-31F provides a plasmid and sequence for P_Cin/CinR.
[0061] FIG. 32A-32F provides a plasmid and sequence for P_Cin/CI.
[0062] FIG. 33A-33F provides a plasmid and sequence for P_Cin_tetO/CinI.
[0063] FIG. 34A-34G provides a plasmid and sequence for P_lacO_R4/TetR.
[0064] FIG. 35A-35F provides a plasmid and sequence for P_A1/R1.
[0065] FIG. 36A-36F provides a plasmid and sequence for P_A1/R2.
[0066] FIG. 37A-37G provides a plasmid and sequence for P_R2/R3.
[0067] FIG. 38A-38F provides a plasmid and sequence for P_A2/CIOR/R4.
DEFINITIONS
[0068] As used herein, the term "quorum sensing" refers to the detection of the density of a cell type by means of cell-cell communications. In some embodiments, "quorum sensing" is associated with an action upon achievement of a certain cell density.
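One simple way to see how a density threshold arises: if each cell secretes an autoinducer at a constant rate and the autoinducer decays by first-order kinetics, the steady-state autoinducer concentration is proportional to cell density, so a response threshold on concentration becomes a threshold on cell density. The sketch below assumes that well-mixed model; the rate constants, units, and function names are hypothetical.

```python
def steady_state_autoinducer(cell_density, synthesis_rate, decay_rate):
    """Steady state of dA/dt = synthesis_rate * N - decay_rate * A."""
    return synthesis_rate * cell_density / decay_rate


def quorum_density(threshold, synthesis_rate, decay_rate):
    """Cell density at which the steady-state autoinducer level reaches threshold."""
    return threshold * decay_rate / synthesis_rate


def quorum_reached(cell_density, threshold, synthesis_rate, decay_rate):
    """True if the population is dense enough to trigger the response."""
    level = steady_state_autoinducer(cell_density, synthesis_rate, decay_rate)
    return level >= threshold


if __name__ == "__main__":
    # Hypothetical parameters: synthesis 1e-3 per cell per hour, decay 0.1 per hour.
    print(quorum_density(1.0, 1e-3, 0.1))       # density needed for a response
    print(quorum_reached(50, 1.0, 1e-3, 0.1))   # below quorum
    print(quorum_reached(200, 1.0, 1e-3, 0.1))  # above quorum
```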
[0069] As used herein, the term "quorum sensing system" refers to a system of genes that provides components for quorum sensing.
[0070] As used herein, the term "totipotent" means the ability of a cell to differentiate into any type of cell in a differentiated organism.
[0071] As used herein, the term "pluripotent" refers to a cell line capable of differentiating into several differentiated cell types.
[0072] As used herein, the term "multipotent" refers to a cell line capable of differentiating into at least two differentiated cell types.
[0073] As used herein, the term "host cell" refers to any eukaryotic cell (e.g., mammalian cells, avian cells, amphibian cells, plant cells, fish cells, and insect cells), whether located in vitro or in vivo.
[0074] As used herein, the term "stem cells" means cells that are totipotent or pluripotent and are capable of differentiating into one or more different cell types.
[0075] As used herein, the term "embryonic stem cells" means stem cells derived from an embryo.
[0076] As used herein, the term "adult stem cells" means stem cells derived from an organism after birth.
[0077] As used herein, the term "mesodermal cell line" means a cell line displaying characteristics associated with mesodermal cells.
[0078] As used herein, the term "endodermal cell line" means a cell line displaying characteristics normally associated with endodermal cells.
[0079] As used herein, the term "neural cell line" means a cell line displaying characteristics normally associated with neural cells. Examples of such characteristics include, but are not limited to, expression of GFAP, neuron-specific enolase, Neu-N, neurofilament-N, or tau.
[0080] As used herein the term "differentiable cell" refers to a cell that can differentiate into another cell type and includes multipotent, pluripotent and totipotent cells.
[0081] As used herein, the term "target differentiated cell" refers to a predetermined cell type that differentiates from a differentiable cell.
[0082] As used herein, the term "differentiating cell system" refers to a population of differentiable cells and target differentiated cells.
[0083] As used herein, the term "tissue homeostasis" refers to a steady state achieved between a population of differentiable cells and target differentiated cells in a differentiating cell system wherein cell-cell communication between the differentiable cells and the target differentiated cells controls the number of differentiable cells and target differentiated cells in the system.
[0084] As used herein, the term "proliferation" as used with respect to cells in a differentiating cell system refers to the production of cells of like type from a given cell population, such as the production of additional differentiable cells from a population of differentiable cells.
[0085] As used herein, the term "differentiation" as used with respect to cells in a differentiating cell system refers to the process by which cells differentiate from one cell type (e.g., a multipotent, totipotent or pluripotent differentiable cell) to another cell type such as a target differentiated cell (e.g., a beta cell).
[0086] The term "cell-cell communication pathway" refers to a network of two or more genes encoding proteins (e.g., LuxR, CinR, RhlR) and proteins that synthesize products (e.g., 3OC6HSL, C14HSL, C4HSL) involved in cell-cell communication. Exemplary cell-cell communication pathways include, but are not limited to, the LuxI/LuxR, CinI/CinR, and RhlI/RhlR pathways. As a further example, the LuxI/LuxR cell-cell communication pathway can include the LuxI and LuxR genes. Additionally, the LuxI/LuxR cell-cell communication pathway can include a gene of interest operably linked to a promoter induced by LuxR/3OC6HSL. The CinI/CinR cell-cell communication pathway can include the CinI and CinR genes. Additionally, the CinI/CinR cell-cell communication pathway can include a gene of interest operably linked to a promoter induced by CinR/C14HSL. The RhlI/RhlR cell-cell communication pathway can include the RhlI and RhlR genes. Additionally, the RhlI/RhlR cell-cell communication pathway can include a gene of interest operably linked to a promoter induced by RhlR/C4HSL.
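The three pathways listed above can be represented as a small lookup table pairing each synthase with the signal it produces and the receptor that, when bound to that signal, induces the corresponding promoter. The sketch assumes ideal orthogonality (each receptor responds only to its own signal), which is a simplification; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumPathway:
    synthase: str  # enzyme that makes the signal (e.g., LuxI)
    signal: str    # small-molecule autoinducer (e.g., 3OC6HSL)
    receptor: str  # regulator that binds the signal (e.g., LuxR)


PATHWAYS = {
    "Lux": QuorumPathway("LuxI", "3OC6HSL", "LuxR"),
    "Cin": QuorumPathway("CinI", "C14HSL", "CinR"),
    "Rhl": QuorumPathway("RhlI", "C4HSL", "RhlR"),
}


def signals_made(synthases_expressed):
    """Signals produced by a cell expressing the given synthase genes."""
    return {p.signal for p in PATHWAYS.values() if p.synthase in synthases_expressed}


def promoter_induced(pathway, receptors_expressed, signals_present):
    """A gene of interest on this pathway's promoter is on only when the
    receptor is expressed and its cognate signal is present (no crosstalk)."""
    return pathway.receptor in receptors_expressed and pathway.signal in signals_present
```

For example, a sender cell expressing LuxI produces 3OC6HSL, which induces the Lux promoter in a receiver expressing LuxR but, under the orthogonality assumption, not the Rhl promoter.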
[0087] As used herein, the term "cell culture" refers to any in vitro culture of cells. Included within this term are continuous cell lines (e.g., with an immortal phenotype), primary cell cultures, finite cell lines (e.g., non-transformed cells), and any other cell population maintained in vitro, including oocytes and embryos.
[0088] As used herein, the term "vector" refers to any genetic element, such as a plasmid, phage, transposon, cosmid, chromosome, virus, virion, etc., which is capable of replication when associated with the proper control elements and which can transfer gene sequences between cells. Thus, the term includes cloning and expression vehicles, as well as viral vectors.
[0089] As used herein, the term "multiplicity of infection" or "MOI" refers to the ratio of integrating vectors:host cells used during transfection or transduction of host cells. For example, if 100,000 vectors are used to transduce 100,000 host cells, the multiplicity of infection is 1. The use of this term is not limited to events involving transduction, but instead encompasses introduction of a vector into a host by methods such as lipofection, microinjection, calcium phosphate precipitation, and electroporation.
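The MOI definition above is a simple ratio. A common further assumption, not stated in the text, is that vectors are distributed among cells at random, so the number of integrations per cell follows a Poisson distribution with mean equal to the MOI. The sketch below uses that assumption to estimate the fraction of cells receiving at least one vector; the function names are hypothetical.

```python
import math


def multiplicity_of_infection(n_vectors, n_cells):
    """Ratio of integrating vectors to host cells."""
    return n_vectors / n_cells


def poisson_fraction(moi, k):
    """Fraction of cells expected to receive exactly k vectors (Poisson assumption)."""
    return math.exp(-moi) * moi ** k / math.factorial(k)


def fraction_transduced(moi):
    """Fraction of cells expected to receive at least one vector."""
    return 1.0 - poisson_fraction(moi, 0)


if __name__ == "__main__":
    moi = multiplicity_of_infection(100_000, 100_000)  # the example in the text
    print(moi)                                 # 1.0
    print(round(fraction_transduced(moi), 3))  # 0.632
```

Under this assumption an MOI of 1 leaves roughly a third of the cells without any vector, which is why higher MOIs are often used when near-complete transduction is needed.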
[0090] As used herein, the term "genome" refers to the genetic material (e.g., chromosomes) of an organism.
[0091] The term "nucleotide sequence of interest" refers to any nucleotide sequence (e.g., RNA or DNA), the manipulation of which may be deemed desirable for any reason (e.g., treat disease, confer improved qualities, expression of a protein of interest in a host cell, expression of a ribozyme, etc.), by one of ordinary skill in the art. Such nucleotide sequences include, but are not limited to, coding sequences of structural genes (e.g., reporter genes, selection marker genes, oncogenes, drug resistance genes, growth factors, etc.), and non-coding regulatory sequences which do not encode an mRNA or protein product (e.g., promoter sequence, polyadenylation sequence, termination sequence, enhancer sequence, etc.).
[0092] As used herein, the term "protein of interest" refers to a protein encoded by a nucleic acid of interest.
[0093] As used herein, the term "exogenous gene" refers to a gene that is not naturally present in a host organism or cell, or is artificially introduced into a host organism or cell.
[0094] The term "gene" refers to a nucleic acid (e.g., DNA or RNA) sequence that comprises coding sequences necessary for the production of a polypeptide or precursor (e.g., proinsulin). The polypeptide can be encoded by a full length coding sequence or by any portion of the coding sequence so long as the desired activity or functional properties (e.g., enzymatic activity, ligand binding, signal transduction, etc.) of the full-length or fragment are retained. The term also encompasses the coding region of a structural gene and includes sequences located adjacent to the coding region on both the 5' and 3' ends for a distance of about 1 kb or more on either end such that the gene corresponds to the length of the full-length mRNA. The sequences that are located 5' of the coding region and which are present on the mRNA are referred to as 5' untranslated sequences. The sequences that are located 3' or downstream of the coding region and which are present on the mRNA are referred to as 3' untranslated sequences. The term "gene" encompasses both cDNA and genomic forms of a gene. A genomic form or clone of a gene contains the coding region interrupted with non-coding sequences termed "introns" or "intervening regions" or "intervening sequences." Introns are segments of a gene that are transcribed into nuclear RNA (hnRNA); introns may contain regulatory elements such as enhancers. Introns are removed or "spliced out" from the nuclear or primary transcript; introns therefore are absent in the messenger RNA (mRNA) transcript. The mRNA functions during translation to specify the sequence or order of amino acids in a nascent polypeptide.
[0095] As used herein, the term "gene expression" refers to the process of converting genetic information encoded in a gene into RNA (e.g., mRNA, rRNA, tRNA, or snRNA) through "transcription" of the gene (i.e., via the enzymatic action of an RNA polymerase), and for protein encoding genes, into protein through "translation" of mRNA. Gene expression can be regulated at many stages in the process. "Up-regulation" or "activation" refers to regulation that increases the production of gene expression products (i.e., RNA or protein), while "down-regulation" or "repression" refers to regulation that decreases production. Molecules (e.g., transcription factors) that are involved in up-regulation or down-regulation are often called "activators" and "repressors," respectively.
[0096] Where "amino acid sequence" is recited herein to refer to an amino acid sequence of a naturally occurring protein molecule, "amino acid sequence" and like terms, such as "polypeptide" or "protein" are not meant to limit the amino acid sequence to the complete, native amino acid sequence associated with the recited protein molecule.
[0097] As used herein, the terms "nucleic acid molecule encoding," "DNA sequence encoding," "DNA encoding," "RNA sequence encoding," and "RNA encoding" refer to the order or sequence of deoxyribonucleotides or ribonucleotides along a strand of deoxyribonucleic acid or ribonucleic acid. The order of these deoxyribonucleotides or ribonucleotides determines the order of amino acids along the polypeptide (protein) chain. The DNA or RNA sequence thus codes for the amino acid sequence.
[0098] As used herein, the terms "complementary" or "complementarity" are used in reference to polynucleotides (i.e., a sequence of nucleotides) related by the base-pairing rules. For example, the sequence "5'-A-G-T-3'," is complementary to the sequence "3'-T-C-A-5'." Complementarity may be "partial," in which only some of the nucleic acids' bases are matched according to the base pairing rules. Or, there may be "complete" or "total" complementarity between the nucleic acids. The degree of complementarity between nucleic acid strands has significant effects on the efficiency and strength of hybridization between nucleic acid strands. This is of particular importance in amplification reactions, as well as detection methods that depend upon binding between nucleic acids.
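The base-pairing rules and the partial versus complete distinction can be expressed directly in code. This sketch assumes DNA strands (T rather than U) and two strands already aligned position by position; the function names are hypothetical. With it, "AGT" written 5' to 3' complements to "TCA", which reads 3' to 5' on the partner strand, matching the example above.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Base-by-base complement, written in the same left-to-right order.

    "AGT" (read 5'->3') gives "TCA", which reads 3'->5' on the partner strand.
    """
    return "".join(PAIR[b] for b in seq.upper())


def reverse_complement(seq):
    """Partner strand written in the conventional 5'->3' direction."""
    return complement(seq)[::-1]


def fraction_complementary(top_5to3, bottom_3to5):
    """Fraction of aligned positions that obey the base-pairing rules.

    1.0 means complete complementarity; values between 0 and 1 are partial.
    """
    if not top_5to3 or len(top_5to3) != len(bottom_3to5):
        raise ValueError("strands must be non-empty and the same length")
    pairs = sum(PAIR[a] == b for a, b in zip(top_5to3.upper(), bottom_3to5.upper()))
    return pairs / len(top_5to3)
```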
[0099] The terms "homology" and "percent identity" when used in relation to nucleic acids refer to a degree of complementarity. There may be partial homology (i.e., partial identity) or complete homology (i.e., complete identity). A partially complementary sequence is one that at least partially inhibits a completely complementary sequence from hybridizing to a target nucleic acid sequence and is referred to using the functional term "substantially homologous." The inhibition of hybridization of the completely complementary sequence to the target sequence may be examined using a hybridization assay (Southern or Northern blot, solution hybridization and the like) under conditions of low stringency. A substantially homologous sequence or probe (i.e., an oligonucleotide which is capable of hybridizing to another oligonucleotide of interest) will compete for and inhibit the binding (i.e., the hybridization) of a completely homologous sequence to a target sequence under conditions of low stringency. This is not to say that conditions of low stringency are such that non-specific binding is permitted; low stringency conditions require that the binding of two sequences to one another be a specific (i.e., selective) interaction. The absence of non-specific binding may be tested by the use of a second target which lacks even a partial degree of complementarity (e.g., less than about 30% identity); in the absence of non-specific binding the probe will not hybridize to the second non-complementary target.
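Percent identity between two sequences that have already been aligned can be computed column by column. The sketch below assumes a pre-computed alignment (it does not perform alignment) and one particular gap convention, both of which are assumptions; the function name is hypothetical. It could be used, for example, to check whether a control target falls below the roughly 30% identity mentioned above.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity of two pre-aligned sequences.

    Columns where both sequences have a gap are ignored; a gap opposite a
    base counts as a mismatch. This is one common convention among several.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no aligned columns")
    matches = sum(a == b and a != gap for a, b in columns)
    return 100.0 * matches / len(columns)
```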
[0100] The art knows well that numerous equivalent conditions may be employed to comprise low stringency conditions; factors such as the length and nature (DNA, RNA, base composition) of the probe and nature of the target (DNA, RNA, base composition, present in solution or immobilized, etc.) and the concentration of the salts and other components (e.g., the presence or absence of formamide, dextran sulfate, polyethylene glycol) are considered and the hybridization solution may be varied to generate conditions of low stringency hybridization different from, but equivalent to, the above listed conditions. In addition, the art knows conditions that promote hybridization under conditions of high stringency (e.g., increasing the temperature of the hybridization and/or wash steps, the use of formamide in the hybridization solution, etc.).
[0101] When used in reference to a double-stranded nucleic acid sequence such as a cDNA or genomic clone, the term "substantially homologous" refers to any probe that can hybridize to either or both strands of the double-stranded nucleic acid sequence under conditions of low stringency as described above.
[0102] When used in reference to a single-stranded nucleic acid sequence, the term "substantially homologous" refers to any probe that can hybridize to (i.e., is the complement of) the single-stranded nucleic acid sequence under conditions of low stringency as described above.
[0103] As used herein, the term "hybridization" is used in reference to the pairing of complementary nucleic acids. Hybridization and the strength of hybridization (i.e., the strength of the association between the nucleic acids) are influenced by such factors as the degree of complementarity between the nucleic acids, the stringency of the conditions involved, the Tm of the formed hybrid, and the G:C ratio within the nucleic acids. A single molecule that contains pairing of complementary nucleic acids within its structure is said to be "self-hybridized."
[0104] As used herein, the term "Tm" is used in reference to the "melting temperature" of a nucleic acid. The melting temperature is the temperature at which a population of double-stranded nucleic acid molecules becomes half dissociated into single strands. The equation for calculating the Tm of nucleic acids is well known in the art. As indicated by standard references, a simple estimate of the Tm value may be calculated by the equation Tm = 81.5 + 0.41(% G+C), when a nucleic acid is in aqueous solution at 1 M NaCl (see, e.g., Anderson and Young, Quantitative Filter Hybridization, in Nucleic Acid Hybridization [1985]). Other references include more sophisticated computations that take structural as well as sequence characteristics into account for the calculation of Tm.
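The simple estimate above translates directly into code. This sketch implements only that equation (function names are hypothetical); it ignores duplex length, mismatches, and salt or solvent conditions other than 1 M NaCl, so it is at best a rough guide for long molecules, and the more sophisticated computations mentioned above are needed for short oligonucleotides.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def tm_simple(seq):
    """Rough Tm estimate in degrees C: Tm = 81.5 + 0.41 * (%G+C), at 1 M NaCl."""
    return 81.5 + 0.41 * gc_percent(seq)
```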
[0105] As used herein the term "stringency" is used in reference to the conditions of temperature, ionic strength, and the presence of other compounds such as organic solvents, under which nucleic acid hybridizations are conducted. With "high stringency" conditions, nucleic acid base pairing will occur only between nucleic acid fragments that have a high frequency of complementary base sequences. Thus, conditions of "weak" or "low" stringency are often required with nucleic acids that are derived from organisms that are genetically diverse, as the frequency of complementary sequences is usually less.
[0106] "High stringency conditions" when used in reference to nucleic acid hybridization comprise conditions equivalent to binding or hybridization at 42°C in a solution consisting of 5× SSPE (43.8 g/l NaCl, 6.9 g/l NaH₂PO₄·H₂O and 1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.5% SDS, 5× Denhardt's reagent and 100 µg/ml denatured salmon sperm DNA followed by washing in a solution comprising 0.1× SSPE, 1.0% SDS at 42°C when a probe of about 500 nucleotides in length is employed.
[0107] "Medium stringency conditions" when used in reference to nucleic acid hybridization comprise conditions equivalent to binding or hybridization at 42°C in a solution consisting of 5× SSPE (43.8 g/l NaCl, 6.9 g/l NaH₂PO₄·H₂O and 1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.5% SDS, 5× Denhardt's reagent and 100 µg/ml denatured salmon sperm DNA followed by washing in a solution comprising 1.0× SSPE, 1.0% SDS at 42°C when a probe of about 500 nucleotides in length is employed.
[0108] "Low stringency conditions" comprise conditions equivalent to binding or hybridization at 42°C in a solution consisting of 5× SSPE (43.8 g/l NaCl, 6.9 g/l NaH₂PO₄·H₂O and 1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.1% SDS, 5× Denhardt's reagent [50× Denhardt's contains per 500 ml: 5 g Ficoll (Type 400, Pharmacia), 5 g BSA (Fraction V; Sigma)] and 100 µg/ml denatured salmon sperm DNA followed by washing in a solution comprising 5× SSPE, 0.1% SDS at 42°C when a probe of about 500 nucleotides in length is employed.
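The SSPE recipes above can be sanity-checked by converting grams per liter to molar concentrations and scaling to the wash strengths. The sketch assumes the EDTA is disodium EDTA dihydrate (the salt form is not stated in the text) and uses standard formula weights; the dictionary and function names are hypothetical. Under these assumptions, 5× SSPE works out to roughly 0.75 M NaCl, 50 mM sodium phosphate, and 5 mM EDTA, and the high, medium, and low stringency washes differ mainly in their SSPE (and hence salt) concentration.

```python
# Formula weights in g/mol; the EDTA salt form is an assumption.
FORMULA_WEIGHT = {"NaCl": 58.44, "NaH2PO4.H2O": 137.99, "Na2EDTA.2H2O": 372.24}

# Grams per liter in 5x SSPE, as given in the text.
SSPE_5X_G_PER_L = {"NaCl": 43.8, "NaH2PO4.H2O": 6.9, "Na2EDTA.2H2O": 1.85}

# SSPE strength of the post-hybridization wash for each stringency level.
WASH_STRENGTH = {"high": 0.1, "medium": 1.0, "low": 5.0}


def sspe_molarity(strength):
    """Molar concentration of each component in strength-x SSPE."""
    scale = strength / 5.0
    return {name: grams / FORMULA_WEIGHT[name] * scale
            for name, grams in SSPE_5X_G_PER_L.items()}


if __name__ == "__main__":
    for level, strength in WASH_STRENGTH.items():
        nacl_mm = 1000 * sspe_molarity(strength)["NaCl"]
        print(f"{level} stringency wash ({strength}x SSPE): {nacl_mm:.1f} mM NaCl")
```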
[0109] The terms "in operable combination," "in operable order," and "operably linked" as used herein refer to the linkage of nucleic acid sequences in such a manner that a nucleic acid molecule capable of directing the transcription of a given gene and/or the synthesis of a desired protein molecule is produced. The term also refers to the linkage of amino acid sequences in such a manner that a functional protein is produced.
[0110] As used herein, the term "selectable marker" refers to a gene that encodes an enzymatic activity that confers the ability to grow in medium lacking what would otherwise be an essential nutrient (e.g., the HIS3 gene in yeast cells); in addition, a selectable marker may confer resistance to an antibiotic or drug upon the cell in which the selectable marker is expressed. Selectable markers may be "dominant"; a dominant selectable marker encodes an enzymatic activity that can be detected in any eukaryotic cell line. Examples of dominant selectable markers include the bacterial aminoglycoside 3' phosphotransferase gene (also referred to as the neo gene) that confers resistance to the drug G418 in mammalian cells, the bacterial hygromycin G phosphotransferase (hyg) gene that confers resistance to the antibiotic hygromycin, and the bacterial xanthine-guanine phosphoribosyl transferase gene (also referred to as the gpt gene) that confers the ability to grow in the presence of mycophenolic acid. Other selectable markers are not dominant in that their use must be in conjunction with a cell line that lacks the relevant enzyme activity. Examples of non-dominant selectable markers include the thymidine kinase (tk) gene that is used in conjunction with tk⁻ cell lines, the CAD gene, which is used in conjunction with CAD-deficient cells, and the mammalian hypoxanthine-guanine phosphoribosyl transferase (hprt) gene, which is used in conjunction with hprt⁻ cell lines. A review of the use of selectable markers in mammalian cell lines is provided in Sambrook, J. et al., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, New York (1989) pp. 16.9-16.15.
[0111] As used herein, the term "regulatory element" refers to a genetic element that controls some aspect of the expression of nucleic acid sequences. For example, a promoter is a regulatory element that facilitates the initiation of transcription of an operably linked coding region. Other regulatory elements are splicing signals, polyadenylation signals, termination signals, RNA export elements, internal ribosome entry sites, etc. (defined infra).
[0112] Transcriptional control signals in eukaryotes comprise "promoter" and "enhancer" elements. Promoters and enhancers consist of short arrays of DNA sequences that interact specifically with cellular proteins involved in transcription (Maniatis et al., Science 236:1237 [1987]). Promoter and enhancer elements have been isolated from a variety of eukaryotic sources including genes in yeast, insect and mammalian cells, and viruses (analogous control elements, i.e., promoters, are also found in prokaryotes). The selection of a particular promoter and enhancer depends on what cell type is to be used to express the protein of interest. Some eukaryotic promoters and enhancers have a broad host range while others are functional in a limited subset of cell types (for review see, Voss et al., Trends Biochem. Sci., 11:287 [1986]; and Maniatis et al., supra). For example, the SV40 early gene enhancer is very active in a wide variety of cell types from many mammalian species and has been widely used for the expression of proteins in mammalian cells (Dijkema et al., EMBO J. 4:761 [1985]). Other examples of promoter/enhancer elements active in a broad range of mammalian cell types are those from the human elongation factor 1α gene (Uetsuki et al., J. Biol. Chem., 264:5791 [1989]; Kim et al., Gene 91:217 [1990]; and Mizushima and Nagata, Nuc. Acids. Res., 18:5322 [1990]), the long terminal repeats of the Rous sarcoma virus (Gorman et al., Proc. Natl. Acad. Sci. USA 79:6777 [1982]), and those from the human cytomegalovirus (Boshart et al., Cell 41:521 [1985]).
[0113] As used herein, the term "promoter/enhancer" denotes a segment of DNA which contains sequences capable of providing both promoter and enhancer functions (i.e., the functions provided by a promoter element and an enhancer element, see above for a discussion of these functions). For example, the long terminal repeats of retroviruses contain both promoter and enhancer functions. The enhancer/promoter may be "endogenous" or "exogenous" or "heterologous." An "endogenous" enhancer/promoter is one that is naturally linked with a given gene in the genome. An "exogenous" or "heterologous" enhancer/promoter is one that is placed in juxtaposition to a gene by means of genetic manipulation (i.e., molecular biological techniques such as cloning and recombination) such that transcription of that gene is directed by the linked enhancer/promoter.
[0114] Regulatory elements may be tissue specific or cell specific. The term "tissue specific" as it applies to a regulatory element refers to a regulatory element that is capable of directing selective expression of a nucleotide sequence of interest to a specific type of tissue (e.g., liver) in the relative absence of expression of the same nucleotide sequence of interest in a different type of tissue (e.g., lung).
[0115] Tissue specificity of a regulatory element may be evaluated by, for example, operably linking a reporter gene to a promoter sequence (which is not tissue-specific) and to the regulatory element to generate a reporter construct, introducing the reporter construct into the genome of an animal such that the reporter construct is integrated into every tissue of the resulting transgenic animal, and detecting the expression of the reporter gene (e.g., detecting mRNA, protein, or the activity of a protein encoded by the reporter gene) in different tissues of the transgenic animal. The detection of a greater level of expression of the reporter gene in one or more tissues relative to the level of expression of the reporter gene in other tissues shows that the regulatory element is "specific" for the tissues in which greater levels of expression are detected. Thus, the term "tissue-specific" (e.g., liver-specific) as used herein is a relative term that does not require absolute specificity of expression. In other words, the term "tissue-specific" does not require that one tissue have extremely high levels of expression and another tissue have no expression. It is sufficient that expression is greater in one tissue than another. By contrast, "strict" or "absolute" tissue-specific expression is meant to indicate expression in a single tissue type (e.g., liver) with no detectable expression in other tissues.
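The distinction between relative and strict tissue specificity can be sketched as a classification of reporter levels measured across tissues. Requiring the target tissue to exceed every other tissue is one possible reading of the relative definition above; the detection limit, the return labels, and the function name are assumptions.

```python
def classify_tissue_specificity(expression, tissue, detection_limit=0.0):
    """Classify a regulatory element from reporter levels per tissue.

    expression: dict mapping tissue name -> reporter level in a transgenic animal.
    Returns "strict" if reporter expression is detected only in `tissue`,
    "tissue-specific" if it is higher in `tissue` than in every other tissue,
    and "not specific" otherwise.
    """
    target = expression[tissue]
    others = [level for name, level in expression.items() if name != tissue]
    if target <= detection_limit:
        return "not specific"
    if all(level <= detection_limit for level in others):
        return "strict"
    if all(target > level for level in others):
        return "tissue-specific"
    return "not specific"
```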
[0116] The term "cell type specific" as applied to a regulatory element refers to a regulatory element that is capable of directing selective expression of a nucleotide sequence of interest in a specific type of cell in the relative absence of expression of the same nucleotide sequence of interest in a different type of cell within the same tissue. The term "cell type specific" when applied to a regulatory element also means a regulatory element capable of promoting selective expression of a nucleotide sequence of interest in a region within a single tissue.
[0117] Cell type specificity of a regulatory element may be assessed using methods well known in the art (e.g., immunohistochemical staining and/or Northern blot analysis). Briefly, for immunohistochemical staining, tissue sections are embedded in paraffin, and paraffin sections are reacted with a primary antibody specific for the polypeptide product encoded by the nucleotide sequence of interest whose expression is regulated by the regulatory element. A labeled (e.g., peroxidase conjugated) secondary antibody specific for the primary antibody is allowed to bind to the sectioned tissue and specific binding detected (e.g., with avidin/biotin) by microscopy. Briefly, for Northern blot analysis, RNA is isolated from cells and electrophoresed on agarose gels to fractionate the RNA according to size followed by transfer of the RNA from the gel to a solid support (e.g., nitrocellulose or a nylon membrane). The immobilized RNA is then probed with a labeled oligo-deoxyribonucleotide probe or DNA probe to detect RNA species complementary to the probe used. Northern blots are a standard tool of molecular biologists.
[0118] The term "promoter," "promoter element," or "promoter sequence" as used herein, refers to a DNA sequence which when ligated to a nucleotide sequence of interest is capable of controlling the transcription of the nucleotide sequence of interest into mRNA. A promoter is typically, though not necessarily, located 5' (i.e., upstream) of a nucleotide sequence of interest whose transcription into mRNA it controls, and provides a site for specific binding by RNA polymerase and other transcription factors for initiation of transcription.
[0119] Promoters may be constitutive or regulatable. The term "constitutive" when made in reference to a promoter means that the promoter is capable of directing transcription of an operably linked nucleic acid sequence in the absence of a stimulus (e.g., heat shock, chemicals, etc.). In contrast, a "regulatable" promoter is one that is capable of directing a level of transcription of an operably linked nucleic acid sequence in the presence of a stimulus (e.g., heat shock, chemicals, etc.), which is different from the level of transcription of the operably linked nucleic acid sequence in the absence of the stimulus.
[0120] As used herein, the term "nucleic acid binding protein" refers to proteins that bind to nucleic acid, and in particular to proteins that cause increased (i.e., activators or transcription factors) or decreased (i.e., inhibitors, repressors) transcription from a gene.
[0121] The presence of "splicing signals" on an expression vector often results in higher levels of expression of the recombinant transcript. Splicing signals mediate the removal of introns from the primary RNA transcript and consist of a splice donor and acceptor site (Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, New York [1989], pp. 16.7-16.8). A commonly used splice donor and acceptor site is the splice junction from the 16S RNA of SV40.
[0122] Efficient expression of recombinant DNA sequences in eukaryotic cells requires expression of signals directing the efficient termination and polyadenylation of the resulting transcript. Transcription termination signals are generally found downstream of the polyadenylation signal and are a few hundred nucleotides in length. The term "poly A site" or "poly A sequence" as used herein denotes a DNA sequence that directs both the termination and polyadenylation of the nascent RNA transcript. Efficient polyadenylation of the recombinant transcript is desirable as transcripts lacking a poly A tail are unstable and are rapidly degraded. The poly A signal utilized in an expression vector may be "heterologous" or "endogenous." An endogenous poly A signal is one that is found naturally at the 3' end of the coding region of a given gene in the genome. A heterologous poly A signal is one that is isolated from one gene and placed 3' of another gene. A commonly used heterologous poly A signal is the SV40 poly A signal. The SV40 poly A signal is contained on a 237 bp BamHI/BclI restriction fragment and directs both termination and polyadenylation (Sambrook, supra, at 16.6-16.7).
[0123] Eukaryotic expression vectors may also contain "viral replicons" or "viral origins of replication." Viral replicons are viral DNA sequences that allow for the extrachromosomal replication of a vector in a host cell expressing the appropriate replication factors. Vectors that contain either the SV40 or polyoma virus origin of replication replicate to high "copy number" (up to 10^4 copies/cell) in cells that express the appropriate viral T antigen. Vectors that contain the replicons from bovine papillomavirus or Epstein-Barr virus replicate extrachromosomally at "low copy number" (~100 copies/cell). However, it is not intended that expression vectors be limited to any particular viral origin of replication.
[0124] As used herein, the term "long terminal repeat" or "LTR" refers to transcriptional control elements located in or isolated from the U3 region 5' and 3' of a retroviral genome. As is known in the art, long terminal repeats may be used as control elements in retroviral vectors, or isolated from the retroviral genome and used to control expression from other types of vectors.
[0125] As used herein, the terms "RNA export element" or "Pre-mRNA Processing Enhancer (PPE)" refer to 3' and 5' cis-acting post-transcriptional regulatory elements that enhance export of RNA from the nucleus. "PPE" elements include, but are not limited to, Mertz sequences (described in U.S. Pat. Nos. 5,914,267 and 5,686,120, both of which are incorporated herein by reference) and woodchuck mRNA processing enhancer (WPRE; WO99/14310 and U.S. Pat. No. 6,136,597, each of which is incorporated herein by reference).
[0126] As used herein, the term "polycistronic" refers to an mRNA encoding more than one polypeptide chain (See, e.g., WO 93/03143, WO 88/05486, and European Pat. No. 117058, all of which are incorporated herein by reference). Likewise, the term "arranged in polycistronic sequence" refers to the arrangement of genes encoding two different polypeptide chains in a single mRNA.
[0127] As used herein, the term "internal ribosome entry site" or "IRES" refers to a sequence located between polycistronic genes that permits the production of the expression product originating from the second gene by internal initiation of the translation of the dicistronic mRNA. Examples of internal ribosome entry sites include, but are not limited to, those derived from foot and mouth disease virus (FDV), encephalomyocarditis virus, poliovirus and RDV (Scheper et al., Biochem. 76: 801-809 [1994]; Meyer et al., J. Virol. 69: 2819-2824 [1995]; Jang et al., J. Virol. 62: 2636-2643 [1988]; Haller et al., J. Virol. 66: 5075-5086 [1995]). Vectors incorporating IRESs may be assembled as is known in the art. For example, a retroviral vector containing a polycistronic sequence may contain the following elements in operable association: nucleotide polylinker, gene of interest, an internal ribosome entry site and a mammalian selectable marker or another gene of interest. The polycistronic cassette is situated within the retroviral vector between the 5' LTR and the 3' LTR at a position such that transcription from the 5' LTR promoter transcribes the polycistronic message cassette. The transcription of the polycistronic message cassette may also be driven by an internal promoter (e.g., cytomegalovirus promoter) or an inducible promoter, which may be preferable depending on the use. The polycistronic message cassette can further comprise a cDNA or genomic DNA (gDNA) sequence operatively associated within the polylinker. Any mammalian selectable marker can be utilized as the polycistronic message cassette mammalian selectable marker. Such mammalian selectable markers are well known to those of skill in the art and can include, but are not limited to, kanamycin/G418, hygromycin B or mycophenolic acid resistance markers.
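The element order described above for a retroviral polycistronic vector can be captured as a simple layout check. The sketch below is a hypothetical design-time helper, not a sequence-level tool: each element is a `(kind, label)` pair with illustrative kinds (`ltr5`, `promoter`, `polylinker`, `gene`, `marker`, `ires`, `ltr3`), and it assumes the layout 5' LTR, optional internal promoter, polylinker, then coding units separated by IRES elements, then 3' LTR.

```python
from typing import List, Tuple

# (kind, label); the kind names are hypothetical labels chosen for this sketch.
Element = Tuple[str, str]
CODING_KINDS = {"gene", "marker"}


def is_valid_ires_cassette(vector: List[Element]) -> bool:
    """Check the 5'->3' order of a retroviral polycistronic vector layout."""
    if len(vector) < 2 or vector[0][0] != "ltr5" or vector[-1][0] != "ltr3":
        return False
    cassette = [kind for kind, _ in vector[1:-1]]
    # An internal (e.g., CMV or inducible) promoter may precede the polylinker.
    if cassette and cassette[0] == "promoter":
        cassette = cassette[1:]
    if not cassette or cassette[0] != "polylinker":
        return False
    body = cassette[1:]
    # Expect: coding, ires, coding[, ires, coding ...] with at least one IRES.
    if len(body) < 3 or len(body) % 2 == 0:
        return False
    for i, kind in enumerate(body):
        if i % 2 == 0 and kind not in CODING_KINDS:
            return False  # a coding unit was expected here
        if i % 2 == 1 and kind != "ires":
            return False  # an IRES was expected between coding units
    return True


example = [
    ("ltr5", "5' LTR"),
    ("polylinker", "polylinker"),
    ("gene", "gene of interest"),
    ("ires", "encephalomyocarditis virus IRES"),
    ("marker", "hygromycin B resistance"),
    ("ltr3", "3' LTR"),
]
print(is_valid_ires_cassette(example))
```

Treating the IRES as a required separator between coding units reflects its role in the definition above: it permits translation of the downstream cistron from the same message.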
[0128] As used herein, the term "retrovirus" refers to a retroviral particle which is capable of entering a cell (i.e., the particle contains a membrane-associated protein such as an envelope protein or a viral G glycoprotein which can bind to the host cell surface and facilitate entry of the viral particle into the cytoplasm of the host cell) and integrating the retroviral genome (as a double-stranded provirus) into the genome of the host cell. The term "retrovirus" encompasses Oncovirinae (e.g., Moloney murine leukemia virus (MoMOLV), Moloney murine sarcoma virus (MoMSV), and Mouse mammary tumor virus (MMTV)), Spumavirinae, and Lentivirinae (e.g., Human immunodeficiency virus, Simian immunodeficiency virus, Equine infectious anemia virus, and Caprine arthritis-encephalitis virus; See, e.g., U.S. Pat. Nos. 5,994,136 and 6,013,516, both of which are incorporated herein by reference).
[0129] As used herein, the term "retroviral vector" refers to a retrovirus that has been modified to express a gene of interest. Retroviral vectors can be used to transfer genes efficiently into host cells by exploiting the viral infectious process. Foreign or heterologous genes cloned (i.e., inserted using molecular biological techniques) into the retroviral genome can be delivered efficiently to host cells that are susceptible to infection by the retrovirus. Through well known genetic manipulations, the replicative capacity of the retroviral genome can be destroyed. The resulting replication-defective vectors can be used to introduce new genetic material to a cell but they are unable to replicate. A helper virus or packaging cell line can be used to permit vector particle assembly and egress from the cell. Such retroviral vectors comprise a replication-deficient retroviral genome containing a nucleic acid sequence encoding at least one gene of interest (i.e., a polycistronic nucleic acid sequence can encode more than one gene of interest), a 5' retroviral long terminal repeat (5' LTR); and a 3' retroviral long terminal repeat (3' LTR).
[0130] The term "pseudotyped retroviral vector" refers to a retroviral vector containing a heterologous membrane protein. The term "membrane-associated protein" refers to a protein (e.g., a viral envelope glycoprotein or the G proteins of viruses in the Rhabdoviridae family such as VSV, Piry, Chandipura and Mokola) that is associated with the membrane surrounding a viral particle; these membrane-associated proteins mediate the entry of the viral particle into the host cell. The membrane-associated protein may bind to specific cell surface protein receptors, as is the case for retroviral envelope proteins, or the membrane-associated protein may interact with a phospholipid component of the plasma membrane of the host cell, as is the case for the G proteins derived from members of the Rhabdoviridae family.
[0131] The term "heterologous membrane-associated protein" refers to a membrane-associated protein which is derived from a virus that is not a member of the same viral class or family as that from which the nucleocapsid protein of the vector particle is derived. "Viral class or family" refers to the taxonomic rank of class or family, as assigned by the International Committee on Taxonomy of Viruses.
[0132] The term "Rhabdoviridae" refers to a family of enveloped RNA viruses that infect animals, including humans, and plants. The Rhabdoviridae family encompasses the genus Vesiculovirus that includes vesicular stomatitis virus (VSV), Cocal virus, Piry virus, Chandipura virus, and Spring viremia of carp virus (sequences encoding the Spring viremia of carp virus are available under GenBank accession number U18101). The G proteins of viruses in the Vesiculovirus genus are virally-encoded integral membrane proteins that form externally projecting homotrimeric spike glycoprotein complexes that are required for receptor binding and membrane fusion. The G proteins of viruses in the Vesiculovirus genus have a covalently bound palmitic acid (C16) moiety. The amino acid sequences of the G proteins from the Vesiculoviruses are fairly well conserved. For example, the Piry virus G protein shares about 38% identity and about 55% similarity with the VSV G proteins (several strains of VSV are known, e.g., Indiana, New Jersey, Orsay, San Juan, etc., and their G proteins are highly homologous). The Chandipura virus G protein and the VSV G proteins share about 37% identity and 52% similarity. Given the high degree of conservation (amino acid sequence) and the related functional characteristics (e.g., binding of the virus to the host cell and fusion of membranes, including syncytia formation) of the G proteins of the Vesiculoviruses, the G proteins from non-VSV Vesiculoviruses may be used in place of the VSV G protein for the pseudotyping of viral particles. The G proteins of the Lyssa viruses (another genus within the Rhabdoviridae family) also share a fair degree of conservation with the VSV G proteins and function in a similar manner (e.g., mediate fusion of membranes) and therefore may be used in place of the VSV G protein for the pseudotyping of viral particles. The Lyssa viruses include the Mokola virus and the Rabies viruses (several strains of Rabies virus are known and their G proteins have been cloned and sequenced). The Mokola virus G protein shares stretches of homology (particularly over the extracellular and transmembrane domains) with the VSV G proteins, with about 31% identity and 48% similarity. Preferred G proteins share at least 25% identity, preferably at least 30% identity and most preferably at least 35% identity with the VSV G proteins. The VSV G protein from the New Jersey strain (the sequence of this G protein is provided in GenBank accession numbers M27165 and M21557) is employed as the reference VSV G protein.
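The identity thresholds above can be made concrete with a small calculation. This sketch assumes the two protein sequences have already been aligned (gaps written as `-`) and counts identity over columns where neither sequence has a gap; that denominator is an assumption, since published identity figures may instead divide by alignment length or by the shorter sequence. The toy strings are made up and are not real G protein sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over gap-free columns of two pre-aligned sequences.

    Assumed convention: matches / columns where neither sequence has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def preference_tier(identity_pct: float) -> str:
    """Map identity to the reference VSV G protein onto the recited tiers."""
    if identity_pct >= 35.0:
        return "at least 35%"
    if identity_pct >= 30.0:
        return "at least 30%"
    if identity_pct >= 25.0:
        return "at least 25%"
    return "below 25%"


# Identity figures quoted in the paragraph above.
for name, pct in [("Piry", 38.0), ("Chandipura", 37.0), ("Mokola", 31.0)]:
    print(name, preference_tier(pct))

# Toy pre-aligned strings (hypothetical, for illustration only).
print(percent_identity("MKT-LLA", "MKSALLA"))
```

On these figures, the Piry and Chandipura G proteins fall in the most preferred tier and the Mokola G protein in the intermediate tier.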
[0133] As used herein, the term "lentivirus vector" refers to retroviral vectors derived from the Lentiviridae family (e.g., human immunodeficiency virus, simian immunodeficiency virus, equine infectious anemia virus, and caprine arthritis-encephalitis virus) that are capable of integrating into non-dividing cells (See, e.g., U.S. Pat. Nos. 5,994,136 and 6,013,516, both of which are incorporated herein by reference).
[0134] The term "pseudotyped lentivirus vector" refers to a lentivirus vector containing a heterologous membrane protein (e.g., a viral envelope glycoprotein or the G proteins of viruses in the Rhabdoviridae family such as VSV, Piry, Chandipura and Mokola).
[0135] As used herein, the term "transposon" refers to transposable elements (e.g., Tn5, Tn7, and Tn10) that can move or transpose from one position to another in a genome. In general, the transposition is controlled by a transposase. The term "transposon vector," as used herein, refers to a vector encoding a nucleic acid of interest flanked by the terminal ends of a transposon. Examples of transposon vectors include, but are not limited to, those described in U.S. Pat. Nos. 6,027,722; 5,958,775; 5,968,785; 5,965,443; and 5,719,055, all of which are incorporated herein by reference.
[0136] As used herein, the term "adeno-associated virus (AAV) vector" refers to a vector derived from an adeno-associated virus serotype, including without limitation, AAV-1, AAV-2, AAV-3, AAV-4, AAV-5, AAVX7, etc. AAV vectors can have one or more of the AAV wild-type genes deleted in whole or part, preferably the rep and/or cap genes, but retain functional flanking ITR sequences.
[0137] AAV vectors can be constructed using recombinant techniques that are known in the art to include one or more heterologous nucleotide sequences flanked on both ends (5' and 3') with functional AAV ITRs. In the practice of the invention, an AAV vector can include at least one AAV ITR and a suitable promoter sequence positioned upstream of the heterologous nucleotide sequence and at least one AAV ITR positioned downstream of the heterologous sequence. A "recombinant AAV vector plasmid" refers to one type of recombinant AAV vector wherein the vector comprises a plasmid. As with AAV vectors in general, 5' and 3' ITRs flank the selected heterologous nucleotide sequence.
[0138] AAV vectors can also include transcription sequences such as polyadenylation sites, as well as selectable markers or reporter genes, enhancer sequences, and other control elements that allow for the induction of transcription. Such control elements are described above.
[0139] As used herein, the term "AAV virion" refers to a complete virus particle. An AAV virion may be a wild type AAV virus particle (comprising a linear, single-stranded AAV nucleic acid genome associated with an AAV capsid, i.e., a protein coat), or a recombinant AAV virus particle (described below). In this regard, single-stranded AAV nucleic acid molecules (either the sense/coding strand or the antisense/anticoding strand as those terms are generally defined) can be packaged into an AAV virion; both the sense and the antisense strands are equally infectious.
[0140] As used herein, the term "recombinant AAV virion" or "rAAV" is defined as an infectious, replication-defective virus composed of an AAV protein shell encapsidating (i.e., surrounding with a protein coat) a heterologous nucleotide sequence, which in turn is flanked 5' and 3' by AAV ITRs. A number of techniques for constructing recombinant AAV virions are known in the art (See, e.g., U.S. Pat. No. 5,173,414; WO 92/01070; WO 93/03769; Lebkowski et al., Molec. Cell. Biol. 8:3988-3996 [1988]; Vincent et al., Vaccines 90 [1990] (Cold Spring Harbor Laboratory Press); Carter, Current Opinion in Biotechnology 3:533-539 [1992]; Muzyczka, Current Topics in Microbiol. and Immunol. 158:97-129 [1992]; Kotin, Human Gene Therapy 5:793-801 [1994]; Shelling and Smith, Gene Therapy 1:165-169 [1994]; and Zhou et al., J. Exp. Med. 179:1867-1875 [1994], all of which are incorporated herein by reference).
[0141] Suitable nucleotide sequences for use in AAV vectors (and, indeed, any of the vectors described herein) include any functionally relevant nucleotide sequence. Thus, the AAV vectors of the present invention can comprise any desired gene that encodes a protein that is defective or missing from a target cell genome or that encodes a non-native protein having a desired biological or therapeutic effect (e.g., an antiviral function), or the sequence can correspond to a molecule having an antisense or ribozyme function. Suitable genes include those used for the treatment of inflammatory diseases, autoimmune, chronic and infectious diseases, including such disorders as AIDS, cancer, neurological diseases, cardiovascular disease, hypercholesterolemia; various blood disorders including various anemias, thalassemias and hemophilia; genetic defects such as cystic fibrosis, Gaucher's Disease, adenosine deaminase (ADA) deficiency, emphysema, etc. A number of antisense oligonucleotides (e.g., short oligonucleotides complementary to sequences around the translational initiation site (AUG codon) of an mRNA) that are useful in antisense therapy for cancer and for viral diseases have been described in the art. (See, e.g., Han et al., Proc. Natl. Acad. Sci. USA 88:4313-4317 [1991]; Uhlmann et al., Chem. Rev. 90:543-584 [1990]; Helene et al., Biochim Biophys. Acta. 1049:99-125 [1990]; Agarwal et al., Proc. Natl. Acad. Sci. USA 85:7079-7083 [1989]; and Heikkila et al., Nature 328:445-449 [1987]). For a discussion of suitable ribozymes, see, e.g., Cech et al. (1992) J. Biol. Chem. 267:17479-17482 and U.S. Pat. No. 5,225,347, incorporated herein by reference.
[0142] By "adeno-associated virus inverted terminal repeats" or "AAV ITRs" is meant the art-recognized palindromic regions found at each end of the AAV genome which function together in cis as origins of DNA replication and as packaging signals for the virus. For use with the present invention, flanking AAV ITRs are positioned 5' and 3' of one or more selected heterologous nucleotide sequences and, together with the rep coding region or the Rep expression product, provide for the integration of the selected sequences into the genome of a target cell.
[0143] The nucleotide sequences of AAV ITR regions are known (See, e.g., Kotin, Human Gene Therapy 5:793-801 [1994]; Berns, K. I. "Parvoviridae and their Replication" in Fundamental Virology, 2nd Edition, (B. N. Fields and D. M. Knipe, eds.) for the AAV-2 sequence). As used herein, an "AAV ITR" need not have the wild-type nucleotide sequence depicted, but may be altered, e.g., by the insertion, deletion or substitution of nucleotides. Additionally, the AAV ITR may be derived from any of several AAV serotypes, including without limitation, AAV-1, AAV-2, AAV-3, AAV-4, AAV-5, AAVX7, etc. The 5' and 3' ITRs which flank a selected heterologous nucleotide sequence need not necessarily be identical or derived from the same AAV serotype or isolate, so long as they function as intended, i.e., to allow for the integration of the associated heterologous sequence into the target cell genome when the rep gene is present (either on the same or on a different vector), or when the Rep expression product is present in the target cell.
[0144] As used herein, the term "in vitro" refers to an artificial environment and to processes or reactions that occur within an artificial environment. In vitro environments can consist of, but are not limited to, test tubes and cell cultures. The term "in vivo" refers to the natural environment (e.g., an animal or a cell) and to processes or reactions that occur within a natural environment.
[0145] As used herein, the term "passage" refers to the process of diluting a culture of cells that has grown to a particular density or confluency (e.g., 70% or 80% confluent), and then allowing the diluted cells to regrow to the particular density or confluency desired (e.g., by replating the cells or establishing a new roller bottle culture with the cells).
[0146] As used herein, the term "stable," when used in reference to a genome, refers to the stable maintenance of the information content of the genome from one generation to the next, or, in the particular case of a cell line, from one passage to the next. Accordingly, a genome is considered to be stable if no gross changes occur in the genome (e.g., a gene is deleted or a chromosomal translocation occurs). The term "stable" does not exclude subtle changes that may occur to the genome such as point mutations.
[0147] As used herein, the term "purified" refers to molecules, either nucleic or amino acid sequences, that are removed from their natural environment, isolated or separated. An "isolated nucleic acid sequence" is therefore a purified nucleic acid sequence. "Substantially purified" molecules are at least 60% free, preferably at least 75% free, and more preferably at least 90% free from other components with which they are naturally associated.
GENERAL DESCRIPTION OF THE INVENTION
[0148] The present invention uses the novel paradigm of utilizing exogenous cell-cell communication pathways to autoregulate cell proliferation and differentiation. In some embodiments, the present invention provides a variety of cell-cell communication pathways and regulatory motifs that can be incorporated into any desired cell type.
[0149] In some preferred embodiments, the systems, compositions and methods of the present invention allow for tissue homeostasis to be achieved between a source of differentiable cells such as stem cells and target differentiated cells produced from the differentiable cells so that both cell populations are maintained. In some embodiments, the systems, compositions and methods control the proliferation and differentiation of the differentiable cell population so that a population of differentiable cells is maintained over time that can differentiate into the desired target differentiated cells. In preferred embodiments, the differentiable cells are subject to a regulatory feedback mechanism that controls a) the proliferation or inhibition of proliferation of differentiable cells into additional differentiable cells so that a source of differentiable cells is stably maintained and b) the differentiation of differentiable cells into target differentiated cells so that a stable population of target differentiated cells is maintained.
[0150] In some embodiments, the present invention utilizes a series of modules to control proliferation and differentiation of differentiable cells. In some embodiments, the modules each comprise one or more genes. In further embodiments, the genes are provided on vectors that can be introduced into a desired cell line. In some embodiments, the first module comprises one or more nucleic acid constructs that interact to provide population control for a population of proliferating cells. In some embodiments, the proliferating cells are differentiable cells. In some embodiments, the second module comprises one or more nucleic acid constructs that interact to control commitment of the differentiable cells to differentiate. In some preferred embodiments, if the differentiable cell population is above a certain threshold and the target differentiated cells are below a certain threshold, some of the differentiable cells commit to differentiate into the target differentiated cells. In some embodiments, the third module comprises one or more nucleic acid constructs that interact to cause differentiation of differentiable cells into target differentiated cells. In some embodiments, the fourth module comprises one or more nucleic acid constructs that trigger apoptosis if the differentiable cell migrates out of a desired location.
[0151] Precise in vivo control of stem cell differentiation into desired cell types, such as insulin-producing β cells, the cell type that is adversely affected in Diabetes Mellitus, is achievable through use of the compositions and methods described herein. In some preferred embodiments, the approaches described herein ensure a constant and steady supply of precursor cells and β cells which autoregulate insulin production. This approach also has the potential to bypass graft vs. host disease by using naive embryonic stem cells [Burt et al., Journal of Experimental Medicine, 199(7):895-904, 2004] or the patient's own adult stem cells. Finally, the approach is modular, controllable and flexible, allowing us to genetically engineer pathways that best address a patient's disease state.
This approach represents a paradigm shift in tissue engineering and diabetes treatment. Artificial cell-cell communication coordinates cell population behavior and the formation of insulin-producing β cells by precisely controlling gene expression in a two-step differentiation process. In the systems of the present invention, cells are not simply induced exogenously to differentiate, but rather are programmed to sense and respond to changes in their environment and coordinate their collective behavior based on the needs of the organism. It is important to initially build a large, undifferentiated reservoir of cells. Quorum sensing allows for controlled growth of mES cells until they reach the required density before they are directed to differentiate. Once mES cells have terminally differentiated into beta cells, they either stop dividing or divide at a very slow rate. Importantly, once these beta cells lose function or die, due, for example, to an attack by the immune system, the ES cell reservoir detects this condition and produces new beta cells.
DETAILED DESCRIPTION OF THE INVENTION
[0152] The present invention utilizes artificial cell-cell communication pathways to control proliferation and differentiation of a population of differentiable cells. In some embodiments, multiple exogenous cell-cell communication pathways are introduced into a population of differentiable cells to provide a regulatory feedback control system. A schematic representation of an exemplary system is provided in FIG. 3. FIG. 3 depicts a modular system for control of proliferation and differentiation of cells. In preferred embodiments, the cells of the present invention comprise one or more of the depicted modules or components of the depicted modules. The depicted modules comprise stem cell population control modules, commitment to differentiation modules, cell differentiation modules, and apoptosis modules. In some preferred embodiments, the cells of the present invention comprise at least one population control module and commitment module that interface with a differentiation module and an apoptosis module. In some preferred embodiments, the cell population control module and commitment module interface to control the number of uncommitted cells in the population (i.e., cells that retain the ability to differentiate) and the number of committed cells in the population (i.e., cells that have committed to differentiate, but have not necessarily differentiated into the desired target cell type).
[0153] In preferred embodiments, the modules of the present invention comprise separate motifs or circuit designs that interact to produce a desired outcome. In particularly preferred embodiments, cell-cell communication, oscillator, cascade, and/or toggle switch motifs are incorporated into one or more of the modules of the present invention. In preferred embodiments, these motifs interact to provide a symmetry breaking condition so that a population of stem cells is maintained and not exhausted. These individual components and their interactions with one another are described in detail below.
[0154] In some embodiments, the present invention provides artificial cell-cell communication systems for use in mammalian cells. In some embodiments, cell-cell communication systems are derived from bacterial cell-cell communication systems from organisms such as Pseudomonas aeruginosa, Rhizobium leguminosarum, and Vibrio fischeri. These systems comprise genes that catalyze the synthesis of chemical signals of the acyl-homoserine lactone (AHL) family. In other embodiments, the cell-cell communication systems utilize secretion signals and viral internalization signals.
A. Population Control Module
[0155] As shown in FIG. 3, in some embodiments, the present invention provides a population control module comprising a pathway of exogenous genes that are preferably introduced into a population of stem cells. The population control module interacts with the commitment module to control the proliferation and differentiation of the cells within the system. The population control module is utilized to detect the number of stem cells in the system. If the stem cell population is low, then stem cell proliferation is allowed. If the stem cell population is high, then proliferation is inhibited. The commitment module (explained in more detail below) is used to detect the number of differentiated cells in the system (in this example, beta cells). If the beta cell population is high, proliferation of stem cells is inhibited. If the beta cell population is low, differentiation of stem cells is induced along with stem cell proliferation. A third module, described in more detail below, interacts with the two cell-cell communication modules to cause differentiation of stem cells into the desired target differentiated cell, in this example beta cells.
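The combined behavior of the population control and commitment modules can be summarized as a decision table. The sketch below is one possible reading, not the circuit itself: the count thresholds are hypothetical, any inhibitory signal (high stem cell count or high beta cell count) is assumed to override proliferation, and differentiation is assumed to require a large stem cell reservoir together with a low beta cell count, following the threshold condition stated in paragraph [0150].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleOutput:
    proliferate: bool    # stem cell proliferation allowed
    differentiate: bool  # some stem cells directed to differentiate


def module_decision(stem_cells: float, beta_cells: float,
                    stem_threshold: float, beta_threshold: float) -> ModuleOutput:
    """One reading of the population control / commitment logic.

    Thresholds are hypothetical stand-ins for the signal levels that the
    cell-cell communication pathways actually sense.
    """
    stem_high = stem_cells >= stem_threshold
    beta_high = beta_cells >= beta_threshold
    # Population control module: a high stem cell count inhibits proliferation.
    # Commitment module: a high beta cell count also inhibits proliferation.
    proliferate = not stem_high and not beta_high
    # Differentiate only when beta cells are scarce and the reservoir is large.
    differentiate = stem_high and not beta_high
    return ModuleOutput(proliferate=proliferate, differentiate=differentiate)


for stem, beta in [(100, 0), (5000, 0), (100, 80), (5000, 80)]:
    print(stem, beta, module_decision(stem, beta, stem_threshold=1000, beta_threshold=50))
```

Under this reading, a small reservoir with few beta cells first expands, and differentiation begins only once the reservoir exceeds its threshold. That ordering matches the goal of building a large undifferentiated reservoir before directing cells to differentiate.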
[0156] Referring to FIG. 3, in some embodiments, the population control module preferably comprises genes encoding a cell-cell communication pathway, preferably a RhlI/RhlR cell-cell communication pathway. The invention is not limited to the use of any particular cell-cell communication pathway in the first module. In other embodiments, LuxI/LuxR, CinI/CinR, or viral-based cell-cell communication systems are utilized. In preferred embodiments, bacterial cell-cell signaling pathways are utilized. As shown in FIG. 3, the population control module senses the density of the stem cell population. In some embodiments, the population control module comprises an exogenous RhlI gene operably linked to a repressor element, preferably a LacI repressor element. In the absence of the repressor, the product of the exogenous RhlI gene catalyzes the synthesis of C₄HSL. The population control module further comprises an exogenous RhlR gene and an exogenous gene encoding a Growth Arrest Factor (GAF) operably linked to a promoter responsive to the C₄HSL/RhlR complex. Under conditions of high stem cell density, the correspondingly high amount of C₄HSL binds the RhlR gene product to form a complex, which activates expression of GAF, thus inhibiting proliferation of the stem cells.
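The density-sensing loop described above can be explored with a toy model. In the sketch below, each stem cell produces C₄HSL at a constant rate (RhlI activity, with LacI repression assumed absent), C₄HSL decays at a first-order rate, GAF expression follows an assumed Hill function of C₄HSL (standing in for the C₄HSL/RhlR-responsive promoter), and GAF scales down proliferation against a constant loss rate. All parameter values and the functional forms are hypothetical and illustrative only; the model is integrated with forward Euler.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class QuorumParams:
    r: float = 1.0         # maximal proliferation rate (1/time), hypothetical
    d: float = 0.1         # cell loss/turnover rate (1/time), hypothetical; needs d < r
    k_syn: float = 0.01    # C4HSL synthesized per cell per time (RhlI), hypothetical
    k_deg: float = 1.0     # C4HSL decay/dilution rate (1/time), hypothetical
    K: float = 1.0         # C4HSL level giving half-maximal GAF expression
    n: float = 2.0         # Hill coefficient of the responsive promoter (assumed)
    feedback: bool = True  # False mimics removing the RhlR/GAF arm


def gaf_activation(ahl: float, p: QuorumParams) -> float:
    """Fractional GAF expression (0 to 1) as a Hill function of C4HSL."""
    if not p.feedback:
        return 0.0
    return ahl ** p.n / (p.K ** p.n + ahl ** p.n)


def simulate(p: QuorumParams, n0: float = 10.0, t_end: float = 100.0,
             dt: float = 0.01) -> Tuple[float, float]:
    """Forward-Euler integration; returns (stem cells, C4HSL) at t_end."""
    cells, ahl = n0, 0.0
    for _ in range(int(round(t_end / dt))):
        g = gaf_activation(ahl, p)
        d_cells = p.r * cells * (1.0 - g) - p.d * cells
        d_ahl = p.k_syn * cells - p.k_deg * ahl
        cells += dt * d_cells
        ahl += dt * d_ahl
    return cells, ahl


def steady_state_cells(p: QuorumParams) -> float:
    """Analytic set point: GAF level g* = 1 - d/r fixes C4HSL, which fixes cell number."""
    g_star = 1.0 - p.d / p.r
    ahl_star = p.K * (g_star / (1.0 - g_star)) ** (1.0 / p.n)
    return p.k_deg * ahl_star / p.k_syn


p = QuorumParams()
print(simulate(p)[0], steady_state_cells(p))
print(simulate(QuorumParams(feedback=False))[0])
```

With feedback, the population levels off near the analytic set point (about 300 cells for these made-up parameters). Without the GAF arm, it grows exponentially. In this model the set point moves with `k_syn`, `k_deg` and `K`, which correspond loosely to RhlI expression, signal stability and promoter sensitivity.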
[0157] Exemplary vectors for the components shown in FIG. 3 are provided in FIGS. 28A-28G through 37A-37G, where A1=Gal4, R1=ZF1, R2=ZF2, R3=ZF3, R4=ZF4, A2=CymVP16, and P is either Hef1a or minCMV.
B. Commitment Module
[0158] Referring to FIG. 3, in some embodiments, the present invention provides a commitment module comprising multiple motifs or circuit elements that allow for symmetry breaking and control the commitment of cells to differentiate. In some embodiments, the module comprises one or more of the following motifs: an oscillator, a cascade, a toggle switch, and a cell-cell communication pathway. In some preferred embodiments, these motifs interact, as described in detail below, to allow commitment to differentiate when three conditions are met within the population of cells containing the modules: a) there is a high level of uncommitted cells; b) there are cells that are allowed to commit; and c) there is a low level of committed cells.
[0159] Referring to FIG. 3, an embodiment of an oscillator is depicted by A1 and R1. In preferred embodiments, A1 is an activator and R1 is a repressor that are connected in such a way that their levels oscillate. In some embodiments, as depicted, the activator A1 activates itself and the repressor R1, and the repressor R1 represses the activator A1. This oscillation provides symmetry breaking for the population of cells containing the motif: because the oscillations of activator A1 and repressor R1 are asynchronous across the population, at any given time only a portion of the cells will have a high level of activator A1. In some embodiments, as depicted, only cells with high activator A1 are allowed to commit to differentiate, so the remaining cells, with low levels of activator A1, are held in an uncommitted state. This ensures that a population of uncommitted stem cells is maintained. Otherwise, all uncommitted stem cells could commit to differentiate, exhausting the stem cells of the population.
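The symmetry-breaking role of the A1/R1 oscillator can be sketched by treating each cell's A1 level as an idealized sinusoid with a random phase. This is an illustrative assumption only; the actual Gal4/ZF1 dynamics are not modeled, and the period, the threshold for "high" A1 and the function names are hypothetical. The sketch compares an asynchronous population with a synchronized one.

```python
import math
import random
from typing import Sequence


def a1_level(t: float, phase: float, period: float = 24.0) -> float:
    """Idealized, normalized A1 level (0..1) of one oscillating cell at time t."""
    return 0.5 * (1.0 + math.sin(2.0 * math.pi * t / period + phase))


def fraction_permissive(phases: Sequence[float], t: float,
                        threshold: float = 0.8, period: float = 24.0) -> float:
    """Fraction of cells whose A1 exceeds the (hypothetical) commitment threshold."""
    high = sum(1 for p in phases if a1_level(t, p, period) > threshold)
    return high / len(phases)


def expected_fraction(threshold: float) -> float:
    """Analytic fraction for uniformly random phases, valid for 0 < threshold < 1."""
    return (math.pi - 2.0 * math.asin(2.0 * threshold - 1.0)) / (2.0 * math.pi)


if __name__ == "__main__":
    rng = random.Random(0)
    async_phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(10_000)]
    sync_phases = [0.0] * 10_000
    for t in (0.0, 6.0, 12.0, 18.0):
        print(t, fraction_permissive(async_phases, t), fraction_permissive(sync_phases, t))
    print("analytic:", expected_fraction(0.8))
```

In the asynchronous population roughly the same fraction (about 30% for this threshold) is permissive at every time point, so part of the population is always held uncommitted; in the synchronized population the fraction swings between no cells and all cells, which is the exhaustion scenario the oscillator is meant to avoid.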
[0160] Referring to FIG. 3, an embodiment of a cascade is depicted by repressor R2, repressor R3, activator A2, and repressor R4. In some embodiments, activator A2 is operably linked to a promoter responsive to the C₄HSL/RhlR complex and thus interfaces with the population control module. In some embodiments, as depicted, activator A1 from the oscillator activates repressor R2, which in turn represses repressor R3, a repressor of activator A2. Thus, if the level of activator A1 is high, repressor R2 is activated and represses repressor R3, allowing the level of activator A2 to be high if there is a concomitant high concentration of uncommitted stem cells producing C₄HSL. As a result, the level of activator A2 can only be high when two conditions are met: 1) there is a high level of A1 within the cell; and 2) there is a high concentration of uncommitted stem cells. When these two conditions are met, activator A2 in turn activates repressor R4, which interfaces with the cell-cell communication and toggle switch motifs.
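The AND logic of the cascade can be written as a Boolean sketch. Each species is treated as simply high (`True`) or low (`False`), which abstracts away graded expression; the function and key names are hypothetical labels for the A1, R2, R3, A2 and R4 nodes of FIG. 3.

```python
from itertools import product
from typing import Dict


def cascade(a1_high: bool, uncommitted_density_high: bool) -> Dict[str, bool]:
    """Boolean sketch of A1 -> R2 -| R3 -| A2 -> R4 (before the CI input)."""
    r2 = a1_high                  # A1 activates R2
    r3 = not r2                   # R2 represses R3
    # A2 must be released from R3 AND activated by the C4HSL/RhlR complex,
    # which is abundant only when uncommitted stem cells are dense.
    a2 = (not r3) and uncommitted_density_high
    return {"R2": r2, "R3": r3, "A2": a2, "R4_drive": a2}  # A2 activates R4


if __name__ == "__main__":
    for a1, dense in product((False, True), repeat=2):
        print(f"A1={a1!s:5} dense={dense!s:5} -> {cascade(a1, dense)}")
```

Only the row with both A1 high and a dense uncommitted population yields high A2, matching the two conditions stated above.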
[0161] Still referring to FIG. 3, in order for cells within the population to commit as described above, a third condition must be met: a low concentration of committed cells. In some embodiments, the level of committed cells is detected via a cell-cell communication pathway, preferably a CinI/CinR cell-cell communication pathway. The invention is not limited to the use of any particular cell-cell communication pathway in the second module, although the pathway utilized should preferably be different from the cell-cell communication pathway selected for the cell population module. In other embodiments, LuxI/LuxR, RhlI/RhlR, or viral-based cell-cell communication systems are utilized.
[0162] In some embodiments, as depicted in FIG. 3, the cell-cell communication pathway comprises an exogenous CinI gene, an exogenous CinR gene, and an exogenous gene encoding the repressor CI operably linked to a promoter responsive to the C₁₄HSL/CinR complex. High levels of C₁₄HSL are indicative of a high concentration of committed cells. Under these conditions, repressor CI is induced and in turn represses repressor R4. In some embodiments, repressor R4 interacts with the toggle switch motif as detailed below; when the level of repressor R4 is low, there is no commitment. The level of R4 can therefore only be high when the level of activator A2 is high and the level of CI is low. Low levels of C₁₄HSL are indicative of a low concentration of committed cells; under these conditions, the level of repressor CI is low, allowing the level of repressor R4 to be high.
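The committed-cell input then gates R4. In the same Boolean abstraction (hypothetical function names; `a2_high` stands for the A2 output of the cascade), CI follows the C₁₄HSL/CinR signal and R4 is high only when A2 is high and CI is low:

```python
from itertools import product


def ci_high(committed_density_high: bool) -> bool:
    """CI is induced by the C14HSL/CinR complex, i.e. when committed cells are dense."""
    return committed_density_high


def r4_high(a2_high: bool, committed_density_high: bool) -> bool:
    """A2 activates R4 and CI represses it: R4 = A2 AND NOT CI."""
    return a2_high and not ci_high(committed_density_high)


if __name__ == "__main__":
    for a2, committed in product((False, True), repeat=2):
        print(f"A2={a2!s:5} committed_dense={committed!s:5} -> R4={r4_high(a2, committed)}")
```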
[0163] Still referring to FIG. 3, in some embodiments, the commitment module further comprises a toggle switch motif that switches cells between an uncommitted state and a committed state. In some embodiments, as depicted in FIG. 3, the toggle switch motif comprises two repressors, in preferred embodiments TetR and LacI, which cross-repress each other. High levels of TetR within a cell are indicative of an uncommitted state. As shown in FIG. 3, IPTG can be added to the cells to inactivate LacI, thus causing the cells to enter an uncommitted state. Alternatively, as described above, repressor R4 is high in cells where activator A1 is high and where there is a high concentration of uncommitted stem cells. When these two conditions are met, along with the condition of a low number of committed cells, repressor R4 represses TetR, allowing a high level of LacI. High levels of LacI in a cell trigger the transition from an uncommitted cell to a committed cell by repressing TetR. As depicted in FIG. 3, when LacI is high, repression of CinI is released, causing the synthesis of C₁₄HSL, the induction of CI and the repression of R4. These events are indicative of a committed cell. As also depicted in FIG. 3, high LacI within a cell represses RhlI of the cell population module, inhibiting synthesis of C₄HSL.
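The TetR/LacI toggle can be sketched with the standard two-repressor ODE model of a genetic toggle switch, integrated by forward Euler. This is an illustrative assumption, not the inventors' parameterization: `alpha`, `n`, the way R4 represses TetR, and the crude `iptg_fraction` (the fraction of LacI inactivated by IPTG) are all hypothetical. The sketch shows the memory of the switch: a transient R4 pulse flips a cell into the LacI-high (committed) state, which persists after R4 is removed, and IPTG resets it.

```python
from typing import Tuple


def simulate_toggle(tetr: float, laci: float, r4: float = 0.0,
                    iptg_fraction: float = 0.0, alpha: float = 10.0,
                    n: float = 2.0, dt: float = 0.01,
                    t_end: float = 50.0) -> Tuple[float, float]:
    """Forward-Euler integration of a TetR/LacI mutual-repression toggle.

    d[TetR]/dt = alpha / ((1 + LacI_active^n) * (1 + R4^n)) - [TetR]
    d[LacI]/dt = alpha / (1 + TetR^n) - [LacI]
    Concentrations are dimensionless; degradation rates are set to 1.
    """
    for _ in range(int(t_end / dt)):
        laci_active = laci * (1.0 - iptg_fraction)    # IPTG inactivates LacI
        tetr_prod = alpha / ((1.0 + laci_active ** n) * (1.0 + r4 ** n))
        laci_prod = alpha / (1.0 + tetr ** n)
        tetr += dt * (tetr_prod - tetr)
        laci += dt * (laci_prod - laci)
    return tetr, laci


def is_committed(tetr: float, laci: float) -> bool:
    """LacI-high / TetR-low is read as the committed state."""
    return laci > tetr


if __name__ == "__main__":
    state = simulate_toggle(10.0, 0.0)                    # uncommitted, no R4
    print("no R4:      ", is_committed(*state))
    state = simulate_toggle(*state, r4=10.0)              # R4 pulse represses TetR
    print("R4 pulse:   ", is_committed(*state))
    state = simulate_toggle(*state)                       # R4 removed: state is remembered
    print("after pulse:", is_committed(*state))
    state = simulate_toggle(*state, iptg_fraction=0.99)   # IPTG resets to uncommitted
    print("IPTG:       ", is_committed(*state))
```

Mutual repression with cooperative (n > 1) Hill terms is what makes the two states self-sustaining; with n = 1 this simple model would not be bistable.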
[0164] Exemplary vectors for the components shown in FIG. 3 are provided in FIGS. 28A-28G through 37A-37G, where A1=Gal4, R1=ZF1, R2=ZF2, R3=ZF3, R4=ZF4, A2=CymVP16, and P is either Hef1a or minCMV.
C. Cell Differentiation Module
[0165] As shown in FIG. 3, in some embodiments, the present invention provides a cell differentiation module that is responsive to the commitment module. In some embodiments, the cell differentiation module comprises at least an exogenous gene encoding a first cell fate regulator that is operably linked to a promoter repressed by a repressor, such as the TetR repressor. In some embodiments, as depicted in FIG. 3, the TetR/LacI toggle switch controls the expression of the cell fate regulator. When the three conditions of a) a high level of uncommitted cells, b) cells that are allowed to commit, and c) a low level of committed cells are met, TetR is repressed, releasing the repression of a cell fate regulator such as Gata4 or Sox17 (not shown). Gata4 expression is the initial step in the programmed two-step differentiation of ES cells into β cells using established cell fate regulators. In some preferred embodiments, two-step differentiation of ES cells first to visceral or definitive endoderm and then to pancreatic β cells is facilitated by controlled expression of established cell fate regulators triggered by cell-cell communication. A preferred system for the integration of the relevant cell fate regulators with the cell population and commitment modules is shown in FIG. 3. When the three conditions are met, either Gata4 or Sox17 is expressed. These two cell fate regulators were chosen because they are both expressed in mammalian visceral endoderm [Ritz-Laser et al., Molecular Endocrinology, 19(3):759-770, 2005; Ku et al., Stem Cells, 22:1205-1217, 2004]. Both Gata4 and Sox17 are required transcription factors in pancreatic organogenesis. Gata4 is present in very early visceral endoderm that differentiates into both insulin-producing and glucagon-producing cells [42]. Sox17 is expressed exclusively in insulin-producing cells and appears slightly later than Gata4 [43].
In preferred embodiments, endoderm differentiation is verified by detecting endodermal markers such as Hnf3β or laminin B1 by immunohistochemistry and Western blotting. Cells that remain in the stem cell state are identified by alkaline phosphatase staining in a standard ES cell assay.
[0166] In some preferred embodiments, the cell-cell communication systems of the present invention comprise vectors that express cell fate regulators that regulate the differentiation of endoderm into β cells. Differentiation into endoderm triggers the expression of endoderm-specific factors. In preferred embodiments, the genes encoding cell fate regulators are operably linked to a promoter that is regulated (preferably induced) by one or more endoderm-specific factors. It is contemplated that cells that differentiate into endoderm naturally activate the promoter of the endodermal marker α-fetoprotein (AFP). In the third module of the present invention, the AFP promoter drives expression of Ngn3, Pdx1 and/or other cell fate regulators, together with the reporter EGFP, so that differentiation from ES cells to endoderm is visualized by the appearance of green fluorescence. In vivo, Ngn3 and Pdx1 are both expressed at various points during cellular differentiation from endoderm to β cells [Soria, Differentiation, 68:205-219, 2001], and both have been found to be necessary for terminal differentiation into β cells. Naturally occurring in vivo specialization of endodermal cells into β cells depends on a complex set of environmental cues that may not be present in the disease state. In the engineered system of the present invention, however, differentiation of mES cells into endoderm internally forces the cells to subsequently specialize into β cells. Upon terminal β cell differentiation, the Mouse Insulin Promoter (MIP), which is active in insulin-producing cells, drives DsRed expression. Cells expressing the red fluorescent protein are assayed for terminal differentiation by immunohistochemistry and Western blotting for β cell markers such as C-peptide, insulin and Nkx6.1.
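The two-stage program can be summarized as a small state machine in which each stage switches on the stage-specific promoter that drives the next set of transgenes. This is a deliberately coarse sketch: the stage names, the one-step transitions, and the assumption that each stage expresses only its own promoter's cargo are simplifications, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Stage-specific promoter and the transgenes it drives, as described in the text.
STAGE_PROGRAM: Dict[str, Tuple[str, List[str]]] = {
    "endoderm": ("AFP promoter", ["Ngn3", "Pdx1", "EGFP"]),
    "beta": ("Mouse Insulin Promoter (MIP)", ["DsRed"]),
}


def next_stage(stage: str, gata4_on: bool) -> str:
    """Advance one stage; only Gata4 (released by the toggle) starts the program."""
    if stage == "ES":
        return "endoderm" if gata4_on else "ES"
    if stage == "endoderm":
        return "beta"   # AFP-promoter-driven Ngn3/Pdx1 push endoderm toward beta cells
    return stage        # beta cells are terminal in this sketch


def expressed_transgenes(stage: str) -> List[str]:
    """Transgenes expected to be on at a stage (EGFP and DsRed are the reporters)."""
    return STAGE_PROGRAM.get(stage, ("", []))[1]


def run_program(gata4_on: bool, steps: int = 3) -> List[str]:
    stage, trace = "ES", ["ES"]
    for _ in range(steps):
        stage = next_stage(stage, gata4_on)
        trace.append(stage)
    return trace


if __name__ == "__main__":
    for gata4 in (True, False):
        for stage in run_program(gata4):
            print(gata4, stage, expressed_transgenes(stage))
```

The design point the sketch captures is that the second stage needs no external trigger: the output of the first stage (endoderm identity) is itself the input that activates the second-stage promoter.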
Cells not expressing the red fluorescent protein are histochemically assayed for the presence of AFP, Pdx1 and Ngn3, and also for stem cell character using the standard ES cell assay. Cells not initially induced with IPTG should not fluoresce red and should remain in the ES cell state.
[0167] In other preferred embodiments, other cell fate regulators are expressed via the differentiation module. Examples of the cell fate regulators include, but are not limited to, Nkx6.1, Nkx2.2, Isl-1, NeuroD, Pax6, Fgf4, BRA, Wnt9, NCAD, CER, FoxA2, CxcR4, Hnf1B, Hnf4A, Hnf6, HlxB9, Pax4, Cgc, GHRL, SST, PPY, Activin, Fgf10, Cyc, RA, Ex4, DAPT, HGF and Igf1. In preferred embodiments, these cell fate regulators are inserted into lentiviral vectors and introduced into the cell line of interest. In some preferred embodiments, the cell fate regulators are operably linked to a stage-specific promoter. Stage-specific promoters are promoters that are active at a particular stage of differentiation, such as differentiation into endoderm, ectoderm or mesoderm. In some preferred embodiments, the cell fate regulators are operably linked to an AFP promoter.
[0168] Exemplary vectors for the components shown in FIG. 3 are provided in FIGS. 28A-28G through 37A-37G, where A1=Gal4, R1=ZF1, R2=ZF2, R3=ZF3, R4=ZF4, A2=CymVP16, and P is either Hef1a or minCMV.
D. Apoptosis Module
[0169] As shown in Figure, in some embodiments, the cells of the present invention further comprise an apoptosis module. In some embodiments, the apoptosis module comprises a series of vectors comprising genes that trigger apoptosis if the cells migrate to an undesired location. In preferred embodiments, the stem cells of the present invention undergo apoptosis if the cells leave the pancreas. As shown in FIG. 3, genes encoding an apoptosis pathway are operably linked to a repressor element responsive to a repressor that is synthesized in the presence of a pancreatic signal (PS). In the absence of PS, the repressor is not synthesized and apoptosis is triggered.
E. Cell Penetration Element Based Cell-Cell Communication Systems
[0170] In some embodiments, the present invention provides cell-cell communication systems that utilize cell-penetrating polypeptide elements. In some embodiments, the present invention provides a fusion protein comprising a secretion signal, a cell-penetrating polypeptide, and a trans-acting domain in operable combination, wherein the secretion signal, cell-penetrating polypeptide and trans-acting domain are derived from at least two different proteins. In further embodiments, the present invention provides a nucleic acid encoding the fusion protein. In further preferred embodiments, the present invention provides vectors comprising a nucleic acid encoding the fusion protein. In still other embodiments, the present invention provides cells that express the fusion protein. In some embodiments, the present invention further provides a nucleic acid comprising a promoter comprising an element that binds or is responsive to the trans-acting domain, wherein the promoter is operably linked to a nucleic acid encoding a protein of interest.
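The fusion protein can be represented as a simple data structure, which makes the requirement that its domains come from at least two different proteins easy to check. This is a hypothetical representation: the class and field names are illustrative, the N- to C-terminal order of the CPP and trans-acting domain is an assumption (the text only requires operable combination, with the secretion signal necessarily leading), and the `ZFP-VP16` domain is a made-up example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Domain:
    name: str
    source_protein: str


@dataclass(frozen=True)
class CppFusion:
    """Secretion signal + cell-penetrating peptide + trans-acting domain."""
    secretion_signal: Domain
    cpp: Domain
    trans_acting: Domain

    def domains(self) -> Tuple[Domain, Domain, Domain]:
        # Assumed N- to C-terminal order; the signal peptide must lead.
        return (self.secretion_signal, self.cpp, self.trans_acting)

    def is_chimeric(self) -> bool:
        """True if the domains come from at least two different proteins."""
        return len({d.source_protein for d in self.domains()}) >= 2


if __name__ == "__main__":
    fusion = CppFusion(
        secretion_signal=Domain("albumin signal peptide", "serum albumin"),
        cpp=Domain("TAT CPP", "HIV-1 Tat"),
        trans_acting=Domain("ZFP-VP16", "engineered zinc finger activator"),
    )
    print([d.name for d in fusion.domains()], fusion.is_chimeric())
```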
[0171] The present invention is not limited to the use of any particular secretion signals, cell-penetrating polypeptides or trans-acting domains. Indeed, the present invention contemplates a modular and highly expandable library of artificial cell-cell communication signals built from translational fusions of cell-permeable peptides with in silico designed zinc finger proteins and a transcriptional activation or repression domain. Such fusion proteins are exported by sender cells and translocate to the nucleus of receiver cells, where they control expression of genes in synthetic as well as endogenous signaling pathways by binding their cognate DNA binding sites.
[0172] Cell-penetrating peptides (CPPs) are peptide sequences with the ability to translocate across the plasma membrane and to reach cytoplasmic and/or nuclear compartments in live cells after internalization. CPPs were first described in the HIV-1 TAT and the Drosophila Antennapedia proteins, where translocation appears to reflect an in vivo biological process. In the last decade, fusion proteins containing such CPPs have been widely used to deliver effector proteins into the cytoplasm or nucleus of target cells, predominantly by adding the purified fusion protein to the cell culture supernatant or injecting it intraperitoneally in vivo. The binding of TAT to the cell surface is thought to involve heparan sulfate proteoglycans, and in vitro evidence suggests that uptake is mediated by an energy-dependent endocytic process involving lipid rafts. Release from these endocytic vesicles into the cytoplasm is less well understood and has been shown to be a rate-limiting step in the transduction process. For TAT, subsequent localization to the nucleus is achieved through an effective nuclear localization sequence (NLS) present endogenously in this protein.
[0173] The present invention is not limited to the use of any particular CPP. The following CPP peptides find use in the present invention:
TABLE-US-00001
TAT: SGYGRKKRRQRRRC
Antp: SGRQIKIWFQNRRMKWKKC
TAT: GRKKRRQRRRPPQG
TAT (47-60): N-Cys-Tyr⁴⁷-Gly-Arg-Lys-Lys-Arg-Arg-Gln-Arg-Arg-Arg-Pro-Pro-Gln⁶⁰-COOH
ANTP: N-Cys-Arg⁴³-Gln-Ile-Lys-Ile-Trp-Phe-Gln-Asn-Arg-Arg-Met-Lys-Trp-Lys-Lys⁵⁸-COOH
Vectocell®:
DPV3 sense: 5'-GATCCCGTAAAAAGCGTCGTCGAGAAAGCCGTAAGAAACGTCGACGTGAAAGCA-3'
DPV3 antisense: 5'-AGCTTGCTTTCACGTCGACGTTTCTTACGGCTTTCTCGACGACGCTTTTTACGG-3'
DPV15b sense: 5'-GATCCGGTGCGTATGATCTGCGTCGTCGAGAACGTCAGAGCCGTCTGCGTCGACGTGAAAGACAGAGCAGAA-3'
DPV15b antisense: 5'-AGCTTTCTGCTCTGTCTTTCACGTCGACGCAGACGGCTCTGACGTTCTCGACGACGCAGATCATACGCACCG-3'
DPV1047 sense: 5'-GATCCGTTAAACGTGGACTGAAACTTCGTCATGTTCGTCCGCGTGTGACCCGTGATGTGA-3'
DPV1047 antisense: 5'-AGCTTCACATCACGGGTCACACGCGGACGAACATGACGAAGTTTCAGTCCACGTTTAACG-3'
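The Vectocell® oligonucleotide pairs are designed to anneal into duplexes with 5' overhangs (GATC on the sense strand and AGCT on the antisense strand, compatible with BamHI and HindIII ends). The sketch below checks that pairing and translates the sense strand. It assumes the reading frame starts right after the 5'-GATCC of the sense strand; in practice the frame is set by the recipient vector. The helper names are hypothetical, and only the standard genetic code and the Python standard library are used. The DPV15b antisense used here is the antisense oligonucleotide listed directly after the DPV15b sense strand in the table.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Oligonucleotides from the table above, with line-wrap spaces removed.
DPV3_SENSE = "GATCCCGTAAAAAGCGTCGTCGAGAAAGCCGTAAGAAACGTCGACGTGAAAGCA"
DPV3_ANTISENSE = "AGCTTGCTTTCACGTCGACGTTTCTTACGGCTTTCTCGACGACGCTTTTTACGG"
DPV15B_SENSE = "GATCCGGTGCGTATGATCTGCGTCGTCGAGAACGTCAGAGCCGTCTGCGTCGACGTGAAAGACAGAGCAGAA"
DPV15B_ANTISENSE = "AGCTTTCTGCTCTGTCTTTCACGTCGACGCAGACGGCTCTGACGTTCTCGACGACGCAGATCATACGCACCG"


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def anneals_with_overhangs(sense: str, antisense: str, overhang: int = 4) -> bool:
    """True if the strands pair perfectly once each 5' overhang is set aside."""
    return revcomp(sense[overhang:]) == antisense[overhang:]


def translate(seq: str, start: int = 5) -> str:
    """Translate complete codons from `start` (frame assumed to begin after 5'-GATCC)."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(start, len(seq) - 2, 3))


if __name__ == "__main__":
    for name, sense, antisense in (("DPV3", DPV3_SENSE, DPV3_ANTISENSE),
                                   ("DPV15b", DPV15B_SENSE, DPV15B_ANTISENSE)):
        print(name, anneals_with_overhangs(sense, antisense), translate(sense))
```

Under the assumed frame, the DPV3 insert encodes RKKRRRESRKKRRRES and the DPV15b insert encodes GAYDLRRRERQSRLRRRERQSR; each antisense strand is the exact complement of its own sense strand and not of the other.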
[0174] Zinc fingers (ZFs) of the Cys2His2 type contain about 30 amino acids that fold into two β-strands and an α-helix; the α-helix mediates interaction with a nucleotide triplet. The human genome contains at least 4000 such domains in over 700 proteins, representing about 2% of human genes. ZFs recognizing many of the 64 possible triplets have been isolated and described. Such ZF domains can be treated as modular, meaning that multiple concatenated ZF domains (polydactyl ZFs) bind DNA target sites (DTS) whose length is three base pairs per finger. Recently, databases and tools have become available to engineer ZFP-DTS pairs in silico, further streamlining this process (16-18). The optimal linker sequences for connecting single ZF domains into a ZFP have also been described, as has the optimal positioning of transcriptional activators or inhibitors.
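The modular triplet recognition described above can be sketched as a function that splits a DNA target site into the triplets contacted by each finger of a polydactyl ZFP. The orientation follows the commonly described antiparallel binding mode, in which the N-terminal finger (finger 1) contacts the 3'-most triplet of the target strand; that convention, the function name and the example site are assumptions for illustration.

```python
from typing import List


def finger_triplets(dts: str) -> List[str]:
    """Return the triplet contacted by each finger, finger 1 (N-terminal) first.

    Assumes antiparallel binding: finger 1 reads the 3'-most triplet of the
    5'->3' target strand, and the last finger reads the 5'-most triplet.
    """
    site = dts.upper()
    if not site or len(site) % 3 != 0:
        raise ValueError("DTS length must be a positive multiple of 3 (one triplet per finger)")
    if set(site) - set("ACGT"):
        raise ValueError("DTS may contain only A, C, G and T")
    triplets_5_to_3 = [site[i:i + 3] for i in range(0, len(site), 3)]
    return triplets_5_to_3[::-1]


if __name__ == "__main__":
    # A six-finger (polydactyl) array needs an 18-bp site: 6 fingers x 3 bp.
    site = "GCGTGGGCG" + "GAGGCTGCT"   # hypothetical 18-bp target
    for finger, triplet in enumerate(finger_triplets(site), start=1):
        print(f"finger {finger}: {triplet}")
```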
[0175] A translational fusion of a CPP with a ZFP, when exported by a sender cell, translocates into the nucleus of receiver cells. The transcellular delivery of such ZFPs would greatly expand the possibilities at our disposal to manipulate synthetic as well as endogenous signaling pathways in stem cells. This would facilitate the programmed differentiation of stem cells into tissue patterns by design. By engineering cells that can be manipulated to decide between cell fates based on a given signal, and by working within a 3D matrix system in vitro, we will study the potential of this system to promote organogenesis and tissue repair. The long-term goal is safe, self-regulating and self-regenerating tissues and organs for clinical applications. Another purpose is to elucidate the architecture and dynamics of endogenous cell fate regulatory networks as they function to promote lineage choices.
[0176] An additional application for such an artificial cell-cell communication system would be the controlled in vivo delivery of cell-permeable therapeutic proteins that influence cell signaling at the level of protein-protein interactions. As an example, secretion of cell-permeable peptides of the wild-type tumor suppressor p53 from engineered cells or tissue could restore growth control in tumor cells carrying mutated forms of p53, which are found in a large fraction of human cancers.
[0177] The present invention is likewise not limited to the use of any particular secretion signal. A secretion signal is any DNA sequence which, when operably linked to a recombinant DNA sequence, encodes a signal peptide capable of causing the secretion of the recombinant polypeptide. In general, signal peptides comprise a series of about 15 to 30 hydrophobic amino acid residues (See, e.g., Zwizinski et al., J. Biol. Chem. 255(16): 7973-77 [1980], Gray et al., Gene 39(2): 247-54 [1985], and Martial et al., Science 205: 602-607 [1979]). Such secretion signal sequences are preferably derived from genes encoding polypeptides secreted from the cell type targeted for tissue-specific expression. Secretory DNA sequences, however, are not limited to such sequences. Secretory DNA sequences from proteins secreted from many cell types and organisms may also be used (e.g., the secretion signals for tPA, serum albumin, lactoferrin, and growth hormone, and secretion signals from microbial genes encoding secreted polypeptides, such as those from yeast, filamentous fungi, and bacteria).
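Because signal peptides are characterized by a hydrophobic stretch, a quick way to inspect a candidate secretion signal is to scan its N-terminus with the Kyte-Doolittle hydropathy scale and report the most hydrophobic window. This is a heuristic sketch only, not a signal-peptide predictor; the window size, the 30-residue search region, the function name and the example sequence are hypothetical choices.

```python
from typing import Tuple

# Kyte-Doolittle hydropathy index per residue.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def most_hydrophobic_window(protein: str, window: int = 8,
                            region: int = 30) -> Tuple[int, float]:
    """Return (0-based start, mean hydropathy) of the best window in the N-terminal region."""
    seq = protein.upper()[:region]
    if len(seq) < window:
        raise ValueError("sequence shorter than the window")
    best_start, best_score = 0, float("-inf")
    for start in range(len(seq) - window + 1):
        score = sum(KYTE_DOOLITTLE[aa] for aa in seq[start:start + window]) / window
        if score > best_score:          # strict '>' keeps the first best window
            best_start, best_score = start, score
    return best_start, best_score


if __name__ == "__main__":
    # Hypothetical signal-peptide-like sequence: charged N-region, hydrophobic core.
    print(most_hydrophobic_window("MKK" + "L" * 8 + "S" * 8 + "DEKR"))
```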
[0178] FIGS. 26A-26B and 27A-27B provide examples of a synthetic cell-cell communication circuit that utilizes CPP elements. Panels 26A and 27A depict internal signaling: TAT is expressed upon the addition of Dox, binds to its promoter pTAT, and induces expression of EGFP. Panels 26B and 27B depict cell-cell signaling: sender and receiver circuits are infected into separate cells. Sender cells express TAT upon addition of Dox; TAT is secreted and enters receiver cells, where it localizes to the nucleus and induces the expression of EGFP.
F. Vector Systems
[0179] In some embodiments, the various exogenous genes described above are provided on vectors that are introduced into the desired cell line, such as a stem cell line. In some preferred embodiments, the vectors are lentiviral vectors. In some embodiments, additional genes required for the synthesis of precursors of C₄HSL, C₁₄HSL, or 3OC₆HSL are also introduced into the desired cell line. In some preferred embodiments, vectors encoding one or more genes from the bacterial Type II fatty acid synthesis (FAS) system are also introduced into mammalian cells. In some embodiments, the vectors encode Acyl Carrier Protein (ACP) and Acyl-Acyl Carrier Protein Synthase (AAS).
[0180] In preferred embodiments, the components of the cell-cell communication system are included in separate vectors. The individual vectors are then introduced into the desired cell line. In preferred embodiments, where lentiviral vectors are utilized, the target cell line is transduced with the lentiviral vectors. The cell line may be co-infected with the vectors or the vectors may be introduced serially.
[0181] In some preferred embodiments, the genes for the components of the cell-cell communication system of the present invention (e.g., LuxI, ACP, and AAS) are optimized for expression in mammalian cells. In other embodiments, genes comprising the entire bacterial Type II Fatty Acid Synthase system are included on vectors and introduced in the cell line.
[0182] In preferred embodiments, as described in more detail below, the components of the cell-cell communication systems are introduced into a mammalian cell line. The present invention is not limited to the use of any particular cell lines. In some embodiments, the cell lines are pluripotent, multipotent or totipotent cell lines. In some preferred embodiments, the cell lines are stem cell lines. In preferred embodiments, the components are incorporated into expression vectors that are introduced into mammalian cells. In some preferred embodiments, the genes encoding the components of the cell-cell communication system are operably linked to exogenous promoters.
[0183] The modules and motifs of the present invention generally comprise multiple exogenous genes operably linked to promoters. In preferred embodiments, the exogenous genes operably linked to promoters are included in a vector. In preferred embodiments, the vectors are introduced into a cell. The present invention is not limited to the use of any particular vector system. Indeed, the use of a variety of vector systems is contemplated. In some preferred embodiments, the vectors are lentiviral vectors. In other embodiments, the vectors are retroviral vectors, pseudotyped retroviral vectors, pseudotyped lentiviral vectors, adenovirus vectors, plasmids, transposons, or artificial chromosomes.
[0184] The present invention also contemplates the use of lentiviral vectors to generate high copy number cell lines. The lentiviruses (e.g., equine infectious anemia virus, caprine arthritis-encephalitis virus, human immunodeficiency virus) are a subfamily of retroviruses that are able to infect non-dividing cells and integrate into their genomes. The lentiviral genome and the proviral DNA have the three genes found in all retroviruses, gag, pol, and env, which are flanked by two LTR sequences. The gag gene encodes the internal structural proteins (e.g., matrix, capsid, and nucleocapsid proteins); the pol gene encodes the reverse transcriptase, protease, and integrase proteins; and the env gene encodes the viral envelope glycoproteins. The 5' and 3' LTRs control transcription and polyadenylation of the viral RNAs. Additional genes in the lentiviral genome include the vif, vpr, tat, rev, vpu, nef, and vpx genes.
[0185] A variety of lentiviral vectors and packaging cell lines are known in the art and find use in the present invention (See, e.g., U.S. Pat. Nos. 5,994,136 and 6,013,516, both of which are herein incorporated by reference). Furthermore, the VSV G protein has been used to pseudotype retroviral vectors based upon the human immunodeficiency virus (HIV) (Naldini et al., Science 272:263 [1996]). Thus, the VSV G protein may be used to generate a variety of pseudotyped retroviral vectors and is not limited to vectors based on MoMLV. The lentiviral vectors may also be modified as described above to contain various regulatory sequences (e.g., signal peptide sequences, RNA export elements, and IRESs). After the lentiviral vectors are produced, they may be used to transduce host cells as described above for retroviral vectors.
[0186] In general, retroviruses (family Retroviridae) are divided into three groups: the spumaviruses (e.g., human foamy virus); the lentiviruses (e.g., human immunodeficiency virus and sheep visna virus) and the oncoviruses (e.g., MLV, Rous sarcoma virus).
[0187] Retroviruses are enveloped (i.e., surrounded by a host cell-derived lipid bilayer membrane) single-stranded RNA viruses which infect animal cells. When a retrovirus infects a cell, its RNA genome is converted into a double-stranded linear DNA form (i.e., it is reverse transcribed). The DNA form of the virus is then integrated into the host cell genome as a provirus. The provirus serves as a template for the production of additional viral genomes and viral mRNAs. Mature viral particles containing two copies of genomic RNA bud from the surface of the infected cell. The viral particle comprises the genomic RNA, reverse transcriptase and other pol gene products inside the viral capsid (which contains the viral gag gene products), which is surrounded by a lipid bilayer membrane derived from the host cell containing the viral envelope glycoproteins (also referred to as membrane-associated proteins).
[0188] The organization of the genomes of numerous retroviruses is well known in the art, and this has allowed the adaptation of the retroviral genome to produce retroviral vectors. The production of a recombinant retroviral vector carrying a gene of interest is typically achieved in two stages.
[0189] First, the gene of interest is inserted into a retroviral vector which contains the sequences necessary for the efficient expression of the gene of interest (including promoter and/or enhancer elements which may be provided by the viral long terminal repeats (LTRs) or by an internal promoter/enhancer and relevant splicing signals), sequences required for the efficient packaging of the viral RNA into infectious virions (e.g., the packaging signal (Psi), the tRNA primer binding site (-PBS), the 3' regulatory sequences required for reverse transcription (+PBS)) and the viral LTRs. The LTRs contain sequences required for the association of viral genomic RNA, reverse transcriptase and integrase functions, and sequences involved in directing the expression of the genomic RNA to be packaged in viral particles. For safety reasons, many recombinant retroviral vectors lack functional copies of the genes that are essential for viral replication (these essential genes are either deleted or disabled); therefore, the resulting virus is said to be replication defective.
[0190] Second, following the construction of the recombinant vector, the vector DNA is introduced into a packaging cell line. Packaging cell lines provide proteins required in trans for the packaging of the viral genomic RNA into viral particles having the desired host range (i.e., the viral-encoded gag, pol and env proteins). The host range is controlled, in part, by the type of envelope gene product expressed on the surface of the viral particle. Packaging cell lines may express ecotropic, amphotropic or xenotropic envelope gene products. Alternatively, the packaging cell line may lack sequences encoding a viral envelope (env) protein. In this case the packaging cell line will package the viral genome into particles that lack a membrane-associated protein (e.g., an env protein). In order to produce viral particles containing a membrane-associated protein that will permit entry of the virus into a cell, the packaging cell line containing the retroviral sequences is transfected with sequences encoding a membrane-associated protein (e.g., the G protein of vesicular stomatitis virus (VSV)). The transfected packaging cell will then produce viral particles, which contain the membrane-associated protein expressed by the transfected packaging cell line; these viral particles, which contain viral genomic RNA derived from one virus encapsidated by the envelope proteins of another virus, are said to be pseudotyped virus particles.
[0191] The retroviral vectors of the present invention can be further modified to include additional regulatory sequences. As described above, the retroviral vectors of the present invention include the following elements in operable association: a) a 5' LTR; b) a packaging signal; c) a 3' LTR and d) a nucleic acid encoding a protein of interest located between the 5' and 3' LTRs. In some embodiments, the nucleic acid of interest is operably linked to a promoter of interest. As described above, in preferred embodiments, the nucleic acid of interest is a gene encoding a protein from a quorum sensing system, or a gene encoding a cell fate regulator. In some preferred embodiments, the promoter is a promoter responsive to an autoinducer/regulatory partner complex synthesized by the quorum sensing pathway, an inducible promoter, a repressible promoter, or a stage specific promoter such as the AFP promoter. In some embodiments of the present invention, the nucleic acid of interest may be arranged in opposite orientation to the 5' LTR when transcription from an internal promoter is desired.
[0192] In other embodiments of the present invention, the vectors are modified by incorporating an RNA export element (See, e.g., U.S. Pat. Nos. 5,914,267; 6,136,597; and 5,686,120; and WO99/14310, all of which are incorporated herein by reference) either 3' or 5' to the nucleic acid sequence encoding the protein of interest. It is contemplated that the use of RNA export elements allows high levels of expression of the protein of interest without incorporating splice signals or introns in the nucleic acid sequence encoding the protein of interest.
[0193] In still other embodiments, the vector further comprises at least one internal ribosome entry site (IRES) sequence. The sequences of several suitable IRESs are available, including, but not limited to, those derived from foot-and-mouth disease virus (FMDV), encephalomyocarditis virus, and poliovirus. The IRES sequence can be interposed between two coding sequences (e.g., nucleic acids encoding different proteins of interest or subunits of a multisubunit protein such as an antibody) to form a polycistronic sequence, so that both coding sequences are transcribed from the same promoter as a single mRNA and translated independently.
[0194] The retroviral vectors of the present invention may also further comprise a selectable marker allowing selection of transformed cells. A number of selectable markers find use in the present invention, including, but not limited to, the bacterial aminoglycoside 3' phosphotransferase gene (also referred to as the neo gene) that confers resistance to the drug G418 in mammalian cells, the bacterial hygromycin B phosphotransferase (hyg) gene that confers resistance to the antibiotic hygromycin, and the bacterial xanthine-guanine phosphoribosyl transferase gene (also referred to as the gpt gene) that confers the ability to grow in the presence of mycophenolic acid. In some embodiments, the selectable marker gene is provided as part of a polycistronic sequence that also encodes the protein of interest.
[0195] Viral vectors, including recombinant lentiviral vectors, provide a more efficient means of transferring genes into cells than other techniques such as calcium phosphate-DNA co-precipitation, DEAE-dextran-mediated transfection, electroporation or microinjection of nucleic acids. It is believed that the efficiency of viral transfer is due in part to the fact that the transfer of nucleic acid is a receptor-mediated process (i.e., the virus binds to a specific receptor protein on the surface of the cell to be infected). In addition, virally transferred nucleic acid, once inside a cell, integrates in a controlled manner, in contrast to nucleic acids transferred by other means such as calcium phosphate-DNA co-precipitation, which are subject to rearrangement and degradation.
[0196] The most commonly used recombinant retroviral vectors are derived from the amphotropic Moloney murine leukemia virus (MoMLV) (See e.g., Miller and Baltimore Mol. Cell. Biol. 6:2895 [1986]). The MoMLV system has several advantages: 1) this specific retrovirus can infect many different cell types, 2) established packaging cell lines are available for the production of recombinant MoMLV viral particles, and 3) the transferred genes are permanently integrated into the target cell chromosome. The established MoMLV vector systems comprise a DNA vector containing a small portion of the retroviral sequence (e.g., the viral long terminal repeat or "LTR" and the packaging or "psi" signal) and a packaging cell line. The gene to be transferred is inserted into the DNA vector. The viral sequences present on the DNA vector provide the signals necessary for the insertion or packaging of the vector RNA into the viral particle and for the expression of the inserted gene. The packaging cell line provides the proteins required for particle assembly (Markowitz et al., J. Virol. 62:1120 [1988]).
[0197] In some preferred embodiments, the retroviral vector is pseudotyped (See, e.g., U.S. Pat. No. 5,512,421, which is incorporated herein by reference). In some preferred embodiments, the pseudotyped retrovirus contains the G protein of vesicular stomatitis virus (VSV) as the membrane-associated protein. Retroviral envelope proteins bind to a specific cell surface protein receptor to gain entry into a cell. The VSV G protein, by contrast, interacts with a phospholipid component of the plasma membrane (Mastromarino et al., J. Gen. Virol. 68:2359 [1977]). Because entry of VSV into a cell does not depend on the presence of specific protein receptors, VSV has an extremely broad host range. Pseudotyped retroviral vectors bearing the VSV G protein have an altered host range characteristic of VSV (i.e., they can infect almost all species of vertebrate, invertebrate and insect cells). Importantly, VSV G-pseudotyped retroviral vectors can be concentrated 2000-fold or more by ultracentrifugation without significant loss of infectivity (Burns et al. Proc. Natl. Acad. Sci. USA 90:8033 [1993]).
The present invention also contemplates the use of adeno-associated virus (AAV) vectors. AAV is a human DNA parvovirus that belongs to the genus Dependovirus. The AAV genome is a linear, single-stranded DNA molecule of approximately 4680 bases. The genome includes inverted terminal repeats (ITRs) at each end that function in cis as origins of DNA replication and as packaging signals for the virus. The internal nonrepeated portion of the genome includes two large open reading frames, known as the AAV rep and cap regions. These regions code for the viral proteins involved in replication and packaging of the virion, respectively. A family of at least four viral proteins is synthesized from the AAV rep region: Rep 78, Rep 68, Rep 52 and Rep 40, named according to their apparent molecular weights. The AAV cap region encodes at least three proteins, VP1, VP2 and VP3 (for a detailed description of the AAV genome, see e.g., Muzyczka, Current Topics Microbiol. Immunol. 158:97-129 [1992]; Kotin, Human Gene Therapy 5:793-801 [1994]).
[0198] AAV requires coinfection with an unrelated helper virus, such as adenovirus, a herpesvirus or vaccinia, in order for a productive infection to occur. In the absence of such coinfection, AAV establishes a latent state by insertion of its genome into a host cell chromosome. Subsequent infection by a helper virus rescues the integrated copy, which can then replicate to produce infectious viral progeny. Unlike the non-pseudotyped retroviruses, AAV has a wide host range and is able to replicate in cells from any species so long as there is coinfection with a helper virus that will also multiply in that species. Thus, for example, human AAV will replicate in canine cells coinfected with a canine adenovirus. Furthermore, unlike the retroviruses, AAV is not associated with any human or animal disease, does not appear to alter the biological properties of the host cell upon integration and is able to integrate into nondividing cells. It has also recently been found that AAV is capable of site-specific integration into a host cell genome.
[0199] In light of the above-described properties, a number of recombinant AAV vectors have been developed for gene delivery (See, e.g., U.S. Pat. Nos. 5,173,414 and 5,139,941; WO 92/01070 and WO 93/03769, all of which are incorporated herein by reference; Lebkowski et al., Molec. Cell. Biol. 8:3988-3996 [1988]; Carter, Current Opinion in Biotechnology 3:533-539 [1992]; Muzyczka, Current Topics in Microbiol. and Immunol. 158:97-129 [1992]; Kotin, Human Gene Therapy 5:793-801 [1994]; Shelling and Smith, Gene Therapy 1:165-169 [1994]; and Zhou et al., J. Exp. Med. 179:1867-1875 [1994]).
[0200] Recombinant AAV virions can be produced in a suitable host cell that has been transfected with both an AAV helper plasmid and an AAV vector. An AAV helper plasmid generally includes the AAV rep and cap coding regions but lacks the AAV ITRs. As a result, the helper plasmid can neither replicate nor package itself. An AAV vector generally includes a selected gene of interest bounded by AAV ITRs, which provide the viral replication and packaging functions. Both the helper plasmid and the AAV vector bearing the selected gene are introduced into a suitable host cell by transient transfection. The transfected cell is then infected with a helper virus, such as an adenovirus. The helper virus transactivates the AAV promoters on the helper plasmid, which direct expression of the AAV rep and cap regions. Recombinant AAV virions harboring the selected gene are formed and can be purified from the preparation. Once produced, the AAV vectors may be used to transduce host cells (See, e.g., U.S. Pat. No. 5,843,742, herein incorporated by reference) at the desired multiplicity of infection to produce high copy number host cells. As will be understood by those skilled in the art, the AAV vectors may also be modified as described above to contain various regulatory sequences (e.g., signal peptide sequences, RNA export elements, and IRES's).
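Transducing host cells "at the desired multiplicity of infection" requires converting an MOI into a volume of vector stock. The following is a minimal sketch of that arithmetic. It assumes the stock titer is expressed in transducing units (TU) per mL. The function name and the example numbers are hypothetical illustrations and are not values from this application.

```python
def vector_volume_ml(cell_count: float, moi: float, titer_tu_per_ml: float) -> float:
    """Volume of vector stock (mL) that delivers `moi` transducing units per cell.

    Assumes the titer is in transducing units (TU) per mL, so the total number
    of units required is cell_count * moi.
    """
    if cell_count <= 0 or moi <= 0 or titer_tu_per_ml <= 0:
        raise ValueError("cell_count, moi and titer must be positive")
    total_units = cell_count * moi
    return total_units / titer_tu_per_ml


if __name__ == "__main__":
    # Hypothetical example: 2e5 cells, MOI of 5, stock titer of 1e8 TU/mL.
    vol_ml = vector_volume_ml(2e5, 5, 1e8)
    print(f"{vol_ml * 1000:.1f} uL")  # 1e6 TU / 1e8 TU/mL = 0.01 mL = 10.0 uL
```

In practice the titer would be determined for each vector preparation. The calculation itself is the same for any vector type.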
[0201] The present invention also contemplates the use of transposon vectors. Transposons are mobile genetic elements that can move, or transpose, from one location to another in the genome. Transposition within the genome is controlled by a transposase enzyme that is encoded by the transposon. Many examples of transposons are known in the art, including, but not limited to, Tn5 (See e.g., de la Cruz et al., J. Bact. 175: 6932-38 [1993]), Tn7 (See e.g., Craig, Curr. Topics Microbiol. Immunol. 204: 27-48 [1996]), and Tn10 (See e.g., Morisato and Kleckner, Cell 51:101-111 [1987]). The ability of transposons to integrate into genomes has been used to create transposon vectors (See, e.g., U.S. Pat. Nos. 5,719,055; 5,968,785; 5,958,775; and 6,027,722; all of which are incorporated herein by reference). Because transposons are not infectious, transposon vectors are introduced into host cells by methods known in the art (e.g., electroporation, lipofection, or microinjection). The ratio of transposon vectors to host cells may therefore be adjusted to provide the desired multiplicity of infection (vector copies per cell) and thereby produce the high copy number host cells of the present invention.
[0202] Transposon vectors suitable for use in the present invention generally comprise a nucleic acid encoding a protein of interest interposed between two transposon insertion sequences. Some vectors also comprise a nucleic acid sequence encoding a transposase enzyme. In these vectors, one of the insertion sequences is positioned between the transposase gene and the nucleic acid encoding the protein of interest. This placement ensures that the transposase gene is not incorporated into the genome of the host cell during transposition. Alternatively, the transposase enzyme may be provided separately by a suitable method (e.g., lipofection or microinjection). As will be understood by those skilled in the art, the transposon vectors may also be modified as described above to contain various regulatory sequences (e.g., signal peptide sequences, RNA export elements, and IRES's).
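The layout rule above can be checked mechanically: the transposase gene sits outside the pair of insertion sequences, so only the cargo between them integrates. The sketch below models a vector as an ordered list of element names and treats transposition as excision of everything bounded by the two insertion sequences. The element labels ("IS", "transposase", and so on) and the example layout are hypothetical, not sequences from this application.

```python
from typing import List


def integrated_cargo(elements: List[str], end_marker: str = "IS") -> List[str]:
    """Return the elements lying between the two insertion sequences.

    Models transposition as cut-and-paste of everything bounded by the two
    insertion sequences; elements outside them stay on the donor vector.
    """
    positions = [i for i, name in enumerate(elements) if name == end_marker]
    if len(positions) != 2:
        raise ValueError("expected exactly two insertion sequences")
    left, right = positions
    return elements[left + 1:right]


def transposase_excluded(elements: List[str]) -> bool:
    """True if the transposase gene would not be carried into the genome."""
    return "transposase" not in integrated_cargo(elements)


if __name__ == "__main__":
    # Hypothetical layout: transposase | IS | promoter | gene of interest | IS
    vector = ["transposase", "IS", "promoter", "gene_of_interest", "IS"]
    print(integrated_cargo(vector))      # ['promoter', 'gene_of_interest']
    print(transposase_excluded(vector))  # True
```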
[0203] In some preferred embodiments, the quorum sensing system of the present invention comprises one or more of the vectors described in FIGS. 5A-5D through 25A-25G and SEQ ID NOs: 01-12. FIGS. 5A-5D and 6 provide a map and sequence (SEQ ID NO:01) for the LuxI vector, pLV-Hef1a-LuxIm-IRES2-DsRed2; FIGS. 7A-7E and 8 provide a map and sequence (SEQ ID NO:02) for the LuxR vector, pLV-Hef1a-p65H4LuxRFm-IRES2-DsRed2; FIGS. 9A-9E and 10 provide a map and sequence (SEQ ID NO:03) for the lux promoter vector for expression of Gata4/Sox17, pLV-minCMVLuxO7-IRES2-EGFP; FIGS. 11A-11E and 12 provide a map and sequence (SEQ ID NO:04) for the ACP vector, pLV-Hef1a-ACPm-IRES2-DsRed2; FIGS. 13A-13F and 14 provide a map and sequence (SEQ ID NO:05) for the AAS vector, pLV-Hef1a-AAS-IRES2-EGFP; FIGS. 15A-15E and 16 provide a map and sequence (SEQ ID NO:06) for the AFP promoter vector for Pdx1, pLV-AFP-Pdx1-IRES2-DsRed2; FIGS. 17A-17E and 18 provide a map and sequence (SEQ ID NO:07) for the AFP promoter vector for Ngn3, pLV-AFP-Ngn3-IRES2-DsRed2; FIGS. 19A-19E and 20 provide a map and sequence (SEQ ID NO:08) for the AFP promoter vector for TetRKRAB, pLV-AFP-TetRKRAB-IRES2-DsRed2; and FIGS. 21A-21E and 22 provide a map and sequence (SEQ ID NO:09) for the RhlI vector, pLV-Hef1a-RhlI-IRES2-DsRed2.
[0204] It will be recognized that the foregoing vectors are plasmid vectors used to produce lentiviral vectors, which are in turn used to transduce target cells such as embryonic stem cells or adult stem cells. The quorum sensing pathway components are derived from bacterial genes. In preferred embodiments, the bacterial genes are codon optimized for expression in mammalian cells. It will also be recognized that the sequences of the components of the vectors may be varied. Accordingly, the present invention encompasses the use of vector components that are at least 50%, 70%, 80%, 90%, or 95% identical to the wild type gene of interest and that maintain the function of the gene of interest. These components include genes of interest such as LuxI, CinI, RhlI, LuxR, CinR, and RhlR, and any of the cell fate regulators. Likewise, the present invention encompasses the use of promoters that are at least 50%, 70%, 80%, 90%, or 95% identical to the promoter sequences described herein, such as the lux promoter and the AFP promoter. In some preferred embodiments, the genes encoding LuxR, CinR, or RhlR are modified to include a mammalian activation domain. In preferred embodiments, the mammalian activation domain is the P65 mammalian activation domain, so that the LuxR, CinR, or RhlR regulator protein is expressed as a fusion with a mammalian activation domain. Accordingly, in some embodiments, the present invention provides vectors, and systems comprising vectors, that comprise a gene encoding a regulator protein-mammalian activation domain fusion protein.
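This application does not specify how percent identity is to be calculated. One common convention is assumed here: count the columns of a pairwise alignment in which both sequences carry the same residue, then divide by the alignment length. Under that convention, gap columns count toward the length but never as matches. The sketch takes two sequences that are already aligned ('-' marks a gap). Producing the alignment itself is outside its scope. The toy sequences are hypothetical fragments, not sequences from this application.

```python
THRESHOLDS = (50, 70, 80, 90, 95)  # percent identity levels recited in the text


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both sequences.

    Assumes the inputs are already aligned (equal length, '-' marks a gap).
    Gap columns count toward the length but never as matches.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(
        1 for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def thresholds_met(identity: float) -> list:
    """Return which of the recited identity levels a variant satisfies."""
    return [t for t in THRESHOLDS if identity >= t]


if __name__ == "__main__":
    # Hypothetical aligned fragments differing at 3 of 18 positions.
    wild_type = "ATGGCTAGCAAAGGAGAA"
    variant = "ATGGCAAGCAAGGGTGAA"
    pid = percent_identity(wild_type, variant)
    print(f"{pid:.1f}% identical; meets {thresholds_met(pid)}")
    # 83.3% identical; meets [50, 70, 80]
```

Other conventions, such as dividing by the length of the shorter sequence, give different values for gapped alignments. The convention should therefore be fixed before comparing variants.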
[0205] In further embodiments, the present invention provides promoters that are inducible by a regulatory protein-autoinducer complex. In some embodiments, the promoter comprises at least one, and preferably 2, 3, 4, 5, 6, 7, 8, 9, or 10, sequences that bind the regulatory protein-autoinducer complex. In some embodiments, the promoter further comprises a minimal element from a mammalian promoter. In some embodiments, the minimal element is derived from the cytomegalovirus promoter; an example of such a promoter is provided in FIGS. 9A-9E and 10 (SEQ ID NO:3).
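The promoter architecture described above can be sketched as simple string assembly: one to ten binding sites for the regulatory protein-autoinducer complex, placed upstream of a minimal promoter element. The binding-site, spacer, and minimal-promoter strings below are hypothetical placeholders. They are not the lux binding sequence or the minimal CMV sequence of SEQ ID NO:3.

```python
def build_inducible_promoter(binding_site: str, n_sites: int,
                             minimal_promoter: str, spacer: str = "") -> str:
    """Join n_sites copies of a binding site, then a minimal promoter (5'->3').

    All sequences are supplied by the caller. This function only enforces the
    1-10 site range described in the text and places an optional spacer
    between adjacent elements.
    """
    if not 1 <= n_sites <= 10:
        raise ValueError("n_sites must be between 1 and 10")
    operator_array = spacer.join([binding_site] * n_sites)
    return operator_array + spacer + minimal_promoter


if __name__ == "__main__":
    # Placeholder sequences for illustration only.
    site = "ACGT" * 5            # hypothetical 20-bp binding site
    min_promoter = "TATA" * 10   # hypothetical minimal promoter stand-in
    promoter = build_inducible_promoter(site, 7, min_promoter, spacer="NN")
    print(len(promoter))  # 7*20 + 7*2 + 40 = 194
```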
[0206] Accordingly, in some preferred embodiments, the present invention provides vectors with the following components in operable association:
[0207] 5'LTR-Promoter-Mammalian codon optimized LuxI-3'LTR
[0208] 5'LTR-Promoter-Mammalian codon optimized LuxI-IRES Reporter-3'LTR
[0209] 5'LTR-Repressible promoter-Mammalian codon optimized LuxI-3'LTR
[0210] 5'LTR-LacI promoter-Mammalian codon optimized LuxI-3'LTR
[0211] 5'LTR-Promoter-Mammalian codon optimized RhlI-3'LTR
[0212] 5'LTR-Promoter-Mammalian codon optimized RhlI-IRES Reporter-3'LTR
[0213] 5'LTR-Repressible promoter-Mammalian codon optimized RhlI-3'LTR
[0214] 5'LTR-LacI promoter-Mammalian codon optimized RhlI-3'LTR
[0215] 5'LTR-Promoter-Mammalian codon optimized CinI-3'LTR
[0216] 5'LTR-Promoter-Mammalian codon optimized CinI-IRES Reporter-3'LTR
[0217] 5'LTR-Repressible promoter-Mammalian codon optimized CinI-3'LTR
[0218] 5'LTR-LacI promoter-Mammalian codon optimized CinI-3'LTR
[0219] 5'LTR-Promoter-Mammalian codon optimized LuxR-3'LTR
[0220] 5'LTR-Promoter-Mammalian codon optimized LuxR-IRES Reporter-3'LTR
[0221] 5'LTR-Repressible promoter-Mammalian codon optimized LuxR-3'LTR
[0222] 5'LTR-LacI promoter-Mammalian codon optimized LuxR-3'LTR
[0223] 5'LTR-Promoter-Mammalian codon optimized RhlR-3'LTR
[0224] 5'LTR-Promoter-Mammalian codon optimized RhlR-IRES Reporter-3'LTR
[0225] 5'LTR-Repressible promoter-Mammalian codon optimized RhlR-3'LTR
[0226] 5'LTR-LacI promoter-Mammalian codon optimized RhlR-3'LTR
[0227] 5'LTR-Promoter-Mammalian codon optimized CinR-3'LTR
[0228] 5'LTR-Promoter-Mammalian codon optimized CinR-IRES Reporter-3'LTR
[0229] 5'LTR-Repressible promoter-Mammalian codon optimized CinR-3'LTR
[0230] 5'LTR-LacI promoter-Mammalian codon optimized CinR-3'LTR
[0231] 5'LTR-Promoter-Mammalian codon optimized LuxR P65 fusion-3'LTR
[0232] 5'LTR-Promoter-Mammalian codon optimized LuxR P65 fusion-IRES Reporter-3'LTR
[0233] 5'LTR-Repressible promoter-Mammalian codon optimized LuxR P65 fusion-3'LTR
[0234] 5'LTR-LacI promoter-Mammalian codon optimized LuxR P65 fusion-3'LTR
[0235] 5'LTR-Promoter-Mammalian codon optimized RhlR P65 fusion-3'LTR
[0236] 5'LTR-Promoter-Mammalian codon optimized RhlR P65 fusion-IRES Reporter-3'LTR
[0237] 5'LTR-Repressible promoter-Mammalian codon optimized RhlR P65 fusion-3'LTR
[0238] 5'LTR-LacI promoter-Mammalian codon optimized RhlR P65 fusion-3'LTR
[0239] 5'LTR-Promoter-Mammalian codon optimized CinR P65 fusion-3'LTR
[0240] 5'LTR-Promoter-Mammalian codon optimized CinR P65 fusion-IRES Reporter-3'LTR
[0241] 5'LTR-Repressible promoter-Mammalian codon optimized CinR P65 fusion-3'LTR
[0242] 5'LTR-LacI promoter-Mammalian codon optimized CinR P65 fusion-3'LTR
[0243] 5'LTR-regulatory protein/autoinducer responsive promoter-cell fate regulator gene-3'LTR
[0244] 5'LTR-regulatory protein/autoinducer responsive promoter-cell fate regulator gene-IRES-reporter gene-3'LTR
[0245] 5'LTR-regulatory protein/autoinducer responsive promoter-cell fate regulator gene-IRES-selectable marker-3'LTR
[0246] 5'LTR-lux promoter-cell fate regulator gene-3'LTR
[0247] 5'LTR-regulatory protein/autoinducer responsive promoter-Gata4 gene-3'LTR
[0248] 5'LTR-regulatory protein/autoinducer responsive promoter-Sox17-3'LTR
[0249] 5'LTR-promoter-mammalian codon optimized ACP-3'LTR
[0250] 5'LTR-promoter-mammalian codon optimized AAS-3'LTR
[0251] 5'LTR-stage specific promoter-cell fate regulator-3'LTR
[0252] 5'LTR-AFP promoter-cell fate regulator-3'LTR
[0253] 5'LTR-stage specific promoter-cell fate regulator-IRES-reporter-3'LTR
[0254] 5'LTR-AFP promoter-cell fate regulator-IRES-reporter-3'LTR
[0255] 5'LTR-stage specific promoter-cell fate regulator-IRES-selectable marker-3'LTR
[0256] 5'LTR-AFP promoter-cell fate regulator-IRES-selectable marker-3'LTR
[0257] 5'LTR-stage specific promoter-Pdx1 gene-3'LTR
[0258] 5'LTR-AFP promoter-Pdx1 gene-3'LTR
[0259] 5'LTR-stage specific promoter-Pdx1 gene-IRES-reporter-3'LTR
[0260] 5'LTR-AFP promoter-Pdx1 gene-IRES-reporter-3'LTR
[0261] 5'LTR-stage specific promoter-Pdx1 gene-IRES-selectable marker-3'LTR
[0262] 5'LTR-AFP promoter-Pdx1 gene-IRES-selectable marker-3'LTR
[0263] 5'LTR-stage specific promoter-Ngn3 gene-3'LTR
[0264] 5'LTR-AFP promoter-Ngn3 gene-3'LTR
[0265] 5'LTR-stage specific promoter-Ngn3 gene-IRES-reporter-3'LTR
[0266] 5'LTR-AFP promoter-Ngn3 gene-IRES-reporter-3'LTR
[0267] 5'LTR-stage specific promoter-Ngn3 gene-IRES-selectable marker-3'LTR
[0268] 5'LTR-AFP promoter-Ngn3 gene-IRES-selectable marker-3'LTR
[0269] 5'LTR-stage specific promoter-TetR-3'LTR
[0270] 5'LTR-AFP promoter-TetR-3'LTR
[0271] 5'LTR-stage specific promoter-TetR-IRES-reporter-3'LTR
[0272] 5'LTR-AFP promoter-TetR-IRES-reporter-3'LTR
[0273] 5'LTR-stage specific promoter-TetR-IRES-selectable marker-3'LTR
[0274] 5'LTR-AFP promoter-TetR-IRES-selectable marker-3'LTR
[0275] 5'LTR-terminal differentiation promoter-repressor-3'LTR
[0276] 5'LTR-MIP promoter-repressor-3'LTR
[0277] 5'LTR-stage specific promoter-repressor-IRES-reporter-3'LTR
[0278] 5'LTR-MIP promoter-repressor-IRES-reporter-3'LTR
[0279] 5'LTR-stage specific promoter-repressor-IRES-selectable marker-3'LTR
[0280] 5'LTR-MIP promoter-repressor-IRES-selectable marker-3'LTR
[0281] 5'LTR-terminal differentiation promoter-LacI-3'LTR
[0282] 5'LTR-MIP promoter-LacI-3'LTR
[0283] 5'LTR-stage specific promoter-LacI-IRES-reporter-3'LTR
[0284] 5'LTR-MIP promoter-LacI-IRES-reporter-3'LTR
[0285] 5'LTR-stage specific promoter-LacI-IRES-selectable marker-3'LTR
[0286] 5'LTR-MIP promoter-LacI-IRES-selectable marker-3'LTR
In the vectors described above, the 5' and 3' LTRs are preferably retroviral LTRs and most preferably lentiviral LTRs. The vectors may comprise other elements in addition to those listed.
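Paragraphs [0207]-[0242] follow a regular pattern. Each of nine coding sequences (three autoinducer synthases, three regulators, and three P65-fused regulators) is paired with the same four layouts, for 36 entries. The sketch below regenerates those entries from the pattern. It is only a bookkeeping aid for checking the list for omissions or typos, and the template strings are transcriptions of the listed entries rather than additional embodiments.

```python
from itertools import product

GENES = [
    "LuxI", "RhlI", "CinI",                      # autoinducer synthases
    "LuxR", "RhlR", "CinR",                      # autoinducer-binding regulators
    "LuxR P65 fusion", "RhlR P65 fusion", "CinR P65 fusion",
]
LAYOUTS = [
    "5'LTR-Promoter-Mammalian codon optimized {gene}-3'LTR",
    "5'LTR-Promoter-Mammalian codon optimized {gene}-IRES Reporter-3'LTR",
    "5'LTR-Repressible promoter-Mammalian codon optimized {gene}-3'LTR",
    "5'LTR-LacI promoter-Mammalian codon optimized {gene}-3'LTR",
]


def enumerate_constructs(first_paragraph: int = 207):
    """Yield (paragraph number, construct), gene-major and layout-minor."""
    for offset, (gene, layout) in enumerate(product(GENES, LAYOUTS)):
        yield first_paragraph + offset, layout.format(gene=gene)


if __name__ == "__main__":
    for number, construct in enumerate_constructs():
        print(f"[{number:04d}] {construct}")
```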
G. Cells
[0288] The quorum sensing system of the present invention may be introduced into a variety of mammalian cell types. In preferred embodiments, the quorum sensing system is introduced into embryonic or adult stem cells. However, the quorum sensing systems may be introduced into any mammalian cell line, including, but not limited to, 293 cells; Chinese hamster ovary cells (CHO-K1, ATCC CCL-61); bovine mammary epithelial cells (ATCC CRL 10274); monkey kidney CV1 line transformed by SV40 (COS-7, ATCC CRL 1651); human embryonic kidney line (293 or 293 cells subcloned for growth in suspension culture; see, e.g., Graham et al., J. Gen. Virol., 36:59 [1977]); baby hamster kidney cells (BHK, ATCC CCL 10); mouse Sertoli cells (TM4, Mather, Biol. Reprod. 23:243-251 [1980]); monkey kidney cells (CV1, ATCC CCL 70); African green monkey kidney cells (VERO-76, ATCC CRL-1587); human cervical carcinoma cells (HeLa, ATCC CCL 2); canine kidney cells (MDCK, ATCC CCL 34); buffalo rat liver cells (BRL 3A, ATCC CRL 1442); human lung cells (WI38, ATCC CCL 75); human liver cells (Hep G2, HB 8065); mouse mammary tumor (MMT 060562, ATCC CCL51); TRI cells (Mather et al., Annals N.Y. Acad. Sci., 383:44-68 [1982]); MRC 5 cells; FS4 cells; rat fibroblasts (208F cells); MDBK cells (bovine kidney cells); and a human hepatoma line (Hep G2).
[0289] The present invention is not limited to the use of any particular type of embryonic stem cells. Indeed, the use of embryonic stem cells from a number of animal species is contemplated. Methods for obtaining totipotent or pluripotent cells from humans, monkeys, mice, rats, pigs, cattle and sheep have been previously described. See, e.g., U.S. Pat. Nos. 5,453,357; 5,523,226; 5,589,376; 5,340,740; and 5,166,065 (all of which are specifically incorporated herein by reference); as well as Evans, et al., Theriogenology 33(1):125-128, 1990; Notarianni, et al., J. Reprod. Fert. 41(Suppl.):51-56, 1990; Giles, et al., Mol. Reprod. Dev. 36:130-138, 1993; Graves, et al., Mol. Reprod. Dev. 36:424-433, 1993; Sukoyan, et al., Mol. Reprod. Dev. 33:418-431, 1992; Sukoyan, et al., Mol. Reprod. Dev. 36:148-158, 1993; Iannaccone, et al., Dev. Biol. 163:288-292, 1994; Evans & Kaufman, Nature 292:154-156, 1981; Martin, Proc Natl Acad Sci USA 78:7634-7638, 1981; Doetschman et al. Dev Biol 127:224-227, 1988; and Bradley, et al., Nature 309:255-256, 1984.
[0290] Primate embryonic stem cells are preferably obtained by the methods disclosed in U.S. Pat. Nos. 5,843,780 and 6,200,806, each of which is incorporated herein by reference. Primate (including human) stem cells may also be obtained from commercial sources such as WiCell, Madison, Wis. A preferred medium for isolation of embryonic stem cells is "ES medium." ES medium consists of 80% Dulbecco's modified Eagle's medium (DMEM; no pyruvate, high glucose formulation, Gibco BRL) with 20% fetal bovine serum (FBS; Hyclone), 0.1 mM β-mercaptoethanol (Sigma), and 1% non-essential amino acid stock (Gibco BRL). Preferably, fetal bovine serum batches are compared by testing the clonal plating efficiency of a low passage mouse ES cell line (ES), a cell line developed specifically for this test. FBS batches must be compared because they have been found to vary dramatically in their ability to support embryonic cell growth. However, any other method of assaying whether an FBS batch can support embryonic cells may be used as an alternative.
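For a given batch size, the stated ES medium composition can be converted into component volumes. The following is a minimal sketch with several assumptions that the text does not specify. The non-essential amino acid stock is taken as 1% v/v, the β-mercaptoethanol stock is assumed to be 55 mM, and the 500 mL batch size is arbitrary. DMEM is treated as the balance, so it comes out slightly below 80% once the additives are included.

```python
def es_medium_volumes(total_ml: float = 500.0,
                      fbs_fraction: float = 0.20,
                      neaa_fraction: float = 0.01,
                      bme_final_mM: float = 0.1,
                      bme_stock_mM: float = 55.0) -> dict:
    """Component volumes (mL) for `total_ml` of ES medium.

    The beta-mercaptoethanol volume follows from C1*V1 = C2*V2, and the stock
    concentration is an assumption. DMEM makes up the remainder, so FBS is
    exactly fbs_fraction of the final volume.
    """
    fbs = total_ml * fbs_fraction
    neaa = total_ml * neaa_fraction
    bme = total_ml * bme_final_mM / bme_stock_mM
    dmem = total_ml - fbs - neaa - bme
    if dmem < 0:
        raise ValueError("additive volumes exceed the batch size")
    return {"DMEM": dmem, "FBS": fbs, "NEAA stock": neaa, "beta-ME stock": bme}


if __name__ == "__main__":
    for name, ml in es_medium_volumes().items():
        print(f"{name:>14}: {ml:7.2f} mL")
```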
[0291] Primate ES cells are isolated on a confluent layer of murine embryonic fibroblasts in the presence of ES cell medium. Embryonic fibroblasts are preferably obtained from 12-day-old fetuses from outbred CF1 mice (SASCO), but other strains may be used as an alternative. Tissue culture dishes are preferably treated with 0.1% gelatin (type I; Sigma).
Recovery of rhesus monkey embryos has been demonstrated, with an average of 0.4 to 0.6 viable embryos recovered per rhesus monkey per month (Seshagiri et al. Am J Primatol 29:81-91, 1993). Embryo collection from the marmoset monkey is also well documented (Thomson et al. "Non-surgical uterine stage preimplantation embryo collection from the common marmoset," J Med Primatol, 23:333-336 (1994)). Here, the zona pellucida is removed from blastocysts by brief exposure to pronase (Sigma). For immunosurgery, blastocysts are first exposed for 30 minutes to a 1:50 dilution in DMEM of rabbit anti-marmoset spleen cell antiserum (for marmoset blastocysts) or of rabbit anti-rhesus monkey (for rhesus monkey blastocysts). They are then washed three times for 5 minutes each in DMEM and exposed to a 1:5 dilution of guinea pig complement (Gibco) for 3 minutes.
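The immunosurgery step uses fixed dilutions: 1:50 for the antiserum and 1:5 for the complement. The sketch below converts a 1:N dilution and a working volume into stock and diluent volumes. It assumes the common convention that 1:N means one part stock in N parts of final volume. Some laboratories instead read 1:N as one part stock plus N parts diluent, so the intended convention should be confirmed. The 1000 uL working volume is hypothetical.

```python
def dilution_volumes(ratio_denominator: int, working_volume_ul: float):
    """Return (stock_ul, diluent_ul) for a 1:N dilution, N = ratio_denominator.

    Convention assumed: 1:N means 1 part stock in N parts of final volume.
    """
    if ratio_denominator < 1:
        raise ValueError("dilution denominator must be >= 1")
    stock = working_volume_ul / ratio_denominator
    return stock, working_volume_ul - stock


if __name__ == "__main__":
    antiserum = dilution_volumes(50, 1000)   # 1:50 antiserum in DMEM
    complement = dilution_volumes(5, 1000)   # 1:5 guinea pig complement
    print(f"antiserum: {antiserum[0]:.0f} uL + {antiserum[1]:.0f} uL DMEM")
    print(f"complement: {complement[0]:.0f} uL + {complement[1]:.0f} uL diluent")
```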
[0292] After two further washes in DMEM, lysed trophectoderm cells are removed from the intact inner cell mass (ICM) by gentle pipetting, and the ICM is plated on mitotically inactivated (3000 rads gamma irradiation) mouse embryonic fibroblasts. After 7-21 days, ICM-derived masses are removed from endoderm outgrowths with a micropipette under direct observation with a stereo microscope. They are then exposed to 0.05% Trypsin-EDTA (Gibco) supplemented with 1% chicken serum for 3-5 minutes and dissociated by gentle pipetting through a flame-polished micropipette.
[0293] Dissociated cells are replated on embryonic feeder layers in fresh ES medium, and observed for colony formation. Colonies demonstrating ES-like morphology are individually selected, and split again as described above. The ES-like morphology is defined as compact colonies having a high nucleus to cytoplasm ratio and prominent nucleoli. Resulting ES cells are then routinely split by brief trypsinization or exposure to Dulbecco's Phosphate Buffered Saline (without calcium or magnesium and with 2 mM EDTA) every 1-2 weeks as the cultures become dense. Early passage cells are also frozen and stored in liquid nitrogen.
[0294] The present invention is not limited to the use of any particular adult stem cell. The adult stem cell is an undifferentiated (unspecialized) cell that is found in a differentiated (specialized) tissue. It can renew itself and can become specialized to yield the specialized cell types of the tissue from which it originated. These precursor cells exist within the differentiated tissues of adult multicellular organisms. Precursor cells derived from adults can be divided into three categories based on their potential for differentiation: epiblast-like stem cells, germ layer lineage stem cells, and progenitor cells. Precursor cells have been isolated from a wide variety of tissues, including, but not limited to, skeletal muscle, dermis, fat, cardiac muscle, granulation tissue, periosteum, perichondrium, brain, meninges, nerve sheaths, ligaments, tendons, blood vessels, bone marrow, trachea, lungs, esophagus, stomach, liver, intestines, spleen, pancreas, kidney, urinary bladder, and testis. Precursor cells can be released from the connective tissue compartments throughout the body by mechanical disruption and/or enzymatic digestion. They have been isolated from, but not limited to, newborn, adolescent, and geriatric mice, rats and humans, and adult rabbits, dogs, goats, sheep, and pigs.
[0295] The first category of precursor cells, epiblast-like stem cells (ELSCs), consists of stem cells that form cells of all three embryonic germ layer lineages. Stem cells from adult rats and from adult humans can be released from the connective tissue compartments throughout the body by mechanical disruption and/or enzymatic digestion. Stem cells from either adult rats or adult humans can preferably be slow-frozen and stored at -80 °C ± 5 °C using 7.5% ultra-pure dimethyl sulfoxide. Fast thawing of stem cells of both species from the frozen state to ambient temperature yields recovery rates exceeding 98%. In the undifferentiated state, these cells express the Oct-3/4 gene that is characteristic of embryonic stem cells. ELSCs do not spontaneously differentiate in a serum-free environment lacking progression agents, proliferation agents, lineage-induction agents, and/or inhibitory factors, such as recombinant human leukemia inhibitory factor (LIF), recombinant murine leukemia inhibitory factor (ESGRO), or recombinant human anti-differentiation factor (ADF). Embryonic stem cells spontaneously differentiate under these conditions. In contrast, ELSCs derived from both species remain quiescent unless acted upon by specific proliferative and/or inductive agents and/or environment.
[0296] ELSCs proliferate to form multiple confluent layers of cells in vitro in the presence of proliferation agents such as platelet-derived growth factors and respond to lineage-induction agents. ELSCs respond to hepatocyte growth factor by forming cells belonging to the endodermal lineage. Cell lines have expressed phenotypic markers for many discrete cell types of ectodermal, mesodermal, and endodermal origin when exposed to general and specific induction agents.
[0297] The second category of precursor cells consists of three separate stem cells, each of which forms cells of a specific embryonic germ layer lineage (ectodermal stem cells, mesodermal stem cells and endodermal stem cells). When exposed to general and specific inductive agents, germ layer lineage ectodermal stem cells can differentiate into, for example, neuronal progenitor cells, neurons, ganglia, oligodendrocytes, astrocytes, synaptic vesicles, radial glial cells, and keratinocytes.
[0298] The third category of precursor cells present in adult tissues is composed of a multitude of multipotent, tripotent, bipotent, and unipotent progenitor cells. In solid tissues these cells are located near their respective differentiated cell types. Progenitor cells do not typically display phenotypic expression markers for pluripotent ELSCs, such as stage-specific embryonic antigen-4, stage-specific embryonic antigen-1, stage-specific embryonic antigen-3, or carcinoembryonic antigen cell adhesion molecule-1. Similarly, progenitor cells do not typically display phenotypic expression markers for germ layer lineage stem cells, such as nestin for cells of the ectodermal lineage or alpha-fetoprotein for cells of the endodermal lineage.
[0299] A progenitor cell may be multipotent, having the ability to form multiple cell types. One example of a multipotent progenitor cell is a precursor cell of ectodermal origin that resides in the adenohypophysis and is designated the adenohypophyseal progenitor cell. This cell will form gonadotrophs, somatotrophs, thyrotrophs, corticotrophs, and mammotrophs. Progenitor cells for particular cell lineages have unique profiles of cell surface cluster of differentiation (CD) markers and unique profiles of phenotypic differentiation expression markers. Progenitor cells do not typically differentiate spontaneously in serum-free defined medium in the absence of a differentiation-inhibiting agent, such as LIF or ADF. Thus, unlike embryonic stem cells, which spontaneously differentiate under these conditions, progenitor cells remain quiescent unless acted upon by proliferative agents (such as platelet-derived growth factor) and/or progressive agents (such as insulin, insulin-like growth factor-I or insulin-like growth factor-II).
[0300] Progenitor cells can regulate their behavior according to changing demands. For example, after transplantation they can activate from quiescence to proliferate and generate both new satellite cells and substantial amounts of new differentiated cells. In muscle, the contractile units are myofibers, elongated syncytial cells each containing many hundreds of postmitotic myonuclei. Satellite cells reside beneath the basal lamina of myofibers and function as myogenic precursors during muscle regeneration. In response to muscle injury, satellite cells are activated, proliferate, and differentiate, fusing together to repair or replace damaged myofibers. When satellite cells are removed from their myofibers by a non-enzymatic physical trituration method, they retain the ability to generate substantial quantities of new muscle after grafting. This ability is not retained when the cells are isolated by enzymatic digestion, because conventional enzymatic disaggregation techniques impair myogenic potential (Collins and Partridge, "Self-Renewal of the Adult Skeletal Muscle Satellite Cell," Cell Cycle 4:10, 1338-1341 (2005)).
[0301] Accordingly, the present invention also contemplates the use of non-embryonic stem cells, such as those described above. In some embodiments, mesenchymal stem cells (MSCs) can be derived from marrow, periosteum, dermis and other tissues of mesodermal origin (See, e.g., U.S. Pat. Nos. 5,591,625 and 5,486,359, each of which is incorporated herein by reference). MSCs are the formative pluripotential blast cells that differentiate into the specific types of connective tissues, depending upon various in vivo or in vitro environmental influences. Connective tissues are the tissues of the body that support the specialized elements, particularly adipose, areolar, osseous, cartilaginous, elastic, marrow stroma, muscle, and fibrous connective tissues. Although these cells are normally present at very low frequencies in bone marrow, various methods have been described for isolating, purifying, and greatly expanding marrow-derived mesenchymal stem cells in culture, i.e., in vitro (See also U.S. Pat. Nos. 5,197,985 and 5,226,914 and PCT Publication No. WO 92/22584, each of which is incorporated herein by reference).
[0302] Various methods have also been described for the isolation of hematopoietic stem cells (See, e.g., U.S. Pat. Nos. 5,061,620; 5,750,397; and 5,716,827, all of which are incorporated herein by reference). It is contemplated that the methods of the present invention can be used to produce lymphoid, myeloid and erythroid cells from hematopoietic stem cells. The lymphoid lineage, comprising B-cells and T-cells, provides for the production of antibodies, regulation of the cellular immune system, detection of foreign agents in the blood, detection of cells foreign to the host, and the like. The myeloid lineage, which includes monocytes, granulocytes, megakaryocytes as well as other cells, monitors for the presence of foreign bodies in the blood stream, provides protection against neoplastic cells, scavenges foreign materials in the blood stream, produces platelets, and the like. The erythroid lineage provides the red blood cells, which act as oxygen carriers.
[0303] Accordingly, the present invention also contemplates the use of neural stem cells, which are generally isolated from developing fetuses. The isolation, culture, and use of neural stem cells are described in U.S. Pat. Nos. 5,654,183; 5,672,499; 5,750,376; 5,849,553; and 5,968,829, all of which are incorporated herein by reference. It is contemplated that the methods of the present invention can use neural stem cells to produce neurons, glia, melanocytes, cartilage and connective tissue of the head and neck, stroma of various secretory glands and cells in the outflow tract of the heart.
[0304] In some embodiments, the quorum sensing systems are incorporated into cord blood cells. Transplantation of umbilical-cord blood has been successfully performed to treat individuals with blood diseases; the donors used have been newborn siblings who were perfect HLA matches for the affected sibling. The advantages of cord blood as a source of hematopoietic stem cells for transplantation are clear. First, the proliferative capacity of hematopoietic stem cells in cord blood is superior to that of cells in marrow or blood from adults. Because they proliferate rapidly, the stem cells in a single unit of cord blood can reconstitute the entire hematopoietic system. Second, the use of cord blood reduces the risk of graft-versus-host disease, the main obstacle to the success of allogeneic transplantation of hematopoietic stem cells. Graft-versus-host disease is caused by a reaction of T cells in the graft to HLA antigens in the recipient; the immaturity of lymphocytes in cord blood dampens that reaction. A joint European study showed that recipients of cord blood from HLA-identical siblings had a lower risk of acute or chronic graft-versus-host disease than recipients of marrow from HLA-identical siblings. Children with acute leukemia who received HLA-mismatched cord blood from an unrelated donor also had a lower risk of graft-versus-host disease than recipients of HLA-mismatched marrow from an unrelated donor (Hematopoietic stem-cell transplants using umbilical-cord blood, New England Journal of Medicine, 2001, 344(24): 1860-1861, editorial). Cord blood cells from siblings or children with matching HLA could be used to produce cell lines for use as contemplated by this invention.
H. Treatment Methods
[0305] In preferred embodiments, cells incorporating the vectors or systems of vectors described herein are introduced into a subject in need of treatment. In some embodiments, where the subject is diabetic, synthetic β cells comprising the quorum sensing system described above are introduced into the diabetic subject. In preferred embodiments, the synthetic β cells produce insulin. It will be recognized that the systems described above can be adapted to cause the differentiation of embryonic stem cells, adult stem cells and cord blood stem cells into a variety of cell types that can be utilized for therapeutic purposes, including neurons, chondrocytes, myocytes, and keratinocytes of various types.
[0306] In some embodiments, banks of cells comprising the quorum sensing systems described above are provided. In preferred embodiments, the banks of cells include cell lines that are programmed to attain a particular differentiated state, such as β cells. In some embodiments, the banks of cells comprise multiple cell lines expressing different combinations of HLA antigens. It is contemplated that such banks will increase the likelihood of obtaining a 6/6, 10/10 or better HLA match for a particular subject.
I. Identification of Additional Genes Involved in Beta Cell Differentiation
[0307] In some embodiments, the systems of the present invention are used to identify genes involved in pancreatic β cell differentiation using RNAi knockdown assays. In preferred embodiments, the assays utilize a large library of shRNAs comprising >100,000 hairpins that target ~21,000 human and ~17,000 mouse genes, 30,000 of which are cloned into a tet-inducible microRNA-embedded shRNA lentiviral vector.
[0308] The RNAi knockdown assay is illustrated in FIG. 4. First, engineered ES cells are transduced with a lentiviral shRNA library at an MOI of 1, so that most transduced cells carry a single shRNA. After quorum sensing and two-step differentiation occur in the transduced cells, three distinct pools of cells emerge in the population. The first consists of cells in which the shRNA interferes with the differentiation of mES cells to endoderm, causing the cells to retain stem cell character and proliferate rapidly. The second pool consists of cells in which the shRNA interferes with the differentiation of endodermal cells to β cells, causing the cells to retain endoderm character and proliferate more slowly than the stem cells. The last pool consists of cells in which the shRNA does not interfere with the two-step differentiation process from mES cells, resulting in slowly dividing or non-dividing β cells. Two populations of cells, a sample of the initial population after transduction and a sample of the population directed to differentiate, will be subjected to microarray analysis. Comparing the relative ratios of individual shRNAs between the two samples will allow identification of genes that are involved at each step of the differentiation process.
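The two quantitative ideas in this assay, the single-copy transduction at an MOI of 1 and the comparison of relative shRNA ratios between the initial and differentiated samples, can be sketched as follows. This is a minimal illustration, assuming Poisson-distributed integrations and assuming the per-hairpin signals have already been reduced to counts; the function names and the hairpin counts are hypothetical. Under the Poisson assumption, about 37% of cells receive no vector at MOI 1 and about 58% of transduced cells carry exactly one hairpin.

```python
import math


def poisson_fraction(k: int, moi: float) -> float:
    """Probability that a cell receives exactly k vector integrations when
    integrations are Poisson-distributed with mean `moi`."""
    return math.exp(-moi) * moi**k / math.factorial(k)


def single_copy_fraction_of_transduced(moi: float) -> float:
    """Among transduced cells (k >= 1), the fraction carrying exactly one shRNA."""
    transduced = 1.0 - poisson_fraction(0, moi)
    return poisson_fraction(1, moi) / transduced


def log2_enrichment(initial: dict, differentiated: dict, pseudocount: float = 1.0) -> dict:
    """Per-hairpin log2 ratio of relative abundance (differentiated / initial).

    Counts are converted to fractions of each sample's total so the two
    samples are comparable; the pseudocount avoids division by zero for
    hairpins that drop out of one sample.
    """
    names = sorted(initial.keys() | differentiated.keys())
    init = {n: initial.get(n, 0) + pseudocount for n in names}
    diff = {n: differentiated.get(n, 0) + pseudocount for n in names}
    total_init, total_diff = sum(init.values()), sum(diff.values())
    return {
        n: math.log2((diff[n] / total_diff) / (init[n] / total_init))
        for n in names
    }


# Hypothetical hairpin counts, for illustration only.
initial_counts = {"sh_geneA": 100, "sh_geneB": 100, "sh_ctrl": 100}
differentiated_counts = {"sh_geneA": 400, "sh_geneB": 150, "sh_ctrl": 25}

print(f"P(0 integrations) at MOI 1: {poisson_fraction(0, 1.0):.3f}")
print(f"Single-copy fraction of transduced cells: {single_copy_fraction_of_transduced(1.0):.3f}")
for name, score in log2_enrichment(initial_counts, differentiated_counts).items():
    print(name, round(score, 2))
```

In this reading, a strongly positive score flags a hairpin whose cells kept proliferating (candidate block of the first differentiation step), an intermediate score suggests a block at the second step, and a negative score corresponds to hairpins carried into slowly dividing β cells.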
EXAMPLES
Example 1
[0309] 293FT cells genetically engineered to synthesize 3OC6HSL were grown in liquid media, and the media were found to contain 3OC6HSL (FIG. 2a). The 3OC6HSL detection pathway consists of a signal transducer that binds 3OC6HSL and activates transcription of genes controlled by a synthetic lux promoter. The 3OC6HSL signal transducer is a chimeric LuxR-activator protein created by fusing a p65 mammalian activation domain to a codon-optimized mammalian version of the bacterial LuxR. Initial testing of the mammalian version of LuxR in 293FT cells shows a strong response to the addition of 3OC6HSL (FIG. 2b).
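The quorum-sensing logic behind this example, in which the autoinducer accumulates in the medium in proportion to the number of sender cells and the lux promoter responds once a threshold is reached, can be illustrated with a toy model. This is a sketch under stated assumptions, not a model of the actual constructs: it assumes a constant per-cell synthesis rate, first-order loss of the autoinducer, and an idealized on/off detector, and every parameter value is hypothetical.

```python
def simulate_autoinducer(n_cells: float, k_syn: float, k_deg: float,
                         hours: float, dt: float = 0.01) -> float:
    """Forward-Euler integration of dA/dt = k_syn * n_cells - k_deg * A, A(0) = 0.

    A is the autoinducer concentration in the medium, k_syn a per-cell
    synthesis rate and k_deg a first-order loss rate (all hypothetical).
    """
    a = 0.0
    for _ in range(int(round(hours / dt))):
        a += dt * (k_syn * n_cells - k_deg * a)
    return a


def steady_state_autoinducer(n_cells: float, k_syn: float, k_deg: float) -> float:
    """Analytical steady state A* = k_syn * n_cells / k_deg."""
    return k_syn * n_cells / k_deg


def lux_promoter_on(autoinducer: float, threshold: float) -> bool:
    """Idealized detector: the LuxR-activator fires once A reaches the threshold."""
    return autoinducer >= threshold


# Hypothetical parameters (arbitrary concentration units per hour).
K_SYN, K_DEG, THRESHOLD = 1e-6, 0.1, 5.0
for density in (1e5, 1e6):
    a_72h = simulate_autoinducer(density, K_SYN, K_DEG, hours=72)
    print(f"{density:.0e} cells: A(72 h) = {a_72h:.2f}, on = {lux_promoter_on(a_72h, THRESHOLD)}")
```

Because the steady-state level scales linearly with cell number, the same threshold separates low-density cultures (promoter off) from high-density cultures (promoter on), which is the density-sensing behavior the system relies on.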
Example 2
Synthetic Gene Networks in Mammalian Cells--the rtTA Switch
[0310] A reverse tetracycline-controlled transactivator (rtTA) switch, in which gene expression is upregulated by the addition of doxycycline (Dox), has been implemented. rtTA is constitutively expressed in the Ainv15 mES cell line and, in the presence of Dox, activates transcription of a given cell fate regulator (CFR), together with EGFP as a control, from the TRE promoter. A dose-response curve (not shown) demonstrates that the expression level (as measured by fluorescence) depends on the Dox concentration.
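A Dox dose-response curve of this kind is commonly summarized with a Hill function. The sketch below assumes that form; the basal level, maximal output, half-maximal dose and Hill coefficient are hypothetical placeholders, since the actual curve is not shown and no fitted values are given.

```python
def hill_response(dose: float, basal: float, vmax: float, k_half: float, n: float) -> float:
    """Reporter output (e.g. EGFP fluorescence) as a Hill function of Dox dose."""
    if dose < 0:
        raise ValueError("dose must be non-negative")
    return basal + vmax * dose**n / (k_half**n + dose**n)


def dose_for_fraction(fraction: float, k_half: float, n: float) -> float:
    """Inverse of the induced part of the Hill curve: the dose giving `fraction` of vmax."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return k_half * (fraction / (1.0 - fraction)) ** (1.0 / n)


# Hypothetical parameters: ng/ml for dose, arbitrary fluorescence units for output.
params = dict(basal=50.0, vmax=5000.0, k_half=100.0, n=2.0)
for dox in (0, 10, 100, 1000):
    print(dox, round(hill_response(dox, **params), 1))
```

The inverse function is a convenient way to pick the Dox concentration that gives a desired fraction of maximal CFR expression once real parameters have been fitted.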
[0311] Controlled Induction of Cell Fate Regulators in mES: Ngn1, MyoD and Nanog.
[0312] Ainv15 mES cells constitutively expressing the Dox-inducible circuit can be infected with a virus encoding Ngn1/EGFP or MyoD/EGFP under the control of a TRE promoter. The cells maintain self-renewal in the absence of Dox, while the presence of Dox results in differentiation into cells with either a neuronal morphology (Ngn1) or a muscle cell morphology (MyoD).
[0313] Matrigel-Embedded mES--MyoD Expression.
[0314] To verify the conditions required for mES cell growth and differentiation in semi-solid media, mES cells embedded in Matrigel were infected with the Dox-inducible circuit encoding MyoD/EGFP and subsequently induced with 1 mg/ml Dox over several days. Cells not induced with Dox grew, did not express EGFP, and maintained stem cell morphology in the Matrigel matrix. The addition of Dox resulted in EGFP expression and the formation of multinucleated syncytia 60 hours post induction, a key developmental step in the formation of muscle fibers.
Example 3
[0315] 293FT and CHO cells were infected with virus encoding pLV-pTat-IRES2-EGFP and then exposed to three different types of TAT communication molecules with various secretion tags. 293FT cells are able to internalize the TAT protein, while CHO cells cannot. In the experiment, sender cells were grown for 2 days, and then receiver 293FT cells were grown in the supernatant. As expected, the supernatant from 293FT senders, which can internalize the TAT molecule, did not yield much communication, while the supernatant from CHO senders significantly activated GFP expression in the receiver cells.
Example 4
[0316] This example describes the assessment of the transactivatory properties of TAT and the functionality of the signaling modules by having sender and receiver modules in the same cell. Sender cells expressing TAT with a secretion signal, and receiver cells containing the detection module, separated by a permeable barrier (e.g., transwell inserts), are used to validate and optimize the cell-cell transducing capabilities of TAT. A hemagglutinin tag (HA-tag) is added as a translational fusion to the C-terminus of TAT to confirm its expression by immunofluorescence (IF) and western blotting (WB).
[0317] Engineering TAT Internal Signaling:
[0318] By having receiver and mock TAT sender (no secretion signal for export) in the same cell, the transactivatory properties of TAT as well as the detector functionality are validated. As described above, the rtTA switch, in which gene expression is upregulated by the addition of Dox, is operational. Full-length HIV-1 TAT with a C-terminal HA-tag is placed under the control of the TRE promoter, enabling induction of TAT expression by adding Dox to the cell culture media. The receiver contains the wild-type HIV-1 pTAT promoter with the TAR element, driving expression of EGFP upon TAT binding. To test the TAT transactivation capabilities, Dox is added to the cells, whereupon TAT expression is induced. TAT acts on pTAT through its TAR element and induces the expression of EGFP, which is detected by FACS and/or fluorescence microscopy (FIG. 26).
[0319] Engineering TAT Senders:
[0320] By co-infecting cells with the rtTA switch and the TAT gene carrying an N-terminal secretion tag and a C-terminal HA-tag under the control of the pTRE promoter, Dox-inducible secretion of TAT is achieved (FIG. 26b). To obtain a high level of secretion into the extracellular milieu, the N-terminal export signal sequence of IL-2 is added to the N-terminus of TAT. To verify protein expression and export, cell lysates and cell supernatants are checked by WB against the HA-tag. Low levels of TAT in the supernatant can be concentrated by precipitating the protein fraction with trichloroacetic acid.
[0321] Engineering TAT Receivers:
[0322] By infecting cells with the module containing the pTAT promoter controlling the expression of EGFP, TAT-inducible expression of EGFP is achieved (FIG. 26b). Using the TAT senders described in the previous paragraph, which secrete the protein into the supernatant, EGFP expression in receivers is detectable by fluorescence microscopy or FACS. The uptake of TAT by receiver cells will also be confirmed by IF staining against the HA-tag. Using transwell inserts as a permeable barrier to keep sender and receiver cells separate, cell-cell communication by secreted TAT can be assessed by measuring EGFP protein expression levels in the receiver cells (FACS/fluorescence microscopy).
[0323] The presence of TAT in the sender cell, in the supernatant, and in the cytoplasm or nucleus of the receiver cell will be confirmed by IF and WB against the HA-tag in TAT.
Recent reports using a His-tag-purified TAT-peptide/Cre fusion protein have suggested that release from macropinosomes is the limiting factor in the transduction of cells. Adding the N-terminal 20 amino acids of the influenza virus hemagglutinin protein HA2 (a fusogenic peptide that destabilizes lipid membranes at the low pH of mature endosomes) to the C-terminus of the TAT peptide markedly increased its release from macropinosomes and, consequently, the efficiency of transduction. The addition of this fusogenic peptide might therefore also increase the efficiency of cell-cell transduction.
[0324] If TAT receiver sensitivity is too low, a TAT-Cre fusion protein with a secretion signal will be made in the sender cells and used to transduce receiver cells carrying a reporter module in which expression of EGFP from a constitutive promoter is blocked by a terminator flanked by two loxP sites. In principle, the translocation of a single TAT-Cre molecule into the nucleus of a receiver cell should be sufficient to trigger Cre-mediated recombination and removal of the terminator, and hence full expression of EGFP. Such a TAT-Cre fusion expressing module could also be very useful for tracing the dispersion of TAT in vivo, by using a reporter mouse cell line that expresses EGFP upon recombination of a terminator located in its promoter.
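The sequence-level outcome of the lox-stop-lox reporter can be sketched as string manipulation. The sketch uses the standard 34-bp loxP sequence and assumes both sites are direct repeats (same orientation), the arrangement that leads to excision; inverted sites would instead invert the intervening segment and are not modeled. The bracketed element names are hypothetical placeholders rather than real sequence, and the function name is illustrative only.

```python
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"  # 34-bp loxP site


def cre_excise(construct: str, site: str = LOXP) -> str:
    """Return the product of Cre-mediated excision between the first two
    directly repeated (same-orientation) loxP sites.

    The segment between the sites is removed and one loxP site remains; if
    fewer than two sites are present, the (upper-cased) construct is
    returned unchanged.
    """
    seq = construct.upper()
    first = seq.find(site)
    if first == -1:
        return seq
    second = seq.find(site, first + len(site))
    if second == -1:
        return seq
    return seq[:first] + seq[second:]


# Hypothetical lox-stop-lox reporter; bracketed names are placeholders, not real sequence.
reporter = "[PROMOTER]" + LOXP + "[TERMINATOR]" + LOXP + "[EGFP]"
recombined = cre_excise(reporter)
print("[TERMINATOR]" in reporter, "[TERMINATOR]" in recombined)
```

After excision a single loxP site remains between the promoter and EGFP, which is why one recombination event is enough to switch the reporter permanently on.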
Example 5
[0325] This example describes the assessment of the effectiveness of in silico designed ZFP-DTS pairs and the functionality of the signaling modules by having sender and receiver modules in the same cell. In a second step, sender cells expressing TAT-ZFP fusions and receiver cells containing the detection module separated by a barrier (e.g. transwell inserts) are used to check and optimize the cell-cell transducing capabilities of TAT-ZFP fusion proteins with their cognate DTS.
[0326] Published in silico-designed ZFP-DTS pairs, for example Jazz, EPOZFP-862c and ZFP-809, which bind the DTSs "GCTGCTGCG", "GCGGTGGCT" and "G(G/c)GGG(T/a)G(A/g)C" (5-fold, 15-fold and 46- to 130-fold induction of reporter genes, respectively) (37-39), will be used first to demonstrate the functionality of the reporter system. Subsequently, novel in silico ZFP-DTS pairs will be designed.
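A degenerate DTS written in the notation above, such as "G(G/c)GGG(T/a)G(A/g)C", stands for a small set of concrete 9-bp sites, and building the corresponding reporter or screening a genome requires enumerating them. The sketch below assumes that parenthesized groups list alternative bases separated by "/" and that the lowercase letters only mark a less preferred base, so case is ignored; `expand_dts` is a hypothetical helper name.

```python
import itertools
import re


def expand_dts(pattern: str) -> list[str]:
    """Expand notation such as 'G(G/c)GGG(T/a)G(A/g)C' into every concrete site.

    Parenthesized groups list alternative bases separated by '/'. Case is
    dropped (lowercase is assumed to mark a less preferred base).
    """
    tokens = re.findall(r"\(([^)]*)\)|([ACGTacgt])", pattern)
    choices = []
    for group, base in tokens:
        if group:
            choices.append([b.upper() for b in group.split("/")])
        else:
            choices.append([base.upper()])
    return ["".join(combo) for combo in itertools.product(*choices)]


for dts in ("GCTGCTGCG", "GCGGTGGCT", "G(G/c)GGG(T/a)G(A/g)C"):
    sites = expand_dts(dts)
    print(dts, len(sites), sites[:3])
```

The third published DTS expands to eight concrete sites (three two-way positions), while the first two are single fixed sequences.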
[0327] Engineering ZFP-DTS Pairs (in Silico):
[0328] Described zinc finger/nucleotide-triplet interactions will be used to assemble novel ZFP-DTS pairs in silico, drawing on existing knowledge of how to engineer translational fusions of single zinc fingers into a ZFP; this knowledge is also integrated into the online ZFP design toolset "ZF Tools" developed by the Barbas group (Mandell, J. G. & Barbas, C. F., 3rd (2006) Nucleic acids research 34, W516-523; Mandell, J. G. (2006) (on the world wide web at scripps.edu/mb/barbas/zfdesign/zfdesignhome.php), p. Zinc Finger Tools Version 3.0). Care will be taken to design ZFP-DTS pairs whose DTS has no significant homology in the human or mouse genome (nucleotide BLAST against the genome sequence), to reduce the probability of ZFP binding to endogenous sequences.
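The text specifies a nucleotide BLAST search for this homology check. As a simplified stand-in, not BLAST itself, the sketch below scans a sequence for exact and near matches (Hamming distance up to a chosen limit) to a candidate DTS on both strands. The function names and the toy "genome" are hypothetical; a real screen would run against the full human and mouse assemblies.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def near_matches(dts: str, genome: str, max_mismatches: int = 1) -> list[tuple[int, str, int]]:
    """Return (position, strand, mismatches) for every window within
    `max_mismatches` of the DTS on either strand (brute-force scan)."""
    dts, genome = dts.upper(), genome.upper()
    k = len(dts)
    hits = []
    for strand, probe in (("+", dts), ("-", reverse_complement(dts))):
        for i in range(len(genome) - k + 1):
            window = genome[i:i + k]
            mismatches = sum(a != b for a, b in zip(window, probe))
            if mismatches <= max_mismatches:
                hits.append((i, strand, mismatches))
    return hits


# Toy sequence containing the Jazz DTS on the + strand and on the - strand.
toy_genome = "TTTT" + "GCTGCTGCG" + "AAAA" + reverse_complement("GCTGCTGCG") + "TTTT"
print(near_matches("GCTGCTGCG", toy_genome, max_mismatches=0))
```

A candidate DTS with no hits at zero or one mismatch in the genome would pass this simplified filter; BLAST adds alignment statistics and gapped matches that the sketch omits.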
[0329] Engineering the Internal ZFP-DTS Signaling:
[0330] By having receiver and mock sender (no secretion signal for export) in the same cell, the transactivatory properties of the ZFP-DTS pairs as well as the detector functionality will be validated. As described above, the reverse tetracycline-controlled transactivator (rtTA) switch, in which gene expression is upregulated by the addition of Dox, is operational. For the mock secretion module, a TAT-ZFP-VP64 fusion (VP64 serving as a strong transcriptional activation domain) will be placed under the control of the TRE promoter, enabling induction of TAT-ZFP-VP64 expression by adding Dox to the cell culture media. The receiver module contains a minimal CMV promoter with the cognate DTS for the ZFP, driving the expression of EGFP upon TAT-ZFP-VP64 binding. To test the transactivation capabilities, Dox is added to cells containing the artificial network, whereupon TAT-ZFP-VP64 is expressed, exported to the extracellular milieu by virtue of its N-terminal IL-2 signal peptide, re-enters the cell with the help of the transducing TAT domain, and localizes to the nucleus, where it binds to its cognate DTS. Binding of the fusion protein to the minimal CMV promoter will induce expression of EGFP (detectable by FACS and/or fluorescence microscopy) (FIG. 27a).
[0331] Engineering ZFP Senders:
[0332] By co-infecting cells with the rtTA switch and the TAT-ZFP-VP64 fusion gene carrying a secretion signal under the control of the TRE promoter, Dox-inducible secretion of TAT-ZFP-VP64 is achieved (FIG. 27b). This fusion protein is exported to the extracellular milieu by virtue of its N-terminal secretion signal, where it can be detected by using receiver cells or, at an earlier stage, by WB. The expression of the fusion protein in the sender cells can be verified by WB of cell lysates or IF staining against the HA-tag.
[0333] Engineering ZFP Receivers:
[0334] By infecting cells with the module containing the minimal CMV promoter, with the DTS for the cognate ZFP, controlling the expression of EGFP, TAT-ZFP-VP64-inducible expression of EGFP is achieved (FIG. 27b). Specifically, the TAT-ZFP-VP64 fusion protein carrying a secretion signal is exported into the supernatant by sender cells separated from the receivers by a transwell membrane; it will then transduce into receiver cells, localize to the nucleus and bind its cognate DTS in the minimal CMV promoter, thereby inducing EGFP expression (detectable by FACS and/or fluorescence microscopy). The uptake of the fusion protein will also be tracked by IF staining against the HA-tag.
[0335] If the engineering of cognate ZFP-DTS pairs and/or the export of ZFPs is problematic, the ZFP will be replaced by Gal4, and the well-described Gal4 DTS will be used in the receiver cells. This library could then be expanded by replacing Gal4 with known DNA-binding proteins/transcription factors (e.g., LacI, cI, TetR, rtTA) and their cognate DTSs on the receiver side. Different endocytic domains and different secretion tags (from IL-11, GM-CSF . . . ) can also be used.
[0336] Full-length TAT might bind to endogenous regulatory elements, where it would have undesirable effects on the expression of endogenous genes. Using only a minimal TAT peptide of 10 amino acids, as needed for cellular uptake and nuclear localization, should reduce these undesirable interactions. Alternatively, other transducing/NLS-containing peptide sequences could be used for delivery, for example the 34-amino-acid sequence of the Beta2/NeuroD transcription factor or the third α-helix of the Antennapedia homeoprotein.
Example 6
[0337] This example describes how the artificial cell-cell communication signals described in Example 5 can be used to induce differentiation of mouse embryonic stem cells in vitro. Two types of sender cells will express TAT-ZFP1-VP64 or TAT-ZFP2-VP64. ZFP1 activates MyoD expression, while ZFP2 activates Ngn1 expression. By placing patches of sender cells on a Matrigel matrix with embedded receiver cells, the gradients of the two synthetic morphogens should induce differentiation of the receiver cells into a myoblast- or neuron-like morphology, depending on their location.
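How position in the matrix could map to fate can be illustrated with a one-dimensional toy model. It assumes each sender patch acts as a point source whose protein diffuses and is lost by first-order decay, giving the standard exponential steady-state profile with decay length sqrt(D/k), and it assumes a hypothetical winner-take-all rule with a single activation threshold. The source positions, decay length and threshold are illustrative values, not measurements, and real receivers carrying both modules could respond to both signals.

```python
import math


def steady_state_concentration(x: float, source_x: float, c0: float, decay_length: float) -> float:
    """1-D steady state of diffusion with first-order loss from a point source:
    c(x) = c0 * exp(-|x - source_x| / decay_length), decay_length = sqrt(D / k)."""
    return c0 * math.exp(-abs(x - source_x) / decay_length)


def receiver_fate(x: float, zfp1_source: float, zfp2_source: float,
                  c0: float = 1.0, decay_length: float = 100.0,
                  threshold: float = 0.2) -> str:
    """Hypothetical winner-take-all readout of the two synthetic morphogens."""
    c1 = steady_state_concentration(x, zfp1_source, c0, decay_length)  # TAT-ZFP1-VP64 -> MyoD
    c2 = steady_state_concentration(x, zfp2_source, c0, decay_length)  # TAT-ZFP2-VP64 -> Ngn1
    if max(c1, c2) < threshold:
        return "undifferentiated"
    return "myoblast-like" if c1 >= c2 else "neuron-like"


# Sender patches 1000 um apart; positions in um (all values hypothetical).
for x in (50, 500, 950):
    print(x, receiver_fate(x, zfp1_source=0, zfp2_source=1000))
```

In this toy picture, receivers near a ZFP1 patch adopt the MyoD fate, those near a ZFP2 patch adopt the Ngn1 fate, and cells far from both stay undifferentiated, which is why the diffusion properties and the placement of sender cells matter.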
[0338] Engineering Senders:
[0339] By infecting cells with the rtTA switch together with either the TAT-ZFP-VP64 fusion expressing module described above or a module encoding a distinct ZFP (TAT-ZFP2-VP64), two sender cell types are created that express TAT-ZFP-VP64 or TAT-ZFP2-VP64, respectively, upon addition of Dox.
[0340] Engineering Receivers:
[0341] By co-infecting receiver cells with two modules, each containing a DTS (DTS1 or DTS2), a minimal CMV promoter, the myoD or ngn1 gene, an internal ribosome entry site (IRES) and an egfp or dsred2 gene, the receiver cells will express MyoD and EGFP in response to ZFP1, and Ngn1 and DsRed2 in response to ZFP2. These signaling circuits will first be tested with mES cells in liquid cell culture. If communication and differentiation in liquid media with transwell inserts work, the receiver mES cells will be embedded in a Matrigel matrix and the sender cells placed into holes of various diameters at the surface of this matrix, in order to induce three-dimensional differentiation patterns in the receiver cells. Differentiation success in both the liquid and the matrix approaches can be assessed by morphological changes, IF staining, or RT-PCR for relevant markers (e.g., Desmin, Mef2, Cx2 for myocyte assays; Lim 1/2, Map2 and NSE for neural assays).
[0342] Substrates other than Matrigel, for example a collagen-agarose matrix, might provide better growth and differentiation conditions for mES cells. The diffusion properties of the TAT-ZFP fusion proteins will be critical, as will the positioning and number of sender cells on the matrix. Suboptimal expression of MyoD/Ngn1 could be overcome by incorporating signal-amplifying circuits into the receiver cells. Crosstalk between the ZFP-DTS pairs should already have been ruled out in the ZFP-DTS optimization procedure; nevertheless, the ZFP and/or DTS coding sequences could be exchanged for other variants.
Sequence Listing
Number of SEQ ID NOs: 34
SEQ ID NO: 1; length: 10632; type: DNA; organism: Artificial Sequence; other information: Synthetic
cggccatcga taaggatccg cccctctccc
tccccccccc ctaacgttac tggccgaagc 60cgcttggaat aaggccggtg tgcgtttgtc
tatatgttat tttccaccat attgccgtct 120tttggcaatg tgagggcccg gaaacctggc
cctgtcttct tgacgagcat tcctaggggt 180ctttcccctc tcgccaaagg aatgcaaggt
ctgttgaatg tcgtgaagga agcagttcct 240ctggaagctt cttgaagaca aacaacgtct
gtagcgaccc tttgcaggca gcggaacccc 300ccacctggcg acaggtgcct ctgcggccaa
aagccacgtg tataagatac acctgcaaag 360gcggcacaac cccagtgcca cgttgtgagt
tggatagttg tggaaagagt caaatggctc 420tcctcaagcg tattcaacaa ggggctgaag
gatgcccaga aggtacccca ttgtatggga 480tctgatctgg ggcctcggtg cacatgcttt
acatgtgttt agtcgaggtt aaaaaaacgt 540ctaggccccc cgaaccacgg ggacgtggtt
ttcctttgaa aaacacgatg ataatatggc 600cacaaccatg gcctcctccg aggacgtcat
caaggagttc atgcgcttca aggtgcgcat 660ggagggctcc gtgaacggcc acgagttcga
gatcgagggc gagggcgagg gccgccccta 720cgagggcacc cagaccgcca agctgaaggt
gaccaagggc ggccccctgc ccttcgcctg 780ggacatcctg tccccccagt tccagtacgg
ctccaaggtg tacgtgaagc accccgccga 840catccccgac tacaagaagc tgtccttccc
cgagggcttc aagtgggagc gcgtgatgaa 900cttcgaggac ggcggcgtgg tgaccgtgac
ccaggactcc tccctgcagg acggctcctt 960catctacaag gtgaagttca tcggcgtgaa
cttcccctcc gacggccccg taatgcagaa 1020gaagactatg ggctgggagg cctccaccga
gcgcctgtac ccccgcgacg gcgtgctgaa 1080gggcgagatc cacaaggccc tgaagctgaa
ggacggcggc cactacctgg tggagttcaa 1140gtccatctac atggccaaga agcccgtgca
gctgcccggc tactactacg tggactccaa 1200gctggacatc acctcccaca acgaggacta
caccatcgtg gagcagtacg agcgcgccga 1260gggccgccac cacctgttcc tgtaggcggc
cgcaatcaac ctctggatta caaaatttgt 1320gaaagattga ctggtattct taactatgtt
gctcctttta cgctatgtgg atacgctgct 1380ttaatgcctt tgtatcatgc tattacttcc
cgtacggctt tcattttctc ctccttgtat 1440aaatcctggt tgctgtctct ttatgaggag
ttgtggcccg ttgtcaggca acgtggcgtg 1500gtgtgcactg tgtttgctga cgcaaccccc
actggttggg gcattgccac cacctatcaa 1560ctcctttccg ggactttcgc tttccccctc
cctattgcca cggcggaact cattgccgcc 1620tgccttgccc gctgctggac aggggctcgg
ctgttgggca ctgacaattc cgtggtgttg 1680tcggggaagc tgacgtcctt tccatggctg
ctcgcctgtg ttgccaactg gattctgcgc 1740gggacgtcct tctgctacgt cccttcggcc
ctcaatccag cggaccttcc ttcccgcggc 1800ctgctgccgg ttctgcggcc tcttccgcgt
cttcgccttc gccctcagac gagtcggatc 1860tccctttggg ccgcctcccc gcctgcctgc
aggtttgtcg agacctagaa aaacatggag 1920caatcacaag tagcaataca gcagctacca
atgctgattg tgcctggcta gaagcacaag 1980aggaggagga ggtgggtttt ccagtcacac
ctcaggtacc tttaagacca atgacttaca 2040aggcagctgt agatcttagc cactttttaa
aagaaaaggg gggactggaa gggctaattc 2100actcccaacg aagacaagat ctgctttttg
cttgtactgg gtctctctgg ttagaccaga 2160tctgagcctg ggagctctct ggctaactag
ggaacccact gcttaagcct caataaagct 2220tgccttgagt gcttcaagta gtgtgtgccc
gtctgttgtg tgactctggt aactagagat 2280ccctcagacc cttttagtca gtgtggaaaa
tctctagcag ggcccgttta aacccgctga 2340tcagcctcga ctgtgccttc tagttgccag
ccatctgttg tttgcccctc ccccgtgcct 2400tccttgaccc tggaaggtgc cactcccact
gtcctttcct aataaaatga ggaaattgca 2460tcgcattgtc tgagtaggtg tcattctatt
ctggggggtg gggtggggca ggacagcaag 2520ggggaggatt gggaagacaa tagcaggcat
gctggggatg cggtgggctc tatggcttct 2580gaggcggaaa gaaccagctg gggctctagg
gggtatcccc acgcgccctg tagcggcgca 2640ttaagcgcgg cgggtgtggt ggttacgcgc
agcgtgaccg ctacacttgc cagcgcccta 2700gcgcccgctc ctttcgcttt cttcccttcc
tttctcgcca cgttcgccgg ctttccccgt 2760caagctctaa atcggggcat ccctttaggg
ttccgattta gtgctttacg gcacctcgac 2820cccaaaaaac ttgattaggg tgatggttca
cgtagtgggc catcgccctg atagacggtt 2880tttcgccctt tgacgttgga gtccacgttc
tttaatagtg gactcttgtt ccaaactgga 2940acaacactca accctatctc ggtctattct
tttgatttat aagggatttt ggggatttcg 3000gcctattggt taaaaaatga gctgatttaa
caaaaattta acgcgaatta attctgtgga 3060atgtgtgtca gttagggtgt ggaaagtccc
caggctcccc aggcaggcag aagtatgcaa 3120agcatgcatc tcaattagtc agcaaccagg
tgtggaaagt ccccaggctc cccagcaggc 3180agaagtatgc aaagcatgca tctcaattag
tcagcaacca tagtcccgcc cctaactccg 3240cccatcccgc ccctaactcc gcccagttcc
gcccattctc cgccccatgg ctgactaatt 3300ttttttattt atgcagaggc cgaggccgcc
tctgcctctg agctattcca gaagtagtga 3360ggaggctttt ttggaggcct aggcttttgc
aaaaagctcc cgggagcttg tatatccatt 3420ttcggatctg atcagcacgt gttgacaatt
aatcatcggc atagtatatc ggcatagtat 3480aatacgacaa ggtgaggaac taaaccatgg
ccaagttgac cagtgccgtt ccggtgctca 3540ccgcgcgcga cgtcgccgga gcggtcgagt
tctggaccga ccggctcggg ttctcccggg 3600acttcgtgga ggacgacttc gccggtgtgg
tccgggacga cgtgaccctg ttcatcagcg 3660cggtccagga ccaggtggtg ccggacaaca
ccctggcctg ggtgtgggtg cgcggcctgg 3720acgagctgta cgccgagtgg tcggaggtcg
tgtccacgaa cttccgggac gcctccgggc 3780cggccatgac cgagatcggc gagcagccgt
gggggcggga gttcgccctg cgcgacccgg 3840ccggcaactg cgtgcacttc gtggccgagg
agcaggactg acacgtgcta cgagatttcg 3900attccaccgc cgccttctat gaaaggttgg
gcttcggaat cgttttccgg gacgccggct 3960ggatgatcct ccagcgcggg gatctcatgc
tggagttctt cgcccacccc aacttgttta 4020ttgcagctta taatggttac aaataaagca
atagcatcac aaatttcaca aataaagcat 4080ttttttcact gcattctagt tgtggtttgt
ccaaactcat caatgtatct tatcatgtct 4140gtataccgtc gacctctagc tagagcttgg
cgtaatcatg gtcatagctg tttcctgtgt 4200gaaattgtta tccgctcaca attccacaca
acatacgagc cggaagcata aagtgtaaag 4260cctggggtgc ctaatgagtg agctaactca
cattaattgc gttgcgctca ctgcccgctt 4320tccagtcggg aaacctgtcg tgccagctgc
attaatgaat cggccaacgc gcggggagag 4380gcggtttgcg tattgggcgc tcttccgctt
cctcgctcac tgactcgctg cgctcggtcg 4440ttcggctgcg gcgagcggta tcagctcact
caaaggcggt aatacggtta tccacagaat 4500caggggataa cgcaggaaag aacatgtgag
caaaaggcca gcaaaaggcc aggaaccgta 4560aaaaggccgc gttgctggcg tttttccata
ggctccgccc ccctgacgag catcacaaaa 4620atcgacgctc aagtcagagg tggcgaaacc
cgacaggact ataaagatac caggcgtttc 4680cccctggaag ctccctcgtg cgctctcctg
ttccgaccct gccgcttacc ggatacctgt 4740ccgcctttct cccttcggga agcgtggcgc
tttctcaatg ctcacgctgt aggtatctca 4800gttcggtgta ggtcgttcgc tccaagctgg
gctgtgtgca cgaacccccc gttcagcccg 4860accgctgcgc cttatccggt aactatcgtc
ttgagtccaa cccggtaaga cacgacttat 4920cgccactggc agcagccact ggtaacagga
ttagcagagc gaggtatgta ggcggtgcta 4980cagagttctt gaagtggtgg cctaactacg
gctacactag aaggacagta tttggtatct 5040gcgctctgct gaagccagtt accttcggaa
aaagagttgg tagctcttga tccggcaaac 5100aaaccaccgc tggtagcggt ggtttttttg
tttgcaagca gcagattacg cgcagaaaaa 5160aaggatctca agaagatcct ttgatctttt
ctacggggtc tgacgctcag tggaacgaaa 5220actcacgtta agggattttg gtcatgagat
tatcaaaaag gatcttcacc tagatccttt 5280taaattaaaa atgaagtttt aaatcaatct
aaagtatata tgagtaaact tggtctgaca 5340gttaccaatg cttaatcagt gaggcaccta
tctcagcgat ctgtctattt cgttcatcca 5400tagttgcctg actccccgtc gtgtagataa
ctacgatacg ggagggctta ccatctggcc 5460ccagtgctgc aatgataccg cgagacccac
gctcaccggc tccagattta tcagcaataa 5520accagccagc cggaagggcc gagcgcagaa
gtggtcctgc aactttatcc gcctccatcc 5580agtctattaa ttgttgccgg gaagctagag
taagtagttc gccagttaat agtttgcgca 5640acgttgttgc cattgctaca ggcatcgtgg
tgtcacgctc gtcgtttggt atggcttcat 5700tcagctccgg ttcccaacga tcaaggcgag
ttacatgatc ccccatgttg tgcaaaaaag 5760cggttagctc cttcggtcct ccgatcgttg
tcagaagtaa gttggccgca gtgttatcac 5820tcatggttat ggcagcactg cataattctc
ttactgtcat gccatccgta agatgctttt 5880ctgtgactgg tgagtactca accaagtcat
tctgagaata gtgtatgcgg cgaccgagtt 5940gctcttgccc ggcgtcaata cgggataata
ccgcgccaca tagcagaact ttaaaagtgc 6000tcatcattgg aaaacgttct tcggggcgaa
aactctcaag gatcttaccg ctgttgagat 6060ccagttcgat gtaacccact cgtgcaccca
actgatcttc agcatctttt actttcacca 6120gcgtttctgg gtgagcaaaa acaggaaggc
aaaatgccgc aaaaaaggga ataagggcga 6180cacggaaatg ttgaatactc atactcttcc
tttttcaata ttattgaagc atttatcagg 6240gttattgtct catgagcgga tacatatttg
aatgtattta gaaaaataaa caaatagggg 6300ttccgcgcac atttccccga aaagtgccac
ctgacgtcga cggatcggga gatctcccga 6360tcccctatgg tgcactctca gtacaatctg
ctctgatgcc gcatagttaa gccagtatct 6420gctccctgct tgtgtgttgg aggtcgctga
gtagtgcgcg agcaaaattt aagctacaac 6480aaggcaaggc ttgaccgaca attgcatgaa
gaatctgctt agggttaggc gttttgcgct 6540gcttcgcgat gtacgggcca gatatacgcg
ttgacattga ttattgacta gttattaata 6600gtaatcaatt acggggtcat tagttcatag
cccatatatg gagttccgcg ttacataact 6660tacggtaaat ggcccgcctg gctgaccgcc
caacgacccc cgcccattga cgtcaataat 6720gacgtatgtt cccatagtaa cgccaatagg
gactttccat tgacgtcaat gggtggacta 6780tttacggtaa actgcccact tggcagtaca
tcaagtgtat catatgccaa gtacgccccc 6840tattgacgtc aatgacggta aatggcccgc
ctggcattat gcccagtaca tgaccttatg 6900ggactttcct acttggcagt acatctacgt
attagtcatc gctattacca tggtgatgcg 6960gttttggcag tacatcaatg ggcgtggata
gcggtttgac tcacggggat ttccaagtct 7020ccaccccatt gacgtcaatg ggagtttgtt
ttggcaccaa aatcaacggg actttccaaa 7080atgtcgtaac aactccgccc cattgacgca
aatgggcggt aggcgtgtac ggtgggaggt 7140ctatataagc agagctctct ggctaactag
agaacccact gcttactggc ttatcgaaat 7200taatacgact cactataggg agacccaagc
tggtttaaac ttaagcttgg taccgagctc 7260actagtccag tgtggtggca gatatccagc
acagtggcgg ccgctcgagt ctagagggcc 7320cgttttgcct gtactgggtc tctctggtta
gaccagatct gagcctggga gctctctggc 7380taactaggga acccactgct taagcctcaa
taaagcttgc cttgagtgct tcaagtagtg 7440tgtgcccgtc tgttgtgtga ctctggtaac
tagagatccc tcagaccctt ttagtcagtg 7500tggaaaatct ctagcagtgg cgcccgaaca
gggacttgaa agcgaaaggg aaaccagagg 7560agctctctcg acgcaggact cggcttgctg
aagcgcgcac ggcaagaggc gaggggcggc 7620gactggtgag tacgccaaaa attttgacta
gcggaggcta gaaggagaga gatgggtgcg 7680agagcgtcag tattaagcgg gggagaatta
gatcgcgatg ggaaaaaatt cggttaaggc 7740cagggggaaa gaaaaaatat aaattaaaac
atatagtatg ggcaagcagg gagctagaac 7800gattcgcagt taatcctggc ctgttagaaa
catcagaagg ctgtagacaa atactgggac 7860agctacaacc atcccttcag acaggatcag
aagaacttag atcattatat aatacagtag 7920caaccctcta ttgtgtgcat caaaggatag
agataaaaga caccaaggaa gctttagaca 7980agatagagga agagcaaaac aaaagtaaga
ccaccgcaca gcaagcggcc gctgatcttc 8040agacctggag gaggagatat gagggacaat
tggagaagtg aattatataa atataaagta 8100gtaaaaattg aaccattagg agtagcaccc
accaaggcaa agagaagagt ggtgcagaga 8160gaaaaaagag cagtgggaat aggagctttg
ttccttgggt tcttgggagc agcaggaagc 8220actatgggcg cagcgtcaat gacgctgacg
gtacaggcca gacaattatt gtctggtata 8280gtgcagcagc agaacaattt gctgagggct
attgaggcgc aacagcatct gttgcaactc 8340acagtctggg gcatcaagca gctccaggca
agaatcctgg ctgtggaaag atacctaaag 8400gatcaacagc tcctggggat ttggggttgc
tctggaaaac tcatttgcac cactgctgtg 8460ccttggaatg ctagttggag taataaatct
ctggaacaga tttggaatca cacgacctgg 8520atggagtggg acagagaaat taacaattac
acaagcttaa tacactcctt aattgaagaa 8580tcgcaaaacc agcaagaaaa gaatgaacaa
gaattattgg aattagataa atgggcaagt 8640ttgtggaatt ggtttaacat aacaaattgg
ctgtggtata taaaattatt cataatgata 8700gtaggaggct tggtaggttt aagaatagtt
tttgctgtac tttctatagt gaatagagtt 8760aggcagggat attcaccatt atcgtttcag
acccacctcc caaccccgag gggacccgac 8820aggcccttaa ttaattggct ccggtgcccg
tcagtgggca gagcgcacat cgcccacagt 8880ccccgagaag ttggggggag gggtcggcaa
ttgaaccggt gcctagagaa ggtggcgcgg 8940ggtaaactgg gaaagtgatg tcgtgtactg
gctccgcctt tttcccgagg gtgggggaga 9000accgtatata agtgcagtag tcgccgtgaa
cgttcttttt cgcaacgggt ttgccgccag 9060aacacaggta agtgccgtgt gtggttcccg
cgggcctggc ctctttacgg gttatggccc 9120ttgcgtgcct tgaattactt ccacctggct
gcagtacgtg attcttgatc ccgagcttcg 9180ggttggaagt gggtgggaga gttcgaggcc
ttgcgcttaa ggagcccctt cgcctcgtgc 9240ttgagttgag gcctggcctg ggcgctgggg
ccgccgcgtg cgaatctggt ggcaccttcg 9300cgcctgtctc gctgctttcg ataagtctct
agccatttaa aatttttgat gacctgctgc 9360gacgcttttt ttctggcaag atagtcttgt
aaatgcgggc caagatctgc acactggtat 9420ttcggttttt ggggccgcgg gcggcgacgg
ggcccgtgcg tcccagcgca catgttcggc 9480gaggcggggc ctgcgagcgc ggccaccgag
aatcggacgg gggtagtctc aagctggccg 9540gcctgctctg gtgcctggcc tcgcgccgcc
gtgtatcgcc ccgccctggg cggcaaggct 9600ggcccggtcg gcaccagttg cgtgagcgga
aagatggccg cttcccggcc ctgctgcagg 9660gagctcaaaa tggaggacgc ggcgctcggg
agagcgggcg ggtgagtcac ccacacaaag 9720gaaaagggcc tttccgtcct cagccgtcgc
ttcatgtgac tccacggagt accgggcgcc 9780gtccaggcac ctcgattagt tctcgagctt
ttggagtacg tcgtctttag gttgggggga 9840ggggttttat gcgatggagt ttccccacac
tgagtgggtg gagactgaag ttaggccagc 9900ttggcacttg atgtaattct ccttggaatt
tgcccttttt gagtttggat cttggttcat 9960tctcaagcct cagacagtgg ttcaaagttt
ttttcttcca tttcaggtgt cgtgaggaat 10020tcggccatta cggcccgcca ccatgaccat
catgatcaag aagagcgact tcctggccat 10080ccccagcgag gagtacaagg gcatcctgag
cctgagatac caggtgttca agcagaggct 10140ggagtgggac ctggtggtgg agaacaacct
ggagagcgac gagtacgaca acagcaacgc 10200cgagtacatc tacgcctgcg acgacaccga
gaacgtgagc ggctgctggc gcctgctgcc 10260caccaccggc gactacatgc tgaagagcgt
gttccccgag ctgctgggcc agcagagcgc 10320ccccaaggac cccaacatcg tggagctgtc
caggttcgcc gtgggcaaga acagcagcaa 10380gatcaacaac agcgccagcg agatcaccat
gaagctgttc gaggccatct acaagcacgc 10440cgtgagccag ggcatcaccg agtacgtgac
cgtgaccagc accgccatcg agagatttct 10500gaagaggatc aaggtgccct gccacaggat
cggcgacaag gagatccacg tcctgggcga 10560caccaagagc gtggtgctgt ccatgcccat
caacgagcag ttcaaaaagg ccgtgctgaa 10620ctgaggccgc ct
10632

SEQ ID NO: 2
Length: 11736
Type: DNA
Organism: Artificial Sequence
Other information: Synthetic

cggccatcga taaggatccg cccctctccc tccccccccc ctaacgttac
tggccgaagc 60cgcttggaat aaggccggtg tgcgtttgtc tatatgttat tttccaccat
attgccgtct 120tttggcaatg tgagggcccg gaaacctggc cctgtcttct tgacgagcat
tcctaggggt 180ctttcccctc tcgccaaagg aatgcaaggt ctgttgaatg tcgtgaagga
agcagttcct 240ctggaagctt cttgaagaca aacaacgtct gtagcgaccc tttgcaggca
gcggaacccc 300ccacctggcg acaggtgcct ctgcggccaa aagccacgtg tataagatac
acctgcaaag 360gcggcacaac cccagtgcca cgttgtgagt tggatagttg tggaaagagt
caaatggctc 420tcctcaagcg tattcaacaa ggggctgaag gatgcccaga aggtacccca
ttgtatggga 480tctgatctgg ggcctcggtg cacatgcttt acatgtgttt agtcgaggtt
aaaaaaacgt 540ctaggccccc cgaaccacgg ggacgtggtt ttcctttgaa aaacacgatg
ataatatggc 600cacaaccatg gcctcctccg aggacgtcat caaggagttc atgcgcttca
aggtgcgcat 660ggagggctcc gtgaacggcc acgagttcga gatcgagggc gagggcgagg
gccgccccta 720cgagggcacc cagaccgcca agctgaaggt gaccaagggc ggccccctgc
ccttcgcctg 780ggacatcctg tccccccagt tccagtacgg ctccaaggtg tacgtgaagc
accccgccga 840catccccgac tacaagaagc tgtccttccc cgagggcttc aagtgggagc
gcgtgatgaa 900cttcgaggac ggcggcgtgg tgaccgtgac ccaggactcc tccctgcagg
acggctcctt 960catctacaag gtgaagttca tcggcgtgaa cttcccctcc gacggccccg
taatgcagaa 1020gaagactatg ggctgggagg cctccaccga gcgcctgtac ccccgcgacg
gcgtgctgaa 1080gggcgagatc cacaaggccc tgaagctgaa ggacggcggc cactacctgg
tggagttcaa 1140gtccatctac atggccaaga agcccgtgca gctgcccggc tactactacg
tggactccaa 1200gctggacatc acctcccaca acgaggacta caccatcgtg gagcagtacg
agcgcgccga 1260gggccgccac cacctgttcc tgtaggcggc cgcaatcaac ctctggatta
caaaatttgt 1320gaaagattga ctggtattct taactatgtt gctcctttta cgctatgtgg
atacgctgct 1380ttaatgcctt tgtatcatgc tattacttcc cgtacggctt tcattttctc
ctccttgtat 1440aaatcctggt tgctgtctct ttatgaggag ttgtggcccg ttgtcaggca
acgtggcgtg 1500gtgtgcactg tgtttgctga cgcaaccccc actggttggg gcattgccac
cacctatcaa 1560ctcctttccg ggactttcgc tttccccctc cctattgcca cggcggaact
cattgccgcc 1620tgccttgccc gctgctggac aggggctcgg ctgttgggca ctgacaattc
cgtggtgttg 1680tcggggaagc tgacgtcctt tccatggctg ctcgcctgtg ttgccaactg
gattctgcgc 1740gggacgtcct tctgctacgt cccttcggcc ctcaatccag cggaccttcc
ttcccgcggc 1800ctgctgccgg ttctgcggcc tcttccgcgt cttcgccttc gccctcagac
gagtcggatc 1860tccctttggg ccgcctcccc gcctgcctgc aggtttgtcg agacctagaa
aaacatggag 1920caatcacaag tagcaataca gcagctacca atgctgattg tgcctggcta
gaagcacaag 1980aggaggagga ggtgggtttt ccagtcacac ctcaggtacc tttaagacca
atgacttaca 2040aggcagctgt agatcttagc cactttttaa aagaaaaggg gggactggaa
gggctaattc 2100actcccaacg aagacaagat ctgctttttg cttgtactgg gtctctctgg
ttagaccaga 2160tctgagcctg ggagctctct ggctaactag ggaacccact gcttaagcct
caataaagct 2220tgccttgagt gcttcaagta gtgtgtgccc gtctgttgtg tgactctggt
aactagagat 2280ccctcagacc cttttagtca gtgtggaaaa tctctagcag ggcccgttta
aacccgctga 2340tcagcctcga ctgtgccttc tagttgccag ccatctgttg tttgcccctc
ccccgtgcct 2400tccttgaccc tggaaggtgc cactcccact gtcctttcct aataaaatga
ggaaattgca 2460tcgcattgtc tgagtaggtg tcattctatt ctggggggtg gggtggggca
ggacagcaag 2520ggggaggatt gggaagacaa tagcaggcat gctggggatg cggtgggctc
tatggcttct 2580gaggcggaaa gaaccagctg gggctctagg gggtatcccc acgcgccctg
tagcggcgca 2640ttaagcgcgg cgggtgtggt ggttacgcgc agcgtgaccg ctacacttgc
cagcgcccta 2700gcgcccgctc ctttcgcttt cttcccttcc tttctcgcca cgttcgccgg
ctttccccgt 2760caagctctaa atcggggcat ccctttaggg ttccgattta gtgctttacg
gcacctcgac 2820cccaaaaaac ttgattaggg tgatggttca cgtagtgggc catcgccctg
atagacggtt 2880tttcgccctt tgacgttgga gtccacgttc tttaatagtg gactcttgtt
ccaaactgga 2940acaacactca accctatctc ggtctattct tttgatttat aagggatttt
ggggatttcg 3000gcctattggt taaaaaatga gctgatttaa caaaaattta acgcgaatta
attctgtgga 3060atgtgtgtca gttagggtgt ggaaagtccc caggctcccc aggcaggcag
aagtatgcaa 3120agcatgcatc tcaattagtc agcaaccagg tgtggaaagt ccccaggctc
cccagcaggc 3180agaagtatgc aaagcatgca tctcaattag tcagcaacca tagtcccgcc
cctaactccg 3240cccatcccgc ccctaactcc gcccagttcc gcccattctc cgccccatgg
ctgactaatt 3300ttttttattt atgcagaggc cgaggccgcc tctgcctctg agctattcca
gaagtagtga 3360ggaggctttt ttggaggcct aggcttttgc aaaaagctcc cgggagcttg
tatatccatt 3420ttcggatctg atcagcacgt gttgacaatt aatcatcggc atagtatatc
ggcatagtat 3480aatacgacaa ggtgaggaac taaaccatgg ccaagttgac cagtgccgtt
ccggtgctca 3540ccgcgcgcga cgtcgccgga gcggtcgagt tctggaccga ccggctcggg
ttctcccggg 3600acttcgtgga ggacgacttc gccggtgtgg tccgggacga cgtgaccctg
ttcatcagcg 3660cggtccagga ccaggtggtg ccggacaaca ccctggcctg ggtgtgggtg
cgcggcctgg 3720acgagctgta cgccgagtgg tcggaggtcg tgtccacgaa cttccgggac
gcctccgggc 3780cggccatgac cgagatcggc gagcagccgt gggggcggga gttcgccctg
cgcgacccgg 3840ccggcaactg cgtgcacttc gtggccgagg agcaggactg acacgtgcta
cgagatttcg 3900attccaccgc cgccttctat gaaaggttgg gcttcggaat cgttttccgg
gacgccggct 3960ggatgatcct ccagcgcggg gatctcatgc tggagttctt cgcccacccc
aacttgttta 4020ttgcagctta taatggttac aaataaagca atagcatcac aaatttcaca
aataaagcat 4080ttttttcact gcattctagt tgtggtttgt ccaaactcat caatgtatct
tatcatgtct 4140gtataccgtc gacctctagc tagagcttgg cgtaatcatg gtcatagctg
tttcctgtgt 4200gaaattgtta tccgctcaca attccacaca acatacgagc cggaagcata
aagtgtaaag 4260cctggggtgc ctaatgagtg agctaactca cattaattgc gttgcgctca
ctgcccgctt 4320tccagtcggg aaacctgtcg tgccagctgc attaatgaat cggccaacgc
gcggggagag 4380gcggtttgcg tattgggcgc tcttccgctt cctcgctcac tgactcgctg
cgctcggtcg 4440ttcggctgcg gcgagcggta tcagctcact caaaggcggt aatacggtta
tccacagaat 4500caggggataa cgcaggaaag aacatgtgag caaaaggcca gcaaaaggcc
aggaaccgta 4560aaaaggccgc gttgctggcg tttttccata ggctccgccc ccctgacgag
catcacaaaa 4620atcgacgctc aagtcagagg tggcgaaacc cgacaggact ataaagatac
caggcgtttc 4680cccctggaag ctccctcgtg cgctctcctg ttccgaccct gccgcttacc
ggatacctgt 4740ccgcctttct cccttcggga agcgtggcgc tttctcaatg ctcacgctgt
aggtatctca 4800gttcggtgta ggtcgttcgc tccaagctgg gctgtgtgca cgaacccccc
gttcagcccg 4860accgctgcgc cttatccggt aactatcgtc ttgagtccaa cccggtaaga
cacgacttat 4920cgccactggc agcagccact ggtaacagga ttagcagagc gaggtatgta
ggcggtgcta 4980cagagttctt gaagtggtgg cctaactacg gctacactag aaggacagta
tttggtatct 5040gcgctctgct gaagccagtt accttcggaa aaagagttgg tagctcttga
tccggcaaac 5100aaaccaccgc tggtagcggt ggtttttttg tttgcaagca gcagattacg
cgcagaaaaa 5160aaggatctca agaagatcct ttgatctttt ctacggggtc tgacgctcag
tggaacgaaa 5220actcacgtta agggattttg gtcatgagat tatcaaaaag gatcttcacc
tagatccttt 5280taaattaaaa atgaagtttt aaatcaatct aaagtatata tgagtaaact
tggtctgaca 5340gttaccaatg cttaatcagt gaggcaccta tctcagcgat ctgtctattt
cgttcatcca 5400tagttgcctg actccccgtc gtgtagataa ctacgatacg ggagggctta
ccatctggcc 5460ccagtgctgc aatgataccg cgagacccac gctcaccggc tccagattta
tcagcaataa 5520accagccagc cggaagggcc gagcgcagaa gtggtcctgc aactttatcc
gcctccatcc 5580agtctattaa ttgttgccgg gaagctagag taagtagttc gccagttaat
agtttgcgca 5640acgttgttgc cattgctaca ggcatcgtgg tgtcacgctc gtcgtttggt
atggcttcat 5700tcagctccgg ttcccaacga tcaaggcgag ttacatgatc ccccatgttg
tgcaaaaaag 5760cggttagctc cttcggtcct ccgatcgttg tcagaagtaa gttggccgca
gtgttatcac 5820tcatggttat ggcagcactg cataattctc ttactgtcat gccatccgta
agatgctttt 5880ctgtgactgg tgagtactca accaagtcat tctgagaata gtgtatgcgg
cgaccgagtt 5940gctcttgccc ggcgtcaata cgggataata ccgcgccaca tagcagaact
ttaaaagtgc 6000tcatcattgg aaaacgttct tcggggcgaa aactctcaag gatcttaccg
ctgttgagat 6060ccagttcgat gtaacccact cgtgcaccca actgatcttc agcatctttt
actttcacca 6120gcgtttctgg gtgagcaaaa acaggaaggc aaaatgccgc aaaaaaggga
ataagggcga 6180cacggaaatg ttgaatactc atactcttcc tttttcaata ttattgaagc
atttatcagg 6240gttattgtct catgagcgga tacatatttg aatgtattta gaaaaataaa
caaatagggg 6300ttccgcgcac atttccccga aaagtgccac ctgacgtcga cggatcggga
gatctcccga 6360tcccctatgg tgcactctca gtacaatctg ctctgatgcc gcatagttaa
gccagtatct 6420gctccctgct tgtgtgttgg aggtcgctga gtagtgcgcg agcaaaattt
aagctacaac 6480aaggcaaggc ttgaccgaca attgcatgaa gaatctgctt agggttaggc
gttttgcgct 6540gcttcgcgat gtacgggcca gatatacgcg ttgacattga ttattgacta
gttattaata 6600gtaatcaatt acggggtcat tagttcatag cccatatatg gagttccgcg
ttacataact 6660tacggtaaat ggcccgcctg gctgaccgcc caacgacccc cgcccattga
cgtcaataat 6720gacgtatgtt cccatagtaa cgccaatagg gactttccat tgacgtcaat
gggtggacta 6780tttacggtaa actgcccact tggcagtaca tcaagtgtat catatgccaa
gtacgccccc 6840tattgacgtc aatgacggta aatggcccgc ctggcattat gcccagtaca
tgaccttatg 6900ggactttcct acttggcagt acatctacgt attagtcatc gctattacca
tggtgatgcg 6960gttttggcag tacatcaatg ggcgtggata gcggtttgac tcacggggat
ttccaagtct 7020ccaccccatt gacgtcaatg ggagtttgtt ttggcaccaa aatcaacggg
actttccaaa 7080atgtcgtaac aactccgccc cattgacgca aatgggcggt aggcgtgtac
ggtgggaggt 7140ctatataagc agagctctct ggctaactag agaacccact gcttactggc
ttatcgaaat 7200taatacgact cactataggg agacccaagc tggtttaaac ttaagcttgg
taccgagctc 7260actagtccag tgtggtggca gatatccagc acagtggcgg ccgctcgagt
ctagagggcc 7320cgttttgcct gtactgggtc tctctggtta gaccagatct gagcctggga
gctctctggc 7380taactaggga acccactgct taagcctcaa taaagcttgc cttgagtgct
tcaagtagtg 7440tgtgcccgtc tgttgtgtga ctctggtaac tagagatccc tcagaccctt
ttagtcagtg 7500tggaaaatct ctagcagtgg cgcccgaaca gggacttgaa agcgaaaggg
aaaccagagg 7560agctctctcg acgcaggact cggcttgctg aagcgcgcac ggcaagaggc
gaggggcggc 7620gactggtgag tacgccaaaa attttgacta gcggaggcta gaaggagaga
gatgggtgcg 7680agagcgtcag tattaagcgg gggagaatta gatcgcgatg ggaaaaaatt
cggttaaggc 7740cagggggaaa gaaaaaatat aaattaaaac atatagtatg ggcaagcagg
gagctagaac 7800gattcgcagt taatcctggc ctgttagaaa catcagaagg ctgtagacaa
atactgggac 7860agctacaacc atcccttcag acaggatcag aagaacttag atcattatat
aatacagtag 7920caaccctcta ttgtgtgcat caaaggatag agataaaaga caccaaggaa
gctttagaca 7980agatagagga agagcaaaac aaaagtaaga ccaccgcaca gcaagcggcc
gctgatcttc 8040agacctggag gaggagatat gagggacaat tggagaagtg aattatataa
atataaagta 8100gtaaaaattg aaccattagg agtagcaccc accaaggcaa agagaagagt
ggtgcagaga 8160gaaaaaagag cagtgggaat aggagctttg ttccttgggt tcttgggagc
agcaggaagc 8220actatgggcg cagcgtcaat gacgctgacg gtacaggcca gacaattatt
gtctggtata 8280gtgcagcagc agaacaattt gctgagggct attgaggcgc aacagcatct
gttgcaactc 8340acagtctggg gcatcaagca gctccaggca agaatcctgg ctgtggaaag
atacctaaag 8400gatcaacagc tcctggggat ttggggttgc tctggaaaac tcatttgcac
cactgctgtg 8460ccttggaatg ctagttggag taataaatct ctggaacaga tttggaatca
cacgacctgg 8520atggagtggg acagagaaat taacaattac acaagcttaa tacactcctt
aattgaagaa 8580tcgcaaaacc agcaagaaaa gaatgaacaa gaattattgg aattagataa
atgggcaagt 8640ttgtggaatt ggtttaacat aacaaattgg ctgtggtata taaaattatt
cataatgata 8700gtaggaggct tggtaggttt aagaatagtt tttgctgtac tttctatagt
gaatagagtt 8760aggcagggat attcaccatt atcgtttcag acccacctcc caaccccgag
gggacccgac 8820aggcccttaa ttaattggct ccggtgcccg tcagtgggca gagcgcacat
cgcccacagt 8880ccccgagaag ttggggggag gggtcggcaa ttgaaccggt gcctagagaa
ggtggcgcgg 8940ggtaaactgg gaaagtgatg tcgtgtactg gctccgcctt tttcccgagg
gtgggggaga 9000accgtatata agtgcagtag tcgccgtgaa cgttcttttt cgcaacgggt
ttgccgccag 9060aacacaggta agtgccgtgt gtggttcccg cgggcctggc ctctttacgg
gttatggccc 9120ttgcgtgcct tgaattactt ccacctggct gcagtacgtg attcttgatc
ccgagcttcg 9180ggttggaagt gggtgggaga gttcgaggcc ttgcgcttaa ggagcccctt
cgcctcgtgc 9240ttgagttgag gcctggcctg ggcgctgggg ccgccgcgtg cgaatctggt
ggcaccttcg 9300cgcctgtctc gctgctttcg ataagtctct agccatttaa aatttttgat
gacctgctgc 9360gacgcttttt ttctggcaag atagtcttgt aaatgcgggc caagatctgc
acactggtat 9420ttcggttttt ggggccgcgg gcggcgacgg ggcccgtgcg tcccagcgca
catgttcggc 9480gaggcggggc ctgcgagcgc ggccaccgag aatcggacgg gggtagtctc
aagctggccg 9540gcctgctctg gtgcctggcc tcgcgccgcc gtgtatcgcc ccgccctggg
cggcaaggct 9600ggcccggtcg gcaccagttg cgtgagcgga aagatggccg cttcccggcc
ctgctgcagg 9660gagctcaaaa tggaggacgc ggcgctcggg agagcgggcg ggtgagtcac
ccacacaaag 9720gaaaagggcc tttccgtcct cagccgtcgc ttcatgtgac tccacggagt
accgggcgcc 9780gtccaggcac ctcgattagt tctcgagctt ttggagtacg tcgtctttag
gttgggggga 9840ggggttttat gcgatggagt ttccccacac tgagtgggtg gagactgaag
ttaggccagc 9900ttggcacttg atgtaattct ccttggaatt tgcccttttt gagtttggat
cttggttcat 9960tctcaagcct cagacagtgg ttcaaagttt ttttcttcca tttcaggtgt
cgtgaggaat 10020tcggccatta cggcctgcca ccatggacca gtacctgccc gacaccgacg
acaggcacag 10080gatcgaggag aagaggaaga ggacctacga gaccttcaag agcatcatga
agaagagccc 10140cttcaacggc cccaccgagc ccagaccccc caccaggcgg atcgccgtgc
ccacaaggaa 10200cagcaccagc gtgcccaagc ctgcccccca gccctacacc ttccccgcca
gcctgagcac 10260catcaacttc gacgagttca gccccatgct gctgcccagc ggccagatca
gcaaccaggc 10320cctggccctg gctcctagca gcgcccctgt gctggcccag accatggtgc
ccagcagcgc 10380catggtgcct ctggcccagc ctcctgcccc tgcccccgtg ctgacccctg
gcccccctca 10440gagcctgagc gccccagtgc ccaagagcac ccaggccggc gagggcacac
tgagcgaggc 10500cctgctgcac ctgcagttcg acgccgacga ggacctgggc gccctgctgg
gcaacagcac 10560cgaccccggc gtgttcaccg acctggccag cgtggacaac agcgagttcc
agcagctgct 10620gaaccagggc gtgagcatga gccacagcac cgccgagccc atgctgatgg
agtaccccga 10680ggccatcacc aggctggtga ccggcagcca gagacccccc gaccctgccc
ctacccctct 10740gggcaccagc ggcctgccca acggcctgag cggcgacgag gacttcagca
gcatcgccga 10800catggacttc tccgccctgc tgtcccagat cagctccctg gagctggccg
aggccgctgc 10860caaggaggct gccgctaagg aggccgctgc taaggaggct gctgccaagg
ccgctgccat 10920gaagaacatc aacgccgacg acacctacag gatcatcaac aagatcaagg
cctgcagaag 10980caacaacgac atcaaccagt gcctgagcga catggccaag atggtgcact
gcgagtacta 11040cctgctggcc ttcatctacc cccacagcat ggtgaagagc gacatcagca
tcctggacaa 11100ctaccccaag aagtggaggc agtactacga cgacgccaac ctgatcaagt
acgaccccat 11160cgtggactac agcaacagca accacagccc catcaactgg aacatcttcg
agaacaacgc 11220cgtgaacaaa aagtccccca acgtgatcaa ggaggccaag acagccggcc
tgatcaccgg 11280cttcagcttc cccatccaca ccgccaacaa cggcttcggc atgctgtcct
tcgcccacag 11340cgagaaggac aactacatcg acagcctgtt tctgcacgcc tgcatgaaca
tccccctgat 11400cgtgcccagc ctggtggata actaccggaa gatcaacatc gccaacaaca
agtccaacaa 11460cgacctgacc aagcgggaga aggagtgcct ggcctgggcc tgcgagggca
agagcagctg 11520ggacatcagc aagatcctgg gctgcagcga gaggaccgtg accttccacc
tgaccaacgc 11580ccagatgaag ctgaacacca ccaacaggtg ccagagcatc agcaaggcca
tcctgaccgg 11640cgccatcgac tgcccctact tcaagaacag cagcctgagg ccccccaaaa
agaaaagaaa 11700ggtgcaccac caccaccacc actgatgagg ccgcct
11736

SEQ ID NO: 3
Length: 10050
Type: DNA
Organism: Artificial Sequence
Other information: Synthetic

aattcggcca ttacggccgc tagcgttaac gtcgacggcc gcctcggcca tcgataagga 60tccggaatgc
ccctctccct cccccccccc taacgttact ggccgaagcc gcttggaata 120aggccggtgt
gcgtttgtct atatgttatt ttccaccata ttgccgtctt ttggcaatgt 180gagggcccgg
aaacctggcc ctgtcttctt gacgagcatt cctaggggtc tttcccctct 240cgccaaagga
atgcaaggtc tgttgaatgt cgtgaaggaa gcagttcctc tggaagcttc 300ttgaagacaa
acaacgtctg tagcgaccct ttgcaggcag cggaaccccc cacctggcga 360caggtgcctc
tgcggccaaa agccacgtgt ataagataca cctgcaaagg cggcacaacc 420ccagtgccac
gttgtgagtt ggatagttgt ggaaagagtc aaatggctct cctcaagcgt 480attcaacaag
gggctgaagg atgcccagaa ggtaccccat tgtatgggat ctgatctggg 540gcctcggtgc
acatgcttta catgtgttta gtcgaggtta aaaaaacgtc taggcccccc 600gaaccacggg
gacgtggttt tcctttgaaa aacacgatga taatatggcc acaaccatgg 660tgagcaaggg
cgaggagctg ttcaccgggg tggtgcccat cctggtcgag ctggacggcg 720acgtaaacgg
ccacaagttc agcgtgtccg gcgagggcga gggcgatgcc acctacggca 780agctgaccct
gaagttcatc tgcaccaccg gcaagctgcc cgtgccctgg cccaccctcg 840tgaccaccct
gacctacggc gtgcagtgct tcagccgcta ccccgaccac atgaagcagc 900acgacttctt
caagtccgcc atgcccgaag gctacgtcca ggagcgcacc atcttcttca 960aggacgacgg
caactacaag acccgcgccg aggtgaagtt cgagggcgac accctggtga 1020accgcatcga
gctgaagggc atcgacttca aggaggacgg caacatcctg gggcacaagc 1080tggagtacaa
ctacaacagc cacaacgtct atatcatggc cgacaagcag aagaacggca 1140tcaaggtgaa
cttcaagatc cgccacaaca tcgaggacgg cagcgtgcag ctcgccgacc 1200actaccagca
gaacaccccc atcggcgacg gccccgtgct gctgcccgac aaccactacc 1260tgagcaccca
gtccgccctg agcaaagacc ccaacgagaa gcgcgatcac atggtcctgc 1320tggagttcgt
gaccgccgcc gggatcactc tcggcatgga cgagctgtac aagtaagcgg 1380ccgcaatcaa
cctctggatt acaaaatttg tgaaagattg actggtattc ttaactatgt 1440tgctcctttt
acgctatgtg gatacgctgc tttaatgcct ttgtatcatg ctattacttc 1500ccgtacggct
ttcattttct cctccttgta taaatcctgg ttgctgtctc tttatgagga 1560gttgtggccc
gttgtcaggc aacgtggcgt ggtgtgcact gtgtttgctg acgcaacccc 1620cactggttgg
ggcattgcca ccacctatca actcctttcc gggactttcg ctttccccct 1680ccctattgcc
acggcggaac tcattgccgc ctgccttgcc cgctgctgga caggggctcg 1740gctgttgggc
actgacaatt ccgtggtgtt gtcggggaag ctgacgtcct ttccatggct 1800gctcgcctgt
gttgccaact ggattctgcg cgggacgtcc ttctgctacg tcccttcggc 1860cctcaatcca
gcggaccttc cttcccgcgg cctgctgccg gttctgcggc ctcttccgcg 1920tcttcgcctt
cgccctcaga cgagtcggat ctccctttgg gccgcctccc cgcctgcctg 1980caggtttgtc
gagacctaga aaaacatgga gcaatcacaa gtagcaatac agcagctacc 2040aatgctgatt
gtgcctggct agaagcacaa gaggaggagg aggtgggttt tccagtcaca 2100cctcaggtac
ctttaagacc aatgacttac aaggcagctg tagatcttag ccacttttta 2160aaagaaaagg
ggggactgga agggctaatt cactcccaac gaagacaaga tctgcttttt 2220gcttgtactg
ggtctctctg gttagaccag atctgagcct gggagctctc tggctaacta 2280gggaacccac
tgcttaagcc tcaataaagc ttgccttgag tgcttcaagt agtgtgtgcc 2340cgtctgttgt
gtgactctgg taactagaga tccctcagac ccttttagtc agtgtggaaa 2400atctctagca
gggcccgttt aaacccgctg atcagcctcg actgtgcctt ctagttgcca 2460gccatctgtt
gtttgcccct cccccgtgcc ttccttgacc ctggaaggtg ccactcccac 2520tgtcctttcc
taataaaatg aggaaattgc atcgcattgt ctgagtaggt gtcattctat 2580tctggggggt
ggggtggggc aggacagcaa gggggaggat tgggaagaca atagcaggca 2640tgctggggat
gcggtgggct ctatggcttc tgaggcggaa agaaccagct ggggctctag 2700ggggtatccc
cacgcgccct gtagcggcgc attaagcgcg gcgggtgtgg tggttacgcg 2760cagcgtgacc
gctacacttg ccagcgccct agcgcccgct cctttcgctt tcttcccttc 2820ctttctcgcc
acgttcgccg gctttccccg tcaagctcta aatcggggca tccctttagg 2880gttccgattt
agtgctttac ggcacctcga ccccaaaaaa cttgattagg gtgatggttc 2940acgtagtggg
ccatcgccct gatagacggt ttttcgccct ttgacgttgg agtccacgtt 3000ctttaatagt
ggactcttgt tccaaactgg aacaacactc aaccctatct cggtctattc 3060ttttgattta
taagggattt tggggatttc ggcctattgg ttaaaaaatg agctgattta 3120acaaaaattt
aacgcgaatt aattctgtgg aatgtgtgtc agttagggtg tggaaagtcc 3180ccaggctccc
caggcaggca gaagtatgca aagcatgcat ctcaattagt cagcaaccag 3240gtgtggaaag
tccccaggct ccccagcagg cagaagtatg caaagcatgc atctcaatta 3300gtcagcaacc
atagtcccgc ccctaactcc gcccatcccg cccctaactc cgcccagttc 3360cgcccattct
ccgccccatg gctgactaat tttttttatt tatgcagagg ccgaggccgc 3420ctctgcctct
gagctattcc agaagtagtg aggaggcttt tttggaggcc taggcttttg 3480caaaaagctc
ccgggagctt gtatatccat tttcggatct gatcagcacg tgttgacaat 3540taatcatcgg
catagtatat cggcatagta taatacgaca aggtgaggaa ctaaaccatg 3600gccaagttga
ccagtgccgt tccggtgctc accgcgcgcg acgtcgccgg agcggtcgag 3660ttctggaccg
accggctcgg gttctcccgg gacttcgtgg aggacgactt cgccggtgtg 3720gtccgggacg
acgtgaccct gttcatcagc gcggtccagg accaggtggt gccggacaac 3780accctggcct
gggtgtgggt gcgcggcctg gacgagctgt acgccgagtg gtcggaggtc 3840gtgtccacga
acttccggga cgcctccggg ccggccatga ccgagatcgg cgagcagccg 3900tgggggcggg
agttcgccct gcgcgacccg gccggcaact gcgtgcactt cgtggccgag 3960gagcaggact
gacacgtgct acgagatttc gattccaccg ccgccttcta tgaaaggttg 4020ggcttcggaa
tcgttttccg ggacgccggc tggatgatcc tccagcgcgg ggatctcatg 4080ctggagttct
tcgcccaccc caacttgttt attgcagctt ataatggtta caaataaagc 4140aatagcatca
caaatttcac aaataaagca tttttttcac tgcattctag ttgtggtttg 4200tccaaactca
tcaatgtatc ttatcatgtc tgtataccgt cgacctctag ctagagcttg 4260gcgtaatcat
ggtcatagct gtttcctgtg tgaaattgtt atccgctcac aattccacac 4320aacatacgag
ccggaagcat aaagtgtaaa gcctggggtg cctaatgagt gagctaactc 4380acattaattg
cgttgcgctc actgcccgct ttccagtcgg gaaacctgtc gtgccagctg 4440cattaatgaa
tcggccaacg cgcggggaga ggcggtttgc gtattgggcg ctcttccgct 4500tcctcgctca
ctgactcgct gcgctcggtc gttcggctgc ggcgagcggt atcagctcac 4560tcaaaggcgg
taatacggtt atccacagaa tcaggggata acgcaggaaa gaacatgtga 4620gcaaaaggcc
agcaaaaggc caggaaccgt aaaaaggccg cgttgctggc gtttttccat 4680aggctccgcc
cccctgacga gcatcacaaa aatcgacgct caagtcagag gtggcgaaac 4740ccgacaggac
tataaagata ccaggcgttt ccccctggaa gctccctcgt gcgctctcct 4800gttccgaccc
tgccgcttac cggatacctg tccgcctttc tcccttcggg aagcgtggcg 4860ctttctcaat
gctcacgctg taggtatctc agttcggtgt aggtcgttcg ctccaagctg 4920ggctgtgtgc
acgaaccccc cgttcagccc gaccgctgcg ccttatccgg taactatcgt 4980cttgagtcca
acccggtaag acacgactta tcgccactgg cagcagccac tggtaacagg 5040attagcagag
cgaggtatgt aggcggtgct acagagttct tgaagtggtg gcctaactac 5100ggctacacta
gaaggacagt atttggtatc tgcgctctgc tgaagccagt taccttcgga 5160aaaagagttg
gtagctcttg atccggcaaa caaaccaccg ctggtagcgg tggttttttt 5220gtttgcaagc
agcagattac gcgcagaaaa aaaggatctc aagaagatcc tttgatcttt 5280tctacggggt
ctgacgctca gtggaacgaa aactcacgtt aagggatttt ggtcatgaga 5340ttatcaaaaa
ggatcttcac ctagatcctt ttaaattaaa aatgaagttt taaatcaatc 5400taaagtatat
atgagtaaac ttggtctgac agttaccaat gcttaatcag tgaggcacct 5460atctcagcga
tctgtctatt tcgttcatcc atagttgcct gactccccgt cgtgtagata 5520actacgatac
gggagggctt accatctggc cccagtgctg caatgatacc gcgagaccca 5580cgctcaccgg
ctccagattt atcagcaata aaccagccag ccggaagggc cgagcgcaga 5640agtggtcctg
caactttatc cgcctccatc cagtctatta attgttgccg ggaagctaga 5700gtaagtagtt
cgccagttaa tagtttgcgc aacgttgttg ccattgctac aggcatcgtg 5760gtgtcacgct
cgtcgtttgg tatggcttca ttcagctccg gttcccaacg atcaaggcga 5820gttacatgat
cccccatgtt gtgcaaaaaa gcggttagct ccttcggtcc tccgatcgtt 5880gtcagaagta
agttggccgc agtgttatca ctcatggtta tggcagcact gcataattct 5940cttactgtca
tgccatccgt aagatgcttt tctgtgactg gtgagtactc aaccaagtca 6000ttctgagaat
agtgtatgcg gcgaccgagt tgctcttgcc cggcgtcaat acgggataat 6060accgcgccac
atagcagaac tttaaaagtg ctcatcattg gaaaacgttc ttcggggcga 6120aaactctcaa
ggatcttacc gctgttgaga tccagttcga tgtaacccac tcgtgcaccc 6180aactgatctt
cagcatcttt tactttcacc agcgtttctg ggtgagcaaa aacaggaagg 6240caaaatgccg
caaaaaaggg aataagggcg acacggaaat gttgaatact catactcttc 6300ctttttcaat
attattgaag catttatcag ggttattgtc tcatgagcgg atacatattt 6360gaatgtattt
agaaaaataa acaaataggg gttccgcgca catttccccg aaaagtgcca 6420cctgacgtcg
acggatcggg agatctcccg atcccctatg gtgcactctc agtacaatct 6480gctctgatgc
cgcatagtta agccagtatc tgctccctgc ttgtgtgttg gaggtcgctg 6540agtagtgcgc
gagcaaaatt taagctacaa caaggcaagg cttgaccgac aattgcatga 6600agaatctgct
tagggttagg cgttttgcgc tgcttcgcga tgtacgggcc agatatacgc 6660gttgacattg
attattgact agttattaat agtaatcaat tacggggtca ttagttcata 6720gcccatatat
ggagttccgc gttacataac ttacggtaaa tggcccgcct ggctgaccgc 6780ccaacgaccc
ccgcccattg acgtcaataa tgacgtatgt tcccatagta acgccaatag 6840ggactttcca
ttgacgtcaa tgggtggact atttacggta aactgcccac ttggcagtac 6900atcaagtgta
tcatatgcca agtacgcccc ctattgacgt caatgacggt aaatggcccg 6960cctggcatta
tgcccagtac atgaccttat gggactttcc tacttggcag tacatctacg 7020tattagtcat
cgctattacc atggtgatgc ggttttggca gtacatcaat gggcgtggat 7080agcggtttga
ctcacgggga tttccaagtc tccaccccat tgacgtcaat gggagtttgt 7140tttggcacca
aaatcaacgg gactttccaa aatgtcgtaa caactccgcc ccattgacgc 7200aaatgggcgg
taggcgtgta cggtgggagg tctatataag cagagctctc tggctaacta 7260gagaacccac
tgcttactgg cttatcgaaa ttaatacgac tcactatagg gagacccaag 7320ctggtttaaa
cttaagcttg gtaccgagct cactagtcca gtgtggtggc agatatccag 7380cacagtggcg
gccgctcgag tctagagggc ccgttttgcc tgtactgggt ctctctggtt 7440agaccagatc
tgagcctggg agctctctgg ctaactaggg aacccactgc ttaagcctca 7500ataaagcttg
ccttgagtgc ttcaagtagt gtgtgcccgt ctgttgtgtg actctggtaa 7560ctagagatcc
ctcagaccct tttagtcagt gtggaaaatc tctagcagtg gcgcccgaac 7620agggacttga
aagcgaaagg gaaaccagag gagctctctc gacgcaggac tcggcttgct 7680gaagcgcgca
cggcaagagg cgaggggcgg cgactggtga gtacgccaaa aattttgact 7740agcggaggct
agaaggagag agatgggtgc gagagcgtca gtattaagcg ggggagaatt 7800agatcgcgat
gggaaaaaat tcggttaagg ccagggggaa agaaaaaata taaattaaaa 7860catatagtat
gggcaagcag ggagctagaa cgattcgcag ttaatcctgg cctgttagaa 7920acatcagaag
gctgtagaca aatactggga cagctacaac catcccttca gacaggatca 7980gaagaactta
gatcattata taatacagta gcaaccctct attgtgtgca tcaaaggata 8040gagataaaag
acaccaagga agctttagac aagatagagg aagagcaaaa caaaagtaag 8100accaccgcac
agcaagcggc cgctgatctt cagacctgga ggaggagata tgagggacaa 8160ttggagaagt
gaattatata aatataaagt agtaaaaatt gaaccattag gagtagcacc 8220caccaaggca
aagagaagag tggtgcagag agaaaaaaga gcagtgggaa taggagcttt 8280gttccttggg
ttcttgggag cagcaggaag cactatgggc gcagcgtcaa tgacgctgac 8340ggtacaggcc
agacaattat tgtctggtat agtgcagcag cagaacaatt tgctgagggc 8400tattgaggcg
caacagcatc tgttgcaact cacagtctgg ggcatcaagc agctccaggc 8460aagaatcctg
gctgtggaaa gatacctaaa ggatcaacag ctcctgggga tttggggttg 8520ctctggaaaa
ctcatttgca ccactgctgt gccttggaat gctagttgga gtaataaatc 8580tctggaacag
atttggaatc acacgacctg gatggagtgg gacagagaaa ttaacaatta 8640cacaagctta
atacactcct taattgaaga atcgcaaaac cagcaagaaa agaatgaaca 8700agaattattg
gaattagata aatgggcaag tttgtggaat tggtttaaca taacaaattg 8760gctgtggtat
atagaaatta ttcataatga tagtaggagg cttggtaggt ttaagaatag 8820tttttgctgt
actttctata gtgaatagag ttaggcaggg atattcacca ttatcgtttc 8880agacccacct
cccaaccccg aggggacccg acaggcccga aggaatagaa gaagaaggtg 8940gagagagaga
cagagacaga tccattcgat tagtgaacgg atcggcactg cgtgcgccaa 9000ttctgcagac
aaatggcagt attcatccac aattttaaaa gaaaaggggg gattgggggg 9060tacagtgcag
gggaaagaat agtagacata atagcaacag acatacaaac taaagaatta 9120caaaaacaaa
ttacaaaaat tcaaaatttt cgggtttatt acagggacag cagagatcca 9180gtttggggtt
gctctggaaa actcatttgc accactgctg tgccttggaa tgctagttgg 9240agtaataaat
ctctggaaca gatttggaat cacacgacct ggatggagtg ggacagagaa 9300attaacaatt
acacaagctt aatacactcc ttaattgaag aatcgcaaaa ccagcaagaa 9360aagaatgaac
aagaattatt ggaattagat aaatgggcaa gtttgtggaa ttggtttaac 9420ataacaaatt
ggctgtggta tataaaatta ttcataatga tagtaggagg cttggtaggt 9480ttaagaatag
tttttgctgt actttctata gtgaatagag ttaggcaggg atattcacca 9540ttatcgtttc
agacccacct cccaaccccg aggggacccg acaggccctt aattaatccc 9600ctgattctgt
ggataaccgt attaccgcct ttgagtgagc tgcacacctg taggatcgta 9660caggtaaagt
gaaaggctac aataggacac ctgtaggatc gtacaggtgg taaactcgag 9720agcgcccaat
aacctgtagg atcgtacagg tagcgcacta gagagcgccc aataacctgt 9780aggatcgtac
aggtaaagtg aaaggctaca ataggacacc tgtaggatcg tacaggtggt 9840aaactcgaga
gcgcccaata acctgtagga tcgtacaggt aaagtgaaag gctacaatag 9900gacacctgta
ggatcgtaca ggtggtaaac tcgacctata taagcagagc tcgtttagtg 9960aaccgtcaga
tcgcctggag acgccatcca cgctgttttg acctccatag aagacaccgg 10020gaccgatcca
gcctccgcgg ccccgaattg
10050

SEQ ID NO: 4
Length: 10287
Type: DNA
Organism: Artificial Sequence
Other information: Synthetic

cggccatcga taaggatccg
cccctctccc tccccccccc ctaacgttac tggccgaagc 60cgcttggaat aaggccggtg
tgcgtttgtc tatatgttat tttccaccat attgccgtct 120tttggcaatg tgagggcccg
gaaacctggc cctgtcttct tgacgagcat tcctaggggt 180ctttcccctc tcgccaaagg
aatgcaaggt ctgttgaatg tcgtgaagga agcagttcct 240ctggaagctt cttgaagaca
aacaacgtct gtagcgaccc tttgcaggca gcggaacccc 300ccacctggcg acaggtgcct
ctgcggccaa aagccacgtg tataagatac acctgcaaag 360gcggcacaac cccagtgcca
cgttgtgagt tggatagttg tggaaagagt caaatggctc 420tcctcaagcg tattcaacaa
ggggctgaag gatgcccaga aggtacccca ttgtatggga 480tctgatctgg ggcctcggtg
cacatgcttt acatgtgttt agtcgaggtt aaaaaaacgt 540ctaggccccc cgaaccacgg
ggacgtggtt ttcctttgaa aaacacgatg ataatatggc 600cacaaccatg gcctcctccg
aggacgtcat caaggagttc atgcgcttca aggtgcgcat 660ggagggctcc gtgaacggcc
acgagttcga gatcgagggc gagggcgagg gccgccccta 720cgagggcacc cagaccgcca
agctgaaggt gaccaagggc ggccccctgc ccttcgcctg 780ggacatcctg tccccccagt
tccagtacgg ctccaaggtg tacgtgaagc accccgccga 840catccccgac tacaagaagc
tgtccttccc cgagggcttc aagtgggagc gcgtgatgaa 900cttcgaggac ggcggcgtgg
tgaccgtgac ccaggactcc tccctgcagg acggctcctt 960catctacaag gtgaagttca
tcggcgtgaa cttcccctcc gacggccccg taatgcagaa 1020gaagactatg ggctgggagg
cctccaccga gcgcctgtac ccccgcgacg gcgtgctgaa 1080gggcgagatc cacaaggccc
tgaagctgaa ggacggcggc cactacctgg tggagttcaa 1140gtccatctac atggccaaga
agcccgtgca gctgcccggc tactactacg tggactccaa 1200gctggacatc acctcccaca
acgaggacta caccatcgtg gagcagtacg agcgcgccga 1260gggccgccac cacctgttcc
tgtaggcggc cgcaatcaac ctctggatta caaaatttgt 1320gaaagattga ctggtattct
taactatgtt gctcctttta cgctatgtgg atacgctgct 1380ttaatgcctt tgtatcatgc
tattacttcc cgtacggctt tcattttctc ctccttgtat 1440aaatcctggt tgctgtctct
ttatgaggag ttgtggcccg ttgtcaggca acgtggcgtg 1500gtgtgcactg tgtttgctga
cgcaaccccc actggttggg gcattgccac cacctatcaa 1560ctcctttccg ggactttcgc
tttccccctc cctattgcca cggcggaact cattgccgcc 1620tgccttgccc gctgctggac
aggggctcgg ctgttgggca ctgacaattc cgtggtgttg 1680tcggggaagc tgacgtcctt
tccatggctg ctcgcctgtg ttgccaactg gattctgcgc 1740gggacgtcct tctgctacgt
cccttcggcc ctcaatccag cggaccttcc ttcccgcggc 1800ctgctgccgg ttctgcggcc
tcttccgcgt cttcgccttc gccctcagac gagtcggatc 1860tccctttggg ccgcctcccc
gcctgcctgc aggtttgtcg agacctagaa aaacatggag 1920caatcacaag tagcaataca
gcagctacca atgctgattg tgcctggcta gaagcacaag 1980aggaggagga ggtgggtttt
ccagtcacac ctcaggtacc tttaagacca atgacttaca 2040aggcagctgt agatcttagc
cactttttaa aagaaaaggg gggactggaa gggctaattc 2100actcccaacg aagacaagat
ctgctttttg cttgtactgg gtctctctgg ttagaccaga 2160tctgagcctg ggagctctct
ggctaactag ggaacccact gcttaagcct caataaagct 2220tgccttgagt gcttcaagta
gtgtgtgccc gtctgttgtg tgactctggt aactagagat 2280ccctcagacc cttttagtca
gtgtggaaaa tctctagcag ggcccgttta aacccgctga 2340tcagcctcga ctgtgccttc
tagttgccag ccatctgttg tttgcccctc ccccgtgcct 2400tccttgaccc tggaaggtgc
cactcccact gtcctttcct aataaaatga ggaaattgca 2460tcgcattgtc tgagtaggtg
tcattctatt ctggggggtg gggtggggca ggacagcaag 2520ggggaggatt gggaagacaa
tagcaggcat gctggggatg cggtgggctc tatggcttct 2580gaggcggaaa gaaccagctg
gggctctagg gggtatcccc acgcgccctg tagcggcgca 2640ttaagcgcgg cgggtgtggt
ggttacgcgc agcgtgaccg ctacacttgc cagcgcccta 2700gcgcccgctc ctttcgcttt
cttcccttcc tttctcgcca cgttcgccgg ctttccccgt 2760caagctctaa atcggggcat
ccctttaggg ttccgattta gtgctttacg gcacctcgac 2820cccaaaaaac ttgattaggg
tgatggttca cgtagtgggc catcgccctg atagacggtt 2880tttcgccctt tgacgttgga
gtccacgttc tttaatagtg gactcttgtt ccaaactgga 2940acaacactca accctatctc
ggtctattct tttgatttat aagggatttt ggggatttcg 3000gcctattggt taaaaaatga
gctgatttaa caaaaattta acgcgaatta attctgtgga 3060atgtgtgtca gttagggtgt
ggaaagtccc caggctcccc aggcaggcag aagtatgcaa 3120agcatgcatc tcaattagtc
agcaaccagg tgtggaaagt ccccaggctc cccagcaggc 3180agaagtatgc aaagcatgca
tctcaattag tcagcaacca tagtcccgcc cctaactccg 3240cccatcccgc ccctaactcc
gcccagttcc gcccattctc cgccccatgg ctgactaatt 3300ttttttattt atgcagaggc
cgaggccgcc tctgcctctg agctattcca gaagtagtga 3360ggaggctttt ttggaggcct
aggcttttgc aaaaagctcc cgggagcttg tatatccatt 3420ttcggatctg atcagcacgt
gttgacaatt aatcatcggc atagtatatc ggcatagtat 3480aatacgacaa ggtgaggaac
taaaccatgg ccaagttgac cagtgccgtt ccggtgctca 3540ccgcgcgcga cgtcgccgga
gcggtcgagt tctggaccga ccggctcggg ttctcccggg 3600acttcgtgga ggacgacttc
gccggtgtgg tccgggacga cgtgaccctg ttcatcagcg 3660cggtccagga ccaggtggtg
ccggacaaca ccctggcctg ggtgtgggtg cgcggcctgg 3720acgagctgta cgccgagtgg
tcggaggtcg tgtccacgaa cttccgggac gcctccgggc 3780cggccatgac cgagatcggc
gagcagccgt gggggcggga gttcgccctg cgcgacccgg 3840ccggcaactg cgtgcacttc
gtggccgagg agcaggactg acacgtgcta cgagatttcg 3900attccaccgc cgccttctat
gaaaggttgg gcttcggaat cgttttccgg gacgccggct 3960ggatgatcct ccagcgcggg
gatctcatgc tggagttctt cgcccacccc aacttgttta 4020ttgcagctta taatggttac
aaataaagca atagcatcac aaatttcaca aataaagcat 4080ttttttcact gcattctagt
tgtggtttgt ccaaactcat caatgtatct tatcatgtct 4140gtataccgtc gacctctagc
tagagcttgg cgtaatcatg gtcatagctg tttcctgtgt 4200gaaattgtta tccgctcaca
attccacaca acatacgagc cggaagcata aagtgtaaag 4260cctggggtgc ctaatgagtg
agctaactca cattaattgc gttgcgctca ctgcccgctt 4320tccagtcggg aaacctgtcg
tgccagctgc attaatgaat cggccaacgc gcggggagag 4380gcggtttgcg tattgggcgc
tcttccgctt cctcgctcac tgactcgctg cgctcggtcg 4440ttcggctgcg gcgagcggta
tcagctcact caaaggcggt aatacggtta tccacagaat 4500caggggataa cgcaggaaag
aacatgtgag caaaaggcca gcaaaaggcc aggaaccgta 4560aaaaggccgc gttgctggcg
tttttccata ggctccgccc ccctgacgag catcacaaaa 4620atcgacgctc aagtcagagg
tggcgaaacc cgacaggact ataaagatac caggcgtttc 4680cccctggaag ctccctcgtg
cgctctcctg ttccgaccct gccgcttacc ggatacctgt 4740ccgcctttct cccttcggga
agcgtggcgc tttctcaatg ctcacgctgt aggtatctca 4800gttcggtgta ggtcgttcgc
tccaagctgg gctgtgtgca cgaacccccc gttcagcccg 4860accgctgcgc cttatccggt
aactatcgtc ttgagtccaa cccggtaaga cacgacttat 4920cgccactggc agcagccact
ggtaacagga ttagcagagc gaggtatgta ggcggtgcta 4980cagagttctt gaagtggtgg
cctaactacg gctacactag aaggacagta tttggtatct 5040gcgctctgct gaagccagtt
accttcggaa aaagagttgg tagctcttga tccggcaaac 5100aaaccaccgc tggtagcggt
ggtttttttg tttgcaagca gcagattacg cgcagaaaaa 5160aaggatctca agaagatcct
ttgatctttt ctacggggtc tgacgctcag tggaacgaaa 5220actcacgtta agggattttg
gtcatgagat tatcaaaaag gatcttcacc tagatccttt 5280taaattaaaa atgaagtttt
aaatcaatct aaagtatata tgagtaaact tggtctgaca 5340gttaccaatg cttaatcagt
gaggcaccta tctcagcgat ctgtctattt cgttcatcca 5400tagttgcctg actccccgtc
gtgtagataa ctacgatacg ggagggctta ccatctggcc 5460ccagtgctgc aatgataccg
cgagacccac gctcaccggc tccagattta tcagcaataa 5520accagccagc cggaagggcc
gagcgcagaa gtggtcctgc aactttatcc gcctccatcc 5580agtctattaa ttgttgccgg
gaagctagag taagtagttc gccagttaat agtttgcgca 5640acgttgttgc cattgctaca
ggcatcgtgg tgtcacgctc gtcgtttggt atggcttcat 5700tcagctccgg ttcccaacga
tcaaggcgag ttacatgatc ccccatgttg tgcaaaaaag 5760cggttagctc cttcggtcct
ccgatcgttg tcagaagtaa gttggccgca gtgttatcac 5820tcatggttat ggcagcactg
cataattctc ttactgtcat gccatccgta agatgctttt 5880ctgtgactgg tgagtactca
accaagtcat tctgagaata gtgtatgcgg cgaccgagtt 5940gctcttgccc ggcgtcaata
cgggataata ccgcgccaca tagcagaact ttaaaagtgc 6000tcatcattgg aaaacgttct
tcggggcgaa aactctcaag gatcttaccg ctgttgagat 6060ccagttcgat gtaacccact
cgtgcaccca actgatcttc agcatctttt actttcacca 6120gcgtttctgg gtgagcaaaa
acaggaaggc aaaatgccgc aaaaaaggga ataagggcga 6180cacggaaatg ttgaatactc
atactcttcc tttttcaata ttattgaagc atttatcagg 6240gttattgtct catgagcgga
tacatatttg aatgtattta gaaaaataaa caaatagggg 6300ttccgcgcac atttccccga
aaagtgccac ctgacgtcga cggatcggga gatctcccga 6360tcccctatgg tgcactctca
gtacaatctg ctctgatgcc gcatagttaa gccagtatct 6420gctccctgct tgtgtgttgg
aggtcgctga gtagtgcgcg agcaaaattt aagctacaac 6480aaggcaaggc ttgaccgaca
attgcatgaa gaatctgctt agggttaggc gttttgcgct 6540gcttcgcgat gtacgggcca
gatatacgcg ttgacattga ttattgacta gttattaata 6600gtaatcaatt acggggtcat
tagttcatag cccatatatg gagttccgcg ttacataact 6660tacggtaaat ggcccgcctg
gctgaccgcc caacgacccc cgcccattga cgtcaataat 6720gacgtatgtt cccatagtaa
cgccaatagg gactttccat tgacgtcaat gggtggacta 6780tttacggtaa actgcccact
tggcagtaca tcaagtgtat catatgccaa gtacgccccc 6840tattgacgtc aatgacggta
aatggcccgc ctggcattat gcccagtaca tgaccttatg 6900ggactttcct acttggcagt
acatctacgt attagtcatc gctattacca tggtgatgcg 6960gttttggcag tacatcaatg
ggcgtggata gcggtttgac tcacggggat ttccaagtct 7020ccaccccatt gacgtcaatg
ggagtttgtt ttggcaccaa aatcaacggg actttccaaa 7080atgtcgtaac aactccgccc
cattgacgca aatgggcggt aggcgtgtac ggtgggaggt 7140ctatataagc agagctctct
ggctaactag agaacccact gcttactggc ttatcgaaat 7200taatacgact cactataggg
agacccaagc tggtttaaac ttaagcttgg taccgagctc 7260actagtccag tgtggtggca
gatatccagc acagtggcgg ccgctcgagt ctagagggcc 7320cgttttgcct gtactgggtc
tctctggtta gaccagatct gagcctggga gctctctggc 7380taactaggga acccactgct
taagcctcaa taaagcttgc cttgagtgct tcaagtagtg 7440tgtgcccgtc tgttgtgtga
ctctggtaac tagagatccc tcagaccctt ttagtcagtg 7500tggaaaatct ctagcagtgg
cgcccgaaca gggacttgaa agcgaaaggg aaaccagagg 7560agctctctcg acgcaggact
cggcttgctg aagcgcgcac ggcaagaggc gaggggcggc 7620gactggtgag tacgccaaaa
attttgacta gcggaggcta gaaggagaga gatgggtgcg 7680agagcgtcag tattaagcgg
gggagaatta gatcgcgatg ggaaaaaatt cggttaaggc 7740cagggggaaa gaaaaaatat
aaattaaaac atatagtatg ggcaagcagg gagctagaac 7800gattcgcagt taatcctggc
ctgttagaaa catcagaagg ctgtagacaa atactgggac 7860agctacaacc atcccttcag
acaggatcag aagaacttag atcattatat aatacagtag 7920caaccctcta ttgtgtgcat
caaaggatag agataaaaga caccaaggaa gctttagaca 7980agatagagga agagcaaaac
aaaagtaaga ccaccgcaca gcaagcggcc gctgatcttc 8040agacctggag gaggagatat
gagggacaat tggagaagtg aattatataa atataaagta 8100gtaaaaattg aaccattagg
agtagcaccc accaaggcaa agagaagagt ggtgcagaga 8160gaaaaaagag cagtgggaat
aggagctttg ttccttgggt tcttgggagc agcaggaagc 8220actatgggcg cagcgtcaat
gacgctgacg gtacaggcca gacaattatt gtctggtata 8280gtgcagcagc agaacaattt
gctgagggct attgaggcgc aacagcatct gttgcaactc 8340acagtctggg gcatcaagca
gctccaggca agaatcctgg ctgtggaaag atacctaaag 8400gatcaacagc tcctggggat
ttggggttgc tctggaaaac tcatttgcac cactgctgtg 8460ccttggaatg ctagttggag
taataaatct ctggaacaga tttggaatca cacgacctgg 8520atggagtggg acagagaaat
taacaattac acaagcttaa tacactcctt aattgaagaa 8580tcgcaaaacc agcaagaaaa
gaatgaacaa gaattattgg aattagataa atgggcaagt 8640ttgtggaatt ggtttaacat
aacaaattgg ctgtggtata taaaattatt cataatgata 8700gtaggaggct tggtaggttt
aagaatagtt tttgctgtac tttctatagt gaatagagtt 8760aggcagggat attcaccatt
atcgtttcag acccacctcc caaccccgag gggacccgac 8820aggcccttaa ttaattggct
ccggtgcccg tcagtgggca gagcgcacat cgcccacagt 8880ccccgagaag ttggggggag
gggtcggcaa ttgaaccggt gcctagagaa ggtggcgcgg 8940ggtaaactgg gaaagtgatg
tcgtgtactg gctccgcctt tttcccgagg gtgggggaga 9000accgtatata agtgcagtag
tcgccgtgaa cgttcttttt cgcaacgggt ttgccgccag 9060aacacaggta agtgccgtgt
gtggttcccg cgggcctggc ctctttacgg gttatggccc 9120ttgcgtgcct tgaattactt
ccacctggct gcagtacgtg attcttgatc ccgagcttcg 9180ggttggaagt gggtgggaga
gttcgaggcc ttgcgcttaa ggagcccctt cgcctcgtgc 9240ttgagttgag gcctggcctg
ggcgctgggg ccgccgcgtg cgaatctggt ggcaccttcg 9300cgcctgtctc gctgctttcg
ataagtctct agccatttaa aatttttgat gacctgctgc 9360gacgcttttt ttctggcaag
atagtcttgt aaatgcgggc caagatctgc acactggtat 9420ttcggttttt ggggccgcgg
gcggcgacgg ggcccgtgcg tcccagcgca catgttcggc 9480gaggcggggc ctgcgagcgc
ggccaccgag aatcggacgg gggtagtctc aagctggccg 9540gcctgctctg gtgcctggcc
tcgcgccgcc gtgtatcgcc ccgccctggg cggcaaggct 9600ggcccggtcg gcaccagttg
cgtgagcgga aagatggccg cttcccggcc ctgctgcagg 9660gagctcaaaa tggaggacgc
ggcgctcggg agagcgggcg ggtgagtcac ccacacaaag 9720gaaaagggcc tttccgtcct
cagccgtcgc ttcatgtgac tccacggagt accgggcgcc 9780gtccaggcac ctcgattagt
tctcgagctt ttggagtacg tcgtctttag gttgggggga 9840ggggttttat gcgatggagt
ttccccacac tgagtgggtg gagactgaag ttaggccagc 9900ttggcacttg atgtaattct
ccttggaatt tgcccttttt gagtttggat cttggttcat 9960tctcaagcct cagacagtgg
ttcaaagttt ttttcttcca tttcaggtgt cgtgaggaat 10020tcggccatta cggcccgcca
ccatgagcac catcgaggag agggtgaaga agatcatcgg 10080cgagcagctg ggcgtgaagc
aggaggaggt caccaacaac gccagcttcg tggaggacct 10140gggcgccgac agcctggaca
ccgtggagct ggtgatggcc ctggaggagg agttcgacac 10200cgagatcccc gacgaggagg
ccgagaagat caccaccgtg caggccgcca tcgactacat 10260caacggccac caggcctgag
gccgcct 10287
SEQ ID NO 5 | Length: 12730 | Type: DNA | Organism: Artificial Sequence | Synthetic
cgcgccaagc tagcaagtta acaaatcgat ccggatccgc ccctctccct
cccccccccc 60taacgttact ggccgaagcc gcttggaata aggccggtgt gcgtttgtct
atatgttatt 120ttccaccata ttgccgtctt ttggcaatgt gagggcccgg aaacctggcc
ctgtcttctt 180gacgagcatt cctaggggtc tttcccctct cgccaaagga atgcaaggtc
tgttgaatgt 240cgtgaaggaa gcagttcctc tggaagcttc ttgaagacaa acaacgtctg
tagcgaccct 300ttgcaggcag cggaaccccc cacctggcga caggtgcctc tgcggccaaa
agccacgtgt 360ataagataca cctgcaaagg cggcacaacc ccagtgccac gttgtgagtt
ggatagttgt 420ggaaagagtc aaatggctct cctcaagcgt attcaacaag gggctgaagg
atgcccagaa 480ggtaccccat tgtatgggat ctgatctggg gcctcggtgc acatgcttta
catgtgttta 540gtcgaggtta aaaaaacgtc taggcccccc gaaccacggg gacgtggttt
tcctttgaaa 600aacacgatga taatatggcc acaaccatgg tgagcaaggg cgaggagctg
ttcaccgggg 660tggtgcccat cctggtcgag ctggacggcg acgtaaacgg ccacaagttc
agcgtgtccg 720gcgagggcga gggcgatgcc acctacggca agctgaccct gaagttcatc
tgcaccaccg 780gcaagctgcc cgtgccctgg cccaccctcg tgaccaccct gacctacggc
gtgcagtgct 840tcagccgcta ccccgaccac atgaagcagc acgacttctt caagtccgcc
atgcccgaag 900gctacgtcca ggagcgcacc atcttcttca aggacgacgg caactacaag
acccgcgccg 960aggtgaagtt cgagggcgac accctggtga accgcatcga gctgaagggc
atcgacttca 1020aggaggacgg caacatcctg gggcacaagc tggagtacaa ctacaacagc
cacaacgtct 1080atatcatggc cgacaagcag aagaacggca tcaaggtgaa cttcaagatc
cgccacaaca 1140tcgaggacgg cagcgtgcag ctcgccgacc actaccagca gaacaccccc
atcggcgacg 1200gccccgtgct gctgcccgac aaccactacc tgagcaccca gtccgccctg
agcaaagacc 1260ccaacgagaa gcgcgatcac atggtcctgc tggagttcgt gaccgccgcc
gggatcactc 1320tcggcatgga cgagctgtac aagtaagcgg ccgcaatcaa cctctggatt
acaaaatttg 1380tgaaagattg actggtattc ttaactatgt tgctcctttt acgctatgtg
gatacgctgc 1440tttaatgcct ttgtatcatg ctattacttc ccgtacggct ttcattttct
cctccttgta 1500taaatcctgg ttgctgtctc tttatgagga gttgtggccc gttgtcaggc
aacgtggcgt 1560ggtgtgcact gtgtttgctg acgcaacccc cactggttgg ggcattgcca
ccacctatca 1620actcctttcc gggactttcg ctttccccct ccctattgcc acggcggaac
tcattgccgc 1680ctgccttgcc cgctgctgga caggggctcg gctgttgggc actgacaatt
ccgtggtgtt 1740gtcggggaag ctgacgtcct ttccatggct gctcgcctgt gttgccaact
ggattctgcg 1800cgggacgtcc ttctgctacg tcccttcggc cctcaatcca gcggaccttc
cttcccgcgg 1860cctgctgccg gttctgcggc ctcttccgcg tcttcgcctt cgccctcaga
cgagtcggat 1920ctccctttgg gccgcctccc cgcctgcctg caggtttgtc gagacctaga
aaaacatgga 1980gcaatcacaa gtagcaatac agcagctacc aatgctgatt gtgcctggct
agaagcacaa 2040gaggaggagg aggtgggttt tccagtcaca cctcaggtac ctttaagacc
aatgacttac 2100aaggcagctg tagatcttag ccacttttta aaagaaaagg ggggactgga
agggctaatt 2160cactcccaac gaagacaaga tctgcttttt gcttgtactg ggtctctctg
gttagaccag 2220atctgagcct gggagctctc tggctaacta gggaacccac tgcttaagcc
tcaataaagc 2280ttgccttgag tgcttcaagt agtgtgtgcc cgtctgttgt gtgactctgg
taactagaga 2340tccctcagac ccttttagtc agtgtggaaa atctctagca gggcccgttt
aaacccgctg 2400atcagcctcg actgtgcctt ctagttgcca gccatctgtt gtttgcccct
cccccgtgcc 2460ttccttgacc ctggaaggtg ccactcccac tgtcctttcc taataaaatg
aggaaattgc 2520atcgcattgt ctgagtaggt gtcattctat tctggggggt ggggtggggc
aggacagcaa 2580gggggaggat tgggaagaca atagcaggca tgctggggat gcggtgggct
ctatggcttc 2640tgaggcggaa agaaccagct ggggctctag ggggtatccc cacgcgccct
gtagcggcgc 2700attaagcgcg gcgggtgtgg tggttacgcg cagcgtgacc gctacacttg
ccagcgccct 2760agcgcccgct cctttcgctt tcttcccttc ctttctcgcc acgttcgccg
gctttccccg 2820tcaagctcta aatcggggca tccctttagg gttccgattt agtgctttac
ggcacctcga 2880ccccaaaaaa cttgattagg gtgatggttc acgtagtggg ccatcgccct
gatagacggt 2940ttttcgccct ttgacgttgg agtccacgtt ctttaatagt ggactcttgt
tccaaactgg 3000aacaacactc aaccctatct cggtctattc ttttgattta taagggattt
tggggatttc 3060ggcctattgg ttaaaaaatg agctgattta acaaaaattt aacgcgaatt
aattctgtgg 3120aatgtgtgtc agttagggtg tggaaagtcc ccaggctccc caggcaggca
gaagtatgca 3180aagcatgcat ctcaattagt cagcaaccag gtgtggaaag tccccaggct
ccccagcagg 3240cagaagtatg caaagcatgc atctcaatta gtcagcaacc atagtcccgc
ccctaactcc 3300gcccatcccg cccctaactc cgcccagttc cgcccattct ccgccccatg
gctgactaat 3360tttttttatt tatgcagagg ccgaggccgc ctctgcctct gagctattcc
agaagtagtg 3420aggaggcttt tttggaggcc taggcttttg caaaaagctc ccgggagctt
gtatatccat 3480tttcggatct gatcagcacg tgttgacaat taatcatcgg catagtatat
cggcatagta 3540taatacgaca aggtgaggaa ctaaaccatg gccaagttga ccagtgccgt
tccggtgctc 3600accgcgcgcg acgtcgccgg agcggtcgag ttctggaccg accggctcgg
gttctcccgg 3660gacttcgtgg aggacgactt cgccggtgtg gtccgggacg acgtgaccct
gttcatcagc 3720gcggtccagg accaggtggt gccggacaac accctggcct gggtgtgggt
gcgcggcctg 3780gacgagctgt acgccgagtg gtcggaggtc gtgtccacga acttccggga
cgcctccggg 3840ccggccatga ccgagatcgg cgagcagccg tgggggcggg agttcgccct
gcgcgacccg 3900gccggcaact gcgtgcactt cgtggccgag gagcaggact gacacgtgct
acgagatttc 3960gattccaccg ccgccttcta tgaaaggttg ggcttcggaa tcgttttccg
ggacgccggc 4020tggatgatcc tccagcgcgg ggatctcatg ctggagttct tcgcccaccc
caacttgttt 4080attgcagctt ataatggtta caaataaagc aatagcatca caaatttcac
aaataaagca 4140tttttttcac tgcattctag ttgtggtttg tccaaactca tcaatgtatc
ttatcatgtc 4200tgtataccgt cgacctctag ctagagcttg gcgtaatcat ggtcatagct
gtttcctgtg 4260tgaaattgtt atccgctcac aattccacac aacatacgag ccggaagcat
aaagtgtaaa 4320gcctggggtg cctaatgagt gagctaactc acattaattg cgttgcgctc
actgcccgct 4380ttccagtcgg gaaacctgtc gtgccagctg cattaatgaa tcggccaacg
cgcggggaga 4440ggcggtttgc gtattgggcg ctcttccgct tcctcgctca ctgactcgct
gcgctcggtc 4500gttcggctgc ggcgagcggt atcagctcac tcaaaggcgg taatacggtt
atccacagaa 4560tcaggggata acgcaggaaa gaacatgtga gcaaaaggcc agcaaaaggc
caggaaccgt 4620aaaaaggccg cgttgctggc gtttttccat aggctccgcc cccctgacga
gcatcacaaa 4680aatcgacgct caagtcagag gtggcgaaac ccgacaggac tataaagata
ccaggcgttt 4740ccccctggaa gctccctcgt gcgctctcct gttccgaccc tgccgcttac
cggatacctg 4800tccgcctttc tcccttcggg aagcgtggcg ctttctcaat gctcacgctg
taggtatctc 4860agttcggtgt aggtcgttcg ctccaagctg ggctgtgtgc acgaaccccc
cgttcagccc 4920gaccgctgcg ccttatccgg taactatcgt cttgagtcca acccggtaag
acacgactta 4980tcgccactgg cagcagccac tggtaacagg attagcagag cgaggtatgt
aggcggtgct 5040acagagttct tgaagtggtg gcctaactac ggctacacta gaaggacagt
atttggtatc 5100tgcgctctgc tgaagccagt taccttcgga aaaagagttg gtagctcttg
atccggcaaa 5160caaaccaccg ctggtagcgg tggttttttt gtttgcaagc agcagattac
gcgcagaaaa 5220aaaggatctc aagaagatcc tttgatcttt tctacggggt ctgacgctca
gtggaacgaa 5280aactcacgtt aagggatttt ggtcatgaga ttatcaaaaa ggatcttcac
ctagatcctt 5340ttaaattaaa aatgaagttt taaatcaatc taaagtatat atgagtaaac
ttggtctgac 5400agttaccaat gcttaatcag tgaggcacct atctcagcga tctgtctatt
tcgttcatcc 5460atagttgcct gactccccgt cgtgtagata actacgatac gggagggctt
accatctggc 5520cccagtgctg caatgatacc gcgagaccca cgctcaccgg ctccagattt
atcagcaata 5580aaccagccag ccggaagggc cgagcgcaga agtggtcctg caactttatc
cgcctccatc 5640cagtctatta attgttgccg ggaagctaga gtaagtagtt cgccagttaa
tagtttgcgc 5700aacgttgttg ccattgctac aggcatcgtg gtgtcacgct cgtcgtttgg
tatggcttca 5760ttcagctccg gttcccaacg atcaaggcga gttacatgat cccccatgtt
gtgcaaaaaa 5820gcggttagct ccttcggtcc tccgatcgtt gtcagaagta agttggccgc
agtgttatca 5880ctcatggtta tggcagcact gcataattct cttactgtca tgccatccgt
aagatgcttt 5940tctgtgactg gtgagtactc aaccaagtca ttctgagaat agtgtatgcg
gcgaccgagt 6000tgctcttgcc cggcgtcaat acgggataat accgcgccac atagcagaac
tttaaaagtg 6060ctcatcattg gaaaacgttc ttcggggcga aaactctcaa ggatcttacc
gctgttgaga 6120tccagttcga tgtaacccac tcgtgcaccc aactgatctt cagcatcttt
tactttcacc 6180agcgtttctg ggtgagcaaa aacaggaagg caaaatgccg caaaaaaggg
aataagggcg 6240acacggaaat gttgaatact catactcttc ctttttcaat attattgaag
catttatcag 6300ggttattgtc tcatgagcgg atacatattt gaatgtattt agaaaaataa
acaaataggg 6360gttccgcgca catttccccg aaaagtgcca cctgacgtcg acggatcggg
agatctcccg 6420atcccctatg gtgcactctc agtacaatct gctctgatgc cgcatagtta
agccagtatc 6480tgctccctgc ttgtgtgttg gaggtcgctg agtagtgcgc gagcaaaatt
taagctacaa 6540caaggcaagg cttgaccgac aattgcatga agaatctgct tagggttagg
cgttttgcgc 6600tgcttcgcga tgtacgggcc agatatacgc gttgacattg attattgact
agttattaat 6660agtaatcaat tacggggtca ttagttcata gcccatatat ggagttccgc
gttacataac 6720ttacggtaaa tggcccgcct ggctgaccgc ccaacgaccc ccgcccattg
acgtcaataa 6780tgacgtatgt tcccatagta acgccaatag ggactttcca ttgacgtcaa
tgggtggact 6840atttacggta aactgcccac ttggcagtac atcaagtgta tcatatgcca
agtacgcccc 6900ctattgacgt caatgacggt aaatggcccg cctggcatta tgcccagtac
atgaccttat 6960gggactttcc tacttggcag tacatctacg tattagtcat cgctattacc
atggtgatgc 7020ggttttggca gtacatcaat gggcgtggat agcggtttga ctcacgggga
tttccaagtc 7080tccaccccat tgacgtcaat gggagtttgt tttggcacca aaatcaacgg
gactttccaa 7140aatgtcgtaa caactccgcc ccattgacgc aaatgggcgg taggcgtgta
cggtgggagg 7200tctatataag cagagctctc tggctaacta gagaacccac tgcttactgg
cttatcgaaa 7260ttaatacgac tcactatagg gagacccaag ctggtttaaa cttaagcttg
gtaccgagct 7320cactagtcca gtgtggtggc agatatccag cacagtggcg gccgctcgag
tctagagggc 7380ccgttttgcc tgtactgggt ctctctggtt agaccagatc tgagcctggg
agctctctgg 7440ctaactaggg aacccactgc ttaagcctca ataaagcttg ccttgagtgc
ttcaagtagt 7500gtgtgcccgt ctgttgtgtg actctggtaa ctagagatcc ctcagaccct
tttagtcagt 7560gtggaaaatc tctagcagtg gcgcccgaac agggacttga aagcgaaagg
gaaaccagag 7620gagctctctc gacgcaggac tcggcttgct gaagcgcgca cggcaagagg
cgaggggcgg 7680cgactggtga gtacgccaaa aattttgact agcggaggct agaaggagag
agatgggtgc 7740gagagcgtca gtattaagcg ggggagaatt agatcgcgat gggaaaaaat
tcggttaagg 7800ccagggggaa agaaaaaata taaattaaaa catatagtat gggcaagcag
ggagctagaa 7860cgattcgcag ttaatcctgg cctgttagaa acatcagaag gctgtagaca
aatactggga 7920cagctacaac catcccttca gacaggatca gaagaactta gatcattata
taatacagta 7980gcaaccctct attgtgtgca tcaaaggata gagataaaag acaccaagga
agctttagac 8040aagatagagg aagagcaaaa caaaagtaag accaccgcac agcaagcggc
cgctgatctt 8100cagacctgga ggaggagata tgagggacaa ttggagaagt gaattatata
aatataaagt 8160agtaaaaatt gaaccattag gagtagcacc caccaaggca aagagaagag
tggtgcagag 8220agaaaaaaga gcagtgggaa taggagcttt gttccttggg ttcttgggag
cagcaggaag 8280cactatgggc gcagcgtcaa tgacgctgac ggtacaggcc agacaattat
tgtctggtat 8340agtgcagcag cagaacaatt tgctgagggc tattgaggcg caacagcatc
tgttgcaact 8400cacagtctgg ggcatcaagc agctccaggc aagaatcctg gctgtggaaa
gatacctaaa 8460ggatcaacag ctcctgggga tttggggttg ctctggaaaa ctcatttgca
ccactgctgt 8520gccttggaat gctagttgga gtaataaatc tctggaacag atttggaatc
acacgacctg 8580gatggagtgg gacagagaaa ttaacaatta cacaagctta atacactcct
taattgaaga 8640atcgcaaaac cagcaagaaa agaatgaaca agaattattg gaattagata
aatgggcaag 8700tttgtggaat tggtttaaca taacaaattg gctgtggtat ataaaattat
tcataatgat 8760agtaggaggc ttggtaggtt taagaatagt ttttgctgta ctttctatag
tgaatagagt 8820taggcaggga tattcaccat tatcgtttca gacccacctc ccaaccccga
ggggacccga 8880caggccctta attaagctac atcatcaata atatacctta ttttggattg
aagccaatat 8940gataatgagg gggtggagtt tgtgacgtgg cgcggggcgt gggaacgggg
cgggtgacgt 9000agtagtgtgg cggaagtgtg atgttgcaag tgtggcggaa cacatgtaag
cgacggatgt 9060ggcaaaagtg acgtttttgg tgtgcgccgg tgtacacagg aagtgacaat
tttcgcgcgg 9120ttttaggcgg atgttgtagt aaatttgggc gtaaccgagt aagatttggc
cattttcgcg 9180ggaaaactga ataagaggaa gtgaaatctg aataattttg tgttactcat
agcgcgtaat 9240atttgtctag ggagatccga gctttgcaaa gatggataaa gttttaaaca
gagaggaatc 9300tttgcagcta atggaccttc taggtcttga aaggagtggg aattggctcc
ggtgcccgtc 9360agtgggcaga gcgcacatcg cccacagtcc ccgagaagtt ggggggaggg
gtcggcaatt 9420gaaccggtgc ctagagaagg tggcgcgggg taaactggga aagtgatgtc
gtgtactggc 9480tccgcctttt tcccgagggt gggggagaac cgtatataag tgcagtagtc
gccgtgaacg 9540ttctttttcg caacgggttt gccgccagaa cacaggtaag tgccgtgtgt
ggttcccgcg 9600ggcctggcct ctttacgggt tatggccctt gcgtgccttg aattacttcc
acctggctgc 9660agtacgtgat tcttgatccc gagcttcggg ttggaagtgg gtgggagagt
tcgaggcctt 9720gcgcttaagg agccccttcg cctcgtgctt gagttgaggc ctggcctggg
cgctggggcc 9780gccgcgtgcg aatctggtgg caccttcgcg cctgtctcgc tgctttcgat
aagtctctag 9840ccatttaaaa tttttgatga cctgctgcga cgcttttttt ctggcaagat
agtcttgtaa 9900atgcgggcca agatctgcac actggtattt cggtttttgg ggccgcgggc
ggcgacgggg 9960cccgtgcgtc ccagcgcaca tgttcggcga ggcggggcct gcgagcgcgg
ccaccgagaa 10020tcggacgggg gtagtctcaa gctggccggc ctgctctggt gcctggcctc
gcgccgccgt 10080gtatcgcccc gccctgggcg gcaaggctgg cccggtcggc accagttgcg
tgagcggaaa 10140gatggccgct tcccggccct gctgcaggga gctcaaaatg gaggacgcgg
cgctcgggag 10200agcgggcggg tgagtcaccc acacaaagga aaagggcctt tccgtcctca
gccgtcgctt 10260catgtgactc cacggagtac cgggcgccgt ccaggcacct cgattagttc
tcgagctttt 10320ggagtacgtc gtctttaggt tggggggagg ggttttatgc gatggagttt
ccccacactg 10380agtgggtgga gactgaagtt aggccagctt ggcacttgat gtaattctcc
ttggaatttg 10440ccctttttga gtttggatct tggttcattc tcaagcctca gacagtggtt
caaagttttt 10500ttcttccatt tcaggtgtcg tgaggaattc gctactagct cgagaagaat
tcaaggcgcg 10560ccgccaccat gctttttagc ttttttcgaa atttgtgccg tgttttgtat
cgcgttcgcg 10620ttacgggtga cacccaggca ctgaagggcg agcgcgttct aattacgcct
aatcacgtct 10680cttttattga tggcattttg cttggactgt ttttacctgt gcgtccagtg
tttgccgttt 10740acacctcaat aagccaacag tggtatatgc gttggctgaa atcatttatc
gactttgttc 10800ctctcgaccc gacgcaacct atggctatta aacatctggt acgtctggtg
gaacagggcc 10860gaccagtggt gattttccct gaaggacgca tcaccacgac aggctcgctg
atgaaaatct 10920acgatggcgc gggttttgtc gcggcgaagt ctggtgcaac ggttattcct
gtgcgtattg 10980aaggggcgga acttacgcac ttcagccgcc tgaaaggtct ggttaaacgt
cgcttgttcc 11040cgcaaattac tctgcatatt ttgccaccaa cgcaggtggc gatgccggat
gcgccgcgtg 11100cccgtgaccg tcgcaaaatc gctggcgaaa tgctgcatca aataatgatg
gaagcgcgaa 11160tggcggtgcg cccgcgtgaa acgctgtacg aatctttact gagtgcaatg
taccgcttcg 11220gagccgggaa gaaatgtgtc gaagacgtca actttacccc agactcctat
cgcaaattgc 11280ttacgaaaac gctgtttgtt ggacgcatcc ttgaaaaata cagtgttgaa
ggcgaacgca 11340tcggcttaat gctgcccaat gcaggcatca gtgcggcagt gatttttggg
gccatcgccc 11400gtcgccgcat gcccgcaatg atgaactaca ctgccggggt aaaagggctg
accagtgcta 11460ttacggcggc tgaaatcaaa accatcttca cttcccgcca gtttctcgat
aaaggcaaac 11520tctggcatct gccggagcaa cttactcagg tgcgctgggt ctatctggaa
gatttaaaag 11580cagatgtcac cactgccgac aaagtatgga tcttcgctca tttgctgatg
ccgcgtctgg 11640cacaggttaa acagcagccg gaagaagagg cgctgatcct ttttacctcc
ggttctgaag 11700gccatccgaa aggcgtcgtc catagccata aaagcattct ggcgaatgtc
gagcagatta 11760aaacgattgc cgacttcacc accaacgatc gctttatgtc ggcgttaccg
ctgtttcact 11820cctttgggct gacggtaggc ctgtttacgc cactgcttac aggtgcagaa
gtgttccttt 11880atccaagccc gctgcattac cgcattgtgc cggagttggt gtatgaccgc
agttgcaccg 11940tgttgttcgg cacctcgact ttcctcggtc actacgcgcg tttcgccaac
ccgtatgact 12000tctatcgtct acgctatgtg gtggcaggcg cagaaaaatt acaagaaagt
accaaacagc 12060tttggcagga taaatttggc ctgcgcatcc ttgaaggcta cggcgtgacc
gaatgcgcgc 12120ctgtcgtttc tatcaacgta ccgatggcgg cgaaacccgg tacggtaggg
cgtattctac 12180caggaatgga tgcgcgcctg ttgtcggtcc ctggtatcga agagggcgga
cgcctgcaac 12240tgaaagggcc gaacataatg aacggctatc tgcgggtgga gaagccaggt
gtactggaag 12300tgcccaccgc cgagaatgtt cgcggcgaaa tggagcgcgg ctggtatgac
actggcgata 12360ttgtgcgttt tgacgagcag ggctttgtgc agattcaggg ccgcgcaaaa
cgctttgcca 12420aaattgcagg cgaaatggtg tcgctggaaa tggtggaaca actggcactt
ggtgtttcgc 12480cagataaagt ccatgccact gcgattaaga gcgatgccag caaaggcgag
gcactggtgc 12540ttttcaccac agataacgaa ctgacgcgcg ataagttgca acagtatgcc
cgcgagcacg 12600gcgtgccgga gcttgctgta ccgcgcgata ttcgctatct gaaacagatg
ccattacttg 12660gcagcggcaa acctgacttt gtcacgttga aaagctgggt agacgaagcg
gaacaacacg 12720atgagtgagg 12730
SEQ ID NO 6 | Length: 10773 | Type: DNA | Organism: Artificial Sequence | Synthetic
cggccatcga
taaggatccg cccctctccc tccccccccc ctaacgttac tggccgaagc 60cgcttggaat
aaggccggtg tgcgtttgtc tatatgttat tttccaccat attgccgtct 120tttggcaatg
tgagggcccg gaaacctggc cctgtcttct tgacgagcat tcctaggggt 180ctttcccctc
tcgccaaagg aatgcaaggt ctgttgaatg tcgtgaagga agcagttcct 240ctggaagctt
cttgaagaca aacaacgtct gtagcgaccc tttgcaggca gcggaacccc 300ccacctggcg
acaggtgcct ctgcggccaa aagccacgtg tataagatac acctgcaaag 360gcggcacaac
cccagtgcca cgttgtgagt tggatagttg tggaaagagt caaatggctc 420tcctcaagcg
tattcaacaa ggggctgaag gatgcccaga aggtacccca ttgtatggga 480tctgatctgg
ggcctcggtg cacatgcttt acatgtgttt agtcgaggtt aaaaaaacgt 540ctaggccccc
cgaaccacgg ggacgtggtt ttcctttgaa aaacacgatg ataatatggc 600cacaaccatg
gcctcctccg aggacgtcat caaggagttc atgcgcttca aggtgcgcat 660ggagggctcc
gtgaacggcc acgagttcga gatcgagggc gagggcgagg gccgccccta 720cgagggcacc
cagaccgcca agctgaaggt gaccaagggc ggccccctgc ccttcgcctg 780ggacatcctg
tccccccagt tccagtacgg ctccaaggtg tacgtgaagc accccgccga 840catccccgac
tacaagaagc tgtccttccc cgagggcttc aagtgggagc gcgtgatgaa 900cttcgaggac
ggcggcgtgg tgaccgtgac ccaggactcc tccctgcagg acggctcctt 960catctacaag
gtgaagttca tcggcgtgaa cttcccctcc gacggccccg taatgcagaa 1020gaagactatg
ggctgggagg cctccaccga gcgcctgtac ccccgcgacg gcgtgctgaa 1080gggcgagatc
cacaaggccc tgaagctgaa ggacggcggc cactacctgg tggagttcaa 1140gtccatctac
atggccaaga agcccgtgca gctgcccggc tactactacg tggactccaa 1200gctggacatc
acctcccaca acgaggacta caccatcgtg gagcagtacg agcgcgccga 1260gggccgccac
cacctgttcc tgtaggcggc cgcaatcaac ctctggatta caaaatttgt 1320gaaagattga
ctggtattct taactatgtt gctcctttta cgctatgtgg atacgctgct 1380ttaatgcctt
tgtatcatgc tattacttcc cgtacggctt tcattttctc ctccttgtat 1440aaatcctggt
tgctgtctct ttatgaggag ttgtggcccg ttgtcaggca acgtggcgtg 1500gtgtgcactg
tgtttgctga cgcaaccccc actggttggg gcattgccac cacctatcaa 1560ctcctttccg
ggactttcgc tttccccctc cctattgcca cggcggaact cattgccgcc 1620tgccttgccc
gctgctggac aggggctcgg ctgttgggca ctgacaattc cgtggtgttg 1680tcggggaagc
tgacgtcctt tccatggctg ctcgcctgtg ttgccaactg gattctgcgc 1740gggacgtcct
tctgctacgt cccttcggcc ctcaatccag cggaccttcc ttcccgcggc 1800ctgctgccgg
ttctgcggcc tcttccgcgt cttcgccttc gccctcagac gagtcggatc 1860tccctttggg
ccgcctcccc gcctgcctgc aggtttgtcg agacctagaa aaacatggag 1920caatcacaag
tagcaataca gcagctacca atgctgattg tgcctggcta gaagcacaag 1980aggaggagga
ggtgggtttt ccagtcacac ctcaggtacc tttaagacca atgacttaca 2040aggcagctgt
agatcttagc cactttttaa aagaaaaggg gggactggaa gggctaattc 2100actcccaacg
aagacaagat ctgctttttg cttgtactgg gtctctctgg ttagaccaga 2160tctgagcctg
ggagctctct ggctaactag ggaacccact gcttaagcct caataaagct 2220tgccttgagt
gcttcaagta gtgtgtgccc gtctgttgtg tgactctggt aactagagat 2280ccctcagacc
cttttagtca gtgtggaaaa tctctagcag ggcccgttta aacccgctga 2340tcagcctcga
ctgtgccttc tagttgccag ccatctgttg tttgcccctc ccccgtgcct 2400tccttgaccc
tggaaggtgc cactcccact gtcctttcct aataaaatga ggaaattgca 2460tcgcattgtc
tgagtaggtg tcattctatt ctggggggtg gggtggggca ggacagcaag 2520ggggaggatt
gggaagacaa tagcaggcat gctggggatg cggtgggctc tatggcttct 2580gaggcggaaa
gaaccagctg gggctctagg gggtatcccc acgcgccctg tagcggcgca 2640ttaagcgcgg
cgggtgtggt ggttacgcgc agcgtgaccg ctacacttgc cagcgcccta 2700gcgcccgctc
ctttcgcttt cttcccttcc tttctcgcca cgttcgccgg ctttccccgt 2760caagctctaa
atcggggcat ccctttaggg ttccgattta gtgctttacg gcacctcgac 2820cccaaaaaac
ttgattaggg tgatggttca cgtagtgggc catcgccctg atagacggtt 2880tttcgccctt
tgacgttgga gtccacgttc tttaatagtg gactcttgtt ccaaactgga 2940acaacactca
accctatctc ggtctattct tttgatttat aagggatttt ggggatttcg 3000gcctattggt
taaaaaatga gctgatttaa caaaaattta acgcgaatta attctgtgga 3060atgtgtgtca
gttagggtgt ggaaagtccc caggctcccc aggcaggcag aagtatgcaa 3120agcatgcatc
tcaattagtc agcaaccagg tgtggaaagt ccccaggctc cccagcaggc 3180agaagtatgc
aaagcatgca tctcaattag tcagcaacca tagtcccgcc cctaactccg 3240cccatcccgc
ccctaactcc gcccagttcc gcccattctc cgccccatgg ctgactaatt 3300ttttttattt
atgcagaggc cgaggccgcc tctgcctctg agctattcca gaagtagtga 3360ggaggctttt
ttggaggcct aggcttttgc aaaaagctcc cgggagcttg tatatccatt 3420ttcggatctg
atcagcacgt gttgacaatt aatcatcggc atagtatatc ggcatagtat 3480aatacgacaa
ggtgaggaac taaaccatgg ccaagttgac cagtgccgtt ccggtgctca 3540ccgcgcgcga
cgtcgccgga gcggtcgagt tctggaccga ccggctcggg ttctcccggg 3600acttcgtgga
ggacgacttc gccggtgtgg tccgggacga cgtgaccctg ttcatcagcg 3660cggtccagga
ccaggtggtg ccggacaaca ccctggcctg ggtgtgggtg cgcggcctgg 3720acgagctgta
cgccgagtgg tcggaggtcg tgtccacgaa cttccgggac gcctccgggc 3780cggccatgac
cgagatcggc gagcagccgt gggggcggga gttcgccctg cgcgacccgg 3840ccggcaactg
cgtgcacttc gtggccgagg agcaggactg acacgtgcta cgagatttcg 3900attccaccgc
cgccttctat gaaaggttgg gcttcggaat cgttttccgg gacgccggct 3960ggatgatcct
ccagcgcggg gatctcatgc tggagttctt cgcccacccc aacttgttta 4020ttgcagctta
taatggttac aaataaagca atagcatcac aaatttcaca aataaagcat 4080ttttttcact
gcattctagt tgtggtttgt ccaaactcat caatgtatct tatcatgtct 4140gtataccgtc
gacctctagc tagagcttgg cgtaatcatg gtcatagctg tttcctgtgt 4200gaaattgtta
tccgctcaca attccacaca acatacgagc cggaagcata aagtgtaaag 4260cctggggtgc
ctaatgagtg agctaactca cattaattgc gttgcgctca ctgcccgctt 4320tccagtcggg
aaacctgtcg tgccagctgc attaatgaat cggccaacgc gcggggagag 4380gcggtttgcg
tattgggcgc tcttccgctt cctcgctcac tgactcgctg cgctcggtcg 4440ttcggctgcg
gcgagcggta tcagctcact caaaggcggt aatacggtta tccacagaat 4500caggggataa
cgcaggaaag aacatgtgag caaaaggcca gcaaaaggcc aggaaccgta 4560aaaaggccgc
gttgctggcg tttttccata ggctccgccc ccctgacgag catcacaaaa 4620atcgacgctc
aagtcagagg tggcgaaacc cgacaggact ataaagatac caggcgtttc 4680cccctggaag
ctccctcgtg cgctctcctg ttccgaccct gccgcttacc ggatacctgt 4740ccgcctttct
cccttcggga agcgtggcgc tttctcaatg ctcacgctgt aggtatctca 4800gttcggtgta
ggtcgttcgc tccaagctgg gctgtgtgca cgaacccccc gttcagcccg 4860accgctgcgc
cttatccggt aactatcgtc ttgagtccaa cccggtaaga cacgacttat 4920cgccactggc
agcagccact ggtaacagga ttagcagagc gaggtatgta ggcggtgcta 4980cagagttctt
gaagtggtgg cctaactacg gctacactag aaggacagta tttggtatct 5040gcgctctgct
gaagccagtt accttcggaa aaagagttgg tagctcttga tccggcaaac 5100aaaccaccgc
tggtagcggt ggtttttttg tttgcaagca gcagattacg cgcagaaaaa 5160aaggatctca
agaagatcct ttgatctttt ctacggggtc tgacgctcag tggaacgaaa 5220actcacgtta
agggattttg gtcatgagat tatcaaaaag gatcttcacc tagatccttt 5280taaattaaaa
atgaagtttt aaatcaatct aaagtatata tgagtaaact tggtctgaca 5340gttaccaatg
cttaatcagt gaggcaccta tctcagcgat ctgtctattt cgttcatcca 5400tagttgcctg
actccccgtc gtgtagataa ctacgatacg ggagggctta ccatctggcc 5460ccagtgctgc
aatgataccg cgagacccac gctcaccggc tccagattta tcagcaataa 5520accagccagc
cggaagggcc gagcgcagaa gtggtcctgc aactttatcc gcctccatcc 5580agtctattaa
ttgttgccgg gaagctagag taagtagttc gccagttaat agtttgcgca 5640acgttgttgc
cattgctaca ggcatcgtgg tgtcacgctc gtcgtttggt atggcttcat 5700tcagctccgg
ttcccaacga tcaaggcgag ttacatgatc ccccatgttg tgcaaaaaag 5760cggttagctc
cttcggtcct ccgatcgttg tcagaagtaa gttggccgca gtgttatcac 5820tcatggttat
ggcagcactg cataattctc ttactgtcat gccatccgta agatgctttt 5880ctgtgactgg
tgagtactca accaagtcat tctgagaata gtgtatgcgg cgaccgagtt 5940gctcttgccc
ggcgtcaata cgggataata ccgcgccaca tagcagaact ttaaaagtgc 6000tcatcattgg
aaaacgttct tcggggcgaa aactctcaag gatcttaccg ctgttgagat 6060ccagttcgat
gtaacccact cgtgcaccca actgatcttc agcatctttt actttcacca 6120gcgtttctgg
gtgagcaaaa acaggaaggc aaaatgccgc aaaaaaggga ataagggcga 6180cacggaaatg
ttgaatactc atactcttcc tttttcaata ttattgaagc atttatcagg 6240gttattgtct
catgagcgga tacatatttg aatgtattta gaaaaataaa caaatagggg 6300ttccgcgcac
atttccccga aaagtgccac ctgacgtcga cggatcggga gatctcccga 6360tcccctatgg
tgcactctca gtacaatctg ctctgatgcc gcatagttaa gccagtatct 6420gctccctgct
tgtgtgttgg aggtcgctga gtagtgcgcg agcaaaattt aagctacaac 6480aaggcaaggc
ttgaccgaca attgcatgaa gaatctgctt agggttaggc gttttgcgct 6540gcttcgcgat
gtacgggcca gatatacgcg ttgacattga ttattgacta gttattaata 6600gtaatcaatt
acggggtcat tagttcatag cccatatatg gagttccgcg ttacataact 6660tacggtaaat
ggcccgcctg gctgaccgcc caacgacccc cgcccattga cgtcaataat 6720gacgtatgtt
cccatagtaa cgccaatagg gactttccat tgacgtcaat gggtggacta 6780tttacggtaa
actgcccact tggcagtaca tcaagtgtat catatgccaa gtacgccccc 6840tattgacgtc
aatgacggta aatggcccgc ctggcattat gcccagtaca tgaccttatg 6900ggactttcct
acttggcagt acatctacgt attagtcatc gctattacca tggtgatgcg 6960gttttggcag
tacatcaatg ggcgtggata gcggtttgac tcacggggat ttccaagtct 7020ccaccccatt
gacgtcaatg ggagtttgtt ttggcaccaa aatcaacggg actttccaaa 7080atgtcgtaac
aactccgccc cattgacgca aatgggcggt aggcgtgtac ggtgggaggt 7140ctatataagc
agagctctct ggctaactag agaacccact gcttactggc ttatcgaaat 7200taatacgact
cactataggg agacccaagc tggtttaaac ttaagcttgg taccgagctc 7260actagtccag
tgtggtggca gatatccagc acagtggcgg ccgctcgagt ctagagggcc 7320cgttttgcct
gtactgggtc tctctggtta gaccagatct gagcctggga gctctctggc 7380taactaggga
acccactgct taagcctcaa taaagcttgc cttgagtgct tcaagtagtg 7440tgtgcccgtc
tgttgtgtga ctctggtaac tagagatccc tcagaccctt ttagtcagtg 7500tggaaaatct
ctagcagtgg cgcccgaaca gggacttgaa agcgaaaggg aaaccagagg 7560agctctctcg
acgcaggact cggcttgctg aagcgcgcac ggcaagaggc gaggggcggc 7620gactggtgag
tacgccaaaa attttgacta gcggaggcta gaaggagaga gatgggtgcg 7680agagcgtcag
tattaagcgg gggagaatta gatcgcgatg ggaaaaaatt cggttaaggc 7740cagggggaaa
gaaaaaatat aaattaaaac atatagtatg ggcaagcagg gagctagaac 7800gattcgcagt
taatcctggc ctgttagaaa catcagaagg ctgtagacaa atactgggac 7860agctacaacc
atcccttcag acaggatcag aagaacttag atcattatat aatacagtag 7920caaccctcta
ttgtgtgcat caaaggatag agataaaaga caccaaggaa gctttagaca 7980agatagagga
agagcaaaac aaaagtaaga ccaccgcaca gcaagcggcc gctgatcttc 8040agacctggag
gaggagatat gagggacaat tggagaagtg aattatataa atataaagta 8100gtaaaaattg
aaccattagg agtagcaccc accaaggcaa agagaagagt ggtgcagaga 8160gaaaaaagag
cagtgggaat aggagctttg ttccttgggt tcttgggagc agcaggaagc 8220actatgggcg
cagcgtcaat gacgctgacg gtacaggcca gacaattatt gtctggtata 8280gtgcagcagc
agaacaattt gctgagggct attgaggcgc aacagcatct gttgcaactc 8340acagtctggg
gcatcaagca gctccaggca agaatcctgg ctgtggaaag atacctaaag 8400gatcaacagc
tcctggggat ttggggttgc tctggaaaac tcatttgcac cactgctgtg 8460ccttggaatg
ctagttggag taataaatct ctggaacaga tttggaatca cacgacctgg 8520atggagtggg
acagagaaat taacaattac acaagcttaa tacactcctt aattgaagaa 8580tcgcaaaacc
agcaagaaaa gaatgaacaa gaattattgg aattagataa atgggcaagt 8640ttgtggaatt
ggtttaacat aacaaattgg ctgtggtata taaaattatt cataatgata 8700gtaggaggct
tggtaggttt aagaatagtt tttgctgtac tttctatagt gaatagagtt 8760aggcagggat
attcaccatt atcgtttcag acccacctcc caaccccgag gggacccgac 8820aggcccttaa
ttaagccacc tatcctcttc agacctcttc aggaaacagc tatgcacata 8880gcacacaggc
atatgttcaa ccaaaacact gaaacacata aaagaaatgt ttaaagaatg 8940aatttaaaaa
aataaaaaat aaactcaact acatatgaag ccttagcaaa catgtctgga 9000cctctagaca
cacagactct gacacgccaa cgtctgagtt ctagtttcga tacgcactgg 9060gaagttttaa
aagttttcca tcaactctaa tgtgtagaga aatggaaact atcatagact 9120ctacggcatt
gagggtgaag gtatgagtga agcactctta gggtcagaag tatgtcagtg 9180cccatttgtt
gctgttagca tcatcatctt agggcttgag aggatgttgc agctgaccca 9240tgcacctgtg
acatacatat ggaattattc tttggcacat aaaattagaa tgggagctgg 9300ctcatcaggt
tttgtgctgt aagttttcta tgttaaacca gatgcgatac actaaataaa 9360ataaaatata
cttgaccgat ggttttgagc gaaataataa ctggataatc aagaaatata 9420tccactaatg
aatagcctga actactgaaa caatttgttc agtgcctagc atatggtgtg 9480cattttatta
tttctttcaa aaagaatgta tttggagtta catagtaagt ctgctacctt 9540ttctttatgg
ctatatctat gtcttatgtt gagatgaatg aattattctt caggggaaat 9600aatctatttg
aacagtttag atggtgaaga acatttgcag catttgcaag atttttttcc 9660actctgaagt
ggtctttgtc cttgaacata ggatacaagt gacccctgct ctgttaatta 9720ttggcaaatt
gcctaacttc aacgtaagga aatagagtca tatgtttgct cactgaaggt 9780tactagttaa
caggcatccc ttaaacagga tataaaagga cttcagcagg actgctcgaa 9840acatcccact
tccagcactg cctgcggtga aggaaccagc agccgaattc ggccattacg 9900gcctgccacc
atgaacagtg aggagcagta ctacgcggcc acacagctct acaaggaccc 9960gtgcgcattc
cagaggggcc cggtgccaga gttcagcgct aacccccctg cgtgcctgta 10020catgggccgc
cagcccccac ctccgccgcc accccagttt acaagctcgc tgggatcact 10080ggagcaggga
agtcctccgg acatctcccc atacgaagtg cccccgctcg cctccgacga 10140cccggctggc
gctcacctcc accaccacct tccagctcag ctcgggctcg cccatccacc 10200tcccggacct
ttcccgaatg gaaccgagcc tgggggcctg gaagagccca accgcgtcca 10260gctccctttc
ccgtggatga aatccaccaa agctcacgcg tggaaaggcc agtgggcagg 10320aggtgcttac
acagcggaac ccgaggaaaa caagaggacc cgtactgcct acacccgggc 10380gcagctgctg
gagctggaga aggaattctt atttaacaaa tacatctccc ggccccgccg 10440ggtggagctg
gcagtgatgt tgaacttgac cgagagacac atcaaaatct ggttccaaaa 10500ccgtcgcatg
aagtggaaaa aagaggaaga taagaaacgt agtagcggga ccccgagtgg 10560gggcggtggg
ggcgaagagc cggagcaaga ttgtgcggtg acctcgggcg aggagctgct 10620ggcagtgcca
ccgctgccac ctcccggagg tgccgtgccc ccaggcgtcc cagctgcagt 10680ccgggagggc
ctactgcctt cgggccttag cgtgtcgcca cagccctcca gcatcgcgcc 10740actgcgaccg
caggaacccc ggtgaggccg cct 10773

SEQ ID NO: 7; Length: 10562; Type: DNA; Organism: Artificial Sequence; Feature: Synthetic
cggccatcga taaggatccg
cccctctccc tccccccccc ctaacgttac tggccgaagc 60cgcttggaat aaggccggtg
tgcgtttgtc tatatgttat tttccaccat attgccgtct 120tttggcaatg tgagggcccg
gaaacctggc cctgtcttct tgacgagcat tcctaggggt 180ctttcccctc tcgccaaagg
aatgcaaggt ctgttgaatg tcgtgaagga agcagttcct 240ctggaagctt cttgaagaca
aacaacgtct gtagcgaccc tttgcaggca gcggaacccc 300ccacctggcg acaggtgcct
ctgcggccaa aagccacgtg tataagatac acctgcaaag 360gcggcacaac cccagtgcca
cgttgtgagt tggatagttg tggaaagagt caaatggctc 420tcctcaagcg tattcaacaa
ggggctgaag gatgcccaga aggtacccca ttgtatggga 480tctgatctgg ggcctcggtg
cacatgcttt acatgtgttt agtcgaggtt aaaaaaacgt 540ctaggccccc cgaaccacgg
ggacgtggtt ttcctttgaa aaacacgatg ataatatggc 600cacaaccatg gcctcctccg
aggacgtcat caaggagttc atgcgcttca aggtgcgcat 660ggagggctcc gtgaacggcc
acgagttcga gatcgagggc gagggcgagg gccgccccta 720cgagggcacc cagaccgcca
agctgaaggt gaccaagggc ggccccctgc ccttcgcctg 780ggacatcctg tccccccagt
tccagtacgg ctccaaggtg tacgtgaagc accccgccga 840catccccgac tacaagaagc
tgtccttccc cgagggcttc aagtgggagc gcgtgatgaa 900cttcgaggac ggcggcgtgg
tgaccgtgac ccaggactcc tccctgcagg acggctcctt 960catctacaag gtgaagttca
tcggcgtgaa cttcccctcc gacggccccg taatgcagaa 1020gaagactatg ggctgggagg
cctccaccga gcgcctgtac ccccgcgacg gcgtgctgaa 1080gggcgagatc cacaaggccc
tgaagctgaa ggacggcggc cactacctgg tggagttcaa 1140gtccatctac atggccaaga
agcccgtgca gctgcccggc tactactacg tggactccaa 1200gctggacatc acctcccaca
acgaggacta caccatcgtg gagcagtacg agcgcgccga 1260gggccgccac cacctgttcc
tgtaggcggc cgcaatcaac ctctggatta caaaatttgt 1320gaaagattga ctggtattct
taactatgtt gctcctttta cgctatgtgg atacgctgct 1380ttaatgcctt tgtatcatgc
tattacttcc cgtacggctt tcattttctc ctccttgtat 1440aaatcctggt tgctgtctct
ttatgaggag ttgtggcccg ttgtcaggca acgtggcgtg 1500gtgtgcactg tgtttgctga
cgcaaccccc actggttggg gcattgccac cacctatcaa 1560ctcctttccg ggactttcgc
tttccccctc cctattgcca cggcggaact cattgccgcc 1620tgccttgccc gctgctggac
aggggctcgg ctgttgggca ctgacaattc cgtggtgttg 1680tcggggaagc tgacgtcctt
tccatggctg ctcgcctgtg ttgccaactg gattctgcgc 1740gggacgtcct tctgctacgt
cccttcggcc ctcaatccag cggaccttcc ttcccgcggc 1800ctgctgccgg ttctgcggcc
tcttccgcgt cttcgccttc gccctcagac gagtcggatc 1860tccctttggg ccgcctcccc
gcctgcctgc aggtttgtcg agacctagaa aaacatggag 1920caatcacaag tagcaataca
gcagctacca atgctgattg tgcctggcta gaagcacaag 1980aggaggagga ggtgggtttt
ccagtcacac ctcaggtacc tttaagacca atgacttaca 2040aggcagctgt agatcttagc
cactttttaa aagaaaaggg gggactggaa gggctaattc 2100actcccaacg aagacaagat
ctgctttttg cttgtactgg gtctctctgg ttagaccaga 2160tctgagcctg ggagctctct
ggctaactag ggaacccact gcttaagcct caataaagct 2220tgccttgagt gcttcaagta
gtgtgtgccc gtctgttgtg tgactctggt aactagagat 2280ccctcagacc cttttagtca
gtgtggaaaa tctctagcag ggcccgttta aacccgctga 2340tcagcctcga ctgtgccttc
tagttgccag ccatctgttg tttgcccctc ccccgtgcct 2400tccttgaccc tggaaggtgc
cactcccact gtcctttcct aataaaatga ggaaattgca 2460tcgcattgtc tgagtaggtg
tcattctatt ctggggggtg gggtggggca ggacagcaag 2520ggggaggatt gggaagacaa
tagcaggcat gctggggatg cggtgggctc tatggcttct 2580gaggcggaaa gaaccagctg
gggctctagg gggtatcccc acgcgccctg tagcggcgca 2640ttaagcgcgg cgggtgtggt
ggttacgcgc agcgtgaccg ctacacttgc cagcgcccta 2700gcgcccgctc ctttcgcttt
cttcccttcc tttctcgcca cgttcgccgg ctttccccgt 2760caagctctaa atcggggcat
ccctttaggg ttccgattta gtgctttacg gcacctcgac 2820cccaaaaaac ttgattaggg
tgatggttca cgtagtgggc catcgccctg atagacggtt 2880tttcgccctt tgacgttgga
gtccacgttc tttaatagtg gactcttgtt ccaaactgga 2940acaacactca accctatctc
ggtctattct tttgatttat aagggatttt ggggatttcg 3000gcctattggt taaaaaatga
gctgatttaa caaaaattta acgcgaatta attctgtgga 3060atgtgtgtca gttagggtgt
ggaaagtccc caggctcccc aggcaggcag aagtatgcaa 3120agcatgcatc tcaattagtc
agcaaccagg tgtggaaagt ccccaggctc cccagcaggc 3180agaagtatgc aaagcatgca
tctcaattag tcagcaacca tagtcccgcc cctaactccg 3240cccatcccgc ccctaactcc
gcccagttcc gcccattctc cgccccatgg ctgactaatt 3300ttttttattt atgcagaggc
cgaggccgcc tctgcctctg agctattcca gaagtagtga 3360ggaggctttt ttggaggcct
aggcttttgc aaaaagctcc cgggagcttg tatatccatt 3420ttcggatctg atcagcacgt
gttgacaatt aatcatcggc atagtatatc ggcatagtat 3480aatacgacaa ggtgaggaac
taaaccatgg ccaagttgac cagtgccgtt ccggtgctca 3540ccgcgcgcga cgtcgccgga
gcggtcgagt tctggaccga ccggctcggg ttctcccggg 3600acttcgtgga ggacgacttc
gccggtgtgg tccgggacga cgtgaccctg ttcatcagcg 3660cggtccagga ccaggtggtg
ccggacaaca ccctggcctg ggtgtgggtg cgcggcctgg 3720acgagctgta cgccgagtgg
tcggaggtcg tgtccacgaa cttccgggac gcctccgggc 3780cggccatgac cgagatcggc
gagcagccgt gggggcggga gttcgccctg cgcgacccgg 3840ccggcaactg cgtgcacttc
gtggccgagg agcaggactg acacgtgcta cgagatttcg 3900attccaccgc cgccttctat
gaaaggttgg gcttcggaat cgttttccgg gacgccggct 3960ggatgatcct ccagcgcggg
gatctcatgc tggagttctt cgcccacccc aacttgttta 4020ttgcagctta taatggttac
aaataaagca atagcatcac aaatttcaca aataaagcat 4080ttttttcact gcattctagt
tgtggtttgt ccaaactcat caatgtatct tatcatgtct 4140gtataccgtc gacctctagc
tagagcttgg cgtaatcatg gtcatagctg tttcctgtgt 4200gaaattgtta tccgctcaca
attccacaca acatacgagc cggaagcata aagtgtaaag 4260cctggggtgc ctaatgagtg
agctaactca cattaattgc gttgcgctca ctgcccgctt 4320tccagtcggg aaacctgtcg
tgccagctgc attaatgaat cggccaacgc gcggggagag 4380gcggtttgcg tattgggcgc
tcttccgctt cctcgctcac tgactcgctg cgctcggtcg 4440ttcggctgcg gcgagcggta
tcagctcact caaaggcggt aatacggtta tccacagaat 4500caggggataa cgcaggaaag
aacatgtgag caaaaggcca gcaaaaggcc aggaaccgta 4560aaaaggccgc gttgctggcg
tttttccata ggctccgccc ccctgacgag catcacaaaa 4620atcgacgctc aagtcagagg
tggcgaaacc cgacaggact ataaagatac caggcgtttc 4680cccctggaag ctccctcgtg
cgctctcctg ttccgaccct gccgcttacc ggatacctgt 4740ccgcctttct cccttcggga
agcgtggcgc tttctcaatg ctcacgctgt aggtatctca 4800gttcggtgta ggtcgttcgc
tccaagctgg gctgtgtgca cgaacccccc gttcagcccg 4860accgctgcgc cttatccggt
aactatcgtc ttgagtccaa cccggtaaga cacgacttat 4920cgccactggc agcagccact
ggtaacagga ttagcagagc gaggtatgta ggcggtgcta 4980cagagttctt gaagtggtgg
cctaactacg gctacactag aaggacagta tttggtatct 5040gcgctctgct gaagccagtt
accttcggaa aaagagttgg tagctcttga tccggcaaac 5100aaaccaccgc tggtagcggt
ggtttttttg tttgcaagca gcagattacg cgcagaaaaa 5160aaggatctca agaagatcct
ttgatctttt ctacggggtc tgacgctcag tggaacgaaa 5220actcacgtta agggattttg
gtcatgagat tatcaaaaag gatcttcacc tagatccttt 5280taaattaaaa atgaagtttt
aaatcaatct aaagtatata tgagtaaact tggtctgaca 5340gttaccaatg cttaatcagt
gaggcaccta tctcagcgat ctgtctattt cgttcatcca 5400tagttgcctg actccccgtc
gtgtagataa ctacgatacg ggagggctta ccatctggcc 5460ccagtgctgc aatgataccg
cgagacccac gctcaccggc tccagattta tcagcaataa 5520accagccagc cggaagggcc
gagcgcagaa gtggtcctgc aactttatcc gcctccatcc 5580agtctattaa ttgttgccgg
gaagctagag taagtagttc gccagttaat agtttgcgca 5640acgttgttgc cattgctaca
ggcatcgtgg tgtcacgctc gtcgtttggt atggcttcat 5700tcagctccgg ttcccaacga
tcaaggcgag ttacatgatc ccccatgttg tgcaaaaaag 5760cggttagctc cttcggtcct
ccgatcgttg tcagaagtaa gttggccgca gtgttatcac 5820tcatggttat ggcagcactg
cataattctc ttactgtcat gccatccgta agatgctttt 5880ctgtgactgg tgagtactca
accaagtcat tctgagaata gtgtatgcgg cgaccgagtt 5940gctcttgccc ggcgtcaata
cgggataata ccgcgccaca tagcagaact ttaaaagtgc 6000tcatcattgg aaaacgttct
tcggggcgaa aactctcaag gatcttaccg ctgttgagat 6060ccagttcgat gtaacccact
cgtgcaccca actgatcttc agcatctttt actttcacca 6120gcgtttctgg gtgagcaaaa
acaggaaggc aaaatgccgc aaaaaaggga ataagggcga 6180cacggaaatg ttgaatactc
atactcttcc tttttcaata ttattgaagc atttatcagg 6240gttattgtct catgagcgga
tacatatttg aatgtattta gaaaaataaa caaatagggg 6300ttccgcgcac atttccccga
aaagtgccac ctgacgtcga cggatcggga gatctcccga 6360tcccctatgg tgcactctca
gtacaatctg ctctgatgcc gcatagttaa gccagtatct 6420gctccctgct tgtgtgttgg
aggtcgctga gtagtgcgcg agcaaaattt aagctacaac 6480aaggcaaggc ttgaccgaca
attgcatgaa gaatctgctt agggttaggc gttttgcgct 6540gcttcgcgat gtacgggcca
gatatacgcg ttgacattga ttattgacta gttattaata 6600gtaatcaatt acggggtcat
tagttcatag cccatatatg gagttccgcg ttacataact 6660tacggtaaat ggcccgcctg
gctgaccgcc caacgacccc cgcccattga cgtcaataat 6720gacgtatgtt cccatagtaa
cgccaatagg gactttccat tgacgtcaat gggtggacta 6780tttacggtaa actgcccact
tggcagtaca tcaagtgtat catatgccaa gtacgccccc 6840tattgacgtc aatgacggta
aatggcccgc ctggcattat gcccagtaca tgaccttatg 6900ggactttcct acttggcagt
acatctacgt attagtcatc gctattacca tggtgatgcg 6960gttttggcag tacatcaatg
ggcgtggata gcggtttgac tcacggggat ttccaagtct 7020ccaccccatt gacgtcaatg
ggagtttgtt ttggcaccaa aatcaacggg actttccaaa 7080atgtcgtaac aactccgccc
cattgacgca aatgggcggt aggcgtgtac ggtgggaggt 7140ctatataagc agagctctct
ggctaactag agaacccact gcttactggc ttatcgaaat 7200taatacgact cactataggg
agacccaagc tggtttaaac ttaagcttgg taccgagctc 7260actagtccag tgtggtggca
gatatccagc acagtggcgg ccgctcgagt ctagagggcc 7320cgttttgcct gtactgggtc
tctctggtta gaccagatct gagcctggga gctctctggc 7380taactaggga acccactgct
taagcctcaa taaagcttgc cttgagtgct tcaagtagtg 7440tgtgcccgtc tgttgtgtga
ctctggtaac tagagatccc tcagaccctt ttagtcagtg 7500tggaaaatct ctagcagtgg
cgcccgaaca gggacttgaa agcgaaaggg aaaccagagg 7560agctctctcg acgcaggact
cggcttgctg aagcgcgcac ggcaagaggc gaggggcggc 7620gactggtgag tacgccaaaa
attttgacta gcggaggcta gaaggagaga gatgggtgcg 7680agagcgtcag tattaagcgg
gggagaatta gatcgcgatg ggaaaaaatt cggttaaggc 7740cagggggaaa gaaaaaatat
aaattaaaac atatagtatg ggcaagcagg gagctagaac 7800gattcgcagt taatcctggc
ctgttagaaa catcagaagg ctgtagacaa atactgggac 7860agctacaacc atcccttcag
acaggatcag aagaacttag atcattatat aatacagtag 7920caaccctcta ttgtgtgcat
caaaggatag agataaaaga caccaaggaa gctttagaca 7980agatagagga agagcaaaac
aaaagtaaga ccaccgcaca gcaagcggcc gctgatcttc 8040agacctggag gaggagatat
gagggacaat tggagaagtg aattatataa atataaagta 8100gtaaaaattg aaccattagg
agtagcaccc accaaggcaa agagaagagt ggtgcagaga 8160gaaaaaagag cagtgggaat
aggagctttg ttccttgggt tcttgggagc agcaggaagc 8220actatgggcg cagcgtcaat
gacgctgacg gtacaggcca gacaattatt gtctggtata 8280gtgcagcagc agaacaattt
gctgagggct attgaggcgc aacagcatct gttgcaactc 8340acagtctggg gcatcaagca
gctccaggca agaatcctgg ctgtggaaag atacctaaag 8400gatcaacagc tcctggggat
ttggggttgc tctggaaaac tcatttgcac cactgctgtg 8460ccttggaatg ctagttggag
taataaatct ctggaacaga tttggaatca cacgacctgg 8520atggagtggg acagagaaat
taacaattac acaagcttaa tacactcctt aattgaagaa 8580tcgcaaaacc agcaagaaaa
gaatgaacaa gaattattgg aattagataa atgggcaagt 8640ttgtggaatt ggtttaacat
aacaaattgg ctgtggtata taaaattatt cataatgata 8700gtaggaggct tggtaggttt
aagaatagtt tttgctgtac tttctatagt gaatagagtt 8760aggcagggat attcaccatt
atcgtttcag acccacctcc caaccccgag gggacccgac 8820aggcccttaa ttaagccacc
tatcctcttc agacctcttc aggaaacagc tatgcacata 8880gcacacaggc atatgttcaa
ccaaaacact gaaacacata aaagaaatgt ttaaagaatg 8940aatttaaaaa aataaaaaat
aaactcaact acatatgaag ccttagcaaa catgtctgga 9000cctctagaca cacagactct
gacacgccaa cgtctgagtt ctagtttcga tacgcactgg 9060gaagttttaa aagttttcca
tcaactctaa tgtgtagaga aatggaaact atcatagact 9120ctacggcatt gagggtgaag
gtatgagtga agcactctta gggtcagaag tatgtcagtg 9180cccatttgtt gctgttagca
tcatcatctt agggcttgag aggatgttgc agctgaccca 9240tgcacctgtg acatacatat
ggaattattc tttggcacat aaaattagaa tgggagctgg 9300ctcatcaggt tttgtgctgt
aagttttcta tgttaaacca gatgcgatac actaaataaa 9360ataaaatata cttgaccgat
ggttttgagc gaaataataa ctggataatc aagaaatata 9420tccactaatg aatagcctga
actactgaaa caatttgttc agtgcctagc atatggtgtg 9480cattttatta tttctttcaa
aaagaatgta tttggagtta catagtaagt ctgctacctt 9540ttctttatgg ctatatctat
gtcttatgtt gagatgaatg aattattctt caggggaaat 9600aatctatttg aacagtttag
atggtgaaga acatttgcag catttgcaag atttttttcc 9660actctgaagt ggtctttgtc
cttgaacata ggatacaagt gacccctgct ctgttaatta 9720ttggcaaatt gcctaacttc
aacgtaagga aatagagtca tatgtttgct cactgaaggt 9780tactagttaa caggcatccc
ttaaacagga tataaaagga cttcagcagg actgctcgaa 9840acatcccact tccagcactg
cctgcggtga aggaaccagc agccgaattc ggccattacg 9900gccaccacca tgacgcctca
accctcgggt gcgcccactg tccaagtgac ccgtgagacg 9960gagcggtcct tccccagagc
ctcggaagac gaagtgacct gccccacgtc cgccccgccc 10020agccccactc gcacacgggg
gaactgcgca gaggcggaag agggaggctg ccgaggggcc 10080ccgaggaagc tccgggcacg
gcgcggggga cgcagccggc ctaagagcga gttggcactg 10140agcaagcagc gacggagtcg
gcgaaagaag gccaacgacc gcgagcgcaa tcgaatgcac 10200aacctcaact cggcactgga
cgccctgcgc ggtgtcctgc ccaccttccc agacgacgcg 10260aagctcacca agatcgagac
gctgcgcttc gcccacaact acatctgggc gctgactcaa 10320acgctgcgca tagcggacca
cagcttgtac gcgctggagc cgccggcgcc gcactgcggg 10380gagctgggca gcccaggcgg
ttcccccggg gactgggggt ccctctactc cccagtctcc 10440caggctggca gcctgagtcc
cgccgcgtcg ctggaggagc gacccgggct gctgggggcc 10500acctcttccg cctgcttgag
cccaggcagt ctggctttct cagattttct gtgaggccgc 10560ct 10562

SEQ ID NO: 8; Length: 10926; Type: DNA; Organism: Artificial Sequence; Feature: Synthetic
cggccatcga taaggatccg cccctctccc tccccccccc ctaacgttac
tggccgaagc 60cgcttggaat aaggccggtg tgcgtttgtc tatatgttat tttccaccat
attgccgtct 120tttggcaatg tgagggcccg gaaacctggc cctgtcttct tgacgagcat
tcctaggggt 180ctttcccctc tcgccaaagg aatgcaaggt ctgttgaatg tcgtgaagga
agcagttcct 240ctggaagctt cttgaagaca aacaacgtct gtagcgaccc tttgcaggca
gcggaacccc 300ccacctggcg acaggtgcct ctgcggccaa aagccacgtg tataagatac
acctgcaaag 360gcggcacaac cccagtgcca cgttgtgagt tggatagttg tggaaagagt
caaatggctc 420tcctcaagcg tattcaacaa ggggctgaag gatgcccaga aggtacccca
ttgtatggga 480tctgatctgg ggcctcggtg cacatgcttt acatgtgttt agtcgaggtt
aaaaaaacgt 540ctaggccccc cgaaccacgg ggacgtggtt ttcctttgaa aaacacgatg
ataatatggc 600cacaaccatg gcctcctccg aggacgtcat caaggagttc atgcgcttca
aggtgcgcat 660ggagggctcc gtgaacggcc acgagttcga gatcgagggc gagggcgagg
gccgccccta 720cgagggcacc cagaccgcca agctgaaggt gaccaagggc ggccccctgc
ccttcgcctg 780ggacatcctg tccccccagt tccagtacgg ctccaaggtg tacgtgaagc
accccgccga 840catccccgac tacaagaagc tgtccttccc cgagggcttc aagtgggagc
gcgtgatgaa 900cttcgaggac ggcggcgtgg tgaccgtgac ccaggactcc tccctgcagg
acggctcctt 960catctacaag gtgaagttca tcggcgtgaa cttcccctcc gacggccccg
taatgcagaa 1020gaagactatg ggctgggagg cctccaccga gcgcctgtac ccccgcgacg
gcgtgctgaa 1080gggcgagatc cacaaggccc tgaagctgaa ggacggcggc cactacctgg
tggagttcaa 1140gtccatctac atggccaaga agcccgtgca gctgcccggc tactactacg
tggactccaa 1200gctggacatc acctcccaca acgaggacta caccatcgtg gagcagtacg
agcgcgccga 1260gggccgccac cacctgttcc tgtaggcggc cgcaatcaac ctctggatta
caaaatttgt 1320gaaagattga ctggtattct taactatgtt gctcctttta cgctatgtgg
atacgctgct 1380ttaatgcctt tgtatcatgc tattacttcc cgtacggctt tcattttctc
ctccttgtat 1440aaatcctggt tgctgtctct ttatgaggag ttgtggcccg ttgtcaggca
acgtggcgtg 1500gtgtgcactg tgtttgctga cgcaaccccc actggttggg gcattgccac
cacctatcaa 1560ctcctttccg ggactttcgc tttccccctc cctattgcca cggcggaact
cattgccgcc 1620tgccttgccc gctgctggac aggggctcgg ctgttgggca ctgacaattc
cgtggtgttg 1680tcggggaagc tgacgtcctt tccatggctg ctcgcctgtg ttgccaactg
gattctgcgc 1740gggacgtcct tctgctacgt cccttcggcc ctcaatccag cggaccttcc
ttcccgcggc 1800ctgctgccgg ttctgcggcc tcttccgcgt cttcgccttc gccctcagac
gagtcggatc 1860tccctttggg ccgcctcccc gcctgcctgc aggtttgtcg agacctagaa
aaacatggag 1920caatcacaag tagcaataca gcagctacca atgctgattg tgcctggcta
gaagcacaag 1980aggaggagga ggtgggtttt ccagtcacac ctcaggtacc tttaagacca
atgacttaca 2040aggcagctgt agatcttagc cactttttaa aagaaaaggg gggactggaa
gggctaattc 2100actcccaacg aagacaagat ctgctttttg cttgtactgg gtctctctgg
ttagaccaga 2160tctgagcctg ggagctctct ggctaactag ggaacccact gcttaagcct
caataaagct 2220tgccttgagt gcttcaagta gtgtgtgccc gtctgttgtg tgactctggt
aactagagat 2280ccctcagacc cttttagtca gtgtggaaaa tctctagcag ggcccgttta
aacccgctga 2340tcagcctcga ctgtgccttc tagttgccag ccatctgttg tttgcccctc
ccccgtgcct 2400tccttgaccc tggaaggtgc cactcccact gtcctttcct aataaaatga
ggaaattgca 2460tcgcattgtc tgagtaggtg tcattctatt ctggggggtg gggtggggca
ggacagcaag 2520ggggaggatt gggaagacaa tagcaggcat gctggggatg cggtgggctc
tatggcttct 2580gaggcggaaa gaaccagctg gggctctagg gggtatcccc acgcgccctg
tagcggcgca 2640ttaagcgcgg cgggtgtggt ggttacgcgc agcgtgaccg ctacacttgc
cagcgcccta 2700gcgcccgctc ctttcgcttt cttcccttcc tttctcgcca cgttcgccgg
ctttccccgt 2760caagctctaa atcggggcat ccctttaggg ttccgattta gtgctttacg
gcacctcgac 2820cccaaaaaac ttgattaggg tgatggttca cgtagtgggc catcgccctg
atagacggtt 2880tttcgccctt tgacgttgga gtccacgttc tttaatagtg gactcttgtt
ccaaactgga 2940acaacactca accctatctc ggtctattct tttgatttat aagggatttt
ggggatttcg 3000gcctattggt taaaaaatga gctgatttaa caaaaattta acgcgaatta
attctgtgga 3060atgtgtgtca gttagggtgt ggaaagtccc caggctcccc aggcaggcag
aagtatgcaa 3120agcatgcatc tcaattagtc agcaaccagg tgtggaaagt ccccaggctc
cccagcaggc 3180agaagtatgc aaagcatgca tctcaattag tcagcaacca tagtcccgcc
cctaactccg 3240cccatcccgc ccctaactcc gcccagttcc gcccattctc cgccccatgg
ctgactaatt 3300ttttttattt atgcagaggc cgaggccgcc tctgcctctg agctattcca
gaagtagtga 3360ggaggctttt ttggaggcct aggcttttgc aaaaagctcc cgggagcttg
tatatccatt 3420ttcggatctg atcagcacgt gttgacaatt aatcatcggc atagtatatc
ggcatagtat 3480aatacgacaa ggtgaggaac taaaccatgg ccaagttgac cagtgccgtt
ccggtgctca 3540ccgcgcgcga cgtcgccgga gcggtcgagt tctggaccga ccggctcggg
ttctcccggg 3600acttcgtgga ggacgacttc gccggtgtgg tccgggacga cgtgaccctg
ttcatcagcg 3660cggtccagga ccaggtggtg ccggacaaca ccctggcctg ggtgtgggtg
cgcggcctgg 3720acgagctgta cgccgagtgg tcggaggtcg tgtccacgaa cttccgggac
gcctccgggc 3780cggccatgac cgagatcggc gagcagccgt gggggcggga gttcgccctg
cgcgacccgg 3840ccggcaactg cgtgcacttc gtggccgagg agcaggactg acacgtgcta
cgagatttcg 3900attccaccgc cgccttctat gaaaggttgg gcttcggaat cgttttccgg
gacgccggct 3960ggatgatcct ccagcgcggg gatctcatgc tggagttctt cgcccacccc
aacttgttta 4020ttgcagctta taatggttac aaataaagca atagcatcac aaatttcaca
aataaagcat 4080ttttttcact gcattctagt tgtggtttgt ccaaactcat caatgtatct
tatcatgtct 4140gtataccgtc gacctctagc tagagcttgg cgtaatcatg gtcatagctg
tttcctgtgt 4200gaaattgtta tccgctcaca attccacaca acatacgagc cggaagcata
aagtgtaaag 4260cctggggtgc ctaatgagtg agctaactca cattaattgc gttgcgctca
ctgcccgctt 4320tccagtcggg aaacctgtcg tgccagctgc attaatgaat cggccaacgc
gcggggagag 4380gcggtttgcg tattgggcgc tcttccgctt cctcgctcac tgactcgctg
cgctcggtcg 4440ttcggctgcg gcgagcggta tcagctcact caaaggcggt aatacggtta
tccacagaat 4500caggggataa cgcaggaaag aacatgtgag caaaaggcca gcaaaaggcc
aggaaccgta 4560aaaaggccgc gttgctggcg tttttccata ggctccgccc ccctgacgag
catcacaaaa 4620atcgacgctc aagtcagagg tggcgaaacc cgacaggact ataaagatac
caggcgtttc 4680cccctggaag ctccctcgtg cgctctcctg ttccgaccct gccgcttacc
ggatacctgt 4740ccgcctttct cccttcggga agcgtggcgc tttctcaatg ctcacgctgt
aggtatctca 4800gttcggtgta ggtcgttcgc tccaagctgg gctgtgtgca cgaacccccc
gttcagcccg 4860accgctgcgc cttatccggt aactatcgtc ttgagtccaa cccggtaaga
cacgacttat 4920cgccactggc agcagccact ggtaacagga ttagcagagc gaggtatgta
ggcggtgcta 4980cagagttctt gaagtggtgg cctaactacg gctacactag aaggacagta
tttggtatct 5040gcgctctgct gaagccagtt accttcggaa aaagagttgg tagctcttga
tccggcaaac 5100aaaccaccgc tggtagcggt ggtttttttg tttgcaagca gcagattacg
cgcagaaaaa 5160aaggatctca agaagatcct ttgatctttt ctacggggtc tgacgctcag
tggaacgaaa 5220actcacgtta agggattttg gtcatgagat tatcaaaaag gatcttcacc
tagatccttt 5280taaattaaaa atgaagtttt aaatcaatct aaagtatata tgagtaaact
tggtctgaca 5340gttaccaatg cttaatcagt gaggcaccta tctcagcgat ctgtctattt
cgttcatcca 5400tagttgcctg actccccgtc gtgtagataa ctacgatacg ggagggctta
ccatctggcc 5460ccagtgctgc aatgataccg cgagacccac gctcaccggc tccagattta
tcagcaataa 5520accagccagc cggaagggcc gagcgcagaa gtggtcctgc aactttatcc
gcctccatcc 5580agtctattaa ttgttgccgg gaagctagag taagtagttc gccagttaat
agtttgcgca 5640acgttgttgc cattgctaca ggcatcgtgg tgtcacgctc gtcgtttggt
atggcttcat 5700tcagctccgg ttcccaacga tcaaggcgag ttacatgatc ccccatgttg
tgcaaaaaag 5760cggttagctc cttcggtcct ccgatcgttg tcagaagtaa gttggccgca
gtgttatcac 5820tcatggttat ggcagcactg cataattctc ttactgtcat gccatccgta
agatgctttt 5880ctgtgactgg tgagtactca accaagtcat tctgagaata gtgtatgcgg
cgaccgagtt 5940gctcttgccc ggcgtcaata cgggataata ccgcgccaca tagcagaact
ttaaaagtgc 6000tcatcattgg aaaacgttct tcggggcgaa aactctcaag gatcttaccg
ctgttgagat 6060ccagttcgat gtaacccact cgtgcaccca actgatcttc agcatctttt
actttcacca 6120gcgtttctgg gtgagcaaaa acaggaaggc aaaatgccgc aaaaaaggga
ataagggcga 6180cacggaaatg ttgaatactc atactcttcc tttttcaata ttattgaagc
atttatcagg 6240gttattgtct catgagcgga tacatatttg aatgtattta gaaaaataaa
caaatagggg 6300ttccgcgcac atttccccga aaagtgccac ctgacgtcga cggatcggga
gatctcccga 6360tcccctatgg tgcactctca gtacaatctg ctctgatgcc gcatagttaa
gccagtatct 6420gctccctgct tgtgtgttgg aggtcgctga gtagtgcgcg agcaaaattt
aagctacaac 6480aaggcaaggc ttgaccgaca attgcatgaa gaatctgctt agggttaggc
gttttgcgct 6540gcttcgcgat gtacgggcca gatatacgcg ttgacattga ttattgacta
gttattaata 6600gtaatcaatt acggggtcat tagttcatag cccatatatg gagttccgcg
ttacataact 6660tacggtaaat ggcccgcctg gctgaccgcc caacgacccc cgcccattga
cgtcaataat 6720gacgtatgtt cccatagtaa cgccaatagg gactttccat tgacgtcaat
gggtggacta 6780tttacggtaa actgcccact tggcagtaca tcaagtgtat catatgccaa
gtacgccccc 6840tattgacgtc aatgacggta aatggcccgc ctggcattat gcccagtaca
tgaccttatg 6900ggactttcct acttggcagt acatctacgt attagtcatc gctattacca
tggtgatgcg 6960gttttggcag tacatcaatg ggcgtggata gcggtttgac tcacggggat
ttccaagtct 7020ccaccccatt gacgtcaatg ggagtttgtt ttggcaccaa aatcaacggg
actttccaaa 7080atgtcgtaac aactccgccc cattgacgca aatgggcggt aggcgtgtac
ggtgggaggt 7140ctatataagc agagctctct ggctaactag agaacccact gcttactggc
ttatcgaaat 7200taatacgact cactataggg agacccaagc tggtttaaac ttaagcttgg
taccgagctc 7260actagtccag tgtggtggca gatatccagc acagtggcgg ccgctcgagt
ctagagggcc 7320cgttttgcct gtactgggtc tctctggtta gaccagatct gagcctggga
gctctctggc 7380taactaggga acccactgct taagcctcaa taaagcttgc cttgagtgct
tcaagtagtg 7440tgtgcccgtc tgttgtgtga ctctggtaac tagagatccc tcagaccctt
ttagtcagtg 7500tggaaaatct ctagcagtgg cgcccgaaca gggacttgaa agcgaaaggg
aaaccagagg 7560agctctctcg acgcaggact cggcttgctg aagcgcgcac ggcaagaggc
gaggggcggc 7620gactggtgag tacgccaaaa attttgacta gcggaggcta gaaggagaga
gatgggtgcg 7680agagcgtcag tattaagcgg gggagaatta gatcgcgatg ggaaaaaatt
cggttaaggc 7740cagggggaaa gaaaaaatat aaattaaaac atatagtatg ggcaagcagg
gagctagaac 7800gattcgcagt taatcctggc ctgttagaaa catcagaagg ctgtagacaa
atactgggac 7860agctacaacc atcccttcag acaggatcag aagaacttag atcattatat
aatacagtag 7920caaccctcta ttgtgtgcat caaaggatag agataaaaga caccaaggaa
gctttagaca 7980agatagagga agagcaaaac aaaagtaaga ccaccgcaca gcaagcggcc
gctgatcttc 8040agacctggag gaggagatat gagggacaat tggagaagtg aattatataa
atataaagta 8100gtaaaaattg aaccattagg agtagcaccc accaaggcaa agagaagagt
ggtgcagaga 8160gaaaaaagag cagtgggaat aggagctttg ttccttgggt tcttgggagc
agcaggaagc 8220actatgggcg cagcgtcaat gacgctgacg gtacaggcca gacaattatt
gtctggtata 8280gtgcagcagc agaacaattt gctgagggct attgaggcgc aacagcatct
gttgcaactc 8340acagtctggg gcatcaagca gctccaggca agaatcctgg ctgtggaaag
atacctaaag 8400gatcaacagc tcctggggat ttggggttgc tctggaaaac tcatttgcac
cactgctgtg 8460ccttggaatg ctagttggag taataaatct ctggaacaga tttggaatca
cacgacctgg 8520atggagtggg acagagaaat taacaattac acaagcttaa tacactcctt
aattgaagaa 8580tcgcaaaacc agcaagaaaa gaatgaacaa gaattattgg aattagataa
atgggcaagt 8640ttgtggaatt ggtttaacat aacaaattgg ctgtggtata taaaattatt
cataatgata 8700gtaggaggct tggtaggttt aagaatagtt tttgctgtac tttctatagt
gaatagagtt 8760aggcagggat attcaccatt atcgtttcag acccacctcc caaccccgag
gggacccgac 8820aggcccttaa ttaagccacc tatcctcttc agacctcttc aggaaacagc
tatgcacata 8880gcacacaggc atatgttcaa ccaaaacact gaaacacata aaagaaatgt
ttaaagaatg 8940aatttaaaaa aataaaaaat aaactcaact acatatgaag ccttagcaaa
catgtctgga 9000cctctagaca cacagactct gacacgccaa cgtctgagtt ctagtttcga
tacgcactgg 9060gaagttttaa aagttttcca tcaactctaa tgtgtagaga aatggaaact
atcatagact 9120ctacggcatt gagggtgaag gtatgagtga agcactctta gggtcagaag
tatgtcagtg 9180cccatttgtt gctgttagca tcatcatctt agggcttgag aggatgttgc
agctgaccca 9240tgcacctgtg acatacatat ggaattattc tttggcacat aaaattagaa
tgggagctgg 9300ctcatcaggt tttgtgctgt aagttttcta tgttaaacca gatgcgatac
actaaataaa 9360ataaaatata cttgaccgat ggttttgagc gaaataataa ctggataatc
aagaaatata 9420tccactaatg aatagcctga actactgaaa caatttgttc agtgcctagc
atatggtgtg 9480cattttatta tttctttcaa aaagaatgta tttggagtta catagtaagt
ctgctacctt 9540ttctttatgg ctatatctat gtcttatgtt gagatgaatg aattattctt
caggggaaat 9600aatctatttg aacagtttag atggtgaaga acatttgcag catttgcaag
atttttttcc 9660actctgaagt ggtctttgtc cttgaacata ggatacaagt gacccctgct
ctgttaatta 9720ttggcaaatt gcctaacttc aacgtaagga aatagagtca tatgtttgct
cactgaaggt 9780tactagttaa caggcatccc ttaaacagga tataaaagga cttcagcagg
actgctcgaa 9840acatcccact tccagcactg cctgcggtga aggaaccagc agccgaattc
ggccattacg 9900gcccgccacc atggctagat tagataaaag taaagtgatt aacagcgcat
tagagctgct 9960taatgaggtc ggaatcgaag gtttaacaac ccgtaaactc gcccagaagc
taggtgtaga 10020gcagcctaca ttgtattggc atgtaaaaaa taagcgggct ttgctcgacg
ccttagccat 10080tgagatgtta gataggcacc atactcactt ttgcccttta gaaggggaaa
gctggcaaga 10140ttttttacgt aataacgcta aaagttttag atgtgcttta ctaagtcatc
gcgatggagc 10200aaaagtacat ttaggtacac ggcctacaga aaaacagtat gaaactctcg
aaaatcaatt 10260agccttttta tgccaacaag gtttttcact agagaatgca ttatatgcac
tcagcgctgt 10320ggggcatttt actttaggtt gcgtattgga agatcaagag catcaagtcg
ctaaagaaga 10380aagggaaaca cctactactg atagtatgcc gccattatta cgacaagcta
tcgaattatt 10440tgatcaccaa ggtgcagagc cagccttctt attcggcctt gaattgatca
tatgcggatt 10500agaaaaacaa cttaaatgtg aaagtgggtc gccaaaaaag aagagaaagg
tcgacggcgg 10560tggtgctttg tctcctcagc actctgctgt cactcaagga agtatcatca
agaacaagga 10620gggcatggat gctaagtcac taactgcctg gtcccggaca ctggtgacct
tcaaggatgt 10680atttgtggac ttcaccaggg aggagtggaa gctgctggac actgctcagc
agatcgtgta 10740cagaaatgtg atgctggaga actataagaa cctggtttcc ttgggttatc
agcttactaa 10800gccagatgtg atcctccggt tggagaaggg agaagagccc tggctggtgg
agagagaaat 10860tcaccaagag acccatcctg attcagagac tgcatttgaa atcaaatcat
cagtttaagg 10920ccgcct 10926

SEQ ID NO: 9; Length: 10656; Type: DNA; Organism: Artificial Sequence; Feature: Synthetic
cggccatcga
taaggatccg cccctctccc tccccccccc ctaacgttac tggccgaagc 60cgcttggaat
aaggccggtg tgcgtttgtc tatatgttat tttccaccat attgccgtct 120tttggcaatg
tgagggcccg gaaacctggc cctgtcttct tgacgagcat tcctaggggt 180ctttcccctc
tcgccaaagg aatgcaaggt ctgttgaatg tcgtgaagga agcagttcct 240ctggaagctt
cttgaagaca aacaacgtct gtagcgaccc tttgcaggca gcggaacccc 300ccacctggcg
acaggtgcct ctgcggccaa aagccacgtg tataagatac acctgcaaag 360gcggcacaac
cccagtgcca cgttgtgagt tggatagttg tggaaagagt caaatggctc 420tcctcaagcg
tattcaacaa ggggctgaag gatgcccaga aggtacccca ttgtatggga 480tctgatctgg
ggcctcggtg cacatgcttt acatgtgttt agtcgaggtt aaaaaaacgt 540ctaggccccc
cgaaccacgg ggacgtggtt ttcctttgaa aaacacgatg ataatatggc 600cacaaccatg
gcctcctccg aggacgtcat caaggagttc atgcgcttca aggtgcgcat 660ggagggctcc
gtgaacggcc acgagttcga gatcgagggc gagggcgagg gccgccccta 720cgagggcacc
cagaccgcca agctgaaggt gaccaagggc ggccccctgc ccttcgcctg 780ggacatcctg
tccccccagt tccagtacgg ctccaaggtg tacgtgaagc accccgccga 840catccccgac
tacaagaagc tgtccttccc cgagggcttc aagtgggagc gcgtgatgaa 900cttcgaggac
ggcggcgtgg tgaccgtgac ccaggactcc tccctgcagg acggctcctt 960catctacaag
gtgaagttca tcggcgtgaa cttcccctcc gacggccccg taatgcagaa 1020gaagactatg
ggctgggagg cctccaccga gcgcctgtac ccccgcgacg gcgtgctgaa 1080gggcgagatc
cacaaggccc tgaagctgaa ggacggcggc cactacctgg tggagttcaa 1140gtccatctac
atggccaaga agcccgtgca gctgcccggc tactactacg tggactccaa 1200gctggacatc
acctcccaca acgaggacta caccatcgtg gagcagtacg agcgcgccga 1260gggccgccac
cacctgttcc tgtaggcggc cgcaatcaac ctctggatta caaaatttgt 1320gaaagattga
ctggtattct taactatgtt gctcctttta cgctatgtgg atacgctgct 1380ttaatgcctt
tgtatcatgc tattacttcc cgtacggctt tcattttctc ctccttgtat 1440aaatcctggt
tgctgtctct ttatgaggag ttgtggcccg ttgtcaggca acgtggcgtg 1500gtgtgcactg
tgtttgctga cgcaaccccc actggttggg gcattgccac cacctatcaa 1560ctcctttccg
ggactttcgc tttccccctc cctattgcca cggcggaact cattgccgcc 1620tgccttgccc
gctgctggac aggggctcgg ctgttgggca ctgacaattc cgtggtgttg 1680tcggggaagc
tgacgtcctt tccatggctg ctcgcctgtg ttgccaactg gattctgcgc 1740gggacgtcct
tctgctacgt cccttcggcc ctcaatccag cggaccttcc ttcccgcggc 1800ctgctgccgg
ttctgcggcc tcttccgcgt cttcgccttc gccctcagac gagtcggatc 1860tccctttggg
ccgcctcccc gcctgcctgc aggtttgtcg agacctagaa aaacatggag 1920caatcacaag
tagcaataca gcagctacca atgctgattg tgcctggcta gaagcacaag 1980aggaggagga
ggtgggtttt ccagtcacac ctcaggtacc tttaagacca atgacttaca 2040aggcagctgt
agatcttagc cactttttaa aagaaaaggg gggactggaa gggctaattc 2100actcccaacg
aagacaagat ctgctttttg cttgtactgg gtctctctgg ttagaccaga 2160tctgagcctg
ggagctctct ggctaactag ggaacccact gcttaagcct caataaagct 2220tgccttgagt
gcttcaagta gtgtgtgccc gtctgttgtg tgactctggt aactagagat 2280ccctcagacc
cttttagtca gtgtggaaaa tctctagcag ggcccgttta aacccgctga 2340tcagcctcga
ctgtgccttc tagttgccag ccatctgttg tttgcccctc ccccgtgcct 2400tccttgaccc
tggaaggtgc cactcccact gtcctttcct aataaaatga ggaaattgca 2460tcgcattgtc
tgagtaggtg tcattctatt ctggggggtg gggtggggca ggacagcaag 2520ggggaggatt
gggaagacaa tagcaggcat gctggggatg cggtgggctc tatggcttct 2580gaggcggaaa
gaaccagctg gggctctagg gggtatcccc acgcgccctg tagcggcgca 2640ttaagcgcgg
cgggtgtggt ggttacgcgc agcgtgaccg ctacacttgc cagcgcccta 2700gcgcccgctc
ctttcgcttt cttcccttcc tttctcgcca cgttcgccgg ctttccccgt 2760caagctctaa
atcggggcat ccctttaggg ttccgattta gtgctttacg gcacctcgac 2820cccaaaaaac
ttgattaggg tgatggttca cgtagtgggc catcgccctg atagacggtt 2880tttcgccctt
tgacgttgga gtccacgttc tttaatagtg gactcttgtt ccaaactgga 2940acaacactca
accctatctc ggtctattct tttgatttat aagggatttt ggggatttcg 3000gcctattggt
taaaaaatga gctgatttaa caaaaattta acgcgaatta attctgtgga 3060atgtgtgtca
gttagggtgt ggaaagtccc caggctcccc aggcaggcag aagtatgcaa 3120agcatgcatc
tcaattagtc agcaaccagg tgtggaaagt ccccaggctc cccagcaggc 3180agaagtatgc
aaagcatgca tctcaattag tcagcaacca tagtcccgcc cctaactccg 3240cccatcccgc
ccctaactcc gcccagttcc gcccattctc cgccccatgg ctgactaatt 3300ttttttattt
atgcagaggc cgaggccgcc tctgcctctg agctattcca gaagtagtga 3360ggaggctttt
ttggaggcct aggcttttgc aaaaagctcc cgggagcttg tatatccatt 3420ttcggatctg
atcagcacgt gttgacaatt aatcatcggc atagtatatc ggcatagtat 3480aatacgacaa
ggtgaggaac taaaccatgg ccaagttgac cagtgccgtt ccggtgctca 3540ccgcgcgcga
cgtcgccgga gcggtcgagt tctggaccga ccggctcggg ttctcccggg 3600acttcgtgga
ggacgacttc gccggtgtgg tccgggacga cgtgaccctg ttcatcagcg 3660cggtccagga
ccaggtggtg ccggacaaca ccctggcctg ggtgtgggtg cgcggcctgg 3720acgagctgta
cgccgagtgg tcggaggtcg tgtccacgaa cttccgggac gcctccgggc 3780cggccatgac
cgagatcggc gagcagccgt gggggcggga gttcgccctg cgcgacccgg 3840ccggcaactg
cgtgcacttc gtggccgagg agcaggactg acacgtgcta cgagatttcg 3900attccaccgc
cgccttctat gaaaggttgg gcttcggaat cgttttccgg gacgccggct 3960ggatgatcct
ccagcgcggg gatctcatgc tggagttctt cgcccacccc aacttgttta 4020ttgcagctta
taatggttac aaataaagca atagcatcac aaatttcaca aataaagcat 4080ttttttcact
gcattctagt tgtggtttgt ccaaactcat caatgtatct tatcatgtct 4140gtataccgtc
gacctctagc tagagcttgg cgtaatcatg gtcatagctg tttcctgtgt 4200gaaattgtta
tccgctcaca attccacaca acatacgagc cggaagcata aagtgtaaag 4260cctggggtgc
ctaatgagtg agctaactca cattaattgc gttgcgctca ctgcccgctt 4320tccagtcggg
aaacctgtcg tgccagctgc attaatgaat cggccaacgc gcggggagag 4380gcggtttgcg
tattgggcgc tcttccgctt cctcgctcac tgactcgctg cgctcggtcg 4440ttcggctgcg
gcgagcggta tcagctcact caaaggcggt aatacggtta tccacagaat 4500caggggataa
cgcaggaaag aacatgtgag caaaaggcca gcaaaaggcc aggaaccgta 4560aaaaggccgc
gttgctggcg tttttccata ggctccgccc ccctgacgag catcacaaaa 4620atcgacgctc
aagtcagagg tggcgaaacc cgacaggact ataaagatac caggcgtttc 4680cccctggaag
ctccctcgtg cgctctcctg ttccgaccct gccgcttacc ggatacctgt 4740ccgcctttct
cccttcggga agcgtggcgc tttctcaatg ctcacgctgt aggtatctca 4800gttcggtgta
ggtcgttcgc tccaagctgg gctgtgtgca cgaacccccc gttcagcccg 4860accgctgcgc
cttatccggt aactatcgtc ttgagtccaa cccggtaaga cacgacttat 4920cgccactggc
agcagccact ggtaacagga ttagcagagc gaggtatgta ggcggtgcta 4980cagagttctt
gaagtggtgg cctaactacg gctacactag aaggacagta tttggtatct 5040gcgctctgct
gaagccagtt accttcggaa aaagagttgg tagctcttga tccggcaaac 5100aaaccaccgc
tggtagcggt ggtttttttg tttgcaagca gcagattacg cgcagaaaaa 5160aaggatctca
agaagatcct ttgatctttt ctacggggtc tgacgctcag tggaacgaaa 5220actcacgtta
agggattttg gtcatgagat tatcaaaaag gatcttcacc tagatccttt 5280taaattaaaa
atgaagtttt aaatcaatct aaagtatata tgagtaaact tggtctgaca 5340gttaccaatg
cttaatcagt gaggcaccta tctcagcgat ctgtctattt cgttcatcca 5400tagttgcctg
actccccgtc gtgtagataa ctacgatacg ggagggctta ccatctggcc 5460ccagtgctgc
aatgataccg cgagacccac gctcaccggc tccagattta tcagcaataa 5520accagccagc
cggaagggcc gagcgcagaa gtggtcctgc aactttatcc gcctccatcc 5580agtctattaa
ttgttgccgg gaagctagag taagtagttc gccagttaat agtttgcgca 5640acgttgttgc
cattgctaca ggcatcgtgg tgtcacgctc gtcgtttggt atggcttcat 5700tcagctccgg
ttcccaacga tcaaggcgag ttacatgatc ccccatgttg tgcaaaaaag 5760cggttagctc
cttcggtcct ccgatcgttg tcagaagtaa gttggccgca gtgttatcac 5820tcatggttat
ggcagcactg cataattctc ttactgtcat gccatccgta agatgctttt 5880ctgtgactgg
tgagtactca accaagtcat tctgagaata gtgtatgcgg cgaccgagtt 5940gctcttgccc
ggcgtcaata cgggataata ccgcgccaca tagcagaact ttaaaagtgc 6000tcatcattgg
aaaacgttct tcggggcgaa aactctcaag gatcttaccg ctgttgagat 6060ccagttcgat
gtaacccact cgtgcaccca actgatcttc agcatctttt actttcacca 6120gcgtttctgg
gtgagcaaaa acaggaaggc aaaatgccgc aaaaaaggga ataagggcga 6180cacggaaatg
ttgaatactc atactcttcc tttttcaata ttattgaagc atttatcagg 6240gttattgtct
catgagcgga tacatatttg aatgtattta gaaaaataaa caaatagggg 6300ttccgcgcac
atttccccga aaagtgccac ctgacgtcga cggatcggga gatctcccga 6360tcccctatgg
tgcactctca gtacaatctg ctctgatgcc gcatagttaa gccagtatct 6420gctccctgct
tgtgtgttgg aggtcgctga gtagtgcgcg agcaaaattt aagctacaac 6480aaggcaaggc
ttgaccgaca attgcatgaa gaatctgctt agggttaggc gttttgcgct 6540gcttcgcgat
gtacgggcca gatatacgcg ttgacattga ttattgacta gttattaata 6600gtaatcaatt
acggggtcat tagttcatag cccatatatg gagttccgcg ttacataact 6660tacggtaaat
ggcccgcctg gctgaccgcc caacgacccc cgcccattga cgtcaataat 6720gacgtatgtt
cccatagtaa cgccaatagg gactttccat tgacgtcaat gggtggacta 6780tttacggtaa
actgcccact tggcagtaca tcaagtgtat catatgccaa gtacgccccc 6840tattgacgtc
aatgacggta aatggcccgc ctggcattat gcccagtaca tgaccttatg 6900ggactttcct
acttggcagt acatctacgt attagtcatc gctattacca tggtgatgcg 6960gttttggcag
tacatcaatg ggcgtggata gcggtttgac tcacggggat ttccaagtct 7020ccaccccatt
gacgtcaatg ggagtttgtt ttggcaccaa aatcaacggg actttccaaa 7080atgtcgtaac
aactccgccc cattgacgca aatgggcggt aggcgtgtac ggtgggaggt 7140ctatataagc
agagctctct ggctaactag agaacccact gcttactggc ttatcgaaat 7200taatacgact
cactataggg agacccaagc tggtttaaac ttaagcttgg taccgagctc 7260actagtccag
tgtggtggca gatatccagc acagtggcgg ccgctcgagt ctagagggcc 7320cgttttgcct
gtactgggtc tctctggtta gaccagatct gagcctggga gctctctggc 7380taactaggga
acccactgct taagcctcaa taaagcttgc cttgagtgct tcaagtagtg 7440tgtgcccgtc
tgttgtgtga ctctggtaac tagagatccc tcagaccctt ttagtcagtg 7500tggaaaatct
ctagcagtgg cgcccgaaca gggacttgaa agcgaaaggg aaaccagagg 7560agctctctcg
acgcaggact cggcttgctg aagcgcgcac ggcaagaggc gaggggcggc 7620gactggtgag
tacgccaaaa attttgacta gcggaggcta gaaggagaga gatgggtgcg 7680agagcgtcag
tattaagcgg gggagaatta gatcgcgatg ggaaaaaatt cggttaaggc 7740cagggggaaa
gaaaaaatat aaattaaaac atatagtatg ggcaagcagg gagctagaac 7800gattcgcagt
taatcctggc ctgttagaaa catcagaagg ctgtagacaa atactgggac 7860agctacaacc
atcccttcag acaggatcag aagaacttag atcattatat aatacagtag 7920caaccctcta
ttgtgtgcat caaaggatag agataaaaga caccaaggaa gctttagaca 7980agatagagga
agagcaaaac aaaagtaaga ccaccgcaca gcaagcggcc gctgatcttc 8040agacctggag
gaggagatat gagggacaat tggagaagtg aattatataa atataaagta 8100gtaaaaattg
aaccattagg agtagcaccc accaaggcaa agagaagagt ggtgcagaga 8160gaaaaaagag
cagtgggaat aggagctttg ttccttgggt tcttgggagc agcaggaagc 8220actatgggcg
cagcgtcaat gacgctgacg gtacaggcca gacaattatt gtctggtata 8280gtgcagcagc
agaacaattt gctgagggct attgaggcgc aacagcatct gttgcaactc 8340acagtctggg
gcatcaagca gctccaggca agaatcctgg ctgtggaaag atacctaaag 8400gatcaacagc
tcctggggat ttggggttgc tctggaaaac tcatttgcac cactgctgtg 8460ccttggaatg
ctagttggag taataaatct ctggaacaga tttggaatca cacgacctgg 8520atggagtggg
acagagaaat taacaattac acaagcttaa tacactcctt aattgaagaa 8580tcgcaaaacc
agcaagaaaa gaatgaacaa gaattattgg aattagataa atgggcaagt 8640ttgtggaatt
ggtttaacat aacaaattgg ctgtggtata taaaattatt cataatgata 8700gtaggaggct
tggtaggttt aagaatagtt tttgctgtac tttctatagt gaatagagtt 8760aggcagggat
attcaccatt atcgtttcag acccacctcc caaccccgag gggacccgac 8820aggcccttaa
ttaattggct ccggtgcccg tcagtgggca gagcgcacat cgcccacagt 8880ccccgagaag
ttggggggag gggtcggcaa ttgaaccggt gcctagagaa ggtggcgcgg 8940ggtaaactgg
gaaagtgatg tcgtgtactg gctccgcctt tttcccgagg gtgggggaga 9000accgtatata
agtgcagtag tcgccgtgaa cgttcttttt cgcaacgggt ttgccgccag 9060aacacaggta
agtgccgtgt gtggttcccg cgggcctggc ctctttacgg gttatggccc 9120ttgcgtgcct
tgaattactt ccacctggct gcagtacgtg attcttgatc ccgagcttcg 9180ggttggaagt
gggtgggaga gttcgaggcc ttgcgcttaa ggagcccctt cgcctcgtgc 9240ttgagttgag
gcctggcctg ggcgctgggg ccgccgcgtg cgaatctggt ggcaccttcg 9300cgcctgtctc
gctgctttcg ataagtctct agccatttaa aatttttgat gacctgctgc 9360gacgcttttt
ttctggcaag atagtcttgt aaatgcgggc caagatctgc acactggtat 9420ttcggttttt
ggggccgcgg gcggcgacgg ggcccgtgcg tcccagcgca catgttcggc 9480gaggcggggc
ctgcgagcgc ggccaccgag aatcggacgg gggtagtctc aagctggccg 9540gcctgctctg
gtgcctggcc tcgcgccgcc gtgtatcgcc ccgccctggg cggcaaggct 9600ggcccggtcg
gcaccagttg cgtgagcgga aagatggccg cttcccggcc ctgctgcagg 9660gagctcaaaa
tggaggacgc ggcgctcggg agagcgggcg ggtgagtcac ccacacaaag 9720gaaaagggcc
tttccgtcct cagccgtcgc ttcatgtgac tccacggagt accgggcgcc 9780gtccaggcac
ctcgattagt tctcgagctt ttggagtacg tcgtctttag gttgggggga 9840ggggttttat
gcgatggagt ttccccacac tgagtgggtg gagactgaag ttaggccagc 9900ttggcacttg
atgtaattct ccttggaatt tgcccttttt gagtttggat cttggttcat 9960tctcaagcct
cagacagtgg ttcaaagttt ttttcttcca tttcaggtgt cgtgaggaat 10020tcggccatta
cggcccgcca ccatgatcga attgctctct gaatcgctgg aagggctttc 10080cgccgccatg
atcgccgagc tgggacgcta ccggcatcag gtcttcatcg agaagctggg 10140ctgggacgtg
gtctccacct ccagggtccg cgaccaggaa ttcgaccagt tcgaccatcc 10200gcaaacccgc
tacatcgtcg ccatgagccg ccagggcatc tgcggttgcg cccgcctgct 10260gccgacgacc
gacgcctacc tgctcaagga cgtcttcgcc tacctgtgca gcgaaacccc 10320gccgagcgat
ccgtcggtct gggagctttc gcgctacgcc gccagcgcgg cggacgatcc 10380gcagctggcg
atgaagatat tctggtccag cctgcaatgc gcctggtacc tgggcgccag 10440ttcggtggtg
gcggtgacca ccacggccat ggagcgctat ttcgttcgca acggcgtgat 10500cctccagcgc
ctcggcccgc cgcagaaggt caagggcgag acgctggtcg cgatcagctt 10560cccggcctac
caggagcgcg gcctggagat gctgctgcgc taccacccgg aatggctgca 10620gggcgtaccg
ctgtcgatgg cggtgtgagg ccgcct 10656

SEQ ID NO: 10, 12768 bp, DNA, Artificial Sequence (Synthetic)
gatccgcccc tctccctccc
ccccccctaa cgttactggc cgaagccgct tggaataagg 60ccggtgtgcg tttgtctata
tgttattttc caccatattg ccgtcttttg gcaatgtgag 120ggcccggaaa cctggccctg
tcttcttgac gagcattcct aggggtcttt cccctctcgc 180caaaggaatg caaggtctgt
tgaatgtcgt gaaggaagca gttcctctgg aagcttcttg 240aagacaaaca acgtctgtag
cgaccctttg caggcagcgg aaccccccac ctggcgacag 300gtgcctctgc ggccaaaagc
cacgtgtata agatacacct gcaaaggcgg cacaacccca 360gtgccacgtt gtgagttgga
tagttgtgga aagagtcaaa tggctctcct caagcgtatt 420caacaagggg ctgaaggatg
cccagaaggt accccattgt atgggatctg atctggggcc 480tcggtgcaca tgctttacat
gtgtttagtc gaggttaaaa aaacgtctag gccccccgaa 540ccacggggac gtggttttcc
tttgaaaaac acgatgataa tatggccaca accatggtga 600gcaagggcga ggagctgttc
accggggtgg tgcccatcct ggtcgagctg gacggcgacg 660taaacggcca caagttcagc
gtgtccggcg agggcgaggg cgatgccacc tacggcaagc 720tgaccctgaa gttcatctgc
accaccggca agctgcccgt gccctggccc accctcgtga 780ccaccctgac ctacggcgtg
cagtgcttca gccgctaccc cgaccacatg aagcagcacg 840acttcttcaa gtccgccatg
cccgaaggct acgtccagga gcgcaccatc ttcttcaagg 900acgacggcaa ctacaagacc
cgcgccgagg tgaagttcga gggcgacacc ctggtgaacc 960gcatcgagct gaagggcatc
gacttcaagg aggacggcaa catcctgggg cacaagctgg 1020agtacaacta caacagccac
aacgtctata tcatggccga caagcagaag aacggcatca 1080aggtgaactt caagatccgc
cacaacatcg aggacggcag cgtgcagctc gccgaccact 1140accagcagaa cacccccatc
ggcgacggcc ccgtgctgct gcccgacaac cactacctga 1200gcacccagtc cgccctgagc
aaagacccca acgagaagcg cgatcacatg gtcctgctgg 1260agttcgtgac cgccgccggg
atcactctcg gcatggacga gctgtacaag taagcggccg 1320caatcaacct ctggattaca
aaatttgtga aagattgact ggtattctta actatgttgc 1380tccttttacg ctatgtggat
acgctgcttt aatgcctttg tatcatgcta ttacttcccg 1440tacggctttc attttctcct
ccttgtataa atcctggttg ctgtctcttt atgaggagtt 1500gtggcccgtt gtcaggcaac
gtggcgtggt gtgcactgtg tttgctgacg caacccccac 1560tggttggggc attgccacca
cctatcaact cctttccggg actttcgctt tccccctccc 1620tattgccacg gcggaactca
ttgccgcctg ccttgcccgc tgctggacag gggctcggct 1680gttgggcact gacaattccg
tggtgttgtc ggggaagctg acgtcctttc catggctgct 1740cgcctgtgtt gccaactgga
ttctgcgcgg gacgtccttc tgctacgtcc cttcggccct 1800caatccagcg gaccttcctt
cccgcggcct gctgccggtt ctgcggcctc ttccgcgtct 1860tcgccttcgc cctcagacga
gtcggatctc cctttgggcc gcctccccgc ctgcctgcag 1920gtttgtcgag acctagaaaa
acatggagca atcacaagta gcaatacagc agctaccaat 1980gctgattgtg cctggctaga
agcacaagag gaggaggagg tgggttttcc agtcacacct 2040caggtacctt taagaccaat
gacttacaag gcagctgtag atcttagcca ctttttaaaa 2100gaaaaggggg gactggaagg
gctaattcac tcccaacgaa gacaagatct gctttttgct 2160tgtactgggt ctctctggtt
agaccagatc tgagcctggg agctctctgg ctaactaggg 2220aacccactgc ttaagcctca
ataaagcttg ccttgagtgc ttcaagtagt gtgtgcccgt 2280ctgttgtgtg actctggtaa
ctagagatcc ctcagaccct tttagtcagt gtggaaaatc 2340tctagcaggg cccgtttaaa
cccgctgatc agcctcgact gtgccttcta gttgccagcc 2400atctgttgtt tgcccctccc
ccgtgccttc cttgaccctg gaaggtgcca ctcccactgt 2460cctttcctaa taaaatgagg
aaattgcatc gcattgtctg agtaggtgtc attctattct 2520ggggggtggg gtggggcagg
acagcaaggg ggaggattgg gaagacaata gcaggcatgc 2580tggggatgcg gtgggctcta
tggcttctga ggcggaaaga accagctggg gctctagggg 2640gtatccccac gcgccctgta
gcggcgcatt aagcgcggcg ggtgtggtgg ttacgcgcag 2700cgtgaccgct acacttgcca
gcgccctagc gcccgctcct ttcgctttct tcccttcctt 2760tctcgccacg ttcgccggct
ttccccgtca agctctaaat cggggcatcc ctttagggtt 2820ccgatttagt gctttacggc
acctcgaccc caaaaaactt gattagggtg atggttcacg 2880tagtgggcca tcgccctgat
agacggtttt tcgccctttg acgttggagt ccacgttctt 2940taatagtgga ctcttgttcc
aaactggaac aacactcaac cctatctcgg tctattcttt 3000tgatttataa gggattttgg
ggatttcggc ctattggtta aaaaatgagc tgatttaaca 3060aaaatttaac gcgaattaat
tctgtggaat gtgtgtcagt tagggtgtgg aaagtcccca 3120ggctccccag gcaggcagaa
gtatgcaaag catgcatctc aattagtcag caaccaggtg 3180tggaaagtcc ccaggctccc
cagcaggcag aagtatgcaa agcatgcatc tcaattagtc 3240agcaaccata gtcccgcccc
taactccgcc catcccgccc ctaactccgc ccagttccgc 3300ccattctccg ccccatggct
gactaatttt ttttatttat gcagaggccg aggccgcctc 3360tgcctctgag ctattccaga
agtagtgagg aggctttttt ggaggcctag gcttttgcaa 3420aaagctcccg ggagcttgta
tatccatttt cggatctgat cagcacgtgt tgacaattaa 3480tcatcggcat agtatatcgg
catagtataa tacgacaagg tgaggaacta aaccatggcc 3540aagttgacca gtgccgttcc
ggtgctcacc gcgcgcgacg tcgccggagc ggtcgagttc 3600tggaccgacc ggctcgggtt
ctcccgggac ttcgtggagg acgacttcgc cggtgtggtc 3660cgggacgacg tgaccctgtt
catcagcgcg gtccaggacc aggtggtgcc ggacaacacc 3720ctggcctggg tgtgggtgcg
cggcctggac gagctgtacg ccgagtggtc ggaggtcgtg 3780tccacgaact tccgggacgc
ctccgggccg gccatgaccg agatcggcga gcagccgtgg 3840gggcgggagt tcgccctgcg
cgacccggcc ggcaactgcg tgcacttcgt ggccgaggag 3900caggactgac acgtgctacg
agatttcgat tccaccgccg ccttctatga aaggttgggc 3960ttcggaatcg ttttccggga
cgccggctgg atgatcctcc agcgcgggga tctcatgctg 4020gagttcttcg cccaccccaa
cttgtttatt gcagcttata atggttacaa ataaagcaat 4080agcatcacaa atttcacaaa
taaagcattt ttttcactgc attctagttg tggtttgtcc 4140aaactcatca atgtatctta
tcatgtctgt ataccgtcga cctctagcta gagcttggcg 4200taatcatggt catagctgtt
tcctgtgtga aattgttatc cgctcacaat tccacacaac 4260atacgagccg gaagcataaa
gtgtaaagcc tggggtgcct aatgagtgag ctaactcaca 4320ttaattgcgt tgcgctcact
gcccgctttc cagtcgggaa acctgtcgtg ccagctgcat 4380taatgaatcg gccaacgcgc
ggggagaggc ggtttgcgta ttgggcgctc ttccgcttcc 4440tcgctcactg actcgctgcg
ctcggtcgtt cggctgcggc gagcggtatc agctcactca 4500aaggcggtaa tacggttatc
cacagaatca ggggataacg caggaaagaa catgtgagca 4560aaaggccagc aaaaggccag
gaaccgtaaa aaggccgcgt tgctggcgtt tttccatagg 4620ctccgccccc ctgacgagca
tcacaaaaat cgacgctcaa gtcagaggtg gcgaaacccg 4680acaggactat aaagatacca
ggcgtttccc cctggaagct ccctcgtgcg ctctcctgtt 4740ccgaccctgc cgcttaccgg
atacctgtcc gcctttctcc cttcgggaag cgtggcgctt 4800tctcaatgct cacgctgtag
gtatctcagt tcggtgtagg tcgttcgctc caagctgggc 4860tgtgtgcacg aaccccccgt
tcagcccgac cgctgcgcct tatccggtaa ctatcgtctt 4920gagtccaacc cggtaagaca
cgacttatcg ccactggcag cagccactgg taacaggatt 4980agcagagcga ggtatgtagg
cggtgctaca gagttcttga agtggtggcc taactacggc 5040tacactagaa ggacagtatt
tggtatctgc gctctgctga agccagttac cttcggaaaa 5100agagttggta gctcttgatc
cggcaaacaa accaccgctg gtagcggtgg tttttttgtt 5160tgcaagcagc agattacgcg
cagaaaaaaa ggatctcaag aagatccttt gatcttttct 5220acggggtctg acgctcagtg
gaacgaaaac tcacgttaag ggattttggt catgagatta 5280tcaaaaagga tcttcaccta
gatcctttta aattaaaaat gaagttttaa atcaatctaa 5340agtatatatg agtaaacttg
gtctgacagt taccaatgct taatcagtga ggcacctatc 5400tcagcgatct gtctatttcg
ttcatccata gttgcctgac tccccgtcgt gtagataact 5460acgatacggg agggcttacc
atctggcccc agtgctgcaa tgataccgcg agacccacgc 5520tcaccggctc cagatttatc
agcaataaac cagccagccg gaagggccga gcgcagaagt 5580ggtcctgcaa ctttatccgc
ctccatccag tctattaatt gttgccggga agctagagta 5640agtagttcgc cagttaatag
tttgcgcaac gttgttgcca ttgctacagg catcgtggtg 5700tcacgctcgt cgtttggtat
ggcttcattc agctccggtt cccaacgatc aaggcgagtt 5760acatgatccc ccatgttgtg
caaaaaagcg gttagctcct tcggtcctcc gatcgttgtc 5820agaagtaagt tggccgcagt
gttatcactc atggttatgg cagcactgca taattctctt 5880actgtcatgc catccgtaag
atgcttttct gtgactggtg agtactcaac caagtcattc 5940tgagaatagt gtatgcggcg
accgagttgc tcttgcccgg cgtcaatacg ggataatacc 6000gcgccacata gcagaacttt
aaaagtgctc atcattggaa aacgttcttc ggggcgaaaa 6060ctctcaagga tcttaccgct
gttgagatcc agttcgatgt aacccactcg tgcacccaac 6120tgatcttcag catcttttac
tttcaccagc gtttctgggt gagcaaaaac aggaaggcaa 6180aatgccgcaa aaaagggaat
aagggcgaca cggaaatgtt gaatactcat actcttcctt 6240tttcaatatt attgaagcat
ttatcagggt tattgtctca tgagcggata catatttgaa 6300tgtatttaga aaaataaaca
aataggggtt ccgcgcacat ttccccgaaa agtgccacct 6360gacgtcgacg gatcgggaga
tctcccgatc ccctatggtg cactctcagt acaatctgct 6420ctgatgccgc atagttaagc
cagtatctgc tccctgcttg tgtgttggag gtcgctgagt 6480agtgcgcgag caaaatttaa
gctacaacaa ggcaaggctt gaccgacaat tgcatgaaga 6540atctgcttag ggttaggcgt
tttgcgctgc ttcgcgatgt acgggccaga tatacgcgtt 6600gacattgatt attgactagt
tattaatagt aatcaattac ggggtcatta gttcatagcc 6660catatatgga gttccgcgtt
acataactta cggtaaatgg cccgcctggc tgaccgccca 6720acgacccccg cccattgacg
tcaataatga cgtatgttcc catagtaacg ccaataggga 6780ctttccattg acgtcaatgg
gtggactatt tacggtaaac tgcccacttg gcagtacatc 6840aagtgtatca tatgccaagt
acgcccccta ttgacgtcaa tgacggtaaa tggcccgcct 6900ggcattatgc ccagtacatg
accttatggg actttcctac ttggcagtac atctacgtat 6960tagtcatcgc tattaccatg
gtgatgcggt tttggcagta catcaatggg cgtggatagc 7020ggtttgactc acggggattt
ccaagtctcc accccattga cgtcaatggg agtttgtttt 7080ggcaccaaaa tcaacgggac
tttccaaaat gtcgtaacaa ctccgcccca ttgacgcaaa 7140tgggcggtag gcgtgtacgg
tgggaggtct atataagcag agctctctgg ctaactagag 7200aacccactgc ttactggctt
atcgaaatta atacgactca ctatagggag acccaagctg 7260gtttaaactt aagcttggta
ccgagctcac tagtccagtg tggtggcaga tatccagcac 7320agtggcggcc gctcgagtct
agagggcccg tttaaacnnn nngggtctct ctggttagac 7380cagatctgag cctgggagct
ctctggctaa ctagggaacc cactgcttaa gcctcaataa 7440agcttgcctt gagtgcttca
agtagtgtgt gcccgtctgt tgtgtgactc tggtaactag 7500agatccctca gaccctttta
gtcagtgtgg aaaatctcta gcannnnnnn nnnccagcaa 7560cttatctgtg tctgtccgat
tgtctagtgt ctatgtttga tgttatgcgc ctgcgtctgt 7620actagttagc taactagctc
tgtatctggc ggacccgtgg tggaactgac gagttctgaa 7680cacccggccg caaccctggg
agacgtccca gggactttgg gggccgtttt tgtggcccga 7740cctgaggaag ggagtcgatg
tggaatccga ccccgtcagg atatgtggtt ctggtaggag 7800acgagaacct aaaacagttc
ccgcctccgt ctgaattttt gctttcggtt tggaaccgaa 7860gccgcgcgtc ttgtctgctg
cagcgctgca gcatcgttct gtgttgtctc tgtctgactg 7920tgtttctgta tttgtctgaa
aattagggcc agactgttac cactccctta agtttgacct 7980taggtcactg gaaagatgtc
gagcggatcg ctcacaacca gtcggtagat gtcaagaaga 8040gacgttgggt taccttctgc
tctgcagaat ggccaacctt taacgtcgga tggccgcgag 8100acggcacctt taaccgagac
ctcatcaccc aggttaagat caaggtcttt tcacctggcc 8160cgcatggaca cccagaccag
gtcccctaca tcgtgacctg ggaagccttg gcttttgacc 8220cccctccctg ggtcaagccc
tttgtacacc ctaagcctcc gcctcctctt cctccatccg 8280ccccgtctct cccccttgaa
cctcctcgtt cgaccccgcc tcgatcctcc ctttatccag 8340ccctcactcc ttctctaggc
gccnnnnngg aaaagtttag taaaacacca tatgtatgtt 8400tcagggaaag ctaggggatg
gttttataga catcactatg aaagccctca tccaagaata 8460agttcagaag tacacatccc
actaggggat gctagattgg taataacaac atattggggt 8520ctgcatacag gagaaagaga
ctggcatctg ggtcagggag tctccataga atggaggaaa 8580aagagatata gcacacaagt
agaccctgaa ctagcagacc aactaattca tctgtattac 8640tttgactgtt tttcagactc
tgctataaga aaggccttat taggacatat agttagccct 8700aggtgtgaat atcaagcagg
acataacaag gtaggatctc tacaatactt ggcactagca 8760gcattaataa caccaaaaaa
gataaagcca cctttgccta gtgttacgaa actgacagag 8820gatagatgga acaagcccca
gaagaccaag ggccacagag ggagccacac aatgaatgga 8880cactagagct tttagaggag
cttaagaatg aagctgttag acattttcct aggatttggc 8940tccatggctt agggcaacat
atctatgaaa cttatgggga tacttgggca ggagtggaag 9000ccataataat gcaacaactg
ctgtttatcc atttcagaat tgggtgtcga catagcagaa 9060taggcgttac tcaacagagg
agagcaagaa atggagccag tagatcctag actagagccc 9120tggaagcatc caggaagtca
gcctaaaact gcttgtacca cttgctattg taaaaagtgt 9180tgctttcatt gccaagtttg
tttcacaaca aaagccttag gcatctccta tggcaggaag 9240aagcggagac agcgacgaag
acctcctcaa ggcagtcaga ctcatcaagt ttctctatca 9300aagcagtaag tagtacatgt
aatgcaacct atacaaatag caatagcagc attagtagta 9360gcaataataa tagcaatagt
tgtgtggtcc atagtaatca tagaatatag gaaaatatta 9420agacaaagaa aaatagacag
gttaattgat agactaatnn nnnnnnggcc cgaaggaata 9480gaagaagaag gtggagagag
agacagagac agatccattc gattagtgaa cggatcggca 9540ctgcgtgcgc caattctgca
gacaaatggc agtattcatc cacaatttta aaagaaaagg 9600ggggattggg gggtacagtg
caggggaaag aatagtagac ataatagcaa cagacataca 9660aactaaagaa ttacaaaaac
aaattacaaa aattcaaaat tttcgggttt attacaggga 9720cagcagagat ccagtttggt
taattaatgc tgctgagcca cgtgaagcct taagccgcag 9780ggagaccctg aaaatagact
gtaagcaacc ttggggtggg gtaagcctgt tgcagcgtta 9840gtgtatccat gaactccttt
aagatacaag ctatgtgtca ttcctcatgg agacgccgga 9900gagtcttgag cttacatgat
ggtcctgcgg aggccttaaa gtcctgtggg gatgctcagg 9960gttgtccctg tgtgtcactg
agacctccga aagaatttaa atatcacaga aaaaaaaaac 10020ataaagaaga ccttaagcct
ttttaagttc ccaaggagag ccagaagtgg tggtatatag 10080ctcaacttgg gagactgagg
caaaaagact gactgagttt aagaccagcc tagctacacc 10140acaaagctcc tttctcaaaa
gaaaaagctc ccaaggaggt tggatcccca gtgagaccct 10200cagcaaggcc ctctggcctc
actatggtcc tgaggacacc cctccacact gccctgatct 10260tcttacccca tcctgcctga
gctcttcacc tgctctcctc ttgcattcaa atatcttctg 10320ctacagtagg tccactggag
tctcccaggt acccagagtg tgaatgtctg cagcactttc 10380tgggggacaa ggagcagaga
gcaagggacc cacaattcgg gtctagtgtc tgtaaagcct 10440tgcggagggt agagttctag
ttcacacaag gcaccaagtg tttttgctgg ttgcctagga 10500aacaggacag tgccaaatca
ggaacagaaa gagtcaagga acccccaacc actccaagcg 10560gaggctgaga aaggttttgt
agctggaata gagcatgcac taacagatgg agacagctgg 10620ctttgagctc tgaagcaagt
attacatatg gagacttgct ggccttcagg tgcttatctt 10680gttattggat actgcaggag
gatgtaccac agggcttcag ctcagctgac ccccaagtgg 10740gatatggaaa gagagataga
ggaggaggga ccattaagtg ccttgctgcc tgaattctgc 10800tttccttcta cctctgagag
agagctgggg actcggctga gttaagaacc cagctatcaa 10860ttggaactgt gaaacagtcc
aagggacaaa gatactaggt ccccaactgc aacttcctgg 10920ggaatgatgt ggaaaaatgc
tcagccaagg acaaagaaag catcacccac tctggaacaa 10980tgtcccctgc tgtgaactgg
ttcatcaggc catcagggcc ccttgttaag actctaatta 11040ccctaggact aagtagaggt
gttgacgtcc aatgagcgct ttctgcagac ctagcaccag 11100ggaagtgttt ggaaactgca
gcttcagccc ctctggccat ctgctgacct accccacctg 11160gagcccttaa tgggtcaaac
agcaaagtcc agggggcaga gaggaggtgc tttggtctat 11220aaaggtagtg gggacccagt
aaccaccggc gcgccaagct agcaagttaa caaatcgatc 11280cggatcctcc caccatgaaa
ccagtaacgt tatacgatgt cgcagagtat gccggtgtct 11340cttatcagac cgtttcccgc
gtggtgaacc aggccagcca cgtttctgcg aaaacgcggg 11400aaaaagtgga agcggcgatg
gcggagctga attacattcc caaccgcgtg gcacaacaac 11460tggcgggcaa acagtcgttg
ctgattggcg ttgccacctc cagtctggcc ctgcacgcgc 11520cgtcgcaaat tgtcgcggcg
attaaatctc gcgccgatca actgggtgcc agcgtggtgg 11580tgtcgatggt agaacgaagc
ggcgtcgaag cctgtaaagc ggcggtgcac aatcttctcg 11640cgcaacgcgt cagtgggctg
atcattaact atccgctgga tgaccaggat gccattgctg 11700tggaagctgc ctgcactaat
gttccggcgt tatttcttga tgtctctgac cagacaccca 11760tcaacagtat tattttctcc
catgaagacg gtacgcgact gggcgtggag catctggtcg 11820cattgggtca ccagcaaatc
gcgctgttag cgggcccatt aagttctgtc tcggcgcgtc 11880tgcgtctggc tggctggcat
aaatatctca ctcgcaatca aattcagccg atagcggaac 11940gggaaggcga ctggagtgcc
atgtccggtt ttcaacaaac catgcaaatg ctgaatgagg 12000gcatcgttcc cactgcgatg
ctggttgcca acgatcagat ggcgctgggc gcaatgcgcg 12060ccattaccga gtccgggctg
cgcgttggtg cggatatctc ggtagtggga tacgacgata 12120ccgaagacag ctcatgttat
atcccgccgt taaccaccat caaacaggat tttcgcctgc 12180tggggcaaac cagcgtggac
cgcttgctgc aactctctca gggccaggcg gtgaagggca 12240atcagctgtt gcccgtctca
ctggtgaaaa gaaaaaccac cctggcgccc aatacgcaaa 12300ccgcctctcc ccgcgcgttg
gccgattcat taatgcagct ggcacgacag gtttcccgac 12360tggaaagcgg gcagagaccg
ccaaaaaaga agagaaaggt cgacggcggt ggtgctttgt 12420ctcctcagca ctctgctgtc
actcaaggaa gtatcatcaa gaacaaggag ggcatggatg 12480ctaagtcact aactgcctgg
tcccggacac tggtgacctt caaggatgta tttgtggact 12540tcaccaggga ggagtggaag
ctgctggaca ctgctcagca gatcgtgtac agaaatgtga 12600tgctggagaa ctataagaac
ctggtttcct tgggttatca gcttactaag ccagatgtga 12660tcctccggtt ggagaaggga
gaagagccct ggctggtgga gagagaaatt caccaagaga 12720cccatcctga ttcagagact
gcatttgaaa tcaaatcatc agtttgag 12768

SEQ ID NO: 11, 11287 bp, DNA, Artificial Sequence (Synthetic)
cgcgccaagc tagcaagtta acaaatcgat ccggatccnn
nnngcccctc tccctccccc 60ccccctaacg ttactggccg aagccgcttg gaataaggcc
ggtgtgcgtt tgtctatatg 120ttattttcca ccatattgcc gtcttttggc aatgtgaggg
cccggaaacc tggccctgtc 180ttcttgacga gcattcctag gggtctttcc cctctcgcca
aaggaatgca aggtctgttg 240aatgtcgtga aggaagcagt tcctctggaa gcttcttgaa
gacaaacaac gtctgtagcg 300accctttgca ggcagcggaa ccccccacct ggcgacaggt
gcctctgcgg ccaaaagcca 360cgtgtataag atacacctgc aaaggcggca caaccccagt
gccacgttgt gagttggata 420gttgtggaaa gagtcaaatg gctctcctca agcgtattca
acaaggggct gaaggatgcc 480cagaaggtac cccattgtat gggatctgat ctggggcctc
ggtgcacatg ctttacatgt 540gtttagtcga ggttaaaaaa acgtctaggc cccccgaacc
acggggacgt ggttttcctt 600tgaaaaacac gatgataata tggccacaac catggtgagc
aagggcgagg agctgttcac 660cggggtggtg cccatcctgg tcgagctgga cggcgacgta
aacggccaca agttcagcgt 720gtccggcgag ggcgagggcg atgccaccta cggcaagctg
accctgaagt tcatctgcac 780caccggcaag ctgcccgtgc cctggcccac cctcgtgacc
accctgacct acggcgtgca 840gtgcttcagc cgctaccccg accacatgaa gcagcacgac
ttcttcaagt ccgccatgcc 900cgaaggctac gtccaggagc gcaccatctt cttcaaggac
gacggcaact acaagacccg 960cgccgaggtg aagttcgagg gcgacaccct ggtgaaccgc
atcgagctga agggcatcga 1020cttcaaggag gacggcaaca tcctggggca caagctggag
tacaactaca acagccacaa 1080cgtctatatc atggccgaca agcagaagaa cggcatcaag
gtgaacttca agatccgcca 1140caacatcgag gacggcagcg tgcagctcgc cgaccactac
cagcagaaca cccccatcgg 1200cgacggcccc gtgctgctgc ccgacaacca ctacctgagc
acccagtccg ccctgagcaa 1260agaccccaac gagaagcgcg atcacatggt cctgctggag
ttcgtgaccg ccgccgggat 1320cactctcggc atggacgagc tgtacaagta agcggccgca
atcaacctct ggattacaaa 1380atttgtgaaa gattgactgg tattcttaac tatgttgctc
cttttacgct atgtggatac 1440gctgctttaa tgcctttgta tcatgctatt acttcccgta
cggctttcat tttctcctcc 1500ttgtataaat cctggttgct gtctctttat gaggagttgt
ggcccgttgt caggcaacgt 1560ggcgtggtgt gcactgtgtt tgctgacgca acccccactg
gttggggcat tgccaccacc 1620tatcaactcc tttccgggac tttcgctttc cccctcccta
ttgccacggc ggaactcatt 1680gccgcctgcc ttgcccgctg ctggacaggg gctcggctgt
tgggcactga caattccgtg 1740gtgttgtcgg ggaagctgac gtcctttcca tggctgctcg
cctgtgttgc caactggatt 1800ctgcgcggga cgtccttctg ctacgtccct tcggccctca
atccagcgga ccttccttcc 1860cgcggcctgc tgccggttct gcggcctctt ccgcgtcttc
gccttcgccc tcagacgagt 1920cggatctccc tttgggccgc ctccccgcct gcctgcaggt
ttgtcgagac ctagaaaaac 1980atggagcaat cacaagtagc aatacagcag ctaccaatgc
tgattgtgcc tggctagaag 2040cacaagagga ggaggaggtg ggttttccag tcacacctca
ggtaccttta agaccaatga 2100cttacaaggc agctgtagat cttagccact ttttaaaaga
aaagggggga ctggaagggc 2160taattcactc ccaacgaaga caagatctgc tttttgcttg
tactgggtct ctctggttag 2220accagatctg agcctgggag ctctctggct aactagggaa
cccactgctt aagcctcaat 2280aaagcttgcc ttgagtgctt caagtagtgt gtgcccgtct
gttgtgtgac tctggtaact 2340agagatccct cagacccttt tagtcagtgt ggaaaatctc
tagcagggcc cgtttaaacc 2400cgctgatcag cctcgactgt gccttctagt tgccagccat
ctgttgtttg cccctccccc 2460gtgccttcct tgaccctgga aggtgccact cccactgtcc
tttcctaata aaatgaggaa 2520attgcatcgc attgtctgag taggtgtcat tctattctgg
ggggtggggt ggggcaggac 2580agcaaggggg aggattggga agacaatagc aggcatgctg
gggatgcggt gggctctatg 2640gcttctgagg cggaaagaac cagctggggc tctagggggt
atccccacgc gccctgtagc 2700ggcgcattaa gcgcggcggg tgtggtggtt acgcgcagcg
tgaccgctac acttgccagc 2760gccctagcgc ccgctccttt cgctttcttc ccttcctttc
tcgccacgtt cgccggcttt 2820ccccgtcaag ctctaaatcg gggcatccct ttagggttcc
gatttagtgc tttacggcac 2880ctcgacccca aaaaacttga ttagggtgat ggttcacgta
gtgggccatc gccctgatag 2940acggtttttc gccctttgac gttggagtcc acgttcttta
atagtggact cttgttccaa 3000actggaacaa cactcaaccc tatctcggtc tattcttttg
atttataagg gattttgggg 3060atttcggcct attggttaaa aaatgagctg atttaacaaa
aatttaacgc gaattaattc 3120tgtggaatgt gtgtcagtta gggtgtggaa agtccccagg
ctccccaggc aggcagaagt 3180atgcaaagca tgcatctcaa ttagtcagca accaggtgtg
gaaagtcccc aggctcccca 3240gcaggcagaa gtatgcaaag catgcatctc aattagtcag
caaccatagt cccgccccta 3300actccgccca tcccgcccct aactccgccc agttccgccc
attctccgcc ccatggctga 3360ctaatttttt ttatttatgc agaggccgag gccgcctctg
cctctgagct attccagaag 3420tagtgaggag gcttttttgg aggcctaggc ttttgcaaaa
agctcccggg agcttgtata 3480tccattttcg gatctgatca gcacgtgttg acaattaatc
atcggcatag tatatcggca 3540tagtataata cgacaaggtg aggaactaaa ccatggccaa
gttgaccagt gccgttccgg 3600tgctcaccgc gcgcgacgtc gccggagcgg tcgagttctg
gaccgaccgg ctcgggttct 3660cccgggactt cgtggaggac gacttcgccg gtgtggtccg
ggacgacgtg accctgttca 3720tcagcgcggt ccaggaccag gtggtgccgg acaacaccct
ggcctgggtg tgggtgcgcg 3780gcctggacga gctgtacgcc gagtggtcgg aggtcgtgtc
cacgaacttc cgggacgcct 3840ccgggccggc catgaccgag atcggcgagc agccgtgggg
gcgggagttc gccctgcgcg 3900acccggccgg caactgcgtg cacttcgtgg ccgaggagca
ggactgacac gtgctacgag 3960atttcgattc caccgccgcc ttctatgaaa ggttgggctt
cggaatcgtt ttccgggacg 4020ccggctggat gatcctccag cgcggggatc tcatgctgga
gttcttcgcc caccccaact 4080tgtttattgc agcttataat ggttacaaat aaagcaatag
catcacaaat ttcacaaata 4140aagcattttt ttcactgcat tctagttgtg gtttgtccaa
actcatcaat gtatcttatc 4200atgtctgtat accgtcgacc tctagctaga gcttggcgta
atcatggtca tagctgtttc 4260ctgtgtgaaa ttgttatccg ctcacaattc cacacaacat
acgagccgga agcataaagt 4320gtaaagcctg gggtgcctaa tgagtgagct aactcacatt
aattgcgttg cgctcactgc 4380ccgctttcca gtcgggaaac ctgtcgtgcc agctgcatta
atgaatcggc caacgcgcgg 4440ggagaggcgg tttgcgtatt gggcgctctt ccgcttcctc
gctcactgac tcgctgcgct 4500cggtcgttcg gctgcggcga gcggtatcag ctcactcaaa
ggcggtaata cggttatcca 4560cagaatcagg ggataacgca ggaaagaaca tgtgagcaaa
aggccagcaa aaggccagga 4620accgtaaaaa ggccgcgttg ctggcgtttt tccataggct
ccgcccccct gacgagcatc 4680acaaaaatcg acgctcaagt cagaggtggc gaaacccgac
aggactataa agataccagg 4740cgtttccccc tggaagctcc ctcgtgcgct ctcctgttcc
gaccctgccg cttaccggat 4800acctgtccgc ctttctccct tcgggaagcg tggcgctttc
tcaatgctca cgctgtaggt 4860atctcagttc ggtgtaggtc gttcgctcca agctgggctg
tgtgcacgaa ccccccgttc 4920agcccgaccg ctgcgcctta tccggtaact atcgtcttga
gtccaacccg gtaagacacg 4980acttatcgcc actggcagca gccactggta acaggattag
cagagcgagg tatgtaggcg 5040gtgctacaga gttcttgaag tggtggccta actacggcta
cactagaagg acagtatttg 5100gtatctgcgc tctgctgaag ccagttacct tcggaaaaag
agttggtagc tcttgatccg 5160gcaaacaaac caccgctggt agcggtggtt tttttgtttg
caagcagcag attacgcgca 5220gaaaaaaagg atctcaagaa gatcctttga tcttttctac
ggggtctgac gctcagtgga 5280acgaaaactc acgttaaggg attttggtca tgagattatc
aaaaaggatc ttcacctaga 5340tccttttaaa ttaaaaatga agttttaaat caatctaaag
tatatatgag taaacttggt 5400ctgacagtta ccaatgctta atcagtgagg cacctatctc
agcgatctgt ctatttcgtt 5460catccatagt tgcctgactc cccgtcgtgt agataactac
gatacgggag ggcttaccat 5520ctggccccag tgctgcaatg ataccgcgag acccacgctc
accggctcca gatttatcag 5580caataaacca gccagccgga agggccgagc gcagaagtgg
tcctgcaact ttatccgcct 5640ccatccagtc tattaattgt tgccgggaag ctagagtaag
tagttcgcca gttaatagtt 5700tgcgcaacgt tgttgccatt gctacaggca tcgtggtgtc
acgctcgtcg tttggtatgg 5760cttcattcag ctccggttcc caacgatcaa ggcgagttac
atgatccccc atgttgtgca 5820aaaaagcggt tagctccttc ggtcctccga tcgttgtcag
aagtaagttg gccgcagtgt 5880tatcactcat ggttatggca gcactgcata attctcttac
tgtcatgcca tccgtaagat 5940gcttttctgt gactggtgag tactcaacca agtcattctg
agaatagtgt atgcggcgac 6000cgagttgctc ttgcccggcg tcaatacggg ataataccgc
gccacatagc agaactttaa 6060aagtgctcat cattggaaaa cgttcttcgg ggcgaaaact
ctcaaggatc ttaccgctgt 6120tgagatccag ttcgatgtaa cccactcgtg cacccaactg
atcttcagca tcttttactt 6180tcaccagcgt ttctgggtga gcaaaaacag gaaggcaaaa
tgccgcaaaa aagggaataa 6240gggcgacacg gaaatgttga atactcatac tcttcctttt
tcaatattat tgaagcattt 6300atcagggtta ttgtctcatg agcggataca tatttgaatg
tatttagaaa aataaacaaa 6360taggggttcc gcgcacattt ccccgaaaag tgccacctga
cgtcgacgga tcgggagatc 6420tcccgatccc ctatggtgca ctctcagtac aatctgctct
gatgccgcat agttaagcca 6480gtatctgctc cctgcttgtg tgttggaggt cgctgagtag
tgcgcgagca aaatttaagc 6540tacaacaagg caaggcttga ccgacaattg catgaagaat
ctgcttaggg ttaggcgttt 6600tgcgctgctt cgcgatgtac gggccagata tacgcgttga
cattgattat tgactagtta 6660ttaatagtaa tcaattacgg ggtcattagt tcatagccca
tatatggagt tccgcgttac 6720ataacttacg gtaaatggcc cgcctggctg accgcccaac
gacccccgcc cattgacgtc 6780aataatgacg tatgttccca tagtaacgcc aatagggact
ttccattgac gtcaatgggt 6840ggactattta cggtaaactg cccacttggc agtacatcaa
gtgtatcata tgccaagtac 6900gccccctatt gacgtcaatg acggtaaatg gcccgcctgg
cattatgccc agtacatgac 6960cttatgggac tttcctactt ggcagtacat ctacgtatta
gtcatcgcta ttaccatggt 7020gatgcggttt tggcagtaca tcaatgggcg tggatagcgg
tttgactcac ggggatttcc 7080aagtctccac cccattgacg tcaatgggag tttgttttgg
caccaaaatc aacgggactt 7140tccaaaatgt cgtaacaact ccgccccatt gacgcaaatg
ggcggtaggc gtgtacggtg 7200ggaggtctat ataagcagag ctctctggct aactagagaa
cccactgctt actggcttat 7260cgaaattaat acgactcact atagggagac ccaagctggt
ttaaacttaa gcttggtacc 7320gagctcacta gtccagtgtg gtggcagata tccagcacag
tggcggccgc tcgagtctag 7380agggcccgtt taaacnnnnn gggtctctct ggttagacca
gatctgagcc tgggagctct 7440ctggctaact agggaaccca ctgcttaagc ctcaataaag
cttgccttga gtgcttcaag 7500tagtgtgtgc ccgtctgttg tgtgactctg gtaactagag
atccctcaga cccttttagt 7560cagtgtggaa aatctctagc annnnnnnnn nccagcaact
tatctgtgtc tgtccgattg 7620tctagtgtct atgtttgatg ttatgcgcct gcgtctgtac
tagttagcta actagctctg 7680tatctggcgg acccgtggtg gaactgacga gttctgaaca
cccggccgca accctgggag 7740acgtcccagg gactttgggg gccgtttttg tggcccgacc
tgaggaaggg agtcgatgtg 7800gaatccgacc ccgtcaggat atgtggttct ggtaggagac
gagaacctaa aacagttccc 7860gcctccgtct gaatttttgc tttcggtttg gaaccgaagc
cgcgcgtctt gtctgctgca 7920gcgctgcagc atcgttctgt gttgtctctg tctgactgtg
tttctgtatt tgtctgaaaa 7980ttagggccag actgttacca ctcccttaag tttgacctta
ggtcactgga aagatgtcga 8040gcggatcgct cacaaccagt cggtagatgt caagaagaga
cgttgggtta ccttctgctc 8100tgcagaatgg ccaaccttta acgtcggatg gccgcgagac
ggcaccttta accgagacct 8160catcacccag gttaagatca aggtcttttc acctggcccg
catggacacc cagaccaggt 8220cccctacatc gtgacctggg aagccttggc ttttgacccc
cctccctggg tcaagccctt 8280tgtacaccct aagcctccgc ctcctcttcc tccatccgcc
ccgtctctcc cccttgaacc 8340tcctcgttcg accccgcctc gatcctccct ttatccagcc
ctcactcctt ctctaggcgc 8400cnnnnnggaa aagtttagta aaacaccata tgtatgtttc
agggaaagct aggggatggt 8460tttatagaca tcactatgaa agccctcatc caagaataag
ttcagaagta cacatcccac 8520taggggatgc tagattggta ataacaacat attggggtct
gcatacagga gaaagagact 8580ggcatctggg tcagggagtc tccatagaat ggaggaaaaa
gagatatagc acacaagtag 8640accctgaact agcagaccaa ctaattcatc tgtattactt
tgactgtttt tcagactctg 8700ctataagaaa ggccttatta ggacatatag ttagccctag
gtgtgaatat caagcaggac 8760ataacaaggt aggatctcta caatacttgg cactagcagc
attaataaca ccaaaaaaga 8820taaagccacc tttgcctagt gttacgaaac tgacagagga
tagatggaac aagccccaga 8880agaccaaggg ccacagaggg agccacacaa tgaatggaca
ctagagcttt tagaggagct 8940taagaatgaa gctgttagac attttcctag gatttggctc
catggcttag ggcaacatat 9000ctatgaaact tatggggata cttgggcagg agtggaagcc
ataataatgc aacaactgct 9060gtttatccat ttcagaattg ggtgtcgaca tagcagaata
ggcgttactc aacagaggag 9120agcaagaaat ggagccagta gatcctagac tagagccctg
gaagcatcca ggaagtcagc 9180ctaaaactgc ttgtaccact tgctattgta aaaagtgttg
ctttcattgc caagtttgtt 9240tcacaacaaa agccttaggc atctcctatg gcaggaagaa
gcggagacag cgacgaagac 9300ctcctcaagg cagtcagact catcaagttt ctctatcaaa
gcagtaagta gtacatgtaa 9360tgcaacctat acaaatagca atagcagcat tagtagtagc
aataataata gcaatagttg 9420tgtggtccat agtaatcata gaatatagga aaatattaag
acaaagaaaa atagacaggt 9480taattgatag actaatnnnn nnnnggcccg aaggaataga
agaagaaggt ggagagagag 9540acagagacag atccattcga ttagtgaacg gatcggcact
gcgtgcgcca attctgcaga 9600caaatggcag tattcatcca caattttaaa agaaaagggg
ggattggggg gtacagtgca 9660ggggaaagaa tagtagacat aatagcaaca gacatacaaa
ctaaagaatt acaaaaacaa 9720attacaaaaa ttcaaaattt tcgggtttat tacagggaca
gcagagatcc agtttggtta 9780attaatgctg ctgagccacg tgaagcctta agccgcaggg
agaccctgaa aatagactgt 9840aagcaacctt ggggtggggt aagcctgttg cagcgttagt
gtatccatga actcctttaa 9900gatacaagct atgtgtcatt cctcatggag acgccggaga
gtcttgagct tacatgatgg 9960tcctgcggag gccttaaagt cctgtgggga tgctcagggt
tgtccctgtg tgtcactgag 10020acctccgaaa gaatttaaat atcacagaaa aaaaaaacat
aaagaagacc ttaagccttt 10080ttaagttccc aaggagagcc agaagtggtg gtatatagct
caacttggga gactgaggca 10140aaaagactga ctgagtttaa gaccagccta gctacaccac
aaagctcctt tctcaaaaga 10200aaaagctccc aaggaggttg gatccccagt gagaccctca
gcaaggccct ctggcctcac 10260tatggtcctg aggacacccc tccacactgc cctgatcttc
ttaccccatc ctgcctgagc 10320tcttcacctg ctctcctctt gcattcaaat atcttctgct
acagtaggtc cactggagtc 10380tcccaggtac ccagagtgtg aatgtctgca gcactttctg
ggggacaagg agcagagagc 10440aagggaccca caattcgggt ctagtgtctg taaagccttg
cggagggtag agttctagtt 10500cacacaaggc accaagtgtt tttgctggtt gcctaggaaa
caggacagtg ccaaatcagg 10560aacagaaaga gtcaaggaac ccccaaccac tccaagcgga
ggctgagaaa ggttttgtag 10620ctggaataga gcatgcacta acagatggag acagctggct
ttgagctctg aagcaagtat 10680tacatatgga gacttgctgg ccttcaggtg cttatcttgt
tattggatac tgcaggagga 10740tgtaccacag ggcttcagct cagctgaccc ccaagtggga
tatggaaaga gagatagagg 10800aggagggacc attaagtgcc ttgctgcctg aattctgctt
tccttctacc tctgagagag 10860agctggggac tcggctgagt taagaaccca gctatcaatt
ggaactgtga aacagtccaa 10920gggacaaaga tactaggtcc ccaactgcaa cttcctgggg
aatgatgtgg aaaaatgctc 10980agccaaggac aaagaaagca tcacccactc tggaacaatg
tcccctgctg tgaactggtt 11040catcaggcca tcagggcccc ttgttaagac tctaattacc
ctaggactaa gtagaggtgt 11100tgacgtccaa tgagcgcttt ctgcagacct agcaccaggg
aagtgtttgg aaactgcagc 11160ttcagcccct ctggccatct gctgacctac cccacctgga
gcccttaatg ggtcaaacag 11220caaagtccag ggggcagaga ggaggtgctt tggtctataa
aggtagtggg gacccagtaa 11280ccaccgg
11287

SEQ ID NO: 12
Length: 11178
Type: DNA
Organism: Artificial Sequence
Other information: Synthetic

tacagtgcag gggaaagaat agtagacata atagcaacag acatacaaac taaagaatta
60caaaaacaaa ttacaaaaat tcaaaatttt cgggtttatt acagggacag cagagatcca
120gtttggttaa ttaattggct ccggtgcccg tcagtgggca gagcgcacat cgcccacagt
180ccccgagaag ttggggggag gggtcggcaa ttgaaccggt gcctagagaa ggtggcgcgg
240ggtaaactgg gaaagtgatg tcgtgtactg gctccgcctt tttcccgagg gtgggggaga
300accgtatata agtgcagtag tcgccgtgaa cgttcttttt cgcaacgggt ttgccgccag
360aacacaggta agtgccgtgt gtggttcccg cgggcctggc ctctttacgg gttatggccc
420ttgcgtgcct tgaattactt ccacctggct gcagtacgtg attcttgatc ccgagcttcg
480ggttggaagt gggtgggaga gttcgaggcc ttgcgcttaa ggagcccctt cgcctcgtgc
540ttgagttgag gcctggcctg ggcgctgggg ccgccgcgtg cgaatctggt ggcaccttcg
600cgcctgtctc gctgctttcg ataagtctct agccatttaa aatttttgat gacctgctgc
660gacgcttttt ttctggcaag atagtcttgt aaatgcgggc caagatctgc acactggtat
720ttcggttttt ggggccgcgg gcggcgacgg ggcccgtgcg tcccagcgca catgttcggc
780gaggcggggc ctgcgagcgc ggccaccgag aatcggacgg gggtagtctc aagctggccg
840gcctgctctg gtgcctggcc tcgcgccgcc gtgtatcgcc ccgccctggg cggcaaggct
900ggcccggtcg gcaccagttg cgtgagcgga aagatggccg cttcccggcc ctgctgcagg
960gagctcaaaa tggaggacgc ggcgctcggg agagcgggcg ggtgagtcac ccacacaaag
1020gaaaagggcc tttccgtcct cagccgtcgc ttcatgtgac tccacggagt accgggcgcc
1080gtccaggcac ctcgattagt tctcgagctt ttggagtacg tcgtctttag gttgggggga
1140ggggttttat gcgatggagt ttccccacac tgagtgggtg gagactgaag ttaggccagc
1200ttggcacttg atgtaattct ccttggaatt tgcccttttt gagtttggat cttggttcat
1260tctcaagcct cagacagtgg ttcaaagttt ttttcttcca tttcaggtgt cgtgaggaat
1320tcggccatta cggcccgcca ccatggctag attagataaa agtaaagtga ttaacagcgc
1380attagagctg cttaatgagg tcggaatcga aggtttaaca acccgtaaac tcgcccagaa
1440gctaggtgta gagcagccta cattgtattg gcatgtaaaa aataagcggg ctttgctcga
1500cgccttagcc attgagatgt tagataggca ccatactcac ttttgccctt tagaagggga
1560aagctggcaa gattttttac gtaataacgc taaaagtttt agatgtgctt tactaagtca
1620tcgcgatgga gcaaaagtac atttaggtac acggcctaca gaaaaacagt atgaaactct
1680cgaaaatcaa ttagcctttt tatgccaaca aggtttttca ctagagaatg cattatatgc
1740actcagcgct gtggggcatt ttactttagg ttgcgtattg gaagatcaag agcatcaagt
1800cgctaaagaa gaaagggaaa cacctactac tgatagtatg ccgccattat tacgacaagc
1860tatcgaatta tttgatcacc aaggtgcaga gccagccttc ttattcggcc ttgaattgat
1920catatgcgga ttagaaaaac aacttaaatg tgaaagtggg tcgccaaaaa agaagagaaa
1980ggtcgacggc ggtggtgctt tgtctcctca gcactctgct gtcactcaag gaagtatcat
2040caagaacaag gagggcatgg atgctaagtc actaactgcc tggtcccgga cactggtgac
2100cttcaaggat gtatttgtgg acttcaccag ggaggagtgg aagctgctgg acactgctca
2160gcagatcgtg tacagaaatg tgatgctgga gaactataag aacctggttt ccttgggtta
2220tcagcttact aagccagatg tgatcctccg gttggagaag ggagaagagc cctggctggt
2280ggagagagaa attcaccaag agacccatcc tgattcagag actgcatttg aaatcaaatc
2340atcagtttaa ggccgcctcg gccatcgata aggatccgga atgcccctct ccctcccccc
2400cccctaacgt tactggccga agccgcttgg aataaggccg gtgtgcgttt gtctatatgt
2460tattttccac catattgccg tcttttggca atgtgagggc ccggaaacct ggccctgtct
2520tcttgacgag cattcctagg ggtctttccc ctctcgccaa aggaatgcaa ggtctgttga
2580atgtcgtgaa ggaagcagtt cctctggaag cttcttgaag acaaacaacg tctgtagcga
2640ccctttgcag gcagcggaac cccccacctg gcgacaggtg cctctgcggc caaaagccac
2700gtgtataaga tacacctgca aaggcggcac aaccccagtg ccacgttgtg agttggatag
2760ttgtggaaag agtcaaatgg ctctcctcaa gcgtattcaa caaggggctg aaggatgccc
2820agaaggtacc ccattgtatg ggatctgatc tggggcctcg gtgcacatgc tttacatgtg
2880tttagtcgag gttaaaaaaa cgtctaggcc ccccgaacca cggggacgtg gttttccttt
2940gaaaaacacg atgataatat ggccacaacc atgaccgagt acaagcccac ggtgcgcctc
3000gccacccgcg acgacgtccc ccgggccgta cgcaccctcg ccgccgcgtt cgccgactac
3060cccgccacgc gccacaccgt cgacccggac cgccacatcg agcgggtcac cgagctgcaa
3120gaactcttcc tcacgcgcgt cgggctcgac atcggcaagg tgtgggtcgc ggacgacggc
3180gccgcggtgg cggtctggac cacgccggag agcgtcgaag cgggggcggt gttcgccgag
3240atcggcccgc gcatggccga gttgagcggt tcccggctgg ccgcgcagca acagatggaa
3300ggcctcctgg cgccgcaccg gcccaaggag cccgcgtggt tcctggccac cgtcggcgtc
3360tcgcccgacc accagggcaa gggtctgggc agcgccgtcg tgctccccgg agtggaggcg
3420gccgagcgcg ccggggtgcc cgccttcctg gagacctccg cgccccgcaa cctccccttc
3480tacgagcggc tcggcttcac cgtcaccgcc gacgtcgagg tgcccgaagg accgcgcacc
3540tggtgcatga cccgcaagcc cggtgcctga gcggccgcaa tcaacctctg gattacaaaa
3600tttgtgaaag attgactggt attcttaact atgttgctcc ttttacgcta tgtggatacg
3660ctgctttaat gcctttgtat catgctatta cttcccgtac ggctttcatt ttctcctcct
3720tgtataaatc ctggttgctg tctctttatg aggagttgtg gcccgttgtc aggcaacgtg
3780gcgtggtgtg cactgtgttt gctgacgcaa cccccactgg ttggggcatt gccaccacct
3840atcaactcct ttccgggact ttcgctttcc ccctccctat tgccacggcg gaactcattg
3900ccgcctgcct tgcccgctgc tggacagggg ctcggctgtt gggcactgac aattccgtgg
3960tgttgtcggg gaagctgacg tcctttccat ggctgctcgc ctgtgttgcc aactggattc
4020tgcgcgggac gtccttctgc tacgtccctt cggccctcaa tccagcggac cttccttccc
4080gcggcctgct gccggttctg cggcctcttc cgcgtcttcg ccttcgccct cagacgagtc
4140ggatctccct ttgggccgcc tccccgcctg cctgcaggtt tgtcgagacc tagaaaaaca
4200tggagcaatc acaagtagca atacagcagc taccaatgct gattgtgcct ggctagaagc
4260acaagaggag gaggaggtgg gttttccagt cacacctcag gtacctttaa gaccaatgac
4320ttacaaggca gctgtagatc ttagccactt tttaaaagaa aaggggggac tggaagggct
4380aattcactcc caacgaagac aagatctgct ttttgcttgt actgggtctc tctggttaga
4440ccagatctga gcctgggagc tctctggcta actagggaac ccactgctta agcctcaata
4500aagcttgcct tgagtgcttc aagtagtgtg tgcccgtctg ttgtgtgact ctggtaacta
4560gagatccctc agaccctttt agtcagtgtg gaaaatctct agcagggccc gtttaaaccc
4620gctgatcagc ctcgactgtg ccttctagtt gccagccatc tgttgtttgc ccctcccccg
4680tgccttcctt gaccctggaa ggtgccactc ccactgtcct ttcctaataa aatgaggaaa
4740ttgcatcgca ttgtctgagt aggtgtcatt ctattctggg gggtggggtg gggcaggaca
4800gcaaggggga ggattgggaa gacaatagca ggcatgctgg ggatgcggtg ggctctatgg
4860cttctgaggc ggaaagaacc agctggggct ctagggggta tccccacgcg ccctgtagcg
4920gcgcattaag cgcggcgggt gtggtggtta cgcgcagcgt gaccgctaca cttgccagcg
4980ccctagcgcc cgctcctttc gctttcttcc cttcctttct cgccacgttc gccggctttc
5040cccgtcaagc tctaaatcgg ggcatccctt tagggttccg atttagtgct ttacggcacc
5100tcgaccccaa aaaacttgat tagggtgatg gttcacgtag tgggccatcg ccctgataga
5160cggtttttcg ccctttgacg ttggagtcca cgttctttaa tagtggactc ttgttccaaa
5220ctggaacaac actcaaccct atctcggtct attcttttga tttataaggg attttgggga
5280tttcggccta ttggttaaaa aatgagctga tttaacaaaa atttaacgcg aattaattct
5340gtggaatgtg tgtcagttag ggtgtggaaa gtccccaggc tccccaggca ggcagaagta
5400tgcaaagcat gcatctcaat tagtcagcaa ccaggtgtgg aaagtcccca ggctccccag
5460caggcagaag tatgcaaagc atgcatctca attagtcagc aaccatagtc ccgcccctaa
5520ctccgcccat cccgccccta actccgccca gttccgccca ttctccgccc catggctgac
5580taattttttt tatttatgca gaggccgagg ccgcctctgc ctctgagcta ttccagaagt
5640agtgaggagg cttttttgga ggcctaggct tttgcaaaaa gctcccggga gcttgtatat
5700ccattttcgg atctgatcag cacgtgttga caattaatca tcggcatagt atatcggcat
5760agtataatac gacaaggtga ggaactaaac catggccaag ttgaccagtg ccgttccggt
5820gctcaccgcg cgcgacgtcg ccggagcggt cgagttctgg accgaccggc tcgggttctc
5880ccgggacttc gtggaggacg acttcgccgg tgtggtccgg gacgacgtga ccctgttcat
5940cagcgcggtc caggaccagg tggtgccgga caacaccctg gcctgggtgt gggtgcgcgg
6000cctggacgag ctgtacgccg agtggtcgga ggtcgtgtcc acgaacttcc gggacgcctc
6060cgggccggcc atgaccgaga tcggcgagca gccgtggggg cgggagttcg ccctgcgcga
6120cccggccggc aactgcgtgc acttcgtggc cgaggagcag gactgacacg tgctacgaga
6180tttcgattcc accgccgcct tctatgaaag gttgggcttc ggaatcgttt tccgggacgc
6240cggctggatg atcctccagc gcggggatct catgctggag ttcttcgccc accccaactt
6300gtttattgca gcttataatg gttacaaata aagcaatagc atcacaaatt tcacaaataa
6360agcatttttt tcactgcatt ctagttgtgg tttgtccaaa ctcatcaatg tatcttatca
6420tgtctgtata ccgtcgacct ctagctagag cttggcgtaa tcatggtcat agctgtttcc
6480tgtgtgaaat tgttatccgc tcacaattcc acacaacata cgagccggaa gcataaagtg
6540taaagcctgg ggtgcctaat gagtgagcta actcacatta attgcgttgc gctcactgcc
6600cgctttccag tcgggaaacc tgtcgtgcca gctgcattaa tgaatcggcc aacgcgcggg
6660gagaggcggt ttgcgtattg ggcgctcttc cgcttcctcg ctcactgact cgctgcgctc
6720ggtcgttcgg ctgcggcgag cggtatcagc tcactcaaag gcggtaatac ggttatccac
6780agaatcaggg gataacgcag gaaagaacat gtgagcaaaa ggccagcaaa aggccaggaa
6840ccgtaaaaag gccgcgttgc tggcgttttt ccataggctc cgcccccctg acgagcatca
6900caaaaatcga cgctcaagtc agaggtggcg aaacccgaca ggactataaa gataccaggc
6960gtttccccct ggaagctccc tcgtgcgctc tcctgttccg accctgccgc ttaccggata
7020cctgtccgcc tttctccctt cgggaagcgt ggcgctttct caatgctcac gctgtaggta
7080tctcagttcg gtgtaggtcg ttcgctccaa gctgggctgt gtgcacgaac cccccgttca
7140gcccgaccgc tgcgccttat ccggtaacta tcgtcttgag tccaacccgg taagacacga
7200cttatcgcca ctggcagcag ccactggtaa caggattagc agagcgaggt atgtaggcgg
7260tgctacagag ttcttgaagt ggtggcctaa ctacggctac actagaagga cagtatttgg
7320tatctgcgct ctgctgaagc cagttacctt cggaaaaaga gttggtagct cttgatccgg
7380caaacaaacc accgctggta gcggtggttt ttttgtttgc aagcagcaga ttacgcgcag
7440aaaaaaagga tctcaagaag atcctttgat cttttctacg gggtctgacg ctcagtggaa
7500cgaaaactca cgttaaggga ttttggtcat gagattatca aaaaggatct tcacctagat
7560ccttttaaat taaaaatgaa gttttaaatc aatctaaagt atatatgagt aaacttggtc
7620tgacagttac caatgcttaa tcagtgaggc acctatctca gcgatctgtc tatttcgttc
7680atccatagtt gcctgactcc ccgtcgtgta gataactacg atacgggagg gcttaccatc
7740tggccccagt gctgcaatga taccgcgaga cccacgctca ccggctccag atttatcagc
7800aataaaccag ccagccggaa gggccgagcg cagaagtggt cctgcaactt tatccgcctc
7860catccagtct attaattgtt gccgggaagc tagagtaagt agttcgccag ttaatagttt
7920gcgcaacgtt gttgccattg ctacaggcat cgtggtgtca cgctcgtcgt ttggtatggc
7980ttcattcagc tccggttccc aacgatcaag gcgagttaca tgatccccca tgttgtgcaa
8040aaaagcggtt agctccttcg gtcctccgat cgttgtcaga agtaagttgg ccgcagtgtt
8100atcactcatg gttatggcag cactgcataa ttctcttact gtcatgccat ccgtaagatg
8160cttttctgtg actggtgagt actcaaccaa gtcattctga gaatagtgta tgcggcgacc
8220gagttgctct tgcccggcgt caatacggga taataccgcg ccacatagca gaactttaaa
8280agtgctcatc attggaaaac gttcttcggg gcgaaaactc tcaaggatct taccgctgtt
8340gagatccagt tcgatgtaac ccactcgtgc acccaactga tcttcagcat cttttacttt
8400caccagcgtt tctgggtgag caaaaacagg aaggcaaaat gccgcaaaaa agggaataag
8460ggcgacacgg aaatgttgaa tactcatact cttccttttt caatattatt gaagcattta
8520tcagggttat tgtctcatga gcggatacat atttgaatgt atttagaaaa ataaacaaat
8580aggggttccg cgcacatttc cccgaaaagt gccacctgac gtcgacggat cgggagatct
8640cccgatcccc tatggtgcac tctcagtaca atctgctctg atgccgcata gttaagccag
8700tatctgctcc ctgcttgtgt gttggaggtc gctgagtagt gcgcgagcaa aatttaagct
8760acaacaaggc aaggcttgac cgacaattgc atgaagaatc tgcttagggt taggcgtttt
8820gcgctgcttc gcgatgtacg ggccagatat acgcgttgac attgattatt gactagttat
8880taatagtaat caattacggg gtcattagtt catagcccat atatggagtt ccgcgttaca
8940taacttacgg taaatggccc gcctggctga ccgcccaacg acccccgccc attgacgtca
9000ataatgacgt atgttcccat agtaacgcca atagggactt tccattgacg tcaatgggtg
9060gactatttac ggtaaactgc ccacttggca gtacatcaag tgtatcatat gccaagtacg
9120ccccctattg acgtcaatga cggtaaatgg cccgcctggc attatgccca gtacatgacc
9180ttatgggact ttcctacttg gcagtacatc tacgtattag tcatcgctat taccatggtg
9240atgcggtttt ggcagtacat caatgggcgt ggatagcggt ttgactcacg gggatttcca
9300agtctccacc ccattgacgt caatgggagt ttgttttggc accaaaatca acgggacttt
9360ccaaaatgtc gtaacaactc cgccccattg acgcaaatgg gcggtaggcg tgtacggtgg
9420gaggtctata taagcagagc ggtttaaact taagcttggt accgagctca ctagtccagt
9480gtggtggcag atatccagca cagtggcggc cgctcgagtc tagagggccc gttttgcctg
9540tactgggtct ctctggttag accagatctg agcctgggag ctctctggct aactagggaa
9600cccactgctt aagcctcaat aaagcttgcc ttgagtgctt caagtagtgt gtgcccgtct
9660gttgtgtgac tctggtaact agagatccct cagacccttt tagtcagtgt ggaaaatctc
9720tagcagtggc gcccgaacag ggacttgaaa gcgaaaggga aaccagagga gctctctcga
9780cgcaggactc ggcttgctga agcgcgcacg gcaagaggcg aggggcggcg actggtgagt
9840acgccaaaaa ttttgactag cggaggctag aaggagagag atgggtgcga gagcgtcagt
9900attaagcggg ggagaattag atcgcgatgg gaaaaaattc ggttaaggcc agggggaaag
9960aaaaaatata aattaaaaca tatagtatgg gcaagcaggg agctagaacg attcgcagtt
10020aatcctggcc tgttagaaac atcagaaggc tgtagacaaa tactgggaca gctacaacca
10080tcccttcaga caggatcaga agaacttaga tcattatata atacagtagc aaccctctat
10140tgtgtgcatc aaaggataga gataaaagac accaaggaag ctttagacaa gatagaggaa
10200gagcaaaaca aaagtaagac caccgcacag caagcggccg ctgatcttca gacctggagg
10260aggagatatg agggacaatt ggagaagtga attatataaa tataaagtag taaaaattga
10320accattagga gtagcaccca ccaaggcaaa gagaagagtg gtgcagagag aaaaaagagc
10380agtgggaata ggagctttgt tccttgggtt cttgggagca gcaggaagca ctatgggcgc
10440agcgtcaatg acgctgacgg tacaggccag acaattattg tctggtatag tgcagcagca
10500gaacaatttg ctgagggcta ttgaggcgca acagcatctg ttgcaactca cagtctgggg
10560catcaagcag ctccaggcaa gaatcctggc tgtggaaaga tacctaaagg atcaacagct
10620cctggggatt tggggttgct ctggaaaact catttgcacc actgctgtgc cttggaatgc
10680tagttggagt aataaatctc tggaacagat ttggaatcac acgacctgga tggagtggga
10740cagagaaatt aacaattaca caagcttaat acactcctta attgaagaat cgcaaaacca
10800gcaagaaaag aatgaacaag aattattgga attagataaa tgggcaagtt tgtggaattg
10860gtttaacata acaaattggc tgtggtatat agaaattatt cataatgata gtaggaggct
10920tggtaggttt aagaatagtt tttgctgtac tttctatagt gaatagagtt aggcagggat
10980attcaccatt atcgtttcag acccacctcc caaccccgag gggacccgac aggcccgaag
11040gaatagaaga agaaggtgga gagagagaca gagacagatc cattcgatta gtgaacggat
11100cggcactgcg tgcgccaatt ctgcagacaa atggcagtat tcatccacaa ttttaaaaga
11160aaagggggga ttgggggg
11178

SEQ ID NO: 13
Length: 10573
Type: DNA
Organism: Artificial Sequence
Other information: Synthetic

aatgaagcta ctgtcttcta
tcgaacaagc atgcgatatt tgccgactta aaaagctcaa 60gtgctccaaa gaaaaaccga
agtgcgccaa gtgtctgaag aacaactggg agtgtcgcta 120ctctcccaaa accaaaaggt
ctccgctgac tagggcacat ctgacagaag tggaatcaag 180gctagaaaga ctggaacagc
tatttctact gatttttcct cgagaagacc ttgacatgat 240tttgaaaatg gattctttac
aggatataaa agcattgtta acaggattat ttgtacaaga 300taatgtgaat aaagatgccg
tcacagatag attggcttca gtggagactg atatgcctct 360aacattgaga cagcatagaa
taagtgcgac atcatcatcg gaagagagta gtaacaaagg 420tcaaagacag ttgactgtat
cgattgactc ggcagctcat catgataact ccacaattcc 480gttggatttt atgcccaggg
atgctcttca tggatttgat tggtctgaag aggatgacat 540gtcggatggc ttgcccttcc
tgaaaacgga ccccaacaat aatgggttct ttggcgacgg 600ttctctctta tgtattcttc
gatctattgg ctttaaaccg gaaaattaca cgaactctaa 660cgttaacagg ctcccgacca
tgattacgga tagatacacg ttggcttcta gatccacaac 720atcccgttta cttcaaagtt
atctcaataa ttttcacccc tactgcccta tcgtgcactc 780accgacgcta atgatgttgt
ataataacca gattgaaatc gcgtcgaagg atcaatggca 840aatccttttt aactgcatat
tagccattgg agcctggtgt atagaggggg aatctactga 900tatagatgtt ttttactatc
aaaatgctaa atctcatttg acgagcaagg tcttcgagtc 960aggttccata attttggtga
cagccctaca tcttctgtcg cgatatacac agtggaggca 1020gaaaacaaat actagctata
attttcacag cttttccata agaatggcca tatcattggg 1080cttgaatagg gacctcccct
cgtccttcag tgatagcagc attctggaac aaagacgccg 1140aatttggtgg tctgtctact
cttgggagat ccaattgtcc ctgctttatg gtcgatccat 1200ccagctttct cagaatacaa
tctccttccc ttcttctgtc gacgatgtgc agcgtaccac 1260aacaggtccc accatatatc
atggcatcat tgaaacagca aggctcttac aagttttcac 1320aaaaatctat gaactagaca
aaacagtaac tgcagaaaaa agtcctatat gtgcaaaaaa 1380atgcttgatg atttgtaatg
agattgagga ggtttcgaga caggcaccaa agtttttaca 1440aatggatatt tccaccaccg
ctctaaccaa tttgttgaag gaacaccctt ggctatcctt 1500tacaagattc gaactgaagt
ggaaacagtt gtctcttatc atttatgtat taagagattt 1560tttcactaat tttacccaga
aaaagtcaca actagaacag gatcaaaatg atcatcaaag 1620ttatgaagtt aaacgatgct
ccatcatgtt aagcgatgca gcacaaagaa ctgttatgtc 1680tgtaagtagc tatatggaca
atcataatgt caccccatat tttgcctgga attgttctta 1740ttacttgttc aatgcagtcc
tagtacccat aaagactcta ctctcaaact caaaatcgaa 1800tgctgagaat aacgagaccg
cacaattatt acaacaaatt aacactgttc tgatgctatt 1860aaaaaaactg gccactttta
aaatccagac ttgtgaaaaa tacattcaag tactggaaga 1920ggtatgtgcg ccgtttctgt
tatcacagtg tgcaatccca ttaccgcata tcagttataa 1980caatagtaat ggtagcgcca
ttaaaaatat tgtcggttct gcaactatcg cccaataccc 2040tactcttccg gaggaaaatg
tcaacaatat cagtgttaaa tatgtttctc ctggctcagt 2100agggccttca cctgtgccat
tgaaatcagg agcaagtttc agtgatctag tcaagctgtt 2160atctaaccgt ccaccctctc
gtaactctcc agtgacaata ccaagaagca caccttcgca 2220tcgctcagtc acgccttttc
tagggcaaca gcaacagctg caatcattag tgccactgac 2280cccgtctgct ttgtttggtg
gcgccaattt taatcaaagt gggaatattg ctgatagctc 2340attgtccttc actttcacta
acagtagcaa cggtccgaac ctcataacaa ctcaaacaaa 2400ttctcaagcg ctttcacaac
caattgcctc ctctaacgtt catgataact tcatgaataa 2460tgaaatcacg gctagtaaaa
ttgatgatgg taataattca aaaccactgt cacctggttg 2520gacggaccaa actgcgtata
acgcgtttgg aatcactaca gggatgttta ataccactac 2580aatggatgat gtatataact
atctattcga tgatgaagat accccaccaa acccaaaaaa 2640agagtaagaa ttcgatatca
agcttatcga taatcaacct ctggattaca aaatttgtga 2700aagattgact ggtattctta
actatgttgc tccttttacg ctatgtggat acgctgcttt 2760aatgcctttg tatcatgcta
ttacttcccg tacggctttc attttctcct ccttgtataa 2820atcctggttg ctgtctcttt
atgaggagtt gtggcccgtt gtcaggcaac gtggcgtggt 2880gtgcactgtg tttgctgacg
caacccccac tggttggggc attgccacca cctatcaact 2940cctttccggg actttcgctt
tccccctccc tattgccacg gcggaactca ttgccgcctg 3000ccttgcccgc tgctggacag
gggctcggct gttgggcact gacaattccg tggtgttgtc 3060ggggaagctg acgtcctttc
catggctgct cgcctgtgtt gccaactgga ttctgcgcgg 3120gacgtccttc tgctacgtcc
cttcggccct caatccagcg gaccttcctt cccgcggcct 3180gctgccggtt ctgcggcctc
ttccgcgtct tcgccttcgc cctcagacga gtcggatctc 3240cctttgggcc gcctccccgc
ctgcctgcag gtttgtcgag acctagaaaa acatggagca 3300atcacaagta gcaatacagc
agctaccaat gctgattgtg cctggctaga agcacaagag 3360gaggaggagg tgggttttcc
agtcacacct caggtacctt taagaccaat gacttacaag 3420gcagctgtag atcttagcca
ctttttaaaa gaaaaggggg gactggaagg gctaattcac 3480tcccaacgaa gacaagatct
gctttttgct tgtactgggt ctctctggtt agaccagatc 3540tgagcctggg agctctctgg
ctaactaggg aacccactgc ttaagcctca ataaagcttg 3600ccttgagtgc ttcaagtagt
gtgtgcccgt ctgttgtgtg actctggtaa ctagagatcc 3660ctcagaccct tttagtcagt
gtggaaaatc tctagcaggg cccgtttaaa cccgctgatc 3720agcctcgact gtgccttcta
gttgccagcc atctgttgtt tgcccctccc ccgtgccttc 3780cttgaccctg gaaggtgcca
ctcccactgt cctttcctaa taaaatgagg aaattgcatc 3840gcattgtctg agtaggtgtc
attctattct ggggggtggg gtggggcagg acagcaaggg 3900ggaggattgg gaagacaata
gcaggcatgc tggggatgcg gtgggctcta tggcttctga 3960ggcggaaaga accagctggg
gctctagggg gtatccccac gcgccctgta gcggcgcatt 4020aagcgcggcg ggtgtggtgg
ttacgcgcag cgtgaccgct acacttgcca gcgccctagc 4080gcccgctcct ttcgctttct
tcccttcctt tctcgccacg ttcgccggct ttccccgtca 4140agctctaaat cggggcatcc
ctttagggtt ccgatttagt gctttacggc acctcgaccc 4200caaaaaactt gattagggtg
atggttcacg tagtgggcca tcgccctgat agacggtttt 4260tcgccctttg acgttggagt
ccacgttctt taatagtgga ctcttgttcc aaactggaac 4320aacactcaac cctatctcgg
tctattcttt tgatttataa gggattttgg ggatttcggc 4380ctattggtta aaaaatgagc
tgatttaaca aaaatttaac gcgaattaat tctgtggaat 4440gtgtgtcagt tagggtgtgg
aaagtcccca ggctccccag gcaggcagaa gtatgcaaag 4500catgcatctc aattagtcag
caaccaggtg tggaaagtcc ccaggctccc cagcaggcag 4560aagtatgcaa agcatgcatc
tcaattagtc agcaaccata gtcccgcccc taactccgcc 4620catcccgccc ctaactccgc
ccagttccgc ccattctccg ccccatggct gactaatttt 4680ttttatttat gcagaggccg
aggccgcctc tgcctctgag ctattccaga agtagtgagg 4740aggctttttt ggaggcctag
gcttttgcaa aaagctcccg ggagcttgta tatccatttt 4800cggatctgat cagcacgtgt
tgacaattaa tcatcggcat agtatatcgg catagtataa 4860tacgacaagg tgaggaacta
aaccatggcc aagttgacca gtgccgttcc ggtgctcacc 4920gcgcgcgacg tcgccggagc
ggtcgagttc tggaccgacc ggctcgggtt ctcccgggac 4980ttcgtggagg acgacttcgc
cggtgtggtc cgggacgacg tgaccctgtt catcagcgcg 5040gtccaggacc aggtggtgcc
ggacaacacc ctggcctggg tgtgggtgcg cggcctggac 5100gagctgtacg ccgagtggtc
ggaggtcgtg tccacgaact tccgggacgc ctccgggccg 5160gccatgaccg agatcggcga
gcagccgtgg gggcgggagt tcgccctgcg cgacccggcc 5220ggcaactgcg tgcacttcgt
ggccgaggag caggactgac acgtgctacg agatttcgat 5280tccaccgccg ccttctatga
aaggttgggc ttcggaatcg ttttccggga cgccggctgg 5340atgatcctcc agcgcgggga
tctcatgctg gagttcttcg cccaccccaa cttgtttatt 5400gcagcttata atggttacaa
ataaagcaat agcatcacaa atttcacaaa taaagcattt 5460ttttcactgc attctagttg
tggtttgtcc aaactcatca atgtatctta tcatgtctgt 5520ataccgtcga cctctagcta
gagcttggcg taatcatggt catagctgtt tcctgtgtga 5580aattgttatc cgctcacaat
tccacacaac atacgagccg gaagcataaa gtgtaaagcc 5640tggggtgcct aatgagtgag
ctaactcaca ttaattgcgt tgcgctcact gcccgctttc 5700cagtcgggaa acctgtcgtg
ccagctgcat taatgaatcg gccaacgcgc ggggagaggc 5760ggtttgcgta ttgggcgctc
ttccgcttcc tcgctcactg actcgctgcg ctcggtcgtt 5820cggctgcggc gagcggtatc
agctcactca aaggcggtaa tacggttatc cacagaatca 5880ggggataacg caggaaagaa
catgtgagca aaaggccagc aaaaggccag gaaccgtaaa 5940aaggccgcgt tgctggcgtt
tttccatagg ctccgccccc ctgacgagca tcacaaaaat 6000cgacgctcaa gtcagaggtg
gcgaaacccg acaggactat aaagatacca ggcgtttccc 6060cctggaagct ccctcgtgcg
ctctcctgtt ccgaccctgc cgcttaccgg atacctgtcc 6120gcctttctcc cttcgggaag
cgtggcgctt tctcaatgct cacgctgtag gtatctcagt 6180tcggtgtagg tcgttcgctc
caagctgggc tgtgtgcacg aaccccccgt tcagcccgac 6240cgctgcgcct tatccggtaa
ctatcgtctt gagtccaacc cggtaagaca cgacttatcg 6300ccactggcag cagccactgg
taacaggatt agcagagcga ggtatgtagg cggtgctaca 6360gagttcttga agtggtggcc
taactacggc tacactagaa ggacagtatt tggtatctgc 6420gctctgctga agccagttac
cttcggaaaa agagttggta gctcttgatc cggcaaacaa 6480accaccgctg gtagcggtgg
tttttttgtt tgcaagcagc agattacgcg cagaaaaaaa 6540ggatctcaag aagatccttt
gatcttttct acggggtctg acgctcagtg gaacgaaaac 6600tcacgttaag ggattttggt
catgagatta tcaaaaagga tcttcaccta gatcctttta 6660aattaaaaat gaagttttaa
atcaatctaa agtatatatg agtaaacttg gtctgacagt 6720taccaatgct taatcagtga
ggcacctatc tcagcgatct gtctatttcg ttcatccata 6780gttgcctgac tccccgtcgt
gtagataact acgatacggg agggcttacc atctggcccc 6840agtgctgcaa tgataccgcg
agacccacgc tcaccggctc cagatttatc agcaataaac 6900cagccagccg gaagggccga
gcgcagaagt ggtcctgcaa ctttatccgc ctccatccag 6960tctattaatt gttgccggga
agctagagta agtagttcgc cagttaatag tttgcgcaac 7020gttgttgcca ttgctacagg
catcgtggtg tcacgctcgt cgtttggtat ggcttcattc 7080agctccggtt cccaacgatc
aaggcgagtt acatgatccc ccatgttgtg caaaaaagcg 7140gttagctcct tcggtcctcc
gatcgttgtc agaagtaagt tggccgcagt gttatcactc 7200atggttatgg cagcactgca
taattctctt actgtcatgc catccgtaag atgcttttct 7260gtgactggtg agtactcaac
caagtcattc tgagaatagt gtatgcggcg accgagttgc 7320tcttgcccgg cgtcaatacg
ggataatacc gcgccacata gcagaacttt aaaagtgctc 7380atcattggaa aacgttcttc
ggggcgaaaa ctctcaagga tcttaccgct gttgagatcc 7440agttcgatgt aacccactcg
tgcacccaac tgatcttcag catcttttac tttcaccagc 7500gtttctgggt gagcaaaaac
aggaaggcaa aatgccgcaa aaaagggaat aagggcgaca 7560cggaaatgtt gaatactcat
actcttcctt tttcaatatt attgaagcat ttatcagggt 7620tattgtctca tgagcggata
catatttgaa tgtatttaga aaaataaaca aataggggtt 7680ccgcgcacat ttccccgaaa
agtgccacct gacgtcgacg gatcgggaga tctcccgatc 7740ccctatggtg cactctcagt
acaatctgct ctgatgccgc atagttaagc cagtatctgc 7800tccctgcttg tgtgttggag
gtcgctgagt agtgcgcgag caaaatttaa gctacaacaa 7860ggcaaggctt gaccgacaat
tgcatgaaga atctgcttag ggttaggcgt tttgcgctgc 7920ttcgcgatgt acgggccaga
tatacgcgtt gacattgatt attgactagt tattaatagt 7980aatcaattac ggggtcatta
gttcatagcc catatatgga gttccgcgtt acataactta 8040cggtaaatgg cccgcctggc
tgaccgccca acgacccccg cccattgacg tcaataatga 8100cgtatgttcc catagtaacg
ccaataggga ctttccattg acgtcaatgg gtggactatt 8160tacggtaaac tgcccacttg
gcagtacatc aagtgtatca tatgccaagt acgcccccta 8220ttgacgtcaa tgacggtaaa
tggcccgcct ggcattatgc ccagtacatg accttatggg 8280actttcctac ttggcagtac
atctacgtat tagtcatcgc tattaccatg gtgatgcggt 8340tttggcagta catcaatggg
cgtggatagc ggtttgactc acggggattt ccaagtctcc 8400accccattga cgtcaatggg
agtttgtttt ggcaccaaaa tcaacgggac tttccaaaat 8460gtcgtaacaa ctccgcccca
ttgacgcaaa tgggcggtag gcgtgtacgg tgggaggtct 8520atataagcag agctctctgg
ctaactagag aacccactgc ttactggctt atcgaaatta 8580atacgactca ctatagggag
acccaagctg gtttaaactt aagcttggta ccgagctcac 8640tagtccagtg tggtggcaga
tatccagcac agtggcggcc gctcgagggg cccgttttgc 8700ctgtactggg tctctctggt
tagaccagat ctgagcctgg gagctctctg gctaactagg 8760gaacccactg cttaagcctc
aataaagctt gccttgagtg cttcaagtag tgtgtgcccg 8820tctgttgtgt gactctggta
actagagatc cctcagaccc ttttagtcag tgtggaaaat 8880ctctagcagt ggcgcccgaa
cagggacttg aaagcgaaag ggaaaccaga ggagctctct 8940cgacgcagga ctcggcttgc
tgaagcgcgc acggcaagag gcgaggggcg gcgactggtg 9000agtacgccaa aaattttgac
tagcggaggc tagaaggaga gagatgggtg cgagagcgtc 9060agtattaagc gggggagaat
tagatcgcga tgggaaaaaa ttcggttaag gccaggggga 9120aagaaaaaat ataaattaaa
acatatagta tgggcaagca gggagctaga acgattcgca 9180gttaatcctg gcctgttaga
aacatcagaa ggctgtagac aaatactggg acagctacaa 9240ccatcccttc agacaggatc
agaagaactt agatcattat ataatacagt agcaaccctc 9300tattgtgtgc atcaaaggat
agagataaaa gacaccaagg aagctttaga caagatagag 9360gaagagcaaa acaaaagtaa
gaccaccgca cagcaagcgg ccgctgatct tcagacctgg 9420aggaggagat atgagggaca
attggagaag tgaattatat aaatataaag tagtaaaaat 9480tgaaccatta ggagtagcac
ccaccaaggc aaagagaaga gtggtgcaga gagaaaaaag 9540agcagtggga ataggagctt
tgttccttgg gttcttggga gcagcaggaa gcactatggg 9600cgcagcgtca atgacgctga
cggtacaggc cagacaatta ttgtctggta tagtgcagca 9660gcagaacaat ttgctgaggg
ctattgaggc gcaacagcat ctgttgcaac tcacagtctg 9720gggcatcaag cagctccagg
caagaatcct ggctgtggaa agatacctaa aggatcaaca 9780gctcctgggg atttggggtt
gctctggaaa actcatttgc accactgctg tgccttggaa 9840tgctagttgg agtaataaat
ctctggaaca gatttggaat cacacgacct ggatggagtg 9900ggacagagaa attaacaatt
acacaagctt aatacactcc ttaattgaag aatcgcaaaa 9960ccagcaagaa aagaatgaac
aagaattatt ggaattagat aaatgggcaa gtttgtggaa 10020ttggtttaac ataacaaatt
ggctgtggta tataaaatta ttcataatga tagtaggagg 10080cttggtaggt ttaagaatag
tttttgctgt actttctata gtgaatagag ttaggcaggg 10140atattcacca ttatcgtttc
agacccacct cccaaccccg aggggacccg acaggccctt 10200aattaatccc ctgattctgt
ggataaccgt attaccgcct ttgagtgagc tgcacctagt 10260acggattaga agccgccgag
cgggtgacag ccctccgaag gaagactctc ctccgtgcgt 10320cctcgtcttc accggtcgcg
ttcctgaaac gcagatgtgc ctcgcgccgc actgctccga 10380acaatgtcga ctctagaggt
aaactcgacc tatataagca gagctcgttt agtgaaccgt 10440cagatcgcct ggagacgcca
tccacgctgt tttgacctcc atagaagaca ccgggaccga 10500tccagcctcc gcggccccga
attgaattcg aagcaggagt agacgccggc cattacggcc 10560tgccaccgtt aat
10573

SEQ ID NO: 14
Length: 11022
Type: DNA
Organism: Artificial Sequence
Other information: Synthetic

cgcgatcaca tggtcctgct ggagttcgtg accgccgccg
ggatcactct cggcatggac 60gagctgtaca agtaagcggc cgcaatcaac ctctggatta
caaaatttgt gaaagattga 120ctggtattct taactatgtt gctcctttta cgctatgtgg
atacgctgct ttaatgcctt 180tgtatcatgc tattacttcc cgtacggctt tcattttctc
ctccttgtat aaatcctggt 240tgctgtctct ttatgaggag ttgtggcccg ttgtcaggca
acgtggcgtg gtgtgcactg 300tgtttgctga cgcaaccccc actggttggg gcattgccac
cacctatcaa ctcctttccg 360ggactttcgc tttccccctc cctattgcca cggcggaact
cattgccgcc tgccttgccc 420gctgctggac aggggctcgg ctgttgggca ctgacaattc
cgtggtgttg tcggggaagc 480tgacgtcctt tccatggctg ctcgcctgtg ttgccaactg
gattctgcgc gggacgtcct 540tctgctacgt cccttcggcc ctcaatccag cggaccttcc
ttcccgcggc ctgctgccgg 600ttctgcggcc tcttccgcgt cttcgccttc gccctcagac
gagtcggatc tccctttggg 660ccgcctcccc gcctgcctgc aggtttgtcg agacctagaa
aaacatggag caatcacaag 720tagcaataca gcagctacca atgctgattg tgcctggcta
gaagcacaag aggaggagga 780ggtgggtttt ccagtcacac ctcaggtacc tttaagacca
atgacttaca aggcagctgt 840agatcttagc cactttttaa aagaaaaggg gggactggaa
gggctaattc actcccaacg 900aagacaagat ctgctttttg cttgtactgg gtctctctgg
ttagaccaga tctgagcctg 960ggagctctct ggctaactag ggaacccact gcttaagcct
caataaagct tgccttgagt 1020gcttcaagta gtgtgtgccc gtctgttgtg tgactctggt
aactagagat ccctcagacc 1080cttttagtca gtgtggaaaa tctctagcag ggcccgttta
aacccgctga tcagcctcga 1140ctgtgccttc tagttgccag ccatctgttg tttgcccctc
ccccgtgcct tccttgaccc 1200tggaaggtgc cactcccact gtcctttcct aataaaatga
ggaaattgca tcgcattgtc 1260tgagtaggtg tcattctatt ctggggggtg gggtggggca
ggacagcaag ggggaggatt 1320gggaagacaa tagcaggcat gctggggatg cggtgggctc
tatggcttct gaggcggaaa 1380gaaccagctg gggctctagg gggtatcccc acgcgccctg
tagcggcgca ttaagcgcgg 1440cgggtgtggt ggttacgcgc agcgtgaccg ctacacttgc
cagcgcccta gcgcccgctc 1500ctttcgcttt cttcccttcc tttctcgcca cgttcgccgg
ctttccccgt caagctctaa 1560atcggggcat ccctttaggg ttccgattta gtgctttacg
gcacctcgac cccaaaaaac 1620ttgattaggg tgatggttca cgtagtgggc catcgccctg
atagacggtt tttcgccctt 1680tgacgttgga gtccacgttc tttaatagtg gactcttgtt
ccaaactgga acaacactca 1740accctatctc ggtctattct tttgatttat aagggatttt
ggggatttcg gcctattggt 1800taaaaaatga gctgatttaa caaaaattta acgcgaatta
attctgtgga atgtgtgtca 1860gttagggtgt ggaaagtccc caggctcccc aggcaggcag
aagtatgcaa agcatgcatc 1920tcaattagtc agcaaccagg tgtggaaagt ccccaggctc
cccagcaggc agaagtatgc 1980aaagcatgca tctcaattag tcagcaacca tagtcccgcc
cctaactccg cccatcccgc 2040ccctaactcc gcccagttcc gcccattctc cgccccatgg
ctgactaatt ttttttattt 2100atgcagaggc cgaggccgcc tctgcctctg agctattcca
gaagtagtga ggaggctttt 2160ttggaggcct aggcttttgc aaaaagctcc cgggagcttg
tatatccatt ttcggatctg 2220atcagcacgt gttgacaatt aatcatcggc atagtatatc
ggcatagtat aatacgacaa 2280ggtgaggaac taaaccatgg ccaagttgac cagtgccgtt
ccggtgctca ccgcgcgcga 2340cgtcgccgga gcggtcgagt tctggaccga ccggctcggg
ttctcccggg acttcgtgga 2400ggacgacttc gccggtgtgg tccgggacga cgtgaccctg
ttcatcagcg cggtccagga 2460ccaggtggtg ccggacaaca ccctggcctg ggtgtgggtg
cgcggcctgg acgagctgta 2520cgccgagtgg tcggaggtcg tgtccacgaa cttccgggac
gcctccgggc cggccatgac 2580cgagatcggc gagcagccgt gggggcggga gttcgccctg
cgcgacccgg ccggcaactg 2640cgtgcacttc gtggccgagg agcaggactg acacgtgcta
cgagatttcg attccaccgc 2700cgccttctat gaaaggttgg gcttcggaat cgttttccgg
gacgccggct ggatgatcct 2760ccagcgcggg gatctcatgc tggagttctt cgcccacccc
aacttgttta ttgcagctta 2820taatggttac aaataaagca atagcatcac aaatttcaca
aataaagcat ttttttcact 2880gcattctagt tgtggtttgt ccaaactcat caatgtatct
tatcatgtct gtataccgtc 2940gacctctagc tagagcttgg cgtaatcatg gtcatagctg
tttcctgtgt gaaattgtta 3000tccgctcaca attccacaca acatacgagc cggaagcata
aagtgtaaag cctggggtgc 3060ctaatgagtg agctaactca cattaattgc gttgcgctca
ctgcccgctt tccagtcggg 3120aaacctgtcg tgccagctgc attaatgaat cggccaacgc
gcggggagag gcggtttgcg 3180tattgggcgc tcttccgctt cctcgctcac tgactcgctg
cgctcggtcg ttcggctgcg 3240gcgagcggta tcagctcact caaaggcggt aatacggtta
tccacagaat caggggataa 3300cgcaggaaag aacatgtgag caaaaggcca gcaaaaggcc
aggaaccgta aaaaggccgc 3360gttgctggcg tttttccata ggctccgccc ccctgacgag
catcacaaaa atcgacgctc 3420aagtcagagg tggcgaaacc cgacaggact ataaagatac
caggcgtttc cccctggaag 3480ctccctcgtg cgctctcctg ttccgaccct gccgcttacc
ggatacctgt ccgcctttct 3540cccttcggga agcgtggcgc tttctcaatg ctcacgctgt
aggtatctca gttcggtgta 3600ggtcgttcgc tccaagctgg gctgtgtgca cgaacccccc
gttcagcccg accgctgcgc 3660cttatccggt aactatcgtc ttgagtccaa cccggtaaga
cacgacttat cgccactggc 3720agcagccact ggtaacagga ttagcagagc gaggtatgta
ggcggtgcta cagagttctt 3780gaagtggtgg cctaactacg gctacactag aaggacagta
tttggtatct gcgctctgct 3840gaagccagtt accttcggaa aaagagttgg tagctcttga
tccggcaaac aaaccaccgc 3900tggtagcggt ggtttttttg tttgcaagca gcagattacg
cgcagaaaaa aaggatctca 3960agaagatcct ttgatctttt ctacggggtc tgacgctcag
tggaacgaaa actcacgtta 4020agggattttg gtcatgagat tatcaaaaag gatcttcacc
tagatccttt taaattaaaa 4080atgaagtttt aaatcaatct aaagtatata tgagtaaact
tggtctgaca gttaccaatg 4140cttaatcagt gaggcaccta tctcagcgat ctgtctattt
cgttcatcca tagttgcctg 4200actccccgtc gtgtagataa ctacgatacg ggagggctta
ccatctggcc ccagtgctgc 4260aatgataccg cgagacccac gctcaccggc tccagattta
tcagcaataa accagccagc 4320cggaagggcc gagcgcagaa gtggtcctgc aactttatcc
gcctccatcc agtctattaa 4380ttgttgccgg gaagctagag taagtagttc gccagttaat
agtttgcgca acgttgttgc 4440cattgctaca ggcatcgtgg tgtcacgctc gtcgtttggt
atggcttcat tcagctccgg 4500ttcccaacga tcaaggcgag ttacatgatc ccccatgttg
tgcaaaaaag cggttagctc 4560cttcggtcct ccgatcgttg tcagaagtaa gttggccgca
gtgttatcac tcatggttat 4620ggcagcactg cataattctc ttactgtcat gccatccgta
agatgctttt ctgtgactgg 4680tgagtactca accaagtcat tctgagaata gtgtatgcgg
cgaccgagtt gctcttgccc 4740ggcgtcaata cgggataata ccgcgccaca tagcagaact
ttaaaagtgc tcatcattgg 4800aaaacgttct tcggggcgaa aactctcaag gatcttaccg
ctgttgagat ccagttcgat 4860gtaacccact cgtgcaccca actgatcttc agcatctttt
actttcacca gcgtttctgg 4920gtgagcaaaa acaggaaggc aaaatgccgc aaaaaaggga
ataagggcga cacggaaatg 4980ttgaatactc atactcttcc tttttcaata ttattgaagc
atttatcagg gttattgtct 5040catgagcgga tacatatttg aatgtattta gaaaaataaa
caaatagggg ttccgcgcac 5100atttccccga aaagtgccac ctgacgtcga cggatcggga
gatctcccga tcccctatgg 5160tgcactctca gtacaatctg ctctgatgcc gcatagttaa
gccagtatct gctccctgct 5220tgtgtgttgg aggtcgctga gtagtgcgcg agcaaaattt
aagctacaac aaggcaaggc 5280ttgaccgaca attgcatgaa gaatctgctt agggttaggc
gttttgcgct gcttcgcgat 5340gtacgggcca gatatacgcg ttgacattga ttattgacta
gttattaata gtaatcaatt 5400acggggtcat tagttcatag cccatatatg gagttccgcg
ttacataact tacggtaaat 5460ggcccgcctg gctgaccgcc caacgacccc cgcccattga
cgtcaataat gacgtatgtt 5520cccatagtaa cgccaatagg gactttccat tgacgtcaat
gggtggacta tttacggtaa 5580actgcccact tggcagtaca tcaagtgtat catatgccaa
gtacgccccc tattgacgtc 5640aatgacggta aatggcccgc ctggcattat gcccagtaca
tgaccttatg ggactttcct 5700acttggcagt acatctacgt attagtcatc gctattacca
tggtgatgcg gttttggcag 5760tacatcaatg ggcgtggata gcggtttgac tcacggggat
ttccaagtct ccaccccatt 5820gacgtcaatg ggagtttgtt ttggcaccaa aatcaacggg
actttccaaa atgtcgtaac 5880aactccgccc cattgacgca aatgggcggt aggcgtgtac
ggtgggaggt ctatataagc 5940agagctctct ggctaactag agaacccact gcttactggc
ttatcgaaat taatacgact 6000cactataggg agacccaagc tggtttaaac ttaagcttgg
taccgagctc actagtccag 6060tgtggtggca gatatccagc acagtggcgg ccgctcgagt
ctagagggcc cgttttgcct 6120gtactgggtc tctctggtta gaccagatct gagcctggga
gctctctggc taactaggga 6180acccactgct taagcctcaa taaagcttgc cttgagtgct
tcaagtagtg tgtgcccgtc 6240tgttgtgtga ctctggtaac tagagatccc tcagaccctt
ttagtcagtg tggaaaatct 6300ctagcagtgg cgcccgaaca gggacttgaa agcgaaaggg
aaaccagagg agctctctcg 6360acgcaggact cggcttgctg aagcgcgcac ggcaagaggc
gaggggcggc gactggtgag 6420tacgccaaaa attttgacta gcggaggcta gaaggagaga
gatgggtgcg agagcgtcag 6480tattaagcgg gggagaatta gatcgcgatg ggaaaaaatt
cggttaaggc cagggggaaa 6540gaaaaaatat aaattaaaac atatagtatg ggcaagcagg
gagctagaac gattcgcagt 6600taatcctggc ctgttagaaa catcagaagg ctgtagacaa
atactgggac agctacaacc 6660atcccttcag acaggatcag aagaacttag atcattatat
aatacagtag caaccctcta 6720ttgtgtgcat caaaggatag agataaaaga caccaaggaa
gctttagaca agatagagga 6780agagcaaaac aaaagtaaga ccaccgcaca gcaagcggcc
gctgatcttc agacctggag 6840gaggagatat gagggacaat tggagaagtg aattatataa
atataaagta gtaaaaattg 6900aaccattagg agtagcaccc accaaggcaa agagaagagt
ggtgcagaga gaaaaaagag 6960cagtgggaat aggagctttg ttccttgggt tcttgggagc
agcaggaagc actatgggcg 7020cagcgtcaat gacgctgacg gtacaggcca gacaattatt
gtctggtata gtgcagcagc 7080agaacaattt gctgagggct attgaggcgc aacagcatct
gttgcaactc acagtctggg 7140gcatcaagca gctccaggca agaatcctgg ctgtggaaag
atacctaaag gatcaacagc 7200tcctggggat ttggggttgc tctggaaaac tcatttgcac
cactgctgtg ccttggaatg 7260ctagttggag taataaatct ctggaacaga tttggaatca
cacgacctgg atggagtggg 7320acagagaaat taacaattac acaagcttaa tacactcctt
aattgaagaa tcgcaaaacc 7380agcaagaaaa gaatgaacaa gaattattgg aattagataa
atgggcaagt ttgtggaatt 7440ggtttaacat aacaaattgg ctgtggtata tagaaattat
tcataatgat agtaggaggc 7500ttggtaggtt taagaatagt ttttgctgta ctttctatag
tgaatagagt taggcaggga 7560tattcaccat tatcgtttca gacccacctc ccaaccccga
ggggacccga caggcccgaa 7620ggaatagaag aagaaggtgg agagagagac agagacagat
ccattcgatt agtgaacgga 7680tcggcactgc gtgcgccaat tctgcagaca aatggcagta
ttcatccaca attttaaaag 7740aaaagggggg attggggggt acagtgcagg ggaaagaata
gtagacataa tagcaacaga 7800catacaaact aaagaattac aaaaacaaat tacaaaaatt
caaaattttc gggtttatta 7860cagggacagc agagatccag tttggggttg ctctggaaaa
ctcatttgca ccactgctgt 7920gccttggaat gctagttgga gtaataaatc tctggaacag
atttggaatc acacgacctg 7980gatggagtgg gacagagaaa ttaacaatta cacaagctta
atacactcct taattgaaga 8040atcgcaaaac cagcaagaaa agaatgaaca agaattattg
gaattagata aatgggcaag 8100tttgtggaat tggtttaaca taacaaattg gctgtggtat
ataaaattat tcataatgat 8160agtaggaggc ttggtaggtt taagaatagt ttttgctgta
ctttctatag tgaatagagt 8220taggcaggga tattcaccat tatcgtttca gacccacctc
ccaaccccga ggggacccga 8280caggccctta attaattggc tccggtgccc gtcagtgggc
agagcgcaca tcgcccacag 8340tccccgagaa gttgggggga ggggtcggca attgaaccgg
tgcctagaga aggtggcgcg 8400gggtaaactg ggaaagtgat gtcgtgtact ggctccgcct
ttttcccgag ggtgggggag 8460aaccgtatat aagtgcagta gtcgccgtga agctagctcc
ctatcagtga tagagatctc 8520cctatcagtg atagagagct agccgttctt tttcgcaacg
ggtttgccgc cagaacacag 8580gtaagtgccg tgtgtggttc ccgcgggcct ggcctcttta
cgggttatgg cccttgcgtg 8640ccttgaatta cttccacctg gctgcagtac gtgattcttg
atcccgagct tcgggttgga 8700agtgggtggg agagttcgag gccttgcgct taaggagccc
cttcgcctcg tgcttgagtt 8760gaggcctggc ctgggcgctg gggccgccgc gtgcgaatct
ggtggcacct tcgcgcctgt 8820ctcgctgctt tcgataagtc tctagccatt taaaattttt
gatgacctgc tgcgacgctt 8880tttttctggc aagatagtct tgtaaatgcg ggccaagatc
tgcacactgg tatttcggtt 8940tttggggccg cgggcggcga cggggcccgt gcgtcccagc
gcacatgttc ggcgaggcgg 9000ggcctgcgag cgcggccacc gagaatcgga cgggggtagt
ctcaagctgg ccggcctgct 9060ctggtgcctg gcctcgcgcc gccgtgtatc gccccgccct
gggcggcaag gctggcccgg 9120tcggcaccag ttgcgtgagc ggaaagatgg ccgcttcccg
gccctgctgc agggagctca 9180aaatggagga cgcggcgctc gggagagcgg gcgggtgagt
cacccacaca aaggaaaagg 9240gcctttccgt cctcagccgt cgcttcatgt gactccacgg
agtaccgggc gccgtccagg 9300cacctcgatt agttctcgag cttttggagt acgtcgtctt
taggttgggg ggaggggttt 9360tatgcgatgg agtttcccca cactgagtgg gtggagactg
aagttaggcc agcttggcac 9420ttgatgtaat tctccttgga atttgccctt tttgagtttg
gatcttggtt cattctcaag 9480cctcagacag tggttcaaag tttttttctt ccatttcagg
tgaattcggc cattacggcc 9540tcccaccatg aaaccagtaa cgttatacga tgtcgcagag
tatgccggtg tctcttatca 9600gaccgtttcc cgcgtggtga accaggccag ccacgtttct
gcgaaaacgc gggaaaaagt 9660ggaagcggcg atggcggagc tgaattacat tcccaaccgc
gtggcacaac aactggcggg 9720caaacagtcg ttgctgattg gcgttgccac ctccagtctg
gccctgcacg cgccgtcgca 9780aattgtcgcg gcgattaaat ctcgcgccga tcaactgggt
gccagcgtgg tggtgtcgat 9840ggtagaacga agcggcgtcg aagcctgtaa agcggcggtg
cacaatcttc tcgcgcaacg 9900cgtcagtggg ctgatcatta actatccgct ggatgaccag
gatgccattg ctgtggaagc 9960tgcctgcact aatgttccgg cgttatttct tgatgtctct
gaccagacac ccatcaacag 10020tattattttc tcccatgaag acggtacgcg actgggcgtg
gagcatctgg tcgcattggg 10080tcaccagcaa atcgcgctgt tagcgggccc attaagttct
gtctcggcgc gtctgcgtct 10140ggctggctgg cataaatatc tcactcgcaa tcaaattcag
ccgatagcgg aacgggaagg 10200cgactggagt gccatgtccg gttttcaaca aaccatgcaa
atgctgaatg agggcatcgt 10260tcccactgcg atgctggttg ccaacgatca gatggcgctg
ggcgcaatgc gcgccattac 10320cgagtccggg ctgcgcgttg gtgcggatat ctcggtagtg
ggatacgacg ataccgaaga 10380cagctcatgt tatatcccgc cgttaaccac catcaaacag
gattttcgcc tgctggggca 10440aaccagcgtg gaccgcttgc tgcaactctc tcagggccag
gcggtgaagg gcaatcagct 10500gttgcccgtc tcactggtga aaagaaaaac caccctggcg
cccaatacgc aaaccgcctc 10560tccccgcgcg ttggccgatt cattaatgca gctggcacga
caggtttccc gactggaaag 10620cgggcagcca aaaaagaaga gaaaggtcga cggcggtggt
gctttgtctc ctcagcactc 10680tgctgtcact caaggaagta tcatcaagaa caaggagggc
atggatgcta agtcactaac 10740tgcctggtcc cggacactgg tgaccttcaa ggatgtattt
gtggacttca ccagggagga 10800gtggaagctg ctggacactg ctcagcagat cgtgtacaga
aatgtgatgc tggagaacta 10860taagaacctg gtttccttgg gttatcagct tactaagcca
gatgtgatcc tccggttgga 10920gaagggagaa gagccctggc tggtggagag agaaattcac
caagagaccc atcctgattc 10980agagactgca tttgaaatca aatcatcagt ttaaggccgc
ct 11022

SEQ ID NO: 15 (length 9123; DNA; Artificial Sequence; Synthetic)
ttattgccac catgagcccg aaacgccgca cccaggcgga acgcgcgatg gaaacccagg
60gcaaactgat tgcggcggcg ctgggcgtgc tgcgcgaaaa aggctatgcg ggctttcgca
120ttgcggatgt gccgggcgcg gcgggcgtga gccgcggcgc gcagagccat cattttccga
180ccaaactgga actgctgctg gcgacctttg aatggctgta tgaacagatt accgaacgca
240gccgcgcgcg cctggcgaaa ctgaaaccgg aagatgatgt gattcagcag atgctggatg
300atgcggcgga attttttctg gatgatgatt ttagcattag cctggatctg attgtggcgg
360cggatcgcga tccggcgctg cgcgaaggca ttcagcgcac cgtggaacgc aaccgctttg
420tggtggaaga tatgtggctg ggcgtgctgg tgagccgcgg cctgagccgc gatgatgcgg
480aagatattct gtggctgatt tttaacagcg tgcgcggcct ggcggtgcgc agcctgtggc
540agaaagataa agaacgcttt gaacgcgtgc gcaacagcac cctggaaatt gcgcgcgaac
600gctatgcgaa atttaaacgc gcgtacagcc gcgcgcgtac gaaaaacaat tacgggtcta
660ccatcgaggg cctgctcgat ctcccggacg acgacgcccc cgaagaggcg gggctggcgg
720ctccgcgcct gtcctttctc cccgcgggac acacgcgcag actgtcgacg gcccccccga
780ccgatgtcag cctgggggac gagctccact tagacggcga ggacgtggcg atggcgcatg
840ccgacgcgct agacgatttc gatctggaca tgttggggga cggggattcc ccgggtccgg
900gatttacccc ccacgactcc gccccctacg gcgctctgga tatggccgac ttcgagtttg
960agcagatgtt taccgatgcc cttggaattg acgagtacgg tgggtaataa gtgtgggagg
1020gctaagggcg cgccgttcta gagaattcga tatcaagctt atcgataatc aacctctgga
1080ttacaaaatt tgtgaaagat tgactggtat tcttaactat gttgctcctt ttacgctatg
1140tggatacgct gctttaatgc ctttgtatca tgctattact tcccgtacgg ctttcatttt
1200ctcctccttg tataaatcct ggttgctgtc tctttatgag gagttgtggc ccgttgtcag
1260gcaacgtggc gtggtgtgca ctgtgtttgc tgacgcaacc cccactggtt ggggcattgc
1320caccacctat caactccttt ccgggacttt cgctttcccc ctccctattg ccacggcgga
1380actcattgcc gcctgccttg cccgctgctg gacaggggct cggctgttgg gcactgacaa
1440ttccgtggtg ttgtcgggga agctgacgtc ctttccatgg ctgctcgcct gtgttgccaa
1500ctggattctg cgcgggacgt ccttctgcta cgtcccttcg gccctcaatc cagcggacct
1560tccttcccgc ggcctgctgc cggttctgcg gcctcttccg cgtcttcgcc ttcgccctca
1620gacgagtcgg atctcccttt gggccgcctc cccgcctgcc tgcaggtttg tcgagaccta
1680gaaaaacatg gagcaatcac aagtagcaat acagcagcta ccaatgctga ttgtgcctgg
1740ctagaagcac aagaggagga ggaggtgggt tttccagtca cacctcaggt acctttaaga
1800ccaatgactt acaaggcagc tgtagatctt agccactttt taaaagaaaa ggggggactg
1860gaagggctaa ttcactccca acgaagacaa gatctgcttt ttgcttgtac tgggtctctc
1920tggttagacc agatctgagc ctgggagctc tctggctaac tagggaaccc actgcttaag
1980cctcaataaa gcttgccttg agtgcttcaa gtagtgtgtg cccgtctgtt gtgtgactct
2040ggtaactaga gatccctcag acccttttag tcagtgtgga aaatctctag cagggcccgt
2100ttaaacccgc tgatcagcct cgactgtgcc ttctagttgc cagccatctg ttgtttgccc
2160ctcccccgtg ccttccttga ccctggaagg tgccactccc actgtccttt cctaataaaa
2220tgaggaaatt gcatcgcatt gtctgagtag gtgtcattct attctggggg gtggggtggg
2280gcaggacagc aagggggagg attgggaaga caatagcagg catgctgggg atgcggtggg
2340ctctatggct tctgaggcgg aaagaaccag ctggggctct agggggtatc cccacgcgcc
2400ctgtagcggc gcattaagcg cggcgggtgt ggtggttacg cgcagcgtga ccgctacact
2460tgccagcgcc ctagcgcccg ctcctttcgc tttcttccct tcctttctcg ccacgttcgc
2520cggctttccc cgtcaagctc taaatcgggg catcccttta gggttccgat ttagtgcttt
2580acggcacctc gaccccaaaa aacttgatta gggtgatggt tcacgtagtg ggccatcgcc
2640ctgatagacg gtttttcgcc ctttgacgtt ggagtccacg ttctttaata gtggactctt
2700gttccaaact ggaacaacac tcaaccctat ctcggtctat tcttttgatt tataagggat
2760tttggggatt tcggcctatt ggttaaaaaa tgagctgatt taacaaaaat ttaacgcgaa
2820ttaattctgt ggaatgtgtg tcagttaggg tgtggaaagt ccccaggctc cccaggcagg
2880cagaagtatg caaagcatgc atctcaatta gtcagcaacc aggtgtggaa agtccccagg
2940ctccccagca ggcagaagta tgcaaagcat gcatctcaat tagtcagcaa ccatagtccc
3000gcccctaact ccgcccatcc cgcccctaac tccgcccagt tccgcccatt ctccgcccca
3060tggctgacta atttttttta tttatgcaga ggccgaggcc gcctctgcct ctgagctatt
3120ccagaagtag tgaggaggct tttttggagg cctaggcttt tgcaaaaagc tcccgggagc
3180ttgtatatcc attttcggat ctgatcagca cgtgttgaca attaatcatc ggcatagtat
3240atcggcatag tataatacga caaggtgagg aactaaacca tggccaagtt gaccagtgcc
3300gttccggtgc tcaccgcgcg cgacgtcgcc ggagcggtcg agttctggac cgaccggctc
3360gggttctccc gggacttcgt ggaggacgac ttcgccggtg tggtccggga cgacgtgacc
3420ctgttcatca gcgcggtcca ggaccaggtg gtgccggaca acaccctggc ctgggtgtgg
3480gtgcgcggcc tggacgagct gtacgccgag tggtcggagg tcgtgtccac gaacttccgg
3540gacgcctccg ggccggccat gaccgagatc ggcgagcagc cgtgggggcg ggagttcgcc
3600ctgcgcgacc cggccggcaa ctgcgtgcac ttcgtggccg aggagcagga ctgacacgtg
3660ctacgagatt tcgattccac cgccgccttc tatgaaaggt tgggcttcgg aatcgttttc
3720cgggacgccg gctggatgat cctccagcgc ggggatctca tgctggagtt cttcgcccac
3780cccaacttgt ttattgcagc ttataatggt tacaaataaa gcaatagcat cacaaatttc
3840acaaataaag catttttttc actgcattct agttgtggtt tgtccaaact catcaatgta
3900tcttatcatg tctgtatacc gtcgacctct agctagagct tggcgtaatc atggtcatag
3960ctgtttcctg tgtgaaattg ttatccgctc acaattccac acaacatacg agccggaagc
4020ataaagtgta aagcctgggg tgcctaatga gtgagctaac tcacattaat tgcgttgcgc
4080tcactgcccg ctttccagtc gggaaacctg tcgtgccagc tgcattaatg aatcggccaa
4140cgcgcgggga gaggcggttt gcgtattggg cgctcttccg cttcctcgct cactgactcg
4200ctgcgctcgg tcgttcggct gcggcgagcg gtatcagctc actcaaaggc ggtaatacgg
4260ttatccacag aatcagggga taacgcagga aagaacatgt gagcaaaagg ccagcaaaag
4320gccaggaacc gtaaaaaggc cgcgttgctg gcgtttttcc ataggctccg cccccctgac
4380gagcatcaca aaaatcgacg ctcaagtcag aggtggcgaa acccgacagg actataaaga
4440taccaggcgt ttccccctgg aagctccctc gtgcgctctc ctgttccgac cctgccgctt
4500accggatacc tgtccgcctt tctcccttcg ggaagcgtgg cgctttctca atgctcacgc
4560tgtaggtatc tcagttcggt gtaggtcgtt cgctccaagc tgggctgtgt gcacgaaccc
4620cccgttcagc ccgaccgctg cgccttatcc ggtaactatc gtcttgagtc caacccggta
4680agacacgact tatcgccact ggcagcagcc actggtaaca ggattagcag agcgaggtat
4740gtaggcggtg ctacagagtt cttgaagtgg tggcctaact acggctacac tagaaggaca
4800gtatttggta tctgcgctct gctgaagcca gttaccttcg gaaaaagagt tggtagctct
4860tgatccggca aacaaaccac cgctggtagc ggtggttttt ttgtttgcaa gcagcagatt
4920acgcgcagaa aaaaaggatc tcaagaagat cctttgatct tttctacggg gtctgacgct
4980cagtggaacg aaaactcacg ttaagggatt ttggtcatga gattatcaaa aaggatcttc
5040acctagatcc ttttaaatta aaaatgaagt tttaaatcaa tctaaagtat atatgagtaa
5100acttggtctg acagttacca atgcttaatc agtgaggcac ctatctcagc gatctgtcta
5160tttcgttcat ccatagttgc ctgactcccc gtcgtgtaga taactacgat acgggagggc
5220ttaccatctg gccccagtgc tgcaatgata ccgcgagacc cacgctcacc ggctccagat
5280ttatcagcaa taaaccagcc agccggaagg gccgagcgca gaagtggtcc tgcaacttta
5340tccgcctcca tccagtctat taattgttgc cgggaagcta gagtaagtag ttcgccagtt
5400aatagtttgc gcaacgttgt tgccattgct acaggcatcg tggtgtcacg ctcgtcgttt
5460ggtatggctt cattcagctc cggttcccaa cgatcaaggc gagttacatg atcccccatg
5520ttgtgcaaaa aagcggttag ctccttcggt cctccgatcg ttgtcagaag taagttggcc
5580gcagtgttat cactcatggt tatggcagca ctgcataatt ctcttactgt catgccatcc
5640gtaagatgct tttctgtgac tggtgagtac tcaaccaagt cattctgaga atagtgtatg
5700cggcgaccga gttgctcttg cccggcgtca atacgggata ataccgcgcc acatagcaga
5760actttaaaag tgctcatcat tggaaaacgt tcttcggggc gaaaactctc aaggatctta
5820ccgctgttga gatccagttc gatgtaaccc actcgtgcac ccaactgatc ttcagcatct
5880tttactttca ccagcgtttc tgggtgagca aaaacaggaa ggcaaaatgc cgcaaaaaag
5940ggaataaggg cgacacggaa atgttgaata ctcatactct tcctttttca atattattga
6000agcatttatc agggttattg tctcatgagc ggatacatat ttgaatgtat ttagaaaaat
6060aaacaaatag gggttccgcg cacatttccc cgaaaagtgc cacctgacgt cgacggatcg
6120ggagatctcc cgatccccta tggtgcactc tcagtacaat ctgctctgat gccgcatagt
6180taagccagta tctgctccct gcttgtgtgt tggaggtcgc tgagtagtgc gcgagcaaaa
6240tttaagctac aacaaggcaa ggcttgaccg acaattgcat gaagaatctg cttagggtta
6300ggcgttttgc gctgcttcgc gatgtacggg ccagatatac gcgttgacat tgattattga
6360ctagttatta atagtaatca attacggggt cattagttca tagcccatat atggagttcc
6420gcgttacata acttacggta aatggcccgc ctggctgacc gcccaacgac ccccgcccat
6480tgacgtcaat aatgacgtat gttcccatag taacgccaat agggactttc cattgacgtc
6540aatgggtgga ctatttacgg taaactgccc acttggcagt acatcaagtg tatcatatgc
6600caagtacgcc ccctattgac gtcaatgacg gtaaatggcc cgcctggcat tatgcccagt
6660acatgacctt atgggacttt cctacttggc agtacatcta cgtattagtc atcgctatta
6720ccatggtgat gcggttttgg cagtacatca atgggcgtgg atagcggttt gactcacggg
6780gatttccaag tctccacccc attgacgtca atgggagttt gttttggcac caaaatcaac
6840gggactttcc aaaatgtcgt aacaactccg ccccattgac gcaaatgggc ggtaggcgtg
6900tacggtggga ggtctatata agcagagctc tctggctaac tagagaaccc actgcttact
6960ggcttatcga aattaatacg actcactata gggagaccca agctggttta aacttaagct
7020tggtaccgag ctcactagtc cagtgtggtg gcagatatcc agcacagtgg cggccgctcg
7080aggggcccgt tttgcctgta ctgggtctct ctggttagac cagatctgag cctgggagct
7140ctctggctaa ctagggaacc cactgcttaa gcctcaataa agcttgcctt gagtgcttca
7200agtagtgtgt gcccgtctgt tgtgtgactc tggtaactag agatccctca gaccctttta
7260gtcagtgtgg aaaatctcta gcagtggcgc ccgaacaggg acttgaaagc gaaagggaaa
7320ccagaggagc tctctcgacg caggactcgg cttgctgaag cgcgcacggc aagaggcgag
7380gggcggcgac tggtgagtac gccaaaaatt ttgactagcg gaggctagaa ggagagagat
7440gggtgcgaga gcgtcagtat taagcggggg agaattagat cgcgatggga aaaaattcgg
7500ttaaggccag ggggaaagaa aaaatataaa ttaaaacata tagtatgggc aagcagggag
7560ctagaacgat tcgcagttaa tcctggcctg ttagaaacat cagaaggctg tagacaaata
7620ctgggacagc tacaaccatc ccttcagaca ggatcagaag aacttagatc attatataat
7680acagtagcaa ccctctattg tgtgcatcaa aggatagaga taaaagacac caaggaagct
7740ttagacaaga tagaggaaga gcaaaacaaa agtaagacca ccgcacagca agcggccgct
7800gatcttcaga cctggaggag gagatatgag ggacaattgg agaagtgaat tatataaata
7860taaagtagta aaaattgaac cattaggagt agcacccacc aaggcaaaga gaagagtggt
7920gcagagagaa aaaagagcag tgggaatagg agctttgttc cttgggttct tgggagcagc
7980aggaagcact atgggcgcag cgtcaatgac gctgacggta caggccagac aattattgtc
8040tggtatagtg cagcagcaga acaatttgct gagggctatt gaggcgcaac agcatctgtt
8100gcaactcaca gtctggggca tcaagcagct ccaggcaaga atcctggctg tggaaagata
8160cctaaaggat caacagctcc tggggatttg gggttgctct ggaaaactca tttgcaccac
8220tgctgtgcct tggaatgcta gttggagtaa taaatctctg gaacagattt ggaatcacac
8280gacctggatg gagtgggaca gagaaattaa caattacaca agcttaatac actccttaat
8340tgaagaatcg caaaaccagc aagaaaagaa tgaacaagaa ttattggaat tagataaatg
8400ggcaagtttg tggaattggt ttaacataac aaattggctg tggtatataa aattattcat
8460aatgatagta ggaggcttgg taggtttaag aatagttttt gctgtacttt ctatagtgaa
8520tagagttagg cagggatatt caccattatc gtttcagacc cacctcccaa ccccgagggg
8580acccgacagg cccttaatta atcccctgat tctgtggata accgtattac cgcctttgag
8640tgagctgcac aactgccaga tttcacagga aaagtgaaag gctacaatag gacaactgcc
8700agatttcaca ggaaaagtga aaggctacaa taggacaact gccagatttc acaggaaaag
8760tgaaaggcta caataggaca actgccagat ttcacaggaa aagtgaaagg ctacaatagg
8820acaactgcca gatttcacag gaaaagtgaa aggctacaat aggacaactg ccagatttca
8880caggaaaagt gaaaggctac aataggacaa agtgaaaggc tacaatagga cggtaaactc
8940gacctatata agcagagctc gtttagtgaa ccgtcagatc gcctggagac gccatccacg
9000ctgttttgac ctccatagaa gacaccggga ccgatccagc ctccgcggcc ccgaattgaa
9060ttcgatgctg gtgttaaaaa catatgatgc tggtgttaaa aacggccatt acggcctgcc
9120acc
9123

SEQ ID NO: 16 (length 9275; DNA; Artificial Sequence; Synthetic)
ttatgtgtgg gagggctaag
ggcgcgccgt tctagagaat tcgatatcaa gcttatcgat 60aatcaacctc tggattacaa
aatttgtgaa agattgactg gtattcttaa ctatgttgct 120ccttttacgc tatgtggata
cgctgcttta atgcctttgt atcatgctat tacttcccgt 180acggctttca ttttctcctc
cttgtataaa tcctggttgc tgtctcttta tgaggagttg 240tggcccgttg tcaggcaacg
tggcgtggtg tgcactgtgt ttgctgacgc aacccccact 300ggttggggca ttgccaccac
ctatcaactc ctttccggga ctttcgcttt ccccctccct 360attgccacgg cggaactcat
tgccgcctgc cttgcccgct gctggacagg ggctcggctg 420ttgggcactg acaattccgt
ggtgttgtcg gggaagctga cgtcctttcc atggctgctc 480gcctgtgttg ccaactggat
tctgcgcggg acgtccttct gctacgtccc ttcggccctc 540aatccagcgg accttccttc
ccgcggcctg ctgccggttc tgcggcctct tccgcgtctt 600cgccttcgcc ctcagacgag
tcggatctcc ctttgggccg cctccccgcc tgcctgcagg 660tttgtcgaga cctagaaaaa
catggagcaa tcacaagtag caatacagca gctaccaatg 720ctgattgtgc ctggctagaa
gcacaagagg aggaggaggt gggttttcca gtcacacctc 780aggtaccttt aagaccaatg
acttacaagg cagctgtaga tcttagccac tttttaaaag 840aaaagggggg actggaaggg
ctaattcact cccaacgaag acaagatctg ctttttgctt 900gtactgggtc tctctggtta
gaccagatct gagcctggga gctctctggc taactaggga 960acccactgct taagcctcaa
taaagcttgc cttgagtgct tcaagtagtg tgtgcccgtc 1020tgttgtgtga ctctggtaac
tagagatccc tcagaccctt ttagtcagtg tggaaaatct 1080ctagcagggc ccgtttaaac
ccgctgatca gcctcgactg tgccttctag ttgccagcca 1140tctgttgttt gcccctcccc
cgtgccttcc ttgaccctgg aaggtgccac tcccactgtc 1200ctttcctaat aaaatgagga
aattgcatcg cattgtctga gtaggtgtca ttctattctg 1260gggggtgggg tggggcagga
cagcaagggg gaggattggg aagacaatag caggcatgct 1320ggggatgcgg tgggctctat
ggcttctgag gcggaaagaa ccagctgggg ctctaggggg 1380tatccccacg cgccctgtag
cggcgcatta agcgcggcgg gtgtggtggt tacgcgcagc 1440gtgaccgcta cacttgccag
cgccctagcg cccgctcctt tcgctttctt cccttccttt 1500ctcgccacgt tcgccggctt
tccccgtcaa gctctaaatc ggggcatccc tttagggttc 1560cgatttagtg ctttacggca
cctcgacccc aaaaaacttg attagggtga tggttcacgt 1620agtgggccat cgccctgata
gacggttttt cgccctttga cgttggagtc cacgttcttt 1680aatagtggac tcttgttcca
aactggaaca acactcaacc ctatctcggt ctattctttt 1740gatttataag ggattttggg
gatttcggcc tattggttaa aaaatgagct gatttaacaa 1800aaatttaacg cgaattaatt
ctgtggaatg tgtgtcagtt agggtgtgga aagtccccag 1860gctccccagg caggcagaag
tatgcaaagc atgcatctca attagtcagc aaccaggtgt 1920ggaaagtccc caggctcccc
agcaggcaga agtatgcaaa gcatgcatct caattagtca 1980gcaaccatag tcccgcccct
aactccgccc atcccgcccc taactccgcc cagttccgcc 2040cattctccgc cccatggctg
actaattttt tttatttatg cagaggccga ggccgcctct 2100gcctctgagc tattccagaa
gtagtgagga ggcttttttg gaggcctagg cttttgcaaa 2160aagctcccgg gagcttgtat
atccattttc ggatctgatc agcacgtgtt gacaattaat 2220catcggcata gtatatcggc
atagtataat acgacaaggt gaggaactaa accatggcca 2280agttgaccag tgccgttccg
gtgctcaccg cgcgcgacgt cgccggagcg gtcgagttct 2340ggaccgaccg gctcgggttc
tcccgggact tcgtggagga cgacttcgcc ggtgtggtcc 2400gggacgacgt gaccctgttc
atcagcgcgg tccaggacca ggtggtgccg gacaacaccc 2460tggcctgggt gtgggtgcgc
ggcctggacg agctgtacgc cgagtggtcg gaggtcgtgt 2520ccacgaactt ccgggacgcc
tccgggccgg ccatgaccga gatcggcgag cagccgtggg 2580ggcgggagtt cgccctgcgc
gacccggccg gcaactgcgt gcacttcgtg gccgaggagc 2640aggactgaca cgtgctacga
gatttcgatt ccaccgccgc cttctatgaa aggttgggct 2700tcggaatcgt tttccgggac
gccggctgga tgatcctcca gcgcggggat ctcatgctgg 2760agttcttcgc ccaccccaac
ttgtttattg cagcttataa tggttacaaa taaagcaata 2820gcatcacaaa tttcacaaat
aaagcatttt tttcactgca ttctagttgt ggtttgtcca 2880aactcatcaa tgtatcttat
catgtctgta taccgtcgac ctctagctag agcttggcgt 2940aatcatggtc atagctgttt
cctgtgtgaa attgttatcc gctcacaatt ccacacaaca 3000tacgagccgg aagcataaag
tgtaaagcct ggggtgccta atgagtgagc taactcacat 3060taattgcgtt gcgctcactg
cccgctttcc agtcgggaaa cctgtcgtgc cagctgcatt 3120aatgaatcgg ccaacgcgcg
gggagaggcg gtttgcgtat tgggcgctct tccgcttcct 3180cgctcactga ctcgctgcgc
tcggtcgttc ggctgcggcg agcggtatca gctcactcaa 3240aggcggtaat acggttatcc
acagaatcag gggataacgc aggaaagaac atgtgagcaa 3300aaggccagca aaaggccagg
aaccgtaaaa aggccgcgtt gctggcgttt ttccataggc 3360tccgcccccc tgacgagcat
cacaaaaatc gacgctcaag tcagaggtgg cgaaacccga 3420caggactata aagataccag
gcgtttcccc ctggaagctc cctcgtgcgc tctcctgttc 3480cgaccctgcc gcttaccgga
tacctgtccg cctttctccc ttcgggaagc gtggcgcttt 3540ctcaatgctc acgctgtagg
tatctcagtt cggtgtaggt cgttcgctcc aagctgggct 3600gtgtgcacga accccccgtt
cagcccgacc gctgcgcctt atccggtaac tatcgtcttg 3660agtccaaccc ggtaagacac
gacttatcgc cactggcagc agccactggt aacaggatta 3720gcagagcgag gtatgtaggc
ggtgctacag agttcttgaa gtggtggcct aactacggct 3780acactagaag gacagtattt
ggtatctgcg ctctgctgaa gccagttacc ttcggaaaaa 3840gagttggtag ctcttgatcc
ggcaaacaaa ccaccgctgg tagcggtggt ttttttgttt 3900gcaagcagca gattacgcgc
agaaaaaaag gatctcaaga agatcctttg atcttttcta 3960cggggtctga cgctcagtgg
aacgaaaact cacgttaagg gattttggtc atgagattat 4020caaaaaggat cttcacctag
atccttttaa attaaaaatg aagttttaaa tcaatctaaa 4080gtatatatga gtaaacttgg
tctgacagtt accaatgctt aatcagtgag gcacctatct 4140cagcgatctg tctatttcgt
tcatccatag ttgcctgact ccccgtcgtg tagataacta 4200cgatacggga gggcttacca
tctggcccca gtgctgcaat gataccgcga gacccacgct 4260caccggctcc agatttatca
gcaataaacc agccagccgg aagggccgag cgcagaagtg 4320gtcctgcaac tttatccgcc
tccatccagt ctattaattg ttgccgggaa gctagagtaa 4380gtagttcgcc agttaatagt
ttgcgcaacg ttgttgccat tgctacaggc atcgtggtgt 4440cacgctcgtc gtttggtatg
gcttcattca gctccggttc ccaacgatca aggcgagtta 4500catgatcccc catgttgtgc
aaaaaagcgg ttagctcctt cggtcctccg atcgttgtca 4560gaagtaagtt ggccgcagtg
ttatcactca tggttatggc agcactgcat aattctctta 4620ctgtcatgcc atccgtaaga
tgcttttctg tgactggtga gtactcaacc aagtcattct 4680gagaatagtg tatgcggcga
ccgagttgct cttgcccggc gtcaatacgg gataataccg 4740cgccacatag cagaacttta
aaagtgctca tcattggaaa acgttcttcg gggcgaaaac 4800tctcaaggat cttaccgctg
ttgagatcca gttcgatgta acccactcgt gcacccaact 4860gatcttcagc atcttttact
ttcaccagcg tttctgggtg agcaaaaaca ggaaggcaaa 4920atgccgcaaa aaagggaata
agggcgacac ggaaatgttg aatactcata ctcttccttt 4980ttcaatatta ttgaagcatt
tatcagggtt attgtctcat gagcggatac atatttgaat 5040gtatttagaa aaataaacaa
ataggggttc cgcgcacatt tccccgaaaa gtgccacctg 5100acgtcgacgg atcgggagat
ctcccgatcc cctatggtgc actctcagta caatctgctc 5160tgatgccgca tagttaagcc
agtatctgct ccctgcttgt gtgttggagg tcgctgagta 5220gtgcgcgagc aaaatttaag
ctacaacaag gcaaggcttg accgacaatt gcatgaagaa 5280tctgcttagg gttaggcgtt
ttgcgctgct tcgcgatgta cgggccagat atacgcgttg 5340acattgatta ttgactagtt
attaatagta atcaattacg gggtcattag ttcatagccc 5400atatatggag ttccgcgtta
cataacttac ggtaaatggc ccgcctggct gaccgcccaa 5460cgacccccgc ccattgacgt
caataatgac gtatgttccc atagtaacgc caatagggac 5520tttccattga cgtcaatggg
tggactattt acggtaaact gcccacttgg cagtacatca 5580agtgtatcat atgccaagta
cgccccctat tgacgtcaat gacggtaaat ggcccgcctg 5640gcattatgcc cagtacatga
ccttatggga ctttcctact tggcagtaca tctacgtatt 5700agtcatcgct attaccatgg
tgatgcggtt ttggcagtac atcaatgggc gtggatagcg 5760gtttgactca cggggatttc
caagtctcca ccccattgac gtcaatggga gtttgttttg 5820gcaccaaaat caacgggact
ttccaaaatg tcgtaacaac tccgccccat tgacgcaaat 5880gggcggtagg cgtgtacggt
gggaggtcta tataagcaga gctctctggc taactagaga 5940acccactgct tactggctta
tcgaaattaa tacgactcac tatagggaga cccaagctgg 6000tttaaactta agcttggtac
cgagctcact agtccagtgt ggtggcagat atccagcaca 6060gtggcggccg ctcgaggggc
ccgttttgcc tgtactgggt ctctctggtt agaccagatc 6120tgagcctggg agctctctgg
ctaactaggg aacccactgc ttaagcctca ataaagcttg 6180ccttgagtgc ttcaagtagt
gtgtgcccgt ctgttgtgtg actctggtaa ctagagatcc 6240ctcagaccct tttagtcagt
gtggaaaatc tctagcagtg gcgcccgaac agggacttga 6300aagcgaaagg gaaaccagag
gagctctctc gacgcaggac tcggcttgct gaagcgcgca 6360cggcaagagg cgaggggcgg
cgactggtga gtacgccaaa aattttgact agcggaggct 6420agaaggagag agatgggtgc
gagagcgtca gtattaagcg ggggagaatt agatcgcgat 6480gggaaaaaat tcggttaagg
ccagggggaa agaaaaaata taaattaaaa catatagtat 6540gggcaagcag ggagctagaa
cgattcgcag ttaatcctgg cctgttagaa acatcagaag 6600gctgtagaca aatactggga
cagctacaac catcccttca gacaggatca gaagaactta 6660gatcattata taatacagta
gcaaccctct attgtgtgca tcaaaggata gagataaaag 6720acaccaagga agctttagac
aagatagagg aagagcaaaa caaaagtaag accaccgcac 6780agcaagcggc cgctgatctt
cagacctgga ggaggagata tgagggacaa ttggagaagt 6840gaattatata aatataaagt
agtaaaaatt gaaccattag gagtagcacc caccaaggca 6900aagagaagag tggtgcagag
agaaaaaaga gcagtgggaa taggagcttt gttccttggg 6960ttcttgggag cagcaggaag
cactatgggc gcagcgtcaa tgacgctgac ggtacaggcc 7020agacaattat tgtctggtat
agtgcagcag cagaacaatt tgctgagggc tattgaggcg 7080caacagcatc tgttgcaact
cacagtctgg ggcatcaagc agctccaggc aagaatcctg 7140gctgtggaaa gatacctaaa
ggatcaacag ctcctgggga tttggggttg ctctggaaaa 7200ctcatttgca ccactgctgt
gccttggaat gctagttgga gtaataaatc tctggaacag 7260atttggaatc acacgacctg
gatggagtgg gacagagaaa ttaacaatta cacaagctta 7320atacactcct taattgaaga
atcgcaaaac cagcaagaaa agaatgaaca agaattattg 7380gaattagata aatgggcaag
tttgtggaat tggtttaaca taacaaattg gctgtggtat 7440ataaaattat tcataatgat
agtaggaggc ttggtaggtt taagaatagt ttttgctgta 7500ctttctatag tgaatagagt
taggcaggga tattcaccat tatcgtttca gacccacctc 7560ccaaccccga ggggacccga
caggccctta attaatcccc tgattctgtg gataaccgta 7620ttaccgcctt tgagtgagct
gcacgggagc aattaatcta tagattaaaa agtgaaaggc 7680tacaatagga cgggagcaat
taatctatag attaaaaagt gaaaggctac aataggacgg 7740gagcaattaa tctatagatt
aaaaagtgaa aggctacaat aggacgggag caattaatct 7800atagattaaa aagtgaaagg
ctacaatagg acgggagcaa ttaatctata gattaaaaag 7860tgaaaggcta caataggacg
ggagcaatta atctatagat taaaaagtga aaggctacaa 7920taggacggga gcaattaatc
tatagattaa aaagtgaaag gctacaatag gacggtaaac 7980tcgacctata taagcagagc
tcgtttagtg aaccgtcaga tcgcctggag acgccatcca 8040cgctgttttg acctccatag
aagacaccgg gaccgatcca gcctccgcgg ccccgaattg 8100aattcggcca ttacggcctg
ccaccatgat tgagaatacc tatagcgaaa agttcgagtc 8160cgcgttcgaa cagatcaagg
cggcggccaa cgtggatgcc gccatccgta ttctccaggc 8220ggaatataac ctcgatttcg
tcacctacca tctcgcccag acgatcgcga gcaagatcga 8280ttcgcccttc gtgcgcacca
cctatccgga tgcctgggtt tcccgctacc tcctcaacag 8340ctatgtgaag gtcgatccga
tcgtcaagca gggcttcgaa cgccagctgc ccttcgactg 8400gagcgaggtc gaaccgacgc
cggaggccta tgccatgctg gtcgacgccc agaaacacgg 8460catcggtggc aatggctact
ccatccccgt cgccgacaag gcgcagcgcc gcgccctgct 8520gtcgctgaat gcccgtatac
cggccgacga atggaccgag ctcgtgcgcc gctgccgcaa 8580cgagtggatc gagatcgccc
atctgatcca ccgcaaggcc gtctatgagc tgcatggcga 8640aaacgatccg gtgccggcat
tgtcgccgcg cgagatcgag tgtctgcact ggaccgccct 8700cggcaaggat tacaaggata
tttcggtcat cctgggcata tcagagcata ccacacgcga 8760ttacctgaag accgcccgct
tcaagctcgg ctgcgccacg atctcggccg ccgcgtcgcg 8820ggctgttcaa ttgcgcatca
tcaatcccgc tgcaaacgac gaaaactacg ctttagtagc 8880tgcgtacagc cgcgcgcgta
cgaaaaacaa ttacgggtct accatcgagg gcctgctcga 8940tctcccggac gacgacgccc
ccgaagaggc ggggctggcg gctccgcgcc tgtcctttct 9000ccccgcggga cacacgcgca
gactgtcgac ggcccccccg accgatgtca gcctggggga 9060cgagctccac ttagacggcg
aggacgtggc gatggcgcat gccgacgcgc tagacgattt 9120cgatctggac atgttggggg
acggggattc cccgggtccg ggatttaccc cccacgactc 9180cgccccctac ggcgctctgg
atatggccga cttcgagttt gagcagatgt ttaccgatgc 9240ccttggaatt gacgagtacg
gtgggtaatg ccacc 9275

SEQ ID NO: 17 (length 9230; DNA; Artificial Sequence; Synthetic)
ttatgtgtgg gagggctaag ggcgcgccgt tctagagaat
tcgatatcaa gcttatcgat 60aatcaacctc tggattacaa aatttgtgaa agattgactg
gtattcttaa ctatgttgct 120ccttttacgc tatgtggata cgctgcttta atgcctttgt
atcatgctat tacttcccgt 180acggctttca ttttctcctc cttgtataaa tcctggttgc
tgtctcttta tgaggagttg 240tggcccgttg tcaggcaacg tggcgtggtg tgcactgtgt
ttgctgacgc aacccccact 300ggttggggca ttgccaccac ctatcaactc ctttccggga
ctttcgcttt ccccctccct 360attgccacgg cggaactcat tgccgcctgc cttgcccgct
gctggacagg ggctcggctg 420ttgggcactg acaattccgt ggtgttgtcg gggaagctga
cgtcctttcc atggctgctc 480gcctgtgttg ccaactggat tctgcgcggg acgtccttct
gctacgtccc ttcggccctc 540aatccagcgg accttccttc ccgcggcctg ctgccggttc
tgcggcctct tccgcgtctt 600cgccttcgcc ctcagacgag tcggatctcc ctttgggccg
cctccccgcc tgcctgcagg 660tttgtcgaga cctagaaaaa catggagcaa tcacaagtag
caatacagca gctaccaatg 720ctgattgtgc ctggctagaa gcacaagagg aggaggaggt
gggttttcca gtcacacctc 780aggtaccttt aagaccaatg acttacaagg cagctgtaga
tcttagccac tttttaaaag 840aaaagggggg actggaaggg ctaattcact cccaacgaag
acaagatctg ctttttgctt 900gtactgggtc tctctggtta gaccagatct gagcctggga
gctctctggc taactaggga 960acccactgct taagcctcaa taaagcttgc cttgagtgct
tcaagtagtg tgtgcccgtc 1020tgttgtgtga ctctggtaac tagagatccc tcagaccctt
ttagtcagtg tggaaaatct 1080ctagcagggc ccgtttaaac ccgctgatca gcctcgactg
tgccttctag ttgccagcca 1140tctgttgttt gcccctcccc cgtgccttcc ttgaccctgg
aaggtgccac tcccactgtc 1200ctttcctaat aaaatgagga aattgcatcg cattgtctga
gtaggtgtca ttctattctg 1260gggggtgggg tggggcagga cagcaagggg gaggattggg
aagacaatag caggcatgct 1320ggggatgcgg tgggctctat ggcttctgag gcggaaagaa
ccagctgggg ctctaggggg 1380tatccccacg cgccctgtag cggcgcatta agcgcggcgg
gtgtggtggt tacgcgcagc 1440gtgaccgcta cacttgccag cgccctagcg cccgctcctt
tcgctttctt cccttccttt 1500ctcgccacgt tcgccggctt tccccgtcaa gctctaaatc
ggggcatccc tttagggttc 1560cgatttagtg ctttacggca cctcgacccc aaaaaacttg
attagggtga tggttcacgt 1620agtgggccat cgccctgata gacggttttt cgccctttga
cgttggagtc cacgttcttt 1680aatagtggac tcttgttcca aactggaaca acactcaacc
ctatctcggt ctattctttt 1740gatttataag ggattttggg gatttcggcc tattggttaa
aaaatgagct gatttaacaa 1800aaatttaacg cgaattaatt ctgtggaatg tgtgtcagtt
agggtgtgga aagtccccag 1860gctccccagg caggcagaag tatgcaaagc atgcatctca
attagtcagc aaccaggtgt 1920ggaaagtccc caggctcccc agcaggcaga agtatgcaaa
gcatgcatct caattagtca 1980gcaaccatag tcccgcccct aactccgccc atcccgcccc
taactccgcc cagttccgcc 2040cattctccgc cccatggctg actaattttt tttatttatg
cagaggccga ggccgcctct 2100gcctctgagc tattccagaa gtagtgagga ggcttttttg
gaggcctagg cttttgcaaa 2160aagctcccgg gagcttgtat atccattttc ggatctgatc
agcacgtgtt gacaattaat 2220catcggcata gtatatcggc atagtataat acgacaaggt
gaggaactaa accatggcca 2280agttgaccag tgccgttccg gtgctcaccg cgcgcgacgt
cgccggagcg gtcgagttct 2340ggaccgaccg gctcgggttc tcccgggact tcgtggagga
cgacttcgcc ggtgtggtcc 2400gggacgacgt gaccctgttc atcagcgcgg tccaggacca
ggtggtgccg gacaacaccc 2460tggcctgggt gtgggtgcgc ggcctggacg agctgtacgc
cgagtggtcg gaggtcgtgt 2520ccacgaactt ccgggacgcc tccgggccgg ccatgaccga
gatcggcgag cagccgtggg 2580ggcgggagtt cgccctgcgc gacccggccg gcaactgcgt
gcacttcgtg gccgaggagc 2640aggactgaca cgtgctacga gatttcgatt ccaccgccgc
cttctatgaa aggttgggct 2700tcggaatcgt tttccgggac gccggctgga tgatcctcca
gcgcggggat ctcatgctgg 2760agttcttcgc ccaccccaac ttgtttattg cagcttataa
tggttacaaa taaagcaata 2820gcatcacaaa tttcacaaat aaagcatttt tttcactgca
ttctagttgt ggtttgtcca 2880aactcatcaa tgtatcttat catgtctgta taccgtcgac
ctctagctag agcttggcgt 2940aatcatggtc atagctgttt cctgtgtgaa attgttatcc
gctcacaatt ccacacaaca 3000tacgagccgg aagcataaag tgtaaagcct ggggtgccta
atgagtgagc taactcacat 3060taattgcgtt gcgctcactg cccgctttcc agtcgggaaa
cctgtcgtgc cagctgcatt 3120aatgaatcgg ccaacgcgcg gggagaggcg gtttgcgtat
tgggcgctct tccgcttcct 3180cgctcactga ctcgctgcgc tcggtcgttc ggctgcggcg
agcggtatca gctcactcaa 3240aggcggtaat acggttatcc acagaatcag gggataacgc
aggaaagaac atgtgagcaa 3300aaggccagca aaaggccagg aaccgtaaaa aggccgcgtt
gctggcgttt ttccataggc 3360tccgcccccc tgacgagcat cacaaaaatc gacgctcaag
tcagaggtgg cgaaacccga 3420caggactata aagataccag gcgtttcccc ctggaagctc
cctcgtgcgc tctcctgttc 3480cgaccctgcc gcttaccgga tacctgtccg cctttctccc
ttcgggaagc gtggcgcttt 3540ctcaatgctc acgctgtagg tatctcagtt cggtgtaggt
cgttcgctcc aagctgggct 3600gtgtgcacga accccccgtt cagcccgacc gctgcgcctt
atccggtaac tatcgtcttg 3660agtccaaccc ggtaagacac gacttatcgc cactggcagc
agccactggt aacaggatta 3720gcagagcgag gtatgtaggc ggtgctacag agttcttgaa
gtggtggcct aactacggct 3780acactagaag gacagtattt ggtatctgcg ctctgctgaa
gccagttacc ttcggaaaaa 3840gagttggtag ctcttgatcc ggcaaacaaa ccaccgctgg
tagcggtggt ttttttgttt 3900gcaagcagca gattacgcgc agaaaaaaag gatctcaaga
agatcctttg atcttttcta 3960cggggtctga cgctcagtgg aacgaaaact cacgttaagg
gattttggtc atgagattat 4020caaaaaggat cttcacctag atccttttaa attaaaaatg
aagttttaaa tcaatctaaa 4080gtatatatga gtaaacttgg tctgacagtt accaatgctt
aatcagtgag gcacctatct 4140cagcgatctg tctatttcgt tcatccatag ttgcctgact
ccccgtcgtg tagataacta 4200cgatacggga gggcttacca tctggcccca gtgctgcaat
gataccgcga gacccacgct 4260caccggctcc agatttatca gcaataaacc agccagccgg
aagggccgag cgcagaagtg 4320gtcctgcaac tttatccgcc tccatccagt ctattaattg
ttgccgggaa gctagagtaa 4380gtagttcgcc agttaatagt ttgcgcaacg ttgttgccat
tgctacaggc atcgtggtgt 4440cacgctcgtc gtttggtatg gcttcattca gctccggttc
ccaacgatca aggcgagtta 4500catgatcccc catgttgtgc aaaaaagcgg ttagctcctt
cggtcctccg atcgttgtca 4560gaagtaagtt ggccgcagtg ttatcactca tggttatggc
agcactgcat aattctctta 4620ctgtcatgcc atccgtaaga tgcttttctg tgactggtga
gtactcaacc aagtcattct 4680gagaatagtg tatgcggcga ccgagttgct cttgcccggc
gtcaatacgg gataataccg 4740cgccacatag cagaacttta aaagtgctca tcattggaaa
acgttcttcg gggcgaaaac 4800tctcaaggat cttaccgctg ttgagatcca gttcgatgta
acccactcgt gcacccaact 4860gatcttcagc atcttttact ttcaccagcg tttctgggtg
agcaaaaaca ggaaggcaaa 4920atgccgcaaa aaagggaata agggcgacac ggaaatgttg
aatactcata ctcttccttt 4980ttcaatatta ttgaagcatt tatcagggtt attgtctcat
gagcggatac atatttgaat 5040gtatttagaa aaataaacaa ataggggttc cgcgcacatt
tccccgaaaa gtgccacctg 5100acgtcgacgg atcgggagat ctcccgatcc cctatggtgc
actctcagta caatctgctc 5160tgatgccgca tagttaagcc agtatctgct ccctgcttgt
gtgttggagg tcgctgagta 5220gtgcgcgagc aaaatttaag ctacaacaag gcaaggcttg
accgacaatt gcatgaagaa 5280tctgcttagg gttaggcgtt ttgcgctgct tcgcgatgta
cgggccagat atacgcgttg 5340acattgatta ttgactagtt attaatagta atcaattacg
gggtcattag ttcatagccc 5400atatatggag ttccgcgtta cataacttac ggtaaatggc
ccgcctggct gaccgcccaa 5460cgacccccgc ccattgacgt caataatgac gtatgttccc
atagtaacgc caatagggac 5520tttccattga cgtcaatggg tggactattt acggtaaact
gcccacttgg cagtacatca 5580agtgtatcat atgccaagta cgccccctat tgacgtcaat
gacggtaaat ggcccgcctg 5640gcattatgcc cagtacatga ccttatggga ctttcctact
tggcagtaca tctacgtatt 5700agtcatcgct attaccatgg tgatgcggtt ttggcagtac
atcaatgggc gtggatagcg 5760gtttgactca cggggatttc caagtctcca ccccattgac
gtcaatggga gtttgttttg 5820gcaccaaaat caacgggact ttccaaaatg tcgtaacaac
tccgccccat tgacgcaaat 5880gggcggtagg cgtgtacggt gggaggtcta tataagcaga
gctctctggc taactagaga 5940acccactgct tactggctta tcgaaattaa tacgactcac
tatagggaga cccaagctgg 6000tttaaactta agcttggtac cgagctcact agtccagtgt
ggtggcagat atccagcaca 6060gtggcggccg ctcgaggggc ccgttttgcc tgtactgggt
ctctctggtt agaccagatc 6120tgagcctggg agctctctgg ctaactaggg aacccactgc
ttaagcctca ataaagcttg 6180ccttgagtgc ttcaagtagt gtgtgcccgt ctgttgtgtg
actctggtaa ctagagatcc 6240ctcagaccct tttagtcagt gtggaaaatc tctagcagtg
gcgcccgaac agggacttga 6300aagcgaaagg gaaaccagag gagctctctc gacgcaggac
tcggcttgct gaagcgcgca 6360cggcaagagg cgaggggcgg cgactggtga gtacgccaaa
aattttgact agcggaggct 6420agaaggagag agatgggtgc gagagcgtca gtattaagcg
ggggagaatt agatcgcgat 6480gggaaaaaat tcggttaagg ccagggggaa agaaaaaata
taaattaaaa catatagtat 6540gggcaagcag ggagctagaa cgattcgcag ttaatcctgg
cctgttagaa acatcagaag 6600gctgtagaca aatactggga cagctacaac catcccttca
gacaggatca gaagaactta 6660gatcattata taatacagta gcaaccctct attgtgtgca
tcaaaggata gagataaaag 6720acaccaagga agctttagac aagatagagg aagagcaaaa
caaaagtaag accaccgcac 6780agcaagcggc cgctgatctt cagacctgga ggaggagata
tgagggacaa ttggagaagt 6840gaattatata aatataaagt agtaaaaatt gaaccattag
gagtagcacc caccaaggca 6900aagagaagag tggtgcagag agaaaaaaga gcagtgggaa
taggagcttt gttccttggg 6960ttcttgggag cagcaggaag cactatgggc gcagcgtcaa
tgacgctgac ggtacaggcc 7020agacaattat tgtctggtat agtgcagcag cagaacaatt
tgctgagggc tattgaggcg 7080caacagcatc tgttgcaact cacagtctgg ggcatcaagc
agctccaggc aagaatcctg 7140gctgtggaaa gatacctaaa ggatcaacag ctcctgggga
tttggggttg ctctggaaaa 7200ctcatttgca ccactgctgt gccttggaat gctagttgga
gtaataaatc tctggaacag 7260atttggaatc acacgacctg gatggagtgg gacagagaaa
ttaacaatta cacaagctta 7320atacactcct taattgaaga atcgcaaaac cagcaagaaa
agaatgaaca agaattattg 7380gaattagata aatgggcaag tttgtggaat tggtttaaca
taacaaattg gctgtggtat 7440ataaaattat tcataatgat agtaggaggc ttggtaggtt
taagaatagt ttttgctgta 7500ctttctatag tgaatagagt taggcaggga tattcaccat
tatcgtttca gacccacctc 7560ccaaccccga ggggacccga caggccctta attaatcccc
tgattctgtg gataaccgta 7620ttaccgcctt tgagtgagct gcacgggagc aattaatcta
tagattaaaa agtgaaaggc 7680tacaatagga cgggagcaat taatctatag attaaaaagt
gaaaggctac aataggacgg 7740gagcaattaa tctatagatt aaaaagtgaa aggctacaat
aggacgggag caattaatct 7800atagattaaa aagtgaaagg ctacaatagg acgggagcaa
ttaatctata gattaaaaag 7860tgaaaggcta caataggacg ggagcaatta atctatagat
taaaaagtga aaggctacaa 7920taggacggga gcaattaatc tatagattaa aaagtgaaag
gctacaatag gacggtaaac 7980tcgacctata taagcagagc tcgtttagtg aaccgtcaga
tcgcctggag acgccatcca 8040cgctgttttg acctccatag aagacaccgg gaccgatcca
gcctccgcgg ccccgaattg 8100aattcggcca ttacggcctg ccaccatgag cacaaaaaag
aaaccattaa cacaagagca 8160gcttgaggac gcacgtcgcc ttaaagcaat ttatgaaaaa
aagaaaaatg aacttggctt 8220atcccaggaa tctgtcgcag acaagatggg gatggggcag
tcaggcgttg gtgctttatt 8280taatggcatc aatgcattaa atgcttataa cgccgcattg
cttgcaaaaa ttctcaaagt 8340tagcgttgaa gaattcagcc cttcaatcgc cagagagatc
tacgagatgt atgaagcggt 8400tagtatgcag ccgtcactta gaagtgagta tgagtaccct
gttttttctc atgttcaggc 8460agggatgttc tcacctgagc ttagaacctt taccaaaggt
gatgcggaga gatgggtaag 8520cacaaccaaa aaagccagtg attctgcatt ctggcttgag
gttgaaggta attccatgac 8580cgcaccaaca ggctccaagc caagctttcc tgacggaatg
ttaattctcg ttgaccctga 8640gcaggctgtt gagcccgggg atttctgcat agccagactt
gggggtgatg agtttacctt 8700caagaaactg atcagggata gcggtcaggt gtttttacaa
ccactaaacc cacagtaccc 8760aatgatccca tgcaatgaga gttgttccgt tgtggggaaa
gttatcgcta gtcagtggcc 8820tgaagagacg tttggcccaa aaaagaagag aaaggtcgac
ggcggtggtg ctttgtctcc 8880tcagcactct gctgtcactc aaggaagtat catcaagaac
aaggagggca tggatgctaa 8940gtcactaact gcctggtccc ggacactggt gaccttcaag
gatgtatttg tggacttcac 9000cagggaggag tggaagctgc tggacactgc tcagcagatc
gtgtacagaa atgtgatgct 9060ggagaactat aagaacctgg tttccttggg ttatcagctt
actaagccag atgtgatcct 9120ccggttggag aagggagaag agccctggct ggtggagaga
gaaattcacc aagagaccca 9180tcctgattca gagactgcat ttgaaatcaa atcatcagtt
taatgccacc 9230

SEQ ID NO: 18 (8874 bp; DNA; Artificial Sequence; Synthetic)
ttatgtgtgg gagggctaag ggcgcgccgt tctagagaat tcgatatcaa gcttatcgat
60aatcaacctc tggattacaa aatttgtgaa agattgactg gtattcttaa ctatgttgct
120ccttttacgc tatgtggata cgctgcttta atgcctttgt atcatgctat tacttcccgt
180acggctttca ttttctcctc cttgtataaa tcctggttgc tgtctcttta tgaggagttg
240tggcccgttg tcaggcaacg tggcgtggtg tgcactgtgt ttgctgacgc aacccccact
300ggttggggca ttgccaccac ctatcaactc ctttccggga ctttcgcttt ccccctccct
360attgccacgg cggaactcat tgccgcctgc cttgcccgct gctggacagg ggctcggctg
420ttgggcactg acaattccgt ggtgttgtcg gggaagctga cgtcctttcc atggctgctc
480gcctgtgttg ccaactggat tctgcgcggg acgtccttct gctacgtccc ttcggccctc
540aatccagcgg accttccttc ccgcggcctg ctgccggttc tgcggcctct tccgcgtctt
600cgccttcgcc ctcagacgag tcggatctcc ctttgggccg cctccccgcc tgcctgcagg
660tttgtcgaga cctagaaaaa catggagcaa tcacaagtag caatacagca gctaccaatg
720ctgattgtgc ctggctagaa gcacaagagg aggaggaggt gggttttcca gtcacacctc
780aggtaccttt aagaccaatg acttacaagg cagctgtaga tcttagccac tttttaaaag
840aaaagggggg actggaaggg ctaattcact cccaacgaag acaagatctg ctttttgctt
900gtactgggtc tctctggtta gaccagatct gagcctggga gctctctggc taactaggga
960acccactgct taagcctcaa taaagcttgc cttgagtgct tcaagtagtg tgtgcccgtc
1020tgttgtgtga ctctggtaac tagagatccc tcagaccctt ttagtcagtg tggaaaatct
1080ctagcagggc ccgtttaaac ccgctgatca gcctcgactg tgccttctag ttgccagcca
1140tctgttgttt gcccctcccc cgtgccttcc ttgaccctgg aaggtgccac tcccactgtc
1200ctttcctaat aaaatgagga aattgcatcg cattgtctga gtaggtgtca ttctattctg
1260gggggtgggg tggggcagga cagcaagggg gaggattggg aagacaatag caggcatgct
1320ggggatgcgg tgggctctat ggcttctgag gcggaaagaa ccagctgggg ctctaggggg
1380tatccccacg cgccctgtag cggcgcatta agcgcggcgg gtgtggtggt tacgcgcagc
1440gtgaccgcta cacttgccag cgccctagcg cccgctcctt tcgctttctt cccttccttt
1500ctcgccacgt tcgccggctt tccccgtcaa gctctaaatc ggggcatccc tttagggttc
1560cgatttagtg ctttacggca cctcgacccc aaaaaacttg attagggtga tggttcacgt
1620agtgggccat cgccctgata gacggttttt cgccctttga cgttggagtc cacgttcttt
1680aatagtggac tcttgttcca aactggaaca acactcaacc ctatctcggt ctattctttt
1740gatttataag ggattttggg gatttcggcc tattggttaa aaaatgagct gatttaacaa
1800aaatttaacg cgaattaatt ctgtggaatg tgtgtcagtt agggtgtgga aagtccccag
1860gctccccagg caggcagaag tatgcaaagc atgcatctca attagtcagc aaccaggtgt
1920ggaaagtccc caggctcccc agcaggcaga agtatgcaaa gcatgcatct caattagtca
1980gcaaccatag tcccgcccct aactccgccc atcccgcccc taactccgcc cagttccgcc
2040cattctccgc cccatggctg actaattttt tttatttatg cagaggccga ggccgcctct
2100gcctctgagc tattccagaa gtagtgagga ggcttttttg gaggcctagg cttttgcaaa
2160aagctcccgg gagcttgtat atccattttc ggatctgatc agcacgtgtt gacaattaat
2220catcggcata gtatatcggc atagtataat acgacaaggt gaggaactaa accatggcca
2280agttgaccag tgccgttccg gtgctcaccg cgcgcgacgt cgccggagcg gtcgagttct
2340ggaccgaccg gctcgggttc tcccgggact tcgtggagga cgacttcgcc ggtgtggtcc
2400gggacgacgt gaccctgttc atcagcgcgg tccaggacca ggtggtgccg gacaacaccc
2460tggcctgggt gtgggtgcgc ggcctggacg agctgtacgc cgagtggtcg gaggtcgtgt
2520ccacgaactt ccgggacgcc tccgggccgg ccatgaccga gatcggcgag cagccgtggg
2580ggcgggagtt cgccctgcgc gacccggccg gcaactgcgt gcacttcgtg gccgaggagc
2640aggactgaca cgtgctacga gatttcgatt ccaccgccgc cttctatgaa aggttgggct
2700tcggaatcgt tttccgggac gccggctgga tgatcctcca gcgcggggat ctcatgctgg
2760agttcttcgc ccaccccaac ttgtttattg cagcttataa tggttacaaa taaagcaata
2820gcatcacaaa tttcacaaat aaagcatttt tttcactgca ttctagttgt ggtttgtcca
2880aactcatcaa tgtatcttat catgtctgta taccgtcgac ctctagctag agcttggcgt
2940aatcatggtc atagctgttt cctgtgtgaa attgttatcc gctcacaatt ccacacaaca
3000tacgagccgg aagcataaag tgtaaagcct ggggtgccta atgagtgagc taactcacat
3060taattgcgtt gcgctcactg cccgctttcc agtcgggaaa cctgtcgtgc cagctgcatt
3120aatgaatcgg ccaacgcgcg gggagaggcg gtttgcgtat tgggcgctct tccgcttcct
3180cgctcactga ctcgctgcgc tcggtcgttc ggctgcggcg agcggtatca gctcactcaa
3240aggcggtaat acggttatcc acagaatcag gggataacgc aggaaagaac atgtgagcaa
3300aaggccagca aaaggccagg aaccgtaaaa aggccgcgtt gctggcgttt ttccataggc
3360tccgcccccc tgacgagcat cacaaaaatc gacgctcaag tcagaggtgg cgaaacccga
3420caggactata aagataccag gcgtttcccc ctggaagctc cctcgtgcgc tctcctgttc
3480cgaccctgcc gcttaccgga tacctgtccg cctttctccc ttcgggaagc gtggcgcttt
3540ctcaatgctc acgctgtagg tatctcagtt cggtgtaggt cgttcgctcc aagctgggct
3600gtgtgcacga accccccgtt cagcccgacc gctgcgcctt atccggtaac tatcgtcttg
3660agtccaaccc ggtaagacac gacttatcgc cactggcagc agccactggt aacaggatta
3720gcagagcgag gtatgtaggc ggtgctacag agttcttgaa gtggtggcct aactacggct
3780acactagaag gacagtattt ggtatctgcg ctctgctgaa gccagttacc ttcggaaaaa
3840gagttggtag ctcttgatcc ggcaaacaaa ccaccgctgg tagcggtggt ttttttgttt
3900gcaagcagca gattacgcgc agaaaaaaag gatctcaaga agatcctttg atcttttcta
3960cggggtctga cgctcagtgg aacgaaaact cacgttaagg gattttggtc atgagattat
4020caaaaaggat cttcacctag atccttttaa attaaaaatg aagttttaaa tcaatctaaa
4080gtatatatga gtaaacttgg tctgacagtt accaatgctt aatcagtgag gcacctatct
4140cagcgatctg tctatttcgt tcatccatag ttgcctgact ccccgtcgtg tagataacta
4200cgatacggga gggcttacca tctggcccca gtgctgcaat gataccgcga gacccacgct
4260caccggctcc agatttatca gcaataaacc agccagccgg aagggccgag cgcagaagtg
4320gtcctgcaac tttatccgcc tccatccagt ctattaattg ttgccgggaa gctagagtaa
4380gtagttcgcc agttaatagt ttgcgcaacg ttgttgccat tgctacaggc atcgtggtgt
4440cacgctcgtc gtttggtatg gcttcattca gctccggttc ccaacgatca aggcgagtta
4500catgatcccc catgttgtgc aaaaaagcgg ttagctcctt cggtcctccg atcgttgtca
4560gaagtaagtt ggccgcagtg ttatcactca tggttatggc agcactgcat aattctctta
4620ctgtcatgcc atccgtaaga tgcttttctg tgactggtga gtactcaacc aagtcattct
4680gagaatagtg tatgcggcga ccgagttgct cttgcccggc gtcaatacgg gataataccg
4740cgccacatag cagaacttta aaagtgctca tcattggaaa acgttcttcg gggcgaaaac
4800tctcaaggat cttaccgctg ttgagatcca gttcgatgta acccactcgt gcacccaact
4860gatcttcagc atcttttact ttcaccagcg tttctgggtg agcaaaaaca ggaaggcaaa
4920atgccgcaaa aaagggaata agggcgacac ggaaatgttg aatactcata ctcttccttt
4980ttcaatatta ttgaagcatt tatcagggtt attgtctcat gagcggatac atatttgaat
5040gtatttagaa aaataaacaa ataggggttc cgcgcacatt tccccgaaaa gtgccacctg
5100acgtcgacgg atcgggagat ctcccgatcc cctatggtgc actctcagta caatctgctc
5160tgatgccgca tagttaagcc agtatctgct ccctgcttgt gtgttggagg tcgctgagta
5220gtgcgcgagc aaaatttaag ctacaacaag gcaaggcttg accgacaatt gcatgaagaa
5280tctgcttagg gttaggcgtt ttgcgctgct tcgcgatgta cgggccagat atacgcgttg
5340acattgatta ttgactagtt attaatagta atcaattacg gggtcattag ttcatagccc
5400atatatggag ttccgcgtta cataacttac ggtaaatggc ccgcctggct gaccgcccaa
5460cgacccccgc ccattgacgt caataatgac gtatgttccc atagtaacgc caatagggac
5520tttccattga cgtcaatggg tggactattt acggtaaact gcccacttgg cagtacatca
5580agtgtatcat atgccaagta cgccccctat tgacgtcaat gacggtaaat ggcccgcctg
5640gcattatgcc cagtacatga ccttatggga ctttcctact tggcagtaca tctacgtatt
5700agtcatcgct attaccatgg tgatgcggtt ttggcagtac atcaatgggc gtggatagcg
5760gtttgactca cggggatttc caagtctcca ccccattgac gtcaatggga gtttgttttg
5820gcaccaaaat caacgggact ttccaaaatg tcgtaacaac tccgccccat tgacgcaaat
5880gggcggtagg cgtgtacggt gggaggtcta tataagcaga gctctctggc taactagaga
5940acccactgct tactggctta tcgaaattaa tacgactcac tatagggaga cccaagctgg
6000tttaaactta agcttggtac cgagctcact agtccagtgt ggtggcagat atccagcaca
6060gtggcggccg ctcgaggggc ccgttttgcc tgtactgggt ctctctggtt agaccagatc
6120tgagcctggg agctctctgg ctaactaggg aacccactgc ttaagcctca ataaagcttg
6180ccttgagtgc ttcaagtagt gtgtgcccgt ctgttgtgtg actctggtaa ctagagatcc
6240ctcagaccct tttagtcagt gtggaaaatc tctagcagtg gcgcccgaac agggacttga
6300aagcgaaagg gaaaccagag gagctctctc gacgcaggac tcggcttgct gaagcgcgca
6360cggcaagagg cgaggggcgg cgactggtga gtacgccaaa aattttgact agcggaggct
6420agaaggagag agatgggtgc gagagcgtca gtattaagcg ggggagaatt agatcgcgat
6480gggaaaaaat tcggttaagg ccagggggaa agaaaaaata taaattaaaa catatagtat
6540gggcaagcag ggagctagaa cgattcgcag ttaatcctgg cctgttagaa acatcagaag
6600gctgtagaca aatactggga cagctacaac catcccttca gacaggatca gaagaactta
6660gatcattata taatacagta gcaaccctct attgtgtgca tcaaaggata gagataaaag
6720acaccaagga agctttagac aagatagagg aagagcaaaa caaaagtaag accaccgcac
6780agcaagcggc cgctgatctt cagacctgga ggaggagata tgagggacaa ttggagaagt
6840gaattatata aatataaagt agtaaaaatt gaaccattag gagtagcacc caccaaggca
6900aagagaagag tggtgcagag agaaaaaaga gcagtgggaa taggagcttt gttccttggg
6960ttcttgggag cagcaggaag cactatgggc gcagcgtcaa tgacgctgac ggtacaggcc
7020agacaattat tgtctggtat agtgcagcag cagaacaatt tgctgagggc tattgaggcg
7080caacagcatc tgttgcaact cacagtctgg ggcatcaagc agctccaggc aagaatcctg
7140gctgtggaaa gatacctaaa ggatcaacag ctcctgggga tttggggttg ctctggaaaa
7200ctcatttgca ccactgctgt gccttggaat gctagttgga gtaataaatc tctggaacag
7260atttggaatc acacgacctg gatggagtgg gacagagaaa ttaacaatta cacaagctta
7320atacactcct taattgaaga atcgcaaaac cagcaagaaa agaatgaaca agaattattg
7380gaattagata aatgggcaag tttgtggaat tggtttaaca taacaaattg gctgtggtat
7440ataaaattat tcataatgat agtaggaggc ttggtaggtt taagaatagt ttttgctgta
7500ctttctatag tgaatagagt taggcaggga tattcaccat tatcgtttca gacccacctc
7560ccaaccccga ggggacccga caggccctta attaatcccc tgattctgtg gataaccgta
7620ttaccgcctt tgagtgagct gcacgggagc aattaatcta tagattaaaa agtgaaaggc
7680tacaatagga cgggagcaat taatctatag attaaaaagt gaaaggctac aataggacgg
7740gagcaattaa tctatagatt aaaaagtgaa aggctacaat aggacgggag caattaatct
7800atagattaaa aagtgaaagg ctacaatagg acgggagcaa ttaatctata gattaaaaag
7860tgaaaggcta caataggacg ggagcaatta atctatagat taaaaagtga aaggctacaa
7920taggacggga gcaattaatc tatagattaa aaagtgaaag gctacaatag gacggtaaac
7980tcgacctata taagcagagc tcgtttagtg aaccgtcaga tcgcctggag acgccatcca
8040cgctgttttg acctccatag aagacaccgg gaccgatcca gcctccgcgg ccccgaattg
8100aattctccct atcagtgata gagatctccc tatcagtgat agagaggcca ttacggcctg
8160ccaccatgtt cgttatcatc caggctcacg aataccagaa atacgctgct gttctggacc
8220agatgttccg tctgcgtaaa aaagttttcg ctgacaccct gtgctgggac gttccggtta
8280tcggtccgta cgaacgtgac tcctacgact ccctggctcc ggcttacctg gtttggtgca
8340acgactcccg tacccgtctg tacggtggta tgcgtctgat gccgaccacc ggtccgaccc
8400tgctgtacga cgttttccgt gaaaccttcc cggacgctgc tgacctgatc gctccgggta
8460tctgggaagg tacccgtatg tgcatcgacg aagaagctat cgctaaagac ttcccggaaa
8520tcgacgctgg tcgtgctttc tccatgatgc tgctggctct gtgcgaatgc gctctggacc
8580acggtatcca caccatgatc tccaactacg aaccgtacct gaaacgtgtt tacaaacgtg
8640ctggtgctga agttgaagaa ctgggtcgtg ctgacggtta cggtaaatac ccggtttgct
8700gcggtgcttt cgaagtttcc gaccgtgttc tgcgtaaaat gcgtgctgct ctgggtctga
8760ccctgccgct gtacgttcgt cacgttccgg ctcgttccgt tgttacccag ttcctggaaa
8820tggctgctgc tgctaacgac gaaaactacg ctctggttgc ttaataatgc cacc 8874

SEQ ID NO: 19 (9927 bp; DNA; Artificial Sequence; Synthetic)
cggccatcga taaggatccg
cggccgcaat caacctctgg attacaaaat ttgtgaaaga 60ttgactggta ttcttaacta
tgttgctcct tttacgctat gtggatacgc tgctttaatg 120cctttgtatc atgctattac
ttcccgtacg gctttcattt tctcctcctt gtataaatcc 180tggttgctgt ctctttatga
ggagttgtgg cccgttgtca ggcaacgtgg cgtggtgtgc 240actgtgtttg ctgacgcaac
ccccactggt tggggcattg ccaccaccta tcaactcctt 300tccgggactt tcgctttccc
cctccctatt gccacggcgg aactcattgc cgcctgcctt 360gcccgctgct ggacaggggc
tcggctgttg ggcactgaca attccgtggt gttgtcgggg 420aagctgacgt cctttccatg
gctgctcgcc tgtgttgcca actggattct gcgcgggacg 480tccttctgct acgtcccttc
ggccctcaat ccagcggacc ttccttcccg cggcctgctg 540ccggttctgc ggcctcttcc
gcgtcttcgc cttcgccctc agacgagtcg gatctccctt 600tgggccgcct ccccgcctgc
ctgcaggttt gtcgagacct agaaaaacat ggagcaatca 660caagtagcaa tacagcagct
accaatgctg attgtgcctg gctagaagca caagaggagg 720aggaggtggg ttttccagtc
acacctcagg tacctttaag accaatgact tacaaggcag 780ctgtagatct tagccacttt
ttaaaagaaa aggggggact ggaagggcta attcactccc 840aacgaagaca agatctgctt
tttgcttgta ctgggtctct ctggttagac cagatctgag 900cctgggagct ctctggctaa
ctagggaacc cactgcttaa gcctcaataa agcttgcctt 960gagtgcttca agtagtgtgt
gcccgtctgt tgtgtgactc tggtaactag agatccctca 1020gaccctttta gtcagtgtgg
aaaatctcta gcagggcccg tttaaacccg ctgatcagcc 1080tcgactgtgc cttctagttg
ccagccatct gttgtttgcc cctcccccgt gccttccttg 1140accctggaag gtgccactcc
cactgtcctt tcctaataaa atgaggaaat tgcatcgcat 1200tgtctgagta ggtgtcattc
tattctgggg ggtggggtgg ggcaggacag caagggggag 1260gattgggaag acaatagcag
gcatgctggg gatgcggtgg gctctatggc ttctgaggcg 1320gaaagaacca gctggggctc
tagggggtat ccccacgcgc cctgtagcgg cgcattaagc 1380gcggcgggtg tggtggttac
gcgcagcgtg accgctacac ttgccagcgc cctagcgccc 1440gctcctttcg ctttcttccc
ttcctttctc gccacgttcg ccggctttcc ccgtcaagct 1500ctaaatcggg gcatcccttt
agggttccga tttagtgctt tacggcacct cgaccccaaa 1560aaacttgatt agggtgatgg
ttcacgtagt gggccatcgc cctgatagac ggtttttcgc 1620cctttgacgt tggagtccac
gttctttaat agtggactct tgttccaaac tggaacaaca 1680ctcaacccta tctcggtcta
ttcttttgat ttataaggga ttttggggat ttcggcctat 1740tggttaaaaa atgagctgat
ttaacaaaaa tttaacgcga attaattctg tggaatgtgt 1800gtcagttagg gtgtggaaag
tccccaggct ccccaggcag gcagaagtat gcaaagcatg 1860catctcaatt agtcagcaac
caggtgtgga aagtccccag gctccccagc aggcagaagt 1920atgcaaagca tgcatctcaa
ttagtcagca accatagtcc cgcccctaac tccgcccatc 1980ccgcccctaa ctccgcccag
ttccgcccat tctccgcccc atggctgact aatttttttt 2040atttatgcag aggccgaggc
cgcctctgcc tctgagctat tccagaagta gtgaggaggc 2100ttttttggag gcctaggctt
ttgcaaaaag ctcccgggag cttgtatatc cattttcgga 2160tctgatcagc acgtgttgac
aattaatcat cggcatagta tatcggcata gtataatacg 2220acaaggtgag gaactaaacc
atggccaagt tgaccagtgc cgttccggtg ctcaccgcgc 2280gcgacgtcgc cggagcggtc
gagttctgga ccgaccggct cgggttctcc cgggacttcg 2340tggaggacga cttcgccggt
gtggtccggg acgacgtgac cctgttcatc agcgcggtcc 2400aggaccaggt ggtgccggac
aacaccctgg cctgggtgtg ggtgcgcggc ctggacgagc 2460tgtacgccga gtggtcggag
gtcgtgtcca cgaacttccg ggacgcctcc gggccggcca 2520tgaccgagat cggcgagcag
ccgtgggggc gggagttcgc cctgcgcgac ccggccggca 2580actgcgtgca cttcgtggcc
gaggagcagg actgacacgt gctacgagat ttcgattcca 2640ccgccgcctt ctatgaaagg
ttgggcttcg gaatcgtttt ccgggacgcc ggctggatga 2700tcctccagcg cggggatctc
atgctggagt tcttcgccca ccccaacttg tttattgcag 2760cttataatgg ttacaaataa
agcaatagca tcacaaattt cacaaataaa gcattttttt 2820cactgcattc tagttgtggt
ttgtccaaac tcatcaatgt atcttatcat gtctgtatac 2880cgtcgacctc tagctagagc
ttggcgtaat catggtcata gctgtttcct gtgtgaaatt 2940gttatccgct cacaattcca
cacaacatac gagccggaag cataaagtgt aaagcctggg 3000gtgcctaatg agtgagctaa
ctcacattaa ttgcgttgcg ctcactgccc gctttccagt 3060cgggaaacct gtcgtgccag
ctgcattaat gaatcggcca acgcgcgggg agaggcggtt 3120tgcgtattgg gcgctcttcc
gcttcctcgc tcactgactc gctgcgctcg gtcgttcggc 3180tgcggcgagc ggtatcagct
cactcaaagg cggtaatacg gttatccaca gaatcagggg 3240ataacgcagg aaagaacatg
tgagcaaaag gccagcaaaa ggccaggaac cgtaaaaagg 3300ccgcgttgct ggcgtttttc
cataggctcc gcccccctga cgagcatcac aaaaatcgac 3360gctcaagtca gaggtggcga
aacccgacag gactataaag ataccaggcg tttccccctg 3420gaagctccct cgtgcgctct
cctgttccga ccctgccgct taccggatac ctgtccgcct 3480ttctcccttc gggaagcgtg
gcgctttctc aatgctcacg ctgtaggtat ctcagttcgg 3540tgtaggtcgt tcgctccaag
ctgggctgtg tgcacgaacc ccccgttcag cccgaccgct 3600gcgccttatc cggtaactat
cgtcttgagt ccaacccggt aagacacgac ttatcgccac 3660tggcagcagc cactggtaac
aggattagca gagcgaggta tgtaggcggt gctacagagt 3720tcttgaagtg gtggcctaac
tacggctaca ctagaaggac agtatttggt atctgcgctc 3780tgctgaagcc agttaccttc
ggaaaaagag ttggtagctc ttgatccggc aaacaaacca 3840ccgctggtag cggtggtttt
tttgtttgca agcagcagat tacgcgcaga aaaaaaggat 3900ctcaagaaga tcctttgatc
ttttctacgg ggtctgacgc tcagtggaac gaaaactcac 3960gttaagggat tttggtcatg
agattatcaa aaaggatctt cacctagatc cttttaaatt 4020aaaaatgaag ttttaaatca
atctaaagta tatatgagta aacttggtct gacagttacc 4080aatgcttaat cagtgaggca
cctatctcag cgatctgtct atttcgttca tccatagttg 4140cctgactccc cgtcgtgtag
ataactacga tacgggaggg cttaccatct ggccccagtg 4200ctgcaatgat accgcgagac
ccacgctcac cggctccaga tttatcagca ataaaccagc 4260cagccggaag ggccgagcgc
agaagtggtc ctgcaacttt atccgcctcc atccagtcta 4320ttaattgttg ccgggaagct
agagtaagta gttcgccagt taatagtttg cgcaacgttg 4380ttgccattgc tacaggcatc
gtggtgtcac gctcgtcgtt tggtatggct tcattcagct 4440ccggttccca acgatcaagg
cgagttacat gatcccccat gttgtgcaaa aaagcggtta 4500gctccttcgg tcctccgatc
gttgtcagaa gtaagttggc cgcagtgtta tcactcatgg 4560ttatggcagc actgcataat
tctcttactg tcatgccatc cgtaagatgc ttttctgtga 4620ctggtgagta ctcaaccaag
tcattctgag aatagtgtat gcggcgaccg agttgctctt 4680gcccggcgtc aatacgggat
aataccgcgc cacatagcag aactttaaaa gtgctcatca 4740ttggaaaacg ttcttcgggg
cgaaaactct caaggatctt accgctgttg agatccagtt 4800cgatgtaacc cactcgtgca
cccaactgat cttcagcatc ttttactttc accagcgttt 4860ctgggtgagc aaaaacagga
aggcaaaatg ccgcaaaaaa gggaataagg gcgacacgga 4920aatgttgaat actcatactc
ttcctttttc aatattattg aagcatttat cagggttatt 4980gtctcatgag cggatacata
tttgaatgta tttagaaaaa taaacaaata ggggttccgc 5040gcacatttcc ccgaaaagtg
ccacctgacg tcgacggatc gggagatctc ccgatcccct 5100atggtgcact ctcagtacaa
tctgctctga tgccgcatag ttaagccagt atctgctccc 5160tgcttgtgtg ttggaggtcg
ctgagtagtg cgcgagcaaa atttaagcta caacaaggca 5220aggcttgacc gacaattgca
tgaagaatct gcttagggtt aggcgttttg cgctgcttcg 5280cgatgtacgg gccagatata
cgcgttgaca ttgattattg actagttatt aatagtaatc 5340aattacgggg tcattagttc
atagcccata tatggagttc cgcgttacat aacttacggt 5400aaatggcccg cctggctgac
cgcccaacga cccccgccca ttgacgtcaa taatgacgta 5460tgttcccata gtaacgccaa
tagggacttt ccattgacgt caatgggtgg actatttacg 5520gtaaactgcc cacttggcag
tacatcaagt gtatcatatg ccaagtacgc cccctattga 5580cgtcaatgac ggtaaatggc
ccgcctggca ttatgcccag tacatgacct tatgggactt 5640tcctacttgg cagtacatct
acgtattagt catcgctatt accatggtga tgcggttttg 5700gcagtacatc aatgggcgtg
gatagcggtt tgactcacgg ggatttccaa gtctccaccc 5760cattgacgtc aatgggagtt
tgttttggca ccaaaatcaa cgggactttc caaaatgtcg 5820taacaactcc gccccattga
cgcaaatggg cggtaggcgt gtacggtggg aggtctatat 5880aagcagagct ctctggctaa
ctagagaacc cactgcttac tggcttatcg aaattaatac 5940gactcactat agggagaccc
aagctggttt aaacttaagc ttggtaccga gctcactagt 6000ccagtgtggt ggcagatatc
cagcacagtg gcggccgctc gagtctagag ggcccgtttt 6060gcctgtactg ggtctctctg
gttagaccag atctgagcct gggagctctc tggctaacta 6120gggaacccac tgcttaagcc
tcaataaagc ttgccttgag tgcttcaagt agtgtgtgcc 6180cgtctgttgt gtgactctgg
taactagaga tccctcagac ccttttagtc agtgtggaaa 6240atctctagca gtggcgcccg
aacagggact tgaaagcgaa agggaaacca gaggagctct 6300ctcgacgcag gactcggctt
gctgaagcgc gcacggcaag aggcgagggg cggcgactgg 6360tgagtacgcc aaaaattttg
actagcggag gctagaagga gagagatggg tgcgagagcg 6420tcagtattaa gcgggggaga
attagatcgc gatgggaaaa aattcggtta aggccagggg 6480gaaagaaaaa atataaatta
aaacatatag tatgggcaag cagggagcta gaacgattcg 6540cagttaatcc tggcctgtta
gaaacatcag aaggctgtag acaaatactg ggacagctac 6600aaccatccct tcagacagga
tcagaagaac ttagatcatt atataataca gtagcaaccc 6660tctattgtgt gcatcaaagg
atagagataa aagacaccaa ggaagcttta gacaagatag 6720aggaagagca aaacaaaagt
aagaccaccg cacagcaagc ggccgctgat cttcagacct 6780ggaggaggag atatgaggga
caattggaga agtgaattat ataaatataa agtagtaaaa 6840attgaaccat taggagtagc
acccaccaag gcaaagagaa gagtggtgca gagagaaaaa 6900agagcagtgg gaataggagc
tttgttcctt gggttcttgg gagcagcagg aagcactatg 6960ggcgcagcgt caatgacgct
gacggtacag gccagacaat tattgtctgg tatagtgcag 7020cagcagaaca atttgctgag
ggctattgag gcgcaacagc atctgttgca actcacagtc 7080tggggcatca agcagctcca
ggcaagaatc ctggctgtgg aaagatacct aaaggatcaa 7140cagctcctgg ggatttgggg
ttgctctgga aaactcattt gcaccactgc tgtgccttgg 7200aatgctagtt ggagtaataa
atctctggaa cagatttgga atcacacgac ctggatggag 7260tgggacagag aaattaacaa
ttacacaagc ttaatacact ccttaattga agaatcgcaa 7320aaccagcaag aaaagaatga
acaagaatta ttggaattag ataaatgggc aagtttgtgg 7380aattggttta acataacaaa
ttggctgtgg tatataaaat tattcataat gatagtagga 7440ggcttggtag gtttaagaat
agtttttgct gtactttcta tagtgaatag agttaggcag 7500ggatattcac cattatcgtt
tcagacccac ctcccaaccc cgaggggacc cgacaggccc 7560ttaattaatt ggctccggtg
cccgtcagtg ggcagagcgc acatcgccca cagtccccga 7620gaagttgggg ggaggggtcg
gcaattgaac cggtgcctag agaaggtggc gcggggtaaa 7680ctgggaaagt gatgtcgtgt
actggctccg cctttttccc gagggtgggg gagaaccgta 7740tataagtgca gtagtcgccg
tgaagctagc aattgtgagc ggataacaat tccacagtcg 7800accctaggtt gtgtcgcgag
tgttggatcc cagctgacac caattgtgag cgctcacaat 7860tgctagaaga atacaaccac
gactatataa gaatacaacc acgactccgt tctttttcgc 7920aacgggtttg ccgccagaac
acaggtaagt gccgtgtgtg gttcccgcgg gcctggcctc 7980tttacgggtt atggcccttg
cgtgccttga attacttcca cctggctgca gtacgtgatt 8040cttgatcccg agcttcgggt
tggaagtggg tgggagagtt cgaggccttg cgcttaagga 8100gccccttcgc ctcgtgcttg
agttgaggcc tggcctgggc gctggggccg ccgcgtgcga 8160atctggtggc accttcgcgc
ctgtctcgct gctttcgata agtctctagc catttaaaat 8220ttttgatgac ctgctgcgac
gctttttttc tggcaagata gtcttgtaaa tgcgggccaa 8280gatctgcaca ctggtatttc
ggtttttggg gccgcgggcg gcgacggggc ccgtgcgtcc 8340cagcgcacat gttcggcgag
gcggggcctg cgagcgcggc caccgagaat cggacggggg 8400tagtctcaag ctggccggcc
tgctctggtg cctggcctcg cgccgccgtg tatcgccccg 8460ccctgggcgg caaggctggc
ccggtcggca ccagttgcgt gagcggaaag atggccgctt 8520cccggccctg ctgcagggag
ctcaaaatgg aggacgcggc gctcgggaga gcgggcgggt 8580gagtcaccca cacaaaggaa
aagggccttt ccgtcctcag ccgtcgcttc atgtgactcc 8640acggagtacc gggcgccgtc
caggcacctc gattagttct cgagcttttg gagtacgtcg 8700tctttaggtt ggggggaggg
gttttatgcg atggagtttc cccacactga gtgggtggag 8760actgaagtta ggccagcttg
gcacttgatg taattctcct tggaatttgc cctttttgag 8820tttggatctt ggttcattct
caagcctcag acagtggttc aaagtttttt tcttccattt 8880caggtgaatt cggccattac
ggcccgccac catggctaga ttagataaaa gtaaagtgat 8940taacagcgca ttagagctgc
ttaatgaggt cggaatcgaa ggtttaacaa cccgtaaact 9000cgcccagaag ctaggtgtag
agcagcctac attgtattgg catgtaaaaa ataagcgggc 9060tttgctcgac gccttagcca
ttgagatgtt agataggcac catactcact tttgcccttt 9120agaaggggaa agctggcaag
attttttacg taataacgct aaaagtttta gatgtgcttt 9180actaagtcat cgcgatggag
caaaagtaca tttaggtaca cggcctacag aaaaacagta 9240tgaaactctc gaaaatcaat
tagccttttt atgccaacaa ggtttttcac tagagaatgc 9300attatatgca ctcagcgctg
tggggcattt tactttaggt tgcgtattgg aagatcaaga 9360gcatcaagtc gctaaagaag
aaagggaaac acctactact gatagtatgc cgccattatt 9420acgacaagct atcgaattat
ttgatcacca aggtgcagag ccagccttct tattcggcct 9480tgaattgatc atatgcggat
tagaaaaaca acttaaatgt gaaagtgggt cgccaaaaaa 9540gaagagaaag gtcgacggcg
gtggtgcttt gtctcctcag cactctgctg tcactcaagg 9600aagtatcatc aagaacaagg
agggcatgga tgctaagtca ctaactgcct ggtcccggac 9660actggtgacc ttcaaggatg
tatttgtgga cttcaccagg gaggagtgga agctgctgga 9720cactgctcag cagatcgtgt
acagaaatgt gatgctggag aactataaga acctggtttc 9780cttgggttat cagcttacta
agccagatgt gatcctccgg ttggagaagg gagaagagcc 9840ctggctggtg gagagagaaa
ttcaccaaga gacccatcct gattcagaga ctgcatttga 9900aatcaaatca tcagtttaag
gccgcct 9927

SEQ ID NO: 20 (8916 bp; DNA; Artificial Sequence; Synthetic)
agaattcgat atcaagctta tcgataatca acctctggat
tacaaaattt gtgaaagatt 60gactggtatt cttaactatg ttgctccttt tacgctatgt
ggatacgctg ctttaatgcc 120tttgtatcat gctattactt cccgtacggc tttcattttc
tcctccttgt ataaatcctg 180gttgctgtct ctttatgagg agttgtggcc cgttgtcagg
caacgtggcg tggtgtgcac 240tgtgtttgct gacgcaaccc ccactggttg gggcattgcc
accacctatc aactcctttc 300cgggactttc gctttccccc tccctattgc cacggcggaa
ctcattgccg cctgccttgc 360ccgctgctgg acaggggctc ggctgttggg cactgacaat
tccgtggtgt tgtcggggaa 420gctgacgtcc tttccatggc tgctcgcctg tgttgccaac
tggattctgc gcgggacgtc 480cttctgctac gtcccttcgg ccctcaatcc agcggacctt
ccttcccgcg gcctgctgcc 540ggttctgcgg cctcttccgc gtcttcgcct tcgccctcag
acgagtcgga tctccctttg 600ggccgcctcc ccgcctgcct gcaggtttgt cgagacctag
aaaaacatgg agcaatcaca 660agtagcaata cagcagctac caatgctgat tgtgcctggc
tagaagcaca agaggaggag 720gaggtgggtt ttccagtcac acctcaggta cctttaagac
caatgactta caaggcagct 780gtagatctta gccacttttt aaaagaaaag gggggactgg
aagggctaat tcactcccaa 840cgaagacaag atctgctttt tgcttgtact gggtctctct
ggttagacca gatctgagcc 900tgggagctct ctggctaact agggaaccca ctgcttaagc
ctcaataaag cttgccttga 960gtgcttcaag tagtgtgtgc ccgtctgttg tgtgactctg
gtaactagag atccctcaga 1020cccttttagt cagtgtggaa aatctctagc agggcccgtt
taaacccgct gatcagcctc 1080gactgtgcct tctagttgcc agccatctgt tgtttgcccc
tcccccgtgc cttccttgac 1140cctggaaggt gccactccca ctgtcctttc ctaataaaat
gaggaaattg catcgcattg 1200tctgagtagg tgtcattcta ttctgggggg tggggtgggg
caggacagca agggggagga 1260ttgggaagac aatagcaggc atgctgggga tgcggtgggc
tctatggctt ctgaggcgga 1320aagaaccagc tggggctcta gggggtatcc ccacgcgccc
tgtagcggcg cattaagcgc 1380ggcgggtgtg gtggttacgc gcagcgtgac cgctacactt
gccagcgccc tagcgcccgc 1440tcctttcgct ttcttccctt cctttctcgc cacgttcgcc
ggctttcccc gtcaagctct 1500aaatcggggc atccctttag ggttccgatt tagtgcttta
cggcacctcg accccaaaaa 1560acttgattag ggtgatggtt cacgtagtgg gccatcgccc
tgatagacgg tttttcgccc 1620tttgacgttg gagtccacgt tctttaatag tggactcttg
ttccaaactg gaacaacact 1680caaccctatc tcggtctatt cttttgattt ataagggatt
ttggggattt cggcctattg 1740gttaaaaaat gagctgattt aacaaaaatt taacgcgaat
taattctgtg gaatgtgtgt 1800cagttagggt gtggaaagtc cccaggctcc ccaggcaggc
agaagtatgc aaagcatgca 1860tctcaattag tcagcaacca ggtgtggaaa gtccccaggc
tccccagcag gcagaagtat 1920gcaaagcatg catctcaatt agtcagcaac catagtcccg
cccctaactc cgcccatccc 1980gcccctaact ccgcccagtt ccgcccattc tccgccccat
ggctgactaa ttttttttat 2040ttatgcagag gccgaggccg cctctgcctc tgagctattc
cagaagtagt gaggaggctt 2100ttttggaggc ctaggctttt gcaaaaagct cccgggagct
tgtatatcca ttttcggatc 2160tgatcagcac gtgttgacaa ttaatcatcg gcatagtata
tcggcatagt ataatacgac 2220aaggtgagga actaaaccat ggccaagttg accagtgccg
ttccggtgct caccgcgcgc 2280gacgtcgccg gagcggtcga gttctggacc gaccggctcg
ggttctcccg ggacttcgtg 2340gaggacgact tcgccggtgt ggtccgggac gacgtgaccc
tgttcatcag cgcggtccag 2400gaccaggtgg tgccggacaa caccctggcc tgggtgtggg
tgcgcggcct ggacgagctg 2460tacgccgagt ggtcggaggt cgtgtccacg aacttccggg
acgcctccgg gccggccatg 2520accgagatcg gcgagcagcc gtgggggcgg gagttcgccc
tgcgcgaccc ggccggcaac 2580tgcgtgcact tcgtggccga ggagcaggac tgacacgtgc
tacgagattt cgattccacc 2640gccgccttct atgaaaggtt gggcttcgga atcgttttcc
gggacgccgg ctggatgatc 2700ctccagcgcg gggatctcat gctggagttc ttcgcccacc
ccaacttgtt tattgcagct 2760tataatggtt acaaataaag caatagcatc acaaatttca
caaataaagc atttttttca 2820ctgcattcta gttgtggttt gtccaaactc atcaatgtat
cttatcatgt ctgtataccg 2880tcgacctcta gctagagctt ggcgtaatca tggtcatagc
tgtttcctgt gtgaaattgt 2940tatccgctca caattccaca caacatacga gccggaagca
taaagtgtaa agcctggggt 3000gcctaatgag tgagctaact cacattaatt gcgttgcgct
cactgcccgc tttccagtcg 3060ggaaacctgt cgtgccagct gcattaatga atcggccaac
gcgcggggag aggcggtttg 3120cgtattgggc gctcttccgc ttcctcgctc actgactcgc
tgcgctcggt cgttcggctg 3180cggcgagcgg tatcagctca ctcaaaggcg gtaatacggt
tatccacaga atcaggggat 3240aacgcaggaa agaacatgtg agcaaaaggc cagcaaaagg
ccaggaaccg taaaaaggcc 3300gcgttgctgg cgtttttcca taggctccgc ccccctgacg
agcatcacaa aaatcgacgc 3360tcaagtcaga ggtggcgaaa cccgacagga ctataaagat
accaggcgtt tccccctgga 3420agctccctcg tgcgctctcc tgttccgacc ctgccgctta
ccggatacct gtccgccttt 3480ctcccttcgg gaagcgtggc gctttctcaa tgctcacgct
gtaggtatct cagttcggtg 3540taggtcgttc gctccaagct gggctgtgtg cacgaacccc
ccgttcagcc cgaccgctgc 3600gccttatccg gtaactatcg tcttgagtcc aacccggtaa
gacacgactt atcgccactg 3660gcagcagcca ctggtaacag gattagcaga gcgaggtatg
taggcggtgc tacagagttc 3720ttgaagtggt ggcctaacta cggctacact agaaggacag
tatttggtat ctgcgctctg 3780ctgaagccag ttaccttcgg aaaaagagtt ggtagctctt
gatccggcaa acaaaccacc 3840gctggtagcg gtggtttttt tgtttgcaag cagcagatta
cgcgcagaaa aaaaggatct 3900caagaagatc ctttgatctt ttctacgggg tctgacgctc
agtggaacga aaactcacgt 3960taagggattt tggtcatgag attatcaaaa aggatcttca
cctagatcct tttaaattaa 4020aaatgaagtt ttaaatcaat ctaaagtata tatgagtaaa
cttggtctga cagttaccaa 4080tgcttaatca gtgaggcacc tatctcagcg atctgtctat
ttcgttcatc catagttgcc 4140tgactccccg tcgtgtagat aactacgata cgggagggct
taccatctgg ccccagtgct 4200gcaatgatac cgcgagaccc acgctcaccg gctccagatt
tatcagcaat aaaccagcca 4260gccggaaggg ccgagcgcag aagtggtcct gcaactttat
ccgcctccat ccagtctatt 4320aattgttgcc gggaagctag agtaagtagt tcgccagtta
atagtttgcg caacgttgtt 4380gccattgcta caggcatcgt ggtgtcacgc tcgtcgtttg
gtatggcttc attcagctcc 4440ggttcccaac gatcaaggcg agttacatga tcccccatgt
tgtgcaaaaa agcggttagc 4500tccttcggtc ctccgatcgt tgtcagaagt aagttggccg
cagtgttatc actcatggtt 4560atggcagcac tgcataattc tcttactgtc atgccatccg
taagatgctt ttctgtgact 4620ggtgagtact caaccaagtc attctgagaa tagtgtatgc
ggcgaccgag ttgctcttgc 4680ccggcgtcaa tacgggataa taccgcgcca catagcagaa
ctttaaaagt gctcatcatt 4740ggaaaacgtt cttcggggcg aaaactctca aggatcttac
cgctgttgag atccagttcg 4800atgtaaccca ctcgtgcacc caactgatct tcagcatctt
ttactttcac cagcgtttct 4860gggtgagcaa aaacaggaag gcaaaatgcc gcaaaaaagg
gaataagggc gacacggaaa 4920tgttgaatac tcatactctt cctttttcaa tattattgaa
gcatttatca gggttattgt 4980ctcatgagcg gatacatatt tgaatgtatt tagaaaaata
aacaaatagg ggttccgcgc 5040acatttcccc gaaaagtgcc acctgacgtc gacggatcgg
gagatctccc gatcccctat 5100ggtgcactct cagtacaatc tgctctgatg ccgcatagtt
aagccagtat ctgctccctg 5160cttgtgtgtt ggaggtcgct gagtagtgcg cgagcaaaat
ttaagctaca acaaggcaag 5220gcttgaccga caattgcatg aagaatctgc ttagggttag
gcgttttgcg ctgcttcgcg 5280atgtacgggc cagatatacg cgttgacatt gattattgac
tagttattaa tagtaatcaa 5340ttacggggtc attagttcat agcccatata tggagttccg
cgttacataa cttacggtaa 5400atggcccgcc tggctgaccg cccaacgacc cccgcccatt
gacgtcaata atgacgtatg 5460ttcccatagt aacgccaata gggactttcc attgacgtca
atgggtggac tatttacggt 5520aaactgccca cttggcagta catcaagtgt atcatatgcc
aagtacgccc cctattgacg 5580tcaatgacgg taaatggccc gcctggcatt atgcccagta
catgacctta tgggactttc 5640ctacttggca gtacatctac gtattagtca tcgctattac
catggtgatg cggttttggc 5700agtacatcaa tgggcgtgga tagcggtttg actcacgggg
atttccaagt ctccacccca 5760ttgacgtcaa tgggagtttg ttttggcacc aaaatcaacg
ggactttcca aaatgtcgta 5820acaactccgc cccattgacg caaatgggcg gtaggcgtgt
acggtgggag gtctatataa 5880gcagagctct ctggctaact agagaaccca ctgcttactg
gcttatcgaa attaatacga 5940ctcactatag ggagacccaa gctggtttaa acttaagctt
ggtaccgagc tcactagtcc 6000agtgtggtgg cagatatcca gcacagtggc ggccgctcga
ggggcccgtt ttgcctgtac 6060tgggtctctc tggttagacc agatctgagc ctgggagctc
tctggctaac tagggaaccc 6120actgcttaag cctcaataaa gcttgccttg agtgcttcaa
gtagtgtgtg cccgtctgtt 6180gtgtgactct ggtaactaga gatccctcag acccttttag
tcagtgtgga aaatctctag 6240cagtggcgcc cgaacaggga cttgaaagcg aaagggaaac
cagaggagct ctctcgacgc 6300aggactcggc ttgctgaagc gcgcacggca agaggcgagg
ggcggcgact ggtgagtacg 6360ccaaaaattt tgactagcgg aggctagaag gagagagatg
ggtgcgagag cgtcagtatt 6420aagcggggga gaattagatc gcgatgggaa aaaattcggt
taaggccagg gggaaagaaa 6480aaatataaat taaaacatat agtatgggca agcagggagc
tagaacgatt cgcagttaat 6540cctggcctgt tagaaacatc agaaggctgt agacaaatac
tgggacagct acaaccatcc 6600cttcagacag gatcagaaga acttagatca ttatataata
cagtagcaac cctctattgt 6660gtgcatcaaa ggatagagat aaaagacacc aaggaagctt
tagacaagat agaggaagag 6720caaaacaaaa gtaagaccac cgcacagcaa gcggccgctg
atcttcagac ctggaggagg 6780agatatgagg gacaattgga gaagtgaatt atataaatat
aaagtagtaa aaattgaacc 6840attaggagta gcacccacca aggcaaagag aagagtggtg
cagagagaaa aaagagcagt 6900gggaatagga gctttgttcc ttgggttctt gggagcagca
ggaagcacta tgggcgcagc 6960gtcaatgacg ctgacggtac aggccagaca attattgtct
ggtatagtgc agcagcagaa 7020caatttgctg agggctattg aggcgcaaca gcatctgttg
caactcacag tctggggcat 7080caagcagctc caggcaagaa tcctggctgt ggaaagatac
ctaaaggatc aacagctcct 7140ggggatttgg ggttgctctg gaaaactcat ttgcaccact
gctgtgcctt ggaatgctag 7200ttggagtaat aaatctctgg aacagatttg gaatcacacg
acctggatgg agtgggacag 7260agaaattaac aattacacaa gcttaataca ctccttaatt
gaagaatcgc aaaaccagca 7320agaaaagaat gaacaagaat tattggaatt agataaatgg
gcaagtttgt ggaattggtt 7380taacataaca aattggctgt ggtatataaa attattcata
atgatagtag gaggcttggt 7440aggtttaaga atagtttttg ctgtactttc tatagtgaat
agagttaggc agggatattc 7500accattatcg tttcagaccc acctcccaac cccgagggga
cccgacaggc ccttaattaa 7560tcccctgatt ctgtggataa ccgtattacc gcctttgagt
gagctgcacc tagtacggat 7620tagaagccgc cgagcgggtg acagccctcc gaaggaagac
tctcctccgt gcgtcctcgt 7680cttcaccggt cgcgttcctg aaacgcagat gtgcctcgcg
ccgcactgct ccgaacaatg 7740tcgactctag aggtaaactc gacctatata agcagagctc
gtttagtgaa ccgtcagatc 7800gcctggagac gccatccacg ctgttttgac ctccatagaa
gacaccggga ccgatccagc 7860ctccgcggcc ccgaattgaa ttcggccatt acggcctgcc
accgtgccac catgctggag 7920cccggcgaga agccctacaa gtgccccgag tgcggcaaga
gcttcagcga ctgcagggac 7980ctggccaggc accagaggac ccacaccggc gagaagccct
acaagtgccc cgagtgcggc 8040aagagcttca gcgaccccgg caacctggtg aggcaccaga
ggacccacac cggcgagaag 8100ccctacaagt gccccgagtg cggcaagagc ttcagccaga
gcagcagcct ggtgaggcac 8160cagaggaccc acaccggcga gaagccctac aagtgccccg
agtgcggcaa gagcttcagc 8220cagagggccc acctggagag gcaccagagg acccacaccg
gcgagaagcc ctacaagtgc 8280cccgagtgcg gcaagagctt cagccagagc ggcgacctga
ggaggcacca gaggacccac 8340accggcgaga agccctacaa gtgccccgag tgcggcaaga
gcttcagcca gagcagcaac 8400ctggtgaggc accagaggac ccacaccggc aagaagacca
gcggcccagg cggccgatgc 8460taagtcactg actgcctggt cccggacact ggtgaccttc
aaggatgtgt ttgtggactt 8520caccagggag gagtggaagc tgctggacac tgctcagcag
atcctgtaca gaaatgtgat 8580gctggagaac tataagaacc tggtttcctt gggttatcag
cttactaagc cagatgtgat 8640cctccggttg gagaagggag aagagccctg gctggtggag
agagaaattc accaagagac 8700ccatcctgat tcagagactg catttgaaat caaatcatca
gttgggcgcg ccgacgcgct 8760ggacgatttc gatctcgaca tgctgggttc tgatgccctc
gatgactttg acctggatat 8820gttgggaagc gacgcattgg atgactttga tctggacatg
ctcggctccg atgctctgga 8880cgatttcgat ctcgatatgt taattaacta attaat
8916
SEQ ID NO: 21, length 8916, DNA, Artificial Sequence (Synthetic)
agaattcgat atcaagctta tcgataatca acctctggat tacaaaattt gtgaaagatt
60gactggtatt cttaactatg ttgctccttt tacgctatgt ggatacgctg ctttaatgcc
120tttgtatcat gctattactt cccgtacggc tttcattttc tcctccttgt ataaatcctg
180gttgctgtct ctttatgagg agttgtggcc cgttgtcagg caacgtggcg tggtgtgcac
240tgtgtttgct gacgcaaccc ccactggttg gggcattgcc accacctatc aactcctttc
300cgggactttc gctttccccc tccctattgc cacggcggaa ctcattgccg cctgccttgc
360ccgctgctgg acaggggctc ggctgttggg cactgacaat tccgtggtgt tgtcggggaa
420gctgacgtcc tttccatggc tgctcgcctg tgttgccaac tggattctgc gcgggacgtc
480cttctgctac gtcccttcgg ccctcaatcc agcggacctt ccttcccgcg gcctgctgcc
540ggttctgcgg cctcttccgc gtcttcgcct tcgccctcag acgagtcgga tctccctttg
600ggccgcctcc ccgcctgcct gcaggtttgt cgagacctag aaaaacatgg agcaatcaca
660agtagcaata cagcagctac caatgctgat tgtgcctggc tagaagcaca agaggaggag
720gaggtgggtt ttccagtcac acctcaggta cctttaagac caatgactta caaggcagct
780gtagatctta gccacttttt aaaagaaaag gggggactgg aagggctaat tcactcccaa
840cgaagacaag atctgctttt tgcttgtact gggtctctct ggttagacca gatctgagcc
900tgggagctct ctggctaact agggaaccca ctgcttaagc ctcaataaag cttgccttga
960gtgcttcaag tagtgtgtgc ccgtctgttg tgtgactctg gtaactagag atccctcaga
1020cccttttagt cagtgtggaa aatctctagc agggcccgtt taaacccgct gatcagcctc
1080gactgtgcct tctagttgcc agccatctgt tgtttgcccc tcccccgtgc cttccttgac
1140cctggaaggt gccactccca ctgtcctttc ctaataaaat gaggaaattg catcgcattg
1200tctgagtagg tgtcattcta ttctgggggg tggggtgggg caggacagca agggggagga
1260ttgggaagac aatagcaggc atgctgggga tgcggtgggc tctatggctt ctgaggcgga
1320aagaaccagc tggggctcta gggggtatcc ccacgcgccc tgtagcggcg cattaagcgc
1380ggcgggtgtg gtggttacgc gcagcgtgac cgctacactt gccagcgccc tagcgcccgc
1440tcctttcgct ttcttccctt cctttctcgc cacgttcgcc ggctttcccc gtcaagctct
1500aaatcggggc atccctttag ggttccgatt tagtgcttta cggcacctcg accccaaaaa
1560acttgattag ggtgatggtt cacgtagtgg gccatcgccc tgatagacgg tttttcgccc
1620tttgacgttg gagtccacgt tctttaatag tggactcttg ttccaaactg gaacaacact
1680caaccctatc tcggtctatt cttttgattt ataagggatt ttggggattt cggcctattg
1740gttaaaaaat gagctgattt aacaaaaatt taacgcgaat taattctgtg gaatgtgtgt
1800cagttagggt gtggaaagtc cccaggctcc ccaggcaggc agaagtatgc aaagcatgca
1860tctcaattag tcagcaacca ggtgtggaaa gtccccaggc tccccagcag gcagaagtat
1920gcaaagcatg catctcaatt agtcagcaac catagtcccg cccctaactc cgcccatccc
1980gcccctaact ccgcccagtt ccgcccattc tccgccccat ggctgactaa ttttttttat
2040ttatgcagag gccgaggccg cctctgcctc tgagctattc cagaagtagt gaggaggctt
2100ttttggaggc ctaggctttt gcaaaaagct cccgggagct tgtatatcca ttttcggatc
2160tgatcagcac gtgttgacaa ttaatcatcg gcatagtata tcggcatagt ataatacgac
2220aaggtgagga actaaaccat ggccaagttg accagtgccg ttccggtgct caccgcgcgc
2280gacgtcgccg gagcggtcga gttctggacc gaccggctcg ggttctcccg ggacttcgtg
2340gaggacgact tcgccggtgt ggtccgggac gacgtgaccc tgttcatcag cgcggtccag
2400gaccaggtgg tgccggacaa caccctggcc tgggtgtggg tgcgcggcct ggacgagctg
2460tacgccgagt ggtcggaggt cgtgtccacg aacttccggg acgcctccgg gccggccatg
2520accgagatcg gcgagcagcc gtgggggcgg gagttcgccc tgcgcgaccc ggccggcaac
2580tgcgtgcact tcgtggccga ggagcaggac tgacacgtgc tacgagattt cgattccacc
2640gccgccttct atgaaaggtt gggcttcgga atcgttttcc gggacgccgg ctggatgatc
2700ctccagcgcg gggatctcat gctggagttc ttcgcccacc ccaacttgtt tattgcagct
2760tataatggtt acaaataaag caatagcatc acaaatttca caaataaagc atttttttca
2820ctgcattcta gttgtggttt gtccaaactc atcaatgtat cttatcatgt ctgtataccg
2880tcgacctcta gctagagctt ggcgtaatca tggtcatagc tgtttcctgt gtgaaattgt
2940tatccgctca caattccaca caacatacga gccggaagca taaagtgtaa agcctggggt
3000gcctaatgag tgagctaact cacattaatt gcgttgcgct cactgcccgc tttccagtcg
3060ggaaacctgt cgtgccagct gcattaatga atcggccaac gcgcggggag aggcggtttg
3120cgtattgggc gctcttccgc ttcctcgctc actgactcgc tgcgctcggt cgttcggctg
3180cggcgagcgg tatcagctca ctcaaaggcg gtaatacggt tatccacaga atcaggggat
3240aacgcaggaa agaacatgtg agcaaaaggc cagcaaaagg ccaggaaccg taaaaaggcc
3300gcgttgctgg cgtttttcca taggctccgc ccccctgacg agcatcacaa aaatcgacgc
3360tcaagtcaga ggtggcgaaa cccgacagga ctataaagat accaggcgtt tccccctgga
3420agctccctcg tgcgctctcc tgttccgacc ctgccgctta ccggatacct gtccgccttt
3480ctcccttcgg gaagcgtggc gctttctcaa tgctcacgct gtaggtatct cagttcggtg
3540taggtcgttc gctccaagct gggctgtgtg cacgaacccc ccgttcagcc cgaccgctgc
3600gccttatccg gtaactatcg tcttgagtcc aacccggtaa gacacgactt atcgccactg
3660gcagcagcca ctggtaacag gattagcaga gcgaggtatg taggcggtgc tacagagttc
3720ttgaagtggt ggcctaacta cggctacact agaaggacag tatttggtat ctgcgctctg
3780ctgaagccag ttaccttcgg aaaaagagtt ggtagctctt gatccggcaa acaaaccacc
3840gctggtagcg gtggtttttt tgtttgcaag cagcagatta cgcgcagaaa aaaaggatct
3900caagaagatc ctttgatctt ttctacgggg tctgacgctc agtggaacga aaactcacgt
3960taagggattt tggtcatgag attatcaaaa aggatcttca cctagatcct tttaaattaa
4020aaatgaagtt ttaaatcaat ctaaagtata tatgagtaaa cttggtctga cagttaccaa
4080tgcttaatca gtgaggcacc tatctcagcg atctgtctat ttcgttcatc catagttgcc
4140tgactccccg tcgtgtagat aactacgata cgggagggct taccatctgg ccccagtgct
4200gcaatgatac cgcgagaccc acgctcaccg gctccagatt tatcagcaat aaaccagcca
4260gccggaaggg ccgagcgcag aagtggtcct gcaactttat ccgcctccat ccagtctatt
4320aattgttgcc gggaagctag agtaagtagt tcgccagtta atagtttgcg caacgttgtt
4380gccattgcta caggcatcgt ggtgtcacgc tcgtcgtttg gtatggcttc attcagctcc
4440ggttcccaac gatcaaggcg agttacatga tcccccatgt tgtgcaaaaa agcggttagc
4500tccttcggtc ctccgatcgt tgtcagaagt aagttggccg cagtgttatc actcatggtt
4560atggcagcac tgcataattc tcttactgtc atgccatccg taagatgctt ttctgtgact
4620ggtgagtact caaccaagtc attctgagaa tagtgtatgc ggcgaccgag ttgctcttgc
4680ccggcgtcaa tacgggataa taccgcgcca catagcagaa ctttaaaagt gctcatcatt
4740ggaaaacgtt cttcggggcg aaaactctca aggatcttac cgctgttgag atccagttcg
4800atgtaaccca ctcgtgcacc caactgatct tcagcatctt ttactttcac cagcgtttct
4860gggtgagcaa aaacaggaag gcaaaatgcc gcaaaaaagg gaataagggc gacacggaaa
4920tgttgaatac tcatactctt cctttttcaa tattattgaa gcatttatca gggttattgt
4980ctcatgagcg gatacatatt tgaatgtatt tagaaaaata aacaaatagg ggttccgcgc
5040acatttcccc gaaaagtgcc acctgacgtc gacggatcgg gagatctccc gatcccctat
5100ggtgcactct cagtacaatc tgctctgatg ccgcatagtt aagccagtat ctgctccctg
5160cttgtgtgtt ggaggtcgct gagtagtgcg cgagcaaaat ttaagctaca acaaggcaag
5220gcttgaccga caattgcatg aagaatctgc ttagggttag gcgttttgcg ctgcttcgcg
5280atgtacgggc cagatatacg cgttgacatt gattattgac tagttattaa tagtaatcaa
5340ttacggggtc attagttcat agcccatata tggagttccg cgttacataa cttacggtaa
5400atggcccgcc tggctgaccg cccaacgacc cccgcccatt gacgtcaata atgacgtatg
5460ttcccatagt aacgccaata gggactttcc attgacgtca atgggtggac tatttacggt
5520aaactgccca cttggcagta catcaagtgt atcatatgcc aagtacgccc cctattgacg
5580tcaatgacgg taaatggccc gcctggcatt atgcccagta catgacctta tgggactttc
5640ctacttggca gtacatctac gtattagtca tcgctattac catggtgatg cggttttggc
5700agtacatcaa tgggcgtgga tagcggtttg actcacgggg atttccaagt ctccacccca
5760ttgacgtcaa tgggagtttg ttttggcacc aaaatcaacg ggactttcca aaatgtcgta
5820acaactccgc cccattgacg caaatgggcg gtaggcgtgt acggtgggag gtctatataa
5880gcagagctct ctggctaact agagaaccca ctgcttactg gcttatcgaa attaatacga
5940ctcactatag ggagacccaa gctggtttaa acttaagctt ggtaccgagc tcactagtcc
6000agtgtggtgg cagatatcca gcacagtggc ggccgctcga ggggcccgtt ttgcctgtac
6060tgggtctctc tggttagacc agatctgagc ctgggagctc tctggctaac tagggaaccc
6120actgcttaag cctcaataaa gcttgccttg agtgcttcaa gtagtgtgtg cccgtctgtt
6180gtgtgactct ggtaactaga gatccctcag acccttttag tcagtgtgga aaatctctag
6240cagtggcgcc cgaacaggga cttgaaagcg aaagggaaac cagaggagct ctctcgacgc
6300aggactcggc ttgctgaagc gcgcacggca agaggcgagg ggcggcgact ggtgagtacg
6360ccaaaaattt tgactagcgg aggctagaag gagagagatg ggtgcgagag cgtcagtatt
6420aagcggggga gaattagatc gcgatgggaa aaaattcggt taaggccagg gggaaagaaa
6480aaatataaat taaaacatat agtatgggca agcagggagc tagaacgatt cgcagttaat
6540cctggcctgt tagaaacatc agaaggctgt agacaaatac tgggacagct acaaccatcc
6600cttcagacag gatcagaaga acttagatca ttatataata cagtagcaac cctctattgt
6660gtgcatcaaa ggatagagat aaaagacacc aaggaagctt tagacaagat agaggaagag
6720caaaacaaaa gtaagaccac cgcacagcaa gcggccgctg atcttcagac ctggaggagg
6780agatatgagg gacaattgga gaagtgaatt atataaatat aaagtagtaa aaattgaacc
6840attaggagta gcacccacca aggcaaagag aagagtggtg cagagagaaa aaagagcagt
6900gggaatagga gctttgttcc ttgggttctt gggagcagca ggaagcacta tgggcgcagc
6960gtcaatgacg ctgacggtac aggccagaca attattgtct ggtatagtgc agcagcagaa
7020caatttgctg agggctattg aggcgcaaca gcatctgttg caactcacag tctggggcat
7080caagcagctc caggcaagaa tcctggctgt ggaaagatac ctaaaggatc aacagctcct
7140ggggatttgg ggttgctctg gaaaactcat ttgcaccact gctgtgcctt ggaatgctag
7200ttggagtaat aaatctctgg aacagatttg gaatcacacg acctggatgg agtgggacag
7260agaaattaac aattacacaa gcttaataca ctccttaatt gaagaatcgc aaaaccagca
7320agaaaagaat gaacaagaat tattggaatt agataaatgg gcaagtttgt ggaattggtt
7380taacataaca aattggctgt ggtatataaa attattcata atgatagtag gaggcttggt
7440aggtttaaga atagtttttg ctgtactttc tatagtgaat agagttaggc agggatattc
7500accattatcg tttcagaccc acctcccaac cccgagggga cccgacaggc ccttaattaa
7560tcccctgatt ctgtggataa ccgtattacc gcctttgagt gagctgcacc tagtacggat
7620tagaagccgc cgagcgggtg acagccctcc gaaggaagac tctcctccgt gcgtcctcgt
7680cttcaccggt cgcgttcctg aaacgcagat gtgcctcgcg ccgcactgct ccgaacaatg
7740tcgactctag aggtaaactc gacctatata agcagagctc gtttagtgaa ccgtcagatc
7800gcctggagac gccatccacg ctgttttgac ctccatagaa gacaccggga ccgatccagc
7860ctccgcggcc ccgaattgaa ttcggccatt acggcctgcc accgtgccac catgctggag
7920cccggcgaga agccctacaa gtgccccgag tgcggcaaga gcttcagcag gagcgacgag
7980ctggtgaggc accagaggac ccacaccggc gagaagccct acaagtgccc cgagtgcggc
8040aagagcttca gcaggagcga caagctggtg aggcaccaga ggacccacac cggcgagaag
8100ccctacaagt gccccgagtg cggcaagagc ttcagcagga gcgacgacct ggtgaggcac
8160cagaggaccc acaccggcga gaagccctac aagtgccccg agtgcggcaa gagcttcagc
8220aggagcgaca acctggtgag gcaccagagg acccacaccg gcgagaagcc ctacaagtgc
8280cccgagtgcg gcaagagctt cagcgacccc ggcgccctgg tgaggcacca gaggacccac
8340accggcgaga agccctacaa gtgccccgag tgcggcaaga gcttcagcga ccccggccac
8400ctggtgaggc accagaggac ccacaccggc aagaagacca gcggcccagg cggccgatgc
8460taagtcactg actgcctggt cccggacact ggtgaccttc aaggatgtgt ttgtggactt
8520caccagggag gagtggaagc tgctggacac tgctcagcag atcctgtaca gaaatgtgat
8580gctggagaac tataagaacc tggtttcctt gggttatcag cttactaagc cagatgtgat
8640cctccggttg gagaagggag aagagccctg gctggtggag agagaaattc accaagagac
8700ccatcctgat tcagagactg catttgaaat caaatcatca gttgggcgcg ccgacgcgct
8760ggacgatttc gatctcgaca tgctgggttc tgatgccctc gatgactttg acctggatat
8820gttgggaagc gacgcattgg atgactttga tctggacatg ctcggctccg atgctctgga
8880cgatttcgat ctcgatatgt taattaacta attaat
8916
SEQ ID NO: 22, length 10562, DNA, Artificial Sequence (Synthetic)
cgtgccacca tgctggagcc
cggcgagaag ccctacaagt gccccgagtg cggcaagagc 60ttcagcgaca gcggcaacct
gagggtgcac cagaggaccc acaccggcga gaagccctac 120aagtgccccg agtgcggcaa
gagcttcagc cagagggcca acctgagggc ccaccagagg 180acccacaccg gcgagaagcc
ctacaagtgc cccgagtgcg gcaagagctt cagcaccagc 240ggcagcctgg tgaggcacca
gaggacccac accggcgaga agccctacaa gtgccccgag 300tgcggcaaga gcttcagcac
cagcggccac ctggtgaggc accagaggac ccacaccggc 360gagaagccct acaagtgccc
cgagtgcggc aagagcttca gcaccagcgg cgagctggtg 420aggcaccaga ggacccacac
cggcgagaag ccctacaagt gccccgagtg cggcaagagc 480ttcagcacca gcggcaacct
ggtgaggcac cagaggaccc acaccggcaa gaagaccagc 540ggcccaggcg gccgatgcta
agtcactgac tgcctggtcc cggacactgg tgaccttcaa 600ggatgtgttt gtggacttca
ccagggagga gtggaagctg ctggacactg ctcagcagat 660cctgtacaga aatgtgatgc
tggagaacta taagaacctg gtttccttgg gttatcagct 720tactaagcca gatgtgatcc
tccggttgga gaagggagaa gagccctggc tggtggagag 780agaaattcac caagagaccc
atcctgattc agagactgca tttgaaatca aatcatcagt 840tgggcgcgcc gacgcgctgg
acgatttcga tctcgacatg ctgggttctg atgccctcga 900tgactttgac ctggatatgt
tgggaagcga cgcattggat gactttgatc tggacatgct 960cggctccgat gctctggacg
atttcgatct cgatatgtta attaactaac gatcacatgg 1020tcctgctgga gttcgtgacc
gccgccggga tcactctcgg catggacgag ctgtacaagt 1080aagcggccgc aatcaacctc
tggattacaa aatttgtgaa agattgactg gtattcttaa 1140ctatgttgct ccttttacgc
tatgtggata cgctgcttta atgcctttgt atcatgctat 1200tacttcccgt acggctttca
ttttctcctc cttgtataaa tcctggttgc tgtctcttta 1260tgaggagttg tggcccgttg
tcaggcaacg tggcgtggtg tgcactgtgt ttgctgacgc 1320aacccccact ggttggggca
ttgccaccac ctatcaactc ctttccggga ctttcgcttt 1380ccccctccct attgccacgg
cggaactcat tgccgcctgc cttgcccgct gctggacagg 1440ggctcggctg ttgggcactg
acaattccgt ggtgttgtcg gggaagctga cgtcctttcc 1500atggctgctc gcctgtgttg
ccaactggat tctgcgcggg acgtccttct gctacgtccc 1560ttcggccctc aatccagcgg
accttccttc ccgcggcctg ctgccggttc tgcggcctct 1620tccgcgtctt cgccttcgcc
ctcagacgag tcggatctcc ctttgggccg cctccccgcc 1680tgcctgcagg tttgtcgaga
cctagaaaaa catggagcaa tcacaagtag caatacagca 1740gctaccaatg ctgattgtgc
ctggctagaa gcacaagagg aggaggaggt gggttttcca 1800gtcacacctc aggtaccttt
aagaccaatg acttacaagg cagctgtaga tcttagccac 1860tttttaaaag aaaagggggg
actggaaggg ctaattcact cccaacgaag acaagatctg 1920ctttttgctt gtactgggtc
tctctggtta gaccagatct gagcctggga gctctctggc 1980taactaggga acccactgct
taagcctcaa taaagcttgc cttgagtgct tcaagtagtg 2040tgtgcccgtc tgttgtgtga
ctctggtaac tagagatccc tcagaccctt ttagtcagtg 2100tggaaaatct ctagcagggc
ccgtttaaac ccgctgatca gcctcgactg tgccttctag 2160ttgccagcca tctgttgttt
gcccctcccc cgtgccttcc ttgaccctgg aaggtgccac 2220tcccactgtc ctttcctaat
aaaatgagga aattgcatcg cattgtctga gtaggtgtca 2280ttctattctg gggggtgggg
tggggcagga cagcaagggg gaggattggg aagacaatag 2340caggcatgct ggggatgcgg
tgggctctat ggcttctgag gcggaaagaa ccagctgggg 2400ctctaggggg tatccccacg
cgccctgtag cggcgcatta agcgcggcgg gtgtggtggt 2460tacgcgcagc gtgaccgcta
cacttgccag cgccctagcg cccgctcctt tcgctttctt 2520cccttccttt ctcgccacgt
tcgccggctt tccccgtcaa gctctaaatc ggggcatccc 2580tttagggttc cgatttagtg
ctttacggca cctcgacccc aaaaaacttg attagggtga 2640tggttcacgt agtgggccat
cgccctgata gacggttttt cgccctttga cgttggagtc 2700cacgttcttt aatagtggac
tcttgttcca aactggaaca acactcaacc ctatctcggt 2760ctattctttt gatttataag
ggattttggg gatttcggcc tattggttaa aaaatgagct 2820gatttaacaa aaatttaacg
cgaattaatt ctgtggaatg tgtgtcagtt agggtgtgga 2880aagtccccag gctccccagg
caggcagaag tatgcaaagc atgcatctca attagtcagc 2940aaccaggtgt ggaaagtccc
caggctcccc agcaggcaga agtatgcaaa gcatgcatct 3000caattagtca gcaaccatag
tcccgcccct aactccgccc atcccgcccc taactccgcc 3060cagttccgcc cattctccgc
cccatggctg actaattttt tttatttatg cagaggccga 3120ggccgcctct gcctctgagc
tattccagaa gtagtgagga ggcttttttg gaggcctagg 3180cttttgcaaa aagctcccgg
gagcttgtat atccattttc ggatctgatc agcacgtgtt 3240gacaattaat catcggcata
gtatatcggc atagtataat acgacaaggt gaggaactaa 3300accatggcca agttgaccag
tgccgttccg gtgctcaccg cgcgcgacgt cgccggagcg 3360gtcgagttct ggaccgaccg
gctcgggttc tcccgggact tcgtggagga cgacttcgcc 3420ggtgtggtcc gggacgacgt
gaccctgttc atcagcgcgg tccaggacca ggtggtgccg 3480gacaacaccc tggcctgggt
gtgggtgcgc ggcctggacg agctgtacgc cgagtggtcg 3540gaggtcgtgt ccacgaactt
ccgggacgcc tccgggccgg ccatgaccga gatcggcgag 3600cagccgtggg ggcgggagtt
cgccctgcgc gacccggccg gcaactgcgt gcacttcgtg 3660gccgaggagc aggactgaca
cgtgctacga gatttcgatt ccaccgccgc cttctatgaa 3720aggttgggct tcggaatcgt
tttccgggac gccggctgga tgatcctcca gcgcggggat 3780ctcatgctgg agttcttcgc
ccaccccaac ttgtttattg cagcttataa tggttacaaa 3840taaagcaata gcatcacaaa
tttcacaaat aaagcatttt tttcactgca ttctagttgt 3900ggtttgtcca aactcatcaa
tgtatcttat catgtctgta taccgtcgac ctctagctag 3960agcttggcgt aatcatggtc
atagctgttt cctgtgtgaa attgttatcc gctcacaatt 4020ccacacaaca tacgagccgg
aagcataaag tgtaaagcct ggggtgccta atgagtgagc 4080taactcacat taattgcgtt
gcgctcactg cccgctttcc agtcgggaaa cctgtcgtgc 4140cagctgcatt aatgaatcgg
ccaacgcgcg gggagaggcg gtttgcgtat tgggcgctct 4200tccgcttcct cgctcactga
ctcgctgcgc tcggtcgttc ggctgcggcg agcggtatca 4260gctcactcaa aggcggtaat
acggttatcc acagaatcag gggataacgc aggaaagaac 4320atgtgagcaa aaggccagca
aaaggccagg aaccgtaaaa aggccgcgtt gctggcgttt 4380ttccataggc tccgcccccc
tgacgagcat cacaaaaatc gacgctcaag tcagaggtgg 4440cgaaacccga caggactata
aagataccag gcgtttcccc ctggaagctc cctcgtgcgc 4500tctcctgttc cgaccctgcc
gcttaccgga tacctgtccg cctttctccc ttcgggaagc 4560gtggcgcttt ctcaatgctc
acgctgtagg tatctcagtt cggtgtaggt cgttcgctcc 4620aagctgggct gtgtgcacga
accccccgtt cagcccgacc gctgcgcctt atccggtaac 4680tatcgtcttg agtccaaccc
ggtaagacac gacttatcgc cactggcagc agccactggt 4740aacaggatta gcagagcgag
gtatgtaggc ggtgctacag agttcttgaa gtggtggcct 4800aactacggct acactagaag
gacagtattt ggtatctgcg ctctgctgaa gccagttacc 4860ttcggaaaaa gagttggtag
ctcttgatcc ggcaaacaaa ccaccgctgg tagcggtggt 4920ttttttgttt gcaagcagca
gattacgcgc agaaaaaaag gatctcaaga agatcctttg 4980atcttttcta cggggtctga
cgctcagtgg aacgaaaact cacgttaagg gattttggtc 5040atgagattat caaaaaggat
cttcacctag atccttttaa attaaaaatg aagttttaaa 5100tcaatctaaa gtatatatga
gtaaacttgg tctgacagtt accaatgctt aatcagtgag 5160gcacctatct cagcgatctg
tctatttcgt tcatccatag ttgcctgact ccccgtcgtg 5220tagataacta cgatacggga
gggcttacca tctggcccca gtgctgcaat gataccgcga 5280gacccacgct caccggctcc
agatttatca gcaataaacc agccagccgg aagggccgag 5340cgcagaagtg gtcctgcaac
tttatccgcc tccatccagt ctattaattg ttgccgggaa 5400gctagagtaa gtagttcgcc
agttaatagt ttgcgcaacg ttgttgccat tgctacaggc 5460atcgtggtgt cacgctcgtc
gtttggtatg gcttcattca gctccggttc ccaacgatca 5520aggcgagtta catgatcccc
catgttgtgc aaaaaagcgg ttagctcctt cggtcctccg 5580atcgttgtca gaagtaagtt
ggccgcagtg ttatcactca tggttatggc agcactgcat 5640aattctctta ctgtcatgcc
atccgtaaga tgcttttctg tgactggtga gtactcaacc 5700aagtcattct gagaatagtg
tatgcggcga ccgagttgct cttgcccggc gtcaatacgg 5760gataataccg cgccacatag
cagaacttta aaagtgctca tcattggaaa acgttcttcg 5820gggcgaaaac tctcaaggat
cttaccgctg ttgagatcca gttcgatgta acccactcgt 5880gcacccaact gatcttcagc
atcttttact ttcaccagcg tttctgggtg agcaaaaaca 5940ggaaggcaaa atgccgcaaa
aaagggaata agggcgacac ggaaatgttg aatactcata 6000ctcttccttt ttcaatatta
ttgaagcatt tatcagggtt attgtctcat gagcggatac 6060atatttgaat gtatttagaa
aaataaacaa ataggggttc cgcgcacatt tccccgaaaa 6120gtgccacctg acgtcgacgg
atcgggagat ctcccgatcc cctatggtgc actctcagta 6180caatctgctc tgatgccgca
tagttaagcc agtatctgct ccctgcttgt gtgttggagg 6240tcgctgagta gtgcgcgagc
aaaatttaag ctacaacaag gcaaggcttg accgacaatt 6300gcatgaagaa tctgcttagg
gttaggcgtt ttgcgctgct tcgcgatgta cgggccagat 6360atacgcgttg acattgatta
ttgactagtt attaatagta atcaattacg gggtcattag 6420ttcatagccc atatatggag
ttccgcgtta cataacttac ggtaaatggc ccgcctggct 6480gaccgcccaa cgacccccgc
ccattgacgt caataatgac gtatgttccc atagtaacgc 6540caatagggac tttccattga
cgtcaatggg tggactattt acggtaaact gcccacttgg 6600cagtacatca agtgtatcat
atgccaagta cgccccctat tgacgtcaat gacggtaaat 6660ggcccgcctg gcattatgcc
cagtacatga ccttatggga ctttcctact tggcagtaca 6720tctacgtatt agtcatcgct
attaccatgg tgatgcggtt ttggcagtac atcaatgggc 6780gtggatagcg gtttgactca
cggggatttc caagtctcca ccccattgac gtcaatggga 6840gtttgttttg gcaccaaaat
caacgggact ttccaaaatg tcgtaacaac tccgccccat 6900tgacgcaaat gggcggtagg
cgtgtacggt gggaggtcta tataagcaga gctctctggc 6960taactagaga acccactgct
tactggctta tcgaaattaa tacgactcac tatagggaga 7020cccaagctgg tttaaactta
agcttggtac cgagctcact agtccagtgt ggtggcagat 7080atccagcaca gtggcggccg
ctcgagtcta gagggcccgt tttgcctgta ctgggtctct 7140ctggttagac cagatctgag
cctgggagct ctctggctaa ctagggaacc cactgcttaa 7200gcctcaataa agcttgcctt
gagtgcttca agtagtgtgt gcccgtctgt tgtgtgactc 7260tggtaactag agatccctca
gaccctttta gtcagtgtgg aaaatctcta gcagtggcgc 7320ccgaacaggg acttgaaagc
gaaagggaaa ccagaggagc tctctcgacg caggactcgg 7380cttgctgaag cgcgcacggc
aagaggcgag gggcggcgac tggtgagtac gccaaaaatt 7440ttgactagcg gaggctagaa
ggagagagat gggtgcgaga gcgtcagtat taagcggggg 7500agaattagat cgcgatggga
aaaaattcgg ttaaggccag ggggaaagaa aaaatataaa 7560ttaaaacata tagtatgggc
aagcagggag ctagaacgat tcgcagttaa tcctggcctg 7620ttagaaacat cagaaggctg
tagacaaata ctgggacagc tacaaccatc ccttcagaca 7680ggatcagaag aacttagatc
attatataat acagtagcaa ccctctattg tgtgcatcaa 7740aggatagaga taaaagacac
caaggaagct ttagacaaga tagaggaaga gcaaaacaaa 7800agtaagacca ccgcacagca
agcggccgct gatcttcaga cctggaggag gagatatgag 7860ggacaattgg agaagtgaat
tatataaata taaagtagta aaaattgaac cattaggagt 7920agcacccacc aaggcaaaga
gaagagtggt gcagagagaa aaaagagcag tgggaatagg 7980agctttgttc cttgggttct
tgggagcagc aggaagcact atgggcgcag cgtcaatgac 8040gctgacggta caggccagac
aattattgtc tggtatagtg cagcagcaga acaatttgct 8100gagggctatt gaggcgcaac
agcatctgtt gcaactcaca gtctggggca tcaagcagct 8160ccaggcaaga atcctggctg
tggaaagata cctaaaggat caacagctcc tggggatttg 8220gggttgctct ggaaaactca
tttgcaccac tgctgtgcct tggaatgcta gttggagtaa 8280taaatctctg gaacagattt
ggaatcacac gacctggatg gagtgggaca gagaaattaa 8340caattacaca agcttaatac
actccttaat tgaagaatcg caaaaccagc aagaaaagaa 8400tgaacaagaa ttattggaat
tagataaatg ggcaagtttg tggaattggt ttaacataac 8460aaattggctg tggtatatag
aaattattca taatgatagt aggaggcttg gtaggtttaa 8520gaatagtttt tgctgtactt
tctatagtga atagagttag gcagggatat tcaccattat 8580cgtttcagac ccacctccca
accccgaggg gacccgacag gcccgaagga atagaagaag 8640aaggtggaga gagagacaga
gacagatcca ttcgattagt gaacggatcg gcactgcgtg 8700cgccaattct gcagacaaat
ggcagtattc atccacaatt ttaaaagaaa aggggggatt 8760ggggggtaca gtgcagggga
aagaatagta gacataatag caacagacat acaaactaaa 8820gaattacaaa aacaaattac
aaaaattcaa aattttcggg tttattacag ggacagcaga 8880gatccagttt ggggttgctc
tggaaaactc atttgcacca ctgctgtgcc ttggaatgct 8940agttggagta ataaatctct
ggaacagatt tggaatcaca cgacctggat ggagtgggac 9000agagaaatta acaattacac
aagcttaata cactccttaa ttgaagaatc gcaaaaccag 9060caagaaaaga atgaacaaga
attattggaa ttagataaat gggcaagttt gtggaattgg 9120tttaacataa caaattggct
gtggtatata aaattattca taatgatagt aggaggcttg 9180gtaggtttaa gaatagtttt
tgctgtactt tctatagtga atagagttag gcagggatat 9240tcaccattat cgtttcagac
ccacctccca accccgaggg gacccgacag gcccttaatt 9300aattggctcc ggtgcccgtc
agtgggcaga gcgcacatcg cccacagtcc ccgagaagtt 9360ggggggaggg gtcggcaatt
gaaccggtgc ctagagaagg tggcgcgggg taaactggga 9420aagtgatgtc gtgtactggc
tccgcctttt tcccgagggt gggggagaac cgtatataag 9480tgcagtagtc gccgtgaagc
tagcggcgtc gaggcggggg tgatatggcg tcgaggcggg 9540ggtggctagc cgttcttttt
cgcaacgggt ttgccgccag aacacaggta agtgccgtgt 9600gtggttcccg cgggcctggc
ctctttacgg gttatggccc ttgcgtgcct tgaattactt 9660ccacctggct gcagtacgtg
attcttgatc ccgagcttcg ggttggaagt gggtgggaga 9720gttcgaggcc ttgcgcttaa
ggagcccctt cgcctcgtgc ttgagttgag gcctggcctg 9780ggcgctgggg ccgccgcgtg
cgaatctggt ggcaccttcg cgcctgtctc gctgctttcg 9840ataagtctct agccatttaa
aatttttgat gacctgctgc gacgcttttt ttctggcaag 9900atagtcttgt aaatgcgggc
caagatctgc acactggtat ttcggttttt ggggccgcgg 9960gcggcgacgg ggcccgtgcg
tcccagcgca catgttcggc gaggcggggc ctgcgagcgc 10020ggccaccgag aatcggacgg
gggtagtctc aagctggccg gcctgctctg gtgcctggcc 10080tcgcgccgcc gtgtatcgcc
ccgccctggg cggcaaggct ggcccggtcg gcaccagttg 10140cgtgagcgga aagatggccg
cttcccggcc ctgctgcagg gagctcaaaa tggaggacgc 10200ggcgctcggg agagcgggcg
ggtgagtcac ccacacaaag gaaaagggcc tttccgtcct 10260cagccgtcgc ttcatgtgac
tccacggagt accgggcgcc gtccaggcac ctcgattagt 10320tctcgagctt ttggagtacg
tcgtctttag gttgggggga ggggttttat gcgatggagt 10380ttccccacac tgagtgggtg
gagactgaag ttaggccagc ttggcacttg atgtaattct 10440ccttggaatt tgcccttttt
gagtttggat cttggttcat tctcaagcct cagacagtgg 10500ttcaaagttt ttttcttcca
tttcaggtga attcggccat tacggcctcc caccggccgc 10560ct
10562

SEQ ID NO: 23
Length: 9215
Type: DNA
Organism: Artificial Sequence
Other information: Synthetic

ttgccaccat gctggagccc ggcgagaagc cctacaagtg
ccccgagtgc ggcaagagct 60tcagcaccca cctggacctg atcaggcacc agaggaccca
caccggcgag aagccctaca 120agtgccccga gtgcggcaag agcttcagca ggaccgacac
cctgagggac caccagagga 180cccacaccgg cgagaagccc tacaagtgcc ccgagtgcgg
caagagcttc agcgacaaga 240aggacctgac caggcaccag aggacccaca ccggcgagaa
gccctacaag tgccccgagt 300gcggcaagag cttcagcagc cccgccgacc tgaccaggca
ccagaggacc cacaccggcg 360agaagcccta caagtgcccc gagtgcggca agagcttcag
caccaccggc aacctgaccg 420tgcaccagag gacccacacc ggcgagaagc cctacaagtg
ccccgagtgc ggcaagagct 480tcagcaggaa ggacaacctg aagaaccacc agaggaccca
caccggcaag aagaccagcg 540gcccaggcgg ccgatgctaa gtcactgact gcctggtccc
ggacactggt gaccttcaag 600gatgtgtttg tggacttcac cagggaggag tggaagctgc
tggacactgc tcagcagatc 660ctgtacagaa atgtgatgct ggagaactat aagaacctgg
tttccttggg ttatcagctt 720actaagccag atgtgatcct ccggttggag aagggagaag
agccctggct ggtggagaga 780gaaattcacc aagagaccca tcctgattca gagactgcat
ttgaaatcaa atcatcagtt 840gggcgcgccg acgcgctgga cgatttcgat ctcgacatgc
tgggttctga tgccctcgat 900gactttgacc tggatatgtt gggaagcgac gcattggatg
actttgatct ggacatgctc 960ggctccgatg ctctggacga tttcgatctc gatatgttaa
ttaactaata tgtgtgggag 1020ggctaagggc gcgccgttct agagaattcg atatcaagct
tatcgataat caacctctgg 1080attacaaaat ttgtgaaaga ttgactggta ttcttaacta
tgttgctcct tttacgctat 1140gtggatacgc tgctttaatg cctttgtatc atgctattac
ttcccgtacg gctttcattt 1200tctcctcctt gtataaatcc tggttgctgt ctctttatga
ggagttgtgg cccgttgtca 1260ggcaacgtgg cgtggtgtgc actgtgtttg ctgacgcaac
ccccactggt tggggcattg 1320ccaccaccta tcaactcctt tccgggactt tcgctttccc
cctccctatt gccacggcgg 1380aactcattgc cgcctgcctt gcccgctgct ggacaggggc
tcggctgttg ggcactgaca 1440attccgtggt gttgtcgggg aagctgacgt cctttccatg
gctgctcgcc tgtgttgcca 1500actggattct gcgcgggacg tccttctgct acgtcccttc
ggccctcaat ccagcggacc 1560ttccttcccg cggcctgctg ccggttctgc ggcctcttcc
gcgtcttcgc cttcgccctc 1620agacgagtcg gatctccctt tgggccgcct ccccgcctgc
ctgcaggttt gtcgagacct 1680agaaaaacat ggagcaatca caagtagcaa tacagcagct
accaatgctg attgtgcctg 1740gctagaagca caagaggagg aggaggtggg ttttccagtc
acacctcagg tacctttaag 1800accaatgact tacaaggcag ctgtagatct tagccacttt
ttaaaagaaa aggggggact 1860ggaagggcta attcactccc aacgaagaca agatctgctt
tttgcttgta ctgggtctct 1920ctggttagac cagatctgag cctgggagct ctctggctaa
ctagggaacc cactgcttaa 1980gcctcaataa agcttgcctt gagtgcttca agtagtgtgt
gcccgtctgt tgtgtgactc 2040tggtaactag agatccctca gaccctttta gtcagtgtgg
aaaatctcta gcagggcccg 2100tttaaacccg ctgatcagcc tcgactgtgc cttctagttg
ccagccatct gttgtttgcc 2160cctcccccgt gccttccttg accctggaag gtgccactcc
cactgtcctt tcctaataaa 2220atgaggaaat tgcatcgcat tgtctgagta ggtgtcattc
tattctgggg ggtggggtgg 2280ggcaggacag caagggggag gattgggaag acaatagcag
gcatgctggg gatgcggtgg 2340gctctatggc ttctgaggcg gaaagaacca gctggggctc
tagggggtat ccccacgcgc 2400cctgtagcgg cgcattaagc gcggcgggtg tggtggttac
gcgcagcgtg accgctacac 2460ttgccagcgc cctagcgccc gctcctttcg ctttcttccc
ttcctttctc gccacgttcg 2520ccggctttcc ccgtcaagct ctaaatcggg gcatcccttt
agggttccga tttagtgctt 2580tacggcacct cgaccccaaa aaacttgatt agggtgatgg
ttcacgtagt gggccatcgc 2640cctgatagac ggtttttcgc cctttgacgt tggagtccac
gttctttaat agtggactct 2700tgttccaaac tggaacaaca ctcaacccta tctcggtcta
ttcttttgat ttataaggga 2760ttttggggat ttcggcctat tggttaaaaa atgagctgat
ttaacaaaaa tttaacgcga 2820attaattctg tggaatgtgt gtcagttagg gtgtggaaag
tccccaggct ccccaggcag 2880gcagaagtat gcaaagcatg catctcaatt agtcagcaac
caggtgtgga aagtccccag 2940gctccccagc aggcagaagt atgcaaagca tgcatctcaa
ttagtcagca accatagtcc 3000cgcccctaac tccgcccatc ccgcccctaa ctccgcccag
ttccgcccat tctccgcccc 3060atggctgact aatttttttt atttatgcag aggccgaggc
cgcctctgcc tctgagctat 3120tccagaagta gtgaggaggc ttttttggag gcctaggctt
ttgcaaaaag ctcccgggag 3180cttgtatatc cattttcgga tctgatcagc acgtgttgac
aattaatcat cggcatagta 3240tatcggcata gtataatacg acaaggtgag gaactaaacc
atggccaagt tgaccagtgc 3300cgttccggtg ctcaccgcgc gcgacgtcgc cggagcggtc
gagttctgga ccgaccggct 3360cgggttctcc cgggacttcg tggaggacga cttcgccggt
gtggtccggg acgacgtgac 3420cctgttcatc agcgcggtcc aggaccaggt ggtgccggac
aacaccctgg cctgggtgtg 3480ggtgcgcggc ctggacgagc tgtacgccga gtggtcggag
gtcgtgtcca cgaacttccg 3540ggacgcctcc gggccggcca tgaccgagat cggcgagcag
ccgtgggggc gggagttcgc 3600cctgcgcgac ccggccggca actgcgtgca cttcgtggcc
gaggagcagg actgacacgt 3660gctacgagat ttcgattcca ccgccgcctt ctatgaaagg
ttgggcttcg gaatcgtttt 3720ccgggacgcc ggctggatga tcctccagcg cggggatctc
atgctggagt tcttcgccca 3780ccccaacttg tttattgcag cttataatgg ttacaaataa
agcaatagca tcacaaattt 3840cacaaataaa gcattttttt cactgcattc tagttgtggt
ttgtccaaac tcatcaatgt 3900atcttatcat gtctgtatac cgtcgacctc tagctagagc
ttggcgtaat catggtcata 3960gctgtttcct gtgtgaaatt gttatccgct cacaattcca
cacaacatac gagccggaag 4020cataaagtgt aaagcctggg gtgcctaatg agtgagctaa
ctcacattaa ttgcgttgcg 4080ctcactgccc gctttccagt cgggaaacct gtcgtgccag
ctgcattaat gaatcggcca 4140acgcgcgggg agaggcggtt tgcgtattgg gcgctcttcc
gcttcctcgc tcactgactc 4200gctgcgctcg gtcgttcggc tgcggcgagc ggtatcagct
cactcaaagg cggtaatacg 4260gttatccaca gaatcagggg ataacgcagg aaagaacatg
tgagcaaaag gccagcaaaa 4320ggccaggaac cgtaaaaagg ccgcgttgct ggcgtttttc
cataggctcc gcccccctga 4380cgagcatcac aaaaatcgac gctcaagtca gaggtggcga
aacccgacag gactataaag 4440ataccaggcg tttccccctg gaagctccct cgtgcgctct
cctgttccga ccctgccgct 4500taccggatac ctgtccgcct ttctcccttc gggaagcgtg
gcgctttctc aatgctcacg 4560ctgtaggtat ctcagttcgg tgtaggtcgt tcgctccaag
ctgggctgtg tgcacgaacc 4620ccccgttcag cccgaccgct gcgccttatc cggtaactat
cgtcttgagt ccaacccggt 4680aagacacgac ttatcgccac tggcagcagc cactggtaac
aggattagca gagcgaggta 4740tgtaggcggt gctacagagt tcttgaagtg gtggcctaac
tacggctaca ctagaaggac 4800agtatttggt atctgcgctc tgctgaagcc agttaccttc
ggaaaaagag ttggtagctc 4860ttgatccggc aaacaaacca ccgctggtag cggtggtttt
tttgtttgca agcagcagat 4920tacgcgcaga aaaaaaggat ctcaagaaga tcctttgatc
ttttctacgg ggtctgacgc 4980tcagtggaac gaaaactcac gttaagggat tttggtcatg
agattatcaa aaaggatctt 5040cacctagatc cttttaaatt aaaaatgaag ttttaaatca
atctaaagta tatatgagta 5100aacttggtct gacagttacc aatgcttaat cagtgaggca
cctatctcag cgatctgtct 5160atttcgttca tccatagttg cctgactccc cgtcgtgtag
ataactacga tacgggaggg 5220cttaccatct ggccccagtg ctgcaatgat accgcgagac
ccacgctcac cggctccaga 5280tttatcagca ataaaccagc cagccggaag ggccgagcgc
agaagtggtc ctgcaacttt 5340atccgcctcc atccagtcta ttaattgttg ccgggaagct
agagtaagta gttcgccagt 5400taatagtttg cgcaacgttg ttgccattgc tacaggcatc
gtggtgtcac gctcgtcgtt 5460tggtatggct tcattcagct ccggttccca acgatcaagg
cgagttacat gatcccccat 5520gttgtgcaaa aaagcggtta gctccttcgg tcctccgatc
gttgtcagaa gtaagttggc 5580cgcagtgtta tcactcatgg ttatggcagc actgcataat
tctcttactg tcatgccatc 5640cgtaagatgc ttttctgtga ctggtgagta ctcaaccaag
tcattctgag aatagtgtat 5700gcggcgaccg agttgctctt gcccggcgtc aatacgggat
aataccgcgc cacatagcag 5760aactttaaaa gtgctcatca ttggaaaacg ttcttcgggg
cgaaaactct caaggatctt 5820accgctgttg agatccagtt cgatgtaacc cactcgtgca
cccaactgat cttcagcatc 5880ttttactttc accagcgttt ctgggtgagc aaaaacagga
aggcaaaatg ccgcaaaaaa 5940gggaataagg gcgacacgga aatgttgaat actcatactc
ttcctttttc aatattattg 6000aagcatttat cagggttatt gtctcatgag cggatacata
tttgaatgta tttagaaaaa 6060taaacaaata ggggttccgc gcacatttcc ccgaaaagtg
ccacctgacg tcgacggatc 6120gggagatctc ccgatcccct atggtgcact ctcagtacaa
tctgctctga tgccgcatag 6180ttaagccagt atctgctccc tgcttgtgtg ttggaggtcg
ctgagtagtg cgcgagcaaa 6240atttaagcta caacaaggca aggcttgacc gacaattgca
tgaagaatct gcttagggtt 6300aggcgttttg cgctgcttcg cgatgtacgg gccagatata
cgcgttgaca ttgattattg 6360actagttatt aatagtaatc aattacgggg tcattagttc
atagcccata tatggagttc 6420cgcgttacat aacttacggt aaatggcccg cctggctgac
cgcccaacga cccccgccca 6480ttgacgtcaa taatgacgta tgttcccata gtaacgccaa
tagggacttt ccattgacgt 6540caatgggtgg actatttacg gtaaactgcc cacttggcag
tacatcaagt gtatcatatg 6600ccaagtacgc cccctattga cgtcaatgac ggtaaatggc
ccgcctggca ttatgcccag 6660tacatgacct tatgggactt tcctacttgg cagtacatct
acgtattagt catcgctatt 6720accatggtga tgcggttttg gcagtacatc aatgggcgtg
gatagcggtt tgactcacgg 6780ggatttccaa gtctccaccc cattgacgtc aatgggagtt
tgttttggca ccaaaatcaa 6840cgggactttc caaaatgtcg taacaactcc gccccattga
cgcaaatggg cggtaggcgt 6900gtacggtggg aggtctatat aagcagagct ctctggctaa
ctagagaacc cactgcttac 6960tggcttatcg aaattaatac gactcactat agggagaccc
aagctggttt aaacttaagc 7020ttggtaccga gctcactagt ccagtgtggt ggcagatatc
cagcacagtg gcggccgctc 7080gaggggcccg ttttgcctgt actgggtctc tctggttaga
ccagatctga gcctgggagc 7140tctctggcta actagggaac ccactgctta agcctcaata
aagcttgcct tgagtgcttc 7200aagtagtgtg tgcccgtctg ttgtgtgact ctggtaacta
gagatccctc agaccctttt 7260agtcagtgtg gaaaatctct agcagtggcg cccgaacagg
gacttgaaag cgaaagggaa 7320accagaggag ctctctcgac gcaggactcg gcttgctgaa
gcgcgcacgg caagaggcga 7380ggggcggcga ctggtgagta cgccaaaaat tttgactagc
ggaggctaga aggagagaga 7440tgggtgcgag agcgtcagta ttaagcgggg gagaattaga
tcgcgatggg aaaaaattcg 7500gttaaggcca gggggaaaga aaaaatataa attaaaacat
atagtatggg caagcaggga 7560gctagaacga ttcgcagtta atcctggcct gttagaaaca
tcagaaggct gtagacaaat 7620actgggacag ctacaaccat cccttcagac aggatcagaa
gaacttagat cattatataa 7680tacagtagca accctctatt gtgtgcatca aaggatagag
ataaaagaca ccaaggaagc 7740tttagacaag atagaggaag agcaaaacaa aagtaagacc
accgcacagc aagcggccgc 7800tgatcttcag acctggagga ggagatatga gggacaattg
gagaagtgaa ttatataaat 7860ataaagtagt aaaaattgaa ccattaggag tagcacccac
caaggcaaag agaagagtgg 7920tgcagagaga aaaaagagca gtgggaatag gagctttgtt
ccttgggttc ttgggagcag 7980caggaagcac tatgggcgca gcgtcaatga cgctgacggt
acaggccaga caattattgt 8040ctggtatagt gcagcagcag aacaatttgc tgagggctat
tgaggcgcaa cagcatctgt 8100tgcaactcac agtctggggc atcaagcagc tccaggcaag
aatcctggct gtggaaagat 8160acctaaagga tcaacagctc ctggggattt ggggttgctc
tggaaaactc atttgcacca 8220ctgctgtgcc ttggaatgct agttggagta ataaatctct
ggaacagatt tggaatcaca 8280cgacctggat ggagtgggac agagaaatta acaattacac
aagcttaata cactccttaa 8340ttgaagaatc gcaaaaccag caagaaaaga atgaacaaga
attattggaa ttagataaat 8400gggcaagttt gtggaattgg tttaacataa caaattggct
gtggtatata aaattattca 8460taatgatagt aggaggcttg gtaggtttaa gaatagtttt
tgctgtactt tctatagtga 8520atagagttag gcagggatat tcaccattat cgtttcagac
ccacctccca accccgaggg 8580gacccgacag gcccttaatt aatcccctga ttctgtggat
aaccgtatta ccgcctttga 8640gtgagctgca caaagaaaca aaccaacctg tctgtattat
caaagtgaaa ggctacaata 8700ggacaaagaa acaaaccaac ctgtctgtat tatcaaagtg
aaaggctaca ataggacaaa 8760gaaacaaacc aacctgtctg tattatcaaa gtgaaaggct
acaataggac aaagaaacaa 8820accaacctgt ctgtattatc aaagtgaaag gctacaatag
gacaaagaaa caaaccaacc 8880tgtctgtatt atcaaagtga aaggctacaa taggacaaag
aaacaaacca acctgtctgt 8940attatcaaag tgaaaggcta caataggaca aagaaacaaa
ccaacctgtc tgtattatca 9000aagtgaaagg ctacaatagg acggtaaact cgacctatat
aagcagagct cgtttagtga 9060accgtcagat cgcctggaga cgccatccac gctgttttga
cctccataga agacaccggg 9120accgatccag cctccgcggc cccgaattga attctaacac
cgtgcgtgtt gactatttta 9180cctctggcgg tgataggcca ttacggcctg ccacc
9215

SEQ ID NO: 24
Length: 14
Type: PRT
Organism: Artificial Sequence
Other information: Synthetic

Ser Gly Tyr Gly Arg Lys Lys Arg Arg Gln Arg Arg Arg Cys
1               5                   10

SEQ ID NO: 25
Length: 19
Type: PRT
Organism: Artificial Sequence
Other information: Synthetic

Ser Gly Arg Gln Ile Lys Ile Trp Phe Gln Asn Arg Arg Met Lys Trp
1               5                   10                  15
Lys Lys Cys

SEQ ID NO: 26
Length: 14
Type: PRT
Organism: Artificial Sequence
Other information: Synthetic

Gly Arg Lys Lys Arg Arg Gln Arg Arg Arg Pro Pro Gln Gly
1               5                   10

SEQ ID NO: 27
Length: 14
Type: PRT
Organism: Artificial Sequence
Other information: Synthetic

Tyr Gly Arg Lys Lys Arg Arg Gln Arg Arg Arg Pro Pro Gln
1               5                   10
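
SEQ ID NO: 28
Length: 17
Type: PRT
Organism: Artificial Sequence
Other information: Synthetic

Cys Arg Gln Ile Lys Ile Trp Phe Gln Asn Arg Arg Met Lys Trp Lys
1               5                   10                  15
Lys

SEQ ID NO: 29
Length: 54
Type: DNA
Organism: Artificial Sequence
Other information: Synthetic

gatcccgtaa aaagcgtcgt cgagaaagcc gtaagaaacg tcgacgtgaa agca 54

SEQ ID NO: 30
Length: 54
Type: DNA
Organism: Artificial Sequence
Other information: Synthetic

agcttgcttt cacgtcgacg tttcttacgg ctttctcgac gacgcttttt acgg 54

SEQ ID NO: 31
Length: 72
Type: DNA
Organism: Artificial Sequence
Other information: Synthetic

gatccggtgc gtatgatctg cgtcgtcgag aacgtcagag ccgtctgcgt cgacgtgaaa 60
gacagagcag aa 72

SEQ ID NO: 32
Length: 72
Type: DNA
Organism: Artificial Sequence
Other information: Synthetic

agctttctgc tctgtctttc acgtcgacgc agacggctct gacgttctcg acgacgcaga 60
tcatacgcac cg 72

SEQ ID NO: 33
Length: 60
Type: DNA
Organism: Artificial Sequence
Other information: Synthetic

gatccgttaa acgtggactg aaacttcgtc atgttcgtcc gcgtgtgacc cgtgatgtga 60

SEQ ID NO: 34
Length: 60
Type: DNA
Organism: Artificial Sequence
Other information: Synthetic

agcttcacat cacgggtcac acgcggacga acatgacgaa gtttcagtcc acgtttaacg 60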
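The listing does not state how SEQ ID NOs 29 to 34 are used. Read in pairs, each odd-numbered oligonucleotide begins with GATC and each even-numbered one with AGCT, the 5' overhangs left by enzymes such as BamHI and HindIII. This suggests, though the listing does not confirm it, that each pair is annealed into a double-stranded insert with cohesive ends. The Python sketch below checks that reading under the assumption of a 4-nt 5' overhang at each end; the helper names `clean`, `revcomp` and `anneal` are hypothetical.

```python
# Sketch: check that SEQ ID NOs 29/30, 31/32 and 33/34 pair into duplexes
# with 4-nt 5' overhangs (an assumed reading; the listing does not say so).

# Watson-Crick partner of each lowercase base, as written in the listing.
COMPLEMENT = str.maketrans("acgt", "tgca")


def clean(seq: str) -> str:
    """Drop the spaces that group the listing into 10-nt blocks."""
    return "".join(seq.split())


def revcomp(seq: str) -> str:
    """Reverse complement of a lowercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def anneal(top: str, bottom: str, overhang: int = 4):
    """Return the (top, bottom) 5' overhangs if the two equal-length strands
    form a duplex with an `overhang`-nt 5' extension at each end, else None."""
    rc = revcomp(bottom)
    if len(top) == len(bottom) and top[overhang:] == rc[:-overhang]:
        return top[:overhang], bottom[:overhang]
    return None


# Sequences copied from the listing; keys are (top, bottom) SEQ ID NOs.
PAIRS = {
    (29, 30): ("gatcccgtaa aaagcgtcgt cgagaaagcc gtaagaaacg tcgacgtgaa agca",
               "agcttgcttt cacgtcgacg tttcttacgg ctttctcgac gacgcttttt acgg"),
    (31, 32): ("gatccggtgc gtatgatctg cgtcgtcgag aacgtcagag ccgtctgcgt "
               "cgacgtgaaa gacagagcag aa",
               "agctttctgc tctgtctttc acgtcgacgc agacggctct gacgttctcg "
               "acgacgcaga tcatacgcac cg"),
    (33, 34): ("gatccgttaa acgtggactg aaacttcgtc atgttcgtcc gcgtgtgacc cgtgatgtga",
               "agcttcacat cacgggtcac acgcggacga acatgacgaa gtttcagtcc acgtttaacg"),
}

# Lengths declared in the listing headers.
DECLARED = {29: 54, 30: 54, 31: 72, 32: 72, 33: 60, 34: 60}

RESULTS = {}
for (a, b), (top, bottom) in PAIRS.items():
    top, bottom = clean(top), clean(bottom)
    # Each strand should match the length stated in its header.
    assert len(top) == DECLARED[a] and len(bottom) == DECLARED[b]
    RESULTS[(a, b)] = anneal(top, bottom)
    print(f"SEQ ID NO {a}/{b}: overhangs {RESULTS[(a, b)]}")
```

Each pair prints `('gatc', 'agct')`: once the 4-nt 5' ends are set aside, the remaining 50, 68 and 56 nt of each top strand are exactly the reverse complement of its partner. Comparing against the reverse complement, rather than aligning the strands, keeps the check to a single string comparison.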