Patent application title: METHODS, SYNTHETIC HOSTS AND REAGENTS FOR THE BIOSYNTHESIS OF ISOPRENE AND DERIVATIVES
Inventors:
IPC8 Class: AC12P500FI
USPC Class:
Class name:
Publication date: 2019-07-18
Patent application number: 20190218577
Abstract:
Methods and compositions for synthesizing dienes and derivatives thereof, such as isoprene, in Cupriavidus necator are provided.
Claims:
1. A method for synthesizing isoprene in Cupriavidus necator, said method comprising enzymatically converting isopentenyl-pyrophosphate to dimethylallylpyrophosphate using a polypeptide having isopentenyl diphosphate isomerase enzyme activity.
2. The method of claim 1 wherein the polypeptide having isopentenyl diphosphate isomerase enzyme activity has at least 70% sequence identity to an amino acid sequence set forth in any of SEQ ID NOs: 1, 2, 3, 4, 5 or 6 or a functional fragment thereof.
3. The method of claim 1 wherein the polypeptide having isopentenyl diphosphate isomerase enzyme activity comprises the amino acid sequence set forth in any of SEQ ID NOs: 1, 2, 3, 4, 5 or 6 or a functional fragment thereof.
4. The method of claim 1 wherein the polypeptide having isopentenyl diphosphate isomerase enzyme activity is encoded by a nucleic acid sequence having at least 70% sequence identity to the nucleic acid sequence set forth in any of SEQ ID NOs: 8, 9, 10, 11, 12 or 13 or a functional fragment thereof.
5. The method of claim 1 wherein the polypeptide having isopentenyl diphosphate isomerase enzyme activity is encoded by a nucleic acid sequence comprising the nucleic acid sequence set forth in SEQ ID NOs: 8, 9, 10, 11, 12 or 13 or a functional fragment thereof.
6. A method for synthesizing isoprene in Cupriavidus necator, said method comprising enzymatically converting dimethylallylpyrophosphate to isoprene using a polypeptide having isoprene synthase enzyme activity.
7. The method of claim 6 wherein the polypeptide having isoprene synthase enzyme activity has at least 70% sequence identity to the amino acid sequence set forth in SEQ ID NO: 7 or a functional fragment thereof.
8. The method of claim 6 wherein the polypeptide having isoprene synthase enzyme activity comprises the amino acid sequence set forth in SEQ ID NO: 7 or a functional fragment thereof.
9. The method of claim 6 wherein the polypeptide having isoprene synthase enzyme activity is encoded by a nucleic acid sequence having at least 70% sequence identity to the nucleic acid sequence set forth in SEQ ID NO: 14 or a functional fragment thereof.
10. The method of claim 6 wherein the polypeptide having isoprene synthase enzyme activity is encoded by a nucleic acid sequence comprising the nucleic acid sequence set forth in SEQ ID NO: 14 or a functional fragment thereof.
11. A method for synthesizing isoprene in Cupriavidus necator, said method comprising enzymatically converting isopentenyl-pyrophosphate to dimethylallylpyrophosphate using a polypeptide having isopentenyl diphosphate isomerase enzyme activity; and enzymatically converting dimethylallylpyrophosphate to isoprene using a polypeptide having isoprene synthase enzyme activity.
12. The method of claim 11 wherein the polypeptide having isopentenyl diphosphate isomerase enzyme activity has at least 70% sequence identity to an amino acid sequence set forth in any of SEQ ID NOs: 1, 2, 3, 4, 5 or 6 or a functional fragment thereof.
13. The method of claim 11 wherein the polypeptide having isopentenyl diphosphate isomerase enzyme activity comprises an amino acid sequence set forth in any of SEQ ID NOs: 1, 2, 3, 4, 5 or 6 or a functional fragment thereof.
14. The method of claim 11 wherein the polypeptide having isopentenyl diphosphate isomerase enzyme activity is encoded by a nucleic acid sequence having at least 70% sequence identity to the nucleic acid sequence set forth in any of SEQ ID NOs: 8, 9, 10, 11, 12 or 13 or a functional fragment thereof.
15. The method of claim 11 wherein the polypeptide having isopentenyl diphosphate isomerase enzyme activity is encoded by a nucleic acid sequence comprising the nucleic acid sequence set forth in SEQ ID NOs: 8, 9, 10, 11, 12 or 13 or a functional fragment thereof.
16. The method of claim 11 wherein the polypeptide having isoprene synthase enzyme activity has at least 70% sequence identity to an amino acid sequence set forth in SEQ ID NO: 7 or a functional fragment thereof.
17. The method of claim 11 wherein the polypeptide having isoprene synthase enzyme activity comprises an amino acid sequence set forth in SEQ ID NO: 7 or a functional fragment thereof.
18. The method of claim 11 wherein the polypeptide having isoprene synthase enzyme activity is encoded by a nucleic acid sequence having at least 70% sequence identity to the nucleic acid sequence set forth in SEQ ID NO: 14 or a functional fragment thereof.
19. The method of claim 11 wherein the polypeptide having isoprene synthase enzyme activity is encoded by a nucleic acid sequence comprising the nucleic acid sequence set forth in SEQ ID NO: 14 or a functional fragment thereof.
20. The method of any of claims 1-19, wherein said method is performed in a recombinant Cupriavidus necator host.
21. The method of claim 20 wherein the recombinant Cupriavidus necator host comprises an exogenous nucleic acid sequence encoding a polypeptide having isopentenyl diphosphate isomerase enzyme activity.
22. The method of claim 21 wherein the polypeptide having isopentenyl diphosphate isomerase enzyme activity has at least 70% sequence identity to an amino acid sequence set forth in any of SEQ ID NOs: 1, 2, 3, 4, 5 or 6 or a functional fragment thereof.
23. The method of claim 21 wherein the polypeptide having isopentenyl diphosphate isomerase enzyme activity comprises an amino acid sequence set forth in any of SEQ ID NOs: 1, 2, 3, 4, 5 or 6 or a functional fragment thereof.
24. The method of claim 21 wherein the exogenous nucleic acid sequence has at least 70% sequence identity to the nucleic acid sequence set forth in any of SEQ ID NOs: 8, 9, 10, 11, 12 or 13 or a functional fragment thereof.
25. The method of claim 21 wherein the exogenous nucleic acid sequence comprises SEQ ID NO: 8, 9, 10, 11, 12 or 13 or a functional fragment thereof.
26. The method of claim 20 wherein the recombinant Cupriavidus necator host comprises an exogenous nucleic acid encoding a polypeptide having isoprene synthase enzyme activity.
27. The method of claim 26 wherein the polypeptide having isoprene synthase enzyme activity has at least 70% sequence identity to an amino acid sequence set forth in SEQ ID NO: 7 or a functional fragment thereof.
28. The method of claim 26 wherein the polypeptide having isoprene synthase enzyme activity comprises an amino acid sequence set forth in SEQ ID NO: 7 or a functional fragment thereof.
29. The method of claim 26 wherein the exogenous nucleic acid sequence has at least 70% sequence identity to the nucleic acid sequence set forth in SEQ ID NO: 14 or a functional fragment thereof.
30. The method of claim 26 wherein the exogenous nucleic acid sequence comprises SEQ ID NO: 14 or a functional fragment thereof.
31. The method of claim 20 wherein the recombinant Cupriavidus necator host comprises an exogenous nucleic acid encoding a polypeptide having isopentenyl diphosphate isomerase enzyme activity and an exogenous nucleic acid encoding a polypeptide having isoprene synthase enzyme activity.
32. The method of claim 20 wherein the recombinant Cupriavidus necator host has been transfected with a vector comprising any of SEQ ID NOs: 15, 16, 17, 18, 19, 20 or 21.
33. The method of any of claims 1 through 32, wherein at least one of the enzymatic conversions comprises gas fermentation within the Cupriavidus necator.
34. The method of claim 33, wherein the gas fermentation comprises at least one of natural gas, syngas, CO₂/H₂, methanol, ethanol, non-volatile residue, caustic wash from cyclohexane oxidation processes, or waste stream from a chemical or petrochemical industry.
35. The method of claim 34 wherein the gas fermentation comprises CO₂/H₂.
36. The method of any of claims 1 through 35, further comprising recovering produced isoprene.
37. A substantially pure recombinant Cupriavidus necator host capable of producing isoprene via a methylerythritol phosphate (MEP) pathway.
38. The recombinant Cupriavidus necator host of claim 37 comprising an exogenous nucleic acid sequence encoding a polypeptide having isopentenyl diphosphate isomerase enzyme activity.
39. The recombinant Cupriavidus necator host of claim 38 wherein the polypeptide having isopentenyl diphosphate isomerase enzyme activity has at least 70% sequence identity to an amino acid sequence set forth in any of SEQ ID NOs: 1, 2, 3, 4, 5 or 6 or a functional fragment thereof.
40. The recombinant Cupriavidus necator host of claim 38 wherein the polypeptide having isopentenyl diphosphate isomerase enzyme activity comprises an amino acid sequence set forth in any of SEQ ID NOs: 1, 2, 3, 4, 5 or 6 or a functional fragment thereof.
41. The recombinant Cupriavidus necator host of claim 38 wherein the exogenous nucleic acid sequence has at least 70% sequence identity to the nucleic acid sequence set forth in any of SEQ ID NOs: 8, 9, 10, 11, 12 or 13 or a functional fragment thereof.
42. The recombinant Cupriavidus necator host of claim 38 wherein the exogenous nucleic acid sequence comprises SEQ ID NO: 8, 9, 10, 11, 12 or 13 or a functional fragment thereof.
43. The recombinant Cupriavidus necator host of claim 37 comprising an exogenous nucleic acid sequence encoding a polypeptide having isoprene synthase enzyme activity.
44. The recombinant Cupriavidus necator host of claim 43 wherein the polypeptide having isoprene synthase enzyme activity has at least 70% sequence identity to an amino acid sequence set forth in SEQ ID NO: 7 or a functional fragment thereof.
45. The recombinant Cupriavidus necator host of claim 43 wherein the polypeptide having isoprene synthase enzyme activity comprises an amino acid sequence set forth in SEQ ID NO: 7 or a functional fragment thereof.
46. The recombinant Cupriavidus necator host of claim 43 wherein the exogenous nucleic acid sequence has at least 70% sequence identity to the nucleic acid sequence set forth in SEQ ID NO: 14 or a functional fragment thereof.
47. The recombinant Cupriavidus necator host of claim 43 wherein the exogenous nucleic acid sequence comprises SEQ ID NO: 14 or a functional fragment thereof.
48. The recombinant Cupriavidus necator host of claim 37 comprising an exogenous nucleic acid sequence encoding a polypeptide having isopentenyl diphosphate isomerase enzyme activity and an exogenous nucleic acid sequence encoding a polypeptide having isoprene synthase enzyme activity.
49. The recombinant Cupriavidus necator host of any of claims 37 to 48, wherein at least one of the exogenous nucleic acid sequences is contained within a plasmid.
50. The recombinant Cupriavidus necator host of any of claims 37 to 48, wherein at least one of the exogenous nucleic acid sequences is integrated into a chromosome of the host.
51. The recombinant Cupriavidus necator host of claim 37 which has been transfected with a vector comprising any of SEQ ID NOs: 15, 16, 17, 18, 19, 20 or 21.
52. The recombinant Cupriavidus necator host of claim 37, wherein the host performs the enzymatic synthesis by gas fermentation.
53. The recombinant Cupriavidus necator host of claim 52, wherein the gas fermentation comprises at least one of natural gas, syngas, CO₂/H₂, methanol, ethanol, non-volatile residue, caustic wash from cyclohexane oxidation processes, or waste stream from a chemical or petrochemical industry.
54. The recombinant Cupriavidus necator host of claim 53, wherein the gas fermentation comprises CO₂/H₂.
55. A bioderived isoprene produced in a recombinant Cupriavidus necator host according to any of claims 37 through 54, wherein said bioderived isoprene has a carbon-12, carbon-13, and carbon-14 isotope ratio that reflects an atmospheric carbon dioxide uptake source.
56. A bio-derived, bio-based, or fermentation-derived product produced from any of the methods or hosts of any of claims 1 to 54, wherein said product comprises: (i) a composition comprising at least one bio-derived, bio-based, or fermentation-derived compound or any combination thereof; (ii) a bio-derived, bio-based, or fermentation-derived polymer comprising the bio-derived, bio-based, or fermentation-derived composition or compound of (i), or any combination thereof; (iii) a bio-derived, bio-based, or fermentation-derived cis-polyisoprene rubber, trans-polyisoprene rubber, or liquid polyisoprene rubber, comprising the bio-derived, bio-based, or fermentation-derived compound or bio-derived, bio-based, or fermentation-derived composition of (i), or any combination thereof or the bio-derived, bio-based, or fermentation-derived polymer of (ii), or any combination thereof; (iv) a molded substance obtained by molding the bio-derived, bio-based, or fermentation-derived polymer of (ii), or the bio-derived, bio-based, or fermentation-derived rubber of (iii), or any combination thereof; (v) a bio-derived, bio-based, or fermentation-derived formulation comprising the bio-derived, bio-based, or fermentation-derived composition of (i), the bio-derived, bio-based, or fermentation-derived compound of (i), the bio-derived, bio-based, or fermentation-derived polymer of (ii), the bio-derived, bio-based, or fermentation-derived rubber of (iii), or the bio-derived, bio-based, or fermentation-derived molded substance of (iv), or any combination thereof; or (vi) a bio-derived, bio-based, or fermentation-derived semi-solid or a non-semi-solid stream, comprising the bio-derived, bio-based, or fermentation-derived composition of (i), the bio-derived, bio-based, or fermentation-derived compound of (i), the bio-derived, bio-based, or fermentation-derived polymer of (ii), the bio-derived, bio-based, or fermentation-derived rubber of (iii), the bio-derived, bio-based, or 
fermentation-derived formulation of (iv), or the bio-derived, bio-based, or fermentation-derived molded substance of (v), or any combination thereof.
Description:
[0001] This patent application claims the benefit of priority from U.S. Provisional Application Ser. No. 62/402,209, filed Sep. 30, 2016, the teachings of which are hereby incorporated by reference in their entirety.
FIELD
[0002] The present invention relates to methods and compositions for synthesizing dienes and derivatives thereof, such as isoprene, in Cupriavidus necator.
BACKGROUND
[0003] Isoprene is an important monomer for the production of specialty elastomers including motor mounts/fittings, surgical gloves, rubber bands, golf balls and shoes. Styrene-isoprene-styrene block copolymers form a key component of hot-melt pressure-sensitive adhesive formulations and cis-polyisoprene is utilized in the manufacture of tires (Whited et al. Industrial Biotechnology 2010 6(3):152-163). Manufacturers of rubber goods depend on either imported natural rubber from the Brazilian rubber tree or petroleum-based synthetic rubber polymers (Whited et al. 2010, supra).
[0004] Given an over-reliance on petrochemical feedstocks, biotechnology offers an alternative approach to the generation of industrially relevant products, via biocatalysis. Biotechnology offers more sustainable methods for producing industrial intermediates, in particular isoprene.
[0005] Metabolic pathways leading to the synthesis of isoprene are known in eukaryotes such as Populus alba, and some prokaryotes such as Bacillus subtilis have been reported to emit isoprene (Whited et al. 2010, supra). Isoprene production in prokaryotes is, however, rare, and no prokaryotic isoprene synthase (hereafter ISPS) has been described to date.
[0006] Generally, two metabolic routes have been described incorporating the molecule dimethylallyl-pyrophosphate (-PP), the precursor to isoprene. These are known as the mevalonate and the non-mevalonate pathways (Kuzuyama Biosci. Biotechnol. Biochem. 2002 66(8):1619-1627), both of which function in terpenoid synthesis in vivo. Both require the introduction of a non-native ISPS in order to divert carbon to isoprene production.
[0007] The mevalonate pathway generally occurs in higher eukaryotes and Archaea and incorporates a decarboxylase enzyme, mevalonate diphosphate decarboxylase (hereafter MDD), that introduces the first vinyl-group into the precursors leading to isoprene. The second vinyl-group is introduced by isoprene synthase in the final step in synthesizing isoprene. The non-mevalonate pathway, or 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway, occurs in many bacteria. In this pathway, dimethylallyl-PP is generated alongside isopentenyl-PP, and the two molecules are interconvertible via the action of isopentenyl pyrophosphate isomerase, also called isopentenyl diphosphate isomerase (hereafter IDI).
SUMMARY
[0008] An aspect of the present invention relates to methods for synthesizing isoprene in Cupriavidus necator.
[0009] In one nonlimiting embodiment, the method comprises enzymatically converting isopentenyl-pyrophosphate to dimethylallylpyrophosphate using a polypeptide having isopentenyl diphosphate isomerase enzyme activity.
[0010] In one nonlimiting embodiment, the method comprises enzymatically converting dimethylallylpyrophosphate to isoprene using a polypeptide having isoprene synthase enzyme activity.
[0011] Another aspect of the present invention relates to methods for synthesizing isoprene in Cupriavidus necator which comprise enzymatically converting isopentenyl-pyrophosphate to dimethylallylpyrophosphate using a polypeptide having isopentenyl diphosphate isomerase enzyme activity; and enzymatically converting dimethylallylpyrophosphate to isoprene using a polypeptide having isoprene synthase enzyme activity.
[0012] Another aspect of the present invention relates to substantially pure recombinant Cupriavidus necator hosts capable of producing isoprene via a methylerythritol phosphate (MEP) pathway.
[0013] Another aspect of the present invention relates to bioderived isoprene produced in a recombinant Cupriavidus necator host.
[0014] Another aspect of the present invention relates to bio-derived, bio-based, or fermentation-derived products produced from any of the methods or hosts described herein.
[0015] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. Although methods and materials similar or equivalent to those described herein can be used to practice the invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.
[0016] The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and the drawings, and from the claims. The word "comprising" in the claims may be replaced by "consisting essentially of" or with "consisting of," according to standard practice in patent law.
BRIEF DESCRIPTION OF THE FIGURES
[0017] FIGS. 1A and 1B are bar graphs showing isoprene production (ppm) of IDI-ISPS expressing C. necator strains compared to a strain expressing ISPS alone. FIG. 1A compares isoprene production in C. necator strains transfected with vectors pBBR1-ISPS, pBBR1-EC IDI-ISPS, pBBR1-BS IDI-ISPS, pBBR1-SCIDI-ISPS, pBBR1-EFIDI-ISPS and pBBR1-SPyrIDI-ISPS. The S. pneumoniae IDI construct is shown separately in FIG. 1B because it was tested with a different incubation volume and time, alongside an E. coli IDI, which accounts for the difference in isoprene yield.
[0018] FIGS. 2A through 2G are images of vectors pBBR1-ISPS (FIG. 2A), pBBR1-EC IDI-ISPS (FIG. 2B), pBBR1-BS IDI-ISPS (FIG. 2C), pBBR1-SCIDI-ISPS (FIG. 2D), pBBR1-EFIDI-ISPS (FIG. 2E), pBBR1-SPyrIDI-ISPS (FIG. 2F) and pBBR1-Spneu IDI-ISPS (FIG. 2G). Nucleic acid sequences of these vectors are set forth herein in SEQ ID NOs: 15 through 21, respectively.
DETAILED DESCRIPTION
[0019] Cupriavidus necator is a Gram-negative soil bacterium of the Betaproteobacteria class. This hydrogen-oxidizing bacterium is capable of growing at the interface of anaerobic and aerobic environments and easily adapts between heterotrophic and autotrophic lifestyles. Sources of energy for the bacterium include both organic compounds and hydrogen. C. necator does not naturally contain genes for isoprene synthase (ISPS) or isopentenyl diphosphate isomerase (IDI) and therefore does not express these enzymes.
[0020] The present invention provides methods and compositions for synthesizing isoprene in C. necator. In the methods and compositions of the present invention, C. necator is used to synthesize isoprene via a methylerythritol phosphate (MEP) pathway.
[0021] Surprisingly, the inventors herein have found that the overexpression of IDI and ISPS in C. necator resulted in the production of isoprene via the MEP pathway. Various vectors were constructed and confirmed by sequencing. Vectors constructed included pBBR1-ISPS, pBBR1-EC IDI-ISPS, pBBR1-BS IDI-ISPS, pBBR1-SCIDI-ISPS, pBBR1-EFIDI-ISPS, pBBR1-SPyrIDI-ISPS and pBBR1-Spneu IDI-ISPS. Images of the constructed vectors are set forth in FIGS. 2A through 2G, respectively, and their nucleic acid sequences are shown in SEQ ID NOs: 15 through 21, respectively. Isoprene production by strains of C. necator H16 ΔphaCAB transformed with these vectors is summarized in Table 3 and depicted graphically in FIGS. 1A and 1B. The construction of a bicistronic expression cassette comprising the P. alba isoprene synthase and an IDI was demonstrated to be sufficient to achieve isoprene production in C. necator H16 ΔphaCAB. The IDIs from E. coli, B. subtilis, S. cerevisiae and E. faecalis were shown to be active in C. necator H16 across a greater than ten-fold range of yields (0.03 to 0.4 ppm). The strain containing the IDI from B. subtilis produced the most isoprene under these growth conditions, approximately 0.4 ppm. Other functional IDIs generated strains with a range of isoprene yields.
[0022] This document thus provides methods and compositions which can convert central precursors including isopentenyl-pyrophosphate and/or dimethylallylpyrophosphate into isoprene.
[0023] As used herein, the term "central precursor" is used to denote any metabolite in any metabolic pathway described herein leading to the synthesis of isoprene.
[0024] The term "central metabolite" is used herein to denote a metabolite that is produced in all microorganisms to support growth.
[0025] A nonlimiting example of a C. necator host useful in the present invention is a C. necator of the H16 strain. In one nonlimiting embodiment, a C. necator host of the H16 strain with the phaCAB gene locus knocked out (ΔphaCAB) is used.
[0026] In one nonlimiting embodiment, the method comprises enzymatically converting isopentenyl-pyrophosphate to dimethylallylpyrophosphate using a polypeptide having IDI enzyme activity.
[0027] Polypeptides having IDI enzyme activity and nucleic acids encoding IDIs have been identified from various organisms and are readily available in publicly available databases such as GenBank or EMBL. Examples include, but are in no way limited to, IDIs from E. coli, B. subtilis, S. cerevisiae, E. faecalis, S. pyogenes and S. pneumoniae. In one nonlimiting embodiment, the polypeptide having IDI enzyme activity has at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 99.5% sequence identity to an amino acid sequence set forth in any of SEQ ID NOs: 1, 2, 3, 4, 5 or 6 or a functional fragment thereof. In one nonlimiting embodiment, the polypeptide having IDI enzyme activity comprises the amino acid sequence set forth in any of SEQ ID NOs: 1, 2, 3, 4, 5 or 6 or a functional fragment thereof. In one nonlimiting embodiment, the polypeptide having IDI enzyme activity is encoded by a nucleic acid sequence having at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 99.5% sequence identity to the nucleic acid sequence set forth in any of SEQ ID NOs: 8, 9, 10, 11, 12 or 13 or a functional fragment thereof. In one nonlimiting embodiment, the polypeptide having IDI enzyme activity is encoded by a nucleic acid sequence comprising the nucleic acid sequence set forth in SEQ ID NOs: 8, 9, 10, 11, 12 or 13 or a functional fragment thereof.
[0028] In another nonlimiting embodiment, the method comprises enzymatically converting dimethylallylpyrophosphate to isoprene using a polypeptide having ISPS enzyme activity.
[0029] Polypeptides having ISPS enzyme activity and nucleic acids encoding ISPSs have been identified from various organisms and are readily available in publicly available databases such as GenBank or EMBL. A nonlimiting example is the ISPS of Populus alba. In one nonlimiting embodiment, the polypeptide having ISPS enzyme activity has at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 99.5% sequence identity to the amino acid sequence set forth in SEQ ID NO: 7 or a functional fragment thereof. In one nonlimiting embodiment, the polypeptide having ISPS enzyme activity comprises the amino acid sequence set forth in SEQ ID NO: 7 or a functional fragment thereof. In one nonlimiting embodiment, the polypeptide having ISPS enzyme activity is encoded by a nucleic acid sequence having at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 99.5% sequence identity to the nucleic acid sequence set forth in SEQ ID NO: 14 or a functional fragment thereof. In one nonlimiting embodiment, the polypeptide having ISPS enzyme activity is encoded by a nucleic acid sequence comprising the nucleic acid sequence set forth in SEQ ID NO: 14 or a functional fragment thereof.
[0030] In one nonlimiting embodiment, the method for synthesizing isoprene in Cupriavidus necator comprises enzymatically converting isopentenyl-pyrophosphate to dimethylallylpyrophosphate using a polypeptide having IDI enzyme activity and enzymatically converting dimethylallylpyrophosphate to isoprene using a polypeptide having ISPS enzyme activity. In this embodiment, any of the polypeptides having IDI enzyme activity or ISPS enzyme activity described supra can be used.
[0031] The percent identity (homology) between two amino acid sequences can be determined as follows. First, the amino acid sequences are aligned using the BLAST 2 Sequences (Bl2seq) program from the stand-alone version of BLAST containing BLASTP version 2.0.14. This stand-alone version of BLAST can be obtained from the U.S. government's National Center for Biotechnology Information web site (www with the extension ncbi.nlm.nih.gov). Instructions explaining how to use the Bl2seq program can be found in the readme file accompanying BLASTZ. Bl2seq performs a comparison between two amino acid sequences using the BLASTP algorithm. To compare two amino acid sequences, the options of Bl2seq are set as follows: -i is set to a file containing the first amino acid sequence to be compared (e.g., C:\seq1.txt); -j is set to a file containing the second amino acid sequence to be compared (e.g., C:\seq2.txt); -p is set to blastp; -o is set to any desired file name (e.g., C:\output.txt); and all other options are left at their default setting. For example, the following command can be used to generate an output file containing a comparison between two amino acid sequences: C:\Bl2seq -i c:\seq1.txt -j c:\seq2.txt -p blastp -o c:\output.txt. If the two compared sequences share homology (identity), then the designated output file will present those regions of homology as aligned sequences. If the two compared sequences do not share homology (identity), then the designated output file will not present aligned sequences. Similar procedures can be followed for nucleic acid sequences except that blastn is used.
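The option settings described above can also be assembled programmatically. The following is a minimal Python sketch and is not part of the original disclosure: the function name `build_bl2seq_command` and the default executable name `bl2seq` are hypothetical, and only the `-i`, `-j`, `-p` and `-o` options named in the text are used. The sketch only builds the argument list; it does not run the program.

```python
from typing import List


def build_bl2seq_command(
    first_seq_file: str,
    second_seq_file: str,
    output_file: str,
    program: str = "blastp",
    executable: str = "bl2seq",  # assumption: name of the two-sequence BLAST binary
) -> List[str]:
    """Build the argument list for a two-sequence comparison.

    Only the four options named in the text are set; all other
    options are left at their defaults by not passing them.
    """
    # The text uses blastp for amino acid sequences and blastn for nucleic acids.
    if program not in ("blastp", "blastn"):
        raise ValueError("program must be 'blastp' or 'blastn'")
    return [
        executable,
        "-i", first_seq_file,   # file containing the first sequence
        "-j", second_seq_file,  # file containing the second sequence
        "-p", program,          # comparison algorithm
        "-o", output_file,      # file that receives the alignment report
    ]


# Example: print the command line for two amino acid sequence files.
print(" ".join(build_bl2seq_command(r"c:\seq1.txt", r"c:\seq2.txt", r"c:\output.txt")))
```

Returning a list rather than a single string keeps each argument separate, so the list could be handed to a process launcher without shell quoting issues if the executable is available.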
[0032] Once aligned, the number of matches is determined by counting the number of positions where an identical amino acid residue is presented in both sequences. The percent identity (homology) is determined by dividing the number of matches by the length of the full-length polypeptide amino acid sequence followed by multiplying the resulting value by 100. It is noted that the percent identity (homology) value is rounded to the nearest tenth. For example, 90.11, 90.12, 90.13, and 90.14 are rounded down to 90.1, while 90.15, 90.16, 90.17, 90.18, and 90.19 are rounded up to 90.2. It also is noted that the length value will always be an integer.
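The calculation described above can be illustrated with a short Python sketch. It is not part of the original disclosure, and it rests on stated assumptions: the alignment is represented as two equal-length strings with `-` marking gaps, gap positions are never counted as matches, and the function names are hypothetical. Decimal arithmetic with half-up rounding is used so that a value such as 90.15 rounds up to 90.2 as the text requires; Python's built-in `round` would not guarantee this.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_tenth(value: Decimal) -> Decimal:
    """Round to the nearest tenth, with a trailing 5 rounding up."""
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def percent_identity(aligned_a: str, aligned_b: str, full_length: int) -> Decimal:
    """Matches divided by the full-length sequence length, times 100.

    aligned_a and aligned_b are the two rows of an alignment
    (assumed representation: equal length, '-' for a gap).
    full_length is the integer length of the full-length polypeptide.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if full_length <= 0:
        raise ValueError("full_length must be a positive integer")
    # Count positions where the same residue appears in both rows.
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return round_to_tenth(Decimal(matches) * 100 / Decimal(full_length))


# Example: 5 identical positions against a full-length sequence of 6 residues.
print(percent_identity("MKT-AL", "MKTQAL", 6))
```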
[0033] It will be appreciated that a number of nucleic acids can encode a polypeptide having a particular amino acid sequence. The degeneracy of the genetic code is well known to the art; i.e., for many amino acids, there is more than one nucleotide triplet that serves as the codon for the amino acid. For example, codons in the coding sequence for a given enzyme can be modified such that optimal expression in a particular species (e.g., bacteria or fungus) is obtained, using appropriate codon bias tables for that species.
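As an illustration of this degeneracy, the following Python sketch recodes a short coding sequence with a different synonymous codon for each amino acid and checks that the encoded peptide is unchanged. It is not part of the original disclosure. The codon assignments are a small subset of the standard genetic code, while the `PREFERRED` table and the example sequence are hypothetical and do not represent measured codon usage for C. necator or any other species.

```python
# Subset of the standard genetic code (DNA codons), enough for the example.
CODON_TO_AA = {
    "ATG": "M",
    "TTA": "L", "TTG": "L", "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L",
    "AAA": "K", "AAG": "K",
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",
}

# Hypothetical preference table: one chosen codon per amino acid.
PREFERRED = {"M": "ATG", "L": "CTG", "K": "AAG", "G": "GGC"}


def translate(dna: str) -> str:
    """Translate a coding sequence codon by codon."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def recode(dna: str, preferred: dict) -> str:
    """Replace every codon with the preferred synonymous codon."""
    return "".join(preferred[aa] for aa in translate(dna))


original = "ATGTTAAAAGGT"          # encodes M-L-K-G
recoded = recode(original, PREFERRED)
print(recoded)                     # different nucleotides, same peptide
assert translate(recoded) == translate(original)
```

The two nucleotide sequences differ at several positions, yet both translate to the same peptide, which is why a nucleic acid sequence can be altered for expression in a given host without altering the polypeptide.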
[0034] Functional fragments of any of the polypeptides or nucleic acid sequences described herein can also be used in the methods of the document. The term "functional fragment" as used herein refers to a peptide fragment of a polypeptide or a nucleic acid sequence fragment encoding a peptide fragment of a polypeptide that has at least 25% (e.g., at least: 30%; 40%; 50%; 60%; 70%; 75%; 80%; 85%; 90%; 95%; 98%; 99%; 100%; or even greater than 100%) of the activity of the corresponding mature, full-length, polypeptide. The functional fragment can generally, but not always, be comprised of a continuous region of the polypeptide, wherein the region has functional activity.
[0035] In one nonlimiting embodiment, methods of the present invention are performed in a recombinant Cupriavidus necator host. Recombinant hosts can naturally express none or some (e.g., one or more, two or more) of the enzymes of the pathways described herein. Endogenous genes of the recombinant hosts also can be disrupted to prevent the formation of undesirable metabolites or prevent the loss of intermediates in the pathway through other enzymes acting on such intermediates. Recombinant hosts can be referred to as recombinant host cells, engineered cells, or engineered hosts. Thus, as described herein, recombinant hosts can include exogenous nucleic acids encoding one or more of IDIs and/or ISPSs, as described herein.
[0036] The term "exogenous" as used herein with reference to a nucleic acid (or a protein) and a host refers to a nucleic acid that does not occur in (and cannot be obtained from) a cell of that particular type as it is found in nature or a protein encoded by such a nucleic acid. Thus, a non-naturally-occurring nucleic acid is considered to be exogenous to a host once in the host. It is important to note that non-naturally-occurring nucleic acids can contain nucleic acid subsequences or fragments of nucleic acid sequences that are found in nature provided the nucleic acid as a whole does not exist in nature. For example, a nucleic acid molecule containing a genomic DNA sequence within an expression vector is non-naturally-occurring nucleic acid, and thus is exogenous to a host cell once introduced into the host, since that nucleic acid molecule as a whole (genomic DNA plus vector DNA) does not exist in nature. Thus, any vector, autonomously replicating plasmid, or virus (e.g., retrovirus, adenovirus, or herpes virus) that as a whole does not exist in nature is considered to be non-naturally-occurring nucleic acid. It follows that genomic DNA fragments produced by PCR or restriction endonuclease treatment as well as cDNAs are considered to be non-naturally-occurring nucleic acid since they exist as separate molecules not found in nature. It also follows that any nucleic acid containing a promoter sequence and polypeptide-encoding sequence (e.g., cDNA or genomic DNA) in an arrangement not found in nature is non-naturally-occurring nucleic acid. A nucleic acid that is naturally-occurring can be exogenous to a particular host microorganism. For example, an entire chromosome isolated from a cell of yeast x is an exogenous nucleic acid with respect to a cell of yeast y once that chromosome is introduced into a cell of yeast y.
[0037] In contrast, the term "endogenous" as used herein with reference to a nucleic acid (e.g., a gene) (or a protein) and a host refers to a nucleic acid (or protein) that does occur in (and can be obtained from) that particular host as it is found in nature. Moreover, a cell "endogenously expressing" a nucleic acid (or protein) expresses that nucleic acid (or protein) as does a host of the same particular type as it is found in nature. Moreover, a host "endogenously producing" or that "endogenously produces" a nucleic acid, protein, or other compound produces that nucleic acid, protein, or compound as does a host of the same particular type as it is found in nature.
[0038] In one nonlimiting embodiment of the present invention, the method for isoprene production is performed in a recombinant Cupriavidus necator host comprising an exogenous nucleic acid sequence encoding a polypeptide having IDI enzyme activity. In this embodiment, any of the nucleic acid sequences encoding a polypeptide having IDI enzyme activity as described supra can be used.
[0039] In another nonlimiting embodiment of the present invention, the method is performed using a recombinant Cupriavidus necator host comprising an exogenous nucleic acid encoding a polypeptide having ISPS enzyme activity. In this embodiment, any of the nucleic acid sequences encoding a polypeptide having ISPS enzyme activity as described supra can be used.
[0040] In another nonlimiting embodiment, the method is performed using a recombinant Cupriavidus necator host comprising an exogenous nucleic acid encoding a polypeptide having IDI enzyme activity and an exogenous nucleic acid encoding a polypeptide having ISPS enzyme activity. In this embodiment, any of the nucleic acid sequences encoding a polypeptide having IDI enzyme activity and any of the nucleic acid sequences encoding a polypeptide having ISPS enzyme activity as described supra can be used.
[0041] In another nonlimiting embodiment, the method for isoprene production of the present invention is performed in a recombinant Cupriavidus necator host which has been transformed with a vector comprising any of SEQ ID NOs:15, 16, 17, 18, 19, 20 or 21.
[0042] In any of the methods described herein, a fermentation strategy can be used that entails anaerobic, micro-aerobic or aerobic cultivation. A fermentation strategy can entail nutrient limitation such as nitrogen, phosphate or oxygen limitation. A cell retention strategy using a ceramic hollow fiber membrane can be employed to achieve and maintain a high cell density during fermentation. The principal carbon source fed to the fermentation can derive from a biological or non-biological feedstock. The biological feedstock can be, or can derive from, monosaccharides, disaccharides, lignocellulose, hemicellulose, cellulose, lignin, levulinic acid and formic acid, triglycerides, glycerol, fatty acids, agricultural waste, condensed distillers' solubles or municipal waste. The non-biological feedstock can be, or can derive from, natural gas, syngas, CO2/H2, methanol, ethanol, non-volatile residue (NVR), a caustic wash waste stream from cyclohexane oxidation processes or a waste stream from a chemical or petrochemical industry.
[0043] In one nonlimiting embodiment, at least one of the enzymatic conversions of the isoprene production method comprises gas fermentation within the Cupriavidus necator. In this embodiment, the gas fermentation may comprise at least one of natural gas, syngas, CO2/H2, methanol, ethanol, non-volatile residue, caustic wash from cyclohexane oxidation processes, or waste stream from a chemical or petrochemical industry. In one nonlimiting embodiment, the gas fermentation comprises CO2/H2.
[0044] The methods of the present invention may further comprise recovering produced isoprene from the Cupriavidus necator.
[0045] Once produced, any method can be used to isolate isoprene. For example, isoprene can be recovered from the fermenter off-gas stream as a volatile product, as the boiling point of isoprene is 34.1° C. At a typical fermentation temperature of approximately 30° C., isoprene has a high vapor pressure and can be stripped by the gas flow through the broth for recovery from the off-gas. Isoprene can be selectively adsorbed onto, for example, an adsorbent and separated from the other off-gas components. Membrane separation technology may also be employed to separate isoprene from the other off-gas compounds. Isoprene may be desorbed from the adsorbent using, for example, nitrogen and condensed at low temperature and high pressure.
[0046] Compositions for synthesizing isoprene in C. necator are also provided by the present invention.
[0047] In one nonlimiting embodiment, a substantially pure recombinant C. necator host capable of producing isoprene via a methylerythritol phosphate (MEP) pathway is provided.
[0048] As used herein, a "substantially pure culture" of a recombinant host microorganism is a culture of that microorganism in which less than about 40% (e.g., less than about 35%; 30%; 25%; 20%; 15%; 10%; 5%; 2%; 1%; 0.5%; 0.25%; 0.1%; 0.01%; 0.001%; 0.0001%; or even less) of the total number of viable cells in the culture are viable cells other than the recombinant microorganism, e.g., bacterial, fungal (including yeast), mycoplasmal, or protozoan cells. The term "about" in this context means that the relevant percentage can deviate from the specified percentage, above or below, by up to 15% of the specified percentage. Thus, for example, about 20% can be 17% to 23%. Such a culture of recombinant microorganisms includes the cells and a growth, storage, or transport medium. Media can be liquid, semi-solid (e.g., gelatinous media), or frozen. The culture includes the cells growing in the liquid or in/on the semi-solid medium or being stored or transported in a storage or transport medium, including a frozen storage or transport medium. The cultures are in a culture vessel or storage vessel or substrate (e.g., a culture dish, flask, or tube or a storage vial or tube).
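By way of illustration only, the arithmetic behind the term "about" can be sketched as follows. The function name about_range and the constant ABOUT_FRACTION are hypothetical names introduced for this sketch; the only figures taken from the text are the 15% tolerance and the worked example of about 20%.

```python
from fractions import Fraction

# Tolerance stated in the text: 15% of the specified percentage.
ABOUT_FRACTION = Fraction(15, 100)


def about_range(percent):
    """Return the (low, high) bounds covered by 'about <percent>%'.

    Exact rational arithmetic is used so the bounds are not affected
    by binary floating-point rounding.
    """
    value = Fraction(str(percent))
    delta = value * ABOUT_FRACTION
    return value - delta, value + delta


low, high = about_range(20)
print(f"about 20% spans {float(low)}% to {float(high)}%")
```

Under this reading, the upper limit of "less than about 40%" would be evaluated with about_range(40) in the same way.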
[0049] In one nonlimiting embodiment, the recombinant C. necator host comprises an exogenous nucleic acid sequence encoding a polypeptide having IDI enzyme activity. Any nucleic acid sequence encoding a polypeptide having IDI enzyme activity as described supra can be used in this embodiment.
[0050] In another nonlimiting embodiment, the recombinant C. necator host comprises an exogenous nucleic acid encoding a polypeptide having ISPS enzyme activity. Any nucleic acid sequence encoding a polypeptide having ISPS enzyme activity as described supra can be used in this embodiment.
[0051] In another nonlimiting embodiment, the recombinant C. necator host comprises an exogenous nucleic acid encoding a polypeptide having IDI enzyme activity and an exogenous nucleic acid encoding a polypeptide having ISPS enzyme activity. Any of the nucleic acid sequences encoding a polypeptide having IDI enzyme activity or ISPS enzyme activity as described supra can be used.
[0052] In one nonlimiting embodiment, at least one of the exogenous nucleic acid sequences in the recombinant host is contained within a plasmid.
[0053] In one nonlimiting embodiment, at least one of the exogenous nucleic acid sequences is integrated into a chromosome of the host.
[0054] In one nonlimiting embodiment, the recombinant C. necator host has been transformed with a vector comprising any of SEQ ID NOs:15, 16, 17, 18, 19, 20 or 21.
[0055] Also provided by the present invention is isoprene bioderived from a recombinant C. necator host according to any of the methods described herein. In one nonlimiting embodiment, the bioderived isoprene has a carbon isotope ratio that reflects an atmospheric carbon dioxide uptake source. Examples of such ratios include, but are not limited to, ratios of carbon-12, carbon-13, and carbon-14 isotopes.
[0056] In addition, the present invention provides bio-derived, bio-based, or fermentation-derived product produced using the methods and/or compositions disclosed herein. Examples of such products include, but are not limited to, compositions comprising at least one bio-derived, bio-based, or fermentation-derived compound or any combination thereof, as well as polymers, rubbers such as cis-polyisoprene rubber, trans-polyisoprene rubber, or liquid polyisoprene rubber, molded substances, formulations and semi-solid or non-semi-solid streams comprising one or more of the bio-derived, bio-based, or fermentation-derived compounds or compositions, combinations or products thereof.
[0057] The following section provides further illustration of the methods and compositions of the present invention. These working examples are illustrative only and are not intended to limit the scope of the invention in any way.
EXAMPLES
Example 1: Primers
[0058] Primers as listed in Table 1 were used in the following disclosed experiments.
TABLE 1
Primer  Sequence
1   5' GGAAGGAGCGAAGCATGCGTTGTAGCGTTAGC 3' (SEQ ID NO: 22)
2   5' GGGCTTTGTTAGCAGGCTTAGCGTTCGAACGGCAGAAT 3' (SEQ ID NO: 23)
3   5' GCCTGCTAACAAAGCCCGAAA 3' (SEQ ID NO: 24)
4   5' GCTTCGCTCCTTCCTTAAAG 3' (SEQ ID NO: 25)
5   5' GCCGCCCTATACCTTGTCT 3' (SEQ ID NO: 26)
6   5' ACGGCGTCACACTTTGCTAT 3' (SEQ ID NO: 27)
7   5' CGCGTCGCGAACGCCAGCAA 3' (SEQ ID NO: 28)
8   5' ACGGGGCCTGCCACCATACC 3' (SEQ ID NO: 29)
9   5' CTTATCGATGATAAGCTGTC 3' (SEQ ID NO: 30)
10  5' CAGCCCTAGATCGGCCACAG 3' (SEQ ID NO: 31)
11  5' TGCCTGCCCCTCCCTTTTGG 3' (SEQ ID NO: 32)
12  5' GCGGCGAGTGCGGGGGTTCC 3' (SEQ ID NO: 33)
13  5' GGAAACCCACGGCGGCAATG 3' (SEQ ID NO: 34)
14  5' ATCGGCTGTAGCCGCCTCTAGATT 3' (SEQ ID NO: 35)
15  5' AGTAACAATTGCTCAAGCAG 3' (SEQ ID NO: 36)
16  5' ATTCAGAGAAGAAACCAATT 3' (SEQ ID NO: 37)
17  5' GCTAGAAATAATTTTGAGCTCGCCAAGGAGATATAATGCAAAC 3' (SEQ ID NO: 38)
18  5' GCTTCGCTCCTTCCTTAAAGTTATTTAAGCTGGGTAAATGC 3' (SEQ ID NO: 39)
19  5' GCTAGAAATAATTTTGAGCTCGCCAAGGAGATATAATGGTC 3' (SEQ ID NO: 40)
20  5' GCTTCGCTCCTTCCTTAAAGTCAGCGCACCGAATACGA 3' (SEQ ID NO: 41)
21  5' GCTAGAAATAATTTTGAGCTCGCCAAGGAGATATAATGACTGCCGACAACAATAG 3' (SEQ ID NO: 42)
22  5' GCTTCGCTCCTTCCTTAAAGTTATAGCATTCTATGAATTTGCC 3' (SEQ ID NO: 43)
23  5' GCTAGAAATAATTTTGAGCTCGCCAAGGAGATATAATGAATCGAAAAGATGAAC 3' (SEQ ID NO: 44)
24  5' GCTTCGCTCCTTCCTTAAAGTTAACGTTTTGCGAAAACAG 3' (SEQ ID NO: 45)
25  5' GCTAGAAATAATTTTGAGCTCGCCAAGGAGATATAATGACTAACCGTAAAGATGATC 3' (SEQ ID NO: 46)
26  5' GCTTCGCTCCTTCCTTAAAGCTAATTGACCTGCTGCAAG 3' (SEQ ID NO: 47)
27  5' GCTAGAAATAATTTTGAGCTCGCCAAGGAGATATAATGACGACCAACCGCAAGGATG 3' (SEQ ID NO: 48)
28  5' GCTTCGCTCCTTCCTTAAAGTCACGCCTTCTTCATCTG 3' (SEQ ID NO: 49)
29  5' GCCGCCCTATACCTTGTCT 3' (SEQ ID NO: 50)
30  5' ACGGCGTCACACTTTGCTAT 3' (SEQ ID NO: 51)
Example 2: Cloning of Poplar ISPS for Expression in C. necator Spp.
[0059] The protein sequence for the Populus alba ISPS was obtained from GenBank (BAD98243.1), and the full gene (with an additional promoter and terminator), codon optimized for E. coli, was purchased from Eurofins MWG (SEQ ID NO:52). This DNA was used as a template for amplification of the gene using primers 1 and 2 (see Table 1) and Phusion polymerase (NEB) with an annealing temperature of 45° C. The open reading frame (ORF) generated lacked the native plastid tag; this ORF corresponds to nucleotides 168-1865 of SEQ ID NO:52. The vector backbone of pBBR1MCS3-pBAD was generated with primers 3 and 4 (see Table 1) and Merck Millipore KOD polymerase with annealing temperatures of 50-55° C. The two fragments were ligated using NEB Gibson Assembly reaction master mix as per the manufacturer's recommended protocol. The ligation mix was transformed into chemically competent E. coli NEB5α and correct clones were verified via a combination of colony PCR and sequencing with primers 5 and 6 (see Table 1). Subsequently the whole construct was sequenced by MWG-Eurofins using primers 7-16 (see Table 1). A single verified construct was taken forward for further work and designated pBBR1-ISPS (see FIG. 2A; SEQ ID NO:15).
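For illustration, the relationship between the insert primers (1 and 2) and the backbone primers (4 and 3) of Table 1 can be checked with a short script. This is a sketch only: the reading of the shared 5' ends as Gibson assembly overlaps is an assumption, since the primer design rationale is not stated in the text, and the helper names and the partial codon table are hypothetical and cover only the codons needed here.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


def overlap_length(insert_primer, backbone_primer):
    """Length of the longest 5' prefix of insert_primer that matches the
    3' end of the reverse complement of backbone_primer."""
    rc = reverse_complement(backbone_primer)
    for k in range(min(len(insert_primer), len(rc)), 0, -1):
        if rc.endswith(insert_primer[:k]):
            return k
    return 0


# Primer sequences copied from Table 1.
PRIMER_1 = "GGAAGGAGCGAAGCATGCGTTGTAGCGTTAGC"
PRIMER_2 = "GGGCTTTGTTAGCAGGCTTAGCGTTCGAACGGCAGAAT"
PRIMER_3 = "GCCTGCTAACAAAGCCCGAAA"
PRIMER_4 = "GCTTCGCTCCTTCCTTAAAG"

# Partial codon table (assumption: standard genetic code, only the
# codons that occur in the gene-specific part of primer 1).
PARTIAL_CODONS = {"ATG": "M", "CGT": "R", "TGT": "C", "AGC": "S", "GTT": "V"}

fwd_overlap = overlap_length(PRIMER_1, PRIMER_4)
rev_overlap = overlap_length(PRIMER_2, PRIMER_3)

# The remainder of primer 1 after the overlap is the gene-specific part.
gene_start = PRIMER_1[fwd_overlap:]
peptide = "".join(
    PARTIAL_CODONS[gene_start[i:i + 3]] for i in range(0, len(gene_start), 3)
)

print(fwd_overlap, rev_overlap, peptide)
```

The translated start of primer 1 can be compared against the first residues of SEQ ID NO:7 (Met Arg Cys Ser Val Ser) as a consistency check on the intended reading frame.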
Example 3: Cloning of IDI-ISPS Bicistrons for Expression in C. necator spp.
[0060] A unique SacI restriction site was identified in pBBR1-ISPS, upstream of the ribosome binding site and downstream of the predicted transcriptional start site. pBBR1-ISPS was purified from NEB5α using the Qiagen plasmid Midi prep kit, cut with SacI (NEB) and purified using the Qiagen PCR purification kit as per the recommended protocol. Nucleic acid sequences for IDIs from E. coli (SEQ ID NO:8), B. subtilis (SEQ ID NO:9), S. cerevisiae (SEQ ID NO:10), E. faecalis (SEQ ID NO:11), S. pyogenes (SEQ ID NO:12) and S. pneumoniae (SEQ ID NO:13) were obtained from GenBank. Each IDI was amplified from genomic DNA (purchased directly from DSMZ or ATCC) or, in the case of the B. subtilis and S. pneumoniae variants, from a codon optimized (C. necator) synthetic operon purchased from Eurofins MWG.
[0061] PCR products were generated with Merck Millipore KOD polymerase at an annealing temperature of 55° C. using primers 17-28 (see Table 1) and were purified using the Qiagen PCR purification kit and the recommended protocol. The PCR products were then used in a Gibson assembly with the SacI digested and purified pBBR1-ISPS, and the individual ligations were transformed into E. coli NEB5α. Clones were verified via a combination of colony PCR with Taq polymerase (NEB) and sequencing with primers 29 and 30 (see Table 1). Single verified constructs representing each IDI coupled to ISPS were designated pBBR1-EC IDI-ISPS (FIG. 2B; SEQ ID NO:16), pBBR1-BS IDI-ISPS (FIG. 2C; SEQ ID NO:17), pBBR1-SCIDI-ISPS (FIG. 2D; SEQ ID NO:18), pBBR1-EFIDI-ISPS (FIG. 2E; SEQ ID NO:19), pBBR1-SPyrIDI-ISPS (FIG. 2F; SEQ ID NO:20) and pBBR1-SpneuIDI-ISPS (FIG. 2G; SEQ ID NO:21) and were further examined.
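As an illustrative check on primers 17-28 of Table 1, the 5' region shared by the six forward primers and by the six reverse primers can be extracted as sketched below. The variable names are hypothetical, and the interpretation of the shared regions as vector-homology arms for the Gibson assembly is an assumption rather than a statement from the text. The script only reports what the primer sequences themselves contain, including whether the SacI recognition sequence (GAGCTC) appears in the shared forward region.

```python
import os.path

# Odd-numbered (forward) primers 17-27, copied from Table 1.
FORWARD_PRIMERS = [
    "GCTAGAAATAATTTTGAGCTCGCCAAGGAGATATAATGCAAAC",
    "GCTAGAAATAATTTTGAGCTCGCCAAGGAGATATAATGGTC",
    "GCTAGAAATAATTTTGAGCTCGCCAAGGAGATATAATGACTGCCGACAACAATAG",
    "GCTAGAAATAATTTTGAGCTCGCCAAGGAGATATAATGAATCGAAAAGATGAAC",
    "GCTAGAAATAATTTTGAGCTCGCCAAGGAGATATAATGACTAACCGTAAAGATGATC",
    "GCTAGAAATAATTTTGAGCTCGCCAAGGAGATATAATGACGACCAACCGCAAGGATG",
]

# Even-numbered (reverse) primers 18-28, copied from Table 1.
REVERSE_PRIMERS = [
    "GCTTCGCTCCTTCCTTAAAGTTATTTAAGCTGGGTAAATGC",
    "GCTTCGCTCCTTCCTTAAAGTCAGCGCACCGAATACGA",
    "GCTTCGCTCCTTCCTTAAAGTTATAGCATTCTATGAATTTGCC",
    "GCTTCGCTCCTTCCTTAAAGTTAACGTTTTGCGAAAACAG",
    "GCTTCGCTCCTTCCTTAAAGCTAATTGACCTGCTGCAAG",
    "GCTTCGCTCCTTCCTTAAAGTCACGCCTTCTTCATCTG",
]

PRIMER_4 = "GCTTCGCTCCTTCCTTAAAG"  # backbone primer 4 from Table 1
SACI_SITE = "GAGCTC"               # SacI recognition sequence

# os.path.commonprefix compares character by character, so it also
# works on plain strings that are not file paths.
fwd_shared = os.path.commonprefix(FORWARD_PRIMERS)
rev_shared = os.path.commonprefix(REVERSE_PRIMERS)

print("shared forward region:", fwd_shared)
print("contains SacI site:", SACI_SITE in fwd_shared)
print("shared reverse region equals primer 4:", rev_shared == PRIMER_4)
```

The portion of each primer after the shared region is gene specific, which is why a single set of flanking sequences could serve all six IDI inserts.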
Example 4: Vector Preparation and Transference to C. necator H16 ΔphaCAB
[0062] Vectors pBBR1-EC IDI-ISPS, pBBR1-BS IDI-ISPS, pBBR1-SCIDI-ISPS, pBBR1-EFIDI-ISPS, pBBR1-SPyrIDI-ISPS and pBBR1-SpneuIDI-ISPS were prepared from their respective NEB5α hosts using the Qiagen Midi prep kit and appropriate culture volumes. A C. necator H16 strain with the phaCAB gene locus knocked out (ΔphaCAB) was grown to mid/late exponential phase in tryptic soy broth (TSB) media at 30° C. Cells were made competent with glycerol washes and used immediately. Unexpectedly, competent cells were transformed with at least 1 µg of vector DNA via electroporation and recovered in TSB medium. Transformants were identified on TSB agar with 10 µg/ml tetracycline. Single transformants representative of each IDI-ISPS clone were further examined.
Example 5: Isoprene Production in C. necator H16 ΔphaCAB
[0063] IDI-ISPS clones in C. necator H16 ΔphaCAB, representative of each IDI under study, were grown over 48 hours on TSB agar (without dextrose). The strain containing the P. alba ISPS construct (pBBR1-ISPS) was also grown on the same media, as a control. Cultures were grown, induced and harvested. Cell pellets were resuspended in a suitable medium and normalized in solution based on the wet cell weight. Further incubations with induction were performed in screw cap headspace gas chromatography (GC) vials (Anatune 093640-040-00 and 093640-038-00). Surprisingly, isoprene was produced and could be measured via gas chromatography-mass spectrometry (GCMS), the parameters for which are set out in Table 2. Ions monitored for isoprene were 39, 53 and 67 on an Agilent DB-624 column.
TABLE 2
GCMS analysis conditions for Isoprene
GCMS CONDITIONS
PARAMETER                 VALUE
Carrier Gas               Helium at constant flow (2.0 ml/min)
Injector
  Split ratio             Split 10:L
  Temperature             150° C.
Detector
  Source Temperature      230° C.
  Quad Temperature        150° C.
  Interface               260° C.
  Gain                    1
  Scan Range              m/z 30-200
  Threshold               150
  Scan Speed              2^2 (A/D samples) 4
  Sampling Rate           2^n = 2^2
  Mode                    SCAN and SIM
  Solvent delay *         2.80 min
Oven Temperature
  Initial T               40° C. × 10 min
  Oven Ramp               40° C./min to 260° C. for 5 min
Injection volume          50 µl from the HS in the GC 2 ml vial
Incubation time and T     15 min at 95° C.
Agitator                  ON 500 rpm
Injection volume          500 µl of the Head Space
Gas saver                 On after 2 min
Concentration range       0.1-5.0 (µg/ml)
GC Column                 DB-624 (122-1334 Agilent) 60 m × 250 µm × 1.4 µm
Results of these isoprene production studies are shown in Table 3 and depicted graphically in FIG. 1.
TABLE 3
Isoprene production results of IDI-ISPS expressing C. necator strains
Culture (C. necator H16 ΔphaCAB)   Mean isoprene ppm   Standard deviation
pBBR1-ISPS                         0.0078              0.000051
pBBR1 - EC IDI-ISPS                0.030               0.0032
pBBR1 - BS IDI-ISPS                0.40                0.021
pBBR1 - SC-IDI-ISPS                0.076               0.0005
pBBR1 - EF IDI-ISPS                0.018               0.0012
pBBR1 - SPyr IDI-ISPS              0.0089              0.00089
pBBR1 - EC IDI-ISPS                0.184               0.003
pBBR1 - Spneu IDI-ISPS             0.595               0.011
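For comparison purposes, each mean in Table 3 can be expressed relative to the ISPS-only control (pBBR1-ISPS). The following is a sketch only: the fold values it prints are derived arithmetic and are not figures reported in the text, the variable names are hypothetical, and the two rows labeled EC IDI-ISPS are kept as separate entries exactly as they appear in Table 3.

```python
# (culture, mean isoprene ppm, standard deviation), copied from Table 3.
RESULTS = [
    ("pBBR1-ISPS", 0.0078, 0.000051),
    ("pBBR1 - EC IDI-ISPS", 0.030, 0.0032),
    ("pBBR1 - BS IDI-ISPS", 0.40, 0.021),
    ("pBBR1 - SC-IDI-ISPS", 0.076, 0.0005),
    ("pBBR1 - EF IDI-ISPS", 0.018, 0.0012),
    ("pBBR1 - SPyr IDI-ISPS", 0.0089, 0.00089),
    ("pBBR1 - EC IDI-ISPS", 0.184, 0.003),
    ("pBBR1 - Spneu IDI-ISPS", 0.595, 0.011),
]

# The first row is the ISPS-only control strain.
control_mean = RESULTS[0][1]

# Ratio of each strain's mean to the control mean.
fold_over_control = [(name, mean / control_mean) for name, mean, _ in RESULTS]

for name, fold in fold_over_control:
    print(f"{name:<26} {fold:6.1f}x control")

# Strain with the highest mean relative to the control.
best_name, best_fold = max(fold_over_control, key=lambda item: item[1])
```

A list of tuples is used rather than a dictionary so that the repeated EC IDI-ISPS label does not overwrite an earlier row.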
Sequence CWU
1
1
521182PRTE. coli 1Met Gln Thr Glu His Val Ile Leu Leu Asn Ala Gln Gly Val
Pro Thr1 5 10 15Gly Thr
Leu Glu Lys Tyr Ala Ala His Thr Ala Asp Thr Arg Leu His 20
25 30Leu Ala Phe Ser Ser Trp Leu Phe Asn
Ala Lys Gly Gln Leu Leu Val 35 40
45Thr Arg Arg Ala Leu Ser Lys Lys Ala Trp Pro Gly Val Trp Thr Asn 50
55 60Ser Val Cys Gly His Pro Gln Leu Gly
Glu Ser Asn Glu Asp Ala Val65 70 75
80Ile Arg Arg Cys Arg Tyr Glu Leu Gly Val Glu Ile Thr Pro
Pro Glu 85 90 95Ser Ile
Tyr Pro Asp Phe Arg Tyr Arg Ala Thr Asp Pro Ser Gly Ile 100
105 110Val Glu Asn Glu Val Cys Pro Val Phe
Ala Ala Arg Thr Thr Ser Ala 115 120
125Leu Gln Ile Asn Asp Asp Glu Val Met Asp Tyr Gln Trp Cys Asp Leu
130 135 140Ala Asp Val Leu His Gly Ile
Asp Ala Thr Pro Trp Ala Phe Ser Pro145 150
155 160Trp Met Val Met Gln Ala Thr Asn Arg Glu Ala Arg
Lys Arg Leu Ser 165 170
175Ala Phe Thr Gln Leu Lys 1802350PRTB. subtilis 2Met Val Thr
Arg Ala Glu Arg Lys Arg Gln His Ile Asn His Ala Leu1 5
10 15Ser Ile Gly Gln Lys Arg Glu Thr Gly
Leu Asp Asp Ile Thr Phe Val 20 25
30His Val Ser Leu Pro Asp Leu Ala Leu Glu Gln Val Asp Ile Ser Thr
35 40 45Lys Ile Gly Glu Leu Ser Ser
Ser Ser Pro Ile Phe Ile Asn Ala Met 50 55
60Thr Gly Gly Gly Gly Lys Leu Thr Tyr Glu Ile Asn Lys Ser Leu Ala65
70 75 80Arg Ala Ala Ser
Gln Ala Gly Ile Pro Leu Ala Val Gly Ser Gln Met 85
90 95Ser Ala Leu Lys Asp Pro Ser Glu Arg Leu
Ser Tyr Glu Ile Val Arg 100 105
110Lys Glu Asn Pro Asn Gly Leu Ile Phe Ala Asn Leu Gly Ser Glu Ala
115 120 125Thr Ala Ala Gln Ala Lys Glu
Ala Val Glu Met Ile Gly Ala Asn Ala 130 135
140Leu Gln Ile His Leu Asn Val Ile Gln Glu Ile Val Met Pro Glu
Gly145 150 155 160Asp Arg
Ser Phe Ser Gly Ala Leu Lys Arg Ile Glu Gln Ile Cys Ser
165 170 175Arg Val Ser Val Pro Val Ile
Val Lys Glu Val Gly Phe Gly Met Ser 180 185
190Lys Ala Ser Ala Gly Lys Leu Tyr Glu Ala Gly Ala Ala Ala
Val Asp 195 200 205Ile Gly Gly Tyr
Gly Gly Thr Asn Phe Ser Lys Ile Glu Asn Leu Arg 210
215 220Arg Gln Arg Gln Ile Ser Phe Phe Asn Ser Trp Gly
Ile Ser Thr Ala225 230 235
240Ala Ser Leu Ala Glu Ile Arg Ser Glu Phe Pro Ala Ser Thr Met Ile
245 250 255Ala Ser Gly Gly Leu
Gln Asp Ala Leu Asp Val Ala Lys Ala Ile Ala 260
265 270Leu Gly Ala Ser Cys Thr Gly Met Ala Gly His Phe
Leu Lys Ala Leu 275 280 285Thr Asp
Ser Gly Glu Glu Gly Leu Leu Glu Glu Ile Gln Leu Ile Leu 290
295 300Glu Glu Leu Lys Leu Ile Met Thr Val Leu Gly
Ala Arg Thr Ile Ala305 310 315
320Asp Leu Gln Lys Ala Pro Leu Val Ile Lys Gly Glu Thr His His Trp
325 330 335Leu Thr Glu Arg
Gly Val Asn Thr Ser Ser Tyr Ser Val Arg 340
345 3503288PRTS. cerevisiae 3Met Thr Ala Asp Asn Asn Ser
Met Pro His Gly Ala Val Ser Ser Tyr1 5 10
15Ala Lys Leu Val Gln Asn Gln Thr Pro Glu Asp Ile Leu
Glu Glu Phe 20 25 30Pro Glu
Ile Ile Pro Leu Gln Gln Arg Pro Asn Thr Arg Ser Ser Glu 35
40 45Thr Ser Asn Asp Glu Ser Gly Glu Thr Cys
Phe Ser Gly His Asp Glu 50 55 60Glu
Gln Ile Lys Leu Met Asn Glu Asn Cys Ile Val Leu Asp Trp Asp65
70 75 80Asp Asn Ala Ile Gly Ala
Gly Thr Lys Lys Val Cys His Leu Met Glu 85
90 95Asn Ile Glu Lys Gly Leu Leu His Arg Ala Phe Ser
Val Phe Ile Phe 100 105 110Asn
Glu Gln Gly Glu Leu Leu Leu Gln Gln Arg Ala Thr Glu Lys Ile 115
120 125Thr Phe Pro Asp Leu Trp Thr Asn Thr
Cys Cys Ser His Pro Leu Cys 130 135
140Ile Asp Asp Glu Leu Gly Leu Lys Gly Lys Leu Asp Asp Lys Ile Lys145
150 155 160Gly Ala Ile Thr
Ala Ala Val Arg Lys Leu Asp His Glu Leu Gly Ile 165
170 175Pro Glu Asp Glu Thr Lys Thr Arg Gly Lys
Phe His Phe Leu Asn Arg 180 185
190Ile His Tyr Met Ala Pro Ser Asn Glu Pro Trp Gly Glu His Glu Ile
195 200 205Asp Tyr Ile Leu Phe Tyr Lys
Ile Asn Ala Lys Glu Asn Leu Thr Val 210 215
220Asn Pro Asn Val Asn Glu Val Arg Asp Phe Lys Trp Val Ser Pro
Asn225 230 235 240Asp Leu
Lys Thr Met Phe Ala Asp Pro Ser Tyr Lys Phe Thr Pro Trp
245 250 255Phe Lys Ile Ile Cys Glu Asn
Tyr Leu Phe Asn Trp Trp Glu Gln Leu 260 265
270Asp Asp Leu Ser Glu Val Glu Asn Asp Arg Gln Ile His Arg
Met Leu 275 280 2854347PRTE.
faecalis 4Met Asn Arg Lys Asp Glu His Leu Ser Leu Ala Lys Ala Phe His
Lys1 5 10 15Glu Lys Ser
Asn Asp Phe Asp Arg Val Arg Phe Val His Gln Ser Phe 20
25 30Ala Glu Ser Ala Val Asn Glu Val Asp Ile
Ser Thr Ser Phe Leu Ser 35 40
45Phe Gln Leu Pro Gln Pro Phe Tyr Val Asn Ala Met Thr Gly Gly Ser 50
55 60Gln Arg Ala Lys Glu Ile Asn Gln Gln
Leu Gly Ile Ile Ala Lys Glu65 70 75
80Thr Gly Leu Leu Val Ala Thr Gly Ser Val Ser Ala Ala Leu
Lys Asp 85 90 95Ala Ser
Leu Ala Asp Thr Tyr Gln Ile Met Arg Lys Glu Asn Pro Asp 100
105 110Gly Leu Ile Phe Ala Asn Ile Gly Ala
Gly Leu Gly Val Glu Glu Ala 115 120
125Lys Arg Ala Leu Asp Leu Phe Gln Ala Asn Ala Leu Gln Ile His Val
130 135 140Asn Val Pro Gln Glu Leu Val
Met Pro Glu Gly Asp Arg Asp Phe Thr145 150
155 160Asn Trp Leu Thr Lys Ile Glu Ala Ile Val Gln Ala
Val Glu Val Pro 165 170
175Val Ile Val Lys Glu Val Gly Phe Gly Met Ser Gln Glu Thr Leu Glu
180 185 190Lys Leu Thr Ser Ile Gly
Val Gln Ala Ala Asp Val Ser Gly Gln Gly 195 200
205Gly Thr Ser Phe Thr Gln Ile Glu Asn Ala Arg Arg Lys Lys
Arg Glu 210 215 220Leu Ser Phe Leu Asp
Asp Trp Gly Gln Ser Thr Val Ile Ser Leu Leu225 230
235 240Glu Ser Gln Asn Trp Gln Lys Lys Leu Thr
Ile Leu Gly Ser Gly Gly 245 250
255Val Arg Asn Ser Leu Asp Ile Val Lys Gly Leu Ala Leu Gly Ala Lys
260 265 270Ser Met Gly Val Ala
Gly Thr Ile Leu Ala Ser Leu Met Ser Lys Asn 275
280 285Gly Leu Glu Asn Thr Leu Ala Leu Val Gln Gln Trp
Gln Glu Glu Val 290 295 300Lys Met Leu
Tyr Thr Leu Leu Gly Lys Lys Thr Thr Glu Glu Leu Thr305
310 315 320Ser Thr Ala Leu Val Leu Asp
Pro Val Leu Val Asn Trp Cys His Asn 325
330 335Arg Gly Ile Asp Ser Thr Val Phe Ala Lys Arg
340 3455329PRTS. pyrogenes 5Met Thr Asn Arg Lys Asp
Asp His Ile Lys Tyr Ala Leu Lys Tyr Gln1 5
10 15Ser Pro Tyr Asn Ala Phe Asp Asp Ile Glu Leu Ile
His His Ser Leu 20 25 30Pro
Ser Tyr Asp Leu Ser Asp Ile Asp Leu Ser Thr His Phe Ala Gly 35
40 45Gln Asp Phe Asp Phe Pro Phe Tyr Ile
Asn Ala Met Thr Gly Gly Ser 50 55
60Gln Lys Gly Lys Ala Val Asn Glu Lys Leu Ala Lys Val Ala Ala Ala65
70 75 80Thr Gly Ile Val Met
Val Thr Gly Ser Tyr Ser Ala Ala Leu Lys Asn 85
90 95Pro Asn Asp Asp Ser Tyr Arg Leu His Glu Val
Ala Asp Asn Leu Lys 100 105
110Leu Ala Thr Asn Ile Gly Leu Asp Lys Pro Val Ala Leu Gly Gln Gln
115 120 125Thr Val Gln Glu Met Gln Pro
Leu Phe Leu Gln Val His Val Asn Val 130 135
140Met Gln Glu Leu Leu Met Pro Glu Gly Glu Arg Val Phe His Thr
Trp145 150 155 160Lys Lys
His Leu Ala Glu Tyr Ala Ser Gln Ile Pro Val Pro Val Ile
165 170 175Leu Lys Glu Val Gly Phe Gly
Met Asp Val Asn Ser Ile Lys Leu Ala 180 185
190His Asp Leu Gly Ile Gln Thr Phe Asp Ile Ser Gly Arg Gly
Gly Thr 195 200 205Ser Phe Ala Tyr
Ile Glu Asn Gln Arg Gly Gly Asp Arg Ser Tyr Leu 210
215 220Asn Asp Trp Gly Gln Thr Thr Val Gln Cys Leu Leu
Asn Ala Gln Gly225 230 235
240Leu Met Asp Gln Val Glu Ile Leu Ala Ser Gly Gly Val Arg His Pro
245 250 255Leu Asp Met Ile Lys
Cys Phe Val Leu Gly Ala Arg Ala Val Gly Leu 260
265 270Ser Arg Thr Val Leu Glu Leu Val Glu Lys Tyr Pro
Thr Glu Arg Val 275 280 285Ile Ala
Ile Val Asn Gly Trp Lys Glu Glu Leu Lys Ile Ile Met Cys 290
295 300Ala Leu Asp Cys Lys Thr Ile Lys Glu Leu Lys
Gly Val Asp Tyr Leu305 310 315
320Leu Tyr Gly Arg Leu Gln Gln Val Asn 3256336PRTS.
pneumonia 6Met Thr Thr Asn Arg Lys Asp Glu His Ile Leu Tyr Ala Leu Glu
Gln1 5 10 15Lys Ser Ser
Tyr Asn Ser Phe Asp Glu Val Glu Leu Ile His Ser Ser 20
25 30Leu Pro Leu Tyr Asn Leu Asp Glu Ile Asp
Leu Ser Thr Glu Phe Ala 35 40
45Gly Arg Lys Trp Asp Phe Pro Phe Tyr Ile Asn Ala Met Thr Gly Gly 50
55 60Ser Asn Lys Gly Arg Glu Ile Asn Gln
Lys Leu Ala Gln Val Ala Glu65 70 75
80Ser Cys Gly Ile Leu Phe Val Thr Gly Ser Tyr Ser Ala Ala
Leu Lys 85 90 95Asn Pro
Thr Asp Asp Ser Phe Ser Val Lys Ser Ser His Pro Asn Leu 100
105 110Leu Leu Gly Thr Asn Ile Gly Leu Asp
Lys Pro Val Glu Leu Gly Leu 115 120
125Gln Thr Val Glu Glu Met Asn Pro Val Leu Leu Gln Val His Val Asn
130 135 140Val Met Gln Glu Leu Leu Met
Pro Glu Gly Glu Arg Lys Phe Arg Ser145 150
155 160Trp Gln Ser His Leu Ala Asp Tyr Ser Lys Gln Ile
Pro Val Pro Ile 165 170
175Val Leu Lys Glu Val Gly Phe Gly Met Asp Ala Lys Thr Ile Glu Arg
180 185 190Ala Tyr Glu Phe Gly Val
Arg Thr Val Asp Leu Ser Gly Arg Gly Gly 195 200
205Thr Ser Phe Ala Tyr Ile Glu Asn Arg Arg Ser Gly Gln Arg
Asp Tyr 210 215 220Leu Asn Gln Trp Gly
Gln Ser Thr Met Gln Ala Leu Leu Asn Ala Gln225 230
235 240Glu Trp Lys Asp Lys Val Glu Leu Leu Val
Ser Gly Gly Val Arg Asn 245 250
255Pro Leu Asp Met Ile Lys Cys Leu Val Phe Gly Ala Lys Ala Val Gly
260 265 270Leu Ser Arg Thr Val
Leu Glu Leu Val Glu Thr Tyr Thr Val Glu Glu 275
280 285Val Ile Gly Ile Val Gln Gly Trp Lys Ala Asp Leu
Arg Leu Ile Met 290 295 300Cys Ser Leu
Asn Cys Ala Thr Ile Ala Asp Leu Gln Lys Val Asp Tyr305
310 315 320Leu Leu Tyr Gly Lys Leu Lys
Glu Ala Lys Asp Gln Met Lys Lys Ala 325
330 3357560PRTPopulus alba 7Met Arg Cys Ser Val Ser Thr
Glu Asn Val Ser Phe Thr Glu Thr Glu1 5 10
15Thr Glu Ala Arg Arg Ser Ala Asn Tyr Glu Pro Asn Ser
Trp Asp Tyr 20 25 30Asp Tyr
Leu Leu Ser Ser Asp Thr Asp Glu Ser Ile Glu Val Tyr Lys 35
40 45Asp Lys Ala Lys Lys Leu Glu Ala Glu Val
Arg Arg Glu Ile Asn Asn 50 55 60Glu
Lys Ala Glu Phe Leu Thr Leu Leu Glu Leu Ile Asp Asn Val Gln65
70 75 80Arg Leu Gly Leu Gly Tyr
Arg Phe Glu Ser Asp Ile Arg Gly Ala Leu 85
90 95Asp Arg Phe Val Ser Ser Gly Gly Phe Asp Ala Val
Thr Lys Thr Ser 100 105 110Leu
His Gly Thr Ala Leu Ser Phe Arg Leu Leu Arg Gln His Gly Phe 115
120 125Glu Val Ser Gln Glu Ala Phe Ser Gly
Phe Lys Asp Gln Asn Gly Asn 130 135
140Phe Leu Glu Asn Leu Lys Glu Asp Ile Lys Ala Ile Leu Ser Leu Tyr145
150 155 160Glu Ala Ser Phe
Leu Ala Leu Glu Gly Glu Asn Ile Leu Asp Glu Ala 165
170 175Lys Val Phe Ala Ile Ser His Leu Lys Glu
Leu Ser Glu Glu Lys Ile 180 185
190Gly Lys Glu Leu Ala Glu Gln Val Asn His Ala Leu Glu Leu Pro Leu
195 200 205His Arg Arg Thr Gln Arg Leu
Glu Ala Val Trp Ser Ile Glu Ala Tyr 210 215
220Arg Lys Lys Glu Asp Ala Asn Gln Val Leu Leu Glu Leu Ala Ile
Leu225 230 235 240Asp Tyr
Asn Met Ile Gln Ser Val Tyr Gln Arg Asp Leu Arg Glu Thr
245 250 255Ser Arg Trp Trp Arg Arg Val
Gly Leu Ala Thr Lys Leu His Phe Ala 260 265
270Arg Asp Arg Leu Ile Glu Ser Phe Tyr Trp Ala Val Gly Val
Ala Phe 275 280 285Glu Pro Gln Tyr
Ser Asp Cys Arg Asn Ser Val Ala Lys Met Phe Ser 290
295 300Phe Val Thr Ile Ile Asp Asp Ile Tyr Asp Val Tyr
Gly Thr Leu Asp305 310 315
320Glu Leu Glu Leu Phe Thr Asp Ala Val Glu Arg Trp Asp Val Asn Ala
325 330 335Ile Asn Asp Leu Pro
Asp Tyr Met Lys Leu Cys Phe Leu Ala Leu Tyr 340
345 350Asn Thr Ile Asn Glu Ile Ala Tyr Asp Asn Leu Lys
Asp Lys Gly Glu 355 360 365Asn Ile
Leu Pro Tyr Leu Thr Lys Ala Trp Ala Asp Leu Cys Asn Ala 370
375 380Phe Leu Gln Glu Ala Lys Trp Leu Tyr Asn Lys
Ser Thr Pro Thr Phe385 390 395
400Asp Asp Tyr Phe Gly Asn Ala Trp Lys Ser Ser Ser Gly Pro Leu Gln
405 410 415Leu Val Phe Ala
Tyr Phe Ala Val Val Gln Asn Ile Lys Lys Glu Glu 420
425 430Ile Glu Asn Leu Gln Lys Tyr His Asp Thr Ile
Ser Arg Pro Ser His 435 440 445Ile
Phe Arg Leu Cys Asn Asp Leu Ala Ser Ala Ser Ala Glu Ile Ala 450
455 460Arg Gly Glu Thr Ala Asn Ser Val Ser Cys
Tyr Met Arg Thr Lys Gly465 470 475
480Ile Ser Glu Glu Leu Ala Thr Glu Ser Val Met Asn Leu Ile Asp
Glu 485 490 495Thr Trp Lys
Lys Met Asn Lys Glu Lys Leu Gly Gly Ser Leu Phe Ala 500
505 510Lys Pro Phe Val Glu Thr Ala Ile Asn Leu
Ala Arg Gln Ser His Cys 515 520
525Thr Tyr His Asn Gly Asp Ala His Thr Ser Pro Asp Glu Leu Thr Arg 530
535 540Lys Arg Val Leu Ser Val Ile Thr
Glu Pro Ile Leu Pro Phe Glu Arg545 550
555 5608549DNAE. coli 8atgcaaacgg aacacgtcat tttattgaat
gcacagggag ttcccacggg tacgctggaa 60aagtatgccg cacacacggc agacacccgc
ttacatctcg cgttctccag ttggctgttt 120aatgccaaag gacaattatt agttacccgc
cgcgcactga gcaaaaaagc atggcctggc 180gtgtggacta actcggtttg tgggcaccca
caactgggag aaagcaacga agacgcagtg 240atccgccgtt gccgttatga gcttggcgtg
gaaattacgc ctcctgaatc tatctatcct 300gactttcgct accgcgccac cgatccgagt
ggcattgtgg aaaatgaagt gtgtccggta 360tttgccgcac gcaccactag tgcgttacag
atcaatgatg atgaagtgat ggattatcaa 420tggtgtgatt tagcagatgt attacacggt
attgatgcca cgccgtgggc gttcagtccg 480tggatggtga tgcaggcgac aaatcgcgaa
gccagaaaac gattatctgc atttacccag 540cttaaataa
54991053DNAB. subtilis 9atggtcacgc
gcgcggagcg caagcgccag cacatcaacc acgcgctctc catcggccag 60aagcgcgaaa
ccggcctgga cgacatcacg tttgtgcatg tctcgctgcc ggacctggcc 120ctcgaacagg
tcgacatctc gacgaagatt ggcgagctga gctcctcgtc gccgatcttc 180atcaacgcga
tgaccggcgg tggtggcaag ctgacctacg agatcaacaa gtccctggcg 240cgcgcggcca
gccaggccgg catcccgctg gcggtcggca gccagatgtc ggccctgaag 300gaccccagcg
agcgcctgtc gtacgagatt gtccgcaagg aaaacccgaa cggcctgatc 360ttcgccaatc
tgggctcgga agccaccgcg gcgcaggcca aagaagcggt ggagatgatc 420ggcgccaacg
ccctgcagat ccacctgaac gtgatccaag agatcgtgat gcccgagggc 480gaccgttcct
tctccggcgc cctcaagcgc atcgagcaaa tctgcagccg cgtgtcggtg 540cccgtcatcg
tcaaggaagt gggcttcggc atgtcgaagg ccagcgccgg caagctgtac 600gaagccggcg
cggccgccgt ggacatcggc ggctacggcg gcacgaactt cagcaagatt 660gagaatctgc
gccgccagcg gcagatcagc ttcttcaact cgtggggcat cagcacggcc 720gcgtcgctgg
cggagatccg gtccgagttc ccggcctcga ccatgatcgc gtccggtggc 780ctccaagacg
ccctggacgt cgccaaggcc atcgccctgg gcgcgagctg caccggcatg 840gccggtcact
tcctgaaggc cctgaccgat agcggcgagg aaggcctgct ggaagagatc 900cagctgatcc
tggaagaact gaagctgatc atgacggtgc tgggcgcccg taccatcgcg 960gatctgcaaa
aggcgccgct cgtgatcaag ggcgaaaccc atcactggct caccgagcgg 1020ggcgtgaaca
ccagctcgta ttcggtgcgc tga 105310867DNAS.
cerevisiae 10atgactgccg acaacaatag tatgccccat ggtgcagtat ctagttacgc
caaattagtg 60caaaaccaaa cacctgaaga cattttggaa gagtttcctg aaattattcc
attacaacaa 120agacctaata cccgatctag tgagacgtca aatgacgaaa gcggagaaac
atgtttttct 180ggtcatgatg aggagcaaat taagttaatg aatgaaaatt gtattgtttt
ggattgggac 240gataatgcta ttggtgccgg taccaagaaa gtttgtcatt taatggaaaa
tattgaaaag 300ggtttactac atcgtgcatt ctccgtcttt attttcaatg aacaaggtga
attactttta 360caacaaagag ccactgaaaa aataactttc cctgatcttt ggactaacac
atgctgctct 420catccactat gtattgatga cgaattaggt ttgaagggta agctagacga
taagattaag 480ggcgctatta ctgcggcggt gagaaaacta gatcatgaat taggtattcc
agaagatgaa 540actaagacaa ggggtaagtt tcacttttta aacagaatcc attacatggc
accaagcaat 600gaaccatggg gtgaacatga aattgattac atcctatttt ataagatcaa
cgctaaagaa 660aacttgactg tcaacccaaa cgtcaatgaa gttagagact tcaaatgggt
ttcaccaaat 720gatttgaaaa ctatgtttgc tgacccaagt tacaagttta cgccttggtt
taagattatt 780tgcgagaatt acttattcaa ctggtgggag caattagatg acctttctga
agtggaaaat 840gacaggcaaa ttcatagaat gctataa
867111044DNAE. faecalis 11atgaatcgaa aagatgaaca tctatcatta
gctaaagcgt tccacaaaga aaaaagtaat 60gactttgatc gtgtgcgttt tgttcaccaa
tcgtttgctg aatccgctgt taacgaagtg 120gatatttcca cttcgtttct ttcttttcag
cttccccaac ctttttatgt caatgcaatg 180acaggtggta gtcagcgtgc aaaagaaatt
aatcagcaat taggcattat tgccaaagaa 240actggccttt tagttgcgac aggatctgtc
tcggcagcgt taaaagatgc tagtttagcg 300gatacgtatc aaattatgcg aaaagaaaac
ccagatggac tcatttttgc caatattggt 360gcaggcttgg gtgtggaaga agcaaagcga
gcgcttgatt tatttcaagc gaatgcctta 420caaatccatg taaatgtgcc ccaagaattg
gtcatgcctg aaggagatcg tgatttcact 480aattggctaa ccaagattga agctatcgta
caggccgtag aagtgcctgt cattgtcaaa 540gaggttggct ttggcatgag ccaagaaacc
ttagaaaaac ttacctctat cggcgttcaa 600gcagcggatg tgagcggcca aggcggaacg
agttttacac aaattgaaaa tgcccggcgg 660aagaaacgag aactttcttt cttagatgat
tgggggcaat caacggtcat ctctcttctg 720gaatcacaaa attggcaaaa gaaactaact
attctcggct ctggcggtgt gcgtaactct 780cttgatattg tcaaaggact cgctttaggt
gccaaaagca tgggagttgc tgggactatc 840ttagcttccc ttatgagtaa aaatggttta
gaaaatacct tagcccttgt acagcaatgg 900caagaagaag tgaaaatgct ttatactctt
ttaggaaaaa agacgacaga agaattgacg 960agtaccgcac ttgtcctcga tccagtttta
gttaattggt gtcataaccg tggtatcgac 1020agcactgttt tcgcaaaacg ttaa
1044
12 990 DNA S. pyogenes
12 atgactaacc
gtaaagatga tcacatcaaa tatgctctca agtaccaatc gccttataat 60gcttttgatg
acatagaact catacaccat tccttaccta gctatgattt gtctgatatt 120gatctcagta
ctcattttgc tgggcaagac ttcgactttc ccttttacat caatgccatg 180acaggaggaa
gtcaaaaagg caaagctgtc aatgaaaaat tggccaaagt agcagcagca 240acagggattg
tcatggtgac agggtcttat agcgctgctt taaaaaatcc taacgacgat 300tcctatcgtt
tacatgaggt ggcagataac ttgaaactag ccacgaatat tggtctagat 360aaacctgtgg
cgctaggaca acaaacggtt caagaaatgc agcccctctt tttacaggtt 420catgtgaatg
tgatgcaaga gttgctgatg ccagagggtg agcgcgtctt tcatacctgg 480aaaaaacacc
tcgctgaata cgctagtcaa ataccagttc ctgtcattct caaagaagtt 540ggttttggca
tggatgtcaa tagtatcaag ctagcacatg acctaggcat tcaaaccttt 600gatatttcag
gtagaggagg aacttcattt gcttacattg aaaatcaaag agggggagac 660cgctcttact
taaacgattg gggacaaacc actgttcagt gcttactgaa tgcacaagga 720ctgatggacc
aagtggaaat cttagcttcg ggtggtgtca gacacccctt ggacatgatt 780aagtgttttg
tcttaggagc acgtgcagtg ggactctcac gcaccgtttt agaattggtc 840gaaaaatacc
caaccgagcg tgtgattgct atcgttaatg gctggaaaga agaattaaaa 900atcattatgt
gtgctcttga ctgtaaaact attaaagaat taaagggagt cgactactta 960ctatatggac
gcttgcagca ggtcaattag 990
13 1008 DNA S. pneumoniae
13 atgacgacca accgcaagga tgagcacatc ctctacgccc tggagcagaa
gtcgtcgtac 60aactcgttcg acgaagtgga actgatccac tcgtcgctgc cgctgtataa
cctggacgaa 120atcgacctgt ccaccgagtt cgccggccgc aagtgggatt tcccgttcta
catcaatgcc 180atgaccggcg gtagcaacaa gggccgcgaa atcaatcaga agctggccca
ggtcgccgag 240tcgtgcggca tcctgttcgt caccggcagc tactccgccg cgctgaagaa
cccgaccgac 300gactcgttct cggtcaagag cagccacccg aatctgctgc tgggcacgaa
catcggcctc 360gacaagcccg tcgaactggg cctgcagacc gtggaagaaa tgaaccccgt
gctgctccag 420gtgcatgtga acgtgatgca agagctgctg atgccggagg gcgaacgcaa
gttccgcagc 480tggcagtcgc acctggccga ctactcgaag cagatccccg tgccgatcgt
gctgaaagaa 540gtgggcttcg gcatggacgc caagaccatc gagcgtgcct acgagttcgg
cgtgcgcacc 600gtggacctct cgggccgcgg tggcacgagc ttcgcgtaca tcgaaaaccg
gcgcagcggc 660cagcgcgact acctgaacca gtggggccaa tcgaccatgc aggccctgct
gaacgcgcaa 720gaatggaagg acaaggtcga gctgctggtg tcgggcggcg tgcgtaaccc
gctcgacatg 780atcaagtgcc tggtgttcgg cgccaaggcc gtgggcctgt cccgcaccgt
gctggagctg 840gtcgaaacct acaccgtcga agaagtcatc ggcattgtcc agggctggaa
ggccgacctc 900cgcctcatca tgtgctccct gaactgcgcc acgatcgcgg acctccagaa
ggtggactat 960ctcctctacg gcaagctcaa agaagccaag gaccagatga agaaggcg
1008
14 1683 DNA Populus alba
14 atgcgttgta gcgttagcac cgaaaatgtg
tcgtttacgg aaacggaaac cgaagctcgc 60cgcagcgcaa actatgaacc gaactcgtgg
gattacgatt acctccttag cagcgatacg 120gatgaaagca ttgaagtgta taaagacaaa
gccaagaaac tggaggccga agtccgtcgc 180gaaatcaaca atgagaaagc ggagtttctt
acgttactgg aattgatcga taacgtgcaa 240cggttaggcc tcggctaccg ctttgagagc
gatatccgtg gtgcactgga ccgcttcgta 300tcgtctggtg gttttgacgc cgttacgaaa
acgagcctgc atggtacagc attgtctttt 360cggctgttgc gccagcatgg atttgaagtg
tcacaggagg cattttcagg cttcaaagac 420cagaacggga attttttgga gaatttgaaa
gaagatatca aagcgatctt atctctgtat 480gaggcgtcat ttctcgctct ggaaggggaa
aatattctgg acgaagcgaa agtgttcgca 540atttcccatc tgaaagaact ttccgaagaa
aagattggga aagaattggc cgaacaggtg 600aaccatgcgc tggaactgcc actgcaccgt
cgcacccaac gcctcgaagc ggtatggtcg 660attgaagcgt atcgcaaaaa agaggatgca
aatcaggttc tgctggaact ggccattctc 720gactataaca tgattcagtc cgtctatcaa
cgtgatctgc gcgaaactag tcgttggtgg 780cgccgtgtag gacttgccac taaactgcat
tttgcacgtg atcgtctgat tgagtcgttc 840tattgggcgg ttggtgtagc gtttgagccg
cagtattctg attgccgcaa tagtgtggcg 900aaaatgttct cctttgtgac catcattgac
gatatttacg acgtgtatgg caccctggat 960gaactggaat tattcaccga tgcagtagaa
cgctgggacg tcaacgcgat caatgatttg 1020ccggattaca tgaaactgtg ttttctggcc
ctgtataaca ccattaacga aattgcctat 1080gacaacctca aagacaaggg tgaaaatatc
ctgccctatc tgactaaagc ttgggctgat 1140ctgtgtaacg cgttcttaca ggaagccaaa
tggctctaca acaagagtac gcctactttc 1200gatgactact ttggcaacgc ttggaaaagc
tctagcggcc ctttacaact ggtgttcgcg 1260tatttcgccg ttgttcagaa tatcaagaaa
gaagagattg agaacctcca aaagtaccac 1320gatacgattt cgcgtccgtc acacatcttt
cgcctttgca atgatttggc cagtgcatct 1380gcagagattg cgcgcggtga aactgccaac
tccgtcagtt gctacatgcg taccaaaggc 1440atcagcgagg aactggctac cgagtcggtg
atgaacttaa tcgatgaaac ctggaagaag 1500atgaacaaag agaaacttgg tggcagtctg
tttgctaaac cgttcgttga gacagcgatt 1560aatctggcgc gtcaaagcca ctgcacctac
cacaatggcg atgcccacac atccccagac 1620gaattaaccc ggaaacgtgt cctgagtgtc
atcaccgaac ccattctgcc gttcgaacgc 1680taa
1683
15 7399 DNA Artificial sequence, Synthetic
15 tagattaatt aacctccagc gcggggatct catgctggag
ttcttcgccc acccccagac 60aagctgtgac cgtctccggg agctgcatgt gtcagaggtt
ttcaccgtca tcaccgaaac 120gcgcgaggca gcagatcaat tcgcgcgcga aggcgaagcg
gcatgcataa tgtgcctgtc 180aaatggacga agcagggatt ctgcaaaccc tatgctactc
cgtcaagccg tcaattgtct 240gattcgttac caattatgac aacttgacgg ctacatcatt
cactttttct tcacaaccgg 300cacggaactc gctcgggctg gccccggtgc attttttaaa
tacccgcgag aaatagagtt 360gatcgtcaaa accaacattg cgaccgacgg tggcgatagg
catccgggtg gtgctcaaaa 420gcagcttcgc ctggctgata cgttggtcct cgcgccagct
taagacgcta atccctaact 480gctggcggaa aagatgtgac agacgcgacg gcgacaagca
aacatgctgt gcgacgctgg 540cgatatcaaa attgctgtct gccaggtgat cgctgatgta
ctgacaagcc tcgcgtaccc 600gattatccat cggtggatgg agcgactcgt taatcgcttc
catgcgccgc agtaacaatt 660gctcaagcag atttatcgcc agcagctccg aatagcgccc
ttccccttgc ccggcgttaa 720tgatttgccc aaacaggtcg ctgaaatgcg gctggtgcgc
ttcatccggg cgaaagaacc 780ccgtattggc aaatattgac ggccagttaa gccattcatg
ccagtaggcg cgcggacgaa 840agtaaaccca ctggtgatac cattcgcgag cctccggatg
acgaccgtag tgatgaatct 900ctcctggcgg gaacagcaaa atatcacccg gtcggcaaac
aaattctcgt ccctgatttt 960tcaccacccc ctgaccgcga atggtgagat tgagaatata
acctttcatt cccagcggtc 1020ggtcgataaa aaaatcgaga taaccgttgg cctcaatcgg
cgttaaaccc gccaccagat 1080gggcattaaa cgagtatccc ggcagcaggg gatcattttg
cgcttcagcc atacttttca 1140tactcccgcc attcagagaa gaaaccaatt gtccatattg
catcagacat tgccgtcact 1200gcgtctttta ctggctcttc tcgctaacca aaccggtaac
cccgcttatt aaaagcattc 1260tgtaacaaag cgggaccaaa gccatgacaa aaacgcgtaa
caaaagtgtc tataatcacg 1320gcagaaaagt ccacattgat tatttgcacg gcgtcacact
ttgctatgcc atagcatttt 1380tatccataag attagcggat cctacctgac gctttttatc
gcaactctct actgtttctc 1440catacccgtt ttttgggcta gaaataattt tgagctcctt
taaggaagga gcgaagcatg 1500cgttgtagcg ttagcaccga aaatgtgtcg tttacggaaa
cggaaaccga agctcgccgc 1560agcgcaaact atgaaccgaa ctcgtgggat tacgattacc
tccttagcag cgatacggat 1620gaaagcattg aagtgtataa agacaaagcc aagaaactgg
aggccgaagt ccgtcgcgaa 1680atcaacaatg agaaagcgga gtttcttacg ttactggaat
tgatcgataa cgtgcaacgg 1740ttaggcctcg gctaccgctt tgagagcgat atccgtggtg
cactggaccg cttcgtatcg 1800tctggtggtt ttgacgccgt tacgaaaacg agcctgcatg
gtacagcatt gtcttttcgg 1860ctgttgcgcc agcatggatt tgaagtgtca caggaggcat
tttcaggctt caaagaccag 1920aacgggaatt ttttggagaa tttgaaagaa gatatcaaag
cgatcttatc tctgtatgag 1980gcgtcatttc tcgctctgga aggggaaaat attctggacg
aagcgaaagt gttcgcaatt 2040tcccatctga aagaactttc cgaagaaaag attgggaaag
aattggccga acaggtgaac 2100catgcgctgg aactgccact gcaccgtcgc acccaacgcc
tcgaagcggt atggtcgatt 2160gaagcgtatc gcaaaaaaga ggatgcaaat caggttctgc
tggaactggc cattctcgac 2220tataacatga ttcagtccgt ctatcaacgt gatctgcgcg
aaactagtcg ttggtggcgc 2280cgtgtaggac ttgccactaa actgcatttt gcacgtgatc
gtctgattga gtcgttctat 2340tgggcggttg gtgtagcgtt tgagccgcag tattctgatt
gccgcaatag tgtggcgaaa 2400atgttctcct ttgtgaccat cattgacgat atttacgacg
tgtatggcac cctggatgaa 2460ctggaattat tcaccgatgc agtagaacgc tgggacgtca
acgcgatcaa tgatttgccg 2520gattacatga aactgtgttt tctggccctg tataacacca
ttaacgaaat tgcctatgac 2580aacctcaaag acaagggtga aaatatcctg ccctatctga
ctaaagcttg ggctgatctg 2640tgtaacgcgt tcttacagga agccaaatgg ctctacaaca
agagtacgcc tactttcgat 2700gactactttg gcaacgcttg gaaaagctct agcggccctt
tacaactggt gttcgcgtat 2760ttcgccgttg ttcagaatat caagaaagaa gagattgaga
acctccaaaa gtaccacgat 2820acgatttcgc gtccgtcaca catctttcgc ctttgcaatg
atttggccag tgcatctgca 2880gagattgcgc gcggtgaaac tgccaactcc gtcagttgct
acatgcgtac caaaggcatc 2940agcgaggaac tggctaccga gtcggtgatg aacttaatcg
atgaaacctg gaagaagatg 3000aacaaagaga aacttggtgg cagtctgttt gctaaaccgt
tcgttgagac agcgattaat 3060ctggcgcgtc aaagccactg cacctaccac aatggcgatg
cccacacatc cccagacgaa 3120ttaacccgga aacgtgtcct gagtgtcatc accgaaccca
ttctgccgtt cgaacgctaa 3180gcctgctaac aaagcccgaa aggaagctga gttggctgct
gccaccgctg agcactagtg 3240cggccgcttt gcgcattcac agttctccgc aagaattgat
tggctccaat tcttggagtg 3300gtgaatccgt tagcgaggtg ccgccggctt ccattcaggt
cgaggtggcc cggctccatg 3360caccgcgacg caacgcgggg aggcagacaa ggtatagggc
ggcgcctaca atccatgcca 3420acccgttcca tgtgctcgcc gaggcggcat aaatcgccgt
gacgatcagc ggtccagtga 3480tcgaagttag gctggtaaga gccgcgagcg atccttgaag
ctgtccctga tggtcgtcat 3540ctacctgcct ggacagcatg gcctgcaacg cgggcatccc
gatgccgccg gaagcgagaa 3600gaatcataat ggggaaggcc atccagcctc gcgtcgcgaa
cgccagcaag acgtagccca 3660gcgcgtcggc cgccatgccg gcgataatgg cctgcttctc
gccgaaacgt ttggtggcgg 3720gaccagtgac gaaggcttga gcgagggcgt gcaagattcc
gaataccgca agcgacaggc 3780cgatcatcgt cgcgctccag cgaaagcggt cctcgccgaa
aatgacccag agcgctgccg 3840gcacctgtcc tacgagttgc atgataaaga agacagtcat
aagtgcggcg acgatagtca 3900tgccccgcgc ccaccggaag gagctgactg ggttgaaggc
tctcaagggc atcggtcgac 3960gctctccctt atgcgactcc tgcattagga agcagcccag
tagtaggttg aggccgttga 4020gcaccgccgc cgcaaggaat ggtgcatgca aggagatggc
gcccaacagt cccccggcca 4080cggggcctgc caccataccc acgccgaaac aagcgctcat
gagcccgaag tggcgagccc 4140gatcttcccc atcggtgatg tcggcgatat aggcgccagc
aaccgcacct gtggcgccgg 4200tgatgccggc cacgatgcgt ccggcgtaga ggatccacag
gacgggtgtg gtcgccatga 4260tcgcgtagtc gatagtggct ccaagtagcg aagcgagcag
gactgggcgg cggccaaagc 4320ggtcggacag tgctccgaga acgggtgcgc atagaaattg
catcaacgca tatagcgcta 4380gcagcacgcc atagtgactg gcgatgctgt cggaatggac
gatatcccgc aagaggcccg 4440gcagtaccgg cataaccaag cctatgccta cagcatccag
ggtgacggtg ccgaggatga 4500cgatgagcgc attgttagat ttcatacacg gtgcctgact
gcgttagcaa tttaactgtg 4560ataaactacc gcattaaagc ttatcgatga taagctgtca
aacatgagaa ttcttgaaga 4620cgaaagggcc tcgtgatacg cctattttta taggttaatg
tcatgataat aatggtttct 4680tagacgtcag gtggcacttt tcggggaaat gtgcgcgccc
gcgttcctgc tggcgctggg 4740cctgtttctg gcgctggact tcccgctgtt ccgtcagcag
cttttcgccc acggccttga 4800tgatcgcggc ggccttggcc tgcatatccc gattcaacgg
ccccagggcg tccagaacgg 4860gcttcaggcg ctcccgaagg tctcgggccg tctcttgggc
ttgatcggcc ttcttgcgca 4920tctcacgcgc tcctgcggcg gcctgtaggg caggctcata
cccctgccga accgcttttg 4980tcagccggtc ggccacggct tccggcgtct caacgcgctt
tgagattccc agcttttcgg 5040ccaatccctg cggtgcatag gcgcgtggct cgaccgcttg
cgggctgatg gtgacgtggc 5100ccactggtgg ccgctccagg gcctcgtaga acgcctgaat
gcgcgtgtga cgtgccttgc 5160tgccctcgat gccccgttgc agccctagat cggccacagc
ggccgcaaac gtggtctggt 5220cgcgggtcat ctgcgctttg ttgccgatga actccttggc
cgacagcctg ccgtcctgcg 5280tcagcggcac cacgaacgcg gtcatgtgcg ggctggtttc
gtcacggtgg atgctggccg 5340tcacgatgcg atccgccccg tacttgtccg ccagccactt
gtgcgccttc tcgaagaacg 5400ccgcctgctg ttcttggctg gccgacttcc accattccgg
gctggccgtc atgacgtact 5460cgaccgccaa cacagcgtcc ttgcgccgct tctctggcag
caactcgcgc agtcggccca 5520tcgcttcatc ggtgctgctg gccgcccagt gctcgttctc
tggcgtcctg ctggcgtcag 5580cgttgggcgt ctcgcgctcg cggtaggcgt gcttgagact
ggccgccacg ttgcccattt 5640tcgccagctt cttgcatcgc atgatcgcgt atgccgccat
gcctgcccct cccttttggt 5700gtccaaccgg ctcgacgggg gcagcgcaag gcggtgcctc
cggcgggcca ctcaatgctt 5760gagtatactc actagacttt gcttcgcaaa gtcgtgaccg
cctacggcgg ctgcggcgcc 5820ctacgggctt gctctccggg cttcgccctg cgcggtcgct
gcgctccctt gccagcccgt 5880ggatatgtgg acgatggccg cgagcggcca ccggctggct
cgcttcgctc ggcccgtgga 5940caaccctgct ggacaagctg atggacaggc tgcgcctgcc
cacgagcttg accacaggga 6000ttgcccaccg gctacccagc cttcgaccac atacccaccg
gctccaactg cgcggcctgc 6060ggccttgccc catcaatttt tttaattttc tctggggaaa
agcctccggc ctgcggcctg 6120cgcgcttcgc ttgccggttg gacaccaagt ggaaggcggg
tcaaggctcg cgcagcgacc 6180gcgcagcggc ttggccttga cgcgcctgga acgacccaag
cctatgcgag tgggggcagt 6240cgaagggcga agcccgcccg cctgcccccc gagcctcacg
gcggcgagtg cgggggttcc 6300aagggggcag cgccaccttg ggcaaggccg aaggccgcgc
agtcgatcaa caagccccgg 6360aggggccact ttttgccgga gggggagccg cgccgaaggc
gtgggggaac cccgcagggg 6420tgcccttctt tgggcaccaa agaactagat atagggcgaa
atgcgaaaga cttaaaaatc 6480aacaacttaa aaaagggggg tacgcaacag ctcattgcgg
caccccccgc aatagctcat 6540tgcgtaggtt aaagaaaatc tgtaattgac tgccactttt
acgcaacgca taattgttgt 6600cgcgctgccg aaaagttgca gctgattgcg catggtgccg
caaccgtgcg gcacccctac 6660cgcatggaga taagcatggc cacgcagtcc agagaaatcg
gcattcaagc caagaacaag 6720cccggtcact gggtgcaaac ggaacgcaaa gcgcatgagg
cgtgggccgg gcttattgcg 6780aggaaaccca cggcggcaat gctgctgcat cacctcgtgg
cgcagatggg ccaccagaac 6840gccgtggtgg tcagccagaa gacactttcc aagctcatcg
gacgttcttt gcggacggtc 6900caatacgcag tcaaggactt ggtggccgag cgctggatct
ccgtcgtgaa gctcaacggc 6960cccggcaccg tgtcggccta cgtggtcaat gaccgcgtgg
cgtggggcca gccccgcgac 7020cagttgcgcc tgtcggtgtt cagtgccgcc gtggtggttg
atcacgacga ccaggacgaa 7080tcgctgttgg ggcatggcga cctgcgccgc atcccgaccc
tgtatccggg cgagcagcaa 7140ctaccgaccg gccccggcga ggagccgccc agccagcccg
gcattccggg catggaacca 7200gacctgccag ccttgaccga aacggaggaa tgggaacggc
gcgggcagca gcgcctgccg 7260atgcccgatg agccgtgttt tctggacgat ggcgagccgt
tggagccgcc gacacgggtc 7320acgctgccgc gccggtagca cttgggttgc gcagcaaccc
gtaagtgcgc tgttccagac 7380tatcggctgt agccgcctc
7399
16 7960 DNA Artificial sequence, Synthetic
16 ctcgatgccc cgttgcagcc ctagatcggc cacagcggcc gcaaacgtgg tctggtcgcg
60ggtcatctgc gctttgttgc cgatgaactc cttggccgac agcctgccgt cctgcgtcag
120cggcaccacg aacgcggtca tgtgcgggct ggtttcgtca cggtggatgc tggccgtcac
180gatgcgatcc gccccgtact tgtccgccag ccacttgtgc gccttctcga agaacgccgc
240ctgctgttct tggctggccg acttccacca ttccgggctg gccgtcatga cgtactcgac
300cgccaacaca gcgtccttgc gccgcttctc tggcagcaac tcgcgcagtc ggcccatcgc
360ttcatcggtg ctgctggccg cccagtgctc gttctctggc gtcctgctgg cgtcagcgtt
420gggcgtctcg cgctcgcggt aggcgtgctt gagactggcc gccacgttgc ccattttcgc
480cagcttcttg catcgcatga tcgcgtatgc cgccatgcct gcccctccct tttggtgtcc
540aaccggctcg acgggggcag cgcaaggcgg tgcctccggc gggccactca atgcttgagt
600atactcacta gactttgctt cgcaaagtcg tgaccgccta cggcggctgc ggcgccctac
660gggcttgctc tccgggcttc gccctgcgcg gtcgctgcgc tcccttgcca gcccgtggat
720atgtggacga tggccgcgag cggccaccgg ctggctcgct tcgctcggcc cgtggacaac
780cctgctggac aagctgatgg acaggctgcg cctgcccacg agcttgacca cagggattgc
840ccaccggcta cccagccttc gaccacatac ccaccggctc caactgcgcg gcctgcggcc
900ttgccccatc aattttttta attttctctg gggaaaagcc tccggcctgc ggcctgcgcg
960cttcgcttgc cggttggaca ccaagtggaa ggcgggtcaa ggctcgcgca gcgaccgcgc
1020agcggcttgg ccttgacgcg cctggaacga cccaagccta tgcgagtggg ggcagtcgaa
1080ggcgaagccc gcccgcctgc cccccgagcc tcacggcggc gagtgcgggg gttccaaggg
1140ggcagcgcca ccttgggcaa ggccgaaggc cgcgcagtcg atcaacaagc cccggagggg
1200ccactttttg ccggaggggg agccgcgccg aaggcgtggg ggaaccccgc aggggtgccc
1260ttctttgggc accaaagaac tagatatagg gcgaaatgcg aaagacttaa aaatcaacaa
1320cttaaaaaag gggggtacgc aacagctcat tgcggcaccc cccgcaatag ctcattgcgt
1380aggttaaaga aaatctgtaa ttgactgcca cttttacgca acgcataatt gttgtcgcgc
1440tgccgaaaag ttgcagctga ttgcgcatgg tgccgcaacc gtgcggcacc ctaccgcatg
1500gagataagca tggccacgca gtccagagaa atcggcattc aagccaagaa caagcccggt
1560cactgggtgc aaacggaacg caaagcgcat gaggcgtggg ccgggcttat tgcgaggaaa
1620cccacggcgg caatgctgct gcatcacctc gtggcgcaga tgggccacca gaacgccgtg
1680gtggtcagcc agaagacact ttccaagctc atcggacgtt ctttgcggac ggtccaatac
1740gcagtcaagg acttggtggc cgagcgctgg atctccgtcg tgaagctcaa cggccccggc
1800accgtgtcgg cctacgtggt caatgaccgc gtggcgtggg gccagccccg cgaccagttg
1860cgcctgtcgg tgttcagtgc cgccgtggtg gttgatcacg acgaccagga cgaatcgctg
1920ttggggcatg gcgacctgcg ccgcatcccg accctgtatc cgggcgagca gcaactaccg
1980accggccccg gcgaggagcc gcccagccag cccggcattc cgggcatgga accagacctg
2040ccagccttga ccgaaacgga ggaatgggaa cggcgcgggc agcagcgcct gccgatgccc
2100gatgagccgt gttttctgga cgatggcgag ccgttggagc cgccgacacg ggtcacgctg
2160ccgcgccggt agcacttggg ttgcgcagca acccgtaagt gcgctgttcc agactatcgg
2220ctgtagccgc ctctagatta attaacctcc agcgcgggga tctcatgctg gagttcttcg
2280cccaccccca gacaagctgt gaccgtctcc gggagctgca tgtgtcagag gttttcaccg
2340tcatcaccga aacgcgcgag gcagcagatc aattcgcgcg cgaaggcgaa gcggcatgca
2400taatgtgcct gtcaaatgga cgaagcaggg attctgcaaa ccctatgcta ctccgtcaag
2460ccgtcaattg tctgattcgt taccaattat gacaacttga cggctacatc attcactttt
2520tcttcacaac cggcacggaa ctcgctcggg ctggccccgg tgcatttttt aaatacccgc
2580gagaaataga gttgatcgtc aaaaccaaca ttgcgaccga cggtggcgat aggcatccgg
2640gtggtgctca aaagcagctt cgcctggctg atacgttggt cctcgcgcca gcttaagacg
2700ctaatcccta actgctggcg gaaaagatgt gacagacgcg acggcgacaa gcaaacatgc
2760tgtgcgacgc tggcgatatc aaaattgctg tctgccaggt gatcgctgat gtactgacaa
2820gcctcgcgta cccgattatc catcggtgga tggagcgact cgttaatcgc ttccatgcgc
2880cgcagtaaca attgctcaag cagatttatc gccagcagct ccgaatagcg cccttcccct
2940tgcccggcgt taatgatttg cccaaacagg tcgctgaaat gcggctggtg cgcttcatcc
3000gggcgaaaga accccgtatt ggcaaatatt gacggccagt taagccattc atgccagtag
3060gcgcgcggac gaaagtaaac ccactggtga taccattcgc gagcctccgg atgacgaccg
3120tagtgatgaa tctctcctgg cgggaacagc aaaatatcac ccggtcggca aacaaattct
3180cgtccctgat ttttcaccac cccctgaccg cgaatggtga gattgagaat ataacctttc
3240attcccagcg gtcggtcgat aaaaaaatcg agataaccgt tggcctcaat cggcgttaaa
3300cccgccacca gatgggcatt aaacgagtat cccggcagca ggggatcatt ttgcgcttca
3360gccatacttt tcatactccc gccattcaga gaagaaacca attgtccata ttgcatcaga
3420cattgccgtc actgcgtctt ttactggctc ttctcgctaa ccaaaccggt aaccccgctt
3480attaaaagca ttctgtaaca aagcgggacc aaagccatga caaaaacgcg taacaaaagt
3540gtctataatc acggcagaaa agtccacatt gattatttgc acggcgtcac actttgctat
3600gccatagcat ttttatccat aagattagcg gatcctacct gacgcttttt atcgcaactc
3660tctactgttt ctccataccc gttttttggg ctagaaataa ttttgagctc gccaaggaga
3720tataatgcaa acggaacacg tcattttatt gaatgcacag ggagttccca cgggtacgct
3780ggaaaagtat gccgcacaca cggcagacac ccgcttacat ctcgcgttct ccagttggct
3840gtttaatgcc aaaggacaat tattagttac ccgccgcgca ctgagcaaaa aagcatggcc
3900tggcgtgtgg actaactcgg tttgtgggca cccacaactg ggagaaagca acgaagacgc
3960agtgatccgc cgttgccgtt atgagcttgg cgtggaaatt acgcctcctg aatctatcta
4020tcctgacttt cgctaccgcg ccaccgatcc gagtggcatt gtggaaaatg aagtgtgtcc
4080ggtatttgcc gcacgcacca ctagtgcgtt acagatcaat gatgatgaag tgatggatta
4140tcaatggtgt gatttagcag atgtattaca cggtattgat gccacgccgt gggcgttcag
4200tccgtggatg gtgatgcagg cgacaaatcg cgaagccaga aaacgattat ctgcatttac
4260ccagcttaaa taactttaag gaaggagcga agcatgcgtt gtagcgttag caccgaaaat
4320gtgtcgttta cggaaacgga aaccgaagct cgccgcagcg caaactatga accgaactcg
4380tgggattacg attacctcct tagcagcgat acggatgaaa gcattgaagt gtataaagac
4440aaagccaaga aactggaggc cgaagtccgt cgcgaaatca acaatgagaa agcggagttt
4500cttacgttac tggaattgat cgataacgtg caacggttag gcctcggcta ccgctttgag
4560agcgatatcc gtggtgcact ggaccgcttc gtatcgtctg gtggttttga cgccgttacg
4620aaaacgagcc tgcatggtac agcattgtct tttcggctgt tgcgccagca tggatttgaa
4680gtgtcacagg aggcattttc aggcttcaaa gaccagaacg ggaatttttt ggagaatttg
4740aaagaagata tcaaagcgat cttatctctg tatgaggcgt catttctcgc tctggaaggg
4800gaaaatattc tggacgaagc gaaagtgttc gcaatttccc atctgaaaga actttccgaa
4860gaaaagattg ggaaagaatt ggccgaacag gtgaaccatg cgctggaact gccactgcac
4920cgtcgcaccc aacgcctcga agcggtatgg tcgattgaag cgtatcgcaa aaaagaggat
4980gcaaatcagg ttctgctgga actggccatt ctcgactata acatgattca gtccgtctat
5040caacgtgatc tgcgcgaaac tagtcgttgg tggcgccgtg taggacttgc cactaaactg
5100cattttgcac gtgatcgtct gattgagtcg ttctattggg cggttggtgt agcgtttgag
5160ccgcagtatt ctgattgccg caatagtgtg gcgaaaatgt tctcctttgt gaccatcatt
5220gacgatattt acgacgtgta tggcaccctg gatgaactgg aattattcac cgatgcagta
5280gaacgctggg acgtcaacgc gatcaatgat ttgccggatt acatgaaact gtgttttctg
5340gccctgtata acaccattaa cgaaattgcc tatgacaacc tcaaagacaa gggtgaaaat
5400atcctgccct atctgactaa agcttgggct gatctgtgta acgcgttctt acaggaagcc
5460aaatggctct acaacaagag tacgcctact ttcgatgact actttggcaa cgcttggaaa
5520agctctagcg gccctttaca actggtgttc gcgtatttcg ccgttgttca gaatatcaag
5580aaagaagaga ttgagaacct ccaaaagtac cacgatacga tttcgcgtcc gtcacacatc
5640tttcgccttt gcaatgattt ggccagtgca tctgcagaga ttgcgcgcgg tgaaactgcc
5700aactccgtca gttgctacat gcgtaccaaa ggcatcagcg aggaactggc taccgagtcg
5760gtgatgaact taatcgatga aacctggaag aagatgaaca aagagaaact tggtggcagt
5820ctgtttgcta aaccgttcgt tgagacagcg attaatctgg cgcgtcaaag ccactgcacc
5880taccacaatg gcgatgccca cacatcccca gacgaattaa cccggaaacg tgtcctgagt
5940gtcatcaccg aacccattct gccgttcgaa cgctaagcct gctaacaaag cccgaaagga
6000agctgagttg gctgctgcca ccgctgagca ctagtgcggc cgctttgcgc attcacagtt
6060ctccgcaaga attgattggc tccaattctt ggagtggtga atccgttagc gaggtgccgc
6120cggcttccat tcaggtcgag gtggcccggc tccatgcacc gcgacgcaac gcggggaggc
6180agacaaggta tagggcggcg cctacaatcc atgccaaccc gttccatgtg ctcgccgagg
6240cggcataaat cgccgtgacg atcagcggtc cagtgatcga agttaggctg gtaagagccg
6300cgagcgatcc ttgaagctgt ccctgatggt cgtcatctac ctgcctggac agcatggcct
6360gcaacgcggg catcccgatg ccgccggaag cgagaagaat cataatgggg aaggccatcc
6420agcctcgcgt cgcgaacgcc agcaagacgt agcccagcgc gtcggccgcc atgccggcga
6480taatggcctg cttctcgccg aaacgtttgg tggcgggacc agtgacgaag gcttgagcga
6540gggcgtgcaa gattccgaat accgcaagcg acaggccgat catcgtcgcg ctccagcgaa
6600agcggtcctc gccgaaaatg acccagagcg ctgccggcac ctgtcctacg agttgcatga
6660taaagaagac agtcataagt gcggcgacga tagtcatgcc ccgcgcccac cggaaggagc
6720tgactgggtt gaaggctctc aagggcatcg gtcgacgctc tcccttatgc gactcctgca
6780ttaggaagca gcccagtagt aggttgaggc cgttgagcac cgccgccgca aggaatggtg
6840catgcaagga gatggcgccc aacagtcccc cggccacggg gcctgccacc atacccacgc
6900cgaaacaagc gctcatgagc ccgaagtggc gagcccgatc ttccccatcg gtgatgtcgg
6960cgatataggc gccagcaacc gcacctgtgg cgccggtgat gccggccacg atgcgtccgg
7020cgtagaggat ccacaggacg ggtgtggtcg ccatgatcgc gtagtcgata gtggctccaa
7080gtagcgaagc gagcaggact gggcggcggc caaagcggtc ggacagtgct ccgagaacgg
7140gtgcgcatag aaattgcatc aacgcatata gcgctagcag cacgccatag tgactggcga
7200tgctgtcgga atggacgata tcccgcaaga ggcccggcag taccggcata accaagccta
7260tgcctacagc atccagggtg acggtgccga ggatgacgat gagcgcattg ttagatttca
7320tacacggtgc ctgactgcgt tagcaattta actgtgataa actaccgcat taaagcttat
7380cgatgataag ctgtcaaaca tgagaattct tgaagacgaa agggcctcgt gatacgccta
7440tttttatagg ttaatgtcat gataataatg gtttcttaga cgtcaggtgg cacttttcgg
7500ggaaatgtgc gcgcccgcgt tcctgctggc gctgggcctg tttctggcgc tggacttccc
7560gctgttccgt cagcagcttt tcgcccacgg ccttgatgat cgcggcggcc ttggcctgca
7620tatcccgatt caacggcccc agggcgtcca gaacgggctt caggcgctcc cgaaggtctc
7680gggccgtctc ttgggcttga tcggccttct tgcgcatctc acgcgctcct gcggcggcct
7740gtagggcagg ctcatacccc tgccgaaccg cttttgtcag ccggtcggcc acggcttccg
7800gcgtctcaac gcgctttgag attcccagct tttcggccaa tccctgcggt gcataggcgc
7860gtggctcgac cgcttgcggg ctgatggtga cgtggcccac tggtggccgc tccagggcct
7920cgtagaacgc ctgaatgcgc gtgtgacgtg ccttgctgcc
7960
17 8464 DNA Artificial sequence, Synthetic
17 ctcgatgccc cgttgcagcc
ctagatcggc cacagcggcc gcaaacgtgg tctggtcgcg 60ggtcatctgc gctttgttgc
cgatgaactc cttggccgac agcctgccgt cctgcgtcag 120cggcaccacg aacgcggtca
tgtgcgggct ggtttcgtca cggtggatgc tggccgtcac 180gatgcgatcc gccccgtact
tgtccgccag ccacttgtgc gccttctcga agaacgccgc 240ctgctgttct tggctggccg
acttccacca ttccgggctg gccgtcatga cgtactcgac 300cgccaacaca gcgtccttgc
gccgcttctc tggcagcaac tcgcgcagtc ggcccatcgc 360ttcatcggtg ctgctggccg
cccagtgctc gttctctggc gtcctgctgg cgtcagcgtt 420gggcgtctcg cgctcgcggt
aggcgtgctt gagactggcc gccacgttgc ccattttcgc 480cagcttcttg catcgcatga
tcgcgtatgc cgccatgcct gcccctccct tttggtgtcc 540aaccggctcg acgggggcag
cgcaaggcgg tgcctccggc gggccactca atgcttgagt 600atactcacta gactttgctt
cgcaaagtcg tgaccgccta cggcggctgc ggcgccctac 660gggcttgctc tccgggcttc
gccctgcgcg gtcgctgcgc tcccttgcca gcccgtggat 720atgtggacga tggccgcgag
cggccaccgg ctggctcgct tcgctcggcc cgtggacaac 780cctgctggac aagctgatgg
acaggctgcg cctgcccacg agcttgacca cagggattgc 840ccaccggcta cccagccttc
gaccacatac ccaccggctc caactgcgcg gcctgcggcc 900ttgccccatc aattttttta
attttctctg gggaaaagcc tccggcctgc ggcctgcgcg 960cttcgcttgc cggttggaca
ccaagtggaa ggcgggtcaa ggctcgcgca gcgaccgcgc 1020agcggcttgg ccttgacgcg
cctggaacga cccaagccta tgcgagtggg ggcagtcgaa 1080ggcgaagccc gcccgcctgc
cccccgagcc tcacggcggc gagtgcgggg gttccaaggg 1140ggcagcgcca ccttgggcaa
ggccgaaggc cgcgcagtcg atcaacaagc cccggagggg 1200ccactttttg ccggaggggg
agccgcgccg aaggcgtggg ggaaccccgc aggggtgccc 1260ttctttgggc accaaagaac
tagatatagg gcgaaatgcg aaagacttaa aaatcaacaa 1320cttaaaaaag gggggtacgc
aacagctcat tgcggcaccc cccgcaatag ctcattgcgt 1380aggttaaaga aaatctgtaa
ttgactgcca cttttacgca acgcataatt gttgtcgcgc 1440tgccgaaaag ttgcagctga
ttgcgcatgg tgccgcaacc gtgcggcacc ctaccgcatg 1500gagataagca tggccacgca
gtccagagaa atcggcattc aagccaagaa caagcccggt 1560cactgggtgc aaacggaacg
caaagcgcat gaggcgtggg ccgggcttat tgcgaggaaa 1620cccacggcgg caatgctgct
gcatcacctc gtggcgcaga tgggccacca gaacgccgtg 1680gtggtcagcc agaagacact
ttccaagctc atcggacgtt ctttgcggac ggtccaatac 1740gcagtcaagg acttggtggc
cgagcgctgg atctccgtcg tgaagctcaa cggccccggc 1800accgtgtcgg cctacgtggt
caatgaccgc gtggcgtggg gccagccccg cgaccagttg 1860cgcctgtcgg tgttcagtgc
cgccgtggtg gttgatcacg acgaccagga cgaatcgctg 1920ttggggcatg gcgacctgcg
ccgcatcccg accctgtatc cgggcgagca gcaactaccg 1980accggccccg gcgaggagcc
gcccagccag cccggcattc cgggcatgga accagacctg 2040ccagccttga ccgaaacgga
ggaatgggaa cggcgcgggc agcagcgcct gccgatgccc 2100gatgagccgt gttttctgga
cgatggcgag ccgttggagc cgccgacacg ggtcacgctg 2160ccgcgccggt agcacttggg
ttgcgcagca acccgtaagt gcgctgttcc agactatcgg 2220ctgtagccgc ctctagatta
attaacctcc agcgcgggga tctcatgctg gagttcttcg 2280cccaccccca gacaagctgt
gaccgtctcc gggagctgca tgtgtcagag gttttcaccg 2340tcatcaccga aacgcgcgag
gcagcagatc aattcgcgcg cgaaggcgaa gcggcatgca 2400taatgtgcct gtcaaatgga
cgaagcaggg attctgcaaa ccctatgcta ctccgtcaag 2460ccgtcaattg tctgattcgt
taccaattat gacaacttga cggctacatc attcactttt 2520tcttcacaac cggcacggaa
ctcgctcggg ctggccccgg tgcatttttt aaatacccgc 2580gagaaataga gttgatcgtc
aaaaccaaca ttgcgaccga cggtggcgat aggcatccgg 2640gtggtgctca aaagcagctt
cgcctggctg atacgttggt cctcgcgcca gcttaagacg 2700ctaatcccta actgctggcg
gaaaagatgt gacagacgcg acggcgacaa gcaaacatgc 2760tgtgcgacgc tggcgatatc
aaaattgctg tctgccaggt gatcgctgat gtactgacaa 2820gcctcgcgta cccgattatc
catcggtgga tggagcgact cgttaatcgc ttccatgcgc 2880cgcagtaaca attgctcaag
cagatttatc gccagcagct ccgaatagcg cccttcccct 2940tgcccggcgt taatgatttg
cccaaacagg tcgctgaaat gcggctggtg cgcttcatcc 3000gggcgaaaga accccgtatt
ggcaaatatt gacggccagt taagccattc atgccagtag 3060gcgcgcggac gaaagtaaac
ccactggtga taccattcgc gagcctccgg atgacgaccg 3120tagtgatgaa tctctcctgg
cgggaacagc aaaatatcac ccggtcggca aacaaattct 3180cgtccctgat ttttcaccac
cccctgaccg cgaatggtga gattgagaat ataacctttc 3240attcccagcg gtcggtcgat
aaaaaaatcg agataaccgt tggcctcaat cggcgttaaa 3300cccgccacca gatgggcatt
aaacgagtat cccggcagca ggggatcatt ttgcgcttca 3360gccatacttt tcatactccc
gccattcaga gaagaaacca attgtccata ttgcatcaga 3420cattgccgtc actgcgtctt
ttactggctc ttctcgctaa ccaaaccggt aaccccgctt 3480attaaaagca ttctgtaaca
aagcgggacc aaagccatga caaaaacgcg taacaaaagt 3540gtctataatc acggcagaaa
agtccacatt gattatttgc acggcgtcac actttgctat 3600gccatagcat ttttatccat
aagattagcg gatcctacct gacgcttttt atcgcaactc 3660tctactgttt ctccataccc
gttttttggg ctagaaataa ttttgagctc gccaaggaga 3720tataatggtc acgcgcgcgg
agcgcaagcg ccagcacatc aaccacgcgc tctccatcgg 3780ccagaagcgc gaaaccggcc
tggacgacat cacgtttgtg catgtctcgc tgccggacct 3840ggccctcgaa caggtcgaca
tctcgacgaa gattggcgag ctgagctcct cgtcgccgat 3900cttcatcaac gcgatgaccg
gcggtggtgg caagctgacc tacgagatca acaagtccct 3960ggcgcgcgcg gccagccagg
ccggcatccc gctggcggtc ggcagccaga tgtcggccct 4020gaaggacccc agcgagcgcc
tgtcgtacga gattgtccgc aaggaaaacc cgaacggcct 4080gatcttcgcc aatctgggct
cggaagccac cgcggcgcag gccaaagaag cggtggagat 4140gatcggcgcc aacgccctgc
agatccacct gaacgtgatc caagagatcg tgatgcccga 4200gggcgaccgt tccttctccg
gcgccctcaa gcgcatcgag caaatctgca gccgcgtgtc 4260ggtgcccgtc atcgtcaagg
aagtgggctt cggcatgtcg aaggccagcg ccggcaagct 4320gtacgaagcc ggcgcggccg
ccgtggacat cggcggctac ggcggcacga acttcagcaa 4380gattgagaat ctgcgccgcc
agcggcagat cagcttcttc aactcgtggg gcatcagcac 4440ggccgcgtcg ctggcggaga
tccggtccga gttcccggcc tcgaccatga tcgcgtccgg 4500tggcctccaa gacgccctgg
acgtcgccaa ggccatcgcc ctgggcgcga gctgcaccgg 4560catggccggt cacttcctga
aggccctgac cgatagcggc gaggaaggcc tgctggaaga 4620gatccagctg atcctggaag
aactgaagct gatcatgacg gtgctgggcg cccgtaccat 4680cgcggatctg caaaaggcgc
cgctcgtgat caagggcgaa acccatcact ggctcaccga 4740gcggggcgtg aacaccagct
cgtattcggt gcgctgactt taaggaagga gcgaagcatg 4800cgttgtagcg ttagcaccga
aaatgtgtcg tttacggaaa cggaaaccga agctcgccgc 4860agcgcaaact atgaaccgaa
ctcgtgggat tacgattacc tccttagcag cgatacggat 4920gaaagcattg aagtgtataa
agacaaagcc aagaaactgg aggccgaagt ccgtcgcgaa 4980atcaacaatg agaaagcgga
gtttcttacg ttactggaat tgatcgataa cgtgcaacgg 5040ttaggcctcg gctaccgctt
tgagagcgat atccgtggtg cactggaccg cttcgtatcg 5100tctggtggtt ttgacgccgt
tacgaaaacg agcctgcatg gtacagcatt gtcttttcgg 5160ctgttgcgcc agcatggatt
tgaagtgtca caggaggcat tttcaggctt caaagaccag 5220aacgggaatt ttttggagaa
tttgaaagaa gatatcaaag cgatcttatc tctgtatgag 5280gcgtcatttc tcgctctgga
aggggaaaat attctggacg aagcgaaagt gttcgcaatt 5340tcccatctga aagaactttc
cgaagaaaag attgggaaag aattggccga acaggtgaac 5400catgcgctgg aactgccact
gcaccgtcgc acccaacgcc tcgaagcggt atggtcgatt 5460gaagcgtatc gcaaaaaaga
ggatgcaaat caggttctgc tggaactggc cattctcgac 5520tataacatga ttcagtccgt
ctatcaacgt gatctgcgcg aaactagtcg ttggtggcgc 5580cgtgtaggac ttgccactaa
actgcatttt gcacgtgatc gtctgattga gtcgttctat 5640tgggcggttg gtgtagcgtt
tgagccgcag tattctgatt gccgcaatag tgtggcgaaa 5700atgttctcct ttgtgaccat
cattgacgat atttacgacg tgtatggcac cctggatgaa 5760ctggaattat tcaccgatgc
agtagaacgc tgggacgtca acgcgatcaa tgatttgccg 5820gattacatga aactgtgttt
tctggccctg tataacacca ttaacgaaat tgcctatgac 5880aacctcaaag acaagggtga
aaatatcctg ccctatctga ctaaagcttg ggctgatctg 5940tgtaacgcgt tcttacagga
agccaaatgg ctctacaaca agagtacgcc tactttcgat 6000gactactttg gcaacgcttg
gaaaagctct agcggccctt tacaactggt gttcgcgtat 6060ttcgccgttg ttcagaatat
caagaaagaa gagattgaga acctccaaaa gtaccacgat 6120acgatttcgc gtccgtcaca
catctttcgc ctttgcaatg atttggccag tgcatctgca 6180gagattgcgc gcggtgaaac
tgccaactcc gtcagttgct acatgcgtac caaaggcatc 6240agcgaggaac tggctaccga
gtcggtgatg aacttaatcg atgaaacctg gaagaagatg 6300aacaaagaga aacttggtgg
cagtctgttt gctaaaccgt tcgttgagac agcgattaat 6360ctggcgcgtc aaagccactg
cacctaccac aatggcgatg cccacacatc cccagacgaa 6420ttaacccgga aacgtgtcct
gagtgtcatc accgaaccca ttctgccgtt cgaacgctaa 6480gcctgctaac aaagcccgaa
aggaagctga gttggctgct gccaccgctg agcactagtg 6540cggccgcttt gcgcattcac
agttctccgc aagaattgat tggctccaat tcttggagtg 6600gtgaatccgt tagcgaggtg
ccgccggctt ccattcaggt cgaggtggcc cggctccatg 6660caccgcgacg caacgcgggg
aggcagacaa ggtatagggc ggcgcctaca atccatgcca 6720acccgttcca tgtgctcgcc
gaggcggcat aaatcgccgt gacgatcagc ggtccagtga 6780tcgaagttag gctggtaaga
gccgcgagcg atccttgaag ctgtccctga tggtcgtcat 6840ctacctgcct ggacagcatg
gcctgcaacg cgggcatccc gatgccgccg gaagcgagaa 6900gaatcataat ggggaaggcc
atccagcctc gcgtcgcgaa cgccagcaag acgtagccca 6960gcgcgtcggc cgccatgccg
gcgataatgg cctgcttctc gccgaaacgt ttggtggcgg 7020gaccagtgac gaaggcttga
gcgagggcgt gcaagattcc gaataccgca agcgacaggc 7080cgatcatcgt cgcgctccag
cgaaagcggt cctcgccgaa aatgacccag agcgctgccg 7140gcacctgtcc tacgagttgc
atgataaaga agacagtcat aagtgcggcg acgatagtca 7200tgccccgcgc ccaccggaag
gagctgactg ggttgaaggc tctcaagggc atcggtcgac 7260gctctccctt atgcgactcc
tgcattagga agcagcccag tagtaggttg aggccgttga 7320gcaccgccgc cgcaaggaat
ggtgcatgca aggagatggc gcccaacagt cccccggcca 7380cggggcctgc caccataccc
acgccgaaac aagcgctcat gagcccgaag tggcgagccc 7440gatcttcccc atcggtgatg
tcggcgatat aggcgccagc aaccgcacct gtggcgccgg 7500tgatgccggc cacgatgcgt
ccggcgtaga ggatccacag gacgggtgtg gtcgccatga 7560tcgcgtagtc gatagtggct
ccaagtagcg aagcgagcag gactgggcgg cggccaaagc 7620ggtcggacag tgctccgaga
acgggtgcgc atagaaattg catcaacgca tatagcgcta 7680gcagcacgcc atagtgactg
gcgatgctgt cggaatggac gatatcccgc aagaggcccg 7740gcagtaccgg cataaccaag
cctatgccta cagcatccag ggtgacggtg ccgaggatga 7800cgatgagcgc attgttagat
ttcatacacg gtgcctgact gcgttagcaa tttaactgtg 7860ataaactacc gcattaaagc
ttatcgatga taagctgtca aacatgagaa ttcttgaaga 7920cgaaagggcc tcgtgatacg
cctattttta taggttaatg tcatgataat aatggtttct 7980tagacgtcag gtggcacttt
tcggggaaat gtgcgcgccc gcgttcctgc tggcgctggg 8040cctgtttctg gcgctggact
tcccgctgtt ccgtcagcag cttttcgccc acggccttga 8100tgatcgcggc ggccttggcc
tgcatatccc gattcaacgg ccccagggcg tccagaacgg 8160gcttcaggcg ctcccgaagg
tctcgggccg tctcttgggc ttgatcggcc ttcttgcgca 8220tctcacgcgc tcctgcggcg
gcctgtaggg caggctcata cccctgccga accgcttttg 8280tcagccggtc ggccacggct
tccggcgtct caacgcgctt tgagattccc agcttttcgg 8340ccaatccctg cggtgcatag
gcgcgtggct cgaccgcttg cgggctgatg gtgacgtggc 8400ccactggtgg ccgctccagg
gcctcgtaga acgcctgaat gcgcgtgtga cgtgccttgc 8460tgcc
8464
SEQ ID NO: 18, length 8278, DNA, Artificial sequence, Synthetic
18 ctcgatgccc cgttgcagcc ctagatcggc cacagcggcc
gcaaacgtgg tctggtcgcg 60ggtcatctgc gctttgttgc cgatgaactc cttggccgac
agcctgccgt cctgcgtcag 120cggcaccacg aacgcggtca tgtgcgggct ggtttcgtca
cggtggatgc tggccgtcac 180gatgcgatcc gccccgtact tgtccgccag ccacttgtgc
gccttctcga agaacgccgc 240ctgctgttct tggctggccg acttccacca ttccgggctg
gccgtcatga cgtactcgac 300cgccaacaca gcgtccttgc gccgcttctc tggcagcaac
tcgcgcagtc ggcccatcgc 360ttcatcggtg ctgctggccg cccagtgctc gttctctggc
gtcctgctgg cgtcagcgtt 420gggcgtctcg cgctcgcggt aggcgtgctt gagactggcc
gccacgttgc ccattttcgc 480cagcttcttg catcgcatga tcgcgtatgc cgccatgcct
gcccctccct tttggtgtcc 540aaccggctcg acgggggcag cgcaaggcgg tgcctccggc
gggccactca atgcttgagt 600atactcacta gactttgctt cgcaaagtcg tgaccgccta
cggcggctgc ggcgccctac 660gggcttgctc tccgggcttc gccctgcgcg gtcgctgcgc
tcccttgcca gcccgtggat 720atgtggacga tggccgcgag cggccaccgg ctggctcgct
tcgctcggcc cgtggacaac 780cctgctggac aagctgatgg acaggctgcg cctgcccacg
agcttgacca cagggattgc 840ccaccggcta cccagccttc gaccacatac ccaccggctc
caactgcgcg gcctgcggcc 900ttgccccatc aattttttta attttctctg gggaaaagcc
tccggcctgc ggcctgcgcg 960cttcgcttgc cggttggaca ccaagtggaa ggcgggtcaa
ggctcgcgca gcgaccgcgc 1020agcggcttgg ccttgacgcg cctggaacga cccaagccta
tgcgagtggg ggcagtcgaa 1080ggcgaagccc gcccgcctgc cccccgagcc tcacggcggc
gagtgcgggg gttccaaggg 1140ggcagcgcca ccttgggcaa ggccgaaggc cgcgcagtcg
atcaacaagc cccggagggg 1200ccactttttg ccggaggggg agccgcgccg aaggcgtggg
ggaaccccgc aggggtgccc 1260ttctttgggc accaaagaac tagatatagg gcgaaatgcg
aaagacttaa aaatcaacaa 1320cttaaaaaag gggggtacgc aacagctcat tgcggcaccc
cccgcaatag ctcattgcgt 1380aggttaaaga aaatctgtaa ttgactgcca cttttacgca
acgcataatt gttgtcgcgc 1440tgccgaaaag ttgcagctga ttgcgcatgg tgccgcaacc
gtgcggcacc ctaccgcatg 1500gagataagca tggccacgca gtccagagaa atcggcattc
aagccaagaa caagcccggt 1560cactgggtgc aaacggaacg caaagcgcat gaggcgtggg
ccgggcttat tgcgaggaaa 1620cccacggcgg caatgctgct gcatcacctc gtggcgcaga
tgggccacca gaacgccgtg 1680gtggtcagcc agaagacact ttccaagctc atcggacgtt
ctttgcggac ggtccaatac 1740gcagtcaagg acttggtggc cgagcgctgg atctccgtcg
tgaagctcaa cggccccggc 1800accgtgtcgg cctacgtggt caatgaccgc gtggcgtggg
gccagccccg cgaccagttg 1860cgcctgtcgg tgttcagtgc cgccgtggtg gttgatcacg
acgaccagga cgaatcgctg 1920ttggggcatg gcgacctgcg ccgcatcccg accctgtatc
cgggcgagca gcaactaccg 1980accggccccg gcgaggagcc gcccagccag cccggcattc
cgggcatgga accagacctg 2040ccagccttga ccgaaacgga ggaatgggaa cggcgcgggc
agcagcgcct gccgatgccc 2100gatgagccgt gttttctgga cgatggcgag ccgttggagc
cgccgacacg ggtcacgctg 2160ccgcgccggt agcacttggg ttgcgcagca acccgtaagt
gcgctgttcc agactatcgg 2220ctgtagccgc ctctagatta attaacctcc agcgcgggga
tctcatgctg gagttcttcg 2280cccaccccca gacaagctgt gaccgtctcc gggagctgca
tgtgtcagag gttttcaccg 2340tcatcaccga aacgcgcgag gcagcagatc aattcgcgcg
cgaaggcgaa gcggcatgca 2400taatgtgcct gtcaaatgga cgaagcaggg attctgcaaa
ccctatgcta ctccgtcaag 2460ccgtcaattg tctgattcgt taccaattat gacaacttga
cggctacatc attcactttt 2520tcttcacaac cggcacggaa ctcgctcggg ctggccccgg
tgcatttttt aaatacccgc 2580gagaaataga gttgatcgtc aaaaccaaca ttgcgaccga
cggtggcgat aggcatccgg 2640gtggtgctca aaagcagctt cgcctggctg atacgttggt
cctcgcgcca gcttaagacg 2700ctaatcccta actgctggcg gaaaagatgt gacagacgcg
acggcgacaa gcaaacatgc 2760tgtgcgacgc tggcgatatc aaaattgctg tctgccaggt
gatcgctgat gtactgacaa 2820gcctcgcgta cccgattatc catcggtgga tggagcgact
cgttaatcgc ttccatgcgc 2880cgcagtaaca attgctcaag cagatttatc gccagcagct
ccgaatagcg cccttcccct 2940tgcccggcgt taatgatttg cccaaacagg tcgctgaaat
gcggctggtg cgcttcatcc 3000gggcgaaaga accccgtatt ggcaaatatt gacggccagt
taagccattc atgccagtag 3060gcgcgcggac gaaagtaaac ccactggtga taccattcgc
gagcctccgg atgacgaccg 3120tagtgatgaa tctctcctgg cgggaacagc aaaatatcac
ccggtcggca aacaaattct 3180cgtccctgat ttttcaccac cccctgaccg cgaatggtga
gattgagaat ataacctttc 3240attcccagcg gtcggtcgat aaaaaaatcg agataaccgt
tggcctcaat cggcgttaaa 3300cccgccacca gatgggcatt aaacgagtat cccggcagca
ggggatcatt ttgcgcttca 3360gccatacttt tcatactccc gccattcaga gaagaaacca
attgtccata ttgcatcaga 3420cattgccgtc actgcgtctt ttactggctc ttctcgctaa
ccaaaccggt aaccccgctt 3480attaaaagca ttctgtaaca aagcgggacc aaagccatga
caaaaacgcg taacaaaagt 3540gtctataatc acggcagaaa agtccacatt gattatttgc
acggcgtcac actttgctat 3600gccatagcat ttttatccat aagattagcg gatcctacct
gacgcttttt atcgcaactc 3660tctactgttt ctccataccc gttttttggg ctagaaataa
ttttgagctc gccaaggaga 3720tataatgact gccgacaaca atagtatgcc ccatggtgca
gtatctagtt acgccaaatt 3780agtgcaaaac caaacacctg aagacatttt ggaagagttt
cctgaaatta ttccattaca 3840acaaagacct aatacccgat ctagtgagac gtcaaatgac
gaaagcggag aaacatgttt 3900ttctggtcat gatgaggagc aaattaagtt aatgaatgaa
aattgtattg ttttggattg 3960ggacgataat gctattggtg ccggtaccaa gaaagtttgt
catttaatgg aaaatattga 4020aaagggttta ctacatcgtg cattctccgt ctttattttc
aatgaacaag gtgaattact 4080tttacaacaa agagccactg aaaaaataac tttccctgat
ctttggacta acacatgctg 4140ctctcatcca ctatgtattg atgacgaatt aggtttgaag
ggtaagctag acgataagat 4200taagggcgct attactgcgg cggtgagaaa actagatcat
gaattaggta ttccagaaga 4260tgaaactaag acaaggggta agtttcactt tttaaacaga
atccattaca tggcaccaag 4320caatgaacca tggggtgaac atgaaattga ttacatccta
ttttataaga tcaacgctaa 4380agaaaacttg actgtcaacc caaacgtcaa tgaagttaga
gacttcaaat gggtttcacc 4440aaatgatttg aaaactatgt ttgctgaccc aagttacaag
tttacgcctt ggtttaagat 4500tatttgcgag aattacttat tcaactggtg ggagcaatta
gatgaccttt ctgaagtgga 4560aaatgacagg caaattcata gaatgctata actttaagga
aggagcgaag catgcgttgt 4620agcgttagca ccgaaaatgt gtcgtttacg gaaacggaaa
ccgaagctcg ccgcagcgca 4680aactatgaac cgaactcgtg ggattacgat tacctcctta
gcagcgatac ggatgaaagc 4740attgaagtgt ataaagacaa agccaagaaa ctggaggccg
aagtccgtcg cgaaatcaac 4800aatgagaaag cggagtttct tacgttactg gaattgatcg
ataacgtgca acggttaggc 4860ctcggctacc gctttgagag cgatatccgt ggtgcactgg
accgcttcgt atcgtctggt 4920ggttttgacg ccgttacgaa aacgagcctg catggtacag
cattgtcttt tcggctgttg 4980cgccagcatg gatttgaagt gtcacaggag gcattttcag
gcttcaaaga ccagaacggg 5040aattttttgg agaatttgaa agaagatatc aaagcgatct
tatctctgta tgaggcgtca 5100tttctcgctc tggaagggga aaatattctg gacgaagcga
aagtgttcgc aatttcccat 5160ctgaaagaac tttccgaaga aaagattggg aaagaattgg
ccgaacaggt gaaccatgcg 5220ctggaactgc cactgcaccg tcgcacccaa cgcctcgaag
cggtatggtc gattgaagcg 5280tatcgcaaaa aagaggatgc aaatcaggtt ctgctggaac
tggccattct cgactataac 5340atgattcagt ccgtctatca acgtgatctg cgcgaaacta
gtcgttggtg gcgccgtgta 5400ggacttgcca ctaaactgca ttttgcacgt gatcgtctga
ttgagtcgtt ctattgggcg 5460gttggtgtag cgtttgagcc gcagtattct gattgccgca
atagtgtggc gaaaatgttc 5520tcctttgtga ccatcattga cgatatttac gacgtgtatg
gcaccctgga tgaactggaa 5580ttattcaccg atgcagtaga acgctgggac gtcaacgcga
tcaatgattt gccggattac 5640atgaaactgt gttttctggc cctgtataac accattaacg
aaattgccta tgacaacctc 5700aaagacaagg gtgaaaatat cctgccctat ctgactaaag
cttgggctga tctgtgtaac 5760gcgttcttac aggaagccaa atggctctac aacaagagta
cgcctacttt cgatgactac 5820tttggcaacg cttggaaaag ctctagcggc cctttacaac
tggtgttcgc gtatttcgcc 5880gttgttcaga atatcaagaa agaagagatt gagaacctcc
aaaagtacca cgatacgatt 5940tcgcgtccgt cacacatctt tcgcctttgc aatgatttgg
ccagtgcatc tgcagagatt 6000gcgcgcggtg aaactgccaa ctccgtcagt tgctacatgc
gtaccaaagg catcagcgag 6060gaactggcta ccgagtcggt gatgaactta atcgatgaaa
cctggaagaa gatgaacaaa 6120gagaaacttg gtggcagtct gtttgctaaa ccgttcgttg
agacagcgat taatctggcg 6180cgtcaaagcc actgcaccta ccacaatggc gatgcccaca
catccccaga cgaattaacc 6240cggaaacgtg tcctgagtgt catcaccgaa cccattctgc
cgttcgaacg ctaagcctgc 6300taacaaagcc cgaaaggaag ctgagttggc tgctgccacc
gctgagcact agtgcggccg 6360ctttgcgcat tcacagttct ccgcaagaat tgattggctc
caattcttgg agtggtgaat 6420ccgttagcga ggtgccgccg gcttccattc aggtcgaggt
ggcccggctc catgcaccgc 6480gacgcaacgc ggggaggcag acaaggtata gggcggcgcc
tacaatccat gccaacccgt 6540tccatgtgct cgccgaggcg gcataaatcg ccgtgacgat
cagcggtcca gtgatcgaag 6600ttaggctggt aagagccgcg agcgatcctt gaagctgtcc
ctgatggtcg tcatctacct 6660gcctggacag catggcctgc aacgcgggca tcccgatgcc
gccggaagcg agaagaatca 6720taatggggaa ggccatccag cctcgcgtcg cgaacgccag
caagacgtag cccagcgcgt 6780cggccgccat gccggcgata atggcctgct tctcgccgaa
acgtttggtg gcgggaccag 6840tgacgaaggc ttgagcgagg gcgtgcaaga ttccgaatac
cgcaagcgac aggccgatca 6900tcgtcgcgct ccagcgaaag cggtcctcgc cgaaaatgac
ccagagcgct gccggcacct 6960gtcctacgag ttgcatgata aagaagacag tcataagtgc
ggcgacgata gtcatgcccc 7020gcgcccaccg gaaggagctg actgggttga aggctctcaa
gggcatcggt cgacgctctc 7080ccttatgcga ctcctgcatt aggaagcagc ccagtagtag
gttgaggccg ttgagcaccg 7140ccgccgcaag gaatggtgca tgcaaggaga tggcgcccaa
cagtcccccg gccacggggc 7200ctgccaccat acccacgccg aaacaagcgc tcatgagccc
gaagtggcga gcccgatctt 7260ccccatcggt gatgtcggcg atataggcgc cagcaaccgc
acctgtggcg ccggtgatgc 7320cggccacgat gcgtccggcg tagaggatcc acaggacggg
tgtggtcgcc atgatcgcgt 7380agtcgatagt ggctccaagt agcgaagcga gcaggactgg
gcggcggcca aagcggtcgg 7440acagtgctcc gagaacgggt gcgcatagaa attgcatcaa
cgcatatagc gctagcagca 7500cgccatagtg actggcgatg ctgtcggaat ggacgatatc
ccgcaagagg cccggcagta 7560ccggcataac caagcctatg cctacagcat ccagggtgac
ggtgccgagg atgacgatga 7620gcgcattgtt agatttcata cacggtgcct gactgcgtta
gcaatttaac tgtgataaac 7680taccgcatta aagcttatcg atgataagct gtcaaacatg
agaattcttg aagacgaaag 7740ggcctcgtga tacgcctatt tttataggtt aatgtcatga
taataatggt ttcttagacg 7800tcaggtggca cttttcgggg aaatgtgcgc gcccgcgttc
ctgctggcgc tgggcctgtt 7860tctggcgctg gacttcccgc tgttccgtca gcagcttttc
gcccacggcc ttgatgatcg 7920cggcggcctt ggcctgcata tcccgattca acggccccag
ggcgtccaga acgggcttca 7980ggcgctcccg aaggtctcgg gccgtctctt gggcttgatc
ggccttcttg cgcatctcac 8040gcgctcctgc ggcggcctgt agggcaggct catacccctg
ccgaaccgct tttgtcagcc 8100ggtcggccac ggcttccggc gtctcaacgc gctttgagat
tcccagcttt tcggccaatc 8160cctgcggtgc ataggcgcgt ggctcgaccg cttgcgggct
gatggtgacg tggcccactg 8220gtggccgctc cagggcctcg tagaacgcct gaatgcgcgt
gtgacgtgcc ttgctgcc 8278
SEQ ID NO: 19, length 8455, DNA, Artificial sequence, Synthetic
19 ctcgatgccc cgttgcagcc ctagatcggc cacagcggcc gcaaacgtgg tctggtcgcg
60ggtcatctgc gctttgttgc cgatgaactc cttggccgac agcctgccgt cctgcgtcag
120cggcaccacg aacgcggtca tgtgcgggct ggtttcgtca cggtggatgc tggccgtcac
180gatgcgatcc gccccgtact tgtccgccag ccacttgtgc gccttctcga agaacgccgc
240ctgctgttct tggctggccg acttccacca ttccgggctg gccgtcatga cgtactcgac
300cgccaacaca gcgtccttgc gccgcttctc tggcagcaac tcgcgcagtc ggcccatcgc
360ttcatcggtg ctgctggccg cccagtgctc gttctctggc gtcctgctgg cgtcagcgtt
420gggcgtctcg cgctcgcggt aggcgtgctt gagactggcc gccacgttgc ccattttcgc
480cagcttcttg catcgcatga tcgcgtatgc cgccatgcct gcccctccct tttggtgtcc
540aaccggctcg acgggggcag cgcaaggcgg tgcctccggc gggccactca atgcttgagt
600atactcacta gactttgctt cgcaaagtcg tgaccgccta cggcggctgc ggcgccctac
660gggcttgctc tccgggcttc gccctgcgcg gtcgctgcgc tcccttgcca gcccgtggat
720atgtggacga tggccgcgag cggccaccgg ctggctcgct tcgctcggcc cgtggacaac
780cctgctggac aagctgatgg acaggctgcg cctgcccacg agcttgacca cagggattgc
840ccaccggcta cccagccttc gaccacatac ccaccggctc caactgcgcg gcctgcggcc
900ttgccccatc aattttttta attttctctg gggaaaagcc tccggcctgc ggcctgcgcg
960cttcgcttgc cggttggaca ccaagtggaa ggcgggtcaa ggctcgcgca gcgaccgcgc
1020agcggcttgg ccttgacgcg cctggaacga cccaagccta tgcgagtggg ggcagtcgaa
1080ggcgaagccc gcccgcctgc cccccgagcc tcacggcggc gagtgcgggg gttccaaggg
1140ggcagcgcca ccttgggcaa ggccgaaggc cgcgcagtcg atcaacaagc cccggagggg
1200ccactttttg ccggaggggg agccgcgccg aaggcgtggg ggaaccccgc aggggtgccc
1260ttctttgggc accaaagaac tagatatagg gcgaaatgcg aaagacttaa aaatcaacaa
1320cttaaaaaag gggggtacgc aacagctcat tgcggcaccc cccgcaatag ctcattgcgt
1380aggttaaaga aaatctgtaa ttgactgcca cttttacgca acgcataatt gttgtcgcgc
1440tgccgaaaag ttgcagctga ttgcgcatgg tgccgcaacc gtgcggcacc ctaccgcatg
1500gagataagca tggccacgca gtccagagaa atcggcattc aagccaagaa caagcccggt
1560cactgggtgc aaacggaacg caaagcgcat gaggcgtggg ccgggcttat tgcgaggaaa
1620cccacggcgg caatgctgct gcatcacctc gtggcgcaga tgggccacca gaacgccgtg
1680gtggtcagcc agaagacact ttccaagctc atcggacgtt ctttgcggac ggtccaatac
1740gcagtcaagg acttggtggc cgagcgctgg atctccgtcg tgaagctcaa cggccccggc
1800accgtgtcgg cctacgtggt caatgaccgc gtggcgtggg gccagccccg cgaccagttg
1860cgcctgtcgg tgttcagtgc cgccgtggtg gttgatcacg acgaccagga cgaatcgctg
1920ttggggcatg gcgacctgcg ccgcatcccg accctgtatc cgggcgagca gcaactaccg
1980accggccccg gcgaggagcc gcccagccag cccggcattc cgggcatgga accagacctg
2040ccagccttga ccgaaacgga ggaatgggaa cggcgcgggc agcagcgcct gccgatgccc
2100gatgagccgt gttttctgga cgatggcgag ccgttggagc cgccgacacg ggtcacgctg
2160ccgcgccggt agcacttggg ttgcgcagca acccgtaagt gcgctgttcc agactatcgg
2220ctgtagccgc ctctagatta attaacctcc agcgcgggga tctcatgctg gagttcttcg
2280cccaccccca gacaagctgt gaccgtctcc gggagctgca tgtgtcagag gttttcaccg
2340tcatcaccga aacgcgcgag gcagcagatc aattcgcgcg cgaaggcgaa gcggcatgca
2400taatgtgcct gtcaaatgga cgaagcaggg attctgcaaa ccctatgcta ctccgtcaag
2460ccgtcaattg tctgattcgt taccaattat gacaacttga cggctacatc attcactttt
2520tcttcacaac cggcacggaa ctcgctcggg ctggccccgg tgcatttttt aaatacccgc
2580gagaaataga gttgatcgtc aaaaccaaca ttgcgaccga cggtggcgat aggcatccgg
2640gtggtgctca aaagcagctt cgcctggctg atacgttggt cctcgcgcca gcttaagacg
2700ctaatcccta actgctggcg gaaaagatgt gacagacgcg acggcgacaa gcaaacatgc
2760tgtgcgacgc tggcgatatc aaaattgctg tctgccaggt gatcgctgat gtactgacaa
2820gcctcgcgta cccgattatc catcggtgga tggagcgact cgttaatcgc ttccatgcgc
2880cgcagtaaca attgctcaag cagatttatc gccagcagct ccgaatagcg cccttcccct
2940tgcccggcgt taatgatttg cccaaacagg tcgctgaaat gcggctggtg cgcttcatcc
3000gggcgaaaga accccgtatt ggcaaatatt gacggccagt taagccattc atgccagtag
3060gcgcgcggac gaaagtaaac ccactggtga taccattcgc gagcctccgg atgacgaccg
3120tagtgatgaa tctctcctgg cgggaacagc aaaatatcac ccggtcggca aacaaattct
3180cgtccctgat ttttcaccac cccctgaccg cgaatggtga gattgagaat ataacctttc
3240attcccagcg gtcggtcgat aaaaaaatcg agataaccgt tggcctcaat cggcgttaaa
3300cccgccacca gatgggcatt aaacgagtat cccggcagca ggggatcatt ttgcgcttca
3360gccatacttt tcatactccc gccattcaga gaagaaacca attgtccata ttgcatcaga
3420cattgccgtc actgcgtctt ttactggctc ttctcgctaa ccaaaccggt aaccccgctt
3480attaaaagca ttctgtaaca aagcgggacc aaagccatga caaaaacgcg taacaaaagt
3540gtctataatc acggcagaaa agtccacatt gattatttgc acggcgtcac actttgctat
3600gccatagcat ttttatccat aagattagcg gatcctacct gacgcttttt atcgcaactc
3660tctactgttt ctccataccc gttttttggg ctagaaataa ttttgagctc gccaaggaga
3720tataatgaat cgaaaagatg aacatctatc attagctaaa gcgttccaca aagaaaaaag
3780taatgacttt gatcgtgtgc gttttgttca ccaatcgttt gctgaatccg ctgttaacga
3840agtggatatt tccacttcgt ttctttcttt tcagcttccc caaccttttt atgtcaatgc
3900aatgacaggt ggtagtcagc gtgcaaaaga aattaatcag caattaggca ttattgccaa
3960agaaactggc cttttagttg cgacaggatc tgtctcggca gcgttaaaag atgctagttt
4020agcggatacg tatcaaatta tgcgaaaaga aaacccagat ggactcattt ttgccaatat
4080tggtgcaggc ttgggtgtgg aagaagcaaa gcgagcgctt gatttatttc aagcgaatgc
4140cttacaaatc catgtaaatg tgccccaaga attggtcatg cctgaaggag atcgtgattt
4200cactaattgg ctaaccaaga ttgaagctat cgtacaggcc gtagaagtgc ctgtcattgt
4260caaagaggtt ggctttggca tgagccaaga aaccttagaa aaacttacct ctatcggcgt
4320tcaagcagcg gatgtgagcg gccaaggcgg aacgagtttt acacaaattg aaaatgcccg
4380gcggaagaaa cgagaacttt ctttcttaga tgattggggg caatcaacgg tcatctctct
4440tctggaatca caaaattggc aaaagaaact aactattctc ggctctggcg gtgtgcgtaa
4500ctctcttgat attgtcaaag gactcgcttt aggtgccaaa agcatgggag ttgctgggac
4560tatcttagct tcccttatga gtaaaaatgg tttagaaaat accttagccc ttgtacagca
4620atggcaagaa gaagtgaaaa tgctttatac tcttttagga aaaaagacga cagaagaatt
4680gacgagtacc gcacttgtcc tcgatccagt tttagttaat tggtgtcata accgtggtat
4740cgacagcact gttttcgcaa aacgttaact ttaaggaagg agcgaagcat gcgttgtagc
4800gttagcaccg aaaatgtgtc gtttacggaa acggaaaccg aagctcgccg cagcgcaaac
4860tatgaaccga actcgtggga ttacgattac ctccttagca gcgatacgga tgaaagcatt
4920gaagtgtata aagacaaagc caagaaactg gaggccgaag tccgtcgcga aatcaacaat
4980gagaaagcgg agtttcttac gttactggaa ttgatcgata acgtgcaacg gttaggcctc
5040ggctaccgct ttgagagcga tatccgtggt gcactggacc gcttcgtatc gtctggtggt
5100tttgacgccg ttacgaaaac gagcctgcat ggtacagcat tgtcttttcg gctgttgcgc
5160cagcatggat ttgaagtgtc acaggaggca ttttcaggct tcaaagacca gaacgggaat
5220tttttggaga atttgaaaga agatatcaaa gcgatcttat ctctgtatga ggcgtcattt
5280ctcgctctgg aaggggaaaa tattctggac gaagcgaaag tgttcgcaat ttcccatctg
5340aaagaacttt ccgaagaaaa gattgggaaa gaattggccg aacaggtgaa ccatgcgctg
5400gaactgccac tgcaccgtcg cacccaacgc ctcgaagcgg tatggtcgat tgaagcgtat
5460cgcaaaaaag aggatgcaaa tcaggttctg ctggaactgg ccattctcga ctataacatg
5520attcagtccg tctatcaacg tgatctgcgc gaaactagtc gttggtggcg ccgtgtagga
5580cttgccacta aactgcattt tgcacgtgat cgtctgattg agtcgttcta ttgggcggtt
5640ggtgtagcgt ttgagccgca gtattctgat tgccgcaata gtgtggcgaa aatgttctcc
5700tttgtgacca tcattgacga tatttacgac gtgtatggca ccctggatga actggaatta
5760ttcaccgatg cagtagaacg ctgggacgtc aacgcgatca atgatttgcc ggattacatg
5820aaactgtgtt ttctggccct gtataacacc attaacgaaa ttgcctatga caacctcaaa
5880gacaagggtg aaaatatcct gccctatctg actaaagctt gggctgatct gtgtaacgcg
5940ttcttacagg aagccaaatg gctctacaac aagagtacgc ctactttcga tgactacttt
6000ggcaacgctt ggaaaagctc tagcggccct ttacaactgg tgttcgcgta tttcgccgtt
6060gttcagaata tcaagaaaga agagattgag aacctccaaa agtaccacga tacgatttcg
6120cgtccgtcac acatctttcg cctttgcaat gatttggcca gtgcatctgc agagattgcg
6180cgcggtgaaa ctgccaactc cgtcagttgc tacatgcgta ccaaaggcat cagcgaggaa
6240ctggctaccg agtcggtgat gaacttaatc gatgaaacct ggaagaagat gaacaaagag
6300aaacttggtg gcagtctgtt tgctaaaccg ttcgttgaga cagcgattaa tctggcgcgt
6360caaagccact gcacctacca caatggcgat gcccacacat ccccagacga attaacccgg
6420aaacgtgtcc tgagtgtcat caccgaaccc attctgccgt tcgaacgcta agcctgctaa
6480caaagcccga aaggaagctg agttggctgc tgccaccgct gagcactagt gcggccgctt
6540tgcgcattca cagttctccg caagaattga ttggctccaa ttcttggagt ggtgaatccg
6600ttagcgaggt gccgccggct tccattcagg tcgaggtggc ccggctccat gcaccgcgac
6660gcaacgcggg gaggcagaca aggtataggg cggcgcctac aatccatgcc aacccgttcc
6720atgtgctcgc cgaggcggca taaatcgccg tgacgatcag cggtccagtg atcgaagtta
6780ggctggtaag agccgcgagc gatccttgaa gctgtccctg atggtcgtca tctacctgcc
6840tggacagcat ggcctgcaac gcgggcatcc cgatgccgcc ggaagcgaga agaatcataa
6900tggggaaggc catccagcct cgcgtcgcga acgccagcaa gacgtagccc agcgcgtcgg
6960ccgccatgcc ggcgataatg gcctgcttct cgccgaaacg tttggtggcg ggaccagtga
7020cgaaggcttg agcgagggcg tgcaagattc cgaataccgc aagcgacagg ccgatcatcg
7080tcgcgctcca gcgaaagcgg tcctcgccga aaatgaccca gagcgctgcc ggcacctgtc
7140ctacgagttg catgataaag aagacagtca taagtgcggc gacgatagtc atgccccgcg
7200cccaccggaa ggagctgact gggttgaagg ctctcaaggg catcggtcga cgctctccct
7260tatgcgactc ctgcattagg aagcagccca gtagtaggtt gaggccgttg agcaccgccg
7320ccgcaaggaa tggtgcatgc aaggagatgg cgcccaacag tcccccggcc acggggcctg
7380ccaccatacc cacgccgaaa caagcgctca tgagcccgaa gtggcgagcc cgatcttccc
7440catcggtgat gtcggcgata taggcgccag caaccgcacc tgtggcgccg gtgatgccgg
7500ccacgatgcg tccggcgtag aggatccaca ggacgggtgt ggtcgccatg atcgcgtagt
7560cgatagtggc tccaagtagc gaagcgagca ggactgggcg gcggccaaag cggtcggaca
7620gtgctccgag aacgggtgcg catagaaatt gcatcaacgc atatagcgct agcagcacgc
7680catagtgact ggcgatgctg tcggaatgga cgatatcccg caagaggccc ggcagtaccg
7740gcataaccaa gcctatgcct acagcatcca gggtgacggt gccgaggatg acgatgagcg
7800cattgttaga tttcatacac ggtgcctgac tgcgttagca atttaactgt gataaactac
7860cgcattaaag cttatcgatg ataagctgtc aaacatgaga attcttgaag acgaaagggc
7920ctcgtgatac gcctattttt ataggttaat gtcatgataa taatggtttc ttagacgtca
7980ggtggcactt ttcggggaaa tgtgcgcgcc cgcgttcctg ctggcgctgg gcctgtttct
8040ggcgctggac ttcccgctgt tccgtcagca gcttttcgcc cacggccttg atgatcgcgg
8100cggccttggc ctgcatatcc cgattcaacg gccccagggc gtccagaacg ggcttcaggc
8160gctcccgaag gtctcgggcc gtctcttggg cttgatcggc cttcttgcgc atctcacgcg
8220ctcctgcggc ggcctgtagg gcaggctcat acccctgccg aaccgctttt gtcagccggt
8280cggccacggc ttccggcgtc tcaacgcgct ttgagattcc cagcttttcg gccaatccct
8340gcggtgcata ggcgcgtggc tcgaccgctt gcgggctgat ggtgacgtgg cccactggtg
8400gccgctccag ggcctcgtag aacgcctgaa tgcgcgtgtg acgtgccttg ctgcc
8455
SEQ ID NO: 20, length 8400, DNA, Artificial sequence, Synthetic
20 ctcgatgccc cgttgcagcc
ctagatcggc cacagcggcc gcaaacgtgg tctggtcgcg 60ggtcatctgc gctttgttgc
cgatgaactc cttggccgac agcctgccgt cctgcgtcag 120cggcaccacg aacgcggtca
tgtgcgggct ggtttcgtca cggtggatgc tggccgtcac 180gatgcgatcc gccccgtact
tgtccgccag ccacttgtgc gccttctcga agaacgccgc 240ctgctgttct tggctggccg
acttccacca ttccgggctg gccgtcatga cgtactcgac 300cgccaacaca gcgtccttgc
gccgcttctc tggcagcaac tcgcgcagtc ggcccatcgc 360ttcatcggtg ctgctggccg
cccagtgctc gttctctggc gtcctgctgg cgtcagcgtt 420gggcgtctcg cgctcgcggt
aggcgtgctt gagactggcc gccacgttgc ccattttcgc 480cagcttcttg catcgcatga
tcgcgtatgc cgccatgcct gcccctccct tttggtgtcc 540aaccggctcg acgggggcag
cgcaaggcgg tgcctccggc gggccactca atgcttgagt 600atactcacta gactttgctt
cgcaaagtcg tgaccgccta cggcggctgc ggcgccctac 660gggcttgctc tccgggcttc
gccctgcgcg gtcgctgcgc tcccttgcca gcccgtggat 720atgtggacga tggccgcgag
cggccaccgg ctggctcgct tcgctcggcc cgtggacaac 780cctgctggac aagctgatgg
acaggctgcg cctgcccacg agcttgacca cagggattgc 840ccaccggcta cccagccttc
gaccacatac ccaccggctc caactgcgcg gcctgcggcc 900ttgccccatc aattttttta
attttctctg gggaaaagcc tccggcctgc ggcctgcgcg 960cttcgcttgc cggttggaca
ccaagtggaa ggcgggtcaa ggctcgcgca gcgaccgcgc 1020agcggcttgg ccttgacgcg
cctggaacga cccaagccta tgcgagtggg ggcagtcgaa 1080ggcgaagccc gcccgcctgc
cccccgagcc tcacggcggc gagtgcgggg gttccaaggg 1140ggcagcgcca ccttgggcaa
ggccgaaggc cgcgcagtcg atcaacaagc cccggagggg 1200ccactttttg ccggaggggg
agccgcgccg aaggcgtggg ggaaccccgc aggggtgccc 1260ttctttgggc accaaagaac
tagatatagg gcgaaatgcg aaagacttaa aaatcaacaa 1320cttaaaaaag gggggtacgc
aacagctcat tgcggcaccc cccgcaatag ctcattgcgt 1380aggttaaaga aaatctgtaa
ttgactgcca cttttacgca acgcataatt gttgtcgcgc 1440tgccgaaaag ttgcagctga
ttgcgcatgg tgccgcaacc gtgcggcacc ctaccgcatg 1500gagataagca tggccacgca
gtccagagaa atcggcattc aagccaagaa caagcccggt 1560cactgggtgc aaacggaacg
caaagcgcat gaggcgtggg ccgggcttat tgcgaggaaa 1620cccacggcgg caatgctgct
gcatcacctc gtggcgcaga tgggccacca gaacgccgtg 1680gtggtcagcc agaagacact
ttccaagctc atcggacgtt ctttgcggac ggtccaatac 1740gcagtcaagg acttggtggc
cgagcgctgg atctccgtcg tgaagctcaa cggccccggc 1800accgtgtcgg cctacgtggt
caatgaccgc gtggcgtggg gccagccccg cgaccagttg 1860cgcctgtcgg tgttcagtgc
cgccgtggtg gttgatcacg acgaccagga cgaatcgctg 1920ttggggcatg gcgacctgcg
ccgcatcccg accctgtatc cgggcgagca gcaactaccg 1980accggccccg gcgaggagcc
gcccagccag cccggcattc cgggcatgga accagacctg 2040ccagccttga ccgaaacgga
ggaatgggaa cggcgcgggc agcagcgcct gccgatgccc 2100gatgagccgt gttttctgga
cgatggcgag ccgttggagc cgccgacacg ggtcacgctg 2160ccgcgccggt agcacttggg
ttgcgcagca acccgtaagt gcgctgttcc agactatcgg 2220ctgtagccgc ctctagatta
attaacctcc agcgcgggga tctcatgctg gagttcttcg 2280cccaccccca gacaagctgt
gaccgtctcc gggagctgca tgtgtcagag gttttcaccg 2340tcatcaccga aacgcgcgag
gcagcagatc aattcgcgcg cgaaggcgaa gcggcatgca 2400taatgtgcct gtcaaatgga
cgaagcaggg attctgcaaa ccctatgcta ctccgtcaag 2460ccgtcaattg tctgattcgt
taccaattat gacaacttga cggctacatc attcactttt 2520tcttcacaac cggcacggaa
ctcgctcggg ctggccccgg tgcatttttt aaatacccgc 2580gagaaataga gttgatcgtc
aaaaccaaca ttgcgaccga cggtggcgat aggcatccgg 2640gtggtgctca aaagcagctt
cgcctggctg atacgttggt cctcgcgcca gcttaagacg 2700ctaatcccta actgctggcg
gaaaagatgt gacagacgcg acggcgacaa gcaaacatgc 2760tgtgcgacgc tggcgatatc
aaaattgctg tctgccaggt gatcgctgat gtactgacaa 2820gcctcgcgta cccgattatc
catcggtgga tggagcgact cgttaatcgc ttccatgcgc 2880cgcagtaaca attgctcaag
cagatttatc gccagcagct ccgaatagcg cccttcccct 2940tgcccggcgt taatgatttg
cccaaacagg tcgctgaaat gcggctggtg cgcttcatcc 3000gggcgaaaga accccgtatt
ggcaaatatt gacggccagt taagccattc atgccagtag 3060gcgcgcggac gaaagtaaac
ccactggtga taccattcgc gagcctccgg atgacgaccg 3120tagtgatgaa tctctcctgg
cgggaacagc aaaatatcac ccggtcggca aacaaattct 3180cgtccctgat ttttcaccac
cccctgaccg cgaatggtga gattgagaat ataacctttc 3240attcccagcg gtcggtcgat
aaaaaaatcg agataaccgt tggcctcaat cggcgttaaa 3300cccgccacca gatgggcatt
aaacgagtat cccggcagca ggggatcatt ttgcgcttca 3360gccatacttt tcatactccc
gccattcaga gaagaaacca attgtccata ttgcatcaga 3420cattgccgtc actgcgtctt
ttactggctc ttctcgctaa ccaaaccggt aaccccgctt 3480attaaaagca ttctgtaaca
aagcgggacc aaagccatga caaaaacgcg taacaaaagt 3540gtctataatc acggcagaaa
agtccacatt gattatttgc acggcgtcac actttgctat 3600gccatagcat ttttatccat
aagattagcg gatcctacct gacgcttttt atcgcaactc 3660tctactgttt ctccataccc
gttttttggg ctagaaataa ttttgagctc gccaaggaga 3720tataatgact aaccgtaaag
atgatcacat caaatatgct ctcaagtacc aatcgcctta 3780taatgctttt gatgacatag
aactcataca ccattcctta cctagctatg atttgtctga 3840tattgatctc agtactcatt
ttgctgggca agacttcgac tttccctttt acatcaatgc 3900catgacagga ggaagtcaaa
aaggcaaagc tgtcaatgaa aaattggcca aagtagcagc 3960agcaacaggg attgtcatgg
tgacagggtc ttatagcgct gctttaaaaa atcctaacga 4020cgattcctat cgtttacatg
aggtggcaga taacttgaaa ctagccacga atattggtct 4080agataaacct gtggcgctag
gacaacaaac ggttcaagaa atgcagcccc tctttttaca 4140ggttcatgtg aatgtgatgc
aagagttgct gatgccagag ggtgagcgcg tctttcatac 4200ctggaaaaaa cacctcgctg
aatacgctag tcaaatacca gttcctgtca ttctcaaaga 4260agttggtttt ggcatggatg
tcaatagtat caagctagca catgacctag gcattcaaac 4320ctttgatatt tcaggtagag
gaggaacttc atttgcttac attgaaaatc aaagaggggg 4380agaccgctct tacttaaacg
attggggaca aaccactgtt cagtgcttac tgaatgcaca 4440aggactgatg gaccaagtgg
aaatcttagc ttcgggtggt gtcagacacc ccttggacat 4500gattaagtgt tttgtcttag
gagcacgtgc agtgggactc tcacgcaccg ttttagaatt 4560ggtcgaaaaa tacccaaccg
agcgtgtgat tgctatcgtt aatggctgga aagaagaatt 4620aaaaatcatt atgtgtgctc
ttgactgtaa aactattaaa gaattaaagg gagtcgacta 4680cttactatat ggacgcttgc
agcaggtcaa ttagcttaag gaaggagcga agcatgcgtt 4740gtagcgttag caccgaaaat
gtgtcgttta cggaaacgga aaccgaagct cgccgcagcg 4800caaactatga accgaactcg
tgggattacg attacctcct tagcagcgat acggatgaaa 4860gcattgaagt gtataaagac
aaagccaaga aactggaggc cgaagtccgt cgcgaaatca 4920acaatgagaa agcggagttt
cttacgttac tggaattgat cgataacgtg caacggttag 4980gcctcggcta ccgctttgag
agcgatatcc gtggtgcact ggaccgcttc gtatcgtctg 5040gtggttttga cgccgttacg
aaaacgagcc tgcatggtac agcattgtct tttcggctgt 5100tgcgccagca tggatttgaa
gtgtcacagg aggcattttc aggcttcaaa gaccagaacg 5160ggaatttttt ggagaatttg
aaagaagata tcaaagcgat cttatctctg tatgaggcgt 5220catttctcgc tctggaaggg
gaaaatattc tggacgaagc gaaagtgttc gcaatttccc 5280atctgaaaga actttccgaa
gaaaagattg ggaaagaatt ggccgaacag gtgaaccatg 5340cgctggaact gccactgcac
cgtcgcaccc aacgcctcga agcggtatgg tcgattgaag 5400cgtatcgcaa aaaagaggat
gcaaatcagg ttctgctgga actggccatt ctcgactata 5460acatgattca gtccgtctat
caacgtgatc tgcgcgaaac tagtcgttgg tggcgccgtg 5520taggacttgc cactaaactg
cattttgcac gtgatcgtct gattgagtcg ttctattggg 5580cggttggtgt agcgtttgag
ccgcagtatt ctgattgccg caatagtgtg gcgaaaatgt 5640tctcctttgt gaccatcatt
gacgatattt acgacgtgta tggcaccctg gatgaactgg 5700aattattcac cgatgcagta
gaacgctggg acgtcaacgc gatcaatgat ttgccggatt 5760acatgaaact gtgttttctg
gccctgtata acaccattaa cgaaattgcc tatgacaacc 5820tcaaagacaa gggtgaaaat
atcctgccct atctgactaa agcttgggct gatctgtgta 5880acgcgttctt acaggaagcc
aaatggctct acaacaagag tacgcctact ttcgatgact 5940actttggcaa cgcttggaaa
agctctagcg gccctttaca actggtgttc gcgtatttcg 6000ccgttgttca gaatatcaag
aaagaagaga ttgagaacct ccaaaagtac cacgatacga 6060tttcgcgtcc gtcacacatc
tttcgccttt gcaatgattt ggccagtgca tctgcagaga 6120ttgcgcgcgg tgaaactgcc
aactccgtca gttgctacat gcgtaccaaa ggcatcagcg 6180aggaactggc taccgagtcg
gtgatgaact taatcgatga aacctggaag aagatgaaca 6240aagagaaact tggtggcagt
ctgtttgcta aaccgttcgt tgagacagcg attaatctgg 6300cgcgtcaaag ccactgcacc
taccacaatg gcgatgccca cacatcccca gacgaattaa 6360cccggaaacg tgtcctgagt
gtcatcaccg aacccattct gccgttcgaa cgctaagcct 6420gctaacaaag cccgaaagga
agctgagttg gctgctgcca ccgctgagca ctagtgcggc 6480cgctttgcgc attcacagtt
ctccgcaaga attgattggc tccaattctt ggagtggtga 6540atccgttagc gaggtgccgc
cggcttccat tcaggtcgag gtggcccggc tccatgcacc 6600gcgacgcaac gcggggaggc
agacaaggta tagggcggcg cctacaatcc atgccaaccc 6660gttccatgtg ctcgccgagg
cggcataaat cgccgtgacg atcagcggtc cagtgatcga 6720agttaggctg gtaagagccg
cgagcgatcc ttgaagctgt ccctgatggt cgtcatctac 6780ctgcctggac agcatggcct
gcaacgcggg catcccgatg ccgccggaag cgagaagaat 6840cataatgggg aaggccatcc
agcctcgcgt cgcgaacgcc agcaagacgt agcccagcgc 6900gtcggccgcc atgccggcga
taatggcctg cttctcgccg aaacgtttgg tggcgggacc 6960agtgacgaag gcttgagcga
gggcgtgcaa gattccgaat accgcaagcg acaggccgat 7020catcgtcgcg ctccagcgaa
agcggtcctc gccgaaaatg acccagagcg ctgccggcac 7080ctgtcctacg agttgcatga
taaagaagac agtcataagt gcggcgacga tagtcatgcc 7140ccgcgcccac cggaaggagc
tgactgggtt gaaggctctc aagggcatcg gtcgacgctc 7200tcccttatgc gactcctgca
ttaggaagca gcccagtagt aggttgaggc cgttgagcac 7260cgccgccgca aggaatggtg
catgcaagga gatggcgccc aacagtcccc cggccacggg 7320gcctgccacc atacccacgc
cgaaacaagc gctcatgagc ccgaagtggc gagcccgatc 7380ttccccatcg gtgatgtcgg
cgatataggc gccagcaacc gcacctgtgg cgccggtgat 7440gccggccacg atgcgtccgg
cgtagaggat ccacaggacg ggtgtggtcg ccatgatcgc 7500gtagtcgata gtggctccaa
gtagcgaagc gagcaggact gggcggcggc caaagcggtc 7560ggacagtgct ccgagaacgg
gtgcgcatag aaattgcatc aacgcatata gcgctagcag 7620cacgccatag tgactggcga
tgctgtcgga atggacgata tcccgcaaga ggcccggcag 7680taccggcata accaagccta
tgcctacagc atccagggtg acggtgccga ggatgacgat 7740gagcgcattg ttagatttca
tacacggtgc ctgactgcgt tagcaattta actgtgataa 7800actaccgcat taaagcttat
cgatgataag ctgtcaaaca tgagaattct tgaagacgaa 7860agggcctcgt gatacgccta
tttttatagg ttaatgtcat gataataatg gtttcttaga 7920cgtcaggtgg cacttttcgg
ggaaatgtgc gcgcccgcgt tcctgctggc gctgggcctg 7980tttctggcgc tggacttccc
gctgttccgt cagcagcttt tcgcccacgg ccttgatgat 8040cgcggcggcc ttggcctgca
tatcccgatt caacggcccc agggcgtcca gaacgggctt 8100caggcgctcc cgaaggtctc
gggccgtctc ttgggcttga tcggccttct tgcgcatctc 8160acgcgctcct gcggcggcct
gtagggcagg ctcatacccc tgccgaaccg cttttgtcag 8220ccggtcggcc acggcttccg
gcgtctcaac gcgctttgag attcccagct tttcggccaa 8280tccctgcggt gcataggcgc
gtggctcgac cgcttgcggg ctgatggtga cgtggcccac 8340tggtggccgc tccagggcct
cgtagaacgc ctgaatgcgc gtgtgacgtg ccttgctgcc 8400
SEQ ID NO: 21, length 8443, DNA, Artificial sequence, Synthetic
21 ctcgatgccc cgttgcagcc ctagatcggc cacagcggcc
gcaaacgtgg tctggtcgcg 60ggtcatctgc gctttgttgc cgatgaactc cttggccgac
agcctgccgt cctgcgtcag 120cggcaccacg aacgcggtca tgtgcgggct ggtttcgtca
cggtggatgc tggccgtcac 180gatgcgatcc gccccgtact tgtccgccag ccacttgtgc
gccttctcga agaacgccgc 240ctgctgttct tggctggccg acttccacca ttccgggctg
gccgtcatga cgtactcgac 300cgccaacaca gcgtccttgc gccgcttctc tggcagcaac
tcgcgcagtc ggcccatcgc 360ttcatcggtg ctgctggccg cccagtgctc gttctctggc
gtcctgctgg cgtcagcgtt 420gggcgtctcg cgctcgcggt aggcgtgctt gagactggcc
gccacgttgc ccattttcgc 480cagcttcttg catcgcatga tcgcgtatgc cgccatgcct
gcccctccct tttggtgtcc 540aaccggctcg acgggggcag cgcaaggcgg tgcctccggc
gggccactca atgcttgagt 600atactcacta gactttgctt cgcaaagtcg tgaccgccta
cggcggctgc ggcgccctac 660gggcttgctc tccgggcttc gccctgcgcg gtcgctgcgc
tcccttgcca gcccgtggat 720atgtggacga tggccgcgag cggccaccgg ctggctcgct
tcgctcggcc cgtggacaac 780cctgctggac aagctgatgg acaggctgcg cctgcccacg
agcttgacca cagggattgc 840ccaccggcta cccagccttc gaccacatac ccaccggctc
caactgcgcg gcctgcggcc 900ttgccccatc aattttttta attttctctg gggaaaagcc
tccggcctgc ggcctgcgcg 960cttcgcttgc cggttggaca ccaagtggaa ggcgggtcaa
ggctcgcgca gcgaccgcgc 1020agcggcttgg ccttgacgcg cctggaacga cccaagccta
tgcgagtggg ggcagtcgaa 1080ggcgaagccc gcccgcctgc cccccgagcc tcacggcggc
gagtgcgggg gttccaaggg 1140ggcagcgcca ccttgggcaa ggccgaaggc cgcgcagtcg
atcaacaagc cccggagggg 1200ccactttttg ccggaggggg agccgcgccg aaggcgtggg
ggaaccccgc aggggtgccc 1260ttctttgggc accaaagaac tagatatagg gcgaaatgcg
aaagacttaa aaatcaacaa 1320cttaaaaaag gggggtacgc aacagctcat tgcggcaccc
cccgcaatag ctcattgcgt 1380aggttaaaga aaatctgtaa ttgactgcca cttttacgca
acgcataatt gttgtcgcgc 1440tgccgaaaag ttgcagctga ttgcgcatgg tgccgcaacc
gtgcggcacc ctaccgcatg 1500gagataagca tggccacgca gtccagagaa atcggcattc
aagccaagaa caagcccggt 1560cactgggtgc aaacggaacg caaagcgcat gaggcgtggg
ccgggcttat tgcgaggaaa 1620cccacggcgg caatgctgct gcatcacctc gtggcgcaga
tgggccacca gaacgccgtg 1680gtggtcagcc agaagacact ttccaagctc atcggacgtt
ctttgcggac ggtccaatac 1740gcagtcaagg acttggtggc cgagcgctgg atctccgtcg
tgaagctcaa cggccccggc 1800accgtgtcgg cctacgtggt caatgaccgc gtggcgtggg
gccagccccg cgaccagttg 1860cgcctgtcgg tgttcagtgc cgccgtggtg gttgatcacg
acgaccagga cgaatcgctg 1920ttggggcatg gcgacctgcg ccgcatcccg accctgtatc
cgggcgagca gcaactaccg 1980accggccccg gcgaggagcc gcccagccag cccggcattc
cgggcatgga accagacctg 2040ccagccttga ccgaaacgga ggaatgggaa cggcgcgggc
agcagcgcct gccgatgccc 2100gatgagccgt gttttctgga cgatggcgag ccgttggagc
cgccgacacg ggtcacgctg 2160ccgcgccggt agcacttggg ttgcgcagca acccgtaagt
gcgctgttcc agactatcgg 2220ctgtagccgc ctctagatta attaacctcc agcgcgggga
tctcatgctg gagttcttcg 2280cccaccccca gacaagctgt gaccgtctcc gggagctgca
tgtgtcagag gttttcaccg 2340tcatcaccga aacgcgcgag gcagcagatc aattcgcgcg
cgaaggcgaa gcggcatgca 2400taatgtgcct gtcaaatgga cgaagcaggg attctgcaaa
ccctatgcta ctccgtcaag 2460ccgtcaattg tctgattcgt taccaattat gacaacttga
cggctacatc attcactttt 2520tcttcacaac cggcacggaa ctcgctcggg ctggccccgg
tgcatttttt aaatacccgc 2580gagaaataga gttgatcgtc aaaaccaaca ttgcgaccga
cggtggcgat aggcatccgg 2640gtggtgctca aaagcagctt cgcctggctg atacgttggt
cctcgcgcca gcttaagacg 2700ctaatcccta actgctggcg gaaaagatgt gacagacgcg
acggcgacaa gcaaacatgc 2760tgtgcgacgc tggcgatatc aaaattgctg tctgccaggt
gatcgctgat gtactgacaa 2820gcctcgcgta cccgattatc catcggtgga tggagcgact
cgttaatcgc ttccatgcgc 2880cgcagtaaca attgctcaag cagatttatc gccagcagct
ccgaatagcg cccttcccct 2940tgcccggcgt taatgatttg cccaaacagg tcgctgaaat
gcggctggtg cgcttcatcc 3000gggcgaaaga accccgtatt ggcaaatatt gacggccagt
taagccattc atgccagtag 3060gcgcgcggac gaaagtaaac ccactggtga taccattcgc
gagcctccgg atgacgaccg 3120tagtgatgaa tctctcctgg cgggaacagc aaaatatcac
ccggtcggca aacaaattct 3180cgtccctgat ttttcaccac cccctgaccg cgaatggtga
gattgagaat ataacctttc 3240attcccagcg gtcggtcgat aaaaaaatcg agataaccgt
tggcctcaat cggcgttaaa 3300cccgccacca gatgggcatt aaacgagtat cccggcagca
ggggatcatt ttgcgcttca 3360gccatacttt tcatactccc gccattcaga gaagaaacca
attgtccata ttgcatcaga 3420cattgccgtc actgcgtctt ttactggctc ttctcgctaa
ccaaaccggt aaccccgctt 3480attaaaagca ttctgtaaca aagcgggacc aaagccatga
caaaaacgcg taacaaaagt 3540gtctataatc acggcagaaa agtccacatt gattatttgc
acggcgtcac actttgctat 3600gccatagcat ttttatccat aagattagcg gatcctacct
gacgcttttt atcgcaactc 3660tctactgttt ctccataccc gttttttggg ctagaaataa
ttttgagctc gccaaggaga 3720tataatgacg accaaccgca aggatgagca catcctctac
gccctggagc agaagtcgtc 3780gtacaactcg ttcgacgaag tggaactgat ccactcgtcg
ctgccgctgt ataacctgga 3840cgaaatcgac ctgtccaccg agttcgccgg ccgcaagtgg
gatttcccgt tctacatcaa 3900tgccatgacc ggcggtagca acaagggccg cgaaatcaat
cagaagctgg cccaggtcgc 3960cgagtcgtgc ggcatcctgt tcgtcaccgg cagctactcc
gccgcgctga agaacccgac 4020cgacgactcg ttctcggtca agagcagcca cccgaatctg
ctgctgggca cgaacatcgg 4080cctcgacaag cccgtcgaac tgggcctgca gaccgtggaa
gaaatgaacc ccgtgctgct 4140ccaggtgcat gtgaacgtga tgcaagagct gctgatgccg
gagggcgaac gcaagttccg 4200cagctggcag tcgcacctgg ccgactactc gaagcagatc
cccgtgccga tcgtgctgaa 4260agaagtgggc ttcggcatgg acgccaagac catcgagcgt
gcctacgagt tcggcgtgcg 4320caccgtggac ctctcgggcc gcggtggcac gagcttcgcg
tacatcgaaa accggcgcag 4380cggccagcgc gactacctga accagtgggg ccaatcgacc
atgcaggccc tgctgaacgc 4440gcaagaatgg aaggacaagg tcgagctgct ggtgtcgggc
ggcgtgcgta acccgctcga 4500catgatcaag tgcctggtgt tcggcgccaa ggccgtgggc
ctgtcccgca ccgtgctgga 4560gctggtcgaa acctacaccg tcgaagaagt catcggcatt
gtccagggct ggaaggccga 4620cctccgcctc atcatgtgct ccctgaactg cgccacgatc
gcggacctcc agaaggtgga 4680ctatctcctc tacggcaagc tcaaagaagc caaggaccag
atgaagaagg cgtgacttta 4740aggaaggagc gaagcatgcg ttgtagcgtt agcaccgaaa
atgtgtcgtt tacggaaacg 4800gaaaccgaag ctcgccgcag cgcaaactat gaaccgaact
cgtgggatta cgattacctc 4860cttagcagcg atacggatga aagcattgaa gtgtataaag
acaaagccaa gaaactggag 4920gccgaagtcc gtcgcgaaat caacaatgag aaagcggagt
ttcttacgtt actggaattg 4980atcgataacg tgcaacggtt aggcctcggc taccgctttg
agagcgatat ccgtggtgca 5040ctggaccgct tcgtatcgtc tggtggtttt gacgccgtta
cgaaaacgag cctgcatggt 5100acagcattgt cttttcggct gttgcgccag catggatttg
aagtgtcaca ggaggcattt 5160tcaggcttca aagaccagaa cgggaatttt ttggagaatt
tgaaagaaga tatcaaagcg 5220atcttatctc tgtatgaggc gtcatttctc gctctggaag
gggaaaatat tctggacgaa 5280gcgaaagtgt tcgcaatttc ccatctgaaa gaactttccg
aagaaaagat tgggaaagaa 5340ttggccgaac aggtgaacca tgcgctggaa ctgccactgc
accgtcgcac ccaacgcctc 5400gaagcggtat ggtcgattga agcgtatcgc aaaaaagagg
atgcaaatca ggttctgctg 5460gaactggcca ttctcgacta taacatgatt cagtccgtct
atcaacgtga tctgcgcgaa 5520actagtcgtt ggtggcgccg tgtaggactt gccactaaac
tgcattttgc acgtgatcgt 5580ctgattgagt cgttctattg ggcggttggt gtagcgtttg
agccgcagta ttctgattgc 5640cgcaatagtg tggcgaaaat gttctccttt gtgaccatca
ttgacgatat ttacgacgtg 5700tatggcaccc tggatgaact ggaattattc accgatgcag
tagaacgctg ggacgtcaac 5760gcgatcaatg atttgccgga ttacatgaaa ctgtgttttc
tggccctgta taacaccatt 5820aacgaaattg cctatgacaa cctcaaagac aagggtgaaa
atatcctgcc ctatctgact 5880aaagcttggg ctgatctgtg taacgcgttc ttacaggaag
ccaaatggct ctacaacaag 5940agtacgccta ctttcgatga ctactttggc aacgcttgga
aaagctctag cggcccttta 6000caactggtgt tcgcgtattt cgccgttgtt cagaatatca
agaaagaaga gattgagaac 6060ctccaaaagt accacgatac gatttcgcgt ccgtcacaca
tctttcgcct ttgcaatgat 6120ttggccagtg catctgcaga gattgcgcgc ggtgaaactg
ccaactccgt cagttgctac 6180atgcgtacca aaggcatcag cgaggaactg gctaccgagt
cggtgatgaa cttaatcgat 6240gaaacctgga agaagatgaa caaagagaaa cttggtggca
gtctgtttgc taaaccgttc 6300gttgagacag cgattaatct ggcgcgtcaa agccactgca
cctaccacaa tggcgatgcc 6360cacacatccc cagacgaatt aacccggaaa cgtgtcctga
gtgtcatcac cgaacccatt 6420ctgccgttcg aacgctaagc ctgctaacaa agcccgaaag
gaagctgagt tggctgctgc 6480caccgctgag ttggctgctg ccaccgctga gcactagtgc
ggccgctttg cgcattcaca 6540gttctccgca agaattgatt ggctccaatt cttggagtgg
tgaatccgtt agcgaggtgc 6600cgccggcttc cattcaggtc gaggtggccc ggctccatgc
accgcgacgc aacgcgggga 6660ggcagacaag gtatagggcg gcgcctacaa tccatgccaa
cccgttccat gtgctcgccg 6720aggcggcata aatcgccgtg acgatcagcg gtccagtgat
cgaagttagg ctggtaagag 6780ccgcgagcga tccttgaagc tgtccctgat ggtcgtcatc
tacctgcctg gacagcatgg 6840cctgcaacgc gggcatcccg atgccgccgg aagcgagaag
aatcataatg gggaaggcca 6900tccagcctcg cgtcgcgaac gccagcaaga cgtagcccag
cgcgtcggcc gccatgccgg 6960cgataatggc ctgcttctcg ccgaaacgtt tggtggcggg
accagtgacg aaggcttgag 7020cgagggcgtg caagattccg aataccgcaa gcgacaggcc
gatcatcgtc gcgctccagc 7080gaaagcggtc ctcgccgaaa atgacccaga gcgctgccgg
cacctgtcct acgagttgca 7140tgataaagaa gacagtcata agtgcggcga cgatagtcat
gccccgcgcc caccggaagg 7200agctgactgg gttgaaggct ctcaagggca tcggtcgacg
ctctccctta tgcgactcct 7260gcattaggaa gcagcccagt agtaggttga ggccgttgag
caccgccgcc gcaaggaatg 7320gtgcatgcaa ggagatggcg cccaacagtc ccccggccac
ggggcctgcc accataccca 7380cgccgaaaca agcgctcatg agcccgaagt ggcgagcccg
atcttcccca tcggtgatgt 7440cggcgatata ggcgccagca accgcacctg tggcgccggt
gatgccggcc acgatgcgtc 7500cggcgtagag gatccacagg acgggtgtgg tcgccatgat
cgcgtagtcg atagtggctc 7560caagtagcga agcgagcagg actgggcggc ggccaaagcg
gtcggacagt gctccgagaa 7620cgggtgcgca tagaaattgc atcaacgcat atagcgctag
cagcacgcca tagtgactgg 7680cgatgctgtc ggaatggacg atatcccgca agaggcccgg
cagtaccggc ataaccaagc 7740ctatgcctac agcatccagg gtgacggtgc cgaggatgac
gatgagcgca ttgttagatt 7800tcatacacgg tgcctgactg cgttagcaat ttaactgtga
taaactaccg cattaaagct 7860tatcgatgat aagctgtcaa acatgagaat tcttgaagac
gaaagggcct cgtgatacgc 7920ctatttttat aggttaatgt catgataata atggtttctt
agacgtcagg tggcactttt 7980cggggaaatg tgcgcgcccg cgttcctgct ggcgctgggc
ctgtttctgg cgctggactt 8040cccgctgttc cgtcagcagc ttttcgccca cggccttgat
gatcgcggcg gccttggcct 8100gcatatcccg attcaacggc cccagggcgt ccagaacggg
cttcaggcgc tcccgaaggt 8160ctcgggccgt ctcttgggct tgatcggcct tcttgcgcat
ctcacgcgct cctgcggcgg 8220cctgtagggc aggctcatac ccctgccgaa ccgcttttgt
cagccggtcg gccacggctt 8280ccggcgtctc aacgcgcttt gagattccca gcttttcggc
caatccctgc ggtgcatagg 8340cgcgtggctc gaccgcttgc gggctgatgg tgacgtggcc
cactggtggc cgctccaggg 8400cctcgtagaa cgcctgaatg cgcgtgtgac gtgccttgct
gcc 8443

SEQ ID NO: 22 (length 32, DNA, Artificial sequence, Synthetic)
ggaaggagcg aagcatgcgt tgtagcgtta gc 32

SEQ ID NO: 23 (length 38, DNA, Artificial sequence, Synthetic)
gggctttgtt agcaggctta gcgttcgaac ggcagaat 38

SEQ ID NO: 24 (length 21, DNA, Artificial sequence, Synthetic)
gcctgctaac aaagcccgaa a 21

SEQ ID NO: 25 (length 20, DNA, Artificial sequence, Synthetic)
gcttcgctcc ttccttaaag 20

SEQ ID NO: 26 (length 19, DNA, Artificial sequence, Synthetic)
gccgccctat accttgtct 19

SEQ ID NO: 27 (length 20, DNA, Artificial sequence, Synthetic)
acggcgtcac actttgctat 20

SEQ ID NO: 28 (length 20, DNA, Artificial sequence, Synthetic)
cgcgtcgcga acgccagcaa 20

SEQ ID NO: 29 (length 20, DNA, Artificial sequence, Synthetic)
acggggcctg ccaccatacc 20

SEQ ID NO: 30 (length 20, DNA, Artificial sequence, Synthetic)
cttatcgatg ataagctgtc 20

SEQ ID NO: 31 (length 20, DNA, Artificial sequence, Synthetic)
cagccctaga tcggccacag 20

SEQ ID NO: 32 (length 20, DNA, Artificial sequence, Synthetic)
tgcctgcccc tcccttttgg 20

SEQ ID NO: 33 (length 20, DNA, Artificial sequence, Synthetic)
gcggcgagtg cgggggttcc 20

SEQ ID NO: 34 (length 20, DNA, Artificial sequence, Synthetic)
ggaaacccac ggcggcaatg 20

SEQ ID NO: 35 (length 24, DNA, Artificial sequence, Synthetic)
atcggctgta gccgcctcta gatt 24

SEQ ID NO: 36 (length 20, DNA, Artificial sequence, Synthetic)
agtaacaatt gctcaagcag 20

SEQ ID NO: 37 (length 20, DNA, Artificial sequence, Synthetic)
attcagagaa gaaaccaatt 20

SEQ ID NO: 38 (length 43, DNA, Artificial sequence, Synthetic)
gctagaaata attttgagct cgccaaggag atataatgca aac 43

SEQ ID NO: 39 (length 41, DNA, Artificial sequence, Synthetic)
gcttcgctcc ttccttaaag ttatttaagc tgggtaaatg c 41

SEQ ID NO: 40 (length 41, DNA, Artificial sequence, Synthetic)
gctagaaata attttgagct cgccaaggag atataatggt c 41

SEQ ID NO: 41 (length 38, DNA, Artificial sequence, Synthetic)
gcttcgctcc ttccttaaag tcagcgcacc gaatacga 38

SEQ ID NO: 42 (length 55, DNA, Artificial sequence, Synthetic)
gctagaaata attttgagct cgccaaggag atataatgac tgccgacaac aatag 55

SEQ ID NO: 43 (length 43, DNA, Artificial sequence, Synthetic)
gcttcgctcc ttccttaaag ttatagcatt ctatgaattt gcc 43

SEQ ID NO: 44 (length 54, DNA, Artificial sequence, Synthetic)
gctagaaata attttgagct cgccaaggag atataatgaa tcgaaaagat gaac 54

SEQ ID NO: 45 (length 40, DNA, Artificial sequence, Synthetic)
gcttcgctcc ttccttaaag ttaacgtttt gcgaaaacag 40

SEQ ID NO: 46 (length 57, DNA, Artificial sequence, Synthetic)
gctagaaata attttgagct cgccaaggag atataatgac taaccgtaaa gatgatc 57

SEQ ID NO: 47 (length 39, DNA, Artificial sequence, Synthetic)
gcttcgctcc ttccttaaag ctaattgacc tgctgcaag 39

SEQ ID NO: 48 (length 57, DNA, Artificial sequence, Synthetic)
gctagaaata attttgagct cgccaaggag atataatgac gaccaaccgc aaggatg 57

SEQ ID NO: 49 (length 38, DNA, Artificial sequence, Synthetic)
gcttcgctcc ttccttaaag tcacgccttc ttcatctg 38

SEQ ID NO: 50 (length 19, DNA, Artificial sequence, Synthetic)
gccgccctat accttgtct 19

SEQ ID NO: 51 (length 20, DNA, Artificial sequence, Synthetic)
acggcgtcac actttgctat 20

SEQ ID NO: 52 (length 1878, DNA, E. coli)
ataggatcct aatacgactc actatagggt tttgtttaac
tttaagaagg agatatacca 60tggctactga actgctgtgt ttgcatcgcc cgatttcact
tacacataaa ctgtttcgca 120atccactgcc gaaggttatt caggcgaccc ctctgacgtt
aaaactgcgt tgtagcgtta 180gcaccgaaaa tgtgtcgttt acggaaacgg aaaccgaagc
tcgccgcagc gcaaactatg 240aaccgaactc gtgggattac gattacctcc ttagcagcga
tacggatgaa agcattgaag 300tgtataaaga caaagccaag aaactggagg ccgaagtccg
tcgcgaaatc aacaatgaga 360aagcggagtt tcttacgtta ctggaattga tcgataacgt
gcaacggtta ggcctcggct 420accgctttga gagcgatatc cgtggtgcac tggaccgctt
cgtatcgtct ggtggttttg 480acgccgttac gaaaacgagc ctgcatggta cagcattgtc
ttttcggctg ttgcgccagc 540atggatttga agtgtcacag gaggcatttt caggcttcaa
agaccagaac gggaattttt 600tggagaattt gaaagaagat atcaaagcga tcttatctct
gtatgaggcg tcatttctcg 660ctctggaagg ggaaaatatt ctggacgaag cgaaagtgtt
cgcaatttcc catctgaaag 720aactttccga agaaaagatt gggaaagaat tggccgaaca
ggtgaaccat gcgctggaac 780tgccactgca ccgtcgcacc caacgcctcg aagcggtatg
gtcgattgaa gcgtatcgca 840aaaaagagga tgcaaatcag gttctgctgg aactggccat
tctcgactat aacatgattc 900agtccgtcta tcaacgtgat ctgcgcgaaa ctagtcgttg
gtggcgccgt gtaggacttg 960ccactaaact gcattttgca cgtgatcgtc tgattgagtc
gttctattgg gcggttggtg 1020tagcgtttga gccgcagtat tctgattgcc gcaatagtgt
ggcgaaaatg ttctcctttg 1080tgaccatcat tgacgatatt tacgacgtgt atggcaccct
ggatgaactg gaattattca 1140ccgatgcagt agaacgctgg gacgtcaacg cgatcaatga
tttgccggat tacatgaaac 1200tgtgttttct ggccctgtat aacaccatta acgaaattgc
ctatgacaac ctcaaagaca 1260agggtgaaaa tatcctgccc tatctgacta aagcttgggc
tgatctgtgt aacgcgttct 1320tacaggaagc caaatggctc tacaacaaga gtacgcctac
tttcgatgac tactttggca 1380acgcttggaa aagctctagc ggccctttac aactggtgtt
cgcgtatttc gccgttgttc 1440agaatatcaa gaaagaagag attgagaacc tccaaaagta
ccacgatacg atttcgcgtc 1500cgtcacacat ctttcgcctt tgcaatgatt tggccagtgc
atctgcagag attgcgcgcg 1560gtgaaactgc caactccgtc agttgctaca tgcgtaccaa
aggcatcagc gaggaactgg 1620ctaccgagtc ggtgatgaac ttaatcgatg aaacctggaa
gaagatgaac aaagagaaac 1680ttggtggcag tctgtttgct aaaccgttcg ttgagacagc
gattaatctg gcgcgtcaaa 1740gccactgcac ctaccacaat ggcgatgccc acacatcccc
agacgaatta acccggaaac 1800gtgtcctgag tgtcatcacc gaacccattc tgccgttcga
acgccatcat caccatcacc 1860attaatagcc tagggtgt
1878