Patent application title: HYDROGENASE POLYPEPTIDE AND METHODS OF USE
Inventors:
Michael W.W. Adams (Athens, GA, US)
Francis E. Jenney, Jr. (Hoschton, GA, US)
Junsong Sun (Athens, GA, US)
Robert C. Hopkins (Hopkins, GA, US)
Assignees:
University of Georgia Research Foundation, Inc.
IPC8 Class: AC12P1940FI
USPC Class:
435 88
Class name: N-glycoside nucleoside having a fused ring containing a six-membered ring having two n-atoms in the same ring (e.g., purine nucleosides, etc.)
Publication date: 2011-01-27
Patent application number: 20110020875
Claims:
1. (canceled)
2. A tetrameric polypeptide comprising four subunits, wherein the amino acid sequence of the first subunit and the amino acid sequence of SEQ ID NO:6 have at least 80% identity, wherein the amino acid sequence of the second subunit and the amino acid sequence of SEQ ID NO:8 have at least 80% identity, wherein the amino acid sequence of the third subunit and the amino acid sequence of SEQ ID NO:2 have at least 80% identity, wherein the amino acid sequence of the fourth subunit and the amino acid sequence of SEQ ID NO:4 have at least 80% identity, wherein the tetrameric polypeptide has hydrogenase activity, and wherein the tetrameric polypeptide is present in a genetically modified microbial cell.
3. A tetrameric polypeptide comprising four subunits, wherein the amino acid sequence of the first subunit and the amino acid sequence of SEQ ID NO:6 have at least 80% identity, wherein the amino acid sequence of the second subunit and the amino acid sequence of SEQ ID NO:8 have at least 80% identity, wherein the amino acid sequence of the third subunit and the amino acid sequence of SEQ ID NO:2 have at least 80% identity, wherein the amino acid sequence of the fourth subunit and the amino acid sequence of SEQ ID NO:4 have at least 80% identity, wherein the tetrameric polypeptide has hydrogenase activity, and wherein at least one subunit of the tetrameric polypeptide is a fusion comprising a heterologous amino acid sequence.
4. The tetrameric polypeptide of claim 3 wherein the tetrameric polypeptide is isolated.
5. (canceled)
6. The tetrameric polypeptide of claim 3 wherein the tetrameric polypeptide is present in a microbial cell.
7. (canceled)
8. The tetrameric polypeptide of claim 3 wherein the hydrogenase activity is at least 0.05 micromoles H2 produced min-1 mg protein-1 when isolated by whole cell extract, centrifugation of a whole cell extract at 100,000×g, heat-treatment at 80° C. for 30 minutes, and re-centrifugation at 100,000×g.
9-14. (canceled)
15. The polypeptide of claim 3 wherein the polypeptide consists of the first subunit, the second subunit, the third subunit, and the fourth subunit.
16. (canceled)
17. A genetically modified microbe comprising an exogenous polypeptide, wherein the exogenous polypeptide comprises four subunits, wherein the first subunit comprises an amino acid sequence, and the amino acid sequence of the first subunit and the amino acid sequence of SEQ ID NO:6 have at least 80% identity, wherein the second subunit comprises an amino acid sequence, and the amino acid sequence of the second subunit and the amino acid sequence of SEQ ID NO:8 have at least 80% identity, wherein the third subunit comprises an amino acid sequence, and the amino acid sequence of the third subunit and the amino acid sequence of SEQ ID NO:2 have at least 80% identity, wherein the fourth subunit comprises an amino acid sequence, and the amino acid sequence of the fourth subunit and the amino acid sequence of SEQ ID NO:4 have at least 80% identity, and wherein the four subunits form a tetrameric polypeptide having hydrogenase activity.
18. (canceled)
19. The genetically modified microbe of claim 17 wherein at least one subunit is a fusion comprising a heterologous amino acid sequence.
20-27. (canceled)
28. A method for using a polypeptide comprising:
providing
a) a tetrameric polypeptide comprising four subunits, wherein the amino acid sequence of the first subunit and the amino acid sequence of SEQ ID NO:6 have at least 80% identity, wherein the amino acid sequence of the second subunit and the amino acid sequence of SEQ ID NO:8 have at least 80% identity, wherein the amino acid sequence of the third subunit and the amino acid sequence of SEQ ID NO:2 have at least 80% identity, wherein the amino acid sequence of the fourth subunit and the amino acid sequence of SEQ ID NO:4 have at least 80% identity, wherein the tetrameric polypeptide has hydrogenase activity, and wherein the tetrameric polypeptide is present in a genetically modified microbial cell,
b) a tetrameric polypeptide comprising four subunits, wherein the amino acid sequence of the first subunit and the amino acid sequence of SEQ ID NO:6 have at least 80% identity, wherein the amino acid sequence of the second subunit and the amino acid sequence of SEQ ID NO:8 have at least 80% identity, wherein the amino acid sequence of the third subunit and the amino acid sequence of SEQ ID NO:2 have at least 80% identity, wherein the amino acid sequence of the fourth subunit and the amino acid sequence of SEQ ID NO:4 have at least 80% identity, wherein the tetrameric polypeptide has hydrogenase activity, and wherein at least one subunit of the tetrameric polypeptide is a fusion comprising a heterologous amino acid sequence, or
c) a tetrameric polypeptide consisting of four subunits, wherein the amino acid sequence of the first subunit and the amino acid sequence of SEQ ID NO:2 have at least 80% identity, wherein the amino acid sequence of the second subunit and the amino acid sequence of SEQ ID NO:4 have at least 80% identity, wherein the amino acid sequence of the third subunit and the amino acid sequence of SEQ ID NO:6 have at least 80% identity, wherein the amino acid sequence of the fourth subunit and the amino acid sequence of SEQ ID NO:8 have at least 80% identity, wherein at least one subunit of the tetrameric polypeptide is a fusion comprising a heterologous amino acid sequence, and wherein the tetrameric polypeptide has hydrogenase activity; and
incubating the polypeptide under conditions suitable for producing H2 or NADPH.
29. The method of claim 28 further comprising collecting the produced H2 or the produced NADPH.
30. The method of claim 28 wherein the tetrameric polypeptide of (a) or (b) is an isolated polypeptide.
31. The method of claim 30 wherein the tetrameric polypeptide is present on a surface.
32. The method of claim 31 wherein the surface conducts electricity.
33. The method of claim 32 wherein the surface is an anode.
34. The method of claim 28 wherein the tetrameric polypeptide of (a) or (b) is chemically modified.
35. The method of claim 28 wherein the incubating comprises conditions that comprise a polysaccharide.
36. The method of claim 35 wherein the polysaccharide comprises starch.
37. The method of claim 28 wherein the conditions comprise a temperature of at least 70° C.
38-49. (canceled)
Description:
CONTINUING APPLICATION DATA
[0001]This application claims the benefit of U.S. Provisional Application Ser. No. 61/005,383, filed Dec. 5, 2007, which is incorporated by reference herein.
BACKGROUND
[0003]Molecular hydrogen (H2) is typically produced by steam reforming of methane, and platinum is the most commonly used catalyst for hydrogen production. Due to utilization of fossil fuels as a source of methane, as well as the expense, limited availability, sensitivity to poisoning, and bioincompatibility of the catalyst, it is not likely to be utilized in economical energy conversion systems (Bharadwaj and Schmidt. 1995. Fuel Processing Technology 42:109-127, Ghenciu. 2002. Current Opinion in Solid State & Materials Science 6:389-399). However, in 2003 President Bush in the State of the Union Address proposed the Hydrogen Fuel Initiative, the goal of which was to develop new technologies for production and utilization of H2 as a potential source of energy to replace fossil fuels. In microorganisms, the molecular machine responsible for the biological uptake and evolution of hydrogen is an enzyme known as hydrogenase. Hydrogenase catalyzes the simplest of chemical reactions, the interconversion of the neutral molecule H2 and its elementary constituents, two protons and two electrons (Eqn. 1).
2H⁺ + 2e⁻ ⇄ H₂ (1)
Ironically, however, while the reaction that they catalyze is simple, hydrogenase enzymes are multimeric proteins and typically are sensitive to air (oxygen). This has to date precluded the facile production of a recombinant form of the major class of hydrogenase, the so-called "nickel-iron" (NiFe) type.
[0004]Hydrogenases are found in representatives of most microbial genera, as well as some unicellular eukaryotes (Adams et al. 1980. Biochim Biophys Acta 594:105-76; Cammack et al. 2001. Hydrogen as a fuel: learning from nature. Taylor & Francis, London, New York; Friedrich and Schwartz. 1993. Annual Review of Microbiology 47:351-383; Przybyla et al. 1992. FEMS Microbiology Reviews 88:109-135, Vignais et al. 2001. FEMS Microbiology Reviews 25:455-501). The enzyme allows many microorganisms to use H2 gas as a source of low potential reductant (H2/H+, Eo'=-420 mV), either for carbon fixation or as a source of energy. In aerobic environments, H2 oxidation can be coupled via membrane electron transport to the reduction of oxygen (O2/H2O, Eo'=+820 mV). There are a variety of electron acceptors that can be coupled to anaerobic H2 oxidation, including carbon dioxide, which can be reduced to either methane (by methanogens) or acetate (by acetogens), and sulfate and ferric-iron, which are reduced to sulfide and ferrous iron, respectively. On the other hand, microorganisms that produce H2 during growth are widespread in anaerobic environments. The production of H2 is used as a mechanism to dispose of the excess reductant that is generated during the oxidation of organic material. These fermentative organisms conserve energy by chemical synthesis (substrate level phosphorylation) independent of the means by which they dispose of reductant (be it as H2 or as a reduced organic compound such as ethanol). However, it was recently discovered that some organisms are able to conserve energy directly from the production of H2 by a novel respiratory mechanism (Sapra et al. 2003. Proc Natl Acad Sci USA 100:7545-50).
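The two midpoint potentials quoted above can be turned into a standard free energy change with the textbook relation ΔG°' = -nFΔE°'. The following is a minimal sketch of that arithmetic for H2 oxidation coupled to O2 reduction; the function name, the two-electron assumption per H2, and the rounded Faraday constant are illustrative choices and are not taken from the application.

```python
# Illustrative arithmetic only: standard free energy from the two redox
# potentials quoted in the text, using dG = -n * F * dE.
FARADAY = 96485.0  # C/mol, rounded Faraday constant (assumed value)


def standard_free_energy_kj(e_acceptor_mv, e_donor_mv, n_electrons=2):
    """Return the standard free energy change in kJ/mol for electron
    transfer from the donor couple to the acceptor couple."""
    delta_e_volts = (e_acceptor_mv - e_donor_mv) / 1000.0
    return -n_electrons * FARADAY * delta_e_volts / 1000.0


# H2/H+ (donor, -420 mV) coupled to O2/H2O (acceptor, +820 mV)
dg = standard_free_energy_kj(820.0, -420.0)
print(round(dg, 1))  # about -239.3 kJ per mol H2
```

The large negative value is consistent with the statement that aerobic H2 oxidation can serve as a source of energy.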
[0005]Two major types of hydrogenase are known: the nickel-iron (NiFe) and the iron-only (Fe) enzymes (Adams. 1990. Biochimica Et Biophysica Acta 1020:115-145; Albracht. 1994. Biochimica Et Biophysica Acta-Bioenergetics 1188:167-204), which are unrelated phylogenetically (Meyer, J. 2007. Cellular and Molecular Life Sciences 64:1063-1084; Vignais et al. 2001. FEMS Microbiology Reviews 25:455-501). The iron-only type is found in only a few types of anaerobic bacteria and some photosynthetic algae, but it has been extensively studied. This includes structural characterization (Chen et al. 2002. Biochemistry 41:2036-2043; Nicolet et al. 2001. Journal of the American Chemical Society 123:1596-1601; Nicolet et al. 2000. Trends in Biochemical Sciences 25:138-143; Nicolet et al. 1999. Structure with Folding & Design 7:13-23; Peters et al. 1998. Science 282:1853-1858) and potential active site models (Boyke et al. 2004. Journal of the American Chemical Society 126:15151-15160; Tye et al. 2006. Inorg Chem 45:1552-9; Zilberman et al. 2007. Inorg Chem 46:1153-61). Recently, insights have been provided into the biosynthesis of these enzymes (Mishra et al. 2004. Biochemical and Biophysical Research Communications 324:679-685; Posewitz et al. 2004. Journal of Biological Chemistry 279:25711-25720), and there have been some successful attempts to make recombinant forms of them (King et al. 2006. J Bacteriol 188:2163-72).
[0006]The majority of microorganisms that metabolize H2, however, contain NiFe-hydrogenases, an example of which is the cytoplasmic NiFe hydrogenase I of the hyperthermophilic archaeon, Pyrococcus furiosus, which grows optimally at 100° C. (Fiala and Stetter. 1986. Archives of Microbiology 145:56-61, Verhagen et al. 2001. Hyperthermophilic Enzymes, Pt A 330:25-30). The NiFe-hydrogenases have also been extensively characterized over the last 40 years, and several crystal structures are available (Garcin et al. 1998. Biochemical Society Transactions 26:396-401, Higuchi. 1999. Structure 7:549-56, Volbeda and Fontecilla-Camps. 2003. Dalton Transactions:4030-4038, Volbeda et al. 1996. Journal of the American Chemical Society 118:12989-12996). They all are made up of at least two subunits, one of which contains the NiFe-catalytic site, while the other contains three iron-sulfur (FeS) clusters. These clusters serve to shuttle electrons from the electron donor to the enzyme to and from the NiFe site in the catalytic subunit. The Ni atom is bound to four cysteinyl residues of this subunit, two of which are near the N-terminus and two near the C-terminus. Two of the four Cys bind a single Fe atom, which is also coordinated, remarkably, by one carbon monoxide (CO) and two cyanide (CN) ligands (Bagley et al. 1995. Biochemistry 34:5527-5535, Happe et al. 1997. Nature 385:126-126, Pierik et al. 1999. Journal of Biological Chemistry 274:3331-3337). These diatomic ligands serve to activate the iron atom (maintaining it in the low spin state) thereby facilitating catalysis. Interestingly, such ligands are also found at the active site of the iron-only hydrogenases (Nicolet et al. 2002. J Inorg Biochem 91:1-8), as well as the mononuclear iron site of a third type of hydrogenase found in a very limited number of archaea (Lyon et al. 2004. Journal of the American Chemical Society 126:14239-14248), an example of convergent evolution toward a similar function.
[0007]The hydrogenase of P. furiosus is of particular interest for additional reasons. First, it is obtained from an organism that grows optimally at 100° C. and has been shown to be an exceedingly robust and thermostable enzyme (Bryant and Adams. 1989. J Biol Chem 264:5070-9; Ma and Adams. 2001. Methods Enzymol 331:208-16). Second, in in vitro assays, the enzyme has been shown to be able to generate hydrogen gas by oxidizing NADPH in a reversible reaction (Ma and Adams. 2001. Methods Enzymol 331:208-16; Ma et al. 2000. J Bacteriol 182:1864-71; Ma et al. 1994. FEMS Microbiology Letters 122:245-250), which is a very rare property among the hydrogenases that have been characterized to date. Consequently, the reversible P. furiosus enzyme has utility in generating reductants such as NADPH. Likewise, the P. furiosus enzyme has utility in hydrogen production systems in which carbohydrates are oxidized to generate NADPH, which in turn can be converted to hydrogen gas by the hydrogenase. The production of hydrogen from glucose in an in vitro cell-free system using purified enzymes was first demonstrated over a decade ago (Woodward et al. 1996. Nat Biotechnol 14:872-4). This work was very recently extended in which the conversion of starch to hydrogen was described using an in vitro cell-free system made up of thirteen different enzymes (Zhang et al. 2007. PLoS ONE 2:e456). Twelve of the enzymes are used to oxidize starch and generate carbon dioxide and NADPH, and the thirteenth, P. furiosus hydrogenase, oxidizes NADPH and produces hydrogen gas. In this system, the hydrogenase was purified from P. furiosus biomass (Ma and Adams. 2001. Methods Enzymol 331:208-16) since a recombinant form of this enzyme was not available.
SUMMARY OF THE INVENTION
[0008]Provided herein are polypeptides having hydrogenase activity. In one aspect, the polypeptide is a dimeric polypeptide. The amino acid sequence of the first subunit and the amino acid sequence of SEQ ID NO:6 have at least 80% identity, and the amino acid sequence of the second subunit and the amino acid sequence of SEQ ID NO:8 have at least 80% identity. At least one subunit may be a fusion that includes a heterologous amino acid sequence. The dimeric polypeptide may further include two more subunits to result in a tetrameric polypeptide. The amino acid sequence of the third subunit and the amino acid sequence of SEQ ID NO:2 have at least 80% identity, and the amino acid sequence of the fourth subunit and the amino acid sequence of SEQ ID NO:4 have at least 80% identity. The multimeric polypeptide may be isolated, or purified. The tetrameric polypeptide may be present in a genetically modified microbial cell. In some aspects, the genetically modified microbial cell is not Pyrococcus furiosus, P. abyssi, P. horikoshii, Thermococcus kodakaraensis, or T. onnurineus. The multimeric polypeptide may be present in a microbial cell such as, but not limited to, Escherichia coli.
[0009]The multimeric polypeptide may have hydrogenase activity of at least 0.05 micromoles H2 produced min-1 mg protein-1 when isolated by centrifugation of a whole cell extract at 100,000×g, heat-treatment at 80° C. for 30 minutes, and re-centrifugation at 100,000×g. The heterologous amino acid sequence may be present at, for instance, the amino terminal end of a subunit, or the carboxy terminal end of a subunit. The multimeric polypeptide may include one or more chemically modified subunits. Also provided herein is a polypeptide consisting of two subunits or four subunits.
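The activity threshold above is expressed as a specific activity (micromoles H2 per minute per milligram of protein). As a minimal sketch of how a measured value could be compared with the 0.05 threshold, the helper functions and the example measurement below are hypothetical and are not data from the application.

```python
# Hypothetical helpers: compute specific activity and compare it with the
# 0.05 umol H2 min^-1 mg^-1 threshold stated in the text.
def specific_activity(umol_h2, minutes, mg_protein):
    """Return specific activity in micromoles H2 per minute per mg protein."""
    if minutes <= 0 or mg_protein <= 0:
        raise ValueError("minutes and mg_protein must be positive")
    return umol_h2 / (minutes * mg_protein)


def meets_threshold(activity, threshold=0.05):
    """True when the activity is at least the stated threshold."""
    return activity >= threshold


# Made-up example: 1.2 umol H2 evolved in 10 minutes by 2 mg of protein.
activity = specific_activity(umol_h2=1.2, minutes=10, mg_protein=2)
print(activity, meets_threshold(activity))
```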
[0010]Also provided herein are genetically modified microbes. A genetically modified microbe may include an exogenous polypeptide, wherein the exogenous polypeptide includes two subunits. The first subunit includes an amino acid sequence, and the amino acid sequence of the first subunit and the amino acid sequence of SEQ ID NO:6 have at least 80% identity. The second subunit includes an amino acid sequence, and the amino acid sequence of the second subunit and the amino acid sequence of SEQ ID NO:8 have at least 80% identity. The two subunits form a dimeric polypeptide having hydrogenase activity. The dimeric polypeptide may further include two more subunits to form a tetrameric polypeptide having hydrogenase activity, wherein the third subunit includes an amino acid sequence, and the amino acid sequence of the third subunit and the amino acid sequence of SEQ ID NO:2 have at least 80% identity. The fourth subunit includes an amino acid sequence, and the amino acid sequence of the fourth subunit and the amino acid sequence of SEQ ID NO:4 have at least 80% identity. At least one subunit can be a fusion that includes a heterologous amino acid sequence. A genetically modified microbe may include one or more of the accessory polynucleotides described herein.
[0011]A genetically modified microbe may include two exogenous polynucleotides, wherein the exogenous polynucleotides each encode a subunit. The first subunit can include an amino acid sequence, and the amino acid sequence of the first subunit and the amino acid sequence of SEQ ID NO:6 have at least 80% identity. The second subunit can include an amino acid sequence, and the amino acid sequence of the second subunit and the amino acid sequence of SEQ ID NO:8 have at least 80% identity. The two subunits form a dimeric polypeptide having hydrogenase activity. The genetically modified microbe can further include two more exogenous polynucleotides, wherein the two more exogenous polynucleotides each encode a subunit. The third subunit can include an amino acid sequence, and the amino acid sequence of the third subunit and the amino acid sequence of SEQ ID NO:2 have at least 80% identity. The fourth subunit can include an amino acid sequence, and the amino acid sequence of the fourth subunit and the amino acid sequence of SEQ ID NO:4 have at least 80% identity. The four subunits form a tetrameric polypeptide having hydrogenase activity. At least one subunit can be a fusion that includes a heterologous amino acid sequence, such as a histidine tag.
[0012]Further provided herein are methods for making a polypeptide having hydrogenase activity. The methods may include providing a genetically modified microbe including exogenous polynucleotides as described herein, and incubating the microbe under conditions suitable for expression of the exogenous polynucleotides to produce a multimeric polypeptide having hydrogenase activity. The method may further include isolating, or optionally purifying, the polypeptide after the incubating.
[0013]Provided herein are methods for using a polypeptide having hydrogenase activity. The methods may include providing a polypeptide described herein, and incubating the polypeptide under conditions suitable for producing H2. The produced H2 may be collected.
[0014]In one aspect, the polypeptide is an isolated or purified polypeptide. The polypeptide may be present on a surface, such as one that conducts electricity, e.g., an anode. The polypeptide may be chemically modified. The incubating may include conditions that include a polysaccharide, such as a starch or a cellulose. The conditions can include a temperature of at least 37° C. or at least 70° C.
[0015]In another aspect, the polypeptide is present in a genetically modified microbe. The incubating may include incubating the microbial cell under conditions suitable for the expression of the polypeptide. The incubating may include conditions that include a polysaccharide, such as a starch or a cellulose. The conditions can include a temperature of at least 37° C. or at least 70° C.
[0016]Provided herein are methods for using a polypeptide having hydrogenase activity. The methods for using a polypeptide having hydrogenase activity may include providing a polypeptide described herein, and incubating the polypeptide under conditions suitable for producing NADPH. The produced NADPH may be collected.
[0017]In one aspect, the polypeptide is an isolated or purified polypeptide. The conditions may include molecular hydrogen, and a temperature of at least 37° C. In another aspect, the polypeptide is present in a genetically modified microbe. The incubating may include incubating the microbial cell under conditions suitable for the expression of the polypeptide. The conditions may include a temperature of at least 37° C.
[0018]Also provided herein is an expression system for assembling a polypeptide having hydrogenase activity. The expression system includes the plasmids described herein. The plasmids may be present in a microbe, such as an E. coli.
[0019]As used herein, the term "polypeptide" refers broadly to a polymer of two or more amino acids joined together by peptide bonds. The term "polypeptide" also includes molecules which contain more than one polypeptide joined by a disulfide bond, or complexes of polypeptides that are joined together, covalently or noncovalently, as multimers (e.g., dimers, trimers, tetramers). A polypeptide also may possess non-protein (non-amino acid) ligands including, but not limited to, inorganic iron (Fe), nickel (Ni), inorganic iron-sulfur centers such as [4Fe-4S] clusters, and other organic ligands such as carbon monoxide (CO), cyanide (CN) and flavin. Thus, the terms peptide, oligopeptide, enzyme, subunit, and protein are all included within the definition of polypeptide and these terms are used interchangeably. It should be understood that these terms do not connote a specific length of a polymer of amino acids, nor are they intended to imply or distinguish whether the polypeptide is produced using recombinant techniques, chemical or enzymatic synthesis, or is naturally occurring. As used herein, "heterologous amino acid sequence" refers to amino acid sequences that are not normally present as part of a polypeptide present in a wild-type cell. For instance, "heterologous amino acid sequence" includes extra amino acids at the amino terminal end or carboxy terminal end of a polypeptide that are not normally part of a polypeptide that is present in a wild-type cell.
[0020]As used herein, "hydrogenase activity" refers to the ability of a polypeptide to catalyze the formation of molecular hydrogen (H2).
[0021]As used herein, "identity" refers to structural similarity between two polypeptides or two polynucleotides. The structural similarity between two polypeptides is determined by aligning the residues of the two polypeptides (e.g., a candidate amino acid sequence and a reference amino acid sequence, such as SEQ ID NO:2, 4, 6, or 8) to optimize the number of identical amino acids along the lengths of their sequences; gaps in either or both sequences are permitted in making the alignment in order to optimize the number of shared amino acids, although the amino acids in each sequence must nonetheless remain in their proper order. The structural similarity is typically at least 80% identity, at least 81% identity, at least 82% identity, at least 83% identity, at least 84% identity, at least 85% identity, at least 86% identity, at least 87% identity, at least 88% identity, at least 89% identity, at least 90% identity, at least 91% identity, at least 92% identity, at least 93% identity, at least 94% identity, at least 95% identity, at least 96% identity, at least 97% identity, at least 98% identity, or at least 99% identity. A candidate amino acid sequence can be isolated from a microbe, preferably a Pyrococcus spp., more preferably a P. furiosus, or can be produced using recombinant techniques, or chemically or enzymatically synthesized. Structural similarity may be determined, for example, using sequence techniques such as the BESTFIT algorithm in the GCG package (Madison Wis.), or the Blastp program of the BLAST 2 search algorithm, as described by Tatusova, et al. (FEMS Microbiol Lett 1999, 174:247-250), and available through the World Wide Web, for instance at the internet site maintained by the National Center for Biotechnology Information, National Institutes of Health. Preferably, structural similarity between two amino acid sequences is determined using the Blastp program of the BLAST 2 search algorithm.
Preferably, the default values for all BLAST 2 search parameters are used, including matrix=BLOSUM62; open gap penalty=11, extension gap penalty=1, gap x_dropoff=50, expect=10, wordsize=3, and optionally, filter on. In the comparison of two amino acid sequences using the BLAST search algorithm, structural similarity is referred to as "identities."
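Once an alignment has been produced, the "identities" figure is a simple ratio. The following is a minimal sketch, assuming the two sequences are already aligned to equal length with "-" as the gap character and assuming identity is counted over all alignment columns; it does not perform the Blastp search or alignment itself, and the example sequences are invented.

```python
# Sketch only: percent identity for a pair of sequences that have ALREADY
# been aligned (equal length, '-' marks a gap). This is not an aligner.
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return identical columns as a percentage of all alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * identical / len(aligned_a)


# Invented example: 6 identical columns out of 8.
pid = percent_identity("MKV-LAGT", "MKVALSGT")
print(pid, pid >= 80.0)  # 75.0 False
```

Under these assumptions, a candidate scoring 75% would fall short of the "at least 80% identity" criterion.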
[0022]The structural similarity between two polynucleotides is determined by aligning the residues of the two polynucleotides (e.g., a candidate nucleotide sequence and a reference nucleotide sequence, such as SEQ ID NO:1, 3, 5, or 7) to optimize the number of identical nucleotides along the lengths of their sequences; gaps in either or both sequences are permitted in making the alignment in order to optimize the number of shared nucleotides, although the nucleotides in each sequence must nonetheless remain in their proper order. The structural similarity is typically at least 80% identity, at least 81% identity, at least 82% identity, at least 83% identity, at least 84% identity, at least 85% identity, at least 86% identity, at least 87% identity, at least 88% identity, at least 89% identity, at least 90% identity, at least 91% identity, at least 92% identity, at least 93% identity, at least 94% identity, at least 95% identity, at least 96% identity, at least 97% identity, at least 98% identity, or at least 99% identity. A candidate nucleotide sequence can be isolated from a microbe, preferably a Pyrococcus spp., more preferably a P. furiosus, or can be produced using recombinant techniques, or chemically or enzymatically synthesized. Structural similarity may be determined, for example, using sequence techniques such as GCG FastA (Genetics Computer Group, Madison, Wis.), MacVector 4.5 (Kodak/IBI software package) or other suitable sequencing programs or methods known in the art. Preferably, structural similarity between two nucleotide sequences is determined using the Blastn program of the BLAST 2 search algorithm, as described by Tatusova, et al. (1999. FEMS Microbiol Lett. 174:247-250), and available through the World Wide Web, for instance at the internet site maintained by the National Center for Biotechnology Information, National Institutes of Health. 
Preferably, the default values for all BLAST 2 search parameters are used, including reward for match=1, penalty for mismatch=-2, open gap penalty=5, extension gap penalty=2, gap x_dropoff=50, expect=10, wordsize=11, and optionally, filter on. In the comparison of two nucleotide sequences using the BLAST search algorithm, structural similarity is referred to as "identities."
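To illustrate how the listed nucleotide parameters enter a score, the sketch below computes a raw score for a pre-aligned pair using reward for match=1, penalty for mismatch=-2, open gap penalty=5, and extension gap penalty=2. It assumes that a gap of k columns costs the open penalty plus k times the extension penalty; that convention, the function name, and the example sequences are assumptions made for illustration, and the BLAST search heuristics are not reproduced.

```python
# Sketch only: raw score of an existing gapped nucleotide alignment.
# Assumed gap cost for a run of k gap columns: gap_open + k * gap_extend.
def raw_alignment_score(aligned_a, aligned_b, match=1, mismatch=-2,
                        gap_open=5, gap_extend=2, gap="-"):
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    previous_gap_in = None  # 'a', 'b', or None: which sequence held the gap
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            raise ValueError("a column cannot be a gap in both sequences")
        if x == gap or y == gap:
            current = "a" if x == gap else "b"
            if current != previous_gap_in:
                score -= gap_open  # a new gap run starts here
            score -= gap_extend
            previous_gap_in = current
        else:
            score += match if x == y else mismatch
            previous_gap_in = None
    return score


# 7 matches (+7), one 2-column gap (-9), one mismatch (-2)
print(raw_alignment_score("ACGT--ACGT", "ACGTTTACGA"))  # -4
```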
[0023]As used herein, an "isolated" substance is one that has been removed from its natural environment, produced using recombinant techniques, or chemically or enzymatically synthesized. For instance, a polypeptide, a polynucleotide, H2, or NADPH can be isolated. Preferably, a substance is purified, i.e., is at least 60% free, preferably at least 75% free, and most preferably at least 90% free from other components with which it is naturally associated.
[0024]As used herein, the term "polynucleotide" refers to a polymeric form of nucleotides of any length, either ribonucleotides or deoxynucleotides, and includes both double- and single-stranded RNA and DNA. A polynucleotide can be obtained directly from a natural source, or can be prepared with the aid of recombinant, enzymatic, or chemical techniques. A polynucleotide can be linear or circular in topology. A polynucleotide may be, for example, a portion of a vector, such as an expression or cloning vector, or a fragment. A polynucleotide may include nucleotide sequences having different functions, including, for instance, coding regions, and non-coding regions such as regulatory regions.
[0025]As used herein, the terms "coding region," "coding sequence," and "open reading frame" are used interchangeably and refer to a nucleotide sequence that encodes a polypeptide and, when placed under the control of appropriate regulatory sequences expresses the encoded polypeptide. The boundaries of a coding region are generally determined by a translation start codon at its 5' end and a translation stop codon at its 3' end. A "regulatory sequence" is a nucleotide sequence that regulates expression of a coding sequence to which it is operably linked. Non-limiting examples of regulatory sequences include promoters, enhancers, transcription initiation sites, translation start sites, translation stop sites, and transcription terminators. The term "operably linked" refers to a juxtaposition of components such that they are in a relationship permitting them to function in their intended manner. A regulatory sequence is "operably linked" to a coding region when it is joined in such a way that expression of the coding region is achieved under conditions compatible with the regulatory sequence.
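The start-codon and stop-codon boundaries described above can be located programmatically. The following is a minimal sketch, assuming a DNA string read 5' to 3', ATG as the only start codon, and the three standard stop codons; alternative start codons, the reverse strand, and regulatory sequences are ignored, and the example sequence is invented.

```python
# Sketch only: find the first ATG that is followed, in the same reading
# frame, by a stop codon. Returns (start, end) with end exclusive.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_coding_region(dna, start_codon="ATG"):
    dna = dna.upper()
    start = dna.find(start_codon)
    while start != -1:
        # Walk codon by codon from the codon after the start codon.
        for i in range(start + 3, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return start, i + 3
        start = dna.find(start_codon, start + 1)
    return None


region = find_coding_region("CCATGAAATTTTAGGG")
print(region)  # (2, 14), i.e. ATGAAATTTTAG
```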
[0026]A polynucleotide that includes a coding region may include heterologous nucleotides that flank one or both sides of the coding region. As used herein, "heterologous nucleotides" refer to nucleotides that are not normally present flanking a coding region that is present in a wild-type cell. For instance, a coding region present in a wild-type microbe and encoding a polypeptide described herein is flanked by homologous sequences, and any other nucleotide sequence flanking the coding region is considered to be heterologous. Examples of heterologous nucleotides include, but are not limited to regulatory sequences. Typically, heterologous nucleotides are present in a polynucleotide described herein through the use of standard genetic and/or recombinant methodologies well known to one skilled in the art. A polynucleotide described herein may be included in a suitable vector.
[0027]As used herein, an "exogenous polynucleotide" refers to a polynucleotide that is not normally or naturally found in a microbe. As used herein, the term "endogenous polynucleotide" refers to a polynucleotide that is normally or naturally found in a microbe. An "endogenous polynucleotide" is also referred to as a "native polynucleotide."
[0028]The terms "complement" and "complementary" as used herein, refer to the ability of two single stranded polynucleotides to base pair with each other, where an adenine on one strand of a polynucleotide will base pair to a thymine or uracil on a strand of a second polynucleotide and a cytosine on one strand of a polynucleotide will base pair to a guanine on a strand of a second polynucleotide. Two polynucleotides are complementary to each other when a nucleotide sequence in one polynucleotide can base pair with a nucleotide sequence in a second polynucleotide. For instance, 5'-ATGC and 5'-GCAT are complementary. The term "substantial complement" and cognates thereof as used herein, refer to a polynucleotide that is capable of selectively hybridizing to a specified polynucleotide under stringent hybridization conditions. Stringent hybridization can take place under a number of pH, salt and temperature conditions. The pH can vary from 6 to 9, preferably 6.8 to 8.5. The salt concentration can vary from 0.15 M sodium to 0.9 M sodium, and other cations can be used as long as the ionic strength is equivalent to that specified for sodium. The temperature of the hybridization reaction can vary from 30° C. to 80° C., preferably from 45° C. to 70° C. Additionally, other compounds can be added to a hybridization reaction to promote specific hybridization at lower temperatures, such as at or approaching room temperature. Among the compounds contemplated for lowering the temperature requirements is formamide. Thus, a polynucleotide is typically substantially complementary to a second polynucleotide if hybridization occurs between the polynucleotide and the second polynucleotide. As used herein, "specific hybridization" refers to hybridization between two polynucleotides under stringent hybridization conditions.
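As an illustration of the complementarity definition above, the following minimal Python sketch checks whether two single strands, each written 5' to 3', can base pair along their full length. The function names are hypothetical and are not part of the application; uracil is treated as equivalent to thymine, and stringency conditions (pH, salt, temperature) are not modeled.

```python
# Illustrative sketch only; the function names are hypothetical.
PAIRS = {"A": "T", "T": "A", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the DNA reverse complement of a strand written 5'->3'."""
    return "".join(PAIRS[base] for base in reversed(seq.upper()))


def as_dna(seq):
    """Normalize a strand to upper-case DNA letters (U is read as T)."""
    return seq.upper().replace("U", "T")


def are_complementary(strand_a, strand_b):
    """True if the two 5'->3' strands base pair along their full length."""
    return reverse_complement(strand_a) == as_dna(strand_b)


# The example pair given in the text: 5'-ATGC and 5'-GCAT.
print(are_complementary("ATGC", "GCAT"))
```

The comparison is made against the reverse complement because base-paired strands are antiparallel.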
[0029]As used herein, "genetically modified microbe" refers to a microbe which has been altered "by the hand of man." A genetically modified microbe includes a microbe into which has been introduced an exogenous polynucleotide, e.g., an expression vector. Genetically modified microbe also refers to a microbe that has been genetically manipulated such that endogenous nucleotides have been altered to include a mutation, such as a deletion, an insertion, a transition, a transversion, or a combination thereof. For instance, an endogenous coding region could be deleted. Such mutations may result in a polypeptide having a different amino acid sequence than was encoded by the endogenous polynucleotide. Another example of a genetically modified microbe is one having an altered regulatory sequence, such as a promoter, to result in increased or decreased expression of an operably linked endogenous coding region.
[0030]Conditions that are "suitable" for an event to occur, such as expression of an exogenous polynucleotide in a cell to produce a polypeptide, or production of molecular hydrogen or NADPH, or "suitable" conditions are conditions that do not prevent such events from occurring. Thus, these conditions permit, enhance, facilitate, and/or are conducive to the event.
[0031]The term "and/or" means one or all of the listed elements or a combination of any two or more of the listed elements.
[0032]The words "preferred" and "preferably" refer to embodiments of the invention that may afford certain benefits, under certain circumstances. However, other embodiments may also be preferred, under the same or other circumstances. Furthermore, the recitation of one or more preferred embodiments does not imply that other embodiments are not useful, and is not intended to exclude other embodiments from the scope of the invention.
[0033]The term "comprises" and variations thereof do not have a limiting meaning where these terms appear in the description and claims.
[0034]Unless otherwise specified, "a," "an," "the," and "at least one" are used interchangeably and mean one or more than one.
[0035]Also herein, the recitations of numerical ranges by endpoints include all numbers subsumed within that range (e.g., 1 to 5 includes 1, 1.5, 2, 2.75, 3, 3.80, 4, 5, etc.).
[0036]For any method disclosed herein that includes discrete steps, the steps may be conducted in any feasible order. And, as appropriate, any combination of two or more steps may be conducted simultaneously.
BRIEF DESCRIPTION OF THE FIGURES
[0037]FIG. 1. Construction of anaerobic expression vector pC11A-CDABI.
[0038]FIG. 2. Construction of anaerobic expression vector pC3AR-slyD.
[0039]FIG. 3. Construction of anaerobic expression vector pEA-SHI.
[0040]FIG. 4. Construction of anaerobic expression vector pRA-EF.
[0041]FIG. 5. Immunoanalysis using antibodies to the catalytic subunit (PF0894). MW 1001 SHICDABIEFSlyD, MW 1001 containing the coding regions HypC, HypD, HypF, HypE, HypA, HypB, HycI, and SlyD. Native Pf SHI, native P. furiosus SHI hydrogenase.
[0042]FIG. 6. QPCR analysis of the expression of exogenous coding regions in E. coli.
[0043]FIG. 7. Amino acid sequence and nucleotide sequence of the polypeptides and polynucleotides referenced in Table 1. Coding regions and deduced polypeptide sequences of Pyrococcus furiosus DSM3638 used herein. All P. furiosus DNA and predicted protein sequences were derived from the deposited Genbank sequence NC_003413. Accession numbers refer to specific sections of this DNA sequence or the translated open reading frames encoded therein. Sequence identification numbers for these sequences are shown in Table 1.
[0044]FIG. 8. Maps and complete nucleotide sequences of four expression vectors. pEA-SH1, SEQ ID NO:29; pC11A-CDABI, SEQ ID NO:30; pRA-EF, SEQ ID NO:31; and pC3AR-slyD, SEQ ID NO:32.
[0045]FIG. 9. MV (methyl viologen)-linked hydrogenase activity of native versus recombinant P. furiosus soluble hydrogenase I.
[0046]FIG. 10. Production of MV-Linked Hydrogenase activity at 80° C. in recombinant E. coli MW/rSHI-C. The results from two separate cultures (one indicated by circles, one by triangles) are shown. The growth curves are shown by solid symbols.
[0047]FIG. 11. High Density 5-Liter Controlled Fermentation of E. coli MW/rSHI-C.
[0048]FIG. 12. Recombinant Hydrogenase Purification Scheme.
[0049]FIG. 13. SDS Gel Analysis of Recombinant Hydrogenase Purification. WCE, whole cell extract; S100, cytoplasmic extract after a 100,000×g centrifugation; DEAE pool, pool from DEAE Sepharose column; and PS pool, pool from Phenyl Sepharose column. The PF numbers and the calculated molecular weights for the four subunits of the hydrogenase are indicated.
[0050]FIG. 14. SDS Gel Analysis of Highly Purified Recombinant Hydrogenase. PS pool, pool from Phenyl Sepharose column; native SHI, native hydrogenase purified from P. furiosus; S200, Sephacryl S-200 eluate; HAP, Hydroxyapatite eluate.
[0051]FIG. 15. Metal Analysis of Phenyl Sepharose fractions.
[0052]FIG. 16. Thermal Sensitivity of Recombinant Hydrogenase.
[0053]FIG. 17. Oxygen Sensitivity of Recombinant Hydrogenase.
[0054]FIG. 18. Expected Interactions Between Tetrameric Recombinant Hydrogenase and MV and NADPH.
[0055]FIG. 19. Expected Interactions Between Dimeric Recombinant Hydrogenase and MV and NADPH.
[0056]FIG. 20. pEA-0893/0894 (plasmid map and nucleotide sequence, SEQ ID NO:33).
[0057]FIG. 21. Alignments of each of the four subunits of P. furiosus hydrogenase I and other related hydrogenases from P. abyssi, P. horikoshii, and Thermococcus kodakaraensis. In each alignment identical residues are not shaded, similar residues are boxed, and non-similar residues are shaded dark gray. In each alignment, PF, P. furiosus; PAB, P. abyssi; TK, Thermococcus kodakaraensis; and PH, P. horikoshii. The gene identifiers refer to the coding regions encoding each polypeptide. PF0891-PF0894 (SEQ ID NOs:2, 4, 6, and 8, respectively) refers to the coding regions present at Genbank Accession No. NC_003413; PAB1784-PAB1787 (SEQ ID NOs:34, 35, 36, and 37, respectively) refers to the coding regions present at Genbank Accession No. AL096836; TK2069-TK2072 (SEQ ID NOs:38, 39, 40, and 41, respectively) refers to the coding regions present at Genbank Accession No. NC_006624; and PH1290-1294 (SEQ ID NOs:42, 43, 44, and 45, respectively) refers to the coding regions present at Genbank Accession No. NC_000961. A. Alignment of the beta subunits. B. Alignment of the gamma subunits. C. Alignment of the delta subunits. D. Alignment of the alpha subunits.
DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
[0058]A NiFe-hydrogenase from an extremophile is expected to be inactive, unfolded, and consequently unstable when expressed in Escherichia coli. We expressed the catalytic subunit (SEQ ID NO:8) in E. coli and to our surprise found that the monomeric subunit was stable. However, the stable expression of one subunit did not indicate that the other structural and accessory proteins would also be stable, and it was expected that chaperones (to stabilize unfolded protein) would be required for the proper assembly of the NiFe site. Furthermore, successful heterologous expression, meaning expression (transcription and translation) of genes not normally found in a given cell, of genes that encode such a molecular machine as a NiFe-hydrogenase has not been possible, in part because there are a large number of accessory proteins involved in its assembly. Despite the fact that the host bacterium used here, E. coli, synthesizes its own native hydrogenases (all integral membrane proteins) under anaerobic conditions, attempts to express the genes encoding hydrogenases from other organisms have typically not been done in E. coli, but rather in very closely related organisms (Bascones et al. 2000. Appl Environ Microbiol 66:4292-9; King et al. 2006. J Bacteriol 188:2163-72; Lenz et al. 2005. J Bacteriol 187:6590-5; Morimoto et al. 2005. FEMS Microbiology Letters 246:229-34; Porthun et al. 2002. Arch Microbiol 177:159-66; Rousset et al. 1998. Journal of Bacteriology 180:4982-4986). Only recently have attempts been made to express hydrogenases (from Synechocystis sp.) in E. coli (Maeda et al. 2007. BMC Biotechnol 7:25), and this apparently only has the effect of limiting H2 uptake in the recombinant strains. Proteins playing a role in the assembly of NiFe hydrogenases in E. coli have been extensively characterized (Bock et al. 2006. Adv Microb Physiol 51:1-71), and homologs of the genes encoding eight of these proteins exist in P. furiosus.
Described herein is a system for successful heterologous overexpression of a functional and tagged hyperthermophilic NiFe hydrogenase under anaerobic conditions in the common laboratory protein expression host bacterium E. coli, using the heterologously-expressed accessory proteins from P. furiosus while simultaneously expressing those encoding the protein components of P. furiosus hydrogenase.
[0059]Provided herein are polypeptides having hydrogenase activity. Such polypeptides may be referred to herein as hydrogenase polypeptides. A polypeptide having hydrogenase activity may include four subunits. The first subunit includes the amino acid sequence SEQ ID NO:2, or an amino acid sequence having structural similarity thereto, the second subunit includes the amino acid sequence SEQ ID NO:4 or an amino acid sequence having structural similarity thereto, the third subunit includes the amino acid sequence SEQ ID NO:6 or an amino acid sequence having structural similarity thereto, and the fourth subunit includes the amino acid sequence SEQ ID NO:8 or an amino acid sequence having structural similarity thereto. Such a polypeptide may be isolated from a microbe, such as thermophiles (prokaryotic microbes that grow in environments at temperatures of between 60° C. and 79° C.), and hyperthermophiles (prokaryotic microbes that grow in environments at temperatures above 80° C.). Examples include archaea such as, but not limited to, a member of the genera Pyrococcus, for instance P. furiosus, P. abyssi, or P. horikoshii, or a member of the genera Thermococcus, for instance, T. kodakaraensis or T. onnurineus, or may be produced using recombinant techniques, or chemically or enzymatically synthesized.
[0060]A polypeptide provided herein also includes various subcomplexes. A subcomplex is defined as an engineered version of the hydrogenase polypeptide containing fewer than the natively purified four subunits. For example, a subcomplex may be the alpha subunit alone (SEQ ID NO: 8), the alpha subunit with one other subunit (SEQ ID NO: 6, 4 or 2), or the alpha subunit with some combination of the two other subunits. Accordingly, a hydrogenase polypeptide may be monomeric, dimeric, trimeric, or tetrameric. One example of a hydrogenase polypeptide has 2 subunits, a first subunit that includes the amino acid sequence SEQ ID NO:8, or an amino acid sequence having structural similarity thereto, and a second subunit that includes the amino acid sequence SEQ ID NO:6 or an amino acid sequence having structural similarity thereto.
[0061]The hydrogenase activity of a hydrogenase polypeptide of the present invention may be determined by routine methods known in the art. Preferably, a hydrogen evolution assay is used as described herein. For instance, a cell extract may be tested for hydrogen evolution after preparation of a whole cell extract, centrifugation at 100,000×g, heat-treatment at 80° C. for 30 minutes, and re-centrifugation at 100,000×g (referred to as an S100 fraction). The standard assay conditions may include using 5 mL stoppered vials containing 2 mL of anaerobic 100 mM EPPS buffer pH 8.4, 10 mM sodium dithionite, and 1 mM Methyl Viologen under an atmosphere of argon. Typically, 0.5 milligrams of protein is added when measuring the activity of protein from an 80° C.-treated S100 fraction, and no greater than 0.005 milligrams of protein is added when measuring the activity of protein from a column, such as a DEAE Sepharose and/or Phenyl Sepharose column. The vials are preheated at 80° C. for 1 minute, and 200 μL of sample is injected into the vial. After a period of time, for instance, 6 minutes, samples (100 μL) of the headspace of the sealed vial can be removed with a gas-tight syringe, and then injected into a gas chromatograph. The resulting hydrogen peak can be compared to a known standard curve to calculate micromoles of hydrogen produced per mL of assay solution. The specific activity is at least 0.05, at least 0.1, or at least 0.125 micromoles H2 produced min⁻¹ mg protein⁻¹. If the hydrogenase polypeptide is further purified, for instance using column chromatography with DEAE Sepharose or a similar matrix, and Phenyl Sepharose or a similar matrix, as described herein, the specific activity is at least 0.5, at least 1, at least 5, or at least 7.5 micromoles H2 produced min⁻¹ mg protein⁻¹. A hydrogenase polypeptide described herein that is to be tested may be expressed in a microbe, preferably an E. coli described herein, or produced using recombinant techniques, chemical or enzymatic synthesis. If the hydrogenase polypeptide is expressed in a microbe, preferably the microbe has undetectable levels of endogenous hydrogenase activity. Since most microbes do naturally express hydrogenase activity, microbes useful for expression of the hydrogenase polypeptides described herein may be engineered to not express endogenous hydrogenase activity. An example of such a microbe is MW1001 (Maeda et al. 2007. BMC Biotechnol 7:25). Other microbes can be engineered using methods known in the art to not express endogenous hydrogenase activity.
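The arithmetic that converts the gas chromatograph reading into a specific activity can be sketched as follows. This is a minimal illustration, not the applicants' procedure: the function name and the input values are hypothetical, and the total assay volume of 2.2 mL assumes that the 200 μL sample is simply added to the 2 mL of buffer.

```python
# Illustrative sketch only; names and example numbers are hypothetical.

def specific_activity(h2_umol_per_ml, assay_volume_ml, minutes, protein_mg):
    """Return micromoles H2 produced per minute per mg protein.

    h2_umol_per_ml is the value read from the standard curve
    (micromoles of H2 per mL of assay solution).
    """
    total_h2_umol = h2_umol_per_ml * assay_volume_ml
    return total_h2_umol / (minutes * protein_mg)


# Hypothetical reading for a heat-treated S100 fraction:
# 0.2 umol H2 per mL, 2.2 mL assay (assumed), 6 minutes, 0.5 mg protein.
activity = specific_activity(0.2, 2.2, 6, 0.5)

# Thresholds recited in the text for an S100 fraction.
for threshold in (0.05, 0.1, 0.125):
    print(threshold, activity >= threshold)
```

With these made-up inputs the result is roughly 0.147 micromoles H2 per minute per mg protein, which would satisfy all three of the recited S100 thresholds.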
[0062]A hydrogenase polypeptide described herein typically has additional characteristics, including heat activation. A hydrogenase polypeptide described herein is typically activated by incubation at an elevated temperature. For instance, if a hydrogenase polypeptide is produced at temperatures prevalent when using E. coli to produce the polypeptide, e.g., 37° C., the specific activity can be increased by incubation at a temperature of at least 70° C., or at least 80° C. A hydrogenase polypeptide described herein also has the characteristic of being stable to incubation at high temperature. For instance, a hydrogenase polypeptide described herein does not lose any of its activity after incubation at 90° C. for 10 hours. A hydrogenase polypeptide described herein also has the characteristic of being as sensitive to oxygen as the native form of the enzyme purified from P. furiosus. A hydrogenase polypeptide described herein that has hydrogenase activity catalyzes the proton reduction (H2 production) coupled to the oxidation of an electron donor, such as NADPH, and also catalyzes the reverse, i.e., the oxidation of H2 coupled to the reduction of an electron acceptor, such as NADP. Another reaction that may be catalyzed by hydrogenase polypeptides described herein is the reduction of elemental sulfur to hydrogen sulfide with the use of molecular hydrogen (Kim et al. 1999. Biotechnol. Bioeng. 65:108-113; Ma et al., Proc. Nat. Acad. Sci. USA. 90:5341-5344).
[0063]A candidate polypeptide having structural similarity to a reference polypeptide may include conservative substitutions of amino acids present in the reference polypeptide. A conservative substitution is typically the substitution of one amino acid for another that is a member of the same class. For example, it is well known in the art of protein biochemistry that an amino acid belonging to a grouping of amino acids having a particular size or characteristic (such as charge, hydrophobicity, and/or hydrophilicity) can generally be substituted for another amino acid without substantially altering the secondary and/or tertiary structure of a polypeptide. For the purposes of this invention, conservative amino acid substitutions are defined to result from exchange of amino acid residues from within one of the following classes of residues: Class I: Gly, Ala, Val, Leu, and Ile (representing aliphatic side chains); Class II: Gly, Ala, Val, Leu, Ile, Ser, and Thr (representing aliphatic and aliphatic hydroxyl side chains); Class III: Tyr, Ser, and Thr (representing hydroxyl side chains); Class IV: Cys and Met (representing sulfur-containing side chains); Class V: Glu, Asp, Asn and Gln (carboxyl or amide group containing side chains); Class VI: His, Arg and Lys (representing basic side chains); Class VII: Gly, Ala, Pro, Trp, Tyr, Ile, Val, Leu, Phe and Met (representing hydrophobic side chains); Class VIII: Phe, Trp, and Tyr (representing aromatic side chains); and Class IX: Asn and Gln (representing amide side chains).
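The class definitions above lend themselves to a simple lookup. The following Python sketch, offered only as an illustration, encodes Classes I through IX with one-letter amino acid codes and reports whether an exchange stays within at least one class. The names `CLASSES`, `shared_classes`, and `is_conservative` are hypothetical.

```python
# Illustrative sketch only; identifiers are hypothetical.
# One-letter codes for the nine classes listed in the text.
CLASSES = {
    "I": set("GAVLI"),          # aliphatic
    "II": set("GAVLIST"),       # aliphatic and aliphatic hydroxyl
    "III": set("YST"),          # hydroxyl
    "IV": set("CM"),            # sulfur-containing
    "V": set("EDNQ"),           # carboxyl or amide
    "VI": set("HRK"),           # basic
    "VII": set("GAPWYIVLFM"),   # hydrophobic
    "VIII": set("FWY"),         # aromatic
    "IX": set("NQ"),            # amide
}


def shared_classes(original, replacement):
    """Return the names of all classes that contain both residues."""
    return [name for name, members in CLASSES.items()
            if original in members and replacement in members]


def is_conservative(original, replacement):
    """True if the exchange falls within at least one class."""
    return bool(shared_classes(original, replacement))


print(is_conservative("L", "I"))   # Leu -> Ile
print(is_conservative("K", "D"))   # Lys -> Asp
print(shared_classes("S", "T"))    # Ser -> Thr
```

Because the classes overlap, a single exchange can be conservative under more than one class, which is why the sketch returns a list rather than a single label.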
[0064]There are eight major groups of hydrogenase based on sequence similarities of their catalytic subunits (Vignais and Billoud. 2007. Chem Rev 107:4206-72). Hydrogenase polypeptides described herein are members of group 3b, the bidirectional NAD(P)-linked hydrogenases, and include, for instance, those found in other Pyrococcus and closely related species, e.g., Thermococcus, and also in photosynthetic bacteria (Thiocapsa) and aerobic hydrogen bacteria (Ralstonia). All [NiFe] hydrogenases (from all groups) are characterized by two CxxC domains, termed L1 and L2, that coordinate the Ni and Fe atom at the catalytic site of the catalytic subunit, alpha, an example of which is shown at SEQ ID NO:8. Each of the groups has conserved sequences surrounding these sites. The consensus L1 site is R[IV]C[AGS][FIL]Cxxx[HY]xx[AST][ANS]xx[AS][AILV] (SEQ ID NO:46), where x is any amino acid, and where one amino acid is chosen from each set enclosed by brackets (e.g., the second amino acid of the consensus is I or V). Examples of L1 sites include, but are not limited to, RICSFCSAAHKLTALEAA (SEQ ID NO:47), and RVCGICSAAHKLTALEAA (SEQ ID NO:48). The consensus L2 site is R[ANS][FHY]DPCISC[AS][ATV]H (SEQ ID NO:49), where one amino acid is chosen from each set enclosed by brackets (e.g., the second amino acid of the consensus is A or N or S). In both L1 and L2 sites, the change of any of the four cysteines is expected to result in a decrease or complete loss of hydrogenase activity. Further, regions of conservation can be determined by comparison of the amino acid sequences of each subunit (SEQ ID NO:2, 4, 6, or 8) with other hydrogenase subunits from other organisms (see FIG. 21). Thus, the skilled person can easily determine which amino acid residues can be altered without any effect on hydrogenase activity, and which cannot be changed or can be altered only through use of conservative substitutions.
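The bracket notation used for the L1 and L2 consensus sites maps directly onto a regular expression. The Python sketch below is an illustration under that assumption: lower-case x is read as any residue, each bracketed set as a choice of one residue, and the helper name `consensus_to_regex` is hypothetical. The third test sequence is constructed here for illustration by replacing the second cysteine of SEQ ID NO:47 with serine; it is not taken from the application.

```python
# Illustrative sketch only; the helper name is hypothetical.
import re


def consensus_to_regex(consensus):
    """Compile a consensus string; lower-case x matches any residue."""
    return re.compile(consensus.replace("x", "[A-Z]"))


L1 = consensus_to_regex("R[IV]C[AGS][FIL]Cxxx[HY]xx[AST][ANS]xx[AS][AILV]")
L2 = consensus_to_regex("R[ANS][FHY]DPCISC[AS][ATV]H")

candidates = [
    "RICSFCSAAHKLTALEAA",  # SEQ ID NO:47
    "RVCGICSAAHKLTALEAA",  # SEQ ID NO:48
    "RICSFSSAAHKLTALEAA",  # constructed variant, second Cys -> Ser
]
for seq in candidates:
    print(seq, bool(L1.fullmatch(seq)))
```

To locate a site inside a full-length subunit sequence, `L1.search(sequence)` can be used in place of `fullmatch`.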
[0065]Guidance concerning how to make phenotypically silent amino acid substitutions is provided in Bowie et al. (1990. Science, 247:1306-1310), wherein the authors indicate proteins are surprisingly tolerant of amino acid substitutions. For example, Bowie et al. disclose that there are two main approaches for studying the tolerance of a polypeptide sequence to change. The first method relies on the process of evolution, in which mutations are either accepted or rejected by natural selection. The second approach uses genetic engineering to introduce amino acid changes at specific positions of a cloned gene and selects or screens to identify sequences that maintain functionality. As stated by the authors, these studies have revealed that proteins are surprisingly tolerant of amino acid substitutions. The authors further indicate which changes are likely to be permissive at a certain position of the protein. For example, most buried amino acid residues require non-polar side chains, whereas few features of surface side chains are generally conserved. Other such phenotypically silent substitutions are described in Bowie et al., and the references cited therein.
[0066]A candidate polypeptide having structural similarity to one of the polypeptides SEQ ID NO:2, 4, 6, or 8 has hydrogenase activity when expressed in a microbe with the other 3 reference structural polypeptides and the other 8 reference accessory polypeptides (SEQ ID NOs:10, 12, 14, 16, 18, 20, 22, and 24, described in detail below). For instance, when determining if a candidate polypeptide having some level of identity to SEQ ID NO:2 has hydrogenase activity, the candidate polypeptide is expressed in a microbe with reference polypeptides SEQ ID NO: 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, and 24. Likewise, when determining if a candidate polypeptide having some level of identity to SEQ ID NO:4 has hydrogenase activity, the candidate polypeptide is expressed in a microbe with reference polypeptides SEQ ID NO: 2, 6, 8, 10, 12, 14, 16, 18, 20, 22, and 24, and so on for determining hydrogenase activity of candidate polypeptides having identity to each of the other structural or accessory polypeptides.
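The selection of reference polypeptides for such a test follows a simple rule: the candidate takes the place of one reference polypeptide, and the remaining eleven are co-expressed with it. A minimal Python sketch of that rule is shown here; the function name `coexpression_partners` is hypothetical, and the integers stand for the SEQ ID NOs recited in the text.

```python
# Illustrative sketch only; the function name is hypothetical.
STRUCTURAL = (2, 4, 6, 8)                      # SEQ ID NOs of the four subunits
ACCESSORY = (10, 12, 14, 16, 18, 20, 22, 24)   # SEQ ID NOs of accessory polypeptides


def coexpression_partners(replaced_seq_id):
    """Return the reference SEQ ID NOs to co-express with a candidate
    that stands in for the reference polypeptide `replaced_seq_id`."""
    all_ids = STRUCTURAL + ACCESSORY
    if replaced_seq_id not in all_ids:
        raise ValueError("not a structural or accessory SEQ ID NO")
    return [seq_id for seq_id in all_ids if seq_id != replaced_seq_id]


print(coexpression_partners(2))
print(coexpression_partners(4))
```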
[0067]P. furiosus contains a second hydrogenase (SH-II) that is highly similar to the hydrogenase polypeptides described herein. SH-II was purified from native biomass of P. furiosus (Ma et al., 2000. J Bacteriol. 182(7):1864-71). It has very similar catalytic properties, and virtually identical physical properties to those of the hydrogenase polypeptides described herein. It contains four subunits of very similar size to those of the hydrogenase polypeptides described herein and these are predicted to coordinate exactly the same cofactors as the subunits of the hydrogenase polypeptides described herein. However, the sequences show only 55-63% sequence similarity. Nevertheless, P. furiosus has only one set of accessory genes to process and mature a hydrogenase, and so it is predicted that the set of accessory coding regions described herein that are used by P. furiosus to process the hydrogenase polypeptides described herein must also be used by the organism to process SH-II. Despite the apparent lack of sequence similarity, the SH-I alpha and SH-II alpha subunits share a high degree of identity in the conserved L2 region and the C-terminal sequence that is cleaved for hydrogenase activity. Therefore, it is expected that the E. coli expression system described herein, which includes the accessory genes of P. furiosus, would also process and produce an active form of SH-II. In this case the plasmid containing the four SH-I genes would be replaced in E. coli by one containing the four SH-II genes.
[0068]Also provided are isolated polynucleotides encoding the polypeptides described herein. For instance, a polynucleotide may have a nucleotide sequence encoding a polypeptide having the amino acid sequence shown in SEQ ID NOs:2, 4, 6, or 8, and an example of the class of nucleotide sequences encoding each polypeptide is SEQ ID NOs:1, 3, 5, or 7, respectively. It should be understood that a polynucleotide encoding a polypeptide represented by one of the sequences disclosed herein, e.g., SEQ ID NOs:2, 4, 6, or 8, is not limited to the nucleotide sequences disclosed herein, e.g., SEQ ID NOs:1, 3, 5, or 7, respectively, but also includes the class of polynucleotides encoding such polypeptides as a result of the degeneracy of the genetic code. For example, the naturally occurring nucleotide sequence SEQ ID NO:1 is but one member of the class of nucleotide sequences encoding a polypeptide having the amino acid sequence SEQ ID NO:2. Likewise, the naturally occurring nucleotide sequences SEQ ID NO:3, 5, or 7, are but single members of the class of nucleotide sequences encoding a polypeptide having the amino acid sequence SEQ ID NO:4, 6, or 8, respectively. The class of nucleotide sequences encoding a selected polypeptide sequence is large but finite, and the nucleotide sequence of each member of the class may be readily determined by one skilled in the art by reference to the standard genetic code, wherein different nucleotide triplets (codons) are known to encode the same amino acid.
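The statement that the class of encoding sequences is "large but finite" can be made concrete by counting. The Python sketch below, offered as an illustration only, multiplies the number of synonymous codons for each residue under the standard genetic code; the function name `count_coding_sequences` is hypothetical, and stop codons are not counted.

```python
# Illustrative sketch only; the function name is hypothetical.
from math import prod

# Number of codons per amino acid in the standard genetic code
# (61 sense codons in total).
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_coding_sequences(peptide):
    """Count distinct nucleotide sequences that encode the peptide."""
    return prod(CODON_COUNTS[residue] for residue in peptide.upper())


# First six residues of the L1 example RICSFCSAAHKLTALEAA:
print(count_coding_sequences("RICSFC"))  # 6 * 3 * 2 * 6 * 2 * 2 = 864
```

Even a six-residue stretch has hundreds of possible encodings, so the count for a full-length subunit is very large, though still finite.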
[0069]A polynucleotide disclosed herein may have structural similarity with the nucleotide sequence of SEQ ID NO:1, 3, 5, or 7. Such a polynucleotide may be isolated from a microbe, such as thermophiles (prokaryotic microbes that grow in environments at temperatures of between 60° C. and 79° C.), and hyperthermophiles (prokaryotic microbes that grow in environments at temperatures above 80° C.). Examples include archaea such as, but not limited to, a member of the genera Pyrococcus, for instance P. furiosus, P. abyssi, or P. horikoshii, or a member of the genera Thermococcus, for instance, T. kodakaraensis or T. onnurineus, or may be produced using recombinant techniques, or chemically or enzymatically synthesized. A polynucleotide disclosed herein may further include heterologous nucleotides flanking the open reading frame. Typically, heterologous nucleotides may be at the 5' end of the coding region, at the 3' end of the coding region, or the combination thereof. The number of heterologous nucleotides may be, for instance, at least 10, at least 100, or at least 1000.
[0070]An aspect of the present invention also includes fragments of the polypeptides described herein, and the polynucleotides encoding such fragments, such as SEQ ID NOs:2, 4, 6, and 8, as well as those polypeptides having structural similarity to SEQ ID NOs: 2, 4, 6, and 8. A polypeptide fragment may include a sequence of at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, at least 70, at least 75, at least 80, at least 85, at least 90, at least 95, or at least 100 amino acid residues.
[0071]A polypeptide described herein or a fragment thereof may be expressed as a fusion polypeptide that includes a polypeptide of the present invention or a fragment thereof and a heterologous amino acid sequence. The heterologous amino acid sequence may be present at the amino terminal end or the carboxy terminal end of a polypeptide, or it may be present within the amino acid sequence of the polypeptide. For instance, the heterologous amino acid sequence may be useful for purification of the fusion polypeptide by affinity chromatography. Various methods are available for the addition of such affinity purification tags to proteins. Examples of tags include a polyhistidine-tag, maltose-binding protein, and Strep-tag®. Representative examples may be found in Hopp et al. (U.S. Pat. No. 4,703,004), Hopp et al. (U.S. Pat. No. 4,782,137), Sgarlato (U.S. Pat. No. 5,935,824), Sharma (U.S. Pat. No. 5,594,115), and Skerra and Schmidt (1999, Biomol Eng. 16:79-86). In another example, the heterologous amino acid sequence may be a carrier polypeptide. The carrier polypeptide may be used to increase the immunogenicity of the fusion polypeptide to increase production of antibodies that specifically bind to a polypeptide of the invention. The invention is not limited by the types of carrier polypeptides that may be used to create fusion polypeptides. Examples of carrier polypeptides include, but are not limited to, keyhole limpet hemocyanin, bovine serum albumin, ovalbumin, mouse serum albumin, rabbit serum albumin, and the like. The heterologous amino acid sequence, for instance, a tag or a carrier, may also include a cleavable site that permits removal of most or all of the additional amino acid sequence. Examples of cleavable sites are known to the skilled person and routinely used, and include, but are not limited to, a TEV protease recognition site.
The number of heterologous amino acids may be, for instance, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, or at least 40.
[0072]A polypeptide described herein may be modified. An example of a modification is a chemical modification with a hydrophobic group. Examples of suitable hydrophobic groups include, but are not limited to, polyethylene glycol derivatives, such as polyoxyethylene glycol p-nitrophenyl carbonate (PEG-pNPC), methoxypolyethylene glycol p-nitrophenyl carbonate (MPEG-pNPC), and methoxypolyethylene glycol cyanuric chloride (MPEG-CC). Preferably, the molecular weight of a polyethylene glycol derivative is less than 5 kDa. Methods for chemically modifying polypeptides are routine and known in the art. Such modified polypeptides can have altered characteristics such as increased solubility in organic solvents while retaining enzymatic activity. An example of modification of a polypeptide described herein is taught by Kim et al. (1999. Biotechnol. Bioeng. 65:108-113), where an SH-I hydrogenase polypeptide obtained from P. furiosus was modified with MPEG-CC. The resulting polypeptide retained the ability to reduce elemental sulfur to hydrogen sulfide (Ma et al., Proc. Nat. Acad. Sci. USA. 90:5341-5344).
[0073]A polynucleotide disclosed herein can be present in a vector. A vector is a replicating polynucleotide, such as a plasmid, phage, or cosmid, to which another polynucleotide may be attached so as to bring about the replication of the attached polynucleotide. Construction of vectors containing a polynucleotide of the invention may employ standard ligation techniques known in the art. See, e.g., (Sambrook et al., 1989. Molecular cloning: a laboratory manual, 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.). A vector can provide for further cloning (amplification of the polynucleotide), i.e., a cloning vector, or for expression of the polynucleotide, i.e., an expression vector. The term vector includes, but is not limited to, plasmid vectors, viral vectors, cosmid vectors, and artificial chromosome vectors. Preferably the vector is a plasmid.
[0074]Selection of a vector depends upon a variety of desired characteristics in the resulting construct, such as a selection marker, vector replication rate, and the like. Vectors can be introduced into a host cell using methods that are known and used routinely by the skilled person. The vector may replicate separately from the chromosome present in the microbe, or the polynucleotide may be integrated into a chromosome of the microbe.
[0075]An expression vector may optionally include a promoter that results in expression of an operably linked coding region during growth in anaerobic conditions. Promoters act as regulatory signals that bind RNA polymerase in a cell to initiate transcription of a downstream (3' direction) coding region. The promoter used may be a constitutive or an inducible promoter. It may be, but need not be, heterologous with respect to a host cell. Examples of suitable promoters include, but are not limited to, P-hya (SEQ ID NO:25), P-hyc (SEQ ID NO:26), and P-xyl (SEQ ID NO:27). The hydrogenase promoters P-hya and P-hyc can be obtained from E. coli, and are expressed (and at different strengths) under anaerobic growth conditions and at undetectable levels under aerobic growth conditions. The xylose responsive promoter P-xyl is a slightly modified version of the B. megaterium xylose promoter (Qazi et al. 2001. Microb Ecol 41:301-309) denoted PxylA (Rygus et al. 1991. Arch Microbiol 155:535-42) (P-xyl, SEQ ID NO:27). This xylose promoter was discovered to be useful for expressing genes in E. coli under either aerobic or anaerobic conditions. This is a promoter sequence derived from an aerobic, gram positive organism (rather than from E. coli, which is a facultatively anaerobic gram negative organism), and it was not expected that this would function in E. coli. Fortuitously, we discovered that in E. coli it expresses at very high levels under both aerobic and anaerobic conditions.
[0076]It should be understood that a promoter that drives expression of an operably linked coding region during growth in anaerobic conditions is not limited to the nucleotide sequences disclosed at SEQ ID NOs:25, 26, or 27. A person of ordinary skill will understand that the promoters disclosed herein may be modified by substitution (such as transition or transversion), deletion, and/or insertion of one or more nucleotides, where the altered promoter maintains its ability to drive expression of an operably linked coding region during growth in anaerobic conditions. Such modified promoters can be easily constructed using routine methods known in the art such as classical mutagenesis, site-directed mutagenesis, and DNA shuffling. Other useful promoters can be obtained from the genomes of microbes by reference to the regions upstream of coding sequences that are expressed under anaerobic conditions, such as coding regions encoding hydrogenase enzymes or involved in anaerobic respiration.
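As an illustration of the substitution types named above, the following minimal Python sketch classifies a single-nucleotide substitution as a transition (purine to purine, or pyrimidine to pyrimidine) or a transversion (purine to pyrimidine, or the reverse). The function name classify_substitution is hypothetical and is not part of the disclosure. The sketch does not predict whether an altered promoter still drives expression under anaerobic conditions; that would have to be determined experimentally.

```python
# Hypothetical helper, for illustration only.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID_BASES = PURINES | PYRIMIDINES


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base DNA substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in VALID_BASES or alt not in VALID_BASES:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and altered base are identical")
    # A transition keeps the chemical class of the base; a transversion changes it.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
        print(ref, "->", alt, classify_substitution(ref, alt))
```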
[0077]A vector introduced into a host cell optionally includes one or more marker sequences, which typically encode a molecule that inactivates or otherwise detects or is detected by a compound in the growth medium. For example, the inclusion of a marker sequence may render the transformed cell resistant to an antibiotic, or it may confer compound-specific metabolism on the transformed cell. Examples of a marker sequence include, but are not limited to, sequences that confer resistance to kanamycin, ampicillin, chloramphenicol, tetracycline, streptomycin, and neomycin.
[0078]Provided herein is a series of expression vectors which express recombinant proteins under strictly anaerobic growth conditions in a microbe, preferably E. coli. No E. coli protein expression vectors currently used are capable of this. In fact, most E. coli expression systems use a modified bacteriophage T7 promoter, regulated by a modification of the E. coli lactose operon repressor, so that expression of target genes can be induced by addition of lactose or the lactose analog isopropyl-β-D-thiogalactopyranoside (IPTG) (Studier, F. W. 2005. Protein Expr Purif 41:207-34; Terpe, 2006. Appl Microbiol Biotechnol 72:211-22). However, this system does not operate under strictly anaerobic conditions, and herein we utilized promoters that E. coli uses when grown in the absence of air. The expression vectors include a P-hya, P-hyc, or P-xyl promoter. An expression vector may include other polynucleotides that aid in, for instance, the cloning, manipulation, or expression of an operably linked coding region, or the purification of a polypeptide encoded by the coding region.
[0079]Polypeptides and fragments thereof described herein may be produced using recombinant DNA techniques, such as an expression vector present in a cell. Such methods are routine and known in the art. The polypeptides and fragments thereof may also be synthesized in vitro, e.g., by solid phase peptide synthetic methods. Solid phase peptide synthetic methods are routine and known in the art. A polypeptide produced using recombinant techniques or by solid phase peptide synthetic methods may be further purified by routine methods, such as fractionation on immunoaffinity or ion-exchange columns, ethanol precipitation, reverse phase HPLC, chromatography on silica or on an anion-exchange resin such as DEAE, chromatofocusing, SDS-PAGE, ammonium sulfate precipitation, gel filtration using, for example, Sephadex G-75, or ligand affinity. A preferred method for isolating and optionally purifying a hydrogenase polypeptide described herein includes column chromatography using, for instance, ion exchange chromatography, such as DEAE sepharose, hydrophobic interaction chromatography, such as phenyl sepharose, or the combination thereof.
[0080]Polynucleotides of the present invention may be obtained from microbes, or produced in vitro or in vivo. For instance, methods for in vitro synthesis include, but are not limited to, chemical synthesis with a conventional DNA/RNA synthesizer. Commercial suppliers of synthetic polynucleotides and reagents for such synthesis are well known.
[0081]Also disclosed herein are genetically modified microbes that have exogenous polynucleotides encoding one or more of the polypeptides disclosed herein. Compared to a control microbe that is not genetically modified, a genetically modified microbe may exhibit production of a hydrogenase polypeptide, such as a tetrameric or a dimeric hydrogenase polypeptide. Accordingly, in one aspect of the invention a genetically modified microbe may include one or more exogenous polynucleotides that encode the subunits of a hydrogenase polypeptide. Exogenous polynucleotides encoding a hydrogenase polypeptide may be present in the microbe as a vector or integrated into a chromosome.
[0082]Examples of useful bacterial host cells include, but are not limited to, Escherichia (such as Escherichia coli), Salmonella (such as Salmonella enterica, Salmonella typhi, Salmonella typhimurium), a Thermotoga spp. (such as T. maritima), an Aquifex spp. (such as A. aeolicus), photosynthetic organisms including cyanobacteria (such as a Synechococcus spp. such as Synechococcus sp. WH8102 or Synechocystis spp. such as Synechocystis PCC 6803) and photosynthetic bacteria (such as a Rhodobacter spp. such as Rhodobacter sphaeroides) and the like. Examples of useful archaeal host cells include, but are not limited to, a Pyrococcus spp., such as P. furiosus, P. abyssi, and P. horikoshii, a Sulfolobus spp., such as S. solfataricus, a Thermococcus spp., such as T. kodakaraensis, and the like.
[0083]A genetically modified microbe having exogenous polynucleotides encoding one or more of the polypeptides disclosed herein may optionally include accessory polypeptides. These accessory polypeptides act to assemble the hydrogenase polypeptides described herein. Without intending to be limiting, it is believed the accessory polypeptides play a role in constructing the non-protein ligands present in the hydrogenase polypeptides. The accessory polypeptides include a first accessory polypeptide having the amino acid sequence SEQ ID NO:10 or an amino acid sequence having structural similarity thereto, a second accessory polypeptide having the amino acid sequence SEQ ID NO:12 or an amino acid sequence having structural similarity thereto, a third accessory polypeptide having the amino acid sequence SEQ ID NO:14 or an amino acid sequence having structural similarity thereto, a fourth accessory polypeptide having the amino acid sequence SEQ ID NO:16 or an amino acid sequence having structural similarity thereto, a fifth accessory polypeptide having the amino acid sequence SEQ ID NO:18 or an amino acid sequence having structural similarity thereto, a sixth accessory polypeptide having the amino acid sequence SEQ ID NO:20 or an amino acid sequence having structural similarity thereto, a seventh accessory polypeptide having the amino acid sequence SEQ ID NO:22 or an amino acid sequence having structural similarity thereto, and an eighth accessory polypeptide having the amino acid sequence SEQ ID NO:24 or an amino acid sequence having structural similarity thereto. Preferably, an exogenous polynucleotide encoding an accessory polypeptide is operably linked to a promoter that drives expression of the polynucleotide during growth in anaerobic conditions.
[0084]Also provided herein are isolated polypeptides having the amino acid sequence SEQ ID NOs:10, 12, 14, 16, 18, 20, 22, and 24, and amino acid sequences having structural similarity thereto, and isolated polynucleotides encoding the polypeptides.
[0085]A candidate polypeptide having structural similarity to one of the accessory polypeptides (SEQ ID NOs: 10, 12, 14, 16, 18, 20, 22, or 24) has activity when expressed in a microbe with the 4 reference polypeptides encoding a tetrameric hydrogenase polypeptide and the other 7 reference accessory polypeptides. For instance, when determining if a candidate polypeptide having some level of identity to SEQ ID NO:10 has the activity of catalyzing the biosynthesis of an active hydrogenase polypeptide, the candidate polypeptide is expressed in a microbe with reference polypeptides SEQ ID NO: 2, 4, 6, 8, 12, 14, 16, 18, 20, 22, and 24. Likewise, when determining if a candidate polypeptide having some level of identity to SEQ ID NO:12 has the activity of catalyzing the biosynthesis of an active hydrogenase polypeptide, the candidate polypeptide is expressed in a microbe with reference polypeptides SEQ ID NO: 2, 4, 6, 8, 10, 14, 16, 18, 20, 22, and 24, and so on.
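The co-expression rule described above can be sketched in a few lines of Python: for a candidate related to one accessory reference sequence, the partners are the four subunit references plus the remaining seven accessory references. The names SUBUNIT_SEQ_IDS, ACCESSORY_SEQ_IDS, and coexpression_partners are hypothetical and are used only for illustration.

```python
# SEQ ID NOs of the reference polypeptides, as listed in the text above.
SUBUNIT_SEQ_IDS = (2, 4, 6, 8)
ACCESSORY_SEQ_IDS = (10, 12, 14, 16, 18, 20, 22, 24)


def coexpression_partners(candidate_reference):
    """Return the SEQ ID NOs co-expressed with a candidate for one accessory reference.

    candidate_reference is the SEQ ID NO of the accessory polypeptide that the
    candidate polypeptide is structurally similar to.
    """
    if candidate_reference not in ACCESSORY_SEQ_IDS:
        raise ValueError("candidate must correspond to an accessory reference SEQ ID NO")
    other_accessory = [s for s in ACCESSORY_SEQ_IDS if s != candidate_reference]
    return sorted(list(SUBUNIT_SEQ_IDS) + other_accessory)


if __name__ == "__main__":
    for seq_id in ACCESSORY_SEQ_IDS:
        print(seq_id, coexpression_partners(seq_id))
```

Each call returns eleven reference sequences (four subunits and seven accessory polypeptides), matching the two worked cases given for SEQ ID NO:10 and SEQ ID NO:12.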
[0086]In another aspect a genetically modified microbe may express an endogenous hydrogenase polypeptide at an increased level or having altered activity. For instance, a genetically modified microbe may include an altered regulatory sequence, where the altered regulatory sequence is operably linked to one or more coding regions encoding subunits of a hydrogenase polypeptide. In another example, an endogenous polynucleotide encoding a subunit of a hydrogenase polypeptide may include a mutation, such as a deletion, an insertion, a transition, a transversion, or a combination thereof, that alters a characteristic of the hydrogenase polypeptides, such as the activity. In those aspects where a genetically modified microbe expresses an endogenous hydrogenase polypeptide at an increased level or having altered activity, the microbe is typically an archaeon, such as Pyrococcus spp., such as P. furiosus, P. abyssi, and P. horikoshii, a Thermococcus spp., such as T. kodakaraensis and T. onnurineus, and the like. Methods for modifying genomic DNA sequences of thermophiles and hyperthermophiles are known (Yang et al., PCT Application No. PCT/US2008/081157, filed Oct. 24, 2008, and Westpheling et al., U.S. Provisional Patent Application 61/000,338, filed Oct. 25, 2007).
[0087]A genetically modified microbe may include other modifications in addition to exogenous polynucleotides encoding one or more of the polypeptides disclosed herein, or expressing an endogenous hydrogenase polypeptide at an increased level or having altered activity. Such modifications may provide for increased production of electron donors used by a hydrogenase polypeptide described herein, such as NADPH. For instance, modifications may provide for increased levels in a cell of the enzymes used in the oxidative phase of the pentose phosphate pathway, such as glucose 6-phosphate dehydrogenase, 6-phosphogluconolactonase, and 6-phosphogluconate dehydrogenase. Modifications may provide for increased levels of substrates used in the oxidative phase of the pentose phosphate pathway by, for instance, increasing production of enzymes in biosynthetic pathways, reducing feedback inhibition at different locations in biosynthetic pathways, increasing importation of substrates and/or compounds used in biosynthetic pathways to make substrates, decreasing catabolism of substrates and/or compounds used in biosynthetic pathways to make substrates. Methods for modifying microbes to increase these and other compounds are routine and known in the art.
[0088]A genetically modified microbe of the present invention may include other modifications that provide for increased ability to use renewable resources, such as, but not limited to, biomass containing polysaccharides that can be broken down to yield glucose 6-phosphate, the first reactant of the pentose phosphate pathway and the substrate of the enzyme glucose 6-phosphate dehydrogenase. An example of such a polysaccharide is starch. Such modifications may provide for increased production of enzymes useful in the breakdown of biomass.
[0089]The hydrogenase polypeptides described herein can be used to produce molecular hydrogen. Molecular hydrogen is used in the petroleum and chemical industries. For instance, in a petrochemical plant, hydrogen is used for hydrodealkylation, hydrodesulfurization, and hydrocracking, all methods of refining crude oil for wider use. Molecular hydrogen is used for the production of ammonia, methanol, hydrochloric acid, and as a reducing agent for metal ores. In the food industry molecular hydrogen is used for hydrogenation of vegetable oils and fats, for instance, in producing margarine from liquid vegetable oil. Hydrogen is also useful as a fuel, both in traditional combustion engines as well as in fuel cells, and produces only water vapor when oxidized with oxygen.
[0090]In addition to hydrogen production systems, the applications for hydrogenase polypeptides described herein include cofactor [beta-1,4-nicotinamide adenine dinucleotide, reduced form (NADH) or beta-1,4-nicotinamide adenine dinucleotide phosphate, reduced form (NADPH)] regeneration (from NAD or NADP, respectively) using hydrogen as the source of energy (Hummel, 1999. Trends Biotechnol. 17:487-492; Mertens et al., 2003. J. Mol. Catal. B: Enzym. 24-25:39-52). The hydrogenase polypeptides described herein have significant advantages over other enzymatic methods to regenerate these reduced cofactors as there is no oxidation product to remove or dispose of other than protons (from hydrogen oxidation). This is in contrast to, for example, lactate dehydrogenase, where lactate is the source of energy and the product is the C3 compound pyruvate (Eberly and Ely, 2008. Crit. Rev. Microbiol. 34:117-130). Cofactor regeneration using hydrogen with no waste products would be of tremendous benefit for the pharmaceutical industry.
[0091]Hydrogenase polypeptides obtained from P. furiosus have also been chemically modified such that the enzyme is soluble and active in water-immiscible organic solvents such as toluene (Kim et al. 1999. Biotechnol. Bioeng. 65:108-113). Hydrogenase polypeptides described herein can also be chemically modified. Thus, the polypeptides described herein can reduce water-insoluble compounds with hydrogen. For example, elemental sulfur can be reduced to H2S, which is useful in removal of sulfur from some compositions used in the petroleum and coal industries.
[0092]Accordingly, provided herein are methods for making and using the hydrogenase polypeptides of the present invention. Methods for making a polypeptide having hydrogenase activity can include providing a genetically modified microbe that includes exogenous polynucleotides encoding 1, 2, 3, or 4 subunits of a hydrogenase polypeptide described herein, preferably 2 or 4 subunits, and incubating the microbe under conditions suitable for expression of the exogenous polynucleotides to produce a polypeptide, wherein the polypeptide has hydrogenase activity. The genetically modified microbe can be a bacterial cell, such as a gram negative, for instance, E. coli, or it can be an archaeal cell, for instance, a member of the genera Pyrococcus, for instance P. furiosus, P. abyssi, or P. horikoshii, or a member of the genera Thermococcus, for instance, T. kodakaraensis or T. onnurineus, or a photosynthetic bacterium; for instance, Rhodobacter sphaeroides. The genetically modified microbe may include exogenous polynucleotides encoding the accessory polypeptides described herein. In those aspects where the genetically modified microbe is a bacterial cell, such as E. coli, the genetically modified microbe typically does include exogenous polynucleotides encoding the accessory polypeptides. The incubation conditions are typically anaerobic, and the temperature may be at least 37° C., at least 60° C., at least 70° C., at least 80° C., or at least 90° C. The methods can be performed using any convenient manner. For instance, methods for growing microbial cells to high densities are routine and known in the art, and include batch and continuous fermentation processes. The method may further include isolating, and optionally purifying the hydrogenase polypeptide. Methods for isolating and optionally purifying hydrogenase polypeptides described herein are routine and known in the art.
[0093]Also provided herein are methods for using a hydrogenase polypeptide described herein. The methods can include providing a hydrogenase polypeptide, and incubating the hydrogenase polypeptide under conditions suitable for producing desirable products such as H2 or NADPH. Optionally, the product is collected using methods routine and known in the art.
[0094]In one aspect, the hydrogenase polypeptide used in the methods is cell-free, for instance, it is isolated, or optionally purified. Conditions suitable for incubating an isolated hydrogenase polypeptide may generally include aqueous conditions containing a suitable buffer, such as, but not limited to, EPPS (4-(2-hydroxyethyl)piperazine-1-propanesulfonic acid) at a concentration of 50 mM and buffered near neutral pH (typically 7.5-8.5). The hydrogenase polypeptide may be incubated in an organic solvent, such as, but not limited to, toluene, xylene, benzene, methylene chloride, chloroform, or tetrahydrofuran. A hydrogenase polypeptide that is incubated in an organic solvent is typically chemically modified, preferably with a hydrophobic group, as described herein. The incubation conditions are typically anaerobic, and the temperature may be at least 60° C., at least 70° C., at least 80° C., or at least 90° C. The methods can be performed in any convenient manner. Thus, the reaction steps may be performed in a single reaction vessel. The process may be performed as a batch process or as a continuous process, with desired product and waste products being removed continuously and new raw materials being introduced.
[0095]Methods for using an isolated hydrogenase polypeptide include the use of such a polypeptide bound to a surface. In some aspects the surface can be one that conducts electricity, such as an anode. Hydrogenase polypeptides bound to surfaces are useful for applications such as, but not limited to, fuel cells (Armstrong, U.S. Published Patent Application 20040214053).
[0096]Methods for using an isolated hydrogenase polypeptide include production of desirable products, such as molecular hydrogen, using renewable resources. For instance, biomass derived polysaccharides can be used as a substrate for the production of monomeric carbohydrates that could then be used as a source of NADPH, which in turn can be used by a hydrogenase polypeptide disclosed herein to produce hydrogen. Examples of such methods include in vitro hydrogen production as taught by Woodward et al. (1996. Nat Biotechnol 14:872-4), and Zhang et al. (2007. PLoS ONE 2:e456, and U.S. Published Patent Application 20070264534). Examples of useful polysaccharides include, but are not limited to, starch and cellulose. Renewable sources of these polysaccharides are known in the art.
[0097]In another aspect, a hydrogenase polypeptide used in the methods is present in a microbial cell. The methods can include incubating the microbial cell under conditions suitable for the expression of the polypeptide. The microbial cell is typically a genetically modified microbe, and may be a bacterial cell, such as a gram negative, for instance, E. coli, a photosynthetic organism, for instance, R. sphaeroides, or it can be an archaeal cell, for instance, a member of the genera Pyrococcus, for instance P. furiosus, P. abyssi, or P. horikoshii, or a member of the genera Thermococcus, for instance, T. kodakaraensis or T. onnurineus. The microbe may include exogenous polynucleotides encoding the accessory polypeptides described herein. In those aspects where the microbe is a bacterial cell, such as E. coli, the microbe typically includes exogenous polynucleotides encoding the accessory polypeptides. The incubation conditions are typically anaerobic, and the temperature may be at least 37° C., at least 60° C., at least 70° C., at least 80° C., or at least 90° C. The conditions used to incubate the microbial cell typically include substrates that can be used by a cell to produce a reactant, such as NADPH, or the reductant such as NADPH can be photoproduced by a photosynthetic cell, and the NADPH can be used by the hydrogenase polypeptide to produce molecular hydrogen. Examples of useful substrates include renewable resources containing polysaccharides such as starch, cellulose, or the combination. Alternatively, the conditions used to incubate the microbial cell can include H2, which can be used by the hydrogenase polypeptide to convert NADP to NADPH. The methods can be performed using any convenient manner. For instance, methods for growing microbial cells to high densities are routine and known in the art, and include batch and continuous fermentation processes.
[0098]The present invention is illustrated by the following examples. It is to be understood that the particular examples, materials, amounts, and procedures are to be interpreted broadly in accordance with the scope and spirit of the invention as set forth herein.
Example 1
Anaerobic Expression Vectors
[0099]A series of compatible vectors has been constructed with the various promoters described above. The expression vectors described here are derivatives of those described in Horanyi et al., (U.S. Published Patent Application 20060183193). These are a series of four vectors with compatible origins of replication and different antibiotic resistance markers which allow coexpression of multiple genes in E. coli using the lac operon regulation. These vectors have been modified to include the "anaerobic" promoters described above (Table 2) and up to 12 genes derived from P. furiosus. These are a) the structural genes for the four subunits of P. furiosus hydrogenase (Table 1) and b) the eight genes that encode the hydrogenase processing genes in P. furiosus (Table 1). The complete list of vectors created is found in Table 3, and four particular examples are shown in FIGS. 1-4. The complete map and sequences of these four vectors are shown in FIG. 8.
TABLE-US-00001 TABLE 1 Pyrococcus furiosus genes encoding structural and accessory proteins for cytoplasmic hydrogenase I and Genbank accession numbers.
SEQ ID NO | PF gene identifier | Gene | Genbank Accession# | Coding region or deduced polypeptide sequence encoded by coding region
1 | PF0891 | Structural gene, hydrogenase I beta subunit | AE010204.1 | coding region
2 | PF0891 | Structural gene, hydrogenase I beta subunit | AAL81015 | Polypeptide encoded by coding region
3 | PF0892 | Structural gene, hydrogenase I gamma subunit | AE010204.1 | coding region
4 | PF0892 | Structural gene, hydrogenase I gamma subunit | AAL81016 | Polypeptide encoded by coding region
5 | PF0893 | Structural gene, hydrogenase I delta subunit | AE010204.1 | coding region
6 | PF0893 | Structural gene, hydrogenase I delta subunit | AAL81017 | Polypeptide encoded by coding region
7 | PF0894 | Structural gene, hydrogenase I alpha subunit | AE010204.1 | coding region
8 | PF0894 | Structural gene, hydrogenase I alpha subunit | AAL81018 | Polypeptide encoded by coding region
9 | PF0548 | HypC | AE010177.1 | coding region
10 | PF0548 | HypC | AAL80672 | Polypeptide encoded by coding region
11 | PF0549 | HypD | AE010177.1 | coding region
12 | PF0549 | HypD | AAL80673 | Polypeptide encoded by coding region
13 | PF0559 | HypF | AE010178.1 | coding region
14 | PF0559 | HypF | AAL80683 | Polypeptide encoded by coding region
15 | PF0604 | HypE | AE010182.1 | coding region
16 | PF0604 | HypE | AAL80728 | Polypeptide encoded by coding region
17 | PF0615 | HypA | AE010183.1 | coding region
18 | PF0615 | HypA | AAL80739 | Polypeptide encoded by coding region
19 | PF0616 | HypB | AE010183.1 | coding region
20 | PF0616 | HypB | AAL80740 | Polypeptide encoded by coding region
21 | PF0617 | HycI | AE010183.1 | coding region
22 | PF0617 | HycI | AAL80741 | Polypeptide encoded by coding region
23 | PF1401 | SlyD | AE010243.1 | coding region
24 | PF1401 | SlyD | AAL81525 | Polypeptide encoded by coding region
TABLE-US-00002 TABLE 2 Escherichia coli hydrogenase promoter DNA sequences derived from the K12 strain genome (accession number NC_000913), and Bacillus megaterium xylose promoter DNA sequences (derived from accession number X57598) (Qazi et al. 2001. Microb Ecol 41:301-309).

SEQ ID NO: 25. Gene identifier: E. coli K12 hya promoter. Genbank Accession#: NC_000913.2. Genome nucleotide start and stop: 1031062-1031364. DNA Sequence:
CTCGAATTCCTTCTCTTTTACTCGTTTAGCAAC
CGGCTAAACATCCCCACCGCCCGGCCAAAAGAA
AAATAGGTCCATTTTTATCGCTAAAAGATAAAT
CCACACAGTTTGTATTGTTTTGTGCAAAAGTTT
CACTACGCTTTATTAACAATACTTTCTGGCGAC
GTGCGCCAGTGCAGAAGGATGAGCTTTCGTTTT
CAGCATCTCACGTGAAGCGATGGTTTGCCTTGC
TACAGGGACGTCGCTTGCCGACCATAAGCGCCC
GGTGTCCTGCCGGTGTCGCAAGGAGGAGAGACG
TGCGATATGGGTCATCACCATCATCACCACGGC
TCGATCACAAGTTTGTACAAAAAAGCAGGCTCA
GAAAACCTGTATTTTCAGGGAGGA(PFU GENE)*

SEQ ID NO: 26. Gene identifier: E. coli K12 hyc promoter. Genbank Accession#: NC_000913.2. Genome nucleotide start and stop: 2848966-2848355. DNA Sequence:
CTCGAATTCTGCAGCATGTCACCATGACACTGTGG
ACAGCGGCGGACGCGCTGGGTCAGTAGCGTCACAT
ACTGTTGGCATGTTTCACACCAGCATTCGGCCTCT
TGTTCTTCGAGGTGCAGTTTACAACCTTCCGCCAC
GCTGCCGCGGCAAACCAGATCAAAACAAAAGGCAA
GAGAGCTGGTTTCGACACAAGAAAATGCGCCAATT
TTGAGCCAGACCCCAGTTACGCGTTTTGCGCCGTG
TTTTGCGGCCTGCTGTTCGATCAATTCCAGTGCCC
GTTGGCAGAGGGTTATTTCGTGCATATCGCCTCCC
ATTAACTATTGCCAGCTACAAGCAATAATTGTGCC
AGTGTTGATTATCCCTGCGGTGAATAATGTCGATG
ATGTCGAAATGACACGTCGACACGGCGACGAAATT
CATCTTTAGCTTAAAAATCTCTTTAATAACAATAA
ATTAAAAGTTGGCACAAAAAATGCTTAAAGCTGGC
ATCTCTGTTAAACGGGTAACCTGACAATGACTATT
TGGGAAATAAGCGAGAAAGCCGATTACATCGCACA
GCGGCATCGTCGCCTACAGGACCAGTGGCACATCT
ACTGCAATTCGCTGGTTCAGGGGAGAGGAGGAATA
AAAAATG

SEQ ID NO: 27. Gene identifier: B. megaterium xylA promoter. Genbank Accession#: X57598. DNA Sequence:
GAATTCTAGAATCTAATATTATAACTAAATTTTCT
AAAAAAAACATTGGAATAGACATTTATTTTGTATA
TGATGAAATAAAGTTAGTTTATTGGATAAACAAAC
TAACTTTATTAAGGTAGTTGATGGATAAACTTGTT
CACTTAAATCAACCCGGGAACAAGGAGGAATAAAA
AATG

SEQ ID NO: 28. Gene identifier: E. coli pRIL section. DNA Sequence:
GGATCCCCGTCACCCTGGATGCTGTACAATTGACG
ACGACAAGGGCCCGGGCAAACTAGTAATCAGACGC
GGTCGTTCACTTGTTCAGCAACCAGATCAAAAGCC
ATTGACTCAGCAAGGGTTGACCGTATAATTCACGC
GATTACACCGCATTGCGGTATCAACGCGCCCTTAG
CTCAGTTGGATAGAGCAACGACCTTCTAAGTCGTG
GGCCGCAGGTTCGAATCCTGCAGGGCGCGCCATTA
CAATTCAATCAGTTACGCCTTCTTTATATCCTCCA
GCCATGGCCTTGAAATGGCGTTAGTCATGAAATAT
AGACCGCCATCGAGTACCCCTTGTACCCTTAACTC
TTCCTGATACGTAAATAATGATTTGGTGGCCCTTG
CTGGACTTGAACCAGCGACCAAGCGATTATGAGTC
GCCTGCTCTAACCACTGAGCTAAAGGGCCTTGAGT
GTGCAATAACAATACTTATAAACCACGCAATAAAC
ATGATGATCTAGAGAATCCCGTCGTAGCCACCATC
TTTTTTTGCGGGAGTGGCGAAATTGGTAGACGCAC
CAGATTTAGGTTCTGGCGCCGCTAGGTGTGCGAGT
TCAAGTCTCGCCTCCCGCACCATTCACCAGAAAGC
GTTGATCGGATGCCCTCGAGTCGGGCAGCGTTGGG
TCCTGGCCACGGGTGCGCATGATCGTGCTCCTGTC
GTTGAGGACCCGGCTAGGCTGGCGGGGTTGCCTTA
CTGGTTAGCAGAATGAATCACCGATACGCGAGCGA
ACGTGAAGCGACTGCTGCTGCAAAACGTCTGCGAC
CTGAGCTC

* The E. coli hya promoter, including the ATG protein translation initiation site, is indicated in boldface in the table. The region immediately after includes ggt (encoding a Glycine) / catcaccatcatcaccac (6x His tag) / ggctcgatcacaagtttgtacaaaaaagcaggctca (Gateway attB1 site, encoding GSITSLYKKAGS) / gaaaacctgtattttcaggga (encoding TEV protease recognition site: ENLYFQG, TEV protease cut between Q and G) / gga, encoding another Glycine (SEQ ID NO: 50). At the asterisk, P. furiosus genes are cloned without a start codon to create a fusion protein MGHHHHHHGSITSLYKKAGSENLYFQGG-Pfu target gene (MGHHHHHHGSITSLYKKAGSENLYFQGG, SEQ ID NO: 51).
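The N-terminal fusion described in the footnote to Table 2 can be checked by translating the listed nucleotide segments with the standard genetic code. The following is a minimal sketch, not the applicants' procedure; the names CODON_TABLE, translate, and TAG_PARTS are hypothetical, and the codon table is built from the conventional TCAG ordering of the standard code.

```python
# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame DNA string into one-letter amino acid codes."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Segments as given in the footnote to Table 2, preceded by the ATG start codon.
TAG_PARTS = [
    "atg",                                   # translation initiation site
    "ggt",                                   # glycine
    "catcaccatcatcaccac",                    # 6x His tag
    "ggctcgatcacaagtttgtacaaaaaagcaggctca",  # Gateway attB1 site
    "gaaaacctgtattttcaggga",                 # TEV protease recognition site
    "gga",                                   # glycine
]

tag_protein = translate("".join(TAG_PARTS))

if __name__ == "__main__":
    print(tag_protein)
```

Under these assumptions the translated tag is 28 residues long and agrees with the fusion leader given as SEQ ID NO: 51.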
TABLE-US-00003 TABLE 3 Complete list of vectors constructed.
Plasmids Constructed
plasmid | promoter | gene | Antibiotics
pHA-BC | hya | 0894-hybC | Amp
pHA-CS | hya | 0894-CS | Amp
pET-CAG | Gateway plasmid, with promoter P-hya, Ampicillin resistant
pET-CXG | Gateway plasmid, with promoter P-xylA, Ampicillin resistant
pEA-SH1 | hya | 0891-0894 | Amp
pDEST-C11 | T7 promoter, Gateway plasmid, from pDEST-C1, Streptomycin resistant
pDEST-C11A | hya, Gateway plasmid, from pDEST-C1, Streptomycin resistant
pDEST-C11A-hypABI | hya | PF0615-0617 | Sm
pC11A-CDABI | hya | PF0548-0549-0615-0616-0617 | Sm
pDEST-C3A | Gateway plasmid with P-hya promoter in front of Gateway cassette, Chloramphenicol resistant
pDEST-C3X | Gateway plasmid with P-xylA promoter in front of Gateway cassette, Chloramphenicol resistant
pDEST-C3-SH1 | T7 | PF0891-0894 | Cm
pDEST-C3A-SH1 | hya | PF0891-0894 | Cm
pDEST-C3A-lacZ | hya | lacZ | Cm
pDEST-C3X-lacZ | P-xylA | lacZ | Cm
pDEST-C3AR | derivative of plasmid pDEST-C3A, in which RIL fragment inserted
pC3A-slyD | hya | PF1401 | Cm
pC3AR-slyD | hya | PF1401 | Cm
pRSF-CAG | Gateway plasmid, sequencing confirmed, Kanamycin resistant, done by JS
pRSF-CXG |
pRA-hypE | hya | PF0604 | Kan
pRA-hypF | hya | PF0559 | Kan
pRA-EF | hya | PF0604-0559 | Kan
pDONR/zeo-hycI | | PF0617 | Zeo
pDONR/zeo-hypCD-ABI | | PF0548-0549/0615-0617 | Zeo
pDONR/zeo-hypEF | | PF0604/0559 | Zeo
pDONR/zeo-slyD | | PF1401 | Zeo
pDONR/zeo-lacZ | | E. coli lacZ N-terminal sequence | Zeo
pDONR/zeo-hypCD | | PF0548-0549 | Zeo
pDONR/zeo-hypE | | PF0604 | Zeo
pDONR/zeo-hypF | | PF0559 | Zeo
Amp, ampicillin resistance marker; Sm, streptomycin/spectinomycin resistance marker; Cm, chloramphenicol resistance marker; Kan, kanamycin resistance marker; Zeo, zeocin resistance marker.
TABLE-US-00004 TABLE 4 Compatible anaerobic expression vectors utilized to express functional P. furiosus cytoplasmic hydrogenase I in E. coli.
Vector | Parent Vector | Antibiotic Resistance marker | P. furiosus gene products | P. furiosus gene number (6)
pC11A-CDABI (6) | pDEST-C1 (2) | StreptomycinR | HypCDAB, HycI | PF0548, PF0549, PF0615-0617
pC3AR-slyD (1) | pDEST-C3 (3) | ChloramphenicolR | SlyD | PF1401
pEA-SH1 | pET23(+) (4) | AmpicillinR | Hydrogenase I | PF0891-PF0894
pRA-EF (7) | pRSFDuet-1 (5) | KanamycinR | HypEF | PF0604, PF0559
(1) Also includes the region (SEQ ID NO: 28, see Table 2) of the Stratagene (La Jolla, CA) helper plasmid pRIL, BL21-CodonPlus (DE3)-RIL competent cells, catalog number 230245. This strain carries the pRIL plasmid which expresses transfer RNAs that are rare in E. coli.
(2) Horanyi et al., (U.S. patent application 20060183193)
(3) Horanyi et al., (U.S. patent application 20060183193)
(4) EMD Chemicals Inc., Catalog Number 69771-3.
(5) EMD Chemicals Inc., Catalog Number 71341.
(6) An artificial intergenic sequence was introduced between the hypD and hypA coding regions to create a Shine-Dalgarno ribosome binding site for hypA. The CD-ABI intergenic sequence is gaggtggaaa (SEQ ID NO: 52); there was an artificial Shine-Dalgarno sequence (aggaggtg) in front of the hypA gene. hypD's expression stops at TAG, while hypA starts with ATG: (hypD-tttacaaatatggcgccctgatgtaggaggtggaaaATGcacgaatgggcgttggcagatgcaatagtaagg-hypA) (tttacaaatatggcgccctgatgtaggaggtggaaaATGcacgaatgggcgttggcagatgcaatagtaagg, SEQ ID NO: 53).
(7) An artificial intergenic sequence was introduced between the hypE and hypF coding regions to create a Shine-Dalgarno ribosome binding site for hypF. The hypE-hypF intergenic sequence is still gaggtggaaa (SEQ ID NO: 52); there was the same artificial Shine-Dalgarno sequence (aggaggtg) in front of the hypF gene. hypE's expression stops at TAG, while hypF starts with ATG: hypE-gtgatcccgttcctagagtttgttaggaggtggaaaATGatctgggggagagaatgaaagcttatagaattcacg-hypF (gtgatcccgttcctagagtttgttaggaggtggaaaATGatctgggggagagaatgaaagatatagaattcacg; SEQ ID NO: 54).
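The junction described in footnote 6 of Table 4 can be inspected with a short Python sketch that locates the artificial Shine-Dalgarno sequence (aggaggtg) and the downstream ATG in the hypD-hypA junction given as SEQ ID NO: 53. The function name describe_junction and the returned field names are hypothetical. The sketch only reports positions within the quoted junction and makes no claim about translation efficiency.

```python
# hypD-hypA junction as quoted in footnote 6 of Table 4 (SEQ ID NO: 53).
JUNCTION_HYPD_HYPA = (
    "tttacaaatatggcgccctgatgtaggaggtggaaaATGcacgaatgggcgttggcagatgcaatagtaagg"
)
SHINE_DALGARNO = "aggaggtg"


def describe_junction(junction, sd=SHINE_DALGARNO):
    """Locate the Shine-Dalgarno site and the first ATG downstream of it."""
    seq = junction.lower()
    sd_start = seq.find(sd)
    if sd_start < 0:
        raise ValueError("Shine-Dalgarno sequence not found")
    sd_end = sd_start + len(sd)
    # Search for the start codon only after the ribosome binding site,
    # because ATG triplets also occur inside the upstream coding region.
    start_codon = seq.find("atg", sd_end)
    if start_codon < 0:
        raise ValueError("no ATG found downstream of the Shine-Dalgarno sequence")
    return {
        "sd_start": sd_start,
        "spacer": seq[sd_end:start_codon],
        "start_codon_index": start_codon,
        # Three bases ending at the second base of the SD site (the upstream stop).
        "upstream_stop": seq[sd_start - 1:sd_start + 2],
    }


info = describe_junction(JUNCTION_HYPD_HYPA)

if __name__ == "__main__":
    print(info)
```

In this junction the TAG stop codon of hypD overlaps the first two bases of the Shine-Dalgarno sequence, and four bases (gaaa) separate the end of that sequence from the ATG of hypA.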
[0100]In addition, one of the vectors, pC3AR-slyD (Table 3) has been further modified to include a region (SEQ ID NO: 28) of the Stratagene (La Jolla, Calif.) helper plasmid pRIL. This plasmid was purified from E. coli BL21-CodonPlus cells from Stratagene (La Jolla, Calif. catalog #230240). This overexpresses transfer RNAs that are rare in E. coli but are required for efficient expression of P. furiosus proteins due to differences in codon usage between the two organisms. This eliminates the need for yet another vector (containing pRIL) and yet another antibiotic resistance marker. The following sequence was amplified from pRIL by PCR, and inserted into pDEST-C3A to create destination plasmid pC3A-RIL, which was used to make expression plasmid pC3AR-slyD (ggatccccgtcaccctggatgctgtacaattgacgacgacaagggcccgggcaaactagtaatcagac gcggtcgttcacttgacagcaaccagatcaaaagccattgactcagcaagggagaccgtataattcacg cgattacaccgcattgcggtatcaacgcgccatagctcagttggatagagcaacgaccactaagtcgtg ggccgcaggttcgaatcctgcagggcgcgccattacaattcaatcagttacgccttctttatatcctccagc catggccttgaaatggcgttagtcatgaaatatagaccgccatcgagtaccccttgtaccataactcttcct gatacgtaaataatgatttggtggccatgctggacttgaaccagcgaccaagcgattatgagtcgcctgc tctaaccactgagctaaagggccttgagtgtgcaataacaatacttataaaccacgcaataaacatgatga tctagagaatcccgtcgtagccaccatctttnttgcgggagtggcgaaattggtagacgcaccagatttag gttctggcgccgctaggtgtgcgagttcaagtctcgcctcccgcaccattcaccagaaagcgttgatcgg atgccctcgagtcgggcagcgttgggtcctggccacgggtgcgcatgatcgtgacctgtcgttgagga cccggctaggctggcggggttgccttactggttagcagaatgaatcaccgatacgcgagcgaacgtgaa gcgactgctgctgcaaaacgtctgcgacctgagctc; SEQ ID NO:55). If all four vectors are used, there are seven possible cloning sites available, four Gateway® recombination sites (Invitrogen, Carlsbad, Calif.) 
under control of four different anaerobic promoters, and three standard multiple cloning sites (under standard T7 promoter control), as these are derived from the Novagen Duet system vectors (EMD Chemicals, San Diego, Calif.), with the exception of pEA-SH1, which was derived from pET23, also from Novagen but not part of the Duet system of vectors. However, as many as five consecutive genes can be cloned in tandem under control of the P-hya promoter (plasmid pC11A-CDABI), and all were expressed as demonstrated by quantitative PCR, as described below. This means that as many as twenty genes, and potentially more, can be coexpressed anaerobically using these compatible vectors. Herein we used all four vectors to express 12 genes from P. furiosus. In each construct, a single gene, or the first gene (at the 5' end) of any group of genes, had a poly-His tag that is cleavable with TEV protease.
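The count of 12 coexpressed genes follows from the gene numbers listed in Table 4. As an illustrative sketch only (the helper name and the range-parsing convention are assumptions, not part of the disclosure), the PF gene numbers per vector can be expanded and tallied as follows:

```python
import re

# Gene numbers per vector, copied from Table 4.
VECTOR_GENES = {
    "pC11A-CDABI": "PF0548, PF0549, PF0615-0617",
    "pC3AR-slyD": "PF1401",
    "pEA-SH1": "PF0891-PF0894",
    "pRA-EF": "PF0604, PF0559",
}


def expand_gene_numbers(spec):
    """Expand entries such as 'PF0615-0617' into individual PF gene identifiers."""
    genes = []
    for token in re.split(r"[,\s]+", spec.strip()):
        if not token:
            continue
        first, _, last = token.partition("-")
        start = int(first.replace("PF", ""))
        # A bare identifier (no range) expands to itself.
        end = int(last.replace("PF", "")) if last else start
        genes.extend(f"PF{number:04d}" for number in range(start, end + 1))
    return genes


genes_per_vector = {vector: expand_gene_numbers(spec) for vector, spec in VECTOR_GENES.items()}
total_genes = sum(len(genes) for genes in genes_per_vector.values())

for vector, genes in genes_per_vector.items():
    print(vector, len(genes), genes)
print("total genes:", total_genes)
```

The tally is 5 + 1 + 4 + 2 genes across the four vectors.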
Example 2
Growth of Recombinant E. coli and Production of Recombinant P. furiosus Hydrogenase
[0101]The E. coli strain used for expression of the P. furiosus hydrogenase was MW1001, a derivative of the strain BW25113. This strain has the genotype (hyaB hybC hycE Δkan; defective in LSU of hydrogenases 1, 2, and 3, no antibiotic marker) and lacks detectable E. coli hydrogenase activity (Maeda et al. 2007. BMC Biotechnol 7:25).
[0102]To obtain the recombinant form of P. furiosus cytoplasmic hydrogenase I, recombinant E. coli cells containing the four vectors (Table 4) were grown on an 8 L scale at 37° C. in 2×YT media (16 g Tryptone, 10 g Yeast Extract, 5 g NaCl per liter) supplemented with 25 μM NiCl2, 100 μM FeCl3, 2 mM MgSO4 and the antibiotics Ampicillin (50 μg/ml), Chloramphenicol (16.5 μg/ml), Streptomycin (25 μg/ml) and Kanamycin (25 μg/ml). Cloning the complete P. furiosus SHI operon in E. coli resulted in low efficiency of transformation; however, all techniques used for cloning and transformations were standard molecular biology techniques as described (Sambrook, J., E. F. Fritsch, and T. Maniatis. 1989. Molecular cloning: a laboratory manual, 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.), and transformants were obtained. The culture was sparged with sterile, compressed air (3-5 L/min) until an OD600 of ˜0.3 was reached. At this time the compressed air was turned off, the cells were sparged with sterile argon (˜4 L/min), and 2% glucose and 30 mM sodium formate were added to supplement growth and induce hydrogenase-related genes in E. coli. The culture was allowed to ferment for five hours, and the cells were then quickly harvested by centrifugation and frozen at -80° C. Frozen cells were then thawed and lysed overnight at 25° C. in anaerobic 50 mM Tris buffer pH 8.0, 2 mM sodium dithionite, 0.5 mg/mL lysozyme, 50 μg/mL DNase at a ratio of 1 g/3 mL in an anaerobic chamber under an atmosphere of 5% hydrogen/95% argon.
[0103]A hydrogen evolution assay was used to measure hydrogenase activity using an artificial electron carrier (methyl viologen, MV) with sodium dithionite as the electron donor as described (Ma and Adams. 2001. Methods Enzymol 331:208-16). Briefly, this was carried out using 5 mL stoppered vials containing 2 mL of anaerobic 100 mM EPPS buffer pH 8.4, 10 mM sodium dithionite, and 1 mM methyl viologen under an atmosphere of argon. Vials were preheated at 80° C. for 1 min and then 200 μL of sample was injected. Samples (100 μL) of the headspace of the sealed vial were removed with a gas-tight syringe and injected into a gas chromatograph after the reaction had proceeded for 6 min. The resulting hydrogen peak was compared to a known standard curve to calculate micromoles of hydrogen produced per mL of assay solution. Specific activity is defined as micromoles of H2 produced per minute per mg of protein. After cell lysis the following samples were analyzed for hydrogen evolution at 80° C.: whole cell extracts (WCEs), the cytoplasmic extract after a 100,000×g centrifugation (S100), and heat-treated (at 80° C. for 30 min) and re-centrifuged S100. The data are summarized in Table 5.
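The conversion from the gas chromatograph reading to specific activity can be sketched as below. This is an illustration under stated assumptions rather than the procedure of the cited reference: the function name is hypothetical, the assay volume is taken as the 2 mL of buffer (ignoring the 200 μL of injected sample), and the input values in the usage lines are invented for demonstration only.

```python
def specific_activity(umol_h2_per_ml, assay_volume_ml, minutes, protein_mg):
    """Return specific activity in units/mg, where 1 unit = 1 micromole H2 per minute.

    umol_h2_per_ml  -- H2 produced per mL of assay solution, read from the standard curve
    assay_volume_ml -- volume of assay solution in the vial (assumed: 2 mL)
    minutes         -- reaction time before the headspace was sampled
    protein_mg      -- mass of protein added to the vial
    """
    if minutes <= 0 or protein_mg <= 0:
        raise ValueError("reaction time and protein mass must be positive")
    total_umol_h2 = umol_h2_per_ml * assay_volume_ml
    units = total_umol_h2 / minutes
    return units / protein_mg


# Hypothetical reading: 0.6 micromol H2 per mL after 6 min with 0.5 mg protein.
example = specific_activity(umol_h2_per_ml=0.6, assay_volume_ml=2.0, minutes=6.0, protein_mg=0.5)
print(f"specific activity: {example:.2f} units/mg")
```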
TABLE-US-00005 TABLE 5 MV-linked H2-evolving activity of recombinant P. furiosus cytoplasmic hydrogenase I.
Step | BW25113(1) Total Units | BW25113(1) Specific Activity | MW1001(2) Total Units | MW1001(2) Specific Activity
WCE | 891 | 2.7 | ND(4) | ND(4)
S100 | 2 | 0.02 | ND(4) | ND(4)
80° C. treated S100 | ND(4) | ND(4) | ND(4) | ND(4)
Step | MW1001 + SHI(5) Total Units | MW1001 + SHI(5) Specific Activity | MW1001 + SHI + Pf Plasmids(6) Total Units | MW1001 + SHI + Pf Plasmids(6) Specific Activity
WCE | ND(4) | ND(4) | 2.9 | 0.008
S100 | ND(4) | ND(4) | 3.8 | 0.04
80° C. treated S100 | ND(4) | ND(4) | 4.9 | 0.31
(1) Obtained from T.K. Wood, Texas A&M University, College Station, TX.
(2) See reference (Maeda et al. 2007. Appl Microbiol Biotechnol 76: 1035-1042).
(3) Specific activity is defined as μmol H2 produced min-1 mg protein-1.
(4) Not detected (below detection limit of 0.017 Units, measured with 0.5 mg protein after 2 minutes).
(5) Contains one plasmid expressing the four structural genes that encode P. furiosus hydrogenase: pEA-SH1 (PF0891-0894).
(6) Contains all four plasmids expressing P. furiosus hydrogenase genes including structural and processing genes: pEA-SH1 (PF0891-0894), pC11A-CDABI (PF0548-0549, PF0615-0617), pRA-EF (PF0604, PF0559), pC3AR-slyD (PF1401).
[0104]The data clearly demonstrate H2 evolution from cells expressing the genes encoding P. furiosus hydrogenase, with no detectable H2 produced by the control strain lacking any gene from P. furiosus. The form of the P. furiosus enzyme responsible for this activity was not only stable at 80° C. for 30 min, but it was activated by this heat treatment, a step that also precipitates heat-labile E. coli proteins. This increase was unexpected and, at about 29% in total units (from 3.8 to 4.9 units; Table 5), significant. Production of protein corresponding to the catalytic subunit of hydrogenase I (encoded by PF0894) has been confirmed by immunoanalysis (FIG. 5). In addition, expression of the P. furiosus genes in E. coli using these constructs at the level of mRNA has been confirmed by quantitative PCR (FIG. 6). FIG. 9 demonstrates that the MV-linked H2 evolution activity of the recombinant enzyme was virtually identical to that of the hydrogenase purified from native P. furiosus. The expression of coding regions PF0891-0894 resulted in a his-tag present at the amino terminal end of the polypeptide encoded by PF0891, the beta subunit. This tag did not result in a hydrogenase polypeptide that could be affinity purified; however, the hydrogenase polypeptide was active, suggesting the hydrogenase polypeptide is permissive for mutations.
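The size of the heat activation can be recomputed from the Table 5 values for the strain carrying all four plasmids. The sketch below is illustrative only; the helper names are assumptions, and the inputs are the S100 and heat-treated S100 entries of Table 5.

```python
# Table 5, MW1001 + SHI + Pf plasmids: before and after the 80 C / 30 min treatment.
S100 = {"total_units": 3.8, "specific_activity": 0.04}
HEAT_TREATED_S100 = {"total_units": 4.9, "specific_activity": 0.31}


def percent_increase(before, after):
    """Relative increase of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


def fold_change(before, after):
    """Ratio of `after` to `before`."""
    return after / before


activation = percent_increase(S100["total_units"], HEAT_TREATED_S100["total_units"])
enrichment = fold_change(S100["specific_activity"], HEAT_TREATED_S100["specific_activity"])

print(f"total units increased by {activation:.1f}%")
print(f"specific activity increased {enrichment:.2f}-fold")
```

The rise in total units reflects activation of the enzyme itself, while the larger rise in specific activity also reflects the removal of precipitated heat-labile E. coli protein.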
[0105]We have therefore demonstrated that heterologous expression of the hydrogenase genes was achieved in E. coli. This was shown by analysis of cell extracts for mRNA (by PCR) and for protein (by western blot). This gene expression leads to the production of a functional recombinant hydrogenase that is catalytically active at 80° C. (by hydrogen production measurements) and is also heat stable at 80° C. (for at least 30 min).
Example 3
Production of Hydrogenase by E. coli
[0106]The ability of E. coli containing the four compatible vectors, termed strain MW/rSHI-C, to produce the recombinant hydrogenase was investigated throughout the growth phase (FIG. 10). The strain was grown on an 8-liter scale in carboys in 2×YT growth media (16 g tryptone, 10 g yeast extract and 5 g NaCl per liter) supplemented with 1% glucose, 2 mM MgSO4, Amp (50 μg/ml), Cm (16 μg/ml), Sm (25 μg/ml) and Kan (25 μg/ml); see Table 4. FIG. 10 summarizes the results from two separate cultures (one indicated by circles, one by triangles). At an OD600 of 0.2-0.3, 100 μM FeCl3 and 25 μM NiSO4 were added, and the culture was then sealed and allowed to ferment anaerobically (indicated by the arrow in FIG. 10). The growth curves are shown by solid symbols. Samples of the culture were taken every hour after the anaerobic switch. The cells were harvested by centrifugation, lysed, and analyzed for MV-linked hydrogenase activity at 80° C. (shown by open symbols). The results show that hydrogenase activity is not detected in E. coli MW/rSHI-C until the cells are switched to anaerobic growth, which is expected since expression of the P. furiosus genes is induced by the so-called anaerobic hya promoter. FIG. 10 also shows that the amount of 80° C. hydrogenase activity, and thus production of the recombinant hydrogenase, increases with cell growth until late stationary phase.
[0107]Cell yields of recombinant E. coli MW/rSHI-C approached 1 gram (wet weight)/liter when grown on the 8-liter scale in carboys. We also demonstrated that the same strain could be grown to extremely high cell densities and, after a switch to anaerobic conditions, produced the recombinant hydrogenase, as measured by hydrogenase activity at 80° C. Cells were grown in a 5-liter controlled fermentation system (New Brunswick) on the same medium that was used in the carboys but with controlled a) pH (6.5), b) dissolved oxygen, and c) glucose concentration. As shown in FIG. 11, cells were grown to an OD600 of 38 before switching to anaerobic conditions, in this case by replacing the air with argon, and this induced the production of the recombinant hydrogenase activity to approximately the same level as in the 8-liter carboy cultures (˜0.1 unit/mg before heat treatment). The cell yield in this case was ˜40 gram (wet weight)/liter.
Example 4
Purification of Hydrogenase
[0108]A method for purifying the recombinant hydrogenase was developed that enabled confirmation of the production of the recombinant forms of all four of the protein subunits of P. furiosus hydrogenase. The scheme is summarized in FIG. 12, and involves two standard column chromatography steps using DEAE Sepharose and Phenyl Sepharose (GE Healthcare). In brief, the E. coli cells (154 gram, wet weight) were broken by thawing them in anaerobic 50 mM Tris, pH 8.0 (3 mL per gram of frozen cells) containing 0.5 mg/mL lysozyme, 50 μg/mL DNase, 1 mM phenylmethylsulfonyl fluoride, and 2 mM sodium dithionite. The suspension was incubated at room temperature in an anaerobic chamber under an atmosphere of 5% H2/95% Ar for 4 hours to allow the cells to break. The sample was then sealed in an anaerobic flask and heat-treated at 80° C. for 30 min by immersion of the flask in a hot water bath. Samples were then anaerobically centrifuged at 100,000×g for 30 min. The supernatant (650 ml) was then diluted 5-fold with Buffer A (50 mM Tris, 2 mM sodium dithionite, pH 8.0) at a sample/Buffer A ratio and loaded onto a column of DEAE Sepharose (300 ml; GE Healthcare) equilibrated in Buffer A. The column was then washed with 5 column volumes of Buffer A and eluted with a 20-column-volume gradient from 0 to 25% Buffer B (Buffer A+2 M NaCl) in 40 ml fractions. Fractions that contained hydrogenase activity in the standard assay (at 80° C. using reduced methyl viologen as the electron donor) were combined, and Buffer A containing 2.0 M ammonium sulfate, (NH4)2SO4, was added to a final concentration of 0.8 M. The sample was then loaded onto a column of Phenyl Sepharose (45 ml) equilibrated in Buffer C (Buffer A containing 0.8 M (NH4)2SO4). The column was washed with 5 column volumes of Buffer C and eluted with a 20-column-volume gradient from 100% Buffer C to 100% Buffer A in 10 ml fractions. Fractions containing hydrogenase activity were combined.
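The volumes implied by this protocol follow from simple arithmetic, sketched below. This is an illustration only: the helper names are hypothetical, the pooled-fraction volume in the last usage line is an invented value, and the ammonium sulfate calculation assumes that the pooled sample contains no ammonium sulfate and that volumes are additive.

```python
def lysis_buffer_ml(cell_mass_g, ml_per_g=3.0):
    """Volume of lysis buffer for a given wet cell mass (3 mL per gram in the text)."""
    return cell_mass_g * ml_per_g


def gradient_plan(column_ml, wash_cv, gradient_cv, fraction_ml):
    """Wash volume, gradient volume and number of fractions for one column."""
    wash_ml = column_ml * wash_cv
    gradient_ml = column_ml * gradient_cv
    return {"wash_ml": wash_ml, "gradient_ml": gradient_ml, "fractions": gradient_ml / fraction_ml}


def salt_at_percent_b(percent_b, buffer_b_molar):
    """Salt concentration reached at a given percentage of Buffer B."""
    return percent_b / 100.0 * buffer_b_molar


def stock_volume_ml(sample_ml, stock_molar, final_molar):
    """Volume of stock to add so the mixture reaches final_molar.

    From final = stock * V / (sample + V), so V = sample * final / (stock - final).
    """
    if final_molar >= stock_molar:
        raise ValueError("final concentration must be below the stock concentration")
    return sample_ml * final_molar / (stock_molar - final_molar)


print(lysis_buffer_ml(154))                  # buffer for 154 g of cells
print(gradient_plan(300, 5, 20, 40))         # DEAE Sepharose, 300 ml column
print(salt_at_percent_b(25, 2.0))            # NaCl (M) at the end of the 0-25% gradient
print(gradient_plan(45, 5, 20, 10))          # Phenyl Sepharose, 45 ml column
print(stock_volume_ml(300.0, 2.0, 0.8))      # hypothetical 300 ml pool
```

Under these assumptions, reaching 0.8 M from a 2.0 M stock requires adding two volumes of stock for every three volumes of pooled sample.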
[0109]Typical results of this two-column purification are shown in Table 6. The enzyme was purified almost 60-fold, and about 20% of the total activity was recovered, with a specific activity in the standard 80° C. assay of 6 units/mg. SDS gel analysis of the hydrogenase-active fractions obtained at the different purification steps is shown in FIG. 13. The most purified fractions (the PS Pool from the Phenyl Sepharose column) contain six or so major bands on SDS gels. Analysis of the bands that migrated at the expected molecular weights for the four subunits of the recombinant hydrogenase (see FIG. 11) by standard tryptic digestion/mass spectrometry (MALDI) confirmed unambiguously that those were the four subunits of the P. furiosus hydrogenase enzyme.
TABLE-US-00006 TABLE 6 Isolation of recombinant hydrogenase.
Step | Total Units(a) (μmol min-1) | Total Protein (mg) | Specific Activity (units/mg) | % Yield | Fold Purification
Cell Lysate | 1349 | 13059 | 0.1 | 100 | 1
S100 (after 80° C./30 min) | 1380 | 1231 | 1 | 102 | 11
DEAE Sepharose | 640 | 301 | 2 | 47 | 21
Phenyl Sepharose | 239 | 41 | 6 | 18 | 56
(a) Hydrogenase activity was measured at 80° C. using reduced MV as the electron donor. One unit of activity is equivalent to the production of 1 μmole of hydrogen per minute.
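The derived columns of Table 6 (specific activity, yield and fold purification) can be recomputed from the total units and total protein at each step. The sketch below is illustrative; the function and field names are assumptions, and the inputs are the Table 6 values.

```python
# (step, total units in micromol/min, total protein in mg), from Table 6.
STEPS = [
    ("Cell Lysate", 1349, 13059),
    ("S100 (after 80 C/30 min)", 1380, 1231),
    ("DEAE Sepharose", 640, 301),
    ("Phenyl Sepharose", 239, 41),
]


def purification_table(steps):
    """Compute specific activity, % yield and fold purification relative to the first step."""
    _, first_units, first_protein = steps[0]
    first_specific_activity = first_units / first_protein
    rows = []
    for name, units, protein_mg in steps:
        specific_activity = units / protein_mg
        rows.append({
            "step": name,
            "specific_activity": specific_activity,
            "yield_percent": 100.0 * units / first_units,
            "fold_purification": specific_activity / first_specific_activity,
        })
    return rows


for row in purification_table(STEPS):
    print(f"{row['step']:<26} {row['specific_activity']:6.2f} units/mg "
          f"{row['yield_percent']:6.1f}% {row['fold_purification']:6.1f}-fold")
```

Rounded to whole numbers, the computed yields and fold-purification values agree with the entries in Table 6.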
Example 5
Purification of Hydrogenase
[0110]A method to obtain highly purified, near-homogeneous preparations of the hydrogenase was devised. This involves two additional steps of conventional column chromatography. In brief, the PS Pool (see Table 6) was concentrated by ultrafiltration (Amicon, PM-30 membrane), and applied to a column of Sephacryl S-200 (GE Healthcare) equilibrated with Buffer A. The same buffer was used to elute the column. Fractions that contained hydrogenase activity in the standard assay were combined and applied directly to a column of Hydroxyapatite (Life Science Research, Hercules, Calif.) equilibrated in Buffer A. The column was washed with 5 column volumes of Buffer A and eluted with a 20-column-volume gradient from 0 to 50% Buffer D (Buffer A+0.5 M potassium phosphate). Samples containing hydrogenase activity were combined. As shown in FIG. 14, the fractions from the Hydroxyapatite column contain highly purified hydrogenase containing four major proteins. These corresponded to the protein bands found in the native hydrogenase purified from P. furiosus. The four protein bands in the purified recombinant hydrogenase were unambiguously shown by tryptic digest/MALDI analysis to correspond to the four subunits of the recombinant form of P. furiosus hydrogenase. In addition, the hydrogenase activity from the Sephacryl S-200 column eluted as a single band with a molecular weight of approximately 150,000, showing that it was a homogeneous species whose size corresponds to that of the native enzyme, which is a heterotetramer of four different polypeptides (see FIG. 14).
Example 6
Metal Analysis
[0111]The purified recombinant hydrogenase has hydrogen-evolving activity and must therefore contain a nickel-iron catalytic site. This is demonstrated by a metal analysis of the fractions eluting from the Phenyl Sepharose column using the technique of ICP-MS (Model 7500ce, Agilent Technologies). As shown in FIG. 15, fractions that contained hydrogenase activity also contained both nickel and iron. Moreover, the Fe:Ni ratio was approximately 20, which is almost identical to the value (Fe:Ni=19) proposed to be in the native P. furiosus enzyme (see proposed cofactor content in FIG. 14). Therefore, the recombinant hydrogenase has the expected metal content, consistent with a fully functional enzyme.
[0112]FIG. 15 shows a major additional peak of nickel that is not associated with the enzyme. We propose that this nickel is not inserted into the hydrogenase protein because a factor required for hydrogenase biosynthesis is limiting in E. coli, but that insertion would occur when E. coli is grown under the appropriate conditions. As an example, nickel may not be processed completely due to limited availability of the cyanide and carbon monoxide ligands that are coordinated to the nickel-iron catalytic site. Others have shown that carbamoyl phosphate is the source of the cyanide (Paschos et al. 2001. FEBS Lett 488:9-12). E. coli cells deficient in carbamoyl phosphate (CP) synthesis (by lesion of the carAB locus) lose the ability to synthesize active hydrogenase enzymes (Blokesch and Bock. 2002. Journal of Molecular Biology 324:287-296). It was shown that the ΔcarAB strain contained a stable HypC-HypD complex but that processing of hydrogenase does not occur. The complex disappeared, and processing and hydrogenase production were restored, when a source of CP (L-citrulline) was added to the E. coli growth media. It is anticipated that the addition of this or similar sources of key nutrients will dramatically increase the yield of active recombinant P. furiosus hydrogenase produced in E. coli.
Example 7
Temperature and Oxygen Sensitivity and Electron Donor Specificity of Recombinant Hydrogenase
[0113]Purified recombinant hydrogenase is as stable to incubation at high temperature (90° C.) and as sensitive to oxygen as the native form of the enzyme purified from P. furiosus native biomass. For example, as shown in FIG. 16, the thermal stability of purified recombinant hydrogenase (7.5 mg/ml) and the native hydrogenase (0.4 mg/ml) were analyzed by incubating samples anaerobically under argon in 100 mM EPPS buffer, pH 8.4, containing 2 mM sodium dithionite in sealed 8-ml serum vials in a 90° C. water bath. Samples were analyzed for 80° C. MV-linked hydrogen evolution activity periodically during the incubation. Both enzyme preparations showed an initial activation to over 150% of the initial activity, as originally reported with the native enzyme (Bryant and Adams. 1989. J Biol Chem 264:5070-5079). Moreover, the recombinant enzyme continued to exhibit an activity above 150% of the initial value even after 11 hours at 90° C., while that of native enzyme decreased (FIG. 16). However, such stability is dependent upon the protein concentration and increases as the concentration increases. Given the approximately 19-fold higher protein concentration of the recombinant enzyme (7.5 versus 0.4 mg/ml), it can be concluded that the stabilities of the two forms are comparable.
[0114]FIG. 17 shows the results of incubating the purified recombinant hydrogenase (7.5 mg/ml) and the native hydrogenase (0.4 mg/ml) in 100 mM EPPS buffer, pH 8.4, in 8-ml serum vials at room temperature that were exposed at zero time to 20% oxygen (air). The sensitivities of the two forms to oxygen, a property that is not dependent upon protein concentration, were virtually identical.
[0115]The recombinant hydrogenase, like the native enzyme, is also able to use NADPH as an electron donor for hydrogen production at 80° C. As shown in Table 7, the two forms exhibit between 3 and 12% of the activity with MV as the electron donor when it is replaced by NADPH (1 mM) under the same assay conditions. The activity, oxygen and thermal stability data, summarized in Table 7, indicate that the structural and catalytic integrity of the recombinant hydrogenase is comparable to that of the native enzyme.
TABLE-US-00007 TABLE 7 Subunit Structure and Electron Donor Specificity of Native and Recombinant Forms of Hydrogenase
Enzyme Type | MV-Linked (units/mg) | NADPH-linked (units/mg) | Ratio (%) | Stability at 90° C. (t1/2, hr) | Stability in Air (t1/2, hr)
Native hydrogenase (from P. furiosus biomass) | 109 | 12.7 | 12 | 7 | >12
Recombinant Hydrogenase (αβγδ)(a) | 5.7 | 0.15 | 3 | >12 | 6
Dimeric Recombinant Hydrogenase (αδ)(b) | 0.4 | 0 | -- | ~1 | ~1
Activities were measured using either 1 mM MV or 1 mM NADPH as the electron donor at 80° C. The stability values for the native and recombinant (αβγδ) enzymes are estimates from FIG. 17. The data used to estimate the values for the dimeric form (αδ) are not shown.
(a) The form of the tetrameric recombinant hydrogenase (αβγδ) used in this experiment was obtained after two chromatography steps (see Table 6).
(b) The form of the dimeric recombinant hydrogenase (αδ) used in this experiment was obtained after the cell-free extract was clarified by centrifugation (the S-100 fraction). The dimeric form of the hydrogenase is described below.
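The Ratio (%) column of Table 7 is the NADPH-linked activity expressed as a percentage of the MV-linked activity. A minimal sketch of that calculation, using the Table 7 activities (the names below are illustrative assumptions):

```python
# (MV-linked units/mg, NADPH-linked units/mg), from Table 7.
ACTIVITIES = {
    "native": (109.0, 12.7),
    "recombinant tetramer": (5.7, 0.15),
    "recombinant dimer": (0.4, 0.0),
}


def nadph_ratio_percent(mv_linked, nadph_linked):
    """NADPH-linked activity as a percentage of MV-linked activity."""
    if mv_linked <= 0:
        raise ValueError("MV-linked activity must be positive")
    return 100.0 * nadph_linked / mv_linked


ratios = {name: nadph_ratio_percent(mv, nadph) for name, (mv, nadph) in ACTIVITIES.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}%")
```

The dimeric form yields a ratio of zero because no NADPH-linked activity was detected; Table 7 shows this entry as "--".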
Example 8
Production of a Dimeric Hydrogenase
[0116]The ability to generate the recombinant form of the hydrogenase opens up a complete spectrum of possibilities to produce mutant forms with very different properties from that of the native form. For example, FIG. 18 shows the proposed electron pathway from NADPH through the four subunits of the enzyme and the electron-carrying cofactors (FAD and then multiple [2Fe-2S] and [4Fe-4S] clusters) to the NiFe catalytic site, which catalyzes hydrogen (H2) production. It is assumed that the artificial electron carrier, MV, can donate electrons directly to one or more of the [2Fe-2S] and [4Fe-4S] clusters, bypassing the FAD (see FIG. 18). Consequently, the native heterotetrameric (αβγδ) enzyme produced from 4 genes (PF0891-PF0894) evolves hydrogen from both MV and NADPH (Table 7). However, as shown in FIG. 19, a heterodimeric (αδ) enzyme produced by expression of only PF0893 and PF0894 would lack the proposed NADPH-interacting and FAD-containing γ-subunit (PF0892). This dimeric form would not be expected to evolve hydrogen from NADPH, but may do so from MV (FIG. 19).
[0117]To test this idea and to generate the first mutant form of recombinant P. furiosus hydrogenase, a plasmid, pEA-0893-0894, was constructed that contained the genes for only two of the four hydrogenase subunits, PF0893 and PF0894 (FIG. 20). This was based on the plasmid that contains the four genes that encode all four subunits (pEA-SH1, FIG. 8); however, the P-hya promoter in this plasmid did not include the sequences encoding a his-tag. The dimeric (αδ) recombinant enzyme was produced in E. coli strain MW1001 under the same anaerobic expression conditions that were used to produce the recombinant heterotetrameric (αβγδ) enzyme (see FIG. 10) except that the pEA-SH1 plasmid was replaced by the pEA-0893-0894 plasmid and that the culture was grown in a 1-liter flask rather than an 8-liter carboy. The recombinant cells (1.5 grams wet weight) were harvested by centrifugation and were lysed by resuspending them in 3 ml (per gram wet weight of cells) of anaerobic 50 mM Tris, pH 8.0, containing 0.5 mg/mL lysozyme, 50 μg/mL DNase, 1 mM phenylmethylsulfonyl fluoride, and 2 mM sodium dithionite. Samples were lysed by incubation at room temperature in an anaerobic chamber under an atmosphere of 5% H2/95% Ar for 4 hours. The protein content of the cell-free extract was 8.9 mg/mL as determined by the standard protein assay, and the extract contained 5.2 units of hydrogenase activity measured using MV as the electron donor at 80° C. The specific activity was 0.078 U/mg, which is comparable to that obtained with the tetrameric (αβγδ) recombinant enzyme (Table 6). However, as indicated in Table 7, the dimeric (αδ) recombinant form had no detectable hydrogen production activity using NADPH (1 mM) as the electron donor, as was predicted (FIG. 19). Also, the structural as well as the catalytic integrity of the recombinant dimeric hydrogenase differed from that of both the recombinant and native forms of tetrameric holoenzyme. As shown in Table 7, the dimeric form was much more sensitive to oxygen and was much less stable at 90° C. However, the fact that this mutated form of the enzyme containing only two subunits still had an approximate half-life at 90° C. of 1 hour shows the great advantage of using a hyperthermophilic enzyme as the starting material for any manipulation of enzyme structure. The resulting protein was expected to be considerably less stable than its native counterpart, but the extreme stability of the native enzyme means that an `unstable` form can still retain remarkable stability and activity, relative to conventional enzymes found in organisms growing at conventional temperatures. Moreover, with the demonstration here of an extremely stable dimeric mutant form with catalytic properties, the means to generate a wide variety of mutant forms, for example, with various tags for purification and immobilization, is now available.
[0118]In summary, a series of four compatible vectors has been constructed that will express a functional hydrogenase in E. coli. It was shown that recombinant hydrogenase was produced when cells were switched to anaerobic growth and that the amount of the enzyme produced increased with cell growth until late stationary phase. Recombinant hydrogenase was also produced in recombinant E. coli cells grown to exceedingly high densities (OD˜40). A method for purifying the recombinant hydrogenase to a high level of purity is described, and analysis of the protein components of the recombinant enzyme by a standard mass spectrometry technique established unambiguously that it contained the four hydrogenase subunits encoded by the four cloned genes that were heterologously expressed. It was also demonstrated that the recombinant enzyme has approximately the same molecular weight (˜150 kDa) and metal content (20 Fe:1 Ni) as the native enzyme purified from P. furiosus biomass, that it is similarly stable to high temperature (half-life at 90° C. of ˜12 hr) and sensitive to inactivation by oxygen (half-life of ˜6 hr in air) and that, like the native enzyme, it uses NADPH as an electron donor for hydrogen production at 80° C. The ability to generate mutant or modified forms of the hydrogenase was demonstrated by the production of a heterodimeric form containing two subunits rather than the four subunits of the heterotetrameric enzyme. The dimeric form was still catalytically active at 80° C. with the artificial electron donor MV, but it did not use NADPH as an electron donor. The dimeric form was still very thermostable (half-life at 90° C. of ˜1 hr). This demonstrates the great advantage of using a hyperthermophilic enzyme as the starting material for any manipulation of enzyme structure.
[0119]The complete disclosure of all patents, patent applications, and publications, and electronically available material (including, for instance, nucleotide sequence submissions in, e.g., GenBank and RefSeq, and amino acid sequence submissions in, e.g., SwissProt, PIR, PRF, PDB, and translations from annotated coding regions in GenBank and RefSeq) cited herein are incorporated by reference. In the event that any inconsistency exists between the disclosure of the present application and the disclosure(s) of any document incorporated herein by reference, the disclosure of the present application shall govern. The foregoing detailed description and examples have been given for clarity of understanding only. No unnecessary limitations are to be understood therefrom. The invention is not limited to the exact details shown and described, for variations obvious to one skilled in the art will be included within the invention defined by the claims.
[0120]Unless otherwise indicated, all numbers expressing quantities of components, molecular weights, and so forth used in the specification and claims are to be understood as being modified in all instances by the term "about." Accordingly, unless otherwise indicated to the contrary, the numerical parameters set forth in the specification and claims are approximations that may vary depending upon the desired properties sought to be obtained by the present invention. At the very least, and not as an attempt to limit the doctrine of equivalents to the scope of the claims, each numerical parameter should at least be construed in light of the number of reported significant digits and by applying ordinary rounding techniques.
[0121]Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the invention are approximations, the numerical values set forth in the specific examples are reported as precisely as possible. All numerical values, however, inherently contain a range necessarily resulting from the standard deviation found in their respective testing measurements.
[0122]All headings are for the convenience of the reader and should not be used to limit the meaning of the text that follows the heading, unless so specified.
Sequence CWU
1
5911104DNAPyrococcus furiosus 1gtgaggtatg ttaagttacc caaggaaaac acttacgagt
ttttggaaag acttaaagac 60tgggggaagc tttacgctcc agtaaaaatt tcggacaagt
tctatgactt cagggagatt 120gatgatgtta gaaagataga attccactac aacaggacaa
taatgccacc taagaagttc 180ttcttcaagc cgagggaaaa gctctttgag ttcgacattt
caaaaccaga atacagggag 240gtaatagagg aagttgaacc atttattata tttggagtcc
acgcgtgtga catatatggc 300ctaaagatcc tagacacggt ataccttgat gagttccccg
acaagtacta caaggtgagg 360agagagaagg ggataatcat tggaataagc tgtatgccag
atgaatattg cttctgtaac 420ttaagagaaa cagacttcgc tgatgatggt tttgacttgt
tcttccatga actgcccgat 480ggatggttgg taagggttgg cactccaact gggcacaggc
ttgttgacaa gaacataaag 540ctctttgaag aggtaacgga caaggatatc tgtgcattta
gagattttga aaagaggaga 600cagcaagcat tcaaatacca cgaagactgg ggcaacttga
ggtatcttct cgagttggaa 660atggaacatc caatgtggga tgaggaggca gataagtgct
tggcttgtgg aatatgtaac 720accacatgcc caacgtgtag atgctatgaa gttcaggata
ttgtaaacct agatggagtt 780actggataca gggaaagaag atgggattct tgtcagttca
gaagtcatgg cttagttgct 840gggggccaca acttcaggcc cacaaagaag gatcgcttta
ggaacagata cctctgtaag 900aacgcatata acgaaaagct tggattaagc tactgtgtcg
gttgtggaag gtgtactgca 960ttctgtccag ccaatataag ttttgtaggc aatcttagaa
ggattttagg acttgaggag 1020aacaaatgtc ccccaacggt tagtgaggag attccaaaga
gaggatttgc atattcctct 1080aacattagag gtgatggagt atga
11042367PRTPyrococcus furiosus 2Met Arg Tyr Val Lys
Leu Pro Lys Glu Asn Thr Tyr Glu Phe Leu Glu1 5
10 15Arg Leu Lys Asp Trp Gly Lys Leu Tyr Ala Pro
Val Lys Ile Ser Asp 20 25
30Lys Phe Tyr Asp Phe Arg Glu Ile Asp Asp Val Arg Lys Ile Glu Phe
35 40 45His Tyr Asn Arg Thr Ile Met Pro
Pro Lys Lys Phe Phe Phe Lys Pro 50 55
60Arg Glu Lys Leu Phe Glu Phe Asp Ile Ser Lys Pro Glu Tyr Arg Glu65
70 75 80Val Ile Glu Glu Val
Glu Pro Phe Ile Ile Phe Gly Val His Ala Cys 85
90 95Asp Ile Tyr Gly Leu Lys Ile Leu Asp Thr Val
Tyr Leu Asp Glu Phe 100 105
110Pro Asp Lys Tyr Tyr Lys Val Arg Arg Glu Lys Gly Ile Ile Ile Gly
115 120 125Ile Ser Cys Met Pro Asp Glu
Tyr Cys Phe Cys Asn Leu Arg Glu Thr 130 135
140Asp Phe Ala Asp Asp Gly Phe Asp Leu Phe Phe His Glu Leu Pro
Asp145 150 155 160Gly Trp
Leu Val Arg Val Gly Thr Pro Thr Gly His Arg Leu Val Asp
165 170 175Lys Asn Ile Lys Leu Phe Glu
Glu Val Thr Asp Lys Asp Ile Cys Ala 180 185
190Phe Arg Asp Phe Glu Lys Arg Arg Gln Gln Ala Phe Lys Tyr
His Glu 195 200 205Asp Trp Gly Asn
Leu Arg Tyr Leu Leu Glu Leu Glu Met Glu His Pro 210
215 220Met Trp Asp Glu Glu Ala Asp Lys Cys Leu Ala Cys
Gly Ile Cys Asn225 230 235
240Thr Thr Cys Pro Thr Cys Arg Cys Tyr Glu Val Gln Asp Ile Val Asn
245 250 255Leu Asp Gly Val Thr
Gly Tyr Arg Glu Arg Arg Trp Asp Ser Cys Gln 260
265 270Phe Arg Ser His Gly Leu Val Ala Gly Gly His Asn
Phe Arg Pro Thr 275 280 285Lys Lys
Asp Arg Phe Arg Asn Arg Tyr Leu Cys Lys Asn Ala Tyr Asn 290
295 300Glu Lys Leu Gly Leu Ser Tyr Cys Val Gly Cys
Gly Arg Cys Thr Ala305 310 315
320Phe Cys Pro Ala Asn Ile Ser Phe Val Gly Asn Leu Arg Arg Ile Leu
325 330 335Gly Leu Glu Glu
Asn Lys Cys Pro Pro Thr Val Ser Glu Glu Ile Pro 340
345 350Lys Arg Gly Phe Ala Tyr Ser Ser Asn Ile Arg
Gly Asp Gly Val 355 360
3653879DNAPyrococcus furiosus 3atgatgttgc caaaagagat tatgatgcca
aatgataatc cgtatgccct tcatagagtc 60aaagttctaa aggtttactc cttgacggaa
acggaaaagc ttttcctctt tagatttgag 120gatcccgagt tggcagagaa gtggacgttc
aaacctggac agtttgtcca gctgacgata 180cctggagttg gagaggttcc cataagtata
tgctcttctc caatgaggaa aggattcttt 240gagctctgta taagaaaggc aggaagggtc
acaactgttg tccatagact aaagcctggc 300gatactgttc ttgtgagagg gccttacggt
aatggattcc cagtggatga gtgggaagga 360atggatctac tattaatagc tgctggcctt
ggaactgcac ctcttaggag cgtctttctc 420tatgcaatgg acaacaggtg gaagtatgga
aacattacct tcataaacac cgcacgttat 480gggaaggatc tcctcttcta caaggagctg
gaggcaatga aagacctagc tgaggctgaa 540aacgtgaaaa tcatccagag cgtcactagg
gatccaaact ggccgggcct aaagggtagg 600ccacagcagt tcatcgttga ggccaacaca
aatccaaaga acactgcagt tgcaatctgt 660gggcctccta gaatgtataa gtcagtgttt
gaggccctca tcaactacgg ttatcgccca 720gagaacatct tcgtgacatt ggagagaaga
atgaaatgtg gaatcgggaa gtgcggccac 780tgcaacgtcg gaacgagcac gagctggaag
tacatctgta aagatggacc agtcttcacg 840tacttcgaca tagtttcaac cccaggactg
ctggactga 8794292PRTPyrococcus furiosus 4Met
Met Leu Pro Lys Glu Ile Met Met Pro Asn Asp Asn Pro Tyr Ala1
5 10 15Leu His Arg Val Lys Val Leu
Lys Val Tyr Ser Leu Thr Glu Thr Glu 20 25
30Lys Leu Phe Leu Phe Arg Phe Glu Asp Pro Glu Leu Ala Glu
Lys Trp 35 40 45Thr Phe Lys Pro
Gly Gln Phe Val Gln Leu Thr Ile Pro Gly Val Gly 50 55
60Glu Val Pro Ile Ser Ile Cys Ser Ser Pro Met Arg Lys
Gly Phe Phe65 70 75
80Glu Leu Cys Ile Arg Lys Ala Gly Arg Val Thr Thr Val Val His Arg
85 90 95Leu Lys Pro Gly Asp Thr
Val Leu Val Arg Gly Pro Tyr Gly Asn Gly 100
105 110Phe Pro Val Asp Glu Trp Glu Gly Met Asp Leu Leu
Leu Ile Ala Ala 115 120 125Gly Leu
Gly Thr Ala Pro Leu Arg Ser Val Phe Leu Tyr Ala Met Asp 130
135 140Asn Arg Trp Lys Tyr Gly Asn Ile Thr Phe Ile
Asn Thr Ala Arg Tyr145 150 155
160Gly Lys Asp Leu Leu Phe Tyr Lys Glu Leu Glu Ala Met Lys Asp Leu
165 170 175Ala Glu Ala Glu
Asn Val Lys Ile Ile Gln Ser Val Thr Arg Asp Pro 180
185 190Asn Trp Pro Gly Leu Lys Gly Arg Pro Gln Gln
Phe Ile Val Glu Ala 195 200 205Asn
Thr Asn Pro Lys Asn Thr Ala Val Ala Ile Cys Gly Pro Pro Arg 210
215 220Met Tyr Lys Ser Val Phe Glu Ala Leu Ile
Asn Tyr Gly Tyr Arg Pro225 230 235
240Glu Asn Ile Phe Val Thr Leu Glu Arg Arg Met Lys Cys Gly Ile
Gly 245 250 255Lys Cys Gly
His Cys Asn Val Gly Thr Ser Thr Ser Trp Lys Tyr Ile 260
265 270Cys Lys Asp Gly Pro Val Phe Thr Tyr Phe
Asp Ile Val Ser Thr Pro 275 280
285Gly Leu Leu Asp 290

SEQ ID NO:5 (786 nt, DNA, Pyrococcus furiosus)
atgggaaaag
ttaggattgg attttacgca ttaacctcgt gctacggctg tcaattgcag 60ctagctatga
tggacgagtt attacaactt atcccaaatg ctgaaatagt ttgctggttc 120atgattgata
gagatagcat tgaggatgaa aaggtcgaca tagcttttat agaaggaagc 180gtttcaactg
aggaagaagt tgaactcgtg aaaaaaatta gggagaatgc aaagatcgtc 240gttgcggttg
gagcttgtgc tgttcaagga ggagttcaga gctggagtga aaagccatta 300gaagagctct
ggaagaaggt ttatggagac gcaaaagtca agttccaacc gaagaaggct 360gaaccagttt
caaaatacat aaaagttgac tacaacatct acggttgccc accagagaag 420aaggacttcc
tctacgccct gggaacattc ttgattggtt catggccaga ggatatagat 480tatccagttt
gtctagaatg taggctcaat ggacatccat gtatccttct tgagaaagga 540gaaccctgtc
taggtccagt aacaagggca ggatgtaacg cgagatgtcc aggatttgga 600gttgcgtgta
taggatgcag aggggcaata gggtacgatg tagcttggtt cgactctcta 660gctaaggtgt
tcaaggagaa ggggatgaca aaagaggaga taattgagag aatgaaaatg 720ttcaatggac
atgatgagag ggttgagaaa atggttgaaa aaatattctc aggtggtgaa 780caatga
786

SEQ ID NO:6 (261 aa, PRT, Pyrococcus furiosus)
Met Gly Lys Val Arg Ile Gly Phe Tyr Ala
Leu Thr Ser Cys Tyr Gly1 5 10
15Cys Gln Leu Gln Leu Ala Met Met Asp Glu Leu Leu Gln Leu Ile Pro
20 25 30Asn Ala Glu Ile Val Cys
Trp Phe Met Ile Asp Arg Asp Ser Ile Glu 35 40
45Asp Glu Lys Val Asp Ile Ala Phe Ile Glu Gly Ser Val Ser
Thr Glu 50 55 60Glu Glu Val Glu Leu
Val Lys Lys Ile Arg Glu Asn Ala Lys Ile Val65 70
75 80Val Ala Val Gly Ala Cys Ala Val Gln Gly
Gly Val Gln Ser Trp Ser 85 90
95Glu Lys Pro Leu Glu Glu Leu Trp Lys Lys Val Tyr Gly Asp Ala Lys
100 105 110Val Lys Phe Gln Pro
Lys Lys Ala Glu Pro Val Ser Lys Tyr Ile Lys 115
120 125Val Asp Tyr Asn Ile Tyr Gly Cys Pro Pro Glu Lys
Lys Asp Phe Leu 130 135 140Tyr Ala Leu
Gly Thr Phe Leu Ile Gly Ser Trp Pro Glu Asp Ile Asp145
150 155 160Tyr Pro Val Cys Leu Glu Cys
Arg Leu Asn Gly His Pro Cys Ile Leu 165
170 175Leu Glu Lys Gly Glu Pro Cys Leu Gly Pro Val Thr
Arg Ala Gly Cys 180 185 190Asn
Ala Arg Cys Pro Gly Phe Gly Val Ala Cys Ile Gly Cys Arg Gly 195
200 205Ala Ile Gly Tyr Asp Val Ala Trp Phe
Asp Ser Leu Ala Lys Val Phe 210 215
220Lys Glu Lys Gly Met Thr Lys Glu Glu Ile Ile Glu Arg Met Lys Met225
230 235 240Phe Asn Gly His
Asp Glu Arg Val Glu Lys Met Val Glu Lys Ile Phe 245
250 255Ser Gly Gly Glu Gln
260

SEQ ID NO:7 (1287 nt, DNA, Pyrococcus furiosus)
atgaagaacc tctatcttcc aatcaccatt
gatcatatag caagagttga ggggaagggt 60ggtgtggaga taataattgg ggatgatgga
gtcaaggagg tcaagctaaa cataattgaa 120gggcccagat tctttgaggc cataactatt
gggaagaagc ttgaggaagc tctggccatt 180tacccgagaa tatgctcatt ctgttcagcc
gcccacaagt taaccgcatt agaggctgca 240gaaaaggccg tcggttttgt cccaagggaa
gagatacagg cccttagaga agtactatac 300atcggagaca tgatagagag tcatgccctt
cacctatatc ttctagttct tcccgactac 360aggggctact cgagcccact taagatggtg
aatgaataca agagggagat agagatagcc 420cttaagctga agaaccttgg cacctggatg
atggacattc tagggtcaag agccatacac 480caagaaaatg cggttttggg cggattcgga
aagctccctg agaagagtgt ccttgagaaa 540atgaaagccg agcttaggga agccctacca
cttgccgagt atacttttga gttatttgca 600aagcttgagc agtacagcga agttgaaggg
ccaataacac acttggccgt gaagccgagg 660ggagatgctt atggaattta tggagattac
ataaaggcaa gtgatgggga ggagttccca 720agtgaaaagt acagagatta tataaaggag
ttcgtcgttg aacacagttt tgcaaagcac 780agtcactaca agggcagacc cttcatggtt
ggggctatat ctagagttat taacaatgct 840gacctcctat acggcaaggc caaggagctg
tatgaggcaa acaaagacct attaaaggga 900acaaatccgt ttgcaaataa cttagcccag
gccctcgaaa tagtttactt tatagagagg 960gcaatagatc tgctcgacga ggctctcgcc
aagtggccaa ttaagcccag ggatgaagtt 1020gagataaagg acggctttgg tgtctcaacg
actgaggctc caaggggaat cttagtctat 1080gccctcaaag ttgagaatgg aagggtttct
tatgccgaca taataacacc tacagcattc 1140aacttggcaa tgatggaaga acatgtaaga
atgatggcag aaaagcacta caatgacgat 1200ccagaaaggt taaagatact ggctgagatg
gttgttaggg cttatgatcc atgcatatct 1260tgctcagtcc acgtggttag actttaa
1287

SEQ ID NO:8 (428 aa, PRT, Pyrococcus furiosus)
Met Lys
Asn Leu Tyr Leu Pro Ile Thr Ile Asp His Ile Ala Arg Val1 5
10 15Glu Gly Lys Gly Gly Val Glu Ile
Ile Ile Gly Asp Asp Gly Val Lys 20 25
30Glu Val Lys Leu Asn Ile Ile Glu Gly Pro Arg Phe Phe Glu Ala
Ile 35 40 45Thr Ile Gly Lys Lys
Leu Glu Glu Ala Leu Ala Ile Tyr Pro Arg Ile 50 55
60Cys Ser Phe Cys Ser Ala Ala His Lys Leu Thr Ala Leu Glu
Ala Ala65 70 75 80Glu
Lys Ala Val Gly Phe Val Pro Arg Glu Glu Ile Gln Ala Leu Arg
85 90 95Glu Val Leu Tyr Ile Gly Asp
Met Ile Glu Ser His Ala Leu His Leu 100 105
110Tyr Leu Leu Val Leu Pro Asp Tyr Arg Gly Tyr Ser Ser Pro
Leu Lys 115 120 125Met Val Asn Glu
Tyr Lys Arg Glu Ile Glu Ile Ala Leu Lys Leu Lys 130
135 140Asn Leu Gly Thr Trp Met Met Asp Ile Leu Gly Ser
Arg Ala Ile His145 150 155
160Gln Glu Asn Ala Val Leu Gly Gly Phe Gly Lys Leu Pro Glu Lys Ser
165 170 175Val Leu Glu Lys Met
Lys Ala Glu Leu Arg Glu Ala Leu Pro Leu Ala 180
185 190Glu Tyr Thr Phe Glu Leu Phe Ala Lys Leu Glu Gln
Tyr Ser Glu Val 195 200 205Glu Gly
Pro Ile Thr His Leu Ala Val Lys Pro Arg Gly Asp Ala Tyr 210
215 220Gly Ile Tyr Gly Asp Tyr Ile Lys Ala Ser Asp
Gly Glu Glu Phe Pro225 230 235
240Ser Glu Lys Tyr Arg Asp Tyr Ile Lys Glu Phe Val Val Glu His Ser
245 250 255Phe Ala Lys His
Ser His Tyr Lys Gly Arg Pro Phe Met Val Gly Ala 260
265 270Ile Ser Arg Val Ile Asn Asn Ala Asp Leu Leu
Tyr Gly Lys Ala Lys 275 280 285Glu
Leu Tyr Glu Ala Asn Lys Asp Leu Leu Lys Gly Thr Asn Pro Phe 290
295 300Ala Asn Asn Leu Ala Gln Ala Leu Glu Ile
Val Tyr Phe Ile Glu Arg305 310 315
320Ala Ile Asp Leu Leu Asp Glu Ala Leu Ala Lys Trp Pro Ile Lys
Pro 325 330 335Arg Asp Glu
Val Glu Ile Lys Asp Gly Phe Gly Val Ser Thr Thr Glu 340
345 350Ala Pro Arg Gly Ile Leu Val Tyr Ala Leu
Lys Val Glu Asn Gly Arg 355 360
365Val Ser Tyr Ala Asp Ile Ile Thr Pro Thr Ala Phe Asn Leu Ala Met 370
375 380Met Glu Glu His Val Arg Met Met
Ala Glu Lys His Tyr Asn Asp Asp385 390
395 400Pro Glu Arg Leu Lys Ile Leu Ala Glu Met Val Val
Arg Ala Tyr Asp 405 410
415Pro Cys Ile Ser Cys Ser Val His Val Val Arg Leu 420
425

SEQ ID NO:9 (228 nt, DNA, Pyrococcus furiosus)
atgtgccttg caatcccagg gaaagtggtg
gagattaaag gtaacgttgg aatagtggat 60tttggaggaa tacggagaga ggtaaggtta
gatcttttga gtgatgttaa agttggcgat 120tacgttatag ttcacactgg ctttgctata
gaaaagttag atgagaggag agctagagaa 180attcttgaag cctgggaaga agttttctca
gtaattgggg gtgagtaa 228

SEQ ID NO:10 (75 aa, PRT, Pyrococcus furiosus)
Met
Cys Leu Ala Ile Pro Gly Lys Val Val Glu Ile Lys Gly Asn Val1
5 10 15Gly Ile Val Asp Phe Gly Gly
Ile Arg Arg Glu Val Arg Leu Asp Leu 20 25
30Leu Ser Asp Val Lys Val Gly Asp Tyr Val Ile Val His Thr
Gly Phe 35 40 45Ala Ile Glu Lys
Leu Asp Glu Arg Arg Ala Arg Glu Ile Leu Glu Ala 50 55
60Trp Glu Glu Val Phe Ser Val Ile Gly Gly Glu65
70 75

SEQ ID NO:11 (1104 nt, DNA, Pyrococcus furiosus)
atgcttgaaa
aatttggaga caaagctgta gctcaaaaga ttttagaaaa aattaaagag 60gaagctaaag
ggatagaaga gctacgattt atgcacgttt gtgggactca tgaggacaca 120gtaactagga
gtggaatcag atcacttctt ccagaaaatg taaaaatcat gagtggccca 180ggatgtcccg
tctgtataac ccccgttgag gacatagtga agatgatgga aattatgaaa 240gttgcgagag
aggagaggga agaaattatt ctcactactt ttggtgacat gtatagaatt 300ccaactccaa
taggaagctt tgcagactta aagagtcagg gttacgatgt gaggatagtt 360tactctatat
acgactccta taaaatagcc aaggaaaatc cagataagct tgtagtgcac 420ttttctcctg
ggtttgagac taccgccgct ccaacagctg gaatgcttga gagcattgtg 480gaagaggggc
tagagaactt taagatttat tccgttcata ggttaacccc tcctgcagtt 540gaagctctcc
taaatgcggg gactgttttt cacggtttaa tagatcctgg tcatgtctct 600acaataattg
gggtgaaagg atgggcgtat ctcacagaaa agtttggaat tcctcaagtt 660gtggctggct
ttgagccagt tgatgtttta ctcggaatac ttattctcat taggcttgtg 720aagaggggcg
aagcgaaaat aatcaacgag tataatagag ttgtaaagtg ggaaggaaat 780gtcaaggccc
aagaactgat ttggaagtac tttgaagtta aagatgcaaa gtggagggcc 840ctaggagtaa
ttccaaggag cggattggaa cttaagaaag agtggaagga gctagaaatt 900agaacttatt
acaatcccga ggttccaaag ctcccagatc ttgaaaaagg atgtctctgt 960ggggcagtcc
ttagaggatt agccttaccg acccagtgcc aacactttgg aaagacatgt 1020acaccaagac
atccggtagg tccttgtatg gtttcgtacg aaggaacttg tcacatattt 1080tacaaatatg
gcgccctgat gtag
1104

SEQ ID NO:12 (367 aa, PRT, Pyrococcus furiosus)
Met Leu Glu Lys Phe Gly Asp Lys Ala Val
Ala Gln Lys Ile Leu Glu1 5 10
15Lys Ile Lys Glu Glu Ala Lys Gly Ile Glu Glu Leu Arg Phe Met His
20 25 30Val Cys Gly Thr His Glu
Asp Thr Val Thr Arg Ser Gly Ile Arg Ser 35 40
45Leu Leu Pro Glu Asn Val Lys Ile Met Ser Gly Pro Gly Cys
Pro Val 50 55 60Cys Ile Thr Pro Val
Glu Asp Ile Val Lys Met Met Glu Ile Met Lys65 70
75 80Val Ala Arg Glu Glu Arg Glu Glu Ile Ile
Leu Thr Thr Phe Gly Asp 85 90
95Met Tyr Arg Ile Pro Thr Pro Ile Gly Ser Phe Ala Asp Leu Lys Ser
100 105 110Gln Gly Tyr Asp Val
Arg Ile Val Tyr Ser Ile Tyr Asp Ser Tyr Lys 115
120 125Ile Ala Lys Glu Asn Pro Asp Lys Leu Val Val His
Phe Ser Pro Gly 130 135 140Phe Glu Thr
Thr Ala Ala Pro Thr Ala Gly Met Leu Glu Ser Ile Val145
150 155 160Glu Glu Gly Leu Glu Asn Phe
Lys Ile Tyr Ser Val His Arg Leu Thr 165
170 175Pro Pro Ala Val Glu Ala Leu Leu Asn Ala Gly Thr
Val Phe His Gly 180 185 190Leu
Ile Asp Pro Gly His Val Ser Thr Ile Ile Gly Val Lys Gly Trp 195
200 205Ala Tyr Leu Thr Glu Lys Phe Gly Ile
Pro Gln Val Val Ala Gly Phe 210 215
220Glu Pro Val Asp Val Leu Leu Gly Ile Leu Ile Leu Ile Arg Leu Val225
230 235 240Lys Arg Gly Glu
Ala Lys Ile Ile Asn Glu Tyr Asn Arg Val Val Lys 245
250 255Trp Glu Gly Asn Val Lys Ala Gln Glu Leu
Ile Trp Lys Tyr Phe Glu 260 265
270Val Lys Asp Ala Lys Trp Arg Ala Leu Gly Val Ile Pro Arg Ser Gly
275 280 285Leu Glu Leu Lys Lys Glu Trp
Lys Glu Leu Glu Ile Arg Thr Tyr Tyr 290 295
300Asn Pro Glu Val Pro Lys Leu Pro Asp Leu Glu Lys Gly Cys Leu
Cys305 310 315 320Gly Ala
Val Leu Arg Gly Leu Ala Leu Pro Thr Gln Cys Gln His Phe
325 330 335Gly Lys Thr Cys Thr Pro Arg
His Pro Val Gly Pro Cys Met Val Ser 340 345
350Tyr Glu Gly Thr Cys His Ile Phe Tyr Lys Tyr Gly Ala Leu
Met 355 360 365

SEQ ID NO:13 (2340 nt, DNA, Pyrococcus furiosus)
atgtatctgg gggagagaat gaaagcttat agaattcacg ttcagggaat
agttcaggcc 60gtgggattta ggcccttcgt ttatagaata gctcatgctc acaacttgag
gggatacgtt 120aggaacttag gcgatgctgg agttgaaatt gttgtcgagg gaagggagga
agacatagag 180gcattcatca aggatttata caagaagaaa cccccacttg caaggattga
taaggttgag 240agggaggaaa ttcctcttca gggctttgac agattttaca tagagaaaag
ctcgacggaa 300aagaaggggg agggagattc aataatccct ccggacatag ctatttgtga
ggactgtctt 360agggagttat ttaatccaac tgacaagcgc tacatgtatc ctttcatagt
atgtacaaac 420tgtgggccga ggttcacgat aattgaagat cttccctacg atagggagaa
cacagcgatg 480agagaattcc cgatgtgcga gttctgtagg agtgaatacg aggatcccct
gaataggagg 540tatcatgcag agccggttgc atgtccaact tgtgggccga gctataggct
ttacacgagc 600gatggaaatg agataattgg agaccccctg agaaaggcgg caaaactaat
cgataaggga 660tacatagttg cgataaaggg tataggtgga attcatttgg cctgcgatgc
tacaagagag 720gatgtggtgg ccgagcttag gaagaggatt tttaggcctc agaagccttt
cgccattatg 780gccaaagatt tagaaactgt aaggactttt gcctatattt ctcccgaaga
ggaggaagaa 840ttaacaagct atagaaggcc aatagtggct ttgaagaaga aggagccctt
cccacttccc 900gaaaacctcg ctcctgggct tcacacaatt ggggtaatgc ttccctatgc
tggaacccac 960tacatattat tccactggag caagactcca gtttacgtta tgacttccgc
aaacttccca 1020gggatgccga tgataaagga caatgaagag gcatttgaaa agcttaggga
cgttgctgac 1080tacctcttgc tccacaatag gagaattcca aatagagctg acgatagcgt
tgttcgcttt 1140gtagatggta gaagagctgt tattaggagg agcagaggat ttgttccact
tggaatagag 1200attccatttg agtacaaagg attggcagtt ggtgctgagt taatgaatgc
tttcggagtt 1260gttaagaatg gaaaagttta tccaagtcag tacatagggg atacatcaaa
gattgaagtt 1320ttagagttta tgagggaagc cgtgaggcac ttcttcaaga tattgagagt
tgataactta 1380gatctagttg ttgcagattt gcatccaagc tacaacacaa ctaagctggg
aatggagatc 1440gctgaggaat ttggggcaga attccttcaa gttcaacatc actacgctca
cgtggcctct 1500gtaatggctg agcacaactt ggaggaagtt gttggaattg ctctagatgg
tgttgggtat 1560ggaaccgacg gaaaaacttg gggtggggaa gtaatatatc taagctatga
agatgtggag 1620aggttggccc acatagagta ttatccactc ccaggagggg atttggccag
ctactatccc 1680ttgagggcct taattggaat actcagctta aaccacgact tagaggaagt
tgagaaaatc 1740ataagggagt tctgtccaaa tgcaataaag agcttaaagt atggggaaac
agagtttagg 1800gtaattatga ggcaactcag cagcgggata aacgttgcct atgcctcttc
aacgggaagg 1860gtgcttgatg ccttctcggt acttttgaac gtttcctaca ggaggcacta
tgagggagag 1920cctgcgatga agctggagag ctttgcatac caaggaaaga acgatctaaa
gctcacggct 1980ccaattgaag gtgaggaaat aaaggtttca gagttgtttg aggaagttct
tgagctgatg 2040ggcaaggcca atcctaaaga catagcttac tccgttcact tagccttagc
tagggcattt 2100gctgaagtta gcgtggagaa agctaaggag tttggagcta aaactgtcgt
tttgggtggg 2160ggagtagggt acaatgagct aatagttaag acgataagaa agatagtaga
ggggagaggg 2220ctaaggttct taacaactta cgaagttccc aggggagata atggaattaa
tgtaggccag 2280gccttcctgg gaggattgta cttggaagga tacttaaata gggaagattt
gagcatttag 2340

SEQ ID NO:14 (779 aa, PRT, Pyrococcus furiosus)
Met Tyr Leu Gly Glu Arg
Met Lys Ala Tyr Arg Ile His Val Gln Gly1 5
10 15Ile Val Gln Ala Val Gly Phe Arg Pro Phe Val Tyr
Arg Ile Ala His 20 25 30Ala
His Asn Leu Arg Gly Tyr Val Arg Asn Leu Gly Asp Ala Gly Val 35
40 45Glu Ile Val Val Glu Gly Arg Glu Glu
Asp Ile Glu Ala Phe Ile Lys 50 55
60Asp Leu Tyr Lys Lys Lys Pro Pro Leu Ala Arg Ile Asp Lys Val Glu65
70 75 80Arg Glu Glu Ile Pro
Leu Gln Gly Phe Asp Arg Phe Tyr Ile Glu Lys 85
90 95Ser Ser Thr Glu Lys Lys Gly Glu Gly Asp Ser
Ile Ile Pro Pro Asp 100 105
110Ile Ala Ile Cys Glu Asp Cys Leu Arg Glu Leu Phe Asn Pro Thr Asp
115 120 125Lys Arg Tyr Met Tyr Pro Phe
Ile Val Cys Thr Asn Cys Gly Pro Arg 130 135
140Phe Thr Ile Ile Glu Asp Leu Pro Tyr Asp Arg Glu Asn Thr Ala
Met145 150 155 160Arg Glu
Phe Pro Met Cys Glu Phe Cys Arg Ser Glu Tyr Glu Asp Pro
165 170 175Leu Asn Arg Arg Tyr His Ala
Glu Pro Val Ala Cys Pro Thr Cys Gly 180 185
190Pro Ser Tyr Arg Leu Tyr Thr Ser Asp Gly Asn Glu Ile Ile
Gly Asp 195 200 205Pro Leu Arg Lys
Ala Ala Lys Leu Ile Asp Lys Gly Tyr Ile Val Ala 210
215 220Ile Lys Gly Ile Gly Gly Ile His Leu Ala Cys Asp
Ala Thr Arg Glu225 230 235
240Asp Val Val Ala Glu Leu Arg Lys Arg Ile Phe Arg Pro Gln Lys Pro
245 250 255Phe Ala Ile Met Ala
Lys Asp Leu Glu Thr Val Arg Thr Phe Ala Tyr 260
265 270Ile Ser Pro Glu Glu Glu Glu Glu Leu Thr Ser Tyr
Arg Arg Pro Ile 275 280 285Val Ala
Leu Lys Lys Lys Glu Pro Phe Pro Leu Pro Glu Asn Leu Ala 290
295 300Pro Gly Leu His Thr Ile Gly Val Met Leu Pro
Tyr Ala Gly Thr His305 310 315
320Tyr Ile Leu Phe His Trp Ser Lys Thr Pro Val Tyr Val Met Thr Ser
325 330 335Ala Asn Phe Pro
Gly Met Pro Met Ile Lys Asp Asn Glu Glu Ala Phe 340
345 350Glu Lys Leu Arg Asp Val Ala Asp Tyr Leu Leu
Leu His Asn Arg Arg 355 360 365Ile
Pro Asn Arg Ala Asp Asp Ser Val Val Arg Phe Val Asp Gly Arg 370
375 380Arg Ala Val Ile Arg Arg Ser Arg Gly Phe
Val Pro Leu Gly Ile Glu385 390 395
400Ile Pro Phe Glu Tyr Lys Gly Leu Ala Val Gly Ala Glu Leu Met
Asn 405 410 415Ala Phe Gly
Val Val Lys Asn Gly Lys Val Tyr Pro Ser Gln Tyr Ile 420
425 430Gly Asp Thr Ser Lys Ile Glu Val Leu Glu
Phe Met Arg Glu Ala Val 435 440
445Arg His Phe Phe Lys Ile Leu Arg Val Asp Asn Leu Asp Leu Val Val 450
455 460Ala Asp Leu His Pro Ser Tyr Asn
Thr Thr Lys Leu Gly Met Glu Ile465 470
475 480Ala Glu Glu Phe Gly Ala Glu Phe Leu Gln Val Gln
His His Tyr Ala 485 490
495His Val Ala Ser Val Met Ala Glu His Asn Leu Glu Glu Val Val Gly
500 505 510Ile Ala Leu Asp Gly Val
Gly Tyr Gly Thr Asp Gly Lys Thr Trp Gly 515 520
525Gly Glu Val Ile Tyr Leu Ser Tyr Glu Asp Val Glu Arg Leu
Ala His 530 535 540Ile Glu Tyr Tyr Pro
Leu Pro Gly Gly Asp Leu Ala Ser Tyr Tyr Pro545 550
555 560Leu Arg Ala Leu Ile Gly Ile Leu Ser Leu
Asn His Asp Leu Glu Glu 565 570
575Val Glu Lys Ile Ile Arg Glu Phe Cys Pro Asn Ala Ile Lys Ser Leu
580 585 590Lys Tyr Gly Glu Thr
Glu Phe Arg Val Ile Met Arg Gln Leu Ser Ser 595
600 605Gly Ile Asn Val Ala Tyr Ala Ser Ser Thr Gly Arg
Val Leu Asp Ala 610 615 620Phe Ser Val
Leu Leu Asn Val Ser Tyr Arg Arg His Tyr Glu Gly Glu625
630 635 640Pro Ala Met Lys Leu Glu Ser
Phe Ala Tyr Gln Gly Lys Asn Asp Leu 645
650 655Lys Leu Thr Ala Pro Ile Glu Gly Glu Glu Ile Lys
Val Ser Glu Leu 660 665 670Phe
Glu Glu Val Leu Glu Leu Met Gly Lys Ala Asn Pro Lys Asp Ile 675
680 685Ala Tyr Ser Val His Leu Ala Leu Ala
Arg Ala Phe Ala Glu Val Ser 690 695
700Val Glu Lys Ala Lys Glu Phe Gly Ala Lys Thr Val Val Leu Gly Gly705
710 715 720Gly Val Gly Tyr
Asn Glu Leu Ile Val Lys Thr Ile Arg Lys Ile Val 725
730 735Glu Gly Arg Gly Leu Arg Phe Leu Thr Thr
Tyr Glu Val Pro Arg Gly 740 745
750Asp Asn Gly Ile Asn Val Gly Gln Ala Phe Leu Gly Gly Leu Tyr Leu
755 760 765Glu Gly Tyr Leu Asn Arg Glu
Asp Leu Ser Ile 770 775

SEQ ID NO:15 (972 nt, DNA, Pyrococcus furiosus)
atggaagaac taattaggga ggtaatcctc aagaatttaa cccttaattc tgctggagga
60ataggattag aggagcttga tgacggagct acaatccccc ttggagataa gcatttagtg
120tttacaatag atgggcatac agtaaagccg atattcttcc cagggggaga catcggaagg
180ttggccgtta gcggaactgt aaacgatttg gctgtcatgg gagctcaacc cttggcaatt
240gcaagctcgt tgataatcga ggaagggttt gaagttagtg agctggaaaa gattctgaag
300tcgatggacg aaacagctaa agaggttcca gttccaattg ttactggaga cacaaaagtc
360gttgaagaca ggataggaat cttcgttata acagctggag tgggggtagc tgagaggccg
420ataagcgatg ccggcgcaaa agttggggat gtcgttttag tgagtggaac aattggagac
480cacggaatag cactaatgag ccatagagag gggatctcct ttgagacaga gcttaagagc
540gatgtagctc caatttggga tgtcgtaaag gccgttgcag atgccattgg ttgggagaac
600atccacgcaa tgaaagatcc cacaagagga ggattgagca acgcactaaa cgagatggca
660agaaaggcaa acgttggaat tttggtaaga gaggaggcaa taccaattag gccagaagta
720aaagctgcca gcgaaatgct tggaataagt ccctatgaag ttgcaaacga aggaaaagtt
780gtaatgatag tggcgaagga gtatgcggag gaggcacttg aggccatgaa gaagacagaa
840aagggtaggg atgccgcaat aataggagaa gttattggtg aatacagagg aaaagttatt
900ctggagacgg gaattggtgg aagaagattt ttagagccgc ctctcggtga tcccgttcct
960agagtttgtt ag
972

SEQ ID NO:16 (323 aa, PRT, Pyrococcus furiosus)
Met Glu Glu Leu Ile Arg Glu Val Ile Leu
Lys Asn Leu Thr Leu Asn1 5 10
15Ser Ala Gly Gly Ile Gly Leu Glu Glu Leu Asp Asp Gly Ala Thr Ile
20 25 30Pro Leu Gly Asp Lys His
Leu Val Phe Thr Ile Asp Gly His Thr Val 35 40
45Lys Pro Ile Phe Phe Pro Gly Gly Asp Ile Gly Arg Leu Ala
Val Ser 50 55 60Gly Thr Val Asn Asp
Leu Ala Val Met Gly Ala Gln Pro Leu Ala Ile65 70
75 80Ala Ser Ser Leu Ile Ile Glu Glu Gly Phe
Glu Val Ser Glu Leu Glu 85 90
95Lys Ile Leu Lys Ser Met Asp Glu Thr Ala Lys Glu Val Pro Val Pro
100 105 110Ile Val Thr Gly Asp
Thr Lys Val Val Glu Asp Arg Ile Gly Ile Phe 115
120 125Val Ile Thr Ala Gly Val Gly Val Ala Glu Arg Pro
Ile Ser Asp Ala 130 135 140Gly Ala Lys
Val Gly Asp Val Val Leu Val Ser Gly Thr Ile Gly Asp145
150 155 160His Gly Ile Ala Leu Met Ser
His Arg Glu Gly Ile Ser Phe Glu Thr 165
170 175Glu Leu Lys Ser Asp Val Ala Pro Ile Trp Asp Val
Val Lys Ala Val 180 185 190Ala
Asp Ala Ile Gly Trp Glu Asn Ile His Ala Met Lys Asp Pro Thr 195
200 205Arg Gly Gly Leu Ser Asn Ala Leu Asn
Glu Met Ala Arg Lys Ala Asn 210 215
220Val Gly Ile Leu Val Arg Glu Glu Ala Ile Pro Ile Arg Pro Glu Val225
230 235 240Lys Ala Ala Ser
Glu Met Leu Gly Ile Ser Pro Tyr Glu Val Ala Asn 245
250 255Glu Gly Lys Val Val Met Ile Val Ala Lys
Glu Tyr Ala Glu Glu Ala 260 265
270Leu Glu Ala Met Lys Lys Thr Glu Lys Gly Arg Asp Ala Ala Ile Ile
275 280 285Gly Glu Val Ile Gly Glu Tyr
Arg Gly Lys Val Ile Leu Glu Thr Gly 290 295
300Ile Gly Gly Arg Arg Phe Leu Glu Pro Pro Leu Gly Asp Pro Val
Pro305 310 315 320Arg Val
Cys

SEQ ID NO:17 (420 nt, DNA, Pyrococcus furiosus)
atgcacgaat gggcgttggc agatgcaata
gtaaggactg ttttagatta cgctcaaaag 60gagggtgcaa gtagggtaaa ggccgtcaag
gtagtcctcg gagaactcca agatgttggg 120gaggatatag taaagtttgc catggaagag
ctcttcaggg gaacaatagc ggaaggggca 180gagataatat tcgaagagga agaggccgtc
tttaagtgcc gcaactgcgg gcatgtatgg 240aagcttaagg aagtcaaaga taagttggat
gagaggataa gagaggacat ccactttatt 300ccagaggtcg ttcatgcatt tctatcctgt
ccaaaatgtg gaagccatga ttttgaagtg 360gtgaagggaa ggggagttta catttctgga
ataatgatcg agaaggaggg agaagaatga 420

SEQ ID NO:18 (139 aa, PRT, Pyrococcus furiosus)
Met
His Glu Trp Ala Leu Ala Asp Ala Ile Val Arg Thr Val Leu Asp1
5 10 15Tyr Ala Gln Lys Glu Gly Ala
Ser Arg Val Lys Ala Val Lys Val Val 20 25
30Leu Gly Glu Leu Gln Asp Val Gly Glu Asp Ile Val Lys Phe
Ala Met 35 40 45Glu Glu Leu Phe
Arg Gly Thr Ile Ala Glu Gly Ala Glu Ile Ile Phe 50 55
60Glu Glu Glu Glu Ala Val Phe Lys Cys Arg Asn Cys Gly
His Val Trp65 70 75
80Lys Leu Lys Glu Val Lys Asp Lys Leu Asp Glu Arg Ile Arg Glu Asp
85 90 95Ile His Phe Ile Pro Glu
Val Val His Ala Phe Leu Ser Cys Pro Lys 100
105 110Cys Gly Ser His Asp Phe Glu Val Val Lys Gly Arg
Gly Val Tyr Ile 115 120 125Ser Gly
Ile Met Ile Glu Lys Glu Gly Glu Glu 130
135

SEQ ID NO:19 (726 nt, DNA, Pyrococcus furiosus)
atgatagatc ccagagaact cgcaatttca
gcgaagcttg agggagtaaa aagaataatc 60ccagttgtaa gtgggaaggg aggagtagga
aaatccctaa tctccacaac tcttgcccta 120gttctatcag aacaaaaata caaagttgga
cttctcgact tggatttcca tggagcaagt 180gaccacgtca tcctgggatt tgaacccaaa
gaacttcccg aggaagacaa aggagttatt 240cccccaacgg ttcacggaat aaagttcatg
acaatagcgt attacaccga ggacaggcca 300actcctttaa gaggaaagga gattagcgac
gccctaatag agctactaac aataaccagg 360tgggatgagc tcgacttttt agttgttgac
atgccccctg ggatgggaga tcagttctta 420gacgttttaa agtacttcaa gaggggagaa
ttcttgatag tcgcaactcc gtcaaagctc 480tctcttaatg ttgttaggaa gcttatagag
ttgctaaaag aagagaagca tcagatactt 540ggaatagttg agaatatgaa gctggatgaa
gaggaagatg ttatgagaat tgcccaggaa 600tatgggatta ggtatcttgg aggaatacct
ctgtacaggg atctagagag taaagttgga 660aatgttaatg aacttttagc cacagagttt
gccgagaaaa ttagaggaat agctaaaaag 720atttga
726

SEQ ID NO:20 (241 aa, PRT, Pyrococcus furiosus)
Met
Ile Asp Pro Arg Glu Leu Ala Ile Ser Ala Lys Leu Glu Gly Val1
5 10 15Lys Arg Ile Ile Pro Val Val
Ser Gly Lys Gly Gly Val Gly Lys Ser 20 25
30Leu Ile Ser Thr Thr Leu Ala Leu Val Leu Ser Glu Gln Lys
Tyr Lys 35 40 45Val Gly Leu Leu
Asp Leu Asp Phe His Gly Ala Ser Asp His Val Ile 50 55
60Leu Gly Phe Glu Pro Lys Glu Leu Pro Glu Glu Asp Lys
Gly Val Ile65 70 75
80Pro Pro Thr Val His Gly Ile Lys Phe Met Thr Ile Ala Tyr Tyr Thr
85 90 95Glu Asp Arg Pro Thr Pro
Leu Arg Gly Lys Glu Ile Ser Asp Ala Leu 100
105 110Ile Glu Leu Leu Thr Ile Thr Arg Trp Asp Glu Leu
Asp Phe Leu Val 115 120 125Val Asp
Met Pro Pro Gly Met Gly Asp Gln Phe Leu Asp Val Leu Lys 130
135 140Tyr Phe Lys Arg Gly Glu Phe Leu Ile Val Ala
Thr Pro Ser Lys Leu145 150 155
160Ser Leu Asn Val Val Arg Lys Leu Ile Glu Leu Leu Lys Glu Glu Lys
165 170 175His Gln Ile Leu
Gly Ile Val Glu Asn Met Lys Leu Asp Glu Glu Glu 180
185 190Asp Val Met Arg Ile Ala Gln Glu Tyr Gly Ile
Arg Tyr Leu Gly Gly 195 200 205Ile
Pro Leu Tyr Arg Asp Leu Glu Ser Lys Val Gly Asn Val Asn Glu 210
215 220Leu Leu Ala Thr Glu Phe Ala Glu Lys Ile
Arg Gly Ile Ala Lys Lys225 230 235
240Ile

SEQ ID NO:21 (477 nt, DNA, Pyrococcus furiosus)
atggaagagc tgagagaagc
tctaaaaaat gctaagagaa ttgtaatatg tggaataggg 60aatgacatca ggggagacga
cagcttcggg gtttatattg cagaaaaatt aaagagagtt 120ataaagaagg caaacattct
agtcctcaac tgtggagagg ttccagagaa ctacacaggg 180aagatactaa actttcaccc
tgatttaatc atttttatag acgcagtaaa cttcggagga 240aagcctggag aaataataat
tacagatcca gaaaatactg aaggggccgg agtttccacc 300cacagtcttc ccctcaagtt
tttggccact tatctcaaag ctaatacaaa tgccaagaca 360atcttaatag gatgccagcc
aaagaacatt gggctttttg aagatatgag cgaagaagta 420aaagccgttg cggaagtctt
attaaaattc ctttatgaaa gtcttgagct ttcttag 477

SEQ ID NO:22 (158 aa, PRT, Pyrococcus furiosus)
Met Glu Glu Leu Arg Glu Ala Leu Lys Asn Ala Lys Arg Ile Val
Ile1 5 10 15Cys Gly Ile
Gly Asn Asp Ile Arg Gly Asp Asp Ser Phe Gly Val Tyr 20
25 30Ile Ala Glu Lys Leu Lys Arg Val Ile Lys
Lys Ala Asn Ile Leu Val 35 40
45Leu Asn Cys Gly Glu Val Pro Glu Asn Tyr Thr Gly Lys Ile Leu Asn 50
55 60Phe His Pro Asp Leu Ile Ile Phe Ile
Asp Ala Val Asn Phe Gly Gly65 70 75
80Lys Pro Gly Glu Ile Ile Ile Thr Asp Pro Glu Asn Thr Glu
Gly Ala 85 90 95Gly Val
Ser Thr His Ser Leu Pro Leu Lys Phe Leu Ala Thr Tyr Leu 100
105 110Lys Ala Asn Thr Asn Ala Lys Thr Ile
Leu Ile Gly Cys Gln Pro Lys 115 120
125Asn Ile Gly Leu Phe Glu Asp Met Ser Glu Glu Val Lys Ala Val Ala
130 135 140Glu Val Leu Leu Lys Phe Leu
Tyr Glu Ser Leu Glu Leu Ser145 150
155

SEQ ID NO:23 (777 nt, DNA, Pyrococcus furiosus)
atgaaagtag agaaaggaga tgtcataaga
cttcattaca ctggaaaggt taaagaaact 60ggagaaatct tcgacacaac ttatgaggat
gttgcaaaag aagctagaat atacaatcca 120aacggaatct atgggccagt ccctatagcg
gttggagcgg gacacgtatt gcccggacta 180gacaagagac ttatagggct tgaagttaag
aaaaaatacg tcattgaagt tccacccgaa 240gaaggctttg gattgagaga tccaggaaaa
attaagatta tcccacttgg aaagttcaga 300aaatctggaa taatcccgta ccctgggcta
gaaattgaag ttgaaacaga aaatgggaga 360aaaatgagag gtagggttct tacagttagc
ggaggaagag ttagagtaga cttcaatcat 420ccattagcag gaaagactct cgtatatgaa
gttgaagttg ttgagaaaat tgaagatcca 480atagaaaaga ttaaggcact aatagaacta
agactgccaa tgattgacaa agataaggtt 540attattgaga ttagtgaaaa agatgtaaag
ctaaacttca aagacgttga tattgatcca 600aagacactaa ttttgggcga aattcttctc
gaaagtgact tgaaatttat aggatatgag 660aaagttgaat ttgagccaac cattgaagag
ttattaaagc ccaagtctgc cgaggagcaa 720gagtctccta acgaagaaca gcaagaggag
agtgagtcta aagcggaaga atcttaa 777

SEQ ID NO:24 (258 aa, PRT, Pyrococcus furiosus)
Met
Lys Val Glu Lys Gly Asp Val Ile Arg Leu His Tyr Thr Gly Lys1
5 10 15Val Lys Glu Thr Gly Glu Ile
Phe Asp Thr Thr Tyr Glu Asp Val Ala 20 25
30Lys Glu Ala Arg Ile Tyr Asn Pro Asn Gly Ile Tyr Gly Pro
Val Pro 35 40 45Ile Ala Val Gly
Ala Gly His Val Leu Pro Gly Leu Asp Lys Arg Leu 50 55
60Ile Gly Leu Glu Val Lys Lys Lys Tyr Val Ile Glu Val
Pro Pro Glu65 70 75
80Glu Gly Phe Gly Leu Arg Asp Pro Gly Lys Ile Lys Ile Ile Pro Leu
85 90 95Gly Lys Phe Arg Lys Ser
Gly Ile Ile Pro Tyr Pro Gly Leu Glu Ile 100
105 110Glu Val Glu Thr Glu Asn Gly Arg Lys Met Arg Gly
Arg Val Leu Thr 115 120 125Val Ser
Gly Gly Arg Val Arg Val Asp Phe Asn His Pro Leu Ala Gly 130
135 140Lys Thr Leu Val Tyr Glu Val Glu Val Val Glu
Lys Ile Glu Asp Pro145 150 155
160Ile Glu Lys Ile Lys Ala Leu Ile Glu Leu Arg Leu Pro Met Ile Asp
165 170 175Lys Asp Lys Val
Ile Ile Glu Ile Ser Glu Lys Asp Val Lys Leu Asn 180
185 190Phe Lys Asp Val Asp Ile Asp Pro Lys Thr Leu
Ile Leu Gly Glu Ile 195 200 205Leu
Leu Glu Ser Asp Leu Lys Phe Ile Gly Tyr Glu Lys Val Glu Phe 210
215 220Glu Pro Thr Ile Glu Glu Leu Leu Lys Pro
Lys Ser Ala Glu Glu Gln225 230 235
240Glu Ser Pro Asn Glu Glu Gln Gln Glu Glu Ser Glu Ser Lys Ala
Glu 245 250 255Glu
Ser

SEQ ID NO:25 (386 nt, DNA, Escherichia coli)
ctcgaattcc ttctctttta ctcgtttagc aaccggctaa
acatccccac cgcccggcca 60aaagaaaata ggtccatttt tatcgctaaa agataaatcc
acacagtttg tattgttttg 120tgcaaaagtt tcactacgct ttattaacaa tactttctgg
cgacgtgcgc cagtgcagaa 180ggatgagctt tcgttttcag catctcacgt gaagcgatgg
tttgccttgc tacagggacg 240tcgcttgccg accataagcg cccggtgtcc tgccggtgtc
gcaaggagga gagacgtgcg 300atatgggtca tcaccatcat caccacggct cgatcacaag
tttgtacaaa aaagcaggct 360cagaaaacct gtattttcag ggagga
386

SEQ ID NO:26 (637 nt, DNA, Escherichia coli)
ctcgaattct
gcagcatgtc accatgacac tgtggacagc ggcggacgcg ctgggtcagt 60agcgtcacat
actgttggca tgtttcacac cagcattcgg cctcttgttc ttcgaggtgc 120agtttacaac
cttccgccac gctgccgcgg caaaccagat caaaacaaaa ggcaagagag 180ctggtttcga
cacaagaaaa tgcgccaatt ttgagccaga ccccagttac gcgttttgcg 240ccgtgttttg
cggcctgctg ttcgatcaat tccagtgccc gttggcagag ggttatttcg 300tgcatatcgc
ctcccattaa ctattgccag ctacaagcaa taattgtgcc agtgttgatt 360atccctgcgg
tgaataatgt cgatgatgtc gaaatgacac gtcgacacgg cgacgaaatt 420catctttagc
ttaaaaatct ctttaataac aataaattaa aagttggcac aaaaaatgct 480taaagctggc
atctctgtta aacgggtaac ctgacaatga ctatttggga aataagcgag 540aaagccgatt
acatcgcaca gcggcatcgt cgcctacagg accagtggca catctactgc 600aattcgctgg
ttcaggggag aggaggaata aaaaatg
637

SEQ ID NO:27 (179 nt, DNA, Bacillus megaterium)
gaattctaga atctaatatt ataactaaat
tttctaaaaa aaacattgga atagacattt 60attttgtata tgatgaaata aagttagttt
attggataaa caaactaact ttattaaggt 120agttgatgga taaacttgtt cacttaaatc
aacccgggaa caaggaggaa taaaaaatg 179

SEQ ID NO:28 (813 nt, DNA, Escherichia coli)
ggatccccgt caccctggat gctgtacaat tgacgacgac aagggcccgg gcaaactagt
60aatcagacgc ggtcgttcac ttgttcagca accagatcaa aagccattga ctcagcaagg
120gttgaccgta taattcacgc gattacaccg cattgcggta tcaacgcgcc cttagctcag
180ttggatagag caacgacctt ctaagtcgtg ggccgcaggt tcgaatcctg cagggcgcgc
240cattacaatt caatcagtta cgccttcttt atatcctcca gccatggcct tgaaatggcg
300ttagtcatga aatatagacc gccatcgagt accccttgta cccttaactc ttcctgatac
360gtaaataatg atttggtggc ccttgctgga cttgaaccag cgaccaagcg attatgagtc
420gcctgctcta accactgagc taaagggcct tgagtgtgca ataacaatac ttataaacca
480cgcaataaac atgatgatct agagaatccc gtcgtagcca ccatcttttt ttgcgggagt
540ggcgaaattg gtagacgcac cagatttagg ttctggcgcc gctaggtgtg cgagttcaag
600tctcgcctcc cgcaccattc accagaaagc gttgatcgga tgccctcgag tcgggcagcg
660ttgggtcctg gccacgggtg cgcatgatcg tgctcctgtc gttgaggacc cggctaggct
720ggcggggttg ccttactggt tagcagaatg aatcaccgat acgcgagcga acgtgaagcg
780actgctgctg caaaacgtct gcgacctgag ctc
813

SEQ ID NO:29 (8123 nt, DNA, artificial; expression vector sequence)
ttgtacaaac ttgtgatcga
gccgtggtga tgatggtgat gacccatatc gcacgtctct 60cctccttgcg acaccggcag
gacaccgggc gcttatggtc ggcaagcgac gtccctgtag 120caaggcaaac catcgcttca
cgtgagatgc tgaaaacgaa agctcatcct tctgcactgg 180cgcacgtcgc cagaaagtat
tgttaataaa gcgtagtgaa acttttgcac aaaacaatac 240aaactgtgtg gatttatctt
ttagcgataa aaatggacct atttttcttt tggccgggcg 300gtggggatgt ttagccggtt
gctaaacgag taaaagagaa ggaattcgag ctcgaattcg 360gatcctagag ggaaaccgtt
gtggtctccc tatagtgagt cgtattaatt tcgcgggatc 420gagatctcgg gcagcgttgg
gtcctggcca cgggtgcgca tgatcgtgct cctgtcgttg 480aggacccggc taggctggcg
gggttgcctt actggttagc agaatgaatc accgatacgc 540gagcgaacgt gaagcgactg
ctgctgcaaa acgtctgcga cctgagcaac aacatgaatg 600gtcttcggtt tccgtgtttc
gtaaagtctg gaaacgcgga agtcagcgcc ctgcaccatt 660atgttccgga tctgcatcgc
aggatgctgc tggctaccct gtggaacacc tacatctgta 720ttaacgaagc gctggcattg
accctgagtg atttttctct ggtcccgccg catccatacc 780gccagttgtt taccctcaca
acgttccagt aaccgggcat gttcatcatc agtaacccgt 840atcgtgagca tcctctctcg
tttcatcggt atcattaccc ccatgaacag aaatccccct 900tacacggagg catcagtgac
caaacaggaa aaaaccgccc ttaacatggc ccgctttatc 960agaagccaga cattaacgct
tctggagaaa ctcaacgagc tggacgcgga tgaacaggca 1020gacatctgtg aatcgcttca
cgaccacgct gatgagcttt accgcagctg cctcgcgcgt 1080ttcggtgatg acggtgaaaa
cctctgacac atgcagctcc cggagacggt cacagcttgt 1140ctgtaagcgg atgccgggag
cagacaagcc cgtcagggcg cgtcagcggg tgttggcggg 1200tgtcggggcg cagccatgac
ccagtcacgt agcgatagcg gagtgtatac tggcttaact 1260atgcggcatc agagcagatt
gtactgagag tgcaccatat atgcggtgtg aaataccgca 1320cagatgcgta aggagaaaat
accgcatcag gcgctcttcc gcttcctcgc tcactgactc 1380gctgcgctcg gtcgttcggc
tgcggcgagc ggtatcagct cactcaaagg cggtaatacg 1440gttatccaca gaatcagggg
ataacgcagg aaagaacatg tgagcaaaag gccagcaaaa 1500ggccaggaac cgtaaaaagg
ccgcgttgct ggcgtttttc cataggctcc gcccccctga 1560cgagcatcac aaaaatcgac
gctcaagtca gaggtggcga aacccgacag gactataaag 1620ataccaggcg tttccccctg
gaagctccct cgtgcgctct cctgttccga ccctgccgct 1680taccggatac ctgtccgcct
ttctcccttc gggaagcgtg gcgctttctc atagctcacg 1740ctgtaggtat ctcagttcgg
tgtaggtcgt tcgctccaag ctgggctgtg tgcacgaacc 1800ccccgttcag cccgaccgct
gcgccttatc cggtaactat cgtcttgagt ccaacccggt 1860aagacacgac ttatcgccac
tggcagcagc cactggtaac aggattagca gagcgaggta 1920tgtaggcggt gctacagagt
tcttgaagtg gtggcctaac tacggctaca ctagaaggac 1980agtatttggt atctgcgctc
tgctgaagcc agttaccttc ggaaaaagag ttggtagctc 2040ttgatccggc aaacaaacca
ccgctggtag cggtggtttt tttgtttgca agcagcagat 2100tacgcgcaga aaaaaaggat
ctcaagaaga tcctttgatc ttttctacgg ggtctgacgc 2160tcagtggaac gaaaactcac
gttaagggat tttggtcatg agattatcaa aaaggatctt 2220cacctagatc cttttaaatt
aaaaatgaag ttttaaatca atctaaagta tatatgagta 2280aacttggtct gacagttacc
aatgcttaat cagtgaggca cctatctcag cgatctgtct 2340atttcgttca tccatagttg
cctgactccc cgtcgtgtag ataactacga tacgggaggg 2400cttaccatct ggccccagtg
ctgcaatgat accgcgagac ccacgctcac cggctccaga 2460tttatcagca ataaaccagc
cagccggaag ggccgagcgc agaagtggtc ctgcaacttt 2520atccgcctcc atccagtcta
ttaattgttg ccgggaagct agagtaagta gttcgccagt 2580taatagtttg cgcaacgttg
ttgccattgc tgcaggcatc gtggtgtcac gctcgtcgtt 2640tggtatggct tcattcagct
ccggttccca acgatcaagg cgagttacat gatcccccat 2700gttgtgcaaa aaagcggtta
gctccttcgg tcctccgatc gttgtcagaa gtaagttggc 2760cgcagtgtta tcactcatgg
ttatggcagc actgcataat tctcttactg tcatgccatc 2820cgtaagatgc ttttctgtga
ctggtgagta ctcaaccaag tcattctgag aatagtgtat 2880gcggcgaccg agttgctctt
gcccggcgtc aatacgggat aataccgcgc cacatagcag 2940aactttaaaa gtgctcatca
ttggaaaacg ttcttcgggg cgaaaactct caaggatctt 3000accgctgttg agatccagtt
cgatgtaacc cactcgtgca cccaactgat cttcagcatc 3060ttttactttc accagcgttt
ctgggtgagc aaaaacagga aggcaaaatg ccgcaaaaaa 3120gggaataagg gcgacacgga
aatgttgaat actcatactc ttcctttttc aatattattg 3180aagcatttat cagggttatt
gtctcatgag cggatacata tttgaatgta tttagaaaaa 3240taaacaaata ggggttccgc
gcacatttcc ccgaaaagtg ccacctgaaa ttgtaaacgt 3300taatattttg ttaaaattcg
cgttaaattt ttgttaaatc agctcatttt ttaaccaata 3360ggccgaaatc ggcaaaatcc
cttataaatc aaaagaatag accgagatag ggttgagtgt 3420tgttccagtt tggaacaaga
gtccactatt aaagaacgtg gactccaacg tcaaagggcg 3480aaaaaccgtc tatcagggcg
atggcccact acgtgaacca tcaccctaat caagtttttt 3540ggggtcgagg tgccgtaaag
cactaaatcg gaaccctaaa gggagccccc gatttagagc 3600ttgacgggga aagccggcga
acgtggcgag aaaggaaggg aagaaagcga aaggagcggg 3660cgctagggcg ctggcaagtg
tagcggtcac gctgcgcgta accaccacac ccgccgcgct 3720taatgcgccg ctacagggcg
cgtcccattc gccaatccgg atatagttcc tcctttcagc 3780aaaaaacccc tcaagacccg
tttagaggcc ccaaggggtt atgctagtta ttgctcagcg 3840gtggcagcag ccaactcagc
ttcctttcgg gctttgttag cagccggatc tcagtggtgg 3900tggtggtggt gctcgagtgc
ggccgcaagc ttgtcgacgg agcgcaagct tagcagccgg 3960atctgatctt aattaattat
caccactttg tacaagaaag ctgggtctcc tccctgaaaa 4020tacaggtttt cctaaagtct
aaccacgtgg actgagcaag atatgcatgg atcataagcc 4080ctaacaacca tctcagccag
tatctttaac ctttctggat cgtcattgta gtgcttttct 4140gccatcattc ttacatgttc
ttccatcatt gccaagttga atgctgtagg tgttattatg 4200tcggcataag aaacccttcc
attctcaact ttgagggcat agactaagat tccccttgga 4260gcctcagtcg ttgagacacc
aaagccgtcc tttatctcaa cttcatccct gggcttaatt 4320ggccacttgg cgagagcctc
gtcgagcaga tctattgccc tctctataaa gtaaactatt 4380tcgagggcct gggctaagtt
atttgcaaac ggatttgttc cctttaatag gtctttgttt 4440gcctcataca gctccttggc
cttgccgtat aggaggtcag cattgttaat aactctagat 4500atagccccaa ccatgaaggg
tctgcccttg tagtgactgt gctttgcaaa actgtgttca 4560acgacgaact cctttatata
atctctgtac ttttcacttg ggaactcctc cccatcactt 4620gcctttatgt aatctccata
aattccataa gcatctcccc tcggcttcac ggccaagtgt 4680gttattggcc cttcaacttc
gctgtactgc tcaagctttg caaataactc aaaagtatac 4740tcggcaagtg gtagggcttc
cctaagctcg gctttcattt tctcaaggac actcttctca 4800gggagctttc cgaatccgcc
caaaaccgca ttttcttggt gtatggctct tgaccctaga 4860atgtccatca tccaggtgcc
aaggttcttc agcttaaggg ctatctctat ctccctcttg 4920tattcattca ccatcttaag
tgggctcgag tagcccctgt agtcgggaag aactagaaga 4980tataggtgaa gggcatgact
ctctatcatg tctccgatgt atagtacttc tctaagggcc 5040tgtatctctt cccttgggac
aaaaccgacg gccttttctg cagcctctaa tgcggttaac 5100ttgtgggcgg ctgaacagaa
tgagcatatt ctcgggtaaa tggccagagc ttcctcaagc 5160ttcttcccaa tagttatggc
ctcaaagaat ctgggccctt caattatgtt tagcttgacc 5220tccttgactc catcatcccc
aattattatc tccacaccac ccttcccctc aactcttgct 5280atatgatcaa tggtgattgg
aagatagagg ttcttcattg ttcaccacct gagaatattt 5340tttcaaccat tttctcaacc
ctctcatcat gtccattgaa cattttcatt ctctcaatta 5400tctcctcttt tgtcatcccc
ttctccttga acaccttagc tagagagtcg aaccaagcta 5460catcgtaccc tattgcccct
ctgcatccta tacacgcaac tccaaatcct ggacatctcg 5520cgttacatcc tgcccttgtt
actggaccta gacagggttc tcctttctca agaaggatac 5580atggatgtcc attgagccta
cattctagac aaactggata atctatatcc tctggccatg 5640aaccaatcaa gaatgttccc
agggcgtaga ggaagtcctt cttctctggt gggcaaccgt 5700agatgttgta gtcaactttt
atgtattttg aaactggttc agccttcttc ggttggaact 5760tgacttttgc gtctccataa
accttcttcc agagctcttc taatggcttt tcactccagc 5820tctgaactcc tccttgaaca
gcacaagctc caaccgcaac gacgatcttt gcattctccc 5880taattttttt cacgagttca
acttcttcct cagttgaaac gcttccttct ataaaagcta 5940tgtcgacctt ttcatcctca
atgctatctc tatcaatcat gaaccagcaa actatttcag 6000catttgggat aagttgtaat
aactcgtcca tcatagctag ctgcaattga cagccgtagc 6060acgaggttaa tgcgtaaaat
ccaatcctaa cttttcccat tttcctcacc tcagtccagc 6120agtcctgggg ttgaaactat
gtcgaagtac gtgaagactg gtccatcttt acagatgtac 6180ttccagctcg tgctcgttcc
gacgttgcag tggccgcact tcccgattcc acatttcatt 6240cttctctcca atgtcacgaa
gatgttctct gggcgataac cgtagttgat gagggcctca 6300aacactgact tatacattct
aggaggccca cagattgcaa ctgcagtgtt ctttggattt 6360gtgttggcct caacgatgaa
ctgctgtggc ctacccttta ggcccggcca gtttggatcc 6420ctagtgacgc tctggatgat
tttcacgttt tcagcctcag ctaggtcttt cattgcctcc 6480agctccttgt agaagaggag
atccttccca taacgtgcgg tgtttatgaa ggtaatgttt 6540ccatacttcc acctgttgtc
cattgcatag agaaagacgc tcctaagagg tgcagttcca 6600aggccagcag ctattaatag
tagatccatt ccttcccact catccactgg gaatccatta 6660ccgtaaggcc ctctcacaag
aacagtatcg ccaggcttta gtctatggac aacagttgtg 6720acccttcctg cctttcttat
acagagctca aagaatcctt tcctcattgg agaagagcat 6780atacttatgg gaacctctcc
aactccaggt atcgtcagct ggacaaactg tccaggtttg 6840aacgtccact tctctgccaa
ctcgggatcc tcaaatctaa agaggaaaag cttttccgtt 6900tccgtcaagg agtaaacctt
tagaactttg actctatgaa gggcatacgg attatcattt 6960ggcatcataa tctcttttgg
caacatcata ctccatcacc tctaatgtta gaggaatatg 7020caaatcctct ctttggaatc
tcctcactaa ccgttggggg acatttgttc tcctcaagtc 7080ctaaaatcct tctaagattg
cctacaaaac ttatattggc tggacagaat gcagtacacc 7140ttccacaacc gacacagtag
cttaatccaa gcttttcgtt atatgcgttc ttacagaggt 7200atctgttcct aaagcgatcc
ttctttgtgg gcctgaagtt gtggccccca gcaactaagc 7260catgacttct gaactgacaa
gaatcccatc ttctttccct gtatccagta actccatcta 7320ggtttacaat atcctgaact
tcatagcatc tacacgttgg gcatgtggtg ttacatattc 7380cacaagccaa gcacttatct
gcctcctcat cccacattgg atgttccatt tccaactcga 7440gaagatacct caagttgccc
cagtcttcgt ggtatttgaa tgcttgctgt ctcctctttt 7500caaaatctct aaatgcacag
atatccttgt ccgttacctc ttcaaagagc tttatgttct 7560tgtcaacaag cctgtgccca
gttggagtgc caacccttac caaccatcca tcgggcagtt 7620catggaagaa caagtcaaaa
ccatcatcag cgaagtctgt ttctcttaag ttacagaagc 7680aatattcatc tggcatacag
cttattccaa tgattatccc cttctctctc ctcaccttgt 7740agtacttgtc ggggaactca
tcaaggtata ccgtgtctag gatctttagg ccatatatgt 7800cacacgcgtg gactccaaat
ataataaatg gttcaacttc ctctattacc tccctgtatt 7860ctggttttga aatgtcgaac
tcaaagagct tttccctcgg cttgaagaag aacttcttag 7920gtggcattat tgtcctgttg
tagtggaatt ctatctttct aacatcatca atctccctga 7980agtcatagaa cttgtccgaa
atttttactg gagcgtaaag cttcccccag tctttaagtc 8040tttccaaaaa ctcgtaagtg
ttttccttgg gtaacttaac ataccttcct ccctgaaaat 8100acaggttttc tgagcctgct
ttt
8123
SEQ ID NO:30 (length: 7025; type: DNA; organism: artificial; description: expression vector sequence)
ttgtacaaag tggttgatga
gtccggatcc caattgggag ctcgtgtaca cggcgcgcct 60gcaggtcgac aagcttgcgg
ccgcactcga gtctggtaaa gaaaccgctg ctgcgaaatt 120tgaacgccag cacatggact
cgtctactag cgcagcttaa ttaacctagg ctgctgccac 180cgctgagcaa taactagcat
aaccccttgg ggcctctaaa cgggtcttga ggggtttttt 240gctgaaacct caggcatttg
agaagcacac ggtcacactg cttccggtag tcaataaacc 300ggtaaaccag caatagacat
aagcggctat ttaacgaccc tgccctgaac cgacgaccgg 360gtcatcgtgg ccggatcttg
cggcccctcg gcttgaacga attgttagac attatttgcc 420gactaccttg gtgatctcgc
ctttcacgta gtggacaaat tcttccaact gatctgcgcg 480cgaggccaag cgatcttctt
cttgtccaag ataagcctgt ctagcttcaa gtatgacggg 540ctgatactgg gccggcaggc
gctccattgc ccagtcggca gcgacatcct tcggcgcgat 600tttgccggtt actgcgctgt
accaaatgcg ggacaacgta agcactacat ttcgctcatc 660gccagcccag tcgggcggcg
agttccatag cgttaaggtt tcatttagcg cctcaaatag 720atcctgttca ggaaccggat
caaagagttc ctccgccgct ggacctacca aggcaacgct 780atgttctctt gcttttgtca
gcaagatagc cagatcaatg tcgatcgtgg ctggctcgaa 840gatacctgca agaatgtcat
tgcgctgcca ttctccaaat tgcagttcgc gcttagctgg 900ataacgccac ggaatgatgt
cgtcgtgcac aacaatggtg acttctacag cgcggagaat 960ctcgctctct ccaggggaag
ccgaagtttc caaaaggtcg ttgatcaaag ctcgccgcgt 1020tgtttcatca agccttacgg
tcaccgtaac cagcaaatca atatcactgt gtggcttcag 1080gccgccatcc actgcggagc
cgtacaaatg tacggccagc aacgtcggtt cgagatggcg 1140ctcgatgacg ccaactacct
ctgatagttg agtcgatact tcggcgatca ccgcttccct 1200catactcttc ctttttcaat
attattgaag catttatcag ggttattgtc tcatgagcgg 1260atacatattt gaatgtattt
agaaaaataa acaaatagct agctcactcg gtcgctacgc 1320tccgggcgtg agactgcggc
gggcgctgcg gacacataca aagttaccca cagattccgt 1380ggataagcag gggactaaca
tgtgaggcaa aacagcaggg ccgcgccggt ggcgtttttc 1440cataggctcc gccctcctgc
cagagttcac ataaacagac gcttttccgg tgcatctgtg 1500ggagccgtga ggctcaacca
tgaatctgac agtacgggcg aaacccgaca ggacttaaag 1560atccccaccg ttccggcggg
tcgctccctc ttgcgctctc ctgttccgac cctgccgttt 1620accggatacc tgttccgcct
ttctccctta cgggaagtgt ggcgctttct catagctcac 1680acactggtat ctcggctcgg
tgtaggtcgt tcgctccaag ctgggctgta agcaagaact 1740ccccgttcag cccgactgct
gcgccttatc cggtaactgt tcacttgagt ccaacccgga 1800aaagcacggt aaaacgccac
tggcagcagc cattggtaac tgggagttcg cagaggattt 1860gtttagctaa acacgcggtt
gctcttgaag tgtgcgccaa agtccggcta cactggaagg 1920acagatttgg ttgctgtgct
ctgcgaaagc cagttaccac ggttaagcag ttccccaact 1980gacttaacct tcgatcaaac
cacctcccca ggtggttttt tcgtttacag ggcaaaagat 2040tacgcgcaga aaaaaaggat
ctcaagaaga tcctttgatc ttttctactg aaccgctcta 2100gatttcagtg caatttatct
cttcaaatgt agcacctgaa gtcagcccca tacgatataa 2160gttgtaattc tcatgttagt
catgccccgc gcccaccgga aggagctgac tgggttgaag 2220gctctcaagg gcatcggtcg
agatcccggt gcctaatgag tgagctaact tacattaatt 2280gcgttgcgct cactgcccgc
tttccagtcg ggaaacctgt cgtgccagct gcattaatga 2340atcggccaac gcgcggggag
aggcggtttg cgtattgggc gccagggtgg tttttctttt 2400caccagtgag acgggcaaca
gctgattgcc cttcaccgcc tggccctgag agagttgcag 2460caagcggtcc acgctggttt
gccccagcag gcgaaaatcc tgtttgatgg tggttaacgg 2520cgggatataa catgagctgt
cttcggtatc gtcgtatccc actaccgaga tgtccgcacc 2580aacgcgcagc ccggactcgg
taatggcgcg cattgcgccc agcgccatct gatcgttggc 2640aaccagcatc gcagtgggaa
cgatgccctc attcagcatt tgcatggttt gttgaaaacc 2700ggacatggca ctccagtcgc
cttcccgttc cgctatcggc tgaatttgat tgcgagtgag 2760atatttatgc cagccagcca
gacgcagacg cgccgagaca gaacttaatg ggcccgctaa 2820cagcgcgatt tgctggtgac
ccaatgcgac cagatgctcc acgcccagtc gcgtaccgtc 2880ttcatgggag aaaataatac
tgttgatggg tgtctggtca gagacatcaa gaaataacgc 2940cggaacatta gtgcaggcag
cttccacagc aatggcatcc tggtcatcca gcggatagtt 3000aatgatcagc ccactgacgc
gttgcgcgag aagattgtgc accgccgctt tacaggcttc 3060gacgccgctt cgttctacca
tcgacaccac cacgctggca cccagttgat cggcgcgaga 3120tttaatcgcc gcgacaattt
gcgacggcgc gtgcagggcc agactggagg tggcaacgcc 3180aatcagcaac gactgtttgc
ccgccagttg ttgtgccacg cggttgggaa tgtaattcag 3240ctccgccatc gccgcttcca
ctttttcccg cgttttcgca gaaacgtggc tggcctggtt 3300caccacgcgg gaaacggtct
gataagagac accggcatac tctgcgacat cgtataacgt 3360tactggtttc acattcacca
ccctgaattg actctcttcc gggcgctatc atgccatacc 3420gcgaaaggtt ttgcgccatt
cgatggtgtc cgggatctcg acgctctccc ttatgcgact 3480cctgcattag gaaattaata
cgactcacta taggggaatt gtgagcggat aacaattccc 3540ctgtagaaat aattttgttt
aactttaata aggagatata ccatggcaca tcaccaccac 3600catcacgtgg gtaccggttc
gaatgatctc gaattccttc tcttttactc gtttagcaac 3660cggctaaaca tccccaccgc
ccggccaaaa gaaaaatagg tccattttta tcgctaaaag 3720ataaatccac acagtttgta
ttgttttgtg caaaagtttc actacgcttt attaacaata 3780ctttctggcg acgtgcgcca
gtgcagaagg atgagctttc gttttcagca tctcacgtga 3840agcgatggtt tgccttgcta
cagggacgtc gcttgccgac cataagcgcc cggtgtcctg 3900ccggtgtcgc aaggaggaga
gacgtgcgat atgggtcatc accatcatca ccacatcgac 3960gacaaatcaa caagtttgta
caaaaaagca ggctcagaaa acctgtattt tcagggagga 4020tgccttgcaa tcccagggaa
agtggtggag attaaaggta acgttggaat agtggatttt 4080ggaggaatac ggagagaggt
aaggttagat cttttgagtg atgttaaagt tggcgattac 4140gttatagttc acactggctt
tgctatagaa aagttagatg agaggagagc tagagaaatt 4200cttgaagcct gggaagaagt
tttctcagta attgggggtg agtaaatgct tgaaaaattt 4260ggagacaaag ctgtagctca
aaagatttta gaaaaaatta aagaggaagc taaagggata 4320gaagagctac gatttatgca
cgtttgtggg actcatgagg acacagtaac taggagtgga 4380atcagatcac ttcttccaga
aaatgtaaaa atcatgagtg gcccaggatg tcccgtctgt 4440ataacccccg ttgaggacat
agtgaagatg atggaaatta tgaaagttgc gagagaggag 4500agggaagaaa ttattctcac
tacttttggt gacatgtata gaattccaac tccaatagga 4560agctttgcag acttaaagag
tcagggttac gatgtgagga tagtttactc tatatacgac 4620tcctataaaa tagccaagga
aaatccagat aagcttgtag tgcacttttc tcctgggttt 4680gagactaccg ccgctccaac
agctggaatg cttgagagca ttgtggaaga ggggctagag 4740aactttaaga tttattccgt
tcataggtta acccctcctg cagttgaagc tctcctaaat 4800gcggggactg tttttcacgg
tttaatagat cctggtcatg tctctacaat aattggggtg 4860aaaggatggg cgtatctcac
agaaaagttt ggaattcctc aagttgtggc tggctttgag 4920ccagttgatg ttttactcgg
aatacttatt ctcattaggc ttgtgaagag gggcgaagcg 4980aaaataatca acgagtataa
tagagttgta aagtgggaag gaaatgtcaa ggcccaagaa 5040ctgatttgga agtactttga
agttaaagat gcaaagtgga gggccctagg agtaattcca 5100aggagcggat tggaacttaa
gaaagagtgg aaggagctag aaattagaac ttattacaat 5160cccgaggttc caaagctccc
agatcttgaa aaaggatgtc tctgtggggc agtccttaga 5220ggattagcct taccgaccca
gtgccaacac tttggaaaga catgtacacc aagacatccg 5280gtaggtcctt gtatggtttc
gtacgaagga acttgtcaca tattttacaa atatggcgcc 5340ctgatgtagg aggtggaaaa
tgcacgaatg ggcgttggca gatgcaatag taaggactgt 5400tttagattac gctcaaaagg
agggtgcaag tagggtaaag gccgtcaagg tagtcctcgg 5460agaactccaa gatgttgggg
aggatatagt aaagtttgcc atggaagagc tcttcagggg 5520aacaatagcg gaaggggcag
agataatatt cgaagaggaa gaggccgtct ttaagtgccg 5580caactgcggg catgtatgga
agcttaagga agtcaaagat aagttggatg agaggataag 5640agaggacatc cactttattc
cagaggtcgt tcatgcattt ctatcctgtc caaaatgtgg 5700aagccatgat tttgaagtgg
tgaagggaag gggagtttac atttctggaa taatgatcga 5760gaaggaggga gaagaatgat
agatcccaga gaactcgcaa tttcagcgaa gcttgaggga 5820gtaaaaagaa taatcccagt
tgtaagtggg aagggaggag taggaaaatc cctaatctcc 5880acaactcttg ccctagttct
atcagaacaa aaatacaaag ttggacttct cgacttggat 5940ttccatgagc aagtgaccac
gtcatcctgg gatttgaacc caaagaactt cccgaggaag 6000acaaaggagt tattccccca
acggttcacg gaataaagtt catgacaata gcgtattaca 6060ccgaggacag gccaactcct
ttaagaggaa aggagattag cgacgcccta atagagctac 6120taacaataac caggtgggat
gagctcgact ttttagttgt tgacatgccc cctgggatgg 6180gagatcagtt cttagacgtt
ttaaagtact tcaagagggg agaattcttg atagtcgcaa 6240ctccgtcaaa gctctctctt
aatgttgtta ggaagcttat agagttgcta aaagaagaga 6300agcatcagat acttggaata
gttgagaata tgaagctgga tgaagaggaa gatgttatga 6360gaattgccca ggaatatggg
attaggtatc ttggaggaat acctctgtac agggatctag 6420agagtaaagt tggaaatgtt
aatgaacttt tagccacaga gtttgccgag aaaattagag 6480gaatagctaa aaagatttga
ctggtgcaag ctatggaaga gctgagagaa gctctaaaaa 6540atgctaagag aattgtaata
tgtggaatag ggaatgacat caggggagac gacagcttcg 6600gggtttatat tgcagaaaaa
ttaaagagag ttataaagaa ggcaaacatt ctagtcctca 6660actgtggaga ggttccagag
aactacacag ggaagatact aaactttcac cctgatttaa 6720tcatttttat agacgcagta
aacttcggag gaaagcctgg agaaataata attacagatc 6780cagaaaatac tgaaggggcc
ggagtttcca cccacagtct tcccctcaag tttttggcca 6840cttatctcaa agctaataca
aatgccaaga caatcttaat aggatgccag ccaaagaaca 6900ttgggctttt tgaagatatg
agcgaagaag taaaagccgt tgcggaagtc ttattaaaat 6960tcctttatga aagtcttgag
ctttcttagg aaaacctgta ttttcaggga ggagacccag 7020ctttc
7025
SEQ ID NO:31 (length: 7623; type: DNA; organism: artificial; description: expression vector sequence)
ttgtacaaag tggtgataat
taattaagat cagatccggc tgctaagctt gcgctcggcg 60cgcctgcagg tcgacaagct
tgcggccgca taatgcttaa gtcgaacaga aagtaatcgt 120attgtacacg gccgcataat
cgaaattaat acgactcact ataggggaat tgtgagcgga 180taacaattcc ccatcttagt
atattagtta agtataagaa ggagatatac atatggcaga 240tctcaattgg atatcggccg
gccacgcgat cgctgacgtc ggtaccctcg agtctggtaa 300agaaaccgct gctgcgaaat
ttgaacgcca gcacatggac tcgtctacta gcgcagctta 360attaacctag gctgctgcca
ccgctgagca ataactagca taaccccttg gggcctctaa 420acgggtcttg aggggttttt
tgctgaaacc tcaggcattt gagaagcaca cggtcacact 480gcttccggta gtcaataaac
cggtaaacca gcaatagaca taagcggcta tttaacgacc 540ctgccctgaa ccgacgacaa
gctgacgacc gggtctccgc aagtggcact tttcggggaa 600atgtgcgcgg aacccctatt
tgtttatttt tctaaataca ttcaaatatg tatccgctca 660tgaattaatt cttagaaaaa
ctcatcgagc atcaaatgaa actgcaattt attcatatca 720ggattatcaa taccatattt
ttgaaaaagc cgtttctgta atgaaggaga aaactcaccg 780aggcagttcc ataggatggc
aagatcctgg tatcggtctg cgattccgac tcgtccaaca 840tcaatacaac ctattaattt
cccctcgtca aaaataaggt tatcaagtga gaaatcacca 900tgagtgacga ctgaatccgg
tgagaatggc aaaagtttat gcatttcttt ccagacttgt 960tcaacaggcc agccattacg
ctcgtcatca aaatcactcg catcaaccaa accgttattc 1020attcgtgatt gcgcctgagc
gagacgaaat acgcggtcgc tgttaaaagg acaattacaa 1080acaggaatcg aatgcaaccg
gcgcaggaac actgccagcg catcaacaat attttcacct 1140gaatcaggat attcttctaa
tacctggaat gctgttttcc cggggatcgc agtggtgagt 1200aaccatgcat catcaggagt
acggataaaa tgcttgatgg tcggaagagg cataaattcc 1260gtcagccagt ttagtctgac
catctcatct gtaacatcat tggcaacgct acctttgcca 1320tgtttcagaa acaactctgg
cgcatcgggc ttcccataca atcgatagat tgtcgcacct 1380gattgcccga cattatcgcg
agcccattta tacccatata aatcagcatc catgttggaa 1440tttaatcgcg gcctagagca
agacgtttcc cgttgaatat ggctcatact cttccttttc 1500aatattattg aagcatttat
cagggttatt gtctcatgag cggatacata tttgaatgta 1560tttagaaaaa taaacaaata
ggcatgcagc gctcttccgc ttcctcgctc actgactcgc 1620tacgctcggt cgttcgactg
cggcgagcgg tgtcagctca ctcaaaagcg gtaatacggt 1680tatccacaga atcaggggat
aaagccggaa agaacatgtg agcaaaaagc aaagcaccgg 1740aagaagccaa cgccgcaggc
gtttttccat aggctccgcc cccctgacga gcatcacaaa 1800aatcgacgct caagccagag
gtggcgaaac ccgacaggac tataaagata ccaggcgttt 1860ccccctggaa gctccctcgt
gcgctctcct gttccgaccc tgccgcttac cggatacctg 1920tccgcctttc tcccttcggg
aagcgtggcg ctttctcata gctcacgctg ttggtatctc 1980agttcggtgt aggtcgttcg
ctccaagctg ggctgtgtgc acgaaccccc cgttcagccc 2040gaccgctgcg ccttatccgg
taactatcgt cttgagtcca acccggtaag acacgactta 2100tcgccactgg cagcagccat
tggtaactga tttagaggac tttgtcttga agttatgcac 2160ctgttaaggc taaactgaaa
gaacagattt tggtgagtgc ggtcctccaa cccacttacc 2220ttggttcaaa gagttggtag
ctcagcgaac cttgagaaaa ccaccgttgg tagcggtggt 2280ttttctttat ttatgagatg
atgaatcaat cggtctatca agtcaacgaa cagctattcc 2340gttactctag atttcagtgc
aatttatctc ttcaaatgta gcacctgaag tcagccccat 2400acgatataag ttgtaattct
catgttagtc atgccccgcg cccaccggaa ggagctgact 2460gggttgaagg ctctcaaggg
catcggtcga gatcccggtg cctaatgagt gagctaactt 2520acattaattg cgttgcgctc
actgcccgct ttccagtcgg gaaacctgtc gtgccagctg 2580cattaatgaa tcggccaacg
cgcggggaga ggcggtttgc gtattgggcg ccagggtggt 2640ttttcttttc accagtgaga
cgggcaacag ctgattgccc ttcaccgcct ggccctgaga 2700gagttgcagc aagcggtcca
cgctggtttg ccccagcagg cgaaaatcct gtttgatggt 2760ggttaacggc gggatataac
atgagctgtc ttcggtatcg tcgtatccca ctaccgagat 2820gtccgcacca acgcgcagcc
cggactcggt aatggcgcgc attgcgccca gcgccatctg 2880atcgttggca accagcatcg
cagtgggaac gatgccctca ttcagcattt gcatggtttg 2940ttgaaaaccg gacatggcac
tccagtcgcc ttcccgttcc gctatcggct gaatttgatt 3000gcgagtgaga tatttatgcc
agccagccag acgcagacgc gccgagacag aacttaatgg 3060gcccgctaac agcgcgattt
gctggtgacc caatgcgacc agatgctcca cgcccagtcg 3120cgtaccgtct tcatgggaga
aaataatact gttgatgggt gtctggtcag agacatcaag 3180aaataacgcc ggaacattag
tgcaggcagc ttccacagca atggcatcct ggtcatccag 3240cggatagtta atgatcagcc
cactgacgcg ttgcgcgaga agattgtgca ccgccgcttt 3300acaggcttcg acgccgcttc
gttctaccat cgacaccacc acgctggcac ccagttgatc 3360ggcgcgagat ttaatcgccg
cgacaatttg cgacggcgcg tgcagggcca gactggaggt 3420ggcaacgcca atcagcaacg
actgtttgcc cgccagttgt tgtgccacgc ggttgggaat 3480gtaattcagc tccgccatcg
ccgcttccac tttttcccgc gttttcgcag aaacgtggct 3540ggcctggttc accacgcggg
aaacggtctg ataagagaca ccggcatact ctgcgacatc 3600gtataacgtt actggtttca
cattcaccac cctgaattga ctctcttccg ggcgctatca 3660tgccataccg cgaaaggttt
tgcgccattc gatggtgtcc gggatctcga cgctctccct 3720tatgcgactc ctgcattagg
aaattaatac gactcactat aggggaattg tgagcggata 3780acaattcccc tgtagaaata
attttgttta actttaataa ggagatatac catgggcagc 3840agccatcacc atcatcacca
cagccaggat ccgaattcga gctcgaattc cttctctttt 3900actcgtttag caaccggcta
aacatcccca ccgcccggcc aaaagaaaaa taggtccatt 3960tttatcgcta aaagataaat
ccacacagtt tgtattgttt tgtgcaaaag tttcactacg 4020ctttattaac aatactttct
ggcgacgtgc gccagtgcag aaggatgagc tttcgttttc 4080agcatctcac gtgaagcgat
ggtttgcctt gctacaggga cgtcgcttgc cgaccataag 4140cgcccggtgt cctgccggtg
tcgcaaggag gagagacgtg cgatatgggt catcaccatc 4200atcaccacgg ctcgatcaca
agtttgtaca aaaaagcagg ctcagaaaac ctgtattttc 4260agggaggaga agaactaatt
agggaggtaa tcctcaagaa tttaaccctt aattctgctg 4320gaggaatagg attagaggag
cttgatgacg gagctacaat cccccttgga gataagcatt 4380tagtgtttac aatagatggg
catacagtaa agccgatatt cttcccaggg ggagacatcg 4440gaaggttggc cgttagcgga
actgtaaacg atttggctgt catgggagct caacccttgg 4500caattgcaag ctcgttgata
atcgaggaag ggtttgaagt tagtgagctg gaaaagattc 4560tgaagtcgat ggacgaaaca
gctaaagagg ttccagttcc aattgttact ggagacacaa 4620aagtcgttga agacaggata
ggaatcttcg ttataacagc tggagtgggg gtagctgaga 4680ggccgataag cgatgccggc
gcaaaagttg gggatgtcgt tttagtgagt ggaacaattg 4740gagaccacgg aatagcacta
atgagccata gagaggggat ctcctttgag acagagctta 4800agagcgatgt agctccaatt
tgggatgtcg taaaggccgt tgcagatgcc attggttggg 4860agaacatcca cgcaatgaaa
gatcccacaa gaggaggatt gagcaacgca ctaaacgaga 4920tggcaagaaa ggcaaacgtt
ggaattttgg taagagagga ggcaatacca attaggccag 4980aagtaaaagc tgccagcgaa
atgcttggaa taagtcccta tgaagttgca aacgaaggaa 5040aagttgtaat gatagtggcg
aaggagtatg cggaggaggc acttgaggcc atgaagaaga 5100cagaaaaggg tagggatgcc
gcaataatag gagaagttat tggtgaatac agaggaaaag 5160ttattctgga gacgggaatt
ggtggaagaa gatttttaga gccgcctctc ggtgatcccg 5220ttcctagagt ttgttaggag
gtggaaaatg tatctggggg agagaatgaa agcttataga 5280attcacgttc agggaatagt
tcaggccgtg ggatttaggc ccttcgttta tagaatagct 5340catgctcaca acttgagggg
atacgttagg aacttaggcg atgctggagt tgaaattgtt 5400gtcgagggaa gggaggaaga
catagaggca ttcatcaagg atttatacaa gaagaaaccc 5460ccacttgcaa ggattgataa
ggttgagagg gaggaaattc ctcttcaggg ctttgacaga 5520ttttacatag agaaaagctc
gacggaaaag aagggggagg gagattcaat aatccctccg 5580gacatagcta tttgtgagga
ctgtcttagg gagttattta atccaactga caagcgctac 5640atgtatcctt tcatagtatg
tacaaactgt gggccgaggt tcacgataat tgaagatctt 5700ccctacgata gggagaacac
agcgatgaga gaattcccga tgtgcgagtt ctgtaggagt 5760gaatacgagg atcccctgaa
taggaggtat catgcagagc cggttgcatg tccaacttgt 5820gggccgagct ataggcttta
cacgagcgat ggaaatgaga taattggaga ccccctgaga 5880aaggcggcaa aactaatcga
taagggatac atagttgcga taaagggtat aggtggaatt 5940catttggcct gcgatgctac
aagagaggat gtggtggccg agcttaggaa gaggattttt 6000aggcctcaga agcctttcgc
cattatggcc aaagatttag aaactgtaag gacttttgcc 6060tatatttctc ccgaagagga
ggaagaatta acaagctata gaaggccaat agtggctttg 6120aagaagaagg agcccttccc
acttcccgaa aacctcgctc ctgggcttca cacaattggg 6180gtaatgcttc cctatgctgg
aacccactac atattattcc actggagcaa gactccagtt 6240tacgttatga cttccgcaaa
cttcccaggg atgccgatga taaaggacaa tgaagaggca 6300tttgaaaagc ttagggacgt
tgctgactac ctcttgctcc acaataggag aattccaaat 6360agagctgacg atagcgttgt
tcgctttgta gatggtagaa gagctgttat taggaggagc 6420agaggatttg ttccacttgg
aatagagatt ccatttgagt acaaaggatt ggcagttggt 6480gctgagttaa tgaatgcttt
cggagttgtt aagaatggaa aagtttatcc aagtcagtac 6540ataggggata catcaaagat
tgaagtttta gagtttatga gggaagccgt gaggcacttc 6600ttcaagatat tgagagttga
taacttagat ctagttgttg cagatttgca tccaagctac 6660aacacaacta agctgggaat
ggagatcgct gaggaatttg gggcagaatt ccttcaagtt 6720caacatcact acgctcacgt
ggcctctgta atggctgagc acaacttgga ggaagttgtt 6780ggaattgctc tagatggtgt
tgggtatgga accgacggaa aaacttgggg tggggaagta 6840atatatctaa gctatgaaga
tgtggagagg ttggcccaca tagagtatta tccactccca 6900ggaggggatt tggccagcta
ctatcccttg agggccttaa ttggaatact cagcttaaac 6960cacgacttag aggaagttga
gaaaatcata agggagttct gtccaaatgc aataaagagc 7020ttaaagtatg gggaaacaga
gtttagggta attatgaggc aactcagcag cgggataaac 7080gttgcctatg cctcttcaac
gggaagggtg cttgatgcct tctcggtact tttgaacgtt 7140tcctacagga ggcactatga
gggagagcct gcgatgaagc tggagagctt tgcataccaa 7200ggaaagaacg atctaaagct
cacggctcca attgaaggtg aggaaataaa ggtttcagag 7260ttgtttgagg aagttcttga
gctgatgggc aaggccaatc ctaaagacat agcttactcc 7320gttcacttag ccttagctag
ggcatttgct gaagttagcg tggagaaagc taaggagttt 7380ggagctaaaa ctgtcgtttt
gggtggggga gtagggtaca atgagctaat agttaagacg 7440ataagaaaga tagtagaggg
gagagggcta aggttcttaa caacttacga agttcccagg 7500ggagataatg gaattaatgt
aggccaggcc ttcctgggag gattgtactt ggaaggatac 7560ttaaataggg aagatttgag
catttaggaa aacctgtatt ttcagggagg agacccagct 7620ttc
7623
SEQ ID NO:32 (length: 6020; type: DNA; organism: artificial; description: expression vector sequence)
ttgtacaaag tggtgataat
taattaagat cagatccggc tgctaagctt gcggccgcat 60aatgcttaag tcgaacagaa
agtaatcgta ttgtacacgg ccgcataatc gaaattaata 120cgactcacta taggggaatt
gtgagcggat aacaattccc catcttagta tattagttaa 180gtataagaag gagatataca
tatggcagat ctcaattgga tatcggccgg ccacgcgatc 240gctgacgtcg gtaccctcga
gtctggtaaa gaaaccgctg ctgcgaaatt tgaacgccag 300cacatggact cgtctactag
cgcagcttaa ttaacctagg ctgctgccac cgctgagcaa 360taactagcat aaccccttgg
ggcctctaaa cgggtcttga ggggtttttt gctgaaacct 420caggcatttg agaagcacac
ggtcacactg cttccggtag tcaataaacc ggtaaaccag 480caatagacat aagcggctat
ttaacgaccc tgccctgaac cgacgaccgg gtcgaatttg 540ctttcgaatt tctgccattc
atccgcttat tatcacttat tcaggcgtag caccaggcgt 600ttaagggcac caataactgc
cttaaaaaaa ttacgccccg ccctgccact catcgcagta 660ctgttgtaat tcattaagca
ttctgccgac atggaagcca tcacagacgg catgatgaac 720ctgaatcgcc agcggcatca
gcaccttgtc gccttgcgta taatatttgc ccatagtgaa 780aacgggggcg aagaagttgt
ccatattggc cacgtttaaa tcaaaactgg tgaaactcac 840ccagggattg gctgagacga
aaaacatatt ctcaataaac cctttaggga aataggccag 900gttttcaccg taacacgcca
catcttgcga atatatgtgt agaaactgcc ggaaatcgtc 960gtggtattca ctccagagcg
atgaaaacgt ttcagtttgc tcatggaaaa cggtgtaaca 1020agggtgaaca ctatcccata
tcaccagctc accgtctttc attgccatac ggaactccgg 1080atgagcattc atcaggcggg
caagaatgtg aataaaggcc ggataaaact tgtgcttatt 1140tttctttacg gtctttaaaa
aggccgtaat atccagctga acggtctggt tataggtaca 1200ttgagcaact gactgaaatg
cctcaaaatg ttctttacga tgccattggg atatatcaac 1260ggtggtatat ccagtgattt
ttttctccat tttagcttcc ttagctcctg aaaatctcga 1320taactcaaaa aatacgcccg
gtagtgatct tatttcatta tggtgaaagt tggaacctct 1380tacgtgccga tcaacgtctc
attttcgcca aaagttggcc cagggcttcc cggtatcaac 1440agggacacca ggatttattt
attctgcgaa gtgatcttcc gtcacaggta tttattcggc 1500gcaaagtgcg tcgggtgatg
ctgccaactt actgatttag tgtatgatgg tgtttttgag 1560gtgctccagt ggcttctgtt
tctatcagct gtccctcctg ttcagctact gacggggtgg 1620tgcgtaacgg caaaagcacc
gccggacatc agcgctagcg gagtgtatac tggcttacta 1680tgttggcact gatgagggtg
tcagtgaagt gcttcatgtg gcaggagaaa aaaggctgca 1740ccggtgcgtc agcagaatat
gtgatacagg atatattccg cttcctcgct cactgactcg 1800ctacgctcgg tcgttcgact
gcggcgagcg gaaatggctt acgaacgggg cggagatttc 1860ctggaagatg ccaggaagat
acttaacagg gaagtgagag ggccgcggca aagccgtttt 1920tccataggct ccgcccccct
gacaagcatc acgaaatctg acgctcaaat cagtggtggc 1980gaaacccgac aggactataa
agataccagg cgtttcccct ggcggctccc tcgtgcgctc 2040tcctgttcct gcctttcggt
ttaccggtgt cattccgctg ttatggccgc gtttgtctca 2100ttccacgcct gacactcagt
tccgggtagg cagttcgctc caagctggac tgtatgcacg 2160aaccccccgt tcagtccgac
cgctgcgcct tatccggtaa ctatcgtctt gagtccaacc 2220cggaaagaca tgcaaaagca
ccactggcag cagccactgg taattgattt agaggagtta 2280gtcttgaagt catgcgccgg
ttaaggctaa actgaaagga caagttttgg tgactgcgct 2340cctccaagcc agttacctcg
gttcaaagag ttggtagctc agagaacctt cgaaaaaccg 2400ccctgcaagg cggttttttc
gttttcagag caagagatta cgcgcagacc aaaacgatct 2460caagaagatc atcttattaa
tcagataaaa tatttctaga tttcagtgca atttatctct 2520tcaaatgtag cacctgaagt
cagccccata cgatataagt tgtaattctc atgttagtca 2580tgccccgcgc ccaccggaag
gagctgactg ggttgaaggc tctcaagggc atcggtcgag 2640atcccggtgc ctaatgagtg
agctaactta cattaattgc gttgcgctca ctgcccgctt 2700tccagtcggg aaacctgtcg
tgccagctgc attaatgaat cggccaacgc gcggggagag 2760gcggtttgcg tattgggcgc
cagggtggtt tttcttttca ccagtgagac gggcaacagc 2820tgattgccct tcaccgcctg
gccctgagag agttgcagca agcggtccac gctggtttgc 2880cccagcaggc gaaaatcctg
tttgatggtg gttaacggcg ggatataaca tgagctgtct 2940tcggtatcgt cgtatcccac
taccgagatg tccgcaccaa cgcgcagccc ggactcggta 3000atggcgcgca ttgcgcccag
cgccatctga tcgttggcaa ccagcatcgc agtgggaacg 3060atgccctcat tcagcatttg
catggtttgt tgaaaaccgg acatggcact ccagtcgcct 3120tcccgttccg ctatcggctg
aatttgattg cgagtgagat atttatgcca gccagccaga 3180cgcagacgcg ccgagacaga
acttaatggg cccgctaaca gcgcgatttg ctggtgaccc 3240aatgcgacca gatgctccac
gcccagtcgc gtaccgtctt catgggagaa aataatactg 3300ttgatgggtg tctggtcaga
gacatcaaga aataacgccg gaacattagt gcaggcagct 3360tccacagcaa tggcatcctg
gtcatccagc ggatagttaa tgatcagccc actgacgcgt 3420tgcgcgagaa gattgtgcac
cgccgcttta caggcttcga cgccgcttcg ttctaccatc 3480gacaccacca cgctggcacc
cagttgatcg gcgcgagatt taatcgccgc gacaatttgc 3540gacggcgcgt gcagggccag
actggaggtg gcaacgccaa tcagcaacga ctgtttgccc 3600gccagttgtt gtgccacgcg
gttgggaatg taattcagct ccgccatcgc cgcttccact 3660ttttcccgcg ttttcgcaga
aacgtggctg gcctggttca ccacgcggga aacggtctga 3720taagagacac cggcatactc
tgcgacatcg tataacgtta ctggtttcac attcaccacc 3780ctgaattgac tctcttccgg
gcgctatcat gccataccgc gaaaggtttt gcgccattcg 3840atggtgtccg ggatctcgac
gctctccctt atgcgactcc tgcattagga aattaatacg 3900actcactata ggggaattgt
gagcggataa caattcccct gtagaaataa ttttgtttaa 3960ctttaataag gagatatacc
atgggcagca gccatcacca tcatcaccac agccaggatc 4020cgtcaccctg gatgctgtac
aattgacgac gacaagggcc cgggcaaact agtaatcaga 4080cgcggtcgtt cacttgttca
gcaaccagat caaaagccat tgactcagca agggttgacc 4140gtataattca cgcgattaca
ccgcattgcg gtatcaacgc gcccttagct cagttggata 4200gagcaacgac cttctaagtc
gtgggccgca ggttcgaatc ctgcagggcg cgccattaca 4260attcaatcag ttacgccttc
tttatatcct ccagccatgg ccttgaaatg gcgttagtca 4320tgaaatatag accgccatcg
agtacccctt gtacccttaa ctcttcctga tacgtaaata 4380atgatttggt ggcccttgct
ggacttgaac cagcgaccaa gcgattatga gtcgcctgct 4440ctaaccactg agctaaaggg
ccttgagtgt gcaataacaa tacttataaa ccacgcaata 4500aacatgatga tctagagaat
cccgtcgtag ccaccatctt tttttgcggg agtggcgaaa 4560ttggtagacg caccagattt
aggttctggc gccgctaggt gtgcgagttc aagtctcgcc 4620tcccgcacca ttcaccagaa
agcgttgatc ggatgccctc gagtcgggca gcgttgggtc 4680ctggccacgg gtgcgcatga
tcgtgctcct gtcgttgagg acccggctag gctggcgggg 4740ttgccttact ggttagcaga
atgaatcacc gatacgcgag cgaacgtgaa gcgactgctg 4800ctgcaaaacg tctgcgacct
gagctcgaat tccttctctt ttactcgttt agcaaccggc 4860taaacatccc caccgcccgg
ccaaaagaaa aataggtcca tttttatcgc taaaagataa 4920atccacacag tttgtattgt
tttgtgcaaa agtttcacta cgctttatta acaatacttt 4980ctggcgacgt gcgccagtgc
agaaggatga gctttcgttt tcagcatctc acgtgaagcg 5040atggtttgcc ttgctacagg
gacgtcgctt gccgaccata agcgcccggt gtcctgccgg 5100tgtcgcaagg aggagagacg
tgcgatatgg gtcatcacca tcatcaccac ggctcgatca 5160caagtttgta caaaaaagca
ggctcagaaa acctgtattt tcagggagga aaagtagaga 5220aaggagatgt cataagactt
cattacactg gaaaggttaa agaaactgga gaaatcttcg 5280acacaactta tgaggatgtt
gcaaaagaag ctagaatata caatccaaac ggaatctatg 5340ggccagtccc tatagcggtt
ggagcgggac acgtattgcc cggactagac aagagactta 5400tagggcttga agttaagaaa
aaatacgtca ttgaagttcc acccgaagaa ggctttggat 5460tgagagatcc aggaaaaatt
aagattatcc cacttggaaa gttcagaaaa tctggaataa 5520tcccgtaccc tgggctagaa
attgaagttg aaacagaaaa tgggagaaaa atgagaggta 5580gggttcttac agttagcgga
ggaagagtta gagtagactt caatcatcca ttagcaggaa 5640agactctcgt atatgaagtt
gaagttgttg agaaaattga agatccaata gaaaagatta 5700aggcactaat agaactaaga
ctgccaatga ttgacaaaga taaggttatt attgagatta 5760gtgaaaaaga tgtaaagcta
aacttcaaag acgttgatat tgatccaaag acactaattt 5820tgggcgaaat tcttctcgaa
agtgacttga aatttatagg atatgagaaa gttgaatttg 5880agccaaccat tgaagagtta
ttaaagccca agtctgccga ggagcaagag tctcctaacg 5940aagaacagca agaggagagt
gagtctaaag cggaagaatc ttaggaaaac ctgtattttc 6000agggaggaga cccagctttc
6020
SEQ ID NO:33 (length: 6058; type: DNA; organism: artificial; description: plasmid sequence)
ttgtacaaac ttgtgatcga
gccacccata tcgcacgtct ctcctccttg cgacaccggc 60aggacaccgg gcgcttatgg
tcggcaagcg acgtccctgt agcaaggcaa accatcgctt 120cacgtgagat gctgaaaacg
aaagctcatc cttctgcact ggcgcacgtc gccagaaagt 180attgttaata aagcgtagtg
aaacttttgc acaaaacaat acaaactgtg tggatttatc 240ttttagcgat aaaaatggac
ctatttttct tttggccggg cggtggggat gtttagccgg 300ttgctaaacg agtaaaagag
aaggaattcg agctcgaatt cggatcctag agggaaaccg 360ttgtggtctc cctatagtga
gtcgtattaa tttcgcggga tcgagatctc gggcagcgtt 420gggtcctggc cacgggtgcg
catgatcgtg ctcctgtcgt tgaggacccg gctaggctgg 480cggggttgcc ttactggtta
gcagaatgaa tcaccgatac gcgagcgaac gtgaagcgac 540tgctgctgca aaacgtctgc
gacctgagca acaacatgaa tggtcttcgg tttccgtgtt 600tcgtaaagtc tggaaacgcg
gaagtcagcg ccctgcacca ttatgttccg gatctgcatc 660gcaggatgct gctggctacc
ctgtggaaca cctacatctg tattaacgaa gcgctggcat 720tgaccctgag tgatttttct
ctggtcccgc cgcatccata ccgccagttg tttaccctca 780caacgttcca gtaaccgggc
atgttcatca tcagtaaccc gtatcgtgag catcctctct 840cgtttcatcg gtatcattac
ccccatgaac agaaatcccc cttacacgga ggcatcagtg 900accaaacagg aaaaaaccgc
ccttaacatg gcccgcttta tcagaagcca gacattaacg 960cttctggaga aactcaacga
gctggacgcg gatgaacagg cagacatctg tgaatcgctt 1020cacgaccacg ctgatgagct
ttaccgcagc tgcctcgcgc gtttcggtga tgacggtgaa 1080aacctctgac acatgcagct
cccggagacg gtcacagctt gtctgtaagc ggatgccggg 1140agcagacaag cccgtcaggg
cgcgtcagcg ggtgttggcg ggtgtcgggg cgcagccatg 1200acccagtcac gtagcgatag
cggagtgtat actggcttaa ctatgcggca tcagagcaga 1260ttgtactgag agtgcaccat
atatgcggtg tgaaataccg cacagatgcg taaggagaaa 1320ataccgcatc aggcgctctt
ccgcttcctc gctcactgac tcgctgcgct cggtcgttcg 1380gctgcggcga gcggtatcag
ctcactcaaa ggcggtaata cggttatcca cagaatcagg 1440ggataacgca ggaaagaaca
tgtgagcaaa aggccagcaa aaggccagga accgtaaaaa 1500ggccgcgttg ctggcgtttt
tccataggct ccgcccccct gacgagcatc acaaaaatcg 1560acgctcaagt cagaggtggc
gaaacccgac aggactataa agataccagg cgtttccccc 1620tggaagctcc ctcgtgcgct
ctcctgttcc gaccctgccg cttaccggat acctgtccgc 1680ctttctccct tcgggaagcg
tggcgctttc tcatagctca cgctgtaggt atctcagttc 1740ggtgtaggtc gttcgctcca
agctgggctg tgtgcacgaa ccccccgttc agcccgaccg 1800ctgcgcctta tccggtaact
atcgtcttga gtccaacccg gtaagacacg acttatcgcc 1860actggcagca gccactggta
acaggattag cagagcgagg tatgtaggcg gtgctacaga 1920gttcttgaag tggtggccta
actacggcta cactagaagg acagtatttg gtatctgcgc 1980tctgctgaag ccagttacct
tcggaaaaag agttggtagc tcttgatccg gcaaacaaac 2040caccgctggt agcggtggtt
tttttgtttg caagcagcag attacgcgca gaaaaaaagg 2100atctcaagaa gatcctttga
tcttttctac ggggtctgac gctcagtgga acgaaaactc 2160acgttaaggg attttggtca
tgagattatc aaaaaggatc ttcacctaga tccttttaaa 2220ttaaaaatga agttttaaat
caatctaaag tatatatgag taaacttggt ctgacagtta 2280ccaatgctta atcagtgagg
cacctatctc agcgatctgt ctatttcgtt catccatagt 2340tgcctgactc cccgtcgtgt
agataactac gatacgggag ggcttaccat ctggccccag 2400tgctgcaatg ataccgcgag
acccacgctc accggctcca gatttatcag caataaacca 2460gccagccgga agggccgagc
gcagaagtgg tcctgcaact ttatccgcct ccatccagtc 2520tattaattgt tgccgggaag
ctagagtaag tagttcgcca gttaatagtt tgcgcaacgt 2580tgttgccatt gctgcaggca
tcgtggtgtc acgctcgtcg tttggtatgg cttcattcag 2640ctccggttcc caacgatcaa
ggcgagttac atgatccccc atgttgtgca aaaaagcggt 2700tagctccttc ggtcctccga
tcgttgtcag aagtaagttg gccgcagtgt tatcactcat 2760ggttatggca gcactgcata
attctcttac tgtcatgcca tccgtaagat gcttttctgt 2820gactggtgag tactcaacca
agtcattctg agaatagtgt atgcggcgac cgagttgctc 2880ttgcccggcg tcaatacggg
ataataccgc gccacatagc agaactttaa aagtgctcat 2940cattggaaaa cgttcttcgg
ggcgaaaact ctcaaggatc ttaccgctgt tgagatccag 3000ttcgatgtaa cccactcgtg
cacccaactg atcttcagca tcttttactt tcaccagcgt 3060ttctgggtga gcaaaaacag
gaaggcaaaa tgccgcaaaa aagggaataa gggcgacacg 3120gaaatgttga atactcatac
tcttcctttt tcaatattat tgaagcattt atcagggtta 3180ttgtctcatg agcggataca
tatttgaatg tatttagaaa aataaacaaa taggggttcc 3240gcgcacattt ccccgaaaag
tgccacctga aattgtaaac gttaatattt tgttaaaatt 3300cgcgttaaat ttttgttaaa
tcagctcatt ttttaaccaa taggccgaaa tcggcaaaat 3360cccttataaa tcaaaagaat
agaccgagat agggttgagt gttgttccag tttggaacaa 3420gagtccacta ttaaagaacg
tggactccaa cgtcaaaggg cgaaaaaccg tctatcaggg 3480cgatggccca ctacgtgaac
catcacccta atcaagtttt ttggggtcga ggtgccgtaa 3540agcactaaat cggaacccta
aagggagccc ccgatttaga gcttgacggg gaaagccggc 3600gaacgtggcg agaaaggaag
ggaagaaagc gaaaggagcg ggcgctaggg cgctggcaag 3660tgtagcggtc acgctgcgcg
taaccaccac acccgccgcg cttaatgcgc cgctacaggg 3720cgcgtcccat tcgccaatcc
ggatatagtt cctcctttca gcaaaaaacc cctcaagacc 3780cgtttagagg ccccaagggg
ttatgctagt tattgctcag cggtggcagc agccaactca 3840gcttcctttc gggctttgtt
agcagccgga tctcagtggt ggtggtggtg gtgctcgagt 3900gcggccgcaa gcttagcagc
cggatctgat cttaattaat tatcaccact ttgtacaaga 3960aagctgggtc tccctattaa
agtctaacca cgtggactga gcaagatatg catggatcat 4020aagccctaac aaccatctca
gccagtatct ttaacctttc tggatcgtca ttgtagtgct 4080tttctgccat cattcttaca
tgttcttcca tcattgccaa gttgaatgct gtaggtgtta 4140ttatgtcggc ataagaaacc
cttccattct caactttgag ggcatagact aagattcccc 4200ttggagcctc agtcgttgag
acaccaaagc cgtcctttat ctcaacttca tccctgggct 4260taattggcca cttggcgaga
gcctcgtcga gcagatctat tgccctctct ataaagtaaa 4320ctatttcgag ggcctgggct
aagttatttg caaacggatt tgttcccttt aataggtctt 4380tgtttgcctc atacagctcc
ttggccttgc cgtataggag gtcagcattg ttaataactc 4440tagatatagc cccaaccatg
aagggtctgc ccttgtagtg actgtgcttt gcaaaactgt 4500gttcaacgac gaactccttt
atataatctc tgtacttttc acttgggaac tcctccccat 4560cacttgcctt tatgtaatct
ccataaattc cataagcatc tcccctcggc ttcacggcca 4620agtgtgttat tggcccttca
acttcgctgt actgctcaag ctttgcaaat aactcaaaag 4680tatactcggc aagtggtagg
gcttccctaa gctcggcttt cattttctca aggacactct 4740tctcagggag ctttccgaat
ccgcccaaaa ccgcattttc ttggtgtatg gctcttgacc 4800ctagaatgtc catcatccag
gtgccaaggt tcttcagctt aagggctatc tctatctccc 4860tcttgtattc attcaccatc
ttaagtgggc tcgagtagcc cctgtagtcg ggaagaacta 4920gaagatatag gtgaagggca
tgactctcta tcatgtctcc gatgtatagt acttctctaa 4980gggcctgtat ctcttccctt
gggacaaaac cgacggcctt ttctgcagcc tctaatgcgg 5040ttaacttgtg ggcggctgaa
cagaatgagc atattctcgg gtaaatggcc agagcttcct 5100caagcttctt cccaatagtt
atggcctcaa agaatctggg cccttcaatt atgtttagct 5160tgacctcctt gactccatca
tccccaatta ttatctccac accacccttc ccctcaactc 5220ttgctatatg atcaatggtg
attggaagat agaggttctt cattgttcac cacctgagaa 5280tattttttca accattttct
caaccctctc atcatgtcca ttgaacattt tcattctctc 5340aattatctcc tcttttgtca
tccccttctc cttgaacacc ttagctagag agtcgaacca 5400agctacatcg taccctattg
cccctctgca tcctatacac gcaactccaa atcctggaca 5460tctcgcgtta catcctgccc
ttgttactgg acctagacag ggttctcctt tctcaagaag 5520gatacatgga tgtccattga
gcctacattc tagacaaact ggataatcta tatcctctgg 5580ccatgaacca atcaagaatg
ttcccagggc gtagaggaag tccttcttct ctggtgggca 5640accgtagatg ttgtagtcaa
cttttatgta ttttgaaact ggttcagcct tcttcggttg 5700gaacttgact tttgcgtctc
cataaacctt cttccagagc tcttctaatg gcttttcact 5760ccagctctga actcctcctt
gaacagcaca agctccaacc gcaacgacga tctttgcatt 5820ctccctaatt tttttcacga
gttcaacttc ttcctcagtt gaaacgcttc cttctataaa 5880agctatgtcg accttttcat
cctcaatgct atctctatca atcatgaacc agcaaactat 5940ttcagcattt gggataagtt
gtaataactc gtccatcata gctagctgca attgacagcc 6000gtagcacgag gttaatgcgt
aaaatccaat cctaactttt cctcctgagc ctgctttt 6058
SEQ ID NO:34 (length 367, PRT, Pyrococcus abyssi)
Met Arg Tyr Val Lys Leu Pro Lys Glu Asn Val Tyr Thr Phe Leu Glu
1
5 10 15Arg Leu Lys Asp
Trp Gly Lys Leu Tyr Ala Pro Val Lys Ile Ser Glu 20
25 30Lys Phe Tyr Asp Phe Arg Glu Ile Asp Asp Val
Arg Lys Val Glu Phe 35 40 45His
Tyr Thr Arg Thr Ile Met Pro Pro Lys Lys Phe Phe Phe Lys Pro 50
55 60Arg Glu Lys Leu Phe Glu Phe Asp Ile Ser
Lys Pro Glu Tyr Arg Glu65 70 75
80Val Ile Glu Asp Val Glu Pro Phe Val Leu Phe Gly Val His Ala
Cys 85 90 95Asp Ile Tyr
Gly Leu Lys Leu Leu Asp Thr Val Tyr Leu Asp Glu Phe 100
105 110Pro Asp Lys Tyr Tyr Lys Val Arg Arg Glu
Lys Gly Ile Ile Ile Gly 115 120
125Ile Ser Cys Met Pro Asp Glu Tyr Cys Phe Cys Asn Leu Arg Glu Thr 130
135 140Asp Phe Ala Asp Asp Gly Phe Asp
Leu Phe Leu His Glu Leu Pro Asp145 150
155 160Gly Trp Leu Val Arg Val Gly Thr Pro Thr Gly His
Arg Ile Val Asp 165 170
175Lys Asn Ile Lys Leu Phe Glu Glu Val Thr Asn Glu Asp Ile Cys Ala
180 185 190Phe Arg Glu Phe Glu Lys
Lys Arg His Glu Ala Phe Lys Tyr His Glu 195 200
205Asp Trp Gly Asn Leu Arg Tyr Leu Leu Glu Leu Glu Met Glu
His Pro 210 215 220Met Trp Asp Glu Glu
Ala Glu Lys Cys Leu Ala Cys Gly Ile Cys Asn225 230
235 240Thr Thr Cys Pro Thr Cys Arg Cys Tyr Glu
Val Gln Asp Ile Val Asn 245 250
255Leu Asp Gly Val Thr Gly Tyr Arg Glu Arg Arg Trp Asp Ser Cys Gln
260 265 270Phe Arg Ser His Gly
Leu Val Ala Gly Gly His Asn Phe Arg Pro Thr 275
280 285Lys Lys Ser Arg Phe Leu Asn Arg Tyr Leu Cys Lys
Asn Ser Tyr Asn 290 295 300Glu Lys Leu
Gly Ile Ser Phe Cys Val Gly Cys Gly Arg Cys Thr Ala305
310 315 320Phe Cys Pro Ala Gly Ile Ser
Phe Val Arg Asn Leu Arg Arg Ile Leu 325
330 335Gly Leu Glu Glu Gln Lys Cys Pro Pro Ser Val Ser
Glu Glu Ile Pro 340 345 350Lys
Arg Gly Phe Ala Tyr Ser Pro Gly Val Gly Gly Glu Glu Glu 355
360 365
SEQ ID NO:35 (length 292, PRT, Pyrococcus abyssi)
Met Thr Leu
Pro Lys Glu Val Met Met Pro Asn Asp Asn Pro Tyr Ala1 5
10 15Leu His Arg Val Lys Val Leu Lys Val
Tyr Asp Leu Thr Glu Arg Glu 20 25
30Lys Leu Phe Leu Phe Arg Phe Glu Asp Pro Lys Leu Ala Glu Thr Trp
35 40 45Thr Phe Lys Pro Gly Gln Phe
Val Gln Leu Thr Ile Pro Gly Val Gly 50 55
60Glu Val Pro Ile Ser Ile Cys Ser Ser Pro Met Arg Lys Gly Phe Phe65
70 75 80Glu Leu Cys Ile
Arg Arg Ala Gly Arg Val Thr Thr Val Val His Arg 85
90 95Leu Lys Pro Gly Asp Thr Val Leu Val Arg
Gly Pro Tyr Gly Asn Gly 100 105
110Phe Pro Val Asp Glu Trp Glu Gly Met Asp Leu Leu Leu Ile Ala Ala
115 120 125Gly Leu Gly Thr Ala Pro Leu
Arg Ser Val Phe Leu Tyr Ala Met Asp 130 135
140Asn Arg Trp Lys Tyr Gly Asn Ile Thr Phe Ile Asn Thr Ala Arg
Tyr145 150 155 160Gly Lys
Asp Leu Leu Phe Tyr Lys Glu Leu Glu Ala Met Lys Asp Leu
165 170 175Ala Glu Ala Glu Asn Val Lys
Ile Ile Gln Ser Val Thr Arg Asp Pro 180 185
190Asp Trp Pro Gly Leu His Gly Arg Pro Gln Gln Phe Ile Val
Glu Ala 195 200 205Asn Thr Asn Pro
Lys Asn Thr Ala Val Ala Ile Cys Gly Pro Pro Arg 210
215 220Met Tyr Lys Ala Val Phe Glu Ser Leu Ile Asn Tyr
Gly Tyr Arg Pro225 230 235
240Glu Asn Ile Tyr Val Thr Leu Glu Arg Arg Met Lys Cys Gly Ile Gly
245 250 255Lys Cys Gly His Cys
Val Ala Gly Thr Ser Thr Ser Trp Lys Tyr Ile 260
265 270Cys Lys Asp Gly Pro Val Phe Thr Tyr Phe Asp Ile
Val Ser Thr Pro 275 280 285Gly Leu
Leu Asp 290
SEQ ID NO:36 (length 258, PRT, Pyrococcus abyssi)
Lys Leu Arg Ile Gly Phe Tyr Ala
Leu Thr Ser Cys Tyr Gly Cys Gln1 5 10
15Leu Gln Leu Ala Met Met Asp Glu Leu Leu Lys Leu Ile Pro
Asn Ala 20 25 30Glu Ile Val
Cys Trp Tyr Met Leu Asp Arg Asp Ser Val Glu Asp Lys 35
40 45Pro Val Asp Ile Ala Phe Ile Glu Gly Ser Val
Ser Thr Glu Glu Glu 50 55 60Val Glu
Leu Val Lys Lys Ile Arg Glu Asn Ala Lys Ile Val Val Ala65
70 75 80Val Gly Ala Cys Ala Val Gln
Gly Gly Val Gln Ser Trp Asp Lys Ser 85 90
95Leu Glu Glu Leu Trp Lys Thr Val Tyr Gly Asp Ala Lys
Val Lys Phe 100 105 110Gln Pro
Lys Lys Ala Glu Pro Val Ser Lys Tyr Ile Lys Val Asp Tyr 115
120 125Asn Ile Tyr Gly Cys Pro Pro Glu Lys Arg
Asp Phe Leu Tyr Ala Leu 130 135 140Gly
Thr Phe Leu Ile Gly Ser Trp Pro Glu Asp Ile Asp Tyr Pro Val145
150 155 160Cys Leu Glu Cys Arg Leu
Asn Gly Tyr Pro Cys Val Leu Leu Glu Lys 165
170 175Gly Glu Pro Cys Leu Gly Pro Ile Thr Arg Ala Gly
Cys Asn Ala Arg 180 185 190Cys
Pro Gly Phe Gly Ile Ala Cys Ile Gly Cys Arg Gly Ala Ile Gly 195
200 205Tyr Asp Val Ala Trp Phe Asp Ser Leu
Ala Arg Val Phe Lys Glu Lys 210 215
220Gly Leu Thr Lys Glu Glu Ile Leu Glu Arg Met Lys Ile Phe Asn Gly225
230 235 240His Asp Glu Arg
Ile Glu Lys Met Val Glu Lys Val Phe Gln Glu Val 245
250 255
Lys Glu
SEQ ID NO:37 (length 428, PRT, Pyrococcus abyssi)
Met
Arg Asn Leu Tyr Ile Pro Ile Thr Val Asp His Ile Ala Arg Val1
5 10 15Glu Gly Lys Gly Gly Val Glu
Ile Ile Val Gly Asp Glu Gly Val Lys 20 25
30Glu Val Lys Leu Asn Ile Ile Glu Gly Pro Arg Phe Phe Glu
Ala Ile 35 40 45Thr Ile Gly Lys
Lys Leu Glu Glu Ala Leu Ala Ile Tyr Pro Arg Ile 50 55
60Cys Ser Phe Cys Ser Ala Ala His Lys Leu Thr Ala Leu
Glu Ala Ala65 70 75
80Glu Lys Ala Ile Gly Phe Thr Pro Arg Glu Glu Ile Gln Ala Leu Arg
85 90 95Glu Val Leu Tyr Ile Gly
Asp Met Ile Glu Ser His Ala Leu His Leu 100
105 110Tyr Leu Leu Val Leu Pro Asp Tyr Leu Gly Tyr Ser
Ser Pro Leu Lys 115 120 125Met Val
Asn Glu Tyr Lys Lys Glu Leu Glu Ile Ala Leu Lys Leu Lys 130
135 140Asn Leu Gly Ser Trp Met Met Asp Val Leu Gly
Ser Arg Ala Ile His145 150 155
160Gln Glu Asn Ala Ile Leu Gly Gly Phe Gly Lys Leu Pro Ser Lys Glu
165 170 175Thr Leu Glu Glu
Met Lys Ala Lys Leu Arg Glu Ser Leu Ser Leu Ala 180
185 190Glu Tyr Thr Phe Glu Leu Phe Ala Lys Leu Glu
Gln Tyr Arg Glu Val 195 200 205Glu
Gly Glu Ile Thr His Leu Ala Val Lys Pro Arg Gly Asp Val Tyr 210
215 220Gly Ile Tyr Gly Asp Tyr Ile Lys Ala Ser
Asp Gly Glu Glu Phe Pro225 230 235
240Ser Glu Asp Tyr Lys Glu His Ile Asn Glu Phe Val Val Glu His
Ser 245 250 255Phe Ala Lys
His Ser His Tyr Lys Gly Lys Pro Phe Met Val Gly Ala 260
265 270Ile Ser Arg Val Val Asn Asn Lys Asp Leu
Leu Tyr Gly Arg Ala Lys 275 280
285Asp Leu Tyr Glu Ser His Lys Glu Leu Leu Lys Gly Thr Asn Pro Phe 290
295 300Ala Asn Asn Leu Ala Gln Ala Leu
Glu Leu Val Tyr Phe Ile Glu Arg305 310
315 320Ala Ile Asp Leu Ile Asp Glu Val Leu Ile Lys Trp
Pro Val Lys Glu 325 330
335Arg Asp Lys Val Glu Val Arg Asp Gly Phe Gly Val Ser Thr Thr Glu
340 345 350Ala Pro Arg Gly Ile Leu
Val Tyr Ala Leu Lys Val Glu Asn Gly Arg 355 360
365Val Ala Tyr Ala Asp Ile Ile Thr Pro Thr Ala Phe Asn Leu
Ala Met 370 375 380Met Glu Glu His Val
Arg Met Met Ala Glu Lys His Tyr Asn Asp Asp385 390
395 400Pro Glu Arg Leu Lys Leu Leu Ala Glu Met
Val Val Arg Ala Tyr Asp 405 410
415Pro Cys Ile Ser Cys Ser Val His Val Val Lys Leu 420
425
SEQ ID NO:38 (length 428, PRT, Thermococcus kodakaraensis)
Met Lys Asn Val Tyr
Leu Pro Ile Thr Val Asp His Ile Ala Arg Val1 5
10 15Glu Gly Lys Gly Gly Val Glu Ile Val Val Gly
Asp Asp Gly Val Lys 20 25
30Glu Val Lys Leu Asn Ile Ile Glu Gly Pro Arg Phe Phe Glu Ala Ile
35 40 45Thr Leu Gly Lys Lys Leu Asp Glu
Ala Leu Ala Ile Tyr Pro Arg Ile 50 55
60Cys Ser Phe Cys Ser Ala Ala His Lys Leu Thr Ala Val Glu Ala Ala65
70 75 80Glu Lys Ala Ile Gly
Phe Thr Pro Arg Glu Glu Ile Gln Ala Leu Arg 85
90 95Glu Val Leu Tyr Ile Gly Asp Met Ile Glu Ser
His Ala Leu His Leu 100 105
110Tyr Leu Leu Val Leu Pro Asp Tyr Leu Gly Tyr Ser Gly Pro Leu His
115 120 125Met Ile Asp Glu Tyr Lys Lys
Glu Met Ser Ile Ala Leu Asp Leu Lys 130 135
140Asn Leu Gly Ser Trp Met Met Asp Glu Leu Gly Ser Arg Ala Ile
His145 150 155 160Gln Glu
Asn Ala Val Leu Gly Gly Phe Gly Lys Leu Pro Asp Lys Ser
165 170 175Val Leu Glu Asn Met Lys Arg
Arg Leu Lys Glu Ala Leu Pro Lys Ala 180 185
190Glu Tyr Thr Phe Glu Leu Phe Thr Lys Leu Glu Gln Tyr Glu
Glu Val 195 200 205Glu Gly Pro Ile
Thr His Ile Ala Val Lys Pro Arg Asn Gly Val Tyr 210
215 220Gly Ile Tyr Gly Asp Tyr Leu Lys Ala Ser Asp Gly
Asn Glu Phe Pro225 230 235
240Ser Glu Glu Tyr Arg Glu His Ile Lys Glu Phe Val Val Glu His Ser
245 250 255Phe Ala Lys His Ser
His Tyr His Gly Lys Pro Phe Met Val Gly Ala 260
265 270Ile Ser Arg Leu Val Asn Asn Ala Asp Thr Leu Tyr
Gly Arg Ala Lys 275 280 285Glu Leu
Tyr Glu Ser Tyr Lys Asp Leu Leu Arg Ser Thr Asn Pro Phe 290
295 300Ala Asn Asn Leu Ala Gln Ala Leu Glu Leu Val
Tyr Phe Thr Glu Arg305 310 315
320Ala Ile Asp Leu Ile Asp Glu Ala Leu Ala Lys Trp Pro Ile Arg Pro
325 330 335Arg Asp Glu Val
Ala Leu Lys Asp Gly Phe Gly Val Ser Thr Thr Glu 340
345 350Ala Pro Arg Gly Val Leu Val Tyr Ala Leu Lys
Val Glu Asn Gly Arg 355 360 365Val
Ser Tyr Ala Asp Ile Ile Thr Pro Thr Ala Phe Asn Leu Ala Met 370
375 380Met Glu Gln His Val Arg Met Met Ala Glu
Lys His Tyr Asn Asp Asp385 390 395
400Pro Glu Lys Leu Lys Leu Leu Ala Glu Met Val Val Arg Ala Tyr
Asp 405 410 415Pro Cys Ile
Ser Cys Ser Val His Val Ala Arg Leu 420
425
SEQ ID NO:39 (length 264, PRT, Thermococcus kodakaraensis)
Met Ser Glu Lys Lys Ile Arg Ile
Gly Phe Tyr Ala Leu Thr Ser Cys1 5 10
15Tyr Gly Cys Gln Leu Gln Phe Ala Met Met Asp Glu Ile Leu
Gln Leu 20 25 30Ile Pro Asn
Val Glu Ile Ala Cys Trp Phe Met Leu Glu Arg Asp Ser 35
40 45Tyr Glu Asp Glu Pro Val Asp Ile Ala Phe Ile
Glu Gly Ser Val Ser 50 55 60Thr Glu
Glu Glu Ala Glu Leu Val Lys Lys Ile Arg Glu Asn Ala Lys65
70 75 80Ile Val Val Ala Val Gly Ser
Cys Ala Val Gln Gly Gly Val Gln Ser 85 90
95Trp Glu Lys Asp Lys Pro Leu Glu Glu Leu Trp Lys Thr
Val Tyr Gly 100 105 110Asp Ala
Lys Val Lys Phe Gln Pro Lys Met Ala Glu Pro Ile Ser Asn 115
120 125Tyr Ile Lys Val Asp Tyr Asn Ile Tyr Gly
Cys Pro Pro Glu Lys Arg 130 135 140Asp
Phe Leu Tyr Thr Leu Gly Thr Leu Leu Ile Gly Ser Trp Pro Glu145
150 155 160Asp Ile Asp Tyr Pro Val
Cys Leu Glu Cys Arg Leu Arg Gly Asn Thr 165
170 175Cys Val Leu Leu Glu Arg Gly Glu Pro Cys Leu Gly
Pro Val Thr Arg 180 185 190Ala
Gly Cys Asp Ala Arg Cys Pro Ala Tyr Gly Ile Ala Cys Ile Gly 195
200 205Cys Arg Gly Ala Ile Gly Tyr Asp Val
Ala Trp Phe Asp Ser Leu Ala 210 215
220Arg Val Phe Arg Glu Lys Gly Leu Thr Lys Glu Glu Ile Leu Glu Arg225
230 235 240Met Arg Met Phe
Asn Ala His Asn Pro Lys Leu Glu Glu Met Val Asn 245
250 255Lys Ile Phe Gln Glu Val Lys Glu
260
SEQ ID NO:40 (length 294, PRT, Thermococcus kodakaraensis)
Met Ser Met Val Leu Pro Lys Glu
Ile Met Met Pro Asn Asp Asn Pro1 5 10
15Tyr Ala Leu His Arg Ala Lys Val Leu Arg Val Tyr Pro Leu
Thr Glu 20 25 30Lys Glu Lys
Leu Phe Leu Phe Arg Phe Glu Asp Ala Glu Leu Ala Glu 35
40 45Lys Trp Thr Phe Arg Pro Gly Gln Phe Val Gln
Leu Thr Ile Pro Gly 50 55 60Val Gly
Glu Val Pro Ile Ser Ile Cys Ser Ser Ala Met Arg Arg Gly65
70 75 80Phe Phe Glu Leu Cys Ile Arg
Lys Ala Gly Arg Val Thr Thr Val Val 85 90
95His Arg Leu Lys Pro Gly Asp Thr Val Leu Val Arg Gly
Pro Tyr Gly 100 105 110Asn Gly
Phe Pro Val Asp Glu Trp Glu Gly Met Asp Leu Leu Leu Ile 115
120 125Ala Ala Gly Leu Gly Thr Ala Pro Leu Arg
Ser Val Phe Leu Tyr Ala 130 135 140Met
Asp Asn Arg Trp Lys Tyr Gly Asn Ile Thr Phe Ile Asn Thr Ala145
150 155 160Arg Tyr Gly Lys Asp Leu
Leu Phe Tyr Lys Glu Leu Glu Ala Met Lys 165
170 175Asp Leu Ala Glu Ala Glu Asn Val Lys Ile Ile Gln
Ser Val Thr Arg 180 185 190Asp
Pro Asp Trp Pro Gly Leu His Gly Arg Pro Gln Asn Phe Ile Pro 195
200 205Glu Ala Asn Thr Asn Pro Lys Lys Thr
Ala Val Ala Ile Cys Gly Pro 210 215
220Pro Arg Met Tyr Lys Ala Val Phe Glu Ala Leu Ile Asn Tyr Gly Tyr225
230 235 240Arg Pro Glu Asn
Ile Tyr Val Thr Leu Glu Arg Lys Met Lys Cys Gly 245
250 255Ile Gly Lys Cys Gly His Cys Asn Val Gly
Thr Ser Thr Ser Trp Lys 260 265
270Tyr Val Cys Lys Asp Gly Pro Val Phe Gly Tyr Phe Asp Ile Ile Ser
275 280 285Thr Pro Gly Leu Leu Asp
290
SEQ ID NO:41 (length 367, PRT, Thermococcus kodakaraensis)
Met Arg Tyr Val Lys Leu Pro Lys
Glu Asn Thr Tyr Thr Phe Leu Glu1 5 10
15Arg Leu Lys Glu Trp Gly Lys Leu Tyr Ala Pro Val Lys Ile
Ser Glu 20 25 30Lys Phe Tyr
Asp Phe Arg Glu Ile Asp Asp Val Arg Lys Val Glu Phe 35
40 45Asn Tyr Asn Arg Thr Ile Met Pro Pro Lys Lys
Phe Phe Phe Leu Pro 50 55 60Arg Glu
Lys Leu Phe Glu Phe Asp Leu Ser Arg Pro Glu Tyr Arg Glu65
70 75 80Thr Ile Glu Asp Val Glu Pro
Phe Val Ile Phe Gly Leu His Ala Cys 85 90
95Asp Ile His Gly Leu Lys Ile Leu Asp Thr Val Tyr Leu
Asp Glu Leu 100 105 110Pro Asp
Lys Tyr Tyr Lys Ala Arg Arg Glu Lys Gly Ile Ile Ile Gly 115
120 125Ile Ser Cys Met Pro Asp Glu Tyr Cys Phe
Cys Asn Leu Arg Glu Thr 130 135 140Asp
Phe Ala Asp Asp Gly Phe Asp Leu Phe Leu His Glu Leu Pro Asp145
150 155 160Gly Trp Leu Val Arg Val
Gly Ser Pro Thr Gly His Arg Ile Val Asp 165
170 175Lys Asn Met Glu Leu Phe Glu Glu Val Thr Thr Glu
Asp Ile Cys Asn 180 185 190Phe
Arg Glu Phe Glu Asn Lys Arg Ser Gln Ala Phe Lys Tyr His Glu 195
200 205Asp Trp Ser Asn Leu Arg Tyr Leu Leu
Glu Leu Glu Met Glu His Pro 210 215
220Met Trp Glu Glu Gln Ala Asp Leu Cys Leu Ala Cys Gly Ile Cys Asn225
230 235 240Thr Thr Cys Pro
Thr Cys Arg Cys Tyr Glu Val Gln Asp Ile Val Asn 245
250 255Leu Asp Gly Asn Thr Gly Tyr Arg Glu Arg
Arg Trp Asp Ser Cys Gln 260 265
270Phe Arg Ser His Gly Leu Val Ala Gly Gly His Asn Phe Arg Pro Thr
275 280 285Lys Lys Asp Arg Phe Arg Asn
Arg Tyr Leu Cys Lys Asn Ser Tyr Asn 290 295
300Glu Lys Leu Gly Leu Ser Tyr Cys Val Gly Cys Gly Arg Cys Thr
Tyr305 310 315 320Phe Cys
Pro Ala Gly Ile Ser Phe Val Arg Asn Leu Arg Thr Ile Leu
325 330 335Gly Leu Glu Glu Lys Ser Cys
Pro Ser Glu Ile Thr Glu Glu Ile Pro 340 345
350Lys Arg Gly Phe Ala Tyr Ala Ser His Ile Arg Gly Asp Gly
Leu 355 360 365
SEQ ID NO:42 (length 372, PRT, Pyrococcus horikoshii)
Met Glu Val Ile Leu Leu Arg Tyr Val Lys Leu Pro Lys Glu Asn
Thr1 5 10 15Tyr Glu Phe
Leu Glu Arg Leu Lys Glu Trp Gly Lys Leu Tyr Ala Pro 20
25 30Val Lys Ile Ser Glu Lys Phe Tyr Asp Phe
Arg Glu Ile Asp Asp Val 35 40
45Arg Lys Val Glu Phe His Tyr Thr Arg Thr Ile Met Pro Pro Lys Lys 50
55 60Phe Phe Phe Lys Pro Arg Glu Lys Met
Phe Glu Phe Asp Leu Ser Lys65 70 75
80Pro Glu Tyr Lys Glu Val Ile Glu Asp Val Glu Pro Phe Val
Leu Phe 85 90 95Gly Val
His Ala Cys Asp Ile Tyr Gly Leu Lys Ile Leu Asp Thr Ile 100
105 110Tyr Leu Asp Glu Leu Pro Asp Lys Tyr
Tyr Lys Ile Arg Arg Glu Lys 115 120
125Gly Ile Ile Ile Gly Ile Ser Cys Met Pro Asp Glu Tyr Cys Phe Cys
130 135 140Asn Leu Arg Lys Thr Asp Phe
Ala Asp Asp Gly Phe Asp Leu Phe Leu145 150
155 160His Glu Leu Pro Asp Gly Trp Leu Val Arg Val Gly
Ser Pro Thr Gly 165 170
175His Arg Ile Val Asp Lys Asn Ile Lys Leu Phe Glu Glu Val Thr Asp
180 185 190Glu Asp Ile Cys Ala Phe
Arg Glu Phe Glu Lys Lys Arg Gln Glu Ala 195 200
205Phe Lys Tyr His Glu Asp Trp Asp Asn Leu Arg Tyr Leu Leu
Glu Leu 210 215 220Glu Met Glu His Pro
Met Trp Glu Glu Glu Ala Asn Lys Cys Leu Ala225 230
235 240Cys Gly Ile Cys Thr Leu Thr Cys Pro Thr
Cys Arg Cys Tyr Glu Val 245 250
255Gln Asp Ile Val Asn Leu Asp Gly Ile Thr Gly Tyr Arg Glu Arg Arg
260 265 270Trp Asp Ser Cys Gln
Phe Arg Ser His Gly Leu Val Ala Gly Gly His 275
280 285Asn Phe Arg Pro Thr Lys Lys Asp Arg Phe Arg Asn
Arg Tyr Leu Cys 290 295 300Lys Asn Ala
Tyr Asn Glu Lys Leu Gly Leu Ser Tyr Cys Val Gly Cys305
310 315 320Gly Arg Cys Thr Ala Phe Cys
Pro Ala Gly Ile Ser Phe Val Arg Asn 325
330 335Leu Arg Val Ile Leu Gly Phe Glu Glu Gln Arg Cys
Pro Pro Asn Val 340 345 350Ser
Glu Glu Ile Pro Lys Lys Gly Phe Ala Tyr Ser Pro Gly Val Gly 355
360 365Gly Asp Glu Glu
370
SEQ ID NO:43 (length 292, PRT, Pyrococcus horikoshii)
Met Asn Leu Pro Lys Asp Val Met Met
Pro Asn Asp Asn Pro Tyr Ala1 5 10
15Leu His Arg Val Lys Val Leu Lys Val Tyr Asp Leu Thr Glu Lys
Glu 20 25 30Lys Leu Phe Leu
Phe Arg Phe Glu Asp Pro Lys Leu Ala Glu Thr Trp 35
40 45Thr Phe Lys Pro Gly Gln Phe Val Gln Leu Thr Ile
Pro Gly Val Gly 50 55 60Glu Val Pro
Ile Ser Ile Cys Ser Ser Pro Met Arg Arg Gly Phe Phe65 70
75 80Glu Leu Cys Ile Arg Arg Ala Gly
Arg Val Thr Thr Val Val His Arg 85 90
95Leu Lys Pro Gly Asp Ile Val Leu Val Arg Gly Pro Tyr Gly
Asn Gly 100 105 110Phe Pro Val
Asp Glu Trp Glu Gly Met Asp Leu Leu Leu Ile Ala Ala 115
120 125Gly Leu Gly Ala Ala Pro Leu Arg Ser Val Phe
Leu Tyr Ala Met Asp 130 135 140Asn Arg
Trp Lys Tyr Gly Asn Ile Thr Phe Ile Asn Thr Ala Arg Tyr145
150 155 160Gly Lys Asp Leu Leu Phe Tyr
Lys Glu Leu Glu Ala Ile Lys Asp Leu 165
170 175Ala Glu Ala Glu Asn Val Lys Ile Ile Gln Ser Val
Thr Arg Asp Pro 180 185 190Asn
Trp Pro Gly Leu His Gly Arg Pro Gln Gln Phe Ile Val Glu Ala 195
200 205Asn Thr Asn Pro Lys Asn Thr Ala Val
Ala Ile Cys Gly Pro Pro Arg 210 215
220Met Tyr Lys Ser Val Phe Glu Ala Leu Ile Asn Tyr Gly Tyr Arg Pro225
230 235 240Glu Asn Ile Tyr
Val Thr Leu Glu Arg Lys Met Lys Cys Gly Ile Gly 245
250 255Lys Cys Gly His Cys Val Val Gly Thr Ser
Thr Ser Leu Lys Tyr Ile 260 265
270Cys Lys Asp Gly Pro Val Phe Thr Tyr Phe Asp Ile Val Ser Thr Pro
275 280 285Gly Leu Leu Asp
290
SEQ ID NO:44 (length 265, PRT, Pyrococcus horikoshii)
Met Gly Glu Met Gly Lys Lys Lys Ile
Arg Ile Gly Phe Tyr Ala Leu1 5 10
15Thr Ser Cys Tyr Gly Cys Gln Leu Gln Leu Ala Met Met Asp Glu
Leu 20 25 30Leu Leu Leu Leu
Pro His Ile Glu Leu Val Cys Trp Tyr Met Val Asp 35
40 45Arg Asp Ser Ile Asp Asp Glu Pro Val Asp Ile Ala
Phe Ile Glu Gly 50 55 60Ser Val Ser
Thr Glu Glu Glu Val Glu Leu Val Lys Lys Ile Arg Glu65 70
75 80Asn Ser Lys Ile Val Val Ala Val
Gly Ala Cys Ala Val Gln Gly Gly 85 90
95Val Gln Ser Trp Asp Lys Ser Leu Glu Glu Leu Trp Arg Thr
Val Tyr 100 105 110Gly Asp Ala
Lys Val Lys Phe Lys Pro Lys Lys Ala Glu Pro Val Ser 115
120 125Lys Tyr Ile Lys Val Asp Tyr Asn Ile Tyr Gly
Cys Pro Pro Glu Lys 130 135 140Arg Asp
Phe Leu Tyr Ala Leu Gly Thr Phe Leu Ile Gly Ser Trp Pro145
150 155 160Glu Asp Ile Asp Tyr Pro Val
Cys Leu Glu Cys Arg Leu Asn Gly Tyr 165
170 175Pro Cys Val Leu Leu Glu Lys Gly Glu Pro Cys Leu
Gly Pro Val Thr 180 185 190Arg
Ala Gly Cys Asn Ala Arg Cys Pro Gly Phe Gly Ile Ala Cys Ile 195
200 205Gly Cys Arg Gly Ala Ile Gly Tyr Asp
Val Ala Trp Phe Asp Ser Leu 210 215
220Ala Arg Val Phe Lys Glu Lys Gly Leu Thr Lys Glu Glu Ile Ile Glu225
230 235 240Arg Met Lys Ile
Phe Asn Gly His Asp Asp Arg Ile Glu Lys Met Val 245
250 255Glu Lys Ile Phe Gln Gly Val Lys Glu
260 265
SEQ ID NO:45 (length 429, PRT, Pyrococcus horikoshii)
Met Lys Glu
Ile Tyr Ile Pro Ile Thr Val Asp His Ile Ala Arg Ile1 5
10 15Glu Gly Lys Ala Gly Val Glu Ile Leu
Val Gly Glu Asp Gly Val Lys 20 25
30Glu Val Lys Leu Asn Ile Ile Glu Gly Pro Arg Phe Phe Glu Ala Ile
35 40 45Thr Leu Gly Lys Lys Leu Glu
Glu Ala Leu Ala Ile Tyr Pro Arg Ile 50 55
60Cys Ser Phe Cys Ser Ala Ala His Lys Leu Thr Ala Leu Glu Ala Ala65
70 75 80Glu Lys Ala Ile
Gly Phe Thr Pro Arg Glu Glu Ile Gln Ala Leu Arg 85
90 95Glu Ile Leu Tyr Ile Gly Asp Ile Ile Glu
Ser His Ala Leu His Leu 100 105
110Tyr Leu Leu Val Leu Pro Asp Tyr Leu Gly Tyr Ser Ser Pro Leu Lys
115 120 125Met Val Asp Glu Tyr Lys Lys
Glu Leu Glu Thr Ala Ile Lys Leu Lys 130 135
140Asn Leu Gly Ser Trp Ile Met Asp Val Leu Gly Ala Arg Ala Ile
His145 150 155 160Gln Glu
Asn Ala Ile Leu Gly Gly Phe Gly Lys Leu Pro Ser Lys Glu
165 170 175Thr Leu Glu Lys Ile Lys Asp
Glu Leu Lys Ser Ala Leu Pro Leu Ala 180 185
190Glu Tyr Thr Phe Glu Leu Phe Ser Lys Leu Glu Gln Tyr Lys
Glu Val 195 200 205Glu Gly Glu Ile
Thr His Leu Ala Val Lys Pro Arg Lys Asp Ala Tyr 210
215 220Gly Ile Tyr Gly Asp Arg Ile Lys Ala Ser Asp Gly
Glu Glu Phe Pro225 230 235
240Ser Glu Glu Tyr Lys Asn Tyr Ile Lys Glu Phe Val Val Glu His Ser
245 250 255Phe Ala Lys His Ser
His Tyr Lys Gly Arg Pro Phe Met Val Gly Ala 260
265 270Ile Ser Arg Leu Val Asn Asn His Lys Leu Leu Tyr
Gly Lys Ala Lys 275 280 285Glu Leu
Tyr Glu Asn Asn Lys Asp Leu Leu Arg Pro Thr Asn Pro Phe 290
295 300Ala Asn Asn Leu Ala Gln Ala Leu Glu Ile Val
Tyr Phe Met Glu Arg305 310 315
320Ala Ile Asp Leu Ile Asp Glu Val Leu Ala Lys Trp Pro Ile Lys Pro
325 330 335Arg Asp Glu Val
Lys Val Arg Asp Gly Phe Gly Val Ser Thr Thr Glu 340
345 350Ala Pro Arg Gly Ile Leu Val Tyr Ala Leu Lys
Val Glu Asn Gly Arg 355 360 365Val
Ser Tyr Ala Asp Ile Ile Thr Pro Thr Ala Phe Asn Leu Ala Met 370
375 380Met Glu Arg His Val Arg Met Met Ala Glu
Glu His Tyr Lys Asp Asp385 390 395
400Pro Glu Lys Leu Lys Leu Leu Ala Glu Met Val Val Arg Ala Tyr
Asp 405 410 415Pro Cys Ile
Ser Cys Ser Val His Val Val Lys Leu Gln 420
425
SEQ ID NO:46 (length 18, PRT, artificial; consensus L1 site)
Arg Xaa Cys Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
Xaa Xaa
SEQ ID NO:47 (length 18, PRT, artificial; L1 site)
Arg Ile Cys Ser Phe Cys Ser Ala Ala His Lys Leu Thr Ala Leu Glu
Ala Ala
SEQ ID NO:48 (length 18, PRT, artificial; L1 site)
Arg Val Cys Gly Ile Cys Ser Ala Ala His Lys Leu Thr Ala Leu Glu
Ala Ala
SEQ ID NO:49 (length 12, PRT, artificial; consensus L2 site)
Arg Xaa Xaa Asp Pro Cys Ile Ser Cys Xaa Xaa His
SEQ ID NO:50 (length 81, DNA, Escherichia coli)
ggtcatcacc atcatcacca cggctcgatc acaagtttgt acaaaaaagc aggctcagaa 60
aacctgtatt ttcagggagg a 81
SEQ ID NO:51 (length 28, PRT, artificial; fusion protein fragment)
Met Gly His His His His His His Gly Ser Ile Thr Ser Leu Tyr Lys
Lys Ala Gly Ser Glu Asn Leu Tyr Phe Gln Gly Gly
SEQ ID NO:52 (length 10, DNA, artificial; CD-ABI intergenic sequence)
gaggtggaaa 10
SEQ ID NO:53 (length 72, DNA, artificial; hypD-hypA intergenic sequence)
tttacaaata tggcgccctg atgtaggagg tggaaaatgc acgaatgggc gttggcagat 60
gcaatagtaa gg 72
SEQ ID NO:54 (length 75, DNA, artificial; hypE-hypF intergenic sequence)
gtgatcccgt tcctagagtt tgttaggagg tggaaaatga tctgggggag agaatgaaag 60
cttatagaat tcacg 75
SEQ ID NO:55 (length 813, DNA, artificial; modified sequence of expression vector)
ggatccccgt caccctggat gctgtacaat tgacgacgac aagggcccgg gcaaactagt 60
aatcagacgc ggtcgttcac ttgttcagca accagatcaa aagccattga ctcagcaagg 120
gttgaccgta taattcacgc gattacaccg cattgcggta tcaacgcgcc cttagctcag 180
ttggatagag caacgacctt ctaagtcgtg ggccgcaggt tcgaatcctg cagggcgcgc 240
cattacaatt caatcagtta cgccttcttt atatcctcca gccatggcct tgaaatggcg 300
ttagtcatga aatatagacc gccatcgagt accccttgta cccttaactc ttcctgatac 360
gtaaataatg atttggtggc ccttgctgga cttgaaccag cgaccaagcg attatgagtc 420
gcctgctcta accactgagc taaagggcct tgagtgtgca ataacaatac ttataaacca 480
cgcaataaac atgatgatct agagaatccc gtcgtagcca ccatcttttt ttgcgggagt 540
ggcgaaattg gtagacgcac cagatttagg ttctggcgcc gctaggtgtg cgagttcaag 600
tctcgcctcc cgcaccattc accagaaagc gttgatcgga tgccctcgag tcgggcagcg 660
ttgggtcctg gccacgggtg cgcatgatcg tgctcctgtc gttgaggacc cggctaggct 720
ggcggggttg ccttactggt tagcagaatg aatcaccgat acgcgagcga acgtgaagcg 780
actgctgctg caaaacgtct gcgacctgag ctc 813
SEQ ID NO:56 (length 10484, DNA, Pyrococcus furiosus)
aggggttttt aacctttggt tttcaatttt
cgggtttaaa aaggcttttt tatctccctc 60accaacttta gactgggaaa caaaaatgtt
cactaacgaa aatttgagga gtattggtca 120attatgctca ttgggaggtg gtttgtgtga
ggtatgttaa gttacccaag gaaaacactt 180acgagttttt ggaaagactt aaagactggg
ggaagcttta cgctccagta aaaatttcgg 240acaagttcta tgacttcagg gagattgatg
atgttagaaa gatagaattc cactacaaca 300ggacaataat gccacctaag aagttcttct
tcaagccgag ggaaaagctc tttgagttcg 360acatttcaaa accagaatac agggaggtaa
tagaggaagt tgaaccattt attatatttg 420gagtccacgc gtgtgacata tatggcctaa
agatcctaga cacggtatac cttgatgagt 480tccccgacaa gtactacaag gtgaggagag
agaaggggat aatcattgga ataagctgta 540tgccagatga atattgcttc tgtaacttaa
gagaaacaga cttcgctgat gatggttttg 600acttgttctt ccatgaactg cccgatggat
ggttggtaag ggttggcact ccaactgggc 660acaggcttgt tgacaagaac ataaagctct
ttgaagaggt aacggacaag gatatctgtg 720catttagaga ttttgaaaag aggagacagc
aagcattcaa ataccacgaa gactggggca 780acttgaggta tcttctcgag ttggaaatgg
aacatccaat gtgggatgag gaggcagata 840agtgcttggc ttgtggaata tgtaacacca
catgcccaac gtgtagatgc tatgaagttc 900aggatattgt aaacctagat ggagttactg
gatacaggga aagaagatgg gattcttgtc 960agttcagaag tcatggctta gttgctgggg
gccacaactt caggcccaca aagaaggatc 1020gctttaggaa cagatacctc tgtaagaacg
catataacga aaagcttgga ttaagctact 1080gtgtcggttg tggaaggtgt actgcattct
gtccagccaa tataagtttt gtaggcaatc 1140ttagaaggat tttaggactt gaggagaaca
aatgtccccc aacggttagt gaggagattc 1200caaagagagg atttgcatat tcctctaaca
ttagaggtga tggagtatga tgttgccaaa 1260agagattatg atgccaaatg ataatccgta
tgcccttcat agagtcaaag ttctaaaggt 1320ttactccttg acggaaacgg aaaagctttt
cctctttaga tttgaggatc ccgagttggc 1380agagaagtgg acgttcaaac ctggacagtt
tgtccagctg acgatacctg gagttggaga 1440ggttcccata agtatatgct cttctccaat
gaggaaagga ttctttgagc tctgtataag 1500aaaggcagga agggtcacaa ctgttgtcca
tagactaaag cctggcgata ctgttcttgt 1560gagagggcct tacggtaatg gattcccagt
ggatgagtgg gaaggaatgg atctactatt 1620aatagctgct ggccttggaa ctgcacctct
taggagcgtc tttctctatg caatggacaa 1680caggtggaag tatggaaaca ttaccttcat
aaacaccgca cgttatggga aggatctcct 1740cttctacaag gagctggagg caatgaaaga
cctagctgag gctgaaaacg tgaaaatcat 1800ccagagcgtc actagggatc caaactggcc
gggcctaaag ggtaggccac agcagttcat 1860cgttgaggcc aacacaaatc caaagaacac
tgcagttgca atctgtgggc ctcctagaat 1920gtataagtca gtgtttgagg ccctcatcaa
ctacggttat cgcccagaga acatcttcgt 1980gacattggag agaagaatga aatgtggaat
cgggaagtgc ggccactgca acgtcggaac 2040gagcacgagc tggaagtaca tctgtaaaga
tggaccagtc ttcacgtact tcgacatagt 2100ttcaacccca ggactgctgg actgaggtga
ggaaaatggg aaaagttagg attggatttt 2160acgcattaac ctcgtgctac ggctgtcaat
tgcagctagc tatgatggac gagttattac 2220aacttatccc aaatgctgaa atagtttgct
ggttcatgat tgatagagat agcattgagg 2280atgaaaaggt cgacatagct tttatagaag
gaagcgtttc aactgaggaa gaagttgaac 2340tcgtgaaaaa aattagggag aatgcaaaga
tcgtcgttgc ggttggagct tgtgctgttc 2400aaggaggagt tcagagctgg agtgaaaagc
cattagaaga gctctggaag aaggtttatg 2460gagacgcaaa agtcaagttc caaccgaaga
aggctgaacc agtttcaaaa tacataaaag 2520ttgactacaa catctacggt tgcccaccag
agaagaagga cttcctctac gccctgggaa 2580cattcttgat tggttcatgg ccagaggata
tagattatcc agtttgtcta gaatgtaggc 2640tcaatggaca tccatgtatc cttcttgaga
aaggagaacc ctgtctaggt ccagtaacaa 2700gggcaggatg taacgcgaga tgtccaggat
ttggagttgc gtgtatagga tgcagagggg 2760caatagggta cgatgtagct tggttcgact
ctctagctaa ggtgttcaag gagaagggga 2820tgacaaaaga ggagataatt gagagaatga
aaatgttcaa tggacatgat gagagggttg 2880agaaaatggt tgaaaaaata ttctcaggtg
gtgaacaatg aagaacctct atcttccaat 2940caccattgat catatagcaa gagttgaggg
gaagggtggt gtggagataa taattgggga 3000tgatggagtc aaggaggtca agctaaacat
aattgaaggg cccagattct ttgaggccat 3060aactattggg aagaagcttg aggaagctct
ggccatttac ccgagaatat gctcattctg 3120ttcagccgcc cacaagttaa ccgcattaga
ggctgcagaa aaggccgtcg gttttgtccc 3180aagggaagag atacaggccc ttagagaagt
actatacatc ggagacatga tagagagtca 3240tgcccttcac ctatatcttc tagttcttcc
cgactacagg ggctactcga gcccacttaa 3300gatggtgaat gaatacaaga gggagataga
gatagccctt aagctgaaga accttggcac 3360ctggatgatg gacattctag ggtcaagagc
catacaccaa gaaaatgcgg ttttgggcgg 3420attcggaaag ctccctgaga agagtgtcct
tgagaaaatg aaagccgagc ttagggaagc 3480cctaccactt gccgagtata cttttgagtt
atttgcaaag cttgagcagt acagcgaagt 3540tgaagggcca ataacacact tggccgtgaa
gccgagggga gatgcttatg gaatttatgg 3600agattacata aaggcaagtg atggggagga
gttcccaagt gaaaagtaca gagattatat 3660aaaggagttc gtcgttgaac acagttttgc
aaagcacagt cactacaagg gcagaccctt 3720catggttggg gctatatcta gagttattaa
caatgctgac ctcctatacg gcaaggccaa 3780ggagctgtat gaggcaaaca aagacctatt
aaagggaaca aatccgtttg caaataactt 3840agcccaggcc ctcgaaatag tttactttat
agagagggca atagatctgc tcgacgaggc 3900tctcgccaag tggccaatta agcccaggga
tgaagttgag ataaaggacg gctttggtgt 3960ctcaacgact gaggctccaa ggggaatctt
agtctatgcc ctcaaagttg agaatggaag 4020ggtttcttat gccgacataa taacacctac
agcattcaac ttggcaatga tggaagaaca 4080tgtaagaatg atggcagaaa agcactacaa
tgacgatcca gaaaggttaa agatactggc 4140tgagatggtt gttagggctt atgatccatg
catatcttgc tcagtccacg tggttagact 4200ttaatccttt ttatctattt ttgttgagta
cttgtggaga ttctcattca catcacaata 4260ggagagctct tctcttgagg agatgataac
aatgcccttc tctttgagaa tttcgaggat 4320agactttagg actttatgtt ttgagtcctc
atcaatggca acaactggat cgtcaagaac 4380ataaatctcg gcattcacta gcaaggtgga
tgccaattga actcttctaa ttgttccctg 4440ggaaagctct cccagcttct tctttaaatc
caagacctcc acggattcaa gtgcatccat 4500aatttcattt ttattaactt taactccata
aagactggcc actgctttta aataatcctc 4560aacacttatt ttcctgggca cgattatttc
ttcaggaagg aaaaatattt tgcccttaac 4620ttttgttata gggactccat tataaattat
ttctcccttg aggggtttca aatatgttga 4680tattgttttt aaaagtgtgg tttttcctat
cccatttgga ccgtggaagt tcacgacatt 4740acctttctct atggtcattg ttattctttc
gagaactggt ttatcataac caacactaag 4800atctctaatc tcaagtttca ttcccatccc
tcccaaattc ctattattcc agaaatagat 4860actaaaagga gggggattgc agcaatacca
tttcctttgc taaccaatat tattcctata 4920atgaagggag ctatgaatcc aagaatccag
ccacacaact ttctaattga actaacttcc 4980actgtcggtt cccacacaaa cattaatttc
ttgaaatcta tagttacttt tacaggtgtc 5040attaggggaa gatattgaag aacttcatga
acatataccg ctccaactag tggcaacact 5100acatttttca ctatagattt catatagcaa
ttagtgaatt cccctgttat tttacctata 5160agaaaactaa tccaaagtac tagagctaca
agaaatccta catatatgct taccattttt 5220atgaaattta aaaattgcct agacatttct
tatcaccctt tctagcttta tcctcacaaa 5280atatgcaagt ggagagataa gaattaacaa
gggaattacc cacataggaa taattttcct 5340tataataaat ggagcaccca ggataattag
atacagaaat aggaagctgt ttctttcaaa 5400acttggcaat gttattgata atacaactct
acttatcgct aacatgaaaa ataagatata 5460tagtgtccct aaatactccc tttcaagaat
tgccaaggaa taaaatgtca acggaacaat 5520tagcgaaatc aagaagagta gaatttcctt
tattagtctt cttaaatagt tctctggttt 5580tagatagtgg agataggcta tataagaatc
aacataacta tcaccaacca ggaataaagg 5640ccaaatcata ggggcagtta tgcatatagt
aactagcgta cttgcaatcc tctctataat 5700atagaatttt tgcttatcta caatgatatg
taatggtaac aaggctttaa atgctccaac 5760tccacaccca aatttcactc cttgcattct
caggtgctgg gccataagtg tgaaaactat 5820agagagaata atagcgattc ccctaatttc
aaaggaaagg gtagaaatat agatgtttct 5880aactagataa actcttatca gcctatcttt
tagaacagag gctaccaagt atgcaattac 5940gaggatgata taacctatca gagtcatctt
tctaccacta attgctagga gaccaagaat 6000tgtggctatg cacttaattt taaatgaaac
agacagtgga agtatagata aagcggcaaa 6060tacaataata ttggagggca aaaaacctgg
gtaaagatag gagcaactaa ggatggaggg 6120gagatttata actactgaga caaatagcac
aatgagatca gtgtcgggtc tatatttgag 6180gatcactcca gttttcttag gatcaaagga
atttgaaaag agaagtggaa ttgcacctaa 6240caaggcaaaa actattatgt cttcgataat
cttacctttt aagaagattg ttaatggtat 6300aagagagatc atcccagcga gataattata
gttttttata gataccaaaa tatggtatct 6360taatatttcc actattctca ctttcattac
ctcctaaatc ttctaaggat ttttattgag 6420ctcacaaccc ccaaaagata acataggatt
cttgttattg gagttacctt tactgagaca 6480taatatggct catttattgc attaaataga
atgccctgcc cgggtggtat tttatttgta 6540gtcaaaatga agattgccaa caaataacta
actaaaatag aaaatgaaag agctaagggt 6600attgccgaaa ccaaggctag ggcaaaaaat
agttttttag atcttaccac acgaaatcac 6660ctcctatcgc agttggaagc gctggatctg
taggattatc tggcatacat tcacagaggc 6720atttgatctc aactcctgaa atagttgctt
ttgtctgtgg cccacagtcg gacattatac 6780ctccacctgt actacacaag attgggcact
ccctacaata gccatagcac atctttgtgt 6840agtatgtcgc tgtagcagct attattacaa
ataaaactat tacccacaat cctacaccat 6900aatacttttg ttttctttaa tacatatata
atcaccattt aaattatgct actataaatt 6960ttataaaatt ttcgagaata tcactataac
agaagctatt aaaatataat aattattcct 7020aatttgatcg acgatactgt caggataact
ggggtatcac ctcttgaagc cattcagtca 7080catcaccagg cggtccacca aaccgagaat
gaattctaac aaaattatac cagaatgaaa 7140acagaaaaac aaacctgtga accctcctcc
agtctctagc cctgaagtta ttccagaaac 7200gctttgttct ctctttaaca gtcctaaacc
agcgctcaac acagttcctc ggcccgaaag 7260tcacatgcag ataatccagc ccgagagatt
taaacgctga tttataccac ggccctttgt 7320caaccaggaa aattggctgt ccctcgcagg
atttcaaaac aactagaatg aagtccctgg 7380caatccacca gttcctaacg cttgtaatcc
atactgctag gatttctttg ctctcaacgt 7440cgattgcagc ccagagaaat ctcttctggc
cgttgatctt tatcactgtc tcgtcaattg 7500cgatgaagtt tctctgtttt ttgactgcga
ggattttcgg ctggtaaact gctttcgcga 7560atttttggac tgtttcccag actgttgtgt
ggctgatttc gaggattgtt cctacctgtc 7620tgtaacttag tccgtgcagg tacaggttta
ttgccctggt tttctttttt gctgggattt 7680tgttccggcg aaaggttttt aagactgaaa
ccagtaagta gataatggtt tcagtcctca 7740tttctctccc cttttctgaa gaggtatcag
aaacttaaac ctaacgtccc actgcttatc 7800ctgacagtgt cttgatcgac tttagaaaca
tttttattct tgtttatgtt cccttagact 7860atgagcacca ggggagactt gatcagaatt
ttaggtgaga tagaggaaaa gatgaacgaa 7920ctgaaaatgg atggctttaa ccctgacata
atcctttttg gcagagaggc ttataacttt 7980ctttcaaatc tcttaaaaaa ggaaatggaa
gaggaagggc cttttacgca tgtctctaat 8040atcaagatag aaattcttga ggaattagga
ggagacgcag ttgttataga ttcaaaagtc 8100ctaggcctag ttcctggggc cgcaaagaga
atcaaaatta ttaagtagcg ctttccaaag 8160tacaggagat gctcacttcc tccttagcta
ggattagacc aaaatataac ataaaggagt 8220tgagtgttgc ccaggagggg actagcctcc
ttgatattaa taaagggtct ctgcgaagag 8280ttttgtcctg tatcatatta aagagttcgt
taattcttgc atctgcaagt tgaaggccta 8340accttgtccg agatttggct gtaatgactt
taacagagta atgtttaacc aaaaaaagaa 8400gactttaaaa ccttccactc acaataagta
gacgagtcaa caacaatttg agggaaaaga 8460catgggaaat gaaggtgtcc acccccacct
gcggaaaagg ttttggagag agatgggtat 8520aaatgcagaa tttgtgatca cagctatctc
gatattcatt acaaggacgg gaatgtagag 8580aacaagaatt tagaaaattt gatagttttg
tgcaaacaat gtcattatcg acttcaccaa 8640aaggaaagga tggaaagcat taaacaagct
ttcgaggatt tcctcgatga actttctaaa 8700aatcctattg aagttgttat agatttcagt
ttcaaaaaaa ttgtagagag taatgaagaa 8760aaaatccgaa gagagattat acagggattt
actcgtcctt ttggtgttat atcaaggatc 8820caagagaaag ttagggatgc aataatgaag
gaaatcgagg aggaaataga aaaagagcaa 8880gcaagtactc ctgaacatct ccgaaaggtt
gttcttgaaa gaaataatta tagatgttca 8940gtgtgcggat acggatattt agaggttcac
catgtggatg gaaatattct aaataacacc 9000ttggataatt tagtaaccct ctgtagaagg
tgtcatcgta aagtccatta tcatccaagt 9060tttcatacaa caccggagga tatggacaaa
tgtattagaa gttttcatca tgagttttat 9120agtacgatct atgaaataat gaagaacaaa
aagggaaaca ttagaataag cattaaattc 9180gatcaactag gtgttaaagg tgtaaaaatt
agtagagctc aatttaaaag aattaatggg 9240ctctttaatc atgaagtcat aaatgatggt
atttttaagc agtgggaaag agaaattaag 9300aattatttaa gccgacttga atgggaacag
caaaaagaaa tatatagaaa tgtatacttc 9360ttgctagaat gtattttgcc taaagattca
tttgaagcgt ttgttaacct tgcaaggaaa 9420ggaaaatttg atagaagaac attaagggaa
gcaaagaaag tactaaagaa ctcaattaaa 9480taatttttgt aatttttccc tggaaataca
gctcctattc tactattttt aaagtgctgt 9540cttcttcttt tataaaccca tattttttgt
tactctttag gaagttcttt attatttcac 9600aaagctcagg gttgagagat cttttcagtt
acacacttct tattattcct aaagtacgaa 9660tagaacttag gacttccact ggagtggtat
actccaagta tcttcttgtt tctcagcttt 9720tcagctatgt cgggaaaaat cttcttgtat
atttttttct ttttagccaa ctttttcata 9780atttcgtagg actctcgccc caaacataaa
attaagtccc ccgctatgaa atcaagtacg 9840cgacctaaaa acttccctat gcagttttgt
agggtttctg caggtagtgg gactctaaca 9900ttttcactct cacagtatac taattctcca
aacagaattg tacttcctcc ggtaaattaa 9960aacagtcttt gctttcttta agaattctag
tgtgtttgca aagtatttat agtggaaagg 10020atagttacgg tagttcttga taaattctga
gacccatgtg tcatggattt ctttaaaagc 10080tttgatggcg ttacttttgc tatattctga
agtaaataat ccatgctttt tcaggatgtc 10140tgtgtagtaa agtctttcaa atggtaagtg
ggggcctgga tttattccaa aaataccaat 10200ttttgctttt tctctggaag gctcgccttg
aagtgtagaa aggattagaa cccttggaat 10260aatgccctca tccttggaat tctttatacc
ttcacacttc tccttctctg agcataagat 10320catctcactg cctaactcca agaatgcctc
aggcatcatg gtaatgattt tacacagaga 10380atttaataat aatttcggat ttctcaatgc
ttcttaattg agaagctaca ttttgaaaat 10440tgagaaaaat caaaggtacc agtgtgtctc
agaaaagtga atat 10484
<210> 57
<211> 10029
<212> DNA
<213> Pyrococcus furiosus
<400> 57
agtaataaaa ctacataaaa cttttaccct agttcccatc aggtcgttag aattattaag
60tacttaaatt ttttgattgg ttggtggttg ttatgaactt tcagcaggaa atcctgatca
120taaaatccga aatctatccg atagtcagca aacactaccc gaaaaacact cgcagggaag
180taatcagcct ctacgacctg ataaccttcg caatactagc ccacctgcac ttcggaggag
240tttacaaaca cgcttacgga gccctaatcg aggaaatgaa actgttcccc aaaatcaggt
300acaacaaact aacagaacgc ttgaacaggc acgaaaaact tctgctccta gcgcaggaag
360aattattcaa aaaacacgcc agagaatacg ttagaatact ggactcaaaa cccattcaga
420ccaaggagtt ggccagaaaa aacaggaagg ataaggaggg ttcttcagaa atcatctctg
480aaaagcccgc agttgggttt gttccctcta aaaaaagttt tactatgggt acaagctgac
540ctgttactct gatgggaacc tgttggcttt gctgtccgtt gatccggcaa acaagcatga
600tgtgagtgtt gtcagggaaa agttctgggt gattgttgag gagttttccg gctgttttct
660gtttttggat aagggttacg ttagtagaga acttcaggag gaattcctga agtttggcgt
720tgtttacacg ccggtgaagc gggagaatca ggttagtaat ctggaggaga agaagtttta
780caagtacttg tctgactttc gcagaaggat tgagactttg ttttcgaagt tttctgagtt
840tcttctgagg ccgagcagga gtgttagttt gagggggtta gctgtcagga ttttaggggc
900gattctggcc gtgaatctgg acagattata caacttcaca gatggtggga actagggtta
960aaactttttg atcgtcaatt aatcataata atggcaaaag tttacttagt ggattattat
1020gccacttatg atcttttcat aggggttagt atggaaaacc atatcaagat attgaaggac
1080atgaagtggg gggtaagaaa tggttcgtgt tacgctcgtt aactatacaa agaggccctt
1140agaaacaata acttgggctg cccttataag ctattggggg gaatggagca cggaatcatt
1200tgaaaggata agtgagaatg atgtagaaaa gcatctccct cggatattgg gttatggtca
1260tgagagcatt ttggagcatg caacgtttac tttctcaatc gaaggttgta gtagggtttg
1320tactcatcaa cttgtgaggc atagaatagc cagctacacc cagcaaagcc agcgttacat
1380tgttcttgac gaggagaacg ttgaggaaac gtttgtaatt cctgaatcga taaagaaaga
1440tagagagctt tatgaaaaat ggaagaaggt catggctgag acaataagcc tttacaagga
1500gagcataaat aggggagttc accaggaaga tgctcgattc attcttcctc aagctgtgaa
1560aacgaagata attgtgacga tgaacttgag agaattgaag cacttctttg gccttagact
1620atgtgaaagg gctcaatggg agattaggga agttgcatgg aagatgttag aggagatggc
1680gaagagggat gatataaggc cgataataaa gtgggctaaa cttgggccta ggtgcattca
1740gtttggctat tgtcccgaga gagatctaat gcctcctggg tgcttaaaga aaactagaaa
1800aaagtgggaa aaagttgcgg aaagtaagag ctaaattgtt atattgagta aaagctttct
1860ttctttattt gtctttatgg caaaatccca gaagttcagc tattgaatta gagaactgtt
1920cgtcactgaa agtaaacttc tatgggattc ttctgaatta tatggtaagg tttggaaaat
1980ttggacataa aagtcttaaa gtttcctttt tcaactctaa actagggtga gctaatggat
2040actgaaaaac ttatgaaagc cggagaaata gcaaaaaaag taagagagaa agctattaaa
2100cttgctagac ctgggatgtt gttgttagaa cttgcagagt ctatagaaaa gatgataatg
2160gaacttgggg gtaaacctgc tttcccagta aatttatcaa ttaatgaaat tgcagctcac
2220tatactcctt acaagggaga tactactgtt ctgaaagagg gggattatct aaagatcgac
2280gtgggggttc acatagatgg atttatagca gatactgcag ttacagttag agtagggatg
2340gaagaagatg agcttatgga ggctgccaag gaagcgttaa acgccgcaat ttctgtagct
2400agggcgggag tggagataaa ggaactagga aaggcaatag aaaatgaaat taggaagaga
2460ggattcaaac caatagttaa tctaagtggg cacaagatag aaagatacaa gcttcatgca
2520gggattagca ttccgaacat ttatagaccg catgataact atgttttaaa ggaaggagat
2580gttttcgcaa ttgagccttt cgctactata ggtgctggtc aagtaattga ggttccccca
2640accttaatct acatgtacgt tagagatgtt ccagttagag tggcccaagc taggttcctt
2700ttggctaaga taaaaaggga atatggaacc ctaccctttg cctataggtg gcttcagaat
2760gacatgccag aaggacagct taagttggcc ctaaaaaccc tcgaaaaggc tggagctata
2820tatggctatc cagtgcttaa agaaattaga aatggcattg tggcacaatt tgagcacaca
2880atcattgttg aaaaggattc tgtgatagtg acgacagaat gagttaaact ttataagttc
2940tcatgtatca agaaattggg agcgccgggg tagcctagtc agggaaggcg cgggactcga
3000gatcccgtgg gcgttcgccc gccggggttc aaatccccgc cccggcgcca tttgttaagc
3060acttggaggt ttgataatat ggcatttcta aaggtagtgt cattggaaga agcaatttca
3120ataattaata gctttagact tgaaatagga tttgaggaag ttactttaga taaagctctg
3180gggaggatag ttgcagagga tatttattcc cccttggata ttcctccctt tgatagatcg
3240accgttgatg ggtatgctgt tagggcggag gatactttta tggccagtga agctaatcca
3300gtggaactca aagtaattgg agaagttcat gccggagaac aaccttcagt aaagttaagc
3360aagggagagg cggtctacat tacaacgggg tcaatgatgc cagagaacgc aaatgctgtg
3420attccttttg aggatgttga gagagaagga gatattataa gaatttataa gcctgcatac
3480ccaggtttag gagtcatgaa gaaaggaact gacataaaaa agggccaact cttaattaga
3540agaggaacta agctaacgtt taaagaaact gccctgcttt ctgctgcggg atttttaaaa
3600gtaaaggtct ttaaaaagcc taaagttgcg gtcataagta cggggaatga aattgttctc
3660ccaggtgaag agcttaggcc tggccaaata tatgacatca atggtagagc aatagttgat
3720gccgttaatg aattgggtgg agagggaata ttcgttggga ttgccaggga tgacagagaa
3780agtctcaaaa aattaatact tcaagcctta gaagttggag atattatcgt tattagtggg
3840ggggcaagtg ggggaataaa agacttaaca gcctcggtaa tagaggaact tggagaggtt
3900aaagttcatg gaattgcaat tcagccaggt aaacccacaa taataggggt tataaacggt
3960aagcctgtct ttggcctacc tgggtatccg acaagttgcc taacaaactt caccctctta
4020gttgctcccc tgcttttgag gctacttgga agggaaggaa aaattaagaa ggttaaggcg
4080aaaattaagc ataaagtatt ttcggtaaag ggaagaagac aattcctccc agttaaactt
4140gagggagatg tagcggttcc tatcttgaag ggaagcggag cagtcacaag ctttgtggag
4200gcagatggtt ttgtggaaat tcccgagaat gtagaaagcc ttgatgaggg agaagaagta
4260acggtaacgt tgttctcgtt ttaggaggtg atagtatggt caaggttaag gttaagtact
4320ttgctagatt taggcaactt gcaggagttg atgaagagga gattgagctt ccagagggag
4380ctagagttag ggacttgata gaagaaataa agaaaagaca tgaaaaattt aaggaggagg
4440tctttggaga aggatacgat gaggatgccg atgttaacat tgccgtaaat ggaaggtatg
4500taagctggga tgaagagtta aaggatgggg atgttgttgg agtatttcct cccgtaagcg
4560gaggttaaca tttacatact tttacataaa cttctcttct cctgggtcca tctaactcta
4620caaagagaat gctctgccaa gttcctaaca taagttggcc atttactatt ggaatagtca
4680cgcttgggcc aagtattata gctctgaggt gagagtgggc gttgttatct atagaatcgt
4740gtctgtatcc tgcacctttg ggaattaatt ttgagagaat attttctatg tcgttaagga
4800gccttggctc gttctcattt actattattc ctgtggtggt atgcctagta tagacaacgg
4860caattccatt atcgatgcca ctttttctaa cgatttcctg gactttttcc gttatatcta
4920ttatttcaac ttctttggaa gtccttatag tgatggtttc aatcatattt cttcccctct
4980agataccttt ttatcatctc cctagcgttt tctatatgct tatttgcctc ttcttcatta
5040atgtttttta acgtggccct aacagtaaca atgctctcaa aggcctccaa taacctttga
5100tctgttgtct cttttagaat tctttttagc attaagtatg cttcgtctat gctctcctct
5160cttagggttt tcttagatag tccttcgagg attcgatcta gaatgaatat tctatcttgc
5220aatagcttcc ttcttagtat taatcttctc attgactttc ccctctacca cttttactaa
5280aagttcggaa gcaagttttg aagcaatacc tctatcttta atattcaaaa caacgtcaat
5340agcatctcca acacgcccaa ttctaatgag ataatgggct aggaatccta aggcaactga
5400cctgtgccgt tcgctattaa ttctttttat tactctaact gcctgttgaa ggttgttgag
5460ctctaaataa tactttgtta ttcctactag tatgtcttcg cttattccct ctttctcgag
5520gaggacttga atcattggct ccatcttggg agaacccctc tctagaatcc taaatatgat
5580atccttaacc attaggacca tatcaggggg aaggctttcc attaaaattt tcagcttatc
5640taagtcttct ttagaaagta gagaagttag aatttcccta actattattc tttgcgtggt
5700gggcggtatt gtctttaata cttcaatcga ttgttggata aatccgtgaa ttgcaaatat
5760gtaggcaata tcttccctta tatcttctcc cactttttta gccagctcct cactctcttc
5820tatcaaaatt tttactattt ctgagttttc ctcattattc ttcaaaaatt ctagaacctc
5880atttaatgcc tgtgctatcc agagtttgct ccctatagag tttattaact caagaacaag
5940cttgtattct cctagtgaaa gtagtggttt aattgacttg actattgcct cctctctata
6000gggctcctca atagtttcga gaattaggag tactttcctt cttttatata ttttgtttag
6060tttttcccct tccaaatttt ctaaaacttt ttcgagtatt tcaagaagtt ttttgtttct
6120aagcttcttg ttcttaatct ccacggcata aaataccgct tcgtcagtgt cttttaatga
6180aagcaaataa tcgattgctt cggcttttac aatgtcctga atttcctctg gaagaagttg
6240tgccgatgat atggcctctc taaatgcttt ctttgctgat ttaagccctg ctaaacttgt
6300tgaatatcca atggcgagaa gagctcttac taaaatataa ggatcttcta tttttgataa
6360ttcttcgaaa gcttttctga atgcttttcc tgccctggga tcttttattt tagacaaata
6420tactcctatt ctcccatatg ttaataccct aacgaatggg tctggtatcg agggtactaa
6480ctctaatatt tcatctatta ccataatacc tcaccataag attatacatg gcaaaacgca
6540cttactaagg taaatttatg gacatagata ttttaatctt ttcgtttttg aagcaaatct
6600ttttgtagga agatgatgaa ctaatggttt caaaatgttt aaataaaagc ttaaggtgta
6660gtcaaaatgt tgtctcaaat ttaaagaaaa gaggcgaaac aaagaaaata gagggaagat
6720actttacttc ttgagctttt cacacttctt tacccactcc tcaagaacgt ctctgagctt
6780tggcttgcct atttcctcaa gctcatactt gactcttact gcaggcttgt tgaggttcat
6840gaatcttctt aggtctactg gagttcccat aacgacaacg tctgcatctg ctctgttaat
6900tgtttcctct agctctttga tctgcttctt gccgtatccc attgctggga gtatgttgct
6960taggtgtggg tacttcttgt atgtttcaat tattgaccca acagcgtatg gccttggatc
7020tactatctcc ttagctccga acttcttggc tgctatgtaa cctgcaccga agctcattcc
7080accatgggtg agggtcggac catcctcaac tacgagaacg cgcttaccct tgattagctc
7140tggcttgtcc acgaagattg gtgatgctgc ttcaatgact atagcatttg gatttatctt
7200ttcaatgttc tctctaatct tctgtatgtt ctctggtggg gctgtgtcta ttttattgat
7260tataataaca tcagcacttc tgaagtttgt ttcacctggg tggtgtgtca actcatgacc
7320aggtctgtgt gggtcagtga caactatcca taagtcgggc tcgaagaatg ggaagtcgtt
7380gttcccaccg tcccagagga ttatgtcggc ctctttctct gcctccctca gtatcttctc
7440gtagtcaact ccagcgtata ctaccattcc tctctctagg tatggctcat actcttctct
7500ctcctcaatt gtacactcat atctgtcgag gtcctcaaag gtcgcaaagc gctgaacaac
7560ttgctttctt agatcaccgt agggcattgg gtgtctgact gcaactacct tgaatcccat
7620ctcttggagg atttgggcca cttttcttga ggtctggctc tttccacatc ctgttctgac
7680tgcagttacg gctacaacgg gcttgcttga ctttagcatt gtgctctttg gtccaagtag
7740ccagaagtca gccccagcac tgtgggctct acttgctaag tgcatgacgt gttcgtgaga
7800aacgtcagag tacgcgaaaa ccactatgtc aacatcatgc tctttgatta tcttttccaa
7860atcatcttct ggtagaattg gaattccatt tggatacagt tcaccagcta gctctggggg
7920atatattctc ccctctatat ctggaatttg ggtggcagtg aaggcaacaa cctcgtaatc
7980tgggttatct ctgaaaaaga cgttgaagtt gtggaagtct ctacccgcag cacccagaat
8040tacaaccctt ctcctttttt tctcggccat tttgatcacc tcagaatgtt ttatttcgag
8100ataatactca atctagacat ttataacgat tttcatttaa attggaaata atttttcgaa
8160tgattttaag taaaagttgt gtaaagtcga aaatatttcg aataaatgtg tgtattatta
8220aagggattaa gaaaagggaa aaggttgaaa acttcaagtt tcaaaaaccc ctaaaaagtc
8280taaatcaaac cctctaatgg tgggagtaaa atgtgccttg caatcccagg gaaagtggtg
8340gagattaaag gtaacgttgg aatagtggat tttggaggaa tacggagaga ggtaaggtta
8400gatcttttga gtgatgttaa agttggcgat tacgttatag ttcacactgg ctttgctata
8460gaaaagttag atgagaggag agctagagaa attcttgaag cctgggaaga agttttctca
8520gtaattgggg gtgagtaaat gcttgaaaaa tttggagaca aagctgtagc tcaaaagatt
8580ttagaaaaaa ttaaagagga agctaaaggg atagaagagc tacgatttat gcacgtttgt
8640gggactcatg aggacacagt aactaggagt ggaatcagat cacttcttcc agaaaatgta
8700aaaatcatga gtggcccagg atgtcccgtc tgtataaccc ccgttgagga catagtgaag
8760atgatggaaa ttatgaaagt tgcgagagag gagagggaag aaattattct cactactttt
8820ggtgacatgt atagaattcc aactccaata ggaagctttg cagacttaaa gagtcagggt
8880tacgatgtga ggatagttta ctctatatac gactcctata aaatagccaa ggaaaatcca
8940gataagcttg tagtgcactt ttctcctggg tttgagacta ccgccgctcc aacagctgga
9000atgcttgaga gcattgtgga agaggggcta gagaacttta agatttattc cgttcatagg
9060ttaacccctc ctgcagttga agctctccta aatgcgggga ctgtttttca cggtttaata
9120gatcctggtc atgtctctac aataattggg gtgaaaggat gggcgtatct cacagaaaag
9180tttggaattc ctcaagttgt ggctggcttt gagccagttg atgttttact cggaatactt
9240attctcatta ggcttgtgaa gaggggcgaa gcgaaaataa tcaacgagta taatagagtt
9300gtaaagtggg aaggaaatgt caaggcccaa gaactgattt ggaagtactt tgaagttaaa
9360gatgcaaagt ggagggccct aggagtaatt ccaaggagcg gattggaact taagaaagag
9420tggaaggagc tagaaattag aacttattac aatcccgagg ttccaaagct cccagatctt
9480gaaaaaggat gtctctgtgg ggcagtcctt agaggattag ccttaccgac ccagtgccaa
9540cactttggaa agacatgtac accaagacat ccggtaggtc cttgtatggt ttcgtacgaa
9600ggaacttgtc acatatttta caaatatggc gccctgatgt agtttttatt acgcaaaagt
9660aatataccac tacagcataa accccaaata tggattatcg aaaaattctc gatattcatc
9720atagttttgg ttgttttttc atcagttgct cttctgtcaa agccttatct tccaagagaa
9780cagaaaagaa taacgtactc aggagaaaag ataatcttgc ctgccccaag aactgaagga
9840gaaatgagtg ttgaagaagc tattgcaaaa agaaggagca ttaggacata caaaaatgag
9900cctctaaaga tagaggagct tggtcaacta ttatgggctg cacaaggtat aactcatgaa
9960tataagaggg cagccccaag tgcaggagca acatatccct ttgaaatctt cgttgtcgtt
10020ggtaatgtc
10029
<210> 58
<211> 12
<212> PRT
<213> artificial
<220>
<223> Gateway attB1 site
<400> 58
Gly Ser Ile Thr Ser Leu Tyr Lys Lys Ala Gly Ser
1               5                   10
<210> 59
<211> 7
<212> PRT
<213> artificial
<220>
<223> TEV protease recognition site
<400> 59
Glu Asn Leu Tyr Phe Gln Gly
1               5