Patent application title: THERMOCELLULASES FOR LIGNOCELLULOSIC DEGRADATION
Inventors:
Rolf A. Prade (Stillwater, OK, US)
Hongliang Wang (Tempe, AZ, US)
IPC8 Class: AC12P1914FI
USPC Class:
435 99
Class name: Micro-organism, tissue cell culture or enzyme using process to synthesize a desired chemical compound or composition preparing compound containing saccharide radical produced by the action of a carbohydrase (e.g., maltose by the action of alpha amylase on starch, etc.)
Publication date: 2015-04-02
Patent application number: 20150093791
Abstract:
Thermostable cellulase enzyme systems comprising at least one each of a
thermostable endoglucanase, an exo-processive-endoglucanase, and a
β-glucosidase carry out the complete, coordinated hydrolysis of
crystalline cellulose to monomeric glucose.
Claims:
1-5. (canceled)
6. A method of hydrolyzing cellulose, comprising the steps of contacting said cellulose with one or more recombinant cellulase enzymes with an amino acid sequence represented by an amino acid sequence selected from the group consisting of SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 12, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 30 and SEQ ID NO: 32; and activating said one or more recombinant cellulase enzymes by the application of heat.
7. A method of hydrolyzing cellulose to produce glucose, comprising the step of heating said cellulose to a temperature in the range of 70° C. to 100° C. in the presence of one or more recombinant cellulase enzymes, wherein said one or more recombinant cellulase enzymes includes: at least one recombinant endoglucanase selected from the group consisting of Termocel 3, Termocel 4, Termocel 5, Termocel 6, Termocel 9 and Termocel 10; at least one recombinant exoprocessive-endoglucanase selected from the group consisting of Termocel 1 and Termocel 2; and, at least one recombinant beta-glucosidase selected from the group consisting of Termocel 7 and Termocel 8.
8. A method of fermenting glucose to produce ethanol, comprising the steps of providing a source of cellulose; heating said cellulose to a temperature in the range of 80° C. to 100° C. in the presence of at least one recombinant endoglucanase enzyme selected from the group consisting of Termocel 3, Termocel 4, Termocel 5, Termocel 6, Termocel 9 and Termocel 10; at least one recombinant exoprocessive-endoglucanase enzyme selected from the group consisting of Termocel 1 and Termocel 2; and at least one recombinant beta-glucosidase enzyme selected from the group consisting of Termocel 7 and Termocel 8; wherein said step of heating is carried out under conditions whereby at least a portion of said cellulose is hydrolyzed to glucose; and fermenting said glucose to produce ethanol.
9. The method of claim 8, wherein said step of fermenting is catalyzed by one or more yeasts.
10. (canceled)
Description:
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application is a divisional application of issued U.S. Pat. No. 8,847,031 B2, issued Sep. 30, 2014, and claims the benefit of U.S. Provisional Patent Application Ser. No. 61/079,208 filed on Jul. 9, 2008 and international patent application No. PCT/US2009/050080, filed Jul. 9, 2009, and incorporates said applications by reference into this document as if fully set out at this point.
BACKGROUND OF THE INVENTION
[0003] 1. Field of the Invention
[0004] This invention generally relates to thermostable enzymes capable of degrading (hydrolyzing) cellulose at high temperatures, and the incorporation of nucleic acids coding for one or more of such enzymes into a host, and, more particularly, a host that produces or is composed of cellulosic material.
[0005] 2. Background of the Invention
[0006] Cellulose is a polysaccharide consisting of a linear chain of several hundred to over nine thousand β (1→4) linked D-glucose units [formula (C6H10O5)n]. Cellulose is the most abundant organic compound on earth, making up about 33 percent of all plant matter, about 50 percent of wood, and about 90 percent of products such as cotton. In nature, cellulose is present as part of the lignocellulosic biomass of plants, which is composed of cellulose, hemicellulose, and lignin. The carbohydrate polymers (cellulose and hemicelluloses) are tightly bound to the lignin by hydrogen and covalent bonds.
[0007] Many highly desirable products are derived from lignocellulosic biomass. In particular, much interest has recently been focused on recapturing the saccharide building blocks locked in plant biomass for biofuel production. For example, fermentation of plant biomass to ethanol is an attractive carbon neutral energy option since the combustion of ethanol from biomass produces no net carbon dioxide in the earth's atmosphere. Further, biomass is readily available, and its fermentation provides an attractive way to dispose of many industrial and agricultural waste products. Finally, plant biomass is a highly renewable resource. Many dedicated energy crops can provide high energy biomass, which may be harvested multiple times each year.
[0008] One barrier to the production of products from biomass is that the cellulosic polymer has evolved to resist degradation and to confer hydrolytic stability and structural robustness to the cell walls of plants. This robustness or "recalcitrance" is due largely to extensive intermolecular hydrogen bonding between cellulose polymer chains. Some organisms, notably fungi, bacteria, and protozoans, but also some plants and animals, have evolved the ability to digest cellulose. In vivo cellulose breakdown typically entails the cooperative interaction of several cellulases, enzymes that catalyze the cellulolysis (hydrolysis) of cellulose. Several different kinds of cellulases, which differ structurally and mechanistically, are known, and some of these have been isolated, characterized and used to break down cellulose in vitro. General categories of cellulases include: endo-cellulases (endoglucanases), which randomly hydrolyze internal bonds to disrupt the crystalline structure of cellulose, thereby exposing individual cellulose polysaccharide chains; and exo-cellulases (exo-processive-endoglucanases), which cleave 2-4 units from the ends of the exposed chains produced by endocellulases to produce tetrasaccharides or disaccharides such as cellobiose. Two major types of exo-cellulases are known, one of which works processively from the reducing end, and one of which works processively from the non-reducing end of cellulose. A third major type of cellulase is cellobiase or beta-glucosidase, which hydrolyses exo-cellulase products such as cellobiose into individual glucose monosaccharides.
[0009] Typically, the digestion of cellulose is carried out at temperatures approaching 100° C. because, at high temperatures, intermolecular hydrogen bonds are disrupted and recalcitrant cellulose polymers become accessible to the cellulase enzymes. Therefore, cellulases used commercially in such processes must be able to withstand very high temperatures, preferably for extended periods of time.
[0010] There is an ongoing need to identify, isolate and characterize cellulases, especially thermally stable cellulases, for use in the enzymatic hydrolysis of cellulose. Of particular interest is the development of groups or systems of cellulases that include enzymes with endo-cellulase, exo-cellulase and beta-glucosidase activity, the enzymes in the system acting in concert to carry out the complete hydrolysis of cellulose to glucose at high temperatures.
SUMMARY OF THE INVENTION
[0011] Protein sequences which heretofore were not recognized as having enzymatic activity have been isolated and characterized as thermostable enzymes capable of degrading (hydrolyzing) cellulose at high temperatures. The activity is referred to herein as cellulase or cellulase-like. The enzymes, originating from Archaea and various thermophilic bacteria, include: endoglucanases that randomly hydrolyze internal glycosidic bonds; exo-processive-endoglucanases that split off cellobiose dimers; and β-glucosidases that hydrolyze cellobiose into monomeric glucose molecules. While the β-glucosidase enzymes are technically not "cellulases" because cellobiose (not cellulose) is the substrate they cleave, the three groups of enzymes may sometimes be collectively referred to as "cellulases" herein. The enzymes are optimally catalytically active at temperatures at or above about 85° C. and retain >85% of their enzymatic activity even after a 5-day incubation at elevated temperature, e.g. 90° C. In some embodiments, the enzymes, or enzyme systems or groupings comprising multiple thermostable catalytic activities, may advantageously be used to degrade cellulose. Preferably, in the case of systems which have multiple thermostable catalytic activities, such a system comprises at least one endoglucanase, at least one exo-processive-endoglucanase, and at least one beta-glucosidase enzyme, and thus can carry out the complete hydrolysis of cellulose to glucose at high temperatures in a sequential, cooperative manner. Catalytic consolidation at high temperatures using the enzyme systems described herein is not additive but synergistic, accessing recalcitrant cellulose and hydrolyzing beta linkages at temperatures above 85° C.
Thus, one aspect of the invention is to employ the enzymes, alone or in a group, in processes to break down cellulosic material by contacting the cellulosic material with the enzymes and elevating the temperature to activate the enzymes to break down the cellulosic material. These processes might be performed, for example, in tanks where the cellulosic material is distributed in a liquid carrier; however, the enzymatic breakdown may be achieved simply through elevating the temperature of the cellulosic material with the enzymes being in contact with the cellulosic material.
[0012] The invention also contemplates the incorporation of nucleic acids coding for one or more of the enzymes into a host (e.g., a plant, fungus, bacterium or animal). In the case where the host produces or is composed of cellulosic material (e.g., plants such as corn, switchgrass, sugar cane, sorghum, pinus and eucalyptus), the host can be subjected to breakdown of the cellulosic material, for example, after harvest. That is, in a particular example, corn or switchgrass transformed to include nucleic acids coding for the enzymes will express the enzymes internally, and after collection or harvest of the corn or switchgrass, the enzymes can be activated to begin and preferably ultimately to completely degrade the cellulose simply by elevating the temperature of the corn or switchgrass.
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] FIG. 1A-B. Pyrococcus furiosus Termocel 1 endoglucanase. A, nucleotide sequence (903 bp, SEQ ID NO: 1); and B, amino acid sequence (301 aa, SEQ ID NO: 2). Nucleotide sequence is optimized for expression in corn and shown without N-terminal signal peptide encoding sequence.
[0014] FIG. 2A-B. Thermotoga petrophila Termocel 2 endoglucanase. A, nucleotide sequence (825 bp, SEQ ID NO: 3); and B, amino acid sequence (274 aa, SEQ ID NO: 4).
[0015] FIG. 3A-B. Pyrococcus horikoshii Termocel 3 exocellulase. A, nucleotide sequence (1377 bp, SEQ ID NO: 5); and B, amino acid sequence (458 aa, SEQ ID NO: 6).
[0016] FIG. 4A-B. Pyrococcus abyssi Termocel 4 exocellulase. A, nucleotide sequence (1542 bp, SEQ ID NO: 7); and B, amino acid sequence (514 aa, SEQ ID NO: 8). Nucleotide sequence is optimized for expression in rice and shown without stop codon.
[0017] FIG. 5A-B. Thermotoga petrophila Termocel 5 endoglucanase. A, nucleotide sequence (987 bp, SEQ ID NO: 9); and B, amino acid sequence (328 aa, SEQ ID NO: 10).
[0018] FIG. 6A-B. Caldivirga maquilingensis Termocel 6 endoglucanase. A, nucleotide sequence (852 bp, SEQ ID NO: 11); and B, amino acid sequence (284 aa, SEQ ID NO: 12). Nucleotide sequence is optimized for expression in corn.
[0019] FIG. 7A-B. Thermotoga petrophila Termocel 7 beta-glucosidase. A, nucleotide sequence (2169 bp, SEQ ID NO: 13); and B, amino acid sequence (722 aa, SEQ ID NO: 14).
[0020] FIG. 8A-B. Thermotoga petrophila Termocel 8 beta-glucosidase. A, nucleotide sequence (1341 bp, SEQ ID NO: 15); and B, amino acid sequence (446 aa, SEQ ID NO: 16).
[0021] FIG. 9. Characteristics of high-temperature operating thermo-stable cellulases. swAvicel=phosphoric acid swollen Avicel; avCellulose=Avicel; cellulase specific activity is expressed as μM of reducing sugar/mg protein/day at 85° C., pH 6; beta-glucosidase specific activity is expressed as nM p-nitrophenol (pN)/μg protein/minute.
[0022] FIG. 10A-E. Processive exocellulases (cellobiohydrolase). Capillary zone electrophoresis (CZE) of 8-aminonaphthalene-1,3,6-trisulfonic acid (ANTS)-labeled cellopentose breakdown products incubated with Termocel 1 (A), Termocel 2 (B) and Termocel 3 (C). CZE retention times (D) of purified monomer (DP1), dimer (DP2), trimer (DP3) and the substrate cellopentose (DP5). Sequential predicted cleavage pattern (E) between DP5 and DP4, DP4 and DP3, DP3 and DP2. Assay conditions: substrate, ANTS-cellopentose (FIG. 9C); buffer, sodium phosphate/citrate 50 mM; incubated at 95° C., pH 6.
[0023] FIG. 11A-E. Processive endocellulases (endoglucanase). Capillary zone electrophoresis (CZE) of ANTS-labeled cellopentose breakdown products incubated with Termocel 4 (A), Termocel 5 (B) and Termocel 6 (C). CZE retention times (D) of purified monomer (DP1), dimer (DP2), trimer (DP3) and the substrate cellopentose (DP5). Predicted cleavage pattern (E) between DP2 and DP3 and between DP3 and DP4. Assay conditions: substrate, ANTS-cellopentose (FIG. 9C); buffer, sodium phosphate 50 mM; incubated at 95° C., pH 6.
[0024] FIG. 12. Temperature optima for high-temperature catalytic cellulases, and activities at 60, 45 and 20° C. swAvicel=phosphoric acid swollen Avicel; avCellulose=Avicel; cellulase specific activity is expressed as μM of reducing sugar/mg protein/day at 85° C., pH 6; beta-glucosidase specific activity is expressed as nM p-nitrophenol (pN)/μg protein/minute.
[0025] FIG. 13A-D. Termocel thermostability. Termocels were incubated at 90° C. in phosphate/citrate buffer for the indicated number of hours and CMC or PNPG activity was determined. The amount of residual activity is shown. A, Termocels 1 and 2; B, Termocels 3 and 4; C, Termocels 5 and 6; D, Termocels 7 and 8. With the exception of Termocel 3, all cellulases retained >80% of activity after 120 hrs at 90° C. Thus, these enzymes are stable at high temperatures.
[0026] FIG. 14. Flow chart illustrating cellulose treatment steps.
[0027] FIG. 15. Table depicting biomass substrate specificity of particular Termocels.
[0028] FIG. 16A-B. Caldivirga maquilingensis Termocel 9 endoglucanase. A, nucleotide sequence (1230 bp, SEQ ID NO: 29); and B, amino acid sequence (410 aa, SEQ ID NO: 30). Nucleotide sequence is optimized for expression in corn.
[0029] FIG. 17A-B. Pyrococcus horikoshii Termocel 10 endoglucanase. A, nucleotide sequence (2214 bp, SEQ ID NO: 31); and B, amino acid sequence (737 aa, SEQ ID NO: 32).
[0030] FIG. 18A-F. Nucleotide sequences as set forth in SEQ ID NO: 1 (from Pyrococcus furiosus), SEQ ID NO: 3 (from Thermotoga petrophila), SEQ ID NO: 5 (from Pyrococcus horikoshii), SEQ ID NO: 7 (from Pyrococcus abyssi), SEQ ID NO: 9 (from Thermotoga petrophila), SEQ ID NO: 11 (from Caldivirga maquilingensis), SEQ ID NO: 29 (from Caldivirga maquilingensis), SEQ ID NO: 31 (from Pyrococcus horikoshii), SEQ ID NO: 13 (from Thermotoga petrophila), and SEQ ID NO: 15 (from Thermotoga petrophila).
[0031] FIG. 19A-C. Amino acid sequences as set forth in SEQ ID NO: 2 (from Pyrococcus furiosus), SEQ ID NO: 4 (from Thermotoga petrophila), SEQ ID NO: 6 (from Pyrococcus horikoshii), SEQ ID NO: 8 (from Pyrococcus abyssi), SEQ ID NO: 10 (from Thermotoga petrophila), SEQ ID NO: 12 (from Caldivirga maquilingensis), SEQ ID NO: 30 (from Caldivirga maquilingensis), SEQ ID NO: 32 (from Pyrococcus horikoshii), SEQ ID NO: 14 (from Thermotoga petrophila), and SEQ ID NO: 16 (from Thermotoga petrophila).
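The nucleotide and amino acid lengths listed in the figure descriptions above can be cross-checked arithmetically: a coding sequence for n amino acids spans 3n bp, or 3n + 3 bp when a stop codon is included. The following is a minimal sketch of that check. The lengths are those listed for FIGS. 1-8, 16 and 17; the helper name and the stop-codon interpretation are assumptions made for illustration and are not part of the disclosure.

```python
# (bp, aa) lengths as listed for each Termocel in the figure descriptions.
LENGTHS = {
    "Termocel 1": (903, 301),
    "Termocel 2": (825, 274),
    "Termocel 3": (1377, 458),
    "Termocel 4": (1542, 514),
    "Termocel 5": (987, 328),
    "Termocel 6": (852, 284),
    "Termocel 7": (2169, 722),
    "Termocel 8": (1341, 446),
    "Termocel 9": (1230, 410),
    "Termocel 10": (2214, 737),
}


def stop_codon_status(bp, aa):
    """Classify a listed (bp, aa) pair; hypothetical helper for illustration."""
    if bp == 3 * aa:
        return "no stop codon"
    if bp == 3 * aa + 3:
        return "includes stop codon"
    return "inconsistent"


for name, (bp, aa) in LENGTHS.items():
    print(f"{name}: {bp} bp, {aa} aa -> {stop_codon_status(bp, aa)}")
```

Under this reading, the Termocel 4 entry (1542 bp, 514 aa) is classified as lacking a stop codon, which agrees with the statement that its sequence is shown without one.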
DETAILED DESCRIPTION
[0032] The present invention is based on the identification and characterization of a comprehensive set of thermostable cellulases that work in concert to catalyze the hydrolysis of cellulose to glucose at very high temperatures. The cellulases, originally identified in archaeal and bacterial genomes, include endoglucanases that randomly cleave internal glycosidic bonds; exo-processive-endoglucanases that further hydrolyze cellulose fragments into cellobiose dimers; and β-glucosidases that further hydrolyze cellobiose dimers to glucose monomers. While the enzymes may be used individually, in some embodiments of the invention they are grouped to form a cooperative enzyme system. By combining into a group at least one endoglucanase, at least one exo-processive-endoglucanase and at least one β-glucosidase, an enzyme system is formed that is capable of the complete breakdown of cellulose to glucose. Importantly, the enzyme system's various catalytic activities are optimal at temperatures that are high enough to destabilize the hydrogen bonds between crystalline cellulose strands (e.g. at temperatures greater than 80° C.). Destabilization of hydrogen bonds at high temperatures causes disruption of the crystalline structure of cellulose, thereby facilitating access by the first enzyme in the series (endoglucanase) to internal glycosidic bonds of individual cellulose polymer strands, and allowing the step-wise process of cellulose breakdown to begin.
[0033] Exemplary amino acid sequences of the recombinant enzymes of the invention and exemplary nucleotide sequences that encode them are depicted in FIGS. 1-8. However, those of skill in the art will recognize that the invention also encompasses variant proteins comprising amino acid sequences that are based on or derived from the sequences disclosed herein. By an amino acid sequence that is "derived from" or "based on" the sequence disclosed herein, we mean that a derived sequence (or variant sequence) displays at least about 50 to 100% identity to an amino acid sequence disclosed herein, or about 60 to 100% identity, or about 70 to 100% identity, or even from about 80 to 100% identity. In preferred embodiments, a variant sequence displays from about 90 to 100% or about 95 to 100% amino acid identity. In further preferred embodiments, a variant sequence is 95, 96, 97, 98 or 99% identical to at least one sequence disclosed herein. Variations in the sequences may be due to a number of factors and may include, for example: conservative or non-conservative amino acid substitutions; natural variations among different populations as isolated from natural sources; various deletions or insertions (which may be amino terminal, carboxyl terminal, or internal); addition of leader sequences to promote secretion from the cell; addition of targeting sequences to direct the intracellular destination of a polypeptide; etc. Such alterations may be naturally occurring or may be intentionally introduced (e.g. via genetic engineering) for any of a wide variety of reasons, e.g. in order to eliminate or introduce protease cleavage sites, to eliminate or introduce glycosylation sites, in order to improve solubility of the polypeptide, to facilitate polypeptide isolation (e.g. introduction of a histidine or other tag), as a result of a purposeful change in the nucleic acid sequence (see discussion of the nucleic acid sequence below) which results in a non-silent change in one or more codons and thus the translated amino acid, in order to improve thermal stability of the protein, etc. All such variant sequences are encompassed by the present invention, so long as the resulting polypeptide is capable of catalyzing the enzyme activity of the original protein as disclosed herein. For example, the invention includes shorter portions of the sequences that also retain the catalytic activity of the enzyme. The full-length protein sequences and/or active portions thereof are both referred to as polypeptides herein. In addition, the invention also includes chimeric or fusion proteins that include, for example: more than one of the enzymes disclosed herein (or active portions thereof); or one or more of the enzymes disclosed herein (or portions thereof) plus some other useful protein or peptide sequence(s), e.g. signal sequences, spacer or linker sequences, etc.
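The percent identity thresholds discussed above can be illustrated with a short calculation. The sketch below is an assumption-laden illustration only: the disclosure does not specify an alignment method or a denominator convention, so the code assumes the two sequences are already aligned to equal length, that "-" marks a gap, and that identity is computed over all alignment columns. The peptides are hypothetical and are not taken from any SEQ ID NO.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the columns of two pre-aligned sequences.

    Assumptions (not from the source): inputs are already aligned to equal
    length, '-' marks a gap, and the denominator is the alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical 10-residue peptides differing at one position.
reference = "MKVLAAGTWE"
variant = "MKVLSAGTWE"
print(percent_identity(reference, variant))  # 9 of 10 columns match
```

Other conventions (for example, dividing by the length of the shorter sequence) give different values for gapped alignments, so any threshold comparison should state which convention is used.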
[0034] The invention also comprehends nucleic acid sequences that encode the proteins and polypeptides of the invention. Several exemplary nucleic acid sequences are provided herein. However, as is well known, due to the degeneracy of the nucleic acid triplet code, many other nucleic acid sequences that would encode an identical polypeptide could also be designed, and the invention also encompasses such nucleic acid sequences. Further, as described above, many useful variant forms of the proteins and peptides of the invention also exist, and nucleic acid sequences encoding such variants are intended to be encompassed by the present invention. In addition, such nucleic acid sequences may be varied for any of a variety of reasons, for example, to facilitate cloning, to facilitate transfer of a clone from one construct to another, to increase transcription or translation in a particular host cell (e.g. the sequences may be optimized for expression in, for example, corn, rice, yeast or other hosts), to add or replace promoter sequences, to add or eliminate a restriction cleavage site, etc. In addition, all genera of nucleic acids (e.g. DNA, RNA, various composite and hybrid nucleic acids, etc.) encoding proteins of the invention (or active portions thereof) are intended to be encompassed by the invention.
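The degeneracy of the triplet code noted above can be shown concretely: two different nucleotide sequences may translate to the same polypeptide. The sketch below builds the standard genetic code table and translates two hypothetical 15-nucleotide sequences; neither sequence is taken from the disclosure, and the function name is an illustrative assumption.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate an in-frame DNA coding sequence ('*' marks a stop codon)."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Hypothetical sequences using different synonymous codons.
seq_a = "ATGAAAGGCCTGGCC"
seq_b = "ATGAAGGGTTTAGCT"
print(translate(seq_a), translate(seq_b))
```

Codon optimization for a host such as corn or rice amounts to choosing, among such synonymous codons, those preferred by the host; the choice of codons here is arbitrary.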
[0035] The invention further comprehends vectors, which contain nucleic acid sequences encoding the polypeptides of the invention. Those of skill in the art are familiar with the many types of vectors, which can be useful for such a purpose, for example: plasmids, cosmids, various expression vectors, viral vectors, etc.
[0036] Production of the nucleic acids and proteins of the invention can be accomplished in any of many ways that are known to those of skill in the art. The sequences may be synthesized chemically using methods that are well-known to those of skill in the art. Alternatively, nucleotide sequences may be cloned using, for example, polymerase chain reaction (PCR) and/or other known molecular biology and genetic engineering techniques. Recombinant proteins may be made from a plasmid contained within a bacterial host such as Escherichia coli, in insect expression systems, yeast expression systems, plant cell expression systems, etc. Further, the nucleic acid sequences may be optimized for expression in a particular organism or system. To that end, the present invention also encompasses a host cell that has been transformed or otherwise manipulated to contain nucleic acids encoding the proteins and polypeptides of the invention, either as extra-chromosomal elements, or incorporated into the chromosome of the host. In particular, in the practice of the present invention, nucleic acid sequences encoding one or more of the cellulases (e.g. an entire "system" as described herein) may be introduced into plant cells, seeds, etc., to generate recombinant plants that contain the nucleic acids.
[0037] Plant transformation to incorporate one or more nucleic acids coding for one or more cellulase enzymes described herein can be accomplished by a variety of techniques known to those of skill in the art. Plant transformation is the introduction of a foreign piece of DNA, conferring a specific trait, into host plant tissue. Plant transformation can be carried out in a number of different ways, including Agrobacterium-mediated transformation, particle bombardment, electroporation and viral transformation.
[0038] Suitable examples of plants that may be transformed to include one or more cellulase enzymes or sets of enzymes include but are not limited to rice, corn, various grasses such as switchgrass, sugar cane, sorghum, pinus and eucalyptus, etc. Advantages of genetically engineering plants to contain and express the cellulase genes include but are not limited to the availability of the enzymes within the cell wall tissues (cellulosic fibers), ready to be activated by high temperatures (e.g., heating to 70 or 80° C. or more). Deposition of these enzymes, produced by the plant cells and targeted to the apoplast, should largely overcome the recalcitrant nature of biomass.
[0039] The cellulases and/or cellulase enzyme systems of the invention may be used for the breakdown (hydrolysis) of cellulose in biomass from a wide variety of sources. Biomass comes in many different types, which may be grouped into four main categories: (1) wood residues (including sawmill and paper mill discards); (2) municipal paper waste; (3) agricultural residues (including corn stover and sugarcane bagasse); and (4) dedicated energy crops, which are mostly composed of fast growing tall, woody grasses. Cellulose-containing biomass from any of these or other sources may be acted upon by the enzymes and consolidated enzyme systems of the invention.
[0040] Generally, the breakdown of cellulose will be complete, i.e. the endproduct is glucose. This is especially true when a consolidated enzyme system that includes at least three different types of enzymes (for example, an endoglucanase, an exo-processive-endoglucanase, and a β-glucosidase) is employed. However, this need not always be the case. Depending on the goal of the reaction, only one enzyme may be utilized (e.g. an endoglucanase to generate randomly cleaved cellulose polymers); or only two enzymes may be utilized (e.g. an endoglucanase and an exo-processive-endoglucanase to generate disaccharides such as cellobiose), etc. Any desired grouping of the enzymes of the invention may be utilized to generate any desired endproduct that the enzymes are capable of producing from a suitable substrate. Further, one or more of the enzymes of the invention may be used in combination with other cellulases, or with enzymes having other types of activities. In one embodiment of the invention, a "system" could further include a yeast or other organism capable of fermenting glucose to e.g. ethanol.
[0041] The cellulases of the invention have very high temperature optima, an optimal temperature being the temperature at which an enzyme is maximally active (e.g. as an endoglucanase, an exo-processive-endoglucanase, or a β-glucosidase), as determined by a standard assay recognized by those of skill in the art. As described in the Examples section below, the lowest temperature optimum for an enzyme of the invention is about 85° C., and the highest temperature optimum is about 102° C. Further, the enzymes of the invention are thermally stable, i.e. they are capable of retaining catalytic activity at high temperatures (e.g. at their temperature optimum, or at temperatures that deviate somewhat from the optimum) for extended periods of time, for example, for at least several hours (e.g. 1-24 hours), and in many cases, for several days (e.g. from 1-7 days or even longer). By "retain catalytic activity" we mean that the enzyme retains at least about 10, 20, 30, 40 or 50% or more of the activity displayed at the beginning of the extended time period, when measured under standard conditions; and preferably the enzyme retains 60, 65, 70, 75, 80, 85, 90, 95, or even 100% of the activity displayed at the beginning of the extended time period.
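The definition of retained catalytic activity given above reduces to a ratio of activity after incubation to activity at the start, measured under the same standard conditions. A minimal sketch follows; the function names, the default 50% threshold, and the example activity values are illustrative assumptions and are not measurements from the disclosure.

```python
def residual_activity_percent(initial_activity, final_activity):
    """Activity remaining after incubation, as a percentage of the initial value."""
    if initial_activity <= 0:
        raise ValueError("initial activity must be positive")
    return 100.0 * final_activity / initial_activity


def retains_activity(initial_activity, final_activity, threshold_percent=50.0):
    """True if residual activity meets the chosen threshold (assumed default 50%)."""
    return residual_activity_percent(initial_activity, final_activity) >= threshold_percent


# Hypothetical specific activities before and after an extended incubation.
print(residual_activity_percent(200.0, 170.0))
print(retains_activity(200.0, 170.0, threshold_percent=80.0))
```

Both activity values must be in the same units and obtained with the same assay for the ratio to be meaningful.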
[0042] The enzymes of the invention are generally employed in reactions that are carried out at temperatures at or near those which are optimal for their activity. Some enzymes may be used over a wide temperature range (e.g. at a temperature that is about 50, 40, 30, 20, 10, 5 or fewer degrees below the temperature optimum, and up to about 5, 10, 15, or more degrees above the temperature optimum). For other enzymes, the range may be more restricted, i.e. they may display catalytic activity only within a narrow temperature range of less than about 10, or less than about 5, or fewer degrees of their optimal catalytic temperature. When carrying out a cellulose digestion reaction, the enzymes may be used one at a time sequentially (i.e. one enzyme is added, reaction occurs, and then another enzyme is added, with or without removal of the previous enzyme, and so on), or two or even all three of the enzymes (an enzyme system) may be added to the reaction mixture at the same time. When designing groups of enzymes to be included in an enzyme system, those of skill in the art will recognize that a suitable temperature at which all enzymes in the group are active will be selected as the temperature for reaction. Or, conversely, if it is desired to carry out a reaction at a particular temperature, enzymes with optimal activity at or near that temperature would be selected for inclusion in the set. For example, for a reaction to be carried out at 97° C., one might choose a set of enzymes that includes Termocel 5 (endocellulase, optimum=96° C.), plus Termocel 2 (exocellulase, optimum=98° C.), plus Termocel 7 (β-glucosidase, optimum=98° C.); whereas for a reaction that is to be carried out at 90° C., one might choose Termocel 6 (endocellulase, optimum=85° C.), plus Termocel 3 (exocellulase, optimum=94° C.), plus Termocel 7 (β-glucosidase, optimum=92° C.). 
If an enzyme is used individually, the reaction may be carried out at a temperature near its optimum, or at which the enzyme retains sufficient activity to be useful. In addition, the selection of a reaction temperature may be based on other considerations, e.g. safety or other practical considerations of high temperature operations, or concerns about the cost of keeping a reaction mixture at a high temperature, the temperature used for preparing biomass for the reaction, the temperature of procedures that follow the reaction, etc. Generally, the degradation of cellulose will be carried out at a temperature in the range of from about 70 to about 95° C.
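The selection logic described above (choosing, for each enzyme class, an enzyme whose optimum lies near the intended reaction temperature) can be sketched as follows. The class labels and optima are those quoted in the example above; the text quotes two optima for Termocel 7 (98° C. and 92° C.), and this sketch uses 98° C., which does not change the outcome because Termocel 7 is the only β-glucosidase listed. The 10-degree window and the closest-optimum rule are assumptions made for illustration.

```python
# (name, class, temperature optimum in deg C) as quoted in the example above.
ENZYMES = [
    ("Termocel 5", "endocellulase", 96),
    ("Termocel 6", "endocellulase", 85),
    ("Termocel 2", "exocellulase", 98),
    ("Termocel 3", "exocellulase", 94),
    ("Termocel 7", "beta-glucosidase", 98),  # text also quotes 92 for this enzyme
]


def select_system(reaction_temp_c, enzymes=ENZYMES, window_c=10):
    """Pick, per class, the enzyme whose optimum is closest to the reaction
    temperature, ignoring enzymes more than window_c degrees away
    (the window is an assumed tolerance, not a value from the source)."""
    best = {}
    for name, enzyme_class, optimum in enzymes:
        gap = abs(optimum - reaction_temp_c)
        if gap > window_c:
            continue
        if enzyme_class not in best or gap < best[enzyme_class][1]:
            best[enzyme_class] = (name, gap)
    return {enzyme_class: name for enzyme_class, (name, _) in best.items()}


print(select_system(97))
print(select_system(90))
```

With these inputs the sketch reproduces the two groupings given in the example above for 97° C. and 90° C. A practical selection would also weigh stability, substrate specificity and the process considerations discussed in this paragraph.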
[0043] The invention also provides methods of use of the enzymes disclosed herein. The methods generally involve the use of at least three enzymes of the invention, at least one from each of the three classes: endoglucanase, exo-processive-endoglucanase, and β-glucosidase. The three classes of enzymes act in concert to sequentially break down cellulose to glucose. The methods of the invention may be carried out for any purpose for which it is desirable to prepare glucose (or other products produced by the enzymes), and to further metabolize it into other chemicals, such as ethanol, xylitol, butanol, amino acids, glycol, etc.
[0044] Generally, such methods are carried out by first pretreating a cellulose-rich feedstock by removing the lignin (usually through ball milling). The production of sugars (saccharification) from the pretreated cellulose is carried out by suspending the pretreated cellulose in a cellulase broth that contains suitable cellulase enzymes such as those disclosed herein. Generally, the reaction will be carried out at a temperature in the range of from about 70 to about 95° C., and the length of time for a reaction will be in the range of from about one hour to about six days. Reactions are carried out in aqueous media buffered to a suitable pH, e.g. in the range of from about pH 4 to about pH 9.
[0045] Thereafter, the desired products (e.g. glucose and cellobiose) may be harvested from the broth, or the reaction products may be further processed. For example, for the production of ethanol, fermentation of the glucose in the broth may be carried out by known conventional batch or continuous fermentation processes, usually using yeast. Ethanol may be recovered by known stripping or extractive distillation processes. This process is illustrated schematically in FIG. 14, which shows the steps of pretreating biomass to provide a source of cellulose; contacting the cellulose with one or more cellulase enzymes of the invention to hydrolyze cellulose to glucose, and fermenting the glucose to produce ethanol.
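The mass balance behind the saccharification and fermentation steps described above can be estimated from stoichiometry. Hydrolysis adds one water per anhydroglucose unit (C6H10O5 + H2O → C6H12O6), and fermentation converts one glucose to two ethanol and two carbon dioxide (C6H12O6 → 2 C2H5OH + 2 CO2). The sketch below computes theoretical maxima only; the molar masses are standard values rather than values from the disclosure, and the function names and the hydrolysis_fraction parameter are illustrative assumptions.

```python
# Standard molar masses in g/mol (not taken from the disclosure).
ANHYDROGLUCOSE = 162.14  # C6H10O5 repeat unit of cellulose
GLUCOSE = 180.16         # C6H12O6
ETHANOL = 46.07          # C2H5OH


def theoretical_glucose_g(cellulose_g, hydrolysis_fraction=1.0):
    """Maximum glucose mass from cellulose; hydrolysis_fraction is the
    assumed fraction of cellulose actually hydrolyzed (0 to 1)."""
    if not 0.0 <= hydrolysis_fraction <= 1.0:
        raise ValueError("hydrolysis_fraction must be between 0 and 1")
    return cellulose_g * hydrolysis_fraction * GLUCOSE / ANHYDROGLUCOSE


def theoretical_ethanol_g(glucose_g):
    """Maximum ethanol mass from glucose at 2 mol ethanol per mol glucose."""
    return glucose_g * 2 * ETHANOL / GLUCOSE


glucose = theoretical_glucose_g(100.0)
print(round(glucose, 1), round(theoretical_ethanol_g(glucose), 1))
```

Actual yields are lower than these ceilings because hydrolysis may be incomplete and fermenting organisms divert some glucose to growth and by-products.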
EXAMPLES
Example 1
Isolation and Characterization of Cellulases that Catalyze High-Temperature Thermo-Stable Bio-Consolidated Cellulose Breakdown
Abstract
[0046] Cellulose breakdown entails cooperative interaction of various cellulases that access and cleave the recalcitrant cellulosic polymer. At high temperatures, most of the recalcitrant biomass polymers become enzymatically accessible because of intermolecular hydrogen bond disruption. Here, we describe a high-temperature operating, thermo-stable cellulase enzyme system consisting of endoglucanases, exo-processive-endoglucanases and β-glucosidases. Two catalytic types of cellulose-cleaving enzymes were found: endoglucanases, which randomly hydrolyze internal glycosidic bonds, and exo-processive-endoglucanases, which split off cellobiose dimers. Finally, a third activity, β-glucosidase, hydrolyzes cellobiose into glucose molecules. The consolidated enzyme system operates optimally at temperatures above 85° C. and retains >85% of its enzymatic activity after a 5 day incubation at 90° C. Catalytic consolidation with high temperatures is not additive but synergistic, accessing recalcitrant cellulose and hydrolyzing beta linkages above 85° C.
Introduction
[0047] Cellulose is an abundant biopolymer component of plant cell walls. Cellulose is a linear biopolymer of D-glucose, linked by β-1,4-glucosyl linkages. Cellulosic enzyme systems completely hydrolyze cellulose, yielding glucose molecules. A cellulosic enzymatic system consists of multiple cellulases (endo-β-glucanase, cellobiohydrolase and β-glucosidase), which interact synergistically in producing glucose. Endoglucanases randomly hydrolyze the internal glycosidic bonds to decrease the length of the cellulose chain. Cellobiohydrolases are exo- or endo-processive enzymes that split off cellobiose from the shortened cellulose chains. Cellobiose is hydrolyzed by β-glucosidase to glucose. Native cellulose molecules appear predominantly as crystalline cellulose, which shows a high degree of intermolecular hydrogen bonding, explaining its remarkable stability and recalcitrance to enzymes. Thus, disrupting crystal intermolecular hydrogen bonds through cellulose swelling and dissolution with high-temperature operating cellulases overcomes recalcitrance and results in enzymatic digestion of native cellulose.
Results
Isolation and Characterization of High-Temperature Operating and Thermostable Cellulases
[0048] A series of ten high-temperature operating and thermo-stable cellulases was identified through bioinformatics-driven searches of archaeal and bacterial genomes. The corresponding genes were genetically manipulated to adapt expression to a laboratory-tractable system (Escherichia coli) by codon optimization and the use of controlled promoters. Individual proteins were expressed and isolated (purified) from E. coli crude extracts and analyzed for activity and other physical and chemical properties. The data presented in tabular form in FIG. 9 describe the eight enzymes isolated in this study.
[0049] Termocel 1 and 2 group into a class with similar physical and catalytic properties. They exhibit a molecular weight of 34,005 and 31,930 D, a pI of 4.8 and 4.77 and a net charge at pH 7 of -13.10 and -13.30, respectively. They appear to function through an exo-processive-endoglucanase cleaving pattern, with a specific activity on Avicel of 63.4 and 8.1 U and on swollen cellulose of 13.6 and 2.2 U, respectively.
[0050] Termocel 3 and 4 differ slightly, with a molecular weight of 51,930 and 59,980 D, a pI of 6.47 and 7.05 and a net charge at pH 7 of -3.60 and 0.30, respectively. These enzymes also do not appear to share a predicted mode of operation: one is of the exo-processive type and the other is an endoglucanase, with a specific activity on Avicel of 48.5 and 6.8 U and on swollen cellulose of 8.4 and 2.2 U, respectively.
[0051] Termocel 5 and 6 fall in a third class with similar physical and catalytic properties. They exhibit a molecular weight of 38,226 and 31,818 D, a pI of 5.58 and 5.66 and a net charge at pH 7 of -6.60 and -5.00, respectively. They appear to function through an internal cleaving pattern (endoglucanase), with a specific activity on Avicel of 34.1 and 20.6 U and on swollen cellulose of 6.8 and 5.1 U, respectively.
[0052] Termocel 7 and 8 are β-glucosidases with distinct physical properties but similar catalytic activity. They exhibit a molecular weight of 81,243 and 51,509 D, a pI of 5.38 and 5.84 and a net charge at pH 7 of -16.90 and -9.10, respectively. They cleave cellobiose, with a specific activity on pNPG of 69.4 and 60.9 U, respectively.
[0053] Termocel 9 and 10 are endocellulases that exhibit a molecular weight of 45,059 and 85,598 D, a pI of 6.16 and 7.80 and a net charge at pH 7 of -2.20 and 4.30, respectively. They have a specific activity on Avicel of 5.2 and 4.9 U and on swollen cellulose of 1.4 and 1.5 U, respectively.
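The specific activities quoted in the preceding paragraphs can be gathered into one structure for side-by-side comparison. The Python sketch below is illustrative only: the values are copied from the text, while the dictionary layout, the function name and the choice of an Avicel-to-swollen-cellulose ratio as a comparison metric are assumptions made here, not part of the disclosure.

```python
# Specific activities (U) quoted in the text for Termocel 1-6, 9 and 10.
# Keys are Termocel numbers; values are (Avicel, swollen cellulose).
SPECIFIC_ACTIVITY = {
    1: (63.4, 13.6),
    2: (8.1, 2.2),
    3: (48.5, 8.4),
    4: (6.8, 2.2),
    5: (34.1, 6.8),
    6: (20.6, 5.1),
    9: (5.2, 1.4),
    10: (4.9, 1.5),
}


def avicel_to_swollen_ratio(activities):
    """Ratio of Avicel activity to swollen-cellulose activity per enzyme."""
    return {n: avicel / swollen for n, (avicel, swollen) in activities.items()}


ratios = avicel_to_swollen_ratio(SPECIFIC_ACTIVITY)
# Enzyme with the highest quoted activity on Avicel.
most_active_on_avicel = max(SPECIFIC_ACTIVITY, key=lambda n: SPECIFIC_ACTIVITY[n][0])

# List enzymes from the highest to the lowest ratio.
for n in sorted(ratios, key=ratios.get, reverse=True):
    print(f"Termocel {n}: Avicel/swollen = {ratios[n]:.2f}")
```

Termocel 7 and 8 are omitted because their activity is reported on pNPG rather than on Avicel or swollen cellulose.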
Mode of Operation
[0054] FIGS. 10 and 11 describe the modes of operation of six of the cellulases (Termocel 1 to 6). Termocel 1, 2 and 3 are cellulases that function by sequentially cleaving glucose residues from the non-reducing end of a polymeric substrate. FIG. 10 shows the sequential depolymerization breakdown products through capillary zone electrophoresis.
[0055] Termocel 4, 5 and 6 are cellulases that function by internally cleaving a multimeric substrate. FIG. 11 shows trimeric and dimeric breakdown products, indicating internal cleavage of the pentameric substrate.
High-Temperature Catalytic Operation
[0056] FIG. 12 shows the optimum temperature of operation of eight Termocels in tabular form. The highest optimum was found for Termocel 1, at 102° C., and the lowest for Termocel 6, at 85° C. At 60° C., all Termocels (except Termocel 4) lost at least 40% of their activity, and at 20° C. the Termocels operated at less than 20% of their optimum activity.
[0057] Among the beta-glucosidases, no significant differences between activity and temperature optimum were apparent. However, catalytic inactivation at lower temperatures (45 and 20° C.) to levels below 1% residual activity for Termocel 7 is remarkable.
Thermal Stability
[0058] Thermostability of the Termocels was evaluated to determine the working time frame with useful enzymatic activity at high temperatures. Enzymes were incubated at 90° C. for up to 5 days and then assayed for CMC (endoglucanase and exo-cellulase) or pNPG (β-glucosidase) activity; results are reported as % residual activity in FIG. 13. With the exception of Termocel 3, all enzymes retained over 80% of their initial enzymatic activity after a 5-day incubation period at 90° C.
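Residual activity as used above is the activity remaining after incubation, expressed as a percentage of the initial activity. The Python sketch below shows that calculation and, purely as an illustration, converts a residual-activity figure into a half-life. The half-life step assumes simple first-order inactivation, which the text does not state; both function names are hypothetical.

```python
import math


def residual_activity_percent(initial: float, remaining: float) -> float:
    """Residual activity as a percentage of the initial activity."""
    if initial <= 0:
        raise ValueError("initial activity must be positive")
    return 100.0 * remaining / initial


def first_order_half_life(residual_percent: float, elapsed_days: float) -> float:
    """Half-life in days, ASSUMING simple first-order inactivation."""
    if not 0 < residual_percent < 100:
        raise ValueError("residual_percent must lie strictly between 0 and 100")
    rate = -math.log(residual_percent / 100.0) / elapsed_days  # per day
    return math.log(2) / rate


# 80% residual activity after 5 days at 90 C (the threshold quoted in the text).
half_life_days = first_order_half_life(80.0, 5.0)
print(f"half-life of roughly {half_life_days:.1f} days under the first-order assumption")
```

Because the enzymes are reported to retain more than 80% of their activity, a figure computed from 80% would be a lower bound under this assumed model.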
Modes of Use
[0059] These high-temperature operating cellulases can be used in all processes in which cellulose degradation at high temperatures is desired. These applications include but are not restricted to food processing, feedstuff preparation, textile finishing and paper pulping. The consolidated enzyme system is useful for hydrolyzing fibrous crystalline cellulosic biomass materials at high temperatures with Termocel 1, 2, 3, 4, 5, 6, 7 and 8 to produce high-sugar-containing fermentation broths. In addition, the genes of the high-temperature operating enzyme system can be used in producing transgenic organisms capable of expressing one or more high-temperature operating and thermostable plant cell wall degrading enzymes.
Methods
Cloning
[0060] Genomic DNA of Pyrococcus horikoshii OT3 served as the PCR template for the amplification of the PH1171 gene. Likewise, genomic DNA of Thermotoga petrophila RKU-1 served as the PCR template for the cloning of the PetroA, PetroB, Tpet_0898 and Tpet_0952 genes. Primer sequences are shown in Table 1. Restriction sites were introduced (bold letters). The O-eglA, ZP and E1 genes were synthesized without using a DNA template; the codons of the three genes were also optimized according to the sequences of corn and rice genomes (FIGS. 1A, 4A and 6A). All gene segments generated were cloned into the NcoI and XbaI sites of the pBAD/Myc-His vector (Invitrogen), which carries a fusion sequence (GAACAAAAACTCA TCTCAGAAG AGGATCTGAATAGCGCCGTCGACCATCATCATCATCATCATCAT, SEQ ID NO: 17) encoding six histidine residues at the C-terminus of any protein expressed from the vector (EQKLISEEDLNSAVDHHHHHH, SEQ ID NO: 18). The expression plasmids were used to transform Escherichia coli TOP10F' (Invitrogen). All constructs were verified by DNA sequencing.
TABLE 1. Oligonucleotide sequences used in this study.
Primer       Direction   Sequence (5'→3') (a)                         SEQ ID NO:
Termocel 3   Forward     ATATCCATGGAGGGGAATACTATTCTTAAAATCGTACTAAT    19
Termocel 3   Reverse     ATGCTCTAGAAACCTGGGAGCCCTTCTTAAG              20
Termocel 5   Forward     GAAACGCTCCTCCCTGTAGT                         21
Termocel 5   Reverse     ATGCTCTAGAAATTCTCTCACCTCCAGATCAATAGAGA       22
Termocel 2   Forward     AGGTGGGTAGTTCTTCTGATGG                       23
Termocel 2   Reverse     ATGCTCTAGAAATTTTACAACTTCGACGAAGAAGTCTTTGA    24
Termocel 7   Forward     ATATCCATGGGAAAGATCGATGAAATCCTTTCA            25
Termocel 7   Reverse     ATGCTCTAGAAATGGTTTGAATCTCTTCTCTCCC           26
Termocel 8   Forward     AACGTGAAAAAGTTCCCTGAAG                       27
Termocel 8   Reverse     ATGCTCTAGAAAATCTTCCAGACTGTTGCTTTTG           28
(a) Boldface indicates sequences complementary to the primers used to amplify the selectable markers.
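Because the gene segments are cloned into the NcoI and XbaI sites of the vector, it can be useful to check which primers in Table 1 carry the corresponding recognition sequences. The Python sketch below is an illustration only: the primer sequences are copied from Table 1, the recognition sequences (CCATGG for NcoI, TCTAGA for XbaI) are the standard ones for these enzymes, and the dictionary layout and function name are assumptions made here.

```python
# Recognition sequences for the two cloning sites named in the text.
SITES = {"NcoI": "CCATGG", "XbaI": "TCTAGA"}

# Primer sequences copied from Table 1 (5'->3').
PRIMERS = {
    "Termocel 3 forward": "ATATCCATGGAGGGGAATACTATTCTTAAAATCGTACTAAT",
    "Termocel 3 reverse": "ATGCTCTAGAAACCTGGGAGCCCTTCTTAAG",
    "Termocel 5 forward": "GAAACGCTCCTCCCTGTAGT",
    "Termocel 5 reverse": "ATGCTCTAGAAATTCTCTCACCTCCAGATCAATAGAGA",
    "Termocel 2 forward": "AGGTGGGTAGTTCTTCTGATGG",
    "Termocel 2 reverse": "ATGCTCTAGAAATTTTACAACTTCGACGAAGAAGTCTTTGA",
    "Termocel 7 forward": "ATATCCATGGGAAAGATCGATGAAATCCTTTCA",
    "Termocel 7 reverse": "ATGCTCTAGAAATGGTTTGAATCTCTTCTCTCCC",
    "Termocel 8 forward": "AACGTGAAAAAGTTCCCTGAAG",
    "Termocel 8 reverse": "ATGCTCTAGAAAATCTTCCAGACTGTTGCTTTTG",
}


def find_sites(primer: str) -> dict:
    """Map each enzyme name to the 0-based position of its site, if present."""
    return {name: primer.find(site) for name, site in SITES.items() if site in primer}


for label, seq in PRIMERS.items():
    print(label, find_sites(seq) or "no NcoI/XbaI site found")
```

The scan only reports exact matches on the strand as written, so it does not detect sites that would appear on the complementary strand or that span the primer and its template.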
Expression and Purification
[0061] An overnight culture of the transformed E. coli strain containing the fusion protein vector was inoculated into fresh Luria-Bertani medium containing ampicillin. When the OD600 reached 0.5-0.6, L-arabinose was added to a final concentration of 0.2%. The culture was allowed to grow for another 4-5 h at 37° C. and the cells were collected by centrifugation. The pellet was stored at -80° C. prior to further processing. Cells were disrupted by sonication and the cell debris was removed by centrifugation at 10,000×g for 20 min. The protein pool was then heat treated at 95° C. for 5 min, and denatured proteins were removed by centrifugation at 12,000×g for 20 min. The recombinant protein carrying a His6 tag was then purified by immobilized metal-chelate affinity chromatography (Qiagen).
Hydrolysis of Cellulose, Hemicellulose and Starch
Hydrolysis of Avicel PH101, carboxymethyl cellulose (CMC), xylan from birch wood, α-cellulose, barley β-glucan, laminarin, lichenan, starch, swollen Avicel PH101, wheat arabinoxylan, xylan from beechwood and xylan from oat-spelt was measured spectrophotometrically by the increase of reducing ends at various temperatures and pH values. The amount of reducing sugar ends was determined by the dinitrosalicylic acid (DNS) method. The assay mix contained 10 μl of diluted enzyme, 30 μl of 100 mM sodium phosphate buffer, pH 6.0, and 20 μl of 0.5% (wt/vol) soluble substrate or a 1% (wt/vol) slurry of insoluble substrate, and was incubated for 30 min or 1 hour. The reaction was terminated by adding 60 μl of DNS solution. The absorbance of the assay mix was read at 575 nm after incubation at 100° C. for 5 min. The activity of the enzymes as a function of temperature and pH was measured with CMC. A temperature gradient was achieved using a PCR cycler (MJ Research). Phosphate/citrate buffers were used to generate a pH gradient (i.e., pH 2, 3, 4, 5, 6, 7, 8 and 9.1). For the thermostability assay, each enzyme was incubated at 90° C., and an aliquot was taken each day. Residual activity was measured with CMC.
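The component volumes given for the DNS assay determine the final concentrations in the reaction. The Python sketch below works through that dilution arithmetic as an illustration; the volumes and stock concentrations are taken from the assay description, while the function and variable names are hypothetical.

```python
def final_concentration(stock: float, added_ul: float, total_ul: float) -> float:
    """Concentration after diluting `added_ul` of stock into `total_ul`."""
    return stock * added_ul / total_ul


# Volumes from the assay description (microlitres).
ENZYME_UL, BUFFER_UL, SUBSTRATE_UL, DNS_UL = 10, 30, 20, 60

reaction_ul = ENZYME_UL + BUFFER_UL + SUBSTRATE_UL  # reaction volume before DNS
stopped_ul = reaction_ul + DNS_UL                    # volume after DNS addition

buffer_mM = final_concentration(100.0, BUFFER_UL, reaction_ul)      # from 100 mM stock
soluble_pct = final_concentration(0.5, SUBSTRATE_UL, reaction_ul)   # from 0.5% stock
insoluble_pct = final_concentration(1.0, SUBSTRATE_UL, reaction_ul) # from 1% slurry

print(f"reaction volume: {reaction_ul} ul, after DNS: {stopped_ul} ul")
print(f"buffer: {buffer_mM:.0f} mM")
print(f"soluble substrate: {soluble_pct:.3f}% (wt/vol)")
print(f"insoluble substrate: {insoluble_pct:.3f}% (wt/vol)")
```

The same helper applies to any of the assay mixes described in this section, since each is specified as a set of stock concentrations and added volumes.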
Hydrolysis of p-Nitrophenol-β-D-Glucoside
[0062] Activity of β-glucosidase was determined spectrophotometrically by monitoring the release of p-nitrophenol from the substrate p-nitrophenol-β-D-glucoside (Sigma) at various temperatures and pH values. The assay mix contained 10 μl of diluted enzyme, 30 μl of 100 mM pH buffer, and 20 μl of 50 mM p-nitrophenol-β-D-glucoside, and was incubated for 10 min. The reaction was terminated by adding 120 μl of 1 M Na2CO3. The absorbance of the assay mix was read at 412 nm. Temperature- and pH-dependent activities and thermostability were measured as described above, except that p-nitrophenol-β-D-glucoside was used as the substrate.
Capillary Electrophoresis of Oligosaccharides
[0063] Capillary electrophoresis of oligosaccharides was performed on a BioFocus 2000 (Bio-Rad Laboratories) with laser-induced fluorescence detection. A fused-silica capillary (TSP050375, Polymicro Technologies) of internal diameter 50 μm and length 31 cm was used as the separation column for oligosaccharides. The samples were injected by application of 4.5 lb/in² of helium pressure for 0.22 sec. Electrophoresis conditions were 15 kV/70-100 μA with the cathode at the inlet, 0.1 M sodium phosphate, pH 2.5, as running buffer, and a controlled temperature of 20° C. The capillary was rinsed with 1 M NaOH followed by running buffer, with a dip-cycle to prevent carryover after injection. Oligomers labeled with APTS were excited at 488 nm and emission was collected through a 520-nm band pass filter.
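For readers working in SI units, the injection pressure and separation voltage quoted above can be converted as follows. This Python sketch is illustrative only; it assumes that the 31 cm figure is the total capillary length over which the voltage is applied, which the text does not state explicitly, and the function names are hypothetical.

```python
PSI_TO_KPA = 6.894757  # 1 lb/in^2 expressed in kilopascals


def psi_to_kpa(psi: float) -> float:
    """Convert a pressure from lb/in^2 to kPa."""
    return psi * PSI_TO_KPA


def field_strength_v_per_cm(voltage_kv: float, length_cm: float) -> float:
    """Average electric field along the capillary in V/cm."""
    return voltage_kv * 1000.0 / length_cm


injection_kpa = psi_to_kpa(4.5)                     # helium injection pressure
field_v_per_cm = field_strength_v_per_cm(15.0, 31)  # ASSUMES 31 cm is total length

print(f"injection pressure: {injection_kpa:.1f} kPa")
print(f"field strength: {field_v_per_cm:.0f} V/cm")
```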
Biomass Substrate Specificity of Termocels
[0064] A table depicting the biomass substrate specificity of Termocels 1-8 is provided as FIG. 15.
[0065] While the invention has been described in terms of its preferred embodiments, those skilled in the art will recognize that the invention can be practiced with modification within the spirit and scope of the appended claims. Accordingly, the present invention should not be limited to the embodiments as described above, but should further include all modifications and equivalents thereof within the spirit and scope of the description provided herein.
Sequence CWU
1
1
321903DNAPyrococcus furiosus 1atgatctatt ttgttgagaa ataccacacc tcagaagaca
aatccacaag caatacctcc 60tcaacccccc ctcaaacgac acttagcaca acaaaggttc
tcaaaattcg gtatcctgac 120gacggcgaat ggcctggcgc tcccatagac aaagacggcg
acggaaatcc tgagttctat 180atcgaaatca acctctggaa catactcaac gcgactggat
tcgcagagat gacctataac 240ttgacatctg gcgttctcca ttacgttcaa caactcgata
atatcgttct ccgcgatcgc 300tcaaactggg tacatggcta tcctgaaatt ttttacggca
ataaaccctg gaacgcgaat 360tatgccaccg acggcccgat ccctctcccc agtaaagttt
ccaatctcac agacttttac 420ttgactatct cctacaagct tgaaccaaag aacggactcc
ctataaattt tgcaatcgaa 480tcttggctta ctagagaagc atggcgcact actggaatca
actccgatga acaggaagta 540atgatctgga tttactatga cggactccaa ccagccggtt
ccaaggtgaa agaaatcgtt 600gtacctataa tcgttaatgg caccccagtt aatgctacct
tcgaagtgtg gaaagctaat 660atcggatggg aatacgttgc ctttagaatc aagacaccaa
ttaaagaagg aaccgtgaca 720atcccctacg gtgcattcat tagcgtagct gctaacattt
cttccctccc aaattacaca 780gaactttacc tggaagacgt tgagataggc acagagtttg
gaacaccttc aactactagc 840gcacatctcg aatggtggat tactaacatt accctcaccc
cacttgatcg tcccctgatc 900tcc
9032301PRTPyrococcus furiosus 2Met Ile Tyr Phe Val
Glu Lys Tyr His Thr Ser Glu Asp Lys Ser Thr 1 5
10 15 Ser Asn Thr Ser Ser Thr Pro Pro Gln Thr
Thr Leu Ser Thr Thr Lys 20 25
30 Val Leu Lys Ile Arg Tyr Pro Asp Asp Gly Glu Trp Pro Gly Ala
Pro 35 40 45 Ile
Asp Lys Asp Gly Asp Gly Asn Pro Glu Phe Tyr Ile Glu Ile Asn 50
55 60 Leu Trp Asn Ile Leu Asn
Ala Thr Gly Phe Ala Glu Met Thr Tyr Asn 65 70
75 80 Leu Thr Ser Gly Val Leu His Tyr Val Gln Gln
Leu Asp Asn Ile Val 85 90
95 Leu Arg Asp Arg Ser Asn Trp Val His Gly Tyr Pro Glu Ile Phe Tyr
100 105 110 Gly Asn
Lys Pro Trp Asn Ala Asn Tyr Ala Thr Asp Gly Pro Ile Pro 115
120 125 Leu Pro Ser Lys Val Ser Asn
Leu Thr Asp Phe Tyr Leu Thr Ile Ser 130 135
140 Tyr Lys Leu Glu Pro Lys Asn Gly Leu Pro Ile Asn
Phe Ala Ile Glu 145 150 155
160 Ser Trp Leu Thr Arg Glu Ala Trp Arg Thr Thr Gly Ile Asn Ser Asp
165 170 175 Glu Gln Glu
Val Met Ile Trp Ile Tyr Tyr Asp Gly Leu Gln Pro Ala 180
185 190 Gly Ser Lys Val Lys Glu Ile
Val Val Pro Ile Ile Val Asn Gly Thr 195 200
205 Pro Val Asn Ala Thr Phe Glu Val Trp Lys Ala Asn
Ile Gly Trp Glu 210 215 220
Tyr Val Ala Phe Arg Ile Lys Thr Pro Ile Lys Glu Gly Thr Val Thr 225
230 235 240 Ile Pro Tyr
Gly Ala Phe Ile Ser Val Ala Ala Asn Ile Ser Ser Leu 245
250 255 Pro Asn Tyr Thr Glu Leu Tyr Leu
Glu Asp Val Glu Ile Gly Thr Glu 260 265
270 Phe Gly Thr Pro Ser Thr Thr Ser Ala His Leu Glu
Trp Trp Ile Thr 275 280 285
Asn Ile Thr Leu Thr Pro Leu Asp Arg Pro Leu Ile Ser 290
295 300 3825DNAThermotoga petrophila
3atgaggtggg tagttcttct gatggtggcg ttttctgctc tgctcttttc ctccgaggtg
60gttctcacga gcgttggcgc agcggatatc tccttcaacg gatttcccgt caccatggag
120ctcaacttct ggaacataaa gtcgtatgag ggagaaacgt ggctcaaatt cgatggagaa
180aaggttgagt tctacgcgga tttgtacaac atcgttcttc agaatccaga cagctgggtg
240catggatatc cggagatcta ctacggttac aagccctggg cgagtcacaa cagcggtgtt
300gaatttcttc ctgtgaaggt gaaagatctt ccggatttct acgtgactct tgattactcg
360atctggtacg aaaacaatct gcctatcaac cttgcaatgg aaacatggat cacgaaaagc
420cccgaccaga cttctgtttc ttcgggtgat gcggagatca tggtttggtt ttacaacaac
480gttctgatgc ccggcggtca gaaagtggat gagttcacca caacagttga gataaacgga
540gtgaagcagg aagcaaaatg ggatgtttac ttcgcaccgt ggagctggga ttaccttgcc
600ttcagactga caacaccgat gaaagaagga aaggtgaagt tcaacgtgaa ggacttcgtt
660cagaaagccg cggaagttgt caaaaagcac tcaacgagaa tagacaattt cgaagagctg
720tatttctgcg tctgggagat cgggacggaa tttggagatc caaacacaac aacggcaaaa
780ttcggctgga ccttcaaaga cttcttcgtc gaagttgtaa aataa
8254274PRTThermotoga petrophila 4Met Arg Trp Val Val Leu Leu Met Val Ala
Phe Ser Ala Leu Leu Phe 1 5 10
15 Ser Ser Glu Val Val Leu Thr Ser Val Gly Ala Ala Asp Ile Ser
Phe 20 25 30 Asn
Gly Phe Pro Val Thr Met Glu Leu Asn Phe Trp Asn Ile Lys Ser 35
40 45 Tyr Glu Gly Glu Thr Trp
Leu Lys Phe Asp Gly Glu Lys Val Glu Phe 50 55
60 Tyr Ala Asp Leu Tyr Asn Ile Val Leu Gln Asn
Pro Asp Ser Trp Val 65 70 75
80 His Gly Tyr Pro Glu Ile Tyr Tyr Gly Tyr Lys Pro Trp Ala Ser His
85 90 95 Asn Ser
Gly Val Glu Phe Leu Pro Val Lys Val Lys Asp Leu Pro Asp 100
105 110 Phe Tyr Val Thr Leu Asp
Tyr Ser Ile Trp Tyr Glu Asn Asn Leu Pro 115 120
125 Ile Asn Leu Ala Met Glu Thr Trp Ile Thr Lys
Ser Pro Asp Gln Thr 130 135 140
Ser Val Ser Ser Gly Asp Ala Glu Ile Met Val Trp Phe Tyr Asn Asn
145 150 155 160 Val Leu
Met Pro Gly Gly Gln Lys Val Asp Glu Phe Thr Thr Thr Val
165 170 175 Glu Ile Asn Gly Val Lys
Gln Glu Ala Lys Trp Asp Val Tyr Phe Ala 180
185 190 Pro Trp Ser Trp Asp Tyr Leu Ala Phe Arg
Leu Thr Thr Pro Met Lys 195 200
205 Glu Gly Lys Val Lys Phe Asn Val Lys Asp Phe Val Gln Lys
Ala Ala 210 215 220
Glu Val Val Lys Lys His Ser Thr Arg Ile Asp Asn Phe Glu Glu Leu 225
230 235 240 Tyr Phe Cys Val Trp
Glu Ile Gly Thr Glu Phe Gly Asp Pro Asn Thr 245
250 255 Thr Thr Ala Lys Phe Gly Trp Thr Phe Lys
Asp Phe Phe Val Glu Val 260 265
270 Val Lys 51377DNAPyrococcus horikoshii 5atggagggga
atactattct taaaatcgta ctaatttgca ctattttagc aggcctattc 60gggcaagtcg
tgccagtata tgcagaaaat acaacatatc aaacaccgac tggaatttac 120tacgaagtga
gaggagatac gatatacatg attaatgtca ccagtggaga ggaaactccc 180attcatctct
ttggtgtaaa ctggtttggc tttgaaacac ctaatcatgt agtgcacgga 240ctttggaaga
gaaactggga agacatgctt cttcagatca aaagcttagg cttcaatgca 300ataagacttc
ctttctgtac tgagtctgta aaaccaggaa cacaaccaat tggaatagat 360tacagtaaaa
atccagatct tcgtggacta gatagcctac agattatgga aaagatcata 420aagaaggccg
gagatcttgg tatctttgtc ttactcgact atcataggat aggatgcact 480cacatagaac
ccctctggta cacggaagac ttctcagagg aagactttat taacacatgg 540atagaggttg
ccaaaaggtt cggtaagtac tggaacgtaa taggggctga tctaaagaat 600gagcctcata
gtgttacctc acccccagct gcttatacag atggtaccgg ggctacatgg 660ggtatgggaa
accctgcaac cgattggaac ttggcggctg agaggatagg aaaagcgatt 720ctgaaggttg
cccctcattg gttgatattc gtggagggga cacaatttac taatccgaag 780actgacagta
gttacaaatg gggctacaac gcttggtggg gaggaaatct aatggccgta 840aaggattatc
cagttaactt acctaggaat aagctagtat acagccctca cgtatatggg 900ccagatgtct
ataatcaacc gtactttggt cccgctaagg gttttccgga taatcttcca 960gatatctggt
atcaccactt tggatacgta aaattagaac taggatattc agttgtaata 1020ggagagtttg
gaggaaaata tgggcatgga ggcgatccaa gggatgttat atggcaaaat 1080aagctagttg
attggatgat agagaataaa ttttgtgatt tcttttactg gagctggaat 1140ccagatagtg
gagataccgg agggattcta caggatgatt ggacaacaat atgggaagat 1200aagtataata
acctgaagag attgatggat agttgttcca aaagttcttc aagtactcaa 1260tccgttattc
ggagtaccac ccctacaaag tcaaatacaa gtaagaagat ttgtggacca 1320gcaattctta
tcatcctagc agtattctct cttctcttaa gaagggctcc caggtag
13776458PRTPyrococcus horikoshii 6Met Glu Gly Asn Thr Ile Leu Lys Ile Val
Leu Ile Cys Thr Ile Leu 1 5 10
15 Ala Gly Leu Phe Gly Gln Val Val Pro Val Tyr Ala Glu Asn Thr
Thr 20 25 30 Tyr
Gln Thr Pro Thr Gly Ile Tyr Tyr Glu Val Arg Gly Asp Thr Ile 35
40 45 Tyr Met Ile Asn Val Thr
Ser Gly Glu Glu Thr Pro Ile His Leu Phe 50 55
60 Gly Val Asn Trp Phe Gly Phe Glu Thr Pro Asn
His Val Val His Gly 65 70 75
80 Leu Trp Lys Arg Asn Trp Glu Asp Met Leu Leu Gln Ile Lys Ser Leu
85 90 95 Gly Phe
Asn Ala Ile Arg Leu Pro Phe Cys Thr Glu Ser Val Lys Pro 100
105 110 Gly Thr Gln Pro Ile Gly
Ile Asp Tyr Ser Lys Asn Pro Asp Leu Arg 115 120
125 Gly Leu Asp Ser Leu Gln Ile Met Glu Lys Ile
Ile Lys Lys Ala Gly 130 135 140
Asp Leu Gly Ile Phe Val Leu Leu Asp Tyr His Arg Ile Gly Cys Thr
145 150 155 160 His Ile
Glu Pro Leu Trp Tyr Thr Glu Asp Phe Ser Glu Glu Asp Phe
165 170 175 Ile Asn Thr Trp Ile Glu
Val Ala Lys Arg Phe Gly Lys Tyr Trp Asn 180
185 190 Val Ile Gly Ala Asp Leu Lys Asn Glu Pro
His Ser Val Thr Ser Pro 195 200
205 Pro Ala Ala Tyr Thr Asp Gly Thr Gly Ala Thr Trp Gly Met
Gly Asn 210 215 220
Pro Ala Thr Asp Trp Asn Leu Ala Ala Glu Arg Ile Gly Lys Ala Ile 225
230 235 240 Leu Lys Val Ala Pro
His Trp Leu Ile Phe Val Glu Gly Thr Gln Phe 245
250 255 Thr Asn Pro Lys Thr Asp Ser Ser Tyr Lys
Trp Gly Tyr Asn Ala Trp 260 265
270 Trp Gly Gly Asn Leu Met Ala Val Lys Asp Tyr Pro Val Asn
Leu Pro 275 280 285
Arg Asn Lys Leu Val Tyr Ser Pro His Val Tyr Gly Pro Asp Val Tyr 290
295 300 Asn Gln Pro Tyr Phe
Gly Pro Ala Lys Gly Phe Pro Asp Asn Leu Pro 305 310
315 320 Asp Ile Trp Tyr His His Phe Gly Tyr Val
Lys Leu Glu Leu Gly Tyr 325 330
335 Ser Val Val Ile Gly Glu Phe Gly Gly Lys Tyr Gly His Gly Gly
Asp 340 345 350 Pro
Arg Asp Val Ile Trp Gln Asn Lys Leu Val Asp Trp Met Ile Glu 355
360 365 Asn Lys Phe Cys Asp Phe
Phe Tyr Trp Ser Trp Asn Pro Asp Ser Gly 370 375
380 Asp Thr Gly Gly Ile Leu Gln Asp Asp Trp Thr
Thr Ile Trp Glu Asp 385 390 395
400 Lys Tyr Asn Asn Leu Lys Arg Leu Met Asp Ser Cys Ser Lys Ser Ser
405 410 415 Ser Ser
Thr Gln Ser Val Ile Arg Ser Thr Thr Pro Thr Lys Ser Asn 420
425 430 Thr Ser Lys Lys Ile Cys
Gly Pro Ala Ile Leu Ile Ile Leu Ala Val 435 440
445 Phe Ser Leu Leu Leu Arg Arg Ala Pro Arg
450 455 71542DNAPyrococcus abyssi 7atggaaatca
agctcttctg cgtgtttatc gtgttcatca tcctcttctc ccctttcgtg 60attgcactct
cgtatccaga tgttaactat actgccgaga atggtattat cttcgtgcag 120aacgtcacta
cgggtgagaa gaagccactt tatcttcacg gagtgtcatg gtttggattc 180gagctgaagg
accacgtcgt ctatggcttg gataaacgga actggaaaga tatactcaag 240gatgttaagc
gcttgggttt taatgctatc aggcttccct tctgctctga aagcatccgc 300cctgatacgc
gcccttcgcc tgagcggata aactacgagt tgaaccccga cttgaagaat 360ctgacttccc
tcgaaataat ggagaagatt attgaatacg ccaactcaat cgggctctac 420atactcttgg
attatcaccg catcggttgt gaggagatcg aacctctttg gtataccgag 480aattactcag
aggagcagta tataaaggat tggatcttcc tcgcaaagcg gttcgggaag 540taccctaacg
tgataggagc tgatatcaag aacgagccgc atggtgaagc cgggtggggt 600acgggagatg
agcgggattt ccgcctcttt gccgagaagg tcgggcgcga gatactcaag 660gtggccccac
actggttgat attcgtcgag ggaacgcaat atacccatgt cccgaatatt 720gatgagatca
tcgagaagaa gggctggtgg acattttggg gagagaatct tatgggagtt 780aaggactatc
cagtcaggct tccgcgcggc aaggtcgtgt actcaccgca tgtctatgga 840ccatctgtct
acatgatgga ctacttcaag tcgccagact ttccgaacaa tatgccgata 900atctgggaaa
cacacttcgg atacttgacc gacctgaatt ataccttggt cataggcgag 960tggggtggca
actatgaggg ccttgacaag gtgtggcaag acgctttcgt gaagtggctg 1020attaagaaga
agatctataa cttcttctac tggtgcctga acccggagtc gggtgacacc 1080ggtggcatct
ttctcgacga ctggaaaacc gttaactggg aaaagatgag ggttatttac 1140aggctcatca
aggcggcgaa ccccgagttt gaggaacccc tttacatcat tttgaaaact 1200aacgcgacga
catctatcct gggcgtgggt gagaggatcc ggatttactg gtacacaaat 1260ggcaaagtta
ttgactctaa cttcgcgcat tccagcgaag gcgaaatgaa cattacagtg 1320acgaagtcca
tgactctgta catcatcgtg aagaagggca atcagacact gaggaaggaa 1380ctcaaactgt
acgttatcgg cggcaattac ggctccaata tctccactac ccagctggtt 1440actcccaaga
aaggcggcga aaggattagc accagcctga agctggcaat tagcctgctc 1500ttcattctcc
tcttcgtttg gtatctcctc cgggagaagc at
15428514PRTPyrococcus abyssi 8Met Glu Ile Lys Leu Phe Cys Val Phe Ile Val
Phe Ile Ile Leu Phe 1 5 10
15 Ser Pro Phe Val Ile Ala Leu Ser Tyr Pro Asp Val Asn Tyr Thr Ala
20 25 30 Glu Asn
Gly Ile Ile Phe Val Gln Asn Val Thr Thr Gly Glu Lys Lys 35
40 45 Pro Leu Tyr Leu His Gly Val
Ser Trp Phe Gly Phe Glu Leu Lys Asp 50 55
60 His Val Val Tyr Gly Leu Asp Lys Arg Asn Trp Lys
Asp Ile Leu Lys 65 70 75
80 Asp Val Lys Arg Leu Gly Phe Asn Ala Ile Arg Leu Pro Phe Cys Ser
85 90 95 Glu Ser Ile
Arg Pro Asp Thr Arg Pro Ser Pro Glu Arg Ile Asn Tyr 100
105 110 Glu Leu Asn Pro Asp Leu Lys
Asn Leu Thr Ser Leu Glu Ile Met Glu 115 120
125 Lys Ile Ile Glu Tyr Ala Asn Ser Ile Gly Leu Tyr
Ile Leu Leu Asp 130 135 140
Tyr His Arg Ile Gly Cys Glu Glu Ile Glu Pro Leu Trp Tyr Thr Glu 145
150 155 160 Asn Tyr Ser
Glu Glu Gln Tyr Ile Lys Asp Trp Ile Phe Leu Ala Lys 165
170 175 Arg Phe Gly Lys Tyr Pro Asn Val
Ile Gly Ala Asp Ile Lys Asn Glu 180 185
190 Pro His Gly Glu Ala Gly Trp Gly Thr Gly Asp Glu
Arg Asp Phe Arg 195 200 205
Leu Phe Ala Glu Lys Val Gly Arg Glu Ile Leu Lys Val Ala Pro His
210 215 220 Trp Leu Ile
Phe Val Glu Gly Thr Gln Tyr Thr His Val Pro Asn Ile 225
230 235 240 Asp Glu Ile Ile Glu Lys Lys
Gly Trp Trp Thr Phe Trp Gly Glu Asn 245
250 255 Leu Met Gly Val Lys Asp Tyr Pro Val Arg Leu
Pro Arg Gly Lys Val 260 265
270 Val Tyr Ser Pro His Val Tyr Gly Pro Ser Val Tyr Met Met Asp
Tyr 275 280 285 Phe
Lys Ser Pro Asp Phe Pro Asn Asn Met Pro Ile Ile Trp Glu Thr 290
295 300 His Phe Gly Tyr Leu Thr
Asp Leu Asn Tyr Thr Leu Val Ile Gly Glu 305 310
315 320 Trp Gly Gly Asn Tyr Glu Gly Leu Asp Lys Val
Trp Gln Asp Ala Phe 325 330
335 Val Lys Trp Leu Ile Lys Lys Lys Ile Tyr Asn Phe Phe Tyr Trp Cys
340 345 350 Leu Asn
Pro Glu Ser Gly Asp Thr Gly Gly Ile Phe Leu Asp Asp Trp 355
360 365 Lys Thr Val Asn Trp Glu Lys
Met Arg Val Ile Tyr Arg Leu Ile Lys 370 375
380 Ala Ala Asn Pro Glu Phe Glu Glu Pro Leu Tyr Ile
Ile Leu Lys Thr 385 390 395
400 Asn Ala Thr Thr Ser Ile Leu Gly Val Gly Glu Arg Ile Arg Ile Tyr
405 410 415 Trp Tyr Thr
Asn Gly Lys Val Ile Asp Ser Asn Phe Ala His Ser Ser 420
425 430 Glu Gly Glu Met Asn Ile Thr
Val Thr Lys Ser Met Thr Leu Tyr Ile 435 440
445 Ile Val Lys Lys Gly Asn Gln Thr Leu Arg Lys Glu
Leu Lys Leu Tyr 450 455 460
Val Ile Gly Gly Asn Tyr Gly Ser Asn Ile Ser Thr Thr Gln Leu Val 465
470 475 480 Thr Pro Lys
Lys Gly Gly Glu Arg Ile Ser Thr Ser Leu Lys Leu Ala 485
490 495 Ile Ser Leu Leu Phe Ile Leu Leu
Phe Val Trp Tyr Leu Leu Arg Glu 500 505
510 Lys His 9987DNAThermotoga petrophila 9atggaaacgc
tcctccctgt agtcgtggtc cacgatattg agccagtttc aatgcgtctt 60cagaggtaca
agaacaaaaa ttcgataaaa agagaaaagc agggattaat acccctgttt 120ttttattttt
gggtgtattt agttctattt gcgaattttc agattttgaa tgtaaacatt 180ttcataataa
gatgttttct ggaggtgata atggtggtac tgatgacaaa accgggaaca 240tcggattttg
tatggaatgg cattcccctt tccatggagc tgaatctgtg gaacataaag 300gaatactccg
gttctgtagc tatgaaattc gacggtgaaa aggtaacttt cgacgcggac 360attcagaatc
tttctccaaa agaaccagaa aggtacgttc tcggttatcc cgagttctat 420tacggttata
aaccctggga aaagcacacg gcagaaggtt cgaaacttcc agtacctgtt 480tcctctatga
aatcattttc cgtcgaagtt tctttcgata ttcaccacga accgtctctg 540cctttgaact
ttgccatgga aacatggctc acaagagaaa agtaccagac ggaagcgtcg 600atcggcgatg
ttgaaatcat ggtctggttc tatttcaaca atctcacacc agggggcaaa 660aagatagagg
agtttacgat tccgttcgtg ctgaacggag agagtgtcga aggcacctgg 720gaactgtggc
acgcggagtg gggatgggac tacctcgctt tccgcttgaa ggatcccgtg 780aagaagggaa
gggtgaagtt cgacgtgagg cattttcttg atgccgccgg gaaagctctt 840tcgaattcca
ctcgtgtgaa agattttgaa aatctttact tcaccgtctg ggaaattgga 900accgagtttg
gaagcccgga aacaaagagc gcgcaattcg ggtggaagtt tgaaaacttc 960tctattgatc
tggaggtgag agaatga
98710328PRTThermotoga petrophila 10Met Glu Thr Leu Leu Pro Val Val Val
Val His Asp Ile Glu Pro Val 1 5 10
15 Ser Met Arg Leu Gln Arg Tyr Lys Asn Lys Asn Ser Ile Lys
Arg Glu 20 25 30
Lys Gln Gly Leu Ile Pro Leu Phe Phe Tyr Phe Trp Val Tyr Leu Val
35 40 45 Leu Phe Ala Asn
Phe Gln Ile Leu Asn Val Asn Ile Phe Ile Ile Arg 50
55 60 Cys Phe Leu Glu Val Ile Met Val
Val Leu Met Thr Lys Pro Gly Thr 65 70
75 80 Ser Asp Phe Val Trp Asn Gly Ile Pro Leu Ser Met
Glu Leu Asn Leu 85 90
95 Trp Asn Ile Lys Glu Tyr Ser Gly Ser Val Ala Met Lys Phe Asp Gly
100 105 110 Glu Lys
Val Thr Phe Asp Ala Asp Ile Gln Asn Leu Ser Pro Lys Glu 115
120 125 Pro Glu Arg Tyr Val Leu Gly
Tyr Pro Glu Phe Tyr Tyr Gly Tyr Lys 130 135
140 Pro Trp Glu Lys His Thr Ala Glu Gly Ser Lys Leu
Pro Val Pro Val 145 150 155
160 Ser Ser Met Lys Ser Phe Ser Val Glu Val Ser Phe Asp Ile His His
165 170 175 Glu Pro Ser
Leu Pro Leu Asn Phe Ala Met Glu Thr Trp Leu Thr Arg 180
185 190 Glu Lys Tyr Gln Thr Glu Ala
Ser Ile Gly Asp Val Glu Ile Met Val 195 200
205 Trp Phe Tyr Phe Asn Asn Leu Thr Pro Gly Gly Lys
Lys Ile Glu Glu 210 215 220
Phe Thr Ile Pro Phe Val Leu Asn Gly Glu Ser Val Glu Gly Thr Trp 225
230 235 240 Glu Leu Trp
His Ala Glu Trp Gly Trp Asp Tyr Leu Ala Phe Arg Leu 245
250 255 Lys Asp Pro Val Lys Lys Gly Arg
Val Lys Phe Asp Val Arg His Phe 260 265
270 Leu Asp Ala Ala Gly Lys Ala Leu Ser Asn Ser Thr
Arg Val Lys Asp 275 280 285
Phe Glu Asn Leu Tyr Phe Thr Val Trp Glu Ile Gly Thr Glu Phe Gly
290 295 300 Ser Pro Glu
Thr Lys Ser Ala Gln Phe Gly Trp Lys Phe Glu Asn Phe 305
310 315 320 Ser Ile Asp Leu Glu Val Arg
Glu 325 11852DNACaldivirga maquilingensis
11atgttgaaac ttattccact tgttaatggc aattataagt tgattcaatg ggagccactc
60ggcggcgtgc acggagcaga tatcgagtgc atacatgtta ccccaaacgt atggaacata
120gataaatcat cagttggcac tgtacagatc gaatatgagc cccaagttgg ctgtcttcgt
180ttttcaattg atttcccgag gataagtata agacataatg taggcgtagc ggcatattca
240gaagttattt acggacacaa gccgtggggc cccaccactt gcatggaccc tcagttcaag
300ttccctatca aagtcaatga gtcaaaagga ctgtactcgt atgtaaatta taacgttaaa
360tctaggtcac cagatgactc aatctttaat attgcttacg atctctggct tacaacgtcc
420ccaaacctta caaacggacc ccagccagga gacgtagaag ttatgatctg gttgtactac
480cacggacagc gccctgcagg cagactcatc ggggaactcc gcatgccgat tacattgggc
540gatagtgagg cggcacgtga ctttgaagta tgggtggctg acacaggaat aggaatcggt
600gaatgggcgg tagtgacctt cagaatcaag gacccaataa agggcggttt gataggagtt
660aacctcataa actacatcga aagtgctttt aaaacgctcg aagaactcaa cccggtcaag
720tggcggtacg gcgacctgct caacaaatat cttaatggaa ttgaattcgg cagtgagttt
780ggtaatgtct cctcaggaat gataaaactt aattgggaac tctgcggcct gagccttgtg
840aaagactctt ct
85212284PRTThermotoga petrophila 12Met Leu Lys Leu Ile Pro Leu Val Asn
Gly Asn Tyr Lys Leu Ile Gln 1 5 10
15 Trp Glu Pro Leu Gly Gly Val His Gly Ala Asp Ile Glu Cys
Ile His 20 25 30
Val Thr Pro Asn Val Trp Asn Ile Asp Lys Ser Ser Val Gly Thr Val
35 40 45 Gln Ile Glu Tyr
Glu Pro Gln Val Gly Cys Leu Arg Phe Ser Ile Asp 50
55 60 Phe Pro Arg Ile Ser Ile Arg His
Asn Val Gly Val Ala Ala Tyr Ser 65 70
75 80 Glu Val Ile Tyr Gly His Lys Pro Trp Gly Pro Thr
Thr Cys Met Asp 85 90
95 Pro Gln Phe Lys Phe Pro Ile Lys Val Asn Glu Ser Lys Gly Leu Tyr
100 105 110 Ser Tyr
Val Asn Tyr Asn Val Lys Ser Arg Ser Pro Asp Asp Ser Ile 115
120 125 Phe Asn Ile Ala Tyr Asp Leu
Trp Leu Thr Thr Ser Pro Asn Leu Thr 130 135
140 Asn Gly Pro Gln Pro Gly Asp Val Glu Val Met Ile
Trp Leu Tyr Tyr 145 150 155
160 His Gly Gln Arg Pro Ala Gly Arg Leu Ile Gly Glu Leu Arg Met Pro
165 170 175 Ile Thr Leu
Gly Asp Ser Glu Ala Ala Arg Asp Phe Glu Val Trp Val 180
185 190 Ala Asp Thr Gly Ile Gly Ile
Gly Glu Trp Ala Val Val Thr Phe Arg 195 200
205 Ile Lys Asp Pro Ile Lys Gly Gly Leu Ile Gly Val
Asn Leu Ile Asn 210 215 220
Tyr Ile Glu Ser Ala Phe Lys Thr Leu Glu Glu Leu Asn Pro Val Lys 225
230 235 240 Trp Arg Tyr
Gly Asp Leu Leu Asn Lys Tyr Leu Asn Gly Ile Glu Phe 245
250 255 Gly Ser Glu Phe Gly Asn Val Ser
Ser Gly Met Ile Lys Leu Asn Trp 260 265
270 Glu Leu Cys Gly Leu Ser Leu Val Lys Asp Ser Ser
275 280 132166DNAThermotoga
petrophila 13atgatgggaa agatcgatga aatcctttca cagctgacta ttgaagaaaa
agtgaaactt 60gtagtggggg ttggtcttcc aggacttttt ggaaatccac attccagagt
ggcaggtgca 120gctggagaaa cgcatcctgt tccgaggctt ggaattcctt ctttcgttct
ggccgacggt 180cccgcgggcc tcagaataaa tcccacaaga gagaacgacg aaaacaccta
ttacacaaca 240gcgtttcctg ttgaaatcat gctcgcttcc acctggaaca aagatcttct
ggaagaagta 300ggaaaagcta tgggagaaga agtcagggaa tacggtgtcg atgtgcttct
tgcacctgcg 360atgaacattc acaggaaccc tctttgtgga aggaatttcg agtattattc
agaagatcct 420gtcctttccg gtgaaatggc ttcagccttt gtcaagggag ttcaatctca
aggggtggga 480gcctgcataa aacactttgt cgcgaacaac caggaaacga acaggatggt
agtggacacg 540atcgtgtccg agcgagccct cagagaaata tatctgaaag gttttgaaat
tgccgtcaag 600aaagcaagac cctggaccgt gatgagcgct tacaacaaac tgaatggaaa
atactgttca 660cagaacgaat ggcttttgaa gaaggttctc agggaagaat ggggatttga
cggtttcgtg 720atgagcgact ggtacgcggg agacaaccct gtagaacagc tcaaggccgg
aaacgatatg 780atcatgcctg gaaaagcgta tcaggtgaac acggaaagaa gagatgaaat
agaagaaatc 840atggaggcgt tgaaggaggg aagactcagt gaggaagtcc tgaacgaatg
tgtgagaaac 900atcctcaaag ttcttgtgaa cgcgccttcc tttaaagggt acaggtactc
gaacaaaccg 960gacctcgaat ctcacgcgaa agttgcctac gaagcaggtg tggagggtgt
tgtccttctt 1020gagaacaacg gtgttcttcc attcgatgaa agtatccatg tcgccgtctt
tggcaccggt 1080caaatcgaaa caataaaggg aggaacggga agtggagaca cccatccgag
atacacgatc 1140tctatccttg aaggcataaa agaaagaaac atgaagttcg acgaagaact
cacctccatc 1200tatgaggatt acatcaaaaa gatgagagaa acagaggaat ataaacccag
aactgactcc 1260tggggaacgg ttataaaacc gaaacttcca gagaactttc tctcagaaaa
agagataaag 1320aaggctgcga agaaaaacga tgctgcagtt gttgtaatca gtaggatctc
cggtgaggga 1380tacgacagaa agccggtgaa aggtgacttc acctctccga tgacgagctg
gagctcataa 1440aaacagtctc aagggaattc cacgaacagg gtaagaaggt tgtggttctt
ctcaacatcg 1500gaagtcccat tgaagttgca agctggagag atcttgtgga tggaatcctt
ctcgtctggc 1560aagcaggaca ggagatggga agaatagtgg ccgatgttct tgtgggaagg
gtaaacccct 1620ccggaaaact tccaacgacc ttcccgaagg attactcgga cgttccatcc
tggacgttcc 1680caggagagcc aaaggacaat ccgcaaagag tggtgtacga ggaagacatc
tacgtgggat 1740acaggtacta cgacaccttt ggtgtggaac ctgcctacga gttcggctac
ggcctctctt 1800acacaaagtt tgaatacaaa gatttaaaga tcgctatcga cggagatata
ctcagagtgt 1860cgtacacgat cacaaacacc ggggacagag ctggaaagga agtctcacag
gtttatgtca 1920aagctccaaa agggaaaata gacaaaccct tccaggagct gaaagcgttc
cacaaaacaa 1980aacttttgaa cccgggtgaa tccgaaaaga tctttctgga aattcctctt
agagatcttg 2040cgagtttcga tgggaaagaa tggttgtcga gtcaggagaa tacgaggtca
gggtcggtgc 2100atcttcgagg gatataggtt gagagatatt tttctggttg agggagagaa
gagattcaaa 2160ccatga
2166
SEQ ID NO: 14, length 722, PRT, Thermotoga petrophila
Met Met Gly Lys Ile Asp
Glu Ile Leu Ser Gln Leu Thr Ile Glu Glu 1 5
10 15 Lys Val Lys Leu Val Val Gly Val Gly Leu Pro
Gly Leu Phe Gly Asn 20 25
30 Pro His Ser Arg Val Ala Gly Ala Ala Gly Glu Thr His Pro Val
Pro 35 40 45 Arg
Leu Gly Ile Pro Ser Phe Val Leu Ala Asp Gly Pro Ala Gly Leu 50
55 60 Arg Ile Asn Pro Thr Arg
Glu Asn Asp Glu Asn Thr Tyr Tyr Thr Thr 65 70
75 80 Ala Phe Pro Val Glu Ile Met Leu Ala Ser Thr
Trp Asn Lys Asp Leu 85 90
95 Leu Glu Glu Val Gly Lys Ala Met Gly Glu Glu Val Arg Glu Tyr Gly
100 105 110 Val Asp
Val Leu Leu Ala Pro Ala Met Asn Ile His Arg Asn Pro Leu 115
120 125 Cys Gly Arg Asn Phe Glu Tyr
Tyr Ser Glu Asp Pro Val Leu Ser Gly 130 135
140 Glu Met Ala Ser Ala Phe Val Lys Gly Val Gln Ser
Gln Gly Val Gly 145 150 155
160 Ala Cys Ile Lys His Phe Val Ala Asn Asn Gln Glu Thr Asn Arg Met
165 170 175 Val Val Asp
Thr Ile Val Ser Glu Arg Ala Leu Arg Glu Ile Tyr Leu 180
185 190 Lys Gly Phe Glu Ile Ala Val
Lys Lys Ala Arg Pro Trp Thr Val Met 195 200
205 Ser Ala Tyr Asn Lys Leu Asn Gly Lys Tyr Cys Ser
Gln Asn Glu Trp 210 215 220
Leu Leu Lys Lys Val Leu Arg Glu Glu Trp Gly Phe Asp Gly Phe Val 225
230 235 240 Met Ser Asp
Trp Tyr Ala Gly Asp Asn Pro Val Glu Gln Leu Lys Ala 245
250 255 Gly Asn Asp Met Ile Met Pro Gly
Lys Ala Tyr Gln Val Asn Thr Glu 260 265
270 Arg Arg Asp Glu Ile Glu Glu Ile Met Glu Ala Leu
Lys Glu Gly Arg 275 280 285
Leu Ser Glu Glu Val Leu Asn Glu Cys Val Arg Asn Ile Leu Lys Val
290 295 300 Leu Val Asn
Ala Pro Ser Phe Lys Gly Tyr Arg Tyr Ser Asn Lys Pro 305
310 315 320 Asp Leu Glu Ser His Ala Lys
Val Ala Tyr Glu Ala Gly Val Glu Gly 325
330 335 Val Val Leu Leu Glu Asn Asn Gly Val Leu Pro
Phe Asp Glu Ser Ile 340 345
350 His Val Ala Val Phe Gly Thr Gly Gln Ile Glu Thr Ile Lys Gly
Gly 355 360 365 Thr
Gly Ser Gly Asp Thr His Pro Arg Tyr Thr Ile Ser Ile Leu Glu 370
375 380 Gly Ile Lys Glu Arg Asn
Met Lys Phe Asp Glu Glu Leu Thr Ser Ile 385 390
395 400 Tyr Glu Asp Tyr Ile Lys Lys Met Arg Glu Thr
Glu Glu Tyr Lys Pro 405 410
415 Arg Thr Asp Ser Trp Gly Thr Val Ile Lys Pro Lys Leu Pro Glu Asn
420 425 430 Phe Leu
Ser Glu Lys Glu Ile Lys Lys Ala Ala Lys Lys Asn Asp Ala 435
440 445 Ala Val Val Val Ile Ser Arg
Ile Ser Gly Glu Gly Tyr Asp Arg Lys 450 455
460 Pro Val Lys Gly Asp Phe Tyr Leu Ser Asp Asp Glu
Leu Glu Leu Ile 465 470 475
480 Lys Thr Val Ser Arg Glu Phe His Glu Gln Gly Lys Lys Val Val Val
485 490 495 Leu Leu Asn
Ile Gly Ser Pro Ile Glu Val Ala Ser Trp Arg Asp Leu 500
505 510 Val Asp Gly Ile Leu Leu Val
Trp Gln Ala Gly Gln Glu Met Gly Arg 515 520
525 Ile Val Ala Asp Val Leu Val Gly Arg Val Asn Pro
Ser Gly Lys Leu 530 535 540
Pro Thr Thr Phe Pro Lys Asp Tyr Ser Asp Val Pro Ser Trp Thr Phe 545
550 555 560 Pro Gly Glu
Pro Lys Asp Asn Pro Gln Arg Val Val Tyr Glu Glu Asp 565
570 575 Ile Tyr Val Gly Tyr Arg Tyr Tyr
Asp Thr Phe Gly Val Glu Pro Ala 580 585
590 Tyr Glu Phe Gly Tyr Gly Leu Ser Tyr Thr Lys Phe
Glu Tyr Lys Asp 595 600 605
Leu Lys Ile Ala Ile Asp Gly Asp Ile Leu Arg Val Ser Tyr Thr Ile
610 615 620 Thr Asn Thr
Gly Asp Arg Ala Gly Lys Glu Val Ser Gln Val Tyr Val 625
630 635 640 Lys Ala Pro Lys Gly Lys Ile
Asp Lys Pro Phe Gln Glu Leu Lys Ala 645
650 655 Phe His Lys Thr Lys Leu Leu Asn Pro Gly Glu
Ser Glu Lys Ile Phe 660 665
670 Leu Glu Ile Pro Leu Arg Asp Leu Ala Ser Phe Asp Gly Lys Glu
Trp 675 680 685 Val
Val Glu Ser Gly Glu Tyr Glu Val Arg Val Gly Ala Ser Ser Arg 690
695 700 Asp Ile Arg Leu Arg Asp
Ile Phe Leu Val Glu Gly Glu Lys Arg Phe 705 710
715 720 Lys Pro
SEQ ID NO: 15, length 1341, DNA, Thermotoga petrophila
atgaacgtga aaaagttccc tgaaggattc ctctggggtg ttgcaacagc ttcctaccag
60atcgagggtt ctcccctcgc agacggagct ggtatgtcta tctggcacac cttctcccat
120actcctggaa atgtaaagaa cggtgacacg ggagatgtgg cctgcgacca ctacaacaga
180tggaaagagg acattgaaat catagagaaa ctcggagtaa aggcttacag attttcaatc
240agctggccaa gaatacttcc ggaaggaaca ggaagggtga atcagaaagg actggatttt
300tacaacagga tcatagacac cctgctggaa aaaggtatca caccctttgt gaccatctat
360cactgggatc ttcccttcgc tcttcagttg aaaggaggat gggcgaacag agaaatagcg
420gattggttcg cagaatactc aagggttctc tttgaaaatt tcggcgaccg tgtgaagaac
480tggatcacct tgaacgaacc gtgggttgtt gccatagtgg ggcatctgta cggagtccac
540gctcctggaa tgagagatat ttacgtggct ttccgagctg ttcacaatct cttgagggca
600cacgccaaag cggtgaaagt gttcagggaa actgtgaaag atggaaagat cggaatagtt
660ttcaacaatg gatatttcga acctgcgagt gaaaaagagg aggacatcag agcggcgaga
720ttcatgcatc agttcaacaa ctatcctctc tttctcaatc cgatctacag aggagattat
780ccggagctcg ttctggaatt tgccagagag tatctaccgg agaattacaa agatgacatg
840tccgagatac aggaaaagat cgactttgtt ggattgaact attactccgg tcatttggtg
900aagttcgatc cagatgcacc agctaaggtc tctttcgttg aaagggatct tccaaaaaca
960gccatgggat gggagatcgt tccagaagga atctactgga tcctgaagaa ggtgaaagaa
1020gaatacaacc caccagaggt ttacatcaca gagaatgggg ctgcttttga cgacgtagtt
1080agtgaagatg gaagagttca cgatcaaaac agaatcgatt atttgaaggc ccacattggt
1140caggcatgga aggccataca ggagggagtg ccgcttaaag gttacttcgt ctggtcgctc
1200ctcgacaatt tcgaatgggc agagggatat tccaagagat ttggtattgt gtacgtggac
1260tacagtactc aaaaacgcat cataaaagac agtggttact ggtactcgaa cgtggtcaaa
1320agcaacagtc tggaagattg a
1341
SEQ ID NO: 16, length 446, PRT, Thermotoga petrophila
Met Asn Val Lys Lys Phe Pro Glu Gly
Phe Leu Trp Gly Val Ala Thr 1 5 10
15 Ala Ser Tyr Gln Ile Glu Gly Ser Pro Leu Ala Asp Gly Ala
Gly Met 20 25 30
Ser Ile Trp His Thr Phe Ser His Thr Pro Gly Asn Val Lys Asn Gly
35 40 45 Asp Thr Gly Asp
Val Ala Cys Asp His Tyr Asn Arg Trp Lys Glu Asp 50
55 60 Ile Glu Ile Ile Glu Lys Leu Gly
Val Lys Ala Tyr Arg Phe Ser Ile 65 70
75 80 Ser Trp Pro Arg Ile Leu Pro Glu Gly Thr Gly Arg
Val Asn Gln Lys 85 90
95 Gly Leu Asp Phe Tyr Asn Arg Ile Ile Asp Thr Leu Leu Glu Lys Gly
100 105 110 Ile Thr
Pro Phe Val Thr Ile Tyr His Trp Asp Leu Pro Phe Ala Leu 115
120 125 Gln Leu Lys Gly Gly Trp Ala
Asn Arg Glu Ile Ala Asp Trp Phe Ala 130 135
140 Glu Tyr Ser Arg Val Leu Phe Glu Asn Phe Gly Asp
Arg Val Lys Asn 145 150 155
160 Trp Ile Thr Leu Asn Glu Pro Trp Val Val Ala Ile Val Gly His Leu
165 170 175 Tyr Gly Val
His Ala Pro Gly Met Arg Asp Ile Tyr Val Ala Phe Arg 180
185 190 Ala Val His Asn Leu Leu Arg
Ala His Ala Lys Ala Val Lys Val Phe 195 200
205 Arg Glu Thr Val Lys Asp Gly Lys Ile Gly Ile Val
Phe Asn Asn Gly 210 215 220
Tyr Phe Glu Pro Ala Ser Glu Lys Glu Glu Asp Ile Arg Ala Ala Arg 225
230 235 240 Phe Met His
Gln Phe Asn Asn Tyr Pro Leu Phe Leu Asn Pro Ile Tyr 245
250 255 Arg Gly Asp Tyr Pro Glu Leu Val
Leu Glu Phe Ala Arg Glu Tyr Leu 260 265
270 Pro Glu Asn Tyr Lys Asp Asp Met Ser Glu Ile Gln
Glu Lys Ile Asp 275 280 285
Phe Val Gly Leu Asn Tyr Tyr Ser Gly His Leu Val Lys Phe Asp Pro
290 295 300 Asp Ala Pro
Ala Lys Val Ser Phe Val Glu Arg Asp Leu Pro Lys Thr 305
310 315 320 Ala Met Gly Trp Glu Ile Val
Pro Glu Gly Ile Tyr Trp Ile Leu Lys 325
330 335 Lys Val Lys Glu Glu Tyr Asn Pro Pro Glu Val
Tyr Ile Thr Glu Asn 340 345
350 Gly Ala Ala Phe Asp Asp Val Val Ser Glu Asp Gly Arg Val His
Asp 355 360 365 Gln
Asn Arg Ile Asp Tyr Leu Lys Ala His Ile Gly Gln Ala Trp Lys 370
375 380 Ala Ile Gln Glu Gly Val
Pro Leu Lys Gly Tyr Phe Val Trp Ser Leu 385 390
395 400 Leu Asp Asn Phe Glu Trp Ala Glu Gly Tyr Ser
Lys Arg Phe Gly Ile 405 410
415 Val Tyr Val Asp Tyr Ser Thr Gln Lys Arg Ile Ile Lys Asp Ser Gly
420 425 430 Tyr Trp
Tyr Ser Asn Val Val Lys Ser Asn Ser Leu Glu Asp 435
440 445
SEQ ID NO: 17, length 66, DNA, Artificial, synthetic sequence encoding fusion sequence with six histidines
gaacaaaaac tcatctcaga agaggatctg aatagcgccg tcgaccatca tcatcatcat 60
catcat 66
SEQ ID NO: 18, length 21, PRT, Artificial, synthetic fusion sequence with six histidines
Glu
Gln Lys Leu Ile Ser Glu Glu Asp Leu Asn Ser Ala Val Asp His 1
5 10 15 His His His His His
20
SEQ ID NO: 19, length 41, DNA, Artificial, synthetic oligonucleotide primer
atatccatgg aggggaatac tattcttaaa atcgtactaa t 41
SEQ ID NO: 20, length 31, DNA, Artificial, synthetic oligonucleotide primer
atgctctaga aacctgggag cccttcttaa g 31
SEQ ID NO: 21, length 20, DNA, Artificial, synthetic oligonucleotide primer
gaaacgctcc tccctgtagt 20
SEQ ID NO: 22, length 38, DNA, Artificial, synthetic oligonucleotide primer
atgctctaga aattctctca cctccagatc aatagaga 38
SEQ ID NO: 23, length 22, DNA, Artificial, synthetic oligonucleotide primer
aggtgggtag ttcttctgat gg 22
SEQ ID NO: 24, length 41, DNA, Artificial, synthetic oligonucleotide primer
atgctctaga aattttacaa cttcgacgaa gaagtctttg a 41
SEQ ID NO: 25, length 33, DNA, Artificial, synthetic oligonucleotide primer
atatccatgg gaaagatcga tgaaatcctt tca 33
SEQ ID NO: 26, length 34, DNA, Artificial, synthetic oligonucleotide primer
atgctctaga aatggtttga atctcttctc tccc 34
SEQ ID NO: 27, length 22, DNA, Artificial, synthetic oligonucleotide primer
aacgtgaaaa agttccctga ag 22
SEQ ID NO: 28, length 34, DNA, Artificial, synthetic oligonucleotide primer
atgctctaga aaatcttcca gactgttgct tttg 34
SEQ ID NO: 29, length 1230, DNA, Caldivirga maquilingensis
atggactact ctatcaactg ctctatcaac
cctataaccc tcatggtcgc gcactcttct 60cccctgaacc catctaacac actcgaactt
acacttattc tcgaaaatgg catcaccacc 120acagtaactg tcaccgcgac accacgcaac
acttacccta tgatctccct tggctacatt 180aatattaccc ctaacctctg gaaccttaac
acagcttcgt catcaggata cgcctctatg 240gtctacgatg catcacaggg tgctctttat
attcatgtta atttcacaaa ggtttacctc 300aatcagcaag ttggtgttgc cgcctactct
gaattcatct atggctacaa accctggggc 360acgctcacct ccgaggcagg cgggttcaat
tttcctgtta agcttaccga actcggttct 420cttctttcgt tcatcaatta ctcactcatt
tcatattctc cacaagtcgc tatcttcgat 480tgggcatacg acctttggct cacaacatcc
ccaaatctca ccaacggccc tcaacccggc 540gacgtcgagg tcatgatctg gctctattat
cacctgcaac aacctgcggg ttttcccgtc 600gctaacgtta cagtgccaat atgggtcaat
ggctccctcg ttaacgaaac atttgaggtt 660tggattggtt ctccacagat cgaacccggc
acccacgcta tagtctcctt caggccaacg 720aatccaatcc ctagaggcct cgtcggcgta
aatgtcacga agttccttca acttgccgtt 780aactatctcg tgacactcta cccctcatac
tggaactaca catatctgga gagcaagtac 840ttgaatggca tcgaattcgg atcagaatgg
ggcaatccgt ctacatacaa tattacactc 900aattgggtca tttataaagc ttatcttatc
aaggtgcctc tggagtcaca gggcaccgtt 960accgtcacat atactacaac tgttacatcc
accatgactg ttacctcaat ccttgctacc 1020acatccaccg tcaccactac atctacactt
acatctaccg ttaccgccac ttcagtttct 1080acttccaccg tcacgcagac tctcactacc
tccatcgtca aaaccgtcat ccctgtctac 1140tatactgcca ccataatcgt ccttcttata
atcatcgcag tcgtcattgc acttgcgttc 1200gcccgccgcg gcatccgggt tcgtctctgt
1230
SEQ ID NO: 30, length 410, PRT, Caldivirga maquilingensis
Met Asp Tyr Ser Ile Asn Cys Ser Ile Asn Pro Ile Thr Leu Met Val 1
5 10 15 Ala His Ser Ser
Pro Leu Asn Pro Ser Asn Thr Leu Glu Leu Thr Leu 20
25 30 Ile Leu Glu Asn Gly Ile Thr Thr Thr
Val Thr Val Thr Ala Thr Pro 35 40
45 Arg Asn Thr Tyr Pro Met Ile Ser Leu Gly Tyr Ile Asn Ile
Thr Pro 50 55 60
Asn Leu Trp Asn Leu Asn Thr Ala Ser Ser Ser Gly Tyr Ala Ser Met 65
70 75 80 Val Tyr Asp Ala Ser
Gln Gly Ala Leu Tyr Ile His Val Asn Phe Thr 85
90 95 Lys Val Tyr Leu Asn Gln Gln Val Gly Val
Ala Ala Tyr Ser Glu Phe 100 105
110 Ile Tyr Gly Tyr Lys Pro Trp Gly Thr Leu Thr Ser Glu Ala
Gly Gly 115 120 125
Phe Asn Phe Pro Val Lys Leu Thr Glu Leu Gly Ser Leu Leu Ser Phe 130
135 140 Ile Asn Tyr Ser Leu
Ile Ser Tyr Ser Pro Gln Val Ala Ile Phe Asp 145 150
155 160 Trp Ala Tyr Asp Leu Trp Leu Thr Thr Ser
Pro Asn Leu Thr Asn Gly 165 170
175 Pro Gln Pro Gly Asp Val Glu Val Met Ile Trp Leu Tyr Tyr His
Leu 180 185 190 Gln
Gln Pro Ala Gly Phe Pro Val Ala Asn Val Thr Val Pro Ile Trp 195
200 205 Val Asn Gly Ser Leu Val
Asn Glu Thr Phe Glu Val Trp Ile Gly Ser 210 215
220 Pro Gln Ile Glu Pro Gly Thr His Ala Ile Val
Ser Phe Arg Pro Thr 225 230 235
240 Asn Pro Ile Pro Arg Gly Leu Val Gly Val Asn Val Thr Lys Phe Leu
245 250 255 Gln Leu
Ala Val Asn Tyr Leu Val Thr Leu Tyr Pro Ser Tyr Trp Asn 260
265 270 Tyr Thr Tyr Leu Glu Ser
Lys Tyr Leu Asn Gly Ile Glu Phe Gly Ser 275 280
285 Glu Trp Gly Asn Pro Ser Thr Tyr Asn Ile Thr
Leu Asn Trp Val Ile 290 295 300
Tyr Lys Ala Tyr Leu Ile Lys Val Pro Leu Glu Ser Gln Gly Thr Val
305 310 315 320 Thr Val
Thr Tyr Thr Thr Thr Val Thr Ser Thr Met Thr Val Thr Ser
325 330 335 Ile Leu Ala Thr Thr Ser
Thr Val Thr Thr Thr Ser Thr Leu Thr Ser 340
345 350 Thr Val Thr Ala Thr Ser Val Ser Thr Ser
Thr Val Thr Gln Thr Leu 355 360
365 Thr Thr Ser Ile Val Lys Thr Val Ile Pro Val Tyr Tyr Thr
Ala Thr 370 375 380
Ile Ile Val Leu Leu Ile Ile Ile Ala Val Val Ile Ala Leu Ala Phe 385
390 395 400 Ala Arg Arg Gly Ile
Arg Val Arg Leu Cys 405 410
SEQ ID NO: 31, length 2214, DNA, Pyrococcus horikoshii
atgagatttc aattcggatt ctccaaagaa
gatgaacagg tgctgggcac aatactaaca 60ctcggaaatg gacaattagg agttagggga
gaatttgaac tcgagagatc tccttatgga 120acgatcgtta gcggggtcta tgattacact
ccctacttct acagggaatt ggtaaatggt 180cccaggacta tagggatgat aataattata
gatggagaac taataaatcc aagctctcaa 240aaagtcaagg aattccagag agagctcgat
atagaaaaag gcttattaag aactcactta 300gagattgaaa caaaaaatgg aaataaaatt
ttatataaaa gtacaaggat agtccacatg 360aaaagaaaaa acctaatcct tctagatttt
gagctaaaag ctagcaaggg aggaatcgca 420gttgtagtta atcccataga attcaatact
gcaaatccag ggtttataga cgagataatg 480atcaagcatt atagagtgga ctcgataaaa
gagactgagg agggagtata cgctagggtg 540aaaactttag acaataagta cacgttggaa
attgcaagta gcttggttcc atcagaatat 600acatcgagga gcacctttag aaccgataat
gaaattggag aaatttacat tgttaaactt 660aaaccaggaa aaacgtacaa atttacaaag
tacgttacag tatctaaagg agcagcttta 720gaggagttaa aagatgttaa gagattagga
tttgaaaagc tatatgaaga gcatataaac 780agctggaaga gaatatggga gaaagtgaaa
gtggaaatcg aaggagataa agaccttgaa 840aatgccctaa actttaacat ttttcacttg
atccaatccc ttccaccaac agataaagtc 900tcgctaccag caaggggaat acatgggttt
gggtataggg gacatatatt ctgggataca 960gagatatatg cattaccttt cttcatattc
acgatgccaa aagaggccag gagattgctc 1020ctctatagat gcaacaactt agatgccgct
aaagaaaatg caaagatgaa tggatatcaa 1080ggggtccaat ttccctggga gtcggcagat
gatggacgcg aggctacccc ctctgagata 1140ccattggata tgttgggaag gaaaatcgtt
agaatttaca ccggagagga ggaacatcac 1200ataactgcgg atatagcata tatagttgat
ttttattacc aagtctctgg agatctcgaa 1260tttatgaaca ggtgtggcct tgagataatc
tttgagacgg cccgattttg ggctagtagg 1320gttgagttcg aggaaggaaa agggtacgtc
attaaaaaag taataggacc tgatgaatac 1380catgagcacg ttaacaacaa cttctttaca
aacttaatgg ccaagcataa tctcgaactt 1440gcaataagat actttagaga gtcaaagaat
agggaaccgt ggaaaaagat tgtcgaaaaa 1500ttaaacataa gagaggagga ggttgaaaaa
tgggaagaga tagctaaaaa catgtacatt 1560cccaggaaga tagacggagt ttttgaagag
tttgatggtt actttgaatt gatggatttt 1620gaagttgatc ccttcaatat tggagaaaaa
acactccccg aggaaatcag gaataacata 1680gggaaaacga aactcgttaa gcaggccgat
gtcatcatgg cccaatatct ccttaaggac 1740tacttctctc cagaggaaat aaagagtaac
tttaactatt atataaggag aactacccat 1800gcttcatcac tctccatgcc cccatacgcg
atcattgcaa cctggatagg ggaggtaaag 1860atagcatatg agtacttcaa gagatgtgca
aatatagatc tcaaaaacgt gtacggaaac 1920actgcagagg gatttcactt agcaacggcg
ggaggaacct ggcaagtact cgtcagagga 1980ttttgtggcc tcaatgtaaa aggaaacaaa
atagagctta atcctaatct tcctgaaaaa 2040tggaagtacg ttaagttcag gatattcttc
aaaggttcat ggatagaatt taaaatttct 2100aggaagaaag ttagggctag aatgcttgaa
ggatcgagaa aagtcaaaat atctagcttt 2160ggaaaggaag tagatctata tcctggaaaa
gaggttgtaa tagtagctaa ttaa 2214
SEQ ID NO: 32, length 737, PRT, Pyrococcus horikoshii
Met Arg Phe Gln Phe Gly Phe Ser Lys Glu Asp Glu Gln Val Leu Gly 1
5 10 15 Thr Ile Leu Thr
Leu Gly Asn Gly Gln Leu Gly Val Arg Gly Glu Phe 20
25 30 Glu Leu Glu Arg Ser Pro Tyr Gly Thr
Ile Val Ser Gly Val Tyr Asp 35 40
45 Tyr Thr Pro Tyr Phe Tyr Arg Glu Leu Val Asn Gly Pro Arg
Thr Ile 50 55 60
Gly Met Ile Ile Ile Ile Asp Gly Glu Leu Ile Asn Pro Ser Ser Gln 65
70 75 80 Lys Val Lys Glu Phe
Gln Arg Glu Leu Asp Ile Glu Lys Gly Leu Leu 85
90 95 Arg Thr His Leu Glu Ile Glu Thr Lys Asn
Gly Asn Lys Ile Leu Tyr 100 105
110 Lys Ser Thr Arg Ile Val His Met Lys Arg Lys Asn Leu Ile
Leu Leu 115 120 125
Asp Phe Glu Leu Lys Ala Ser Lys Gly Gly Ile Ala Val Val Val Asn 130
135 140 Pro Ile Glu Phe Asn
Thr Ala Asn Pro Gly Phe Ile Asp Glu Ile Met 145 150
155 160 Ile Lys His Tyr Arg Val Asp Ser Ile Lys
Glu Thr Glu Glu Gly Val 165 170
175 Tyr Ala Arg Val Lys Thr Leu Asp Asn Lys Tyr Thr Leu Glu Ile
Ala 180 185 190 Ser
Ser Leu Val Pro Ser Glu Tyr Thr Ser Arg Ser Thr Phe Arg Thr 195
200 205 Asp Asn Glu Ile Gly Glu
Ile Tyr Ile Val Lys Leu Lys Pro Gly Lys 210 215
220 Thr Tyr Lys Phe Thr Lys Tyr Val Thr Val Ser
Lys Gly Ala Ala Leu 225 230 235
240 Glu Glu Leu Lys Asp Val Lys Arg Leu Gly Phe Glu Lys Leu Tyr Glu
245 250 255 Glu His
Ile Asn Ser Trp Lys Arg Ile Trp Glu Lys Val Lys Val Glu 260
265 270 Ile Glu Gly Asp Lys Asp
Leu Glu Asn Ala Leu Asn Phe Asn Ile Phe 275 280
285 His Leu Ile Gln Ser Leu Pro Pro Thr Asp Lys
Val Ser Leu Pro Ala 290 295 300
Arg Gly Ile His Gly Phe Gly Tyr Arg Gly His Ile Phe Trp Asp Thr
305 310 315 320 Glu Ile
Tyr Ala Leu Pro Phe Phe Ile Phe Thr Met Pro Lys Glu Ala
325 330 335 Arg Arg Leu Leu Leu Tyr
Arg Cys Asn Asn Leu Asp Ala Ala Lys Glu 340
345 350 Asn Ala Lys Met Asn Gly Tyr Gln Gly Val
Gln Phe Pro Trp Glu Ser 355 360
365 Ala Asp Asp Gly Arg Glu Ala Thr Pro Ser Glu Ile Pro Leu
Asp Met 370 375 380
Leu Gly Arg Lys Ile Val Arg Ile Tyr Thr Gly Glu Glu Glu His His 385
390 395 400 Ile Thr Ala Asp Ile
Ala Tyr Ile Val Asp Phe Tyr Tyr Gln Val Ser 405
410 415 Gly Asp Leu Glu Phe Met Asn Arg Cys Gly
Leu Glu Ile Ile Phe Glu 420 425
430 Thr Ala Arg Phe Trp Ala Ser Arg Val Glu Phe Glu Glu Gly
Lys Gly 435 440 445
Tyr Val Ile Lys Lys Val Ile Gly Pro Asp Glu Tyr His Glu His Val 450
455 460 Asn Asn Asn Phe Phe
Thr Asn Leu Met Ala Lys His Asn Leu Glu Leu 465 470
475 480 Ala Ile Arg Tyr Phe Arg Glu Ser Lys Asn
Arg Glu Pro Trp Lys Lys 485 490
495 Ile Val Glu Lys Leu Asn Ile Arg Glu Glu Glu Val Glu Lys Trp
Glu 500 505 510 Glu
Ile Ala Lys Asn Met Tyr Ile Pro Arg Lys Ile Asp Gly Val Phe 515
520 525 Glu Glu Phe Asp Gly Tyr
Phe Glu Leu Met Asp Phe Glu Val Asp Pro 530 535
540 Phe Asn Ile Gly Glu Lys Thr Leu Pro Glu Glu
Ile Arg Asn Asn Ile 545 550 555
560 Gly Lys Thr Lys Leu Val Lys Gln Ala Asp Val Ile Met Ala Gln Tyr
565 570 575 Leu Leu
Lys Asp Tyr Phe Ser Pro Glu Glu Ile Lys Ser Asn Phe Asn 580
585 590 Tyr Tyr Ile Arg Arg Thr
Thr His Ala Ser Ser Leu Ser Met Pro Pro 595 600
605 Tyr Ala Ile Ile Ala Thr Trp Ile Gly Glu Val
Lys Ile Ala Tyr Glu 610 615 620
Tyr Phe Lys Arg Cys Ala Asn Ile Asp Leu Lys Asn Val Tyr Gly Asn
625 630 635 640 Thr Ala
Glu Gly Phe His Leu Ala Thr Ala Gly Gly Thr Trp Gln Val
645 650 655 Leu Val Arg Gly Phe Cys
Gly Leu Asn Val Lys Gly Asn Lys Ile Glu 660
665 670 Leu Asn Pro Asn Leu Pro Glu Lys Trp Lys
Tyr Val Lys Phe Arg Ile 675 680
685 Phe Phe Lys Gly Ser Trp Ile Glu Phe Lys Ile Ser Arg Lys
Lys Val 690 695 700
Arg Ala Arg Met Leu Glu Gly Ser Arg Lys Val Lys Ile Ser Ser Phe 705
710 715 720 Gly Lys Glu Val Asp
Leu Tyr Pro Gly Lys Glu Val Val Ile Val Ala 725
730 735 Asn