Patent application title: ARABINOSE ISOMERASES FOR YEAST
Inventors:
IPC8 Class: AC12N990FI
USPC Class:
1 1
Class name:
Publication date: 2019-04-18
Patent application number: 20190112590
Abstract:
A group of arabinose isomerases are disclosed that provide effective
amounts of activity for use of arabinose in production of ethanol, when
expressed in yeast cells expressing the other enzymes of an arabinose
utilization pathway. The group of arabinose isomerases represents a clade
of a phylogenetic tree, having a distinguishing conserved amino acid
sequence motif. Other useful arabinose isomerases are also disclosed.
Claims:
1. A recombinant yeast cell comprising an arabinose utilization pathway
that comprises a polypeptide having arabinose isomerase activity, wherein
the yeast cell comprises a heterologous polynucleotide encoding said
polypeptide, wherein the polypeptide comprises a motif that is at least
90% identical to SEQ ID NO:67, and wherein the position of the motif
in the polypeptide corresponds with positions 237-269 of SEQ ID NO:7.
2. The recombinant yeast cell of claim 1, wherein the motif comprises at least seventeen amino acids selected from the group consisting of: (a) I at position 237; (b) R or K at position 238; (c) Y at position 239; (d) R or K at position 242; (e) E at position 243; (f) I at position 245; (g) A at position 246; (h) I or M at position 247; (i) K at position 249; (j) I or M at position 250; (k) R or A at position 253; (l) E or N at position 254; (m) G at position 255; (n) A or C at position 256; (o) F at position 259; (p) N at position 261; (q) T at position 262; (r) Q or E at position 264; and (s) M at position 269.
3. The recombinant yeast cell of claim 1, wherein the polypeptide comprises a motif that is SEQ ID NO:67.
4. The recombinant yeast cell of claim 1, wherein the polypeptide comprises a motif that is at least 90% identical to SEQ ID NO:66.
5. The recombinant yeast cell of claim 4, wherein the polypeptide comprises a motif that is SEQ ID NO:66.
6. The recombinant yeast cell of claim 1, wherein the polypeptide comprises an amino acid sequence that is at least 85% identical to SEQ ID NO:7, 8, 10, 17, 18, or 19.
7. The recombinant yeast cell of claim 1, further comprising a metabolic pathway that produces a target compound, optionally wherein the target compound is ethanol, butanol, or 1,3-propanediol.
8. A method for producing a yeast cell having arabinose isomerase activity, said method comprising: (a) providing a yeast cell lacking arabinose isomerase; and (b) introducing a heterologous polynucleotide into the yeast cell, wherein the heterologous polynucleotide encodes a polypeptide having arabinose isomerase activity, and wherein the polypeptide comprises a motif that is at least 90% identical to SEQ ID NO:67.
9. The method of claim 8, wherein: (i) the yeast cell of step (a) comprises one or more polynucleotides encoding enzymes, except an arabinose isomerase, of an arabinose utilization pathway, or (ii) step (b) further comprises introducing, into the yeast cell, one or more polynucleotides encoding enzymes of an arabinose utilization pathway, wherein this further introduction is at the same time of, or after, said introducing the heterologous polynucleotide encoding the polypeptide having arabinose isomerase activity.
10. A method of producing a target compound from arabinose comprising: (a) providing the recombinant yeast cell of claim 7; (b) growing the yeast cell of (a) in medium comprising arabinose, wherein the target compound is produced; and (c) optionally isolating the target compound of (b).
11. The method of claim 10, wherein the target compound is ethanol, butanol, or 1,3-propanediol.
12. A recombinant yeast cell comprising an arabinose utilization pathway that comprises a polypeptide having arabinose isomerase activity, wherein the yeast cell comprises a heterologous polynucleotide encoding said polypeptide, and wherein the polypeptide comprises an amino acid sequence that is at least 85% identical to SEQ ID NO:15 or 20.
13. A method of producing a target compound from arabinose comprising: (a) providing the recombinant yeast cell of claim 12; (b) growing the yeast cell of (a) in medium comprising arabinose, wherein the target compound is produced; and (c) optionally isolating the target compound of (b).
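By way of illustration only, the position-specific residue count recited in claim 2 can be sketched in Python as follows. This is a minimal sketch, not part of the claims. It assumes that the 33-residue window corresponding to positions 237-269 of SEQ ID NO:7 has already been extracted from an alignment and contains no gaps. The names ALLOWED, count_recited_residues and meets_residue_threshold are hypothetical, and the sketch does not evaluate the 90% identity requirement of claim 1.

```python
# Allowed residues at each recited position (numbering relative to SEQ ID NO:7),
# transcribed from items (a)-(s) of claim 2.
ALLOWED = {
    237: "I", 238: "RK", 239: "Y", 242: "RK", 243: "E", 245: "I",
    246: "A", 247: "IM", 249: "K", 250: "IM", 253: "RA", 254: "EN",
    255: "G", 256: "AC", 259: "F", 261: "N", 262: "T", 264: "QE",
    269: "M",
}

MOTIF_START = 237
MOTIF_LENGTH = 269 - 237 + 1  # 33 positions


def count_recited_residues(motif: str) -> int:
    """Count how many of the 19 recited positions carry an allowed residue.

    `motif` is assumed to be the gap-free 33-residue window that aligns
    with positions 237-269 of SEQ ID NO:7.
    """
    if len(motif) != MOTIF_LENGTH:
        raise ValueError(f"expected {MOTIF_LENGTH} residues, got {len(motif)}")
    motif = motif.upper()
    return sum(
        1
        for position, allowed in ALLOWED.items()
        if motif[position - MOTIF_START] in allowed
    )


def meets_residue_threshold(motif: str, minimum: int = 17) -> bool:
    """Return True if at least `minimum` recited positions are satisfied."""
    return count_recited_residues(motif) >= minimum
```

A window would typically be obtained from a pairwise or multiple alignment against SEQ ID NO:7 before being passed to count_recited_residues.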
Description:
[0001] This application claims the benefit of U.S. Provisional Application
No. 62/319,945 (filed Apr. 8, 2016), which is incorporated herein by
reference in its entirety.
FIELD OF INVENTION
[0002] The field of invention relates to genetic engineering, and more specifically to engineering yeast to enhance utilization of arabinose during fermentation.
REFERENCE TO SEQUENCE LISTING SUBMITTED ELECTRONICALLY
[0003] The official copy of the sequence listing is submitted electronically via EFS-Web as an ASCII-formatted sequence listing in a file named CL6357WOPCT_SequenceListing_ST25_ExtraLinesRemoved, created on Mar. 31, 2017, having a size of 284 kilobytes, and filed concurrently with the specification. The sequence listing contained in this ASCII-formatted document is part of the specification and is herein incorporated by reference in its entirety.
BACKGROUND
[0004] Currently, fermentative production of ethanol is typically done with yeast, particularly Saccharomyces cerevisiae, using hexoses obtained from grains or mash as the carbohydrate source. Use of hydrolysate prepared from cellulosic biomass as a carbohydrate source for fermentation is desirable, as this is a readily renewable resource that does not compete with the food supply. The most abundant sugar in cellulosic biomass hydrolysate is glucose, while the pentoses xylose and arabinose are also present. Many biocatalysts, including the yeast Saccharomyces cerevisiae, are not naturally capable of metabolizing xylose or arabinose, but can be engineered to express xylose and/or arabinose utilization pathways. One approach to engineering a xylose utilization pathway in yeast includes introduction of xylose isomerase, and increasing expression of the pentose phosphate pathway including xylulokinase, transaldolase, transketolase 1, D-ribulose-5-phosphate 3-epimerase, and ribose 5-phosphate ketol-isomerase. Use of arabinose can be achieved by additionally engineering expression of L-arabinose isomerase, L-ribulokinase, and L-ribulose-5-phosphate 4-epimerase (e.g., Becker and Boles, Appl. Environ. Microbiol. 69:4144-4150).
[0005] Though yeast strains engineered in this manner can utilize arabinose, such arabinose use is typically inefficient due to poor efficiency of arabinose isomerase, which operates at the first step of the bacterial-type arabinose assimilation pathway. Thus, there remains a need for arabinose isomerases that are more effective when expressed in yeast, and engineered yeast cells that express an arabinose isomerase that allows greater efficiency of arabinose utilization.
SUMMARY
[0006] In one embodiment, the present disclosure concerns a recombinant yeast cell comprising an arabinose utilization pathway that comprises a polypeptide having arabinose isomerase activity, wherein the yeast cell comprises a heterologous polynucleotide encoding the polypeptide, wherein the polypeptide comprises a motif that is at least 90% identical to SEQ ID NO:67, and wherein the position of the motif in the polypeptide corresponds with positions 237-269 of SEQ ID NO:7.
[0007] In another embodiment, the present disclosure concerns a method for producing a yeast cell having arabinose isomerase activity. This method comprises: (a) providing a yeast cell lacking arabinose isomerase; and (b) introducing a heterologous polynucleotide into the yeast cell, wherein the heterologous polynucleotide encodes a polypeptide having arabinose isomerase activity, and wherein the polypeptide comprises a motif that is at least 90% identical to SEQ ID NO:67.
[0008] In another embodiment, the present disclosure concerns a method of producing a target compound from arabinose comprising: (a) providing a recombinant yeast cell as disclosed herein; (b) growing the yeast cell of (a) in medium comprising arabinose, wherein the target compound is produced; and (c) optionally isolating the target compound of (b).
[0009] In another embodiment, the present disclosure concerns a recombinant yeast cell comprising an arabinose utilization pathway that comprises a polypeptide having arabinose isomerase activity, wherein the yeast cell comprises a heterologous polynucleotide encoding the polypeptide, and wherein the polypeptide comprises an amino acid sequence that is at least 85% identical to SEQ ID NO:15 or 20. The present disclosure also concerns a method of using such a yeast cell to produce a target compound from arabinose comprising: (a) providing the yeast cell; (b) growing the yeast cell of (a) in medium comprising arabinose, wherein the target compound is produced; and (c) optionally isolating the target compound of (b).
BRIEF DESCRIPTION OF THE DRAWINGS AND SEQUENCES
[0010] FIG. 1 shows a phylogenetic tree of twenty candidate arabinose isomerase proteins, and the B. subtilis arabinose isomerase protein.
[0011] FIG. 2A shows the 237-269 Motif formula I (SEQ ID NO:67).
[0012] FIG. 2B shows the 237-269 Motif formula II (SEQ ID NO:67) with bold and underline highlighting of specific positions, as described in the detailed description.
[0013] FIG. 2C shows an alignment of twenty candidate arabinose isomerases and the B. subtilis arabinose isomerase in the region of positions 237-269 (with reference to SEQ ID NO:7), with the six members of the 237-269 Motif clade at the top. The BSaraA amino acid sequence used in this alignment is SEQ ID NO:41.
[0014] FIG. 3A is a plasmid map of pSX01 (SEQ ID NO:43).
[0015] FIG. 3B is a plasmid map of pSX208 (SEQ ID NO:47).
[0016] FIG. 4A is a plasmid map of pSX209 (SEQ ID NO:48).
[0017] FIG. 4B is a plasmid map of pSX210 (SEQ ID NO:49).
[0018] FIG. 5A is a plasmid map of pSA0-B (SEQ ID NO:58).
[0019] FIG. 5B is a plasmid map of pSA503 (SEQ ID NO:59).
[0020] FIG. 6 is a graph showing growth (OD600), arabinose use, and production of ethanol during fermentation by yeast strain PX182-araBAD5030 in medium containing arabinose as the only sugar.
[0021] FIG. 7 is a graph of in vitro arabinose isomerase activities (µmol/mg/min) from arabinose- and xylose-utilizing yeast strains expressing different candidate arabinose isomerases or the B. subtilis arabinose isomerase.
[0022] TABLE 1. SEQ ID NOs for Amino Acid (AA) and Nucleotide (NT) Sequences (Codon-Optimized Coding Regions) of Candidate Arabinose Isomerases.

Designation         SEQ ID NO: AA   SEQ ID NO: NT
HMPREF9412_4417     1               21
POTG_01507          2               22
HMPREF9374_3716     3               23
DORFOR_01282        4               24
HMPREF0994_04908    5               25
NODE_4061684        6               26
NODE_3664377        7               27
NODE_458803         8               28
NODE_3921064        9               29
NODE_3693095        10              30
DORLON_00938        11              31
HMPREF9469_04726    12              32
HMPREF9467_00216    13              33
RTO_26010           14              34
BRYFOR_08166        15              35
RUMOBE_03031        16              36
NODE_3658038        17              37
NODE_4175755        18              38
NODE_2588280        19              39
NODE_3735508        20              40
[0023] Arabinose isomerase designations in Table 1 are from the cow rumen metagenome dataset ("NODE" designations; Hess et al., Science 331:463-467, incorporated herein by reference) or the human microbiome dataset (The Human Microbiome Jumpstart Reference Strains Consortium et al., Science 328:994-999, incorporated herein by reference).
[0024] SEQ ID NO:41 is the amino acid sequence of the arabinose isomerase from B. subtilis (BSaraA).
[0025] SEQ ID NO:42 is the nucleotide sequence of the codon-optimized coding region for the B. subtilis arabinose isomerase.
[0026] SEQ ID NO:43 is the nucleotide sequence of the plasmid pSX01.
[0027] SEQ ID NO:44 is the amino acid sequence of the xylose isomerase VDxylA.
[0028] SEQ ID NO:45 is the nucleotide sequence of the codon-optimized coding region for the VDxylA xylose isomerase.
[0029] SEQ ID NO:46 is the nucleotide sequence of the 2966-bp chimeric expression cassette designated as UAS(FBA1)::PDC1p::VDxylA::ILV5t.
[0030] SEQ ID NO:47 is the nucleotide sequence of the plasmid pSX208.
[0031] SEQ ID NO:48 is the nucleotide sequence of the plasmid pSX209.
[0032] SEQ ID NO:49 is the nucleotide sequence of the plasmid pSX210.
[0033] SEQ ID NO:50 is the nucleotide sequence of the CRE recombinase vector pJT254.
[0034] SEQ ID NO:51 is the nucleotide sequence of the 2429-bp chimeric expression cassette designated as ADHp::BSaraA::CYC1t.
[0035] SEQ ID NO:52 is the amino acid sequence of the E. coli araB gene-encoded ribulokinase.
[0036] SEQ ID NO:53 is the nucleotide sequence of the codon-optimized coding region for the E. coli ribulokinase.
[0037] SEQ ID NO:54 is the nucleotide sequence of the 2907-bp chimeric expression cassette designated as ILV5p::ECaraB::PHO13-3'UTR.
[0038] SEQ ID NO:55 is the amino acid sequence of the E. coli araD gene-encoded L-ribulose-5-phosphate 4-epimerase.
[0039] SEQ ID NO:56 is the nucleotide sequence of the codon-optimized coding region for the E. coli L-ribulose-5-phosphate 4-epimerase.
[0040] SEQ ID NO:57 is the nucleotide sequence of the 1691-bp chimeric expression cassette designated as GPDp::ECaraD::ADH1t.
[0041] SEQ ID NO:58 is the nucleotide sequence of the plasmid pSA0-B.
[0042] SEQ ID NO:59 is the nucleotide sequence of the plasmid pSA503.
[0043] SEQ ID NO:60 is the amino acid sequence of the arabinose isomerase from E. coli.
[0044] SEQ ID NO:61 is the amino acid sequence of the arabinose isomerase from Bacillus licheniformis.
[0045] SEQ ID NO:62 is the amino acid sequence of the arabinose isomerase from Clostridium acetobutylicum.
[0046] SEQ ID NO:63 is the amino acid sequence of the arabinose isomerase from Leuconostoc mesenteroides.
[0047] SEQ ID NO:64 is the amino acid sequence of the arabinose isomerase from Lactobacillus plantarum.
[0048] SEQ ID NO:65 is the amino acid sequence of the arabinose isomerase from Pediococcus pentosaceus.
[0049] SEQ ID NO:66 is the amino acid sequence representing a "237-269 Motif".
[0050] SEQ ID NO:67 is the amino acid sequence of the 237-269 Motif shown in FIG. 2A.
[0051] Each of SEQ ID NOs:68-71 is an amino acid sequence that can optionally be excluded in certain embodiments of the present disclosure.
DETAILED DESCRIPTION
[0052] The disclosures of all cited patent and non-patent literature are incorporated herein by reference in their entirety.
[0053] Unless otherwise disclosed, the terms "a" and "an" as used herein are intended to encompass one or more (i.e., at least one) of a stated feature.
[0054] Where present, all ranges are inclusive and combinable, except as otherwise noted. For example, when a range of "1 to 5" is recited, the recited range should be construed as including ranges "1 to 4", "1 to 3", "1-2", "1-2 & 4-5", "1-3 & 5", and the like.
[0055] The terms "about", "approximately" and the like in some aspects, as used to modify certain numerical values herein, refer to being within 5%-10% of the stated numerical value.
[0056] The terms "arabinose isomerase", "L-arabinose isomerase" and the like refer to an enzyme that catalyzes the conversion of L-arabinose to L-ribulose. Arabinose isomerases belong to the group of enzymes classified in Enzyme Commission (EC) entry 5.3.1.4.
[0057] The terms "carbon substrate", "fermentable carbon substrate" and the like refer to a carbon source capable of being metabolized by microorganisms. A type of carbon substrate is "fermentable sugars" which refer to oligosaccharides and monosaccharides that can be used as a carbon source by a microorganism in a fermentation process. Arabinose and xylose are examples of fermentable sugars.
[0058] The term "lignocellulosic" refers to a composition comprising both lignin and cellulose. Lignocellulosic material may also comprise hemicellulose.
[0059] The term "cellulosic" refers to a composition comprising cellulose and additional components, which may include hemicellulose and lignin.
[0060] The term "saccharification" refers to the production of fermentable sugars from polysaccharides such as cellulose and hemicellulose. Saccharification can be done via chemical and/or enzymatic means.
[0061] "Biomass" refers to any cellulosic or lignocellulosic material and includes materials comprising cellulose, and optionally further comprising hemicellulose, lignin, starch, oligosaccharides and/or monosaccharides. Biomass may also comprise additional components, such as protein and/or lipid. Biomass may be derived from a single source, or biomass can comprise a mixture derived from more than one source; for example, biomass could comprise a mixture of corn cobs and corn stover, or a mixture of grass and leaves. Biomass includes, but is not limited to, bioenergy crops, agricultural residues, municipal solid waste, industrial solid waste, sludge from paper manufacture, yard waste, wood and forestry waste. Further examples of biomass include, but are not limited to, corn cobs, crop residues such as corn husks, corn stover, corn grain fiber, grasses, beet pulp, wheat straw, wheat chaff, oat straw, barley straw, barley hulls, hay, rice straw, rice hulls, switchgrass, miscanthus, cord grass, reed canary grass, waste paper, sugar cane bagasse, sorghum bagasse, sorghum stover, soybean stover, components obtained from milling of grains, trees, branches, roots, leaves, wood chips, sawdust, palm waste, shrubs and bushes, vegetables, fruits, flowers, and animal manure.
[0062] The term "pretreated biomass" refers to biomass that has been subjected to thermal, physical and/or chemical treatment to increase the availability of polysaccharides in the biomass to saccharification enzymes. Biomass pretreatment is typically done before saccharification.
[0063] "Biomass hydrolysate" refers to the product resulting from saccharification of biomass, and comprises fermentable sugars.
[0064] The terms "target compound", "target chemical" and the like refer to a compound made by a microorganism via an endogenous or recombinant biosynthetic/metabolic pathway that is able to metabolize a fermentable carbon source to produce the target compound.
[0065] The terms "percent by volume", "volume percent", "vol %", "v/v %" and the like are used interchangeably herein. The percent by volume of a solute in a solution can be determined using the formula: [(volume of solute)/(volume of solution)] × 100%.
[0066] The terms "percent by weight", "weight percentage (wt %)", "weight-weight percentage (% w/w)" and the like are used interchangeably herein. Percent by weight refers to the percentage of a material on a mass basis as it is comprised in a composition, mixture, or solution.
[0067] The terms "polynucleotide", "polynucleotide sequence", "nucleic acid molecule" and the like are used interchangeably herein. These terms encompass nucleotide sequences and the like. A polynucleotide may be a polymer of DNA or RNA that is single- or double-stranded, that optionally contains synthetic, non-natural or altered nucleotide bases. A polynucleotide may be comprised of one or more segments of cDNA, genomic DNA, synthetic DNA, or mixtures thereof. Nucleotides (ribonucleotides or deoxyribonucleotides) can be referred to by a single letter designation as follows: "A" for adenylate or deoxyadenylate (for RNA or DNA, respectively), "C" for cytidylate or deoxycytidylate (for RNA or DNA, respectively), "G" for guanylate or deoxyguanylate (for RNA or DNA, respectively), "U" for uridylate (for RNA), "T" for deoxythymidylate (for DNA), "R" for purines (A or G), "Y" for pyrimidines (C or T), "K" for G or T, "H" for A or C or T, "I" for inosine, "W" for A or T, and "N" for any nucleotide (e.g., N can be A, C, T, or G, if referring to a DNA sequence; N can be A, C, U, or G, if referring to an RNA sequence).
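As a non-limiting illustration, the ambiguity designations listed above can be used to enumerate the concrete DNA sequences represented by a degenerate sequence. The following Python sketch is provided under stated assumptions: the table DNA_CODES contains only the designations recited in the preceding paragraph (inosine is omitted because it denotes a distinct base rather than a set of alternatives), and the names DNA_CODES and expand_degenerate are hypothetical.

```python
from itertools import product

# Single-letter designations from the paragraph above, for DNA sequences.
# Only the designations recited in the text are included.
DNA_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purines
    "Y": "CT",    # pyrimidines
    "K": "GT",
    "H": "ACT",
    "W": "AT",
    "N": "ACTG",  # any nucleotide
}


def expand_degenerate(sequence: str) -> list[str]:
    """Return every concrete DNA sequence matched by a degenerate sequence."""
    pools = []
    for symbol in sequence.upper():
        if symbol not in DNA_CODES:
            raise ValueError(f"unsupported designation: {symbol!r}")
        pools.append(DNA_CODES[symbol])
    # Cartesian product over the per-position alternatives.
    return ["".join(combo) for combo in product(*pools)]
```

For example, expand_degenerate("AR") yields the two sequences "AA" and "AG". The number of results grows multiplicatively with each degenerate position, so this approach is only practical for short sequences such as primers.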
[0068] The terms "motif", "conserved motif" and the like herein refer to a distinctive and recurring structural unit, such as within an amino acid sequence. By "recurring" it is meant that a motif occurs in multiple related polypeptides, for example.
[0069] Herein, a first polynucleotide sequence that is "complementary" to a second polynucleotide sequence can alternatively be referred to as being in the "antisense" orientation with the second sequence.
[0070] The term "gene" as used herein refers to a DNA polynucleotide sequence that expresses an RNA (RNA is transcribed from the DNA polynucleotide sequence) from a coding region, which RNA can be a messenger RNA (encoding a protein) or a non-protein-coding RNA. A gene may refer to the coding region alone, or may include regulatory sequences upstream and/or downstream to the coding region (e.g., promoters, 5'-untranslated regions, 3'-transcription terminator regions). A coding region encoding a protein can alternatively be referred to herein as an "open reading frame" (ORF). A gene that is "native" or "endogenous" refers to a gene as found in nature with its own regulatory sequences; such a gene is located in its natural location in the genome of a host cell. A "chimeric" gene refers to any gene that is not a native gene, comprising regulatory and coding sequences that are not found together in nature (i.e., the regulatory and coding regions are heterologous with each other). Accordingly, a chimeric gene may comprise regulatory sequences and coding sequences that are derived from different sources, or regulatory sequences and coding sequences derived from the same source, but arranged in a manner different than that found in nature. A "foreign" or "heterologous" gene can refer to a gene that is introduced into the host organism by gene transfer. Foreign/heterologous genes can comprise native genes inserted into a non-native organism, native genes introduced into a new location within the native host, or chimeric genes. The polynucleotide sequences in certain embodiments disclosed herein are heterologous. A "transgene" is a gene that has been introduced into the genome by a gene delivery procedure (e.g., transformation). A "codon-optimized" open reading frame has its frequency of codon usage designed to mimic the frequency of preferred codon usage of the host cell.
[0071] The term "heterologous" means not naturally found in the location of interest. For example, a heterologous gene can be one that is not naturally found in a host organism, but that is introduced into the host organism by gene transfer. As another example, a nucleic acid molecule that is present in a chimeric gene can be characterized as being heterologous, as such a nucleic acid molecule is not naturally associated with the other segments of the chimeric gene (e.g., a promoter can be heterologous to a coding sequence).
[0072] A "non-native" amino acid sequence or polynucleotide sequence comprised in a cell or organism herein does not occur in a native (natural) counterpart of such cell or organism. Such an amino acid sequence or polynucleotide sequence can also be referred to as being heterologous to the cell or organism.
[0073] "Regulatory sequences" as used herein refer to nucleotide sequences located upstream of a gene's transcription start site (e.g., promoter), 5' untranslated regions, introns, and 3' non-coding regions, and which may influence the transcription, processing or stability, and/or translation of an RNA transcribed from the gene. Regulatory sequences herein may include promoters, enhancers, silencers, 5' untranslated leader sequences, introns, polyadenylation recognition sequences, RNA processing sites, effector binding sites, stem-loop structures, and other elements involved in regulation of gene expression. One or more regulatory elements herein may be heterologous to a coding region herein.
[0074] A "promoter" as used herein refers to a DNA sequence capable of controlling the transcription of RNA from a gene. In general, a promoter sequence is upstream of the transcription start site of a gene. Promoters may be derived in their entirety from a native gene, or be composed of different elements derived from different promoters found in nature, or even comprise synthetic DNA segments. Promoters that cause a gene to be expressed in a cell at most times under all circumstances are commonly referred to as "constitutive promoters". One or more promoters herein may be heterologous to a coding region herein.
[0075] A "strong promoter" as used herein refers to a promoter that can direct a relatively large number of productive initiations per unit time, and/or is a promoter driving a higher level of gene transcription than the average transcription level of the genes in a cell.
[0076] The terms "3' non-coding sequence", "transcription terminator" and "terminator" as used herein refer to DNA sequences located downstream of a coding sequence. This includes polyadenylation recognition sequences and other sequences encoding regulatory signals capable of affecting mRNA processing or gene expression.
[0077] The terms "upstream" and "downstream" as used herein with respect to polynucleotides refer to "5' of" and "3' of", respectively.
[0078] The term "expression" as used herein refers to (i) transcription of RNA (e.g., mRNA or a non-protein-coding RNA) from a coding region, and/or (ii) translation of a polypeptide from mRNA. Expression of a coding region of a polynucleotide sequence can be up-regulated or down-regulated in certain embodiments.
[0079] The term "operably linked" as used herein refers to the association of two or more nucleic acid sequences such that the function of one is affected by the other. For example, a promoter is operably linked with a coding sequence when it is capable of affecting the expression of that coding sequence. That is, the coding sequence is under the transcriptional control of the promoter. A coding sequence can be operably linked to one (e.g., promoter) or more (e.g., promoter and terminator) regulatory sequences, for example.
[0080] The term "recombinant" when used herein to characterize a DNA sequence such as a plasmid, vector, or construct refers to an artificial combination of two otherwise separated segments of sequence, e.g., by chemical synthesis and/or by manipulation of isolated segments of nucleic acids by genetic engineering techniques.
[0081] The term "transformation" as used herein refers to the transfer of a nucleic acid molecule into a host organism or host cell by any method. A nucleic acid molecule that has been transformed into an organism/cell may be one that replicates autonomously in the organism/cell, or that integrates into the genome of the organism/cell, or that exists transiently in the cell without replicating or integrating. Non-limiting examples of nucleic acid molecules suitable for transformation are disclosed herein, such as plasmids and linear DNA molecules. Host organisms/cells herein containing a transforming nucleic acid sequence can be referred to as "transgenic", "recombinant", "transformed", "engineered", as a "transformant", and/or as being "modified for exogenous gene expression", for example.
[0082] The terms "control cell" and "suitable control cell" are used interchangeably herein and may be referenced with respect to a cell in which a particular modification (e.g., over-expression of a polynucleotide, down-regulation of a polynucleotide) has been made (i.e., an "experimental cell"). A control cell may be any cell that does not have or does not express the particular modification of the experimental cell. Thus, a control cell may be an untransformed wild type cell or may be genetically transformed but does not express the particular modification. For example, a control cell may be a direct parent of the experimental cell, which direct parent cell does not have the particular modification that is in the experimental cell. Alternatively, a control cell may be a parent of the experimental cell that is removed by one or more generations. Alternatively still, a control cell may be a sibling of the experimental cell, which sibling does not comprise the particular modification that is present in the experimental cell. A control cell can optionally be characterized as a cell as it existed before being modified to be an experimental cell.
[0083] The terms "sequence identity", "identity" and the like as used herein with respect to polynucleotide or polypeptide sequences refer to the nucleic acid residues or amino acid residues in two sequences that are the same when aligned for maximum correspondence over a specified comparison window. Thus, "percentage of sequence identity", "percent identity" and the like refer to the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide or polypeptide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the results by 100 to yield the percentage of sequence identity. It would be understood that, when calculating sequence identity between a DNA sequence and an RNA sequence, T residues of the DNA sequence align with, and can be considered "identical" with, U residues of the RNA sequence. For purposes of determining "percent complementarity" of first and second polynucleotides, one can obtain this by determining (i) the percent identity between the first polynucleotide and the complement sequence of the second polynucleotide (or vice versa), for example, and/or (ii) the percentage of bases between the first and second polynucleotides that would create canonical Watson and Crick base pairs.
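For illustration, the percent identity calculation described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not the algorithm of any particular alignment program: the two sequences are assumed to have been aligned beforehand (gaps written as "-"), the comparison window is taken to be the full length of the alignment including gap columns, and the name percent_identity and its parameters are hypothetical. The optional nucleic flag treats T and U as identical, as described for DNA/RNA comparisons.

```python
def percent_identity(aligned_a: str, aligned_b: str,
                     nucleic: bool = False, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Matched positions are divided by the total number of positions in the
    comparison window (here, the whole alignment) and multiplied by 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")

    def normalize(residue: str) -> str:
        residue = residue.upper()
        # For DNA/RNA comparisons, U is treated as identical to T.
        if nucleic and residue == "U":
            return "T"
        return residue

    matches = sum(
        1
        for a, b in zip(aligned_a, aligned_b)
        if a != gap and normalize(a) == normalize(b)
    )
    return 100.0 * matches / len(aligned_a)
```

Other conventions for the denominator exist (for example, excluding gap columns or using the length of the shorter sequence), so values computed this way may differ from those reported by a given alignment program.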
[0084] Percent identity can be readily determined by any known method, including but not limited to those described in: 1) Computational Molecular Biology (Lesk, A. M., Ed.) Oxford University: NY (1988); 2) Biocomputing: Informatics and Genome Projects (Smith, D. W., Ed.) Academic: NY (1993); 3) Computer Analysis of Sequence Data, Part I (Griffin, A. M., and Griffin, H. G., Eds.) Humana: NJ (1994); 4) Sequence Analysis in Molecular Biology (von Heijne, G., Ed.) Academic (1987); and 5) Sequence Analysis Primer (Gribskov, M. and Devereux, J., Eds.) Stockton: NY (1991), all of which are incorporated herein by reference.
[0085] Preferred methods for determining percent identity are designed to give the best match between the sequences tested. Methods of determining identity and similarity are codified in publicly available computer programs, for example. Sequence alignments and percent identity calculations can be performed using the MEGALIGN program of the LASERGENE bioinformatics computing suite (DNASTAR Inc., Madison, Wis.), for example. Multiple alignment of sequences can be performed, for example, using the Clustal method of alignment which encompasses several varieties of the algorithm including the Clustal V method of alignment (described by Higgins and Sharp, CABIOS. 5:151-153 (1989); Higgins, D. G. et al., Comput. Appl. Biosci., 8:189-191 (1992)) and found in the MEGALIGN v8.0 program of the LASERGENE bioinformatics computing suite (DNASTAR Inc.). For multiple alignments, the default values can correspond to GAP PENALTY=10 and GAP LENGTH PENALTY=10. Default parameters for pairwise alignments and calculation of percent identity of protein sequences using the Clustal method can be KTUPLE=1, GAP PENALTY=3, WINDOW=5 and DIAGONALS SAVED=5. For nucleic acids, these parameters can be KTUPLE=2, GAP PENALTY=5, WINDOW=4 and DIAGONALS SAVED=4. Additionally, the Clustal W method of alignment can be used (described by Higgins and Sharp, CABIOS. 5:151-153 (1989); Higgins, D. G. et al., Comput. Appl. Biosci. 8:189-191 (1992); Thompson, J. D. et al., Nucleic Acids Research, 22 (22): 4673-4680, 1994) and found in the MEGALIGN v8.0 program of the LASERGENE bioinformatics computing suite (DNASTAR Inc.). Default parameters for multiple alignment (protein/nucleic acid) can be: GAP PENALTY=10/15, GAP LENGTH PENALTY=0.2/6.66, Delay Divergen Seqs(%)=30/30, DNA Transition Weight=0.5, Protein Weight Matrix=Gonnet Series, DNA Weight Matrix=IUB.
[0086] Various polypeptide amino acid sequences and polynucleotide sequences are disclosed herein as features of certain embodiments. Variants of these sequences that are at least about 70-85%, 85-90%, or 90%-95% identical to the sequences disclosed herein can be used or referenced. Alternatively, a variant amino acid sequence or polynucleotide sequence can have at least 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity with a sequence disclosed herein. The variant amino acid sequence or polynucleotide sequence has the same function/activity of the disclosed sequence, or at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% of the function/activity of the disclosed sequence. Any polypeptide amino acid sequence disclosed herein not beginning with a methionine can typically further comprise at least a start-methionine at the N-terminus of the amino acid sequence.
[0087] All the amino acid residues at each amino acid position of the proteins disclosed herein are examples. Given that certain amino acids share similar structural and/or charge features with each other (i.e., conserved), the amino acid at each position of a protein herein can be as provided in the disclosed sequences or substituted with a conserved amino acid residue ("conservative amino acid substitution") as follows:
[0088] 1. The following small aliphatic, nonpolar or slightly polar residues can substitute for each other: Ala (A), Ser (S), Thr (T), Pro (P), Gly (G);
[0089] 2. The following polar, negatively charged residues and their amides can substitute for each other: Asp (D), Asn (N), Glu (E), Gln (Q);
[0090] 3. The following polar, positively charged residues can substitute for each other: His (H), Arg (R), Lys (K);
[0091] 4. The following aliphatic, nonpolar residues can substitute for each other: Ala (A), Leu (L), Ile (I), Val (V), Cys (C), Met (M); and
[0092] 5. The following large aromatic residues can substitute for each other: Phe (F), Tyr (Y), Trp (W).
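By way of illustration only, the five substitution groups listed above can be encoded as sets and queried as sketched below. The names `CONSERVATIVE_GROUPS` and `is_conservative_substitution` are hypothetical, and treating an unchanged residue as "not a substitution" is an assumption of this example. Note that Ala (A) appears in both group 1 and group 4, so it can pair with members of either group.

```python
# One frozenset per group, in the order listed above (single-letter codes).
CONSERVATIVE_GROUPS = (
    frozenset("ASTPG"),   # 1. small aliphatic, nonpolar or slightly polar
    frozenset("DNEQ"),    # 2. polar, negatively charged residues and their amides
    frozenset("HRK"),     # 3. polar, positively charged
    frozenset("ALIVCM"),  # 4. aliphatic, nonpolar
    frozenset("FYW"),     # 5. large aromatic
)


def is_conservative_substitution(original: str, replacement: str) -> bool:
    """True if two different residues share at least one of the groups above."""
    original = original.upper()
    replacement = replacement.upper()
    if original == replacement:
        return False  # unchanged residue: not counted as a substitution here
    return any(original in group and replacement in group
               for group in CONSERVATIVE_GROUPS)
```

For example, `is_conservative_substitution("R", "K")` is true (group 3), while `is_conservative_substitution("S", "V")` is false because Ser and Val share no group.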
[0093] The terms "corresponds with", "corresponds to", "aligns with" and the like can be used interchangeably herein. The relative position of a conserved amino acid motif (e.g., SEQ ID NO:67 or 66) in an arabinose isomerase herein can, for example, correspond with certain positions/residues (e.g., positions 237-269) that are associated with (define the location of) the conserved motif as it exists in a reference arabinose isomerase (e.g., SEQ ID NO:7). The position of the conserved motif of SEQ ID NO:67 or 66 in a particular arabinose isomerase can thus be determined with reference to positions 237-269 of SEQ ID NO:7, for example. In general, one can align the amino acid sequence of a query arabinose isomerase with SEQ ID NO:7 using an alignment algorithm and/or software described herein (e.g., BLASTP, ClustalW, ClustalV, Clustal Omega, EMBOSS) to determine if the conserved motif of SEQ ID NO:67 or 66, if present, is located at the noted position. In some embodiments, an alignment further indicates that SEQ ID NO:67 or 66 is at a position corresponding with positions 237-269 of SEQ ID NO:7, if the location of SEQ ID NO:67 or 66 (i) begins at any residue from positions 217-257, 227-247, 230-244, or 232-242, and (ii) ends at any residue from positions 249-289, 259-279, 262-276, or 264-274, of the amino acid sequence of the query arabinose isomerase.
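By way of illustration only, the determination of corresponding positions from an alignment can be sketched as below. The sketch assumes a pairwise alignment of the reference and query has already been produced by one of the programs noted above and is supplied as two equal-length gapped strings. The function names and the toy strings are hypothetical. `map_reference_span` walks the alignment columns and reports which query residues align with a given reference span; `within_window` applies one of the four begin/end ranges recited above.

```python
from typing import Optional, Tuple

# (begin range, end range) pairs recited in the paragraph above.
WINDOWS = (
    ((217, 257), (249, 289)),
    ((227, 247), (259, 279)),
    ((230, 244), (262, 276)),
    ((232, 242), (264, 274)),
)


def map_reference_span(aligned_ref: str, aligned_query: str,
                       ref_start: int, ref_end: int) -> Optional[Tuple[int, int]]:
    """Return 1-based (start, end) query positions aligned with ref_start..ref_end."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    ref_pos = 0
    query_pos = 0
    first = last = None
    for r, q in zip(aligned_ref, aligned_query):
        if r != "-":
            ref_pos += 1
        if q != "-":
            query_pos += 1
        # Only columns with a residue in both sequences define correspondence.
        if r != "-" and q != "-" and ref_start <= ref_pos <= ref_end:
            if first is None:
                first = query_pos
            last = query_pos
    return None if first is None else (first, last)


def within_window(start: int, end: int, window) -> bool:
    """Check a query span against one (begin range, end range) pair."""
    (begin_lo, begin_hi), (end_lo, end_hi) = window
    return begin_lo <= start <= begin_hi and end_lo <= end <= end_hi


# Toy alignment (not a disclosed sequence): reference residues 3-6 align
# with query residues 2-6.
toy_span = map_reference_span("MKT-AYIAK", "M-TGAYI-K", 3, 6)
```

As a usage note, a query in which the motif spans positions 241 through 273 (the positions reported herein for SEQ ID NOs:10 and 18) satisfies `within_window(241, 273, w)` for each of the four entries of `WINDOWS`.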
[0094] The term "isolated" as used herein refers to a polynucleotide or polypeptide molecule that has been completely or partially purified from its native source. In some instances, the isolated polynucleotide or polypeptide molecule is part of a greater composition, buffer system or reagent mix. For example, an isolated polynucleotide or polypeptide molecule can be comprised within a cell or organism in a heterologous manner. Such a cell or organism containing heterologous components does not occur in nature, and/or exhibits properties not believed to naturally occur.
[0095] The term "increased" as used herein can refer to a quantity or activity that is at least about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 50%, 100%, or 200% more than the quantity or activity for which the increased quantity or activity is being compared. The terms "increased", "elevated", "enhanced", "greater than", "improved" and the like are used interchangeably herein. These terms can be used to characterize the "over-expression" or "up-regulation" of a polynucleotide encoding a protein, for example.
[0096] The present disclosure relates to engineered yeast strains that have arabinose isomerase activity. A challenge for engineering yeast to utilize arabinose, which is a sugar that can be obtained from cellulosic biomass, is to produce sufficient arabinose isomerase activity in the yeast cell. Arabinose isomerase catalyzes the conversion of arabinose to ribulose, which is the first step in an arabinose utilization pathway. Applicants have found that expression of specific arabinose isomerase polypeptides provides arabinose isomerase activity in yeast cells, while expression of other arabinose isomerase polypeptides does not provide activity. A yeast cell expressing arabinose isomerase activity provides a host cell for expression of a complete arabinose utilization pathway, thereby engineering a yeast cell that can produce a target chemical, such as ethanol, butanol, or 1,3-propanediol, using arabinose derived from lignocellulosic biomass as a carbon source, for example.
Yeast Host Cells
[0097] Yeast cells of the present disclosure are those that comprise an arabinose isomerase (i.e., heterologous arabinose isomerase) that supports effective utilization of arabinose in an arabinose utilization pathway, and are capable of producing a target chemical. Preferred target chemicals are those of commercial value including, but not limited to, ethanol, butanol, or 1,3-propanediol.
[0098] Any yeast cells that either produce a target chemical, or can be engineered to produce a target chemical, may be used as host cells herein. Examples of such yeasts include, but are not limited to, yeasts of the genera Kluyveromyces, Candida, Pichia, Hansenula, Schizosaccharomyces (e.g., S. pombe), Kloeckera, Schwanniomyces, Yarrowia, and Saccharomyces (e.g., S. cerevisiae).
[0099] Yeast cells of the present disclosure comprising an effective arabinose isomerase may be engineered according to methods well known in the art. For example, yeast cells that have the native ability to produce ethanol from C6 sugars may be transformed with genes encoding C5 metabolic pathways including an arabinose isomerase disclosed herein. Such cells may be capable of ethanol production by either aerobic or anaerobic fermentation.
[0100] In some embodiments, yeast cells may be engineered to express a pathway for synthesis of butanol or 1,3-propanediol. Engineering of pathways for butanol synthesis (including isobutanol, 1-butanol, and/or 2-butanol) has been disclosed, for example, in U.S. Pat. Nos. 8,206,970 and 7,851,188, and in U.S. Patent Appl. Publ. Nos. 2007/0292927, 2009/0155870 and 2008/0182308, all of which are incorporated herein by reference. Engineering of pathways for 1,3-propanediol has been disclosed in U.S. Pat. Nos. 6,514,733, 5,686,276, 7,005,291, 6,013,494 and 7,629,151, which are incorporated herein by reference.
[0101] For utilization of xylose as a carbon source, a yeast cell can be engineered for expression of a complete xylose utilization pathway. Engineering of yeast such as S. cerevisiae for production of ethanol from xylose is described in Matsushika et al. (Appl. Microbiol. Biotechnol. 84:37-53) and in Kuyper et al. (FEMS Yeast Res. 5:399-409), which are incorporated herein by reference. In certain embodiments, in addition to engineering a yeast cell to have xylose isomerase activity, the activities of other pathway enzymes are increased in the cell to provide the ability to grow on xylose. Typically the activity levels of five pentose pathway enzymes are increased: xylulokinase (XKS1), transaldolase (TAL1), transketolase 1 (TKL1), D-ribulose-5-phosphate 3-epimerase (RPE1), and ribose 5-phosphate ketol-isomerase (RKI1). Any method known to one skilled in the art for increasing expression of a gene may be used. For example, as described in the Examples (below), these activities may be increased by expressing a coding region for each protein using a highly active promoter. Chimeric genes for expression can be constructed and integrated into the yeast genome. Alternatively, heterologous coding regions for these enzymes may be expressed in the yeast cell to obtain increased enzyme activities. Other suitable methods for engineering yeast capable of metabolizing xylose have been disclosed in, for example, U.S. Pat. Nos. 7,622,284, 8,058,040, 8,129,171 and 7,943,366, International Patent Appl. Publ. Nos. WO2011153516A2, WO2011149353A1, WO2006115455A1 and WO2011079388A1, and U.S. Patent Appl. Publ. Nos. 2010/0112658, 2010/0028975, 2009/0061502, 2007/0155000 and 2006/0216804, all of which are incorporated herein by reference.
[0102] For utilization of arabinose as a carbon source, a yeast cell can be engineered to express a complete arabinose utilization pathway, as disclosed in U.S. Patent Appl. Publ. No. 2005/0142648 and U.S. Pat. No. 8,129,171, for example, which are incorporated herein by reference. To allow arabinose utilization, activities expressed in addition to activities of the xylose utilization pathway include: 1) L-arabinose isomerase (examples of which are presently disclosed) to convert L-arabinose to L-ribulose, 2) L-ribulokinase to convert L-ribulose to L-ribulose-5-phosphate, and 3) L-ribulose-5-phosphate-4-epimerase to convert L-ribulose-5-phosphate to D-xylulose-5-phosphate. These enzyme activities can be expressed using coding regions of araA, araB, and araD genes, respectively. In certain aspects, the araB-encoded L-ribulokinase is from E. coli, and the araD-encoded L-ribulose-5-phosphate-4-epimerase is from E. coli. Any method known to one skilled in the art for expressing a foreign coding region may be used. For example, as described in the Examples (below), these activities can be expressed by introducing chimeric genes containing promoters active in yeast cells, heterologous codon-optimized coding regions for the enzymes, and termination sequences active in yeast cells.
Arabinose Isomerase
[0103] Obtaining an effective amount of arabinose isomerase activity in yeast cells has been problematic. A group of arabinose isomerase enzymes were found herein that provide effective arabinose isomerase activity in a yeast cell for producing ethanol from arabinose in fermentation. The yeast cell, in addition to expressing the arabinose isomerase enzyme described herein, was genetically engineered as described above to express a xylose utilization pathway and a partial arabinose utilization pathway that lacks arabinose isomerase. The present arabinose isomerase was then expressed to complete the arabinose utilization pathway. One or more additional arabinose isomerases may also be expressed, if desired.
[0104] Twenty candidate arabinose isomerase enzymes (SEQ ID NOs:1-20) were chosen from the cow rumen metagenome dataset (Hess et al., Science 331:463-467) and the human microbiome dataset (The Human Microbiome Jumpstart Reference Strains Consortium et al., Science 328:994-999) as described in the Examples (below). Each of these arabinose isomerase enzymes was expressed in yeast cells from a codon-optimized coding sequence as described in Example 6 herein, and tested for the ability to support ethanol production by yeast cells grown in medium containing arabinose as the only sugar. Eight of the arabinose isomerase candidates (SEQ ID NOs:7, 8, 10, 15, 17, 18, 19 and 20) were effective in supporting production of ethanol, as compared to the arabinose isomerase from B. subtilis. Five of these arabinose isomerase candidates supported greater ethanol production as compared to the arabinose isomerase from B. subtilis: SEQ ID NOs:7, 10, 17, 18 and 19. Enzyme activities in protein extracts from the expressing strains showed these to be higher than for the B. subtilis arabinose isomerase, though the activities did not directly correlate with the ethanol production level of the corresponding strain in all cases. In some aspects, a yeast cell expressing an arabinose isomerase as presently disclosed can produce at least about 90%, 100%, 110%, 120%, 130%, 140%, or 150% of the amount of ethanol that is produced under suitable conditions by a yeast cell expressing a B. subtilis arabinose isomerase (e.g., SEQ ID NO:41).
[0105] Six of the effective arabinose isomerase candidates (SEQ ID NOs:7, 8, 10, 17, 18 and 19) were found to be separated from the other candidates as members of one clade in a phylogenetic tree prepared for the twenty candidate arabinose isomerase enzyme amino acid sequences and the arabinose isomerase sequence from Bacillus subtilis (BSaraA; SEQ ID NO:41) (see FIG. 1). These six sequences were all from the cow rumen metagenome dataset. The sequences of SEQ ID NOs:7 and 8 have 94% amino acid sequence identity to each other. The sequences of SEQ ID NOs:10 and 17 have 92% amino acid sequence identity to each other. The sequences of SEQ ID NOs:10 and 17 have 80% amino acid sequence identity to SEQ ID NO: 18. Sequence identities among the other sequences are lower. Though the amino acid sequence identities vary among the six sequences of this clade, further sequence analysis identified an amino acid sequence motif (SEQ ID NO:67) that distinguishes the six sequences of this clade from the other fourteen candidate arabinose isomerase enzyme amino acid sequences, and that of the B. subtilis arabinose isomerase (SEQ ID NO:41). The distinguishing motif occurs starting with amino acid #237 and ending with amino acid #269, with reference to positions in the amino acid sequence of SEQ ID NO:7 (see FIGS. 2A and B). Based on the motif, this clade is called herein the "237-269 Motif" clade. The corresponding amino acid positions in different arabinose isomerase sequences can readily be determined by performing a sequence alignment. For example, these positions are #241 through #273 in both SEQ ID NOs:10 and 18.
[0106] The 237-269 Motif (FIG. 2A) is represented as formula I: I[RK]YQA[RK]EEIA[IM].K[IM][LM].[RA][EN]G[AC].AF.NTF[QE]DL . . . M (SEQ ID NO:67), where the standard single letter abbreviations for amino acids are used; where "[ ]" (brackets) indicate a position where the amino acid can be any one of the bracketed amino acids; and where each "." (period mark) indicates a position that does not have distinguishing amino acids (i.e., the residue at such position may also be found at the corresponding position in a non-clade AI sequence) (e.g., can be any standard amino acid) in terms of physical properties.
[0107] Further information about the 237-269 Motif (FIG. 2B) is represented in formula II: I[RK]YQA[RK]EEIA[IM].K[IM][LM].[RA][EN]G[AC].AF.NTF[QE]DL . . . M (SEQ ID NO:67), where the standard single letter abbreviations for amino acids are used; where "[ ]" (brackets) indicate a position where the amino acid can be any one of the bracketed amino acids; where the letters in bold indicate positions that are conserved (for the specific amino acid or for the physiochemical properties of the amino acid in each particular bold position) among all twenty candidate sequences and are therefore not distinguishing for the motif; where each "." (period mark) indicates a position that does not have distinguishing amino acids (i.e., the residue at such position may also be found at the corresponding position in a non-clade AI sequence) (e.g., can be any standard amino acid) in terms of physical properties; and where each underline indicates an amino acid that is only found at the underlined position in the present sequences (e.g., SEQ ID NOs:7, 8, 10, 17, 18, 19, or any other AI comprising SEQ ID NO:67) (i.e., they are not found in the non-clade sequences).
[0108] The amino acid sequence of the 237-269 Motif can be given as the following sequence, where the positions with non-distinguishing amino acids ("." positions) are shown as bracketed multiple amino acids, any one of which may occur in these bracketed positions:
(SEQ ID NO: 66) I[RK]YQA[RK]EEIA[IM][EK]K[IM][LM][VTD][RA][EN]G[AC][KNR]AF[SVT]NTF[EQ]DL[HIY]GM.
Thus, the 237-269 Motif as represented by SEQ ID NO:66 has an "Xaa" at each position having possible multiple amino acids, with the possible amino acids in these positions, as shown above in the bracket positions, designated in the present Sequence Listing.
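By way of illustration only, the bracketed form of SEQ ID NO:66 given above can be used directly as a regular expression, since "[ ]" has the same "any one of" meaning in regular expression syntax. The sketch below searches a protein sequence for an exact match to the 33-residue pattern. The function name and the example 33-mer are hypothetical; the 33-mer is a synthetic instance built by taking the first option at each bracketed position and is not asserted to be any disclosed sequence. An exact match does not address the "at least 90% identical" variants, which would require an alignment-based comparison.

```python
import re

# SEQ ID NO:66 in bracket notation, split across two literals for readability.
MOTIF_SEQ_ID_66 = re.compile(
    r"I[RK]YQA[RK]EEIA[IM][EK]K[IM][LM][VTD][RA][EN]G[AC]"
    r"[KNR]AF[SVT]NTF[EQ]DL[HIY]GM"
)


def find_motif(protein: str):
    """Return 1-based (start, end) of the first exact motif match, or None."""
    match = MOTIF_SEQ_ID_66.search(protein.upper())
    if match is None:
        return None
    return (match.start() + 1, match.end())


# Synthetic instance of the pattern (first option at every bracketed position).
SYNTHETIC_MOTIF = "IRYQAREEIAIEKILVREGAKAFSNTFEDLHGM"
hit = find_motif("MA" + SYNTHETIC_MOTIF + "KK")
```

With the two-residue prefix used here, `hit` is `(3, 35)`. The reported start can then be compared against the position expected from alignment with SEQ ID NO:7.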
[0109] There is variation in the amino acids of the motif among the six sequences of the 237-269 Motif clade (see FIG. 2B); however, the amino acid sequence of an arabinose isomerase can be readily identified by one skilled in the art as belonging to this clade based on overall matching with the motif. Thus, in certain embodiments, the present arabinose isomerase sequences for expression in yeast have the 237-269 Motif described above (e.g., SEQ ID NO:67 or SEQ ID NO:66) or a related sequence (e.g., a motif that is at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO:67 or SEQ ID NO:66). In some embodiments, an arabinose isomerase can comprise a motif that is at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to positions 237-269 of SEQ ID NO:7. The present arabinose isomerase sequences are any that belong to the 237-269 Motif clade (i.e., any that comprise SEQ ID NO:67, SEQ ID NO:66, or a related sequence thereof). It is noted for clarity that SEQ ID NO:66 is a version of SEQ ID NO:67.
[0110] In some embodiments, the present arabinose isomerase sequences are identified by specific amino acids matching to distinguishing positions in the motif sequence. The present arabinose isomerase is identified in this manner by having at least seventeen, eighteen, or nineteen of, or all of, the following amino acids in the motif: I at position 237; R or K at position 238; Y at position 239; R or K at position 242; E at position 243; I at position 245; A at position 246; I or M at position 247; K at position 249; I or M at position 250; R or A at position 253; E or N at position 254; G at position 255; A or C at position 256; F at position 259; N at position 261; T at position 262; Q or E at position 264; and M at position 269.
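By way of illustration only, the "at least seventeen of nineteen" test described above can be sketched as a simple count. The sketch assumes the 33-residue segment of the query that corresponds with positions 237-269 of SEQ ID NO:7 has already been extracted by alignment, so that the first character of the segment is position 237. The names used and the example segment are hypothetical; the example is a synthetic string, not a disclosed sequence.

```python
MOTIF_START = 237
MOTIF_LENGTH = 33

# Position (SEQ ID NO:7 numbering) -> residues listed above for that position.
DISTINGUISHING = {
    237: "I", 238: "RK", 239: "Y", 242: "RK", 243: "E", 245: "I", 246: "A",
    247: "IM", 249: "K", 250: "IM", 253: "RA", 254: "EN", 255: "G",
    256: "AC", 259: "F", 261: "N", 262: "T", 264: "QE", 269: "M",
}


def count_distinguishing_matches(segment: str) -> int:
    """Count how many of the nineteen listed positions carry a listed residue."""
    if len(segment) != MOTIF_LENGTH:
        raise ValueError("segment must cover positions 237-269 (33 residues)")
    segment = segment.upper()
    return sum(
        1
        for position, allowed in DISTINGUISHING.items()
        if segment[position - MOTIF_START] in allowed
    )


def meets_threshold(segment: str, minimum: int = 17) -> bool:
    """True if the segment matches at least `minimum` of the listed positions."""
    return count_distinguishing_matches(segment) >= minimum


# Synthetic segment matching all nineteen listed positions.
SYNTHETIC_SEGMENT = "IRYQAREEIAIEKILVREGAKAFSNTFEDLHGM"
```

For `SYNTHETIC_SEGMENT` the count is 19. Changing any two of the listed positions to unlisted residues leaves 17 and still meets the default threshold, while changing three does not.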
[0111] In some embodiments, the present arabinose isomerase sequences are identified by specific amino acids that are different from the amino acids at the corresponding positions in the sequences which are not in the 237-269 Motif clade. These specific amino acids are: E at position 243; I or M at position 250; A or C at position 256; and N at position 261.
[0112] In certain embodiments, an arabinose isomerase comprises, or consists of, an amino acid sequence that is 100% identical to, or at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to, SEQ ID NO:7, 8, 10, 17, 18, or 19. Such an arabinose isomerase can optionally be further characterized to comprise SEQ ID NO:67 or SEQ ID NO:66 (or a variant of either motif that is at least 90% or 95% identical thereto), either of which being located at amino acid positions corresponding to positions 237-269 of SEQ ID NO:7. In some embodiments, an arabinose isomerase can (i) comprise, or consist of, an amino acid sequence that is at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to, SEQ ID NO:7, 8, 10, 17, 18, or 19, and (ii) comprise the respective motif shown in FIG. 2C.
[0113] In some embodiments, an arabinose isomerase can comprise a motif that is at least about 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to positions 237-269 of SEQ ID NO:7. Such an arabinose isomerase can optionally further be characterized as comprising, or consisting of, an amino acid sequence that is at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO:7, 8, 10, 17, 18, or 19.
[0114] In some embodiments, an arabinose isomerase comprises, or consists of, the amino acid sequence of (i) SEQ ID NO:10, 17, 18, or 19 (or any variant thereof as disclosed herein) or (ii) SEQ ID NO:10, 17, or 18 (or any variant thereof as disclosed herein). An arabinose isomerase, in some aspects of the present disclosure, does not comprise SEQ ID NO:15, 68, 69, 70, or 71.
[0115] The present amino acid sequences that provide arabinose isomerase activity in yeast cells are not native to yeast cells, thus their encoding polynucleotide sequences are heterologous to yeast cells. For expression, polynucleotide molecules encoding the present polypeptides may be designed using codon optimization for the desired yeast cell. For example, to express SEQ ID NO:7, 8, 10, 17, 18, and/or 19, a codon-optimized coding region of SEQ ID NO:27, 28, 30, 37, 38, and/or 39 can be used, respectively. A polynucleotide can also be characterized as being heterologous in some aspects by virtue of comprising heterologously combined elements (e.g., a promoter that is heterologous to the sequence encoding the polypeptide).
[0116] Methods for gene expression in yeasts are known in the art (see for example Methods in Enzymology, Volume 194, Guide to Yeast Genetics and Molecular and Cell Biology (Part A, 2004, Christine Guthrie and Gerald R. Fink (Eds.), Elsevier Academic Press, San Diego, Calif.)). Expression of genes in yeast typically requires a promoter, operably linked to the coding region of interest, and a transcriptional terminator. A number of yeast promoters can be used in constructing expression cassettes for genes encoding the desired proteins, including, but not limited to, constitutive promoters (e.g., FBA1, GPD1, PDC1, ADH1, GPM, TPI1, TDH3, PGK1, ILV5p) and inducible promoters (e.g., GAL1, GAL10, CUP1). Suitable transcription terminators include, but are not limited to, FBAt, GPDt, GPMt, ERG10t, GAL1t, CYC1t, ADH1t, TAL1t, TKL1t, ILV5t, and ADHt.
[0117] A polynucleotide sequence herein encoding an arabinose isomerase can be a vector (e.g., plasmid, cosmid) containing a selectable marker and sequences allowing autonomous replication or chromosomal integration in the desired host, for example. Typically used plasmids in yeast are shuttle vectors pRS423, pRS424, pRS425, and pRS426 (American Type Culture Collection, Rockville, Md.), which contain an E. coli replication origin (e.g., pMB1), a yeast 2μ origin of replication, and a marker for nutritional selection. The selection markers for these four vectors are His3 (vector pRS423), Trp1 (vector pRS424), Leu2 (vector pRS425) and Ura3 (vector pRS426). Additional vectors that may be used include pHR81 (ATCC #87541) and pRS313 (ATCC #77142). Construction of expression vectors with chimeric genes encoding the desired proteins may be performed by either standard molecular cloning techniques in E. coli or by the gap repair recombination method in yeast, for example.
[0118] The present disclosure also provides a method for producing a yeast cell that has arabinose isomerase activity following the teachings above. In this method, a heterologous polynucleotide encoding an arabinose isomerase as presently disclosed is introduced into a yeast cell lacking arabinose isomerase. Any yeast cell as disclosed herein can be produced using this method, if desired. Some aspects are drawn to increasing the arabinose isomerase activity of a yeast cell that already comprises a heterologous arabinose isomerase, by introducing a polynucleotide encoding an arabinose isomerase as presently disclosed to the yeast cell.
[0119] In various embodiments, a heterologous polynucleotide encoding a polypeptide having arabinose isomerase activity can be introduced into the yeast cell before, after, or at the same time that other genes for expressing enzymes of an arabinose utilization pathway are introduced.
[0120] In certain embodiments, a heterologous polynucleotide herein can be introduced into a yeast cell that has already been modified to comprise a complete xylose utilization pathway. Introduction of arabinose isomerase activity and additional modifications for a xylose utilization pathway and a complete arabinose utilization pathway may be performed in any order, and/or with two or more introductions/modifications performed concurrently. Such yeast cells have the ability to grow in medium containing arabinose as the sole carbon source. More typically, these cells are grown in medium containing arabinose, as well as other sugars such as glucose and/or xylose. This latter growth scheme allows effective use of the sugars found in a hydrolysate medium prepared from cellulosic biomass by pretreatment and saccharification. Some embodiments therefore are drawn to a fermentation comprising at least a yeast cell as disclosed herein, arabinose, and optionally xylose and/or glucose. Any or all of the sugar components of such a fermentation can be provided from a lignocellulosic biomass hydrolysate, for example.
[0121] In certain embodiments, a heterologous polynucleotide herein can be introduced into a yeast cell that has a metabolic pathway that produces a target chemical. The pathway may be endogenous, or it may be engineered in the cell. Introduction of arabinose isomerase activity and a metabolic pathway producing a target chemical may be performed in any order, and/or with two or more genetic modifications performed concurrently. Examples of target chemicals include ethanol, butanol, and 1,3-propanediol. Yeast cells containing metabolic pathways for production of target chemicals are described above, for example.
Production of a Target Chemical Using Arabinose
[0122] The present yeast cells expressing an arabinose isomerase as part of an arabinose utilization pathway, and producing ethanol and/or another target chemical, can be grown in medium containing arabinose. Typically, such yeast cells also are able to utilize xylose and the medium contains additional sugars such as glucose and xylose. In certain embodiments, lignocellulosic biomass hydrolysate is used in fermentation medium, for example as disclosed in U.S. Pat. No. 7,932,063, which is incorporated herein by reference.
[0123] A variety of culture methodologies may be applied. For example, large-scale fermentation/production may use batch, fed-batch, or continuous culture methodologies. Other fermentation conditions such as pH, oxygenation, and temperature can be applied, accordingly.
[0124] In any embodiment disclosed herein, an arabinose isomerase can, instead of being one that comprises a 237-269 Motif (e.g., SEQ ID NO:66 or 67) as disclosed herein, be one comprising, or consisting of, an amino acid sequence that is 100% identical to, or at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to, SEQ ID NO:15 or 20. Such enzymes do not belong to the 237-269 Motif clade.
[0125] Non-limiting examples of compositions and methods disclosed herein include:
1. A recombinant yeast cell comprising an arabinose utilization pathway that comprises a polypeptide having arabinose isomerase activity, wherein the yeast cell comprises a heterologous polynucleotide encoding the polypeptide, wherein the polypeptide comprises a motif that is at least 90% identical to SEQ ID NO:67, and wherein the position of the motif in the polypeptide corresponds with positions 237-269 of SEQ ID NO:7.
2. The recombinant yeast cell of embodiment 1, wherein the motif comprises at least seventeen amino acids selected from the group consisting of: (a) I at position 237; (b) R or K at position 238; (c) Y at position 239; (d) R or K at position 242; (e) E at position 243; (f) I at position 245; (g) A at position 246; (h) I or M at position 247; (i) K at position 249; (j) I or M at position 250; (k) R or A at position 253; (l) E or N at position 254; (m) G at position 255; (n) A or C at position 256; (o) F at position 259; (p) N at position 261; (q) T at position 262; (r) Q or E at position 264; and (s) M at position 269; wherein each position of (a)-(s) corresponds with the respective position in positions 237-269 of SEQ ID NO:7.
3. The recombinant yeast cell of embodiment 1 or 2, wherein the polypeptide comprises a motif that is SEQ ID NO:67.
4. The recombinant yeast cell of embodiment 1, 2, or 3, wherein the polypeptide comprises a motif that is at least 90% identical to SEQ ID NO:66.
5. The recombinant yeast cell of embodiment 4, wherein the polypeptide comprises a motif that is SEQ ID NO:66.
6. The recombinant yeast cell of embodiment 1, 2, 3, 4, or 5, wherein the polypeptide comprises an amino acid sequence that is at least 85% identical to SEQ ID NO:7, 8, 10, 17, 18, or 19.
7. The recombinant yeast cell of embodiment 1, 2, 3, 4, 5, or 6, further comprising a metabolic pathway that produces a target compound, optionally wherein the target compound is ethanol, butanol, or 1,3-propanediol.
8. A method for producing a yeast cell having arabinose isomerase activity, the method comprising: (a) providing a yeast cell lacking arabinose isomerase; and (b) introducing a heterologous polynucleotide into the yeast cell, wherein the heterologous polynucleotide encodes a polypeptide having arabinose isomerase activity, and wherein the polypeptide comprises a motif that is at least 90% identical to SEQ ID NO:67.
9. The method of embodiment 8, wherein: (i) the yeast cell of step (a) comprises one or more polynucleotides encoding enzymes, except an arabinose isomerase, of an arabinose utilization pathway, or (ii) step (b) further comprises introducing, into the yeast cell, one or more polynucleotides encoding enzymes of an arabinose utilization pathway, wherein this further introduction is at the same time as, or after, the introducing of the heterologous polynucleotide encoding the polypeptide having arabinose isomerase activity.
10. A method of producing a target compound from arabinose comprising: (a) providing the recombinant yeast cell of any one of embodiments 1-7; (b) growing the yeast cell of (a) in medium comprising arabinose, wherein the target compound is produced; and (c) optionally isolating the target compound of (b).
11. The method of embodiment 10, wherein the target compound is ethanol, butanol, or 1,3-propanediol.
12. A recombinant yeast cell comprising an arabinose utilization pathway that comprises a polypeptide having arabinose isomerase activity, wherein the yeast cell comprises a heterologous polynucleotide encoding the polypeptide, wherein the polypeptide comprises an amino acid sequence that is at least 85% identical to SEQ ID NO:15 or 20.
13. The recombinant yeast cell of embodiment 12, further comprising a metabolic pathway that produces a target compound, optionally wherein the target compound is ethanol, butanol, or 1,3-propanediol.
14. A method of producing a target compound from arabinose comprising: (a) providing the recombinant yeast cell of embodiment 12 or 13; (b) growing the yeast cell of (a) in medium comprising arabinose, wherein the target compound is produced; and (c) optionally isolating the target compound of (b).
15. A method for producing a yeast cell having arabinose isomerase activity, the method comprising: (a) providing a yeast cell lacking arabinose isomerase; and (b) introducing a heterologous polynucleotide into the yeast cell, wherein the heterologous polynucleotide encodes a polypeptide (i) having arabinose isomerase activity and (ii) comprising an amino acid sequence that is at least 85% identical to SEQ ID NO:15 or 20.
EXAMPLES
[0126] The present disclosure is further exemplified in the following Examples. It should be understood that these Examples, while indicating certain preferred aspects herein, are given by way of illustration only. From the above discussion and these Examples, one skilled in the art can ascertain the essential characteristics of the disclosed embodiments, and without departing from the spirit and scope thereof, can make various changes and modifications to adapt the disclosed embodiments to various uses and conditions.
Example 1
Plasmid Constructs for Xylose Utilization Pathway
[0127] To make xylose-utilizing yeast strains, a plasmid for increasing expression of the pentose pathway in Saccharomyces cerevisiae was used. This plasmid was described as P5 Integration Vector in GRE3 in U.S. Pat. No. 8,669,076 (Example 1 therein), which is incorporated herein by reference, and was renamed herein as pSX01 (SEQ ID NO:43; FIG. 3A). It contains a 12719-bp P5 transgene fragment having five chimeric genes (XKS1, TKL1, RKI1, RPE1, and TAL1) and a URA3 marker, flanked by a pair of homologous recombination fragments (HRF), GRE3-I (572-bp) and GRE3-II (541-bp). GRE3-I and GRE3-II direct integration of the transgene fragment into the GRE3 locus on chromosome 8 of the S. cerevisiae genome, between positions 323809 and 324118. Integration truncates the GRE3 coding sequence between nucleotides 401 and 710, which removes a 308-bp sequence from it. On the transgene fragment, TKL1, RKI1, RPE1, and TAL1 encode four pentose phosphate pathway enzymes: transketolase (EC 2.2.1.1), ribose-5-phosphate ketol-isomerase (EC 5.3.1.6), ribulose-phosphate 3-epimerase (EC 5.1.3.1), and transaldolase (EC 2.2.1.2), respectively; XKS1 encodes the xylose assimilation pathway enzyme xylulokinase (EC 2.7.1.17); and URA3 functions as a selection marker for transformation of URA3-deletion strains. A pair of Lox elements flanks the URA3 marker at its 5' and 3' ends so that the marker can be removed after integration of the transgene by introducing a Cre recombinase into transformants.
[0128] Xylose isomerase (EC 5.3.1.5) is a key enzyme for the xylose assimilation pathway in many bacteria and a few fungi. It is a slow enzyme with poor kinetic properties. It was previously disclosed in U.S. Pat. Nos. 8,114,974 and 8,093,037, which are incorporated herein by reference, that a synthetic xylA gene (herein named VDxylA) was expressed and functioned well in S. cerevisiae. VDxylA has the amino acid sequence shown in SEQ ID NO:44. A 1323-bp VDxylA coding sequence was synthesized using codon optimization for expression in S. cerevisiae (SEQ ID NO:45). It was then linked with an 868-bp PDC1 promoter and a 110-bp UAS(FBA1) enhancer at its 5' end and a 623-bp ILV5 terminator at its 3' end, forming a 2966-bp chimeric expression cassette designated as UAS(FBA1)::PDC1p::VDxylA::ILV5t (SEQ ID NO:46).
[0129] The VDxylA cassette was constructed into three different integration plasmids, each targeting a specific integration locus. These plasmids had a 2653-bp common backbone sequence, which contained an E. coli replication origin (ORI) and ampicillin-resistance marker (AP.sup.r) for plasmid propagation in E. coli, as well as KasI sites at both ends. The transgene sequences had a structure of HRF-U/VDxylA Cassette/HRF-DD/URA3 Cassette/HRF-D. They were connected with the backbone through the KasI sites. In this transgene structure, HRF-U and HRF-D were two homologous recombination fragments able to integrate the transgene into the S. cerevisiae chromosome between the sequences corresponding to these two HRFs. The URA3 cassette provided a selective marker for integration. HRF-DD was a third homologous recombination fragment corresponding to a chromosomal sequence further downstream of HRF-D. After integration, this fragment was able to recombine with the chromosomal copy of HRF-DD to loop out the URA3 cassette and HRF-D, thus leaving the VDxylA cassette between HRF-U and HRF-DD on the chromosome without a selective marker.
[0130] Plasmid pSX208 (SEQ ID NO:47; FIG. 3B) has a VDxylA cassette in its transgene fragment. Its three HRFs are 546-bp STB5U-U, 487-bp STB5U-D, and 386-bp STB5U-DD, corresponding to the S. cerevisiae chromosome-VIII from coordinates 457706 to 458251, from coordinates 458334 to 458820, and from coordinates 458836 to 459221, respectively. Therefore, during integration, the transgene fragment inserts into chromosome-VIII between coordinates 458251 and 458334. After recycling the URA3 marker, the VDxylA cassette was located between coordinates 458251 and 458836, that is, in an intergenic region upstream of the STB5 locus (YHR178W).
[0131] Plasmid pSX209 (SEQ ID NO:48; FIG. 4A) has a VDxylA cassette in its transgene fragment. Its three HRFs are 465-bp AAP1U-U, 476-bp AAP1U-D, and 522-bp AAP1U-DD, corresponding to S. cerevisiae chromosome-VIII from coordinates 203581 to 203117, from coordinates 202845 to 202370, and from coordinates 202362 to 201841, respectively. Therefore, during integration, the transgene fragment inserts into chromosome-VIII between coordinates 203117 and 202845. After recycling the URA3 marker, the VDxylA cassette from pSX209 was located between coordinates 203117 and 202362, that is, in an intergenic region upstream of the AAP1 locus (YHR047C).
[0132] Plasmid pSX210 (SEQ ID NO:49; FIG. 4B) has a VDxylA cassette in its transgene fragment. Its three HRFs are 437-bp PTC7U-U, 453-bp PTC7U-D, and 465-bp PTC7U-DD, corresponding to S. cerevisiae chromosome-VIII from coordinates 249702 to 250138, from coordinates 250199 to 250651, and from coordinates 250670 to 251134, respectively. Therefore, during integration, the transgene fragment inserts into chromosome-VIII between coordinates 250138 and 250199. After recycling the URA3 marker, the VDxylA cassette from pSX210 was located between coordinates 250138 and 250670, that is, in an intergenic region upstream of the PTC7 locus (YHR076W). Table 2 is a summary of plasmids pSX208, pSX209, and pSX210.
TABLE-US-00003 TABLE 2 Summary of pSX208, pSX209 and pSX210 Constructs.
                   pSX208       pSX209       pSX210
Size               8601 bp      8645 bp      8537 bp
Backbone           6376-427     6420-427     6312-427
STB5U-U            428-973      428-892      428-864
VDxylA Cassette    980-3945     899-3864     871-3836
STB5U-DD           3960-4345    3879-4400    3851-4315
URA3 Cassette      4472-5826    4527-5881    4442-5796
STB5U-D            5889-6375    5944-6419    5859-6311
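The fragment lengths quoted for the nine homologous recombination fragments can be cross-checked against their chromosome-VIII coordinates. The following is a minimal sketch, assuming that the quoted coordinates are inclusive at both ends and that the pSX209 fragments are simply listed in descending coordinate order; the function and variable names are illustrative only and are not part of the disclosure.

```python
# Chromosome-VIII coordinates and stated lengths (bp) of each HRF, as quoted
# in the text for pSX208, pSX209 and pSX210.
HRFS = {
    "pSX208": [("STB5U-U", 457706, 458251, 546),
               ("STB5U-D", 458334, 458820, 487),
               ("STB5U-DD", 458836, 459221, 386)],
    "pSX209": [("AAP1U-U", 203581, 203117, 465),
               ("AAP1U-D", 202845, 202370, 476),
               ("AAP1U-DD", 202362, 201841, 522)],
    "pSX210": [("PTC7U-U", 249702, 250138, 437),
               ("PTC7U-D", 250199, 250651, 453),
               ("PTC7U-DD", 250670, 251134, 465)],
}


def span_length(start, end):
    """Length of a coordinate range, assuming both ends are inclusive.

    abs() lets the same formula serve fragments listed in descending order.
    """
    return abs(end - start) + 1


def mismatches(hrfs):
    """Return (plasmid, fragment, computed, stated) for every disagreement."""
    bad = []
    for plasmid, fragments in hrfs.items():
        for name, start, end, stated in fragments:
            computed = span_length(start, end)
            if computed != stated:
                bad.append((plasmid, name, computed, stated))
    return bad


if __name__ == "__main__":
    problems = mismatches(HRFS)
    print("all lengths consistent" if not problems else problems)
```

Under these assumptions, every stated fragment length agrees with its coordinate range.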
Example 2
Development of Xylose-Utilization Strains
[0133] To make xylose-utilizing recombinant strains, the S. cerevisiae strain PXI3 was used as the recipient of transgenes. PXI3, also called BP1548, is a CEN.PK-based haploid laboratory strain derived from prototrophic diploid strain CBS 8272 (Centraalbureau voor Schimmelcultures [CBS] Fungal Biodiversity Centre, Netherlands), with a genotype of MAT.alpha. ura3.DELTA. his3.DELTA.. The strain was described in detail in Patent Appl. Publ. No. 2014/0178954 (Example 5 therein), which is incorporated herein by reference. Both native URA3 coding sequences were deleted by an approach similar to that described in Patent Appl. Publ. No. 2014/0178954.
[0134] The 12719-bp P5 transgene fragment (see Example 1), which included the homologous recombination regions GRE3-I and GRE3-II, was isolated from plasmid pSX01 by KasI digestion and transformed into the PXI3 strain using the FROZEN-EZ Yeast Transformation II Kit from Zymo Research (Irvine, Calif.). Transformants were selected on plates with CM/Gluc/-Ura (Teknova), which is a synthetic dropout (SD) medium lacking uracil. The URA3 marker was removed by introducing a CRE recombinase vector, pJT254 (SEQ ID NO:50), into the transformants. This vector was derived from pRS413 and the cre coding region (nt 2562 to 3593) was under the control of the GAL1 promoter (nt 2119 to 2561). Strains that could no longer grow on SD (-uracil) medium were selected. Further passages on YPD medium (Teknova) were used to cure the pJT254 plasmid. The resulting CEN-PK-based strain was named PX112. It has a P5 transgene fragment integrated into GRE3 locus as described in Example 1, with a genotype of MAT.alpha. ura3.DELTA. his3.DELTA. gre3.DELTA.:P5.
[0135] To integrate the xylA gene into the PX112 strain, the transgene fragments with the flanking HRFs were amplified by PCR from pSX208 or isolated from pSX209 and pSX210 by KasI digestion. The fragments were sequentially transformed into the strain to achieve multiple insertions. For one round of integration, the xylA transgene fragment was transformed into the recipient strain using the FROZEN-EZ Yeast Transformation II Kit from Zymo Research. Transformants were selected on CM/Gluc/-Ura plates. Accurate integration was confirmed by PCR. Integration was followed by a recycling procedure to remove the URA3 marker. For this purpose, the transformant was grown in YPD broth (Teknova) overnight and then on a CM-FOA plate (6.7 g/L Sigma yeast nitrogen base without amino acids, 0.77 g/L Clontech dropout mix without uracil, 20 g/L glucose, 40 g/L uracil, 1 g/L 5-fluoro-orotic acid, 20 g/L agar) for two days. The survivors were streaked on both a YPD plate and a CM/Gluc/-Ura plate. A marker-free transformant was identified when it grew on the YPD plate but not on the CM/Gluc/-Ura plate. Marker removal was confirmed by PCR. When the transgene fragments of pSX208, pSX209, and pSX210 were integrated sequentially into PX112, a CEN-PK based haploid strain was obtained, which has a genotype of MAT.alpha. ura3.DELTA.::loxP his3.DELTA. gre3.DELTA.:P5 AAP1U.DELTA.::UAS(FBA1)-PDC1p-VDxylA-ILV5t PTC7U.DELTA.::UAS(FBA1)-PDC1p-VDxylA-ILV5t STB5U.DELTA.::UAS(FBA1)-PDC1p-VDxylA-ILV5t. The resulting strain was named PXI68.
[0136] PXI68 was subjected to adaptation to speed up xylose fermentation. Adaptation was carried out in 4 mL YPX4 broth (10 g/L yeast extract, 20 g/L peptone, 40 g/L xylose) with a starting cell density at an OD.sub.600 value of 0.5. The strain was grown under micro-aerobic conditions at 32.degree. C. and 200 rpm shaking for approximately 5 doublings. It was then passaged into fresh YPX4 and grown under the same conditions. Adaptation was completed after 10 passages (approximately 50 doublings). Individual adapted strains were isolated and examined for improved xylose utilization using a standard mini-fermentation assay. This assay is described in Example 4 below, using YPX4 (YP medium containing 40 g/L xylose) as fermentation broth. The adaptation resulted in the improved strain, PXI82.
Example 3
Plasmid Constructs for Arabinose Utilization Pathway
[0137] To assemble a bacterium-type arabinose assimilation pathway in a xylose-utilizing strain so that the resultant strain is able to utilize glucose, xylose, and arabinose, three enzymes are required. L-arabinose isomerase (EC 5.3.1.4) is a key enzyme for the arabinose assimilation pathway in many bacteria, encoded by the araA gene. Similar to xylose isomerase, it is a slow enzyme with poor kinetic properties. The B. subtilis araA (BSaraA) gene-encoded arabinose isomerase has the amino acid sequence shown in SEQ ID NO:41. To express this araA, a 1491-bp BSaraA coding region was synthesized using codon optimization for expression in S. cerevisiae (SEQ ID NO:42). It was then linked with the 678-bp ADH promoter (ADHp) at its 5' end and with the 252-bp CYC1 terminator at its 3' end, forming a 2429-bp chimeric expression cassette called ADHp::BSaraA::CYC1t (SEQ ID NO:51) or BSaraA cassette. L-ribulokinase (EC 2.7.1.16) is the second enzyme in the bacterial arabinose assimilation pathway, encoded by the araB gene. The E. coli araB (ECaraB) gene-encoded ribulokinase has the amino acid sequence shown in SEQ ID NO:52. To express this araB, a 1701-bp ECaraB coding region was synthesized using codon optimization for expression in S. cerevisiae (SEQ ID NO:53). It was then linked with the 700-bp ILV5 promoter (ILV5p) at its 5' end and with the 500-bp PHO13 terminator (PHO13-3'UTR) at its 3' end, forming a 2907-bp chimeric expression cassette called ILV5p::ECaraB::PHO13-3'UTR (SEQ ID NO:54) or ECaraB cassette. L-ribulose-5-phosphate 4-epimerase (EC 5.1.3.4) is the third enzyme in the bacterial arabinose assimilation pathway, encoded by the araD gene. The E. coli araD (ECaraD) gene-encoded L-ribulose-5-phosphate 4-epimerase has the amino acid sequence shown in SEQ ID NO:55. To express this araD, a 696-bp ECaraD coding region was synthesized using codon optimization for expression in S. cerevisiae (SEQ ID NO:56). It was then linked with the 679-bp GPD promoter (GPDp) at its 5' end and with the 316-bp ADH1 terminator (ADH1t) at its 3' end, forming a 1691-bp chimeric expression cassette called GPDp::ECaraD::ADH1t (SEQ ID NO:57) or ECaraD cassette.
[0138] The ECaraB cassette was constructed into an integration plasmid, which targeted integration in the PHO13 locus (YDL236W, encoding for an alkaline phosphatase specific for p-nitrophenyl phosphate) on chromosome-IV, resulting in pSA0-B (SEQ ID NO:58; FIG. 5A). Similar to the integration plasmid described earlier, the backbone of this plasmid contained an E. coli replication origin (pBR322 ori) and ampicillin-resistance marker (AP.sup.r) for plasmid propagation in E. coli, as well as AscI and NotI sites at the ends. In the integration plasmid, transgene sequences had a structure of PHO13-U2/ECaraB Cassette/PHO13-3'UTR/URA3 Cassette/PHO13-D2. It was connected with the backbone through NotI and AscI sites. In this transgene structure, 500-bp PHO13-U2 and 500-bp PHO13-D2 were two homologous recombination fragments, corresponding to the chromosome-IV sequence from coordinates 31794 to 32294 and from coordinates 32735 to 33234, respectively. These sequences direct integration of the transgenes into chromosome-IV between the sequences corresponding to these two HRFs, which interrupted the PHO13 locus and deleted the first 439 nucleotides in the coding region. Within the transgene structure, the URA3 cassette provided a selective marker for integration. The PHO13-3'UTR not only served as a terminator for ECaraB but also was the third homologous recombination fragment corresponding to a chromosomal sequence further downstream of PHO13-D2, from coordinates 33235 to 33734. After integration, this fragment was able to interact with the chromosomal copy of PHO13-3'UTR to loop-out the URA3 Cassette and PHO13-D2, thus leaving the ECaraB cassette between PHO13-U2 and PHO13-3'UTR on chromosome-IV without a selective marker. The marker recycling of URA3 further removed the rest of the PHO13 coding sequence. Therefore, the entire PHO13 coding sequence was completely deleted.
[0139] The BSaraA cassette and ECaraD cassette were constructed into a high copy number shuttle vector, resulting in pSA503 (SEQ ID NO:59, FIG. 5B) that could support a high level transient expression of these transgenes. The backbone of this plasmid contained an E. coli replication origin (pBR322 ori) and ampicillin-resistance marker (AP.sup.r) for plasmid propagation in E. coli. It also contained an S. cerevisiae 2 micron replication sequence, and LEU2 and URA3 selection markers for plasmid propagation in S. cerevisiae. The BSaraA cassette was located downstream of the 2 micron element, with a unique 5' BamHI site. The ECaraD cassette was located downstream of the BSaraA cassette but in opposite orientation, with a unique 5' SacII site. The two cassettes were separated by a unique NotI site.
Example 4
Development of Arabinose-Utilization Strains
[0140] To make arabinose-utilizing recombinant strains, PX182 (prepared in Example 2) was used as the recipient of transgenes. The transgene fragment of PHO13-U2/ECaraB Cassette/PHO13-3'UTR/URA3 Cassette/PHO13-D2 was isolated from plasmid pSA0-B by NotI and AscI digestion and transformed into the PX182 strain using the FROZEN-EZ Yeast Transformation II Kit from Zymo Research (Irvine, Calif.). Transformants were selected on CM/Gluc/-Ura plates. Accurate integration was confirmed by PCR. Integration was followed by a recycling procedure to remove the URA3 marker. For this purpose, transformants were grown in YPD broth overnight and then on CM-FOA plates (6.7 g/L Sigma yeast nitrogen base without amino acids, 0.77 g/L Clontech dropout mix without uracil, 20 g/L glucose, 40 g/L uracil, 1 g/L 5-fluoro-orotic acid, 20 g/L agar) for two days. The survivors were streaked on both YPD plates and CM/Gluc/-Ura plates. A marker-free transformant was identified when it grew on a YPD plate but not on a CM/Gluc/-Ura plate, and was named PX182-araB. Marker removal was confirmed by PCR. To introduce the BSaraA cassette and the ECaraD cassette into the strain, shuttle vector pSA503 was transformed into the PX182-araB strain using the FROZEN-EZ Yeast Transformation II Kit from Zymo Research. Transformants were selected on CM/Gluc/-Ura Plates and named PX182-araBAD5030.
[0141] The PX182-araBAD5030 strain was tested for arabinose assimilation and fermentation capacity in a mini-shaking bottle fermentation under micro-aerobic conditions. Fermentation broth was YPA4 (YP medium containing 40 g/L arabinose). To set up the fermentation, the strain was grown in 3 mL YPD culture at 32.degree. C. with shaking at 200 rpm overnight. Fresh cells were added into 5-mL NALGENE PETG diagnostics vials containing 4 mL fermentation broth, with a starting OD.sub.600 at 0.5. The PETG vials were sealed by screw caps fitted with a WHEATON 20-mm SEPTA PTFE red rubber pad. Micro-aerobic conditions were achieved by inserting BD 26G needles through the caps. Fermentation was conducted at 32.degree. C. with shaking at 200 rpm. At specified time intervals, 0.5 mL of culture was drawn through the needle using a syringe. One fifth of the collected culture (0.1 mL) was used for measurement of OD.sub.600. The rest of the culture was spun in a micro-centrifuge at 14000 rpm for 2 min. The supernatant was carefully collected by pipette, placed into a 0.22-.mu.m COSTAR SPIN-X Centrifuge Tube Filter (Corning Inc., Corning, N.Y.), and then passed through the filter by micro-centrifuging at 6000 rpm for 2 min. The flow-through was loaded into an AGILENT 250-4 vial insert within a 2-mL crimp vial and sealed. Arabinose, ethanol (EtOH) and other metabolites were analyzed by running flow-through samples through a BIORAD AMINEX HPX-87H ion exclusion column with 0.01 N H.sub.2SO.sub.4 at a speed of 0.6 mL/min at 55.degree. C. on an AGILENT 1100 HPLC system. Assay results (FIG. 6) showed that during 72-hr of fermentation, arabinose concentration in the broth was reduced from 38.5 g/L to 10.1 g/L (73.7% arabinose had been consumed). The arabinose consumption resulted in cell growth from an OD.sub.600 value of 0.5 to 7.9, and supported production of 12.4 g/L ethanol. This result confirmed that an arabinose assimilation pathway was assembled successfully in the PX182-araBAD5030 strain. This pathway, combined with the pentose phosphate pathway previously engineered into the strain, was able to ferment arabinose to ethanol.
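The consumption and ethanol figures reported for this fermentation can be turned into a yield estimate. The sketch below is illustrative only: the comparison with a theoretical maximum assumes the usual pentose fermentation stoichiometry (3 C5H10O5 to 5 C2H5OH + 5 CO2), which is not stated in this disclosure, and it ignores carbon diverted to biomass and by-products. Function names are hypothetical.

```python
# Reported values for the 72-hr YPA4 fermentation (g/L).
ARABINOSE_START = 38.5
ARABINOSE_END = 10.1
ETHANOL_END = 12.4

# Assumption (not from the text): 3 pentose -> 5 ethanol + 5 CO2.
ETHANOL_MW = 46.07    # g/mol, C2H5OH
PENTOSE_MW = 150.13   # g/mol, C5H10O5
THEORETICAL_YIELD = (5 * ETHANOL_MW) / (3 * PENTOSE_MW)  # g ethanol / g pentose


def fraction_consumed(start, end):
    """Fraction of the initial sugar that disappeared from the broth."""
    return (start - end) / start


def ethanol_yield(ethanol, start, end):
    """Grams of ethanol produced per gram of sugar consumed."""
    return ethanol / (start - end)


if __name__ == "__main__":
    consumed = fraction_consumed(ARABINOSE_START, ARABINOSE_END)
    observed = ethanol_yield(ETHANOL_END, ARABINOSE_START, ARABINOSE_END)
    print(f"arabinose consumed: {consumed:.1%}")
    print(f"ethanol yield: {observed:.3f} g/g")
    print(f"fraction of theoretical maximum: {observed / THEORETICAL_YIELD:.1%}")
```

The yield is expressed per gram of arabinose consumed rather than per gram supplied, because about a quarter of the sugar remained in the broth at 72 hr.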
Example 5
Identification of AraA Candidates from Cow Rumen and Human Microbiome Databases
[0142] Arabinose isomerase is a slow enzyme and functions in the first step of the bacterial-type arabinose assimilation pathway. To identify new bacterial arabinose isomerase candidates for expression testing in yeast, we used amino acid sequences of the arabinose isomerases from seven bacterial species: Bacillus subtilis (BS; SEQ ID NO:41), Escherichia coli (EC; SEQ ID NO:60), Bacillus licheniformis (BL; SEQ ID NO:61), Clostridium acetobutylicum (CA; SEQ ID NO:62), Leuconostoc mesenteroides (LM; SEQ ID NO:63), Lactobacillus plantarum (LP; SEQ ID NO:64), and Pediococcus pentosaceus (PP; SEQ ID NO:65) as queries in BLAST searches against translated open reading frames of the databases generated from the cow rumen metagenome dataset (Hess et al., Science 331:463-467) and the human microbiome dataset (The Human Microbiome Jumpstart Reference Strains Consortium et al., Science 328: 994-999). The putative open reading frames in the BLAST search results were first filtered by removing those entries with greater than 70% identity to any of the seven query amino acid sequences, and those entries containing one or more ambiguous nucleotides ("N") in the nucleotide sequence. As all seven query arabinose isomerases have an L-arabinose isomerase protein domain (Arabinose_Isome, PFAM identifier PF02610) followed by an L-arabinose isomerase C-terminal protein domain (Arabinose_Iso_C, PFAM identifier PF11762), we further removed any open reading frames in the BLAST search results that did not contain both of these protein domains in the same order. From the remaining search results, nine putative arabinose isomerases from among the sequences with the closest identities to B. subtilis arabinose isomerase were chosen from the cow rumen metagenome dataset (CR), and eleven were chosen from the human microbiome dataset (HM). 
These twenty arabinose isomerase (AI) candidates are listed and their sequence identity comparisons with the seven query arabinose isomerases are given in Table 3.
TABLE-US-00004 TABLE 3 AI Candidates and Their Sequence Comparison with Known AIs.
                   Data  SEQ     Percent Identity to Seven Known AIs
AI Candidate       set   ID NO   BS    EC    BL    CA    LM    LP    PP
HMPREF9412_4417    HM    1       64.9  59.8  69.1  56.7  51.6  51.8  50.6
POTG_01507         HM    2       63.2  59.0  67.5  55.2  52.3  52.2  52.2
HMPREF9374_3716    HM    3       59.6  55.8  63.8  52.7  52.3  51.5  49.7
DORFOR_01282       HM    4       57.6  51.3  58.8  52.4  52.0  51.6  51.9
HMPREF0994_04908   HM    5       56.7  52.8  57.4  49.6  48.6  48.3  48.4
NODE_4061684       CR    6       55.6  49.7  56.4  51.6  48.9  48.6  47.6
NODE_3664377       CR    7       52.0  50.2  56.9  51.6  51.3  49.6  50.0
NODE_458803        CR    8       50.9  48.6  55.8  50.5  50.0  48.3  49.6
NODE_3921064       CR    9       50.9  46.9  52.0  48.2  46.8  45.5  45.5
NODE_3693095       CR    10      50.3  50.0  52.6  51.4  48.0  48.3  47.6
DORLON_00938       HM    11      56.4  51.7  58.4  52.6  51.8  51.6  51.3
HMPREF9469_04726   HM    12      56.3  50.6  58.2  50.0  50.2  48.2  49.3
HMPREF9467_00216   HM    13      55.7  49.8  57.0  50.0  49.3  48.2  48.4
RTO_26010          HM    14      55.3  51.3  57.6  50.2  49.6  50.3  49.3
BRYFOR_08166       HM    15      55.1  49.5  57.4  51.1  49.0  47.8  48.4
RUMOBE_03031       HM    16      54.7  51.6  57.6  50.8  50.8  48.9  49.3
NODE_3658038       CR    17      50.3  50.5  53.1  52.0  49.0  49.1  48.5
NODE_4175755       CR    18      48.3  49.7  50.8  50.6  48.6  49.7  48.6
NODE_2588280       CR    19      48.3  47.6  51.1  49.3  49.9  50.2  49.5
NODE_3735508       CR    20      45.2  44.7  46.7  48.8  42.4  43.1  43.1
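The three filtering criteria described in Example 5 (identity ceiling, no ambiguous nucleotides, and domain order) can be sketched as a small post-processing step. This is a hedged illustration rather than the procedure actually used: the `Hit` record, its field names, and the toy entries are hypothetical, and the sketch assumes that BLAST identities and Pfam domain calls have already been computed by other tools.

```python
from dataclasses import dataclass

# Pfam accessions named in the text, in the required N-to-C order.
REQUIRED_DOMAINS = ("PF02610", "PF11762")


@dataclass
class Hit:
    """Hypothetical record for one putative open reading frame."""
    name: str
    nucleotide_seq: str
    identities: dict   # query label (e.g. "BS") -> percent identity
    domains: list      # Pfam accessions in N-to-C order


def has_domains_in_order(domains, required=REQUIRED_DOMAINS):
    """True if the required accessions occur in `domains` in the given order."""
    remaining = iter(domains)
    # `acc in remaining` advances the iterator, so order is enforced.
    return all(acc in remaining for acc in required)


def passes_filters(hit, max_identity=70.0):
    """Apply the three exclusion criteria described in the text."""
    if any(pid > max_identity for pid in hit.identities.values()):
        return False   # too similar to a known query enzyme
    if "N" in hit.nucleotide_seq.upper():
        return False   # ambiguous nucleotide present
    return has_domains_in_order(hit.domains)


def select_candidates(hits, reference="BS"):
    """Names of surviving hits, ranked by identity to the reference query."""
    kept = [h for h in hits if passes_filters(h)]
    kept.sort(key=lambda h: h.identities[reference], reverse=True)
    return [h.name for h in kept]


if __name__ == "__main__":
    toy_hits = [  # made-up entries for illustration only
        Hit("orf_a", "ATGAAC", {"BS": 64.9, "EC": 59.8}, ["PF02610", "PF11762"]),
        Hit("orf_b", "ATGAAC", {"BS": 85.0, "EC": 60.0}, ["PF02610", "PF11762"]),
        Hit("orf_c", "ATGNAC", {"BS": 55.0, "EC": 50.0}, ["PF02610", "PF11762"]),
        Hit("orf_d", "ATGAAC", {"BS": 52.0, "EC": 50.0}, ["PF11762", "PF02610"]),
    ]
    print(select_candidates(toy_hits))
```

Ranking by identity to the B. subtilis query mirrors the statement that candidates were chosen from among the sequences closest to that enzyme; how many to take from each dataset remains a manual choice.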
Example 6
Plasmid Constructs and Functional Studies of the Arabinose Isomerase Candidates
[0143] Synthetic coding sequences for the twenty arabinose isomerase (AI) candidates identified above were designed and synthesized using codon optimization for expression in S. cerevisiae. They were named araA-1 to araA-20 (SEQ ID NOs:21-40, respectively), corresponding to arabinose isomerase candidates with amino acid sequences of SEQ ID NOs:1-20, respectively. To test these arabinose isomerases, each synthetic coding region was inserted into pSA503 between the last nucleotide of the ADHp fragment and the PacI site in front of the CYC1t fragment, which accurately replaced the BSaraA coding sequence. This resulted in twenty plasmid constructs (from pSA503-1 to pSA503-20) that had sequences identical to pSA503 except for different araA coding regions. All twenty plasmid constructs were transformed into the PXI82-araB strain using the FROZEN-EZ Yeast Transformation II Kit from Zymo Research. Transformants were selected on CM/Gluc/-Ura plates and named PX182-araBAD50301 to PX182-araBAD50320, corresponding to plasmid constructs pSA503-1 to pSA503-20, respectively. As a control, pSA503 was also transformed into PXI82-araB and selected on a CM/Gluc/-Ura plate. Its transformants were named PXI82-araBAD50300. The constructs and transformants are summarized in Table 4.
TABLE-US-00005 TABLE 4 Summary of Twenty Synthetic AraA Nucleotide Sequences and Their Transformants.
                                Synthetic  SEQ     Size of
Transformant        Plasmid     araA       ID NO   araA      Encoded AI
PXI82-araBAD50300   pSA503      BSaraA     42      1491 nt   BSAI
PXI82-araBAD50301   pSA503-1    araA-1     21      1488 nt   HMPREF9412_4417
PXI82-araBAD50302   pSA503-2    araA-2     22      1488 nt   POTG_01507
PXI82-araBAD50303   pSA503-3    araA-3     23      1488 nt   HMPREF9374_3716
PXI82-araBAD50304   pSA503-4    araA-4     24      1500 nt   DORFOR_01282
PXI82-araBAD50305   pSA503-5    araA-5     25      1497 nt   HMPREF0994_04908
PXI82-araBAD50306   pSA503-6    araA-6     26      1497 nt   NODE_4061684
PXI82-araBAD50307   pSA503-7    araA-7     27      1434 nt   NODE_3664377
PXI82-araBAD50308   pSA503-8    araA-8     28      1434 nt   NODE_458803
PXI82-araBAD50309   pSA503-9    araA-9     29      1497 nt   NODE_3921064
PXI82-araBAD50310   pSA503-10   araA-10    30      1464 nt   NODE_3693095
PXI82-araBAD50311   pSA503-11   araA-11    31      1500 nt   DORLON_00938
PXI82-araBAD50312   pSA503-12   araA-12    32      1497 nt   HMPREF9469_04726
PXI82-araBAD50313   pSA503-13   araA-13    33      1497 nt   HMPREF9467_00216
PXI82-araBAD50314   pSA503-14   araA-14    34      1500 nt   RTO_26010
PXI82-araBAD50315   pSA503-15   araA-15    35      1497 nt   BRYFOR_08166
PXI82-araBAD50316   pSA503-16   araA-16    36      1497 nt   RUMOBE_03031
PXI82-araBAD50317   pSA503-17   araA-17    37      1476 nt   NODE_3658038
PXI82-araBAD50318   pSA503-18   araA-18    38      1467 nt   NODE_4175755
PXI82-araBAD50319   pSA503-19   araA-19    39      1467 nt   NODE_2588280
PXI82-araBAD50320   pSA503-20   araA-20    40      1413 nt   NODE_3735508
[0144] To carry out the functional study for these twenty arabinose isomerases, three transformants were picked as replicates from each of the twenty transformations and the control transformation. They were tested for arabinose assimilation and fermentation capacity in the mini-shaking bottle fermentation under micro-aerobic conditions described in Example 4. The fermentation broth used was YPA4 (YP medium containing 40 g/L arabinose). Cultures were grown, and samples were taken and analyzed as described. Fermentation assays for transformants from PX182-araBAD50301 to PX182-araBAD50315 and control transformant PX182-araBAD50300 were carried out for 72 hrs. Fermentations for PX182-araBAD50316 to PX182-araBAD50320 were carried out for only 48 hrs because fermentation went fast for those transformants. Therefore, 48-hr fermentation assays using PXI82-araBAD50300 were also set up as controls. The assay result for each transformant was an average of fermentation data from three replicates. Ethanol production relative to that of PXI82-araBAD50300, which expressed B. subtilis arabinose isomerase, is summarized in Table 5.
TABLE-US-00006 TABLE 5 Functional Assays of Twenty AI Candidates
                               SEQ ID NO    Ethanol Production:
                    Synthetic  of encoded   % of PXI82-
Transformant        araA       protein      araBAD50300
PXI82-araBAD50300   BSaraA     41           100.0%
PXI82-araBAD50301   araA-1     1            0.0%
PXI82-araBAD50302   araA-2     2            0.0%
PXI82-araBAD50303   araA-3     3            0.0%
PXI82-araBAD50304   araA-4     4            39.9%
PXI82-araBAD50305   araA-5     5            65.5%
PXI82-araBAD50306   araA-6     6            17.4%
PXI82-araBAD50307   araA-7     7            110.9%
PXI82-araBAD50308   araA-8     8            97.7%
PXI82-araBAD50309   araA-9     9            68.8%
PXI82-araBAD50310   araA-10    10           105.2%
PXI82-araBAD50311   araA-11    11           39.6%
PXI82-araBAD50312   araA-12    12           0.0%
PXI82-araBAD50313   araA-13    13           43.8%
PXI82-araBAD50314   araA-14    14           0.0%
PXI82-araBAD50315   araA-15    15           94.6%
PXI82-araBAD50316   araA-16    16           0.0%
PXI82-araBAD50317   araA-17    17           109.9%
PXI82-araBAD50318   araA-18    18           152.4%
PXI82-araBAD50319   araA-19    19           139.0%
PXI82-araBAD50320   araA-20    20           96.1%
[0145] The results showed that six arabinose isomerase candidates, which were constructed into pSA503-1, pSA503-2, pSA503-3, pSA503-12, pSA503-14, and pSA503-16, did not support production of ethanol by S. cerevisiae at all. Six arabinose isomerase candidates, which were constructed into pSA503-4, pSA503-5, pSA503-6, pSA503-9, pSA503-11 and pSA503-13, functioned in S. cerevisiae but were 31.2% to 82.6% less effective in supporting ethanol production than B. subtilis arabinose isomerase. The other eight arabinose isomerase candidates, which were constructed into pSA503-7, pSA503-8, pSA503-10, pSA503-15, pSA503-17, pSA503-18, pSA503-19 and pSA503-20, performed similarly to, or better than, B. subtilis arabinose isomerase for ethanol production. It is interesting to note that all of the candidates in this group originated from the cow rumen dataset except for araA-15, which was from the human microbiome dataset. The best candidate was a cow rumen arabinose isomerase encoded by araA-18. It supported ethanol production up to 152.4% of that supported by the B. subtilis arabinose isomerase.
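The grouping of candidates by relative ethanol production can be reproduced directly from the Table 5 values. In the minimal sketch below, the 90% cutoff separating "reduced" from "similar or better" is an assumption chosen for illustration; the text does not state a threshold, and any value between 68.8% and 94.6% gives the same grouping.

```python
# Relative ethanol production (% of the BSaraA control) from Table 5,
# keyed by candidate number (araA-1 to araA-20).
ETHANOL_PCT = {
    1: 0.0, 2: 0.0, 3: 0.0, 4: 39.9, 5: 65.5, 6: 17.4, 7: 110.9,
    8: 97.7, 9: 68.8, 10: 105.2, 11: 39.6, 12: 0.0, 13: 43.8, 14: 0.0,
    15: 94.6, 16: 0.0, 17: 109.9, 18: 152.4, 19: 139.0, 20: 96.1,
}


def classify(pct, similar_cutoff=90.0):
    """Assign one of three groups; `similar_cutoff` is an assumed threshold."""
    if pct == 0.0:
        return "none"
    if pct < similar_cutoff:
        return "reduced"
    return "similar_or_better"


def group_candidates(table, similar_cutoff=90.0):
    """Map each group name to the sorted list of candidate numbers in it."""
    groups = {"none": [], "reduced": [], "similar_or_better": []}
    for candidate in sorted(table):
        groups[classify(table[candidate], similar_cutoff)].append(candidate)
    return groups


if __name__ == "__main__":
    for name, members in group_candidates(ETHANOL_PCT).items():
        print(name, [f"araA-{n}" for n in members])
    reduced = group_candidates(ETHANOL_PCT)["reduced"]
    # Shortfall relative to the control, i.e. 100% minus relative production.
    print(sorted(round(100.0 - ETHANOL_PCT[n], 1) for n in reduced))
```

The last line prints the shortfall of each reduced-activity candidate relative to the control, which is the quantity behind the "31.2% to 82.6% less effective" range.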
Example 7
In Vitro Activity Assay of the Top Arabinose Isomerase Candidates
[0146] Example 6 showed that the top performers in fermentation to produce ethanol included PX182-araBAD50307, PX182-araBAD50308, PX182-araBAD50310, PX182-araBAD50315, PX182-araBAD50317, PX182-araBAD50318, PX182-araBAD50319 and PX182-araBAD50320. To determine whether the arabinose isomerases expressed in these transformants were indeed highly active, the transformants were grown in 25 mL CM/Gluc/-Ura broth (6.7 g/L Sigma yeast nitrogen base without amino acids, 0.77 g/L CLONTECH dropout mix without uracil, 2% glucose) overnight at 32.degree. C. with 200 rpm shaking. At the same time, PX182-araBAD50313 and PX182-araBAD50316 were also grown as representatives of the groups of arabinose isomerases that supported ethanol production at a lower level than the B. subtilis arabinose isomerase or not at all, respectively. PX182-araBAD50300 was grown as a positive control. To prepare total soluble protein extract, overnight-grown cells with an OD.sub.600 value of 100 were collected and washed in 10 mL protein extraction buffer (PEB) (10 mM triethanolamine, pH 8.0, 10 mM MgSO.sub.4, 1 mM DTT, 5% glycerol). Cells were resuspended in 1 mL ice-cold PEB with Roche COMPLETE MINI EDTA-Free proteinase inhibitors (product #: 11836170001) and transferred into a tube containing 0.5-mm soda lime glass beads (BioSpec Products, Inc.). Total protein was extracted by beating cells in the tube using a BIO101 FP120 FASTPREP at setting 6 for 30 sec. Beating was repeated six times. Between beatings, the tube was cooled on ice for 2 min. Finally, protein extract was obtained by centrifugation and the protein concentration was determined using Coomassie Protein Assay Reagent (Thermo Scientific).
[0147] Arabinose isomerase activity in each protein extract was measured by the cysteine-carbazole method (Dische and Borenfreund, J. Biol. Chem. 192:583-587). First, a 100-.mu.L assay reaction was assembled to include 10 mM MgSO.sub.4, 10 mM triethanolamine (pH 8.0), 50 mM arabinose, and an appropriate amount (2-10 .mu.L) of protein extract. The reaction was incubated at 32.degree. C. for 10 min and then stopped by adding 3 mL ice-cold 75% H.sub.2SO.sub.4. Ribulose produced in the assay was quantified in a color reaction by adding 100 .mu.L of 2.4% cysteine hydrochloride and 100 .mu.L of 0.12% carbazole ethanolic solution. After incubating at room temperature for 6 min, the OD.sub.540 value was measured using a spectrophotometer. Ribulose concentration was determined by comparing the OD.sub.540 value with a standard curve of a ribulose color reaction. A unit of arabinose isomerase was defined as the amount of enzyme required to produce one micromole of ribulose in 10 minutes of incubation.
[0148] Arabinose isomerase activity of each protein extract was determined by the abovementioned cysteine-carbazole method. Each assay was run in triplicate and the results were averaged. A blank assay without protein extract was set up and deducted from all assay results. Specific activity of a protein extract was calculated based on its activity unit and protein concentration, expressed as unit per milligram of protein per min. FIG. 7 shows arabinose isomerase activities in the protein extracts. The results indicated that (1) protein extract of PX182-araBAD50316 did not have detectable arabinose isomerase activity, which was consistent with its functional assay (Table 5); (2) protein extract of PX182-araBAD50313 presented an activity approximately 2.4-fold higher than that of PXI82-araBAD50300, even though in the functional assay the ethanol productivity of PX182-araBAD50313 was only 43.8% of PXI82-araBAD50300 (Table 5); (3) other protein extracts had activities up to about 20-fold higher than that of PXI82-araBAD50300, but the maximal ethanol productivity was about 152% of PXI82-araBAD50300 (Table 5). Thus ethanol production did not always directly correlate with the in vitro enzyme activity result, perhaps due to the different conditions of the intracellular environment. Good arabinose isomerase candidates should perform well not only in the in vitro activity assay but also in the functional assay. Therefore, both the in vitro activity assay and the fermentation functional assay confirmed that the arabinose isomerase candidates expressed in PX182-araBAD50307, PX182-araBAD50308, PX182-araBAD50310, PX182-araBAD50315, PX182-araBAD50317, PX182-araBAD50318, PX182-araBAD50319, and PX182-araBAD50320 were the enzymes that performed as well as or better than B. subtilis arabinose isomerase in S. cerevisiae.
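The activity calculation described above (triplicate average, blank subtraction, conversion through a ribulose standard curve, and normalization to protein) can be sketched as follows. This is an illustration under stated assumptions: the standard curve is treated as linear through the origin with a hypothetical slope, and all numeric inputs are made up rather than taken from FIG. 7. The result is expressed in units per milligram of protein, where one unit is one micromole of ribulose formed in the 10-min incubation.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def activity_units(od540_replicates, od540_blank, slope_od_per_umol):
    """Units in the assay: micromoles of ribulose formed in 10 min.

    Assumes a linear standard curve through the origin, so that
    micromoles = (mean OD540 - blank OD540) / slope.
    """
    net_od = mean(od540_replicates) - od540_blank
    return net_od / slope_od_per_umol


def specific_activity(units, extract_volume_ul, protein_mg_per_ml):
    """Units per milligram of protein added to the assay."""
    protein_mg = (extract_volume_ul / 1000.0) * protein_mg_per_ml
    return units / protein_mg


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    units = activity_units([0.52, 0.50, 0.48], od540_blank=0.10,
                           slope_od_per_umol=2.0)
    print(specific_activity(units, extract_volume_ul=5.0,
                            protein_mg_per_ml=4.0))
```

Extract volumes in the 2-10 .mu.L range would normally be chosen so that the net OD.sub.540 falls within the linear part of the standard curve.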
Sequence CWU
1
1
711495PRTunknownsequence from Human Microbiome dataset 1Met Asn Leu Lys
Pro His Thr Phe Trp Phe Val Thr Gly Ser Gln His1 5
10 15Leu Tyr Gly Pro Glu Thr Leu Glu Gln Val
Ala Glu His Ser Arg Ile 20 25
30Val Ala Thr Glu Phe Asp Lys Asp Pro Val Phe Thr Tyr Pro Ile Val
35 40 45Phe Lys Pro Ile Val Thr Thr Pro
Asp Glu Ile Tyr Lys Leu Ile Leu 50 55
60Glu Ala Asn Asn Asp Glu Ser Cys Ala Gly Ile Met Thr Trp Met His65
70 75 80Thr Phe Ser Pro Ala
Lys Met Trp Ile Ala Gly Leu Ser Gln Leu Gln 85
90 95Lys Pro Leu Leu His Phe His Thr Gln Phe Asn
Arg Asp Ile Pro Trp 100 105
110Glu Thr Ile Asp Met Asp Phe Met Asn Leu Asn Gln Ser Ala His Gly
115 120 125Asp Arg Glu Tyr Gly His Ile
Gly Ala Arg Leu Gly Ile Ala Arg Lys 130 135
140Val Val Val Gly His Trp Glu Asp Gly Glu Val Arg Gly Ser Ile
Ala145 150 155 160Gly Trp
Met Arg Thr Ala Ala Ala Tyr Ala Glu Ser Arg Arg Leu Lys
165 170 175Val Ala Arg Phe Gly Asp Asn
Met Arg Gln Val Ala Val Thr Glu Gly 180 185
190Asp Lys Val Glu Ala Gln Ile Lys Leu Gly Trp Ser Val Asn
Gly Tyr 195 200 205Gly Ile Gly Asp
Leu Val Gln Ser Met Asn Glu Val Gly Asp Glu Glu 210
215 220Val Lys Ala Leu Leu Asn Glu Tyr Ala Glu Ser Tyr
Ser Ile Thr Lys225 230 235
240Glu Gly Leu Ser Asp Gly Pro Val Arg Asp Ser Ile Ala Tyr Gln Ala
245 250 255Arg Ile Glu Ile Ala
Leu Arg Arg Phe Leu Glu Glu Gly Gly Phe Gly 260
265 270Ala Phe Thr Thr Thr Phe Glu Asp Leu His Gly Met
Lys Gln Leu Pro 275 280 285Gly Leu
Ala Val Gln Arg Leu Met Glu Ser Gly Tyr Gly Phe Gly Gly 290
295 300Glu Gly Asp Trp Lys Thr Ala Ala Leu Thr Arg
Val Leu Lys Val Leu305 310 315
320Ala Asp Asn Lys Ser Thr Ser Phe Met Glu Asp Tyr Thr Tyr His Phe
325 330 335Glu Pro Gly Asn
His Met Ile Leu Gly Ser His Met Leu Glu Val Cys 340
345 350Pro Thr Ile Ala Leu Asp Lys Pro Thr Leu Glu
Val His Pro Leu Gly 355 360 365Ile
Gly Gly Lys Gly Asp Pro Ala Arg Leu Val Phe Asn Gly Gln Asp 370
375 380Gly Pro Ala Val Asn Ala Ser Leu Ile Asp
Leu Gly His Arg Phe Arg385 390 395
400Leu Leu Val Asn Val Val Asp Gly Val Lys Val Glu Gln Pro Met
Pro 405 410 415Lys Leu Pro
Val Ala Arg Val Leu Trp Lys Pro Gln Pro Ser Leu Arg 420
425 430Glu Ser Ala Glu Ala Trp Ile Leu Ala Gly
Gly Ala His His Thr Val 435 440
445Leu Ser Tyr Ala Met Thr Ala Glu His Leu Ser Asp Trp Ala Glu Met 450
455 460Thr Gly Ile Glu Ala Val Val Ile
Asp Lys Asp Thr Thr Ile Pro Arg465 470
475 480Phe Lys Asn Glu Leu Arg Trp Ser Glu Ala Ala Tyr
Arg Leu Arg 485 490
4952495PRTunknownsequence from Human Microbiome dataset 2Met Lys Leu Lys
Pro His Ser Phe Trp Phe Val Thr Gly Ser Gln His1 5
10 15Leu Tyr Gly Pro Glu Thr Leu Glu Glu Val
Ala Gly His Ser Arg Ile 20 25
30Ile Ala Glu Gln Leu Asp Lys Asp Pro Ala Ile Gly Phe Pro Val Val
35 40 45Phe Lys Pro Ile Val Thr Thr Pro
Asp Glu Ile Tyr Lys Leu Ile Leu 50 55
60Ala Ala Asn Gly Asp Glu Thr Cys Ala Gly Ile Ile Thr Trp Met His65
70 75 80Thr Phe Ser Pro Ala
Lys Met Trp Ile Ala Gly Leu Ser Gln Leu Gln 85
90 95Lys Pro Leu Leu His Phe His Thr Gln Phe Asn
Arg Asp Ile Pro Trp 100 105
110Glu Thr Ile Asp Met Asp Phe Met Asn Leu Asn Gln Ser Ala His Gly
115 120 125Asp Arg Glu Tyr Gly His Ile
Gly Ala Arg Leu Gly Ile Asn Arg Lys 130 135
140Ile Val Val Gly His Trp Glu Asp Glu Glu Val Arg Ala Ser Leu
Ala145 150 155 160Gly Trp
Met Arg Thr Ala Val Ala Tyr Ala Glu Ser Arg Gln Leu Lys
165 170 175Val Ala Arg Phe Gly Asp Asn
Met Arg Glu Val Ala Val Thr Glu Gly 180 185
190Asp Lys Val Glu Ala Gln Ile Lys Phe Gly Trp Ser Val Asn
Gly Tyr 195 200 205Gly Val Gly Asp
Leu Val Gln Val Leu Asn Glu Val Thr Asp Ala Glu 210
215 220Ala Glu Ala Leu Leu Lys Glu Tyr Ala Glu Gln Tyr
Thr Ile Thr Gln225 230 235
240Ala Gly Leu Ser Ser Gly Pro Ile Arg Asp Ser Ile Ala Tyr Gln Ala
245 250 255Lys Leu Glu Ile Ala
Met Lys Arg Phe Leu Glu Gln Gly Gly Phe Gly 260
265 270Ala Phe Thr Thr Thr Phe Glu Asp Leu His Gly Leu
Lys Gln Leu Pro 275 280 285Gly Leu
Ala Val Gln Arg Leu Met Glu Ala Gly Tyr Gly Phe Gly Gly 290
295 300Glu Gly Asp Trp Lys Thr Ala Ala Leu Thr Arg
Val Leu Lys Val Leu305 310 315
320Ala Asn Asn Lys Ser Thr Ser Phe Met Glu Asp Tyr Thr Tyr His Phe
325 330 335Glu Pro Gly Asn
His Met Ile Leu Gly Ala His Met Leu Glu Val Cys 340
345 350Pro Thr Ile Ala Ala Thr Lys Pro Thr Ile Glu
Val His Pro Leu Gly 355 360 365Ile
Gly Gly Lys Ala Asp Pro Ala Arg Met Val Phe Asp Gly Gln Ala 370
375 380Gly Pro Ala Val Asn Ala Ser Leu Val Asp
Leu Gly His Arg Phe Arg385 390 395
400Leu Leu Val Asn Val Val Asp Gly Val Lys Val Glu Lys Pro Met
Pro 405 410 415Lys Leu Pro
Val Ala Arg Val Leu Trp Lys Pro Gln Pro Ser Leu Arg 420
425 430Glu Ser Ala Glu Ala Trp Ile Leu Ala Gly
Gly Ala His His Thr Val 435 440
445Leu Ser Tyr Ala Ile Thr Ala Glu Asn Leu Ser Asp Trp Ala Glu Met 450
455 460Val Gly Ile Glu Ala Val Ile Ile
Asp Lys Asp Thr Ser Val Pro Arg465 470
475 480Phe Lys Asn Glu Leu Arg Trp Ser Asp Ala Ala Tyr
Arg Leu Arg 485 490
4953495PRTunknownsequence from Human Microbiome dataset 3Met Gln Arg Thr
Pro Tyr Glu Phe Trp Phe Val Thr Gly Ser Gln His1 5
10 15Leu Tyr Gly Ser Glu Ala Leu Ala Glu Val
Ser Ser His Ser Arg Gln 20 25
30Ile Thr Gln Ala Phe Asn Glu Ala Asp Ser Ile Ser Phe Pro Ile Val
35 40 45Val Lys Pro Val Val Lys Thr Pro
Glu Glu Ile Leu Gln Leu Cys Met 50 55
60Glu Ala Asn Ser Asp Glu Asn Cys Ala Gly Leu Ile Thr Trp Met His65
70 75 80Thr Phe Ser Pro Gly
Lys Met Trp Ile Gly Gly Leu Ser Gln Leu His 85
90 95Lys Pro Leu Leu His Phe His Thr Gln Phe His
Arg Glu Ile Pro Trp 100 105
110Asp Arg Ile Asp Met Asp Phe Met Asn Leu His Gln Ser Ala His Gly
115 120 125Asp Arg Glu Phe Gly Phe Ile
Ala Thr Arg Leu Gly Ile Leu Arg Lys 130 135
140Glu Val Val Gly His Trp Arg Asp Glu Ala Val Gln Lys Arg Leu
Ser145 150 155 160Asp Trp
Met Arg Thr Ala Ile Ala Cys Leu Glu Gly Lys Lys Leu Lys
165 170 175Val Ala Arg Phe Gly Asp Asn
Met Arg Arg Val Ala Val Thr Glu Gly 180 185
190Asp Lys Val Glu Ala Gln Ile Gln Phe Gly Trp Ser Ile Asn
Gly Tyr 195 200 205Gly Val Gly Asp
Leu Val Gln Arg Ile Thr Asp Ile Ser Asp Thr Ala 210
215 220Val His Gln Leu Phe Arg Glu Tyr Gln Glu Arg Tyr
Asp Phe Pro Pro225 230 235
240Glu Ala Arg Glu Ala Gly Pro Ile Arg Asp Ser Ile Leu Glu Gln Ala
245 250 255Arg Ile Glu Leu Gly
Leu Lys Leu Phe Leu Arg Glu Gly Gly Tyr Ser 260
265 270Ala Phe Thr Thr Thr Phe Glu Asp Leu His Gly Leu
Lys Gln Leu Pro 275 280 285Gly Leu
Ala Val Gln Arg Leu Met Ser Glu Gly Tyr Gly Phe Gly Ala 290
295 300Glu Gly Asp Trp Arg Thr Ala Gly Leu Leu Arg
Met Met Lys Ile Met305 310 315
320Ala Asp Asn Glu Gly Thr Ser Phe Met Glu Asp Tyr Thr Tyr His Leu
325 330 335Glu Pro Gly Asn
Glu Met Ile Leu Gly Ala His Met Leu Glu Val Cys 340
345 350Pro Thr Ile Ala Ala Gln Arg Pro Gly Ile Arg
Val His Pro Leu Ser 355 360 365Ile
Gly Gly Lys Ala Asp Pro Ala Arg Leu Val Phe Asp Gly Arg Pro 370
375 380Gly Pro Ala Leu Asn Val Ser Leu Ile Asp
Leu Gly Asn Arg Phe Arg385 390 395
400Leu Leu Ile Asn Lys Val Asp Ala Val His Pro Lys Ser Ala Met
Pro 405 410 415His Leu Pro
Val Ala Arg Val Leu Trp Lys Pro Arg Pro Ser Leu His 420
425 430Asp Ser Ala Glu Ala Trp Met Tyr Ala Gly
Gly Ala His His Thr Val 435 440
445Phe Ser Tyr His Val Thr Thr Glu Gln Leu Leu Asp Trp Ala Glu Trp 450
455 460Val Asp Met Glu Ala Leu Val Ile
Asp Glu Gln Thr Ser Leu Ser Ser465 470
475 480Phe Arg Arg Gln Leu Lys Trp Asn Asp Ala Tyr Tyr
Arg Ile Arg 485 490
4954499PRTunknownsequence from Human Microbiome dataset 4Met Leu Lys Thr
Lys Asn Tyr Gln Phe Trp Phe Cys Thr Gly Ser Gln1 5
10 15Asp Leu Tyr Gly Asp Glu Cys Leu Ala His
Val Ala Glu His Ala Lys 20 25
30Lys Ile Val Glu Ala Leu Asn Ala Ser Gly Asn Leu Pro Tyr Glu Val
35 40 45Val Trp Lys Pro Thr Leu Ile Thr
Asn Glu Leu Ile Arg Arg Thr Phe 50 55
60Asn Glu Ala Asn Thr Asp Glu Asn Cys Ala Gly Val Ile Thr Trp Met65
70 75 80His Thr Phe Ser Pro
Ala Lys Ser Trp Ile Leu Gly Leu Gln Glu Phe 85
90 95Arg Lys Pro Leu Leu His Leu His Thr Gln Phe
Asn Arg Glu Ile Pro 100 105
110Tyr Asp Thr Ile Asp Met Asp Phe Met Asn Glu Asn Gln Ser Ala His
115 120 125Gly Asp Arg Glu Phe Gly His
Ile Phe Ser Arg Leu His Met Asn Arg 130 135
140Lys Val Val Val Gly Tyr Trp Ala Asp Glu Asp Val Gln Lys Gln
Ile145 150 155 160Gly Ser
Trp Met Arg Thr Ala Val Gly Val Val Glu Ser Ser His Ile
165 170 175Arg Val Met Arg Ile Ala Asp
Asn Met Arg Asn Val Ala Val Thr Glu 180 185
190Gly Asp Lys Val Glu Ala Gln Ile Lys Phe Gly Trp Glu Val
Asp Ala 195 200 205Tyr Pro Val Asn
Glu Ala Val Glu Ala Val Asn Ala Val Ser Gln Ala 210
215 220Asp Ile Asp Thr Leu Val Glu Glu Tyr Tyr Asp Lys
Tyr Glu Ile Leu225 230 235
240Leu Glu Gly Arg Asp Glu Lys Glu Phe Arg Arg His Val Ala Val Gln
245 250 255Ala Gly Ile Glu Ile
Gly Leu Glu Arg Phe Leu Glu Glu Asn Asn Tyr 260
265 270Gln Ala Ile Val Thr His Phe Gly Asp Leu Gly Gly
Phe Lys Gln Leu 275 280 285Pro Gly
Leu Ala Met Gln Arg Leu Met Glu Lys Gly Tyr Gly Phe Gly 290
295 300Ala Glu Gly Asp Trp Lys Thr Ala Ala Met Val
Arg Leu Met Lys Ile305 310 315
320Met Thr Gly Gly Met Lys Asp Ala Lys Gly Thr Ser Phe Met Glu Asp
325 330 335Tyr Thr Tyr Asn
Leu Val Pro Gly Lys Glu Gly Ile Leu Glu Ala His 340
345 350Met Leu Glu Val Cys Pro Thr Ile Ala Asp Gly
Lys Ile Ser Ile Lys 355 360 365Glu
Gln Pro Leu Ser Met Gly Asp Arg Glu Asp Pro Ala Arg Leu Val 370
375 380Phe Thr Ala Lys Glu Gly Pro Ala Ile Ala
Ala Ser Leu Ile Asp Leu385 390 395
400Gly Asp Arg Phe Arg Leu Leu Ile Asn Glu Val Glu Cys Lys Lys
Thr 405 410 415Glu Lys Pro
Met Pro Lys Leu Pro Val Ala Thr Ala Phe Trp Thr Pro 420
425 430Lys Pro Asn Leu Lys Ile Gly Ala Gln Ser
Trp Ile Leu Ala Gly Gly 435 440
445Ala His His Thr Ala Phe Ser Tyr Asp Leu Ser Ala Glu Gln Met Gly 450
455 460Asp Trp Ala Glu Ala Met Gly Ile
Glu Ala Val Tyr Ile Asp Ala Asp465 470
475 480Thr Thr Ile Arg Gln Leu Lys Asn Glu Leu Arg Trp
Asn Glu Leu Ala 485 490
495Tyr Arg Arg5498PRTunknownsequence from Human Microbiome dataset 5Met
Lys Thr Gly Arg Asp Tyr Lys Phe Trp Phe Cys Thr Gly Ser Gln1
5 10 15Asp Leu Tyr Gly Glu Glu Cys
Leu Arg Lys Val Ala Glu His Ser Ala 20 25
30Lys Ile Val Glu Gly Leu Asn Ala Ser Gly Arg Leu Pro Phe
Glu Val 35 40 45Val Leu Lys Pro
Thr Leu Ile Asp Pro Ala Thr Ile Arg Arg Thr Leu 50 55
60Asn Glu Ala Asn Glu Asp Gly Glu Cys Ala Gly Val Ile
Thr Trp Met65 70 75
80His Thr Phe Ser Pro Ala Lys Met Trp Ile Leu Gly Leu Lys Glu Tyr
85 90 95Arg Lys Pro Leu Cys His
Leu His Thr Gln Phe Asn Glu Glu Ile Pro 100
105 110Tyr Asp Thr Ile Asp Met Asp Phe Met Asn Glu Asn
Gln Ser Ala His 115 120 125Gly Asp
Arg Glu Phe Gly His Met Val Ser Arg Met Gly Met Glu Arg 130
135 140Lys Ile Ile Val Gly His Trp Ala Asn Ala Glu
Val Gln Glu Lys Ile145 150 155
160Gly Ser Trp Met Arg Thr Ala Ile Gly Ile Met Glu Ser Ser His Ile
165 170 175Arg Val Cys Arg
Ile Gly Asp Asn Met Asn Asn Val Ala Val Thr Glu 180
185 190Gly Asp Lys Val Glu Ala Glu Val Lys Phe Gly
Trp Glu Ile Asp His 195 200 205Tyr
Cys Val Asn Asp Ala Val Glu Tyr Val Asn Ala Val Ser Glu Gly 210
215 220Asp Val Asn Ala Leu Val Glu Glu Tyr Tyr
Ser Lys Tyr Gln Ile Leu225 230 235
240Leu Glu Gly Arg Asp Pro Glu Glu Phe Arg Ala His Val Ala Ala
Gln 245 250 255Ala Lys Ile
Glu Ile Gly Leu Glu Lys Phe Leu Glu Asp Gly Asp Tyr 260
265 270His Ala Ile Val Thr His Phe Gly Met Leu
Gly Gly Leu Gln Gln Leu 275 280
285Pro Gly Leu Ala Ile Gln Arg Leu Met Glu Lys Gly Tyr Gly Phe Gly 290
295 300Gly Glu Gly Asp Trp Lys Thr Ala
Ala Met Val Arg Leu Met Lys Ile305 310
315 320Met Ala Ala Gly Val Pro Gly Ala Lys Gly Thr Ser
Phe Met Glu Asp 325 330
335Tyr Thr Tyr Asn Leu Val Pro Gly Lys Glu Gly Ile Leu Gln Ala His
340 345 350Met Leu Glu Val Cys Pro
Ser Ile Ala Glu Gly Pro Ile Ser Ile Lys 355 360
365Val Gln Pro Leu Ser Met Gly Asn Arg Glu Asp Pro Ala Arg
Leu Val 370 375 380Phe Thr Ser Lys Thr
Gly Pro Ala Val Ala Thr Ser Leu Val Asp Leu385 390
395 400Gly Asn Arg Phe Arg Leu Ile Ile Asn Ala
Val Asp Cys Lys Lys Cys 405 410
415Glu Lys Glu Met Pro Lys Leu Pro Val Ala Thr Ala Phe Trp Thr Pro
420 425 430Gln Pro Asp Leu Ala
Thr Gly Ala Gln Ala Trp Ile Leu Ala Gly Gly 435
440 445Ala His His Thr Ala Phe Ser Tyr Asp Leu Thr Val
Asp Gln Met Val 450 455 460Asp Trp Ala
Ala Ala Met Gly Ile Glu Ser Val Val Ile Asp Lys Asp465
470 475 480Thr Thr Ile Arg Asn Phe Lys
Asn Glu Leu Arg Trp Asn Ser Ile Tyr 485
490 495Tyr Arg6498PRTunknownsequence from cow rumen
metagenome dataset 6Met Ile Gln Thr Lys Ala Tyr Lys Phe Trp Phe Cys Thr
Gly Ser Gln1 5 10 15Asp
Leu Tyr Gly Asp Glu Val Leu Arg His Val Ala Asp His Ser Lys 20
25 30Glu Ile Val Glu Glu Leu Asn Lys
Ser Gly Ile Leu Pro Tyr Glu Val 35 40
45Val Trp Lys Pro Val Leu Ile Thr Asn Gln Leu Ile Arg Gln Thr Phe
50 55 60Asn Glu Ala Asn Ala Asp Asp Ser
Cys Ala Gly Val Ile Thr Trp Met65 70 75
80His Thr Phe Ser Pro Ala Lys Ser Trp Ile Leu Gly Leu
Gln Glu Phe 85 90 95Arg
Lys Pro Leu Leu His Leu His Thr Gln Tyr Asn Glu Glu Ile Pro
100 105 110Tyr Asp Thr Ile Asp Met Asp
Phe Met Asn Glu Asn Gln Ala Ala His 115 120
125Gly Asp Arg Glu Tyr Gly His Ile Val Ser Arg Met Gly Ile Glu
Arg 130 135 140Lys Val Ile Ala Gly Tyr
Trp Lys Asp Asn Glu Val Arg Ser Arg Ile145 150
155 160Ala Ser Trp Met Arg Thr Ala Val Gly Val Met
Glu Ser Ser His Ile 165 170
175Arg Val Met Arg Val Ala Asp Asn Met Arg Asn Val Ala Val Thr Glu
180 185 190Gly Asp Lys Val Glu Ala
Gln Ile Lys Phe Gly Trp Glu Val Asp Thr 195 200
205Tyr Pro Val Asn Glu Ile Ala Asp Ser Val Ala Thr Val Ser
Ala Ser 210 215 220Asp Val Asn Ala Leu
Leu Asp Glu Tyr Tyr Asp Lys Tyr Glu Ile Ile225 230
235 240Leu Asp Gly Arg Asp Pro Asp Glu Phe Lys
Lys His Val Ala Val Gln 245 250
255Ala Gln Ile Glu Leu Gly Phe Glu Arg Phe Leu Glu Glu Lys Asn Tyr
260 265 270Gln Ala Ile Val Thr
His Phe Gly Asp Leu Gly Ala Leu Gly Gln Leu 275
280 285Pro Gly Leu Ala Ile Gln Arg Leu Met Glu Lys Gly
Tyr Gly Phe Gly 290 295 300Ala Glu Gly
Asp Trp Lys Val Ala Ala Met Val Arg Leu Met Lys Ile305
310 315 320Met Thr Ser Gly Met Lys Asp
Ala Lys Gly Thr Ser Met Leu Glu Asp 325
330 335Tyr Thr Tyr Asn Leu Val Arg Gly Lys Glu Gly Ile
Leu Glu Ala His 340 345 350Met
Leu Glu Ile Cys Pro Thr Ile Ala Asp Gly Pro Ile Ser Ile Arg 355
360 365Val Lys Pro Leu Ser Met Gly Asp Arg
Glu Asp Pro Ala Arg Leu Val 370 375
380Phe Thr Ser Lys Glu Gly Lys Gly Val Ala Thr Ser Leu Ile Asp Leu385
390 395 400Gly Asn Arg Phe
Arg Leu Ile Ile Asn Glu Val Glu Cys Lys Lys Thr 405
410 415Glu Lys Pro Met Pro Asn Leu Pro Val Ala
Thr Ala Tyr Trp Thr Pro 420 425
430Tyr Pro Asp Leu Tyr Thr Gly Ala Glu Ala Trp Ile Leu Ala Gly Gly
435 440 445Ala His His Thr Ala Phe Ser
Tyr Asp Leu Thr Ser Gly Gln Met Ala 450 455
460Asp Trp Ala Glu Met Met Gly Ile Glu Ala Val Ile Ile Asp Lys
Asn465 470 475 480Thr Thr
Ile Pro Ala Phe Lys Lys Glu Leu Lys Leu Gly Asp Val Phe
485 490 495Tyr Arg7477PRTunknownsequence
from cow rumen metagenome dataset 7Met Lys Phe Trp Phe Val Thr Gly Ser
Gln Phe Leu Tyr Gly Glu Glu1 5 10
15Thr Leu Arg Gln Val Glu Glu Asp Ser Lys Lys Ile Val Asp Gly
Leu 20 25 30Arg Leu Pro Phe
Pro Val Glu Tyr Lys Leu Thr Val Lys Thr Glu Ser 35
40 45Glu Ile Glu Arg Ile Val Lys Glu Ala Asn Tyr Asp
Asp Glu Cys Ala 50 55 60Gly Ile Ile
Thr Phe Cys His Thr Phe Ser Pro Ser Lys Met Trp Ile65 70
75 80Asn Gly Leu Ala Leu Leu Gln Lys
Pro Trp Leu His Phe His Thr Gln 85 90
95Phe Asn Glu Thr Ile Pro Asn Glu Ala Ile Asp Met Asp Tyr
Met Asn 100 105 110Leu His Gln
Ser Ala His Gly Asp Arg Glu His Gly Phe Ile Gly Ala 115
120 125Arg Leu Arg Val Pro Arg Ala Val Val Ala Gly
Tyr Trp Lys Asp Pro 130 135 140Ala Val
Gln Ala Lys Ile Gly Glu Trp Gln Arg Ala Ala Val Gly Val145
150 155 160Met Phe Ser Arg Ser Leu Lys
Ile Val Arg Phe Gly Asp Asn Met Arg 165
170 175Glu Val Ala Val Thr Glu Gly Asp Lys Ile Glu Ala
Gln Leu Arg Leu 180 185 190Gly
Trp Gln Val Asn Thr Phe Ala Val Gly Asp Leu Val Glu Tyr Met 195
200 205Asp Ala Val Thr Asp Ala Glu Ile Asp
Ala Leu Met Lys Glu Tyr Ala 210 215
220Glu Leu Tyr Glu Phe Ser Glu Ala Asp Thr Asp Thr Ile Arg Tyr Gln225
230 235 240Ala Arg Glu Glu
Ile Ala Ile Glu Lys Ile Leu Val Arg Glu Gly Ala 245
250 255Lys Ala Phe Ser Asn Thr Phe Glu Asp Leu
His Gly Met Lys Gln Leu 260 265
270Pro Gly Leu Ala Thr Gln His Leu Met His Lys Gly Tyr Gly Phe Gly
275 280 285Ala Glu Gly Asp Trp Lys Thr
Ala Gly Met Thr Ala Ile Val Lys Ala 290 295
300Met Tyr Pro Asp Gly Asn Thr Ser Phe Met Glu Asp Tyr Thr Tyr
Asp305 310 315 320Tyr Glu
Arg Gln Leu Ile Leu Gly Ser His Met Leu Glu Val Cys Pro
325 330 335Ser Ile Ala Ala Asp Arg Pro
Arg Ile Glu Val His Lys Leu Gly Ile 340 345
350Gly Gly Lys Asp Ala Pro Ala Arg Ile Val Phe Glu Gly Arg
Ala Gly 355 360 365Ser Ala Lys Val
Leu Ser Leu Ile Asp Ile Gly Gly Arg Phe Arg Leu 370
375 380Ile Gln Gln Asp Ile Glu Cys Glu Lys Pro Phe Gln
Ser Met Pro Asn385 390 395
400Leu Pro Val Ala Arg Thr Met Trp Arg Pro Ala Pro Ser Phe Leu Glu
405 410 415Gly Leu Glu Cys Trp
Ile Ile Ala Gly Gly Ala His His Thr Val Leu 420
425 430Ser Tyr Asp Ile Thr Asp Glu Thr Val Arg Asp Phe
Ala Arg Ile Met 435 440 445Gly Ile
Glu Leu Val Val Ile Asn Lys Asp Thr Thr Lys Glu Lys Leu 450
455 460Glu Arg Asp Ile Met Ile Gly Asp Val Ile Tyr
Gly Arg465 470 4758477PRTunknownsequence
from cow rumen metagenome dataset 8Met Lys Phe Trp Phe Ile Thr Gly Ser
Gln Phe Leu Tyr Gly Glu Glu1 5 10
15Thr Ile Arg Gln Val Glu Glu Asp Ser Lys Lys Ile Val Asp Gly
Leu 20 25 30Lys Leu Pro Phe
Pro Val Glu Tyr Lys Leu Thr Val Lys Lys Glu Ser 35
40 45Glu Ile Glu Arg Ile Val Lys Glu Ala Asn Phe Asp
Asp Glu Cys Ala 50 55 60Gly Ile Ile
Thr Phe Cys His Thr Phe Ser Pro Ser Lys Met Trp Ile65 70
75 80Asn Gly Leu Ala Ile Leu Gln Lys
Pro Trp Leu His Phe His Thr Gln 85 90
95Phe Asn Glu Thr Ile Pro Asn Glu Ala Ile Asp Met Ala Tyr
Met Asn 100 105 110Leu His Gln
Ser Ala His Gly Asp Arg Glu His Gly Phe Ile Gly Ala 115
120 125Arg Leu Arg Met Pro Arg Ala Val Val Ala Gly
Tyr Trp Lys Asp Pro 130 135 140Glu Val
Gln Ala Lys Ile Ala Glu Trp Gln Arg Ala Ala Val Gly Val145
150 155 160Met Phe Ser Lys Ser Leu Lys
Ile Val Arg Phe Gly Asp Asn Met Arg 165
170 175Glu Val Ala Val Thr Glu Gly Asp Lys Ile Glu Ala
Gln Leu Lys Leu 180 185 190Gly
Trp Gln Val Asn Thr Phe Ala Val Gly Asp Leu Val Glu Tyr Met 195
200 205Asn Ala Val Thr Asp Ala Glu Ile Asp
Val Leu Met Lys Glu Tyr Ala 210 215
220Glu Leu Tyr Asp Tyr Asp Lys Ala Asp Glu Glu Thr Ile Arg Tyr Gln225
230 235 240Ala Arg Glu Glu
Ile Ala Ile Glu Lys Ile Leu Val Arg Glu Gly Ala 245
250 255Lys Ala Phe Ser Asn Thr Phe Glu Asp Leu
His Gly Met Gln Gln Leu 260 265
270Pro Gly Leu Ala Thr Gln His Leu Met His Lys Gly Tyr Gly Phe Gly
275 280 285Ala Glu Gly Asp Trp Lys Thr
Ala Gly Met Thr Ala Ile Val Lys Ala 290 295
300Met Tyr Pro Asp Gly Asn Thr Ser Phe Met Glu Asp Tyr Thr Tyr
Asp305 310 315 320Tyr Glu
Arg Lys Leu Ile Leu Gly Ser His Met Leu Glu Val Cys Pro
325 330 335Ser Ile Ala Ala Asp Arg Pro
Arg Ile Glu Val His Pro Leu Gly Ile 340 345
350Gly Gly Lys Glu Pro Pro Ala Arg Ile Val Phe Glu Gly Lys
Ala Gly 355 360 365Ser Ala Lys Val
Leu Ser Leu Ile Asp Ile Gly Gly Arg Leu Arg Leu 370
375 380Ile Gln Gln Asp Ile Glu Cys Glu Lys Pro Phe Gln
Ser Met Pro Asn385 390 395
400Leu Pro Val Ala Arg Thr Met Trp Arg Pro Ala Pro Ser Phe Leu Glu
405 410 415Gly Leu Glu Cys Trp
Ile Ile Ala Gly Gly Ala His His Thr Val Leu 420
425 430Ser Tyr Asp Ile Ser Asp Glu Thr Val Arg Asp Phe
Ala Arg Ile Met 435 440 445Gly Ile
Glu Leu Val Val Ile Asn Lys Asp Thr Thr Lys Glu Lys Leu 450
455 460Glu Arg Asp Ile Met Ile Gly Asp Met Ile Tyr
Gly Arg465 470 4759498PRTunknownsequence
from cow rumen metagenome dataset 9Met Ser Glu Met Lys Lys Tyr Gln Phe
Trp Phe Cys Thr Gly Ser Gln1 5 10
15Asp Leu Tyr Gly Asp Glu Cys Leu Ala His Val Ala Ala His Ser
Lys 20 25 30Glu Met Val Glu
Gly Leu Asn Lys Ser Gly Val Leu Pro Phe Glu Ile 35
40 45Val Trp Lys Pro Thr Leu Ile Thr Asn Glu Leu Ile
Arg Lys Thr Phe 50 55 60Asn Glu Ala
Asn Asn Asp Pro Asn Cys Ala Gly Val Ile Thr Trp Met65 70
75 80His Thr Phe Ser Pro Ala Lys Ser
Trp Ile Leu Gly Leu Gln Glu Phe 85 90
95Arg Lys Pro Leu Leu His Leu His Thr Gln Tyr Asn Glu Glu
Ile Pro 100 105 110Tyr Ala Thr
Met Asp Met Asp Phe Met Asn Glu Asn Gln Ala Ala His 115
120 125Gly Asp Arg Glu Tyr Ala His Ile Leu Ser Arg
Met Arg Ile Glu Arg 130 135 140Lys Val
Val Val Gly Phe Trp Lys Asp Ser Glu Val Gln Lys Lys Ile145
150 155 160Ala Ser Trp Met Arg Thr Ala
Ile Gly Ile Met Glu Ser Ser His Ile 165
170 175Arg Val Cys Arg Val Ala Asp Asn Met Arg Asn Val
Ala Val Thr Glu 180 185 190Gly
Asp Lys Val Glu Ala Gln Leu Lys Phe Gly Trp Glu Ile Asp Ala 195
200 205Tyr Pro Val Asn Glu Ile Ala Glu Ala
Val Ala Ala Val Ser Ala Ser 210 215
220Asp Thr Asn Ala Leu Val Asp Glu Tyr Tyr Ser Lys Tyr Asp Ile Cys225
230 235 240Leu Glu Gly Arg
Asp Pro Glu Glu Phe Lys Lys His Val Ala Val Gln 245
250 255Ala Gln Ile Glu Ile Gly Phe Glu Arg Phe
Leu Lys Glu Lys Asn Tyr 260 265
270Gln Ala Ile Val Thr His Phe Gly Asp Leu Gly Ala Leu Lys Gln Leu
275 280 285Pro Gly Leu Ala Ile Gln Arg
Leu Met Glu Lys Gly Tyr Gly Phe Gly 290 295
300Ala Glu Gly Asp Trp Lys Val Ala Ala Met Val Arg Leu Met Lys
Ile305 310 315 320Met Ser
Ala Gly Met Lys Asp Ala Lys Gly Ser Ser Met Leu Glu Asp
325 330 335Tyr Thr Tyr Asn Leu Val Lys
Gly Lys Glu Gly Ile Ile Gln Ala His 340 345
350Met Leu Glu Ile Cys Pro Ser Ile Ser Asp Gly Pro Ile Gln
Ile Lys 355 360 365Cys Gln Pro Leu
Ser Met Gly Asp Arg Glu Asp Pro Ala Arg Leu Val 370
375 380Phe Gln Ser Lys Thr Gly Ala Gly Ile Ala Thr Ser
Leu Ile Asp Leu385 390 395
400Gly Asn Arg Phe Arg Leu Ile Ile Gln Asp Val Glu Cys Lys Lys Val
405 410 415Glu Lys Pro Leu Pro
Lys Leu Pro Thr Ala Ile Asn Phe Trp Thr Pro 420
425 430Gln Pro Asp Phe Tyr Thr Gly Thr Glu Ala Trp Leu
Leu Ala Gly Gly 435 440 445Ala His
His Thr Ala Phe Ser Tyr Asp Ile Thr Ala Glu Gln Met Gly 450
455 460Asp Trp Ala Ala Ala Met Gly Ile Glu Ala Val
Phe Ile Asp Lys Asn465 470 475
480Thr Asn Ile Arg Asp Phe Lys Lys Asp Leu Met Leu Gly Glu Val Phe
485 490 495Tyr
Arg10487PRTunknownsequence from cow rumen metagenome dataset 10Met Gln
Arg Glu Phe Trp Phe Ile Val Gly Ser Gln Phe Leu Tyr Gly1 5
10 15Gln Asp Val Leu Asp Thr Val Asp
Ala Arg Ala Arg Glu Met Ala Ala 20 25
30Glu Leu Ser Lys Val Leu Pro Tyr Pro Leu Val Tyr Lys Val Thr
Ala 35 40 45Lys Thr Asn Lys Glu
Ile Ala Asp Thr Val Lys Glu Ala Asn Tyr Arg 50 55
60Asp Glu Val Met Gly Ile Val Thr Trp Cys His Thr Phe Ser
Pro Ser65 70 75 80Lys
Met Trp Ile Asn Gly Leu Val Asn Leu Gln Lys Pro Tyr Cys His
85 90 95Leu Ala Thr Gln Tyr Asn Arg
Glu Leu Pro Asn Glu Glu Ile Asp Ile 100 105
110Asp Phe Met Asn Leu Asn Gln Ala Ala His Gly Asp Arg Glu
His Gly 115 120 125Phe Ile Ala Ala
Arg Leu Arg Met Pro Arg Lys Val Ile Ala Gly Tyr 130
135 140Trp Gln Asp Glu Lys Val His Lys Arg Leu Ser Asp
Trp Met Lys Ala145 150 155
160Ala Val Gly Val Asp Val Ser Lys His Met Lys Val Met Arg Phe Gly
165 170 175Asp Asn Met Arg Glu
Val Ala Val Thr Glu Gly Asp Lys Val Glu Thr 180
185 190Gln Ile Lys Leu Gly Trp Gln Val Asn Thr Trp Ala
Val Gly Asp Leu 195 200 205Val Lys
Glu Met Asn Asn Val Thr Glu Ala Glu Ile Asp Ala Leu Phe 210
215 220Ala Glu Tyr Glu Ala Gln Tyr Asp Ile Ala Thr
Asp Asn Leu Ala Ala225 230 235
240Ile Arg Tyr Gln Ala Lys Glu Glu Ile Ala Met Lys Lys Met Leu Asp
245 250 255Arg Glu Gly Cys
Lys Ala Phe Ser Asn Thr Phe Gln Asp Leu Tyr Gly 260
265 270Met Glu Gln Leu Pro Gly Leu Ala Ser Gln His
Leu Met Ala Gln Gly 275 280 285Tyr
Gly Tyr Gly Gly Glu Gly Asp Trp Lys Val Ser Ala Met Thr Ala 290
295 300Ile Leu Lys Ala Met Gly Glu Asn Gly Asn
Gly Ala Ser Ala Phe Met305 310 315
320Glu Asp Tyr Thr Tyr His Leu Val Glu Gly Gln Glu Tyr Ser Leu
Gly 325 330 335Ala His Met
Leu Glu Val Cys Pro Ser Leu Ala Ala Asp Lys Pro Arg 340
345 350Ile Glu Thr His His Leu Gly Ile Gly Met
Asn Glu Lys Asp Pro Ala 355 360
365Arg Leu Val Phe Glu Gly Lys Ala Gly Lys Gly Ile Val Thr Ser Leu 370
375 380Ile Asp Met Gly Gly Arg Met Arg
Leu Ile Val Gln Asp Ile Glu Ala385 390
395 400Val Lys Pro Ile Leu Pro Met Pro Asn Leu Pro Val
Ala Arg Val Met 405 410
415Trp Arg Ala Met Pro Asp Leu Thr Thr Gly Val Glu Cys Trp Ile Thr
420 425 430Ala Gly Gly Ala His His
Thr Val Leu Ser Phe Asp Val Thr Pro Ala 435 440
445Met Leu Arg Asp Trp Ala Arg Met Met Asp Ile Glu Phe Val
Tyr Ile 450 455 460Thr Lys Asp Thr Thr
Pro Glu Glu Leu Glu Glu Glu Leu Leu Ile Lys465 470
475 480Asp Leu Val Trp Lys Leu Lys
48511499PRTunknownsequence from Human Microbiome dataset 11Met Leu Lys
Thr Lys Asn Tyr Gln Phe Trp Phe Cys Thr Gly Ser Gln1 5
10 15Asp Leu Tyr Gly Asp Glu Cys Leu Ala
His Val Ala Glu His Ser Lys 20 25
30Ile Ile Val Asp Ala Leu Asn Lys Ser Gly Asn Leu Pro Tyr Glu Val
35 40 45Val Trp Lys Pro Thr Met Ile
Thr Asn Glu Val Ile Arg Lys Thr Phe 50 55
60Asn Glu Ala Asn Thr Asp Glu Asn Cys Ala Gly Val Ile Thr Trp Met65
70 75 80His Thr Phe Ser
Pro Ala Lys Ser Trp Ile Leu Gly Leu Gln Glu Tyr 85
90 95Arg Lys Pro Leu Leu His Leu His Thr Gln
Phe Asn Arg Glu Ile Pro 100 105
110Tyr Asp Thr Ile Asp Met Asp Phe Met Asn Glu Asn Gln Ala Ala His
115 120 125Gly Asp Arg Glu Tyr Gly His
Ile Phe Ser Arg Leu Asn Met Glu Arg 130 135
140Lys Val Val Ala Gly Tyr Trp Glu Asp Glu Asp Val Gln Lys Gln
Ile145 150 155 160Gly Ser
Trp Met Arg Thr Ala Val Gly Val Val Glu Ser Ser His Val
165 170 175Arg Val Met Arg Val Ala Asp
Asn Met Arg Asn Val Ala Val Thr Glu 180 185
190Gly Asp Lys Val Glu Ala Gln Ile Lys Phe Gly Trp Glu Val
Asp Ala 195 200 205Tyr Pro Val Asn
Glu Val Val Glu Ala Val Asn Ala Val Ser Gln Ala 210
215 220Asp Ile Asp Thr Leu Val Glu Glu Tyr Tyr Asp Lys
Tyr Asp Ile Leu225 230 235
240Leu Glu Gly Arg Asp Glu Lys Glu Phe Arg Glu His Val Ala Val Gln
245 250 255Ala Gly Ile Glu Leu
Gly Phe Glu Arg Phe Leu Asp Glu Asn Asn Tyr 260
265 270Gln Ala Val Val Thr His Phe Gly Asp Leu Gly Gly
Leu Lys Gln Leu 275 280 285Pro Gly
Leu Ala Met Gln Arg Leu Met Glu Lys Gly Tyr Gly Phe Gly 290
295 300Ala Glu Gly Asp Trp Lys Thr Ala Ala Met Val
Arg Val Met Lys Ile305 310 315
320Met Thr Gln Gly Met Lys Asp Ala Lys Gly Thr Ser Phe Met Glu Asp
325 330 335Tyr Thr Tyr Asn
Leu Val Ser Gly Lys Glu Gly Val Leu Glu Ala His 340
345 350Met Leu Glu Val Cys Pro Thr Ile Ala Asp Gly
Lys Ile Ser Ile Lys 355 360 365Glu
Gln Pro Leu Ser Met Gly Asn Arg Glu Asp Pro Ala Arg Leu Val 370
375 380Phe Thr Ser Lys Thr Gly Pro Ala Ile Ala
Thr Ser Leu Ile Asp Leu385 390 395
400Gly Asp Arg Phe Arg Leu Ile Ile Asn Asp Val Asp Cys Lys Lys
Thr 405 410 415Glu Lys Pro
Met Pro Lys Leu Pro Val Ala Thr Ala Phe Trp Thr Pro 420
425 430Gln Pro Asn Leu Lys Val Gly Thr Glu Ala
Trp Ile Leu Ala Gly Gly 435 440
445Ala His His Thr Ala Phe Ser Tyr Asp Leu Thr Ala Glu Gln Met Gly 450
455 460Asp Trp Ala Ala Cys Met Gly Ile
Glu Ala Val Tyr Ile Asp Lys Asp465 470
475 480Thr Thr Ile Arg Gln Phe Lys Asn Glu Leu Leu Trp
Asn Ser Val Ala 485 490
495Tyr Arg Lys12498PRTunknownsequence from Human Microbiome dataset 12Met
Thr Gly Val Lys Asn Tyr Lys Phe Trp Phe Cys Thr Gly Ser Gln1
5 10 15Asp Leu Tyr Gly Glu Glu Cys
Leu Ala His Val Ala Glu His Ser Arg 20 25
30Ile Ile Val Glu Ser Leu Asn Arg Ser Gly Ile Leu Pro Tyr
Glu Val 35 40 45Val Trp Lys Pro
Thr Leu Ile Thr Asn Glu Leu Ile Arg Arg Thr Phe 50 55
60Asn Glu Ala Asn Ala Asp Glu Glu Cys Ala Gly Val Ile
Thr Trp Met65 70 75
80His Thr Phe Ser Pro Ala Lys Ser Trp Ile Leu Gly Leu Gln Glu Phe
85 90 95Arg Lys Pro Leu Met His
Phe His Thr Gln Phe Asn Arg Glu Ile Pro 100
105 110Tyr Asp Thr Ile Asp Met Asp Phe Met Asn Glu Asn
Gln Ser Ala His 115 120 125Gly Asp
Arg Glu Tyr Gly His Met Val Thr Arg Met Gly Ile Glu Arg 130
135 140Lys Val Ile Val Gly His Trp Ser Asp Glu Lys
Val Val Gly Arg Ile145 150 155
160Ala Gly Trp Met Arg Thr Ala Val Gly Ile Met Glu Ser Ser His Val
165 170 175Arg Val Val Arg
Phe Ala Asp Asn Met Arg Asn Val Ala Val Thr Glu 180
185 190Gly Asp Lys Val Glu Ala Gln Val Lys Phe Gly
Trp Glu Val Asp Ala 195 200 205Tyr
Pro Val Asn Glu Leu Cys Gln Tyr Val Lys Ala Val Pro Lys Gly 210
215 220Asp Ile Thr Ala Leu Val Asp Glu Tyr Tyr
Ser Lys Tyr Thr Ile Leu225 230 235
240Leu Glu Gly Arg Asp Pro Glu Glu Phe Lys Arg His Val Ala Val
Gln 245 250 255Ala Gln Ile
Glu Ala Gly Leu Glu Arg Phe Leu Val Glu Lys Asp Tyr 260
265 270His Ala Ile Val Thr His Phe Gly Asp Leu
Gly Glu Leu Gln Gln Leu 275 280
285Pro Gly Leu Ala Ile Gln Arg Leu Met Glu Lys Gly Tyr Gly Phe Gly 290
295 300Gly Glu Gly Asp Trp Lys Thr Ala
Ala Met Val Arg Leu Met Lys Ile305 310
315 320Met Ala Gln Gly Val Lys Asn Ala Lys Gly Thr Ser
Phe Met Glu Asp 325 330
335Tyr Thr Tyr Asn Leu Val Pro Gly Lys Glu Gly Ile Leu Glu Ala His
340 345 350Met Leu Glu Val Cys Pro
Ser Ile Ala Asp Gly Glu Ile Ser Ile Lys 355 360
365Val Asn Pro Leu Ser Met Gly Asp Arg Glu Asp Pro Ala Arg
Leu Val 370 375 380Phe Thr Ser Lys Thr
Gly His Gly Ile Ala Thr Ser Leu Val Asp Leu385 390
395 400Gly Thr Arg Phe Arg Leu Ile Ile Asn Asp
Val Glu Cys Arg Lys Thr 405 410
415Glu Lys Ala Met Pro Lys Leu Pro Val Ala Thr Ala Phe Trp Thr Pro
420 425 430Glu Pro Ser Leu Ala
Thr Gly Ala Glu Ala Trp Ile Leu Ala Gly Gly 435
440 445Ala His His Thr Ala Phe Ser Tyr Asp Leu Thr Ala
Glu Gln Met Gly 450 455 460Asp Trp Ala
Glu Ser Met Gly Ile Glu Val Val Tyr Ile Asp Lys Asp465
470 475 480Thr Thr Ile Arg Gly Leu Lys
Asn Glu Met Arg Trp Asn Gly Ala Val 485
490 495Tyr Arg13498PRTunknownsequence from Human
Microbiome dataset 13Met Ile Ala Val Lys Asn Tyr Lys Phe Trp Phe Cys Thr
Gly Ser Gln1 5 10 15Asp
Leu Tyr Gly Asp Glu Cys Leu Ala His Val Ala Glu His Ser Gly 20
25 30Ile Ile Val Asp Ser Leu Asn Lys
Ser Gly Ile Leu Pro Tyr Glu Val 35 40
45Val Leu Lys Pro Thr Leu Ile Thr Asn Glu Leu Ile Arg Arg Thr Phe
50 55 60Asn Glu Ala Asn Ala Asp Glu Glu
Cys Ala Gly Val Ile Thr Trp Met65 70 75
80His Thr Phe Ser Pro Ala Lys Ser Trp Ile Leu Gly Leu
Gln Glu Tyr 85 90 95Arg
Lys Pro Leu Met His Phe His Thr Gln Phe Asn Gln Glu Ile Pro
100 105 110Tyr Asp Ser Ile Asp Met Asp
Phe Met Asn Glu Asn Gln Ser Ala His 115 120
125Gly Asp Arg Glu Tyr Gly His Met Val Thr Arg Met Gly Ile Glu
Arg 130 135 140Lys Val Ile Val Gly His
Trp Arg Asp Glu Lys Val Val Gly Arg Ile145 150
155 160Ala Ala Trp Met Arg Thr Ala Val Gly Ile Met
Glu Ser Ser His Val 165 170
175Arg Val Ala Arg Phe Ala Asp Asn Met Arg Asn Val Ala Val Thr Glu
180 185 190Gly Asp Lys Val Glu Ala
Gln Met Lys Phe Gly Trp Glu Val Asp Ala 195 200
205Tyr Pro Val Asn Glu Leu Ala Glu Tyr Val Lys Ala Val Pro
Lys Gly 210 215 220Asp Ile Thr Ala Leu
Val Asp Glu Tyr Tyr Ser Lys Tyr Thr Ile Leu225 230
235 240Leu Glu Gly Arg Asp Pro Glu Glu Phe Lys
Arg His Val Ala Val Gln 245 250
255Ala Gln Ile Glu Ala Gly Leu Glu Lys Phe Leu Leu Glu Lys Asp Tyr
260 265 270His Ala Ile Val Thr
His Phe Gly Asp Leu Gly Glu Leu Gln Gln Leu 275
280 285Pro Gly Leu Ala Ile Gln Arg Leu Met Glu Lys Gly
Tyr Gly Phe Gly 290 295 300Ala Glu Gly
Asp Trp Lys Thr Ala Ala Met Val Arg Leu Met Lys Ile305
310 315 320Met Thr Gln Gly Met Lys Asp
Ala Lys Gly Thr Ser Phe Met Glu Asp 325
330 335Tyr Thr Tyr Asn Leu Val Pro Gly Lys Glu Gly Ile
Leu Glu Ala His 340 345 350Met
Leu Glu Val Cys Pro Thr Ile Ala Asp Gly Glu Ile Ser Ile Lys 355
360 365Ala Cys Pro Leu Ser Met Gly Asp Arg
Glu Asp Pro Ala Arg Leu Val 370 375
380Phe Thr Ser Lys Thr Gly His Gly Ile Ala Ala Ser Leu Val Asp Leu385
390 395 400Gly Thr Arg Phe
Arg Leu Ile Ile Asn Asp Val Glu Cys Lys Lys Thr 405
410 415Glu Lys Pro Met Pro Lys Leu Pro Val Ala
Thr Ala Phe Trp Thr Pro 420 425
430Glu Pro Asn Leu Ala Thr Gly Ala Glu Ser Trp Ile Leu Ala Gly Gly
435 440 445Ala His His Thr Ala Phe Ser
Tyr Asp Leu Thr Ala Glu Gln Met Gly 450 455
460Asp Trp Ala Asp Ala Met Gly Ile Glu Thr Val Tyr Ile Asp Lys
Asp465 470 475 480Thr Thr
Ile Arg Gly Leu Lys Asn Glu Leu Arg Trp Asn Ala Ala Ala
485 490 495Tyr Arg14499PRTunknownsequence
from Human Microbiome dataset 14Met Leu Lys Lys Lys Glu Tyr Lys Phe Trp
Phe Cys Thr Gly Ser Gln1 5 10
15Asp Leu Tyr Gly Asp Glu Cys Leu Ala His Val Ala Glu His Ala Lys
20 25 30Ile Ile Val Glu Lys Leu
Asn Glu Ser Gly Val Leu Pro Tyr Glu Val 35 40
45Val Trp Lys Pro Thr Leu Ile Thr Asn Glu Leu Ile Arg Lys
Thr Phe 50 55 60Asn Glu Ala Asn Ile
Asp Asp Glu Cys Ala Gly Val Ile Thr Trp Met65 70
75 80His Thr Phe Ser Pro Ala Lys Ser Trp Ile
Leu Gly Leu Gln Glu Phe 85 90
95Arg Lys Pro Leu Leu His Leu His Thr Gln Phe Asn Met Glu Ile Pro
100 105 110Tyr Asp Thr Ile Asp
Met Asp Phe Met Asn Glu Asn Gln Ser Ala His 115
120 125Gly Gly Arg Glu Phe Gly His Ile Phe Thr Arg Leu
Gly Ile Glu Arg 130 135 140Lys Val Val
Val Gly His Trp Ser Asp Glu Lys Val Gln Glu Lys Ile145
150 155 160Ala Ser Trp Met Arg Thr Ala
Val Gly Val Ile Glu Ser Ser His Val 165
170 175Arg Val Met Arg Val Ala Asp Asn Met Arg Asn Val
Ala Val Thr Glu 180 185 190Gly
Asp Lys Val Glu Ala Gln Ile Lys Phe Gly Trp Glu Val Asp Ala 195
200 205Tyr Pro Val Asn Glu Ile Ala Glu Ser
Val Asp Ala Val Ser Ala Ala 210 215
220Asp Val Asn Thr Leu Val Glu Glu Tyr Tyr Asp Lys Tyr Glu Ile Leu225
230 235 240Leu Glu Gly Arg
Asp Pro Glu Glu Phe Arg Lys His Val Ala Val Gln 245
250 255Ala Gln Ile Glu Leu Gly Phe Glu Arg Phe
Leu Glu Glu Lys Asn Tyr 260 265
270Gln Ala Ile Val Thr His Phe Gly Asp Leu Gly Val Leu Lys Gln Leu
275 280 285Pro Gly Leu Ala Ile Gln Arg
Leu Met Gln Lys Gly Tyr Gly Phe Gly 290 295
300Ala Glu Gly Asp Trp Lys Thr Ala Ala Met Val Arg Ile Met Lys
Ile305 310 315 320Met Thr
Glu Gly Met Lys Asp Ala Lys Gly Thr Ser Met Leu Glu Asp
325 330 335Tyr Thr Tyr Asn Phe Val Pro
Gly Lys Glu Gly Ile Leu Gln Ala His 340 345
350Met Leu Glu Ile Cys Pro Ser Ile Ala Asp Gly Pro Ile Ser
Ile Lys 355 360 365Val Asn Pro Leu
Ser Met Gly Asp Arg Glu Asp Pro Ala Arg Leu Val 370
375 380Phe Thr Ser Lys Glu Gly Lys Gly Ile Ala Thr Ser
Leu Ile Asp Leu385 390 395
400Gly Asp Arg Phe Arg Leu Ile Ile Asn Thr Val Asp Cys Lys Lys Asn
405 410 415Glu Lys Pro Met Pro
Lys Leu Pro Val Ala Thr Asn Phe Trp Thr Pro 420
425 430Glu Pro Asp Leu Ala Thr Gly Ala Glu Ala Trp Ile
Leu Cys Gly Gly 435 440 445Ala His
His Thr Ala Phe Ser Tyr Asp Ile Thr Ala Glu Gln Met Gly 450
455 460Asp Trp Ala Ala Met Met Gly Ile Glu Ala Val
Tyr Ile Asp Lys Asp465 470 475
480Thr Thr Ile Arg Asn Leu Lys Asn Glu Leu Arg Trp Asn Glu Leu Ala
485 490 495Phe Arg
Lys
15 498 PRT unknown sequence from Human Microbiome dataset
15Met Lys Ala
Ala Lys Asp Tyr Lys Phe Trp Phe Cys Thr Gly Ser Gln1 5
10 15Asp Leu Tyr Gly Asp Glu Cys Leu Ala
His Val Ala Glu His Ser Arg 20 25
30Ile Ile Val Asp Ala Leu Asn Lys Ser Gly Val Leu Pro Tyr Glu Ile
35 40 45Val Trp Lys Pro Thr Leu Ile
Thr Asn Glu Leu Ile Arg Lys Thr Phe 50 55
60Asn Glu Ala Asn Ala Asp Glu Asn Cys Ala Gly Val Ile Thr Trp Met65
70 75 80His Thr Phe Ser
Pro Ala Lys Ser Trp Ile Leu Gly Leu Gln Glu Phe 85
90 95Arg Lys Pro Leu Leu His Phe His Thr Gln
Phe Asn Arg Glu Ile Pro 100 105
110Tyr Asp Thr Ile Asp Met Asp Phe Met Asn Glu Asn Gln Ala Ala His
115 120 125Gly Asp Arg Glu Tyr Gly His
Ile Val Ser Arg Met Gly Ile Glu Arg 130 135
140Lys Ile Ile Val Gly Tyr Trp Glu Asp Arg Asp Val Gln Glu Lys
Ile145 150 155 160Ala Ser
Trp Met Leu Thr Ala Ile Gly Ile Met Glu Ser Ser His Ile
165 170 175Arg Val Cys Arg Ile Ala Asp
Asn Met Arg Asn Val Ala Val Thr Glu 180 185
190Gly Asp Lys Val Glu Ala Gln Ile Lys Phe Gly Trp Glu Ile
Asp Ala 195 200 205Tyr Pro Val Asn
Glu Ile Ala Glu Tyr Val Ala Ala Val Pro Gln Gly 210
215 220Glu Ile Asn Ala Leu Val Glu Glu Tyr Tyr Ser Lys
Tyr Asp Ile Ile225 230 235
240Leu Glu Gly Arg Asp Pro Gln Glu Phe Arg Glu His Val Ala Val Gln
245 250 255Ala Gly Ile Glu Ile
Gly Phe Glu Lys Phe Leu Glu Glu Lys Asn Tyr 260
265 270Gln Ala Ile Val Thr His Phe Gly Asp Leu Gly Ser
Leu Lys Gln Leu 275 280 285Pro Gly
Leu Ala Ile Gln Arg Leu Met Glu Lys Gly Tyr Gly Phe Gly 290
295 300Gly Glu Gly Asp Trp Lys Thr Ala Ala Met Val
Arg Leu Met Lys Ile305 310 315
320Met Thr Ala Gly Val Lys Asn Pro Lys Gly Thr Ser Phe Met Glu Asp
325 330 335Tyr Thr Tyr Asn
Leu Val Pro Gly Lys Glu Gly Val Leu Glu Ala His 340
345 350Met Leu Glu Val Cys Pro Ser Val Ala Asp Gly
Pro Ile Gly Ile Lys 355 360 365Val
Cys Pro Leu Ser Met Gly Asp Arg Glu Asp Pro Ala Arg Leu Val 370
375 380Tyr Thr Ser Lys Thr Gly Pro Ala Ile Ala
Thr Ser Leu Ile Asp Leu385 390 395
400Gly Asn Arg Phe Arg Leu Ile Ile Asn Glu Val Glu Cys Lys Lys
Val 405 410 415Glu Lys Pro
Met Pro Lys Leu Pro Val Ala Thr Ala Phe Trp Thr Pro 420
425 430Tyr Pro Asp Leu Lys Thr Gly Ala Glu Ala
Trp Ile Leu Ala Gly Gly 435 440
445Ala His His Thr Ala Phe Ser Tyr Asp Leu Thr Ala Glu Gln Met Gly 450
455 460Asp Trp Ala Ala Ala Met Gly Ile
Glu Ala Val Tyr Ile Asp Lys Asp465 470
475 480Thr Thr Ile Arg Asn Phe Lys Arg Asp Leu Gln Leu
Gly Asn Ile Val 485 490
495Tyr Arg
16 498 PRT unknown sequence from Human Microbiome dataset
16Met Val
Thr Gly Arg Asn Tyr Lys Phe Trp Phe Cys Thr Gly Ser Gln1 5
10 15Asp Leu Tyr Gly Asp Glu Cys Leu
Arg Lys Val Ala Glu His Ser Arg 20 25
30Ile Ile Val Glu Glu Leu Asn Lys Ser Gly Val Leu Pro Phe Glu
Leu 35 40 45Val Trp Lys Pro Thr
Leu Ile Thr Asn Glu Leu Ile Arg Lys Thr Phe 50 55
60Asn Glu Ala Asn Ala Asp Asp Glu Cys Ala Gly Val Ile Thr
Trp Met65 70 75 80His
Thr Phe Ser Pro Ala Lys Ser Trp Ile Leu Gly Leu Lys Glu Tyr
85 90 95Arg Lys Pro Leu Cys His Leu
His Thr Gln Phe Asn Gln Glu Ile Pro 100 105
110Tyr Asp Thr Ile Asp Met Asp Phe Met Asn Glu Asn Gln Ser
Ala His 115 120 125Gly Gly Arg Glu
Tyr Gly His Ile Val Thr Arg Met Gly Ile Glu Arg 130
135 140Lys Val Ile Val Gly His Trp Ala Asp Lys Lys Val
Gln Glu Arg Leu145 150 155
160Ala Ser Trp Met Arg Thr Ala Val Gly Ile Met Glu Ser Ser His Ile
165 170 175Arg Val Cys Arg Val
Ala Asp Asn Met Arg Asn Val Ala Val Thr Glu 180
185 190Gly Asp Lys Val Glu Ala Gln Ile Lys Phe Gly Trp
Glu Val Asp Ala 195 200 205Tyr Pro
Val Asn Glu Val Cys Asp Tyr Val Lys Asp Val Ser Lys Gly 210
215 220Asp Ile Asp Val Leu Val Glu Glu Tyr Tyr Asn
Lys Tyr Asp Ile Leu225 230 235
240Phe Glu Gly Arg Asp Pro Glu Glu Phe Lys Arg His Val Ala Val Gln
245 250 255Ala Ala Ile Glu
Ile Gly Phe Glu Arg Phe Leu Glu Glu Lys Asn Tyr 260
265 270Gln Ala Val Val Thr His Phe Gly Asp Leu Gly
Gly Leu Gln Gln Leu 275 280 285Pro
Gly Leu Ala Met Gln Arg Leu Met Glu Lys Gly Tyr Gly Phe Gly 290
295 300Ala Glu Gly Asp Trp Lys Thr Ala Ala Met
Val Arg Leu Met Lys Ile305 310 315
320Met Thr Ala Gly Val Lys Asp Ala Lys Gly Thr Ser Phe Met Glu
Asp 325 330 335Tyr Thr Tyr
Asn Leu Val Pro Gly Lys Glu Gly Ile Leu Gln Ser His 340
345 350Met Leu Glu Val Cys Pro Thr Ile Ala Asp
Gly Lys Ile Gly Ile Lys 355 360
365Val Cys Pro Leu Ser Met Gly Asp Arg Glu Asp Pro Ala Arg Leu Phe 370
375 380Thr Ser Lys Thr Gly Pro Ala Val
Ala Thr Ser Leu Val Asp Leu Gly385 390
395 400Asp Arg Phe Arg Leu Ile Ile Asn Asp Val Asp Cys
Lys Lys Val Glu 405 410
415Lys Pro Met Pro Lys Leu Pro Val Gly Ser Ala Phe Trp Thr Pro Gln
420 425 430Pro Asp Leu Ala Thr Gly
Ala Glu Ala Trp Ile Leu Ala Gly Gly Ala 435 440
445His His Thr Ala Phe Ser Tyr Asp Leu Thr Ala Glu Gln Met
Gly Asp 450 455 460Trp Ala Ala Ala Met
Gly Ile Glu Ala Val Tyr Ile Asp Lys Asp Thr465 470
475 480Thr Ile Arg Asn Phe Lys Asn Glu Leu Arg
Trp Asn Glu Val Ala Phe 485 490
495Arg Lys
17 491 PRT unknown sequence from cow rumen metagenome dataset
17Met Glu Asp Ile Met Lys Arg Glu Phe Trp Phe Ile Val Gly Ser Gln1
5 10 15Phe Leu Tyr Gly Gln Asp
Val Leu Asp Thr Val Asp Ala Arg Ala Lys 20 25
30Glu Met Ala Ala Glu Leu Ser Lys Val Leu Pro Tyr Pro
Leu Val Tyr 35 40 45Lys Val Thr
Ala Lys Thr Asn Lys Glu Ile Thr Asp Val Ile Lys Glu 50
55 60Ala Asn Tyr Arg Asp Glu Cys Ala Gly Ile Val Thr
Trp Cys His Thr65 70 75
80Phe Ser Pro Ser Lys Met Trp Ile Asn Gly Leu Ala Asn Leu Gln Lys
85 90 95Pro Tyr Cys His Leu Ala
Thr Gln Tyr Asn Lys Glu Ile Pro Asn Asp 100
105 110Glu Ile Asp Met Asp Phe Met Asn Leu Asn Gln Ala
Ala His Gly Asp 115 120 125Arg Glu
His Gly Phe Ile Ala Ala Arg Leu Arg Leu Pro Arg Lys Val 130
135 140Ile Ala Gly Phe Trp Gln Asp Glu Lys Ile His
Lys Arg Leu Ser Asp145 150 155
160Trp Met Arg Ala Ala Val Gly Val Ala Val Ser Lys Lys Met Lys Val
165 170 175Met Arg Phe Gly
Asp Asn Met Arg Glu Val Ala Val Thr Glu Gly Asp 180
185 190Lys Val Glu Val Gln Thr Lys Leu Gly Trp Gln
Val Asn Thr Trp Ala 195 200 205Val
Gly Asp Leu Val Lys Glu Met Gly Lys Val Thr Glu Ala Glu Ile 210
215 220Asp Ala Leu Val Ala Glu Tyr Glu Ala Asn
Tyr Asp Ile Ala Thr Asp225 230 235
240Asn Thr Ala Ala Ile Arg Tyr Gln Ala Arg Glu Glu Ile Ala Met
Lys 245 250 255Lys Met Leu
Asp Arg Glu Gly Cys Arg Ala Phe Thr Asn Thr Phe Gln 260
265 270Asp Leu Tyr Gly Met Glu Gln Leu Pro Gly
Leu Ala Ser Gln His Leu 275 280
285Met Ala Gln Gly Tyr Gly Tyr Gly Gly Glu Gly Asp Trp Lys Val Ser 290
295 300Ala Met Thr Ala Ile Leu Lys Ala
Met Gly Glu Asn Gly Asn Gly Ala305 310
315 320Ser Gly Phe Met Glu Asp Tyr Thr Tyr His Leu Val
Glu Gly Gln Glu 325 330
335Tyr Ser Leu Gly Ala His Met Leu Glu Val Cys Pro Ser Leu Ala Ala
340 345 350Asp Lys Pro Arg Ile Glu
Thr His His Leu Gly Ile Gly Met Asn Glu 355 360
365Lys Asp Pro Ala Arg Leu Val Phe Glu Gly Lys Ala Gly Lys
Gly Ile 370 375 380Val Val Ser Leu Ile
Asp Met Gly Gly Arg Leu Arg Leu Ile Val Gln385 390
395 400Asp Ile Glu Ala Val Lys Pro Ile Leu Pro
Met Pro Asn Leu Pro Val 405 410
415Ala Arg Val Met Trp Arg Ala Met Pro Asp Leu Thr Thr Gly Val Glu
420 425 430Cys Trp Ile Thr Ala
Gly Gly Ala His His Thr Val Leu Ser Tyr Asp 435
440 445Val Thr Ala Glu Gln Met Arg Asp Trp Ala Arg Met
Met Asp Ile Glu 450 455 460Phe Val His
Ile Thr Lys Asp Thr Thr Pro Glu Lys Leu Glu Glu Glu465
470 475 480Leu Leu Val Lys Asp Leu Val
Trp Lys Leu Lys 485
490
18 488 PRT unknown sequence from cow rumen metagenome dataset
18Met Ser
Lys Glu Phe Trp Phe Val Val Gly Ser Gln Asp Leu Tyr Gly1 5
10 15Glu Glu Val Leu Lys Ile Val Ala
Glu Arg Ala Ala Glu Met Ala Ala 20 25
30Trp Leu Ser Glu Lys Leu Pro Tyr Pro Leu Ile Tyr Lys Val Thr
Ala 35 40 45Met Ser Ser Asn Gln
Ile Thr Ser Val Met Lys Glu Ala Asn Phe Asp 50 55
60Asp Asn Cys Leu Gly Val Val Thr Trp Cys His Thr Phe Ser
Pro Ser65 70 75 80Lys
Met Trp Leu Thr Gly Leu Asp Leu Leu Gln Lys Pro Trp Cys His
85 90 95Phe Ala Thr Gln Tyr Asn Leu
Glu Ile Pro Asn Glu Glu Ile Asp Met 100 105
110Asp Phe Met Asn Leu Asn Gln Ala Ala His Gly Asp Arg Glu
His Gly 115 120 125Phe Ile Gly Ala
Arg Leu Arg Lys Ala Arg Lys Val Val Ala Gly Tyr 130
135 140Trp Lys Asp Glu Lys Val Ile Ala Arg Leu Ala Glu
Phe Gln Lys Val145 150 155
160Ala Val Gly Val Asp Ala Ser Lys His Met Lys Val Met Arg Phe Gly
165 170 175Asp Asn Met Arg Asp
Val Ala Val Thr Glu Gly Asp Lys Val Glu Val 180
185 190Gln Lys Lys Leu Gly Trp Glu Val Asn Thr Trp Ala
Val Gly Asp Leu 195 200 205Val Lys
Glu Met Asn Ala Val Thr Asp Glu Glu Val Glu Ala Leu Phe 210
215 220Asn Glu Tyr Lys Ala Ser Tyr Asp Ile Asn Thr
Asp Asn Ile Tyr Ala225 230 235
240Ile Lys Tyr Gln Ala Arg Glu Glu Ile Ala Ile Lys Lys Met Met Asp
245 250 255Arg Asn Gly Cys
Lys Ala Phe Ser Asn Thr Phe Gln Asp Leu Tyr Gly 260
265 270Met Glu Gln Leu Pro Gly Leu Ala Ser Gln His
Leu Met Ser Leu Gly 275 280 285Tyr
Gly Tyr Gly Gly Glu Gly Asp Trp Lys Val Ser Ala Met Thr Ala 290
295 300Ile Leu Lys Ala Met Gly Glu Asn Gly Asn
Gly Ala Ser Ala Phe Met305 310 315
320Glu Asp Tyr Thr Tyr His Leu Val Lys Gly His Glu Tyr Ser Leu
Gly 325 330 335Ala His Met
Leu Glu Val Cys Pro Ser Leu Ala Ala Asp Lys Pro Arg 340
345 350Ile Glu Thr His His Leu Gly Ile Gly Met
Asn Glu Lys Asp Pro Ala 355 360
365Arg Leu Val Phe Glu Gly Lys Glu Gly Arg Gly Ile Val Ala Ser Leu 370
375 380Ile Asp Met Gly Gly Arg Leu Arg
Leu Ile Val Gln Asp Ile Glu Ala385 390
395 400Val Lys Pro Ile Met Pro Met Pro Asn Leu Pro Val
Ala Arg Val Met 405 410
415Trp Arg Ala Leu Pro Asp Leu Thr Asp Gly Val Glu Cys Trp Ile Thr
420 425 430Ala Gly Gly Ala His His
Thr Val Leu Ser Tyr Asp Val Thr Pro Glu 435 440
445Met Met Arg Asp Phe Ala Lys Phe Met Asp Ile Glu Phe Val
His Ile 450 455 460Asp Lys Asp Thr Thr
Val Glu Lys Leu Glu Asp Glu Leu Met Val Lys465 470
475 480Asp Leu Val Trp Lys Met Lys Gly
485
19 488 PRT unknown sequence from cow rumen metagenome dataset
19Met
Gly Asn Lys Lys Asn Phe Trp Phe Val Val Gly Ser Gln Phe Leu1
5 10 15Tyr Gly Asn Glu Val Leu Glu
Thr Val Ala Ala Arg Ala Gln Glu Met 20 25
30Ala Glu Lys Met Ser Lys Ser Leu Pro Tyr Glu Leu Lys Phe
Lys Gly 35 40 45Ile Val Lys Thr
Trp Asp Glu Ala Thr Gln Tyr Ala Lys Glu Ala Asn 50 55
60Phe Asp Asp Asn Cys Cys Gly Val Ile Thr Trp Cys His
Thr Phe Ser65 70 75
80Pro Ser Lys Met Trp Ile Glu Ala Phe Arg Leu Leu Gln Lys Pro Leu
85 90 95Leu His Phe Ala Thr Gln
Tyr Asn Arg Tyr Ile Pro Asp Lys Glu Ile 100
105 110Asp Met Asp Phe Met Asn Leu Asn Gln Ala Ala His
Gly Asp Arg Glu 115 120 125His Gly
Phe Ile Ile Ala Arg Met Arg Leu Gln Gln Lys Ile Val Thr 130
135 140Gly Phe Trp Glu Asp Gln Pro Val Leu Asp Glu
Ile Gly Thr Trp Met145 150 155
160Arg Ala Ala Val Ala Tyr Asp Phe Ser Arg Asn Leu Arg Val Met Arg
165 170 175Phe Gly Asp Asn
Met Arg Glu Val Ala Val Thr Glu Gly Asp Lys Val 180
185 190Glu Ala Gln Ile Lys Phe Gly Trp Gln Val Asn
Thr Trp Pro Val Gly 195 200 205Lys
Leu Val Glu Glu Ile Gly Lys Val Thr Glu Glu Glu Val Asp Glu 210
215 220Leu Leu Lys Val Tyr Thr Asp Thr Tyr Glu
Leu Ala Thr Asp Asp Ile225 230 235
240Glu Thr Ile Arg Tyr Gln Ala Arg Glu Glu Ile Ala Met Lys Lys
Met 245 250 255Met Thr Ala
Glu Gly Ala Asn Ala Phe Val Asn Thr Phe Gln Asp Leu 260
265 270Ile Gly Met Lys Gln Leu Pro Gly Ile Ala
Ser Gln His Leu Met Ala 275 280
285Gln Gly Tyr Gly Tyr Gly Ala Glu Gly Asp Trp Lys Leu Ser Ala Leu 290
295 300Val Ser Ile Val Lys Lys Met Thr
Glu Gly Met Thr Gly Gly Thr Ser305 310
315 320Phe Met Glu Asp Tyr Thr Tyr His Leu Asp Pro Asn
Ala Glu Tyr Ala 325 330
335Leu Gly Ala His Met Leu Glu Val Cys Pro Ser Ile Ala Ala Asp Lys
340 345 350Pro Arg Ile Glu Val His
Pro Leu Gly Ile Gly Asp Arg Glu Asp Pro 355 360
365Ala Arg Leu Val Phe Glu Gly His Glu Gly Asp Ala Val Val
Val Thr 370 375 380Leu Ile Asp Met Gly
Glu Arg Phe Arg Met Leu Val Gln Asp Ile His385 390
395 400Cys Val Lys Pro Ile Tyr Glu Met Pro Asn
Leu Pro Val Ala Arg Val 405 410
415Met Trp Glu Gly Lys Pro Ser Leu Asn Glu Gly Leu Lys Met Trp Leu
420 425 430Met Ala Gly Gly Ala
His His Ser Val Leu Ser Tyr Asp Ala Thr Pro 435
440 445Glu Met Leu Lys Asp Leu Ala Arg Met Met Asp Ile
Glu Phe Val His 450 455 460Ile Thr Ala
Asp Ser Lys Pro Glu Glu Phe Glu Lys Asp Leu Phe Phe465
470 475 480Ala Asp Leu Ala Trp Lys Leu
Lys 485
20 470 PRT unknown sequence from cow rumen metagenome dataset
20Met Lys Lys Ile Tyr Phe Ile Thr Gly Ser Gln Asp Leu Tyr Gly
Glu1 5 10 15Asp Val Leu
Lys Thr Val Ala Lys Asp Ser Gln Glu Met Val Asn Phe 20
25 30Leu Asp Glu Gln Val Gly Glu Arg Ala Glu
Ile Glu Phe Leu Gly Val 35 40
45Val Arg Asp Ser Glu Ile Cys Leu Asp Phe Ile Leu Lys Ala Asn Phe 50
55 60Asp Lys Glu Cys Ile Gly Ile Ile Thr
Trp Met His Thr Phe Ser Pro65 70 75
80Ala Lys Met Trp Ile Arg Gly Leu Lys Val Leu His Lys Pro
Met Leu 85 90 95His Leu
His Thr Gln Tyr Asn Glu Lys Leu Pro Tyr Asp Ser Ile Asp 100
105 110Met Asp Phe Met Asn Leu Asn Gln Ala
Ala His Gly Asp Arg Glu Phe 115 120
125Gly Phe Ile Ala Ala Arg Met Asn Ile Lys Gln His Val Leu Ser Gly
130 135 140Tyr Tyr Lys Asn Lys Asp Phe
Ile Glu Gly Val Lys Gln Tyr Ile Asp145 150
155 160Val Cys Leu Ser Ile Asp Ala Ala Lys Tyr Leu Arg
Val Ala Met Phe 165 170
175Gly Ser Asn Met Arg Asp Val Ala Val Thr Asp Gly Asp Arg Val Gln
180 185 190Ser Glu Ile Asp Phe Gly
Trp Asn Val Asn Tyr Tyr Gly Ile Gly Asp 195 200
205Leu Val Asp Ile Ile Asn Lys Val Lys Asp Glu Glu Ile Asp
Ala Gln 210 215 220Phe Glu Glu Tyr Lys
Lys Arg Tyr Thr Ile Asn Thr Thr Asn Ile Glu225 230
235 240Ala Ile Lys Glu Gln Ala Lys Tyr Glu Val
Ala Leu Lys Lys Phe Ile 245 250
255Lys Lys Glu Asn Val Gln Ala Phe Thr Asp Asn Phe Gln Asp Leu His
260 265 270Gly Leu Lys Gln Leu
Pro Gly Leu Ala Val Gln Asp Leu Met Gln Glu 275
280 285Gly Ile Ser Phe Gly Pro Glu Gly Asp Tyr Lys Thr
Pro Ala Leu Leu 290 295 300Ala Thr Leu
Leu Pro Met Thr Lys Tyr Arg Lys Gly Ala Thr Gly Phe305
310 315 320Ile Glu Asp Tyr Thr Tyr Asp
Leu Ile Glu Gly Lys Glu Ile Glu Leu 325
330 335Gly Ser His Met Leu Glu Val Pro Pro Ser Phe Ala
Thr Ser Lys Pro 340 345 350Glu
Ile Gln Val Arg Pro Leu Ser Ile Gly Asp Lys Ala Ala Pro Ala 355
360 365Arg Leu Val Phe Asp Ser Ile Glu Gly
Glu Gly Leu Gln Ile Thr Met 370 375
380Val Asp Met Gly Thr His Phe Arg Ile Ile Ala Ala Lys Ile Gln Leu385
390 395 400Val Lys Gln Pro
Ala Pro Met Pro Lys Leu Pro Val Ala Arg Ile Met 405
410 415Trp Lys His Ile Pro Asn Phe Lys Ile Ser
Thr Glu Ala Trp Met Leu 420 425
430Tyr Gly Gly Gly His His Ser Val Ile Thr Thr Ala Leu Thr Ile Glu
435 440 445Asp Ile Lys Leu Phe Ala Lys
Leu Thr Gly Thr Glu Leu Cys Val Ile 450 455
460Asp Glu Asn Thr Lys Ile465
470
21 1488 DNA unknown Sequence from the human microbiome dataset
21atgaacttga agccacacac cttctggttc gtcactggtt cccaacactt gtacggtcca
60gaaactttgg aacaagtcgc tgaacactcc agaatcgtcg ctactgaatt tgacaaggac
120ccagttttca cctacccaat cgtcttcaag ccaatcgtta ccactccaga tgaaatctac
180aagttgatct tggaagctaa caacgacgaa tcctgtgctg gtatcatgac ttggatgcac
240accttctctc cagctaagat gtggatcgct ggtttgtccc aattgcaaaa gccattgttg
300cacttccaca ctcaattcaa cagagatatc ccttgggaaa ccatcgacat ggatttcatg
360aacttgaacc aatctgctca cggtgacaga gaatacggtc acatcggtgc tagattgggt
420atcgctagaa aggttgtcgt tggtcactgg gaagacggtg aagtcagagg ttctatcgct
480ggttggatga gaaccgctgc tgcttacgct gaatccagaa gattgaaggt cgctagattc
540ggtgacaaca tgagacaagt cgctgttacc gaaggtgaca aggttgaagc tcaaatcaag
600ttgggttggt ctgtcaacgg ttacggtatc ggtgacttgg ttcaatccat gaacgaagtc
660ggtgacgaag aagttaaggc tttgttgaac gaatacgctg aatcttactc catcactaag
720gaaggtttgt ctgacggtcc agtcagagat tccatcgctt accaagctag aatcgaaatc
780gctttgagaa gattcttgga agaaggtggt ttcggtgctt tcaccactac cttcgaagac
840ttgcacggta tgaagcaatt gccaggtttg gctgttcaaa gattgatgga atctggttac
900ggtttcggtg gtgaaggtga ctggaagact gctgctttga ccagagtctt gaaggttttg
960gctgacaaca agtctacttc cttcatggaa gattacacct accacttcga accaggtaac
1020cacatgatct tgggttctca catgttggaa gtctgtccaa ctatcgcttt ggacaagcca
1080accttggaag ttcacccatt gggtatcggt ggtaaaggtg acccagctag attggtcttc
1140aacggtcaag atggtccagc tgttaacgct tctttgatcg acttgggtca cagattcaga
1200ttgttggtca acgtcgttga cggtgtcaag gttgaacaac caatgccaaa gttgccagtc
1260gctagagttt tgtggaagcc acaaccatct ttgagagaat ctgctgaagc ctggatcttg
1320gctggtggtg ctcaccacac tgttttgtct tacgctatga ccgctgaaca cttgtccgac
1380tgggctgaaa tgaccggtat cgaagctgtc gttatcgaca aggatactac catcccaaga
1440ttcaagaacg aattgagatg gtctgaagct gcttacagat tgagatga
1488
22 1488 DNA unknown Sequence from the human microbiome dataset
22atgaagttga agccacactc tttctggttc gttactggtt cccaacactt gtacggtcca
60gaaaccttgg aagaagtcgc tggtcactcc agaatcatcg ctgaacaatt ggacaaggac
120ccagctatcg gtttcccagt tgtcttcaag ccaatcgtta ccactccaga cgaaatctac
180aagttgatct tggctgctaa cggtgacgaa acttgtgctg gtatcatcac ttggatgcac
240accttctctc cagctaagat gtggatcgct ggtttgtccc aattgcaaaa gccattgttg
300cacttccaca ctcaattcaa cagagacatc ccttgggaaa ccatcgacat ggatttcatg
360aacttgaacc aatctgctca cggtgacaga gaatacggtc acatcggtgc tagattgggt
420atcaacagaa agatcgttgt cggtcactgg gaagacgaag aagtcagagc ttctttggct
480ggttggatga gaactgctgt tgcttacgct gaatccagac aattgaaggt cgctagattc
540ggtgacaaca tgagagaagt tgctgtcacc gaaggtgaca aggttgaagc tcaaatcaag
600ttcggttggt ctgttaacgg ttacggtgtc ggtgacttgg ttcaagtctt gaacgaagtc
660accgatgctg aagctgaagc cttgttgaag gaatacgctg aacaatacac tatcacccaa
720gctggtttgt cttccggtcc aatcagagac tccatcgctt accaagccaa gttggaaatc
780gctatgaaga gattcttgga acaaggtggt ttcggtgctt tcaccactac cttcgaagac
840ttgcacggtt tgaagcaatt gccaggtttg gctgttcaaa gattgatgga agctggttac
900ggtttcggtg gtgaaggtga ctggaagact gctgctttga ccagagtttt gaaggtcttg
960gctaacaaca agtctacttc cttcatggaa gactacacct accacttcga accaggtaac
1020cacatgatct tgggtgctca catgttggaa gtttgtccaa ctatcgctgc tactaagcca
1080accatcgaag tccacccatt gggtatcggt ggtaaagctg acccagctag aatggttttc
1140gatggtcaag ctggtccagc tgttaacgct tctttggttg acttgggtca cagattcaga
1200ttgttggtca acgttgtcga tggtgttaag gtcgaaaagc caatgccaaa gttgccagtt
1260gctagagtct tgtggaagcc acaaccatct ttgagagaat ctgctgaagc ctggatcttg
1320gctggtggtg ctcaccacac tgttttgtct tacgctatca ccgctgaaaa cttgtccgac
1380tgggctgaaa tggttggtat cgaagctgtc atcatcgaca aggatacctc tgtcccaaga
1440ttcaagaacg aattgagatg gtccgacgct gcttacagat tgagatga
1488
23 1488 DNA unknown Sequence from the human microbiome dataset
23atgcaaagaa ctccatacga attctggttc gtcactggtt cccaacactt gtacggttcc
60gaagctttgg ctgaagtctc ctcccactcc agacaaatca ctcaagcctt caacgaagct
120gactctatct ccttcccaat cgttgtcaag ccagttgtca agaccccaga agaaatcttg
180caattgtgta tggaagctaa ctctgacgaa aactgtgctg gtttgatcac ttggatgcac
240accttctctc caggtaaaat gtggatcggt ggtttgtccc aattgcacaa gccattgttg
300cacttccaca ctcaattcca cagagaaatc ccttgggaca gaatcgacat ggatttcatg
360aacttgcacc aatctgctca cggtgacaga gaatttggtt tcatcgctac cagattgggt
420atcttgagaa aggaagttgt cggtcactgg agagacgaag ctgttcaaaa gagattgtct
480gattggatga gaactgctat cgcttgtttg gaaggtaaaa agttgaaggt tgctagattc
540ggtgacaaca tgagaagagt tgctgtcacc gaaggtgaca aggtcgaagc tcaaatccaa
600ttcggttggt ctatcaacgg ttacggtgtt ggtgacttgg tccaaagaat cactgacatc
660tccgataccg ctgttcacca attgttcaga gaataccaag aaagatacga cttcccacca
720gaagctagag aagctggtcc aatcagagat tctatcttgg aacaagctag aatcgaattg
780ggtttgaagt tgttcttgag agaaggtggt tactccgctt tcaccactac cttcgaagac
840ttgcacggtt tgaagcaatt gccaggtttg gctgtccaaa gattgatgtc tgaaggttac
900ggtttcggtg ctgaaggtga ctggagaact gctggtttgt tgagaatgat gaagatcatg
960gctgacaacg aaggtacttc cttcatggaa gattacacct accacttgga accaggtaac
1020gaaatgatct tgggtgctca catgttggaa gtttgtccaa ccatcgctgc tcaaagacca
1080ggtatcagag tccacccatt gtctatcggt ggtaaagctg acccagctag attggttttc
1140gatggtagac caggtccagc tttgaacgtc tctttgatcg acttgggtaa cagattcaga
1200ttgttgatca acaaggttga tgctgtccac ccaaagtccg ctatgccaca cttgccagtt
1260gctagagtct tgtggaagcc aagaccatct ttgcacgact ctgctgaagc ctggatgtac
1320gctggtggtg ctcaccacac tgttttctct taccacgtca ctaccgaaca attgttggac
1380tgggctgaat gggttgatat ggaagccttg gttatcgacg aacaaacctc cttgtcttcc
1440ttcagaagac aattgaagtg gaacgacgct tactacagaa tcagatga
1488
24 1500 DNA unknown Sequence from the human microbiome dataset
24atgttgaaga ctaagaacta ccaattctgg ttctgtactg gttcccaaga tttgtacggt
60gacgaatgtt tggctcacgt tgctgaacac gctaagaaga tcgttgaagc cttgaacgct
120tctggtaact tgccatacga agttgtctgg aagccaacct tgatcactaa cgaattgatc
180agaagaacct tcaacgaagc taacactgac gaaaactgtg ctggtgtcat cacctggatg
240cacactttct ctccagctaa gtcttggatc ttgggtttgc aagagttcag aaagccattg
300ttgcacttgc acacccaatt caacagagaa atcccatacg acactatcga catggatttc
360atgaacgaaa accaatctgc tcacggtgac agagaatttg gtcacatctt ctccagattg
420cacatgaaca gaaaggttgt cgttggttac tgggctgacg aagatgttca aaagcaaatc
480ggttcttgga tgagaaccgc tgttggtgtc gttgaatctt cccacatcag agtcatgaga
540atcgctgaca acatgagaaa cgtcgctgtt actgaaggtg acaaggttga agctcaaatc
600aagttcggtt gggaagttga cgcttaccca gtcaacgaag ctgtcgaagc tgttaacgct
660gtctcccaag ctgacatcga taccttggtc gaagaatact acgacaagta cgaaatcttg
720ttggaaggta gagatgaaaa ggagttcaga agacacgtcg ctgttcaagc tggtatcgaa
780atcggtttgg aaagattctt ggaagaaaac aactaccaag ctatcgttac tcacttcggt
840gacttgggtg gtttcaagca attgccaggt ttggctatgc aaagattgat ggaaaagggt
900tacggtttcg gtgctgaagg tgactggaag accgctgcta tggtcagatt gatgaagatc
960atgactggtg gtatgaagga cgctaagggt acttctttca tggaagatta cacttacaac
1020ttggttccag gtaaagaagg tatcttggaa gctcacatgt tggaagtctg tccaaccatc
1080gctgacggta aaatctctat caaggaacaa ccattgtcta tgggtgacag agaagatcca
1140gctagattgg ttttcactgc taaggaaggt ccagctatcg ctgcttcttt gatcgacttg
1200ggtgacagat tcagattgtt gatcaacgaa gttgaatgta agaagaccga aaagccaatg
1260ccaaagttgc cagtcgctac cgctttctgg actccaaagc caaacttgaa gatcggtgct
1320caatcctgga tcttggctgg tggtgctcac cacactgctt tctcttacga cttgtccgct
1380gaacaaatgg gtgactgggc tgaagctatg ggtatcgaag ctgtctacat cgacgctgat
1440accactatca gacaattgaa gaacgaattg agatggaacg aattggctta cagaagatga
1500
25 1497 DNA unknown Sequence from the human microbiome dataset
25atgaagactg gtagagatta caagttctgg ttctgtactg gttcccaaga tttgtacggt
60gaagaatgtt tgagaaaggt tgctgaacac tccgctaaga tcgtcgaagg tttgaacgct
120tctggtagat tgccattcga agttgtcttg aagccaacct tgatcgatcc agctaccatc
180agaagaactt tgaacgaagc taacgaagac ggtgaatgtg ctggtgttat cacctggatg
240cacactttct ccccagctaa gatgtggatc ttgggtttga aggaatacag aaagccattg
300tgtcacttgc acacccaatt caacgaagaa atcccatacg atactatcga catggatttc
360atgaacgaaa accaatccgc tcacggtgac agagaatttg gtcacatggt ttccagaatg
420ggtatggaaa gaaagatcat cgtcggtcac tgggctaacg ctgaagttca agaaaagatc
480ggttcttgga tgagaaccgc tatcggtatc atggaatctt cccacatcag agtctgtaga
540atcggtgaca acatgaacaa cgttgctgtc actgaaggtg acaaggtcga agctgaagtc
600aagttcggtt gggaaatcga tcactactgt gttaacgacg ctgttgaata cgtcaacgct
660gtttccgaag gtgacgtcaa cgctttggtt gaagaatact actctaagta ccaaatcttg
720ttggaaggta gagacccaga agagttcaga gctcacgtcg ctgctcaagc taagatcgaa
780atcggtttgg aaaagttctt ggaagacggt gactaccacg ctatcgttac ccacttcggt
840atgttgggtg gtttgcaaca attgccaggt ttggctatcc aaagattgat ggaaaagggt
900tacggtttcg gtggtgaagg tgactggaag actgctgcta tggtcagatt gatgaagatc
960atggctgctg gtgttccagg tgctaagggt acttctttca tggaagacta cacttacaac
1020ttggtcccag gtaaagaagg tatcttgcaa gctcacatgt tggaagtctg tccatctatc
1080gctgaaggtc caatctccat caaggttcaa ccattgtcta tgggtaacag agaagaccca
1140gctagattgg ttttcacctc caagactggt ccagctgtcg ctacctcttt ggttgatttg
1200ggtaacagat tcagattgat catcaacgct gttgactgta agaagtgtga aaaggaaatg
1260ccaaagttgc cagttgctac cgctttctgg actccacaac cagacttggc tactggtgct
1320caagcctgga tcttggctgg tggtgctcac cacaccgctt tctcctacga cttgactgtc
1380gatcaaatgg ttgactgggc tgctgctatg ggtatcgaat ctgttgtcat cgacaaggat
1440accactatca gaaacttcaa gaacgaattg agatggaact ctatctacta cagatga
1497
26 1497 DNA unknown Sequence from the cow rumen metagenome dataset
26atgatccaaa ctaaggctta caagttctgg ttctgtactg gttcccaaga tttgtacggt
60gacgaagttt tgagacacgt tgctgatcac tctaaggaaa tcgttgaaga attgaacaag
120tccggtatct tgccatacga agttgtctgg aagccagtct tgatcaccaa ccaattgatc
180agacaaactt tcaacgaagc taacgctgac gattcttgtg ctggtgttat cacctggatg
240cacactttct ctccagctaa gtcttggatc ttgggtttgc aagagttcag aaagccattg
300ttgcacttgc acacccaata caacgaagaa atcccatacg atactatcga catggatttc
360atgaacgaaa accaagctgc tcacggtgac agagaatacg gtcacatcgt ttccagaatg
420ggtatcgaaa gaaaggtcat cgctggttac tggaaggaca acgaagttag atccagaatc
480gcttcctgga tgagaaccgc tgttggtgtc atggaatctt cccacatcag agttatgaga
540gtcgctgata acatgagaaa cgttgctgtc actgaaggtg acaaggttga agctcaaatc
600aagttcggtt gggaagttga cacctaccca gtcaacgaaa tcgctgattc tgttgctact
660gtctctgctt ccgacgtcaa cgctttgttg gacgaatact acgataagta cgaaatcatc
720ttggacggta gagacccaga tgagttcaag aagcacgttg ctgtccaagc tcaaatcgaa
780ttgggtttcg aaagattctt ggaagaaaag aactaccaag ctatcgttac ccacttcggt
840gacttgggtg ctttgggtca attgccaggt ttggctatcc aaagattgat ggaaaagggt
900tacggtttcg gtgctgaagg tgactggaag gttgctgcta tggtcagatt gatgaagatc
960atgacctctg gtatgaagga tgctaagggt acttccatgt tggaagacta cacttacaac
1020ttggttagag gtaaagaagg tatcttggaa gctcacatgt tggaaatctg tccaactatc
1080gctgacggtc caatctctat cagagtcaag ccattgtcta tgggtgacag agaagatcca
1140gctagattgg ttttcacctc taaggaaggt aaaggtgtcg ctacttcctt gatcgacttg
1200ggtaacagat tcagattgat catcaacgaa gttgaatgta agaagaccga aaagccaatg
1260ccaaacttgc cagtcgctac cgcttactgg actccatacc cagacttgta cactggtgct
1320gaagcctgga tcttggctgg tggtgctcac cacaccgctt tctcttacga cttgacttcc
1380ggtcaaatgg ctgattgggc tgaaatgatg ggtatcgaag ctgttatcat cgataagaac
1440accactatcc cagctttcaa gaaggaattg aagttgggtg acgtcttcta cagatga
1497
27 1434 DNA unknown Sequence from the cow rumen metagenome dataset
27atgaagttct ggttcgttac tggttcccaa ttcttgtacg gtgaagaaac cttgagacaa
60gttgaagaag actctaagaa gatcgttgac ggtttgagat tgccattccc agttgaatac
120aagttgaccg tcaagactga atctgaaatc gaaagaatcg ttaaggaagc taactacgac
180gatgaatgtg ctggtatcat caccttctgt cacactttct ctccatccaa gatgtggatc
240aacggtttgg ctttgttgca aaagccttgg ttgcacttcc acacccaatt caacgaaact
300atcccaaacg aagctatcga catggattac atgaacttgc accaatccgc tcacggtgac
360agagaacacg gtttcatcgg tgctagattg agagttccaa gagctgttgt cgctggttac
420tggaaggacc cagctgtcca agctaagatc ggtgaatggc aaagagctgc tgttggtgtc
480atgttctcca gatccttgaa gatcgtcaga ttcggtgaca acatgagaga agttgctgtc
540accgaaggtg acaagatcga agctcaattg agattgggtt ggcaagttaa caccttcgct
600gttggtgact tggtcgaata catggacgct gtcactgatg ctgaaatcga cgctttgatg
660aaggaatacg ctgaattgta cgaattttct gaagctgaca ccgatactat cagataccaa
720gctagagaag aaatcgctat cgaaaagatt ttggttagag aaggtgctaa ggctttctcc
780aacaccttcg aagacttgca cggtatgaag caattgccag gtttggctac tcaacacttg
840atgcacaagg gttacggttt cggtgctgaa ggtgactgga agaccgctgg tatgactgct
900atcgtcaagg ctatgtaccc agacggtaac acctctttca tggaagatta cacttacgac
960tacgaaagac aattgatctt gggttctcac atgttggaag tttgtccatc catcgctgct
1020gatagaccaa gaatcgaagt ccacaagttg ggtatcggtg gtaaagacgc tccagctaga
1080atcgttttcg aaggtagagc tggttctgct aaggtcttgt ccttgatcga tatcggtggt
1140agattcagat tgatccaaca agacatcgaa tgtgaaaagc cattccaatc tatgccaaac
1200ttgccagttg ctagaaccat gtggagacca gctccatcct tcttggaagg tttggaatgt
1260tggatcatcg ctggtggtgc tcaccacact gttttgtctt acgacatcac cgatgaaact
1320gtcagagatt tcgctagaat catgggtatc gaattggttg tcatcaacaa ggacaccact
1380aaggaaaagt tggaaagaga tatcatgatc ggtgacgtca tctacggtag atga
1434
28 1434 DNA unknown Sequence from the cow rumen metagenome dataset
28atgaagttct ggttcatcac tggttcccaa ttcttgtacg gtgaagaaac tatcagacaa
60gtcgaagaag attccaagaa gatcgtcgac ggtttgaagt tgccattccc agttgaatac
120aagttgaccg tcaagaagga atctgaaatc gaaagaatcg ttaaggaagc taacttcgac
180gatgaatgtg ctggtatcat caccttctgt cacactttct ctccatccaa gatgtggatc
240aacggtttgg ctatcttgca aaagccttgg ttgcacttcc acacccaatt caacgaaact
300atcccaaacg aagctatcga catggcttac atgaacttgc accaatctgc tcacggtgac
360agagaacacg gtttcatcgg tgctagattg agaatgccaa gagctgttgt cgctggttac
420tggaaggacc cagaagttca agctaagatc gctgaatggc aaagagctgc tgttggtgtc
480atgttctcta agtccttgaa gatcgtcaga ttcggtgaca acatgagaga agttgctgtc
540accgaaggtg acaagatcga agctcaattg aagttgggtt ggcaagtcaa caccttcgct
600gttggtgact tggtcgaata catgaacgct gttactgacg ctgaaatcga tgtcttgatg
660aaggaatacg ctgaattgta cgactacgat aaggctgacg aagaaactat cagataccaa
720gctagagaag aaatcgctat cgaaaagatt ttggttagag aaggtgctaa ggctttctct
780aacaccttcg aagacttgca cggtatgcaa caattgccag gtttggctac tcaacacttg
840atgcacaagg gttacggttt cggtgctgaa ggtgactgga agaccgctgg tatgactgct
900atcgtcaagg ctatgtaccc agacggtaac acctccttca tggaagacta cacttacgat
960tacgaaagaa agttgatctt gggttctcac atgttggaag tttgtccatc catcgctgct
1020gacagaccaa gaatcgaagt ccacccattg ggtatcggtg gtaaagaacc accagctaga
1080atcgttttcg aaggtaaagc tggttctgct aaggtcttgt ccttgatcga catcggtggt
1140agattgagat tgatccaaca agatatcgaa tgtgaaaagc cattccaatc tatgccaaac
1200ttgccagttg ctagaactat gtggagacca gctccatcct tcttggaagg tttggaatgt
1260tggatcatcg ctggtggtgc tcaccacacc gttttgtctt acgacatctc cgatgaaact
1320gtcagagact tcgctagaat catgggtatc gaattggttg tcatcaacaa ggataccact
1380aaggaaaagt tggaaagaga catcatgatc ggtgacatga tctacggtag atga
1434
29 1497 DNA unknown Sequence from the cow rumen metagenome dataset
29atgtccgaaa tgaagaagta ccaattctgg ttctgtactg gttcccaaga tttgtacggt
60gacgaatgtt tggctcacgt tgctgctcac tctaaggaaa tggtcgaagg tttgaacaag
120tccggtgtct tgccattcga aatcgtttgg aagccaacct tgatcactaa cgaattgatc
180agaaagacct tcaacgaagc taacaacgac ccaaactgtg ctggtgttat cacctggatg
240cacactttct ctccagctaa gtcttggatc ttgggtttgc aagagttcag aaagccattg
300ttgcacttgc acacccaata caacgaagaa atcccatacg ctactatgga catggatttc
360atgaacgaaa accaagctgc tcacggtgac agagaatacg ctcacatctt gtccagaatg
420agaatcgaaa gaaaggttgt cgttggtttc tggaaggatt ctgaagtcca aaagaagatc
480gcttcctgga tgagaaccgc tatcggtatc atggaatctt cccacatcag agtctgtaga
540gttgctgaca acatgagaaa cgtcgctgtt actgaaggtg acaaggtcga agctcaattg
600aagttcggtt gggaaatcga cgcttaccca gttaacgaaa tcgctgaagc tgtcgctgct
660gtttctgctt ccgacaccaa cgctttggtc gatgaatact actctaagta cgacatctgt
720ttggaaggta gagatccaga agagttcaag aagcacgtcg ctgttcaagc tcaaatcgaa
780atcggtttcg aaagattctt gaaggaaaag aactaccaag ctatcgttac tcacttcggt
840gacttgggtg ctttgaagca attgccaggt ttggctatcc aaagattgat ggaaaagggt
900tacggtttcg gtgctgaagg tgactggaag gtcgctgcta tggttagatt gatgaagatc
960atgtctgctg gtatgaagga cgctaagggt tcttccatgt tggaagatta cacctacaac
1020ttggtcaagg gtaaagaagg tatcatccaa gctcacatgt tggaaatctg tccatctatc
1080tccgacggtc caatccaaat caagtgtcaa ccattgtcta tgggtgacag agaagatcca
1140gctagattgg ttttccaatc taagaccggt gctggtatcg ctacttcctt gatcgacttg
1200ggtaacagat tcagattgat catccaagat gtcgaatgta agaaggttga aaagccattg
1260ccaaagttgc caaccgctat caacttctgg actccacaac cagacttcta caccggtact
1320gaagcctggt tgttggctgg tggtgctcac cacaccgctt tctcttacga catcactgct
1380gaacaaatgg gtgactgggc tgctgctatg ggtatcgaag ctgtcttcat cgacaagaac
1440actaacatca gagacttcaa gaaggatttg atgttgggtg aagttttcta cagatga
1497
30 1464 DNA unknown Sequence from the cow rumen metagenome dataset
30atgcaaagag aattctggtt catcgtcggt tcccaattct tgtacggtca agatgttttg
60gacactgttg atgctagagc tagagaaatg gctgctgaat tgtctaaggt cttgccatac
120ccattggtct acaaggttac cgctaagact aacaaggaaa tcgctgacac tgttaaggaa
180gctaactaca gagatgaagt catgggtatc gttacctggt gtcacacttt ctctccatcc
240aagatgtgga tcaacggttt ggtcaacttg caaaagccat actgtcactt ggctacccaa
300tacaacagag aattgccaaa cgaagaaatc gacatcgatt tcatgaactt gaaccaagct
360gctcacggtg acagagaaca cggtttcatc gctgctagat tgagaatgcc aagaaaggtc
420atcgctggtt actggcaaga cgaaaaggtt cacaagagat tgtctgattg gatgaaggct
480gctgttggtg ttgacgtttc caagcacatg aaggtcatga gattcggtga caacatgaga
540gaagtcgctg ttaccgaagg tgacaaggtc gaaactcaaa tcaagttggg ttggcaagtt
600aacacttggg ctgtcggtga cttggttaag gaaatgaaca acgttaccga agctgaaatc
660gacgctttgt tcgctgaata cgaagctcaa tacgacatcg ctactgataa cttggctgct
720atcagatacc aagctaagga agaaatcgct atgaagaaga tgttggatag agaaggttgt
780aaggctttct ctaacacctt ccaagacttg tacggtatgg aacaattgcc aggtttggct
840tcccaacact tgatggctca aggttacggt tacggtggtg aaggtgactg gaaggtctct
900gctatgactg ctatcttgaa ggctatgggt gaaaacggta acggtgcttc cgctttcatg
960gaagactaca cctaccactt ggtcgaaggt caagaatact ctttgggtgc tcacatgttg
1020gaagtttgtc catccttggc tgctgacaag ccaagaatcg aaactcacca cttgggtatc
1080ggtatgaacg aaaaggaccc agctagattg gtcttcgaag gtaaagctgg taaaggtatc
1140gttacctctt tgatcgatat gggtggtaga atgagattga tcgtccaaga catcgaagct
1200gttaagccaa tcttgccaat gccaaacttg ccagtcgcta gagttatgtg gagagctatg
1260ccagacttga ccactggtgt tgaatgttgg atcaccgctg gtggtgctca ccacactgtc
1320ttgtctttcg acgttacccc agctatgttg agagactggg ctagaatgat ggatatcgaa
1380tttgtctaca tcactaagga taccactcca gaagaattgg aagaagaatt gttgatcaag
1440gacttggttt ggaagttgaa gtga
1464
SEQ ID NO: 31; length: 1500; type: DNA; organism: unknown; other information: Sequence from the human microbiome dataset
Sequence 31:
atgttgaaga ctaagaacta ccaattctgg ttctgtactg gttcccaaga tttgtacggt
60gacgaatgtt tggctcacgt tgctgaacac tctaagatca tcgttgacgc tttgaacaag
120tccggtaact tgccatacga agttgtctgg aagccaacca tgatcactaa cgaagttatc
180agaaagacct tcaacgaagc taacactgac gaaaactgtg ctggtgtcat cacctggatg
240cacactttct ctccagctaa gtcttggatc ttgggtttgc aagaatacag aaagccattg
300ttgcacttgc acacccaatt caacagagaa atcccatacg acactatcga catggatttc
360atgaacgaaa accaagctgc tcacggtgac agagaatacg gtcacatctt ctccagattg
420aacatggaaa gaaaggttgt cgctggttac tgggaagacg aagatgttca aaagcaaatc
480ggttcctgga tgagaaccgc tgtcggtgtt gtcgaatctt cccacgttag agtcatgaga
540gttgctgaca acatgagaaa cgttgctgtc actgaaggtg acaaggtcga agctcaaatc
600aagttcggtt gggaagttga cgcttaccca gtcaacgaag ttgtcgaagc tgttaacgct
660gtctctcaag ctgacatcga taccttggtt gaagaatact acgacaagta cgatatcttg
720ttggaaggta gagacgaaaa ggagttcaga gaacacgttg ctgtccaagc tggtatcgaa
780ttgggtttcg aaagattctt ggacgaaaac aactaccaag ctgttgtcac tcacttcggt
840gacttgggtg gtttgaagca attgccaggt ttggctatgc aaagattgat ggaaaagggt
900tacggtttcg gtgctgaagg tgactggaag accgctgcta tggttagagt catgaagatc
960atgactcaag gtatgaagga cgctaagggt acttctttca tggaagatta cacttacaac
1020ttggtttccg gtaaagaagg tgtcttggaa gctcacatgt tggaagtctg tccaaccatc
1080gctgacggta aaatctctat caaggaacaa ccattgtcta tgggtaacag agaagaccca
1140gctagattgg ttttcacctc taagactggt ccagctatcg ctacctcctt gatcgacttg
1200ggtgacagat tcagattgat catcaacgac gtcgattgta agaagactga aaagccaatg
1260ccaaagttgc cagttgctac cgctttctgg actccacaac caaacttgaa ggtcggtact
1320gaagcctgga tcttggctgg tggtgctcac cacaccgctt tctcttacga cttgactgct
1380gaacaaatgg gtgactgggc tgcttgtatg ggtatcgaag ctgtttacat cgacaaggat
1440accactatca gacaattcaa gaacgaattg ttgtggaact ctgtcgctta cagaaagtaa
1500
SEQ ID NO: 32; length: 1497; type: DNA; organism: unknown; other information: Sequence from the human microbiome dataset
Sequence 32:
atgactggtg ttaagaacta caagttctgg ttctgtactg gttcccaaga tttgtacggt
60gaagaatgtt tggctcacgt cgctgaacac tccagaatca tcgttgaatc tttgaacaga
120tccggtatct tgccatacga agttgtctgg aagccaacct tgatcactaa cgaattgatc
180agaagaacct tcaacgaagc taacgctgac gaagaatgtg ctggtgtcat cacctggatg
240cacactttct ctccagctaa gtcttggatc ttgggtttgc aagagttcag aaagccattg
300atgcacttcc acacccaatt caacagagaa atcccatacg acactatcga catggatttc
360atgaacgaaa accaatccgc tcacggtgac agagaatacg gtcacatggt tactagaatg
420ggtatcgaaa gaaaggttat cgtcggtcac tggtctgacg aaaaggttgt cggtagaatc
480gctggttgga tgagaaccgc tgttggtatc atggaatctt cccacgtcag agttgtcaga
540ttcgctgaca acatgagaaa cgttgctgtc actgaaggtg acaaggttga agctcaagtc
600aagttcggtt gggaagttga cgcttaccca gtcaacgaat tgtgtcaata cgttaaggct
660gtcccaaagg gtgacatcac cgctttggtc gatgaatact actccaagta cactatcttg
720ttggaaggta gagacccaga agagttcaag agacacgttg ctgtccaagc tcaaatcgaa
780gctggtttgg aaagattctt ggttgaaaag gactaccacg ctatcgtcac ccacttcggt
840gacttgggtg aattgcaaca attgccaggt ttggctatcc aaagattgat ggaaaagggt
900tacggtttcg gtggtgaagg tgactggaag actgctgcta tggttagatt gatgaagatc
960atggctcaag gtgtcaagaa cgctaagggt acttctttca tggaagacta cacttacaac
1020ttggttccag gtaaagaagg tatcttggaa gctcacatgt tggaagtttg tccatctatc
1080gctgacggtg aaatctccat caaggtcaac ccattgtcta tgggtgacag agaagatcca
1140gctagattgg ttttcacctc caagactggt cacggtatcg ctacctcttt ggttgacttg
1200ggtactagat tcagattgat catcaacgat gtcgaatgta gaaagaccga aaaggctatg
1260ccaaagttgc cagtcgctac cgctttctgg actccagaac catctttggc tactggtgct
1320gaagcctgga tcttggctgg tggtgctcac cacaccgctt tctcctacga cttgactgct
1380gaacaaatgg gtgactgggc tgaatctatg ggtatcgaag ttgtctacat cgacaaggat
1440accactatca gaggtttgaa gaacgaaatg agatggaacg gtgctgtcta cagataa
1497
SEQ ID NO: 33; length: 1497; type: DNA; organism: unknown; other information: Sequence from the human microbiome dataset
Sequence 33:
atgatcgctg ttaagaacta caagttctgg ttctgtactg gttcccaaga tttgtacggt
60gacgaatgtt tggctcacgt tgctgaacac tctggtatca tcgttgactc tttgaacaag
120tccggtatct tgccatacga agttgtcttg aagccaacct tgatcactaa cgaattgatc
180agaagaacct tcaacgaagc taacgctgac gaagaatgtg ctggtgtcat cacctggatg
240cacactttct ctccagctaa gtcttggatc ttgggtttgc aagaatacag aaagccattg
300atgcacttcc acacccaatt caaccaagaa atcccatacg actctatcga catggatttc
360atgaacgaaa accaatccgc tcacggtgac agagaatacg gtcacatggt tactagaatg
420ggtatcgaaa gaaaggttat cgtcggtcac tggagagacg aaaaggttgt cggtagaatc
480gctgcttgga tgagaaccgc tgttggtatc atggaatctt cccacgttag agtcgctaga
540ttcgctgaca acatgagaaa cgttgctgtc actgaaggtg acaaggtcga agctcaaatg
600aagttcggtt gggaagttga cgcttaccca gtcaacgaat tggctgaata cgttaaggct
660gtcccaaagg gtgacatcac cgctttggtc gatgaatact actctaagta cactatcttg
720ttggaaggta gagacccaga agagttcaag agacacgttg ctgtccaagc tcaaatcgaa
780gctggtttgg aaaagttctt gttggaaaag gactaccacg ctatcgttac ccacttcggt
840gacttgggtg aattgcaaca attgccaggt ttggctatcc aaagattgat ggaaaagggt
900tacggtttcg gtgctgaagg tgactggaag accgctgcta tggtcagatt gatgaagatc
960atgactcaag gtatgaagga cgctaagggt acttctttca tggaagatta cacttacaac
1020ttggttccag gtaaagaagg tatcttggaa gctcacatgt tggaagtctg tccaactatc
1080gctgacggtg aaatctctat caaggcttgt ccattgtcta tgggtgacag agaagatcca
1140gctagattgg ttttcacctc taagactggt cacggtatcg ctgcttcctt ggttgacttg
1200ggtactagat tcagattgat catcaacgat gtcgaatgta agaagactga aaagccaatg
1260ccaaagttgc cagtcgctac cgctttctgg actccagaac caaacttggc taccggtgct
1320gaatcttgga tcttggctgg tggtgctcac cacaccgctt tctcctacga cttgactgct
1380gaacaaatgg gtgactgggc tgatgctatg ggtatcgaaa ctgtttacat cgacaaggat
1440accactatca gaggtttgaa gaacgaattg agatggaacg ctgctgctta cagataa
1497
SEQ ID NO: 34; length: 1500; type: DNA; organism: unknown; other information: Sequence from the human microbiome dataset
Sequence 34:
atgttgaaga agaaggaata caagttctgg ttctgtactg gttcccaaga tttgtacggt
60gacgaatgtt tggctcacgt tgctgaacac gctaagatca tcgtcgaaaa gttgaacgaa
120tccggtgttt tgccatacga agttgtctgg aagccaacct tgatcactaa cgaattgatc
180agaaagacct tcaacgaagc taacatcgac gatgaatgtg ctggtgtcat cacctggatg
240cacactttct ctccagctaa gtcttggatc ttgggtttgc aagagttcag aaagccattg
300ttgcacttgc acacccaatt caacatggaa atcccatacg acactatcga catggatttc
360atgaacgaaa accaatctgc tcacggtggt agagaatttg gtcacatctt cactagattg
420ggtatcgaaa gaaaggttgt cgttggtcac tggtccgacg aaaaggttca agaaaagatc
480gcttcttgga tgagaaccgc tgtcggtgtt atcgaatctt cccacgtcag agttatgaga
540gtcgctgaca acatgagaaa cgtcgctgtt actgaaggtg acaaggttga agctcaaatc
600aagttcggtt gggaagttga cgcttaccca gttaacgaaa tcgctgaatc tgttgacgct
660gtttccgctg ctgatgtcaa caccttggtt gaagaatact acgacaagta cgaaatcttg
720ttggaaggta gagatccaga agagttcaga aagcacgtcg ctgttcaagc tcaaatcgaa
780ttgggtttcg aaagattctt ggaagaaaag aactaccaag ctatcgtcac tcacttcggt
840gacttgggtg ttttgaagca attgccaggt ttggctatcc aaagattgat gcaaaagggt
900tacggtttcg gtgctgaagg tgactggaag accgctgcta tggtcagaat catgaagatc
960atgactgaag gtatgaagga cgctaagggt acttctatgt tggaagatta cacttacaac
1020ttcgttccag gtaaagaagg tatcttgcaa gctcacatgt tggaaatctg tccatctatc
1080gctgacggtc caatctccat caaggtcaac ccattgtcta tgggtgacag agaagatcca
1140gctagattgg ttttcacctc caaggaaggt aaaggtatcg ctacttcttt gatcgacttg
1200ggtgacagat tcagattgat catcaacacc gttgactgta agaagaacga aaagccaatg
1260ccaaagttgc cagttgctac caacttctgg actccagaac cagacttggc tactggtgct
1320gaagcctgga tcttgtgtgg tggtgctcac cacaccgctt tctcttacga catcactgct
1380gaacaaatgg gtgactgggc tgctatgatg ggtatcgaag ctgtctacat cgacaaggat
1440accactatca gaaacttgaa gaacgaattg agatggaacg aattggcttt cagaaagtaa
1500
SEQ ID NO: 35; length: 1497; type: DNA; organism: unknown; other information: Sequence from the human microbiome dataset
Sequence 35:
atgaaggctg ctaaggatta caagttctgg ttctgtactg gttctcaaga tttgtacggt
60gacgaatgtt tggctcacgt tgctgaacac tccagaatca tcgttgacgc tttgaacaag
120tccggtgttt tgccatacga aatcgtctgg aagccaacct tgatcactaa cgaattgatc
180agaaagacct tcaacgaagc taacgctgac gaaaactgtg ctggtgtcat cacctggatg
240cacactttct ctccagctaa gtcttggatc ttgggtttgc aagagttcag aaagccattg
300ttgcacttcc acacccaatt caacagagaa atcccatacg acactatcga catggatttc
360atgaacgaaa accaagctgc tcacggtgac agagaatacg gtcacatcgt ttccagaatg
420ggtatcgaaa gaaagatcat cgttggttac tgggaagaca gagatgtcca agaaaagatc
480gcttcctgga tgttgaccgc tatcggtatc atggaatctt cccacatcag agtctgtaga
540atcgctgaca acatgagaaa cgttgctgtc actgaaggtg acaaggttga agctcaaatc
600aagttcggtt gggaaatcga cgcttaccca gtcaacgaaa tcgctgaata cgttgctgct
660gtcccacaag gtgaaatcaa cgctttggtt gaagaatact actctaagta cgacatcatc
720ttggaaggta gagatccaca agagttcaga gaacacgttg ctgtccaagc tggtatcgaa
780atcggtttcg aaaagttctt ggaagaaaag aactaccaag ctatcgtcac tcacttcggt
840gacttgggtt ctttgaagca attgccaggt ttggctatcc aaagattgat ggaaaagggt
900tacggtttcg gtggtgaagg tgactggaag accgctgcta tggttagatt gatgaagatc
960atgactgctg gtgtcaagaa cccaaagggt acttctttca tggaagacta cacttacaac
1020ttggttccag gtaaagaagg tgtcttggaa gctcacatgt tggaagtttg tccatctgtc
1080gctgatggtc caatcggtat caaggtttgt ccattgtcta tgggtgacag agaagatcca
1140gctagattgg tctacacctc taagactggt ccagctatcg ctacctcctt gatcgacttg
1200ggtaacagat tcagattgat catcaacgaa gttgaatgta agaaggtcga aaagccaatg
1260ccaaagttgc cagttgctac cgctttctgg actccatacc cagacttgaa gactggtgct
1320gaagcctgga tcttggctgg tggtgctcac cacaccgctt tctcttacga cttgactgct
1380gaacaaatgg gtgactgggc tgctgctatg ggtatcgaag ctgtttacat cgacaaggat
1440accactatca gaaacttcaa gagagacttg caattgggta acatcgtcta cagataa
1497
SEQ ID NO: 36; length: 1497; type: DNA; organism: unknown; other information: Sequence from the human microbiome dataset
Sequence 36:
atggttactg gtagaaacta caagttctgg ttctgtactg gttcccaaga tttgtacggt
60gacgaatgtt tgagaaaggt tgctgaacac tccagaatca tcgttgaaga attgaacaag
120tccggtgttt tgccattcga attggtctgg aagccaacct tgatcactaa cgaattgatc
180agaaagacct tcaacgaagc taacgctgac gatgaatgtg ctggtgtcat cacctggatg
240cacactttct ctccagctaa gtcttggatc ttgggtttga aggaatacag aaagccattg
300tgtcacttgc acacccaatt caaccaagaa atcccatacg acactatcga catggatttc
360atgaacgaaa accaatctgc tcacggtggt agagaatacg gtcacatcgt tactagaatg
420ggtatcgaaa gaaaggttat cgtcggtcac tgggctgaca agaaggttca agaaagattg
480gcttcctgga tgagaaccgc tgtcggtatc atggaatctt cccacatcag agtttgtaga
540gtcgctgaca acatgagaaa cgttgctgtc actgaaggtg acaaggttga agctcaaatc
600aagttcggtt gggaagttga cgcttaccca gttaacgaag tctgtgacta cgttaaggat
660gtctctaagg gtgacatcga tgttttggtc gaagaatact acaacaagta cgacatcttg
720ttcgaaggta gagatccaga agagttcaag agacacgttg ctgtccaagc tgctatcgaa
780atcggtttcg aaagattctt ggaagaaaag aactaccaag ctgttgtcac ccacttcggt
840gacttgggtg gtttgcaaca attgccaggt ttggctatgc aaagattgat ggaaaagggt
900tacggtttcg gtgctgaagg tgactggaag accgctgcta tggttagatt gatgaagatc
960atgactgctg gtgtcaagga cgctaagggt acttctttca tggaagatta cacttacaac
1020ttggttccag gtaaagaagg tatcttgcaa tcccacatgt tggaagtttg tccaaccatc
1080gctgacggta aaatcggtat caaggtctgt ccattgtcta tgggtgacag agaagatcca
1140gctagattgt tcacctctaa gactggtcca gctgttgcta cttccttggt tgacttgggt
1200gacagattca gattgatcat caacgacgtt gattgtaaga aggtcgaaaa gccaatgcca
1260aagttgccag tcggttctgc tttctggacc ccacaaccag acttggctac tggtgctgaa
1320gcctggatct tggctggtgg tgctcaccac accgctttct cctacgactt gactgctgaa
1380caaatgggtg actgggctgc tgctatgggt atcgaagctg tttacatcga caaggatacc
1440actatcagaa acttcaagaa cgaattgaga tggaacgaag tcgctttcag aaagtaa
1497
SEQ ID NO: 37; length: 1476; type: DNA; organism: unknown; other information: Sequence from the cow rumen metagenome dataset
Sequence 37:
atggaagaca tcatgaagag agaattttgg ttcatcgttg gttcccaatt cttgtacggt
60caagacgttt tggacactgt tgacgctaga gctaaggaaa tggctgctga attgtctaag
120gtcttgccat acccattggt ctacaaggtt accgctaaga ctaacaagga aatcactgac
180gtcatcaagg aagctaacta cagagatgaa tgtgctggta tcgttacctg gtgtcacact
240ttctctccat ccaagatgtg gatcaacggt ttggctaact tgcaaaagcc atactgtcac
300ttggctaccc aatacaacaa ggaaatccca aacgacgaaa tcgacatgga tttcatgaac
360ttgaaccaag ctgctcacgg tgacagagaa cacggtttca tcgctgctag attgagattg
420ccaagaaagg ttatcgctgg tttctggcaa gacgaaaaga tccacaagag attgtctgat
480tggatgagag ctgctgttgg tgtcgctgtt tccaagaaga tgaaggtcat gagattcggt
540gacaacatga gagaagtcgc tgttactgaa ggtgacaagg tcgaagttca aactaagttg
600ggttggcaag ttaacacctg ggctgtcggt gacttggtta aggaaatggg taaagtcacc
660gaagctgaaa tcgatgcttt ggttgctgaa tacgaagcta actacgacat cgctaccgat
720aacactgctg ctatcagata ccaagctaga gaagaaatcg ctatgaagaa gatgttggac
780agagaaggtt gtagagcttt caccaacact ttccaagatt tgtacggtat ggaacaattg
840ccaggtttgg cttctcaaca cttgatggct caaggttacg gttacggtgg tgaaggtgac
900tggaaggtct ctgctatgac tgctatcttg aaggctatgg gtgaaaacgg taacggtgct
960tccggtttca tggaagacta cacctaccac ttggtcgaag gtcaagaata ctctttgggt
1020gctcacatgt tggaagtttg tccatccttg gctgctgaca agccaagaat cgaaactcac
1080cacttgggta tcggtatgaa cgaaaaggac ccagctagat tggttttcga aggtaaagct
1140ggtaaaggta tcgttgtctc tttgatcgac atgggtggta gattgagatt gatcgtccaa
1200gatatcgaag ctgttaagcc aatcttgcca atgccaaact tgccagtcgc tagagttatg
1260tggagagcta tgccagactt gaccactggt gtcgaatgtt ggatcaccgc tggtggtgct
1320caccacactg tcttgtccta cgacgttacc gctgaacaaa tgagagactg ggctagaatg
1380atggatatcg aatttgttca catcaccaag gacactaccc cagaaaagtt ggaagaagaa
1440ttgttggtta aggatttggt ttggaagttg aagtaa
1476
SEQ ID NO: 38; length: 1467; type: DNA; organism: unknown; other information: Sequence from the cow rumen metagenome dataset
Sequence 38:
atgtctaagg aattttggtt cgtcgtcggt tcccaagatt tgtacggtga agaagttttg
60aagatcgtcg ctgaaagagc tgctgaaatg gctgcttggt tgtctgaaaa gttgccatac
120ccattgatct acaaggtcac tgctatgtct tccaaccaaa tcacctccgt tatgaaggaa
180gctaacttcg acgataactg tttgggtgtt gtcacctggt gtcacacttt ctctccatcc
240aagatgtggt tgactggttt ggacttgttg caaaagcctt ggtgtcactt cgctacccaa
300tacaacttgg aaatcccaaa cgaagaaatc gacatggatt tcatgaactt gaaccaagct
360gctcacggtg acagagaaca cggtttcatc ggtgctagat tgagaaaggc tagaaaggtt
420gtcgctggtt actggaagga cgaaaaggtc atcgctagat tggctgaatt tcaaaaggtt
480gctgtcggtg ttgacgcttc taagcacatg aaggttatga gattcggtga caacatgaga
540gatgtcgctg ttactgaagg tgacaaggtc gaagttcaaa agaagttggg ttgggaagtc
600aacacttggg ctgtcggtga cttggttaag gaaatgaacg ctgtcaccga tgaagaagtt
660gaagccttgt tcaacgaata caaggcttct tacgacatca acactgataa catctacgct
720atcaagtacc aagctagaga agaaatcgct atcaagaaga tgatggacag aaacggttgt
780aaggctttct ccaacacctt ccaagatttg tacggtatgg aacaattgcc aggtttggct
840tctcaacact tgatgtcctt gggttacggt tacggtggtg aaggtgactg gaaggtttct
900gctatgactg ctatcttgaa ggctatgggt gaaaacggta acggtgcttc cgctttcatg
960gaagactaca cctaccactt ggtcaagggt cacgaatact ctttgggtgc tcacatgttg
1020gaagtttgtc catccttggc tgctgacaag ccaagaatcg aaacccacca cttgggtatc
1080ggtatgaacg aaaaggaccc agctagattg gtcttcgaag gtaaagaagg tagaggtatc
1140gttgcttctt tgatcgacat gggtggtaga ttgagattga tcgtccaaga tatcgaagct
1200gttaagccaa tcatgccaat gccaaacttg ccagtcgcta gagttatgtg gagagctttg
1260ccagacttga ctgatggtgt cgaatgttgg atcaccgctg gtggtgctca ccacactgtc
1320ttgtcttacg acgttacccc agaaatgatg agagacttcg ctaagttcat ggatatcgaa
1380tttgttcaca tcgacaagga taccaccgtt gaaaagttgg aagatgaatt gatggttaag
1440gatttggttt ggaagatgaa gggttaa
1467
SEQ ID NO: 39; length: 1467; type: DNA; organism: unknown; other information: Sequence from the cow rumen metagenome dataset
Sequence 39:
atgggtaaca agaagaactt ctggttcgtc gtcggttctc aattcttgta cggtaacgaa
60gttttggaaa ctgtcgctgc tagagctcaa gaaatggctg aaaagatgtc taagtccttg
120ccatacgaat tgaagttcaa gggtatcgtc aagacctggg acgaagctac tcaatacgct
180aaggaagcta acttcgacga taactgttgt ggtgttatca cctggtgtca cactttctct
240ccatccaaga tgtggatcga agccttcaga ttgttgcaaa agccattgtt gcacttcgct
300acccaataca acagatacat cccagacaag gaaatcgaca tggatttcat gaacttgaac
360caagctgctc acggtgacag agaacacggt ttcatcatcg ctagaatgag attgcaacaa
420aagatcgtca ccggtttctg ggaagaccaa ccagttttgg atgaaatcgg tacttggatg
480agagctgctg tcgcttacga cttctccaga aacttgagag ttatgagatt cggtgacaac
540atgagagaag tcgctgttac tgaaggtgac aaggtcgaag ctcaaatcaa gttcggttgg
600caagttaaca cctggccagt cggtaaattg gttgaagaaa tcggtaaagt cactgaagaa
660gaagttgacg aattgttgaa ggtctacacc gatacttacg aattggctac cgacgatatc
720gaaactatca gataccaagc tagagaagaa atcgctatga agaagatgat gaccgctgaa
780ggtgctaacg ctttcgttaa cactttccaa gacttgatcg gtatgaagca attgccaggt
840atcgcttctc aacacttgat ggctcaaggt tacggttacg gtgctgaagg tgactggaag
900ttgtctgctt tggtctccat cgttaagaag atgaccgaag gtatgaccgg tggtacttct
960ttcatggaag actacactta ccacttggac ccaaacgctg aatacgcttt gggtgctcac
1020atgttggaag tctgtccatc catcgctgct gacaagccaa gaatcgaagt tcacccattg
1080ggtatcggtg acagagaaga tccagctaga ttggtcttcg aaggtcacga aggtgacgct
1140gttgtcgtta ccttgatcga tatgggtgaa agattcagaa tgttggtcca agacatccac
1200tgtgttaagc caatctacga aatgccaaac ttgccagtcg ctagagttat gtgggaaggt
1260aaaccatctt tgaacgaagg tttgaagatg tggttgatgg ctggtggtgc tcaccactct
1320gtcttgtcct acgacgctac cccagaaatg ttgaaggact tggctagaat gatggatatc
1380gaatttgttc acatcactgc tgactccaag ccagaagaat ttgaaaagga cttgttcttc
1440gctgatttgg cttggaagtt gaagtaa
1467
SEQ ID NO: 40; length: 1413; type: DNA; organism: unknown; other information: Sequence from the cow rumen metagenome dataset
Sequence 40:
atgaagaaga tttacttcat cactggttcc caagacttgt acggtgaaga cgttttgaag
60actgtcgcta aggactccca agaaatggtt aacttcttgg acgaacaagt cggtgaaaga
120gctgaaatcg aatttttggg tgttgtcaga gactctgaaa tctgtttgga tttcatcttg
180aaggctaact tcgacaagga atgtatcggt atcatcacct ggatgcacac tttctcccca
240gctaagatgt ggatcagagg tttgaaggtt ttgcacaagc caatgttgca cttgcacacc
300caatacaacg aaaagttgcc atacgactct atcgacatgg atttcatgaa cttgaaccaa
360gctgctcacg gtgacagaga atttggtttc atcgctgcta gaatgaacat caagcaacac
420gttttgtccg gttactacaa gaacaaggac ttcatcgaag gtgttaagca atacatcgac
480gtctgtttgt ctatcgatgc tgctaagtac ttgagagtcg ctatgttcgg ttccaacatg
540agagacgttg ctgtcactga cggtgacaga gttcaatctg aaatcgactt cggttggaac
600gtcaactact acggtatcgg tgacttggtt gatatcatca acaaggtcaa ggacgaagaa
660atcgatgctc aattcgaaga atacaagaag agatacacca tcaacaccac taacatcgaa
720gctatcaagg aacaagccaa gtacgaagtt gctttgaaga agttcatcaa gaaggaaaac
780gtccaagcct tcaccgacaa cttccaagat ttgcacggtt tgaagcaatt gccaggtttg
840gctgttcaag acttgatgca agaaggtatc tctttcggtc cagaaggtga ctacaagacc
900ccagctttgt tggctacctt gttgccaatg actaagtaca gaaagggtgc taccggtttc
960atcgaagact acacttacga tttgatcgaa ggtaaagaaa tcgaattggg ttctcacatg
1020ttggaagttc caccatcttt cgctacttcc aagccagaaa tccaagtcag accattgtct
1080atcggtgaca aggctgctcc agctagattg gttttcgatt ccatcgaagg tgaaggtttg
1140caaatcacta tggttgacat gggtactcac ttcagaatca tcgctgctaa gatccaattg
1200gttaagcaac cagctccaat gccaaagttg ccagtcgcta gaatcatgtg gaagcacatc
1260ccaaacttca agatttctac cgaagcctgg atgttgtacg gtggtggtca ccactccgtc
1320atcaccactg ctttgactat cgaagacatc aagttgttcg ctaagttgac tggtactgaa
1380ttgtgtgtta tcgatgaaaa cactaagatt taa
1413
SEQ ID NO: 41; length: 496; type: PRT; organism: Bacillus subtilis
Sequence 41:
Met Leu Gln Thr Lys Asp Tyr Glu Phe Trp Phe Val Thr Gly Ser Gln
1               5                   10                  15
His Leu Tyr Gly Glu Glu Thr Leu Glu Leu Val Asp Gln His Ala Lys
            20                  25                  30
Ser Ile Cys Glu Gly Leu Ser Gly Val Ser Ser Arg Tyr Lys Ile Thr
        35                  40                  45
His Lys Pro Val Val Thr Ser Ser Glu Thr Ile Arg Gln Leu Leu Arg
    50                  55                  60
Glu Ala Glu Tyr Ser Glu Thr Cys Ala Gly Ile Ile Thr Trp Met His
65                  70                  75                  80
Thr Phe Ser Pro Ala Lys Met Trp Ile Glu Gly Leu Ser Ser Tyr Gln
                85                  90                  95
Lys Pro Leu Met His Leu His Thr Gln Tyr Asn Arg Asp Ile Pro Trp
            100                 105                 110
Gly Thr Ile Asp Met Asp Phe Met Asn Ser Asn Gln Ser Ala His Gly
        115                 120                 125
Asp Arg Glu Tyr Gly Tyr Ile Asn Ser Arg Met Gly Leu Ser Arg Lys
    130                 135                 140
Val Val Ala Gly Tyr Trp Asp Asp Glu Glu Val Lys Lys Glu Ile Ser
145                 150                 155                 160
Gln Trp Met Asp Thr Ala Ala Ala Leu Asn Glu Ser Arg His Ile Lys
                165                 170                 175
Val Ala Arg Phe Gly Asp Asn Met Arg His Val Ala Val Thr Asp Gly
            180                 185                 190
Asp Lys Val Gly Ala His Ile Gln Phe Gly Trp Gln Val Asp Gly Tyr
        195                 200                 205
Gly Ile Gly Asp Leu Val Glu Val Met Asn Arg Ile Thr Asp Asp Glu
    210                 215                 220
Val Asp Thr Leu Tyr Ala Glu Tyr Asp Arg Leu Tyr Val Ile Ser Glu
225                 230                 235                 240
Glu Thr Lys Arg Asp Glu Ala Lys Val Ala Ser Ile Lys Glu Gln Ala
                245                 250                 255
Lys Ile Glu Leu Gly Leu Thr Thr Phe Leu Glu Gln Gly Gly Tyr Ser
            260                 265                 270
Ala Phe Thr Thr Ser Phe Glu Val Leu His Gly Met Lys Gln Leu Pro
        275                 280                 285
Gly Leu Ala Val Gln Arg Leu Met Glu Lys Gly Tyr Gly Phe Ala Gly
    290                 295                 300
Glu Gly Asp Trp Lys Thr Ala Ala Leu Val Arg Met Met Lys Ile Met
305                 310                 315                 320
Ser Gln Gly Lys Arg Thr Ser Phe Met Glu Asp Tyr Thr Tyr His Phe
                325                 330                 335
Glu Pro Gly Asn Glu Met Ile Leu Gly Ser His Met Leu Glu Val Cys
            340                 345                 350
Pro Thr Val Ala Leu Asp Gln Pro Lys Ile Glu Val His Pro Leu Ser
        355                 360                 365
Ile Gly Gly Lys Glu Asp Pro Ala Arg Phe Val Phe Asn Gly Ile Ser
    370                 375                 380
Gly Ser Ala Ile Gln Ala Ser Leu Val Asp Ile Gly Gly Arg Phe Arg
385                 390                 395                 400
Leu Val Leu Asn Glu Val Asn Gly Gln Glu Ile Glu Lys Asp Met Pro
                405                 410                 415
Asn Leu Pro Val Ala Arg Val Leu Trp Lys Pro Glu Pro Ser Leu Lys
            420                 425                 430
Thr Ala Ala Glu Ala Trp Ile Leu Ala Gly Gly Ala His His Thr Cys
        435                 440                 445
Leu Ser Tyr Glu Leu Thr Val Glu Gln Met Leu Asp Trp Ala Glu Met
    450                 455                 460
Ala Gly Ile Glu Ser Val Leu Ile Ser Arg Asp Thr Thr Ile His Lys
465                 470                 475                 480
Leu Lys His Glu Leu Lys Trp Asn Glu Ala Leu Tyr Arg Leu Gln Lys
                485                 490                 495
SEQ ID NO: 42; length: 1491; type: DNA; organism: artificial sequence; other information: Codon-optimized coding region of B. subtilis AI.
Sequence 42:
atgttgcaaa ctaaggatta cgaattctgg ttcgttactg
gttctcaaca cttgtacggt 60gaagaaactt tggaattggt cgatcaacac gctaagtcta
tctgtgaagg tttgtccggt 120gtctcttcca gatacaagat cacccacaag ccagttgtca
cctcttccga aactatcaga 180caattgttga gagaagctga atactctgaa acttgtgctg
gtatcatcac ctggatgcac 240actttctctc cagctaagat gtggatcgaa ggtttgtctt
cctaccaaaa gccattgatg 300cacttgcaca cccaatacaa cagagacatc ccttggggta
ctatcgacat ggatttcatg 360aactctaacc aatccgctca cggtgacaga gaatacggtt
acatcaactc cagaatgggt 420ttgtccagaa aggttgtcgc tggttactgg gacgatgaag
aagtcaagaa ggaaatctct 480caatggatgg acaccgctgc tgctttgaac gaatccagac
acatcaaggt tgctagattc 540ggtgacaaca tgagacacgt tgctgtcact gacggtgaca
aggttggtgc tcacatccaa 600ttcggttggc aagttgacgg ttacggtatc ggtgacttgg
ttgaagtcat gaacagaatc 660accgacgatg aagttgacac tttgtacgct gaatacgata
gattgtacgt catctctgaa 720gaaaccaaga gagacgaagc taaggttgct tccatcaagg
aacaagctaa gatcgaattg 780ggtttgacca ctttcttgga acaaggtggt tactctgctt
tcaccacttc cttcgaagtc 840ttgcacggta tgaagcaatt gccaggtttg gctgttcaaa
gattgatgga aaagggttac 900ggtttcgctg gtgaaggtga ctggaagacc gctgctttgg
tcagaatgat gaagatcatg 960tctcaaggta aaagaacctc cttcatggaa gactacactt
accacttcga accaggtaac 1020gaaatgatct tgggttctca catgttggaa gtttgtccaa
ctgtcgcttt ggaccaacca 1080aagatcgaag ttcacccatt gtctatcggt ggtaaagaag
atccagctag attcgtcttc 1140aacggtatct ctggttccgc tatccaagcc tctttggttg
acatcggtgg tagattcaga 1200ttggttttga acgaagtcaa cggtcaagaa atcgaaaagg
acatgccaaa cttgccagtt 1260gctagagtct tgtggaagcc agaaccatct ttgaagactg
ctgctgaagc ctggatcttg 1320gctggtggtg ctcaccacac ctgtttgtct tacgaattga
ctgtcgaaca aatgttggac 1380tgggctgaaa tggctggtat cgaatctgtt ttgatctcca
gagataccac tatccacaag 1440ttgaagcacg aattgaagtg gaacgaagcc ttgtacagat
tgcaaaagta a 1491
SEQ ID NO: 43; length: 16404; type: DNA; organism: artificial sequence; other information: constructed plasmid
Sequence 43:
aaacgccagc aacgcggcct ttttacggtt cctggccttt tgctggcctt
ttgctcacat 60gttctttcct gcgttatccc ctgattctgt ggataaccgt attaccgcct
ttgagtgagc 120tgataccgct cgccgcagcc gaacgaccga gcgcagcgag tcagtgagcg
aggaagcgga 180agagcgccca atacgcaaac cgcctctccc cgcgcgttgg ccgattcatt
aatgcagctg 240gcacgacagg tttcccgact ggaaagcggg cagtgagcgc aacgcaatta
atgtgagtta 300gctcactcat taggcacccc aggctttaca ctttatgctt ccggctcgta
tgttgtgtgg 360aattgtgagc ggataacaat ttcacacagg aaacagctat gaccatgatt
aggcgcctac 420ttctaggggg cctatcaagt aaattactcc tggtacactg aagtatataa
gggatataga 480agcaaatagt tgtcagtgca atccttcaag acgattggga aaatactgta
atataaatcg 540taaaggaaaa ttggaaattt tttaaagatg tcttcactgg ttactcttaa
taacggtctg 600aaaatgcccc tagtcggctt agggtgctgg aaaattgaca aaaaagtctg
tgcgaatcaa 660atttatgaag ctatcaaatt aggctaccgt ttattcgatg gtgcttgcga
ctacggcaac 720gaaaaggaag ttggtgaagg tatcaggaaa gccatctccg aaggtcttgt
ttctagaaag 780gatatatttg ttgtttcaaa gttatggaac aattttcacc atcctgatca
tgtaaaatta 840gctttaaaga agaccttaag cgatatggga cttgattatt tagacctgta
ttatattcac 900ttcccaatcg ccttcaaata tgttccattt gaagagaaat accctccagg
attctatacg 960ggcgcagaag gattctatac gggcgcagaa ctagtgatct cgaggttcca
gagctcggat 1020ccaccacagg tgttgtcctc tgaggacata aaatacacac cgagattcat
caactcattg 1080ctggagttag catatctaca attgggtgaa atggggagcg atttgcaggc
atttgctcgg 1140catgccggta gaggtgtggt caataagagc gacctcatgc tatacctgag
aaagcaacct 1200gacctacagg aaagagttac tcaagaataa gaattttcgt tttaaaacct
aagagtcact 1260ttaaaatttg tatacactta ttttttttat aacttattta ataataaaaa
tcataaatca 1320taagaaattc gcttactcat cccgggttag atgagagtct tttccagttc
gcttaagggg 1380acaatcttgg aattatagcg atcccaattt tcattatcca catcggatat
gctttccatt 1440acatgccatg gaaaattgtc attcagaaat ttatcaaaag gaactgcaat
tttattagag 1500tcatataaca atgaccacat ggccttataa caaccaccaa gggcacatga
gtttggtgtt 1560tctagcctaa aattaccctt tgtagcacca atgacttgag caaacttctt
cacaatagca 1620tcgtttttag aagccccacc tacaaaaaaa gtcctttctg gccttttatt
taggtagtcc 1680cgcagcggag attcatcgta atcaaacttc acgattgtat cttcgttcag
tctctgttgt 1740gagcttgcgt ttgaatccga aagcagggga gatattctta ccctgcaact
taaagcctgt 1800gattctacaa tatttttggc atcgtgcctc ttgtctttga acttggccac
ctctctttca 1860atcatacccg tttttggatt gaagataacc cttttgttta tggcttttac
gctaggaacg 1920atctccccca gaggaaaata tacacctaat tcattttcac tactttctga
gtcatctagc 1980acagcttgat taaaaagagt ccaatcgtta gtcttctcat aattattttc
ccgttctttg 2040tttaactcgt ctcttatcct ctcccttgcc aaagaaccat tacaataaca
aatcataccc 2100atataatggt ttggcagagt tggatgaatg aaaagatgat agttcggaga
ggggtgatac 2160ttatcggtga ccagaagaac tgtagtactt gttcctaggg aaacgagaac
gtcattcttc 2220cgcaggggta aagaacatat agtggctaaa ttatccccag tcatgggaga
gaccttgcag 2280tttgtattga aaccgtactt ctcaataaaa tatttacaga tggtacccgc
tatcaaattt 2340ttcatgggtg ctctcattaa tttttgtctg atagttttat ccttagaaga
actatcaatt 2400agatgtagta gctcatcact gaattttctt tcacgtatat cataaaggtt
cataccacag 2460gcatctgcct cctctaattc aacaagatgg cccactaaga tagaagtcaa
aaaattagac 2520actaaagaaa tggtctttgt tttttcgtaa gcttctggtt ctaattgtgc
aattttcaga 2580atttgaggac cagtaaatct aaaatgggct ctggaccctg ttaattgagc
cattttttca 2640ggcccaccta tgcactcttc aaactcttga cattgctttg cagtactgtg
gtcttgccaa 2700ttgggggcgg tttgccttgc aaatgctaca gagctcacgt agtgcaataa
atctttttcc 2760ggtttcttat tcaattgctc taacagagat tcggcttggg aggaccagta
gacagacccg 2820tgctgctggc aggaccctga gacggccata actttgttca atggaaattt
agcctcgcga 2880tatttcgaga gaaccagatc tagagcctct aaccacatgg ctacgggaca
ttcgatagtg 2940tcgccgtgta tatagacacc cttctttgtg tgataatgcg gaagatcctt
ttcaaattcc 3000actgtttctg aatggacaat ttttaggtcc tggttaatgg cgagacattt
cagttgttgg 3060gtcgaaagat caaacccaag atagtatgag tctaaagaca ttgtgttgga
aacctctctt 3120gtctgtctct gaattactga acacaacata ctagtcgtac ggttttattt
tttacttata 3180ttgctggtag ggtaaaaaaa tataactcct aggaataggt tgtctatatg
tttttgtctt 3240gcttctataa ttgtaacaaa caaggaaagg gaaaatactg ggtgtaaaag
ccattgagtc 3300aagttaggtc atccctttta tacaaaattt ttcaattttt tttccaagat
tcttgtacga 3360ttaattattt tttttttgcg tcctacagcg tgatgaaaat ttccgcctgc
tgcaagatga 3420gcgggaacgg gcgaaatgtg cacgcgcaca acttacgaaa cgcggatgag
tcactgacag 3480ccaccgcaga ggttctgact cctactgagc tctattggag gtggcagaac
cggtaccgga 3540ggagaccgct ataaccggtt tgaatttatt gtcacagtgt cacatcagcg
gcaactcaga 3600agtttgacag caagcaagtt catcattcga actagcctta ttgttttagt
tcagtgacag 3660cgaactgccg tactcgatgc tttatttctc acggtagagc ggaagaacag
ataggggcag 3720cgtgagaaga gttagaaagt aaatttttat cacgtctgaa gtattcttat
tcataggaaa 3780ttttgcaagg ttttttagct caataacggg ctaagttata taaggtgttc
acgcgatttt 3840cttgttatgt atacctcttc tggcgcgcct ctttttatta accttaattt
ttattttaga 3900ttcctgactt caactcaaga cgcacagata ttataacatc tgcataatag
gcatttgcaa 3960gaattactcg tgagtaagga aagagtgagg aactatcgca tacctgcatt
taaagatgcc 4020gatttgggcg cgaatccttt attttggctt caccctcata ctattatcag
ggccagaaaa 4080aggaagtgtt tccctccttc ttgaattgat gttaccctca taaagcacgt
ggcctcttat 4140cgagaaagaa attaccgtcg ctcgtgattt gtttgcaaaa agaacaaaac
tgaaaaaacc 4200cagacacgct cgacttcctg tcttcctatt gattgcagct tccaatttcg
tcacacaaca 4260aggtcctagc gacggctcac aggttttgta acaagcaatc gaaggttctg
gaatggcggg 4320aaagggttta gtaccacatg ctatgatgcc cactgtgatc tccagagcaa
agttcgttcg 4380atcgtactgt tactctctct ctttcaaaca gaattgtccg aatcgtgtga
caacaacagc 4440ctgttctcac acactctttt cttctaacca agggggtggt ttagtttagt
agaacctcgt 4500gaaacttaca tttacatata tataaacttg cataaattgg tcaatgcaag
aaatacatat 4560ttggtctttt ctaattcgta gtttttcaag ttcttagatg ctttcttttt
ctctttttta 4620cagatcatca aggaagtaat tatctacttt ttacaacaaa tataaaacac
gtacgactag 4680tatgactcaa ttcactgaca ttgataagtt ggccgtctcc accataagaa
ttttggctgt 4740ggacaccgta tccaaggcca actcaggtca cccaggtgct ccattgggta
tggcaccagc 4800tgcacacgtt ctatggagtc aaatgcgcat gaacccaacc aacccagact
ggatcaacag 4860agatagattt gtcttgtcta acggtcacgc ggtcgctttg ttgtattcta
tgctacattt 4920gactggttac gatctgtcta ttgaagactt gaaacagttc agacagttgg
gttccagaac 4980accaggtcat cctgaatttg agttgccagg tgttgaagtt actaccggtc
cattaggtca 5040aggtatctcc aacgctgttg gtatggccat ggctcaagct aacctggctg
ccacttacaa 5100caagccgggc tttaccttgt ctgacaacta cacctatgtt ttcttgggtg
acggttgttt 5160gcaagaaggt atttcttcag aagcttcctc cttggctggt catttgaaat
tgggtaactt 5220gattgccatc tacgatgaca acaagatcac tatcgatggt gctaccagta
tctcattcga 5280tgaagatgtt gctaagagat acgaagccta cggttgggaa gttttgtacg
tagaaaatgg 5340taacgaagat ctagccggta ttgccaaggc tattgctcaa gctaagttat
ccaaggacaa 5400accaactttg atcaaaatga ccacaaccat tggttacggt tccttgcatg
ccggctctca 5460ctctgtgcac ggtgccccat tgaaagcaga tgatgttaaa caactaaaga
gcaaattcgg 5520tttcaaccca gacaagtcct ttgttgttcc acaagaagtt tacgaccact
accaaaagac 5580aattttaaag ccaggtgtcg aagccaacaa caagtggaac aagttgttca
gcgaatacca 5640aaagaaattc ccagaattag gtgctgaatt ggctagaaga ttgagcggcc
aactacccgc 5700aaattgggaa tctaagttgc caacttacac cgccaaggac tctgccgtgg
ccactagaaa 5760attatcagaa actgttcttg aggatgttta caatcaattg ccagagttga
ttggtggttc 5820tgccgattta acaccttcta acttgaccag atggaaggaa gcccttgact
tccaacctcc 5880ttcttccggt tcaggtaact actctggtag atacattagg tacggtatta
gagaacacgc 5940tatgggtgcc ataatgaacg gtatttcagc tttcggtgcc aactacaaac
catacggtgg 6000tactttcttg aacttcgttt cttatgctgc tggtgccgtt agattgtccg
ctttgtctgg 6060ccacccagtt atttgggttg ctacacatga ctctatcggt gtcggtgaag
atggtccaac 6120acatcaacct attgaaactt tagcacactt cagatcccta ccaaacattc
aagtttggag 6180accagctgat ggtaacgaag tttctgccgc ctacaagaac tctttagaat
ccaagcatac 6240tccaagtatc attgctttgt ccagacaaaa cttgccacaa ttggaaggta
gctctattga 6300aagcgcttct aagggtggtt acgtactaca agatgttgct aacccagata
ttattttagt 6360ggctactggt tccgaagtgt ctttgagtgt tgaagctgct aagactttgg
ccgcaaagaa 6420catcaaggct cgtgttgttt ctctaccaga tttcttcact tttgacaaac
aacccctaga 6480atacagacta tcagtcttac cagacaacgt tccaatcatg tctgttgaag
ttttggctac 6540cacatgttgg ggcaaatacg ctcatcaatc cttcggtatt gacagatttg
gtgcctccgg 6600taaggcacca gaagtcttca agttcttcgg tttcacccca gaaggtgttg
ctgaaagagc 6660tcaaaagacc attgcattct ataagggtga caagctaatt tctcctttga
aaaaagcttt 6720ctaaattctg atcgtagatc atcagatttg atatgatatt atttgtgaaa
aaatgaaata 6780aaactttata caacttaaat acaacttttt ttataaacga ttaagcaaaa
aaatagtttc 6840aaacttttaa caatattcca aacactcagt ccttttcctt cttatattat
aggtgtacgt 6900attatagaaa aatttcaatg attacttttt ctttcttttt ccttgtacca
gcacatggcc 6960gagcttgaat gttaaaccct tcgagagaat cacaccattc aagtataaag
ccaataaaga 7020atataactcc taaaaggcta attgaaaccc tgtgattttt gcccgggttt
aaggcgcgcc 7080ctttatcatt atcaatactg ccatttcaaa gaatacgtaa ataattaata
gtagtgattt 7140tcctaacttt atttagtcaa aaaattagcc ttttaattct gctgtaaccc
gtacatgccc 7200aaaatagggg gcgggttaca cagaatatat aacatcgtag gtgtctgggt
gaacagttta 7260ttcctggcat ccactaaata taatggagcc cgctttttaa gctggcatcc
agaaaaaaaa 7320agaatcccag caccaaaata ttgttttctt caccaaccat cagttcatag
gtccattctc 7380ttagcgcaac tacagagaac aggggcacaa acaggcaaaa aacgggcaca
acctcaatgg 7440agtgatgcaa cctgcctgga gtaaatgatg acacaaggca attgacccac
gcatgtatct 7500atctcatttt cttacacctt ctattacctt ctgctctctc tgatttggaa
aaagctgaaa 7560aaaaaggttg aaaccagttc cctgaaatta ttcccctact tgactaataa
gtatataaag 7620acggtaggta ttgattgtaa ttctgtaaat ctatttctta aacttcttaa
attctacttt 7680tatagttagt ctttttttta gttttaaaac accaagaact tagtttcgaa
taaacacaca 7740taaacaaaca ccactagcat ggctgccggt gtcccaaaaa ttgatgcgtt
agaatctttg 7800ggcaatcctt tggaggatgc caagagagct gcagcataca gagcagttga
tgaaaattta 7860aaatttgatg atcacaaaat tattggaatt ggtagtggta gcacagtggt
ttatgttgcc 7920gaaagaattg gacaatattt gcatgaccct aaattttatg aagtagcgtc
taaattcatt 7980tgcattccaa caggattcca atcaagaaac ttgattttgg ataacaagtt
gcaattaggc 8040tccattgaac agtatcctcg cattgatata gcgtttgacg gtgctgatga
agtggatgag 8100aatttacaat taattaaagg tggtggtgct tgtctatttc aagaaaaatt
ggttagtact 8160agtgctaaaa ccttcattgt cgttgctgat tcaagaaaaa agtcaccaaa
acatttaggt 8220aagaactgga ggcaaggtgt tcccattgaa attgtacctt cctcatacgt
gagggtcaag 8280aatgatctat tagaacaatt gcatgctgaa aaagttgaca tcagacaagg
aggttctgct 8340aaagcaggtc ctgttgtaac tgacaataat aacttcatta tcgatgcgga
tttcggtgaa 8400atttccgatc caagaaaatt gcatagagaa atcaaactgt tagtgggcgt
ggtggaaaca 8460ggtttattca tcgacaacgc ttcaaaagcc tacttcggta attctgacgg
tagtgttgaa 8520gttaccgaaa agtgagcggc cgcgtgaatt tactttaaat cttgcattta
aataaatttt 8580ctttttatag ctttatgact tagtttcaat ttatatacta ttttaatgac
attttcgatt 8640cattgattga aagctttgtg ttttttcttg atgcgctatt gcattgttct
tgtctttttc 8700gccacatgta atatctgtag tagatacctg atacattgtg gatgctgagt
gaaattttag 8760ttaataatgg aggcgctctt aataattttg gggatattgg cttttttttt
taaagtttac 8820aaatgaattt tttccgccag gataacgatt ctgaagttac tcttagcgtt
cctatcggta 8880cagccatcaa atcatgccta taaatcatgc ctatatttgc gtgcagtcag
tatcatctac 8940atgaaaaaaa ctcccgcaat ttcttataga atacgttgaa aattaaatgt
acgcgccaag 9000ataagataac atatatctag atgcagtaat atacacagat tcccgcggac
gtgggaagga 9060aaaaattaga taacaaaatc tgagtgatat ggaaattccg ctgtatagct
catatctttc 9120cctccaccgc ggtggtcgac tttcacatac gttgcatacg tcgatataga
taataatgat 9180aatgacagca ggattatcgt aatacgtaat agctgaaaat ctcaaaaatg
tgtgggtcat 9240tacgtaaata atgataggaa tgggattctt ctatttttcc tttttccatt
ctagcagccg 9300tcgggaaaac gtggcatcct ctctttcggg ctcaattgga gtcacgctgc
cgtgagcatc 9360ctctctttcc atatctaaca actgagcacg taaccaatgg aaaagcatga
gcttagcgtt 9420gctccaaaaa agtattggat ggttaatacc atttgtctgt tctcttctga
ctttgactcc 9480tcaaaaaaaa aaatctacaa tcaacagatc gcttcaatta cgccctcaca
aaaacttttt 9540tccttcttct tcgcccacgt taaattttat ccctcatgtt gtctaacgga
tttctgcact 9600tgatttatta taaaaagaca aagacataat acttctctat caatttcagt
tattgttctt 9660ccttgcgtta ttcttctgtt cttctttttc ttttgtcata tataaccata
accaagtaat 9720acatattcaa acttaagact cgagatggtc aaaccaatta tagctcccag
tatccttgct 9780tctgacttcg ccaacttggg ttgcgaatgt cataaggtca tcaacgccgg
cgcagattgg 9840ttacatatcg atgtcatgga cggccatttt gttccaaaca ttactctggg
ccaaccaatt 9900gttacctccc tacgtcgttc tgtgccacgc cctggcgatg ctagcaacac
agaaaagaag 9960cccactgcgt tcttcgattg tcacatgatg gttgaaaatc ctgaaaaatg
ggtcgacgat 10020tttgctaaat gtggtgctga ccaatttacg ttccactacg aggccacaca
agaccctttg 10080catttagtta agttgattaa gtctaagggc atcaaagctg catgcgccat
caaacctggt 10140acttctgttg acgttttatt tgaactagct cctcatttgg atatggctct
tgttatgact 10200gtggaacctg ggtttggagg ccaaaaattc atggaagaca tgatgccaaa
agtggaaact 10260ttgagagcca agttccccca tttgaatatc caagtcgatg gtggtttggg
caaggagacc 10320atcccgaaag ccgccaaagc cggtgccaac gttattgtcg ctggtaccag
tgttttcact 10380gcagctgacc cgcacgatgt tatctccttc atgaaagaag aagtctcgaa
ggaattgcgt 10440tctagagatt tgctagatta gacgtctgtt taaagattac ggatatttaa
cttacttaga 10500ataatgccat ttttttgagt tataataatc ctacgttagt gtgagcggga
tttaaactgt 10560gaggacctta atacattcag acacttctgc ggtatcaccc tacttattcc
cttcgagatt 10620atatctagga acccatcagg ttggtggaag attacccgtt ctaagacttt
tcagcttcct 10680ctattgatgt tacacctgga cacccctttt ctggcatcca gtttttaatc
ttcagtggca 10740tgtgagattc tccgaaatta attaaagcaa tcacacaatt ctctcggata
ccacctcggt 10800tgaaactgac aggtggtttg ttacgcatgc taatgcaaag gagcctatat
acctttggct 10860cggctgctgt aacagggaat ataaagggca gcataattta ggagtttagt
gaacttgcaa 10920catttactat tttcccttct tacgtaaata tttttctttt taattctaaa
tcaatctttt 10980tcaatttttt gtttgtattc ttttcttgct taaatctata actacaaaaa
acacatacat 11040aaactaaaac gtacgactag tatgtctgaa ccagctcaaa agaaacaaaa
ggttgctaac 11100aactctctag aacaattgaa agcctccggc actgtcgttg ttgccgacac
tggtgatttc 11160ggctctattg ccaagtttca acctcaagac tccacaacta acccatcatt
gatcttggct 11220gctgccaagc aaccaactta cgccaagttg atcgatgttg ccgtggaata
cggtaagaag 11280catggtaaga ccaccgaaga acaagtcgaa aatgctgtgg acagattgtt
agtcgaattc 11340ggtaaggaga tcttaaagat tgttccaggc agagtctcca ccgaagttga
tgctagattg 11400tcttttgaca ctcaagctac cattgaaaag gctagacata tcattaaatt
gtttgaacaa 11460gaaggtgtct ccaaggaaag agtccttatt aaaattgctt ccacttggga
aggtattcaa 11520gctgccaaag aattggaaga aaaggacggt atccactgta atttgactct
attattctcc 11580ttcgttcaag cagttgcctg tgccgaggcc caagttactt tgatttcccc
atttgttggt 11640agaattctag actggtacaa atccagcact ggtaaagatt acaagggtga
agccgaccca 11700ggtgttattt ccgtcaagaa aatctacaac tactacaaga agtacggtta
caagactatt 11760gttatgggtg cttctttcag aagcactgac gaaatcaaaa acttggctgg
tgttgactat 11820ctaacaattt ctccagcttt attggacaag ttgatgaaca gtactgaacc
tttcccaaga 11880gttttggacc ctgtctccgc taagaaggaa gccggcgaca agatttctta
catcagcgac 11940gaatctaaat tcagattcga cttgaatgaa gacgctatgg ccactgaaaa
attgtccgaa 12000ggtatcagaa aattctctgc cgatattgtt actctattcg acttgattga
aaagaaagtt 12060accgcttaag gaagtatctc ggaaatatta atttaggcca tgtccttatg
cacgtttctt 12120ttgatactta cgggtacatg tacacaagta tatctatata tataaattaa
tgaaaatccc 12180ctatttatat atatgacttt aacgagacag aacagttttt tattttttat
cctatttgat 12240gaatgataca gtttcggatc cacgatcgca ttgcggatta cgtattctaa
tgttcagtac 12300cgttcgtata atgtatgcta tacgaagtta tgcagattgt actgagagtg
caccatacca 12360cagcttttca attcaattca tcattttttt tttattcttt tttttgattt
cggtttcttt 12420gaaatttttt tgattcggta atctccgaac agaaggaaga acgaaggaag
gagcacagac 12480ttagattggt atatatacgc atatgtagtg ttgaagaaac atgaaattgc
ccagtattct 12540taacccaact gcacagaaca aaaacctgca ggaaacgaag ataaatcatg
tcgaaagcta 12600catataagga acgtgctgct actcatccta gtcctgttgc tgccaagcta
tttaatatca 12660tgcacgaaaa gcaaacaaac ttgtgtgctt cattggatgt tcgtaccacc
aaggaattac 12720tggagttagt tgaagcatta ggtcccaaaa tttgtttact aaaaacacat
gtggatatct 12780tgactgattt ttccatggag ggcacagtta agccgctaaa ggcattatcc
gccaagtaca 12840attttttact cttcgaagac agaaaatttg ctgacattgg taatacagtc
aaattgcagt 12900actctgcggg tgtatacaga atagcagaat gggcagacat tacgaatgca
cacggtgtgg 12960tgggcccagg tattgttagc ggtttgaagc aggcggcaga agaagtaaca
aaggaaccta 13020gaggcctttt gatgttagca gaattgtcat gcaagggctc cctatctact
ggagaatata 13080ctaagggtac tgttgacatt gcgaagagcg acaaagattt tgttatcggc
tttattgctc 13140aaagagacat gggtggaaga gatgaaggtt acgattggtt gattatgaca
cccggtgtgg 13200gtttagatga caagggagac gcattgggtc aacagtatag aaccgtggat
gatgtggtct 13260ctacaggatc tgacattatt attgttggaa gaggactatt tgcaaaggga
agggatgcta 13320aggtagaggg tgaacgttac agaaaagcag gctgggaagc atatttgaga
agatgcggcc 13380agcaaaacta aaaaactgta ttataagtaa atgcatgtat actaaactca
caaattagag 13440cttcaattta attatatcag ttattaccct atgcggtgtg aaataccgca
cagatgcgta 13500aggagaaaat accgcatcag gaaattgtaa acgttaatat tttgttaaaa
ttcgcgttaa 13560atttttgtta aatcagctca ttttttaacc aataggccga aatcggcaaa
atcccttata 13620aatcaaaaga atagaccgag atagggttga gtgttgttcc agtttggaac
aagagtccac 13680tattaaagaa cgtggactcc aacgtcaaag ggcgaaaaac cgtctatcag
ggcgatggcc 13740cactacgtga accatcaccc taatcaagat aacttcgtat aatgtatgct
atacgaacgg 13800tacccgccaa ctctgttcga gaatgatgta atcaagaagg tctcacaaaa
ccatccaggc 13860agtaccactt cccaagtatt gcttagatgg gcaactcaga gaggcattgc
cgtcattcca 13920aaatcttcca agaaggaaag gttacttggc aacctagaaa tcgaaaaaaa
gttcacttta 13980acggagcaag aattgaagga tatttctgca ctaaatgcca acatcagatt
taatgatcca 14040tggacctggt tggatggtaa attccccact tttgcctgat ccagccagta
aaatccatac 14100tcaacgacga tatgaacaaa tttccctcat tccgatgctg tatatgtgta
taaattttta 14160catgctcttc tgtttagaca cagaacagct ttaaataaaa tgttggatat
actttttctg 14220cctgtggtgt catccacgct tttaattcat ctcttgtatg gttgacaatt
tggctatttt 14280ttaacagaac ccaacggtaa ttgaaattaa aagggaaacg agtgggggcg
atgagtgagt 14340gatacggcgc ctgatgcggt attttctcct tacgcatctg tgcggtattt
cacaccgcat 14400atggtgcact ctcagtacaa tctgctctga tgccgcatag ttaagccagc
cccgacaccc 14460gccaacaccc gctgacgcgc cctgacgggc ttgtctgctc ccggcatccg
cttacagaca 14520agctgtgacc gtctccggga gctgcatgtg tcagaggttt tcaccgtcat
caccgaaacg 14580cgcgagacga aagggcctcg tgatacgcct atttttatag gttaatgtca
tgataataat 14640ggtttcttag acgtcaggtg gcacttttcg gggaaatgtg cgcggaaccc
ctatttgttt 14700atttttctaa atacattcaa atatgtatcc gctcatgaga caataaccct
gataaatgct 14760tcaataatat tgaaaaagga agagtatgag tattcaacat ttccgtgtcg
cccttattcc 14820cttttttgcg gcattttgcc ttcctgtttt tgctcaccca gaaacgctgg
tgaaagtaaa 14880agatgctgaa gatcagttgg gtgcacgagt gggttacatc gaactggatc
tcaacagcgg 14940taagatcctt gagagttttc gccccgaaga acgttttcca atgatgagca
cttttaaagt 15000tctgctatgt ggcgcggtat tatcccgtat tgacgccggg caagagcaac
tcggtcgccg 15060catacactat tctcagaatg acttggttga gtactcacca gtcacagaaa
agcatcttac 15120ggatggcatg acagtaagag aattatgcag tgctgccata accatgagtg
ataacactgc 15180ggccaactta cttctgacaa cgatcggagg accgaaggag ctaaccgctt
ttttgcacaa 15240catgggggat catgtaactc gccttgatcg ttgggaaccg gagctgaatg
aagccatacc 15300aaacgacgag cgtgacacca cgatgcctgt agcaatggca acaacgttgc
gcaaactatt 15360aactggcgaa ctacttactc tagcttcccg gcaacaatta atagactgga
tggaggcgga 15420taaagttgca ggaccacttc tgcgctcggc ccttccggct ggctggttta
ttgctgataa 15480atctggagcc ggtgagcgtg ggtctcgcgg tatcattgca gcactggggc
cagatggtaa 15540gccctcccgt atcgtagtta tctacacgac ggggagtcag gcaactatgg
atgaacgaaa 15600tagacagatc gctgagatag gtgcctcact gattaagcat tggtaactgt
cagaccaagt 15660ttactcatat atactttaga ttgatttaaa acttcatttt taatttaaaa
ggatctaggt 15720gaagatcctt tttgataatc tcatgaccaa aatcccttaa cgtgagtttt
cgttccactg 15780agcgtcagac cccgtagaaa agatcaaagg atcttcttga gatccttttt
ttctgcgcgt 15840aatctgctgc ttgcaaacaa aaaaaccacc gctaccagcg gtggtttgtt
tgccggatca 15900agagctacca actctttttc cgaaggtaac tggcttcagc agagcgcaga
taccaaatac 15960tgtccttcta gtgtagccgt agttaggcca ccacttcaag aactctgtag
caccgcctac 16020atacctcgct ctgctaatcc tgttaccagt ggctgctgcc agtggcgata
agtcgtgtct 16080taccgggttg gactcaagac gatagttacc ggataaggcg cagcggtcgg
gctgaacggg 16140gggttcgtgc acacagccca gcttggagcg aacgacctac accgaactga
gatacctaca 16200gcgtgagcta tgagaaagcg ccacgcttcc cgaagggaga aaggcggaca
ggtatccggt 16260aagcggcagg gtcggaacag gagagcgcac gagggagctt ccagggggaa
acgcctggta 16320tctttatagt cctgtcgggt ttcgccacct ctgacttgag cgtcgatttt
tgtgatgctc 16380gtcagggggg cggagcctat ggaa
16404
SEQ ID NO:44, length 440, PRT, artificial sequence, constructed xylose isomerase
Met Ala Lys Glu Tyr Phe Pro Gln Ile Gln Lys Ile Gln Tyr Gln Gly 16
Pro Lys Ser Thr Asp Pro Leu Ser Phe Lys Tyr Tyr Asn Pro Glu Glu 32
Val Ile Asn Gly Lys Thr Met Arg Glu His Leu Lys Phe Ala Leu Ser 48
Trp Trp His Thr Met Gly Gly Asp Gly Thr Asp Met Phe Gly Cys Gly 64
Thr Thr Asp Lys Thr Trp Gly Gln Ser Asp Pro Ala Ala Arg Ala Lys 80
Ala Lys Val Asp Ala Ala Phe Glu Ile Met Asp Lys Leu Ser Ile Asp 96
Tyr Tyr Cys Phe His Asp Arg Asp Leu Ser Pro Glu Tyr Gly Ser Leu 112
Lys Ala Thr Asn Asp Gln Leu Asp Ile Val Thr Asp Tyr Ile Lys Glu 128
Lys Gln Gly Asp Lys Phe Lys Cys Leu Trp Gly Thr Ala Lys Cys Phe 144
Asp His Pro Arg Phe Met His Gly Ala Gly Thr Ser Pro Ser Ala Asp 160
Val Phe Ala Phe Ser Ala Ala Gln Ile Lys Lys Ala Leu Glu Ser Thr 176
Val Lys Leu Gly Ala Asn Gly Tyr Val Phe Trp Gly Gly Arg Glu Gly 192
Tyr Glu Thr Leu Leu Asn Thr Asn Met Gly Leu Glu Leu Asp Asn Met 208
Ala Arg Leu Met Lys Met Ala Val Glu Tyr Gly Arg Ser Ile Gly Phe 224
Lys Gly Asp Phe Tyr Ile Glu Pro Lys Pro Lys Glu Pro Thr Lys His 240
Gln Tyr Asp Phe Asp Thr Ala Thr Val Leu Gly Phe Leu Arg Lys Tyr 256
Gly Leu Asp Lys Asp Phe Lys Met Asn Ile Glu Ala Asn His Ala Thr 272
Leu Ala Gln His Thr Phe Gln His Glu Leu Arg Val Ala Arg Asp Asn 288
Gly Val Phe Gly Ser Ile Asp Ala Asn Gln Gly Asp Val Leu Leu Gly 304
Trp Asp Thr Asp Gln Phe Pro Thr Asn Ile Tyr Asp Thr Thr Met Cys 320
Met Tyr Glu Val Ile Lys Ala Gly Gly Phe Thr Asn Gly Gly Leu Asn 336
Phe Asp Ala Lys Ala Arg Arg Gly Ser Phe Thr Pro Glu Asp Ile Phe 352
Tyr Ser Tyr Ile Ala Gly Met Asp Ala Phe Ala Leu Gly Phe Arg Ala 368
Ala Leu Lys Leu Ile Glu Asp Gly Arg Ile Asp Lys Phe Val Ala Asp 384
Arg Tyr Ala Ser Trp Asn Thr Gly Ile Gly Ala Asp Ile Ile Ala Gly 400
Lys Ala Asp Phe Ala Ser Leu Glu Lys Tyr Ala Leu Glu Lys Gly Glu 416
Val Thr Ala Ser Leu Ser Ser Gly Arg Gln Glu Met Leu Glu Ser Ile 432
Val Asn Asn Val Leu Phe Ser Leu 440
SEQ ID NO:45, length 1323, DNA, artificial sequence, codon optimized coding region for constructed xylose isomerase VDxylA
atggctaagg aatacttccc acaaatccaa aagatccaat accaaggtcc
aaagtccact 60gacccattgt ccttcaagta ctacaaccca gaagaagtca tcaacggtaa
aaccatgaga 120gaacacttga agttcgcttt gtcttggtgg cacaccatgg gtggtgacgg
tactgatatg 180ttcggttgtg gtactactga caagacttgg ggtcaatctg atccagctgc
tagagctaag 240gctaaggttg acgctgcttt cgaaatcatg gacaagttgt ccatcgatta
ctactgtttc 300cacgacagag atttgtctcc agaatacggt tccttgaagg ctaccaacga
ccaattggat 360atcgtcactg actacatcaa ggaaaagcaa ggtgacaagt tcaagtgttt
gtggggtact 420gctaagtgtt tcgaccaccc aagattcatg cacggtgctg gtacttctcc
atccgctgac 480gttttcgctt tctctgctgc tcaaatcaag aaggctttgg aatccaccgt
taagttgggt 540gctaacggtt acgtcttctg gggtggtaga gaaggttacg aaaccttgtt
gaacactaac 600atgggtttgg aattggacaa catggctaga ttgatgaaga tggctgttga
atacggtaga 660tctatcggtt tcaagggtga cttctacatc gaaccaaagc caaaggaacc
aactaagcac 720caatacgact tcgataccgc tactgtcttg ggtttcttga gaaagtacgg
tttggacaag 780gatttcaaga tgaacatcga agctaaccac gctaccttgg ctcaacacac
tttccaacac 840gaattgagag ttgctagaga caacggtgtc ttcggttcca tcgatgctaa
ccaaggtgac 900gttttgttgg gttgggacac cgatcaattc ccaactaaca tctacgacac
cactatgtgt 960atgtacgaag tcatcaaggc tggtggtttc accaacggtg gtttgaactt
cgacgctaag 1020gctagaagag gttctttcac tccagaagac atcttctact cctacatcgc
tggtatggac 1080gctttcgctt tgggtttcag agctgctttg aagttgatcg aagacggtag
aatcgataag 1140ttcgttgctg acagatacgc ttcttggaac accggtatcg gtgctgacat
catcgctggt 1200aaagctgatt tcgcttcttt ggaaaagtac gctttggaaa agggtgaagt
cactgcttcc 1260ttgtcctctg gtagacaaga aatgttggaa tccatcgtca acaacgtttt
gttctctttg 1320tga
1323
SEQ ID NO:46, length 2966, DNA, artificial sequence, constructed chimeric expression cassette for VDxylA
tgacagcagg attatcgtaa tacgtaatag
ttgaaaatct caaaaatgtg tgggtcatta 60cgtaaataat gataggaatg ggattcttct
atttttcctt tttccattct gtcgaccgca 120cgccgaaatg catgcaagta acctattcaa
agtaatatct catacatgtt tcatgagggt 180aacaacatgc gactgggtga gcatatgttc
cgctgatgtg atgtgcaaga taaacaagca 240agacagaaac taacttcttc ttcatgtaat
aaacacaccc cgcgtttatt tacctatctt 300taaacttcaa caccttatat cataactaat
atttcttgag ataagcacac tgcacccata 360ccttccttaa aaacgtagct tccagttttt
ggtggttctg gcttccttcc cgattccgcc 420cgctaaacgc ataattttgt tgcctggtgg
catttgcaaa atgcataacc tatgcattta 480aaagattatg tatgctcttc tgacttttcg
tgtgatgagg ctcgtggaaa aaatgaataa 540tttatgaatt tgagaacaat tttgtgttgt
tacggtattt tactatggaa taatcaatca 600attgaggatt ttatgcaaat atcgtttgaa
tatttttccg accctttgag tacttttctt 660cataattgca taatattgtc cgctgcccgt
ttttctgtta gacggtgtct tgatctactt 720gctatcgttc aacaccacct tatcttctaa
ctattttttt tttagctcat ttgaatcagc 780ttatggtgat ggcacatttt tgcataaacc
tagctgtcct cgttgaacat aggaaaaaaa 840aatatataaa caaggctctt tcactctcct
tggaatcaga tttgggtttg ttccctttat 900tttcatattt cttgtcatat tcttttctca
attattattt tctactcata acctcacgca 960aaataacaca gtcaaatcaa tcaagtttaa
acagtatggc taaggaatac ttcccacaaa 1020tccaaaagat ccaataccaa ggtccaaagt
ccactgaccc attgtccttc aagtactaca 1080acccagaaga agtcatcaac ggtaaaacca
tgagagaaca cttgaagttc gctttgtctt 1140ggtggcacac catgggtggt gacggtactg
atatgttcgg ttgtggtact actgacaaga 1200cttggggtca atctgatcca gctgctagag
ctaaggctaa ggttgacgct gctttcgaaa 1260tcatggacaa gttgtccatc gattactact
gtttccacga cagagatttg tctccagaat 1320acggttcctt gaaggctacc aacgaccaat
tggatatcgt cactgactac atcaaggaaa 1380agcaaggtga caagttcaag tgtttgtggg
gtactgctaa gtgtttcgac cacccaagat 1440tcatgcacgg tgctggtact tctccatccg
ctgacgtttt cgctttctct gctgctcaaa 1500tcaagaaggc tttggaatcc accgttaagt
tgggtgctaa cggttacgtc ttctggggtg 1560gtagagaagg ttacgaaacc ttgttgaaca
ctaacatggg tttggaattg gacaacatgg 1620ctagattgat gaagatggct gttgaatacg
gtagatctat cggtttcaag ggtgacttct 1680acatcgaacc aaagccaaag gaaccaacta
agcaccaata cgacttcgat accgctactg 1740tcttgggttt cttgagaaag tacggtttgg
acaaggattt caagatgaac atcgaagcta 1800accacgctac cttggctcaa cacactttcc
aacacgaatt gagagttgct agagacaacg 1860gtgtcttcgg ttccatcgat gctaaccaag
gtgacgtttt gttgggttgg gacaccgatc 1920aattcccaac taacatctac gacaccacta
tgtgtatgta cgaagtcatc aaggctggtg 1980gtttcaccaa cggtggtttg aacttcgacg
ctaaggctag aagaggttct ttcactccag 2040aagacatctt ctactcctac atcgctggta
tggacgcttt cgctttgggt ttcagagctg 2100ctttgaagtt gatcgaagac ggtagaatcg
ataagttcgt tgctgacaga tacgcttctt 2160ggaacaccgg tatcggtgct gacatcatcg
ctggtaaagc tgatttcgct tctttggaaa 2220agtacgcttt ggaaaagggt gaagtcactg
cttccttgtc ctctggtaga caagaaatgt 2280tggaatccat cgtcaacaac gttttgttct
ctttgtgagg ccctgcaggc cagaggaaaa 2340taatatcaag tgctggaaac tttttctctt
ggaatttttg caacatcaag tcatagtcaa 2400ttgaattgac ccaatttcac atttaagatt
tttttttttt catccgacat acatctgtac 2460actaggaagc cctgtttttc tgaagcagct
tcaaatatat atatttttta catatttatt 2520atgattcaat gaacaatcta attaaatcga
aaacaagaac cgaaacgcga ataaataatt 2580tatttagatg gtgacaagtg tataagtcct
catcgggaca gctacgattt ctctttcggt 2640tttggctgag ctactggttg ctgtgacgca
gcggcattag cgcggcgtta tgagctaccc 2700tcgtggcctg aaagatggcg ggaataaagc
ggaactaaaa attactgact gagccatatt 2760gaggtcaatt tgtcaactcg tcaagtcacg
tttggtggac ggcccctttc caacgaatcg 2820tatatactaa catgcgcgcg cttcctatat
acacatatac atatatatat atatatatat 2880gtgtgcgtgt atgtgtacac ctgtatttaa
tttccttact cgcgggtttt tcttttttct 2940caattcttgg cttcctcttt ctcgag
2966
SEQ ID NO:47, length 8601, DNA, artificial sequence, constructed plasmid
aaacgccagc aacgcggcct ttttacggtt cctggccttt
tgctggcctt ttgctcacat 60gttctttcct gcgttatccc ctgattctgt ggataaccgt
attaccgcct ttgagtgagc 120tgataccgct cgccgcagcc gaacgaccga gcgcagcgag
tcagtgagcg aggaagcgga 180agagcgccca atacgcaaac cgcctctccc cgcgcgttgg
ccgattcatt aatgcagctg 240gcacgacagg tttcccgact ggaaagcggg cagtgagcgc
aacgcaatta atgtgagtta 300gctcactcat taggcacccc aggctttaca ctttatgctt
ccggctcgta tgttgtgtgg 360aattgtgagc ggataacaat ttcacacagg aaacagctat
gaccatgatt acgccaagct 420tggcgccact tgtgcatgat taccgacaac caaaaccagt
cacagattct attaatcctc 480caaatgtaaa cataaccacc tccacaacca acaagaacct
agatggcatt tatattttgc 540cagctcctcg tatgaatccc ccggctcaaa cacaatacca
aatgattcat gcgccagaca 600gcatgcaaca tccaccaaca tttagtaaaa acaacacatc
aagcaatcct aaatcccacc 660aatactcaaa gtagaagatc agcatccttt caattgctga
aaggttcacc taaagtaccg 720ctcatattcc aaaaggattc ttcactacat agaaagggca
gccaattgtg tgtttttcag 780aaagggtttt aaaaaaacag gagggtgctt gttcttgttg
ttccctacca tcgatggatt 840tcgaaaaact atttatagga ccatctgatt ttcacctcca
tcattgtatc atatactaac 900aagcatatcc aaatttgtaa ttctatcatg aaatttccag
agaaagaaac gcaagggaac 960tgagaaatca aacactagtt gacagcagga ttatcgtaat
acgtaatagt tgaaaatctc 1020aaaaatgtgt gggtcattac gtaaataatg ataggaatgg
gattcttcta tttttccttt 1080ttccattctg tcgaccgcac gccgaaatgc atgcaagtaa
cctattcaaa gtaatatctc 1140atacatgttt catgagggta acaacatgcg actgggtgag
catatgttcc gctgatgtga 1200tgtgcaagat aaacaagcaa gacagaaact aacttcttct
tcatgtaata aacacacccc 1260gcgtttattt acctatcttt aaacttcaac accttatatc
ataactaata tttcttgaga 1320taagcacact gcacccatac cttccttaaa aacgtagctt
ccagtttttg gtggttctgg 1380cttccttccc gattccgccc gctaaacgca taattttgtt
gcctggtggc atttgcaaaa 1440tgcataacct atgcatttaa aagattatgt atgctcttct
gacttttcgt gtgatgaggc 1500tcgtggaaaa aatgaataat ttatgaattt gagaacaatt
ttgtgttgtt acggtatttt 1560actatggaat aatcaatcaa ttgaggattt tatgcaaata
tcgtttgaat atttttccga 1620ccctttgagt acttttcttc ataattgcat aatattgtcc
gctgcccgtt tttctgttag 1680acggtgtctt gatctacttg ctatcgttca acaccacctt
atcttctaac tatttttttt 1740ttagctcatt tgaatcagct tatggtgatg gcacattttt
gcataaacct agctgtcctc 1800gttgaacata ggaaaaaaaa atatataaac aaggctcttt
cactctcctt ggaatcagat 1860ttgggtttgt tccctttatt ttcatatttc ttgtcatatt
cttttctcaa ttattatttt 1920ctactcataa cctcacgcaa aataacacag tcaaatcaat
caagtttaaa cagtatggct 1980aaggaatact tcccacaaat ccaaaagatc caataccaag
gtccaaagtc cactgaccca 2040ttgtccttca agtactacaa cccagaagaa gtcatcaacg
gtaaaaccat gagagaacac 2100ttgaagttcg ctttgtcttg gtggcacacc atgggtggtg
acggtactga tatgttcggt 2160tgtggtacta ctgacaagac ttggggtcaa tctgatccag
ctgctagagc taaggctaag 2220gttgacgctg ctttcgaaat catggacaag ttgtccatcg
attactactg tttccacgac 2280agagatttgt ctccagaata cggttccttg aaggctacca
acgaccaatt ggatatcgtc 2340actgactaca tcaaggaaaa gcaaggtgac aagttcaagt
gtttgtgggg tactgctaag 2400tgtttcgacc acccaagatt catgcacggt gctggtactt
ctccatccgc tgacgttttc 2460gctttctctg ctgctcaaat caagaaggct ttggaatcca
ccgttaagtt gggtgctaac 2520ggttacgtct tctggggtgg tagagaaggt tacgaaacct
tgttgaacac taacatgggt 2580ttggaattgg acaacatggc tagattgatg aagatggctg
ttgaatacgg tagatctatc 2640ggtttcaagg gtgacttcta catcgaacca aagccaaagg
aaccaactaa gcaccaatac 2700gacttcgata ccgctactgt cttgggtttc ttgagaaagt
acggtttgga caaggatttc 2760aagatgaaca tcgaagctaa ccacgctacc ttggctcaac
acactttcca acacgaattg 2820agagttgcta gagacaacgg tgtcttcggt tccatcgatg
ctaaccaagg tgacgttttg 2880ttgggttggg acaccgatca attcccaact aacatctacg
acaccactat gtgtatgtac 2940gaagtcatca aggctggtgg tttcaccaac ggtggtttga
acttcgacgc taaggctaga 3000agaggttctt tcactccaga agacatcttc tactcctaca
tcgctggtat ggacgctttc 3060gctttgggtt tcagagctgc tttgaagttg atcgaagacg
gtagaatcga taagttcgtt 3120gctgacagat acgcttcttg gaacaccggt atcggtgctg
acatcatcgc tggtaaagct 3180gatttcgctt ctttggaaaa gtacgctttg gaaaagggtg
aagtcactgc ttccttgtcc 3240tctggtagac aagaaatgtt ggaatccatc gtcaacaacg
ttttgttctc tttgtgaggc 3300cctgcaggcc agaggaaaat aatatcaagt gctggaaact
ttttctcttg gaatttttgc 3360aacatcaagt catagtcaat tgaattgacc caatttcaca
tttaagattt tttttttttc 3420atccgacata catctgtaca ctaggaagcc ctgtttttct
gaagcagctt caaatatata 3480tattttttac atatttatta tgattcaatg aacaatctaa
ttaaatcgaa aacaagaacc 3540gaaacgcgaa taaataattt atttagatgg tgacaagtgt
ataagtcctc atcgggacag 3600ctacgatttc tctttcggtt ttggctgagc tactggttgc
tgtgacgcag cggcattagc 3660gcggcgttat gagctaccct cgtggcctga aagatggcgg
gaataaagcg gaactaaaaa 3720ttactgactg agccatattg aggtcaattt gtcaactcgt
caagtcacgt ttggtggacg 3780gcccctttcc aacgaatcgt atatactaac atgcgcgcgc
ttcctatata cacatataca 3840tatatatata tatatatatg tgtgcgtgta tgtgtacacc
tgtatttaat ttccttactc 3900gcgggttttt cttttttctc aattcttggc ttcctctttc
tcgagcccgg gatttaaatg 3960tggcttactc cattgttgat gcaaaagttg taaatttcac
gaattattta atgcgttcct 4020tgcaaccttc tattttgatg aacatcagga attgaaacaa
aaaaaaggct tcaatctcaa 4080cggaaaacgg gaagaaaact acactcgatt atactatata
tgccaagaag attctccgac 4140agattgtcta cttaatttca cataatatat cttgttttac
tagcttatta tatagcgtcg 4200catttaattc atggcgccat cacccgcagg gaatataacg
acaaggccga taccacggga 4260aaaatagggc gagcggaaat actaaaagaa aaataagctt
ccgaaataaa acaccgacaa 4320tgaagttctt ggcaaggttc ggttaggatc cgcattgcgg
attacgtatt ctaatgttca 4380gtaccgttcg tataatgtat gctatacgaa gttatgcaga
ttgtactgag agtgcaccat 4440accacagctt ttcaattcaa ttcatcattt tttttttatt
cttttttttg atttcggttt 4500ctttgaaatt tttttgattc ggtaatctcc gaacagaagg
aagaacgaag gaaggagcac 4560agacttagat tggtatatat acgcatatgt agtgttgaag
aaacatgaaa ttgcccagta 4620ttcttaaccc aactgcacag aacaaaaacc tgcaggaaac
gaagataaat catgtcgaaa 4680gctacatata aggaacgtgc tgctactcat cctagtcctg
ttgctgccaa gctatttaat 4740atcatgcacg aaaagcaaac aaacttgtgt gcttcattgg
atgttcgtac caccaaggaa 4800ttactggagt tagttgaagc attaggtccc aaaatttgtt
tactaaaaac acatgtggat 4860atcttgactg atttttccat ggagggcaca gttaagccgc
taaaggcatt atccgccaag 4920tacaattttt tactcttcga agacagaaaa tttgctgaca
ttggtaatac agtcaaattg 4980cagtactctg cgggtgtata cagaatagca gaatgggcag
acattacgaa tgcacacggt 5040gtggtgggcc caggtattgt tagcggtttg aagcaggcgg
cagaagaagt aacaaaggaa 5100cctagaggcc ttttgatgtt agcagaattg tcatgcaagg
gctccctatc tactggagaa 5160tatactaagg gtactgttga cattgcgaag agcgacaaag
attttgttat cggctttatt 5220gctcaaagag acatgggtgg aagagatgaa ggttacgatt
ggttgattat gacacccggt 5280gtgggtttag atgacaaggg agacgcattg ggtcaacagt
atagaaccgt ggatgatgtg 5340gtctctacag gatctgacat tattattgtt ggaagaggac
tatttgcaaa gggaagggat 5400gctaaggtag agggtgaacg ttacagaaaa gcaggctggg
aagcatattt gagaagatgc 5460ggccagcaaa actaaaaaac tgtattataa gtaaatgcat
gtatactaaa ctcacaaatt 5520agagcttcaa tttaattata tcagttatta ccctatgcgg
tgtgaaatac cgcacagatg 5580cgtaaggaga aaataccgca tcaggaaatt gtaaacgtta
atattttgtt aaaattcgcg 5640ttaaattttt gttaaatcag ctcatttttt aaccaatagg
ccgaaatcgg caaaatccct 5700tataaatcaa aagaatagac cgagataggg ttgagtgttg
ttccagtttg gaacaagagt 5760ccactattaa agaacgtgga ctccaacgtc aaagggcgaa
aaaccgtcta tcagggcgat 5820ggcccactac gtgaaccatc accctaatca agataacttc
gtataatgta tgctatacga 5880acggtaccga gatacccact tcgaaagtta ctgatattat
aactcttgtg tcctctcttc 5940taatacctta ctttcacctt tctcacgtag ttaaagttgc
aacaacacat tttgtcctca 6000tccaatttct tctatagaat atccgtttgc ctccaggagt
gaagaaatga tagcagtaac 6060actgtgaaag cgagactaag agaaacgact taaagctcga
agacttcttg aggatacgtt 6120tatgtttctg tggcttcttc ttcgcggcgc ggttctcgcg
tataggaatg ttctaagaca 6180agaaggcatg aagttatgtt aacagattct atatctactc
gctacgcata tataaacgga 6240ttcatcattg aaacaatggt acttgtggta atgtgtacga
cgatttcaac ccgaataaaa 6300gcaaaagtgc aaaaaaaaaa caagaagcgc ttagcactgt
tgaatcattt agaacactac 6360taatgctggt aatacggcgc cgaattcact ggccgtcgtt
ttacaacgtc gtgactggga 6420aaaccctggc gttacccaac ttaatcgcct tgcagcacat
ccccctttcg ccagctggcg 6480taatagcgaa gaggcccgca ccgatcgccc ttcccaacag
ttgcgcagcc tgaatggcga 6540atggcgcctg atgcggtatt ttctccttac gcatctgtgc
ggtatttcac accgcatatg 6600gtgcactctc agtacaatct gctctgatgc cgcatagtta
agccagcccc gacacccgcc 6660aacacccgct gacgcgccct gacgggcttg tctgctcccg
gcatccgctt acagacaagc 6720tgtgaccgtc tccgggagct gcatgtgtca gaggttttca
ccgtcatcac cgaaacgcgc 6780gagacgaaag ggcctcgtga tacgcctatt tttataggtt
aatgtcatga taataatggt 6840ttcttagacg tcaggtggca cttttcgggg aaatgtgcgc
ggaaccccta tttgtttatt 6900tttctaaata cattcaaata tgtatccgct catgagacaa
taaccctgat aaatgcttca 6960ataatattga aaaaggaaga gtatgagtat tcaacatttc
cgtgtcgccc ttattccctt 7020ttttgcggca ttttgccttc ctgtttttgc tcacccagaa
acgctggtga aagtaaaaga 7080tgctgaagat cagttgggtg cacgagtggg ttacatcgaa
ctggatctca acagcggtaa 7140gatccttgag agttttcgcc ccgaagaacg ttttccaatg
atgagcactt ttaaagttct 7200gctatgtggc gcggtattat cccgtattga cgccgggcaa
gagcaactcg gtcgccgcat 7260acactattct cagaatgact tggttgagta ctcaccagtc
acagaaaagc atcttacgga 7320tggcatgaca gtaagagaat tatgcagtgc tgccataacc
atgagtgata acactgcggc 7380caacttactt ctgacaacga tcggaggacc gaaggagcta
accgcttttt tgcacaacat 7440gggggatcat gtaactcgcc ttgatcgttg ggaaccggag
ctgaatgaag ccataccaaa 7500cgacgagcgt gacaccacga tgcctgtagc aatggcaaca
acgttgcgca aactattaac 7560tggcgaacta cttactctag cttcccggca acaattaata
gactggatgg aggcggataa 7620agttgcagga ccacttctgc gctcggccct tccggctggc
tggtttattg ctgataaatc 7680tggagccggt gagcgtgggt ctcgcggtat cattgcagca
ctggggccag atggtaagcc 7740ctcccgtatc gtagttatct acacgacggg gagtcaggca
actatggatg aacgaaatag 7800acagatcgct gagataggtg cctcactgat taagcattgg
taactgtcag accaagttta 7860ctcatatata ctttagattg atttaaaact tcatttttaa
tttaaaagga tctaggtgaa 7920gatccttttt gataatctca tgaccaaaat cccttaacgt
gagttttcgt tccactgagc 7980gtcagacccc gtagaaaaga tcaaaggatc ttcttgagat
cctttttttc tgcgcgtaat 8040ctgctgcttg caaacaaaaa aaccaccgct accagcggtg
gtttgtttgc cggatcaaga 8100gctaccaact ctttttccga aggtaactgg cttcagcaga
gcgcagatac caaatactgt 8160ccttctagtg tagccgtagt taggccacca cttcaagaac
tctgtagcac cgcctacata 8220cctcgctctg ctaatcctgt taccagtggc tgctgccagt
ggcgataagt cgtgtcttac 8280cgggttggac tcaagacgat agttaccgga taaggcgcag
cggtcgggct gaacgggggg 8340ttcgtgcaca cagcccagct tggagcgaac gacctacacc
gaactgagat acctacagcg 8400tgagctatga gaaagcgcca cgcttcccga agggagaaag
gcggacaggt atccggtaag 8460cggcagggtc ggaacaggag agcgcacgag ggagcttcca
gggggaaacg cctggtatct 8520ttatagtcct gtcgggtttc gccacctctg acttgagcgt
cgatttttgt gatgctcgtc 8580aggggggcgg agcctatgga a
8601
SEQ ID NO:48, length 8645, DNA, artificial sequence, constructed plasmid
accttggctc aacacacttt ccaacacgaa ttgagagttg ctagagacaa cggtgtcttc
60ggttccatcg atgctaacca aggtgacgtt ttgttgggtt gggacaccga tcaattccca
120actaacatct acgacaccac tatgtgtatg tacgaagtca tcaaggctgg tggtttcacc
180aacggtggtt tgaacttcga cgctaaggct agaagaggtt ctttcactcc agaagacatc
240ttctactcct acatcgctgg tatggacgct ttcgctttgg gtttcagagc tgctttgaag
300ttgatcgaag acggtagaat cgataagttc gttgctgaca gatacgcttc ttggaacacc
360ggtatcggtg ctgacatcat cgctggtaaa gctgatttcg cttctttgga aaagtacgct
420ttggaaaagg gtgaagtcac tgcttccttg tcctctggta gacaagaaat gttggaatcc
480atcgtcaaca acgttttgtt ctctttgtga ggccctgcag gccagaggaa aataatatca
540agtgctggaa actttttctc ttggaatttt tgcaacatca agtcatagtc aattgaattg
600acccaatttc acatttaaga tttttttttt ttcatccgac atacatctgt acactaggaa
660gccctgtttt tctgaagcag cttcaaatat atatattttt tacatattta ttatgattca
720atgaacaatc taattaaatc gaaaacaaga accgaaacgc gaataaataa tttatttaga
780tggtgacaag tgtataagtc ctcatcggga cagctacgat ttctctttcg gttttggctg
840agctactggt tgctgtgacg cagcggcatt agcgcggcgt tatgagctac cctcgtggcc
900tgaaagatgg cgggaataaa gcggaactaa aaattactga ctgagccata ttgaggtcaa
960tttgtcaact cgtcaagtca cgtttggtgg acggcccctt tccaacgaat cgtatatact
1020aacatgcgcg cgcttcctat atacacatat acatatatat atatatatat atgtgtgcgt
1080gtatgtgtac acctgtattt aatttcctta ctcgcgggtt tttctttttt ctcaattctt
1140ggcttcctct ttctcgagcc cgggatttaa atccttatac cctcatctta ccgcagtgcg
1200gttttacgcc tcatgttatt tttgccagct ttaataacaa catcagtaat attacgttat
1260tggatcttcc actccggttc gaggaaaaaa agagaggagg agaaacgcat aagctacaat
1320aatgagtgag ttaacgcttt aatttgctca gtgatcattg ctagccggat atttgtgttt
1380ttgtagaacc cagccatacc taatcatctc agtatattat ctgaattctt gactcaaaat
1440atggattttc tcgaacgtct cacttccaat catcccatca aatgctgtgg atgaaaacat
1500ttaaaacacg ggcaagaaaa aatgagtata gtatagttgt ggccagttct tgttgtacta
1560atgcacttgc tttcgtatca aagttgaaga agcactgaaa gaacaacaaa aaatttatta
1620aaagcaaaaa tcattctacg ttcaacgaaa atgtaaggca aatatatctt tatataggca
1680agcataaccg acgcggatcc gcattgcgga ttacgtattc taatgttcag taccgttcgt
1740ataatgtatg ctatacgaag ttatgcagat tgtactgaga gtgcaccata ccacagcttt
1800tcaattcaat tcatcatttt ttttttattc ttttttttga tttcggtttc tttgaaattt
1860ttttgattcg gtaatctccg aacagaagga agaacgaagg aaggagcaca gacttagatt
1920ggtatatata cgcatatgta gtgttgaaga aacatgaaat tgcccagtat tcttaaccca
1980actgcacaga acaaaaacct gcaggaaacg aagataaatc atgtcgaaag ctacatataa
2040ggaacgtgct gctactcatc ctagtcctgt tgctgccaag ctatttaata tcatgcacga
2100aaagcaaaca aacttgtgtg cttcattgga tgttcgtacc accaaggaat tactggagtt
2160agttgaagca ttaggtccca aaatttgttt actaaaaaca catgtggata tcttgactga
2220tttttccatg gagggcacag ttaagccgct aaaggcatta tccgccaagt acaatttttt
2280actcttcgaa gacagaaaat ttgctgacat tggtaataca gtcaaattgc agtactctgc
2340gggtgtatac agaatagcag aatgggcaga cattacgaat gcacacggtg tggtgggccc
2400aggtattgtt agcggtttga agcaggcggc agaagaagta acaaaggaac ctagaggcct
2460tttgatgtta gcagaattgt catgcaaggg ctccctatct actggagaat atactaaggg
2520tactgttgac attgcgaaga gcgacaaaga ttttgttatc ggctttattg ctcaaagaga
2580catgggtgga agagatgaag gttacgattg gttgattatg acacccggtg tgggtttaga
2640tgacaaggga gacgcattgg gtcaacagta tagaaccgtg gatgatgtgg tctctacagg
2700atctgacatt attattgttg gaagaggact atttgcaaag ggaagggatg ctaaggtaga
2760gggtgaacgt tacagaaaag caggctggga agcatatttg agaagatgcg gccagcaaaa
2820ctaaaaaact gtattataag taaatgcatg tatactaaac tcacaaatta gagcttcaat
2880ttaattatat cagttattac cctatgcggt gtgaaatacc gcacagatgc gtaaggagaa
2940aataccgcat caggaaattg taaacgttaa tattttgtta aaattcgcgt taaatttttg
3000ttaaatcagc tcatttttta accaataggc cgaaatcggc aaaatccctt ataaatcaaa
3060agaatagacc gagatagggt tgagtgttgt tccagtttgg aacaagagtc cactattaaa
3120gaacgtggac tccaacgtca aagggcgaaa aaccgtctat cagggcgatg gcccactacg
3180tgaaccatca ccctaatcaa gataacttcg tataatgtat gctatacgaa cggtacctga
3240aaggtttacg atattagctg ctaagttctc ccttgcccta gtgaattacg tttttttcga
3300ttaaaagggt tggtcccgaa attgacctct tatatgtacc tcaattgcag atagcatcaa
3360atttagacta cgtcatcaac caaattacac cctatgaaaa acataagatt ataagcatgc
3420gtttgtgtaa aggtcatacc attttatacg ttgacgcaat ggtgcaaata gaaaggttta
3480cgaaagctct tattccgctt agcgcaattc tattgttaat ttcaagtaaa aagataaata
3540tctcacgtcc ttatattatc tttgattaaa atttcacgat caatgtaacg aaaaaaggtc
3600gcaaatattt ctttttcatc actctgatta aacaatggca tcatgaaacg ctcattggat
3660cgttacacgt tctcattgta catatgtatg atttagaaca gcttgttcgc ctgggcgccg
3720aattcactgg ccgtcgtttt acaacgtcgt gactgggaaa accctggcgt tacccaactt
3780aatcgccttg cagcacatcc ccctttcgcc agctggcgta atagcgaaga ggcccgcacc
3840gatcgccctt cccaacagtt gcgcagcctg aatggcgaat ggcgcctgat gcggtatttt
3900ctccttacgc atctgtgcgg tatttcacac cgcatatggt gcactctcag tacaatctgc
3960tctgatgccg catagttaag ccagccccga cacccgccaa cacccgctga cgcgccctga
4020cgggcttgtc tgctcccggc atccgcttac agacaagctg tgaccgtctc cgggagctgc
4080atgtgtcaga ggttttcacc gtcatcaccg aaacgcgcga gacgaaaggg cctcgtgata
4140cgcctatttt tataggttaa tgtcatgata ataatggttt cttagacgtc aggtggcact
4200tttcggggaa atgtgcgcgg aacccctatt tgtttatttt tctaaataca ttcaaatatg
4260tatccgctca tgagacaata accctgataa atgcttcaat aatattgaaa aaggaagagt
4320atgagtattc aacatttccg tgtcgccctt attccctttt ttgcggcatt ttgccttcct
4380gtttttgctc acccagaaac gctggtgaaa gtaaaagatg ctgaagatca gttgggtgca
4440cgagtgggtt acatcgaact ggatctcaac agcggtaaga tccttgagag ttttcgcccc
4500gaagaacgtt ttccaatgat gagcactttt aaagttctgc tatgtggcgc ggtattatcc
4560cgtattgacg ccgggcaaga gcaactcggt cgccgcatac actattctca gaatgacttg
4620gttgagtact caccagtcac agaaaagcat cttacggatg gcatgacagt aagagaatta
4680tgcagtgctg ccataaccat gagtgataac actgcggcca acttacttct gacaacgatc
4740ggaggaccga aggagctaac cgcttttttg cacaacatgg gggatcatgt aactcgcctt
4800gatcgttggg aaccggagct gaatgaagcc ataccaaacg acgagcgtga caccacgatg
4860cctgtagcaa tggcaacaac gttgcgcaaa ctattaactg gcgaactact tactctagct
4920tcccggcaac aattaataga ctggatggag gcggataaag ttgcaggacc acttctgcgc
4980tcggcccttc cggctggctg gtttattgct gataaatctg gagccggtga gcgtgggtct
5040cgcggtatca ttgcagcact ggggccagat ggtaagccct cccgtatcgt agttatctac
5100acgacgggga gtcaggcaac tatggatgaa cgaaatagac agatcgctga gataggtgcc
5160tcactgatta agcattggta actgtcagac caagtttact catatatact ttagattgat
5220ttaaaacttc atttttaatt taaaaggatc taggtgaaga tcctttttga taatctcatg
5280accaaaatcc cttaacgtga gttttcgttc cactgagcgt cagaccccgt agaaaagatc
5340aaaggatctt cttgagatcc tttttttctg cgcgtaatct gctgcttgca aacaaaaaaa
5400ccaccgctac cagcggtggt ttgtttgccg gatcaagagc taccaactct ttttccgaag
5460gtaactggct tcagcagagc gcagatacca aatactgtcc ttctagtgta gccgtagtta
5520ggccaccact tcaagaactc tgtagcaccg cctacatacc tcgctctgct aatcctgtta
5580ccagtggctg ctgccagtgg cgataagtcg tgtcttaccg ggttggactc aagacgatag
5640ttaccggata aggcgcagcg gtcgggctga acggggggtt cgtgcacaca gcccagcttg
5700gagcgaacga cctacaccga actgagatac ctacagcgtg agctatgaga aagcgccacg
5760cttcccgaag ggagaaaggc ggacaggtat ccggtaagcg gcagggtcgg aacaggagag
5820cgcacgaggg agcttccagg gggaaacgcc tggtatcttt atagtcctgt cgggtttcgc
5880cacctctgac ttgagcgtcg atttttgtga tgctcgtcag gggggcggag cctatggaaa
5940aacgccagca acgcggcctt tttacggttc ctggcctttt gctggccttt tgctcacatg
6000ttctttcctg cgttatcccc tgattctgtg gataaccgta ttaccgcctt tgagtgagct
6060gataccgctc gccgcagccg aacgaccgag cgcagcgagt cagtgagcga ggaagcggaa
6120gagcgcccaa tacgcaaacc gcctctcccc gcgcgttggc cgattcatta atgcagctgg
6180cacgacaggt ttcccgactg gaaagcgggc agtgagcgca acgcaattaa tgtgagttag
6240ctcactcatt aggcacccca ggctttacac tttatgcttc cggctcgtat gttgtgtgga
6300attgtgagcg gataacaatt tcacacagga aacagctatg accatgatta cgccaagctt
6360ggcgcccgag gtttgagtca atgtacgaga ttttgaaacc atgagggatg tgccctcaaa
6420ctttatgatc tggtttcttt ttttatttat atcgttttta tgctgggtta cgttgttgtt
6480ttatattcct tgtctagtat tacggtcagc agaggccaaa tcctatatgc attatatgtg
6540gttagcattg actccttaat agaacggcca gttatcaatc aaattgccta taagcatgga
6600tcactagcgg ctagtattgt attgcatttg gaatagccct cttcgttgag ccacaaagat
6660atcttgcgga aatgtggcta aagcaacttt tgatgattgc tgggaaacgg tttggtaaac
6720tgctgaagca attgttgtac acaatacttc tgcttttggc cccatctgga tttttcattt
6780ccgcttttgc ctcaggtgca acaataacta tctttgagca tcgtaccacc aactagttga
6840cagcaggatt atcgtaatac gtaatagttg aaaatctcaa aaatgtgtgg gtcattacgt
6900aaataatgat aggaatggga ttcttctatt tttccttttt ccattctgtc gaccgcacgc
6960cgaaatgcat gcaagtaacc tattcaaagt aatatctcat acatgtttca tgagggtaac
7020aacatgcgac tgggtgagca tatgttccgc tgatgtgatg tgcaagataa acaagcaaga
7080cagaaactaa cttcttcttc atgtaataaa cacaccccgc gtttatttac ctatctttaa
7140acttcaacac cttatatcat aactaatatt tcttgagata agcacactgc acccatacct
7200tccttaaaaa cgtagcttcc agtttttggt ggttctggct tccttcccga ttccgcccgc
7260taaacgcata attttgttgc ctggtggcat ttgcaaaatg cataacctat gcatttaaaa
7320gattatgtat gctcttctga cttttcgtgt gatgaggctc gtggaaaaaa tgaataattt
7380atgaatttga gaacaatttt gtgttgttac ggtattttac tatggaataa tcaatcaatt
7440gaggatttta tgcaaatatc gtttgaatat ttttccgacc ctttgagtac ttttcttcat
7500aattgcataa tattgtccgc tgcccgtttt tctgttagac ggtgtcttga tctacttgct
7560atcgttcaac accaccttat cttctaacta tttttttttt agctcatttg aatcagctta
7620tggtgatggc acatttttgc ataaacctag ctgtcctcgt tgaacatagg aaaaaaaaat
7680atataaacaa ggctctttca ctctccttgg aatcagattt gggtttgttc cctttatttt
7740catatttctt gtcatattct tttctcaatt attattttct actcataacc tcacgcaaaa
7800taacacagtc aaatcaatca agtttaaaca gtatggctaa ggaatacttc ccacaaatcc
7860aaaagatcca ataccaaggt ccaaagtcca ctgacccatt gtccttcaag tactacaacc
7920cagaagaagt catcaacggt aaaaccatga gagaacactt gaagttcgct ttgtcttggt
7980ggcacaccat gggtggtgac ggtactgata tgttcggttg tggtactact gacaagactt
8040ggggtcaatc tgatccagct gctagagcta aggctaaggt tgacgctgct ttcgaaatca
8100tggacaagtt gtccatcgat tactactgtt tccacgacag agatttgtct ccagaatacg
8160gttccttgaa ggctaccaac gaccaattgg atatcgtcac tgactacatc aaggaaaagc
8220aaggtgacaa gttcaagtgt ttgtggggta ctgctaagtg tttcgaccac ccaagattca
8280tgcacggtgc tggtacttct ccatccgctg acgttttcgc tttctctgct gctcaaatca
8340agaaggcttt ggaatccacc gttaagttgg gtgctaacgg ttacgtcttc tggggtggta
8400gagaaggtta cgaaaccttg ttgaacacta acatgggttt ggaattggac aacatggcta
8460gattgatgaa gatggctgtt gaatacggta gatctatcgg tttcaagggt gacttctaca
8520tcgaaccaaa gccaaaggaa ccaactaagc accaatacga cttcgatacc gctactgtct
8580tgggtttctt gagaaagtac ggtttggaca aggatttcaa gatgaacatc gaagctaacc
8640acgct
8645
SEQ ID NO: 49; length: 8537; type: DNA; organism: artificial sequence; description: constructed plasmid
49 aaacgccagc
aacgcggcct ttttacggtt cctggccttt tgctggcctt ttgctcacat 60gttctttcct
gcgttatccc ctgattctgt ggataaccgt attaccgcct ttgagtgagc 120tgataccgct
cgccgcagcc gaacgaccga gcgcagcgag tcagtgagcg aggaagcgga 180agagcgccca
atacgcaaac cgcctctccc cgcgcgttgg ccgattcatt aatgcagctg 240gcacgacagg
tttcccgact ggaaagcggg cagtgagcgc aacgcaatta atgtgagtta 300gctcactcat
taggcacccc aggctttaca ctttatgctt ccggctcgta tgttgtgtgg 360aattgtgagc
ggataacaat ttcacacagg aaacagctat gaccatgatt acgccaagct 420tggcgccttg
agcattagtt gatcattacc gctgcagtca gttaatgata tccctaaata 480ggcatattgt
acaatttcaa atatattacc cgaattacaa gaacaatagg gtcttctaca 540atattactat
gaacaattca taggggaaca tagtccactt caatgtcggt gccaacgatt 600ctatttgttg
taagtatata tgctgtttcc cggcaatact ttatattggg ggaattttct 660ggaggtttaa
gggcatatta ctacgtcaag gctgggattg accttctggt atatattacg 720ttgcgcagga
cattccttat acttgagcct ttgtaaaaac ttcttcgttc tcctgaccaa 780ccgttcaatt
atgttgtgaa ttttttgatc cgagtttgaa acatccaaat tcacaaacaa 840accgttgtca
tcctgagttg gcgtactagt tgacagcagg attatcgtaa tacgtaatag 900ttgaaaatct
caaaaatgtg tgggtcatta cgtaaataat gataggaatg ggattcttct 960atttttcctt
tttccattct gtcgaccgca cgccgaaatg catgcaagta acctattcaa 1020agtaatatct
catacatgtt tcatgagggt aacaacatgc gactgggtga gcatatgttc 1080cgctgatgtg
atgtgcaaga taaacaagca agacagaaac taacttcttc ttcatgtaat 1140aaacacaccc
cgcgtttatt tacctatctt taaacttcaa caccttatat cataactaat 1200atttcttgag
ataagcacac tgcacccata ccttccttaa aaacgtagct tccagttttt 1260ggtggttctg
gcttccttcc cgattccgcc cgctaaacgc ataattttgt tgcctggtgg 1320catttgcaaa
atgcataacc tatgcattta aaagattatg tatgctcttc tgacttttcg 1380tgtgatgagg
ctcgtggaaa aaatgaataa tttatgaatt tgagaacaat tttgtgttgt 1440tacggtattt
tactatggaa taatcaatca attgaggatt ttatgcaaat atcgtttgaa 1500tatttttccg
accctttgag tacttttctt cataattgca taatattgtc cgctgcccgt 1560ttttctgtta
gacggtgtct tgatctactt gctatcgttc aacaccacct tatcttctaa 1620ctattttttt
tttagctcat ttgaatcagc ttatggtgat ggcacatttt tgcataaacc 1680tagctgtcct
cgttgaacat aggaaaaaaa aatatataaa caaggctctt tcactctcct 1740tggaatcaga
tttgggtttg ttccctttat tttcatattt cttgtcatat tcttttctca 1800attattattt
tctactcata acctcacgca aaataacaca gtcaaatcaa tcaagtttaa 1860acagtatggc
taaggaatac ttcccacaaa tccaaaagat ccaataccaa ggtccaaagt 1920ccactgaccc
attgtccttc aagtactaca acccagaaga agtcatcaac ggtaaaacca 1980tgagagaaca
cttgaagttc gctttgtctt ggtggcacac catgggtggt gacggtactg 2040atatgttcgg
ttgtggtact actgacaaga cttggggtca atctgatcca gctgctagag 2100ctaaggctaa
ggttgacgct gctttcgaaa tcatggacaa gttgtccatc gattactact 2160gtttccacga
cagagatttg tctccagaat acggttcctt gaaggctacc aacgaccaat 2220tggatatcgt
cactgactac atcaaggaaa agcaaggtga caagttcaag tgtttgtggg 2280gtactgctaa
gtgtttcgac cacccaagat tcatgcacgg tgctggtact tctccatccg 2340ctgacgtttt
cgctttctct gctgctcaaa tcaagaaggc tttggaatcc accgttaagt 2400tgggtgctaa
cggttacgtc ttctggggtg gtagagaagg ttacgaaacc ttgttgaaca 2460ctaacatggg
tttggaattg gacaacatgg ctagattgat gaagatggct gttgaatacg 2520gtagatctat
cggtttcaag ggtgacttct acatcgaacc aaagccaaag gaaccaacta 2580agcaccaata
cgacttcgat accgctactg tcttgggttt cttgagaaag tacggtttgg 2640acaaggattt
caagatgaac atcgaagcta accacgctac cttggctcaa cacactttcc 2700aacacgaatt
gagagttgct agagacaacg gtgtcttcgg ttccatcgat gctaaccaag 2760gtgacgtttt
gttgggttgg gacaccgatc aattcccaac taacatctac gacaccacta 2820tgtgtatgta
cgaagtcatc aaggctggtg gtttcaccaa cggtggtttg aacttcgacg 2880ctaaggctag
aagaggttct ttcactccag aagacatctt ctactcctac atcgctggta 2940tggacgcttt
cgctttgggt ttcagagctg ctttgaagtt gatcgaagac ggtagaatcg 3000ataagttcgt
tgctgacaga tacgcttctt ggaacaccgg tatcggtgct gacatcatcg 3060ctggtaaagc
tgatttcgct tctttggaaa agtacgcttt ggaaaagggt gaagtcactg 3120cttccttgtc
ctctggtaga caagaaatgt tggaatccat cgtcaacaac gttttgttct 3180ctttgtgagg
ccctgcaggc cagaggaaaa taatatcaag tgctggaaac tttttctctt 3240ggaatttttg
caacatcaag tcatagtcaa ttgaattgac ccaatttcac atttaagatt 3300tttttttttt
catccgacat acatctgtac actaggaagc cctgtttttc tgaagcagct 3360tcaaatatat
atatttttta catatttatt atgattcaat gaacaatcta attaaatcga 3420aaacaagaac
cgaaacgcga ataaataatt tatttagatg gtgacaagtg tataagtcct 3480catcgggaca
gctacgattt ctctttcggt tttggctgag ctactggttg ctgtgacgca 3540gcggcattag
cgcggcgtta tgagctaccc tcgtggcctg aaagatggcg ggaataaagc 3600ggaactaaaa
attactgact gagccatatt gaggtcaatt tgtcaactcg tcaagtcacg 3660tttggtggac
ggcccctttc caacgaatcg tatatactaa catgcgcgcg cttcctatat 3720acacatatac
atatatatat atatatatat gtgtgcgtgt atgtgtacac ctgtatttaa 3780tttccttact
cgcgggtttt tcttttttct caattcttgg cttcctcttt ctcgagcccg 3840ggatttaaat
gccgttgaca ttcatgtaag cagtattagc ttttaaactg ccattaacgc 3900acctatacga
ttgcggcggt gcgaaaacag taactagcac tcttacggct cattgtttcc 3960tctatccaaa
gttgcctata tcacaataaa attagaaatt aatcatttta ataagcttgc 4020tgccattcag
tgactcgcaa tctacgtttt aaacgtaatt taaaaatcca cttccaccta 4080catattttgc
aacagtcgta cgacagaaac tgaggctcta taaaatgaga tgtgctaatt 4140gttctttctt
ggctggtttc agctacatct ttctgtaatc aatctacaaa tttacacgcg 4200agcttcattt
tgacagtaaa caaatgtaaa agacacatgc aataaaagcg gtccagaaaa 4260caaacgacaa
agccaccaaa aatgtttgca aacgttggat ttagaacttt gagggggatc 4320cgcattgcgg
attacgtatt ctaatgttca gtaccgttcg tataatgtat gctatacgaa 4380gttatgcaga
ttgtactgag agtgcaccat accacagctt ttcaattcaa ttcatcattt 4440tttttttatt
cttttttttg atttcggttt ctttgaaatt tttttgattc ggtaatctcc 4500gaacagaagg
aagaacgaag gaaggagcac agacttagat tggtatatat acgcatatgt 4560agtgttgaag
aaacatgaaa ttgcccagta ttcttaaccc aactgcacag aacaaaaacc 4620tgcaggaaac
gaagataaat catgtcgaaa gctacatata aggaacgtgc tgctactcat 4680cctagtcctg
ttgctgccaa gctatttaat atcatgcacg aaaagcaaac aaacttgtgt 4740gcttcattgg
atgttcgtac caccaaggaa ttactggagt tagttgaagc attaggtccc 4800aaaatttgtt
tactaaaaac acatgtggat atcttgactg atttttccat ggagggcaca 4860gttaagccgc
taaaggcatt atccgccaag tacaattttt tactcttcga agacagaaaa 4920tttgctgaca
ttggtaatac agtcaaattg cagtactctg cgggtgtata cagaatagca 4980gaatgggcag
acattacgaa tgcacacggt gtggtgggcc caggtattgt tagcggtttg 5040aagcaggcgg
cagaagaagt aacaaaggaa cctagaggcc ttttgatgtt agcagaattg 5100tcatgcaagg
gctccctatc tactggagaa tatactaagg gtactgttga cattgcgaag 5160agcgacaaag
attttgttat cggctttatt gctcaaagag acatgggtgg aagagatgaa 5220ggttacgatt
ggttgattat gacacccggt gtgggtttag atgacaaggg agacgcattg 5280ggtcaacagt
atagaaccgt ggatgatgtg gtctctacag gatctgacat tattattgtt 5340ggaagaggac
tatttgcaaa gggaagggat gctaaggtag agggtgaacg ttacagaaaa 5400gcaggctggg
aagcatattt gagaagatgc ggccagcaaa actaaaaaac tgtattataa 5460gtaaatgcat
gtatactaaa ctcacaaatt agagcttcaa tttaattata tcagttatta 5520ccctatgcgg
tgtgaaatac cgcacagatg cgtaaggaga aaataccgca tcaggaaatt 5580gtaaacgtta
atattttgtt aaaattcgcg ttaaattttt gttaaatcag ctcatttttt 5640aaccaatagg
ccgaaatcgg caaaatccct tataaatcaa aagaatagac cgagataggg 5700ttgagtgttg
ttccagtttg gaacaagagt ccactattaa agaacgtgga ctccaacgtc 5760aaagggcgaa
aaaccgtcta tcagggcgat ggcccactac gtgaaccatc accctaatca 5820agataacttc
gtataatgta tgctatacga acggtaccaa ttgtttctcc acatgtgtac 5880cgtataaagt
tcttgttatg aattagaagg attatcacaa aagccctcca tcaaatctga 5940taatttcatt
acccaaagac tgtaatacct tgggtgtaaa gatcgtattt tttgatgtct 6000tcgtgatatt
ttgcagatct ttccatttta taaatttctg caaatatcgt gcatcagctc 6060atcttggaat
tggcacaatc tgcgtaattt ttttcttggt tctcttctcc ttttgatttt 6120aagagcgctt
tctattcagt agcagatttt cagtaaataa agccatcttt ctcaagtaaa 6180ctcatgatca
tgcagttgtt caaaattaca ccaataacga tgttatcttt aaaactttcc 6240ccagaaagaa
gatggcactt ttgcactaaa tacttcaaac tatttaactt tgaagcagga 6300catattgccg
gggcgccgaa ttcactggcc gtcgttttac aacgtcgtga ctgggaaaac 6360cctggcgtta
cccaacttaa tcgccttgca gcacatcccc ctttcgccag ctggcgtaat 6420agcgaagagg
cccgcaccga tcgcccttcc caacagttgc gcagcctgaa tggcgaatgg 6480cgcctgatgc
ggtattttct ccttacgcat ctgtgcggta tttcacaccg catatggtgc 6540actctcagta
caatctgctc tgatgccgca tagttaagcc agccccgaca cccgccaaca 6600cccgctgacg
cgccctgacg ggcttgtctg ctcccggcat ccgcttacag acaagctgtg 6660accgtctccg
ggagctgcat gtgtcagagg ttttcaccgt catcaccgaa acgcgcgaga 6720cgaaagggcc
tcgtgatacg cctattttta taggttaatg tcatgataat aatggtttct 6780tagacgtcag
gtggcacttt tcggggaaat gtgcgcggaa cccctatttg tttatttttc 6840taaatacatt
caaatatgta tccgctcatg agacaataac cctgataaat gcttcaataa 6900tattgaaaaa
ggaagagtat gagtattcaa catttccgtg tcgcccttat tccctttttt 6960gcggcatttt
gccttcctgt ttttgctcac ccagaaacgc tggtgaaagt aaaagatgct 7020gaagatcagt
tgggtgcacg agtgggttac atcgaactgg atctcaacag cggtaagatc 7080cttgagagtt
ttcgccccga agaacgtttt ccaatgatga gcacttttaa agttctgcta 7140tgtggcgcgg
tattatcccg tattgacgcc gggcaagagc aactcggtcg ccgcatacac 7200tattctcaga
atgacttggt tgagtactca ccagtcacag aaaagcatct tacggatggc 7260atgacagtaa
gagaattatg cagtgctgcc ataaccatga gtgataacac tgcggccaac 7320ttacttctga
caacgatcgg aggaccgaag gagctaaccg cttttttgca caacatgggg 7380gatcatgtaa
ctcgccttga tcgttgggaa ccggagctga atgaagccat accaaacgac 7440gagcgtgaca
ccacgatgcc tgtagcaatg gcaacaacgt tgcgcaaact attaactggc 7500gaactactta
ctctagcttc ccggcaacaa ttaatagact ggatggaggc ggataaagtt 7560gcaggaccac
ttctgcgctc ggcccttccg gctggctggt ttattgctga taaatctgga 7620gccggtgagc
gtgggtctcg cggtatcatt gcagcactgg ggccagatgg taagccctcc 7680cgtatcgtag
ttatctacac gacggggagt caggcaacta tggatgaacg aaatagacag 7740atcgctgaga
taggtgcctc actgattaag cattggtaac tgtcagacca agtttactca 7800tatatacttt
agattgattt aaaacttcat ttttaattta aaaggatcta ggtgaagatc 7860ctttttgata
atctcatgac caaaatccct taacgtgagt tttcgttcca ctgagcgtca 7920gaccccgtag
aaaagatcaa aggatcttct tgagatcctt tttttctgcg cgtaatctgc 7980tgcttgcaaa
caaaaaaacc accgctacca gcggtggttt gtttgccgga tcaagagcta 8040ccaactcttt
ttccgaaggt aactggcttc agcagagcgc agataccaaa tactgtcctt 8100ctagtgtagc
cgtagttagg ccaccacttc aagaactctg tagcaccgcc tacatacctc 8160gctctgctaa
tcctgttacc agtggctgct gccagtggcg ataagtcgtg tcttaccggg 8220ttggactcaa
gacgatagtt accggataag gcgcagcggt cgggctgaac ggggggttcg 8280tgcacacagc
ccagcttgga gcgaacgacc tacaccgaac tgagatacct acagcgtgag 8340ctatgagaaa
gcgccacgct tcccgaaggg agaaaggcgg acaggtatcc ggtaagcggc 8400agggtcggaa
caggagagcg cacgagggag cttccagggg gaaacgcctg gtatctttat 8460agtcctgtcg
ggtttcgcca cctctgactt gagcgtcgat ttttgtgatg ctcgtcaggg 8520gggcggagcc
tatggaa
8537
SEQ ID NO: 50; length: 6728; type: DNA; organism: artificial sequence; description: constructed vector
50 acatatttga
atgtatttag aaaaataaac aaataggggt tccgcgcaca tttccccgaa 60aagtgccacc
tgggtccttt tcatcacgtg ctataaaaat aattataatt taaatttttt 120aatataaata
tataaattaa aaatagaaag taaaaaaaga aattaaagaa aaaatagttt 180ttgttttccg
aagatgtaaa agactctagg gggatcgcca acaaatacta ccttttatct 240tgctcttcct
gctctcaggt attaatgccg aattgtttca tcttgtctgt gtagaagacc 300acacacgaaa
atcctgtgat tttacatttt acttatcgtt aatcgaatgt atatctattt 360aatctgcttt
tcttgtctaa taaatatata tgtaaagtac gctttttgtt gaaatttttt 420aaacctttgt
ttattttttt ttcttcattc cgtaactctt ctaccttctt tatttacttt 480ctaaaatcca
aatacaaaac ataaaaataa ataaacacag agtaaattcc caaattattc 540catcattaaa
agatacgagg cgcgtgtaag ttacaggcaa gcgatccgtc ctaagaaacc 600attattatca
tgacattaac ctataaaaat aggcgtatca cgaggccctt tcgtctcgcg 660cgtttcggtg
atgacggtga aaacctctga cacatgcagc tcccggagac ggtcacagct 720tgtctgtaag
cggatgccgg gagcagacaa gcccgtcagg gcgcgtcagc gcgtgttggc 780gggtgtcggg
gctggcttaa ctatgcggca tcagagcaga ttgtactgag agtgcaccat 840aaattcccgt
tttaagagct tggtgagcgc taggagtcac tgccaggtat cgtttgaaca 900cggcattagt
cagggaagtc ataacacagt cctttcccgc aattttcttt ttctattact 960cttggcctcc
tctagtacac tctatatttt tttatgcctc ggtaatgatt ttcatttttt 1020tttttcccct
agcggatgac tctttttttt tcttagcgat tggcattatc acataatgaa 1080ttatacatta
tataaagtaa tgtgatttct tcgaagaata tactaaaaaa tgagcaggca 1140agataaacga
aggcaaagat gacagagcag aaagccctag taaagcgtat tacaaatgaa 1200accaagattc
agattgcgat ctctttaaag ggtggtcccc tagcgataga gcactcgatc 1260ttcccagaaa
aagaggcaga agcagtagca gaacaggcca cacaatcgca agtgattaac 1320gtccacacag
gtatagggtt tctggaccat atgatacatg ctctggccaa gcattccggc 1380tggtcgctaa
tcgttgagtg cattggtgac ttacacatag acgaccatca caccactgaa 1440gactgcggga
ttgctctcgg tcaagctttt aaagaggccc tactggcgcg tggagtaaaa 1500aggtttggat
caggatttgc gcctttggat gaggcacttt ccagagcggt ggtagatctt 1560tcgaacaggc
cgtacgcagt tgtcgaactt ggtttgcaaa gggagaaagt aggagatctc 1620tcttgcgaga
tgatcccgca ttttcttgaa agctttgcag aggctagcag aattaccctc 1680cacgttgatt
gtctgcgagg caagaatgat catcaccgta gtgagagtgc gttcaaggct 1740cttgcggttg
ccataagaga agccacctcg cccaatggta ccaacgatgt tccctccacc 1800aaaggtgttc
ttatgtagtg acaccgatta tttaaagctg cagcatacga tatatataca 1860tgtgtatata
tgtataccta tgaatgtcag taagtatgta tacgaacagt atgatactga 1920agatgacaag
gtaatgcatc attctatacg tgtcattctg aacgaggcgc gctttccttt 1980tttctttttg
ctttttcttt ttttttctct tgaactcgac ggatctatgc ggtgtgaaat 2040accgcacaga
tgcgtaagga gaaaataccg catcaggaaa ttgtaaacgt taatattttg 2100ttaaaattcg
cgttaaattt ttgttaaatc agctcatttt ttaaccaata ggccgaaatc 2160ggcaaaatcc
cttataaatc aaaagaatag accgagatag ggttgagtgt tgttccagtt 2220tggaacaaga
gtccactatt aaagaacgtg gactccaacg tcaaagggcg aaaaaccgtc 2280tatcagggcg
atggcccact acgtgaacca tcaccctaat caagtttttt ggggtcgagg 2340tgccgtaaag
cactaaatcg gaaccctaaa gggagccccc gatttagagc ttgacgggga 2400aagccggcga
acgtggcgag aaaggaaggg aagaaagcga aaggagcggg cgctagggcg 2460ctggcaagtg
tagcggtcac gctgcgcgta accaccacac ccgccgcgct taatgcgccg 2520ctacagggcg
cgtcgcgcca ttcgccattc aggctgcgca actgttggga agggcgatcg 2580gtgcgggcct
cttcgctatt acgccagctg gcgaaagggg gatgtgctgc aaggcgatta 2640agttgggtaa
cgccagggtt ttcccagtca cgacgttgta aaacgacggc cagtgagcgc 2700gcgtaatacg
actcactata gggcgaattg ggtaccgggc cccccctcga ggtcgacggt 2760atcgataagc
ttgattagaa gccgccgagc gggcgacagc cctccgacgg aagactctcc 2820tccgtgcgtc
ctcgtcttca ccggtcgcgt tcctgaaacg cagatgtgcc tcgcgccgca 2880ctgctccgaa
caataaagat tctacaatac tagcttttat ggttatgaag aggaaaaatt 2940ggcagtaacc
tggccccaca aaccttcaaa ttaacgaatc aaattaacaa ccataggatg 3000ataatgcgat
tagtttttta gccttatttc tggggtaatt aatcagcgaa gcgatgattt 3060ttgatctatt
aacagatata taaatggaaa agctgcataa ccactttaac taatactttc 3120aacattttca
gtttgtatta cttcttattc aaatgtcata aaagtatcaa caaaaaattg 3180ttaatatacc
tctatacttt aacgtcaagg agaaaaatgt ccaatttact gcccgtacac 3240caaaatttgc
ctgcattacc ggtcgatgca acgagtgatg aggttcgcaa gaacctgatg 3300gacatgttca
gggatcgcca ggcgttttct gagcatacct ggaaaatgct tctgtccgtt 3360tgccggtcgt
gggcggcatg gtgcaagttg aataaccgga aatggtttcc cgcagaacct 3420gaagatgttc
gcgattatct tctatatctt caggcgcgcg gtctggcagt aaaaactatc 3480cagcaacatt
tgggccagct aaacatgctt catcgtcggt ccgggctgcc acgaccaagt 3540gacagcaatg
ctgtttcact ggttatgcgg cggatccgaa aagaaaacgt tgatgccggt 3600gaacgtgcaa
aacaggctct agcgttcgaa cgcactgatt tcgaccaggt tcgttcactc 3660atggaaaata
gcgatcgctg ccaggatata cgtaatctgg catttctggg gattgcttat 3720aacaccctgt
tacgtatagc cgaaattgcc aggatcaggg ttaaagatat ctcacgtact 3780gacggtggga
gaatgttaat ccatattggc agaacgaaaa cgctggttag caccgcaggt 3840gtagagaagg
cacttagcct gggggtaact aaactggtcg agcgatggat ttccgtctct 3900ggtgtagctg
atgatccgaa taactacctg ttttgccggg tcagaaaaaa tggtgttgcc 3960gcgccatctg
ccaccagcca gctatcaact cgcgccctgg aagggatttt tgaagcaact 4020catcgattga
tttacggcgc taaggatgac tctggtcaga gatacctggc ctggtctgga 4080cacagtgccc
gtgtcggagc cgcgcgagat atggcccgcg ctggagtttc aataccggag 4140atcatgcaag
ctggtggctg gaccaatgta aatattgtca tgaactatat ccgtaacctg 4200gatagtgaaa
caggggcaat ggtgcgcctg ctggaagatg gcgattagga gtaagcgaat 4260ttcttatgat
ttatgatttt tattattaaa taagttataa aaaaaataag tgtatacaaa 4320ttttaaagtg
actcttaggt tttaaaacga aaattcttat tcttgagtaa ctctttcctg 4380taggtcaggt
tgctttctca ggtatagcat gaggtcgctc ttattgacca cacctctacc 4440ggcatgccga
gcaaatgcct gcaaatcgct ccccatttca cccaattgta gatatgctaa 4500ctccagcaat
gagttgatga atctcggtgt gtattttatg tcctcagagg acaacacctg 4560tggtgttcta
gagcggccgc caccgcggtg gagctccagc ttttgttccc tttagtgagg 4620gttaattgcg
cgcttggcgt aatcatggtc atagctgttt cctgtgtgaa attgttatcc 4680gctcacaatt
ccacacaaca taggagccgg aagcataaag tgtaaagcct ggggtgccta 4740atgagtgagg
taactcacat taattgcgtt gcgctcactg cccgctttcc agtcgggaaa 4800cctgtcgtgc
cagctgcatt aatgaatcgg ccaacgcgcg gggagaggcg gtttgcgtat 4860tgggcgctct
tccgcttcct cgctcactga ctcgctgcgc tcggtcgttc ggctgcggcg 4920agcggtatca
gctcactcaa aggcggtaat acggttatcc acagaatcag gggataacgc 4980aggaaagaac
atgtgagcaa aaggccagca aaaggccagg aaccgtaaaa aggccgcgtt 5040gctggcgttt
ttccataggc tccgcccccc tgacgagcat cacaaaaatc gacgctcaag 5100tcagaggtgg
cgaaacccga caggactata aagataccag gcgtttcccc ctggaagctc 5160cctcgtgcgc
tctcctgttc cgaccctgcc gcttaccgga tacctgtccg cctttctccc 5220ttcgggaagc
gtggcgcttt ctcatagctc acgctgtagg tatctcagtt cggtgtaggt 5280cgttcgctcc
aagctgggct gtgtgcacga accccccgtt cagcccgacc gctgcgcctt 5340atccggtaac
tatcgtcttg agtccaaccc ggtaagacac gacttatcgc cactggcagc 5400agccactggt
aacaggatta gcagagcgag gtatgtaggc ggtgctacag agttcttgaa 5460gtggtggcct
aactacggct acactagaag gacagtattt ggtatctgcg ctctgctgaa 5520gccagttacc
ttcggaaaaa gagttggtag ctcttgatcc ggcaaacaaa ccaccgctgg 5580tagcggtggt
ttttttgttt gcaagcagca gattacgcgc agaaaaaaag gatctcaaga 5640agatcctttg
atcttttcta cggggtctga cgctcagtgg aacgaaaact cacgttaagg 5700gattttggtc
atgagattat caaaaaggat cttcacctag atccttttaa attaaaaatg 5760aagttttaaa
tcaatctaaa gtatatatga gtaaacttgg tctgacagtt accaatgctt 5820aatcagtgag
gcacctatct cagcgatctg tctatttcgt tcatccatag ttgcctgact 5880ccccgtcgtg
tagataacta cgatacggga gggcttacca tctggcccca gtgctgcaat 5940gataccgcga
gacccacgct caccggctcc agatttatca gcaataaacc agccagccgg 6000aagggccgag
cgcagaagtg gtcctgcaac tttatccgcc tccatccagt ctattaattg 6060ttgccgggaa
gctagagtaa gtagttcgcc agttaatagt ttgcgcaacg ttgttgccat 6120tgctacaggc
atcgtggtgt cacgctcgtc gtttggtatg gcttcattca gctccggttc 6180ccaacgatca
aggcgagtta catgatcccc catgttgtgc aaaaaagcgg ttagctcctt 6240cggtcctccg
atcgttgtca gaagtaagtt ggccgcagtg ttatcactca tggttatggc 6300agcactgcat
aattctctta ctgtcatgcc atccgtaaga tgcttttctg tgactggtga 6360gtactcaacc
aagtcattct gagaatagtg tatgcggcga ccgagttgct cttgcccggc 6420gtcaatacgg
gataataccg cgccacatag cagaacttta aaagtgctca tcattggaaa 6480acgttcttcg
gggcgaaaac tctcaaggat cttaccgctg ttgagatcca gttcgatgta 6540acccactcgt
gcacccaact gatcttcagc atcttttact ttcaccagcg tttctgggtg 6600agcaaaaaca
ggaaggcaaa atgccgcaaa aaagggaata agggcgacac ggaaatgttg 6660aatactcata
ctcttccttt ttcaatatta ttgaagcatt tatcagggtt attgtctcat 6720gagcggat
6728
SEQ ID NO: 51; length: 2431; type: DNA; organism: artificial sequence; description: chimeric expression cassette for BSaraA
51 cttttctggc aaccaaaccc atacatcggg attcctataa taccttcgtt ggtctcccta
60acatgtaggt ggcggagggg agatatacaa tagaacagat accagacaag acataatggg
120ctaaacaaga ctacaccaat tacactgcct cattgatggt ggtacataac gaactaatac
180tgtagcccta gacttgatag ccatcatcat atcgaagttt cactaccctt tttccatttg
240ccatctattg aagtaataat aggcgcatgc aacttctttt cttttttttt cttttctctc
300tcccccgttg ttgtctcacc atatccgcaa tgacaaaaaa atgatggaag acactaaagg
360aaaaaattaa cgacaaagac agcaccaaca gatgtcgttg ttccagagct gatgaggggt
420atctcgaagc acacgaaact ttttccttcc ttcattcacg cacactactc tctaatgagc
480aacggtatac ggccttcctt ccagttactt gaatttgaaa taaaaaaaag tttgctgtct
540tgctatcaag tataaataga cctgcaatta ttaatctttt gtttcctcgt cattgttctc
600gttccctttc ttccttgttt ctttttctgc acaatatttc aagctatacc aagcatacaa
660tcaactatct catatacaat gttgcaaact aaggattacg aattctggtt cgttactggt
720tctcaacact tgtacggtga agaaactttg gaattggtcg atcaacacgc taagtctatc
780tgtgaaggtt tgtccggtgt ctcttccaga tacaagatca cccacaagcc agttgtcacc
840tcttccgaaa ctatcagaca attgttgaga gaagctgaat actctgaaac ttgtgctggt
900atcatcacct ggatgcacac tttctctcca gctaagatgt ggatcgaagg tttgtcttcc
960taccaaaagc cattgatgca cttgcacacc caatacaaca gagacatccc ttggggtact
1020atcgacatgg atttcatgaa ctctaaccaa tccgctcacg gtgacagaga atacggttac
1080atcaactcca gaatgggttt gtccagaaag gttgtcgctg gttactggga cgatgaagaa
1140gtcaagaagg aaatctctca atggatggac accgctgctg ctttgaacga atccagacac
1200atcaaggttg ctagattcgg tgacaacatg agacacgttg ctgtcactga cggtgacaag
1260gttggtgctc acatccaatt cggttggcaa gttgacggtt acggtatcgg tgacttggtt
1320gaagtcatga acagaatcac cgacgatgaa gttgacactt tgtacgctga atacgataga
1380ttgtacgtca tctctgaaga aaccaagaga gacgaagcta aggttgcttc catcaaggaa
1440caagctaaga tcgaattggg tttgaccact ttcttggaac aaggtggtta ctctgctttc
1500accacttcct tcgaagtctt gcacggtatg aagcaattgc caggtttggc tgttcaaaga
1560ttgatggaaa agggttacgg tttcgctggt gaaggtgact ggaagaccgc tgctttggtc
1620agaatgatga agatcatgtc tcaaggtaaa agaacctcct tcatggaaga ctacacttac
1680cacttcgaac caggtaacga aatgatcttg ggttctcaca tgttggaagt ttgtccaact
1740gtcgctttgg accaaccaaa gatcgaagtt cacccattgt ctatcggtgg taaagaagat
1800ccagctagat tcgtcttcaa cggtatctct ggttccgcta tccaagcctc tttggttgac
1860atcggtggta gattcagatt ggttttgaac gaagtcaacg gtcaagaaat cgaaaaggac
1920atgccaaact tgccagttgc tagagtcttg tggaagccag aaccatcttt gaagactgct
1980gctgaagcct ggatcttggc tggtggtgct caccacacct gtttgtctta cgaattgact
2040gtcgaacaaa tgttggactg ggctgaaatg gctggtatcg aatctgtttt gatctccaga
2100gataccacta tccacaagtt gaagcacgaa ttgaagtgga acacgaagcc ttgtacagat
2160tgcaaaagta attaattaat catgtaatta gttatgtcac gcttacattc acgccctcct
2220cccacatccg ctctaaccga aaaggaagga gttagacaac ctgaagtcta ggtccctatt
2280tatttttttt aatagttatg ttagtattaa gaacgttatt tatatttcaa atttttcttt
2340tttttctgta caaacgcgtg tacgcatgta acattatact gaaaaccttg cttgagaagg
2400ttttgggacg ctcgaaggct ttaatttgcg g
2431
SEQ ID NO: 52; length: 566; type: PRT; organism: Escherichia coli
52 Met Ala Ile Ala Ile Gly Leu Asp Phe Gly
Ser Asp Ser Val Arg Ala1 5 10
15Leu Ala Val Asp Cys Ala Ser Gly Glu Glu Ile Ala Thr Ser Val Glu
20 25 30Trp Tyr Pro Arg Trp Gln
Lys Gly Gln Phe Cys Asp Ala Pro Asn Asn 35 40
45Gln Phe Arg His His Pro Arg Asp Tyr Ile Glu Ser Met Glu
Ala Ala 50 55 60Leu Lys Thr Val Leu
Ala Glu Leu Ser Val Glu Gln Arg Ala Ala Val65 70
75 80Val Gly Ile Gly Val Asp Ser Thr Gly Ser
Thr Pro Ala Pro Ile Asp 85 90
95Ala Asp Gly Asn Val Leu Ala Leu Arg Pro Glu Phe Ala Glu Asn Pro
100 105 110Asn Ala Met Phe Val
Leu Trp Lys Asp His Thr Ala Val Glu Arg Ser 115
120 125Glu Glu Ile Thr Arg Leu Cys His Ala Pro Gly Asn
Val Asp Tyr Ser 130 135 140Arg Tyr Ile
Gly Gly Ile Tyr Ser Ser Glu Trp Phe Trp Ala Lys Ile145
150 155 160Leu His Val Thr Arg Gln Asp
Ser Ala Val Ala Gln Ser Ala Ala Ser 165
170 175Trp Ile Glu Leu Cys Asp Trp Val Pro Ala Leu Leu
Ser Gly Thr Thr 180 185 190Arg
Pro Gln Asp Ile Arg Arg Gly Arg Cys Ser Ala Gly His Lys Ser 195
200 205Leu Trp His Glu Ser Trp Gly Gly Leu
Pro Pro Ala Ser Phe Phe Asp 210 215
220Glu Leu Asp Pro Ile Leu Asn Arg His Leu Pro Ser Pro Leu Phe Thr225
230 235 240Asp Thr Trp Thr
Ala Asp Ile Pro Val Gly Thr Leu Cys Pro Glu Trp 245
250 255Ala Gln Arg Leu Gly Leu Pro Glu Ser Val
Val Ile Ser Gly Gly Ala 260 265
270Phe Asp Cys His Met Gly Ala Val Gly Ala Gly Ala Gln Pro Asn Ala
275 280 285Leu Val Lys Val Ile Gly Thr
Ser Thr Cys Asp Ile Leu Ile Ala Asp 290 295
300Lys Gln Ser Val Gly Glu Arg Ala Val Lys Gly Ile Cys Gly Gln
Val305 310 315 320Asp Gly
Ser Val Val Pro Gly Phe Ile Gly Leu Glu Ala Gly Gln Ser
325 330 335Ala Phe Gly Asp Ile Tyr Ala
Trp Phe Gly Arg Val Leu Ser Trp Pro 340 345
350Leu Glu Gln Leu Ala Ala Gln His Pro Glu Leu Lys Ala Gln
Ile Asn 355 360 365Ala Ser Gln Lys
Gln Leu Leu Pro Ala Leu Thr Glu Ala Trp Ala Lys 370
375 380Asn Pro Ser Leu Asp His Leu Pro Val Val Leu Asp
Trp Phe Asn Gly385 390 395
400Arg Arg Ser Pro Asn Ala Asn Gln Arg Leu Lys Gly Val Ile Thr Asp
405 410 415Leu Asn Leu Ala Thr
Asp Ala Pro Leu Leu Phe Gly Gly Leu Ile Ala 420
425 430Ala Thr Ala Phe Gly Ala Arg Ala Ile Met Glu Cys
Phe Thr Asp Gln 435 440 445Gly Ile
Ala Val Asn Asn Val Met Ala Leu Gly Gly Ile Ala Arg Lys 450
455 460Asn Gln Val Ile Met Gln Ala Cys Cys Asp Val
Leu Asn Arg Pro Leu465 470 475
480Gln Ile Val Ala Ser Asp Gln Cys Cys Ala Leu Gly Ala Ala Ile Phe
485 490 495Ala Ala Val Ala
Ala Lys Val His Ala Asp Ile Pro Ser Ala Gln Gln 500
505 510Lys Met Ala Ser Ala Val Glu Lys Thr Leu Gln
Pro Arg Ser Glu Gln 515 520 525Ala
Gln Arg Phe Glu Gln Leu Tyr Arg Arg Tyr Gln Gln Trp Ala Met 530
535 540Ser Ala Glu Gln His Tyr Leu Pro Thr Ser
Ala Pro Ala Gln Ala Ala545 550 555
560Gln Ala Val Ala Thr Leu 565
SEQ ID NO: 53; length: 1701; type: DNA; organism: artificial sequence; description: codon optimized coding region for E. coli ribulokinase
53 atggctatcg ctatcggttt ggacttcggt tctgactccg ttagagcttt ggctgttgac
60tgtgcttccg gtgaagaaat cgctacttct gtcgaatggt atccaagatg gcaaaagggt
120caattctgtg acgctccaaa caaccaattc agacaccacc caagagatta catcgaatct
180atggaagctg ctttgaagac tgttttggct gaattgtctg tcgaacaaag agctgctgtt
240gtcggtatcg gtgttgactc tactggttcc accccagctc caatcgacgc tgatggtaac
300gttttggctt tgagaccaga attcgctgaa aacccaaacg ctatgttcgt tttgtggaag
360gaccacactg ctgtcgaaag atccgaagaa atcaccagat tgtgtcacgc tccaggtaac
420gttgactact ccagatacat cggtggtatc tactcttccg aatggttctg ggctaagatt
480ttgcacgtta ctagacaaga ctctgctgtc gctcaatctg ctgcttcctg gatcgaattg
540tgtgactggg ttccagcttt gttgtctggt actactagac cacaagatat cagaagaggt
600agatgttctg ctggtcacaa gtccttgtgg cacgaatctt ggggtggttt gccaccagct
660tccttcttcg acgaattgga cccaatcttg aacagacact tgccatctcc attgttcacc
720gacacttgga ccgctgatat cccagtcggt actttgtgtc cagaatgggc tcaaagattg
780ggtttgccag aatctgttgt catctccggt ggtgctttcg actgtcacat gggtgctgtt
840ggtgctggtg ctcaaccaaa cgctttggtt aaggtcatcg gtacttccac ctgtgacatc
900ttgatcgctg ataagcaatc tgttggtgaa agagctgtca agggtatctg tggtcaagtt
960gacggttccg ttgtcccagg tttcatcggt ttggaagctg gtcaatctgc tttcggtgac
1020atctacgctt ggttcggtag agttttgtcc tggccattgg aacaattggc tgctcaacac
1080ccagaattga aggctcaaat caacgcttct caaaagcaat tgttgccagc tttgactgaa
1140gcctgggcta agaacccatc cttggaccac ttgccagttg tcttggattg gttcaacggt
1200agaagatccc caaacgctaa ccaaagattg aagggtgtta tcactgactt gaacttggct
1260accgatgctc cattgttgtt cggtggtttg atcgctgcta ctgctttcgg tgctagagct
1320atcatggaat gtttcaccga ccaaggtatc gctgttaaca acgtcatggc tttgggtggt
1380atcgctagaa agaaccaagt tatcatgcaa gcctgttgtg acgtcttgaa cagaccattg
1440caaatcgttg cttctgatca atgttgtgct ttgggtgctg ctatcttcgc tgctgttgct
1500gctaaggtcc acgctgacat cccatccgct caacaaaaga tggcttctgc tgtcgaaaag
1560accttgcaac caagatccga acaagctcaa agattcgaac aattgtacag aagataccaa
1620caatgggcta tgtccgctga acaacactac ttgccaactt ctgctccagc tcaagctgct
1680caagctgttg ctaccttgta a
1701
SEQ ID NO: 54; length: 2911; type: DNA; organism: artificial sequence; description: constructed chimeric expression cassette for ECaraB
54 gagaagaggt atacataaca agaaaatcgc gtgaacacct tatataactt
agcccgttat 60tgagctaaaa aaccttgcaa aatttcctat gaataagaat acttcagacg
tgataaaaat 120ttactttcta actcttctca cgctgcccct atctgttctt ccgctctacc
gtgagaaata 180aagcatcgag tacggcagtt cgctgtcact gaactaaaac aataaggcta
gttcgaatga 240tgaacttgct tgctgtcaaa cttctgagtt gccgctgatg tgacactgtg
acaataaatt 300caaaccggtt atagcggtct cctccggtac cggttctgcc acctccaata
gagctcagta 360ggagtcagaa cctctgcggt ggctgtcagt gactcatccg cgtttcgtaa
gttgtgcgcg 420tgcacatttc gcccgttccc gctcatcttg cagcaggcga aattttcatc
acgctgtagg 480acgcaaaaaa aaaataatta atcgtacaag aatcttggaa aaaaaattga
aaaattttgt 540ataaaaggga tgacctaact tgactcaatg gcttttacac ccagtatttt
ccctttcctt 600gtttgttaca attatagaag caagacaaaa acatatagac aacctattcc
taggagttat 660atttttttac cctaccagca atataagtaa aaaataaaac atggctatcg
ctatcggttt 720ggacttcggt tctgactccg ttagagcttt ggctgttgac tgtgcttccg
gtgaagaaat 780cgctacttct gtcgaatggt atccaagatg gcaaaagggt caattctgtg
acgctccaaa 840caaccaattc agacaccacc caagagatta catcgaatct atggaagctg
ctttgaagac 900tgttttggct gaattgtctg tcgaacaaag agctgctgtt gtcggtatcg
gtgttgactc 960tactggttcc accccagctc caatcgacgc tgatggtaac gttttggctt
tgagaccaga 1020attcgctgaa aacccaaacg ctatgttcgt tttgtggaag gaccacactg
ctgtcgaaag 1080atccgaagaa atcaccagat tgtgtcacgc tccaggtaac gttgactact
ccagatacat 1140cggtggtatc tactcttccg aatggttctg ggctaagatt ttgcacgtta
ctagacaaga 1200ctctgctgtc gctcaatctg ctgcttcctg gatcgaattg tgtgactggg
ttccagcttt 1260gttgtctggt actactagac cacaagatat cagaagaggt agatgttctg
ctggtcacaa 1320gtccttgtgg cacgaatctt ggggtggttt gccaccagct tccttcttcg
acgaattgga 1380cccaatcttg aacagacact tgccatctcc attgttcacc gacacttgga
ccgctgatat 1440cccagtcggt actttgtgtc cagaatgggc tcaaagattg ggtttgccag
aatctgttgt 1500catctccggt ggtgctttcg actgtcacat gggtgctgtt ggtgctggtg
ctcaaccaaa 1560cgctttggtt aaggtcatcg gtacttccac ctgtgacatc ttgatcgctg
ataagcaatc 1620tgttggtgaa agagctgtca agggtatctg tggtcaagtt gacggttccg
ttgtcccagg 1680tttcatcggt ttggaagctg gtcaatctgc tttcggtgac atctacgctt
ggttcggtag 1740agttttgtcc tggccattgg aacaattggc tgctcaacac ccagaattga
aggctcaaat 1800caacgcttct caaaagcaat tgttgccagc tttgactgaa gcctgggcta
agaacccatc 1860cttggaccac ttgccagttg tcttggattg gttcaacggt agaagatccc
caaacgctaa 1920ccaaagattg aagggtgtta tcactgactt gaacttggct accgatgctc
cattgttgtt 1980cggtggtttg atcgctgcta ctgctttcgg tgctagagct atcatggaat
gtttcaccga 2040ccaaggtatc gctgttaaca acgtcatggc tttgggtggt atcgctagaa
agaaccaagt 2100tatcatgcaa gcctgttgtg acgtcttgaa cagaccattg caaatcgttg
cttctgatca 2160atgttgtgct ttgggtgctg ctatcttcgc tgctgttgct gctaaggtcc
acgctgacat 2220cccatccgct caacaaaaga tggcttctgc tgtcgaaaag accttgcaac
caagatccga 2280acaagctcaa agattcgaac aattgtacag aagataccaa caatgggcta
tgtccgctga 2340acaacactac ttgccaactt ctbamhgctc cagctcaagc tgctcaagct
gttgctacct 2400tgtaaggatc caggagcaat gcaaaatcta ggggtagaat tactttttga
aaaggaaaaa 2460tattcaggtt tgttgttttt atgtaagttg tatgatttga tatacatata
tatatatata 2520taatatatat tgtacatgtg tttttccggg gaagaatgga ttatccggag
gtgtgaataa 2580aatgatgacg attataggtt tgtgttgtaa tatttagata actcaattct
cgccagtttg 2640aactccaacc tagactggtt caaagctttt gctatcaaga tgagatatat
ggaattttcg 2700tctttatcgt ccacttgtat ctttatttcc tcgtcatctt catcaatatt
gattccatta 2760ataatcgatt tatcgctcag agtgttgacc aattcggtct tgttggggaa
gaaatgttcc 2820atttttcttc ccaagttttg aattctttca caaacccagg caattctttg
taagcctaat 2880gcagcagaag aaccctttaa aaaatggccc a
2911
SEQ ID NO: 55, length 231, PRT, Escherichia coli
55 Met Leu Glu Asp Leu Lys Arg Gln
Val Leu Glu Ala Asn Leu Ala Leu1 5 10
15Pro Lys His Asn Leu Val Thr Leu Thr Trp Gly Asn Val Ser
Ala Val 20 25 30Asp Arg Glu
Arg Gly Val Phe Val Ile Lys Pro Ser Gly Val Asp Tyr 35
40 45Ser Ile Met Thr Ala Asp Asp Met Val Val Val
Ser Ile Glu Thr Gly 50 55 60Glu Val
Val Glu Gly Ala Lys Lys Pro Ser Ser Asp Thr Pro Thr His65
70 75 80Arg Leu Leu Tyr Gln Ala Phe
Pro Ser Ile Gly Gly Ile Val His Thr 85 90
95His Ser Arg His Ala Thr Ile Trp Ala Gln Ala Gly Gln
Ser Ile Pro 100 105 110Ala Thr
Gly Thr Thr His Ala Asp Tyr Phe Tyr Gly Thr Ile Pro Cys 115
120 125Thr Arg Lys Met Thr Asp Ala Glu Ile Asn
Gly Glu Tyr Glu Trp Glu 130 135 140Thr
Gly Asn Val Ile Val Glu Thr Phe Glu Lys Gln Gly Ile Asp Ala145
150 155 160Ala Gln Met Pro Gly Val
Leu Val His Ser His Gly Pro Phe Ala Trp 165
170 175Gly Lys Asn Ala Glu Asp Ala Val His Asn Ala Ile
Val Leu Glu Glu 180 185 190Val
Ala Tyr Met Gly Ile Phe Cys Arg Gln Leu Ala Pro Gln Leu Pro 195
200 205Asp Met Gln Gln Thr Leu Leu Asn Lys
His Tyr Leu Arg Lys His Gly 210 215
220Ala Lys Ala Tyr Tyr Gly Gln225 230
SEQ ID NO: 56, length 696, DNA, artificial sequence, E coli L-ribulose-5-phosphate 4-epimerase codon optimized coding region
56 atgttggaag atttgaagag acaagttttg gaagctaact tggctttgcc
aaagcacaac 60ttggtcacct tgacctgggg taacgtctct gctgttgaca gagaaagagg
tgtcttcgtt 120atcaagccat ctggtgttga ttactccatc atgactgctg acgatatggt
tgtcgtttcc 180atcgaaaccg gtgaagtcgt tgaaggtgct aagaagccat cttccgacac
cccaactcac 240agattgttgt accaagcctt cccatctatc ggtggtatcg tccacactca
ctccagacac 300gctaccatct gggctcaagc tggtcaatct atcccagcta ctggtactac
tcacgctgac 360tacttctacg gtactatccc atgtactaga aagatgaccg atgctgaaat
caacggtgaa 420tacgaatggg aaactggtaa cgtcatcgtt gaaaccttcg aaaagcaagg
tatcgacgct 480gctcaaatgc caggtgtctt ggttcactct cacggtccat tcgcttgggg
taaaaacgct 540gaagatgctg ttcacaacgc tatcgtcttg gaagaagttg cttacatggg
tatcttctgt 600agacaattgg ctccacaatt gccagacatg caacaaacct tgttgaacaa
gcactacttg 660agaaagcacg gtgctaaggc ttactacggt caataa
696
SEQ ID NO: 57, length 1691, DNA, artificial sequence, constructed chimeric expression cassette for ECaraD
57 agttcgagtt tatcattatc aatactgcca
tttcaaagaa tacgtaaata attaatagta 60gtgattttcc taactttatt tagtcaaaaa
attagccttt taattctgct gtaacccgta 120catgcccaaa atagggggcg ggttacacag
aatatataac atcgtaggtg tctgggtgaa 180cagtttattc ctggcatcca ctaaatataa
tggagcccgc tttttaagct ggcatccaga 240aaaaaaaaga atcccagcac caaaatattg
ttttcttcac caaccatcag ttcataggtc 300cattctctta gcgcaactac agagaacagg
ggcacaaaca ggcaaaaaac gggcacaacc 360tcaatggagt gatgcaacct gcctggagta
aatgatgaca caaggcaatt gacccacgca 420tgtatctatc tcattttctt acaccttcta
ttaccttctg ctctctctga tttggaaaaa 480gctgaaaaaa aaggttgaaa ccagttccct
gaaattattc ccctacttga ctaataagta 540tataaagacg gtaggtattg attgtaattc
tgtaaatcta tttcttaaac ttcttaaatt 600ctacttttat agttagtctt ttttttagtt
ttaaaacacc aagaacttag tttcgaataa 660acacacataa acaaacaaaa tgttggaaga
tttgaagaga caagttttgg aagctaactt 720ggctttgcca aagcacaact tggtcacctt
gacctggggt aacgtctctg ctgttgacag 780agaaagaggt gtcttcgtta tcaagccatc
tggtgttgat tactccatca tgactgctga 840cgatatggtt gtcgtttcca tcgaaaccgg
tgaagtcgtt gaaggtgcta agaagccatc 900ttccgacacc ccaactcaca gattgttgta
ccaagccttc ccatctatcg gtggtatcgt 960ccacactcac tccagacacg ctaccatctg
ggctcaagct ggtcaatcta tcccagctac 1020tggtactact cacgctgact acttctacgg
tactatccca tgtactagaa agatgaccga 1080tgctgaaatc aacggtgaat acgaatggga
aactggtaac gtcatcgttg aaaccttcga 1140aaagcaaggt atcgacgctg ctcaaatgcc
aggtgtcttg gttcactctc acggtccatt 1200cgcttggggt aaaaacgctg aagatgctgt
tcacaacgct atcgtcttgg aagaagttgc 1260ttacatgggt atcttctgta gacaattggc
tccacaattg ccagacatgc aacaaacctt 1320gttgaacaag cactacttga gaaagcacgg
tgctaaggct tactacggtc aataagagta 1380agcgaatttc ttatgattta tgatttttat
tattaaataa gttataaaaa aaataagtgt 1440atacaaattt taaagtgact cttaggtttt
aaaacgaaaa ttcttattct tgagtaactc 1500tttcctgtag gtcaggttgc tttctcaggt
atagcatgag gtcgctctta ttgaccacac 1560ctctaccggc atgccgagca aatgcctgca
aatcgctccc catttcaccc aattgtagat 1620atgctaactc cagcaatgag ttgatgaatc
tcggtgtgta ttttatgtcc tcagaggaca 1680acacctgtgg t
1691
SEQ ID NO: 58, length 8202, DNA, artificial sequence, constructed plasmid; misc_feature (421)..(421), n is a, c, g, or t
58 tcccattacc gacatttggg cgctatacgt gcatatgttc atgtatgtat ctgtatttaa
60aacacttttg tattattttt cctcatatat gtgtataggt ttatacggat gatttaatta
120ttacttcacc accctttatt tcaggctgat atcttagcct tgttactaga ttaatcatgt
180aattagttat gtcacgctta cattcacgcc ctccccccac atccgctcta accgaaaagg
240aaggagttag acaacctgaa gtctaggtcc ctatttattt ttttatagtt atgttagtat
300taagaacgtt atttatattt caaatttttc ttttttttct gtacagacgc gtgtacgcat
360gtaacattat actgaaaacc ttgcttgaga aggttttggg acgctcgaag gctttaattt
420ntgcgggcgg ccgctggaca atttattcat ggcatcgtca ttgatataag tggcttgagc
480tgtggataag aaaagccata tatttatata aacatttaga tatgaatagg aagtagattg
540ttcgacgcaa ctacccgttc aagaagtata atggggaatg gtctcatctt ccctcacagg
600atatagttct ctgaagagat acatacgttt gtgtatacta tgcttcttta tcaactcaag
660ttttgtagag gaagacgttg aagatggtga tgtgacatct ttactattct ccagcacgtt
720ttcagtattt acttaatcgt atattaatga cgtcccttat ctattaactt tccggttttt
780ctttttttcg gtgaatgttc tttccgtttt agtgaatttt tcaattgtaa ttgacgcaat
840cggtttataa caagcagaca taaatatcaa gctcgagcca aatcacaaaa aaagccttat
900agbamhcttg ccctgacaaa gaatatacaa ctcgggaagg atccgagaag aggtatacat
960aacaagaaaa tcgcgtgaac accttatata acttagcccg ttattgagct aaaaaacctt
1020gcaaaatttc ctatgaataa gaatacttca gacgtgataa aaatttactt tctaactctt
1080ctcacgctgc ccctatctgt tcttccgctc taccgtgaga aataaagcat cgagtacggc
1140agttcgctgt cactgaacta aaacaataag gctagttcga atgatgaact tgcttgctgt
1200caaacttctg agttgccgct gatgtgacac tgtgacaata aattcaaacc ggttatagcg
1260gtctcctccg gtaccggttc tgccacctcc aatagagctc agtaggagtc agaacctctg
1320cggtggctgt cagtgactca tccgcgtttc gtaagttgtg cgcgtgcaca tttcgcccgt
1380tcccgctcat cttgcagcag gcgaaatttt catcacgctg taggacgcaa aaaaaaaata
1440attaatcgta caagaatctt ggaaaaaaaa ttgaaaaatt ttgtataaaa gggatgacct
1500aacttgactc aatggctttt acacccagta ttttcccttt ccttgtttgt tacaattata
1560gaagcaagac aaaaacatat agacaaccta ttcctaggag ttatattttt ttaccctacc
1620agcaatataa gtaaaaaata aaacatggct atcgctatcg gtttggactt cggttctgac
1680tccgttagag ctttggctgt tgactgtgct tccggtgaag aaatcgctac ttctgtcgaa
1740tggtatccaa gatggcaaaa gggtcaattc tgtgacgctc caaacaacca attcagacac
1800cacccaagag attacatcga atctatggaa gctgctttga agactgtttt ggctgaattg
1860tctgtcgaac aaagagctgc tgttgtcggt atcggtgttg actctactgg ttccacccca
1920gctccaatcg acgctgatgg taacgttttg gctttgagac cagaattcgc tgaaaaccca
1980aacgctatgt tcgttttgtg gaaggaccac actgctgtcg aaagatccga agaaatcacc
2040agattgtgtc acgctccagg taacgttgac tactccagat acatcggtgg tatctactct
2100tccgaatggt tctgggctaa gattttgcac gttactagac aagactctgc tgtcgctcaa
2160tctgctgctt cctggatcga attgtgtgac tgggttccag ctttgttgtc tggtactact
2220agaccacaag atatcagaag aggtagatgt tctgctggtc acaagtcctt gtggcacgaa
2280tcttggggtg gtttgccacc agcttccttc ttcgacgaat tggacccaat cttgaacaga
2340cacttgccat ctccattgtt caccgacact tggaccgctg atatcccagt cggtactttg
2400tgtccagaat gggctcaaag attgggtttg ccagaatctg ttgtcatctc cggtggtgct
2460ttcgactgtc acatgggtgc tgttggtgct ggtgctcaac caaacgcttt ggttaaggtc
2520atcggtactt ccacctgtga catcttgatc gctgataagc aatctgttgg tgaaagagct
2580gtcaagggta tctgtggtca agttgacggt tccgttgtcc caggtttcat cggtttggaa
2640gctggtcaat ctgctttcgg tgacatctac gcttggttcg gtagagtttt gtcctggcca
2700ttggaacaat tggctgctca acacccagaa ttgaaggctc aaatcaacgc ttctcaaaag
2760caattgttgc cagctttgac tgaagcctgg gctaagaacc catccttgga ccacttgcca
2820gttgtcttgg attggttcaa cggtagaaga tccccaaacg ctaaccaaag attgaagggt
2880gttatcactg acttgaactt ggctaccgat gctccattgt tgttcggtgg tttgatcgct
2940gctactgctt tcggtgctag agctatcatg gaatgtttca ccgaccaagg tatcgctgtt
3000aacaacgtca tggctttggg tggtatcgct agaaagaacc aagttatcat gcaagcctgt
3060tgtgacgtct tgaacagacc attgcaaatc gttgcttctg atcaatgttg tgctttgggt
3120gctgctatct tcgctgctgt tgctgctaag gtccacgctg acatcccatc cgctcaacaa
3180aagatggctt ctgctgtcga aaagaccttg caaccaagat ccgaacaagc tcaaagattc
3240gaacaattgt acagaagata ccaacaatgg gctatgtccg ctgaacaaca ctacttgcca
3300acttctbamh gctccagctc aagctgctca agctgttgct accttgtaag gatccaggag
3360caatgcaaaa tctaggggta gaattacttt ttgaaaagga aaaatattca ggtttgttgt
3420ttttatgtaa gttgtatgat ttgatataca tatatatata tatataatat atattgtaca
3480tgtgtttttc cggggaagaa tggattatcc ggaggtgtga ataaaatgat gacgattata
3540ggtttgtgtt gtaatattta gataactcaa ttctcgccag tttgaactcc aacctagact
3600ggttcaaagc ttttgctatc aagatgagat atatggaatt ttcgtcttta tcgtccactt
3660gtatctttat ttcctcgtca tcttcatcaa tattgattcc attaataatc gatttatcgc
3720tcagagtgtt gaccaattcg gtcttgttgg ggaagaaatg ttccattttt cttcccaagt
3780tttgaattct ttcacaaacc caggcaattc tttgtaagcc taatgcagca gaagaaccct
3840ttaaaaaatg gcccattaat gtggctgtgg tttcagggtc cataaagctt ttcaattcat
3900cttttttttt tttgttcttt tttttgattc cggtttcttt gaaatttttt tgattcggta
3960atctccgagc agaaggaaga acgaaggaag gagcacagac ttagattggt atatatacgc
4020atatgtggtg ttgaagaaac atgaaattgc ccagtattct taacccaact gcacagaaca
4080aaaacctgca ggaaacgaag ataaatcatg tcgaaagcta catataagga acgtgctgct
4140actcatccta gtcctgttgc tgccaagcta tttaatatca tgcacgaaaa gcaaacaaac
4200ttgtgtgctt cattggatgt tcgtaccacc aaggaattac tggagttagt tgaagcatta
4260ggtcccaaaa tttgtttact aaaaacacat gtggatatct tgactgattt ttccatggag
4320ggcacagtta agccgctaaa ggcattatcc gccaagtaca attttttact cttcgaagac
4380agaaaatttg ctgacattgg taatacagtc aaattgcagt actctgcggg tgtatacaga
4440atagcagaat gggcagacat tacgaatgca cacggtgtgg tgggcccagg tattgttagc
4500ggtttgaagc aggcggcgga agaagtaaca aaggaaccta gaggcctttt gatgttagca
4560gaattgtcat gcaagggctc cctagctact ggagaatata ctaagggtac tgttgacatt
4620gcgaagagcg acaaagattt tgttatcggc tttattgctc aaagagacat gggtggaaga
4680gatgaaggtt acgattggtt gattatgaca cccggtgtgg gtttagatga caagggagac
4740gcattgggtc aacagtatag aaccgtggat gatgtggtct ctacaggatc tgacattatt
4800attgttggaa gaggactatt tgcaaaggga agggatgcta aggtagaggg tgaacgttac
4860agaaaagcag gctgggaagc atatttgaga agatgcggcc agcaaaacta aaaaactgta
4920ttataagtaa atgcatgtat actaaactca caaattagag cttcaattta attatatcag
4980ttattacccg ggaatctcgg tcgtaatgat ttctataatg acgaaaaaaa aaaaattgga
5040aagaaaaagc ttcatggcct tctaaatcac catttttggt gaacggcctt gataaggatg
5100ttagttgtgt tattgctggg ttagacacga aggtaaatta ccaccgtttg gctgttacac
5160tgcagtattt gcagaaggat tctgttcact ttgttggtac aaatgttgat tctactttcc
5220cgcaaaaggg ttatacattt cccggtgcag gctccatgat tgaatcattg gcattctcat
5280ctaataggag gccatcgtac tgtggtaagc caaatcaaaa tatgctaaac agcattatat
5340cggcattcaa cctggataga tcaaagtgct gtatggttgg tgacagatta aacaccgata
5400tgaaattcgg tgttgaaggt gggttaggtg gcacactact cgttttgagt ggtattgaaa
5460ccgaagagag agccttgaag atttcgcacg attatccaag acctaaattt tacattgata
5520aacttggtga asccatctac accttaacca ataatgagtt atagggcgcg ccgacgtcag
5580gtggcacttt tcggggaaat gtgcgcggaa cccctatttg tttatttttc taaatacatt
5640caaatatgta tccgctcatg agacaataac cctgataaat gcttcaataa tattgaaaaa
5700ggaagagtat gagtattcaa catttccgtg tcgcccttat tccctttttt gcggcatttt
5760gccttcctgt ttttgctcac ccagaaacgc tggtgaaagt aaaagatgct gaagatcagt
5820tgggtgcacg agtgggttac atcgaactgg atctcaacag cggtaagatc cttgagagtt
5880ttcgccccga agaacgtttt ccaatgatga gcacttttaa agttctgcta tgtggcgcgg
5940tattatcccg tattgacgcc gggcaagagc aactcggtcg ccgcatacac tattctcaga
6000atgacttggt tgagtactca ccagtcacag aaaagcatct tacggatggc atgacagtaa
6060gagaattatg cagtgctgcc ataaccatga gtgataacac tgcggccaac ttacttctga
6120caacgatcgg aggaccgaag gagctaaccg cttttttgca caacatgggg gatcatgtaa
6180ctcgccttga tcgttgggaa ccggagctga atgaagccat accaaacgac gagcgtgaca
6240ccacgatgcc tgtagcaatg gcaacaacgt tgcgcaaact attaactggc gaactactta
6300ctctagcttc ccggcaacaa ttaatagact ggatggaggc ggataaagtt gcaggaccac
6360ttctgcgctc ggcccttccg gctggctggt ttattgctga taaatctgga gccggtgagc
6420gtgggtctcg cggtatcatt gcagcactgg ggccagatgg taagccctcc cgtatcgtag
6480ttatctacac gacggggagt caggcaacta tggatgaacg aaatagacag atcgctgaga
6540taggtgcctc actgattaag cattggtaac tgtcagacca agtttactca tatatacttt
6600agattgattt aaaacttcat ttttaattta aaaggatcta ggtgaagatc ctttttgata
6660atctcatgac caaaatccct taacgtgagt tttcgttcca ctgagcgtca gaccccgtag
6720aaaagatcaa aggatcttct tgagatcctt tttttctgcg cgtaatctgc tgcttgcaaa
6780caaaaaaacc accgctacca gcggtggttt gtttgccgga tcaagagcta ccaactcttt
6840ttccgaaggt aactggcttc agcagagcgc agataccaaa tactgttctt ctagtgtagc
6900cgtagttagg ccaccacttc aagaactctg tagcaccgcc tacatacctc gctctgctaa
6960tcctgttacc agtggctgct gccagtggcg ataagtcgtg tcttaccggg ttggactcaa
7020gacgatagtt accggataag gcgcagcggt cgggctgaac ggggggttcg tgcacacagc
7080ccagcttgga gcgaacgacc tacaccgaac tgagatacct acagcgtgag ctatgagaaa
7140gcgccacgct tcccgaaggg agaaaggcgg acaggtatcc ggtaagcggc agggtcggaa
7200caggagagcg cacgagggag cttccagggg gaaacgcctg gtatctttat agtcctgtcg
7260ggtttcgcca cctctgactt gagcgtcgat ttttgtgatg ctcgtcaggg gggcggagcc
7320tatggaaaaa cgccagcaac gcggcctttt tacggttcct ggccttttgc tggccttttg
7380ctcacatgtt ctttcctgcg ttatcccctg attctgtgga taaccgtatt accgcctttg
7440agtgagctga taccgctcgc cgcagccgaa cgaccgagcg cagcgagtca gtgagcgagg
7500aagcggaaga gcgcccaata cgcaaaccgc ctctccccgc gcgttggccg attcattaat
7560gcagctggca cgacaggttt cccgactgga aagcgggcag tgagcgcaac gcaattaatg
7620tgagttagct cactcattag gcaccccagg ctttacactt tatgcttccg gctcgtatgt
7680tgtgtggaat tgtgagcgga taacaatttc acacaggaaa cagctatgac catgattacg
7740ccaagctttt tctttccaat tttttttttt tcgtcattat aaaaatcatt acgaccgaga
7800ttccctaata agagaatagg aacttcggaa taggaacttc aaagcgtttc cgaaaacgag
7860cgcttccgaa aatgcaacgc gagctgcgca catacagctc actgttcacg tcgcacctat
7920atctgcgtgt tgcctgtata tatatataca tgagaagaac ggcatagtgc gtgtttatgc
7980ttaaatgcgt acttatatgc gtctatttat gtaggatgaa aggtagtcta gtacctcctg
8040tgatattatc ccattccatg cggggtatcg tatgcttcct tcagcactac cctttagctg
8100ttctatatgc tgccactcct caattggatt agtctcatcc ttcaatgcta tcatttcctt
8160tgatattgga tcatatgcat agtaccgaga aactagagga tc
8202
SEQ ID NO: 59, length 10887, DNA, artificial sequence, constructed plasmid
59 ggcctcgagc
ggaccggatc cttttctggc aaccaaaccc atacatcggg attcctataa 60taccttcgtt
ggtctcccta acatgtaggt ggcggagggg agatatacaa tagaacagat 120accagacaag
acataatggg ctaaacaaga ctacaccaat tacactgcct cattgatggt 180ggtacataac
gaactaatac tgtagcccta gacttgatag ccatcatcat atcgaagttt 240cactaccctt
tttccatttg ccatctattg aagtaataat aggcgcatgc aacttctttt 300cttttttttt
cttttctctc tcccccgttg ttgtctcacc atatccgcaa tgacaaaaaa 360atgatggaag
acactaaagg aaaaaattaa cgacaaagac agcaccaaca gatgtcgttg 420ttccagagct
gatgaggggt atctcgaagc acacgaaact ttttccttcc ttcattcacg 480cacactactc
tctaatgagc aacggtatac ggccttcctt ccagttactt gaatttgaaa 540taaaaaaaag
tttgctgtct tgctatcaag tataaataga cctgcaatta ttaatctttt 600gtttcctcgt
cattgttctc gttccctttc ttccttgttt ctttttctgc acaatatttc 660aagctatacc
aagcatacaa tcaactatct catatacaat gttgcaaact aaggattacg 720aattctggtt
cgttactggt tctcaacact tgtacggtga agaaactttg gaattggtcg 780atcaacacgc
taagtctatc tgtgaaggtt tgtccggtgt ctcttccaga tacaagatca 840cccacaagcc
agttgtcacc tcttccgaaa ctatcagaca attgttgaga gaagctgaat 900actctgaaac
ttgtgctggt atcatcacct ggatgcacac tttctctcca gctaagatgt 960ggatcgaagg
tttgtcttcc taccaaaagc cattgatgca cttgcacacc caatacaaca 1020gagacatccc
ttggggtact atcgacatgg atttcatgaa ctctaaccaa tccgctcacg 1080gtgacagaga
atacggttac atcaactcca gaatgggttt gtccagaaag gttgtcgctg 1140gttactggga
cgatgaagaa gtcaagaagg aaatctctca atggatggac accgctgctg 1200ctttgaacga
atccagacac atcaaggttg ctagattcgg tgacaacatg agacacgttg 1260ctgtcactga
cggtgacaag gttggtgctc acatccaatt cggttggcaa gttgacggtt 1320acggtatcgg
tgacttggtt gaagtcatga acagaatcac cgacgatgaa gttgacactt 1380tgtacgctga
atacgataga ttgtacgtca tctctgaaga aaccaagaga gacgaagcta 1440aggttgcttc
catcaaggaa caagctaaga tcgaattggg tttgaccact ttcttggaac 1500aaggtggtta
ctctgctttc accacttcct tcgaagtctt gcacggtatg aagcaattgc 1560caggtttggc
tgttcaaaga ttgatggaaa agggttacgg tttcgctggt gaaggtgact 1620ggaagaccgc
tgctttggtc agaatgatga agatcatgtc tcaaggtaaa agaacctcct 1680tcatggaaga
ctacacttac cacttcgaac caggtaacga aatgatcttg ggttctcaca 1740tgttggaagt
ttgtccaact gtcgctttgg accaaccaaa gatcgaagtt cacccattgt 1800ctatcggtgg
taaagaagat ccagctagat tcgtcttcaa cggtatctct ggttccgcta 1860tccaagcctc
tttggttgac atcggtggta gattcagatt ggttttgaac gaagtcaacg 1920gtcaagaaat
cgaaaaggac atgccaaact tgccagttgc tagagtcttg tggaagccag 1980aaccatcttt
gaagactgct gctgaagcct ggatcttggc tggtggtgct caccacacct 2040gtttgtctta
cgaattgact gtcgaacaaa tgttggactg ggctgaaatg gctggtatcg 2100aatctgtttt
gatctccaga gataccacta tccacaagtt gaagcacgaa ttgaagtgga 2160acgaagcctt
gtacagattg caaaagtaat taattaatca tgtaattagt tatgtcacgc 2220ttacattcac
gccctcctcc cacatccgct ctaaccgaaa aggaaggagt tagacaacct 2280gaagtctagg
tccctattta ttttttttaa tagttatgtt agtattaaga acgttattta 2340tatttcaaat
ttttcttttt tttctgtaca aacgcgtgta cgcatgtaac attatactga 2400aaaccttgct
tgagaaggtt ttgggacgct cgaaggcttt aatttgcggg cggccgctct 2460agaactagta
ccacaggtgt tgtcctctga ggacataaaa tacacaccga gattcatcaa 2520ctcattgctg
gagttagcat atctacaatt gggtgaaatg gggagcgatt tgcaggcatt 2580tgctcggcat
gccggtagag gtgtggtcaa taagagcgac ctcatgctat acctgagaaa 2640gcaacctgac
ctacaggaaa gagttactca agaataagaa ttttcgtttt aaaacctaag 2700agtcacttta
aaatttgtat acacttattt tttttataac ttatttaata ataaaaatca 2760taaatcataa
gaaattcgct tactcttatt gaccgtagta agccttagca ccgtgctttc 2820tcaagtagtg
cttgttcaac aaggtttgtt gcatgtctgg caattgtgga gccaattgtc 2880tacagaagat
acccatgtaa gcaacttctt ccaagacgat agcgttgtga acagcatctt 2940cagcgttttt
accccaagcg aatggaccgt gagagtgaac caagacacct ggcatttgag 3000cagcgtcgat
accttgcttt tcgaaggttt caacgatgac gttaccagtt tcccattcgt 3060attcaccgtt
gatttcagca tcggtcatct ttctagtaca tgggatagta ccgtagaagt 3120agtcagcgtg
agtagtacca gtagctggga tagattgacc agcttgagcc cagatggtag 3180cgtgtctgga
gtgagtgtgg acgataccac cgatagatgg gaaggcttgg tacaacaatc 3240tgtgagttgg
ggtgtcggaa gatggcttct tagcaccttc aacgacttca ccggtttcga 3300tggaaacgac
aaccatatcg tcagcagtca tgatggagta atcaacacca gatggcttga 3360taacgaagac
acctctttct ctgtcaacag cagagacgtt accccaggtc aaggtgacca 3420agttgtgctt
tggcaaagcc aagttagctt ccaaaacttg tctcttcaaa tcttccaaca 3480ttttgtttgt
ttatgtgtgt ttattcgaaa ctaagttctt ggtgttttaa aactaaaaaa 3540aagactaact
ataaaagtag aatttaagaa gtttaagaaa tagatttaca gaattacaat 3600caatacctac
cgtctttata tacttattag tcaagtaggg gaataatttc agggaactgg 3660tttcaacctt
ttttttcagc tttttccaaa tcagagagag cagaaggtaa tagaaggtgt 3720aagaaaatga
gatagataca tgcgtgggtc aattgccttg tgtcatcatt tactccaggc 3780aggttgcatc
actccattga ggttgtgccc gttttttgcc tgtttgtgcc cctgttctct 3840gtagttgcgc
taagagaatg gacctatgaa ctgatggttg gtgaagaaaa caatattttg 3900gtgctgggat
tctttttttt tctggatgcc agcttaaaaa gcgggctcca ttatatttag 3960tggatgccag
gaataaactg ttcacccaga cacctacgat gttatatatt ctgtgtaacc 4020cgccccctat
tttgggcatg tacgggttac agcagaatta aaaggctaat tttttgacta 4080aataaagtta
ggaaaatcac tactattaat tatttacgta ttctttgaaa tggcagtatt 4140gataatgata
aactcgaact agatctatcc gcggtgccgg cagatctatt taaatggcgc 4200gccgacgtca
ggtggcactt ttcggggaaa tgtgcgcgga acccctattt gtttattttt 4260ctaaatacat
tcaaatatgt atccgctcat gagacaataa ccctgataaa tgcttcaata 4320atattgaaaa
aggaagagta tgagtattca acatttccgt gtcgccctta ttcccttttt 4380tgcggcattt
tgccttcctg tttttgctca cccagaaacg ctggtgaaag taaaagatgc 4440tgaagatcag
ttgggtgcac gagtgggtta catcgaactg gatctcaaca gcggtaagat 4500ccttgagagt
tttcgccccg aagaacgttt tccaatgatg agcactttta aagttctgct 4560atgtggcgcg
gtattatccc gtattgacgc cgggcaagag caactcggtc gccgcataca 4620ctattctcag
aatgacttgg ttgagtactc accagtcaca gaaaagcatc ttacggatgg 4680catgacagta
agagaattat gcagtgctgc cataaccatg agtgataaca ctgcggccaa 4740cttacttctg
acaacgatcg gaggaccgaa ggagctaacc gcttttttgc acaacatggg 4800ggatcatgta
actcgccttg atcgttggga accggagctg aatgaagcca taccaaacga 4860cgagcgtgac
accacgatgc ctgtagcaat ggcaacaacg ttgcgcaaac tattaactgg 4920cgaactactt
actctagctt cccggcaaca attaatagac tggatggagg cggataaagt 4980tgcaggacca
cttctgcgct cggcccttcc ggctggctgg tttattgctg ataaatctgg 5040agccggtgag
cgtgggtctc gcggtatcat tgcagcactg gggccagatg gtaagccctc 5100ccgtatcgta
gttatctaca cgacggggag tcaggcaact atggatgaac gaaatagaca 5160gatcgctgag
ataggtgcct cactgattaa gcattggtaa ctgtcagacc aagtttactc 5220atatatactt
tagattgatt taaaacttca tttttaattt aaaaggatct aggtgaagat 5280cctttttgat
aatctcatga ccaaaatccc ttaacgtgag ttttcgttcc actgagcgtc 5340agaccccgta
gaaaagatca aaggatcttc ttgagatcct ttttttctgc gcgtaatctg 5400ctgcttgcaa
acaaaaaaac caccgctacc agcggtggtt tgtttgccgg atcaagagct 5460accaactctt
tttccgaagg taactggctt cagcagagcg cagataccaa atactgttct 5520tctagtgtag
ccgtagttag gccaccactt caagaactct gtagcaccgc ctacatacct 5580cgctctgcta
atcctgttac cagtggctgc tgccagtggc gataagtcgt gtcttaccgg 5640gttggactca
agacgatagt taccggataa ggcgcagcgg tcgggctgaa cggggggttc 5700gtgcacacag
cccagcttgg agcgaacgac ctacaccgaa ctgagatacc tacagcgtga 5760gctatgagaa
agcgccacgc ttcccgaagg gagaaaggcg gacaggtatc cggtaagcgg 5820cagggtcgga
acaggagagc gcacgaggga gcttccaggg ggaaacgcct ggtatcttta 5880tagtcctgtc
gggtttcgcc acctctgact tgagcgtcga tttttgtgat gctcgtcagg 5940ggggcggagc
ctatggaaaa acgccagcaa cgcggccttt ttacggttcc tggccttttg 6000ctggcctttt
gctcacatgt tctttcctgc gttatcccct gattctgtgg ataaccgtat 6060taccgccttt
gagtgagctg ataccgctcg ccgcagccga acgaccgagc gcagcgagtc 6120agtgagcgag
gaagcggaag agcgcccaat acgcaaaccg cctctccccg cgcgttggcc 6180gattcattaa
tgcagctggc acgacaggtt tcccgactgg aaagcgggca gtgagcgcaa 6240cgcaattaat
gtgagttagc tcactcatta ggcaccccag gctttacact ttatgcttcc 6300ggctcgtatg
ttgtgtggaa ttgtgagcgg ataacaattt cacacaggaa acagctatga 6360ccatgattac
gccaagcttt ttctttccaa tttttttttt ttcgtcatta taaaaatcat 6420tacgaccgag
attcccgggt aataactgat ataattaaat tgaagctcta atttgtgagt 6480ttagtataca
tgcatttact tataatacag ttttttagtt ttgctggccg catcttctca 6540aatatgcttc
ccagcctgct tttctgtaac gttcaccctc taccttagca tcccttccct 6600ttgcaaatag
tcctcttcca acaataataa tgtcagatcc tgtagagacc acatcatcca 6660cggttctata
ctgttgaccc aatgcgtctc ccttgtcatc taaacccaca ccgggtgtca 6720taatcaacca
atcgtaacct tcatctcttc cacccatgtc tctttgagca ataaagccga 6780taacaaaatc
tttgtcgctc ttcgcaatgt caacagtacc cttagtatat tctccagtag 6840atagggagcc
cttgcatgac aattctgcta acatcaaaag gcctctaggt tcctttgtta 6900cttcttctgc
cgcctgcttc aaaccgctaa caatacctgg gcccaccaca ccgtgtgcat 6960tcgtaatgtc
tgcccattct gctattctgt atacacccgc agagtactgc aatttgactg 7020tattaccaat
gtcagcaaat tttctgtctt cgaagagtaa aaaattgtac ttggcggata 7080atgcctttag
cggcttaact gtgccctcca tggaaaaatc agtcaagata tccacatgtg 7140tttttagtaa
acaaattttg ggacctaatg cttcaactaa ctccagtaat tccttggtgg 7200tacgaacatc
caatgaagca cacaagtttg tttgcttttc gtgcatgata ttaaatagct 7260tggcagcaac
aggactagga tgagtagcag cacgttcctt atatgtagct ttcgacatga 7320tttatcttcg
tttcctgcag gtttttgttc tgtgcagttg ggttaagaat actgggcaat 7380ttcatgtttc
ttcaacacta catatgcgta tatataccaa tctaagtctg tgctccttcc 7440ttcgttcttc
cttctgttcg gagattaccg aatcaaaaaa atttcaagga aaccgaaatc 7500aaaaaaaaga
ataaaaaaaa aatgatgaat tgaaaagctt gcatgcctgc aggtcgactc 7560tagtatactc
cgtctactgt acgatacact tccgctcagg tccttgtcct ttaacgaggc 7620cttaccactc
ttttgttact ctattgatcc agctcagcaa aggcagtgtg atctaagatt 7680ctatcttcgc
gatgtagtaa aactagctag accgagaaag agactagaaa tgcaaaaggc 7740acttctacaa
tggctgccat cattattatc cgatgtgacg ctgcattttt tttttttttt 7800tttttttttt
tttttttttt tttttttttt tttttttgta caaatatcat aaaaaaagag 7860aatcttttta
agcaaggatt ttcttaactt cttcggcgac agcatcaccg acttcggtgg 7920tactgttgga
accacctaaa tcaccagttc tgatacctgc atccaaaacc tttttaactg 7980catcttcaat
ggctttacct tcttcaggca agttcaatga caatttcaac atcattgcag 8040cagacaagat
agtggcgata gggttgacct tattctttgg caaatctgga gcggaaccat 8100ggcatggttc
gtacaaacca aatgcggtgt tcttgtctgg caaagaggcc aaggacgcag 8160atggcaacaa
acccaaggag cctgggataa cggaggcttc atcggagatg atatcaccaa 8220acatgttgct
ggtgattata ataccattta ggtgggttgg gttcttaact aggatcatgg 8280cggcagaatc
aatcaattga tgttgaactt tcaatgtagg gaattcgttc ttgatggttt 8340cctccacagt
ttttctccat aatcttgaag aggccaaaac attagcttta tccaaggacc 8400aaataggcaa
tggtggctca tgttgtaggg ccatgaaagc ggccattctt gtgattcttt 8460gcacttctgg
aacggtgtat tgttcactat cccaagcgac accatcacca tcgtcttcct 8520ttctcttacc
aaagtaaata cctcccacta attctctaac aacaacgaag tcagtacctt 8580tagcaaattg
tggcttgatt ggagataagt ctaaaagaga gtcggatgca aagttacatg 8640gtcttaagtt
ggcgtacaat tgaagttctt tacggatttt tagtaaacct tgttcaggtc 8700taacactacc
ggtaccccat ttaggaccac ccacagcacc taacaaaacg gcatcagcct 8760tcttggaggc
ttccagcgcc tcatctggaa gtggaacacc tgtagcatcg atagcagcac 8820caccaattaa
atgattttcg aaatcgaact tgacattgga acgaacatca gaaatagctt 8880taagaacctt
aatggcttcg gctgtgattt cttgaccaac gtggtcacct ggcaaaacga 8940cgatcttctt
aggggcagac attacaatgg tatatccttg aaatatatat aaaaaaaaaa 9000aaaaaaaaaa
aaaaaaaaaa tgcagcttct caatgatatt cgaatacgct ttgaggagat 9060acagcctaat
atccgacaaa ctgttttaca gatttacgat cgtacttgtt acccatcatt 9120gaattttgaa
catccgaacc tgggagtttt ccctgaaaca gatagtatat ttgaacctgt 9180ataataatat
atagtctagc gctttacgga agacaatgta tgtatttcgg ttcctggaga 9240aactattgca
tctattgcat aggtaatctt gcacgtcgca tccccggttc attttctgcg 9300tttccatctt
gcacttcaat agcatatctt tgttaacgaa gcatctgtgc ttcattttgt 9360agaacaaaaa
tgcaacgcga gagcgctaat ttttcaaaca aagaatctga gctgcatttt 9420tacagaacag
aaatgcaacg cgaaagcgct attttaccaa cgaagaatct gtgcttcatt 9480tttgtaaaac
aaaaatgcaa cgcgagagcg ctaatttttc aaacaaagaa tctgagctgc 9540atttttacag
aacagaaatg caacgcgaga gcgctatttt accaacaaag aatctatact 9600tcttttttgt
tctacaaaaa tgcatcccga gagcgctatt tttctaacaa agcatcttag 9660attacttttt
ttctcctttg tgcgctctat aatgcagtct cttgataact ttttgcactg 9720taggtccgtt
aaggttagaa gaaggctact ttggtgtcta ttttctcttc cataaaaaaa 9780gcctgactcc
acttcccgcg tttactgatt actagcgaag ctgcgggtgc attttttcaa 9840gataaaggca
tccccgatta tattctatac cgatgtggat tgcgcatact ttgtgaacag 9900aaagtgatag
cgttgatgat tcttcattgg tcagaaaatt atgaacggtt tcttctattt 9960tgtctctata
tactacgtat aggaaatgtt tacattttcg tattgttttc gattcactct 10020atgaatagtt
cttactacaa tttttttgtc taaagagtaa tactagagat aaacataaaa 10080aatgtagagg
tcgagtttag atgcaagttc aaggagcgaa aggtggatgg gtaggttata 10140tagggatata
gcacagagat atatagcaaa gagatacttt tgagcaatgt ttgtggaagc 10200ggtattcgca
atattttagt agctcgttac agtccggtgc gtttttggtt ttttgaaagt 10260gcgtcttcag
agcgcttttg gttttcaaaa gcgctctgaa gttcctatac tttctagaga 10320ataggaactt
cggaatagga acttcaaagc gtttccgaaa acgagcgctt ccgaaaatgc 10380aacgcgagct
gcgcacatac agctcactgt tcacgtcgca cctatatctg cgtgttgcct 10440gtatatatat
atacatgaga agaacggcat agtgcgtgtt tatgcttaaa tgcgtactta 10500tatgcgtcta
tttatgtagg atgaaaggta gtctagtacc tcctgtgata ttatcccatt 10560ccatgcgggg
tatcgtatgc ttccttcagc actacccttt agctgttcta tatgctgcca 10620ctcctcaatt
ggattagtct catccttcaa tgctatcatt tcctttgata ttggatcata 10680tgcatagtac
cgagaaacta gaggatctcc cattaccgac atttgggcgc tatacgtgca 10740tatgttcatg
tatgtatctg tatttaaaac acttttgtat tatttttcct catatatgtg 10800tataggttta
tacggatgat ttaattatta cttcaccacc ctttatttca ggctgatatc 10860ttagccttgt
tactagtcac cggtggc
10887
SEQ ID NO: 60, length 500, PRT, Escherichia coli
60 Met Thr Ile Phe Asp Asn Tyr Glu Val Trp
Phe Val Ile Gly Ser Gln1 5 10
15His Leu Tyr Gly Pro Glu Thr Leu Arg Gln Val Thr Gln His Ala Glu
20 25 30His Val Val Asn Ala Leu
Asn Thr Glu Ala Lys Leu Pro Cys Lys Leu 35 40
45Val Leu Lys Pro Leu Gly Thr Thr Pro Asp Glu Ile Thr Ala
Ile Cys 50 55 60Arg Asp Ala Asn Tyr
Asp Asp Pro Cys Ala Gly Leu Val Val Trp Leu65 70
75 80His Thr Phe Ser Pro Ala Lys Met Trp Ile
Asn Gly Leu Thr Met Leu 85 90
95Asn Lys Pro Leu Leu Gln Phe His Thr Gln Phe Asn Ala Ala Leu Pro
100 105 110Trp Asp Ser Ile Asp
Met Asp Phe Met Asn Leu Asn Gln Thr Ala His 115
120 125Gly Gly Arg Glu Phe Gly Phe Ile Gly Ala Arg Met
Arg Gln Gln His 130 135 140Ala Val Val
Thr Gly His Trp Gln Asp Lys Gln Ala His Glu Arg Ile145
150 155 160Gly Ser Trp Met Arg Gln Ala
Val Ser Lys Gln Asp Thr Arg His Leu 165
170 175Lys Val Cys Arg Phe Gly Asp Asn Met Arg Glu Val
Ala Val Thr Asp 180 185 190Gly
Asp Lys Val Ala Ala Gln Ile Lys Phe Gly Phe Ser Val Asn Thr 195
200 205Trp Ala Val Gly Asp Leu Val Gln Val
Val Asn Ser Ile Ser Asp Gly 210 215
220Asp Val Asn Ala Leu Val Asp Glu Tyr Glu Ser Cys Tyr Thr Met Thr225
230 235 240Pro Ala Thr Gln
Ile His Gly Glu Lys Arg Gln Asn Val Leu Glu Ala 245
250 255Ala Arg Ile Glu Leu Gly Met Lys Arg Phe
Leu Glu Gln Gly Gly Phe 260 265
270His Ala Phe Thr Thr Thr Phe Glu Asp Leu His Gly Leu Lys Gln Leu
275 280 285Pro Gly Leu Ala Val Gln Arg
Leu Met Gln Gln Gly Tyr Gly Phe Ala 290 295
300Gly Glu Gly Asp Trp Lys Thr Ala Ala Leu Leu Arg Ile Met Lys
Val305 310 315 320Met Ser
Thr Gly Leu Gln Gly Gly Thr Ser Phe Met Glu Asp Tyr Thr
325 330 335Tyr His Phe Glu Lys Gly Asn
Asp Leu Val Leu Gly Ser His Met Leu 340 345
350Glu Val Cys Pro Ser Ile Ala Val Glu Glu Lys Pro Ile Leu
Asp Val 355 360 365Gln His Leu Gly
Ile Gly Gly Lys Asp Asp Pro Ala Arg Leu Ile Phe 370
375 380Asn Thr Gln Thr Gly Pro Ala Ile Val Ala Ser Leu
Ile Asp Leu Gly385 390 395
400Asp Arg Tyr Arg Leu Leu Val Asn Cys Ile Asp Thr Val Lys Thr Pro
405 410 415His Ser Leu Pro Lys
Leu Pro Val Ala Asn Ala Leu Trp Lys Ala Gln 420
425 430Pro Asp Leu Pro Thr Ala Ser Glu Ala Trp Ile Leu
Ala Gly Gly Ala 435 440 445His His
Thr Val Phe Ser His Ala Leu Asn Leu Asn Asp Met Arg Gln 450
455 460Phe Ala Glu Met His Asp Ile Glu Ile Thr Val
Ile Asp Asn Asp Thr465 470 475
480Arg Leu Pro Ala Phe Lys Asp Ala Leu Arg Trp Asn Glu Val Tyr Tyr
485 490 495Gly Phe Arg Arg
500
SEQ ID NO: 61, length 493, PRT, Bacillus licheniformis
61 Met Ile Gln Ala Lys Thr His
Val Phe Trp Phe Val Thr Gly Ser Gln1 5 10
15His Leu Tyr Gly Glu Glu Ala Val Gln Glu Val Glu Glu
His Ser Lys 20 25 30Met Ile
Cys Asn Gly Leu Asn Asp Gly Asp Leu Arg Phe Gln Val Glu 35
40 45Tyr Lys Ala Val Ala Thr Ser Leu Asp Gly
Val Arg Lys Leu Phe Glu 50 55 60Glu
Ala Asn Arg Asp Glu Glu Cys Ala Gly Ile Ile Thr Trp Met His65
70 75 80Thr Phe Ser Pro Ala Lys
Met Trp Ile Pro Gly Leu Ser Glu Leu Asn 85
90 95Lys Pro Leu Leu His Phe His Thr Gln Phe Asn Arg
Asp Ile Pro Trp 100 105 110Asp
Lys Ile Asp Met Asp Phe Met Asn Ile Asn Gln Ser Ala His Gly 115
120 125Asp Arg Glu Tyr Gly Phe Ile Gly Ala
Arg Leu Gly Ile Pro Arg Lys 130 135
140Val Ile Ala Gly Tyr Trp Glu Asp Arg Glu Val Lys Arg Ser Ile Asp145
150 155 160Lys Trp Met Ser
Ala Ala Val Ala Tyr Ile Glu Ser Arg His Ile Lys 165
170 175Val Ala Arg Phe Gly Asp Asn Met Arg Asn
Val Ala Val Thr Glu Gly 180 185
190Asp Lys Ile Glu Ala Gln Ile Gln Leu Gly Trp Ser Val Asp Gly Tyr
195 200 205Gly Ile Gly Asp Leu Val Thr
Glu Ile Asn Ala Val Ser Glu Gln Ser 210 215
220Leu Ser Glu Leu Ile Ser Glu Tyr Glu Glu Leu Tyr Glu Trp Pro
Glu225 230 235 240Gly Glu
Ala Ala Arg Glu Ser Val Lys Glu Gln Ala Arg Ile Glu Leu
245 250 255Gly Leu Lys Arg Phe Leu Ser
Ser Gly Gly Tyr Thr Ala Phe Thr Thr 260 265
270Thr Phe Glu Asp Leu His Gly Met Lys Gln Leu Pro Gly Leu
Ala Val 275 280 285Gln Arg Leu Met
Ala Glu Gly Tyr Gly Phe Gly Gly Glu Gly Asp Trp 290
295 300Lys Thr Ala Ala Leu Val Arg Met Met Lys Met Met
Ala Gly Gly Lys305 310 315
320Glu Thr Ser Phe Met Glu Asp Tyr Thr Tyr His Phe Glu Pro Gly Asn
325 330 335Glu Met Ile Leu Gly
Ser His Met Leu Glu Val Cys Pro Ser Ile Ala 340
345 350Glu His Lys Pro Arg Ile Glu Val His Pro Leu Ser
Met Gly Ala Lys 355 360 365Asp Asp
Pro Ala Arg Leu Val Phe Asp Gly Ile Ala Gly Pro Ala Val 370
375 380Asn Val Ser Leu Ile Asp Leu Gly Gly Arg Phe
Arg Leu Val Ile Asn385 390 395
400Lys Val Glu Ala Val Lys Val Pro His Asp Met Pro Asn Leu Pro Val
405 410 415Ala Arg Val Leu
Trp Lys Pro Gln Pro Ser Leu Arg Thr Ser Ala Glu 420
425 430Ala Trp Ile Leu Ala Gly Gly Ala His His Thr
Cys Leu Ser Tyr Gln 435 440 445Leu
Thr Ala Glu Gln Met Leu Asp Trp Ala Glu Met Ser Gly Ile Glu 450
455 460Ala Val Leu Ile Asn Arg Asp Thr Thr Ile
Leu Asn Leu Arg Asn Glu465 470 475
480Leu Lys Trp Ser Glu Ala Ala Tyr Arg Leu Arg Lys Phe
485 490
62 488 PRT Clostridium acetobutylicum
62 Met Leu
Glu Asn Lys Lys Met Glu Phe Trp Phe Val Val Gly Ser Gln1 5
10 15His Leu Tyr Gly Glu Glu Ala Leu
Lys Glu Val Arg Lys Asn Ser Glu 20 25
30Thr Ile Val Asp Glu Leu Asn Lys Ser Ala Asn Leu Pro Tyr Lys
Ile 35 40 45Ile Phe Lys Asp Leu
Ala Thr Ser Ala Asp Lys Ile Lys Glu Ile Met 50 55
60Lys Glu Val Asn Tyr Arg Asp Glu Val Ala Gly Val Ile Thr
Trp Met65 70 75 80His
Thr Phe Ser Pro Ala Lys Met Trp Ile Ala Gly Thr Lys Ile Leu
85 90 95Gln Lys Pro Leu Leu His Phe
Ala Thr Gln Tyr Asn Glu Asn Ile Pro 100 105
110Trp Lys Thr Ile Asp Met Asp Tyr Met Asn Leu His Gln Ser
Ala His 115 120 125Gly Asp Arg Glu
Tyr Gly Phe Ile Asn Ala Arg Leu Lys Lys His Asn 130
135 140Lys Val Val Val Gly Tyr Trp Lys Asp Lys Glu Val
Gln Lys Gln Val145 150 155
160Ser Asp Trp Met Lys Val Ala Ala Gly Tyr Ile Ala Ser Glu Ser Ile
165 170 175Lys Val Ala Arg Phe
Gly Asp Asn Met Arg Asn Val Ala Val Thr Glu 180
185 190Gly Asp Lys Val Glu Ala Gln Ile Gln Phe Gly Trp
Thr Val Asp Tyr 195 200 205Phe Gly
Ile Gly Asp Leu Val Ala Glu Met Asp Lys Val Ser Gln Asp 210
215 220Glu Ile Asn Lys Thr Tyr Glu Glu Phe Lys Asp
Leu Tyr Ile Leu Asp225 230 235
240Pro Gly Glu Asn Asp Pro Ala Phe Tyr Glu Lys Gln Val Lys Glu Gln
245 250 255Ile Lys Ile Glu
Ile Gly Leu Arg Arg Phe Leu Glu Lys Gly Asn Tyr 260
265 270Asn Ala Phe Thr Thr Asn Phe Glu Asp Leu Tyr
Gly Met Lys Gln Leu 275 280 285Pro
Gly Leu Ala Val Gln Arg Leu Asn Ala Glu Gly Tyr Gly Phe Ala 290
295 300Gly Glu Gly Asp Trp Lys Thr Ala Ala Leu
Asp Arg Leu Leu Lys Val305 310 315
320Met Thr Asn Asn Thr Ala Thr Gly Phe Met Glu Asp Tyr Thr Tyr
Glu 325 330 335Leu Ser Arg
Gly Asn Glu Lys Ala Leu Gly Ala His Met Leu Glu Val 340
345 350Asp Pro Thr Phe Ala Ser Asp Lys Pro Lys
Val Ile Val Lys Pro Leu 355 360
365Gly Ile Gly Asp Lys Glu Asp Pro Ala Arg Leu Ile Phe Asn Gly Ser 370
375 380Thr Gly Lys Gly Val Ala Val Ser
Met Leu Asp Leu Gly Thr His Tyr385 390
395 400Arg Leu Ile Ile Asn Gly Leu Thr Ala Val Lys Pro
Asp Glu Asp Met 405 410
415Pro Asn Leu Pro Val Ala Lys Met Val Trp Lys Pro Glu Pro Asn Phe
420 425 430Ile Glu Gly Val Lys Ser
Trp Ile Tyr Ala Gly Gly Gly His His Thr 435 440
445Val Val Ser Leu Glu Leu Thr Val Glu Gln Val Tyr Asp Trp
Ser Arg 450 455 460Met Val Gly Leu Glu
Ala Val Ile Ile Asp Lys Asp Thr Lys Leu Arg465 470
475 480Asp Ile Ile Glu Lys Thr Thr Lys
485
63 474 PRT Leuconostoc mesenteroides
63 Met Ala Asp Ile Lys Asp Tyr
Lys Phe Trp Phe Val Thr Gly Ser Gln1 5 10
15Phe Leu Tyr Gly Pro Glu Val Leu Lys Gln Val Glu Glu
Asp Ser Lys 20 25 30Lys Ile
Ile Glu Lys Leu Asn Glu Ser Gly Asn Leu Pro Tyr Pro Ile 35
40 45Glu Phe Lys Thr Val Gly Val Thr Ala Glu
Asn Ile Thr Glu Ala Met 50 55 60Lys
Glu Ala Asn Tyr Asp Asp Ser Val Ala Gly Val Ile Thr Trp Ala65
70 75 80His Thr Phe Ser Pro Ala
Lys Asn Trp Ile Arg Gly Thr Gln Leu Leu 85
90 95Asn Lys Pro Leu Leu His Leu Ala Thr Gln Met Leu
Asn Asn Ile Pro 100 105 110Tyr
Asp Ser Ile Asp Phe Asp Tyr Met Asn Leu Asn Gln Ser Ala His 115
120 125Gly Asp Arg Glu Tyr Ala Phe Ile Asn
Ala Arg Leu Arg Leu Asn Asn 130 135
140Lys Ile Val Phe Gly His Trp Ala Asp Glu Ala Val Gln Val Gln Ile145
150 155 160Gly Lys Trp Met
Asp Val Ala Val Ala Tyr Glu Glu Ser Phe Lys Ile 165
170 175Lys Val Val Thr Phe Ala Asp Lys Met Arg
Asn Val Ala Val Thr Asp 180 185
190Gly Asp Lys Ile Glu Ala Gln Ile Lys Phe Gly Trp Thr Val Asp Tyr
195 200 205Trp Gly Val Gly Asp Leu Val
Thr Tyr Val Asn Ala Ile Asp Asp Ala 210 215
220Asp Ile Asp Asn Leu Tyr Ile Glu Leu Gln Asp Lys Tyr Asp Phe
Val225 230 235 240Ala Gly
Gln Asn Asp Ser Glu Lys Tyr Glu His Asn Val Lys Tyr Gln
245 250 255Leu Arg Glu Tyr Leu Gly Ile
Lys Arg Phe Leu Thr Asp Lys Gly Tyr 260 265
270Ser Ala Phe Thr Thr Asn Phe Glu Asp Leu Val Gly Leu Glu
Gln Leu 275 280 285Pro Gly Leu Ala
Ala Gln Leu Leu Met Ala Asp Gly Phe Gly Phe Ala 290
295 300Gly Glu Gly Asp Trp Lys Thr Ala Ala Leu Thr Arg
Leu Leu Lys Ile305 310 315
320Val Ser His Asn Gln Ala Thr Ala Phe Met Glu Asp Tyr Thr Leu Asp
325 330 335Leu Arg Gln Gly His
Glu Ala Ile Leu Gly Ser His Met Leu Glu Val 340
345 350Asp Pro Thr Ile Ala Ser Asp Lys Pro Arg Val Glu
Val His Pro Leu 355 360 365Gly Ile
Gly Gly Lys Glu Asp Pro Ala Arg Leu Val Phe Ser Gly Arg 370
375 380Thr Gly Asp Ala Val Asp Val Thr Ile Ser Asp
Phe Gly Asp Glu Phe385 390 395
400Lys Leu Ile Ser Tyr Asp Val Thr Gly Asn Lys Pro Glu Ala Glu Thr
405 410 415Pro Tyr Leu Pro
Val Ala Lys Gln Leu Trp Thr Pro Lys Ala Gly Leu 420
425 430Lys Ala Gly Ala Glu Gly Trp Leu Thr Val Gly
Gly Gly His His Thr 435 440 445Thr
Leu Ser Phe Ser Val Asp Ser Glu Gln Leu Thr Asp Leu Ala Asn 450
455 460Leu Phe Gly Val Thr Tyr Val Asp Ile
Lys 465 470
64 474 PRT Lactobacillus plantarum
64 Met Leu Ser
Val Pro Asp Tyr Glu Phe Trp Phe Val Thr Gly Ser Gln1 5
10 15His Leu Tyr Gly Glu Glu Gln Leu Lys
Ser Val Ala Lys Asp Ala Gln 20 25
30Asp Ile Ala Asp Lys Leu Asn Ala Ser Gly Lys Leu Pro Tyr Lys Val
35 40 45Val Phe Lys Asp Val Met Thr
Thr Ala Glu Ser Ile Thr Asn Phe Met 50 55
60Lys Glu Val Asn Tyr Asn Asp Lys Val Ala Gly Val Ile Thr Trp Met65
70 75 80His Thr Phe Ser
Pro Ala Lys Asn Trp Ile Arg Gly Thr Glu Leu Leu 85
90 95Gln Lys Pro Leu Leu His Leu Ala Thr Gln
Tyr Leu Asn Asn Ile Pro 100 105
110Tyr Ala Asp Ile Asp Phe Asp Tyr Met Asn Leu Asn Gln Ser Ala His
115 120 125Gly Asp Arg Glu Tyr Ala Tyr
Ile Asn Ala Arg Leu Gln Lys His Asn 130 135
140Lys Ile Val Tyr Gly Tyr Trp Gly Asp Glu Asp Val Gln Glu Gln
Ile145 150 155 160Ala Arg
Trp Glu Asp Val Ala Val Ala Tyr Asn Glu Ser Phe Lys Val
165 170 175Lys Val Ala Arg Phe Gly Asp
Thr Met Arg Asn Val Ala Val Thr Glu 180 185
190Gly Asp Lys Val Glu Ala Gln Ile Lys Met Gly Trp Thr Val
Asp Tyr 195 200 205Tyr Gly Ile Gly
Asp Leu Val Glu Glu Ile Asn Lys Val Ser Asp Ala 210
215 220Asp Val Asp Lys Glu Tyr Ala Asp Leu Glu Ser Arg
Tyr Glu Met Val225 230 235
240Gln Gly Asp Asn Asp Ala Asp Thr Tyr Lys His Ser Val Arg Val Gln
245 250 255Leu Ala Gln Tyr Leu
Gly Ile Lys Arg Phe Leu Glu Arg Gly Gly Tyr 260
265 270Thr Ala Phe Thr Thr Asn Phe Glu Asp Leu Trp Gly
Met Glu Gln Leu 275 280 285Pro Gly
Leu Ala Ser Gln Leu Leu Ile Arg Asp Gly Tyr Gly Phe Gly 290
295 300Ala Glu Gly Asp Trp Lys Thr Ala Ala Leu Gly
Arg Val Met Lys Ile305 310 315
320Met Ser His Asn Lys Gln Thr Ala Phe Met Glu Asp Tyr Thr Leu Asp
325 330 335Leu Arg His Gly
His Glu Ala Ile Leu Gly Ser His Met Leu Glu Val 340
345 350Asp Pro Ser Ile Ala Ser Asp Lys Pro Arg Val
Glu Val His Pro Leu 355 360 365Asp
Ile Gly Gly Lys Asp Asp Pro Ala Arg Leu Val Phe Thr Gly Ser 370
375 380Glu Gly Glu Ala Ile Asp Val Thr Val Ala
Asp Phe Arg Asp Gly Phe385 390 395
400Lys Met Ile Ser Tyr Ala Val Asp Ala Asn Lys Pro Glu Ala Glu
Thr 405 410 415Pro Asn Leu
Pro Val Ala Lys Gln Leu Trp Thr Pro Lys Met Gly Leu 420
425 430Lys Lys Gly Ala Leu Glu Trp Met Gln Ala
Gly Gly Gly His His Thr 435 440
445Met Leu Ser Phe Ser Leu Thr Glu Glu Gln Met Glu Asp Tyr Ala Thr 450
455 460Met Val Gly Met Thr Lys Ala Phe
Leu Lys 465 470
65 474 PRT Pediococcus pentosaceus
65 Met Lys
Lys Val Gln Asp Tyr Glu Phe Trp Phe Val Thr Gly Ser Gln1 5
10 15Phe Leu Tyr Gly Glu Glu Thr Leu
Arg Ser Val Glu Lys Asp Ala Lys 20 25
30Glu Ile Val Asp Lys Leu Asn Glu Ser Lys Lys Leu Pro Tyr Pro
Val 35 40 45Lys Phe Lys Leu Val
Ala Thr Thr Ala Glu Asn Ile Thr Glu Val Met 50 55
60Lys Glu Val Asn Tyr Asn Asp Lys Val Ala Gly Val Ile Thr
Trp Met65 70 75 80His
Thr Phe Ser Pro Ala Lys Asn Trp Ile Arg Gly Thr Glu Leu Leu
85 90 95Gln Lys Pro Leu Leu His Leu
Ala Thr Gln Phe Leu Asn His Ile Pro 100 105
110Tyr Asp Thr Ile Asp Phe Asp Tyr Met Asn Leu Asn Gln Ser
Ala His 115 120 125Gly Asp Arg Glu
Tyr Ala Phe Ile Asn Ala Arg Leu Arg Lys Asn Asn 130
135 140Lys Ile Ile Ser Gly Tyr Trp Gly Asp Glu Gly Ile
Gln Lys Gln Ile145 150 155
160Ala Lys Trp Met Asp Val Ala Val Ala Tyr Asn Glu Ser Tyr Gly Ile
165 170 175Lys Val Val Thr Phe
Ala Asp Lys Met Arg Asn Val Ala Val Thr Asp 180
185 190Gly Asp Lys Ile Glu Ala Gln Ile Lys Phe Gly Trp
Thr Val Asp Tyr 195 200 205Trp Gly
Val Ala Asp Leu Val Glu Glu Val Asn Ala Val Ser Asp Glu 210
215 220Asp Ile Asp Lys Lys Tyr Glu Glu Met Lys Asn
Asp Tyr Asn Phe Val225 230 235
240Glu Gly Gln Asn Ser Ser Glu Lys Phe Glu His Asn Thr Lys Tyr Gln
245 250 255Ile Arg Glu Tyr
Phe Gly Leu Lys Lys Phe Met Asp Asp Arg Gly Tyr 260
265 270Thr Ala Phe Thr Thr Asn Phe Glu Asp Leu Ala
Gly Leu Glu Gln Leu 275 280 285Pro
Gly Leu Ala Ala Gln Met Leu Met Ala Glu Gly Tyr Gly Phe Ala 290
295 300Gly Glu Gly Asp Trp Lys Thr Ala Ala Leu
Asp Arg Leu Leu Lys Ile305 310 315
320Met Ala His Asn Lys Gln Thr Val Phe Met Glu Asp Tyr Thr Leu
Asp 325 330 335Leu Arg Glu
Gly His Glu Ala Ile Leu Gly Ser His Met Leu Glu Val 340
345 350Asp Pro Ser Ile Ala Ser Asp Thr Pro Arg
Val Glu Val His Pro Leu 355 360
365Asp Ile Gly Gly Lys Glu Asp Pro Ala Arg Phe Val Phe Thr Gly Met 370
375 380Glu Gly Asp Ala Val Asp Val Thr
Met Ala Asp Tyr Gly Asp Glu Phe385 390
395 400Lys Leu Met Ser Tyr Asp Val Thr Gly Asn Lys Thr
Glu Lys Glu Thr 405 410
415Pro Tyr Leu Pro Val Ala Lys Gln Leu Trp Thr Pro Lys Gln Gly Trp
420 425 430Lys Gln Gly Ala Glu Gly
Trp Leu Thr Leu Gly Gly Gly His His Thr 435 440
445Val Leu Ser Phe Ala Ile Asp Ala Glu Gln Leu Gln Asp Leu
Ser Asn 450 455 460Met Phe Gly Leu Thr
Tyr Val Asn Ile Lys 465 470
66 33 PRT artificial sequence
237-269 amino acid motif with variation at multiple positions
VARIANT (2)..(2) R or K
VARIANT (6)..(6) R or K
VARIANT (11)..(11) I or M
VARIANT (12)..(12) E or K
VARIANT (14)..(14) I or M
VARIANT (15)..(15) L or M
VARIANT (16)..(16) V or T or D
VARIANT (17)..(17) R or A
VARIANT (18)..(18) E or N
VARIANT (20)..(20) A or C
VARIANT (21)..(21) K or N or R
VARIANT (24)..(24) S or V or T
VARIANT (28)..(28) E or Q
VARIANT (31)..(31) H or I or Y
66
Ile Xaa Tyr Gln Ala Xaa Glu Glu Ile Ala Xaa Xaa Lys Xaa Xaa Xaa
1               5                   10                  15
Xaa Xaa Gly Xaa Xaa Ala Phe Xaa Asn Thr Phe Xaa Asp Leu Xaa Gly
            20                  25                  30
Met
67 33 PRT artificial sequence
Amino acid sequence of the 237-269 Motif shown in FIG. 2A
MISC_FEATURE (2)..(2) Arg or Lys
MISC_FEATURE (6)..(6) Arg or Lys
MISC_FEATURE (11)..(11) Ile or Met
MISC_FEATURE (12)..(12) any amino acid
MISC_FEATURE (14)..(14) Ile or Met
MISC_FEATURE (15)..(15) Leu or Met
MISC_FEATURE (16)..(16) any amino acid
MISC_FEATURE (17)..(17) Arg or Ala
MISC_FEATURE (18)..(18) Glu or Asn
MISC_FEATURE (20)..(20) Ala or Cys
MISC_FEATURE (21)..(21) any amino acid
MISC_FEATURE (24)..(24) any amino acid
MISC_FEATURE (28)..(28) Gln or Glu
MISC_FEATURE (31)..(31) any amino acid
MISC_FEATURE (32)..(32) any amino acid
67
Ile Xaa Tyr Gln Ala Xaa Glu Glu Ile Ala Xaa Xaa Lys Xaa Xaa Xaa
1               5                   10                  15
Xaa Xaa Gly Xaa Xaa Ala Phe Xaa Asn Thr Phe Xaa Asp Leu Xaa Gly
            20                  25                  30
Met
68 477 PRT Ruminococcus flavefaciens
68 Met Lys Phe
Trp Phe Val Thr Gly Ser Gln Phe Leu Tyr Gly Glu Glu1 5
10 15Thr Leu Arg Gln Val Glu Glu Asp Ser
Lys Lys Ile Val Asp Gly Leu 20 25
30Asp Leu Pro Phe Pro Val Glu Tyr Lys Met Thr Val Lys Lys Glu Ser
35 40 45Glu Ile Glu Arg Ile Ile Lys
Glu Ala Asn Tyr Asp Asp Glu Cys Ala 50 55
60Gly Ile Ile Thr Phe Cys His Thr Phe Ser Pro Ser Lys Met Trp Ile65
70 75 80Asn Gly Leu Ala
Leu Leu Gln Lys Pro Trp Leu His Phe His Thr Gln 85
90 95Phe Asn Glu Thr Ile Pro Asn Glu Gly Ile
Asp Met Asp Tyr Met Asn 100 105
110Leu His Gln Ser Ala His Gly Asp Arg Glu His Gly Phe Ile Gly Ala
115 120 125Arg Leu Arg Met Pro Arg Ala
Val Val Ala Gly His Trp Lys Asp Lys 130 135
140Lys Val Gln Glu Lys Ile Ala Glu Trp Gln Arg Ala Ala Val Gly
Ala145 150 155 160Leu Phe
Ser Lys Ser Leu Lys Ile Val Arg Phe Gly Asp Asn Met Arg
165 170 175Glu Val Ala Val Thr Glu Gly
Asp Lys Ile Glu Ala Gln Leu Lys Leu 180 185
190Gly Trp Gln Val Asn Thr Phe Ala Val Gly Asp Leu Val Glu
Ile Met 195 200 205Asn Ala Val Lys
Asp Ala Glu Ile Asp Glu Leu Met Lys Glu Tyr Ala 210
215 220Glu Leu Tyr Asp Tyr Asp Lys Ala Asp Glu Glu Thr
Ile Arg Tyr Gln225 230 235
240Ala Arg Glu Glu Ile Ala Ile Glu Lys Ile Leu Val Arg Glu Gly Ala
245 250 255Lys Ala Phe Ser Asn
Thr Phe Glu Asp Leu His Gly Met Arg Gln Leu 260
265 270Pro Gly Leu Ala Thr Gln His Leu Met His Lys Gly
Tyr Gly Phe Gly 275 280 285Ala Glu
Gly Asp Trp Lys Thr Ala Gly Met Thr Ala Ile Val Lys Ala 290
295 300Met Tyr Pro Glu Gly Asn Thr Ser Phe Met Glu
Asp Tyr Thr Tyr Asp305 310 315
320Tyr Lys His Glu Leu Ile Leu Gly Ser His Met Leu Glu Val Cys Pro
325 330 335Ser Ile Ala Ala
Asp Lys Pro Arg Ile Glu Val His Lys Leu Gly Ile 340
345 350Gly Gly Lys Glu Ala Pro Ala Arg Ile Val Phe
Glu Gly Arg Ala Gly 355 360 365Ser
Ala Lys Ala Leu Ser Leu Ile Asp Ile Gly Gly Arg Phe Arg Leu 370
375 380Ile Ser Gln Asp Val Glu Cys Glu Lys Pro
Phe Gln Ser Met Pro Asn385 390 395
400Leu Pro Val Ala Arg Thr Met Trp Lys Pro Ala Pro Ser Phe Leu
Glu 405 410 415Gly Leu Glu
Cys Trp Ile Ile Ala Gly Gly Ala His His Thr Val Leu 420
425 430Ser Tyr Asp Ile Thr Asp Glu Thr Val Arg
Asp Phe Ala Arg Ile Met 435 440
445Gly Ile Glu Leu Val Val Ile Asn Lys Asp Thr Thr Lys Glu Lys Leu 450
455 460Glu Arg Asp Ile Met Ile Gly Asp
Met Ile Tyr Gly Arg465 470
475
69 477 PRT Ruminococcus flavefaciens
69 Met Lys Phe Trp Phe Ile Thr Gly
Ser Gln Phe Leu Tyr Gly Glu Glu1 5 10
15Thr Leu Arg Gln Val Asp Glu Asp Ser Lys Lys Ile Val Ala
Gly Leu 20 25 30Lys Leu Pro
Phe Pro Val Glu Tyr Lys Ser Thr Val Lys Thr Glu Ser 35
40 45Glu Ile Gln Arg Ile Ile Lys Glu Ala Asn Phe
Asp Asp Glu Cys Ala 50 55 60Gly Val
Ile Thr Phe Cys His Thr Phe Ser Pro Ser Lys Met Trp Ile65
70 75 80Asn Gly Leu Ala Leu Leu Gln
Lys Pro Trp Leu His Phe His Thr Gln 85 90
95Phe Asn Glu Thr Ile Pro Asn Glu Ala Ile Asp Met Asp
Tyr Met Asn 100 105 110Leu His
Gln Ser Ala His Gly Asp Arg Glu His Gly Phe Ile Gly Ala 115
120 125Arg Leu Arg Met Pro Arg Ala Val Val Ala
Gly His Trp Gln Asp Pro 130 135 140Glu
Val Gln Ala Lys Ile Ala Glu Trp Gln Arg Ala Ala Val Gly Val145
150 155 160Met Phe Ser Lys Ser Leu
Lys Ile Val Arg Phe Gly Asp Asn Met Arg 165
170 175Glu Val Ala Val Thr Glu Gly Asp Lys Ile Glu Ala
Gln Leu Lys Leu 180 185 190Gly
Trp Gln Val Asn Thr Phe Ala Val Gly Asp Leu Val Glu Ile Met 195
200 205Asn Ala Val Thr Asp Ala Glu Ile Asp
Ala Leu Met Lys Glu Tyr Ala 210 215
220Glu Leu Tyr Asp Tyr Lys Lys Glu Asp Glu Glu Thr Ile Arg Tyr Gln225
230 235 240Ala Arg Glu Glu
Ile Ala Ile Glu Lys Ile Leu Val Arg Glu Gly Ala 245
250 255Lys Ala Tyr Ser Asn Thr Phe Glu Asp Leu
His Gly Met Lys Gln Leu 260 265
270Pro Gly Leu Ala Thr Gln His Leu Met His Lys Gly Tyr Gly Phe Gly
275 280 285Ala Glu Gly Asp Trp Lys Thr
Ala Gly Met Thr Ala Ile Val Lys Ala 290 295
300Met Tyr Pro Glu Gly Asn Thr Ser Phe Met Glu Asp Tyr Thr Tyr
Asp305 310 315 320Tyr Lys
Gln Glu Leu Ile Leu Gly Ser His Met Leu Glu Val Cys Pro
325 330 335Ser Ile Ala Ala Asp Arg Pro
Arg Ile Glu Val His Lys Leu Gly Ile 340 345
350Gly Gly Lys Glu Pro Pro Ala Arg Ile Val Phe Glu Gly Lys
Ala Gly 355 360 365Ser Ala Lys Val
Leu Ser Leu Ile Asp Ile Gly Gly Arg Leu Arg Leu 370
375 380Ile Gln Gln Asp Ile Glu Cys Val Lys Pro Phe Gln
Ser Met Pro Asn385 390 395
400Leu Pro Val Ala Arg Thr Met Trp Arg Pro Ala Pro Ser Phe Leu Asp
405 410 415Gly Leu Glu Cys Trp
Ile Ile Ala Gly Gly Ala His His Thr Val Leu 420
425 430Ser Tyr Asp Ile Ser Asp Glu Ala Val Arg Ser Phe
Ala Arg Ile Met 435 440 445Gly Ile
Glu Leu Val Val Ile Asn Lys Asp Thr Thr Val Asn Gly Leu 450
455 460Glu Arg Asp Ile Met Ile Gly Asp Val Ile Tyr
Gly Arg 465 470 475
70 477 PRT Ruminococcus flavefaciens
70 Met Lys Phe Trp Phe Ile Thr Gly Ser Gln Phe Leu Tyr Gly
Glu Glu1 5 10 15Thr Leu
Arg Gln Val Asp Glu Asp Ser Lys Lys Ile Val Ala Gly Leu 20
25 30Lys Leu Pro Phe Pro Val Glu Tyr Lys
Ser Thr Val Lys Thr Glu Arg 35 40
45Glu Ile Glu Arg Ile Ile Lys Glu Ala Asn Tyr Asp Asp Glu Cys Ala 50
55 60Gly Ile Ile Thr Phe Cys His Thr Phe
Ser Pro Ser Lys Met Trp Ile65 70 75
80Asn Gly Leu Ala Leu Leu Gln Lys Pro Trp Leu His Phe His
Thr Gln 85 90 95Phe Asn
Glu Thr Ile Pro Asn Glu Ala Ile Asp Met Asp Tyr Met Asn 100
105 110Leu His Gln Ser Ala His Gly Asp Arg
Glu His Gly Phe Ile Gly Ala 115 120
125Arg Leu Arg Met Pro Arg Ala Val Val Ala Gly His Trp Gln Asp Pro
130 135 140Glu Val Gln Ala Lys Ile Ala
Glu Trp Gln Arg Ala Ala Val Gly Val145 150
155 160Met Phe Ser Lys Ser Leu Lys Ile Val Arg Phe Gly
Asp Asn Met Arg 165 170
175Glu Val Ala Val Thr Glu Gly Asp Lys Val Glu Ala Gln Leu Lys Leu
180 185 190Gly Trp Gln Val Asn Thr
Phe Ala Val Gly Asp Leu Val Glu Ile Met 195 200
205Asn Ala Val Thr Asn Thr Glu Ile Asp Ala Leu Met Lys Glu
Tyr Ala 210 215 220Glu Leu Tyr Asp Tyr
Lys Lys Glu Asp Glu Glu Thr Ile Arg Tyr Gln225 230
235 240Ala Arg Glu Glu Ile Ala Ile Glu Lys Ile
Leu Val Arg Glu Gly Ala 245 250
255Lys Ala Phe Ser Asn Thr Phe Glu Asp Leu His Gly Met Lys Gln Leu
260 265 270Pro Gly Leu Ala Thr
Gln His Leu Met His Lys Gly Tyr Gly Phe Gly 275
280 285Ala Glu Gly Asp Trp Lys Thr Ala Gly Met Thr Ala
Ile Val Lys Ala 290 295 300Met Tyr Pro
Asp Gly Asn Thr Ser Phe Met Glu Asp Tyr Thr Tyr Asp305
310 315 320Tyr Lys Gln Gln Leu Ile Leu
Gly Ser His Met Leu Glu Val Cys Pro 325
330 335Ser Ile Ala Ala Asp Lys Pro Arg Ile Glu Val His
Lys Leu Gly Ile 340 345 350Gly
Gly Lys Glu Pro Pro Ala Arg Ile Val Phe Glu Gly Lys Ala Gly 355
360 365Ser Ala Lys Ala Leu Ser Leu Ile Asp
Ile Gly Gly Arg Leu Arg Leu 370 375
380Ile Ser Gln Asp Val Glu Cys Val Lys Pro Phe Gln Ser Met Pro Asn385
390 395 400Leu Pro Val Ala
Arg Thr Met Trp Arg Pro Ala Pro Ser Phe Leu Glu 405
410 415Gly Leu Glu Cys Trp Ile Val Ala Gly Gly
Ala His His Thr Val Leu 420 425
430Ser Tyr Asp Ile Ser Asp Glu Ala Val Arg Ser Phe Ala Arg Ile Met
435 440 445Gly Ile Glu Leu Val Val Ile
Asn Lys Asp Thr Thr Val Asn Gly Leu 450 455
460Glu Arg Asp Ile Met Ile Gly Asp Val Ile Tyr Gly Arg465
470 475
71 477 PRT Ruminococcus flavefaciens
71 Met Lys
Phe Trp Phe Val Thr Gly Ser Gln Phe Leu Tyr Gly Glu Glu1 5
10 15Thr Leu Arg Gln Val Glu Glu Asp
Ser Lys Lys Ile Val Ala Gly Leu 20 25
30Lys Leu Pro Phe Pro Val Glu Tyr Lys Leu Thr Val Lys Lys Glu
Ala 35 40 45Glu Ile Thr Lys Ile
Ile Lys Glu Ala Asn Tyr Asp Asp Glu Cys Ala 50 55
60Gly Ile Ile Thr Phe Cys His Thr Phe Ser Pro Ser Lys Met
Trp Ile65 70 75 80Asn
Gly Leu Arg Ser Leu Gln Lys Pro Trp Leu His Phe His Thr Gln
85 90 95Phe Asn Asp Asn Ile Pro Asn
Asp Ala Ile Asp Met Asp Tyr Met Asn 100 105
110Leu His Gln Ser Ala His Gly Asp Arg Glu His Gly Phe Ile
Gly Ala 115 120 125Arg Leu Arg Met
Pro Arg Ala Val Val Ala Gly His Trp Ala Asp Pro 130
135 140Ala Val Gln Glu Lys Ile Ala Asp Trp Met Arg Ala
Ala Val Gly Val145 150 155
160Gln Phe Ser Lys Ser Leu Lys Ile Val Arg Phe Gly Asp Asn Met Arg
165 170 175Glu Val Ala Val Thr
Glu Gly Asp Lys Ile Glu Ala Gln Ile Lys Leu 180
185 190Gly Trp Gln Val Asn Thr Phe Ala Val Gly Asp Leu
Val Gln Ile Met 195 200 205Asn Ala
Val Thr Asp Ala Glu Ile Asp Ala Leu Met Lys Glu Tyr Ala 210
215 220Glu Leu Tyr Asp Phe Asp Lys Ala Asp Glu Glu
Cys Ile Arg Tyr Gln225 230 235
240Ala Arg Glu Glu Ile Ala Ile Glu Lys Ile Leu Val Arg Glu Gly Ala
245 250 255Met Ala Phe Ser
Asn Thr Phe Glu Asp Leu His Gly Met Lys Gln Leu 260
265 270Pro Gly Leu Ala Thr Gln His Leu Met His Lys
Gly Tyr Gly Phe Gly 275 280 285Ala
Glu Gly Asp Trp Lys Thr Ala Gly Met Thr Ala Ile Ile Lys Ala 290
295 300Met Tyr Pro Asp Gly Asn Thr Ser Phe Met
Glu Asp Tyr Asn Tyr Asp305 310 315
320Tyr Lys His Glu Leu Ile Leu Gly Ala His Met Leu Glu Val Cys
Pro 325 330 335Ser Ile Ala
Ala Gly Arg Pro Arg Ile Glu Val His Pro Leu Gly Ile 340
345 350Gly Gly Lys Asp Ala Pro Ala Arg Ile Val
Phe Glu Gly Lys Ala Gly 355 360
365Ser Ala Lys Ala Ile Ser Leu Ile Asp Ile Gly Gly Arg Leu Arg Leu 370
375 380Ile Ala Gln Asp Val Glu Cys Glu
Lys Pro Phe Gln Thr Met Pro Asn385 390
395 400Leu Pro Val Ala Arg Thr Met Trp Lys Pro Ala Pro
Ser Phe Leu Glu 405 410
415Gly Leu Glu Cys Trp Ile Ile Ala Gly Gly Ala His His Thr Val Leu
420 425 430Ser Tyr Asp Ile Ser Asp
Glu Thr Val Arg Asp Phe Ala Arg Ile Met 435 440
445Gly Ile Glu Leu Val Val Ile Asn Lys Asn Thr Asn Lys Tyr
Gln Leu 450 455 460Glu Arg Asp Met Met
Ile Gly Asp Val Ile Tyr Gly Arg465 470
475
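
The position-by-position variation listed for SEQ ID NO:66 can be written as a simple pattern for screening candidate sequences. The following is a minimal illustrative sketch, not part of the application as filed. It assumes one-letter amino acid codes, and the names `SEQ66_PATTERN` and `matches_seq66` are hypothetical. It tests strict compliance with every listed position and does not implement the "at least 90% identical" criterion recited in the claims. The test segment is residues 237-269 of SEQ ID NO:68 as printed in this listing.

```python
import re

# Allowed residues at each of the 33 motif positions, transcribed from the
# VARIANT entries of SEQ ID NO:66 (one-letter codes). Fixed positions are
# single letters; variable positions are character classes.
SEQ66_PATTERN = re.compile(
    "I[RK]YQA[RK]EEIA"    # motif positions 1-10
    "[IM][EK]K[IM][LM]"   # motif positions 11-15
    "[VTD][RA][EN]G[AC]"  # motif positions 16-20
    "[KNR]AF[SVT]N"       # motif positions 21-25
    "TF[EQ]DL"            # motif positions 26-30
    "[HIY]GM"             # motif positions 31-33
)


def matches_seq66(segment: str) -> bool:
    """Return True if a 33-residue segment satisfies every position of SEQ ID NO:66."""
    return len(segment) == 33 and SEQ66_PATTERN.fullmatch(segment) is not None


# Residues 237-269 of SEQ ID NO:68 (Ruminococcus flavefaciens) from this listing,
# converted from three-letter to one-letter codes.
seq68_237_269 = "IRYQAREEIAIEKILVREGAKAFSNTFEDLHGM"
print(matches_seq66(seq68_237_269))  # prints True
```

To screen a full-length polypeptide, the segment aligned to positions 237-269 would first have to be extracted. That alignment step is outside the scope of this sketch.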