Patent application title: HETEROLOGOUS PRODUCTION OF 10-METHYLSTEARIC ACID
Inventors:
IPC8 Class: AC12P764FI
Publication date: 2021-09-02
Patent application number: 20210269835
Abstract:
Nucleic acids and cells comprising a methyltransferase gene and/or a
reductase gene are disclosed. These nucleic acids and cells may be used
to produce branched (methyl)lipids, such as 10-methylstearate.
Claims:
1.-29. (canceled)
30. A nucleic acid comprising a recombinant methyltransferase gene and a constitutive promoter operably-linked to the recombinant methyltransferase gene, wherein the constitutive promoter is a eukaryotic promoter, and wherein the recombinant methyltransferase gene encodes an amino acid sequence with at least 95% sequence identity with the amino acid sequence set forth in SEQ ID NO:4, SEQ ID NO:8, SEQ ID NO:12, SEQ ID NO:16, SEQ ID NO:20, SEQ ID NO:26, SEQ ID NO:30, SEQ ID NO:36, SEQ ID NO:42, SEQ ID NO:46, SEQ ID NO:50, SEQ ID NO:54, SEQ ID NO:60, SEQ ID NO:64, SEQ ID NO:70, or SEQ ID NO: 76.
31. The nucleic acid of claim 30, wherein the recombinant methyltransferase gene is codon-optimized for expression in yeast, algae, or plants.
32. The nucleic acid of claim 30, further comprising a recombinant reductase gene, wherein the recombinant reductase gene encodes an amino acid sequence with at least 95% sequence identity with the amino acid sequence set forth in SEQ ID NO:2, SEQ ID NO:6, SEQ ID NO: 10, SEQ ID NO: 14, SEQ ID NO: 18, SEQ ID NO:24, SEQ ID NO:28, SEQ ID NO:34, SEQ ID NO:40, SEQ ID NO:44, SEQ ID NO:48, SEQ ID NO:52, SEQ ID NO:58, SEQ ID NO:62, SEQ ID NO:68, or SEQ ID NO: 74.
33. (canceled)
34. The nucleic acid of claim 32, wherein the recombinant reductase gene is codon-optimized for expression in yeast, algae, or plants.
35.-36. (canceled)
37. The nucleic acid of claim 32, wherein the recombinant methyltransferase gene and the recombinant reductase gene are part of a single open reading frame that encodes a fusion protein.
38.-49. (canceled)
50. The nucleic acid of claim 30, wherein the constitutive promoter is a bacterial or yeast promoter.
51. The nucleic acid of claim 50, wherein the constitutive promoter is a Yarrowia, Arxula, or Saccharomyces promoter.
52. The nucleic acid of claim 32, further comprising a second constitutive promoter operably-linked to the recombinant reductase gene, wherein the second constitutive promoter is a eukaryotic promoter.
53. The nucleic acid of claim 52, wherein the second constitutive promoter is a bacterial or yeast promoter.
54. The nucleic acid of claim 53, wherein the second constitutive promoter is a Yarrowia, Arxula, or Saccharomyces promoter.
55. The nucleic acid of claim 30, wherein the amino acid sequence has at least 95% sequence identity with the amino acid sequence set forth in SEQ ID NO:76.
56. The nucleic acid of claim 55, wherein the amino acid sequence comprises the amino acid sequence set forth in SEQ ID NO:76.
57. The nucleic acid of claim 55, wherein the recombinant methyltransferase gene comprises a nucleic acid sequence having at least 95% sequence identity with the nucleotide sequence set forth in SEQ ID NO:75.
58. The nucleic acid of claim 57, wherein the recombinant methyltransferase gene comprises the nucleotide sequence set forth in SEQ ID NO:75.
59. The nucleic acid of claim 32, wherein the amino acid sequence encoded by the recombinant reductase gene has at least 95% sequence identity with the amino acid sequence set forth in SEQ ID NO:74.
60. The nucleic acid of claim 59, wherein the amino acid sequence encoded by the recombinant reductase gene comprises the amino acid sequence set forth in SEQ ID NO:74.
61. The nucleic acid of claim 59, wherein the recombinant reductase gene comprises a nucleic acid sequence having at least 95% sequence identity with the nucleotide sequence set forth in SEQ ID NO:73.
62. The nucleic acid of claim 61, wherein the recombinant reductase gene comprises the nucleotide sequence set forth in SEQ ID NO:73.
63. The nucleic acid of claim 37, wherein the nucleic acid comprises a nucleotide sequence having at least 95% sequence identity with the nucleotide sequence set forth in SEQ ID NO:97 or wherein the nucleic acid comprises the nucleotide sequence set forth in SEQ ID NO:97.
64. The nucleic acid of claim 37, wherein the nucleic acid comprises a nucleotide sequence having at least 95% sequence identity with the nucleotide sequence set forth in SEQ ID NO:98 or wherein the nucleic acid comprises the nucleotide sequence set forth in SEQ ID.
Description:
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of U.S. patent application Ser. No. 16/664,378, filed Oct. 25, 2019, which is a divisional of U.S. patent application Ser. No. 15/710,734, filed Sep. 20, 2017, now U.S. Pat. No. 10,457,963, which claims priority to U.S. Provisional Patent Application Ser. No. 62/396,870, filed Sep. 20, 2016, each of which is incorporated by reference herein in its entirety.
BACKGROUND
[0002] Fatty acids derived from agricultural plant and animal oils find use as industrial lubricants, hydraulic fluids, greases, and other specialty fluids in addition to oleochemical feedstocks for processing. The physical and chemical properties of these fatty acids result in large part from their carbon chain length and number of unsaturated double bonds. Fatty acids are typically 16:0 (sixteen carbons, zero double bonds), 16:1 (sixteen carbons, 1 double bond), 18:0, 18:1, 18:2, or 18:3. Importantly, fatty acids with no double bonds (saturated) have high oxidative stability, but they solidify at low temperature. Double bonds improve low-temperature fluidity, but decrease oxidative stability. This trade-off poses challenges for lubricant and other specialty-fluid formulations because consistent long term performance (high oxidative stability) over a wide range of operating temperatures is desirable. High 18:1 (oleic) fatty acid oils provide low temperature fluidity with relatively good oxidative stability. Accordingly, several commercial products, such as high oleic soybean oil, high oleic sunflower oil, and high oleic algal oil, have been developed with high oleic compositions. Oleic acid is an alkene, however, and subject to oxidative degradation.
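The carbons:double-bonds shorthand used above (e.g., 16:0, 18:1) can be read mechanically. The following is a minimal illustrative sketch in Python; the names FattyAcid, parse_shorthand, and is_saturated are hypothetical and do not appear in the specification.

```python
from typing import NamedTuple


class FattyAcid(NamedTuple):
    carbons: int       # total carbon atoms in the fatty acid
    double_bonds: int  # number of carbon-carbon double bonds


def parse_shorthand(label: str) -> FattyAcid:
    """Parse a 'carbons:double bonds' label such as '18:1'."""
    carbons_text, bonds_text = label.strip().split(":")
    return FattyAcid(int(carbons_text), int(bonds_text))


def is_saturated(fa: FattyAcid) -> bool:
    """A fatty acid with zero double bonds is saturated."""
    return fa.double_bonds == 0


if __name__ == "__main__":
    # The six typical fatty acids listed in the paragraph above.
    for label in ["16:0", "16:1", "18:0", "18:1", "18:2", "18:3"]:
        fa = parse_shorthand(label)
        kind = "saturated" if is_saturated(fa) else "unsaturated"
        print(f"{label}: {fa.carbons} carbons, "
              f"{fa.double_bonds} double bond(s), {kind}")
```

Under this notation, oleic acid is 18:1 and stearic acid is 18:0.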
SUMMARY
[0003] The nucleic acids, cells, and methods described herein are generally useful for the production of branched (methyl)lipids, such as 10-methylstearic acid, and compositions that include such lipids. Saturated branched (methyl)lipids like 10-methylstearic acid have favorable low-temperature fluidity and favorable oxidative stability, which are desirable properties for lubricants and specialty fluids.
[0004] Various aspects relate to nucleic acids comprising a recombinant tmsB gene encoding a methyltransferase protein, a recombinant tmsA gene encoding a reductase protein, and/or a recombinant tmsC gene encoding a tmsC protein. The methyltransferase protein, reductase protein, and/or tmsC protein may be proteins expressed by species of Actinobacteria, and the recombinant tmsB gene, recombinant tmsA gene, and/or recombinant tmsC gene may be codon-optimized for expression in a different phylum of bacteria (e.g., Proteobacterium) or in eukaryotes (e.g., yeast, such as Arxula adeninivorans (also known as Blastobotrys adeninivorans or Trichosporon adeninivorans), Saccharomyces cerevisiae, or Yarrowia lipolytica). The recombinant tmsB gene, recombinant tmsA gene, or recombinant tmsC gene may be operably-linked to a promoter capable of driving expression in a phylum of bacteria other than Actinobacteria (e.g., Proteobacterium) or in eukaryotes (e.g., yeast). The nucleic acid may be a plasmid or a chromosome.
[0005] Some aspects relate to a cell comprising a nucleic acid as described herein. The cell may comprise a branched (methyl)lipid, such as 10-methylstearic acid, and/or an exomethylene-substituted lipid, such as 10-methylenestearic acid. The cell may be a eukaryotic cell, such as an algae cell, yeast cell, or plant cell.
[0006] Some aspects relate to a composition produced by cultivating a cell culture comprising cells as described herein. The oil composition may comprise a branched (methyl)lipid, such as 10-methylstearic acid, and/or an exomethylene-substituted lipid, such as 10-methylenestearic acid.
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] FIG. 1 depicts one possible mechanism for the conversion of oleic acid to 10-methylstearic acid. An oleic acid substrate may be present as an acyl chain of a glycerolipid or phospholipid. A methionine substrate, which donates the methyl group, may be present as S-adenosyl methionine. The oleic acid and methionine substrates may be converted to 10-methylenestearic acid (e.g., present as an acyl chain of a glycerolipid or phospholipid) and homocysteine (e.g., present as S-adenosyl homocysteine). This reaction may be catalyzed by a tmsB protein as described herein, infra. 10-methylenestearic acid (e.g., present as an acyl chain of a glycerolipid or phospholipid) may be reduced to 10-methylstearic acid. The reduction may be catalyzed by a tmsA protein as described herein, infra, for example, using NADPH as a reducing agent. The language of the specification and claims, however, is not limited to any particular reaction mechanism.
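The two-step conversion described for FIG. 1 can be followed as simple atom bookkeeping on the free fatty acids. The sketch below is illustrative only. It assumes the standard molecular formula of oleic acid (C18H34O2), treats the methyl transfer as a net addition of CH2 and the reduction as a net addition of H2, and uses hypothetical function names. It does not model the acyl carrier (glycerolipid or phospholipid), the cofactors, or any particular reaction mechanism.

```python
from collections import Counter

# Molecular formula of free oleic acid (standard chemistry; an assumption
# of this sketch, not a value stated in the specification).
OLEIC_ACID = Counter({"C": 18, "H": 34, "O": 2})


def methylenate(formula: Counter) -> Counter:
    """Step 1 (methyltransferase, tmsB-type): net addition of CH2."""
    return formula + Counter({"C": 1, "H": 2})


def reduce_exomethylene(formula: Counter) -> Counter:
    """Step 2 (reductase, tmsA-type): net addition of H2."""
    return formula + Counter({"H": 2})


def to_text(formula: Counter) -> str:
    """Render a formula as text in C, H, O order."""
    return "".join(f"{element}{formula[element]}" for element in ("C", "H", "O"))


if __name__ == "__main__":
    methylene = methylenate(OLEIC_ACID)
    methyl = reduce_exomethylene(methylene)
    print("oleic acid:              ", to_text(OLEIC_ACID))
    print("10-methylenestearic acid:", to_text(methylene))
    print("10-methylstearic acid:   ", to_text(methyl))
```

The intermediate keeps one degree of unsaturation (the exomethylene group), and the final product is fully saturated.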
[0008] FIG. 2 depicts one possible mechanism for the conversion of oleic acid to 10-methylstearic acid. Oleic acid, present as a carboxylic acid in the cytosol, may be added to monoacylglycerol-3-phosphate to form a diacylglycerol-3-phosphate comprising an oleate acyl group. "10-methyl synthase" may convert diacylglycerol-3-phosphate comprising an oleate acyl group to diacylglycerol-3-phosphate comprising a 10-methylstearate acyl group. The diacylglycerol-3-phosphate may subsequently be converted to a triacylglycerol, converted into another phospholipid, such as phosphatidylcholine, or converted back into a monoacylglycerol-3-phosphate (e.g., thereby releasing free 10-methylstearate into the cytosol). The language of the specification and claims, however, is not limited to any particular reaction mechanism.
[0009] FIGS. 3A and 3B depict prokaryotic operons encoding enzymes that catalyze the transfer of methyl groups to alkyl chains from sixteen different species of bacteria, labeled A-H (FIG. 3A) and I-P (FIG. 3B). The tmsA and tmsB genes are particularly important for methylating alkyl chains. The tmsC gene may also be important for methylating alkyl chains. The nucleotide sequences of these genes and the amino acid sequences that they encode are shown in SEQ ID NO:1-76.
[0010] FIG. 4 is a map of plasmid pNC704, which may be used to express Mycobacterium smegmatis genes tmsA (SEQ ID NO:1) and tmsB (SEQ ID NO:3) in E. coli. The nucleotide sequence of plasmid pNC704 is set forth in SEQ ID NO:77.
[0011] FIG. 5 is a map of plasmid pNC738, which may be used to express codon-optimized versions of Mycobacterium smegmatis genes tmsA (SEQ ID NO:80) and tmsB (SEQ ID NO:81) in yeast, such as Arxula adeninivorans, Saccharomyces cerevisiae, and Yarrowia lipolytica. The nucleotide sequence of plasmid pNC738 is set forth in SEQ ID NO:78.
[0012] FIG. 6 is a map of plasmid BS-10MS_ER, which may be used to express codon-optimized versions of Mycobacterium smegmatis genes tmsA (SEQ ID NO:80) and tmsB (SEQ ID NO:81) in yeast, such as Arxula adeninivorans, Saccharomyces cerevisiae, and Yarrowia lipolytica. The nucleotide sequence of plasmid BS-10MS_ER is set forth in SEQ ID NO:79.
[0013] FIGS. 7A and 7B consist of overlaid gas chromatography (GC) traces of various fatty acid standards and lipids extracted from various samples. The standards were stearic acid, 10-methylstearic acid, and oleic acid. Each sample and standard was transesterified into fatty acid methyl esters (FAMEs) prior to analysis. FIG. 7A depicts the GC trace of FAMEs prepared from E. coli that express the tmsA and tmsB genes from Mycobacterium smegmatis as well as the GC traces of each standard. The tmsA/tmsB sample displayed a peak at about 10.777 minutes, corresponding to the 10-methylstearic acid standard. FIG. 7B depicts each trace of FIG. 7A and two additional traces. The first additional trace corresponds to FAMEs prepared from E. coli that express the ufa gene from Mycobacterium tuberculosis. This sample displayed a peak at about 10.777 minutes, corresponding to the 10-methylstearic acid standard. The second additional trace corresponds to FAMEs prepared from E. coli that had been transfected with an empty vector. This control did not display a peak at 10.777 minutes, suggesting that the tmsA and tmsB genes synthesized 10-methylstearic acid in the transformed E. coli.
[0014] FIGS. 8A and 8B depict GC-MS results. FIG. 8A is a gas chromatography (GC) trace of lipids eluting from a GC column. The lipids were purified from E. coli that had been transfected with pNC704 encoding Mycobacterium smegmatis genes tmsA and tmsB, and the lipids were converted into fatty acid methyl esters. FIG. 8B is a mass spectrometry spectrum of the lipids eluted during the GC run of FIG. 8A from 20.388 to 20.447 minutes. The mass spectrum is gated for the 10-methylstearate fatty acid methyl ester, which has a molecular weight of 312. The spectrum also displays a peak at 313 m/z corresponding to 10-methylstearate methyl esters comprising natural-abundance isotopes (e.g., a single 13C).
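The values of 312 and 313 m/z noted for FIG. 8B can be checked with a short calculation. The sketch below is illustrative only. It assumes that the methyl ester of 10-methylstearic acid has the formula C20H40O2 and that the natural abundance of carbon-13 is approximately 1.07%; neither value is stated in the specification. It ignores the smaller contributions of 2H and 17O to the M+1 peak, and the function names are hypothetical.

```python
# Integer (nominal) masses of the most abundant isotope of each element.
NOMINAL_MASS = {"C": 12, "H": 1, "O": 16}

# Approximate natural abundance of carbon-13 (assumed value).
C13_ABUNDANCE = 0.0107

# Assumed formula of methyl 10-methylstearate (the FAME analyzed in FIG. 8B).
METHYL_10_METHYLSTEARATE = {"C": 20, "H": 40, "O": 2}


def nominal_mass(formula: dict) -> int:
    """Sum of integer isotope masses over all atoms in the formula."""
    return sum(NOMINAL_MASS[element] * count for element, count in formula.items())


def m_plus_one_ratio(n_carbons: int, p: float = C13_ABUNDANCE) -> float:
    """Ratio of molecules with exactly one 13C to molecules with no 13C.

    From the binomial distribution: n*p*(1-p)**(n-1) / (1-p)**n = n*p/(1-p).
    """
    return n_carbons * p / (1.0 - p)


if __name__ == "__main__":
    mass = nominal_mass(METHYL_10_METHYLSTEARATE)
    ratio = m_plus_one_ratio(METHYL_10_METHYLSTEARATE["C"])
    print(f"nominal mass (M):  {mass}")
    print(f"M+1 position:      {mass + 1}")
    print(f"M+1 / M intensity: about {ratio:.2f} (13C contribution only)")
```

Under these assumptions, the peak at 313 m/z would be expected at roughly one fifth of the intensity of the peak at 312 m/z.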
[0015] FIGS. 9A-9D depict maps of the following vectors, which can be used to express the tmsA and tmsB genes of the indicated species: pNC721 (Mycobacterium vanbaaleni) (SEQ ID NO:83), pNC755 (Amycolicicoccus subflavus) (SEQ ID NO:84), pNC757 (Corynebacterium glyciniphilum) (SEQ ID NO:85), pNC904 (Rhodococcus opacus) (SEQ ID NO:86), pNC905 (Thermobifida fusca) (SEQ ID NO:87), pNC906 (Thermomonospora curvata) (SEQ ID NO:88), pNC907 (Corynebacterium glutamicum) (SEQ ID NO:89), pNC908 (Agromyces subbeticus) (SEQ ID NO:90), pNC910 (Mycobacterium gilvum) (SEQ ID NO:91), pNC911 (Mycobacterium sp. indicus) (SEQ ID NO:92).
[0016] FIG. 10 depicts maps of vectors pNC985 (SEQ ID NO:93), which can be used to express the M. smegmatis tmsAB genes in Rhodococcus bacteria, and pNC986 (SEQ ID NO:94), which can be used to express the T. fusca tmsAB genes in Rhodococcus bacteria.
[0017] FIG. 11 depicts maps of vectors pNC963 (SEQ ID NO:95), which encodes the T. curvata tmsB gene under control of the constitutive tac promoter, and pNC964 (SEQ ID NO:96), which encodes the T. curvata tmsA gene under control of the constitutive tac promoter.
[0018] FIG. 12 is a graph showing gas chromatographic detection of 10-methylene stearic acid in Y. lipolytica expressing tmsB genes from various organisms.
[0019] FIG. 13 is a graph showing percentage of 10-methylene fatty acids as compared to total fatty acids in 8 transformants of Arxula adeninivorans containing a plasmid encoding T. curvata tmsB. The two isolates furthest to the right were transformed with empty vector control.
[0020] FIG. 14 is a graph showing the percentage by weight of 10-methylene fatty acids and 10-methyl fatty acids in Yarrowia lipolytica containing a stably integrated copy of the T. curvata tmsB gene and transformed with plasmids expressing tmsA from C. glutamicum (C.gl.), T. curvata (T.cu.), or T. fusca (T.fu.), or an empty vector control (the two transformants furthest to the right).
[0021] FIG. 15 is a graph showing the percentage by weight of 10-methylene fatty acids and 10-methyl fatty acids as compared to total fatty acids in transformants of S. cerevisiae transformed with empty vector (-) or vectors encoding T. curvata tmsA (T.cu. tmsA), T. curvata tmsB (T.cu. tmsB), or both T. curvata tmsA and tmsB (T.cu. tmsA+tmsB).
[0022] FIG. 16 is a graph showing the percentage by weight of 10-methylene fatty acids and 10-methyl fatty acids as compared to total fatty acids in transformants of S. cerevisiae containing the tmsA-B fusion protein, the tmsB-A fusion protein, or empty vector (-).
[0023] FIG. 17 is a graph showing the percentage by weight of 10-methylene fatty acids and 10-methyl fatty acids as compared to total fatty acids in transformants of Y. lipolytica containing the tmsA-B fusion protein, the tmsB-A fusion protein, or empty vector (-).
[0024] FIG. 18 is a graph showing the percentage by weight of 10-methylene fatty acids and 10-methyl fatty acids as compared to total fatty acids in transformants of A. adeninivorans containing the tmsA-B fusion protein or empty vector (-).
[0025] FIGS. 19A-19D show a CLUSTAL OMEGA alignment of TmsB protein sequences encoded by the tmsB genes from Mycobacterium smegmatis (SEQ ID NO:4), Mycobacterium vanbaaleni (SEQ ID NO:54), Amycolicicoccus subflavus (SEQ ID NO:12), Corynebacterium glyciniphilum (SEQ ID NO:20), Corynebacterium glutamicum (SEQ ID NO:16), Rhodococcus opacus (SEQ ID NO:60), Agromyces subbeticus (SEQ ID NO:8), Knoellia aerolata (SEQ ID NO:26), Mycobacterium gilvum (SEQ ID NO:36), Mycobacterium sp. Indicus (SEQ ID NO:42), Thermobifida fusca (SEQ ID NO:70), and Thermomonospora curvata (SEQ ID NO:76), along with the cyclopropane fatty acid synthase (Cfa) enzyme from Escherichia coli.
[0026] FIGS. 20A-20E show a CLUSTAL OMEGA alignment of TmsA protein sequences encoded by the tmsA genes from Mycobacterium smegmatis (SEQ ID NO:2), Mycobacterium vanbaaleni (SEQ ID NO:52), Amycolicicoccus subflavus (SEQ ID NO:10), Corynebacterium glyciniphilum (SEQ ID NO:18), Corynebacterium glutamicum (SEQ ID NO:14), Rhodococcus opacus (SEQ ID NO:58), Agromyces subbeticus (SEQ ID NO:6), Knoellia aerolata (SEQ ID NO:24), Mycobacterium gilvum (SEQ ID NO:34), Mycobacterium sp. Indicus (SEQ ID NO:40), Thermobifida fusca (SEQ ID NO:68), and Thermomonospora curvata (SEQ ID NO:74), along with the Glycolate oxidase subunit GlcD enzyme from Escherichia coli.
DETAILED DESCRIPTION
Definitions
[0027] The articles "a" and "an" are used herein to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. By way of example, "an element" means one element or more than one element.
[0028] The term "biologically-active portion" refers to an amino acid sequence that is less than a full-length amino acid sequence, but exhibits at least one activity of the full length sequence. For example, a biologically-active portion of a methyltransferase may refer to one or more domains of tmsB having biological activity for converting oleic acid (e.g., a phospholipid comprising an ester of oleate) and methionine (e.g., S-adenosyl methionine) into 10-methylenestearic acid (e.g., a phospholipid comprising an ester of 10-methylenestearate). A biologically-active portion of a reductase may refer to one or more domains of tmsA having biological activity for converting 10-methylenestearic acid (e.g., a phospholipid comprising an ester of 10-methylenestearate) and a reducing agent (e.g., NADH, NADPH, FAD, FADH2, FMNH2) into 10-methylstearic acid (e.g., a phospholipid comprising an ester of 10-methylstearate). Biologically-active portions of a protein include peptides or polypeptides comprising amino acid sequences sufficiently identical to or derived from the amino acid sequence of the protein, e.g., the amino acid sequence set forth in SEQ ID NO: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, or 76, which include fewer amino acids than the full length protein, and exhibit at least one activity of the protein, especially methyltransferase or reductase activity. 
A biologically-active portion of a protein may comprise, comprise at least, or comprise at most, for example, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477,
478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, or more amino acids or any range derivable therein. Typically, biologically-active portions comprise a domain or motif having a catalytic activity, such as catalytic activity for producing 10-methylenestearic acid or 10-methylstearic acid. A biologically-active portion of a protein includes portions of the protein that have the same activity as the full-length peptide and every portion that has more activity than background. For example, a biologically-active portion of an enzyme may have, have at least, or have at most 0.1%, 0.5%, 1%, 2%, 3%, 4%, 5%, 10%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, 100%, 100.1%, 100.2%, 100.3%, 100.4%, 100.5%, 100.6%, 100.7%, 100.8%, 100.9%, 101%, 105%, 110%, 115%, 120%, 125%, 130%, 135%, 140%, 145%, 150%, 160%, 170%, 180%, 190%, 200%, 220%, 240%, 260%, 280%, 300%, 320%, 340%, 360%, 380%, 400% or higher activity relative to the full-length enzyme (or any range derivable therein). A biologically-active portion of a protein may include portions of a protein that lack a domain that targets the protein to a cellular compartment.
[0029] The terms "codon optimized" and "codon-optimized for the cell" refer to coding nucleotide sequences (e.g., genes) that have been altered to substitute at least one codon that is relatively rare in a desired host cell with a synonymous codon that is relatively prevalent in the host cell. Codon optimization thereby allows for better utilization of the tRNA of a host cell by matching the codons of a recombinant gene with the tRNA of the host cell. For example, the codon usage of the species of Actinobacteria (prokaryotes) varies from the codon usage of yeast (eukaryotes). The translation efficiency in a yeast host cell of an mRNA encoding an Actinobacteria protein may be increased by substituting the codons of the corresponding Actinobacteria gene with codons that are more prevalent in the particular species of yeast. A codon optimized gene thereby has a nucleotide sequence that varies from a naturally-occurring gene.
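The substitution of rare codons with prevalent synonymous codons can be sketched as follows. This is an illustrative simplification only, not the method used to prepare any sequence of the specification. The usage frequencies in HYPOTHETICAL_HOST_USAGE are made-up placeholder values rather than measured data for any organism, the function names are hypothetical, and the sketch assumes the standard genetic code.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TO_AA = dict(zip(CODONS, AMINO_ACIDS))

# Placeholder codon frequencies for an imaginary host (hypothetical values).
HYPOTHETICAL_HOST_USAGE = {"AGA": 0.50, "CGG": 0.05, "TTG": 0.40, "CTG": 0.10}


def translate(seq: str) -> str:
    """Translate a coding sequence into one-letter amino acids ('*' = stop)."""
    return "".join(CODON_TO_AA[seq[i:i + 3]] for i in range(0, len(seq), 3))


def codon_optimize(seq: str, host_usage: dict) -> str:
    """Replace each codon with a more prevalent synonymous codon, if any."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    optimized = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        amino_acid = CODON_TO_AA[codon]
        synonyms = [c for c, aa in CODON_TO_AA.items() if aa == amino_acid]
        best = max(synonyms, key=lambda c: host_usage.get(c, 0.0))
        # Keep the original codon unless a synonym is strictly more prevalent.
        if host_usage.get(best, 0.0) > host_usage.get(codon, 0.0):
            codon = best
        optimized.append(codon)
    return "".join(optimized)


if __name__ == "__main__":
    original = "ATGCGGCTGTAA"  # encodes Met-Arg-Leu-stop
    result = codon_optimize(original, HYPOTHETICAL_HOST_USAGE)
    print(original, "->", result)
    # The encoded amino acid sequence must be unchanged.
    assert translate(original) == translate(result)
```

The nucleotide sequence changes while the encoded amino acid sequence does not, which is why a codon-optimized gene differs from the naturally-occurring gene yet encodes the same protein.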
[0030] The term "constitutive promoter" refers to a promoter that mediates the transcription of an operably linked gene independent of a particular stimulus (e.g., independent of the presence of a reagent such as isopropyl β-D-1-thiogalactopyranoside).
[0031] The term "DGAT1" refers to a gene that encodes a type 1 diacylglycerol acyltransferase protein, such as a gene that encodes a yeast DGA2 protein.
[0032] The term "DGAT2" refers to a gene that encodes a type 2 diacylglycerol acyltransferase protein, such as a gene that encodes a yeast DGA1 protein.
[0033] "Diacylglyceride," "diacylglycerol," and "diglyceride," are esters comprised of glycerol and two fatty acids.
[0034] The terms "diacylglycerol acyltransferase" and "DGA" refer to any protein that catalyzes the formation of triacylglycerides from diacylglycerol. Diacylglycerol acyltransferases include type 1 diacylglycerol acyltransferases (DGA2), type 2 diacylglycerol acyltransferases (DGA1), and type 3 diacylglycerol acyltransferases (DGA3) and all homologs that catalyze the above-mentioned reaction.
[0035] The terms "diacylglycerol acyltransferase, type 1" and "type 1 diacylglycerol acyltransferases" refer to DGA2 and DGA2 orthologs.
[0036] The terms "diacylglycerol acyltransferase, type 2" and "type 2 diacylglycerol acyltransferases" refer to DGA1 and DGA1 orthologs.
[0037] The term "domain" refers to a part of the amino acid sequence of a protein that is able to fold into a stable three-dimensional structure independent of the rest of the protein.
[0038] The term "drug" refers to any molecule that inhibits cell growth or proliferation, thereby providing a selective advantage to cells that contain a gene that confers resistance to the drug. Drugs include antibiotics, antimicrobials, toxins, and pesticides.
[0039] "Dry weight" and "dry cell weight" mean weight determined in the relative absence of water. For example, reference to oleaginous cells as comprising a specified percentage of a particular component by dry weight means that the percentage is calculated based on the weight of the cell after substantially all water has been removed. The term "% dry weight," when referring to a specific fatty acid (e.g., oleic acid or 10-methylstearic acid), includes fatty acids that are present as carboxylates, esters, thioesters, and amides. For example, a cell that comprises 10-methylstearic acid as a percentage of total fatty acids by % dry cell weight includes 10-methylstearic acid, 10-methylstearate, the 10-methylstearate portion of a diacylglycerol comprising a 10-methylstearate ester, the 10-methylstearate portion of a triacylglycerol comprising a 10-methylstearate ester, the 10-methylstearate portion of a phospholipid comprising a 10-methylstearate ester, and the 10-methylstearate portion of 10-methylstearate CoA. The term "% dry weight," when referring to a specific type of fatty acid (e.g., C16 fatty acids, C18 fatty acids), includes fatty acids that are present as carboxylates, esters, thioesters, and amides as described above (e.g., for 10 methylstearic acid).
[0040] The term "encode" refers to nucleic acids that comprise a coding region, portion of a coding region, or complements thereof. Both DNA and RNA may encode a gene. Both DNA and RNA may encode a protein.
[0041] The term "enzyme" as used herein refers to a protein that can catalyze a chemical reaction.
[0042] The term "expression" refers to the amount of a nucleic acid or amino acid sequence (e.g., peptide, polypeptide, or protein) in a cell. The increased expression of a gene refers to the increased transcription of that gene. The increased expression of an amino acid sequence, peptide, polypeptide, or protein refers to the increased translation of a nucleic acid encoding the amino acid sequence, peptide, polypeptide, or protein.
[0043] The term "gene," as used herein, may encompass genomic sequences that contain exons, particularly polynucleotide sequences encoding polypeptide sequences involved in a specific activity. The term further encompasses synthetic nucleic acids that did not derive from genomic sequence. In certain embodiments, the genes lack introns, as they are synthesized based on the known DNA sequence of cDNA and protein sequence. In other embodiments, the genes are synthesized, non-native cDNA wherein the codons have been optimized for expression in Y. lipolytica or A. adeninivorans based on codon usage. The term can further include nucleic acid molecules comprising upstream, downstream, and/or intron nucleotide sequences.
[0044] The term "inducible promoter" refers to a promoter that mediates the transcription of an operably linked gene in response to a particular stimulus.
[0045] The term "integrated" refers to a nucleic acid that is maintained in a cell as an insertion into the cell's genome, such as insertion into a chromosome, including insertions into a plastid genome.
[0046] "In operable linkage" refers to a functional linkage between two nucleic acid sequences, such as a control sequence (typically a promoter) and the linked sequence (typically a sequence that encodes a protein, also called a coding sequence). A promoter is in operable linkage with a gene if it can mediate transcription of the gene.
[0047] The term "knockout mutation" or "knockout" refers to a genetic modification that prevents a native gene from being transcribed and translated into a functional protein.
[0048] The term "nucleic acid" refers to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof. Polynucleotides may have any three-dimensional structure, and may perform any function. The following are non-limiting examples of polynucleotides: coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers. A polynucleotide may comprise modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure may be imparted before or after assembly of the polymer. A polynucleotide may be further modified, such as by conjugation with a labeling component. In all nucleic acid sequences provided herein, U nucleotides are interchangeable with T nucleotides.
[0049] The term "phospholipid" refers to esters comprising glycerol, two fatty acids, and a phosphate. The phosphate may be covalently linked to carbon-3 of the glycerol and comprise no further substitution, i.e., the phospholipid may be a phosphatidic acid. The phosphate may be substituted with ethanolamine (e.g., phosphatidylethanolamine), choline (e.g., phosphatidylcholine), serine (e.g., phosphatidylserine), inositol (e.g., phosphatidylinositol), inositol phosphate (e.g., phosphatidylinositol-3-phosphate, phosphatidylinositol-4-phosphate, phosphatidylinositol-5-phosphate), inositol bisphosphate (e.g., phosphatidylinositol-4,5-bisphosphate), or inositol trisphosphate (e.g., phosphatidylinositol-3,4,5-trisphosphate).
[0050] As used herein, the term "plasmid" refers to a circular DNA molecule that is physically separate from an organism's genomic DNA. Plasmids may be linearized before being introduced into a host cell (referred to herein as a linearized plasmid). Linearized plasmids may not be self-replicating, but may integrate into and be replicated with the genomic DNA of an organism.
[0051] A "promoter" is a nucleic acid control sequence that directs the transcription of a nucleic acid. As used herein, a promoter includes the necessary nucleic acid sequences near the start site of transcription.
[0052] The term "protein" refers to molecules that comprise an amino acid sequence, wherein the amino acids are linked by peptide bonds.
[0053] "Transformation" refers to the transfer of a nucleic acid into a host organism or into the genome of a host organism, resulting in genetically stable inheritance. Host organisms containing the transformed nucleic acid are referred to as "recombinant," "transgenic," or "transformed" organisms. Thus, nucleic acids of the present invention can be incorporated into recombinant constructs, typically DNA constructs, capable of introduction into and replication in a host cell. Such a construct can be a vector that includes a replication system and sequences that are capable of transcription and translation of a polypeptide-encoding sequence in a given host cell. Typically, expression vectors include, for example, one or more cloned genes under the transcriptional control of 5' and 3' regulatory sequences and a selectable marker. Such vectors also can contain a promoter regulatory region (e.g., a regulatory region controlling inducible or constitutive, environmentally- or developmentally-regulated, or location-specific expression), a transcription initiation start site, a ribosome binding site, a transcription termination site, and/or a polyadenylation signal.
[0054] The term "transformed cell" refers to a cell that has undergone a transformation. Thus, a transformed cell comprises the parent's genome and an inheritable genetic modification.
[0055] The terms "triacylglyceride," "triacylglycerol," "triglyceride," and "TAG" refer to esters of glycerol and three fatty acids.
Microbe Engineering
A. Overview
[0056] Genes and gene products may be introduced into microbial host cells. Suitable host cells for expression of the genes and nucleic acid molecules are microbial hosts that can be found broadly within the fungal or bacterial families. Examples of suitable host strains include but are not limited to fungal or yeast species, such as Arxula, Aspergillus, Aurantiochytrium, Candida, Claviceps, Cryptococcus, Cunninghamella, Hansenula, Kluyveromyces, Leucosporidiella, Lipomyces, Mortierella, Ogataea, Pichia, Prototheca, Rhizopus, Rhodosporidium, Rhodotorula, Saccharomyces, Schizosaccharomyces, Tremella, Trichosporon, Yarrowia, or bacterial species, such as members of proteobacteria and actinomycetes, as well as the genera Acinetobacter, Arthrobacter, Brevibacterium, Acidovorax, Bacillus, Clostridium, Streptomyces, Escherichia, Salmonella, Pseudomonas, and Corynebacterium. Yarrowia lipolytica and Arxula adeninivorans are suited for use as a host microorganism because they can accumulate a large percentage of their weight as triacylglycerols.
[0057] Microbial expression systems and expression vectors containing regulatory sequences that direct high level expression of foreign proteins are known to those skilled in the art. Any of these could be used to construct chimeric genes to produce any one of the gene products of the instant sequences. These chimeric genes could then be introduced into appropriate microorganisms via transformation techniques to provide high-level expression of the enzymes.
[0058] For example, a gene encoding an enzyme can be cloned in a suitable plasmid, and an aforementioned starting parent strain as a host can be transformed with the resulting plasmid. This approach can increase the copy number of each of the genes encoding the enzymes and, as a result, the activities of the enzymes can be increased. The plasmid is not particularly limited so long as it renders a desired genetic modification inheritable to the microorganism's progeny.
[0059] Vectors or cassettes useful for the transformation of suitable host cells are well known. Typically the vector or cassette contains sequences that direct the transcription and translation of the relevant gene, a selectable marker, and sequences that allow autonomous replication or chromosomal integration. Suitable vectors comprise a region 5' of the gene harboring transcriptional initiation controls and a region 3' of the DNA fragment which controls transcriptional termination. In certain embodiments both control regions are derived from genes homologous to the transformed host cell, although it is to be understood that such control regions need not be derived from the genes native to the specific species chosen as a production host. Promoters, cDNAs, and 3'UTRs, as well as other elements of the vectors, can be generated through cloning techniques using fragments isolated from native sources (see, e.g., Green & Sambrook, Molecular Cloning: A Laboratory Manual, (4th ed., 2012); U.S. Pat. No. 4,683,202 (incorporated by reference)). Alternatively, elements can be generated synthetically using known methods (see, e.g., Gene 164:49-53 (1995)).
B. Homologous Recombination
[0060] Homologous recombination is the ability of complementary DNA sequences to align and exchange regions of homology. Transgenic DNA ("donor") containing sequences homologous to the genomic sequences being targeted ("template") is introduced into the organism and then undergoes recombination into the genome at the site of the corresponding homologous genomic sequences.
[0061] The ability to carry out homologous recombination in a host organism has many practical implications for what can be carried out at the molecular genetic level and is useful in the generation of a microbe that can produce a desired product. By its nature homologous recombination is a precise gene targeting event and, hence, most transgenic lines generated with the same targeting sequence will be essentially identical in terms of phenotype, necessitating the screening of far fewer transformation events. Homologous recombination also targets gene insertion events into the host chromosome, potentially resulting in excellent genetic stability, even in the absence of genetic selection. Because different chromosomal loci will likely impact gene expression, even from exogenous promoters/UTRs, homologous recombination can be a method of querying loci in an unfamiliar genome environment and of assessing the impact of these environments on gene expression.
[0062] A particularly useful genetic engineering approach using homologous recombination is to co-opt specific host regulatory elements, such as promoters/UTRs, to drive heterologous gene expression in a highly specific fashion.
[0063] Because homologous recombination is a precise gene targeting event, it can be used to precisely modify any nucleotide(s) within a gene or region of interest, so long as sufficient flanking regions have been identified. Therefore, homologous recombination can be used as a means to modify regulatory sequences impacting gene expression of RNA and/or proteins. It can also be used to modify protein coding regions in an effort to modify enzyme activities such as substrate specificity, affinities and Km, thereby effecting a desired change in the metabolism of the host cell. Homologous recombination provides a powerful means to manipulate the host genome resulting in gene targeting, gene conversion, gene deletion, gene duplication, gene inversion, and exchanging gene expression regulatory elements such as promoters, enhancers and 3'UTRs.
[0064] Homologous recombination can be achieved by using targeting constructs containing pieces of endogenous sequences to "target" the gene or region of interest within the endogenous host cell genome. Such targeting sequences can either be located 5' of the gene or region of interest, 3' of the gene/region of interest or even flank the gene/region of interest. Such targeting constructs can be transformed into the host cell either as a supercoiled plasmid DNA with additional vector backbone, a PCR product with no vector backbone, or as a linearized molecule. In some cases, it may be advantageous to first expose the homologous sequences within the transgenic DNA (donor DNA) by cutting the transgenic DNA with a restriction enzyme. This step can increase the recombination efficiency and decrease the occurrence of undesired events. Other methods of increasing recombination efficiency include using PCR to generate transforming transgenic DNA containing linear ends homologous to the genomic sequences being targeted.
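The flanking arrangement described above can be illustrated with a minimal Python sketch. It is an idealized model only: the sequences are made-up toy strings, the names TargetingConstruct, build_donor, and recombine are hypothetical, and recombination is treated as a clean double crossover located by exact string matching. Real homology arms are far longer, and real recombination outcomes are not guaranteed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetingConstruct:
    """Linear donor DNA: 5' homology arm + payload + 3' homology arm."""
    left_arm: str
    payload: str
    right_arm: str

    def sequence(self) -> str:
        return self.left_arm + self.payload + self.right_arm


def build_donor(genome: str, start: int, end: int, payload: str,
                arm_len: int) -> TargetingConstruct:
    """Flank `payload` with the genomic sequence on either side of genome[start:end]."""
    if not (arm_len <= start <= end <= len(genome) - arm_len):
        raise ValueError("not enough flanking sequence for the requested arm length")
    return TargetingConstruct(
        left_arm=genome[start - arm_len:start],
        payload=payload,
        right_arm=genome[end:end + arm_len],
    )


def recombine(genome: str, donor: TargetingConstruct) -> str:
    """Idealized double crossover: replace the region between the arms with the payload."""
    left = genome.find(donor.left_arm)
    if left < 0:
        raise ValueError("5' arm not found in genome")
    right = genome.find(donor.right_arm, left + len(donor.left_arm))
    if right < 0:
        raise ValueError("3' arm not found in genome")
    return genome[:left + len(donor.left_arm)] + donor.payload + genome[right:]


# Toy, made-up sequences (assumption for illustration only).
genome = "AAACCCGGGTTTACGTACGTTTTGGGCCCAAA"
donor = build_donor(genome, start=12, end=20, payload="ATGTAA", arm_len=6)
edited = recombine(genome, donor)
```

In this toy case the eight bases between the arms are replaced by the payload, while the sequence on both sides of the targeted region is left unchanged.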
C. Vectors and Vector Components
[0065] Vectors for transforming microorganisms in accordance with the present invention can be prepared by known techniques familiar to those skilled in the art in view of the disclosure herein. A vector typically contains one or more genes, in which each gene codes for the expression of a desired product (the gene product) and is operably linked to one or more control sequences that regulate gene expression or target the gene product to a particular location in the recombinant cell.
1. Control Sequences
[0066] Control sequences are nucleic acids that regulate the expression of a coding sequence or direct a gene product to a particular location in or outside a cell. Control sequences that regulate expression include, for example, promoters that regulate transcription of a coding sequence and terminators that terminate transcription of a coding sequence. Another control sequence is a 3' untranslated sequence located at the end of a coding sequence that encodes a polyadenylation signal. Control sequences that direct gene products to particular locations include those that encode signal peptides, which direct the protein to which they are attached to a particular location inside or outside the cell.
[0067] Thus, an exemplary vector design for expression of a gene in a microbe contains a coding sequence for a desired gene product (for example, a selectable marker, or an enzyme) in operable linkage with a promoter active in yeast. Alternatively, if the vector does not contain a promoter in operable linkage with the coding sequence of interest, the coding sequence can be transformed into the cells such that it becomes operably linked to an endogenous promoter at the point of vector integration.
[0068] The promoter used to express a gene can be the promoter naturally linked to that gene or a different promoter.
[0069] A promoter can generally be characterized as constitutive or inducible. Constitutive promoters are generally active or function to drive expression at all times (or at certain times in the cell life cycle) at the same level. Inducible promoters, conversely, are active (or rendered inactive) or are significantly up- or down-regulated only in response to a stimulus. Both types of promoters find application in the methods of the invention. Inducible promoters useful in the invention include those that mediate transcription of an operably linked gene in response to a stimulus, such as an exogenously provided small molecule, temperature (heat or cold), lack of nitrogen in culture media, etc. Suitable promoters can activate transcription of an essentially silent gene or upregulate, e.g., substantially, transcription of an operably linked gene that is transcribed at a low level.
[0070] Inclusion of a termination region control sequence is optional, and if employed, then the choice is primarily one of convenience, as the termination region is relatively interchangeable. The termination region may be native to the transcriptional initiation region (the promoter), may be native to the DNA sequence of interest, or may be obtainable from another source (See, e.g., Chen & Orozco, Nucleic Acids Research 16:8411 (1988)).
2. Genes and Codon Optimization
[0071] Typically, a gene includes a promoter, a coding sequence, and termination control sequences. When assembled by recombinant DNA technology, a gene may be termed an expression cassette and may be flanked by restriction sites for convenient insertion into a vector that is used to introduce the recombinant gene into a host cell. The expression cassette can be flanked by DNA sequences from the genome or other nucleic acid target to facilitate stable integration of the expression cassette into the genome by homologous recombination. Alternatively, the vector and its expression cassette may remain unintegrated (e.g., an episome), in which case, the vector typically includes an origin of replication, which is capable of providing for replication of the vector DNA.
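As a rough illustration of an expression cassette flanked by restriction sites, the following Python sketch models the three parts and performs a few basic sanity checks. The class and function names are hypothetical, the promoter, coding, and terminator strings are made-up placeholders rather than sequences from this disclosure, and only the forward strand is scanned for internal sites, which suffices for the palindromic EcoRI (GAATTC) and BamHI (GGATCC) sites used in the example.

```python
from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass(frozen=True)
class ExpressionCassette:
    """Promoter, coding sequence, and terminator, in 5'-to-3' order."""
    promoter: str
    coding_sequence: str
    terminator: str

    def validate(self) -> None:
        cds = self.coding_sequence.upper()
        if len(cds) % 3 != 0:
            raise ValueError("coding sequence length is not a multiple of 3")
        if not cds.startswith("ATG"):
            raise ValueError("coding sequence lacks an ATG start codon")
        if cds[-3:] not in STOP_CODONS:
            raise ValueError("coding sequence lacks a terminal stop codon")

    def sequence(self) -> str:
        return (self.promoter + self.coding_sequence + self.terminator).upper()


def flank_with_sites(cassette: ExpressionCassette, site_5: str, site_3: str) -> str:
    """Add restriction sites at both ends, refusing sites that also occur internally."""
    cassette.validate()
    core = cassette.sequence()
    for site in (site_5, site_3):
        if site in core:
            raise ValueError(f"site {site} also occurs inside the cassette")
    return site_5 + core + site_3


# Placeholder parts (assumptions for illustration only).
cassette = ExpressionCassette(
    promoter="TTGACATATAAT",
    coding_sequence="ATGGCTTAA",
    terminator="AATAAA",
)
insert = flank_with_sites(cassette, site_5="GAATTC", site_3="GGATCC")
```

Rejecting enzymes whose sites occur inside the cassette reflects the practical point that digestion would otherwise cut the cassette itself.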
[0072] A common gene present on a vector is a gene that codes for a protein, the expression of which allows the recombinant cell containing the protein to be differentiated from cells that do not express the protein. Such a gene, and its corresponding gene product, is called a selectable marker or selection marker. Any of a wide variety of selectable markers can be employed in a transgene construct useful for transforming the organisms of the invention.
[0073] For optimal expression of a recombinant protein, it is beneficial to employ coding sequences that produce mRNA with codons optimally used by the host cell to be transformed. Thus, proper expression of transgenes can require that the codon usage of the transgene matches the specific codon bias of the organism in which the transgene is being expressed. The precise mechanisms underlying this effect are many, but include the proper balancing of available aminoacylated tRNA pools with proteins being synthesized in the cell, coupled with more efficient translation of the transgenic messenger RNA (mRNA) when this need is met. When codon usage in the transgene is not optimized, available tRNA pools are not sufficient to allow for efficient translation of the transgenic mRNA, resulting in ribosomal stalling and termination and possible instability of the transgenic mRNA. Resources for codon-optimization of gene sequences are described in Puigbo et al. (Nucleic Acids Research 35:W126-31 (2007)), and principles underlying codon optimization strategies are described in Angov (Biotechnology Journal 6:650-69 (2011)). Public databases providing statistics for codon usage by different organisms are available, including at www.kazusa.or.jp/codon/ and other publicly available databases and resources.
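One simple codon-optimization strategy, replacing each codon with the synonymous codon most used by the host, can be sketched in Python as follows. This is an illustrative assumption, not the method of this disclosure or of the cited tools: the usage frequencies in HYPOTHETICAL_USAGE are invented for the example and do not describe any real organism, and the function names are hypothetical. Only the standard genetic code table is taken as given.

```python
from itertools import product
from typing import Mapping

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a coding sequence; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def optimize(dna: str, usage: Mapping[str, float]) -> str:
    """Swap each codon for the synonymous codon with the highest usage value."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    best = {}
    for codon, aa in CODON_TABLE.items():
        freq = usage.get(codon, 0.0)
        if aa not in best or freq > best[aa][1]:
            best[aa] = (codon, freq)
    out = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        preferred, freq = best[CODON_TABLE[codon]]
        # Keep the original codon when no usage data exist for its amino acid.
        out.append(preferred if freq > 0.0 else codon)
    return "".join(out)


# Invented frequencies for illustration only.
HYPOTHETICAL_USAGE = {"GCC": 0.45, "GCT": 0.30, "CTG": 0.40, "TTA": 0.05,
                      "AAG": 0.80, "AAA": 0.20}

original = "ATGGCAAAATTATAA"
optimized = optimize(original, HYPOTHETICAL_USAGE)
```

Because only synonymous codons are exchanged, translate(optimized) equals translate(original). Practical tools typically weigh further factors that this sketch ignores.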
D. Transformation
[0074] Cells can be transformed by any suitable technique including, e.g., biolistics, electroporation, glass bead transformation, and silicon carbide whisker transformation. Any convenient technique for introducing a transgene into a microorganism can be employed in the present invention. Transformation can be achieved by, for example, the method of D. M. Morrison (Methods in Enzymology 68:326 (1979)), the method of increasing the permeability of recipient cells to DNA with calcium chloride (Mandel & Higa, J. Molecular Biology, 53:159 (1970)), or the like.
[0075] Examples of expression of transgenes in oleaginous yeast (e.g., Yarrowia lipolytica) can be found in the literature (Bordes et al., J. Microbiological Methods, 70:493 (2007); Chen et al., Applied Microbiology & Biotechnology 48:232 (1997)). Examples of expression of exogenous genes in bacteria such as E. coli are well known (Green & Sambrook, Molecular Cloning: A Laboratory Manual, (4th ed., 2012)).
[0076] Vectors for transformation of microorganisms in accordance with the present invention can be prepared by known techniques familiar to those skilled in the art. In one embodiment, an exemplary vector design for expression of a gene in a microorganism contains a gene encoding an enzyme in operable linkage with a promoter active in the microorganism. Alternatively, if the vector does not contain a promoter in operable linkage with the gene of interest, the gene can be transformed into the cells such that it becomes operably linked to a native promoter at the point of vector integration. The vector can also contain a second gene that encodes a protein. Optionally, one or both gene(s) is/are followed by a 3' untranslated sequence containing a polyadenylation signal. Expression cassettes encoding the two genes can be physically linked in the vector or on separate vectors. Co-transformation of microbes can also be used, in which distinct vector molecules are simultaneously used to transform cells (Protist 155:381-93 (2004)). The transformed cells can be optionally selected based upon the ability to grow in the presence of the antibiotic or other selectable marker under conditions in which cells lacking the resistance cassette would not grow.
Exemplary Cells, Nucleic Acids, Compositions, and Methods
A. Transformed Cell
[0077] In some embodiments, the transformed cell is a prokaryotic cell, such as a bacterial cell. In some embodiments, the cell is a eukaryotic cell, such as a mammalian cell, a yeast cell, a filamentous fungal cell, a protist cell, an algae cell, an avian cell, a plant cell, or an insect cell. In some embodiments, the cell is a yeast. Those with skill in the art will recognize that many forms of filamentous fungi produce yeast-like growth, and the definition of yeast herein encompasses such cells. The cell may be selected from the group consisting of algae, bacteria, molds, fungi, plants, and yeasts. The cell may be a yeast, fungus, or yeast-like algae. The cell may be selected from thraustochytrids (Aurantiochytrium) and achlorophyllous unicellular algae (Prototheca).
[0078] The cell may be selected from the group consisting of Arxula, Aspergillus, Aurantiochytrium, Candida, Claviceps, Cryptococcus, Cunninghamella, Geotrichum, Hansenula, Kluyveromyces, Kodamaea, Leucosporidiella, Lipomyces, Mortierella, Ogataea, Pichia, Prototheca, Rhizopus, Rhodosporidium, Rhodotorula, Saccharomyces, Schizosaccharomyces, Tremella, Trichosporon, Wickerhamomyces, and Yarrowia. It is specifically contemplated that one or more of these cell types may be excluded from embodiments of this invention.
[0079] The cell may be selected from the group consisting of Arxula adeninivorans, Aspergillus niger, Aspergillus oryzae, Aspergillus terreus, Aurantiochytrium limacinum, Candida utilis, Claviceps purpurea, Cryptococcus albidus, Cryptococcus curvatus, Cryptococcus ramirezgomezianus, Cryptococcus terreus, Cryptococcus wieringae, Cunninghamella echinulata, Cunninghamella japonica, Geotrichum fermentans, Hansenula polymorpha, Kluyveromyces lactis, Kluyveromyces marxianus, Kodamaea ohmeri, Leucosporidiella creatinivora, Lipomyces lipofer, Lipomyces starkeyi, Lipomyces tetrasporus, Mortierella isabellina, Mortierella alpina, Ogataea polymorpha, Pichia ciferrii, Pichia guilliermondii, Pichia pastoris, Pichia stipitis, Prototheca zopfii, Rhizopus arrhizus, Rhodosporidium babjevae, Rhodosporidium toruloides, Rhodosporidium paludigenum, Rhodotorula glutinis, Rhodotorula mucilaginosa, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Tremella encephala, Trichosporon cutaneum, Trichosporon fermentans, Wickerhamomyces ciferrii, and Yarrowia lipolytica. It is specifically contemplated that one or more of these cell types may be excluded from embodiments of this invention.
[0080] The cell may be Saccharomyces cerevisiae, Yarrowia lipolytica, or Arxula adeninivorans.
[0081] In certain embodiments, the transformed cell comprises at least 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, or more lipid as measured by % dry cell weight, or any range derivable therein. In some embodiments, the transformed cell comprises C18 fatty acids at a concentration of at least 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, or higher as a percentage of total C16 and C18 fatty acids in the cell, or any range derivable therein.
[0082] In some embodiments, the transformed cell comprises oleic acid at a concentration of at least 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, or higher as a percentage of total C16 and C18 fatty acids in the cell, or any range derivable therein. In some embodiments, the transformed cell comprises a linear fatty acid with a chain length of 14-20 carbons with a methyl branch at the .DELTA.9, .DELTA.10, or .DELTA.11 position (e.g., 10-methylstearic acid) at a concentration of at least 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% by weight or higher as a percentage of total fatty acids in the cell, or any range derivable therein. In some embodiments, the fatty acid has a chain length of 14, 15, 16, 17, 18, 19, or 20 carbons, or any range derivable therein.
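The percentages recited above are ratios over different denominators (total C16 and C18 fatty acids versus total fatty acids), which the following Python sketch makes explicit. The profile values are invented for illustration and are not measurements from this disclosure, the function name and the shorthand keys are hypothetical, and counting only linear species as "C16" or "C18" (with the methyl-branched species tracked separately) is an assumption of the sketch.

```python
from typing import Iterable, Mapping


def percent_of(profile: Mapping[str, float],
               numerator: Iterable[str],
               denominator: Iterable[str]) -> float:
    """Return 100 * (sum of numerator species) / (sum of denominator species)."""
    num = sum(profile.get(name, 0.0) for name in numerator)
    den = sum(profile.get(name, 0.0) for name in denominator)
    if den <= 0:
        raise ValueError("denominator species sum to zero")
    return 100.0 * num / den


# Invented fatty acid masses (arbitrary units), for illustration only.
profile = {
    "C16:0": 10.0,         # palmitate
    "C16:1": 5.0,          # palmitoleate
    "C18:0": 5.0,          # stearate
    "C18:1": 50.0,         # oleate
    "C18:2": 10.0,         # linoleate
    "10-methyl-C18:0": 20.0,  # 10-methylstearate, tracked separately
}

C16 = ["C16:0", "C16:1"]
C18 = ["C18:0", "C18:1", "C18:2"]

c18_share = percent_of(profile, C18, C16 + C18)
oleate_share = percent_of(profile, ["C18:1"], C16 + C18)
branched_share = percent_of(profile, ["10-methyl-C18:0"], profile.keys())
```

With these invented values, C18 fatty acids are 81.25% and oleate is 62.5% of total C16 and C18 fatty acids, while the branched species is 20% of total fatty acids.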
[0083] A cell may be modified to increase its oleate content, which serves as a substrate for 10-methylstearate synthesis. Genetic modifications that increase oleate content are known (see, e.g., PCT Patent Application Publication No. WO16/094520, published Jun. 16, 2016, hereby incorporated by reference in its entirety). For example, a cell may comprise a .DELTA.12 desaturase knockdown or knockout, which favors the accumulation of oleate and disfavors the production of linoleate. A cell may comprise a recombinant .DELTA.9 desaturase gene, which favors the production of oleate and disfavors the accumulation of stearate. The recombinant .DELTA.9 desaturase gene may be, for example, the .DELTA.9 desaturase gene from Y. lipolytica, Arxula adeninivorans, or Puccinia graminis. A cell may comprise a recombinant elongase 1 gene, which favors the production of oleate and disfavors the accumulation of palmitate and palmitoleate. The recombinant elongase 1 gene may be the elongase 1 gene from Y. lipolytica. A cell may comprise a recombinant elongase 2 gene, which favors the production of oleate and disfavors the accumulation of palmitate and palmitoleate. The recombinant elongase 2 gene may be the elongase 2 gene from R. norvegicus.
[0084] A cell may be modified to increase its triacylglycerol content, thereby increasing its 10-methylstearate content. Genetic modifications that increase triacylglycerol content are known (see, e.g., PCT Patent Application Publication No. WO16/094520, published Jun. 16, 2016, hereby incorporated by reference in its entirety). A cell may comprise a recombinant diacylglycerol acyltransferase gene (e.g., DGAT1, DGAT2, or DGAT3), which favors the production of triacylglycerols and disfavors the accumulation of diacylglycerols. The recombinant diacylglycerol acyltransferase gene may be, for example, DGAT2 (encoding protein DGA1) from Y. lipolytica, DGAT1 (encoding protein DGA2) from C. purpurea, or DGAT2 (encoding protein DGA1) from R. toruloides. The cell may comprise a glycerol-3-phosphate acyltransferase gene (Sct1) knockdown or knockout, which may favor the accumulation of triacylglycerols, depending on the cell type. The cell may comprise a recombinant glycerol-3-phosphate acyltransferase gene (Sct1) such as the Sct1 gene from A. adeninivorans, which may favor the accumulation of triacylglycerols. The cell may comprise a triacylglycerol lipase gene (TGL) knockdown or knockout, which may favor the accumulation of triacylglycerols in the cell.
[0085] Various aspects of the invention relate to a transformed cell. The transformed cell may comprise a recombinant methyltransferase gene (e.g., a tmsB gene), a recombinant reductase gene (e.g., a tmsA gene), an exomethylene-substituted lipid, and/or a branched (methyl)lipid. A transformed cell may comprise a tmsC gene. A branched (methyl)lipid may be a carboxylic acid (e.g., 10-methylstearic acid, 10-methylpalmitic acid, 12-methyloleic acid, 13-methyloleic acid, 10-methyl-octadec-12-enoic acid), carboxylate (e.g., 10-methylstearate, 10-methylpalmitate, 12-methyloleate, 13-methyloleate, 10-methyl-octadec-12-enoate), ester (e.g., diacylglycerol, triacylglycerol, phospholipid), thioester (e.g., 10-methylstearyl CoA, 10-methylpalmityl CoA, 12-methyloleoyl CoA, 13-methyloleoyl CoA, 10-methyl-octadec-12-enoyl CoA), or amide. An exomethylene-substituted lipid may be a carboxylic acid (e.g., 10-methylenestearic acid, 10-methylenepalmitic acid, 12-methyleneoleic acid, 13-methyleneoleic acid, 10-methylene-octadec-12-enoic acid), carboxylate (e.g., 10-methylenestearate, 10-methylenepalmitate, 12-methyleneoleate, 13-methyleneoleate, 10-methylene-octadec-12-enoate), ester (e.g., diacylglycerol, triacylglycerol, phospholipid), thioester (e.g., 10-methylenestearyl CoA, 10-methylenepalmityl CoA, 12-methyleneoleoyl CoA, 13-methyleneoleoyl CoA, 10-methylene-octadec-12-enoyl CoA), or amide. It is specifically contemplated that one or more of the above lipids may be excluded from embodiments of this invention.
[0086] "Fatty acids" generally exist in a cell as a phospholipid or triacylglycerol, although they may also exist as a monoacylglycerol or diacylglycerol, for example, as a metabolic intermediate. Free fatty acids also exist in the cell in equilibrium between a relatively abundant carboxylate anion and a relatively scarce, neutrally-charged acid. A fatty acid may exist in a cell as a thioester, especially as a thioester with coenzyme A (CoA), during biosynthesis or oxidation. A fatty acid may exist in a cell as an amide, for example, when covalently bound to a protein to anchor the protein to a membrane.
[0087] A cell may comprise any one of the nucleic acids described herein, infra (see, e.g., Section B, below).
[0088] A branched (methyl)lipid may comprise a saturated branched aliphatic chain (e.g., 10-methylstearic acid, 10-methylpalmitic acid) or an unsaturated branched aliphatic chain (e.g., 12-methyloleic acid, 13-methyloleic acid, 10-methyl-octadec-12-enoic acid). The branched (methyl)lipid may comprise a saturated or unsaturated branched aliphatic chain comprising a branching methyl group.
[0089] An exomethylene-substituted lipid may comprise a branched aliphatic chain (e.g., 10-methylenestearic acid, 10-methylenepalmitic acid, 12-methyleneoleic acid, 13-methyleneoleic acid, 10-methylene-octadec-12-enoic acid). The aliphatic chain may be branched because the aliphatic chain is substituted with an exomethylene group.
[0090] A branched (methyl)lipid may be 10-methylstearate, or an acid (10-methylstearic acid), ester (e.g., diacylglycerol, triacylglycerol, phospholipid), thioester (e.g., 10-methylstearyl CoA), or amide (e.g., 10-methylstearyl amide) thereof. For example, the branched (methyl)lipid may be a diacylglycerol, triacylglycerol, or phospholipid, and the diacylglycerol, triacylglycerol, or phospholipid may comprise an ester of 10-methylstearate.
[0091] An exomethylene-substituted lipid may be 10-methylenestearate, or an acid (10-methylenestearic acid), ester (e.g., diacylglycerol, triacylglycerol, phospholipid), thioester (e.g., 10-methylenestearyl CoA), or amide (e.g., 10-methylenestearyl amide) thereof. For example, the exomethylene-substituted lipid may be a diacylglycerol, triacylglycerol, or phospholipid, and the diacylglycerol, triacylglycerol, or phospholipid may comprise an ester of 10-methylenestearate.
[0092] In some embodiments, about, at most about, or at least about 1% of the fatty acids of the cell may be 10-methylstearic acid as measured by % dry cell weight. About, at least about, or at most about 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% of the fatty acids of the cell may be 10-methylstearic acid as measured by % dry cell weight, or any range derivable therein.
[0093] In some embodiments, about, at least about, or at most about 1% of the fatty acids of the cell may be 10-methylenestearic acid as measured by % dry cell weight. About, at least about, or at most about 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% of the fatty acids of the cell may be 10-methylenestearic acid as measured by % dry cell weight, or any range derivable therein.
[0094] In some embodiments, about, at least about, or at most about 1% by weight of the fatty acids of the cell may be one or more of the branched (methyl)lipids described herein. About, at least about, or at most about 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% by weight of the fatty acids of the cell may be one or more of the branched (methyl)lipids described herein, or any range derivable therein.
[0095] In some embodiments, about, at least about, or at most about 1% by weight of the fatty acids of the cell may be one or more of the branched (methyl)lipids described herein (e.g., a linear fatty acid with a chain length of 14-20 carbons with a methyl branch at the .DELTA.9, .DELTA.10, or .DELTA.11 position). About, at least about, or at most about 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% of the fatty acids of the cell may be one or more of the branched (methyl)lipids described herein (e.g., a linear fatty acid with a chain length of 14-20 carbons with a methyl branch at the .DELTA.9, .DELTA.10, or .DELTA.11 position), or any range derivable therein.
[0096] In some embodiments, the cell may comprise about, at least about, or at most about 1% 10-methylstearic acid as measured by % dry cell weight. The cell may comprise about, at least about, or at most about 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, or 50% 10-methylstearic acid as measured by % dry cell weight, or any range derivable therein.
[0097] In some embodiments, the cell may comprise about, at least about, or at most about 1% 10-methylenestearic acid as measured by % dry cell weight. The cell may comprise about, at least about, or at most about 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, or 50% 10-methylenestearic acid as measured by % dry cell weight, or any range derivable therein.
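A content expressed as % dry cell weight can be related to a share of total fatty acids by simple arithmetic, as in the following Python sketch. The function name and input values are hypothetical, and the sketch assumes that both inputs are mass-based percentages, with the total fatty acid content itself expressed as % dry cell weight.

```python
def percent_dry_cell_weight(total_fatty_acid_pct_dcw: float,
                            share_of_fatty_acids_pct: float) -> float:
    """Convert a share of total fatty acids into % dry cell weight.

    total_fatty_acid_pct_dcw: total fatty acids as % of dry cell weight.
    share_of_fatty_acids_pct: one fatty acid as % of total fatty acids.
    """
    for value in (total_fatty_acid_pct_dcw, share_of_fatty_acids_pct):
        if not 0.0 <= value <= 100.0:
            raise ValueError("percentages must lie between 0 and 100")
    return total_fatty_acid_pct_dcw * share_of_fatty_acids_pct / 100.0


# Invented example: fatty acids are 40% of dry cell weight and the
# fatty acid of interest is 25% of total fatty acids.
example_pct_dcw = percent_dry_cell_weight(40.0, 25.0)
```

In this invented example the fatty acid of interest accounts for 10% of dry cell weight.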
[0098] An unmodified cell of the same type (e.g., species) as a cell of the invention may not comprise 10-methylstearate, or an acid (10-methylstearic acid), ester (e.g., diacylglycerol, triacylglycerol, phospholipid), thioester (e.g., 10-methylstearyl CoA), or amide (e.g., 10-methylstearyl amide) thereof (e.g., wherein the unmodified cell does not comprise a recombinant methyltransferase gene or a recombinant reductase gene). An unmodified cell of the same type (e.g., species) as a cell of the invention may not comprise 10-methylenestearate, or an acid (10-methylenestearic acid), ester (e.g., diacylglycerol, triacylglycerol, phospholipid), thioester (e.g., 10-methylenestearyl CoA), or amide (e.g., 10-methylenestearyl amide) thereof (e.g., wherein the unmodified cell does not comprise a recombinant methyltransferase gene or a recombinant reductase gene). In some embodiments, an unmodified cell of the same species as the cell does not comprise a branched (methyl)lipid and/or an exomethylene-substituted lipid. In some embodiments, an unmodified cell of the same species as the cell does not comprise one or more of the branched (methyl)lipids or exomethylene-substituted lipids described herein.
[0099] A cell may constitutively express the protein encoded by a recombinant methyltransferase gene. A cell may constitutively express the protein encoded by a recombinant reductase gene. A cell may constitutively express the protein encoded by a recombinant tmsC gene. A cell may constitutively express a methyltransferase protein. A cell may constitutively express a reductase protein. A cell may constitutively express a tmsC protein.
B. Nucleic Acids
[0100] Various aspects of the invention relate to a nucleic acid comprising a recombinant methyltransferase gene, a recombinant reductase gene, or both. The nucleic acid may be, for example, a plasmid. In some embodiments, a recombinant methyltransferase gene and/or a recombinant reductase gene is integrated into the genome of a cell, and thus, the nucleic acid may be a chromosome. In some embodiments, the invention relates to a cell comprising a recombinant methyltransferase gene, e.g., wherein the recombinant methyltransferase gene is present in a plasmid or chromosome. In some embodiments, the invention relates to a cell comprising a recombinant reductase gene, e.g., wherein the recombinant reductase gene is present in a plasmid or chromosome. A recombinant methyltransferase gene and a recombinant reductase gene may be present in a cell in the same nucleic acid (e.g., same plasmid or chromosome) or in different nucleic acids (e.g., different plasmids or chromosomes).
[0101] A nucleic acid may be inheritable to the progeny of a transformed cell. A gene such as a recombinant methyltransferase gene or recombinant reductase gene may be inheritable because it resides on a plasmid or chromosome. In certain embodiments, a gene may be inheritable because it is integrated into the genome of the transformed cell.
[0102] A gene may comprise conservative substitutions, deletions, and/or insertions while still encoding a protein that has activity. For example, codons may be optimized for a particular host cell, different codons may be substituted for convenience, such as to introduce a restriction site or to create optimal PCR primers, or codons may be substituted for another purpose. Similarly, the nucleotide sequence may be altered to create conservative amino acid substitutions, deletions, and/or insertions.
[0103] Proteins may comprise conservative substitutions, deletions, and/or insertions while still maintaining activity. Conservative substitution tables are well known in the art (Creighton, Proteins (2d. ed., 1992)).
[0104] Amino acid substitutions, deletions and/or insertions may readily be made using recombinant DNA manipulation techniques. Methods for the manipulation of DNA sequences to produce substitution, insertion or deletion variants of a protein are well known in the art. These methods include M13 mutagenesis, T7-Gen in vitro mutagenesis (USB, Cleveland, Ohio), Quick Change Site Directed mutagenesis (Stratagene, San Diego, Calif.), PCR-mediated site-directed mutagenesis, and other site-directed mutagenesis protocols.
[0105] To determine the percent identity of two amino acid sequences or two nucleic acid sequences, the sequences can be aligned for optimal comparison purposes (e.g., gaps can be introduced in one or both of a first and a second amino acid or nucleic acid sequence for optimal alignment and non-identical sequences can be disregarded for comparison purposes). The length of a reference sequence aligned for comparison purposes can be at least 95% of the length of the reference sequence. The amino acid residues or nucleotides at corresponding amino acid positions or nucleotide positions can then be compared. When a position in the first sequence is occupied by the same amino acid residue or nucleotide as the corresponding position in the second sequence, then the molecules are identical at that position (as used herein amino acid or nucleic acid "identity" is equivalent to amino acid or nucleic acid "homology"). The percent identity between the two sequences is a function of the number of identical positions shared by the sequences, taking into account the number of gaps, and the length of each gap, which need to be introduced for optimal alignment of the two sequences.
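The calculation described above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned (gaps written as "-") and that percent identity is taken over all alignment columns, including gapped columns; other denominators are also in common use. The function name and the toy sequences are hypothetical and are not taken from the sequence listing.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: the denominator is the total number of alignment columns,
    so every gap position lowers the result.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    # A column counts as identical only if both residues match and
    # neither of them is a gap character.
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * identical / len(aligned_a)


# Hypothetical toy alignment: 8 columns, one gap, one mismatch.
print(percent_identity("MKT-LLVA", "MKTALLIA"))
```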
[0106] The comparison of sequences and determination of percent identity between two sequences can be accomplished using a mathematical algorithm. Unless otherwise specified, when percent identity between two amino acid sequences is referred to herein, it refers to the percent identity as determined using the Needleman and Wunsch (J. Molecular Biology 48:444-453 (1970)) algorithm which has been incorporated into the GAP program in the GCG software package (available at www.gcg.com), using a Blosum 62 matrix, a gap weight of 10, and a length weight of 4. In some embodiments, the percent identity between two amino acid sequences is determined using the Needleman and Wunsch algorithm with a Blosum 62 matrix or a PAM250 matrix, and a gap weight of 16, 14, 12, 10, 8, 6, or 4 and a length weight of 1, 2, 3, 4, 5, or 6. Unless otherwise specified, when percent identity between two nucleotide sequences is referred to herein, it refers to percent identity as determined using the GAP program in the GCG software package (available at www.gcg.com), using a NWSgapdna.CMP matrix and a gap weight of 60 and a length weight of 4. In yet another embodiment, the percent identity between two nucleotide sequences can be determined using a gap weight of 40, 50, 60, 70, or 80 and a length weight of 1, 2, 3, 4, 5, or 6. In another embodiment, the percent identity between two amino acid or nucleotide sequences can be determined using the algorithm of E. Meyers and W. Miller (Computer Applications in the Biosciences 4:11-17 (1988)) which has been incorporated into the ALIGN program (version 2.0 or 2.0U), using a PAM120 weight residue table, a gap length penalty of 12 and a gap penalty of 4.
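For orientation, the global-alignment idea behind the Needleman and Wunsch algorithm can be sketched as follows. This is a simplified illustration and not the GAP program: it assumes a flat match/mismatch score in place of a Blosum 62 or PAM250 matrix, and a single linear gap penalty in place of separate gap and length weights. The function name and all parameter values are hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -2):
    """Global alignment by dynamic programming (simplified scoring).

    Returns (score, aligned_a, aligned_b). Assumptions: flat
    match/mismatch scores and a linear gap penalty.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align residues
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
    # Trace back from the bottom-right corner to recover one alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


# Hypothetical toy sequences that differ by one deleted residue.
print(needleman_wunsch("ACGT", "AGT"))
```

The aligned strings returned by such a routine are the input from which a percent identity value would then be counted; production tools additionally apply a substitution matrix and affine gap penalties.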
[0107] Exemplary computer programs which can be used to determine identity between two sequences include, but are not limited to, the suite of BLAST programs, e.g., BLASTN, MEGABLAST, BLASTX, TBLASTN, TBLASTX, and BLASTP, and Clustal programs, e.g., ClustalW, ClustalX, and Clustal Omega.
[0108] Sequence searches are typically carried out using the BLASTN program, when evaluating a given nucleic acid sequence relative to nucleic acid sequences in the GenBank DNA Sequences and other public databases. The BLASTX program is effective for searching nucleic acid sequences that have been translated in all reading frames against amino acid sequences in the GenBank Protein Sequences and other public databases.
[0109] An alignment of selected sequences in order to determine "% identity" between two or more sequences may be performed using, for example, the CLUSTAL-W program.
[0110] A "coding sequence" or "coding region" refers to a nucleic acid molecule having sequence information necessary to produce a protein product, such as an amino acid or polypeptide, when the sequence is expressed. The coding sequence may comprise and/or consist of untranslated sequences (including introns or 5' or 3' untranslated regions) within translated regions, or may lack such intervening untranslated sequences (e.g., as in cDNA).
[0111] The abbreviations used throughout the specification to refer to nucleic acids comprising and/or consisting of nucleotide sequences are the conventional one-letter abbreviations. Thus, when included in a nucleic acid, the naturally occurring encoding nucleotides are abbreviated as follows: adenine (A), guanine (G), cytosine (C), thymine (T) and uracil (U). Also, unless otherwise specified, the nucleic acid sequences presented herein are in the 5'→3' direction.
[0112] As used herein, the term "complementary" and derivatives thereof are used in reference to pairing of nucleic acids by the well-known rules that A pairs with T or U and C pairs with G. Complement can be "partial" or "complete". In partial complement, only some of the nucleic acid bases are matched according to the base pairing rules, while in complete or total complement, all the bases are matched according to the pairing rules. The degree of complement between the nucleic acid strands may have significant effects on the efficiency and strength of hybridization between nucleic acid strands, as is well known in the art. The efficiency and strength of said hybridization depends upon the detection method.
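The pairing rules above can be illustrated with a short sketch. It assumes two strands of equal length, both written 5' to 3', with the second strand read antiparallel to the first; the function names and example sequences are hypothetical.

```python
# Watson-Crick pairs named in the text: A with T or U, and C with G.
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"),
         ("G", "C"), ("C", "G")}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Complete DNA complement of seq, written 5' to 3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def fraction_paired(strand: str, other: str) -> float:
    """Fraction of positions that pair when `other` is read antiparallel.

    1.0 corresponds to complete complement; values between 0 and 1
    correspond to partial complement.
    """
    if len(strand) != len(other) or not strand:
        raise ValueError("strands must be non-empty and of equal length")
    antiparallel = other.upper()[::-1]
    paired = sum((x, y) in PAIRS for x, y in zip(strand.upper(), antiparallel))
    return paired / len(strand)


print(reverse_complement("ATGC"))       # complete complement of ATGC
print(fraction_paired("ATGC", "GCAT"))  # complete complement
print(fraction_paired("ATGC", "GCAA"))  # partial complement
```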
[0113] Any nucleic acid that is referred to herein as having a certain percent sequence identity to a sequence set forth in a SEQ ID NO includes nucleic acids that have the certain percent sequence identity to the complement of the sequence set forth in the SEQ ID NO.
[0114] i. Nucleic Acids Comprising a Recombinant Methyltransferase Gene
[0115] A methyltransferase gene (e.g., a recombinant methyltransferase gene) encodes a methyltransferase protein, which is an enzyme capable of transferring a carbon atom and one or more protons bound thereto from a substrate such as S-adenosyl methionine to a fatty acid such as oleic acid (e.g., wherein the fatty acid is present as a free fatty acid, carboxylate, phospholipid, diacylglycerol, or triacylglycerol). A methyltransferase gene (e.g., a recombinant methyltransferase gene) may comprise any one of the nucleotide sequences set forth in SEQ ID NO:3, SEQ ID NO:7, SEQ ID NO:11, SEQ ID NO:15, SEQ ID NO:19, SEQ ID NO:25, SEQ ID NO:29, SEQ ID NO:35, SEQ ID NO:41, SEQ ID NO:45, SEQ ID NO:49, SEQ ID NO:53, SEQ ID NO:59, SEQ ID NO:63, SEQ ID NO:69, SEQ ID NO:75, and SEQ ID NO:81. A methyltransferase gene (e.g., a recombinant methyltransferase gene) may be a 10-methylstearic B gene (tmsB) as described herein, or a biologically-active portion thereof (i.e., wherein the biologically-active portion thereof comprises methyltransferase activity).
[0116] A methyltransferase gene (e.g., a recombinant methyltransferase gene) may be derived from a gram-positive species of Actinobacteria, such as Mycobacteria, Corynebacteria, Nocardia, Streptomyces, or Rhodococcus. A methyltransferase gene (e.g., a recombinant methyltransferase gene) may be selected from the group consisting of Mycobacterium smegmatis gene tmsB, Agromyces subbeticus gene tmsB, Amycolicicoccus subflavus gene tmsB, Corynebacterium glutamicum gene tmsB, Corynebacterium glyciniphilum gene tmsB, Knoellia aerolata gene tmsB, Mycobacterium austroafricanum gene tmsB, Mycobacterium gilvum gene tmsB, Mycobacterium indicus pranii gene tmsB, Mycobacterium phlei gene tmsB, Mycobacterium tuberculosis gene tmsB, Mycobacterium vanbaalenii gene tmsB, Rhodococcus opacus gene tmsB, Streptomyces regensis gene tmsB, Thermobifida fusca gene tmsB, and Thermomonospora curvata gene tmsB. It is specifically contemplated that one or more of the above methyltransferase genes may be excluded from embodiments of this invention.
[0117] A recombinant methyltransferase gene may be recombinant because it is operably-linked to a promoter other than the naturally-occurring promoter of the methyltransferase gene. Such genes may be useful to drive transcription in a particular species of cell. A recombinant methyltransferase gene may be recombinant because it contains one or more nucleotide substitutions relative to a naturally-occurring methyltransferase gene. Such genes may be useful to increase the translation efficiency of the methyltransferase gene's mRNA transcript in a particular species of cell.
[0118] A nucleic acid may comprise a recombinant methyltransferase gene and a promoter, wherein the recombinant methyltransferase gene and promoter are operably-linked. The recombinant methyltransferase gene and promoter may be derived from different species. For example, the recombinant methyltransferase gene may encode the methyltransferase protein of a gram-positive species of Actinobacteria, and the recombinant methyltransferase gene may be operably-linked to a promoter that can drive transcription in another phylum of bacteria (e.g., a Proteobacterium, such as E. coli) or a eukaryote (e.g., an algae cell, yeast cell, or plant cell). The promoter may be a eukaryotic promoter. A cell may comprise the nucleic acid, and the promoter may be capable of driving transcription in the cell. A cell may comprise a recombinant methyltransferase gene, and the recombinant methyltransferase gene may be operably-linked to a promoter capable of driving transcription of the recombinant methyltransferase gene in the cell. The cell may be a species of yeast, and the promoter may be a yeast promoter. The cell may be a species of bacteria, and the promoter may be a bacterial promoter (e.g., wherein the bacterial promoter is not a promoter from Actinobacteria). The cell may be a species of algae, and the promoter may be an algae promoter. The cell may be a species of plant, and the promoter may be a plant promoter.
[0119] A recombinant methyltransferase gene may be operably-linked to a promoter that cannot drive transcription in the cell from which the recombinant methyltransferase gene originated. For example, the promoter may not be capable of binding an RNA polymerase of the cell from which a recombinant methyltransferase gene originated. In some embodiments, the promoter cannot bind a prokaryotic RNA polymerase and/or initiate transcription mediated by a prokaryotic RNA polymerase. In some embodiments, a recombinant methyltransferase gene is operably-linked to a promoter that cannot drive transcription in the cell from which the protein encoded by the gene originated. For example, the promoter may not be capable of binding an RNA polymerase of a cell that naturally expresses the methyltransferase enzyme encoded by a recombinant methyltransferase gene.
[0120] A promoter may be an inducible promoter or a constitutive promoter. A promoter may be any one of the promoters described in PCT Patent Application Publication No. WO 2016/014900, published Jan. 28, 2016 (hereby incorporated by reference in its entirety). WO 2016/014900 describes various promoters derived from yeast species Yarrowia lipolytica and Arxula adeninivorans, which may be particularly useful as promoters for driving the transcription of a recombinant gene in a yeast cell. A promoter may be a promoter from a gene encoding a Translation Elongation factor EF-1α; Glycerol-3-phosphate dehydrogenase; Triosephosphate isomerase 1; Fructose-1,6-bisphosphate aldolase; Phosphoglycerate mutase; Pyruvate kinase; Export protein EXP1; Ribosomal protein S7; Alcohol dehydrogenase; Phosphoglycerate kinase; Hexose Transporter; General amino acid permease; Serine protease; Isocitrate lyase; Acyl-CoA oxidase; ATP-sulfurylase; Hexokinase; 3-phosphoglycerate dehydrogenase; Pyruvate Dehydrogenase Alpha subunit; Pyruvate Dehydrogenase Beta subunit; Aconitase; Enolase; Actin; Multidrug resistance protein (ABC-transporter); Ubiquitin; GTPase; Plasma membrane Na+/Pi cotransporter; Pyruvate decarboxylase; Phytase; or Alpha-amylase, e.g., wherein the gene is a yeast gene, such as a gene from Yarrowia lipolytica or Arxula adeninivorans.
[0121] A recombinant methyltransferase gene may comprise a nucleotide sequence with at least about 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity with the nucleotide sequence set forth in SEQ ID NO:3, SEQ ID NO:7, SEQ ID NO:11, SEQ ID NO:15, SEQ ID NO:19, SEQ ID NO:25, SEQ ID NO:29, SEQ ID NO:35, SEQ ID NO:41, SEQ ID NO:45, SEQ ID NO:49, SEQ ID NO:53, SEQ ID NO:59, SEQ ID NO:63, SEQ ID NO:69, SEQ ID NO:75, or SEQ ID NO:81. A recombinant methyltransferase gene may comprise a nucleotide sequence with, with at least, or with at most 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity (or any range derivable therein) with 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, or 1300 contiguous base pairs (or any range derivable therein) starting at nucleotide position 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 
193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 
593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, 660, 661, 662, 663, 664, 665, 666, 667, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, 699, 700, 701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 740, 741, 742, 743, 744, 745, 746, 747, 748, 749, 750, 751, 752, 753, 754, 755, 756, 757, 758, 759, 760, 761, 762, 763, 764, 765, 766, 767, 768, 769, 770, 771, 772, 773, 774, 775, 776, 777, 778, 779, 780, 781, 782, 783, 784, 785, 786, 787, 788, 789, 790, 791, 792, 793, 794, 795, 796, 797, 798, 799, 800, 801, 802, 803, 804, 805, 806, 807, 808, 809, 810, 811, 812, 813, 814, 815, 816, 817, 818, 819, 820, 821, 822, 823, 824, 825, 826, 827, 828, 829, 830, 831, 832, 833, 834, 835, 836, 837, 838, 839, 840, 841, 842, 843, 844, 845, 846, 847, 848, 849, 850, 851, 852, 853, 854, 855, 856, 857, 858, 859, 860, 861, 862, 863, 864, 865, 866, 867, 868, 869, 870, 871, 872, 873, 874, 875, 876, 877, 878, 879, 880, 881, 882, 883, 884, 885, 886, 887, 888, 889, 890, 891, 892, 893, 894, 895, 896, 897, 898, 899, 900, 901, 902, 903, 904, 905, 906, 907, 908, 909, 910, 911, 912, 913, 914, 915, 916, 917, 918, 919, 920, 921, 922, 923, 924, 925, 926, 927, 928, 929, 930, 931, 932, 933, 934, 935, 936, 937, 938, 939, 940, 941, 942, 943, 944, 945, 946, 947, 948, 949, 950, 951, 952, 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, 972, 973, 974, 975, 976, 977, 978, 979, 980, 981, 982, 983, 984, 985, 986, 987, 988, 989, 990, 991, 992, 
993, 994, 995, 996, 997, 998, 999, 1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013, 1014, 1015, 1016, 1017, 1018, 1019, 1020, 1021, 1022, 1023, 1024, 1025, 1026, 1027, 1028, 1029, 1030, 1031, 1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039, 1040, 1041, 1042, 1043, 1044, 1045, 1046, 1047, 1048, 1049, 1050, 1051, 1052, 1053, 1054, 1055, 1056, 1057, 1058, 1059, 1060, 1061, 1062, 1063, 1064, 1065, 1066, 1067, 1068, 1069, 1070, 1071, 1072, 1073, 1074, 1075, 1076, 1077, 1078, 1079, 1080, 1081, 1082, 1083, 1084, 1085, 1086, 1087, 1088, 1089, 1090, 1091, 1092, 1093, 1094, 1095, 1096, 1097, 1098, 1099, 1100, 1101, 1102, 1103, 1104, 1105, 1106, 1107, 1108, 1109, 1110, 1111, 1112, 1113, 1114, 1115, 1116, 1117, 1118, 1119, 1120, 1121, 1122, 1123, 1124, 1125, 1126, 1127, 1128, 1129, 1130, 1131, 1132, 1133, 1134, 1135, 1136, 1137, 1138, 1139, 1140, 1141, 1142, 1143, 1144, 1145, 1146, 1147, 1148, 1149, 1150, 1151, 1152, 1153, 1154, 1155, 1156, 1157, 1158, 1159, 1160, 1161, 1162, 1163, 1164, 1165, 1166, 1167, 1168, 1169, 1170, 1171, 1172, 1173, 1174, 1175, 1176, 1177, 1178, 1179, 1180, 1181, 1182, 1183, 1184, 1185, 1186, 1187, 1188, 1189, 1190, 1191, 1192, 1193, 1194, 1195, 1196, 1197, 1198, 1199, or 1200 of the nucleotide sequence set forth in SEQ ID NO:3, SEQ ID NO:7, SEQ ID NO:11, SEQ ID NO:15, SEQ ID NO:19, SEQ ID NO:25, SEQ ID NO:29, SEQ ID NO:35, SEQ ID NO:41, SEQ ID NO:45, SEQ ID NO:49, SEQ ID NO:53, SEQ ID NO:59, SEQ ID NO:63, SEQ ID NO:69, SEQ ID NO:75, or SEQ ID NO:81. A recombinant methyltransferase may or may not have 100% sequence identity with any one of the nucleotide sequences set forth in SEQ ID NO:3, SEQ ID NO:7, SEQ ID NO:11, SEQ ID NO:15, SEQ ID NO:19, SEQ ID NO:25, SEQ ID NO:29, SEQ ID NO:35, SEQ ID NO:41, SEQ ID NO:45, SEQ ID NO:49, SEQ ID NO:53, SEQ ID NO:59, SEQ ID NO:63, SEQ ID NO:69, SEQ ID NO:75, or SEQ ID NO:81. 
A recombinant methyltransferase gene may or may not have 100% sequence identity with 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, or 1300 contiguous base pairs of the nucleotide sequence set forth in SEQ ID NO:3, SEQ ID NO:7, SEQ ID NO:11, SEQ ID NO:15, SEQ ID NO:19, SEQ ID NO:25, SEQ ID NO:29, SEQ ID NO:35, SEQ ID NO:41, SEQ ID NO:45, SEQ ID NO:49, SEQ ID NO:53, SEQ ID NO:59, SEQ ID NO:63, SEQ ID NO:69, SEQ ID NO:75, or SEQ ID NO:81. A recombinant methyltransferase gene may comprise a nucleotide sequence with, with at least, or with at most 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity with the nucleotide sequence set forth in SEQ ID NO:3, SEQ ID NO:7, SEQ ID NO:11, SEQ ID NO:15, SEQ ID NO:19, SEQ ID NO:25, SEQ ID NO:29, SEQ ID NO:35, SEQ ID NO:41, SEQ ID NO:45, SEQ ID NO:49, SEQ ID NO:53, SEQ ID NO:59, SEQ ID NO:63, SEQ ID NO:69, SEQ ID NO:75, or SEQ ID NO:81, and the recombinant methyltransferase gene may encode a methyltransferase protein with, with at least, or with at most 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity with the amino acid sequence set forth in SEQ ID NO:4, SEQ ID NO:8, SEQ ID NO:12, SEQ ID NO:16, SEQ ID NO:20, SEQ ID NO:26, SEQ ID NO:30, SEQ ID NO:36, SEQ ID NO:42, SEQ ID NO:46, SEQ ID NO:50, SEQ ID NO:54, SEQ ID NO:60, SEQ ID NO:64, SEQ ID NO:70, or SEQ ID NO:76. For example, SEQ ID NO:81 is a gene that is codon-optimized for expression in yeast. SEQ ID NO:81 has about 70% sequence identity (69.86% sequence identity) with SEQ ID NO:3, and the protein encoded by SEQ ID NO:81 has 100% sequence identity with the amino acid sequence set forth in SEQ ID NO:4. 
Thus, even though SEQ ID NO:81 and SEQ ID NO:3 have 69.86% sequence identity, the two nucleotide sequences encode the same amino acid sequence.
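The point that two nucleotide sequences of limited identity can encode the same amino acid sequence follows from the degeneracy of the genetic code, and can be illustrated with a small sketch. The two short sequences below are hypothetical toy examples and are not SEQ ID NO:3 or SEQ ID NO:81; the sketch assumes the standard genetic code and ungapped sequences of equal length.

```python
from itertools import product

# Standard genetic code, laid out with bases ordered T, C, A, G
# (first base slowest, third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate a coding sequence whose length is a multiple of three."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def nucleotide_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length, ungapped sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a.upper(), b.upper())) / len(a)


native = "ATGGCCCGC"     # hypothetical "naturally-occurring" fragment
optimized = "ATGGCTAGA"  # hypothetical recoded fragment, synonymous codons

print(translate(native), translate(optimized))
print(round(nucleotide_identity(native, optimized), 2))
```

In this toy case the two fragments differ at three of nine nucleotide positions, yet both translate to the same three-residue peptide.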
[0122] A recombinant methyltransferase gene may vary from a naturally-occurring methyltransferase gene because the recombinant methyltransferase gene may be codon-optimized for expression in a eukaryotic cell, such as a plant cell, algae cell, or yeast cell. A cell may comprise a recombinant methyltransferase gene, wherein the recombinant methyltransferase gene is codon-optimized for the cell.
[0123] Exactly, at least, or at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 
415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, or 500 codons of a recombinant methyltransferase gene may vary from a naturally-occurring methyltransferase gene or may be unchanged from a naturally-occurring methyltransferase gene. For example, a recombinant methyltransferase gene may comprise a nucleotide sequence with at least about 65% sequence identity with the naturally-occurring nucleotide sequence set forth in SEQ ID NO:3, SEQ ID NO:7, SEQ ID NO:11, SEQ ID NO:15, SEQ ID NO:19, SEQ ID NO:25, SEQ ID NO:29, SEQ ID NO:35, SEQ ID NO:41, SEQ ID NO:45, SEQ ID NO:49, SEQ ID NO:53, SEQ ID NO:59, SEQ ID NO:63, SEQ ID NO:69, or SEQ ID NO:75 (e.g., at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity), and at least 5 codons of the nucleotide sequence of the recombinant methyltransferase gene may vary from the naturally-occurring nucleotide sequence (e.g., at least 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, or 100 codons (or any range derivable therein)).
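As an illustration of how the number of varied codons might be counted, the following sketch compares two coding sequences codon by codon. It assumes both sequences are in the same reading frame, have the same length, and contain no insertions or deletions relative to one another; the function name and the example sequences are hypothetical.

```python
def count_codon_differences(native: str, recombinant: str) -> int:
    """Number of codon positions at which two in-frame sequences differ.

    Assumption: equal-length sequences in the same reading frame,
    with no insertions or deletions.
    """
    if len(native) != len(recombinant):
        raise ValueError("sequences must have the same length")
    if len(native) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    native, recombinant = native.upper(), recombinant.upper()
    return sum(
        native[i:i + 3] != recombinant[i:i + 3]
        for i in range(0, len(native), 3)
    )


# Hypothetical four-codon fragments; the two middle codons were recoded.
print(count_codon_differences("ATGGCCCGCTAA", "ATGGCTAGATAA"))
```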
[0124] A methyltransferase gene encodes a methyltransferase protein. A methyltransferase protein may be a protein expressed by a gram-positive species of Actinobacteria, such as Mycobacteria, Corynebacteria, Nocardia, Streptomyces, or Rhodococcus. A recombinant methyltransferase gene may encode a naturally-occurring methyltransferase protein even if the recombinant methyltransferase gene is not a naturally-occurring methyltransferase gene. For example, a recombinant methyltransferase gene may vary from a naturally-occurring methyltransferase gene because the recombinant methyltransferase gene is codon-optimized for expression in a specific cell. The codon-optimized, recombinant methyltransferase gene and the naturally-occurring methyltransferase gene may nevertheless encode the same naturally-occurring methyltransferase protein.
[0125] A recombinant methyltransferase gene may encode a methyltransferase protein selected from Mycobacterium smegmatis enzyme tmsB, Agromyces subbeticus enzyme tmsB, Amycolicicoccus subflavus enzyme tmsB, Corynebacterium glutamicum enzyme tmsB, Corynebacterium glyciniphilum enzyme tmsB, Knoellia aerolata enzyme tmsB, Mycobacterium austroafricanum enzyme tmsB, Mycobacterium gilvum enzyme tmsB, Mycobacterium indicus pranii enzyme tmsB, Mycobacterium phlei enzyme tmsB, Mycobacterium tuberculosis enzyme tmsB, Mycobacterium vanbaalenii enzyme tmsB, Rhodococcus opacus enzyme tmsB, Streptomyces regensis enzyme tmsB, Thermobifida fusca enzyme tmsB, and Thermomonospora curvata enzyme tmsB. It is specifically contemplated that one or more of the above methyltransferase proteins may be excluded from embodiments of this invention. A recombinant methyltransferase gene may encode a methyltransferase protein, and the methyltransferase protein may be substantially identical to any one of the foregoing enzymes, but the recombinant methyltransferase gene may vary from the naturally-occurring gene that encodes the enzyme. The recombinant methyltransferase gene may vary from the naturally-occurring gene because the recombinant methyltransferase gene may be codon-optimized for expression in a specific phylum, class, order, family, genus, species, or strain of cell.
[0126] The sequences of naturally-occurring methyltransferase proteins are set forth in SEQ ID NO:4, SEQ ID NO:8, SEQ ID NO:12, SEQ ID NO:16, SEQ ID NO:20, SEQ ID NO:26, SEQ ID NO:30, SEQ ID NO:36, SEQ ID NO:42, SEQ ID NO:46, SEQ ID NO:50, SEQ ID NO:54, SEQ ID NO:60, SEQ ID NO:64, SEQ ID NO:70, or SEQ ID NO:76. A recombinant methyltransferase gene may or may not encode a protein comprising 100% sequence identity with the amino acid sequence set forth in SEQ ID NO:4, SEQ ID NO:8, SEQ ID NO:12, SEQ ID NO:16, SEQ ID NO:20, SEQ ID NO:26, SEQ ID NO:30, SEQ ID NO:36, SEQ ID NO:42, SEQ ID NO:46, SEQ ID NO:50, SEQ ID NO:54, SEQ ID NO:60, SEQ ID NO:64, SEQ ID NO:70, or SEQ ID NO:76. For example, a recombinant methyltransferase gene may encode a protein having 100% sequence identity with a biologically-active portion of an amino acid sequence set forth in SEQ ID NO:4, SEQ ID NO:8, SEQ ID NO:12, SEQ ID NO:16, SEQ ID NO:20, SEQ ID NO:26, SEQ ID NO:30, SEQ ID NO:36, SEQ ID NO:42, SEQ ID NO:46, SEQ ID NO:50, SEQ ID NO:54, SEQ ID NO:60, SEQ ID NO:64, SEQ ID NO:70, or SEQ ID NO:76.
[0127] A recombinant methyltransferase gene may encode a methyltransferase protein having, having at least, or having at most 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity (or any range derivable therein) with the amino acid sequence set forth in SEQ ID NO:4, SEQ ID NO:8, SEQ ID NO:12, SEQ ID NO:16, SEQ ID NO:20, SEQ ID NO:26, SEQ ID NO:30, SEQ ID NO:36, SEQ ID NO:42, SEQ ID NO:46, SEQ ID NO:50, SEQ ID NO:54, SEQ ID NO:60, SEQ ID NO:64, SEQ ID NO:70, or SEQ ID NO:76, or a biologically-active portion thereof. A recombinant methyltransferase gene may encode a methyltransferase protein having at least about 0.1%, 0.5%, 1%, 2%, 3%, 4%, 5%, 10%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, 100%, 100.1%, 100.2%, 100.3%, 100.4%, 100.5%, 100.6%, 100.7%, 100.8%, 100.9%, 101%, 105%, 110%, 115%, 120%, 125%, 130%, 135%, 140%, 145%, 150%, 160%, 170%, 180%, 190%, 200%, 220%, 240%, 260%, 280%, 300%, 320%, 340%, 360%, 380%, or 400% methyltransferase activity (or any range derivable therein) relative to a protein comprising the amino acid sequence set forth in SEQ ID NO:4, SEQ ID NO:8, SEQ ID NO:12, SEQ ID NO:16, SEQ ID NO:20, SEQ ID NO:26, SEQ ID NO:30, SEQ ID NO:36, SEQ ID NO:42, SEQ ID NO:46, SEQ ID NO:50, SEQ ID NO:54, SEQ ID NO:60, SEQ ID NO:64, SEQ ID NO:70, or SEQ ID NO:76.
A recombinant methyltransferase gene may encode a protein having at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.9%, or 100% sequence identity with 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, or 500 contiguous amino acids starting at amino acid position 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 
326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, or 500 of SEQ ID NO:4, SEQ ID NO:8, SEQ ID NO:12, SEQ ID NO:16, SEQ ID NO:20, SEQ ID NO:26, SEQ ID NO:30, SEQ ID NO:36, SEQ ID NO:42, SEQ ID NO:46, SEQ ID NO:50, SEQ ID NO:54, SEQ ID NO:60, SEQ ID NO:64, SEQ ID NO:70, or SEQ ID NO:76.
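By way of illustration only, the window-based comparison described in the preceding paragraph may be sketched in Python as follows. The function names, the ungapped position-for-position comparison, the choice of denominator, and the toy sequences are assumptions made for this sketch and are not taken from the sequence listing; in practice, sequence identity would ordinarily be computed from an alignment produced by a dedicated tool.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over two pre-aligned, equal-length sequences.

    Assumption: columns in which either sequence has a gap ('-') are
    counted as mismatches, and the denominator is the aligned length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def window_identity(reference: str, candidate: str, start: int, length: int) -> float:
    """Identity of `candidate` to `length` contiguous residues of `reference`
    beginning at 1-based position `start` (ungapped comparison)."""
    ref_window = reference[start - 1 : start - 1 + length]
    cand_window = candidate[start - 1 : start - 1 + length]
    if len(ref_window) < length or len(cand_window) < length:
        raise ValueError("window extends past the end of a sequence")
    return percent_identity(ref_window, cand_window)


# Hypothetical 10-residue toy sequences (not from any SEQ ID NO).
toy_reference = "MKTAYIAKQR"
toy_candidate = "MKTAYLAKQR"  # differs from the reference at position 6 only

full_length = percent_identity(toy_reference, toy_candidate)
first_five = window_identity(toy_reference, toy_candidate, start=1, length=5)
from_two = window_identity(toy_reference, toy_candidate, start=2, length=5)
```

In this sketch the start position is 1-based, matching the "starting at amino acid position 1, 2, 3, ..." language above, and a window that runs past the end of either sequence is rejected rather than silently truncated.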
[0128] Substrates for the methyltransferase protein may include any fatty acid from 14 to 20 carbons long with an unsaturated double bond in the Δ9, Δ10, or Δ11 position. The methyltransferase protein may be capable of catalyzing the formation of a methylene substitution at the Δ9, Δ10, or Δ11 position of such a substrate.
[0129] In some embodiments, the recombinant methyltransferase gene encodes a methyltransferase protein that includes an S-adenosylmethionine-dependent methyltransferase domain. In some embodiments, the S-adenosylmethionine-dependent methyltransferase domain has, has at least, or has at most 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.9%, or 100% sequence identity to amino acids 192-291 of T. curvata TmsB (SEQ ID NO:76) or to a corresponding portion of TmsB from Mycobacterium smegmatis, Mycobacterium vanbaalenii, Amycolicicoccus subflavus, Corynebacterium glyciniphilum, Corynebacterium glutamicum, Rhodococcus opacus, Agromyces subbeticus, Knoellia aerolata, Mycobacterium gilvum, Mycobacterium sp. Indicus, or Thermobifida fusca, according to the alignment set forth in FIGS. 19A-D.
[0130] In some embodiments, the recombinant methyltransferase gene encodes a methyltransferase protein that has specific amino acids unchanged from the amino acid sequence set forth in SEQ ID NO:4, SEQ ID NO:8, SEQ ID NO:12, SEQ ID NO:16, SEQ ID NO:20, SEQ ID NO:26, SEQ ID NO:30, SEQ ID NO:36, SEQ ID NO:42, SEQ ID NO:46, SEQ ID NO:50, SEQ ID NO:54, SEQ ID NO:60, SEQ ID NO:64, SEQ ID NO:70, or SEQ ID NO:76. The unchanged amino acids can include 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, or 29 amino acids selected from D23, G24, A59, H128, F147, Y148, L180, L193, M203, G236, A241, R313, R318, E320, L359, L400, V196, G197, C198, G199, W200, G201, G202, T219, L220, Q246, D247, Y248, and D262 of T. curvata TmsB (SEQ ID NO:76) or corresponding amino acids in TmsB from Mycobacterium smegmatis, Mycobacterium vanbaalenii, Amycolicicoccus subflavus, Corynebacterium glyciniphilum, Corynebacterium glutamicum, Rhodococcus opacus, Agromyces subbeticus, Knoellia aerolata, Mycobacterium gilvum, Mycobacterium sp. Indicus, or Thermobifida fusca, according to the alignment set forth in FIGS. 19A-D.
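By way of illustration only, a check for which of the 29 listed residues are unchanged in a candidate protein may be sketched in Python as follows. The residue labels are copied from the paragraph above; the function names and the placeholder test sequence are hypothetical. The sketch assumes the candidate is already numbered like T. curvata TmsB (SEQ ID NO:76); for a homolog, positions would first have to be mapped through an alignment such as that of FIGS. 19A-D, which this sketch does not attempt.

```python
import re

# Residue labels as listed in the text, numbered against T. curvata TmsB.
CONSERVED = (
    "D23 G24 A59 H128 F147 Y148 L180 L193 M203 G236 A241 R313 R318 E320 "
    "L359 L400 V196 G197 C198 G199 W200 G201 G202 T219 L220 Q246 D247 "
    "Y248 D262"
).split()


def parse_label(label: str) -> tuple:
    """Split a label such as 'D23' into ('D', 23)."""
    match = re.fullmatch(r"([A-Z])(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognized residue label: {label!r}")
    return match.group(1), int(match.group(2))


def unchanged_residues(sequence: str, labels=CONSERVED) -> list:
    """Return the labels whose amino acid is found at the stated
    1-based position of `sequence`."""
    kept = []
    for label in labels:
        amino_acid, position = parse_label(label)
        if position <= len(sequence) and sequence[position - 1] == amino_acid:
            kept.append(label)
    return kept


def build_placeholder(length: int = 400) -> str:
    """Hypothetical test input: 'X' everywhere except the listed positions.
    This is NOT SEQ ID NO:76; it only exercises the check."""
    residues = ["X"] * length
    for label in CONSERVED:
        amino_acid, position = parse_label(label)
        residues[position - 1] = amino_acid
    return "".join(residues)


placeholder = build_placeholder()
# Replace the aspartate at position 23 to simulate a changed residue.
variant = placeholder[:22] + "A" + placeholder[23:]
```

Calling `len(unchanged_residues(sequence))` gives the count that would be compared against the "1, 2, 3, ... or 29" language above.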
[0131] ii. Nucleic Acids Comprising a Recombinant Reductase Gene
[0132] A reductase gene (e.g., a recombinant reductase gene) encodes a reductase protein, which is an enzyme capable of reducing, often in an NADPH-dependent manner, a double bond of a fatty acid (e.g., wherein the fatty acid is present as a free fatty acid, carboxylate, phospholipid, diacylglycerol, or triacylglycerol). A reductase gene (e.g., a recombinant reductase gene) may comprise any one of the nucleotide sequences set forth in SEQ ID NO:1, SEQ ID NO:5, SEQ ID NO:9, SEQ ID NO:13, SEQ ID NO:17, SEQ ID NO:23, SEQ ID NO:27, SEQ ID NO:33, SEQ ID NO:39, SEQ ID NO:43, SEQ ID NO:47, SEQ ID NO:51, SEQ ID NO:57, SEQ ID NO:61, SEQ ID NO:67, SEQ ID NO:73, and SEQ ID NO:80. A reductase gene (e.g., a recombinant reductase gene) may be a 10-methylstearic A gene (tmsA) as described herein, or a biologically-active portion thereof (i.e., wherein the biologically-active portion thereof comprises reductase activity).
[0133] A reductase gene (e.g., a recombinant reductase gene) may be derived from a gram-positive species of Actinobacteria, such as Mycobacteria, Corynebacteria, Nocardia, Streptomyces, or Rhodococcus. A reductase gene (e.g., a recombinant reductase gene) may be selected from the group consisting of Mycobacterium smegmatis gene tmsA, Agromyces subbeticus gene tmsA, Amycolicicoccus subflavus gene tmsA, Corynebacterium glutamicum gene tmsA, Corynebacterium glyciniphilum gene tmsA, Knoellia aerolata gene tmsA, Mycobacterium austroafricanum gene tmsA, Mycobacterium gilvum gene tmsA, Mycobacterium indicus pranii gene tmsA, Mycobacterium phlei gene tmsA, Mycobacterium tuberculosis gene tmsA, Mycobacterium vanbaalenii gene tmsA, Rhodococcus opacus gene tmsA, Streptomyces regensis gene tmsA, Thermobifida fusca gene tmsA, and Thermomonospora curvata gene tmsA. It is specifically contemplated that one or more of the above reductase genes may be excluded from embodiments of this invention.
[0134] A recombinant reductase gene may be recombinant because it is operably-linked to a promoter other than the naturally-occurring promoter of the reductase gene. Such genes may be useful to drive transcription in a particular species of cell. A recombinant reductase gene may be recombinant because it contains one or more nucleotide substitutions relative to a naturally-occurring reductase gene. Such genes may be useful to increase the translation efficiency of the reductase gene's mRNA transcript in a particular species of cell.
[0135] A nucleic acid may comprise a recombinant reductase gene and a promoter, wherein the recombinant reductase gene and promoter are operably-linked. The recombinant reductase gene and promoter may be derived from different species. For example, the recombinant reductase gene may encode the reductase protein of a gram-positive species of Actinobacteria, and the recombinant reductase gene may be operably-linked to a promoter that can drive transcription in another phylum of bacteria (e.g., a Proteobacterium, such as E. coli) or a eukaryote (e.g., an algae cell, yeast cell, or plant cell). The promoter may be a eukaryotic promoter. A cell may comprise the nucleic acid, and the promoter may be capable of driving transcription in the cell. A cell may comprise a recombinant reductase gene, and the recombinant reductase gene may be operably-linked to a promoter capable of driving transcription of the recombinant reductase gene in the cell. The cell may be a species of yeast, and the promoter may be a yeast promoter. The cell may be a species of bacteria, and the promoter may be a bacterial promoter (e.g., wherein the bacterial promoter is not a promoter from Actinobacteria). The cell may be a species of algae, and the promoter may be an algae promoter. The cell may be a species of plant, and the promoter may be a plant promoter.
[0136] A recombinant reductase gene may be operably-linked to a promoter that cannot drive transcription in the cell from which the recombinant reductase gene originated. For example, the promoter may not be capable of binding an RNA polymerase of the cell from which a recombinant reductase gene originated. In some embodiments, the promoter cannot bind a prokaryotic RNA polymerase and/or initiate transcription mediated by a prokaryotic RNA polymerase. In some embodiments, a recombinant reductase gene is operably-linked to a promoter that cannot drive transcription in the cell from which the protein encoded by the gene originated. For example, the promoter may not be capable of binding an RNA polymerase of a cell that naturally expresses the reductase enzyme encoded by a recombinant reductase gene.
[0137] A promoter may be an inducible promoter or a constitutive promoter. A promoter may be any one of the promoters described in PCT Patent Application Publication No. WO 2016/014900, published Jan. 28, 2016 (hereby incorporated by reference in its entirety). WO 2016/014900 describes various promoters derived from yeast species Yarrowia lipolytica and Arxula adeninivorans, which may be particularly useful as promoters for driving the transcription of a recombinant gene in a yeast cell. A promoter may be a promoter from a gene encoding a Translation Elongation factor EF-1α; Glycerol-3-phosphate dehydrogenase; Triosephosphate isomerase 1; Fructose-1,6-bisphosphate aldolase; Phosphoglycerate mutase; Pyruvate kinase; Export protein EXP1; Ribosomal protein S7; Alcohol dehydrogenase; Phosphoglycerate kinase; Hexose Transporter; General amino acid permease; Serine protease; Isocitrate lyase; Acyl-CoA oxidase; ATP-sulfurylase; Hexokinase; 3-phosphoglycerate dehydrogenase; Pyruvate Dehydrogenase Alpha subunit; Pyruvate Dehydrogenase Beta subunit; Aconitase; Enolase; Actin; Multidrug resistance protein (ABC-transporter); Ubiquitin; GTPase; Plasma membrane Na+/Pi cotransporter; Pyruvate decarboxylase; Phytase; or Alpha-amylase, e.g., wherein the gene is a yeast gene, such as a gene from Yarrowia lipolytica or Arxula adeninivorans.
[0138] A recombinant reductase gene may comprise a nucleotide sequence with, with at least, or with at most 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity with the nucleotide sequence set forth in SEQ ID NO:1, SEQ ID NO:5, SEQ ID NO:9, SEQ ID NO:13, SEQ ID NO:17, SEQ ID NO:23, SEQ ID NO:27, SEQ ID NO:33, SEQ ID NO:39, SEQ ID NO:43, SEQ ID NO:47, SEQ ID NO:51, SEQ ID NO:57, SEQ ID NO:61, SEQ ID NO:67, SEQ ID NO:73, or SEQ ID NO:80. A recombinant reductase gene may comprise a nucleotide sequence with, with at least, with at most 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity with 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, or 1300 contiguous base pairs starting at nucleotide position 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 
232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 
632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, 660, 661, 662, 663, 664, 665, 666, 667, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, 699, 700, 701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 740, 741, 742, 743, 744, 745, 746, 747, 748, 749, 750, 751, 752, 753, 754, 755, 756, 757, 758, 759, 760, 761, 762, 763, 764, 765, 766, 767, 768, 769, 770, 771, 772, 773, 774, 775, 776, 777, 778, 779, 780, 781, 782, 783, 784, 785, 786, 787, 788, 789, 790, 791, 792, 793, 794, 795, 796, 797, 798, 799, 800, 801, 802, 803, 804, 805, 806, 807, 808, 809, 810, 811, 812, 813, 814, 815, 816, 817, 818, 819, 820, 821, 822, 823, 824, 825, 826, 827, 828, 829, 830, 831, 832, 833, 834, 835, 836, 837, 838, 839, 840, 841, 842, 843, 844, 845, 846, 847, 848, 849, 850, 851, 852, 853, 854, 855, 856, 857, 858, 859, 860, 861, 862, 863, 864, 865, 866, 867, 868, 869, 870, 871, 872, 873, 874, 875, 876, 877, 878, 879, 880, 881, 882, 883, 884, 885, 886, 887, 888, 889, 890, 891, 892, 893, 894, 895, 896, 897, 898, 899, 900, 901, 902, 903, 904, 905, 906, 907, 908, 909, 910, 911, 912, 913, 914, 915, 916, 917, 918, 919, 920, 921, 922, 923, 924, 925, 926, 927, 928, 929, 930, 931, 932, 933, 934, 935, 936, 937, 938, 939, 940, 941, 942, 943, 944, 945, 946, 947, 948, 949, 950, 951, 952, 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, 972, 973, 974, 975, 976, 977, 978, 979, 980, 981, 982, 983, 984, 985, 986, 987, 988, 989, 990, 991, 992, 993, 994, 995, 996, 997, 998, 999, 1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013, 1014, 1015, 1016, 1017, 1018, 1019, 1020, 1021, 1022, 1023, 1024, 1025, 
1026, 1027, 1028, 1029, 1030, 1031, 1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039, 1040, 1041, 1042, 1043, 1044, 1045, 1046, 1047, 1048, 1049, 1050, 1051, 1052, 1053, 1054, 1055, 1056, 1057, 1058, 1059, 1060, 1061, 1062, 1063, 1064, 1065, 1066, 1067, 1068, 1069, 1070, 1071, 1072, 1073, 1074, 1075, 1076, 1077, 1078, 1079, 1080, 1081, 1082, 1083, 1084, 1085, 1086, 1087, 1088, 1089, 1090, 1091, 1092, 1093, 1094, 1095, 1096, 1097, 1098, 1099, 1100, 1101, 1102, 1103, 1104, 1105, 1106, 1107, 1108, 1109, 1110, 1111, 1112, 1113, 1114, 1115, 1116, 1117, 1118, 1119, 1120, 1121, 1122, 1123, 1124, 1125, 1126, 1127, 1128, 1129, 1130, 1131, 1132, 1133, 1134, 1135, 1136, 1137, 1138, 1139, 1140, 1141, 1142, 1143, 1144, 1145, 1146, 1147, 1148, 1149, 1150, 1151, 1152, 1153, 1154, 1155, 1156, 1157, 1158, 1159, 1160, 1161, 1162, 1163, 1164, 1165, 1166, 1167, 1168, 1169, 1170, 1171, 1172, 1173, 1174, 1175, 1176, 1177, 1178, 1179, 1180, 1181, 1182, 1183, 1184, 1185, 1186, 1187, 1188, 1189, 1190, 1191, 1192, 1193, 1194, 1195, 1196, 1197, 1198, 1199, or 1200 of the nucleotide sequence set forth in SEQ ID NO:1, SEQ ID NO:5, SEQ ID NO:9, SEQ ID NO:13, SEQ ID NO:17, SEQ ID NO:23, SEQ ID NO:27, SEQ ID NO:33, SEQ ID NO:39, SEQ ID NO:43, SEQ ID NO:47, SEQ ID NO:51, SEQ ID NO:57, SEQ ID NO:61, SEQ ID NO:67, SEQ ID NO:73, or SEQ ID NO:80. A recombinant reductase may or may not have 100% sequence identity with any one of the nucleotide sequences set forth in SEQ ID NO:1, SEQ ID NO:5, SEQ ID NO:9, SEQ ID NO:13, SEQ ID NO:17, SEQ ID NO:23, SEQ ID NO:27, SEQ ID NO:33, SEQ ID NO:39, SEQ ID NO:43, SEQ ID NO:47, SEQ ID NO:51, SEQ ID NO:57, SEQ ID NO:61, SEQ ID NO:67, SEQ ID NO:73, or SEQ ID NO:80. 
A recombinant reductase gene may or may not have 100% sequence identity with 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, or 1300 contiguous base pairs of the nucleotide sequence set forth in SEQ ID NO:1, SEQ ID NO:5, SEQ ID NO:9, SEQ ID NO:13, SEQ ID NO:17, SEQ ID NO:23, SEQ ID NO:27, SEQ ID NO:33, SEQ ID NO:39, SEQ ID NO:43, SEQ ID NO:47, SEQ ID NO:51, SEQ ID NO:57, SEQ ID NO:61, SEQ ID NO:67, SEQ ID NO:73, or SEQ ID NO:80. A recombinant reductase gene may comprise a nucleotide sequence with, with at least, or with at most 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity with the nucleotide sequence set forth in SEQ ID NO:1, SEQ ID NO:5, SEQ ID NO:9, SEQ ID NO:13, SEQ ID NO:17, SEQ ID NO:23, SEQ ID NO:27, SEQ ID NO:33, SEQ ID NO:39, SEQ ID NO:43, SEQ ID NO:47, SEQ ID NO:51, SEQ ID NO:57, SEQ ID NO:61, SEQ ID NO:67, SEQ ID NO:73, or SEQ ID NO:80, and the recombinant reductase gene may encode a reductase protein with at least about 95%, 96%, 97%, 98%, 99%, or 100% sequence identity with the amino acid sequence set forth in SEQ ID NO:2, SEQ ID NO:6, SEQ ID NO:10, SEQ ID NO:14, SEQ ID NO:18, SEQ ID NO:24, SEQ ID NO:28, SEQ ID NO:34, SEQ ID NO:40, SEQ ID NO:44, SEQ ID NO:48, SEQ ID NO:52, SEQ ID NO:58, SEQ ID NO:62, SEQ ID NO:68, or SEQ ID NO:74. For example, SEQ ID NO:80 is a gene that is codon-optimized for expression in yeast. SEQ ID NO:80 has about 70% sequence identity (70.09% sequence identity) with SEQ ID NO:1, and the protein encoded by SEQ ID NO:80 has at least about 99% sequence identity with the amino acid sequence set forth in SEQ ID NO:2. The protein encoded by SEQ ID NO:1 has 100% sequence identity with the amino acid sequence set forth in SEQ ID NO:2.
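By way of illustration only, the relationship described above, in which a codon-optimized gene has reduced nucleotide identity with the native gene while encoding the same protein, may be sketched in Python as follows. The two 15-nucleotide sequences are hypothetical toy inputs and are not SEQ ID NO:1 or SEQ ID NO:80; the function names and the ungapped, equal-length comparison are likewise assumptions of this sketch. The codon table is the standard genetic code.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(coding_sequence: str) -> str:
    """Translate a DNA coding sequence, stopping at the first stop codon."""
    if len(coding_sequence) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    protein = []
    for i in range(0, len(coding_sequence), 3):
        amino_acid = CODON_TABLE[coding_sequence[i : i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def nucleotide_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two equal-length sequences, position by position
    (assumption: no gaps, so no alignment step is needed)."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


# Hypothetical toy genes that differ only by synonymous codon choices.
toy_native = "ATGGCGCTGAAACGG"
toy_optimized = "ATGGCTTTGAAGAGA"

native_protein = translate(toy_native)
optimized_protein = translate(toy_optimized)
dna_identity = nucleotide_identity(toy_native, toy_optimized)
```

In this toy case 10 of 15 nucleotides match, yet both sequences translate to the same five-residue peptide, which mirrors on a small scale the relationship between nucleotide identity and encoded-protein identity stated above.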
[0139] A recombinant reductase gene may vary from a naturally-occurring reductase gene because the recombinant reductase gene may be codon-optimized for expression in a eukaryotic cell, such as a plant cell, algae cell, or yeast cell. A cell may comprise a recombinant reductase gene, wherein the recombinant reductase gene is codon-optimized for the cell.
[0140] Exactly, at least, or at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 
415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, or 500 codons of a recombinant reductase gene may vary from a naturally-occurring reductase gene or may be unchanged from a naturally-occurring reductase gene. For example, a recombinant reductase gene may comprise a nucleotide sequence with at least 65% sequence identity with the naturally-occurring nucleotide sequence set forth in SEQ ID NO:1, SEQ ID NO:5, SEQ ID NO:9, SEQ ID NO:13, SEQ ID NO:17, SEQ ID NO:23, SEQ ID NO:27, SEQ ID NO:33, SEQ ID NO:39, SEQ ID NO:43, SEQ ID NO:47, SEQ ID NO:51, SEQ ID NO:57, SEQ ID NO:61, SEQ ID NO:67, or SEQ ID NO:73 (e.g., at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity), and at least 5 codons of the nucleotide sequence of the recombinant reductase gene may vary from the naturally-occurring nucleotide sequence (e.g., at least 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, or 100 codons).
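By way of illustration only, counting how many codons of a recombinant gene vary from a naturally-occurring gene may be sketched in Python as follows. The function name and the toy sequences are hypothetical and are not drawn from the sequence listing; the sketch assumes both genes are in the same reading frame and of equal length, so that codons can be compared one-to-one.

```python
def differing_codons(native: str, recombinant: str) -> list:
    """Return the 1-based codon positions at which the two genes differ.

    Assumption: both sequences start in frame and have the same length,
    a multiple of three, so codon i of one corresponds to codon i of the other.
    """
    if len(native) != len(recombinant):
        raise ValueError("sequences must be the same length")
    if len(native) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    positions = []
    for i in range(0, len(native), 3):
        if native[i : i + 3] != recombinant[i : i + 3]:
            positions.append(i // 3 + 1)
    return positions


# Hypothetical five-codon toy genes.
toy_native = "ATGGCGCTGAAACGG"
toy_recombinant = "ATGGCTTTGAAGAGA"

changed = differing_codons(toy_native, toy_recombinant)
changed_count = len(changed)
unchanged_count = len(toy_native) // 3 - changed_count
```

The resulting counts correspond to the "codons ... may vary" and "may be unchanged" alternatives recited above; a threshold such as "at least 5 codons" would be tested against `changed_count`.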
[0141] A reductase gene encodes a reductase protein. A reductase protein may be a protein expressed by a gram-positive species of Actinobacteria, such as Mycobacteria, Corynebacteria, Nocardia, Streptomyces, or Rhodococcus. A recombinant reductase gene may encode a naturally-occurring reductase protein even if the recombinant reductase gene is not a naturally-occurring reductase gene. For example, a recombinant reductase gene may vary from a naturally-occurring reductase gene because the recombinant reductase gene is codon-optimized for expression in a specific cell. The codon-optimized, recombinant reductase gene and the naturally-occurring reductase gene may nevertheless encode the same naturally-occurring reductase protein.
[0142] A recombinant reductase gene may encode a reductase protein selected from Mycobacterium smegmatis enzyme tmsA, Agromyces subbeticus enzyme tmsA, Amycolicicoccus subflavus enzyme tmsA, Corynebacterium glutamicum enzyme tmsA, Corynebacterium glyciniphilum enzyme tmsA, Knoellia aerolata enzyme tmsA, Mycobacterium austroafricanum enzyme tmsA, Mycobacterium gilvum enzyme tmsA, Mycobacterium indicus pranii enzyme tmsA, Mycobacterium phlei enzyme tmsA, Mycobacterium tuberculosis enzyme tmsA, Mycobacterium vanbaalenii enzyme tmsA, Rhodococcus opacus enzyme tmsA, Streptomyces regensis enzyme tmsA, Thermobifida fusca enzyme tmsA, and Thermomonospora curvata enzyme tmsA. It is specifically contemplated that one or more of the above reductase proteins may be excluded from embodiments of this invention. A recombinant reductase gene may encode a reductase protein, and the reductase protein may be substantially identical to any one of the foregoing enzymes, but the recombinant reductase gene may vary from the naturally-occurring gene that encodes the enzyme. The recombinant reductase gene may vary from the naturally-occurring gene because the recombinant reductase gene may be codon-optimized for expression in a specific phylum, class, order, family, genus, species, or strain of cell.
[0143] The sequences of naturally-occurring reductase proteins are set forth in SEQ ID NO:2, SEQ ID NO:6, SEQ ID NO:10, SEQ ID NO:14, SEQ ID NO:18, SEQ ID NO:24, SEQ ID NO:28, SEQ ID NO:34, SEQ ID NO:40, SEQ ID NO:44, SEQ ID NO:48, SEQ ID NO:52, SEQ ID NO:58, SEQ ID NO:62, SEQ ID NO:68, or SEQ ID NO:74. A recombinant reductase gene may or may not encode a protein comprising 100% sequence identity with the amino acid sequence set forth in SEQ ID NO:2, SEQ ID NO:6, SEQ ID NO:10, SEQ ID NO:14, SEQ ID NO:18, SEQ ID NO:24, SEQ ID NO:28, SEQ ID NO:34, SEQ ID NO:40, SEQ ID NO:44, SEQ ID NO:48, SEQ ID NO:52, SEQ ID NO:58, SEQ ID NO:62, SEQ ID NO:68, or SEQ ID NO:74. For example, a recombinant reductase gene may encode a protein having 100% sequence identity with a biologically-active portion of an amino acid sequence set forth in SEQ ID NO:2, SEQ ID NO:6, SEQ ID NO:10, SEQ ID NO:14, SEQ ID NO:18, SEQ ID NO:24, SEQ ID NO:28, SEQ ID NO:34, SEQ ID NO:40, SEQ ID NO:44, SEQ ID NO:48, SEQ ID NO:52, SEQ ID NO:58, SEQ ID NO:62, SEQ ID NO:68, or SEQ ID NO:74.
[0144] A recombinant reductase gene may encode a reductase protein having, having at least, or having at most 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity with the amino acid sequence set forth in SEQ ID NO:2, SEQ ID NO:6, SEQ ID NO:10, SEQ ID NO:14, SEQ ID NO:18, SEQ ID NO:24, SEQ ID NO:28, SEQ ID NO:34, SEQ ID NO:40, SEQ ID NO:44, SEQ ID NO:48, SEQ ID NO:52, SEQ ID NO:58, SEQ ID NO:62, SEQ ID NO:68, or SEQ ID NO:74, or a biologically-active portion thereof. A recombinant reductase gene may encode a reductase protein having about, at least about, or at most about 0.1%, 0.5%, 1%, 2%, 3%, 4%, 5%, 10%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, 100%, 100.1%, 100.2%, 100.3%, 100.4%, 100.5%, 100.6%, 100.7%, 100.8%, 100.9%, 101%, 105%, 110%, 115%, 120%, 125%, 130%, 135%, 140%, 145%, 150%, 160%, 170%, 180%, 190%, 200%, 220%, 240%, 260%, 280%, 300%, 320%, 340%, 360%, 380%, or 400% reductase activity relative to a protein comprising the amino acid sequence set forth in SEQ ID NO:2, SEQ ID NO:6, SEQ ID NO:10, SEQ ID NO:14, SEQ ID NO:18, SEQ ID NO:24, SEQ ID NO:28, SEQ ID NO:34, SEQ ID NO:40, SEQ ID NO:44, SEQ ID NO:48, SEQ ID NO:52, SEQ ID NO:58, SEQ ID NO:62, SEQ ID NO:68, or SEQ ID NO:74. 
A recombinant reductase gene may encode a protein having, having at least, or having at most 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity with 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, or 500 contiguous amino acids starting at amino acid position 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 
303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, or 500 of the amino acid sequence set forth in SEQ ID NO:2, SEQ ID NO:6, SEQ ID NO:10, SEQ ID NO:14, SEQ ID NO:18, SEQ ID NO:24, SEQ ID NO:28, SEQ ID NO:34, SEQ ID NO:40, SEQ ID NO:44, SEQ ID NO:48, SEQ ID NO:52, SEQ ID NO:58, SEQ ID NO:62, SEQ ID NO:68, or SEQ ID NO:74.
[0145] Substrates for the reductase protein may include any fatty acid from 14 to 20 carbons long with a methylene substitution in the Δ9, Δ10, or Δ11 position. The fatty acid substrate may be 14, 15, 16, 17, 18, 19, or 20 carbons long, or any range derivable therein. The reductase protein may be capable of catalyzing the reduction of a methylene-substituted fatty acid substrate to a (methyl)lipid. The reductase protein, together with a methyltransferase protein, may be capable of catalyzing the production of a methylated branch from any fatty acid from 14 to 20 carbons long with an unsaturated double bond in the Δ9, Δ10, or Δ11 position.
[0146] In some embodiments, the recombinant reductase gene encodes a reductase protein that includes a flavin adenine dinucleotide (FAD) binding domain. In some embodiments, the FAD binding domain has at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.9%, or 100% sequence identity to amino acids 9-141 of T. curvata TmsA (SEQ ID NO:74) or to a corresponding portion of TmsA from Mycobacterium smegmatis, Mycobacterium vanbaalenii, Amycolicicoccus subflavus, Corynebacterium glyciniphilum, Corynebacterium glutamicum, Rhodococcus opacus, Agromyces subbeticus, Knoellia aerolata, Mycobacterium gilvum, Mycobacterium sp. Indicus, or Thermobifida fusca, according to the alignment set forth in FIGS. 20A-E.
[0147] In some embodiments, the recombinant reductase gene encodes a reductase protein that includes a FAD/FMN-containing dehydrogenase domain. In some embodiments, the FAD/FMN-containing dehydrogenase domain has, has at least, or has at most 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to amino acids 22-444 of T. curvata TmsA (SEQ ID NO:74) or to a corresponding portion of TmsA from Mycobacterium smegmatis, Mycobacterium vanbaalenii, Amycolicicoccus subflavus, Corynebacterium glyciniphilum, Corynebacterium glutamicum, Rhodococcus opacus, Agromyces subbeticus, Knoellia aerolata, Mycobacterium gilvum, Mycobacterium sp. Indicus, or Thermobifida fusca, according to the alignment set forth in FIGS. 20A-E.
[0148] In some embodiments, the recombinant reductase gene encodes a reductase protein that has specific amino acids unchanged from the amino acid sequence set forth in SEQ ID NO:2, SEQ ID NO:6, SEQ ID NO:10, SEQ ID NO:14, SEQ ID NO:18, SEQ ID NO:24, SEQ ID NO:28, SEQ ID NO:34, SEQ ID NO:40, SEQ ID NO:44, SEQ ID NO:48, SEQ ID NO:52, SEQ ID NO:58, SEQ ID NO:62, SEQ ID NO:68, or SEQ ID NO:74. The unchanged amino acids can include 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, or amino acids selected from R31, A33, S37, N38, L39, F40, R43, D52, V59, D63, G73, M74, T76, Y77, D79, L80, V81, L85, P91, V93, V94, Q96, L97, T99, I100, T101, A105, G108, G110, E112, S113, S115, F116, R117, N118, P121, H122, E123, V125, E127, G133, P154, N155, Y157, Y162, L166, E171, V173, V177, H181, V208, G213, F216, Y222, L223, S236, D237, Y238, T239, Y245, S247, D254, T257, Y261, W263, R264, W265, D266, D268, W269, C272, A275, G277, Q279, R284, W287, R293, S294, G318, E232, V325, P328, E330, F339, F343, W353, C355, P356, W363, L365, Y366, P367, N376, F379, W380, V383, P384, N395, E399, G407, H408, K409, S410, L411, Y412, S413, Y417, F422, Y426, G428, R443, L447, and V452 of T. curvata TmsA (SEQ ID NO:74) or corresponding amino acids in TmsA from Mycobacterium smegmatis, Mycobacterium vanbaalenii, Amycolicicoccus subflavus, Corynebacterium glyciniphilum, Corynebacterium glutamicum, Rhodococcus opacus, Agromyces subbeticus, Knoellia aerolata, Mycobacterium gilvum, Mycobacterium sp. Indicus, or Thermobifida fusca, according to the alignment set forth in FIGS. 20A-E.
[0149] iii. Nucleic Acids Comprising a Recombinant tmsC Gene.
[0150] A nucleic acid may comprise a 10-methylstearic C gene (tmsC), as described herein. A tmsC gene (e.g., a recombinant tmsC gene) may comprise any one of the nucleotide sequences set forth in SEQ ID NO:21, SEQ ID NO:31, SEQ ID NO:37, SEQ ID NO:55, SEQ ID NO:65, and SEQ ID NO:71. A tmsC gene (e.g., a recombinant tmsC gene) may be derived from a gram-positive species of Actinobacteria, such as Mycobacteria, Corynebacteria, Nocardia, Streptomyces, or Rhodococcus. A tmsC gene (e.g., a recombinant tmsC gene) may be selected from the group consisting of Corynebacterium glyciniphilum gene tmsC, Mycobacterium austroafricanum gene tmsC, Mycobacterium gilvum gene tmsC, Mycobacterium vanbaalenii gene tmsC, Streptomyces regensis gene tmsC, and Thermobifida fusca gene tmsC.
[0151] A recombinant tmsC gene may be recombinant because it is operably-linked to a promoter other than the naturally-occurring promoter of the tmsC gene. Such genes may be useful to drive transcription in a particular species of cell. A recombinant tmsC gene may be recombinant because it contains one or more nucleotide substitutions relative to a naturally-occurring tmsC gene. Such genes may be useful to increase the translation efficiency of the tmsC gene's mRNA transcript in a particular species of cell.
[0152] A nucleic acid may comprise a recombinant tmsC gene and a promoter, wherein the recombinant tmsC gene and promoter are operably-linked. The recombinant tmsC gene and promoter may be derived from different species. For example, the recombinant tmsC gene may encode the tmsC protein of a gram-positive species of Actinobacteria, and the recombinant tmsC gene may be operably-linked to a promoter that can drive transcription in another phylum of bacteria (e.g., a Proteobacterium, such as E. coli) or a eukaryote (e.g., an algae cell, yeast cell, or plant cell). The promoter may be a eukaryotic promoter. A cell may comprise the nucleic acid, and the promoter may be capable of driving transcription in the cell. A cell may comprise a recombinant tmsC gene, and the recombinant tmsC gene may be operably-linked to a promoter capable of driving transcription of the recombinant tmsC gene in the cell. The cell may be a species of yeast, and the promoter may be a yeast promoter. The cell may be a species of bacteria, and the promoter may be a bacterial promoter (e.g., wherein the bacterial promoter is not a promoter from Actinobacteria). The cell may be a species of algae, and the promoter may be an algae promoter. The cell may be a species of plant, and the promoter may be a plant promoter.
[0153] A recombinant tmsC gene may be operably-linked to a promoter that cannot drive transcription in the cell from which the recombinant tmsC gene originated. For example, the promoter may not be capable of binding an RNA polymerase of the cell from which a recombinant tmsC gene originated. In some embodiments, the promoter cannot bind a prokaryotic RNA polymerase and/or initiate transcription mediated by a prokaryotic RNA polymerase. In some embodiments, a recombinant tmsC gene is operably-linked to a promoter that cannot drive transcription in the cell from which the protein encoded by the gene originated. For example, the promoter may not be capable of binding an RNA polymerase of a cell that naturally expresses the tmsC enzyme encoded by a recombinant tmsC gene.
[0154] A promoter may be an inducible promoter or a constitutive promoter. A promoter may be any one of the promoters described in PCT Patent Application Publication No. WO 2016/014900, published Jan. 28, 2016 (hereby incorporated by reference in its entirety). WO 2016/014900 describes various promoters derived from yeast species Yarrowia lipolytica and Arxula adeninivorans, which may be particularly useful as promoters for driving the transcription of a recombinant gene in a yeast cell. A promoter may be a promoter from a gene encoding a Translation Elongation factor EF-1α; Glycerol-3-phosphate dehydrogenase; Triosephosphate isomerase 1; Fructose-1,6-bisphosphate aldolase; Phosphoglycerate mutase; Pyruvate kinase; Export protein EXP1; Ribosomal protein S7; Alcohol dehydrogenase; Phosphoglycerate kinase; Hexose Transporter; General amino acid permease; Serine protease; Isocitrate lyase; Acyl-CoA oxidase; ATP-sulfurylase; Hexokinase; 3-phosphoglycerate dehydrogenase; Pyruvate Dehydrogenase Alpha subunit; Pyruvate Dehydrogenase Beta subunit; Aconitase; Enolase; Actin; Multidrug resistance protein (ABC-transporter); Ubiquitin; GTPase; Plasma membrane Na+/Pi cotransporter; Pyruvate decarboxylase; Phytase; or Alpha-amylase, e.g., wherein the gene is a yeast gene, such as a gene from Yarrowia lipolytica or Arxula adeninivorans.
[0156] A recombinant tmsC gene may comprise a nucleotide sequence with, with at least, or with at most 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity with the nucleotide sequence set forth in SEQ ID NO:21, SEQ ID NO:31, SEQ ID NO:37, SEQ ID NO:55, SEQ ID NO:65, or SEQ ID NO:71. A recombinant tmsC may or may not have 100% sequence identity with any one of the nucleotide sequences set forth in SEQ ID NO:21, SEQ ID NO:31, SEQ ID NO:37, SEQ ID NO:55, SEQ ID NO:65, and SEQ ID NO:71. A recombinant tmsC gene may comprise a nucleotide sequence with, with at least, or with at most 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity with the nucleotide sequence set forth in SEQ ID NO:21, SEQ ID NO:31, SEQ ID NO:37, SEQ ID NO:55, SEQ ID NO:65, and SEQ ID NO:71, and the recombinant tmsC gene may encode a tmsC protein with at least about 95%, 96%, 97%, 98%, 99%, or 100% sequence identity with the amino acid sequence set forth in SEQ ID NO:22, SEQ ID NO:32, SEQ ID NO:38, SEQ ID NO:56, SEQ ID NO:66, and SEQ ID NO:72.
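The percent sequence identity thresholds recited above can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned to equal length by some external tool, and it assumes one particular convention for the denominator (all columns that are not gap-in-both); other conventions exist, and the text does not specify one. The function name and the toy sequences are hypothetical and are not taken from any SEQ ID NO.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of two pre-aligned, equal-length sequences.

    Assumption: columns where both sequences carry a gap ('-') are ignored,
    and every remaining column counts toward the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A column matches only when both residues are identical and not a gap.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Hypothetical 20-nucleotide toy sequences differing at two positions.
reference = "ATGGCTAAAGGTCTGACCGA"
candidate = "ATGGCAAAAGGTCTGACGGA"
identity = percent_identity(reference, candidate)
meets_65_percent = identity >= 65.0
```

Under these assumptions, a candidate sequence would be compared against each reference sequence in turn and tested against whichever threshold (for example, at least 65%) is of interest.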
[0157] A recombinant tmsC gene may vary from a naturally-occurring tmsC gene because the recombinant tmsC gene may be codon-optimized for expression in a eukaryotic cell, such as a plant cell, algae cell, or yeast cell. A cell may comprise a recombinant tmsC gene, wherein the recombinant tmsC gene is codon-optimized for the cell.
[0158] Exactly, at least, or at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 
415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, or 500 codons of a recombinant tmsC gene may vary from a naturally-occurring tmsC gene or may remain unchanged from a naturally-occurring tmsC gene. For example, a recombinant tmsC gene may comprise a nucleotide sequence with at least about 65% sequence identity with the naturally-occurring nucleotide sequence set forth in SEQ ID NO:21, SEQ ID NO:31, SEQ ID NO:37, SEQ ID NO:55, SEQ ID NO:65, or SEQ ID NO:71 (e.g., at least about 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity), and at least 5 codons of the nucleotide sequence of the recombinant tmsC gene may vary from the naturally-occurring nucleotide sequence (e.g., at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, or 100 codons).
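The notion of a gene whose codons vary from the naturally-occurring gene while the encoded protein stays the same can be sketched in Python as follows. This is a minimal illustration that assumes the standard genetic code and two in-frame coding sequences of equal length; the helper names and the short toy sequences are hypothetical and do not correspond to any SEQ ID NO.

```python
from itertools import product

# Standard genetic code, built in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons(cds: str) -> list[str]:
    """Split an in-frame coding sequence into codons."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds: str) -> str:
    """Translate a coding sequence; '*' marks a stop codon."""
    return "".join(CODON_TABLE[c] for c in codons(cds))


def count_changed_codons(native: str, recoded: str) -> int:
    """Number of codon positions at which the two sequences differ."""
    native_codons, recoded_codons = codons(native), codons(recoded)
    if len(native_codons) != len(recoded_codons):
        raise ValueError("sequences must contain the same number of codons")
    return sum(1 for a, b in zip(native_codons, recoded_codons) if a != b)


# Hypothetical toy sequences: both encode Met-Ala-Leu-Ser followed by a stop.
native_cds = "ATGGCGCTGAGCTAA"
recoded_cds = "ATGGCTTTGTCTTAA"
changed = count_changed_codons(native_cds, recoded_cds)
same_protein = translate(native_cds) == translate(recoded_cds)
```

In this toy case three of the five codons differ, yet both sequences translate to the same peptide, which is the relationship the paragraph describes between a codon-optimized gene and its naturally-occurring counterpart.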
[0159] A tmsC gene encodes a tmsC protein. A tmsC protein may be a protein expressed by a gram-positive species of Actinobacteria, such as Mycobacteria, Corynebacteria, Nocardia, Streptomyces, or Rhodococcus. A recombinant tmsC gene may encode a naturally-occurring tmsC protein even if the recombinant tmsC gene is not a naturally-occurring tmsC gene. For example, a recombinant tmsC gene may vary from a naturally-occurring tmsC gene because the recombinant tmsC gene is codon-optimized for expression in a specific cell. The codon-optimized, recombinant tmsC gene and the naturally-occurring tmsC gene may nevertheless encode the same naturally-occurring tmsC protein.
[0160] A recombinant tmsC gene may encode a tmsC protein selected from Corynebacterium glyciniphilum enzyme tmsC, Mycobacterium austroafricanum enzyme tmsC, Mycobacterium gilvum enzyme tmsC, Mycobacterium vanbaalenii enzyme tmsC, Streptomyces regensis enzyme tmsC, and Thermobifida fusca enzyme tmsC. A recombinant tmsC gene may encode a tmsC protein, and the tmsC protein may be substantially identical to any one of the foregoing enzymes, but the recombinant tmsC gene may vary from the naturally-occurring gene that encodes the enzyme. The recombinant tmsC gene may vary from the naturally-occurring gene because the recombinant tmsC gene may be codon-optimized for expression in a specific phylum, class, order, family, genus, species, or strain of cell.
[0161] The sequences of naturally-occurring tmsC proteins are set forth in SEQ ID NO:22, SEQ ID NO:32, SEQ ID NO:38, SEQ ID NO:56, SEQ ID NO:66, and SEQ ID NO:72. A recombinant tmsC gene may or may not encode a protein comprising 100% sequence identity with the amino acid sequence set forth in SEQ ID NO:22, SEQ ID NO:32, SEQ ID NO:38, SEQ ID NO:56, SEQ ID NO:66, or SEQ ID NO:72. For example, a recombinant tmsC gene may encode a protein having 100% sequence identity with a biologically-active portion of an amino acid sequence set forth in SEQ ID NO:22, SEQ ID NO:32, SEQ ID NO:38, SEQ ID NO:56, SEQ ID NO:66, or SEQ ID NO:72. A recombinant tmsC gene may encode a tmsC protein having at least about 95%, 96%, 97%, 98%, or 99% sequence identity with the amino acid sequence set forth in SEQ ID NO:22, SEQ ID NO:32, SEQ ID NO:38, SEQ ID NO:56, SEQ ID NO:66, or SEQ ID NO:72, or a biologically-active portion thereof.
[0162] iv. Nucleic acids comprising a recombinant methyltransferase gene and a recombinant reductase gene
[0163] A nucleic acid may comprise both a recombinant methyltransferase gene and a recombinant reductase gene. The recombinant methyltransferase gene and the recombinant reductase gene may encode proteins from the same species or from different species. A nucleic acid may comprise a recombinant methyltransferase gene, a recombinant reductase gene, and/or a tmsC gene. A recombinant methyltransferase gene, recombinant reductase gene, and a tmsC gene may encode proteins from 1, 2, or 3 different species (i.e., the genes may each be from the same species, two genes may be from the same species, or all three genes may be from different species). A nucleic acid may comprise the nucleotide sequence set forth in SEQ ID NO:77, SEQ ID NO:78, or SEQ ID NO:79. A nucleic acid may comprise a nucleotide sequence with, with at least, or with at most 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity with the nucleotide sequence set forth in SEQ ID NO:77, SEQ ID NO:78, SEQ ID NO:79, SEQ ID NO:83, SEQ ID NO:84, SEQ ID NO:85, SEQ ID NO:86, SEQ ID NO:87, SEQ ID NO:88, SEQ ID NO:89, SEQ ID NO:90, SEQ ID NO:91, or SEQ ID NO:92.
[0164] In some embodiments, the nucleic acid encodes a fusion protein that includes both a methyltransferase and a reductase or fragments thereof. In the context of the present invention, "fusion protein" means a single protein molecule containing two or more distinct proteins or fragments thereof, covalently linked via peptide bond in a single peptide chain. In some embodiments, the fusion protein comprises enzymatically active domains from both a methyltransferase protein and a reductase protein. The nucleic acid may further encode a linker peptide between the methyltransferase and the reductase. In some embodiments, the linker peptide comprises the amino acid sequence AGGAEGGNGGGA. The linker may comprise about or at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, or 30 amino acids, or any range derivable therein. The nucleic acid may comprise any of the methyltransferase and reductase genes described herein, and the fusion protein encoded by the nucleic acid can comprise any of the methyltransferase and reductase proteins described herein, including biologically active fragments thereof. In some embodiments, the fusion protein is a tmsA-B protein, in which the tmsA protein is closer to the N-terminus than the tmsB protein. An example of such a tmsA-B protein is encoded by the nucleic acid sequence of SEQ ID NO:97. In some embodiments, the fusion protein is a tmsB-A protein, in which the tmsB protein is closer to the N-terminus than the tmsA protein. An example of such a tmsB-A protein is encoded by the nucleic acid sequence of SEQ ID NO:98. In some embodiments, the fusion protein has at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 99.9% identity to the amino acid sequence of a fusion protein encoded by SEQ ID NO:97 or SEQ ID NO:98.
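The two fusion orientations described above (tmsA-B and tmsB-A) can be illustrated at the amino acid level with a short Python sketch. Only the linker sequence AGGAEGGNGGGA comes from the text. The function name and the placeholder partner sequences are hypothetical, and the sketch assumes simple end-to-end joining with the full C-terminal partner retained; the actual junctions in SEQ ID NO:97 and SEQ ID NO:98 are not reproduced here.

```python
LINKER = "AGGAEGGNGGGA"  # linker peptide sequence given in the text


def build_fusion(n_terminal: str, c_terminal: str, linker: str = LINKER) -> str:
    """Join two protein sequences through a linker in a single peptide chain.

    Assumption: any trailing '*' stop marker on the N-terminal partner is
    dropped so that translation would continue into the linker.
    """
    return n_terminal.rstrip("*") + linker + c_terminal


# Hypothetical placeholder sequences, NOT real TmsA or TmsB sequences.
tmsa_placeholder = "MTSAAAVK*"
tmsb_placeholder = "MKQLLLDE*"

tmsa_b = build_fusion(tmsa_placeholder, tmsb_placeholder)  # TmsA nearer the N-terminus
tmsb_a = build_fusion(tmsb_placeholder, tmsa_placeholder)  # TmsB nearer the N-terminus
```

The same helper covers both orientations by swapping its arguments, and a different linker of 2 to 30 residues could be supplied through the `linker` parameter.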
C. Compositions
[0166] Various aspects of the invention relate to compositions produced by the cells described herein. The composition may be an oil composition comprised of about or at least about 75%, 80%, 85%, 90%, 95%, or 99% lipids. The composition may comprise branched (methyl)lipids and/or exomethylene-substituted lipids. The branched (methyl)lipid may be a carboxylic acid (e.g., 10-methylstearic acid, 10-methylpalmitic acid, 12-methyloleic acid, 13-methyloleic acid, 10-methyl-octadec-12-enoic acid), carboxylate (e.g., 10-methylstearate, 10-methylpalmitate, 12-methyloleate, 13-methyloleate, 10-methyl-octadec-12-enoate), ester (e.g., diacylglycerol, triacylglycerol, phospholipid), thioester (e.g., 10-methylstearyl CoA, 10-methylpalmityl CoA, 12-methyloleoyl CoA, 13-methyloleoyl CoA, 10-methyl-octadec-12-enoyl CoA), or amide. The exomethylene-substituted lipid may be a carboxylic acid (e.g., 10-methylenestearic acid, 10-methylenepalmitic acid, 12-methyleneoleic acid, 13-methyleneoleic acid, 10-methylene-octadec-12-enoic acid), carboxylate (e.g., 10-methylenestearate, 10-methylenepalmitate, 12-methyleneoleate, 13-methyleneoleate, 10-methylene-octadec-12-enoate), ester (e.g., diacylglycerol, triacylglycerol, phospholipid), thioester (e.g., 10-methylenestearyl CoA, 10-methylenepalmityl CoA, 12-methyleneoleoyl CoA, 13-methyleneoleoyl CoA, 10-methylene-octadec-12-enoyl CoA), or amide. The composition may comprise 10-methyl lipids, 10-methylene lipids, or both. It is specifically contemplated that one or more of the above lipids may be excluded from certain embodiments. In some aspects, the composition is produced by cultivating a culture comprising any of the cells described herein and recovering the oil composition from the cell culture. The cells in the culture may contain any of the recombinant methyltransferase genes described herein and/or any of the recombinant reductase genes described herein.
The culture medium and conditions can be chosen based on the species of the cell to be cultured and can be optimized to provide for maximal production of the desired lipid profile.
[0167] Various methods are known for recovering an oil composition from a culture of cells. For example, lipids, lipid derivatives, and hydrocarbons can be extracted with a hydrophobic solvent such as hexane. Lipids and lipid derivatives can also be extracted using liquefaction, oil liquefaction, and supercritical CO2 extraction. The recovery process may include harvesting cultured cells, such as by filtration or centrifugation, lysing cells to create a lysate, and extracting the lipid/hydrocarbon components using a hydrophobic solvent.
[0168] In addition to accumulating within cells, the lipids described herein may be secreted by the cells. In that case, a process for recovering the lipid may not require creating a lysate from the cells, but only collecting the secreted lipid from the culture medium. Thus, the compositions described herein may be made by culturing a cell that secretes one of the lipids described herein, such as a linear fatty acid with a chain length of 14-20 carbons with a methyl branch at the Δ9, Δ10, or Δ11 position.
[0169] In some embodiments, the oil composition comprises about, at least about, or at most about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% by weight of a branched (methyl)lipid, such as a 10-methyl fatty acid, or any range derivable therein. In some embodiments, 10-methyl fatty acids comprise about, at least about, or at most about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% by weight of the fatty acids in the composition, or any range derivable therein.
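The paragraph above uses two different denominators: percent by weight of the oil composition, and percent by weight of the fatty acids in the composition. The following Python sketch makes the distinction concrete. All masses are hypothetical values chosen for illustration and are not data from this application.

```python
def weight_percent(part_mg: float, total_mg: float) -> float:
    """Mass of one component as a percentage of a total mass."""
    if total_mg <= 0:
        raise ValueError("total mass must be positive")
    return 100.0 * part_mg / total_mg


# Hypothetical fatty acid masses (mg) recovered from an oil sample.
fatty_acid_mg = {
    "16:0": 20.0,
    "18:0": 5.0,
    "18:1": 50.0,
    "10-methylstearic": 25.0,
}
# Hypothetical total oil mass (mg), including non-fatty-acid components.
oil_mg = 125.0

total_fatty_acid_mg = sum(fatty_acid_mg.values())
pct_of_fatty_acids = weight_percent(fatty_acid_mg["10-methylstearic"], total_fatty_acid_mg)
pct_of_oil = weight_percent(fatty_acid_mg["10-methylstearic"], oil_mg)
```

With these illustrative numbers the 10-methyl fatty acid is 25% by weight of the fatty acids but 20% by weight of the oil, so the two recited percentages are not interchangeable.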
D. Methods of Producing a Branched (Methyl)Lipid
[0170] Various aspects of the invention relate to a method of producing a branched (methyl)lipid. The method may comprise incubating a cell or plurality of cells as described herein, supra, with media. The media may optionally be supplemented with an unbranched, unsaturated fatty acid, such as oleic acid, that serves as a substrate for methylation. The media may optionally be supplemented with methionine or S-adenosyl methionine, which may similarly serve as a substrate. Thus, the method may comprise contacting a cell or plurality of cells with oleic acid, methionine, or both. The method may comprise incubating a cell or plurality of cells as described herein, supra, in a bioreactor. The method may comprise recovering lipids from the cells and/or from the culture medium, such as by extraction with an organic solvent.
[0171] The method may comprise degumming the cell or plurality of cells, e.g., to remove proteins. The method may comprise transesterification or esterification of the lipids of the cells. An alcohol such as methanol or ethanol may be used for transesterification or esterification, e.g., thereby producing a fatty acid methyl ester or fatty acid ethyl ester.
EXEMPLIFICATION
[0172] The present description is further illustrated by the following examples, which should not be construed as limiting in any way.
Example 1: Identification of 10-Methylstearic Genes tmsA, tmsB, and tmsC
[0173] Two different genes have been identified as responsible for 10-methylstearate production in M. tuberculosis (see Meena, L. S., and P. E. Kolattukudy, BIOTECHNOLOGY & APPLIED BIOCHEMISTRY 60(4):412 (2013) and Meena, L. S., et al. BIOLOGICAL CHEMISTRY 394(7):871 (2013)). Curiously, neither gene is conserved throughout each Actinobacteria species that produces 10-methylstearate. While it is possible that different species of Actinobacteria each independently evolved genes that synthesize 10-methylstearate, such convergent evolution is rare. A simpler explanation is that a single common gene or set of genes is responsible for 10-methylstearate production in Actinobacteria.
[0174] To identify genes that may be responsible for 10-methylstearate production in Actinobacteria, genes with sequence homology to those that encode enzymes that catalyze lipid synthesis reactions were aligned from various species of 10-methylstearate-producing Actinobacteria. Two unique genes were identified and named 10-methylstearic A (tmsA) and 10-methylstearic B (tmsB), which each occur in the same operon within each 10-methylstearate-producing species of Actinobacteria (FIGS. 3A and 3B). A third gene named 10-methylstearic C (tmsC) was identified as occurring in the same operon as tmsA and tmsB for some of the 10-methylstearate-producing species.
[0175] The 10-methylstearic B gene (tmsB) has sequence homology with cyclopropane synthases, which suggests that its product may be capable of transferring a methyl group to a fatty acid. The 10-methylstearic A gene (tmsA) has sequence homology with oxidoreductases, which suggests that its product may be capable of reducing the exomethylene group of a branched fatty acid.
[0176] The 10-methylstearic A (tmsA) and 10-methylstearic B (tmsB) genes from M. smegmatis were cloned into a plasmid (named pNC704) for expression in E. coli (FIG. 4). The pNC704 plasmid harboring M. smegmatis tmsA and tmsB was used to transform E. coli. The transformed cells were grown for 20 hours at 37° C. in LB media supplemented with 100 µg/mL oleic acid. E. coli was transformed with an empty vector pNC53 (SEQ ID NO:81) and grown in parallel as a control. Two E. coli colonies transformed with pNC704 produced 10-methylstearate at concentrations of 2.0% and 2.1% of the total fatty acids in the cell (Table 1). The control did not produce 10-methylstearate.
TABLE 1 Fatty acid concentration as a percentage of total cellular fatty acids. "10-MS" corresponds to 10-methylstearate.
                                      Fatty acid composition
                                      % 10-MS   % 16:1   % 16:0   % 18:0   % 18:1
E. coli TOP10 + pNC53                 0.0       4.0      56.8     1.4      30.6
E. coli TOP10 + pNC704 isolate 1      2.1       4.2      55.0     0.8      30.9
E. coli TOP10 + pNC704 isolate 2      2.0       3.9      55.5     0.8      30.8
[0177] Cellular lipids were transesterified to produce fatty acid methyl esters (FAMEs) in a solution of HCl in methanol. Stearic acid, 10-methylstearic acid, and oleic acid were transesterified into FAMEs as standards. Each sample/standard was extracted into isooctane and analyzed by various gas chromatography methods (FIGS. 7 and 8). FAMEs were first analyzed by capillary gas chromatography using a flame-ionization detector (GC-FID). The FAMEs produced from E. coli displayed a GC peak corresponding to the 10-methylstearic acid FAME standard, which suggests that the M. smegmatis tmsA and tmsB genes express proteins that are capable of synthesizing 10-methylstearic acid (FIG. 7A).
[0178] FAMEs were also produced from E. coli that was transformed with the empty vector pNC53 and analyzed by GC-FID as above. This sample did not display a GC peak corresponding to the 10-methylstearic acid FAME, further suggesting that the M. smegmatis tmsA and tmsB genes express proteins that are capable of synthesizing 10-methylstearic acid (FIG. 7B). The FAMEs produced from the tmsA/tmsB sample were analyzed using a GC-MS configured in single-ion monitoring mode (SIM), which monitored m/z at 312.3 and 313.3 amu. The mass spectrum displayed a peak at 312.3 amu, corresponding to the molecular weight of a 10-methylstearate methyl ester (FIG. 8B). Additionally, the ratio of the peak at 312.3 amu to 313.3 amu suggests that the ion observed at 312.3 amu contains 20.6 carbons, which corresponds to the actual number of carbons (20) in the 10-methylstearate methyl ester.
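The carbon-count argument in the preceding paragraph rests on the M+1/M isotope ratio: each carbon contributes roughly 1.1% to the M+1 peak through natural carbon-13 abundance. The Python sketch below shows that arithmetic, along with a check that the monoisotopic mass of C20H40O2 (the formula of 10-methylstearate methyl ester) is about 312.3. The 1.1% per-carbon factor is a common textbook approximation and not a value stated in this application, contributions from other isotopes are ignored, and the peak intensities are hypothetical numbers chosen to reproduce the reported 20.6 carbons.

```python
# Approximate M+1 contribution per carbon atom from natural 13C abundance.
# Assumption: textbook value; contributions from 2H and 17O are ignored.
C13_PER_CARBON = 0.011

# Monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}


def estimate_carbon_count(intensity_m: float, intensity_m_plus_1: float) -> float:
    """Estimate the number of carbons from the M and M+1 peak intensities."""
    if intensity_m <= 0:
        raise ValueError("M peak intensity must be positive")
    return (intensity_m_plus_1 / intensity_m) / C13_PER_CARBON


def monoisotopic_mass(formula: dict[str, int]) -> float:
    """Sum monoisotopic masses for an elemental composition."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


# Hypothetical intensities at m/z 312.3 and 313.3 (arbitrary units).
carbons = estimate_carbon_count(intensity_m=1000.0, intensity_m_plus_1=226.6)

# 10-methylstearate methyl ester, C20H40O2.
ester_mass = monoisotopic_mass({"C": 20, "H": 40, "O": 2})
```

An estimate near 20.6 is close to the 20 carbons of the methyl ester, which is the comparison the paragraph draws; the small excess is consistent with the isotopes this approximation leaves out.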
Example 2: Production of 10-Methyl Fatty Acid in E. coli Using tmsB and tmsA Genes from Different Donor Organisms
[0179] Methods:
[0180] Donor bacteria genomic DNA was obtained from Deutsche Sammlung von Mikroorganismen und Zellkulturen (DSMZ), Germany. Plasmids were constructed with standard molecular biology techniques using the "yeast gap repair" method (Shanks, et al., Appl. Microbiol. Biotechnol., 48:232 (1997)). The empty E. coli expression vector pNC53 (SEQ ID NO:82) was restriction digested with enzyme PmeI (New England Biolabs, MA), creating a double strand break between the tac promoter and trpT' terminator sequences on this vector. tmsAB gene operons were PCR amplified from genomic DNA with primer flanking sequences such that the tmsB ATG start site integrated into the end of the tac promoter via homologous recombination. Transcription and translation in E. coli were driven by the tac promoter. The stop codon of the tmsA gene similarly integrated into the beginning of the trpT' terminator region. E. coli translation of the operon-embedded tmsA gene relied on native translation signals from the donor organism DNA. Where necessary, the first codon of tmsB was altered from GTG or TTG to ATG; otherwise the native codon sequence was kept in the E. coli expression vectors.
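The start-codon adjustment described above (GTG or TTG changed to ATG, with the rest of the native sequence kept) can be sketched in Python. The function name and the toy sequence are hypothetical; this is an illustration of the sequence edit only, not of the primer design actually used.

```python
ALTERNATIVE_STARTS = ("GTG", "TTG")  # alternative start codons named in the text


def normalize_start_codon(cds: str) -> str:
    """Replace a GTG or TTG first codon with ATG; leave everything else unchanged."""
    cds = cds.upper()
    if cds[:3] in ALTERNATIVE_STARTS:
        return "ATG" + cds[3:]
    return cds


# Hypothetical toy coding sequence beginning with a GTG start codon.
native_tmsb_like = "GTGACCGGTAAATAG"
for_expression = normalize_start_codon(native_tmsb_like)
```

Only the first three nucleotides are ever touched, which mirrors the statement that the native codon sequence was otherwise retained.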
[0181] Vectors were checked by DNA sequencing and restriction digest for correct construction. The vectors created for this example are illustrated in FIG. 9. Vectors transformed into E. coli Top10 (Invitrogen) were then used for fermentation studies. Cells were inoculated in 50 mL LB medium supplemented with 100 mg/L ampicillin and 100 mg/L oleic acid from a stock solution of 100 mg/mL oleic acid in ethanol. Cultures were incubated at 37°C and 200 rpm in baffled shake flasks for 41 hours. At the end of cultivation, cells were harvested by centrifugation at 4000 rpm for 15 minutes in an Eppendorf 5810 R clinical centrifuge, washed once with an equal volume of deionized water, resuspended in 0.1 mL deionized water, and frozen at -80°C. Cells were then lyophilized to dryness and used to perform an acid-catalyzed transesterification with a solution of 0.5 N HCl in methanol (20 × 1 mL ampule, Sigma) at 85°C for 90 minutes. After the transesterification was completed, the lipid-soluble components of the reaction mixture were separated from the water-soluble components using a two-phase liquid extraction by adding water and isooctane and subsequently analyzed with a capillary gas chromatograph (GC) equipped with a robotic injector, flame ionization detector (Agilent Technologies 7890B GC system and 7396 Autosampler) and HP-INNOWAX capillary column (30 m × 0.25 mm × 0.15 micrometers, Agilent). A 10-methylstearic acid reference standard was obtained from Larodan AB, Sweden.
[0182] Results:
[0183] Conversion of oleic acid to 10-methylstearic acid was observed for 4 of the 11 vectors tested. Highest percent conversion occurred with tmsAB genes from Thermobifida fusca (22%) and Thermomonospora curvata (38%), as indicated in Table 2 below.
TABLE 2

E. coli vector   Sequence        Donor organism                 % oleic acid conversion to 10-methylstearic acid
pNC704           SEQ ID NO: 77   Mycobacterium smegmatis        4.9% ± 0.6%
pNC721           SEQ ID NO: 83   Mycobacterium vanbaaleni       0
pNC755           SEQ ID NO: 84   Amycolicicoccus subflavus      0
pNC757           SEQ ID NO: 85   Corynebacterium glyciniphilum  0
pNC904           SEQ ID NO: 86   Rhodococcus opacus             1.2% ± 0.2%
pNC905           SEQ ID NO: 87   Thermobifida fusca             22.0% ± 0.3%
pNC906           SEQ ID NO: 88   Thermomonospora curvata        38.3% ± 0.5%
pNC907           SEQ ID NO: 89   Corynebacterium glutamicum     0
pNC908           SEQ ID NO: 90   Agromyces subbeticus           0
pNC910           SEQ ID NO: 91   Mycobacterium gilvum           0
pNC911           SEQ ID NO: 92   Mycobacterium sp. indicus      0
Example 3: tmsB and tmsA Expression in Rhodococcus opacus PD630
[0184] The oleaginous bacterium Rhodococcus opacus can produce 10-methyl fatty acids natively at low levels (0.2% of total fatty acids; Waltermann et al., Microbiology, 72:5027 (2006)), and additionally possesses native homologs of the tmsB and tmsA genes, although they have not been identified as such in the literature. In this Example, the inventors tested whether overexpression of the tmsB and tmsA genes in R. opacus can increase 10-methyl branched fatty acid content.
[0185] Methods:
[0186] Rhodococcus opacus PD630 was obtained from the German Collection of Microorganisms and Cell Cultures (DSMZ) from stock DSM 44193. The culture was revived by dilution with 4 mL LB media and incubated at 30°C for 3 days in a drum roller. Once visible growth occurred, 10 µL broth was struck to single colonies on an LB plate and incubated an additional 3 days at 30°C. One colony was isolated and designated strain NS1104.
[0187] All R. opacus growth was performed at 30°C. Routine culturing was performed in LB medium supplemented with appropriate antibiotics. Genetic transformation was performed in Nutrient Broth medium as modified by Kalscheuer et al. (Appl. Microbiol. and Biotechnol., 52:508 (1999)), which contained 5 g/L peptone, 2 g/L yeast extract, 1 g/L beef extract, 5 g/L NaCl, 8.5 g/L glycine, and 10 g/L sucrose. Lipid production was performed in a defined medium containing the following components, which was adjusted to pH 7.6 with NaOH and filter sterilized before use.
R. opacus Fermentation Medium
Component                 g/L
Glucose                   40
(NH4)2SO4                 1.4
MgSO4·7H2O                1
CaCl2·6H2O                0.02
KH2PO4                    0.4
MOPS acid                 5
Trace element solution    1 mL

Trace element solution    g/L stock solution
FeSO4·7H2O                0.5
CuSO4·5H2O                0.005
ZnSO4·7H2O                0.4
MnCl2·2H2O                0.02
Na2MoO4·2H2O              0.02
CoCl2·6H2O                0.05
EDTA                      0.25
H3BO3                     0.015
NiCl2·6H2O                0.01
[0188] Plasmids were constructed with standard molecular biology techniques using the "yeast gap repair" method (Shanks et al., Applied and Environmental Biology 72:5207-36 (2006)). A synthetic DNA sequence containing the Rhodococcus repA origin of replication and gentamicin resistance marker (Lessard, BMC Microbiol., 4:15 (2004)) was used to create an R. opacus-E. coli-S. cerevisiae shuttle vector from two plasmids containing the tmsAB genes from Mycobacterium smegmatis and Thermobifida fusca under control of the tac promoter. Briefly, the repA and genR synthetic DNA was constructed with approximately 50 bp flanking homology regions to the tmsAB destination plasmids. Destination plasmids were restriction digested with PacI, and the flanking homology regions repaired the gap, enabling genetic selection via the ura3 gene in S. cerevisiae. DNA was isolated from S. cerevisiae by phenol/chloroform extraction and ethanol precipitation and used to transform E. coli. Correct plasmid constructions were isolated by mini-prep (Qiagen, USA) and screened by restriction digest. Plasmids pNC985 (SEQ ID NO:93), containing M. smegmatis tmsAB, and pNC986 (SEQ ID NO:94) (FIG. 10), containing T. fusca tmsAB, were isolated and used to transform R. opacus.
[0189] R. opacus was transformed following the protocol described by Kalscheuer et al. (Kalscheuer 1999). Cells were grown overnight in modified nutrient broth, then transferred to 50 mL modified nutrient broth medium at a starting optical density of 0.13. Cells were harvested at OD 0.36, washed twice in 50 mL ice cold water, and resuspended in 1.7 mL ice cold water. Cells were then subdivided into 350 µL volumes, each combined with 2 µL plasmid DNA at a concentration of 400-600 ng/µL. Cells plus DNA were incubated at 39°C for 5 minutes immediately prior to cooling on ice and electrotransformation. Electric pulses were delivered using 2 mm gap cuvettes with a 2 kV pulse (600 Ω, 25 µF, 12 ms time constant). Cells were then diluted with 600 µL SOC medium and incubated overnight at 30°C. 200 µL of overnight cell broth was then plated on LB agar containing 10 µg/mL gentamicin and incubated an additional 4 days at 30°C for colony formation. Gentamicin resistant colonies were picked for further analysis; no resistant colonies were seen on control plates without added plasmid DNA.
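The pulse settings above imply a nominal field strength and a nominal time constant that can be checked with simple arithmetic. The sketch below assumes an ideal RC discharge; the measured time constant (12 ms above) can be shorter than the nominal R × C value because the sample itself conducts. The function names are illustrative and are not part of any instrument software.

```python
def field_strength_kv_per_cm(voltage_kv: float, gap_mm: float) -> float:
    """Nominal electric field across the cuvette gap, in kV/cm."""
    gap_cm = gap_mm / 10.0
    return voltage_kv / gap_cm


def nominal_time_constant_ms(resistance_ohm: float, capacitance_uf: float) -> float:
    """Ideal RC time constant in ms (ohm x microfarad gives microseconds)."""
    return resistance_ohm * capacitance_uf / 1000.0


# Settings taken from the protocol: 2 kV pulse, 2 mm gap, 600 ohm, 25 microfarad.
print(f"Field strength: {field_strength_kv_per_cm(2.0, 2.0):.1f} kV/cm")
print(f"Nominal time constant: {nominal_time_constant_ms(600.0, 25.0):.1f} ms")
```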
[0190] Fermentation was performed at 30°C for 4 days in 250 mL shake flasks (25 mL working volume with defined medium, 10 µg/mL gentamicin added as appropriate) at 200 rpm. Inoculum was prepared from 48 hour grown cultures in LB+10 µg/mL gentamicin. Inoculation amount was 1:25 v/v of the final volume. At the end of fermentation cells were harvested and resuspended in 1 mL distilled water and frozen at -80°C. After freezing, cells were lyophilized to dryness and then whole cells were transesterified in situ with methanolic HCl at 80°C before extraction into isooctane and quantification by gas chromatography with flame ionization detection.
[0191] Results:
[0192] R. opacus was transformed with two vectors, pNC985 expressing the M. smegmatis tmsAB genes, and pNC986 expressing the T. fusca tmsAB genes. As shown in Table 3 below, one isolate of the pNC986 transformation, strain NS1155, produced 10-methylstearic acid at 7.2% by weight of total fatty acids, as compared to the control strain NS1104 at 3.6% by weight of total fatty acids.
TABLE 3 Weight percent 10-methylstearic acid measured in R. opacus strains transformed with tmsAB expression vectors.

Description                           10-methylstearic acid (% of total FA)
R. opacus PD630 (NS1104)              3.6
R. opacus + pNC985 #1 (Msm tmsAB)     3.9
R. opacus + pNC985 #2                 3.3
R. opacus + pNC985 #3                 3.3
R. opacus + pNC986 #1 (Tfu tmsAB)     7.2
R. opacus + pNC986 #2                 3.0
R. opacus + pNC986 #3                 3.1
Example 4: Acyl Chain Substrate Range for tmsB and tmsA
[0193] The inventors performed the following experiments to determine the acyl-chain substrate range of the tmsB and tmsA enzymes from Thermomonospora curvata, particularly the fatty acid chain length and double bond position.
[0194] Methods:
[0195] Unsaturated fatty acids were purchased from Nu-Check Prep, Inc., Elysian MN. Fatty acids were dissolved in DMSO at a concentration of 100 mg/mL, with the exceptions of palmitoleic acid, oleic acid, and vaccenic acid, which were dissolved in ethanol at a concentration of 100 mg/mL. A 10-methyl stearic acid reference standard was obtained from Larodan AB, Sweden.
[0196] E. coli strains NS1161 and NS1162 were used in this experiment; strain NS1161 was constructed by transforming the control (empty) vector plasmid into E. coli CGSC 9407 (aka JW1653-1 Keio collection) which holds a kanR disruption of the native E. coli cyclopropane fatty acid synthase (cfa) gene. Strain NS1162 was constructed by transforming plasmid pNC906 (SEQ ID NO:88) (FIG. 9B), containing the T. curvata tmsB and tmsA genes under control of the constitutive tac promoter, into E. coli CGSC 9407.
[0197] E. coli strains were grown in LB media supplemented with 100 mg/L ampicillin and 100 mg/L of fatty acid. Cultures were inoculated with a 1:1000 dilution of overnight pre-culture and grown in 14 mL plastic culture tubes with a 5 mL working volume at 37°C in a rotary drum roller for 24 hours. At the end of cultivation cells were harvested by centrifugation at 4000 rpm for 15 minutes in an Eppendorf 5810 R clinical centrifuge, washed once with an equal volume of deionized water, resuspended in 0.1 mL deionized water, and frozen at -80°C. Cells were then lyophilized to dryness and used to perform a HCl-methanol catalyzed transesterification reaction to produce fatty acid methyl esters (FAME). These samples were dissolved in isooctane and injected into a gas chromatography system (Agilent Technologies) equipped with a flame ionization detector.
[0198] Results:
[0199] When fed exogenous free fatty acids, E. coli can incorporate them into its phospholipids and other lipid structures. Strains NS1161 and NS1162 were cultured with 18 different unsaturated fatty acids and in a control medium with no fatty acid supplementation, and FAME profiles for the two strains were compared. To identify new methylated fatty acids, a GC peak corresponding to the supplemented fatty acid was first identified in the strain NS1161 FAME profile by comparison with the un-supplemented reference culture. The strain NS1162 FAME profile was then checked for the same GC peak and for a new peak at a characteristic retention time shift (0.24 to 0.08 minutes earlier, with the relative shift decreasing as overall retention time increases) corresponding to a methylated fatty acid. A 10-methyl stearic acid reference standard (Larodan AB, Sweden) was used as a control to assign retention time to 10-methylstearic acid.
[0200] As observed in Table 4 below, methylation occurred on fatty acids with 14, 15, 16, 17, 18, 19 and 20 carbons, and on Δ9, Δ10, and Δ11 double bond positions. The highest percent conversion to methylated fatty acids occurred at 16 and 18 carbon fatty acids at the Δ9 and Δ11 positions.
TABLE 4

Fatty acid       Name                    Unsaturated FA          Methyl-branched FA      % conversion to
                                         retention time (min)    retention time (min)    methyl-branched FA
12:1Δ11          11-Dodecenoic acid      4.627                   --                      0.0%
13:1Δ12          12-Tridecenoic acid     5.765                   --                      0.0%
14:1Δ9           Myristoleic acid        6.785                   6.546                   3.4%
15:1Δ10          10-Pentadecenoic acid   7.926                   7.715                   1.7%
16:1Δ9           Palmitoleic acid        8.907                   8.772                   30.4%
17:1Δ10          10-Heptadecenoic acid   9.999                   9.859                   11.1%
18:1Δ6           Petroselinic acid       10.943                  --                      0.0%
18:1Δ9           Oleic acid              10.978                  10.862                  33.7%
18:1Δ11          Vaccenic acid           11.065                  10.917                  21.8%
18:1Δ9, 12-OH    Ricinoleic acid         12.737                  --                      0.0%
18:2Δ9,12        Linoleic acid           11.656                  --                      0.0%
19:1Δ7           7-Nonadecenoic acid     11.941                  --                      0.0%
19:1Δ10          10-Nonadecenoic acid    12.01                   11.888                  6.1%
20:1Δ5           5-Eicosenoic acid       12.652                  --                      0.0%
20:1Δ8           8-Eicosenoic acid       12.713                  --                      0.0%
20:1Δ11          11-Eicosenoic acid      12.743                  12.666                  2.2%
22:1Δ13          Erucic acid             13.406                  --                      0.0%
24:1Δ15          Nervonic acid           13.86                   --                      0.0%
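The retention time shift used to assign methyl-branched peaks can be recomputed directly from Table 4. The following is a minimal sketch that takes the eight rows in which a methyl-branched peak was observed and reports how much earlier that peak elutes than the parent unsaturated fatty acid. The data structure and function name are illustrative only.

```python
# (fatty acid, unsaturated FA retention time, methyl-branched FA retention time),
# in minutes. Only Table 4 rows with an observed methyl-branched peak are listed.
TABLE_4_HITS = [
    ("14:1Δ9", 6.785, 6.546),
    ("15:1Δ10", 7.926, 7.715),
    ("16:1Δ9", 8.907, 8.772),
    ("17:1Δ10", 9.999, 9.859),
    ("18:1Δ9", 10.978, 10.862),
    ("18:1Δ11", 11.065, 10.917),
    ("19:1Δ10", 12.01, 11.888),
    ("20:1Δ11", 12.743, 12.666),
]


def retention_shifts(rows):
    """Map each fatty acid to the minutes by which its methyl-branched peak elutes earlier."""
    return {name: round(unsat - branched, 3) for name, unsat, branched in rows}


shifts = retention_shifts(TABLE_4_HITS)
for name, shift in shifts.items():
    print(f"{name}: {shift:.3f} min earlier")
print(f"Shift range: {min(shifts.values()):.3f} to {max(shifts.values()):.3f} min")
```

The largest shift belongs to the shortest chain (14:1Δ9) and the smallest to the longest (20:1Δ11), which matches the stated trend of a decreasing shift at longer retention times.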
Example 5: tmsA Co-Factor Usage
[0201] The inventors performed the following experiments to determine which redox co-factor the tmsA enzyme (10-methylene reductase) uses to produce fully saturated 10-methyl fatty acids from the intermediate 10-methylene fatty acids.
[0202] Methods:
[0203] E. coli strains NS1161, NS1163, and NS1164 were used in this experiment; strain NS1161 was constructed by transforming the control (empty) vector plasmid pNC53 into E. coli CGSC 9407 (aka JW1653-1 Keio collection) which holds a kanR disruption of the native E. coli cyclopropane fatty acid synthase (cfa) gene. Strain NS1163 was constructed by transforming plasmid pNC963 (SEQ ID NO:95) (FIG. 11), containing the T. curvata tmsB gene under control of the constitutive tac promoter, into E. coli CGSC 9407. Strain NS1164 was constructed by transforming plasmid pNC964 (SEQ ID NO:96) (FIG. 11), containing the T. curvata tmsA gene under control of the constitutive tac promoter, into E. coli CGSC 9407.
[0204] Strain NS1163 was grown in 1 L LB media supplemented with 100 mg/L ampicillin for 24 hours at 37°C (2 × 500 mL in 2 L baffled flasks). After cultivation, cells were harvested by centrifugation at 4000 rpm for 15 minutes in an Eppendorf 5810 R clinical centrifuge and washed twice in 100 mL PBS buffer. After concentration to 40 mL PBS buffer, cells were heat inactivated at 85°C for 30 min. Inactivated cells were then dispensed into 1 mL aliquots and disrupted with 0.3 grams of 0.1 mm glass beads using a MP fastprep-24 on "E. coli" setting (MP biomedicals, LLC). Whole cell lysed suspension was collected by micro-centrifugation at 2000 × g for 30 seconds to remove beads and then 0.7 mL of suspension per tube was transferred to new tubes and frozen at -80°C until further use.
[0205] On the day of assay, strains NS1161 and NS1164 were grown via inoculation from overnight cultures (1:1000 dilution) in 50 mL LB medium supplemented with 100 mg/L ampicillin at 37°C and 200 rpm in baffled shake flasks. After 4 hours of cultivation, cells were harvested at 5°C, washed once in ice cold PBS and then resuspended in 750 µL PBS in 1 mL plastic screw tubes. 0.3 grams of 0.1 mm glass beads were added and cells were lysed with a MP fastprep-24 on the "E. coli" setting. The cell suspension was then micro-centrifuged for 5 min at 12,000 × g, and the supernatant transferred to a fresh tube and held on ice until assay.
Assay reaction: Each reaction contained 700 µL of NS1163 whole lysate; 200 µL of either 37.2 mg/mL NADPH solution (assay concentration 10 mM), 33.2 mg/mL NADH solution (assay concentration 10 mM), or PBS buffer; and 100 µL of cell free extract or PBS buffer. Assay tubes were sealed and rotated on a drum roller at 37°C for 16 hours. To end the assay, tubes were frozen at -80°C, then lyophilized to dryness followed by in situ extraction and transesterification with methanolic HCl. Fatty acid profiles were determined by GC with flame ionization detection, and the 10-methyl fatty acid peak area was compared to the total fatty acid peak area to determine assay activity.
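The co-factor concentrations in the assay reaction can be checked from the stated volumes. The sketch below assumes a total reaction volume of 1000 µL (700 + 200 + 100 µL) and back-calculates the molar mass implied by each stock concentration and its stated 10 mM assay concentration. The function names are illustrative; the molar masses are derived from the figures in the text rather than taken from a supplier data sheet.

```python
TOTAL_VOLUME_UL = 700 + 200 + 100  # lysate + co-factor solution + cell free extract


def final_concentration_g_per_l(stock_mg_per_ml: float, added_ul: float,
                                total_ul: float = TOTAL_VOLUME_UL) -> float:
    """Concentration after dilution into the assay (mg/mL is numerically g/L)."""
    return stock_mg_per_ml * added_ul / total_ul


def implied_molar_mass(stock_mg_per_ml: float, added_ul: float,
                       assay_mm: float, total_ul: float = TOTAL_VOLUME_UL) -> float:
    """Molar mass (g/mol) implied by the stock and the stated assay concentration."""
    grams_per_litre = final_concentration_g_per_l(stock_mg_per_ml, added_ul, total_ul)
    return grams_per_litre / (assay_mm / 1000.0)


for name, stock in (("NADPH", 37.2), ("NADH", 33.2)):
    mass = implied_molar_mass(stock, added_ul=200, assay_mm=10)
    print(f"{name}: implied molar mass {mass:.0f} g/mol")
```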
[0206] Results:
[0207] Strain NS1163, which accumulates 10-methylene intermediate fatty acids via expression of the Thermomonospora curvata tmsB gene, was grown, harvested, inactivated, and lysed for use as a substrate for the tmsA (10-methylene reductase) assay. To this substrate, cell-free extract from E. coli strain NS1164 expressing the T. curvata tmsA gene or from E. coli strain NS1161 containing an empty expression vector was added, along with NADPH or NADH. As observed in Table 5 below, only the presence of T. curvata tmsA and NADPH resulted in synthesis of 10-methyl fatty acids in this assay.
TABLE 5

E. coli (Δcfa background)   Co-factor   Relative 10Me16 + 10Me18   SD
cell free extract                       peak area
Tcu tmsA                    NADPH       0.059                      0.003
Tcu tmsA                    NADH        ND
Tcu tmsA                    none        ND
empty vector                NADPH       ND
empty vector                NADH        ND
empty vector                none        ND
none                        NADPH       ND
none                        NADH        ND
none                        none        ND

ND = Not detected by this assay
Example 6: Expression of tmsB Genes in Yeast Yarrowia lipolytica and Arxula adeninivorans
[0208] tmsB sequences with native bacterial codon usage from Mycobacterium smegmatis, Mycobacterium vanbaaleni, Amycolicicoccus subflavus, Corynebacterium glyciniphilum, Rhodococcus opacus, Agromyces subbeticus, Knoellia aerolata, Mycobacterium gilvum, Mycobacterium sp. Indicus, Thermobifida fusca, and Thermomonospora curvata were cloned into a standard Yarrowia expression vector driven by the Y. lipolytica TEF1 promoter and containing an ARS68 Y. lipolytica replication origin, a nourseothricin antibiotic resistance gene for selection, and the 2µ origin and URA3 gene for high copy maintenance in Saccharomyces cerevisiae. Cloning was performed using the yeast-gap repair method (Shanks 2006) with selection on uracil dropout media. Y. lipolytica was transformed following a standard lithium acetate heat-shock protocol with selection on YPD medium supplemented with 500 µg/mL nourseothricin. Colonies were selected and transferred to a 96 well plate containing 300 µL nitrogen-limited lipid production media per well and incubated at 30°C with shaking at 900 rpm for 96 hours. The medium contained 100 g/L glucose, 0.5 g/L urea, 1.5 g/L yeast extract, 0.85 g/L casamino acids, 1.7 g/L YNB base without amino acids, and 5.1 g/L potassium hydrogen phthalate at pH 5.5. After fermentation, cells were centrifuged, washed with distilled water, and frozen at -80°C prior to lyophilization to dryness. Dried cells were transesterified in situ with 0.5 N HCl in methanol at 85°C for 90 minutes to produce fatty acid methyl esters (FAME) suitable for gas chromatography analysis. These samples were dissolved in isooctane and injected into a gas chromatography system (Agilent Technologies) equipped with a flame ionization detector. Total C16 and C18 branched fatty acids were identified and quantified based on known standards and the 10-methylene and 10-methyl fatty acids identified in E. coli tms expression experiments. 10-methyl and 10-methylene fatty acid identities were verified by mass spectrometry in an independent experiment. FIG. 12 shows that Y. lipolytica transformed with tmsB from T. fusca and T. curvata produced the highest amounts of 10-methylene stearic acid.
[0209] To test tmsB activity in Arxula adeninivorans, the top performing tmsB gene in Yarrowia, T. curvata tmsB (SEQ ID NO:75), was cloned into a constitutive expression vector under the Arxula ADH1 promoter, resulting in plasmid pNC1065. Individual transformant colonies were isolated and grown in a standard industrial media (with a high C:N ratio to promote lipid accumulation) for 4 days at 40°C. Cell pellets were isolated, washed once with water, and lyophilized. Total C16 and C18 fatty acids were transesterified as for Yarrowia strains and were analyzed by GC. FIG. 13 shows that A. adeninivorans transformed with tmsB from T. curvata produces 10-methylene fatty acids.
Example 7: tmsA and tmsB coexpression in Yarrowia lipolytica and Saccharomyces cerevisiae
[0210] The inventors discovered that simultaneous expression of tmsA and tmsB genes can produce branched 10-methyl and 10-methylene fatty acids, respectively, in Saccharomyces and Yarrowia yeast strains. For expression in Yarrowia, plasmids constitutively expressing the native bacterial sequences for tmsA from T. curvata (pNC984), T. fusca (pNC983) and C. glutamicum (pNC991) were each transformed into strain NS1117 containing a stably integrated copy of the T. curvata tmsB gene (isolated from Example 6 above). Individual transformants were isolated and grown for 4 days at 30°C in shake flask medium. Fatty acids were isolated and analyzed by GC as in Example 6. As shown in FIG. 14, all tmsA genes analyzed produced detectable levels of 10-methyl fatty acids in Yarrowia, compared to the parental strain. The T. curvata tmsA gene produced more 10-methyl fatty acids than the other tmsA genes analyzed.
[0211] For expression in Saccharomyces, plasmids with demonstrated gene activity in Yarrowia, pNC984 (T. curvata tmsA with a NAT marker) and pNC1025 (T. curvata tmsB with a HYG marker), were transformed individually and together into S. cerevisiae strain NS20, and transformants were selected on media containing the appropriate antibiotic(s). Individual transformation isolates were grown for 2 days in YPD medium at 30°C. Cell pellets were processed, and total fatty acids were analyzed as for Yarrowia. As shown in FIG. 15, the strain transformed with only tmsB produced only 10-methylene fatty acids, and the strain transformed with both tmsA and tmsB produced a relatively high percentage of 10-methyl fatty acids.
Example 8: Expression of a tmsA-B Fusion Protein in E. coli, Saccharomyces cerevisiae, Yarrowia lipolytica and Arxula adeninivorans
[0212] The inventors discovered that expressing the tmsA and tmsB enzymes in a single polypeptide improves conversion of 10-methylene fatty acids to 10-methyl fatty acids. Single proteins containing both tmsA and tmsB activity were created by fusing the genes for Thermomonospora curvata tmsA and tmsB in frame, separated by a flexible linker domain. The Thermomonospora curvata tmsA and tmsB enzymes were chosen because they produced the most 10-methyl branched fatty acids in yeast. A short 12 amino acid linker with the sequence AGGAEGGNGGGA, which occurs naturally in the Yarrowia FAS2 gene, was chosen to connect the two enzymes. Two fusion enzymes were tested for activity in bacteria and yeast: tmsA-B (NG540; encoded by SEQ ID NO:97) and tmsB-A (NG541; encoded by SEQ ID NO:98).
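The two fusion orientations described above can be illustrated at the protein level. The following is a minimal sketch, assuming simple end-to-end concatenation through the 12-residue linker; the actual junctions in SEQ ID NO:97 and SEQ ID NO:98 (for example, whether the initiator methionine of the second domain is retained) are not restated here and may differ. The placeholder sequences and the function name are hypothetical and are not the T. curvata sequences.

```python
LINKER = "AGGAEGGNGGGA"  # 12 amino acid linker sequence given in the text


def fuse(n_terminal: str, c_terminal: str, linker: str = LINKER) -> str:
    """Join two protein sequences through a linker, dropping any trailing '*' stop symbol."""
    return n_terminal.rstrip("*") + linker + c_terminal.rstrip("*")


# Hypothetical placeholder sequences standing in for TmsA and TmsB.
TMSA_PLACEHOLDER = "MAAAA*"
TMSB_PLACEHOLDER = "MKKKK*"

tmsA_B = fuse(TMSA_PLACEHOLDER, TMSB_PLACEHOLDER)  # TmsA at the N-terminus
tmsB_A = fuse(TMSB_PLACEHOLDER, TMSA_PLACEHOLDER)  # TmsB at the N-terminus
print(tmsA_B)
print(tmsB_A)
```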
[0213] For E. coli expression, plasmids pNC1069 and pNC1070 containing the T. curvata tmsA-B and tmsB-A genes with the tac promoter and trpT' terminator were each transformed into E. coli CGSC 9407. Individual transformed strains were grown and total fatty acids were assayed as in Example 2 above. As shown in Table 6 below, both the tmsA-B and tmsB-A genes resulted in production of methylated stearic acid in E. coli.
TABLE 6 Methylation of oleic and vaccenic acid was calculated as the percent of C18:1 fatty acids converted into 10- and 12-methyl fatty acids.

Vector               % C18:1 methylated
None                 0
T. curvata tmsA-B    19.4
T. curvata tmsB-A    26.25
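One way to compute a percent-methylated value of the kind reported in Table 6 is sketched below. It assumes that conversion is the methylated C18 peak area divided by the sum of the methylated and remaining C18:1 peak areas, and that the flame ionization detector responds equally to both; the exact calculation used for Table 6 is not spelled out in the text. The peak areas in the example are hypothetical.

```python
def percent_c18_1_methylated(methylated_area: float, unsaturated_area: float) -> float:
    """Percent of the C18:1 pool found as methyl-branched product, from GC peak areas."""
    total = methylated_area + unsaturated_area
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return 100.0 * methylated_area / total


# Hypothetical peak areas: 10- plus 12-methyl C18 products vs. remaining oleic plus vaccenic acid.
print(f"{percent_c18_1_methylated(methylated_area=25.0, unsaturated_area=75.0):.1f}% methylated")
```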
[0214] For Saccharomyces cerevisiae and Yarrowia lipolytica expression, NG540 (SEQ ID NO:97) and NG541 (SEQ ID NO:98) were individually cloned into standard Yarrowia expression vectors containing a yeast 2µ origin of replication for high copy retention in Saccharomyces, resulting in the respective vectors pNC1067 and pNC1068.
[0215] Plasmids pNC1067 and pNC1068 were transformed into Saccharomyces strain NS20 by a standard protocol and individual transformed strains were selected for assay of branched fatty acid production. Strains were grown for 2 days at 30°C in 25 mL YPD medium. Cell pellets were lyophilized and total fatty acids were analyzed by basic transesterification and GC analysis as in Example 2. FIG. 16 shows that expression of both tmsA-B and tmsB-A in S. cerevisiae led to production of 10-methyl fatty acids.
[0216] Plasmids pNC1067 and pNC1068 were transformed into Yarrowia lipolytica by a standard heat shock protocol. Individual resulting transformant strains were chosen for analysis of 10-methylene and 10-methyl fatty acid production. Strains were grown and analyzed by GC as in Example 7. FIG. 17 shows that expression of both tmsA-B and tmsB-A in Y. lipolytica led to production of 10-methyl fatty acids, although tmsA-B was more efficient at converting 10-methylene fatty acids to 10-methyl fatty acids.
[0217] For expression in Arxula adeninivorans, NG540 was cloned into a standard expression vector containing the constitutive Arxula ADH1 promoter, resulting in pNC1151. pNC1151 was transformed into Arxula strain NS1166 and individual transformants were selected for assay of 10-methyl fatty acid production. Arxula strains were grown and analyzed by GC as in Example 7.
[0218] These experiments showed that 10-methyl C16 and C18 fatty acids were detected in E. coli (Table 6), Saccharomyces cerevisiae (FIG. 16), Yarrowia lipolytica (FIG. 17), and Arxula adeninivorans (FIG. 18), indicating that the fusion enzymes contain both tmsA and tmsB activities. The low production of 10-methylene intermediates (undetectable in E. coli and Saccharomyces, at low levels in Yarrowia and Arxula) indicates that the fusion protein efficiently converts unsaturated fatty acids into 10-methyl fatty acids.
Example 9: tmsB sequence analysis
[0219] TmsB protein sequences coded by the tmsB genes from Mycobacterium smegmatis, Mycobacterium vanbaaleni, Amycolicicoccus subflavus, Corynebacterium glyciniphilum, Corynebacterium glutamicum, Rhodococcus opacus, Agromyces subbeticus, Knoellia aerolata, Mycobacterium gilvum, Mycobacterium sp. Indicus, Thermobifida fusca, and Thermomonospora curvata were aligned with the cyclopropane fatty acid synthase (Cfa) enzyme from Escherichia coli with the CLUSTAL OMEGA software program (European Molecular Biology Laboratory, EMBL). FIGS. 19A-D show the alignment of these protein sequences. E. coli Cfa shares homology to the TmsB enzyme and carries out a similar reaction to TmsB, with methylation of a fatty acid phospholipid double bond, but produces a cyclopropane moiety rather than a methylene moiety.
[0220] Certain amino acids of the E. coli Cfa enzyme are thought to bind the active site bicarbonate ion. Iwig et al., J. Am. Chem. Soc. 127:11612-13 (2005). These amino acids are C139, E239, H266, I268, and Y317 of the E. coli enzyme, which are conserved in the consensus tmsB protein sequence (C160, E266, H293, I295, and Y348 on the T. curvata TmsB sequence SEQ ID NO:76).
[0221] Additionally, there are sixteen amino acid residues that are conserved for all twelve TmsB protein sequences, but not in the E. coli Cfa sequence. These amino acids may be specific for 10-methylene addition to fatty acid phospholipids rather than the cyclopropane addition performed by the E. coli Cfa protein. These conserved amino acids, numbered with the T. curvata TmsB sequence, are D23, G24, A59, H128, F147, Y148, L180, L193, M203, G236, A241, R313, R318, E320, L359, L400 of SEQ ID NO:76.
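Residue lists such as the one above can be checked programmatically against a protein sequence. The sketch below parses labels of the form "D23" into (amino acid, position) pairs and reports any that do not match a supplied sequence. It assumes 1-based numbering and a plain one-letter sequence string; the function names are illustrative, and the short sequence in the usage line is a toy example rather than SEQ ID NO:76.

```python
import re

CONSERVED_TMSB = (
    "D23, G24, A59, H128, F147, Y148, L180, L193, M203, G236, "
    "A241, R313, R318, E320, L359, L400"
)


def parse_residues(text: str):
    """Parse labels such as 'D23, G24' into (amino acid, 1-based position) pairs."""
    return [(m.group(1), int(m.group(2))) for m in re.finditer(r"\b([A-Z])(\d+)\b", text)]


def mismatched_residues(sequence: str, residues):
    """Return the residues that are out of range or differ from the sequence."""
    return [
        (aa, pos)
        for aa, pos in residues
        if pos > len(sequence) or sequence[pos - 1] != aa
    ]


residues = parse_residues(CONSERVED_TMSB)
print(f"{len(residues)} residues parsed")

# Toy usage: position 2 is D and position 3 is G in 'MDG'; position 4 is out of range.
print(mismatched_residues("MDG", [("D", 2), ("G", 3), ("A", 4)]))
```

Running `mismatched_residues` with the full T. curvata TmsB sequence and the parsed list would be expected to return an empty list if the numbering in the text is consistent with the sequence listing.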
[0222] A BLASTp conserved domains analysis (National Center for Biotechnology Information, NCBI) identifies an S-adenosylmethionine-dependent methyltransferase domain from amino acids 192-291 of T. curvata TmsB. S-adenosylmethionine binding site amino acid residues are identified as V196, G197, C198, G199, W200, G201, G202, T219, L220, Q246, D247, Y248, and D262.
[0223] Table 7 shows the percent sequence identity of the indicated protein relative to T. curvata tmsB:
TABLE 7

Species                               % Identity
Thermomonospora curvata tmsB          100
Mycobacterium smegmatis tmsB          60
Mycobacterium vanbaaleni tmsB         59
Amycolicicoccus subflavus tmsB        55
Corynebacterium glyciniphilum tmsB    47
Corynebacterium glutamicum tmsB       50
Rhodococcus opacus tmsB               59
Agromyces subbeticus tmsB             57
Knoellia aerolata tmsB                47
Mycobacterium gilvum tmsB             58
Mycobacterium sp. Indicus tmsB        58
Thermobifida fusca tmsB               67
Escherichia coli Cfa                  34
As shown in Table 7, there is a great deal of variation among the tmsB protein sequences from the different species. Nevertheless, despite the sequence variation, several of the proteins are shown herein to have the same ability to catalyze the production of a methylene-substituted lipid.
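Percent identity values like those in Table 7 are derived from a sequence alignment. The following is a minimal sketch of one common definition, in which identity is counted over aligned columns where neither sequence has a gap. The text does not state which denominator was used for Table 7, so this convention is an assumption, and the short aligned strings are toy examples.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Toy alignment: four gap-free columns, three of them identical.
print(f"{percent_identity('MKT-A', 'MRTLA'):.1f}% identity")
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which gives lower values when gaps are present.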
Example 10: tmsA Sequence Analysis
[0224] TmsA protein sequences coded by the tmsA genes from Mycobacterium smegmatis, Mycobacterium vanbaaleni, Amycolicicoccus subflavus, Corynebacterium glyciniphilum, Corynebacterium glutamicum, Rhodococcus opacus, Agromyces subbeticus, Knoellia aerolata, Mycobacterium gilvum, Mycobacterium sp. Indicus, Thermobifida fusca, and Thermomonospora curvata were aligned with the Glycolate oxidase subunit GlcD enzyme from Escherichia coli with the CLUSTAL OMEGA software program (European Molecular Biology Laboratory, EMBL). The E. coli GlcD enzyme does not appear to perform a similar enzymatic reaction as TmsA, but it is the most closely homologous protein to TmsA in the E. coli genome.
[0225] FIGS. 20A-E show the alignment of the TmsA proteins. There are 114 amino acid residues that are conserved for all twelve TmsA protein sequences, but not in the E. coli GlcD sequence. These amino acids are (numbered according to the T. curvata sequence (SEQ ID NO:74)): R31, A33, S37, N38, L39, F40, R43, D52, V59, D63, G73, M74, T76, Y77, D79, L80, V81, L85, P91, V93, V94, Q96, L97, T99, I100, T101, A105, G108, G110, E112, S113, S115, F116, R117, N118, P121, H122, E123, V125, E127, G133, P154, N155, Y157, Y162, L166, E171, V173, V177, H181, V208, G213, F216, Y222, L223, S236, D237, Y238, T239, Y245, S247, D254, T257, Y261, W263, R264, W265, D266, D268, W269, C272, A275, G277, Q279, R284, W287, R293, S294, G318, E232, V325, P328, E330, F339, F343, W353, C355, P356, W363, L365, Y366, P367, N376, F379, W380, V383, P384, N395, E399, G407, H408, K409, S410, L411, Y412, S413, Y417, F422, Y426, G428, R443, L447, and V452.
[0226] A BLASTp conserved domains analysis (National Center for Biotechnology Information, NCBI) identifies a Flavin adenine dinucleotide (FAD) binding domain from amino acids 9-141 of T. curvata TmsA (SEQ ID NO:74), as well as a FAD/FMN-containing dehydrogenase domain from amino acids 22-444. Table 8 shows the percent sequence identity of the indicated protein relative to T. curvata tmsA:
TABLE 8

Species                               % Identity
Thermomonospora curvata tmsA          100
Mycobacterium smegmatis tmsA          61
Mycobacterium vanbaaleni tmsA         61
Amycolicicoccus subflavus tmsA        60
Corynebacterium glyciniphilum tmsA    55
Corynebacterium glutamicum tmsA       53
Rhodococcus opacus tmsA               61
Agromyces subbeticus tmsA             59
Knoellia aerolata tmsA                60
Mycobacterium gilvum tmsA             59
Mycobacterium sp. Indicus tmsA        58
Thermobifida fusca tmsA               64
Escherichia coli GlcD                 28
As shown in Table 8, there is a great deal of variation among the tmsA protein sequences from the different species. Nevertheless, despite the sequence variation, several of the proteins are shown herein to have the same ability to catalyze the production of a methyl-substituted lipid.
INCORPORATION BY REFERENCE
[0227] Each of the patents, published patent applications, and non-patent references cited herein is hereby incorporated by reference in its entirety.
EQUIVALENTS
[0228] Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. Such equivalents are intended to be encompassed by the following claims.
Sequence CWU
1
1
9811401DNAMycobacterium smegmatis 1gtgtctgtgg ttactactga cgcacaggct
gcccatgccg ccggcgtctc gcgtcttctg 60gccagctacc gggcgatccc gcccagcgcg
acagtgcgcc ttgcgaaacc gacgtccaac 120ctgttccgcg cccgcgcccg caccaatgtg
aagggtctcg acgtctcggg cctgaccggt 180gtgatcggtg tcgacccgga cgcgcgcacc
gccgatgtgg cgggcatgtg cacctacgag 240gacctggtgg cggccacgct tccgtacggc
cttgccccac tggtggtgcc gcagctcaag 300accatcacgc tcggtggcgc ggtcaccggt
ctgggcatcg agtccacgtc gttccgcaac 360ggtctgccgc acgaaagtgt cctggagatg
gacatcttga ccggttcggg cgagatcgtc 420acggcctcac cggatcagca ctcggatctg
ttccatgcgt tccccaattc atatggaacc 480cttggttatt ccacccggct gcgcatcgaa
ctggagcccg tgcacccgtt tgtggcgttg 540cgccacctgc gctttcactc gatcaccgat
ctggtcgcgg cgatggaccg gatcatcgag 600accggcgggc tggacggtga acccgtcgac
tacctcgacg gcgtggtgtt cagcgcgact 660gagagttacc tgtgtgttgg cttcaagacg
aaaacgccgg ggccggtcag cgattacaca 720ggtcagcaga tcttctaccg gtcgatccag
catgacggcg acaccggcgc cgagaaacac 780gaccggctga ccatccacga ctacctgtgg
cgctgggaca ccgactggtt ctggtgctca 840cgggcattcg gcgctcagca tccggtgatc
cgcaggttct ggccgcggcg gctgcgccgc 900agcagcttct actggaagct ggtggcctac
gaccagcggt acgacatcgc cgaccgtatc 960gagaagcgca acgggcgccc gccgcgcgag
cgggtggtcc aggacgtcga ggtgcccatc 1020gagcggtgcg cggacttcgt cgagtggttc
ctgcagaatg tgccgatcga gccgatctgg 1080ctgtgccccc tacggttgcg tgacagcgcc
gacggcggtg cctcgtggcc cctgtatccg 1140ctgaaggcgc accacaccta cgtcaacatc
ggtttctggt catcagtgcc ggtgggcccc 1200gaggagggcc acaccaaccg cctcatcgag
aaaaaagtcg cggagctgga cgggcacaaa 1260tctttgtact cggacgctta ttacacacgt
gacgaattcg acgagctgta cggcggtgag 1320gtctacaaca ccgtcaagaa gacgtacgac
ccggattcac gtctgctaga cctgtattcg 1380aaggcggtgc aaagacaatg a
14012466PRTMycobacterium smegmatis 2Val
Ser Val Val Thr Thr Asp Ala Gln Ala Ala His Ala Ala Gly Val1
5 10 15Ser Arg Leu Leu Ala Ser Tyr
Arg Ala Ile Pro Pro Ser Ala Thr Val 20 25
30Arg Leu Ala Lys Pro Thr Ser Asn Leu Phe Arg Ala Arg Ala
Arg Thr 35 40 45Asn Val Lys Gly
Leu Asp Val Ser Gly Leu Thr Gly Val Ile Gly Val 50 55
60Asp Pro Asp Ala Arg Thr Ala Asp Val Ala Gly Met Cys
Thr Tyr Glu65 70 75
80Asp Leu Val Ala Ala Thr Leu Pro Tyr Gly Leu Ala Pro Leu Val Val
85 90 95Pro Gln Leu Lys Thr Ile
Thr Leu Gly Gly Ala Val Thr Gly Leu Gly 100
105 110Ile Glu Ser Thr Ser Phe Arg Asn Gly Leu Pro His
Glu Ser Val Leu 115 120 125Glu Met
Asp Ile Leu Thr Gly Ser Gly Glu Ile Val Thr Ala Ser Pro 130
135 140Asp Gln His Ser Asp Leu Phe His Ala Phe Pro
Asn Ser Tyr Gly Thr145 150 155
160Leu Gly Tyr Ser Thr Arg Leu Arg Ile Glu Leu Glu Pro Val His Pro
165 170 175Phe Val Ala Leu
Arg His Leu Arg Phe His Ser Ile Thr Asp Leu Val 180
185 190Ala Ala Met Asp Arg Ile Ile Glu Thr Gly Gly
Leu Asp Gly Glu Pro 195 200 205Val
Asp Tyr Leu Asp Gly Val Val Phe Ser Ala Thr Glu Ser Tyr Leu 210
215 220Cys Val Gly Phe Lys Thr Lys Thr Pro Gly
Pro Val Ser Asp Tyr Thr225 230 235
240Gly Gln Gln Ile Phe Tyr Arg Ser Ile Gln His Asp Gly Asp Thr
Gly 245 250 255Ala Glu Lys
His Asp Arg Leu Thr Ile His Asp Tyr Leu Trp Arg Trp 260
265 270Asp Thr Asp Trp Phe Trp Cys Ser Arg Ala
Phe Gly Ala Gln His Pro 275 280
285Val Ile Arg Arg Phe Trp Pro Arg Arg Leu Arg Arg Ser Ser Phe Tyr 290
295 300Trp Lys Leu Val Ala Tyr Asp Gln
Arg Tyr Asp Ile Ala Asp Arg Ile305 310
315 320Glu Lys Arg Asn Gly Arg Pro Pro Arg Glu Arg Val
Val Gln Asp Val 325 330
335Glu Val Pro Ile Glu Arg Cys Ala Asp Phe Val Glu Trp Phe Leu Gln
340 345 350Asn Val Pro Ile Glu Pro
Ile Trp Leu Cys Pro Leu Arg Leu Arg Asp 355 360
365Ser Ala Asp Gly Gly Ala Ser Trp Pro Leu Tyr Pro Leu Lys
Ala His 370 375 380His Thr Tyr Val Asn
Ile Gly Phe Trp Ser Ser Val Pro Val Gly Pro385 390
395 400Glu Glu Gly His Thr Asn Arg Leu Ile Glu
Lys Lys Val Ala Glu Leu 405 410
415Asp Gly His Lys Ser Leu Tyr Ser Asp Ala Tyr Tyr Thr Arg Asp Glu
420 425 430Phe Asp Glu Leu Tyr
Gly Gly Glu Val Tyr Asn Thr Val Lys Lys Thr 435
440 445Tyr Asp Pro Asp Ser Arg Leu Leu Asp Leu Tyr Ser
Lys Ala Val Gln 450 455 460Arg
Gln46531314DNAMycobacterium smegmatis 3atgaccacat tcaaagaacg cgagacgtcc
acagcggacc gcaagctcac cctggccgag 60atcctcgaga tcttcgccgc gggtaaggag
ccgctgaagt tcactgcgta cgacggcagc 120tcggccggtc ccgaggacgc cacgatgggt
ctggacctca agaccccgcg tgggaccacc 180tatctggcca cggcacccgg cgatctgggc
ctggcccgtg cgtatgtctc cggtgacctg 240gagccgcacg gcgtgcatcc cggcgatccc
tacccgctgc tgcgcgccct ggccgaacgc 300atggagttca agcgcccgcc tgcgcgtgtg
ctggcgaaca tcgtgcgctc catcggcatc 360gagcacctca agccgatcgc accgccgccg
caggaggcgc tgccccggtg gcgccgcatc 420atggagggcc tgcggcacag caagacccgc
gacgccgagg ccatccacca ccactacgac 480gtgtcgaaca cgttctacga gtgggtgctg
ggcccgtcga tgacctacac gtgcgcgtgc 540taccccaccg aggacgcgac cctcgaagag
gcccaggaca acaagtaccg cctggtgttc 600gagaagctgc gcctgaagcc cggtgaccgg
ttgctcgacg tgggctgcgg ctggggcggc 660atggtccgct acgcggcccg ccacggcgtc
aaggcgctcg gtgtcacgct cagccgcgaa 720caggcgacgt gggcgcagaa ggccatcgcc
caggaaggtc tcaccgatct ggccgaggtg 780cgtcacggtg attaccgcga cgtcatcgaa
tccgggttcg acgcggtgtc ctcgatcggg 840ctgaccgagc acatcggcgt gcacaactac
ccggcgtact tcaacttcct caagtcgaag 900ctgcgcaccg gtggcctgct gctcaaccac
tgcatcaccc gcccggacaa ccggtcggcg 960ccatcggccg gcgggttcat cgacaggtac
gtgttccccg acggggagct caccggctcg 1020ggccgcatca tcaccgaggc ccaggacgtg
ggccttgagg tgatccacga ggagaaccta 1080cgcaatcact atgcgatgac gctgcgcgac
tggtgccgca acctggtcga gcactgggac 1140gaggcggtcg aagaggtcgg gctgcccacc
gcgaaggtgt ggggcctgta catggccggc 1200tcacgtctgg gcttcgagac caatgtggtt
cagctgcacc aggttctggc ggtcaagctt 1260gacgatcagg gcaaggacgg cggactgccg
ttgcggccct ggtggtccgc ctag 13144437PRTMycobacterium smegmatis
4Met Thr Thr Phe Lys Glu Arg Glu Thr Ser Thr Ala Asp Arg Lys Leu1
5 10 15Thr Leu Ala Glu Ile Leu
Glu Ile Phe Ala Ala Gly Lys Glu Pro Leu 20 25
30Lys Phe Thr Ala Tyr Asp Gly Ser Ser Ala Gly Pro Glu
Asp Ala Thr 35 40 45Met Gly Leu
Asp Leu Lys Thr Pro Arg Gly Thr Thr Tyr Leu Ala Thr 50
55 60Ala Pro Gly Asp Leu Gly Leu Ala Arg Ala Tyr Val
Ser Gly Asp Leu65 70 75
80Glu Pro His Gly Val His Pro Gly Asp Pro Tyr Pro Leu Leu Arg Ala
85 90 95Leu Ala Glu Arg Met Glu
Phe Lys Arg Pro Pro Ala Arg Val Leu Ala 100
105 110Asn Ile Val Arg Ser Ile Gly Ile Glu His Leu Lys
Pro Ile Ala Pro 115 120 125Pro Pro
Gln Glu Ala Leu Pro Arg Trp Arg Arg Ile Met Glu Gly Leu 130
135 140Arg His Ser Lys Thr Arg Asp Ala Glu Ala Ile
His His His Tyr Asp145 150 155
160Val Ser Asn Thr Phe Tyr Glu Trp Val Leu Gly Pro Ser Met Thr Tyr
165 170 175Thr Cys Ala Cys
Tyr Pro Thr Glu Asp Ala Thr Leu Glu Glu Ala Gln 180
185 190Asp Asn Lys Tyr Arg Leu Val Phe Glu Lys Leu
Arg Leu Lys Pro Gly 195 200 205Asp
Arg Leu Leu Asp Val Gly Cys Gly Trp Gly Gly Met Val Arg Tyr 210
215 220Ala Ala Arg His Gly Val Lys Ala Leu Gly
Val Thr Leu Ser Arg Glu225 230 235
240Gln Ala Thr Trp Ala Gln Lys Ala Ile Ala Gln Glu Gly Leu Thr
Asp 245 250 255Leu Ala Glu
Val Arg His Gly Asp Tyr Arg Asp Val Ile Glu Ser Gly 260
265 270Phe Asp Ala Val Ser Ser Ile Gly Leu Thr
Glu His Ile Gly Val His 275 280
285Asn Tyr Pro Ala Tyr Phe Asn Phe Leu Lys Ser Lys Leu Arg Thr Gly 290
295 300Gly Leu Leu Leu Asn His Cys Ile
Thr Arg Pro Asp Asn Arg Ser Ala305 310
315 320Pro Ser Ala Gly Gly Phe Ile Asp Arg Tyr Val Phe
Pro Asp Gly Glu 325 330
335Leu Thr Gly Ser Gly Arg Ile Ile Thr Glu Ala Gln Asp Val Gly Leu
340 345 350Glu Val Ile His Glu Glu
Asn Leu Arg Asn His Tyr Ala Met Thr Leu 355 360
365Arg Asp Trp Cys Arg Asn Leu Val Glu His Trp Asp Glu Ala
Val Glu 370 375 380Glu Val Gly Leu Pro
Thr Ala Lys Val Trp Gly Leu Tyr Met Ala Gly385 390
395 400Ser Arg Leu Gly Phe Glu Thr Asn Val Val
Gln Leu His Gln Val Leu 405 410
415Ala Val Lys Leu Asp Asp Gln Gly Lys Asp Gly Gly Leu Pro Leu Arg
420 425 430Pro Trp Trp Ser Ala
43551380DNAUnknownAgromyces subbeticus 5gtgtccgctc ctgcgaccga
tgcacgaacc gcccacgccg acggcgtgga gcgattgctc 60gagagttatc gggcggtgcc
ggcggccgca tcggtgcggc tcgccaagcg cacctcgaac 120ctcttccggt cccgagcggc
gacggatgcc cctggcctcg acacctccgg cctgacccac 180gtcatcgcgg tcgaccccgg
ggcgcgcacg gccgacgtcg ccggcatgtg cacctacgac 240gacctcgtcg ccgcgacact
gccgcatggg ctcgcgccac tcgtggtgcc gcaactgaag 300accatcaccc tcgggggcgc
cgtaacggga ctcggcatcg agtcgacgtc gttccgcaac 360ggtctgccgc acgagtcggt
gctcgagatc gacgtgctca ccggcgcagg cgagatcatc 420acggcgtcgc cgatcgagca
cgcagagctg ttccgcgcct tccccaactc gtacggcacc 480ctcggctacg ccgtgcgcct
gcgcatcgag ctcgagccgg tcgagccgtt cgtcgcactc 540acgcaccttc ggttccatgc
gctcaccgac ctcatcgagg caatggagcg catcatcgag 600accggtcgac tcgacggggt
tgccgtcgat tccctcgacg gcgtggtgtt cagcgctgaa 660gagagctacc tgtgcgtcgg
cacgcagacc gcggcatccg gcccggtcag cgactacacc 720cgccagcaga tcttctatcg
ctccatccag catgacgacg gtgcgaagca cgaccggctc 780accatgcacg actacctgtg
gcgctgggac gccgactggt tctggtgctc gcaggcgttc 840ggcgcgcagc atccgctgat
tcgccggttc tggccgcggc gataccggcg cagccgctcg 900tactcgacgc tcatgcgcct
cgaacggcga ttcgacctcg gcgatcgcct cgagaagctc 960aagggccggc cggcgcgcga
acgcgtgatc caagacgtcg aggtgccgat cgggcgcacc 1020gtcggcttcc tcgaatggtt
cctcgcgaac gtgccgatcg agccgatctg gttgtgcccg 1080ctgcgcctgc ggggcgaccg
cggctggcct ctctacccga tccggccgca gcagacctac 1140gtcaacatcg gcttctggtc
gacggttccg gtgggcggct ccgagggcga gacgaaccgc 1200tcgatcgagc gcgccgtgag
cgagttcgac ggacacaagt cgctgtactc cgactcgtac 1260tactcgcgcg aggagttcga
ggagctctac ggcggcgagg cgtaccgggc cgtgaagcgg 1320cgatacgacc ccgactctcg
actgctcgac ctctatgcga aggcggtgca acggcgatga 13806459PRTUnknownAgromyces
subbeticus 6Val Ser Ala Pro Ala Thr Asp Ala Arg Thr Ala His Ala Asp Gly
Val1 5 10 15Glu Arg Leu
Leu Glu Ser Tyr Arg Ala Val Pro Ala Ala Ala Ser Val 20
25 30Arg Leu Ala Lys Arg Thr Ser Asn Leu Phe
Arg Ser Arg Ala Ala Thr 35 40
45Asp Ala Pro Gly Leu Asp Thr Ser Gly Leu Thr His Val Ile Ala Val 50
55 60Asp Pro Gly Ala Arg Thr Ala Asp Val
Ala Gly Met Cys Thr Tyr Asp65 70 75
80Asp Leu Val Ala Ala Thr Leu Pro His Gly Leu Ala Pro Leu
Val Val 85 90 95Pro Gln
Leu Lys Thr Ile Thr Leu Gly Gly Ala Val Thr Gly Leu Gly 100
105 110Ile Glu Ser Thr Ser Phe Arg Asn Gly
Leu Pro His Glu Ser Val Leu 115 120
125Glu Ile Asp Val Leu Thr Gly Ala Gly Glu Ile Ile Thr Ala Ser Pro
130 135 140Ile Glu His Ala Glu Leu Phe
Arg Ala Phe Pro Asn Ser Tyr Gly Thr145 150
155 160Leu Gly Tyr Ala Val Arg Leu Arg Ile Glu Leu Glu
Pro Val Glu Pro 165 170
175Phe Val Ala Leu Thr His Leu Arg Phe His Ala Leu Thr Asp Leu Ile
180 185 190Glu Ala Met Glu Arg Ile
Ile Glu Thr Gly Arg Leu Asp Gly Val Ala 195 200
205Val Asp Ser Leu Asp Gly Val Val Phe Ser Ala Glu Glu Ser
Tyr Leu 210 215 220Cys Val Gly Thr Gln
Thr Ala Ala Ser Gly Pro Val Ser Asp Tyr Thr225 230
235 240Arg Gln Gln Ile Phe Tyr Arg Ser Ile Gln
His Asp Asp Gly Ala Lys 245 250
255His Asp Arg Leu Thr Met His Asp Tyr Leu Trp Arg Trp Asp Ala Asp
260 265 270Trp Phe Trp Cys Ser
Gln Ala Phe Gly Ala Gln His Pro Leu Ile Arg 275
280 285Arg Phe Trp Pro Arg Arg Tyr Arg Arg Ser Arg Ser
Tyr Ser Thr Leu 290 295 300Met Arg Leu
Glu Arg Arg Phe Asp Leu Gly Asp Arg Leu Glu Lys Leu305
310 315 320Lys Gly Arg Pro Ala Arg Glu
Arg Val Ile Gln Asp Val Glu Val Pro 325
330 335Ile Gly Arg Thr Val Gly Phe Leu Glu Trp Phe Leu
Ala Asn Val Pro 340 345 350Ile
Glu Pro Ile Trp Leu Cys Pro Leu Arg Leu Arg Gly Asp Arg Gly 355
360 365Trp Pro Leu Tyr Pro Ile Arg Pro Gln
Gln Thr Tyr Val Asn Ile Gly 370 375
380Phe Trp Ser Thr Val Pro Val Gly Gly Ser Glu Gly Glu Thr Asn Arg385
390 395 400Ser Ile Glu Arg
Ala Val Ser Glu Phe Asp Gly His Lys Ser Leu Tyr 405
410 415Ser Asp Ser Tyr Tyr Ser Arg Glu Glu Phe
Glu Glu Leu Tyr Gly Gly 420 425
430Glu Ala Tyr Arg Ala Val Lys Arg Arg Tyr Asp Pro Asp Ser Arg Leu
435 440 445Leu Asp Leu Tyr Ala Lys Ala
Val Gln Arg Arg 450 45571254DNAUnknownAgromyces
subbeticus 7atcctcgaga tcgtcgtcgc cggtcggctg ccgctgaggt tcaccgccta
cgacgggagc 60tcggcggggc cgcctgacgc cctgttcggc ctcgacctga agactccgcg
aggaacgacc 120tatctcgcca ccggccgcgg cgatctcggc ctcgcccgcg cctacatcgc
gggcgacctc 180gagatacagg gggtgcaccc cggagacccc tacgagctgc tcaaggcact
cgccgacagc 240ctggtcttca agctgccacc gccgcgggtg atgacccaga tcatccgttc
gatcggcgtc 300gaacatctgc ggccgatcgc gccgccgccg caagaggtgc cgccccggtg
gcgccgcatc 360gccgaggggc tccgacacag caagggccgc gacgccgaag cgatccacca
ccactacgac 420gtgtcgaaca ccttctacga atgggtgctc gggccgtcga tgacctacac
gtgcgcgtgc 480tacccgggcc tcgacgcatc cctcgacgag gcgcagcaga acaagtaccg
gctcgtgttc 540gagaagctgc ggctgaagcc gggcgaccga ctgctcgacg tcggctgcgg
gtggggcggc 600atggtgcgct acgccgcgcg ccacggcgtg caggcgttgg gcgtgaccct
gtcgcgagag 660cagacggcgt gggcgcagca ggcgatcgcc gtcgagggcc tcgccgacct
cgccgaggtg 720cgctacggcg actaccgcga catcgccgaa gacggcttcg atgcggtgtc
atcgatcggg 780ctgctcgagc acatcggcgt gcgcaactac gcttcgtatt tcggctttct
gcagtcgcgc 840ttgcggcccg ggggactctt gctcaaccac tgcatcaccc ggcccgacaa
tcgctccgag 900ccgtcggcgc gcggcttcat cgaccggtac gtgttccccg acggagagct
caccggctcg 960ggccgcatca tcaccgaggc gcaggatgtc ggcttcgaag tgctgcacga
agagaacctg 1020cgtcagcatt atgcactgac actgcgcgat tggtgcgcca acctcgtcgc
gcactgggaa 1080gaggcggtcg ccgaggtcgg gctgccgacc gcgaaggtgt ggggcctcta
catggccggg 1140tcacggctcg cgttcgagag cggcggcatc cagttgcacc aggtgctggc
ggtcagacca 1200gacgatcgca gcgacgccgc ccagctgccg ctgcggccgt ggtggacgcc
atag 12548417PRTUnknownAgromyces subbeticus 8Ile Leu Glu Ile Val
Val Ala Gly Arg Leu Pro Leu Arg Phe Thr Ala1 5
10 15Tyr Asp Gly Ser Ser Ala Gly Pro Pro Asp Ala
Leu Phe Gly Leu Asp 20 25
30Leu Lys Thr Pro Arg Gly Thr Thr Tyr Leu Ala Thr Gly Arg Gly Asp
35 40 45Leu Gly Leu Ala Arg Ala Tyr Ile
Ala Gly Asp Leu Glu Ile Gln Gly 50 55
60Val His Pro Gly Asp Pro Tyr Glu Leu Leu Lys Ala Leu Ala Asp Ser65
70 75 80Leu Val Phe Lys Leu
Pro Pro Pro Arg Val Met Thr Gln Ile Ile Arg 85
90 95Ser Ile Gly Val Glu His Leu Arg Pro Ile Ala
Pro Pro Pro Gln Glu 100 105
110Val Pro Pro Arg Trp Arg Arg Ile Ala Glu Gly Leu Arg His Ser Lys
115 120 125Gly Arg Asp Ala Glu Ala Ile
His His His Tyr Asp Val Ser Asn Thr 130 135
140Phe Tyr Glu Trp Val Leu Gly Pro Ser Met Thr Tyr Thr Cys Ala
Cys145 150 155 160Tyr Pro
Gly Leu Asp Ala Ser Leu Asp Glu Ala Gln Gln Asn Lys Tyr
165 170 175Arg Leu Val Phe Glu Lys Leu
Arg Leu Lys Pro Gly Asp Arg Leu Leu 180 185
190Asp Val Gly Cys Gly Trp Gly Gly Met Val Arg Tyr Ala Ala
Arg His 195 200 205Gly Val Gln Ala
Leu Gly Val Thr Leu Ser Arg Glu Gln Thr Ala Trp 210
215 220Ala Gln Gln Ala Ile Ala Val Glu Gly Leu Ala Asp
Leu Ala Glu Val225 230 235
240Arg Tyr Gly Asp Tyr Arg Asp Ile Ala Glu Asp Gly Phe Asp Ala Val
245 250 255Ser Ser Ile Gly Leu
Leu Glu His Ile Gly Val Arg Asn Tyr Ala Ser 260
265 270Tyr Phe Gly Phe Leu Gln Ser Arg Leu Arg Pro Gly
Gly Leu Leu Leu 275 280 285Asn His
Cys Ile Thr Arg Pro Asp Asn Arg Ser Glu Pro Ser Ala Arg 290
295 300Gly Phe Ile Asp Arg Tyr Val Phe Pro Asp Gly
Glu Leu Thr Gly Ser305 310 315
320Gly Arg Ile Ile Thr Glu Ala Gln Asp Val Gly Phe Glu Val Leu His
325 330 335Glu Glu Asn Leu
Arg Gln His Tyr Ala Leu Thr Leu Arg Asp Trp Cys 340
345 350Ala Asn Leu Val Ala His Trp Glu Glu Ala Val
Ala Glu Val Gly Leu 355 360 365Pro
Thr Ala Lys Val Trp Gly Leu Tyr Met Ala Gly Ser Arg Leu Ala 370
375 380Phe Glu Ser Gly Gly Ile Gln Leu His Gln
Val Leu Ala Val Arg Pro385 390 395
400Asp Asp Arg Ser Asp Ala Ala Gln Leu Pro Leu Arg Pro Trp Trp
Thr 405 410
415Pro91428DNAUnknownAmycolicicoccus subflavus 9atgacgcctg aagctagtgc
ggcggcgcac gccgctgcgg tggatcgcct catccatagc 60tatcgggcga ttcctgatga
cgcgccggtg cggctggcga agaagacgtc aaacctattc 120cgccacaggg aaaagacttc
tgctcctggg cttgacgtat ccggcctggc tcgcgtgatt 180gggatcgact cagacactcg
cactgccgac gttggcggca tgtgcacata cgaggacctt 240gtcgcggcga cgctcgaata
cgatctggtc cccctggtcg tcccgcaact caaaacgatc 300actctcggcg gcgcggtgac
gggcctggga attgagtcca cctcgttccg caatgggctt 360ccccatgaat ctgttctcga
aatggatatc ctgacgggcg ccggggaggt cgtcacggcc 420ggcccggaag gcccccatag
cgatttgtac tgggggtttc cgaattcgta cggcacgctc 480ggctatgcga cgcgcctgcg
catcgaacta gaaccggtcg agccgtacgt cgaactcagg 540cacctgcggt tcactagcct
cgatgagctt caggagacac ttgacaccgt ttcgtacgaa 600cacacgtatg acggggaacc
cgttcattac gtcgatggag tcatgttctc agccacggaa 660agctacctca cgcttggccg
tcagacgagc gaacccggcc cggtcagcga ctacaccgga 720aaccagatct actaccgttc
aatacagcac ggtggcgctg aaactcccgt cgtcgaccgg 780atgaccattc atgactatct
atggcgctgg gatactgact ggttctggtg ctcgcgtgcc 840ttcggaacgc aacacccagt
ggtccggaga ttctggccac gccgctatcg ccgcagcagc 900ttctactgga agctgatcgc
gcttgaccgc caggttgggc tcgcggactt catcgaacaa 960cggaagggca acctcccccg
ggaacgcgta gtccaggaca tcgaggtccc gatcgagaac 1020actgcgagct tcttgcggtg
gttcttggcg aacgtgccga tcgagccggt atggctatgc 1080ccgctgcgcc tgcgaaaaac
acgcagcccc ggcctgcctt cgccgacgtc cccggcttca 1140cgcccatggc ccctctatcc
gctcgagcct cagcgcacat acgtcaatgt tggcttctgg 1200tcagcggtgc cggtcgtggc
cggccagccc gaggggcaca ccaaccggat gatcgagaac 1260gaagtcgatc gccttgacgg
tcacaaatcg ctgtactcag atgcgtttta cgagcgaaaa 1320gagtttgacg cgctgtacgg
cggcgatacc tatagagaac tcaaagagac ctacgaccca 1380aacagccggt tacttgatct
ctatgcaaag gcggtgcaag gacgatga
142810475PRTUnknownAmycolicicoccus subflavus 10Met Thr Pro Glu Ala Ser
Ala Ala Ala His Ala Ala Ala Val Asp Arg1 5
10 15Leu Ile His Ser Tyr Arg Ala Ile Pro Asp Asp Ala
Pro Val Arg Leu 20 25 30Ala
Lys Lys Thr Ser Asn Leu Phe Arg His Arg Glu Lys Thr Ser Ala 35
40 45Pro Gly Leu Asp Val Ser Gly Leu Ala
Arg Val Ile Gly Ile Asp Ser 50 55
60Asp Thr Arg Thr Ala Asp Val Gly Gly Met Cys Thr Tyr Glu Asp Leu65
70 75 80Val Ala Ala Thr Leu
Glu Tyr Asp Leu Val Pro Leu Val Val Pro Gln 85
90 95Leu Lys Thr Ile Thr Leu Gly Gly Ala Val Thr
Gly Leu Gly Ile Glu 100 105
110Ser Thr Ser Phe Arg Asn Gly Leu Pro His Glu Ser Val Leu Glu Met
115 120 125Asp Ile Leu Thr Gly Ala Gly
Glu Val Val Thr Ala Gly Pro Glu Gly 130 135
140Pro His Ser Asp Leu Tyr Trp Gly Phe Pro Asn Ser Tyr Gly Thr
Leu145 150 155 160Gly Tyr
Ala Thr Arg Leu Arg Ile Glu Leu Glu Pro Val Glu Pro Tyr
165 170 175Val Glu Leu Arg His Leu Arg
Phe Thr Ser Leu Asp Glu Leu Gln Glu 180 185
190Thr Leu Asp Thr Val Ser Tyr Glu His Thr Tyr Asp Gly Glu
Pro Val 195 200 205His Tyr Val Asp
Gly Val Met Phe Ser Ala Thr Glu Ser Tyr Leu Thr 210
215 220Leu Gly Arg Gln Thr Ser Glu Pro Gly Pro Val Ser
Asp Tyr Thr Gly225 230 235
240Asn Gln Ile Tyr Tyr Arg Ser Ile Gln His Gly Gly Ala Glu Thr Pro
245 250 255Val Val Asp Arg Met
Thr Ile His Asp Tyr Leu Trp Arg Trp Asp Thr 260
265 270Asp Trp Phe Trp Cys Ser Arg Ala Phe Gly Thr Gln
His Pro Val Val 275 280 285Arg Arg
Phe Trp Pro Arg Arg Tyr Arg Arg Ser Ser Phe Tyr Trp Lys 290
295 300Leu Ile Ala Leu Asp Arg Gln Val Gly Leu Ala
Asp Phe Ile Glu Gln305 310 315
320Arg Lys Gly Asn Leu Pro Arg Glu Arg Val Val Gln Asp Ile Glu Val
325 330 335Pro Ile Glu Asn
Thr Ala Ser Phe Leu Arg Trp Phe Leu Ala Asn Val 340
345 350Pro Ile Glu Pro Val Trp Leu Cys Pro Leu Arg
Leu Arg Lys Thr Arg 355 360 365Ser
Pro Gly Leu Pro Ser Pro Thr Ser Pro Ala Ser Arg Pro Trp Pro 370
375 380Leu Tyr Pro Leu Glu Pro Gln Arg Thr Tyr
Val Asn Val Gly Phe Trp385 390 395
400Ser Ala Val Pro Val Val Ala Gly Gln Pro Glu Gly His Thr Asn
Arg 405 410 415Met Ile Glu
Asn Glu Val Asp Arg Leu Asp Gly His Lys Ser Leu Tyr 420
425 430Ser Asp Ala Phe Tyr Glu Arg Lys Glu Phe
Asp Ala Leu Tyr Gly Gly 435 440
445Asp Thr Tyr Arg Glu Leu Lys Glu Thr Tyr Asp Pro Asn Ser Arg Leu 450
455 460Leu Asp Leu Tyr Ala Lys Ala Val
Gln Gly Arg465 470
475111311DNAUnknownAmycolicicoccus subflavus 11atgaaggcag tgttgacggc
gtttacggct ccccaactcg aaaggatgaa cgtcgctgag 60atactcagcg cggtactcgg
gcgagatttc ccgatccggt tcactgcgta cgacggcagc 120gcgctcggcc ccgaaaccgc
ccgctacggc ttgcacctca cgacgccgcg cgggctgacc 180tacctcgcta ccgcgcccgg
tgatctcggg ctcgcacgcg cgtacgtgtc cggcgacctc 240gaggtcagtg gggttcatca
gggtgacccg tacgagataa tgaagatcct cgcgcatgac 300gtccgggtgc ggcggccctc
gccagcaacg atcgcttcga tcatgcggtc cctcggctgg 360gaacgcttgc gaccggtcgc
gccgcccccg caagagaaca tgccccgttg gcgccggatg 420gcccttggcc tgctgcactc
gaagagccgt gatgctgcgg caatccacca tcattacgac 480gtgtcgaacg agttttacga
gcacatcctc ggcccgtcga tgacgtacac atgcgcggcc 540taccccagcg cagacagttc
cctggaggaa gcacaggaca acaagtaccg actcgtcttc 600gagaaacttg gcctgaaagc
cggggatcgc ctgcttgacg tcgggtgcgg gtggggcggc 660atggtgcggt tcgccgctaa
gcgcggcgtt catgtcatcg gtgcgacatt gtcccgcaaa 720caggcggaat gggctcagaa
gatgattgcc catgaaggat tgggcgatct ggcggaagtc 780cgtttctgcg actaccgcga
tgtcacagag gcgggcttcg acgcagtgtc gtcgatcggc 840ctcactgaac acatcggttt
ggcgaactac ccgtcgtact tcggcttcct gaaggacaag 900ttgcggccag gcggacgact
gctgaaccat tgcatcactc gcccgaacaa ccttcaaagc 960aaccgcgcag gtgacttcat
tgaccggtac gttttccctg acggagagct cgccggacct 1020ggcttcatca tttcagctgt
ccacgacgcc ggtttcgagg tgcggcacga agagaacctc 1080cgcgagcact acgcactgac
gctgcgggac tggaaccgca acctcgctcg cgactgggac 1140gcgtgtgtgc acgcctccga
cgagggcacc gcccgcgtct ggggactgta catttccggt 1200tcacgagtcg cgtttgaaac
gaactcgatt cagctgcacc aggtcctggc ggtcaaaacc 1260gcgcggaatg gcgaagcgca
ggtcccgttg ggtcagtggt ggacccgctg a
131112436PRTUnknownAmycolicicoccus subflavus 12Met Lys Ala Val Leu Thr
Ala Phe Thr Ala Pro Gln Leu Glu Arg Met1 5
10 15Asn Val Ala Glu Ile Leu Ser Ala Val Leu Gly Arg
Asp Phe Pro Ile 20 25 30Arg
Phe Thr Ala Tyr Asp Gly Ser Ala Leu Gly Pro Glu Thr Ala Arg 35
40 45Tyr Gly Leu His Leu Thr Thr Pro Arg
Gly Leu Thr Tyr Leu Ala Thr 50 55
60Ala Pro Gly Asp Leu Gly Leu Ala Arg Ala Tyr Val Ser Gly Asp Leu65
70 75 80Glu Val Ser Gly Val
His Gln Gly Asp Pro Tyr Glu Ile Met Lys Ile 85
90 95Leu Ala His Asp Val Arg Val Arg Arg Pro Ser
Pro Ala Thr Ile Ala 100 105
110Ser Ile Met Arg Ser Leu Gly Trp Glu Arg Leu Arg Pro Val Ala Pro
115 120 125Pro Pro Gln Glu Asn Met Pro
Arg Trp Arg Arg Met Ala Leu Gly Leu 130 135
140Leu His Ser Lys Ser Arg Asp Ala Ala Ala Ile His His His Tyr
Asp145 150 155 160Val Ser
Asn Glu Phe Tyr Glu His Ile Leu Gly Pro Ser Met Thr Tyr
165 170 175Thr Cys Ala Ala Tyr Pro Ser
Ala Asp Ser Ser Leu Glu Glu Ala Gln 180 185
190Asp Asn Lys Tyr Arg Leu Val Phe Glu Lys Leu Gly Leu Lys
Ala Gly 195 200 205Asp Arg Leu Leu
Asp Val Gly Cys Gly Trp Gly Gly Met Val Arg Phe 210
215 220Ala Ala Lys Arg Gly Val His Val Ile Gly Ala Thr
Leu Ser Arg Lys225 230 235
240Gln Ala Glu Trp Ala Gln Lys Met Ile Ala His Glu Gly Leu Gly Asp
245 250 255Leu Ala Glu Val Arg
Phe Cys Asp Tyr Arg Asp Val Thr Glu Ala Gly 260
265 270Phe Asp Ala Val Ser Ser Ile Gly Leu Thr Glu His
Ile Gly Leu Ala 275 280 285Asn Tyr
Pro Ser Tyr Phe Gly Phe Leu Lys Asp Lys Leu Arg Pro Gly 290
295 300Gly Arg Leu Leu Asn His Cys Ile Thr Arg Pro
Asn Asn Leu Gln Ser305 310 315
320Asn Arg Ala Gly Asp Phe Ile Asp Arg Tyr Val Phe Pro Asp Gly Glu
325 330 335Leu Ala Gly Pro
Gly Phe Ile Ile Ser Ala Val His Asp Ala Gly Phe 340
345 350Glu Val Arg His Glu Glu Asn Leu Arg Glu His
Tyr Ala Leu Thr Leu 355 360 365Arg
Asp Trp Asn Arg Asn Leu Ala Arg Asp Trp Asp Ala Cys Val His 370
375 380Ala Ser Asp Glu Gly Thr Ala Arg Val Trp
Gly Leu Tyr Ile Ser Gly385 390 395
400Ser Arg Val Ala Phe Glu Thr Asn Ser Ile Gln Leu His Gln Val
Leu 405 410 415Ala Val Lys
Thr Ala Arg Asn Gly Glu Ala Gln Val Pro Leu Gly Gln 420
425 430Trp Trp Thr Arg
435131548DNACorynebacterium glutamicum 13atgagcggat tagttgaccc ggatagtact
tttttaaaga ccatcggaaa actgagcaac 60agcttgtcca ttggtcgtgg agtagatcaa
aaagaggtaa tccccaaagg ctggaacgcc 120cattgggagg caattacaaa gcttaagaga
agctttgacg cgattcctgc tggggagcgg 180gtgcgtttag ctaagaaaac ctccaacctg
ttccgtggac gctccgatgc aggtcacggc 240ctagatgtgg cagcgcttgg gggagtgatt
gccattgatc cggtcaatgc caccgccgat 300gtacagggca tgtgcacgta tgaagacctg
gtagatgcca ctttaagtta tggtctgatg 360ccgttggttg tgcctcaact gaaaaccatc
acgcttggtg gcgcagtgac cggaatgggc 420gtggaatcca catccttccg caacggtttg
ccacacgaat cagtgctgga gatggatatt 480tttaccggca ctggtgagat cgtgacttgc
tcgcccacag aaaatgtcga cctttacaga 540ggttttccca actcttatgg ttcgctggga
tacgcggtgc ggctaaaaat tgagctggaa 600ccagtgcaag attacgtcca gctgcgccac
gtgcgcttca acgatttaga gtctttgacc 660aaagcgattg aggaagtcgc gtcttctctg
gagtttgata accaacccgt cgattacctt 720gacggcgtgg tgttttcacc cacggaagcc
tacttagttc ttggcacgca aacctcacaa 780cctggcccca ccagcgatta caccagggat
ttaagctact accgctccct gcaacaccca 840gagggcatca cctatgaccg cctgacaatc
cgcgattaca tctggcgctg ggacaccgac 900tggttctggt gttcacgcgc attcggcacc
caaaaccccg tggtgcgcaa actctggccc 960agggatctgc tgcgctcgag tttctattgg
aagatcatcg gctgggatcg aaaatactcc 1020atcgctgatc gcctggaaga gcgcaaaggc
cgcccggcta gggaacgggt ggtccaagac 1080gtggaagtta cgattgataa actgccagaa
tttttgaaat ggttctttga aagcagcgac 1140atcgagccgc tgtggctgtg cccgatcaag
cttcgggagg taccaggtag ttcggttggt 1200gctggagaaa ttttgagctc cgctgaagca
atcgactccg gtgctgctga acacccttgg 1260ccgctgtatc ccttgaagaa ggacgtgctg
tgggtcaaca tcggattctg gtcctcagtg 1320ccggttgatc tgatgggctc cgatgcacca
gagggagcat ttaacagaga aatcgaacgc 1380gtcatggcag agctaggcgg acataaatcg
ctgtactccg aagcgttcta caccagggaa 1440gactttgaaa aactttatgg cggaaccatc
ccggcgctgc taaaaaagca gtgggatccc 1500cacagccgat tccccggttt gtatgaaaag
acagtaaaag gcgcctag 154814515PRTCorynebacterium glutamicum
14Met Ser Gly Leu Val Asp Pro Asp Ser Thr Phe Leu Lys Thr Ile Gly1
5 10 15Lys Leu Ser Asn Ser Leu
Ser Ile Gly Arg Gly Val Asp Gln Lys Glu 20 25
30Val Ile Pro Lys Gly Trp Asn Ala His Trp Glu Ala Ile
Thr Lys Leu 35 40 45Lys Arg Ser
Phe Asp Ala Ile Pro Ala Gly Glu Arg Val Arg Leu Ala 50
55 60Lys Lys Thr Ser Asn Leu Phe Arg Gly Arg Ser Asp
Ala Gly His Gly65 70 75
80Leu Asp Val Ala Ala Leu Gly Gly Val Ile Ala Ile Asp Pro Val Asn
85 90 95Ala Thr Ala Asp Val Gln
Gly Met Cys Thr Tyr Glu Asp Leu Val Asp 100
105 110Ala Thr Leu Ser Tyr Gly Leu Met Pro Leu Val Val
Pro Gln Leu Lys 115 120 125Thr Ile
Thr Leu Gly Gly Ala Val Thr Gly Met Gly Val Glu Ser Thr 130
135 140Ser Phe Arg Asn Gly Leu Pro His Glu Ser Val
Leu Glu Met Asp Ile145 150 155
160Phe Thr Gly Thr Gly Glu Ile Val Thr Cys Ser Pro Thr Glu Asn Val
165 170 175Asp Leu Tyr Arg
Gly Phe Pro Asn Ser Tyr Gly Ser Leu Gly Tyr Ala 180
185 190Val Arg Leu Lys Ile Glu Leu Glu Pro Val Gln
Asp Tyr Val Gln Leu 195 200 205Arg
His Val Arg Phe Asn Asp Leu Glu Ser Leu Thr Lys Ala Ile Glu 210
215 220Glu Val Ala Ser Ser Leu Glu Phe Asp Asn
Gln Pro Val Asp Tyr Leu225 230 235
240Asp Gly Val Val Phe Ser Pro Thr Glu Ala Tyr Leu Val Leu Gly
Thr 245 250 255Gln Thr Ser
Gln Pro Gly Pro Thr Ser Asp Tyr Thr Arg Asp Leu Ser 260
265 270Tyr Tyr Arg Ser Leu Gln His Pro Glu Gly
Ile Thr Tyr Asp Arg Leu 275 280
285Thr Ile Arg Asp Tyr Ile Trp Arg Trp Asp Thr Asp Trp Phe Trp Cys 290
295 300Ser Arg Ala Phe Gly Thr Gln Asn
Pro Val Val Arg Lys Leu Trp Pro305 310
315 320Arg Asp Leu Leu Arg Ser Ser Phe Tyr Trp Lys Ile
Ile Gly Trp Asp 325 330
335Arg Lys Tyr Ser Ile Ala Asp Arg Leu Glu Glu Arg Lys Gly Arg Pro
340 345 350Ala Arg Glu Arg Val Val
Gln Asp Val Glu Val Thr Ile Asp Lys Leu 355 360
365Pro Glu Phe Leu Lys Trp Phe Phe Glu Ser Ser Asp Ile Glu
Pro Leu 370 375 380Trp Leu Cys Pro Ile
Lys Leu Arg Glu Val Pro Gly Ser Ser Val Gly385 390
395 400Ala Gly Glu Ile Leu Ser Ser Ala Glu Ala
Ile Asp Ser Gly Ala Ala 405 410
415Glu His Pro Trp Pro Leu Tyr Pro Leu Lys Lys Asp Val Leu Trp Val
420 425 430Asn Ile Gly Phe Trp
Ser Ser Val Pro Val Asp Leu Met Gly Ser Asp 435
440 445Ala Pro Glu Gly Ala Phe Asn Arg Glu Ile Glu Arg
Val Met Ala Glu 450 455 460Leu Gly Gly
His Lys Ser Leu Tyr Ser Glu Ala Phe Tyr Thr Arg Glu465
470 475 480Asp Phe Glu Lys Leu Tyr Gly
Gly Thr Ile Pro Ala Leu Leu Lys Lys 485
490 495Gln Trp Asp Pro His Ser Arg Phe Pro Gly Leu Tyr
Glu Lys Thr Val 500 505 510Lys
Gly Ala 515151308DNACorynebacterium glutamicum 15atgagtaacg
ccgtagcgca ggacctcatg accatcgccg acatcgtcga ggccacgacc 60actgcaccca
tcccattcca catcactgcc ttcgatggaa gcttcactgg ccctgaagat 120gctccctacc
agctgtttgt tgccaacacg gatgcagtat cctacatcgc aacagcgcca 180ggagatttgg
gtttggcacg tgcctacctc atgggagacc tcatcgtgga aggtgagcat 240cccggccatc
cttatgggat ctttgatgcg ttgaaggagt tctaccgctg cttcaaacgc 300ccagatgcat
ccaccacctt gcagatcatg tggactctgc ggaaaatgaa tgccttaaaa 360ttccaggaaa
ttccaccaat ggaacaagcc cctgcatggc gtaaagcact gatcaacggg 420ctagcatcca
ggcactcgaa atcccgcgac aagaaagcca ttagctacca ctacgacgtg 480ggcaatgagt
tctactccct gtttttagat gattccatga cctatacctg cgcgtattat 540ccaacgccag
aatcaagttt ggaagaagcc caagaaaaca aataccgcct catctttgaa 600aaactgcgtc
tgaaagaagg cgatcgcctc ctagacgtgg gatgcggttg gggaggcatg 660gtccgctacg
ccgccaaaca cggtgtgaaa gccatcggag ttacgctgtc tgaacagcaa 720tatgagtggg
gtcaagcaga gatcaaacgc caaggtttgg aagacctcgc ggaaattcgc 780ttcatggatt
accgcgatgt tccagaaact ggattcgatg cgatctcagc aatcggcatc 840attgaacaca
tcggtgtgaa caactatccc gactactttg aattgctcag cagcaaactc 900aaaacaggcg
gactgatgct caaccacagc atcacctacc cagacaaccg cccccgccac 960gcaggtgcat
ttattgatcg ctacattttc cccgacggtg aactcactgg ctctggcacc 1020ctgatcaagc
acatgcagga caacggtttc gaagtgctgc acgaagaaaa cctccgcttt 1080gattaccaac
gcaccctgca cgcgtggtgc gaaaacctca aagaaaattg ggaggaagca 1140gttgaactcg
ccggtgaacc cactgcacga ctctttggcc tgtacatggc aggttcggaa 1200tggggatttg
cccacaacat cgtccagctg caccaagtac tgggtgtgaa actcgatgag 1260cagggaagtc
gcggagaagt tcctgaaaga atgtggtgga ctatctaa
130816435PRTCorynebacterium glutamicum 16Met Ser Asn Ala Val Ala Gln Asp
Leu Met Thr Ile Ala Asp Ile Val1 5 10
15Glu Ala Thr Thr Thr Ala Pro Ile Pro Phe His Ile Thr Ala
Phe Asp 20 25 30Gly Ser Phe
Thr Gly Pro Glu Asp Ala Pro Tyr Gln Leu Phe Val Ala 35
40 45Asn Thr Asp Ala Val Ser Tyr Ile Ala Thr Ala
Pro Gly Asp Leu Gly 50 55 60Leu Ala
Arg Ala Tyr Leu Met Gly Asp Leu Ile Val Glu Gly Glu His65
70 75 80Pro Gly His Pro Tyr Gly Ile
Phe Asp Ala Leu Lys Glu Phe Tyr Arg 85 90
95Cys Phe Lys Arg Pro Asp Ala Ser Thr Thr Leu Gln Ile
Met Trp Thr 100 105 110Leu Arg
Lys Met Asn Ala Leu Lys Phe Gln Glu Ile Pro Pro Met Glu 115
120 125Gln Ala Pro Ala Trp Arg Lys Ala Leu Ile
Asn Gly Leu Ala Ser Arg 130 135 140His
Ser Lys Ser Arg Asp Lys Lys Ala Ile Ser Tyr His Tyr Asp Val145
150 155 160Gly Asn Glu Phe Tyr Ser
Leu Phe Leu Asp Asp Ser Met Thr Tyr Thr 165
170 175Cys Ala Tyr Tyr Pro Thr Pro Glu Ser Ser Leu Glu
Glu Ala Gln Glu 180 185 190Asn
Lys Tyr Arg Leu Ile Phe Glu Lys Leu Arg Leu Lys Glu Gly Asp 195
200 205Arg Leu Leu Asp Val Gly Cys Gly Trp
Gly Gly Met Val Arg Tyr Ala 210 215
220Ala Lys His Gly Val Lys Ala Ile Gly Val Thr Leu Ser Glu Gln Gln225
230 235 240Tyr Glu Trp Gly
Gln Ala Glu Ile Lys Arg Gln Gly Leu Glu Asp Leu 245
250 255Ala Glu Ile Arg Phe Met Asp Tyr Arg Asp
Val Pro Glu Thr Gly Phe 260 265
270Asp Ala Ile Ser Ala Ile Gly Ile Ile Glu His Ile Gly Val Asn Asn
275 280 285Tyr Pro Asp Tyr Phe Glu Leu
Leu Ser Ser Lys Leu Lys Thr Gly Gly 290 295
300Leu Met Leu Asn His Ser Ile Thr Tyr Pro Asp Asn Arg Pro Arg
His305 310 315 320Ala Gly
Ala Phe Ile Asp Arg Tyr Ile Phe Pro Asp Gly Glu Leu Thr
325 330 335Gly Ser Gly Thr Leu Ile Lys
His Met Gln Asp Asn Gly Phe Glu Val 340 345
350Leu His Glu Glu Asn Leu Arg Phe Asp Tyr Gln Arg Thr Leu
His Ala 355 360 365Trp Cys Glu Asn
Leu Lys Glu Asn Trp Glu Glu Ala Val Glu Leu Ala 370
375 380Gly Glu Pro Thr Ala Arg Leu Phe Gly Leu Tyr Met
Ala Gly Ser Glu385 390 395
400Trp Gly Phe Ala His Asn Ile Val Gln Leu His Gln Val Leu Gly Val
405 410 415Lys Leu Asp Glu Gln
Gly Ser Arg Gly Glu Val Pro Glu Arg Met Trp 420
425 430Trp Thr Ile
435171458DNAUnknownCorynebacterium glyciniphilium 17gtgaccgtcg ccggcaggat
cactgacgcg gtacgcatag gaaatggact tgaccagcga 60gatctagccc ccgtcgggtg
gtacgcacac gaacaggccg tggcgcgact gaaggccagt 120ttcgacgcgg tccccgccgg
gcgtcgcgtg cggctggcga agaagacgtc caaccttttc 180cgcgggcgtt ccggcgaggc
agtcgggctc gacgtgtcgg ggctgcacgg cgtcatcgcc 240gtcgaccccg ttgagggcac
cgctgacgtc cagggcatgt gcacgtacga ggacctggtg 300gacgtcctgc tgccctacgg
tctggcgccc accgtcgttc cgcagctgaa gaccatcact 360ctcggcggtg cggtgaccgg
catgggggtg gaatccacct ccttccgcaa cggcctgccg 420cacgaagccg tcctggaaat
ggatgtgctc accggtaccg gagacatcct cacctgttcg 480ccgacccaga acaccgacct
ctaccgcggc ttccccaact cctacggttc cctgggatac 540agcgtgcggc tgaaggtgcg
gtgcgaacgg gtggaaccct acgtcgacct gcggcatgta 600cgcttcgatg acgttcagtc
gctcaccgac gccctcgaca acatcgtcgt ggacaaggag 660tacgagggtg aacgggtcga
ctatctcgac ggtgtggtct tcagcctgga ggagagctac 720ctcgtcctgg gacgggcgac
cagcgaggcc ggccccgtta gcgactacac ccgcgagcgc 780agttactacc gttctctgca
gcatccgtcg ggggtcctgc gcgacaagtt gaccatccgc 840gactacctct ggcggtggga
cgtcgactgg ttctggtgca accgggcctt cggtacccag 900aaccccacca tccgtactct
gtggccgcgg gatctcctgc ggtcgagctt ctactggaag 960atcatcggct gggaccgacg
cttcgacatc gcggaccgga tcgaggcaca caacgggcgc 1020cccgcacgcg agcgcgtcgt
ccaggacatc gaggtcaccc ccgacaacct gccggagttc 1080ctcacgtggt tcttcaccca
ctgcgagatc gagccggtgt ggctgtgccc cattcgactg 1140gccgacgact cgggcgagcg
gacaccgtgg cccctgtacc cgctgtcacc cggcgacacc 1200tgggtcaacg tgggattctg
gagctcggtg cccgccgacc tgatggggaa ggacgccccg 1260accggagcct tcaaccggga
ggtggagaga gtcgtctcgg acctcggcgg acacaagtcg 1320ttgtactccg aggcattcta
ttctgaggaa cagttcgccg ccctctacgg cggtgaacgt 1380cccgcacaac tcaaggcggt
cttcgacccg gatgaccggt tccccgggtt gtacgagaag 1440accgtgggcg gcgtctga
145818485PRTUnknownCorynebacterium glyciniphilium 18Val Thr Val Ala Gly
Arg Ile Thr Asp Ala Val Arg Ile Gly Asn Gly1 5
10 15Leu Asp Gln Arg Asp Leu Ala Pro Val Gly Trp
Tyr Ala His Glu Gln 20 25
30Ala Val Ala Arg Leu Lys Ala Ser Phe Asp Ala Val Pro Ala Gly Arg
35 40 45Arg Val Arg Leu Ala Lys Lys Thr
Ser Asn Leu Phe Arg Gly Arg Ser 50 55
60Gly Glu Ala Val Gly Leu Asp Val Ser Gly Leu His Gly Val Ile Ala65
70 75 80Val Asp Pro Val Glu
Gly Thr Ala Asp Val Gln Gly Met Cys Thr Tyr 85
90 95Glu Asp Leu Val Asp Val Leu Leu Pro Tyr Gly
Leu Ala Pro Thr Val 100 105
110Val Pro Gln Leu Lys Thr Ile Thr Leu Gly Gly Ala Val Thr Gly Met
115 120 125Gly Val Glu Ser Thr Ser Phe
Arg Asn Gly Leu Pro His Glu Ala Val 130 135
140Leu Glu Met Asp Val Leu Thr Gly Thr Gly Asp Ile Leu Thr Cys
Ser145 150 155 160Pro Thr
Gln Asn Thr Asp Leu Tyr Arg Gly Phe Pro Asn Ser Tyr Gly
165 170 175Ser Leu Gly Tyr Ser Val Arg
Leu Lys Val Arg Cys Glu Arg Val Glu 180 185
190Pro Tyr Val Asp Leu Arg His Val Arg Phe Asp Asp Val Gln
Ser Leu 195 200 205Thr Asp Ala Leu
Asp Asn Ile Val Val Asp Lys Glu Tyr Glu Gly Glu 210
215 220Arg Val Asp Tyr Leu Asp Gly Val Val Phe Ser Leu
Glu Glu Ser Tyr225 230 235
240Leu Val Leu Gly Arg Ala Thr Ser Glu Ala Gly Pro Val Ser Asp Tyr
245 250 255Thr Arg Glu Arg Ser
Tyr Tyr Arg Ser Leu Gln His Pro Ser Gly Val 260
265 270Leu Arg Asp Lys Leu Thr Ile Arg Asp Tyr Leu Trp
Arg Trp Asp Val 275 280 285Asp Trp
Phe Trp Cys Asn Arg Ala Phe Gly Thr Gln Asn Pro Thr Ile 290
295 300Arg Thr Leu Trp Pro Arg Asp Leu Leu Arg Ser
Ser Phe Tyr Trp Lys305 310 315
320Ile Ile Gly Trp Asp Arg Arg Phe Asp Ile Ala Asp Arg Ile Glu Ala
325 330 335His Asn Gly Arg
Pro Ala Arg Glu Arg Val Val Gln Asp Ile Glu Val 340
345 350Thr Pro Asp Asn Leu Pro Glu Phe Leu Thr Trp
Phe Phe Thr His Cys 355 360 365Glu
Ile Glu Pro Val Trp Leu Cys Pro Ile Arg Leu Ala Asp Asp Ser 370
375 380Gly Glu Arg Thr Pro Trp Pro Leu Tyr Pro
Leu Ser Pro Gly Asp Thr385 390 395
400Trp Val Asn Val Gly Phe Trp Ser Ser Val Pro Ala Asp Leu Met
Gly 405 410 415Lys Asp Ala
Pro Thr Gly Ala Phe Asn Arg Glu Val Glu Arg Val Val 420
425 430Ser Asp Leu Gly Gly His Lys Ser Leu Tyr
Ser Glu Ala Phe Tyr Ser 435 440
445Glu Glu Gln Phe Ala Ala Leu Tyr Gly Gly Glu Arg Pro Ala Gln Leu 450
455 460Lys Ala Val Phe Asp Pro Asp Asp
Arg Phe Pro Gly Leu Tyr Glu Lys465 470
475 480Thr Val Gly Gly Val
485
SEQ ID NO: 19; length: 1368; type: DNA; organism: Unknown; other information: Corynebacterium glyciniphilium
atgagcaggg gattcacgcc
gctgacggtg ggacagatcg tggacaaggt catcacaccg 60ccggcaccgt tccgggtgac
cgctttcgac ggatccaccg cggggccggc agacgcggaa 120ctggcactgg agatcacatc
gccggacgcc ctggcctata tcgtgaccgc gccgggcgac 180ctcggactgg cacgcgccta
catcaccgga agcctccgcg tcaccggtga cgagcccggc 240cacccgtacc tcgtctttga
ccacctccag cacctttacg accagatccg acgcccctcg 300gcgaaggacc tgctggatat
cgcccgctcg ctgaaggcca tgggggcgat caaggtgcag 360ccggcaccgg agcaggagac
cctcccgggc tggaagaggg ccatactcga ggggctgtcc 420cggcactctc cggaacggga
caaggaggtc gtgagccgcc actacgacgt gggcaatgac 480ttctacgagc tcttcctcgg
cgattccatg gcctacacct gtgcctacta tcccgagttt 540gacggtgaga accaggtcac
cggtcccacc ggcgggtggc ggtacgacga ctgggagaaa 600gggccgaccg ccaacgggcc
gttgacccag gcgcaggaca acaagcatcg cctggtcttc 660gacaagctgc gactcaaccc
gggtgaccgg ttgttggacg tcggctgcgg gtggggcggt 720atggtgcggt acgccgcccg
ccacggcgtg aaggccatcg gtgtcacgct gtcccgagag 780cagtacgagt ggggtaaggc
gaagatcgag gaggagggtc tgcaggacct cgccgaggtc 840cggtgtatgg actaccgtga
cgtgccggag tccgacttcg acgcggtcag tgccatcggc 900atcctcgagc acatcggcgt
gcccaactac gaggactact tcacccgcct gttcgccaag 960ctgcgcccgg gcggtcggat
gctgaaccac tgcatcaccc gtccgcacaa ccggaagacg 1020aagaccggcc agttcatcga
ccgctacatc ttccccgacg gtgagctgac cggctcgggc 1080cggatcatca cgatcatgca
ggacaccgga ttcgacgtcg tccacgagga gaatctgcga 1140ccgcactacc agcgcacgtt
gcatgactgg tgtgaactgt tggccaccaa ctgggaccag 1200gccgtccatc tcgtgggcga
ggagacggct cgtctgttcg gcctgtacat ggcggggtcg 1260gaatggggtt tcgaacacaa
cgtgatccag ctccaccagg ttctcggcgt gaagccggac 1320gcggcaggca gttccggggt
gccggtccgc cagtggtgga ggtcctga
1368
SEQ ID NO: 20; length: 455; type: PRT; organism: Unknown; other information: Corynebacterium glyciniphilium
Met Ser Arg Gly Phe
Thr Pro Leu Thr Val Gly Gln Ile Val Asp Lys1 5
10 15Val Ile Thr Pro Pro Ala Pro Phe Arg Val Thr
Ala Phe Asp Gly Ser 20 25
30Thr Ala Gly Pro Ala Asp Ala Glu Leu Ala Leu Glu Ile Thr Ser Pro
35 40 45Asp Ala Leu Ala Tyr Ile Val Thr
Ala Pro Gly Asp Leu Gly Leu Ala 50 55
60Arg Ala Tyr Ile Thr Gly Ser Leu Arg Val Thr Gly Asp Glu Pro Gly65
70 75 80His Pro Tyr Leu Val
Phe Asp His Leu Gln His Leu Tyr Asp Gln Ile 85
90 95Arg Arg Pro Ser Ala Lys Asp Leu Leu Asp Ile
Ala Arg Ser Leu Lys 100 105
110Ala Met Gly Ala Ile Lys Val Gln Pro Ala Pro Glu Gln Glu Thr Leu
115 120 125Pro Gly Trp Lys Arg Ala Ile
Leu Glu Gly Leu Ser Arg His Ser Pro 130 135
140Glu Arg Asp Lys Glu Val Val Ser Arg His Tyr Asp Val Gly Asn
Asp145 150 155 160Phe Tyr
Glu Leu Phe Leu Gly Asp Ser Met Ala Tyr Thr Cys Ala Tyr
165 170 175Tyr Pro Glu Phe Asp Gly Glu
Asn Gln Val Thr Gly Pro Thr Gly Gly 180 185
190Trp Arg Tyr Asp Asp Trp Glu Lys Gly Pro Thr Ala Asn Gly
Pro Leu 195 200 205Thr Gln Ala Gln
Asp Asn Lys His Arg Leu Val Phe Asp Lys Leu Arg 210
215 220Leu Asn Pro Gly Asp Arg Leu Leu Asp Val Gly Cys
Gly Trp Gly Gly225 230 235
240Met Val Arg Tyr Ala Ala Arg His Gly Val Lys Ala Ile Gly Val Thr
245 250 255Leu Ser Arg Glu Gln
Tyr Glu Trp Gly Lys Ala Lys Ile Glu Glu Glu 260
265 270Gly Leu Gln Asp Leu Ala Glu Val Arg Cys Met Asp
Tyr Arg Asp Val 275 280 285Pro Glu
Ser Asp Phe Asp Ala Val Ser Ala Ile Gly Ile Leu Glu His 290
295 300Ile Gly Val Pro Asn Tyr Glu Asp Tyr Phe Thr
Arg Leu Phe Ala Lys305 310 315
320Leu Arg Pro Gly Gly Arg Met Leu Asn His Cys Ile Thr Arg Pro His
325 330 335Asn Arg Lys Thr
Lys Thr Gly Gln Phe Ile Asp Arg Tyr Ile Phe Pro 340
345 350Asp Gly Glu Leu Thr Gly Ser Gly Arg Ile Ile
Thr Ile Met Gln Asp 355 360 365Thr
Gly Phe Asp Val Val His Glu Glu Asn Leu Arg Pro His Tyr Gln 370
375 380Arg Thr Leu His Asp Trp Cys Glu Leu Leu
Ala Thr Asn Trp Asp Gln385 390 395
400Ala Val His Leu Val Gly Glu Glu Thr Ala Arg Leu Phe Gly Leu
Tyr 405 410 415Met Ala Gly
Ser Glu Trp Gly Phe Glu His Asn Val Ile Gln Leu His 420
425 430Gln Val Leu Gly Val Lys Pro Asp Ala Ala
Gly Ser Ser Gly Val Pro 435 440
445Val Arg Gln Trp Trp Arg Ser 450
455
SEQ ID NO: 21; length: 588; type: DNA; organism: Unknown; other information: Corynebacterium glyciniphilium
gtggcggtgc tgtgcacacc
gttgctgctc ggagcctgca ccatcggcga cgcgggaccg 60ggggacgaga ccacggaccc
tgtcgtggac actgaagcac cgcccgataa accggtgccg 120gactctgcgg cggaatccgg
cgctgaagac ggacctgatt ctgaggtgcc ggacgacccc 180gaccagcctg atgctgagcc
ggtggagact gatcccgacg ccccgggggc ccggggactg 240gcgatcggtg actgcgtcgc
cgacatggac cagctcgacg gcaccggcga catcgacgtc 300gtcgactgcg ccggccccca
tgccggcgag gtgtacgcac aggcggatat cgcaggtaag 360aacctgttcc ccggcaacga
gccgttgggg caggaggcgg gagcgatctg cgggggtgac 420tccttcaccg gctatgtcgg
catcggattc cccgagtcct cgctggacgt cgtcacgatg 480atgccgtcca aggagagctg
ggcgcaggag gaccggacgg tgacctgtgt ggtcaccgac 540ccgaacctcg agcagatcgc
cggcacgctc gagcagagct ggcgttag
588
SEQ ID NO: 22; length: 195; type: PRT; organism: Unknown; other information: Corynebacterium glyciniphilium
Val Ala Val Leu Cys
Thr Pro Leu Leu Leu Gly Ala Cys Thr Ile Gly1 5
10 15Asp Ala Gly Pro Gly Asp Glu Thr Thr Asp Pro
Val Val Asp Thr Glu 20 25
30Ala Pro Pro Asp Lys Pro Val Pro Asp Ser Ala Ala Glu Ser Gly Ala
35 40 45Glu Asp Gly Pro Asp Ser Glu Val
Pro Asp Asp Pro Asp Gln Pro Asp 50 55
60Ala Glu Pro Val Glu Thr Asp Pro Asp Ala Pro Gly Ala Arg Gly Leu65
70 75 80Ala Ile Gly Asp Cys
Val Ala Asp Met Asp Gln Leu Asp Gly Thr Gly 85
90 95Asp Ile Asp Val Val Asp Cys Ala Gly Pro His
Ala Gly Glu Val Tyr 100 105
110Ala Gln Ala Asp Ile Ala Gly Lys Asn Leu Phe Pro Gly Asn Glu Pro
115 120 125Leu Gly Gln Glu Ala Gly Ala
Ile Cys Gly Gly Asp Ser Phe Thr Gly 130 135
140Tyr Val Gly Ile Gly Phe Pro Glu Ser Ser Leu Asp Val Val Thr
Met145 150 155 160Met Pro
Ser Lys Glu Ser Trp Ala Gln Glu Asp Arg Thr Val Thr Cys
165 170 175Val Val Thr Asp Pro Asn Leu
Glu Gln Ile Ala Gly Thr Leu Glu Gln 180 185
190Ser Trp Arg 195
SEQ ID NO: 23; length: 1395; type: DNA; organism: Unknown; other information: Knoella aerolata
atgagcatgg accggaccgg accggccagg gtgcggaccg tgggggagcg gcggctgctc
60gagagcttcg ccgccgtccc cccgggcgaa cgcgtgcggc tggccaagcg cacgtccaac
120ctcttccgcg cccgggaggg cacctcgaca cgcgggctcg acacgagcgg actgaccggc
180gtgcgcgtgg tcgacgcagg caccctcacg gccgacgtcg acggaatgtg cacgtacgag
240gacctcgtcg ccgcaacgct gccgctcggg ctcgcgccgc tcgtcgtgcc ccagctgcgg
300accatcaccg tcggcggggc ggtcaccggt ctcgggatcg agtcgacgtc gttccgcaac
360gggttgccgc acgagtccgt cctcgagatg gacgtcctca cgggtgccgg cgagatcgtc
420actgccacag cggacaacga gcacgccgac ctcttccgcg gcttccccaa ctcctacggg
480tcgctgggct acgcgacgtg cctgcgcatc gagctcgagc gtgtgggtac ctgtgtggag
540gtgaggcacg tccgcttcca cgacctcgac gccctgtgcg ccgccatcgc cgaggtcgtg
600gcgacgagat cgcacgaggg cgaggaggtc gaccacgtgg acggggtggt cttctcccgc
660gacgaggcgt acctcacgct gggtcgtcac tccgaccgga ccggaccgac cagcgactac
720accgggcagc aggtctacta ccggtcgatc cagcacgacg gcccctctcc acggcgcgac
780ctgctcacca ctcacgacta cctctggcgc tgggacaccg actggttctg gtgctcgcgc
840gccttcgggg cccaggaccc gcgcgtccgg cggtggtggc cgcgccggtg gcgccggtcg
900agcgtgtact ggaggctcgt ggcggcggac cggcgcgtcg ggttctcgga ccgcctcgag
960gcacgtcggg gcaacccgcc gcgggagcgg gtggtccagg acgtcgagat cccgctcggg
1020cagaccgcgg ccttcctcca ctggttcctc gacgaggtgc cgatcgaacc gatctggctg
1080tgcccgttgc gtcttcgcga ccatcagagg tggccgctct atccgctcga gcccggacgc
1140acctacgtca acgtggggtt ctggtcgacc gtgccggggc ccggaccggg cgaggagctg
1200ggcgccacca accgcgccat cgagcgccgt gtcgacgagg tcggcggcca caagtccctg
1260tactccgact cctactactc ccggtccgac ttcgacgccc tctacggcgg ggacgcgtat
1320gccgtgctga aggccaccta cgacccggac gggcggttcc ctcacctcta cgacaaggcg
1380gtgcgacacg catga
1395
SEQ ID NO: 24; length: 464; type: PRT; organism: Unknown; other information: Knoella aerolata
Met Ser Met Asp Arg Thr Gly Pro Ala
Arg Val Arg Thr Val Gly Glu1 5 10
15Arg Arg Leu Leu Glu Ser Phe Ala Ala Val Pro Pro Gly Glu Arg
Val 20 25 30Arg Leu Ala Lys
Arg Thr Ser Asn Leu Phe Arg Ala Arg Glu Gly Thr 35
40 45Ser Thr Arg Gly Leu Asp Thr Ser Gly Leu Thr Gly
Val Arg Val Val 50 55 60Asp Ala Gly
Thr Leu Thr Ala Asp Val Asp Gly Met Cys Thr Tyr Glu65 70
75 80Asp Leu Val Ala Ala Thr Leu Pro
Leu Gly Leu Ala Pro Leu Val Val 85 90
95Pro Gln Leu Arg Thr Ile Thr Val Gly Gly Ala Val Thr Gly
Leu Gly 100 105 110Ile Glu Ser
Thr Ser Phe Arg Asn Gly Leu Pro His Glu Ser Val Leu 115
120 125Glu Met Asp Val Leu Thr Gly Ala Gly Glu Ile
Val Thr Ala Thr Ala 130 135 140Asp Asn
Glu His Ala Asp Leu Phe Arg Gly Phe Pro Asn Ser Tyr Gly145
150 155 160Ser Leu Gly Tyr Ala Thr Cys
Leu Arg Ile Glu Leu Glu Arg Val Gly 165
170 175Thr Cys Val Glu Val Arg His Val Arg Phe His Asp
Leu Asp Ala Leu 180 185 190Cys
Ala Ala Ile Ala Glu Val Val Ala Thr Arg Ser His Glu Gly Glu 195
200 205Glu Val Asp His Val Asp Gly Val Val
Phe Ser Arg Asp Glu Ala Tyr 210 215
220Leu Thr Leu Gly Arg His Ser Asp Arg Thr Gly Pro Thr Ser Asp Tyr225
230 235 240Thr Gly Gln Gln
Val Tyr Tyr Arg Ser Ile Gln His Asp Gly Pro Ser 245
250 255Pro Arg Arg Asp Leu Leu Thr Thr His Asp
Tyr Leu Trp Arg Trp Asp 260 265
270Thr Asp Trp Phe Trp Cys Ser Arg Ala Phe Gly Ala Gln Asp Pro Arg
275 280 285Val Arg Arg Trp Trp Pro Arg
Arg Trp Arg Arg Ser Ser Val Tyr Trp 290 295
300Arg Leu Val Ala Ala Asp Arg Arg Val Gly Phe Ser Asp Arg Leu
Glu305 310 315 320Ala Arg
Arg Gly Asn Pro Pro Arg Glu Arg Val Val Gln Asp Val Glu
325 330 335Ile Pro Leu Gly Gln Thr Ala
Ala Phe Leu His Trp Phe Leu Asp Glu 340 345
350Val Pro Ile Glu Pro Ile Trp Leu Cys Pro Leu Arg Leu Arg
Asp His 355 360 365Gln Arg Trp Pro
Leu Tyr Pro Leu Glu Pro Gly Arg Thr Tyr Val Asn 370
375 380Val Gly Phe Trp Ser Thr Val Pro Gly Pro Gly Pro
Gly Glu Glu Leu385 390 395
400Gly Ala Thr Asn Arg Ala Ile Glu Arg Arg Val Asp Glu Val Gly Gly
405 410 415His Lys Ser Leu Tyr
Ser Asp Ser Tyr Tyr Ser Arg Ser Asp Phe Asp 420
425 430Ala Leu Tyr Gly Gly Asp Ala Tyr Ala Val Leu Lys
Ala Thr Tyr Asp 435 440 445Pro Asp
Gly Arg Phe Pro His Leu Tyr Asp Lys Ala Val Arg His Ala 450
455 460
SEQ ID NO: 25; length: 1284; type: DNA; organism: Unknown; other information: Knoella aerolata
atgagccaca
cgaccgatga gatccgcacg gtcgccgacc tcgtcgacga ggtggtcgtc 60ggcccgctgc
cggtgcgggt cacggcctac gacgggtcga agacggggcc ggacagcgcc 120ccgcgaacca
tccacatcgc caaccagcga gcggtcgcct acctcgccac cgcgcccggg 180gacctcggca
tggcccgcgc ctacaccacc ggtgacctcg tcgtcgaggg cgtgcacccg 240ggcaacccct
acgaggccct ggtcgacctc gaacgtgtgc acttccgccg cccggacccg 300cggctgctcc
tcgacctcgc gcgcatcgtc gggccacgca acctcgcgcc cccgcccccg 360ccgccgcagg
aggctgtgcc gaggtggcgg cgggtggccg agggcctgcg ccactcgtac 420gggcgggaca
gcgaggcgat ccgccaccac tacgacgtct ccaaccactt ctacgagcag 480gtgctcggcc
cgagcatgac ctacacctgc gcggtcttcc ccgaccacga caccgggctc 540gacgaggcgc
aggaggagaa gtaccgcctc gtcttcgaga agctcgcgct gcgtcccggt 600gaccggttgc
tcgacatcgg ctgcgggtgg ggcgggatgg tccggtacgc cgcacggcgg 660ggggtgcgag
cgctcggcgt gacactgtcc ggtgagcagg cggcgtgggc acaggtcgcc 720atcgcccgcg
aggggctggg ggagctcgcc gccgtccggc acgaggacta ccgccacgtc 780gccgagaccg
ggttcgacgc catctcctcg atcggcatca ccgagcacat cggggtgcgc 840aactacccca
cgtacttcga ctggatgctc caccacgtca agccgggagg gctcgtgctc 900aaccactgca
tcaccagacc cgagaaccgg gccaagagcg tcggccggtt catcgaccgc 960tacatcttcc
ccgacggcga gctcaccggg tccggccgga tcatcacgac catgcaggac 1020aacggtttcg
aggtcgtgca ctccgagaac ctgcgagagc actacgccct caccctggcg 1080gcctggggcg
agaacctcgt cgagcactgg gcctcctgcg tggccgacgt gggggagggg 1140acggcgaagg
tctggggcct ctacctcgcg ggctcgcgtc gtggcttcga gcgcaacgtc 1200gtccagctgc
accaggtgct ggccgcgagg ccggtgccgt cccgactccc gcaggtgccg 1260ctgcgccagt
ggtggacctc gtga
1284
SEQ ID NO: 26; length: 427; type: PRT; organism: Unknown; other information: Knoella aerolata
Met Ser His Thr Thr Asp Glu Ile Arg
Thr Val Ala Asp Leu Val Asp1 5 10
15Glu Val Val Val Gly Pro Leu Pro Val Arg Val Thr Ala Tyr Asp
Gly 20 25 30Ser Lys Thr Gly
Pro Asp Ser Ala Pro Arg Thr Ile His Ile Ala Asn 35
40 45Gln Arg Ala Val Ala Tyr Leu Ala Thr Ala Pro Gly
Asp Leu Gly Met 50 55 60Ala Arg Ala
Tyr Thr Thr Gly Asp Leu Val Val Glu Gly Val His Pro65 70
75 80Gly Asn Pro Tyr Glu Ala Leu Val
Asp Leu Glu Arg Val His Phe Arg 85 90
95Arg Pro Asp Pro Arg Leu Leu Leu Asp Leu Ala Arg Ile Val
Gly Pro 100 105 110Arg Asn Leu
Ala Pro Pro Pro Pro Pro Pro Gln Glu Ala Val Pro Arg 115
120 125Trp Arg Arg Val Ala Glu Gly Leu Arg His Ser
Tyr Gly Arg Asp Ser 130 135 140Glu Ala
Ile Arg His His Tyr Asp Val Ser Asn His Phe Tyr Glu Gln145
150 155 160Val Leu Gly Pro Ser Met Thr
Tyr Thr Cys Ala Val Phe Pro Asp His 165
170 175Asp Thr Gly Leu Asp Glu Ala Gln Glu Glu Lys Tyr
Arg Leu Val Phe 180 185 190Glu
Lys Leu Ala Leu Arg Pro Gly Asp Arg Leu Leu Asp Ile Gly Cys 195
200 205Gly Trp Gly Gly Met Val Arg Tyr Ala
Ala Arg Arg Gly Val Arg Ala 210 215
220Leu Gly Val Thr Leu Ser Gly Glu Gln Ala Ala Trp Ala Gln Val Ala225
230 235 240Ile Ala Arg Glu
Gly Leu Gly Glu Leu Ala Ala Val Arg His Glu Asp 245
250 255Tyr Arg His Val Ala Glu Thr Gly Phe Asp
Ala Ile Ser Ser Ile Gly 260 265
270Ile Thr Glu His Ile Gly Val Arg Asn Tyr Pro Thr Tyr Phe Asp Trp
275 280 285Met Leu His His Val Lys Pro
Gly Gly Leu Val Leu Asn His Cys Ile 290 295
300Thr Arg Pro Glu Asn Arg Ala Lys Ser Val Gly Arg Phe Ile Asp
Arg305 310 315 320Tyr Ile
Phe Pro Asp Gly Glu Leu Thr Gly Ser Gly Arg Ile Ile Thr
325 330 335Thr Met Gln Asp Asn Gly Phe
Glu Val Val His Ser Glu Asn Leu Arg 340 345
350Glu His Tyr Ala Leu Thr Leu Ala Ala Trp Gly Glu Asn Leu
Val Glu 355 360 365His Trp Ala Ser
Cys Val Ala Asp Val Gly Glu Gly Thr Ala Lys Val 370
375 380Trp Gly Leu Tyr Leu Ala Gly Ser Arg Arg Gly Phe
Glu Arg Asn Val385 390 395
400Val Gln Leu His Gln Val Leu Ala Ala Arg Pro Val Pro Ser Arg Leu
405 410 415Pro Gln Val Pro Leu
Arg Gln Trp Trp Thr Ser 420
425
SEQ ID NO: 27; length: 1392; type: DNA; organism: Mycobacterium austroafricanum
gtgtctgttc cttcgaccga
cgcacgttct gctcacgccg acggcgtgca gcggcttctc 60gccagctatc gggcgattcc
ccaagacgcc acggtccggc tggccaaacc cacgtcgaac 120ctcttccgtg cccgcgcgaa
aaccaggacc aagggtctgg acacgtctgg gttgacgaac 180gtgatcgcgg tcgacgcgga
ggcacgcacc gccgatgtgg cagggatgtg cacctacgaa 240gacctggtcg cggccacgct
gccgcatgga ctttcgccgc tggtggtgcc gcagttgaag 300acgatcaccc tcggcggggc
ggtcaccgga ctcgggatcg agtccgcctc gttccgcaac 360ggcctgccac acgaatcggt
tctcgagatg gacgtcctca ccggcaccgg tgatgtcgtg 420cgcgcctccc ccgacgagaa
ccctgacctg tttcgggcgt ttccgaattc ctatggcacg 480ttgggctatt cggttcggct
caagatcgag ctggaaccgg tgaagccgtt cgtcgcgctg 540cgccacctcc gtttccattc
gctgtcggct ctcatcgagg cgatggaccg catcgtcgaa 600accggcggcc tcaacggcga
accggtggac tacctcgacg gcgtcgtgtt cagtgccgag 660gagagttacc tgtgcgtggg
gcagcgctcc gcgacaccgg gcccggtcag cgactacacg 720ggcaagcaga tctactaccg
ctcgattcag cacgacggcc cgaccgatgg cgccgagaag 780cacgaccggc tgaccatcca
cgactacctg tggcgctggg acaccgactg gttctggtgc 840tcaagggcat tcggcgcgca
gaacccgcgg atccggcgct ggtggccgcg ccggtaccgg 900cgcagcagtg tgtactggaa
gctgatcggc tacgaccggc gtttcggtat cgccgatcgc 960atcgagaagc gcaacggccg
acccccgcgc gagcgggtgg tccaggacat cgaggtgccc 1020atcgagcgga ccgtcgagtt
tctgcagtgg tttctcgaca ccgtgcccat cgaaccgatc 1080tggttgtgcc cgttgcggct
ccgcgacgac cgcgattggc ccctgtatcc gatccgaccc 1140caccacacct acgtcaacgt
gggtttctgg tcgtcggtgc cggtgggccc ggaggagggc 1200tacaccaaca ggatgatcga
acggaaagtc agcgacctcg acggtcacaa atcgctgtat 1260tccgatgcgt actactcgcc
ggaagagttt gattcgctct atggcgggga gacgtacaag 1320acggtgaaga agacatacga
cccagactct cgtttcctgg acctgtacgg caaagcagtg 1380gggcggcaat ga
1392
SEQ ID NO: 28; length: 463; type: PRT; organism: Mycobacterium austroafricanum
Val Ser Val Pro Ser Thr Asp Ala Arg Ser Ala His Ala Asp
Gly Val1 5 10 15Gln Arg
Leu Leu Ala Ser Tyr Arg Ala Ile Pro Gln Asp Ala Thr Val 20
25 30Arg Leu Ala Lys Pro Thr Ser Asn Leu
Phe Arg Ala Arg Ala Lys Thr 35 40
45Arg Thr Lys Gly Leu Asp Thr Ser Gly Leu Thr Asn Val Ile Ala Val 50
55 60Asp Ala Glu Ala Arg Thr Ala Asp Val
Ala Gly Met Cys Thr Tyr Glu65 70 75
80Asp Leu Val Ala Ala Thr Leu Pro His Gly Leu Ser Pro Leu
Val Val 85 90 95Pro Gln
Leu Lys Thr Ile Thr Leu Gly Gly Ala Val Thr Gly Leu Gly 100
105 110Ile Glu Ser Ala Ser Phe Arg Asn Gly
Leu Pro His Glu Ser Val Leu 115 120
125Glu Met Asp Val Leu Thr Gly Thr Gly Asp Val Val Arg Ala Ser Pro
130 135 140Asp Glu Asn Pro Asp Leu Phe
Arg Ala Phe Pro Asn Ser Tyr Gly Thr145 150
155 160Leu Gly Tyr Ser Val Arg Leu Lys Ile Glu Leu Glu
Pro Val Lys Pro 165 170
175Phe Val Ala Leu Arg His Leu Arg Phe His Ser Leu Ser Ala Leu Ile
180 185 190Glu Ala Met Asp Arg Ile
Val Glu Thr Gly Gly Leu Asn Gly Glu Pro 195 200
205Val Asp Tyr Leu Asp Gly Val Val Phe Ser Ala Glu Glu Ser
Tyr Leu 210 215 220Cys Val Gly Gln Arg
Ser Ala Thr Pro Gly Pro Val Ser Asp Tyr Thr225 230
235 240Gly Lys Gln Ile Tyr Tyr Arg Ser Ile Gln
His Asp Gly Pro Thr Asp 245 250
255Gly Ala Glu Lys His Asp Arg Leu Thr Ile His Asp Tyr Leu Trp Arg
260 265 270Trp Asp Thr Asp Trp
Phe Trp Cys Ser Arg Ala Phe Gly Ala Gln Asn 275
280 285Pro Arg Ile Arg Arg Trp Trp Pro Arg Arg Tyr Arg
Arg Ser Ser Val 290 295 300Tyr Trp Lys
Leu Ile Gly Tyr Asp Arg Arg Phe Gly Ile Ala Asp Arg305
310 315 320Ile Glu Lys Arg Asn Gly Arg
Pro Pro Arg Glu Arg Val Val Gln Asp 325
330 335Ile Glu Val Pro Ile Glu Arg Thr Val Glu Phe Leu
Gln Trp Phe Leu 340 345 350Asp
Thr Val Pro Ile Glu Pro Ile Trp Leu Cys Pro Leu Arg Leu Arg 355
360 365Asp Asp Arg Asp Trp Pro Leu Tyr Pro
Ile Arg Pro His His Thr Tyr 370 375
380Val Asn Val Gly Phe Trp Ser Ser Val Pro Val Gly Pro Glu Glu Gly385
390 395 400Tyr Thr Asn Arg
Met Ile Glu Arg Lys Val Ser Asp Leu Asp Gly His 405
410 415Lys Ser Leu Tyr Ser Asp Ala Tyr Tyr Ser
Pro Glu Glu Phe Asp Ser 420 425
430Leu Tyr Gly Gly Glu Thr Tyr Lys Thr Val Lys Lys Thr Tyr Asp Pro
435 440 445Asp Ser Arg Phe Leu Asp Leu
Tyr Gly Lys Ala Val Gly Arg Gln 450 455
460
SEQ ID NO: 29; length: 1323; type: DNA; organism: Mycobacterium austroafricanum
ttgacgacat ttcgggacgg
cgcggccgac accggcctgc acggagaccg caagctcacc 60ctggcggagg tcttggaggt
cttcgcctcg ggccgactgc ctctgaagtt cacggcgtac 120gacggcagca gcgcgggccc
ggacgacgcc acgctcgggc tggacctgct gaccccccgc 180gggaccacgt acctcgcaac
ggctcccggc gatctcggcc tggcccgggc ctacgtctcc 240ggtgacctgc agttgcaggg
ggtgcaccct ggcgacccgt acgacctgct caacgcactg 300gtgcagaaac tggacttcaa
gcgaccgtcc gcccgggtgc tggcgcaggt cgtccgatcg 360atcgggatcg agcacctgaa
accgatcgcg ccaccgccgc aggaggcgct gccgcggtgg 420cggcgcatcg cagaaggact
gcggcacagc aagacccgtg acgccgacgc gatccaccac 480cattacgatg tctccaacac
cttctacgag tgggtgctcg ggccgtcgat gacctacacc 540tgcgcctgct acccgcatcc
cgacgccacc ctcgaggagg cgcaggagaa caaatatcgg 600ctggtgttcg agaaactgcg
cctcaagccg ggcgaccgcc ttctcgacgt gggttgcggg 660tggggcggaa tggtgcgcta
cgcggcccgt cacggcgtca aggcgatcgg ggtgacgctg 720tccagggagc aggcgcagtg
ggcacgcgcc gccatcgaac gggacggcct gggtgacctc 780gccgaggtcc gccacagcga
ctaccgcgat gtgcgcgagt cccagttcga cgccgtgtct 840tcgctggggc tcaccgagca
catcggggtc gccaactatc cgtcgtactt ccggttcctc 900aagtcgaagt tgcgcccggg
cggcctactg ctcaaccact gcatcacccg gcacaacaat 960cgcaccggcc ccgccgccgg
gggattcatc gaccggtatg tgttcccgga cggggagctg 1020accggatcgg gccggatcat
caccgagatc caggacgtcg gtttggaggt gatgcacgaa 1080gagaacctgc gccggcacta
tgcgctgaca cttcgggact ggtgccggaa tctggtgcag 1140cactgggacg aagcggtcgc
agaggtcggc ctgcccaccg ccaaggtgtg gggtctgtac 1200atggctgcct cgcgggtcgg
cttcgagcag aacagcattc agctgcatca ggtactggcg 1260gtgaagctcg acgaacgtgg
cggggacggc ggtttgccgt tgcggccctg gtggaccgcg 1320tag
1323
SEQ ID NO: 30; length: 440; type: PRT; organism: Mycobacterium austroafricanum
Leu Thr Thr Phe Arg Asp Gly Ala Ala Asp Thr Gly Leu His
Gly Asp1 5 10 15Arg Lys
Leu Thr Leu Ala Glu Val Leu Glu Val Phe Ala Ser Gly Arg 20
25 30Leu Pro Leu Lys Phe Thr Ala Tyr Asp
Gly Ser Ser Ala Gly Pro Asp 35 40
45Asp Ala Thr Leu Gly Leu Asp Leu Leu Thr Pro Arg Gly Thr Thr Tyr 50
55 60Leu Ala Thr Ala Pro Gly Asp Leu Gly
Leu Ala Arg Ala Tyr Val Ser65 70 75
80Gly Asp Leu Gln Leu Gln Gly Val His Pro Gly Asp Pro Tyr
Asp Leu 85 90 95Leu Asn
Ala Leu Val Gln Lys Leu Asp Phe Lys Arg Pro Ser Ala Arg 100
105 110Val Leu Ala Gln Val Val Arg Ser Ile
Gly Ile Glu His Leu Lys Pro 115 120
125Ile Ala Pro Pro Pro Gln Glu Ala Leu Pro Arg Trp Arg Arg Ile Ala
130 135 140Glu Gly Leu Arg His Ser Lys
Thr Arg Asp Ala Asp Ala Ile His His145 150
155 160His Tyr Asp Val Ser Asn Thr Phe Tyr Glu Trp Val
Leu Gly Pro Ser 165 170
175Met Thr Tyr Thr Cys Ala Cys Tyr Pro His Pro Asp Ala Thr Leu Glu
180 185 190Glu Ala Gln Glu Asn Lys
Tyr Arg Leu Val Phe Glu Lys Leu Arg Leu 195 200
205Lys Pro Gly Asp Arg Leu Leu Asp Val Gly Cys Gly Trp Gly
Gly Met 210 215 220Val Arg Tyr Ala Ala
Arg His Gly Val Lys Ala Ile Gly Val Thr Leu225 230
235 240Ser Arg Glu Gln Ala Gln Trp Ala Arg Ala
Ala Ile Glu Arg Asp Gly 245 250
255Leu Gly Asp Leu Ala Glu Val Arg His Ser Asp Tyr Arg Asp Val Arg
260 265 270Glu Ser Gln Phe Asp
Ala Val Ser Ser Leu Gly Leu Thr Glu His Ile 275
280 285Gly Val Ala Asn Tyr Pro Ser Tyr Phe Arg Phe Leu
Lys Ser Lys Leu 290 295 300Arg Pro Gly
Gly Leu Leu Leu Asn His Cys Ile Thr Arg His Asn Asn305
310 315 320Arg Thr Gly Pro Ala Ala Gly
Gly Phe Ile Asp Arg Tyr Val Phe Pro 325
330 335Asp Gly Glu Leu Thr Gly Ser Gly Arg Ile Ile Thr
Glu Ile Gln Asp 340 345 350Val
Gly Leu Glu Val Met His Glu Glu Asn Leu Arg Arg His Tyr Ala 355
360 365Leu Thr Leu Arg Asp Trp Cys Arg Asn
Leu Val Gln His Trp Asp Glu 370 375
380Ala Val Ala Glu Val Gly Leu Pro Thr Ala Lys Val Trp Gly Leu Tyr385
390 395 400Met Ala Ala Ser
Arg Val Gly Phe Glu Gln Asn Ser Ile Gln Leu His 405
410 415Gln Val Leu Ala Val Lys Leu Asp Glu Arg
Gly Gly Asp Gly Gly Leu 420 425
430Pro Leu Arg Pro Trp Trp Thr Ala 435
440
SEQ ID NO: 31; length: 381; type: DNA; organism: Mycobacterium austroafricanum
gtgatccgct ttctgctgcg
cgtcgcggtc tttctcggat cgtcggcgat cgggctactg 60gtggccggct ggctggtgcc
gggggtgtcg ctgtcggtgc tgggcttcgt caccgcggtg 120gtgatcttca cggtggcaca
agggattctg tcgccgttct tcctgaagat ggccagccgc 180tacgcgtcgg ccttcctcgg
cggcatcggc ctggtgtcca cgttcgtggc gctgctgctc 240gcgtcgctgc tgtccaacgg
gctcagcatc cgcggcgtcg ggtcgtggat cgcggccacg 300gtggtggtct ggctggtcac
agccctggcg accgtcgtgc tgcccgttct ggtgctgcgg 360gagaagaaga aagcagcctg a
381
SEQ ID NO: 32; length: 126; type: PRT; organism: Mycobacterium austroafricanum
Val Ile Arg Phe Leu Leu Arg Val Ala Val Phe Leu Gly Ser
Ser Ala1 5 10 15Ile Gly
Leu Leu Val Ala Gly Trp Leu Val Pro Gly Val Ser Leu Ser 20
25 30Val Leu Gly Phe Val Thr Ala Val Val
Ile Phe Thr Val Ala Gln Gly 35 40
45Ile Leu Ser Pro Phe Phe Leu Lys Met Ala Ser Arg Tyr Ala Ser Ala 50
55 60Phe Leu Gly Gly Ile Gly Leu Val Ser
Thr Phe Val Ala Leu Leu Leu65 70 75
80Ala Ser Leu Leu Ser Asn Gly Leu Ser Ile Arg Gly Val Gly
Ser Trp 85 90 95Ile Ala
Ala Thr Val Val Val Trp Leu Val Thr Ala Leu Ala Thr Val 100
105 110Val Leu Pro Val Leu Val Leu Arg Glu
Lys Lys Lys Ala Ala 115 120
125
SEQ ID NO: 33; length: 1392; type: DNA; organism: Mycobacterium gilvum
gtgtctgttg ccgtaaccga cgcacgatcc
gcctacgccc acggcgtgca gcggctggtc 60gcgagttacc gcgccatccc cgccggcgcc
accgtccgcc tggccaaacc cacgtccaac 120ctgttccgcg ccagggcgaa gagcaccgcg
gcgggcctcg acacctccgg cctgacacat 180gtgatcgccg tggaccccga gacgcgcacc
gccgaggtcg cggggatgtg cacctacgag 240gacctggtgg cggcgacgct gccccacggg
ctttcaccgc tggtggtccc gcaactcaag 300acgatcaccc tcggcggcgc cgtcaccggg
ctcggcatcg agtcggcgtc gttccgcaac 360ggccttccgc acgaatcggt cctggagatg
gacatcctca ccgggaccgg cgacatcgtg 420cgcgccgcgc ccgacgagaa tcccgacctt
ttccgcacct tcccgaattc ttatggaacg 480ctgggttact cggttcggct gaagatcgag
ctggagccgg tgaagccgtt cgtggcgtta 540cgccatctcc gcttccactc actgtcgaca
ctcatcgcga cgatggaccg catcgtcgac 600accgggagtc tcgacggtga gcaggtcgac
tatctcgacg gagtggtgtt cagcgccgag 660gagagctacc tgtgcgtcgg aacacgttcc
gcgacaccgg gtcctgtcag cgactacacc 720ggcgagcaca tcttctaccg gtcgatccag
cacgattgcc cgaccgaagg cggacagaag 780cacgaccggc tgacggcgca cgactacttc
tggcgctggg acaccgactg gttctggtgc 840tcaagggcat tcggcgcgca gaacccgaag
gtccgtcggt ggtggccccg acggctccgg 900cgcagcagct tctactggaa gctcgtcggc
tacgaccagc gtttcggcat cgccgaccgg 960atcgagaaac accacggccg gccaccgcgc
gaacgcgtcg tccaggacgt cgaggtcccc 1020atcgagcgca ccgtcgaatt cctgcagtgg
ttcctcgaca cgatcccgat agagccgctc 1080tggttgtgcc cgttgcgact tcgcgatgac
aacagctggt cgctgtaccc gctccggccc 1140catcgcacgt atgtcaacgt gggattctgg
tcgtcggtgc ccgtcgggcc ggaggagggt 1200cacaccaaca agctgatcga acgcaggatc
agcgagctgg agggacacaa gtcgctgtac 1260tccgacgcct tctattcggc cgacgagttc
gacgcgctgt acggcggcga gatctaccgg 1320accgtgaaga agacctacga cccagattct
cgtttcctcg acctctatgc gaaggcggtg 1380cgacggcaat ga
1392
SEQ ID NO: 34; length: 463; type: PRT; organism: Mycobacterium gilvum
Val
Ser Val Ala Val Thr Asp Ala Arg Ser Ala Tyr Ala His Gly Val1
5 10 15Gln Arg Leu Val Ala Ser Tyr
Arg Ala Ile Pro Ala Gly Ala Thr Val 20 25
30Arg Leu Ala Lys Pro Thr Ser Asn Leu Phe Arg Ala Arg Ala
Lys Ser 35 40 45Thr Ala Ala Gly
Leu Asp Thr Ser Gly Leu Thr His Val Ile Ala Val 50 55
60Asp Pro Glu Thr Arg Thr Ala Glu Val Ala Gly Met Cys
Thr Tyr Glu65 70 75
80Asp Leu Val Ala Ala Thr Leu Pro His Gly Leu Ser Pro Leu Val Val
85 90 95Pro Gln Leu Lys Thr Ile
Thr Leu Gly Gly Ala Val Thr Gly Leu Gly 100
105 110Ile Glu Ser Ala Ser Phe Arg Asn Gly Leu Pro His
Glu Ser Val Leu 115 120 125Glu Met
Asp Ile Leu Thr Gly Thr Gly Asp Ile Val Arg Ala Ala Pro 130
135 140Asp Glu Asn Pro Asp Leu Phe Arg Thr Phe Pro
Asn Ser Tyr Gly Thr145 150 155
160Leu Gly Tyr Ser Val Arg Leu Lys Ile Glu Leu Glu Pro Val Lys Pro
165 170 175Phe Val Ala Leu
Arg His Leu Arg Phe His Ser Leu Ser Thr Leu Ile 180
185 190Ala Thr Met Asp Arg Ile Val Asp Thr Gly Ser
Leu Asp Gly Glu Gln 195 200 205Val
Asp Tyr Leu Asp Gly Val Val Phe Ser Ala Glu Glu Ser Tyr Leu 210
215 220Cys Val Gly Thr Arg Ser Ala Thr Pro Gly
Pro Val Ser Asp Tyr Thr225 230 235
240Gly Glu His Ile Phe Tyr Arg Ser Ile Gln His Asp Cys Pro Thr
Glu 245 250 255Gly Gly Gln
Lys His Asp Arg Leu Thr Ala His Asp Tyr Phe Trp Arg 260
265 270Trp Asp Thr Asp Trp Phe Trp Cys Ser Arg
Ala Phe Gly Ala Gln Asn 275 280
285Pro Lys Val Arg Arg Trp Trp Pro Arg Arg Leu Arg Arg Ser Ser Phe 290
295 300Tyr Trp Lys Leu Val Gly Tyr Asp
Gln Arg Phe Gly Ile Ala Asp Arg305 310
315 320Ile Glu Lys His His Gly Arg Pro Pro Arg Glu Arg
Val Val Gln Asp 325 330
335Val Glu Val Pro Ile Glu Arg Thr Val Glu Phe Leu Gln Trp Phe Leu
340 345 350Asp Thr Ile Pro Ile Glu
Pro Leu Trp Leu Cys Pro Leu Arg Leu Arg 355 360
365Asp Asp Asn Ser Trp Ser Leu Tyr Pro Leu Arg Pro His Arg
Thr Tyr 370 375 380Val Asn Val Gly Phe
Trp Ser Ser Val Pro Val Gly Pro Glu Glu Gly385 390
395 400His Thr Asn Lys Leu Ile Glu Arg Arg Ile
Ser Glu Leu Glu Gly His 405 410
415Lys Ser Leu Tyr Ser Asp Ala Phe Tyr Ser Ala Asp Glu Phe Asp Ala
420 425 430Leu Tyr Gly Gly Glu
Ile Tyr Arg Thr Val Lys Lys Thr Tyr Asp Pro 435
440 445Asp Ser Arg Phe Leu Asp Leu Tyr Ala Lys Ala Val
Arg Arg Gln 450 455
460
SEQ ID NO: 35; length: 1323; type: DNA; organism: Mycobacterium gilvum
atgacgactt ttcgggaaca taccgacagt
tcggcgtccg acccggatcg gaaactcact 60ttggcagagg tgttggagat cttcgccgcg
ggtcgccgtc cgctgaagtt caccgcctat 120gacggaagta gttgcgggcc tgaggatgcg
acactgggcc tcgacctgct gaccccgcgg 180ggcacgacct acctggccac ggcgccgggt
gatctcggcc tggcgcgggc ctacatcgcc 240ggcgatctgc gcctcagtgg tgtgcatccc
ggcgatcccc atgacctgct cacggcgctg 300acggaacgcc tggagtacag gcgtccgccg
gtgcgagtgc tggccaatgt tctgcgctcc 360atcgggatcg agcacctcaa gcccgtcgcg
ccgccacccc aggagcacct gccgcggtgg 420cggcggatcg cagaggggtt gcggcacagc
aagacccgtg acgctgaggc catccagcac 480cactacgacg tctcgaacac gttctactca
tgggtcctgg gtccgtcgat gacctacacc 540tgcgcctgct atccacaccc ggatgccacg
ctggaggagg cgcaggagaa caagtaccgg 600ctggtgttcg agaagcttcg actcaagccc
ggtgaccggc tgctcgacgt cggttgcggc 660tggggcggaa tggtccgcta cgccgcccgg
cacggggtca aggtcctggg ggtgacgctg 720tcgaaggagc aggcgcagtg ggcggccgac
gcagtcgagc gggacggcct gggtgagttg 780gccgaggtcc gccacggcga ctaccgcgac
gtgcgcgagt cgcacttcga cgcagtgtcc 840tcgctcgggc tcaccgagca catcggcgtc
gcgaactacc cgtcgtactt ccgcttcctg 900aagtcgaaac tgcggccggg tggcctgctg
ctcaaccact gcatcacccg aaacaacaac 960cggagtcacg ccaccgcagg cggattcatc
gatcgctatg tctttcccga cggggagctg 1020acggggtcgg ggcgaatcat caccgaaatg
caggacgtcg gactcgaggt cgtgcacgag 1080gagaatctgc gtcaccacta cgcgctgacg
ctgcgcgact ggagccgcaa cctggtcgcg 1140cactgggacg acgcggtgac cgaggtcggt
ctgccgactg ccaaggtgtg gggcctctac 1200atcgccgcgt cgcgagtcgg cttcgagcag
aacgccattc agctgcacca ggtgctgtcg 1260gtcaagctcg acgagcgtgg ctcggacggc
ggactgccgt tacgaccctg gtggaacgcc 1320tag
1323
SEQ ID NO: 36; length: 440; type: PRT; organism: Mycobacterium gilvum
Met
Thr Thr Phe Arg Glu His Thr Asp Ser Ser Ala Ser Asp Pro Asp1
5 10 15Arg Lys Leu Thr Leu Ala Glu
Val Leu Glu Ile Phe Ala Ala Gly Arg 20 25
30Arg Pro Leu Lys Phe Thr Ala Tyr Asp Gly Ser Ser Cys Gly
Pro Glu 35 40 45Asp Ala Thr Leu
Gly Leu Asp Leu Leu Thr Pro Arg Gly Thr Thr Tyr 50 55
60Leu Ala Thr Ala Pro Gly Asp Leu Gly Leu Ala Arg Ala
Tyr Ile Ala65 70 75
80Gly Asp Leu Arg Leu Ser Gly Val His Pro Gly Asp Pro His Asp Leu
85 90 95Leu Thr Ala Leu Thr Glu
Arg Leu Glu Tyr Arg Arg Pro Pro Val Arg 100
105 110Val Leu Ala Asn Val Leu Arg Ser Ile Gly Ile Glu
His Leu Lys Pro 115 120 125Val Ala
Pro Pro Pro Gln Glu His Leu Pro Arg Trp Arg Arg Ile Ala 130
135 140Glu Gly Leu Arg His Ser Lys Thr Arg Asp Ala
Glu Ala Ile Gln His145 150 155
160His Tyr Asp Val Ser Asn Thr Phe Tyr Ser Trp Val Leu Gly Pro Ser
165 170 175Met Thr Tyr Thr
Cys Ala Cys Tyr Pro His Pro Asp Ala Thr Leu Glu 180
185 190Glu Ala Gln Glu Asn Lys Tyr Arg Leu Val Phe
Glu Lys Leu Arg Leu 195 200 205Lys
Pro Gly Asp Arg Leu Leu Asp Val Gly Cys Gly Trp Gly Gly Met 210
215 220Val Arg Tyr Ala Ala Arg His Gly Val Lys
Val Leu Gly Val Thr Leu225 230 235
240Ser Lys Glu Gln Ala Gln Trp Ala Ala Asp Ala Val Glu Arg Asp
Gly 245 250 255Leu Gly Glu
Leu Ala Glu Val Arg His Gly Asp Tyr Arg Asp Val Arg 260
265 270Glu Ser His Phe Asp Ala Val Ser Ser Leu
Gly Leu Thr Glu His Ile 275 280
285Gly Val Ala Asn Tyr Pro Ser Tyr Phe Arg Phe Leu Lys Ser Lys Leu 290
295 300Arg Pro Gly Gly Leu Leu Leu Asn
His Cys Ile Thr Arg Asn Asn Asn305 310
315 320Arg Ser His Ala Thr Ala Gly Gly Phe Ile Asp Arg
Tyr Val Phe Pro 325 330
335Asp Gly Glu Leu Thr Gly Ser Gly Arg Ile Ile Thr Glu Met Gln Asp
340 345 350Val Gly Leu Glu Val Val
His Glu Glu Asn Leu Arg His His Tyr Ala 355 360
365Leu Thr Leu Arg Asp Trp Ser Arg Asn Leu Val Ala His Trp
Asp Asp 370 375 380Ala Val Thr Glu Val
Gly Leu Pro Thr Ala Lys Val Trp Gly Leu Tyr385 390
395 400Ile Ala Ala Ser Arg Val Gly Phe Glu Gln
Asn Ala Ile Gln Leu His 405 410
415Gln Val Leu Ser Val Lys Leu Asp Glu Arg Gly Ser Asp Gly Gly Leu
420 425 430Pro Leu Arg Pro Trp
Trp Asn Ala 435 440
SEQ ID NO: 37; length: 387; type: DNA; organism: Mycobacterium gilvum
atgatccggt tcctgctgcg catcgcggtc tttctgggct catcagcgat cgggctcctc
60gtcgccggat ggctggtgcc cggggtgtcg ctgtcggtgt ggggcttcgt cacggcagtg
120gtgatcttca ccgtggcgca ggcgatcctg tccccgttct tcctcaagat ggccagccgc
180tacgcctcgg cgttcctcgg cgggatcggt ctggtgtcga cgtttgccgc gctgctgctc
240gtctcgctgc tgtccaacgg tctgagcatc cgcggcatcg gatcctggat cgccgcaacc
300gtggtggtct ggttggtgac cgccctggcg acgctggtgc tgccgatgtt ggtgctgcgc
360gagaagaaaa ccgcgtcgcg cgtctga
387
SEQ ID NO: 38; length: 128; type: PRT; organism: Mycobacterium gilvum
Met Ile Arg Phe Leu Leu Arg Ile Ala Val
Phe Leu Gly Ser Ser Ala1 5 10
15Ile Gly Leu Leu Val Ala Gly Trp Leu Val Pro Gly Val Ser Leu Ser
20 25 30Val Trp Gly Phe Val Thr
Ala Val Val Ile Phe Thr Val Ala Gln Ala 35 40
45Ile Leu Ser Pro Phe Phe Leu Lys Met Ala Ser Arg Tyr Ala
Ser Ala 50 55 60Phe Leu Gly Gly Ile
Gly Leu Val Ser Thr Phe Ala Ala Leu Leu Leu65 70
75 80Val Ser Leu Leu Ser Asn Gly Leu Ser Ile
Arg Gly Ile Gly Ser Trp 85 90
95Ile Ala Ala Thr Val Val Val Trp Leu Val Thr Ala Leu Ala Thr Leu
100 105 110Val Leu Pro Met Leu
Val Leu Arg Glu Lys Lys Thr Ala Ser Arg Val 115
120 125
SEQ ID NO: 39; length: 1425; type: DNA; organism: Unknown; other information: Mycobacterium indicus pranii
atgcacgggc tgttgtcgaa gactagggta tatgtggtgc ctgtccttgg atctgcactc
60tcggcccaca agtcgggcgt tgaccggctg ctggcaagct atcgatccat tcccgcaacg
120tccgcggtcc ggctggccaa accgacgtca aacctgttcc gcgcccgcac caaacgtgac
180gcgcccggct tggacacctc ggggctgacc ggcgtcctga gcgtggatcc cgaaacccgc
240accgcggacg tcgccggcat gtgcacctac gcggacctgg tggccgcaac gctgccctac
300ggcctgtcgc cgctggtcgt cccgcagctg aagaccatca ccctcggcgg ggcggtcagc
360ggcctgggga tcgagtcggc gtcgtttcgc aacgggctgc cgcacgaatc ggtgctggag
420atggatatcc tcaccggcgc tggcgatttg ctcaccgcat cacgtaccca gcacccggac
480ctgttccgcg ccttcccgaa ttcctatggg acactggggt attcgacccg gcttcggatc
540gagctggaac ccgtcgcacc gttcgtcgcg ctgcgccaca tccgcttccg ctcgctgccc
600gcgctgatcg ccgcggccga acgcatcgtc gacaccggcg ggcagggcgg aaccccggtc
660gactacctcg acggggtggt cttcagcgcc gacgaaagct acctgtgcgt gggccggcgg
720accaccaccc ccggcccggt cagcgactac accggcaagg acatctacta ccagtccatc
780cggcacgacg ccccgggcct ggaggcgacc aaggatgacc ggctgaccat gcacgactac
840ttctggcgct gggacaccga ttggttctgg tgctcgcgcg cgttcggcgt gcaggacccg
900cgggtgcgac gcttctggcc gcgccgttat cggcgcagca gcttctactg gaagctgatt
960tccctggacc ggcgcttcgg gatctccgac cgcatcgagg cgcgcaacgg gcggccccca
1020cgcgaacggg tggtgcaaga catcgagatt ccaatcgaac ggacctgcga cttcctggag
1080tggttcctgg acaacgtgcc aatcacgccg atctggttgt gcccgttgcg ccttcgcgac
1140cgcgacggct ggccgttgta cccgatgcgg ccggatcaca cgtacgtcaa cgtcggcttc
1200tggtcgtcgg tgccgggggg cgcgaccgag ggcgccgcca accggatgat cgaagaaaag
1260gtgagcgaac tcgacgggca caagtccctg tactccgatt ccttctactc ccgcgaggac
1320ttcgacgagc tgtacggcgg cgagacctac aacaccgtca agaaaaccta cgaccccgat
1380tctcgtttac tcgacctcta cgcaaaggcg gtgcaacggc gatga 1425
40 474 PRT Unknown Mycobacterium indicus pranii
40Met His Gly Leu Leu Ser
Lys Thr Arg Val Tyr Val Val Pro Val Leu1 5
10 15Gly Ser Ala Leu Ser Ala His Lys Ser Gly Val Asp
Arg Leu Leu Ala 20 25 30Ser
Tyr Arg Ser Ile Pro Ala Thr Ser Ala Val Arg Leu Ala Lys Pro 35
40 45Thr Ser Asn Leu Phe Arg Ala Arg Thr
Lys Arg Asp Ala Pro Gly Leu 50 55
60Asp Thr Ser Gly Leu Thr Gly Val Leu Ser Val Asp Pro Glu Thr Arg65
70 75 80Thr Ala Asp Val Ala
Gly Met Cys Thr Tyr Ala Asp Leu Val Ala Ala 85
90 95Thr Leu Pro Tyr Gly Leu Ser Pro Leu Val Val
Pro Gln Leu Lys Thr 100 105
110Ile Thr Leu Gly Gly Ala Val Ser Gly Leu Gly Ile Glu Ser Ala Ser
115 120 125Phe Arg Asn Gly Leu Pro His
Glu Ser Val Leu Glu Met Asp Ile Leu 130 135
140Thr Gly Ala Gly Asp Leu Leu Thr Ala Ser Arg Thr Gln His Pro
Asp145 150 155 160Leu Phe
Arg Ala Phe Pro Asn Ser Tyr Gly Thr Leu Gly Tyr Ser Thr
165 170 175Arg Leu Arg Ile Glu Leu Glu
Pro Val Ala Pro Phe Val Ala Leu Arg 180 185
190His Ile Arg Phe Arg Ser Leu Pro Ala Leu Ile Ala Ala Ala
Glu Arg 195 200 205Ile Val Asp Thr
Gly Gly Gln Gly Gly Thr Pro Val Asp Tyr Leu Asp 210
215 220Gly Val Val Phe Ser Ala Asp Glu Ser Tyr Leu Cys
Val Gly Arg Arg225 230 235
240Thr Thr Thr Pro Gly Pro Val Ser Asp Tyr Thr Gly Lys Asp Ile Tyr
245 250 255Tyr Gln Ser Ile Arg
His Asp Ala Pro Gly Leu Glu Ala Thr Lys Asp 260
265 270Asp Arg Leu Thr Met His Asp Tyr Phe Trp Arg Trp
Asp Thr Asp Trp 275 280 285Phe Trp
Cys Ser Arg Ala Phe Gly Val Gln Asp Pro Arg Val Arg Arg 290
295 300Phe Trp Pro Arg Arg Tyr Arg Arg Ser Ser Phe
Tyr Trp Lys Leu Ile305 310 315
320Ser Leu Asp Arg Arg Phe Gly Ile Ser Asp Arg Ile Glu Ala Arg Asn
325 330 335Gly Arg Pro Pro
Arg Glu Arg Val Val Gln Asp Ile Glu Ile Pro Ile 340
345 350Glu Arg Thr Cys Asp Phe Leu Glu Trp Phe Leu
Asp Asn Val Pro Ile 355 360 365Thr
Pro Ile Trp Leu Cys Pro Leu Arg Leu Arg Asp Arg Asp Gly Trp 370
375 380Pro Leu Tyr Pro Met Arg Pro Asp His Thr
Tyr Val Asn Val Gly Phe385 390 395
400Trp Ser Ser Val Pro Gly Gly Ala Thr Glu Gly Ala Ala Asn Arg
Met 405 410 415Ile Glu Glu
Lys Val Ser Glu Leu Asp Gly His Lys Ser Leu Tyr Ser 420
425 430Asp Ser Phe Tyr Ser Arg Glu Asp Phe Asp
Glu Leu Tyr Gly Gly Glu 435 440
445Thr Tyr Asn Thr Val Lys Lys Thr Tyr Asp Pro Asp Ser Arg Leu Leu 450
455 460Asp Leu Tyr Ala Lys Ala Val Gln
Arg Arg465 470
41 1263 DNA Unknown Mycobacterium indicus pranii
41atggccgaga tcctggaggt cttcgccgcc accggccgac atccgctgaa
gttcaccgcc 60tacgacggca gcatcgccgg caacgaggac gccgaactgg gcctggacct
tcgcagcccc 120cgcggcgcca cctatctggc gaccgccccc ggcgaactcg gcctcgcccg
cgcctacgtg 180tcgggcgacc tgcaggccta cggcgtccat cccggcgacc cgtaccaact
gctcaagacg 240ctcaccgatc gggtggaatt caagcggccc ccggtgcggg tgctggccaa
cgtcgtgcgg 300tcgctggggt tcgagcggtt gctgccggtc gcgccgcccc cgcaggaggc
gctgccccgg 360tggcggcgca tcgccgacgg gctgatgcac acgaggaccc gcgacgccga
ggccatccac 420caccactacg acgtgtccaa caccttctac gaattggtgt tggggccgtc
gatgacctac 480acctgcgcgg tgtatcccga tgccgacgcg acactcgaac aggcgcagga
gaacaagtac 540cggctgatct tcgagaagct gcggctgaag gcgggcgacc ggctgctcga
cgtcggctgc 600ggctggggcg gcatggtgcg ctacgcggcc cggcgcggcg tccgggccac
cggcgccacc 660ctgtcggccg aacaggcgaa gtgggcgcag aaggcgatcg ccgaggaagg
ccttgcggac 720ctggccgagg tgcgccacac cgactatcgg gacgtgggcg aggcggcgtt
cgacgccgtg 780tcctcgatcg ggctgaccga gcacatcggc gtcaagaatt accccgccta
cttcggcttc 840ttgaagtcga agctgcgcac cggcggcctg ctgctcaatc actgcatcac
ccgccacgac 900aacacgtcga cgtcgttcgc gggcggattc accgatcgct atgtcttccc
ggacggggag 960ctgaccggct cgggccgcat cacctgcgac gtccaggact gcggcttcga
ggtgctgcac 1020gcggagaact tccgccacca ctacgcgatg acgctgcgcg actggtgccg
caatctggtc 1080gagaactggg acgccgcggt cagcgaggtc ggcctaccga ccgcgaaggt
ctggggcctg 1140tacatggcgg cgtcacgggt tgcgttcgag cagaacaacc ttcagctgca
tcacgtgctg 1200gcggccaaga ccgacgcgcg gggcgacgac gacctgccgc tgcggccgtg
gtggacggcc 1260tga 1263
42 420 PRT Unknown Mycobacterium indicus pranii
42Met Ala Glu
Ile Leu Glu Val Phe Ala Ala Thr Gly Arg His Pro Leu1 5
10 15Lys Phe Thr Ala Tyr Asp Gly Ser Ile
Ala Gly Asn Glu Asp Ala Glu 20 25
30Leu Gly Leu Asp Leu Arg Ser Pro Arg Gly Ala Thr Tyr Leu Ala Thr
35 40 45Ala Pro Gly Glu Leu Gly Leu
Ala Arg Ala Tyr Val Ser Gly Asp Leu 50 55
60Gln Ala Tyr Gly Val His Pro Gly Asp Pro Tyr Gln Leu Leu Lys Thr65
70 75 80Leu Thr Asp Arg
Val Glu Phe Lys Arg Pro Pro Val Arg Val Leu Ala 85
90 95Asn Val Val Arg Ser Leu Gly Phe Glu Arg
Leu Leu Pro Val Ala Pro 100 105
110Pro Pro Gln Glu Ala Leu Pro Arg Trp Arg Arg Ile Ala Asp Gly Leu
115 120 125Met His Thr Arg Thr Arg Asp
Ala Glu Ala Ile His His His Tyr Asp 130 135
140Val Ser Asn Thr Phe Tyr Glu Leu Val Leu Gly Pro Ser Met Thr
Tyr145 150 155 160Thr Cys
Ala Val Tyr Pro Asp Ala Asp Ala Thr Leu Glu Gln Ala Gln
165 170 175Glu Asn Lys Tyr Arg Leu Ile
Phe Glu Lys Leu Arg Leu Lys Ala Gly 180 185
190Asp Arg Leu Leu Asp Val Gly Cys Gly Trp Gly Gly Met Val
Arg Tyr 195 200 205Ala Ala Arg Arg
Gly Val Arg Ala Thr Gly Ala Thr Leu Ser Ala Glu 210
215 220Gln Ala Lys Trp Ala Gln Lys Ala Ile Ala Glu Glu
Gly Leu Ala Asp225 230 235
240Leu Ala Glu Val Arg His Thr Asp Tyr Arg Asp Val Gly Glu Ala Ala
245 250 255Phe Asp Ala Val Ser
Ser Ile Gly Leu Thr Glu His Ile Gly Val Lys 260
265 270Asn Tyr Pro Ala Tyr Phe Gly Phe Leu Lys Ser Lys
Leu Arg Thr Gly 275 280 285Gly Leu
Leu Leu Asn His Cys Ile Thr Arg His Asp Asn Thr Ser Thr 290
295 300Ser Phe Ala Gly Gly Phe Thr Asp Arg Tyr Val
Phe Pro Asp Gly Glu305 310 315
320Leu Thr Gly Ser Gly Arg Ile Thr Cys Asp Val Gln Asp Cys Gly Phe
325 330 335Glu Val Leu His
Ala Glu Asn Phe Arg His His Tyr Ala Met Thr Leu 340
345 350Arg Asp Trp Cys Arg Asn Leu Val Glu Asn Trp
Asp Ala Ala Val Ser 355 360 365Glu
Val Gly Leu Pro Thr Ala Lys Val Trp Gly Leu Tyr Met Ala Ala 370
375 380Ser Arg Val Ala Phe Glu Gln Asn Asn Leu
Gln Leu His His Val Leu385 390 395
400Ala Ala Lys Thr Asp Ala Arg Gly Asp Asp Asp Leu Pro Leu Arg
Pro 405 410 415Trp Trp Thr
Ala 420
43 1380 DNA Mycobacterium phlei
43gtgtctgaac cccgaaccga
cgcacgtgtt gttcaggccg cgggcgtgca caagctgctg 60gagagctacc gcgcgatccc
gcccgaggcc accgtccggc tggccaaacc cacctcgaac 120ctgttccggg cgcgcgccaa
gacctcggtc aagggtctcg atgtctcggg cctgacccat 180gtgatctccg tcgaccccga
cgagcgcacc gctgaggtgg ccgggatgtg cacctacgag 240gacctggtcg ccgcgacgct
gccgtacggg ctgtcaccgc tggtggtgcc gcagctcaag 300accatcaccc tcggcggcgc
cgtgacgggt ctgggcatcg agtcggcgtc gttccgtaac 360ggcctgccgc acgagtcggt
gctggagatg gacatcctca ccggatcggg cgagatcctc 420accgcctccc gcgaccagca
ccccgacctg ttccgggcgt tcccgaactc ctatggcacg 480ctgggctatt cggtgcggct
gaagatcgag ttggagaccg tcaaaccgtt cgtcgcggtc 540cgtcacctgc ggttccacga
catcgaggac ctggtcgccg agatggaccg cattgtcgag 600accggcggct acgacggcac
cccggtcgac tatctcgacg gtgtggtgtt ctcggcccgc 660gagagctacc tgacgctggg
cttccagacc gccaccccgg gcccggtcag cgactacacc 720ggccagcaga tctactaccg
ctcgatccag cacgaggacg gcgtcaagga cgaccggctg 780acgatccacg actacttctg
gcgctgggac accgactggt tctggtgctc gcgggcgttc 840ggcgtgcaga acccgacgat
ccgccggttc tggccgcgcc ggctcaagcg cagcagcttc 900tactggaagc tggtcgccta
cgaccgcaag ttcaacatcg ccgatcgcat cgagatgcac 960aacggccgcc cgccccgcga
gcgcgtcgtg caggacatcg aggtgccgat cgagcgggtc 1020gccgagtttt tgggctggtt
cctcgacaac gtgccgatcg agccgatctg gctgtgcccg 1080ttgcgtcttc gcgacgacgc
cggctggccg ctgtacccga tccgggcgca gcacacctac 1140gtcaacgtgg ggttctggtc
ctcggtgccg gtggggccca ccgaggggca cacgaaccgg 1200ctgatcgagc gcaaggtcag
cgagctcgac gggcacaagt cgctgtactc ggacgcgtac 1260tactcgcgcg acgagttcga
ccagctctac ggcggcgaaa tctacaaaac cgttaaaaag 1320gcctacgatc cagattcacg
actgctcgac ctgtacgcga aggcggtgca gcgccagtga 1380
44 459 PRT Mycobacterium phlei
44Val Ser Glu Pro Arg Thr Asp Ala Arg Val Val Gln Ala Ala Gly Val1
5 10 15His Lys Leu Leu
Glu Ser Tyr Arg Ala Ile Pro Pro Glu Ala Thr Val 20
25 30Arg Leu Ala Lys Pro Thr Ser Asn Leu Phe Arg
Ala Arg Ala Lys Thr 35 40 45Ser
Val Lys Gly Leu Asp Val Ser Gly Leu Thr His Val Ile Ser Val 50
55 60Asp Pro Asp Glu Arg Thr Ala Glu Val Ala
Gly Met Cys Thr Tyr Glu65 70 75
80Asp Leu Val Ala Ala Thr Leu Pro Tyr Gly Leu Ser Pro Leu Val
Val 85 90 95Pro Gln Leu
Lys Thr Ile Thr Leu Gly Gly Ala Val Thr Gly Leu Gly 100
105 110Ile Glu Ser Ala Ser Phe Arg Asn Gly Leu
Pro His Glu Ser Val Leu 115 120
125Glu Met Asp Ile Leu Thr Gly Ser Gly Glu Ile Leu Thr Ala Ser Arg 130
135 140Asp Gln His Pro Asp Leu Phe Arg
Ala Phe Pro Asn Ser Tyr Gly Thr145 150
155 160Leu Gly Tyr Ser Val Arg Leu Lys Ile Glu Leu Glu
Thr Val Lys Pro 165 170
175Phe Val Ala Val Arg His Leu Arg Phe His Asp Ile Glu Asp Leu Val
180 185 190Ala Glu Met Asp Arg Ile
Val Glu Thr Gly Gly Tyr Asp Gly Thr Pro 195 200
205Val Asp Tyr Leu Asp Gly Val Val Phe Ser Ala Arg Glu Ser
Tyr Leu 210 215 220Thr Leu Gly Phe Gln
Thr Ala Thr Pro Gly Pro Val Ser Asp Tyr Thr225 230
235 240Gly Gln Gln Ile Tyr Tyr Arg Ser Ile Gln
His Glu Asp Gly Val Lys 245 250
255Asp Asp Arg Leu Thr Ile His Asp Tyr Phe Trp Arg Trp Asp Thr Asp
260 265 270Trp Phe Trp Cys Ser
Arg Ala Phe Gly Val Gln Asn Pro Thr Ile Arg 275
280 285Arg Phe Trp Pro Arg Arg Leu Lys Arg Ser Ser Phe
Tyr Trp Lys Leu 290 295 300Val Ala Tyr
Asp Arg Lys Phe Asn Ile Ala Asp Arg Ile Glu Met His305
310 315 320Asn Gly Arg Pro Pro Arg Glu
Arg Val Val Gln Asp Ile Glu Val Pro 325
330 335Ile Glu Arg Val Ala Glu Phe Leu Gly Trp Phe Leu
Asp Asn Val Pro 340 345 350Ile
Glu Pro Ile Trp Leu Cys Pro Leu Arg Leu Arg Asp Asp Ala Gly 355
360 365Trp Pro Leu Tyr Pro Ile Arg Ala Gln
His Thr Tyr Val Asn Val Gly 370 375
380Phe Trp Ser Ser Val Pro Val Gly Pro Thr Glu Gly His Thr Asn Arg385
390 395 400Leu Ile Glu Arg
Lys Val Ser Glu Leu Asp Gly His Lys Ser Leu Tyr 405
410 415Ser Asp Ala Tyr Tyr Ser Arg Asp Glu Phe
Asp Gln Leu Tyr Gly Gly 420 425
430Glu Ile Tyr Lys Thr Val Lys Lys Ala Tyr Asp Pro Asp Ser Arg Leu
435 440 445Leu Asp Leu Tyr Ala Lys Ala
Val Gln Arg Gln 450 455
45 1314 DNA Mycobacterium phlei
45atgacggcga tcaaagagaa cccggtcctg acttcggcca ggaagctgtc cctggccgag
60attctggaaa tccttgccgg gggcgaactc ccggtgcgtt tcacggccta cgacggcagc
120tcggcgggcc cggcggactc cccgctcggc ctggagctgc tgaccccgcg cggcaccacc
180tatctggcca ccgccccggg cgatctcggg ctggcacgcg cctacatcgc cggtgacctg
240cagccgcacg gcgtgcatcc gggcgatccg tacgagctgc tcaaggccct gtcggagaag
300atggagttca agcggccgcc cgcgaaggtg ctggccaaca tcgtgcgctc catcggtatc
360gagcacctca agccgatcgc accgccgccg caggaggcgc agccgcgctg gcgccggatc
420gcggaagggt tgcggcacag caagactcgc gacgccgagg cgatccacca ccactacgac
480gtgtccaaca cgttctacga gtgggtgctc ggcccgtcga tgacctacac ctgcgcgtgc
540tacccggacg tcgacgcaac cctggagcag gcgcaggaga acaagtaccg cctggtgttc
600gagaagctgc gcctgaagcc gggcgaccgg ctgctcgacg tgggctgcgg ctggggcggc
660atggtgcgct acgccgccca gcacggggtc aaggccatcg gcgtcacgct gtctcgggag
720caggcgacgt gggcgcagaa ggcgatcgcc gagcaggggc tcagcgatct ggccgaggtc
780cgccacggcg actaccgcga cattcgcgag tccgggttcg acgcggtgtc ctcgatcggg
840ctgaccgagc acatcggcgt ggccaactac ccgtcgtact tccggttcct gcagtccaag
900ctgcgtgtcg gcgggctgct gctcaaccac tgcatcaccc ggccggacaa caagtcgcag
960gccagcgcgg gcgggttcat cgaccgctac gtgttccccg acggggagct caccgggtcc
1020ggccgcatca tcgccgcggc ccaggacgtc ggcctcgagg tggtgcacga ggagaacctg
1080cgccagcact acgcgatgac gctgcgcgac tggtgccgca acctcgtcga gcactgggac
1140gaggcggtcg ccgaggtcgg cctggaacgc gccaagatct ggggcctgta catggccggc
1200tcccggctcg gcttcgagac gaacatcgtg cagctgcacc aggtgctggc ggtcaagctg
1260gaccgcaggg gcggcgacgg cgggctgccg ttgcgcccgt ggtggacgcc ctag 1314
46 437 PRT Mycobacterium phlei
46Met Thr Ala Ile Lys Glu Asn Pro Val Leu
Thr Ser Ala Arg Lys Leu1 5 10
15Ser Leu Ala Glu Ile Leu Glu Ile Leu Ala Gly Gly Glu Leu Pro Val
20 25 30Arg Phe Thr Ala Tyr Asp
Gly Ser Ser Ala Gly Pro Ala Asp Ser Pro 35 40
45Leu Gly Leu Glu Leu Leu Thr Pro Arg Gly Thr Thr Tyr Leu
Ala Thr 50 55 60Ala Pro Gly Asp Leu
Gly Leu Ala Arg Ala Tyr Ile Ala Gly Asp Leu65 70
75 80Gln Pro His Gly Val His Pro Gly Asp Pro
Tyr Glu Leu Leu Lys Ala 85 90
95Leu Ser Glu Lys Met Glu Phe Lys Arg Pro Pro Ala Lys Val Leu Ala
100 105 110Asn Ile Val Arg Ser
Ile Gly Ile Glu His Leu Lys Pro Ile Ala Pro 115
120 125Pro Pro Gln Glu Ala Gln Pro Arg Trp Arg Arg Ile
Ala Glu Gly Leu 130 135 140Arg His Ser
Lys Thr Arg Asp Ala Glu Ala Ile His His His Tyr Asp145
150 155 160Val Ser Asn Thr Phe Tyr Glu
Trp Val Leu Gly Pro Ser Met Thr Tyr 165
170 175Thr Cys Ala Cys Tyr Pro Asp Val Asp Ala Thr Leu
Glu Gln Ala Gln 180 185 190Glu
Asn Lys Tyr Arg Leu Val Phe Glu Lys Leu Arg Leu Lys Pro Gly 195
200 205Asp Arg Leu Leu Asp Val Gly Cys Gly
Trp Gly Gly Met Val Arg Tyr 210 215
220Ala Ala Gln His Gly Val Lys Ala Ile Gly Val Thr Leu Ser Arg Glu225
230 235 240Gln Ala Thr Trp
Ala Gln Lys Ala Ile Ala Glu Gln Gly Leu Ser Asp 245
250 255Leu Ala Glu Val Arg His Gly Asp Tyr Arg
Asp Ile Arg Glu Ser Gly 260 265
270Phe Asp Ala Val Ser Ser Ile Gly Leu Thr Glu His Ile Gly Val Ala
275 280 285Asn Tyr Pro Ser Tyr Phe Arg
Phe Leu Gln Ser Lys Leu Arg Val Gly 290 295
300Gly Leu Leu Leu Asn His Cys Ile Thr Arg Pro Asp Asn Lys Ser
Gln305 310 315 320Ala Ser
Ala Gly Gly Phe Ile Asp Arg Tyr Val Phe Pro Asp Gly Glu
325 330 335Leu Thr Gly Ser Gly Arg Ile
Ile Ala Ala Ala Gln Asp Val Gly Leu 340 345
350Glu Val Val His Glu Glu Asn Leu Arg Gln His Tyr Ala Met
Thr Leu 355 360 365Arg Asp Trp Cys
Arg Asn Leu Val Glu His Trp Asp Glu Ala Val Ala 370
375 380Glu Val Gly Leu Glu Arg Ala Lys Ile Trp Gly Leu
Tyr Met Ala Gly385 390 395
400Ser Arg Leu Gly Phe Glu Thr Asn Ile Val Gln Leu His Gln Val Leu
405 410 415Ala Val Lys Leu Asp
Arg Arg Gly Gly Asp Gly Gly Leu Pro Leu Arg 420
425 430Pro Trp Trp Thr Pro
435
47 1413 DNA Mycobacterium tuberculosis
47atgcaggggc agttgtcgag gactagggta
tatacggtgc ctgtccctgg atctgcacag 60tcggcttacg cctgcggcgt cgagcggttg
ctggcgagct atcgatccat ccccgcgact 120gcatccatcc ggcttgccaa gcccacctca
aatctgttcc gcgcccgcgt caaacacgat 180gcacgcggcc tggacgcatc gggactgacc
ggtgtcatcg gtatcgatcc cgaggcccgc 240accgccgacg tggccggcat gtgcacatac
gaggacctaa tcgccgcgac actgcactac 300ggtctgtcac cattggtggt tccgcagctg
aggacgatca cattgggcgg agcggtcacc 360ggcttgggta tcgagtcggc gtcgttccgc
aacggcctgc cccacgagtc ggtgctggag 420atggatatcc tcaccggcgc aggagaactt
ctcaccgtct cgcccggaca gcactccgac 480ttgtaccgtg cattccctaa ctcgtatggg
acactgggct attcaacccg gcttcgaatc 540cagctggagc cggtccggcc gtttgtcgcg
ctgcggcaca tccgatttag ctcgttgacg 600gcgatggtgg ccgcaatgga gcgcatcatc
gacaccggcg gactggacgg cgaatcggtg 660gactatctcg acggggtggt tttcagcgct
gacgaaagct acctgtgcat cggcatgcag 720acgagcgtac cgggcccggt cagcgactac
accggacaag acatctacta ccggtcgatc 780caacacgagg cggggatcaa ggaagaccgg
ttgaccatcc acgattactt ctggcgctgg 840gacaccgatt ggttctggtg ctcacgatcg
tttggtgccc aaaacccgcg gctgcgccgc 900tggtggccgc ggcgctaccg gcgtagcagt
gtctactgga ggttgatggc gctcgatcag 960cgcttcggga tcgccgaccg gttcgagaac
agcaggggtc gtcccgcgcg tgaacgggtg 1020gtgcaggata tcgaagtgcc gatcgaacgg
acctgcgagt ttctggagtg gttcggggaa 1080aacgtgccca tttcgccaat ctggttgtgc
ccgttgcggc tacgcgatca cgccggctgg 1140ccgctgtacc cgatccggcc tgaccgtagc
tatgtcaaca tcgggttctg gtcgtcggtg 1200ccggttggcg ccaccgaggg cgccaccaac
cgcaagatcg agaacaaggt gagtgcgctc 1260gacgggcaca agtcgctcta ctccgactcc
ttctataccc gcgaggagtt cgacgagctc 1320tacggcggcg agacttacaa cactgtgaag
aaagcctacg atcccgattc gcgtctcctc 1380gatctttacg caaaggcggt gcaacgacga
tga 1413
48 470 PRT Mycobacterium tuberculosis
48Met Gln Gly Gln Leu Ser Arg Thr Arg Val Tyr Thr Val Pro Val Pro1
5 10 15Gly Ser Ala Gln Ser Ala
Tyr Ala Cys Gly Val Glu Arg Leu Leu Ala 20 25
30Ser Tyr Arg Ser Ile Pro Ala Thr Ala Ser Ile Arg Leu
Ala Lys Pro 35 40 45Thr Ser Asn
Leu Phe Arg Ala Arg Val Lys His Asp Ala Arg Gly Leu 50
55 60Asp Ala Ser Gly Leu Thr Gly Val Ile Gly Ile Asp
Pro Glu Ala Arg65 70 75
80Thr Ala Asp Val Ala Gly Met Cys Thr Tyr Glu Asp Leu Ile Ala Ala
85 90 95Thr Leu His Tyr Gly Leu
Ser Pro Leu Val Val Pro Gln Leu Arg Thr 100
105 110Ile Thr Leu Gly Gly Ala Val Thr Gly Leu Gly Ile
Glu Ser Ala Ser 115 120 125Phe Arg
Asn Gly Leu Pro His Glu Ser Val Leu Glu Met Asp Ile Leu 130
135 140Thr Gly Ala Gly Glu Leu Leu Thr Val Ser Pro
Gly Gln His Ser Asp145 150 155
160Leu Tyr Arg Ala Phe Pro Asn Ser Tyr Gly Thr Leu Gly Tyr Ser Thr
165 170 175Arg Leu Arg Ile
Gln Leu Glu Pro Val Arg Pro Phe Val Ala Leu Arg 180
185 190His Ile Arg Phe Ser Ser Leu Thr Ala Met Val
Ala Ala Met Glu Arg 195 200 205Ile
Ile Asp Thr Gly Gly Leu Asp Gly Glu Ser Val Asp Tyr Leu Asp 210
215 220Gly Val Val Phe Ser Ala Asp Glu Ser Tyr
Leu Cys Ile Gly Met Gln225 230 235
240Thr Ser Val Pro Gly Pro Val Ser Asp Tyr Thr Gly Gln Asp Ile
Tyr 245 250 255Tyr Arg Ser
Ile Gln His Glu Ala Gly Ile Lys Glu Asp Arg Leu Thr 260
265 270Ile His Asp Tyr Phe Trp Arg Trp Asp Thr
Asp Trp Phe Trp Cys Ser 275 280
285Arg Ser Phe Gly Ala Gln Asn Pro Arg Leu Arg Arg Trp Trp Pro Arg 290
295 300Arg Tyr Arg Arg Ser Ser Val Tyr
Trp Arg Leu Met Ala Leu Asp Gln305 310
315 320Arg Phe Gly Ile Ala Asp Arg Phe Glu Asn Ser Arg
Gly Arg Pro Ala 325 330
335Arg Glu Arg Val Val Gln Asp Ile Glu Val Pro Ile Glu Arg Thr Cys
340 345 350Glu Phe Leu Glu Trp Phe
Gly Glu Asn Val Pro Ile Ser Pro Ile Trp 355 360
365Leu Cys Pro Leu Arg Leu Arg Asp His Ala Gly Trp Pro Leu
Tyr Pro 370 375 380Ile Arg Pro Asp Arg
Ser Tyr Val Asn Ile Gly Phe Trp Ser Ser Val385 390
395 400Pro Val Gly Ala Thr Glu Gly Ala Thr Asn
Arg Lys Ile Glu Asn Lys 405 410
415Val Ser Ala Leu Asp Gly His Lys Ser Leu Tyr Ser Asp Ser Phe Tyr
420 425 430Thr Arg Glu Glu Phe
Asp Glu Leu Tyr Gly Gly Glu Thr Tyr Asn Thr 435
440 445Val Lys Lys Ala Tyr Asp Pro Asp Ser Arg Leu Leu
Asp Leu Tyr Ala 450 455 460Lys Ala Val
Gln Arg Arg465 470
49 1263 DNA Mycobacterium tuberculosis
49atggccgaga tcctggagat cttcaccgcg accgggcaac acccgctgaa gttcaccgcg
60tatgacggca gcaccgcggg acaagacgac gccacactgg gcctggatct tcggacgccc
120cgcggcgcca cctacttagc taccgctccc ggcgaactcg gcctggcccg cgcttatgtg
180tcgggtgacc tacaggcaca cggagtacat cccggcgatc cgtacgaact gctcaaaacg
240ctgaccgaaa gggtcgactt caaacggccg tcggcgcggg tgctggctaa tgtggtgcgc
300tcgatcggcg ttgagcacat actgcccatc gcgccgccac cccaggaggc gcgaccccgg
360tggcgtcgaa tggctaatgg cttgctgcac agcaagaccc gtgacgccga ggctatccat
420caccactacg acgtctccaa caacttctac gagtgggtgc tcgggccatc gatgacctac
480acgtgcgcgg tgtttccgaa cgctgaggct tcgctggagc aggcccaaga gaacaaatac
540cgactcattt tcgaaaagct acggctagag ccgggtgacc ggctactcga cgtcggctgc
600ggctggggcg gcatggtgcg ctacgccgcc cgacgcggtg tccgggtgat cggcgccacg
660ctctcggccg agcaggccaa gtggggccag aaagcagtcg aggacgaggg attgagcgac
720ctcgcgcagg tgcggcattc cgactaccgc gacgtagccg agaccggttt cgacgccgtt
780tcttcgatcg ggctaaccga gcacatcggc gtcaagaatt acccgttcta cttcgggttt
840ctcaagtcga agttgcgcac cggcggcttg ctgctcaatc actgcatcac ccgccacgac
900aacaggtcga cgtcctttgc cggcgggttc accgaccgtt acgttttccc cgacggggag
960ctgacgggct cgggacgtat taccaccgag atccagcagg tcggcttgga agtgctgcac
1020gaggagaact tccgccatca ctacgcgatg acgctgcgcg actggtgcgg caacctcgtc
1080gaacactggg acgacgcggt cgccgaggtc ggtctgccga ccgccaaggt gtggggcctg
1140tacatggcgg cttcgcgggt ggccttcgaa cgaaacaacc tgcagctaca tcacgtattg
1200gcgaccaagg tggacccccg gggcgacgac agcttgccac tgcggccctg gtggcagccc
1260tag 1263
50 420 PRT Mycobacterium tuberculosis
50Met Ala Glu Ile Leu Glu Ile Phe
Thr Ala Thr Gly Gln His Pro Leu1 5 10
15Lys Phe Thr Ala Tyr Asp Gly Ser Thr Ala Gly Gln Asp Asp
Ala Thr 20 25 30Leu Gly Leu
Asp Leu Arg Thr Pro Arg Gly Ala Thr Tyr Leu Ala Thr 35
40 45Ala Pro Gly Glu Leu Gly Leu Ala Arg Ala Tyr
Val Ser Gly Asp Leu 50 55 60Gln Ala
His Gly Val His Pro Gly Asp Pro Tyr Glu Leu Leu Lys Thr65
70 75 80Leu Thr Glu Arg Val Asp Phe
Lys Arg Pro Ser Ala Arg Val Leu Ala 85 90
95Asn Val Val Arg Ser Ile Gly Val Glu His Ile Leu Pro
Ile Ala Pro 100 105 110Pro Pro
Gln Glu Ala Arg Pro Arg Trp Arg Arg Met Ala Asn Gly Leu 115
120 125Leu His Ser Lys Thr Arg Asp Ala Glu Ala
Ile His His His Tyr Asp 130 135 140Val
Ser Asn Asn Phe Tyr Glu Trp Val Leu Gly Pro Ser Met Thr Tyr145
150 155 160Thr Cys Ala Val Phe Pro
Asn Ala Glu Ala Ser Leu Glu Gln Ala Gln 165
170 175Glu Asn Lys Tyr Arg Leu Ile Phe Glu Lys Leu Arg
Leu Glu Pro Gly 180 185 190Asp
Arg Leu Leu Asp Val Gly Cys Gly Trp Gly Gly Met Val Arg Tyr 195
200 205Ala Ala Arg Arg Gly Val Arg Val Ile
Gly Ala Thr Leu Ser Ala Glu 210 215
220Gln Ala Lys Trp Gly Gln Lys Ala Val Glu Asp Glu Gly Leu Ser Asp225
230 235 240Leu Ala Gln Val
Arg His Ser Asp Tyr Arg Asp Val Ala Glu Thr Gly 245
250 255Phe Asp Ala Val Ser Ser Ile Gly Leu Thr
Glu His Ile Gly Val Lys 260 265
270Asn Tyr Pro Phe Tyr Phe Gly Phe Leu Lys Ser Lys Leu Arg Thr Gly
275 280 285Gly Leu Leu Leu Asn His Cys
Ile Thr Arg His Asp Asn Arg Ser Thr 290 295
300Ser Phe Ala Gly Gly Phe Thr Asp Arg Tyr Val Phe Pro Asp Gly
Glu305 310 315 320Leu Thr
Gly Ser Gly Arg Ile Thr Thr Glu Ile Gln Gln Val Gly Leu
325 330 335Glu Val Leu His Glu Glu Asn
Phe Arg His His Tyr Ala Met Thr Leu 340 345
350Arg Asp Trp Cys Gly Asn Leu Val Glu His Trp Asp Asp Ala
Val Ala 355 360 365Glu Val Gly Leu
Pro Thr Ala Lys Val Trp Gly Leu Tyr Met Ala Ala 370
375 380Ser Arg Val Ala Phe Glu Arg Asn Asn Leu Gln Leu
His His Val Leu385 390 395
400Ala Thr Lys Val Asp Pro Arg Gly Asp Asp Ser Leu Pro Leu Arg Pro
405 410 415Trp Trp Gln Pro
420
51 1392 DNA Unknown Mycobacterium vanbaalenii
51gtgtctgttc cttcgaccga
cgcacgttct gctcacgccg acggcgtgca gcggcttctc 60gccagctatc gggcgattcc
ccaagacgcc acggtccggc tggccaaacc cacgtcgaac 120ctcttccgtg cccgcgcgaa
aaccaggacc aagggtctgg acacgtctgg gttgacgaac 180gtgatcgcgg tcgacgcgga
ggcacgcacc gccgatgtgg cagggatgtg cacctacgaa 240gacctggtcg cggccacgct
gccgcatgga ctttcgccgc tggtggtgcc gcagttgaag 300acgatcaccc tcggcggggc
ggtcaccgga ctcgggatcg agtccgcctc gttccgcaac 360ggcctgccac acgaatcggt
tctcgagatg gacgtcctca ccggcaccgg tgatgtcgtg 420cgcgcctccc ccgacgagaa
ccctgacctg tttcgggcgt ttccgaattc ctatggcacg 480ttgggctatt cggttcggct
caagatcgag ctggaaccgg tgaagccgtt cgtcgcgctg 540cgccacctcc gtttccattc
gctgtcggct ctcatcgagg cgatggaccg catcgtcgaa 600accggcggcc tcaacggcga
accggtggac tacctcgacg gcgtcgtgtt cagtgccgag 660gagagttacc tgtgcgtggg
gcagcgctcc gcgacaccgg gcccggtcag cgactacacg 720ggcaagcaga tctactaccg
ctcgattcag cacgacggcc cgaccgatgg cgccgagaag 780cacgaccggc tgaccatcca
cgactacctg tggcgctggg acaccgactg gttctggtgc 840tcaagggcat tcggcgcgca
gaacccgcgg atccggcgct ggtggccgcg ccggtaccgg 900cgcagcagtg tgtactggaa
gctgatcggc tacgaccggc gtttcggtat cgccgatcgc 960atcgagaagc gcaacggccg
acccccgcgc gagcgggtgg tccaggacat cgaggtgccc 1020atcgagcgga ccgtcgagtt
tctgcagtgg tttctcgaca ccgtgcccat cgaaccgatc 1080tggttgtgcc cgttgcggct
ccgcgacgac cgcgattggc ccctgtatcc gatccgaccc 1140caccacacct acgtcaacgt
gggtttctgg tcgtcggtgc cggtgggccc ggaggagggc 1200tacaccaaca ggatgatcga
acggaaagtc agcgacctcg acggtcacaa atcgctgtat 1260tccgatgcgt actactcgcc
ggaagagttt gattcgctct atggcgggga gacgtacaag 1320acggtgaaga agacatacga
cccagactct cgtttcctgg acctgtacgg caaagcagtg 1380gggcggcaat ga 1392
52 463 PRT Unknown Mycobacterium vanbaalenii
52Val Ser Val Pro Ser Thr
Asp Ala Arg Ser Ala His Ala Asp Gly Val1 5
10 15Gln Arg Leu Leu Ala Ser Tyr Arg Ala Ile Pro Gln
Asp Ala Thr Val 20 25 30Arg
Leu Ala Lys Pro Thr Ser Asn Leu Phe Arg Ala Arg Ala Lys Thr 35
40 45Arg Thr Lys Gly Leu Asp Thr Ser Gly
Leu Thr Asn Val Ile Ala Val 50 55
60Asp Ala Glu Ala Arg Thr Ala Asp Val Ala Gly Met Cys Thr Tyr Glu65
70 75 80Asp Leu Val Ala Ala
Thr Leu Pro His Gly Leu Ser Pro Leu Val Val 85
90 95Pro Gln Leu Lys Thr Ile Thr Leu Gly Gly Ala
Val Thr Gly Leu Gly 100 105
110Ile Glu Ser Ala Ser Phe Arg Asn Gly Leu Pro His Glu Ser Val Leu
115 120 125Glu Met Asp Val Leu Thr Gly
Thr Gly Asp Val Val Arg Ala Ser Pro 130 135
140Asp Glu Asn Pro Asp Leu Phe Arg Ala Phe Pro Asn Ser Tyr Gly
Thr145 150 155 160Leu Gly
Tyr Ser Val Arg Leu Lys Ile Glu Leu Glu Pro Val Lys Pro
165 170 175Phe Val Ala Leu Arg His Leu
Arg Phe His Ser Leu Ser Ala Leu Ile 180 185
190Glu Ala Met Asp Arg Ile Val Glu Thr Gly Gly Leu Asn Gly
Glu Pro 195 200 205Val Asp Tyr Leu
Asp Gly Val Val Phe Ser Ala Glu Glu Ser Tyr Leu 210
215 220Cys Val Gly Gln Arg Ser Ala Thr Pro Gly Pro Val
Ser Asp Tyr Thr225 230 235
240Gly Lys Gln Ile Tyr Tyr Arg Ser Ile Gln His Asp Gly Pro Thr Asp
245 250 255Gly Ala Glu Lys His
Asp Arg Leu Thr Ile His Asp Tyr Leu Trp Arg 260
265 270Trp Asp Thr Asp Trp Phe Trp Cys Ser Arg Ala Phe
Gly Ala Gln Asn 275 280 285Pro Arg
Ile Arg Arg Trp Trp Pro Arg Arg Tyr Arg Arg Ser Ser Val 290
295 300Tyr Trp Lys Leu Ile Gly Tyr Asp Arg Arg Phe
Gly Ile Ala Asp Arg305 310 315
320Ile Glu Lys Arg Asn Gly Arg Pro Pro Arg Glu Arg Val Val Gln Asp
325 330 335Ile Glu Val Pro
Ile Glu Arg Thr Val Glu Phe Leu Gln Trp Phe Leu 340
345 350Asp Thr Val Pro Ile Glu Pro Ile Trp Leu Cys
Pro Leu Arg Leu Arg 355 360 365Asp
Asp Arg Asp Trp Pro Leu Tyr Pro Ile Arg Pro His His Thr Tyr 370
375 380Val Asn Val Gly Phe Trp Ser Ser Val Pro
Val Gly Pro Glu Glu Gly385 390 395
400Tyr Thr Asn Arg Met Ile Glu Arg Lys Val Ser Asp Leu Asp Gly
His 405 410 415Lys Ser Leu
Tyr Ser Asp Ala Tyr Tyr Ser Pro Glu Glu Phe Asp Ser 420
425 430Leu Tyr Gly Gly Glu Thr Tyr Lys Thr Val
Lys Lys Thr Tyr Asp Pro 435 440
445Asp Ser Arg Phe Leu Asp Leu Tyr Gly Lys Ala Val Gly Arg Gln 450
455 460
53 1323 DNA Unknown Mycobacterium vanbaalenii
53ttgacgacat ttcgggacgg cgcggccgac accggcctgc acggagaccg
caagctcacc 60ctggcggagg tcttggaggt cttcgcctcg ggccgactgc ctctgaagtt
cacggcgtac 120gacggcagca gcgcgggccc ggacgacgcc acgctcgggc tggacctgct
gaccccccgc 180gggaccacgt acctcgcaac ggctcccggc gatctcggcc tggcccgggc
ctacgtctcc 240ggtgacctgc agttgcaggg ggtgcaccct ggcgacccgt acgacctgct
caacgcactg 300gtgcagaaac tggacttcaa gcgaccgtcc gcccgggtgc tggcgcaggt
cgtccgatcg 360atcgggatcg agcacctgaa accgatcgcg ccaccgccgc aggaggcgct
gccgcggtgg 420cggcgcatcg cagaaggact gcggcacagc aagacccgtg acgccgacgc
gatccaccac 480cattacgatg tctccaacac cttctacgag tgggtgctcg ggccgtcgat
gacctacacc 540tgcgcctgct acccgcatcc cgacgccacc ctcgaggagg cgcaggagaa
caaatatcgg 600ctggtgttcg agaaactgcg cctcaagccg ggcgaccgcc ttctcgacgt
gggttgcggg 660tggggcggaa tggtgcgcta cgcggcccgt cacggcgtca aggcgatcgg
ggtgacgctg 720tccagggagc aggcgcagtg ggcacgcgcc gccatcgaac gggacggcct
gggtgacctc 780gccgaggtcc gccacagcga ctaccgcgat gtgcgcgagt cccagttcga
cgccgtgtct 840tcgctggggc tcaccgagca catcggggtc gccaactatc cgtcgtactt
ccggttcctc 900aagtcgaagt tgcgcccggg cggcctactg ctcaaccact gcatcacccg
gcacaacaat 960cgcaccggcc ccgccgccgg gggattcatc gaccggtatg tgttcccgga
cggggagctg 1020accggatcgg gccggatcat caccgagatc caggacgtcg gtttggaggt
gatgcacgaa 1080gagaacctgc gccggcacta tgcgctgaca cttcgggact ggtgccggaa
tctggtgcag 1140cactgggacg aagcggtcgc agaggtcggc ctgcccaccg ccaaggtgtg
gggtctgtac 1200atggctgcct cgcgggtcgg cttcgagcag aacagcattc agctgcatca
ggtactggcg 1260gtgaagctcg acgaacgtgg cggggacggc ggtttgccgt tgcggccctg
gtggaccgcg 1320tag 1323
54 440 PRT Unknown Mycobacterium vanbaalenii
54Leu Thr Thr Phe
Arg Asp Gly Ala Ala Asp Thr Gly Leu His Gly Asp1 5
10 15Arg Lys Leu Thr Leu Ala Glu Val Leu Glu
Val Phe Ala Ser Gly Arg 20 25
30Leu Pro Leu Lys Phe Thr Ala Tyr Asp Gly Ser Ser Ala Gly Pro Asp
35 40 45Asp Ala Thr Leu Gly Leu Asp Leu
Leu Thr Pro Arg Gly Thr Thr Tyr 50 55
60Leu Ala Thr Ala Pro Gly Asp Leu Gly Leu Ala Arg Ala Tyr Val Ser65
70 75 80Gly Asp Leu Gln Leu
Gln Gly Val His Pro Gly Asp Pro Tyr Asp Leu 85
90 95Leu Asn Ala Leu Val Gln Lys Leu Asp Phe Lys
Arg Pro Ser Ala Arg 100 105
110Val Leu Ala Gln Val Val Arg Ser Ile Gly Ile Glu His Leu Lys Pro
115 120 125Ile Ala Pro Pro Pro Gln Glu
Ala Leu Pro Arg Trp Arg Arg Ile Ala 130 135
140Glu Gly Leu Arg His Ser Lys Thr Arg Asp Ala Asp Ala Ile His
His145 150 155 160His Tyr
Asp Val Ser Asn Thr Phe Tyr Glu Trp Val Leu Gly Pro Ser
165 170 175Met Thr Tyr Thr Cys Ala Cys
Tyr Pro His Pro Asp Ala Thr Leu Glu 180 185
190Glu Ala Gln Glu Asn Lys Tyr Arg Leu Val Phe Glu Lys Leu
Arg Leu 195 200 205Lys Pro Gly Asp
Arg Leu Leu Asp Val Gly Cys Gly Trp Gly Gly Met 210
215 220Val Arg Tyr Ala Ala Arg His Gly Val Lys Ala Ile
Gly Val Thr Leu225 230 235
240Ser Arg Glu Gln Ala Gln Trp Ala Arg Ala Ala Ile Glu Arg Asp Gly
245 250 255Leu Gly Asp Leu Ala
Glu Val Arg His Ser Asp Tyr Arg Asp Val Arg 260
265 270Glu Ser Gln Phe Asp Ala Val Ser Ser Leu Gly Leu
Thr Glu His Ile 275 280 285Gly Val
Ala Asn Tyr Pro Ser Tyr Phe Arg Phe Leu Lys Ser Lys Leu 290
295 300Arg Pro Gly Gly Leu Leu Leu Asn His Cys Ile
Thr Arg His Asn Asn305 310 315
320Arg Thr Gly Pro Ala Ala Gly Gly Phe Ile Asp Arg Tyr Val Phe Pro
325 330 335Asp Gly Glu Leu
Thr Gly Ser Gly Arg Ile Ile Thr Glu Ile Gln Asp 340
345 350Val Gly Leu Glu Val Met His Glu Glu Asn Leu
Arg Arg His Tyr Ala 355 360 365Leu
Thr Leu Arg Asp Trp Cys Arg Asn Leu Val Gln His Trp Asp Glu 370
375 380Ala Val Ala Glu Val Gly Leu Pro Thr Ala
Lys Val Trp Gly Leu Tyr385 390 395
400Met Ala Ala Ser Arg Val Gly Phe Glu Gln Asn Ser Ile Gln Leu
His 405 410 415Gln Val Leu
Ala Val Lys Leu Asp Glu Arg Gly Gly Asp Gly Gly Leu 420
425 430Pro Leu Arg Pro Trp Trp Thr Ala
435 440
55 381 DNA Unknown Mycobacterium vanbaalenii
55gtgatccgct ttctgctgcg cgtcgcggtc tttctcggat cgtcggcgat cgggctactg
60gtggccggct ggctggtgcc gggggtgtcg ctgtcggtgc tgggcttcgt caccgcggtg
120gtgatcttca cggtggcaca agggattctg tcgccgttct tcctgaagat ggccagccgc
180tacgcgtcgg ccttcctcgg cggcatcggc ctggtgtcca cgttcgtggc gctgctgctc
240gcgtcgctgc tgtccaacgg gctcagcatc cgcggcgtcg ggtcgtggat cgcggccacg
300gtggtggtct ggctggtcac agccctggcg accgtcgtgc tgcccgttct ggtgctgcgg
360gagaagaaga aagcagcctg a 381
56 126 PRT Unknown Mycobacterium vanbaalenii
56Val Ile Arg Phe Leu Leu Arg
Val Ala Val Phe Leu Gly Ser Ser Ala1 5 10
15Ile Gly Leu Leu Val Ala Gly Trp Leu Val Pro Gly Val
Ser Leu Ser 20 25 30Val Leu
Gly Phe Val Thr Ala Val Val Ile Phe Thr Val Ala Gln Gly 35
40 45Ile Leu Ser Pro Phe Phe Leu Lys Met Ala
Ser Arg Tyr Ala Ser Ala 50 55 60Phe
Leu Gly Gly Ile Gly Leu Val Ser Thr Phe Val Ala Leu Leu Leu65
70 75 80Ala Ser Leu Leu Ser Asn
Gly Leu Ser Ile Arg Gly Val Gly Ser Trp 85
90 95Ile Ala Ala Thr Val Val Val Trp Leu Val Thr Ala
Leu Ala Thr Val 100 105 110Val
Leu Pro Val Leu Val Leu Arg Glu Lys Lys Lys Ala Ala 115
120 125
57 1452 DNA Rhodococcus opacus
57atgcgggagg
gtggacgccc cttccgtgcg catcgcactc tgcccgtcac cgggatcgac 60gctcaccgcg
ccggcgtcga acggcttctc gcgtcctacc gcgcgattcc cacggacgcc 120accgtgcgac
tcgcgaagaa gacgtccaac ctgttccggg cgcgggccca gaccagcgca 180cccggcctcg
acgtctccgg gctcggcgga gtcatctcgg tcgacgagca ggaccggacc 240gcggatgtcg
ccggaatgtg cacgtacgaa gacctggtgg acgccaccct cccgtacggg 300ctggcgccgc
tggtggttcc gcaactcaag accatcacac tcggcggcgc ggtcaccggc 360ctcggcatcg
agtcgacgtc gttccgcaac gggctccccc acgaatcggt cctcgagatc 420gacgtcctga
ccggaagcgg cgacatcgtc accgcgagac cggaaggcga gaactccgac 480ctgttctggg
ggttccccaa ctcctacgga accctcggct actccacccg actgcgcatc 540cagctcgaac
ccgtcaaacg gtatgtggca ctgcgccatc tgcgtttcga ctccctggac 600gagctgcagt
cggcaatgga tcgcatcgtc accgagcgcg tccacgacgg catccccgtc 660gactatctgg
acggcgtcgt gttcaccgcg tccgagagtt acctgacact gggccatcag 720accgacgagg
gcggccccgt cagcgactac accgggcaga acatcttcta ccggtccatc 780cagcacagtt
ccgtgaacca ccccaaaacg gacaaactca ccatccgaga ctacctgtgg 840cgctgggaca
ccgactggtt ctggtgctcg cgcgccttcg gcgcccagaa ccccaccatc 900cgccggctgt
ggccgaagaa cctcctccgc agcagcttct actggaagct catcgccctc 960gaccacaagt
acgacatcgg cgaccgactc gagaagcgca agggcaaccc gccacgcgaa 1020cgcgtcgtgc
aggacgtcga agtgcccatc gagcgcaccg cggacttcgt ccgctggttc 1080ctcgacgaaa
tcccgatcga accgctgtgg ctgtgcccgt tgcggttgcg ggaacctgcc 1140cccgccggcg
cgtcctcgca acgcccctgg cccctgtacc ccctcgaacc gaaacgcacg 1200tacgtgaaca
tcggattctg gtcatcggtg cccatcgttc cgggccgacc cgagggggcc 1260gcgaatcggc
tgatcgaaga caaggtcagt gacttcgacg gacacaagtc cctctactcc 1320gattcgtact
attcacgcga agatttcgaa cgcctctact acggcggcga tcgatacacg 1380gaactgaaaa
aacgctacga cccgaaatca cgattactgg accttttctc caaggcggtg 1440caacgtcgat
ga
145258483PRTRhodococcus opacus 58Met Arg Glu Gly Gly Arg Pro Phe Arg Ala
His Arg Thr Leu Pro Val1 5 10
15Thr Gly Ile Asp Ala His Arg Ala Gly Val Glu Arg Leu Leu Ala Ser
20 25 30Tyr Arg Ala Ile Pro Thr
Asp Ala Thr Val Arg Leu Ala Lys Lys Thr 35 40
45Ser Asn Leu Phe Arg Ala Arg Ala Gln Thr Ser Ala Pro Gly
Leu Asp 50 55 60Val Ser Gly Leu Gly
Gly Val Ile Ser Val Asp Glu Gln Asp Arg Thr65 70
75 80Ala Asp Val Ala Gly Met Cys Thr Tyr Glu
Asp Leu Val Asp Ala Thr 85 90
95Leu Pro Tyr Gly Leu Ala Pro Leu Val Val Pro Gln Leu Lys Thr Ile
100 105 110Thr Leu Gly Gly Ala
Val Thr Gly Leu Gly Ile Glu Ser Thr Ser Phe 115
120 125Arg Asn Gly Leu Pro His Glu Ser Val Leu Glu Ile
Asp Val Leu Thr 130 135 140Gly Ser Gly
Asp Ile Val Thr Ala Arg Pro Glu Gly Glu Asn Ser Asp145
150 155 160Leu Phe Trp Gly Phe Pro Asn
Ser Tyr Gly Thr Leu Gly Tyr Ser Thr 165
170 175Arg Leu Arg Ile Gln Leu Glu Pro Val Lys Arg Tyr
Val Ala Leu Arg 180 185 190His
Leu Arg Phe Asp Ser Leu Asp Glu Leu Gln Ser Ala Met Asp Arg 195
200 205Ile Val Thr Glu Arg Val His Asp Gly
Ile Pro Val Asp Tyr Leu Asp 210 215
220Gly Val Val Phe Thr Ala Ser Glu Ser Tyr Leu Thr Leu Gly His Gln225
230 235 240Thr Asp Glu Gly
Gly Pro Val Ser Asp Tyr Thr Gly Gln Asn Ile Phe 245
250 255Tyr Arg Ser Ile Gln His Ser Ser Val Asn
His Pro Lys Thr Asp Lys 260 265
270Leu Thr Ile Arg Asp Tyr Leu Trp Arg Trp Asp Thr Asp Trp Phe Trp
275 280 285Cys Ser Arg Ala Phe Gly Ala
Gln Asn Pro Thr Ile Arg Arg Leu Trp 290 295
300Pro Lys Asn Leu Leu Arg Ser Ser Phe Tyr Trp Lys Leu Ile Ala
Leu305 310 315 320Asp His
Lys Tyr Asp Ile Gly Asp Arg Leu Glu Lys Arg Lys Gly Asn
325 330 335Pro Pro Arg Glu Arg Val Val
Gln Asp Val Glu Val Pro Ile Glu Arg 340 345
350Thr Ala Asp Phe Val Arg Trp Phe Leu Asp Glu Ile Pro Ile
Glu Pro 355 360 365Leu Trp Leu Cys
Pro Leu Arg Leu Arg Glu Pro Ala Pro Ala Gly Ala 370
375 380Ser Ser Gln Arg Pro Trp Pro Leu Tyr Pro Leu Glu
Pro Lys Arg Thr385 390 395
400Tyr Val Asn Ile Gly Phe Trp Ser Ser Val Pro Ile Val Pro Gly Arg
405 410 415Pro Glu Gly Ala Ala
Asn Arg Leu Ile Glu Asp Lys Val Ser Asp Phe 420
425 430Asp Gly His Lys Ser Leu Tyr Ser Asp Ser Tyr Tyr
Ser Arg Glu Asp 435 440 445Phe Glu
Arg Leu Tyr Tyr Gly Gly Asp Arg Tyr Thr Glu Leu Lys Lys 450
455 460Arg Tyr Asp Pro Lys Ser Arg Leu Leu Asp Leu
Phe Ser Lys Ala Val465 470 475
480Gln Arg Arg591302DNARhodococcus opacus 59atgacaactc tgaaagcttc
acgctcccag gaccacaagc tgaccatcgc agagattctc 60gaaactctgt ccgacggcat
gctccccctg cggttctccg cctacgacgg cagcgccgcc 120ggcccggagg acgcccccta
cggtctccac ctcaagacga cccgaggcac cacctacctg 180gcgaccgccc ccggcgacct
cggcatggcc cgggcctacg tgtccggcga cctcgaggcc 240cgcggcgtcc accccggcga
cccgtacgag atcctccgcg tgatgggcga cgaactgcac 300ttccgccgtc cgtccgcgct
cacgctcgcc gccatcacgc gctcgctcgg ctgggatctg 360ctgcgcccca tcgcccctcc
cccgcaggag catctcccgc ggtggcgtcg agtcgcggaa 420gggttgcggc actccaagtc
ccgcgacgcc gaggtcatcc accaccacta cgacgtctcg 480aacaccttct acgagtatgt
cctcggcccg tccatgacgt acacgtgcgc ctgctacgag 540aacgccgagc agaccctcga
agaggcacag gacaacaagt accgcctcgt cttcgagaag 600ctcggcctcc agcccggcga
ccgactgctc gacatcggtt gcggctgggg atcgatggtc 660cggtacgccg cccgccgcgg
cgtcaaggtc atcggcgcca ccctgtcccg agagcaggcc 720gaatgggcac agaaggccat
cgccgaagaa ggactgtccg acctcgccga ggtccggttc 780tccgactacc gtgacgtccc
cgagaccgga ttcgacgcca tctcctcgat cggcctgacc 840gagcacatcg gcgtcggcaa
ctaccccgcc tacttcggac tgctgcagag caagctccgc 900gagggcggcc ggctgctgaa
ccactgcatc acccggcccg acaaccagag tcaggcacgc 960gcgggcggct tcatcgaccg
gtacgtcttc cccgacggcg aactcaccgg ctccggacgc 1020atcatcaccg agatccagaa
cgtcggactc gaggtgcggc acgaggagaa tctgcgcgag 1080cactacgcac tcaccctcgc
cggctggtgc cagaacctcg tcgacaactg ggacgcctgc 1140gtcgccgagg tcggcgaagg
caccgcacgt gtgtggggtc tctacatggc cgggtcgcga 1200ctgggcttcg aacgcaacgt
cgttcagctg caccaggtcc tcgccgtcaa gctcggaccc 1260aagggcgagg cgcatgtgcc
gctgcgtccg tggtggaagt ag 130260433PRTRhodococcus
opacus 60Met Thr Thr Leu Lys Ala Ser Arg Ser Gln Asp His Lys Leu Thr Ile1
5 10 15Ala Glu Ile Leu
Glu Thr Leu Ser Asp Gly Met Leu Pro Leu Arg Phe 20
25 30Ser Ala Tyr Asp Gly Ser Ala Ala Gly Pro Glu
Asp Ala Pro Tyr Gly 35 40 45Leu
His Leu Lys Thr Thr Arg Gly Thr Thr Tyr Leu Ala Thr Ala Pro 50
55 60Gly Asp Leu Gly Met Ala Arg Ala Tyr Val
Ser Gly Asp Leu Glu Ala65 70 75
80Arg Gly Val His Pro Gly Asp Pro Tyr Glu Ile Leu Arg Val Met
Gly 85 90 95Asp Glu Leu
His Phe Arg Arg Pro Ser Ala Leu Thr Leu Ala Ala Ile 100
105 110Thr Arg Ser Leu Gly Trp Asp Leu Leu Arg
Pro Ile Ala Pro Pro Pro 115 120
125Gln Glu His Leu Pro Arg Trp Arg Arg Val Ala Glu Gly Leu Arg His 130
135 140Ser Lys Ser Arg Asp Ala Glu Val
Ile His His His Tyr Asp Val Ser145 150
155 160Asn Thr Phe Tyr Glu Tyr Val Leu Gly Pro Ser Met
Thr Tyr Thr Cys 165 170
175Ala Cys Tyr Glu Asn Ala Glu Gln Thr Leu Glu Glu Ala Gln Asp Asn
180 185 190Lys Tyr Arg Leu Val Phe
Glu Lys Leu Gly Leu Gln Pro Gly Asp Arg 195 200
205Leu Leu Asp Ile Gly Cys Gly Trp Gly Ser Met Val Arg Tyr
Ala Ala 210 215 220Arg Arg Gly Val Lys
Val Ile Gly Ala Thr Leu Ser Arg Glu Gln Ala225 230
235 240Glu Trp Ala Gln Lys Ala Ile Ala Glu Glu
Gly Leu Ser Asp Leu Ala 245 250
255Glu Val Arg Phe Ser Asp Tyr Arg Asp Val Pro Glu Thr Gly Phe Asp
260 265 270Ala Ile Ser Ser Ile
Gly Leu Thr Glu His Ile Gly Val Gly Asn Tyr 275
280 285Pro Ala Tyr Phe Gly Leu Leu Gln Ser Lys Leu Arg
Glu Gly Gly Arg 290 295 300Leu Leu Asn
His Cys Ile Thr Arg Pro Asp Asn Gln Ser Gln Ala Arg305
310 315 320Ala Gly Gly Phe Ile Asp Arg
Tyr Val Phe Pro Asp Gly Glu Leu Thr 325
330 335Gly Ser Gly Arg Ile Ile Thr Glu Ile Gln Asn Val
Gly Leu Glu Val 340 345 350Arg
His Glu Glu Asn Leu Arg Glu His Tyr Ala Leu Thr Leu Ala Gly 355
360 365Trp Cys Gln Asn Leu Val Asp Asn Trp
Asp Ala Cys Val Ala Glu Val 370 375
380Gly Glu Gly Thr Ala Arg Val Trp Gly Leu Tyr Met Ala Gly Ser Arg385
390 395 400Leu Gly Phe Glu
Arg Asn Val Val Gln Leu His Gln Val Leu Ala Val 405
410 415Lys Leu Gly Pro Lys Gly Glu Ala His Val
Pro Leu Arg Pro Trp Trp 420 425
430Lys611428DNAUnknownStreptomyces regensis 61atgatcacac tggcaggccg
ggccggtgcg cgcgatcatg ggtgtatggc cttcggtgcc 60gccatcccca cggggtcggg
acacgccggg tacgccgagc gcgtcgcaac ccttcgcgcc 120cacctggccg acctcccgga
ggggacgccg gtccggctgg cgaagggcac ctcgaacctg 180ttccggccgc ggtcccgcgc
cacggcgggg ctcgacgtgt cggccttcga ccacgtgctg 240tcgatcgatc cgcagaaccg
gaccgccgac gtcgagggca tggtcaccta cgagcggctc 300gtcgacgcga cgttgccgca
cggcctgatg ccgctcgtcg ttccgcagct caagacgatc 360acgctgggcg gggcggtcac
gggactgggc atcgagtcgt cgtcgttccg cgagggcatg 420ccccacgaat ccgtggtgga
gatggacatc ctcacgggtg cgggagacgt ggtgaccgcg 480accccggacg gcgagcacag
cgacctgttc ttcgggttcc ccaactccta cggaacgctg 540ggatacgcgc tgcgcctgcg
gatcgaactc gcgccggtgc gcccgtacgt acgactcgaa 600cacctgcgtt tctccgatcc
ggcacgctac ttcgagcgcc tggcgcgtgc gtgccgcgac 660cgggaggccg acttcgtcga
cggcaccgtc ttcgctcccg acgagctgta cctgacgttg 720gccacgttca gcggcgagcc
cgacgaggtc agcgactaca cgtggatgga cgtctactac 780cgctcgatca gggagaagac
ggtcgaccat ctgccgatcc gcgactacct gtggcggtgg 840gacaccgact ggttctggtg
ttcgcgcgcg ctcggagcgc agaaccggct cgtgcggctg 900ctcgcgggtc cacgtctgct
gcgttccgat gtgtactgga agatcgtcgg tttcgaacgc 960aggcaccggc tgtgggagcg
tgcgagccgg ctgctgggca ggcccgagcg cgaagcggtg 1020atgcaggaca tcgaggtgcc
ggtgcaccgc gccgaggagt tcctgacgtt cctgcaccgg 1080gagatcccca tcagtccggt
gtggatctgc ccgctgagtg ggcgggacgc gcgccggtgg 1140ccgctgtacg agctcgaccc
ggacgagctg tacgtcaact tcggtttctg gggcacggtg 1200ccgctcgagc caggcgaacc
gcagggttcg cacaaccggc gggtggagaa cgtggttacc 1260gaactcgacg gacggaaatc
cctgtactcg gagagtttct acgaccgcga cacgttctgg 1320cggttgtacg gagggaatca
aggacagacg taccaggccc tgaagcatcg ctacgacccg 1380aacgggagat tgctggacct
gtacgccaag tgcgttcaag cgaggtga
142862475PRTUnknownStreptomyces regensis 62Met Ile Thr Leu Ala Gly Arg Ala
Gly Ala Arg Asp His Gly Cys Met1 5 10
15Ala Phe Gly Ala Ala Ile Pro Thr Gly Ser Gly His Ala Gly
Tyr Ala 20 25 30Glu Arg Val
Ala Thr Leu Arg Ala His Leu Ala Asp Leu Pro Glu Gly 35
40 45Thr Pro Val Arg Leu Ala Lys Gly Thr Ser Asn
Leu Phe Arg Pro Arg 50 55 60Ser Arg
Ala Thr Ala Gly Leu Asp Val Ser Ala Phe Asp His Val Leu65
70 75 80Ser Ile Asp Pro Gln Asn Arg
Thr Ala Asp Val Glu Gly Met Val Thr 85 90
95Tyr Glu Arg Leu Val Asp Ala Thr Leu Pro His Gly Leu
Met Pro Leu 100 105 110Val Val
Pro Gln Leu Lys Thr Ile Thr Leu Gly Gly Ala Val Thr Gly 115
120 125Leu Gly Ile Glu Ser Ser Ser Phe Arg Glu
Gly Met Pro His Glu Ser 130 135 140Val
Val Glu Met Asp Ile Leu Thr Gly Ala Gly Asp Val Val Thr Ala145
150 155 160Thr Pro Asp Gly Glu His
Ser Asp Leu Phe Phe Gly Phe Pro Asn Ser 165
170 175Tyr Gly Thr Leu Gly Tyr Ala Leu Arg Leu Arg Ile
Glu Leu Ala Pro 180 185 190Val
Arg Pro Tyr Val Arg Leu Glu His Leu Arg Phe Ser Asp Pro Ala 195
200 205Arg Tyr Phe Glu Arg Leu Ala Arg Ala
Cys Arg Asp Arg Glu Ala Asp 210 215
220Phe Val Asp Gly Thr Val Phe Ala Pro Asp Glu Leu Tyr Leu Thr Leu225
230 235 240Ala Thr Phe Ser
Gly Glu Pro Asp Glu Val Ser Asp Tyr Thr Trp Met 245
250 255Asp Val Tyr Tyr Arg Ser Ile Arg Glu Lys
Thr Val Asp His Leu Pro 260 265
270Ile Arg Asp Tyr Leu Trp Arg Trp Asp Thr Asp Trp Phe Trp Cys Ser
275 280 285Arg Ala Leu Gly Ala Gln Asn
Arg Leu Val Arg Leu Leu Ala Gly Pro 290 295
300Arg Leu Leu Arg Ser Asp Val Tyr Trp Lys Ile Val Gly Phe Glu
Arg305 310 315 320Arg His
Arg Leu Trp Glu Arg Ala Ser Arg Leu Leu Gly Arg Pro Glu
325 330 335Arg Glu Ala Val Met Gln Asp
Ile Glu Val Pro Val His Arg Ala Glu 340 345
350Glu Phe Leu Thr Phe Leu His Arg Glu Ile Pro Ile Ser Pro
Val Trp 355 360 365Ile Cys Pro Leu
Ser Gly Arg Asp Ala Arg Arg Trp Pro Leu Tyr Glu 370
375 380Leu Asp Pro Asp Glu Leu Tyr Val Asn Phe Gly Phe
Trp Gly Thr Val385 390 395
400Pro Leu Glu Pro Gly Glu Pro Gln Gly Ser His Asn Arg Arg Val Glu
405 410 415Asn Val Val Thr Glu
Leu Asp Gly Arg Lys Ser Leu Tyr Ser Glu Ser 420
425 430Phe Tyr Asp Arg Asp Thr Phe Trp Arg Leu Tyr Gly
Gly Asn Gln Gly 435 440 445Gln Thr
Tyr Gln Ala Leu Lys His Arg Tyr Asp Pro Asn Gly Arg Leu 450
455 460Leu Asp Leu Tyr Ala Lys Cys Val Gln Ala
Arg465 470
475631317DNAUnknownStreptomyces regensis 63ttggcgtcgt cggggccacc
gctgcccgcc agggcggggt cccgatcggc tgactcgacg 60gcgttggacg cgatcctgcg
ccgcgtgctc ggggacgacc cgcccgtggc cgtgaccgcg 120ttcgacggca cggtggtcgg
tgacccggac tcggcgctgc agctgcacat ccgcacgccg 180acggccctga gctacgtgct
caccgcgccc aacgaactcg ggttggcgcg ggcctacgtc 240acgggacatc tcgacgtgac
cggcgacgtc taccaggtgc tgcgcgcact gacgagcgtg 300gccgagaacc tcacgacggc
cgatcggatg tggctggccg gccgtctcgc acgggacttc 360accgaccggc tgcggccggt
gccgatcccc gtcgaggagg cgccgtcgcg gctccgcagg 420accgcacgtg gcctccggca
ttccaaggcg cgcgacagcg acgcgatctc ccggcactac 480gacgtctcga accgcttcta
cgagctggtg ctcggcccgt cgatggccta cacgtgcgcc 540tgctacccgg aggatgcggc
cacgctggag caggcacagt tccacaagtt cgacctcgtg 600tgccgaaagc tcggtctgaa
gccggggatg cgcctgctcg acgtgggctg cggttggggc 660ggcatggtcg cccacgccgt
ggagcactac ggggtgcggg cgatcggcgt caccctctcg 720cgccagcagg cggagtgggg
acagcgggac ctcgaggcca ggggcctggc cgatcgcggc 780gagatccgcc atctggacta
ccgcgacgtg cccgagaccg ggttcgacgc ggtgtcgtcc 840atcgggctca ccgaacacat
cggcgcgcgg aacctgccgt cgtacttccg cttcctgcac 900tcgaagttgc gtcccggcgg
acggttgctc aaccactgca tcgtgcgccc gcacacctac 960gactcccatc ggacgggccc
gttcatcgac cgctacgtct tcccggacgg cgaactcgag 1020ggcgtcggga cgatcgtgtc
ggcgatgcag gaccacgggt tcgaggtacg gcacgcggag 1080aacctgcggg aacactacgg
gcgcaccctc gcggcgtggt gcgccaatct cgacgcgcac 1140tgggaggcgg cggtggccga
ggcgggcgtg cagcgggcca gggtgtgggc gctgtacatg 1200gcggcctccc ggctgtcgtt
cgaacgtcat gagctcgagc tgcagcaggt gctcggcgtg 1260aaacccgacg ccgcgggcgg
gtcgtcgatg ccgcttcgcc cggactgggg ggtgtga
131764438PRTUnknownStreptomyces regensis 64Leu Ala Ser Ser Gly Pro Pro Leu
Pro Ala Arg Ala Gly Ser Arg Ser1 5 10
15Ala Asp Ser Thr Ala Leu Asp Ala Ile Leu Arg Arg Val Leu
Gly Asp 20 25 30Asp Pro Pro
Val Ala Val Thr Ala Phe Asp Gly Thr Val Val Gly Asp 35
40 45Pro Asp Ser Ala Leu Gln Leu His Ile Arg Thr
Pro Thr Ala Leu Ser 50 55 60Tyr Val
Leu Thr Ala Pro Asn Glu Leu Gly Leu Ala Arg Ala Tyr Val65
70 75 80Thr Gly His Leu Asp Val Thr
Gly Asp Val Tyr Gln Val Leu Arg Ala 85 90
95Leu Thr Ser Val Ala Glu Asn Leu Thr Thr Ala Asp Arg
Met Trp Leu 100 105 110Ala Gly
Arg Leu Ala Arg Asp Phe Thr Asp Arg Leu Arg Pro Val Pro 115
120 125Ile Pro Val Glu Glu Ala Pro Ser Arg Leu
Arg Arg Thr Ala Arg Gly 130 135 140Leu
Arg His Ser Lys Ala Arg Asp Ser Asp Ala Ile Ser Arg His Tyr145
150 155 160Asp Val Ser Asn Arg Phe
Tyr Glu Leu Val Leu Gly Pro Ser Met Ala 165
170 175Tyr Thr Cys Ala Cys Tyr Pro Glu Asp Ala Ala Thr
Leu Glu Gln Ala 180 185 190Gln
Phe His Lys Phe Asp Leu Val Cys Arg Lys Leu Gly Leu Lys Pro 195
200 205Gly Met Arg Leu Leu Asp Val Gly Cys
Gly Trp Gly Gly Met Val Ala 210 215
220His Ala Val Glu His Tyr Gly Val Arg Ala Ile Gly Val Thr Leu Ser225
230 235 240Arg Gln Gln Ala
Glu Trp Gly Gln Arg Asp Leu Glu Ala Arg Gly Leu 245
250 255Ala Asp Arg Gly Glu Ile Arg His Leu Asp
Tyr Arg Asp Val Pro Glu 260 265
270Thr Gly Phe Asp Ala Val Ser Ser Ile Gly Leu Thr Glu His Ile Gly
275 280 285Ala Arg Asn Leu Pro Ser Tyr
Phe Arg Phe Leu His Ser Lys Leu Arg 290 295
300Pro Gly Gly Arg Leu Leu Asn His Cys Ile Val Arg Pro His Thr
Tyr305 310 315 320Asp Ser
His Arg Thr Gly Pro Phe Ile Asp Arg Tyr Val Phe Pro Asp
325 330 335Gly Glu Leu Glu Gly Val Gly
Thr Ile Val Ser Ala Met Gln Asp His 340 345
350Gly Phe Glu Val Arg His Ala Glu Asn Leu Arg Glu His Tyr
Gly Arg 355 360 365Thr Leu Ala Ala
Trp Cys Ala Asn Leu Asp Ala His Trp Glu Ala Ala 370
375 380Val Ala Glu Ala Gly Val Gln Arg Ala Arg Val Trp
Ala Leu Tyr Met385 390 395
400Ala Ala Ser Arg Leu Ser Phe Glu Arg His Glu Leu Glu Leu Gln Gln
405 410 415Val Leu Gly Val Lys
Pro Asp Ala Ala Gly Gly Ser Ser Met Pro Leu 420
425 430Arg Pro Asp Trp Gly Val
43565501DNAUnknownStreptomyces regensis 65gtgcgcgtgg caccgccccg catcggtgcc
acacccggcg cggtgggcgc accggactac 60gcctccgcct tccgcgtgcc gacggcggcg
gcccgcaggc gttcgccgcg ggaatggacg 120cgtgcggtgt tcgagggcgc gcccgcgccg
ttggcgctgt tcgtgcgttg gggatggctg 180gccgtgctcc ggttgcgcct cagtgaggac
cccgaggcgg tggcgggctg gagacccacg 240acgctcgacc ccggcacctc cgacgccccc
gacacctctg agacagccgg aaactccgac 300gctgccgcac tggaggccga atcgccgctg
ctggaggcgt gcaacgtggc gttcgtcgac 360gacgacggtg tcacgtgggc gacctacgtc
cggttccgtg gtggcctcgg ccgcgcggtg 420tgggcggtgg cggcgcggat ccaccacgtc
gtcatcccct acctgctgcg gcgggcggtg 480cggcgcacgg aacgggagtg a
50166166PRTUnknownStreptomyces regensis
66Val Arg Val Ala Pro Pro Arg Ile Gly Ala Thr Pro Gly Ala Val Gly1
5 10 15Ala Pro Asp Tyr Ala Ser
Ala Phe Arg Val Pro Thr Ala Ala Ala Arg 20 25
30Arg Arg Ser Pro Arg Glu Trp Thr Arg Ala Val Phe Glu
Gly Ala Pro 35 40 45Ala Pro Leu
Ala Leu Phe Val Arg Trp Gly Trp Leu Ala Val Leu Arg 50
55 60Leu Arg Leu Ser Glu Asp Pro Glu Ala Val Ala Gly
Trp Arg Pro Thr65 70 75
80Thr Leu Asp Pro Gly Thr Ser Asp Ala Pro Asp Thr Ser Glu Thr Ala
85 90 95Gly Asn Ser Asp Ala Ala
Ala Leu Glu Ala Glu Ser Pro Leu Leu Glu 100
105 110Ala Cys Asn Val Ala Phe Val Asp Asp Asp Gly Val
Thr Trp Ala Thr 115 120 125Tyr Val
Arg Phe Arg Gly Gly Leu Gly Arg Ala Val Trp Ala Val Ala 130
135 140Ala Arg Ile His His Val Val Ile Pro Tyr Leu
Leu Arg Arg Ala Val145 150 155
160Arg Arg Thr Glu Arg Glu
165671413DNAUnknownThermobifida fusca 67gtgaactgtc agtcttccgc gtccaacctc
gccaaccaca tcaacgcggt gtacgagctg 60cgccgcgcct atgcgcggct gtccgccgac
aagccggtgc gcctggcgaa gaccacctcc 120aacctcttcc gcttccgcag ccgggacgat
gccgcgcgtc tcgacgtcag cgctttcacc 180tcggtgatca gcatcgacac ggaggcgcgg
gtcgcggagg tgggcggcat gaccacctac 240gaggacctgg tcgccgccac cctgcggcat
ggcctgatgc cgccggtggt tccgcaactg 300cgcacgatca ccctgggcgg tgcggtcacc
gggctgggga tcgaatcctc gtccttccgc 360aacgggctcc cgcacgagtc agtggaagag
atggagatcc tcaccggcag cggccaggtg 420gtggtggccc ggcgcgacaa cgagcaccgc
gacctgttct acggtttccc caactcgtac 480ggcaccctcg gttacgcgct gcggctccgc
atccagctcg aaccggtccg cccctacgtc 540cacctgcggc acctgcggtt caccgatgcc
gcagcggcca tggccgcgct ggagcagatc 600tgcgcggacc gcacccacga cggggagacc
gtcgacttcg tcgacggcgt cgtgttcgcc 660cgcaacgagc tgtacctgac cttggggacg
ttcaccgacc gggctccgtg gaccagcgac 720tacaccggaa ccgacatcta ctaccggtcg
atcccccgct acgcgggccc cggccccggc 780gactacctca ccacgcacga ctacctgtgg
cggtgggaca ccgactggtt ctggtgctcc 840cgcgccttcg gactgcagca tcccgtggtg
cgccgcctgt ggccgcgttc cttgaaacgc 900tccgacgtct accgcaagct cgtcgcctgg
gaccggcgca ctgacgcgag ccgcctgctc 960gactactacc gcgggcgccc gcccaaggaa
ccggtgatcc aggacatcga ggttgaggtg 1020gggcgggctg ccgagttcct cgacttcttc
cacaccgaga tcggcatgtc cccggtgtgg 1080ctgtgcccgc tgcggctgcg agaagacaca
gccgacgata cggaaccggt ctggccgctc 1140taccccctca aaccccgccg cctctacgtc
aacttcgggt tttggggcct cgttccgatc 1200cgtcccggtg gaggcaggac ataccacaac
cggctgatcg aaaaagaagt gacccggttg 1260ggcgggcaca agtcgctcta ctcggacgcc
ttctacgacg aggacgagtt ctgggagctc 1320tacaacgggg agatctaccg caagctcaaa
gctgcctacg accccgacgg tcgactgctc 1380gacctgtaca ccaagtgcgt cggcggcggg
tga 141368470PRTUnknownThermobifida fusca
68Val Asn Cys Gln Ser Ser Ala Ser Asn Leu Ala Asn His Ile Asn Ala1
5 10 15Val Tyr Glu Leu Arg Arg
Ala Tyr Ala Arg Leu Ser Ala Asp Lys Pro 20 25
30Val Arg Leu Ala Lys Thr Thr Ser Asn Leu Phe Arg Phe
Arg Ser Arg 35 40 45Asp Asp Ala
Ala Arg Leu Asp Val Ser Ala Phe Thr Ser Val Ile Ser 50
55 60Ile Asp Thr Glu Ala Arg Val Ala Glu Val Gly Gly
Met Thr Thr Tyr65 70 75
80Glu Asp Leu Val Ala Ala Thr Leu Arg His Gly Leu Met Pro Pro Val
85 90 95Val Pro Gln Leu Arg Thr
Ile Thr Leu Gly Gly Ala Val Thr Gly Leu 100
105 110Gly Ile Glu Ser Ser Ser Phe Arg Asn Gly Leu Pro
His Glu Ser Val 115 120 125Glu Glu
Met Glu Ile Leu Thr Gly Ser Gly Gln Val Val Val Ala Arg 130
135 140Arg Asp Asn Glu His Arg Asp Leu Phe Tyr Gly
Phe Pro Asn Ser Tyr145 150 155
160Gly Thr Leu Gly Tyr Ala Leu Arg Leu Arg Ile Gln Leu Glu Pro Val
165 170 175Arg Pro Tyr Val
His Leu Arg His Leu Arg Phe Thr Asp Ala Ala Ala 180
185 190Ala Met Ala Ala Leu Glu Gln Ile Cys Ala Asp
Arg Thr His Asp Gly 195 200 205Glu
Thr Val Asp Phe Val Asp Gly Val Val Phe Ala Arg Asn Glu Leu 210
215 220Tyr Leu Thr Leu Gly Thr Phe Thr Asp Arg
Ala Pro Trp Thr Ser Asp225 230 235
240Tyr Thr Gly Thr Asp Ile Tyr Tyr Arg Ser Ile Pro Arg Tyr Ala
Gly 245 250 255Pro Gly Pro
Gly Asp Tyr Leu Thr Thr His Asp Tyr Leu Trp Arg Trp 260
265 270Asp Thr Asp Trp Phe Trp Cys Ser Arg Ala
Phe Gly Leu Gln His Pro 275 280
285Val Val Arg Arg Leu Trp Pro Arg Ser Leu Lys Arg Ser Asp Val Tyr 290
295 300Arg Lys Leu Val Ala Trp Asp Arg
Arg Thr Asp Ala Ser Arg Leu Leu305 310
315 320Asp Tyr Tyr Arg Gly Arg Pro Pro Lys Glu Pro Val
Ile Gln Asp Ile 325 330
335Glu Val Glu Val Gly Arg Ala Ala Glu Phe Leu Asp Phe Phe His Thr
340 345 350Glu Ile Gly Met Ser Pro
Val Trp Leu Cys Pro Leu Arg Leu Arg Glu 355 360
365Asp Thr Ala Asp Asp Thr Glu Pro Val Trp Pro Leu Tyr Pro
Leu Lys 370 375 380Pro Arg Arg Leu Tyr
Val Asn Phe Gly Phe Trp Gly Leu Val Pro Ile385 390
395 400Arg Pro Gly Gly Gly Arg Thr Tyr His Asn
Arg Leu Ile Glu Lys Glu 405 410
415Val Thr Arg Leu Gly Gly His Lys Ser Leu Tyr Ser Asp Ala Phe Tyr
420 425 430Asp Glu Asp Glu Phe
Trp Glu Leu Tyr Asn Gly Glu Ile Tyr Arg Lys 435
440 445Leu Lys Ala Ala Tyr Asp Pro Asp Gly Arg Leu Leu
Asp Leu Tyr Thr 450 455 460Lys Cys Val
Gly Gly Gly465 470691272DNAUnknownThermobifida fusca
69atgcgactgg cggaggtatt cgaacgtgtc gtcggacccg atgcgcccgt ccacttccgg
60gcctacgacg gcagcactgc gggagatcca cgcagtgaag tcgctatcgt ggttcgccac
120ccggcagccg tcaactacat cgtccaagcg ccgggagcac tcggtttgac ccgcgcctac
180gtggcgggat acctcgacgt cgaaggggac atgtacaccg cgctgcgggc aatggccgac
240gtggtgttcc aggaccggcc gcggctgtcc cccggggaac tgctgcggat catccgcggg
300atcgggtggg tgaagttcgt caaccggctt ccaccgccgc cgcaggaggt gcgccagtcc
360cgcctcgccg ccctgggctg gcgccactcc aagcagcgcg acgccgaagc catccagcac
420cactacgacg tctccaacgc cttctacgcc ctggtcttgg gcgagtcgat gacctacacc
480tgcgcggtct acccgaccga gcaggccacg ctggagcagg cacagttctt caagcacgag
540ctgatcgccc gcaagctcgg tcttgcccct gggatacgac tgctggatgt ggggtgcggc
600tggggcggca tggtcatcca cgcggcccgg gagcacgggg tcaaagccct gggggtgacc
660ctgtccaaag agcaggctga gtgggcgcag aagcggatcg cccacgaggg cctgggcgac
720ctggcagaag tccggcacat ggactaccgg gacctgcccg acggcgagta cgacgcgatc
780agctcgatcg ggttgaccga gcacgtcggc aaaaagaacg tgcccgccta cttcgcgtcg
840ctgtaccgca agctcgtccc gggaggccgc ctgctcaacc actgcatcac ccggccccgc
900aacgacctgc cgcccttcaa acgcggcggg gtgatcaacc gctacgtctt ccccgatggg
960gagctggaag ggcccggctg gctgcaggcg gcgatgaacg acgccgggtt cgaaatccgc
1020caccaggaga acctgcggga gcactacgca cggaccctgc gggactggct ggccaacctg
1080gaccgcaact gggatgccgc ggtgcgggaa gtgggggagg gcacggcccg agtgtggcgg
1140ctctacatgg ccgggtgcgt gctcggcttc gaacgcaacg tggtgcaact gcaccagatc
1200ctcggggtga agctcgacgg gaccgaggcg cggatgccgc tgcgccccga cttcgaaccg
1260ccgctgcctt aa
127270423PRTUnknownThermobifida fusca 70Met Arg Leu Ala Glu Val Phe Glu
Arg Val Val Gly Pro Asp Ala Pro1 5 10
15Val His Phe Arg Ala Tyr Asp Gly Ser Thr Ala Gly Asp Pro
Arg Ser 20 25 30Glu Val Ala
Ile Val Val Arg His Pro Ala Ala Val Asn Tyr Ile Val 35
40 45Gln Ala Pro Gly Ala Leu Gly Leu Thr Arg Ala
Tyr Val Ala Gly Tyr 50 55 60Leu Asp
Val Glu Gly Asp Met Tyr Thr Ala Leu Arg Ala Met Ala Asp65
70 75 80Val Val Phe Gln Asp Arg Pro
Arg Leu Ser Pro Gly Glu Leu Leu Arg 85 90
95Ile Ile Arg Gly Ile Gly Trp Val Lys Phe Val Asn Arg
Leu Pro Pro 100 105 110Pro Pro
Gln Glu Val Arg Gln Ser Arg Leu Ala Ala Leu Gly Trp Arg 115
120 125His Ser Lys Gln Arg Asp Ala Glu Ala Ile
Gln His His Tyr Asp Val 130 135 140Ser
Asn Ala Phe Tyr Ala Leu Val Leu Gly Glu Ser Met Thr Tyr Thr145
150 155 160Cys Ala Val Tyr Pro Thr
Glu Gln Ala Thr Leu Glu Gln Ala Gln Phe 165
170 175Phe Lys His Glu Leu Ile Ala Arg Lys Leu Gly Leu
Ala Pro Gly Ile 180 185 190Arg
Leu Leu Asp Val Gly Cys Gly Trp Gly Gly Met Val Ile His Ala 195
200 205Ala Arg Glu His Gly Val Lys Ala Leu
Gly Val Thr Leu Ser Lys Glu 210 215
220Gln Ala Glu Trp Ala Gln Lys Arg Ile Ala His Glu Gly Leu Gly Asp225
230 235 240Leu Ala Glu Val
Arg His Met Asp Tyr Arg Asp Leu Pro Asp Gly Glu 245
250 255Tyr Asp Ala Ile Ser Ser Ile Gly Leu Thr
Glu His Val Gly Lys Lys 260 265
270Asn Val Pro Ala Tyr Phe Ala Ser Leu Tyr Arg Lys Leu Val Pro Gly
275 280 285Gly Arg Leu Leu Asn His Cys
Ile Thr Arg Pro Arg Asn Asp Leu Pro 290 295
300Pro Phe Lys Arg Gly Gly Val Ile Asn Arg Tyr Val Phe Pro Asp
Gly305 310 315 320Glu Leu
Glu Gly Pro Gly Trp Leu Gln Ala Ala Met Asn Asp Ala Gly
325 330 335Phe Glu Ile Arg His Gln Glu
Asn Leu Arg Glu His Tyr Ala Arg Thr 340 345
350Leu Arg Asp Trp Leu Ala Asn Leu Asp Arg Asn Trp Asp Ala
Ala Val 355 360 365Arg Glu Val Gly
Glu Gly Thr Ala Arg Val Trp Arg Leu Tyr Met Ala 370
375 380Gly Cys Val Leu Gly Phe Glu Arg Asn Val Val Gln
Leu His Gln Ile385 390 395
400Leu Gly Val Lys Leu Asp Gly Thr Glu Ala Arg Met Pro Leu Arg Pro
405 410 415Asp Phe Glu Pro Pro
Leu Pro 42071447DNAUnknownThermobifida fusca 71atggctgcga
ccgatgacga ccggcaccac accaccgtcg ccctcgacct catcgacgcg 60tatgtgcgcg
ccgaccgcag aatgatcggt gaacgttccg cggggatcag cgcggaggcg 120ggggagcgga
tcgtctccac cctgaaagtg tgcgcggcct tccttgcccg ccgggtccag 180gagaccgggg
tgccgtggcg cgccgcggac tcccgggaag cggtcgcccg caccgtcgcc 240gacctgctgg
aacccgaggt ggaattcgcg gtcgtctccg cctgggaggc gtacgcgatc 300ggggagcacg
aggccgcctg ggtccgggcg cacggcgatc cgctggtctt cgtccacatg 360ctggccgcgt
tctccgctgc tatcggcaca gcggtctacg gccgtgagga gctgctgccc 420acgctgcgca
gggtgacagc acgataa
44772148PRTUnknownThermobifida fusca 72Met Ala Ala Thr Asp Asp Asp Arg
His His Thr Thr Val Ala Leu Asp1 5 10
15Leu Ile Asp Ala Tyr Val Arg Ala Asp Arg Arg Met Ile Gly
Glu Arg 20 25 30Ser Ala Gly
Ile Ser Ala Glu Ala Gly Glu Arg Ile Val Ser Thr Leu 35
40 45Lys Val Cys Ala Ala Phe Leu Ala Arg Arg Val
Gln Glu Thr Gly Val 50 55 60Pro Trp
Arg Ala Ala Asp Ser Arg Glu Ala Val Ala Arg Thr Val Ala65
70 75 80Asp Leu Leu Glu Pro Glu Val
Glu Phe Ala Val Val Ser Ala Trp Glu 85 90
95Ala Tyr Ala Ile Gly Glu His Glu Ala Ala Trp Val Arg
Ala His Gly 100 105 110Asp Pro
Leu Val Phe Val His Met Leu Ala Ala Phe Ser Ala Ala Ile 115
120 125Gly Thr Ala Val Tyr Gly Arg Glu Glu Leu
Leu Pro Thr Leu Arg Arg 130 135 140Val
Thr Ala Arg145731368DNAThermomonospora curvata 73atgtcacagc tggcggtcac
agaccaccac gagcgagcgg tcgaggcgct gcgcaggtcg 60tatgcggcga tcccgccggg
cacaccggtc cgcttggcca agcagacctc caacctgttc 120cgcttccgcg agccgacggc
cgcgcccggc ctggacgtgt ccggcttcaa ccgggtgctg 180gcggtggacc cggatgcgcg
caccgccgac gtgcagggca tgaccaccta cgaggacctg 240gtcgacgcca ccctgccgca
cgggctgatg ccgctggtgg tgccccagct caagacgatc 300acgctgggcg gggcggtgac
cggcctgggc atcgagtcca cctccttccg caacggcctg 360ccgcacgagt cggtgctgga
gatgcagatc atcaccggcg ccggcgaagt ggtcaccgcc 420accccggacg gggagcactc
cgacctgttc tggggcttcc ccaactccta cgggacgctg 480gggtacgccc tgaagctgaa
gatcgaactg gagccggtca agccgtacgt ccggctgcgg 540cacctgcgct tcgacgacgc
cggcgagtgc gccgccaagc tcgccgagct gagcgaaagc 600cgcgagcacg agggcgatga
ggtgcacttt ttggacggca ccttcttcgg gccgcgcgag 660atgtacctga cgctcggcac
gttcaccgac accgccccct atgtgtcgga ctacaccggg 720cagcacatct actaccggtc
gatccagcag cggtcgatcg actttttgac catccgcgac 780tacctgtggc gctgggacac
cgactggttc tggtgctcgc gcgccctggg cgtgcagaac 840ccgctgatcc ggcgggtgtg
gccgaagagc gccaagcggt cggatgtgta ccgcaagctg 900gtggcctacg aaaagcgcta
ccagttcaag gcgcgcatcg accggtggac gggcaagccg 960ccgcgcgagg acgtcatcca
ggacatcgag gtgccggcag aacgcctgcc ggagttcctg 1020gagttcttcc acgacaagat
cgggatgagc ccggtgtggc tgtgcccgct gcgggcgcgc 1080caccgctggc cgctgtaccc
gctcaagccc ggcgtcacct acgtcaacgc cggcttctgg 1140gggacggtgc cgctgcagcc
ggggcagatg cccgagtacc acaaccggct gatcgaacgg 1200aaggtcgccc aactggacgg
ccacaagtct ctgtactcga cggcgttcta ctcgcgtgag 1260gagttctggc ggcactacga
cggggaaacc taccggcgtc tgaaggacac ctacgacccc 1320gacgcgcgcc tgctcgacct
ctacgacaag tgcgtgcggg gacgctga 136874455PRTThermomonospora
curvata 74Met Ser Gln Leu Ala Val Thr Asp His His Glu Arg Ala Val Glu
Ala1 5 10 15Leu Arg Arg
Ser Tyr Ala Ala Ile Pro Pro Gly Thr Pro Val Arg Leu 20
25 30Ala Lys Gln Thr Ser Asn Leu Phe Arg Phe
Arg Glu Pro Thr Ala Ala 35 40
45Pro Gly Leu Asp Val Ser Gly Phe Asn Arg Val Leu Ala Val Asp Pro 50
55 60Asp Ala Arg Thr Ala Asp Val Gln Gly
Met Thr Thr Tyr Glu Asp Leu65 70 75
80Val Asp Ala Thr Leu Pro His Gly Leu Met Pro Leu Val Val
Pro Gln 85 90 95Leu Lys
Thr Ile Thr Leu Gly Gly Ala Val Thr Gly Leu Gly Ile Glu 100
105 110Ser Thr Ser Phe Arg Asn Gly Leu Pro
His Glu Ser Val Leu Glu Met 115 120
125Gln Ile Ile Thr Gly Ala Gly Glu Val Val Thr Ala Thr Pro Asp Gly
130 135 140Glu His Ser Asp Leu Phe Trp
Gly Phe Pro Asn Ser Tyr Gly Thr Leu145 150
155 160Gly Tyr Ala Leu Lys Leu Lys Ile Glu Leu Glu Pro
Val Lys Pro Tyr 165 170
175Val Arg Leu Arg His Leu Arg Phe Asp Asp Ala Gly Glu Cys Ala Ala
180 185 190Lys Leu Ala Glu Leu Ser
Glu Ser Arg Glu His Glu Gly Asp Glu Val 195 200
205His Phe Leu Asp Gly Thr Phe Phe Gly Pro Arg Glu Met Tyr
Leu Thr 210 215 220Leu Gly Thr Phe Thr
Asp Thr Ala Pro Tyr Val Ser Asp Tyr Thr Gly225 230
235 240Gln His Ile Tyr Tyr Arg Ser Ile Gln Gln
Arg Ser Ile Asp Phe Leu 245 250
255Thr Ile Arg Asp Tyr Leu Trp Arg Trp Asp Thr Asp Trp Phe Trp Cys
260 265 270Ser Arg Ala Leu Gly
Val Gln Asn Pro Leu Ile Arg Arg Val Trp Pro 275
280 285Lys Ser Ala Lys Arg Ser Asp Val Tyr Arg Lys Leu
Val Ala Tyr Glu 290 295 300Lys Arg Tyr
Gln Phe Lys Ala Arg Ile Asp Arg Trp Thr Gly Lys Pro305
310 315 320Pro Arg Glu Asp Val Ile Gln
Asp Ile Glu Val Pro Ala Glu Arg Leu 325
330 335Pro Glu Phe Leu Glu Phe Phe His Asp Lys Ile Gly
Met Ser Pro Val 340 345 350Trp
Leu Cys Pro Leu Arg Ala Arg His Arg Trp Pro Leu Tyr Pro Leu 355
360 365Lys Pro Gly Val Thr Tyr Val Asn Ala
Gly Phe Trp Gly Thr Val Pro 370 375
380Leu Gln Pro Gly Gln Met Pro Glu Tyr His Asn Arg Leu Ile Glu Arg385
390 395 400Lys Val Ala Gln
Leu Asp Gly His Lys Ser Leu Tyr Ser Thr Ala Phe 405
410 415Tyr Ser Arg Glu Glu Phe Trp Arg His Tyr
Asp Gly Glu Thr Tyr Arg 420 425
430Arg Leu Lys Asp Thr Tyr Asp Pro Asp Ala Arg Leu Leu Asp Leu Tyr
435 440 445Asp Lys Cys Val Arg Gly Arg
450 455751263DNAThermomonospora curvata 75atgacgctgg
ccaaggtctt cgaggagctg gtcggggcgg acgcccctgt ggagctcacc 60gcctacgacg
gatcgagagc cggacgcctg ggcagtgatc tgcgggtcca cgtgaagtcg 120ccgtacgcgg
tgtcctacct ggtgcactcg ccgagcgcgc tcgggctggc ccgcgcgtac 180gtggccgggc
acctggacgc ctacggcgac atgtacacgc tgctgcggga gatgacgcag 240ctgaccgagg
cgctgacgcc caaggcccgg ctgcggctgc tggccggtgt cctgcaggat 300ccgctgctgc
gcgcggcggc cagccgccgt ctgccgcccc cgccgcagga ggtgcggacc 360ggccgcacct
cctggttccg gcacaccaag cggcgggacg ccaaggccat ctcccaccac 420tacgacgtgt
ccaacacctt ctatgagtgg gtgctgggcc cgtcgatgac ctacacctgc 480gcctgtttcc
ccaccgagga cgccaccttg gaggaggcgc agttccacaa gcacgacctg 540gtcgccaaga
agctcgggct gcggccgggc atgcggctgc tggacgtggg ctgcggctgg 600ggcggcatgg
tgatgcacgc cgccaagcac tacggggtgc gggcgctggg cgtcacgctg 660tccaagcagc
aggccgagtg ggcgcagaag gccatcgccg aggcgggcct gagcgacctg 720gccgaggtcc
gccaccagga ctaccgggac gtcaccgagg gcgacttcga cgccatcagc 780tcgatcggcc
tcaccgagca catcggcaag gccaacctgc cgtcctactt cggcttcctg 840tacggcaagc
tcaagccggg cgggcggctg ctcaaccact gcatcacccg gcccgacaac 900acccagccgg
ccatgaagaa ggacgggttc atcaaccggt acgtcttccc cgacggggag 960ctggaggggc
ccggctacct gcagacccag atgaacgacg ccggttttga gatccgccac 1020caggagaacc
tgcgcgagca ctacgcccgc accctggccg gatggtgccg caacctcgat 1080gagcactggg
acgaggcggt ggccgaggtc ggcgagggca ccgcgcgggt gtggcggctg 1140tacatggccg
gcagccggct cggtttcgag ctcaactgga tccagctgca ccagatcctg 1200ggcgtcaagc
tcggcgagcg cggcgagtcc cgcatgccgt tgcggcccga ctggggcgtg 1260tga
126376420PRTThermomonospora curvata 76Met Thr Leu Ala Lys Val Phe Glu Glu
Leu Val Gly Ala Asp Ala Pro1 5 10
15Val Glu Leu Thr Ala Tyr Asp Gly Ser Arg Ala Gly Arg Leu Gly
Ser 20 25 30Asp Leu Arg Val
His Val Lys Ser Pro Tyr Ala Val Ser Tyr Leu Val 35
40 45His Ser Pro Ser Ala Leu Gly Leu Ala Arg Ala Tyr
Val Ala Gly His 50 55 60Leu Asp Ala
Tyr Gly Asp Met Tyr Thr Leu Leu Arg Glu Met Thr Gln65 70
75 80Leu Thr Glu Ala Leu Thr Pro Lys
Ala Arg Leu Arg Leu Leu Ala Gly 85 90
95Val Leu Gln Asp Pro Leu Leu Arg Ala Ala Ala Ser Arg Arg
Leu Pro 100 105 110Pro Pro Pro
Gln Glu Val Arg Thr Gly Arg Thr Ser Trp Phe Arg His 115
120 125Thr Lys Arg Arg Asp Ala Lys Ala Ile Ser His
His Tyr Asp Val Ser 130 135 140Asn Thr
Phe Tyr Glu Trp Val Leu Gly Pro Ser Met Thr Tyr Thr Cys145
150 155 160Ala Cys Phe Pro Thr Glu Asp
Ala Thr Leu Glu Glu Ala Gln Phe His 165
170 175Lys His Asp Leu Val Ala Lys Lys Leu Gly Leu Arg
Pro Gly Met Arg 180 185 190Leu
Leu Asp Val Gly Cys Gly Trp Gly Gly Met Val Met His Ala Ala 195
200 205Lys His Tyr Gly Val Arg Ala Leu Gly
Val Thr Leu Ser Lys Gln Gln 210 215
220Ala Glu Trp Ala Gln Lys Ala Ile Ala Glu Ala Gly Leu Ser Asp Leu225
230 235 240Ala Glu Val Arg
His Gln Asp Tyr Arg Asp Val Thr Glu Gly Asp Phe 245
250 255Asp Ala Ile Ser Ser Ile Gly Leu Thr Glu
His Ile Gly Lys Ala Asn 260 265
270Leu Pro Ser Tyr Phe Gly Phe Leu Tyr Gly Lys Leu Lys Pro Gly Gly
275 280 285Arg Leu Leu Asn His Cys Ile
Thr Arg Pro Asp Asn Thr Gln Pro Ala 290 295
300Met Lys Lys Asp Gly Phe Ile Asn Arg Tyr Val Phe Pro Asp Gly
Glu305 310 315 320Leu Glu
Gly Pro Gly Tyr Leu Gln Thr Gln Met Asn Asp Ala Gly Phe
325 330 335Glu Ile Arg His Gln Glu Asn
Leu Arg Glu His Tyr Ala Arg Thr Leu 340 345
350Ala Gly Trp Cys Arg Asn Leu Asp Glu His Trp Asp Glu Ala
Val Ala 355 360 365Glu Val Gly Glu
Gly Thr Ala Arg Val Trp Arg Leu Tyr Met Ala Gly 370
375 380Ser Arg Leu Gly Phe Glu Leu Asn Trp Ile Gln Leu
His Gln Ile Leu385 390 395
400Gly Val Lys Leu Gly Glu Arg Gly Glu Ser Arg Met Pro Leu Arg Pro
405 410 415Asp Trp Gly Val
420777102DNAArtificial SequenceSynthetic Nucleic Acid 77gtttgtggaa
gcggtattcg caatttaatt aaagctggtg acaattaatc atcggctcgt 60ataatgtgtg
gaattgaatc gatataagga ggttaatcat gtgtctgtgg ttactactga 120cgcacaggct
gcccatgccg ccggcgtctc gcgtcttctg gccagctacc gggcgatccc 180gcccagcgcg
acagtgcgcc ttgcgaaacc gacgtccaac ctgttccgcg cccgcgcccg 240caccaatgtg
aagggtctcg acgtctcggg cctgaccggt gtgatcggtg tcgacccgga 300cgcgcgcacc
gccgatgtgg cgggcatgtg cacctacgag gacctggtgg cggccacgct 360tccgtacggc
cttgccccac tggtggtgcc gcagctcaag accatcacgc tcggtggcgc 420ggtcaccggt
ctgggcatcg agtccacgtc gttccgcaac ggtctgccgc acgaaagtgt 480cctggagatg
gacatcttga ccggttcggg cgagatcgtc acggcctcac cggatcagca 540ctcggatctg
ttccatgcgt tccccaattc atatggaacc cttggttatt ccacccggct 600gcgcatcgaa
ctggagcccg tgcacccgtt tgtggcgttg cgccacctgc gctttcactc 660gatcaccgat
ctggtcgcgg cgatggaccg gatcatcgag accggcgggc tggacggtga 720acccgtcgac
tacctcgacg gcgtggtgtt cagcgcgact gagagttacc tgtgtgttgg 780cttcaagacg
aaaacgccgg ggccggtcag cgattacaca ggtcagcaga tcttctaccg 840gtcgatccag
catgacggcg acaccggcgc cgagaaacac gaccggctga ccatccacga 900ctacctgtgg
cgctgggaca ccgactggtt ctggtgctca cgggcattcg gcgctcagca 960tccggtgatc
cgcaggttct ggccgcggcg gctgcgccgc agcagcttct actggaagct 1020ggtggcctac
gaccagcggt acgacatcgc cgaccgtatc gagaagcgca acgggcgccc 1080gccgcgcgag
cgggtggtcc aggacgtcga ggtgcccatc gagcggtgcg cggacttcgt 1140cgagtggttc
ctgcagaatg tgccgatcga gccgatctgg ctgtgccccc tacggttgcg 1200tgacagcgcc
gacggcggtg cctcgtggcc cctgtatccg ctgaaggcgc accacaccta 1260cgtcaacatc
ggtttctggt catcagtgcc ggtgggcccc gaggagggcc acaccaaccg 1320cctcatcgag
aaaaaagtcg cggagctgga cgggcacaaa tctttgtact cggacgctta 1380ttacacacgt
gacgaattcg acgagctgta cggcggtgag gtctacaaca ccgtcaagaa 1440gacgtacgac
ccggattcac gtctgctaga cctgtattcg aaggcggtgc aaagacaatg 1500accacattca
aagaacgcga gacgtccaca gcggaccgca agctcaccct ggccgagatc 1560ctcgagatct
tcgccgcggg taaggagccg ctgaagttca ctgcgtacga cggcagctcg 1620gccggtcccg
aggacgccac gatgggtctg gacctcaaga ccccgcgtgg gaccacctat 1680ctggccacgg
cacccggcga tctgggcctg gcccgtgcgt atgtctccgg tgacctggag 1740ccgcacggcg
tgcatcccgg cgatccctac ccgctgctgc gcgccctggc cgaacgcatg 1800gagttcaagc
gcccgcctgc gcgtgtgctg gcgaacatcg tgcgctccat cggcatcgag 1860cacctcaagc
cgatcgcacc gccgccgcag gaggcgctgc cccggtggcg ccgcatcatg 1920gagggcctgc
ggcacagcaa gacccgcgac gccgaggcca tccaccacca ctacgacgtg 1980tcgaacacgt
tctacgagtg ggtgctgggc ccgtcgatga cctacacgtg cgcgtgctac 2040cccaccgagg
acgcgaccct cgaagaggcc caggacaaca agtaccgcct ggtgttcgag 2100aagctgcgcc
tgaagcccgg tgaccggttg ctcgacgtgg gctgcggctg gggcggcatg 2160gtccgctacg
cggcccgcca cggcgtcaag gcgctcggtg tcacgctcag ccgcgaacag 2220gcgacgtggg
cgcagaaggc catcgcccag gaaggtctca ccgatctggc cgaggtgcgt 2280cacggtgatt
accgcgacgt catcgaatcc gggttcgacg cggtgtcctc gatcgggctg 2340accgagcaca
tcggcgtgca caactacccg gcgtacttca acttcctcaa gtcgaagctg 2400cgcaccggtg
gcctgctgct caaccactgc atcacccgcc cggacaaccg gtcggcgcca 2460tcggccggcg
ggttcatcga caggtacgtg ttccccgacg gggagctcac cggctcgggc 2520cgcatcatca
ccgaggccca ggacgtgggc cttgaggtga tccacgagga gaacctacgc 2580aatcactatg
cgatgacgct gcgcgactgg tgccgcaacc tggtcgagca ctgggacgag 2640gcggtcgaag
aggtcgggct gcccaccgcg aaggtgtggg gcctgtacat ggccggctca 2700cgtctgggct
tcgagaccaa tgtggttcag ctgcaccagg ttctggcggt caagcttgac 2760gatcagggca
aggacggcgg actgccgttg cggccctggt ggtccgccta gcctcaaaat 2820atattttccc
tctatcttct cgttgcgctt aatttgacta attctcatta gcgaggcgcg 2880cctttccata
ggctccgccc ccctgacgag catcacaaaa atcgacgctc aagtcagagg 2940tggcgaaacc
cgacaggact ataaagatac caggcgtttc cccctggaag ctccctcgtg 3000cgctctcctg
ttccgaccct gccgcttacc ggatacctgt ccgcctttct cccttcggga 3060agcgtggcgc
tttctcatag ctcacgctgt aggtatctca gttcggtgta ggtcgttcgc 3120tccaagctgg
gctgtgtgca cgaacccccc gttcagcccg accgctgcgc cttatccggt 3180aactatcgtc
ttgagtccaa cccggtaaga cacgacttat cgccactggc agcagccact 3240ggtaacagga
ttagcagagc gaggtatgta ggcggtgcta cagagttctt gaagtggtgg 3300cctaactacg
gctacactag aagaacagta tttggtatct gcgctctgct gaagccagtt 3360accttcggaa
aaagagttgg tagctcttga tccggcaaac aaaccaccgc tggtagcggt 3420ggtttttttg
tttgcaagca gcagattacg cgcagaaaaa aaggatctca agaagatcct 3480ttgatctttt
ctacggggtc tgacgctcag tggaacgaaa actcacgtta agggattttg 3540gtcatgagat
tatcaaaaag gatcttcacc tagatccttt taaattaaaa atgaagtttt 3600aaatcaatct
aaagtatata tgagtaaact tggtctgaca gttaccaatg cttaatcagt 3660gaggcaccta
tctcagcgat ctgtctattt cgttcatcca tagttgcctg actccccgtc 3720gtgtagataa
ctacgatacg ggagggctta ccatctggcc ccagtgctgc aatgataccg 3780cgagacccac
gctcaccggc tccagattta tcagcaataa accagccagc cggaagggcc 3840gagcgcagaa
gtggtcctgc aactttatcc gcctccatcc agtctattaa ttgttgccgg 3900gaagctagag
taagtagttc gccagttaat agtttgcgca acgttgttgc cattgctaca 3960ggcatcgtgg
tgtcacgctc gtcgtttggt atggcttcat tcagctccgg ttcccaacga 4020tcaaggcgag
ttacatgatc ccccatgttg tgcaaaaaag cggttagctc cttcggtcct 4080ccgatcgttg
tcagaagtaa gttggccgca gtgttatcac tcatggttat ggcagcactg 4140cataattctc
ttactgtcat gccatccgta agatgctttt ctgtgactgg tgagtactca 4200accaagtcat
tctgagaata gtgtatgcgg cgaccgagtt gctcttgccc ggcgtcaata 4260cgggataata
ccgcgccaca tagcagaact ttaaaagtgc tcatcattgg aaaacgttct 4320tcggggcgaa
aactctcaag gatcttaccg ctgttgagat ccagttcgat gtaacccact 4380cgtgcaccca
actgatcttc agcatctttt actttcacca gcgtttctgg gtgagcaaaa 4440acaggaaggc
aaaatgccgc aaaaaaggga ataagggcga cacggaaatg ttgaatactc 4500atactcttcc
tttttcaata ttattgaagc atttatcagg gttattgtct catgagcgga 4560tacatatttg
aatgtattta gaaaaataaa cagcgatcgc gcggccgcgg gtaataactg 4620atataattaa
attgaagctc taatttgtga gtttagtata catgcattta cttataatac 4680agttttttag
ttttgctggc cgcatcttct caaatatgct tcccagcctg cttttctgta 4740acgttcaccc
tctaccttag catcccttcc ctttgcaaat agtcctcttc caacaataat 4800aatgtcagat
cctgtagaga ccacatcatc cacggttcta tactgttgac ccaatgcgtc 4860tcccttgtca
tctaaaccca caccgggtgt cataatcaac caatcgtaac cttcatctct 4920tccacccatg
tctctttgag caataaagcc gataacaaaa tctttgtcgc tcttcgcaat 4980gtcaacagta
cccttagtat attctccagt agctagggag cccttgcatg acaattctgc 5040taacatcaaa
aggcctctag gttcctttgt tacttcttcc gccgcctgct tcaaaccgct 5100aacaatacct
gggcccacca caccgtgtgc attcgtaatg tctgcccatt ctgctattct 5160gtatacaccc
gcagagtact gcaatttgac tgtattacca atgtcagcaa attttctgtc 5220ttcgaagagt
aaaaaattgt acttggcgga taatgccttt agcggcttaa ctgtgccctc 5280catggaaaaa
tcagtcaaga tatccacatg tgtttttagt aaacaaattt tgggacctaa 5340tgcttcaact
aactccagta attccttggt ggtacgaaca tccaatgaag cacacaagtt 5400tgtttgcttt
tcgtgcatga tattaaatag cttggcagca acaggactag gatgagtagc 5460agcacgttcc
ttatatgtag ctttcgacat gatttatctt cgtttcctgc aggtttttgt 5520tctgtgcagt
tgggttaaga atactgggca atttcatgtt tcttcaacac cacatatgcg 5580tatatatacc
aatctaagtc tgtgctcctt ccttcgttct tccttctgct cggagattac 5640cgaatcaaag
ctagcttatc gatgataagc tgtcaaagat gagaattaat tccacggact 5700atagactata
ctagatactc cgtctactgt acgatacact tccgctcagg tccttgtcct 5760ttaacgaggc
cttaccactc ttttgttact ctattgatcc agctcagcaa aggcagtgtg 5820atctaagatt
ctatcttcgc gatgtagtaa aactagctag accgagaaag agactagaaa 5880tgcaaaaggc
acttctacaa tggctgccat cattattatc cgatgtgacg ctgcagcttc 5940tcaatgatat
tcgaatacgc tttgaggaga tacagcctaa tatccgacaa actgttttac 6000agatttacga
tcgtacttgt tacccatcat tgaattttga acatccgaac ctgggagttt 6060tccctgaaac
agatagtata tttgaacctg tataataata tatagtctag cgctttacgg 6120aagacaatgt
atgtatttcg gttcctggag aaactattgc atctattgca taggtaatct 6180tgcacgtcgc
atccccggtt cattttctgc gtttccatct tgcacttcaa tagcatatct 6240ttgttaacga
agcatctgtg cttcattttg tagaacaaaa atgcaacgcg agagcgctaa 6300tttttcaaac
aaagaatctg agctgcattt ttacagaaca gaaatgcaac gcgaaagcgc 6360tattttacca
acgaagaatc tgtgcttcat ttttgtaaaa caaaaatgca acgcgacgag 6420agcgctaatt
tttcaaacaa agaatctgag ctgcattttt acagaacaga aatgcaacgc 6480gagagcgcta
ttttaccaac aaagaatcta tacttctttt ttgttctaca aaaatgcatc 6540ccgagagcgc
tatttttcta acaaagcatc ttagattact ttttttctcc tttgtgcgct 6600ctataatgca
gtctcttgat aactttttgc actgtaggtc cgttaaggtt agaagaaggc 6660tactttggtg
tctattttct cttccataaa aaaagcctga ctccacttcc cgcgtttact 6720gattactagc
gaagctgcgg gtgcattttt tcaagataaa ggcatccccg attatattct 6780ataccgatgt
ggattgcgca tactttgtga acagaaagtg atagcgttga tgattcttca 6840ttggtcagaa
aattatgaac ggtttcttct attttgtctc tatatactac gtataggaaa 6900tgtttacatt
ttcgtattgt tttcgattca ctctatgaat agttcttact acaatttttt 6960tgtctaaaga
gtaatactag agataaacat aaaaaatgta gaggtcgagt ttagatgcaa 7020gttcaaggag
cgaaaggtgg atgggtaggt tatataggga tatagcacag agatatatag 7080caaagagata
cttttgagca at
7102
SEQ ID NO: 78, length 10766, DNA, Artificial Sequence, Synthetic Nucleic Acid
ttatcgatga
taagctgtca aagatgagaa ttaattccac ggactataga ctatactaga 60tactccgtct
actgtacgat acacttccgc tcaggtcctt gtcctttaac gaggccttac 120cactcttttg
ttactctatt gatccagctc agcaaaggca gtgtgatcta agattctatc 180ttcgcgatgt
agtaaaacta gctagaccga gaaagagact agaaatgcaa aaggcacttc 240tacaatggct
gccatcatta ttatccgatg tgacgctgca gcttctcaat gatattcgaa 300tacgctttga
ggagatacag cctaatatcc gacaaactgt tttacagatt tacgatcgta 360cttgttaccc
atcattgaat tttgaacatc cgaacctggg agttttccct gaaacagata 420gtatatttga
acctgtataa taatatatag tctagcgctt tacggaagac aatgtatgta 480tttcggttcc
tggagaaact attgcatcta ttgcataggt aatcttgcac gtcgcatccc 540cggttcattt
tctgcgtttc catcttgcac ttcaatagca tatctttgtt aacgaagcat 600ctgtgcttca
ttttgtagaa caaaaatgca acgcgagagc gctaattttt caaacaaaga 660atctgagctg
catttttaca gaacagaaat gcaacgcgaa agcgctattt taccaacgaa 720gaatctgtgc
ttcatttttg taaaacaaaa atgcaacgcg acgagagcgc taatttttca 780aacaaagaat
ctgagctgca tttttacaga acagaaatgc aacgcgagag cgctatttta 840ccaacaaaga
atctatactt cttttttgtt ctacaaaaat gcatcccgag agcgctattt 900ttctaacaaa
gcatcttaga ttactttttt tctcctttgt gcgctctata atgcagtctc 960ttgataactt
tttgcactgt aggtccgtta aggttagaag aaggctactt tggtgtctat 1020tttctcttcc
ataaaaaaag cctgactcca cttcccgcgt ttactgatta ctagcgaagc 1080tgcgggtgca
ttttttcaag ataaaggcat ccccgattat attctatacc gatgtggatt 1140gcgcatactt
tgtgaacaga aagtgatagc gttgatgatt cttcattggt cagaaaatta 1200tgaacggttt
cttctatttt gtctctatat actacgtata ggaaatgttt acattttcgt 1260attgttttcg
attcactcta tgaatagttc ttactacaat ttttttgtct aaagagtaat 1320actagagata
aacataaaaa atgtagaggt cgagtttaga tgcaagttca aggagcgaaa 1380ggtggatggg
taggttatat agggatatag cacagagata tatagcaaag agatactttt 1440gagcaatgtt
tgtggaagcg gtattcgcaa tgtttaaact gcgtcggaac gggatatgca 1500ttcccctagt
ttcgccgcag tgcagaatca ggcggtttct ttgcaccaca ccacatacgg 1560aggatgacgg
gcattattga tgttgaatag taacctgatc gtgactagta tgacggaacc 1620caacagcaac
agccgaccgt ttgtgagcgt ttttgcggcc ggtcaggcga gtttttccgg 1680cctgccaatg
gtccttccgt accctttacc ctgtacgctg tacctgccac ggataggccg 1740tgctccacct
gctcactatg gtgggtgcgg ggaaaacaac aggcaggctc aattgctctg 1800caaatgggtt
gagggggtga ttgatgtcac tggtacacca acaggggaat gctcggcgtt 1860gattttgggc
cacctctttt gtttgccaga gcttgtctct attgtcaaat ttaacggtct 1920gcaactgttg
cccaaaatgg gacaatgatc cgatgcctgc atagacaccc tgcttgaggg 1980tgcgatcgcc
ctaatacgag gcaaaccaag ttttccaatt gaccttcaat tgacgagcgg 2040ttgttgcgac
aggggactgg agtgctacct gtttagagtt caaatccgtc acccagcatt 2100gaaagttttt
ccccgcattg gatgattgca atgccgctaa cccgctcatc cgccaaagtt 2160catagtccca
ccctgcctcg acttatcgga ccacatgggg ctcccttatg cgcgcgcata 2220tggcgcttga
ttgctttttg gtcaacgttt gggacaaatt tcctttgtta aggcggaccc 2280gccagcagat
acgaaggtat aaatagggct cactttcacc atcttgtcca ttcaattgca 2340agactcaaaa
gtaataatga ccactctgga tgacaccgct taccgatacc gaacttccgt 2400tcctggcgat
gccgaggcta ttgaggctct ggatggatct ttcaccactg acaccgtttt 2460ccgagtgacc
gctactggcg acggcttcac cctgcgagag gtgcctgtcg accctcctct 2520caccaaggtt
ttccctgacg atgagtcgga cgatgagtct gacgctggag aggacggcga 2580ccctgactct
cgaactttcg tggcttacgg cgacgatgga gacctggccg gctttgtggt 2640cgtttcttac
tccggatgga accgacgact gaccgtggag gacatcgagg tcgctcctga 2700gcaccgaggt
catggtgtcg gacgagctct gatgggtctc gctactgagt tcgctcgaga 2760gcgaggtgct
ggccacctgt ggctcgaggt caccaacgtt aacgcccctg ctattcatgc 2820ctaccgacga
atgggtttta ccctgtgtgg cctcgatact gccctgtacg acggaaccgc 2880ttccgatgga
gagcaggccc tctacatgtc gatgccctgc ccttaaacag gccccttttc 2940ctttgtcgat
atcatgtaat tagttatgtc acgcttacat tcacgccctc ctcccacatc 3000cgctctaacc
gaaaaggaag gagttagaca acctgaagtc taggtcccta tttatttttt 3060ttaatagtta
tgttagtatt aagaacgtta tttatatttc aaatttttct tttttttctg 3120tacaaacgcg
tgtacgcatg taacattata ctgaaaacct tgcttgagaa ggttttggga 3180cgctcgaagg
ctttaatttg cagagaccgg gttggcggcg catttgtgtc ccaaaaaaca 3240gccccaattg
ccccaattga ccccaaattg acccagtagc gggcccaacc ccggcgagag 3300cccccttctc
cccacatatc aaacctcccc cggttcccac acttgccgtt aagggcgtag 3360ggtactgcag
tctggaatct acgcttgttc agactttgta ctagtttctt tgtctggcca 3420tccgggtaac
ccatgccgga cgcaaaatag actactgaaa atttttttgc tttgtggttg 3480ggactttagc
caagggtata aaagaccacc gtccccgaat tacctttcct cttcttttct 3540ctctctcctt
gtcaactcac acccgaaatc gttaagcatt tccttctgag tataagaatc 3600attcaaaatg
tccgttgtta ccaccgatgc tcaagctgct catgctgctg gtgtttctag 3660attattggct
tcttatagag ccattccacc atctgctact gttagattgg ctaagccaac 3720ttctaatttg
ttcagagcta gagctagaac taacgttaag ggtttggatg tttctggttt 3780gactggtgtt
attggtgttg atccagatgc tagaactgct gatgttgctg gtatgtgtac 3840ttacgaagat
ttggttgctg ctactttgcc atatggtttg gctccattgg ttgttccaca 3900attgaaaact
attactttgg gtggtgctgt taccggtttg ggtattgaat ctacttcttt 3960cagaaacggt
ttgccacacg aatctgtttt ggaaatggat attttgaccg gttccggtga 4020aatagttact
gcttctccag atcaacactc cgatttgttt catgcttttc caaactctta 4080cggtacattg
ggttactcta ccagattgag aattgaattg gaaccagttc atccattcgt 4140tgccttgaga
catttgagat tccattccat tactgatttg gtcgcagcca tggatagaat 4200tattgaaact
ggtggtttag acggtgaacc agttgattat ttggatggtg ttgttttctc 4260tgccaccgaa
tcatatttgt gtgttggttt caaaactaag accccaggtc cagtttctga 4320ttatactggt
caacaaatct tctacagatc catccaacat gatggtgata ctggtgctga 4380aaaacatgat
agattgacca tccatgacta cttgtggaga tgggatactg attggttttg 4440gtgttctaga
gcttttggtg ctcaacatcc agttattaga agattctggc caagaagatt 4500aagaagatcc
tccttctact ggaaattggt tgcttacgat caaagatacg atatcgccga 4560tagaatcgaa
aagagaaatg gtagaccacc aagagaaaga gttgttcaag acgttgaagt 4620tccaattgaa
agatgcgctg atttcgttga atggttcttg caaaatgttc caatcgaacc 4680tatttggttg
tgcccattga gattgagaga ttctgctgat ggtggtgctt catggccatt 4740atatccattg
aaagctcatc acacctacgt caatattggt ttctggtcat ctgttccagt 4800tggtccagaa
gaaggtcata ccaatagatt gattgaaaaa aaggtcgccg aattggacgg 4860tcacaaatca
ttatattctg atgcctacta caccagagat gaattcgatg aattatacgg 4920tggtgaagtt
tacaacaccg tcaaaaaaac ttacgaccca gactcaagat tattagactt 4980gtactctaag
gccgtccaaa gacaatgagc tgcttgtacc tagtgcaacc ccagtttgtt 5040aaaaattagt
agtcaaaaac ttctgagtta gaaatttgtg agtgtagtga gattgtagag 5100tatcatgtgt
gtccgtaagt gaagtgttat tgactcttag ttagtttatc tagtactcgt 5160ttagttgaca
ctgatctagt attttacgag gcgtatgact ttagccaagt gttgtactta 5220gtcttctctc
caaacatgag agggctctgt cactcagtcg gcctatgggt gagatggctt 5280ggtgagatct
ttcgatagtc tcgtcaagat ggtaggatga tgggggaata cattactgct 5340ctcgtcaagg
aaaccacaat cagatcacac catcctccat ggtatccgat gactctcttc 5400tccacagtcg
cagtaggatg tcctgcacgg gtctttttgt ggggtgtgga gaaaggggtg 5460cttggagatg
gaagccggta gaaccgggct gcttgggggg atttggggcc gctgggctcc 5520aaagaggggt
aggcatttcg ttggggttac gtaattgcgg catttgggtc ctgcgcgcat 5580gtcccattgg
tcagaattag tccggatagg agacttatca gccaatcaca gcgccggatc 5640cacctgtagg
ttgggttggg tgggagcacc cctccacaga gtagagtcaa acagcagcag 5700caacgtgata
gttgggggtg tgcgtgttaa aggaaaaaaa aagaagcttg ggttatattc 5760ccgctctatt
tagaggttgc gggatagacg ccgacggagg gcaatggcgc catggaacct 5820tgcggatatc
gatacgccgc ggcggactgc gtccgaacca gctccagcag cgttttttcc 5880gggccattga
gccgactgcg accccgccaa cgtgtcttgg cccacgcact catgtcatgt 5940tggtgttggg
aggccacttt ttaagtagca caaggcacct agctcgcggc agggtgtccg 6000aaccaaagaa
gcggctgcag tggtgcaaac ggggcggaaa cggcgggaaa aagccacggg 6060ggcacgaatt
gaggcacgcc ctcgaatttg agacgagtca cggccccatt cgcccgcgca 6120atggctcgcc
aacgcccggt cttttgcacc acatcaggtt accccaagcc aaacctttgt 6180gttaaaaagc
ttaacatatt ataccgaacg taggtttggg cgggcttgct ccgtctgtcc 6240aaggcaacat
ttatataagg gtctgcatcg ccggctcaat tgaatctttt ttcttcttct 6300cttctctata
ttcattcttg aattaaacac acatcaacaa tgaccacctt caaagaaaga 6360gaaacttcta
ccgctgatag aaagttgacc ttggctgaaa ttttggaaat tttcgctgct 6420ggtaaagaac
cattgaagtt cactgcttat gatggttctt ctgctggtcc tgaagatgct 6480actatgggtt
tggatttgaa aactccaaga ggtactactt acttggctac tgctccaggt 6540gatttgggtt
tggctagagc ttatgtttct ggtgacttgg aaccacatgg tgttcatcct 6600ggtgatccat
atccattatt gagagcttta gccgaaagaa tggaattcaa aagaccacca 6660gctagagttt
tggctaacat cgttagatcc attggtatcg aacatttgaa gccaattgct 6720ccaccaccac
aagaagcttt gccaagatgg agaagaatta tggaaggttt gagacactct 6780aagaccagag
atgctgaagc tattcatcat cactacgatg tttctaacac cttctacgaa 6840tgggttttgg
gtccatctat gacttatact tgtgcttgtt acccaacaga agatgccact 6900ttggaagaag
ctcaagataa caagtacaga ttggtctttg aaaagttgag attgaagcca 6960ggtgacagat
tattggatgt tggttgtggt tggggtggta tggttagata tgctgctaga 7020catggtgtaa
aagctttggg tgttactttg tctagagaac aagctacttg ggctcaaaaa 7080gctattgctc
aagaaggttt aaccgatttg gctgaagtta gacacggtga ttacagagat 7140gttatcgaat
ctggtttcga tgccgtttct tctattggtt tgactgaaca tatcggtgtt 7200cataactatc
cagcctactt caacttcttg aagtctaagt tgagaaccgg tggtttgttg 7260ttgaaccatt
gcattactag accagataac agatctgctc catctgctgg tggttttatt 7320gatagatacg
ttttcccaga tggtgaattg actggttccg gtagaattat tactgaagca 7380caagatgtcg
gtttggaagt tatccatgaa gaaaacttga gaaaccatta cgccatgact 7440ttgagagatt
ggtgtagaaa cttggttgaa cattgggatg aagccgttga agaagttggt 7500ttgccaactg
ctaaagtttg gggtttgtat atggctggtt ctagattagg ttttgaaact 7560aacgttgtcc
aattgcacca agttttggca gttaagttgg atgatcaagg taaagatggt 7620ggtttgcctt
taagaccatg gtggtctgct tgagcattag cgactactaa tatatatttg 7680aatccatgga
attataacaa acaagcatca aaacaagaat tagcgacatt atacttgaaa 7740tcagcattag
cgatactact aatatagttt attctatgta atgatccatg gaagttcgat 7800tgatttgcca
agttaatttg atagattatg catgccattt agtcgacgca ggtacgatct 7860acagcgataa
agaagaggtt gtgggtcatt caattttgca ccaattttgc accatcatag 7920atcataatac
atttacaagg cctacaattc ttacagggtc ttctcgagag caattcctta 7980attaaggcgc
gcctttccat aggctccgcc cccctgacga gcatcacaaa aatcgacgct 8040caagtcagag
gtggcgaaac ccgacaggac tataaagata ccaggcgttt ccccctggaa 8100gctccctcgt
gcgctctcct gttccgaccc tgccgcttac cggatacctg tccgcctttc 8160tcccttcggg
aagcgtggcg ctttctcata gctcacgctg taggtatctc agttcggtgt 8220aggtcgttcg
ctccaagctg ggctgtgtgc acgaaccccc cgttcagccc gaccgctgcg 8280ccttatccgg
taactatcgt cttgagtcca acccggtaag acacgactta tcgccactgg 8340cagcagccac
tggtaacagg attagcagag cgaggtatgt aggcggtgct acagagttct 8400tgaagtggtg
gcctaactac ggctacacta gaagaacagt atttggtatc tgcgctctgc 8460tgaagccagt
taccttcgga aaaagagttg gtagctcttg atccggcaaa caaaccaccg 8520ctggtagcgg
tggttttttt gtttgcaagc agcagattac gcgcagaaaa aaaggatctc 8580aagaagatcc
tttgatcttt tctacggggt ctgacgctca gtggaacgaa aactcacgtt 8640aagggatttt
ggtcatgaga ttatcaaaaa ggatcttcac ctagatcctt ttaaattaaa 8700aatgaagttt
taaatcaatc taaagtatat atgagtaaac ttggtctgac agttaccaat 8760gcttaatcag
tgaggcacct atctcagcga tctgtctatt tcgttcatcc atagttgcct 8820gactccccgt
cgtgtagata actacgatac gggagggctt accatctggc cccagtgctg 8880caatgatacc
gcgagaccca cgctcaccgg ctccagattt atcagcaata aaccagccag 8940ccggaagggc
cgagcgcaga agtggtcctg caactttatc cgcctccatc cagtctatta 9000attgttgccg
ggaagctaga gtaagtagtt cgccagttaa tagtttgcgc aacgttgttg 9060ccattgctac
aggcatcgtg gtgtcacgct cgtcgtttgg tatggcttca ttcagctccg 9120gttcccaacg
atcaaggcga gttacatgat cccccatgtt gtgcaaaaaa gcggttagct 9180ccttcggtcc
tccgatcgtt gtcagaagta agttggccgc agtgttatca ctcatggtta 9240tggcagcact
gcataattct cttactgtca tgccatccgt aagatgcttt tctgtgactg 9300gtgagtactc
aaccaagtca ttctgagaat agtgtatgcg gcgaccgagt tgctcttgcc 9360cggcgtcaat
acgggataat accgcgccac atagcagaac tttaaaagtg ctcatcattg 9420gaaaacgttc
ttcggggcga aaactctcaa ggatcttacc gctgttgaga tccagttcga 9480tgtaacccac
tcgtgcaccc aactgatctt cagcatcttt tactttcacc agcgtttctg 9540ggtgagcaaa
aacaggaagg caaaatgccg caaaaaaggg aataagggcg acacggaaat 9600gttgaatact
catactcttc ctttttcaat attattgaag catttatcag ggttattgtc 9660tcatgagcgg
atacatattt gaatgtattt agaaaaataa acagcgatcg cgcggccgcg 9720ggtaataact
gatataatta aattgaagct ctaatttgtg agtttagtat acatgcattt 9780acttataata
cagtttttta gttttgctgg ccgcatcttc tcaaatatgc ttcccagcct 9840gcttttctgt
aacgttcacc ctctacctta gcatcccttc cctttgcaaa tagtcctctt 9900ccaacaataa
taatgtcaga tcctgtagag accacatcat ccacggttct atactgttga 9960cccaatgcgt
ctcccttgtc atctaaaccc acaccgggtg tcataatcaa ccaatcgtaa 10020ccttcatctc
ttccacccat gtctctttga gcaataaagc cgataacaaa atctttgtcg 10080ctcttcgcaa
tgtcaacagt acccttagta tattctccag tagctaggga gcccttgcat 10140gacaattctg
ctaacatcaa aaggcctcta ggttcctttg ttacttcttc cgccgcctgc 10200ttcaaaccgc
taacaatacc tgggcccacc acaccgtgtg cattcgtaat gtctgcccat 10260tctgctattc
tgtatacacc cgcagagtac tgcaatttga ctgtattacc aatgtcagca 10320aattttctgt
cttcgaagag taaaaaattg tacttggcgg ataatgcctt tagcggctta 10380actgtgccct
ccatggaaaa atcagtcaag atatccacat gtgtttttag taaacaaatt 10440ttgggaccta
atgcttcaac taactccagt aattccttgg tggtacgaac atccaatgaa 10500gcacacaagt
ttgtttgctt ttcgtgcatg atattaaata gcttggcagc aacaggacta 10560ggatgagtag
cagcacgttc cttatatgta gctttcgaca tgatttatct tcgtttcctg 10620caggtttttg
ttctgtgcag ttgggttaag aatactgggc aatttcatgt ttcttcaaca 10680ccacatatgc
gtatatatac caatctaagt ctgtgctcct tccttcgttc ttccttctgc 10740tcggagatta
ccgaatcaaa gctagc
10766
SEQ ID NO: 79, length 10970, DNA, Artificial Sequence, Synthetic Nucleic Acid
ttatcgatga
taagctgtca aagatgagaa ttaattccac ggactataga ctatactaga 60tactccgtct
actgtacgat acacttccgc tcaggtcctt gtcctttaac gaggccttac 120cactcttttg
ttactctatt gatccagctc agcaaaggca gtgtgatcta agattctatc 180ttcgcgatgt
agtaaaacta gctagaccga gaaagagact agaaatgcaa aaggcacttc 240tacaatggct
gccatcatta ttatccgatg tgacgctgca gcttctcaat gatattcgaa 300tacgctttga
ggagatacag cctaatatcc gacaaactgt tttacagatt tacgatcgta 360cttgttaccc
atcattgaat tttgaacatc cgaacctggg agttttccct gaaacagata 420gtatatttga
acctgtataa taatatatag tctagcgctt tacggaagac aatgtatgta 480tttcggttcc
tggagaaact attgcatcta ttgcataggt aatcttgcac gtcgcatccc 540cggttcattt
tctgcgtttc catcttgcac ttcaatagca tatctttgtt aacgaagcat 600ctgtgcttca
ttttgtagaa caaaaatgca acgcgagagc gctaattttt caaacaaaga 660atctgagctg
catttttaca gaacagaaat gcaacgcgaa agcgctattt taccaacgaa 720gaatctgtgc
ttcatttttg taaaacaaaa atgcaacgcg acgagagcgc taatttttca 780aacaaagaat
ctgagctgca tttttacaga acagaaatgc aacgcgagag cgctatttta 840ccaacaaaga
atctatactt cttttttgtt ctacaaaaat gcatcccgag agcgctattt 900ttctaacaaa
gcatcttaga ttactttttt tctcctttgt gcgctctata atgcagtctc 960ttgataactt
tttgcactgt aggtccgtta aggttagaag aaggctactt tggtgtctat 1020tttctcttcc
ataaaaaaag cctgactcca cttcccgcgt ttactgatta ctagcgaagc 1080tgcgggtgca
ttttttcaag ataaaggcat ccccgattat attctatacc gatgtggatt 1140gcgcatactt
tgtgaacaga aagtgatagc gttgatgatt cttcattggt cagaaaatta 1200tgaacggttt
cttctatttt gtctctatat actacgtata ggaaatgttt acattttcgt 1260attgttttcg
attcactcta tgaatagttc ttactacaat ttttttgtct aaagagtaat 1320actagagata
aacataaaaa atgtagaggt cgagtttaga tgcaagttca aggagcgaaa 1380ggtggatggg
taggttatat agggatatag cacagagata tatagcaaag agatactttt 1440gagcaatgtt
tgtggaagcg gtattcgcaa tgtttaaact gcgtcggaac gggatatgca 1500ttcccctagt
ttcgccgcag tgcagaatca ggcggtttct ttgcaccaca ccacatacgg 1560aggatgacgg
gcattattga tgttgaatag taacctgatc gtgactagta tgacggaacc 1620caacagcaac
agccgaccgt ttgtgagcgt ttttgcggcc ggtcaggcga gtttttccgg 1680cctgccaatg
gtccttccgt accctttacc ctgtacgctg tacctgccac ggataggccg 1740tgctccacct
gctcactatg gtgggtgcgg ggaaaacaac aggcaggctc aattgctctg 1800caaatgggtt
gagggggtga ttgatgtcac tggtacacca acaggggaat gctcggcgtt 1860gattttgggc
cacctctttt gtttgccaga gcttgtctct attgtcaaat ttaacggtct 1920gcaactgttg
cccaaaatgg gacaatgatc cgatgcctgc atagacaccc tgcttgaggg 1980tgcgatcgcc
ctaatacgag gcaaaccaag ttttccaatt gaccttcaat tgacgagcgg 2040ttgttgcgac
aggggactgg agtgctacct gtttagagtt caaatccgtc acccagcatt 2100gaaagttttt
ccccgcattg gatgattgca atgccgctaa cccgctcatc cgccaaagtt 2160catagtccca
ccctgcctcg acttatcgga ccacatgggg ctcccttatg cgcgcgcata 2220tggcgcttga
ttgctttttg gtcaacgttt gggacaaatt tcctttgtta aggcggaccc 2280gccagcagat
acgaaggtat aaatagggct cactttcacc atcttgtcca ttcaattgca 2340agactcaaaa
gtaataatga ccactctgga tgacaccgct taccgatacc gaacttccgt 2400tcctggcgat
gccgaggcta ttgaggctct ggatggatct ttcaccactg acaccgtttt 2460ccgagtgacc
gctactggcg acggcttcac cctgcgagag gtgcctgtcg accctcctct 2520caccaaggtt
ttccctgacg atgagtcgga cgatgagtct gacgctggag aggacggcga 2580ccctgactct
cgaactttcg tggcttacgg cgacgatgga gacctggccg gctttgtggt 2640cgtttcttac
tccggatgga accgacgact gaccgtggag gacatcgagg tcgctcctga 2700gcaccgaggt
catggtgtcg gacgagctct gatgggtctc gctactgagt tcgctcgaga 2760gcgaggtgct
ggccacctgt ggctcgaggt caccaacgtt aacgcccctg ctattcatgc 2820ctaccgacga
atgggtttta ccctgtgtgg cctcgatact gccctgtacg acggaaccgc 2880ttccgatgga
gagcaggccc tctacatgtc gatgccctgc ccttaaacag gccccttttc 2940ctttgtcgat
atcatgtaat tagttatgtc acgcttacat tcacgccctc ctcccacatc 3000cgctctaacc
gaaaaggaag gagttagaca acctgaagtc taggtcccta tttatttttt 3060ttaatagtta
tgttagtatt aagaacgtta tttatatttc aaatttttct tttttttctg 3120tacaaacgcg
tgtacgcatg taacattata ctgaaaacct tgcttgagaa ggttttggga 3180cgctcgaagg
ctttaatttg cagagaccgg gttggcggcg catttgtgtc ccaaaaaaca 3240gccccaattg
ccccaattga ccccaaattg acccagtagc gggcccaacc ccggcgagag 3300cccccttctc
cccacatatc aaacctcccc cggttcccac acttgccgtt aagggcgtag 3360ggtactgcag
tctggaatct acgcttgttc agactttgta ctagtttctt tgtctggcca 3420tccgggtaac
ccatgccgga cgcaaaatag actactgaaa atttttttgc tttgtggttg 3480ggactttagc
caagggtata aaagaccacc gtccccgaat tacctttcct cttcttttct 3540ctctctcctt
gtcaactcac acccgaaatc gttaagcatt tccttctgag tataagaatc 3600attcaaaatg
aagttctcta tgccatcttg gggtgttgtt ttttacgctt tgttggtttg 3660tttgttgcca
ttcttgtcta aggctggtgt tcaagctatg tccgttgtta ccaccgatgc 3720tcaagctgct
catgctgctg gtgtttctag attattggct tcttatagag ccattccacc 3780atctgctact
gttagattgg ctaagccaac ttctaatttg ttcagagcta gagctagaac 3840taacgttaag
ggtttggatg tttctggttt gactggtgtt attggtgttg atccagatgc 3900tagaactgct
gatgttgctg gtatgtgtac ttacgaagat ttggttgctg ctactttgcc 3960atatggtttg
gctccattgg ttgttccaca attgaaaact attactttgg gtggtgctgt 4020taccggtttg
ggtattgaat ctacttcttt cagaaacggt ttgccacacg aatctgtttt 4080ggaaatggat
attttgaccg gttccggtga aatagttact gcttctccag atcaacactc 4140cgatttgttt
catgcttttc caaactctta cggtacattg ggttactcta ccagattgag 4200aattgaattg
gaaccagttc atccattcgt tgccttgaga catttgagat tccattccat 4260tactgatttg
gtcgcagcca tggatagaat tattgaaact ggtggtttag acggtgaacc 4320agttgattat
ttggatggtg ttgttttctc tgccaccgaa tcatatttgt gtgttggttt 4380caaaactaag
accccaggtc cagtttctga ttatactggt caacaaatct tctacagatc 4440catccaacat
gatggtgata ctggtgctga aaaacatgat agattgacca tccatgacta 4500cttgtggaga
tgggatactg attggttttg gtgttctaga gcttttggtg ctcaacatcc 4560agttattaga
agattctggc caagaagatt aagaagatcc tccttctact ggaaattggt 4620tgcttacgat
caaagatacg atatcgccga tagaatcgaa aagagaaatg gtagaccacc 4680aagagaaaga
gttgttcaag acgttgaagt tccaattgaa agatgcgctg atttcgttga 4740atggttcttg
caaaatgttc caatcgaacc tatttggttg tgcccattga gattgagaga 4800ttctgctgat
ggtggtgctt catggccatt atatccattg aaagctcatc acacctacgt 4860caatattggt
ttctggtcat ctgttccagt tggtccagaa gaaggtcata ccaatagatt 4920gattgaaaaa
aaggtcgccg aattggacgg tcacaaatca ttatattctg atgcctacta 4980caccagagat
gaattcgatg aattatacgg tggtgaagtt tacaacaccg tcaaaaaaac 5040ttacgaccca
gactcaagat tattagactt gtactctaag gccgtccaaa gacaacatga 5100tgaattgtga
gctgcttgta cctagtgcaa ccccagtttg ttaaaaatta gtagtcaaaa 5160acttctgagt
tagaaatttg tgagtgtagt gagattgtag agtatcatgt gtgtccgtaa 5220gtgaagtgtt
attgactctt agttagttta tctagtactc gtttagttga cactgatcta 5280gtattttacg
aggcgtatga ctttagccaa gtgttgtact tagtcttctc tccaaacatg 5340agagggctct
gtcactcagt cggcctatgg gtgagatggc ttggtgagat ctttcgatag 5400tctcgtcaag
atggtaggat gatgggggaa tacattactg ctctcgtcaa ggaaaccaca 5460atcagatcac
accatcctcc atggtatccg atgactctct tctccacagt cgcagtagga 5520tgtcctgcac
gggtcttttt gtggggtgtg gagaaagggg tgcttggaga tggaagccgg 5580tagaaccggg
ctgcttgggg ggatttgggg ccgctgggct ccaaagaggg gtaggcattt 5640cgttggggtt
acgtaattgc ggcatttggg tcctgcgcgc atgtcccatt ggtcagaatt 5700agtccggata
ggagacttat cagccaatca cagcgccgga tccacctgta ggttgggttg 5760ggtgggagca
cccctccaca gagtagagtc aaacagcagc agcaacgtga tagttggggg 5820tgtgcgtgtt
aaaggaaaaa aaaagaagct tgggttatat tcccgctcta tttagaggtt 5880gcgggataga
cgccgacgga gggcaatggc gccatggaac cttgcggata tcgatacgcc 5940gcggcggact
gcgtccgaac cagctccagc agcgtttttt ccgggccatt gagccgactg 6000cgaccccgcc
aacgtgtctt ggcccacgca ctcatgtcat gttggtgttg ggaggccact 6060ttttaagtag
cacaaggcac ctagctcgcg gcagggtgtc cgaaccaaag aagcggctgc 6120agtggtgcaa
acggggcgga aacggcggga aaaagccacg ggggcacgaa ttgaggcacg 6180ccctcgaatt
tgagacgagt cacggcccca ttcgcccgcg caatggctcg ccaacgcccg 6240gtcttttgca
ccacatcagg ttaccccaag ccaaaccttt gtgttaaaaa gcttaacata 6300ttataccgaa
cgtaggtttg ggcgggcttg ctccgtctgt ccaaggcaac atttatataa 6360gggtctgcat
cgccggctca attgaatctt ttttcttctt ctcttctcta tattcattct 6420tgaattaaac
acacatcaac aatgaagttc tctatgccat cttggggtgt tgttttttac 6480gctttgttgg
tttgtttgtt gccattcttg tctaaggctg gtgttcaagc tatgaccacc 6540ttcaaagaaa
gagaaacttc taccgctgat agaaagttga ccttggctga aattttggaa 6600attttcgctg
ctggtaaaga accattgaag ttcactgctt atgatggttc ttctgctggt 6660cctgaagatg
ctactatggg tttggatttg aaaactccaa gaggtactac ttacttggct 6720actgctccag
gtgatttggg tttggctaga gcttatgttt ctggtgactt ggaaccacat 6780ggtgttcatc
ctggtgatcc atatccatta ttgagagctt tagccgaaag aatggaattc 6840aaaagaccac
cagctagagt tttggctaac atcgttagat ccattggtat cgaacatttg 6900aagccaattg
ctccaccacc acaagaagct ttgccaagat ggagaagaat tatggaaggt 6960ttgagacact
ctaagaccag agatgctgaa gctattcatc atcactacga tgtttctaac 7020accttctacg
aatgggtttt gggtccatct atgacttata cttgtgcttg ttacccaaca 7080gaagatgcca
ctttggaaga agctcaagat aacaagtaca gattggtctt tgaaaagttg 7140agattgaagc
caggtgacag attattggat gttggttgtg gttggggtgg tatggttaga 7200tatgctgcta
gacatggtgt aaaagctttg ggtgttactt tgtctagaga acaagctact 7260tgggctcaaa
aagctattgc tcaagaaggt ttaaccgatt tggctgaagt tagacacggt 7320gattacagag
atgttatcga atctggtttc gatgccgttt cttctattgg tttgactgaa 7380catatcggtg
ttcataacta tccagcctac ttcaacttct tgaagtctaa gttgagaacc 7440ggtggtttgt
tgttgaacca ttgcattact agaccagata acagatctgc tccatctgct 7500ggtggtttta
ttgatagata cgttttccca gatggtgaat tgactggttc cggtagaatt 7560attactgaag
cacaagatgt cggtttggaa gttatccatg aagaaaactt gagaaaccat 7620tacgccatga
ctttgagaga ttggtgtaga aacttggttg aacattggga tgaagccgtt 7680gaagaagttg
gtttgccaac tgctaaagtt tggggtttgt atatggctgg ttctagatta 7740ggttttgaaa
ctaacgttgt ccaattgcac caagttttgg cagttaagtt ggatgatcaa 7800ggtaaagatg
gtggtttgcc tttaagacca tggtggtctg ctcatgatga attgtgagca 7860ttagcgacta
ctaatatata tttgaatcca tggaattata acaaacaagc atcaaaacaa 7920gaattagcga
cattatactt gaaatcagca ttagcgatac tactaatata gtttattcta 7980tgtaatgatc
catggaagtt cgattgattt gccaagttaa tttgatagat tatgcatgcc 8040atttagtcga
cgcaggtacg atctacagcg ataaagaaga ggttgtgggt cattcaattt 8100tgcaccaatt
ttgcaccatc atagatcata atacatttac aaggcctaca attcttacag 8160ggtcttctcg
agagcaattc cttaattaag gcgcgccttt ccataggctc cgcccccctg 8220acgagcatca
caaaaatcga cgctcaagtc agaggtggcg aaacccgaca ggactataaa 8280gataccaggc
gtttccccct ggaagctccc tcgtgcgctc tcctgttccg accctgccgc 8340ttaccggata
cctgtccgcc tttctccctt cgggaagcgt ggcgctttct catagctcac 8400gctgtaggta
tctcagttcg gtgtaggtcg ttcgctccaa gctgggctgt gtgcacgaac 8460cccccgttca
gcccgaccgc tgcgccttat ccggtaacta tcgtcttgag tccaacccgg 8520taagacacga
cttatcgcca ctggcagcag ccactggtaa caggattagc agagcgaggt 8580atgtaggcgg
tgctacagag ttcttgaagt ggtggcctaa ctacggctac actagaagaa 8640cagtatttgg
tatctgcgct ctgctgaagc cagttacctt cggaaaaaga gttggtagct 8700cttgatccgg
caaacaaacc accgctggta gcggtggttt ttttgtttgc aagcagcaga 8760ttacgcgcag
aaaaaaagga tctcaagaag atcctttgat cttttctacg gggtctgacg 8820ctcagtggaa
cgaaaactca cgttaaggga ttttggtcat gagattatca aaaaggatct 8880tcacctagat
ccttttaaat taaaaatgaa gttttaaatc aatctaaagt atatatgagt 8940aaacttggtc
tgacagttac caatgcttaa tcagtgaggc acctatctca gcgatctgtc 9000tatttcgttc
atccatagtt gcctgactcc ccgtcgtgta gataactacg atacgggagg 9060gcttaccatc
tggccccagt gctgcaatga taccgcgaga cccacgctca ccggctccag 9120atttatcagc
aataaaccag ccagccggaa gggccgagcg cagaagtggt cctgcaactt 9180tatccgcctc
catccagtct attaattgtt gccgggaagc tagagtaagt agttcgccag 9240ttaatagttt
gcgcaacgtt gttgccattg ctacaggcat cgtggtgtca cgctcgtcgt 9300ttggtatggc
ttcattcagc tccggttccc aacgatcaag gcgagttaca tgatccccca 9360tgttgtgcaa
aaaagcggtt agctccttcg gtcctccgat cgttgtcaga agtaagttgg 9420ccgcagtgtt
atcactcatg gttatggcag cactgcataa ttctcttact gtcatgccat 9480ccgtaagatg
cttttctgtg actggtgagt actcaaccaa gtcattctga gaatagtgta 9540tgcggcgacc
gagttgctct tgcccggcgt caatacggga taataccgcg ccacatagca 9600gaactttaaa
agtgctcatc attggaaaac gttcttcggg gcgaaaactc tcaaggatct 9660taccgctgtt
gagatccagt tcgatgtaac ccactcgtgc acccaactga tcttcagcat 9720cttttacttt
caccagcgtt tctgggtgag caaaaacagg aaggcaaaat gccgcaaaaa 9780agggaataag
ggcgacacgg aaatgttgaa tactcatact cttccttttt caatattatt 9840gaagcattta
tcagggttat tgtctcatga gcggatacat atttgaatgt atttagaaaa 9900ataaacagcg
atcgcgcggc cgcgggtaat aactgatata attaaattga agctctaatt 9960tgtgagttta
gtatacatgc atttacttat aatacagttt tttagttttg ctggccgcat 10020cttctcaaat
atgcttccca gcctgctttt ctgtaacgtt caccctctac cttagcatcc 10080cttccctttg
caaatagtcc tcttccaaca ataataatgt cagatcctgt agagaccaca 10140tcatccacgg
ttctatactg ttgacccaat gcgtctccct tgtcatctaa acccacaccg 10200ggtgtcataa
tcaaccaatc gtaaccttca tctcttccac ccatgtctct ttgagcaata 10260aagccgataa
caaaatcttt gtcgctcttc gcaatgtcaa cagtaccctt agtatattct 10320ccagtagcta
gggagccctt gcatgacaat tctgctaaca tcaaaaggcc tctaggttcc 10380tttgttactt
cttccgccgc ctgcttcaaa ccgctaacaa tacctgggcc caccacaccg 10440tgtgcattcg
taatgtctgc ccattctgct attctgtata cacccgcaga gtactgcaat 10500ttgactgtat
taccaatgtc agcaaatttt ctgtcttcga agagtaaaaa attgtacttg 10560gcggataatg
cctttagcgg cttaactgtg ccctccatgg aaaaatcagt caagatatcc 10620acatgtgttt
ttagtaaaca aattttggga cctaatgctt caactaactc cagtaattcc 10680ttggtggtac
gaacatccaa tgaagcacac aagtttgttt gcttttcgtg catgatatta 10740aatagcttgg
cagcaacagg actaggatga gtagcagcac gttccttata tgtagctttc 10800gacatgattt
atcttcgttt cctgcaggtt tttgttctgt gcagttgggt taagaatact 10860gggcaatttc
atgtttcttc aacaccacat atgcgtatat ataccaatct aagtctgtgc 10920tccttccttc
gttcttcctt ctgctcggag attaccgaat caaagctagc
10970
SEQ ID NO: 80 (length 1410; DNA; Artificial Sequence; Synthetic Nucleic Acid)
atgtccgttg
ttaccaccga tgctcaagct gctcatgctg ctggtgtttc tagattattg 60gcttcttata
gagccattcc accatctgct actgttagat tggctaagcc aacttctaat 120ttgttcagag
ctagagctag aactaacgtt aagggtttgg atgtttctgg tttgactggt 180gttattggtg
ttgatccaga tgctagaact gctgatgttg ctggtatgtg tacttacgaa 240gatttggttg
ctgctacttt gccatatggt ttggctccat tggttgttcc acaattgaaa 300actattactt
tgggtggtgc tgttaccggt ttgggtattg aatctacttc tttcagaaac 360ggtttgccac
acgaatctgt tttggaaatg gatattttga ccggttccgg tgaaatagtt 420actgcttctc
cagatcaaca ctccgatttg tttcatgctt ttccaaactc ttacggtaca 480ttgggttact
ctaccagatt gagaattgaa ttggaaccag ttcatccatt cgttgccttg 540agacatttga
gattccattc cattactgat ttggtcgcag ccatggatag aattattgaa 600actggtggtt
tagacggtga accagttgat tatttggatg gtgttgtttt ctctgccacc 660gaatcatatt
tgtgtgttgg tttcaaaact aagaccccag gtccagtttc tgattatact 720ggtcaacaaa
tcttctacag atccatccaa catgatggtg atactggtgc tgaaaaacat 780gatagattga
ccatccatga ctacttgtgg agatgggata ctgattggtt ttggtgttct 840agagcttttg
gtgctcaaca tccagttatt agaagattct ggccaagaag attaagaaga 900tcctccttct
actggaaatt ggttgcttac gatcaaagat acgatatcgc cgatagaatc 960gaaaagagaa
atggtagacc accaagagaa agagttgttc aagacgttga agttccaatt 1020gaaagatgcg
ctgatttcgt tgaatggttc ttgcaaaatg ttccaatcga acctatttgg 1080ttgtgcccat
tgagattgag agattctgct gatggtggtg cttcatggcc attatatcca 1140ttgaaagctc
atcacaccta cgtcaatatt ggtttctggt catctgttcc agttggtcca 1200gaagaaggtc
ataccaatag attgattgaa aaaaaggtcg ccgaattgga cggtcacaaa 1260tcattatatt
ctgatgccta ctacaccaga gatgaattcg atgaattata cggtggtgaa 1320gtttacaaca
ccgtcaaaaa aacttacgac ccagactcaa gattattaga cttgtactct 1380aaggccgtcc
aaagacaaca tgatgaattg
1410
SEQ ID NO: 81 (length 1311; DNA; Artificial Sequence; Synthetic Nucleic Acid)
atgaccacct
tcaaagaaag agaaacttct accgctgata gaaagttgac cttggctgaa 60attttggaaa
ttttcgctgc tggtaaagaa ccattgaagt tcactgctta tgatggttct 120tctgctggtc
ctgaagatgc tactatgggt ttggatttga aaactccaag aggtactact 180tacttggcta
ctgctccagg tgatttgggt ttggctagag cttatgtttc tggtgacttg 240gaaccacatg
gtgttcatcc tggtgatcca tatccattat tgagagcttt agccgaaaga 300atggaattca
aaagaccacc agctagagtt ttggctaaca tcgttagatc cattggtatc 360gaacatttga
agccaattgc tccaccacca caagaagctt tgccaagatg gagaagaatt 420atggaaggtt
tgagacactc taagaccaga gatgctgaag ctattcatca tcactacgat 480gtttctaaca
ccttctacga atgggttttg ggtccatcta tgacttatac ttgtgcttgt 540tacccaacag
aagatgccac tttggaagaa gctcaagata acaagtacag attggtcttt 600gaaaagttga
gattgaagcc aggtgacaga ttattggatg ttggttgtgg ttggggtggt 660atggttagat
atgctgctag acatggtgta aaagctttgg gtgttacttt gtctagagaa 720caagctactt
gggctcaaaa agctattgct caagaaggtt taaccgattt ggctgaagtt 780agacacggtg
attacagaga tgttatcgaa tctggtttcg atgccgtttc ttctattggt 840ttgactgaac
atatcggtgt tcataactat ccagcctact tcaacttctt gaagtctaag 900ttgagaaccg
gtggtttgtt gttgaaccat tgcattacta gaccagataa cagatctgct 960ccatctgctg
gtggttttat tgatagatac gttttcccag atggtgaatt gactggttcc 1020ggtagaatta
ttactgaagc acaagatgtc ggtttggaag ttatccatga agaaaacttg 1080agaaaccatt
acgccatgac tttgagagat tggtgtagaa acttggttga acattgggat 1140gaagccgttg
aagaagttgg tttgccaact gctaaagttt ggggtttgta tatggctggt 1200tctagattag
gttttgaaac taacgttgtc caattgcacc aagttttggc agttaagttg 1260gatgatcaag
gtaaagatgg tggtttgcct ttaagaccat ggtggtctgc t
1311
SEQ ID NO: 82 (length 4399; DNA; Artificial Sequence; Synthetic Nucleic Acid)
tgggtaggtt
atatagggat atagcacaga gatatatagc aaagagatac ttttgagcaa 60tgtttgtgga
agcggtattc gcaatttaat taaagctggt gacaattaat catcggctcg 120tataatgtgt
ggaattgaat cgatataagg aggttaatca tgtttaaacc ctcaaaatat 180attttccctc
tatcttctcg ttgcgcttaa tttgactaat tctcattagc gaggcgcgcc 240tttccatagg
ctccgccccc ctgacgagca tcacaaaaat cgacgctcaa gtcagaggtg 300gcgaaacccg
acaggactat aaagatacca ggcgtttccc cctggaagct ccctcgtgcg 360ctctcctgtt
ccgaccctgc cgcttaccgg atacctgtcc gcctttctcc cttcgggaag 420cgtggcgctt
tctcatagct cacgctgtag gtatctcagt tcggtgtagg tcgttcgctc 480caagctgggc
tgtgtgcacg aaccccccgt tcagcccgac cgctgcgcct tatccggtaa 540ctatcgtctt
gagtccaacc cggtaagaca cgacttatcg ccactggcag cagccactgg 600taacaggatt
agcagagcga ggtatgtagg cggtgctaca gagttcttga agtggtggcc 660taactacggc
tacactagaa gaacagtatt tggtatctgc gctctgctga agccagttac 720cttcggaaaa
agagttggta gctcttgatc cggcaaacaa accaccgctg gtagcggtgg 780tttttttgtt
tgcaagcagc agattacgcg cagaaaaaaa ggatctcaag aagatccttt 840gatcttttct
acggggtctg acgctcagtg gaacgaaaac tcacgttaag ggattttggt 900catgagatta
tcaaaaagga tcttcaccta gatcctttta aattaaaaat gaagttttaa 960atcaatctaa
agtatatatg agtaaacttg gtctgacagt taccaatgct taatcagtga 1020ggcacctatc
tcagcgatct gtctatttcg ttcatccata gttgcctgac tccccgtcgt 1080gtagataact
acgatacggg agggcttacc atctggcccc agtgctgcaa tgataccgcg 1140agacccacgc
tcaccggctc cagatttatc agcaataaac cagccagccg gaagggccga 1200gcgcagaagt
ggtcctgcaa ctttatccgc ctccatccag tctattaatt gttgccggga 1260agctagagta
agtagttcgc cagttaatag tttgcgcaac gttgttgcca ttgctacagg 1320catcgtggtg
tcacgctcgt cgtttggtat ggcttcattc agctccggtt cccaacgatc 1380aaggcgagtt
acatgatccc ccatgttgtg caaaaaagcg gttagctcct tcggtcctcc 1440gatcgttgtc
agaagtaagt tggccgcagt gttatcactc atggttatgg cagcactgca 1500taattctctt
actgtcatgc catccgtaag atgcttttct gtgactggtg agtactcaac 1560caagtcattc
tgagaatagt gtatgcggcg accgagttgc tcttgcccgg cgtcaatacg 1620ggataatacc
gcgccacata gcagaacttt aaaagtgctc atcattggaa aacgttcttc 1680ggggcgaaaa
ctctcaagga tcttaccgct gttgagatcc agttcgatgt aacccactcg 1740tgcacccaac
tgatcttcag catcttttac tttcaccagc gtttctgggt gagcaaaaac 1800aggaaggcaa
aatgccgcaa aaaagggaat aagggcgaca cggaaatgtt gaatactcat 1860actcttcctt
tttcaatatt attgaagcat ttatcagggt tattgtctca tgagcggata 1920catatttgaa
tgtatttaga aaaataaaca gcgatcgcgc ggccgcgggt aataactgat 1980ataattaaat
tgaagctcta atttgtgagt ttagtataca tgcatttact tataatacag 2040ttttttagtt
ttgctggccg catcttctca aatatgcttc ccagcctgct tttctgtaac 2100gttcaccctc
taccttagca tcccttccct ttgcaaatag tcctcttcca acaataataa 2160tgtcagatcc
tgtagagacc acatcatcca cggttctata ctgttgaccc aatgcgtctc 2220ccttgtcatc
taaacccaca ccgggtgtca taatcaacca atcgtaacct tcatctcttc 2280cacccatgtc
tctttgagca ataaagccga taacaaaatc tttgtcgctc ttcgcaatgt 2340caacagtacc
cttagtatat tctccagtag ctagggagcc cttgcatgac aattctgcta 2400acatcaaaag
gcctctaggt tcctttgtta cttcttccgc cgcctgcttc aaaccgctaa 2460caatacctgg
gcccaccaca ccgtgtgcat tcgtaatgtc tgcccattct gctattctgt 2520atacacccgc
agagtactgc aatttgactg tattaccaat gtcagcaaat tttctgtctt 2580cgaagagtaa
aaaattgtac ttggcggata atgcctttag cggcttaact gtgccctcca 2640tggaaaaatc
agtcaagata tccacatgtg tttttagtaa acaaattttg ggacctaatg 2700cttcaactaa
ctccagtaat tccttggtgg tacgaacatc caatgaagca cacaagtttg 2760tttgcttttc
gtgcatgata ttaaatagct tggcagcaac aggactagga tgagtagcag 2820cacgttcctt
atatgtagct ttcgacatga tttatcttcg tttcctgcag gtttttgttc 2880tgtgcagttg
ggttaagaat actgggcaat ttcatgtttc ttcaacacca catatgcgta 2940tatataccaa
tctaagtctg tgctccttcc ttcgttcttc cttctgctcg gagattaccg 3000aatcaaagct
agcttatcga tgataagctg tcaaagatga gaattaattc cacggactat 3060agactatact
agatactccg tctactgtac gatacacttc cgctcaggtc cttgtccttt 3120aacgaggcct
taccactctt ttgttactct attgatccag ctcagcaaag gcagtgtgat 3180ctaagattct
atcttcgcga tgtagtaaaa ctagctagac cgagaaagag actagaaatg 3240caaaaggcac
ttctacaatg gctgccatca ttattatccg atgtgacgct gcagcttctc 3300aatgatattc
gaatacgctt tgaggagata cagcctaata tccgacaaac tgttttacag 3360atttacgatc
gtacttgtta cccatcattg aattttgaac atccgaacct gggagttttc 3420cctgaaacag
atagtatatt tgaacctgta taataatata tagtctagcg ctttacggaa 3480gacaatgtat
gtatttcggt tcctggagaa actattgcat ctattgcata ggtaatcttg 3540cacgtcgcat
ccccggttca ttttctgcgt ttccatcttg cacttcaata gcatatcttt 3600gttaacgaag
catctgtgct tcattttgta gaacaaaaat gcaacgcgag agcgctaatt 3660tttcaaacaa
agaatctgag ctgcattttt acagaacaga aatgcaacgc gaaagcgcta 3720ttttaccaac
gaagaatctg tgcttcattt ttgtaaaaca aaaatgcaac gcgacgagag 3780cgctaatttt
tcaaacaaag aatctgagct gcatttttac agaacagaaa tgcaacgcga 3840gagcgctatt
ttaccaacaa agaatctata cttctttttt gttctacaaa aatgcatccc 3900gagagcgcta
tttttctaac aaagcatctt agattacttt ttttctcctt tgtgcgctct 3960ataatgcagt
ctcttgataa ctttttgcac tgtaggtccg ttaaggttag aagaaggcta 4020ctttggtgtc
tattttctct tccataaaaa aagcctgact ccacttcccg cgtttactga 4080ttactagcga
agctgcgggt gcattttttc aagataaagg catccccgat tatattctat 4140accgatgtgg
attgcgcata ctttgtgaac agaaagtgat agcgttgatg attcttcatt 4200ggtcagaaaa
ttatgaacgg tttcttctat tttgtctcta tatactacgt ataggaaatg 4260tttacatttt
cgtattgttt tcgattcact ctatgaatag ttcttactac aatttttttg 4320tctaaagagt
aatactagag ataaacataa aaaatgtaga ggtcgagttt agatgcaagt 4380tcaaggagcg
aaaggtgga
4399
SEQ ID NO: 83 (length 7531; DNA; Artificial Sequence; Synthetic Nucleic Acid)
gtttgtggaa
gcggtattcg caatttaatt aaagctggtg acaattaatc atcggctcgt 60ataatgtgtg
gaattgaatc gatataagga ggttaatcat gtgtctgttc cttcgaccga 120cgcacgttct
gctcacgccg acggcgtgca gcggcttctc gccagctatc gggcgattcc 180ccaagacgcc
acggtccggc tggccaaacc cacgtcgaac ctcttccgtg cccgcgcgaa 240aaccaggacc
aagggtctgg acacgtctgg gttgacgaac gtgatcgcgg tcgacgcgga 300ggcacgcacc
gccgatgtgg cagggatgtg cacctacgaa gacctggtcg cggccacgct 360gccgcatgga
ctttcgccgc tggtggtgcc gcagttgaag acgatcaccc tcggcggggc 420ggtcaccgga
ctcgggatcg agtccgcctc gttccgcaac ggcctgccac acgaatcggt 480tctcgagatg
gacgtcctca ccggcaccgg tgatgtcgtg cgcgcctccc ccgacgagaa 540ccctgacctg
tttcgggcgt ttccgaattc ctatggcacg ttgggctatt cggttcggct 600caagatcgag
ctggaaccgg tgaagccgtt cgtcgcgctg cgccacctcc gtttccattc 660gctgtcggct
ctcatcgagg cgatggaccg catcgtcgaa accggcggcc tcaacggcga 720accggtggac
tacctcgacg gcgtcgtgtt cagtgccgag gagagttacc tgtgcgtggg 780gcagcgctcc
gcgacaccgg gcccggtcag cgactacacg ggcaagcaga tctactaccg 840ctcgattcag
cacgacggcc cgaccgatgg cgccgagaag cacgaccggc tgaccatcca 900cgactacctg
tggcgctggg acaccgactg gttctggtgc tcaagggcat tcggcgcgca 960gaacccgcgg
atccggcgct ggtggccgcg ccggtaccgg cgcagcagtg tgtactggaa 1020gctgatcggc
tacgaccggc gtttcggtat cgccgatcgc atcgagaagc gcaacggccg 1080acccccgcgc
gagcgggtgg tccaggacat cgaggtgccc atcgagcgga ccgtcgagtt 1140tctgcagtgg
tttctcgaca ccgtgcccat cgaaccgatc tggttgtgcc cgttgcggct 1200ccgcgacgac
cgcgattggc ccctgtatcc gatccgaccc caccacacct acgtcaacgt 1260gggtttctgg
tcgtcggtgc cggtgggccc ggaggagggc tacaccaaca ggatgatcga 1320acggaaagtc
agcgacctcg acggtcacaa atcgctgtat tccgatgcgt actactcgcc 1380ggaagagttt
gattcgctct atggcgggga gacgtacaag acggtgaaga agacatacga 1440cccagactct
cgtttcctgg acctgtacgg caaagcagtg gggcggcaat gagcgttgac 1500gcgaagaacg
gaggccacag ttgacgacat ttcgggacgg cgcggccgac accggcctgc 1560acggagaccg
caagctcacc ctggcggagg tcttggaggt cttcgcctcg ggccgactgc 1620ctctgaagtt
cacggcgtac gacggcagca gcgcgggccc ggacgacgcc acgctcgggc 1680tggacctgct
gaccccccgc gggaccacgt acctcgcaac ggctcccggc gatctcggcc 1740tggcccgggc
ctacgtctcc ggtgacctgc agttgcaggg ggtgcaccct ggcgacccgt 1800acgacctgct
caacgcactg gtgcagaaac tggacttcaa gcgaccgtcc gcccgggtgc 1860tggcgcaggt
cgtccgatcg atcgggatcg agcacctgaa accgatcgcg ccaccgccgc 1920aggaggcgct
gccgcggtgg cggcgcatcg cagaaggact gcggcacagc aagacccgtg 1980acgccgacgc
gatccaccac cattacgatg tctccaacac cttctacgag tgggtgctcg 2040ggccgtcgat
gacctacacc tgcgcctgct acccgcatcc cgacgccacc ctcgaggagg 2100cgcaggagaa
caaatatcgg ctggtgttcg agaaactgcg cctcaagccg ggcgaccgcc 2160ttctcgacgt
gggttgcggg tggggcggaa tggtgcgcta cgcggcccgt cacggcgtca 2220aggcgatcgg
ggtgacgctg tccagggagc aggcgcagtg ggcacgcgcc gccatcgaac 2280gggacggcct
gggtgacctc gccgaggtcc gccacagcga ctaccgcgat gtgcgcgagt 2340cccagttcga
cgccgtgtct tcgctggggc tcaccgagca catcggggtc gccaactatc 2400cgtcgtactt
ccggttcctc aagtcgaagt tgcgcccggg cggcctactg ctcaaccact 2460gcatcacccg
gcacaacaat cgcaccggcc ccgccgccgg gggattcatc gaccggtatg 2520tgttcccgga
cggggagctg accggatcgg gccggatcat caccgagatc caggacgtcg 2580gtttggaggt
gatgcacgaa gagaacctgc gccggcacta tgcgctgaca cttcgggact 2640ggtgccggaa
tctggtgcag cactgggacg aagcggtcgc agaggtcggc ctgcccaccg 2700ccaaggtgtg
gggtctgtac atggctgcct cgcgggtcgg cttcgagcag aacagcattc 2760agctgcatca
ggtactggcg gtgaagctcg acgaacgtgg cggggacggc ggtttgccgt 2820tgcggccctg
gtggaccgcg tagcaactat gctcaccgtg tgatccgctt tctgctgcgc 2880gtcgcggtct
ttctcggatc gtcggcgatc gggctactgg tggccggctg gctggtgccg 2940ggggtgtcgc
tgtcggtgct gggcttcgtc accgcggtgg tgatcttcac ggtggcacaa 3000gggattctgt
cgccgttctt cctgaagatg gccagccgct acgcgtcggc cttcctcggc 3060ggcatcggcc
tggtgtccac gttcgtggcg ctgctgctcg cgtcgctgct gtccaacggg 3120ctcagcatcc
gcggcgtcgg gtcgtggatc gcggccacgg tggtggtctg gctggtcaca 3180gccctggcga
ccgtcgtgct gcccgttctg gtgctgcggg agaagaagaa agcagcctga 3240cctcaaaata
tattttccct ctatcttctc gttgcgctta atttgactaa ttctcattag 3300cgaggcgcgc
ctttccatag gctccgcccc cctgacgagc atcacaaaaa tcgacgctca 3360agtcagaggt
ggcgaaaccc gacaggacta taaagatacc aggcgtttcc ccctggaagc 3420tccctcgtgc
gctctcctgt tccgaccctg ccgcttaccg gatacctgtc cgcctttctc 3480ccttcgggaa
gcgtggcgct ttctcatagc tcacgctgta ggtatctcag ttcggtgtag 3540gtcgttcgct
ccaagctggg ctgtgtgcac gaaccccccg ttcagcccga ccgctgcgcc 3600ttatccggta
actatcgtct tgagtccaac ccggtaagac acgacttatc gccactggca 3660gcagccactg
gtaacaggat tagcagagcg aggtatgtag gcggtgctac agagttcttg 3720aagtggtggc
ctaactacgg ctacactaga agaacagtat ttggtatctg cgctctgctg 3780aagccagtta
ccttcggaaa aagagttggt agctcttgat ccggcaaaca aaccaccgct 3840ggtagcggtg
gtttttttgt ttgcaagcag cagattacgc gcagaaaaaa aggatctcaa 3900gaagatcctt
tgatcttttc tacggggtct gacgctcagt ggaacgaaaa ctcacgttaa 3960gggattttgg
tcatgagatt atcaaaaagg atcttcacct agatcctttt aaattaaaaa 4020tgaagtttta
aatcaatcta aagtatatat gagtaaactt ggtctgacag ttaccaatgc 4080ttaatcagtg
aggcacctat ctcagcgatc tgtctatttc gttcatccat agttgcctga 4140ctccccgtcg
tgtagataac tacgatacgg gagggcttac catctggccc cagtgctgca 4200atgataccgc
gagacccacg ctcaccggct ccagatttat cagcaataaa ccagccagcc 4260ggaagggccg
agcgcagaag tggtcctgca actttatccg cctccatcca gtctattaat 4320tgttgccggg
aagctagagt aagtagttcg ccagttaata gtttgcgcaa cgttgttgcc 4380attgctacag
gcatcgtggt gtcacgctcg tcgtttggta tggcttcatt cagctccggt 4440tcccaacgat
caaggcgagt tacatgatcc cccatgttgt gcaaaaaagc ggttagctcc 4500ttcggtcctc
cgatcgttgt cagaagtaag ttggccgcag tgttatcact catggttatg 4560gcagcactgc
ataattctct tactgtcatg ccatccgtaa gatgcttttc tgtgactggt 4620gagtactcaa
ccaagtcatt ctgagaatag tgtatgcggc gaccgagttg ctcttgcccg 4680gcgtcaatac
gggataatac cgcgccacat agcagaactt taaaagtgct catcattgga 4740aaacgttctt
cggggcgaaa actctcaagg atcttaccgc tgttgagatc cagttcgatg 4800taacccactc
gtgcacccaa ctgatcttca gcatctttta ctttcaccag cgtttctggg 4860tgagcaaaaa
caggaaggca aaatgccgca aaaaagggaa taagggcgac acggaaatgt 4920tgaatactca
tactcttcct ttttcaatat tattgaagca tttatcaggg ttattgtctc 4980atgagcggat
acatatttga atgtatttag aaaaataaac agcgatcgcg cggccgcggg 5040taataactga
tataattaaa ttgaagctct aatttgtgag tttagtatac atgcatttac 5100ttataataca
gttttttagt tttgctggcc gcatcttctc aaatatgctt cccagcctgc 5160ttttctgtaa
cgttcaccct ctaccttagc atcccttccc tttgcaaata gtcctcttcc 5220aacaataata
atgtcagatc ctgtagagac cacatcatcc acggttctat actgttgacc 5280caatgcgtct
cccttgtcat ctaaacccac accgggtgtc ataatcaacc aatcgtaacc 5340ttcatctctt
ccacccatgt ctctttgagc aataaagccg ataacaaaat ctttgtcgct 5400cttcgcaatg
tcaacagtac ccttagtata ttctccagta gctagggagc ccttgcatga 5460caattctgct
aacatcaaaa ggcctctagg ttcctttgtt acttcttccg ccgcctgctt 5520caaaccgcta
acaatacctg ggcccaccac accgtgtgca ttcgtaatgt ctgcccattc 5580tgctattctg
tatacacccg cagagtactg caatttgact gtattaccaa tgtcagcaaa 5640ttttctgtct
tcgaagagta aaaaattgta cttggcggat aatgccttta gcggcttaac 5700tgtgccctcc
atggaaaaat cagtcaagat atccacatgt gtttttagta aacaaatttt 5760gggacctaat
gcttcaacta actccagtaa ttccttggtg gtacgaacat ccaatgaagc 5820acacaagttt
gtttgctttt cgtgcatgat attaaatagc ttggcagcaa caggactagg 5880atgagtagca
gcacgttcct tatatgtagc tttcgacatg atttatcttc gtttcctgca 5940ggtttttgtt
ctgtgcagtt gggttaagaa tactgggcaa tttcatgttt cttcaacacc 6000acatatgcgt
atatatacca atctaagtct gtgctccttc cttcgttctt ccttctgctc 6060ggagattacc
gaatcaaagc tagcttatcg atgataagct gtcaaagatg agaattaatt 6120ccacggacta
tagactatac tagatactcc gtctactgta cgatacactt ccgctcaggt 6180ccttgtcctt
taacgaggcc ttaccactct tttgttactc tattgatcca gctcagcaaa 6240ggcagtgtga
tctaagattc tatcttcgcg atgtagtaaa actagctaga ccgagaaaga 6300gactagaaat
gcaaaaggca cttctacaat ggctgccatc attattatcc gatgtgacgc 6360tgcagcttct
caatgatatt cgaatacgct ttgaggagat acagcctaat atccgacaaa 6420ctgttttaca
gatttacgat cgtacttgtt acccatcatt gaattttgaa catccgaacc 6480tgggagtttt
ccctgaaaca gatagtatat ttgaacctgt ataataatat atagtctagc 6540gctttacgga
agacaatgta tgtatttcgg ttcctggaga aactattgca tctattgcat 6600aggtaatctt
gcacgtcgca tccccggttc attttctgcg tttccatctt gcacttcaat 6660agcatatctt
tgttaacgaa gcatctgtgc ttcattttgt agaacaaaaa tgcaacgcga 6720gagcgctaat
ttttcaaaca aagaatctga gctgcatttt tacagaacag aaatgcaacg 6780cgaaagcgct
attttaccaa cgaagaatct gtgcttcatt tttgtaaaac aaaaatgcaa 6840cgcgacgaga
gcgctaattt ttcaaacaaa gaatctgagc tgcattttta cagaacagaa 6900atgcaacgcg
agagcgctat tttaccaaca aagaatctat acttcttttt tgttctacaa 6960aaatgcatcc
cgagagcgct atttttctaa caaagcatct tagattactt tttttctcct 7020ttgtgcgctc
tataatgcag tctcttgata actttttgca ctgtaggtcc gttaaggtta 7080gaagaaggct
actttggtgt ctattttctc ttccataaaa aaagcctgac tccacttccc 7140gcgtttactg
attactagcg aagctgcggg tgcatttttt caagataaag gcatccccga 7200ttatattcta
taccgatgtg gattgcgcat actttgtgaa cagaaagtga tagcgttgat 7260gattcttcat
tggtcagaaa attatgaacg gtttcttcta ttttgtctct atatactacg 7320tataggaaat
gtttacattt tcgtattgtt ttcgattcac tctatgaata gttcttacta 7380caattttttt
gtctaaagag taatactaga gataaacata aaaaatgtag aggtcgagtt 7440tagatgcaag
ttcaaggagc gaaaggtgga tgggtaggtt atatagggat atagcacaga 7500gatatatagc
aaagagatac ttttgagcaa t
7531
SEQ ID NO: 84 (length 7126; DNA; Artificial Sequence; Synthetic Nucleic Acid)
gtttgtggaa
gcggtattcg caatttaatt aaagctggtg acaattaatc atcggctcgt 60ataatgtgtg
gaattgaatc gatataagga ggttaatcat atgacgcctg aagctagtgc 120ggcggcgcac
gccgctgcgg tggatcgcct catccatagc tatcgggcga ttcctgatga 180cgcgccggtg
cggctggcga agaagacgtc aaacctattc cgccacaggg aaaagacttc 240tgctcctggg
cttgacgtat ccggcctggc tcgcgtgatt gggatcgact cagacactcg 300cactgccgac
gttggcggca tgtgcacata cgaggacctt gtcgcggcga cgctcgaata 360cgatctggtc
cccctggtcg tcccgcaact caaaacgatc actctcggcg gcgcggtgac 420gggcctggga
attgagtcca cctcgttccg caatgggctt ccccatgaat ctgttctcga 480aatggatatc
ctgacgggcg ccggggaggt cgtcacggcc ggcccggaag gcccccatag 540cgatttgtac
tgggggtttc cgaattcgta cggcacgctc ggctatgcga cgcgcctgcg 600catcgaacta
gaaccggtcg agccgtacgt cgaactcagg cacctgcggt tcactagcct 660cgatgagctt
caggagacac ttgacaccgt ttcgtacgaa cacacgtatg acggggaacc 720cgttcattac
gtcgatggag tcatgttctc agccacggaa agctacctca cgcttggccg 780tcagacgagc
gaacccggcc cggtcagcga ctacaccgga aaccagatct actaccgttc 840aatacagcac
ggtggcgctg aaactcccgt cgtcgaccgg atgaccattc atgactatct 900atggcgctgg
gatactgact ggttctggtg ctcgcgtgcc ttcggaacgc aacacccagt 960ggtccggaga
ttctggccac gccgctatcg ccgcagcagc ttctactgga agctgatcgc 1020gcttgaccgc
caggttgggc tcgcggactt catcgaacaa cggaagggca acctcccccg 1080ggaacgcgta
gtccaggaca tcgaggtccc gatcgagaac actgcgagct tcttgcggtg 1140gttcttggcg
aacgtgccga tcgagccggt atggctatgc ccgctgcgcc tgcgaaaaac 1200acgcagcccc
ggcctgcctt cgccgacgtc cccggcttca cgcccatggc ccctctatcc 1260gctcgagcct
cagcgcacat acgtcaatgt tggcttctgg tcagcggtgc cggtcgtggc 1320cggccagccc
gaggggcaca ccaaccggat gatcgagaac gaagtcgatc gccttgacgg 1380tcacaaatcg
ctgtactcag atgcgtttta cgagcgaaaa gagtttgacg cgctgtacgg 1440cggcgatacc
tatagagaac tcaaagagac ctacgaccca aacagccggt tacttgatct 1500ctatgcaaag
gcggtgcaag gacgatgaag gcagtgttga cggcgtttac ggctccccaa 1560ctcgaaagga
tgaacgtcgc tgagatactc agcgcggtac tcgggcgaga tttcccgatc 1620cggttcactg
cgtacgacgg cagcgcgctc ggccccgaaa ccgcccgcta cggcttgcac 1680ctcacgacgc
cgcgcgggct gacctacctc gctaccgcgc ccggtgatct cgggctcgca 1740cgcgcgtacg
tgtccggcga cctcgaggtc agtggggttc atcagggtga cccgtacgag 1800ataatgaaga
tcctcgcgca tgacgtccgg gtgcggcggc cctcgccagc aacgatcgct 1860tcgatcatgc
ggtccctcgg ctgggaacgc ttgcgaccgg tcgcgccgcc cccgcaagag 1920aacatgcccc
gttggcgccg gatggccctt ggcctgctgc actcgaagag ccgtgatgct 1980gcggcaatcc
accatcatta cgacgtgtcg aacgagtttt acgagcacat cctcggcccg 2040tcgatgacgt
acacatgcgc ggcctacccc agcgcagaca gttccctgga ggaagcacag 2100gacaacaagt
accgactcgt cttcgagaaa cttggcctga aagccgggga tcgcctgctt 2160gacgtcgggt
gcgggtgggg cggcatggtg cggttcgccg ctaagcgcgg cgttcatgtc 2220atcggtgcga
cattgtcccg caaacaggcg gaatgggctc agaagatgat tgcccatgaa 2280ggattgggcg
atctggcgga agtccgtttc tgcgactacc gcgatgtcac agaggcgggc 2340ttcgacgcag
tgtcgtcgat cggcctcact gaacacatcg gtttggcgaa ctacccgtcg 2400tacttcggct
tcctgaagga caagttgcgg ccaggcggac gactgctgaa ccattgcatc 2460actcgcccga
acaaccttca aagcaaccgc gcaggtgact tcattgaccg gtacgttttc 2520cctgacggag
agctcgccgg acctggcttc atcatttcag ctgtccacga cgccggtttc 2580gaggtgcggc
acgaagagaa cctccgcgag cactacgcac tgacgctgcg ggactggaac 2640cgcaacctcg
ctcgcgactg ggacgcgtgt gtgcacgcct ccgacgaggg caccgcccgc 2700gtctggggac
tgtacatttc cggttcacga gtcgcgtttg aaacgaactc gattcagctg 2760caccaggtcc
tggcggtcaa aaccgcgcgg aatggcgaag cgcaggtccc gttgggtcag 2820tggtggaccc
gctgacctca aaatatattt tccctctatc ttctcgttgc gcttaatttg 2880actaattctc
attagcgagg cgcgcctttc cataggctcc gcccccctga cgagcatcac 2940aaaaatcgac
gctcaagtca gaggtggcga aacccgacag gactataaag ataccaggcg 3000tttccccctg
gaagctccct cgtgcgctct cctgttccga ccctgccgct taccggatac 3060ctgtccgcct
ttctcccttc gggaagcgtg gcgctttctc atagctcacg ctgtaggtat 3120ctcagttcgg
tgtaggtcgt tcgctccaag ctgggctgtg tgcacgaacc ccccgttcag 3180cccgaccgct
gcgccttatc cggtaactat cgtcttgagt ccaacccggt aagacacgac 3240ttatcgccac
tggcagcagc cactggtaac aggattagca gagcgaggta tgtaggcggt 3300gctacagagt
tcttgaagtg gtggcctaac tacggctaca ctagaagaac agtatttggt 3360atctgcgctc
tgctgaagcc agttaccttc ggaaaaagag ttggtagctc ttgatccggc 3420aaacaaacca
ccgctggtag cggtggtttt tttgtttgca agcagcagat tacgcgcaga 3480aaaaaaggat
ctcaagaaga tcctttgatc ttttctacgg ggtctgacgc tcagtggaac 3540gaaaactcac
gttaagggat tttggtcatg agattatcaa aaaggatctt cacctagatc 3600cttttaaatt
aaaaatgaag ttttaaatca atctaaagta tatatgagta aacttggtct 3660gacagttacc
aatgcttaat cagtgaggca cctatctcag cgatctgtct atttcgttca 3720tccatagttg
cctgactccc cgtcgtgtag ataactacga tacgggaggg cttaccatct 3780ggccccagtg
ctgcaatgat accgcgagac ccacgctcac cggctccaga tttatcagca 3840ataaaccagc
cagccggaag ggccgagcgc agaagtggtc ctgcaacttt atccgcctcc 3900atccagtcta
ttaattgttg ccgggaagct agagtaagta gttcgccagt taatagtttg 3960cgcaacgttg
ttgccattgc tacaggcatc gtggtgtcac gctcgtcgtt tggtatggct 4020tcattcagct
ccggttccca acgatcaagg cgagttacat gatcccccat gttgtgcaaa 4080aaagcggtta
gctccttcgg tcctccgatc gttgtcagaa gtaagttggc cgcagtgtta 4140tcactcatgg
ttatggcagc actgcataat tctcttactg tcatgccatc cgtaagatgc 4200ttttctgtga
ctggtgagta ctcaaccaag tcattctgag aatagtgtat gcggcgaccg 4260agttgctctt
gcccggcgtc aatacgggat aataccgcgc cacatagcag aactttaaaa 4320gtgctcatca
ttggaaaacg ttcttcgggg cgaaaactct caaggatctt accgctgttg 4380agatccagtt
cgatgtaacc cactcgtgca cccaactgat cttcagcatc ttttactttc 4440accagcgttt
ctgggtgagc aaaaacagga aggcaaaatg ccgcaaaaaa gggaataagg 4500gcgacacgga
aatgttgaat actcatactc ttcctttttc aatattattg aagcatttat 4560cagggttatt
gtctcatgag cggatacata tttgaatgta tttagaaaaa taaacagcga 4620tcgcgcggcc
gcgggtaata actgatataa ttaaattgaa gctctaattt gtgagtttag 4680tatacatgca
tttacttata atacagtttt ttagttttgc tggccgcatc ttctcaaata 4740tgcttcccag
cctgcttttc tgtaacgttc accctctacc ttagcatccc ttccctttgc 4800aaatagtcct
cttccaacaa taataatgtc agatcctgta gagaccacat catccacggt 4860tctatactgt
tgacccaatg cgtctccctt gtcatctaaa cccacaccgg gtgtcataat 4920caaccaatcg
taaccttcat ctcttccacc catgtctctt tgagcaataa agccgataac 4980aaaatctttg
tcgctcttcg caatgtcaac agtaccctta gtatattctc cagtagctag 5040ggagcccttg
catgacaatt ctgctaacat caaaaggcct ctaggttcct ttgttacttc 5100ttccgccgcc
tgcttcaaac cgctaacaat acctgggccc accacaccgt gtgcattcgt 5160aatgtctgcc
cattctgcta ttctgtatac acccgcagag tactgcaatt tgactgtatt 5220accaatgtca
gcaaattttc tgtcttcgaa gagtaaaaaa ttgtacttgg cggataatgc 5280ctttagcggc
ttaactgtgc cctccatgga aaaatcagtc aagatatcca catgtgtttt 5340tagtaaacaa
attttgggac ctaatgcttc aactaactcc agtaattcct tggtggtacg 5400aacatccaat
gaagcacaca agtttgtttg cttttcgtgc atgatattaa atagcttggc 5460agcaacagga
ctaggatgag tagcagcacg ttccttatat gtagctttcg acatgattta 5520tcttcgtttc
ctgcaggttt ttgttctgtg cagttgggtt aagaatactg ggcaatttca 5580tgtttcttca
acaccacata tgcgtatata taccaatcta agtctgtgct ccttccttcg 5640ttcttccttc
tgctcggaga ttaccgaatc aaagctagct tatcgatgat aagctgtcaa 5700agatgagaat
taattccacg gactatagac tatactagat actccgtcta ctgtacgata 5760cacttccgct
caggtccttg tcctttaacg aggccttacc actcttttgt tactctattg 5820atccagctca
gcaaaggcag tgtgatctaa gattctatct tcgcgatgta gtaaaactag 5880ctagaccgag
aaagagacta gaaatgcaaa aggcacttct acaatggctg ccatcattat 5940tatccgatgt
gacgctgcag cttctcaatg atattcgaat acgctttgag gagatacagc 6000ctaatatccg
acaaactgtt ttacagattt acgatcgtac ttgttaccca tcattgaatt 6060ttgaacatcc
gaacctggga gttttccctg aaacagatag tatatttgaa cctgtataat 6120aatatatagt
ctagcgcttt acggaagaca atgtatgtat ttcggttcct ggagaaacta 6180ttgcatctat
tgcataggta atcttgcacg tcgcatcccc ggttcatttt ctgcgtttcc 6240atcttgcact
tcaatagcat atctttgtta acgaagcatc tgtgcttcat tttgtagaac 6300aaaaatgcaa
cgcgagagcg ctaatttttc aaacaaagaa tctgagctgc atttttacag 6360aacagaaatg
caacgcgaaa gcgctatttt accaacgaag aatctgtgct tcatttttgt 6420aaaacaaaaa
tgcaacgcga cgagagcgct aatttttcaa acaaagaatc tgagctgcat 6480ttttacagaa
cagaaatgca acgcgagagc gctattttac caacaaagaa tctatacttc 6540ttttttgttc
tacaaaaatg catcccgaga gcgctatttt tctaacaaag catcttagat 6600tacttttttt
ctcctttgtg cgctctataa tgcagtctct tgataacttt ttgcactgta 6660ggtccgttaa
ggttagaaga aggctacttt ggtgtctatt ttctcttcca taaaaaaagc 6720ctgactccac
ttcccgcgtt tactgattac tagcgaagct gcgggtgcat tttttcaaga 6780taaaggcatc
cccgattata ttctataccg atgtggattg cgcatacttt gtgaacagaa 6840agtgatagcg
ttgatgattc ttcattggtc agaaaattat gaacggtttc ttctattttg 6900tctctatata
ctacgtatag gaaatgttta cattttcgta ttgttttcga ttcactctat 6960gaatagttct
tactacaatt tttttgtcta aagagtaata ctagagataa acataaaaaa 7020tgtagaggtc
gagtttagat gcaagttcaa ggagcgaaag gtggatgggt aggttatata 7080gggatatagc
acagagatat atagcaaaga gatacttttg agcaat
7126
SEQ ID NO: 85 (length 7925; DNA; Artificial Sequence; Synthetic Nucleic Acid)
gtttgtggaa
gcggtattcg caatttaatt aaagctggtg acaattaatc atcggctcgt 60ataatgtgtg
gaattgaatc gatataagga ggttaatcat gtgaccgtcg ccggcaggat 120cactgacgcg
gtacgcatag gaaatggact tgaccagcga gatctagccc ccgtcgggtg 180gtacgcacac
gaacaggccg tggcgcgact gaaggccagt ttcgacgcgg tccccgccgg 240gcgtcgcgtg
cggctggcga agaagacgtc caaccttttc cgcgggcgtt ccggcgaggc 300agtcgggctc
gacgtgtcgg ggctgcacgg cgtcatcgcc gtcgaccccg ttgagggcac 360cgctgacgtc
cagggcatgt gcacgtacga ggacctggtg gacgtcctgc tgccctacgg 420tctggcgccc
accgtcgttc cgcagctgaa gaccatcact ctcggcggtg cggtgaccgg 480catgggggtg
gaatccacct ccttccgcaa cggcctgccg cacgaagccg tcctggaaat 540ggatgtgctc
accggtaccg gagacatcct cacctgttcg ccgacccaga acaccgacct 600ctaccgcggc
ttccccaact cctacggttc cctgggatac agcgtgcggc tgaaggtgcg 660gtgcgaacgg
gtggaaccct acgtcgacct gcggcatgta cgcttcgatg acgttcagtc 720gctcaccgac
gccctcgaca acatcgtcgt ggacaaggag tacgagggtg aacgggtcga 780ctatctcgac
ggtgtggtct tcagcctgga ggagagctac ctcgtcctgg gacgggcgac 840cagcgaggcc
ggccccgtta gcgactacac ccgcgagcgc agttactacc gttctctgca 900gcatccgtcg
ggggtcctgc gcgacaagtt gaccatccgc gactacctct ggcggtggga 960cgtcgactgg
ttctggtgca accgggcctt cggtacccag aaccccacca tccgtactct 1020gtggccgcgg
gatctcctgc ggtcgagctt ctactggaag atcatcggct gggaccgacg 1080cttcgacatc
gcggaccgga tcgaggcaca caacgggcgc cccgcacgcg agcgcgtcgt 1140ccaggacatc
gaggtcaccc ccgacaacct gccggagttc ctcacgtggt tcttcaccca 1200ctgcgagatc
gagccggtgt ggctgtgccc cattcgactg gccgacgact cgggcgagcg 1260gacaccgtgg
cccctgtacc cgctgtcacc cggcgacacc tgggtcaacg tgggattctg 1320gagctcggtg
cccgccgacc tgatggggaa ggacgccccg accggagcct tcaaccggga 1380ggtggagaga
gtcgtctcgg acctcggcgg acacaagtcg ttgtactccg aggcattcta 1440ttctgaggaa
cagttcgccg ccctctacgg cggtgaacgt cccgcacaac tcaaggcggt 1500cttcgacccg
gatgaccggt tccccgggtt gtacgagaag accgtgggcg gcgtctgacg 1560acacgcacga
cgacgcacac cgagcacgat gacgcacgac aagcacgatg acgcatgatg 1620accaagagga
gagagatgag caggggattc acgccgctga cggtgggaca gatcgtggac 1680aaggtcatca
caccgccggc accgttccgg gtgaccgctt tcgacggatc caccgcgggg 1740ccggcagacg
cggaactggc actggagatc acatcgccgg acgccctggc ctatatcgtg 1800accgcgccgg
gcgacctcgg actggcacgc gcctacatca ccggaagcct ccgcgtcacc 1860ggtgacgagc
ccggccaccc gtacctcgtc tttgaccacc tccagcacct ttacgaccag 1920atccgacgcc
cctcggcgaa ggacctgctg gatatcgccc gctcgctgaa ggccatgggg 1980gcgatcaagg
tgcagccggc accggagcag gagaccctcc cgggctggaa gagggccata 2040ctcgaggggc
tgtcccggca ctctccggaa cgggacaagg aggtcgtgag ccgccactac 2100gacgtgggca
atgacttcta cgagctcttc ctcggcgatt ccatggccta cacctgtgcc 2160tactatcccg
agtttgacgg tgagaaccag gtcaccggtc ccaccggcgg gtggcggtac 2220gacgactggg
agaaagggcc gaccgccaac gggccgttga cccaggcgca ggacaacaag 2280catcgcctgg
tcttcgacaa gctgcgactc aacccgggtg accggttgtt ggacgtcggc 2340tgcgggtggg
gcggtatggt gcggtacgcc gcccgccacg gcgtgaaggc catcggtgtc 2400acgctgtccc
gagagcagta cgagtggggt aaggcgaaga tcgaggagga gggtctgcag 2460gacctcgccg
aggtccggtg tatggactac cgtgacgtgc cggagtccga cttcgacgcg 2520gtcagtgcca
tcggcatcct cgagcacatc ggcgtgccca actacgagga ctacttcacc 2580cgcctgttcg
ccaagctgcg cccgggcggt cggatgctga accactgcat cacccgtccg 2640cacaaccgga
agacgaagac cggccagttc atcgaccgct acatcttccc cgacggtgag 2700ctgaccggct
cgggccggat catcacgatc atgcaggaca ccggattcga cgtcgtccac 2760gaggagaatc
tgcgaccgca ctaccagcgc acgttgcatg actggtgtga actgttggcc 2820accaactggg
accaggccgt ccatctcgtg ggcgaggaga cggctcgtct gttcggcctg 2880tacatggcgg
ggtcggaatg gggtttcgaa cacaacgtga tccagctcca ccaggttctc 2940ggcgtgaagc
cggacgcggc aggcagttcc ggggtgccgg tccgccagtg gtggaggtcc 3000tgacggtaac
gtcgggacga tgagacggat caccagaggc gctgcggtgg cggtgctgtg 3060cacaccgttg
ctgctcggag cctgcaccat cggcgacgcg ggaccggggg acgagaccac 3120ggaccctgtc
gtggacactg aagcaccgcc cgataaaccg gtgccggact ctgcggcgga 3180atccggcgct
gaagacggac ctgattctga ggtgccggac gaccccgacc agcctgatgc 3240tgagccggtg
gagactgatc ccgacgcccc gggggcccgg ggactggcga tcggtgactg 3300cgtcgccgac
atggaccagc tcgacggcac cggcgacatc gacgtcgtcg actgcgccgg 3360cccccatgcc
ggcgaggtgt acgcacaggc ggatatcgca ggtaagaacc tgttccccgg 3420caacgagccg
ttggggcagg aggcgggagc gatctgcggg ggtgactcct tcaccggcta 3480tgtcggcatc
ggattccccg agtcctcgct ggacgtcgtc acgatgatgc cgtccaagga 3540gagctgggcg
caggaggacc ggacggtgac ctgtgtggtc accgacccga acctcgagca 3600gatcgccggc
acgctcgagc agagctggcg ttagcctcaa aatatatttt ccctctatct 3660tctcgttgcg
cttaatttga ctaattctca ttagcgaggc gcgcctttcc ataggctccg 3720cccccctgac
gagcatcaca aaaatcgacg ctcaagtcag aggtggcgaa acccgacagg 3780actataaaga
taccaggcgt ttccccctgg aagctccctc gtgcgctctc ctgttccgac 3840cctgccgctt
accggatacc tgtccgcctt tctcccttcg ggaagcgtgg cgctttctca 3900tagctcacgc
tgtaggtatc tcagttcggt gtaggtcgtt cgctccaagc tgggctgtgt 3960gcacgaaccc
cccgttcagc ccgaccgctg cgccttatcc ggtaactatc gtcttgagtc 4020caacccggta
agacacgact tatcgccact ggcagcagcc actggtaaca ggattagcag 4080agcgaggtat
gtaggcggtg ctacagagtt cttgaagtgg tggcctaact acggctacac 4140tagaagaaca
gtatttggta tctgcgctct gctgaagcca gttaccttcg gaaaaagagt 4200tggtagctct
tgatccggca aacaaaccac cgctggtagc ggtggttttt ttgtttgcaa 4260gcagcagatt
acgcgcagaa aaaaaggatc tcaagaagat cctttgatct tttctacggg 4320gtctgacgct
cagtggaacg aaaactcacg ttaagggatt ttggtcatga gattatcaaa 4380aaggatcttc
acctagatcc ttttaaatta aaaatgaagt tttaaatcaa tctaaagtat 4440atatgagtaa
acttggtctg acagttacca atgcttaatc agtgaggcac ctatctcagc 4500gatctgtcta
tttcgttcat ccatagttgc ctgactcccc gtcgtgtaga taactacgat 4560acgggagggc
ttaccatctg gccccagtgc tgcaatgata ccgcgagacc cacgctcacc 4620ggctccagat
ttatcagcaa taaaccagcc agccggaagg gccgagcgca gaagtggtcc 4680tgcaacttta
tccgcctcca tccagtctat taattgttgc cgggaagcta gagtaagtag 4740ttcgccagtt
aatagtttgc gcaacgttgt tgccattgct acaggcatcg tggtgtcacg 4800ctcgtcgttt
ggtatggctt cattcagctc cggttcccaa cgatcaaggc gagttacatg 4860atcccccatg
ttgtgcaaaa aagcggttag ctccttcggt cctccgatcg ttgtcagaag 4920taagttggcc
gcagtgttat cactcatggt tatggcagca ctgcataatt ctcttactgt 4980catgccatcc
gtaagatgct tttctgtgac tggtgagtac tcaaccaagt cattctgaga 5040atagtgtatg
cggcgaccga gttgctcttg cccggcgtca atacgggata ataccgcgcc 5100acatagcaga
actttaaaag tgctcatcat tggaaaacgt tcttcggggc gaaaactctc 5160aaggatctta
ccgctgttga gatccagttc gatgtaaccc actcgtgcac ccaactgatc 5220ttcagcatct
tttactttca ccagcgtttc tgggtgagca aaaacaggaa ggcaaaatgc 5280cgcaaaaaag
ggaataaggg cgacacggaa atgttgaata ctcatactct tcctttttca 5340atattattga
agcatttatc agggttattg tctcatgagc ggatacatat ttgaatgtat 5400ttagaaaaat
aaacagcgat cgcgcggccg cgggtaataa ctgatataat taaattgaag 5460ctctaatttg
tgagtttagt atacatgcat ttacttataa tacagttttt tagttttgct 5520ggccgcatct
tctcaaatat gcttcccagc ctgcttttct gtaacgttca ccctctacct 5580tagcatccct
tccctttgca aatagtcctc ttccaacaat aataatgtca gatcctgtag 5640agaccacatc
atccacggtt ctatactgtt gacccaatgc gtctcccttg tcatctaaac 5700ccacaccggg
tgtcataatc aaccaatcgt aaccttcatc tcttccaccc atgtctcttt 5760gagcaataaa
gccgataaca aaatctttgt cgctcttcgc aatgtcaaca gtacccttag 5820tatattctcc
agtagctagg gagcccttgc atgacaattc tgctaacatc aaaaggcctc 5880taggttcctt
tgttacttct tccgccgcct gcttcaaacc gctaacaata cctgggccca 5940ccacaccgtg
tgcattcgta atgtctgccc attctgctat tctgtataca cccgcagagt 6000actgcaattt
gactgtatta ccaatgtcag caaattttct gtcttcgaag agtaaaaaat 6060tgtacttggc
ggataatgcc tttagcggct taactgtgcc ctccatggaa aaatcagtca 6120agatatccac
atgtgttttt agtaaacaaa ttttgggacc taatgcttca actaactcca 6180gtaattcctt
ggtggtacga acatccaatg aagcacacaa gtttgtttgc ttttcgtgca 6240tgatattaaa
tagcttggca gcaacaggac taggatgagt agcagcacgt tccttatatg 6300tagctttcga
catgatttat cttcgtttcc tgcaggtttt tgttctgtgc agttgggtta 6360agaatactgg
gcaatttcat gtttcttcaa caccacatat gcgtatatat accaatctaa 6420gtctgtgctc
cttccttcgt tcttccttct gctcggagat taccgaatca aagctagctt 6480atcgatgata
agctgtcaaa gatgagaatt aattccacgg actatagact atactagata 6540ctccgtctac
tgtacgatac acttccgctc aggtccttgt cctttaacga ggccttacca 6600ctcttttgtt
actctattga tccagctcag caaaggcagt gtgatctaag attctatctt 6660cgcgatgtag
taaaactagc tagaccgaga aagagactag aaatgcaaaa ggcacttcta 6720caatggctgc
catcattatt atccgatgtg acgctgcagc ttctcaatga tattcgaata 6780cgctttgagg
agatacagcc taatatccga caaactgttt tacagattta cgatcgtact 6840tgttacccat
cattgaattt tgaacatccg aacctgggag ttttccctga aacagatagt 6900atatttgaac
ctgtataata atatatagtc tagcgcttta cggaagacaa tgtatgtatt 6960tcggttcctg
gagaaactat tgcatctatt gcataggtaa tcttgcacgt cgcatccccg 7020gttcattttc
tgcgtttcca tcttgcactt caatagcata tctttgttaa cgaagcatct 7080gtgcttcatt
ttgtagaaca aaaatgcaac gcgagagcgc taatttttca aacaaagaat 7140ctgagctgca
tttttacaga acagaaatgc aacgcgaaag cgctatttta ccaacgaaga 7200atctgtgctt
catttttgta aaacaaaaat gcaacgcgac gagagcgcta atttttcaaa 7260caaagaatct
gagctgcatt tttacagaac agaaatgcaa cgcgagagcg ctattttacc 7320aacaaagaat
ctatacttct tttttgttct acaaaaatgc atcccgagag cgctattttt 7380ctaacaaagc
atcttagatt actttttttc tcctttgtgc gctctataat gcagtctctt 7440gataactttt
tgcactgtag gtccgttaag gttagaagaa ggctactttg gtgtctattt 7500tctcttccat
aaaaaaagcc tgactccact tcccgcgttt actgattact agcgaagctg 7560cgggtgcatt
ttttcaagat aaaggcatcc ccgattatat tctataccga tgtggattgc 7620gcatactttg
tgaacagaaa gtgatagcgt tgatgattct tcattggtca gaaaattatg 7680aacggtttct
tctattttgt ctctatatac tacgtatagg aaatgtttac attttcgtat 7740tgttttcgat
tcactctatg aatagttctt actacaattt ttttgtctaa agagtaatac 7800tagagataaa
cataaaaaat gtagaggtcg agtttagatg caagttcaag gagcgaaagg 7860tggatgggta
ggttatatag ggatatagca cagagatata tagcaaagag atacttttga 7920gcaat
7925

SEQ ID NO: 86
Length: 7141
Type: DNA
Organism: Artificial Sequence
Other information: Synthetic Nucleic Acid
Sequence 86:
gtttgtggaa
gcggtattcg caatttaatt aaagctggtg acaattaatc atcggctcgt 60ataatgtgtg
gaattgaatc gatataagga ggttaatcat atgcgggagg gtggacgccc 120cttccgtgcg
catcgcactc tgcccgtcac cgggatcgac gctcaccgcg ccggcgtcga 180acggcttctc
gcgtcctacc gcgcgattcc cacggacgcc accgtgcgac tcgcgaagaa 240gacgtccaac
ctgttccggg cgcgggccca gaccagcgca cccggcctcg acgtctccgg 300gctcggcgga
gtcatctcgg tcgacgagca ggaccggacc gcggatgtcg ccggaatgtg 360cacgtacgaa
gacctggtgg acgccaccct cccgtacggg ctggcgccgc tggtggttcc 420gcaactcaag
accatcacac tcggcggcgc ggtcaccggc ctcggcatcg agtcgacgtc 480gttccgcaac
gggctccccc acgaatcggt cctcgagatc gacgtcctga ccggaagcgg 540cgacatcgtc
accgcgagac cggaaggcga gaactccgac ctgttctggg ggttccccaa 600ctcctacgga
accctcggct actccacccg actgcgcatc cagctcgaac ccgtcaaacg 660gtatgtggca
ctgcgccatc tgcgtttcga ctccctggac gagctgcagt cggcaatgga 720tcgcatcgtc
accgagcgcg tccacgacgg catccccgtc gactatctgg acggcgtcgt 780gttcaccgcg
tccgagagtt acctgacact gggccatcag accgacgagg gcggccccgt 840cagcgactac
accgggcaga acatcttcta ccggtccatc cagcacagtt ccgtgaacca 900ccccaaaacg
gacaaactca ccatccgaga ctacctgtgg cgctgggaca ccgactggtt 960ctggtgctcg
cgcgccttcg gcgcccagaa ccccaccatc cgccggctgt ggccgaagaa 1020cctcctccgc
agcagcttct actggaagct catcgccctc gaccacaagt acgacatcgg 1080cgaccgactc
gagaagcgca agggcaaccc gccacgcgaa cgcgtcgtgc aggacgtcga 1140agtgcccatc
gagcgcaccg cggacttcgt ccgctggttc ctcgacgaaa tcccgatcga 1200accgctgtgg
ctgtgcccgt tgcggttgcg ggaacctgcc cccgccggcg cgtcctcgca 1260acgcccctgg
cccctgtacc ccctcgaacc gaaacgcacg tacgtgaaca tcggattctg 1320gtcatcggtg
cccatcgttc cgggccgacc cgagggggcc gcgaatcggc tgatcgaaga 1380caaggtcagt
gacttcgacg gacacaagtc cctctactcc gattcgtact attcacgcga 1440agatttcgaa
cgcctctact acggcggcga tcgatacacg gaactgaaaa aacgctacga 1500cccgaaatca
cgattactgg accttttctc caaggcggtg caacgtcgat gacaactctg 1560aaagcttcac
gctcccagga ccacaagctg accatcgcag agattctcga aactctgtcc 1620gacggcatgc
tccccctgcg gttctccgcc tacgacggca gcgccgccgg cccggaggac 1680gccccctacg
gtctccacct caagacgacc cgaggcacca cctacctggc gaccgccccc 1740ggcgacctcg
gcatggcccg ggcctacgtg tccggcgacc tcgaggcccg cggcgtccac 1800cccggcgacc
cgtacgagat cctccgcgtg atgggcgacg aactgcactt ccgccgtccg 1860tccgcgctca
cgctcgccgc catcacgcgc tcgctcggct gggatctgct gcgccccatc 1920gcccctcccc
cgcaggagca tctcccgcgg tggcgtcgag tcgcggaagg gttgcggcac 1980tccaagtccc
gcgacgccga ggtcatccac caccactacg acgtctcgaa caccttctac 2040gagtatgtcc
tcggcccgtc catgacgtac acgtgcgcct gctacgagaa cgccgagcag 2100accctcgaag
aggcacagga caacaagtac cgcctcgtct tcgagaagct cggcctccag 2160cccggcgacc
gactgctcga catcggttgc ggctggggat cgatggtccg gtacgccgcc 2220cgccgcggcg
tcaaggtcat cggcgccacc ctgtcccgag agcaggccga atgggcacag 2280aaggccatcg
ccgaagaagg actgtccgac ctcgccgagg tccggttctc cgactaccgt 2340gacgtccccg
agaccggatt cgacgccatc tcctcgatcg gcctgaccga gcacatcggc 2400gtcggcaact
accccgccta cttcggactg ctgcagagca agctccgcga gggcggccgg 2460ctgctgaacc
actgcatcac ccggcccgac aaccagagtc aggcacgcgc gggcggcttc 2520atcgaccggt
acgtcttccc cgacggcgaa ctcaccggct ccggacgcat catcaccgag 2580atccagaacg
tcggactcga ggtgcggcac gaggagaatc tgcgcgagca ctacgcactc 2640accctcgccg
gctggtgcca gaacctcgtc gacaactggg acgcctgcgt cgccgaggtc 2700ggcgaaggca
ccgcacgtgt gtggggtctc tacatggccg ggtcgcgact gggcttcgaa 2760cgcaacgtcg
ttcagctgca ccaggtcctc gccgtcaagc tcggacccaa gggcgaggcg 2820catgtgccgc
tgcgtccgtg gtggaagtag cctcaaaata tattttccct ctatcttctc 2880gttgcgctta
atttgactaa ttctcattag cgaggcgcgc ctttccatag gctccgcccc 2940cctgacgagc
atcacaaaaa tcgacgctca agtcagaggt ggcgaaaccc gacaggacta 3000taaagatacc
aggcgtttcc ccctggaagc tccctcgtgc gctctcctgt tccgaccctg 3060ccgcttaccg
gatacctgtc cgcctttctc ccttcgggaa gcgtggcgct ttctcatagc 3120tcacgctgta
ggtatctcag ttcggtgtag gtcgttcgct ccaagctggg ctgtgtgcac 3180gaaccccccg
ttcagcccga ccgctgcgcc ttatccggta actatcgtct tgagtccaac 3240ccggtaagac
acgacttatc gccactggca gcagccactg gtaacaggat tagcagagcg 3300aggtatgtag
gcggtgctac agagttcttg aagtggtggc ctaactacgg ctacactaga 3360agaacagtat
ttggtatctg cgctctgctg aagccagtta ccttcggaaa aagagttggt 3420agctcttgat
ccggcaaaca aaccaccgct ggtagcggtg gtttttttgt ttgcaagcag 3480cagattacgc
gcagaaaaaa aggatctcaa gaagatcctt tgatcttttc tacggggtct 3540gacgctcagt
ggaacgaaaa ctcacgttaa gggattttgg tcatgagatt atcaaaaagg 3600atcttcacct
agatcctttt aaattaaaaa tgaagtttta aatcaatcta aagtatatat 3660gagtaaactt
ggtctgacag ttaccaatgc ttaatcagtg aggcacctat ctcagcgatc 3720tgtctatttc
gttcatccat agttgcctga ctccccgtcg tgtagataac tacgatacgg 3780gagggcttac
catctggccc cagtgctgca atgataccgc gagacccacg ctcaccggct 3840ccagatttat
cagcaataaa ccagccagcc ggaagggccg agcgcagaag tggtcctgca 3900actttatccg
cctccatcca gtctattaat tgttgccggg aagctagagt aagtagttcg 3960ccagttaata
gtttgcgcaa cgttgttgcc attgctacag gcatcgtggt gtcacgctcg 4020tcgtttggta
tggcttcatt cagctccggt tcccaacgat caaggcgagt tacatgatcc 4080cccatgttgt
gcaaaaaagc ggttagctcc ttcggtcctc cgatcgttgt cagaagtaag 4140ttggccgcag
tgttatcact catggttatg gcagcactgc ataattctct tactgtcatg 4200ccatccgtaa
gatgcttttc tgtgactggt gagtactcaa ccaagtcatt ctgagaatag 4260tgtatgcggc
gaccgagttg ctcttgcccg gcgtcaatac gggataatac cgcgccacat 4320agcagaactt
taaaagtgct catcattgga aaacgttctt cggggcgaaa actctcaagg 4380atcttaccgc
tgttgagatc cagttcgatg taacccactc gtgcacccaa ctgatcttca 4440gcatctttta
ctttcaccag cgtttctggg tgagcaaaaa caggaaggca aaatgccgca 4500aaaaagggaa
taagggcgac acggaaatgt tgaatactca tactcttcct ttttcaatat 4560tattgaagca
tttatcaggg ttattgtctc atgagcggat acatatttga atgtatttag 4620aaaaataaac
agcgatcgcg cggccgcggg taataactga tataattaaa ttgaagctct 4680aatttgtgag
tttagtatac atgcatttac ttataataca gttttttagt tttgctggcc 4740gcatcttctc
aaatatgctt cccagcctgc ttttctgtaa cgttcaccct ctaccttagc 4800atcccttccc
tttgcaaata gtcctcttcc aacaataata atgtcagatc ctgtagagac 4860cacatcatcc
acggttctat actgttgacc caatgcgtct cccttgtcat ctaaacccac 4920accgggtgtc
ataatcaacc aatcgtaacc ttcatctctt ccacccatgt ctctttgagc 4980aataaagccg
ataacaaaat ctttgtcgct cttcgcaatg tcaacagtac ccttagtata 5040ttctccagta
gctagggagc ccttgcatga caattctgct aacatcaaaa ggcctctagg 5100ttcctttgtt
acttcttccg ccgcctgctt caaaccgcta acaatacctg ggcccaccac 5160accgtgtgca
ttcgtaatgt ctgcccattc tgctattctg tatacacccg cagagtactg 5220caatttgact
gtattaccaa tgtcagcaaa ttttctgtct tcgaagagta aaaaattgta 5280cttggcggat
aatgccttta gcggcttaac tgtgccctcc atggaaaaat cagtcaagat 5340atccacatgt
gtttttagta aacaaatttt gggacctaat gcttcaacta actccagtaa 5400ttccttggtg
gtacgaacat ccaatgaagc acacaagttt gtttgctttt cgtgcatgat 5460attaaatagc
ttggcagcaa caggactagg atgagtagca gcacgttcct tatatgtagc 5520tttcgacatg
atttatcttc gtttcctgca ggtttttgtt ctgtgcagtt gggttaagaa 5580tactgggcaa
tttcatgttt cttcaacacc acatatgcgt atatatacca atctaagtct 5640gtgctccttc
cttcgttctt ccttctgctc ggagattacc gaatcaaagc tagcttatcg 5700atgataagct
gtcaaagatg agaattaatt ccacggacta tagactatac tagatactcc 5760gtctactgta
cgatacactt ccgctcaggt ccttgtcctt taacgaggcc ttaccactct 5820tttgttactc
tattgatcca gctcagcaaa ggcagtgtga tctaagattc tatcttcgcg 5880atgtagtaaa
actagctaga ccgagaaaga gactagaaat gcaaaaggca cttctacaat 5940ggctgccatc
attattatcc gatgtgacgc tgcagcttct caatgatatt cgaatacgct 6000ttgaggagat
acagcctaat atccgacaaa ctgttttaca gatttacgat cgtacttgtt 6060acccatcatt
gaattttgaa catccgaacc tgggagtttt ccctgaaaca gatagtatat 6120ttgaacctgt
ataataatat atagtctagc gctttacgga agacaatgta tgtatttcgg 6180ttcctggaga
aactattgca tctattgcat aggtaatctt gcacgtcgca tccccggttc 6240attttctgcg
tttccatctt gcacttcaat agcatatctt tgttaacgaa gcatctgtgc 6300ttcattttgt
agaacaaaaa tgcaacgcga gagcgctaat ttttcaaaca aagaatctga 6360gctgcatttt
tacagaacag aaatgcaacg cgaaagcgct attttaccaa cgaagaatct 6420gtgcttcatt
tttgtaaaac aaaaatgcaa cgcgacgaga gcgctaattt ttcaaacaaa 6480gaatctgagc
tgcattttta cagaacagaa atgcaacgcg agagcgctat tttaccaaca 6540aagaatctat
acttcttttt tgttctacaa aaatgcatcc cgagagcgct atttttctaa 6600caaagcatct
tagattactt tttttctcct ttgtgcgctc tataatgcag tctcttgata 6660actttttgca
ctgtaggtcc gttaaggtta gaagaaggct actttggtgt ctattttctc 6720ttccataaaa
aaagcctgac tccacttccc gcgtttactg attactagcg aagctgcggg 6780tgcatttttt
caagataaag gcatccccga ttatattcta taccgatgtg gattgcgcat 6840actttgtgaa
cagaaagtga tagcgttgat gattcttcat tggtcagaaa attatgaacg 6900gtttcttcta
ttttgtctct atatactacg tataggaaat gtttacattt tcgtattgtt 6960ttcgattcac
tctatgaata gttcttacta caattttttt gtctaaagag taatactaga 7020gataaacata
aaaaatgtag aggtcgagtt tagatgcaag ttcaaggagc gaaaggtgga 7080tgggtaggtt
atatagggat atagcacaga gatatatagc aaagagatac ttttgagcaa 7140t
7141

SEQ ID NO: 87
Length: 7588
Type: DNA
Organism: Artificial Sequence
Other information: Synthetic Nucleic Acid
Sequence 87:
gtttgtggaa
gcggtattcg caatttaatt aaagctggtg acaattaatc atcggctcgt 60ataatgtgtg
gaattgaatc gatataagga ggttaatcat gtgaactgtc agtcttccgc 120gtccaacctc
gccaaccaca tcaacgcggt gtacgagctg cgccgcgcct atgcgcggct 180gtccgccgac
aagccggtgc gcctggcgaa gaccacctcc aacctcttcc gcttccgcag 240ccgggacgat
gccgcgcgtc tcgacgtcag cgctttcacc tcggtgatca gcatcgacac 300ggaggcgcgg
gtcgcggagg tgggcggcat gaccacctac gaggacctgg tcgccgccac 360cctgcggcat
ggcctgatgc cgccggtggt tccgcaactg cgcacgatca ccctgggcgg 420tgcggtcacc
gggctgggga tcgaatcctc gtccttccgc aacgggctcc cgcacgagtc 480agtggaagag
atggagatcc tcaccggcag cggccaggtg gtggtggccc ggcgcgacaa 540cgagcaccgc
gacctgttct acggtttccc caactcgtac ggcaccctcg gttacgcgct 600gcggctccgc
atccagctcg aaccggtccg cccctacgtc cacctgcggc acctgcggtt 660caccgatgcc
gcagcggcca tggccgcgct ggagcagatc tgcgcggacc gcacccacga 720cggggagacc
gtcgacttcg tcgacggcgt cgtgttcgcc cgcaacgagc tgtacctgac 780cttggggacg
ttcaccgacc gggctccgtg gaccagcgac tacaccggaa ccgacatcta 840ctaccggtcg
atcccccgct acgcgggccc cggccccggc gactacctca ccacgcacga 900ctacctgtgg
cggtgggaca ccgactggtt ctggtgctcc cgcgccttcg gactgcagca 960tcccgtggtg
cgccgcctgt ggccgcgttc cttgaaacgc tccgacgtct accgcaagct 1020cgtcgcctgg
gaccggcgca ctgacgcgag ccgcctgctc gactactacc gcgggcgccc 1080gcccaaggaa
ccggtgatcc aggacatcga ggttgaggtg gggcgggctg ccgagttcct 1140cgacttcttc
cacaccgaga tcggcatgtc cccggtgtgg ctgtgcccgc tgcggctgcg 1200agaagacaca
gccgacgata cggaaccggt ctggccgctc taccccctca aaccccgccg 1260cctctacgtc
aacttcgggt tttggggcct cgttccgatc cgtcccggtg gaggcaggac 1320ataccacaac
cggctgatcg aaaaagaagt gacccggttg ggcgggcaca agtcgctcta 1380ctcggacgcc
ttctacgacg aggacgagtt ctgggagctc tacaacgggg agatctaccg 1440caagctcaaa
gctgcctacg accccgacgg tcgactgctc gacctgtaca ccaagtgcgt 1500cggcggcggg
tgagaaagga tgagggatgc gactggcgga ggtattcgaa cgtgtcgtcg 1560gacccgatgc
gcccgtccac ttccgggcct acgacggcag cactgcggga gatccacgca 1620gtgaagtcgc
tatcgtggtt cgccacccgg cagccgtcaa ctacatcgtc caagcgccgg 1680gagcactcgg
tttgacccgc gcctacgtgg cgggatacct cgacgtcgaa ggggacatgt 1740acaccgcgct
gcgggcaatg gccgacgtgg tgttccagga ccggccgcgg ctgtcccccg 1800gggaactgct
gcggatcatc cgcgggatcg ggtgggtgaa gttcgtcaac cggcttccac 1860cgccgccgca
ggaggtgcgc cagtcccgcc tcgccgccct gggctggcgc cactccaagc 1920agcgcgacgc
cgaagccatc cagcaccact acgacgtctc caacgccttc tacgccctgg 1980tcttgggcga
gtcgatgacc tacacctgcg cggtctaccc gaccgagcag gccacgctgg 2040agcaggcaca
gttcttcaag cacgagctga tcgcccgcaa gctcggtctt gcccctggga 2100tacgactgct
ggatgtgggg tgcggctggg gcggcatggt catccacgcg gcccgggagc 2160acggggtcaa
agccctgggg gtgaccctgt ccaaagagca ggctgagtgg gcgcagaagc 2220ggatcgccca
cgagggcctg ggcgacctgg cagaagtccg gcacatggac taccgggacc 2280tgcccgacgg
cgagtacgac gcgatcagct cgatcgggtt gaccgagcac gtcggcaaaa 2340agaacgtgcc
cgcctacttc gcgtcgctgt accgcaagct cgtcccggga ggccgcctgc 2400tcaaccactg
catcacccgg ccccgcaacg acctgccgcc cttcaaacgc ggcggggtga 2460tcaaccgcta
cgtcttcccc gatggggagc tggaagggcc cggctggctg caggcggcga 2520tgaacgacgc
cgggttcgaa atccgccacc aggagaacct gcgggagcac tacgcacgga 2580ccctgcggga
ctggctggcc aacctggacc gcaactggga tgccgcggtg cgggaagtgg 2640gggagggcac
ggcccgagtg tggcggctct acatggccgg gtgcgtgctc ggcttcgaac 2700gcaacgtggt
gcaactgcac cagatcctcg gggtgaagct cgacgggacc gaggcgcgga 2760tgccgctgcg
ccccgacttc gaaccgccgc tgccttaacc gcggtgcaca gccgggggat 2820atcagtcgcg
gaaccgggca tgatgagccc atggctgcga ccgatgacga ccggcaccac 2880accaccgtcg
ccctcgacct catcgacgcg tatgtgcgcg ccgaccgcag aatgatcggt 2940gaacgttccg
cggggatcag cgcggaggcg ggggagcgga tcgtctccac cctgaaagtg 3000tgcgcggcct
tccttgcccg ccgggtccag gagaccgggg tgccgtggcg cgcagcggac 3060tcccgggaag
cggtcgcccg caccgtcgcc gacctgctgg aacccgaggt ggaattcgcg 3120gtcgtctccg
cctgggaggc gtacgcgatc ggggagcacg aggccgcctg ggtccgggcg 3180cacggcgatc
cgctggtctt cgtccacatg ctggccgcgt tctccgctgc tatcggcaca 3240gcggtctacg
gccgtgagga gctgctgccc acgctgcgca gggtgacagc acgataacct 3300caaaatatat
tttccctcta tcttctcgtt gcgcttaatt tgactaattc tcattagcga 3360ggcgcgcctt
tccataggct ccgcccccct gacgagcatc acaaaaatcg acgctcaagt 3420cagaggtggc
gaaacccgac aggactataa agataccagg cgtttccccc tggaagctcc 3480ctcgtgcgct
ctcctgttcc gaccctgccg cttaccggat acctgtccgc ctttctccct 3540tcgggaagcg
tggcgctttc tcatagctca cgctgtaggt atctcagttc ggtgtaggtc 3600gttcgctcca
agctgggctg tgtgcacgaa ccccccgttc agcccgaccg ctgcgcctta 3660tccggtaact
atcgtcttga gtccaacccg gtaagacacg acttatcgcc actggcagca 3720gccactggta
acaggattag cagagcgagg tatgtaggcg gtgctacaga gttcttgaag 3780tggtggccta
actacggcta cactagaaga acagtatttg gtatctgcgc tctgctgaag 3840ccagttacct
tcggaaaaag agttggtagc tcttgatccg gcaaacaaac caccgctggt 3900agcggtggtt
tttttgtttg caagcagcag attacgcgca gaaaaaaagg atctcaagaa 3960gatcctttga
tcttttctac ggggtctgac gctcagtgga acgaaaactc acgttaaggg 4020attttggtca
tgagattatc aaaaaggatc ttcacctaga tccttttaaa ttaaaaatga 4080agttttaaat
caatctaaag tatatatgag taaacttggt ctgacagtta ccaatgctta 4140atcagtgagg
cacctatctc agcgatctgt ctatttcgtt catccatagt tgcctgactc 4200cccgtcgtgt
agataactac gatacgggag ggcttaccat ctggccccag tgctgcaatg 4260ataccgcgag
acccacgctc accggctcca gatttatcag caataaacca gccagccgga 4320agggccgagc
gcagaagtgg tcctgcaact ttatccgcct ccatccagtc tattaattgt 4380tgccgggaag
ctagagtaag tagttcgcca gttaatagtt tgcgcaacgt tgttgccatt 4440gctacaggca
tcgtggtgtc acgctcgtcg tttggtatgg cttcattcag ctccggttcc 4500caacgatcaa
ggcgagttac atgatccccc atgttgtgca aaaaagcggt tagctccttc 4560ggtcctccga
tcgttgtcag aagtaagttg gccgcagtgt tatcactcat ggttatggca 4620gcactgcata
attctcttac tgtcatgcca tccgtaagat gcttttctgt gactggtgag 4680tactcaacca
agtcattctg agaatagtgt atgcggcgac cgagttgctc ttgcccggcg 4740tcaatacggg
ataataccgc gccacatagc agaactttaa aagtgctcat cattggaaaa 4800cgttcttcgg
ggcgaaaact ctcaaggatc ttaccgctgt tgagatccag ttcgatgtaa 4860cccactcgtg
cacccaactg atcttcagca tcttttactt tcaccagcgt ttctgggtga 4920gcaaaaacag
gaaggcaaaa tgccgcaaaa aagggaataa gggcgacacg gaaatgttga 4980atactcatac
tcttcctttt tcaatattat tgaagcattt atcagggtta ttgtctcatg 5040agcggataca
tatttgaatg tatttagaaa aataaacagc gatcgcgcgg ccgcgggtaa 5100taactgatat
aattaaattg aagctctaat ttgtgagttt agtatacatg catttactta 5160taatacagtt
ttttagtttt gctggccgca tcttctcaaa tatgcttccc agcctgcttt 5220tctgtaacgt
tcaccctcta ccttagcatc ccttcccttt gcaaatagtc ctcttccaac 5280aataataatg
tcagatcctg tagagaccac atcatccacg gttctatact gttgacccaa 5340tgcgtctccc
ttgtcatcta aacccacacc gggtgtcata atcaaccaat cgtaaccttc 5400atctcttcca
cccatgtctc tttgagcaat aaagccgata acaaaatctt tgtcgctctt 5460cgcaatgtca
acagtaccct tagtatattc tccagtagct agggagccct tgcatgacaa 5520ttctgctaac
atcaaaaggc ctctaggttc ctttgttact tcttccgccg cctgcttcaa 5580accgctaaca
atacctgggc ccaccacacc gtgtgcattc gtaatgtctg cccattctgc 5640tattctgtat
acacccgcag agtactgcaa tttgactgta ttaccaatgt cagcaaattt 5700tctgtcttcg
aagagtaaaa aattgtactt ggcggataat gcctttagcg gcttaactgt 5760gccctccatg
gaaaaatcag tcaagatatc cacatgtgtt tttagtaaac aaattttggg 5820acctaatgct
tcaactaact ccagtaattc cttggtggta cgaacatcca atgaagcaca 5880caagtttgtt
tgcttttcgt gcatgatatt aaatagcttg gcagcaacag gactaggatg 5940agtagcagca
cgttccttat atgtagcttt cgacatgatt tatcttcgtt tcctgcaggt 6000ttttgttctg
tgcagttggg ttaagaatac tgggcaattt catgtttctt caacaccaca 6060tatgcgtata
tataccaatc taagtctgtg ctccttcctt cgttcttcct tctgctcgga 6120gattaccgaa
tcaaagctag cttatcgatg ataagctgtc aaagatgaga attaattcca 6180cggactatag
actatactag atactccgtc tactgtacga tacacttccg ctcaggtcct 6240tgtcctttaa
cgaggcctta ccactctttt gttactctat tgatccagct cagcaaaggc 6300agtgtgatct
aagattctat cttcgcgatg tagtaaaact agctagaccg agaaagagac 6360tagaaatgca
aaaggcactt ctacaatggc tgccatcatt attatccgat gtgacgctgc 6420agcttctcaa
tgatattcga atacgctttg aggagataca gcctaatatc cgacaaactg 6480ttttacagat
ttacgatcgt acttgttacc catcattgaa ttttgaacat ccgaacctgg 6540gagttttccc
tgaaacagat agtatatttg aacctgtata ataatatata gtctagcgct 6600ttacggaaga
caatgtatgt atttcggttc ctggagaaac tattgcatct attgcatagg 6660taatcttgca
cgtcgcatcc ccggttcatt ttctgcgttt ccatcttgca cttcaatagc 6720atatctttgt
taacgaagca tctgtgcttc attttgtaga acaaaaatgc aacgcgagag 6780cgctaatttt
tcaaacaaag aatctgagct gcatttttac agaacagaaa tgcaacgcga 6840aagcgctatt
ttaccaacga agaatctgtg cttcattttt gtaaaacaaa aatgcaacgc 6900gacgagagcg
ctaatttttc aaacaaagaa tctgagctgc atttttacag aacagaaatg 6960caacgcgaga
gcgctatttt accaacaaag aatctatact tcttttttgt tctacaaaaa 7020tgcatcccga
gagcgctatt tttctaacaa agcatcttag attacttttt ttctcctttg 7080tgcgctctat
aatgcagtct cttgataact ttttgcactg taggtccgtt aaggttagaa 7140gaaggctact
ttggtgtcta ttttctcttc cataaaaaaa gcctgactcc acttcccgcg 7200tttactgatt
actagcgaag ctgcgggtgc attttttcaa gataaaggca tccccgatta 7260tattctatac
cgatgtggat tgcgcatact ttgtgaacag aaagtgatag cgttgatgat 7320tcttcattgg
tcagaaaatt atgaacggtt tcttctattt tgtctctata tactacgtat 7380aggaaatgtt
tacattttcg tattgttttc gattcactct atgaatagtt cttactacaa 7440tttttttgtc
taaagagtaa tactagagat aaacataaaa aatgtagagg tcgagtttag 7500atgcaagttc
aaggagcgaa aggtggatgg gtaggttata tagggatata gcacagagat 7560atatagcaaa
gagatacttt tgagcaat
7588

SEQ ID NO: 88
Length: 7074
Type: DNA
Organism: Artificial Sequence
Other information: Synthetic Nucleic Acid
Sequence 88:
gtttgtggaa
gcggtattcg caatttaatt aaagctggtg acaattaatc atcggctcgt 60ataatgtgtg
gaattgaatc gatataagga ggttaatcat atgtcacagc tggcggtcac 120agaccaccac
gagcgagcgg tcgaggcgct gcgcaggtcg tatgcggcga tcccgccggg 180cacaccggtc
cgcttggcca agcagacctc caacctgttc cgcttccgcg agccgacggc 240cgcgcccggc
ctggacgtgt ccggcttcaa ccgggtgctg gcggtggacc cggatgcgcg 300caccgccgac
gtgcagggca tgaccaccta cgaggacctg gtcgacgcca ccctgccgca 360cgggctgatg
ccgctggtgg tgccccagct caagacgatc acgctgggcg gggcggtgac 420cggcctgggc
atcgagtcca cctccttccg caacggcctg ccgcacgagt cggtgctgga 480gatgcagatc
atcaccggcg ccggcgaagt ggtcaccgcc accccggacg gggagcactc 540cgacctgttc
tggggcttcc ccaactccta cgggacgctg gggtacgccc tgaagctgaa 600gatcgaactg
gagccggtca agccgtacgt ccggctgcgg cacctgcgct tcgacgacgc 660cggcgagtgc
gccgccaagc tcgccgagct gagcgaaagc cgcgagcacg agggcgatga 720ggtgcacttt
ttggacggca ccttcttcgg gccgcgcgag atgtacctga cgctcggcac 780gttcaccgac
accgccccct atgtgtcgga ctacaccggg cagcacatct actaccggtc 840gatccagcag
cggtcgatcg actttttgac catccgcgac tacctgtggc gctgggacac 900cgactggttc
tggtgctcgc gcgccctggg cgtgcagaac ccgctgatcc ggcgggtgtg 960gccgaagagc
gccaagcggt cggatgtgta ccgcaagctg gtggcctacg aaaagcgcta 1020ccagttcaag
gcgcgcatcg accggtggac gggcaagccg ccgcgcgagg acgtcatcca 1080ggacatcgag
gtgccggcag aacgcctgcc ggagttcctg gagttcttcc acgacaagat 1140cgggatgagc
ccggtgtggc tgtgcccgct gcgggcgcgc caccgctggc cgctgtaccc 1200gctcaagccc
ggcgtcacct acgtcaacgc cggcttctgg gggacggtgc cgctgcagcc 1260ggggcagatg
cccgagtacc acaaccggct gatcgaacgg aaggtcgccc aactggacgg 1320ccacaagtct
ctgtactcga cggcgttcta ctcgcgtgag gagttctggc ggcactacga 1380cggggaaacc
taccggcgtc tgaaggacac ctacgacccc gacgcgcgcc tgctcgacct 1440ctacgacaag
tgcgtgcggg gacgctgacc ggggcggcgg cgatgaagac ccgcggggcg 1500ggacggacag
gagggaagcg atgacgctgg ccaaggtctt cgaggagctg gtcggggcgg 1560acgcccctgt
ggagctcacc gcctacgacg gatcgagagc cggacgcctg ggcagtgatc 1620tgcgggtcca
cgtgaagtcg ccgtacgcgg tgtcctacct ggtgcactcg ccgagcgcgc 1680tcgggctggc
ccgcgcgtac gtggccgggc acctggacgc ctacggcgac atgtacacgc 1740tgctgcggga
gatgacgcag ctgaccgagg cgctgacgcc caaggcccgg ctgcggctgc 1800tggccggtgt
cctgcaggat ccgctgctgc gcgcggcggc cagccgccgt ctgccgcccc 1860cgccgcagga
ggtgcggacc ggccgcacct cctggttccg gcacaccaag cggcgggacg 1920ccaaggccat
ctcccaccac tacgacgtgt ccaacacctt ctatgagtgg gtgctgggcc 1980cgtcgatgac
ctacacctgc gcctgtttcc ccaccgagga cgccaccttg gaggaggcgc 2040agttccacaa
gcacgacctg gtcgccaaga agctcgggct gcggccgggc atgcggctgc 2100tggacgtggg
ctgcggctgg ggcggcatgg tgatgcacgc cgccaagcac tacggggtgc 2160gggcgctggg
cgtcacgctg tccaagcagc aggccgagtg ggcgcagaag gccatcgccg 2220aggcgggcct
gagcgacctg gccgaggtcc gccaccagga ctaccgggac gtcaccgagg 2280gcgacttcga
cgccatcagc tcgatcggcc tcaccgagca catcggcaag gccaacctgc 2340cgtcctactt
cggcttcctg tacggcaagc tcaagccggg cgggcggctg ctcaaccact 2400gcatcacccg
gcccgacaac acccagccgg ccatgaagaa ggacgggttc atcaaccggt 2460acgtcttccc
cgacggggag ctggaggggc ccggctacct gcagacccag atgaacgacg 2520ccggttttga
gatccgccac caggagaacc tgcgcgagca ctacgcccgc accctggccg 2580gatggtgccg
caacctcgat gagcactggg acgaggcggt ggccgaggtc ggcgagggca 2640ccgcgcgggt
gtggcggctg tacatggccg gcagccggct cggtttcgag ctcaactgga 2700tccagctgca
ccagatcctg ggcgtcaagc tcggcgagcg cggcgagtcc cgcatgccgt 2760tgcggcccga
ctggggcgtg tgacctcaaa atatattttc cctctatctt ctcgttgcgc 2820ttaatttgac
taattctcat tagcgaggcg cgcctttcca taggctccgc ccccctgacg 2880agcatcacaa
aaatcgacgc tcaagtcaga ggtggcgaaa cccgacagga ctataaagat 2940accaggcgtt
tccccctgga agctccctcg tgcgctctcc tgttccgacc ctgccgctta 3000ccggatacct
gtccgccttt ctcccttcgg gaagcgtggc gctttctcat agctcacgct 3060gtaggtatct
cagttcggtg taggtcgttc gctccaagct gggctgtgtg cacgaacccc 3120ccgttcagcc
cgaccgctgc gccttatccg gtaactatcg tcttgagtcc aacccggtaa 3180gacacgactt
atcgccactg gcagcagcca ctggtaacag gattagcaga gcgaggtatg 3240taggcggtgc
tacagagttc ttgaagtggt ggcctaacta cggctacact agaagaacag 3300tatttggtat
ctgcgctctg ctgaagccag ttaccttcgg aaaaagagtt ggtagctctt 3360gatccggcaa
acaaaccacc gctggtagcg gtggtttttt tgtttgcaag cagcagatta 3420cgcgcagaaa
aaaaggatct caagaagatc ctttgatctt ttctacgggg tctgacgctc 3480agtggaacga
aaactcacgt taagggattt tggtcatgag attatcaaaa aggatcttca 3540cctagatcct
tttaaattaa aaatgaagtt ttaaatcaat ctaaagtata tatgagtaaa 3600cttggtctga
cagttaccaa tgcttaatca gtgaggcacc tatctcagcg atctgtctat 3660ttcgttcatc
catagttgcc tgactccccg tcgtgtagat aactacgata cgggagggct 3720taccatctgg
ccccagtgct gcaatgatac cgcgagaccc acgctcaccg gctccagatt 3780tatcagcaat
aaaccagcca gccggaaggg ccgagcgcag aagtggtcct gcaactttat 3840ccgcctccat
ccagtctatt aattgttgcc gggaagctag agtaagtagt tcgccagtta 3900atagtttgcg
caacgttgtt gccattgcta caggcatcgt ggtgtcacgc tcgtcgtttg 3960gtatggcttc
attcagctcc ggttcccaac gatcaaggcg agttacatga tcccccatgt 4020tgtgcaaaaa
agcggttagc tccttcggtc ctccgatcgt tgtcagaagt aagttggccg 4080cagtgttatc
actcatggtt atggcagcac tgcataattc tcttactgtc atgccatccg 4140taagatgctt
ttctgtgact ggtgagtact caaccaagtc attctgagaa tagtgtatgc 4200ggcgaccgag
ttgctcttgc ccggcgtcaa tacgggataa taccgcgcca catagcagaa 4260ctttaaaagt
gctcatcatt ggaaaacgtt cttcggggcg aaaactctca aggatcttac 4320cgctgttgag
atccagttcg atgtaaccca ctcgtgcacc caactgatct tcagcatctt 4380ttactttcac
cagcgtttct gggtgagcaa aaacaggaag gcaaaatgcc gcaaaaaagg 4440gaataagggc
gacacggaaa tgttgaatac tcatactctt cctttttcaa tattattgaa 4500gcatttatca
gggttattgt ctcatgagcg gatacatatt tgaatgtatt tagaaaaata 4560aacagcgatc
gcgcggccgc gggtaataac tgatataatt aaattgaagc tctaatttgt 4620gagtttagta
tacatgcatt tacttataat acagtttttt agttttgctg gccgcatctt 4680ctcaaatatg
cttcccagcc tgcttttctg taacgttcac cctctacctt agcatccctt 4740ccctttgcaa
atagtcctct tccaacaata ataatgtcag atcctgtaga gaccacatca 4800tccacggttc
tatactgttg acccaatgcg tctcccttgt catctaaacc cacaccgggt 4860gtcataatca
accaatcgta accttcatct cttccaccca tgtctctttg agcaataaag 4920ccgataacaa
aatctttgtc gctcttcgca atgtcaacag tacccttagt atattctcca 4980gtagctaggg
agcccttgca tgacaattct gctaacatca aaaggcctct aggttccttt 5040gttacttctt
ccgccgcctg cttcaaaccg ctaacaatac ctgggcccac cacaccgtgt 5100gcattcgtaa
tgtctgccca ttctgctatt ctgtatacac ccgcagagta ctgcaatttg 5160actgtattac
caatgtcagc aaattttctg tcttcgaaga gtaaaaaatt gtacttggcg 5220gataatgcct
ttagcggctt aactgtgccc tccatggaaa aatcagtcaa gatatccaca 5280tgtgttttta
gtaaacaaat tttgggacct aatgcttcaa ctaactccag taattccttg 5340gtggtacgaa
catccaatga agcacacaag tttgtttgct tttcgtgcat gatattaaat 5400agcttggcag
caacaggact aggatgagta gcagcacgtt ccttatatgt agctttcgac 5460atgatttatc
ttcgtttcct gcaggttttt gttctgtgca gttgggttaa gaatactggg 5520caatttcatg
tttcttcaac accacatatg cgtatatata ccaatctaag tctgtgctcc 5580ttccttcgtt
cttccttctg ctcggagatt accgaatcaa agctagctta tcgatgataa 5640gctgtcaaag
atgagaatta attccacgga ctatagacta tactagatac tccgtctact 5700gtacgataca
cttccgctca ggtccttgtc ctttaacgag gccttaccac tcttttgtta 5760ctctattgat
ccagctcagc aaaggcagtg tgatctaaga ttctatcttc gcgatgtagt 5820aaaactagct
agaccgagaa agagactaga aatgcaaaag gcacttctac aatggctgcc 5880atcattatta
tccgatgtga cgctgcagct tctcaatgat attcgaatac gctttgagga 5940gatacagcct
aatatccgac aaactgtttt acagatttac gatcgtactt gttacccatc 6000attgaatttt
gaacatccga acctgggagt tttccctgaa acagatagta tatttgaacc 6060tgtataataa
tatatagtct agcgctttac ggaagacaat gtatgtattt cggttcctgg 6120agaaactatt
gcatctattg cataggtaat cttgcacgtc gcatccccgg ttcattttct 6180gcgtttccat
cttgcacttc aatagcatat ctttgttaac gaagcatctg tgcttcattt 6240tgtagaacaa
aaatgcaacg cgagagcgct aatttttcaa acaaagaatc tgagctgcat 6300ttttacagaa
cagaaatgca acgcgaaagc gctattttac caacgaagaa tctgtgcttc 6360atttttgtaa
aacaaaaatg caacgcgacg agagcgctaa tttttcaaac aaagaatctg 6420agctgcattt
ttacagaaca gaaatgcaac gcgagagcgc tattttacca acaaagaatc 6480tatacttctt
ttttgttcta caaaaatgca tcccgagagc gctatttttc taacaaagca 6540tcttagatta
ctttttttct cctttgtgcg ctctataatg cagtctcttg ataacttttt 6600gcactgtagg
tccgttaagg ttagaagaag gctactttgg tgtctatttt ctcttccata 6660aaaaaagcct
gactccactt cccgcgttta ctgattacta gcgaagctgc gggtgcattt 6720tttcaagata
aaggcatccc cgattatatt ctataccgat gtggattgcg catactttgt 6780gaacagaaag
tgatagcgtt gatgattctt cattggtcag aaaattatga acggtttctt 6840ctattttgtc
tctatatact acgtatagga aatgtttaca ttttcgtatt gttttcgatt 6900cactctatga
atagttctta ctacaatttt tttgtctaaa gagtaatact agagataaac 6960ataaaaaatg
tagaggtcga gtttagatgc aagttcaagg agcgaaaggt ggatgggtag 7020gttatatagg
gatatagcac agagatatat agcaaagaga tacttttgag caat
7074

SEQ ID NO: 89
Length: 7331
Type: DNA
Organism: Artificial Sequence
Other information: Synthetic Nucleic Acid
Sequence 89:
gtttgtggaa
gcggtattcg caatttaatt aaagctggtg acaattaatc atcggctcgt 60ataatgtgtg
gaattgaatc gatataagga ggttaatcat atgagcggat tagttgaccc 120ggatagtact
tttttaaaga ccatcggaaa actgagcaac agcttgtcca ttggtcgtgg 180agtagatcaa
aaagaggtaa tccccaaagg ctggaacgcc cattgggagg caattacaaa 240gcttaagaga
agctttgacg cgattcctgc tggggagcgg gtgcgtttag ctaagaaaac 300ctccaacctg
ttccgtggac gctccgatgc aggtcacggc ctagatgtgg cagcgcttgg 360gggagtgatt
gccattgatc cggtcaatgc caccgccgat gtacagggca tgtgcacgta 420tgaagacctg
gtagatgcca ctttaagtta tggtctgatg ccgttggttg tgcctcaact 480gaaaaccatc
acgcttggtg gcgcagtgac cggaatgggc gtggaatcca catccttccg 540caacggtttg
ccacacgaat cagtgctgga gatggatatt tttaccggca ctggtgagat 600cgtgacttgc
tcgcccacag aaaatgtcga cctttacaga ggttttccca actcttatgg 660ttcgctggga
tacgcggtgc ggctaaaaat tgagctggaa ccagtgcaag attacgtcca 720gctgcgccac
gtgcgcttca acgatttaga gtctttgacc aaagcgattg aggaagtcgc 780gtcttctctg
gagtttgata accaacccgt cgattacctt gacggcgtgg tgttttcacc 840cacggaagcc
tacttagttc ttggcacgca aacctcacaa cctggcccca ccagcgatta 900caccagggat
ttaagctact accgctccct gcaacaccca gagggcatca cctatgaccg 960cctgacaatc
cgcgattaca tctggcgctg ggacaccgac tggttctggt gttcacgcgc 1020attcggcacc
caaaaccccg tggtgcgcaa actctggccc agggatctgc tgcgctcgag 1080tttctattgg
aagatcatcg gctgggatcg aaaatactcc atcgctgatc gcctggaaga 1140gcgcaaaggc
cgcccggcta gggaacgggt ggtccaagac gtggaagtta cgattgataa 1200actgccagaa
tttttgaaat ggttctttga aagcagcgac atcgagccgc tgtggctgtg 1260cccgatcaag
cttcgggagg taccaggtag ttcggttggt gctggagaaa ttttgagctc 1320cgctgaagca
atcgactccg gtgctgctga acacccttgg ccgctgtatc ccttgaagaa 1380ggacgtgctg
tgggtcaaca tcggattctg gtcctcagtg ccggttgatc tgatgggctc 1440cgatgcacca
gagggagcat ttaacagaga aatcgaacgc gtcatggcag agctaggcgg 1500acataaatcg
ctgtactccg aagcgttcta caccagggaa gactttgaaa aactttatgg 1560cggaaccatc
ccggcgctgc taaaaaagca gtgggatccc cacagccgat tccccggttt 1620gtatgaaaag
acagtaaaag gcgcctagga tcgctcactg taggtagagg cttgtggtca 1680ctacttgtgg
ccacatttta aaaaaatgca caagaagaga aagcaaagca ttatgagtaa 1740cgccgtagcg
caggacctca tgaccatcgc cgacatcgtc gaggccacga ccactgcacc 1800catcccattc
cacatcactg ccttcgatgg aagcttcact ggccctgaag atgctcccta 1860ccagctgttt
gttgccaaca cggatgcagt atcctacatc gcaacagcgc caggagattt 1920gggtttggca
cgtgcctacc tcatgggaga cctcatcgtg gaaggtgagc atcccggcca 1980tccttatggg
atctttgatg cgttgaagga gttctaccgc tgcttcaaac gcccagatgc 2040atccaccacc
ttgcagatca tgtggactct gcggaaaatg aatgccttaa aattccagga 2100aattccacca
atggaacaag cccctgcatg gcgtaaagca ctgatcaacg ggctagcatc 2160caggcactcg
aaatcccgcg acaagaaagc cattagctac cactacgacg tgggcaatga 2220gttctactcc
ctgtttttag atgattccat gacctatacc tgcgcgtatt atccaacgcc 2280agaatcaagt
ttggaagaag cccaagaaaa caaataccgc ctcatctttg aaaaactgcg 2340tctgaaagaa
ggcgatcgcc tcctagacgt gggatgcggt tggggaggca tggtccgcta 2400cgccgccaaa
cacggtgtga aagccatcgg agttacgctg tctgaacagc aatatgagtg 2460gggtcaagca
gagatcaaac gccaaggttt ggaagacctc gcggaaattc gcttcatgga 2520ttaccgcgat
gttccagaaa ctggattcga tgcgatctca gcaatcggca tcattgaaca 2580catcggtgtg
aacaactatc ccgactactt tgaattgctc agcagcaaac tcaaaacagg 2640cggactgatg
ctcaaccaca gcatcaccta cccagacaac cgcccccgcc acgcaggtgc 2700atttattgat
cgctacattt tccccgacgg tgaactcact ggctctggca ccctgatcaa 2760gcacatgcag
gacaacggtt tcgaagtgct gcacgaagaa aacctccgct ttgattacca 2820acgcaccctg
cacgcgtggt gcgaaaacct caaagaaaat tgggaggaag cagttgaact 2880cgccggtgaa
cccactgcac gactctttgg cctgtacatg gcaggttcgg aatggggatt 2940tgcccacaac
atcgtccagc tgcaccaagt actgggtgtg aaactcgatg agcagggaag 3000tcgcggagaa
gttcctgaaa gaatgtggtg gactatctaa cctcaaaata tattttccct 3060ctatcttctc
gttgcgctta atttgactaa ttctcattag cgaggcgcgc ctttccatag 3120gctccgcccc
cctgacgagc atcacaaaaa tcgacgctca agtcagaggt ggcgaaaccc 3180gacaggacta
taaagatacc aggcgtttcc ccctggaagc tccctcgtgc gctctcctgt 3240tccgaccctg
ccgcttaccg gatacctgtc cgcctttctc ccttcgggaa gcgtggcgct 3300ttctcatagc
tcacgctgta ggtatctcag ttcggtgtag gtcgttcgct ccaagctggg 3360ctgtgtgcac
gaaccccccg ttcagcccga ccgctgcgcc ttatccggta actatcgtct 3420tgagtccaac
ccggtaagac acgacttatc gccactggca gcagccactg gtaacaggat 3480tagcagagcg
aggtatgtag gcggtgctac agagttcttg aagtggtggc ctaactacgg 3540ctacactaga
agaacagtat ttggtatctg cgctctgctg aagccagtta ccttcggaaa 3600aagagttggt
agctcttgat ccggcaaaca aaccaccgct ggtagcggtg gtttttttgt 3660ttgcaagcag
cagattacgc gcagaaaaaa aggatctcaa gaagatcctt tgatcttttc 3720tacggggtct
gacgctcagt ggaacgaaaa ctcacgttaa gggattttgg tcatgagatt 3780atcaaaaagg
atcttcacct agatcctttt aaattaaaaa tgaagtttta aatcaatcta 3840aagtatatat
gagtaaactt ggtctgacag ttaccaatgc ttaatcagtg aggcacctat 3900ctcagcgatc
tgtctatttc gttcatccat agttgcctga ctccccgtcg tgtagataac 3960tacgatacgg
gagggcttac catctggccc cagtgctgca atgataccgc gagacccacg 4020ctcaccggct
ccagatttat cagcaataaa ccagccagcc ggaagggccg agcgcagaag 4080tggtcctgca
actttatccg cctccatcca gtctattaat tgttgccggg aagctagagt 4140aagtagttcg
ccagttaata gtttgcgcaa cgttgttgcc attgctacag gcatcgtggt 4200gtcacgctcg
tcgtttggta tggcttcatt cagctccggt tcccaacgat caaggcgagt 4260tacatgatcc
cccatgttgt gcaaaaaagc ggttagctcc ttcggtcctc cgatcgttgt 4320cagaagtaag
ttggccgcag tgttatcact catggttatg gcagcactgc ataattctct 4380tactgtcatg
ccatccgtaa gatgcttttc tgtgactggt gagtactcaa ccaagtcatt 4440ctgagaatag
tgtatgcggc gaccgagttg ctcttgcccg gcgtcaatac gggataatac 4500cgcgccacat
agcagaactt taaaagtgct catcattgga aaacgttctt cggggcgaaa 4560actctcaagg
atcttaccgc tgttgagatc cagttcgatg taacccactc gtgcacccaa 4620ctgatcttca
gcatctttta ctttcaccag cgtttctggg tgagcaaaaa caggaaggca 4680aaatgccgca
aaaaagggaa taagggcgac acggaaatgt tgaatactca tactcttcct 4740ttttcaatat
tattgaagca tttatcaggg ttattgtctc atgagcggat acatatttga 4800atgtatttag
aaaaataaac agcgatcgcg cggccgcggg taataactga tataattaaa 4860ttgaagctct
aatttgtgag tttagtatac atgcatttac ttataataca gttttttagt 4920tttgctggcc
gcatcttctc aaatatgctt cccagcctgc ttttctgtaa cgttcaccct 4980ctaccttagc
atcccttccc tttgcaaata gtcctcttcc aacaataata atgtcagatc 5040ctgtagagac
cacatcatcc acggttctat actgttgacc caatgcgtct cccttgtcat 5100ctaaacccac
accgggtgtc ataatcaacc aatcgtaacc ttcatctctt ccacccatgt 5160ctctttgagc
aataaagccg ataacaaaat ctttgtcgct cttcgcaatg tcaacagtac 5220ccttagtata
ttctccagta gctagggagc ccttgcatga caattctgct aacatcaaaa 5280ggcctctagg
ttcctttgtt acttcttccg ccgcctgctt caaaccgcta acaatacctg 5340ggcccaccac
accgtgtgca ttcgtaatgt ctgcccattc tgctattctg tatacacccg 5400cagagtactg
caatttgact gtattaccaa tgtcagcaaa ttttctgtct tcgaagagta 5460aaaaattgta
cttggcggat aatgccttta gcggcttaac tgtgccctcc atggaaaaat 5520cagtcaagat
atccacatgt gtttttagta aacaaatttt gggacctaat gcttcaacta 5580actccagtaa
ttccttggtg gtacgaacat ccaatgaagc acacaagttt gtttgctttt 5640cgtgcatgat
attaaatagc ttggcagcaa caggactagg atgagtagca gcacgttcct 5700tatatgtagc
tttcgacatg atttatcttc gtttcctgca ggtttttgtt ctgtgcagtt 5760gggttaagaa
tactgggcaa tttcatgttt cttcaacacc acatatgcgt atatatacca 5820atctaagtct
gtgctccttc cttcgttctt ccttctgctc ggagattacc gaatcaaagc 5880tagcttatcg
atgataagct gtcaaagatg agaattaatt ccacggacta tagactatac 5940tagatactcc
gtctactgta cgatacactt ccgctcaggt ccttgtcctt taacgaggcc 6000ttaccactct
tttgttactc tattgatcca gctcagcaaa ggcagtgtga tctaagattc 6060tatcttcgcg
atgtagtaaa actagctaga ccgagaaaga gactagaaat gcaaaaggca 6120cttctacaat
ggctgccatc attattatcc gatgtgacgc tgcagcttct caatgatatt 6180cgaatacgct
ttgaggagat acagcctaat atccgacaaa ctgttttaca gatttacgat 6240cgtacttgtt
acccatcatt gaattttgaa catccgaacc tgggagtttt ccctgaaaca 6300gatagtatat
ttgaacctgt ataataatat atagtctagc gctttacgga agacaatgta 6360tgtatttcgg
ttcctggaga aactattgca tctattgcat aggtaatctt gcacgtcgca 6420tccccggttc
attttctgcg tttccatctt gcacttcaat agcatatctt tgttaacgaa 6480gcatctgtgc
ttcattttgt agaacaaaaa tgcaacgcga gagcgctaat ttttcaaaca 6540aagaatctga
gctgcatttt tacagaacag aaatgcaacg cgaaagcgct attttaccaa 6600cgaagaatct
gtgcttcatt tttgtaaaac aaaaatgcaa cgcgacgaga gcgctaattt 6660ttcaaacaaa
gaatctgagc tgcattttta cagaacagaa atgcaacgcg agagcgctat 6720tttaccaaca
aagaatctat acttcttttt tgttctacaa aaatgcatcc cgagagcgct 6780atttttctaa
caaagcatct tagattactt tttttctcct ttgtgcgctc tataatgcag 6840tctcttgata
actttttgca ctgtaggtcc gttaaggtta gaagaaggct actttggtgt 6900ctattttctc
ttccataaaa aaagcctgac tccacttccc gcgtttactg attactagcg 6960aagctgcggg
tgcatttttt caagataaag gcatccccga ttatattcta taccgatgtg 7020gattgcgcat
actttgtgaa cagaaagtga tagcgttgat gattcttcat tggtcagaaa 7080attatgaacg
gtttcttcta ttttgtctct atatactacg tataggaaat gtttacattt 7140tcgtattgtt
ttcgattcac tctatgaata gttcttacta caattttttt gtctaaagag 7200taatactaga
gataaacata aaaaatgtag aggtcgagtt tagatgcaag ttcaaggagc 7260gaaaggtgga
tgggtaggtt atatagggat atagcacaga gatatatagc aaagagatac 7320ttttgagcaa t
7331
SEQ ID NO: 90; length: 7126; type: DNA; organism: Artificial Sequence; other information: Synthetic Nucleic Acid
Sequence 90:
gtttgtggaa
gcggtattcg caatttaatt aaagctggtg acaattaatc atcggctcgt 60ataatgtgtg
gaattgaatc gatataagga ggttaatcat gtgtccgctc ctgcgaccga 120tgcacgaacc
gcccacgccg acggcgtgga gcgattgctc gagagttatc gggcggtgcc 180ggcggccgca
tcggtgcggc tcgccaagcg cacctcgaac ctcttccggt cccgagcggc 240gacggatgcc
cctggcctcg acacctccgg cctgacccac gtcatcgcgg tcgaccccgg 300ggcgcgcacg
gccgacgtcg ccggcatgtg cacctacgac gacctcgtcg ccgcgacact 360gccgcatggg
ctcgcgccac tcgtggtgcc gcaactgaag accatcaccc tcgggggcgc 420cgtaacggga
ctcggcatcg agtcgacgtc gttccgcaac ggtctgccgc acgagtcggt 480gctcgagatc
gacgtgctca ccggcgcagg cgagatcatc acggcgtcgc cgatcgagca 540cgcagagctg
ttccgcgcct tccccaactc gtacggcacc ctcggctacg ccgtgcgcct 600gcgcatcgag
ctcgagccgg tcgagccgtt cgtcgcactc acgcaccttc ggttccatgc 660gctcaccgac
ctcatcgagg caatggagcg catcatcgag accggtcgac tcgacggggt 720tgccgtcgat
tccctcgacg gcgtggtgtt cagcgctgaa gagagctacc tgtgcgtcgg 780cacgcagacc
gcggcatccg gcccggtcag cgactacacc cgccagcaga tcttctatcg 840ctccatccag
catgacgacg gtgcgaagca cgaccggctc accatgcacg actacctgtg 900gcgctgggac
gccgactggt tctggtgctc gcaggcgttc ggcgcgcagc atccgctgat 960tcgccggttc
tggccgcggc gataccggcg cagccgctcg tactcgacgc tcatgcgcct 1020cgaacggcga
ttcgacctcg gcgatcgcct cgagaagctc aagggccggc cggcgcgcga 1080acgcgtgatc
caagacgtcg aggtgccgat cgggcgcacc gtcggcttcc tcgaatggtt 1140cctcgcgaac
gtgccgatcg agccgatctg gttgtgcccg ctgcgcctgc ggggcgaccg 1200cggctggcct
ctctacccga tccggccgca gcagacctac gtcaacatcg gcttctggtc 1260gacggttccg
gtgggcggct ccgagggcga gacgaaccgc tcgatcgagc gcgccgtgag 1320cgagttcgac
ggacacaagt cgctgtactc cgactcgtac tactcgcgcg aggagttcga 1380ggagctctac
ggcggcgagg cgtaccgggc cgtgaagcgg cgatacgacc ccgactctcg 1440actgctcgac
ctctatgcga aggcggtgca acggcgatga ccacgaccaa acgccaggcg 1500acagcggggc
aggctgagac cgcgccgacg acggatgcgg cggccgcacc cgactcgtcg 1560gcgaagctca
ccctcgccga gatcctcgag atcgtcgtcg ccggtcggct gccgctgagg 1620ttcaccgcct
acgacgggag ctcggcgggg ccgcctgacg ccctgttcgg cctcgacctg 1680aagactccgc
gaggaacgac ctatctcgcc accggccgcg gcgatctcgg cctcgcccgc 1740gcctacatcg
cgggcgacct cgagatacag ggggtgcacc ccggagaccc ctacgagctg 1800ctcaaggcac
tcgccgacag cctggtcttc aagctgccac cgccgcgggt gatgacccag 1860atcatccgtt
cgatcggcgt cgaacatctg cggccgatcg cgccgccgcc gcaagaggtg 1920ccgccccggt
ggcgccgcat cgccgagggg ctccgacaca gcaagggccg cgacgccgaa 1980gcgatccacc
accactacga cgtgtcgaac accttctacg aatgggtgct cgggccgtcg 2040atgacctaca
cgtgcgcgtg ctacccgggc ctcgacgcat ccctcgacga ggcgcagcag 2100aacaagtacc
ggctcgtgtt cgagaagctg cggctgaagc cgggcgaccg actgctcgac 2160gtcggctgcg
ggtggggcgg catggtgcgc tacgccgcgc gccacggcgt gcaggcgttg 2220ggcgtgaccc
tgtcgcgaga gcagacggcg tgggcgcagc aggcgatcgc cgtcgagggc 2280ctcgccgacc
tcgccgaggt gcgctacggc gactaccgcg acatcgccga agacggcttc 2340gatgcggtgt
catcgatcgg gctgctcgag cacatcggcg tgcgcaacta cgcttcgtat 2400ttcggctttc
tgcagtcgcg cttgcggccc gggggactct tgctcaacca ctgcatcacc 2460cggcccgaca
atcgctccga gccgtcggcg cgcggcttca tcgaccggta cgtgttcccc 2520gacggagagc
tcaccggctc gggccgcatc atcaccgagg cgcaggatgt cggcttcgaa 2580gtgctgcacg
aagagaacct gcgtcagcat tatgcactga cactgcgcga ttggtgcgcc 2640aacctcgtcg
cgcactggga agaggcggtc gccgaggtcg ggctgccgac cgcgaaggtg 2700tggggcctct
acatggccgg gtcacggctc gcgttcgaga gcggcggcat ccagttgcac 2760caggtgctgg
cggtcagacc agacgatcgc agcgacgccg cccagctgcc gctgcggccg 2820tggtggacgc
catagcctca aaatatattt tccctctatc ttctcgttgc gcttaatttg 2880actaattctc
attagcgagg cgcgcctttc cataggctcc gcccccctga cgagcatcac 2940aaaaatcgac
gctcaagtca gaggtggcga aacccgacag gactataaag ataccaggcg 3000tttccccctg
gaagctccct cgtgcgctct cctgttccga ccctgccgct taccggatac 3060ctgtccgcct
ttctcccttc gggaagcgtg gcgctttctc atagctcacg ctgtaggtat 3120ctcagttcgg
tgtaggtcgt tcgctccaag ctgggctgtg tgcacgaacc ccccgttcag 3180cccgaccgct
gcgccttatc cggtaactat cgtcttgagt ccaacccggt aagacacgac 3240ttatcgccac
tggcagcagc cactggtaac aggattagca gagcgaggta tgtaggcggt 3300gctacagagt
tcttgaagtg gtggcctaac tacggctaca ctagaagaac agtatttggt 3360atctgcgctc
tgctgaagcc agttaccttc ggaaaaagag ttggtagctc ttgatccggc 3420aaacaaacca
ccgctggtag cggtggtttt tttgtttgca agcagcagat tacgcgcaga 3480aaaaaaggat
ctcaagaaga tcctttgatc ttttctacgg ggtctgacgc tcagtggaac 3540gaaaactcac
gttaagggat tttggtcatg agattatcaa aaaggatctt cacctagatc 3600cttttaaatt
aaaaatgaag ttttaaatca atctaaagta tatatgagta aacttggtct 3660gacagttacc
aatgcttaat cagtgaggca cctatctcag cgatctgtct atttcgttca 3720tccatagttg
cctgactccc cgtcgtgtag ataactacga tacgggaggg cttaccatct 3780ggccccagtg
ctgcaatgat accgcgagac ccacgctcac cggctccaga tttatcagca 3840ataaaccagc
cagccggaag ggccgagcgc agaagtggtc ctgcaacttt atccgcctcc 3900atccagtcta
ttaattgttg ccgggaagct agagtaagta gttcgccagt taatagtttg 3960cgcaacgttg
ttgccattgc tacaggcatc gtggtgtcac gctcgtcgtt tggtatggct 4020tcattcagct
ccggttccca acgatcaagg cgagttacat gatcccccat gttgtgcaaa 4080aaagcggtta
gctccttcgg tcctccgatc gttgtcagaa gtaagttggc cgcagtgtta 4140tcactcatgg
ttatggcagc actgcataat tctcttactg tcatgccatc cgtaagatgc 4200ttttctgtga
ctggtgagta ctcaaccaag tcattctgag aatagtgtat gcggcgaccg 4260agttgctctt
gcccggcgtc aatacgggat aataccgcgc cacatagcag aactttaaaa 4320gtgctcatca
ttggaaaacg ttcttcgggg cgaaaactct caaggatctt accgctgttg 4380agatccagtt
cgatgtaacc cactcgtgca cccaactgat cttcagcatc ttttactttc 4440accagcgttt
ctgggtgagc aaaaacagga aggcaaaatg ccgcaaaaaa gggaataagg 4500gcgacacgga
aatgttgaat actcatactc ttcctttttc aatattattg aagcatttat 4560cagggttatt
gtctcatgag cggatacata tttgaatgta tttagaaaaa taaacagcga 4620tcgcgcggcc
gcgggtaata actgatataa ttaaattgaa gctctaattt gtgagtttag 4680tatacatgca
tttacttata atacagtttt ttagttttgc tggccgcatc ttctcaaata 4740tgcttcccag
cctgcttttc tgtaacgttc accctctacc ttagcatccc ttccctttgc 4800aaatagtcct
cttccaacaa taataatgtc agatcctgta gagaccacat catccacggt 4860tctatactgt
tgacccaatg cgtctccctt gtcatctaaa cccacaccgg gtgtcataat 4920caaccaatcg
taaccttcat ctcttccacc catgtctctt tgagcaataa agccgataac 4980aaaatctttg
tcgctcttcg caatgtcaac agtaccctta gtatattctc cagtagctag 5040ggagcccttg
catgacaatt ctgctaacat caaaaggcct ctaggttcct ttgttacttc 5100ttccgccgcc
tgcttcaaac cgctaacaat acctgggccc accacaccgt gtgcattcgt 5160aatgtctgcc
cattctgcta ttctgtatac acccgcagag tactgcaatt tgactgtatt 5220accaatgtca
gcaaattttc tgtcttcgaa gagtaaaaaa ttgtacttgg cggataatgc 5280ctttagcggc
ttaactgtgc cctccatgga aaaatcagtc aagatatcca catgtgtttt 5340tagtaaacaa
attttgggac ctaatgcttc aactaactcc agtaattcct tggtggtacg 5400aacatccaat
gaagcacaca agtttgtttg cttttcgtgc atgatattaa atagcttggc 5460agcaacagga
ctaggatgag tagcagcacg ttccttatat gtagctttcg acatgattta 5520tcttcgtttc
ctgcaggttt ttgttctgtg cagttgggtt aagaatactg ggcaatttca 5580tgtttcttca
acaccacata tgcgtatata taccaatcta agtctgtgct ccttccttcg 5640ttcttccttc
tgctcggaga ttaccgaatc aaagctagct tatcgatgat aagctgtcaa 5700agatgagaat
taattccacg gactatagac tatactagat actccgtcta ctgtacgata 5760cacttccgct
caggtccttg tcctttaacg aggccttacc actcttttgt tactctattg 5820atccagctca
gcaaaggcag tgtgatctaa gattctatct tcgcgatgta gtaaaactag 5880ctagaccgag
aaagagacta gaaatgcaaa aggcacttct acaatggctg ccatcattat 5940tatccgatgt
gacgctgcag cttctcaatg atattcgaat acgctttgag gagatacagc 6000ctaatatccg
acaaactgtt ttacagattt acgatcgtac ttgttaccca tcattgaatt 6060ttgaacatcc
gaacctggga gttttccctg aaacagatag tatatttgaa cctgtataat 6120aatatatagt
ctagcgcttt acggaagaca atgtatgtat ttcggttcct ggagaaacta 6180ttgcatctat
tgcataggta atcttgcacg tcgcatcccc ggttcatttt ctgcgtttcc 6240atcttgcact
tcaatagcat atctttgtta acgaagcatc tgtgcttcat tttgtagaac 6300aaaaatgcaa
cgcgagagcg ctaatttttc aaacaaagaa tctgagctgc atttttacag 6360aacagaaatg
caacgcgaaa gcgctatttt accaacgaag aatctgtgct tcatttttgt 6420aaaacaaaaa
tgcaacgcga cgagagcgct aatttttcaa acaaagaatc tgagctgcat 6480ttttacagaa
cagaaatgca acgcgagagc gctattttac caacaaagaa tctatacttc 6540ttttttgttc
tacaaaaatg catcccgaga gcgctatttt tctaacaaag catcttagat 6600tacttttttt
ctcctttgtg cgctctataa tgcagtctct tgataacttt ttgcactgta 6660ggtccgttaa
ggttagaaga aggctacttt ggtgtctatt ttctcttcca taaaaaaagc 6720ctgactccac
ttcccgcgtt tactgattac tagcgaagct gcgggtgcat tttttcaaga 6780taaaggcatc
cccgattata ttctataccg atgtggattg cgcatacttt gtgaacagaa 6840agtgatagcg
ttgatgattc ttcattggtc agaaaattat gaacggtttc ttctattttg 6900tctctatata
ctacgtatag gaaatgttta cattttcgta ttgttttcga ttcactctat 6960gaatagttct
tactacaatt tttttgtcta aagagtaata ctagagataa acataaaaaa 7020tgtagaggtc
gagtttagat gcaagttcaa ggagcgaaag gtggatgggt aggttatata 7080gggatatagc
acagagatat atagcaaaga gatacttttg agcaat
7126
SEQ ID NO: 91; length: 7505; type: DNA; organism: Artificial Sequence; other information: Synthetic Nucleic Acid
Sequence 91:
gtttgtggaa
gcggtattcg caatttaatt aaagctggtg acaattaatc atcggctcgt 60ataatgtgtg
gaattgaatc gatataagga ggttaatcat gtgtctgttg ccgtaaccga 120cgcacgatcc
gcctacgccc acggcgtgca gcggctggtc gcgagttacc gcgccatccc 180cgccggcgcc
accgtccgcc tggccaaacc cacgtccaac ctgttccgcg ccagggcgaa 240gagcaccgcg
gcgggcctcg acacctccgg cctgacacat gtgatcgccg tggaccccga 300gacgcgcacc
gccgaggtcg cggggatgtg cacctacgag gacctggtgg cggcgacgct 360gccccacggg
ctttcaccgc tggtggtccc gcaactcaag acgatcaccc tcggcggcgc 420cgtcaccggg
ctcggcatcg agtcggcgtc gttccgcaac ggccttccgc acgaatcggt 480cctggagatg
gacatcctca ccgggaccgg cgacatcgtg cgcgccgcgc ccgacgagaa 540tcccgacctt
ttccgcacct tcccgaattc ttatggaacg ctgggttact cggttcggct 600gaagatcgag
ctggagccgg tgaagccgtt cgtggcgtta cgccatctcc gcttccactc 660actgtcgaca
ctcatcgcga cgatggaccg catcgtcgac accgggagtc tcgacggtga 720gcaggtcgac
tatctcgacg gagtggtgtt cagcgccgag gagagctacc tgtgcgtcgg 780aacacgttcc
gcgacaccgg gtcctgtcag cgactacacc ggcgagcaca tcttctaccg 840gtcgatccag
cacgattgcc cgaccgaagg cggacagaag cacgaccggc tgacggcgca 900cgactacttc
tggcgctggg acaccgactg gttctggtgc tcaagggcat tcggcgcgca 960gaacccgaag
gtccgtcggt ggtggccccg acggctccgg cgcagcagct tctactggaa 1020gctcgtcggc
tacgaccagc gtttcggcat cgccgaccgg atcgagaaac accacggccg 1080gccaccgcgc
gaacgcgtcg tccaggacgt cgaggtcccc atcgagcgca ccgtcgaatt 1140cctgcagtgg
ttcctcgaca cgatcccgat agagccgctc tggttgtgcc cgttgcgact 1200tcgcgatgac
aacagctggt cgctgtaccc gctccggccc catcgcacgt atgtcaacgt 1260gggattctgg
tcgtcggtgc ccgtcgggcc ggaggagggt cacaccaaca agctgatcga 1320acgcaggatc
agcgagctgg agggacacaa gtcgctgtac tccgacgcct tctattcggc 1380cgacgagttc
gacgcgctgt acggcggcga gatctaccgg accgtgaaga agacctacga 1440cccagattct
cgtttcctcg acctctatgc gaaggcggtg cgacggcaat gacgactttt 1500cgggaacata
ccgacagttc ggcgtccgac ccggatcgga aactcacttt ggcagaggtg 1560ttggagatct
tcgccgcggg tcgccgtccg ctgaagttca ccgcctatga cggaagtagt 1620tgcgggcctg
aggatgcgac actgggcctc gacctgctga ccccgcgggg cacgacctac 1680ctggccacgg
cgccgggtga tctcggcctg gcgcgggcct acatcgccgg cgatctgcgc 1740ctcagtggtg
tgcatcccgg cgatccccat gacctgctca cggcgctgac ggaacgcctg 1800gagtacaggc
gtccgccggt gcgagtgctg gccaatgttc tgcgctccat cgggatcgag 1860cacctcaagc
ccgtcgcgcc gccaccccag gagcacctgc cgcggtggcg gcggatcgca 1920gaggggttgc
ggcacagcaa gacccgtgac gctgaggcca tccagcacca ctacgacgtc 1980tcgaacacgt
tctactcatg ggtcctgggt ccgtcgatga cctacacctg cgcctgctat 2040ccacacccgg
atgccacgct ggaggaggcg caggagaaca agtaccggct ggtgttcgag 2100aagcttcgac
tcaagcccgg tgaccggctg ctcgacgtcg gttgcggctg gggcggaatg 2160gtccgctacg
ccgcccggca cggggtcaag gtcctggggg tgacgctgtc gaaggagcag 2220gcgcagtggg
cggccgacgc agtcgagcgg gacggcctgg gtgagttggc cgaggtccgc 2280cacggcgact
accgcgacgt gcgcgagtcg cacttcgacg cagtgtcctc gctcgggctc 2340accgagcaca
tcggcgtcgc gaactacccg tcgtacttcc gcttcctgaa gtcgaaactg 2400cggccgggtg
gcctgctgct caaccactgc atcacccgaa acaacaaccg gagtcacgcc 2460accgcaggcg
gattcatcga tcgctatgtc tttcccgacg gggagctgac ggggtcgggg 2520cgaatcatca
ccgaaatgca ggacgtcgga ctcgaggtcg tgcacgagga gaatctgcgt 2580caccactacg
cgctgacgct gcgcgactgg agccgcaacc tggtcgcgca ctgggacgac 2640gcggtgaccg
aggtcggtct gccgactgcc aaggtgtggg gcctctacat cgccgcgtcg 2700cgagtcggct
tcgagcagaa cgccattcag ctgcaccagg tgctgtcggt caagctcgac 2760gagcgtggct
cggacggcgg actgccgtta cgaccctggt ggaacgccta gccactatgc 2820tctgcccatg
atccggttcc tgctgcgcat cgcggtcttt ctgggctcat cagcgatcgg 2880gctcctcgtc
gccggatggc tggtgcccgg ggtgtcgctg tcggtgtggg gcttcgtcac 2940ggcagtggtg
atcttcaccg tggcgcaggc gatcctgtcc ccgttcttcc tcaagatggc 3000cagccgctac
gcctcggcgt tcctcggcgg gatcggtctg gtgtcgacgt ttgccgcgct 3060gctgctcgtc
tcgctgctgt ccaacggtct gagcatccgc ggcatcggat cctggatcgc 3120cgcaaccgtg
gtggtctggt tggtgaccgc cctggcgacg ctggtgctgc cgatgttggt 3180gctgcgcgag
aagaaaaccg cgtcgcgcgt ctgacctcaa aatatatttt ccctctatct 3240tctcgttgcg
cttaatttga ctaattctca ttagcgaggc gcgcctttcc ataggctccg 3300cccccctgac
gagcatcaca aaaatcgacg ctcaagtcag aggtggcgaa acccgacagg 3360actataaaga
taccaggcgt ttccccctgg aagctccctc gtgcgctctc ctgttccgac 3420cctgccgctt
accggatacc tgtccgcctt tctcccttcg ggaagcgtgg cgctttctca 3480tagctcacgc
tgtaggtatc tcagttcggt gtaggtcgtt cgctccaagc tgggctgtgt 3540gcacgaaccc
cccgttcagc ccgaccgctg cgccttatcc ggtaactatc gtcttgagtc 3600caacccggta
agacacgact tatcgccact ggcagcagcc actggtaaca ggattagcag 3660agcgaggtat
gtaggcggtg ctacagagtt cttgaagtgg tggcctaact acggctacac 3720tagaagaaca
gtatttggta tctgcgctct gctgaagcca gttaccttcg gaaaaagagt 3780tggtagctct
tgatccggca aacaaaccac cgctggtagc ggtggttttt ttgtttgcaa 3840gcagcagatt
acgcgcagaa aaaaaggatc tcaagaagat cctttgatct tttctacggg 3900gtctgacgct
cagtggaacg aaaactcacg ttaagggatt ttggtcatga gattatcaaa 3960aaggatcttc
acctagatcc ttttaaatta aaaatgaagt tttaaatcaa tctaaagtat 4020atatgagtaa
acttggtctg acagttacca atgcttaatc agtgaggcac ctatctcagc 4080gatctgtcta
tttcgttcat ccatagttgc ctgactcccc gtcgtgtaga taactacgat 4140acgggagggc
ttaccatctg gccccagtgc tgcaatgata ccgcgagacc cacgctcacc 4200ggctccagat
ttatcagcaa taaaccagcc agccggaagg gccgagcgca gaagtggtcc 4260tgcaacttta
tccgcctcca tccagtctat taattgttgc cgggaagcta gagtaagtag 4320ttcgccagtt
aatagtttgc gcaacgttgt tgccattgct acaggcatcg tggtgtcacg 4380ctcgtcgttt
ggtatggctt cattcagctc cggttcccaa cgatcaaggc gagttacatg 4440atcccccatg
ttgtgcaaaa aagcggttag ctccttcggt cctccgatcg ttgtcagaag 4500taagttggcc
gcagtgttat cactcatggt tatggcagca ctgcataatt ctcttactgt 4560catgccatcc
gtaagatgct tttctgtgac tggtgagtac tcaaccaagt cattctgaga 4620atagtgtatg
cggcgaccga gttgctcttg cccggcgtca atacgggata ataccgcgcc 4680acatagcaga
actttaaaag tgctcatcat tggaaaacgt tcttcggggc gaaaactctc 4740aaggatctta
ccgctgttga gatccagttc gatgtaaccc actcgtgcac ccaactgatc 4800ttcagcatct
tttactttca ccagcgtttc tgggtgagca aaaacaggaa ggcaaaatgc 4860cgcaaaaaag
ggaataaggg cgacacggaa atgttgaata ctcatactct tcctttttca 4920atattattga
agcatttatc agggttattg tctcatgagc ggatacatat ttgaatgtat 4980ttagaaaaat
aaacagcgat cgcgcggccg cgggtaataa ctgatataat taaattgaag 5040ctctaatttg
tgagtttagt atacatgcat ttacttataa tacagttttt tagttttgct 5100ggccgcatct
tctcaaatat gcttcccagc ctgcttttct gtaacgttca ccctctacct 5160tagcatccct
tccctttgca aatagtcctc ttccaacaat aataatgtca gatcctgtag 5220agaccacatc
atccacggtt ctatactgtt gacccaatgc gtctcccttg tcatctaaac 5280ccacaccggg
tgtcataatc aaccaatcgt aaccttcatc tcttccaccc atgtctcttt 5340gagcaataaa
gccgataaca aaatctttgt cgctcttcgc aatgtcaaca gtacccttag 5400tatattctcc
agtagctagg gagcccttgc atgacaattc tgctaacatc aaaaggcctc 5460taggttcctt
tgttacttct tccgccgcct gcttcaaacc gctaacaata cctgggccca 5520ccacaccgtg
tgcattcgta atgtctgccc attctgctat tctgtataca cccgcagagt 5580actgcaattt
gactgtatta ccaatgtcag caaattttct gtcttcgaag agtaaaaaat 5640tgtacttggc
ggataatgcc tttagcggct taactgtgcc ctccatggaa aaatcagtca 5700agatatccac
atgtgttttt agtaaacaaa ttttgggacc taatgcttca actaactcca 5760gtaattcctt
ggtggtacga acatccaatg aagcacacaa gtttgtttgc ttttcgtgca 5820tgatattaaa
tagcttggca gcaacaggac taggatgagt agcagcacgt tccttatatg 5880tagctttcga
catgatttat cttcgtttcc tgcaggtttt tgttctgtgc agttgggtta 5940agaatactgg
gcaatttcat gtttcttcaa caccacatat gcgtatatat accaatctaa 6000gtctgtgctc
cttccttcgt tcttccttct gctcggagat taccgaatca aagctagctt 6060atcgatgata
agctgtcaaa gatgagaatt aattccacgg actatagact atactagata 6120ctccgtctac
tgtacgatac acttccgctc aggtccttgt cctttaacga ggccttacca 6180ctcttttgtt
actctattga tccagctcag caaaggcagt gtgatctaag attctatctt 6240cgcgatgtag
taaaactagc tagaccgaga aagagactag aaatgcaaaa ggcacttcta 6300caatggctgc
catcattatt atccgatgtg acgctgcagc ttctcaatga tattcgaata 6360cgctttgagg
agatacagcc taatatccga caaactgttt tacagattta cgatcgtact 6420tgttacccat
cattgaattt tgaacatccg aacctgggag ttttccctga aacagatagt 6480atatttgaac
ctgtataata atatatagtc tagcgcttta cggaagacaa tgtatgtatt 6540tcggttcctg
gagaaactat tgcatctatt gcataggtaa tcttgcacgt cgcatccccg 6600gttcattttc
tgcgtttcca tcttgcactt caatagcata tctttgttaa cgaagcatct 6660gtgcttcatt
ttgtagaaca aaaatgcaac gcgagagcgc taatttttca aacaaagaat 6720ctgagctgca
tttttacaga acagaaatgc aacgcgaaag cgctatttta ccaacgaaga 6780atctgtgctt
catttttgta aaacaaaaat gcaacgcgac gagagcgcta atttttcaaa 6840caaagaatct
gagctgcatt tttacagaac agaaatgcaa cgcgagagcg ctattttacc 6900aacaaagaat
ctatacttct tttttgttct acaaaaatgc atcccgagag cgctattttt 6960ctaacaaagc
atcttagatt actttttttc tcctttgtgc gctctataat gcagtctctt 7020gataactttt
tgcactgtag gtccgttaag gttagaagaa ggctactttg gtgtctattt 7080tctcttccat
aaaaaaagcc tgactccact tcccgcgttt actgattact agcgaagctg 7140cgggtgcatt
ttttcaagat aaaggcatcc ccgattatat tctataccga tgtggattgc 7200gcatactttg
tgaacagaaa gtgatagcgt tgatgattct tcattggtca gaaaattatg 7260aacggtttct
tctattttgt ctctatatac tacgtatagg aaatgtttac attttcgtat 7320tgttttcgat
tcactctatg aatagttctt actacaattt ttttgtctaa agagtaatac 7380tagagataaa
cataaaaaat gtagaggtcg agtttagatg caagttcaag gagcgaaagg 7440tggatgggta
ggttatatag ggatatagca cagagatata tagcaaagag atacttttga 7500gcaat
7505
SEQ ID NO: 92; length: 7123; type: DNA; organism: Artificial Sequence; other information: Synthetic Nucleic Acid
Sequence 92:
gtttgtggaa
gcggtattcg caatttaatt aaagctggtg acaattaatc atcggctcgt 60ataatgtgtg
gaattgaatc gatataagga ggttaatcat atgcacgggc tgttgtcgaa 120gactagggta
tatgtggtgc ctgtccttgg atctgcactc tcggcccaca agtcgggcgt 180tgaccggctg
ctggcaagct atcgatccat tcccgcaacg tccgcggtcc ggctggccaa 240accgacgtca
aacctgttcc gcgcccgcac caaacgtgac gcgcccggct tggacacctc 300ggggctgacc
ggcgtcctga gcgtggatcc cgaaacccgc accgcggacg tcgccggcat 360gtgcacctac
gcggacctgg tggccgcaac gctgccctac ggcctgtcgc cgctggtcgt 420cccgcagctg
aagaccatca ccctcggcgg ggcggtcagc ggcctgggga tcgagtcggc 480gtcgtttcgc
aacgggctgc cgcacgaatc ggtgctggag atggatatcc tcaccggcgc 540tggcgatttg
ctcaccgcat cacgtaccca gcacccggac ctgttccgcg ccttcccgaa 600ttcctatggg
acactggggt attcgacccg gcttcggatc gagctggaac ccgtcgcacc 660gttcgtcgcg
ctgcgccaca tccgcttccg ctcgctgccc gcgctgatcg ccgcggccga 720acgcatcgtc
gacaccggcg ggcagggcgg aaccccggtc gactacctcg acggggtggt 780cttcagcgcc
gacgaaagct acctgtgcgt gggccggcgg accaccaccc ccggcccggt 840cagcgactac
accggcaagg acatctacta ccagtccatc cggcacgacg ccccgggcct 900ggaggcgacc
aaggatgacc ggctgaccat gcacgactac ttctggcgct gggacaccga 960ttggttctgg
tgctcgcgcg cgttcggcgt gcaggacccg cgggtgcgac gcttctggcc 1020gcgccgttat
cggcgcagca gcttctactg gaagctgatt tccctggacc ggcgcttcgg 1080gatctccgac
cgcatcgagg cgcgcaacgg gcggccccca cgcgaacggg tggtgcaaga 1140catcgagatt
ccaatcgaac ggacctgcga cttcctggag tggttcctgg acaacgtgcc 1200aatcacgccg
atctggttgt gcccgttgcg ccttcgcgac cgcgacggct ggccgttgta 1260cccgatgcgg
ccggatcaca cgtacgtcaa cgtcggcttc tggtcgtcgg tgccgggggg 1320cgcgaccgag
ggcgccgcca accggatgat cgaagaaaag gtgagcgaac tcgacgggca 1380caagtccctg
tactccgatt ccttctactc ccgcgaggac ttcgacgagc tgtacggcgg 1440cgagacctac
aacaccgtca agaaaaccta cgaccccgat tctcgtttac tcgacctcta 1500cgcaaaggcg
gtgcaacggc gatgacgact accaaggaac cccaccgcac gtcgcacggg 1560aaactgagca
tggccgagat cctggaggtc ttcgccgcca ccggccgaca tccgctgaag 1620ttcaccgcct
acgacggcag catcgccggc aacgaggacg ccgaactggg cctggacctt 1680cgcagccccc
gcggcgccac ctatctggcg accgcccccg gcgaactcgg cctcgcccgc 1740gcctacgtgt
cgggcgacct gcaggcctac ggcgtccatc ccggcgaccc gtaccaactg 1800ctcaagacgc
tcaccgatcg ggtggaattc aagcggcccc cggtgcgggt gctggccaac 1860gtcgtgcggt
cgctggggtt cgagcggttg ctgccggtcg cgccgccccc gcaggaggcg 1920ctgccccggt
ggcggcgcat cgccgacggg ctgatgcaca cgaggacccg cgacgccgag 1980gccatccacc
accactacga cgtgtccaac accttctacg aattggtgtt ggggccgtcg 2040atgacctaca
cctgcgcggt gtatcccgat gccgacgcga cactcgaaca ggcgcaggag 2100aacaagtacc
ggctgatctt cgagaagctg cggctgaagg cgggcgaccg gctgctcgac 2160gtcggctgcg
gctggggcgg catggtgcgc tacgcggccc ggcgcggcgt ccgggccacc 2220ggcgccaccc
tgtcggccga acaggcgaag tgggcgcaga aggcgatcgc cgaggaaggc 2280cttgcggacc
tggccgaggt gcgccacacc gactatcggg acgtgggcga ggcggcgttc 2340gacgccgtgt
cctcgatcgg gctgaccgag cacatcggcg tcaagaatta ccccgcctac 2400ttcggcttct
tgaagtcgaa gctgcgcacc ggcggcctgc tgctcaatca ctgcatcacc 2460cgccacgaca
acacgtcgac gtcgttcgcg ggcggattca ccgatcgcta tgtcttcccg 2520gacggggagc
tgaccggctc gggccgcatc acctgcgacg tccaggactg cggcttcgag 2580gtgctgcacg
cggagaactt ccgccaccac tacgcgatga cgctgcgcga ctggtgccgc 2640aatctggtcg
agaactggga cgccgcggtc agcgaggtcg gcctaccgac cgcgaaggtc 2700tggggcctgt
acatggcggc gtcacgggtt gcgttcgagc agaacaacct tcagctgcat 2760cacgtgctgg
cggccaagac cgacgcgcgg ggcgacgacg acctgccgct gcggccgtgg 2820tggacggcct
gacctcaaaa tatattttcc ctctatcttc tcgttgcgct taatttgact 2880aattctcatt
agcgaggcgc gcctttccat aggctccgcc cccctgacga gcatcacaaa 2940aatcgacgct
caagtcagag gtggcgaaac ccgacaggac tataaagata ccaggcgttt 3000ccccctggaa
gctccctcgt gcgctctcct gttccgaccc tgccgcttac cggatacctg 3060tccgcctttc
tcccttcggg aagcgtggcg ctttctcata gctcacgctg taggtatctc 3120agttcggtgt
aggtcgttcg ctccaagctg ggctgtgtgc acgaaccccc cgttcagccc 3180gaccgctgcg
ccttatccgg taactatcgt cttgagtcca acccggtaag acacgactta 3240tcgccactgg
cagcagccac tggtaacagg attagcagag cgaggtatgt aggcggtgct 3300acagagttct
tgaagtggtg gcctaactac ggctacacta gaagaacagt atttggtatc 3360tgcgctctgc
tgaagccagt taccttcgga aaaagagttg gtagctcttg atccggcaaa 3420caaaccaccg
ctggtagcgg tggttttttt gtttgcaagc agcagattac gcgcagaaaa 3480aaaggatctc
aagaagatcc tttgatcttt tctacggggt ctgacgctca gtggaacgaa 3540aactcacgtt
aagggatttt ggtcatgaga ttatcaaaaa ggatcttcac ctagatcctt 3600ttaaattaaa
aatgaagttt taaatcaatc taaagtatat atgagtaaac ttggtctgac 3660agttaccaat
gcttaatcag tgaggcacct atctcagcga tctgtctatt tcgttcatcc 3720atagttgcct
gactccccgt cgtgtagata actacgatac gggagggctt accatctggc 3780cccagtgctg
caatgatacc gcgagaccca cgctcaccgg ctccagattt atcagcaata 3840aaccagccag
ccggaagggc cgagcgcaga agtggtcctg caactttatc cgcctccatc 3900cagtctatta
attgttgccg ggaagctaga gtaagtagtt cgccagttaa tagtttgcgc 3960aacgttgttg
ccattgctac aggcatcgtg gtgtcacgct cgtcgtttgg tatggcttca 4020ttcagctccg
gttcccaacg atcaaggcga gttacatgat cccccatgtt gtgcaaaaaa 4080gcggttagct
ccttcggtcc tccgatcgtt gtcagaagta agttggccgc agtgttatca 4140ctcatggtta
tggcagcact gcataattct cttactgtca tgccatccgt aagatgcttt 4200tctgtgactg
gtgagtactc aaccaagtca ttctgagaat agtgtatgcg gcgaccgagt 4260tgctcttgcc
cggcgtcaat acgggataat accgcgccac atagcagaac tttaaaagtg 4320ctcatcattg
gaaaacgttc ttcggggcga aaactctcaa ggatcttacc gctgttgaga 4380tccagttcga
tgtaacccac tcgtgcaccc aactgatctt cagcatcttt tactttcacc 4440agcgtttctg
ggtgagcaaa aacaggaagg caaaatgccg caaaaaaggg aataagggcg 4500acacggaaat
gttgaatact catactcttc ctttttcaat attattgaag catttatcag 4560ggttattgtc
tcatgagcgg atacatattt gaatgtattt agaaaaataa acagcgatcg 4620cgcggccgcg
ggtaataact gatataatta aattgaagct ctaatttgtg agtttagtat 4680acatgcattt
acttataata cagtttttta gttttgctgg ccgcatcttc tcaaatatgc 4740ttcccagcct
gcttttctgt aacgttcacc ctctacctta gcatcccttc cctttgcaaa 4800tagtcctctt
ccaacaataa taatgtcaga tcctgtagag accacatcat ccacggttct 4860atactgttga
cccaatgcgt ctcccttgtc atctaaaccc acaccgggtg tcataatcaa 4920ccaatcgtaa
ccttcatctc ttccacccat gtctctttga gcaataaagc cgataacaaa 4980atctttgtcg
ctcttcgcaa tgtcaacagt acccttagta tattctccag tagctaggga 5040gcccttgcat
gacaattctg ctaacatcaa aaggcctcta ggttcctttg ttacttcttc 5100cgccgcctgc
ttcaaaccgc taacaatacc tgggcccacc acaccgtgtg cattcgtaat 5160gtctgcccat
tctgctattc tgtatacacc cgcagagtac tgcaatttga ctgtattacc 5220aatgtcagca
aattttctgt cttcgaagag taaaaaattg tacttggcgg ataatgcctt 5280tagcggctta
actgtgccct ccatggaaaa atcagtcaag atatccacat gtgtttttag 5340taaacaaatt
ttgggaccta atgcttcaac taactccagt aattccttgg tggtacgaac 5400atccaatgaa
gcacacaagt ttgtttgctt ttcgtgcatg atattaaata gcttggcagc 5460aacaggacta
ggatgagtag cagcacgttc cttatatgta gctttcgaca tgatttatct 5520tcgtttcctg
caggtttttg ttctgtgcag ttgggttaag aatactgggc aatttcatgt 5580ttcttcaaca
ccacatatgc gtatatatac caatctaagt ctgtgctcct tccttcgttc 5640ttccttctgc
tcggagatta ccgaatcaaa gctagcttat cgatgataag ctgtcaaaga 5700tgagaattaa
ttccacggac tatagactat actagatact ccgtctactg tacgatacac 5760ttccgctcag
gtccttgtcc tttaacgagg ccttaccact cttttgttac tctattgatc 5820cagctcagca
aaggcagtgt gatctaagat tctatcttcg cgatgtagta aaactagcta 5880gaccgagaaa
gagactagaa atgcaaaagg cacttctaca atggctgcca tcattattat 5940ccgatgtgac
gctgcagctt ctcaatgata ttcgaatacg ctttgaggag atacagccta 6000atatccgaca
aactgtttta cagatttacg atcgtacttg ttacccatca ttgaattttg 6060aacatccgaa
cctgggagtt ttccctgaaa cagatagtat atttgaacct gtataataat 6120atatagtcta
gcgctttacg gaagacaatg tatgtatttc ggttcctgga gaaactattg 6180catctattgc
ataggtaatc ttgcacgtcg catccccggt tcattttctg cgtttccatc 6240ttgcacttca
atagcatatc tttgttaacg aagcatctgt gcttcatttt gtagaacaaa 6300aatgcaacgc
gagagcgcta atttttcaaa caaagaatct gagctgcatt tttacagaac 6360agaaatgcaa
cgcgaaagcg ctattttacc aacgaagaat ctgtgcttca tttttgtaaa 6420acaaaaatgc
aacgcgacga gagcgctaat ttttcaaaca aagaatctga gctgcatttt 6480tacagaacag
aaatgcaacg cgagagcgct attttaccaa caaagaatct atacttcttt 6540tttgttctac
aaaaatgcat cccgagagcg ctatttttct aacaaagcat cttagattac 6600tttttttctc
ctttgtgcgc tctataatgc agtctcttga taactttttg cactgtaggt 6660ccgttaaggt
tagaagaagg ctactttggt gtctattttc tcttccataa aaaaagcctg 6720actccacttc
ccgcgtttac tgattactag cgaagctgcg ggtgcatttt ttcaagataa 6780aggcatcccc
gattatattc tataccgatg tggattgcgc atactttgtg aacagaaagt 6840gatagcgttg
atgattcttc attggtcaga aaattatgaa cggtttcttc tattttgtct 6900ctatatacta
cgtataggaa atgtttacat tttcgtattg ttttcgattc actctatgaa 6960tagttcttac
tacaattttt ttgtctaaag agtaatacta gagataaaca taaaaaatgt 7020agaggtcgag
tttagatgca agttcaagga gcgaaaggtg gatgggtagg ttatataggg 7080atatagcaca
gagatatata gcaaagagat acttttgagc aat
7123
SEQ ID NO: 93; length: 9807; type: DNA; organism: Artificial Sequence; other information: Synthetic Nucleic Acid
Sequence 93:
ggttatatag
ggatatagca cagagatata tagcaaagag atacttttga gcaatgtttg 60tggaagcggt
attcgcaatt taattaacgc ttaccttggc cgttagacat catggtaaat 120ctgcgcagac
agccctgtgc agctgaaacg cggttacgta tagcttgcca tatgtctagc 180catacgtaac
cgcaggtaaa aggcatattt ttcgcgtgtc atggctagta aataacaccg 240gtgtcattta
gagtcaggga aagacaatga aaaacgaaga aagccaccgg gcggcaaccc 300gatgactttc
gcttatcacc cagcacacac ctgggagaaa tcacggtcat gagtttacag 360actcatgcgc
agaatgcgca cactaaaaca cctacccgcg tcgagcgcga ccgtggtgga 420ctggacaaca
ccccagcatc tgccagtgac cgcgaccttt tacgcgatca tctaggccgc 480gatgtactcc
acggttcagt cacacgagac tttaaaaagg cctatcgacg caacgctgac 540ggcacgaact
cgccgcgtat gtatcgcttc gagactgatg ctttaggacg gtgcgagtac 600gccatgctca
ccaccaagca gtacgccgcc gtcctggtcg tagacgttga ccaagtaggt 660accgcaggcg
gtgaccccgc agacttaaac ccgtacgtcc gcgacgtggt gcgctcactg 720attactcata
gcgtcgggcc agcctgggtg ggtattaacc caactaacgg caaagcccag 780ttcatatggc
ttattgaccc tgtctacgct gaccgtaacg gtaaatctgc gcagatgaag 840cttcttgcag
caaccacgcg tgtgctgggt gagcttttag accatgaccc gcacttttcc 900caccgcttta
gccgcaaccc gttctacaca ggcaaagccc ctaccgctta tcgttggtat 960aggcagcaca
accgggtgat gcgccttgga gacttgataa agcaggtaag ggatatggca 1020ggacacgacc
agttcaaccc caccccacgc cagcaattca gctctggccg cgaacttatc 1080aacgcggtca
agacccgccg tgaagaagcc caagcattca aagcactcgc ccaggacgta 1140gacgcggaaa
tcgccggtgg tctcgaccag tatgacccgg aacttatcga cggtgtgcgt 1200gtgctctgga
ttgtccaagg aaccgcagca cgcgacgaaa cagcctttag acatgcgctt 1260aagactggcc
accgcttgcg ccagcaaggc caacgcctga cagacgcagc aatcatcgac 1320gcctatgagc
acgcctacaa cgtcgcacac acccacggcg gtgcaggccg cgacaacgag 1380atgccaccca
tgcgcgaccg ccaaaccatg gcaaggcgcg tgcgcgggta tgtcgcccaa 1440tccaagagcg
agacctacag cggctctaac gcaccaggta aagccaccag cagcgagcgg 1500aaagccttgg
ccacgatggg acgcagaggc ggacaaaaag ccgcacaacg ctggaaaaca 1560gaccccgagg
gcaaatatgc gcaagcacaa aggtcgaagc ttgaaaagac gcaccgtaag 1620aaaaaggctc
aaggacgatc tacgaagtcc cgtattagcc aaatggtgaa cgatcagtat 1680ttccagacag
ggacagttcc cacgtgggct gaaatagggg cagaggtagg agtctctcgc 1740gccacggttg
ctaggcatgt cgcggagcta aagaagagcg gtgactatcc ggacgtttaa 1800ggggtctcat
accgtaagca atatacggtt cccctgccgt taggcagtta gataaaacct 1860cacttgaaga
aaaccttgag gggcagggca gcttatatgc ttcaaagcat gacttcctct 1920gttctcctag
acctcgcaac cctccgccat aacctcaccc tgctctgcga ggctggccgg 1980ctaccgccgg
cgtaacagat gagggcaagc ggatggctga tgaaaccaag ccgcggccgg 2040gaagccgatc
tcggcttgaa cgaattgtta ggtggcggta cttgggtcga tatcaaagtg 2100catcacttct
tcccgtatgc ccaactttgt atagagagcc actgcgggat cgtcaccgta 2160atctgcttgc
acgtagatca cataagcacc aagcgcgttg gcctcatgct tgaggagatt 2220gatgagcgcg
gtggcaatgc cctgcctccg gtgctcgccg gagactgcga gatcatagat 2280atagatctca
ctacgcggct gctcaaactt gggcagaacg taagccgcga gagcgccaac 2340aaccgcttct
tggtcgaagg cagcaagcgc gatgaatgtc ttactacgga gcaagttccc 2400gaggtaatcg
gagtccggct gatgttggga gtaggtggct acgtctccga actcacgacc 2460gaaaagatca
agagcagccc gcatggattt gacttggtca gggccgagcc tacatgtgcg 2520aatgatgccc
atacttgagc cacctaactt tgttttaggg cgactgccct gctgcgtaac 2580atcgttgctg
ctccataaca tcaaacatcg acccacggcg taacgcgctt gctgcttgga 2640tgcccgaggc
atagactgta caaaaaaaca gtcataacaa gccatgaaaa ccgccactgc 2700gccgttacca
ccgctgcgtt cggtcaaggt tctggaccag ttgcgtgagc gcatacgcta 2760cttgcattac
agtttacgaa ccgagtttaa acagctggtg acaattaatc atcggctcgt 2820ataatgtgtg
gaattgaatc gatataagga ggttaatcat gtgtctgtgg ttactactga 2880cgcacaggct
gcccatgccg ccggcgtctc gcgtcttctg gccagctacc gggcgatccc 2940gcccagcgcg
acagtgcgcc ttgcgaaacc gacgtccaac ctgttccgcg cccgcgcccg 3000caccaatgtg
aagggtctcg acgtctcggg cctgaccggt gtgatcggtg tcgacccgga 3060cgcgcgcacc
gccgatgtgg cgggcatgtg cacctacgag gacctggtgg cggccacgct 3120tccgtacggc
cttgccccac tggtggtgcc gcagctcaag accatcacgc tcggtggcgc 3180ggtcaccggt
ctgggcatcg agtccacgtc gttccgcaac ggtctgccgc acgaaagtgt 3240cctggagatg
gacatcttga ccggttcggg cgagatcgtc acggcctcac cggatcagca 3300ctcggatctg
ttccatgcgt tccccaattc atatggaacc cttggttatt ccacccggct 3360gcgcatcgaa
ctggagcccg tgcacccgtt tgtggcgttg cgccacctgc gctttcactc 3420gatcaccgat
ctggtcgcgg cgatggaccg gatcatcgag accggcgggc tggacggtga 3480acccgtcgac
tacctcgacg gcgtggtgtt cagcgcgact gagagttacc tgtgtgttgg 3540cttcaagacg
aaaacgccgg ggccggtcag cgattacaca ggtcagcaga tcttctaccg 3600gtcgatccag
catgacggcg acaccggcgc cgagaaacac gaccggctga ccatccacga 3660ctacctgtgg
cgctgggaca ccgactggtt ctggtgctca cgggcattcg gcgctcagca 3720tccggtgatc
cgcaggttct ggccgcggcg gctgcgccgc agcagcttct actggaagct 3780ggtggcctac
gaccagcggt acgacatcgc cgaccgtatc gagaagcgca acgggcgccc 3840gccgcgcgag
cgggtggtcc aggacgtcga ggtgcccatc gagcggtgcg cggacttcgt 3900cgagtggttc
ctgcagaatg tgccgatcga gccgatctgg ctgtgccccc tacggttgcg 3960tgacagcgcc
gacggcggtg cctcgtggcc cctgtatccg ctgaaggcgc accacaccta 4020cgtcaacatc
ggtttctggt catcagtgcc ggtgggcccc gaggagggcc acaccaaccg 4080cctcatcgag
aaaaaagtcg cggagctgga cgggcacaaa tctttgtact cggacgctta 4140ttacacacgt
gacgaattcg acgagctgta cggcggtgag gtctacaaca ccgtcaagaa 4200gacgtacgac
ccggattcac gtctgctaga cctgtattcg aaggcggtgc aaagacaatg 4260accacattca
aagaacgcga gacgtccaca gcggaccgca agctcaccct ggccgagatc 4320ctcgagatct
tcgccgcggg taaggagccg ctgaagttca ctgcgtacga cggcagctcg 4380gccggtcccg
aggacgccac gatgggtctg gacctcaaga ccccgcgtgg gaccacctat 4440ctggccacgg
cacccggcga tctgggcctg gcccgtgcgt atgtctccgg tgacctggag 4500ccgcacggcg
tgcatcccgg cgatccctac ccgctgctgc gcgccctggc cgaacgcatg 4560gagttcaagc
gcccgcctgc gcgtgtgctg gcgaacatcg tgcgctccat cggcatcgag 4620cacctcaagc
cgatcgcacc gccgccgcag gaggcgctgc cccggtggcg ccgcatcatg 4680gagggcctgc
ggcacagcaa gacccgcgac gccgaggcca tccaccacca ctacgacgtg 4740tcgaacacgt
tctacgagtg ggtgctgggc ccgtcgatga cctacacgtg cgcgtgctac 4800cccaccgagg
acgcgaccct cgaagaggcc caggacaaca agtaccgcct ggtgttcgag 4860aagctgcgcc
tgaagcccgg tgaccggttg ctcgacgtgg gctgcggctg gggcggcatg 4920gtccgctacg
cggcccgcca cggcgtcaag gcgctcggtg tcacgctcag ccgcgaacag 4980gcgacgtggg
cgcagaaggc catcgcccag gaaggtctca ccgatctggc cgaggtgcgt 5040cacggtgatt
accgcgacgt catcgaatcc gggttcgacg cggtgtcctc gatcgggctg 5100accgagcaca
tcggcgtgca caactacccg gcgtacttca acttcctcaa gtcgaagctg 5160cgcaccggtg
gcctgctgct caaccactgc atcacccgcc cggacaaccg gtcggcgcca 5220tcggccggcg
ggttcatcga caggtacgtg ttccccgacg gggagctcac cggctcgggc 5280cgcatcatca
ccgaggccca ggacgtgggc cttgaggtga tccacgagga gaacctacgc 5340aatcactatg
cgatgacgct gcgcgactgg tgccgcaacc tggtcgagca ctgggacgag 5400gcggtcgaag
aggtcgggct gcccaccgcg aaggtgtggg gcctgtacat ggccggctca 5460cgtctgggct
tcgagaccaa tgtggttcag ctgcaccagg ttctggcggt caagcttgac 5520gatcagggca
aggacggcgg actgccgttg cggccctggt ggtccgccta gcctcaaaat 5580atattttccc
tctatcttct cgttgcgctt aatttgacta attctcatta gcgaggcgcg 5640cctttccata
ggctccgccc ccctgacgag catcacaaaa atcgacgctc aagtcagagg 5700tggcgaaacc
cgacaggact ataaagatac caggcgtttc cccctggaag ctccctcgtg 5760cgctctcctg
ttccgaccct gccgcttacc ggatacctgt ccgcctttct cccttcggga 5820agcgtggcgc
tttctcatag ctcacgctgt aggtatctca gttcggtgta ggtcgttcgc 5880tccaagctgg
gctgtgtgca cgaacccccc gttcagcccg accgctgcgc cttatccggt 5940aactatcgtc
ttgagtccaa cccggtaaga cacgacttat cgccactggc agcagccact 6000ggtaacagga
ttagcagagc gaggtatgta ggcggtgcta cagagttctt gaagtggtgg 6060cctaactacg
gctacactag aagaacagta tttggtatct gcgctctgct gaagccagtt 6120accttcggaa
aaagagttgg tagctcttga tccggcaaac aaaccaccgc tggtagcggt 6180ggtttttttg
tttgcaagca gcagattacg cgcagaaaaa aaggatctca agaagatcct 6240ttgatctttt
ctacggggtc tgacgctcag tggaacgaaa actcacgtta agggattttg 6300gtcatgagat
tatcaaaaag gatcttcacc tagatccttt taaattaaaa atgaagtttt 6360aaatcaatct
aaagtatata tgagtaaact tggtctgaca gttaccaatg cttaatcagt 6420gaggcaccta
tctcagcgat ctgtctattt cgttcatcca tagttgcctg actccccgtc 6480gtgtagataa
ctacgatacg ggagggctta ccatctggcc ccagtgctgc aatgataccg 6540cgagacccac
gctcaccggc tccagattta tcagcaataa accagccagc cggaagggcc 6600gagcgcagaa
gtggtcctgc aactttatcc gcctccatcc agtctattaa ttgttgccgg 6660gaagctagag
taagtagttc gccagttaat agtttgcgca acgttgttgc cattgctaca 6720ggcatcgtgg
tgtcacgctc gtcgtttggt atggcttcat tcagctccgg ttcccaacga 6780tcaaggcgag
ttacatgatc ccccatgttg tgcaaaaaag cggttagctc cttcggtcct 6840ccgatcgttg
tcagaagtaa gttggccgca gtgttatcac tcatggttat ggcagcactg 6900cataattctc
ttactgtcat gccatccgta agatgctttt ctgtgactgg tgagtactca 6960accaagtcat
tctgagaata gtgtatgcgg cgaccgagtt gctcttgccc ggcgtcaata 7020cgggataata
ccgcgccaca tagcagaact ttaaaagtgc tcatcattgg aaaacgttct 7080tcggggcgaa
aactctcaag gatcttaccg ctgttgagat ccagttcgat gtaacccact 7140cgtgcaccca
actgatcttc agcatctttt actttcacca gcgtttctgg gtgagcaaaa 7200acaggaaggc
aaaatgccgc aaaaaaggga ataagggcga cacggaaatg ttgaatactc 7260atactcttcc
tttttcaata ttattgaagc atttatcagg gttattgtct catgagcgga 7320tacatatttg
aatgtattta gaaaaataaa cagcgatcgc gcggccgcgg gtaataactg 7380atataattaa
attgaagctc taatttgtga gtttagtata catgcattta cttataatac 7440agttttttag
ttttgctggc cgcatcttct caaatatgct tcccagcctg cttttctgta 7500acgttcaccc
tctaccttag catcccttcc ctttgcaaat agtcctcttc caacaataat 7560aatgtcagat
cctgtagaga ccacatcatc cacggttcta tactgttgac ccaatgcgtc 7620tcccttgtca
tctaaaccca caccgggtgt cataatcaac caatcgtaac cttcatctct 7680tccacccatg
tctctttgag caataaagcc gataacaaaa tctttgtcgc tcttcgcaat 7740gtcaacagta
cccttagtat attctccagt agctagggag cccttgcatg acaattctgc 7800taacatcaaa
aggcctctag gttcctttgt tacttcttcc gccgcctgct tcaaaccgct 7860aacaatacct
gggcccacca caccgtgtgc attcgtaatg tctgcccatt ctgctattct 7920gtatacaccc
gcagagtact gcaatttgac tgtattacca atgtcagcaa attttctgtc 7980ttcgaagagt
aaaaaattgt acttggcgga taatgccttt agcggcttaa ctgtgccctc 8040catggaaaaa
tcagtcaaga tatccacatg tgtttttagt aaacaaattt tgggacctaa 8100tgcttcaact
aactccagta attccttggt ggtacgaaca tccaatgaag cacacaagtt 8160tgtttgcttt
tcgtgcatga tattaaatag cttggcagca acaggactag gatgagtagc 8220agcacgttcc
ttatatgtag ctttcgacat gatttatctt cgtttcctgc aggtttttgt 8280tctgtgcagt
tgggttaaga atactgggca atttcatgtt tcttcaacac cacatatgcg 8340tatatatacc
aatctaagtc tgtgctcctt ccttcgttct tccttctgct cggagattac 8400cgaatcaaag
ctagcttatc gatgataagc tgtcaaagat gagaattaat tccacggact 8460atagactata
ctagatactc cgtctactgt acgatacact tccgctcagg tccttgtcct 8520ttaacgaggc
cttaccactc ttttgttact ctattgatcc agctcagcaa aggcagtgtg 8580atctaagatt
ctatcttcgc gatgtagtaa aactagctag accgagaaag agactagaaa 8640tgcaaaaggc
acttctacaa tggctgccat cattattatc cgatgtgacg ctgcagcttc 8700tcaatgatat
tcgaatacgc tttgaggaga tacagcctaa tatccgacaa actgttttac 8760agatttacga
tcgtacttgt tacccatcat tgaattttga acatccgaac ctgggagttt 8820tccctgaaac
agatagtata tttgaacctg tataataata tatagtctag cgctttacgg 8880aagacaatgt
atgtatttcg gttcctggag aaactattgc atctattgca taggtaatct 8940tgcacgtcgc
atccccggtt cattttctgc gtttccatct tgcacttcaa tagcatatct 9000ttgttaacga
agcatctgtg cttcattttg tagaacaaaa atgcaacgcg agagcgctaa 9060tttttcaaac
aaagaatctg agctgcattt ttacagaaca gaaatgcaac gcgaaagcgc 9120tattttacca
acgaagaatc tgtgcttcat ttttgtaaaa caaaaatgca acgcgacgag 9180agcgctaatt
tttcaaacaa agaatctgag ctgcattttt acagaacaga aatgcaacgc 9240gagagcgcta
ttttaccaac aaagaatcta tacttctttt ttgttctaca aaaatgcatc 9300ccgagagcgc
tatttttcta acaaagcatc ttagattact ttttttctcc tttgtgcgct 9360ctataatgca
gtctcttgat aactttttgc actgtaggtc cgttaaggtt agaagaaggc 9420tactttggtg
tctattttct cttccataaa aaaagcctga ctccacttcc cgcgtttact 9480gattactagc
gaagctgcgg gtgcattttt tcaagataaa ggcatccccg attatattct 9540ataccgatgt
ggattgcgca tactttgtga acagaaagtg atagcgttga tgattcttca 9600ttggtcagaa
aattatgaac ggtttcttct attttgtctc tatatactac gtataggaaa 9660tgtttacatt
ttcgtattgt tttcgattca ctctatgaat agttcttact acaatttttt 9720tgtctaaaga
gtaatactag agataaacat aaaaaatgta gaggtcgagt ttagatgcaa 9780gttcaaggag
cgaaaggtgg atgggta 9807

SEQ ID NO: 94, length 10293, DNA, Artificial Sequence, Synthetic Nucleic Acid
gtttgtggaa
gcggtattcg caatttaatt aacgcttacc ttggccgtta gacatcatgg 60taaatctgcg
cagacagccc tgtgcagctg aaacgcggtt acgtatagct tgccatatgt 120ctagccatac
gtaaccgcag gtaaaaggca tatttttcgc gtgtcatggc tagtaaataa 180caccggtgtc
atttagagtc agggaaagac aatgaaaaac gaagaaagcc accgggcggc 240aacccgatga
ctttcgctta tcacccagca cacacctggg agaaatcacg gtcatgagtt 300tacagactca
tgcgcagaat gcgcacacta aaacacctac ccgcgtcgag cgcgaccgtg 360gtggactgga
caacacccca gcatctgcca gtgaccgcga ccttttacgc gatcatctag 420gccgcgatgt
actccacggt tcagtcacac gagactttaa aaaggcctat cgacgcaacg 480ctgacggcac
gaactcgccg cgtatgtatc gcttcgagac tgatgcttta ggacggtgcg 540agtacgccat
gctcaccacc aagcagtacg ccgccgtcct ggtcgtagac gttgaccaag 600taggtaccgc
aggcggtgac cccgcagact taaacccgta cgtccgcgac gtggtgcgct 660cactgattac
tcatagcgtc gggccagcct gggtgggtat taacccaact aacggcaaag 720cccagttcat
atggcttatt gaccctgtct acgctgaccg taacggtaaa tctgcgcaga 780tgaagcttct
tgcagcaacc acgcgtgtgc tgggtgagct tttagaccat gacccgcact 840tttcccaccg
ctttagccgc aacccgttct acacaggcaa agcccctacc gcttatcgtt 900ggtataggca
gcacaaccgg gtgatgcgcc ttggagactt gataaagcag gtaagggata 960tggcaggaca
cgaccagttc aaccccaccc cacgccagca attcagctct ggccgcgaac 1020ttatcaacgc
ggtcaagacc cgccgtgaag aagcccaagc attcaaagca ctcgcccagg 1080acgtagacgc
ggaaatcgcc ggtggtctcg accagtatga cccggaactt atcgacggtg 1140tgcgtgtgct
ctggattgtc caaggaaccg cagcacgcga cgaaacagcc tttagacatg 1200cgcttaagac
tggccaccgc ttgcgccagc aaggccaacg cctgacagac gcagcaatca 1260tcgacgccta
tgagcacgcc tacaacgtcg cacacaccca cggcggtgca ggccgcgaca 1320acgagatgcc
acccatgcgc gaccgccaaa ccatggcaag gcgcgtgcgc gggtatgtcg 1380cccaatccaa
gagcgagacc tacagcggct ctaacgcacc aggtaaagcc accagcagcg 1440agcggaaagc
cttggccacg atgggacgca gaggcggaca aaaagccgca caacgctgga 1500aaacagaccc
cgagggcaaa tatgcgcaag cacaaaggtc gaagcttgaa aagacgcacc 1560gtaagaaaaa
ggctcaagga cgatctacga agtcccgtat tagccaaatg gtgaacgatc 1620agtatttcca
gacagggaca gttcccacgt gggctgaaat aggggcagag gtaggagtct 1680ctcgcgccac
ggttgctagg catgtcgcgg agctaaagaa gagcggtgac tatccggacg 1740tttaaggggt
ctcataccgt aagcaatata cggttcccct gccgttaggc agttagataa 1800aacctcactt
gaagaaaacc ttgaggggca gggcagctta tatgcttcaa agcatgactt 1860cctctgttct
cctagacctc gcaaccctcc gccataacct caccctgctc tgcgaggctg 1920gccggctacc
gccggcgtaa cagatgaggg caagcggatg gctgatgaaa ccaagccgcg 1980gccgggaagc
cgatctcggc ttgaacgaat tgttaggtgg cggtacttgg gtcgatatca 2040aagtgcatca
cttcttcccg tatgcccaac tttgtataga gagccactgc gggatcgtca 2100ccgtaatctg
cttgcacgta gatcacataa gcaccaagcg cgttggcctc atgcttgagg 2160agattgatga
gcgcggtggc aatgccctgc ctccggtgct cgccggagac tgcgagatca 2220tagatataga
tctcactacg cggctgctca aacttgggca gaacgtaagc cgcgagagcg 2280ccaacaaccg
cttcttggtc gaaggcagca agcgcgatga atgtcttact acggagcaag 2340ttcccgaggt
aatcggagtc cggctgatgt tgggagtagg tggctacgtc tccgaactca 2400cgaccgaaaa
gatcaagagc agcccgcatg gatttgactt ggtcagggcc gagcctacat 2460gtgcgaatga
tgcccatact tgagccacct aactttgttt tagggcgact gccctgctgc 2520gtaacatcgt
tgctgctcca taacatcaaa catcgaccca cggcgtaacg cgcttgctgc 2580ttggatgccc
gaggcataga ctgtacaaaa aaacagtcat aacaagccat gaaaaccgcc 2640actgcgccgt
taccaccgct gcgttcggtc aaggttctgg accagttgcg tgagcgcata 2700cgctacttgc
attacagttt acgaaccgag tttaaacagc tggtgacaat taatcatcgg 2760ctcgtataat
gtgtggaatt gaatcgatat aaggaggtta atcatgtgaa ctgtcagtct 2820tccgcgtcca
acctcgccaa ccacatcaac gcggtgtacg agctgcgccg cgcctatgcg 2880cggctgtccg
ccgacaagcc ggtgcgcctg gcgaagacca cctccaacct cttccgcttc 2940cgcagccggg
acgatgccgc gcgtctcgac gtcagcgctt tcacctcggt gatcagcatc 3000gacacggagg
cgcgggtcgc ggaggtgggc ggcatgacca cctacgagga cctggtcgcc 3060gccaccctgc
ggcatggcct gatgccgccg gtggttccgc aactgcgcac gatcaccctg 3120ggcggtgcgg
tcaccgggct ggggatcgaa tcctcgtcct tccgcaacgg gctcccgcac 3180gagtcagtgg
aagagatgga gatcctcacc ggcagcggcc aggtggtggt ggcccggcgc 3240gacaacgagc
accgcgacct gttctacggt ttccccaact cgtacggcac cctcggttac 3300gcgctgcggc
tccgcatcca gctcgaaccg gtccgcccct acgtccacct gcggcacctg 3360cggttcaccg
atgccgcagc ggccatggcc gcgctggagc agatctgcgc ggaccgcacc 3420cacgacgggg
agaccgtcga cttcgtcgac ggcgtcgtgt tcgcccgcaa cgagctgtac 3480ctgaccttgg
ggacgttcac cgaccgggct ccgtggacca gcgactacac cggaaccgac 3540atctactacc
ggtcgatccc ccgctacgcg ggccccggcc ccggcgacta cctcaccacg 3600cacgactacc
tgtggcggtg ggacaccgac tggttctggt gctcccgcgc cttcggactg 3660cagcatcccg
tggtgcgccg cctgtggccg cgttccttga aacgctccga cgtctaccgc 3720aagctcgtcg
cctgggaccg gcgcactgac gcgagccgcc tgctcgacta ctaccgcggg 3780cgcccgccca
aggaaccggt gatccaggac atcgaggttg aggtggggcg ggctgccgag 3840ttcctcgact
tcttccacac cgagatcggc atgtccccgg tgtggctgtg cccgctgcgg 3900ctgcgagaag
acacagccga cgatacggaa ccggtctggc cgctctaccc cctcaaaccc 3960cgccgcctct
acgtcaactt cgggttttgg ggcctcgttc cgatccgtcc cggtggaggc 4020aggacatacc
acaaccggct gatcgaaaaa gaagtgaccc ggttgggcgg gcacaagtcg 4080ctctactcgg
acgccttcta cgacgaggac gagttctggg agctctacaa cggggagatc 4140taccgcaagc
tcaaagctgc ctacgacccc gacggtcgac tgctcgacct gtacaccaag 4200tgcgtcggcg
gcgggtgaga aaggatgagg gatgcgactg gcggaggtat tcgaacgtgt 4260cgtcggaccc
gatgcgcccg tccacttccg ggcctacgac ggcagcactg cgggagatcc 4320acgcagtgaa
gtcgctatcg tggttcgcca cccggcagcc gtcaactaca tcgtccaagc 4380gccgggagca
ctcggtttga cccgcgccta cgtggcggga tacctcgacg tcgaagggga 4440catgtacacc
gcgctgcggg caatggccga cgtggtgttc caggaccggc cgcggctgtc 4500ccccggggaa
ctgctgcgga tcatccgcgg gatcgggtgg gtgaagttcg tcaaccggct 4560tccaccgccg
ccgcaggagg tgcgccagtc ccgcctcgcc gccctgggct ggcgccactc 4620caagcagcgc
gacgccgaag ccatccagca ccactacgac gtctccaacg ccttctacgc 4680cctggtcttg
ggcgagtcga tgacctacac ctgcgcggtc tacccgaccg agcaggccac 4740gctggagcag
gcacagttct tcaagcacga gctgatcgcc cgcaagctcg gtcttgcccc 4800tgggatacga
ctgctggatg tggggtgcgg ctggggcggc atggtcatcc acgcggcccg 4860ggagcacggg
gtcaaagccc tgggggtgac cctgtccaaa gagcaggctg agtgggcgca 4920gaagcggatc
gcccacgagg gcctgggcga cctggcagaa gtccggcaca tggactaccg 4980ggacctgccc
gacggcgagt acgacgcgat cagctcgatc gggttgaccg agcacgtcgg 5040caaaaagaac
gtgcccgcct acttcgcgtc gctgtaccgc aagctcgtcc cgggaggccg 5100cctgctcaac
cactgcatca cccggccccg caacgacctg ccgcccttca aacgcggcgg 5160ggtgatcaac
cgctacgtct tccccgatgg ggagctggaa gggcccggct ggctgcaggc 5220ggcgatgaac
gacgccgggt tcgaaatccg ccaccaggag aacctgcggg agcactacgc 5280acggaccctg
cgggactggc tggccaacct ggaccgcaac tgggatgccg cggtgcggga 5340agtgggggag
ggcacggccc gagtgtggcg gctctacatg gccgggtgcg tgctcggctt 5400cgaacgcaac
gtggtgcaac tgcaccagat cctcggggtg aagctcgacg ggaccgaggc 5460gcggatgccg
ctgcgccccg acttcgaacc gccgctgcct taaccgcggt gcacagccgg 5520gggatatcag
tcgcggaacc gggcatgatg agcccatggc tgcgaccgat gacgaccggc 5580accacaccac
cgtcgccctc gacctcatcg acgcgtatgt gcgcgccgac cgcagaatga 5640tcggtgaacg
ttccgcgggg atcagcgcgg aggcggggga gcggatcgtc tccaccctga 5700aagtgtgcgc
ggccttcctt gcccgccggg tccaggagac cggggtgccg tggcgcgcag 5760cggactcccg
ggaagcggtc gcccgcaccg tcgccgacct gctggaaccc gaggtggaat 5820tcgcggtcgt
ctccgcctgg gaggcgtacg cgatcgggga gcacgaggcc gcctgggtcc 5880gggcgcacgg
cgatccgctg gtcttcgtcc acatgctggc cgcgttctcc gctgctatcg 5940gcacagcggt
ctacggccgt gaggagctgc tgcccacgct gcgcagggtg acagcacgat 6000aacctcaaaa
tatattttcc ctctatcttc tcgttgcgct taatttgact aattctcatt 6060agcgaggcgc
gcctttccat aggctccgcc cccctgacga gcatcacaaa aatcgacgct 6120caagtcagag
gtggcgaaac ccgacaggac tataaagata ccaggcgttt ccccctggaa 6180gctccctcgt
gcgctctcct gttccgaccc tgccgcttac cggatacctg tccgcctttc 6240tcccttcggg
aagcgtggcg ctttctcata gctcacgctg taggtatctc agttcggtgt 6300aggtcgttcg
ctccaagctg ggctgtgtgc acgaaccccc cgttcagccc gaccgctgcg 6360ccttatccgg
taactatcgt cttgagtcca acccggtaag acacgactta tcgccactgg 6420cagcagccac
tggtaacagg attagcagag cgaggtatgt aggcggtgct acagagttct 6480tgaagtggtg
gcctaactac ggctacacta gaagaacagt atttggtatc tgcgctctgc 6540tgaagccagt
taccttcgga aaaagagttg gtagctcttg atccggcaaa caaaccaccg 6600ctggtagcgg
tggttttttt gtttgcaagc agcagattac gcgcagaaaa aaaggatctc 6660aagaagatcc
tttgatcttt tctacggggt ctgacgctca gtggaacgaa aactcacgtt 6720aagggatttt
ggtcatgaga ttatcaaaaa ggatcttcac ctagatcctt ttaaattaaa 6780aatgaagttt
taaatcaatc taaagtatat atgagtaaac ttggtctgac agttaccaat 6840gcttaatcag
tgaggcacct atctcagcga tctgtctatt tcgttcatcc atagttgcct 6900gactccccgt
cgtgtagata actacgatac gggagggctt accatctggc cccagtgctg 6960caatgatacc
gcgagaccca cgctcaccgg ctccagattt atcagcaata aaccagccag 7020ccggaagggc
cgagcgcaga agtggtcctg caactttatc cgcctccatc cagtctatta 7080attgttgccg
ggaagctaga gtaagtagtt cgccagttaa tagtttgcgc aacgttgttg 7140ccattgctac
aggcatcgtg gtgtcacgct cgtcgtttgg tatggcttca ttcagctccg 7200gttcccaacg
atcaaggcga gttacatgat cccccatgtt gtgcaaaaaa gcggttagct 7260ccttcggtcc
tccgatcgtt gtcagaagta agttggccgc agtgttatca ctcatggtta 7320tggcagcact
gcataattct cttactgtca tgccatccgt aagatgcttt tctgtgactg 7380gtgagtactc
aaccaagtca ttctgagaat agtgtatgcg gcgaccgagt tgctcttgcc 7440cggcgtcaat
acgggataat accgcgccac atagcagaac tttaaaagtg ctcatcattg 7500gaaaacgttc
ttcggggcga aaactctcaa ggatcttacc gctgttgaga tccagttcga 7560tgtaacccac
tcgtgcaccc aactgatctt cagcatcttt tactttcacc agcgtttctg 7620ggtgagcaaa
aacaggaagg caaaatgccg caaaaaaggg aataagggcg acacggaaat 7680gttgaatact
catactcttc ctttttcaat attattgaag catttatcag ggttattgtc 7740tcatgagcgg
atacatattt gaatgtattt agaaaaataa acagcgatcg cgcggccgcg 7800ggtaataact
gatataatta aattgaagct ctaatttgtg agtttagtat acatgcattt 7860acttataata
cagtttttta gttttgctgg ccgcatcttc tcaaatatgc ttcccagcct 7920gcttttctgt
aacgttcacc ctctacctta gcatcccttc cctttgcaaa tagtcctctt 7980ccaacaataa
taatgtcaga tcctgtagag accacatcat ccacggttct atactgttga 8040cccaatgcgt
ctcccttgtc atctaaaccc acaccgggtg tcataatcaa ccaatcgtaa 8100ccttcatctc
ttccacccat gtctctttga gcaataaagc cgataacaaa atctttgtcg 8160ctcttcgcaa
tgtcaacagt acccttagta tattctccag tagctaggga gcccttgcat 8220gacaattctg
ctaacatcaa aaggcctcta ggttcctttg ttacttcttc cgccgcctgc 8280ttcaaaccgc
taacaatacc tgggcccacc acaccgtgtg cattcgtaat gtctgcccat 8340tctgctattc
tgtatacacc cgcagagtac tgcaatttga ctgtattacc aatgtcagca 8400aattttctgt
cttcgaagag taaaaaattg tacttggcgg ataatgcctt tagcggctta 8460actgtgccct
ccatggaaaa atcagtcaag atatccacat gtgtttttag taaacaaatt 8520ttgggaccta
atgcttcaac taactccagt aattccttgg tggtacgaac atccaatgaa 8580gcacacaagt
ttgtttgctt ttcgtgcatg atattaaata gcttggcagc aacaggacta 8640ggatgagtag
cagcacgttc cttatatgta gctttcgaca tgatttatct tcgtttcctg 8700caggtttttg
ttctgtgcag ttgggttaag aatactgggc aatttcatgt ttcttcaaca 8760ccacatatgc
gtatatatac caatctaagt ctgtgctcct tccttcgttc ttccttctgc 8820tcggagatta
ccgaatcaaa gctagcttat cgatgataag ctgtcaaaga tgagaattaa 8880ttccacggac
tatagactat actagatact ccgtctactg tacgatacac ttccgctcag 8940gtccttgtcc
tttaacgagg ccttaccact cttttgttac tctattgatc cagctcagca 9000aaggcagtgt
gatctaagat tctatcttcg cgatgtagta aaactagcta gaccgagaaa 9060gagactagaa
atgcaaaagg cacttctaca atggctgcca tcattattat ccgatgtgac 9120gctgcagctt
ctcaatgata ttcgaatacg ctttgaggag atacagccta atatccgaca 9180aactgtttta
cagatttacg atcgtacttg ttacccatca ttgaattttg aacatccgaa 9240cctgggagtt
ttccctgaaa cagatagtat atttgaacct gtataataat atatagtcta 9300gcgctttacg
gaagacaatg tatgtatttc ggttcctgga gaaactattg catctattgc 9360ataggtaatc
ttgcacgtcg catccccggt tcattttctg cgtttccatc ttgcacttca 9420atagcatatc
tttgttaacg aagcatctgt gcttcatttt gtagaacaaa aatgcaacgc 9480gagagcgcta
atttttcaaa caaagaatct gagctgcatt tttacagaac agaaatgcaa 9540cgcgaaagcg
ctattttacc aacgaagaat ctgtgcttca tttttgtaaa acaaaaatgc 9600aacgcgacga
gagcgctaat ttttcaaaca aagaatctga gctgcatttt tacagaacag 9660aaatgcaacg
cgagagcgct attttaccaa caaagaatct atacttcttt tttgttctac 9720aaaaatgcat
cccgagagcg ctatttttct aacaaagcat cttagattac tttttttctc 9780ctttgtgcgc
tctataatgc agtctcttga taactttttg cactgtaggt ccgttaaggt 9840tagaagaagg
ctactttggt gtctattttc tcttccataa aaaaagcctg actccacttc 9900ccgcgtttac
tgattactag cgaagctgcg ggtgcatttt ttcaagataa aggcatcccc 9960gattatattc
tataccgatg tggattgcgc atactttgtg aacagaaagt gatagcgttg 10020atgattcttc
attggtcaga aaattatgaa cggtttcttc tattttgtct ctatatacta 10080cgtataggaa
atgtttacat tttcgtattg ttttcgattc actctatgaa tagttcttac 10140tacaattttt
ttgtctaaag agtaatacta gagataaaca taaaaaatgt agaggtcgag 10200tttagatgca
agttcaagga gcgaaaggtg gatgggtagg ttatataggg atatagcaca 10260gagatatata
gcaaagagat acttttgagc aat 10293

SEQ ID NO: 95, length 5654, DNA, Artificial Sequence, Synthetic Nucleic Acid
tgggtaggtt
atatagggat atagcacaga gatatatagc aaagagatac ttttgagcaa 60tgtttgtgga
agcggtattc gcaatttaat taaagctggt gacaattaat catcggctcg 120tataatgtgt
ggaattgaat cgatataagg aggttaatca tatgacgctg gccaaggtct 180tcgaggagct
ggtcggggcg gacgcccctg tggagctcac cgcctacgac ggatcgagag 240ccggacgcct
gggcagtgat ctgcgggtcc acgtgaagtc gccgtacgcg gtgtcctacc 300tggtgcactc
gccgagcgcg ctcgggctgg cccgcgcgta cgtggccggg cacctggacg 360cctacggcga
catgtacacg ctgctgcggg agatgacgca gctgaccgag gcgctgacgc 420ccaaggcccg
gctgcggctg ctggccggtg tcctgcagga tccgctgctg cgcgcggcgg 480ccagccgccg
tctgccgccc ccgccgcagg aggtgcggac cggccgcacc tcctggttcc 540ggcacaccaa
gcggcgggac gccaaggcca tctcccacca ctacgacgtg tccaacacct 600tctatgagtg
ggtgctgggc ccgtcgatga cctacacctg cgcctgtttc cccaccgagg 660acgccacctt
ggaggaggcg cagttccaca agcacgacct ggtcgccaag aagctcgggc 720tgcggccggg
catgcggctg ctggacgtgg gctgcggctg gggcggcatg gtgatgcacg 780ccgccaagca
ctacggggtg cgggcgctgg gcgtcacgct gtccaagcag caggccgagt 840gggcgcagaa
ggccatcgcc gaggcgggcc tgagcgacct ggccgaggtc cgccaccagg 900actaccggga
cgtcaccgag ggcgacttcg acgccatcag ctcgatcggc ctcaccgagc 960acatcggcaa
ggccaacctg ccgtcctact tcggcttcct gtacggcaag ctcaagccgg 1020gcgggcggct
gctcaaccac tgcatcaccc ggcccgacaa cacccagccg gccatgaaga 1080aggacgggtt
catcaaccgg tacgtcttcc ccgacgggga gctggagggg cccggctacc 1140tgcagaccca
gatgaacgac gccggttttg agatccgcca ccaggagaac ctgcgcgagc 1200actacgcccg
caccctggcc ggatggtgcc gcaacctcga tgagcactgg gacgaggcgg 1260tggccgaggt
cggcgagggc accgcgcggg tgtggcggct gtacatggcc ggcagccggc 1320tcggtttcga
gctcaactgg atccagctgc accagatcct gggcgtcaag ctcggcgagc 1380gcggcgagtc
ccgcatgccg ttgcggcccg actggggcgt gtgacctcaa aatatatttt 1440ccctctatct
tctcgttgcg cttaatttga ctaattctca ttagcgaggc gcgcctttcc 1500ataggctccg
cccccctgac gagcatcaca aaaatcgacg ctcaagtcag aggtggcgaa 1560acccgacagg
actataaaga taccaggcgt ttccccctgg aagctccctc gtgcgctctc 1620ctgttccgac
cctgccgctt accggatacc tgtccgcctt tctcccttcg ggaagcgtgg 1680cgctttctca
tagctcacgc tgtaggtatc tcagttcggt gtaggtcgtt cgctccaagc 1740tgggctgtgt
gcacgaaccc cccgttcagc ccgaccgctg cgccttatcc ggtaactatc 1800gtcttgagtc
caacccggta agacacgact tatcgccact ggcagcagcc actggtaaca 1860ggattagcag
agcgaggtat gtaggcggtg ctacagagtt cttgaagtgg tggcctaact 1920acggctacac
tagaagaaca gtatttggta tctgcgctct gctgaagcca gttaccttcg 1980gaaaaagagt
tggtagctct tgatccggca aacaaaccac cgctggtagc ggtggttttt 2040ttgtttgcaa
gcagcagatt acgcgcagaa aaaaaggatc tcaagaagat cctttgatct 2100tttctacggg
gtctgacgct cagtggaacg aaaactcacg ttaagggatt ttggtcatga 2160gattatcaaa
aaggatcttc acctagatcc ttttaaatta aaaatgaagt tttaaatcaa 2220tctaaagtat
atatgagtaa acttggtctg acagttacca atgcttaatc agtgaggcac 2280ctatctcagc
gatctgtcta tttcgttcat ccatagttgc ctgactcccc gtcgtgtaga 2340taactacgat
acgggagggc ttaccatctg gccccagtgc tgcaatgata ccgcgagacc 2400cacgctcacc
ggctccagat ttatcagcaa taaaccagcc agccggaagg gccgagcgca 2460gaagtggtcc
tgcaacttta tccgcctcca tccagtctat taattgttgc cgggaagcta 2520gagtaagtag
ttcgccagtt aatagtttgc gcaacgttgt tgccattgct acaggcatcg 2580tggtgtcacg
ctcgtcgttt ggtatggctt cattcagctc cggttcccaa cgatcaaggc 2640gagttacatg
atcccccatg ttgtgcaaaa aagcggttag ctccttcggt cctccgatcg 2700ttgtcagaag
taagttggcc gcagtgttat cactcatggt tatggcagca ctgcataatt 2760ctcttactgt
catgccatcc gtaagatgct tttctgtgac tggtgagtac tcaaccaagt 2820cattctgaga
atagtgtatg cggcgaccga gttgctcttg cccggcgtca atacgggata 2880ataccgcgcc
acatagcaga actttaaaag tgctcatcat tggaaaacgt tcttcggggc 2940gaaaactctc
aaggatctta ccgctgttga gatccagttc gatgtaaccc actcgtgcac 3000ccaactgatc
ttcagcatct tttactttca ccagcgtttc tgggtgagca aaaacaggaa 3060ggcaaaatgc
cgcaaaaaag ggaataaggg cgacacggaa atgttgaata ctcatactct 3120tcctttttca
atattattga agcatttatc agggttattg tctcatgagc ggatacatat 3180ttgaatgtat
ttagaaaaat aaacagcgat cgcgcggccg cgggtaataa ctgatataat 3240taaattgaag
ctctaatttg tgagtttagt atacatgcat ttacttataa tacagttttt 3300tagttttgct
ggccgcatct tctcaaatat gcttcccagc ctgcttttct gtaacgttca 3360ccctctacct
tagcatccct tccctttgca aatagtcctc ttccaacaat aataatgtca 3420gatcctgtag
agaccacatc atccacggtt ctatactgtt gacccaatgc gtctcccttg 3480tcatctaaac
ccacaccggg tgtcataatc aaccaatcgt aaccttcatc tcttccaccc 3540atgtctcttt
gagcaataaa gccgataaca aaatctttgt cgctcttcgc aatgtcaaca 3600gtacccttag
tatattctcc agtagctagg gagcccttgc atgacaattc tgctaacatc 3660aaaaggcctc
taggttcctt tgttacttct tccgccgcct gcttcaaacc gctaacaata 3720cctgggccca
ccacaccgtg tgcattcgta atgtctgccc attctgctat tctgtataca 3780cccgcagagt
actgcaattt gactgtatta ccaatgtcag caaattttct gtcttcgaag 3840agtaaaaaat
tgtacttggc ggataatgcc tttagcggct taactgtgcc ctccatggaa 3900aaatcagtca
agatatccac atgtgttttt agtaaacaaa ttttgggacc taatgcttca 3960actaactcca
gtaattcctt ggtggtacga acatccaatg aagcacacaa gtttgtttgc 4020ttttcgtgca
tgatattaaa tagcttggca gcaacaggac taggatgagt agcagcacgt 4080tccttatatg
tagctttcga catgatttat cttcgtttcc tgcaggtttt tgttctgtgc 4140agttgggtta
agaatactgg gcaatttcat gtttcttcaa caccacatat gcgtatatat 4200accaatctaa
gtctgtgctc cttccttcgt tcttccttct gctcggagat taccgaatca 4260aagctagctt
atcgatgata agctgtcaaa gatgagaatt aattccacgg actatagact 4320atactagata
ctccgtctac tgtacgatac acttccgctc aggtccttgt cctttaacga 4380ggccttacca
ctcttttgtt actctattga tccagctcag caaaggcagt gtgatctaag 4440attctatctt
cgcgatgtag taaaactagc tagaccgaga aagagactag aaatgcaaaa 4500ggcacttcta
caatggctgc catcattatt atccgatgtg acgctgcagc ttctcaatga 4560tattcgaata
cgctttgagg agatacagcc taatatccga caaactgttt tacagattta 4620cgatcgtact
tgttacccat cattgaattt tgaacatccg aacctgggag ttttccctga 4680aacagatagt
atatttgaac ctgtataata atatatagtc tagcgcttta cggaagacaa 4740tgtatgtatt
tcggttcctg gagaaactat tgcatctatt gcataggtaa tcttgcacgt 4800cgcatccccg
gttcattttc tgcgtttcca tcttgcactt caatagcata tctttgttaa 4860cgaagcatct
gtgcttcatt ttgtagaaca aaaatgcaac gcgagagcgc taatttttca 4920aacaaagaat
ctgagctgca tttttacaga acagaaatgc aacgcgaaag cgctatttta 4980ccaacgaaga
atctgtgctt catttttgta aaacaaaaat gcaacgcgac gagagcgcta 5040atttttcaaa
caaagaatct gagctgcatt tttacagaac agaaatgcaa cgcgagagcg 5100ctattttacc
aacaaagaat ctatacttct tttttgttct acaaaaatgc atcccgagag 5160cgctattttt
ctaacaaagc atcttagatt actttttttc tcctttgtgc gctctataat 5220gcagtctctt
gataactttt tgcactgtag gtccgttaag gttagaagaa ggctactttg 5280gtgtctattt
tctcttccat aaaaaaagcc tgactccact tcccgcgttt actgattact 5340agcgaagctg
cgggtgcatt ttttcaagat aaaggcatcc ccgattatat tctataccga 5400tgtggattgc
gcatactttg tgaacagaaa gtgatagcgt tgatgattct tcattggtca 5460gaaaattatg
aacggtttct tctattttgt ctctatatac tacgtatagg aaatgtttac 5520attttcgtat
tgttttcgat tcactctatg aatagttctt actacaattt ttttgtctaa 5580agagtaatac
tagagataaa cataaaaaat gtagaggtcg agtttagatg caagttcaag 5640gagcgaaagg
tgga 5654

SEQ ID NO: 96, length 5759, DNA, Artificial Sequence, Synthetic Nucleic Acid
tgggtaggtt
atatagggat atagcacaga gatatatagc aaagagatac ttttgagcaa 60tgtttgtgga
agcggtattc gcaatttaat taaagctggt gacaattaat catcggctcg 120tataatgtgt
ggaattgaat cgatataagg aggttaatca tatgtcacag ctggcggtca 180cagaccacca
cgagcgagcg gtcgaggcgc tgcgcaggtc gtatgcggcg atcccgccgg 240gcacaccggt
ccgcttggcc aagcagacct ccaacctgtt ccgcttccgc gagccgacgg 300ccgcgcccgg
cctggacgtg tccggcttca accgggtgct ggcggtggac ccggatgcgc 360gcaccgccga
cgtgcagggc atgaccacct acgaggacct ggtcgacgcc accctgccgc 420acgggctgat
gccgctggtg gtgccccagc tcaagacgat cacgctgggc ggggcggtga 480ccggcctggg
catcgagtcc acctccttcc gcaacggcct gccgcacgag tcggtgctgg 540agatgcagat
catcaccggc gccggcgaag tggtcaccgc caccccggac ggggagcact 600ccgacctgtt
ctggggcttc cccaactcct acgggacgct ggggtacgcc ctgaagctga 660agatcgaact
ggagccggtc aagccgtacg tccggctgcg gcacctgcgc ttcgacgacg 720ccggcgagtg
cgccgccaag ctcgccgagc tgagcgaaag ccgcgagcac gagggcgatg 780aggtgcactt
tttggacggc accttcttcg ggccgcgcga gatgtacctg acgctcggca 840cgttcaccga
caccgccccc tatgtgtcgg actacaccgg gcagcacatc tactaccggt 900cgatccagca
gcggtcgatc gactttttga ccatccgcga ctacctgtgg cgctgggaca 960ccgactggtt
ctggtgctcg cgcgccctgg gcgtgcagaa cccgctgatc cggcgggtgt 1020ggccgaagag
cgccaagcgg tcggatgtgt accgcaagct ggtggcctac gaaaagcgct 1080accagttcaa
ggcgcgcatc gaccggtgga cgggcaagcc gccgcgcgag gacgtcatcc 1140aggacatcga
ggtgccggca gaacgcctgc cggagttcct ggagttcttc cacgacaaga 1200tcgggatgag
cccggtgtgg ctgtgcccgc tgcgggcgcg ccaccgctgg ccgctgtacc 1260cgctcaagcc
cggcgtcacc tacgtcaacg ccggcttctg ggggacggtg ccgctgcagc 1320cggggcagat
gcccgagtac cacaaccggc tgatcgaacg gaaggtcgcc caactggacg 1380gccacaagtc
tctgtactcg acggcgttct actcgcgtga ggagttctgg cggcactacg 1440acggggaaac
ctaccggcgt ctgaaggaca cctacgaccc cgacgcgcgc ctgctcgacc 1500tctacgacaa
gtgcgtgcgg ggacgctgac ctcaaaatat attttccctc tatcttctcg 1560ttgcgcttaa
tttgactaat tctcattagc gaggcgcgcc tttccatagg ctccgccccc 1620ctgacgagca
tcacaaaaat cgacgctcaa gtcagaggtg gcgaaacccg acaggactat 1680aaagatacca
ggcgtttccc cctggaagct ccctcgtgcg ctctcctgtt ccgaccctgc 1740cgcttaccgg
atacctgtcc gcctttctcc cttcgggaag cgtggcgctt tctcatagct 1800cacgctgtag
gtatctcagt tcggtgtagg tcgttcgctc caagctgggc tgtgtgcacg 1860aaccccccgt
tcagcccgac cgctgcgcct tatccggtaa ctatcgtctt gagtccaacc 1920cggtaagaca
cgacttatcg ccactggcag cagccactgg taacaggatt agcagagcga 1980ggtatgtagg
cggtgctaca gagttcttga agtggtggcc taactacggc tacactagaa 2040gaacagtatt
tggtatctgc gctctgctga agccagttac cttcggaaaa agagttggta 2100gctcttgatc
cggcaaacaa accaccgctg gtagcggtgg tttttttgtt tgcaagcagc 2160agattacgcg
cagaaaaaaa ggatctcaag aagatccttt gatcttttct acggggtctg 2220acgctcagtg
gaacgaaaac tcacgttaag ggattttggt catgagatta tcaaaaagga 2280tcttcaccta
gatcctttta aattaaaaat gaagttttaa atcaatctaa agtatatatg 2340agtaaacttg
gtctgacagt taccaatgct taatcagtga ggcacctatc tcagcgatct 2400gtctatttcg
ttcatccata gttgcctgac tccccgtcgt gtagataact acgatacggg 2460agggcttacc
atctggcccc agtgctgcaa tgataccgcg agacccacgc tcaccggctc 2520cagatttatc
agcaataaac cagccagccg gaagggccga gcgcagaagt ggtcctgcaa 2580ctttatccgc
ctccatccag tctattaatt gttgccggga agctagagta agtagttcgc 2640cagttaatag
tttgcgcaac gttgttgcca ttgctacagg catcgtggtg tcacgctcgt 2700cgtttggtat
ggcttcattc agctccggtt cccaacgatc aaggcgagtt acatgatccc 2760ccatgttgtg
caaaaaagcg gttagctcct tcggtcctcc gatcgttgtc agaagtaagt 2820tggccgcagt
gttatcactc atggttatgg cagcactgca taattctctt actgtcatgc 2880catccgtaag
atgcttttct gtgactggtg agtactcaac caagtcattc tgagaatagt 2940gtatgcggcg
accgagttgc tcttgcccgg cgtcaatacg ggataatacc gcgccacata 3000gcagaacttt
aaaagtgctc atcattggaa aacgttcttc ggggcgaaaa ctctcaagga 3060tcttaccgct
gttgagatcc agttcgatgt aacccactcg tgcacccaac tgatcttcag 3120catcttttac
tttcaccagc gtttctgggt gagcaaaaac aggaaggcaa aatgccgcaa 3180aaaagggaat
aagggcgaca cggaaatgtt gaatactcat actcttcctt tttcaatatt 3240attgaagcat
ttatcagggt tattgtctca tgagcggata catatttgaa tgtatttaga 3300aaaataaaca
gcgatcgcgc ggccgcgggt aataactgat ataattaaat tgaagctcta 3360atttgtgagt
ttagtataca tgcatttact tataatacag ttttttagtt ttgctggccg 3420catcttctca
aatatgcttc ccagcctgct tttctgtaac gttcaccctc taccttagca 3480tcccttccct
ttgcaaatag tcctcttcca acaataataa tgtcagatcc tgtagagacc 3540acatcatcca
cggttctata ctgttgaccc aatgcgtctc ccttgtcatc taaacccaca 3600ccgggtgtca
taatcaacca atcgtaacct tcatctcttc cacccatgtc tctttgagca 3660ataaagccga
taacaaaatc tttgtcgctc ttcgcaatgt caacagtacc cttagtatat 3720tctccagtag
ctagggagcc cttgcatgac aattctgcta acatcaaaag gcctctaggt 3780tcctttgtta
cttcttccgc cgcctgcttc aaaccgctaa caatacctgg gcccaccaca 3840ccgtgtgcat
tcgtaatgtc tgcccattct gctattctgt atacacccgc agagtactgc 3900aatttgactg
tattaccaat gtcagcaaat tttctgtctt cgaagagtaa aaaattgtac 3960ttggcggata
atgcctttag cggcttaact gtgccctcca tggaaaaatc agtcaagata 4020tccacatgtg
tttttagtaa acaaattttg ggacctaatg cttcaactaa ctccagtaat 4080tccttggtgg
tacgaacatc caatgaagca cacaagtttg tttgcttttc gtgcatgata 4140ttaaatagct
tggcagcaac aggactagga tgagtagcag cacgttcctt atatgtagct 4200ttcgacatga
tttatcttcg tttcctgcag gtttttgttc tgtgcagttg ggttaagaat 4260actgggcaat
ttcatgtttc ttcaacacca catatgcgta tatataccaa tctaagtctg 4320tgctccttcc
ttcgttcttc cttctgctcg gagattaccg aatcaaagct agcttatcga 4380tgataagctg
tcaaagatga gaattaattc cacggactat agactatact agatactccg 4440tctactgtac
gatacacttc cgctcaggtc cttgtccttt aacgaggcct taccactctt 4500ttgttactct
attgatccag ctcagcaaag gcagtgtgat ctaagattct atcttcgcga 4560tgtagtaaaa
ctagctagac cgagaaagag actagaaatg caaaaggcac ttctacaatg 4620gctgccatca
ttattatccg atgtgacgct gcagcttctc aatgatattc gaatacgctt 4680tgaggagata
cagcctaata tccgacaaac tgttttacag atttacgatc gtacttgtta 4740cccatcattg
aattttgaac atccgaacct gggagttttc cctgaaacag atagtatatt 4800tgaacctgta
taataatata tagtctagcg ctttacggaa gacaatgtat gtatttcggt 4860tcctggagaa
actattgcat ctattgcata ggtaatcttg cacgtcgcat ccccggttca 4920ttttctgcgt
ttccatcttg cacttcaata gcatatcttt gttaacgaag catctgtgct 4980tcattttgta
gaacaaaaat gcaacgcgag agcgctaatt tttcaaacaa agaatctgag 5040ctgcattttt
acagaacaga aatgcaacgc gaaagcgcta ttttaccaac gaagaatctg 5100tgcttcattt
ttgtaaaaca aaaatgcaac gcgacgagag cgctaatttt tcaaacaaag 5160aatctgagct
gcatttttac agaacagaaa tgcaacgcga gagcgctatt ttaccaacaa 5220agaatctata
cttctttttt gttctacaaa aatgcatccc gagagcgcta tttttctaac 5280aaagcatctt
agattacttt ttttctcctt tgtgcgctct ataatgcagt ctcttgataa 5340ctttttgcac
tgtaggtccg ttaaggttag aagaaggcta ctttggtgtc tattttctct 5400tccataaaaa
aagcctgact ccacttcccg cgtttactga ttactagcga agctgcgggt 5460gcattttttc
aagataaagg catccccgat tatattctat accgatgtgg attgcgcata 5520ctttgtgaac
agaaagtgat agcgttgatg attcttcatt ggtcagaaaa ttatgaacgg 5580tttcttctat
tttgtctcta tatactacgt ataggaaatg tttacatttt cgtattgttt 5640tcgattcact
ctatgaatag ttcttactac aatttttttg tctaaagagt aatactagag 5700ataaacataa
aaaatgtaga ggtcgagttt agatgcaagt tcaaggagcg aaaggtgga
5759
97 2664 DNA Thermomonospora curvata
97 atgtcacagc tggcggtcac agaccaccac
gagcgagcgg tcgaggcgct gcgcaggtcg 60tatgcggcga tcccgccggg cacaccggtc
cgcttggcca agcagacctc caacctgttc 120cgcttccgcg agccgacggc cgcgcccggc
ctggacgtgt ccggcttcaa ccgggtgctg 180gcggtggacc cggatgcgcg caccgccgac
gtgcagggca tgaccaccta cgaggacctg 240gtcgacgcca ccctgccgca cgggctgatg
ccgctggtgg tgccccagct caagacgatc 300acgctgggcg gggcggtgac cggcctgggc
atcgagtcca cctccttccg caacggcctg 360ccgcacgagt cggtgctgga gatgcagatc
atcaccggcg ccggcgaagt ggtcaccgcc 420accccggacg gggagcactc cgacctgttc
tggggcttcc ccaactccta cgggacgctg 480gggtacgccc tgaagctgaa gatcgaactg
gagccggtca agccgtacgt ccggctgcgg 540cacctgcgct tcgacgacgc cggcgagtgc
gccgccaagc tcgccgagct gagcgaaagc 600cgcgagcacg agggcgatga ggtgcacttt
ttggacggca ccttcttcgg gccgcgcgag 660atgtacctga cgctcggcac gttcaccgac
accgccccct atgtgtcgga ctacaccggg 720cagcacatct actaccggtc gatccagcag
cggtcgatcg actttttgac catccgcgac 780tacctgtggc gctgggacac cgactggttc
tggtgctcgc gcgccctggg cgtgcagaac 840ccgctgatcc ggcgggtgtg gccgaagagc
gccaagcggt cggatgtgta ccgcaagctg 900gtggcctacg aaaagcgcta ccagttcaag
gcgcgcatcg accggtggac gggcaagccg 960ccgcgcgagg acgtcatcca ggacatcgag
gtgccggcag aacgcctgcc ggagttcctg 1020gagttcttcc acgacaagat cgggatgagc
ccggtgtggc tgtgcccgct gcgggcgcgc 1080caccgctggc cgctgtaccc gctcaagccc
ggcgtcacct acgtcaacgc cggcttctgg 1140gggacggtgc cgctgcagcc ggggcagatg
cccgagtacc acaaccggct gatcgaacgg 1200aaggtcgccc aactggacgg ccacaagtct
ctgtactcga cggcgttcta ctcgcgtgag 1260gagttctggc ggcactacga cggggaaacc
taccggcgtc tgaaggacac ctacgacccc 1320gacgcgcgcc tgctcgacct ctacgacaag
tgcgtgcggg gacgcgctgg tggtgccgag 1380ggtggcaatg gcggtggcgc catgacgctg
gccaaggtct tcgaggagct ggtcggggcg 1440gacgcccctg tggagctcac cgcctacgac
ggatcgagag ccggacgcct gggcagtgat 1500ctgcgggtcc acgtgaagtc gccgtacgcg
gtgtcctacc tggtgcactc gccgagcgcg 1560ctcgggctgg cccgcgcgta cgtggccggg
cacctggacg cctacggcga catgtacacg 1620ctgctgcggg agatgacgca gctgaccgag
gcgctgacgc ccaaggcccg gctgcggctg 1680ctggccggtg tcctgcagga tccgctgctg
cgcgcggcgg ccagccgccg tctgccgccc 1740ccgccgcagg aggtgcggac cggccgcacc
tcctggttcc ggcacaccaa gcggcgggac 1800gccaaggcca tctcccacca ctacgacgtg
tccaacacct tctatgagtg ggtgctgggc 1860ccgtcgatga cctacacctg cgcctgtttc
cccaccgagg acgccacctt ggaggaggcg 1920cagttccaca agcacgacct ggtcgccaag
aagctcgggc tgcggccggg catgcggctg 1980ctggacgtgg gctgcggctg gggcggcatg
gtgatgcacg ccgccaagca ctacggggtg 2040cgggcgctgg gcgtcacgct gtccaagcag
caggccgagt gggcgcagaa ggccatcgcc 2100gaggcgggcc tgagcgacct ggccgaggtc
cgccaccagg actaccggga cgtcaccgag 2160ggcgacttcg acgccatcag ctcgatcggc
ctcaccgagc acatcggcaa ggccaacctg 2220ccgtcctact tcggcttcct gtacggcaag
ctcaagccgg gcgggcggct gctcaaccac 2280tgcatcaccc ggcccgacaa cacccagccg
gccatgaaga aggacgggtt catcaaccgg 2340tacgtcttcc ccgacgggga gctggagggg
cccggctacc tgcagaccca gatgaacgac 2400gccggttttg agatccgcca ccaggagaac
ctgcgcgagc actacgcccg caccctggcc 2460ggatggtgcc gcaacctcga tgagcactgg
gacgaggcgg tggccgaggt cggcgagggc 2520accgcgcggg tgtggcggct gtacatggcc
ggcagccggc tcggtttcga gctcaactgg 2580atccagctgc accagatcct gggcgtcaag
ctcggcgagc gcggcgagtc ccgcatgccg 2640ttgcggcccg actggggcgt gtga
2664
98 2664 DNA Thermomonospora curvata
98 atgacgctgg ccaaggtctt cgaggagctg gtcggggcgg acgcccctgt ggagctcacc
60gcctacgacg gatcgagagc cggacgcctg ggcagtgatc tgcgggtcca cgtgaagtcg
120ccgtacgcgg tgtcctacct ggtgcactcg ccgagcgcgc tcgggctggc ccgcgcgtac
180gtggccgggc acctggacgc ctacggcgac atgtacacgc tgctgcggga gatgacgcag
240ctgaccgagg cgctgacgcc caaggcccgg ctgcggctgc tggccggtgt cctgcaggat
300ccgctgctgc gcgcggcggc cagccgccgt ctgccgcccc cgccgcagga ggtgcggacc
360ggccgcacct cctggttccg gcacaccaag cggcgggacg ccaaggccat ctcccaccac
420tacgacgtgt ccaacacctt ctatgagtgg gtgctgggcc cgtcgatgac ctacacctgc
480gcctgtttcc ccaccgagga cgccaccttg gaggaggcgc agttccacaa gcacgacctg
540gtcgccaaga agctcgggct gcggccgggc atgcggctgc tggacgtggg ctgcggctgg
600ggcggcatgg tgatgcacgc cgccaagcac tacggggtgc gggcgctggg cgtcacgctg
660tccaagcagc aggccgagtg ggcgcagaag gccatcgccg aggcgggcct gagcgacctg
720gccgaggtcc gccaccagga ctaccgggac gtcaccgagg gcgacttcga cgccatcagc
780tcgatcggcc tcaccgagca catcggcaag gccaacctgc cgtcctactt cggcttcctg
840tacggcaagc tcaagccggg cgggcggctg ctcaaccact gcatcacccg gcccgacaac
900acccagccgg ccatgaagaa ggacgggttc atcaaccggt acgtcttccc cgacggggag
960ctggaggggc ccggctacct gcagacccag atgaacgacg ccggttttga gatccgccac
1020caggagaacc tgcgcgagca ctacgcccgc accctggccg gatggtgccg caacctcgat
1080gagcactggg acgaggcggt ggccgaggtc ggcgagggca ccgcgcgggt gtggcggctg
1140tacatggccg gcagccggct cggtttcgag ctcaactgga tccagctgca ccagatcctg
1200ggcgtcaagc tcggcgagcg cggcgagtcc cgcatgccgt tgcggcccga ctggggcgtg
1260gctggtggtg ccgagggtgg caatggcggt ggcgccatgt cacagctggc ggtcacagac
1320caccacgagc gagcggtcga ggcgctgcgc aggtcgtatg cggcgatccc gccgggcaca
1380ccggtccgct tggccaagca gacctccaac ctgttccgct tccgcgagcc gacggccgcg
1440cccggcctgg acgtgtccgg cttcaaccgg gtgctggcgg tggacccgga tgcgcgcacc
1500gccgacgtgc agggcatgac cacctacgag gacctggtcg acgccaccct gccgcacggg
1560ctgatgccgc tggtggtgcc ccagctcaag acgatcacgc tgggcggggc ggtgaccggc
1620ctgggcatcg agtccacctc cttccgcaac ggcctgccgc acgagtcggt gctggagatg
1680cagatcatca ccggcgccgg cgaagtggtc accgccaccc cggacgggga gcactccgac
1740ctgttctggg gcttccccaa ctcctacggg acgctggggt acgccctgaa gctgaagatc
1800gaactggagc cggtcaagcc gtacgtccgg ctgcggcacc tgcgcttcga cgacgccggc
1860gagtgcgccg ccaagctcgc cgagctgagc gaaagccgcg agcacgaggg cgatgaggtg
1920cactttttgg acggcacctt cttcgggccg cgcgagatgt acctgacgct cggcacgttc
1980accgacaccg ccccctatgt gtcggactac accgggcagc acatctacta ccggtcgatc
2040cagcagcggt cgatcgactt tttgaccatc cgcgactacc tgtggcgctg ggacaccgac
2100tggttctggt gctcgcgcgc cctgggcgtg cagaacccgc tgatccggcg ggtgtggccg
2160aagagcgcca agcggtcgga tgtgtaccgc aagctggtgg cctacgaaaa gcgctaccag
2220ttcaaggcgc gcatcgaccg gtggacgggc aagccgccgc gcgaggacgt catccaggac
2280atcgaggtgc cggcagaacg cctgccggag ttcctggagt tcttccacga caagatcggg
2340atgagcccgg tgtggctgtg cccgctgcgg gcgcgccacc gctggccgct gtacccgctc
2400aagcccggcg tcacctacgt caacgccggc ttctggggga cggtgccgct gcagccgggg
2460cagatgcccg agtaccacaa ccggctgatc gaacggaagg tcgcccaact ggacggccac
2520aagtctctgt actcgacggc gttctactcg cgtgaggagt tctggcggca ctacgacggg
2580gaaacctacc ggcgtctgaa ggacacctac gaccccgacg cgcgcctgct cgacctctac
2640gacaagtgcg tgcggggacg ctga
2664