Patent application title: Modified Shine-Dalgarno Sequences and Methods of Use Thereof
Inventors:
Michael W. Laird (San Ramon, CA, US)
Assignees:
Human Genome Sciences, Inc.
IPC8 Class: AC12P2104FI
USPC Class:
435 691
Class name: Chemistry: molecular biology and microbiology micro-organism, tissue cell culture or enzyme using process to synthesize a desired chemical compound or composition recombinant dna technique included in method of making a protein or polypeptide
Publication date: 2008-11-06
Patent application number: 20080274503
Claims:
1. An isolated polynucleotide comprising a Shine-Dalgarno sequence selected from the group consisting of: (a) SEQ ID NO:2; (b) polynucleotides 4-13 of SEQ ID NO:2; and (c) SEQ ID NO:18.
2. The isolated polynucleotide of claim 1 wherein the Shine-Dalgarno sequence is (a).
3. The isolated polynucleotide of claim 1 wherein the Shine-Dalgarno sequence is (b).
4. The isolated polynucleotide of claim 1 wherein the Shine-Dalgarno sequence is (c).
5. A vector comprising a Shine-Dalgarno sequence selected from the group consisting of: (a) SEQ ID NO:2; (b) polynucleotides 4-13 of SEQ ID NO:2; and (c) SEQ ID NO:18.
6. The vector of claim 5 wherein the Shine-Dalgarno sequence is (a).
7. The vector of claim 5 wherein the Shine-Dalgarno sequence is (b).
8. The vector of claim 5 wherein the Shine-Dalgarno sequence is (c).
9. The vector of claim 5, wherein said Shine-Dalgarno sequence is operably associated with a polynucleotide encoding a protein or fragment thereof.
10. The vector of claim 9, wherein said polynucleotide encodes SEQ ID NO:4.
11. The vector of claim 9, wherein said polynucleotide is operably associated with an expression control sequence.
12. A method of producing a vector comprising inserting the Shine-Dalgarno sequence of claim 1 into a vector.
13. A method of producing a host cell comprising transducing, transforming or transfecting a host cell with the vector of claim 5.
14. A recombinant host cell comprising the Shine-Dalgarno sequence of claim 1.
15. A recombinant host cell comprising the vector of claim 5.
16. A recombinant host cell comprising the vector of claim 9.
17. A method of producing a protein, comprising: (a) culturing the host cell of claim 16 under conditions suitable to produce the protein or fragment thereof, and (b) recovering the protein or fragment thereof from the cell culture.
18. The method of claim 17, wherein said polynucleotide encodes SEQ ID NO:4.
Description:
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001]This application is a divisional of U.S. application Ser. No. 11/447,892, filed Jun. 7, 2006, which is a divisional of U.S. application Ser. No. 11/004,853, filed Dec. 7, 2004 (now U.S. Pat. No. 7,094,573, issued Aug. 22, 2006), which is a continuation of International Application No. PCT/US03/19786, filed Jun. 25, 2003, which claims benefit under 35 U.S.C. § 119(e) to U.S. Provisional Application Nos. 60/391,433, filed Jun. 26, 2002, and 60/406,630, filed Aug. 29, 2002, each of which is hereby incorporated by reference in its entirety.
STATEMENT UNDER 37 C.F.R. § 1.77(b)(5)
[0002]This application refers to a "Sequence Listing" listed below, which is provided as a text document. The text document is entitled "PV595D2-SeqList.txt" (47,517 bytes, created May 19, 2008), which is incorporated by reference in its entirety.
FIELD OF THE INVENTION
[0003]The present invention relates to novel Shine-Dalgarno (ribosome binding site) sequences, vectors containing such sequences, and host cells transformed with these vectors. The present invention also relates to methods of use of such sequences, vectors, and host cells for the efficient production of proteins and fragments thereof in prokaryotic systems, and in one aspect of the invention, provides for high efficiency production of soluble protein in prokaryotic systems.
BACKGROUND OF THE INVENTION
[0004]The level of production of a protein in a host cell is determined by three major factors: the number of copies of its structural gene within the cell, the efficiency with which the structural gene copies are transcribed and the efficiency with which the resulting messenger RNA ("mRNA") is translated. The transcription and translation efficiencies are, in turn, dependent on nucleotide sequences that are normally situated ahead of the desired structural genes or the translated sequence. These nucleotide sequences, also known as expression control sequences, define, inter alia, the locations at which RNA polymerase binds (the promoter sequence to initiate transcription; see also EMBO J. 5:2995-3000 (1986)) and at which ribosomes bind and interact with the mRNA (the product of transcription) to initiate translation.
[0005]In most prokaryotes, the purine-rich ribosome binding site known as the Shine-Dalgarno (S-D) sequence assists with the binding and positioning of the 30S ribosomal subunit relative to the start codon on the mRNA through interaction with a pyrimidine-rich region of the 16S ribosomal RNA. See, e.g., Shine & Dalgarno, Proc. Natl. Acad. Sci. USA 71:1342-46 (1974). The S-D sequence is located on the mRNA downstream from the start of transcription and upstream from the start of translation, typically from 4-14 nucleotides upstream of the start codon, and more typically from 8-10 nucleotides upstream of the start codon. Because of the role of the S-D sequence in translation, there is a direct relationship between the efficiency of translation and the efficiency (or strength) of the S-D sequence.
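The spacing relationship described above can be illustrated with a minimal Python sketch. The toy sequence below is hypothetical and is not taken from the Sequence Listing; the motif AGGAGG is used only because it is the commonly cited E. coli consensus S-D sequence. The function name and the counting convention (the number of nucleotides lying between the last nucleotide of the motif and the first nucleotide of the start codon) are assumptions made for illustration, since conventions for measuring this distance vary.

```python
from typing import Optional


def sd_spacing(seq: str, sd: str, start_codon: str = "atg") -> Optional[int]:
    """Count the nucleotides between the end of `sd` and the next start codon.

    Returns None if the motif or a downstream start codon is not found.
    """
    seq = seq.lower()
    sd_idx = seq.find(sd.lower())
    if sd_idx < 0:
        return None
    sd_end = sd_idx + len(sd)  # index just past the motif
    codon_idx = seq.find(start_codon.lower(), sd_end)
    if codon_idx < 0:
        return None
    return codon_idx - sd_end


# Hypothetical toy sequence (DNA alphabet), for illustration only.
toy = "ttaacaggaggaattaaccatgaaa"
spacing = sd_spacing(toy, "aggagg")
in_typical_range = spacing is not None and 4 <= spacing <= 14
print(spacing, in_typical_range)
```

Under this counting convention the toy sequence has a spacing that falls within the 4-14 nucleotide range mentioned above; a different convention (for example, measuring from the center of the motif) would give a different number for the same sequence.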
[0006]Not all S-D sequences have the same efficiency, however. Accordingly, prior attempts have been made to increase the efficiency of ribosomal binding, positioning, and translation by, inter alia, changing the distance between the S-D sequence and the start codon, changing the composition of the space between the S-D sequence and the start codon, modifying an existing S-D sequence, using a heterologous S-D sequence, and manipulating the secondary structure of mRNA during the initiation of translation. Despite these changes, however, success in increasing protein expression efficiency in prokaryotic systems has remained an elusive and unpredictable goal due to a variety of factors, including, inter alia, the host cells used, the expression control sequences (including the S-D sequence) used, and the characteristics of the gene and protein being expressed. See, e.g., Stenstrom, et al., Gene 273(2):259-265 (2001); Komarova, et al., Bioorg. Khim. 27(4):282-290 (2001); Stenstrom, et al., Gene 263(1-2):273-284 (2001); and Mironova, et al., Microbiol. Res. 154(1):35-41 (1999). For example, efficient expression of soluble B. anthracis protective antigen (PA) has proved difficult in E. coli. See, e.g., Sharma, et al., Protein Expression and Purification 7:33-38 (1996) (indicating 0.5 mg/L at 70% purity); Chauhan, et al., Biochem. Biophys. Res. Commun. 283(2):308-15 (2001) (indicating 125 mg/L); Gupta, et al., Protein Expr. Purif. 16(3):369-76 (1999) (indicating 2 mg/L).
[0007]Accordingly, there remains a demand in the art for compositions and methods for increasing the efficiency of ribosome binding and translation in prokaryotic systems, thereby resulting in increased efficiency of protein expression. This demand is especially strong for proteins that are difficult to express in existing systems, and for proteins that are desired in large quantity for pharmacological, therapeutic, or industrial use.
SUMMARY OF THE INVENTION
[0008]The present invention encompasses novel Shine-Dalgarno sequences that result in increased efficiency of protein expression in prokaryotic systems. The present invention further relates to vectors comprising such S-D sequences and host cells transformed with such vectors. In particular embodiments, the present invention relates to methods for producing proteins and fragments thereof in prokaryotic systems using such S-D sequences, vectors, and host cells. In certain embodiments, methods of use of the S-D sequences, vectors, and host cells of the invention provide high efficiency production of soluble protein in prokaryotic systems, including prokaryotic in vitro translation systems.
[0009]In particular embodiments of the invention, the novel S-D sequence comprises (or alternately consists of) SEQ ID NO:2. In additional embodiments, the novel S-D sequence comprises (or alternately consists of) nucleotides 4-13 of SEQ ID NO:2. The invention also encompasses the S-D sequence of SEQ ID NO:18, described at paragraph 0426 of U.S. Provisional Application No. 60/368,548, filed Apr. 1, 2002, and in U.S. Provisional Application No. 60/331,478, filed Nov. 16, 2001, each of which is hereby incorporated by reference herein in its entirety.
[0010]The protein or fragment thereof may be of prokaryotic, eukaryotic, or viral origin, or may be artificial. In particular embodiments, the S-D sequences, vectors, and host cells of the invention are used to express B. anthracis protective antigen (PA), mutated protective antigens (mPAs) (See, e.g., Sellman et al, JBC 276(11):8371-8376 (2001)), TL3, TL6, or other proteins. In certain embodiments, the S-D sequences, vectors, and host cells of the invention are used to express proteins that have previously been difficult to express in prokaryotic systems. The present invention also encompasses the combination of novel S-D sequences with a variety of expression control sequences, such as those described in detail in U.S. Pat. No. 6,194,168 (which is hereby incorporated by reference herein in its entirety), and in particular, expression control sequences comprising at least a portion of one or more lac operator sequences and a phage promoter comprising a -30 region.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011]FIG. 1 depicts a Shine-Dalgarno sequence of the present invention (SEQ ID NO: 2) and the Shine-Dalgarno sequence contained in the pHE4 expression vector (SEQ ID NO:17) (See U.S. Pat. No. 6,194,168). Bases matching the S-D sequence of the present invention (SEQ ID NO:2) are highlighted.
[0012]FIG. 2A depicts a map of the pHE6 vector (SEQ ID NO:1), which incorporates a S-D sequence of the invention. FIG. 2B depicts the pHE6 vector (SEQ ID NO:1) with the gene encoding mature Bacillus anthracis PA including an ETB signal sequence (SEQ ID NO:3) inserted.
[0013]FIGS. 3A-3B compare the efficiency of TL6 protein expression using the pHE4 vector (FIG. 3B) versus the pHE6 vector (FIG. 3A), which uses a S-D sequence of the invention. In particular, increased soluble TL6 expression with the pHE6 vector can be seen in FIG. 3A as a lack of "shadow" in the gel.
[0014]FIG. 4 depicts a gel showing the quantity and quality of PA after expression using pHE6 and subsequent purification. Using the compositions and methods of the invention, approximately 150 mg/L of soluble PA at greater than 96% purity (as measured by RP-HPLC) was obtained.
DETAILED DESCRIPTION OF THE INVENTION
[0015]The instant invention is directed to novel Shine-Dalgarno (ribosomal binding site) sequences. These S-D sequences result in increased efficiency of protein expression in prokaryotic systems. The S-D sequences of the present invention have been optimized through modification of several nucleotides. See, e.g., FIG. 1. In particular embodiments, the S-D sequences of the present invention comprise (or alternately consist of) SEQ ID NO:2. In additional embodiments, the S-D sequences of the present invention comprise (or alternately consist of) nucleotides 4-13 of SEQ ID NO:2. In other embodiments, the S-D sequences of the present invention comprise (or alternately consist of) SEQ ID NO:18.
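As a small illustration of the numbering used above, the following Python sketch extracts nucleotides 4-13 from SEQ ID NO:2 as it appears in the Sequence Listing. The variable and function names are illustrative assumptions and form no part of the disclosure; positions are treated as 1-based and inclusive, as is conventional for sequence listings.

```python
# SEQ ID NO:2 as given in the Sequence Listing (18 nucleotides).
SEQ_ID_NO_2 = "attataaaggaaaaatta"


def subsequence(seq: str, start: int, end: int) -> str:
    """Return nucleotides start..end of seq (1-based, inclusive)."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("positions out of range")
    return seq[start - 1:end]


core = subsequence(SEQ_ID_NO_2, 4, 13)
print(len(SEQ_ID_NO_2), core)  # prints: 18 ataaaggaaa
```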
[0016]In many embodiments, the S-D sequences of the present invention are used in prokaryotic cells. Exemplary bacterial cells suitable for use with the instant invention include E. coli, B. subtilis, S. aureus, S. typhimurium, and other bacteria used in the art. In other embodiments, the S-D sequences of the present invention are used in prokaryotic in vitro transcription systems.
[0017]The present invention also relates to vectors and plasmids comprising one or more S-D sequences of the invention. Such vectors and plasmids generally also further comprise one or more restriction enzyme sites downstream of the S-D sequence for cloning and expression of a gene or polynucleotide of interest.
[0018]In certain embodiments, vectors and plasmids of the present invention further comprise additional expression control sequences, including but not limited to those described in U.S. Pat. No. 6,194,168, and in particular, M (SEQ ID NO:5), M+D (SEQ ID NO:6), U+D (SEQ ID NO:7), M+D1 (SEQ ID NO:8), and M+D2 (SEQ ID NO:9). More generally, the expression control sequence elements contemplated include bacterial or phage promoter sequences and functional variants thereof, whether natural or artificial; operator/repressor systems; and the lacIq gene (which confers tight regulation of the lac operator by blocking transcription of downstream (i.e., 3') sequences).
[0019]The lac operator sequences contemplated for use in vectors and plasmids of the instant invention comprise (or alternately consist of) the entire lac operator sequence represented by the sequence 5' AATTGTGAGCGGATAACAATTTCACACA 3' (SEQ ID NO:10), or a portion thereof that retains at least partial activity, as described in U.S. Pat. No. 6,194,168. Activity is routinely determined using techniques well known in the art to measure the relative repressibility of a promoter sequence in the absence of an inducer, such as IPTG. This is done by comparing the relative amounts of protein expressed from expression control sequences comprising portions of the lac operator sequence and from those comprising the full-length lac operator sequence. The activity of the partial operator sequence is measured relative to that of the full-length lac operator sequence (e.g., SEQ ID NO:10). In one embodiment, partial activity for the purposes of the present invention means activity reduced by no more than 100 fold relative to the full-length sequence. In alternative embodiments, partial activity for the purposes of the present invention means activity reduced by no more than 75, 50, 25, 20, 15, or 10 fold, relative to the full-length lac operator sequence. In a preferred embodiment, the activity of a partial operator sequence is reduced by no more than 10 fold relative to the activity of the full-length sequence.
[0020]In many embodiments, one or more S-D sequences of the invention are used in a vector comprising a T5 phage promoter sequence and two lac operator sequences wherein at least a portion of the full-length lac operator sequence (SEQ ID NO:10) is located within the spacer region between -12 and -30 of the expression control sequences described in U.S. Pat. No. 6,194,168. In particular embodiments, the operator sequence comprises (or alternately consists of) at least the sequence 5'-GTGAGCGGATAACAAT-3' (SEQ ID NO:11).
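The relationship between the two operator sequences quoted in the text can be checked with a short Python sketch. It uses only the sequences given above for SEQ ID NO:10 and SEQ ID NO:11; the variable names are illustrative assumptions, and positions are reported as 1-based and inclusive.

```python
# Sequences as quoted in the text (5' to 3').
SEQ_ID_NO_10 = "AATTGTGAGCGGATAACAATTTCACACA"  # full-length lac operator
SEQ_ID_NO_11 = "GTGAGCGGATAACAAT"              # partial operator sequence

offset = SEQ_ID_NO_10.find(SEQ_ID_NO_11)  # 0-based index, -1 if absent
if offset < 0:
    raise ValueError("SEQ ID NO:11 not found within SEQ ID NO:10")

# Convert to 1-based, inclusive coordinates.
start, end = offset + 1, offset + len(SEQ_ID_NO_11)
print(f"SEQ ID NO:11 spans nucleotides {start}-{end} of SEQ ID NO:10")
```

As written, this reports that the 16-nucleotide sequence of SEQ ID NO:11 corresponds to nucleotides 5-20 of the 28-nucleotide SEQ ID NO:10.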
[0021]The previously mentioned lac-operator sequences are negatively regulated by the lac-repressor. The corresponding repressor gene can be introduced into the host cell in a vector or through integration into the chromosome of a bacterium by known methods, such as by integration of the lacIq gene. See, e.g., Miller et al, supra; Calos, (1978) Nature 274:762-765. The vector encoding the repressor molecule may be the same vector that contains the expression control sequences and a gene or polynucleotide of interest or may be a separate vector.
[0022]The S-D sequences of the invention can routinely be inserted using procedures known in the art into any suitable expression vector that can replicate in gram-negative and/or gram-positive bacteria. See, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual (Cold Spring Harbor, N.Y. 2nd ed. 1989); Ausubel et al., Current Protocols in Molecular Biology (Greene Pub. Assoc. and Wiley-Interscience, N.Y.). Suitable vectors and plasmids can be constructed from segments of chromosomal, non-chromosomal and synthetic DNA sequences, such as various known plasmid and phage DNAs. See, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual (Cold Spring Harbor, N.Y. 2nd ed. 1989). Especially suitable vectors include plasmids of the pDS family. See Bujard et al., (1987) Methods in Enzymology, 155:416-433. Additional examples of preferred suitable plasmids include pBR322 and pBluescript® (Stratagene, La Jolla, Calif.) based plasmids. Still additional examples of preferred suitable plasmids include pUC-based vectors, including pUC18 and pUC19 (New England Biolabs, Beverly, Mass.) and pREP4 (Qiagen Inc., Chatsworth, Calif.). Portions of vectors and plasmids encoding desired functions may also be combined to form new vectors with desired characteristics. For example, the origin of replication of pUC19 may be recombined with the kanamycin resistance gene of pREP4 to create a new vector with both desired characteristics.
[0023]Preferably, vectors and plasmids comprising one or more S-D sequences of the invention also contain sequences that allow replication of the plasmid to high copy number in the host bacterium of choice. Additionally, vector or plasmid embodiments of the invention that comprise expression control sequences may further comprise a multiple cloning site immediately downstream of the expression control sequences and the S-D sequence.
[0024]Vectors and plasmids comprising one or more S-D sequences of the invention may further comprise genes conferring antibiotic resistance. Preferred genes are those conferring resistance to ampicillin, chloramphenicol, and tetracycline. Especially preferred genes are those conferring resistance to kanamycin.
[0025]The optimized S-D ribosomal binding site of the invention can also be inserted into the chromosome of gram-negative and gram-positive bacterial cells using techniques known in the art. In this case, selection agents such as antibiotics, which are generally required when working with vectors, can be dispensed with.
[0026]Proteins of interest that can be expressed using the S-D sequences, vectors, and host cells of the invention include prokaryotic, eukaryotic, viral, or artificial proteins. Such proteins include, but are not limited to: enzymes; hormones; proteins having immunoregulatory, antiviral or antitumor activity; antibodies and fragments thereof (e.g., Fab, F(ab), F(ab)2, single-chain Fv, disulfide-linked Fv); or antigens. In preferred embodiments, the protein to be expressed is B. anthracis protective antigen (PA), mutated protective antigens (mPAs) (See, e.g., Sellman et al, JBC 276(11):8371-8376 (2001)), TL3, or TL6. Any effective signal sequence may be used in combination with the gene or polynucleotide of interest. In a preferred embodiment, the ETB signal sequence is used to enhance the expression of soluble protein.
[0027]The S-D sequences of the present invention provide for increased efficiency of protein expression in prokaryotic systems. Efficient expression means that the level of protein expression to be expected when using the S-D sequences of the instant invention is generally higher than levels previously reported in the art. In preferred embodiments, the resultant expressed protein can be highly purified to levels greater than 90% purity by RP-HPLC. Particularly preferred purity levels include 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, and near 100% purity, all of which are encompassed by the instant invention. It is expressly contemplated by the invention that the addition of one or more S-D sequences of the invention into any prokaryotic-based expression system, including and in addition to E. coli expression systems, will result in increased and more efficient protein expression.
[0028]The present invention also relates to methods of using the S-D sequences, vectors, plasmids, and host cells of the invention to produce proteins and fragments thereof. In one embodiment of the invention, a desired protein is produced by a method comprising:
[0029](a) transforming a bacterium with a vector in which a polynucleotide encoding a desired protein is operably linked to a S-D sequence of the invention;
[0030](b) culturing the transformed bacterium under suitable growth conditions; and
[0031](c) isolating the desired protein from the culture.
[0032]In another embodiment of the invention, a desired protein is produced by a method comprising:
[0033](a) inserting a S-D sequence of the invention and an expression control sequence into the chromosome of a suitable bacterium, wherein the S-D sequence and expression control sequence are each operably linked to a polynucleotide encoding a desired protein;
[0034](b) cultivating the bacterium under suitable growth conditions; and
[0035](c) isolating the desired protein from the culture.
[0036]The selection of a suitable host organism is determined by various factors that are well known in the art. Factors to be considered include, for example, compatibility with the selected vector, toxicity of the expression product, expression characteristics, necessary biological safety precautions and costs.
[0037]Suitable host organisms include, but are not limited to, gram-negative and gram-positive bacteria, such as E. coli, B. subtilis, S. aureus, and S. typhimurium strains. Preferred E. coli strains include DH5α (Gibco-BRL, Gaithersburg, Md.), XL-1 Blue (Stratagene®), and W3110 (ATCC® No. 27325). Other E. coli strains that can be used according to the present invention include other generally available strains such as E. coli 294 (ATCC® No. 31446), E. coli RR1 (ATCC® No. 31343) and M15.
EXAMPLES
[0038]The examples which follow are set forth to aid in understanding the invention but are not intended to, and should not be construed to, limit the scope of the invention in any way. The examples do not include detailed descriptions for conventional methods employed in the art, such as for the construction of vectors, the insertion of genes encoding polypeptides of interest into such vectors, or the introduction of the resulting plasmids into bacterial hosts. Such methods are described in numerous publications and can be carried out using recombinant DNA technology methods which are well known in the art. See, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual (Cold Spring Harbor, N.Y. 2nd ed. 1989); Ausubel et al., Current Protocols in Molecular Biology (Greene Pub. Assoc. and Wiley-Interscience, N.Y.).
Example 1
pHE6 Design
[0039]The S-D sequence used in pHE6 (SEQ ID NO:2) was based on the S-D sequence of the pHE4 expression vector (SEQ ID NO:17) (See U.S. Pat. No. 6,194,168), with three base pair changes made as indicated in FIG. 1. Additionally, the pHE6 plasmid encodes the aminoglycoside phosphotransferase protein (conferring kanamycin resistance), the lacIq repressor, and includes a ColE1 replicon. Construction of the pHE4 plasmid upon which the pHE6 plasmid is based is described in U.S. Pat. No. 6,194,168.
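The placement of SEQ ID NO:2 within pHE6 can be located with a short Python sketch. The fragment below is nucleotides 61-120 of SEQ ID NO:1 as read from the Sequence Listing; the variable names are illustrative assumptions. The sketch also reports the first ATG downstream of the S-D sequence, but treating that ATG as the start codon for an inserted gene is an assumption made here for illustration and is not stated in the text.

```python
# Nucleotides 61-120 of SEQ ID NO:1 (pHE6), as read from the Sequence Listing.
PHE6_61_120 = (
    "caattgtgag" "cggataacaa" "tttcacacat"
    "tataaaggaa" "aaattacata" "tgaaggatcc"
)
FRAGMENT_START = 61  # 1-based position of the fragment's first nucleotide
SEQ_ID_NO_2 = "attataaaggaaaaatta"

idx = PHE6_61_120.find(SEQ_ID_NO_2)  # 0-based index within the fragment
if idx < 0:
    raise ValueError("SEQ ID NO:2 not found in the fragment")

# 1-based, inclusive coordinates within SEQ ID NO:1.
sd_start = FRAGMENT_START + idx
sd_end = sd_start + len(SEQ_ID_NO_2) - 1

# First ATG downstream of the S-D sequence (assumed start codon).
atg_idx = PHE6_61_120.find("atg", idx + len(SEQ_ID_NO_2))
atg_pos = FRAGMENT_START + atg_idx if atg_idx >= 0 else None

print(f"SEQ ID NO:2 at nucleotides {sd_start}-{sd_end} of SEQ ID NO:1")
print(f"first downstream ATG begins at nucleotide {atg_pos}")
```

Run against this fragment, the sketch places SEQ ID NO:2 at nucleotides 89-106 of SEQ ID NO:1, with the first downstream ATG beginning at nucleotide 110.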
Example 2
Method of Making and Purifying PA in Escherichia coli K-12
[0040]Using the following method, a post-purification final yield of soluble PA greater than 2 g from 1 kg of E. coli cell paste (approximately 150 mg/L) can be obtained from either shake flasks or bioreactors. See FIG. 4. The purity of such soluble PA, as judged by RP-HPLC analysis, is greater than 96-98%.
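The two yield figures quoted above can be related by simple arithmetic, sketched below in Python. The implied amount of cell paste per liter of culture is a derived figure and is not stated in the text; because the yield per kilogram is given as "greater than 2 g", the result should be read as an approximate upper estimate. Variable names are illustrative assumptions.

```python
# Figures quoted in the text.
yield_g_per_kg_paste = 2.0    # "greater than 2 g from 1 kg" (lower bound)
yield_mg_per_litre = 150.0    # "approximately 150 mg/L"

# 1 g/kg is numerically equal to 1 mg/g.
yield_mg_per_g_paste = yield_g_per_kg_paste * 1000.0 / 1000.0

# Implied cell paste per liter of culture (derived, not stated in the text).
implied_paste_g_per_litre = yield_mg_per_litre / yield_mg_per_g_paste
print(implied_paste_g_per_litre)  # prints: 75.0
```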
[0041]The bacterial host strain used for the production of recombinant wild-type PA from a recombinant plasmid DNA molecule is an E. coli K-12 derived strain. To express protein from the expression vectors, E. coli cells were transformed with the expression vectors and grown overnight (O/N) at 30° C. in 4 L shaker flasks containing 1 L Luria broth medium supplemented with kanamycin. The cultures were started at an optical density at 600 nm (O.D.600) of 0.1. IPTG was added to a final concentration of 1 mM when the culture reached an O.D.600 of between 0.4 and 0.6. IPTG-induced cultures were grown for an additional 3 hours. Cells were then harvested using methods known in the art, and the level of protein was detected using Western blot analysis. Soluble PA was then extracted from the periplasm and clarified by conventional means. The clarified supernatant was then purified using a Q Sepharose HP column (Amersham), concentrated, and further purified using a Biogel Hydroxyapatite HP column (Bio-Rad). Using the expression control sequence M+D1 (SEQ ID NO:8), high levels of repression in the absence of IPTG, and high levels of induced expression in the presence of IPTG were obtained.
Deposit of Microorganisms
[0042]Plasmid pHE6 was deposited with the American Type Culture Collection, 10801 University Boulevard, Manassas, Va. 20110-2209 on Jun. 20, 2002 and was given Accession No. PTA-4474. This culture has been accepted for deposit under the provisions of the Budapest Treaty on the International Recognition of Microorganisms for the Purposes of Patent Proceedings.
[0043]The disclosures of all publications (including patents, patent applications, journal articles, laboratory manuals, books, or other documents) cited herein are hereby incorporated by reference in their entireties.
[0044]The present invention is not to be limited in scope by the specific embodiments described herein, which are intended as illustrations of individual aspects of the invention. Functionally equivalent methods and components are within the scope of the invention, in addition to those shown and described herein and will become apparent to those skilled in the art from the foregoing description and accompanying drawings. Such modifications are intended to fall within the scope of the appended claims.
Sequence CWU
1
Number of SEQ ID NOs: 18
SEQ ID NO: 1
Length: 3979
Type: DNA
Organism: Artificial sequence
Other information: pHE6 expression plasmid including novel Shine-Dalgarno sequence
Sequence 1:
aagcttaaaa aactgcaaaa aatagtttga
cttgtgagcg gataacaatt aagatgtacc 60caattgtgag cggataacaa tttcacacat
tataaaggaa aaattacata tgaaggatcc 120aaggtacctg agtagggcgt ccgatcgacg
gacgcctttt ttttgaattc gtaatcatgt 180catagctgtt tcctgtgtga aattgttatc
cgctcacaat tccacacaac atacgagccg 240gaagcataaa gtgtaaagcc tggggtgcct
aatgagtgag ctaactcaca ttaattgcgt 300tgcgctcact gcccgctttc cagtcgggaa
acctgtcgtg ccagctgcat taatgaatcg 360gccaacgcgc ggggagaggc ggtttgcgta
ttgggcgctc ttccgcttcc tcgctcactg 420actcgctgcg ctcggtcgtt cggctgcggc
gagcggtatc agctcactca aaggcggtaa 480tacggttatc cacagaatca ggggagaacg
caggaaagaa catgtgagca aaaggccagc 540aaaaggccag gaaccgtaaa aaggccgcgt
tgctggcgtt tttccatagg ctccgccccc 600ctgacgagca tcacaaaaat cgacgctcaa
gtcagaggtg gcgaaacccg acaggactat 660aaagatacca ggcgtttccc cctggaagct
ccctcgtgcg ctctcctgtt ccgaccctgc 720cgcttaccgg atacctgtcc gcctttctcc
cttcgggaag cgtggcgctt tctcatagct 780cacgctgtag gtatctcagt tcggtgtaag
tcgttcgctc caagctgggc tgtgtgcacg 840aaccccccgt tcagcccgac cgctgcgcct
tatccggtaa ctatcgtctt gagtccaacc 900cggtaagaca cgacttatcg ccactggcag
cagccactgg taacaggatt agcagagcga 960ggtatgtagg cggtgctaca gagttcttga
agtggtggcc taactacggc tacactagaa 1020gaacagtatt tggtatctgc gctctgctga
agccagttac cttcggaaaa agagttggta 1080gctcttgatc cggcaaacaa accaccgctg
gtagcggtgg tttttttgtt tgcaagcagc 1140agattacgcg cagaaaaaaa ggatctcaag
aagatccttt gatcttttct acggggtctg 1200acgctcagtg gaacgaaaac tcacgttaag
ggattttggt catgagatta tcgtcgacaa 1260ttcgcgcgcg aaggcgaagc ggcatgcatt
tacgttgaca ccatcgaatg gtgcaaaacc 1320tttcgcggta tggcatgata gcgcccggaa
gagagtcaat tcagggtggt gaatgtgaaa 1380ccagtaacgt tatacgatgt cgcagagtat
gccggtgtct cttatcagac cgtttcccgc 1440gtggtgaacc aggccagcca cgtttctgcg
aaaacgcggg aaaaagtgga agcggcgatg 1500gcggagctga attacattcc caaccgcgtg
gcacaacaac tggcgggcaa acagtcgttg 1560ctgattggcg ttgccacctc cagtctggcc
ctgcacgcgc cgtcgcaaat tgtcgcggcg 1620attaaatctc gcgccgatca actgggtgcc
agcgtggtgg tgtcgatggt agaacgaagc 1680ggcgtcgaag cctgtaaagc ggcggtgcac
aatcttctcg cgcaacgcgt cagtgggctg 1740atcattaact atccgctgga tgaccaggat
gccattgctg tggaagctgc ctgcactaat 1800gttccggcgt tatttcttga tgtctctgac
cagacaccca tcaacagtat tattttctcc 1860catgaagacg gtacgcgact gggcgtggag
catctggtcg cattgggtca ccagcaaatc 1920gcgctgttag cgggcccatt aagttctgtc
tcggcgcgtc tgcgtctggc tggctggcat 1980aaatatctca ctcgcaatca aattcagccg
atagcggaac gggaaggcga ctggagtgcc 2040atgtccggtt ttcaacaaac catgcaaatg
ctgaatgagg gcatcgttcc cactgcgatg 2100ctggttgcca acgatcagat ggcgctgggc
gcaatgcgcg ccattaccga gtccgggctg 2160cgcgttggtg cggatatctc ggtagtggga
tacgacgata ccgaagacag ctcatgttat 2220atcccgccgt taaccaccat caaacaggat
tttcgcctgc tggggcaaac cagcgtggac 2280cgcttgctgc aactctctca gggccaggcg
gtgaagggca atcagctgtt gcccgtctca 2340ctggtgaaaa gaaaaaccac cctggcgccc
aatacgcaaa ccgcctctcc ccgcgcgttg 2400gccgattcat taatgcagct ggcacgacag
gtttcccgac tggaaagcgg gcagtgagcg 2460caacgcaatt aatgtaagtt agcgcgaatt
gtcgaccaaa gcggccatcg tgcctcccca 2520ctcctgcagt tcgggggcat ggatgcgcgg
atagccgctg ctggtttcct ggatgccgac 2580ggatttgcac tgccggtaga actccgcgag
gtcgtccagc ctcaggcagc agctgaacca 2640actcgcgagg ggatcgagcc cggggtgggc
gaagaactcc agcatgagat ccccgcgctg 2700gaggatcatc cagccggcgt cccggaaaac
gattccgaag cccaaccttt catagaaggc 2760ggcggtggaa tcgaaatctc gtgatggcag
gttgggcgtc gcttggtcgg tcatttcgaa 2820ccccagagtc ccgctcagaa gaactcgtca
agaaggcgat agaaggcgat gcgctgcgaa 2880tcgggagcgg cgataccgta aagcacgagg
aagcggtcag cccattcgcc gccaagctct 2940tcagcaatat cacgggtagc caacgctatg
tcctgatagc ggtccgccac acccagccgg 3000ccacagtcga tgaatccaga aaagcggcca
ttttccacca tgatattcgg caagcaggca 3060tcgccatggg tcacgacgag atcctcgccg
tcgggcatgc gcgccttgag cctggcgaac 3120agttcggctg gcgcgagccc ctgatgctct
tcgtccagat catcctgatc gacaagaccg 3180gcttccatcc gagtacgtgc tcgctcgatg
cgatgtttcg cttggtggtc gaatgggcag 3240gtagccggat caagcgtatg cagccgccgc
attgcatcag ccatgatgga tactttctcg 3300gcaggagcaa ggtgagatga caggagatcc
tgccccggca cttcgcccaa tagcagccag 3360tcccttcccg cttcagtgac aacgtcgagc
acagctgcgc aaggaacgcc cgtcgtggcc 3420agccacgata gccgcgctgc ctcgtcctgc
agttcattca gggcaccgga caggtcggtc 3480ttgacaaaaa gaaccgggcg cccctgcgct
gacagccgga acacggcggc atcagagcag 3540ccgattgtct gttgtgccca gtcatagccg
aatagcctct ccacccaagc ggccggagaa 3600cctgcgtgca atccatcttg ttcaatcatg
cgaaacgatc ctcatcctgt ctcttgatca 3660gatcttgatc ccctgcgcca tcagatcctt
ggcggcaaga aagccatcca gtttactttg 3720cagggcttcc caaccttacc agagggcgcc
ccagctggca attccggttc gcttgctgtc 3780cataaaaccg cccagtctag ctatcgccat
gtaagcccac tgcaagctac ctgctttctc 3840tttgcgcttg cgttttccct tgtccagata
gcccagtagc tgacattcat ccggggtcag 3900caccgtttct gcggactggc tttctacgtg
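ttccgcttcc tttagcagcc cttgcgccct 3960gagtgcttgc ggcagcgtg 3979
SEQ ID NO: 2
Length: 18
Type: DNA
Organism: Artificial sequence
Other information: Shine-Dalgarno sequence
Sequence 2:
attataaagg aaaaatta 18
SEQ ID NO: 3
Length: 2268
Type: DNA
Organism: Artificial sequence
Other information: Mature PA sequence including an ETB signal sequence
Sequence 3:
atgaataaag taaaatgtta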
tgttttattt acggcgttac tatcctctct atatgcccat 60gga gaa gtt aaa cag gaa
aac cgt ctg ctc aac gaa tct gag tct tcc 108Glu Val Lys Gln Glu Asn
Arg Leu Leu Asn Glu Ser Glu Ser Ser1 5 10
15tct cag ggc ctg ctg ggt tac tat ttc tct gac ctg aac
ttc cag gca 156Ser Gln Gly Leu Leu Gly Tyr Tyr Phe Ser Asp Leu Asn
Phe Gln Ala20 25 30ccg atg gtt gta act
tct tcc acc acc ggc gac ctg tct att ccg tct 204Pro Met Val Val Thr
Ser Ser Thr Thr Gly Asp Leu Ser Ile Pro Ser35 40
45tct gaa ctg gag aac atc ccg tct gaa aac cag tac ttc cag tct
gct 252Ser Glu Leu Glu Asn Ile Pro Ser Glu Asn Gln Tyr Phe Gln Ser
Ala50 55 60atc tgg tct ggt ttc att aaa
gtt aag aaa tct gac gaa tac acc ttc 300Ile Trp Ser Gly Phe Ile Lys
Val Lys Lys Ser Asp Glu Tyr Thr Phe65 70
75gct act tct gca gat aac cac gtt act atg tgg gta gac gac cag gaa
348Ala Thr Ser Ala Asp Asn His Val Thr Met Trp Val Asp Asp Gln Glu80
85 90 95gtt atc aac aaa gct
tct aac tct aac aaa atc cgt ctg gaa aaa ggc 396Val Ile Asn Lys Ala
Ser Asn Ser Asn Lys Ile Arg Leu Glu Lys Gly100 105
110cgt ctg tac cag atc aag att caa tac caa cgt gaa aac ccg acc
gag 444Arg Leu Tyr Gln Ile Lys Ile Gln Tyr Gln Arg Glu Asn Pro Thr
Glu115 120 125aaa ggt ctg gac ttc aaa ctg
tac tgg acc gac tct cag aac aag aaa 492Lys Gly Leu Asp Phe Lys Leu
Tyr Trp Thr Asp Ser Gln Asn Lys Lys130 135
140gaa gtt atc tct tcc gac aac ctg cag ctg ccg gaa ctg aaa cag aaa
540Glu Val Ile Ser Ser Asp Asn Leu Gln Leu Pro Glu Leu Lys Gln Lys145
150 155tct tcc aac tct cgt aaa aag cgt tct
act tct gct ggt ccg acc gtt 588Ser Ser Asn Ser Arg Lys Lys Arg Ser
Thr Ser Ala Gly Pro Thr Val160 165 170
175ccg gac cgt gat aac gac ggt att ccg gac tct ctg gaa gtt
gaa ggc 636Pro Asp Arg Asp Asn Asp Gly Ile Pro Asp Ser Leu Glu Val
Glu Gly180 185 190tac acc gta gac gtt aaa
aac aaa cgt acc ttc ctg tct ccg tgg atc 684Tyr Thr Val Asp Val Lys
Asn Lys Arg Thr Phe Leu Ser Pro Trp Ile195 200
205tct aac atc cac gaa aag aaa ggt ctg acc aaa tac aaa tct tcc ccg
732Ser Asn Ile His Glu Lys Lys Gly Leu Thr Lys Tyr Lys Ser Ser Pro210
215 220gag aaa tgg tct acc gct tct gat ccg
tac tct gac ttc gaa aaa gtt 780Glu Lys Trp Ser Thr Ala Ser Asp Pro
Tyr Ser Asp Phe Glu Lys Val225 230 235act
ggt cgt atc gac aaa aac gtt tct ccg gaa gct cgt cac ccg ctg 828Thr
Gly Arg Ile Asp Lys Asn Val Ser Pro Glu Ala Arg His Pro Leu240
245 250 255gta gca gcg tac ccg atc
gtt cac gtt gac atg gaa aac att atc ctg 876Val Ala Ala Tyr Pro Ile
Val His Val Asp Met Glu Asn Ile Ile Leu260 265
270tct aaa aac gaa gac cag tct acc cag aac acc gac tct caa act cgt
924Ser Lys Asn Glu Asp Gln Ser Thr Gln Asn Thr Asp Ser Gln Thr Arg275
280 285acc atc tct aaa aac acc tct acc tct
cgt act cac acc tct gaa gtt 972Thr Ile Ser Lys Asn Thr Ser Thr Ser
Arg Thr His Thr Ser Glu Val290 295 300cac
ggt aac gct gag gtt cac gct tct ttc ttt gac atc ggt ggc tct 1020His
Gly Asn Ala Glu Val His Ala Ser Phe Phe Asp Ile Gly Gly Ser305
310 315gta tct gct ggt ttc tct aac tct aac tct tct
acc gtt gca atc gac 1068Val Ser Ala Gly Phe Ser Asn Ser Asn Ser Ser
Thr Val Ala Ile Asp320 325 330
335cac tct ctg tct ctg gct ggt gaa cgt acc tgg gct gaa act atg ggc
1116His Ser Leu Ser Leu Ala Gly Glu Arg Thr Trp Ala Glu Thr Met Gly340
345 350ctg aac acc gca gac acc gct cgt ctg
aac gct aac atc cgt tac gtt 1164Leu Asn Thr Ala Asp Thr Ala Arg Leu
Asn Ala Asn Ile Arg Tyr Val355 360 365aac
acc ggc acc gct ccg atc tac aac gtt ctg ccg act acc tct ctg 1212Asn
Thr Gly Thr Ala Pro Ile Tyr Asn Val Leu Pro Thr Thr Ser Leu370
375 380gta ctg ggt aaa aac cag acc ctg gca acc atc
aaa gct gac gaa aac 1260Val Leu Gly Lys Asn Gln Thr Leu Ala Thr Ile
Lys Ala Asp Glu Asn385 390 395cag ctg tct
cag atc ctg gct ccg aac aac tac tat ccg tct aaa aac 1308Gln Leu Ser
Gln Ile Leu Ala Pro Asn Asn Tyr Tyr Pro Ser Lys Asn400
405 410 415ctg gct ccg att gca ctg aac
gct cag aaa gac ttc tct tcc acc ccg 1356Leu Ala Pro Ile Ala Leu Asn
Ala Gln Lys Asp Phe Ser Ser Thr Pro420 425
430atc act atg aac tac aac cag ttc ctg gaa ctg gag aaa acc aaa cag
1404Ile Thr Met Asn Tyr Asn Gln Phe Leu Glu Leu Glu Lys Thr Lys Gln435
440 445ctg cgt ctg gac acc gac cag gtt tac
ggt aac atc gct acc tac aac 1452Leu Arg Leu Asp Thr Asp Gln Val Tyr
Gly Asn Ile Ala Thr Tyr Asn450 455 460ttc
gaa aac ggt cgt gtt cgt gta gac acc ggc tct aac tgg tct gaa 1500Phe
Glu Asn Gly Arg Val Arg Val Asp Thr Gly Ser Asn Trp Ser Glu465
470 475gtt ctg ccg cag atc cag gaa acc act gct cgt
att atc ttc aac ggt 1548Val Leu Pro Gln Ile Gln Glu Thr Thr Ala Arg
Ile Ile Phe Asn Gly480 485 490
495aaa gac ctg aac ctg gtt gaa cgt cgt atc gct gca gta aac ccg tct
1596Lys Asp Leu Asn Leu Val Glu Arg Arg Ile Ala Ala Val Asn Pro Ser500
505 510gac ccg ctg gaa acc act aaa ccg gac
atg acc ctg aaa gaa gct ctg 1644Asp Pro Leu Glu Thr Thr Lys Pro Asp
Met Thr Leu Lys Glu Ala Leu515 520 525aaa
atc gct ttc ggt ttc aac gaa ccg aac ggc aac ctg cag tac cag 1692Lys
Ile Ala Phe Gly Phe Asn Glu Pro Asn Gly Asn Leu Gln Tyr Gln530
535 540ggt aaa gat atc acc gaa ttc gac ttt aac ttc
gac cag caa acc tct 1740Gly Lys Asp Ile Thr Glu Phe Asp Phe Asn Phe
Asp Gln Gln Thr Ser545 550 555cag aac atc
aaa aac cag ctg gct gaa ctg aac gct acc aac atc tac 1788Gln Asn Ile
Lys Asn Gln Leu Ala Glu Leu Asn Ala Thr Asn Ile Tyr560
565 570 575acc gtt ctg gac aaa atc aag
ctg aac gct aaa atg aac att ctg atc 1836Thr Val Leu Asp Lys Ile Lys
Leu Asn Ala Lys Met Asn Ile Leu Ile580 585
590cgt gat aaa cgt ttc cac tac gac cgt aac aac atc gct gtt ggt gct
1884Arg Asp Lys Arg Phe His Tyr Asp Arg Asn Asn Ile Ala Val Gly Ala595
600 605gac gaa tct gta gtt aaa gaa gct cac
cgt gag gtt atc aac tct tcc 1932Asp Glu Ser Val Val Lys Glu Ala His
Arg Glu Val Ile Asn Ser Ser610 615 620acc
gaa ggt ctg ctc ctg aac atc gac aaa gat att cgt aaa atc ctg 1980Thr
Glu Gly Leu Leu Leu Asn Ile Asp Lys Asp Ile Arg Lys Ile Leu625
630 635tct ggt tac atc gtt gaa atc gaa gac acc gag
ggc ctg aaa gaa gtt 2028Ser Gly Tyr Ile Val Glu Ile Glu Asp Thr Glu
Gly Leu Lys Glu Val640 645 650
655atc aac gac cgt tac gat atg ctg aac atc tct tcc ctg cgt cag gac
2076Ile Asn Asp Arg Tyr Asp Met Leu Asn Ile Ser Ser Leu Arg Gln Asp660
665 670ggt aaa acc ttc atc gac ttc aaa aag
tac aac gat aaa ctg ccg ctg 2124Gly Lys Thr Phe Ile Asp Phe Lys Lys
Tyr Asn Asp Lys Leu Pro Leu675 680 685tac
atc tct aac ccg aac tac aaa gta aac gtt tac gct gtt acc aaa 2172Tyr
Ile Ser Asn Pro Asn Tyr Lys Val Asn Val Tyr Ala Val Thr Lys690
695 700gaa aac acc att atc aac ccg tct gaa aac ggt
gac acc tct acc aac 2220Glu Asn Thr Ile Ile Asn Pro Ser Glu Asn Gly
Asp Thr Ser Thr Asn705 710 715ggt atc aaa
aag atc ctg atc ttc tct aag aaa ggc tac gaa atc ggt 2268Gly Ile Lys
Lys Ile Leu Ile Phe Ser Lys Lys Gly Tyr Glu Ile Gly720
725 730 735

<210> 4
<211> 735
<212> PRT
<213> Artificial sequence
<220>
<223> Mature PA sequence including an ETB signal sequence
<400> 4
Glu
Val Lys Gln Glu Asn Arg Leu Leu Asn Glu Ser Glu Ser Ser Ser1
5 10 15Gln Gly Leu Leu Gly Tyr Tyr
Phe Ser Asp Leu Asn Phe Gln Ala Pro20 25
30Met Val Val Thr Ser Ser Thr Thr Gly Asp Leu Ser Ile Pro Ser Ser35
40 45Glu Leu Glu Asn Ile Pro Ser Glu Asn Gln
Tyr Phe Gln Ser Ala Ile50 55 60Trp Ser
Gly Phe Ile Lys Val Lys Lys Ser Asp Glu Tyr Thr Phe Ala65
70 75 80Thr Ser Ala Asp Asn His Val
Thr Met Trp Val Asp Asp Gln Glu Val85 90
95Ile Asn Lys Ala Ser Asn Ser Asn Lys Ile Arg Leu Glu Lys Gly Arg100
105 110Leu Tyr Gln Ile Lys Ile Gln Tyr Gln
Arg Glu Asn Pro Thr Glu Lys115 120 125Gly
Leu Asp Phe Lys Leu Tyr Trp Thr Asp Ser Gln Asn Lys Lys Glu130
135 140Val Ile Ser Ser Asp Asn Leu Gln Leu Pro Glu
Leu Lys Gln Lys Ser145 150 155
160Ser Asn Ser Arg Lys Lys Arg Ser Thr Ser Ala Gly Pro Thr Val
Pro165 170 175Asp Arg Asp Asn Asp Gly Ile
Pro Asp Ser Leu Glu Val Glu Gly Tyr180 185
190Thr Val Asp Val Lys Asn Lys Arg Thr Phe Leu Ser Pro Trp Ile Ser195
200 205Asn Ile His Glu Lys Lys Gly Leu Thr
Lys Tyr Lys Ser Ser Pro Glu210 215 220Lys
Trp Ser Thr Ala Ser Asp Pro Tyr Ser Asp Phe Glu Lys Val Thr225
230 235 240Gly Arg Ile Asp Lys Asn
Val Ser Pro Glu Ala Arg His Pro Leu Val245 250
255Ala Ala Tyr Pro Ile Val His Val Asp Met Glu Asn Ile Ile Leu
Ser260 265 270Lys Asn Glu Asp Gln Ser Thr
Gln Asn Thr Asp Ser Gln Thr Arg Thr275 280
285Ile Ser Lys Asn Thr Ser Thr Ser Arg Thr His Thr Ser Glu Val His290
295 300Gly Asn Ala Glu Val His Ala Ser Phe
Phe Asp Ile Gly Gly Ser Val305 310 315
320Ser Ala Gly Phe Ser Asn Ser Asn Ser Ser Thr Val Ala Ile
Asp His325 330 335Ser Leu Ser Leu Ala Gly
Glu Arg Thr Trp Ala Glu Thr Met Gly Leu340 345
350Asn Thr Ala Asp Thr Ala Arg Leu Asn Ala Asn Ile Arg Tyr Val
Asn355 360 365Thr Gly Thr Ala Pro Ile Tyr
Asn Val Leu Pro Thr Thr Ser Leu Val370 375
380Leu Gly Lys Asn Gln Thr Leu Ala Thr Ile Lys Ala Asp Glu Asn Gln385
390 395 400Leu Ser Gln Ile
Leu Ala Pro Asn Asn Tyr Tyr Pro Ser Lys Asn Leu405 410
415Ala Pro Ile Ala Leu Asn Ala Gln Lys Asp Phe Ser Ser Thr
Pro Ile420 425 430Thr Met Asn Tyr Asn Gln
Phe Leu Glu Leu Glu Lys Thr Lys Gln Leu435 440
445Arg Leu Asp Thr Asp Gln Val Tyr Gly Asn Ile Ala Thr Tyr Asn
Phe450 455 460Glu Asn Gly Arg Val Arg Val
Asp Thr Gly Ser Asn Trp Ser Glu Val465 470
475 480Leu Pro Gln Ile Gln Glu Thr Thr Ala Arg Ile Ile
Phe Asn Gly Lys485 490 495Asp Leu Asn Leu
Val Glu Arg Arg Ile Ala Ala Val Asn Pro Ser Asp500 505
510Pro Leu Glu Thr Thr Lys Pro Asp Met Thr Leu Lys Glu Ala
Leu Lys515 520 525Ile Ala Phe Gly Phe Asn
Glu Pro Asn Gly Asn Leu Gln Tyr Gln Gly530 535
540Lys Asp Ile Thr Glu Phe Asp Phe Asn Phe Asp Gln Gln Thr Ser
Gln545 550 555 560Asn Ile
Lys Asn Gln Leu Ala Glu Leu Asn Ala Thr Asn Ile Tyr Thr565
570 575Val Leu Asp Lys Ile Lys Leu Asn Ala Lys Met Asn
Ile Leu Ile Arg580 585 590Asp Lys Arg Phe
His Tyr Asp Arg Asn Asn Ile Ala Val Gly Ala Asp595 600
605Glu Ser Val Val Lys Glu Ala His Arg Glu Val Ile Asn Ser
Ser Thr610 615 620Glu Gly Leu Leu Leu Asn
Ile Asp Lys Asp Ile Arg Lys Ile Leu Ser625 630
635 640Gly Tyr Ile Val Glu Ile Glu Asp Thr Glu Gly
Leu Lys Glu Val Ile645 650 655Asn Asp Arg
Tyr Asp Met Leu Asn Ile Ser Ser Leu Arg Gln Asp Gly660
665 670Lys Thr Phe Ile Asp Phe Lys Lys Tyr Asn Asp Lys
Leu Pro Leu Tyr675 680 685Ile Ser Asn Pro
Asn Tyr Lys Val Asn Val Tyr Ala Val Thr Lys Glu690 695
700Asn Thr Ile Ile Asn Pro Ser Glu Asn Gly Asp Thr Ser Thr
Asn Gly705 710 715 720Ile
Lys Lys Ile Leu Ile Phe Ser Lys Lys Gly Tyr Glu Ile Gly725
730 735

<210> 5
<211> 62
<212> DNA
<213> Artificial sequence
<220>
<223> M expression control sequence
<400> 5
taaaaaactg caaaaaatag tttgacttgt gagcggataa caattaagat gtacccagtt 60
cg 62

<210> 6
<211> 76
<212> DNA
<213> Artificial sequence
<220>
<223> M+D expression control sequence
<400> 6
taaaaaactg caaaaaatag tttgacttgt gagcggataa caattaagat gtacccagtg 60
tgagcggata acaatt 76

<210> 7
<211> 73
<212> DNA
<213> Artificial sequence
<220>
<223> U+D expression control sequence
<400> 7
ttgtgagcgg ataacaattt gacaccctag ccgataggct ttaagatgta cccagtgtga 60
gcggataaca att 73

<210> 8
<211> 122
<212> DNA
<213> Artificial sequence
<220>
<223> M+D1 expression control sequence
<400> 8
gatccaagct taaaaaactg caaaaaatag tttgacttgt gagcggataa caattaagat 60
gtacccaatt gtgagcggat aacaatttca cacattaaag aggagaaatt acatatggat 120
cg 122

<210> 9
<211> 119
<212> DNA
<213> Artificial sequence
<220>
<223> M+D2 expression control sequence
<400> 9
gatccaagct taaaaaactg caaaaaatag tttgacttgt gagcggataa caattaagat 60
gtacccagtg tgagcggata acaatttcac attaaagagg agaaattaca tatggatcg 119

<210> 10
<211> 28
<212> DNA
<213> Artificial sequence
<220>
<223> lac operator sequence
<400> 10
aattgtgagc ggataacaat ttcacaca 28

<210> 11
<211> 16
<212> DNA
<213> Artificial sequence
<220>
<223> operator sequence
<400> 11
gtgagcggat aacaat 16

<210> 12
<211> 4208
<212> DNA
<213> Artificial sequence
<220>
<223> pHE4-5 expression plasmid sequence
<400> 12
aagcttaaaa aactgcaaaa
aatagtttga cttgtgagcg gataacaatt aagatgtacc 60caattgtgag cggataacaa
tttcacacat taaagaggag aaattacata tggaccgttt 120ccacgctacc tccgctgact
gctgcatctc ctacaccccg cgttccatcc cgtgctcgct 180gctggaatcc tacttcgaaa
ccaactccga atgctccaaa ccgggtgtta tcttcctgac 240caaaaaaggt cgtcgtttct
gcgctaaccc gtccgacaaa caggttcagg tttgtatgcg 300tatgctgaaa ctggacaccc
gtatcaaaac ccgtaaaaac tgataaggta cctaagtgag 360tagggcgtcc gatcgacgga
cgcctttttt ttgaattcgt aatcatggtc atagctgttt 420cctgtgtgaa attgttatcc
gctcacaatt ccacacaaca tacgagccgg aagcataaag 480tgtaaagcct ggggtgccta
atgagtgagc taactcacat taattgcgtt gcgctcactg 540cccgctttcc agtcgggaaa
cctgtcgtgc cagctgcatt aatgaatcgg ccaacgcgcg 600gggagaggcg gtttgcgtat
tgggcgctct tccgcttcct cgctcactga ctcgctgcgc 660tcggtcgttc ggctgcggcg
agcggtatca gctcactcaa aggcggtaat acggttatcc 720acagaatcag gggataacgc
aggaaagaac atgtgagcaa aaggccagca aaaggccagg 780aaccgtaaaa aggccgcgtt
gctggcgttt ttccataggc tccgcccccc tgacgagcat 840cacaaaaatc gacgctcaag
tcagaggtgg cgaaacccga caggactata aagataccag 900gcgtttcccc ctggaagctc
cctcgtgcgc tctcctgttc cgaccctgcc gcttaccgga 960tacctgtccg cctttctccc
ttcgggaagc gtggcgcttt ctcatagctc acgctgtagg 1020tatctcagtt cggtgtaggt
cgttcgctcc aagctgggct gtgtgcacga accccccgtt 1080cagcccgacc gctgcgcctt
atccggtaac tatcgtcttg agtccaaccc ggtaagacac 1140gacttatcgc cactggcagc
agccactggt aacaggatta gcagagcgag gtatgtaggc 1200ggtgctacag agttcttgaa
gtggtggcct aactacggct acactagaag aacagtattt 1260ggtatctgcg ctctgctgaa
gccagttacc ttcggaaaaa gagttggtag ctcttgatcc 1320ggcaaacaaa ccaccgctgg
tagcggtggt ttttttgttt gcaagcagca gattacgcgc 1380agaaaaaaag gatctcaaga
agatcctttg atcttttcta cggggtctga cgctcagtgg 1440aacgaaaact cacgttaagg
gattttggtc atgagattat cgtcgacaat tcgcgcgcga 1500aggcgaagcg gcatgcattt
acgttgacac catcgaatgg tgcaaaacct ttcgcggtat 1560ggcatgatag cgcccggaag
agagtcaatt cagggtggtg aatgtgaaac cagtaacgtt 1620atacgatgtc gcagagtatg
ccggtgtctc ttatcagacc gtttcccgcg tggtgaacca 1680ggccagccac gtttctgcga
aaacgcggga aaaagtggaa gcggcgatgg cggagctgaa 1740ttacattccc aaccgcgtgg
cacaacaact ggcgggcaaa cagtcgttgc tgattggcgt 1800tgccacctcc agtctggccc
tgcacgcgcc gtcgcaaatt gtcgcggcga ttaaatctcg 1860cgccgatcaa ctgggtgcca
gcgtggtggt gtcgatggta gaacgaagcg gcgtcgaagc 1920ctgtaaagcg gcggtgcaca
atcttctcgc gcaacgcgtc agtgggctga tcattaacta 1980tccgctggat gaccaggatg
ccattgctgt ggaagctgcc tgcactaatg ttccggcgtt 2040atttcttgat gtctctgacc
agacacccat caacagtatt attttctccc atgaagacgg 2100tacgcgactg ggcgtggagc
atctggtcgc attgggtcac cagcaaatcg cgctgttagc 2160gggcccatta agttctgtct
cggcgcgtct gcgtctggct ggctggcata aatatctcac 2220tcgcaatcaa attcagccga
tagcggaacg ggaaggcgac tggagtgcca tgtccggttt 2280tcaacaaacc atgcaaatgc
tgaatgaggg catcgttccc actgcgatgc tggttgccaa 2340cgatcagatg gcgctgggcg
caatgcgcgc cattaccgag tccgggctgc gcgttggtgc 2400ggatatctcg gtagtgggat
acgacgatac cgaagacagc tcatgttata tcccgccgtt 2460aaccaccatc aaacaggatt
ttcgcctgct ggggcaaacc agcgtggacc gcttgctgca 2520actctctcag ggccaggcgg
tgaagggcaa tcagctgttg cccgtctcac tggtgaaaag 2580aaaaaccacc ctggcgccca
atacgcaaac cgcctctccc cgcgcgttgg ccgattcatt 2640aatgcagctg gcacgacagg
tttcccgact ggaaagcggg cagtgagcgc aacgcaatta 2700atgtaagtta gcgcgaattg
tcgaccaaag cggccatcgt gcctccccac tcctgcagtt 2760cgggggcatg gatgcgcgga
tagccgctgc tggtttcctg gatgccgacg gatttgcact 2820gccggtagaa ctccgcgagg
tcgtccagcc tcaggcagca gctgaaccaa ctcgcgaggg 2880gatcgagccc ggggtgggcg
aagaactcca gcatgagatc cccgcgctgg aggatcatcc 2940agccggcgtc ccggaaaacg
attccgaagc ccaacctttc atagaaggcg gcggtggaat 3000cgaaatctcg tgatggcagg
ttgggcgtcg cttggtcggt catttcgaac cccagagtcc 3060cgctcagaag aactcgtcaa
gaaggcgata gaaggcgatg cgctgcgaat cgggagcggc 3120gataccgtaa agcacgagga
agcggtcagc ccattcgccg ccaagctctt cagcaatatc 3180acgggtagcc aacgctatgt
cctgatagcg gtccgccaca cccagccggc cacagtcgat 3240gaatccagaa aagcggccat
tttccaccat gatattcggc aagcaggcat cgccatgggt 3300cacgacgaga tcctcgccgt
cgggcatgcg cgccttgagc ctggcgaaca gttcggctgg 3360cgcgagcccc tgatgctctt
cgtccagatc atcctgatcg acaagaccgg cttccatccg 3420agtacgtgct cgctcgatgc
gatgtttcgc ttggtggtcg aatgggcagg tagccggatc 3480aagcgtatgc agccgccgca
ttgcatcagc catgatggat actttctcgg caggagcaag 3540gtgagatgac aggagatcct
gccccggcac ttcgcccaat agcagccagt cccttcccgc 3600ttcagtgaca acgtcgagca
cagctgcgca aggaacgccc gtcgtggcca gccacgatag 3660ccgcgctgcc tcgtcctgca
gttcattcag ggcaccggac aggtcggtct tgacaaaaag 3720aaccgggcgc ccctgcgctg
acagccggaa cacggcggca tcagagcagc cgattgtctg 3780ttgtgcccag tcatagccga
atagcctctc cacccaagcg gccggagaac ctgcgtgcaa 3840tccatcttgt tcaatcatgc
gaaacgatcc tcatcctgtc tcttgatcag atcttgatcc 3900cctgcgccat cagatccttg
gcggcaagaa agccatccag tttactttgc agggcttccc 3960aaccttacca gagggcgccc
cagctggcaa ttccggttcg cttgctgtcc ataaaaccgc 4020ccagtctagc tatcgccatg
taagcccact gcaagctacc tgctttctct ttgcgcttgc 4080gttttccctt gtccagatag
cccagtagct gacattcatc cggggtcagc accgtttctg 4140cggactggct ttctacgtgt
tccgcttcct ttagcagccc ttgcgccctg agtgcttgcg 4200gcagcgtg
4208

<210> 13
<211> 3984
<212> DNA
<213> Artificial sequence
<220>
<223> pHE4-0 expression plasmid sequence
<400> 13
aagcttaaaa aactgcaaaa
aatagtttga cttgtgagcg gataacaatt aagatgtacc 60caattgtgag cggataacaa
tttcacacat taaagaggag aaattacata tgaaggatcc 120ttggtaccta agtgagtagg
gcgtccgatc gacggacgcc ttttttttga attcgtaatc 180atggtcatag ctgtttcctg
tgtgaaattg ttatccgctc acaattccac acaacatacg 240agccggaagc ataaagtgta
aagcctgggg tgcctaatga gtgagctaac tcacattaat 300tgcgttgcgc tcactgcccg
ctttccagtc gggaaacctg tcgtgccagc tgcattaatg 360aatcggccaa cgcgcgggga
gaggcggttt gcgtattggg cgctcttccg cttcctcgct 420cactgactcg ctgcgctcgg
tcgttcggct gcggcgagcg gtatcagctc actcaaaggc 480ggtaatacgg ttatccacag
aatcagggga taacgcagga aagaacatgt gagcaaaagg 540ccagcaaaag gccaggaacc
gtaaaaaggc cgcgttgctg gcgtttttcc ataggctccg 600cccccctgac gagcatcaca
aaaatcgacg ctcaagtcag aggtggcgaa acccgacagg 660actataaaga taccaggcgt
ttccccctgg aagctccctc gtgcgctctc ctgttccgac 720cctgccgctt accggatacc
tgtccgcctt tctcccttcg ggaagcgtgg cgctttctca 780tagctcacgc tgtaggtatc
tcagttcggt gtaggtcgtt cgctccaagc tgggctgtgt 840gcacgaaccc cccgttcagc
ccgaccgctg cgccttatcc ggtaactatc gtcttgagtc 900caacccggta agacacgact
tatcgccact ggcagcagcc actggtaaca ggattagcag 960agcgaggtat gtaggcggtg
ctacagagtt cttgaagtgg tggcctaact acggctacac 1020tagaagaaca gtatttggta
tctgcgctct gctgaagcca gttaccttcg gaaaaagagt 1080tggtagctct tgatccggca
aacaaaccac cgctggtagc ggtggttttt ttgtttgcaa 1140gcagcagatt acgcgcagaa
aaaaaggatc tcaagaagat cctttgatct tttctacggg 1200gtctgacgct cagtggaacg
aaaactcacg ttaagggatt ttggtcatga gattatcgtc 1260gacaattcgc gcgcgaaggc
gaagcggcat gcatttacgt tgacaccatc gaatggtgca 1320aaacctttcg cggtatggca
tgatagcgcc cggaagagag tcaattcagg gtggtgaatg 1380tgaaaccagt aacgttatac
gatgtcgcag agtatgccgg tgtctcttat cagaccgttt 1440cccgcgtggt gaaccaggcc
agccacgttt ctgcgaaaac gcgggaaaaa gtggaagcgg 1500cgatggcgga gctgaattac
attcccaacc gcgtggcaca acaactggcg ggcaaacagt 1560cgttgctgat tggcgttgcc
acctccagtc tggccctgca cgcgccgtcg caaattgtcg 1620cggcgattaa atctcgcgcc
gatcaactgg gtgccagcgt ggtggtgtcg atggtagaac 1680gaagcggcgt cgaagcctgt
aaagcggcgg tgcacaatct tctcgcgcaa cgcgtcagtg 1740ggctgatcat taactatccg
ctggatgacc aggatgccat tgctgtggaa gctgcctgca 1800ctaatgttcc ggcgttattt
cttgatgtct ctgaccagac acccatcaac agtattattt 1860tctcccatga agacggtacg
cgactgggcg tggagcatct ggtcgcattg ggtcaccagc 1920aaatcgcgct gttagcgggc
ccattaagtt ctgtctcggc gcgtctgcgt ctggctggct 1980ggcataaata tctcactcgc
aatcaaattc agccgatagc ggaacgggaa ggcgactgga 2040gtgccatgtc cggttttcaa
caaaccatgc aaatgctgaa tgagggcatc gttcccactg 2100cgatgctggt tgccaacgat
cagatggcgc tgggcgcaat gcgcgccatt accgagtccg 2160ggctgcgcgt tggtgcggat
atctcggtag tgggatacga cgataccgaa gacagctcat 2220gttatatccc gccgttaacc
accatcaaac aggattttcg cctgctgggg caaaccagcg 2280tggaccgctt gctgcaactc
tctcagggcc aggcggtgaa gggcaatcag ctgttgcccg 2340tctcactggt gaaaagaaaa
accaccctgg cgcccaatac gcaaaccgcc tctccccgcg 2400cgttggccga ttcattaatg
cagctggcac gacaggtttc ccgactggaa agcgggcagt 2460gagcgcaacg caattaatgt
aagttagcgc gaattgtcga ccaaagcggc catcgtgcct 2520ccccactcct gcagttcggg
ggcatggatg cgcggatagc cgctgctggt ttcctggatg 2580ccgacggatt tgcactgccg
gtagaactcc gcgaggtcgt ccagcctcag gcagcagctg 2640aaccaactcg cgaggggatc
gagcccgggg tgggcgaaga actccagcat gagatccccg 2700cgctggagga tcatccagcc
ggcgtcccgg aaaacgattc cgaagcccaa cctttcatag 2760aaggcggcgg tggaatcgaa
atctcgtgat ggcaggttgg gcgtcgcttg gtcggtcatt 2820tcgaacccca gagtcccgct
cagaagaact cgtcaagaag gcgatagaag gcgatgcgct 2880gcgaatcggg agcggcgata
ccgtaaagca cgaggaagcg gtcagcccat tcgccgccaa 2940gctcttcagc aatatcacgg
gtagccaacg ctatgtcctg atagcggtcc gccacaccca 3000gccggccaca gtcgatgaat
ccagaaaagc ggccattttc caccatgata ttcggcaagc 3060aggcatcgcc atgggtcacg
acgagatcct cgccgtcggg catgcgcgcc ttgagcctgg 3120cgaacagttc ggctggcgcg
agcccctgat gctcttcgtc cagatcatcc tgatcgacaa 3180gaccggcttc catccgagta
cgtgctcgct cgatgcgatg tttcgcttgg tggtcgaatg 3240ggcaggtagc cggatcaagc
gtatgcagcc gccgcattgc atcagccatg atggatactt 3300tctcggcagg agcaaggtga
gatgacagga gatcctgccc cggcacttcg cccaatagca 3360gccagtccct tcccgcttca
gtgacaacgt cgagcacagc tgcgcaagga acgcccgtcg 3420tggccagcca cgatagccgc
gctgcctcgt cctgcagttc attcagggca ccggacaggt 3480cggtcttgac aaaaagaacc
gggcgcccct gcgctgacag ccggaacacg gcggcatcag 3540agcagccgat tgtctgttgt
gcccagtcat agccgaatag cctctccacc caagcggccg 3600gagaacctgc gtgcaatcca
tcttgttcaa tcatgcgaaa cgatcctcat cctgtctctt 3660gatcagatct tgatcccctg
cgccatcaga tccttggcgg caagaaagcc atccagttta 3720ctttgcaggg cttcccaacc
ttaccagagg gcgccccagc tggcaattcc ggttcgcttg 3780ctgtccataa aaccgcccag
tctagctatc gccatgtaag cccactgcaa gctacctgct 3840ttctctttgc gcttgcgttt
tcccttgtcc agatagccca gtagctgaca ttcatccggg 3900gtcagcaccg tttctgcgga
ctggctttct acgtgttccg cttcctttag cagcccttgc 3960gccctgagtg cttgcggcag
cgtg 3984

<210> 14
<211> 4277
<212> DNA
<213> Artificial sequence
<220>
<223> pHE4-a expression plasmid sequence
<400> 14
aagcttaaaa aactgcaaaa
aatagtttga cttgtgagcg gataacaatt aagatgtacc 60caattgtgag cggataacaa
tttcacacat taaagaggag aaattacata tgtgatagat 120aaaagacgct gaaaccgaat
tcttgttgtc caaactgccg ctggaaaacc cggttctgct 180ggaccgtttc cacgctacct
ccgctgactg ctgcatctcc tacaccacgc gttccatccc 240gtgctcgctg ctggaatcct
acttcgaaac caactccgaa tgctccaaac cgggtgttat 300cttcctgacc aaaaaaggtc
gtcgtttctg cgctaacccg tccgacaaac aggttcaggt 360ttgtatgcgt atgctgaaac
tggacacccg tgcggccgct ctagaggatc ctcgaggtac 420ctaagtgagt agggcgtccg
atcgacggac gccttttttt tgaattcgta atcatggtca 480tagctgtttc ctgtgtgaaa
ttgttatccg ctcacaattc cacacaacat acgagccgga 540agcataaagt gtaaagcctg
gggtgcctaa tgagtgagct aactcacatt aattgcgttg 600cgctcactgc ccgctttcca
gtcgggaaac ctgtcgtgcc agctgcatta atgaatcggc 660caacgcgcgg ggagaggcgg
tttgcgtatt gggcgctctt ccgcttcctc gctcactgac 720tcgctgcgct cggtcgttcg
gctgcggcga gcggtatcag ctcactcaaa ggcggtaata 780cggttatcca cagaatcagg
ggataacgca ggaaagaaca tgtgagcaaa aggccagcaa 840aaggccagga accgtaaaaa
ggccgcgttg ctggcgtttt tccataggct ccgcccccct 900gacgagcatc acaaaaatcg
acgctcaagt cagaggtggc gaaacccgac aggactataa 960agataccagg cgtttccccc
tggaagctcc ctcgtgcgct ctcctgttcc gaccctgccg 1020cttaccggat acctgtccgc
ctttctccct tcgggaagcg tggcgctttc tcatagctca 1080cgctgtaggt atctcagttc
ggtgtaggtc gttcgctcca agctgggctg tgtgcacgaa 1140ccccccgttc agcccgaccg
ctgcgcctta tccggtaact atcgtcttga gtccaacccg 1200gtaagacacg acttatcgcc
actggcagca gccactggta acaggattag cagagcgagg 1260tatgtaggcg gtgctacaga
gttcttgaag tggtggccta actacggcta cactagaaga 1320acagtatttg gtatctgcgc
tctgctgaag ccagttacct tcggaaaaag agttggtagc 1380tcttgatccg gcaaacaaac
caccgctggt agcggtggtt tttttgtttg caagcagcag 1440attacgcgca gaaaaaaagg
atctcaagaa gatcctttga tcttttctac ggggtctgac 1500gctcagtgga acgaaaactc
acgttaaggg attttggtca tgagattatc gtcgacaatt 1560cgcgcgcgaa ggcgaagcgg
catgcattta cgttgacacc atcgaatggt gcaaaacctt 1620tcgcggtatg gcatgatagc
gcccggaaga gagtcaattc agggtggtga atgtgaaacc 1680agtaacgtta tacgatgtcg
cagagtatgc cggtgtctct tatcagaccg tttcccgcgt 1740ggtgaaccag gccagccacg
tttctgcgaa aacgcgggaa aaagtggaag cggcgatggc 1800ggagctgaat tacattccca
accgcgtggc acaacaactg gcgggcaaac agtcgttgct 1860gattggcgtt gccacctcca
gtctggccct gcacgcgccg tcgcaaattg tcgcggcgat 1920taaatctcgc gccgatcaac
tgggtgccag cgtggtggtg tcgatggtag aacgaagcgg 1980cgtcgaagcc tgtaaagcgg
cggtgcacaa tcttctcgcg caacgcgtca gtgggctgat 2040cattaactat ccgctggatg
accaggatgc cattgctgtg gaagctgcct gcactaatgt 2100tccggcgtta tttcttgatg
tctctgacca gacacccatc aacagtatta ttttctccca 2160tgaagacggt acgcgactgg
gcgtggagca tctggtcgca ttgggtcacc agcaaatcgc 2220gctgttagcg ggcccattaa
gttctgtctc ggcgcgtctg cgtctggctg gctggcataa 2280atatctcact cgcaatcaaa
ttcagccgat agcggaacgg gaaggcgact ggagtgccat 2340gtccggtttt caacaaacca
tgcaaatgct gaatgagggc atcgttccca ctgcgatgct 2400ggttgccaac gatcagatgg
cgctgggcgc aatgcgcgcc attaccgagt ccgggctgcg 2460cgttggtgcg gatatctcgg
tagtgggata cgacgatacc gaagacagct catgttatat 2520cccgccgtta accaccatca
aacaggattt tcgcctgctg gggcaaacca gcgtggaccg 2580cttgctgcaa ctctctcagg
gccaggcggt gaagggcaat cagctgttgc ccgtctcact 2640ggtgaaaaga aaaaccaccc
tggcgcccaa tacgcaaacc gcctctcccc gcgcgttggc 2700cgattcatta atgcagctgg
cacgacaggt ttcccgactg gaaagcgggc agtgagcgca 2760acgcaattaa tgtaagttag
cgcgaattgt cgaccaaagc ggccatcgtg cctccccact 2820cctgcagttc gggggcatgg
atgcgcggat agccgctgct ggtttcctgg atgccgacgg 2880atttgcactg ccggtagaac
tccgcgaggt cgtccagcct caggcagcag ctgaaccaac 2940tcgcgagggg atcgagcccg
gggtgggcga agaactccag catgagatcc ccgcgctgga 3000ggatcatcca gccggcgtcc
cggaaaacga ttccgaagcc caacctttca tagaaggcgg 3060cggtggaatc gaaatctcgt
gatggcaggt tgggcgtcgc ttggtcggtc atttcgaacc 3120ccagagtccc gctcagaaga
actcgtcaag aaggcgatag aaggcgatgc gctgcgaatc 3180gggagcggcg ataccgtaaa
gcacgaggaa gcggtcagcc cattcgccgc caagctcttc 3240agcaatatca cgggtagcca
acgctatgtc ctgatagcgg tccgccacac ccagccggcc 3300acagtcgatg aatccagaaa
agcggccatt ttccaccatg atattcggca agcaggcatc 3360gccatgggtc acgacgagat
cctcgccgtc gggcatgcgc gccttgagcc tggcgaacag 3420ttcggctggc gcgagcccct
gatgctcttc gtccagatca tcctgatcga caagaccggc 3480ttccatccga gtacgtgctc
gctcgatgcg atgtttcgct tggtggtcga atgggcaggt 3540agccggatca agcgtatgca
gccgccgcat tgcatcagcc atgatggata ctttctcggc 3600aggagcaagg tgagatgaca
ggagatcctg ccccggcact tcgcccaata gcagccagtc 3660ccttcccgct tcagtgacaa
cgtcgagcac agctgcgcaa ggaacgcccg tcgtggccag 3720ccacgatagc cgcgctgcct
cgtcctgcag ttcattcagg gcaccggaca ggtcggtctt 3780gacaaaaaga accgggcgcc
cctgcgctga cagccggaac acggcggcat cagagcagcc 3840gattgtctgt tgtgcccagt
catagccgaa tagcctctcc acccaagcgg ccggagaacc 3900tgcgtgcaat ccatcttgtt
caatcatgcg aaacgatcct catcctgtct cttgatcaga 3960tcttgatccc ctgcgccatc
agatccttgg cggcaagaaa gccatccagt ttactttgca 4020gggcttccca accttaccag
agggcgcccc agctggcaat tccggttcgc ttgctgtcca 4080taaaaccgcc cagtctagct
atcgccatgt aagcccactg caagctacct gctttctctt 4140tgcgcttgcg ttttcccttg
tccagatagc ccagtagctg acattcatcc ggggtcagca 4200ccgtttctgc ggactggctt
tctacgtgtt ccgcttcctt tagcagccct tgcgccctga 4260gtgcttgcgg cagcgtg
4277

<210> 15
<211> 319
<212> PRT
<213> Artificial sequence
<220>
<223> LacIq repressor gene sequence
<400> 15
Met Ala Glu Leu Asn Tyr Ile Pro
Asn Arg Val Ala Gln Gln Leu Ala1 5 10
15Gly Lys Gln Ser Leu Leu Ile Gly Val Ala Thr Ser Ser Leu
Ala Leu20 25 30His Ala Pro Ser Gln Ile
Val Ala Ala Ile Lys Ser Arg Ala Asp Gln35 40
45Leu Gly Ala Ser Val Val Val Ser Met Val Glu Arg Ser Gly Val Glu50
55 60Ala Cys Lys Ala Ala Val His Asn Leu
Leu Ala Gln Arg Val Ser Gly65 70 75
80Leu Ile Ile Asn Tyr Pro Leu Asp Asp Gln Asp Ala Ile Ala
Val Glu85 90 95Ala Ala Cys Thr Asn Val
Pro Ala Leu Phe Leu Asp Val Ser Asp Gln100 105
110Thr Pro Ile Asn Ser Ile Ile Phe Ser His Glu Asp Gly Thr Arg
Leu115 120 125Gly Val Glu His Leu Val Ala
Leu Gly His Gln Gln Ile Ala Leu Leu130 135
140Ala Gly Pro Leu Ser Ser Val Ser Ala Arg Leu Arg Leu Ala Gly Trp145
150 155 160His Lys Tyr Leu
Thr Arg Asn Gln Ile Gln Pro Ile Ala Glu Arg Glu165 170
175Gly Asp Trp Ser Ala Met Ser Gly Phe Gln Gln Thr Met Gln
Met Leu180 185 190Asn Glu Gly Ile Val Pro
Thr Ala Met Leu Val Ala Asn Asp Gln Met195 200
205Ala Leu Gly Ala Met Arg Ala Ile Thr Glu Ser Gly Leu Arg Val
Gly210 215 220Ala Asp Ile Ser Val Val Gly
Tyr Asp Asp Thr Glu Asp Ser Ser Cys225 230
235 240Tyr Ile Pro Pro Leu Thr Thr Ile Lys Gln Asp Phe
Arg Leu Leu Gly245 250 255Gln Thr Ser Val
Asp Arg Leu Leu Gln Leu Ser Gln Gly Gln Ala Val260 265
270Lys Gly Asn Gln Leu Leu Pro Val Ser Leu Val Lys Arg Lys
Thr Thr275 280 285Leu Ala Pro Asn Thr Gln
Thr Ala Ser Pro Arg Ala Leu Ala Asp Ser290 295
300Leu Met Gln Leu Ala Arg Gln Val Ser Arg Leu Glu Ser Gly Gln305
310 315

<210> 16
<211> 264
<212> PRT
<213> Artificial sequence
<220>
<223> Kanamycin resistance gene sequence
<400> 16
Met Ile Glu Gln Asp Gly Leu His Ala Gly Ser
Pro Ala Ala Trp Val1 5 10
15Glu Arg Leu Phe Gly Tyr Asp Trp Ala Gln Gln Thr Ile Gly Cys Ser20
25 30Asp Ala Ala Val Phe Arg Leu Ser Ala Gln
Gly Arg Pro Val Leu Phe35 40 45Val Lys
Thr Asp Leu Ser Gly Ala Leu Asn Glu Leu Gln Asp Glu Ala50
55 60Ala Arg Leu Ser Trp Leu Ala Thr Thr Gly Val Pro
Cys Ala Ala Val65 70 75
80Leu Asp Val Val Thr Glu Ala Gly Arg Asp Trp Leu Leu Leu Gly Glu85
90 95Val Pro Gly Gln Asp Leu Leu Ser Ser His
Leu Ala Pro Ala Glu Lys100 105 110Val Ser
Ile Met Ala Asp Ala Met Arg Arg Leu His Thr Leu Asp Pro115
120 125Ala Thr Cys Pro Phe Asp His Gln Ala Lys His Arg
Ile Glu Arg Ala130 135 140Arg Thr Arg Met
Glu Ala Gly Leu Val Asp Gln Asp Asp Leu Asp Glu145 150
155 160Glu His Gln Gly Leu Ala Pro Ala Glu
Leu Phe Ala Arg Leu Lys Ala165 170 175Arg
Met Pro Asp Gly Glu Asp Leu Val Val Thr His Gly Asp Ala Cys180
185 190Leu Pro Asn Ile Met Val Glu Asn Gly Arg Phe
Ser Gly Phe Ile Asp195 200 205Cys Gly Arg
Leu Gly Val Ala Asp Arg Tyr Gln Asp Ile Ala Leu Ala210
215 220Thr Arg Asp Ile Ala Glu Glu Leu Gly Gly Glu Trp
Ala Asp Arg Phe225 230 235
240Leu Val Leu Tyr Gly Ile Ala Ala Pro Asp Ser Gln Arg Ile Ala Phe245
250 255Tyr Arg Leu Leu Asp Glu Phe
Phe
260

<210> 17
<211> 18
<212> DNA
<213> Artificial sequence
<220>
<223> pHE4 Shine-Dalgarno sequence
<400> 17
attaaagagg agaaatta 18

<210> 18
<211> 12
<212> DNA
<213> Artificial sequence
<220>
<223> Shine Dalgarno sequence based on phoA promoter
<400> 18
gtaaaggaag ta 12
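
As an illustrative aside to the listing above, and not part of the application itself, the following Python sketch extracts nucleotides 4-13 of SEQ ID NO:2, the fragment recited in claim 1(b). It also lists the positions at which SEQ ID NO:2 differs from the pHE4 Shine-Dalgarno sequence of SEQ ID NO:17. The sequences are copied from the listing with the grouping spaces removed. The helper names `subsequence` and `mismatches` are hypothetical, and 1-based inclusive numbering is assumed for the claimed range.

```python
# Sequences copied from the listing (grouping spaces removed).
SEQ_ID_2 = "attataaaggaaaaatta"    # Shine-Dalgarno sequence, 18 nt
SEQ_ID_17 = "attaaagaggagaaatta"   # pHE4 Shine-Dalgarno sequence, 18 nt


def subsequence(seq: str, start: int, end: int) -> str:
    """Return residues start..end of seq, 1-based and inclusive (assumed convention)."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("range lies outside the sequence")
    return seq[start - 1:end]


def mismatches(a: str, b: str) -> list[tuple[int, str, str]]:
    """List (1-based position, residue in a, residue in b) wherever a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    print(subsequence(SEQ_ID_2, 4, 13))     # nucleotides 4-13 of SEQ ID NO:2
    print(mismatches(SEQ_ID_2, SEQ_ID_17))  # positions where the two 18-mers differ
```

Under these assumptions, the fragment is `ataaaggaaa`, and the two 18-mers differ at positions 5, 7 and 12.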