Patent application title: Modified Shine-Dalgarno Sequences and Methods of Use Thereof

Inventors: Michael W. Laird (San Ramon, CA, US)
Assignees: Human Genome Sciences, Inc.
IPC8 Class: AC12P2104FI
USPC Class: 435 691
Class name: Chemistry: molecular biology and microbiology micro-organism, tissue cell culture or enzyme using process to synthesize a desired chemical compound or composition recombinant dna technique included in method of making a protein or polypeptide
Publication date: 2008-11-06
Patent application number: 20080274503

Novel Shine-Dalgarno (ribosome binding site) sequences, vectors containing such sequences, and host cells transformed with these vectors are provided. Methods of use of such sequences, vectors, and host cells for the efficient production of proteins and fragments thereof in prokaryotic systems are also provided. In particular embodiments of the invention, compounds and methods for high efficiency production of soluble protein in prokaryotic systems are provided.

Claims:

1. An isolated polynucleotide comprising a Shine-Dalgarno sequence selected from the group consisting of: (a) SEQ ID NO:2; (b) polynucleotides 4-13 of SEQ ID NO:2; and (c) SEQ ID NO:18.

2. The isolated polynucleotide of claim 1 wherein the Shine-Dalgarno sequence is (a).

3. The isolated polynucleotide of claim 1 wherein the Shine-Dalgarno sequence is (b).

4. The isolated polynucleotide of claim 1 wherein the Shine-Dalgarno sequence is (c).

5. A vector comprising a Shine-Dalgarno sequence selected from the group consisting of: (a) SEQ ID NO:2; (b) polynucleotides 4-13 of SEQ ID NO:2; and (c) SEQ ID NO:18.

6. The vector of claim 5 wherein the Shine-Dalgarno sequence is (a).

7. The vector of claim 5 wherein the Shine-Dalgarno sequence is (b).

8. The vector of claim 5 wherein the Shine-Dalgarno sequence is (c).

9. The vector of claim 5, wherein said Shine-Dalgarno sequence is operably associated with a polynucleotide encoding a protein or fragment thereof.

10. The vector of claim 9, wherein said polynucleotide encodes SEQ ID NO:4.

11. The vector of claim 9, wherein said polynucleotide is operably associated with an expression control sequence.

12. A method of producing a vector comprising inserting the Shine-Dalgarno sequence of claim 1 into a vector.

13. A method of producing a host cell comprising transducing, transforming or transfecting a host cell with the vector of claim 5.

14. A recombinant host cell comprising the Shine-Dalgarno sequence of claim 1.

15. A recombinant host cell comprising the vector of claim 5.

16. A recombinant host cell comprising the vector of claim 9.

17. A method of producing a protein, comprising: (a) culturing the host cell of claim 16 under conditions suitable to produce the protein or fragment thereof, and (b) recovering the protein or fragment thereof from the cell culture.

18. The method of claim 17, wherein said polynucleotide encodes SEQ ID NO:4.

Description:

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001]This application is a divisional of U.S. application Ser. No. 11/447,892, filed Jun. 7, 2006, which is a divisional of U.S. application Ser. No. 11/004,853, filed Dec. 7, 2004 (now U.S. Pat. No. 7,094,573, issued Aug. 22, 2006), which is a continuation of International Application No. PCT/US03/19786, filed Jun. 25, 2003, which claims benefit under 35 U.S.C. § 119(e) to U.S. Provisional Application Nos. 60/391,433, filed Jun. 26, 2002, and 60/406,630, filed Aug. 29, 2002, each of which is hereby incorporated by reference in its entirety.

STATEMENT UNDER 37 C.F.R. § 1.77(b)(5)

[0002]This application refers to a "Sequence Listing" listed below, which is provided as a text document. The text document is entitled "PV595D2-SeqList.txt" (47,517 bytes, created May 19, 2008), which is incorporated by reference in its entirety.

FIELD OF THE INVENTION

[0003]The present invention relates to novel Shine-Dalgarno (ribosome binding site) sequences, vectors containing such sequences, and host cells transformed with these vectors. The present invention also relates to methods of use of such sequences, vectors, and host cells for the efficient production of proteins and fragments thereof in prokaryotic systems, and in one aspect of the invention, provides for high efficiency production of soluble protein in prokaryotic systems.

BACKGROUND OF THE INVENTION

[0004]The level of production of a protein in a host cell is determined by three major factors: the number of copies of its structural gene within the cell, the efficiency with which the structural gene copies are transcribed and the efficiency with which the resulting messenger RNA ("mRNA") is translated. The transcription and translation efficiencies are, in turn, dependent on nucleotide sequences that are normally situated ahead of the desired structural genes or the translated sequence. These nucleotide sequences, also known as expression control sequences, define, inter alia, the locations at which RNA polymerase binds (the promoter sequence to initiate transcription; see also EMBO J. 5:2995-3000 (1986)) and at which ribosomes bind and interact with the mRNA (the product of transcription) to initiate translation.

[0005]In most prokaryotes, the purine-rich ribosome binding site known as the Shine-Dalgarno (S-D) sequence assists with the binding and positioning of the 30S ribosome component relative to the start codon on the mRNA through interaction with a pyrimidine-rich region of the 16S ribosomal RNA. See, e.g., Shine & Dalgarno, Proc. Natl. Acad. Sci. USA 71:1342-46 (1974). The S-D sequence is located on the mRNA downstream from the start of transcription and upstream from the start of translation, typically from 4-14 nucleotides upstream of the start codon, and more typically from 8-10 nucleotides upstream of the start codon. Because of the role of the S-D sequence in translation, there is a direct relationship between the efficiency of translation and the efficiency (or strength) of the S-D sequence.

[0006]Not all S-D sequences have the same efficiency, however. Accordingly, prior attempts have been made to increase the efficiency of ribosomal binding, positioning, and translation by, inter alia, changing the distance between the S-D sequence and the start codon, changing the composition of the space between the S-D sequence and the start codon, modifying an existing S-D sequence, using a heterologous S-D sequence, and manipulating the secondary structure of mRNA during the initiation of translation. Despite these changes, however, success in increasing protein expression efficiency in prokaryotic systems has remained an elusive and unpredictable goal due to a variety of factors, including, inter alia, the host cells used, the expression control sequences (including the S-D sequence) used, and the characteristics of the gene and protein being expressed. See, e.g., Stenstrom, et al., Gene 273(2):259-265 (2001); Komarova, et al., Bioorg. Khim. 27(4):282-290 (2001); Stenstrom, et al., Gene 263(1-2):273-284 (2001); and Mironova, et al., Microbiol. Res. 154(1):35-41 (1999). For example, efficient expression of soluble B. anthracis protective antigen (PA) has proved difficult in E. coli. See, e.g., Sharma, et al., Protein Expression and Purification 7:33-38 (1996) (indicating 0.5 mg/L at 70% purity); Chauhan, et al., Biochem. Biophys. Res. Commun. 283(2):308-15 (2001) (indicating 125 mg/L); Gupta, et al., Protein Expr. Purif. 16(3):369-76 (1999) (indicating 2 mg/L).

[0007]Accordingly, there remains a demand in the art for compositions and methods for increasing the efficiency of ribosome binding and translation in prokaryotic systems, thereby resulting in increased efficiency of protein expression. This demand is especially strong for proteins that are difficult to express in existing systems, and for proteins that are desired in large quantity for pharmacological, therapeutic, or industrial use.

SUMMARY OF THE INVENTION

[0008]The present invention encompasses novel Shine-Dalgarno sequences that result in increased efficiency of protein expression in prokaryotic systems. The present invention further relates to vectors comprising such S-D sequences and host cells transformed with such vectors. In particular embodiments, the present invention relates to methods for producing proteins and fragments thereof in prokaryotic systems using such S-D sequences, vectors, and host cells. In certain embodiments, methods of use of the S-D sequences, vectors, and host cells of the invention provide high efficiency production of soluble protein in prokaryotic systems, including prokaryotic in vitro translation systems.

[0009]In particular embodiments of the invention, the novel S-D sequence comprises (or alternately consists of) SEQ ID NO:2. In additional embodiments, the novel S-D sequence comprises (or alternately consists of) nucleotides 4-13 of SEQ ID NO:2. The invention also encompasses the S-D sequence of SEQ ID NO:18, described at paragraph 0426 of U.S. Provisional Application No. 60/368,548, filed Apr. 1, 2002, and in U.S. Provisional Application No. 60/331,478, filed Nov. 16, 2001, each of which is hereby incorporated by reference herein in its entirety.

[0010]The protein or fragment thereof may be of prokaryotic, eukaryotic, or viral origin, or may be artificial. In particular embodiments, the S-D sequences, vectors, and host cells of the invention are used to express B. anthracis protective antigen (PA), mutated protective antigens (mPAs) (See, e.g., Sellman et al, JBC 276(11):8371-8376 (2001)), TL3, TL6, or other proteins. In certain embodiments, the S-D sequences, vectors, and host cells of the invention are used to express proteins that have previously been difficult to express in prokaryotic systems. The present invention also encompasses the combination of novel S-D sequences with a variety of expression control sequences, such as those described in detail in U.S. Pat. No. 6,194,168 (which is hereby incorporated by reference herein in its entirety), and in particular, expression control sequences comprising at least a portion of one or more lac operator sequences and a phage promoter comprising a -30 region.

BRIEF DESCRIPTION OF THE DRAWINGS

[0011]FIG. 1 depicts a Shine-Dalgarno sequence of the present invention (SEQ ID NO: 2) and the Shine-Dalgarno sequence contained in the pHE4 expression vector (SEQ ID NO:17) (See U.S. Pat. No. 6,194,168). Bases matching the S-D sequence of the present invention (SEQ ID NO:2) are highlighted.

[0012]FIG. 2A depicts a map of the pHE6 vector (SEQ ID NO:1), which incorporates a S-D sequence of the invention. FIG. 2B depicts the pHE6 vector (SEQ ID NO:1) with the gene encoding mature Bacillus anthracis PA including an ETB signal sequence (SEQ ID NO:3) inserted.

[0013]FIGS. 3A-3B compare the efficiency of TL6 protein expression using the pHE4 vector (FIG. 3B) versus the pHE6 vector (FIG. 3A), which uses a S-D sequence of the invention. In particular, increased soluble TL6 expression with the pHE6 vector can be seen in FIG. 3A as a lack of "shadow" in the gel.

[0014]FIG. 4 depicts a gel showing the quantity and quality of PA after expression using pHE6 and subsequent purification. Using the compositions and methods of the invention, approximately 150 mg/L of soluble PA at greater than 96% purity (as measured by RP-HPLC) was obtained.

DETAILED DESCRIPTION OF THE INVENTION

[0015]The instant invention is directed to novel Shine-Dalgarno (ribosomal binding site) sequences. These S-D sequences result in increased efficiency of protein expression in prokaryotic systems. The S-D sequences of the present invention have been optimized through modification of several nucleotides. See, e.g., FIG. 1. In particular embodiments, the S-D sequences of the present invention comprise (or alternately consist of) SEQ ID NO:2. In additional embodiments, the S-D sequences of the present invention comprise (or alternately consist of) nucleotides 4-13 of SEQ ID NO:2. In other embodiments, the S-D sequences of the present invention comprise (or alternately consist of) SEQ ID NO:18.
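The position range "nucleotides 4-13 of SEQ ID NO:2" uses 1-based, inclusive numbering. The following Python sketch illustrates how that range maps onto the 18-nucleotide sequence given for SEQ ID NO:2 in the sequence listing. The helper name `subsequence` is hypothetical and is introduced here for illustration only.

```python
# SEQ ID NO:2 as given in the sequence listing (18 nucleotides).
SEQ_ID_NO_2 = "attataaaggaaaaatta"


def subsequence(seq: str, start: int, end: int) -> str:
    """Return nucleotides start..end of seq, using 1-based inclusive positions."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("positions must satisfy 1 <= start <= end <= len(seq)")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return seq[start - 1:end]


core = subsequence(SEQ_ID_NO_2, 4, 13)
print(core)       # ataaaggaaa
print(len(core))  # 10
```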

[0016]In many embodiments, the S-D sequences of the present invention are used in prokaryotic cells. Exemplary bacterial cells suitable for use with the instant invention include E. coli, B. subtilis, S. aureus, S. typhimurium, and other bacteria used in the art. In other embodiments, the S-D sequences of the present invention are used in prokaryotic in vitro translation systems.

[0017]The present invention also relates to vectors and plasmids comprising one or more S-D sequences of the invention. Such vectors and plasmids generally also further comprise one or more restriction enzyme sites downstream of the S-D sequence for cloning and expression of a gene or polynucleotide of interest.

[0018]In certain embodiments, vectors and plasmids of the present invention further comprise additional expression control sequences, including but not limited to those described in U.S. Pat. No. 6,194,168, and in particular, M (SEQ ID NO:5), M+D (SEQ ID NO:6), U+D (SEQ ID NO:7), M+D1 (SEQ ID NO:8), and M+D2 (SEQ ID NO:9). More generally, the expression control sequence elements contemplated include bacterial or phage promoter sequences and functional variants thereof, whether natural or artificial; operator/repressor systems; and the lacIq gene (which confers tight regulation of the lac operator by blocking transcription of downstream (i.e., 3') sequences).

[0019]The lac operator sequences contemplated for use in vectors and plasmids of the instant invention comprise (or alternately consist of) the entire lac operator sequence represented by the sequence 5' AATTGTGAGCGGATAACAATTTCACACA 3' (SEQ ID NO:10), or a portion thereof that retains at least partial activity, as described in U.S. Pat. No. 6,194,168. Activity is routinely determined using techniques well known in the art to measure the relative repressibility of a promoter sequence in the absence of an inducer, such as IPTG. This is done by comparing the relative amounts of protein expressed from expression control sequences comprising portions of the lac operator sequence and from those comprising the full-length lac operator sequence. The activity of the partial operator sequence is measured relative to that of the full-length lac operator sequence (e.g., SEQ ID NO:10). In one embodiment, partial activity for the purposes of the present invention means activity reduced by no more than 100 fold relative to the full-length sequence. In alternative embodiments, partial activity for the purpose of the present invention means activity reduced by no more than 75, 50, 25, 20, 15, or 10 fold, relative to the full-length lac operator sequence. In a preferred embodiment, the activity of a partial operator sequence is reduced by no more than 10 fold relative to the activity of the full-length sequence.
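The fold-reduction criterion for partial activity can be sketched as follows. This is a minimal illustration that assumes the activity of each operator has already been reduced to a single positive number (for example, a repression ratio). The function names and the numeric values in the usage lines are hypothetical and are not data from this application.

```python
def fold_reduction(full_length_activity: float, partial_activity: float) -> float:
    """Fold by which the partial operator's activity is reduced relative to full length."""
    if full_length_activity <= 0 or partial_activity <= 0:
        raise ValueError("activities must be positive")
    return full_length_activity / partial_activity


def retains_partial_activity(
    full_length_activity: float,
    partial_activity: float,
    max_fold: float = 100.0,
) -> bool:
    """True if activity is reduced by no more than max_fold relative to full length."""
    return fold_reduction(full_length_activity, partial_activity) <= max_fold


# Hypothetical activities, chosen only to exercise the thresholds.
print(retains_partial_activity(200.0, 25.0, max_fold=10.0))  # True (8-fold reduction)
print(retains_partial_activity(200.0, 1.0))                  # False (200-fold reduction)
```

The default threshold of 100 fold corresponds to the broadest embodiment recited above; passing `max_fold=10.0` corresponds to the preferred embodiment.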

[0020]In many embodiments, one or more S-D sequences of the invention are used in a vector comprising a T5 phage promoter sequence and two lac operator sequences wherein at least a portion of the full-length lac operator sequence (SEQ ID NO:10) is located within the spacer region between -12 and -30 of the expression control sequences described in U.S. Pat. No. 6,194,168. In particular embodiments, the operator sequence comprises (or alternately consists of) at least the sequence 5'-GTGAGCGGATAACAAT-3' (SEQ ID NO:11).
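That SEQ ID NO:11 is a portion of the full-length lac operator sequence (SEQ ID NO:10) can be checked with a short Python sketch. Both sequences are copied from the paragraphs above; the variable names are illustrative only.

```python
SEQ_ID_NO_10 = "AATTGTGAGCGGATAACAATTTCACACA"  # full-length lac operator sequence
SEQ_ID_NO_11 = "GTGAGCGGATAACAAT"              # operator portion recited in paragraph [0020]

offset = SEQ_ID_NO_10.find(SEQ_ID_NO_11)  # 0-based index, or -1 if absent
if offset < 0:
    raise ValueError("SEQ ID NO:11 is not contained in SEQ ID NO:10")

# Convert to the 1-based, inclusive numbering used in the specification.
start = offset + 1
end = offset + len(SEQ_ID_NO_11)
print(f"SEQ ID NO:11 spans nucleotides {start}-{end} of SEQ ID NO:10")
# SEQ ID NO:11 spans nucleotides 5-20 of SEQ ID NO:10
```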

[0021]The previously mentioned lac-operator sequences are negatively regulated by the lac-repressor. The corresponding repressor gene can be introduced into the host cell in a vector or through integration into the chromosome of a bacterium by known methods, such as by integration of the lacIq gene. See, e.g., Miller et al, supra; Calos, (1978) Nature 274:762-765. The vector encoding the repressor molecule may be the same vector that contains the expression control sequences and a gene or polynucleotide of interest or may be a separate vector.

[0022]The S-D sequences of the invention can routinely be inserted using procedures known in the art into any suitable expression vector that can replicate in gram-negative and/or gram-positive bacteria. See, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual (Cold Spring Harbor, N.Y. 2nd ed. 1989); Ausubel et al., Current Protocols in Molecular Biology (Greene Pub. Assoc. and Wiley-Interscience, N.Y.). Suitable vectors and plasmids can be constructed from segments of chromosomal, non-chromosomal and synthetic DNA sequences, such as various known plasmid and phage DNAs. See, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual (Cold Spring Harbor, N.Y. 2nd ed. 1989). Especially suitable vectors include plasmids of the pDS family. See Bujard et al, (1987) Methods in Enzymology, 155:416-433. Additional examples of preferred suitable plasmids include pBR322 and pBluescript® (Stratagene, La Jolla, Calif.) based plasmids. Still additional examples of preferred suitable plasmids include pUC-based vectors, including pUC18 and pUC19 (New England Biolabs, Beverly, Mass.) and pREP4 (Qiagen Inc., Chatsworth, Calif.). Portions of vectors and plasmids encoding desired functions may also be combined to form new vectors with desired characteristics. For example, the origin of replication of pUC19 may be recombined with the kanamycin resistance gene of pREP4 to create a new vector with both desired characteristics.

[0023]Preferably, vectors and plasmids comprising one or more S-D sequences of the invention also contain sequences that allow replication of the plasmid to high copy number in the host bacterium of choice. Additionally, vector or plasmid embodiments of the invention that comprise expression control sequences may further comprise a multiple cloning site immediately downstream of the expression control sequences and the S-D sequence.

[0024]Vectors and plasmids comprising one or more S-D sequences of the invention may further comprise genes conferring antibiotic resistance. Preferred genes are those conferring resistance to ampicillin, chloramphenicol, and tetracycline. Especially preferred genes are those conferring resistance to kanamycin.

[0025]The optimized S-D ribosomal binding site of the invention can also be inserted into the chromosome of gram-negative and gram-positive bacterial cells using techniques known in the art. In this case, selection agents such as antibiotics, which are generally required when working with vectors, can be dispensed with.

[0026]Proteins of interest that can be expressed using the S-D sequences, vectors, and host cells of the invention include prokaryotic, eukaryotic, viral, or artificial proteins. Such proteins include, but are not limited to: enzymes; hormones; proteins having immunoregulatory, antiviral or antitumor activity; antibodies and fragments thereof (e.g., Fab, F(ab), F(ab)₂, single-chain Fv, disulfide-linked Fv); or antigens. In preferred embodiments, the protein to be expressed is B. anthracis protective antigen (PA), mutated protective antigens (mPAs) (See, e.g., Sellman et al, JBC 276(11):8371-8376 (2001)), TL3, or TL6. Any effective signal sequence may be used in combination with the gene or polynucleotide of interest. In a preferred embodiment, the ETB signal sequence is used to enhance the expression of soluble protein.

[0027]The S-D sequences of the present invention provide for increased efficiency of protein expression in prokaryotic systems. Efficient expression means that the level of protein expression to be expected when using the S-D sequences of the instant invention is generally higher than levels previously reported in the art. In preferred embodiments, the resultant expressed protein can be highly purified to levels greater than 90% purity by RP-HPLC. Particularly preferred purity levels include 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, and near 100% purity, all of which are encompassed by the instant invention. It is expressly contemplated by the invention that the addition of one or more S-D sequences of the invention into any prokaryotic-based expression system, including and in addition to E. coli expression systems, will result in increased and more efficient protein expression.

[0028]The present invention also relates to methods of using the S-D sequences, vectors, plasmids, and host cells of the invention to produce proteins and fragments thereof. In one embodiment of the invention, a desired protein is produced by a method comprising:

[0029](a) transforming a bacterium with a vector in which a polynucleotide encoding a desired protein is operably linked to a S-D sequence of the invention;

[0030](b) culturing the transformed bacterium under suitable growth conditions; and

[0031](c) isolating the desired protein from the culture.

[0032]In another embodiment of the invention, a desired protein is produced by a method comprising:

[0033](a) inserting a S-D sequence of the invention and an expression control sequence into the chromosome of a suitable bacterium, wherein the S-D sequence and expression control sequence are each operably linked to a polynucleotide encoding a desired protein;

[0034](b) cultivating the bacterium under suitable growth conditions; and

[0035](c) isolating the desired protein from the culture.

[0036]The selection of a suitable host organism is determined by various factors that are well known in the art. Factors to be considered include, for example, compatibility with the selected vector, toxicity of the expression product, expression characteristics, necessary biological safety precautions and costs.

[0037]Suitable host organisms include, but are not limited to, gram-negative and gram-positive bacteria, such as E. coli, B. subtilis, S. aureus, and S. typhimurium strains. Preferred E. coli strains include DH5α (Gibco-BRL, Gaithersburg, Md.), XL-1 Blue (Stratagene®), and W3110 (ATCC® No. 27325). Other E. coli strains that can be used according to the present invention include other generally available strains such as E. coli 294 (ATCC® No. 31446), E. coli RR1 (ATCC® No. 31343) and M15.

EXAMPLES

[0038]The examples which follow are set forth to aid in understanding the invention but are not intended to, and should not be construed to, limit the scope of the invention in any way. The examples do not include detailed descriptions for conventional methods employed in the art, such as for the construction of vectors, the insertion of genes encoding polypeptides of interest into such vectors, or the introduction of the resulting plasmids into bacterial hosts. Such methods are described in numerous publications and can be carried out using recombinant DNA technology methods which are well known in the art. See, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual (Cold Spring Harbor, N.Y. 2nd ed. 1989); Ausubel et al., Current Protocols in Molecular Biology (Greene Pub. Assoc. and Wiley-Interscience, N.Y.).

Example 1

pHE6 Design

[0039]The S-D sequence used in pHE6 (SEQ ID NO:2) was based on the S-D sequence of the pHE4 expression vector (SEQ ID NO:17) (See U.S. Pat. No. 6,194,168), with three base pair changes made as indicated in FIG. 1. Additionally, the pHE6 plasmid encodes the aminoglycoside phosphotransferase protein (conferring kanamycin resistance), the lacIq repressor, and includes a ColE1 replicon. Construction of the pHE4 plasmid upon which the pHE6 plasmid is based is described in U.S. Pat. No. 6,194,168.
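The placement of the S-D sequence within pHE6 can be illustrated with a short Python sketch that locates SEQ ID NO:2 in an excerpt of SEQ ID NO:1 and measures its distance from the next ATG triplet. The excerpt is nucleotides 61-120 of SEQ ID NO:1, copied from the sequence listing. Treating the first ATG downstream of the S-D sequence as the start codon is an assumption made for illustration, and the variable names are hypothetical.

```python
# Nucleotides 61-120 of SEQ ID NO:1 (pHE6), copied from the sequence listing.
PHE6_61_120 = (
    "caattgtgag" "cggataacaa" "tttcacacat"
    "tataaaggaa" "aaattacata" "tgaaggatcc"
)
EXCERPT_START = 61  # 1-based position of the first excerpt nucleotide in SEQ ID NO:1
SEQ_ID_NO_2 = "attataaaggaaaaatta"

sd_offset = PHE6_61_120.find(SEQ_ID_NO_2)  # 0-based index within the excerpt
if sd_offset < 0:
    raise ValueError("SEQ ID NO:2 not found in the excerpt")
sd_end = sd_offset + len(SEQ_ID_NO_2)      # end-exclusive index

# Assumption: the first ATG after the S-D sequence is the start codon.
atg_offset = PHE6_61_120.find("atg", sd_end)
if atg_offset < 0:
    raise ValueError("no ATG found downstream of SEQ ID NO:2")

sd_first = EXCERPT_START + sd_offset       # 1-based positions in SEQ ID NO:1
sd_last = EXCERPT_START + sd_end - 1
atg_first = EXCERPT_START + atg_offset
spacer = PHE6_61_120[sd_end:atg_offset]    # bases between SEQ ID NO:2 and the ATG
core_to_atg = atg_offset - (sd_offset + 13)  # bases between nucleotide 13 of SEQ ID NO:2 and the ATG

print(f"SEQ ID NO:2 at {sd_first}-{sd_last}, ATG at {atg_first}")  # SEQ ID NO:2 at 89-106, ATG at 110
print(f"spacer after SEQ ID NO:2: {spacer!r} ({len(spacer)} nt)")  # 'cat' (3 nt)
print(f"nt between nucleotides 4-13 and ATG: {core_to_atg}")       # 8
```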

Example 2

Method of Making and Purifying PA in Escherichia coli K-12

[0040]Using the following method, a post-purification final yield of soluble PA greater than 2 g from 1 kg of E. coli cell paste (approximately 150 mg/L) can be obtained from either shake flasks or bioreactors. See FIG. 4. The purity of such soluble PA, as judged by RP-HPLC analysis, is greater than 96-98%.

[0041]The bacterial host strain used for the production of recombinant wild-type PA from a recombinant plasmid DNA molecule is an E. coli K-12 derived strain. To express protein from the expression vectors, E. coli cells were transformed with the expression vectors and grown overnight (O/N) at 30° C. in 4 L shaker flasks containing 1 L Luria broth medium supplemented with kanamycin. The cultures were started at an optical density at 600 nm (O.D.₆₀₀) of 0.1. IPTG was added to a final concentration of 1 mM when the culture reached an O.D.₆₀₀ of between 0.4 and 0.6. IPTG-induced cultures were grown for an additional 3 hours. Cells were then harvested using methods known in the art, and the level of protein was detected using Western blot analysis. Soluble PA was then extracted from the periplasm and clarified by conventional means. The clarified supernatant was then purified using a Q Sepharose HP column (Amersham), concentrated, and further purified using a Biogel Hydroxyapatite HP column (Bio-Rad). Using the expression control sequence M+D1 (SEQ ID NO:8), high levels of repression in the absence of IPTG, and high levels of induced expression in the presence of IPTG were obtained.
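The induction step can be sketched in Python as a simple dilution calculation together with a check of the optical density window. The 1 mM final concentration, the 1 L culture volume, and the 0.4 to 0.6 window are taken from the paragraph above. The 1 M IPTG stock concentration and the function names are assumptions introduced for illustration, and the calculation neglects the small volume added by the stock.

```python
def stock_volume_ml(final_mM: float, culture_volume_ml: float, stock_mM: float) -> float:
    """Solve C1*V1 = C2*V2 for V1, neglecting the volume added by the stock."""
    if final_mM <= 0 or culture_volume_ml <= 0 or stock_mM <= final_mM:
        raise ValueError("require positive values and stock_mM > final_mM")
    return final_mM * culture_volume_ml / stock_mM


def ready_to_induce(od600: float, low: float = 0.4, high: float = 0.6) -> bool:
    """True when the culture density is within the induction window."""
    return low <= od600 <= high


# 1 mM final IPTG in a 1 L culture, from an assumed 1 M (1000 mM) stock.
print(stock_volume_ml(final_mM=1.0, culture_volume_ml=1000.0, stock_mM=1000.0))  # 1.0
print(ready_to_induce(0.1))  # False: starting density, too early
print(ready_to_induce(0.5))  # True
```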

Deposit of Microorganisms

[0042]Plasmid pHE6 was deposited with the American Type Culture Collection, 10801 University Boulevard, Manassas, Va. 20110-2209 on Jun. 20, 2002 and was given Accession No. PTA-4474. This culture has been accepted for deposit under the provisions of the Budapest Treaty on the International Recognition of Microorganisms for the Purposes of Patent Proceedings.

[0043]The disclosures of all publications (including patents, patent applications, journal articles, laboratory manuals, books, or other documents) cited herein are hereby incorporated by reference in their entireties.

[0044]The present invention is not to be limited in scope by the specific embodiments described herein, which are intended as illustrations of individual aspects of the invention. Functionally equivalent methods and components are within the scope of the invention, in addition to those shown and described herein and will become apparent to those skilled in the art from the foregoing description and accompanying drawings. Such modifications are intended to fall within the scope of the appended claims.

Sequence CWU 1

1813979DNAArtificial sequencepHE6 expression plasmid including novel Shine- Dalgarno sequence 1aagcttaaaa aactgcaaaa aatagtttga cttgtgagcg gataacaatt aagatgtacc 60caattgtgag cggataacaa tttcacacat tataaaggaa aaattacata tgaaggatcc 120aaggtacctg agtagggcgt ccgatcgacg gacgcctttt ttttgaattc gtaatcatgt 180catagctgtt tcctgtgtga aattgttatc cgctcacaat tccacacaac atacgagccg 240gaagcataaa gtgtaaagcc tggggtgcct aatgagtgag ctaactcaca ttaattgcgt 300tgcgctcact gcccgctttc cagtcgggaa acctgtcgtg ccagctgcat taatgaatcg 360gccaacgcgc ggggagaggc ggtttgcgta ttgggcgctc ttccgcttcc tcgctcactg 420actcgctgcg ctcggtcgtt cggctgcggc gagcggtatc agctcactca aaggcggtaa 480tacggttatc cacagaatca ggggagaacg caggaaagaa catgtgagca aaaggccagc 540aaaaggccag gaaccgtaaa aaggccgcgt tgctggcgtt tttccatagg ctccgccccc 600ctgacgagca tcacaaaaat cgacgctcaa gtcagaggtg gcgaaacccg acaggactat 660aaagatacca ggcgtttccc cctggaagct ccctcgtgcg ctctcctgtt ccgaccctgc 720cgcttaccgg atacctgtcc gcctttctcc cttcgggaag cgtggcgctt tctcatagct 780cacgctgtag gtatctcagt tcggtgtaag tcgttcgctc caagctgggc tgtgtgcacg 840aaccccccgt tcagcccgac cgctgcgcct tatccggtaa ctatcgtctt gagtccaacc 900cggtaagaca cgacttatcg ccactggcag cagccactgg taacaggatt agcagagcga 960ggtatgtagg cggtgctaca gagttcttga agtggtggcc taactacggc tacactagaa 1020gaacagtatt tggtatctgc gctctgctga agccagttac cttcggaaaa agagttggta 1080gctcttgatc cggcaaacaa accaccgctg gtagcggtgg tttttttgtt tgcaagcagc 1140agattacgcg cagaaaaaaa ggatctcaag aagatccttt gatcttttct acggggtctg 1200acgctcagtg gaacgaaaac tcacgttaag ggattttggt catgagatta tcgtcgacaa 1260ttcgcgcgcg aaggcgaagc ggcatgcatt tacgttgaca ccatcgaatg gtgcaaaacc 1320tttcgcggta tggcatgata gcgcccggaa gagagtcaat tcagggtggt gaatgtgaaa 1380ccagtaacgt tatacgatgt cgcagagtat gccggtgtct cttatcagac cgtttcccgc 1440gtggtgaacc aggccagcca cgtttctgcg aaaacgcggg aaaaagtgga agcggcgatg 1500gcggagctga attacattcc caaccgcgtg gcacaacaac tggcgggcaa acagtcgttg 1560ctgattggcg ttgccacctc cagtctggcc ctgcacgcgc cgtcgcaaat tgtcgcggcg 1620attaaatctc gcgccgatca 
actgggtgcc agcgtggtgg tgtcgatggt agaacgaagc 1680ggcgtcgaag cctgtaaagc ggcggtgcac aatcttctcg cgcaacgcgt cagtgggctg 1740atcattaact atccgctgga tgaccaggat gccattgctg tggaagctgc ctgcactaat 1800gttccggcgt tatttcttga tgtctctgac cagacaccca tcaacagtat tattttctcc 1860catgaagacg gtacgcgact gggcgtggag catctggtcg cattgggtca ccagcaaatc 1920gcgctgttag cgggcccatt aagttctgtc tcggcgcgtc tgcgtctggc tggctggcat 1980aaatatctca ctcgcaatca aattcagccg atagcggaac gggaaggcga ctggagtgcc 2040atgtccggtt ttcaacaaac catgcaaatg ctgaatgagg gcatcgttcc cactgcgatg 2100ctggttgcca acgatcagat ggcgctgggc gcaatgcgcg ccattaccga gtccgggctg 2160cgcgttggtg cggatatctc ggtagtggga tacgacgata ccgaagacag ctcatgttat 2220atcccgccgt taaccaccat caaacaggat tttcgcctgc tggggcaaac cagcgtggac 2280cgcttgctgc aactctctca gggccaggcg gtgaagggca atcagctgtt gcccgtctca 2340ctggtgaaaa gaaaaaccac cctggcgccc aatacgcaaa ccgcctctcc ccgcgcgttg 2400gccgattcat taatgcagct ggcacgacag gtttcccgac tggaaagcgg gcagtgagcg 2460caacgcaatt aatgtaagtt agcgcgaatt gtcgaccaaa gcggccatcg tgcctcccca 2520ctcctgcagt tcgggggcat ggatgcgcgg atagccgctg ctggtttcct ggatgccgac 2580ggatttgcac tgccggtaga actccgcgag gtcgtccagc ctcaggcagc agctgaacca 2640actcgcgagg ggatcgagcc cggggtgggc gaagaactcc agcatgagat ccccgcgctg 2700gaggatcatc cagccggcgt cccggaaaac gattccgaag cccaaccttt catagaaggc 2760ggcggtggaa tcgaaatctc gtgatggcag gttgggcgtc gcttggtcgg tcatttcgaa 2820ccccagagtc ccgctcagaa gaactcgtca agaaggcgat agaaggcgat gcgctgcgaa 2880tcgggagcgg cgataccgta aagcacgagg aagcggtcag cccattcgcc gccaagctct 2940tcagcaatat cacgggtagc caacgctatg tcctgatagc ggtccgccac acccagccgg 3000ccacagtcga tgaatccaga aaagcggcca ttttccacca tgatattcgg caagcaggca 3060tcgccatggg tcacgacgag atcctcgccg tcgggcatgc gcgccttgag cctggcgaac 3120agttcggctg gcgcgagccc ctgatgctct tcgtccagat catcctgatc gacaagaccg 3180gcttccatcc gagtacgtgc tcgctcgatg cgatgtttcg cttggtggtc gaatgggcag 3240gtagccggat caagcgtatg cagccgccgc attgcatcag ccatgatgga tactttctcg 3300gcaggagcaa ggtgagatga caggagatcc tgccccggca cttcgcccaa 
tagcagccag 3360tcccttcccg cttcagtgac aacgtcgagc acagctgcgc aaggaacgcc cgtcgtggcc 3420agccacgata gccgcgctgc ctcgtcctgc agttcattca gggcaccgga caggtcggtc 3480ttgacaaaaa gaaccgggcg cccctgcgct gacagccgga acacggcggc atcagagcag 3540ccgattgtct gttgtgccca gtcatagccg aatagcctct ccacccaagc ggccggagaa 3600cctgcgtgca atccatcttg ttcaatcatg cgaaacgatc ctcatcctgt ctcttgatca 3660gatcttgatc ccctgcgcca tcagatcctt ggcggcaaga aagccatcca gtttactttg 3720cagggcttcc caaccttacc agagggcgcc ccagctggca attccggttc gcttgctgtc 3780cataaaaccg cccagtctag ctatcgccat gtaagcccac tgcaagctac ctgctttctc 3840tttgcgcttg cgttttccct tgtccagata gcccagtagc tgacattcat ccggggtcag 3900caccgtttct gcggactggc tttctacgtg ttccgcttcc tttagcagcc cttgcgccct 3960gagtgcttgc ggcagcgtg 3979218DNAArtificial sequenceShine-Dalgarno sequence 2attataaagg aaaaatta 1832268DNAArtificial sequenceMature PA sequence including an ETB signal sequence 3atgaataaag taaaatgtta tgttttattt acggcgttac tatcctctct atatgcccat 60gga gaa gtt aaa cag gaa aac cgt ctg ctc aac gaa tct gag tct tcc 108Glu Val Lys Gln Glu Asn Arg Leu Leu Asn Glu Ser Glu Ser Ser1 5 10 15tct cag ggc ctg ctg ggt tac tat ttc tct gac ctg aac ttc cag gca 156Ser Gln Gly Leu Leu Gly Tyr Tyr Phe Ser Asp Leu Asn Phe Gln Ala20 25 30ccg atg gtt gta act tct tcc acc acc ggc gac ctg tct att ccg tct 204Pro Met Val Val Thr Ser Ser Thr Thr Gly Asp Leu Ser Ile Pro Ser35 40 45tct gaa ctg gag aac atc ccg tct gaa aac cag tac ttc cag tct gct 252Ser Glu Leu Glu Asn Ile Pro Ser Glu Asn Gln Tyr Phe Gln Ser Ala50 55 60atc tgg tct ggt ttc att aaa gtt aag aaa tct gac gaa tac acc ttc 300Ile Trp Ser Gly Phe Ile Lys Val Lys Lys Ser Asp Glu Tyr Thr Phe65 70 75gct act tct gca gat aac cac gtt act atg tgg gta gac gac cag gaa 348Ala Thr Ser Ala Asp Asn His Val Thr Met Trp Val Asp Asp Gln Glu80 85 90 95gtt atc aac aaa gct tct aac tct aac aaa atc cgt ctg gaa aaa ggc 396Val Ile Asn Lys Ala Ser Asn Ser Asn Lys Ile Arg Leu Glu Lys Gly100 105 110cgt ctg tac cag atc aag att caa tac caa cgt gaa aac ccg acc gag 444Arg 
Leu Tyr Gln Ile Lys Ile Gln Tyr Gln Arg Glu Asn Pro Thr Glu115 120 125aaa ggt ctg gac ttc aaa ctg tac tgg acc gac tct cag aac aag aaa 492Lys Gly Leu Asp Phe Lys Leu Tyr Trp Thr Asp Ser Gln Asn Lys Lys130 135 140gaa gtt atc tct tcc gac aac ctg cag ctg ccg gaa ctg aaa cag aaa 540Glu Val Ile Ser Ser Asp Asn Leu Gln Leu Pro Glu Leu Lys Gln Lys145 150 155tct tcc aac tct cgt aaa aag cgt tct act tct gct ggt ccg acc gtt 588Ser Ser Asn Ser Arg Lys Lys Arg Ser Thr Ser Ala Gly Pro Thr Val160 165 170 175ccg gac cgt gat aac gac ggt att ccg gac tct ctg gaa gtt gaa ggc 636Pro Asp Arg Asp Asn Asp Gly Ile Pro Asp Ser Leu Glu Val Glu Gly180 185 190tac acc gta gac gtt aaa aac aaa cgt acc ttc ctg tct ccg tgg atc 684Tyr Thr Val Asp Val Lys Asn Lys Arg Thr Phe Leu Ser Pro Trp Ile195 200 205tct aac atc cac gaa aag aaa ggt ctg acc aaa tac aaa tct tcc ccg 732Ser Asn Ile His Glu Lys Lys Gly Leu Thr Lys Tyr Lys Ser Ser Pro210 215 220gag aaa tgg tct acc gct tct gat ccg tac tct gac ttc gaa aaa gtt 780Glu Lys Trp Ser Thr Ala Ser Asp Pro Tyr Ser Asp Phe Glu Lys Val225 230 235act ggt cgt atc gac aaa aac gtt tct ccg gaa gct cgt cac ccg ctg 828Thr Gly Arg Ile Asp Lys Asn Val Ser Pro Glu Ala Arg His Pro Leu240 245 250 255gta gca gcg tac ccg atc gtt cac gtt gac atg gaa aac att atc ctg 876Val Ala Ala Tyr Pro Ile Val His Val Asp Met Glu Asn Ile Ile Leu260 265 270tct aaa aac gaa gac cag tct acc cag aac acc gac tct caa act cgt 924Ser Lys Asn Glu Asp Gln Ser Thr Gln Asn Thr Asp Ser Gln Thr Arg275 280 285acc atc tct aaa aac acc tct acc tct cgt act cac acc tct gaa gtt 972Thr Ile Ser Lys Asn Thr Ser Thr Ser Arg Thr His Thr Ser Glu Val290 295 300cac ggt aac gct gag gtt cac gct tct ttc ttt gac atc ggt ggc tct 1020His Gly Asn Ala Glu Val His Ala Ser Phe Phe Asp Ile Gly Gly Ser305 310 315gta tct gct ggt ttc tct aac tct aac tct tct acc gtt gca atc gac 1068Val Ser Ala Gly Phe Ser Asn Ser Asn Ser Ser Thr Val Ala Ile Asp320 325 330 335cac tct ctg tct ctg gct ggt gaa cgt acc tgg gct gaa act atg ggc 1116His Ser Leu 
Ser Leu Ala Gly Glu Arg Thr Trp Ala Glu Thr Met Gly340 345 350ctg aac acc gca gac acc gct cgt ctg aac gct aac atc cgt tac gtt 1164Leu Asn Thr Ala Asp Thr Ala Arg Leu Asn Ala Asn Ile Arg Tyr Val355 360 365aac acc ggc acc gct ccg atc tac aac gtt ctg ccg act acc tct ctg 1212Asn Thr Gly Thr Ala Pro Ile Tyr Asn Val Leu Pro Thr Thr Ser Leu370 375 380gta ctg ggt aaa aac cag acc ctg gca acc atc aaa gct gac gaa aac 1260Val Leu Gly Lys Asn Gln Thr Leu Ala Thr Ile Lys Ala Asp Glu Asn385 390 395cag ctg tct cag atc ctg gct ccg aac aac tac tat ccg tct aaa aac 1308Gln Leu Ser Gln Ile Leu Ala Pro Asn Asn Tyr Tyr Pro Ser Lys Asn400 405 410 415ctg gct ccg att gca ctg aac gct cag aaa gac ttc tct tcc acc ccg 1356Leu Ala Pro Ile Ala Leu Asn Ala Gln Lys Asp Phe Ser Ser Thr Pro420 425 430atc act atg aac tac aac cag ttc ctg gaa ctg gag aaa acc aaa cag 1404Ile Thr Met Asn Tyr Asn Gln Phe Leu Glu Leu Glu Lys Thr Lys Gln435 440 445ctg cgt ctg gac acc gac cag gtt tac ggt aac atc gct acc tac aac 1452Leu Arg Leu Asp Thr Asp Gln Val Tyr Gly Asn Ile Ala Thr Tyr Asn450 455 460ttc gaa aac ggt cgt gtt cgt gta gac acc ggc tct aac tgg tct gaa 1500Phe Glu Asn Gly Arg Val Arg Val Asp Thr Gly Ser Asn Trp Ser Glu465 470 475gtt ctg ccg cag atc cag gaa acc act gct cgt att atc ttc aac ggt 1548Val Leu Pro Gln Ile Gln Glu Thr Thr Ala Arg Ile Ile Phe Asn Gly480 485 490 495aaa gac ctg aac ctg gtt gaa cgt cgt atc gct gca gta aac ccg tct 1596Lys Asp Leu Asn Leu Val Glu Arg Arg Ile Ala Ala Val Asn Pro Ser500 505 510gac ccg ctg gaa acc act aaa ccg gac atg acc ctg aaa gaa gct ctg 1644Asp Pro Leu Glu Thr Thr Lys Pro Asp Met Thr Leu Lys Glu Ala Leu515 520 525aaa atc gct ttc ggt ttc aac gaa ccg aac ggc aac ctg cag tac cag 1692Lys Ile Ala Phe Gly Phe Asn Glu Pro Asn Gly Asn Leu Gln Tyr Gln530 535 540ggt aaa gat atc acc gaa ttc gac ttt aac ttc gac cag caa acc tct 1740Gly Lys Asp Ile Thr Glu Phe Asp Phe Asn Phe Asp Gln Gln Thr Ser545 550 555cag aac atc aaa aac cag ctg gct gaa ctg aac gct acc aac atc tac 1788Gln Asn Ile Lys 
Asn Gln Leu Ala Glu Leu Asn Ala Thr Asn Ile Tyr560 565 570 575acc gtt ctg gac aaa atc aag ctg aac gct aaa atg aac att ctg atc 1836Thr Val Leu Asp Lys Ile Lys Leu Asn Ala Lys Met Asn Ile Leu Ile580 585 590cgt gat aaa cgt ttc cac tac gac cgt aac aac atc gct gtt ggt gct 1884Arg Asp Lys Arg Phe His Tyr Asp Arg Asn Asn Ile Ala Val Gly Ala595 600 605gac gaa tct gta gtt aaa gaa gct cac cgt gag gtt atc aac tct tcc 1932Asp Glu Ser Val Val Lys Glu Ala His Arg Glu Val Ile Asn Ser Ser610 615 620acc gaa ggt ctg ctc ctg aac atc gac aaa gat att cgt aaa atc ctg 1980Thr Glu Gly Leu Leu Leu Asn Ile Asp Lys Asp Ile Arg Lys Ile Leu625 630 635tct ggt tac atc gtt gaa atc gaa gac acc gag ggc ctg aaa gaa gtt 2028Ser Gly Tyr Ile Val Glu Ile Glu Asp Thr Glu Gly Leu Lys Glu Val640 645 650 655atc aac gac cgt tac gat atg ctg aac atc tct tcc ctg cgt cag gac 2076Ile Asn Asp Arg Tyr Asp Met Leu Asn Ile Ser Ser Leu Arg Gln Asp660 665 670ggt aaa acc ttc atc gac ttc aaa aag tac aac gat aaa ctg ccg ctg 2124Gly Lys Thr Phe Ile Asp Phe Lys Lys Tyr Asn Asp Lys Leu Pro Leu675 680 685tac atc tct aac ccg aac tac aaa gta aac gtt tac gct gtt acc aaa 2172Tyr Ile Ser Asn Pro Asn Tyr Lys Val Asn Val Tyr Ala Val Thr Lys690 695 700gaa aac acc att atc aac ccg tct gaa aac ggt gac acc tct acc aac 2220Glu Asn Thr Ile Ile Asn Pro Ser Glu Asn Gly Asp Thr Ser Thr Asn705 710 715ggt atc aaa aag atc ctg atc ttc tct aag aaa ggc tac gaa atc ggt 2268Gly Ile Lys Lys Ile Leu Ile Phe Ser Lys Lys Gly Tyr Glu Ile Gly720 725 730 7354735PRTArtificial sequenceMature PA sequence including an ETB signal sequence 4Glu Val Lys Gln Glu Asn Arg Leu Leu Asn Glu Ser Glu Ser Ser Ser1 5 10 15Gln Gly Leu Leu Gly Tyr Tyr Phe Ser Asp Leu Asn Phe Gln Ala Pro20 25 30Met Val Val Thr Ser Ser Thr Thr Gly Asp Leu Ser Ile Pro Ser Ser35 40 45Glu Leu Glu Asn Ile Pro Ser Glu Asn Gln Tyr Phe Gln Ser Ala Ile50 55 60Trp Ser Gly Phe Ile Lys Val Lys Lys Ser Asp Glu Tyr Thr Phe Ala65 70 75 80Thr Ser Ala Asp Asn His Val Thr Met Trp Val Asp Asp Gln Glu Val85 90 
95Ile Asn Lys Ala Ser Asn Ser Asn Lys Ile Arg Leu Glu Lys Gly Arg100 105 110Leu Tyr Gln Ile Lys Ile Gln Tyr Gln Arg Glu Asn Pro Thr Glu Lys115 120 125Gly Leu Asp Phe Lys Leu Tyr Trp Thr Asp Ser Gln Asn Lys Lys Glu130 135 140Val Ile Ser Ser Asp Asn Leu Gln Leu Pro Glu Leu Lys Gln Lys Ser145 150 155 160Ser Asn Ser Arg Lys Lys Arg Ser Thr Ser Ala Gly Pro Thr Val Pro165 170 175Asp Arg Asp Asn Asp Gly Ile Pro Asp Ser Leu Glu Val Glu Gly Tyr180 185 190Thr Val Asp Val Lys Asn Lys Arg Thr Phe Leu Ser Pro Trp Ile Ser195 200 205Asn Ile His Glu Lys Lys Gly Leu Thr Lys Tyr Lys Ser Ser Pro Glu210 215 220Lys Trp Ser Thr Ala Ser Asp Pro Tyr Ser Asp Phe Glu Lys Val Thr225 230 235 240Gly Arg Ile Asp Lys Asn Val Ser Pro Glu Ala Arg His Pro Leu Val245 250 255Ala Ala Tyr Pro Ile Val His Val Asp Met Glu Asn Ile Ile Leu Ser260 265 270Lys Asn Glu Asp Gln Ser Thr Gln Asn Thr Asp Ser Gln Thr Arg Thr275 280 285Ile Ser Lys Asn Thr Ser Thr Ser Arg Thr His Thr Ser Glu Val His290 295 300Gly Asn Ala Glu Val His Ala Ser Phe Phe Asp Ile Gly Gly Ser Val305 310 315 320Ser Ala Gly Phe Ser Asn Ser Asn Ser Ser Thr Val Ala Ile Asp His325 330 335Ser Leu Ser Leu Ala Gly Glu Arg Thr Trp Ala Glu Thr Met Gly Leu340 345 350Asn Thr Ala Asp Thr Ala Arg Leu Asn Ala Asn Ile Arg Tyr Val Asn355 360 365Thr Gly Thr Ala Pro Ile Tyr Asn Val Leu Pro Thr Thr Ser Leu Val370 375 380Leu Gly Lys Asn Gln Thr Leu Ala Thr Ile Lys Ala Asp Glu Asn Gln385 390 395 400Leu Ser Gln Ile Leu Ala Pro Asn Asn Tyr Tyr Pro Ser Lys Asn Leu405 410 415Ala Pro Ile Ala Leu Asn Ala Gln Lys Asp Phe Ser Ser Thr Pro Ile420 425 430Thr Met Asn Tyr Asn Gln Phe Leu Glu Leu Glu Lys Thr Lys Gln Leu435 440 445Arg Leu Asp Thr Asp Gln Val Tyr Gly Asn Ile Ala Thr Tyr Asn Phe450 455 460Glu Asn Gly Arg Val Arg Val Asp Thr Gly Ser Asn Trp Ser Glu Val465 470 475 480Leu Pro Gln Ile Gln Glu Thr Thr Ala Arg Ile Ile Phe Asn Gly Lys485 490 495Asp Leu Asn Leu Val Glu Arg Arg Ile Ala Ala Val Asn Pro Ser Asp500 505 510Pro Leu Glu Thr Thr Lys Pro Asp Met Thr Leu Lys Glu 
Ala Leu Lys515 520 525Ile Ala Phe Gly Phe Asn Glu Pro Asn Gly Asn Leu Gln Tyr Gln Gly530 535 540Lys Asp Ile Thr Glu Phe Asp Phe Asn Phe Asp Gln Gln Thr Ser Gln545 550 555 560Asn Ile Lys Asn Gln Leu Ala Glu Leu Asn Ala Thr Asn Ile Tyr Thr565 570 575Val Leu Asp Lys Ile Lys Leu Asn Ala Lys Met Asn Ile Leu Ile Arg580 585 590Asp Lys Arg Phe His Tyr Asp Arg Asn Asn Ile Ala Val Gly Ala Asp595 600 605Glu Ser Val Val Lys Glu Ala His Arg Glu Val Ile Asn Ser Ser Thr610 615 620Glu Gly Leu Leu Leu Asn Ile Asp Lys Asp Ile Arg Lys Ile Leu Ser625 630 635 640Gly Tyr Ile Val Glu Ile Glu Asp Thr Glu Gly Leu Lys Glu Val Ile645 650 655Asn Asp Arg

Tyr Asp Met Leu Asn Ile Ser Ser Leu Arg Gln Asp Gly660 665 670Lys Thr Phe Ile Asp Phe Lys Lys Tyr Asn Asp Lys Leu Pro Leu Tyr675 680 685Ile Ser Asn Pro Asn Tyr Lys Val Asn Val Tyr Ala Val Thr Lys Glu690 695 700Asn Thr Ile Ile Asn Pro Ser Glu Asn Gly Asp Thr Ser Thr Asn Gly705 710 715 720Ile Lys Lys Ile Leu Ile Phe Ser Lys Lys Gly Tyr Glu Ile Gly725 730 735562DNAArtificial sequenceM expression control sequence 5taaaaaactg caaaaaatag tttgacttgt gagcggataa caattaagat gtacccagtt 60cg 62676DNAArtificial sequenceM+D expression control sequence 6taaaaaactg caaaaaatag tttgacttgt gagcggataa caattaagat gtacccagtg 60tgagcggata acaatt 76773DNAArtificial sequenceU+D expression control sequence 7ttgtgagcgg ataacaattt gacaccctag ccgataggct ttaagatgta cccagtgtga 60gcggataaca att 738122DNAArtificial sequenceM+D1 expression control sequence 8gatccaagct taaaaaactg caaaaaatag tttgacttgt gagcggataa caattaagat 60gtacccaatt gtgagcggat aacaatttca cacattaaag aggagaaatt acatatggat 120cg 1229119DNAArtificial sequenceM+D2 expression control sequence 9gatccaagct taaaaaactg caaaaaatag tttgacttgt gagcggataa caattaagat 60gtacccagtg tgagcggata acaatttcac attaaagagg agaaattaca tatggatcg 1191028DNAArtificial sequencelac operator sequence 10aattgtgagc ggataacaat ttcacaca 281116DNAArtificial sequenceoperator sequence 11gtgagcggat aacaat 16124208DNAArtificial sequencepHE4-5 expression plasmid sequence 12aagcttaaaa aactgcaaaa aatagtttga cttgtgagcg gataacaatt aagatgtacc 60caattgtgag cggataacaa tttcacacat taaagaggag aaattacata tggaccgttt 120ccacgctacc tccgctgact gctgcatctc ctacaccccg cgttccatcc cgtgctcgct 180gctggaatcc tacttcgaaa ccaactccga atgctccaaa ccgggtgtta tcttcctgac 240caaaaaaggt cgtcgtttct gcgctaaccc gtccgacaaa caggttcagg tttgtatgcg 300tatgctgaaa ctggacaccc gtatcaaaac ccgtaaaaac tgataaggta cctaagtgag 360tagggcgtcc gatcgacgga cgcctttttt ttgaattcgt aatcatggtc atagctgttt 420cctgtgtgaa attgttatcc gctcacaatt ccacacaaca tacgagccgg aagcataaag 480tgtaaagcct ggggtgccta atgagtgagc taactcacat taattgcgtt 
gcgctcactg 540cccgctttcc agtcgggaaa cctgtcgtgc cagctgcatt aatgaatcgg ccaacgcgcg 600gggagaggcg gtttgcgtat tgggcgctct tccgcttcct cgctcactga ctcgctgcgc 660tcggtcgttc ggctgcggcg agcggtatca gctcactcaa aggcggtaat acggttatcc 720acagaatcag gggataacgc aggaaagaac atgtgagcaa aaggccagca aaaggccagg 780aaccgtaaaa aggccgcgtt gctggcgttt ttccataggc tccgcccccc tgacgagcat 840cacaaaaatc gacgctcaag tcagaggtgg cgaaacccga caggactata aagataccag 900gcgtttcccc ctggaagctc cctcgtgcgc tctcctgttc cgaccctgcc gcttaccgga 960tacctgtccg cctttctccc ttcgggaagc gtggcgcttt ctcatagctc acgctgtagg 1020tatctcagtt cggtgtaggt cgttcgctcc aagctgggct gtgtgcacga accccccgtt 1080cagcccgacc gctgcgcctt atccggtaac tatcgtcttg agtccaaccc ggtaagacac 1140gacttatcgc cactggcagc agccactggt aacaggatta gcagagcgag gtatgtaggc 1200ggtgctacag agttcttgaa gtggtggcct aactacggct acactagaag aacagtattt 1260ggtatctgcg ctctgctgaa gccagttacc ttcggaaaaa gagttggtag ctcttgatcc 1320ggcaaacaaa ccaccgctgg tagcggtggt ttttttgttt gcaagcagca gattacgcgc 1380agaaaaaaag gatctcaaga agatcctttg atcttttcta cggggtctga cgctcagtgg 1440aacgaaaact cacgttaagg gattttggtc atgagattat cgtcgacaat tcgcgcgcga 1500aggcgaagcg gcatgcattt acgttgacac catcgaatgg tgcaaaacct ttcgcggtat 1560ggcatgatag cgcccggaag agagtcaatt cagggtggtg aatgtgaaac cagtaacgtt 1620atacgatgtc gcagagtatg ccggtgtctc ttatcagacc gtttcccgcg tggtgaacca 1680ggccagccac gtttctgcga aaacgcggga aaaagtggaa gcggcgatgg cggagctgaa 1740ttacattccc aaccgcgtgg cacaacaact ggcgggcaaa cagtcgttgc tgattggcgt 1800tgccacctcc agtctggccc tgcacgcgcc gtcgcaaatt gtcgcggcga ttaaatctcg 1860cgccgatcaa ctgggtgcca gcgtggtggt gtcgatggta gaacgaagcg gcgtcgaagc 1920ctgtaaagcg gcggtgcaca atcttctcgc gcaacgcgtc agtgggctga tcattaacta 1980tccgctggat gaccaggatg ccattgctgt ggaagctgcc tgcactaatg ttccggcgtt 2040atttcttgat gtctctgacc agacacccat caacagtatt attttctccc atgaagacgg 2100tacgcgactg ggcgtggagc atctggtcgc attgggtcac cagcaaatcg cgctgttagc 2160gggcccatta agttctgtct cggcgcgtct gcgtctggct ggctggcata aatatctcac 2220tcgcaatcaa attcagccga tagcggaacg 
ggaaggcgac tggagtgcca tgtccggttt 2280tcaacaaacc atgcaaatgc tgaatgaggg catcgttccc actgcgatgc tggttgccaa 2340cgatcagatg gcgctgggcg caatgcgcgc cattaccgag tccgggctgc gcgttggtgc 2400ggatatctcg gtagtgggat acgacgatac cgaagacagc tcatgttata tcccgccgtt 2460aaccaccatc aaacaggatt ttcgcctgct ggggcaaacc agcgtggacc gcttgctgca 2520actctctcag ggccaggcgg tgaagggcaa tcagctgttg cccgtctcac tggtgaaaag 2580aaaaaccacc ctggcgccca atacgcaaac cgcctctccc cgcgcgttgg ccgattcatt 2640aatgcagctg gcacgacagg tttcccgact ggaaagcggg cagtgagcgc aacgcaatta 2700atgtaagtta gcgcgaattg tcgaccaaag cggccatcgt gcctccccac tcctgcagtt 2760cgggggcatg gatgcgcgga tagccgctgc tggtttcctg gatgccgacg gatttgcact 2820gccggtagaa ctccgcgagg tcgtccagcc tcaggcagca gctgaaccaa ctcgcgaggg 2880gatcgagccc ggggtgggcg aagaactcca gcatgagatc cccgcgctgg aggatcatcc 2940agccggcgtc ccggaaaacg attccgaagc ccaacctttc atagaaggcg gcggtggaat 3000cgaaatctcg tgatggcagg ttgggcgtcg cttggtcggt catttcgaac cccagagtcc 3060cgctcagaag aactcgtcaa gaaggcgata gaaggcgatg cgctgcgaat cgggagcggc 3120gataccgtaa agcacgagga agcggtcagc ccattcgccg ccaagctctt cagcaatatc 3180acgggtagcc aacgctatgt cctgatagcg gtccgccaca cccagccggc cacagtcgat 3240gaatccagaa aagcggccat tttccaccat gatattcggc aagcaggcat cgccatgggt 3300cacgacgaga tcctcgccgt cgggcatgcg cgccttgagc ctggcgaaca gttcggctgg 3360cgcgagcccc tgatgctctt cgtccagatc atcctgatcg acaagaccgg cttccatccg 3420agtacgtgct cgctcgatgc gatgtttcgc ttggtggtcg aatgggcagg tagccggatc 3480aagcgtatgc agccgccgca ttgcatcagc catgatggat actttctcgg caggagcaag 3540gtgagatgac aggagatcct gccccggcac ttcgcccaat agcagccagt cccttcccgc 3600ttcagtgaca acgtcgagca cagctgcgca aggaacgccc gtcgtggcca gccacgatag 3660ccgcgctgcc tcgtcctgca gttcattcag ggcaccggac aggtcggtct tgacaaaaag 3720aaccgggcgc ccctgcgctg acagccggaa cacggcggca tcagagcagc cgattgtctg 3780ttgtgcccag tcatagccga atagcctctc cacccaagcg gccggagaac ctgcgtgcaa 3840tccatcttgt tcaatcatgc gaaacgatcc tcatcctgtc tcttgatcag atcttgatcc 3900cctgcgccat cagatccttg gcggcaagaa agccatccag tttactttgc agggcttccc 
3960aaccttacca gagggcgccc cagctggcaa ttccggttcg cttgctgtcc ataaaaccgc 4020ccagtctagc tatcgccatg taagcccact gcaagctacc tgctttctct ttgcgcttgc 4080gttttccctt gtccagatag cccagtagct gacattcatc cggggtcagc accgtttctg 4140cggactggct ttctacgtgt tccgcttcct ttagcagccc ttgcgccctg agtgcttgcg 4200gcagcgtg 4208133984DNAArtificial sequencepHE4-0 expression plasmid sequence 13aagcttaaaa aactgcaaaa aatagtttga cttgtgagcg gataacaatt aagatgtacc 60caattgtgag cggataacaa tttcacacat taaagaggag aaattacata tgaaggatcc 120ttggtaccta agtgagtagg gcgtccgatc gacggacgcc ttttttttga attcgtaatc 180atggtcatag ctgtttcctg tgtgaaattg ttatccgctc acaattccac acaacatacg 240agccggaagc ataaagtgta aagcctgggg tgcctaatga gtgagctaac tcacattaat 300tgcgttgcgc tcactgcccg ctttccagtc gggaaacctg tcgtgccagc tgcattaatg 360aatcggccaa cgcgcgggga gaggcggttt gcgtattggg cgctcttccg cttcctcgct 420cactgactcg ctgcgctcgg tcgttcggct gcggcgagcg gtatcagctc actcaaaggc 480ggtaatacgg ttatccacag aatcagggga taacgcagga aagaacatgt gagcaaaagg 540ccagcaaaag gccaggaacc gtaaaaaggc cgcgttgctg gcgtttttcc ataggctccg 600cccccctgac gagcatcaca aaaatcgacg ctcaagtcag aggtggcgaa acccgacagg 660actataaaga taccaggcgt ttccccctgg aagctccctc gtgcgctctc ctgttccgac 720cctgccgctt accggatacc tgtccgcctt tctcccttcg ggaagcgtgg cgctttctca 780tagctcacgc tgtaggtatc tcagttcggt gtaggtcgtt cgctccaagc tgggctgtgt 840gcacgaaccc cccgttcagc ccgaccgctg cgccttatcc ggtaactatc gtcttgagtc 900caacccggta agacacgact tatcgccact ggcagcagcc actggtaaca ggattagcag 960agcgaggtat gtaggcggtg ctacagagtt cttgaagtgg tggcctaact acggctacac 1020tagaagaaca gtatttggta tctgcgctct gctgaagcca gttaccttcg gaaaaagagt 1080tggtagctct tgatccggca aacaaaccac cgctggtagc ggtggttttt ttgtttgcaa 1140gcagcagatt acgcgcagaa aaaaaggatc tcaagaagat cctttgatct tttctacggg 1200gtctgacgct cagtggaacg aaaactcacg ttaagggatt ttggtcatga gattatcgtc 1260gacaattcgc gcgcgaaggc gaagcggcat gcatttacgt tgacaccatc gaatggtgca 1320aaacctttcg cggtatggca tgatagcgcc cggaagagag tcaattcagg gtggtgaatg 1380tgaaaccagt aacgttatac gatgtcgcag agtatgccgg 
tgtctcttat cagaccgttt 1440cccgcgtggt gaaccaggcc agccacgttt ctgcgaaaac gcgggaaaaa gtggaagcgg 1500cgatggcgga gctgaattac attcccaacc gcgtggcaca acaactggcg ggcaaacagt 1560cgttgctgat tggcgttgcc acctccagtc tggccctgca cgcgccgtcg caaattgtcg 1620cggcgattaa atctcgcgcc gatcaactgg gtgccagcgt ggtggtgtcg atggtagaac 1680gaagcggcgt cgaagcctgt aaagcggcgg tgcacaatct tctcgcgcaa cgcgtcagtg 1740ggctgatcat taactatccg ctggatgacc aggatgccat tgctgtggaa gctgcctgca 1800ctaatgttcc ggcgttattt cttgatgtct ctgaccagac acccatcaac agtattattt 1860tctcccatga agacggtacg cgactgggcg tggagcatct ggtcgcattg ggtcaccagc 1920aaatcgcgct gttagcgggc ccattaagtt ctgtctcggc gcgtctgcgt ctggctggct 1980ggcataaata tctcactcgc aatcaaattc agccgatagc ggaacgggaa ggcgactgga 2040gtgccatgtc cggttttcaa caaaccatgc aaatgctgaa tgagggcatc gttcccactg 2100cgatgctggt tgccaacgat cagatggcgc tgggcgcaat gcgcgccatt accgagtccg 2160ggctgcgcgt tggtgcggat atctcggtag tgggatacga cgataccgaa gacagctcat 2220gttatatccc gccgttaacc accatcaaac aggattttcg cctgctgggg caaaccagcg 2280tggaccgctt gctgcaactc tctcagggcc aggcggtgaa gggcaatcag ctgttgcccg 2340tctcactggt gaaaagaaaa accaccctgg cgcccaatac gcaaaccgcc tctccccgcg 2400cgttggccga ttcattaatg cagctggcac gacaggtttc ccgactggaa agcgggcagt 2460gagcgcaacg caattaatgt aagttagcgc gaattgtcga ccaaagcggc catcgtgcct 2520ccccactcct gcagttcggg ggcatggatg cgcggatagc cgctgctggt ttcctggatg 2580ccgacggatt tgcactgccg gtagaactcc gcgaggtcgt ccagcctcag gcagcagctg 2640aaccaactcg cgaggggatc gagcccgggg tgggcgaaga actccagcat gagatccccg 2700cgctggagga tcatccagcc ggcgtcccgg aaaacgattc cgaagcccaa cctttcatag 2760aaggcggcgg tggaatcgaa atctcgtgat ggcaggttgg gcgtcgcttg gtcggtcatt 2820tcgaacccca gagtcccgct cagaagaact cgtcaagaag gcgatagaag gcgatgcgct 2880gcgaatcggg agcggcgata ccgtaaagca cgaggaagcg gtcagcccat tcgccgccaa 2940gctcttcagc aatatcacgg gtagccaacg ctatgtcctg atagcggtcc gccacaccca 3000gccggccaca gtcgatgaat ccagaaaagc ggccattttc caccatgata ttcggcaagc 3060aggcatcgcc atgggtcacg acgagatcct cgccgtcggg catgcgcgcc ttgagcctgg 3120cgaacagttc 
ggctggcgcg agcccctgat gctcttcgtc cagatcatcc tgatcgacaa 3180gaccggcttc catccgagta cgtgctcgct cgatgcgatg tttcgcttgg tggtcgaatg 3240ggcaggtagc cggatcaagc gtatgcagcc gccgcattgc atcagccatg atggatactt 3300tctcggcagg agcaaggtga gatgacagga gatcctgccc cggcacttcg cccaatagca 3360gccagtccct tcccgcttca gtgacaacgt cgagcacagc tgcgcaagga acgcccgtcg 3420tggccagcca cgatagccgc gctgcctcgt cctgcagttc attcagggca ccggacaggt 3480cggtcttgac aaaaagaacc gggcgcccct gcgctgacag ccggaacacg gcggcatcag 3540agcagccgat tgtctgttgt gcccagtcat agccgaatag cctctccacc caagcggccg 3600gagaacctgc gtgcaatcca tcttgttcaa tcatgcgaaa cgatcctcat cctgtctctt 3660gatcagatct tgatcccctg cgccatcaga tccttggcgg caagaaagcc atccagttta 3720ctttgcaggg cttcccaacc ttaccagagg gcgccccagc tggcaattcc ggttcgcttg 3780ctgtccataa aaccgcccag tctagctatc gccatgtaag cccactgcaa gctacctgct 3840ttctctttgc gcttgcgttt tcccttgtcc agatagccca gtagctgaca ttcatccggg 3900gtcagcaccg tttctgcgga ctggctttct acgtgttccg cttcctttag cagcccttgc 3960gccctgagtg cttgcggcag cgtg 3984144277DNAArtificial sequencepHE4-a expression plasmid sequence 14aagcttaaaa aactgcaaaa aatagtttga cttgtgagcg gataacaatt aagatgtacc 60caattgtgag cggataacaa tttcacacat taaagaggag aaattacata tgtgatagat 120aaaagacgct gaaaccgaat tcttgttgtc caaactgccg ctggaaaacc cggttctgct 180ggaccgtttc cacgctacct ccgctgactg ctgcatctcc tacaccacgc gttccatccc 240gtgctcgctg ctggaatcct acttcgaaac caactccgaa tgctccaaac cgggtgttat 300cttcctgacc aaaaaaggtc gtcgtttctg cgctaacccg tccgacaaac aggttcaggt 360ttgtatgcgt atgctgaaac tggacacccg tgcggccgct ctagaggatc ctcgaggtac 420ctaagtgagt agggcgtccg atcgacggac gccttttttt tgaattcgta atcatggtca 480tagctgtttc ctgtgtgaaa ttgttatccg ctcacaattc cacacaacat acgagccgga 540agcataaagt gtaaagcctg gggtgcctaa tgagtgagct aactcacatt aattgcgttg 600cgctcactgc ccgctttcca gtcgggaaac ctgtcgtgcc agctgcatta atgaatcggc 660caacgcgcgg ggagaggcgg tttgcgtatt gggcgctctt ccgcttcctc gctcactgac 720tcgctgcgct cggtcgttcg gctgcggcga gcggtatcag ctcactcaaa ggcggtaata 780cggttatcca cagaatcagg ggataacgca 
ggaaagaaca tgtgagcaaa aggccagcaa 840aaggccagga accgtaaaaa ggccgcgttg ctggcgtttt tccataggct ccgcccccct 900gacgagcatc acaaaaatcg acgctcaagt cagaggtggc gaaacccgac aggactataa 960agataccagg cgtttccccc tggaagctcc ctcgtgcgct ctcctgttcc gaccctgccg 1020cttaccggat acctgtccgc ctttctccct tcgggaagcg tggcgctttc tcatagctca 1080cgctgtaggt atctcagttc ggtgtaggtc gttcgctcca agctgggctg tgtgcacgaa 1140ccccccgttc agcccgaccg ctgcgcctta tccggtaact atcgtcttga gtccaacccg 1200gtaagacacg acttatcgcc actggcagca gccactggta acaggattag cagagcgagg 1260tatgtaggcg gtgctacaga gttcttgaag tggtggccta actacggcta cactagaaga 1320acagtatttg gtatctgcgc tctgctgaag ccagttacct tcggaaaaag agttggtagc 1380tcttgatccg gcaaacaaac caccgctggt agcggtggtt tttttgtttg caagcagcag 1440attacgcgca gaaaaaaagg atctcaagaa gatcctttga tcttttctac ggggtctgac 1500gctcagtgga acgaaaactc acgttaaggg attttggtca tgagattatc gtcgacaatt 1560cgcgcgcgaa ggcgaagcgg catgcattta cgttgacacc atcgaatggt gcaaaacctt 1620tcgcggtatg gcatgatagc gcccggaaga gagtcaattc agggtggtga atgtgaaacc 1680agtaacgtta tacgatgtcg cagagtatgc cggtgtctct tatcagaccg tttcccgcgt 1740ggtgaaccag gccagccacg tttctgcgaa aacgcgggaa aaagtggaag cggcgatggc 1800ggagctgaat tacattccca accgcgtggc acaacaactg gcgggcaaac agtcgttgct 1860gattggcgtt gccacctcca gtctggccct gcacgcgccg tcgcaaattg tcgcggcgat 1920taaatctcgc gccgatcaac tgggtgccag cgtggtggtg tcgatggtag aacgaagcgg 1980cgtcgaagcc tgtaaagcgg cggtgcacaa tcttctcgcg caacgcgtca gtgggctgat 2040cattaactat ccgctggatg accaggatgc cattgctgtg gaagctgcct gcactaatgt 2100tccggcgtta tttcttgatg tctctgacca gacacccatc aacagtatta ttttctccca 2160tgaagacggt acgcgactgg gcgtggagca tctggtcgca ttgggtcacc agcaaatcgc 2220gctgttagcg ggcccattaa gttctgtctc ggcgcgtctg cgtctggctg gctggcataa 2280atatctcact cgcaatcaaa ttcagccgat agcggaacgg gaaggcgact ggagtgccat 2340gtccggtttt caacaaacca tgcaaatgct gaatgagggc atcgttccca ctgcgatgct 2400ggttgccaac gatcagatgg cgctgggcgc aatgcgcgcc attaccgagt ccgggctgcg 2460cgttggtgcg gatatctcgg tagtgggata cgacgatacc gaagacagct catgttatat 
2520cccgccgtta accaccatca aacaggattt tcgcctgctg gggcaaacca gcgtggaccg 2580cttgctgcaa ctctctcagg gccaggcggt gaagggcaat cagctgttgc ccgtctcact 2640ggtgaaaaga aaaaccaccc tggcgcccaa tacgcaaacc gcctctcccc gcgcgttggc 2700cgattcatta atgcagctgg cacgacaggt ttcccgactg gaaagcgggc agtgagcgca 2760acgcaattaa tgtaagttag cgcgaattgt cgaccaaagc ggccatcgtg cctccccact 2820cctgcagttc gggggcatgg atgcgcggat agccgctgct ggtttcctgg atgccgacgg 2880atttgcactg ccggtagaac tccgcgaggt cgtccagcct caggcagcag ctgaaccaac 2940tcgcgagggg atcgagcccg gggtgggcga agaactccag catgagatcc ccgcgctgga 3000ggatcatcca gccggcgtcc cggaaaacga ttccgaagcc caacctttca tagaaggcgg 3060cggtggaatc gaaatctcgt gatggcaggt tgggcgtcgc ttggtcggtc atttcgaacc 3120ccagagtccc gctcagaaga actcgtcaag aaggcgatag aaggcgatgc gctgcgaatc 3180gggagcggcg ataccgtaaa gcacgaggaa gcggtcagcc cattcgccgc caagctcttc 3240agcaatatca cgggtagcca acgctatgtc ctgatagcgg tccgccacac ccagccggcc 3300acagtcgatg aatccagaaa agcggccatt ttccaccatg atattcggca agcaggcatc 3360gccatgggtc acgacgagat cctcgccgtc gggcatgcgc gccttgagcc tggcgaacag 3420ttcggctggc gcgagcccct gatgctcttc gtccagatca tcctgatcga caagaccggc 3480ttccatccga gtacgtgctc gctcgatgcg atgtttcgct tggtggtcga atgggcaggt 3540agccggatca agcgtatgca gccgccgcat tgcatcagcc atgatggata ctttctcggc 3600aggagcaagg tgagatgaca ggagatcctg ccccggcact tcgcccaata gcagccagtc 3660ccttcccgct tcagtgacaa cgtcgagcac agctgcgcaa ggaacgcccg tcgtggccag 3720ccacgatagc cgcgctgcct cgtcctgcag ttcattcagg gcaccggaca ggtcggtctt 3780gacaaaaaga accgggcgcc cctgcgctga cagccggaac acggcggcat cagagcagcc 3840gattgtctgt tgtgcccagt catagccgaa tagcctctcc acccaagcgg ccggagaacc 3900tgcgtgcaat ccatcttgtt caatcatgcg aaacgatcct catcctgtct cttgatcaga 3960tcttgatccc ctgcgccatc agatccttgg cggcaagaaa gccatccagt ttactttgca 4020gggcttccca accttaccag agggcgcccc agctggcaat tccggttcgc ttgctgtcca 4080taaaaccgcc cagtctagct atcgccatgt aagcccactg caagctacct gctttctctt 4140tgcgcttgcg ttttcccttg tccagatagc ccagtagctg acattcatcc ggggtcagca 4200ccgtttctgc ggactggctt tctacgtgtt 
ccgcttcctt tagcagccct tgcgccctga 4260gtgcttgcgg cagcgtg 427715319PRTArtificial sequenceLacIq repressor gene sequence 15Met Ala Glu Leu Asn Tyr Ile Pro Asn Arg Val Ala Gln Gln Leu Ala1 5 10 15Gly Lys Gln Ser Leu Leu Ile Gly Val Ala Thr Ser Ser Leu Ala Leu20 25 30His Ala Pro Ser Gln Ile Val Ala Ala Ile Lys Ser Arg Ala Asp Gln35 40 45Leu Gly Ala Ser Val Val Val Ser Met Val Glu Arg Ser Gly Val Glu50 55 60Ala Cys Lys Ala Ala Val His Asn Leu Leu Ala Gln Arg Val Ser Gly65 70 75 80Leu Ile Ile Asn Tyr Pro Leu Asp Asp Gln Asp Ala Ile Ala Val Glu85 90 95Ala Ala Cys Thr Asn Val Pro Ala Leu Phe Leu Asp Val Ser Asp Gln100 105

110Thr Pro Ile Asn Ser Ile Ile Phe Ser His Glu Asp Gly Thr Arg Leu115 120 125Gly Val Glu His Leu Val Ala Leu Gly His Gln Gln Ile Ala Leu Leu130 135 140Ala Gly Pro Leu Ser Ser Val Ser Ala Arg Leu Arg Leu Ala Gly Trp145 150 155 160His Lys Tyr Leu Thr Arg Asn Gln Ile Gln Pro Ile Ala Glu Arg Glu165 170 175Gly Asp Trp Ser Ala Met Ser Gly Phe Gln Gln Thr Met Gln Met Leu180 185 190Asn Glu Gly Ile Val Pro Thr Ala Met Leu Val Ala Asn Asp Gln Met195 200 205Ala Leu Gly Ala Met Arg Ala Ile Thr Glu Ser Gly Leu Arg Val Gly210 215 220Ala Asp Ile Ser Val Val Gly Tyr Asp Asp Thr Glu Asp Ser Ser Cys225 230 235 240Tyr Ile Pro Pro Leu Thr Thr Ile Lys Gln Asp Phe Arg Leu Leu Gly245 250 255Gln Thr Ser Val Asp Arg Leu Leu Gln Leu Ser Gln Gly Gln Ala Val260 265 270Lys Gly Asn Gln Leu Leu Pro Val Ser Leu Val Lys Arg Lys Thr Thr275 280 285Leu Ala Pro Asn Thr Gln Thr Ala Ser Pro Arg Ala Leu Ala Asp Ser290 295 300Leu Met Gln Leu Ala Arg Gln Val Ser Arg Leu Glu Ser Gly Gln305 310 31516264PRTArtificial sequenceKanamycin resistance gene sequence 16Met Ile Glu Gln Asp Gly Leu His Ala Gly Ser Pro Ala Ala Trp Val1 5 10 15Glu Arg Leu Phe Gly Tyr Asp Trp Ala Gln Gln Thr Ile Gly Cys Ser20 25 30Asp Ala Ala Val Phe Arg Leu Ser Ala Gln Gly Arg Pro Val Leu Phe35 40 45Val Lys Thr Asp Leu Ser Gly Ala Leu Asn Glu Leu Gln Asp Glu Ala50 55 60Ala Arg Leu Ser Trp Leu Ala Thr Thr Gly Val Pro Cys Ala Ala Val65 70 75 80Leu Asp Val Val Thr Glu Ala Gly Arg Asp Trp Leu Leu Leu Gly Glu85 90 95Val Pro Gly Gln Asp Leu Leu Ser Ser His Leu Ala Pro Ala Glu Lys100 105 110Val Ser Ile Met Ala Asp Ala Met Arg Arg Leu His Thr Leu Asp Pro115 120 125Ala Thr Cys Pro Phe Asp His Gln Ala Lys His Arg Ile Glu Arg Ala130 135 140Arg Thr Arg Met Glu Ala Gly Leu Val Asp Gln Asp Asp Leu Asp Glu145 150 155 160Glu His Gln Gly Leu Ala Pro Ala Glu Leu Phe Ala Arg Leu Lys Ala165 170 175Arg Met Pro Asp Gly Glu Asp Leu Val Val Thr His Gly Asp Ala Cys180 185 190Leu Pro Asn Ile Met Val Glu Asn Gly Arg Phe Ser Gly Phe Ile Asp195 200 205Cys Gly Arg 
Leu Gly Val Ala Asp Arg Tyr Gln Asp Ile Ala Leu Ala210 215 220Thr Arg Asp Ile Ala Glu Glu Leu Gly Gly Glu Trp Ala Asp Arg Phe225 230 235 240Leu Val Leu Tyr Gly Ile Ala Ala Pro Asp Ser Gln Arg Ile Ala Phe245 250 255Tyr Arg Leu Leu Asp Glu Phe Phe2601718DNAArtificial sequencepHE4 Shine-Dalgarno sequence 17attaaagagg agaaatta 181812DNAArtificial sequenceShine Dalgarno sequence based on phoA promoter 18gtaaaggaag ta 12
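The short Shine-Dalgarno entries in the listing above (SEQ ID NO:2, SEQ ID NO:17, and SEQ ID NO:18) are easy to misread in the flattened text, so the following minimal sketch restates them and compares them. The sequences are copied from the listing. The helper names `subsequence` and `mismatches` are hypothetical. Reading "polynucleotides 4-13 of SEQ ID NO:2" in claim 1(b) as nucleotides 4 through 13, counted from 1 and inclusive, is an assumption.

```python
# Sequences copied from the sequence listing; spaces are the listing's
# 10-nucleotide grouping and are removed before use.
SEQ_ID_NO_2 = "attataaagg aaaaatta".replace(" ", "")    # modified Shine-Dalgarno sequence
SEQ_ID_NO_17 = "attaaagagg agaaatta".replace(" ", "")   # pHE4 Shine-Dalgarno sequence
SEQ_ID_NO_18 = "gtaaaggaag ta".replace(" ", "")         # phoA-promoter-based sequence


def subsequence(seq: str, start: int, end: int) -> str:
    """Return nucleotides start..end of seq, 1-based and inclusive (assumed convention)."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("positions out of range")
    return seq[start - 1:end]


def mismatches(a: str, b: str) -> list[tuple[int, str, str]]:
    """List (1-based position, base in a, base in b) wherever two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    # Lengths should match the declared lengths in the listing (18, 18, 12).
    print(len(SEQ_ID_NO_2), len(SEQ_ID_NO_17), len(SEQ_ID_NO_18))
    # Nucleotides 4-13 of SEQ ID NO:2, under the 1-based inclusive assumption.
    print(subsequence(SEQ_ID_NO_2, 4, 13))
    # Positions where SEQ ID NO:2 differs from the pHE4 sequence (SEQ ID NO:17).
    print(mismatches(SEQ_ID_NO_2, SEQ_ID_NO_17))
```

Under these assumptions, nucleotides 4-13 of SEQ ID NO:2 read `ataaaggaaa`, and SEQ ID NO:2 differs from SEQ ID NO:17 at positions 5, 7, and 12.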

