Patent application title: ENTROPIC BRISTLE DOMAIN SEQUENCES AND THEIR USE IN RECOMBINANT PROTEIN PRODUCTION

Inventors: A. Keith Dunker (Indianapolis, IN, US) Vladimir N. Uversky (Carmel, IN, US) Marc S. Cortese (Indianapolis, IN, US) James Mueller (Indianapolis, IN, US)
Assignees: MOLECULAR KINETICS INCORPORATED
IPC8 Class: AC12P2106FI
USPC Class: 435 691
Class name: Chemistry: molecular biology and microbiology micro-organism, tissue cell culture or enzyme using process to synthesize a desired chemical compound or composition recombinant dna technique included in method of making a protein or polypeptide
Publication date: 2009-09-03
Patent application number: 20090221032

Agents: SEED INTELLECTUAL PROPERTY LAW GROUP PLLC
Origin: SEATTLE, WA US

Abstract:

Compositions and methods for recombinant protein production and, more particularly, fusion polypeptides, polynucleotides encoding fusion polypeptides, expression vectors, kits, and related methods for recombinant protein production, are provided.

Claims:

1. An isolated fusion polypeptide comprising at least one entropic bristle domain (EBD) sequence and at least one heterologous polypeptide sequence, wherein the fusion polypeptide has increased solubility relative to the heterologous polypeptide sequence, reduced aggregation relative to the heterologous polypeptide sequence and/or improved folding relative to the heterologous polypeptide sequence.

2. The polypeptide according to claim 1, wherein the EBD sequence is derived from a mammalian neurofilament protein.

3. The polypeptide according to claim 1, wherein the EBD sequence is derived from a mammalian neurofilament NF-H protein.

4. The polypeptide according to claim 1, wherein the EBD sequence is derived from a human neurofilament NF-H protein having a sequence set forth in SEQ ID NO: 1 or 3.

5. The polypeptide according to claim 1, wherein the EBD sequence comprises a neurofilament NF-H sequence selected from the group consisting of SPEAEK (SEQ ID NO:23), SPAAVK (SEQ ID NO:24), SPAEAK (SEQ ID NO:25), SPAEPK (SEQ ID NO:26), SPAEVK (SEQ ID NO:27), SPATVK (SEQ ID NO:28), SPEKAK (SEQ ID NO:29), SPGEAK (SEQ ID NO:30), SPIEVK (SEQ ID NO:31), SPPEAK (SEQ ID NO:32), SPSEAK (SEQ ID NO:33), SPEKEAK (SEQ ID NO:34), SPAKEKAK (SEQ ID NO:35), SPEKEEAK (SEQ ID NO:36), SPTKEEAK (SEQ ID NO:37), SPVKEEAK (SEQ ID NO:38), SPVKAEAK (SEQ ID NO:39), SPVKEEAK (SEQ ID NO:40), SPVKEEVK (SEQ ID NO:41), SPVKEEEKP (SEQ ID NO:42), SPEKAKTLDVK (SEQ ID NO:43), SPADKFPEKAK (SEQ ID NO:44), SPEAKTPAKEEAR (SEQ ID NO:45), SPEKAKTPVKEGAK (SEQ ID NO:46), SPVKEEAKTPEKAK (SEQ ID NO:47), SPVKEGAKPPEKAKPLDVK (SEQ ID NO:48), SPVKEDIKPPAEAKSPEKAK (SEQ ID NO:49), SPLKEDAKAPEKEIPKKEEVK (SEQ ID NO:50), SPEKEEAKTSEKVAPKKEEVK (SEQ ID NO:51), SPEAQTPVQEEATVPTDIRPPEQVK (SEQ ID NO:52), SPVKEEVKAKEPPKKVEEEKTLPTPKTEAKESKKDE (SEQ ID NO:53), or a combination thereof.

6. The polypeptide according to claim 1, wherein the EBD sequence is derived from a mammalian neurofilament protein NF-M.

7. The polypeptide according to claim 1, wherein the EBD sequence is derived from a mammalian neurofilament NF-M protein having the sequence set forth in any one of SEQ ID NOs: 5, 7, 9, 11, 13 or 15.

8. The polypeptide according to claim 1, wherein the EBD sequence comprises a neurofilament NF-M sequence selected from the group consisting of SPPK (SEQ ID NO:54), SPVK (SEQ ID NO:55), SPAAK (SEQ ID NO:56), SPAPK (SEQ ID NO:57), SPEAK (SEQ ID NO:58), SPMPK (SEQ ID NO:59), SPPAK (SEQ ID NO:60), SPTAK (SEQ ID NO:61), SPTTK (SEQ ID NO:62), SPVAK (SEQ ID NO:63), SPVAK (SEQ ID NO:64), SPVPK (SEQ ID NO:65), SPVSK (SEQ ID NO:66), SPEKPA (SEQ ID NO:67), SPVEEKAK (SEQ ID NO:68), SPVEEKGK (SEQ ID NO:69), SPVEEVKP (SEQ ID NO:70), SPEKPATPKVT (SEQ ID NO:71), SPEKPRTPEKPA (SEQ ID NO:72), SPEKPTTPEKW (SEQ ID NO:73), SPEKPSSPLKDEKA (SEQ ID NO:74), SPVKEKAVEEMITIT (SEQ ID NO:75), SPVKEEAAEEAATITK (SEQ ID NO:76), SPVPKSPVEEVKPKAEATAG (SEQ ID NO:77), SPVKAESPVKEEVPAKPVKV (SEQ ID NO:78), SPEKEAKEEEKPQEKEKEKEK (SEQ ID NO:79), SPVKATTPEIKEEEGEKEEEGQE (SEQ ID NO:80), SPVEEVKPKPEAKAGKGEQKEE (SEQ ID NO:81), SPEKPATPEKPPTPEKAITPEKVR (SEQ ID NO:82), SPEKPATPEKPRTPEKPATPEKPR (SEQ ID NO:83), SPKEEKVEKKEEKPKDVPKKKAE (SEQ ID NO:84), SPKEEKAEKKEEKPKDVPEKKKAE (SEQ ID NO:85), SPVEEAKSKAEVGKGEQKEEEEKE (SEQ ID NO:86), SPKEEKVEKKEEKPKDVPDKKKAE (SEQ ID NO:87), SPVKEEAVAEVVTITKSVKVHLEKET (SEQ ID NO:88), SSEKDEGEQEEEEGETEAEGEGEEAEAKEEK (SEQ ID NO:89), SPVEEVKPKAEAGAEKGEQKEKVEEEKKEAKE (SEQ ID NO:90), SPVTEQAKAVQKAAAEVGKDQKAEKAAEKAAKEEKAA (SEQ ID NO:91), SPEAKEEEEEGEKEEEEEGQEEEEEEDEGVKSDQAEEGGSEKEG (SEQ ID NO:92), or a combination thereof.

9. The polypeptide according to claim 1, wherein the EBD sequence is derived from a phage sequence.

10. The polypeptide according to claim 1, wherein the EBD sequence is derived from a filamentous phage fd.

11. The polypeptide according to claim 1, wherein the EBD sequence comprises at least one linker region derived from a filamentous phage fd adsorption protein pIII having a sequence set forth in SEQ ID NO: 17.

12. The polypeptide according to claim 1, wherein the EBD sequence comprises a filamentous phage fd adsorption protein pIII sequence selected from the group consisting of EGGGS (SEQ ID NO:93), EGGGT (SEQ ID NO:94), SEGGG (SEQ ID NO:95), GGGSGGG (SEQ ID NO:96), SGGGSGSG (SEQ ID NO:97), and SGGGSEGGG (SEQ ID NO:98), or a combination thereof.

13. The polypeptide according to claim 1, wherein the EBD sequence is derived from a nuclear pore Nup2p protein having a sequence set forth in SEQ ID NO: 19.

14. The polypeptide according to claim 1, wherein the EBD sequence comprises a yeast nucleoporin Nup2p sequence selected from the group consisting of FSFGTSQPNNTPS (SEQ ID NO:99), FSFSIPSKNTPDASKPS (SEQ ID NO:100), FVFGQAAAKPSLEKSS (SEQ ID NO:101), FSFGVPNSSKNETSKPV (SEQ ID NO:102), FTFGTKHAADSQNNKPS (SEQ ID NO:103), FTFGSSALADNKEDVKKP (SEQ ID NO:104), FSFGINTNTTKTADTKAPT (SEQ ID NO:105), FSFGKTTANLPANSSTSPAPSIPSTG (SEQ ID NO:106), FSFGPKKENRKKDESDSENDIEIKGPE (SEQ ID NO:107), FKFSGTVSSDVFKLNPSTDKNEKKTETNAKP (SEQ ID NO:108), FKFSLPFEQKGSQTTTNDSKEESTTEATGNESQ (SEQ ID NO:109), FTFGSTTIEKKNDENSTSNSKPEKSSDSNDSNPS (SEQ ID NO:110), FSFGISNGSESKDSDKPSLPSAVDGENDKKEATKPA (SEQ ID NO:111), FSFSSATSTTEQTKSKNPLSLTEATKTNVDNNSKAEAS (SEQ ID NO:112) and FSFGAATPSAKEASQEDDNNNVEKPSSKPAFNLISNAGTEKEKESKKDSKPA (SEQ ID NO:113), or a combination thereof.

15. The polypeptide according to claim 1, wherein the EBD sequence is derived from a mammalian elastin protein.

16. The polypeptide according to claim 1, wherein the EBD sequence is derived from a mouse elastin protein having a sequence set forth in SEQ ID NO: 21.

17. The polypeptide according to claim 1, wherein the EBD sequence is an elastin sequence selected from the group consisting of VPGA (SEQ ID NO:114), GAGGL (SEQ ID NO:115), GAGGG (SEQ ID NO:116), VPGVG (SEQ ID NO:117), VPGFGAGA (SEQ ID NO:118), VPGALPGA (SEQ ID NO:119), VPGFGAGAG (SEQ ID NO:120), VPAVPGAGG (SEQ ID NO:121), VPGGVGVGG (SEQ ID NO:122), VGAGGFPGYG (SEQ ID NO:123), VPGAVPGGLPGG (SEQ ID NO:124), VSPAAAAKAAKYGAA (SEQ ID NO:125), VPQVGAGIGAGGKPGK (SEQ ID NO:126), VPGGVGVGGIPGGVGVGG (SEQ ID NO:127), VPGGVGGIGGIGGLGVSTGAV (SEQ ID NO:128), VPGGAAGAAAAYKAAAKAGAGLGGVGG (SEQ ID NO:129), VSPAAAAKAAAKAAKYGARGGVGIPTYG (SEQ ID NO:130), KPPKPYGGALGALGYQGGGCFGKSCGRKRK (SEQ ID NO:131), VPGAGTPAAAAAAAAAKAAAKAGLGPGVGG (SEQ ID NO:132), VPGRVAGAAPPAAAAAAAKAAAKAAQYGLG (SEQ ID NO:133), VPGVGLPGVYPGGVLPGTGARFPGVGVLPG (SEQ ID NO:134), VPTGTGVKAKAPGGGGAFSGIPGVGPFGGQQPG (SEQ ID NO:135), VPGGVYYPGAGIGGLGGGGGALGPGGKPPKPGAG (SEQ ID NO:136), VGAGAGLGGASPAAAAAAAKAAKYGAGGAGALGGL (SEQ ID NO:137), GLGGVLGARPFPGGGVAARPGFGLSPIYPGGGAGGLGVGG (SEQ ID NO:138), VPGSLAASKAAKYGAAGGLGGPGGLGGPGGLGGPGGLGGAG (SEQ ID NO:139), VPGGPGVRLPGAGIPGVGGIPGVGGIPGVGGPGIGGPGIVGGPGA (SEQ ID NO:140), VLPGVGGGGIPGGAGAIPGIGGIAGAGTPAAAAAAKAAAKAAKYGAAGGL (SEQ ID NO:141), VPGGVGPGGVTGIGAGPGGLGGAGSPAAAKSAAKAAAKAQYRAAAGLGAG (SEQ ID NO:142), and VPLGYPIKAPKLPGGYGLPYTNGKLPYGVAGAGGKAGYPTGTGVGSQAAAAAAKAAKYGAGGAG (SEQ ID NO:143), or a combination thereof.

18. The polypeptide according to claim 1, wherein the polypeptide further comprises a cleavable linker.

19. An isolated polynucleotide encoding a fusion polypeptide according to claim 1.

20. An expression vector comprising an isolated polynucleotide according to claim 19.

21. A host cell comprising an expression vector according to claim 20.

22. A kit comprising an isolated polynucleotide according to claim 19.

23. A kit comprising an expression vector according to claim 20.

24. A kit comprising a host cell according to claim 21.

25. A method for producing a recombinant protein comprising the steps of: (a) introducing into a host cell a polynucleotide according to claim 19 or an expression vector according to claim 20; and (b) expressing in the host cell a fusion polypeptide comprising at least one EBD sequence and at least one heterologous polypeptide sequence.

Description:

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001]This application is a continuation of U.S. patent application Ser. No. 11/485,613, filed Jul. 11, 2006, now U.S. Pat. No. 7,494,788, which application claims the benefit under 35 U.S.C. § 119(e) of U.S. Provisional Patent Application No. 60/698,456, filed Jul. 11, 2005, where these applications are incorporated herein by reference in their entireties.

STATEMENT REGARDING SEQUENCE LISTING

[0002]The Sequence Listing associated with this application is provided in text format in lieu of a paper copy, and is hereby incorporated by reference into the specification. The name of the text file containing the Sequence Listing is 670098_--402Cl_SEQUENCE_LISTING.txt. The text file is 151 KB, was created on Jan. 22, 2009, and is being submitted electronically via EFS-Web.

FIELD OF THE INVENTION

[0003]The present invention relates generally to compositions and methods for recombinant protein production and, more particularly, to fusion polypeptides, polynucleotides encoding fusion polypeptides, expression vectors, kits, and related methods for recombinant protein production.

DESCRIPTION OF THE RELATED ART

[0004]A large percentage of the proteins identified via the different genome sequencing efforts have been difficult to express and/or purify as recombinant proteins using standard methods. For example, a trial study using Methanobacterium thermoautotrophicum as a model system identified a number of problems associated with high throughput structure determination (Christendat et al. (2000) Prog. Biophys. Mol. Biol. 73(5): 339-345; Christendat et al. (2000) Nat Struct Biol 7(10): 903-909). The complete list of genome-encoded proteins was filtered to remove proteins with predicted transmembrane regions or homologues to known structures. When these filtered proteins were taken through the cloning, expression, and structural determination steps of a high throughput process, only about 50% of the selected proteins could be purified in a state suitable for structural studies, with roughly 45% of large expressed proteins and 30% of small expressed proteins failing due to insolubility. The study concluded that considerable effort must be invested in reducing the attrition rate due to proteins with poor expression levels and unfavorable biophysical properties (Christendat et al. (2000) Prog. Biophys. Mol. Biol. 73(5): 339-345; Christendat et al. (2000) Nat Struct Biol 7(10): 903-909).

[0005]Similar results have been observed for other prokaryotic proteomes. One study reported the successful cloning and attempted expression of 1376 (73%) of the predicted 1877 genes of the Thermotoga maritima proteome. However, crystallization conditions could be determined for only 432 proteins (23%). A significant component of the decrease between the cloned and crystallized success levels was due to poor protein solubility and stability (Kuhn et al. (2002) Proteins 49(1): 142-5).
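
The percentages quoted for the Thermotoga maritima study follow directly from the reported counts. A short check in Python is sketched below; the variable names are illustrative only, and the share of cloned targets that reached crystallization conditions is a derived figure that is not stated in the cited study as summarized here.

```python
# Counts as reported in paragraph [0005] (Kuhn et al., 2002).
predicted_genes = 1877
cloned_and_expressed = 1376
crystallization_conditions_found = 432


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


cloned_pct = percent(cloned_and_expressed, predicted_genes)
crystallized_pct = percent(crystallization_conditions_found, predicted_genes)
# Derived for illustration: fraction of cloned targets with crystallization conditions.
crystallized_of_cloned_pct = percent(
    crystallization_conditions_found, cloned_and_expressed
)

print(f"cloned and expressed: {cloned_pct:.0f}% of predicted genes")
print(f"crystallization conditions: {crystallized_pct:.0f}% of predicted genes")
print(f"crystallization conditions: {crystallized_of_cloned_pct:.0f}% of cloned genes")
```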

[0006]Similarly low success rates have been reported for eukaryotic proteomes. A study of a sample set of human proteins, for example, reported that the failure rate using high-throughput methods for three classes of proteins based on cellular location was 50% for soluble proteins, 70% for extracellular proteins, and more than 80% for membrane proteins (Braun et al. (2002) Proc Natl Acad Sci USA 99(5): 2654-9).

[0007]Interactions between individual recombinant proteins are responsible for a significant number of the previously mentioned failures. In a high-throughput structural determination study, Christendat and colleagues found that 24 of 32 proteins that were classified by nuclear magnetic resonance as aggregated displayed circular dichroism spectra consistent with stable folded proteins, suggesting that these proteins were folded properly but aggregated due to surface interactions (Christendat et al. (2000) Prog. Biophys. Mol. Biol. 73(5): 339-345). One possible explanation for this is that these proteins function in vivo as part of multimeric units but when they are recombinantly expressed, dimerization domains are exposed that mediate protein-protein interactions.

[0008]Prior methods used to increase recombinant protein stability include production in E. coli strains that are deficient in proteases (Gottesman and Zipser (1978) J Bacteriol 133(2): 844-51). Production of fusions of bacterial protein fragments to a recombinant polypeptide/protein of interest (Itakura et al., Science, 1977. 198:1056-63; Shen, Proc Natl Acad Sci USA, 1984. 81:4627-31) has also been attempted to stabilize foreign proteins in E. coli. In addition, fusing a leader sequence to a recombinant protein may cause a gene product to accumulate in the periplasm or be excreted, which may result in increased recovery of properly folded soluble protein (Nilsson et al., EMBO J, 1985. 4:1075-80; Abrahmsen et al., Nucleic Acids Res, 1986. 14:7487-500). These strategies have advantages for some proteins but they generally do not succeed when used, for example, with membrane proteins or proteins capable of strong protein-protein interactions.

[0009]Fusion polypeptides have also been used as an approach for improving the solubility and folding of recombinant polypeptides/proteins produced in E. coli (Zhan et al., Gene, 2001. 281:1-9). Some commonly used fusion partners which have been linked to heterologous protein sequences of interest include calmodulin-binding peptide (CBP) (Vaillancourt et al., Biotechniques, 1997. 22:451-3), glutathione-S-transferase (GST) (Smith, Methods Enzymol, 2000. 326:254-70), thioredoxin (TRX) (Martin Hammarstrom et al., Protein Science, 2002. 11:313-321), and maltose-binding protein (MBP) (Sachdev et al., Methods Enzymol, 2000. 326:312-21). Glutathione-S-transferase and maltose-binding protein have been found to increase the recombinant protein purification success rate when fused to a heterologous sequence in a controlled trial of 32 human test proteins (Braun et al., Proc Natl Acad Sci USA, 2002. 99:2654-9). Further, maltose-binding protein domain fusions have been shown to increase the solubility of recombinant proteins (Kapust et al., Protein Sci, 1999. 8:1668-74; Braun et al., Proc Natl Acad Sci USA, 2002. 99:2654-9; Martin Hammarstrom et al., Protein Science, 2002. 11:313-321). Maltose-binding protein may further benefit recombinant protein solubility and folding in that it may have chaperone-like properties that assist in folding of the fusion partner (Richarme et al., J Biol Chem, 1997. 272:15607-12; Bach et al., J Mol Biol, 2001. 312:79-93). However, the fusion approaches used to date have not been amenable to all classes of proteins, and have thus met with only limited success.

[0010]Entropic bristles have been used in a variety of polymers to reduce aggregation of small particles such as latex particles in paints and to stabilize a wide variety of other colloidal products (Hoh, Proteins, 1998. 32:223-228). Entropic bristles generally comprise amino acid residues that do not have a tendency to form secondary structure and in the process of random motion about their attachment points sweep out a significant region in space and entropically exclude other molecules by their random motion (Hoh, Proteins, 1998. 32:223-228). Entropic bristles are singular elements, comprising highly flexible, non-aggregating polymer chains, of which entropic brushes are assembled. In polymer chemistry, entropic bristles have been affixed to the surfaces of particles (e.g. latex beads), thereby forming entropic brushes which, in turn, prevent particle aggregation (Stabilization by attached polymer: steric stabilization, in Polymeric stabilization of colloidal dispersions, D. H. Napper, Editor. 1983, Academic Press: London. p. 18-30). EBDs can exclude large molecules but do not exclude small molecules such as water, salts, metal ions, or cofactors (Hoh, Proteins, 1998. 32:223-228).

[0011]EBDs can also function as steric stabilizers and operate through steric hindrance stabilization (Stabilization by attached polymer: steric stabilization, in Polymeric stabilization of colloidal dispersions, D. H. Napper, Editor. 1983, Academic Press: London. p. 18-30). Napper described characteristics of steric stabilizers that contribute to their stabilization function: (1) they have an amphipathic sequence; (2) they are attached to the colloidal particle by one end rather than being totally adsorbed; (3) they are soluble in the medium used; (4) they are mutually repulsive; (5) they are thermodynamically stable; and (6) they exhibit stabilizing ability in proportion to their length. Steric stabilizers intended to function in aqueous media extend from the surface of colloidal molecules, thus transforming their surfaces from hydrophobic to hydrophilic. The fact that sterically stabilized particles are thermodynamically stable leads them to spontaneously re-disperse when dried residue is reintroduced to solvent. Entropic bristles can adopt random-walk configurations in solution (Milner, Science, 1991. 251:905-914). These chains extend from an attachment point because of their affinity for the solvent. This affinity is due in part to the highly charged nature of the entropic bristle sequence.
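
The highly charged nature of an entropic bristle sequence can be illustrated by counting charged residues in one of the NF-H repeats listed in this application. The Python sketch below is an illustration only: the function name is hypothetical, and the charge model is a deliberate simplification that treats Lys/Arg as +1 and Asp/Glu as -1 and ignores histidine, the chain termini, phosphorylation, and pKa shifts.

```python
# Simplified charge model (an assumption, not taken from the application):
# Lys/Arg count as +1, Asp/Glu as -1; all other residues are neutral.
POSITIVE = set("KR")
NEGATIVE = set("DE")


def charge_summary(sequence: str) -> dict:
    """Return length, fraction of charged residues, and crude net charge."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    positive = sum(1 for residue in seq if residue in POSITIVE)
    negative = sum(1 for residue in seq if residue in NEGATIVE)
    return {
        "length": len(seq),
        "charged_fraction": (positive + negative) / len(seq),
        "net_charge": positive - negative,
    }


# SPEKEEAK is the NF-H repeat listed as SEQ ID NO:36.
summary = charge_summary("SPEKEEAK")
print(summary)
```

Under this crude model, five of the eight residues in SPEKEEAK carry a charge, which is consistent with the description of these sequences as highly charged.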

[0012]While certain prior approaches have met with some success, there remains a need for new compositions and methods for improving the properties and characteristics of recombinant proteins, e.g., improving solubility, stability, yield and/or folding of recombinant proteins. The present invention addresses these needs and offers other related advantages by employing entropic bristle domain sequences as fusion partners in recombinant protein production, as described herein.

SUMMARY OF THE INVENTION

[0013]According to a general aspect of the present invention, there are provided isolated fusion polypeptides comprising at least one entropic bristle domain (EBD) sequence and at least one heterologous polypeptide sequence of interest. By providing an EBD sequence which effectively sweeps out the three-dimensional space surrounding a newly synthesized heterologous polypeptide, the fusion polypeptides of the invention offer a number of advantages over prior fusion polypeptides and methods relating thereto.

[0014]In one embodiment, a fusion polypeptide comprising an EBD sequence and a heterologous polypeptide sequence exhibits improved solubility relative to the corresponding heterologous polypeptide in the absence of the EBD sequence. In a related embodiment, the fusion polypeptide has at least 5% increased solubility relative to the heterologous polypeptide sequence, at least 25% increased solubility relative to the heterologous polypeptide sequence, or at least 50% increased solubility relative to the heterologous polypeptide sequence.
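
As an illustration of how the solubility thresholds above might be evaluated, the following Python sketch computes a relative increase and reports which of the 5%, 25% and 50% levels it meets. The function names, the choice of soluble yield as the measured quantity, and the example values are all assumptions; the paragraph above does not prescribe a particular assay or unit.

```python
# Thresholds recited in paragraph [0014], in percent.
SOLUBILITY_THRESHOLDS = (5.0, 25.0, 50.0)


def relative_increase_percent(fusion_value: float, reference_value: float) -> float:
    """Percent increase of the fusion measurement over the reference measurement."""
    if reference_value <= 0:
        raise ValueError("reference_value must be positive")
    return 100.0 * (fusion_value - reference_value) / reference_value


def thresholds_met(fusion_value: float, reference_value: float) -> list:
    """Return the recited thresholds that the observed increase meets or exceeds."""
    increase = relative_increase_percent(fusion_value, reference_value)
    return [t for t in SOLUBILITY_THRESHOLDS if increase >= t]


# Hypothetical soluble yields (arbitrary units): 40 for the fusion, 30 without the EBD.
increase = relative_increase_percent(40, 30)
met = thresholds_met(40, 30)
print(f"increase: {increase:.1f}%, thresholds met: {met}")
```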

[0015]In another embodiment, a fusion polypeptide of the invention exhibits reduced aggregation relative to the level of aggregation of the heterologous polypeptide sequence in the absence of the EBD sequence. For example, a fusion polypeptide of the invention generally exhibits at least 10% reduced aggregation relative to the heterologous polypeptide sequence or at least 25% reduced aggregation relative to the heterologous polypeptide sequence.

[0016]In another embodiment, a fusion polypeptide of the invention exhibits improved self-folding relative to the heterologous polypeptide sequence in the absence of the EBD sequence.

[0017]In another embodiment of the present invention, an EBD sequence employed in a fusion polypeptide comprises an amino acid sequence that maintains a substantially random coil conformation.

[0018]In another embodiment, the EBD sequence of a fusion polypeptide of the invention comprises an amino acid sequence that is substantially mutually repulsive.

[0019]In another embodiment, the EBD sequence of a fusion polypeptide of the invention comprises an amino acid sequence that remains in substantially constant motion.

[0020]In a more particular embodiment, an EBD sequence of a fusion polypeptide of the invention is derived from a mammalian neurofilament protein. In a related embodiment, the EBD sequence of a fusion polypeptide of the invention is derived from a mammalian neurofilament NF-H protein. In another related embodiment, the EBD sequence of a fusion polypeptide of the invention is derived from a human neurofilament NF-H protein having the sequence set forth in SEQ ID NO: 1. In another related embodiment, the EBD sequence of a fusion polypeptide of the invention is derived from a mouse neurofilament NF-H protein having the sequence set forth in SEQ ID NO: 3.

[0021]In yet another related embodiment, the EBD sequence of a fusion polypeptide of the invention comprises a neurofilament NF-H sequence selected from the group consisting of SPEAEK (SEQ ID NO:23), SPAAVK (SEQ ID NO:24), SPAEAK (SEQ ID NO:25), SPAEPK (SEQ ID NO:26), SPAEVK (SEQ ID NO:27), SPATVK (SEQ ID NO:28), SPEKAK (SEQ ID NO:29), SPGEAK (SEQ ID NO:30), SPIEVK (SEQ ID NO:31), SPPEAK (SEQ ID NO:32), SPSEAK (SEQ ID NO:33), SPEKEAK (SEQ ID NO:34), SPAKEKAK (SEQ ID NO:35), SPEKEEAK (SEQ ID NO:36), SPTKEEAK (SEQ ID NO:37), SPVKEEAK (SEQ ID NO:38), SPVKAEAK (SEQ ID NO:39), SPVKEEAK (SEQ ID NO:40), SPVKEEVK (SEQ ID NO:41), SPVKEEEKP (SEQ ID NO:42), SPEKAKTLDVK (SEQ ID NO:43), SPADKFPEKAK (SEQ ID NO:44), SPEAKTPAKEEAR (SEQ ID NO:45), SPEKAKTPVKEGAK (SEQ ID NO:46), SPVKEEAKTPEKAK (SEQ ID NO:47), SPVKEGAKPPEKAKPLDVK (SEQ ID NO:48), SPVKEDIKPPAEAKSPEKAK (SEQ ID NO:49), SPLKEDAKAPEKEIPKKEEVK (SEQ ID NO:50), SPEKEEAKTSEKVAPKKEEVK (SEQ ID NO:51), SPEAQTPVQEEATVPTDIRPPEQVK (SEQ ID NO:52), SPVKEEVKAKEPPKKVEEEKTLPTPKTEAKESKKDE (SEQ ID NO:53).

[0022]In yet another related embodiment, the EBD sequence of a fusion polypeptide of the invention comprises at least 2-100 repeats of a neurofilament NF-H sequence set forth above, or a combination thereof.
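
A repeat-based EBD of this kind can be sketched as simple string assembly. The Python example below is a minimal illustration under stated assumptions: it reads "2-100 repeats" as a copy number between 2 and 100, it joins repeats head to tail with no spacer residues, and the helper names are hypothetical. Only a few of the NF-H sequences listed above are included.

```python
# A small subset of the NF-H repeats listed above, keyed by SEQ ID NO.
NFH_REPEATS = {
    23: "SPEAEK",
    29: "SPEKAK",
    36: "SPEKEEAK",
}


def tandem_repeat(motif: str, copies: int) -> str:
    """Concatenate `copies` head-to-tail copies of `motif` (2 to 100 copies)."""
    if not 2 <= copies <= 100:
        raise ValueError("copies must be between 2 and 100")
    return motif * copies


def combine(*blocks: str) -> str:
    """Join repeat blocks into one EBD sequence (no spacer residues assumed)."""
    return "".join(blocks)


# Three copies of SEQ ID NO:23 followed by two copies of SEQ ID NO:36.
ebd = combine(
    tandem_repeat(NFH_REPEATS[23], 3),
    tandem_repeat(NFH_REPEATS[36], 2),
)
print(ebd, len(ebd))
```

The same helpers apply unchanged to the NF-M, pIII, Nup2p and elastin sequences recited elsewhere in this application, since they operate on plain one-letter amino acid strings.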

[0023]According to another particular embodiment of the present invention, an EBD sequence of a fusion polypeptide is derived from a mammalian neurofilament protein NF-M. In a related embodiment, the EBD sequence of a fusion polypeptide of the invention is derived from a bovine neurofilament NF-M protein having the sequence set forth in SEQ ID NO: 5. In another related embodiment, the EBD sequence of a fusion polypeptide of the invention is derived from a chicken neurofilament NF-M protein having the sequence set forth in SEQ ID NO: 7. In yet another related embodiment, the EBD sequence of a fusion polypeptide of the invention is derived from a human neurofilament NF-M protein having the sequence set forth in SEQ ID NO: 9. In another related embodiment, the EBD sequence of a fusion polypeptide of the invention is derived from a mouse neurofilament NF-M protein having the sequence set forth in SEQ ID NO: 11. In yet another related embodiment, the EBD sequence of a fusion polypeptide of the invention is derived from a rat neurofilament NF-M protein having the sequence set forth in SEQ ID NO: 13. In another related embodiment, the EBD sequence of a fusion polypeptide of the invention is derived from a rabbit neurofilament NF-M protein having the sequence set forth in SEQ ID NO: 15.

[0024]In yet another related embodiment, the EBD sequence of a fusion polypeptide of the invention comprises a neurofilament NF-M sequence selected from the group consisting of SPPK (SEQ ID NO:54), SPVK (SEQ ID NO:55), SPAAK (SEQ ID NO:56), SPAPK (SEQ ID NO:57), SPEAK (SEQ ID NO:58), SPMPK (SEQ ID NO:59), SPPAK (SEQ ID NO:60), SPTAK (SEQ ID NO:61), SPTTK (SEQ ID NO:62), SPVAK (SEQ ID NO:63), SPVAK (SEQ ID NO:64), SPVPK (SEQ ID NO:65), SPVSK (SEQ ID NO:66), SPEKPA (SEQ ID NO:67), SPVEEKAK (SEQ ID NO:68), SPVEEKGK (SEQ ID NO:69), SPVEEVKP (SEQ ID NO:70), SPEKPATPKVT (SEQ ID NO:71), SPEKPRTPEKPA (SEQ ID NO:72), SPEKPTTPEKW (SEQ ID NO:73), SPEKPSSPLKDEKA (SEQ ID NO:74), SPVKEKAVEEMITIT (SEQ ID NO:75), SPVKEEAAEEAATITK (SEQ ID NO:76), SPVPKSPVEEVKPKAEATAG (SEQ ID NO:77), SPVKAESPVKEEVPAKPVKV (SEQ ID NO:78), SPEKEAKEEEKPQEKEKEKEK (SEQ ID NO:79), SPVKATTPEIKEEEGEKEEEGQE (SEQ ID NO:80), SPVEEVKPKPEAKAGKGEQKEE (SEQ ID NO:81), SPEKPATPEKPPTPEKAITPEKVR (SEQ ID NO:82), SPEKPATPEKPRTPEKPATPEKPR (SEQ ID NO:83), SPKEEKVEKKEEKPKDVPKKKAE (SEQ ID NO:84), SPKEEKAEKKEEKPKDVPEKKKAE (SEQ ID NO:85), SPVEEAKSKAEVGKGEQKEEEEKE (SEQ ID NO:86), SPKEEKVEKKEEKPKDVPDKKKAE (SEQ ID NO:87), SPVKEEAVAEVVTITKSVKVHLEKET (SEQ ID NO:88), SSEKDEGEQEEEEGETEAEGEGEEAEAKEEK (SEQ ID NO:89), SPVEEVKPKAEAGAEKGEQKEKVEEEKKEAKE (SEQ ID NO:90), SPVTEQAKAVQKAAAEVGKDQKAEKAAEKAAKEEKAA (SEQ ID NO:91), SPEAKEEEEEGEKEEEEEGQEEEEEEDEGVKSDQAEEGGSEKEG (SEQ ID NO:92).

[0025]According to another particular embodiment of the present invention, an EBD sequence of a fusion polypeptide is derived from a phage sequence. In a related embodiment, the EBD sequence of a fusion polypeptide of the invention is derived from a filamentous phage fd. In another related embodiment, the EBD sequence of a fusion polypeptide of the invention comprises at least one linker region derived from a filamentous phage fd adsorption protein pIII. In another related embodiment, the EBD sequence of a fusion polypeptide of the invention comprises a filamentous phage fd adsorption protein pIII having a sequence set forth in SEQ ID NO: 17. In another related embodiment, the EBD sequence of a fusion polypeptide of the invention comprises a filamentous phage fd adsorption protein pIII sequence selected from the group consisting of EGGGS (SEQ ID NO:93), EGGGT (SEQ ID NO:94), SEGGG (SEQ ID NO:95), GGGSGGG (SEQ ID NO:96), SGGGSGSG (SEQ ID NO:97), and SGGGSEGGG (SEQ ID NO:98).

[0026]In yet another related embodiment, the EBD sequence of a fusion polypeptide of the invention comprises at least 2-100 repeats of a filamentous phage fd adsorption protein pIII sequence set forth above, or a combination thereof.

[0027]In another particular embodiment of the invention, an EBD sequence of a fusion polypeptide of the invention is derived from a nuclear pore protein. In a more particular embodiment, the EBD sequence of a fusion polypeptide of the invention is derived from a yeast nuclear pore Nup2p protein having the sequence set forth in SEQ ID NO: 19. In a related embodiment, the EBD is derived from the yeast nucleoporin Nup2p protein and is selected from the group consisting of FSFGTSQPNNTPS (SEQ ID NO:99), FSFSIPSKNTPDASKPS (SEQ ID NO:100), FVFGQAAAKPSLEKSS (SEQ ID NO:101), FSFGVPNSSKNETSKPV (SEQ ID NO:102), FTFGTKHAADSQNNKPS (SEQ ID NO:103), FTFGSSALADNKEDVKKP (SEQ ID NO:104), FSFGINTNTTKTADTKAPT (SEQ ID NO:105), FSFGKTTANLPANSSTSPAPSIPSTG (SEQ ID NO:106), FSFGPKKENRKKDESDSENDIEIKGPE (SEQ ID NO:107), FKFSGTVSSDVFKLNPSTDKNEKKTETNAKP (SEQ ID NO:108), FKFSLPFEQKGSQTTTNDSKEESTTEATGNESQ (SEQ ID NO:109), FTFGSTTIEKKNDENSTSNSKPEKSSDSNDSNPS (SEQ ID NO:110), FSFGISNGSESKDSDKPSLPSAVDGENDKKEATKPA (SEQ ID NO:111), FSFSSATSTTEQTKSKNPLSLTEATKTNVDNNSKAEAS (SEQ ID NO:112) and FSFGAATPSAKEASQEDDNNNVEKPSSKPAFNLISNAGTEKEKESKKDSKPA (SEQ ID NO:113).

[0028]In yet another related embodiment, the EBD sequence of a fusion polypeptide of the invention comprises at least 2-100 repeats of a Nup2p sequence set forth above, or a combination thereof.

[0029]According to another particular embodiment of the present invention, an EBD sequence is a sequence derived from a mammalian elastin protein. In another related embodiment, the EBD sequence of a fusion polypeptide of the invention is derived from a mouse elastin having the sequence set forth in SEQ ID NO: 21.

[0030]In a related embodiment, the EBD comprises a sequence derived from an elastin protein and is selected from the group consisting of VPGA (SEQ ID NO:114), GAGGL (SEQ ID NO:115), GAGGG (SEQ ID NO:116), VPGVG (SEQ ID NO:117), VPGFGAGA (SEQ ID NO:118), VPGALPGA (SEQ ID NO:119), VPGFGAGAG (SEQ ID NO:120), VPAVPGAGG (SEQ ID NO:121), VPGGVGVGG (SEQ ID NO:122), VGAGGFPGYG (SEQ ID NO:123), VPGAVPGGLPGG (SEQ ID NO:124), VSPAAAAKAAKYGAA (SEQ ID NO:125), VPQVGAGIGAGGKPGK (SEQ ID NO:126), VPGGVGVGGIPGGVGVGG (SEQ ID NO:127), VPGGVGGIGGIGGLGVSTGAV (SEQ ID NO:128), VPGGAAGAAAAYKAAAKAGAGLGGVGG (SEQ ID NO:129), VSPAAAAKAAAKAAKYGARGGVGIPTYG (SEQ ID NO:130), KPPKPYGGALGALGYQGGGCFGKSCGRKRK (SEQ ID NO:131), VPGAGTPAAAAAAAAAKAAAKAGLGPGVGG (SEQ ID NO:132), VPGRVAGAAPPAAAAAAAKAAAKAAQYGLG (SEQ ID NO:133), VPGVGLPGVYPGGVLPGTGARFPGVGVLPG (SEQ ID NO:134), VPTGTGVKAKAPGGGGAFSGIPGVGPFGGQQPG (SEQ ID NO:135), VPGGVYYPGAGIGGLGGGGGALGPGGKPPKPGAG (SEQ ID NO:136), VGAGAGLGGASPAAAAAAAKAAKYGAGGAGALGGL (SEQ ID NO:137), GLGGVLGARPFPGGGVAARPGFGLSPIYPGGGAGGLGVGG (SEQ ID NO:138), VPGSLAASKAAKYGAAGGLGGPGGLGGPGGLGGPGGLGGAG (SEQ ID NO:139), VPGGPGVRLPGAGIPGVGGIPGVGGIPGVGGPGIGGPGIVGGPGA (SEQ ID NO:140), VLPGVGGGGIPGGAGAIPGIGGIAGAGTPAAAAAAKAAAKAAKYGAAGGL (SEQ ID NO:141), VPGGVGPGGVTGIGAGPGGLGGAGSPAAAKSAAKAAAKAQYRAAAGLGAG (SEQ ID NO:142), and VPLGYPIKAPKLPGGYGLPYTNGKLPYGVAGAGGKAGYPTGTGVGSQAAAAAAKAAKYGAGGAG (SEQ ID NO:143).

[0031]In yet another related embodiment, the EBD sequence of a fusion polypeptide of the invention comprises at least 2-100 repeats of an elastin sequence set forth above, or a combination thereof.

[0032]In another embodiment, the EBD sequence of a fusion polypeptide of the invention comprises a combination of any one or more of the EBD sequences set forth herein.

[0033]In yet another embodiment of the invention, an EBD sequence of a fusion polypeptide of the invention comprises a combination of NF-H and NF-M sequences set forth herein.

[0034]In yet another embodiment of the invention, an EBD sequence of a fusion polypeptide of the invention comprises a combination of NF-H and Nup2p sequences set forth herein.

[0035]In yet another embodiment of the invention, an EBD sequence of a fusion polypeptide of the invention comprises a combination of NF-M and Nup2p sequences set forth herein.

[0036]In yet another embodiment of the invention, an EBD sequence of a fusion polypeptide of the invention comprises a combination of NF-H and filamentous phage fd adsorption protein pIII sequences set forth herein.

[0037]In yet another embodiment of the invention, an EBD sequence of a fusion polypeptide of the invention comprises a combination of NF-M and filamentous phage fd adsorption protein pIII sequences set forth herein.

[0038]In yet another embodiment of the invention, an EBD sequence of a fusion polypeptide of the invention comprises a combination of Nup2p and filamentous phage fd adsorption protein pIII sequences set forth herein.

[0039]In yet another embodiment of the invention, an EBD sequence of a fusion polypeptide of the invention comprises a combination of NF-H, NF-M and filamentous phage fd adsorption protein pIII sequences set forth herein.

[0040]In yet another embodiment of the invention, an EBD sequence of a fusion polypeptide of the invention comprises a combination of NF-H, NF-M and Nup2p sequences set forth herein.

[0041]In yet another embodiment of the invention, an EBD sequence of a fusion polypeptide of the invention comprises a combination of Nup2p, NF-M and filamentous phage fd adsorption protein pIII sequences set forth herein.

[0042]In yet another embodiment of the invention, an EBD sequence of a fusion polypeptide of the invention comprises a combination of NF-H, Nup2p and filamentous phage fd adsorption protein pIII sequences set forth herein.

[0043]In yet another embodiment of the invention, an EBD sequence of a fusion polypeptide of the invention comprises a combination of Nup2p, NF-H, NF-M and filamentous phage fd adsorption protein pIII sequences set forth herein.

[0044]According to another embodiment of the invention, an EBD sequence of a fusion polypeptide of the invention comprises a variant version of an amino acid sequence of NF-H described herein, wherein the resulting sequence preserves the amino acid composition of the parent sequence.

[0045]According to another embodiment of the invention, an EBD sequence of a fusion polypeptide of the invention comprises a variant version of an amino acid sequence of NF-M described herein, wherein the resulting sequence preserves the amino acid composition of the parent sequence.

[0046]According to another embodiment of the invention, an EBD sequence of a fusion polypeptide of the invention comprises a variant version of an amino acid sequence of Nup2p described herein, wherein the resulting sequence preserves the amino acid composition of the parent sequence.

[0047]According to another embodiment of the invention, an EBD sequence of a fusion polypeptide of the invention comprises a variant version of an amino acid sequence of filamentous phage fd adsorption protein pIII described herein, wherein the resulting sequence preserves the amino acid composition of the parent sequence.

[0048]According to another embodiment of the invention, an EBD sequence of a fusion polypeptide of the invention comprises a variant version of an amino acid sequence of elastin described herein, wherein the resulting sequence preserves the amino acid composition of the parent sequence.

[0049]According to another embodiment of the invention, an EBD sequence of a fusion polypeptide of the invention generally comprises between about 5-600 amino acid residues, between about 5-300 amino acid residues or between about 5-100 amino acid residues; however, other polypeptide lengths may also be used.

[0050]In another embodiment, an EBD sequence of a fusion polypeptide of the invention is cleavable, e.g., can be removed and/or separated from the heterologous polypeptide sequence after recombinant expression by, for example, enzymatic or chemical cleavage methods.

[0051]In another embodiment, an EBD sequence of a fusion polypeptide of the invention is covalently linked at the N-terminus of the heterologous polypeptide sequence of interest. In another embodiment, an EBD sequence of a fusion polypeptide of the invention is covalently linked at the C-terminus of the heterologous polypeptide sequence of interest. In yet another embodiment, an EBD sequence of a fusion polypeptide of the invention is covalently linked at the N- and C-termini of the heterologous polypeptide sequence of interest.

[0052]In another embodiment of the invention, the charge of an EBD sequence of a fusion polypeptide of the invention is modulated by, for example, enzymatic and/or chemical methods, in order to modulate the activity of the EBD sequence. In a particular embodiment, the charge of the EBD sequence is modulated by phosphorylation.

[0053]According to another aspect of the invention, an isolated polynucleotide is provided, wherein the polynucleotide encodes a fusion polypeptide as described herein.

[0054]According to yet another aspect of the invention, there is provided an expression vector comprising an isolated polynucleotide encoding a fusion polypeptide as described herein. In a related embodiment, an expression vector is provided comprising a polynucleotide encoding an EBD sequence and further comprising a cloning site for insertion of a polynucleotide encoding a heterologous polypeptide of interest.

[0055]According to yet another aspect of the invention, there is provided a host cell comprising an expression vector as described herein.

[0056]According to yet another aspect of the invention, there is provided a kit comprising an isolated polynucleotide as described herein, an isolated polypeptide as described herein and/or an isolated host cell as described herein.

[0057]Yet another aspect of the invention provides a method for producing a recombinant protein comprising the steps of: introducing into a host cell an expression vector comprising a polynucleotide sequence encoding a fusion polypeptide, the fusion polypeptide comprising at least one entropic bristle domain sequence and at least one polypeptide sequence of interest; and expressing the fusion polypeptide in the host cell. In another embodiment, the method further comprises the step of isolating the fusion polypeptide from the host cell. In another related embodiment, the method further comprises the step of removing the entropic bristle domain sequence from the fusion polypeptide before or after isolating the fusion polypeptide from the host cell.

[0058]These and other aspects of the present invention will become apparent upon reference to the following detailed description. All references disclosed herein and in the enclosed Application Data Sheet are hereby incorporated by reference in their entirety as if each was incorporated individually.

BRIEF DESCRIPTION OF THE DRAWING

[0059]FIG. 1 depicts the average net charge of a 5 residue moving window for residues 422 to 916 of human neurofilament medium (NF-M) protein sequence.
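
A moving-window charge profile of the kind shown in FIG. 1 can be sketched as follows. This is a minimal illustration and not the procedure used to prepare the figure: the charge table (K and R as +1, D and E as -1, all other residues including histidine as neutral, no phosphorylation) and the function name window_net_charge are assumptions introduced here.

```python
# Assumed side-chain charges near neutral pH; every other residue is treated as neutral.
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}

def window_net_charge(seq, window=5):
    """Return the average net charge of every full window along seq."""
    charges = [CHARGE.get(res, 0.0) for res in seq.upper()]
    if len(charges) < window:
        return []
    return [sum(charges[i:i + window]) / window
            for i in range(len(charges) - window + 1)]

# Illustrative input only: three tandem copies of SPEAEK (SEQ ID NO:23).
profile = window_net_charge("SPEAEK" * 3)
```

Applied to residues 422 to 916 of an NF-M sequence, the returned list would hold one value per window position, which could then be plotted against residue number.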

BRIEF DESCRIPTION OF THE SEQUENCE IDENTIFIERS

[0060]SEQ ID NO: 1 is the amino acid sequence of a human NF-H protein, Swiss-Prot accession number P12036, having an illustrative EB-domain corresponding to residues 414-1026.

[0061]SEQ ID NO: 2 is a polynucleotide sequence encoding the amino acid sequence of SEQ ID NO: 1, GenBank accession number BC073969, having an illustrative EB-domain corresponding to residues 1242-3081.

[0062]SEQ ID NO: 3 is the amino acid sequence of a mouse NF-H protein, Swiss-Prot accession number P19246, having an illustrative EB domain corresponding to residues 409-1087.

[0063]SEQ ID NO: 4 is a polynucleotide sequence encoding the amino acid sequence of SEQ ID NO: 3, GenBank accession number M35131, having an illustrative EB-domain corresponding to residues 1227-3219.

[0064]SEQ ID NO: 5 is the amino acid sequence of a bovine NF-M protein, Swiss-Prot accession number O77788, having an illustrative EB domain corresponding to residues 412-925.

[0065]SEQ ID NO: 6 is a polynucleotide sequence encoding protein residues 116-925 of bovine NF-M, GenBank accession number AF091342, having an illustrative EB domain corresponding to residues 891-2433.

[0066]SEQ ID NO: 7 is the amino acid sequence of a chicken NF-M protein, Swiss-Prot accession number P16053, having an illustrative EB domain corresponding to residues 407-857.

[0067]SEQ ID NO: 8 is a polynucleotide sequence encoding the protein fragment 259-857 of chicken NF-M, GenBank accession number X05558, having an illustrative EB domain corresponding to residues 177-1530.

[0068]SEQ ID NO: 9 is the amino acid sequence of a human NF-M protein, Swiss-Prot accession number P07197, having an illustrative EB domain corresponding to residues 412-915.

[0069]SEQ ID NO: 10 is a polynucleotide sequence encoding the amino acid sequence of SEQ ID NO: 9, GenBank accession number Y00067, having an illustrative EB domain corresponding to residues 1236-2751.

[0070]SEQ ID NO: 11 is the amino acid sequence of a mouse NF-M protein, Swiss-Prot accession number P08553, having an illustrative EB domain corresponding to residues 411-848.

[0071]SEQ ID NO: 12 is a polynucleotide sequence encoding the amino acid sequence of SEQ ID NO: 11, GenBank accession number X05640, having an illustrative EB domain corresponding to residues 1233-2550.

[0072]SEQ ID NO: 13 is the amino acid sequence of a rat NF-M protein, Swiss-Prot accession number P12839, having an illustrative EB domain corresponding to residues 411-845.

[0073]SEQ ID NO: 14 is a polynucleotide sequence encoding the amino acid sequence of SEQ ID NO: 13, GenBank accession number Z12152, having an illustrative EB domain corresponding to residues 1233-2538.

[0074]SEQ ID NO: 15 is the amino acid sequence of a rabbit NF-M protein, Swiss-Prot accession number P54938, having an illustrative EB domain corresponding to residues 198-644.

[0075]SEQ ID NO: 16 is a polynucleotide sequence encoding the amino acid sequence of SEQ ID NO: 15, GenBank accession number Z47378, having an illustrative EB domain corresponding to residues 594-1938.

[0076]SEQ ID NO: 17 is the amino acid sequence of a phage fd pIII protein, Swiss-Prot accession number P69168, having illustrative EB-domains corresponding to residues 86-104 and 236-274.

[0077]SEQ ID NO: 18 is a polynucleotide sequence encoding the amino acid sequence of SEQ ID NO: 17, GenBank accession number V00604, having illustrative EB domains corresponding to residues 258-312 and 708-822.

[0078]SEQ ID NO: 19 is the amino acid sequence of a yeast Nup2p protein, Swiss-Prot accession number P32499, having an illustrative EB-domain corresponding to residues 189-582.

[0079]SEQ ID NO: 20 is a polynucleotide sequence encoding the amino acid sequence of SEQ ID NO: 19, GenBank accession number X69964, having an illustrative EB domain corresponding to residues 567-1748.

[0080]SEQ ID NO: 21 is the amino acid sequence of a mouse elastin protein, Swiss-Prot accession number P54320, the entire sequence of which represents an illustrative EB domain.

[0081]SEQ ID NO: 22 is a polynucleotide sequence encoding the amino acid sequence of SEQ ID NO: 21, GenBank accession number U08210.

[0082]SEQ ID NOs: 23 to 144 represent further illustrative EBD sequences according to the present invention.

DETAILED DESCRIPTION OF THE INVENTION

[0083]The practice of the present invention will employ, unless indicated specifically to the contrary, conventional methods of molecular biology and recombinant DNA techniques within the skill of the art, many of which are described below for the purpose of illustration. Such techniques are explained fully in the literature. See, e.g., Sambrook, et al., Molecular Cloning: A Laboratory Manual (2nd Edition, 1989); Maniatis et al., Molecular Cloning: A Laboratory Manual (1982); DNA Cloning: A Practical Approach, vol. I & II (D. Glover, ed.); Oligonucleotide Synthesis (N. Gait, ed., 1984); Nucleic Acid Hybridization (B. Hames & S. Higgins, eds., 1985); Transcription and Translation (B. Hames & S. Higgins, eds., 1984); Animal Cell Culture (R. Freshney, ed., 1986); Perbal, A Practical Guide to Molecular Cloning (1984).

[0084]All publications, patents and patent applications cited herein, whether supra or infra, are hereby incorporated by reference in their entirety.

[0085]As used in this specification and the appended claims, the singular forms "a," "an" and "the" include plural references unless the content clearly dictates otherwise.

[0086]As used herein, the terms "polypeptide" and "protein" are used interchangeably, unless specified to the contrary, and according to conventional meaning, i.e., as a sequence of amino acids. Polypeptides are not limited to a specific length, e.g., they may comprise a full length protein sequence or a fragment of a full length protein, and may include post-expression modifications of the polypeptide, for example, glycosylations, acetylations, phosphorylations and the like, as well as other modifications known in the art, both naturally occurring and non-naturally occurring. Polypeptides of the invention may be prepared using any of a variety of well known recombinant and/or synthetic techniques, illustrative examples of which are further discussed below.

[0087]As noted above, the present invention, in a general aspect, relates to isolated fusion polypeptides comprising at least one entropic bristle domain (EBD) sequence and at least one heterologous polypeptide sequence. By providing an EBD sequence which sweeps out the three-dimensional space surrounding a newly synthesized heterologous polypeptide, the EBD sequences of the invention effectively exclude other polypeptides and thereby minimize aggregation with other newly synthesized heterologous polypeptides during recombinant polypeptide production.

[0088]In addition, an EBD sequence of the invention can provide steric stabilization to recombinant polypeptides, a property that is relatively independent of concentration, and can thus minimize problems associated with high-level recombinant production of polypeptides and proteins (e.g., precipitation, toxicity and/or inclusion body formation). Thus, EBD fusion polypeptides described herein exhibit both steric effects (via the entropic bristle's motion) and electrostatic effects (via the bristle's highly charged sequence) to minimize interactions between recombinant polypeptides expressed as fusions according to the present invention. These characteristics allow EBD polypeptide sequences to more effectively solubilize recombinantly expressed polypeptides than, for example, other fusion partners which do not have a steric exclusion component that contributes to their activity.

[0089]Therefore, according to one embodiment of the invention, fusion polypeptides comprising an EBD sequence and a heterologous polypeptide are provided which exhibit improved solubility relative to the corresponding heterologous polypeptide in the absence of the EBD sequence. In one embodiment, for example, the fusion polypeptide has at least 5% increased solubility relative to the heterologous polypeptide sequence alone. In another related embodiment, the fusion polypeptide has at least 25% increased solubility relative to the heterologous polypeptide sequence. In yet another related embodiment, the fusion polypeptide has at least 50% increased solubility relative to the heterologous polypeptide sequence.

[0090]The extent of improved solubility provided by an EBD sequence described herein can be determined using any of a number of available approaches (see for example, Kapust, R. B. and D. S. Waugh, Escherichia coli maltose-binding protein is uncommonly effective at promoting the solubility of polypeptides to which it is fused. Protein Sci, 1999. 8:1668-74; Fox, J. D., et al., Maltodextrin-binding proteins from diverse bacteria and archaea are potent solubility enhancers. FEBS Lett, 2003. 537:53-7; Dyson M R, Shadbolt S P, Vincent K J, Perera R L, McCafferty J. Production of soluble mammalian proteins in Escherichia coli: identification of protein features that correlate with successful expression. BMC Biotechnol. 2004 Dec. 14; 4(1):32).

[0091]Cells from a single, drug-resistant colony of E. coli overproducing the fusion polypeptide are grown to saturation in LB broth (Miller J H. 1972. Experiments in molecular genetics. Cold Spring Harbor, N.Y.: Cold Spring Harbor Press. p 433) supplemented with 100 μg/mL ampicillin and 30 μg/mL chloramphenicol at 37° C. The saturated cultures are diluted 50-fold in the same medium and grown in shake-flasks to mid-log phase (A₆₀₀ ~0.5-0.7), at which time IPTG is added to a final concentration of 1 mM. After 3 h, the cells are recovered by centrifugation. The cell pellets are resuspended in 0.1 culture volumes of lysis buffer (50 mM Tris-HCl (pH 8.0), 150 mM NaCl, 1 mM EDTA) and disrupted by sonication. A total protein sample is collected from the cell suspension after sonication, and a soluble protein sample is collected from the supernatant after the insoluble debris is pelleted by centrifugation (20,000×g). These samples are subjected to SDS-PAGE and proteins are visualized by staining with Coomassie Brilliant Blue. At least three independent experiments are typically performed to obtain numerical estimates of the solubility of each fusion protein in E. coli. Coomassie-stained gels are scanned with a gel-scanning densitometer and the pixel densities of the bands corresponding to the fusion proteins are obtained directly by volumetric integration. In each lane, the collective density of all E. coli proteins that are larger than the largest fusion protein is also determined by volumetric integration and used to normalize the values in each lane relative to the others. The percent solubility of each fusion protein is calculated by dividing the amount of soluble fusion protein by the total amount of fusion protein in the cells, after first subtracting the normalized background values obtained from negative control lanes (cells containing no expression vector). Descriptive statistical data (e.g., the mean and standard deviation) are then generated using standard methods.
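
The percent-solubility calculation described above can be sketched as follows. This is an illustration only: the function name percent_solubility, its parameter names, and the numerical densities are hypothetical, and the inputs are assumed to be band densities that have already been normalized as described.

```python
from statistics import mean, stdev

def percent_solubility(soluble, total, bg_soluble=0.0, bg_total=0.0):
    """Percent of fusion protein recovered in the soluble fraction.

    soluble and total are normalized band densities; bg_soluble and bg_total
    are the normalized background values from the negative control lanes.
    """
    soluble_corrected = soluble - bg_soluble
    total_corrected = total - bg_total
    if total_corrected <= 0:
        raise ValueError("total band density must exceed background")
    return 100.0 * soluble_corrected / total_corrected

# Hypothetical densities from three independent experiments:
# (soluble, total, background in soluble lane, background in total lane)
replicates = [(42.0, 60.0, 2.0, 2.0), (40.0, 58.0, 2.0, 2.0), (45.0, 62.0, 2.0, 2.0)]
values = [percent_solubility(*r) for r in replicates]
summary = (mean(values), stdev(values))  # mean and sample standard deviation
```

The percent insolubility used for aggregation estimates could be computed the same way by passing the insoluble-fraction density in place of the soluble-fraction density.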

[0092]The presence of an EBD sequence in fusion polypeptides of the present invention can also serve to reduce the extent of aggregation of a heterologous polypeptide sequence. In one embodiment, for example, the fusion polypeptide exhibits at least 10% reduced aggregation relative to the heterologous polypeptide. In another embodiment, the fusion polypeptide has at least 25% reduced aggregation relative to the heterologous polypeptide.

[0093]The extent of reduced aggregation provided by the fusion polypeptides of the present invention can be determined using any of a number of available techniques (see for example, Kapust, R. B. and D. S. Waugh, Escherichia coli maltose-binding protein is uncommonly effective at promoting the solubility of polypeptides to which it is fused. Protein Sci, 1999. 8:1668-74; Fox, J. D., et al., Maltodextrin-binding proteins from diverse bacteria and archaea are potent solubility enhancers. FEBS Lett, 2003. 537:53-7).

[0094]Cells from a single, drug-resistant colony of E. coli overproducing the fusion polypeptide are grown to saturation in LB broth (Miller J H. 1972. Experiments in molecular genetics. Cold Spring Harbor, N.Y.: Cold Spring Harbor Press. p 433) supplemented with 100 μg/mL ampicillin and 30 μg/mL chloramphenicol at 37° C. The saturated cultures are diluted 50-fold in the same medium and grown in shake-flasks to mid-log phase (A₆₀₀ ~0.5-0.7), at which time IPTG is added to a final concentration of 1 mM. After 3 h, the cells are recovered by centrifugation. The cell pellets are resuspended in 0.1 culture volumes of lysis buffer (50 mM Tris-HCl (pH 8.0), 150 mM NaCl, 1 mM EDTA) and disrupted by sonication. A total protein sample is collected from the cell suspension after sonication, and an insoluble protein sample is collected from the pellet after centrifugation (20,000×g). These samples are subjected to SDS-PAGE and proteins are visualized by staining with Coomassie Brilliant Blue. At least three independent experiments are typically performed to obtain numerical estimates of the solubility of each fusion protein in E. coli. Coomassie-stained gels are scanned with a gel-scanning densitometer and the pixel densities of the bands corresponding to the fusion proteins are obtained directly by volumetric integration. In each lane, the collective density of all insoluble E. coli proteins that are larger than the largest fusion protein is also determined by volumetric integration and used to normalize the values in each lane relative to the others. The percent insolubility of each fusion protein is calculated by dividing the amount of insoluble fusion protein by the total amount of fusion protein in the cells, after first subtracting the normalized background values obtained from negative control lanes (cells containing no expression vector). Descriptive statistical data (e.g., the mean and standard deviation) are generated by standard methods.

[0095]The presence of an EBD sequence in the fusion polypeptides of the present invention can also serve to improve the folding characteristics of the fusion polypeptides relative to the corresponding heterologous polypeptide, e.g., by minimizing interference caused by interaction with other proteins.

[0096]Assays for evaluating the folding characteristics of a fusion polypeptide of the invention can be carried out using conventional techniques, such as circular dichroism spectroscopy in far ultra-violet region, circular dichroism in near ultra-violet region, nuclear magnetic resonance spectroscopy, infra-red spectroscopy, Raman spectroscopy, intrinsic fluorescence spectroscopy, extrinsic fluorescence spectroscopy, fluorescence resonance energy transfer, fluorescence anisotropy and polarization, steady-state fluorescence, time-domain fluorescence, numerous hydrodynamic techniques including gel-filtration, viscometry, small-angle X-ray scattering, small angle neutron scattering, dynamic light scattering, static light scattering, scanning microcalorimetry, and limited proteolysis.

[0097]In another embodiment of the invention, an EBD comprises an amino acid sequence that maintains a substantially random coil conformation. Whether a given amino acid sequence maintains a substantially random coil conformation can be determined by circular dichroism spectroscopy in far ultra-violet region, nuclear magnetic resonance spectroscopy, infra-red spectroscopy, Raman spectroscopy, fluorescence spectroscopy, numerous hydrodynamic techniques including gel-filtration, viscometry, small-angle X-ray scattering, small angle neutron scattering, dynamic light scattering, static light scattering, scanning microcalorimetry, and limited proteolysis.

[0098]In another embodiment of the invention, an EBD sequence comprises an amino acid sequence that is substantially mutually repulsive. This property of being mutually repulsive can be determined by simple calculations of charge distribution within the polypeptide sequence.
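
One simple charge-distribution calculation of this kind might look like the sketch below. The residue classes (K and R positive, D and E negative near neutral pH) and the function name charge_summary are assumptions made for illustration; the specification does not prescribe a particular calculation. Chains that carry a net charge of the same sign would be expected to repel one another.

```python
from collections import Counter

POSITIVE = set("KR")  # assumed positively charged near neutral pH
NEGATIVE = set("DE")  # assumed negatively charged near neutral pH

def charge_summary(seq):
    """Count charged residues and report the net charge of a sequence."""
    counts = Counter(seq.upper())
    pos = sum(counts[r] for r in POSITIVE)
    neg = sum(counts[r] for r in NEGATIVE)
    n = len(seq)
    return {
        "positive": pos,
        "negative": neg,
        "net_charge": pos - neg,
        "net_charge_per_residue": (pos - neg) / n if n else 0.0,
    }

# Illustrative input only: ten tandem copies of SPEAEK (SEQ ID NO:23).
result = charge_summary("SPEAEK" * 10)
```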

[0099]In yet another embodiment of the invention, an EBD sequence comprises an amino acid sequence that remains in substantially constant motion, particularly in an aqueous environment. The property of being in substantially constant motion can be determined by nuclear magnetic resonance spectroscopy, small-angle X-ray scattering, small angle neutron scattering, dynamic light scattering, intrinsic fluorescence spectroscopy, extrinsic fluorescence spectroscopy, fluorescence resonance energy transfer, fluorescence anisotropy and polarization, steady-state fluorescence, and time-domain fluorescence.

[0100]According to a more particular embodiment of the present invention, an EBD sequence is derived from one of the three subunits that make up mammalian axon neurofilaments (including human, bovine, chicken, rabbit, mouse, and rat neurofilaments). Axon neurofilaments are major cytoskeletal components of the axonal cell. One of the functions of neurofilaments is to maintain the bore of the axon. Spacing between the filaments is maintained by the action of an entropic brush formed by entropic bristles carried by certain of the neurofilament subunits. The combination of the entropic bristles along the length of the fiber results in the formation of an entropic brush that functions to sterically exclude interfiber contact by thermally-driven motion, thereby maintaining the bore of the axon. Interfilament spacing is thought to be maintained by long-range interactions between the entropic brushes formed by the EBDs that project from the NF-M and NF-H monomers (Brown and Hoh, 1997).

[0101]Therefore, in another embodiment of the invention, an EBD sequence of the invention comprises a C-terminal entropic bristle sequence of an NF-M or NF-H neurofilament protein. For example, in one embodiment, an EBD sequence of the invention comprises at least one amino acid sequence, SPEAEK (SEQ ID NO:23), derived from the neurofilament triplet H protein. In a related embodiment, multiple repeats of the SPEAEK (SEQ ID NO:23) sequence are provided within the same isolated fusion polypeptide. In a more particular embodiment, about 1-10, 1-50 or 1-100 repeats of the sequence SPEAEK (SEQ ID NO:23) are provided in a polypeptide.

[0102]In another embodiment of the invention, an EBD sequence is a sequence derived from a phage protein. In a more particular embodiment, the EBD sequence comprises at least one sequence derived from the linker region of a filamentous phage, such as the filamentous phage fd. In a more particular embodiment, the EBD sequence comprises at least one sequence derived from the linker region derived from the filamentous phage fd adsorption protein pIII. In a more particular embodiment, the EBD sequence comprises at least one sequence derived from the 36 amino acid linker region derived from filamentous phage fd adsorption protein pIII. In a more particular embodiment, an EBD sequence of the invention comprises between about 1-10, 1-50 or 1-100 repeats of the amino acid sequence EGGGS (SEQ ID NO:93), derived from the linker region of a filamentous phage fd adsorption protein pIII.

[0103]In another embodiment of the invention, an EBD sequence is a sequence derived from nucleoporin. In eukaryotic cells, the translocation of biomolecules between the nucleus and cytosol occurs through nuclear pore complexes (NPCs), supramolecular protein structures embedded in the double lipid membrane of the nuclear envelope (Nakielny, S., and Dreyfuss, G. (1999) Cell 99, 677-690; Pemberton, L. F., Blobel, G., and Rosenblum, J. S. (1998) Curr. Opin. Cell Biol. 10, 392-399; Rout, M., and Aitchison, J. (2001) J. Biol. Chem. 276, 16593-16596). For example, the Saccharomyces cerevisiae NPC is a 60-MDa structure (Yang, Q., Rout, M. P., and Akey, C. W. (1998) Mol. Cell. 1, 223-234) formed by 30 different nucleoporins present in multiple copies per NPC (Rout, M. P., Aitchison, J. D., Suprapto, A., Hjertaas, K., Zhao, Y., and Chait, B. T. (2000) J. Cell Biol. 148, 635-651). The yeast NPC contains a core ring structure with 8-fold symmetry measuring 95 nm in diameter and 35 nm in depth (Yang, Q., Rout, M. P., and Akey, C. W. (1998) Mol. Cell. 1, 223-234). It is believed that nucleoporins form a barrier meshwork that excludes most macromolecules larger than a threshold size from entering the NPC (Rout, M., and Aitchison, J. (2001) J. Biol. Chem. 276, 16593-16596; Rout, M. P., Aitchison, J. D., Suprapto, A., Hjertaas, K., Zhao, Y., and Chait, B. T. (2000) J. Cell Biol. 148, 635-651; Denning D P, Uversky V, Patel S S, Fink A L, Rexach M (2002) The Saccharomyces cerevisiae nucleoporin Nup2p is a natively unfolded protein. J Biol. Chem. 277(36):33447-55).

[0104]Therefore, in another embodiment of the invention, an EBD sequence of the invention comprises a central fragment of yeast nucleoporin Nup2p, such as those described herein. For example, in one embodiment, an EBD sequence of the invention comprises at least one amino acid sequence, FSFGTSQPNNTPS (SEQ ID NO:99), derived from the yeast nucleoporin protein Nup2p. In a related embodiment, multiple repeats of the FSFGTSQPNNTPS (SEQ ID NO:99) sequence are provided within the same isolated fusion polypeptide. In a more particular embodiment, about 1-10, 1-50 or 1-100 repeats of the sequence FSFGTSQPNNTPS (SEQ ID NO:99) are provided in a polypeptide.

[0105]In another embodiment of the invention, an EBD sequence is a sequence derived from an elastin-like polypeptide (ELP). ELPs comprise multiple repeats of the elastin-derived pentamer VPGXG (SEQ ID NO:144), where X, the guest residue, is not proline. ELPs are disordered and highly solvated at normal temperatures. They undergo an inverse transition at an elevated temperature (the T_t of a particular ELP sequence). The conformation of ELPs transitions from extended to collapsed and is dependent on temperature and salt concentration. Purification of proteins using ELPs may be carried out using inverse transition cycling. The ELP is soluble at temperatures below its T_t and insoluble at temperatures above its T_t. Using ELPs to purify protein may be accomplished by making a fusion construct that includes the target heterologous protein and a suitable ELP multimer, e.g., comprising about 5-100 residues.
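
The VPGXG repeat rule can be illustrated with a short sketch that assembles an ELP multimer from a list of guest residues and checks that a sequence consists only of such pentamers. The names build_elp and is_elp are hypothetical, and restricting the guest position to the 19 standard amino acids other than proline is an assumption made for this example.

```python
# Assumed guest set: the 19 standard residues other than proline.
GUESTS = set("ACDEFGHIKLMNQRSTVWY")

def build_elp(guest_residues):
    """Concatenate one VPGXG pentamer per guest residue X."""
    for x in guest_residues:
        if x not in GUESTS:
            raise ValueError(f"invalid guest residue: {x!r}")
    return "".join(f"VPG{x}G" for x in guest_residues)

def is_elp(seq):
    """Return True if seq is a non-empty run of VPGXG pentamers with X not proline."""
    if not seq or len(seq) % 5 != 0:
        return False
    return all(
        seq[i:i + 3] == "VPG" and seq[i + 3] in GUESTS and seq[i + 4] == "G"
        for i in range(0, len(seq), 5)
    )

# Three pentamers with guest residues V, A and G.
elp = build_elp("VAG")
```

A multimer of about 5-100 residues, as mentioned above, would correspond to between 1 and 20 pentamers under this scheme.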

[0106]As will be understood by those skilled in the art, the propensity of a polypeptide chain to maintain a substantially random coil and flexible conformation is encoded in its amino acid composition rather than in its amino acid sequence (Uversky V N, Gillespie J R, Fink A L (2000) Why are "natively unfolded" proteins unstructured under physiologic conditions? Proteins. 41(3):415-27). This means that polypeptides sharing similar amino acid compositions will be similarly unfolded. The function of EBDs to increase protein solubility is based at least in part on their random coil and flexible conformation. Therefore, in one preferred embodiment of the invention, an EBD sequence of the invention comprises a scrambled variant sequence of a mammalian NF-H protein. In another embodiment of the invention, an EBD sequence of the invention comprises a scrambled variant sequence of a mammalian NF-M protein. In yet another embodiment of the invention, an EBD sequence of the invention comprises a scrambled variant sequence of a Nup2p protein. In another embodiment of the invention, an EBD sequence of the invention comprises a scrambled variant sequence of a mammalian elastin protein. In yet another embodiment of the invention, an EBD sequence of the invention comprises a scrambled variant sequence of a filamentous phage fd adsorption protein pIII.
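
A scrambled variant that preserves the amino acid composition of its parent can be sketched as a random permutation of the parent residues. The function names scramble and same_composition are hypothetical, and a uniform shuffle is only one possible way of producing such a variant.

```python
import random
from collections import Counter

def scramble(seq, seed=None):
    """Return a random permutation of the residues in seq."""
    rng = random.Random(seed)  # seeded generator so a variant can be reproduced
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)

def same_composition(a, b):
    """True if both sequences contain exactly the same residue counts."""
    return Counter(a) == Counter(b)

# Illustrative parent only: five tandem copies of SPEAEK (SEQ ID NO:23).
parent = "SPEAEK" * 5
variant = scramble(parent, seed=0)
```

Because the variant is a permutation, its length and residue counts match those of the parent by construction, and same_composition(parent, variant) returns True.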

[0107]In another embodiment of the invention, an EBD sequence of the invention comprises a scrambled variant sequence corresponding to any combination of fragments derived from sequence of a mammalian NF-H protein. In yet another embodiment of the invention, an EBD sequence of the invention comprises a scrambled variant sequence corresponding to any combination of fragments derived from sequence of a mammalian NF-M protein. In another embodiment of the invention, an EBD sequence of the invention comprises a scrambled variant sequence corresponding to any combination of fragments derived from sequence of a Nup2p protein. In another embodiment of the invention, an EBD sequence of the invention comprises a scrambled variant sequence corresponding to any combination of fragments derived from sequence of an elastin protein. In yet another embodiment of the invention, an EBD sequence of the invention comprises a scrambled variant sequence corresponding to any combination of fragments derived from sequence of a filamentous phage fd adsorption protein pIII.

[0108]In another embodiment of the invention, an EBD sequence of the invention comprises a scrambled variant sequence corresponding to multiple repeats of any combination of fragments derived from sequence of a mammalian NF-H protein. In yet another embodiment of the invention, an EBD sequence of the invention comprises a scrambled variant sequence corresponding to multiple repeats of any combination of fragments derived from sequence of a mammalian NF-M protein. In one more embodiment of the invention, an EBD sequence of the invention comprises a scrambled variant sequence corresponding to multiple repeats of any combination of fragments derived from sequence of a Nup2p protein. In another embodiment of the invention, an EBD sequence of the invention comprises a scrambled variant sequence corresponding to multiple repeats of any combination of fragments derived from sequence of an elastin protein. In yet another embodiment of the invention, an EBD sequence of the invention comprises a scrambled variant sequence corresponding to multiple repeats of any combination of fragments derived from sequence of a filamentous phage fd adsorption protein pIII.

[0109]In another embodiment of the invention, an EBD sequence of the invention comprises a scrambled variant sequence corresponding to any pairwise or multiple combinations of fragments derived from sequence of a mammalian NF-H protein, a mammalian NF-M protein, a Nup2p protein, an elastin protein and a filamentous phage fd adsorption protein pIII.

[0110]In yet another embodiment of the invention, an EBD sequence of the invention comprises a scrambled variant sequence corresponding to multiple repeats of any pairwise or multiple combinations of fragments derived from sequence of a mammalian NF-H protein, a mammalian NF-M protein, a Nup2p protein, an elastin protein and a filamentous phage fd adsorption protein pIII.
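
By way of illustration only, the combination, repetition and scrambling of fragments described above can be sketched as follows. The sketch assumes that "scrambled" means the residue order is permuted while the amino acid composition is preserved; the fragment strings and the function name `scrambled_variant` are hypothetical placeholders and are not sequences disclosed herein.

```python
import random

def scrambled_variant(fragments, repeats=1, seed=0):
    """Join fragments, repeat the result, then permute residue order.

    Assumption: scrambling preserves composition and changes only order.
    A fixed seed makes the permutation reproducible.
    """
    combined = "".join(fragments) * repeats
    residues = list(combined)
    random.Random(seed).shuffle(residues)
    return "".join(residues)

# Hypothetical placeholder fragments (not disclosed EBD sequences).
frag_a = "KSPEKAKSPVKEEA"
frag_b = "GSGSEPATSG"

variant = scrambled_variant([frag_a, frag_b], repeats=2, seed=42)
print(variant)
```

The output has the same length and composition as the repeated combination of the input fragments; only the order of residues differs.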

[0111]In another embodiment, the fusion polypeptides of the invention further comprise independent cleavable linkers, which allow an EBD sequence, for example at either the N or C terminus, to be easily cleaved from a heterologous polypeptide sequence of interest. Such cleavable linkers are known and available in the art. This embodiment thus provides improved isolation and purification of a heterologous polypeptide sequence and facilitates downstream high-throughput processes.

[0112]The present invention also provides polypeptide fragments of an EBD polypeptide sequence described herein, wherein the fragment comprises at least about 5, 10, 15, 20, 25, 50, or 100 contiguous amino acids, or more, including all intermediate lengths, of an EBD polypeptide sequence set forth herein, or those encoded by a polynucleotide sequence set forth herein. In a preferred embodiment, an EBD fragment provides similar or improved activity relative to the activity of the EBD sequence from which it is derived (wherein the activity includes, for example, one or more of improved solubility, improved folding, reduced aggregation and/or improved yield, when in fusion with a heterologous polypeptide sequence of interest).

[0113]In another aspect, the present invention provides variants of an EBD polypeptide sequence described herein. EBD polypeptide variants will typically exhibit at least about 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% or more identity (e.g., determined as described below), along their length, to an EBD polypeptide sequence set forth herein. Preferably the EBD variant provides similar or improved activity relative to the activity of the EBD sequence from which the variant was derived (wherein the activity includes one or more of improved solubility, improved folding, reduced aggregation and/or improved yield, when in fusion with a heterologous polypeptide sequence of interest).

[0114]An EBD polypeptide variant thus refers to a polypeptide that differs from an EBD polypeptide sequence disclosed herein in one or more substitutions, deletions, additions and/or insertions. Such variants may be naturally occurring or may be synthetically generated, for example, by modifying one or more of the EBD polypeptide sequences of the invention and evaluating their activity as described herein and/or using any of a number of techniques well known in the art.

[0115]In many instances, a variant will contain conservative substitutions. A "conservative substitution" is one in which an amino acid is substituted for another amino acid that has similar properties, such that one skilled in the art of peptide chemistry would expect the secondary structure and hydropathic nature of the polypeptide to be substantially unchanged. As described above, modifications may be made in the structure of the EBD polynucleotides and polypeptides of the present invention and still obtain a functional molecule that encodes a variant or derivative polypeptide with desirable activity. When it is desired to alter the amino acid sequence of an EBD polypeptide to create an equivalent or an improved EBD variant or EBD fragment, one skilled in the art can readily change one or more of the codons of the encoding DNA sequence, for example according to Table 1.

[0116]For example, certain amino acids may be substituted for other amino acids in a protein structure without appreciable loss of desired activity. It is thus contemplated that various changes may be made in the EBD polypeptide sequences of the invention, or corresponding DNA sequences which encode said EBD polypeptide sequences, without appreciable loss of their desired activity.

TABLE 1

Amino Acid       3-Letter  1-Letter  Codons
Alanine          Ala       A         GCA GCC GCG GCU
Cysteine         Cys       C         UGC UGU
Aspartic acid    Asp       D         GAC GAU
Glutamic acid    Glu       E         GAA GAG
Phenylalanine    Phe       F         UUC UUU
Glycine          Gly       G         GGA GGC GGG GGU
Histidine        His       H         CAC CAU
Isoleucine       Ile       I         AUA AUC AUU
Lysine           Lys       K         AAA AAG
Leucine          Leu       L         UUA UUG CUA CUC CUG CUU
Methionine       Met       M         AUG
Asparagine       Asn       N         AAC AAU
Proline          Pro       P         CCA CCC CCG CCU
Glutamine        Gln       Q         CAA CAG
Arginine         Arg       R         AGA AGG CGA CGC CGG CGU
Serine           Ser       S         AGC AGU UCA UCC UCG UCU
Threonine        Thr       T         ACA ACC ACG ACU
Valine           Val       V         GUA GUC GUG GUU
Tryptophan       Trp       W         UGG
Tyrosine         Tyr       Y         UAC UAU
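
By way of illustration only, the codon assignments of Table 1 can be used to change a single residue of an encoded polypeptide by replacing the corresponding codon. The sketch below is one minimal way to do so; the function names and the example RNA sequence are hypothetical, and the choice of the first listed codon for the new residue is an arbitrary assumption.

```python
# Codons from Table 1 (RNA alphabet), keyed by one-letter amino acid code.
CODONS = {
    "A": ["GCA", "GCC", "GCG", "GCU"], "C": ["UGC", "UGU"],
    "D": ["GAC", "GAU"], "E": ["GAA", "GAG"], "F": ["UUC", "UUU"],
    "G": ["GGA", "GGC", "GGG", "GGU"], "H": ["CAC", "CAU"],
    "I": ["AUA", "AUC", "AUU"], "K": ["AAA", "AAG"],
    "L": ["UUA", "UUG", "CUA", "CUC", "CUG", "CUU"], "M": ["AUG"],
    "N": ["AAC", "AAU"], "P": ["CCA", "CCC", "CCG", "CCU"],
    "Q": ["CAA", "CAG"], "R": ["AGA", "AGG", "CGA", "CGC", "CGG", "CGU"],
    "S": ["AGC", "AGU", "UCA", "UCC", "UCG", "UCU"],
    "T": ["ACA", "ACC", "ACG", "ACU"], "V": ["GUA", "GUC", "GUG", "GUU"],
    "W": ["UGG"], "Y": ["UAC", "UAU"],
}
CODON_TO_AA = {codon: aa for aa, codons in CODONS.items() for codon in codons}

def translate(rna):
    """Translate an in-frame RNA coding sequence containing no stop codon."""
    return "".join(CODON_TO_AA[rna[i:i + 3]] for i in range(0, len(rna), 3))

def substitute_residue(rna, position, new_aa):
    """Replace the codon at a 0-based residue position.

    Assumption: the first codon listed in Table 1 for new_aa is used.
    """
    if len(rna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    start = 3 * position
    if start + 3 > len(rna):
        raise IndexError("residue position is outside the coding sequence")
    return rna[:start] + CODONS[new_aa][0] + rna[start + 3:]

original = "AUGAAAGAC"                           # hypothetical example: Met-Lys-Asp
variant = substitute_residue(original, 1, "R")   # Lys -> Arg
print(translate(original), translate(variant))
```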

[0117]In making such changes, the hydropathic index of amino acids may also be considered. The importance of the hydropathic amino acid index in conferring interactive biologic function on a protein is generally understood in the art (Kyte and Doolittle, 1982, incorporated herein by reference). It is accepted that the relative hydropathic character of the amino acid contributes to the secondary structure of the resultant protein, which in turn has potential bearing on the interaction of the protein with other molecules, for example, enzymes, substrates, receptors, DNA, antibodies, antigens, and the like. Each amino acid has been assigned a hydropathic index on the basis of its hydrophobicity and charge characteristics (Kyte and Doolittle, 1982). These values are: isoleucine (+4.5); valine (+4.2); leucine (+3.8); phenylalanine (+2.8); cysteine/cystine (+2.5); methionine (+1.9); alanine (+1.8); glycine (-0.4); threonine (-0.7); serine (-0.8); tryptophan (-0.9); tyrosine (-1.3); proline (-1.6); histidine (-3.2); glutamate (-3.5); glutamine (-3.5); aspartate (-3.5); asparagine (-3.5); lysine (-3.9); and arginine (-4.5).

[0118]Therefore, according to certain embodiments, amino acids within an EBD sequence of the invention may be substituted by other amino acids having a similar hydropathic index or score. Preferably, any such changes result in an EBD sequence with a similar level of activity as the unmodified EBD sequence. In making such changes, the substitution of amino acids whose hydropathic indices are within ±2 is preferred, those within ±1 are particularly preferred, and those within ±0.5 are even more particularly preferred. It is also understood in the art that the substitution of like amino acids can be made effectively on the basis of hydrophilicity. As detailed in U.S. Pat. No. 4,554,101, the following hydrophilicity values have been assigned to amino acid residues: arginine (+3.0); lysine (+3.0); aspartate (+3.0±1); glutamate (+3.0±1); serine (+0.3); asparagine (+0.2); glutamine (+0.2); glycine (0); threonine (-0.4); proline (-0.5±1); alanine (-0.5); histidine (-0.5); cysteine (-1.0); methionine (-1.3); valine (-1.5); leucine (-1.8); isoleucine (-1.8); tyrosine (-2.3); phenylalanine (-2.5); tryptophan (-3.4). Thus, an amino acid can be substituted for another having a similar hydrophilicity value and in many cases still retain a desired level of activity. In such changes, the substitution of amino acids whose hydrophilicity values are within ±2 is preferred, those within ±1 are particularly preferred, and those within ±0.5 are even more particularly preferred.
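
By way of illustration only, the preference tiers described above (within ±2, ±1 and ±0.5) can be evaluated for a proposed substitution using the hydropathic index and hydrophilicity values recited above. The function name `substitution_tier` is hypothetical, and the sketch ignores the ±1 uncertainty attached to the hydrophilicity values of aspartate, glutamate and proline.

```python
# Hydropathic indices as recited above (Kyte and Doolittle, 1982).
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
    "R": -4.5,
}

# Hydrophilicity values as recited above (U.S. Pat. No. 4,554,101).
# Assumption: the +/-1 ranges on D, E and P are ignored (nominal values used).
HYDROPHILICITY = {
    "R": 3.0, "K": 3.0, "D": 3.0, "E": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "T": -0.4, "P": -0.5, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "L": -1.8, "I": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}

def substitution_tier(a, b, scale=HYDROPATHY):
    """Return the tightest limit (0.5, 1.0 or 2.0) that the difference in
    scale values satisfies, or None if the difference exceeds 2."""
    diff = round(abs(scale[a] - scale[b]), 6)  # rounding avoids float noise
    for limit in (0.5, 1.0, 2.0):
        if diff <= limit:
            return limit
    return None

print(substitution_tier("I", "V"))                  # difference of 0.3
print(substitution_tier("K", "R"))                  # difference of 0.6
print(substitution_tier("S", "T", HYDROPHILICITY))  # difference of 0.7
```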

[0119]As outlined above, amino acid substitutions are generally therefore based on the relative similarity of the amino acid side-chain substituents, for example, their hydrophobicity, hydrophilicity, charge, size, and the like.

[0120]In addition, any polynucleotide of the invention, such as a polynucleotide encoding an EBD polypeptide sequence, or a vector comprising a polynucleotide encoding an EBD polypeptide sequence, may be further modified to increase stability in vivo. Possible modifications include, but are not limited to, the addition of flanking sequences at the 5' and/or 3' ends; the use of phosphorothioate or 2' O-methyl rather than phosphodiester linkages in the backbone; and/or the inclusion of nontraditional bases such as inosine, queuosine and wybutosine, as well as acetyl-, methyl-, thio- and other modified forms of adenine, cytidine, guanine, thymine and uridine.

[0121]Amino acid substitutions within an EBD sequence of the invention may further be made on the basis of similarity in polarity, charge, solubility, hydrophobicity, hydrophilicity and/or the amphipathic nature of the residues. For example, negatively charged amino acids include aspartic acid and glutamic acid; positively charged amino acids include lysine and arginine; and amino acids with uncharged polar head groups having similar hydrophilicity values include leucine, isoleucine and valine; glycine and alanine; asparagine and glutamine; and serine, threonine, phenylalanine and tyrosine. Other groups of amino acids that may represent conservative changes include: (1) ala, pro, gly, glu, asp, gln, asn, ser, thr; (2) cys, ser, tyr, thr; (3) val, ile, leu, met, ala, phe; (4) lys, arg, his; and (5) phe, tyr, trp, his. A variant may also, or alternatively, contain nonconservative changes.
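
By way of illustration only, the five groups enumerated above can be used to label each substitution in a variant as conservative or nonconservative. The sketch assumes that a substitution is conservative when both residues appear together in at least one of groups (1) to (5); the function names and the example sequences are hypothetical.

```python
# Groups (1)-(5) as enumerated above, in one-letter code.
CONSERVATIVE_GROUPS = [
    set("APGEDQNST"),  # (1) ala, pro, gly, glu, asp, gln, asn, ser, thr
    set("CSYT"),       # (2) cys, ser, tyr, thr
    set("VILMAF"),     # (3) val, ile, leu, met, ala, phe
    set("KRH"),        # (4) lys, arg, his
    set("FYWH"),       # (5) phe, tyr, trp, his
]

def is_conservative(a, b):
    """True if two different residues share at least one group."""
    return a != b and any(a in g and b in g for g in CONSERVATIVE_GROUPS)

def classify_substitutions(original, variant):
    """Count (conservative, nonconservative) substitutions between two
    equal-length sequences; insertions and deletions are not handled."""
    if len(original) != len(variant):
        raise ValueError("sequences must have equal length")
    conservative = nonconservative = 0
    for a, b in zip(original, variant):
        if a == b:
            continue
        if is_conservative(a, b):
            conservative += 1
        else:
            nonconservative += 1
    return conservative, nonconservative

# Hypothetical example: K->R and E->D are conservative; K->W is not.
print(classify_substitutions("KSPEK", "RSPDW"))
```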

[0122]In an illustrative embodiment, a variant EBD polypeptide differs from the corresponding unmodified EBD sequence by substitution, deletion or addition of five percent of the original amino acids or fewer. Variants may also (or alternatively) be modified by, for example, the deletion or addition of amino acids that have minimal influence on the desired activity.

[0123]A polypeptide of the invention may further comprise a signal (or leader) sequence at the N-terminal end of the polypeptide, which co-translationally or post-translationally directs transfer of the protein. The polypeptide may also be conjugated to a linker or other sequence for ease of synthesis, purification or identification of the polypeptide (e.g., poly-His), or to enhance binding of the polypeptide to a solid support.

[0124]As noted above, the present invention provides EBD polypeptide variant sequences which share some degree of sequence identity with an EBD polypeptide specifically described herein, such as those having at least 40%, 50%, 60%, 70%, 80%, 90% or 95% identity with an EBD polypeptide sequence described herein. When comparing polypeptide sequences to evaluate their extent of shared sequence identity, two sequences are said to be "identical" if the sequence of amino acids in the two sequences is the same when aligned for maximum correspondence, as described below. Comparisons between two sequences are typically performed by comparing the sequences over a comparison window to identify and compare local regions of sequence similarity.

[0125]A "comparison window" as used herein, refers to a segment of at least about 20 contiguous positions, usually 30 to about 75, or 40 to about 50, in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned.

[0126]Optimal alignment of sequences for comparison may be conducted using the Megalign program in the Lasergene suite of bioinformatics software (DNASTAR, Inc., Madison, Wis.), using default parameters. This program embodies several alignment schemes described in the following references: Dayhoff, M. O., (1978) A model of evolutionary change in proteins--Matrices for detecting distant relationships. In Dayhoff, M. O. (ed.) Atlas of Protein Sequence and Structure, National Biomedical Research Foundation, Washington D.C. Vol. 5, Suppl. 3, pp. 345-358; Hein J. (1990) Unified Approach to Alignment and Phylogenies, pp. 626-645 Methods in Enzymology vol. 183, Academic Press, Inc., San Diego, Calif.; Higgins, D. G. and Sharp, P. M., CABIOS 5:151-153 (1989); Myers, E. W. and Muller W., CABIOS 4:11-17 (1988); Robinson, E. D., Comb. Theor 11:105 (1971); Saitou, N. and Nei, M., Mol. Biol. Evol. 4:406-425 (1987); Sneath, P. H. A. and Sokal, R. R., Numerical Taxonomy--the Principles and Practice of Numerical Taxonomy, Freeman Press, San Francisco, Calif. (1973); Wilbur, W. J. and Lipman, D. J., Proc. Natl. Acad. Sci. USA 80:726-730 (1983).

[0127]Alternatively, optimal alignment of sequences for comparison may be conducted by the local identity algorithm of Smith and Waterman, Adv. Appl. Math. 2:482 (1981), by the identity alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity methods of Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85: 2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group (GCG), 575 Science Dr., Madison, Wis.), or by inspection.
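
By way of illustration only, the global alignment approach of Needleman and Wunsch can be sketched as a dynamic program that returns the optimal alignment score. The scoring values (match +1, mismatch -1, linear gap penalty -1) are assumptions chosen for simplicity and are not the parameters of any program named above; the function name is hypothetical.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    Assumption: a simple match/mismatch score and a linear gap penalty.
    Only the previous row of the score matrix is kept in memory.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + pair,   # align a[i-1] with b[j-1]
                            prev[j] + gap,        # gap in b
                            curr[j - 1] + gap))   # gap in a
        prev = curr
    return prev[-1]

print(needleman_wunsch_score("ACGT", "ACGT"))  # four matches
print(needleman_wunsch_score("ACGT", "AGT"))   # three matches, one gap
```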

[0128]Preferred examples of algorithms that are suitable for determining percent sequence identity and sequence similarity are the BLAST and BLAST 2.0 algorithms, which are described in Altschul et al., J. Mol. Biol. 215:403-410 (1990), and Altschul et al., Nucl. Acids Res. 25:3389-3402 (1997), respectively. BLAST and BLAST 2.0 can be used, for example with the parameters described herein, to determine percent sequence identity for the polynucleotides and polypeptides of the invention. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information. For amino acid sequences, a scoring matrix can be used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T and X determine the sensitivity and speed of the alignment.

[0129]In one preferred approach, the "percentage of sequence identity" is determined by comparing two optimally aligned sequences over a window of comparison of at least 20 positions, wherein the portion of the polypeptide sequence in the comparison window may comprise additions or deletions (i.e., gaps) of 20 percent or less, usually 5 to 15 percent, or 10 to 12 percent, as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the reference sequence (i.e., the window size) and multiplying the results by 100 to yield the percentage of sequence identity.
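
By way of illustration only, the three halting conditions for extension of a word hit can be sketched for an ungapped extension in one direction. This is a simplified sketch and not the BLAST implementation: the function names are hypothetical, and the scoring function (+2 for a match, -1 for a mismatch) is an assumption used in place of a scoring matrix.

```python
def simple_score(a, b):
    """Assumed stand-in for a scoring matrix: +2 match, -1 mismatch."""
    return 2 if a == b else -1

def extend_right(query, subject, q_start, s_start, seed_score, x_drop,
                 score=simple_score):
    """Extend a word hit to the right without gaps.

    Extension halts when the running score falls x_drop below its maximum,
    when it reaches zero or below, or when either sequence ends.
    Returns (best_score, number_of_positions_kept).
    """
    running = best = seed_score
    best_len = 0
    i = 0
    while q_start + i < len(query) and s_start + i < len(subject):
        running += score(query[q_start + i], subject[s_start + i])
        i += 1
        if running > best:
            best, best_len = running, i
        if best - running >= x_drop or running <= 0:
            break
    return best, best_len

# Hypothetical example: word hit "ACD" (seed score 6), extension from index 3.
print(extend_right("ACDEFGHIK", "ACDEFGWWW", 3, 3, seed_score=6, x_drop=2))
```

In this example three further matching positions are kept, and extension halts after the second mismatch, when the running score has fallen 2 below its maximum of 12.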
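
By way of illustration only, the calculation described above can be sketched for two sequences that have already been optimally aligned. The sketch assumes that gaps in the compared sequence are written as "-" and that the reference sequence contains no gaps; the function name and the example sequences are hypothetical.

```python
def percent_identity(reference, candidate, gap="-", min_window=20,
                     max_gap_fraction=0.20):
    """Percentage of sequence identity over an aligned comparison window.

    Matched positions are divided by the number of positions in the
    reference sequence (the window size) and multiplied by 100.
    """
    if len(reference) != len(candidate):
        raise ValueError("aligned sequences must have equal length")
    if len(reference) < min_window:
        raise ValueError("comparison window must span at least 20 positions")
    if gap in reference:
        raise ValueError("reference sequence must not contain gaps")
    if candidate.count(gap) / len(candidate) > max_gap_fraction:
        raise ValueError("gaps exceed 20 percent of the window")
    matched = sum(1 for r, c in zip(reference, candidate) if r == c)
    return 100.0 * matched / len(reference)

reference = "MKSPEKAKSPVKEEAKSPEK"   # hypothetical 20-residue window
candidate = "MKSPEKAKSPVREEAKS-EK"   # one substitution and one gap
print(percent_identity(reference, candidate))
```

Here 18 of the 20 reference positions are matched, giving 90 percent identity.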

[0130]In another aspect of the invention, there is provided an isolated polynucleotide sequence encoding a fusion polypeptide, the fusion polypeptide comprising at least one entropic bristle domain sequence and at least one heterologous polypeptide sequence of interest. In a related aspect, the invention provides expression vectors comprising a polynucleotide encoding an EBD fusion polypeptide of the invention. In another related aspect, an expression vector of the invention comprises a polynucleotide encoding one or more EBD sequences and further comprises a multiple cloning site for the insertion of a polynucleotide encoding a heterologous polypeptide sequence of interest.

[0131]Polynucleotide compositions of the present invention may be identified, prepared and/or manipulated using any of a variety of well established techniques (see generally, Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratories, Cold Spring Harbor, N.Y., 1989, and other like references).

[0132]The terms "DNA" and "polynucleotide" are used essentially interchangeably herein to refer to a DNA molecule that has been isolated free of total genomic DNA of a particular species. "Isolated", as used herein, means that a polynucleotide is substantially away from other coding sequences, and that the DNA molecule does not contain large portions of unrelated coding DNA, such as large chromosomal fragments or other functional genes or polypeptide coding regions. Of course, this refers to the DNA molecule as originally isolated, and does not exclude genes or coding regions later added to the segment by the hand of man.

[0133]As will be understood by those skilled in the art, the polynucleotide compositions of this invention can include genomic sequences, extra-genomic and plasmid-encoded sequences and smaller engineered gene segments that express, or may be adapted to express, proteins, polypeptides, peptides and the like. Such segments may be naturally isolated, or modified synthetically by the hand of man.

[0134]As will also be recognized, polynucleotides of the invention may be single-stranded (coding or antisense) or double-stranded, and may be DNA (genomic, cDNA or synthetic) or RNA molecules. RNA molecules may include HnRNA molecules, which contain introns and correspond to a DNA molecule in a one-to-one manner, and mRNA molecules, which do not contain introns. Additional coding or non-coding sequences may, but need not, be present within a polynucleotide of the present invention, and a polynucleotide may, but need not, be linked to other molecules and/or support materials.

[0135]In addition to the EBD polynucleotide sequences set forth herein, the present invention also provides EBD polynucleotide variants having substantial identity to an EBD polynucleotide sequence disclosed herein, for example those comprising at least 50% sequence identity, preferably at least 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% or higher, sequence identity compared to an EBD polynucleotide sequence of this invention using the methods described herein (e.g., BLAST analysis using standard parameters, as described herein). One skilled in this art will recognize that these values can be appropriately adjusted to determine corresponding identity of polypeptides encoded by two polynucleotide sequences by taking into account codon degeneracy, amino acid similarity, reading frame positioning and the like.

[0136]Typically, EBD polynucleotide variants will contain one or more substitutions, additions, deletions and/or insertions, preferably such that the activity (e.g., improved folding, reduced aggregation and/or improved yield, when in fusion with a heterologous sequence of interest) of the polypeptide encoded by the variant polynucleotide is not substantially diminished relative to the corresponding unmodified polynucleotide sequence.

[0137]In additional embodiments, the present invention provides polynucleotide fragments comprising or consisting of various lengths of contiguous stretches of sequence identical to or complementary to one or more of the EBD polynucleotide sequences disclosed herein. For example, polynucleotides are provided by this invention that comprise or consist of at least about 10, 15, 20, 30, 40, 50, 75, 100, 150, 200, 300, 400, 500 or 1000 or more contiguous nucleotides of one or more of the sequences disclosed herein as well as all intermediate lengths therebetween. It will be readily understood that "intermediate lengths", in this context, means any length between the quoted values, such as 16, 17, 18, 19, etc.; 21, 22, 23, etc.; 30, 31, 32, etc.; 50, 51, 52, 53, etc.; 100, 101, 102, 103, etc.; 150, 151, 152, 153, etc.; including all integers through 200-500; 500-1,000, and the like. A polynucleotide sequence as described here may be extended at one or both ends by additional nucleotides not found in the native sequence. This additional sequence may consist of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides at either end of the disclosed sequence or at both ends of the disclosed sequence. Preferably, an EBD polynucleotide fragment of the invention encodes a fusion polypeptide that retains one or more desired activities, e.g., improved folding, reduced aggregation and/or improved yield, when in fusion with a heterologous sequence of interest.

[0138]The EBD polynucleotides of the present invention, or fragments thereof, regardless of the length of the coding sequence itself, may be combined with other DNA sequences, such as promoters, polyadenylation signals, additional restriction enzyme sites, multiple cloning sites, other coding segments, and the like, such that their overall length may vary considerably. It is therefore contemplated that a nucleic acid fragment of almost any length may be employed, with the total length preferably being limited by the ease of preparation and use in the intended recombinant DNA protocol. For example, illustrative polynucleotide segments with total lengths of about 10,000, about 5,000, about 3,000, about 2,000, about 1,000, about 500, about 200, about 100, or about 50 base pairs in length, and the like (including all intermediate lengths), are contemplated to be useful in many implementations of this invention.

[0139]It will be appreciated by those of ordinary skill in the art that, as a result of the degeneracy of the genetic code, there are many nucleotide sequences that will encode a polypeptide as described herein. Some of these polynucleotides bear minimal homology to the native polynucleotide sequence. Nonetheless, polynucleotides that vary due to differences in codon usage are specifically contemplated by the present invention. Further, different alleles of an EBD polynucleotide sequence provided herein are within the scope of the present invention. Alleles are endogenous sequences that are altered as a result of one or more mutations, such as deletions, additions and/or substitutions of nucleotides. The resulting mRNA and protein may, but need not, have an altered structure or function. Alleles may be identified using standard techniques (such as hybridization, amplification and/or database sequence comparison).
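
By way of illustration only, the number of distinct nucleotide sequences that encode a given polypeptide as a result of the degeneracy of the genetic code can be computed as the product of the number of codons available for each residue. The sketch below uses the standard genetic code; the function name and the example peptides are hypothetical.

```python
from collections import Counter
from math import prod

# Standard genetic code laid out in TCAG order (64 codons, "*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Number of codons encoding each amino acid (for example, L -> 6, M -> 1).
DEGENERACY = Counter(AMINO)

def count_encodings(peptide):
    """Number of distinct coding sequences for a peptide (stop codon excluded)."""
    return prod(DEGENERACY[aa] for aa in peptide)

print(count_encodings("MW"))    # Met and Trp each have a single codon
print(count_encodings("MKL"))   # 1 * 2 * 6
```

Even a short peptide rich in leucine, serine or arginine can therefore be encoded by a large number of sequences that differ only in codon usage.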

[0140]In another embodiment of the invention, a mutagenesis approach, such as site-specific mutagenesis, may be employed for the preparation of variants and/or derivatives of the EBD polynucleotides and polypeptides described herein. By this approach, for example, specific modifications in a polypeptide sequence can be made through mutagenesis of the underlying polynucleotides that encode them. These techniques provide a straightforward approach to prepare and test sequence variants, for example, incorporating one or more of the foregoing considerations, by introducing one or more nucleotide sequence changes into the polynucleotide.

[0141]Site-specific mutagenesis allows the production of mutants through the use of specific oligonucleotide sequences which encode the DNA sequence of the desired mutation, as well as a sufficient number of adjacent nucleotides, to provide a primer sequence of sufficient size and sequence complexity to form a stable duplex on both sides of the deletion junction being traversed. Mutations may be employed in a selected polynucleotide sequence to improve, alter, decrease, modify, or otherwise change the properties of the polynucleotide itself, and/or alter the properties, activity, composition, stability, or primary sequence of the encoded polypeptide.

[0142]In certain embodiments, the present invention contemplates the mutagenesis of the disclosed polynucleotide sequences to alter one or more activities/properties of the encoded polypeptide. The techniques of site-specific mutagenesis are well-known in the art, and are widely used to create variants of both polypeptides and polynucleotides. For example, site-specific mutagenesis is often used to alter a specific portion of a DNA molecule. In such embodiments, a primer comprising typically about 14 to about 25 nucleotides or so in length may be employed, with about 5 to about 10 residues on both sides of the junction of the sequence being altered.
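
By way of illustration only, a mutagenic primer of the kind described above can be assembled from the altered sequence and a flank of unaltered template on each side of the junction. The function name and the template are hypothetical, and the sketch does not assess melting temperature or duplex stability.

```python
def mutagenic_primer(template, start, end, replacement, flank=10):
    """Build a primer carrying `replacement` in place of template[start:end],
    flanked on both sides by `flank` unaltered template nucleotides.

    Assumption: flank is restricted to the 5 to 10 residue range given above.
    """
    if not 5 <= flank <= 10:
        raise ValueError("flank should be about 5 to about 10 nucleotides")
    if start < flank or end + flank > len(template):
        raise ValueError("not enough template on one side of the junction")
    return template[start - flank:start] + replacement + template[end:end + flank]

# Hypothetical template: a lysine codon (AAA) changed to an arginine codon (AGA).
upstream, codon, downstream = "ATGGCTAGCA", "AAA", "GGAGAAGAAC"
template = upstream + codon + downstream + "TT"

primer = mutagenic_primer(template, 10, 13, "AGA")
print(primer, len(primer))
```

With a three-nucleotide change and ten flanking residues on each side, the primer is 23 nucleotides long, within the range of about 14 to about 25 nucleotides noted above.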

[0143]As will be appreciated by those of skill in the art, site-specific mutagenesis techniques have often employed a phage vector that exists in both a single stranded and double stranded form. Typical vectors useful in site-directed mutagenesis include vectors such as the M13 phage. These phage are readily commercially available and their use is generally well-known to those skilled in the art. Double-stranded plasmids are also routinely employed in site-directed mutagenesis, which eliminates the step of transferring the gene of interest from a plasmid to a phage.

[0144]In general, site-directed mutagenesis in accordance herewith is performed by first obtaining a single-stranded vector or melting apart the two strands of a double-stranded vector that includes within its sequence a DNA sequence that encodes the desired peptide. An oligonucleotide primer bearing the desired mutated sequence is prepared, generally synthetically. This primer is then annealed with the single-stranded vector, and subjected to DNA polymerizing enzymes such as E. coli polymerase I Klenow fragment, in order to complete the synthesis of the mutation-bearing strand. Thus, a heteroduplex is formed wherein one strand encodes the original non-mutated sequence and the second strand bears the desired mutation. This heteroduplex vector is then used to transform appropriate cells, such as E. coli cells, and clones are selected which include recombinant vectors bearing the mutated sequence arrangement.

[0145]The preparation of sequence variants of the selected peptide-encoding DNA segments using site-directed mutagenesis provides a means of producing potentially useful species and is not meant to be limiting as there are other ways in which sequence variants of peptides and the DNA sequences encoding them may be obtained. For example, recombinant vectors encoding the desired peptide sequence may be treated with mutagenic agents, such as hydroxylamine, to obtain sequence variants. Specific details regarding these methods and protocols are found in the teachings of Maloy et al., 1994; Segal, 1976; Prokop and Bajpai, 1991; Kuby, 1994; and Maniatis et al., 1982, each incorporated herein by reference, for that purpose.

[0146]As used herein, the term "oligonucleotide directed mutagenesis procedure" refers to template-dependent processes and vector-mediated propagation which result in an increase in the concentration of a specific nucleic acid molecule relative to its initial concentration, or in an increase in the concentration of a detectable signal, such as amplification. As used herein, the term "oligonucleotide directed mutagenesis procedure" is intended to refer to a process that involves the template-dependent extension of a primer molecule. The term template dependent process refers to nucleic acid synthesis of an RNA or a DNA molecule wherein the sequence of the newly synthesized strand of nucleic acid is dictated by the well-known rules of complementary base pairing (see, for example, Watson, 1987). Typically, vector mediated methodologies involve the introduction of the nucleic acid fragment into a DNA or RNA vector, the clonal amplification of the vector, and the recovery of the amplified nucleic acid fragment. Examples of such methodologies are provided by U.S. Pat. No. 4,237,224, specifically incorporated herein by reference in its entirety.

[0147]In another approach for the production of polypeptide variants of the present invention, recursive sequence recombination, as described in U.S. Pat. No. 5,837,458, may be employed. In this approach, iterative cycles of recombination and screening or selection are performed to "evolve" individual polynucleotide variants of the invention wherein one or more desired activities is improved or modified.

[0148]In other embodiments of the present invention, the polynucleotide sequences provided herein can be advantageously used as probes or primers for nucleic acid hybridization. As such, it is contemplated that nucleic acid segments that comprise or consist of a sequence region of at least about a 15 nucleotide long contiguous sequence that has the same sequence as, or is complementary to, a 15 nucleotide long contiguous sequence disclosed herein may be used. Longer contiguous identical or complementary sequences, e.g., those of about 20, 30, 40, 50, 100, 200, 500, 1000 (including all intermediate lengths) and even up to full length sequences will also be of use in certain embodiments.

[0149]Many template dependent processes are available to amplify target sequences of interest present in a sample. One of the best known amplification methods is the polymerase chain reaction (PCR®) which is described in detail in U.S. Pat. Nos. 4,683,195, 4,683,202 and 4,800,159, each of which is incorporated herein by reference in its entirety. Briefly, in PCR®, two primer sequences are prepared which are complementary to regions on opposite complementary strands of the target sequence. An excess of deoxynucleoside triphosphates is added to a reaction mixture along with a DNA polymerase (e.g., Taq polymerase). If the target sequence is present in a sample, the primers will bind to the target and the polymerase will cause the primers to be extended along the target sequence by adding on nucleotides. By raising and lowering the temperature of the reaction mixture, the extended primers will dissociate from the target to form reaction products, excess primers will bind to the target and to the reaction product, and the process is repeated. A reverse transcription and PCR® amplification procedure may preferably be performed in order to quantify the amount of mRNA amplified. Polymerase chain reaction methodologies are well known in the art.
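
By way of illustration only, the relationship between two primers that are complementary to opposite strands of a target and the resulting amplification product can be sketched in silico. The function names, primers and template are hypothetical; the sketch requires exact primer matches and considers only the first binding site of each primer.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def amplicon(template, forward, reverse):
    """Return the region of the top strand bounded by the two primers.

    The forward primer matches the top strand; the reverse primer matches
    the bottom strand, so its reverse complement is located on the top
    strand. Returns None if either site is absent.
    """
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]

# Hypothetical template and primers.
template = "TTTTATGGCCAAGCCCCCCGGATCCTAAAAAA"
forward = "ATGGCCAAG"
reverse = reverse_complement("GGATCCTAA")

print(reverse)
print(amplicon(template, forward, reverse))
```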

[0150]Any of a number of other template dependent processes, many of which are variations of the PCR® amplification technique, are readily known and available in the art. Illustratively, some such methods include the ligase chain reaction (referred to as LCR), described, for example, in Eur. Pat. Appl. Publ. No. 320,308 and U.S. Pat. No. 4,883,750; Qbeta Replicase, described in PCT Intl. Pat. Appl. Publ. No. PCT/US87/00880; Strand Displacement Amplification (SDA) and Repair Chain Reaction (RCR). Still other amplification methods are described in Great Britain Pat. Appl. No. 2 202 328, and in PCT Intl. Pat. Appl. Publ. No. PCT/US89/01025. Other nucleic acid amplification procedures include transcription-based amplification systems (TAS) (PCT Intl. Pat. Appl. Publ. No. WO 88/10315), including nucleic acid sequence based amplification (NASBA) and 3SR. Eur. Pat. Appl. Publ. No. 329,822 describes a nucleic acid amplification process involving cyclically synthesizing single-stranded RNA ("ssRNA"), ssDNA, and double-stranded DNA (dsDNA). PCT Intl. Pat. Appl. Publ. No. WO 89/06700 describes a nucleic acid sequence amplification scheme based on the hybridization of a promoter/primer sequence to a target single-stranded DNA ("ssDNA") followed by transcription of many RNA copies of the sequence. Other amplification methods such as "RACE" (Frohman, 1990), and "one-sided PCR" (Ohara, 1989) are also well-known to those of skill in the art.

[0151]As noted, the EBD fusion polynucleotides, polypeptides and vectors of the present invention are advantageous in the context of recombinant polypeptide production, particularly where it is desired to achieve, for example, improved solubility, improved yield, improved folding and/or reduced aggregation of a heterologous polypeptide to which an EBD polypeptide sequence has been operably fused. Therefore, another aspect of the invention provides methods for producing a recombinant protein, for example by introducing into a host cell an expression vector comprising a polynucleotide sequence encoding a fusion polypeptide as described herein, e.g., a fusion polypeptide comprising at least one EBD sequence and at least one heterologous polypeptide sequence of interest; and expressing the fusion polypeptide in the host cell. In a related embodiment, the method further comprises the step of isolating the fusion polypeptide from the host cell. In another embodiment, the method further comprises the step of removing an entropic bristle domain sequence from the fusion polypeptide before or after isolating the fusion polypeptide from the host cell.

[0152]For recombinant production of a fusion polypeptide of the invention, DNA sequences encoding the polypeptide components of a fusion polypeptide (e.g., one or more EBD sequences and a heterologous polypeptide sequence of interest) may be assembled using conventional methodologies. In one example, the components may be assembled separately and ligated into an appropriate expression vector. For example, the 3' end of the DNA sequence encoding one polypeptide component is ligated, with or without a peptide linker, to the 5' end of a DNA sequence encoding the second polypeptide component so that the reading frames of the sequences are in phase. This permits translation into a single fusion polypeptide that retains the activities of both component polypeptides.
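By way of illustration only, the in-phase joining of two coding sequences described above may be checked computationally. The following is a minimal sketch, not part of the disclosed methods: the function names `translate` and `fuse_in_frame` and the short example sequences are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, built in TCAG order (assumption: nuclear/bacterial code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a coding sequence read from its first base; '*' marks a stop."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def fuse_in_frame(first_cds, second_cds, linker_cds=""):
    """Join first component (no stop codon), optional linker, and second component.

    Raises ValueError if the joined sequence is not a single open reading frame.
    """
    fusion = (first_cds + linker_cds + second_cds).upper()
    protein = translate(fusion)
    if "*" in protein[:-1]:
        raise ValueError("internal stop codon: components are not in phase")
    return fusion, protein


# Hypothetical toy inputs: a 4-codon first component, a Gly-Ser linker,
# and a second component that carries the only stop codon.
fusion_dna, fusion_protein = fuse_in_frame("ATGGAAGAAAAA", "ATGAAATAA", "GGCAGC")
print(fusion_protein)
```

A check of this kind only confirms that the reading frames are in phase; it says nothing about whether the resulting fusion polypeptide folds or retains activity.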

[0153]A peptide linker sequence may be employed to separate an EBD polypeptide sequence from a heterologous polypeptide sequence by some defined distance, for example a distance sufficient to ensure that the advantages of the invention are achieved, e.g., advantages such as improved folding, reduced aggregation and/or improved yield. Such a peptide linker sequence may be incorporated into the fusion polypeptide using standard techniques well known in the art. Suitable peptide linker sequences may be chosen based, for example, on factors such as: (1) their ability to adopt a flexible extended conformation; and (2) their inability to adopt a secondary structure that could interfere with the activity of the EBD sequence. Illustrative peptide linker sequences, for example, may contain Gly, Asn and Ser residues. Other near neutral amino acids, such as Thr and Ala, may also be used in the linker sequence. Amino acid sequences which may be usefully employed as linkers include those disclosed in Maratea et al., Gene 40:39-46, 1985; Murphy et al., Proc. Natl. Acad. Sci. USA 83:8258-8262, 1986; U.S. Pat. No. 4,935,233 and U.S. Pat. No. 4,751,180. The linker sequence may generally be from 1 to about 50 amino acids in length, for example.

[0154]The ligated DNA sequences of a fusion polynucleotide are operably linked to suitable transcriptional and/or translational regulatory elements. The regulatory elements responsible for expression of DNA are located only 5' to the DNA sequence encoding the first polypeptide. Similarly, stop codons required to end translation and transcription termination signals are present only 3' to the DNA sequence encoding the second polypeptide.

[0155]The EBD and heterologous polynucleotide sequences may comprise a sequence as described herein, or may comprise a sequence that has been modified to facilitate recombinant polypeptide production. As will be understood by those of skill in the art, it may be advantageous in some instances to produce polypeptide-encoding polynucleotide sequences possessing non-naturally occurring codons. For example, codons preferred by a particular prokaryotic or eukaryotic host can be selected to increase the rate of protein expression or to produce a recombinant RNA transcript having desirable properties, such as a half-life which is longer than that of a transcript generated from the naturally occurring sequence.

[0156]Moreover, the polynucleotide sequences of the present invention can be engineered using methods generally known in the art in order to alter polypeptide encoding sequences for a variety of reasons, including but not limited to, alterations which modify the cloning, processing, and/or expression of the gene product. For example, DNA shuffling by random fragmentation and PCR reassembly of gene fragments and synthetic oligonucleotides may be used to engineer the nucleotide sequences. In addition, site-directed mutagenesis may be used to insert new restriction sites, alter glycosylation patterns, change codon preference, produce splice variants, or introduce mutations, and so forth.

[0157]In a particular embodiment, a fusion polynucleotide is engineered to further comprise a cleavage site located between the EBD polypeptide-encoding sequence and the heterologous polypeptide sequence, so that the heterologous polypeptide may be cleaved and purified away from an EBD polypeptide sequence at any desired stage following expression of the fusion polypeptide. Illustratively, a fusion polynucleotide of the invention may be designed to include heparin, thrombin, or factor Xa protease cleavage sites.
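For illustration, the effect of placing a protease cleavage site between an EBD sequence and a heterologous sequence may be modeled at the amino acid level. The sketch below is an assumption-laden example rather than a disclosed procedure: the recognition motifs shown (IEGR for factor Xa, DDDDK for enterokinase, LVPRGS for thrombin) are the commonly cited consensus sites and are not taken from this specification, the function name `cleave` is hypothetical, the EBD fragment is residues 481-496 of SEQ ID NO:1, and the target peptide is invented.

```python
# Recognition motif -> (motif, number of motif residues left on the N-terminal product).
# Assumption: commonly cited consensus sites; real proteases tolerate variants.
CLEAVAGE_SITES = {
    "factor_xa": ("IEGR", 4),      # cleaves after the Arg of Ile-Glu-Gly-Arg
    "enterokinase": ("DDDDK", 5),  # cleaves after the Lys of Asp-Asp-Asp-Asp-Lys
    "thrombin": ("LVPRGS", 4),     # cleaves between Arg and Gly
}


def cleave(fusion, protease):
    """Split a fusion polypeptide at the first occurrence of the protease motif."""
    motif, offset = CLEAVAGE_SITES[protease]
    idx = fusion.find(motif)
    if idx < 0:
        raise ValueError(f"no {protease} site found in fusion polypeptide")
    cut = idx + offset
    return fusion[:cut], fusion[cut:]


ebd_fragment = "EEGKEEEGGEEEEAEG"   # residues 481-496 of SEQ ID NO:1
target = "MKTAYIAK"                 # hypothetical heterologous peptide
n_term, c_term = cleave(ebd_fragment + "IEGR" + target, "factor_xa")
print(n_term, c_term)
```

With the site placed immediately upstream of the heterologous sequence, factor Xa or enterokinase cleavage leaves no extra residues on the released polypeptide in this model, whereas the thrombin motif leaves Gly-Ser at its amino terminus.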

[0158]In order to express a desired polypeptide, the nucleotide sequences encoding the polypeptide, or functional equivalents, may be inserted into an appropriate expression vector, i.e., a vector which contains the necessary elements for the transcription and translation of an inserted coding sequence. Methods which are well known to those skilled in the art may be used to construct expression vectors containing sequences encoding a polypeptide of interest and appropriate transcriptional and translational control elements. These methods include in vitro recombinant DNA techniques, synthetic techniques, and in vivo genetic recombination. Such techniques are described, for example, in Sambrook, J. et al. (1989) Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, Plainview, N.Y., and Ausubel, F. M. et al. (1989) Current Protocols in Molecular Biology, John Wiley & Sons, New York, N.Y.

[0159]A variety of expression vector/host systems may be utilized to contain and express polynucleotide sequences of the present invention. These include, but are not limited to, microorganisms such as bacteria transformed with recombinant bacteriophage, plasmid, or cosmid DNA expression vectors; yeast transformed with yeast expression vectors; insect cell systems infected with virus expression vectors (e.g., baculovirus); plant cell systems transformed with virus expression vectors (e.g., cauliflower mosaic virus, CaMV; tobacco mosaic virus, TMV) or with bacterial expression vectors (e.g., Ti or pBR322 plasmids); or animal cell systems.

[0160]The "control elements" or "regulatory sequences" present in an expression vector are those non-translated regions of the vector--enhancers, promoters, 5' and 3' untranslated regions--which interact with host cellular proteins to carry out transcription and translation. Such elements may vary in their strength and specificity. Depending on the vector system and host utilized, any number of suitable transcription and translation elements, including constitutive and inducible promoters, may be used. For example, when cloning in bacterial systems, inducible promoters such as the hybrid lacZ promoter of the pBLUESCRIPT phagemid (Stratagene, La Jolla, Calif.) or pSPORT1 plasmid (Gibco BRL, Gaithersburg, Md.) and the like may be used. In mammalian cell systems, promoters from mammalian genes or from mammalian viruses are generally preferred. If it is necessary to generate a cell line that contains multiple copies of the sequence encoding a polypeptide, vectors based on SV40 or EBV may be advantageously used with an appropriate selectable marker.

[0161]In bacterial systems, any of a number of expression vectors may be selected depending upon the use intended for the expressed polypeptide. For example, when large quantities are needed, for example for the induction of antibodies, vectors which direct high level expression of fusion proteins that are readily purified may be used. Such vectors include, but are not limited to, the multifunctional E. coli cloning and expression vectors such as pBLUESCRIPT (Stratagene), in which the sequence encoding the polypeptide of interest may be ligated into the vector in frame with sequences for the amino-terminal Met and the subsequent 7 residues of β-galactosidase so that a hybrid protein is produced; pIN vectors (Van Heeke, G. and S. M. Schuster (1989) J. Biol. Chem. 264:5503-5509); and the like. Proteins made in such systems may be designed to include heparin, thrombin, or factor Xa protease cleavage sites so that the cloned polypeptide of interest can be released from the EBD moiety at will.

[0162]In the yeast, Saccharomyces cerevisiae, a number of vectors containing constitutive or inducible promoters such as alpha factor, alcohol oxidase, and PGH may be used. For reviews, see Ausubel et al. (supra) and Grant et al. (1987) Methods Enzymol. 153:516-544.

[0163]In cases where plant expression vectors are used, the expression of sequences encoding polypeptides may be driven by any of a number of promoters. For example, viral promoters such as the 35S and 19S promoters of CaMV may be used alone or in combination with the omega leader sequence from TMV (Takamatsu, N. (1987) EMBO J. 6:307-311). Alternatively, plant promoters such as the small subunit of RUBISCO or heat shock promoters may be used (Coruzzi, G. et al. (1984) EMBO J. 3:1671-1680; Broglie, R. et al. (1984) Science 224:838-843; and Winter, J. et al. (1991) Results Probl. Cell Differ. 17:85-105). These constructs can be introduced into plant cells by direct DNA transformation or pathogen-mediated transfection. Such techniques are described in a number of generally available reviews (see, for example, Hobbs, S. or Murry, L. E. in McGraw Hill Yearbook of Science and Technology (1992) McGraw Hill, New York, N.Y.; pp. 191-196).

[0164]An insect system may also be used to express a polypeptide of interest. For example, in one such system, Autographa californica nuclear polyhedrosis virus (AcNPV) is used as a vector to express foreign genes in Spodoptera frugiperda cells or in Trichoplusia larvae. The sequences encoding the polypeptide may be cloned into a non-essential region of the virus, such as the polyhedrin gene, and placed under control of the polyhedrin promoter. Successful insertion of the polypeptide-encoding sequence will render the polyhedrin gene inactive and produce recombinant virus lacking coat protein. The recombinant viruses may then be used to infect, for example, S. frugiperda cells or Trichoplusia larvae in which the polypeptide of interest may be expressed (Engelhard, E. K. et al. (1994) Proc. Natl. Acad. Sci. 91:3224-3227).

[0165]In mammalian host cells, a number of viral-based expression systems are generally available. For example, in cases where an adenovirus is used as an expression vector, sequences encoding a polypeptide of interest may be ligated into an adenovirus transcription/translation complex consisting of the late promoter and tripartite leader sequence. Insertion in a non-essential E1 or E3 region of the viral genome may be used to obtain a viable virus which is capable of expressing the polypeptide in infected host cells (Logan, J. and Shenk, T. (1984) Proc. Natl. Acad. Sci. 81:3655-3659). In addition, transcription enhancers, such as the Rous sarcoma virus (RSV) enhancer, may be used to increase expression in mammalian host cells.

[0166]Specific initiation signals may also be used to achieve more efficient translation of sequences encoding a polypeptide of interest. Such signals include the ATG initiation codon and adjacent sequences. In cases where sequences encoding the polypeptide, its initiation codon, and upstream sequences are inserted into the appropriate expression vector, no additional transcriptional or translational control signals may be needed. However, in cases where only coding sequence, or a portion thereof, is inserted, exogenous translational control signals including the ATG initiation codon should be provided. Furthermore, the initiation codon should be in the correct reading frame to ensure translation of the entire insert. Exogenous translational elements and initiation codons may be of various origins, both natural and synthetic. The efficiency of expression may be enhanced by the inclusion of enhancers which are appropriate for the particular cell system which is used, such as those described in the literature (Scharf, D. et al. (1994) Results Probl. Cell Differ. 20:125-162).

[0167]In addition, a host cell strain may be chosen for its ability to modulate the expression of the inserted sequences or to process the expressed protein in the desired fashion. Such modifications of the polypeptide include, but are not limited to, acetylation, carboxylation, glycosylation, phosphorylation, lipidation, and acylation. Post-translational processing which cleaves a "prepro" form of the protein may also be used to facilitate correct insertion, folding and/or function. Different host cells such as CHO, COS, HeLa, MDCK, HEK293, and WI38, which have specific cellular machinery and characteristic mechanisms for such post-translational activities, may be chosen to ensure the correct modification and processing of the foreign protein.

[0168]For long-term, high-yield production of recombinant proteins, stable expression is generally preferred. For example, cell lines which stably express a polynucleotide of interest may be transformed using expression vectors which may contain viral origins of replication and/or endogenous expression elements and a selectable marker gene on the same or on a separate vector. Following the introduction of the vector, cells may be allowed to grow for 1-2 days in an enriched media before they are switched to selective media. The purpose of the selectable marker is to confer resistance to selection, and its presence allows growth and recovery of cells which successfully express the introduced sequences. Resistant clones of stably transformed cells may be proliferated using tissue culture techniques appropriate to the cell type.

[0169]Any number of selection systems may be used to recover transformed cell lines. These include, but are not limited to, the herpes simplex virus thymidine kinase (Wigler, M. et al. (1977) Cell 11:223-32) and adenine phosphoribosyltransferase (Lowy, I. et al. (1980) Cell 22:817-23) genes which can be employed in tk⁻ or aprt⁻ cells, respectively. Also, antimetabolite, antibiotic or herbicide resistance can be used as the basis for selection; for example, dhfr which confers resistance to methotrexate (Wigler, M. et al. (1980) Proc. Natl. Acad. Sci. 77:3567-70); npt, which confers resistance to the aminoglycosides, neomycin and G-418 (Colbere-Garapin, F. et al. (1981) J. Mol. Biol. 150:1-14); and als or pat, which confer resistance to chlorsulfuron and phosphinothricin acetyltransferase, respectively (Murry, supra). Additional selectable genes have been described, for example, trpB, which allows cells to utilize indole in place of tryptophan, or hisD, which allows cells to utilize histidinol in place of histidine (Hartman, S. C. and R. C. Mulligan (1988) Proc. Natl. Acad. Sci. 85:8047-51). The use of visible markers has gained popularity with such markers as anthocyanins, beta-glucuronidase and its substrate GUS, and luciferase and its substrate luciferin, being widely used not only to identify transformants, but also to quantify the amount of transient or stable protein expression attributable to a specific vector system (Rhodes, C. A. et al. (1995) Methods Mol. Biol. 55:121-131).

[0170]Although the presence/absence of marker gene expression suggests that the gene of interest is also present, its presence and expression may need to be confirmed. For example, if the sequence encoding a polypeptide is inserted within a marker gene sequence, recombinant cells containing sequences can be identified by the absence of marker gene function. Alternatively, a marker gene can be placed in tandem with a polypeptide-encoding sequence under the control of a single promoter. Expression of the marker gene in response to induction or selection usually indicates expression of the tandem gene as well.

[0171]Alternatively, host cells that contain and express a desired polynucleotide sequence may be identified by a variety of procedures known to those of skill in the art. These procedures include, but are not limited to, DNA-DNA or DNA-RNA hybridizations and protein bioassay or immunoassay techniques which include, for example, membrane, solution, or chip based technologies for the detection and/or quantification of nucleic acid or protein.

[0172]A variety of protocols for detecting and measuring the expression of polynucleotide-encoded products, using either polyclonal or monoclonal antibodies specific for the product, are known in the art. Examples include enzyme-linked immunosorbent assay (ELISA), radioimmunoassay (RIA), and fluorescence activated cell sorting (FACS). A two-site, monoclonal-based immunoassay utilizing monoclonal antibodies reactive to two non-interfering epitopes on a given polypeptide may be preferred for some applications, but a competitive binding assay may also be employed. These and other assays are described, among other places, in Hampton, R. et al. (1990; Serological Methods, a Laboratory Manual, APS Press, St. Paul, Minn.) and Maddox, D. E. et al. (1983; J. Exp. Med. 158:1211-1216).

[0173]A wide variety of labels and conjugation techniques are known by those skilled in the art and may be used in various nucleic acid and amino acid assays. Means for producing labeled hybridization or PCR probes for detecting sequences related to polynucleotides include oligolabeling, nick translation, end-labeling or PCR amplification using a labeled nucleotide. Alternatively, the sequences, or any portions thereof may be cloned into a vector for the production of an mRNA probe. Such vectors are known in the art, are commercially available, and may be used to synthesize RNA probes in vitro by addition of an appropriate RNA polymerase such as T7, T3, or SP6 and labeled nucleotides. These procedures may be conducted using a variety of commercially available kits. Suitable reporter molecules or labels, which may be used include radionuclides, enzymes, fluorescent, chemiluminescent, or chromogenic agents as well as substrates, cofactors, inhibitors, magnetic particles, and the like.

[0174]Host cells transformed with a polynucleotide sequence of interest may be cultured under conditions suitable for the expression and recovery of the polypeptide from cell culture. The polypeptide produced by a recombinant cell may be secreted or contained intracellularly depending on the sequence and/or the vector used. As will be understood by those of skill in the art, expression vectors containing polynucleotides of the invention may be designed to contain signal sequences which direct secretion of the encoded polypeptide through a prokaryotic or eukaryotic cell membrane. Other recombinant constructions may be used to join sequences encoding a polypeptide of interest to polynucleotide sequence encoding a polypeptide domain which will facilitate purification of soluble proteins. Such purification facilitating domains include, but are not limited to, metal chelating peptides such as histidine-tryptophan modules that allow purification on immobilized metals, protein A domains that allow purification on immobilized immunoglobulin, and the domain utilized in the FLAGS extension/affinity purification system (Immunex Corp., Seattle, Wash.). The inclusion of cleavable linker sequences such as those specific for Factor Xa or enterokinase (Invitrogen, San Diego, Calif.) between the purification domain and the encoded polypeptide may be used to facilitate purification. One such expression vector provides for expression of a fusion protein containing a polypeptide of interest and a nucleic acid encoding 6 histidine residues preceding a thioredoxin or an enterokinase cleavage site. The histidine residues facilitate purification on IMIAC (immobilized metal ion affinity chromatography) as described in Porath, J. et al. (1992, Prot. Exp. Purif. 3:263-281) while the enterokinase cleavage site provides a means for purifying the desired polypeptide from the fusion protein. Further discussion of vectors which comprise fusion proteins can be found in Kroll, D. J. et al. (1993; DNA Cell Biol. 12:441-453).

[0175]In addition to recombinant production methods, polypeptides of the invention, and fragments thereof, may be produced by direct peptide synthesis using solid-phase techniques (Merrifield J. (1963) J. Am. Chem. Soc. 85:2149-2154). Polypeptide synthesis may be performed using manual techniques or by automation. Automated synthesis may be achieved, for example, using Applied Biosystems 431A Peptide Synthesizer (Perkin Elmer). Alternatively, various fragments may be chemically synthesized separately and combined using chemical methods to produce the full length molecule.

[0176]According to another aspect, the present invention further provides binding agents, such as antibodies and antigen-binding fragments thereof, that specifically bind to an EBD sequence according to the present invention, or to a portion, variant or derivative thereof. Such binding agents may be used, for example, to detect the presence of a polypeptide comprising an EBD sequence, to facilitate purification of a polypeptide comprising an EBD sequence, and the like. An antibody, or antigen-binding fragment thereof, is said to "specifically bind" to a polypeptide if it reacts at a detectable level (within, for example, an ELISA assay) with the polypeptide, and does not react detectably with unrelated polypeptides under similar conditions.

[0177]Antibodies and other binding agents can be prepared using conventional methodologies. For example, monoclonal antibodies specific for a polypeptide of interest may be prepared using the technique of Kohler and Milstein, Eur. J. Immunol. 6:511-519, 1976, and improvements thereto. Briefly, these methods involve the preparation of immortal cell lines capable of producing antibodies having the desired specificity (i.e., reactivity with the polypeptide of interest). Such cell lines may be produced, for example, from spleen cells obtained from an animal immunized as described above. The spleen cells are then immortalized by, for example, fusion with a myeloma cell fusion partner, preferably one that is syngeneic with the immunized animal. A variety of fusion techniques may be employed. For example, the spleen cells and myeloma cells may be combined with a nonionic detergent for a few minutes and then plated at low density on a selective medium that supports the growth of hybrid cells, but not myeloma cells. A preferred selection technique uses HAT (hypoxanthine, aminopterin, thymidine) selection. After a sufficient time, usually about 1 to 2 weeks, colonies of hybrids are observed. Single colonies are selected and their culture supernatants tested for binding activity against the polypeptide. Hybridomas having high reactivity and specificity are preferred.

[0178]Monoclonal antibodies may be isolated from the supernatants of growing hybridoma colonies. In addition, various techniques may be employed to enhance the yield, such as injection of the hybridoma cell line into the peritoneal cavity of a suitable vertebrate host, such as a mouse. Monoclonal antibodies may then be harvested from the ascites fluid or the blood. Contaminants may be removed from the antibodies by conventional techniques, such as chromatography, gel filtration, precipitation, and extraction. The polypeptides of this invention may be used in the purification process in, for example, an affinity chromatography step.

[0179]A number of "humanized" antibody molecules comprising an antigen-binding site derived from a non-human immunoglobulin have been described, including chimeric antibodies having rodent V regions and their associated CDRs fused to human constant domains (Winter et al. (1991) Nature 349:293-299; Lobuglio et al. (1989) Proc. Nat. Acad. Sci. USA 86:4220-4224; Shaw et al. (1987) J. Immunol. 138:4534-4538; and Brown et al. (1987) Cancer Res. 47:3577-3583), rodent CDRs grafted into a human supporting FR prior to fusion with an appropriate human antibody constant domain (Riechmann et al. (1988) Nature 332:323-327; Verhoeyen et al. (1988) Science 239:1534-1536; and Jones et al. (1986) Nature 321:522-525), and rodent CDRs supported by recombinantly veneered rodent FRs (European Patent Publication No. 519,596, published Dec. 23, 1992). These "humanized" molecules are designed to minimize unwanted immunological response toward rodent antihuman antibody molecules which limits the duration and effectiveness of therapeutic applications of those moieties in human recipients.

[0180]Yet another aspect of the invention provides kits comprising one or more compositions described herein, e.g., an isolated EBD polynucleotide, polypeptide, antibody, vector, host cell, etc. In a particular embodiment, the invention provides a kit containing an expression vector comprising a polynucleotide sequence encoding an EBD polypeptide sequence and a multiple cloning site for easily introducing into the vector a polynucleotide sequence encoding a heterologous polypeptide sequence of interest. In another embodiment, the expression vector further comprises an engineered cleavage site to facilitate separation of an EBD polypeptide sequence from the heterologous polypeptide sequence of interest following recombinant production.

[0181]The following Examples are offered by way of illustration and not by way of limitation.

EXAMPLES

Example 1

Use of Neurofilament Triplet M Protein (NF-M) in an Entropic Bristle Domain Vector

[0182]The heterogeneity in the charge distribution of the human NF-M protein sequence was determined (shown below). The observed heterogeneity of the sequence suggests that EBDs with different characteristics may result for different regions of the sequence. For example, a 422-600 fragment is predominantly negatively charged. This fragment could be used as a basis to design EBDs for negatively charged proteins. The charge distribution in the 601-916 fragment is very heterogeneous. It can be used as a basis to design EBDs both for positively- and negatively-charged proteins.
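A charge-distribution analysis of the kind described may be sketched as follows. This is an illustrative approximation only, not the method actually used: net charge is estimated by counting Lys/Arg as +1 and Asp/Glu as -1 (histidine and the termini are ignored), and the function names and the 16-residue window are assumptions. The two fragments are residues 481-496 and 609-624 of SEQ ID NO:1, written in one-letter code.

```python
def net_charge(seq):
    """Approximate net charge near neutral pH: +1 per K/R, -1 per D/E."""
    seq = seq.upper()
    positive = sum(seq.count(res) for res in "KR")
    negative = sum(seq.count(res) for res in "DE")
    return positive - negative


def charge_profile(seq, window=16, step=16):
    """Return (1-based window start, net charge) for successive windows."""
    return [
        (start + 1, net_charge(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


acidic_fragment = "EEGKEEEGGEEEEAEG"   # SEQ ID NO:1, residues 481-496
repeat_fragment = "KEEAKSPAEAKSPVKE"   # SEQ ID NO:1, residues 609-624

print(net_charge(acidic_fragment))   # -9
print(net_charge(repeat_fragment))   # 0
```

Under this simple count, the first fragment is strongly acidic while the second carries equal numbers of acidic and basic residues, which is consistent with the regional differences noted above.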

[0183]Cloning of EBD sequence: We obtained the full-length cDNA for human NF-M from Origene Technologies (Rockville, Md.) and cloned the coding region for a 494-residue EBD sequence (residues 422 to 916 of the NF-M protein) into a pMALc2E vector from which the maltose-binding protein coding region had been deleted. (See FIG. 1.) Restriction sites suitable for cloning the test proteins were engineered at the appropriate locations. The proximity of the start codon in the cloned target sequences to the Shine-Dalgarno sequence of the vector was the same as that in pMALc2E. This construct is referred to as pEBDM.

[0184]Preparation of heterologous sequence: The coding region of a heterologous sequence of interest may be examined for rare E. coli codons and for restriction sites relevant to a suitable cloning strategy. Prior to cloning, incompatible codons and restriction sites may be altered by site-directed mutagenesis. The heterologous protein coding region, not including the stop codon, is PCR-amplified using primers containing the relevant restriction sites for the 5' and the 3' ends of the test protein open reading frame, respectively.
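The examination of a coding region for rare codons and internal restriction sites may be illustrated with a short script. The following is a sketch under stated assumptions: the set of rare E. coli codons is a commonly cited list and is not taken from this specification, the three enzymes are chosen only as examples (KpnI because it is used in the vector construction described herein), and the function names and test sequence are hypothetical.

```python
# Assumption: codons often reported as rare in E. coli; adjust for the actual host strain.
RARE_CODONS = {"AGG", "AGA", "ATA", "CTA", "CGA", "CGG", "CCC"}

# Recognition sequences of three common enzymes (all palindromic,
# so scanning one strand is sufficient).
RESTRICTION_SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "KpnI": "GGTACC"}


def find_rare_codons(cds):
    """Return (codon number, codon) for each rare codon in an in-frame CDS."""
    cds = cds.upper()
    return [
        (i // 3 + 1, cds[i:i + 3])
        for i in range(0, len(cds) - 2, 3)
        if cds[i:i + 3] in RARE_CODONS
    ]


def find_restriction_sites(seq):
    """Return (enzyme, 1-based start position) for every site occurrence."""
    seq = seq.upper()
    hits = []
    for enzyme, site in RESTRICTION_SITES.items():
        start = seq.find(site)
        while start != -1:
            hits.append((enzyme, start + 1))
            start = seq.find(site, start + 1)
    return hits


example_cds = "ATGAGGGAATTCCTATAA"   # hypothetical: ATG AGG GAA TTC CTA TAA
print(find_rare_codons(example_cds))
print(find_restriction_sites(example_cds))
```

Positions flagged in this way would be candidates for silent changes by site-directed mutagenesis before the amplification step.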

[0185]Assembly of EBD expression vector: The PCR-amplified open reading frame of the heterologous polypeptide sequence of interest is ligated into the pEBDM vector backbone following digestion with appropriate restriction enzymes. In addition to cloning the heterologous sequence into an EBD expression vector, the test proteins may be cloned, for example, into an MBP expression vector (e.g., pMAL®-c2E, which already contains a maltose-binding protein coding region) as well as a control vector. The pMAL®-c2E serves as a positive control. To construct the control vector backbone, a KpnI site is added to pMAL®-c2E at base 1524 by site-directed mutagenesis of 4 bases. This allows excision of the MBP coding region (including the start codon) by KpnI digestion and re-ligation.

[0186]Protein expression and solubility analysis are carried out essentially according to the procedures of Kapust and Waugh. Briefly, the construct is transformed into E. coli BL21/DE3 cells (Stratagene, La Jolla, Calif.). This cell line provides increased protein stability due to its deficiency in both the OmpT and Lon proteases. The transformed cells are grown at 37° C. with shaking in LB broth supplemented with the appropriate antibiotics, diluted 50 fold, and grown to an OD₆₀₀ of 0.6 before induction. Recombinant protein production is induced by adding IPTG to a final concentration of 1 mM, and the cells are grown for 3 more hours and harvested by centrifugation. The pellets are resuspended in 0.1 volume of lysis buffer and sonicated to disrupt cells. A sample of this crude lysate is reserved and used for total protein analyses. After the crude lysate is cleared by centrifugation, a sample of the cleared lysate is used for soluble protein analyses. These samples are run on SDS-PAGE gels using standard procedures and visualized by Coomassie staining. The non-degraded soluble recombinant protein is apparent as a heavy band of the appropriate size.

[0187]The stained gels are scanned using an Epson Perfection 3200 scanner (Epson, Long Beach, Calif.) and the density of the protein bands is quantified using Total Lab image analysis software (Nonlinear Dynamics, Newcastle upon Tyne, UK). The densities of the bands corresponding to the fusion protein are normalized by dividing by the combined density of all the E. coli proteins larger than the largest fusion protein. Percent solubility is calculated by dividing the normalized density of the fusion protein band in the cleared lysate (soluble protein) lane by the normalized density of the fusion protein band in the crude lysate (total protein) lane after subtracting the normalized background density obtained from lanes containing equivalent protein extracts from E. coli cells grown with an empty vector. Mean and standard deviation are calculated for at least three independent experiments.
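The percent solubility calculation described above may be expressed as a short script. This is a minimal sketch, not the software actually used: it assumes that the empty-vector background is subtracted from both the soluble and the total normalized densities before the ratio is taken (one reading of the description above), and all function names and numerical values are hypothetical.

```python
from statistics import mean, stdev


def normalized_density(band_density, reference_density):
    """Normalize a fusion band by the combined density of the larger E. coli proteins."""
    return band_density / reference_density


def percent_solubility(soluble, total, bg_soluble=0.0, bg_total=0.0):
    """Percent solubility from normalized densities after background subtraction.

    Assumption: background from empty-vector lanes is subtracted from both lanes.
    """
    return 100.0 * (soluble - bg_soluble) / (total - bg_total)


def summarize(values):
    """Mean and sample standard deviation over at least three experiments."""
    if len(values) < 3:
        raise ValueError("at least three independent experiments are required")
    return mean(values), stdev(values)


# Hypothetical normalized densities: (soluble, total, background soluble, background total)
experiments = [
    (0.45, 0.60, 0.05, 0.05),
    (0.40, 0.50, 0.05, 0.05),
    (0.35, 0.50, 0.05, 0.05),
]
solubilities = [percent_solubility(*e) for e in experiments]
avg, sd = summarize(solubilities)
print(f"percent solubility: {avg:.1f} +/- {sd:.1f}")
```

If the background is instead intended to be subtracted from only one lane, the corresponding argument can be left at its default of zero.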

Sequence CWU 1

14411026PRTHomo sapiens 1Met Met Ser Phe Gly Gly Ala Asp Ala Leu Leu Gly Ala Pro Phe Ala1 5 10 15Pro Leu His Gly Gly Gly Ser Leu His Tyr Ala Leu Ala Arg Lys Gly20 25 30Gly Ala Gly Gly Thr Arg Ser Ala Ala Gly Ser Ser Ser Gly Phe His35 40 45Ser Trp Thr Arg Thr Ser Val Ser Ser Val Ser Ala Ser Pro Ser Arg50 55 60Phe Arg Gly Ala Gly Ala Ala Ser Ser Thr Asp Ser Leu Asp Thr Leu65 70 75 80Ser Asn Gly Pro Glu Gly Cys Met Val Ala Val Ala Thr Ser Arg Ser85 90 95Glu Lys Glu Gln Leu Gln Ala Leu Asn Asp Arg Phe Ala Gly Tyr Ile100 105 110Asp Lys Val Arg Gln Leu Glu Ala His Asn Arg Ser Leu Glu Gly Glu115 120 125Ala Ala Ala Leu Arg Gln Gln Gln Ala Gly Arg Ser Ala Met Gly Glu130 135 140Leu Tyr Glu Arg Glu Val Arg Glu Met Arg Gly Ala Val Leu Arg Leu145 150 155 160Gly Ala Ala Arg Gly Gln Leu Arg Leu Glu Gln Glu His Leu Leu Glu165 170 175Asp Ile Ala His Val Arg Gln Arg Leu Asp Asp Glu Ala Arg Gln Arg180 185 190Glu Glu Ala Glu Ala Ala Ala Arg Ala Leu Ala Arg Phe Ala Gln Glu195 200 205Ala Glu Ala Ala Arg Val Asp Leu Gln Lys Lys Ala Gln Ala Leu Gln210 215 220Glu Glu Cys Gly Tyr Leu Arg Arg His His Gln Glu Glu Val Gly Glu225 230 235 240Leu Leu Gly Gln Ile Gln Gly Ser Gly Ala Ala Gln Ala Gln Met Gln245 250 255Ala Glu Thr Arg Asp Ala Leu Lys Cys Asp Val Thr Ser Ala Leu Arg260 265 270Glu Ile Arg Ala Gln Leu Glu Gly His Ala Val Gln Ser Thr Leu Gln275 280 285Ser Glu Glu Trp Phe Arg Val Arg Leu Asp Arg Leu Ser Glu Ala Ala290 295 300Lys Val Asn Thr Asp Ala Met Arg Ser Ala Gln Glu Glu Ile Thr Glu305 310 315 320Tyr Arg Arg Gln Leu Gln Ala Arg Thr Thr Glu Leu Glu Ala Leu Lys325 330 335Ser Thr Lys Asp Ser Leu Glu Arg Gln Arg Ser Glu Leu Glu Asp Arg340 345 350His Gln Ala Asp Ile Ala Ser Tyr Gln Glu Ala Ile Gln Gln Leu Asp355 360 365Ala Glu Leu Arg Asn Thr Lys Trp Glu Met Ala Ala Gln Leu Arg Glu370 375 380Tyr Gln Asp Leu Leu Asn Val Lys Met Ala Leu Asp Ile Glu Ile Ala385 390 395 400Ala Tyr Arg Lys Leu Leu Glu Gly Glu Glu Cys Arg Ile Gly Phe Gly405 410 415Pro Ile Pro Phe Ser Leu Pro Glu Gly Leu Pro Lys 
Ile Pro Ser Val420 425 430Ser Thr His Ile Lys Val Lys Ser Glu Glu Lys Ile Lys Val Val Glu435 440 445Lys Ser Glu Lys Glu Thr Val Ile Val Glu Glu Gln Thr Glu Glu Thr450 455 460Gln Val Thr Glu Glu Val Thr Glu Glu Glu Glu Lys Glu Ala Lys Glu465 470 475 480Glu Glu Gly Lys Glu Glu Glu Gly Gly Glu Glu Glu Glu Ala Glu Gly485 490 495Gly Glu Glu Glu Thr Lys Ser Pro Pro Ala Glu Glu Ala Ala Ser Pro500 505 510Glu Lys Glu Ala Lys Ser Pro Val Lys Glu Glu Ala Lys Ser Pro Ala515 520 525Glu Ala Lys Ser Pro Glu Lys Glu Glu Ala Lys Ser Pro Ala Glu Val530 535 540Lys Ser Pro Glu Lys Ala Lys Ser Pro Ala Lys Glu Glu Ala Lys Ser545 550 555 560Pro Pro Glu Ala Lys Ser Pro Glu Lys Glu Glu Ala Lys Ser Pro Ala565 570 575Glu Val Lys Ser Pro Glu Lys Ala Lys Ser Pro Ala Lys Glu Glu Ala580 585 590Lys Ser Pro Ala Glu Ala Lys Ser Pro Glu Lys Ala Lys Ser Pro Val595 600 605Lys Glu Glu Ala Lys Ser Pro Ala Glu Ala Lys Ser Pro Val Lys Glu610 615 620Glu Ala Lys Ser Pro Ala Glu Val Lys Ser Pro Glu Lys Ala Lys Ser625 630 635 640Pro Thr Lys Glu Glu Ala Lys Ser Pro Glu Lys Ala Lys Ser Pro Glu645 650 655Lys Ala Lys Ser Pro Glu Lys Glu Glu Ala Lys Ser Pro Glu Lys Ala660 665 670Lys Ser Pro Val Lys Ala Glu Ala Lys Ser Pro Glu Lys Ala Lys Ser675 680 685Pro Val Lys Ala Glu Ala Lys Ser Pro Glu Lys Ala Lys Ser Pro Val690 695 700Lys Glu Glu Ala Lys Ser Pro Glu Lys Ala Lys Ser Pro Val Lys Glu705 710 715 720Glu Ala Lys Ser Pro Glu Lys Ala Lys Ser Pro Val Lys Glu Glu Ala725 730 735Lys Thr Pro Glu Lys Ala Lys Ser Pro Val Lys Glu Glu Ala Lys Ser740 745 750Pro Glu Lys Ala Lys Ser Pro Glu Lys Ala Lys Thr Leu Asp Val Lys755 760 765Ser Pro Glu Ala Lys Thr Pro Ala Lys Glu Glu Ala Arg Ser Pro Ala770 775 780Asp Lys Phe Pro Glu Lys Ala Lys Ser Pro Val Lys Glu Glu Val Lys785 790 795 800Ser Pro Glu Lys Ala Lys Ser Pro Leu Lys Glu Asp Ala Lys Ala Pro805 810 815Glu Lys Glu Ile Pro Lys Lys Glu Glu Val Lys Ser Pro Val Lys Glu820 825 830Glu Glu Lys Pro Gln Glu Val Lys Val Lys Glu Pro Pro Lys Lys Ala835 840 845Glu Glu Glu Lys Ala Pro Ala 
Thr Pro Lys Thr Glu Glu Lys Lys Asp850 855 860Ser Lys Lys Glu Glu Ala Pro Lys Lys Glu Ala Pro Lys Pro Lys Val865 870 875 880Glu Glu Lys Lys Glu Pro Ala Val Glu Lys Pro Lys Glu Ser Lys Val885 890 895Glu Ala Lys Lys Glu Glu Ala Glu Asp Lys Lys Lys Val Pro Thr Pro900 905 910Glu Lys Glu Ala Pro Ala Lys Val Glu Val Lys Glu Asp Ala Lys Pro915 920 925Lys Glu Lys Thr Glu Val Ala Lys Lys Glu Pro Asp Asp Ala Lys Ala930 935 940Lys Glu Pro Ser Lys Pro Ala Glu Lys Lys Glu Ala Ala Pro Glu Lys945 950 955 960Lys Asp Thr Lys Glu Glu Lys Ala Lys Lys Pro Glu Glu Lys Pro Lys965 970 975Thr Glu Ala Lys Ala Lys Glu Asp Asp Lys Thr Leu Ser Lys Glu Pro980 985 990Ser Lys Pro Lys Ala Glu Lys Ala Glu Lys Ser Ser Ser Thr Asp Gln995 1000 1005Lys Asp Ser Lys Pro Pro Glu Lys Ala Thr Glu Asp Lys Ala Ala Lys1010 1015 1020Gly Lys102523081DNAHomo sapiens 2atgatgagct tcggcggcgc ggacgcgctg ctgggcgccc cgttcgcgcc gctgcatggc 60ggcggcagcc tccactacgc gctagcccga aagggtggcg caggcgggac gcgctccgcc 120gctggctcct ccagcggctt ccactcgtgg acacggacgt ccgtgagctc cgtgtccgcc 180tcgcccagcc gcttccgtgg cgcaggcgcc gcctcaagca ccgactcgct ggacacgctg 240agcaacgggc cggagggctg catggtggcg gtggccacct cacgcagtga gaaggagcag 300ctgcaggcgc tgaacgaccg cttcgccggg tacatcgaca aggtgcggca gctggaggcg 360cacaaccgca gcctggaggg cgaggctgcg gcgctgcggc agcagcaggc gggccgctcc 420gctatgggcg agctgtacga gcgcgaggtc cgcgagatgc gcggcgcggt gctgcgcctg 480ggcgcggcgc gcggtcagct acgcctggag caggagcacc tgctcgagga catcgcgcac 540gtgcgccagc gcctagacga cgaggcccgg cagcgagagg aggccgaggc ggcggcccgc 600gcgctggcgc gcttcgcgca ggaggccgag gcggcgcgcg tggacctgca gaagaaggcg 660caggcgctgc aggaggagtg cggctacctg cggcgccacc accaggaaga ggtgggcgag 720ctgctcggcc agatccaggg ctccggcgcc gcgcaggcgc agatgcaggc cgagacgcgc 780gacgccctga agtgcgacgt gacgtcggcg ctgcgcgaga ttcgcgcgca gcttgaaggc 840cacgcggtgc agagcacgct gcagtccgag gagtggttcc gagtgaggct ggaccgactg 900tcggaggcag ccaaggtgaa cacagacgct atgcgctcag cgcaggagga gataactgag 960taccggcgtc agctgcaggc caggaccaca gagctggagg cactgaaaag 
caccaaggac 1020tcactggaga ggcagcgctc tgagctggag gaccgtcatc aggccgacat tgcctcctac 1080caggaagcca ttcagcagct ggacgctgag ctgaggaaca ccaagtggga gatggccgcc 1140cagctgcgag aataccagga cctgctcaat gtcaagatgg ctctggatat agagatagcc 1200gcttacagaa aactcctgga aggtgaagag tgtcggattg gctttggccc aattcctttc 1260tcgcttccag aaggactccc caaaattccc tctgtgtcca ctcacataaa ggtgaaaagc 1320gaagagaaga tcaaagtggt ggagaagtct gagaaagaaa ctgtgattgt ggaggaacag 1380acagaggaga cccaagtgac tgaagaagtg actgaagaag aggagaaaga ggccaaagag 1440gaggagggca aggaggaaga agggggtgaa gaagaggagg cagaaggggg agaagaagaa 1500acaaagtctc ccccagcaga agaggctgca tccccagaga aggaagccaa gtcaccagta 1560aaggaagagg caaagtcacc ggctgaggcc aagtccccag agaaggagga agcaaaatcc 1620ccagccgaag tcaagtcccc tgagaaggcc aagtctccag caaaggaaga ggcaaagtca 1680ccgcctgagg ccaagtcccc agagaaggag gaagcaaaat ctccagctga ggtcaagtcc 1740cccgagaagg ccaagtcccc agcaaaggaa gaggcaaagt caccggctga ggccaagtct 1800ccagagaagg ccaagtcccc agtgaaggaa gaagcaaagt caccggctga ggccaagtcc 1860ccagtgaagg aagaagcaaa atctccagct gaggtcaagt ccccggaaaa ggccaagtct 1920ccaacgaagg aggaagcaaa gtcccctgag aaggccaagt cccctgagaa ggccaagtcc 1980ccagagaagg aagaggccaa gtcccctgag aaggccaagt ccccagtgaa ggcagaagca 2040aagtcccctg agaaggccaa gtccccagtg aaggcagaag caaagtcccc tgagaaggcc 2100aagtccccag tgaaggaaga agcaaagtcc cctgagaagg ccaagtcccc agtgaaggaa 2160gaagcaaagt cccctgagaa ggccaagtcc ccagtgaagg aagaagcaaa gacccccgag 2220aaggccaagt ccccagtgaa ggaagaagcc aagtccccag agaaggccaa gtccccagag 2280aaggccaaga ctcttgatgt gaagtctcca gaagccaaga ctccagcgaa ggaggaagca 2340aggtcccctg cagacaaatt ccctgaaaag gccaaaagcc ctgtcaagga ggaggtcaag 2400tccccagaga aggcgaaatc tcccctgaag gaggatgcca aggcccctga gaaggagatc 2460ccaaaaaagg aagaggtgaa gtccccagtg aaggaggagg agaagcccca ggaggtgaaa 2520gtcaaagagc ccccaaagaa ggcagaggaa gagaaagccc ctgccacacc aaaaacagag 2580gagaagaagg acagcaagaa agaggaggca cccaagaagg aggctccaaa gcccaaggtg 2640gaggagaaga aggaacctgc tgtcgaaaag cccaaagaat ccaaagttga agccaagaag 2700gaagaggctg aagataagaa 
aaaagtcccc accccagaga aggaggctcc tgccaaggtg 2760gaggtgaagg aagacgctaa acccaaagaa aagacagagg tggccaagaa ggaaccagat 2820gatgccaagg ccaaggaacc cagcaaacca gcagagaaga aggaggcagc accggagaaa 2880aaagacacca aggaggagaa ggccaagaag cctgaggaga aacccaagac agaggccaaa 2940gccaaggaag atgacaagac cctctcaaaa gagcctagca agcctaaggc agaaaaggct 3000gaaaaatcct ccagcacaga ccaaaaagac agcaagcctc cagagaaggc cacagaagac 3060aaggccgcca aggggaagta a 308131087PRTMus musculus 3Met Ser Phe Gly Ser Ala Asp Ala Leu Leu Gly Ala Pro Phe Ala Pro1 5 10 15Leu His Gly Gly Gly Ser Leu His Tyr Ser Leu Ser Arg Lys Ala Gly20 25 30Pro Gly Gly Thr Arg Ser Ala Ala Gly Ser Ser Ser Gly Phe His Ser35 40 45Trp Ala Arg Thr Ser Val Ser Ser Val Ser Ala Ser Pro Ser Arg Phe50 55 60Arg Gly Ala Ala Ser Ser Thr Asp Ser Leu Asp Thr Leu Ser Asn Gly65 70 75 80Pro Glu Gly Cys Val Val Ala Ala Val Ala Ala Arg Ser Glu Lys Glu85 90 95Gln Leu Gln Ala Leu Asn Asp Arg Phe Ala Gly Tyr Ile Asp Lys Val100 105 110Arg Gln Leu Glu Ala His Asn Arg Ser Leu Glu Gly Glu Ala Ala Ala115 120 125Leu Arg Gln Gln Lys Gly Arg Ala Ala Met Gly Glu Leu Tyr Glu Arg130 135 140Glu Val Arg Glu Met Arg Gly Ala Val Leu Arg Leu Gly Ala Ala Arg145 150 155 160Gly Gln Leu Arg Leu Glu Gln Glu His Leu Leu Glu Asp Ile Ala His165 170 175Val Arg Gln Arg Leu Asp Glu Glu Ala Arg Gln Arg Glu Glu Ala Glu180 185 190Ala Ala Ala Arg Ala Leu Ala Phe Ala Gln Glu Ala Glu Ala Ala Arg195 200 205Val Glu Leu Gln Lys Lys Ala Gln Ala Leu Gln Glu Glu Cys Gly Tyr210 215 220Leu Arg Arg His His Gln Glu Glu Val Gly Glu Leu Leu Gly Gln Ile225 230 235 240Gln Gly Cys Gly Ala Ala Gln Ala Gln Ala Gln Ala Glu Ala Arg Asp245 250 255Ala Leu Lys Cys Asp Val Thr Ser Ala Leu Arg Glu Ile Arg Ala Gln260 265 270Leu Glu Gly His Ala Val Gln Ser Ser Leu Gln Ser Glu Glu Trp Phe275 280 285Arg Val Arg Leu Asp Arg Leu Ser Glu Ala Ala Lys Val Asn Thr Asp290 295 300Ala Met Arg Ser Ala Gln Glu Glu Ile Thr Glu Tyr Arg Arg Gln Leu305 310 315 320Gln Ala Arg Thr Thr Glu Leu Glu Ala Leu Lys Ser Thr Lys Glu Ser325 330 
335Leu Glu Arg Gln Arg Ser Glu Leu Glu Asp Arg His Gln Ala Asp Ile340 345 350Ala Ser Tyr Gln Asp Ala Ile Gln Gln Leu Asp Ser Glu Leu Arg Asn355 360 365Thr Lys Trp Glu Met Ala Ala Gln Leu Arg Glu Tyr Gln Asp Leu Leu370 375 380Asn Val Lys Met Ala Leu Asp Ile Glu Ile Ala Ala Tyr Arg Lys Leu385 390 395 400Leu Glu Gly Glu Glu Cys Arg Ile Gly Phe Gly Pro Ser Pro Phe Ser405 410 415Leu Thr Glu Gly Leu Pro Lys Ile Pro Ser Ile Ser Thr His Ile Lys420 425 430Val Lys Ser Glu Glu Met Ile Lys Val Val Glu Lys Ser Glu Lys Glu435 440 445Thr Val Ile Val Glu Gly Gln Thr Glu Glu Ile Arg Val Thr Glu Gly450 455 460Val Thr Glu Glu Glu Asp Lys Glu Ala Gln Gly Gln Glu Gly Glu Glu465 470 475 480Ala Glu Glu Gly Glu Glu Lys Glu Glu Glu Glu Leu Ala Ala Ala Thr485 490 495Ser Pro Pro Ala Glu Glu Ala Ala Ser Pro Glu Lys Glu Thr Lys Ser500 505 510Arg Val Lys Glu Glu Ala Lys Ser Pro Gly Glu Ala Lys Ser Pro Gly515 520 525Glu Ala Lys Ser Pro Ala Glu Ala Lys Ser Pro Gly Glu Ala Lys Ser530 535 540Pro Gly Glu Ala Lys Ser Pro Gly Glu Ala Lys Ser Pro Ala Glu Pro545 550 555 560Lys Ser Pro Ala Glu Pro Lys Ser Pro Ala Glu Ala Lys Ser Pro Ala565 570 575Glu Pro Lys Ser Pro Ala Thr Val Lys Ser Pro Gly Glu Ala Lys Ser580 585 590Pro Ser Glu Ala Lys Ser Pro Ala Glu Ala Lys Ser Pro Ala Glu Ala595 600 605Lys Ser Pro Ala Glu Ala Lys Ser Pro Ala Glu Ala Lys Ser Pro Ala610 615 620Glu Ala Lys Ser Pro Ala Glu Ala Lys Ser Pro Ala Thr Val Lys Ser625 630 635 640Pro Gly Glu Ala Lys Ser Pro Ser Glu Ala Lys Ser Pro Ala Glu Ala645 650 655Lys Ser Pro Ala Glu Ala Lys Ser Pro Ala Glu Ala Lys Ser Pro Ala660 665 670Glu Val Lys Ser Pro Gly Glu Ala Lys Ser Pro Ala Glu Pro Lys Ser675 680 685Pro Ala Glu Ala Lys Ser Pro Ala Glu Val Lys Ser Pro Ala Glu Ala690 695 700Lys Ser Pro Ala Glu Val Lys Ser Pro Gly Glu Ala Lys Ser Pro Ala705 710 715 720Ala Val Lys Ser Pro Ala Glu Ala Lys Ser Pro Ala Ala Val Lys Ser725 730 735Pro Gly Glu Ala Lys Ser Pro Gly Glu Ala Lys Ser Pro Ala Glu Ala740 745 750Lys Ser Pro Ala Glu Ala Lys Ser Pro Ile Glu Val Lys 
Ser Pro Glu755 760 765Lys Ala Lys Thr Pro Val Lys Glu Gly Ala Lys Ser Pro Ala Glu Ala770 775 780Lys Ser Pro Glu Lys Ala Lys Ser Pro Val Lys Glu Asp Ile Lys Pro785 790 795 800Pro Ala Glu Ala Lys Ser Pro Glu Lys Ala Lys Ser Pro Val Lys Glu805 810 815Gly Ala Lys Pro Pro Glu Lys Ala Lys Pro Leu Asp Val Lys Ser Pro820 825 830Glu Ala Gln Thr Pro Val Gln Glu Glu Ala Thr Val Pro Thr Asp Ile835 840 845Arg Pro Pro Glu Gln Val Lys Ser Pro Ala Lys Glu Lys Ala Lys Ser850 855 860Pro Glu Lys Glu Glu Ala Lys Thr Ser Glu Lys Val Ala Pro Lys Lys865 870 875 880Glu Glu Val Lys Ser Pro Val Lys Glu Glu Val Lys Ala Lys Glu Pro885 890 895Pro Lys Lys Val Glu Glu Glu Lys Thr Leu Pro Thr Pro Lys Thr Glu900 905 910Ala Lys Glu Ser Lys Lys Asp Glu Ala Pro Lys Glu Ala Pro Lys Pro915 920 925Lys Val Glu Glu Lys Lys Glu Thr Pro Thr Glu Lys Pro Lys Asp Ser930 935 940Thr Ala Glu Ala Lys Lys Glu Glu Ala Gly Glu Lys Lys Lys Ala Val945 950 955 960Ala Ser Glu Glu Glu Thr Pro Ala Lys Leu Gly Val Lys Glu Glu Ala965 970 975Lys Pro Lys Glu Lys Thr Glu Thr Thr Lys Thr Glu Ala Glu Asp Thr980 985 990Lys Ala Lys Glu Pro Ser Lys Pro Thr Glu Thr Glu Lys Pro Lys Lys995 1000 1005Glu Glu Met Pro Ala Ala Pro Glu Lys Lys Asp Thr Lys Glu Glu Lys1010 1015 1020Thr Thr Glu Ser Arg Lys Pro Glu Glu Lys Pro Lys Met Glu Ala Lys1025 1030 1035 1040Val Lys Glu Asp Asp Lys Ser Leu Ser Lys Glu Pro Ser Lys Pro Lys1045 1050 1055Thr Glu Lys Ala Glu Lys Ser Ser Ser Thr Asp Gln Lys Glu Ser Gln1060 1065
1070Pro Pro Glu Lys Thr Thr Glu Asp Lys Ala Thr Lys Gly Glu Lys1075 1080 108543219DNAMus musculus 4atgatgagct tcggcagcgc cgatgcgctg ctgggcgccc cgttcgcgcc gctgcacgga 60ggcggcagcc tgcactactc gctgagccgc aaggcaggcc cgggcggcac gcgctccgcg 120gccggctcct ccagcggctt ccactcgtgg gcgcggacgt ccgtgagctc cgtgtccgcc 180tcacccagcc gcttccgcgg cgccgcctcg agcaccgact cgctagacac cctaagcaac 240ggcccagagg gctgcgtggt ggcggcggtg gcggcgcgca gcgagaagga gcagctgcag 300gctctgaacg accgcttcgc gggctacatc gacaaggtga ggcagctcga ggcgcacaac 360cgcagcctgg agggcgaggc ggcggcgctg cggcagcaac aagccggccg cgccgccatg 420ggcgagctgt acgagcgcga ggtgcgcgag atgcgcggcg ccgtgctgcg cctcggggcg 480gcgcgcgggc agctgcgcct ggagcaggag cacctgctgg aggacatcgc tcacgtccgc 540cagcggctgg acgaggaggc ccggcagcgt gaggaggcgg aggcggcggc gcgcgccctg 600gcgcgcttcg cgcaggaggc ggaagcggcg cgcgtggagc tgcagaagaa ggcgcaggcg 660ctgcaggagg agtgcggcta cctgcggcgc caccaccagg aggaggtggg cgagctgctc 720ggtcagatcc agggctgcgg ggccgcgcag gcgcaggctc aggccgaggc tcgcgacgcc 780ctcaagtgcg acgtgacgtc ggcgctgcgg gagatccgcg cgcagctcga aggccacgcg 840gtgcagagca cgctgcagtc cgaggagtgg ttccgagtga ggttggaccg actctcagag 900gcagccaaag tgaacacaga tgctatgcgc tcggcccaag aggagataac tgagtaccgg 960cggcagctgc aagccaggac cacagagttg gaggccctga aaagcaccaa ggagtcactg 1020gagaggcagc gctctgagct agaggaccgt catcaggcag acattgcctc ctaccaggac 1080gctattcagc agctggacag tgagctgaga aacaccaagt gggagatggc tgcacagctc 1140cgagagtacc aggacctgct caacgtcaag atggccctgg acattgagat tgccgcttac 1200agaaagctcc tggaaggcga agagtgtcgg attggctttg gtccgagtcc cttctctctt 1260actgaaggac tcccaaaaat tccctccata tccacgcaca taaaagtcaa aagcgaagag 1320atgataaagg tagtagagaa atccgagaag gaaactgtga ttgtagaagg acagacagaa 1380gagatccggg tgacggaagg agtgacagaa gaggaggaca aagaggccca aggtcaggaa 1440ggagaagaag cagaagaggg agaagaaaaa gaagaagagg aaggagcagc agctacatct 1500ccccctgcag aagaggctgc atctccagaa aaagaaacca agtctcgtgt gaaagaagag 1560gccaagtccc caggtgaggc caagtcccca ggtgaggcca agtccccagg tgaggccaag 1620tccccagctg aggccaagtc 
cccaggtgag gccaagtccc cacgtgaggc caagtcccca 1680ggtgaggcca agtctccagc tgagcccaag tctccagctg agcccaagtc tccagctgag 1740gccaagtcac cagctgagcc caagtctcca gctacagtga agtctccagg tgaggccaag 1800tcaccatctg aggccaaatc tccagctgaa gccaaatctc cagctgaggc caaatctcca 1860gctgaggcca aatctccagc tgaggccaag tcaccagctg aagccaagtc accagctgaa 1920gccaaatctc cagctacagt gaagtctcca ggtgaggcca agtcaccatc tgaggccaaa 1980tctccagctg aagccaaatc tccagctgag gccaaatctc cagctgaggc caaatctcca 2040gctgaggtca agtcaccagg tgaggccaag tctccagctg agcccaagtc accagctgag 2100gccaaatctc cagctgcagt gaagtcacca gctgaggcca agtctccagc tgcagtcaag 2160tccccaggtg aggccaagtc cccaggtgag gccaagtcac cagctgaggc caaatctcca 2220gctgaggcca agtcaccaat tgaggtaaaa tctccagaga aggccaagac ccccgtcaag 2280gaaggagcaa aatctccagc tgaggccaag tctcctgaga aggccaagtc ccccgtgaag 2340gaagatatca agcccccagc tgaggcgaaa tcccctgaga aggccaagag ccccatgaag 2400gaaggagcaa agcctcctga gaaggccaag cctctagatg tgaagtctcc ggaagcccag 2460actccagtac aggaggaagc gaacgacccc acagacatca gaccccctga gcaggtgaaa 2520agtcctgcca aggagaaggc caagtcccct gagaaggaag aagccaagac ttctgaaaag 2580gtggctccca agaaggaaga ggtgaagtcc cctgtgaagg aggaggtaaa agccaaagaa 2640cccccaaaga aggtagaaga agagaagaca ctgcctacac caaagacaga ggcgaaggag 2700agtaagaaag acgaagctcc caaggaggcc ccgaagccca aggtggagga gaagaaggaa 2760actcccacgg aaaagcccaa ggactctaca gcagaagcca agaaggaaga ggctggagag 2820aagaagaaag ccgtggcctc agaggaggag actcctgcca agttgggtgt gaaggaagaa 2880gctaaaccca aagagaagac agagacaacc aagacagaag cagaagacac caaggccaaa 2940gaacctagca aacccacaga gacggaaaag ccaaagaaag aggagatgcc agcggcacca 3000gagaagaaag acaccaagga ggagaagacc acagagtcca ggaagcctga ggagaagccc 3060aaaatggagg ccaaggtcaa ggaggatgac aagagccttt ccaaagagcc tagcaaaccc 3120aagacagaaa aggctgaaaa atcctctagc acagaccaga aagaaagcca gcccccagag 3180aagaccacag aggacaaggc caccaaggga gagaagtaa 32195925PRTBos taurus 5Ser Tyr Thr Leu Asp Ser Leu Gly Asn Pro Ser Ala Tyr Arg Arg Val1 5 10 15Thr Glu Thr Arg Ser Ser Phe Ser Arg Ile Ser Gly Ser Pro Ser 
Ser20 25 30Gly Phe Arg Ser Gln Ser Trp Ser Arg Gly Ser Pro Ser Thr Val Ser35 40 45Ser Ser Tyr Lys Arg Ser Ala Leu Ala Pro Arg Leu Thr Tyr Ser Ser50 55 60Ala Met Leu Ser Ser Ala Glu Ser Ser Leu Asp Phe Ser Gln Ser Ser65 70 75 80Ser Leu Leu Asp Gly Gly Ser Gly Pro Gly Gly Asp Tyr Lys Leu Ser85 90 95Arg Ser Asn Glu Lys Glu Gln Ile Gln Gly Leu Asn Asp Arg Phe Ala100 105 110Gly Tyr Ile Glu Lys Val His Tyr Leu Glu Gln Gln Asn Lys Glu Ile115 120 125Glu Ala Glu Ile Gln Ala Leu Arg Gln Lys Gln Ala Ser His Ala Gln130 135 140Leu Gly Asp Ala Tyr Asp Gln Glu Ile Arg Glu Leu Arg Ala Thr Leu145 150 155 160Glu Met Val Asn His Glu Lys Ala Gln Val Gln Leu Asp Ser Asp His165 170 175Leu Glu Glu Asp Ile His Arg Leu Lys Glu Arg Phe Glu Glu Glu Ala180 185 190Arg Leu Arg Asp Asp Thr Glu Ala Ala Ile Arg Ala Leu Arg Lys Asp195 200 205Ile Glu Glu Ser Ser Leu Val Lys Val Glu Leu Asp Lys Lys Val Gln210 215 220Ser Leu Gln Asp Glu Val Ala Phe Leu Arg Ser Asn His Glu Glu Glu225 230 235 240Val Ala Asp Leu Leu Ala Gln Ile Gln Ala Ser His Ile Thr Val Glu245 250 255Arg Lys Asp Tyr Leu Lys Thr Asp Ile Ser Thr Ala Leu Lys Glu Ile260 265 270Arg Ser Gln Leu Glu Ser His Ser Asp Gln Asn Met His Gln Ala Glu275 280 285Glu Trp Phe Lys Cys Arg Tyr Ala Lys Leu Thr Glu Ala Ala Glu Gln290 295 300Asn Lys Glu Ala Ile Arg Ser Ala Lys Glu Glu Ile Ala Glu Tyr Arg305 310 315 320Arg Gln Leu Gln Ser Lys Ser Ile Glu Leu Glu Ser Val Arg Gly Thr325 330 335Lys Glu Ser Leu Glu Arg Gln Leu Ser Asp Ile Glu Glu Arg His Asn340 345 350His Asp Leu Ser Ser Tyr Gln Asp Thr Ile Gln Gln Leu Glu Asn Glu355 360 365Leu Arg Gly Thr Lys Trp Glu Met Ala Arg His Leu Arg Glu Tyr Gln370 375 380Asp Leu Leu Asn Val Lys Met Ala Leu Asp Ile Glu Ile Ala Ala Tyr385 390 395 400Arg Lys Leu Leu Glu Gly Glu Glu Thr Arg Phe Ser Thr Phe Ala Gly405 410 415Ser Ile Thr Gly Pro Leu Tyr Thr His Arg Gln Pro Ser Ile Ala Ile420 425 430Ser Ser Lys Ile Gln Lys Thr Lys Val Glu Ala Pro Lys Leu Lys Val435 440 445Gln His Lys Phe Val Glu Glu Ile Ile Glu Glu Thr Lys Val 
Glu Asp450 455 460Glu Lys Ser Glu Met Glu Glu Ala Leu Thr Ala Ile Thr Glu Glu Leu465 470 475 480Ala Val Ser Val Lys Glu Glu Val Lys Glu Glu Glu Ala Glu Glu Lys485 490 495Glu Glu Lys Glu Glu Ala Glu Glu Glu Val Val Ala Ala Lys Lys Ser500 505 510Pro Val Lys Ala Thr Ala Pro Glu Leu Lys Glu Glu Glu Gly Glu Lys515 520 525Glu Glu Glu Glu Gly Gln Glu Glu Glu Glu Glu Glu Glu Glu Ala Ala530 535 540Lys Ser Asp Gln Ala Glu Glu Gly Gly Ser Glu Lys Glu Gly Ser Ser545 550 555 560Glu Lys Glu Glu Gly Glu Gln Glu Glu Glu Gly Glu Thr Glu Ala Glu565 570 575Gly Glu Gly Glu Glu Ala Ala Ala Glu Ala Lys Glu Glu Lys Lys Met580 585 590Glu Glu Lys Ala Glu Glu Val Ala Pro Lys Glu Glu Leu Ala Ala Glu595 600 605Ala Lys Val Glu Lys Pro Glu Lys Ala Lys Ser Pro Val Ala Lys Ser610 615 620Pro Thr Thr Lys Ser Pro Thr Ala Lys Ser Pro Glu Ala Lys Ser Pro625 630 635 640Glu Ala Lys Ser Pro Thr Ala Lys Ser Pro Thr Ala Lys Ser Pro Val645 650 655Ala Lys Ser Pro Thr Ala Lys Ser Pro Glu Ala Lys Ser Pro Glu Ala660 665 670Lys Ser Pro Thr Ala Lys Ser Pro Thr Ala Lys Ser Pro Ala Ala Lys675 680 685Ser Pro Ala Pro Lys Ser Pro Val Glu Glu Val Lys Pro Lys Ala Glu690 695 700Ala Gly Ala Glu Lys Gly Glu Gln Lys Glu Lys Val Glu Glu Glu Lys705 710 715 720Lys Glu Ala Lys Glu Ser Pro Lys Glu Glu Lys Ala Glu Lys Lys Glu725 730 735Glu Lys Pro Lys Asp Val Pro Glu Lys Lys Lys Ala Glu Ser Pro Val740 745 750Lys Ala Glu Ser Pro Val Lys Glu Glu Val Pro Ala Lys Pro Val Lys755 760 765Val Ser Pro Glu Lys Glu Ala Lys Glu Glu Glu Lys Pro Gln Glu Lys770 775 780Glu Lys Glu Lys Glu Lys Val Glu Glu Val Gly Gly Lys Glu Glu Gly785 790 795 800Gly Leu Lys Glu Ser Arg Lys Glu Asp Ile Ala Ile Asn Gly Glu Val805 810 815Glu Gly Lys Glu Glu Glu Gln Glu Thr Lys Glu Lys Gly Ser Gly Gly820 825 830Glu Glu Glu Lys Gly Val Val Thr Asn Gly Leu Asp Val Ser Pro Gly835 840 845Asp Glu Lys Lys Gly Gly Asp Lys Ser Glu Glu Lys Val Val Val Thr850 855 860Lys Met Val Glu Lys Ile Thr Ser Glu Gly Gly Asp Gly Ala Thr Lys865 870 875 880Tyr Ile Thr Lys Ser Val Thr Val 
Thr Gln Lys Val Glu Glu His Glu885 890 895Glu Thr Phe Glu Glu Lys Leu Val Ser Thr Lys Lys Val Glu Lys Val900 905 910Thr Ser His Ala Ile Val Lys Glu Val Thr Gln Ser Asp915 920 92562433DNABos taurus 6gagaaggtcc actacctgga gcagcagaac aaggagatcg aggcagagat ccaggcgctg 60cggcagaagc aggcctcgca cgcccagctg ggcgacgcgt acgaccagga aatccgcgag 120ctacgcgcca ccctggagat ggtgaaccat gagaaggctc aggtacagct ggactcggac 180cacctggaag aggatatcca ccggctcaag gagcgcttcg aggaggaggc acggctgcgc 240gacgacaccg aggcggctat ccgcgcgctg cgcaaagata tcgaggagtc gtcgctggtc 300aaggtggagc tggacaagaa ggtgcagtcg ctgcaggatg aggtggcctt cctgcggagc 360aatcacgagg aggaggtggc cgacctgctg gcccagatcc aagcgtcgca catcacggtg 420gagcgcaaag actacctgaa gacggacatc tcgacggcgc tgaaagagat ccgctcccag 480ctcgagagtc actccgacca gaacatgcac caggccgaag agtggtttaa gtgccgctac 540gccaagctca ccgaggcggc cgagcagaac aaggaagcca tccgctccgc caaggaagag 600atcgccgagt accggcgcca gctgcagtcc aagagcatcg agctcgagtc agtgcgcggc 660accaaggagt ccctggagcg gcagctcagc gacatcgagg agcgccacaa ccacgacctt 720agcagctacc aggacaccat ccagcagctg gaaaatgagc ttcggggcac aaagtgggaa 780atggctcgtc atctgcgaga ataccaggat ctcctcaacg tcaagatggc tctggatatt 840gagatcgcgg cgtacaggaa actcctggag ggtgaagaga ccagatttag cacatttgcg 900ggtagcatca ctgggccact gtatacacac cgacagccct ccatcgccat atccagtaag 960attcagaaaa ccaaggtaga ggctcccaag ctaaaggtcc aacacaaatt tgttgaggag 1020attatagagg aaaccaaggt ggaagatgag aaatcagaaa tggaagaagc cctgacggcc 1080attaccgagg aattggccgt ttccgtgaaa gaggaggtca aggaagagga ggctgaagaa 1140aaggaggaga aagaagaagc cgaagaagaa gttgttgctg ccaaaaagtc tccagtgaaa 1200gctactgcac ctgaacttaa agaagaggaa ggagaaaagg aggaggaaga gggccaagag 1260gaagaggaag aggaagaaga ggctgctaag tcagaccaag ccgaggaagg aggatctgag 1320aaggaaggtt ctagtgaaaa agaggaaggt gagcaagaag aggaaggaga aacagaggct 1380gagggggaag gagaggaagc cgctgccgaa gctaaggagg aaaagaaaat ggaggaaaag 1440gctgaagaag tggctccaaa ggaggagctg gcggcagaag ccaaggtgga gaagccagag 1500aaagccaagt ccccagtggc caagtcccca acaacaaagt ccccaacggc caagtcccca 
1560gaggcaaagt ccccagaggc aaagtcccca acagcaaaat ccccgacggc caagtcccca 1620gtggccaagt ccccgacggc caagtcccca gaggcaaagt ccccagaggc aaagtcccca 1680acagcaaaat ccccgacggc caagtcccca gcagcaaagt ccccagcgcc aaaatcacct 1740gtggaggaag tgaaacccaa agcagaagct ggagctgaga aaggagaaca gaaggagaag 1800gtggaggaag aaaagaaaga agcaaaggaa tctcccaagg aagagaaggc agagaaaaag 1860gaggagaagc caaaggatgt gccagagaag aagaaggctg aatccccagt gaaggctgag 1920tccccagtga aggaggaggt gcctgccaag ccagtaaagg tgagcccaga gaaggaagcc 1980aaagaggagg agaagccaca ggagaaagag aaggagaagg agaaagtgga agaggtggga 2040gggaaggagg agggaggttt gaaggaatcc aggaaggaag acatagccat caatggggag 2100gtggaaggga aggaggaaga acaggaaact aaggagaaag gcagtggggg agaagaggag 2160aaaggagtcg tcaccaacgg cctagacgtg agcccagggg atgaaaagaa gggcggtgat 2220aaaagtgagg agaaagtggt ggtaaccaaa atggtggaaa aaatcaccag tgagggggga 2280gatggtgcta ccaagtatat caccaaatct gtaaccgtca ctcaaaaggt cgaagagcat 2340gaagagacct ttgaggagaa actagtgtct actaaaaagg tagagaaagt cacttcacac 2400gccatagtaa aggaagtcac ccagagtgac taa 24337857PRTGallus gallus 7Ser Tyr Thr Met Glu Pro Leu Gly Asn Pro Ser Tyr Arg Arg Val Met1 5 10 15Thr Glu Thr Arg Ala Thr Tyr Ser Arg Ala Ser Ala Ser Pro Ser Ser20 25 30Gly Phe Arg Ser Gln Ser Trp Ser Arg Gly Ser Gly Ser Thr Val Ser35 40 45Ser Ser Tyr Lys Arg Thr Asn Leu Gly Ala Pro Arg Thr Ala Tyr Gly50 55 60Ser Thr Val Leu Ser Ser Ala Glu Ser Leu Asp Val Ser Gln Ser Ser65 70 75 80Leu Leu Asn Gly Ala Ala Glu Leu Lys Leu Ser Arg Ser Asn Glu Lys85 90 95Glu Gln Leu Gln Gly Leu Asn Asp Arg Phe Ala Gly Tyr Ile Glu Lys100 105 110Val His Tyr Leu Glu Gln Gln Asn Lys Glu Ile Glu Ala Glu Leu Ala115 120 125Ala Leu Arg Gln Lys His Ala Gly Arg Ala Gln Leu Gly Asp Ala Tyr130 135 140Glu Gln Glu Leu Arg Glu Leu Arg Gly Ala Leu Glu Gln Val Ser His145 150 155 160Glu Lys Ala Gln Ile Gln Leu Asp Ser Glu His Ile Glu Glu Asp Ile165 170 175Gln Arg Leu Arg Glu Arg Phe Glu Asp Glu Ala Arg Leu Arg Asp Glu180 185 190Thr Glu Ala Thr Ile Ala Ala Leu Arg Lys Glu Met Glu Glu Ala Ser195 200 
205Leu Met Arg Ala Glu Leu Asp Lys Lys Val Gln Ser Leu Gln Asp Glu210 215 220Val Ala Phe Leu Arg Gly Asn His Glu Glu Glu Val Ala Glu Leu Leu225 230 235 240Ala Gln Leu Gln Ala Ser His Ala Thr Val Glu Arg Lys Asp Tyr Leu245 250 255Lys Thr Asp Leu Thr Thr Ala Leu Lys Glu Ile Arg Ala Gln Leu Glu260 265 270Cys Gln Ser Asp His Asn Met His Gln Ala Glu Glu Trp Phe Lys Cys275 280 285Arg Tyr Ala Lys Leu Thr Glu Ala Ala Glu Gln Asn Lys Glu Ala Ile290 295 300Arg Ser Ala Lys Glu Glu Ile Ala Glu Tyr Arg Arg Gln Leu Gln Ser305 310 315 320Lys Ser Ile Glu Leu Glu Ser Val Arg Gly Thr Lys Glu Ser Leu Glu325 330 335Arg Gln Leu Ser Asp Ile Glu Glu Arg His Asn Asn Asp Leu Thr Thr340 345 350Tyr Gln Asp Thr Ile His Gln Leu Glu Asn Glu Leu Arg Gly Thr Lys355 360 365Trp Glu Met Ala Arg His Leu Arg Glu Tyr Gln Asp Leu Leu Asn Val370 375 380Lys Met Ala Leu Asp Ile Glu Ile Ala Ala Tyr Arg Lys Leu Leu Glu385 390 395 400Gly Glu Glu Thr Arg Phe Ser Ala Phe Ser Gly Ser Ile Thr Gly Pro405 410 415Ile Phe Thr His Arg Gln Pro Ser Val Thr Ile Ala Ser Thr Lys Ile420 425 430Gln Lys Thr Lys Ile Glu Pro Pro Lys Leu Lys Val Gln His Lys Phe435 440 445Val Glu Glu Ile Ile Glu Glu Thr Lys Val Glu Asp Glu Lys Ser Glu450 455 460Met Glu Asp Ala Leu Ser Ala Ile Ala Glu Glu Met Ala Ala Lys Ala465 470 475 480Gln Glu Glu Glu Gln Glu Glu Glu Lys Ala Glu Glu Glu Ala Val Glu485 490 495Glu Glu Ala Val Ser Glu Lys Ala Ala Glu Gln Ala Ala Glu Glu Glu500 505 510Glu Lys Glu Glu Glu Glu Ala Glu Glu Glu Glu Ala Ala Lys Ser Asp515 520 525Ala Ala Glu Glu Gly Gly Ser Lys Lys Glu Glu Ile Glu Glu Lys Glu530 535 540Glu Gly Glu Glu Ala Glu Glu Glu Glu Ala Glu Ala Lys Gly Lys Ala545 550 555 560Glu Glu Ala Gly Ala Lys Val Glu Lys Val Lys Ser Pro Pro Ala Lys565 570 575Ser Pro Pro Lys Ser Pro Pro Lys Ser Pro Val Thr Glu Gln Ala Lys580 585 590Ala Val Gln Lys Ala Ala Ala Glu Val Gly Lys Asp Gln Lys Ala Glu595 600 605Lys Ala Ala Glu Lys Ala Ala Lys Glu Glu Lys Ala Ala Ser Pro Glu610 615 620Lys Pro Ala Thr Pro Lys Val Thr Ser Pro Glu Lys Pro 
Ala Thr Pro625 630 635 640Glu Lys Pro Pro Thr Pro Glu Lys Ala Ile Thr Pro Glu Lys Val Arg645 650 655Ser Pro Glu Lys Pro Thr Thr Pro Glu Lys Val Val Ser Pro Glu Lys660 665 670Pro Ala Ser Pro Glu Lys Pro Arg Thr Pro Glu Lys Pro Ala Ser Pro675 680 685Glu Lys Pro Ala Thr Pro Glu Lys Pro Arg Thr Pro Glu Lys Pro Ala690
695 700Thr Pro Glu Lys Pro Arg Ser Pro Glu Lys Pro Ser Ser Pro Leu Lys705 710 715 720Asp Glu Lys Ala Val Val Glu Glu Ser Ile Thr Val Thr Lys Val Thr725 730 735Lys Val Thr Ala Glu Val Glu Val Ser Lys Glu Ala Arg Lys Glu Asp740 745 750Ile Ala Val Asn Gly Glu Val Glu Glu Lys Lys Asp Glu Ala Lys Glu755 760 765Lys Glu Ala Glu Glu Glu Glu Lys Gly Val Val Thr Asn Gly Leu Asp770 775 780Val Ser Pro Val Asp Glu Lys Gly Glu Lys Val Val Val Thr Lys Lys785 790 795 800Ala Glu Lys Ile Thr Ser Glu Gly Gly Asp Ser Thr Thr Thr Tyr Ile805 810 815Thr Lys Ser Val Thr Val Thr Gln Lys Val Glu Glu His Glu Glu Ser820 825 830Phe Glu Glu Lys Leu Val Ser Thr Lys Lys Val Glu Lys Val Thr Ser835 840 845His Ala Val Val Lys Glu Ile Lys Glu850 85581530DNAGallus gallus 8gacctcacca cctatcagga cacgatccat cagctggaaa atgagctcag aggaacgaag 60tgggagatgg cacgtcattt gagggagtac caggatctcc tcaatgtcaa gatggccctg 120gatatcgaaa ttgctgcata caggaagctg ctggagggtg aggagacaag attcagtgcc 180ttctctggaa gcatcactgg acccatattc acacacagac aaccatcggt cacaatagca 240tccactaaaa tacagaaaac caaaatcgag ccaccaaagc tgaaggtcca gcacaaattt 300gtagaagaaa tcattgaaga gacgaaagta gaggatgaga agtctgaaat ggaagatgcc 360ctctcagcca ttgcagaaga aatggcagca aaggctcagg aggaagaaca ggaggaggaa 420aaggcagaag aagaagctgt agaggaagaa gctgtttctg agaaggctgc agaacaggca 480gctgaggaag aagagaagga ggaagaagaa gcagaggagg aagaagctgc aaaatcagac 540gctgcagaag aaggaggctc taaaaaggaa gaaatagagg aaaaggaaga aagggaggag 600gctgaagaag aagaagctga agccaagggc aaagctgaag aggcaggtgc aaaggtagaa 660aaagtgaaat cacctcctgc aaagtcaccc cctaaatccc cccctaaatc ccctgtaaca 720gagcaagcca aggccgtcca gaaagcagca gcagaggtag gaaaggatca gaaagcagag 780aaagctgctg agaaggcagc caaggaggag aaggcagcat ccccagagaa gccggcgaca 840ccaaaggtga cctccccgga gaaaccagcg actccggaga aaccaccaac cccagagaaa 900gcgatcaccc cggagaaggt ccgttcccca gaaaaaccaa caaccccgga aaaagtggtg 960agcccagaga aaccagcaag cccagagaag ccccgaaccc cagagaaacc agcaagcccc 1020gaaaaaccgg caacaccaga gaagccccgc actcctgaaa agccagcgac gccggagaag 1080ccccgttctc 
cagagaagcc atcctccccg ctcaaagatg aaaaggctgt ggtggaggag 1140agcatcactg tcacaaaggt aacaaaagtc actgcagagg tggaggtgtc gaaggaagcc 1200aggaaagaag acattgcggt gaatggtgaa gtggaggaga agaaggatga ggcgaaggag 1260aaggaggctg aggaggaaga gaagggcgtt gtcaccaatg ggctcgatgt gagccccgtc 1320gatgagaagg gtgagaaagt tgtagtaacc aaaaaagcag agaaaatcac aagtgaagga 1380ggggacagta ctaccacgta catcacgaag tcggtgacgg tcactcagaa ggtggaggaa 1440cacgaagaga gctttgagga gaaattggtg tccactaaga aagtggagaa agttacttca 1500catgctgtag taaaagagat taaagaatga 15309915PRTHomo sapiens 9Ser Tyr Thr Leu Asp Ser Leu Gly Asn Pro Ser Ala Tyr Arg Arg Val1 5 10 15Thr Glu Thr Arg Ser Ser Phe Ser Arg Val Ser Gly Ser Pro Ser Ser20 25 30Gly Phe Arg Ser Gln Ser Trp Ser Arg Gly Ser Pro Ser Thr Val Ser35 40 45Ser Ser Tyr Lys Arg Ser Met Leu Ala Pro Arg Leu Ala Tyr Ser Ser50 55 60Ala Met Leu Ser Ser Ala Glu Ser Ser Leu Asp Phe Ser Gln Ser Ser65 70 75 80Ser Leu Leu Asn Gly Gly Ser Gly Pro Gly Gly Asp Tyr Lys Leu Ser85 90 95Arg Ser Asn Glu Lys Glu Gln Leu Gln Gly Leu Asn Asp Arg Phe Ala100 105 110Gly Tyr Ile Glu Lys Val His Tyr Leu Glu Gln Gln Asn Lys Glu Ile115 120 125Glu Ala Glu Ile Gln Ala Leu Arg Gln Lys Gln Ala Ser His Ala Gln130 135 140Leu Gly Asp Ala Tyr Asp Gln Glu Ile Arg Glu Leu Arg Ala Thr Leu145 150 155 160Glu Met Val Asn His Glu Lys Ala Gln Val Gln Leu Asp Ser Asp His165 170 175Leu Glu Glu Asp Ile His Arg Leu Lys Glu Arg Phe Glu Glu Glu Ala180 185 190Arg Leu Arg Asp Asp Thr Glu Ala Ala Ile Arg Ala Leu Arg Lys Asp195 200 205Ile Glu Glu Ala Ser Leu Val Lys Val Glu Leu Asp Lys Lys Val Gln210 215 220Ser Leu Gln Asp Glu Val Ala Phe Leu Arg Ser Asn His Glu Glu Glu225 230 235 240Val Ala Asp Leu Leu Ala Gln Ile Gln Ala Ser His Ile Thr Val Glu245 250 255Arg Lys Asp Tyr Leu Lys Thr Asp Ile Ser Thr Ala Leu Lys Glu Ile260 265 270Arg Ser Gln Leu Glu Ser His Ser Asp Gln Asn Met His Gln Ala Glu275 280 285Glu Trp Phe Lys Cys Arg Tyr Ala Lys Leu Thr Glu Ala Ala Glu Gln290 295 300Asn Lys Glu Ala Ile Arg Ser Ala Lys Glu Glu Ile Ala Glu Tyr 
Arg305 310 315 320Arg Gln Leu Gln Ser Lys Ser Ile Glu Leu Glu Ser Val Arg Gly Thr325 330 335Lys Glu Ser Leu Glu Arg Gln Leu Ser Asp Ile Glu Glu Arg His Asn340 345 350His Asp Leu Ser Ser Tyr Gln Asp Thr Ile Gln Gln Leu Glu Asn Glu355 360 365Leu Arg Gly Thr Lys Trp Glu Met Ala Arg His Leu Arg Glu Tyr Gln370 375 380Asp Leu Leu Asn Val Lys Met Ala Leu Asp Ile Glu Ile Ala Ala Tyr385 390 395 400Arg Lys Leu Leu Glu Gly Glu Glu Thr Arg Phe Ser Thr Phe Ala Gly405 410 415Ser Ile Thr Gly Pro Leu Tyr Thr His Arg Pro Pro Ile Thr Ile Ser420 425 430Ser Lys Ile Gln Lys Thr Lys Val Glu Ala Pro Lys Leu Lys Val Gln435 440 445His Lys Phe Val Glu Glu Ile Ile Glu Glu Thr Lys Val Glu Asp Glu450 455 460Lys Ser Glu Met Glu Glu Ala Leu Thr Ala Ile Thr Glu Glu Leu Ala465 470 475 480Ala Ser Met Lys Glu Glu Lys Lys Glu Ala Ala Glu Glu Lys Glu Glu485 490 495Glu Pro Glu Ala Glu Glu Glu Glu Val Ala Ala Lys Lys Ser Pro Val500 505 510Lys Ala Thr Ala Pro Glu Val Lys Glu Glu Glu Gly Glu Lys Glu Glu515 520 525Glu Glu Gly Gln Glu Glu Glu Glu Glu Glu Asp Glu Gly Ala Lys Ser530 535 540Asp Gln Ala Glu Glu Gly Gly Ser Glu Lys Glu Gly Ser Ser Glu Lys545 550 555 560Glu Glu Gly Glu Gln Glu Glu Gly Glu Thr Glu Ala Glu Ala Glu Gly565 570 575Glu Glu Ala Glu Ala Lys Glu Glu Lys Lys Val Glu Glu Lys Ser Glu580 585 590Glu Val Ala Thr Lys Glu Glu Leu Val Ala Asp Ala Lys Val Glu Lys595 600 605Pro Glu Lys Ala Lys Ser Pro Val Pro Lys Ser Pro Val Glu Glu Lys610 615 620Gly Lys Ser Pro Val Pro Lys Ser Pro Val Glu Glu Lys Gly Lys Ser625 630 635 640Pro Val Pro Lys Ser Pro Val Glu Glu Lys Gly Lys Ser Pro Val Pro645 650 655Lys Ser Pro Val Glu Glu Lys Gly Lys Ser Pro Val Ser Lys Ser Pro660 665 670Val Glu Glu Lys Ala Lys Ser Pro Val Pro Lys Ser Pro Val Glu Glu675 680 685Ala Lys Ser Lys Ala Glu Val Gly Lys Gly Glu Gln Lys Glu Glu Glu690 695 700Glu Lys Glu Val Lys Glu Ala Pro Lys Glu Glu Lys Val Glu Lys Lys705 710 715 720Glu Glu Lys Pro Lys Asp Val Pro Glu Lys Lys Lys Ala Glu Ser Pro725 730 735Val Lys Glu Glu Ala Val Ala Glu Val 
Val Thr Ile Thr Lys Ser Val740 745 750Lys Val His Leu Glu Lys Glu Thr Lys Glu Glu Gly Lys Pro Leu Gln755 760 765Gln Glu Lys Glu Lys Glu Lys Ala Gly Gly Glu Gly Gly Ser Glu Glu770 775 780Glu Gly Ser Asp Lys Gly Ala Lys Gly Ser Arg Lys Glu Asp Ile Ala785 790 795 800Val Asn Gly Glu Val Glu Gly Lys Glu Glu Val Glu Gln Glu Thr Lys805 810 815Glu Lys Gly Ser Gly Arg Glu Glu Glu Lys Gly Val Val Thr Asn Gly820 825 830Leu Asp Leu Ser Pro Ala Asp Glu Lys Lys Gly Gly Asp Lys Ser Glu835 840 845Glu Lys Val Val Val Thr Lys Thr Val Glu Lys Ile Thr Ser Glu Gly850 855 860Gly Asp Gly Ala Thr Lys Tyr Ile Thr Lys Ser Val Thr Val Thr Gln865 870 875 880Lys Val Glu Glu His Glu Glu Thr Phe Glu Glu Lys Leu Val Ser Thr885 890 895Lys Lys Val Glu Lys Val Thr Ser His Ala Ile Val Lys Glu Val Thr900 905 910Gln Ser Asp915102751DNAHomo sapiens 10atgagctaca cgttggactc gctgggcaac ccgtccgcct accggcgggt aaccgagacc 60cgctcgagct tcagccgcgt cagcggctcc ccgtccagtg gcttccgctc gcagtcgtgg 120tcccgcggct cgcccagcac cgtgtcctcc tcctataagc gcagcatgct cgccccgcgc 180ctcgcttaca gctcggccat gctcagctcc gccgagagca gccttgactt cagccagtcc 240tcgtccctgc tcaacggcgg ctccggaccc ggcggcgact acaagctgtc ccgctccaac 300gagaaggagc agctgcaggg gctgaacgac cgctttgccg gctacataga gaaggtgcac 360tacctggagc agcagaataa ggagattgag gcggagatcc aggcgctgcg gcagaagcag 420gcctcgcacg cccagctggg cgacgcgtac gaccaggaga tccgcgagct gcgcgccacc 480ctggagatgg tgaaccacga gaaggctcag gtgcagctgg actcggacca cctggaggaa 540gacatccacc ggctcaagga gcgctttgag gaggaggcgc ggttgcggga cgacactgag 600gcggccatcc gggcgctgcg caaagacatc gaggaggcgt cgctggtcaa ggtggagctg 660gacaagaagg tgcagtcgct gcaggatgag gtggccttcc tgcggagcaa ccacgaggag 720gaggtggccg accttctggc ccagatccag gcatcgcaca tcacggtgga gcgcaaagac 780tacctgaaga cagacatctc gacggcgctg aaggaaatcc gctcccagct cgaaagccac 840tcagaccaga atatgcacca ggccgaagag tggttcaaat gccgctacgc caagctcacc 900gaggcggccg agcagaacaa ggaggccatc cgctccgcca aggaagagat cgccgagtac 960cggcgccagc tgcagtccaa gagcatcgag ctagagtcgg tgcgcggcac caaggagtcc 
1020ctggagcggc agctcagcga catcgaggag cgccacaacc acgacctcag cagctaccag 1080gacaccatcc agcagctgga aaatgagctt cggggcacaa agtgggaaat ggctcgtcat 1140ttgcgcgaat accaggacct cctcaacgtc aagatggctc tggatataga aatcgctgcg 1200tacagaaaac tcctggaggg tgaagagact agatttagca catttgcagg aagcatcact 1260gggccactgt atacacaccg acccccaatc acaatatcca gtaagattca gaaaaccaag 1320gtggaagctc ccaagcttaa ggtccaacac aaatttgtcg aggagatcat agaggaaacc 1380aaagtggagg atgagaagtc agaaatggaa gaggccctga cagccattac agaggaattg 1440gccgcttcca tgaaggaaga gaagaaagaa gcagcagaag aaaaggaaga ggaacccgaa 1500gctgaagaag aagaagtagc tgccaaaaag tctccagtga aagcaactgc acctgaagtt 1560aaagaagagg aaggggaaaa ggaggaagaa gaaggccagg aagaagagga ggaagaagat 1620gagggagcta agtcagacca agccgaagag ggaggatccg agaaggaagg ctctagtgaa 1680aaagaggaag gtgagcagga agaaggagaa acagaagctg aagctgaagg agaggaagcc 1740gaagctaaag aggaaaagaa agtggaggaa aagagtgagg aagtggctac caaggaggag 1800ctggtggcag atgccaaggt ggaaaagcca gaaaaagcca agtctcctgt gccaaaatca 1860ccagtggaag agaaaggcaa gtctcctgtg cccaagtcac cagtggaaga gaaaggcaag 1920tctcctgtgc ccaagtcacc agtggaagag aaaggcaagt ctcctgtgcc gaaatcacca 1980gtggaagaga aaggcaagtc tcctgtgtca aaatcaccag tggaagagaa agccaaatct 2040cctgtgccaa aatcaccagt ggaagaggca aagtcaaaag cagaagtggg gaaaggtgaa 2100cagaaagagg aagaagaaaa ggaagtcaag gaagctccca aggaagagaa ggtagagaaa 2160aaggaagaga aaccaaagga tgtgccagag aagaagaaag ctgagtcccc tgtaaaggag 2220gaagctgtgg cagaggtggt caccatcacc aaatcggtaa aggtgcactt ggagaaagag 2280accaaagaag aggggaagcc actgcagcag gagaaagaga aggagaaagc gggaggagag 2340ggaggaagtg aggaggaagg gagtgataaa ggtgccaagg gatccaggaa ggaagacata 2400gctgtcaatg gggaggtaga aggaaaagag gaggtagagc aggagaccaa ggaaaaaggc 2460agtgggaggg aagaggagaa aggcgttgtc accaatggcc tagacttgag cccagcagat 2520gaaaagaagg ggggtgataa aagtgaggag aaagtggtgg tgaccaaaac ggtagaaaaa 2580atcaccagtg aggggggaga tggtgctacc aaatacatca ctaaatctgt aaccgtcact 2640caaaaggttg aagagcatga agagaccttt gaggagaaac tagtgtctac taaaaaggta 2700gaaaaagtca cttcacacgc catagtaaag 
gaagtcaccc agagtgacta a 275111848PRTMus musculus 11Ser Tyr Thr Leu Asp Ser Leu Gly Asn Pro Ser Ala Tyr Arg Arg Val1 5 10 15Pro Thr Glu Thr Arg Ser Ser Phe Ser Arg Val Ser Gly Ser Pro Ser20 25 30Ser Gly Phe Arg Ser Gln Ser Trp Ser Arg Gly Ser Pro Ser Thr Val35 40 45Ser Ser Ser Tyr Thr Arg Ser Ala Val Ala Pro Arg Leu Ala Tyr Ser50 55 60Ser Ala Met Leu Ser Ser Ala Glu Ser Ser Leu Asp Phe Ser Gln Ser65 70 75 80Ser Ser Leu Leu Asn Gly Gly Ser Gly Gly Asp Tyr Lys Leu Ser Arg85 90 95Ser Asn Glu Lys Glu Gln Leu Gln Gly Leu Asn Asp Arg Phe Ala Gly100 105 110Tyr Ile Glu Lys Val His Tyr Leu Glu Gln Gln Asn Lys Glu Ile Glu115 120 125Ala Glu Ile Gln Ala Leu Arg Gln Lys Gln Ala Ser His Ala Gln Leu130 135 140Gly Asp Ala Tyr Asp Gln Glu Ile Arg Glu Leu Arg Ala Thr Leu Glu145 150 155 160Met Val Asn His Glu Lys Ala Gln Val Gln Leu Asp Ser Asp His Leu165 170 175Glu Glu Asp Ile His Arg Leu Lys Glu Arg Phe Glu Glu Glu Ala Arg180 185 190Leu Arg Asp Asp Thr Glu Ala Ala Ile Arg Ala Leu Arg Lys Asp Ile195 200 205Glu Glu Ser Ser Met Val Lys Val Glu Leu Asp Lys Lys Val Gln Ser210 215 220Leu Gln Asp Glu Val Ala Phe Leu Arg Arg Asn His Glu Glu Glu Val225 230 235 240Ala Asp Leu Leu Ala Gln Ile Gln Ala Ser His Ile Thr Val Glu Arg245 250 255Lys Asp Tyr Leu Lys Thr Asp Ile Ser Thr Ala Leu Lys Glu Ile Arg260 265 270Ser Gln Leu Glu Cys His Ser Asp Gln Asn Met His Gln Ala Glu Glu275 280 285Trp Phe Lys Cys Arg Tyr Ala Lys Leu Thr Glu Ala Ala Glu Gln Asn290 295 300Lys Glu Ala Ile Arg Ser Ala Lys Glu Glu Ile Ala Glu Tyr Arg Arg305 310 315 320Gln Leu Gln Ser Lys Ser Ile Glu Leu Glu Ser Val Arg Gly Thr Lys325 330 335Glu Ser Leu Glu Arg Gln Leu Ser Asp Ile Glu Glu Arg His Asn His340 345 350Asp Leu Ser Ser Tyr Gln Asp Thr Ile Gln Gln Leu Glu Asn Glu Leu355 360 365Arg Gly Thr Lys Trp Glu Met Ala Arg His Leu Arg Glu Tyr Gln Asp370 375 380Leu Leu Asn Val Lys Met Ala Leu Asp Ile Glu Ile Ala Ala Tyr Arg385 390 395 400Lys Leu Leu Glu Gly Glu Glu Thr Arg Phe Ser Thr Phe Ser Gly Ser405 410 415Ile Thr Gly Pro Leu 
Tyr Thr His Arg Gln Pro Ser Val Thr Ile Ser420 425 430Ser Lys Ile Gln Lys Thr Lys Val Glu Ala Pro Lys Leu Lys Val Gln435 440 445His Lys Phe Val Glu Glu Ile Ile Glu Glu Thr Lys Val Glu Asp Glu450 455 460Lys Ser Glu Met Glu Glu Thr Leu Thr Ala Ile Ala Glu Glu Leu Ala465 470 475 480Ala Ser Ala Lys Glu Glu Lys Glu Glu Ala Glu Glu Lys Glu Glu Glu485 490 495Pro Glu Ala Glu Lys Ser Pro Val Lys Ser Pro Glu Ala Lys Glu Glu500 505 510Glu Glu Glu Gly Glu Lys Glu Glu Glu Glu Glu Gly Gln Glu Glu Glu515 520 525Glu Glu Glu Asp Glu Gly Val Lys Ser Asp Gln Ala Glu Glu Gly Gly530 535 540Ser Glu Lys Glu Gly Ser Ser Glu Lys Asp Glu Gly Glu Gln Glu Glu545 550 555 560Glu Glu Gly Glu Thr Glu Ala Glu Gly Glu Gly Glu Glu Ala Glu Ala565 570 575Lys Glu Glu Lys Lys Ile Glu Gly Lys Val Glu Glu Val Ala Val Lys580 585 590Glu Glu Ile Lys Val Glu Lys Pro Glu Lys Ala Lys Ser Pro Met Pro595 600 605Lys Ser Pro Val Glu Glu Val Lys Pro Lys Pro Glu Ala Lys Ala Gly610 615 620Lys Gly Glu Gln Lys Glu Glu Glu Lys Val Glu Glu Glu Lys Lys Glu625 630 635 640Val Thr Lys Glu Ser Pro Lys Glu Glu Lys Val Glu Lys Lys Glu Glu645 650 655Lys Pro Lys Asp Val Ala Asp Lys Lys Lys Ala Glu Ser Pro Val Lys660 665 670Glu Lys Ala Val Glu Glu Val Ile Thr Ile Ser Lys Ser Val Lys Val675 680 685Ser Leu Glu Lys Asp Thr Lys Glu Glu Lys Pro Gln Pro Gln Glu Lys690 695 700Val Lys Glu Lys Ala Glu Glu Glu Gly Gly Ser Glu Glu Glu Gly Ser705 710 715 720Asp Arg Ser Pro Gln Glu Ser Lys Lys Glu Asp Ile Ala Ile Asn Gly725 730 735Glu Val Glu Gly Lys Glu Glu Glu Glu Gln Glu Thr Gln Glu Lys Gly740 745 750Ser Gly Arg Glu Glu Glu Lys Gly Val Val Thr Asn Gly Leu Asp Val755 760 765Ser Pro Ala Glu Glu Lys Lys Gly Glu Asp Ser Ser Asp Asp Lys Val770 775 780Val Val Thr Lys Lys Val Glu Lys Ile Thr Ser Glu Gly Gly Asp Gly785 790 795 800Ala Thr Lys Tyr Ile Thr Lys Ser Val Thr Val Thr Gln Lys Val Glu805 810
815Glu His Glu Glu Thr Phe Glu Glu Lys Leu Val Ser Thr Lys Lys Val820 825 830Glu Lys Val Thr Ser His Ala Ile Val Lys Glu Val Thr Gln Gly Asp835 840 845122550DNAMus musculus 12atgagctaca cgctggactc gctgggcaac ccgtccgcct accggcgcgt tccaaccgag 60acccggtcca gcttcagccg cgtgagcggt tccccgtcca gcggcttccg ctcgcagtcc 120tggtcccgcg gctcgcccag caccgtgtcc tcctcctaca cgcgcagcgc ggtcgccccg 180cgtctcgcct acagctcggc tatgctcagc tcggccgaga gcagcctcga cttcagccag 240tcctcgtcgc tgctcaacgg cggctccggc ggcgactaca aactgtcccg ctctaacgag 300aaagagcagc tgcaggggct gaacgaccgc ttcgccggct acatcgagaa agtgcactac 360ttggaacaac agaacaagga gatcgaagca gagatccagg cactgcggca gaagcaggcc 420tcgcacgccc agctgggtga tgcttacgac caggagatcc gagagctgcg cgccaccctc 480gagatggtga accacgagaa ggctcaagtg cagctggact ccgatcactt ggaggaagac 540atccaccggc tcaaggagcg cttcgaggag gaggcgcggc tgcgggacga caccgaggct 600gccattcgcg cgctgcgcaa agacatcgaa gagtcgtcga tggttaaggt ggagctggac 660aagaaggtgc agtcgctgca ggatgaggtg gctttcctgc ggcgtaatca cgaagaggag 720gtggccgacc tgctggctca gatccaggcg tcgcacatca cggtagagcg caaagattac 780ctgaagacag acatctccac ggcgctgaag gagatccgct cccagctcga gtgtcactca 840gaccagaaca tgcaccaggc cgaagagtgg ttcaaatgcc gctacgccaa gctcaccgag 900gcggccgagc agaacaagga ggccattcgc tctgccaagg aagagatcgc cgagtaccgg 960cgccagctgc agtccaagag catcgagctc gagtcggtgc gaggcactaa ggagtccctg 1020gaacggcagc tcagcgacat cgaggagcgc cacaaccacg acctcagcag ctaccaggac 1080accatccagc agttggaaaa tgaacttcgg ggaaccaagt gggaaatggc tcgtcatttg 1140cgagaatacc aggatctcct taacgtcaag atggccctgg acatcgagat cgccgcgtac 1200aggaaactcc tagaggggga agagaccaga tttagcacat tttcaggaag catcaccggg 1260cctctgtaca cacaccgaca gccctcagtc acaatatcca gtaagattca gaagaccaaa 1320gtcgaggccc ccaagctcaa ggtccaacac aaatttgtgg aggagatcat cgaagaaact 1380aaagtggaag atgagaagtc agaaatggaa gaaaccctca cagccatcgc agaggagttg 1440gcagcctccg ccaaagagga gaaggaagag gccgaagaaa aggaggagga accagaagcc 1500gaaaagtctc ccgtgaagtc tcctgaggct aaggaagagg aggaggaagg ggaaaaggag 1560gaagaagagg aaggccagga 
ggaagaagag gaggaagatg aaggtgtcaa gtcagaccag 1620gcagaagagg ggggatctga gaaggaaggc tccagtgaaa aagatgaagg tgagcaggaa 1680gaagaagaag gagaaaccga ggcagaaggt gaaggagagg aagcagaagc taaggaggaa 1740aagaaaattg agggaaaggt tgaggaagtg gctgtcaagg aggaaatcaa ggtcgagaag 1800cctgagaaag ccaaatcccc tatgcccaaa tcacccgtgg aagaagtaaa gccaaaacca 1860gaggccaagg ccgggaaggg tgagcagaag gaggaagaga aagttgagga agagaagaag 1920gaagtcacca aagaatcacc caaggaagag aaggtggaga aaaaggagga gaagccaaaa 1980gatgttgcag ataaaaagaa ggccgagtcc ccggtgaaag agaaggctgt ggaggaggtg 2040atcaccatca gcaagtcggt aaaggtgagc ctggagaaag acaccaaaga ggagaagccg 2100cagccgcagg agaaggtgaa ggagaaggca gaggaggagg ggggcagtga ggaggaaggg 2160agtgaccgta gcccgcagga gtccaagaag gaagacatag ctatcaatgg ggaggtggaa 2220ggaaaagagg aggaggagca ggaaactcag gagaagggca gtgggcggga ggaggagaaa 2280ggggtggtca ctaatggctt agatgtgagc cctgcagagg agaagaaagg agaggatagc 2340agtgatgata aagtggtggt caccaagaag gtagaaaaga tcaccagcga gggaggcgat 2400ggtgctacca aatacatcac caaatctgta accgtcactc aaaaggttga agagcatgag 2460gagacctttg aggagaagct ggtctcaact aaaaaggtag aaaaggtcac ttcacacgcc 2520atagtcaagg aagtcaccca gggtgactaa 255013845PRTRattus norvegicus 13Ser Tyr Thr Leu Asp Ser Leu Gly Asn Pro Ser Ala Tyr Arg Arg Val1 5 10 15Pro Thr Glu Thr Arg Ser Ser Phe Ser Arg Val Ser Gly Ser Pro Ser20 25 30Ser Gly Phe Arg Ser Gln Ser Trp Ser Arg Gly Ser Pro Ser Thr Val35 40 45Ser Ser Ser Tyr Lys Arg Ser Ala Leu Ala Pro Arg Leu Ala Tyr Ser50 55 60Ser Ala Met Leu Ser Ser Ala Glu Ser Ser Leu Asp Phe Ser Gln Ser65 70 75 80Ser Ser Leu Leu Asn Gly Gly Ser Gly Gly Asp Tyr Lys Leu Ser Arg85 90 95Ser Asn Glu Lys Glu Gln Leu Gln Gly Leu Asn Asp Arg Phe Ala Gly100 105 110Tyr Ile Glu Lys Val His Tyr Leu Glu Gln Gln Asn Lys Glu Ile Glu115 120 125Ala Glu Ile His Ala Leu Arg Gln Lys Gln Ala Ser His Ala Gln Leu130 135 140Gly Asp Ala Tyr Asp Gln Glu Ile Arg Glu Leu Arg Ala Thr Leu Glu145 150 155 160Met Val Asn His Glu Lys Ala Gln Val Gln Leu Asp Ser Asp His Leu165 170 175Glu Glu Asp Ile His Arg Leu Lys 
Glu Arg Phe Glu Glu Glu Ala Arg180 185 190Leu Arg Asp Asp Thr Glu Ala Ala Ile Arg Ala Val Arg Lys Asp Ile195 200 205Glu Glu Ser Ser Met Val Lys Val Glu Leu Asp Lys Lys Val Gln Ser210 215 220Leu Gln Asp Glu Val Ala Phe Leu Arg Ser Asn His Glu Glu Glu Val225 230 235 240Ala Asp Leu Leu Ala Gln Ile Gln Ala Ser His Ile Thr Val Glu Arg245 250 255Lys Asp Tyr Leu Lys Thr Asp Ile Ser Thr Ala Leu Lys Glu Ile Arg260 265 270Ser Gln Leu Glu Cys His Ser Asp Gln Asn Met His Gln Ala Glu Glu275 280 285Trp Phe Lys Cys Arg Tyr Ala Lys Leu Thr Glu Ala Ala Glu Gln Asn290 295 300Lys Glu Ala Ile Arg Ser Ala Lys Glu Glu Ile Ala Glu Tyr Arg Arg305 310 315 320Gln Leu Gln Ser Lys Ser Ile Glu Leu Glu Ser Val Arg Gly Thr Lys325 330 335Glu Ser Leu Glu Arg Gln Leu Ser Asp Ile Glu Glu Arg His Asn His340 345 350Asp Leu Ser Ser Tyr Gln Asp Thr Ile Gln Gln Leu Glu Asn Glu Leu355 360 365Arg Gly Thr Lys Trp Glu Met Ala Arg His Leu Arg Glu Tyr Gln Asp370 375 380Leu Leu Asn Val Lys Met Ala Leu Asp Ile Glu Ile Ala Ala Tyr Arg385 390 395 400Lys Leu Leu Glu Gly Glu Glu Thr Arg Phe Ser Thr Phe Ser Gly Ser405 410 415Ile Thr Gly Pro Leu Tyr Thr His Arg Gln Pro Ser Val Thr Ile Ser420 425 430Ser Lys Ile Gln Lys Thr Lys Val Glu Ala Pro Lys Leu Lys Val Gln435 440 445His Lys Phe Val Glu Glu Ile Ile Glu Glu Thr Lys Val Glu Asp Glu450 455 460Lys Ser Glu Met Glu Asp Ala Leu Thr Val Ile Ala Glu Glu Leu Ala465 470 475 480Ala Ser Ala Lys Glu Glu Lys Glu Glu Ala Glu Glu Lys Glu Glu Glu485 490 495Pro Glu Val Glu Lys Ser Pro Val Lys Ser Pro Glu Ala Lys Glu Glu500 505 510Glu Glu Gly Glu Lys Glu Glu Glu Glu Glu Gly Gln Glu Glu Glu Glu515 520 525Glu Glu Asp Glu Gly Val Lys Ser Asp Gln Ala Glu Glu Gly Gly Ser530 535 540Glu Lys Glu Gly Ser Ser Glu Lys Asp Glu Gly Glu Gln Glu Glu Glu545 550 555 560Gly Glu Thr Glu Ala Glu Gly Glu Gly Glu Glu Ala Glu Ala Lys Glu565 570 575Glu Lys Lys Thr Glu Gly Lys Val Glu Glu Met Ala Ile Lys Glu Glu580 585 590Ile Lys Val Glu Lys Pro Glu Lys Ala Lys Ser Pro Val Pro Lys Ser595 600 605Pro Val Glu 
Glu Val Lys Pro Lys Pro Glu Ala Lys Ala Gly Lys Asp610 615 620Glu Gln Lys Glu Glu Glu Lys Val Glu Glu Lys Lys Glu Val Ala Lys625 630 635 640Glu Ser Pro Lys Glu Glu Lys Val Glu Lys Lys Glu Glu Lys Pro Lys645 650 655Asp Val Pro Asp Lys Lys Lys Ala Glu Ser Pro Val Lys Glu Lys Ala660 665 670Val Glu Glu Met Ile Thr Ile Thr Lys Ser Val Lys Val Ser Leu Glu675 680 685Lys Asp Thr Lys Glu Glu Lys Pro Gln Gln Gln Glu Lys Val Lys Glu690 695 700Lys Ala Glu Glu Glu Gly Gly Ser Glu Glu Glu Val Gly Asp Lys Ser705 710 715 720Pro Gln Glu Ser Lys Lys Glu Asp Ile Ala Ile Asn Gly Glu Val Glu725 730 735Gly Lys Glu Glu Glu Glu Gln Glu Thr Gln Glu Lys Gly Ser Gly Gln740 745 750Glu Glu Glu Lys Gly Val Val Thr Asn Gly Leu Asp Val Ser Pro Ala755 760 765Glu Glu Lys Lys Gly Glu Asp Arg Ser Asp Asp Lys Val Val Val Thr770 775 780Lys Lys Val Glu Lys Ile Thr Ser Glu Gly Gly Asp Gly Ala Thr Lys785 790 795 800Tyr Ile Thr Lys Ser Val Thr Val Thr Gln Lys Val Glu Glu His Glu805 810 815Glu Thr Phe Glu Glu Lys Leu Val Ser Thr Lys Lys Val Glu Lys Val820 825 830Thr Ser His Ala Ile Val Lys Glu Val Thr Gln Gly Asp835 840 845142538DNARattus norvegicus 14atgagctaca cgctggactc gctgggcaac ccgtccgcct accggcgcgt caccgagacc 60ccgtccagct tcagtcgtgt gagcggttcc ccgtccagcg gcttccgctc gcagtcctgg 120tcccgcggct cgcccagcac cgtgtcctcc tcctacaagc gcagcgcgct cgccccgcgc 180ctcgcctaca gctcggctat gctcagctcg gccgagagca gcctcgactt cagccagtcc 240tcttcgctgc ttaacggcgg ctccggcggc gactacaagc tgtcccgctc aaacgagaaa 300gagcagctgc aggggctgaa cgaccgtttc gccggctaca tcgagaaagt gcactacttg 360gaacaacaga acaaggagat cgaggcagag atccacgcgc tgcggcagaa gcaggcctcg 420cacgcccagc tgggtgacgc ttacgaccag gagatccgag agctgcgcgc caccctggag 480atggtgaatc acgagaaggc tcaagtgcag ctggactctg atcacttgga ggaagacatc 540caccggctca aggagcgctt cgaggaggag gcgcggctgc gggacgacac cgaggctgcc 600atccgggcgc tgcgcaaaga catagaggag tcgtcgatgg ttaaggtgga gctggacaag 660aaggtgcagt cgctgcagga tgaggtggcc ttcctgcgga gcaatcacga agaggaggtg 720gccgacctgc tggcccagat ccaggcgtcg cacatcaccg 
tagagcgcaa agactacctg 780aagacagaca tctccacggc gctgaaagag atccgctccc agctcgagtg tcactccgac 840cagaacatgc accaggccga agagtggttc aaatgccgct acgccaagct caccgaggcg 900gccgagcaga acaaggaggc catccgctcc gctaaagaag agatcgccga gtaccggcgc 960cagctgcagt ccaagagcat tgagctcgag tcggtgcgag gcactaagga gtccctggaa 1020cggcagctca gcgacatcga ggagcgccac aaccacgacc tcagcagcta ccaggacacc 1080atccagcagc tggaaaatga gcttcgggga acaaagtggg aaatggctcg tcatttgcga 1140gaataccagg atctccttaa cgtcaagatg gctctggaca tcgagatcgc cgcatatagg 1200aaactactgg agggtgaaga gaccagattt agcacatttt caggaagcat cactgggcct 1260ctgtacacac accgacagcc ctcagtcaca atatccagta agattcagaa gaccaaagtc 1320gaggccccca agctcaaggt ccaacacaaa tttgtggagg agatcattga ggagactaaa 1380gtggaagatg agaagtcaga aatggaagac gccctcacag tcattgcaga ggaattggca 1440gcctctgcca aagaggagaa agaagaggca gaagaaaagg aagaggaacc ggaagttgaa 1500aagtctcccg tgaagtctcc tgaggctaag gaagaggagg aaggggaaaa ggaggaagaa 1560gaggaaggcc aagaggaaga agaggaggaa gatgaaggtg tcaagtcaga ccaggcagaa 1620gagggaggat ctgagaagga aggctcgagt gaaaaggatg aaggtgagca agaagaagaa 1680ggggaaactg aggcagaagg tgaaggagag gaagcagaag ctaaggagga aaagaaaaca 1740gagggaaagg tcgaggaaat ggctatcaag gaggaaatca aggtcgagaa gcccgagaaa 1800gccaagtccc ctgtgccaaa atcacccgtg gaagaagtaa agccaaaacc agaagccaaa 1860gccggaaagg atgagcagaa ggaggaagag aaagttgagg agaagaagga ggtagccaag 1920gaatcaccca aggaagagaa ggtggagaaa aaggaggaga agccaaaaga tgtcccagat 1980aaaaagaagg ctgagtcccc agtgaaagaa aaggccgtag aggaaatgat caccattact 2040aagtcggtaa aggtgagcct ggagaaagac accaaagagg agaagcctca gcagcaggag 2100aaggtgaagg agaaggcaga ggaggagggg ggtagtgagg aggaagtggg tgacaaaagc 2160ccgcaagaat ccaagaagga agacatagct atcaatgggg aggtggaagg aaaagaggag 2220gaggagcagg aaactcagga gaagggcagt gggcaagagg aggagaaagg ggtggtcact 2280aatggcttag atgtgagccc tgcggaggaa aagaaagggg aggatagaag tgatgacaaa 2340gtggtggtga ccaagaaggt agaaaaaatc accagcgagg gaggcgatgg tgctaccaaa 2400tacatcacca aatctgttac tgtcactcaa aaggttgaag agcatgagga gacctttgag 2460gagaagctgg 
tgtcaactaa aaaggtagaa aaggtcactt cacatgccat agtcaaggaa 2520gtcacccagg gtgactaa 253815644PRTOryctolagus cuniculus 15Val Lys Val Glu Leu Asp Lys Lys Val Gln Ser Leu Gln Asp Glu Val1 5 10 15Ala Phe Leu Arg Thr Asn His Glu Glu Glu Val Ala Asp Leu Leu Ala20 25 30Gln Ile Gln Ala Ser His Ile Thr Val Glu Arg Lys Asp Tyr Leu Lys35 40 45Thr Asp Ile Ser Ser Ala Leu Lys Glu Ile Arg Ser Gln Leu Glu Cys50 55 60His Ser Asp Gln Asn Met His Gln Ala Glu Glu Trp Phe Lys Cys Arg65 70 75 80Tyr Ala Lys Leu Thr Glu Ala Ala Glu Gln Asn Lys Glu Ala Ile Arg85 90 95Ser Ala Lys Glu Glu Ile Ala Glu Tyr Arg Arg Gln Leu Gln Ser Lys100 105 110Ser Ile Glu Leu Glu Ser Val Ala Trp His Lys Glu Ser Leu Glu Arg115 120 125His Val Ser Asp Ile Glu Glu Arg His Asn His Asp Leu Ser Ser Tyr130 135 140Gln Asp Thr Ile Gln Gln Leu Glu Asn Glu Leu Arg Gly Thr Lys Trp145 150 155 160Glu Met Ala Arg His Leu Arg Glu Tyr Gln Asp Leu Leu Asn Val Lys165 170 175Met Ala Leu Asp Ile Glu Ile Ala Ala Tyr Arg Lys Leu Leu Glu Gly180 185 190Glu Glu Thr Arg Phe Ser Thr Phe Ser Gly Ser Ile Thr Gly Pro Leu195 200 205Tyr Thr His Arg Gln Pro Ser Val Thr Ile Ser Ser Lys Ile Gln Lys210 215 220Thr Lys Val Glu Ala Pro Lys Leu Lys Val Gln His Lys Phe Val Glu225 230 235 240Glu Ile Ile Glu Glu Thr Lys Val Glu Asp Glu Lys Ser Glu Met Glu245 250 255Asp Ala Leu Thr Ala Ile Ala Glu Glu Leu Ala Val Ser Val Lys Glu260 265 270Glu Glu Lys Glu Glu Glu Ala Glu Gly Lys Glu Glu Glu Gln Glu Ala275 280 285Glu Glu Glu Val Ala Ala Ala Lys Lys Ser Pro Val Lys Ala Thr Thr290 295 300Pro Glu Ile Lys Glu Glu Glu Gly Glu Lys Glu Glu Glu Gly Gln Glu305 310 315 320Glu Glu Glu Glu Glu Glu Asp Glu Gly Val Lys Ser Asp Gln Ala Glu325 330 335Glu Gly Gly Ser Glu Lys Glu Gly Ser Ser Lys Asn Glu Gly Glu Gln340 345 350Glu Glu Gly Glu Thr Glu Ala Glu Gly Glu Val Glu Glu Ala Glu Ala355 360 365Lys Glu Glu Lys Lys Thr Glu Glu Lys Ser Glu Glu Val Ala Ala Lys370 375 380Glu Glu Pro Val Thr Glu Ala Lys Val Gly Lys Pro Glu Lys Ala Lys385 390 395 400Ser Pro Val Pro Lys Ser Pro Val 
Glu Glu Val Lys Pro Lys Ala Glu405 410 415Ala Thr Ala Gly Lys Gly Glu Gln Lys Glu Glu Glu Glu Lys Val Glu420 425 430Glu Glu Lys Lys Lys Ala Ala Lys Glu Ser Pro Lys Glu Glu Lys Val435 440 445Glu Lys Lys Glu Glu Lys Pro Lys Asp Val Pro Lys Lys Lys Ala Glu450 455 460Ser Pro Val Lys Glu Glu Ala Ala Glu Glu Ala Ala Thr Ile Thr Lys465 470 475 480Pro Thr Lys Val Gly Leu Glu Lys Glu Thr Lys Glu Gly Glu Lys Pro485 490 495Leu Gln Gln Glu Lys Glu Lys Glu Lys Ala Gly Glu Glu Gly Gly Ser500 505 510Glu Glu Glu Gly Ser Asp Gln Gly Ser Lys Arg Ala Lys Lys Glu Asp515 520 525Ile Ala Val Asn Gly Glu Gly Glu Gly Lys Glu Glu Glu Glu Pro Glu530 535 540Thr Lys Glu Lys Gly Ser Gly Arg Glu Glu Glu Lys Gly Val Val Thr545 550 555 560Asn Gly Leu Asp Leu Ser Pro Ala Asp Glu Lys Lys Gly Gly Asp Arg565 570 575Ser Glu Glu Lys Val Val Val Thr Lys Lys Val Glu Lys Ile Thr Thr580 585 590Glu Gly Gly Asp Gly Ala Thr Lys Tyr Ile Thr Lys Ser Val Thr Ala595 600 605Gln Lys Val Glu Glu His Glu Glu Thr Phe Glu Glu Lys Leu Val Ser610 615 620Thr Lys Lys Val Glu Lys Val Thr Ser His Ala Ile Val Lys Glu Val625 630 635 640Thr Gln Ser Asp161938DNAOryctolagus cuniculus 16tggtcaaggt ggagctggac aagaaggtcc agtcgctgca ggatgaggtg gccttcctgc 60ggacgaacca cgaggaggag gtagcggacc tgctggccca gatccaggcg tcgcacatca 120cggtggagcg caaagactac ctgaagacgg acatctcgtc ggcgctgaag gagatccgct 180cccagctcga gtgccactcc gaccagaaca tgcatcaggc cgaagagtgg tttaagtgcc 240gctacgccaa gctcaccgaa gccgccgagc agaacaagga ggccatccgc tccgccaagg 300aagagatcgc cgagtaccgg cgccagctgc agtccaagag catcgagctc gagtcggtcg 360cgtggcacaa ggagtccctg gagcggcacg tcagcgacat cgaggagcgc cacaaccacg 420acctcagcag ctaccaggac accattcagc agctggaaaa tgagcttcgg ggaacgaagt 480gggaaatggc ccgccacttg cgagagtacc aggatctcct caatgtcaag atggctctgg 540atatcgagat cgcagcctac agaaaactcc tggagggtga agagaccaga ttcagcacat 600tttcaggaag catcactggg ccactgtata cacaccgaca gccctcagtc accatatcca 660gtaagattca gaagacaaag gtggaagctc ccaagctcaa agtccaacac aaatttgttg 720aggagatcat agaggaaacc aaagtggagg 
atgagaagtc agaaatggaa gatgcactga 780cagccattgc agaggaactg gccgtgtctg tgaaggaaga ggagaaggaa gaagaggcag 840aaggaaagga agaggagcaa gaagctgaag aagaagttgc agctgccaag aagtctccag 900tgaaggctac cacacccgag attaaagagg aagaagggga aaaggaagaa gaaggccagg 960aggaggaaga agaggaagaa gatgaaggtg ttaagtcaga ccaagctgaa gagggaggat 1020cagagaagga aggctctagc aagaacgagg gtgagcagga agaaggagaa accgaggctg 1080aaggtgaagt agaagaagca gaagccaagg aggaaaagaa aaccgaggag aagagtgaag
1140aagtggctgc taaagaggag ccagtgacag aagccaaggt gggaaagcca gagaaagcca 1200agtcccctgt gccaaaatca ccagtggaag aggtgaagcc aaaagctgaa gccacagcag 1260ggaaagggga gcagaaagag gaagaagaga aggttgagga agaaaagaaa aaggcagcca 1320aggaatctcc aaaggaagag aaggtggaga agaaggagga gaaaccaaaa gatgtgccaa 1380agaagaaagc tgaatccccg gtaaaagagg aggccgcaga agaggctgcc accatcacca 1440aacccacaaa ggtgggcttg gagaaagaga ccaaagaagg ggagaagccg ctgcagcagg 1500agaaggaaaa ggagaaagca ggagaggagg gagggagtga ggaggaaggg agcgaccagg 1560ggtcaaagag ggccaagaag gaagacatag cagtcaatgg ggagggcgaa gggaaagagg 1620aggaagagcc ggagaccaag gaaaagggca gtgggcgaga agaggagaaa ggcgtcgtca 1680ccaatgggtt agacctgagc ccagcagacg agaagaaggg gggtgacaga agcgaggaga 1740aagtggtggt gaccaaaaag gtagaaaaaa tcaccactga ggggggcgat ggtgctacca 1800aatacatcac taaatctgta accgctcaaa aggtcgaaga gcatgaagag acctttgagg 1860agaaactagt gtctactaaa aaggtagaaa aagtcacttc acacgccatt gtaaaggaag 1920tcacccagag tgactaag 193817424PRTEnterobacteria phage M13 17Met Lys Lys Leu Leu Phe Ala Ile Pro Leu Val Val Pro Phe Tyr Ser1 5 10 15His Ser Ala Glu Thr Val Glu Ser Cys Leu Ala Lys Pro His Thr Glu20 25 30Asn Ser Phe Thr Asn Val Trp Lys Asp Asp Lys Thr Leu Asp Arg Tyr35 40 45Ala Asn Tyr Glu Gly Cys Leu Trp Asn Ala Thr Gly Val Val Val Cys50 55 60Thr Gly Asp Glu Thr Gln Cys Tyr Gly Thr Trp Val Pro Ile Gly Leu65 70 75 80Ala Ile Pro Glu Asn Glu Gly Gly Gly Ser Glu Gly Gly Gly Ser Glu85 90 95Gly Gly Gly Ser Glu Gly Gly Gly Thr Lys Pro Pro Glu Tyr Gly Asp100 105 110Thr Pro Ile Pro Gly Tyr Thr Tyr Ile Asn Pro Leu Asp Gly Thr Tyr115 120 125Pro Pro Gly Thr Glu Gln Asn Pro Ala Asn Pro Asn Pro Ser Leu Glu130 135 140Glu Ser Gln Pro Leu Asn Thr Phe Met Phe Gln Asn Asn Arg Phe Arg145 150 155 160Asn Arg Gln Gly Ala Leu Thr Val Tyr Thr Gly Thr Val Thr Gln Gly165 170 175Thr Asp Pro Val Lys Thr Tyr Tyr Gln Tyr Thr Pro Val Ser Ser Lys180 185 190Ala Met Tyr Asp Ala Tyr Trp Asn Gly Lys Phe Arg Asp Cys Ala Phe195 200 205His Ser Gly Phe Asn Glu Asp Pro Phe Val Cys Glu Tyr Gln Gly Gln210 215 
220Ser Ser Asp Leu Pro Gln Pro Pro Val Asn Ala Gly Gly Gly Ser Gly225 230 235 240Gly Gly Ser Gly Gly Gly Ser Glu Gly Gly Gly Ser Glu Gly Gly Gly245 250 255Ser Glu Gly Gly Gly Ser Glu Gly Gly Gly Ser Gly Gly Gly Ser Gly260 265 270Ser Gly Asp Phe Asp Tyr Glu Lys Met Ala Asn Ala Asn Lys Gly Ala275 280 285Met Thr Glu Asn Ala Asp Glu Asn Ala Leu Gln Ser Asp Ala Lys Gly290 295 300Lys Leu Asp Ser Val Ala Thr Asp Tyr Gly Ala Ala Ile Asp Gly Phe305 310 315 320Ile Gly Asp Val Ser Gly Leu Ala Asn Gly Asn Gly Ala Thr Gly Asp325 330 335Phe Ala Gly Ser Asn Ser Gln Met Ala Gln Val Gly Asp Gly Asp Asn340 345 350Ser Pro Leu Met Asn Asn Phe Arg Gln Tyr Leu Pro Ser Leu Pro Gln355 360 365Ser Val Glu Cys Arg Pro Phe Val Phe Ser Ala Gly Lys Pro Tyr Glu370 375 380Phe Ser Ile Asp Cys Asp Lys Ile Asn Leu Phe Arg Gly Val Phe Ala385 390 395 400Phe Leu Leu Tyr Val Ala Thr Phe Met Tyr Val Phe Ser Thr Phe Ala405 410 415Asn Ile Leu Arg Asn Lys Glu Ser420181275DNAEnterobacteria phage M13 18atgaaaaaat tattattcgc aattccttta gttgttcctt tctattctca ctccgctgaa 60actgttgaaa gttgtttagc aaaaccccat acagaaaatt catttactaa cgtctggaaa 120gacgacaaaa ctttagatcg ttacgctaac tatgagggtt gtctgtggaa tgctacaggc 180gttgtagttt gtactggtga cgaaactcag tgttacggta catgggttcc tattgggctt 240gctatccctg aaaatgaggg tggtggctct gagggtggcg gttctgaggg tggcggttct 300gagggtggcg gtactaaacc tcctgagtac ggtgatacac ctattccggg ctatacttat 360atcaaccctc tcgacggcac ttatccgcct ggtactgagc aaaaccccgc taatcctaat 420ccttctcttg aggagtctca gcctcttaat actttcatgt ttcagaataa taggttccga 480aataggcagg gggcattaac tgtttatacg ggcactgtta ctcaaggcac tgaccccgtt 540aaaacttatt accagtacac tcctgtatca tcaaaagcca tgtatgacgc ttactggaac 600ggtaaattca gagactgcgc tttccattct ggctttaatg aggatccatt cgtttgtgaa 660tatcaaggcc aatcgtctga cctgcctcaa cctcctgtca atgctggcgg cggctctggt 720ggtggttctg gtggcggctc tgagggtggt ggctctgagg gtggcggttc tgagggtggc 780ggctctgagg gaggcggttc cggtggtggc tctggttccg gtgattttga ttatgaaaag 840atggcaaacg ctaataaggg ggctatgacc gaaaatgccg atgaaaacgc 
gctacagtct 900gacgctaaag gcaaacttga ttctgtcgct actgattacg gtgctgctat cgatggtttc 960attggtgacg tttccggcct tgctaatggt aatggtgcta ctggtgattt tgctggctct 1020aattcccaaa tggctcaagt cggtgacggt gataattcac ctttaatgaa taatttccgt 1080caatatttac cttccctccc tcaatcggtt gaatgtcgcc cttttgtctt tagcgctggt 1140aaaccatatg aattttctat tgattgtgac aaaataaact tattccgtgg tgtctttgcg 1200tttcttttat atgttgccac ctttatgtat gtattttcta cgtttgctaa catactgcgt 1260aataaggagt cttaa 127519720PRTSaccharomyces cerevisiae 19Met Ala Lys Arg Val Ala Asp Ala Gln Ile Gln Arg Glu Thr Tyr Asp1 5 10 15Ser Asn Glu Ser Asp Asp Asp Val Thr Pro Ser Thr Lys Val Ala Ser20 25 30Ser Ala Val Met Asn Arg Arg Lys Ile Ala Met Pro Lys Arg Arg Met35 40 45Ala Phe Lys Pro Phe Gly Ser Ala Lys Ser Asp Glu Thr Lys Gln Ala50 55 60Ser Ser Phe Ser Phe Leu Asn Arg Ala Asp Gly Thr Gly Glu Ala Gln65 70 75 80Val Asp Asn Ser Pro Thr Thr Glu Ser Asn Ser Arg Leu Lys Ala Leu85 90 95Asn Leu Gln Phe Lys Ala Lys Val Asp Asp Leu Val Leu Gly Lys Pro100 105 110Leu Ala Asp Leu Arg Pro Leu Phe Thr Arg Tyr Glu Leu Tyr Ile Lys115 120 125Asn Ile Leu Glu Ala Pro Val Lys Ser Ile Glu Asn Pro Thr Gln Thr130 135 140Lys Gly Asn Asp Ala Lys Pro Ala Lys Val Glu Asp Val Gln Lys Ser145 150 155 160Ser Asp Ser Ser Ser Glu Asp Glu Val Lys Val Glu Gly Pro Lys Phe165 170 175Thr Ile Asp Ala Lys Pro Pro Ile Ser Asp Ser Val Phe Ser Phe Gly180 185 190Pro Lys Lys Glu Asn Arg Lys Lys Asp Glu Ser Asp Ser Glu Asn Asp195 200 205Ile Glu Ile Lys Gly Pro Glu Phe Lys Phe Ser Gly Thr Val Ser Ser210 215 220Asp Val Phe Lys Leu Asn Pro Ser Thr Asp Lys Asn Glu Lys Lys Thr225 230 235 240Glu Thr Asn Ala Lys Pro Phe Ser Phe Ser Ser Ala Thr Ser Thr Thr245 250 255Glu Gln Thr Lys Ser Lys Asn Pro Leu Ser Leu Thr Glu Ala Thr Lys260 265 270Thr Asn Val Asp Asn Asn Ser Lys Ala Glu Ala Ser Phe Thr Phe Gly275 280 285Thr Lys His Ala Ala Asp Ser Gln Asn Asn Lys Pro Ser Phe Val Phe290 295 300Gly Gln Ala Ala Ala Lys Pro Ser Leu Glu Lys Ser Ser Phe Thr Phe305 310 315 320Gly Ser Thr Thr Ile Glu Lys Lys 
Asn Asp Glu Asn Ser Thr Ser Asn325 330 335Ser Lys Pro Glu Lys Ser Ser Asp Ser Asn Asp Ser Asn Pro Ser Phe340 345 350Ser Phe Ser Ile Pro Ser Lys Asn Thr Pro Asp Ala Ser Lys Pro Ser355 360 365Phe Ser Phe Gly Val Pro Asn Ser Ser Lys Asn Glu Thr Ser Lys Pro370 375 380Val Phe Ser Phe Gly Ala Ala Thr Pro Ser Ala Lys Glu Ala Ser Gln385 390 395 400Glu Asp Asp Asn Asn Asn Val Glu Lys Pro Ser Ser Lys Pro Ala Phe405 410 415Asn Leu Ile Ser Asn Ala Gly Thr Glu Lys Glu Lys Glu Ser Lys Lys420 425 430Asp Ser Lys Pro Ala Phe Ser Phe Gly Ile Ser Asn Gly Ser Glu Ser435 440 445Lys Asp Ser Asp Lys Pro Ser Leu Pro Ser Ala Val Asp Gly Glu Asn450 455 460Asp Lys Lys Glu Ala Thr Lys Pro Ala Phe Ser Phe Gly Ile Asn Thr465 470 475 480Asn Thr Thr Lys Thr Ala Asp Thr Lys Ala Pro Thr Phe Thr Phe Gly485 490 495Ser Ser Ala Leu Ala Asp Asn Lys Glu Asp Val Lys Lys Pro Phe Ser500 505 510Phe Gly Thr Ser Gln Pro Asn Asn Thr Pro Ser Phe Ser Phe Gly Lys515 520 525Thr Thr Ala Asn Leu Pro Ala Asn Ser Ser Thr Ser Pro Ala Pro Ser530 535 540Ile Pro Ser Thr Gly Phe Lys Phe Ser Leu Pro Phe Glu Gln Lys Gly545 550 555 560Ser Gln Thr Thr Thr Asn Asp Ser Lys Glu Glu Ser Thr Thr Glu Ala565 570 575Thr Gly Asn Glu Ser Gln Asp Ala Thr Lys Val Asp Ala Thr Pro Glu580 585 590Glu Ser Lys Pro Ile Asn Leu Gln Asn Gly Glu Glu Asp Glu Val Ala595 600 605Leu Phe Ser Gln Lys Ala Lys Leu Met Thr Phe Asn Ala Glu Thr Lys610 615 620Ser Tyr Asp Ser Arg Gly Val Gly Glu Met Lys Leu Leu Lys Lys Lys625 630 635 640Asp Asp Pro Ser Lys Val Arg Leu Leu Cys Arg Ser Asp Gly Met Gly645 650 655Asn Val Leu Leu Asn Ala Thr Val Val Asp Ser Phe Lys Tyr Glu Pro660 665 670Leu Ala Pro Gly Asn Asp Asn Leu Ile Lys Ala Pro Thr Val Ala Ala675 680 685Asp Gly Lys Leu Val Thr Tyr Ile Val Lys Phe Lys Gln Lys Glu Glu690 695 700Gly Arg Ser Phe Thr Lys Ala Ile Glu Asp Ala Lys Lys Glu Met Lys705 710 715 720202163DNASaccharomyces cerevisiae 20atggccaaaa gagttgccga tgcgcaaata cagagagaaa cgtacgattc taacgagtct 60gacgatgacg tgactccctc cactaaggtt gcgtcatctg ctgtgatgaa 
tagaagaaaa 120attgccatgc caaagcgcag gatggcgttc aaaccttttg gttctgcaaa atcggatgaa 180accaagcagg ctagttcctt tagcttcctg aaccgggcgg acggcactgg agaagctcag 240gttgataata gccctaccac agaaagcaat tccagactaa aagcattgaa cctccagttc 300aaggctaagg ttgatgactt agttctaggc aagccgttag cggacttgag gccccttttc 360accaggtacg aattatacat aaagaatatc ttagaagctc ccgtgaaatt tatcgagaat 420ccaacgcaga caaagggaaa tgatgctaaa cctgccaaag tagaagatgt ccaaaaaagt 480tccgattctt catctgaaga tgaggttaag gtggaggggc ccaagttcac aatagatgct 540aaaccgccta tttcagattc cgttttctca tttggcccaa aaaaagaaaa tcgcaagaaa 600gatgaaagtg atagcgaaaa cgatatagaa atcaagggcc ctgaatttaa attttctgga 660actgtatcaa gtgatgtatt taagctgaat ccaagcaccg ataaaaatga aaagaaaacc 720gagactaatg ctaaaccatt ttcattttct tcggccactt caactactga acaaacgaag 780agtaaaaatc ccctttcatt gacagaagct accaagacca atgtggacaa caacagtaaa 840gccgaggctt ccttcacttt tggaacaaaa catgctgcgg attctcaaaa taataaacca 900tcttttgtat ttggtcaagc agctgcaaaa ccatcgctag aaaagagctc attcacgttt 960ggttcaacaa caattgaaaa aaaaaatgac gaaaactcaa cctctaactc aaaacctgaa 1020aagtctagtg atagcaatga ttcaaaccca tctttttcct tttccatacc cagtaagaat 1080acacctgatg catctaagcc atcttttaat tttggggtcc caaactcttc caaaaacgaa 1140acttcaaaac cggtattttc gtttggtgca gcaacaccat cggccaaaga agctagtcag 1200gaagatgaca acaacaacgt tgaaaaacct tcctctaagc ctgccttcaa tttcatatct 1260aacgctggta ccgagaaaga gaaggaaagt aaaaaggact caaagccagc tttttcattt 1320ggcatatcaa acggaagtga aagcaaagac tctgacaaac cctctttacc ctctgcggtt 1380gatggtgaaa atgacaagaa agaagcaaca aaacctgctt tttcgtttgg aataaataca 1440aatactacta aaaccgcgga tactaaagct ccaactttta catttggctc ctctgcactc 1500gctgacaata aagaggatgt taagaaacct ttttcattcg gtacctccca gcctaataat 1560actccatcct tctcattcgg aaaaacaaca gcaaacttgc ctgctaattc ttcaacatct 1620cctgctccct ctataccatc gacggggttc aaattttctt tgccatttga acaaaaaggt 1680agtcaaacaa ctacaaatga tagcaaggaa gaatcaacaa cagaagcaac tggaaatgag 1740tcgcaagatg caaccaaagt agatgctacc ccagaagaat caaagccaat aaacttgcaa 1800aacggtgagg aagacgaagt ggctttattt 
tcgcaaaaag caaaattaat gacattcaat 1860
gctgaaacca aatcgtacga ttcaagaggc gtaggcgaaa tgaagctttt gaagaaaaag 1920
gacgatcctt ctaaagtgcg cctactttgt aggtctgacg gtatgggtaa tgtattacta 1980
aatgcaactg ttgtagactc cttcaaatat gagcctttag ctcccggaaa tgataatctc 2040
attaaagctc ctactgttgc ggctgatggg aaacttgtaa cttatatcgt caagtttaag 2100
cagaaggaag aaggccgctc atttacgaaa gctattgaag atgctaaaaa agaaatgaaa 2160
taa 2163

SEQ ID NO: 21; length: 860; type: PRT; organism: Mus musculus
Met Ala Gly Leu Thr Ala Val Val Pro Gln Pro Gly Val Leu Leu Ile
1 5 10 15
Leu Leu Leu Asn Leu Leu His Pro Ala Gln Pro Gly Gly Val Pro Gly
20 25 30
Ala Val Pro Gly Gly Leu Pro Gly Gly Val Pro Gly Gly Val Tyr Tyr
35 40 45
Pro Gly Ala Gly Ile Gly Gly Leu Gly Gly Gly Gly Gly Ala Leu Gly
50 55 60
Pro Gly Gly Lys Pro Pro Lys Pro Gly Ala Gly Leu Leu Gly Thr Phe
65 70 75 80
Gly Ala Gly Pro Gly Gly Leu Gly Gly Ala Gly Pro Gly Ala Gly Leu
85 90 95
Gly Ala Phe Pro Ala Gly Thr Phe Pro Gly Ala Gly Ala Leu Val Pro
100 105 110
Gly Gly Ala Ala Gly Ala Ala Ala Ala Tyr Lys Ala Ala Ala Lys Ala
115 120 125
Gly Ala Gly Leu Gly Gly Val Gly Gly Val Pro Gly Gly Val Gly Val
130 135 140
Gly Gly Val Pro Gly Gly Val Gly Val Gly Gly Val Pro Gly Gly Val
145 150 155 160
Gly Val Gly Gly Val Pro Gly Gly Val Gly Gly Ile Gly Gly Ile Gly
165 170 175
Gly Leu Gly Val Ser Thr Gly Ala Val Val Pro Gln Val Gly Ala Gly
180 185 190
Ile Gly Ala Gly Gly Lys Pro Gly Lys Val Pro Gly Val Gly Leu Pro
195 200 205
Gly Val Tyr Pro Gly Gly Val Leu Pro Gly Thr Gly Ala Arg Phe Pro
210 215 220
Gly Val Gly Val Leu Pro Gly Val Pro Thr Gly Thr Gly Val Lys Ala
225 230 235 240
Lys Ala Pro Gly Gly Gly Gly Ala Phe Ser Gly Ile Pro Gly Val Gly
245 250 255
Pro Phe Gly Gly Gln Gln Pro Gly Val Pro Leu Gly Tyr Pro Ile Lys
260 265 270
Ala Pro Lys Leu Pro Gly Gly Tyr Gly Leu Pro Tyr Thr Asn Gly Lys
275 280 285
Leu Pro Tyr Gly Val Ala Gly Ala Gly Gly Lys Ala Gly Tyr Pro Thr
290 295 300
Gly Thr Gly Val Gly Ser Gln Ala Ala Ala Ala Ala Ala Lys Ala Ala
305 310 315 320
Lys Tyr Gly Ala Gly Gly Ala Gly Val Leu Pro Gly Val Gly Gly Gly
325 330 335
Gly Ile Pro Gly Gly Ala Gly Ala Ile Pro Gly Ile Gly Gly Ile Ala
340 345 350
Gly Ala Gly Thr Pro Ala Ala Ala Ala Ala Ala Lys Ala Ala Ala Lys
355 360 365
Ala Ala Lys Tyr Gly Ala Ala Gly Gly Leu Val Pro Gly Gly Pro Gly
370 375 380
Val Arg Leu Pro Gly Ala Gly Ile Pro Gly Val Gly Gly Ile Pro Gly
385 390 395 400
Val Gly Gly Ile Pro Gly Val Gly Gly Pro Gly Ile Gly Gly Pro Gly
405 410 415
Ile Val Gly Gly Pro Gly Ala Val Ser Pro Ala Ala Ala Ala Lys Ala
420 425 430
Ala Ala Lys Ala Ala Lys Tyr Gly Ala Arg Gly Gly Val Gly Ile Pro
435 440 445
Thr Tyr Gly Val Gly Ala Gly Gly Phe Pro Gly Tyr Gly Val Gly Ala
450 455 460
Gly Ala Gly Leu Gly Gly Ala Ser Pro Ala Ala Ala Ala Ala Ala Ala
465 470 475 480
Lys Ala Ala Lys Tyr Gly Ala Gly Gly Ala Gly Ala Leu Gly Gly Leu
485 490 495
Val Pro Gly Ala Val Pro Gly Ala Leu Pro Gly Ala Val Pro Ala Val
500 505 510
Pro Gly Ala Gly Gly Val Pro Gly Ala Gly Thr Pro Ala Ala Ala Ala
515 520 525
Ala Ala Ala Ala Ala Lys Ala Ala Ala Lys Ala Gly Leu Gly Pro Gly
530 535 540
Val Gly Gly Val Pro Gly Gly Val Gly Val Gly Gly Ile Pro Gly Gly
545 550 555 560
Val Gly Val Gly Gly Val Pro Gly Gly Val Gly Pro Gly Gly Val Thr
565 570 575
Gly Ile Gly Ala Gly Pro Gly Gly Leu Gly Gly Ala Gly Ser Pro Ala
580 585 590
Ala Ala Lys Ser Ala Ala Lys Ala Ala Ala Lys Ala Gln Tyr Arg Ala
595 600 605
Ala Ala Gly Leu Gly Ala Gly Val Pro Gly Phe Gly Ala Gly Ala Gly
610 615 620
Val Pro Gly Phe Gly Ala Gly Ala Gly Val Pro Gly Phe Gly Ala Gly
625 630 635 640
Ala Gly Val Pro Gly Phe Gly Ala Gly Ala Gly Val Pro Gly Phe Gly
645 650 655
Ala Gly Ala Val Pro Gly Ser Leu Ala Ala Ser Lys Ala Ala Lys Tyr
660 665 670
Gly Ala Ala Gly Gly Leu Gly Gly Pro Gly Gly Leu Gly Gly Pro Gly
675 680 685
Gly Leu Gly Gly Pro Gly Gly Leu Gly Gly Ala Gly Val Pro Gly Arg
690 695 700
Val Ala Gly Ala Ala Pro Pro Ala Ala Ala Ala Ala Ala Ala Lys Ala
705 710 715 720
Ala Ala Lys Ala Ala Gln Tyr Gly Leu Gly Gly Ala Gly Gly Leu Gly
725 730 735
Ala Gly Gly Leu Gly Ala Gly Gly Leu Gly Ala Gly Gly Leu Gly Ala
740 745 750
Gly Gly Leu Gly Ala Gly Gly Leu Gly Ala Gly Gly Leu Gly Ala Gly
755 760 765
Gly Leu Gly Ala Gly Gly Gly Val Ser Pro Ala Ala Ala Ala Lys Ala
770 775 780
Ala Lys Tyr Gly Ala Ala Gly Leu Gly Gly Val Leu Gly Ala Arg Pro
785 790 795 800
Phe Pro Gly Gly Gly Val Ala Ala Arg Pro Gly Phe Gly Leu Ser Pro
805 810 815
Ile Tyr Pro Gly Gly Gly Ala Gly Gly Leu Gly Val Gly Gly Lys Pro
820 825 830
Pro Lys Pro Tyr Gly Gly Ala Leu Gly Ala Leu Gly Tyr Gln Gly Gly
835 840 845
Gly Cys Phe Gly Lys Ser Cys Gly Arg Lys Arg Lys
850 855 860

SEQ ID NO: 22; length: 2583; type: DNA; organism: Mus musculus
atggcgggtc tgacagcggt agtcccgcag cctggcgtct tgctgatcct cttgctcaac 60
ctcctccatc ccgcgcagcc tggaggggtt ccaggagctg tgcctggcgg acttcctggt 120
ggagttcccg gtggagtcta ttatccaggg gctggtattg gaggcctggg aggaggagga 180
ggagctctgg gacctggagg aaaaccacct aagccaggtg ccggacttct gggaacgttt 240
ggagcaggtc ctggaggact tggaggtgct ggcccgggtg caggtctcgg ggcctttcct 300
gcaggcacct tcccaggggc aggagctctg gtgcccgggg gagcagcagg ggctgctgcc 360
gcttataaag ctgccgccaa agctggggct gggcttggtg gcgttggcgg agtcccaggt 420
ggtgttggcg ttggtggagt tccaggtggt gttggagttg gcggagtccc aggtggtgtt 480
ggagttggtg gagtccctgg cggtgttggt ggtattggtg gcatcggtgg cttaggagtc 540
tcgacaggtg ctgtggtgcc ccaagtcgga gctggcatcg gagctggagg aaagcctggg 600
aaagttcctg gtgttggtct tccaggtgta tacccaggcg gagtgctccc aggaacagga 660
gctcggttcc ctggtgtggg ggtgctccct ggagttccca ctggcacagg agtcaaagcc 720
aaggctccag gtggaggtgg tgctttttct ggaatcccag gggtcggacc ctttgggggt 780
cagcagcctg gtgtcccact gggttatccc atcaaagcac caaagctgcc aggtggctac 840
ggactgccct ataccaatgg gaaattgccc tatggagtag ctggtgcagg gggcaaggct 900
ggctacccaa cagggacagg ggtcggatcc caggcggcgg cggcagcagc taaagcagcc 960
aagtatggtg ctgggggagc tggagtcctc cctggtgttg gagggggtgg cattcctggt 1020
ggtgctggcg caattcctgg gattggaggc attgcaggcg ctggaactcc tgcagcagca 1080
gctgctgcaa aggctgctgc taaggctgct aagtatggag ctgctggagg tttagtgcct 1140
ggtggaccag gagttaggct cccaggtgct ggaatcccag gtgttggtgg cattcctggt 1200
gttggtggca tcccaggtgt tgggggccct ggtattggag gtccaggcat tgtgggtgga 1260
ccaggagctg tgtcaccagc tgctgctgct aaagctgctg ccaaagctgc caaatacgga 1320
gccagaggtg gagttggcat cccgacatat ggggttggtg ctggtggctt tcctggctat 1380
ggtgttggag ctggagcagg acttggaggt gcaagcccag ctgctgctgc tgccgccgcc 1440
aaagctgcta agtatggtgc tggaggagct ggagccctgg gaggcctggt gccaggtgca 1500
gtaccaggtg cactgccagg tgcagtacca gctgtgccgg gagctggtgg agtgccagga 1560
gcaggtaccc ctgcagctgc agctgctgcc gccgccgcta aagcagccgc caaagcaggt 1620
ttgggtcctg gtgttggtgg ggttcctggt ggagttggtg ttggtgggat tcccggtgga 1680
gttggtgttg gtggggttcc tggtggagtt ggccctggtg gtgttactgg tattggagct 1740
ggtcctggcg gtcttggagg agcagggtca ccggctgccg ctaaatctgc tgctaaggca 1800
gctgccaaag cccagtacag agctgccgct gggcttggag ctggtgtccc tggatttggg 1860
gctggtgctg gtgtccccgg atttggggct ggtgctggtg tccccggatt tggggctggt 1920
gctggtgtcc ccggatttgg ggctggtgct ggtgtccctg gatttggagc tggagcagta 1980
cctggatcgc tggctgcatc caaagctgct aaatatggag cagcaggtgg ccttggtggc 2040
cctggaggtc tcggtggccc tggaggtctc ggtggacctg gaggacttgg tggggctggt 2100
gttcccggta gagtagcagg agctgcaccc cctgctgctg ccgctgctgc tgccaaagct 2160
gctgctaagg ctgcccagta tggccttggt ggagccggag gattgggagc cggtggactg 2220
ggggccggtg gactgggagc cggtggactg ggagctggtg gactgggagc cggtggactg 2280
ggagctggtg gactgggagc cggtggactg ggagctggtg gaggtgtgtc ccctgctgca 2340
gctgctaagg cagccaaata tggtgctgct ggccttggag gtgtcctagg agccaggcca 2400
ttcccaggtg gaggagttgc agcaagacct ggctttggac tttctcccat ttatccaggt 2460
ggtggtgctg ggggcctggg agttggtgga aaacccccga agccctatgg aggagccctt 2520
ggagccctgg gataccaagg tgggggctgc tttgggaaat cctgtgggcg gaagagaaag 2580
tga 2583

SEQ ID NO: 23; length: 6; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Glu Ala Glu Lys
1 5

SEQ ID NO: 24; length: 6; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Ala Ala Val Lys
1 5

SEQ ID NO: 25; length: 6; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Ala Glu Ala Lys
1 5

SEQ ID NO: 26; length: 6; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Ala Glu Pro Lys
1 5

SEQ ID NO: 27; length: 6; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Ala Glu Val Lys
1 5

SEQ ID NO: 28; length: 6; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Ala Thr Val Lys
1 5

SEQ ID NO: 29; length: 6; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Glu Lys Ala Lys
1 5

SEQ ID NO: 30; length: 6; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Gly Glu Ala Lys
1 5

SEQ ID NO: 31; length: 6; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Ile Glu Val Lys
1 5

SEQ ID NO: 32; length: 6; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Pro Glu Ala Lys
1 5

SEQ ID NO: 33; length: 6; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Ser Glu Ala Lys
1 5

SEQ ID NO: 34; length: 7; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Glu Lys Glu Ala Lys
1 5

SEQ ID NO: 35; length: 8; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Ala Lys Glu Lys Ala Lys
1 5

SEQ ID NO: 36; length: 8; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Glu Lys Glu Glu Ala Lys
1 5

SEQ ID NO: 37; length: 8; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Thr Lys Glu Glu Ala Lys
1 5

SEQ ID NO: 38; length: 8; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Val Lys Glu Glu Ala Lys
1 5

SEQ ID NO: 39; length: 8; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Val Lys Ala Glu Ala Lys
1 5

SEQ ID NO: 40; length: 8; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Val Lys Glu Glu Ala Lys
1 5

SEQ ID NO: 41; length: 8; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Val Lys Glu Glu Val Lys
1 5

SEQ ID NO: 42; length: 9; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Val Lys Glu Glu Glu Lys Pro
1 5

SEQ ID NO: 43; length: 11; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Glu Lys Ala Lys Thr Leu Asp Val Lys
1 5 10

SEQ ID NO: 44; length: 11; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Ala Asp Lys Phe Pro Glu Lys Ala Lys
1 5 10

SEQ ID NO: 45; length: 13; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Glu Ala Lys Thr Pro Ala Lys Glu Glu Ala Arg
1 5 10

SEQ ID NO: 46; length: 14; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Glu Lys Ala Lys Thr Pro Val Lys Glu Gly Ala Lys
1 5 10

SEQ ID NO: 47; length: 14; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Val Lys Glu Glu Ala Lys Thr Pro Glu Lys Ala Lys
1 5 10

SEQ ID NO: 48; length: 19; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Val Lys Glu Gly Ala Lys Pro Pro Glu Lys Ala Lys Pro Leu
1 5 10 15
Asp Val Lys

SEQ ID NO: 49; length: 20; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Val Lys Glu Asp Ile Lys Pro Pro Ala Glu Ala Lys Ser Pro
1 5 10 15
Glu Lys Ala Lys
20

SEQ ID NO: 50; length: 21; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Leu Lys Glu Asp Ala Lys Ala Pro Glu Lys Glu Ile Pro Lys
1 5 10 15
Lys Glu Glu Val Lys
20

SEQ ID NO: 51; length: 21; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Glu Lys Glu Glu Ala Lys Thr Ser Glu Lys Val Ala Pro Lys
1 5 10 15
Lys Glu Glu Val Lys
20

SEQ ID NO: 52; length: 25; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Glu Ala Gln Thr Pro Val Gln Glu Glu Ala Thr Val Pro Thr
1 5 10 15
Asp Ile Arg Pro Pro Glu Gln Val Lys
20 25

SEQ ID NO: 53; length: 36; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from human or mouse neurofilament NF-H proteins (SEQ ID Nos. 1 and 3)
Ser Pro Val Lys Glu Glu Val Lys Ala Lys Glu Pro Pro Lys Lys Val
1 5 10 15
Glu Glu Glu Lys Thr Leu Pro Thr Pro Lys Thr Glu Ala Lys Glu Ser
20 25 30
Lys Lys Asp Glu
35

SEQ ID NO: 54; length: 4; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Pro Lys
1

SEQ ID NO: 55; length: 4; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Val Lys
1

SEQ ID NO: 56; length: 5; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Ala Ala Lys
1 5

SEQ ID NO: 57; length: 5; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Ala Pro Lys
1 5

SEQ ID NO: 58; length: 5; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Glu Ala Lys
1 5

SEQ ID NO: 59; length: 5; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Met Pro Lys
1 5

SEQ ID NO: 60; length: 5; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Pro Ala Lys
1 5

SEQ ID NO: 61; length: 5; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Thr Ala Lys
1 5

SEQ ID NO: 62; length: 5; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Thr Thr Lys
1 5

SEQ ID NO: 63; length: 5; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Val Ala Lys
1 5

SEQ ID NO: 64; length: 5; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Val Ala Lys
1 5

SEQ ID NO: 65; length: 5; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Val Pro Lys
1 5

SEQ ID NO: 66; length: 5; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Val Ser Lys
1 5

SEQ ID NO: 67; length: 6; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Glu Lys Pro Ala
1 5

SEQ ID NO: 68; length: 8; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Val Glu Glu Lys Ala Lys
1 5

SEQ ID NO: 69; length: 8; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Val Glu Glu Lys Gly Lys
1 5

SEQ ID NO: 70; length: 8; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Val Glu Glu Val Lys Pro
1 5

SEQ ID NO: 71; length: 11; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Glu Lys Pro Ala Thr Pro Lys Val Thr
1 5 10

SEQ ID NO: 72; length: 12; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Glu Lys Pro Arg Thr Pro Glu Lys Pro Ala
1 5 10

SEQ ID NO: 73; length: 12; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Glu Lys Pro Thr Thr Pro Glu Lys Val Val
1 5 10

SEQ ID NO: 74; length: 14; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Glu Lys Pro Ser Ser Pro Leu Lys Asp Glu Lys Ala
1 5 10

SEQ ID NO: 75; length: 15; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Val Lys Glu Lys Ala Val Glu Glu Met Ile Thr Ile Thr
1 5 10 15

SEQ ID NO: 76; length: 16; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Val Lys Glu Glu Ala Ala Glu Glu Ala Ala Thr Ile Thr Lys
1 5 10 15

SEQ ID NO: 77; length: 20; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Val Pro Lys Ser Pro Val Glu Glu Val Lys Pro Lys Ala Glu
1 5 10 15
Ala Thr Ala Gly
20

SEQ ID NO: 78; length: 20; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Val Lys Ala Glu Ser Pro Val Lys Glu Glu Val Pro Ala Lys
1 5 10 15
Pro Val Lys Val
20

SEQ ID NO: 79; length: 21; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Glu Lys Glu Ala Lys Glu Glu Glu Lys Pro Gln Glu Lys Glu
1 5 10 15
Lys Glu Lys Glu Lys
20

SEQ ID NO: 80; length: 23; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Val Lys Ala Thr Thr Pro Glu Ile Lys Glu Glu Glu Gly Glu
1 5 10 15
Lys Glu Glu Glu Gly Gln Glu
20

SEQ ID NO: 81; length: 22; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Val Glu Glu Val Lys Pro Lys Pro Glu Ala Lys Ala Gly Lys
1 5 10 15
Gly Glu Gln Lys Glu Glu
20

SEQ ID NO: 82; length: 24; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Glu Lys Pro Ala Thr Pro Glu Lys Pro Pro Thr Pro Glu Lys
1 5 10 15
Ala Ile Thr Pro Glu Lys Val Arg
20

SEQ ID NO: 83; length: 24; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Glu Lys Pro Ala Thr Pro Glu Lys Pro Arg Thr Pro Glu Lys
1 5 10 15
Pro Ala Thr Pro Glu Lys Pro Arg
20

SEQ ID NO: 84; length: 23; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Lys Glu Glu Lys Val Glu Lys Lys Glu Glu Lys Pro Lys Asp
1 5 10 15
Val Pro Lys Lys Lys Ala Glu
20

SEQ ID NO: 85; length: 24; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Lys Glu Glu Lys Ala Glu Lys Lys Glu Glu Lys Pro Lys Asp
1 5 10 15
Val Pro Glu Lys Lys Lys Ala Glu
20

SEQ ID NO: 86; length: 24; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Val Glu Glu Ala Lys Ser Lys Ala Glu Val Gly Lys Gly Glu
1 5 10 15
Gln Lys Glu Glu Glu Glu Lys Glu
20

SEQ ID NO: 87; length: 24; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Lys Glu Glu Lys Val Glu Lys Lys Glu Glu Lys Pro Lys Asp
1 5 10 15
Val Pro Asp Lys Lys Lys Ala Glu
20

SEQ ID NO: 88; length: 26; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Val Lys Glu Glu Ala Val Ala Glu Val Val Thr Ile Thr Lys
1 5 10 15
Ser Val Lys Val His Leu Glu Lys Glu Thr
20 25

SEQ ID NO: 89; length: 31; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Ser Glu Lys Asp Glu Gly Glu Gln Glu Glu Glu Glu Gly Glu Thr
1 5 10 15
Glu Ala Glu Gly Glu Gly Glu Glu Ala Glu Ala Lys Glu Glu Lys
20 25 30

SEQ ID NO: 90; length: 32; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Val Glu Glu Val Lys Pro Lys Ala Glu Ala Gly Ala Glu Lys
1 5 10 15
Gly Glu Gln Lys Glu Lys Val Glu Glu Glu Lys Lys Glu Ala Lys Glu
20 25 30

SEQ ID NO: 91; length: 37; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Val Thr Glu Gln Ala Lys Ala Val Gln Lys Ala Ala Ala Glu
1 5 10 15
Val Gly Lys Asp Gln Lys Ala Glu Lys Ala Ala Glu Lys Ala Ala Lys
20 25 30
Glu Glu Lys Ala Ala
35

SEQ ID NO: 92; length: 44; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from bovine, chicken, human, mouse, rat and rabbit neurofilament NF-M proteins (SEQ ID Nos. 5, 7, 9, 11, 13 and 15)
Ser Pro Glu Ala Lys Glu Glu Glu Glu Glu Gly Glu Lys Glu Glu Glu
1 5 10 15
Glu Glu Gly Gln Glu Glu Glu Glu Glu Glu Asp Glu Gly Val Lys Ser
20 25 30
Asp Gln Ala Glu Glu Gly Gly Ser Glu Lys Glu Gly
35 40

SEQ ID NO: 93; length: 5; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from filamentous phage fd adsorption protein pIII (SEQ ID No. 17)
Glu Gly Gly Gly Ser
1 5

SEQ ID NO: 94; length: 5; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from filamentous phage fd adsorption protein pIII (SEQ ID No. 17)
Glu Gly Gly Gly Thr
1 5

SEQ ID NO: 95; length: 5; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from filamentous phage fd adsorption protein pIII (SEQ ID No. 17)
Ser Glu Gly Gly Gly
1 5

SEQ ID NO: 96; length: 7; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from filamentous phage fd adsorption protein pIII (SEQ ID No. 17)
Gly Gly Gly Ser Gly Gly Gly
1 5

SEQ ID NO: 97; length: 8; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from filamentous phage fd adsorption protein pIII (SEQ ID No. 17)
Ser Gly Gly Gly Ser Gly Ser Gly
1 5

SEQ ID NO: 98; length: 9; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from filamentous phage fd adsorption protein pIII (SEQ ID No. 17)
Ser Gly Gly Gly Ser Glu Gly Gly Gly
1 5

SEQ ID NO: 99; length: 13; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from yeast nucleoporin Nup2p protein (SEQ ID No. 19)
Phe Ser Phe Gly Thr Ser Gln Pro Asn Asn Thr Pro Ser
1 5 10

SEQ ID NO: 100; length: 17; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from yeast nucleoporin Nup2p protein (SEQ ID No. 19)
Phe Ser Phe Ser Ile Pro Ser Lys Asn Thr Pro Asp Ala Ser Lys Pro
1 5 10 15
Ser

SEQ ID NO: 101; length: 16; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from yeast nucleoporin Nup2p protein (SEQ ID No. 19)
Phe Val Phe Gly Gln Ala Ala Ala Lys Pro Ser Leu Glu Lys Ser Ser
1 5 10 15

SEQ ID NO: 102; length: 17; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from yeast nucleoporin Nup2p protein (SEQ ID No. 19)
Phe Ser Phe Gly Val Pro Asn Ser Ser Lys Asn Glu Thr Ser Lys Pro
1 5 10 15
Val

SEQ ID NO: 103; length: 17; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from yeast nucleoporin Nup2p protein (SEQ ID No. 19)
Phe Thr Phe Gly Thr Lys His Ala Ala Asp Ser Gln Asn Asn Lys Pro
1 5 10 15
Ser

SEQ ID NO: 104; length: 18; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from yeast nucleoporin Nup2p protein (SEQ ID No. 19)
Phe Thr Phe Gly Ser Ser Ala Leu Ala Asp Asn Lys Glu Asp Val Lys
1 5 10 15
Lys Pro

SEQ ID NO: 105; length: 19; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from yeast nucleoporin Nup2p protein (SEQ ID No. 19)
Phe Ser Phe Gly Ile Asn Thr Asn Thr Thr Lys Thr Ala Asp Thr Lys
1 5 10 15
Ala Pro Thr

SEQ ID NO: 106; length: 26; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from yeast nucleoporin Nup2p protein (SEQ ID No. 19)
Phe Ser Phe Gly Lys Thr Thr Ala Asn Leu Pro Ala Asn Ser Ser Thr
1 5 10 15
Ser Pro Ala Pro Ser Ile Pro Ser Thr Gly
20 25

SEQ ID NO: 107; length: 27; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from yeast nucleoporin Nup2p protein (SEQ ID No. 19)
Phe Ser Phe Gly Pro Lys Lys Glu Asn Arg Lys Lys Asp Glu Ser Asp
1 5 10 15
Ser Glu Asn Asp Ile Glu Ile Lys Gly Pro Glu
20 25

SEQ ID NO: 108; length: 31; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from yeast nucleoporin Nup2p protein (SEQ ID No. 19)
Phe Lys Phe Ser Gly Thr Val Ser Ser Asp Val Phe Lys Leu Asn Pro
1 5 10 15
Ser Thr Asp Lys Asn Glu Lys Lys Thr Glu Thr Asn Ala Lys Pro
20 25 30

SEQ ID NO: 109; length: 33; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from yeast nucleoporin Nup2p protein (SEQ ID No. 19)
Phe Lys Phe Ser Leu Pro Phe Glu Gln Lys Gly Ser Gln Thr Thr Thr
1 5 10 15
Asn Asp Ser Lys Glu Glu Ser Thr Thr Glu Ala Thr Gly Asn Glu Ser
20 25 30
Gln

SEQ ID NO: 110; length: 34; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from yeast nucleoporin Nup2p protein (SEQ ID No. 19)
Phe Thr Phe Gly Ser Thr Thr Ile Glu Lys Lys Asn Asp Glu Asn Ser
1 5 10 15
Thr Ser Asn Ser Lys Pro Glu Lys Ser Ser Asp Ser Asn Asp Ser Asn
20 25 30
Pro Ser

SEQ ID NO: 111; length: 36; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from yeast nucleoporin Nup2p protein (SEQ ID No. 19)
Phe Ser Phe Gly Ile Ser Asn Gly Ser Glu Ser Lys Asp Ser Asp Lys
1 5 10 15
Pro Ser Leu Pro Ser Ala Val Asp Gly Glu Asn Asp Lys Lys Glu Ala
20 25 30
Thr Lys Pro Ala
35

SEQ ID NO: 112; length: 38; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from yeast nucleoporin Nup2p protein (SEQ ID No. 19)
Phe Ser Phe Ser Ser Ala Thr Ser Thr Thr Glu Gln Thr Lys Ser Lys
1 5 10 15
Asn Pro Leu Ser Leu Thr Glu Ala Thr Lys Thr Asn Val Asp Asn Asn
20 25 30
Ser Lys Ala Glu Ala Ser
35

SEQ ID NO: 113; length: 52; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from yeast nucleoporin Nup2p protein (SEQ ID No. 19)
Phe Ser Phe Gly Ala Ala Thr Pro Ser Ala Lys Glu Ala Ser Gln Glu
1 5 10 15
Asp Asp Asn Asn Asn Val Glu Lys Pro Ser Ser Lys Pro Ala Phe Asn
20 25 30
Leu Ile Ser Asn Ala Gly Thr Glu Lys Glu Lys Glu Ser Lys Lys Asp
35 40 45
Ser Lys Pro Ala
50

SEQ ID NO: 114; length: 4; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Val Pro Gly Ala
1

SEQ ID NO: 115; length: 5; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Gly Ala Gly Gly Leu
1 5

SEQ ID NO: 116; length: 5; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Gly Ala Gly Gly Gly
1 5

SEQ ID NO: 117; length: 5; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Val Pro Gly Val Gly
1 5

SEQ ID NO: 118; length: 8; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Val Pro Gly Phe Gly Ala Gly Ala
1 5

SEQ ID NO: 119; length: 8; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Val Pro Gly Ala Leu Pro Gly Ala
1 5

SEQ ID NO: 120; length: 9; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Val Pro Gly Phe Gly Ala Gly Ala Gly
1 5

SEQ ID NO: 121; length: 9; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Val Pro Ala Val Pro Gly Ala Gly Gly
1 5

SEQ ID NO: 122; length: 9; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Val Pro Gly Gly Val Gly Val Gly Gly
1 5

SEQ ID NO: 123; length: 10; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Val Gly Ala Gly Gly Phe Pro Gly Tyr Gly
1 5 10

SEQ ID NO: 124; length: 12; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Val Pro Gly Ala Val Pro Gly Gly Leu Pro Gly Gly
1 5 10

SEQ ID NO: 125; length: 15; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Val Ser Pro Ala Ala Ala Ala Lys Ala Ala Lys Tyr Gly Ala Ala
1 5 10 15

SEQ ID NO: 126; length: 16; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Val Pro Gln Val Gly Ala Gly Ile Gly Ala Gly Gly Lys Pro Gly Lys
1 5 10 15

SEQ ID NO: 127; length: 18; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Val Pro Gly Gly Val Gly Val Gly Gly Ile Pro Gly Gly Val Gly Val
1 5 10 15
Gly Gly

SEQ ID NO: 128; length: 21; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Val Pro Gly Gly Val Gly Gly Ile Gly Gly Ile Gly Gly Leu Gly Val
1 5 10 15
Ser Thr Gly Ala Val
20

SEQ ID NO: 129; length: 27; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Val Pro Gly Gly Ala Ala Gly Ala Ala Ala Ala Tyr Lys Ala Ala Ala
1 5 10 15
Lys Ala Gly Ala Gly Leu Gly Gly Val Gly Gly
20 25

SEQ ID NO: 130; length: 28; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Val Ser Pro Ala Ala Ala Ala Lys Ala Ala Ala Lys Ala Ala Lys Tyr
1 5 10 15
Gly Ala Arg Gly Gly Val Gly Ile Pro Thr Tyr Gly
20 25

SEQ ID NO: 131; length: 30; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Lys Pro Pro Lys Pro Tyr Gly Gly Ala Leu Gly Ala Leu Gly Tyr Gln
1 5 10 15
Gly Gly Gly Cys Phe Gly Lys Ser Cys Gly Arg Lys Arg Lys
20 25 30

SEQ ID NO: 132; length: 30; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Val Pro Gly Ala Gly Thr Pro Ala Ala Ala Ala Ala Ala Ala Ala Ala
1 5 10 15
Lys Ala Ala Ala Lys Ala Gly Leu Gly Pro Gly Val Gly Gly
20 25 30

SEQ ID NO: 133; length: 30; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Val Pro Gly Arg Val Ala Gly Ala Ala Pro Pro Ala Ala Ala Ala Ala
1 5 10 15
Ala Ala Lys Ala Ala Ala Lys Ala Ala Gln Tyr Gly Leu Gly
20 25 30

SEQ ID NO: 134; length: 30; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Val Pro Gly Val Gly Leu Pro Gly Val Tyr Pro Gly Gly Val Leu Pro
1 5 10 15
Gly Thr Gly Ala Arg Phe Pro Gly Val Gly Val Leu Pro Gly
20 25 30

SEQ ID NO: 135; length: 33; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Val Pro Thr Gly Thr Gly Val Lys Ala Lys Ala Pro Gly Gly Gly Gly
1 5 10 15
Ala Phe Ser Gly Ile Pro Gly Val Gly Pro Phe Gly Gly Gln Gln Pro
20 25 30
Gly

SEQ ID NO: 136; length: 34; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Val Pro Gly Gly Val Tyr Tyr Pro Gly Ala Gly Ile Gly Gly Leu Gly
1 5 10 15
Gly Gly Gly Gly Ala Leu Gly Pro Gly Gly Lys Pro Pro Lys Pro Gly
20 25 30
Ala Gly

SEQ ID NO: 137; length: 35; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Val Gly Ala Gly Ala Gly Leu Gly Gly Ala Ser Pro Ala Ala Ala Ala
1 5 10 15
Ala Ala Ala Lys Ala Ala Lys Tyr Gly Ala Gly Gly Ala Gly Ala Leu
20 25 30
Gly Gly Leu
35

SEQ ID NO: 138; length: 40; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Gly Leu Gly Gly Val Leu Gly Ala Arg Pro Phe Pro Gly Gly Gly Val
1 5 10 15
Ala Ala Arg Pro Gly Phe Gly Leu Ser Pro Ile Tyr Pro Gly Gly Gly
20 25 30
Ala Gly Gly Leu Gly Val Gly Gly
35 40

SEQ ID NO: 139; length: 41; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Val Pro Gly Ser Leu Ala Ala Ser Lys Ala Ala Lys Tyr Gly Ala Ala
1 5 10 15
Gly Gly Leu Gly Gly Pro Gly Gly Leu Gly Gly Pro Gly Gly Leu Gly
20 25 30
Gly Pro Gly Gly Leu Gly Gly Ala Gly
35 40

SEQ ID NO: 140; length: 45; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Val Pro Gly Gly Pro Gly Val Arg Leu Pro Gly Ala Gly Ile Pro Gly
1 5 10 15
Val Gly Gly Ile Pro Gly Val Gly Gly Ile Pro Gly Val Gly Gly Pro
20 25 30
Gly Ile Gly Gly Pro Gly Ile Val Gly Gly Pro Gly Ala
35 40 45

SEQ ID NO: 141; length: 50; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Val Leu Pro Gly Val Gly Gly Gly Gly Ile Pro Gly Gly Ala Gly Ala
1 5 10 15
Ile Pro Gly Ile Gly Gly Ile Ala Gly Ala Gly Thr Pro Ala Ala Ala
20 25 30
Ala Ala Ala Lys Ala Ala Ala Lys Ala Ala Lys Tyr Gly Ala Ala Gly
35 40 45
Gly Leu
50

SEQ ID NO: 142; length: 50; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Val Pro Gly Gly Val Gly Pro Gly Gly Val Thr Gly Ile Gly Ala Gly
1 5 10 15
Pro Gly Gly Leu Gly Gly Ala Gly Ser Pro Ala Ala Ala Lys Ser Ala
20 25 30
Ala Lys Ala Ala Ala Lys Ala Gln Tyr Arg Ala Ala Ala Gly Leu Gly
35 40 45
Ala Gly
50

SEQ ID NO: 143; length: 64; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Val Pro Leu Gly Tyr Pro Ile Lys Ala Pro Lys Leu Pro Gly Gly Tyr
1 5 10 15
Gly Leu Pro Tyr Thr Asn Gly Lys Leu Pro Tyr Gly Val Ala Gly Ala
20 25 30
Gly Gly Lys Ala Gly Tyr Pro Thr Gly Thr Gly Val Gly Ser Gln Ala
35 40 45
Ala Ala Ala Ala Ala Lys Ala Ala Lys Tyr Gly Ala Gly Gly Ala Gly
50 55 60

SEQ ID NO: 144; length: 5; type: PRT; organism: Artificial Sequence
Other information: Entropic bristle domain (EBD) sequence derived from mouse elastin protein (SEQ ID No. 21)
Val Pro Gly Xaa Gly
1 5
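
The EBD sequences in this listing are written in three-letter amino acid code with residue position numbers interleaved. As a reading aid only, the following minimal Python sketch converts such a residue run to one-letter code. It is not part of the application: the names THREE_TO_ONE and to_one_letter are hypothetical, the mapping is the standard IUPAC one (with Xaa rendered as X), and the input is assumed to contain only residues and position numbers, not header text.

```python
import re

# Standard IUPAC three-letter to one-letter amino acid codes.
# "Xaa" (unspecified residue, as in SEQ ID NO: 144) is mapped to "X".
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xaa": "X",
}

def to_one_letter(residue_text: str) -> str:
    """Convert a run of three-letter residues to one-letter code.

    Position numbers, spaces and line breaks are skipped, including
    numbers fused to a residue (e.g. "Lys1 5" or "15Asp").
    Assumption: residue_text holds only residues and numbers; header
    words such as "Artificial" would raise KeyError.
    """
    residues = re.findall(r"[A-Z][a-z]{2}", residue_text)
    return "".join(THREE_TO_ONE[r] for r in residues)

if __name__ == "__main__":
    print(to_one_letter("Ser Pro Glu Ala Glu Lys1 5"))  # SEQ ID NO: 23
    print(to_one_letter("Val Pro Gly Xaa Gly1 5"))      # SEQ ID NO: 144
```

Under these assumptions, SEQ ID NO: 23 reads SPEAEK and SEQ ID NO: 144 reads VPGXG in one-letter code.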

User Contributions:

comments("1"); ?> comment_form("1"); ?>

Inventors list

Agents list

Assignees list

List by place

Classification tree browser

Top 100 Inventors

Top 100 Agents

Top 100 Assignees

Usenet FAQ Index

Documents

Other FAQs

Patent applications by A. Keith Dunker, Indianapolis, IN US

Patent applications by James Mueller, Indianapolis, IN US

Patent applications by Marc S. Cortese, Indianapolis, IN US

Patent applications by Vladimir N. Uversky, Carmel, IN US

Patent applications by MOLECULAR KINETICS INCORPORATED

Patent applications in class Recombinant DNA technique included in method of making a protein or polypeptide

Patent applications in all subclasses Recombinant DNA technique included in method of making a protein or polypeptide

User Contributions:

Comment about this patent or add new information about this topic:

