Patent application title: SYSTEM FOR DETECTING PROTEASE
Inventors:
Inhwan Hwang (Seoul, KR)
Dae Heon Kim (Seoul, KR)
Yong Jik Lee (Pohang, KR)
IPC8 Class: AC40B3006FI
USPC Class:
506 10
Class name: Combinatorial chemistry technology: method, library, apparatus method of screening a library by measuring the effect on a living organism, tissue, or cell
Publication date: 2011-12-01
Patent application number: 20110294697
Abstract:
Disclosed is a system for detecting a protease inside a cell. In one
embodiment, the system includes a chimeric protein that comprises as
covalently linked components: 1) at least one optionally masked signal
protein; 2) at least one protease-specific cleavage site; and 3) at least
one detectable amino acid sequence. The invention has a wide spectrum of
applications including use in the detection of novel protease inhibitors
inside cells and tissue.
Claims:
1. A chimeric protein comprising at least one signal protein that has a
trafficking signal targeting to a subcellular organelle and at least one
proteolytic cleavage site for a protease, which is constructed such that
(a) the trafficking signals of all the signal proteins are inactivated by
linking the proteolytic site or a signal masking protein through the
proteolytic site to the N- or C-terminus of the signal proteins, and thus
the chimeric protein is present in cytosol; (b) the trafficking signal of
at least one signal protein is activated when the proteolytic cleavage
site is cleaved by the protease, and as a result at least one fragment
protein that includes the activated signal protein is transported to a
subcellular organelle; and (c) the chimeric protein is labeled with at
least one fluorescent protein and the position and intensity distribution
of the fluorescent label signal in the cell is altered depending on the
cleavage by the protease.
2. A chimeric protein comprising at least two signal proteins that have trafficking signals targeting to subcellular organelles and at least one proteolytic cleavage site for a protease, which is constructed such that (a) the trafficking signal of one signal protein remains active, and those of the rest of the signal proteins are inactivated by linking the proteolytic site or a signal masking protein through the proteolytic site to the N- or C-terminus of the signal proteins, and thus the chimeric protein is transported to a specific subcellular organelle targeted by the trafficking signal of the active signal protein; (b) at least one proteolytic site and at least one inactivated signal protein are exposed to cytosol after the chimeric protein is transported to the subcellular organelle; (c) the trafficking signal of the at least one inactivated signal protein exposed to cytosol is activated when the proteolytic cleavage site is cleaved by the protease, and as a result the fragment protein that includes the activated signal protein is transported to a subcellular organelle that is different from the subcellular organelle to which the chimeric protein was transported; and (d) the chimeric protein is labeled with at least one fluorescent protein and the position and intensity distribution of the fluorescent signal in the cell is altered depending on the cleavage by the protease.
3. The chimeric protein according to any of claims 1 and 2, wherein, among the fragment proteins produced by the proteolytic cleavage, at least two fragment proteins with different cellular localization characteristics each include a fluorescent protein, and the fluorescent proteins used consist of at least two fluorescent proteins with different wavelengths.
4. The chimeric protein according to any of claims 1 and 2, wherein among the fragment proteins including a signal protein whose inactivated trafficking signal is activated by the proteolytic cleavage, at least one fragment protein includes a fluorescent protein.
5. A chimeric protein comprising a signal protein that has a trafficking signal targeting to a subcellular organelle and a fluorescent protein linked to the signal protein through a proteolytic cleavage site of a protease, which is constructed such that, (a) the trafficking signal of the signal protein remains active in the chimeric protein and thus the chimeric protein is transported to the subcellular organelle; (b) the proteolytic cleavage site and the fluorescent protein are exposed to cytosol after the chimeric protein is transported to the subcellular organelle; and (c) therefore, the fluorescent protein exposed to cytosol is released to cytosol when the proteolytic cleavage site exposed to cytosol is cleaved by the protease.
6. The chimeric protein of claim 5, wherein the signal protein remaining active is further labeled with a fluorescent protein that has a fluorescence wavelength different from that of the fluorescent protein exposed to cytosol, and this additional fluorescent protein remains attached to the active signal protein after the proteolytic cleavage occurs.
7. The chimeric protein according to any of claims 1 and 2, wherein the trafficking signal of the inactivated signal protein is a signal targeting to a subcellular organelle selected from the group consisting of mitochondria, chloroplast, and peroxisome.
8. The chimeric protein according to any of claims 1 and 2, wherein the signal protein that is inactivated is a full length protein selected from the group consisting of Arabidopsis outer envelope membrane protein 7 (AtOEP7), Rubisco small subunit (RbcS), Chlorophyll a/b binding protein (Cab), Rubisco activase (RA), F1-ATPase, and Peroxisome-targeting motif (SKL), or a portion thereof that includes the trafficking signal.
9. The chimeric protein according to any of claims 2 and 5, wherein the trafficking signal of the signal protein remaining active is a signal targeting to one selected from the group consisting of outer membranes of mitochondria, chloroplast, and nucleus, peroxisome membrane, and plasma membrane.
10. The chimeric protein according to any of claims 2 and 5, wherein the signal protein remaining active is a protein that binds specifically to a specific phospholipid.
11. The chimeric protein according to any of claims 2 and 5, wherein the signal protein remaining active is a full length protein selected from the group consisting of Arabidopsis outer envelope membrane protein 7 (AtOEP7), H+-ATPase, Pleckstrin homology domain (PH), and pleckstrin homology domain of FAPP (family A (phosphoinositide binding specific) member 3), or a portion thereof that includes the trafficking signal.
12. The chimeric protein according to any of claims 1 and 2, wherein the signal masking protein is selected from the group consisting of amino acids, peptides, and proteins.
13. The chimeric protein according to any of claims 1, 2, and 5, wherein the fluorescent protein is selected from the group consisting of green fluorescent protein (GFP), red fluorescent protein (RFP), mutants thereof, and derivatives thereof.
14. A recombinant gene comprising a nucleic acid sequence encoding the chimeric protein of any of claims 1, 2, and 5, which is constructed to express the chimeric protein in a cell.
15. A cell transformed with the recombinant gene of claim 14.
16. The cell of claim 15, wherein the cell is a eukaryotic cell.
17. The cell of claim 16, wherein the eukaryotic cell is a plant cell.
18. A method for analyzing the activity of a protease in vivo, comprising (a) transforming a cell with the recombinant gene of claim 14; (b) transforming the cell to express the protease before, after, or at the same time as step (a); (c) incubating the transformed cell to express the proteins; (d) observing an image of the distribution of the fluorescence signal in the incubated cell; and (e) determining the activity of the protease by comparing the fluorescence image observed in step (d) with that of a control transformed cell prepared without step (b).
19. A method for screening protease inhibitors in vivo, comprising (a) transforming a cell with the recombinant gene of claim 14; (b) transforming the cell to express the protease before, after, or at the same time as step (a); (c) treating the transformed cell with a drug candidate for inhibiting the protease before, after, or at the same time as step (b); (d) incubating the transformed cell to express the proteins; (e) observing an image of the distribution of the fluorescent signal in the incubated cell; and (f) determining the protease inhibition activity of the drug candidate by comparing the fluorescence image observed in step (e) with at least one of the fluorescent images of a control transformed cell prepared without step (b), a control transformed cell prepared without step (c), and a control transformed cell prepared without steps (b) and (c).
20. The method of any of claims 18 and 19, wherein the step of transforming the cell to express the protease is a step of inserting a recombinant gene for expression of the protease, or a step of infecting with a virus that can express the protease.
21. The method of any of claims 18 and 19, wherein the cell to be transformed is a eukaryotic cell.
22. The method of any of claims 18 and 19, wherein the cell to be transformed is a plant cell.
23. A system for detecting a protease inside a cell, wherein the system comprises a chimeric protein comprising as covalently linked components: 1) at least one optionally masked signal protein; 2) at least one protease-specific cleavage site; and 3) at least one detectable amino acid sequence.
24. The system of claim 23, wherein the chimeric protein comprises covalently linked in sequence: 1) the signal protein; 2) the protease-specific cleavage site; and 3) the detectable amino acid sequence.
25. The system of claim 23, wherein the chimeric protein comprises covalently linked in sequence: 1) a masking sequence; 2) the protease cleavage site; 3) the signal protein; and 4) the detectable amino acid sequence.
26. The system of claim 23, wherein the chimeric protein comprises covalently linked in sequence: 1) the signal protein; 2) the protease cleavage site; 3) the masking sequence; and 4) the detectable amino acid sequence.
27. The system of claim 23, wherein the chimeric protein comprises covalently linked in sequence: 1) a first signal protein; 2) the protease cleavage site; 3) a second signal protein; and 4) the detectable amino acid sequence.
28. The system of claim 27, wherein the chimeric protein comprises covalently linked in sequence: 1) the first signal protein; 2) a first protease cleavage site; 3) the masking sequence; 4) the second signal protein; and 5) the detectable amino acid sequence.
29. The system of claim 23, wherein the chimeric protein comprises covalently linked in sequence: 1) the masking sequence; 2) a first protease cleavage site; 3) a first signal protein; 4) a second protease cleavage site; 5) a second signal protein; and 6) the detectable amino acid sequence.
30. The system of claim 23, wherein the chimeric protein comprises covalently linked in sequence: 1) a first signal protein; 2) a first protease cleavage site; 3) a second signal protein; 4) a second protease cleavage site; 5) a masking sequence; and 6) the detectable amino acid sequence.
31. The system of claim 23, wherein the chimeric protein comprises covalently linked in sequence: 1) the protease-specific cleavage site; 2) the signal protein; and 3) the detectable amino acid sequence.
32. The system of claim 23, wherein the chimeric protein comprises covalently linked in sequence: 1) a first signal protein; 2) a first detectable sequence; 3) the protease cleavage site; and 4) a second detectable sequence.
33. The system of claim 32, wherein the chimeric protein further comprises a second signal protein covalently linked between the C-terminus of the protease cleavage site and the N-terminus of the second detectable sequence.
34. The system of claim 23, wherein the chimeric protein comprises covalently linked in sequence: 1) a first signal protein; 2) the protease cleavage site; 3) a second signal protein; and 4) a second detectable sequence.
35. The system of claim 23, wherein the chimeric protein comprises covalently linked in sequence: 1) a first detectable sequence; 2) the protease cleavage site; 3) the signal protein; and 4) a second detectable sequence.
36. The system of claim 23, wherein any one of the components comprises the N-terminus of the chimeric protein.
37. The system of claim 23, wherein any one of the components comprises the C-terminus of the chimeric protein.
38. The system of claim 23, wherein the signal protein is sufficient to localize the chimeric protein or at least one of its components to a plant or animal cell organelle.
39. The system of claim 38, wherein the signal protein localizes the chimeric protein or at least one of its components to the nucleus, golgi body, lytic vacuole, storage vacuole, peroxisome, mitochondrion, endoplasmic reticulum, plasma membrane, or chloroplast of a plant cell.
40. The system of claim 39, wherein the signal protein is one of AtOEP7; RbcS; Cab; RA; SKL; F1-ATPase; PH; FAPP; H+-ATPase; or a functional fragment thereof.
41. The system of claim 38, wherein the signal protein localizes the chimeric protein to the nucleus, golgi body, storage vacuole, lysosome, peroxisome, endoplasmic reticulum, plasma membrane, or mitochondrion of an animal cell.
42. The system of claim 41, wherein the signal protein is one of human peptide methionine sulfoxide reductase (MSRA), cytochrome b2, 11-beta-hydroxysteroid dehydrogenase (11-beta-HSD), G9-AKL, peroxisomal integral membrane protein 47 (PMP47); or a functional fragment thereof.
43. The system of claim 23, wherein the cleavage site is specifically cleaved by a mammalian or viral protease.
44. The system of claim 43, wherein the cleavage site is specifically cleaved by a protease associated with a human pathogen.
45. The system of claim 44, wherein the protease is expressed by a cytomegalovirus (CMV), herpes simplex virus (HSV), hepatitis virus, a plasmodium, human immunodeficiency virus (HIV), Kaposi's sarcoma-associated herpesvirus (KSHV), yellow fever virus, flavivirus, or rhinovirus.
46. The system of claim 43, wherein the protease is a serine-type protease.
47. The system of claim 45, wherein the plasmodium is P. falciparum and the protease is one of plasmepsin I and plasmepsin II.
48. The system of claim 45, wherein the cleavage site is specifically cleaved by a maturational protease of HSV.
49. The system of claim 45, wherein the hepatitis virus is type C.
50. The system of claim 44, wherein the human pathogen is a yeast, bacterium, fungus, nematode, virus, or protozoan.
51. The system of claim 43, wherein the cleavage site is specifically cleaved by a mammalian protease associated with blood coagulation, apoptosis, or the extracellular matrix.
52. The system of claim 23, wherein at least one of the detectable sequences is a fluorescent, phosphorescent, or chemiluminescent sequence.
53. The system of claim 52, wherein the emission wavelength of one of the detectable sequences is different from at least one other of the detectable sequences.
54. The system of claim 52, wherein the detectable sequence is a jellyfish fluorescent protein or a derivative thereof.
55. A substantially pure chimeric protein comprising as covalently linked components: 1) at least one optionally masked signal protein; 2) at least one protease-specific cleavage site; and 3) at least one detectable amino acid sequence.
56. A nucleic acid comprising sequence that encodes a chimeric protein for detecting protease activity in a cell, wherein the chimeric protein comprises as covalently linked components: 1) at least one optionally masked signal protein; 2) at least one protease-specific cleavage site; and 3) at least one detectable amino acid sequence.
57. A vector comprising the nucleic acid of claim 56.
58. A cell transformed with the vector of claim 57.
59. A kit for detecting a protease inside a cell, the kit comprising at least one of: a) a chimeric protein comprising as covalently linked components: i) at least one optionally masked signal protein; ii) at least one protease-specific cleavage site; and iii) at least one detectable amino acid sequence; and b) a vector comprising a nucleic acid comprising sequence encoding the chimeric protein.
60. A method for detecting a protease inside a cell or tissue, the method comprising: a) introducing, into a subject cell or tissue, a first vector comprising nucleic acid encoding a chimeric protein comprising as covalently linked components: 1) at least one optionally masked signal protein; 2) at least one protease-specific cleavage site; and 3) at least one detectable amino acid sequence, b) incubating the cell or tissue under conditions conducive to expressing the chimeric protein encoded by the first vector; and c) detecting a change in the subcellular localization of the chimeric protein as being indicative of the presence of the protease inside the cell.
61. The method of claim 60, wherein the method further comprises introducing, into the subject cell or tissue, a second vector comprising nucleic acid sequence encoding the protease; and expressing the second vector in the cell or tissue to produce the protease therein.
62. A method for detecting a protease inhibitor in vivo, the method comprising: a) introducing, into a subject cell or tissue, a first vector comprising nucleic acid encoding a chimeric protein comprising as covalently linked components: 1) at least one optionally masked signal protein; 2) at least one protease-specific cleavage site; and 3) at least one detectable amino acid sequence, b) introducing into the cell or tissue a second vector encoding a subject protease, c) contacting the cell or tissue with a candidate compound, d) incubating the cell or tissue under conditions conducive to expressing the chimeric protein encoded by the first vector and the protease encoded by the second vector; and e) detecting a change in the subcellular localization of the chimeric protein as being indicative of the presence of the protease inhibitor.
63. The method of claim 62, wherein the method further comprises use of an automated or semi-automated device for detecting the change in subcellular localization of the chimeric protein.
64. The method of claim 63, wherein the automated or semi-automated device comprises an optical system adapted to detect the detectable sequence inside the cell or tissue.
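Claim 19 determines a candidate's inhibition activity by comparing the test image with images of control cells prepared without the protease and without the drug candidate. One common way to turn that comparison into a single number, assumed here rather than taken from the application, is to normalize a per-cell localization score between the two controls. The score itself (for example, the mean fraction of fluorescence found in the compartment that the cleaved fragment moves to) and the function name are hypothetical.

```python
def percent_inhibition(score_test: float,
                       score_protease_only: float,
                       score_no_protease: float) -> float:
    """Express a candidate's effect as percent inhibition of the protease.

    Each score is a hypothetical per-cell localization readout averaged over
    many cells:
      score_no_protease   -> control prepared without protease expression (100 %)
      score_protease_only -> control prepared without the drug candidate (0 %)
    """
    window = score_protease_only - score_no_protease
    if window == 0:
        raise ValueError("controls are indistinguishable; the assay window is zero")
    # Linear interpolation between the two controls. Values outside 0-100 are
    # possible and may indicate noise or effects unrelated to the protease.
    return 100.0 * (score_protease_only - score_test) / window
```

With a no-protease control of 0.10, a protease-only control of 0.70, and a treated sample at 0.40, `percent_inhibition(0.40, 0.70, 0.10)` returns approximately 50. An automated imaging device of the kind recited in claims 63 and 64 would supply the per-cell scores.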
Description:
CROSS-REFERENCE TO RELATED APPLICATION
[0001] The present application is a continuation of Korean patent application No. 10-2001-0048123 as filed on Aug. 10, 2001 and entitled A System For In Vivo Screening Of Protease Inhibitors, the disclosure of which is hereby incorporated by reference.
FIELD OF THE INVENTION
[0002] The present invention generally relates to a system for detecting a protease inside a cell. A preferred system includes a chimeric substrate protein that includes as covalently linked components: 1) at least one optionally masked signal protein; 2) at least one protease-specific cleavage site; and 3) at least one detectable amino acid sequence. The invention has a broad spectrum of important applications including use in screens to detect compounds that block proteases produced by one or more human pathogens.
BACKGROUND
[0003] A protease is an enzyme that cleaves specific peptide bonds in proteins. In biological organisms, proteases with specific proteolytic activities and their inhibitors are involved in the regulation of various biological functions. In diverse biological processes, biologically necessary functions can be activated and regulated when a protease cleaves a polyprotein precursor to form active proteins. Examples include blood coagulation, immune defense, selective transport of proteins across intracellular membranes, and viral proliferation in a host cell. Proteases are therefore major targets for the development of specific protease inhibitors as new drugs.
[0004] Viral protease inhibitors are representative examples of protease inhibitors developed as new drugs. Because the viral protease activates polyprotein precursors by proteolytic cleavage, it is essential for the initiation of viral proliferation and thus for the correct capsid assembly of replicated viruses in the host cell.
[0005] Protease inhibitors have been developed to block the proliferation of HIV, which causes acquired immune deficiency syndrome (AIDS). For example, amprenavir, nelfinavir, indinavir, ritonavir, and saquinavir have been approved by the FDA as drugs for inhibiting the HIV protease, and lopinavir and efavirenz are in clinical studies. In patients treated with these drugs, the number of HIV particles decreased to about 10% of the pretreatment level, showing that protease inhibitors can be used as efficient medicines. However, several side effects were reported during such treatments (Miller, T. L. et al., 2001), and viral mutants carrying mutated protease genes were reported after prolonged treatment (Jacobsen, H. et al., 1996; Cote, H. et al., 2001). Therefore, more diverse protease inhibitors that can specifically block the proliferation of various mutant HIV strains need to be developed.
[0006] Protease inhibitors have been studied as a means of inhibiting other human and animal viruses such as HCV (Kasai, N. et al., 2001) and HERV (Kuhelj, R. et al., 2001). Research on plant virus diseases has also been performed based on the same concept. For example, inhibition of the proteolytic cleavage of the polyproteins produced by TEV (tobacco etch virus) and PVY (potato virus Y) has been studied by expressing a recombinant protein as a protease inhibitor in a transgenic plant (Gutierrez-Campos, R. et al., 1999). A study to identify the proteolytic sites of a protease from a plant virus has also been performed (Yoon, H. Y. et al., 2000).
[0007] There have been attempts to develop protease inhibitors. For example, in many cases, protease inhibitors have been screened by mixing a protease, its substrate peptide or protein, and a candidate chemical in vitro and then measuring cleavage of the substrate by electrophoresis. As the importance of the proteolytic site has been recognized, peptides with amino acid sequences similar to the proteolytic site have been synthesized and used to find protease inhibitors (Kettner, C. A. and Korant, B. D., 1987). As it has become easier to determine the tertiary structures of proteins and possible to design chemicals by computer simulation, many studies have been conducted to design and synthesize molecules that specifically bind to the active site of the enzyme (Wlodawer, A. and Erickson, J. W., 1993; Rodgers, J. D. et al., 1998; Mardis, K. L. et al., 2001). Attempts have also been made to use fluorescence-labeled substrates to increase the efficiency of protease activity measurement (Ermolieff, J. et al., 2000), and to use antibody fragments expressed in the periplasm of E. coli as protease inhibitors in order to broaden the range of structural scaffolds for protease inhibitors (Kasai, N. et al., 2001).
[0008] Most protease activity screening methods currently in use are performed in vitro. However, in vitro screening cannot examine various complex effects such as the transport efficiency of drug candidates into the cell and the stability and cytotoxicity of drug candidates in the cell. Many additional time-consuming experiments are thus necessary before drug candidates selected by in vitro screening can be examined in a living body. Therefore, a simpler and more generalized in vivo screening method needs to be developed to examine how protease inhibitor candidates function in cells and to screen for more specific protease inhibitors.
[0009] There have been efforts to detect proteolytic cleavage and to screen protease inhibitors under conditions that are in vivo or closer to in vivo.
[0010] For example, there has been a report of a method using protease present in isolated vesicles (Hook, V. Y., 2001). Other disclosed methods include in situ zymography using a tissue section (Yi, C.-F. et al., 2001), and a method of treating cells with a polypeptide substrate of a protease (Kuhelj, R. et al., 2001).
[0011] However, there is increasing recognition that these and related methods have shortcomings.
[0012] For example, many of the methods are believed to only approximate in vivo environments. Accordingly, such methods do not always reflect intracellular environments that may substantially impact protease function.
[0013] Moreover, many of the prior methods are believed to be limited in terms of sensitivity, selectivity, and convenience. These and other drawbacks are believed to have lowered the efficiency and reliability of past screening attempts.
[0014] It would be desirable to have better in vivo methods for detecting protease inside cells that are more sensitive and easier to use. It would be especially desirable to have in vivo methods that can be readily adapted to detect inhibitors of mammalian and viral proteases.
SUMMARY OF THE INVENTION
[0015] The present invention relates to a system for detecting a protease inside a cell or tissue. In one embodiment, the system includes at least one chimeric protein that includes as covalently linked components: 1) at least one optionally masked signal protein; 2) at least one protease-specific cleavage site; and 3) at least one detectable amino acid sequence. A preferred chimeric protein functions as a "molecular beacon" that changes position inside the cell in the presence of the protease. The invention has a wide range of applications including use in in vivo screens to detect compounds that inhibit or block proteases associated with a human pathogen.
[0016] Preferred use of the invention entails that the subject cells or tissue include at least one active protease therein. Suitable proteases include those that are endogenous to the cell, for instance, what are known as "housekeeping" enzymes. Additionally suitable proteases include those that do not naturally occur in the cell or tissue. For instance, such a protease can be a consequence of a pathogen infection. Alternatively, presence of the protease inside the cells or tissue can be a result of an experimental manipulation intended to introduce the protease therein. In these embodiments, a change in the subcellular position of the chimeric protein (or a detectable component thereof) is taken to be indicative of the presence of the protease inside the cells or tissue. Thus the invention provides a spatially sensitive "molecular beacon" whose location inside the cells or tissue is indicative of the presence (or absence) of the subject protease.
[0017] As will become apparent, the invention is one of general application. That is, it can be used to detect a wide spectrum of proteases inside cells or tissue. Preferred proteases are capable of cleaving (hydrolyzing), preferably specifically, a cleavage site within the chimeric protein. The chimeric proteins will sometimes be referred to herein as chimeric substrate proteins to denote that they can be cleaved by the protease of interest. Site-specific cleavage is understood to break the chimeric molecule, generally at or near the specific cleavage site, and to release at least one of the components therefrom, e.g., the optionally masked signal protein, the cleavage site, or the detectable amino acid sequence. Preferred release involves at least one of the detectable amino acid sequences, but it may involve other components of the chimeric molecule depending on use.
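Site-specific cleavage of a chimeric substrate protein can be illustrated by treating the protein as a one-letter amino acid string and cutting it wherever the recognition motif occurs. The sketch below uses the tobacco etch virus (TEV) protease site, E-N-L-Y-F-Q followed by G or S with the cut between Q and that residue, purely as an assumed example; the function name and test sequences are hypothetical.

```python
import re

# Assumed example protease: tobacco etch virus (TEV) protease, which recognizes
# E-N-L-Y-F-Q followed by G or S and cuts between Q and that residue.
TEV_SITE = re.compile(r"ENLYFQ(?=[GS])")


def cleave(sequence: str, site: re.Pattern = TEV_SITE) -> list[str]:
    """Return the fragments left after cutting `sequence` at every match of `site`.

    The cut is placed at the end of each match, so the lookahead residue (G or S)
    starts the next fragment, mirroring where TEV protease cuts.
    """
    fragments: list[str] = []
    start = 0
    for match in site.finditer(sequence):
        fragments.append(sequence[start:match.end()])
        start = match.end()
    fragments.append(sequence[start:])
    return fragments
```

For example, `cleave("MKTAENLYFQGSWVHH")` returns `["MKTAENLYFQ", "GSWVHH"]`, while `"MKTAENLYFQAW"` is returned uncut because the motif is not followed by G or S. Any other protease could be modeled by swapping in its own recognition pattern.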
[0018] More preferably, release of one or more of the components from the chimeric molecule is intended to provide the spatially sensitive molecular beacon. For example, and in one embodiment, the detectable amino acid sequence is released from the chimeric substrate protein and diffuses essentially freely throughout the host cells or tissue. That diffuse signal is readily detectable and can be taken as indicative of the presence of the protease. However, in another embodiment of the invention, release of one or more of the detectable amino acid sequences is associated with guidance of the released molecule to another subcellular location by the optionally masked signal protein. In this example of the invention, a more focused and higher-intensity signal serves as an indication of the protease.
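Paragraph [0018] contrasts a diffuse signal with a focused, higher-intensity one. One simple way to put a number on that difference, assumed here rather than taken from the application, is the share of a cell's total fluorescence carried by its brightest pixels. The pixel intensities are a flat list of hypothetical values for one already-segmented cell, and the 10% cutoff is an arbitrary default.

```python
def top_fraction(intensities: list[float], top: float = 0.1) -> float:
    """Share of total fluorescence carried by the brightest `top` fraction of pixels.

    Close to `top` for an evenly diffuse signal; close to 1 when the signal is
    concentrated in a few pixels, e.g. inside an organelle.
    """
    if not intensities:
        raise ValueError("no pixels")
    values = sorted(intensities, reverse=True)
    total = sum(values)
    if total <= 0:
        raise ValueError("image has no signal")
    k = max(1, int(len(values) * top))  # number of brightest pixels to keep
    return sum(values[:k]) / total
```

A uniformly diffuse cell of 100 equal pixels scores 0.1, whereas a cell whose signal sits mostly in 10 bright pixels scores above 0.9, so release into the cytosol lowers the score and targeting to an organelle raises it.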
[0019] In some embodiments of the invention, it will be useful for the chimeric molecule to retain at least one of the detectable amino acid sequences, preferably as an in-frame fusion, even in the presence of the protease inside the cells or tissue. In this instance, localization of the chimeric molecule can be monitored by reference to the detectable amino acid sequence when the protease is present. In most cases, the release and subsequent subcellular localization of the detectable amino acid sequence (alone or in combination with another component of the detectable chimeric protein) is readily visualized in situ by one or a combination of conventional detection strategies.
[0020] It is thus an object of the invention to link the presence, and preferably the activity, of the protease of interest to a change in localization of the chimeric protein. That change is readily detectable as a shift in signal location and, preferably, as an increase or decrease in signal intensity. As an example, the subcellular distribution of a detectably labeled chimeric protein (or one or more of its components) can be initially confined to a relatively small location such as a cell organelle. That confinement produces a relatively high signal intensity. However, in the presence of a protease that specifically cleaves the chimeric protein, the distribution can be much less constrained and even diffuse. That lack of confinement produces a relatively low signal intensity. In this example, the specific cleavage by the protease can be associated with movement of the chimeric protein (or labeled component) from the organelle to a larger space such as the cytosol. Alternatively, signal intensity can increase sharply in embodiments where the presence of active protease is linked by the invention to subcellular movement of labeled protein from the cytosol to a more confined space (e.g., an organelle or vacuole). In yet another embodiment, the presence of active protease can be associated with little or no change in signal intensity. Instead, the change is monitored by labeled protein moving from one subcellular location to another, as in, for instance, movement of the chimeric substrate protein from one organelle to another organelle or vacuole.
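The three scenarios in [0020] (organelle to cytosol, cytosol to organelle, and organelle to organelle) can all be scored the same way if each cell image comes with compartment masks. The sketch below assumes such masks already exist (for example, from a co-expressed organelle marker) and do not overlap; images are flat lists of pixel intensities, and all function names are hypothetical.

```python
def compartment_fractions(image: list[float],
                          masks: dict[str, set[int]]) -> dict[str, float]:
    """Fraction of the cell's total signal that falls inside each compartment mask.

    Pixels outside every mask (e.g. background) count toward the total only.
    """
    total = sum(image)
    if total <= 0:
        raise ValueError("image has no signal")
    return {name: sum(image[i] for i in pixels) / total
            for name, pixels in masks.items()}


def localization_shift(control: list[float], test: list[float],
                       masks: dict[str, set[int]]) -> dict[str, float]:
    """Change in each compartment's share of the signal, test minus control."""
    before = compartment_fractions(control, masks)
    after = compartment_fractions(test, masks)
    return {name: after[name] - before[name] for name in masks}


def dominant_move(shift: dict[str, float]) -> tuple[str, str]:
    """(compartment losing the most signal, compartment gaining the most)."""
    return min(shift, key=shift.get), max(shift, key=shift.get)
```

For a control cell whose signal sits entirely in a two-pixel organelle and a test cell with the same total signal spread evenly over ten pixels, the organelle's share drops by 0.8, the cytosol's share rises by 0.8, and `dominant_move` reports `("organelle", "cytosol")`; swapping the two images reports the reverse movement.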
[0021] Practice of the present invention provides a number of important advantages.
[0022] For example, the invention provides chimeric proteins that are cleaved specifically by the subject protease to produce labeled (and unlabeled) component proteins. Preferred practice of the invention links the subcellular location and, preferably, the signal intensity of the labeled proteins to the presence (or absence) of the subject protease. This "two-factor" detection strategy provides for highly sensitive and reliable protease detection. That is, both the location of labeled protein and its signal intensity within the cells or tissues can be taken to be indicative of the presence of the active protease. The invention is also highly selective, i.e., it can readily discriminate between the presence of different proteases or isozymes, as well as between inactive and active versions of the same protease inside cells. Preferred chimeric proteins of the invention can be made with available reagents and standard recombinant manipulations, making the invention easy to use.
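Paragraph [0022] describes a "two-factor" readout in which both where the label is and how concentrated its signal is report on the protease. The sketch below shows one way to combine the two, assuming per-cell scores for location (share of signal in the expected target compartment) and concentration (share of signal in the brightest pixels) have already been computed. Requiring both factors to agree is an assumed design choice, not a rule stated in the application, and the thresholds are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Readout:
    target_share: float  # fraction of signal in the compartment the cleaved label should reach
    peak_share: float    # fraction of signal in the brightest pixels (signal concentration)


def protease_detected(control: Readout, test: Readout, *,
                      min_shift: float = 0.2,
                      min_peak_change: float = 0.1,
                      expect_focusing: bool = True) -> bool:
    """Call the protease present only if location AND concentration both change.

    Thresholds are hypothetical and would need calibration for each assay.
    """
    moved = test.target_share - control.target_share >= min_shift
    change = test.peak_share - control.peak_share
    if not expect_focusing:
        change = -change  # release into the cytosol is expected to lower concentration
    concentrated_as_expected = change >= min_peak_change
    return moved and concentrated_as_expected
```

Requiring agreement trades some sensitivity for fewer false positives. For constructs that move a label between organelles with little change in intensity, as in the last embodiment of [0020], the concentration factor would have to be dropped and the call made on location alone.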
[0023] Additionally, the invention is flexible and can be used to detect active protease in a wide range of cells, typically eukaryotic cells, including those derived from plants, yeast, fungi, animals, and insects. Preferred chimeric proteins have minimal impact on gene expression in the cells or tissue, thereby avoiding potentially complicating genetic effects. The invention is also compatible with a variety of suitable protease cleavage sites and detectable amino acid sequences.
[0024] Accordingly, and in one aspect, the invention provides a system for detecting a protease inside a cell which system preferably includes at least one of the foregoing chimeric proteins. With respect to the chimeric protein, the order of linkage of each protein component (optionally masked signal protein, protease cleavage site, and detectable sequence) is not important so long as intended results are achieved. Typically however, the linkage order starts from the N-terminus of one component and ends at the C-terminus of another component.
[0025] As mentioned, the chimeric protein includes as covalently linked components: 1) at least one optionally masked signal protein; 2) at least one protease-specific cleavage site; and 3) at least one detectable amino acid sequence. Preferably, the chimeric protein includes less than about 10 optionally masked signal proteins, more preferably less than about five of same, typically about 1, 2, or 3 of such signal proteins. By the phrase "optionally masked" is meant that an intended function (typically a trafficking signal) of the signal protein is masked or it is not masked. By the term "masked" is meant that an intended function of the signal protein is substantially reduced or preferably blocked completely, either reversibly or irreversibly, by covalently linking at least one masking sequence to the signal protein. Typically preferred signal proteins that are masked include about 1 to about 2 of such masking sequences. A generally preferred masking sequence consists of less than about 200 amino acid residues, preferably less than about 50 of same, with between from about 3 to about 20 residues being preferred for many applications. A specifically preferred masking sequence for many invention uses is at least one site specific protease cleavage site e.g., 1, 2, 3 or 4 of such sites.
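The masking logic can be illustrated with a toy model. In this sketch, a signal protein is treated as active only when the terminus its trafficking signal needs is exposed, and cleavage removes the protease site entirely. Both are simplifying assumptions: in practice a few residues of the site usually remain on each fragment. The class and function names are hypothetical. The assumption that AtOEP7 targets only when its N-terminus is free is also made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Component:
    name: str
    kind: str                       # "signal", "site", "mask" or "label"
    free_end: Optional[str] = None  # signals only: terminus that must be free ("N" or "C")
    target: Optional[str] = None    # signals only: destination once the signal is active


def localize(chain: List[Component]) -> str:
    """Destination of a chain: the first signal whose required terminus is exposed."""
    for i, comp in enumerate(chain):
        if comp.kind != "signal":
            continue
        exposed_n = i == 0
        exposed_c = i == len(chain) - 1
        if (comp.free_end == "N" and exposed_n) or (comp.free_end == "C" and exposed_c):
            return comp.target or "unknown"
    return "cytosol"  # every signal is masked, so the chain stays cytosolic


def cleave(chain: List[Component]) -> List[List[Component]]:
    """Split at every protease site; for simplicity the site is treated as consumed."""
    fragments: List[List[Component]] = []
    current: List[Component] = []
    for comp in chain:
        if comp.kind == "site":
            if current:
                fragments.append(current)
            current = []
        else:
            current.append(comp)
    if current:
        fragments.append(current)
    return fragments


# RFP:PS(NIa):AtOEP7:GFP, assuming AtOEP7 needs a free N-terminus to target.
rfp = Component("RFP", "label")
site = Component("PS(NIa)", "site")
oep7 = Component("AtOEP7", "signal", free_end="N", target="chloroplast envelope")
gfp = Component("GFP", "label")
substrate = [rfp, site, oep7, gfp]

print(localize(substrate))                       # cytosol
print([localize(f) for f in cleave(substrate)])  # ['cytosol', 'chloroplast envelope']
```

The model reproduces the behavior described in claim 1: the intact chimeric protein remains in the cytosol, and only the fragment that carries the unmasked signal protein is routed to an organelle.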
[0026] Additionally preferred chimeric proteins in accord with the invention include, as covalently linked components, less than about 10 protease specific cleavage sites, preferably less than about 5 of same with about 1, 2, 3, or 4 of such sites being often preferred.
[0027] Still further preferred chimeric proteins include, as covalently linked components, fewer than about 10 of the detectable amino acid sequences, preferably fewer than about 5, with about 1, 2, 3, or 4 such detectable amino acid sequences being generally preferred. In one embodiment, the sequences are fluorescent, phosphorescent or chemiluminescent proteins or functional fragments thereof. A functional fragment of the detectable amino acid sequence is capable of being detected with substantially the same sensitivity as the full-length sequence. In another embodiment, the amino acid sequence is an enzyme or catalytic fragment thereof that can be made fluorescent, phosphorescent or chemiluminescent upon contact with a suitable substrate. Methods for detecting and optionally quantifying signal from the detectable amino acid sequences are known in the field and are explained in more detail below.
[0028] In another aspect, the invention provides a substantially pure chimeric protein that includes as covalently linked components: 1) at least one optionally masked signal protein; 2) at least one protease-specific cleavage site; and 3) at least one detectable amino acid sequence.
[0029] In yet another aspect, the invention features a nucleic acid that includes a sequence that encodes the chimeric protein. In one embodiment, the encoded chimeric protein includes as covalently linked components: 1) at least one optionally masked signal protein; 2) at least one protease-specific cleavage site; and 3) at least one detectable amino acid sequence.
[0030] Further provided by the present invention is a vector that includes a nucleic acid encoding the chimeric protein as disclosed herein. Also provided are cells, such as plant, yeast, animal, fungal or insect cells, that include and preferably also express the chimeric protein.
[0031] Also provided by the present invention is a kit for detecting a protease inside a cell. In one embodiment, the kit includes the system described herein, which system preferably includes at least one of: a) a chimeric protein as described herein; and b) a vector comprising any of the nucleic acids encoding the chimeric proteins disclosed herein. Optionally, the kit may further include a vector comprising a nucleic acid encoding a protease specific to the chimeric protein and/or the cells that include and preferably also express the chimeric protein and the protease. The kit can also be used to screen inhibitors for the protease.
[0032] The invention provides additional uses and advantages. For instance, it is also an object of the present invention to provide an effective and generalized method for in vivo screening of inhibitors that are specific to a protease. Thus, in one embodiment, the invention provides:
[0033] (1) a chimeric substrate protein comprising at least one signal protein directing transport to a subcellular organelle, at least one proteolytic cleavage site specific to a protease, and at least one fluorescent protein label, and generalized methods for constructing the chimeric substrate protein;
[0034] (2) a recombinant gene comprising a nucleic acid sequence encoding the chimeric substrate protein, which can be used to transform a cell to express the chimeric substrate protein;
[0035] (3) a system wherein the chimeric substrate protein and its specific protease co-exist in a living cell so that the proteolytic cleavage of the substrate by the protease can take place in a living cell;
[0036] (4) an efficient stepwise method for determining the protease activity in vivo by directly identifying the cleavage of the chimeric substrate protein by the protease in the living cell, via direct observation of the cell with the fluorescence signal emitted from the fluorescent protein label(s) conjugated to the substrate; and
[0037] (5) an effective stepwise method for screening inhibitors specific to a protease by using the above system constructed for determination of the protease activity in the living cell.
[0038] By addressing the technical problems involved, the present invention provides more realistic in vivo methods for analyzing the activity of a specific protease and screening inhibitors for the protease, in which the protease and its specific substrate co-exist in a living cell so that the cleavage of the substrate by the protease can take place in a living cell and the result can be directly observed from the living cell.
[0039] In a related aspect, the invention provides a highly useful method for detecting a protease inside a cell or tissue. In an illustration of the invention, the method includes at least one and preferably all of the following steps:
[0040] a) introducing, into a subject cell or tissue, a first vector comprising nucleic acid encoding a chimeric protein comprising as covalently linked components: 1) at least one optionally masked signal protein; 2) at least one protease-specific cleavage site; and 3) at least one detectable amino acid sequence;
[0041] b) incubating the cell or tissue under conditions conducive to expressing the chimeric protein encoded by the first vector; and
[0042] c) detecting a change in at least one of the subcellular localization and signal intensity of the chimeric protein (or detectably labeled component thereof) as being indicative of the presence of the protease inside the cell.
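Step c) can be quantified in many ways. The sketch below shows one hypothetical approach, not one specified by the method: compute the fraction of total label signal that falls inside an organelle mask, then call the protease present when that fraction shifts relative to cells expressing the chimeric protein alone. The mask (for example, one derived from a chloroplast auto-fluorescence channel), the function names, and the `min_shift` threshold are all assumptions.

```python
from typing import Sequence


def organelle_fraction(signal: Sequence[Sequence[float]],
                       mask: Sequence[Sequence[bool]]) -> float:
    """Fraction of the total label signal that falls inside the organelle mask."""
    inside = 0.0
    total = 0.0
    for sig_row, mask_row in zip(signal, mask):
        for value, in_organelle in zip(sig_row, mask_row):
            total += value
            if in_organelle:
                inside += value
    if total == 0:
        raise ValueError("no signal in image")
    return inside / total


def protease_detected(fraction_test: float, fraction_reference: float,
                      min_shift: float = 0.2) -> bool:
    """Call the protease present if the organelle fraction shifts by >= min_shift."""
    return abs(fraction_test - fraction_reference) >= min_shift


# Toy 3x3 images; the centre pixel stands in for the organelle.
mask = [[False, False, False], [False, True, False], [False, False, False]]
substrate_alone = [[1.0] * 3 for _ in range(3)]                      # diffuse, cytosolic
with_protease = [[0.5, 0.5, 0.5], [0.5, 8.0, 0.5], [0.5, 0.5, 0.5]]  # relocated

ref = organelle_fraction(substrate_alone, mask)   # 1/9
test = organelle_fraction(with_protease, mask)    # 8/12
print(protease_detected(test, ref))               # True
```

An absolute difference is used so that the same call works both when cleavage moves the label into an organelle and when it releases the label into the cytosol.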
[0043] As discussed, the method is flexible and can be readily adapted to suit an intended use. For example, if needed, the method can further include the step of introducing, into the subject cell or tissue, a second vector comprising nucleic acid sequence encoding the protease; and expressing the second vector in the cell or tissue to produce the protease inside the cell or tissue.
[0044] The invention also provides a method for detecting and optionally quantifying the in vivo activity of a protease inhibitor. Preferred inhibitors can be endogenous to the cell or tissue of interest. However, in many invention embodiments the inhibitor will be administered to the cell or tissue and may include naturally occurring, synthetic, and semi-synthetic molecules. Such molecules can be obtained from chemical libraries and may include those having known, suspected or completely unknown inhibitor activity. For example, in embodiments in which the activity of a particular protease inhibitor is established, the invention can be used to confirm the activity of the protease in a particular cell, tissue type, or culture condition. In other embodiments, the invention can be used to screen candidate compounds from the chemical libraries. In one example of the invention, the detection method includes at least one and preferably all of the following steps:
[0045] a) introducing, into a subject cell or tissue, a first vector comprising nucleic acid encoding a chimeric protein comprising as covalently linked components: 1) at least one optionally masked signal protein; 2) at least one protease-specific cleavage site; and 3) at least one detectable amino acid sequence;
[0046] b) introducing into the cell or tissue a second vector encoding at least one subject protease, preferably one such protease;
[0047] c) contacting the cell or tissue with a candidate compound;
[0048] d) incubating the cell or tissue under conditions conducive to expressing the chimeric protein encoded by the first vector and the protease encoded by the second vector; and
[0049] e) detecting a change in at least one of the subcellular localization and signal intensity of the chimeric protein (or a detectably labeled component thereof) as being indicative of the presence of the protease inhibitor.
[0050] The foregoing detection method is flexible and can be readily adapted to screen candidate compounds in stand-alone, low-throughput or high-throughput modes. For example, and in one embodiment, the method further includes use of an automated or semi-automated device that is preferably intended to detect the change in subcellular localization and signal intensity of the detectably labeled chimeric protein or labeled component protein thereof. A more particular device includes an optical system adapted to detect the detectable sequence inside the cell or tissue, which system can provide output to a user in real time or as stored output.
[0051] If desired, the detectable change in the subcellular localization, signal intensity (or both) of the chimeric protein (or a detectably labeled component thereof) can be monitored by reference to a suitable control. One suitable control is addition of water, saline or buffer instead of the compound to be tested in step c). Of course, use of a control may not be needed in embodiments in which the characteristics of the chimeric protein, the host cells or tissue, etc. are already established for a particular method.
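When readouts are compared with controls, one common convention (assumed here, not prescribed by the method) is to normalize each compound against two references. The first is the vehicle control described above, in which water, saline or buffer replaces the compound while the protease is still expressed. The second is the chimeric protein expressed without the protease. In this sketch the readout is any localization metric, such as an organelle signal fraction. The function names, the 50% cutoff, and all numbers are hypothetical.

```python
from typing import Dict, List


def percent_inhibition(x_compound: float, x_vehicle: float, x_no_protease: float) -> float:
    """0 % = same readout as the vehicle control; 100 % = same as no protease."""
    window = x_vehicle - x_no_protease
    if window == 0:
        raise ValueError("controls do not separate; assay window is zero")
    return 100.0 * (x_vehicle - x_compound) / window


def hits(readouts: Dict[str, float], x_vehicle: float, x_no_protease: float,
         cutoff: float = 50.0) -> List[str]:
    """Compounds at or above the cutoff, strongest inhibition first."""
    scores = {name: percent_inhibition(x, x_vehicle, x_no_protease)
              for name, x in readouts.items()}
    passing = [name for name, score in scores.items() if score >= cutoff]
    return sorted(passing, key=lambda name: scores[name], reverse=True)


# Hypothetical organelle-fraction readouts.
x_no_protease = 0.11   # chimeric protein expressed alone
x_vehicle = 0.67       # chimeric protein + protease + buffer instead of compound
plate = {"cpd_A": 0.65, "cpd_B": 0.20, "cpd_C": 0.40}
print(hits(plate, x_vehicle, x_no_protease))   # ['cpd_B']
```

Because the score is a ratio of differences, it works regardless of whether cleavage raises or lowers the chosen readout.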
[0052] As discussed, the invention is well suited to detecting protease inhibitors. Known protease inhibitors in accord with the invention include recognized inhibitors of viral diseases. Because the viral protease is essential for viral replication and for assembly of viral capsids, a protease inhibitor can be used to treat viral diseases by inhibiting the protease and thereby suppressing viral proliferation. Animal viruses can cause diseases such as AIDS and hepatitis, and plant viruses reduce crop yield by causing leaf wilting and mottling.
[0053] In one aspect, the present invention provides a framework for effectively developing inhibitors specific to various proteases by supplying the substrates and the method for in vivo screening of inhibitors specific to those proteases. This method can also be used to determine the efficiency of drug candidates screened by conventional in vitro methods.
BRIEF DESCRIPTION OF THE DRAWINGS
[0054] FIG. 1 shows schematic diagrams of signal proteins having trafficking signals to subcellular organelles and labeled with fluorescent proteins.
[0055] FIG. 2(a)-(h) show fluorescence photographs visualizing the localized distributions of the signal proteins labeled with the fluorescent proteins when they are expressed correctly in a cell.
[0056] (a) shows that Arabidopsis outer envelope protein (AtOEP7) labeled with green fluorescent protein (GFP) is localized in chloroplast envelope;
[0057] (b), (c), and (d) show that Rubisco small subunit (RbcS), Chlorophyll a/b binding protein (Cab), and Rubisco activase (RA) labeled with GFP are localized in chloroplast stroma;
[0058] (e) shows that F1-ATPase labeled with GFP is localized in mitochondria;
[0059] (f) shows that peroxisome targeting motif (SKL) labeled with GFP is localized in peroxisome; and
[0060] (g) and (h) show that H+-ATPase and Pleckstrin homology domain (PH) labeled with GFP are localized in the plasma membrane.
[0061] FIG. 3 shows schematic diagrams of recombinant genes for NIa protease and an in vivo substrate of NIa protease, RFP:PS(NIa):AtOEP7:GFP:
[0062] (a) shows the structure of the recombinant gene constructed to express the chimeric protein RFP:PS(NIa):AtOEP7:GFP used as an in vivo substrate of NIa protease in Example 2 of the present invention, wherein RFP, GFP, PS(NIa), AtOEP7, and 35S indicate red fluorescent protein, green fluorescent protein, the proteolytic cleavage site of the protease, Arabidopsis outer envelope protein, and CaMV35S promoter, respectively; and
[0063] (b) shows the structure of the recombinant gene constructed to express NIa protease used in Example 2, wherein NIa represents the coding region of NIa protease from TVMV.
[0064] FIG. 4 shows fluorescence photographs observed after expressing the chimeric protein RFP:PS(NIa):AtOEP7:GFP in the Arabidopsis protoplast transformed with the recombinant gene shown in FIG. 3(a):
[0065] (a), (b), (c), and (d) are images of the green fluorescence signal, the red fluorescence signal, the overlap of the green and red fluorescence signals, and the bright field, respectively. The red fluorescence signal observed in the chloroplast is chloroplast auto-fluorescence, and the red fluorescence signal observed in the cytosol originates from the red fluorescent protein.
[0066] FIG. 5 shows fluorescence images observed after co-expressing the chimeric protein RFP:PS(NIa):AtOEP7:GFP and NIa protease in the Arabidopsis protoplast transformed with the recombinant genes shown in FIGS. 3(a) and 3(b), showing that the cleavage of RFP:PS(NIa):AtOEP7:GFP by NIa protease can be visualized:
[0067] (a), (b), (c), and (d) are images of the green fluorescence signal, the red fluorescence signal, the overlap of the green and red fluorescence signals, and the bright field, respectively. The red fluorescence signal observed in the chloroplast is chloroplast auto-fluorescence, and the red fluorescence signal observed in the cytosol originates from the red fluorescent protein.
[0068] FIG. 6 is a representation of a Western blot showing that the cleavage of the chimeric protein RFP:PS(NIa):AtOEP7:GFP by NIa protease takes place in the Arabidopsis protoplast. The case (+) of co-expressing RFP:PS(NIa):AtOEP7:GFP shown in FIG. 3(a) and NIa protease shown in FIG. 3(b) is compared with the case (-) of expressing RFP:PS(NIa):AtOEP7:GFP alone. The protein bands observed at 70 kDa and 35 kDa correspond to the intact chimeric protein RFP:PS(NIa):AtOEP7:GFP and the fragment protein AtOEP7:GFP produced by the proteolytic cleavage, respectively.
[0069] FIG. 7(a)-(j) are drawings showing plasmid maps of various constructs used to express the fusion proteins shown in FIG. 1: (a) AtOEP7:GFP, (b) AtOEP7:RFP, (c) RbcS:GFP, (d) RbcS:RFP, (e) Cab:GFP, (f) RA:GFP, (g) F1-ATPase:GFP, (h) GFP:SKL, (i) H+-ATPase:GFP, (j) GFP:PH.
[0070] FIG. 8(a) and (b) show plasmid maps for NIa protease and its chimeric substrate protein RFP:PS(NIa):AtOEP7:GFP, respectively. An important part of the nucleic acid sequence for the chimeric substrate protein is noted.
[0071] FIG. 9(a) and (b) show plasmid maps for HIV-1 protease and its chimeric substrate protein RFP:PS(HIV-1):AtOEP7:GFP, respectively. Nucleic acid and protein sequences for the proteolytic cleavage sites (SEQ ID NOs: 57-74) are noted.
[0072] FIG. 10 shows schematic diagrams of recombinant genes encoding examples of optionally masked chimeric substrate protein in which trafficking signal of one signal protein remains active. PS indicates the proteolytic cleavage site sequence. FP-1 and FP-2 indicate coding sequences for fluorescent proteins having different fluorescence wavelengths.
[0073] FIG. 11 shows fluorescence images observed after expressing a chimeric substrate protein RFP:PS(HIV-1):AtOEP7:GFP in the Arabidopsis protoplast transformed with one of the recombinant plasmids shown in FIG. 9(b). FIG. 11(a), (b), and (c) are images of green fluorescence signal, red fluorescence signal, and overlap of green and red fluorescence signals, respectively. The weak red fluorescence signal observed in the green fluorescence image is auto-fluorescence of chloroplast.
[0074] FIG. 12 shows fluorescence images observed after co-expressing HIV-1 protease and a chimeric substrate protein RFP:PS(HIV-1):AtOEP7:GFP in the Arabidopsis protoplast transformed with the recombinant genes shown in FIGS. 9(a) and (b). FIGS. 12(a), (b), and (c) are images of green fluorescence signal, red fluorescence signal, and overlap of green and red fluorescence signals, respectively. The weak red fluorescence signal observed in the green fluorescence image is auto-fluorescence of chloroplast.
[0075] FIG. 13(a) and (b) show images observed after expressing a chimeric substrate protein H+-ATPase:PS(NIa):GFP. FIG. 13(c) and (d) show images observed after co-expressing NIa protease and a chimeric substrate protein H+-ATPase:PS(NIa):GFP. FIG. 13(a) and (c) are images of the green fluorescence signal, and FIG. 13(b) and (d) are images obtained under bright field. The red fluorescence observed in the green fluorescence images is auto-fluorescence of chloroplasts.
DETAILED DESCRIPTION OF THE INVENTION
[0076] As discussed, the invention provides a highly useful system for detecting a protease inside a cell or tissue. If desired, the system is readily adapted to detect more than one protease, preferably fewer than about 3, usually about 1 protease. Preferably, the system includes a chimeric protein that includes as covalently linked components: 1) at least one optionally masked signal protein; 2) at least one protease-specific cleavage site; and 3) at least one detectable amino acid sequence. Typically preferred chimeric proteins consist of fewer than about 20 components, more preferably fewer than about 10, with from about 3 to about 6 components being generally preferred for most of the proteins. The invention has a wide spectrum of important applications, including use in screens to detect candidate compounds that reduce or completely block protease activity in vivo.
[0077] A "system" according to the invention includes one or more of the chimeric molecules described herein as well as any additional components which may be added thereto, such as those which may facilitate solubilization or stability of same. Examples include but are not limited to a serum protein such as bovine serum albumin, a buffer such as phosphate buffered saline, or an acceptable vehicle or stabilizer. See generally Remington's Pharmaceutical Sciences, Mack Pub. Co., Easton, Pa., 1980, for a discussion of acceptable vehicles, stabilizers, etc. Typical systems in accord with the invention will also include at least one of the nucleic acids, vectors, manipulated cells or tissue described herein. In such invention embodiments, the chimeric protein can serve as a useful experimental control. A preferred system includes from about 1 to about 10, preferably fewer than about 5 and more preferably about 1 of the chimeric proteins dissolved in an acceptable carrier such as water or buffered saline. Preferably, the system is provided sterile.
[0078] By the phrase "signal protein" is meant a polypeptide sequence that has either a specific trafficking signal targeting to a subcellular organelle or a specific property related to its localization in a cell such as aggregate formation. Preferred signal proteins can be found throughout this disclosure including the Examples section.
[0079] Preferred chimeric proteins according to the invention include an optionally masked signal protein, a protease specific cleavage site, and a detectable amino acid sequence that are covalently linked together (i.e., fused) by a recombinant, chemical or other suitable method. In most embodiments, recombinant approaches will be preferred. Although not generally needed for most invention embodiments, one or more of the components can be fused at one or several sites through a peptide linker sequence. Particular peptide linker sequences will have fewer than about 30 amino acids, more preferably fewer than about 15 amino acids, still more preferably from about 1 to about 5 amino acids. That peptide sequence can include one or more sites for cleavage by a pathogen-induced or host cell-induced protease. Alternatively, the peptide linker may be used to assist in construction of the chimeric protein. Specifically preferred chimeric proteins can be referred to as "in-frame" fusion molecules.
[0080] As noted, components of the chimeric proteins disclosed herein can be organized in nearly any fashion provided that the protein has the function for which it was intended. And as mentioned, each component of the chimeric protein can be spaced from another component by at least one suitable peptide linker sequence.
[0081] For instance, any one of the components of the chimeric protein can include the N-terminus of the protein. Additionally, any one of the components can include the C-terminus of the chimeric protein, which terminus may include another component such as a purification tag sequence as discussed below. Unless specified otherwise, the phrase "covalently linked in sequence" means, with respect to an amino acid sequence, amino acids joined by peptide bonds in the N- to C-terminal direction. With respect to a nucleotide sequence, the phrase denotes joining of one nucleotide to another in the 5' to 3' direction.
[0082] As a more specific example of the system, the chimeric protein includes covalently linked in sequence: 1) the signal protein; 2) the protease-specific cleavage site; and 3) the detectable amino acid sequence. Alternatively, the chimeric protein can include covalently linked in sequence: 1) a masking sequence; 2) the protease cleavage site; 3) the signal protein; and 4) the detectable amino acid sequence. In another embodiment, the chimeric protein for use with the system features covalently linked in sequence: 1) the signal protein; 2) the protease cleavage site; 3) the masking sequence; and 4) the detectable amino acid sequence.
[0083] In invention embodiments in which more than one signal protein is needed in the chimeric protein, such a protein can include covalently linked in sequence: 1) a first signal protein; 2) the protease cleavage site; 3) a second signal protein; and 4) the detectable amino acid sequence. More particularly, such a protein can include covalently linked in sequence: 1) the first signal protein; 2) a first protease cleavage site; 3) the masking sequence; 4) the second signal protein; and 5) the detectable amino acid sequence.
[0084] In some instances, it will be helpful to have a system in which the chimeric protein includes more than one protease cleavage site, e.g., 2, 3 or 4. In such a case, the chimeric protein can include covalently linked in sequence: 1) the masking sequence; 2) a first protease cleavage site; 3) a first signal protein; 4) a second protease cleavage site; 5) a second signal protein; and 6) the detectable amino acid sequence. Alternatively, the chimeric protein can include covalently linked in sequence: 1) a first signal protein; 2) a first protease cleavage site; 3) a second signal protein; 4) a second protease cleavage site; 5) a masking sequence; and 6) the detectable amino acid sequence.
[0085] In another invention embodiment, the chimeric protein includes covalently linked in sequence: 1) the protease-specific cleavage site; 2) the signal protein; and 3) the detectable amino acid sequence. Alternatively, the chimeric protein can include covalently linked in sequence: 1) a first signal protein; 2) a first detectable sequence; 3) the protease cleavage site; and 4) a second detectable sequence. In this invention example, the protein can further include a second signal protein covalently linked between the C-terminus of the protease cleavage site and the N-terminus of the second detectable sequence.
[0086] The invention provides for still further chimeric proteins that include covalently linked in sequence: 1) a first signal protein; 2) the protease cleavage site; 3) a second signal protein; and 4) a second detectable sequence. Alternatively, the chimeric protein can include covalently linked in sequence: 1) a first detectable sequence; 2) the protease cleavage site; 3) the signal protein; and 4) a second detectable sequence.
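For arrangements with more than one cleavage site, it can help to list every product that could form if some sites are cut and others are not. The sketch below is an illustrative bookkeeping aid and makes several assumptions. Components are written N- to C-terminal as plain strings. Any name beginning with "site" is a protease cleavage site. A cut removes the site entirely. The function names are hypothetical, and the sketch does not predict how efficiently each site is actually cleaved.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def fragments(chain: List[str], cuts: Tuple[int, ...]) -> List[List[str]]:
    """Split an N->C chain at the given cleavage-site indices; sites are dropped."""
    pieces: List[List[str]] = []
    start = 0
    for idx in sorted(cuts):
        pieces.append(chain[start:idx])
        start = idx + 1
    pieces.append(chain[start:])
    return [piece for piece in pieces if piece]


def all_products(chain: List[str]) -> Dict[Tuple[int, ...], List[List[str]]]:
    """Products for every combination of cleaved sites (none, partial, complete)."""
    sites = [i for i, name in enumerate(chain) if name.startswith("site")]
    products = {}
    for k in range(len(sites) + 1):
        for cut in combinations(sites, k):
            products[cut] = fragments(chain, cut)
    return products


# Second arrangement of paragraph [0084], written N- to C-terminal.
chain = ["signal1", "site1", "signal2", "site2", "mask", "label"]
for cut, pieces in all_products(chain).items():
    print(cut, pieces)
```

Listing the partial-cleavage products makes it easier to see which labeled fragments, and therefore which localization patterns, could appear before cleavage is complete.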
[0087] More preferred chimeric proteins in accord with the invention will have a molecular size of less than about 250 kDa, preferably less than about 200 kDa, more preferably between about 25 and about 175 kDa, as determined by standard SDS-PAGE gel electrophoresis using appropriate molecular weight markers.
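Before a construct is built, its size can be roughly checked against these ranges from sequence length alone. The sketch below uses the common rule of thumb of about 110 Da per amino acid residue. Apparent size on SDS-PAGE can differ from the calculated value, so this is only a planning estimate and does not replace the gel-based determination stated above. The component lengths and function names are hypothetical.

```python
AVG_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average mass of one amino acid residue


def approx_kda(n_residues: int) -> float:
    """Rough molecular size in kDa estimated from sequence length."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


def in_preferred_range(n_residues: int, low_kda: float = 25.0, high_kda: float = 175.0) -> bool:
    """Check the estimate against the preferred 25-175 kDa range."""
    return low_kda <= approx_kda(n_residues) <= high_kda


# Hypothetical chimera built from four components of 240, 10, 70 and 240 residues.
total = 240 + 10 + 70 + 240
print(total, approx_kda(total), in_preferred_range(total))
```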
[0088] A "polypeptide" refers to any polymer preferably consisting essentially of any of the 20 natural amino acids regardless of its size. Although the term "protein" is often used in reference to relatively large proteins, and "peptide" is often used in reference to small polypeptides, use of these terms in the field often overlaps. The term "polypeptide" refers generally to proteins, polypeptides, and peptides unless otherwise noted.
[0089] As used herein, the term "cell" is intended to include any primary cell or immortalized cell line, and any group of such cells, as in a tissue or an organ. Preferred cells include mammalian cells such as those of human origin, plant cells, yeast, fungi and insect cells. A "host cell" in accord with the invention can be an infected cell, or it can be a cell such as E. coli that can be used to propagate a nucleic acid or vector as described herein.
[0090] It will be appreciated that particular uses of the invention will often require a specific chimeric protein configuration. Choice of a particular chimeric protein component or group of components will be guided by recognized parameters including the signal protein(s), protease specific cleavage site(s), and detectable amino acid sequences selected, the protease(s) to be monitored, and the level of sensitivity or selectivity required for an application.
[0091] By way of example, the invention encompasses embodiments in which the chimeric protein includes one signal protein, one detectable sequence and one protease specific cleavage site. In this example, it will often be helpful to include a masking sequence linked to the N- or C-terminus of the signal protein. That is, it is envisioned that linkage of the masking sequence to the detectable sequence will be less preferred for some invention uses.
[0092] Other specific uses of the invention will typically require other specific chimeric protein configurations. For instance, where a particular protein has one signal protein, and one protease specific cleavage site positioned between two detectable amino acid sequences, it will often be desirable to remove one of the detectable amino acid sequences to optimize the system.
[0093] Still other uses of the invention will be facilitated by having multiple masking sequences which can be a signal sequence, detectable amino acid sequence or other suitable sequence such as a protease specific cleavage site. In these embodiments, having one or more additional masking sequences e.g., the protease cleavage site, may not be necessary to achieve maximal use of the system. However in embodiments that include two different signal proteins, one protease cleavage site and two detectable amino acid sequences, one or two masking sequences thereon can be helpful.
[0094] Additionally, particular chimeric proteins may include removable tagging sequences that, in some embodiments, may assist identification and/or purification of the chimeric protein. Examples include 6×His and MYC tags. Other suitable tagging sequences are well known in the field and can be used with the invention if desired.
[0095] Practice of the invention is fully compatible with a wide variety of signal proteins and functional fragments thereof that are masked or unmasked. Preferred examples are generally sufficient to localize the chimeric protein (or detectable component of that protein such as the detectable sequence) to an organelle or other subcellular compartment. By the phrase "compartment" is meant an internally limited space such as a vacuole, peroxisome, mitochondrion, etc.
[0096] Preferred plant signal proteins localize the chimeric protein or at least one of its components to the nucleus, Golgi body, lytic vacuole, storage vacuole, peroxisome, mitochondrion, endoplasmic reticulum, plasma membrane, or chloroplast of a plant cell. More preferred plant signal proteins include AtOEP7, RbcS, Cab, RA, SKL, F1-ATPase, PH, FAPP, H+-ATPase, or a functional fragment thereof. Preferred animal signal proteins localize the chimeric protein or at least one of its components to the nucleus, Golgi body, storage vacuole, lysosome, peroxisome, endoplasmic reticulum, plasma membrane, or mitochondrion of an animal cell. Examples include human peptide methionine sulfoxide reductase (MSRA), cytochrome b2, 11-beta-hydroxysteroid dehydrogenase (11β-HSD), G9-AKL, and peroxisomal integral membrane protein 47 (PMP47), or a functional fragment thereof. See the Examples below for sequences of preferred animal signal proteins.
[0097] As discussed, practice of the invention is compatible with use of signal proteins that are functional in animal cells. Typical examples include, but are not limited to, the animal signals shown in the following Table I.
TABLE I
Protein | Organelle | Position of signal | Type(1)
MSRA | Mitochondria | N-terminal | A
Cytochrome b2 | Mitochondria | N-terminal | A
11β-HSD | ER | N-terminal | A & B
G9-AKL | Peroxisome | C-terminal | A
PMP47 | Peroxisome | A domain in the middle of the protein | B
(1) A type: the signal protein can be used as a masked signal protein. B type: the signal protein can be used as an unmasked signal protein.
See Hansel A et al. FASEB J 2002 Jun.; 16: 911-31 (human peptide methionine sulfoxide reductase; "MSRA"); Bomer U et al. (1997) J Biol Chem 272 30439-30446 (Cytochrome b2); Naray-Fejes-Toth A and Fejes-Toth G (1996) J Biol Chem 271 15436-1544 (11β-hydroxysteroid dehydrogenase (11β-HSD)); McNew JA and Goodman JM (1994) J Cell Biol 127 1245-1257 (G9-AKL); Dyer et al., (1996) J. Cell Biol. 133 269-280 (PMP47 (peroxisomal integral membrane protein 47)).
[0098] By the phrase "functional fragment", when used herein to describe a signal protein, is meant a fragment of a particular signal protein that is capable of providing at least about 70%, preferably higher than about 90% of the shuttling function of the full-length sequence. Methods for detecting and quantifying signal protein function are known and include localization and quantification of signal intensity using the fluorescence imaging techniques described in the Examples section.
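The thresholds in this definition amount to a ratio test. The sketch below assumes that "shuttling function" has already been scored for both the fragment and the full-length signal protein, for example as the fraction of imaged cells that show correct localization. That scoring choice, the function names, and the numbers are all hypothetical.

```python
def relative_function(fragment_score: float, full_length_score: float) -> float:
    """Fragment targeting efficiency relative to the full-length signal protein."""
    if full_length_score <= 0:
        raise ValueError("full-length score must be positive")
    return fragment_score / full_length_score


def classify_fragment(fragment_score: float, full_length_score: float) -> str:
    """Apply the at-least-about-70 % and preferably-above-90 % thresholds."""
    ratio = relative_function(fragment_score, full_length_score)
    if ratio > 0.90:
        return "functional (preferred)"
    if ratio >= 0.70:
        return "functional"
    return "not functional"


# Hypothetical scores: fraction of imaged cells showing correct localization.
print(classify_fragment(0.85, 0.90))  # functional (preferred)
print(classify_fragment(0.70, 0.90))  # functional
print(classify_fragment(0.50, 0.90))  # not functional
```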
[0099] As should be apparent, the invention is flexible and not limited to use of any particular protease specific cleavage site. For example, the cleavage site can be specifically cleaved by a mammalian or viral protease. By the phrase "specifically cleaved" is meant that peptide bonds in a specified protease cleavage site are specifically broken (i.e. hydrolyzed) by a subject protease. That is, the protease cleavage sites are not broken by proteases which naturally occur in the host cell including what is generally referred to as housekeeping proteases. Specific cleavage of those protease cleavage sites can be monitored by a variety of techniques including SDS-polyacrylamide gel electrophoretic methods.
[0100] Preferred protease cleavage sites are those that are specifically hydrolyzed by a protease associated with a human pathogen, e.g., a yeast, bacterium, fungus, nematode, virus or protozoan. More specific examples include cytomegalovirus (CMV); herpes simplex virus (HSV); hepatitis virus, preferably type A or C; human immunodeficiency virus (HIV); Kaposi's sarcoma-associated herpes virus (KSHV); yellow fever virus; flavivirus; rhinovirus; or a plasmodium such as P. falciparum, P. vivax, P. ovale, or P. malariae. Typically, the plasmodia cause malaria or various medical complications relating to malaria, and the proteases plasmepsin I and plasmepsin II are recognized as being implicated. In embodiments in which HSV is of interest, the protease will be the maturational protease of HSV.
[0101] A variety of particular HIV-1 and HCV protease-specific cleavage sites have been disclosed. See, e.g., Gluzman, I. Y. et al., J. Clin. Invest., 94:1602 (1994); Grakoui, A. et al., J. Virol., 67:2832 (1993); Kolykhalov, A. A. et al., J. Virol., 68:7525 (1994); and Barrie, K. A. et al., Virology, 219:407 (1996), the disclosures of which are incorporated by reference.
[0102] Additional pathogen-specific proteases and their cleavage sites have been described and can be used in accord with the present invention. For example, an HSV-1 maturational protease and its cleavage site have been described. See, e.g., Hall, M. R. T. and W. Gibson, Virology, 227:160 (1997). Further, plasmepsins I and II have been found in the digestive vacuole of P. falciparum, and the corresponding cleavage sites have also been disclosed. See, e.g., Moon, R. P., Eur. J. Biochem., 244:552 (1997).
[0103] Additional protease specific cleavage sites for use with the invention are specifically cleaved by a mammalian protease associated with blood coagulation, apoptosis, or the extracellular matrix. See the Examples and discussion that follows.
[0104] Practice of the invention is compatible with use of one or a combination of detectable amino acid sequences, e.g., those that are directly or indirectly fluorescent, phosphorescent, luminescent or chemiluminescent. In embodiments in which two or more such detectable sequences are used, the emission wavelength of one detectable sequence will often differ from that of at least one other detectable sequence. For example, a preferred detectable sequence is derived from certain well-known jellyfish fluorescent proteins, including those recognized to emit green, red, and yellow light under appropriate excitation conditions.
[0105] As discussed, the invention further provides substantially pure chimeric proteins. Such chimeric proteins can be separated and purified by an appropriate combination of known techniques and, if desired, can include one or more purification tags as described herein. These techniques include, for example, methods utilizing solubility, such as salt precipitation and solvent precipitation; methods utilizing differences in molecular weight, such as dialysis, ultrafiltration, gel filtration, and SDS-polyacrylamide gel electrophoresis; methods utilizing differences in electrical charge, such as ion-exchange column chromatography; methods utilizing specific affinity, such as affinity chromatography, including metal affinity columns such as Ni-NTA; methods utilizing differences in hydrophobicity, such as reverse-phase high performance liquid chromatography; and methods utilizing differences in isoelectric point, such as isoelectric focusing electrophoresis. See generally Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed. (1989); Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, New York (1989); and Ausubel et al., Short Protocols in Molecular Biology, John Wiley & Sons, New York (1999) for disclosure relating to these methods.
[0106] It is preferred that the chimeric proteins of the present invention be substantially pure. That is, the chimeric proteins have been isolated from the cellular constituents that naturally accompany them, so that the chimeric proteins are preferably present at a homogeneity of at least 80%, or 90% to 95% (w/w). Chimeric proteins having at least 98 to 99% homogeneity (w/w) are most preferred for many pharmaceutical, clinical and research applications. Once substantially purified, the chimeric protein should be substantially free of contaminants for cell culture and related applications. Once purified partially or to substantial purity, the soluble chimeric proteins can be used therapeutically or in performing in vitro or in vivo assays as disclosed herein. Substantial purity can be determined by a variety of standard techniques such as chromatography and gel electrophoresis.
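Where purity is estimated by gel electrophoresis, the percentage can be approximated from densitometry of a stained lane. The sketch below is hypothetical: it assumes that stain intensity is roughly proportional to protein mass, which is only an approximation of w/w homogeneity, and it maps the result onto the preference tiers stated above. Band names and function names are illustrative.

```python
from typing import Mapping


def percent_homogeneity(band_intensities: Mapping[str, float], target: str) -> float:
    """Percent of total stained protein in a lane accounted for by the target band.

    Assumes stain intensity is proportional to protein mass (an approximation).
    """
    total = sum(band_intensities.values())
    if total <= 0:
        raise ValueError("no stained protein detected")
    return 100.0 * band_intensities[target] / total


def purity_tier(percent: float) -> str:
    """Map a homogeneity value onto the preference tiers of paragraph [0106]."""
    if percent >= 98.0:
        return "most preferred (>= 98%)"
    if percent >= 90.0:
        return "preferred (>= 90%)"
    if percent >= 80.0:
        return "substantially pure (>= 80%)"
    return "below 80%: further purification needed"
```

For example, a lane in which the chimeric protein band carries 95 of 100 intensity units would be scored at 95% and placed in the "preferred" tier.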
[0107] A suitable host cell can be used for preparative purposes to propagate nucleic acid encoding a desired chimeric protein. Such a host cell can be a prokaryotic or eukaryotic (including plant) cell in which production of the chimeric protein is specifically intended. Thus host cells specifically include yeast, fly, worm, frog, and mammalian cells, as well as plant cells and organs, that are capable of propagating nucleic acid encoding the chimeric protein. Non-limiting examples of mammalian cell lines which can be used include CHO dhfr- cells (Urlaub and Chasin, Proc. Natl. Acad. Sci. USA, 77:4216 (1980)), 293 cells (Graham et al., J. Gen. Virol., 36:59 (1977)), and myeloma cells like SP2 or NS0 (Galfre and Milstein, Meth. Enzymol., 73(B):3 (1981)). Other suitable cells are disclosed in Sambrook et al., supra.
[0108] Host cells capable of propagating nucleic acid encoding a desired chimeric protein encompass non-mammalian eukaryotic cells as well, including insect (e.g., Spodoptera frugiperda), yeast (e.g., S. cerevisiae, S. pombe, P. pastoris, K. lactis, H. polymorpha; as generally reviewed by Fleer, R., Current Opinion in Biotechnology, 3(5):486-496 (1992)), fungal and plant cells (e.g., Arabidopsis and Nicotiana). Also contemplated is the use of certain prokaryotes such as E. coli and Bacillus.
[0109] Nucleic acid encoding a desired chimeric protein can be introduced into a host cell by standard techniques for transfecting cells. The term "transfecting" or "transfection" is intended to encompass all conventional techniques for introducing nucleic acid into host cells, including calcium phosphate co-precipitation, DEAE-dextran-mediated transfection, lipofection, electroporation, microinjection, viral transduction and/or integration. Suitable methods for transfecting host cells can be found in Sambrook et al., supra, and other laboratory textbooks.
[0110] The present invention further provides a production process for isolating a chimeric protein of interest. In the process, a host cell (e.g., a yeast, fungus, insect, bacterial or animal cell) into which a nucleic acid encoding the protein of interest, operatively linked to a regulatory sequence, has been introduced is grown at production scale in a culture medium under conditions that stimulate transcription of the nucleotide sequence encoding the chimeric protein of interest. Subsequently, the chimeric protein of interest is isolated from harvested host cells or from the culture medium using standard protein purification techniques. In particular, these techniques can be used to express and purify a desired chimeric protein on a large scale (i.e., in at least milligram quantities) in a variety of culture formats, including roller bottles, spinner flasks, tissue culture plates, bioreactors, or fermentors.
[0111] Thus the invention further provides a nucleic acid sequence encoding the chimeric substrate protein. The nucleic acid encoding the chimeric substrate protein can be used to transform a cell to express the chimeric substrate protein in the cell. In order to transform the cell, the recombinant gene must include a promoter and other regulatory nucleic acid sequences operably linked to the coding region of the chimeric substrate protein. Choice of a promoter will be guided by recognized parameters, typically the choice of host cell.
[0112] As discussed, the invention also provides nucleic acid sequences and particularly DNA sequences that encode the present chimeric proteins. Preferably, the DNA sequence is carried by a vector suited for extrachromosomal replication such as a phage, virus, plasmid, phagemid, cosmid, YAC, or episome. In particular, a DNA vector that encodes a desired chimeric protein can be used to facilitate preparative methods described herein and to obtain significant quantities of the chimeric protein. The DNA sequence can be inserted into an appropriate expression vector, i.e., a vector which contains the necessary elements for the transcription and translation of the inserted protein-coding sequence. A variety of host-vector systems may be utilized to express the protein-coding sequence. These include mammalian cell systems infected with virus (e.g., vaccinia virus, adenovirus, etc.); insect cell systems infected with virus (e.g., baculovirus); microorganisms such as yeast containing yeast vectors, or bacteria transformed with bacteriophage DNA, plasmid DNA or cosmid DNA. Depending on the host-vector system utilized, any one of a number of suitable transcription and translation elements may be used. See generally Sambrook et al., supra and Ausubel et al. supra.
[0113] In general, a preferred DNA vector according to the invention comprises a nucleotide sequence, linked by phosphodiester bonds, that includes, in a 5' to 3' direction, a first cloning site for introduction of a first nucleotide sequence encoding a chimeric protein as described herein. If desired, the sequence encoding the protein can be linked to DNA encoding one or more suitable tag sequences. FIGS. 7(a)-(j), 8(a)-(b), and 9(a)-(c) provide examples of such vectors.
[0114] In some embodiments of the invention, it will be preferred that the chimeric protein encoded by the DNA vector be provided in a "cassette" format. By the term "cassette" is meant that the encoded protein, or a component thereof, can be readily exchanged for another by standard recombinant methods. A DNA vector configured in a cassette format is particularly desirable when it is useful to "swap" one site-specific protease cleavage site or detectable amino acid sequence for another. See FIGS. 7(a)-(j), 8(a)-(b), and 9(a)-(c), in which a core vector is used to express a variety of chimeric proteins.
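The cassette idea can be modeled as a fixed module order in which any one module is exchanged without touching the others. The Python sketch below is illustrative only; the `Cassette` class and the label `PS(HIV-1)` are hypothetical names, while the module order follows the construct RFP:PS(NIa):AtOEP7:GFP described in the Examples.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Cassette:
    """Hypothetical in-frame module layout of a chimeric substrate, N- to C-terminus."""
    masking: str        # signal masking protein (also a fluorescent label here)
    cleavage_site: str  # protease-specific cleavage site
    signal: str         # signal protein carrying the trafficking signal
    reporter: str       # fluorescent reporter fused to the signal protein

    def modules(self) -> Tuple[str, ...]:
        return (self.masking, self.cleavage_site, self.signal, self.reporter)

    def name(self) -> str:
        return ":".join(self.modules())


# Core construct from the Examples.
nia_substrate = Cassette("RFP", "PS(NIa)", "AtOEP7", "GFP")

# Swapping only the cleavage site yields a new construct; the core is unchanged.
hiv_substrate = replace(nia_substrate, cleavage_site="PS(HIV-1)")
```

A frozen dataclass is used so that a swap always produces a new construct and the core layout is never modified in place, loosely mirroring the use of one core vector to express many chimeric proteins.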
[0115] More specifically, it is envisioned that in some cases certain pathogen serotypes, especially viral strains, may be associated with individual protease cleavage sites specific for that serotype or strain. In this regard, the emergence of drug-resistant HIV serotypes has been particularly problematic. In such cases, one or more existing protease cleavage sites in a DNA vector formatted as a cassette can be replaced with other predetermined protease cleavage sites as needed. Particular protease cleavage sites can be selected according to the pathogen present in individual human patients.
[0116] Significantly, the present invention can serve as an effective "warning system" that registers changes in protease activity in a subject cell or tissue. For instance, where a PCR or hybridization experiment has indicated the presence of genomic DNA encoding a pathogen-associated protease, the present invention can be used to detect whether active protease is present in the cells or tissue. This feature of the invention is useful in a variety of settings, including cell cultures in which pathogen contamination is known or suspected.
[0117] As discussed, the invention can be used to detect the presence of one or more proteases in a host cell or tissue of interest. The methods and compositions described herein are especially useful for detecting and analyzing protease inhibitor molecules, which as discussed, may be naturally-occurring or be part of a pool of such molecules (i.e., a chemical library).
A. Use of the Invention to Screen Candidate Protease Inhibitors
[0118] 1. General Considerations
[0119] In addition to the plant protoplasts used in the examples of this invention, intact plant cells with cell walls, or human, animal, or insect cells, can also be used to screen protease inhibitors according to the present invention. Plant cells will often be preferred because transformation and cultivation are easier, and because the regular spherical shape of the protoplast facilitates identification of protein localization. However, other cells and tissues may be useful, e.g., if the protease of interest cannot be expressed in active form in plant cells. For instance, such a protease may be difficult to express in active form due to inappropriate post-translational modification. Suitable animal or insect cells may be used to overcome this problem if it arises. In such cases, signal proteins that function in the selected cell must be used. For example, if human, animal, or insect cells are to be used for screening, chloroplast-targeting signal proteins such as AtOEP7, RbcS, Cab, and RA cannot be used because these cells lack chloroplasts. For human, animal, or insect cells, signal proteins targeting other subcellular organelles, such as mitochondria, peroxisomes, the plasma membrane, etc., can be used. In addition, the vector system, including the promoter and other regulatory elements, must be selected appropriately for the cells to be used.
[0120] For in vivo screening of protease inhibitors according to the present invention, transformed cells that can express both a protease of interest and a chimeric substrate protein specific to the protease are prepared. The most preferred method to prepare the transformed cell is to co-transform the cell with a recombinant plasmid encoding the protease and a recombinant plasmid encoding a chimeric substrate protein provided by the present invention. If the protease is a viral protease, viral infection could also be used to express the protease in the transformed cell.
[0121] Methods for transforming a wide variety of plant, animal, yeast, fungal and insect cells are established. For instance, methods for transforming cells by introducing recombinant plasmids include, but are not limited to, chemical-mediated methods using PEG (polyethylene glycol), calcium phosphate, or DEAE-dextran; cationic lipid-mediated lipofection; microinjection; electroporation; electrofusion; and DNA bombardment. Depending on the type of cell to be used, an appropriate transformation method has to be selected and the conditions optimized to achieve efficient transformation. If plant protoplasts such as Arabidopsis or tobacco protoplasts are to be used, the PEG-mediated transformation method described in Example 1(c) is preferred; the conditions given in Example 1(c) are optimized for Arabidopsis protoplasts. If intact plant cells with cell walls are to be used, DNA bombardment with a gene gun or PEG-mediated transformation could be used, depending on the type of plant cell. For human, animal, or insect cells, calcium phosphate-mediated or DEAE-dextran-mediated transfection, or cationic lipid-mediated lipofection, can be used to transform the cells.
[0122] To screen protease inhibitors, the transformed cells need to be contacted with a candidate compound during expression of the protease and the chimeric substrate protein. Typically, a candidate compound is added to the solution containing the transformed cells, and the resulting solution is incubated at an appropriate temperature. The candidate inhibitor can be selected from the group consisting of chemical compounds, peptides, mixtures of chemical compounds or peptides, and extracts of natural products. The incubation time needed to express the proteins can vary with the cell type and the incubation temperature, ranging from about 1 hr to several days. For Arabidopsis protoplasts, the incubation time can be as short as 4 hrs or as long as about a week, depending on the incubation conditions; when the protoplasts are incubated in W5 solution at 22° C., the preferred incubation time is 12 to 48 hrs. Candidate inhibitors can be examined at concentrations of about 0.1 to 100 μg/ml; in typical screening, a concentration of about 1 to a few μg/ml can be used. When a large number of candidate compounds need to be screened, mixtures of about 5-30 candidate inhibitors can be used in the first round of screening.
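The pooled first round can be sketched as follows. The pooling function enforces the 5-30 compound range given above; the second, deconvolution round, in which compounds from positive pools are retested individually, is an assumption of this sketch rather than a step stated in the text, and all names are hypothetical.

```python
from typing import List, Sequence, Set


def make_pools(compounds: Sequence[str], pool_size: int) -> List[List[str]]:
    """Split a compound list into first-round mixtures of `pool_size` compounds.

    The last pool may be smaller when the list does not divide evenly.
    """
    if not 5 <= pool_size <= 30:
        raise ValueError("pool size outside the 5-30 range described above")
    return [list(compounds[i:i + pool_size]) for i in range(0, len(compounds), pool_size)]


def deconvolution_list(pools: Sequence[Sequence[str]], hit_pools: Set[int]) -> List[str]:
    """Compounds from pools scored as hits, to be retested one by one (assumed second round)."""
    return [c for i, pool in enumerate(pools) if i in hit_pools for c in pool]
```

With 25 compounds and mixtures of 10, the first round uses three wells instead of 25; if only the second mixture shows inhibition, just its 10 compounds are retested individually.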
[0123] Nearly any compound or group of compounds can be screened for anti-protease activity in accord with this invention. Examples include, but are not limited to, cytokines, tumor suppressors, antibodies, receptors, muteins, fragments or portions of such proteins, and active RNA molecules, e.g., an antisense RNA molecule or ribozyme. A preferred compound for screening purposes is a synthetic or semi-synthetic drug (referred to sometimes as a "small molecule"). For example, a pool of derivatives of known inhibitors of human viral pathogens can be readily tested by the present methods. See e.g., U.S. Pat. Nos. 6,420,438; 6,329,525; 6,287,840; 6,147,188; and 6,046,190 (disclosing a variety of testable molecules and derivatives thereof).
[0124] The invention can also be used to screen additional compounds. See Pillay et al. (1995) Rev. Med. Virol. (disclosing a variety of potential viral protease inhibitor compounds); Wei et al. (1995) Nature, 373:117 (disclosing indinavir, ABT-538); and Ho et al. (1992) Ann. Intern. Med. 113:111 (disclosing an anti-herpes agent). Also, derivatives of the foregoing specific compounds can be screened in accord with the invention, including, but not limited to, saquinavir and derivatives thereof.
[0125] To identify inhibition of protease activity by the candidate compound, it is preferred to monitor fluorescence images of the transformed cells as a function of time, because the expression time can vary with the condition of the cells, so proteolytic activity or its inhibition can appear at different times. If Arabidopsis protoplasts are used, the preferred times for monitoring fluorescence images are 12, 18, 24, 36, and 48 hrs after expression is started. If the chimeric substrate protein has two or more different fluorescent labels, it is preferred to monitor fluorescence images at two or more wavelengths specific to those labels. It is also preferred to record the bright-field image, because it can facilitate identification of the subcellular organelles. To facilitate identification of protease inhibitors, it is further preferred to monitor, for comparison, fluorescence images of transformed cells that have not been contacted with the candidate inhibitor. A standard fluorescence microscope equipped with multicolor fluorescence filter sets, such as a Zeiss Axioplan or Nikon E800 fluorescence microscope, can be used to monitor the fluorescence images at a magnification of 200×, 400×, or 600×. A confocal laser scanning microscope can be used to obtain higher-resolution images.
[0126] It will be apparent that the invention is compatible with the construction and use of a wide spectrum of chimeric substrate proteins. See Example 1, for instance.
[0127] 2. Illustrative Use of the NIa Protease
[0128] To measure protease activity using the chimeric substrate protein constructed as described above, the NIa protease of Tobacco Vein Mottling Virus (TVMV) was used as a model system in the Examples of the present invention. NIa protease is one of the best-characterized viral proteases and is known to cleave seven specific sites in the polyprotein produced by TVMV. For optimum protease activity, six amino acid residues (P6-P1) on the N-terminal side of the scissile bond and four amino acid residues (P1'-P4') on the C-terminal side are needed, and four conserved amino acid residues (V-R-F-Q) must be present at P4-P1 of the substrate protein. If any one of these four conserved residues is mutated to glycine (Gly), proteolytic cleavage of the substrate protein cannot take place (Yoon, H. Y. et al., 2000).
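The sequence requirement above can be expressed as a simple check on a ten-residue window spanning P6 to P4'. This is a minimal sketch: it tests only for the conserved V-R-F-Q at P4-P1 and generates the single glycine substitutions that, per the text, abolish cleavage. Presence of the motif is necessary but not sufficient for efficient cleavage, and the flanking residues in any example window are arbitrary placeholders, not a TVMV sequence. Function names are hypothetical.

```python
from typing import List

P_SIDE = 6          # P6-P1: residues N-terminal to the scissile bond
P_PRIME_SIDE = 4    # P1'-P4': residues C-terminal to the scissile bond
CONSERVED_P4_P1 = "VRFQ"


def has_nia_motif(window: str) -> bool:
    """True if a P6..P4' window (10 residues) carries V-R-F-Q at P4-P1."""
    if len(window) != P_SIDE + P_PRIME_SIDE:
        raise ValueError(f"expected {P_SIDE + P_PRIME_SIDE} residues, got {len(window)}")
    # Index 0 is P6, so P4, P3, P2, P1 occupy indices 2, 3, 4, 5.
    return window[P_SIDE - 4:P_SIDE].upper() == CONSERVED_P4_P1


def glycine_scan(window: str) -> List[str]:
    """Variants with each conserved P4-P1 residue mutated to Gly in turn."""
    return [window[:i] + "G" + window[i + 1:] for i in range(P_SIDE - 4, P_SIDE)]
```

Such a check could be used, for instance, to confirm that a cleavage site inserted into a cassette vector, or a designed non-cleavable control, has the intended P4-P1 residues before cloning.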
[0129] When the recombinant gene expressing the chimeric substrate protein RFP:PS(NIa):AtOEP7:GFP, which includes the proteolytic site (PS(NIa)) of NIa protease, was introduced into a protoplast by polyethylene glycol-mediated transformation, GFP and RFP accumulated as large aggregates in the cytosol (FIG. 4). This result corresponds to the case in which the protease does not function. In this case, accumulation of the aggregates in the cytosol is suspected to result from hydrophobic interactions among the hydrophobic regions of the AtOEP7 proteins. When the red fluorescence signal of the chimeric substrate protein is not readily distinguishable from chloroplast autofluorescence (FIG. 4(b)), the state of the substrate can still be clearly determined by using two different fluorescent labels and observing whether or not the green fluorescence coincides with the red fluorescence in the overlaid fluorescence image (FIG. 4(c)). When the plasmid encoding NIa protease and the plasmid encoding RFP:PS(NIa):AtOEP7:GFP were introduced together, the green and red fluorescent signals separated into the chloroplast outer envelope membrane and the cytosol, respectively (FIG. 5). This result suggests that NIa protease successfully cleaved the proteolytic site of the chimeric substrate protein to generate two proteins, RFP and AtOEP7:GFP. The protease reaction can be identified by observing the green fluorescence signal translocated to the chloroplast outer envelope membrane by the action of the signal protein (FIG. 5(a)), the red fluorescence signal dispersed in the cytosol (FIG. 5(b)), and the overlaid image of both fluorescence signals (FIG. 5(c)).
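Whether the green and red signals coincide, as judged visually in FIGS. 4(c) and 5(c), can also be quantified. The sketch below computes a Pearson correlation between the two channels over the same cell pixels, a common colocalization measure swapped in here for illustration; the 0.5 decision threshold and the function names are hypothetical and would need calibration against uncleaved and cleaved controls.

```python
from math import sqrt
from typing import Sequence


def pearson_colocalization(green: Sequence[float], red: Sequence[float]) -> float:
    """Pearson correlation of green and red intensities over the same cell pixels."""
    n = len(green)
    if n != len(red) or n < 2:
        raise ValueError("channels must have the same number (>= 2) of pixels")
    mean_g = sum(green) / n
    mean_r = sum(red) / n
    cov = sum((g - mean_g) * (r - mean_r) for g, r in zip(green, red))
    var_g = sum((g - mean_g) ** 2 for g in green)
    var_r = sum((r - mean_r) ** 2 for r in red)
    if var_g == 0 or var_r == 0:
        raise ValueError("a channel has no intensity variation")
    return cov / sqrt(var_g * var_r)


def looks_cleaved(r: float, threshold: float = 0.5) -> bool:
    """Hypothetical rule: weak green/red coincidence suggests the substrate was cut."""
    return r < threshold
```

Uncleaved substrate, where both labels sit in the same cytosolic aggregates, should give a coefficient near 1; after cleavage, GFP at the envelope and RFP spread through the cytosol should give a much lower value.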
[0130] The effectiveness of the present invention in identifying the result of the enzyme reaction can be seen more clearly by comparison with the control experiment. Comparing FIGS. 4(a) and 5(a), it can be observed that proteolytic cleavage activates the masked trafficking signal of the signal protein, inducing a change in the distribution of the fluorescence. Comparing FIGS. 4(b) and 5(b), it can be observed that the red fluorescent protein used as the signal masking protein plays a role in altering the distribution of the fluorescence signal in addition to masking the trafficking signal.
[0131] Accordingly, it is proposed that a chimeric substrate protein, in which a fusion of a signal protein and a fluorescent protein is linked through a proteolytic site to a signal masking protein, can be used as a substrate to determine the activity of a protease and also to screen for its inhibitors in vivo.
[0132] 3. High Throughput Screening Assays: General Considerations
[0133] The in vivo protease inhibitor screening method provided by the present invention can be easily adapted to use in a high throughput assay. The high throughput screening method comprises contacting the cells transformed to express a protease and its chimeric substrate protein with a candidate compound, incubating the transformed cells, obtaining fluorescence images of the transformed cells, converting the fluorescence images into digital data, and analyzing the digital data to determine whether the candidate compound has inhibited the protease.
[0134] In the high throughput screening of the present invention, equal amounts of the transformed cell solution are loaded into the wells of standard 96- or 384-well microtiter plates, and a different candidate compound is added to each well. The transformed cells in the microtiter plates are then incubated in a controlled environment (appropriate temperature, humidity, and air composition) to express the protease and its chimeric substrate protein. Fluorescence images of the transformed cells in each well are obtained using a fluorescence microscope after preselected incubation times.
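A plate layout for this step can be generated programmatically. In the hypothetical sketch below, column 1 of each plate is reserved for transformed cells not contacted with a candidate compound, following the recommendation in paragraph [0125] to image untreated cells for comparison; the choice of column and the function name are assumptions.

```python
from string import ascii_uppercase
from typing import Dict, Sequence

PLATE_FORMATS = {96: (8, 12), 384: (16, 24)}  # wells -> (rows, columns)


def plate_map(compounds: Sequence[str], n_wells: int = 96) -> Dict[str, str]:
    """Assign compounds to wells row by row, keeping column 1 for untreated controls."""
    rows, cols = PLATE_FORMATS[n_wells]
    capacity = rows * (cols - 1)
    if len(compounds) > capacity:
        raise ValueError(f"{len(compounds)} compounds exceed {capacity} treatment wells")
    remaining = iter(compounds)
    layout: Dict[str, str] = {}
    for r in range(rows):
        for c in range(cols):
            well = f"{ascii_uppercase[r]}{c + 1}"
            if c == 0:
                layout[well] = "untreated control"
            else:
                layout[well] = next(remaining, "empty")
    return layout
```

On a 96-well plate this leaves 8 control wells and 88 treatment wells; on a 384-well plate, 16 control wells and 368 treatment wells.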
[0135] The high throughput screening apparatus consists of (1) an incubator for arrays of transformed cell samples in multiple microtiter plates, (2) an automated sampler, and (3) an automated fluorescence microscope equipped with an XY-translation sample stage and a high-resolution digital camera. After a preselected incubation time, the automated sampler takes a small portion of the transformed cell solution from each well and loads it onto an observation plate such as a glass slide. Sampling can be done in parallel using multiple tips or pipets, or sequentially using a single tip or pipet. After each sampling, the tip(s) or pipet(s) are washed with an appropriate washing solution. The transformed cell samples are loaded onto the observation plate as an array at preselected positions. The XY-translation stage holds the observation plate and moves it in the XY direction to position each sample under the microscope objective. A Z-axis focus drive moves either the microscope objective or the observation plate in the Z direction for focusing. For each sample, fluorescence images at preselected fluorescence wavelengths are captured with the digital camera, fed into a PC, and stored as digital data. An automation controller controls the pipet sampler, the XY-translation stage, and the Z focus drive. The PC provides a display and data analysis software.
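The text does not specify the order in which the XY-translation stage visits the arrayed samples. One common choice, assumed here, is a serpentine path that reverses direction on alternate rows so the stage avoids a long return move at the end of each row. The sketch compares its travel length with a simple raster path; pitches are in arbitrary units and all names are hypothetical.

```python
from math import dist
from typing import List, Tuple

Stop = Tuple[Tuple[int, int], Tuple[float, float]]  # ((row, col), (x, y))


def raster_path(n_rows: int, n_cols: int, pitch_x: float, pitch_y: float) -> List[Stop]:
    """Visit every row left to right."""
    return [((r, c), (c * pitch_x, r * pitch_y))
            for r in range(n_rows) for c in range(n_cols)]


def serpentine_path(n_rows: int, n_cols: int, pitch_x: float, pitch_y: float) -> List[Stop]:
    """Visit rows alternately left to right and right to left."""
    path: List[Stop] = []
    for r in range(n_rows):
        cols = range(n_cols) if r % 2 == 0 else range(n_cols - 1, -1, -1)
        for c in cols:
            path.append(((r, c), (c * pitch_x, r * pitch_y)))
    return path


def travel_length(path: List[Stop]) -> float:
    """Total straight-line stage travel between consecutive samples."""
    return sum(dist(a[1], b[1]) for a, b in zip(path, path[1:]))
```

For a 2 x 3 sample array with pitches of 1 and 2 units, the serpentine path travels 6 units, whereas the raster path travels 4 + 2*sqrt(2) (about 6.83) units; the saving grows with the number of rows.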
[0136] Hits for protease inhibitors can be determined automatically by examining the subcellular fluorescence distribution. In general, the extent of localization or dispersion of the fluorescence signal within the area of the observed cell can be calculated from the digital image data and used as a decision criterion. In some cases, the pattern or shape of the fluorescence signal distribution can be used as the decision criterion.
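One way to turn the localization/dispersion criterion into a number is sketched below. The `localization_index` metric (the share of total intensity in the brightest 10% of a cell's pixels), the 50% inhibition cutoff, and the function names are assumptions, not part of the disclosure. Each well is scaled between two reference wells: cells expressing the protease without inhibitor, and cells expressing the substrate alone. For the red channel of RFP:PS(NIa):AtOEP7:GFP, paragraph [0129] suggests the no-protease reference is aggregated (high index) and the cleaved reference is dispersed (low index), but the scaling works in either direction.

```python
from typing import Sequence


def localization_index(pixels: Sequence[float], top_fraction: float = 0.1) -> float:
    """Share of a cell's total intensity carried by its brightest `top_fraction` of pixels.

    Values near 1 indicate a punctate or aggregated signal; values near
    `top_fraction` indicate a signal spread evenly over the cell.
    """
    values = sorted(pixels, reverse=True)
    total = sum(values)
    if total <= 0:
        raise ValueError("no signal in the cell")
    k = max(1, round(top_fraction * len(values)))
    return sum(values[:k]) / total


def percent_inhibition(index: float, active_ref: float, inactive_ref: float) -> float:
    """Scale a well's index between protease-active (0%) and protease-absent (100%) references."""
    if active_ref == inactive_ref:
        raise ValueError("reference wells do not separate")
    return 100.0 * (index - active_ref) / (inactive_ref - active_ref)


def is_hit(index: float, active_ref: float, inactive_ref: float,
           min_inhibition: float = 50.0) -> bool:
    """Hypothetical hit rule: at least `min_inhibition` percent inhibition."""
    return percent_inhibition(index, active_ref, inactive_ref) >= min_inhibition
```

In a real screen the index would be averaged over many segmented cells per well, and the cutoff chosen from the spread of the reference wells.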
[0137] An example of a high throughput screening protocol suitable for use with the invention has been disclosed in U.S. Pat. No. 5,989,835 and PCT Application WO 00/79241 A2.
[0138] The following Tables 2 and 3 provide sequence information for use with the invention:
TABLE 2
SEQ ID NOs of PCR Primers
Sequence | 5' primer | 3' primer
Partial AtOEP7 | 1 | 2
Partial RbcS | 3 | 4
Partial Cab | 5 | 6
Partial RA | 7 | 8
Partial F1-ATPase | 9 | 10
SKL | 11 | 12
H+-ATPase | 13 | 14
Substrate protein for NIa protease | 15 | 16
HIV-1 protease | 17 | 18
Substrate proteins for HIV-1 protease | 19, 20, 21, 22, 23, 24, 28 | 25, 26, 27
TABLE 3
SEQ ID NOs of Signal Proteins
Signal protein | Nucleic acid sequence | Protein sequence
Partial AtOEP7 | 29 | 30
Partial RbcS | 31 | 32
Partial Cab | 33 | 34
Partial RA | 35 | 36
Partial F1-ATPase | 37 | 38
H+-ATPase | 39 | 40
Partial PH | 41 | 42
Partial FAPP | 43 | 44
[0139] The following provides illustrative nucleic acid and protein sequence information for use with the invention. Examples of proteases and their cleavage sites are provided in SEQ ID NOs: 45-50, 55-74, and 77-120. Examples include NIa protease (SEQ ID NOs: 45 and 46) and its cleavage sites (SEQ ID NOs: 47-50), HIV-1 protease (SEQ ID NOs: 55 and 56) and its cleavage sites (SEQ ID NOs: 57-74), HCV NS3 protease (SEQ ID NOs: 77 and 78) and its cleavage sites (SEQ ID NOs: 79-84), HSV-1 protease (SEQ ID NOs: 85 and 86) and its cleavage sites (SEQ ID NOs: 87-90), HTLV-1 protease (SEQ ID NOs: 91 and 92) and its cleavage sites (SEQ ID NOs: 93-96), HCMV protease (SEQ ID NOs: 97 and 98) and its cleavage sites (SEQ ID NOs: 99-102), APP beta-secretase (SEQ ID NOs: 103 and 104) and its cleavage site (SEQ ID NOs: 105 and 106), caspase 3 (SEQ ID NOs: 107 and 108) and its cleavage site (SEQ ID NOs: 113 and 114), the large subunit of caspase 3 (SEQ ID NOs: 109 and 110), the small subunit of caspase 3 (SEQ ID NOs: 111 and 112), human blood coagulation factor II (SEQ ID NOs: 115 and 116) and its cleavage site (SEQ ID NOs: 117 and 118), and human blood coagulation factor XI (SEQ ID NOs: 119 and 120).
[0140] Nucleic acid and protein sequences of two NIa protease substrate proteins are provided in SEQ ID NOs: 51-54, and those for an HIV-1 protease substrate protein are provided in SEQ ID NOs: 75 and 76.
[0141] The following discussion relates to Korean application No. 10-2001-0048123 in which additional uses and advantages of the present invention have been disclosed.
[0142] As provided therein, the invention provides important chimeric substrate proteins that can be used to screen protease inhibitors in vivo. As particularly disclosed therein, the invention relates to a system for screening protease inhibitors and provides: (i) a chimeric substrate protein constructed to induce a change in the subcellular localization and distribution of fluorescence through the specific action of a protease, (ii) a recombinant gene comprising a nucleic acid sequence encoding the chimeric substrate protein that can be used to express the chimeric substrate protein in a cell, (iii) a method to identify the activity of the protease by detecting the subcellular localization and distribution of fluorescence when the protease and the chimeric substrate protein are present together in a cell, so that proteolytic cleavage by the protease can take place in the cell, and (iv) a method to screen protease inhibitors in vivo using the chimeric substrate protein and the method described above.
[0143] More particularly, the Korean application No. 10-2001-0048123 discloses that such chimeric substrate proteins can be used in a cell in which the signal protein, when expressed, directs trafficking to a subcellular organelle. In further detail, the trafficking signal toward a specific subcellular organelle that is included in the signal protein can be inactivated by linking a signal masking protein to the N- or C-terminus of the signal protein. In the present invention, a signal protein is linked to a signal masking protein through a proteolytic cleavage site of a protease, so that trafficking of the signal protein can be activated or inactivated depending on cleavage at the proteolytic site. In other words, trafficking of the chimeric substrate protein, in which the signal protein and the signal masking protein are linked through the proteolytic site, does not occur until the signal masking protein is cleaved off by the protease. Such cleavage restores normal trafficking of the signal protein. In the present invention, the signal protein and/or the signal masking protein are labeled with fluorescent proteins so that the activity of the protease can be determined by measuring changes in the localization and distribution of the fluorescence signal. Therefore, the chimeric protein used as a substrate of the protease in the present invention has the following characteristics:
[0144] (1) The chimeric substrate protein includes at least one signal protein that has a trafficking signal directing transport to a specific subcellular organelle.
[0145] (2) The chimeric substrate protein includes at least one proteolytic cleavage site for a specific protease.
[0146] (3) The trafficking signal of the signal protein described in (1) is inactivated by linking the proteolytic cleavage site to the signal protein or by linking a signal masking protein to the signal protein through the proteolytic cleavage site.
[0147] (4) The inactivated trafficking signal of the signal protein can be activated when cleavage at the proteolytic site occurs by the protease.
[0148] (5) The chimeric substrate protein is labeled with at least one fluorescent protein and the fluorescence signal from the cell changes depending on the proteolytic cleavage by the protease.
[0149] As discussed in the Korean application No. 10-2001-0048123, the invention further provides a method for measuring protease activity in vivo using the chimeric substrate protein described herein. Also provided is a method for screening protease inhibitors using the method for measuring protease activity.
[0150] As further disclosed in the Korean application No. 10-2001-0048123, to measure the protease activity in vivo, a protease and a chimeric substrate protein specific to the protease must co-exist in a cell.
[0151] Also, according to one aspect of the invention, the recombinant gene for the chimeric substrate protein is introduced into the cell to express the chimeric substrate protein there. The target protease for screening protease inhibitors can be an endogenous protease present in the cell, or an exogenous protease expressed after transformation with a recombinant gene or infection with a virus. However, when an endogenous protease is the target, screening accuracy and efficiency may be low, e.g., because its expression is difficult to regulate and because low expression levels are difficult to detect. Therefore, the present invention provides a system for more efficiently determining protease activity in vivo, wherein a specific protease is over-expressed, or expressed in a regulated manner, by transforming the cell with a recombinant gene or infecting the cell with a virus. Viral infection can be used in the case of a viral protease. However, because the regulation of viral protease expression is not completely understood, it is preferable to use a protease expressed by transforming the cell with a recombinant gene encoding the protease. Since the expressed protease is located in the cytosol, the chimeric substrate protein must also be located in the cytosol. Therefore, a system for efficiently measuring protease activity in vivo can be constructed using the chimeric substrate protein according to the first aspect of the present invention, in which the trafficking signal of the signal protein included in the chimeric substrate protein is masked.
[0152] Furthermore, inhibitors of the protease can be selected by detecting changes in the localization and distribution of the fluorescence signal, caused by treating the cell with a candidate chemical before, after, or at the same time as the protease and its chimeric substrate protein are expressed in the cell.
[0153] In many enzymatic reactions, reactants are not completely converted to products. Likewise, an inhibitor may inhibit the reaction only partially rather than completely. Moreover, when multiple cells are observed, the level of protease activity can vary considerably from cell to cell. Therefore, determining the activity of a protease inhibitor may be considerably ambiguous if the method used to measure protease activity has low sensitivity or low contrast. To avoid such ambiguity, it is important, when constructing the chimeric substrate protein, to select a signal protein that produces a clearly distinguishable change in the cellular localization and distribution of the fluorescence signal upon proteolytic cleavage. In addition, the efficiency of determining inhibition activity can be enhanced by employing two or more fluorescent proteins with different fluorescence wavelengths. In Example 2 of the present invention, GFP and RFP were employed in constructing the chimeric substrate protein so that they localize to different subcellular locations upon proteolytic cleavage.
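Because inhibition may be partial and protease activity varies between cells, one way to make the readout less ambiguous is to score many cells and compare the fraction showing the cleaved pattern with and without a candidate compound. The sketch below assumes each cell has already been scored as cleaved or uncleaved (for example, by eye from the GFP/RFP localization); the counting scheme and the function names `fraction_cleaved` and `percent_inhibition` are illustrative assumptions, not a procedure described in this application.

```python
def fraction_cleaved(cleaved_cells: int, total_cells: int) -> float:
    """Fraction of scored cells whose fluorescence pattern indicates cleavage."""
    if total_cells <= 0:
        raise ValueError("at least one cell must be scored")
    if not 0 <= cleaved_cells <= total_cells:
        raise ValueError("cleaved_cells must lie between 0 and total_cells")
    return cleaved_cells / total_cells


def percent_inhibition(control: tuple[int, int], treated: tuple[int, int]) -> float:
    """Inhibition relative to an untreated control, from (cleaved, total) counts.

    0% means the compound left cleavage unchanged; 100% means no treated
    cell showed the cleaved pattern. Negative values mean more cleavage.
    """
    f_control = fraction_cleaved(*control)
    f_treated = fraction_cleaved(*treated)
    if f_control == 0:
        raise ValueError("control shows no cleavage; inhibition is undefined")
    return 100.0 * (1.0 - f_treated / f_control)


# Hypothetical counts: 80 of 100 control cells and 20 of 100 treated cells
# were scored as cleaved.
print(percent_inhibition((80, 100), (20, 100)))
```

Normalizing to the control fraction, rather than to 100%, reflects the point above that even uninhibited reactions need not go to completion in every cell.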
[0154] As discussed above, certain signal proteins according to the invention are optionally masked. In this embodiment, the signal protein included in the chimeric substrate protein provided by the present invention is inactivated by the proteolytic site or the signal masking protein linked to it, and the chimeric substrate protein therefore remains in the cytosol. Proteolytic cleavage activates the signal protein and directs its trafficking to a subcellular organelle. The differing characteristics of signal proteins must be considered when selecting the signal protein whose trafficking signal is to be inactivated in the chimeric substrate protein. Endosomal trafficking proteins enter the endoplasmic reticulum as soon as they are synthesized and are then transported to the Golgi body, the lytic vacuole, the storage vacuole, or the plasma membrane. Their trafficking signals are recognized during translation. Therefore, it may not be possible to inactivate and then activate these trafficking signals simply by linking and cleaving off the proteolytic site, with or without the signal masking protein, and these endosomal trafficking proteins are less suitable for use in the present invention as signal proteins whose trafficking signals are inactivated. Proteins that are expressed in the cytosol and transported directly to subcellular organelles can be used as the signal protein, because their trafficking signals can be masked according to the present invention. Among the latter, signal proteins carrying a nuclear localization signal (NLS) are poorly suited: the NLS does not depend on the N- or C-terminus, so it is difficult to control their trafficking by linking or cleaving off the proteolytic site, with or without the signal masking protein.
It is thus desirable to select a signal protein with a trafficking signal targeting mitochondria, chloroplasts, or peroxisomes. In plant cells, a chloroplast-targeting signal protein is more preferable, because chloroplasts are relatively large, which makes their shape and distribution easier to observe.
[0155] The signal masking protein included in the chimeric substrate protein of the present invention inactivates the trafficking signal of the signal protein by being linked to the signal protein through the proteolytic site. The signal masking protein can be an amino acid, a peptide, or a protein linked to the signal protein through the proteolytic site. In some cases, it may be possible to inactivate the trafficking signal of the signal protein by linking the proteolytic site alone to the signal protein. The signal masking protein and the proteolytic site must not interfere with binding of the substrate to the protease. Beyond simple signal masking, the signal masking protein can also be used to change the overall characteristics of the chimeric substrate protein, or to attach an additional trafficking signal or a fluorescent label. For example, if another signal protein is selected as the signal masking protein, this signal protein will move to its own target organelle when cleaved off by the protease. If this signal protein is also labeled with a fluorescent protein, cleavage of the substrate protein can be identified more clearly by detecting two different fluorescence signals. In another example, a fluorescent protein can be used as the signal masking protein. In this case, the fluorescent protein released by proteolytic cleavage remains in the cytosol. It is thus possible to increase the efficiency of determining protease activity, or of detecting the inhibition activity of a protease inhibitor, by distinguishing the fluorescence signal in the cytosol from that in the subcellular organelle to which the cleaved signal protein is translocated.
[0156] One or a combination of standard recombinant methods can be employed to make the chimeric substrate proteins disclosed herein, and the construction method is readily extensible. For example, if at least two proteolytic sites are included in the chimeric substrate protein, trafficking of two or more signal proteins can be observed. If proteolytic sites for two or more different proteases are introduced, the activities of two or more proteases can be examined simultaneously.
[0157] Detailed methods for selecting the signal protein and the signal masking protein and for constructing the chimeric substrate protein are as follows. The mark ↓ indicates the proteolytic site of the protease, and M represents the signal masking protein.
[0158] (1) In the case that the trafficking signal is present at the N-terminus of the signal protein (nS), the signal masking protein is placed at the N-terminal side of the signal protein: (M-↓-nS).
[0159] (2) In the case that the trafficking signal is present at the C-terminus of the signal protein (Sc), the signal masking protein is placed at the C-terminal side of the signal protein: (Sc-↓-M).
[0160] (3) When another signal protein S' is used as the signal masking protein, S' can be selected so that its trafficking signal is on the opposite terminus from that of S. The trafficking signals of both signal proteins are then masked simultaneously by linking the two signal proteins through their trafficking-signal ends: (Sc-↓-nS' or S'c-↓-nS).
[0161] (4) In a construction as in (3), if two proteolytic sites are to be introduced, the signal masking protein is placed between the two signal proteins: (Sc-↓-M-↓-nS' or S'c-↓-M-↓-nS).
[0162] (5) If the trafficking signals of a signal protein S and of the other signal protein S' that acts as the signal masking protein are on the same side, another signal masking protein must be linked to mask the trafficking signal of S': (M-↓-nS'-↓-nS or Sc-↓-S'c-↓-M).
[0163] By extending constructions (1) to (5) described above, the chimeric substrate protein can be constructed with three or more proteolytic sites: (Scm-↓- . . . -↓-Sc2-↓-Sc1-↓-M-↓-nS1-↓-nS2-↓- . . . -↓-nSn).
[0164] The construction methods described above are representative examples of possible construction methods.
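The notation of constructions (1) to (5), and its generalization in [0163], can be captured by a small function that lays out a fully masked substrate. This is a sketch for illustration only: `chimeric_layout` is a hypothetical helper that manipulates the notation used above (Sc, nS, M, ↓) and does not model any protein property.

```python
def chimeric_layout(m: int, n: int, include_mask: bool = True) -> str:
    """Linear N-to-C layout of a fully masked chimeric substrate (notation only).

    m: signal proteins with a C-terminal trafficking signal (Sc1..Scm), placed
       N-terminal to the centre, with Sc1 closest to it.
    n: signal proteins with an N-terminal trafficking signal (nS1..nSn), placed
       C-terminal to the centre, with nS1 closest to it.
    include_mask: put a signal masking protein M at the centre. Without M, the
       innermost Sc1 and nS1 must mask each other, as in construction (3).
    """
    if m < 0 or n < 0 or m + n == 0:
        raise ValueError("at least one signal protein is required")
    if not include_mask and (m == 0 or n == 0):
        raise ValueError("without M, signal proteins are needed on both sides")
    left = [f"Sc{i}" for i in range(m, 0, -1)]
    right = [f"nS{i}" for i in range(1, n + 1)]
    centre = ["M"] if include_mask else []
    # Every junction between neighbouring parts is a proteolytic site.
    return "-↓-".join(left + centre + right)


print(chimeric_layout(0, 1))                      # M-↓-nS1, construction (1)
print(chimeric_layout(1, 0))                      # Sc1-↓-M, construction (2)
print(chimeric_layout(1, 1, include_mask=False))  # Sc1-↓-nS1, construction (3)
print(chimeric_layout(1, 1))                      # Sc1-↓-M-↓-nS1, construction (4)
print(chimeric_layout(0, 2))                      # M-↓-nS1-↓-nS2, construction (5)
```

Each trafficking signal in the output abuts either M or another signal protein, so all signals are masked, and the number of proteolytic sites is m + n with M (m + n - 1 without).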
[0165] Besides the cases described above, in which the trafficking signals of all signal proteins in the chimeric substrate protein are masked, there are other constructions that allow proteolytic cleavage of the chimeric substrate protein to occur in the cytosol. If the trafficking signal of only one signal protein in the chimeric substrate protein remains active and those of the remaining signal proteins are masked, the chimeric substrate protein is translocated to the subcellular organelle targeted by the active signal protein. If the translocated chimeric substrate protein resides on the membrane of that organelle, it can be constructed so that at least one proteolytic site and at least one inactivated signal protein are exposed to the cytosol, achieving the same effect as a chimeric substrate protein in which all trafficking signals are masked. When such a chimeric protein, with only one active signal protein, is used as a substrate, the protease present in the cytosol can cleave it, because the proteolytic site is exposed to the cytosol even though the chimeric substrate protein is not freely dispersed there. The inactivated signal protein exposed to the cytosol is then activated by the proteolytic cleavage. Consequently, the fragment containing this activated signal protein is translocated to a subcellular organelle different from the one where the chimeric substrate protein resided, altering the localization and distribution of the fluorescence signal attached to the activated signal protein.
[0166] Another way to construct a chimeric substrate protein in which the trafficking signal of only one signal protein remains active is to link a fluorescent protein that has no trafficking signal, such as GFP or RFP, to the cytosol-exposed proteolytic site instead of a signal protein. Because the fluorescent protein released by proteolytic cleavage disperses in the cytosol, the fluorescence signal shifts from a specific cellular organelle to the cytosol. In this case, however, incomplete enzymatic cleavage may make it difficult to distinguish clearly whether the fluorescence signal is at the membrane or in the cytosol. In addition, constructing a chimeric substrate protein with only one active signal protein may be considerably difficult, because it requires detailed information not only on the subcellular organelle to which the signal protein is translocated but also on the orientation and position of the translocated signal protein.
[0167] Signal proteins targeting the outer membranes of mitochondria, chloroplasts, and the nucleus, the peroxisome membrane, or the plasma membrane can be used as the signal protein that remains active in the chimeric substrate protein. Signal proteins that specifically bind phospholipids can also be used. Examples include the pleckstrin homology (PH) domain, which binds phosphatidylinositol 4,5-diphosphate (PI(4,5)P2) as shown in FIG. 2(h), and the pleckstrin homology domain of FAPP (family A (phosphoinositide binding specific) member 3), which binds phosphatidylinositol 4-phosphate (PI(4)P).
[0168] In Example 2, Western blot analysis of cells transformed to express a protease and its chimeric substrate protein constructed according to the present invention confirmed that the protease reaction took place correctly. Compared with Western blot analysis, in which the cells are lysed and the crude extract is electrophoresed and probed with an antibody, the system provided by the present invention, in which cleavage is identified simply by observing the cells themselves, is more efficient in both time and cost.
[0169] In Example 1 of the present invention, a system was constructed in which the trafficking and distribution of a protein can be determined visually in a cell. To visualize the localization of proteins after translocation, chimeric proteins were constructed by selecting signal proteins that have trafficking signals to subcellular organelles and labeling them with a fluorescent protein. The localization of each chimeric protein could be identified by observing fluorescence images of cells transformed with a recombinant plasmid containing the recombinant gene for that chimeric protein.
[0170] Among these chimeric proteins, AtOEP7:GFP was selected, and a proteolytic site of a protease was linked to it to construct a chimeric substrate protein for screening protease inhibitors in vivo. AtOEP7 targets the chloroplast outer envelope membrane of Arabidopsis, and, as noted above, a chloroplast-targeting protein is preferable in plant cells. Because AtOEP7 carries its trafficking signal at its N-terminus, a signal masking protein was linked to the N-terminal side of AtOEP7:GFP. Red fluorescent protein (RFP) was selected as the signal masking protein. The chimeric substrate protein was therefore constructed such that, after proteolytic cleavage, the green fluorescence localizes to the chloroplast envelope membrane and the red fluorescence is distributed in the cytosol.
[0171] In general, preparation of the fusion molecules of the invention involves conventional recombinant steps, e.g., polymerase chain reaction (PCR) amplification, preparation of plasmid DNA, cleavage of DNA with restriction enzymes, preparation of oligonucleotides, ligation of DNA, isolation of mRNA, introduction of the DNA into a suitable cell, and culturing of the cell. Additionally, the chimeric proteins described herein can be isolated and purified by well-known techniques, including standard electrophoretic, centrifugation, and chromatographic methods. See generally Sambrook et al., supra, and Ausubel et al., supra, for disclosure relating to these methods.
[0172] DNA and protein sequences described herein can be obtained from a variety of public sources including those specifically mentioned. A preferred source is the National Center for Biotechnology Information (NCBI)-Genetic Sequence Data Bank (Genbank) at the National Library of Medicine, 38A, 8N05, Rockville Pike, Bethesda, Md. 20894. Genbank is also available on the internet. See generally Benson, D. A. et al., Nucl. Acids. Res., 25:1 (1997) for a description of Genbank.
[0173] Other reagents used in the examples such as antibodies, cells and viruses can be obtained from recognized commercial or public sources such as Linscott's Directory (40 Glen Drive, Mill Valley Calif. 94941), and the American Type Culture Collection (ATCC) 12301 Parklawn Drive, Rockville, Md. 20852.
[0174] All documents mentioned herein are incorporated herein by reference.
[0175] The present invention is further illustrated by the following Examples. These Examples are provided to aid in understanding the invention and are not to be construed as limiting it.
Example 1
Detection of Chimeric Proteins and Trafficking to the Subcellular Organelles
[0176] (a) Construction of Recombinant Plasmids for Expression of the Chimeric Proteins
[0177] The coding region of the Arabidopsis outer envelope membrane protein AtOEP7, a homolog of pea OEP14, was amplified by polymerase chain reaction (PCR) from Arabidopsis genomic DNA using two specific primers (5'-GACGACGACGCAGCGATG and 5'-GGATCCCCAAACCCTCTTTGGATGT) designed to remove the natural termination codon. It was then ligated in frame to the 5' end of the coding region of the green or red fluorescent protein to construct recombinant plasmids for AtOEP7:GFP and AtOEP7:RFP, respectively. In the recombinant plasmids, the ligated genes were placed under the control of the 35S promoter. The same method was used to construct the other recombinant plasmids described hereafter.
[0178] For expression of the Rubisco (ribulose bisphosphate carboxylase) chimeric proteins, the coding region for the transit peptide of the small subunit of the Rubisco complex was amplified by PCR from a λZAPII cDNA library using two specific primers (5'-CCTCAGTCACACAAAGAG and 5'-ACTCGAGGGAATCGGTAAGGTCAG). The resulting PCR product was subcloned into pBluescript and subsequently ligated in frame to the 5' end of the coding region of GFP or RFP to construct recombinant plasmids for RbcS:GFP or RbcS:RFP, respectively.
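The primer design in [0177] has two requirements: the natural stop codon of the upstream coding region must be removed, and the downstream fluorescent-protein coding region must follow in the same reading frame. A minimal sketch of that check, assuming plain DNA strings; `strip_stop` and `fuse_in_frame` are hypothetical helpers, and the toy sequences are not AtOEP7 or GFP.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def strip_stop(cds: str) -> str:
    """Remove a terminal stop codon from a coding sequence, if present."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    return cds[:-3] if cds[-3:] in STOP_CODONS else cds


def fuse_in_frame(upstream_cds: str, downstream_cds: str, linker: str = "") -> str:
    """Join two coding sequences so that the downstream ORF stays in frame."""
    upstream = strip_stop(upstream_cds)
    linker = linker.upper()
    downstream = downstream_cds.upper()
    if len(linker) % 3 or len(downstream) % 3:
        raise ValueError("linker or downstream sequence would shift the frame")
    fused = upstream + linker + downstream
    codons = [fused[i:i + 3] for i in range(0, len(fused), 3)]
    # Only the final codon of the fusion may be a stop codon.
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        raise ValueError("premature stop codon in the fusion")
    return fused


# Hypothetical toy sequences: M-A-K-stop fused to G-E-stop.
print(fuse_in_frame("ATGGCTAAATAA", "GGTGAATAA"))  # ATGGCTAAAGGTGAATAA
```

Without the stop-codon removal, translation would terminate before the fluorescent protein; a linker whose length is not a multiple of three would put the fluorescent protein out of frame.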
[0179] For expression of the chlorophyll a/b binding protein, the corresponding gene was amplified by PCR from a λZAPII cDNA library using two specific primers (5'-TAGAGAGAAACGATGGCG and 5'-GGATCCCGTTTGGGAGTGGAACTCC) to construct a recombinant plasmid for Cab:GFP.
[0180] The coding region for the transit peptide of rubisco activase (RA) was amplified by PCR from a λZAPII cDNA library using two specific primers (5'-TCTAGAATGGCCGCCGCAGTTTCC and 5'-GGATCCATCTGTCTCCATCGGTTTG) and ligated to the 5' end of the coding region of GFP to construct a recombinant plasmid for RA:GFP.
[0181] The coding region for the transit peptide of F1-ATPase (accession number: D88374) was amplified by PCR from a λZAPII cDNA library using two specific primers (5'-CTTTAATCAATGGCAATG and 5'-CCATGGCCTGAACTGCTCTAAGCTT) and ligated to the 5' end of the coding region of GFP to construct F1-ATPase:GFP.
[0182] A recombinant plasmid for the peroxisome targeting protein, GFP:SKL, was constructed by PCR amplification with 326GFP (Davis, S. J. and Vierstra, R. D., 1998) as a template using two specific primers (5'-CCGTATGTTACATCACC and 5'-TTATAGCTTTGATTTGTATAGTTCATCCAT).
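Reading the GFP:SKL reverse primer on the coding strand shows how it appends the peroxisome targeting tripeptide. The sketch below uses only the standard genetic code; the helper names are illustrative. The reverse complement of the primer encodes MDELYK, the C-terminal residues of GFP, followed by Ser-Lys-Leu and a TAA stop codon.

```python
BASES = "TCAG"
# Standard genetic code, indexed by first, second and third base in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate reading frame 0 of a DNA sequence; '*' marks a stop codon."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


reverse_primer = "TTATAGCTTTGATTTGTATAGTTCATCCAT"  # 3' primer for GFP:SKL
coding = reverse_complement(reverse_primer)
print(coding)             # ATGGATGAACTATACAAATCAAAGCTATAA
print(translate(coding))  # MDELYKSKL*
```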
[0183] The full length H+-ATPase (AHA2 of Arabidopsis) was amplified with two specific primers (5'-GAGATGTCGAGTCTCGAA and 5'-CTCGAGCACAGTGTAGTGACTGG) using the above method and ligated to the 5' end of the coding region of GFP to construct a recombinant plasmid for H+-ATPase:GFP.
[0184] A recombinant plasmid for the chimeric protein of the PH domain (Pleckstrin homology domain), GFP:PH, was constructed according to the method described by Kost, B. et al. (1998).
[0185] Schematic structures of the chimeric proteins expressed from the recombinant plasmids constructed according to the above method are shown in FIG. 1.
[0186] (b) Preparation of Protoplasts
[0187] Leaf tissues (5 g) of 3- to 4-week-old Arabidopsis plants grown on soil in a greenhouse were cut into small squares (5-10 mm²) with a new razor blade and incubated with 50 ml of the enzyme solution (0.25% Macerozyme R-10, 1.0% Cellulase R-10, 400 mM mannitol, 8 mM CaCl2, 5 mM Mes-KOH, pH 5.6) at 22° C. with gentle agitation (50-75 rpm). After incubation, the protoplast suspension was filtered through 100 μm mesh and protoplasts were collected by centrifugation at 46×g for 5 min. The pelleted protoplasts were resuspended in 5 to 10 ml of the W5 solution (154 mM NaCl, 125 mM CaCl2, 5 mM KCl, 5 mM glucose, 1.5 mM Mes-KOH, pH 5.6), overlaid on top of 20 ml of 21% sucrose, and centrifuged at 78×g for 10 min. The intact protoplasts at the interface were transferred to 20 ml of the W5 solution. The protoplasts were pelleted again by centrifugation at 55×g for 5 min, resuspended in 20 ml of the W5 solution, and then incubated on ice for 30 min.
[0188] (c) Isolation of the Recombinant Plasmid DNAs and Transformation of the Protoplasts
[0189] Recombinant plasmids were purified using Qiagen columns (Valencia, Calif.) according to the manufacturer's protocol. To transform the protoplasts with the DNA, the protoplasts were pelleted again at 46×g for 5 min and resuspended in the MaMg solution (400 mM mannitol, 15 mM MgCl2, 5 mM Mes-KOH, pH 5.6) at a density of 5×10⁶ protoplasts/ml. The recombinant plasmid constructs were introduced into the Arabidopsis protoplasts by the PEG (polyethylene glycol)-mediated transformation method (Jin et al., 2001). About 20-50 μg of plasmid DNA at a concentration of 2 μg/μl was mixed with 300 μl of the protoplast suspension, and 325 μl of the PEG solution (400 mM mannitol, 100 mM Ca(NO3)2, 40% PEG 4000) was added and gently mixed. The mixture was incubated for 30 min at room temperature and then diluted with 10 ml of the W5 solution. The protoplasts were recovered by centrifugation at 50×g for 5 min, resuspended in 3 ml of the W5 solution, and incubated at 22° C. in the dark.
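The enzyme-solution composition in [0187] translates into weigh-out amounts as follows. A minimal sketch, assuming the percentages are w/v and assuming the molecular weights noted in the comments (the text does not state which salt hydrates were used); the helper names are illustrative.

```python
def grams_for_molar(conc_mM: float, mw_g_per_mol: float, volume_ml: float) -> float:
    """Mass for a molar component: (mol/L) x (g/mol) x (L)."""
    return (conc_mM / 1000) * mw_g_per_mol * (volume_ml / 1000)


def grams_for_percent_wv(percent: float, volume_ml: float) -> float:
    """Mass for a % (w/v) component: percent grams per 100 ml."""
    return percent / 100 * volume_ml


VOLUME_ML = 50  # enzyme solution used per 5 g of leaf tissue

# Assumed molecular weights: D-mannitol 182.17, CaCl2 dihydrate 147.01,
# anhydrous MES 195.24 g/mol. MES is then titrated to pH 5.6 with KOH.
recipe = {
    "Macerozyme R-10": grams_for_percent_wv(0.25, VOLUME_ML),
    "Cellulase R-10": grams_for_percent_wv(1.0, VOLUME_ML),
    "mannitol": grams_for_molar(400, 182.17, VOLUME_ML),
    "CaCl2*2H2O": grams_for_molar(8, 147.01, VOLUME_ML),
    "MES": grams_for_molar(5, 195.24, VOLUME_ML),
}
for name, grams in recipe.items():
    print(f"{name:16s} {grams:.4f} g")
```

Under these assumptions, mannitol (about 3.64 g) is by far the largest component, as expected for a 400 mM osmoticum.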
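The amounts in [0189] imply the following per-reaction quantities. A short sketch of the arithmetic, assuming the mixed volumes are additive; the constant and function names are illustrative.

```python
DNA_CONC_UG_PER_UL = 2.0          # plasmid DNA concentration
PROTOPLAST_DENSITY_PER_ML = 5e6   # density in MaMg solution
PROTOPLAST_VOLUME_UL = 300.0      # protoplast suspension per reaction
PEG_SOLUTION_UL = 325.0           # PEG solution added
PEG_STOCK_PERCENT = 40.0          # PEG 4000 in the PEG solution


def dna_volume_ul(dna_ug: float) -> float:
    """Volume of plasmid DNA stock needed for a given mass."""
    return dna_ug / DNA_CONC_UG_PER_UL


def protoplasts_per_reaction() -> float:
    """Number of protoplasts in the suspension used for one reaction."""
    return PROTOPLAST_DENSITY_PER_ML * PROTOPLAST_VOLUME_UL / 1000


def final_peg_percent(dna_ug: float) -> float:
    """PEG 4000 concentration after mixing, assuming additive volumes."""
    total_ul = PROTOPLAST_VOLUME_UL + dna_volume_ul(dna_ug) + PEG_SOLUTION_UL
    return PEG_STOCK_PERCENT * PEG_SOLUTION_UL / total_ul


for dna_ug in (20, 50):
    print(dna_ug, "ug:", dna_volume_ul(dna_ug), "ul DNA,",
          round(final_peg_percent(dna_ug), 1), "% PEG")
print(f"{protoplasts_per_reaction():.2e} protoplasts per reaction")
```

So each reaction uses 10-25 μl of DNA and about 1.5×10⁶ protoplasts, and the final PEG concentration is roughly 20%.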
[0190] (d) Expression of the Chimeric Proteins and Observation of their Subcellular Localizations
[0191] The recombinant plasmid DNAs constructed in Example 1(a) were used to transform the protoplasts according to the method described in Example 1(c). The expression of the chimeric proteins after the transformation was monitored as a function of time by capturing images using a fluorescence microscope (Axioplan fluorescence microscope, Zeiss, Germany) equipped with a cooled charge-coupled device camera. The filter sets used were XF116 (exciter: 474AF20, dichroic: 500DRLP, emitter: 510AF23), XF33/E (exciter: 535DF35, dichroic: 570DRLP, emitter: 605DF50), and XF137 (exciter: 540AF30, dichroic: 570DRLP, emitter: 585ALP) (Omega, Inc, Brattleboro, Vt.) for GFP, RFP, and auto-fluorescence of chlorophyll, respectively. Data were then processed using Adobe (Mountain View, Calif.) Photoshop software and presented in pseudo-color format.
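The filter part numbers in [0191] encode their passbands. The sketch below assumes the common reading of these Omega designations, "<wavelength><type><width>", with AF/DF as band-pass filters (centre wavelength and bandwidth in nm) and ALP/DRLP as long-pass filters (cut-on wavelength; DRLP being a dichroic). That nomenclature is an assumption here, not something stated in this application, and `passband` is a hypothetical helper.

```python
import math
import re

# Assumed nomenclature: band-pass "510AF23" = centre 510 nm, width 23 nm;
# long-pass "585ALP" / dichroic "500DRLP" = cut-on at the stated wavelength.
_PATTERN = re.compile(r"^(\d+)(AF|DF|ALP|DRLP)(\d*)$")


def passband(part: str) -> tuple[float, float]:
    """Return (low, high) transmitted wavelengths in nm for a filter code."""
    match = _PATTERN.match(part)
    if match is None:
        raise ValueError(f"unrecognised filter code: {part}")
    wavelength, kind, width = float(match[1]), match[2], match[3]
    if kind in ("AF", "DF"):
        if not width:
            raise ValueError(f"band-pass code without a bandwidth: {part}")
        half = float(width) / 2
        return (wavelength - half, wavelength + half)
    return (wavelength, math.inf)


def overlaps(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if two wavelength ranges share any wavelengths."""
    return a[0] < b[1] and b[0] < a[1]


gfp_emitter = passband("510AF23")  # XF116 emitter
rfp_emitter = passband("605DF50")  # XF33/E emitter
print(gfp_emitter, rfp_emitter, overlaps(gfp_emitter, rfp_emitter))
```

Under this reading, the GFP emitter (498.5-521.5 nm) and the RFP emitter (580-630 nm) do not overlap, which is consistent with recording the two labels as separate channels.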
[0192] Green fluorescence of the chimeric protein of AtOEP7:GFP was observed at the outer envelope membrane of the chloroplast (FIG. 2(a)). This result indicates that the chimeric protein comprising the chloroplast envelope targeting signal peptide and the fluorescent protein label was correctly targeted to the chloroplast envelope membrane.
[0193] The localization of green fluorescence from the chimeric proteins RbcS:GFP, Cab:GFP, and RA:GFP is presented in (b), (c), and (d) of FIG. 2, respectively. As shown in the figures, RbcS:GFP was located in the chloroplast stroma, and Cab:GFP and RA:GFP also fluoresced in chloroplasts. These results indicate that the chimeric proteins comprising the signal peptide of RbcS, Cab, or RA and the fluorescent protein label were targeted to chloroplasts.
[0194] The green fluorescence signals of the chimeric proteins F1-ATPase:GFP, GFP:SKL, and H+-ATPase:GFP were observed in mitochondria, peroxisomes, and the plasma membrane, respectively (FIGS. 2(e)-(g)). The red fluorescence signals in these images are chloroplast autofluorescence.
[0195] The green fluorescence signal of GFP:PH, comprising the PH domain (pleckstrin homology domain) that specifically binds a phospholipid, was distributed on the plasma membrane where phosphatidylinositol 4,5-diphosphate (PI(4,5)P2) is present (FIG. 2(h)).
[0196] FIG. 7(a)-(j) is explained in more detail as follows. The figure shows plasmid maps of the recombinant plasmids used to express the fusion proteins depicted in FIG. 1. The methods for constructing these recombinant plasmids are described in the present example. Nucleic acid and protein sequences of the signal proteins included in these fusion proteins are provided in SEQ ID NOs: 29-42. In addition, nucleic acid and protein sequences of partial pleckstrin homology domain of FAPP (family A (phosphoinositide binding specific) member 3) are provided in SEQ ID NOs: 43 and 44.
[0197] These signal proteins are examples of signal proteins that can be used as either inactivated or active signal proteins according to the present invention. AtOEP7, RbcS, Cab, RA, F1-ATPase, and SKL (peroxisome targeting sequence) are examples of signal proteins that are inactivated by masking in the chimeric substrate protein. AtOEP7, H+-ATPase, PH, and FAPP are examples of signal proteins that remain active in the chimeric substrate protein.
Example 2
Detection of Cleavage of the Chimeric Substrate Protein by Protease
[0198] (a) Construction of the Recombinant Plasmids
[0199] The recombinant plasmid for NIa protease was constructed by placing the coding region of NIa protease under the control of the 35S promoter in a pUC vector.
[0200] The recombinant plasmid for Arabidopsis outer envelope membrane protein:green fluorescent protein (AtOEP7:GFP) was constructed by ligating the AtOEP7 coding region without the termination codon to the 5' end of the coding region of the green fluorescent protein in the 326GFP vector (obtained from the Arabidopsis Biological Resource Center, Ohio State University, USA). The NIa protease cleavage site, VRFQ, was added to the N-terminus of AtOEP7:GFP by PCR amplification of this plasmid with two primers (5' primer, 5'-CCCGGGGTGTGCGCTTCCAGGGAAAAACTTCGGGAGCG and 3' primer, 5'-GAGCTCTTATTTGTATAGTTCATC). The PCR product (SmaI and XhoI fragment) was then ligated into the HindIII (filled-in) and XhoI sites of the 326RFP-nt vector to construct the recombinant plasmid for the chimeric substrate protein RFP:VRFQ:AtOEP7:GFP (FIG. 3(a)).
[0201] (b) Transformation with the Recombinant Plasmids
[0202] Transformation was performed as in (b) and (c) of Example 1.
[0203] (c) Detection of the Fluorescent Proteins Using a Fluorescence Microscope
[0204] Detection of the fluorescent protein was carried out as in (d) of Example 1. The final recombinant plasmid for expression of the chimeric substrate protein RFP:VRFQ:AtOEP7:GFP was introduced to the protoplast and the subcellular trafficking was examined for 24-36 hrs after transformation. As shown in FIG. 4, the chimeric substrate protein was localized as large speckles or aggregates in the protoplast but not targeted to the chloroplast. Both of the red and green fluorescent signals were observed in the same speckle.
[0205] In the next experiment, it was examined whether NIa protease can cleave the cleavage site in the chimeric substrate protein. When the protoplast was co-transformed with the recombinant plasmid for NIa protease, the green fluorescence signal was observed at the envelope membrane of the chloroplast whereas the red fluorescence signal was observed as uniformly dispersed in the cytosol, as shown in FIG. 5. Furthermore, the red and green fluorescence signals no longer overlapped each other, strongly suggesting that NIa protease cleaved the chimeric substrate protein in vivo.
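Here cleavage is scored by eye from the loss of red/green overlap. One way to put a number on this, offered only as a sketch and not as part of the described method, is Pearson's correlation coefficient between the two channels, a widely used colocalization measure. Values near 1 mean the red and green signals coincide, as for the uncleaved substrate in speckles; values near 0 mean they are unrelated, as when RFP disperses in the cytosol while GFP moves to the chloroplast envelope. The arrays below are synthetic, and real images would need background subtraction and a cell mask.

```python
import numpy as np


def pearson_colocalization(red, green, mask=None) -> float:
    """Pearson correlation between two fluorescence channels of one cell.

    red, green: 2-D intensity arrays of the same shape.
    mask: optional boolean array selecting the pixels inside the cell.
    """
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    if red.shape != green.shape:
        raise ValueError("channel images must have the same shape")
    if mask is not None:
        red, green = red[mask], green[mask]
    r = red.ravel() - red.mean()
    g = green.ravel() - green.mean()
    denom = np.sqrt(np.sum(r * r) * np.sum(g * g))
    return float(np.sum(r * g) / denom) if denom > 0 else 0.0


# Synthetic 2 x 3 "images": green marks two bright pixels.
green = np.array([[0, 10, 0], [0, 10, 0]])
uncleaved_red = 2 * green + 1                    # red in the same speckles
cleaved_red = np.array([[4, 5, 6], [6, 5, 4]])   # red roughly uniform
print(pearson_colocalization(uncleaved_red, green))
print(pearson_colocalization(cleaved_red, green))
```

The first call returns 1.0 up to rounding and the second returns 0.0 up to rounding, matching the uncleaved and cleaved patterns, respectively.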
[0206] (d) Western Blot Analysis
[0207] The transformed protoplasts were harvested and lysed in 50 μl of cell lysis buffer (50 mM Tris-HCl, pH 7.5, 1 mM DTT, 1 mM EDTA, 50 mM NaCl). Expression of the chimeric substrate protein RFP:VRFQ:AtOEP7:GFP and cleavage of the chimeric substrate protein into RFP and AtOEP7:GFP by NIa protease were identified by Western blot analysis using monoclonal anti-GFP antibody (Clontech, Inc) and the ECL kit (Amersham, Inc).
[0208] As shown in FIG. 6, when NIa protease was not co-transformed, the chimeric substrate protein RFP:VRFQ:AtOEP7:GFP was detected at the expected size of 70 kDa. In contrast, when NIa protease was co-transformed, the anti-GFP antibody detected a protein at 35 kDa, the expected size of AtOEP7:GFP. This result indicates that the chimeric substrate protein RFP:VRFQ:AtOEP7:GFP was cleaved into two proteins, RFP and AtOEP7:GFP. This result therefore clearly demonstrates that NIa protease cleaves the chimeric substrate protein in vivo, and that the cleavage reaction can be easily assayed by detecting localization of the green fluorescence signal at the chloroplast envelope membrane and the dispersed distribution of the red fluorescence signal in the cytosol.
[0209] FIG. 8(a)-(b) are explained in more detail as follows. The figure shows plasmid maps of the recombinant plasmids used in the present example to express NIa protease and its chimeric substrate protein RFP:PS(NIa):AtOEP7:GFP, respectively. Nucleic acid and protein sequences of NIa protease and its cleavage sites are provided in SEQ ID NOs: 45-50. Full nucleic acid and protein sequences of this chimeric substrate protein are given in SEQ ID NOs: 51 and 52, respectively.
Example 3
In Vivo Screening System for HIV-1 Protease Inhibitors
[0210] A convenient in vivo screening system for detecting inhibitors of human immunodeficiency virus type 1 (HIV-1) protease was constructed as follows.
[0211] (a) Construction of Recombinant Plasmids for Expression of HIV-1 Protease
[0212] To construct a recombinant plasmid for HIV-1 protease, the coding region of HIV-1 protease was PCR amplified with two primers (5'-TCTAGAATGCCTCAGGTCACTCTTTGG-3' and 5'-CTCGAGTCAAAAATTTAAAGTGCAACC-3') using pHX2BΔRT as a template. pHX-2BΔRT is a plasmid clone containing HXB2 (GenBank accession number K03455) without the reverse transcriptase coding region. The amplified product was subcloned into the pBluescript-T vector and subsequently cloned into the XbaI and XhoI sites of a pUC vector under the control of the 35S promoter.
[0213] The plasmid map for HIV-1 protease is shown in FIG. 9(a), and the nucleic acid and protein sequences are given in SEQ ID NOs: 55 and 56, respectively.
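The HIV-1 protease primers in [0212] carry the restriction sites used for directional cloning into the XbaI and XhoI sites. A sketch that scans them for the standard recognition sequences of enzymes named in this application; `find_sites` is a hypothetical helper. The forward primer places an ATG start codon immediately after the XbaI site, and the three bases after the XhoI site in the reverse primer read as a TGA stop codon on the coding strand.

```python
# Standard recognition sequences of restriction enzymes named in this text.
RECOGNITION_SITES = {
    "HindIII": "AAGCTT",
    "SacI": "GAGCTC",
    "SmaI": "CCCGGG",
    "XbaI": "TCTAGA",
    "XhoI": "CTCGAG",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_sites(primer: str) -> dict[str, list[int]]:
    """Return 0-based start positions of each recognition site in a primer."""
    primer = primer.upper()
    hits = {}
    for enzyme, site in RECOGNITION_SITES.items():
        positions = [i for i in range(len(primer) - len(site) + 1)
                     if primer[i:i + len(site)] == site]
        if positions:
            hits[enzyme] = positions
    return hits


forward = "TCTAGAATGCCTCAGGTCACTCTTTGG"
reverse = "CTCGAGTCAAAAATTTAAAGTGCAACC"
print(find_sites(forward))  # {'XbaI': [0]}
print(find_sites(reverse))  # {'XhoI': [0]}
print(forward[6:9])                              # ATG, start codon after XbaI
print(reverse[6:9].translate(COMPLEMENT)[::-1])  # TGA, stop on the coding strand
```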
[0214] (b) Construction of Recombinant Plasmids for Expression of Chimeric Substrate Proteins for HIV-1 Protease
[0215] Recombinant plasmids for a total of 9 chimeric substrate proteins were constructed.
[0216] Primers used are as follows: Forward primers used were 5'-CCCGGGTAGCCAAAATTACCCTATAGTGGGAAAAACTTCGGGAGCG-3', 5'-CCCGGGTGCAAGAGTTTTGGCTGAAGCAGGAAAAACTTCGGGAGCG-3', 5'-CCCGGGTGCTACCATAATGATGCAGAGAGGAAAAACTTCGGGAGCG-3', 5'-CCCGGGTAGACAGGCTAATTTTTTAGGGGGAAAAACTTCGGGAGCG-3', 5'-CCCGGGTCCAGGGAATTTTCTTCAGAGCGGAAAAACTTCGGGAGCG-3', 5'-CCCGGGTAGCGTGCCTCAAATAGGAAAAACTTCGGGAGCG-3', 5'-CCCGGGTACTTTAAATTTTCCCATTAGCGGAAAAACTTCGGGAGCG-3', 5'-CCCGGGTGCAGAAACCTTCTATGTAGATGGAAAAACTTCGGGAGCG-3', and 5'-CCCGGGTAGGAAAGTACTATTTTTAGATGGAAAAACTTCGGGAGCG-3'.
[0217] The common reverse primer was 5'-CTCGAGTTATTTGTATAGTTCATC-3'. These primers were designed to contain the proteolytic sites of HIV-1 protease. The underlined regions in the forward primers correspond to the proteolytic site sequences (SEQ ID NOs: 57, 59, 61, 63, 65, 67, 69, 71, and 73). PCR amplification was performed with one of the forward primers and the common reverse primer, using the plasmid for the NIa protease substrate protein (SEQ ID NO: 51) as a template. Each PCR product was digested with SmaI and XhoI and subcloned into the 326RFP-nt vector digested with HindIII (filled in) and XhoI. Each of the 9 resulting plasmids therefore contains a DNA sequence encoding one of the chimeric substrate proteins of the composition RFP:PS(HIV-1):AtOEP7:GFP, with PS(HIV-1) being one of the proteolytic site sequences of HIV-1 protease. Nucleic acid and protein sequences of one of these chimeric substrate proteins are provided in SEQ ID NOs: 75 and 76.
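The nine forward primers share a layout that can be decoded directly: a 5' SmaI site (CCCGGG), one further nucleotide, the codons for the proteolytic site, and a common 3' region that anneals to the template. The sketch below translates the site codons of each primer; the one-nucleotide spacer and the 18-nt common region are inferred from the sequences themselves rather than stated in the text, and the helper names are hypothetical. The fourth primer decodes to RQANFLG, the site used in [0220], and the common 3' region reads GKTSGA in the same frame.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

SMAI = "CCCGGG"
ANNEAL = "GGAAAAACTTCGGGAGCG"  # 3' region shared by all nine forward primers
FORWARD_PRIMERS = [
    "CCCGGGTAGCCAAAATTACCCTATAGTGGGAAAAACTTCGGGAGCG",
    "CCCGGGTGCAAGAGTTTTGGCTGAAGCAGGAAAAACTTCGGGAGCG",
    "CCCGGGTGCTACCATAATGATGCAGAGAGGAAAAACTTCGGGAGCG",
    "CCCGGGTAGACAGGCTAATTTTTTAGGGGGAAAAACTTCGGGAGCG",
    "CCCGGGTCCAGGGAATTTTCTTCAGAGCGGAAAAACTTCGGGAGCG",
    "CCCGGGTAGCGTGCCTCAAATAGGAAAAACTTCGGGAGCG",
    "CCCGGGTACTTTAAATTTTCCCATTAGCGGAAAAACTTCGGGAGCG",
    "CCCGGGTGCAGAAACCTTCTATGTAGATGGAAAAACTTCGGGAGCG",
    "CCCGGGTAGGAAAGTACTATTTTTAGATGGAAAAACTTCGGGAGCG",
]


def translate(seq: str) -> str:
    """Translate reading frame 0 of a DNA sequence."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def encoded_site(primer: str) -> str:
    """Peptide encoded between the SmaI site (+1 spacer nt) and the 3' region."""
    if not (primer.startswith(SMAI) and primer.endswith(ANNEAL)):
        raise ValueError("primer does not have the expected layout")
    insert = primer[len(SMAI) + 1:-len(ANNEAL)]  # skip the assumed spacer nt
    if len(insert) % 3:
        raise ValueError("site region is not a whole number of codons")
    return translate(insert)


sites = [encoded_site(p) for p in FORWARD_PRIMERS]
print(sites)
print(translate(ANNEAL))  # GKTSGA, read in the same frame as the site codons
```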
[0218] (c) HIV-1 Protease Inhibitor Screening In Vivo
[0219] Protoplasts of Arabidopsis leaf tissue were prepared and transformed according to the procedures described in Example 1(b) and (c). Fluorescence images of the transformed protoplasts were acquired as described in Example 1(d), except that a Nikon E800 fluorescence microscope was used with the same types of filter sets.
[0220] As a working example for detecting inhibition of HIV-1 protease in vivo, the protoplasts were transformed with a recombinant plasmid for an HIV-1 protease substrate protein, RFP:PS(HIV-1):AtOEP7:GFP. The proteolytic site sequence included in the chimeric substrate protein was RQANFLG (SEQ ID NO: 64). The transformed protoplasts were incubated at 22° C. in the W5 solution for 18-48 hrs to express the chimeric substrate protein, and the subcellular localization of the fluorescence signals from the expressed chimeric substrate protein was monitored using a fluorescence microscope.
[0221] As shown in FIG. 11, the green and red fluorescence signals were both observed at the same positions in the cytosol as large speckles or aggregates and were not targeted to the chloroplasts. This result indicates that the chimeric substrate proteins were not cleaved and therefore remained in the cytosol in uncleaved form. These data correspond to the result expected when HIV-1 protease is completely inhibited.
[0222] HIV-1 proteolytic activity was detected as follows. Protoplasts were co-transformed with the recombinant plasmid for HIV-1 protease and the recombinant plasmid for an HIV-1 protease substrate protein, RFP:PS(HIV-1):AtOEP7:GFP. The proteolytic site sequence included in the chimeric substrate protein was RQANFLG (SEQ ID NO: 64). The subcellular localization of the fluorescence signals from the expressed chimeric substrate protein was monitored with a fluorescence microscope 18-48 hrs after transformation. As shown in FIG. 12, the red fluorescence signal was uniformly dispersed in the cytosol, whereas most of the green fluorescence signal was observed around the chloroplasts. These results indicate that the chimeric substrate proteins were cleaved by HIV-1 protease.
[0223] This system can therefore be used to identify molecules that block or inhibit HIV-1 protease activity in the protoplasts: such molecules decrease both the dispersed red fluorescence signal and the chloroplast-targeted green fluorescence signal.
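One way to turn this readout into a number, offered only as an assumption-laden sketch: score each protoplast as showing either the cleaved pattern (dispersed red, green around the chloroplasts) or the uncleaved pattern (red and green co-localized in cytosolic speckles), then compare the cleaved fraction with and without a candidate compound. The class and function names are hypothetical, and the counts below are illustrative, not data from this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatternCounts:
    """Hypothetical tally of protoplasts scored under one condition."""
    cleaved: int    # red dispersed in the cytosol, green around chloroplasts
    uncleaved: int  # red and green co-localized in cytosolic speckles

    def cleaved_fraction(self) -> float:
        total = self.cleaved + self.uncleaved
        if total == 0:
            raise ValueError("no protoplasts scored")
        return self.cleaved / total


def percent_inhibition(with_compound: PatternCounts, without_compound: PatternCounts) -> float:
    """100 * (1 - cleaved fraction with compound / cleaved fraction without it)."""
    reference = without_compound.cleaved_fraction()
    if reference == 0:
        raise ValueError("no cleavage in the reference; inhibition is undefined")
    return 100.0 * (1.0 - with_compound.cleaved_fraction() / reference)


# Illustrative counts only (not results from this application).
control = PatternCounts(cleaved=80, uncleaved=20)
treated = PatternCounts(cleaved=20, uncleaved=80)
print(percent_inhibition(treated, control))
```

A value near 100 corresponds to the uncleaved pattern of FIG. 11, and a value near 0 to the cleaved pattern of FIG. 12.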
[0224] FIGS. 9(a) and 9(b) are explained in more detail as follows. The figures show plasmid maps for the recombinant plasmids for HIV-1 protease and for the chimeric substrate proteins RFP:PS(HIV-1):AtOEP7:GFP, respectively. These recombinant plasmids can be used to express HIV-1 protease and the chimeric substrate protein in plant cells such as those of Arabidopsis thaliana, tobacco, etc. Nucleic acid and protein sequences of HIV-1 protease and its cleavage sites are provided in SEQ ID NOs: 55-74. The full nucleic acid and protein sequences of the chimeric substrate protein used in Example 3 are provided in SEQ ID NOs: 75 and 76, respectively.
Example 4
Preparation and Use of Optionally Masked Chimeric Proteins with One Signal Protein Remaining Active
[0225] As discussed above, it is an object of the present invention to provide recombinant chimeric proteins in which at least some of the signal proteins are masked by at least one suitable amino acid sequence. For instance, all the signal proteins of a particular chimeric protein in accord with the invention can be masked or they can be unmasked as needed. Alternatively, a portion of the signal component of the chimeric protein can be unmasked and the remaining signal(s) can be masked. Such "optionally masked" chimeric proteins provide significant flexibility to the invention and have a wide range of important applications.
[0226] For instance, such chimeric proteins can be used in screens to detect in vivo protease activity through a change in the subcellular localization of one or more fluorescence signals of the chimeric protein. Such a screen can be highly sensitive because it registers even slight changes in the spatial distribution of the chimeric protein. Whether to mask one or more of the signal proteins in a given chimeric molecule will depend on the intended use of the invention.
[0227] FIG. 10 provides an illustrative collection of "optionally masked" chimeric proteins. More specifically, the figure shows schematic diagrams of recombinant genes encoding chimeric substrate proteins in which the trafficking signal of one signal protein remains active. H+-ATPase is used as an example of the active signal protein whose trafficking signal is not masked in the chimeric substrate protein. Other examples that can be used as the active signal protein include AtOEP7, PH, and FAPP. AtOEP7 is used as an example of the inactivated signal protein whose trafficking signal is masked by linking a proteolytic cleavage site, or a signal masking protein through a proteolytic cleavage site. Other examples that can be used as the inactivated signal protein include RbcS, Cab, RA, F1-ATPase, and SKL. These chimeric substrate proteins are designed to induce a change in the subcellular distribution of the fluorescence signal upon proteolytic cleavage. Nearly any of the protease cleavage sequences disclosed herein can be used to provide the cleavage site (PS) of the chimeric proteins exemplified in FIG. 10. Accordingly, such optionally masked chimeric proteins can be used to detect a wide variety of protease inhibitor molecules.
[0228] FIG. 10 is explained in more detail as follows. FIG. 10(a) shows a construct in which the fluorescence signal (FP-1) is translocated from the plasma membrane to the cytosol upon proteolytic cleavage. In FIG. 10(b), one fluorescence signal (FP-2) is translocated in the same manner as in FIG. 10(a), but the other fluorescence signal (FP-1) remains on the plasma membrane. In FIG. 10(c), the fluorescence signal (FP-1) is translocated from the plasma membrane to the chloroplast upon proteolytic cleavage. In FIG. 10(d), one fluorescence signal (FP-2) is translocated in the same manner as in FIG. 10(c), but the other fluorescence signal (FP-1) remains on the plasma membrane.
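The relocation logic of FIG. 10 can be captured in a few lines. This is a toy model resting on a simplifying assumption of ours, not a rule stated in the application: each polypeptide goes wherever the signal protein at its N-terminus directs it, a signal protein with anything fused upstream of it is treated as masked, residual cleavage-site residues are ignored, and a polypeptide without an N-terminal signal protein stays in the cytosol.

```python
# Toy targeting rule (an assumption for illustration, not taken from the application):
# a polypeptide goes where the signal protein at its N-terminus sends it; a signal
# protein with anything fused upstream of it counts as masked; otherwise -> cytosol.
TARGET = {"H+-ATPase": "plasma membrane", "AtOEP7": "chloroplast"}
FLUORESCENT = {"GFP", "RFP"}


def fragments(parts: list[str], cleaved: bool) -> list[list[str]]:
    """Split an N-to-C list of parts at each 'PS' if cleaved; PS residues are ignored."""
    pieces: list[list[str]] = [[]]
    for part in parts:
        if part == "PS":
            if cleaved:
                pieces.append([])
        else:
            pieces[-1].append(part)
    return [p for p in pieces if p]


def fp_locations(parts: list[str], cleaved: bool) -> dict[str, str]:
    """Predicted compartment of each fluorescent protein in the construct."""
    result = {}
    for frag in fragments(parts, cleaved):
        where = TARGET.get(frag[0], "cytosol")
        for part in frag:
            if part in FLUORESCENT:
                result[part] = where
    return result


FIG10 = {
    "a": ["H+-ATPase", "PS", "GFP"],
    "b": ["H+-ATPase", "GFP", "PS", "RFP"],
    "c": ["H+-ATPase", "PS", "AtOEP7", "GFP"],
    "d": ["H+-ATPase", "RFP", "PS", "AtOEP7", "GFP"],
}
for panel, parts in FIG10.items():
    print(panel, fp_locations(parts, cleaved=False), fp_locations(parts, cleaved=True))
```

Applied to RFP:PS:AtOEP7:GFP, the same rule puts both fluorescent proteins in the cytosol before cleavage, and RFP in the cytosol with GFP at the chloroplast after cleavage, which matches the observations reported for FIGS. 11 and 12.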
[0229] (a) Construction of Recombinant Plasmids for Expression of Chimeric Substrate Proteins with One Signal Protein Remaining Active
[0230] Examples of the recombinant plasmids depicted in FIG. 10 were constructed as follows.
[0231] Two recombinant plasmids encoding H+-ATPase:PS:GFP (FIG. 10(a)) with the proteolytic cleavage site sequences being VRFQ (SEQ ID NO: 48) and RQANFLG (SEQ ID NO: 64) were constructed as follows.
[0232] Forward primers 5'-CTCGAG PS ATGAGTAAAGGAGAAGAA-3' (where PS is GTGCGCTTCCAG for the VRFQ NIa cleavage site or AGACAGGCTAATTTTTTAGGG for the RQANFLG HIV-1 cleavage site) and the reverse primer 5'-GAGCTCTTATTTGTATAGTTCATC-3' were used for PCR amplification from the 326GFP vector. These PCR products, which contain a proteolytic cleavage site for NIa or HIV-1 protease, restriction sites (Xho I and Sac I) for subcloning, and a stop codon at the C-terminus of GFP, were subcloned into the pBluescript-T vector. The Xho I/Sac I fragments of these subclones were ligated into the Xho I- and Sac I-digested pH+ATPase-G vector (FIG. 7(i)).
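The PS placeholder notation used in paragraphs [0232], [0235], and [0237] can be made concrete with a short script. This sketch is ours, not the applicants'; it assumes the standard genetic code and that the 5' CTCGAG (and, on the reverse primer, GAGCTC) is an added restriction site. It fills in the placeholder, checks that each site is a whole number of codons so that GFP stays in frame, and translates the result; reading the reverse primer back shows the GFP C-terminus (DELYK) followed by the stop codon mentioned in the text. The function name `fill_template` is hypothetical.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


FORWARD_TEMPLATE = "CTCGAG PS ATGAGTAAAGGAGAAGAA"  # as written in paragraph [0232]
REVERSE = "GAGCTCTTATTTGTATAGTTCATC"
PS = {"VRFQ": "GTGCGCTTCCAG", "RQANFLG": "AGACAGGCTAATTTTTTAGGG"}


def fill_template(template: str, site: str) -> str:
    """Replace the ' PS ' placeholder with a cleavage-site DNA sequence."""
    if len(site) % 3:
        raise ValueError("site would shift the GFP reading frame")
    head, tail = template.split(" PS ")
    return head + site + tail


primers = {name: fill_template(FORWARD_TEMPLATE, dna) for name, dna in PS.items()}
for name, primer in primers.items():
    print(name, primer, translate(primer[6:]))  # skip the 6-nt XhoI site

print(translate(revcomp(REVERSE[6:])))  # GFP C-terminus read back from the reverse primer
```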
[0233] Two recombinant plasmids encoding H+-ATPase:GFP:PS:RFP (FIG. 10(b)) with the proteolytic cleavage site sequences being VRFQ (SEQ ID NO: 48) and RQANFLG (SEQ ID NO: 64) were constructed as follows.
[0234] To prepare GFP without a stop codon, PCR amplification was performed with primers 5'-CTCGAGATGAAAGGAGAAGAACTT-3' and 5'-GAGCTCTTTGTATAGTTCATCCAT-3'. The PCR product, containing Xho I and Sac I sites, was subcloned into the pBluescript-T vector and subsequently digested with Xho I and Sac I. This Xho I/Sac I fragment was subcloned into the Xho I- and Sac I-digested pH+ATPase-G vector (FIG. 7(i)).
[0235] To place the proteolytic cleavage site upstream of RFP, forward primers 5'-GAGCTC PS ATGGTGCGCTCCTCCAAG-3' (where PS is GTGCGCTTCCAG for the VRFQ NIa cleavage site or AGACAGGCTAATTTTTTAGGG for the RQANFLG HIV-1 cleavage site) and the reverse primer 5'-GAGCTCCTACAGGAACAGGTGGTG-3' were used for PCR amplification from the 326RFP vector. The PS:RFP constructs, which carry Sac I sites, were digested with Sac I, and these Sac I fragments were subcloned into the Sac I site of the H+-ATPase:GFP (without stop codon) subclone prepared as described above to generate recombinant plasmids for H+-ATPase:GFP:PS:RFP.
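The two-step cloning in paragraphs [0234] and [0235] relies on GFP losing its stop codon and on the Sac I junction keeping GFP, the cleavage site, and RFP in one reading frame. The sketch below checks both under assumptions of ours: the standard genetic code, a single regenerated GAGCTC at the ligated junction, and insertion of the Sac I fragment in the sense orientation. The constant names are hypothetical.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


SACI = "GAGCTC"
GFP_REV_NO_STOP = "GAGCTCTTTGTATAGTTCATCCAT"  # reverse primer of paragraph [0234]
RFP_REV = "GAGCTCCTACAGGAACAGGTGGTG"          # reverse primer of paragraph [0235]
RFP_START = "ATGGTGCGCTCCTCCAAG"              # 3' part of the [0235] forward primer
PS = {"VRFQ": "GTGCGCTTCCAG", "RQANFLG": "AGACAGGCTAATTTTTTAGGG"}

gfp_end = revcomp(GFP_REV_NO_STOP[len(SACI):])  # coding strand at the 3' end of GFP
rfp_end = revcomp(RFP_REV[len(SACI):])          # coding strand at the 3' end of RFP

print(translate(gfp_end))  # should contain no '*': the GFP amplicon lacks a stop codon
print(translate(rfp_end))  # RFP C-terminus followed by a stop codon

for name, site in PS.items():
    # Assumed junction after Sac I ligation: one regenerated GAGCTC between GFP and PS.
    junction = gfp_end + SACI + site + RFP_START
    print(name, translate(junction))
```

Under these assumptions the Sac I site contributes two extra residues (Glu-Leu) between GFP and the cleavage site, and the RFP start codon follows the site in frame.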
[0236] Two recombinant plasmids encoding H+-ATPase:PS:AtOEP7:GFP (FIG. 10(c)) with the proteolytic cleavage site sequences being VRFQ (SEQ ID NO: 48) and RQANFLG (SEQ ID NO: 64) were constructed as follows.
[0237] The pSub-NIa1 vector (FIG. 8(b)) was PCR-amplified with primers 5'-CTCGAG PS GGAAAAACTTCGGGAGCG-3' (where PS is GTGCGCTTCCAG for the VRFQ NIa cleavage site or AGACAGGCTAATTTTTTAGGG for the RQANFLG HIV-1 cleavage site) and 5'-GAGCTC TTATTTGTATAGTTCATC-3', so that the PCR products contained Xho I and Sac I sites. The Xho I/Sac I fragments of these subclones were ligated into the Xho I- and Sac I-digested pH+ATPase-G vector (FIG. 7(i)) to generate recombinant plasmids for H+-ATPase:PS:AtOEP7:GFP.
[0238] Two recombinant plasmids encoding H+-ATPase:RFP:PS:AtOEP7:GFP (FIG. 10(d)) with the proteolytic cleavage site sequences being VRFQ (SEQ ID NO: 48) and RQANFLG (SEQ ID NO: 64) were constructed as follows.
[0239] Primers 5'-CTCGAGATGGTGCGCTCCTCCAAG-3' and 5'-GAGCTCTTATTTGTATAGTTCATC-3' were used to PCR-amplify the pSub-NIa1 vector (FIG. 8(b)) and the pSub-HIV4 vector (FIG. 9(b)), so that these PCR products contained Xho I and Sac I sites. The Xho I/Sac I fragments of these subclones were ligated into the Xho I- and Sac I-digested pH+ATPase-G vector (FIG. 7(i)) to generate recombinant plasmids for H+-ATPase:RFP:PS:AtOEP7:GFP.
[0240] (b) A Working Example Using a Recombinant Plasmid Depicted in FIG. 10(a)
[0241] Protoplasts of Arabidopsis leaf tissue were prepared and transformed according to the procedures described in Example 1(b) and (c). Fluorescence images of the transformed protoplasts were acquired as described in Example 1(d), except that a Nikon E800 fluorescence microscope was used with the same types of filter sets.
[0242] As a working example for detecting inhibition of NIa protease, the protoplasts were transformed with a recombinant plasmid encoding the chimeric substrate protein H+-ATPase:PS:GFP prepared as described in Example 4(a). The proteolytic site sequence included in the chimeric substrate protein was VRFQ (SEQ ID NO: 48). The transformed protoplasts were incubated at 22° C. in the W5 solution for 18-48 hrs to express the chimeric substrate protein, and the subcellular localization of the fluorescence signal from the expressed chimeric substrate protein was monitored using a fluorescence microscope. As shown in FIG. 13(a), the green fluorescence signal was localized to the plasma membrane. This result indicates that, as expected, the chimeric substrate proteins were not cleaved, so the attached green fluorescent protein was targeted to the plasma membrane by the trafficking signal of H+-ATPase. These data correspond to the result expected when the protease is completely inhibited.
[0243] NIa proteolytic activity was detected as follows. Protoplasts were co-transformed with the recombinant plasmid for NIa protease (SEQ ID NO: 45) and the recombinant plasmid for the chimeric substrate protein H+-ATPase:PS:GFP (SEQ ID NO: 53). The proteolytic site sequence included in the chimeric substrate protein was VRFQ (SEQ ID NO: 48). The subcellular localization of the fluorescence signal from the expressed chimeric substrate protein was monitored with a fluorescence microscope 18-48 hrs after transformation. As shown in FIG. 13(c), the green fluorescence signal was observed in the cytosol and was not targeted to the plasma membrane. These data indicate that GFP was cleaved off the chimeric substrate protein by the protease.
[0244] All references disclosed herein are incorporated by reference. The following references are specifically incorporated by reference.
[0245] Hook, V. Y. H. U.S. Pat. No. 6,245,884 (2001).
[0246] Kettner, C. A. and Korant, B. D. U.S. Pat. No. 4,644,055 (1987).
[0247] Cote, H. C. F., Brumme, Z. L., and Harrigan, P. R. (2001). J. Virol. 75, 589-594.
[0248] Davis, S. J. and Vierstra, R. D. (1998). Plant Physiol. 112, 833-844.
[0249] Ermolieff, J., Loy, J. A., Koelsch, G., and Tang, J. (2000). Biochem. 39, 12450-12456.
[0250] Gillooly, D. J., Morrow, I. C., Lindsay, M., Gould, R., Bryant, N. J., Gaullier, J.-M., Parton, R. G., and Stenmark, H. (2000). EMBO J. 19, 4577-4588.
[0251] Gutierrez-Campos, R., Torress-Acosta, J. A., Saucedo-Arias, L. et al. (1999). Nat. Biotechnol. 17, 1223-1226.
[0252] Jacobsen, H., Hanggi, M., Ott, M., Duncan, I. B., and Owen, S. (1996). J. Infect. Dis. 173, 1379-1387.
[0253] Kasai, N., Tsumoto, K., Niwa, S., Misawa, S., Ueno, T., Hayashi, H., and Kumagai, I. (2001). Biochem. Biophys. Res. Comm. 281, 416-424.
[0254] Kuhelj, R., Rizzo, C. J., Chang, C.-H., Jadha, P. K., Towler, E. M., and Korant, B. D. (2001). J. Biol. Chem. 276, 16674-16682.
[0255] Kost, B., Spielhofer, P., and Chua, N. H. (1998). Plant J. 16, 383-401.
[0256] Mardis, K. L., Luo, R., and Gilson, M. K. (2001). J. Mol. Biol. 309, 507-517.
[0257] Miller, T. L., Mawn, B. E., Orav, E. J., Wilk, D., Weinberg, G. A., Nicchitta, J., Furuta, L., Cutroni, R., McIntosh, K., Burchett, S. K., and Gorbach, S. L. (2001). Pediat. 107-5, 1-6.
[0258] Morise, H., Shimomura, O., Johnson, F. H., and Winant, J. (1974). Biochem. 13, 2656-2662.
[0259] Pih, K. T., Yi, M. J., Liang, Y. S., Shin, B. J., Cho, M. J., Hwang, I., and Son, D. (2000). Plant Physiol. 123, 51-58.
[0260] Rogers, J. D., Lam, P. Y., Johnson, B. L., Wang, H. S., Ko, S. S., Seits, S. P., Trainor, G. L., Anderson, P. S., Klabe, R. M., Bachelor, L. T., Cordova, B., Garber, S., Reid, C., Wright, M. R., Chang, C. H., and Erickson-Biitanen, S. (1998). Chem. Biol. 5, 597-608.
[0261] Wlodawer, A. and Erickson, J. W. (1983). Annu. Rev. Biochem. 61, 543-585.
[0262] Yi, C.-F., Gosiewska, A., Burtis, D., and Geesin, J. (2001). Anal. Biochem. 291, 27-33.
[0263] Yoon, H. Y., Hwang, D. C., Choi, K. Y., and Song, B. D. (2000). Mol. Cell. 10, 213-219.
[0264] The invention has been described in detail with reference to preferred embodiments thereof. However, it will be appreciated that those skilled in the art, upon consideration of this disclosure, may make modifications and improvements within the spirit and scope of the invention.
Sequence Listing
Number of SEQ ID NOs: 133
SEQ ID NO: 1; 18; DNA; Artificial; Synthetic Sequence
gacgacgacg cagcgatg
SEQ ID NO: 2; 25; DNA; Artificial; Synthetic Sequence
ggatccccaa accctctttg gatgt
SEQ ID NO: 3; 18; DNA; Artificial; Synthetic Sequence
cctcagtcac acaaagag
SEQ ID NO: 4; 24; DNA; Artificial; Synthetic Sequence
actcgaggga atcggtaagg tcag
SEQ ID NO: 5; 18; DNA; Artificial; Synthetic Sequence
tagagagaaa cgatggcg
SEQ ID NO: 6; 25; DNA; Artificial; Synthetic Sequence
ggatcccgtt tgggagtgga actcc
SEQ ID NO: 7; 24; DNA; Artificial; Synthetic Sequence
tctagaatgg ccgccgcagt ttcc
SEQ ID NO: 8; 25; DNA; Artificial; Synthetic Sequence
ggatccatct gtctccatcg gtttg
SEQ ID NO: 9; 18; DNA; Artificial; Synthetic Sequence
ctttaatcaa tggcaatg
SEQ ID NO: 10; 25; DNA; Artificial; Synthetic Sequence
ccatggcctg aactgctcta agctt
SEQ ID NO: 11; 17; DNA; Artificial; Synthetic Sequence
ccgtatgtta catcacc
SEQ ID NO: 12; 30; DNA; Artificial; Synthetic Sequence
ttatagcttt gatttgtata gttcatccat
SEQ ID NO: 13; 18; DNA; Artificial; Synthetic Sequence
gagatgtcga gtctcgaa
SEQ ID NO: 14; 23; DNA; Artificial; Synthetic Sequence
ctcgagcaca gtgtagtgac tgg
SEQ ID NO: 15; 38; DNA; Artificial; Synthetic Sequence
cccggggtgt gcgcttccag ggaaaaactt cgggagcg
SEQ ID NO: 16; 24; DNA; Artificial; Synthetic Sequence
gagctcttat ttgtatagtt catc
SEQ ID NO: 17; 27; DNA; Artificial; Synthetic Sequence
tctagaatgc ctcaggtcac tctttgg
SEQ ID NO: 18; 27; DNA; Artificial; Synthetic Sequence
ctcgagtcaa aaatttaaag tgcaacc
SEQ ID NO: 19; 46; DNA; Artificial; Synthetic Sequence
cccgggtagc caaaattacc ctatagtggg aaaaacttcg ggagcg
SEQ ID NO: 20; 46; DNA; Artificial; Synthetic Sequence
cccgggtgca agagttttgg ctgaagcagg aaaaacttcg ggagcg
SEQ ID NO: 21; 46; DNA; Artificial; Synthetic Sequence
cccgggtgct accataatga tgcagagacc aaaaacttcg ggagcg
SEQ ID NO: 22; 46; DNA; Artificial; Synthetic Sequence
cccgggtaga caggctaatt ttttaggggg aaaaacttcg ggagcg
SEQ ID NO: 23; 46; DNA; Artificial; Synthetic Sequence
cccgggtcca gggaattttc ttcagagcgg aaaaacttcg ggagcg
SEQ ID NO: 24; 41; DNA; Artificial; Synthetic Sequence
cccgggtagc gtgcctcaaa taggaaaaaa cttcgggagc g
SEQ ID NO: 25; 46; DNA; Artificial; Synthetic Sequence
cccgggtact ttaaattttc ccattagcgg aaaaacttcg ggagcg
SEQ ID NO: 26; 46; DNA; Artificial; Synthetic Sequence
cccgggtgca gaaaccttct atgtagatgg aaaaacttcg ggagcg
SEQ ID NO: 27; 46; DNA; Artificial; Synthetic Sequence
cccgggtagg aaagtactat ttttagatgg aaaaacttcg ggagcg
SEQ ID NO: 28; 24; DNA; Artificial; Synthetic Sequence
ctcgagttat ttgtatagtt catc
SEQ ID NO: 29; 192; DNA; Arabidopsis thaliana; CDS (1)..(192); Partial AtOEP7, a homologue of pea OEP14, comprising a chloroplast (outer envelope membrane) targeting sequence
atg gga aaa act tcg gga gcg aaa cag gcg act gtg gtg gtc gca gcg 48
Met Gly Lys Thr Ser Gly Ala Lys Gln Ala Thr Val Val Val Ala Ala
atg gcg tta gga tgg tta gcc ata gag atc gct ttc aag cct ttc ctc 96
Met Ala Leu Gly Trp Leu Ala Ile Glu Ile Ala Phe Lys Pro Phe Leu
gat aaa ttc cgc tcc tca atc gac aaa tct gac cca acc aaa gac ccc 144
Asp Lys Phe Arg Ser Ser Ile Asp Lys Ser Asp Pro Thr Lys Asp Pro
gat gac ttc gac acc gcc gct act gca acc aca tcc aaa gag ggt ttg 192
Asp Asp Phe Asp Thr Ala Ala Thr Ala Thr Thr Ser Lys Glu Gly Leu
SEQ ID NO: 30; 64; PRT; Arabidopsis thaliana
Met Gly Lys Thr Ser Gly Ala Lys Gln Ala Thr Val Val Val Ala Ala 16
Met Ala Leu Gly Trp Leu Ala Ile Glu Ile Ala Phe Lys Pro Phe Leu 32
Asp Lys Phe Arg Ser Ser Ile Asp Lys Ser Asp Pro Thr Lys Asp Pro 48
Asp Asp Phe Asp Thr Ala Ala Thr Ala Thr Thr Ser Lys Glu Gly Leu 64
SEQ ID NO: 31; 237; DNA; Arabidopsis thaliana; CDS (1)..(237); Partial rubisco small subunit comprising a chloroplast (stroma) targeting sequence
atg gct tcc tct atg ctc tct tcc gct act atg gtt gcc tct ccg gct 48
Met Ala Ser Ser Met Leu Ser Ser Ala Thr Met Val Ala Ser Pro Ala
cag gcc act atg gtc gct cct ttc aac gga ctt aag tcc tcc gct gcc 96
Gln Ala Thr Met Val Ala Pro Phe Asn Gly Leu Lys Ser Ser Ala Ala
ttc cca gcc acc cgc aag gct aac aac gac att act tcc atc aca agc 144
Phe Pro Ala Thr Arg Lys Ala Asn Asn Asp Ile Thr Ser Ile Thr Ser
aac ggc gga aga gtt aac tgc atg cag gtg tgg cct ccg att gga aag 192
Asn Gly Gly Arg Val Asn Cys Met Gln Val Trp Pro Pro Ile Gly Lys
aag aag ttt gag act ctc tct tac ctt cct gac ctt acc gat tcc 237
Lys Lys Phe Glu Thr Leu Ser Tyr Leu Pro Asp Leu Thr Asp Ser
SEQ ID NO: 32; 79; PRT; Arabidopsis thaliana
Met Ala Ser Ser Met Leu Ser Ser Ala Thr Met Val Ala Ser Pro Ala 16
Gln Ala Thr Met Val Ala Pro Phe Asn Gly Leu Lys Ser Ser Ala Ala 32
Phe Pro Ala Thr Arg Lys Ala Asn Asn Asp Ile Thr Ser Ile Thr Ser 48
Asn Gly Gly Arg Val Asn Cys Met Gln Val Trp Pro Pro Ile Gly Lys 64
Lys Lys Phe Glu Thr Leu Ser Tyr Leu Pro Asp Leu Thr Asp Ser 79
SEQ ID NO: 33; 105; DNA; Arabidopsis thaliana; CDS (1)..(105); Partial chlorophyll a/b binding protein comprising a chloroplast (stroma) targeting sequence
atg gcg tcg aac tcg ctt atg agc tgt ggc ata gcc gcc gtg tac cct 48
Met Ala Ser Asn Ser Leu Met Ser Cys Gly Ile Ala Ala Val Tyr Pro
tcg ctt ctc tct tct tcc aag tct aaa ttc gta tcc gcc gga gtt cca 96
Ser Leu Leu Ser Ser Ser Lys Ser Lys Phe Val Ser Ala Gly Val Pro
ctc cca aac 105
Leu Pro Asn
SEQ ID NO: 34; 35; PRT; Arabidopsis thaliana
Met Ala Ser Asn Ser Leu Met Ser Cys Gly Ile Ala Ala Val Tyr Pro 16
Ser Leu Leu Ser Ser Ser Lys Ser Lys Phe Val Ser Ala Gly Val Pro 32
Leu Pro Asn 35
SEQ ID NO: 35; 204; DNA; Arabidopsis thaliana; CDS (1)..(204); Partial rubisco activase comprising a chloroplast (stroma) targeting sequence
atg gcc gcc gca gtt tcc acc gtc ggt gcc atc aac aga gct ccg ttg 48
Met Ala Ala Ala Val Ser Thr Val Gly Ala Ile Asn Arg Ala Pro Leu
agc ttg aac ggg tca gga tca gga gct gta tca gcc cca gct tca acc 96
Ser Leu Asn Gly Ser Gly Ser Gly Ala Val Ser Ala Pro Ala Ser Thr
ttc ttg gga aag aaa gtt gta act gtg tcg aga ttc gca cag agc aac 144
Phe Leu Gly Lys Lys Val Val Thr Val Ser Arg Phe Ala Gln Ser Asn
aag aag agc aac gga tca ttc aag gtg ttg gct gtg aaa gaa gac aaa 192
Lys Lys Ser Asn Gly Ser Phe Lys Val Leu Ala Val Lys Glu Asp Lys
caa acc gat gga 204
Gln Thr Asp Gly
SEQ ID NO: 36; 68; PRT; Arabidopsis thaliana
Met Ala Ala Ala Val Ser Thr Val Gly Ala Ile Asn Arg Ala Pro Leu 16
Ser Leu Asn Gly Ser Gly Ser Gly Ala Val Ser Ala Pro Ala Ser Thr 32
Phe Leu Gly Lys Lys Val Val Thr Val Ser Arg Phe Ala Gln Ser Asn 48
Lys Lys Ser Asn Gly Ser Phe Lys Val Leu Ala Val Lys Glu Asp Lys 64
Gln Thr Asp Gly 68
SEQ ID NO: 37; 234; DNA; Arabidopsis thaliana; CDS (1)..(234); Partial gamma subunit of mitochondrial F1-ATPase comprising a mitochondria targeting sequence
atg gca atg gct gtt ttc cgt cgc gaa ggg agg cgt ctc ctc cct tca 48
Met Ala Met Ala Val Phe Arg Arg Glu Gly Arg Arg Leu Leu Pro Ser
atc gcc gct cgc cca atc gct gct atc cga tct cct ctc tct tct gac 96
Ile Ala Ala Arg Pro Ile Ala Ala Ile Arg Ser Pro Leu Ser Ser Asp
cag gag gaa gga ctt ctt gga gtt cga tct atc tca act caa gtg gtg 144
Gln Glu Glu Gly Leu Leu Gly Val Arg Ser Ile Ser Thr Gln Val Val
cgt aac cgc atg aag agt gtt aag aac atc caa aag atc aca aag gca 192
Arg Asn Arg Met Lys Ser Val Lys Asn Ile Gln Lys Ile Thr Lys Ala
atg aag atg gtt gct gct tcc aag ctt aga gca gtt cag gcc 234
Met Lys Met Val Ala Ala Ser Lys Leu Arg Ala Val Gln Ala
SEQ ID NO: 38; 78; PRT; Arabidopsis thaliana
Met Ala Met Ala Val Phe Arg Arg Glu Gly Arg Arg Leu Leu Pro Ser 16
Ile Ala Ala Arg Pro Ile Ala Ala Ile Arg Ser Pro Leu Ser Ser Asp 32
Gln Glu Glu Gly Leu Leu Gly Val Arg Ser Ile Ser Thr Gln Val Val 48
Arg Asn Arg Met Lys Ser Val Lys Asn Ile Gln Lys Ile Thr Lys Ala 64
Met Lys Met Val Ala Ala Ser Lys Leu Arg Ala Val Gln Ala 78
SEQ ID NO: 39; 2847; DNA; Arabidopsis thaliana; gene (1)..(2847); Gene for H+-ATPase comprising a plasma membrane targeting sequence
atg tcg agt ctc gaa gat atc aag aac gag act gtt gat ctg gaa aaa 48
Met Ser Ser Leu Glu Asp Ile Lys Asn Glu Thr Val Asp Leu Glu Lys
att ccg att gag gaa gtt ttc cag cag cta aaa tgt tca agg gaa gga 96
Ile Pro Ile Glu Glu Val Phe Gln Gln Leu Lys Cys Ser Arg Glu Gly
ttg aca acg cag gaa ggg gag gac agg att cag atc ttt ggc ccc aac 144
Leu Thr Thr Gln Glu Gly Glu Asp Arg Ile Gln Ile Phe Gly Pro Asn
aag ctc gaa gag aaa aag gaa agc aaa ctt ctg aag ttt ttg ggg ttt 192
Lys Leu Glu Glu Lys Lys Glu Ser Lys Leu Leu Lys Phe Leu Gly Phe
atg tgg aat cca ctt tca tgg gtc atg gaa atg gct gca atc atg gcc 240
Met Trp Asn Pro Leu Ser Trp Val Met Glu Met Ala Ala Ile Met Ala
att gct ttg gcc aac ggt gat ggt agg cct ccg gat tgg cag gat ttt 288
Ile Ala Leu Ala Asn Gly Asp Gly Arg Pro Pro Asp Trp Gln Asp Phe
gtt ggt att atc tgt ctg ttg gtt atc aac tct acc atc agt ttt atc 336
Val Gly Ile Ile Cys Leu Leu Val Ile Asn Ser Thr Ile Ser Phe Ile
gaa gaa aac aat gct ggt aat gct gct gct gct ctt atg gct ggt ctt 384
Glu Glu Asn Asn Ala Gly Asn Ala Ala Ala Ala Leu Met Ala Gly Leu
gct cct aaa acc aag gtt ctt agg gat gga aag tgg agt gaa caa gaa 432
Ala Pro Lys Thr Lys Val Leu Arg Asp Gly Lys Trp Ser Glu Gln Glu
gct gct att ctt gtc cca gga gat att gtt agc att aaa tta gga gac 480
Ala Ala Ile Leu Val Pro Gly Asp Ile Val Ser Ile Lys Leu Gly Asp
att atc cca gct gat gcc cgt cta ctt gaa ggt gat cct tta aag gtt 528
Ile Ile Pro Ala Asp Ala Arg Leu Leu Glu Gly Asp Pro Leu Lys Val
gac caa tct gct cta act gga gag tcc ctt cct gta acc aag cac ccg 576
Asp Gln Ser Ala Leu Thr Gly Glu Ser Leu Pro Val Thr Lys His Pro
ggt caa gaa gtt ttc tct ggt tca acc tgc aaa caa gga gaa atc gag 624
Gly Gln Glu Val Phe Ser Gly Ser Thr Cys Lys Gln Gly Glu Ile Glu
gcg gtt gtt att gcc act ggg gtt cat acc ttc ttc ggt aaa gct gct 672
Ala Val Val Ile Ala Thr Gly Val His Thr Phe Phe Gly Lys Ala Ala
cac ctt gtg gac agc act aac caa gtt gga cat ttc cag aag gtt ctt 720
His Leu Val Asp Ser Thr Asn Gln Val Gly His Phe Gln Lys Val Leu
aca gcc att ggg aac ttc tgt atc tgt tcc att gct atc ggt atg gtg 768
Thr Ala Ile Gly Asn Phe Cys Ile Cys Ser Ile Ala Ile Gly Met Val
att gag atc atc gtc atg tat ccg atc caa cgc cga aag tac aga gat 816
Ile Glu Ile Ile Val Met Tyr Pro Ile Gln Arg Arg Lys Tyr Arg Asp
gga att gac aac ctt ttg gtc ctc ttg atc ggt ggt atc ccc att gct 864
Gly Ile Asp Asn Leu Leu Val Leu Leu Ile Gly Gly Ile Pro Ile Ala
atg cct aca gtc ttg tcc gtg acc atg gct att ggg tct cac agg ttg 912
Met Pro Thr Val Leu Ser Val Thr Met Ala Ile Gly Ser His Arg Leu
tct cag caa ggt gcc atc acc aag cgt atg act gcc att gaa gag atg 960
Ser Gln Gln Gly Ala Ile Thr Lys Arg Met Thr Ala Ile Glu Glu Met
gca gga atg gat gtc ctg tgc agt gac aaa acc ggg aca cta acc ctc 1008
Ala Gly Met Asp Val Leu Cys Ser Asp Lys Thr Gly Thr Leu Thr Leu
aac aaa ttg agt gtg gac aaa aac ttg gtc gag gtt ttc tgc aag ggt 1056
Asn Lys Leu Ser Val Asp Lys Asn Leu Val Glu Val Phe Cys Lys Gly
gtg gag aaa gat caa gtc cta tta ttt gca gct atg gct tcc agg gtt 1104
Val Glu Lys Asp Gln Val Leu Leu Phe Ala Ala Met Ala Ser Arg Val
gag aac cag gat gcc att gat gca gcc atg gtt ggg atg ctt gct gat 1152
Glu Asn Gln Asp Ala Ile Asp Ala Ala Met Val Gly Met Leu Ala Asp
cca aag gag gct aga gct gga atc agg gaa gtt cac ttc ctt cca ttc 1200
Pro Lys Glu Ala Arg Ala Gly Ile Arg Glu Val His Phe Leu Pro Phe
aac cct gtg gat aag aga act gct ttg act tac att gac ggc agt ggt 1248
Asn Pro Val Asp Lys Arg Thr Ala Leu Thr Tyr Ile Asp Gly Ser Gly
aac tgg cac aga gtc agt aaa ggt gct cct gag cag atc ctc gaa ctt 1296
Asn Trp His Arg Val Ser Lys Gly Ala Pro Glu Gln Ile Leu Glu Leu
gcc aaa gcc agc aat gat ctt agc aag aag gtg ctc tcc att att gac 1344
Ala Lys Ala Ser Asn Asp Leu Ser Lys Lys Val Leu Ser Ile Ile Asp
aag tat gct gag cgt ggt ctt agg tcg ttg gct gtt gct cgc cag gtg 1392
Lys Tyr Ala Glu Arg Gly Leu Arg Ser Leu Ala Val Ala Arg Gln Val
gtg cca gag aaa aca aag gaa agc cca ggt gcg cca tgg gaa ttt gtt 1440
Val Pro Glu Lys Thr Lys Glu Ser Pro Gly Ala Pro Trp Glu Phe Val
ggc ttg ttg cca ctt ttt gat ccc cca aga cat gac agt gct gaa aca 1488
Gly Leu Leu Pro Leu Phe Asp Pro Pro Arg His Asp Ser Ala Glu Thr
att cga cgg gct ttg aat ctt ggt gtt aac gtc aag atg atc act ggt 1536
Ile Arg Arg Ala Leu Asn Leu Gly Val Asn Val Lys Met Ile Thr Gly
gac caa ctt gct att ggt aag gaa act ggt cgc aga ctt gga atg gga 1584
Asp Gln Leu Ala Ile Gly Lys Glu Thr Gly Arg Arg Leu Gly Met Gly
aca aac atg tat cca tct tcg gct ctt ctt ggt aca cac aaa gac gca 1632
Thr Asn Met Tyr Pro Ser Ser Ala Leu Leu Gly Thr His Lys Asp Ala
aac ctc gca tcc att cct gtt gag gag ttg att gaa aag gct gat gga 1680
Asn Leu Ala Ser Ile Pro Val Glu Glu Leu Ile Glu Lys Ala Asp Gly
ttt gcc gga gtc ttc cca gag cac aaa tac gaa att gtg aaa aag ttg 1728
Phe Ala Gly Val Phe Pro Glu His Lys Tyr Glu Ile Val Lys Lys Leu
cag gag agg aag cat att gtt gga atg act ggt gat ggt gtc aat gat 1776
Gln Glu Arg Lys His Ile Val Gly Met Thr Gly Asp Gly Val Asn Asp
gcc cct gct cta aag aaa gct gat atc ggt att gct gtt gct gat gct 1824
Ala Pro Ala Leu Lys Lys Ala Asp Ile Gly Ile Ala Val Ala Asp Ala
aca gat gct gct cgt ggt gct tca gat atc gtg ctc act gag cct gga 1872
Thr Asp Ala Ala Arg Gly Ala Ser Asp Ile Val Leu Thr Glu Pro Gly
ctc agc gtt att atc agt gct gtt ctc acc agc aga gct att ttc cag 1920
Leu Ser Val Ile Ile Ser Ala Val Leu Thr Ser Arg Ala Ile Phe Gln
aga atg aag aac tat act atc tat gca gtc tca atc acc atc cgt att 1968
Arg Met Lys Asn Tyr Thr Ile Tyr Ala Val Ser Ile Thr Ile Arg Ile
gtg ttt ggt ttc atg ctt att gct ttg ata tgg gaa ttt gac ttc tca 2016
Val Phe Gly Phe Met Leu Ile Ala Leu Ile Trp Glu Phe Asp Phe Ser
gcc ttc atg gtt ctg atc att gcc att ctt aac gac ggt acc atc atg 2064
Ala Phe Met Val Leu Ile Ile Ala Ile Leu Asn Asp Gly Thr Ile Met
aca atc tca aag gac aga gtt aag cca tct ccc aca cct gat agc tgg 2112
Thr Ile Ser Lys Asp Arg Val Lys Pro Ser Pro Thr Pro Asp Ser Trp
aaa ctt aaa gaa att ttt gct act gga gtc gtt cta gga ggc tac cag 2160
Lys Leu Lys Glu Ile Phe Ala Thr Gly Val Val Leu Gly Gly Tyr Gln
gcc atc atg act gtt att ttc ttc tgg gcg gcg cac aag act gac ttt 2208
Ala Ile Met Thr Val Ile Phe Phe Trp Ala Ala His Lys Thr Asp Phe
ttc tcg gac aca ttc ggt gtg agg tcc att agg gac aat aac cac gag 2256
Phe Ser Asp Thr Phe Gly Val Arg Ser Ile Arg Asp Asn Asn His Glu
cta atg ggt gcg gtg tac tta caa gtt agt atc att agt caa gct ctg 2304
Leu Met Gly Ala Val Tyr Leu Gln Val Ser Ile Ile Ser Gln Ala Leu
atc ttc gtc aca aga tca agg agt tgg tct ttt gtt gaa cgt cct gga 2352
Ile Phe Val Thr Arg Ser Arg Ser Trp Ser Phe Val Glu Arg Pro Gly
gca ttg ctg atg att gct ttc ctc att gca caa ctg att gct act ttg 2400
Ala Leu Leu Met Ile Ala Phe Leu Ile Ala Gln Leu Ile Ala Thr Leu
att gcg gtt tac gcc aac tgg gaa ttt gca aag att agg ggt att gga 2448
Ile Ala Val Tyr Ala Asn Trp Glu Phe Ala Lys Ile Arg Gly Ile Gly
tgg gga tgg gct ggt gtg atc tgg cta tac agt att gtc aca tac ttc 2496
Trp Gly Trp Ala Gly Val Ile Trp Leu Tyr Ser Ile Val Thr Tyr Phe
cca ttg gac gtt ttc aag ttt gcc att cga tac atc ttg agc gga aag 2544
Pro Leu Asp Val Phe Lys Phe Ala Ile Arg Tyr Ile Leu Ser Gly Lys
gcg tgg ctc aac ttg ttt gag aac aag acg gct ttc acg atg aag aaa 2592
Ala Trp Leu Asn Leu Phe Glu Asn Lys Thr Ala Phe Thr Met Lys Lys
gat tac gga aaa gaa gag aga gag gct caa tgg gca ctt gct caa agg 2640
Asp Tyr Gly Lys Glu Glu Arg Glu Ala Gln Trp Ala Leu Ala Gln Arg
aca ctt cac ggt tta cag cca aaa gaa gct gtt aac atc ttc cct gag 2688
Thr Leu His Gly Leu Gln Pro Lys Glu Ala Val Asn Ile Phe Pro Glu
aaa gga agt tac aga gaa ttg tct gag atc gct gag caa gct aag aga 2736
Lys Gly Ser Tyr Arg Glu Leu Ser Glu Ile Ala Glu Gln Ala Lys Arg
aga gct gag atc gct agg ctt agg gag ctg cac aca ctc aag gga cat 2784
Arg Ala Glu Ile Ala Arg Leu Arg Glu Leu His Thr Leu Lys Gly His
gtg gaa tca gtc gtg aag cta aag ggc ttg gac att gaa act ccc agt 2832
Val Glu Ser Val Val Lys Leu Lys Gly Leu Asp Ile Glu Thr Pro Ser
cac tac act gtg tag 2847
His Tyr Thr Val
SEQ ID NO: 40; 948; PRT; Arabidopsis thaliana
Met Ser Ser Leu Glu Asp Ile Lys Asn Glu Thr Val Asp Leu Glu Lys 16
Ile Pro Ile Glu Glu Val Phe Gln Gln Leu Lys Cys Ser Arg Glu Gly 32
Leu Thr Thr Gln Glu Gly Glu Asp Arg Ile Gln Ile Phe Gly Pro Asn 48
Lys Leu Glu Glu Lys Lys Glu Ser Lys Leu Leu Lys Phe Leu Gly Phe 64
Met Trp Asn Pro Leu Ser Trp Val Met Glu Met Ala Ala Ile Met Ala 80
Ile Ala Leu Ala Asn Gly Asp Gly Arg Pro Pro Asp Trp Gln Asp Phe 96
Val Gly Ile Ile Cys Leu Leu Val Ile Asn Ser Thr Ile Ser Phe Ile 112
Glu Glu Asn Asn Ala Gly Asn Ala Ala Ala Ala Leu Met Ala Gly Leu 128
Ala Pro Lys Thr Lys Val Leu Arg Asp Gly Lys Trp Ser Glu Gln Glu 144
Ala Ala Ile Leu Val Pro Gly Asp Ile Val Ser Ile Lys Leu Gly Asp 160
Ile Ile Pro Ala Asp Ala Arg Leu Leu Glu Gly Asp Pro Leu Lys Val 176
Asp Gln Ser Ala Leu Thr Gly Glu Ser Leu Pro Val Thr Lys His Pro 192
Gly Gln Glu Val Phe Ser Gly Ser Thr Cys Lys Gln Gly Glu Ile Glu 208
Ala Val Val Ile Ala Thr Gly Val His Thr Phe Phe Gly Lys Ala Ala 224
His Leu Val Asp Ser Thr Asn Gln Val Gly His Phe Gln Lys Val Leu 240
Thr Ala Ile Gly Asn Phe Cys Ile Cys Ser Ile Ala Ile Gly Met Val 256
Ile Glu Ile Ile Val Met Tyr Pro Ile Gln Arg Arg Lys Tyr Arg Asp 272
Gly Ile Asp Asn Leu Leu Val Leu Leu Ile Gly Gly Ile Pro Ile Ala 288
Met Pro Thr Val Leu Ser Val Thr Met Ala Ile Gly Ser His Arg Leu 304
Ser Gln Gln Gly Ala Ile Thr Lys Arg Met Thr Ala Ile Glu Glu Met 320
Ala Gly Met Asp Val Leu Cys Ser Asp Lys Thr Gly Thr Leu Thr Leu 336
Asn Lys Leu Ser Val Asp Lys Asn Leu Val Glu Val Phe Cys Lys Gly 352
Val Glu Lys Asp Gln Val Leu Leu Phe Ala Ala Met Ala Ser Arg Val 368
Glu Asn Gln Asp Ala Ile Asp Ala Ala Met Val Gly Met Leu Ala Asp 384
Pro Lys Glu Ala Arg Ala Gly Ile Arg Glu Val His Phe Leu Pro Phe 400
Asn Pro Val Asp Lys Arg Thr Ala Leu Thr Tyr Ile Asp Gly Ser Gly 416
Asn Trp His Arg Val Ser Lys Gly Ala Pro Glu Gln Ile Leu Glu Leu 432
Ala Lys Ala Ser Asn Asp Leu Ser Lys Lys Val Leu Ser Ile Ile Asp 448
Lys Tyr Ala Glu Arg Gly Leu Arg Ser Leu Ala Val Ala Arg Gln Val 464
Val Pro Glu Lys Thr Lys Glu Ser Pro Gly Ala Pro Trp Glu Phe Val 480
Gly Leu Leu Pro Leu Phe Asp Pro Pro Arg His Asp Ser Ala Glu Thr 496
Ile Arg Arg Ala Leu Asn Leu Gly Val Asn Val Lys Met Ile Thr Gly 512
Asp Gln Leu Ala Ile Gly Lys Glu Thr Gly Arg Arg Leu Gly Met Gly 528
Thr Asn Met Tyr Pro Ser Ser Ala Leu Leu Gly Thr His Lys Asp Ala 544
Asn Leu Ala Ser Ile Pro Val Glu Glu Leu Ile Glu Lys Ala Asp Gly 560
Phe Ala Gly Val Phe Pro Glu His Lys Tyr Glu Ile Val Lys Lys Leu 576
Gln Glu Arg Lys His Ile Val Gly Met Thr Gly Asp Gly Val Asn Asp 592
Ala Pro Ala Leu Lys Lys Ala Asp Ile Gly Ile Ala Val Ala Asp Ala 608
Thr Asp Ala Ala Arg Gly Ala Ser Asp Ile Val Leu Thr Glu Pro Gly 624
Leu Ser Val Ile Ile Ser Ala Val Leu Thr Ser Arg Ala Ile Phe Gln 640
Arg Met Lys Asn Tyr Thr Ile Tyr Ala Val Ser Ile Thr Ile Arg Ile 656
Val Phe Gly Phe Met Leu Ile Ala Leu Ile Trp Glu Phe Asp Phe Ser 672
Ala Phe Met Val Leu Ile Ile Ala Ile Leu Asn Asp Gly Thr Ile Met 688
Thr Ile Ser Lys Asp Arg Val Lys Pro Ser Pro Thr Pro Asp Ser Trp 704
Lys Leu Lys Glu Ile Phe Ala Thr Gly Val Val Leu Gly Gly Tyr Gln 720
715 720Ala Ile Met Thr Val Ile Phe Phe Trp Ala
Ala His Lys Thr Asp Phe 725 730
735Phe Ser Asp Thr Phe Gly Val Arg Ser Ile Arg Asp Asn Asn His Glu
740 745 750Leu Met Gly Ala Val
Tyr Leu Gln Val Ser Ile Ile Ser Gln Ala Leu 755
760 765Ile Phe Val Thr Arg Ser Arg Ser Trp Ser Phe Val
Glu Arg Pro Gly 770 775 780Ala Leu Leu
Met Ile Ala Phe Leu Ile Ala Gln Leu Ile Ala Thr Leu785
790 795 800Ile Ala Val Tyr Ala Asn Trp
Glu Phe Ala Lys Ile Arg Gly Ile Gly 805
810 815Trp Gly Trp Ala Gly Val Ile Trp Leu Tyr Ser Ile
Val Thr Tyr Phe 820 825 830Pro
Leu Asp Val Phe Lys Phe Ala Ile Arg Tyr Ile Leu Ser Gly Lys 835
840 845Ala Trp Leu Asn Leu Phe Glu Asn Lys
Thr Ala Phe Thr Met Lys Lys 850 855
860Asp Tyr Gly Lys Glu Glu Arg Glu Ala Gln Trp Ala Leu Ala Gln Arg865
870 875 880Thr Leu His Gly
Leu Gln Pro Lys Glu Ala Val Asn Ile Phe Pro Glu 885
890 895Lys Gly Ser Tyr Arg Glu Leu Ser Glu Ile
Ala Glu Gln Ala Lys Arg 900 905
910Arg Ala Glu Ile Ala Arg Leu Arg Glu Leu His Thr Leu Lys Gly His
915 920 925Val Glu Ser Val Val Lys Leu
Lys Gly Leu Asp Ile Glu Thr Pro Ser 930 935
940His Tyr Thr Val945

SEQ ID NO: 41
Length: 525; Type: DNA; Organism: Rattus norvegicus
Feature: gene (1)..(525); gene for a partial pleckstrin homology domain of phospholipase C-delta 1, comprising a phosphatidylinositol 4,5-bisphosphate targeting sequence
gac tcg ggt agg gac ttc ctg acc ctg cac ggg ctc cag gat gac ccg
48Asp Ser Gly Arg Asp Phe Leu Thr Leu His Gly Leu Gln Asp Asp Pro1
5 10 15gac ctt cag gcc ctt ctg
aag ggc agc cag ctt ctg aag gtg aag tcc 96Asp Leu Gln Ala Leu Leu
Lys Gly Ser Gln Leu Leu Lys Val Lys Ser 20 25
30agc tcg tgg cgt agg gaa cgc ttc tac aag cta cag gag
gac tgc aag 144Ser Ser Trp Arg Arg Glu Arg Phe Tyr Lys Leu Gln Glu
Asp Cys Lys 35 40 45acc atc tgg
cag gaa tct cga aag gtc atg agg tcc ccg gag tcg cag 192Thr Ile Trp
Gln Glu Ser Arg Lys Val Met Arg Ser Pro Glu Ser Gln 50
55 60ctg ttc tcc atc gag gac att cag gag gta cgg atg
gga cac cgc aca 240Leu Phe Ser Ile Glu Asp Ile Gln Glu Val Arg Met
Gly His Arg Thr65 70 75
80gaa ggc ctg gag aag ttt gcc cga gac atc ccc gag gat cga tgc ttc
288Glu Gly Leu Glu Lys Phe Ala Arg Asp Ile Pro Glu Asp Arg Cys Phe
85 90 95tcc att gtc ttc aag gac
cag cgc aac acc cta gac ctc att gcc cca 336Ser Ile Val Phe Lys Asp
Gln Arg Asn Thr Leu Asp Leu Ile Ala Pro 100
105 110tca cca gct gac gct cag cac tgg gtg cag ggc ctg
cgc aag atc atc 384Ser Pro Ala Asp Ala Gln His Trp Val Gln Gly Leu
Arg Lys Ile Ile 115 120 125cac cac
tcc ggc tcc atg gac cag cgg cag aag ctg cag cac tgg att 432His His
Ser Gly Ser Met Asp Gln Arg Gln Lys Leu Gln His Trp Ile 130
135 140cac tcc tgc ttg cga aag gct gat aaa aac aag
gac aac aag atg aac 480His Ser Cys Leu Arg Lys Ala Asp Lys Asn Lys
Asp Asn Lys Met Asn145 150 155
160ttc aag gag ctg aag gac ttc ctg aag gag ctc aac atc cag tga
525Phe Lys Glu Leu Lys Asp Phe Leu Lys Glu Leu Asn Ile Gln
165 170

SEQ ID NO: 42
Length: 174; Type: PRT; Organism: Rattus norvegicus
Asp Ser Gly Arg Asp
Phe Leu Thr Leu His Gly Leu Gln Asp Asp Pro1 5
10 15Asp Leu Gln Ala Leu Leu Lys Gly Ser Gln Leu
Leu Lys Val Lys Ser 20 25
30Ser Ser Trp Arg Arg Glu Arg Phe Tyr Lys Leu Gln Glu Asp Cys Lys
35 40 45Thr Ile Trp Gln Glu Ser Arg Lys
Val Met Arg Ser Pro Glu Ser Gln 50 55
60Leu Phe Ser Ile Glu Asp Ile Gln Glu Val Arg Met Gly His Arg Thr65
70 75 80Glu Gly Leu Glu Lys
Phe Ala Arg Asp Ile Pro Glu Asp Arg Cys Phe 85
90 95Ser Ile Val Phe Lys Asp Gln Arg Asn Thr Leu
Asp Leu Ile Ala Pro 100 105
110Ser Pro Ala Asp Ala Gln His Trp Val Gln Gly Leu Arg Lys Ile Ile
115 120 125His His Ser Gly Ser Met Asp
Gln Arg Gln Lys Leu Gln His Trp Ile 130 135
140His Ser Cys Leu Arg Lys Ala Asp Lys Asn Lys Asp Asn Lys Met
Asn145 150 155 160Phe Lys
Glu Leu Lys Asp Phe Leu Lys Glu Leu Asn Ile Gln 165
170

SEQ ID NO: 43
Length: 282; Type: DNA; Organism: Mus musculus
Feature: CDS (1)..(282); partial pleckstrin homology domain of FAPP (family A (phosphoinositide binding specific) member 3), comprising a phosphatidylinositol 4-phosphate targeting sequence
atg gag ggg gtt ctg tac aag tgg acc aac tat ctc aca ggt tgg cag
48Met Glu Gly Val Leu Tyr Lys Trp Thr Asn Tyr Leu Thr Gly Trp Gln1
5 10 15cct cga tgg ttt gtt ctg
gat aat gga atc ctg tcc tac tat gac tca 96Pro Arg Trp Phe Val Leu
Asp Asn Gly Ile Leu Ser Tyr Tyr Asp Ser 20 25
30cag gat gat gtc tgc aaa ggg agc aaa ggg agt ata aag
atg gcg gtc 144Gln Asp Asp Val Cys Lys Gly Ser Lys Gly Ser Ile Lys
Met Ala Val 35 40 45tgt gag att
aaa gtc cat ccc gca gac aac aca aga atg gag tta atc 192Cys Glu Ile
Lys Val His Pro Ala Asp Asn Thr Arg Met Glu Leu Ile 50
55 60att cca gga gag cag cat ttc tac atg aag gca gta
aat gcc gcc gag 240Ile Pro Gly Glu Gln His Phe Tyr Met Lys Ala Val
Asn Ala Ala Glu65 70 75
80aga cag agg tgg ctg gtt gcc ctt ggg agc tcc aaa gcg tgc
282Arg Gln Arg Trp Leu Val Ala Leu Gly Ser Ser Lys Ala Cys
85 90

SEQ ID NO: 44
Length: 94; Type: PRT; Organism: Mus musculus
Met Glu Gly Val Leu Tyr Lys
Trp Thr Asn Tyr Leu Thr Gly Trp Gln1 5 10
15Pro Arg Trp Phe Val Leu Asp Asn Gly Ile Leu Ser Tyr
Tyr Asp Ser 20 25 30Gln Asp
Asp Val Cys Lys Gly Ser Lys Gly Ser Ile Lys Met Ala Val 35
40 45Cys Glu Ile Lys Val His Pro Ala Asp Asn
Thr Arg Met Glu Leu Ile 50 55 60Ile
Pro Gly Glu Gln His Phe Tyr Met Lys Ala Val Asn Ala Ala Glu65
70 75 80Arg Gln Arg Trp Leu Val
Ala Leu Gly Ser Ser Lys Ala Cys 85
90

SEQ ID NO: 45
Length: 741; Type: DNA; Organism: Tobacco vein mottling virus
Feature: gene (1)..(741); gene for NIa protease
atg tca aag gcg cta ctt aag gga gtg cga gat ttt aat cca atc tct
48Met Ser Lys Ala Leu Leu Lys Gly Val Arg Asp Phe Asn Pro Ile Ser1
5 10 15gct tgc gta tgc ctc ctt
gag aac tcc tcg gat ggg cat agt gag aga 96Ala Cys Val Cys Leu Leu
Glu Asn Ser Ser Asp Gly His Ser Glu Arg 20 25
30ctg ttt ggc att ggt ttt ggc cca tat atc att gcc aac
caa cat ctt 144Leu Phe Gly Ile Gly Phe Gly Pro Tyr Ile Ile Ala Asn
Gln His Leu 35 40 45ttt aga agg
aac aat gga gag ttg act atc aaa acc atg cat ggt gag 192Phe Arg Arg
Asn Asn Gly Glu Leu Thr Ile Lys Thr Met His Gly Glu 50
55 60ttc aaa gtc aag aac tca aca caa ttg cag atg aaa
cca gtt gag ggc 240Phe Lys Val Lys Asn Ser Thr Gln Leu Gln Met Lys
Pro Val Glu Gly65 70 75
80aga gac ata ata gtt atc aaa atg gct aag gac ttc cca cca ttc cct
288Arg Asp Ile Ile Val Ile Lys Met Ala Lys Asp Phe Pro Pro Phe Pro
85 90 95caa aaa cta aaa ttc aga
cag cct acc atc aaa gat aga gtg tgc atg 336Gln Lys Leu Lys Phe Arg
Gln Pro Thr Ile Lys Asp Arg Val Cys Met 100
105 110gta tcc aca aat ttt cag cag aaa agt gtc tct agt
cta gtg tct gag 384Val Ser Thr Asn Phe Gln Gln Lys Ser Val Ser Ser
Leu Val Ser Glu 115 120 125tca tca
cac att gtg cat aaa gag gac act tca ttc tgg caa cac tgg 432Ser Ser
His Ile Val His Lys Glu Asp Thr Ser Phe Trp Gln His Trp 130
135 140ata aca aca aag gat gga caa tgt gga agt ccg
ctg gtt tca atc att 480Ile Thr Thr Lys Asp Gly Gln Cys Gly Ser Pro
Leu Val Ser Ile Ile145 150 155
160gat gga aat att ttg ggg atc cac agc ctg acg cat acg acc aat ggt
528Asp Gly Asn Ile Leu Gly Ile His Ser Leu Thr His Thr Thr Asn Gly
165 170 175agc aat tac ttc gtg
gaa ttt cct gag aag ttc gta gct aca tat ctt 576Ser Asn Tyr Phe Val
Glu Phe Pro Glu Lys Phe Val Ala Thr Tyr Leu 180
185 190gat gcc gct gat ggt tgg tgc aag aat tgg aag ttc
aat gct gat aag 624Asp Ala Ala Asp Gly Trp Cys Lys Asn Trp Lys Phe
Asn Ala Asp Lys 195 200 205atc agt
tgg ggt tcc ttt aca tta gtt gag gat gcg ccc gaa gat gac 672Ile Ser
Trp Gly Ser Phe Thr Leu Val Glu Asp Ala Pro Glu Asp Asp 210
215 220ttc atg gcc aag aaa act gtt gcc gcc atc atg
gac gat ttg gtc cgc 720Phe Met Ala Lys Lys Thr Val Ala Ala Ile Met
Asp Asp Leu Val Arg225 230 235
240act caa ggg gga gaa gct tga
741Thr Gln Gly Gly Glu Ala 245

SEQ ID NO: 46
Length: 246; Type: PRT; Organism: Tobacco vein mottling virus
Met Ser Lys Ala Leu Leu Lys Gly Val Arg Asp Phe Asn Pro
Ile Ser1 5 10 15Ala Cys
Val Cys Leu Leu Glu Asn Ser Ser Asp Gly His Ser Glu Arg 20
25 30Leu Phe Gly Ile Gly Phe Gly Pro Tyr
Ile Ile Ala Asn Gln His Leu 35 40
45Phe Arg Arg Asn Asn Gly Glu Leu Thr Ile Lys Thr Met His Gly Glu 50
55 60Phe Lys Val Lys Asn Ser Thr Gln Leu
Gln Met Lys Pro Val Glu Gly65 70 75
80Arg Asp Ile Ile Val Ile Lys Met Ala Lys Asp Phe Pro Pro
Phe Pro 85 90 95Gln Lys
Leu Lys Phe Arg Gln Pro Thr Ile Lys Asp Arg Val Cys Met 100
105 110Val Ser Thr Asn Phe Gln Gln Lys Ser
Val Ser Ser Leu Val Ser Glu 115 120
125Ser Ser His Ile Val His Lys Glu Asp Thr Ser Phe Trp Gln His Trp
130 135 140Ile Thr Thr Lys Asp Gly Gln
Cys Gly Ser Pro Leu Val Ser Ile Ile145 150
155 160Asp Gly Asn Ile Leu Gly Ile His Ser Leu Thr His
Thr Thr Asn Gly 165 170
175Ser Asn Tyr Phe Val Glu Phe Pro Glu Lys Phe Val Ala Thr Tyr Leu
180 185 190Asp Ala Ala Asp Gly Trp
Cys Lys Asn Trp Lys Phe Asn Ala Asp Lys 195 200
205Ile Ser Trp Gly Ser Phe Thr Leu Val Glu Asp Ala Pro Glu
Asp Asp 210 215 220Phe Met Ala Lys Lys
Thr Val Ala Ala Ile Met Asp Asp Leu Val Arg225 230
235 240Thr Gln Gly Gly Glu Ala
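245

SEQ ID NO: 47
Length: 12; Type: DNA; Organism: Artificial; Description: synthetic sequence
gtg cgc ttc cag 12
Val Arg Phe Gln
1

SEQ ID NO: 48
Length: 4; Type: PRT; Organism: Artificial; Description: synthetic sequence
Val Arg Phe Gln
1

SEQ ID NO: 49
Length: 21; Type: DNA; Organism: Artificial; Description: synthetic sequence
gaa cca gtc tat ttc caa ggg 21
Glu Pro Val Tyr Phe Gln Gly
1 5

SEQ ID NO: 50
Length: 7; Type: PRT; Organism: Artificial; Description: synthetic sequence
Glu Pro Val Tyr Phe Gln Gly
1 5

SEQ ID NO: 51
Length: 1638; Type: DNA; Organism: Artificial; Description: synthetic sequence
atg gtg cgc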
tcc tcc aag aac gtc atc aag gag ttc atg cgc ttc aag 48Met Val Arg
Ser Ser Lys Asn Val Ile Lys Glu Phe Met Arg Phe Lys1 5
10 15gtg cgc atg gag ggc acc gtg aac ggc
cac gag ttc gag atc gag ggc 96Val Arg Met Glu Gly Thr Val Asn Gly
His Glu Phe Glu Ile Glu Gly 20 25
30gag ggc gag ggc cgc ccc tac gag ggc cac aac acc gtg aag ctg aag
144Glu Gly Glu Gly Arg Pro Tyr Glu Gly His Asn Thr Val Lys Leu Lys
35 40 45gtg acc aag ggc ggc ccc ctg
ccc ttc gcc tgg gac atc ctg tcc ccc 192Val Thr Lys Gly Gly Pro Leu
Pro Phe Ala Trp Asp Ile Leu Ser Pro 50 55
60cag ttc cag tac ggc tcc aag gtg tac gtg aag cac ccc gcc gac atc
240Gln Phe Gln Tyr Gly Ser Lys Val Tyr Val Lys His Pro Ala Asp Ile65
70 75 80ccc gac tac aag
aag ctg tcc ttc ccc gag ggc ttc aag tgg gag cgc 288Pro Asp Tyr Lys
Lys Leu Ser Phe Pro Glu Gly Phe Lys Trp Glu Arg 85
90 95gtg atg aac ttc gag gac ggc ggc gtg gtg
acc gtg acc cag gac tcc 336Val Met Asn Phe Glu Asp Gly Gly Val Val
Thr Val Thr Gln Asp Ser 100 105
110tcc ctg cag gac ggc tgc ttc atc tac aag gtg aag ttc atc ggc gtg
384Ser Leu Gln Asp Gly Cys Phe Ile Tyr Lys Val Lys Phe Ile Gly Val
115 120 125aac ttc ccc tcc gac ggc ccc
gta atg cag aag aag acc atg ggc tgg 432Asn Phe Pro Ser Asp Gly Pro
Val Met Gln Lys Lys Thr Met Gly Trp 130 135
140gag gcc tcc acc gag cgc ctg tac ccc cgc gac ggc gtg ctg aag ggc
480Glu Ala Ser Thr Glu Arg Leu Tyr Pro Arg Asp Gly Val Leu Lys Gly145
150 155 160gag atc cac aag
gcc ctg aag ctg aag gac ggc ggc cac tac ctg gtg 528Glu Ile His Lys
Ala Leu Lys Leu Lys Asp Gly Gly His Tyr Leu Val 165
170 175gag ttc aag tcc atc tac atg gcc aag aag
ccc gtg cag ctg ccc ggc 576Glu Phe Lys Ser Ile Tyr Met Ala Lys Lys
Pro Val Gln Leu Pro Gly 180 185
190tac tac tac gtg gac tcc aag ctg gac atc acc tcc cac aac gag gac
624Tyr Tyr Tyr Val Asp Ser Lys Leu Asp Ile Thr Ser His Asn Glu Asp
195 200 205tac acc atc gtg gag cag tac
gag cgc acc gag ggc cgc cac cac ctg 672Tyr Thr Ile Val Glu Gln Tyr
Glu Arg Thr Glu Gly Arg His His Leu 210 215
220ttc ctg ccc cgg gca atc aag ctg ggt gtg cgc ttc cag gga aaa act
720Phe Leu Pro Arg Ala Ile Lys Leu Gly Val Arg Phe Gln Gly Lys Thr225
230 235 240tcg gga gcg aaa
cag gcg act gtg gtg gtc gca gcg atg gcg tta gga 768Ser Gly Ala Lys
Gln Ala Thr Val Val Val Ala Ala Met Ala Leu Gly 245
250 255tgg tta gcc ata gag atc gct ttc aag cct
ttc ctc gat aaa ttc cgc 816Trp Leu Ala Ile Glu Ile Ala Phe Lys Pro
Phe Leu Asp Lys Phe Arg 260 265
270tcc tca atc gac aaa tct gac cca acc aaa gac ccc gat gac ttc gac
864Ser Ser Ile Asp Lys Ser Asp Pro Thr Lys Asp Pro Asp Asp Phe Asp
275 280 285acc gcc gct act gca acc aca
tcc aaa gag ggt ttg ggg atc caa gga 912Thr Ala Ala Thr Ala Thr Thr
Ser Lys Glu Gly Leu Gly Ile Gln Gly 290 295
300gat ata aca atg agt aaa gga gaa gaa ctt ttc act gga gtt gtc cca
960Asp Ile Thr Met Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Val Pro305
310 315 320att ctt gtt gaa
tta gat ggt gat gtt aat ggg cac aaa ttt tct gtc 1008Ile Leu Val Glu
Leu Asp Gly Asp Val Asn Gly His Lys Phe Ser Val 325
330 335agt gga gag ggt gaa ggt gat gca aca tac
gga aaa ctt acc ctt aaa 1056Ser Gly Glu Gly Glu Gly Asp Ala Thr Tyr
Gly Lys Leu Thr Leu Lys 340 345
350ttt att tgc act act gga aaa cta cct gtt cca tgg cca aca ctt gtc
1104Phe Ile Cys Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr Leu Val
355 360 365act act ttc tct tat ggt gtt
caa tgc ttt tca aga tac cca gat cat 1152Thr Thr Phe Ser Tyr Gly Val
Gln Cys Phe Ser Arg Tyr Pro Asp His 370 375
380atg aag cgg cac gac ttc ttc aag agc gcc atg cct gag gga tac gtg
1200Met Lys Arg His Asp Phe Phe Lys Ser Ala Met Pro Glu Gly Tyr Val385
390 395 400cag gag agg acc
atc tct ttc aag gac gac ggg aac tac aag aca cgt 1248Gln Glu Arg Thr
Ile Ser Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg 405
410 415gct gaa gtc aag ttt gag gga gac acc ctc
gtc aac agg atc gag ctt 1296Ala Glu Val Lys Phe Glu Gly Asp Thr Leu
Val Asn Arg Ile Glu Leu 420 425
430aag gga atc gat ttc aag gag gac gga aac atc ctc ggc cac aag ttg
1344Lys Gly Ile Asp Phe Lys Glu Asp Gly Asn Ile Leu Gly His Lys Leu
435 440 445gaa tac aac tac aac tcc cac
aac gta tac atc acg gca gac aaa caa 1392Glu Tyr Asn Tyr Asn Ser His
Asn Val Tyr Ile Thr Ala Asp Lys Gln 450 455
460aag aat gga atc aaa gct aac ttc aaa att aga cac aac att gaa gat
1440Lys Asn Gly Ile Lys Ala Asn Phe Lys Ile Arg His Asn Ile Glu Asp465
470 475 480gga agc gtt caa
cta gca gac cat tat caa caa aat act cca att ggc 1488Gly Ser Val Gln
Leu Ala Asp His Tyr Gln Gln Asn Thr Pro Ile Gly 485
490 495gat ggc cct gtc ctt tta cca gac aac cat
tac ctg tcc aca caa tct 1536Asp Gly Pro Val Leu Leu Pro Asp Asn His
Tyr Leu Ser Thr Gln Ser 500 505
510gcc ctt tcg aaa gat ccc aac gaa aag aga gac cac atg gtc ctt ctt
1584Ala Leu Ser Lys Asp Pro Asn Glu Lys Arg Asp His Met Val Leu Leu
515 520 525gag ttt gta aca gct gct ggg
att aca cat ggc atg gat gaa cta tac 1632Glu Phe Val Thr Ala Ala Gly
Ile Thr His Gly Met Asp Glu Leu Tyr 530 535
540aaa taa
1638Lys545

SEQ ID NO: 52
Length: 545; Type: PRT; Organism: Artificial; Description: synthetic sequence
Met Val Arg Ser Ser
Lys Asn Val Ile Lys Glu Phe Met Arg Phe Lys1 5
10 15Val Arg Met Glu Gly Thr Val Asn Gly His Glu
Phe Glu Ile Glu Gly 20 25
30Glu Gly Glu Gly Arg Pro Tyr Glu Gly His Asn Thr Val Lys Leu Lys
35 40 45Val Thr Lys Gly Gly Pro Leu Pro
Phe Ala Trp Asp Ile Leu Ser Pro 50 55
60Gln Phe Gln Tyr Gly Ser Lys Val Tyr Val Lys His Pro Ala Asp Ile65
70 75 80Pro Asp Tyr Lys Lys
Leu Ser Phe Pro Glu Gly Phe Lys Trp Glu Arg 85
90 95Val Met Asn Phe Glu Asp Gly Gly Val Val Thr
Val Thr Gln Asp Ser 100 105
110Ser Leu Gln Asp Gly Cys Phe Ile Tyr Lys Val Lys Phe Ile Gly Val
115 120 125Asn Phe Pro Ser Asp Gly Pro
Val Met Gln Lys Lys Thr Met Gly Trp 130 135
140Glu Ala Ser Thr Glu Arg Leu Tyr Pro Arg Asp Gly Val Leu Lys
Gly145 150 155 160Glu Ile
His Lys Ala Leu Lys Leu Lys Asp Gly Gly His Tyr Leu Val
165 170 175Glu Phe Lys Ser Ile Tyr Met
Ala Lys Lys Pro Val Gln Leu Pro Gly 180 185
190Tyr Tyr Tyr Val Asp Ser Lys Leu Asp Ile Thr Ser His Asn
Glu Asp 195 200 205Tyr Thr Ile Val
Glu Gln Tyr Glu Arg Thr Glu Gly Arg His His Leu 210
215 220Phe Leu Pro Arg Ala Ile Lys Leu Gly Val Arg Phe
Gln Gly Lys Thr225 230 235
240Ser Gly Ala Lys Gln Ala Thr Val Val Val Ala Ala Met Ala Leu Gly
245 250 255Trp Leu Ala Ile Glu
Ile Ala Phe Lys Pro Phe Leu Asp Lys Phe Arg 260
265 270Ser Ser Ile Asp Lys Ser Asp Pro Thr Lys Asp Pro
Asp Asp Phe Asp 275 280 285Thr Ala
Ala Thr Ala Thr Thr Ser Lys Glu Gly Leu Gly Ile Gln Gly 290
295 300Asp Ile Thr Met Ser Lys Gly Glu Glu Leu Phe
Thr Gly Val Val Pro305 310 315
320Ile Leu Val Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe Ser Val
325 330 335Ser Gly Glu Gly
Glu Gly Asp Ala Thr Tyr Gly Lys Leu Thr Leu Lys 340
345 350Phe Ile Cys Thr Thr Gly Lys Leu Pro Val Pro
Trp Pro Thr Leu Val 355 360 365Thr
Thr Phe Ser Tyr Gly Val Gln Cys Phe Ser Arg Tyr Pro Asp His 370
375 380Met Lys Arg His Asp Phe Phe Lys Ser Ala
Met Pro Glu Gly Tyr Val385 390 395
400Gln Glu Arg Thr Ile Ser Phe Lys Asp Asp Gly Asn Tyr Lys Thr
Arg 405 410 415Ala Glu Val
Lys Phe Glu Gly Asp Thr Leu Val Asn Arg Ile Glu Leu 420
425 430Lys Gly Ile Asp Phe Lys Glu Asp Gly Asn
Ile Leu Gly His Lys Leu 435 440
445Glu Tyr Asn Tyr Asn Ser His Asn Val Tyr Ile Thr Ala Asp Lys Gln 450
455 460Lys Asn Gly Ile Lys Ala Asn Phe
Lys Ile Arg His Asn Ile Glu Asp465 470
475 480Gly Ser Val Gln Leu Ala Asp His Tyr Gln Gln Asn
Thr Pro Ile Gly 485 490
495Asp Gly Pro Val Leu Leu Pro Asp Asn His Tyr Leu Ser Thr Gln Ser
500 505 510Ala Leu Ser Lys Asp Pro
Asn Glu Lys Arg Asp His Met Val Leu Leu 515 520
525Glu Phe Val Thr Ala Ala Gly Ile Thr His Gly Met Asp Glu
Leu Tyr 530 535
540Lys545

SEQ ID NO: 53
Length: 3579; Type: DNA; Organism: Artificial; Description: synthetic sequence
atg tcg agt ctc gaa gat
atc aag aac gag act gtt gat ctg gaa aaa 48Met Ser Ser Leu Glu Asp
Ile Lys Asn Glu Thr Val Asp Leu Glu Lys1 5
10 15att ccg att gag gaa gtt ttc cag cag cta aaa tgt
tca agg gaa gga 96Ile Pro Ile Glu Glu Val Phe Gln Gln Leu Lys Cys
Ser Arg Glu Gly 20 25 30ttg
aca acg cag gaa ggg gag gac agg att cag atc ttt ggc ccc aac 144Leu
Thr Thr Gln Glu Gly Glu Asp Arg Ile Gln Ile Phe Gly Pro Asn 35
40 45aag ctc gaa gag aaa aag gaa agc aaa
ctt ctg aag ttt ttg ggg ttt 192Lys Leu Glu Glu Lys Lys Glu Ser Lys
Leu Leu Lys Phe Leu Gly Phe 50 55
60atg tgg aat cca ctt tca tgg gtc atg gaa atg gct gca atc atg gcc
240Met Trp Asn Pro Leu Ser Trp Val Met Glu Met Ala Ala Ile Met Ala65
70 75 80att gct ttg gcc aac
ggt gat ggt agg cct ccg gat tgg cag gat ttt 288Ile Ala Leu Ala Asn
Gly Asp Gly Arg Pro Pro Asp Trp Gln Asp Phe 85
90 95gtt ggt att atc tgt ctg ttg gtt atc aac tct
acc atc agt ttt atc 336Val Gly Ile Ile Cys Leu Leu Val Ile Asn Ser
Thr Ile Ser Phe Ile 100 105
110gaa gaa aac aat gct ggt aat gct gct gct gct ctt atg gct ggt ctt
384Glu Glu Asn Asn Ala Gly Asn Ala Ala Ala Ala Leu Met Ala Gly Leu
115 120 125gct cct aaa acc aag gtt ctt
agg gat gga aag tgg agt gaa caa gaa 432Ala Pro Lys Thr Lys Val Leu
Arg Asp Gly Lys Trp Ser Glu Gln Glu 130 135
140gct gct att ctt gtc cca gga gat att gtt agc att aaa tta gga gac
480Ala Ala Ile Leu Val Pro Gly Asp Ile Val Ser Ile Lys Leu Gly Asp145
150 155 160att atc cca gct
gat gcc cgt cta ctt gaa ggt gat cct tta aag gtt 528Ile Ile Pro Ala
Asp Ala Arg Leu Leu Glu Gly Asp Pro Leu Lys Val 165
170 175gac caa tct gct cta act gga gag tcc ctt
cct gta acc aag cac ccg 576Asp Gln Ser Ala Leu Thr Gly Glu Ser Leu
Pro Val Thr Lys His Pro 180 185
190ggt caa gaa gtt ttc tct ggt tca acc tgc aaa caa gga gaa atc gag
624Gly Gln Glu Val Phe Ser Gly Ser Thr Cys Lys Gln Gly Glu Ile Glu
195 200 205gcg gtt gtt att gcc act ggg
gtt cat acc ttc ttc ggt aaa gct gct 672Ala Val Val Ile Ala Thr Gly
Val His Thr Phe Phe Gly Lys Ala Ala 210 215
220cac ctt gtg gac agc act aac caa gtt gga cat ttc cag aag gtt ctt
720His Leu Val Asp Ser Thr Asn Gln Val Gly His Phe Gln Lys Val Leu225
230 235 240aca gcc att ggg
aac ttc tgt atc tgt tcc att gct atc ggt atg gtg 768Thr Ala Ile Gly
Asn Phe Cys Ile Cys Ser Ile Ala Ile Gly Met Val 245
250 255att gag atc atc gtc atg tat ccg atc caa
cgc cga aag tac aga gat 816Ile Glu Ile Ile Val Met Tyr Pro Ile Gln
Arg Arg Lys Tyr Arg Asp 260 265
270gga att gac aac ctt ttg gtc ctc ttg atc ggt ggt atc ccc att gct
864Gly Ile Asp Asn Leu Leu Val Leu Leu Ile Gly Gly Ile Pro Ile Ala
275 280 285atg cct aca gtc ttg tcc gtg
acc atg gct att ggg tct cac agg ttg 912Met Pro Thr Val Leu Ser Val
Thr Met Ala Ile Gly Ser His Arg Leu 290 295
300tct cag caa ggt gcc atc acc aag cgt atg act gcc att gaa gag atg
960Ser Gln Gln Gly Ala Ile Thr Lys Arg Met Thr Ala Ile Glu Glu Met305
310 315 320gca gga atg gat
gtc ctg tgc agt gac aaa acc ggg aca cta acc ctc 1008Ala Gly Met Asp
Val Leu Cys Ser Asp Lys Thr Gly Thr Leu Thr Leu 325
330 335aac aaa ttg agt gtg gac aaa aac ttg gtc
gag gtt ttc tgc aag ggt 1056Asn Lys Leu Ser Val Asp Lys Asn Leu Val
Glu Val Phe Cys Lys Gly 340 345
350gtg gag aaa gat caa gtc cta tta ttt gca gct atg gct tcc agg gtt
1104Val Glu Lys Asp Gln Val Leu Leu Phe Ala Ala Met Ala Ser Arg Val
355 360 365gag aac cag gat gcc att gat
gca gcc atg gtt ggg atg ctt gct gat 1152Glu Asn Gln Asp Ala Ile Asp
Ala Ala Met Val Gly Met Leu Ala Asp 370 375
380cca aag gag gct aga gct gga atc agg gaa gtt cac ttc ctt cca ttc
1200Pro Lys Glu Ala Arg Ala Gly Ile Arg Glu Val His Phe Leu Pro Phe385
390 395 400aac cct gtg gat
aag aga act gct ttg act tac att gac ggc agt ggt 1248Asn Pro Val Asp
Lys Arg Thr Ala Leu Thr Tyr Ile Asp Gly Ser Gly 405
410 415aac tgg cac aga gtc agt aaa ggt gct cct
gag cag atc ctc gaa ctt 1296Asn Trp His Arg Val Ser Lys Gly Ala Pro
Glu Gln Ile Leu Glu Leu 420 425
430gcc aaa gcc agc aat gat ctt agc aag aag gtg ctc tcc att att gac
1344Ala Lys Ala Ser Asn Asp Leu Ser Lys Lys Val Leu Ser Ile Ile Asp
435 440 445aag tat gct gag cgt ggt ctt
agg tcg ttg gct gtt gct cgc cag gtg 1392Lys Tyr Ala Glu Arg Gly Leu
Arg Ser Leu Ala Val Ala Arg Gln Val 450 455
460gtg cca gag aaa aca aag gaa agc cca ggt gcg cca tgg gaa ttt gtt
1440Val Pro Glu Lys Thr Lys Glu Ser Pro Gly Ala Pro Trp Glu Phe Val465
470 475 480ggc ttg ttg cca
ctt ttt gat ccc cca aga cat gac agt gct gaa aca 1488Gly Leu Leu Pro
Leu Phe Asp Pro Pro Arg His Asp Ser Ala Glu Thr 485
490 495att cga cgg gct ttg aat ctt ggt gtt aac
gtc aag atg atc act ggt 1536Ile Arg Arg Ala Leu Asn Leu Gly Val Asn
Val Lys Met Ile Thr Gly 500 505
510gac caa ctt gct att ggt aag gaa act ggt cgc aga ctt gga atg gga
1584Asp Gln Leu Ala Ile Gly Lys Glu Thr Gly Arg Arg Leu Gly Met Gly
515 520 525aca aac atg tat cca tct tcg
gct ctt ctt ggt aca cac aaa gac gca 1632Thr Asn Met Tyr Pro Ser Ser
Ala Leu Leu Gly Thr His Lys Asp Ala 530 535
540aac ctc gca tcc att cct gtt gag gag ttg att gaa aag gct gat gga
1680Asn Leu Ala Ser Ile Pro Val Glu Glu Leu Ile Glu Lys Ala Asp Gly545
550 555 560ttt gcc gga gtc
ttc cca gag cac aaa tac gaa att gtg aaa aag ttg 1728Phe Ala Gly Val
Phe Pro Glu His Lys Tyr Glu Ile Val Lys Lys Leu 565
570 575cag gag agg aag cat att gtt gga atg act
ggt gat ggt gtc aat gat 1776Gln Glu Arg Lys His Ile Val Gly Met Thr
Gly Asp Gly Val Asn Asp 580 585
590gcc cct gct cta aag aaa gct gat atc ggt att gct gtt gct gat gct
1824Ala Pro Ala Leu Lys Lys Ala Asp Ile Gly Ile Ala Val Ala Asp Ala
595 600 605aca gat gct gct cgt ggt gct
tca gat atc gtg ctc act gag cct gga 1872Thr Asp Ala Ala Arg Gly Ala
Ser Asp Ile Val Leu Thr Glu Pro Gly 610 615
620ctc agc gtt att atc agt gct gtt ctc acc agc aga gct att ttc cag
1920Leu Ser Val Ile Ile Ser Ala Val Leu Thr Ser Arg Ala Ile Phe Gln625
630 635 640aga atg aag aac
tat act atc tat gca gtc tca atc acc atc cgt att 1968Arg Met Lys Asn
Tyr Thr Ile Tyr Ala Val Ser Ile Thr Ile Arg Ile 645
650 655gtg ttt ggt ttc atg ctt att gct ttg ata
tgg gaa ttt gac ttc tca 2016Val Phe Gly Phe Met Leu Ile Ala Leu Ile
Trp Glu Phe Asp Phe Ser 660 665
670gcc ttc atg gtt ctg atc att gcc att ctt aac gac ggt acc atc atg
2064Ala Phe Met Val Leu Ile Ile Ala Ile Leu Asn Asp Gly Thr Ile Met
675 680 685aca atc tca aag gac aga gtt
aag cca tct ccc aca cct gat agc tgg 2112Thr Ile Ser Lys Asp Arg Val
Lys Pro Ser Pro Thr Pro Asp Ser Trp 690 695
700aaa ctt aaa gaa att ttt gct act gga gtc gtt cta gga ggc tac cag
2160Lys Leu Lys Glu Ile Phe Ala Thr Gly Val Val Leu Gly Gly Tyr Gln705
710 715 720gcc atc atg act
gtt att ttc ttc tgg gcg gcg cac aag act gac ttt 2208Ala Ile Met Thr
Val Ile Phe Phe Trp Ala Ala His Lys Thr Asp Phe 725
730 735ttc tcg gac aca ttc ggt gtg agg tcc att
agg gac aat aac cac gag 2256Phe Ser Asp Thr Phe Gly Val Arg Ser Ile
Arg Asp Asn Asn His Glu 740 745
750cta atg ggt gcg gtg tac tta caa gtt agt atc att agt caa gct ctg
2304Leu Met Gly Ala Val Tyr Leu Gln Val Ser Ile Ile Ser Gln Ala Leu
755 760 765atc ttc gtc aca aga tca agg
agt tgg tct ttt gtt gaa cgt cct gga 2352Ile Phe Val Thr Arg Ser Arg
Ser Trp Ser Phe Val Glu Arg Pro Gly 770 775
780gca ttg ctg atg att gct ttc ctc att gca caa ctg att gct act ttg
2400Ala Leu Leu Met Ile Ala Phe Leu Ile Ala Gln Leu Ile Ala Thr Leu785
790 795 800att gcg gtt tac
gcc aac tgg gaa ttt gca aag att agg ggt att gga 2448Ile Ala Val Tyr
Ala Asn Trp Glu Phe Ala Lys Ile Arg Gly Ile Gly 805
810 815tgg gga tgg gct ggt gtg atc tgg cta tac
agt att gtc aca tac ttc 2496Trp Gly Trp Ala Gly Val Ile Trp Leu Tyr
Ser Ile Val Thr Tyr Phe 820 825
830cca ttg gac gtt ttc aag ttt gcc att cga tac atc ttg agc gga aag
2544Pro Leu Asp Val Phe Lys Phe Ala Ile Arg Tyr Ile Leu Ser Gly Lys
835 840 845gcg tgg ctc aac ttg ttt gag
aac aag acg gct ttc acg atg aag aaa 2592Ala Trp Leu Asn Leu Phe Glu
Asn Lys Thr Ala Phe Thr Met Lys Lys 850 855
860gat tac gga aaa gaa gag aga gag gct caa tgg gca ctt gct caa agg
2640Asp Tyr Gly Lys Glu Glu Arg Glu Ala Gln Trp Ala Leu Ala Gln Arg865
870 875 880aca ctt cac ggt
tta cag cca aaa gaa gct gtt aac atc ttc cct gag 2688Thr Leu His Gly
Leu Gln Pro Lys Glu Ala Val Asn Ile Phe Pro Glu 885
890 895aaa gga agt tac aga gaa ttg tct gag atc
gct gag caa gct aag aga 2736Lys Gly Ser Tyr Arg Glu Leu Ser Glu Ile
Ala Glu Gln Ala Lys Arg 900 905
910aga gct gag atc gct agg ctt agg gag ctg cac aca ctc aag gga cat
2784Arg Ala Glu Ile Ala Arg Leu Arg Glu Leu His Thr Leu Lys Gly His
915 920 925gtg gaa tca gtc gtg aag cta
aag ggc ttg gac att gaa act ccc agt 2832Val Glu Ser Val Val Lys Leu
Lys Gly Leu Asp Ile Glu Thr Pro Ser 930 935
940cac tac act gtg ctc gag gtg cgc ttc cag atg agt aaa gga gaa gaa
2880His Tyr Thr Val Leu Glu Val Arg Phe Gln Met Ser Lys Gly Glu Glu945
950 955 960ctt ttc act gga
gtt gtc cca att ctt gtt gaa tta gat ggt gat gtt 2928Leu Phe Thr Gly
Val Val Pro Ile Leu Val Glu Leu Asp Gly Asp Val 965
970 975aat ggg cac aaa ttt tct gtc agt gga gag
ggt gaa ggt gat gca aca 2976Asn Gly His Lys Phe Ser Val Ser Gly Glu
Gly Glu Gly Asp Ala Thr 980 985
990tac gga aaa ctt acc ctt aaa ttt att tgc act act gga aaa cta cct
3024Tyr Gly Lys Leu Thr Leu Lys Phe Ile Cys Thr Thr Gly Lys Leu Pro
995 1000 1005gtt cca tgg cca aca ctt
gtc act act ttc tct tat ggt gtt caa 3069Val Pro Trp Pro Thr Leu
Val Thr Thr Phe Ser Tyr Gly Val Gln 1010 1015
1020tgc ttt tca aga tac cca gat cat atg aag cgg cac gac ttc
ttc 3114Cys Phe Ser Arg Tyr Pro Asp His Met Lys Arg His Asp Phe
Phe 1025 1030 1035aag agc gcc atg cct
gag gga tac gtg cag gag agg acc atc tct 3159Lys Ser Ala Met Pro
Glu Gly Tyr Val Gln Glu Arg Thr Ile Ser 1040 1045
1050ttc aag gac gac ggg aac tac aag aca cgt gct gaa gtc
aag ttt 3204Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala Glu Val
Lys Phe 1055 1060 1065gag gga gac acc
ctc gtc aac agg atc gag ctt aag gga atc gat 3249Glu Gly Asp Thr
Leu Val Asn Arg Ile Glu Leu Lys Gly Ile Asp 1070
1075 1080ttc aag gag gac gga aac atc ctc ggc cac aag
ttg gaa tac aac 3294Phe Lys Glu Asp Gly Asn Ile Leu Gly His Lys
Leu Glu Tyr Asn 1085 1090 1095tac aac
tcc cac aac gta tac atc acg gca gac aaa caa aag aat 3339Tyr Asn
Ser His Asn Val Tyr Ile Thr Ala Asp Lys Gln Lys Asn 1100
1105 1110gga atc aaa gct aac ttc aaa att aga cac
aac att gaa gat gga 3384Gly Ile Lys Ala Asn Phe Lys Ile Arg His
Asn Ile Glu Asp Gly 1115 1120 1125agc
gtt caa cta gca gac cat tat caa caa aat act cca att ggc 3429Ser
Val Gln Leu Ala Asp His Tyr Gln Gln Asn Thr Pro Ile Gly 1130
1135 1140gat ggc cct gtc ctt tta cca gac aac
cat tac ctg tcc aca caa 3474Asp Gly Pro Val Leu Leu Pro Asp Asn
His Tyr Leu Ser Thr Gln 1145 1150
1155tct gcc ctt tcg aaa gat ccc aac gaa aag aga gac cac atg gtc
3519Ser Ala Leu Ser Lys Asp Pro Asn Glu Lys Arg Asp His Met Val
1160 1165 1170ctt ctt gag ttt gta aca
gct gct ggg att aca cat ggc atg gat 3564Leu Leu Glu Phe Val Thr
Ala Ala Gly Ile Thr His Gly Met Asp 1175 1180
1185gaa cta tac aaa taa
3579Glu Leu Tyr Lys 1190

SEQ ID NO: 54
Length: 1192; Type: PRT; Organism: Artificial; Description: synthetic sequence
Met Ser Ser Leu Glu Asp Ile Lys Asn Glu Thr Val Asp Leu Glu Lys1
5 10 15Ile Pro Ile Glu Glu Val
Phe Gln Gln Leu Lys Cys Ser Arg Glu Gly 20 25
30Leu Thr Thr Gln Glu Gly Glu Asp Arg Ile Gln Ile Phe
Gly Pro Asn 35 40 45Lys Leu Glu
Glu Lys Lys Glu Ser Lys Leu Leu Lys Phe Leu Gly Phe 50
55 60Met Trp Asn Pro Leu Ser Trp Val Met Glu Met Ala
Ala Ile Met Ala65 70 75
80Ile Ala Leu Ala Asn Gly Asp Gly Arg Pro Pro Asp Trp Gln Asp Phe
85 90 95Val Gly Ile Ile Cys Leu
Leu Val Ile Asn Ser Thr Ile Ser Phe Ile 100
105 110Glu Glu Asn Asn Ala Gly Asn Ala Ala Ala Ala Leu
Met Ala Gly Leu 115 120 125Ala Pro
Lys Thr Lys Val Leu Arg Asp Gly Lys Trp Ser Glu Gln Glu 130
135 140Ala Ala Ile Leu Val Pro Gly Asp Ile Val Ser
Ile Lys Leu Gly Asp145 150 155
160Ile Ile Pro Ala Asp Ala Arg Leu Leu Glu Gly Asp Pro Leu Lys Val
165 170 175Asp Gln Ser Ala
Leu Thr Gly Glu Ser Leu Pro Val Thr Lys His Pro 180
185 190Gly Gln Glu Val Phe Ser Gly Ser Thr Cys Lys
Gln Gly Glu Ile Glu 195 200 205Ala
Val Val Ile Ala Thr Gly Val His Thr Phe Phe Gly Lys Ala Ala 210
215 220His Leu Val Asp Ser Thr Asn Gln Val Gly
His Phe Gln Lys Val Leu225 230 235
240Thr Ala Ile Gly Asn Phe Cys Ile Cys Ser Ile Ala Ile Gly Met
Val 245 250 255Ile Glu Ile
Ile Val Met Tyr Pro Ile Gln Arg Arg Lys Tyr Arg Asp 260
265 270Gly Ile Asp Asn Leu Leu Val Leu Leu Ile
Gly Gly Ile Pro Ile Ala 275 280
285Met Pro Thr Val Leu Ser Val Thr Met Ala Ile Gly Ser His Arg Leu 290
295 300Ser Gln Gln Gly Ala Ile Thr Lys
Arg Met Thr Ala Ile Glu Glu Met305 310
315 320Ala Gly Met Asp Val Leu Cys Ser Asp Lys Thr Gly
Thr Leu Thr Leu 325 330
335Asn Lys Leu Ser Val Asp Lys Asn Leu Val Glu Val Phe Cys Lys Gly
340 345 350Val Glu Lys Asp Gln Val
Leu Leu Phe Ala Ala Met Ala Ser Arg Val 355 360
365Glu Asn Gln Asp Ala Ile Asp Ala Ala Met Val Gly Met Leu
Ala Asp 370 375 380Pro Lys Glu Ala Arg
Ala Gly Ile Arg Glu Val His Phe Leu Pro Phe385 390
395 400Asn Pro Val Asp Lys Arg Thr Ala Leu Thr
Tyr Ile Asp Gly Ser Gly 405 410
415Asn Trp His Arg Val Ser Lys Gly Ala Pro Glu Gln Ile Leu Glu Leu
420 425 430Ala Lys Ala Ser Asn
Asp Leu Ser Lys Lys Val Leu Ser Ile Ile Asp 435
440 445Lys Tyr Ala Glu Arg Gly Leu Arg Ser Leu Ala Val
Ala Arg Gln Val 450 455 460Val Pro Glu
Lys Thr Lys Glu Ser Pro Gly Ala Pro Trp Glu Phe Val465
470 475 480Gly Leu Leu Pro Leu Phe Asp
Pro Pro Arg His Asp Ser Ala Glu Thr 485
490 495Ile Arg Arg Ala Leu Asn Leu Gly Val Asn Val Lys
Met Ile Thr Gly 500 505 510Asp
Gln Leu Ala Ile Gly Lys Glu Thr Gly Arg Arg Leu Gly Met Gly 515
520 525Thr Asn Met Tyr Pro Ser Ser Ala Leu
Leu Gly Thr His Lys Asp Ala 530 535
540Asn Leu Ala Ser Ile Pro Val Glu Glu Leu Ile Glu Lys Ala Asp Gly545
550 555 560Phe Ala Gly Val
Phe Pro Glu His Lys Tyr Glu Ile Val Lys Lys Leu 565
570 575Gln Glu Arg Lys His Ile Val Gly Met Thr
Gly Asp Gly Val Asn Asp 580 585
590Ala Pro Ala Leu Lys Lys Ala Asp Ile Gly Ile Ala Val Ala Asp Ala
595 600 605Thr Asp Ala Ala Arg Gly Ala
Ser Asp Ile Val Leu Thr Glu Pro Gly 610 615
620Leu Ser Val Ile Ile Ser Ala Val Leu Thr Ser Arg Ala Ile Phe
Gln625 630 635 640Arg Met
Lys Asn Tyr Thr Ile Tyr Ala Val Ser Ile Thr Ile Arg Ile
645 650 655Val Phe Gly Phe Met Leu Ile
Ala Leu Ile Trp Glu Phe Asp Phe Ser 660 665
670Ala Phe Met Val Leu Ile Ile Ala Ile Leu Asn Asp Gly Thr
Ile Met 675 680 685Thr Ile Ser Lys
Asp Arg Val Lys Pro Ser Pro Thr Pro Asp Ser Trp 690
695 700Lys Leu Lys Glu Ile Phe Ala Thr Gly Val Val Leu
Gly Gly Tyr Gln705 710 715
720Ala Ile Met Thr Val Ile Phe Phe Trp Ala Ala His Lys Thr Asp Phe
725 730 735Phe Ser Asp Thr Phe
Gly Val Arg Ser Ile Arg Asp Asn Asn His Glu 740
745 750Leu Met Gly Ala Val Tyr Leu Gln Val Ser Ile Ile
Ser Gln Ala Leu 755 760 765Ile Phe
Val Thr Arg Ser Arg Ser Trp Ser Phe Val Glu Arg Pro Gly 770
775 780Ala Leu Leu Met Ile Ala Phe Leu Ile Ala Gln
Leu Ile Ala Thr Leu785 790 795
800Ile Ala Val Tyr Ala Asn Trp Glu Phe Ala Lys Ile Arg Gly Ile Gly
805 810 815Trp Gly Trp Ala
Gly Val Ile Trp Leu Tyr Ser Ile Val Thr Tyr Phe 820
825 830Pro Leu Asp Val Phe Lys Phe Ala Ile Arg Tyr
Ile Leu Ser Gly Lys 835 840 845Ala
Trp Leu Asn Leu Phe Glu Asn Lys Thr Ala Phe Thr Met Lys Lys 850
855 860Asp Tyr Gly Lys Glu Glu Arg Glu Ala Gln
Trp Ala Leu Ala Gln Arg865 870 875
880Thr Leu His Gly Leu Gln Pro Lys Glu Ala Val Asn Ile Phe Pro
Glu 885 890 895Lys Gly Ser
Tyr Arg Glu Leu Ser Glu Ile Ala Glu Gln Ala Lys Arg 900
905 910Arg Ala Glu Ile Ala Arg Leu Arg Glu Leu
His Thr Leu Lys Gly His 915 920
925Val Glu Ser Val Val Lys Leu Lys Gly Leu Asp Ile Glu Thr Pro Ser 930
935 940His Tyr Thr Val Leu Glu Val Arg
Phe Gln Met Ser Lys Gly Glu Glu945 950
955 960Leu Phe Thr Gly Val Val Pro Ile Leu Val Glu Leu
Asp Gly Asp Val 965 970
975Asn Gly His Lys Phe Ser Val Ser Gly Glu Gly Glu Gly Asp Ala Thr
980 985 990Tyr Gly Lys Leu Thr Leu
Lys Phe Ile Cys Thr Thr Gly Lys Leu Pro 995 1000
1005Val Pro Trp Pro Thr Leu Val Thr Thr Phe Ser Tyr
Gly Val Gln 1010 1015 1020Cys Phe Ser
Arg Tyr Pro Asp His Met Lys Arg His Asp Phe Phe 1025
1030 1035Lys Ser Ala Met Pro Glu Gly Tyr Val Gln Glu
Arg Thr Ile Ser 1040 1045 1050Phe Lys
Asp Asp Gly Asn Tyr Lys Thr Arg Ala Glu Val Lys Phe 1055
1060 1065Glu Gly Asp Thr Leu Val Asn Arg Ile Glu
Leu Lys Gly Ile Asp 1070 1075 1080Phe
Lys Glu Asp Gly Asn Ile Leu Gly His Lys Leu Glu Tyr Asn 1085
1090 1095Tyr Asn Ser His Asn Val Tyr Ile Thr
Ala Asp Lys Gln Lys Asn 1100 1105
1110Gly Ile Lys Ala Asn Phe Lys Ile Arg His Asn Ile Glu Asp Gly
1115 1120 1125Ser Val Gln Leu Ala Asp
His Tyr Gln Gln Asn Thr Pro Ile Gly 1130 1135
1140Asp Gly Pro Val Leu Leu Pro Asp Asn His Tyr Leu Ser Thr
Gln 1145 1150 1155Ser Ala Leu Ser Lys
Asp Pro Asn Glu Lys Arg Asp His Met Val 1160 1165
1170Leu Leu Glu Phe Val Thr Ala Ala Gly Ile Thr His Gly
Met Asp 1175 1180 1185Glu Leu Tyr Lys
1190

<210> 55
<211> 303
<212> DNA
<213> Human immunodeficiency virus type 1
<220>
<221> gene
<222> (1)..(303)
<223> Gene for HIV-1 protease
<400> 55
atg cct cag gtc act ctt tgg caa cga ccc ctc gtc aca ata
aag ata 48Met Pro Gln Val Thr Leu Trp Gln Arg Pro Leu Val Thr Ile
Lys Ile1 5 10 15ggg ggg
caa cta aag gaa gct cta tta gat aca gga gca gat gat aca 96Gly Gly
Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp Thr 20
25 30gta tta gaa gaa atg agt ttg cca gga
aga tgg aaa cca aaa atg ata 144Val Leu Glu Glu Met Ser Leu Pro Gly
Arg Trp Lys Pro Lys Met Ile 35 40
45ggg gga att gga ggt ttt atc aaa gta aga cag tat gat cag ata ctc
192Gly Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Leu 50
55 60ata gaa atc tgt gga cat aaa gct ata
ggt aca gta tta gta gga cct 240Ile Glu Ile Cys Gly His Lys Ala Ile
Gly Thr Val Leu Val Gly Pro65 70 75
80aca cct gtc aac ata att gga aga aat ctg ttg act cag att
ggt tgc 288Thr Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile
Gly Cys 85 90 95act tta
aat ttt tga 303Thr Leu
Asn Phe 100

<210> 56
<211> 100
<212> PRT
<213> Human immunodeficiency virus type 1
<400> 56
Met
Pro Gln Val Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile1
5 10 15Gly Gly Gln Leu Lys Glu Ala
Leu Leu Asp Thr Gly Ala Asp Asp Thr 20 25
30Val Leu Glu Glu Met Ser Leu Pro Gly Arg Trp Lys Pro Lys
Met Ile 35 40 45Gly Gly Ile Gly
Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln Ile Leu 50 55
60Ile Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu
Val Gly Pro65 70 75
80Thr Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile Gly Cys
85 90 95Thr Leu Asn Phe
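100

<210> 57
<211> 21
<212> DNA
<213> Artificial
<220>
<223> Synthetic Sequence
<400> 57
agc caa aat tac cct ata gtg 21
Ser Gln Asn Tyr Pro Ile Val
1 5

<210> 58
<211> 7
<212> PRT
<213> Artificial
<220>
<223> Synthetic Sequence
<400> 58
Ser Gln Asn Tyr Pro Ile Val
1 5

<210> 59
<211> 21
<212> DNA
<213> Artificial
<220>
<223> Synthetic Sequence
<400> 59
gca aga gtt ttg gct gaa gca 21
Ala Arg Val Leu Ala Glu Ala
1 5

<210> 60
<211> 7
<212> PRT
<213> Artificial
<220>
<223> Synthetic Sequence
<400> 60
Ala Arg Val Leu Ala Glu Ala
1 5

<210> 61
<211> 21
<212> DNA
<213> Artificial
<220>
<223> Synthetic Sequence
<400> 61
gct acc ata atg atg cag aga 21
Ala Thr Ile Met Met Gln Arg
1 5

<210> 62
<211> 7
<212> PRT
<213> Artificial
<220>
<223> Synthetic Sequence
<400> 62
Ala Thr Ile Met Met Gln Arg
1 5

<210> 63
<211> 21
<212> DNA
<213> Artificial
<220>
<223> Synthetic Sequence
<400> 63
aga cag gct aat ttt tta ggg 21
Arg Gln Ala Asn Phe Leu Gly
1 5

<210> 64
<211> 7
<212> PRT
<213> Artificial
<220>
<223> Synthetic Sequence
<400> 64
Arg Gln Ala Asn Phe Leu Gly
1 5

<210> 65
<211> 21
<212> DNA
<213> Artificial
<220>
<223> Synthetic Sequence
<400> 65
cca ggg aat ttt ctt cag agc 21
Pro Gly Asn Phe Leu Gln Ser
1 5

<210> 66
<211> 7
<212> PRT
<213> Artificial
<220>
<223> Synthetic Sequence
<400> 66
Pro Gly Asn Phe Leu Gln Ser
1 5

<210> 67
<211> 15
<212> DNA
<213> Artificial
<220>
<223> Synthetic Sequence
<400> 67
agc gtg cct caa ata 15
Ser Val Pro Gln Ile
1 5

<210> 68
<211> 5
<212> PRT
<213> Artificial
<220>
<223> Synthetic Sequence
<400> 68
Ser Val Pro Gln Ile
1 5

<210> 69
<211> 21
<212> DNA
<213> Artificial
<220>
<223> Synthetic Sequence
<400> 69
act tta aat ttt ccc att agc 21
Thr Leu Asn Phe Pro Ile Ser
1 5

<210> 70
<211> 7
<212> PRT
<213> Artificial
<220>
<223> Synthetic Sequence
<400> 70
Thr Leu Asn Phe Pro Ile Ser
1 5

<210> 71
<211> 21
<212> DNA
<213> Artificial
<220>
<223> Synthetic Sequence
<400> 71
gca gaa acc ttc tat gta gat 21
Ala Glu Thr Phe Tyr Val Asp
1 5

<210> 72
<211> 7
<212> PRT
<213> Artificial
<220>
<223> Synthetic Sequence
<400> 72
Ala Glu Thr Phe Tyr Val Asp
1 5

<210> 73
<211> 21
<212> DNA
<213> Artificial
<220>
<223> Synthetic Sequence
<400> 73
agg aaa gta cta ttt tta gat 21
Arg Lys Val Leu Phe Leu Asp
1 5

<210> 74
<211> 7
<212> PRT
<213> Artificial
<220>
<223> Synthetic Sequence
<400> 74
Arg Lys Val Leu Phe Leu Asp
1 5

<210> 75
<211> 1647
<212> DNA
<213> Artificial
<220>
<223> Synthetic Sequence
<400> 75
atg gtg cgc tcc tcc aag aac gtc atc aag gag ttc atg cgc ttc aag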
48Met Val Arg Ser Ser Lys Asn Val Ile Lys Glu Phe Met Arg Phe Lys1
5 10 15gtg cgc atg gag ggc acc
gtg aac ggc cac gag ttc gag atc gag ggc 96Val Arg Met Glu Gly Thr
Val Asn Gly His Glu Phe Glu Ile Glu Gly 20 25
30gag ggc gag ggc cgc ccc tac gag ggc cac aac acc gtg
aag ctg aag 144Glu Gly Glu Gly Arg Pro Tyr Glu Gly His Asn Thr Val
Lys Leu Lys 35 40 45gtg acc aag
ggc ggc ccc ctg ccc ttc gcc tgg gac atc ctg tcc ccc 192Val Thr Lys
Gly Gly Pro Leu Pro Phe Ala Trp Asp Ile Leu Ser Pro 50
55 60cag ttc cag tac ggc tcc aag gtg tac gtg aag cac
ccc gcc gac atc 240Gln Phe Gln Tyr Gly Ser Lys Val Tyr Val Lys His
Pro Ala Asp Ile65 70 75
80ccc gac tac aag aag ctg tcc ttc ccc gag ggc ttc aag tgg gag cgc
288Pro Asp Tyr Lys Lys Leu Ser Phe Pro Glu Gly Phe Lys Trp Glu Arg
85 90 95gtg atg aac ttc gag gac
ggc ggc gtg gtg acc gtg acc cag gac tcc 336Val Met Asn Phe Glu Asp
Gly Gly Val Val Thr Val Thr Gln Asp Ser 100
105 110tcc ctg cag gac ggc tgc ttc atc tac aag gtg aag
ttc atc ggc gtg 384Ser Leu Gln Asp Gly Cys Phe Ile Tyr Lys Val Lys
Phe Ile Gly Val 115 120 125aac ttc
ccc tcc gac ggc ccc gta atg cag aag aag acc atg ggc tgg 432Asn Phe
Pro Ser Asp Gly Pro Val Met Gln Lys Lys Thr Met Gly Trp 130
135 140gag gcc tcc acc gag cgc ctg tac ccc cgc gac
ggc gtg ctg aag ggc 480Glu Ala Ser Thr Glu Arg Leu Tyr Pro Arg Asp
Gly Val Leu Lys Gly145 150 155
160gag atc cac aag gcc ctg aag ctg aag gac ggc ggc cac tac ctg gtg
528Glu Ile His Lys Ala Leu Lys Leu Lys Asp Gly Gly His Tyr Leu Val
165 170 175gag ttc aag tcc atc
tac atg gcc aag aag ccc gtg cag ctg ccc ggc 576Glu Phe Lys Ser Ile
Tyr Met Ala Lys Lys Pro Val Gln Leu Pro Gly 180
185 190tac tac tac gtg gac tcc aag ctg gac atc acc tcc
cac aac gag gac 624Tyr Tyr Tyr Val Asp Ser Lys Leu Asp Ile Thr Ser
His Asn Glu Asp 195 200 205tac acc
atc gtg gag cag tac gag cgc acc gag ggc cgc cac cac ctg 672Tyr Thr
Ile Val Glu Gln Tyr Glu Arg Thr Glu Gly Arg His His Leu 210
215 220ttc ctg ccc cgg gca atc aag ctg ggt aga cag
gct aat ttt tta ggg 720Phe Leu Pro Arg Ala Ile Lys Leu Gly Arg Gln
Ala Asn Phe Leu Gly225 230 235
240gga aaa act tcg gga gcg aaa cag gcg act gtg gtg gtc gca gcg atg
768Gly Lys Thr Ser Gly Ala Lys Gln Ala Thr Val Val Val Ala Ala Met
245 250 255gcg tta gga tgg tta
gcc ata gag atc gct ttc aag cct ttc ctc gat 816Ala Leu Gly Trp Leu
Ala Ile Glu Ile Ala Phe Lys Pro Phe Leu Asp 260
265 270aaa ttc cgc tcc tca atc gac aaa tct gac cca acc
aaa gac ccc gat 864Lys Phe Arg Ser Ser Ile Asp Lys Ser Asp Pro Thr
Lys Asp Pro Asp 275 280 285gac ttc
gac acc gcc gct act gca acc aca tcc aaa gag ggt ttg ggg 912Asp Phe
Asp Thr Ala Ala Thr Ala Thr Thr Ser Lys Glu Gly Leu Gly 290
295 300atc caa gga gat ata aca atg agt aaa gga gaa
gaa ctt ttc act gga 960Ile Gln Gly Asp Ile Thr Met Ser Lys Gly Glu
Glu Leu Phe Thr Gly305 310 315
320gtt gtc cca att ctt gtt gaa tta gat ggt gat gtt aat ggg cac aaa
1008Val Val Pro Ile Leu Val Glu Leu Asp Gly Asp Val Asn Gly His Lys
325 330 335ttt tct gtc agt gga
gag ggt gaa ggt gat gca aca tac gga aaa ctt 1056Phe Ser Val Ser Gly
Glu Gly Glu Gly Asp Ala Thr Tyr Gly Lys Leu 340
345 350acc ctt aaa ttt att tgc act act gga aaa cta cct
gtt cca tgg cca 1104Thr Leu Lys Phe Ile Cys Thr Thr Gly Lys Leu Pro
Val Pro Trp Pro 355 360 365aca ctt
gtc act act ttc tct tat ggt gtt caa tgc ttt tca aga tac 1152Thr Leu
Val Thr Thr Phe Ser Tyr Gly Val Gln Cys Phe Ser Arg Tyr 370
375 380cca gat cat atg aag cgg cac gac ttc ttc aag
agc gcc atg cct gag 1200Pro Asp His Met Lys Arg His Asp Phe Phe Lys
Ser Ala Met Pro Glu385 390 395
400gga tac gtg cag gag agg acc atc tct ttc aag gac gac ggg aac tac
1248Gly Tyr Val Gln Glu Arg Thr Ile Ser Phe Lys Asp Asp Gly Asn Tyr
405 410 415aag aca cgt gct gaa
gtc aag ttt gag gga gac acc ctc gtc aac agg 1296Lys Thr Arg Ala Glu
Val Lys Phe Glu Gly Asp Thr Leu Val Asn Arg 420
425 430atc gag ctt aag gga atc gat ttc aag gag gac gga
aac atc ctc ggc 1344Ile Glu Leu Lys Gly Ile Asp Phe Lys Glu Asp Gly
Asn Ile Leu Gly 435 440 445cac aag
ttg gaa tac aac tac aac tcc cac aac gta tac atc acg gca 1392His Lys
Leu Glu Tyr Asn Tyr Asn Ser His Asn Val Tyr Ile Thr Ala 450
455 460gac aaa caa aag aat gga atc aaa gct aac ttc
aaa att aga cac aac 1440Asp Lys Gln Lys Asn Gly Ile Lys Ala Asn Phe
Lys Ile Arg His Asn465 470 475
480att gaa gat gga agc gtt caa cta gca gac cat tat caa caa aat act
1488Ile Glu Asp Gly Ser Val Gln Leu Ala Asp His Tyr Gln Gln Asn Thr
485 490 495cca att ggc gat ggc
cct gtc ctt tta cca gac aac cat tac ctg tcc 1536Pro Ile Gly Asp Gly
Pro Val Leu Leu Pro Asp Asn His Tyr Leu Ser 500
505 510aca caa tct gcc ctt tcg aaa gat ccc aac gaa aag
aga gac cac atg 1584Thr Gln Ser Ala Leu Ser Lys Asp Pro Asn Glu Lys
Arg Asp His Met 515 520 525gtc ctt
ctt gag ttt gta aca gct gct ggg att aca cat ggc atg gat 1632Val Leu
Leu Glu Phe Val Thr Ala Ala Gly Ile Thr His Gly Met Asp 530
535 540gaa cta tac aaa taa
1647Glu Leu Tyr Lys545

<210> 76
<211> 548
<212> PRT
<213> Artificial
<220>
<223> Synthetic Sequence
<400> 76
Met Val Arg Ser Ser Lys Asn Val Ile Lys Glu Phe Met Arg Phe
Lys1 5 10 15Val Arg Met
Glu Gly Thr Val Asn Gly His Glu Phe Glu Ile Glu Gly 20
25 30Glu Gly Glu Gly Arg Pro Tyr Glu Gly His
Asn Thr Val Lys Leu Lys 35 40
45Val Thr Lys Gly Gly Pro Leu Pro Phe Ala Trp Asp Ile Leu Ser Pro 50
55 60Gln Phe Gln Tyr Gly Ser Lys Val Tyr
Val Lys His Pro Ala Asp Ile65 70 75
80Pro Asp Tyr Lys Lys Leu Ser Phe Pro Glu Gly Phe Lys Trp
Glu Arg 85 90 95Val Met
Asn Phe Glu Asp Gly Gly Val Val Thr Val Thr Gln Asp Ser 100
105 110Ser Leu Gln Asp Gly Cys Phe Ile Tyr
Lys Val Lys Phe Ile Gly Val 115 120
125Asn Phe Pro Ser Asp Gly Pro Val Met Gln Lys Lys Thr Met Gly Trp
130 135 140Glu Ala Ser Thr Glu Arg Leu
Tyr Pro Arg Asp Gly Val Leu Lys Gly145 150
155 160Glu Ile His Lys Ala Leu Lys Leu Lys Asp Gly Gly
His Tyr Leu Val 165 170
175Glu Phe Lys Ser Ile Tyr Met Ala Lys Lys Pro Val Gln Leu Pro Gly
180 185 190Tyr Tyr Tyr Val Asp Ser
Lys Leu Asp Ile Thr Ser His Asn Glu Asp 195 200
205Tyr Thr Ile Val Glu Gln Tyr Glu Arg Thr Glu Gly Arg His
His Leu 210 215 220Phe Leu Pro Arg Ala
Ile Lys Leu Gly Arg Gln Ala Asn Phe Leu Gly225 230
235 240Gly Lys Thr Ser Gly Ala Lys Gln Ala Thr
Val Val Val Ala Ala Met 245 250
255Ala Leu Gly Trp Leu Ala Ile Glu Ile Ala Phe Lys Pro Phe Leu Asp
260 265 270Lys Phe Arg Ser Ser
Ile Asp Lys Ser Asp Pro Thr Lys Asp Pro Asp 275
280 285Asp Phe Asp Thr Ala Ala Thr Ala Thr Thr Ser Lys
Glu Gly Leu Gly 290 295 300Ile Gln Gly
Asp Ile Thr Met Ser Lys Gly Glu Glu Leu Phe Thr Gly305
310 315 320Val Val Pro Ile Leu Val Glu
Leu Asp Gly Asp Val Asn Gly His Lys 325
330 335Phe Ser Val Ser Gly Glu Gly Glu Gly Asp Ala Thr
Tyr Gly Lys Leu 340 345 350Thr
Leu Lys Phe Ile Cys Thr Thr Gly Lys Leu Pro Val Pro Trp Pro 355
360 365Thr Leu Val Thr Thr Phe Ser Tyr Gly
Val Gln Cys Phe Ser Arg Tyr 370 375
380Pro Asp His Met Lys Arg His Asp Phe Phe Lys Ser Ala Met Pro Glu385
390 395 400Gly Tyr Val Gln
Glu Arg Thr Ile Ser Phe Lys Asp Asp Gly Asn Tyr 405
410 415Lys Thr Arg Ala Glu Val Lys Phe Glu Gly
Asp Thr Leu Val Asn Arg 420 425
430Ile Glu Leu Lys Gly Ile Asp Phe Lys Glu Asp Gly Asn Ile Leu Gly
435 440 445His Lys Leu Glu Tyr Asn Tyr
Asn Ser His Asn Val Tyr Ile Thr Ala 450 455
460Asp Lys Gln Lys Asn Gly Ile Lys Ala Asn Phe Lys Ile Arg His
Asn465 470 475 480Ile Glu
Asp Gly Ser Val Gln Leu Ala Asp His Tyr Gln Gln Asn Thr
485 490 495Pro Ile Gly Asp Gly Pro Val
Leu Leu Pro Asp Asn His Tyr Leu Ser 500 505
510Thr Gln Ser Ala Leu Ser Lys Asp Pro Asn Glu Lys Arg Asp
His Met 515 520 525Val Leu Leu Glu
Phe Val Thr Ala Ala Gly Ile Thr His Gly Met Asp 530
535 540Glu Leu Tyr Lys545

<210> 77
<211> 1893
<212> DNA
<213> Hepatitis C virus
<220>
<221> CDS
<222> (1)..(1893)
<223> HCV NS3 protease
<400> 77
gcg cct atc acg gcc tat tcc caa caa
acg cgg ggc ctg ctt ggc tgt 48Ala Pro Ile Thr Ala Tyr Ser Gln Gln
Thr Arg Gly Leu Leu Gly Cys1 5 10
15atc atc act agc ctc aca ggt cgg gac aag aac cag gtc gat ggg
gag 96Ile Ile Thr Ser Leu Thr Gly Arg Asp Lys Asn Gln Val Asp Gly
Glu 20 25 30gtt cag gtg ctc
tcc acc gca acg caa tct ttc ctg gcg acc tgc gtc 144Val Gln Val Leu
Ser Thr Ala Thr Gln Ser Phe Leu Ala Thr Cys Val 35
40 45aat ggc gtg tgt tgg acc gtc tac cat ggt gcc ggc
tcg aag acc ctg 192Asn Gly Val Cys Trp Thr Val Tyr His Gly Ala Gly
Ser Lys Thr Leu 50 55 60gcc ggc ccg
aag ggt cca atc acc caa atg tac acc aat gta gac cag 240Ala Gly Pro
Lys Gly Pro Ile Thr Gln Met Tyr Thr Asn Val Asp Gln65 70
75 80gac ctc gtc ggc tgg ccg gcg ccc
ccc ggg gcg cgc tcc atg aca ccg 288Asp Leu Val Gly Trp Pro Ala Pro
Pro Gly Ala Arg Ser Met Thr Pro 85 90
95tgc acc tgc ggc agc tcg gac ctt tac ttg gtc acg agg cat
gcc gat 336Cys Thr Cys Gly Ser Ser Asp Leu Tyr Leu Val Thr Arg His
Ala Asp 100 105 110gtc att ccg
gtg cgc cgg cga ggc gac agc agg ggg agt cta ctc tcc 384Val Ile Pro
Val Arg Arg Arg Gly Asp Ser Arg Gly Ser Leu Leu Ser 115
120 125cct agg ccc gtc tcc tac ctg aag ggc tcc tcg
ggt gga cca ctg ctt 432Pro Arg Pro Val Ser Tyr Leu Lys Gly Ser Ser
Gly Gly Pro Leu Leu 130 135 140tgc cct
tcg ggg cac gtt gta ggc atc ttc cgg gct gct gtg tgc acc 480Cys Pro
Ser Gly His Val Val Gly Ile Phe Arg Ala Ala Val Cys Thr145
150 155 160cgg ggg gtt gcg aag gcg gtg
gac ttc ata ccc gtt gag tct atg gaa 528Arg Gly Val Ala Lys Ala Val
Asp Phe Ile Pro Val Glu Ser Met Glu 165
170 175act acc atg cgg tct ccg gtc ttc aca gac aac tca
tcc cct ccg gcc 576Thr Thr Met Arg Ser Pro Val Phe Thr Asp Asn Ser
Ser Pro Pro Ala 180 185 190gta
ccg caa aca ttc caa gtg gca cat tta cac gct ccc act ggc agc 624Val
Pro Gln Thr Phe Gln Val Ala His Leu His Ala Pro Thr Gly Ser 195
200 205ggc aag agc acc aaa gtg ccg gct gca
tat gca gcc caa ggg tac aag 672Gly Lys Ser Thr Lys Val Pro Ala Ala
Tyr Ala Ala Gln Gly Tyr Lys 210 215
220gtg ctc gtc cta aac ccg tcc gtt gct gcc aca ttg ggc ttt gga gcg
720Val Leu Val Leu Asn Pro Ser Val Ala Ala Thr Leu Gly Phe Gly Ala225
230 235 240tat atg tcc aag
gca cat ggc atc gag cct aac atc aga act ggg gta 768Tyr Met Ser Lys
Ala His Gly Ile Glu Pro Asn Ile Arg Thr Gly Val 245
250 255agg acc atc acc acg ggc ggc ccc atc acg
tac tcc acc tat ggc aag 816Arg Thr Ile Thr Thr Gly Gly Pro Ile Thr
Tyr Ser Thr Tyr Gly Lys 260 265
270ttc ctt gcc gac ggt gga tgc tcc ggg ggc gcc tat gac atc ata ata
864Phe Leu Ala Asp Gly Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile
275 280 285tgt gac gaa tgc cac tca act
gac tgg aca acc atc ttg ggc atc ggc 912Cys Asp Glu Cys His Ser Thr
Asp Trp Thr Thr Ile Leu Gly Ile Gly 290 295
300aca gtc ctg gat cag gca gag acg gct gga gcg cgg ctc gtc gtg ctc
960Thr Val Leu Asp Gln Ala Glu Thr Ala Gly Ala Arg Leu Val Val Leu305
310 315 320gcc acc gcc acg
cct ccg gga tcg atc acc gtg cca cac ccc aac atc 1008Ala Thr Ala Thr
Pro Pro Gly Ser Ile Thr Val Pro His Pro Asn Ile 325
330 335gag gaa gtg gcc ctg tcc aac act ggg gag
att ccc ttc tat ggc aaa 1056Glu Glu Val Ala Leu Ser Asn Thr Gly Glu
Ile Pro Phe Tyr Gly Lys 340 345
350gcc atc ccc att gag gcc atc aag ggg gga agg cat ctc atc ttc tgc
1104Ala Ile Pro Ile Glu Ala Ile Lys Gly Gly Arg His Leu Ile Phe Cys
355 360 365cat tcc aag aag aag tgt gac
gag ctc gcc gca aag ctg aca ggc ctc 1152His Ser Lys Lys Lys Cys Asp
Glu Leu Ala Ala Lys Leu Thr Gly Leu 370 375
380gga ctc aat gct gta gcg tat tac cgg ggt ctc gat gtg tcc gtc ata
1200Gly Leu Asn Ala Val Ala Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile385
390 395 400ccg act agc gga
gac gtc gtt gtc gtg gca aca gac gct cta atg acg 1248Pro Thr Ser Gly
Asp Val Val Val Val Ala Thr Asp Ala Leu Met Thr 405
410 415ggc ttt acc ggc gac ttt gac tca gtg atc
gac tgc aac aca tgt gtc 1296Gly Phe Thr Gly Asp Phe Asp Ser Val Ile
Asp Cys Asn Thr Cys Val 420 425
430acc cag aca gtc gat ttc agc ttg gat ccc acc ttc acc att gag acg
1344Thr Gln Thr Val Asp Phe Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr
435 440 445aca acc gtg ccc caa gac gcg
gtg tcg cgc tcg cag cgg cga ggt agg 1392Thr Thr Val Pro Gln Asp Ala
Val Ser Arg Ser Gln Arg Arg Gly Arg 450 455
460act ggc agg ggc agg agt ggc atc tac agg ttt gtg act cca gga gaa
1440Thr Gly Arg Gly Arg Ser Gly Ile Tyr Arg Phe Val Thr Pro Gly Glu465
470 475 480cgg ccc tca ggc
atg ttc gac tcc tcg gtc ctg tgt gag tgc tat gac 1488Arg Pro Ser Gly
Met Phe Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp 485
490 495gca ggc tgc gct tgg tat gag ctc acg ccc
gct gag act aca gtc agg 1536Ala Gly Cys Ala Trp Tyr Glu Leu Thr Pro
Ala Glu Thr Thr Val Arg 500 505
510ttg cgg gct tac ctg aat aca cca ggg ttg ccc gtc tgc cag gac cat
1584Leu Arg Ala Tyr Leu Asn Thr Pro Gly Leu Pro Val Cys Gln Asp His
515 520 525ctg gag ttc tgg gaa agc gtc
ttc aca ggc ctc acc cac ata gat gcc 1632Leu Glu Phe Trp Glu Ser Val
Phe Thr Gly Leu Thr His Ile Asp Ala 530 535
540cac ttc ctg tcc caa acc aag cag gca gga gac aac ttc ccc tac ctg
1680His Phe Leu Ser Gln Thr Lys Gln Ala Gly Asp Asn Phe Pro Tyr Leu545
550 555 560gtg gca tac caa
gcc acg gtg tgc gcc agg gct cag gct cca cct cca 1728Val Ala Tyr Gln
Ala Thr Val Cys Ala Arg Ala Gln Ala Pro Pro Pro 565
570 575tcg tgg gat caa atg tgg aag tgt ctc ata
cgg ctt aaa cct acg ctg 1776Ser Trp Asp Gln Met Trp Lys Cys Leu Ile
Arg Leu Lys Pro Thr Leu 580 585
590cac ggg cca aca ccc ctg ctg tat agg cta gga gcc gtt caa aat gag
1824His Gly Pro Thr Pro Leu Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu
595 600 605atc acc ctc aca cat ccc ata
acc aaa ttc gtc atg gca tgc atg tcg 1872Ile Thr Leu Thr His Pro Ile
Thr Lys Phe Val Met Ala Cys Met Ser 610 615
620gcc gac ctg gag gtc gtc act
1893Ala Asp Leu Glu Val Val Thr625 630

<210> 78
<211> 631
<212> PRT
<213> Hepatitis C virus
<400> 78
Ala Pro Ile Thr Ala Tyr Ser Gln Gln Thr Arg Gly Leu Leu Gly
Cys1 5 10 15Ile Ile Thr
Ser Leu Thr Gly Arg Asp Lys Asn Gln Val Asp Gly Glu 20
25 30Val Gln Val Leu Ser Thr Ala Thr Gln Ser
Phe Leu Ala Thr Cys Val 35 40
45Asn Gly Val Cys Trp Thr Val Tyr His Gly Ala Gly Ser Lys Thr Leu 50
55 60Ala Gly Pro Lys Gly Pro Ile Thr Gln
Met Tyr Thr Asn Val Asp Gln65 70 75
80Asp Leu Val Gly Trp Pro Ala Pro Pro Gly Ala Arg Ser Met
Thr Pro 85 90 95Cys Thr
Cys Gly Ser Ser Asp Leu Tyr Leu Val Thr Arg His Ala Asp 100
105 110Val Ile Pro Val Arg Arg Arg Gly Asp
Ser Arg Gly Ser Leu Leu Ser 115 120
125Pro Arg Pro Val Ser Tyr Leu Lys Gly Ser Ser Gly Gly Pro Leu Leu
130 135 140Cys Pro Ser Gly His Val Val
Gly Ile Phe Arg Ala Ala Val Cys Thr145 150
155 160Arg Gly Val Ala Lys Ala Val Asp Phe Ile Pro Val
Glu Ser Met Glu 165 170
175Thr Thr Met Arg Ser Pro Val Phe Thr Asp Asn Ser Ser Pro Pro Ala
180 185 190Val Pro Gln Thr Phe Gln
Val Ala His Leu His Ala Pro Thr Gly Ser 195 200
205Gly Lys Ser Thr Lys Val Pro Ala Ala Tyr Ala Ala Gln Gly
Tyr Lys 210 215 220Val Leu Val Leu Asn
Pro Ser Val Ala Ala Thr Leu Gly Phe Gly Ala225 230
235 240Tyr Met Ser Lys Ala His Gly Ile Glu Pro
Asn Ile Arg Thr Gly Val 245 250
255Arg Thr Ile Thr Thr Gly Gly Pro Ile Thr Tyr Ser Thr Tyr Gly Lys
260 265 270Phe Leu Ala Asp Gly
Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile 275
280 285Cys Asp Glu Cys His Ser Thr Asp Trp Thr Thr Ile
Leu Gly Ile Gly 290 295 300Thr Val Leu
Asp Gln Ala Glu Thr Ala Gly Ala Arg Leu Val Val Leu305
310 315 320Ala Thr Ala Thr Pro Pro Gly
Ser Ile Thr Val Pro His Pro Asn Ile 325
330 335Glu Glu Val Ala Leu Ser Asn Thr Gly Glu Ile Pro
Phe Tyr Gly Lys 340 345 350Ala
Ile Pro Ile Glu Ala Ile Lys Gly Gly Arg His Leu Ile Phe Cys 355
360 365His Ser Lys Lys Lys Cys Asp Glu Leu
Ala Ala Lys Leu Thr Gly Leu 370 375
380Gly Leu Asn Ala Val Ala Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile385
390 395 400Pro Thr Ser Gly
Asp Val Val Val Val Ala Thr Asp Ala Leu Met Thr 405
410 415Gly Phe Thr Gly Asp Phe Asp Ser Val Ile
Asp Cys Asn Thr Cys Val 420 425
430Thr Gln Thr Val Asp Phe Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr
435 440 445Thr Thr Val Pro Gln Asp Ala
Val Ser Arg Ser Gln Arg Arg Gly Arg 450 455
460Thr Gly Arg Gly Arg Ser Gly Ile Tyr Arg Phe Val Thr Pro Gly
Glu465 470 475 480Arg Pro
Ser Gly Met Phe Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp
485 490 495Ala Gly Cys Ala Trp Tyr Glu
Leu Thr Pro Ala Glu Thr Thr Val Arg 500 505
510Leu Arg Ala Tyr Leu Asn Thr Pro Gly Leu Pro Val Cys Gln
Asp His 515 520 525Leu Glu Phe Trp
Glu Ser Val Phe Thr Gly Leu Thr His Ile Asp Ala 530
535 540His Phe Leu Ser Gln Thr Lys Gln Ala Gly Asp Asn
Phe Pro Tyr Leu545 550 555
560Val Ala Tyr Gln Ala Thr Val Cys Ala Arg Ala Gln Ala Pro Pro Pro
565 570 575Ser Trp Asp Gln Met
Trp Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu 580
585 590His Gly Pro Thr Pro Leu Leu Tyr Arg Leu Gly Ala
Val Gln Asn Glu 595 600 605Ile Thr
Leu Thr His Pro Ile Thr Lys Phe Val Met Ala Cys Met Ser 610
615 620Ala Asp Leu Glu Val Val Thr625
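630

<210> 79
<211> 30
<212> DNA
<213> Artificial
<220>
<223> Synthetic Sequence
<400> 79
gat gaa atg gaa gag tgc gcc tca cac ctc 30
Asp Glu Met Glu Glu Cys Ala Ser His Leu
1 5 10

<210> 80
<211> 10
<212> PRT
<213> Artificial
<220>
<223> Synthetic Sequence
<400> 80
Asp Glu Met Glu Glu Cys Ala Ser His Leu
1 5 10

<210> 81
<211> 30
<212> DNA
<213> Artificial
<220>
<223> Synthetic Sequence
<400> 81
gac tgc tcc acg cca tgc tcc ggc tcg tgg 30
Asp Cys Ser Thr Pro Cys Ser Gly Ser Trp
1 5 10

<210> 82
<211> 10
<212> PRT
<213> Artificial
<220>
<223> Synthetic Sequence
<400> 82
Asp Cys Ser Thr Pro Cys Ser Gly Ser Trp
1 5 10

<210> 83
<211> 30
<212> DNA
<213> Artificial
<220>
<223> Synthetic Sequence
<400> 83
gac gac atc gtc tgc tgc tca atg tcc tac 30
Asp Asp Ile Val Cys Cys Ser Met Ser Tyr
1 5 10

<210> 84
<211> 10
<212> PRT
<213> Artificial
<220>
<223> Synthetic Sequence
<400> 84
Asp Asp Ile Val Cys Cys Ser Met Ser Tyr
1 5 10

<210> 85
<211> 1908
<212> DNA
<213> Human herpesvirus 1
<220>
<221> gene
<222> (1)..(1908)
<223> Gene for HSV-1 protease
<400> 85
atg gca gcc gat gcc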
ccg gga gac cgg atg gag gag ccc ctg ccc gac 48Met Ala Ala Asp Ala
Pro Gly Asp Arg Met Glu Glu Pro Leu Pro Asp1 5
10 15agg gcc gtg ccc att tac gtg gct ggg ttt ttg
gcc ctg tat gac agc 96Arg Ala Val Pro Ile Tyr Val Ala Gly Phe Leu
Ala Leu Tyr Asp Ser 20 25
30ggg gac tcg ggc gag ttg gca ttg gat ccg gat acg gtg cgg gcg gcc
144Gly Asp Ser Gly Glu Leu Ala Leu Asp Pro Asp Thr Val Arg Ala Ala
35 40 45ctg cct ccg gat aac cca ctc ccg
att aac gtg gac cac cgc gct ggc 192Leu Pro Pro Asp Asn Pro Leu Pro
Ile Asn Val Asp His Arg Ala Gly 50 55
60tgc gag gtg ggg cgg gtg ctg gcc gtg gtc gac gac ccc cgc ggg ccg
240Cys Glu Val Gly Arg Val Leu Ala Val Val Asp Asp Pro Arg Gly Pro65
70 75 80ttt ttt gtg ggg ctg
atc gcc tgc gtg cag ctg gag cgc gtc ctc gag 288Phe Phe Val Gly Leu
Ile Ala Cys Val Gln Leu Glu Arg Val Leu Glu 85
90 95acg gcc gcc agc gct gcg att ttc gag cgc cgc
ggg ccg ccg ctc tcc 336Thr Ala Ala Ser Ala Ala Ile Phe Glu Arg Arg
Gly Pro Pro Leu Ser 100 105
110cgg gag gag cgc ctg ttg tac ctg atc acc aac tac ctg ccc tcg gtc
384Arg Glu Glu Arg Leu Leu Tyr Leu Ile Thr Asn Tyr Leu Pro Ser Val
115 120 125tcc ctg gcc aca aaa cgc ctg
ggg ggc gag gcg cac ccc gat cgc acg 432Ser Leu Ala Thr Lys Arg Leu
Gly Gly Glu Ala His Pro Asp Arg Thr 130 135
140ctg ttc gcg cac gtc gcg ctg tgc gcg atc ggg cgg cgc ctc ggc act
480Leu Phe Ala His Val Ala Leu Cys Ala Ile Gly Arg Arg Leu Gly Thr145
150 155 160atc gtc acc tac
gac acc ggt ctc gac gcc gcc atc gcg ccc ttt cgc 528Ile Val Thr Tyr
Asp Thr Gly Leu Asp Ala Ala Ile Ala Pro Phe Arg 165
170 175cac ctg tcg ccg gcg tct cgc gag ggg gcg
cgg cga ctg gcc gcc gag 576His Leu Ser Pro Ala Ser Arg Glu Gly Ala
Arg Arg Leu Ala Ala Glu 180 185
190gcc gag ctc gcg ctg tcc ggg cgc acc tgg gcg ccc ggc gtg gag gcg
624Ala Glu Leu Ala Leu Ser Gly Arg Thr Trp Ala Pro Gly Val Glu Ala
195 200 205ctg acc cac acg ctg ctt tcc
acc gcc gtt aac aac atg atg ctg cgg 672Leu Thr His Thr Leu Leu Ser
Thr Ala Val Asn Asn Met Met Leu Arg 210 215
220gac cgc tgg agc ctg gtg gcc gag cgg cgg cgg cag gcc ggg atc gcc
720Asp Arg Trp Ser Leu Val Ala Glu Arg Arg Arg Gln Ala Gly Ile Ala225
230 235 240gga cac acc tac
ctc cag gcg agc gaa aaa ttc aaa atg tgg ggg gcg 768Gly His Thr Tyr
Leu Gln Ala Ser Glu Lys Phe Lys Met Trp Gly Ala 245
250 255gag cct gtt tcc gcg ccg gcg cgc ggg tat
aag aac ggg gcc ccg gag 816Glu Pro Val Ser Ala Pro Ala Arg Gly Tyr
Lys Asn Gly Ala Pro Glu 260 265
270tcc acg gac ata ccg ccc ggc tcg atc gct gcc gcg ccg cag ggt gac
864Ser Thr Asp Ile Pro Pro Gly Ser Ile Ala Ala Ala Pro Gln Gly Asp
275 280 285cgg tgc cca atc gtc cgt cag
cgc ggg gtc gcc ttg tcc ccg gta ctg 912Arg Cys Pro Ile Val Arg Gln
Arg Gly Val Ala Leu Ser Pro Val Leu 290 295
300ccc ccc atg aac ccc gtt ccg aca tcg ggc acc ccg gcc ccc gcg ccg
960Pro Pro Met Asn Pro Val Pro Thr Ser Gly Thr Pro Ala Pro Ala Pro305
310 315 320ccc ggc gac ggg
agc tac ctg tgg atc ccg gcc tcc cat tac aac cag 1008Pro Gly Asp Gly
Ser Tyr Leu Trp Ile Pro Ala Ser His Tyr Asn Gln 325
330 335ctc gtc gcc ggc cat gcc gcg ccc caa ccc
cag ccg cat tcc gcg ttt 1056Leu Val Ala Gly His Ala Ala Pro Gln Pro
Gln Pro His Ser Ala Phe 340 345
350ggt ttc ccg gct gcg gcg ggg tcc gtg gcc tat ggg cct cac ggt gcg
1104Gly Phe Pro Ala Ala Ala Gly Ser Val Ala Tyr Gly Pro His Gly Ala
355 360 365ggt ctt tcc cag cat tac cct
ccc cac gtc gcc cat cag tat ccc ggg 1152Gly Leu Ser Gln His Tyr Pro
Pro His Val Ala His Gln Tyr Pro Gly 370 375
380gtg ctg ttc tcg gga ccc agc cca ctc gag gcg cag ata gcc gcg ttg
1200Val Leu Phe Ser Gly Pro Ser Pro Leu Glu Ala Gln Ile Ala Ala Leu385
390 395 400gtg ggg gcc ata
gcc gcg gac cgc cag gcg ggc ggt cag ccg gcc gcg 1248Val Gly Ala Ile
Ala Ala Asp Arg Gln Ala Gly Gly Gln Pro Ala Ala 405
410 415gga gac cct ggg gtc cgg ggg tcg gga aag
cgt cgc cgg tac gag gcg 1296Gly Asp Pro Gly Val Arg Gly Ser Gly Lys
Arg Arg Arg Tyr Glu Ala 420 425
430ggg ccg tcg gag tcc tac tgc gac cag gac gaa ccg gac gcg gac tac
1344Gly Pro Ser Glu Ser Tyr Cys Asp Gln Asp Glu Pro Asp Ala Asp Tyr
435 440 445ccg tac tac ccc ggg gag gct
cga ggc gcg ccg cgc ggg gtc gac tcc 1392Pro Tyr Tyr Pro Gly Glu Ala
Arg Gly Ala Pro Arg Gly Val Asp Ser 450 455
460cgg cgc gcg gcc cgc cat tct ccc ggg acc aac gag acc atc acg gcg
1440Arg Arg Ala Ala Arg His Ser Pro Gly Thr Asn Glu Thr Ile Thr Ala465
470 475 480ctg atg ggg gcg
gtg acg tct ctg cag cag gaa ctg gcg cac atg cgg 1488Leu Met Gly Ala
Val Thr Ser Leu Gln Gln Glu Leu Ala His Met Arg 485
490 495gct cgg acc agc gcc ccc tat gga atg tac
acg ccg gtg gcg cac tat 1536Ala Arg Thr Ser Ala Pro Tyr Gly Met Tyr
Thr Pro Val Ala His Tyr 500 505
510cgc cct cag gtg ggg gag ccg gaa cca aca acg acc cac ccg gcc ctt
1584Arg Pro Gln Val Gly Glu Pro Glu Pro Thr Thr Thr His Pro Ala Leu
515 520 525tgt ccc ccg gag gcc gtg tat
cgc ccc cca cca cac agc gcc ccc tac 1632Cys Pro Pro Glu Ala Val Tyr
Arg Pro Pro Pro His Ser Ala Pro Tyr 530 535
540ggt cct ccc cag ggt ccg gcg tcc cat gcc ccc act ccc ccg tat gcc
1680Gly Pro Pro Gln Gly Pro Ala Ser His Ala Pro Thr Pro Pro Tyr Ala545
550 555 560cca gct gcc tgc
ccg cca ggc ccg cca ccg ccc cca tgt cct tcc acc 1728Pro Ala Ala Cys
Pro Pro Gly Pro Pro Pro Pro Pro Cys Pro Ser Thr 565
570 575cag acg cgc gcc cct cta ccg acg gag ccc
gcg ttc ccc ccc gcc gcc 1776Gln Thr Arg Ala Pro Leu Pro Thr Glu Pro
Ala Phe Pro Pro Ala Ala 580 585
590acc gga tcc caa ccg gag gca tcc aac gcg gag gcc ggg gcc ctt gtc
1824Thr Gly Ser Gln Pro Glu Ala Ser Asn Ala Glu Ala Gly Ala Leu Val
595 600 605aac gcc agc agc gca gca cac
gtg gac gtt gac acg gcc cgc gcc gcc 1872Asn Ala Ser Ser Ala Ala His
Val Asp Val Asp Thr Ala Arg Ala Ala 610 615
620gat ttg ttc gtc tct cag atg atg ggg gcc cgc tga
1908Asp Leu Phe Val Ser Gln Met Met Gly Ala Arg625 630
635

<210> 86
<211> 635
<212> PRT
<213> Human herpesvirus 1
<400> 86
Met Ala Ala Asp Ala Pro
Gly Asp Arg Met Glu Glu Pro Leu Pro Asp1 5
10 15Arg Ala Val Pro Ile Tyr Val Ala Gly Phe Leu Ala
Leu Tyr Asp Ser 20 25 30Gly
Asp Ser Gly Glu Leu Ala Leu Asp Pro Asp Thr Val Arg Ala Ala 35
40 45Leu Pro Pro Asp Asn Pro Leu Pro Ile
Asn Val Asp His Arg Ala Gly 50 55
60Cys Glu Val Gly Arg Val Leu Ala Val Val Asp Asp Pro Arg Gly Pro65
70 75 80Phe Phe Val Gly Leu
Ile Ala Cys Val Gln Leu Glu Arg Val Leu Glu 85
90 95Thr Ala Ala Ser Ala Ala Ile Phe Glu Arg Arg
Gly Pro Pro Leu Ser 100 105
110Arg Glu Glu Arg Leu Leu Tyr Leu Ile Thr Asn Tyr Leu Pro Ser Val
115 120 125Ser Leu Ala Thr Lys Arg Leu
Gly Gly Glu Ala His Pro Asp Arg Thr 130 135
140Leu Phe Ala His Val Ala Leu Cys Ala Ile Gly Arg Arg Leu Gly
Thr145 150 155 160Ile Val
Thr Tyr Asp Thr Gly Leu Asp Ala Ala Ile Ala Pro Phe Arg
165 170 175His Leu Ser Pro Ala Ser Arg
Glu Gly Ala Arg Arg Leu Ala Ala Glu 180 185
190Ala Glu Leu Ala Leu Ser Gly Arg Thr Trp Ala Pro Gly Val
Glu Ala 195 200 205Leu Thr His Thr
Leu Leu Ser Thr Ala Val Asn Asn Met Met Leu Arg 210
215 220Asp Arg Trp Ser Leu Val Ala Glu Arg Arg Arg Gln
Ala Gly Ile Ala225 230 235
240Gly His Thr Tyr Leu Gln Ala Ser Glu Lys Phe Lys Met Trp Gly Ala
245 250 255Glu Pro Val Ser Ala
Pro Ala Arg Gly Tyr Lys Asn Gly Ala Pro Glu 260
265 270Ser Thr Asp Ile Pro Pro Gly Ser Ile Ala Ala Ala
Pro Gln Gly Asp 275 280 285Arg Cys
Pro Ile Val Arg Gln Arg Gly Val Ala Leu Ser Pro Val Leu 290
295 300Pro Pro Met Asn Pro Val Pro Thr Ser Gly Thr
Pro Ala Pro Ala Pro305 310 315
320Pro Gly Asp Gly Ser Tyr Leu Trp Ile Pro Ala Ser His Tyr Asn Gln
325 330 335Leu Val Ala Gly
His Ala Ala Pro Gln Pro Gln Pro His Ser Ala Phe 340
345 350Gly Phe Pro Ala Ala Ala Gly Ser Val Ala Tyr
Gly Pro His Gly Ala 355 360 365Gly
Leu Ser Gln His Tyr Pro Pro His Val Ala His Gln Tyr Pro Gly 370
375 380Val Leu Phe Ser Gly Pro Ser Pro Leu Glu
Ala Gln Ile Ala Ala Leu385 390 395
400Val Gly Ala Ile Ala Ala Asp Arg Gln Ala Gly Gly Gln Pro Ala
Ala 405 410 415Gly Asp Pro
Gly Val Arg Gly Ser Gly Lys Arg Arg Arg Tyr Glu Ala 420
425 430Gly Pro Ser Glu Ser Tyr Cys Asp Gln Asp
Glu Pro Asp Ala Asp Tyr 435 440
445Pro Tyr Tyr Pro Gly Glu Ala Arg Gly Ala Pro Arg Gly Val Asp Ser 450
455 460Arg Arg Ala Ala Arg His Ser Pro
Gly Thr Asn Glu Thr Ile Thr Ala465 470
475 480Leu Met Gly Ala Val Thr Ser Leu Gln Gln Glu Leu
Ala His Met Arg 485 490
495Ala Arg Thr Ser Ala Pro Tyr Gly Met Tyr Thr Pro Val Ala His Tyr
500 505 510Arg Pro Gln Val Gly Glu
Pro Glu Pro Thr Thr Thr His Pro Ala Leu 515 520
525Cys Pro Pro Glu Ala Val Tyr Arg Pro Pro Pro His Ser Ala
Pro Tyr 530 535 540Gly Pro Pro Gln Gly
Pro Ala Ser His Ala Pro Thr Pro Pro Tyr Ala545 550
555 560Pro Ala Ala Cys Pro Pro Gly Pro Pro Pro
Pro Pro Cys Pro Ser Thr 565 570
575Gln Thr Arg Ala Pro Leu Pro Thr Glu Pro Ala Phe Pro Pro Ala Ala
580 585 590Thr Gly Ser Gln Pro
Glu Ala Ser Asn Ala Glu Ala Gly Ala Leu Val 595
600 605Asn Ala Ser Ser Ala Ala His Val Asp Val Asp Thr
Ala Arg Ala Ala 610 615 620Asp Leu Phe
Val Ser Gln Met Met Gly Ala Arg625 630
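635

<210> 87
<211> 15
<212> DNA
<213> Artificial
<220>
<223> Synthetic Sequence
<400> 87
ctn gtn ctn gcn agy 15
Xaa Val Xaa Ala Xaa
1 5

<210> 88
<211> 5
<212> PRT
<213> Artificial
<220>
<221> misc_feature
<222> (1)..(1)
<223> The 'Xaa' at location 1 stands for Leu.
<400> 88
Xaa Val Xaa Ala Xaa
1 5

<210> 89
<211> 18
<212> DNA
<213> Artificial
<220>
<223> Synthetic Sequence
<400> 89
ctn gtn ctn gcn agy agy 18
Xaa Val Xaa Ala Xaa Xaa
1 5

<210> 90
<211> 6
<212> PRT
<213> Artificial
<220>
<221> misc_feature
<222> (1)..(1)
<223> The 'Xaa' at location 1 stands for Leu.
<400> 90
Xaa Val Xaa Ala Xaa Xaa
1 5

<210> 91
<211> 501
<212> DNA
<213> Human T-lymphotropic virus 1
<220>
<221> gene
<222> (1)..(501)
<223> Gene for HTLV-1 protease
<400> 91
atg aca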
gtc ctt ccg ata gcc ttg ttc tca agt aat act ccc ctc aga 48Met Thr
Val Leu Pro Ile Ala Leu Phe Ser Ser Asn Thr Pro Leu Arg1 5
10 15aat aca tcc gta tta ggg gca ggg
ggc caa acc caa gat cac ttt aag 96Asn Thr Ser Val Leu Gly Ala Gly
Gly Gln Thr Gln Asp His Phe Lys 20 25
30ctc acc tcc ctt cct gta cta ata cgc ctc cct ttc cgg aca acg
cct 144Leu Thr Ser Leu Pro Val Leu Ile Arg Leu Pro Phe Arg Thr Thr
Pro 35 40 45att gtt tta aca tct
tgc cta gtt gat acc aaa aac aac tgg gcc atc 192Ile Val Leu Thr Ser
Cys Leu Val Asp Thr Lys Asn Asn Trp Ala Ile 50 55
60ata ggt cgt gat gcc tta caa caa tgc caa ggc gcc ctg tac
ctc cct 240Ile Gly Arg Asp Ala Leu Gln Gln Cys Gln Gly Ala Leu Tyr
Leu Pro65 70 75 80gag
gca aaa ggg ccg cct gta atc ttg cca ata cag gcg cca gcc gtc 288Glu
Ala Lys Gly Pro Pro Val Ile Leu Pro Ile Gln Ala Pro Ala Val
85 90 95ctt ggg cta gaa cac ctc cca
agg ccc ccc gaa atc agc cag ttc cct 336Leu Gly Leu Glu His Leu Pro
Arg Pro Pro Glu Ile Ser Gln Phe Pro 100 105
110tta aac cag aac ggc tcc agg cct tgc aac act tgg tcc gga
agg ccc 384Leu Asn Gln Asn Gly Ser Arg Pro Cys Asn Thr Trp Ser Gly
Arg Pro 115 120 125tgg agg cag gcc
ata tcg aac cct aca ccg ggc cag gaa ata acc cag 432Trp Arg Gln Ala
Ile Ser Asn Pro Thr Pro Gly Gln Glu Ile Thr Gln 130
135 140tat tcc cag tta aaa agg cca atg gaa cct ggc gat
tca tcc acg acc 480Tyr Ser Gln Leu Lys Arg Pro Met Glu Pro Gly Asp
Ser Ser Thr Thr145 150 155
160tgc ggg cca cta act ctc taa
501Cys Gly Pro Leu Thr Leu 165

SEQ ID NO: 92 (length 166, PRT, Human T-lymphotropic virus 1)
Met Thr Val Leu Pro Ile Ala Leu Phe Ser Ser Asn Thr Pro Leu
Arg1 5 10 15Asn Thr Ser
Val Leu Gly Ala Gly Gly Gln Thr Gln Asp His Phe Lys 20
25 30Leu Thr Ser Leu Pro Val Leu Ile Arg Leu
Pro Phe Arg Thr Thr Pro 35 40
45Ile Val Leu Thr Ser Cys Leu Val Asp Thr Lys Asn Asn Trp Ala Ile 50
55 60Ile Gly Arg Asp Ala Leu Gln Gln Cys
Gln Gly Ala Leu Tyr Leu Pro65 70 75
80Glu Ala Lys Gly Pro Pro Val Ile Leu Pro Ile Gln Ala Pro
Ala Val 85 90 95Leu Gly
Leu Glu His Leu Pro Arg Pro Pro Glu Ile Ser Gln Phe Pro 100
105 110Leu Asn Gln Asn Gly Ser Arg Pro Cys
Asn Thr Trp Ser Gly Arg Pro 115 120
125Trp Arg Gln Ala Ile Ser Asn Pro Thr Pro Gly Gln Glu Ile Thr Gln
130 135 140Tyr Ser Gln Leu Lys Arg Pro
Met Glu Pro Gly Asp Ser Ser Thr Thr145 150
155 160Cys Gly Pro Leu Thr Leu
165

SEQ ID NO: 93 (length 36, DNA, Artificial; Synthetic Sequence)
tcn cgn ccn car gtn ctn ccn gtn atg cay ccn aay   36
Xaa Arg Pro Gln Val Xaa Pro Val Met His Pro Asn
1               5                   10

SEQ ID NO: 94 (length 12, PRT, Artificial; misc_feature (1)..(1): The 'Xaa' at location 1 stands for Ser.)
Xaa Arg Pro Gln Val Xaa Pro Val Met His Pro Asn
1               5                   10

SEQ ID NO: 95 (length 36, DNA, Artificial; Synthetic Sequence)
tcn aar acn aar gtn ctn gtn gtn car ccn aar aay   36
Xaa Lys Xaa Lys Val Xaa Val Val Gln Pro Lys Asn
1               5                   10

SEQ ID NO: 96 (length 12, PRT, Artificial; misc_feature (1)..(1): The 'Xaa' at location 1 stands for Ser.)
Xaa Lys Xaa Lys Val Xaa Val Val Gln Pro Lys Asn
1               5                   10

SEQ ID NO: 97 (length 768, DNA, Human cytomegalovirus; CDS (1)..(768): HCMV assemblin (protease) contained in "UL80")
atg acg atg gac gag cag cag tcg
cag gct gtg gcg ccg gtc tac gtg 48Met Thr Met Asp Glu Gln Gln Ser
Gln Ala Val Ala Pro Val Tyr Val1 5 10
15ggc ggc ttt ctc gcc cgc tac gac cag tct ccg gac gag gcc
gaa ttg 96Gly Gly Phe Leu Ala Arg Tyr Asp Gln Ser Pro Asp Glu Ala
Glu Leu 20 25 30ctg ttg ccg
cgg gac gta gtg gag cac tgg ttg cac gcg cag ggc cag 144Leu Leu Pro
Arg Asp Val Val Glu His Trp Leu His Ala Gln Gly Gln 35
40 45gga cag cct tcg ttg tcg gtc gcg ctc ccg ctc
aac atc aac cac gac 192Gly Gln Pro Ser Leu Ser Val Ala Leu Pro Leu
Asn Ile Asn His Asp 50 55 60gac acg
gcc gtt gta gga cac gtt gcg gcg atg cag agc gtc cgc gac 240Asp Thr
Ala Val Val Gly His Val Ala Ala Met Gln Ser Val Arg Asp65
70 75 80ggt ctt ttt tgc ctg ggc tgc
gtc act tcg ccc agg ttt ctg gag att 288Gly Leu Phe Cys Leu Gly Cys
Val Thr Ser Pro Arg Phe Leu Glu Ile 85 90
95gta cgc cgc gct tcg gaa aag tcc gag ctg gtt tcg cgc
ggg ccc gtc 336Val Arg Arg Ala Ser Glu Lys Ser Glu Leu Val Ser Arg
Gly Pro Val 100 105 110agt ccg
ctg cag cca gac aag gtg gtg gag ttt ctc agc ggc agc tac 384Ser Pro
Leu Gln Pro Asp Lys Val Val Glu Phe Leu Ser Gly Ser Tyr 115
120 125gcc ggc ctc tcg ctc tcc agc cgg cgc tgc
gac gac gtg gag gcc gcg 432Ala Gly Leu Ser Leu Ser Ser Arg Arg Cys
Asp Asp Val Glu Ala Ala 130 135 140acg
tcg ctt tcg ggc tcg gaa acc acg ccg ttc aaa cac gtg gct ttg 480Thr
Ser Leu Ser Gly Ser Glu Thr Thr Pro Phe Lys His Val Ala Leu145
150 155 160tgc agc gtg ggt cgg cgt
cgc ggt acg ttg gcc gtg tac ggg cgc gat 528Cys Ser Val Gly Arg Arg
Arg Gly Thr Leu Ala Val Tyr Gly Arg Asp 165
170 175ccc gag tgg gtc aca cag cgg ttt cca gac ctc acg
gcg gcc gac cgt 576Pro Glu Trp Val Thr Gln Arg Phe Pro Asp Leu Thr
Ala Ala Asp Arg 180 185 190gac
ggg cta cgt gca cag tgg cag cgc tgc ggc agc act gct gtc gac 624Asp
Gly Leu Arg Ala Gln Trp Gln Arg Cys Gly Ser Thr Ala Val Asp 195
200 205gcg tcg ggc gat ccc ttt cgc tca gac
agc tac ggc ctg ttg ggc aac 672Ala Ser Gly Asp Pro Phe Arg Ser Asp
Ser Tyr Gly Leu Leu Gly Asn 210 215
220agc gtg gac gcg ctc tac atc cgt gag cga ctg ccc aag ctg cgc tac
720Ser Val Asp Ala Leu Tyr Ile Arg Glu Arg Leu Pro Lys Leu Arg Tyr225
230 235 240gac aag caa cta
gtc ggc gtg acg gag cgc gag tca tac gtc aag gcg 768Asp Lys Gln Leu
Val Gly Val Thr Glu Arg Glu Ser Tyr Val Lys Ala 245
250 255

SEQ ID NO: 98 (length 256, PRT, Human cytomegalovirus)
Met Thr Met Asp Glu Gln Gln Ser Gln Ala Val Ala Pro Val Tyr Val1 5
10 15Gly Gly Phe Leu Ala Arg Tyr Asp
Gln Ser Pro Asp Glu Ala Glu Leu 20 25
30Leu Leu Pro Arg Asp Val Val Glu His Trp Leu His Ala Gln Gly
Gln 35 40 45Gly Gln Pro Ser Leu
Ser Val Ala Leu Pro Leu Asn Ile Asn His Asp 50 55
60Asp Thr Ala Val Val Gly His Val Ala Ala Met Gln Ser Val
Arg Asp65 70 75 80Gly
Leu Phe Cys Leu Gly Cys Val Thr Ser Pro Arg Phe Leu Glu Ile
85 90 95Val Arg Arg Ala Ser Glu Lys
Ser Glu Leu Val Ser Arg Gly Pro Val 100 105
110Ser Pro Leu Gln Pro Asp Lys Val Val Glu Phe Leu Ser Gly
Ser Tyr 115 120 125Ala Gly Leu Ser
Leu Ser Ser Arg Arg Cys Asp Asp Val Glu Ala Ala 130
135 140Thr Ser Leu Ser Gly Ser Glu Thr Thr Pro Phe Lys
His Val Ala Leu145 150 155
160Cys Ser Val Gly Arg Arg Arg Gly Thr Leu Ala Val Tyr Gly Arg Asp
165 170 175Pro Glu Trp Val Thr
Gln Arg Phe Pro Asp Leu Thr Ala Ala Asp Arg 180
185 190Asp Gly Leu Arg Ala Gln Trp Gln Arg Cys Gly Ser
Thr Ala Val Asp 195 200 205Ala Ser
Gly Asp Pro Phe Arg Ser Asp Ser Tyr Gly Leu Leu Gly Asn 210
215 220Ser Val Asp Ala Leu Tyr Ile Arg Glu Arg Leu
Pro Lys Leu Arg Tyr225 230 235
240Asp Lys Gln Leu Val Gly Val Thr Glu Arg Glu Ser Tyr Val Lys Ala
245 250
255

SEQ ID NO: 99 (length 15, DNA, Artificial; Synthetic Sequence)
gtg gtg aac gcc agt   15
Val Val Asn Ala Ser
1               5

SEQ ID NO: 100 (length 5, PRT, Artificial; Synthetic Sequence)
Val Val Asn Ala Ser
1               5

SEQ ID NO: 101 (length 15, DNA, Artificial; Synthetic Sequence)
tac gtc aag gcg agc   15
Tyr Val Lys Ala Ser
1               5

SEQ ID NO: 102 (length 5, PRT, Artificial; Synthetic Sequence)
Tyr Val Lys Ala Ser
1               5

SEQ ID NO: 103 (length 1506, DNA, Homo sapiens; gene (1)..(1506): Gene for APP beta-secretase)
atg gcc caa gcc ctg ccc tgg ctc ctg ctg tgg atg ggc gcg gga gtg
48Met Ala Gln Ala Leu Pro Trp Leu Leu Leu Trp Met Gly Ala Gly Val1
5 10 15ctg cct gcc cac ggc acc
cag cac ggc atc cgg ctg ccc ctg cgc agc 96Leu Pro Ala His Gly Thr
Gln His Gly Ile Arg Leu Pro Leu Arg Ser 20 25
30ggc ctg ggg ggc gcc ccc ctg ggg ctg cgg ctg ccc cgg
gag acc gac 144Gly Leu Gly Gly Ala Pro Leu Gly Leu Arg Leu Pro Arg
Glu Thr Asp 35 40 45gaa gag ccc
gag gag ccc ggc cgg agg ggc agc ttt gtg gag atg gtg 192Glu Glu Pro
Glu Glu Pro Gly Arg Arg Gly Ser Phe Val Glu Met Val 50
55 60gac aac ctg agg ggc aag tcg ggg cag ggc tac tac
gtg gag atg acc 240Asp Asn Leu Arg Gly Lys Ser Gly Gln Gly Tyr Tyr
Val Glu Met Thr65 70 75
80gtg ggc agc ccc ccg cag acg ctc aac atc ctg gtg gat aca ggc agc
288Val Gly Ser Pro Pro Gln Thr Leu Asn Ile Leu Val Asp Thr Gly Ser
85 90 95agt aac ttt gca gtg ggt
gct gcc ccc cac ccc ttc ctg cat cgc tac 336Ser Asn Phe Ala Val Gly
Ala Ala Pro His Pro Phe Leu His Arg Tyr 100
105 110tac cag agg cag ctg tcc agc aca tac cgg gac ctc
cgg aag ggt gtg 384Tyr Gln Arg Gln Leu Ser Ser Thr Tyr Arg Asp Leu
Arg Lys Gly Val 115 120 125tat gtg
ccc tac acc cag ggc aag tgg gaa ggg gag ctg ggc acc gac 432Tyr Val
Pro Tyr Thr Gln Gly Lys Trp Glu Gly Glu Leu Gly Thr Asp 130
135 140ctg gta agc atc ccc cat ggc ccc aac gtc act
gtg cgt gcc aac att 480Leu Val Ser Ile Pro His Gly Pro Asn Val Thr
Val Arg Ala Asn Ile145 150 155
160gct gcc atc act gaa tca gac aag ttc ttc atc aac ggc tcc aac tgg
528Ala Ala Ile Thr Glu Ser Asp Lys Phe Phe Ile Asn Gly Ser Asn Trp
165 170 175gaa ggc atc ctg ggg
ctg gcc tat gct gag att gcc agg cct gac gac 576Glu Gly Ile Leu Gly
Leu Ala Tyr Ala Glu Ile Ala Arg Pro Asp Asp 180
185 190tcc ctg gag cct ttc ttt gac tct ctg gta aag cag
acc cac gtt ccc 624Ser Leu Glu Pro Phe Phe Asp Ser Leu Val Lys Gln
Thr His Val Pro 195 200 205aac ctc
ttc tcc ctg cag ctt tgt ggt gct ggc ttc ccc ctc aac cag 672Asn Leu
Phe Ser Leu Gln Leu Cys Gly Ala Gly Phe Pro Leu Asn Gln 210
215 220tct gaa gtg ctg gcc tct gtc gga ggg agc atg
atc att gga ggt atc 720Ser Glu Val Leu Ala Ser Val Gly Gly Ser Met
Ile Ile Gly Gly Ile225 230 235
240gac cac tcg ctg tac aca ggc agt ctc tgg tat aca ccc atc cgg cgg
768Asp His Ser Leu Tyr Thr Gly Ser Leu Trp Tyr Thr Pro Ile Arg Arg
245 250 255gag tgg tat tat gag
gtc atc att gtg cgg gtg gag atc aat gga cag 816Glu Trp Tyr Tyr Glu
Val Ile Ile Val Arg Val Glu Ile Asn Gly Gln 260
265 270gat ctg aaa atg gac tgc aag gag tac aac tat gac
aag agc att gtg 864Asp Leu Lys Met Asp Cys Lys Glu Tyr Asn Tyr Asp
Lys Ser Ile Val 275 280 285gac agt
ggc acc acc aac ctt cgt ttg ccc aag aaa gtg ttt gaa gct 912Asp Ser
Gly Thr Thr Asn Leu Arg Leu Pro Lys Lys Val Phe Glu Ala 290
295 300gca gtc aaa tcc atc aag gca gcc tcc tcc acg
gag aag ttc cct gat 960Ala Val Lys Ser Ile Lys Ala Ala Ser Ser Thr
Glu Lys Phe Pro Asp305 310 315
320ggt ttc tgg cta gga gag cag ctg gtg tgc tgg caa gca ggc acc acc
1008Gly Phe Trp Leu Gly Glu Gln Leu Val Cys Trp Gln Ala Gly Thr Thr
325 330 335cct tgg aac att ttc
cca gtc atc tca ctc tac cta atg ggt gag gtt 1056Pro Trp Asn Ile Phe
Pro Val Ile Ser Leu Tyr Leu Met Gly Glu Val 340
345 350acc aac cag tcc ttc cgc atc acc atc ctt ccg cag
caa tac ctg cgg 1104Thr Asn Gln Ser Phe Arg Ile Thr Ile Leu Pro Gln
Gln Tyr Leu Arg 355 360 365cca gtg
gaa gat gtg gcc acg tcc caa gac gac tgt tac aag ttt gcc 1152Pro Val
Glu Asp Val Ala Thr Ser Gln Asp Asp Cys Tyr Lys Phe Ala 370
375 380atc tca cag tca tcc acg ggc act gtt atg gga
gct gtt atc atg gag 1200Ile Ser Gln Ser Ser Thr Gly Thr Val Met Gly
Ala Val Ile Met Glu385 390 395
400ggc ttc tac gtt gtc ttt gat cgg gcc cga aaa cga att ggc ttt gct
1248Gly Phe Tyr Val Val Phe Asp Arg Ala Arg Lys Arg Ile Gly Phe Ala
405 410 415gtc agc gct tgc cat
gtg cac gat gag ttc agg acg gca gcg gtg gaa 1296Val Ser Ala Cys His
Val His Asp Glu Phe Arg Thr Ala Ala Val Glu 420
425 430ggc cct ttt gtc acc ttg gac atg gaa gac tgt ggc
tac aac att cca 1344Gly Pro Phe Val Thr Leu Asp Met Glu Asp Cys Gly
Tyr Asn Ile Pro 435 440 445cag aca
gat gag tca acc ctc atg acc ata gcc tat gtc atg gct gcc 1392Gln Thr
Asp Glu Ser Thr Leu Met Thr Ile Ala Tyr Val Met Ala Ala 450
455 460atc tgc gcc ctc ttc atg ctg cca ctc tgc ctc
atg gtg tgt cag tgg 1440Ile Cys Ala Leu Phe Met Leu Pro Leu Cys Leu
Met Val Cys Gln Trp465 470 475
480cgc tgc ctc cgc tgc ctg cgc cag cag cat gat gac ttt gct gat gac
1488Arg Cys Leu Arg Cys Leu Arg Gln Gln His Asp Asp Phe Ala Asp Asp
485 490 495atc tcc ctg ctg aag
tga 1506Ile Ser Leu Leu Lys
500

SEQ ID NO: 104 (length 501, PRT, Homo sapiens)
Met Ala Gln Ala Leu Pro Trp Leu Leu
Leu Trp Met Gly Ala Gly Val1 5 10
15Leu Pro Ala His Gly Thr Gln His Gly Ile Arg Leu Pro Leu Arg
Ser 20 25 30Gly Leu Gly Gly
Ala Pro Leu Gly Leu Arg Leu Pro Arg Glu Thr Asp 35
40 45Glu Glu Pro Glu Glu Pro Gly Arg Arg Gly Ser Phe
Val Glu Met Val 50 55 60Asp Asn Leu
Arg Gly Lys Ser Gly Gln Gly Tyr Tyr Val Glu Met Thr65 70
75 80Val Gly Ser Pro Pro Gln Thr Leu
Asn Ile Leu Val Asp Thr Gly Ser 85 90
95Ser Asn Phe Ala Val Gly Ala Ala Pro His Pro Phe Leu His
Arg Tyr 100 105 110Tyr Gln Arg
Gln Leu Ser Ser Thr Tyr Arg Asp Leu Arg Lys Gly Val 115
120 125Tyr Val Pro Tyr Thr Gln Gly Lys Trp Glu Gly
Glu Leu Gly Thr Asp 130 135 140Leu Val
Ser Ile Pro His Gly Pro Asn Val Thr Val Arg Ala Asn Ile145
150 155 160Ala Ala Ile Thr Glu Ser Asp
Lys Phe Phe Ile Asn Gly Ser Asn Trp 165
170 175Glu Gly Ile Leu Gly Leu Ala Tyr Ala Glu Ile Ala
Arg Pro Asp Asp 180 185 190Ser
Leu Glu Pro Phe Phe Asp Ser Leu Val Lys Gln Thr His Val Pro 195
200 205Asn Leu Phe Ser Leu Gln Leu Cys Gly
Ala Gly Phe Pro Leu Asn Gln 210 215
220Ser Glu Val Leu Ala Ser Val Gly Gly Ser Met Ile Ile Gly Gly Ile225
230 235 240Asp His Ser Leu
Tyr Thr Gly Ser Leu Trp Tyr Thr Pro Ile Arg Arg 245
250 255Glu Trp Tyr Tyr Glu Val Ile Ile Val Arg
Val Glu Ile Asn Gly Gln 260 265
270Asp Leu Lys Met Asp Cys Lys Glu Tyr Asn Tyr Asp Lys Ser Ile Val
275 280 285Asp Ser Gly Thr Thr Asn Leu
Arg Leu Pro Lys Lys Val Phe Glu Ala 290 295
300Ala Val Lys Ser Ile Lys Ala Ala Ser Ser Thr Glu Lys Phe Pro
Asp305 310 315 320Gly Phe
Trp Leu Gly Glu Gln Leu Val Cys Trp Gln Ala Gly Thr Thr
325 330 335Pro Trp Asn Ile Phe Pro Val
Ile Ser Leu Tyr Leu Met Gly Glu Val 340 345
350Thr Asn Gln Ser Phe Arg Ile Thr Ile Leu Pro Gln Gln Tyr
Leu Arg 355 360 365Pro Val Glu Asp
Val Ala Thr Ser Gln Asp Asp Cys Tyr Lys Phe Ala 370
375 380Ile Ser Gln Ser Ser Thr Gly Thr Val Met Gly Ala
Val Ile Met Glu385 390 395
400Gly Phe Tyr Val Val Phe Asp Arg Ala Arg Lys Arg Ile Gly Phe Ala
405 410 415Val Ser Ala Cys His
Val His Asp Glu Phe Arg Thr Ala Ala Val Glu 420
425 430Gly Pro Phe Val Thr Leu Asp Met Glu Asp Cys Gly
Tyr Asn Ile Pro 435 440 445Gln Thr
Asp Glu Ser Thr Leu Met Thr Ile Ala Tyr Val Met Ala Ala 450
455 460Ile Cys Ala Leu Phe Met Leu Pro Leu Cys Leu
Met Val Cys Gln Trp465 470 475
480Arg Cys Leu Arg Cys Leu Arg Gln Gln His Asp Asp Phe Ala Asp Asp
485 490 495Ile Ser Leu Leu
Lys 500

SEQ ID NO: 105 (length 18, DNA, Artificial; Synthetic Sequence)
gaa gta aag atg gat gca   18
Glu Val Lys Met Asp Ala
1               5

SEQ ID NO: 106 (length 6, PRT, Artificial; Synthetic Sequence)
Glu Val Lys Met Asp Ala
1               5

SEQ ID NO: 107 (length 834, DNA, Homo sapiens; gene (1)..(834): Gene for Caspase 3)
atg gag aac act gaa aac tca gtg gat tca aaa tcc att aaa
aat ttg 48Met Glu Asn Thr Glu Asn Ser Val Asp Ser Lys Ser Ile Lys
Asn Leu1 5 10 15gaa cca
aag atc ata cat gga agc gaa tca atg gac tct gga ata tcc 96Glu Pro
Lys Ile Ile His Gly Ser Glu Ser Met Asp Ser Gly Ile Ser 20
25 30ctg gac aac agt tat aaa atg gat tat
cct gag atg ggt tta tgt ata 144Leu Asp Asn Ser Tyr Lys Met Asp Tyr
Pro Glu Met Gly Leu Cys Ile 35 40
45ata att aat aat aag aat ttt cat aaa agc act gga atg aca tct cgg
192Ile Ile Asn Asn Lys Asn Phe His Lys Ser Thr Gly Met Thr Ser Arg 50
55 60tct ggt aca gat gtc gat gca gca aac
ctc agg gaa aca ttc aga aac 240Ser Gly Thr Asp Val Asp Ala Ala Asn
Leu Arg Glu Thr Phe Arg Asn65 70 75
80ttg aaa tat gaa gtc agg aat aaa aat gat ctt aca cgt gaa
gaa att 288Leu Lys Tyr Glu Val Arg Asn Lys Asn Asp Leu Thr Arg Glu
Glu Ile 85 90 95gtg gaa
ttg atg cgt gat gtt tct aaa gaa gat cac agc aaa agg agc 336Val Glu
Leu Met Arg Asp Val Ser Lys Glu Asp His Ser Lys Arg Ser 100
105 110agt ttt gtt tgt gtg ctt ctg agc cat
ggt gaa gaa gga ata att ttt 384Ser Phe Val Cys Val Leu Leu Ser His
Gly Glu Glu Gly Ile Ile Phe 115 120
125gga aca aat gga cct gtt gac ctg aaa aaa ata aca aac ttt ttc aga
432Gly Thr Asn Gly Pro Val Asp Leu Lys Lys Ile Thr Asn Phe Phe Arg 130
135 140ggg gat cgt tgt aga agt cta act
gga aaa ccc aaa ctt ttc att att 480Gly Asp Arg Cys Arg Ser Leu Thr
Gly Lys Pro Lys Leu Phe Ile Ile145 150
155 160cag gcc tgc cgt ggt aca gaa ctg gac tgt ggc att
gag aca gac agt 528Gln Ala Cys Arg Gly Thr Glu Leu Asp Cys Gly Ile
Glu Thr Asp Ser 165 170
175ggt gtt gat gat gac atg gcg tgt cat aaa ata cca gtg gag gcc gac
576Gly Val Asp Asp Asp Met Ala Cys His Lys Ile Pro Val Glu Ala Asp
180 185 190ttc ttg tat gca tac tcc
aca gca cct ggt tat tat tct tgg cga aat 624Phe Leu Tyr Ala Tyr Ser
Thr Ala Pro Gly Tyr Tyr Ser Trp Arg Asn 195 200
205tca aag gat ggc tcc tgg ttc atc cag tcg ctt tgt gcc atg
ctg aaa 672Ser Lys Asp Gly Ser Trp Phe Ile Gln Ser Leu Cys Ala Met
Leu Lys 210 215 220cag tat gcc gac aag
ctt gaa ttt atg cac att ctt acc cgg gtt aac 720Gln Tyr Ala Asp Lys
Leu Glu Phe Met His Ile Leu Thr Arg Val Asn225 230
235 240cga aag gtg gca aca gaa ttt gag tcc ttt
tcc ttt gac gct act ttt 768Arg Lys Val Ala Thr Glu Phe Glu Ser Phe
Ser Phe Asp Ala Thr Phe 245 250
255cat gca aag aaa cag att cca tgt att gtt tcc atg ctc aca aaa gaa
816His Ala Lys Lys Gln Ile Pro Cys Ile Val Ser Met Leu Thr Lys Glu
260 265 270ctc tat ttt tat cac taa
834Leu Tyr Phe Tyr His
275

SEQ ID NO: 108 (length 277, PRT, Homo sapiens)
Met Glu Asn Thr Glu Asn Ser Val Asp Ser Lys
Ser Ile Lys Asn Leu1 5 10
15Glu Pro Lys Ile Ile His Gly Ser Glu Ser Met Asp Ser Gly Ile Ser
20 25 30Leu Asp Asn Ser Tyr Lys Met
Asp Tyr Pro Glu Met Gly Leu Cys Ile 35 40
45Ile Ile Asn Asn Lys Asn Phe His Lys Ser Thr Gly Met Thr Ser
Arg 50 55 60Ser Gly Thr Asp Val Asp
Ala Ala Asn Leu Arg Glu Thr Phe Arg Asn65 70
75 80Leu Lys Tyr Glu Val Arg Asn Lys Asn Asp Leu
Thr Arg Glu Glu Ile 85 90
95Val Glu Leu Met Arg Asp Val Ser Lys Glu Asp His Ser Lys Arg Ser
100 105 110Ser Phe Val Cys Val Leu
Leu Ser His Gly Glu Glu Gly Ile Ile Phe 115 120
125Gly Thr Asn Gly Pro Val Asp Leu Lys Lys Ile Thr Asn Phe
Phe Arg 130 135 140Gly Asp Arg Cys Arg
Ser Leu Thr Gly Lys Pro Lys Leu Phe Ile Ile145 150
155 160Gln Ala Cys Arg Gly Thr Glu Leu Asp Cys
Gly Ile Glu Thr Asp Ser 165 170
175Gly Val Asp Asp Asp Met Ala Cys His Lys Ile Pro Val Glu Ala Asp
180 185 190Phe Leu Tyr Ala Tyr
Ser Thr Ala Pro Gly Tyr Tyr Ser Trp Arg Asn 195
200 205Ser Lys Asp Gly Ser Trp Phe Ile Gln Ser Leu Cys
Ala Met Leu Lys 210 215 220Gln Tyr Ala
Asp Lys Leu Glu Phe Met His Ile Leu Thr Arg Val Asn225
230 235 240Arg Lys Val Ala Thr Glu Phe
Glu Ser Phe Ser Phe Asp Ala Thr Phe 245
250 255His Ala Lys Lys Gln Ile Pro Cys Ile Val Ser Met
Leu Thr Lys Glu 260 265 270Leu
Tyr Phe Tyr His 275

SEQ ID NO: 109 (length 441, DNA, Homo sapiens; CDS (1)..(441): Caspase 3, large subunit)
tct gga ata tcc ctg gac aac agt tat aaa atg gat tat cct
gag atg 48Ser Gly Ile Ser Leu Asp Asn Ser Tyr Lys Met Asp Tyr Pro
Glu Met1 5 10 15ggt tta
tgt ata ata att aat aat aag aat ttt cat aaa agc act gga 96Gly Leu
Cys Ile Ile Ile Asn Asn Lys Asn Phe His Lys Ser Thr Gly 20
25 30atg aca tct cgg tct ggt aca gat gtc
gat gca gca aac ctc agg gaa 144Met Thr Ser Arg Ser Gly Thr Asp Val
Asp Ala Ala Asn Leu Arg Glu 35 40
45aca ttc aga aac ttg aaa tat gaa gtc agg aat aaa aat gat ctt aca
192Thr Phe Arg Asn Leu Lys Tyr Glu Val Arg Asn Lys Asn Asp Leu Thr 50
55 60cgt gaa gaa att gtg gaa ttg atg cgt
gat gtt tct aaa gaa gat cac 240Arg Glu Glu Ile Val Glu Leu Met Arg
Asp Val Ser Lys Glu Asp His65 70 75
80agc aaa agg agc agt ttt gtt tgt gtg ctt ctg agc cat ggt
gaa gaa 288Ser Lys Arg Ser Ser Phe Val Cys Val Leu Leu Ser His Gly
Glu Glu 85 90 95gga ata
att ttt gga aca aat gga cct gtt gac ctg aaa aaa ata aca 336Gly Ile
Ile Phe Gly Thr Asn Gly Pro Val Asp Leu Lys Lys Ile Thr 100
105 110aac ttt ttc aga ggg gat cgt tgt aga
agt cta act gga aaa ccc aaa 384Asn Phe Phe Arg Gly Asp Arg Cys Arg
Ser Leu Thr Gly Lys Pro Lys 115 120
125ctt ttc att att cag gcc tgc cgt ggt aca gaa ctg gac tgt ggc att
432Leu Phe Ile Ile Gln Ala Cys Arg Gly Thr Glu Leu Asp Cys Gly Ile 130
135 140gag aca gac
441Glu Thr Asp145

SEQ ID NO: 110 (length 147, PRT, Homo sapiens)
Ser Gly Ile Ser Leu Asp Asn Ser Tyr Lys Met Asp Tyr Pro Glu Met1
5 10 15Gly Leu Cys Ile Ile Ile
Asn Asn Lys Asn Phe His Lys Ser Thr Gly 20 25
30Met Thr Ser Arg Ser Gly Thr Asp Val Asp Ala Ala Asn
Leu Arg Glu 35 40 45Thr Phe Arg
Asn Leu Lys Tyr Glu Val Arg Asn Lys Asn Asp Leu Thr 50
55 60Arg Glu Glu Ile Val Glu Leu Met Arg Asp Val Ser
Lys Glu Asp His65 70 75
80Ser Lys Arg Ser Ser Phe Val Cys Val Leu Leu Ser His Gly Glu Glu
85 90 95Gly Ile Ile Phe Gly Thr
Asn Gly Pro Val Asp Leu Lys Lys Ile Thr 100
105 110Asn Phe Phe Arg Gly Asp Arg Cys Arg Ser Leu Thr
Gly Lys Pro Lys 115 120 125Leu Phe
Ile Ile Gln Ala Cys Arg Gly Thr Glu Leu Asp Cys Gly Ile 130
135 140Glu Thr Asp145

SEQ ID NO: 111 (length 306, DNA, Homo sapiens; CDS (1)..(306): Caspase 3, small subunit)
agt ggt gtt gat gat gac
atg gcg tgt cat aaa ata cca gtg gag gcc 48Ser Gly Val Asp Asp Asp
Met Ala Cys His Lys Ile Pro Val Glu Ala1 5
10 15gac ttc ttg tat gca tac tcc aca gca cct ggt tat
tat tct tgg cga 96Asp Phe Leu Tyr Ala Tyr Ser Thr Ala Pro Gly Tyr
Tyr Ser Trp Arg 20 25 30aat
tca aag gat ggc tcc tgg ttc atc cag tcg ctt tgt gcc atg ctg 144Asn
Ser Lys Asp Gly Ser Trp Phe Ile Gln Ser Leu Cys Ala Met Leu 35
40 45aaa cag tat gcc gac aag ctt gaa ttt
atg cac att ctt acc cgg gtt 192Lys Gln Tyr Ala Asp Lys Leu Glu Phe
Met His Ile Leu Thr Arg Val 50 55
60aac cga aag gtg gca aca gaa ttt gag tcc ttt tcc ttt gac gct act
240Asn Arg Lys Val Ala Thr Glu Phe Glu Ser Phe Ser Phe Asp Ala Thr65
70 75 80ttt cat gca aag aaa
cag att cca tgt att gtt tcc atg ctc aca aaa 288Phe His Ala Lys Lys
Gln Ile Pro Cys Ile Val Ser Met Leu Thr Lys 85
90 95gaa ctc tat ttt tat cac
306Glu Leu Tyr Phe Tyr His
100

SEQ ID NO: 112 (length 102, PRT, Homo sapiens)
Ser Gly Val Asp Asp Asp Met Ala Cys His Lys
Ile Pro Val Glu Ala1 5 10
15Asp Phe Leu Tyr Ala Tyr Ser Thr Ala Pro Gly Tyr Tyr Ser Trp Arg
20 25 30Asn Ser Lys Asp Gly Ser Trp
Phe Ile Gln Ser Leu Cys Ala Met Leu 35 40
45Lys Gln Tyr Ala Asp Lys Leu Glu Phe Met His Ile Leu Thr Arg
Val 50 55 60Asn Arg Lys Val Ala Thr
Glu Phe Glu Ser Phe Ser Phe Asp Ala Thr65 70
75 80Phe His Ala Lys Lys Gln Ile Pro Cys Ile Val
Ser Met Leu Thr Lys 85 90
95Glu Leu Tyr Phe Tyr His 100

SEQ ID NO: 113 (length 12, DNA, Artificial; Synthetic Sequence)
gar agy atg gay   12
Glu Xaa Met Asp
1

SEQ ID NO: 114 (length 4, PRT, Artificial; misc_feature (2)..(2): The 'Xaa' at location 2 stands for Ser.)
Glu Xaa Met Asp
1

SEQ ID NO: 115 (length 1869, DNA, Homo sapiens; gene (1)..(1869): Gene for Homo sapiens coagulation factor II (thrombin))
atg gcg cac gtc cga ggc ttg cag ctg cct ggc tgc ctg gcc ctg
gct 48Met Ala His Val Arg Gly Leu Gln Leu Pro Gly Cys Leu Ala Leu
Ala1 5 10 15gcc ctg tgt
agc ctt gtg cac agc cag cat gtg ttc ctg gct cct cag 96Ala Leu Cys
Ser Leu Val His Ser Gln His Val Phe Leu Ala Pro Gln 20
25 30caa gca cgg tcg ctg ctc cag cgg gtc cgg
cga gcc aac acc ttc ttg 144Gln Ala Arg Ser Leu Leu Gln Arg Val Arg
Arg Ala Asn Thr Phe Leu 35 40
45gag gag gtg cgc aag ggc aac cta gag cga gag tgc gtg gag gag acg
192Glu Glu Val Arg Lys Gly Asn Leu Glu Arg Glu Cys Val Glu Glu Thr 50
55 60tgc agc tac gag gag gcc ttc gag gct
ctg gag tcc tcc acg gct acg 240Cys Ser Tyr Glu Glu Ala Phe Glu Ala
Leu Glu Ser Ser Thr Ala Thr65 70 75
80gat gtg ttc tgg gcc aag tac aca gct tgt gag aca gcg agg
acg cct 288Asp Val Phe Trp Ala Lys Tyr Thr Ala Cys Glu Thr Ala Arg
Thr Pro 85 90 95cga gat
aag ctt gct gca tgt ctg gaa ggt aac tgt gct gag ggt ctg 336Arg Asp
Lys Leu Ala Ala Cys Leu Glu Gly Asn Cys Ala Glu Gly Leu 100
105 110ggt acg aac tac cga ggg cat gtg aac
atc acc cgg tca ggc att gag 384Gly Thr Asn Tyr Arg Gly His Val Asn
Ile Thr Arg Ser Gly Ile Glu 115 120
125tgc cag cta tgg agg agt cgc tac cca cat aag cct gaa atc aac tcc
432Cys Gln Leu Trp Arg Ser Arg Tyr Pro His Lys Pro Glu Ile Asn Ser 130
135 140act acc cat cct ggg gcc gac cta
cag gag aat ttc tgc cgc aac ccc 480Thr Thr His Pro Gly Ala Asp Leu
Gln Glu Asn Phe Cys Arg Asn Pro145 150
155 160gac agc agc acc acg gga ccc tgg tgc tac act aca
gac ccc acc gtg 528Asp Ser Ser Thr Thr Gly Pro Trp Cys Tyr Thr Thr
Asp Pro Thr Val 165 170
175agg agg cag gaa tgc agc atc cct gtc tgt ggc cag gat caa gtc act
576Arg Arg Gln Glu Cys Ser Ile Pro Val Cys Gly Gln Asp Gln Val Thr
180 185 190gta gcg atg act cca cgc
tcc gaa ggc tcc agt gtg aat ctg tca cct 624Val Ala Met Thr Pro Arg
Ser Glu Gly Ser Ser Val Asn Leu Ser Pro 195 200
205cca ttg gag cag tgt gtc cct gat cgg ggg cag cag tac cag
ggg cgc 672Pro Leu Glu Gln Cys Val Pro Asp Arg Gly Gln Gln Tyr Gln
Gly Arg 210 215 220ctg gcg gtg acc aca
cat ggg ctc ccc tgc ctg gcc tgg gcc agc gca 720Leu Ala Val Thr Thr
His Gly Leu Pro Cys Leu Ala Trp Ala Ser Ala225 230
235 240cag gcc aag gcc ctg agc aag cac cag gac
ttc aac tca gct gtg cag 768Gln Ala Lys Ala Leu Ser Lys His Gln Asp
Phe Asn Ser Ala Val Gln 245 250
255ctg gtg gag aac ttc tgc cgc aac cca gac ggg gat gag gag ggc gtg
816Leu Val Glu Asn Phe Cys Arg Asn Pro Asp Gly Asp Glu Glu Gly Val
260 265 270tgg tgc tat gtg gcc ggg
aag cct ggc gac ttt ggg tac tgc gac ctc 864Trp Cys Tyr Val Ala Gly
Lys Pro Gly Asp Phe Gly Tyr Cys Asp Leu 275 280
285aac tat tgt gag gag gcc gtg gag gag gag aca gga gat ggg
ctg gat 912Asn Tyr Cys Glu Glu Ala Val Glu Glu Glu Thr Gly Asp Gly
Leu Asp 290 295 300gag gac tca gac agg
gcc atc gaa ggg cgt acc gcc acc agt gag tac 960Glu Asp Ser Asp Arg
Ala Ile Glu Gly Arg Thr Ala Thr Ser Glu Tyr305 310
315 320cag act ttc ttc aat ccg agg acc ttt ggc
tcg gga gag gca gac tgt 1008Gln Thr Phe Phe Asn Pro Arg Thr Phe Gly
Ser Gly Glu Ala Asp Cys 325 330
335ggg ctg cga cct ctg ttc gag aag aag tcg ctg gag gac aaa acc gaa
1056Gly Leu Arg Pro Leu Phe Glu Lys Lys Ser Leu Glu Asp Lys Thr Glu
340 345 350aga gag ctc ctg gaa tcc
tac atc gac ggg cgc att gtg gag ggc tcg 1104Arg Glu Leu Leu Glu Ser
Tyr Ile Asp Gly Arg Ile Val Glu Gly Ser 355 360
365gat gca gag atc ggc atg tca cct tgg cag gtg atg ctt ttc
cgg aag 1152Asp Ala Glu Ile Gly Met Ser Pro Trp Gln Val Met Leu Phe
Arg Lys 370 375 380agt ccc cag gag ctg
ctg tgt ggg gcc agc ctc atc agt gac cgc tgg 1200Ser Pro Gln Glu Leu
Leu Cys Gly Ala Ser Leu Ile Ser Asp Arg Trp385 390
395 400gtc ctc acc gcc gcc cac tgc ctc ctg tac
ccg ccc tgg gac aag aac 1248Val Leu Thr Ala Ala His Cys Leu Leu Tyr
Pro Pro Trp Asp Lys Asn 405 410
415ttc acc gag aat gac ctt ctg gtg cgc att ggc aag cac tcc cgc aca
1296Phe Thr Glu Asn Asp Leu Leu Val Arg Ile Gly Lys His Ser Arg Thr
420 425 430agg tac gag cga aac att
gaa aag ata tcc atg ttg gaa aag atc tac 1344Arg Tyr Glu Arg Asn Ile
Glu Lys Ile Ser Met Leu Glu Lys Ile Tyr 435 440
445atc cac ccc agg tac aac tgg cgg gag aac ctg gac cgg gac
att gcc 1392Ile His Pro Arg Tyr Asn Trp Arg Glu Asn Leu Asp Arg Asp
Ile Ala 450 455 460ctg atg aag ctg aag
aag cct gtt gcc ttc agt gac tac att cac cct 1440Leu Met Lys Leu Lys
Lys Pro Val Ala Phe Ser Asp Tyr Ile His Pro465 470
475 480gtg tgt ctg ccc gac agg gag acg gca gcc
agc ttg ctc cag gct gga 1488Val Cys Leu Pro Asp Arg Glu Thr Ala Ala
Ser Leu Leu Gln Ala Gly 485 490
495tac aag ggg cgg gtg aca ggc tgg ggc aac ctg aag gag acg tgg aca
1536Tyr Lys Gly Arg Val Thr Gly Trp Gly Asn Leu Lys Glu Thr Trp Thr
500 505 510gcc aac gtt ggt aag ggg
cag ccc agt gtc ctg cag gtg gtg aac ctg 1584Ala Asn Val Gly Lys Gly
Gln Pro Ser Val Leu Gln Val Val Asn Leu 515 520
525ccc att gtg gag cgg ccg gtc tgc aag gac tcc acc cgg atc
cgc atc 1632Pro Ile Val Glu Arg Pro Val Cys Lys Asp Ser Thr Arg Ile
Arg Ile 530 535 540act gac aac atg ttc
tgt gct ggt tac aag cct gat gaa ggg aaa cga 1680Thr Asp Asn Met Phe
Cys Ala Gly Tyr Lys Pro Asp Glu Gly Lys Arg545 550
555 560ggg gat gcc tgt gaa ggt gac agt ggg gga
ccc ttt gtc atg aag agc 1728Gly Asp Ala Cys Glu Gly Asp Ser Gly Gly
Pro Phe Val Met Lys Ser 565 570
575ccc ttt aac aac cgc tgg tat caa atg ggc atc gtc tca tgg ggt gaa
1776Pro Phe Asn Asn Arg Trp Tyr Gln Met Gly Ile Val Ser Trp Gly Glu
580 585 590ggc tgt gac cgg gat ggg
aaa tat ggc ttc tac aca cat gtg ttc cgc 1824Gly Cys Asp Arg Asp Gly
Lys Tyr Gly Phe Tyr Thr His Val Phe Arg 595 600
605ctg aag aag tgg ata cag aag gtc att gat cag ttt gga gag
tag 1869Leu Lys Lys Trp Ile Gln Lys Val Ile Asp Gln Phe Gly Glu
610 615 620

SEQ ID NO: 116 (length 622, PRT, Homo sapiens)
Met
Ala His Val Arg Gly Leu Gln Leu Pro Gly Cys Leu Ala Leu Ala1
5 10 15Ala Leu Cys Ser Leu Val His
Ser Gln His Val Phe Leu Ala Pro Gln 20 25
30Gln Ala Arg Ser Leu Leu Gln Arg Val Arg Arg Ala Asn Thr
Phe Leu 35 40 45Glu Glu Val Arg
Lys Gly Asn Leu Glu Arg Glu Cys Val Glu Glu Thr 50 55
60Cys Ser Tyr Glu Glu Ala Phe Glu Ala Leu Glu Ser Ser
Thr Ala Thr65 70 75
80Asp Val Phe Trp Ala Lys Tyr Thr Ala Cys Glu Thr Ala Arg Thr Pro
85 90 95Arg Asp Lys Leu Ala Ala
Cys Leu Glu Gly Asn Cys Ala Glu Gly Leu 100
105 110Gly Thr Asn Tyr Arg Gly His Val Asn Ile Thr Arg
Ser Gly Ile Glu 115 120 125Cys Gln
Leu Trp Arg Ser Arg Tyr Pro His Lys Pro Glu Ile Asn Ser 130
135 140Thr Thr His Pro Gly Ala Asp Leu Gln Glu Asn
Phe Cys Arg Asn Pro145 150 155
160Asp Ser Ser Thr Thr Gly Pro Trp Cys Tyr Thr Thr Asp Pro Thr Val
165 170 175Arg Arg Gln Glu
Cys Ser Ile Pro Val Cys Gly Gln Asp Gln Val Thr 180
185 190Val Ala Met Thr Pro Arg Ser Glu Gly Ser Ser
Val Asn Leu Ser Pro 195 200 205Pro
Leu Glu Gln Cys Val Pro Asp Arg Gly Gln Gln Tyr Gln Gly Arg 210
215 220Leu Ala Val Thr Thr His Gly Leu Pro Cys
Leu Ala Trp Ala Ser Ala225 230 235
240Gln Ala Lys Ala Leu Ser Lys His Gln Asp Phe Asn Ser Ala Val
Gln 245 250 255Leu Val Glu
Asn Phe Cys Arg Asn Pro Asp Gly Asp Glu Glu Gly Val 260
265 270Trp Cys Tyr Val Ala Gly Lys Pro Gly Asp
Phe Gly Tyr Cys Asp Leu 275 280
285Asn Tyr Cys Glu Glu Ala Val Glu Glu Glu Thr Gly Asp Gly Leu Asp 290
295 300Glu Asp Ser Asp Arg Ala Ile Glu
Gly Arg Thr Ala Thr Ser Glu Tyr305 310
315 320Gln Thr Phe Phe Asn Pro Arg Thr Phe Gly Ser Gly
Glu Ala Asp Cys 325 330
335Gly Leu Arg Pro Leu Phe Glu Lys Lys Ser Leu Glu Asp Lys Thr Glu
340 345 350Arg Glu Leu Leu Glu Ser
Tyr Ile Asp Gly Arg Ile Val Glu Gly Ser 355 360
365Asp Ala Glu Ile Gly Met Ser Pro Trp Gln Val Met Leu Phe
Arg Lys 370 375 380Ser Pro Gln Glu Leu
Leu Cys Gly Ala Ser Leu Ile Ser Asp Arg Trp385 390
395 400Val Leu Thr Ala Ala His Cys Leu Leu Tyr
Pro Pro Trp Asp Lys Asn 405 410
415Phe Thr Glu Asn Asp Leu Leu Val Arg Ile Gly Lys His Ser Arg Thr
420 425 430Arg Tyr Glu Arg Asn
Ile Glu Lys Ile Ser Met Leu Glu Lys Ile Tyr 435
440 445Ile His Pro Arg Tyr Asn Trp Arg Glu Asn Leu Asp
Arg Asp Ile Ala 450 455 460Leu Met Lys
Leu Lys Lys Pro Val Ala Phe Ser Asp Tyr Ile His Pro465
470 475 480Val Cys Leu Pro Asp Arg Glu
Thr Ala Ala Ser Leu Leu Gln Ala Gly 485
490 495Tyr Lys Gly Arg Val Thr Gly Trp Gly Asn Leu Lys
Glu Thr Trp Thr 500 505 510Ala
Asn Val Gly Lys Gly Gln Pro Ser Val Leu Gln Val Val Asn Leu 515
520 525Pro Ile Val Glu Arg Pro Val Cys Lys
Asp Ser Thr Arg Ile Arg Ile 530 535
540Thr Asp Asn Met Phe Cys Ala Gly Tyr Lys Pro Asp Glu Gly Lys Arg545
550 555 560Gly Asp Ala Cys
Glu Gly Asp Ser Gly Gly Pro Phe Val Met Lys Ser 565
570 575Pro Phe Asn Asn Arg Trp Tyr Gln Met Gly
Ile Val Ser Trp Gly Glu 580 585
590Gly Cys Asp Arg Asp Gly Lys Tyr Gly Phe Tyr Thr His Val Phe Arg
595 600 605Leu Lys Lys Trp Ile Gln Lys
Val Ile Asp Gln Phe Gly Glu 610 615
620

SEQ ID NO: 117 (length 21, DNA, Artificial; Synthetic Sequence)
ccc ggg ccc cgg ggc ccg cct   21
Pro Gly Pro Arg Gly Pro Pro
1               5

SEQ ID NO: 118 (length 7, PRT, Artificial; Synthetic Sequence)
Pro Gly Pro Arg Gly Pro Pro
1               5

SEQ ID NO: 119 (length 1878, DNA, Homo sapiens; gene (1)..(1878): Gene for Human factor XI (blood coagulation factor))
atg att ttc tta tat caa gtg
gta cat ttc att tta ttt act tca gtt 48Met Ile Phe Leu Tyr Gln Val
Val His Phe Ile Leu Phe Thr Ser Val1 5 10
15tct ggt gaa tgt gtg act cag ttg ttg aag gac acc tgc
ttt gaa gga 96Ser Gly Glu Cys Val Thr Gln Leu Leu Lys Asp Thr Cys
Phe Glu Gly 20 25 30ggg gac
att act acg gtc ttc aca cca agc gcc aag tac tgc cag gta 144
Gly Asp Ile Thr Thr Val Phe Thr Pro Ser Ala Lys Tyr Cys Gln Val
35 40 45
gtc tgc act tac cac cca aga tgt tta ctc ttc act ttc acg gcg gaa 192
Val Cys Thr Tyr His Pro Arg Cys Leu Leu Phe Thr Phe Thr Ala Glu
50 55 60
tca cca tct gag gat ccc acc cga tgg ttt act tgt gtc ctg aaa gac 240
Ser Pro Ser Glu Asp Pro Thr Arg Trp Phe Thr Cys Val Leu Lys Asp
65 70 75 80
agt gtt aca gaa aca ctg cca aga gtg aat agg aca gca gcg att tct 288
Ser Val Thr Glu Thr Leu Pro Arg Val Asn Arg Thr Ala Ala Ile Ser
85 90 95
ggg tat tct ttc aag caa tgc tca cac caa ata agc gct tgc aac aaa 336
Gly Tyr Ser Phe Lys Gln Cys Ser His Gln Ile Ser Ala Cys Asn Lys
100 105 110
gac att tat gtg gac cta gac atg aag ggc ata aac tat aac agc tca 384
Asp Ile Tyr Val Asp Leu Asp Met Lys Gly Ile Asn Tyr Asn Ser Ser
115 120 125
gtt gcc aag agt gct caa gaa tgc caa gaa aga tgc acg gat gac gtc 432
Val Ala Lys Ser Ala Gln Glu Cys Gln Glu Arg Cys Thr Asp Asp Val
130 135 140
cac tgc cac ttt ttc acg tac gcc aca agg cag ttt ccc agc ctg gag 480
His Cys His Phe Phe Thr Tyr Ala Thr Arg Gln Phe Pro Ser Leu Glu
145 150 155 160
cat cgt aac att tgt cta ctg aag cac acc caa aca ggg aca cca acc 528
His Arg Asn Ile Cys Leu Leu Lys His Thr Gln Thr Gly Thr Pro Thr
165 170 175
aga ata acg aag ctc gat aaa gtg gtg tct gga ttt tca ctg aaa tcc 576
Arg Ile Thr Lys Leu Asp Lys Val Val Ser Gly Phe Ser Leu Lys Ser
180 185 190
tgt gca ctt tct aat ctg gct tgt att agg gac att ttc cct aat acg 624
Cys Ala Leu Ser Asn Leu Ala Cys Ile Arg Asp Ile Phe Pro Asn Thr
195 200 205
gtg ttt gca gac agc aac atc gac agt gtc atg gct ccc gat gct ttt 672
Val Phe Ala Asp Ser Asn Ile Asp Ser Val Met Ala Pro Asp Ala Phe
210 215 220
gtc tgt ggc cga atc tgc act cat cat ccc ggt tgc ttg ttt ttt acc 720
Val Cys Gly Arg Ile Cys Thr His His Pro Gly Cys Leu Phe Phe Thr
225 230 235 240
ttc ttt tcc cag gaa tgg ccc aaa gaa tct caa aga aat ctt tgt ctc 768
Phe Phe Ser Gln Glu Trp Pro Lys Glu Ser Gln Arg Asn Leu Cys Leu
245 250 255
ctt aaa aca tct gag agt gga ttg ccc agt aca cgc att aaa aag agc 816
Leu Lys Thr Ser Glu Ser Gly Leu Pro Ser Thr Arg Ile Lys Lys Ser
260 265 270
aaa gct ctt tct ggt ttc agt cta caa agc tgc agg cac agc atc cca 864
Lys Ala Leu Ser Gly Phe Ser Leu Gln Ser Cys Arg His Ser Ile Pro
275 280 285
gtg ttc tgc cat tct tca ttt tac cat gac act gat ttc ttg gga gaa 912
Val Phe Cys His Ser Ser Phe Tyr His Asp Thr Asp Phe Leu Gly Glu
290 295 300
gaa ctg gat att gtt gct gca aaa agt cac gag gcc tgc cag aaa ctg 960
Glu Leu Asp Ile Val Ala Ala Lys Ser His Glu Ala Cys Gln Lys Leu
305 310 315 320
tgc acc aat gcc gtc cgc tgc cag ttt ttt acc tat acc cca gcc caa 1008
Cys Thr Asn Ala Val Arg Cys Gln Phe Phe Thr Tyr Thr Pro Ala Gln
325 330 335
gca tcc tgc aac gaa ggg aag ggc aag tgt tac tta aag ctt tct tca 1056
Ala Ser Cys Asn Glu Gly Lys Gly Lys Cys Tyr Leu Lys Leu Ser Ser
340 345 350
aac gga tct cca act aaa ata ctt cac ggg aga gga ggc atc tct gga 1104
Asn Gly Ser Pro Thr Lys Ile Leu His Gly Arg Gly Gly Ile Ser Gly
355 360 365
tac aca tta agg ttg tgt aaa atg gat aat gag tgt acc acc aaa atc 1152
Tyr Thr Leu Arg Leu Cys Lys Met Asp Asn Glu Cys Thr Thr Lys Ile
370 375 380
aag ccc agg atc gtt gga gga act gcg tct gtt cgt ggt gag tgg ccg 1200
Lys Pro Arg Ile Val Gly Gly Thr Ala Ser Val Arg Gly Glu Trp Pro
385 390 395 400
tgg cag gtg acc ctg cac aca acc tca ccc act cag aga cac ctg tgt 1248
Trp Gln Val Thr Leu His Thr Thr Ser Pro Thr Gln Arg His Leu Cys
405 410 415
gga ggc tcc atc att gga aac cag tgg ata tta aca gcc gct cac tgt 1296
Gly Gly Ser Ile Ile Gly Asn Gln Trp Ile Leu Thr Ala Ala His Cys
420 425 430
ttc tat ggg gta gag tca cct aag att ttg cgt gtc tac agt ggc att 1344
Phe Tyr Gly Val Glu Ser Pro Lys Ile Leu Arg Val Tyr Ser Gly Ile
435 440 445
tta aat caa tct gaa ata aaa gag gac aca tct ttc ttt ggg gtt caa 1392
Leu Asn Gln Ser Glu Ile Lys Glu Asp Thr Ser Phe Phe Gly Val Gln
450 455 460
gaa ata ata atc cat gat cag tat aaa atg gca gaa agc ggg tat gat 1440
Glu Ile Ile Ile His Asp Gln Tyr Lys Met Ala Glu Ser Gly Tyr Asp
465 470 475 480
att gcc ttg ttg aaa ctg gaa acc aca gtg aat tac aca gat tct caa 1488
Ile Ala Leu Leu Lys Leu Glu Thr Thr Val Asn Tyr Thr Asp Ser Gln
485 490 495
cga ccc ata tgc ctg cct tcc aaa gga gat aga aat gta ata tac act 1536
Arg Pro Ile Cys Leu Pro Ser Lys Gly Asp Arg Asn Val Ile Tyr Thr
500 505 510
gat tgc tgg gtg act gga tgg ggg tac aga aaa cta aga gac aaa ata 1584
Asp Cys Trp Val Thr Gly Trp Gly Tyr Arg Lys Leu Arg Asp Lys Ile
515 520 525
caa aat act ctc cag aaa gcc aag ata ccc tta gtg acc aac gaa gag 1632
Gln Asn Thr Leu Gln Lys Ala Lys Ile Pro Leu Val Thr Asn Glu Glu
530 535 540
tgc cag aag aga tac aga gga cat aaa ata acc cat aag atg atc tgt 1680
Cys Gln Lys Arg Tyr Arg Gly His Lys Ile Thr His Lys Met Ile Cys
545 550 555 560
gcc ggc tac agg gaa gga ggg aag gac gct tgc aag gga gat tcg gga 1728
Ala Gly Tyr Arg Glu Gly Gly Lys Asp Ala Cys Lys Gly Asp Ser Gly
565 570 575
ggc cct ctg tcc tgc aaa cac aat gag gtc tgg cat ctg gta ggc atc 1776
Gly Pro Leu Ser Cys Lys His Asn Glu Val Trp His Leu Val Gly Ile
580 585 590
acg agc tgg ggc gaa ggc tgt gct caa agg gag cgg cca ggt gtt tac 1824
Thr Ser Trp Gly Glu Gly Cys Ala Gln Arg Glu Arg Pro Gly Val Tyr
595 600 605
acc aac gtg gtc gag tac gtg gac tgg att ctg gag aaa act caa gca 1872
Thr Asn Val Val Glu Tyr Val Asp Trp Ile Leu Glu Lys Thr Gln Ala
610 615 620
gtg tga 1878
Val
625

SEQ ID NO: 120 (Length: 625; Type: PRT; Organism: Homo sapiens)
Met Ile Phe Leu Tyr Gln Val Val His Phe Ile Leu Phe Thr Ser Val
1 5 10 15
Ser Gly Glu Cys Val Thr Gln Leu Leu Lys Asp Thr Cys Phe Glu Gly
20 25 30
Gly Asp Ile Thr Thr Val Phe Thr Pro Ser Ala Lys Tyr Cys Gln Val
35 40 45
Val Cys Thr Tyr His Pro Arg Cys Leu Leu Phe Thr Phe Thr Ala Glu
50 55 60
Ser Pro Ser Glu Asp Pro Thr Arg Trp Phe Thr Cys Val Leu Lys Asp
65 70 75 80
Ser Val Thr Glu Thr Leu Pro Arg Val Asn Arg Thr Ala Ala Ile Ser
85 90 95
Gly Tyr Ser Phe Lys Gln Cys Ser His Gln Ile Ser Ala Cys Asn Lys
100 105 110
Asp Ile Tyr Val Asp Leu Asp Met Lys Gly Ile Asn Tyr Asn Ser Ser
115 120 125
Val Ala Lys Ser Ala Gln Glu Cys Gln Glu Arg Cys Thr Asp Asp Val
130 135 140
His Cys His Phe Phe Thr Tyr Ala Thr Arg Gln Phe Pro Ser Leu Glu
145 150 155 160
His Arg Asn Ile Cys Leu Leu Lys His Thr Gln Thr Gly Thr Pro Thr
165 170 175
Arg Ile Thr Lys Leu Asp Lys Val Val Ser Gly Phe Ser Leu Lys Ser
180 185 190
Cys Ala Leu Ser Asn Leu Ala Cys Ile Arg Asp Ile Phe Pro Asn Thr
195 200 205
Val Phe Ala Asp Ser Asn Ile Asp Ser Val Met Ala Pro Asp Ala Phe
210 215 220
Val Cys Gly Arg Ile Cys Thr His His Pro Gly Cys Leu Phe Phe Thr
225 230 235 240
Phe Phe Ser Gln Glu Trp Pro Lys Glu Ser Gln Arg Asn Leu Cys Leu
245 250 255
Leu Lys Thr Ser Glu Ser Gly Leu Pro Ser Thr Arg Ile Lys Lys Ser
260 265 270
Lys Ala Leu Ser Gly Phe Ser Leu Gln Ser Cys Arg His Ser Ile Pro
275 280 285
Val Phe Cys His Ser Ser Phe Tyr His Asp Thr Asp Phe Leu Gly Glu
290 295 300
Glu Leu Asp Ile Val Ala Ala Lys Ser His Glu Ala Cys Gln Lys Leu
305 310 315 320
Cys Thr Asn Ala Val Arg Cys Gln Phe Phe Thr Tyr Thr Pro Ala Gln
325 330 335
Ala Ser Cys Asn Glu Gly Lys Gly Lys Cys Tyr Leu Lys Leu Ser Ser
340 345 350
Asn Gly Ser Pro Thr Lys Ile Leu His Gly Arg Gly Gly Ile Ser Gly
355 360 365
Tyr Thr Leu Arg Leu Cys Lys Met Asp Asn Glu Cys Thr Thr Lys Ile
370 375 380
Lys Pro Arg Ile Val Gly Gly Thr Ala Ser Val Arg Gly Glu Trp Pro
385 390 395 400
Trp Gln Val Thr Leu His Thr Thr Ser Pro Thr Gln Arg His Leu Cys
405 410 415
Gly Gly Ser Ile Ile Gly Asn Gln Trp Ile Leu Thr Ala Ala His Cys
420 425 430
Phe Tyr Gly Val Glu Ser Pro Lys Ile Leu Arg Val Tyr Ser Gly Ile
435 440 445
Leu Asn Gln Ser Glu Ile Lys Glu Asp Thr Ser Phe Phe Gly Val Gln
450 455 460
Glu Ile Ile Ile His Asp Gln Tyr Lys Met Ala Glu Ser Gly Tyr Asp
465 470 475 480
Ile Ala Leu Leu Lys Leu Glu Thr Thr Val Asn Tyr Thr Asp Ser Gln
485 490 495
Arg Pro Ile Cys Leu Pro Ser Lys Gly Asp Arg Asn Val Ile Tyr Thr
500 505 510
Asp Cys Trp Val Thr Gly Trp Gly Tyr Arg Lys Leu Arg Asp Lys Ile
515 520 525
Gln Asn Thr Leu Gln Lys Ala Lys Ile Pro Leu Val Thr Asn Glu Glu
530 535 540
Cys Gln Lys Arg Tyr Arg Gly His Lys Ile Thr His Lys Met Ile Cys
545 550 555 560
Ala Gly Tyr Arg Glu Gly Gly Lys Asp Ala Cys Lys Gly Asp Ser Gly
565 570 575
Gly Pro Leu Ser Cys Lys His Asn Glu Val Trp His Leu Val Gly Ile
580 585 590
Thr Ser Trp Gly Glu Gly Cys Ala Gln Arg Glu Arg Pro Gly Val Tyr
595 600 605
Thr Asn Val Val Glu Tyr Val Asp Trp Ile Leu Glu Lys Thr Gln Ala
610 615 620
Val
625

SEQ ID NO: 121 (Length: 36; Type: DNA; Organism: Artificial; Synthetic Sequence)
ctcgaggtgc gcttccagat gagtaaagga gaagaa 36

SEQ ID NO: 122 (Length: 45; Type: DNA; Organism: Artificial; Synthetic Sequence)
ctcgagagac aggctaattt tttagggatg agtaaaggag aagaa 45

SEQ ID NO: 123 (Length: 24; Type: DNA; Organism: Artificial; Synthetic Sequence)
gagctcttat ttgtatagtt catc 24

SEQ ID NO: 124 (Length: 27; Type: DNA; Organism: Artificial; Synthetic Sequence)
ctcgagatga gtaaaggaga agaactt 27

SEQ ID NO: 125 (Length: 24; Type: DNA; Organism: Artificial; Synthetic Sequence)
gagctctttg tatagttcat ccat 24

SEQ ID NO: 126 (Length: 36; Type: DNA; Organism: Artificial; Synthetic Sequence)
gagctcgtgc gcttccagat ggtgcgctcc tccaag 36

SEQ ID NO: 127 (Length: 45; Type: DNA; Organism: Artificial; Synthetic Sequence)
gagctcagac aggctaattt tttagggatg gtgcgctcct ccaag 45

SEQ ID NO: 128 (Length: 24; Type: DNA; Organism: Artificial; Synthetic Sequence)
gagctcctac aggaacaggt ggtg 24

SEQ ID NO: 129 (Length: 36; Type: DNA; Organism: Artificial; Synthetic Sequence)
ctcgaggtgc gcttccaggg aaaaacttcg ggagcg 36

SEQ ID NO: 130 (Length: 45; Type: DNA; Organism: Artificial; Synthetic Sequence)
ctcgagagac aggctaattt tttaggggga aaaacttcgg gagcg 45

SEQ ID NO: 131 (Length: 24; Type: DNA; Organism: Artificial; Synthetic Sequence)
gagctcttat ttgtatagtt catc 24

SEQ ID NO: 132 (Length: 24; Type: DNA; Organism: Artificial; Synthetic Sequence)
ctcgagatgg tgcgctcctc caag 24

SEQ ID NO: 133 (Length: 24; Type: DNA; Organism: Artificial; Synthetic Sequence)
gagctcttat ttgtatagtt catc 24