Patent application title: VARIANTS AND EXONS OF THE GLYT1 TRANSPORTER
Inventors:
Gilbert Thill (Bois Colombes, FR)
Philippe Jais (Issy-Les-Moulineaux, FR)
Pascale Grel (Viry-Chatillon, FR)
Sandrine Mace (Jouy-En-Josas, FR)
Assignees:
Serono Genetics Institute S.A.
IPC8 Class: AA01K6700FI
USPC Class:
800 8
Class name: Multicellular living organisms and unmodified parts thereof and related processes nonhuman animal
Publication date: 2008-08-21
Patent application number: 20080201789
Claims:
1. An isolated, purified, or recombinant polynucleotide comprising: a) SEQ ID NOs:2-9 or 14-19 or 21, or a sequence complementary to any of these sequences; b) a nucleic acid sequence having at least about 95% identity to SEQ ID NOs:14-19 or 21 and encoding a functional glycine transporter; or c) a polynucleotide which encodes a polypeptide comprising SEQ ID NOs:26-31 or 33.
2. The polynucleotide of claim 1, wherein said polynucleotide is attached to a solid support.
3. The polynucleotide of claim 1, further comprising a label.
4. The polynucleotide of claim 1, wherein said polynucleotide is operably linked to a promoter.
5. An array of polynucleotides comprising the polynucleotide of claim 1.
6. The array of claim 5, wherein said array is addressable.
7. A recombinant vector comprising the polynucleotide of claim 1.
8. A host cell comprising the recombinant vector of claim 7.
9. A non-human host animal or mammal comprising the recombinant vector of claim 7.
10. An isolated, purified, or recombinant polypeptide comprising: a) SEQ ID NOs:26-33; or b) the polypeptide encoded by any of the nucleic acid sequences shown as SEQ ID NOs:2-9 or 14-21.
11. A method of producing a GlyT1 polypeptide, said method comprising the following steps: a) providing a host cell comprising a nucleic acid according to claim 1 operably linked to a promoter; b) cultivating said host cell under conditions conducive to the expression of said polypeptide; and c) isolating said polypeptide from said host cell.
12. An isolated or purified antibody capable of selectively binding to an epitope-containing fragment of a polypeptide according to claim 10.
13. A method of binding an anti-GlyT1 antibody to a polypeptide comprising contacting said antibody with said polypeptide according to claim 10 under conditions in which said antibody can specifically bind to said polypeptide.
14. A diagnostic kit comprising a polynucleotide according to claim 1.
15. A method of detecting the expression of a GlyT1 gene within a cell, said method comprising the steps of: a) contacting said cell or an extract from said cell with a polynucleotide that hybridizes under stringent conditions to a polynucleotide encoding GlyT1 or a compound that specifically binds to GlyT1; and b) detecting the presence or absence of hybridization between said polynucleotide and an RNA species within said cell or extract, or the presence or absence of binding of said compound to a protein within said cell or extract; wherein a detection of the presence of said hybridization or of said binding indicates that said GlyT1 gene is expressed within said cell.
16. The method of claim 15, wherein said polynucleotide is an oligonucleotide primer, and wherein said hybridization is detected by detecting the presence of an amplification product comprising the sequence of said primer.
17. The method of claim 15, wherein said compound is an anti-GlyT1 antibody.
18. A method of identifying a candidate modulator of a GlyT1 polypeptide, said method comprising: a) contacting the polypeptide of claim 10 with a test compound; and b) determining whether said compound specifically binds to said polypeptide; wherein a detection that said compound specifically binds to said polypeptide indicates that said compound is a candidate modulator of said GlyT1 polypeptide.
19. The method of claim 18, further comprising testing the activity of said GlyT1 polypeptide in the presence of said candidate modulator, wherein a difference in the activity of said GlyT1 polypeptide in the presence of said candidate modulator in comparison to the activity in the absence of said candidate modulator indicates that the candidate modulator is a modulator of said GlyT1 polypeptide.
20. A method of identifying a modulator of a GlyT1 polypeptide, said method comprising: a) contacting the polypeptide of claim 10 with a test compound; and b) detecting the activity of said polypeptide in the presence and absence of said compound; wherein a detection of a difference in said activity in the presence of said compound in comparison to the activity in the absence of said compound indicates that said compound is a modulator of said GlyT1 polypeptide.
21. The method of claim 19, wherein said polypeptide is present in a cell or cell membrane, and wherein said activity comprises glycine transport activity.
22. The method of claim 20, wherein said polypeptide is present in a cell or cell membrane, and wherein said activity comprises glycine transport activity.
23. A method for the preparation of a pharmaceutical composition comprising: a) identifying a modulator of a GlyT1 polypeptide using the method of claim 20; and b) combining said modulator with a physiologically acceptable carrier.
Description:
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001]This application is a divisional of U.S. application Ser. No. 10/484,690, filed Sep. 27, 2004, which is the national stage of international application No. PCT/IB02/03386, filed Jul. 22, 2002, which claims the benefit of U.S. Provisional Application Ser. No. 60/307,685, filed Jul. 24, 2001, the disclosures of which are hereby incorporated by reference in their entireties, including all figures, tables and amino acid or nucleic acid sequences.
[0002]The Sequence Listing for this application is labeled "seq-list.txt", was created on Feb. 15, 2008, and is 166 KB. The entire contents of the sequence listing are incorporated herein by reference.
FIELD OF THE INVENTION
[0003]The present invention is directed to polynucleotides encoding novel exons and novel splice variants of the sodium and chloride-dependent glycine transporter type 1 (GlyT1), and their use in the treatment and diagnosis of neurological and psychiatric disorders such as schizophrenia. The invention also deals with antibodies directed specifically against these novel polypeptides which are useful, e.g., as diagnostic reagents.
BACKGROUND OF THE INVENTION
[0004]Neurotransmitter transporters play a critical role in the regulation of synaptic transmission. These transporters, which are located on the pre-synaptic terminal and surrounding glial cells, sequester neurotransmitter from the synapse, thereby regulating the synaptic concentration of neurotransmitter and influencing the duration and magnitude of synaptic transmission. Transporters also help to limit the extent of synaptic transmission by preventing the spread of transmitter to neighboring synapses. In view of the important role played by these transporters in neurological function, they represent attractive targets for pharmacological modulation, potentially providing novel methods of treatment for any of a number of psychological and neurological conditions.
[0005]The amino acid glycine functions at both inhibitory and excitatory synapses in the central and peripheral nervous systems of mammals. The excitatory and inhibitory functions of glycine are mediated by two different types of receptor, each of which is associated with a different type of glycine transporter. At excitatory synapses, glycine acts as an obligatory co-agonist at a class of glutamate receptors called N-methyl-D-aspartate (NMDA) receptors. Activation of these receptors in neurons increases sodium and calcium conductance, thereby depolarizing the neuron and increasing the likelihood that the neuron will fire an action potential.
[0006]The class of glycine transporter thought to be involved in excitatory synapses in conjunction with NMDA receptors is GlyT-1. At least four variants of GlyT-1 (GlyT-1a, GlyT-1b, GlyT-1c, and GlyT-1d) have been described. Both GlyT1 and GlyT2 transporters are members of a broader family of sodium- and chloride-dependent neurotransmitter transporters, the members of which typically have 12 transmembrane domains (Olivares et al. (1997) J. Biol. Chem. 272:1211-1217; Uhl, Trends in Neuroscience 15: 265-268, 1992; Clark et al, BioEssays 15: 323-332, 1993). Both the N- and C-termini of the members of this family are thought to be intracellular.
[0007]NMDA receptor activity has been implicated in a large number of psychological and neurological functions, such as learning and memory, and in a large number of diseases and conditions, including schizophrenia, dementias, attention-deficit hyperactivity disorder, and various neurodegenerative disorders. Thus, modulators of GlyT1 proteins can be used to treat these and other conditions. The present invention addresses these and other needs.
SUMMARY OF THE INVENTION
[0008]The present invention pertains to polynucleotides and polypeptides corresponding to cDNA sequences encoding 8 novel splice variants of the GlyT1 glycine transporter. Oligonucleotide probes or primers hybridizing specifically with the novel cDNA sequences are also part of the present invention, as are DNA amplification and detection methods using said primers and probes.
[0009]A further object of the invention consists of recombinant vectors comprising any of the nucleic acid sequences described herein, as well as of cell hosts and transgenic non-human animals comprising these nucleic acid sequences or recombinant vectors.
[0010]The invention is also directed to methods for the screening of substances or molecules that interact with any of the present polypeptides or that modulate the activity of any of the present polypeptides.
[0011]As such, in one aspect, the present invention provides an isolated, purified, or recombinant polynucleotide comprising a nucleic acid sequence that is at least about 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical to a contiguous span of at least 12, 25, 50, 100, 250, 500, 1000, or more nucleotides of any of the nucleic acid sequences shown as SEQ ID NOs:2-9 or 14-21, or a sequence complementary to any of these sequences. In another aspect, the present invention provides an isolated, purified, or recombinant polynucleotide comprising a nucleic acid sequence that encodes a functional GlyT1 transporter and which specifically hybridizes under stringent or moderate conditions with any of the nucleic acid sequences shown as SEQ ID NOs:2-9 or 14-21.
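The identity thresholds and minimum span lengths recited above can be illustrated with a small calculation. The following is only a minimal sketch, assuming an ungapped, position-by-position comparison of two spans of equal length; the function names `percent_identity` and `meets_threshold` and the default values are hypothetical, and an actual comparison would ordinarily rely on a sequence alignment program rather than this simplified count.

```python
def percent_identity(span_a: str, span_b: str) -> float:
    """Return the percentage of matching positions between two equal-length spans."""
    if len(span_a) != len(span_b):
        raise ValueError("spans must have the same length for an ungapped comparison")
    if not span_a:
        raise ValueError("spans must not be empty")
    # Count positions at which both spans carry the same residue (case-insensitive).
    matches = sum(1 for a, b in zip(span_a.upper(), span_b.upper()) if a == b)
    return 100.0 * matches / len(span_a)


def meets_threshold(span_a: str, span_b: str,
                    min_identity: float = 95.0, min_length: int = 12) -> bool:
    """Check a pair of spans against an identity threshold and a minimum span length."""
    return len(span_a) >= min_length and percent_identity(span_a, span_b) >= min_identity
```

For instance, two 20-nucleotide spans differing at a single position share 19 of 20 positions, which this sketch reports as 95% identity.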
[0012]In another aspect, the present invention provides an isolated, purified, or recombinant polynucleotide comprising a nucleic acid sequence that is at least about 70%, 75%, 80%, 85%, 90%, 95% or more identical to any of the sequences shown as SEQ ID NOs:14-21, wherein the polynucleotide comprises a sequence at least about 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical to any of the sequences shown as SEQ ID NOs:2-9.
[0013]In another aspect, the present invention provides an isolated, purified, or recombinant polynucleotide encoding a glycine transporter, wherein said polynucleotide hybridizes under stringent or moderate hybridization conditions with a nucleic acid comprising any of the sequences shown as SEQ ID NOs:14-21, and wherein said polynucleotide comprises any of the sequences shown as SEQ ID NOs:2-9.
[0014]In another aspect, the present invention provides an isolated, purified, or recombinant polynucleotide which encodes a polypeptide comprising an amino acid sequence that is at least about 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or more identical to a contiguous span of at least 6, 12, 25, 50, 100, 200, 300, 400, 500, or more amino acids of any of SEQ ID NOs:26-33. In one embodiment, the polypeptide comprises any of the amino acid sequences shown as SEQ ID NOs:26-33.
[0015]In another aspect, the present invention provides a method of producing a GlyT1 polypeptide, said method comprising the following steps: a) providing a host cell comprising a nucleic acid encoding any one of the polypeptides shown as SEQ ID NO:26-33, operably linked to a promoter; b) cultivating said host cell under conditions conducive to the expression of said polypeptide; and c) isolating said polypeptide from said host cell.
[0016]In one embodiment, the polynucleotide is attached to a solid support. In another embodiment, the polynucleotide further comprises a label. In another embodiment, the polynucleotide is operably linked to a promoter.
[0017]In another aspect, the present invention provides a biologically active fragment of any of the herein-described polynucleotides.
[0018]In another aspect, the present invention provides an array of polynucleotides comprising at least one of the herein-described polynucleotides. In one embodiment, the array is addressable.
[0019]In another aspect, the present invention provides a recombinant vector comprising any of the herein-described polynucleotides.
[0020]In another aspect, the present invention provides a host cell comprising any of the herein-described recombinant vectors or polynucleotides.
[0021]In another aspect, the present invention provides a non-human host animal or mammal comprising any of the herein-described recombinant vectors or polynucleotides.
[0022]In another aspect, the present invention provides an isolated, purified, or recombinant polypeptide comprising an amino acid sequence that is at least about 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or more identical to a contiguous span of at least 6, 12, 25, 50, 100, 250, 500, or more amino acids of any of the sequences shown as SEQ ID NOs:26-33.
[0023]In one embodiment, the polypeptide comprises any of the sequences shown as SEQ ID NOs:26-33.
[0024]In another aspect, the present invention provides an isolated, purified, or recombinant polypeptide, wherein the polypeptide comprises an amino acid sequence encoded by any of the nucleic acid sequences shown as SEQ ID NOs:2-9 or 14-21.
[0025]In another aspect, the present invention provides a biologically active fragment of any of the herein-described polypeptides.
[0026]The invention further relates to methods of making the polypeptides of the present invention.
[0027]In another aspect, the present invention provides an isolated or purified antibody capable of selectively binding to an epitope-containing fragment of any of the herein-described polypeptides, such as the polypeptides encoded by any of the sequences shown as SEQ ID NOs:2-9 or 14-21, or the polypeptides comprising any of the amino acid sequences shown as SEQ ID NOs:26-33.
[0028]In another aspect, the present invention provides a method of binding an anti-GlyT1 antibody to any of the herein-described polypeptides, e.g. polypeptides encoded by any of SEQ ID NOs:2-9 or 14-21, or comprising any of the sequences shown as SEQ ID NOs:26-33, said method comprising contacting said antibody with said polypeptide under conditions in which said antibody can specifically bind to said polypeptide.
[0029]The present invention further relates to transgenic plants or animals, wherein said transgenic plant or animal is transgenic for a polynucleotide of the present invention and expresses a polypeptide of the present invention, or in which a polynucleotide of the present invention has been specifically disrupted or replaced with an inactive version of the polynucleotide, or with a substitute version having altered properties.
[0030]In another aspect, the present invention provides a diagnostic kit comprising any of the herein described polynucleotides, polypeptides, or antibodies.
[0031]The invention also provides kits, uses and methods for detecting the expression and/or biological activity of any of the herein-described GlyT1 variants, e.g., in a biological sample. One such method involves assaying for expression using the polymerase chain reaction (PCR), e.g., RT-PCR, to detect mRNA encoding any of the variants. In another method, Northern blot hybridization is used. Alternatively, a method of detecting gene expression in a test sample can be accomplished using a compound which binds to any of the herein-described polypeptides, e.g. a GlyT1-specific antibody, preferably a variant-specific anti-GlyT1 antibody.
[0032]In another aspect, the present invention provides a method of detecting the expression of a GlyT1 gene within a cell, said method comprising the steps of: a) contacting said cell or an extract from said cell with either of: i) a polynucleotide that hybridizes under stringent conditions to any of the herein-described GlyT1 polynucleotides; or ii) a compound that specifically binds to any of the herein-described GlyT1 polypeptides; and b) detecting the presence or absence of hybridization between said polynucleotide and an RNA species within said cell or extract, or the presence or absence of binding of said compound to a protein within said cell or extract; wherein a detection of the presence of said hybridization or of said binding indicates that said GlyT1 gene is expressed within said cell.
[0033]In one embodiment, said polynucleotide is an oligonucleotide primer, and wherein said hybridization is detected by detecting the presence of an amplification product comprising the sequence of said primer. In another embodiment, said compound is an anti-GlyT1 antibody.
[0034]In another aspect, the present invention provides a method of identifying a candidate modulator of a GlyT1 polypeptide, said method comprising: a) contacting any of the herein-described GlyT1 polypeptides with a test compound; and b) determining whether said compound specifically binds to said polypeptide; wherein a detection that said compound specifically binds to said polypeptide indicates that said compound is a candidate modulator of said GlyT1 polypeptide.
[0035]In one embodiment, the method further comprises testing the activity of said GlyT1 polypeptide in the presence of said candidate modulator, wherein a difference in the activity of said GlyT1 polypeptide in the presence of said candidate modulator in comparison to the activity in the absence of said candidate modulator indicates that the candidate modulator is a modulator of said GlyT1 polypeptide.
[0036]In another aspect, the present invention provides a method of identifying a modulator of a GlyT1 polypeptide, said method comprising: a) contacting any of the herein-described polypeptides with a test compound; and b) detecting the activity of said polypeptide in the presence and absence of said compound; wherein a detection of a difference in said activity in the presence of said compound in comparison to the activity in the absence of said compound indicates that said compound is a modulator of said GlyT1 polypeptide.
[0037]In one embodiment of these methods, said polypeptide is present in a cell or cell membrane, and wherein said activity comprises glycine transport activity.
[0038]In another aspect, the present invention provides a method for the preparation of a pharmaceutical composition comprising a) identifying a modulator of a GlyT1 polypeptide using any of the herein-described methods; and b) combining said modulator with a physiologically acceptable carrier.
[0039]The present invention also relates to diagnostic methods and uses of the present polynucleotides and polypeptides for identifying humans or non-human animals having elevated or reduced levels of expression of any one or combination of the herein-described variants, which individuals are likely to benefit from therapies to suppress or enhance the expression of the variant or variants, respectively, and to methods of identifying individuals or non-human animals at increased risk for developing, or at present having, diseases or disorders associated with expression or biological activity of any one or combination of the herein-described variants.
[0040]The present invention also relates to kits, uses and methods for screening compounds for their ability to modulate (e.g. increase or inhibit) the activity or expression of any of the present variants. Uses of such compounds are also within the scope of the present invention.
[0041]The present invention also relates to pharmaceutical or physiologically acceptable compositions comprising, as an active agent, the polypeptides, polynucleotides or antibodies of the present invention, as well as, typically, a pharmaceutically acceptable carrier.
[0042]The present invention also provides the use of any of the herein-described GlyT1 polynucleotides, polypeptides, antibodies, modulators, or kits, in the diagnosis or treatment of any disorder, preferably a neurological or psychiatric disorder such as schizophrenia, or in the preparation of a medicament for the treatment of any disorder including neurological or psychiatric disorders such as schizophrenia.
[0043]In another aspect, the present invention provides a computer readable medium having stored thereon a sequence selected from the group consisting of a nucleic acid code comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of any of the sequences shown as SEQ ID NOs:2-9 or 14-21.
[0044]In another aspect, the present invention provides a computer readable medium having stored thereon a sequence consisting of a polypeptide code comprising a contiguous span of at least 6, 8, 10, 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of any of the amino acid sequences shown as SEQ ID NOs:26-33.
[0045]In another aspect, the present invention provides a computer system comprising a processor and a data storage device, wherein said data storage device comprises any of the herein-described computer readable media.
[0046]In one embodiment, the computer system further comprises a sequence comparer and a data storage device having reference sequences stored thereon. In another embodiment, the computer system further comprises an identifier which identifies features in said sequence.
BRIEF DESCRIPTION OF THE DRAWING
[0047]FIG. 1 shows the 8 novel GlyT1 splice variants of the present invention. The exon structure is shown for each variant in comparison with the structure for the previously described variant of Genbank accession no. S70612. The genomic structure for all of the exons is also indicated within the genomic sequence presented in Genbank accession no. AC005038.
BRIEF DESCRIPTION OF THE SEQUENCES PROVIDED IN THE SEQUENCE LISTING
[0048]SEQ ID NO: 1 provides the genomic sequence of the GlyT1 gene, comprising the 5' regulatory region (upstream untranscribed region), the exons and introns, and the 3' regulatory region (downstream untranscribed region).
[0049]SEQ ID NOs: 2-9 provide novel exons of the GlyT1 gene.
[0050]SEQ ID NOs: 10-13 provide DNA sequences encoding previously known GlyT1 variants (GlyT1a, GlyT1b, GlyT1c, GlyT1d).
[0051]SEQ ID NOs:14-21 provide novel DNA sequences encoding novel GlyT1 variants (Genset variants 1-8).
[0052]SEQ ID NOs:22-25 provide protein sequences of previously known GlyT1 variants.
[0053]SEQ ID NOs:26-33 provide protein sequences of the novel Genset GlyT1 variants.
[0054]SEQ ID NOs:34 and 35 provide the sequences of oligonucleotides SLC6A9LF and SLC6A9LR, which were used in the cloning of the presently-provided GlyT1 variants.
[0055]SEQ ID NO:36 provides a primer sequence containing the additional PU 5' sequence described further in Example 2.
[0056]SEQ ID NO:37 provides a primer sequence containing the additional RP 5' sequence described further in Example 2.
[0057]In accordance with the regulations relating to Sequence Listings, the following codes have been used in the Sequence Listing to indicate the locations of biallelic markers within the sequences and to identify each of the alleles present at the polymorphic base. The code "r" in the sequences indicates that one allele of the polymorphic base is a guanine, while the other allele is an adenine. The code "y" in the sequences indicates that one allele of the polymorphic base is a thymine, while the other allele is a cytosine. The code "m" in the sequences indicates that one allele of the polymorphic base is an adenine, while the other allele is a cytosine. The code "k" in the sequences indicates that one allele of the polymorphic base is a guanine, while the other allele is a thymine. The code "s" in the sequences indicates that one allele of the polymorphic base is a guanine, while the other allele is a cytosine. The code "w" in the sequences indicates that one allele of the polymorphic base is an adenine, while the other allele is a thymine.
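By way of illustration only, the six codes listed above can be expressed as a lookup table and used to enumerate the allelic forms of a short sequence. This is a minimal sketch; the names `BIALLELIC_CODES` and `expand_alleles` are hypothetical and do not appear in the Sequence Listing, and the sketch assumes lower-case sequence notation.

```python
from itertools import product

# The two alleles represented by each code, in the order given in the text above.
BIALLELIC_CODES = {
    "r": ("g", "a"),
    "y": ("t", "c"),
    "m": ("a", "c"),
    "k": ("g", "t"),
    "s": ("g", "c"),
    "w": ("a", "t"),
}


def expand_alleles(sequence: str) -> list[str]:
    """Return every sequence obtained by resolving each biallelic code to one allele."""
    # Ordinary bases contribute a single choice; coded positions contribute two.
    choices = [BIALLELIC_CODES.get(base, (base,)) for base in sequence.lower()]
    return ["".join(combo) for combo in product(*choices)]
```

A sequence with n coded positions expands to 2^n allelic forms, so this enumeration is practical only for short spans such as probe or primer regions.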
[0058]In some instances, the polymorphic bases of biallelic markers alter the identity of one or more amino acids in the encoded polypeptide. This is indicated in the accompanying Sequence Listing by use of the feature VARIANT, placement of an Xaa at the position of the polymorphic amino acid, and definition of Xaa as the two alternative amino acids. For example if one allele of a biallelic marker is the codon CAC, which encodes histidine, while the other allele of the biallelic marker is CAA, which encodes glutamine, the Sequence Listing for the encoded polypeptide will contain an Xaa at the location of the polymorphic amino acid. In this instance, Xaa would be defined as being histidine or glutamine.
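The CAC/CAA example above can be reproduced with a short translation sketch. It assumes the standard genetic code and one-letter amino acid symbols (H for histidine, Q for glutamine); the names `CODON_TABLE` and `describe_xaa` are hypothetical and are introduced solely for illustration.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering of bases.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def describe_xaa(codon_allele_1: str, codon_allele_2: str) -> str:
    """Translate both allelic codons and describe the resulting residue."""
    aa1 = CODON_TABLE[codon_allele_1.upper()]
    aa2 = CODON_TABLE[codon_allele_2.upper()]
    if aa1 == aa2:
        # Both alleles encode the same residue, so no Xaa is needed.
        return aa1
    return f"Xaa = {aa1} or {aa2}"
```

Called with the codons CAC and CAA, the sketch reports Xaa as H or Q, matching the histidine or glutamine definition described above; alleles that leave the amino acid unchanged (for example CAC and CAT) yield a single residue.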
[0059]In other instances, Xaa may indicate an amino acid whose identity is unknown because of nucleotide sequence ambiguity. In this instance, the feature UNSURE is used, placement of an Xaa at the position of the unknown amino acid and definition of Xaa as being any of the 20 amino acids or a limited number of amino acids suggested by the genetic code.
DETAILED DESCRIPTION
[0060]The present invention concerns novel polynucleotides and polypeptides related to the GlyT1 gene. Oligonucleotide probes and primers hybridizing specifically with these novel polynucleotides are also part of the invention. A further object of the invention consists of recombinant vectors comprising any of the nucleic acid sequences described in the present invention, as well as cell hosts comprising said nucleic acid sequences or recombinant vectors. The invention also encompasses methods of screening for molecules which modulate the activity of the present proteins. The invention also deals with antibodies directed specifically against the present polypeptides, which are useful as diagnostic reagents.
DEFINITIONS
[0061]Before describing the invention in greater detail, the following definitions are set forth to illustrate and define the meaning and scope of the terms used to describe the invention herein.
[0062]The term "GlyT1 gene", when used herein, encompasses genomic, mRNA and cDNA sequences encoding the GlyT1 protein, including the untranslated regulatory regions of the genomic DNA, and including any of the herein-described variants.
[0063]A GlyT1 "variant" can refer to any GlyT1 polynucleotide or polypeptide, in particular a GlyT1 polypeptide or polynucleotide differing at one or more nucleotides or amino acids from other GlyT1 sequences, especially differing from other GlyT1 sequences as a result of differential mRNA splicing. Most specifically, GlyT1 variants refer to the novel GlyT1 polynucleotides and polypeptides shown here as SEQ ID NOs: 14-21 and 26-33, and to conservatively substituted relatives thereof.
[0064]The term "heterologous protein", when used herein, is intended to designate any protein or polypeptide other than a GlyT1 protein of interest.
[0065]A "functional" glycine transporter refers to any polypeptide with one or more detectable activities of glycine transporters such as full-length GlyT1, for example the ability to transport glycine across a membrane in in vitro or in vivo assays, and also including glycine binding, neuronal activation in cells expressing the transporter, interaction with additional ligands, etc. Examples of such assays can be found in the section entitled, "Methods for identifying modulators of GlyT1 activity."
[0066]The term "isolated" requires that the material be removed from its original environment (e.g., the natural environment if it is naturally occurring). For example, a naturally-occurring polynucleotide or polypeptide present in a living animal is not isolated, but the same polynucleotide or DNA or polypeptide, separated from some or all of the coexisting materials in the natural system, is isolated. Such a polynucleotide could be part of a vector and/or such polynucleotide or polypeptide could be part of a composition, and still be isolated in that the vector or composition is not part of its natural environment.
[0067]For example, a naturally-occurring polynucleotide present in a living animal is not isolated, but the same polynucleotide, separated from some or all of the coexisting materials in the natural system, is isolated. Specifically excluded from the definition of "isolated" are: naturally-occurring chromosomes (such as chromosome spreads), artificial chromosome libraries, genomic libraries, and cDNA libraries that exist either as an in vitro nucleic acid preparation or as a transfected/transformed host cell preparation, wherein the host cells are either an in vitro heterogeneous preparation or plated as a heterogeneous population of single colonies. Also specifically excluded are the above libraries wherein a specified polynucleotide makes up less than 5% of the number of nucleic acid inserts in the vector molecules. Further specifically excluded are whole cell genomic DNA or whole cell RNA preparations (including said whole cell preparations which are mechanically sheared or enzymatically digested). Further specifically excluded are the above whole cell preparations as either an in vitro preparation or as a heterogeneous mixture separated by electrophoresis (including blot transfers of the same) wherein the polynucleotide of the invention has not further been separated from the heterologous polynucleotides in the electrophoresis medium (e.g., further separating by excising a single band from a heterogeneous band population in an agarose gel or nylon blot).
[0068]The term "purified" does not require absolute purity; rather, it is intended as a relative definition. Purification of starting material or natural material to at least one order of magnitude, preferably two or three orders, and more preferably four or five orders of magnitude is expressly contemplated. As an example, purification from 0.1% concentration to 10% concentration is two orders of magnitude. To illustrate, individual cDNA clones isolated from a cDNA library have been conventionally purified to electrophoretic homogeneity. The sequences obtained from these clones could not be obtained directly either from the library or from total human DNA. The cDNA clones are not naturally occurring as such, but rather are obtained via manipulation of a partially purified naturally occurring substance (messenger RNA). The conversion of mRNA into a cDNA library involves the creation of a synthetic substance (cDNA) and pure individual cDNA clones can be isolated from the synthetic library by clonal selection. Thus, creating a cDNA library from messenger RNA and subsequently isolating individual clones from that library results in an approximately 10^4- to 10^6-fold purification of the native message.
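The arithmetic behind the 0.1% to 10% example can be checked with a short calculation. This is a minimal sketch, assuming that purification is expressed as the base-10 logarithm of the ratio of final to starting concentration; the function name `orders_of_magnitude` is hypothetical.

```python
import math


def orders_of_magnitude(start_percent: float, end_percent: float) -> float:
    """Return the purification, in orders of magnitude, between two concentrations."""
    if start_percent <= 0 or end_percent <= 0:
        raise ValueError("concentrations must be positive")
    # A 100-fold enrichment corresponds to log10(100) = 2 orders of magnitude.
    return math.log10(end_percent / start_percent)
```

Under this assumption, going from 0.1% to 10% is a 100-fold enrichment, i.e. two orders of magnitude, consistent with the example given above.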
[0069]The term "purified" is further used herein to describe a polypeptide or polynucleotide of the invention which has been separated from other compounds including, but not limited to, polypeptides or polynucleotides, carbohydrates, lipids, etc. The term "purified" may be used to specify the separation of monomeric polypeptides of the invention from oligomeric forms such as homo- or hetero-dimers, trimers, etc. The term "purified" may also be used to specify the separation of covalently closed polynucleotides from linear polynucleotides. A polynucleotide is substantially pure when at least about 50%, preferably 60 to 75% of a sample exhibits a single polynucleotide sequence and conformation (linear versus covalently closed). A substantially pure polypeptide or polynucleotide typically comprises about 50%, preferably 60 to 90% weight/weight of a polypeptide or polynucleotide sample, respectively, more usually about 95%, and preferably is over about 99% pure. Polypeptide and polynucleotide purity, or homogeneity, is indicated by a number of means well known in the art, such as agarose or polyacrylamide gel electrophoresis of a sample, followed by visualizing a single band upon staining the gel. For certain purposes higher resolution can be provided by using HPLC or other means well known in the art. As an alternative embodiment, purification of the polypeptides and polynucleotides of the present invention may be expressed as "at least" a percent purity relative to heterologous polypeptides and polynucleotides (DNA, RNA or both). As a preferred embodiment, the polypeptides and polynucleotides of the present invention are at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% pure relative to heterologous polypeptides and polynucleotides, respectively. As a further preferred embodiment the polypeptides and polynucleotides have a purity ranging from any number, to the thousandth position, between 90% and 100% (e.g., a polypeptide or polynucleotide at least 99.995% pure) relative to either heterologous polypeptides or polynucleotides, respectively, or as a weight/weight ratio relative to all compounds and molecules other than those existing in the carrier. Each number representing a percent purity, to the thousandth position, may be claimed as individual species of purity.
[0070]The term "polypeptide" refers to a polymer of amino acids without regard to the length of the polymer; thus, peptides, oligopeptides, and proteins are included within the definition of polypeptide. This term also does not specify or exclude post-expression modifications of polypeptides, for example, polypeptides which include the covalent attachment of glycosyl groups, acetyl groups, phosphate groups, lipid groups and the like are expressly encompassed by the term polypeptide. Also included within the definition are polypeptides which contain one or more analogs of an amino acid (including, for example, non-naturally occurring amino acids, amino acids which only occur naturally in an unrelated biological system, modified amino acids from mammalian systems etc.), polypeptides with substituted linkages, as well as other modifications known in the art, both naturally occurring and non-naturally occurring.
[0071]The term "recombinant polypeptide" is used herein to refer to polypeptides that have been artificially designed and which comprise at least two polypeptide sequences that are not found as contiguous polypeptide sequences in their initial natural environment, or to refer to polypeptides which have been expressed from a recombinant polynucleotide, i.e. using recombinant DNA methods.
[0072]As used herein, the term "non-human animal" refers to any non-human vertebrate, birds and more usually mammals, preferably primates, farm animals such as swine, goats, sheep, donkeys, and horses, rabbits or rodents, more preferably rats or mice. As used herein, the term "animal" is used to refer to any vertebrate, preferably a mammal. Both the terms "animal" and "mammal" expressly embrace human subjects unless preceded with the term "non-human".
[0073]As used herein, the term "antibody" refers to a polypeptide or group of polypeptides which are comprised of at least one binding domain, where an antibody binding domain is formed from the folding of variable domains of an antibody molecule to form three-dimensional binding spaces with an internal surface shape and charge distribution complementary to the features of an antigenic determinant of an antigen, which allows an immunological reaction with the antigen. Antibodies include recombinant proteins comprising the binding domains, as well as fragments, including Fab, Fab', F(ab)2, and F(ab')2 fragments.
[0074]As used herein, an "antigenic determinant" is the portion of an antigen molecule, in this case a GLYT1 polypeptide, that determines the specificity of the antigen-antibody reaction. An "epitope" refers to an antigenic determinant of a polypeptide. An epitope can comprise as few as 3 amino acids in a spatial conformation which is unique to the epitope. Generally an epitope comprises at least 6 such amino acids, and more usually at least 8-10 such amino acids. Methods for determining the amino acids which make up an epitope include x-ray crystallography, 2-dimensional nuclear magnetic resonance, and epitope mapping e.g. the Pepscan method described by Geysen et al. 1984; PCT Publication No. WO 84/03564; and PCT Publication No. WO 84/03506.
[0075]Throughout the present specification, the expression "nucleotide sequence" may be employed to designate indifferently a polynucleotide or a nucleic acid. More precisely, the expression "nucleotide sequence" encompasses the nucleic material itself and is thus not restricted to the sequence information (i.e. the succession of letters chosen among the four base letters) that biochemically characterizes a specific DNA or RNA molecule.
[0076]As used interchangeably herein, the terms "nucleic acids", "oligonucleotides", and "polynucleotides" include RNA, DNA, or RNA/DNA hybrid sequences of more than one nucleotide in either single chain or duplex form. The term "nucleotide" is used herein as an adjective to describe molecules comprising RNA, DNA, or RNA/DNA hybrid sequences of any length in single-stranded or duplex form. The term "nucleotide" is also used herein as a noun to refer to individual nucleotides or varieties of nucleotides, meaning a molecule, or individual unit in a larger nucleic acid molecule, comprising a purine or pyrimidine, a ribose or deoxyribose sugar moiety, and a phosphate group, or phosphodiester linkage in the case of nucleotides within an oligonucleotide or polynucleotide. The term "nucleotide" is also used herein to encompass "modified nucleotides" which comprise at least one of the following modifications: (a) an alternative linking group, (b) an analogous form of purine, (c) an analogous form of pyrimidine, or (d) an analogous sugar. For examples of analogous linking groups, purines, pyrimidines, and sugars, see for example PCT publication No. WO 95/04064. The polynucleotide sequences of the invention may be prepared by any known method, including synthetic, recombinant, ex vivo generation, or a combination thereof, as well as utilizing any purification methods known in the art.
[0077]As used herein, the term "operably linked" refers to a linkage of polynucleotide elements in a functional relationship. For instance, a promoter or enhancer is operably linked to a coding sequence if it affects the transcription of the coding sequence.
[0078]The terms "trait" and "phenotype" are used interchangeably herein and refer to any visible, detectable or otherwise measurable property of an organism such as symptoms of, or susceptibility to a disease for example. Typically the terms "trait" or "phenotype" are used herein to refer to symptoms of, or susceptibility to a disease, a beneficial response to or side effects related to a treatment. Preferably, said trait can be, without being limited to, neurological and psychiatric conditions such as schizophrenia.
[0079]The term "allele" is used herein to refer to variants of a nucleotide sequence. A biallelic polymorphism has two forms. Diploid organisms may be homozygous or heterozygous for an allelic form.
[0080]The term "heterozygosity rate" is used herein to refer to the incidence of individuals in a population which are heterozygous at a particular allele. In a biallelic system, the heterozygosity rate is on average equal to 2 Pa (1-Pa), where Pa is the frequency of the least common allele. In order to be useful in genetic studies, a genetic marker should have an adequate level of heterozygosity to allow a reasonable probability that a randomly selected person will be heterozygous.
[0081]The term "genotype" as used herein refers to the identity of the alleles present in an individual or a sample. In the context of the present invention, a genotype preferably refers to the description of the biallelic marker alleles present in an individual or a sample. The term "genotyping" a sample or an individual for a biallelic marker involves determining the specific allele or the specific nucleotide carried by an individual at a biallelic marker.
[0082]The term "mutation" as used herein refers to a difference in DNA sequence between or among different genomes or individuals which has a frequency below 1%.
[0083]The term "haplotype" refers to a combination of alleles present in an individual or a sample. In the context of the present invention, a haplotype preferably refers to a combination of biallelic marker alleles found in a given individual and which may be associated with a phenotype.
[0084]The term "polymorphism" as used herein refers to the occurrence of two or more alternative genomic sequences or alleles between or among different genomes or individuals. "Polymorphic" refers to the condition in which two or more variants of a specific genomic sequence can be found in a population. A "polymorphic site" is the locus at which the variation occurs. A single nucleotide polymorphism is the replacement of one nucleotide by another nucleotide at the polymorphic site. Deletion of a single nucleotide or insertion of a single nucleotide also gives rise to single nucleotide polymorphisms. In the context of the present invention, "single nucleotide polymorphism" preferably refers to a single nucleotide substitution. Typically, between different individuals, the polymorphic site may be occupied by two different nucleotides.
[0085]The term "biallelic polymorphism" and "biallelic marker" are used interchangeably herein to refer to a single nucleotide polymorphism having two alleles at a fairly high frequency in the population. A "biallelic marker allele" refers to the nucleotide variants present at a biallelic marker site. Typically, the frequency of the less common allele of the biallelic markers of the present invention has been validated to be greater than 1%, preferably the frequency is greater than 10%, more preferably the frequency is at least 20% (i.e. heterozygosity rate of at least 0.32), even more preferably the frequency is at least 30% (i.e. heterozygosity rate of at least 0.42). A biallelic marker wherein the frequency of the less common allele is 30% or more is termed a "high quality biallelic marker".
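As an illustration of the heterozygosity rate formula 2 Pa (1-Pa) given above, the following minimal sketch reproduces the two values quoted in the preceding paragraph (0.32 and 0.42). The function name `heterozygosity_rate` is hypothetical and the sketch assumes a strictly biallelic system.

```python
def heterozygosity_rate(minor_allele_freq: float) -> float:
    """Expected heterozygosity 2 * Pa * (1 - Pa) for a biallelic marker.

    minor_allele_freq is Pa, the frequency of the less common allele,
    so it must lie between 0 and 0.5.
    """
    if not 0.0 <= minor_allele_freq <= 0.5:
        raise ValueError("the less common allele cannot exceed a frequency of 0.5")
    return 2.0 * minor_allele_freq * (1.0 - minor_allele_freq)


# Frequencies mentioned in the text for the less common allele.
for freq in (0.01, 0.10, 0.20, 0.30):
    print(f"Pa = {freq:.2f} -> heterozygosity rate = {heterozygosity_rate(freq):.4f}")
```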
[0086]The location of nucleotides in a polynucleotide with respect to the center of the polynucleotide is described herein in the following manner. When a polynucleotide has an odd number of nucleotides, the nucleotide at an equal distance from the 3' and 5' ends of the polynucleotide is considered to be "at the center" of the polynucleotide, and any nucleotide immediately adjacent to the nucleotide at the center, or the nucleotide at the center itself, is considered to be "within 1 nucleotide of the center." With an odd number of nucleotides in a polynucleotide, any of the five nucleotide positions in the middle of the polynucleotide would be considered to be within 2 nucleotides of the center, and so on. When a polynucleotide has an even number of nucleotides, there would be a bond and not a nucleotide at the center of the polynucleotide. Thus, either of the two central nucleotides would be considered to be "within 1 nucleotide of the center" and any of the four nucleotides in the middle of the polynucleotide would be considered to be "within 2 nucleotides of the center", and so on. For polymorphisms which involve the substitution, insertion or deletion of 1 or more nucleotides, the polymorphism, allele or biallelic marker is "at the center" of a polynucleotide if the difference between the distance from the substituted, inserted, or deleted nucleotides of the polymorphism to the 3' end of the polynucleotide, and the distance from the substituted, inserted, or deleted nucleotides of the polymorphism to the 5' end of the polynucleotide, is zero or one nucleotide. If this difference is 0 to 3, then the polymorphism is considered to be "within 1 nucleotide of the center." If the difference is 0 to 5, the polymorphism is considered to be "within 2 nucleotides of the center." If the difference is 0 to 7, the polymorphism is considered to be "within 3 nucleotides of the center," and so on.
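The rule stated above for polymorphisms (a difference of 0 or 1 is "at the center", 0 to 3 is "within 1 nucleotide", 0 to 5 is "within 2 nucleotides", and so on) can be sketched as below. The function name `center_offset` and the 0-based, inclusive coordinates are assumptions made for the illustration only.

```python
from typing import Optional


def center_offset(length: int, start: int, end: Optional[int] = None) -> int:
    """Smallest n such that the polymorphism is "within n nucleotides of the center".

    length -- number of nucleotides in the polynucleotide
    start  -- 0-based index of the first polymorphic nucleotide (from the 5' end)
    end    -- 0-based index of the last polymorphic nucleotide (defaults to start)
    A return value of 0 means "at the center".
    """
    if end is None:
        end = start
    if not 0 <= start <= end < length:
        raise ValueError("polymorphism must lie inside the polynucleotide")
    dist_5prime = start               # nucleotides upstream of the polymorphism
    dist_3prime = length - 1 - end    # nucleotides downstream of the polymorphism
    difference = abs(dist_5prime - dist_3prime)
    # difference 0-1 -> 0, 2-3 -> 1, 4-5 -> 2, 6-7 -> 3, ...
    return difference // 2


# A substitution at the middle position of a 47-mer is at the center.
print(center_offset(47, 23))
# One position upstream, the difference is 2, i.e. within 1 nucleotide of the center.
print(center_offset(47, 22))
```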
[0087]The term "upstream" is used herein to refer to a location which is toward the 5' end of the polynucleotide from a specific reference point, and "downstream" refers to locations in the 3' direction.
[0088]The terms "base paired" and "Watson & Crick base paired" are used interchangeably herein to refer to nucleotides which can be hydrogen bonded to one another by virtue of their sequence identities in a manner like that found in double-helical DNA with thymine or uracil residues linked to adenine residues by two hydrogen bonds and cytosine and guanine residues linked by three hydrogen bonds (See Stryer, L., Biochemistry, 4th edition, 1995).
[0089]The terms "complementary" or "complement thereof" are used herein to refer to the sequences of polynucleotides which are capable of forming Watson & Crick base pairing with another specified polynucleotide throughout the entirety of the complementary region. For the purpose of the present invention, a first polynucleotide is deemed to be complementary to a second polynucleotide when each base in the first polynucleotide is paired with its complementary base. Complementary bases are, generally, A and T (or A and U), or C and G. "Complement" is used herein as a synonym for "complementary polynucleotide", "complementary nucleic acid" and "complementary nucleotide sequence". These terms are applied to pairs of polynucleotides based solely upon their sequences and not any particular set of conditions under which the two polynucleotides would actually bind.
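A sequence-only test of complementarity in the sense defined above might look like the following sketch. It assumes that both sequences are written 5' to 3' and pair in antiparallel orientation, which the text does not state explicitly; the names `WATSON_CRICK_PAIRS` and `is_complementary` are hypothetical.

```python
# Base pairs named in the text: A with T (or U), and C with G.
WATSON_CRICK_PAIRS = {
    ("A", "T"), ("T", "A"),
    ("A", "U"), ("U", "A"),
    ("C", "G"), ("G", "C"),
}


def is_complementary(first: str, second: str) -> bool:
    """True if every base of `first` pairs with the facing base of `second`.

    Both sequences are assumed to be written 5'->3', so `second` is read
    backwards to align it antiparallel to `first`.
    """
    if len(first) != len(second):
        return False
    return all(
        (a, b) in WATSON_CRICK_PAIRS
        for a, b in zip(first.upper(), reversed(second.upper()))
    )


print(is_complementary("ATGC", "GCAT"))  # DNA/DNA pair
print(is_complementary("AUGC", "GCAT"))  # RNA/DNA hybrid
print(is_complementary("ATGC", "GCAA"))  # one mismatch
```

In keeping with the definition, the check depends solely on the two sequences and says nothing about whether the strands would actually bind under given conditions.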
Variants and Fragments
1--Polynucleotides
[0090]The invention also relates to variants and fragments of the polynucleotides described herein.
[0091]Variants of polynucleotides, as the term is used herein, are polynucleotides that differ from a reference polynucleotide. A variant of a polynucleotide may be a naturally occurring variant such as a naturally occurring allelic variant, or it may be a variant that is not known to occur naturally. Such non-naturally occurring variants of the polynucleotide may be made by mutagenesis techniques, including those applied to polynucleotides, cells or organisms. Generally, differences are limited so that the nucleotide sequences of the reference and the variant are closely similar overall and, in many regions, identical.
[0092]Variants of polynucleotides according to the invention include, without being limited to, nucleotide sequences which are at least 95% identical to a polynucleotide selected from the group consisting of the nucleotide sequences of SEQ ID Nos:2-9 or 14-21, or to any polynucleotide fragment of at least 12 consecutive nucleotides of a polynucleotide selected from the group consisting of the nucleotide sequences of SEQ ID Nos 2-9 or 14-21, and preferably at least 99% identical, more particularly at least 99.5% identical, and most preferably at least 99.8% identical to a polynucleotide selected from the group consisting of the nucleotide sequences of SEQ ID Nos 2-9 or 14-21, or to any polynucleotide fragment of at least 12 consecutive nucleotides of a polynucleotide selected from the group consisting of the nucleotide sequences of SEQ ID Nos: 2-9 or 14-21.
[0093]In particular, the present invention comprises polynucleotide and polypeptide sequences spanning regions comprising biallelic markers within the GlyT1 gene. Methods of identifying such markers, and of using them for diagnosis, gene mapping, association studies, and other applications are well known to those of skill in the art.
[0094]Nucleotide changes present in a variant polynucleotide may be silent, which means that they do not alter the amino acids encoded by the polynucleotide. However, nucleotide changes may also result in amino acid substitutions, additions, deletions, fusions and truncations in the polypeptide encoded by the reference sequence. The substitutions, deletions or additions may involve one or more nucleotides. The variants may be altered in coding or non-coding regions or both. Alterations in the coding regions may produce conservative or non-conservative amino acid substitutions, deletions or additions.
[0095]In the context of the present invention, particularly preferred embodiments are those in which the polynucleotides encode polypeptides which retain substantially the same biological function or activity as the mature GlyT1 protein, or those in which the polynucleotides encode polypeptides which maintain or increase a particular biological activity, while reducing a second biological activity.
[0096]A polynucleotide fragment is a polynucleotide having a sequence that is entirely the same as part but not all of a given nucleotide sequence, preferably the nucleotide sequence of a GlyT1 gene, and variants thereof. The fragment can be a portion of an intron or an exon of a GlyT1 gene. It can also be a portion of the regulatory regions of GlyT1.
[0097]Such fragments may be "free-standing", i.e. not part of or fused to other polynucleotides, or they may be comprised within a single larger polynucleotide of which they form a part or region. Indeed, several of these fragments may be present within a single larger polynucleotide.
[0098]Optionally, such fragments may consist of, or consist essentially of a contiguous span of at least 8, 10, 12, 15, 18, 20, 25, 35, 40, 50, 70, 80, 100, 250, 500 or 1000 nucleotides in length.
2--Polypeptides
[0099]The invention also relates to variants, fragments, analogs and derivatives of the polypeptides described herein, including mutated GlyT1 proteins.
[0100]The variant may be 1) one in which one or more of the amino acid residues are substituted with a conserved or non-conserved amino acid residue and such substituted amino acid residue may or may not be one encoded by the genetic code, or 2) one in which one or more of the amino acid residues includes a substituent group, or 3) one in which the mutated GlyT1 is fused with another compound, such as a compound to increase the half-life of the polypeptide (for example, polyethylene glycol), or 4) one in which the additional amino acids are fused to the mutated GlyT1, such as a leader or secretory sequence or a sequence which is employed for purification of the mutated GlyT1 or a preprotein sequence. Such variants are deemed to be within the scope of those skilled in the art.
[0101]A polypeptide fragment is a polypeptide having a sequence that is entirely the same as part but not all of a given polypeptide sequence, preferably a polypeptide encoded by a GlyT1 gene and variants thereof.
[0102]In the case of an amino acid substitution in the amino acid sequence of a polypeptide according to the invention, one or several amino acids can be replaced by "equivalent" amino acids. The expression "equivalent" amino acid is used herein to designate any amino acid that may be substituted for one of the amino acids having similar properties, such that one skilled in the art of peptide chemistry would expect the secondary structure and hydropathic nature of the polypeptide to be substantially unchanged. Generally, the following groups of amino acids represent equivalent changes: (1) Ala, Pro, Gly, Glu, Asp, Gln, Asn, Ser, Thr; (2) Cys, Ser, Tyr, Thr; (3) Val, Ile, Leu, Met, Ala, Phe; (4) Lys, Arg, His; (5) Phe, Tyr, Trp, His.
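The five groups of equivalent amino acids listed above can be encoded as a simple lookup, as in the following sketch. The names `EQUIVALENCE_GROUPS` and `is_equivalent_substitution` are hypothetical, and treating two residues as equivalent whenever they share at least one group is an interpretation made for the illustration.

```python
# Groups (1)-(5) as listed in the text, using three-letter residue codes.
EQUIVALENCE_GROUPS = (
    frozenset({"Ala", "Pro", "Gly", "Glu", "Asp", "Gln", "Asn", "Ser", "Thr"}),
    frozenset({"Cys", "Ser", "Tyr", "Thr"}),
    frozenset({"Val", "Ile", "Leu", "Met", "Ala", "Phe"}),
    frozenset({"Lys", "Arg", "His"}),
    frozenset({"Phe", "Tyr", "Trp", "His"}),
)


def is_equivalent_substitution(original: str, replacement: str) -> bool:
    """True if both residues appear together in at least one group."""
    return any(
        original in group and replacement in group
        for group in EQUIVALENCE_GROUPS
    )


print(is_equivalent_substitution("Ala", "Gly"))  # both in group (1)
print(is_equivalent_substitution("Phe", "Tyr"))  # both in group (5)
print(is_equivalent_substitution("Lys", "Asp"))  # no shared group
```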
[0103]A specific embodiment of a modified GlyT1 peptide molecule of interest according to the present invention includes, but is not limited to, a peptide molecule which is resistant to proteolysis, namely a peptide in which the --CONH-- peptide bond is modified and replaced by a (CH2NH) reduced bond, a (NHCO) retro-inverso bond, a (CH2-O) methylene-oxy bond, a (CH2-S) thiomethylene bond, a (CH2CH2) carba bond, a (CO--CH2) ketomethylene bond, a (CHOH--CH2) hydroxyethylene bond, a (N--N) bond, an E-alkene bond or a --CH═CH-- bond. The invention also encompasses a human GlyT1 polypeptide or a fragment or a variant thereof in which at least one peptide bond has been modified as described above.
[0104]Such fragments may be "free-standing", i.e. not part of or fused to other polypeptides, or they may be comprised within a single larger polypeptide of which they form a part or region.
[0105]However, several fragments may be comprised within a single larger polypeptide. As representative examples of polypeptide fragments of the invention, there may be mentioned those which are from about 5, 6, 7, 8, 9 or 10 to 15, 10 to 20, 15 to 40, or 30 to 55 amino acids long. In one embodiment, the fragments contain at least one amino acid mutation in the GlyT1 protein.
Identity Between Nucleic Acids or Polypeptides
[0106]The terms "percentage of sequence identity" and "percentage homology" are used interchangeably herein to refer to comparisons among polynucleotides and polypeptides, and are determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide or polypeptide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of sequence identity. Homology is evaluated using any of the variety of sequence comparison algorithms and programs known in the art. Such algorithms and programs include, but are by no means limited to, TBLASTN, BLASTP, FASTA, TFASTA, and CLUSTALW (Pearson and Lipman, 1988; Altschul et al., 1990; Thompson et al., 1994; Higgins et al., 1996; Altschul et al., 1990; Altschul et al., 1993). In a particularly preferred embodiment, protein and nucleic acid sequence homologies are evaluated using the Basic Local Alignment Search Tool ("BLAST") which is well known in the art (see, e.g., Karlin and Altschul, 1990; Altschul et al., 1990, 1993, 1997). In particular, five specific BLAST programs are used to perform the following tasks:
[0107](1) BLASTP and BLAST3 compare an amino acid query sequence against a protein sequence database;
[0108](2) BLASTN compares a nucleotide query sequence against a nucleotide sequence database;
[0109](3) BLASTX compares the six-frame conceptual translation products of a query nucleotide sequence (both strands) against a protein sequence database;
[0110](4) TBLASTN compares a query protein sequence against a nucleotide sequence database translated in all six reading frames (both strands); and
[0111](5) TBLASTX compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
[0112]The BLAST programs identify homologous sequences by identifying similar segments, which are referred to herein as "high-scoring segment pairs," between a query amino or nucleic acid sequence and a test sequence which is preferably obtained from a protein or nucleic acid sequence database. High-scoring segment pairs are preferably identified (i.e., aligned) by means of a scoring matrix, many of which are known in the art. Preferably, the scoring matrix used is the BLOSUM62 matrix (Gonnet et al., 1992; Henikoff and Henikoff, 1993). Less preferably, the PAM or PAM250 matrices may also be used (see, e.g., Schwartz and Dayhoff, eds., 1978). The BLAST programs evaluate the statistical significance of all high-scoring segment pairs identified, and preferably select those segments which satisfy a user-specified threshold of significance, such as a user-specified percent homology. Preferably, the statistical significance of a high-scoring segment pair is evaluated using the statistical significance formula of Karlin (see, e.g., Karlin and Altschul, 1990).
[0113]The BLAST programs may be used with the default parameters or with modified parameters provided by the user.
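The percentage of sequence identity defined at the start of this section (matched positions divided by total positions in the comparison window, multiplied by 100) can be illustrated on two sequences that have already been aligned. This sketch does not perform the alignment itself and is not a substitute for BLAST; the function name `percent_identity` and the use of "-" as the gap character are assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over a comparison window of two pre-aligned sequences.

    Both strings must have the same length; gap characters mark additions
    or deletions. A position counts as matched only when both sequences
    carry the same residue and neither is a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matched = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matched / len(aligned_a)


# One mismatch in a ten-position window.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
# One gap in a six-position window.
print(round(percent_identity("ACG-TA", "ACGTTA"), 2))
```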
Stringent Hybridization Conditions
[0114]For the purpose of defining such a hybridizing nucleic acid according to the invention, the stringent hybridization conditions are as follows:
[0115]The hybridization step is realized at 65° C. in the presence of 6×SSC buffer, 5×Denhardt's solution, 0.5% SDS and 100 μg/ml of salmon sperm DNA.
[0116]The hybridization step is followed by four washing steps:
[0117]two washings of 5 min each, preferably at 65° C. in a 2×SSC and 0.1% SDS buffer;
[0118]one washing of 30 min, preferably at 65° C. in a 2×SSC and 0.1% SDS buffer;
[0119]one washing of 10 min, preferably at 65° C. in a 0.1×SSC and 0.1% SDS buffer,
these hybridization conditions being suitable for a nucleic acid molecule of about 20 nucleotides in length. Needless to say, the hybridization conditions described above are to be adapted according to the length of the desired nucleic acid, following techniques well known to the one skilled in the art. Suitable hybridization conditions may for example be adapted according to the teachings disclosed in the book of Hames and Higgins (1985).
[0120]GlyT1 cDNA Sequences
[0121]The expression of the GlyT1 gene has been shown to lead to the production of a number of distinct mRNA species, the novel nucleic acid sequences of eight of which are set forth herein as SEQ ID Nos: 14-21.
[0122]Another object of the invention is a purified, isolated, or recombinant nucleic acid comprising the nucleotide sequence of SEQ ID Nos:14-21, complementary sequences thereto, as well as allelic variants, and fragments thereof. Moreover, preferred polynucleotides of the invention include purified, isolated, or recombinant GlyT1 cDNAs consisting of, consisting essentially of, or comprising the sequence of SEQ ID Nos:2-9. Particularly preferred nucleic acids of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID Nos:2-9 or 14-21, or the complements thereof.
[0123]The invention also pertains to a purified or isolated nucleic acid comprising a polynucleotide having at least 95% nucleotide identity with a polynucleotide of SEQ ID Nos:2-9 or 14-21, advantageously 99% nucleotide identity, preferably 99.5% nucleotide identity and most preferably 99.8% nucleotide identity with a polynucleotide of SEQ ID Nos: 2-9 or 14-21, or a sequence complementary thereto or a biologically active fragment thereof.
[0124]Another object of the invention relates to purified, isolated or recombinant nucleic acids comprising a polynucleotide that hybridizes, under the stringent hybridization conditions defined herein, with a polynucleotide comprising a sequence of SEQ ID Nos: 2-9 or 14-21, or a sequence complementary thereto, or a variant thereof or a biologically active fragment thereof.
[0125]The novel cDNAs of the present invention comprise novel combinations of previously identified exons, as well as novel exons. For example, Table I provides a list of the exons present in previously-identified variants GlyT1a-GlyT1d, as well as the presently provided novel variants. SEQ ID NO: 1 provides the genomic DNA sequence of the GlyT1 gene, and notes the positions of each of the herein-referenced exons. The exon structure of the novel variants is also presented in FIG. 1.
TABLE I
Variants of the GlyT1 gene

Variant             Exon configuration            Size of encoded protein (aa)
1a                  1, 2, 5-16                    633
1b                  3, 5-16                       638
1c                  3-16                          692
1d                  1a, 2, 4-16                   687
Genset Variant 1    3, 4d, 5-16                   184
Genset Variant 2    3, 5a, 6-16                   125
Genset Variant 3    3, 6-16                       64
Genset Variant 4    3, 5-7, 7bis, 8-16            229
Genset Variant 5    3, 4ter, 5-12, 13a, 14-16     94
Genset Variant 6    3, 5-12, 13a, 14-16           456
Genset Variant 7    3, 5-14, 15a, 16              550
Genset Variant 8    3, 4bis, 5-16                 188
[0126]The cDNA of SEQ ID No: 14 (Genset variant 1) includes a 5'-UTR region starting from the nucleotide at position 1 and ending at the nucleotide in position 234, an open reading frame spanning the nucleotide positions 235-789, and a 3'-UTR region starting from the nucleotide at position 790 and ending at the nucleotide at position 2265. The protein encoded by this cDNA comprises 184 amino acids and is shown as SEQ ID NO:26.
[0127]The cDNA of SEQ ID No: 15 (Genset variant 2) includes a 5'-UTR region starting from the nucleotide at position 1 and ending at the nucleotide in position 234, an open reading frame spanning the nucleotide positions 235-612, and a 3'-UTR region starting from the nucleotide at position 613 and ending at the nucleotide at position 2088. The protein encoded by this cDNA comprises 125 amino acids and is shown as SEQ ID NO:27.
[0128]The cDNA of SEQ ID No: 16 (Genset variant 3) includes a 5'-UTR region starting from the nucleotide at position 1 and ending at the nucleotide in position 234, an open reading frame spanning the nucleotide positions 235-429, and a 3'-UTR region starting from the nucleotide at position 430 and ending at the nucleotide at position 2014. The protein encoded by this cDNA comprises 64 amino acids and is shown as SEQ ID NO:28.
[0129]The cDNA of SEQ ID No: 17 (Genset variant 4) includes a 5'-UTR region starting from the nucleotide at position 1 and ending at the nucleotide in position 234, an open reading frame spanning the nucleotide positions 235-924, and a 3'-UTR region starting from the nucleotide at position 925 and ending at the nucleotide at position 2242. The protein encoded by this cDNA comprises 229 amino acids and is shown as SEQ ID NO:29.
[0130]The cDNA of SEQ ID No: 18 (Genset variant 5) includes a 5'-UTR region starting from the nucleotide at position 1 and ending at the nucleotide in position 234, an open reading frame spanning the nucleotide positions 235-519, and a 3'-UTR region starting from the nucleotide at position 520 and ending at the nucleotide at position 2322. The protein encoded by this cDNA comprises 94 amino acids and is shown as SEQ ID NO:30.
[0131]The cDNA of SEQ ID No: 19 (Genset variant 6) includes a 5'-UTR region starting from the nucleotide at position 1 and ending at the nucleotide in position 234, an open reading frame spanning the nucleotide positions 235-1605, and a 3'-UTR region starting from the nucleotide at position 1606 and ending at the nucleotide at position 2167. The protein encoded by this cDNA comprises 456 amino acids and is shown as SEQ ID NO:31.
[0132]The cDNA of SEQ ID No: 20 (Genset variant 7) includes a 5'-UTR region starting from the nucleotide at position 1 and ending at the nucleotide in position 234, an open reading frame spanning the nucleotide positions 235-1887, and a 3'-UTR region starting from the nucleotide at position 1888 and ending at the nucleotide at position 2371. The protein encoded by this cDNA comprises 550 amino acids and is shown as SEQ ID NO:32.
[0133]The cDNA of SEQ ID No: 21 (Genset variant 8) includes a 5'-UTR region starting from the nucleotide at position 1 and ending at the nucleotide in position 234, an open reading frame spanning the nucleotide positions 235-801, and a 3'-UTR region starting from the nucleotide at position 802 and ending at the nucleotide at position 2277. The protein encoded by this cDNA comprises 188 amino acids and is shown as SEQ ID NO:33.
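The coordinates given above for SEQ ID Nos: 14-21 can be cross-checked against the stated protein sizes with a short sketch. It rests on one assumption that the text does not spell out: that each open reading frame span includes the terminal stop codon, so that the number of codons is one more than the number of amino acids. The names `VARIANTS` and `orf_is_consistent` are hypothetical.

```python
# SEQ ID No -> (ORF start, ORF end, 3'-UTR start, protein length in aa),
# copied from the paragraphs above.
VARIANTS = {
    14: (235, 789, 790, 184),
    15: (235, 612, 613, 125),
    16: (235, 429, 430, 64),
    17: (235, 924, 925, 229),
    18: (235, 519, 520, 94),
    19: (235, 1605, 1606, 456),
    20: (235, 1887, 1888, 550),
    21: (235, 801, 802, 188),
}


def orf_is_consistent(orf_start: int, orf_end: int, protein_aa: int) -> bool:
    """Check the ORF span against the protein length.

    Assumes the span is 1-based, inclusive, and ends with a stop codon.
    """
    span = orf_end - orf_start + 1
    return span % 3 == 0 and span // 3 - 1 == protein_aa


for seq_id, (start, end, utr3_start, aa) in VARIANTS.items():
    contiguous = utr3_start == end + 1  # 3'-UTR begins right after the ORF
    print(seq_id, orf_is_consistent(start, end, aa), contiguous)
```

Under that assumption, all eight entries are internally consistent: each span is a multiple of three and each 3'-UTR begins at the nucleotide immediately following the open reading frame.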
[0134]Consequently, the invention concerns a purified, isolated, and/or recombinant nucleic acid comprising a nucleotide sequence of the 5'UTR of any of the herein-provided GlyT1 cDNAs, a sequence complementary thereto, or an allelic variant thereof. The invention also concerns a purified, isolated, and/or recombinant nucleic acid comprising a nucleotide sequence of the 3'UTR of any of the herein-provided GlyT1 cDNAs, a sequence complementary thereto, or an allelic variant thereof.
[0135]While this section is entitled "GLYT1 cDNA Sequences," it should be noted that nucleic acid fragments of any size and sequence may also be comprised by the polynucleotides described in this section, flanking the genomic sequences of GLYT1 on either side or between two or more such genomic sequences.
Coding Regions
[0136]The open reading frames of the novel GlyT1 cDNAs provided herein are contained in the corresponding mRNAs of SEQ ID Nos:14-21, as outlined in the previous section. The present invention also embodies isolated, purified, and/or recombinant polynucleotides which encode a polypeptide comprising a contiguous span of at least 6 amino acids, preferably at least 8 or 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, 100, 200 or more amino acids of any of SEQ ID Nos:26-33.
[0137]Certain of the present novel GlyT1 cDNAs comprise novel exons, which are shown as SEQ ID Nos:2-9. Thus, the present invention also provides purified, isolated, or recombinant polynucleotides that comprise a nucleotide sequence of SEQ ID Nos: 2-9, complementary sequences thereto, as well as allelic variants and fragments thereof. Particularly preferred nucleic acids of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 300, or more nucleotides of SEQ ID Nos:2-9, the complements thereof, or which comprise a nucleotide sequence that is at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or more identical to any of the sequences shown as SEQ ID NOs:2-9. In a preferred embodiment, the present invention provides a nucleic acid sequence that is at least about 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or more identical to any of the sequences shown as SEQ ID NOs 14-21, or which hybridize under stringent or moderate conditions to any of the sequences shown as SEQ ID NOs: 14-21, wherein the nucleic acid sequence comprises any of the sequences shown as SEQ ID NOs:2-9.
[0138]Any of the above-disclosed polynucleotides containing a coding sequence of the GlyT1 gene may be expressed in a desired host cell or a desired host organism, when the polynucleotide is placed under the control of suitable expression signals. The expression signals may be either the expression signals contained in the regulatory regions in the GlyT1 gene of the invention, or, in contrast, the signals may be exogenous regulatory nucleic sequences. Such a polynucleotide, when placed under the suitable expression signals, may also be inserted in a vector for its expression and/or amplification.
Regulatory Sequences of GlyT1
[0139]As mentioned, the genomic sequence of the GlyT1 gene contains regulatory sequences both in the non-coding 5'-flanking region and in the non-coding 3'-flanking region that border the coding regions containing the exons of the various cDNAs. The positions of the 5'-regulatory sequences of the novel GlyT1 cDNAs are described in the section entitled "GlyT1 cDNA Sequences," supra.
[0140]Biologically active polynucleotide fragments or variants of any of the herein described novel cDNAs (e.g. the 5'UTRs or 3'UTRs) can be detected, e.g., by inserting a candidate sequence into a recombinant vector carrying a detectable marker gene (e.g. beta galactosidase, chloramphenicol acetyl transferase, etc.) (see, e.g., Sambrook et al. (1989)).
[0141]Polynucleotides derived from any of these 5' and 3' regulatory regions are useful, inter alia, in the detection of at least a copy of any of the nucleotide sequences of SEQ ID No: 1 or 14-21, or a fragment thereof, in a test sample. Polynucleotides carrying the regulatory elements located at the 5' end and at the 3' end of the GLYT1 coding region may also be used to control the transcriptional and translational activity of a heterologous polynucleotide of interest. In addition, polynucleotides from regulatory regions of a GlyT1 gene can be used to identify GlyT1 or related genes elsewhere in the genome of the same species or in the genomes of heterologous species.
[0142]Thus, the present invention also concerns a purified or isolated nucleic acid comprising a polynucleotide which is selected from the group consisting of the 5' and 3' regulatory regions, a sequence complementary thereto, and biologically active fragments or variants thereof.
[0143]The invention also pertains to a purified or isolated nucleic acid comprising a polynucleotide having at least 95% nucleotide identity with a polynucleotide selected from the group consisting of the 5' and 3' regulatory regions, advantageously 99% nucleotide identity, preferably 99.5% nucleotide identity and most preferably 99.8% nucleotide identity with a polynucleotide selected from the group consisting of the 5' and 3' regulatory regions, a sequence complementary thereto, a variant thereof, and a biologically active fragment thereof.
[0144]Another object of the invention consists of purified, isolated or recombinant nucleic acids comprising a polynucleotide that hybridizes, under the stringent hybridization conditions defined herein, with a polynucleotide selected from the group consisting of the nucleotide sequences of the 5'- and 3' regulatory regions, a sequence complementary thereto, a variant thereof, and a biologically active fragment thereof.
[0145]Preferred fragments of the 5' regulatory region have a length of about 1500 or 1000 nucleotides, preferably of about 500 nucleotides, more preferably about 400 nucleotides, even more preferably 300 nucleotides and most preferably about 200 nucleotides.
[0146]Preferred fragments of the 3' regulatory region are at least 50, 100, 150, 200, 300 or 400 bases in length.
[0147]"Biologically active" regulatory polynucleotide derivatives of SEQ ID Nos: 14-21 are polynucleotides comprising or alternatively consisting of a fragment of said polynucleotide which is functional as a regulatory region for expressing a recombinant polypeptide or a recombinant polynucleotide in a recombinant cell host. Such a fragment could act either as an enhancer or as a repressor of transcription or translation. For the purpose of the invention, a nucleic acid or polynucleotide is "functional" as a regulatory region for expressing a recombinant polypeptide or a recombinant polynucleotide if said regulatory polynucleotide contains nucleotide sequences which contain transcriptional and translational regulatory information. Such sequences can then be "operably linked" to nucleotide sequences which encode the desired polypeptide or the desired polynucleotide.
[0148]The regulatory polynucleotides of the invention may be prepared from the nucleotide sequence of SEQ ID No: 1 or any of SEQ ID NOs:14-21 by cleavage using suitable restriction enzymes, as described for example in Sambrook et al. (1989). The regulatory polynucleotides may also be prepared by digestion of SEQ ID No:1 or any of SEQ ID NOs:14-21 by an exonuclease enzyme, such as Bal31 (Wabiko et al., 1986). These regulatory polynucleotides can also be prepared by nucleic acid chemical synthesis, as described elsewhere in the specification.
[0149]The regulatory polynucleotides according to the invention may be part of a recombinant expression vector that may be used to express a coding sequence in a desired host cell or host organism. The recombinant expression vectors according to the invention are described elsewhere in the specification.
[0150]The desired nucleic acids encoded by the above-described polynucleotide, e.g. an RNA molecule, may be complementary to a desired coding polynucleotide, for example to the GlyT1 coding sequence, and thus useful as an antisense polynucleotide.
[0151]Such a polynucleotide may be included in a recombinant expression vector in order to express the desired polypeptide or the desired nucleic acid in a host cell or in a host organism. Suitable recombinant vectors that contain a polynucleotide such as described herein are disclosed elsewhere in the specification.
Polynucleotide Constructs
[0152]The terms "polynucleotide construct" and "recombinant polynucleotide" are used interchangeably herein to refer to linear or circular, purified or isolated polynucleotides that have been artificially designed and which comprise at least two nucleotide sequences that are not found as contiguous nucleotide sequences in their initial natural environment.
[0153]In order to study the physiological and phenotypic consequences of a lack of synthesis of the GlyT1 protein, both at the cell level and at the multi-cellular organism level, the invention also encompasses DNA constructs and recombinant vectors enabling a conditional expression of specific cDNAs encoded by the GlyT1 genomic sequence or variants, derivatives, or fragments thereof.
[0154]The present invention embodies recombinant vectors comprising any one of the polynucleotides described in the present invention. Preferably, the polynucleotide constructs according to the present invention comprise any of the polynucleotides described in the "GlyT1 cDNA Sequences" section, the "Coding Regions" section, and the "Oligonucleotide Probes And Primers" section.
[0155]One preferred DNA construct is based on the tetracycline resistance operon tet from E. coli transposon Tn10 for controlling the GlyT1 gene expression, such as described by Gossen et al. (1992, 1995) and Furth et al. (1994). Such a DNA construct contains seven tet operator sequences from Tn10 (tetop) that are fused to a minimal promoter and/or a 5'-regulatory sequence of the GlyT1 gene, said minimal promoter or said GlyT1 regulatory sequence being operably linked to a polynucleotide of interest that codes either for a sense or an antisense oligonucleotide or for a polypeptide, including a GlyT1 polypeptide (preferably a novel GlyT1 polypeptide provided herein) or a peptide fragment thereof. This DNA construct is functional as a conditional expression system for the nucleotide sequence of interest when the same cell also comprises a nucleotide sequence coding for either the wild type (tTA) or the mutant (rTA) repressor fused to the activating domain of viral protein VP16 of herpes simplex virus, placed under the control of a promoter, such as the HCMVIE1 enhancer/promoter or the MMTV-LTR. Indeed, a preferred DNA construct of the invention comprises both the polynucleotide containing the tet operator sequences and the polynucleotide containing a sequence coding for the tTA or the rTA repressor.
[0156]The present DNA constructs may be used to introduce a desired nucleotide sequence of the invention, preferably a novel GlyT1 cDNA sequence, within a predetermined location of the targeted genome, leading either to the generation of an altered copy of a targeted gene (knock-out homologous recombination) or to the replacement of a copy of the targeted gene by another copy sufficiently homologous to allow a homologous recombination event to occur (knock-in homologous recombination).
Nuclear Antisense DNA Constructs
[0157]Other compositions contain a vector of the invention comprising an oligonucleotide fragment of any of the nucleic acid sequences shown as SEQ ID Nos: 2-9 or 14-21, preferably a fragment including the start codon of any of the present novel GlyT1 cDNAs, for use as an antisense tool that inhibits the expression of the corresponding GlyT1 cDNA. Preferred methods using antisense polynucleotides according to the present invention are the procedures described by Sczakiel et al. (1995) or those described in PCT Application No WO 95/24223, the disclosures of which are incorporated by reference herein in their entirety.
[0158]Preferably, the antisense tools are chosen among the polynucleotides (15-200 bp long) that are complementary to the 5' end of the GlyT1 mRNA. In one embodiment, a combination of different antisense polynucleotides complementary to different parts of the desired targeted gene is used. Preferred antisense polynucleotides according to the present invention are complementary to a sequence of any of the present GlyT1 mRNAs that contains either the translation initiation codon ATG or a splicing site. Further preferred antisense polynucleotides according to the invention are complementary to a splicing site of any of the present GlyT1 mRNAs.
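By way of illustration only, an antisense sequence complementary to a region containing the translation initiation codon can be derived by taking the reverse complement of that region. The Python sketch below is not part of the original disclosure. The function names and the sequence `toy_cdna` are hypothetical placeholders (the toy sequence is invented and is not a GlyT1 sequence), and the flank size of 6 nucleotides is an arbitrary choice that yields a 15-nucleotide oligonucleotide, the lower end of the 15-200 range mentioned above.

```python
# Translation table mapping each DNA base to its Watson-Crick complement
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.translate(_COMPLEMENT)[::-1]


def antisense_around_start(cdna: str, orf_start: int, flank: int = 6) -> str:
    """Return an antisense oligo spanning the ATG at `orf_start` (1-based).

    `flank` nucleotides are taken on each side of the start codon,
    truncated at the ends of the cDNA if necessary.
    """
    i = orf_start - 1  # convert to 0-based index
    if cdna[i:i + 3].upper() != "ATG":
        raise ValueError("no ATG at the given ORF start position")
    lo = max(0, i - flank)
    hi = min(len(cdna), i + 3 + flank)
    return reverse_complement(cdna[lo:hi])


# Invented toy sequence with an ATG at position 10 (hypothetical)
toy_cdna = "TTGACCGCCATGGCTAAGGAT"
oligo = antisense_around_start(toy_cdna, orf_start=10)
```

In this toy case `oligo` is the 15-nucleotide sequence `CTTAGCCATGGCGGT`, whose central `CAT` is the reverse complement of the `ATG` start codon.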
[0159]Preferably, the antisense polynucleotides of the invention have a 3' polyadenylation signal that has been replaced with a self-cleaving ribozyme sequence, such that RNA polymerase II transcripts are produced without poly(A) at their 3' ends, these antisense polynucleotides being incapable of export from the nucleus, such as described by Liu et al. (1994). In a preferred embodiment, these GlyT1 antisense polynucleotides also comprise, within the ribozyme cassette, a histone stem-loop structure to stabilize cleaved transcripts against 3'-5' exonucleolytic degradation, such as the structure described by Eckner et al. (1991).
[0160]Oligonucleotide Probes and Primers
[0161]Polynucleotides derived from the GlyT1 gene are useful in order to detect the expression of any of the novel cDNAs shown as SEQ ID Nos: 14-21, or any cDNA comprising any of the novel exons shown as SEQ ID Nos:2-9, or fragments, complements, or variants thereof in a test sample.
[0162]Particularly preferred probes and primers of the invention include isolated, purified, or recombinant polynucleotides comprising, consisting of, or consisting essentially of, a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, 1000 or more nucleotides of SEQ ID Nos:2-9 or 14-21, or the complements thereof.
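As an illustrative aside, the notion of a "contiguous span of at least N nucleotides" can be made concrete by enumerating every window of a given length within a reference sequence. The Python sketch below is not part of the original disclosure. The function names are hypothetical, and `toy_reference` is an invented placeholder rather than any of the sequences of the Sequence Listing.

```python
def contiguous_spans(seq: str, span_len: int) -> list[str]:
    """Return every contiguous window of `span_len` nucleotides in `seq`."""
    if span_len <= 0:
        raise ValueError("span_len must be positive")
    return [seq[i:i + span_len] for i in range(len(seq) - span_len + 1)]


def is_contiguous_span(candidate: str, reference: str, min_len: int = 12) -> bool:
    """True if `candidate` is at least `min_len` long and occurs in `reference`."""
    return len(candidate) >= min_len and candidate.upper() in reference.upper()


# Invented 15-nucleotide placeholder sequence (hypothetical)
toy_reference = "ACGTTGCAAGGCTTA"
spans_12 = contiguous_spans(toy_reference, 12)
```

A reference of 15 nucleotides contains 15 - 12 + 1 = 4 windows of 12 nucleotides, and a reference shorter than the requested span contains none.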
[0163]Thus, the invention also relates to nucleic acid probes characterized in that they hybridize specifically, under the stringent hybridization conditions defined above, with any of the novel cDNAs or exons described herein, e.g. as shown as SEQ ID NOs:2-9 or 14-21, or sequences complementary thereto.
[0164]In a preferred embodiment, said probes comprise, consist of, or consist essentially of a sequence selected from SEQ ID NOs:34, 35, 36, and 37, and the complementary sequences thereto.
[0165]In an additional embodiment, the invention encompasses polynucleotides for use in hybridization assays, sequencing assays, and enzyme-based mismatch detection assays for determining the expression of particular cDNA species encoded by the GlyT1 gene, e.g. the expression of any of the herein provided novel cDNAs, or the expression of any cDNAs comprising any of the herein-provided novel exons.
[0166]The invention concerns the use of the polynucleotides according to the invention for detecting the expression of any of the herein-provided novel cDNAs, or the expression of any cDNAs comprising any of the herein-provided novel exons, preferably in hybridization assays, sequencing assays, microsequencing assays, enzyme-based mismatch detection assays, or by amplifying segments of nucleotides comprising any of the present novel exons, or spanning any novel exon-exon junctions found in any of the present novel cDNAs (i.e. novel junctions resulting from novel exon configurations; see, e.g., Table I).
[0167]A probe or a primer according to the invention preferably is between 8 and 1000 nucleotides in length, or is specified to be at least 12, 15, 18, 20, 25, 35, 40, 50, 60, 70, 80, 100, 250, 500 or 1000 nucleotides in length. More particularly, the length of these probes and primers typically ranges from 8, 10, 15, 20, or 30 to 100 nucleotides, preferably from 10 to 50, more preferably from 15 to 30 nucleotides. Shorter probes and primers tend to lack specificity for a target nucleic acid sequence and generally require cooler temperatures to form sufficiently stable hybrid complexes with the template. Longer probes and primers are expensive to produce and can sometimes self-hybridize to form hairpin structures. The appropriate length for primers and probes under a particular set of assay conditions may be empirically determined by one of skill in the art.
[0168]The formation of stable hybrids depends on the melting temperature (Tm) of the DNA. The Tm depends on the length of the primer or probe, the ionic strength of the solution and the G+C content. The higher the G+C content of the primer or probe, the higher the melting temperature, because G:C pairs are held by three hydrogen bonds whereas A:T pairs have only two. The GC content in the probes of the invention usually ranges between 10 and 75%, preferably between 35 and 60%, and more preferably between 40 and 55%.
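The dependence of Tm on base composition can be illustrated with a short Python sketch, which is not part of the original disclosure. It computes the G+C percentage of a probe, classifies it against the ranges given above, and estimates Tm with the common rule of thumb Tm = 2 x (A+T) + 4 x (G+C) in degrees Celsius. That rule is an assumption introduced here for illustration; it is a rough approximation for short oligonucleotides, it ignores ionic strength, and it is not a method prescribed by this specification. The function names and the sequence `toy_probe` are hypothetical.

```python
def gc_percent(seq: str) -> float:
    """Return the G+C content of `seq` as a percentage."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def rule_of_thumb_tm(seq: str) -> float:
    """Estimate Tm (deg C) as 2*(A+T) + 4*(G+C); rough, short oligos only."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def gc_tier(seq: str) -> str:
    """Classify GC content against the ranges stated in the text."""
    gc = gc_percent(seq)
    if 40 <= gc <= 55:
        return "more preferred"
    if 35 <= gc <= 60:
        return "preferred"
    if 10 <= gc <= 75:
        return "usual"
    return "outside usual range"


# Invented 20-nucleotide placeholder probe (hypothetical)
toy_probe = "ACGTACGTACGTACGTACGT"
```

For the placeholder probe, which has ten G or C bases and ten A or T bases, the G+C content is 50% and the rule-of-thumb estimate is 2 x 10 + 4 x 10 = 60 degrees Celsius. Each G or C contributes twice as much to the estimate as each A or T, consistent with the stronger G:C pairing described above.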
[0169]The primers and probes can be prepared by any suitable method, including, for example, cloning and restriction of appropriate sequences and direct chemical synthesis by a method such as the phosphotriester method of Narang et al. (1979), the phosphodiester method of Brown et al. (1979), the diethylphosphoramidite method of Beaucage et al. (1981) and the solid support method described in EP 0 707 592.
[0170]Detection probes are generally nucleic acid sequences or uncharged nucleic acid analogs such as, for example, peptide nucleic acids, which are disclosed in PCT Application WO 92/20702, and morpholino analogs, which are described in U.S. Pat. Nos. 5,185,444; 5,034,506 and 5,142,047. The probe may have to be rendered "non-extendable" in that additional dNTPs cannot be added to the probe. In and of themselves analogs usually are non-extendable, and nucleic acid probes can be rendered non-extendable by modifying the 3' end of the probe such that the hydroxyl group is no longer capable of participating in elongation. For example, the 3' end of the probe can be functionalized with the capture or detection label to thereby consume or otherwise block the hydroxyl group. Alternatively, the 3' hydroxyl group simply can be cleaved, replaced or modified; U.S. Pat. No. 4,869,905 describes modifications which can be used to render a probe non-extendable.
[0171]Any of the polynucleotides of the present invention can be labeled, if desired, by incorporating any label known in the art to be detectable by spectroscopic, photochemical, biochemical, immunochemical, or chemical means. For example, useful labels include radioactive substances (including 32P, 35S, 3H, 125I), fluorescent dyes (including 5-bromodeoxyuridine, fluorescein, acetylaminofluorene, digoxigenin) or biotin. Preferably, polynucleotides are labeled at their 3' and 5' ends. Examples of non-radioactive labeling of nucleic acid fragments are described in French patent No. FR-7810975 or by Urdea et al. (1988) or Sanchez-Pescador et al. (1988). In addition, the probes according to the present invention may have structural characteristics that allow signal amplification, such structural characteristics being, for example, branched DNA probes as those described by Urdea et al. (1991) or in European patent No. EP 0 225 807 (Chiron).
[0172]A label can also be used to capture the primer, so as to facilitate the immobilization of either the primer or a primer extension product, such as amplified DNA, on a solid support. A capture label is attached to the primers or probes and can be a specific binding member which forms a binding pair with the solid phase reagent's specific binding member (e.g. biotin and streptavidin). Therefore depending upon the type of label carried by a polynucleotide or a probe, it may be employed to capture or to detect the target DNA. Further, it will be understood that the polynucleotides, primers or probes provided herein, may, themselves, serve as the capture label. For example, in the case where a solid phase reagent's binding member is a nucleic acid sequence, it may be selected such that it binds a complementary portion of a primer or probe to thereby immobilize the primer or probe to the solid phase. In cases where a polynucleotide probe itself serves as the binding member, those skilled in the art will recognize that the probe will contain a sequence or "tail" that is not complementary to the target. In the case where a polynucleotide primer itself serves as the capture label, at least a portion of the primer will be free to hybridize with a nucleic acid on a solid phase. DNA labeling techniques are well known to the skilled technician.
[0173]The probes of the present invention are useful for a number of purposes. They can be notably used in Southern hybridization to genomic DNA. The probes can also be used to detect PCR amplification products. They may also be used to detect mismatches in the GLYT1 gene or mRNA using other techniques.
[0174]Any of the polynucleotides, primers and probes of the present invention can be conveniently immobilized on a solid support. Solid supports are known to those skilled in the art and include the walls of wells of a reaction tray, test tubes, polystyrene beads, magnetic beads, nitrocellulose strips, membranes, microparticles such as latex particles, sheep (or other animal) red blood cells, duracytes and others. The solid support is not critical and can be selected by one skilled in the art. Thus, latex particles, microparticles, magnetic or non-magnetic beads, membranes, plastic tubes, walls of microtiter wells, glass or silicon chips, sheep (or other suitable animal's) red blood cells and duracytes are all suitable examples. Suitable methods for immobilizing nucleic acids on solid phases include ionic, hydrophobic, covalent interactions and the like. A solid support, as used herein, refers to any material which is insoluble, or can be made insoluble by a subsequent reaction. The solid support can be chosen for its intrinsic ability to attract and immobilize the capture reagent. Alternatively, the solid phase can retain an additional receptor which has the ability to attract and immobilize the capture reagent. The additional receptor can include a charged substance that is oppositely charged with respect to the capture reagent itself or to a charged substance conjugated to the capture reagent. As yet another alternative, the receptor molecule can be any specific binding member which is immobilized upon (attached to) the solid support and which has the ability to immobilize the capture reagent through a specific binding reaction. The receptor molecule enables the indirect binding of the capture reagent to a solid support material before the performance of the assay or during the performance of the assay. 
The solid phase thus can be a plastic, derivatized plastic, magnetic or non-magnetic metal, glass or silicon surface of a test tube, microtiter well, sheet, bead, microparticle, chip, sheep (or other suitable animal's) red blood cells, Duracytes® and other configurations known to those of ordinary skill in the art. The polynucleotides of the invention can be attached to or immobilized on a solid support individually or in groups of at least 2, 5, 8, 10, 12, 15, 20, or 25 distinct polynucleotides of the invention to a single solid support. In addition, polynucleotides other than those of the invention may be attached to the same solid support as one or more polynucleotides of the invention.
Oligonucleotide Arrays
[0175]A substrate comprising a plurality of oligonucleotide primers or probes of the invention may be used, e.g., to detect expression of a plurality of any of the herein-provided cDNAs, or to detect the expression of one or more of the present cDNAs in conjunction with the expression of one or more heterologous genes.
[0176]Any polynucleotide provided herein may be attached in overlapping areas or at random locations on the solid support. Alternatively the polynucleotides of the invention may be attached in an ordered array wherein each polynucleotide is attached to a distinct region of the solid support which does not overlap with the attachment site of any other polynucleotide. Preferably, such an ordered array of polynucleotides is designed to be "addressable" where the distinct locations are recorded and can be accessed as part of an assay procedure. Addressable polynucleotide arrays typically comprise a plurality of different oligonucleotide probes that are coupled to a surface of a substrate in different known locations. The knowledge of the precise location of each polynucleotide makes these "addressable" arrays particularly useful in hybridization assays. Any addressable array technology known in the art can be employed with the polynucleotides of the invention. One particular embodiment of these polynucleotide arrays is known as the GENECHIPS, and has been generally described in U.S. Pat. No. 5,143,854; PCT publications WO 90/15070 and 92/10092. These arrays may generally be produced using mechanical synthesis methods or light directed synthesis methods which incorporate a combination of photolithographic methods and solid phase oligonucleotide synthesis (Fodor et al., 1991). The immobilization of arrays of oligonucleotides on solid supports has been rendered possible by the development of a technology generally identified as "Very Large Scale Immobilized Polymer Synthesis" (VLSIPS) in which, typically, probes are immobilized in a high density array on a solid surface of a chip. Examples of VLSIPS technologies are provided in U.S. Pat. Nos. 5,143,854; and 5,412,087 and in PCT Publications WO 90/15070, WO 92/10092 and WO 95/11995, which describe methods for forming oligonucleotide arrays through techniques such as light-directed synthesis techniques. 
In designing strategies aimed at providing arrays of nucleotides immobilized on solid supports, further presentation strategies were developed to order and display the oligonucleotide arrays on the chips in an attempt to maximize hybridization patterns and sequence information. Examples of such presentation strategies are disclosed in PCT Publications WO 94/12305, WO 94/11530, WO 97/29212 and WO 97/31256, the disclosures of which are incorporated herein by reference in their entireties.
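The concept of an "addressable" array, in which each probe occupies a distinct, recorded location, can be modeled as a mapping from grid coordinates to probe identifiers. The Python sketch below is offered for illustration only and is not part of the original disclosure. The class `AddressableArray`, its methods, and the grid dimensions are hypothetical, and the probe labels are used merely as identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class AddressableArray:
    """Toy model of an ordered array with one probe per distinct location."""
    rows: int
    cols: int
    _spots: dict = field(default_factory=dict)

    def attach(self, row: int, col: int, probe_id: str) -> None:
        """Record a probe at (row, col); locations may not overlap."""
        if not (0 <= row < self.rows and 0 <= col < self.cols):
            raise IndexError("location is outside the array")
        if (row, col) in self._spots:
            raise ValueError("location already holds a probe")
        self._spots[(row, col)] = probe_id

    def probe_at(self, row: int, col: int) -> str:
        """Look up which probe sits at a recorded location."""
        return self._spots[(row, col)]

    def identify(self, signal_locations) -> list:
        """Translate locations showing hybridization signal into probe ids."""
        return [self._spots[loc] for loc in signal_locations if loc in self._spots]


# Hypothetical 2 x 2 layout using probe labels as plain identifiers
chip = AddressableArray(rows=2, cols=2)
chip.attach(0, 0, "SEQ ID NO:34")
chip.attach(0, 1, "SEQ ID NO:35")
chip.attach(1, 0, "SEQ ID NO:36")
```

Because every location is recorded, a signal observed at a given coordinate can be translated directly into the identity of the hybridizing probe, which is the property that makes such arrays useful in hybridization assays.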
[0177]Consequently, the invention concerns an array of nucleic acid molecules comprising at least one polynucleotide described above as probes and primers. Preferably, the invention concerns an array of nucleic acid comprising at least two polynucleotides described above as probes and primers.
GlyT1 Proteins and Polypeptide Fragments
[0178]The term "GlyT1 polypeptides" is used herein to embrace all of the proteins and polypeptides of the present invention. Also forming part of the invention are polypeptides encoded by the polynucleotides of the invention, as well as fusion polypeptides comprising such polypeptides. The invention embodies GlyT1 proteins from humans, including isolated or purified GlyT1 proteins consisting of, consisting essentially of, or comprising any of the sequences of SEQ ID Nos:26-33.
[0179]The invention concerns polypeptides encoded by a nucleotide sequence selected from the group consisting of SEQ ID Nos:2-9 or 14-21, a complementary sequence thereof or a fragment thereof.
[0180]The present invention embodies isolated, purified, and recombinant polypeptides comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID Nos:26-33. The present invention also embodies isolated, purified, and recombinant polypeptides comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids encoded by any of the exons shown as SEQ ID NOs:2-9.
[0181]The invention also encompasses purified, isolated, or recombinant polypeptides comprising an amino acid sequence having at least 70, 75, 80, 85, 90, 95, 98 or 99% amino acid identity with any of the amino acid sequences of SEQ ID NO:26-33, or any of the amino acid sequences encoded by any of the nucleic acid sequences shown as SEQ ID NO:2-9 or 14-21, or a fragment thereof.
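To make the percentage thresholds above concrete, the following Python sketch computes percent identity between two sequences that have already been aligned. It is illustrative only and is not part of the original disclosure, and it does not replace the definition of identity or the alignment algorithms referred to elsewhere in the specification. The function names are hypothetical, the peptide strings are invented placeholders, and the denominator convention (all aligned columns, counting a gap opposite a residue as a mismatch) is an assumption.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Gaps are written as '-'. Columns where both sequences have a gap are
    ignored; a gap opposite a residue counts as a mismatch (assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_identity(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if percent identity is at least `threshold` (e.g. 70, 90, 95)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Invented placeholder peptides (hypothetical, not from the Sequence Listing)
identity_one_mismatch = percent_identity("MKTAYIAKQR", "MKTAHIAKQR")
identity_one_gap = percent_identity("MKTA-IAKQR", "MKTAYIAKQR")
```

With one mismatch or one gap in ten aligned columns, both placeholder comparisons give 90% identity, which would satisfy a 90% threshold but not a 95% threshold.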
[0182]GlyT1 proteins are preferably isolated from human or mammalian tissue samples or expressed from human or mammalian genes. The GlyT1 polypeptides of the invention can be made using routine expression methods known in the art. For example, a polynucleotide encoding the desired polypeptide is ligated into an expression vector suitable for any convenient host. Either eukaryotic or prokaryotic host systems can be used to produce recombinant polypeptides. The polypeptide is then isolated from lysed cells or from the culture medium and purified to the extent needed for its intended use. Purification can be carried out using any technique known in the art, for example, differential extraction, salt fractionation, chromatography, centrifugation, and the like. See, for example, Methods in Enzymology for a variety of methods for purifying proteins.
[0183]In addition, shorter protein fragments can be produced by chemical synthesis. Alternatively, the proteins of the invention can be extracted from cells or tissues of humans or non-human animals. Methods for purifying proteins are known in the art, and include the use of detergents or chaotropic agents to disrupt particles followed by differential extraction and separation of the polypeptides by ion exchange chromatography, affinity chromatography, sedimentation according to density, or gel electrophoresis.
[0184]Any GlyT1 polynucleotide, preferably a novel cDNA shown as SEQ ID NOs:14-21, can be used to express GlyT1 proteins and polypeptides. The nucleic acid encoding the GlyT1 protein or polypeptide to be expressed can be operably linked to a promoter in an expression vector using conventional cloning technology. The GlyT1 insert in the expression vector may comprise the full coding sequence for the GlyT1 protein or a portion thereof. For example, the GlyT1 derived insert may encode a polypeptide comprising at least 10 consecutive amino acids of the GlyT1 protein of SEQ ID Nos: 26-33, or a protein encoded by any of the nucleic acids shown as SEQ ID NOs:2-9 or 14-21.
[0185]The expression vector is any of the mammalian, yeast, insect or bacterial expression systems known in the art. Commercially available vectors and expression systems are available from a variety of suppliers including Genetics Institute (Cambridge, Mass.), Stratagene (La Jolla, Calif.), Promega (Madison, Wis.), and Invitrogen (San Diego, Calif.). If desired, to enhance expression and facilitate proper protein folding, the codon context and codon pairing of the sequence is optimized for the particular expression organism in which the expression vector is introduced, as explained by Hatfield, et al., U.S. Pat. No. 5,082,767, the disclosure of which is incorporated by reference herein in its entirety.
[0186]In one embodiment, the entire coding sequence of the cDNA through the poly A signal of the cDNA is operably linked to a promoter in the expression vector. Alternatively, if the nucleic acid encoding a portion of the GlyT1 protein lacks a methionine to serve as the initiation site, an initiating methionine can be introduced next to the first codon of the nucleic acid using conventional techniques. Similarly, if the insert from the GlyT1 cDNA lacks a poly A signal, this sequence can be added to the construct by, for example, splicing out the Poly A signal from pSG5 (Stratagene) using BglI and SalI restriction endonuclease enzymes and incorporating it into the mammalian expression vector pXT1 (Stratagene). pXT1 contains the LTRs and a portion of the gag gene from Moloney Murine Leukemia Virus. The position of the LTRs in the construct allows efficient stable transfection. The vector includes the Herpes Simplex Thymidine Kinase promoter and the selectable neomycin gene.
[0187]The finished constructs may be transfected into mouse NIH 3T3 cells using Lipofectin (Life Technologies, Inc., Grand Island, N.Y.) under conditions outlined in the product specification. Positive transfectants are selected after growing the transfected cells in 600 µg/ml G418 (Sigma, St. Louis, Mo.).
[0188]The expressed protein may be purified using conventional purification techniques such as ammonium sulfate precipitation or chromatographic separation based on size or charge. The protein encoded by the nucleic acid insert may also be purified using standard immunochromatography techniques. In such procedures, a solution containing the expressed GlyT1 protein or portion thereof, such as a cell extract, is applied to a column having antibodies against the GlyT1 protein or portion thereof attached to the chromatography matrix. The expressed protein is allowed to bind the immunochromatography column. Thereafter, the column is washed to remove non-specifically bound proteins. The specifically bound expressed protein is then released from the column and recovered using standard techniques.
[0189]To confirm expression of the GlyT1 protein or a portion thereof, the proteins expressed from host cells containing an expression vector containing an insert encoding the GlyT1 protein or a portion thereof can be compared to the proteins expressed in host cells containing the expression vector without an insert. The presence of a band in samples from cells containing the expression vector with an insert which is absent in samples from cells containing the expression vector without an insert indicates that the GlyT1 protein or a portion thereof is being expressed. Generally, the band will have the mobility expected for the GlyT1 protein or portion thereof. However, the band may have a mobility different than that expected as a result of modifications such as glycosylation, ubiquitination, or enzymatic cleavage.
[0190]Antibodies capable of specifically recognizing the expressed GlyT1 protein or a portion thereof can be prepared using standard methods and are described below.
[0191]If antibody production is not possible, the nucleic acids encoding the GlyT1 protein or a portion thereof may be incorporated into an expression vector designed for use in purification schemes employing chimeric polypeptides. In such strategies the nucleic acid encoding the GlyT1 protein or a portion thereof is inserted in frame with the gene encoding the other half of the chimera. The other half of the chimera is, e.g., a beta-globin or a nickel-binding polypeptide encoding sequence. A chromatography matrix having an antibody to beta-globin or nickel attached thereto is then used to purify the chimeric protein. A protease cleavage site is engineered between the beta-globin gene or the nickel-binding polypeptide and the GlyT1 protein or portion thereof. Thus, the two polypeptides of the chimera are separated from one another by protease digestion.
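The requirement that the GlyT1 sequence be inserted "in frame" with its fusion partner can be illustrated with a short Python check. This is a sketch under stated assumptions: the partner sequence is assumed to start at its first codon, the function name and all example sequences are hypothetical, and a real construct would be verified by sequencing.

```python
# Illustrative sketch only; names and sequences are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq: str) -> list:
    """Split a sequence into complete codons, ignoring any trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def fusion_is_in_frame(partner: str, cleavage_site: str, insert: str) -> bool:
    """True if the insert is read in the partner's frame with no premature stop codon."""
    upstream = partner + cleavage_site
    if len(upstream) % 3 != 0:
        return False  # the insert would be shifted out of frame
    body = codons(insert)[:-1]  # a terminal stop codon in the insert is allowed
    return not any(c in STOP_CODONS for c in codons(upstream) + body)


if __name__ == "__main__":
    print(fusion_is_in_frame("ATGGTGCAT", "ATCGAAGGTCGT", "GCTGCC"))
```

The check covers only reading frame and premature stops; it says nothing about whether the engineered cleavage site is recognized by a given protease.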
[0192]One useful expression vector for generating beta-globin chimeric proteins is pSG5 (Stratagene), which encodes rabbit beta-globin. Intron II of the rabbit beta-globin gene facilitates splicing of the expressed transcript, and the polyadenylation signal incorporated into the construct increases the level of expression. These techniques are well known to those skilled in the art of molecular biology. Standard methods are published in methods texts such as Davis et al. (1986) and many of the methods are available from Stratagene, Life Technologies, Inc., or Promega. Polypeptide may additionally be produced from the construct using in vitro translation systems such as the IN VITRO EXPRESS Translation Kit (Stratagene).
Antibodies that Bind GlyT1 Polypeptides of the Invention
[0193]Any GlyT1 polypeptide or whole protein may be used to generate antibodies capable of specifically binding to an expressed GlyT1 protein or fragment thereof as described.
[0194]In preferred embodiments, antibodies are prepared that specifically recognize any of the novel GlyT1 polypeptides of the invention (e.g. polypeptides comprising a sequence shown as SEQ ID NOs:26-33), or a polypeptide comprising a sequence encoded by any of the novel exons of the invention (SEQ ID NOs:2-9). For an antibody composition to specifically bind to a first variant of GlyT1, it must demonstrate at least a 5%, 10%, 15%, 20%, 25%, 50%, or 100% greater binding affinity for a full length first variant of the GlyT1 protein than for a full length second variant of the GlyT1 protein in an ELISA, RIA, or other antibody-based binding assay.
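The specificity criterion above is a simple percentage comparison, which can be sketched as follows. The sketch assumes that a single numeric binding measure is available for each variant (for example a background-corrected assay signal); the specification does not say how binding affinity is quantified, so that choice and the function names are assumptions.

```python
# Illustrative sketch only; how "binding affinity" is measured is an assumption.
THRESHOLDS_PCT = (5, 10, 15, 20, 25, 50, 100)


def percent_greater(first: float, second: float) -> float:
    """Percentage by which binding to the first variant exceeds binding to the second."""
    if second <= 0:
        raise ValueError("binding measure for the second variant must be positive")
    return (first - second) / second * 100.0


def thresholds_met(first: float, second: float) -> list:
    """List the recited thresholds that the observed difference satisfies."""
    gain = percent_greater(first, second)
    return [t for t in THRESHOLDS_PCT if gain >= t]


if __name__ == "__main__":
    print(percent_greater(1.5, 1.0))
    print(thresholds_met(1.5, 1.0))
```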
[0195]In a preferred embodiment, the invention concerns antibody compositions, either polyclonal or monoclonal, capable of selectively binding to an epitope-containing polypeptide comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of any of SEQ ID Nos:26-33, or encoded by SEQ ID NOs:2-9.
[0196]In a preferred embodiment, the invention concerns the use in the manufacture of antibodies of a polypeptide comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of any of SEQ ID NOs:26-33, or encoded by SEQ ID NOs:2-9.
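The "contiguous span of at least 6 amino acids" language in the two preceding paragraphs can be made concrete by enumerating every window of a given length along a polypeptide. The sketch below is illustrative only: the function name is hypothetical and the example string is a made-up sequence, not one of the SEQ ID NOs.

```python
# Illustrative sketch only; the example sequence is made up.
def contiguous_spans(protein: str, length: int = 6) -> list:
    """Return every contiguous span of the given length, in order of position."""
    if length < 1:
        raise ValueError("span length must be at least 1")
    return [protein[i:i + length] for i in range(len(protein) - length + 1)]


if __name__ == "__main__":
    for span in contiguous_spans("MKTAYIAK", 6):
        print(span)
```

A polypeptide of n residues yields n - length + 1 such spans, and none when it is shorter than the requested length.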
[0197]Non-human animals or mammals, whether wild-type or transgenic, which express a different species of GlyT1 than the one to which antibody binding is desired, and animals which do not express GlyT1 (i.e. a GlyT1 knock out animal as described herein) are particularly useful for preparing antibodies. GlyT1 knock out animals will recognize all or most of the exposed regions of a GlyT1 protein as foreign antigens, and therefore produce antibodies against a wider array of GlyT1 epitopes. Moreover, smaller polypeptides with only 10 to 30 amino acids may be useful in obtaining specific binding to any one of the GlyT1 proteins. In addition, the humoral immune system of animals which produce a species of GlyT1 that resembles the antigenic sequence will preferentially recognize the differences between the animal's native GlyT1 species and the antigen sequence, and produce antibodies to these unique sites in the antigen sequence. Such a technique will be particularly useful in obtaining antibodies that specifically bind to any one of the GlyT1 proteins.
[0198]Antibody preparations prepared according to either protocol are useful in quantitative immunoassays which determine concentrations of antigen-bearing substances in biological samples; they are also used semi-quantitatively or qualitatively to identify the presence of antigen in a biological sample. The antibodies may also be used in therapeutic compositions for killing cells expressing the protein or reducing the levels of the protein in the body.
[0199]The antibodies of the invention may be labeled using any of a large number of labels, including any one of the radioactive, fluorescent or enzymatic labels known in the art.
[0200]Consequently, the invention is also directed to a method for detecting specifically the presence of a GlyT1 polypeptide according to the invention in a biological sample, said method comprising bringing into contact the biological sample with a polyclonal or monoclonal antibody that specifically binds a GlyT1 polypeptide comprising an amino acid sequence of SEQ ID Nos:26-33, or encoded by any of the nucleic acid sequences shown as SEQ ID NOs:2-9 or 14-21; or to a peptide fragment or variant thereof, and detecting the antigen-antibody complex formed.
[0201]The invention also concerns a diagnostic kit for detecting in vitro the presence of a GlyT1 polypeptide according to the present invention in a biological sample, wherein said kit comprises a polyclonal or monoclonal antibody that specifically binds a GlyT1 polypeptide comprising an amino acid sequence of SEQ ID Nos:26-33, or encoded by any of the nucleic acid sequences shown as SEQ ID NOs:2-9 or 14-21; or to a peptide fragment or variant thereof, optionally labeled; and a reagent allowing the detection of the antigen-antibody complexes formed, said reagent carrying optionally a label, or being able to be recognized itself by a labeled reagent, more particularly in the case when the above-mentioned monoclonal or polyclonal antibody is not labeled by itself.
Recombinant Vectors
[0202]The term "vector" is used herein to designate either a circular or a linear DNA or RNA molecule, which is either double-stranded or single-stranded, and which comprises at least one polynucleotide of interest that is sought to be transferred in a cell host or in a unicellular or multicellular host organism.
[0203]The present invention encompasses a family of recombinant vectors that comprise any regulatory or coding polynucleotide derived from any of the herein-provided novel GlyT1 cDNAs.
[0204]In a first preferred embodiment, a recombinant vector of the invention is used to amplify an inserted polynucleotide derived from a GlyT1 cDNA in a suitable cell host, this polynucleotide being amplified each time the recombinant vector replicates.
[0205]A second preferred embodiment of the recombinant vectors according to the invention comprises expression vectors comprising a regulatory polynucleotide and/or a coding nucleic acid of the invention. Within certain embodiments, expression vectors are employed to express the GlyT1 polypeptide which can then be purified and, for example, be used in ligand screening assays or as an immunogen in order to raise specific antibodies directed against the GlyT1 protein. In other embodiments, the expression vectors are used for constructing transgenic animals and also for gene therapy. Expression requires that appropriate signals are provided in the vectors, said signals including various regulatory elements, such as enhancers/promoters from both viral and mammalian sources that drive expression of the genes of interest in host cells. Dominant drug selection markers for establishing permanent, stable cell clones expressing the products are generally included in the expression vectors of the invention, as are elements that link expression of the drug selection markers to expression of the polypeptide.
[0206]More particularly, the present invention relates to expression vectors which include nucleic acids encoding a GlyT1 protein, preferably a GlyT1 protein of any of the amino acid sequences of SEQ ID Nos:26-33, or variants or fragments thereof.
[0207]The invention also pertains to a recombinant expression vector useful for the expression of a GlyT1 coding sequence, wherein said vector comprises a nucleic acid of SEQ ID Nos:2-9 or 14-21.
[0208]Some of the elements which can be found in the vectors of the present invention are described in further detail elsewhere in the present specification.
[0209]The present invention also encompasses primary, secondary, and immortalized homologously recombinant host cells of vertebrate origin, preferably mammalian origin and particularly human origin, that have been engineered to: a) insert exogenous (heterologous) polynucleotides into the endogenous chromosomal DNA of a targeted gene, b) delete endogenous chromosomal DNA, and/or c) replace endogenous chromosomal DNA with exogenous polynucleotides. Insertions, deletions, and/or replacements of polynucleotide sequences may be to the coding sequences of the targeted gene and/or to regulatory regions, such as promoter and enhancer sequences, operably associated with the targeted gene.
[0210]The present invention further relates to a method of altering the expression of a targeted gene in a cell in vitro or in vivo wherein the gene is not normally expressed in the cell, comprising the steps of: (a) transfecting the cell in vitro or in vivo with a polynucleotide construct, the polynucleotide construct comprising: (i) a targeting sequence; (ii) a regulatory sequence and/or a coding sequence; and (iii) an unpaired splice donor site, if necessary, thereby producing a transfected cell; and (b) maintaining the transfected cell in vitro or in vivo under conditions appropriate for homologous recombination, thereby producing a homologously recombinant cell; and (c) maintaining the homologously recombinant cell in vitro or in vivo under conditions appropriate for expression of the gene. Methods of making cells with altered expression, and polynucleotide constructs used to make the cells, are also provided.
[0211]Another method for altering the expression of a targeted gene, e.g. a GlyT1 gene, is by introducing into a cell capable of expressing GlyT1 a polynucleotide whose presence in the cell alters the expression of the GlyT1 gene. For example, the polynucleotide may act to replace the endogenous GlyT1 promoter with a more or less active promoter, or may comprise an enhancer element whose insertion into the genome in the vicinity of the GlyT1 gene results in an increase or decrease in the expression of the GlyT1.
[0212]The compositions may be produced, and methods performed, by techniques known in the art, such as those described in U.S. Pat. Nos. 6,054,288; 6,048,729; 6,048,724; 6,048,524; 5,994,127; 5,968,502; 5,965,125; 5,869,239; 5,817,789; 5,783,385; 5,733,761; 5,641,670; 5,580,734; International Publication Nos: WO96/29411, WO 94/12650; and scientific articles including Koller et al., (1989) Proc. Natl. Acad. Sci. USA 86:8932-8935.
1. General Features of the Expression Vectors of the Invention
[0213]Recombinant vectors that can be used in the present invention include, but are not limited to, YACs (Yeast Artificial Chromosome), BACs (Bacterial Artificial Chromosome), phages, phagemids, cosmids, plasmids, and linear DNA molecules which may comprise chromosomal, non-chromosomal, semi-synthetic or synthetic DNA. Such recombinant vectors can comprise a transcriptional unit comprising an assembly of:
[0214](1) a genetic element or elements having a regulatory role in gene expression, for example promoters or enhancers. Enhancers are cis-acting elements of DNA, usually from about 10 to 300 bp in length, that act on the promoter to increase transcription;
[0215](2) a structural or coding sequence which is transcribed into mRNA and eventually translated into a polypeptide, said structural or coding sequence being operably linked to the regulatory elements described in (1); and
[0216](3) appropriate transcription initiation and termination sequences. Structural units intended for use in yeast or eukaryotic expression systems preferably include a leader sequence enabling extracellular secretion of translated protein by a host cell. Alternatively, when a recombinant protein is expressed without a leader or transport sequence, it may include an N-terminal residue. This residue may or may not be subsequently cleaved from the expressed recombinant protein to provide a final product.
[0217]Generally, recombinant expression vectors will include origins of replication, selectable markers permitting transformation of the host cell, and a promoter derived from a highly expressed gene to direct transcription of a downstream structural sequence. The heterologous structural sequence is assembled in appropriate phase with translation initiation and termination sequences, and preferably a leader sequence capable of directing secretion of the translated protein into the periplasmic space or the extracellular medium. In a specific embodiment wherein the vector is adapted for transfecting and expressing desired sequences in mammalian host cells, preferred vectors will comprise an origin of replication in the desired host, a suitable promoter and enhancer, and also any necessary ribosome binding sites, polyadenylation signal, splice donor and acceptor sites, transcriptional termination sequences, and 5'-flanking non-transcribed sequences. DNA sequences derived from the SV40 viral genome, for example SV40 origin, early promoter, enhancer, splice and polyadenylation signals may be used to provide the required non-transcribed genetic elements.
[0218]The in vivo expression of a GlyT1 polypeptide of SEQ ID Nos:26-33, or fragments or variants thereof, may be useful in order to correct a genetic defect related to the expression of the native gene in a host organism or to the production of a biologically inactive GlyT1 protein.
[0219]Consequently, the present invention also comprises recombinant expression vectors mainly designed for the in vivo production of a GlyT1 polypeptide of SEQ ID Nos:26-33, or fragments or variants thereof, by the introduction of the appropriate genetic material in the organism of the patient to be treated. This genetic material may be introduced in vitro in a cell that has been previously extracted from the organism, the modified cell being subsequently reintroduced in the said organism, or directly in vivo into the appropriate tissue.
2. Regulatory Elements
Promoters
[0220]The suitable promoter regions used in the expression vectors according to the present invention are chosen taking into account the cell host in which the heterologous gene has to be expressed. The particular promoter employed to control the expression of a nucleic acid sequence of interest is not believed to be important, so long as it is capable of directing the expression of the nucleic acid in the targeted cell. Thus, where a human cell is targeted, it is preferable to position the nucleic acid coding region adjacent to and under the control of a promoter that is capable of being expressed in a human cell, such as, for example, a human or a viral promoter.
[0221]A suitable promoter may be heterologous with respect to the nucleic acid for which it controls the expression or alternatively can be endogenous to the native polynucleotide containing the coding sequence to be expressed. Additionally, the promoter is generally heterologous with respect to the recombinant vector sequences within which the construct promoter/coding sequence has been inserted.
[0222]Promoter regions can be selected from any desired gene using, for example, CAT (chloramphenicol acetyltransferase) vectors and more preferably pKK232-8 and pCM7 vectors.
[0223]Preferred bacterial promoters are the LacI, LacZ, the T3 or T7 bacteriophage RNA polymerase promoters, the gpt, lambda PR, PL and trp promoters (EP 0036776), the polyhedrin promoter, or the p10 protein promoter from baculovirus (Kit Novagen) (Smith et al., 1983; O'Reilly et al., 1992), or also the trc promoter.
[0224]Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs from retrovirus, and mouse metallothionein-I. Selection of a convenient vector and promoter is well within the level of ordinary skill in the art.
[0225]The choice of a promoter is well within the ability of a person skilled in the field of genetic engineering. For example, one may refer to Sambrook et al. (1989) or also to the procedures described by Fuller et al. (1996).
Other Regulatory Elements
[0226]Where a cDNA insert is employed, one will typically desire to include a polyadenylation signal to effect proper polyadenylation of the gene transcript. The nature of the polyadenylation signal is not believed to be crucial to the successful practice of the invention, and any such sequence may be employed such as human growth hormone and SV40 polyadenylation signals. Also contemplated as an element of the expression cassette is a terminator. These elements can serve to enhance message levels and to minimize read through from the cassette into other sequences.
3. Selectable Markers
[0227]Such markers would confer an identifiable change to the cell permitting easy identification of cells containing the expression construct. The selectable marker genes for selection of transformed host cells are preferably dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, TRP1 for S. cerevisiae or tetracycline, rifampicin or ampicillin resistance in E. coli, or levan saccharase for mycobacteria, this latter marker being a negative selection marker.
4. Preferred Vectors
Bacterial Vectors
[0228]As a representative but non-limiting example, useful expression vectors for bacterial use can comprise a selectable marker and a bacterial origin of replication derived from commercially available plasmids comprising genetic elements of pBR322 (ATCC 37017). Such commercial vectors include, for example, pKK223-3 (Pharmacia, Uppsala, Sweden), and GEM1 (Promega Biotec, Madison, Wis., USA).
[0229]Large numbers of other suitable vectors are known to those of skill in the art, and commercially available, such as the following bacterial vectors: pQE70, pQE60, pQE-9 (Qiagen), pbs, pD10, phagescript, psiX174, pbluescript SK, pbsks, pNH8A, pNH16A, pNH18A, pNH46A (Stratagene); ptrc99a, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia); pWLNEO, pSV2CAT, pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV, pMSG, pSVL (Pharmacia); pQE-30 (QIAexpress).
Bacteriophage Vectors
[0230]The P1 bacteriophage vector may contain large inserts ranging from about 80 to about 100 kb.
[0231]The construction of P1 bacteriophage vectors such as p158 or p158/neo8 is notably described by Sternberg (1992, 1994). Recombinant P1 clones comprising GlyT1 nucleotide sequences may be designed for inserting large polynucleotides of more than 40 kb (Linton et al., 1993). To generate P1 DNA for transgenic experiments, a preferred protocol is the protocol described by McCormick et al. (1994). Briefly, E. coli (preferably strain NS3529) harboring the P1 plasmid are grown overnight in a suitable broth medium containing 25 μg/ml of kanamycin. The P1 DNA is prepared from the E. coli by alkaline lysis using the Qiagen Plasmid Maxi kit (Qiagen, Chatsworth, Calif., USA), according to the manufacturer's instructions. The P1 DNA is purified from the bacterial lysate on two Qiagen-tip 500 columns, using the washing and elution buffers contained in the kit. A phenol/chloroform extraction is then performed before precipitating the DNA with 70% ethanol. After solubilizing the DNA in TE (10 mM Tris-HCl, pH 7.4, 1 mM EDTA), the concentration of the DNA is assessed by spectrophotometry.
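The final spectrophotometric step can be sketched as a one-line conversion. The sketch assumes the widely used approximation that an absorbance of 1.0 at 260 nm (1 cm path length) corresponds to about 50 μg/ml of double-stranded DNA; the specification does not state which conversion is used, and the function names are hypothetical.

```python
# Illustrative sketch only; the 50 ug/ml per A260 unit factor is an assumed convention.
DSDNA_UG_PER_ML_PER_A260 = 50.0


def dsdna_concentration_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate the dsDNA concentration of the undiluted sample from an A260 reading."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def total_yield_ug(concentration_ug_per_ml: float, volume_ml: float) -> float:
    """Total DNA recovered, given the concentration and the volume of TE used."""
    return concentration_ug_per_ml * volume_ml


if __name__ == "__main__":
    conc = dsdna_concentration_ug_per_ml(a260=0.25, dilution_factor=20)
    print(conc, total_yield_ug(conc, volume_ml=0.5))
```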
[0232]When the goal is to express a P1 clone comprising GlyT1 nucleotide sequences in a transgenic animal, typically in transgenic mice, it is desirable to remove vector sequences from the P1 DNA fragment, for example by cleaving the P1 DNA at rare-cutting sites within the P1 polylinker (SfiI, NotI or SalI). The P1 insert is then purified from vector sequences on a pulsed-field agarose gel, using methods similar to those originally reported for the isolation of DNA from YACs (Schedl et al., 1993a; Peterson et al., 1993). At this stage, the resulting purified insert DNA can be concentrated, if necessary, on a Millipore Ultrafree-MC Filter Unit (Millipore, Bedford, Mass., USA; 30,000 molecular weight limit) and then dialyzed against microinjection buffer (10 mM Tris-HCl, pH 7.4; 250 μM EDTA) containing 100 mM NaCl, 30 μM spermine, 70 μM spermidine on a microdialysis membrane (type VS, 0.025 μm from Millipore). The intactness of the purified P1 DNA insert is assessed by electrophoresis on 1% agarose (Sea Kem GTG; FMC Bio-products) pulsed-field gel and staining with ethidium bromide.
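The microinjection buffer recited above is defined by five final concentrations, so the stock volumes needed for a given batch follow from C1·V1 = C2·V2. The sketch below is illustrative only: the final concentrations are those stated in the paragraph, whereas every stock concentration and the function name are assumptions chosen for the example.

```python
# Illustrative sketch only; all stock concentrations below are assumed values.
FINAL_MM = {  # final concentrations from the paragraph above, in mM
    "Tris-HCl pH 7.4": 10.0,
    "EDTA": 0.25,
    "NaCl": 100.0,
    "spermine": 0.03,
    "spermidine": 0.07,
}
STOCK_MM = {  # hypothetical stock solutions, in mM
    "Tris-HCl pH 7.4": 1000.0,
    "EDTA": 500.0,
    "NaCl": 5000.0,
    "spermine": 100.0,
    "spermidine": 100.0,
}


def stock_volumes_ul(total_ml: float) -> dict:
    """Volume of each stock (in microliters) needed for total_ml of buffer."""
    total_ul = total_ml * 1000.0
    return {name: FINAL_MM[name] / STOCK_MM[name] * total_ul for name in FINAL_MM}


if __name__ == "__main__":
    for name, volume in stock_volumes_ul(10.0).items():
        print(f"{name}: {volume:.1f} ul")
```

The remainder of the batch volume would be made up with water; with different stocks, only the `STOCK_MM` table changes.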
Baculovirus Vectors
[0233]A suitable vector for the expression of a GlyT1 polypeptide of SEQ ID Nos:26-33 or fragments or variants thereof is a baculovirus vector that can be propagated in insect cells and in insect cell lines. A specific suitable host vector system is the pVL1392/1393 baculovirus transfer vector (Pharmingen) that is used to transfect the SF9 cell line (ATCC No. CRL 1711) which is derived from Spodoptera frugiperda.
[0234]Other suitable vectors for the expression of the GlyT1 polypeptide of SEQ ID Nos:26-33 or fragments or variants thereof in a baculovirus expression system include those described by Chai et al. (1993), Vlasak et al. (1983) and Lenhard et al. (1996).
Viral Vectors
[0235]In one specific embodiment, the vector is derived from an adenovirus. Preferred adenovirus vectors according to the invention are those described by Feldman and Steg (1996) or Ohno et al. (1994). Another preferred recombinant adenovirus according to this specific embodiment of the present invention is the human adenovirus type 2 or 5 (Ad 2 or Ad 5) or an adenovirus of animal origin (French patent application No FR-93.05954).
[0236]Retrovirus vectors and adeno-associated virus vectors are generally understood to be the recombinant gene delivery systems of choice for the transfer of exogenous polynucleotides in vivo, particularly to mammals, including humans. These vectors provide efficient delivery of genes into cells, and the transferred nucleic acids are stably integrated into the chromosomal DNA of the host.
[0237]Particularly preferred retroviruses for the preparation or construction of retroviral in vitro or in vivo gene delivery vehicles of the present invention include retroviruses selected from the group consisting of Mink-Cell Focus Inducing Virus, Murine Sarcoma Virus, Reticuloendotheliosis virus and Rous Sarcoma virus. Particularly preferred Murine Leukemia Viruses include the 4070A and the 1504A viruses, Abelson (ATCC No VR-999), Friend (ATCC No VR-245), Gross (ATCC No VR-590), Rauscher (ATCC No VR-998) and Moloney Murine Leukemia Virus (ATCC No VR-190; PCT Application No WO 94/24298). Particularly preferred Rous Sarcoma Viruses include Bryan high titer (ATCC Nos VR-334, VR-657, VR-726, VR-659 and VR-728). Other preferred retroviral vectors are those described in Roth et al. (1996), PCT Application No WO 93/25234, PCT Application No WO 94/06920, Roux et al. (1989), Julan et al. (1992) and Neda et al. (1991).
[0238]Yet another viral vector system that is contemplated by the invention comprises the adeno-associated virus (AAV). The adeno-associated virus is a naturally occurring defective virus that requires another virus, such as an adenovirus or a herpes virus, as a helper virus for efficient replication and a productive life cycle (Muzyczka et al., 1992). It is also one of the few viruses that may integrate its DNA into non-dividing cells, and exhibits a high frequency of stable integration (Flotte et al., 1992; Samulski et al., 1989; McLaughlin et al., 1989). One advantageous feature of AAV derives from its reduced efficacy for transducing primary cells relative to transformed cells.
BAC Vectors
[0239]The bacterial artificial chromosome (BAC) cloning system (Shizuya et al., 1992) has been developed to stably maintain large fragments of genomic DNA (100-300 kb) in E. coli. A preferred BAC vector comprises a pBeloBAC11 vector that has been described by Kim et al. (1996). BAC libraries are prepared with this vector using size-selected genomic DNA that has been partially digested using enzymes that permit ligation into either the BamHI or HindIII sites in the vector. Flanking these cloning sites are T7 and SP6 RNA polymerase transcription initiation sites that can be used to generate end probes by either RNA transcription or PCR methods. After the construction of a BAC library in E. coli, BAC DNA is purified from the host cell as a supercoiled circle. Converting these circular molecules into a linear form precedes both size determination and introduction of the BACs into recipient cells. The cloning site is flanked by two NotI sites, permitting cloned segments to be excised from the vector by NotI digestion. Alternatively, the DNA insert contained in the pBeloBAC11 vector may be linearized by treatment of the BAC vector with the commercially available enzyme lambda terminase that leads to the cleavage at the unique cosN site, but this cleavage method results in a full length BAC clone containing both the insert DNA and the BAC sequences.
5. Delivery of the Recombinant Vectors
[0240]In order to effect expression of the polynucleotides and polynucleotide constructs of the invention, these constructs must be delivered into a cell (or cell extract capable of supporting protein expression). This delivery may be accomplished in vitro, as in laboratory procedures for transforming cell lines, or in vivo or ex vivo, as in the treatment of certain disease states.
[0241]One mechanism is viral infection where the expression construct is encapsulated in an infectious viral particle.
[0242]Several non-viral methods for the transfer of polynucleotides into cultured mammalian cells are also contemplated by the present invention, and include, without being limited to, calcium phosphate precipitation (Graham et al., 1973; Chen et al., 1987), DEAE-dextran (Gopal, 1985), electroporation (Tur-Kaspa et al., 1986; Potter et al., 1984), direct microinjection (Harland et al., 1985), DNA-loaded liposomes (Nicolau et al., 1982; Fraley et al., 1979), and receptor-mediated transfection (Wu and Wu, 1987; 1988). Some of these techniques may be successfully adapted for in vivo or ex vivo use.
[0243]Once the expression polynucleotide has been delivered into the cell, it may be stably integrated into the genome of the recipient cell. This integration may be in the cognate location and orientation via homologous recombination (gene replacement) or it may be integrated in a random, non specific location (gene augmentation). In yet further embodiments, the nucleic acid may be stably maintained in the cell as a separate, episomal segment of DNA. Such nucleic acid segments or "episomes" encode sequences sufficient to permit maintenance and replication independent of or in synchronization with the host cell cycle.
[0244]One specific embodiment for a method for delivering a protein or peptide to the interior of a cell of a vertebrate in vivo comprises the step of introducing a preparation comprising a physiologically acceptable carrier and a naked polynucleotide operatively coding for the polypeptide of interest into the interstitial space of a tissue comprising the cell, whereby the naked polynucleotide is taken up into the interior of the cell and has a physiological effect. This is particularly applicable for transfer in vitro but it may be applied in vivo as well.
[0245]Compositions for use in vitro and in vivo comprising a "naked" polynucleotide are described in PCT application No. WO 90/11092 (Vical Inc.) and also in PCT application No. WO 95/11307 (Institut Pasteur, INSERM, Universite d'Ottawa) as well as in the articles of Tacson et al. (1996) and of Huygen et al. (1996).
[0246]In still another embodiment of the invention, the transfer of a naked polynucleotide of the invention, including a polynucleotide construct of the invention, into cells may be carried out by particle bombardment (biolistics), said particles being DNA-coated microprojectiles accelerated to a high velocity allowing them to pierce cell membranes and enter cells without killing them, such as described by Klein et al. (1987).
[0247]In a further embodiment, the polynucleotide of the invention may be entrapped in a liposome using any of a wide variety of standard methods (see, e.g., Ghosh and Bacchawat, 1991; Wong et al., 1980; Nicolau et al., 1987).
[0248]In a specific embodiment, the invention provides a composition for the in vivo production of the GlyT1 protein or polypeptide described herein. It comprises a naked polynucleotide operatively coding for this polypeptide, in solution in a physiologically acceptable carrier, and suitable for introduction into a tissue to cause cells of the tissue to express the said protein or polypeptide.
[0249]The amount of vector to be injected into the desired host organism varies according to the site of injection. As an indicative dose, between 0.1 and 100 μg of the vector will be injected into an animal body, preferably a mammal body, for example a mouse body.
[0250]In another embodiment, the vector according to the invention may be introduced in vitro in a host cell, preferably in a host cell previously harvested from the animal to be treated and more preferably a somatic cell such as a muscle cell. In a subsequent step, the cell that has been transformed with the vector coding for the desired GlyT1 polypeptide or the desired fragment thereof is reintroduced into the animal body in order to deliver the recombinant protein within the body either locally or systemically.
Cell Hosts
[0251]Another object of the invention comprises a host cell that has been transformed or transfected with one of the polynucleotides described herein, and in particular a polynucleotide either comprising a GlyT1 regulatory polynucleotide or the coding sequence of the GlyT1 polypeptide selected from the group consisting of SEQ ID Nos:2-9 and 14-21, or a fragment or a variant thereof. Also included are host cells that are transformed (prokaryotic cells) or that are transfected (eukaryotic cells) with a recombinant vector such as one of those described above. More particularly, the cell hosts of the present invention can comprise any of the polynucleotides described in the "Genomic Sequences Of The GlyT1 Gene" section, the "GlyT1 cDNA Sequences" section, the "Coding Regions" section, the "Polynucleotide constructs" section, and the "Oligonucleotide Probes And Primers" section.
[0252]An additional recombinant cell host according to the invention comprises any of the vectors described herein, more particularly any of the vectors described in the "Recombinant Vectors" section.
[0253]Preferred host cells used as recipients for the expression vectors of the invention are the following:
[0254]a) Prokaryotic host cells: Escherichia coli strains (e.g., DH5-α strain), Bacillus subtilis, Salmonella typhimurium, and strains from species like Pseudomonas, Streptomyces and Staphylococcus.
[0255]b) Eukaryotic host cells: HeLa cells (ATCC No. CCL2; No. CCL2.1; No. CCL2.2), CV-1 cells (ATCC No. CCL70), COS cells (ATCC No. CRL1650; No. CRL1651), Sf-9 cells (ATCC No. CRL1711), C127 cells (ATCC No. CRL-1804), 3T3 (ATCC No. CRL-6361), CHO (ATCC No. CCL-61), human kidney 293 (ATCC No. 45504; No. CRL-1573) and BHK (ECACC No. 84100501; No. 84111301).
[0256]c) Other mammalian host cells.
[0257]GlyT1 gene expression in mammalian, and typically human, cells may be inhibited or enhanced by inserting a GlyT1 genomic or cDNA sequence so that it replaces the GlyT1 gene counterpart in the genome of an animal cell with a GlyT1 polynucleotide according to the invention. These genetic alterations may be generated by homologous recombination events using specific DNA constructs that have been previously described.
[0258]One kind of cell host that may be used is a mammalian zygote, such as a murine zygote. For example, murine zygotes may undergo microinjection with a purified DNA molecule of interest, for example a purified DNA molecule that has previously been adjusted to a concentration ranging from 1 ng/ml (for BAC inserts) to 3 ng/μl (for P1 bacteriophage inserts) in 10 mM Tris-HCl, pH 7.4, 250 μM EDTA containing 100 mM NaCl, 30 μM spermine, and 70 μM spermidine. When the DNA to be microinjected is large, polyamines and high salt concentrations can be used in order to avoid mechanical breakage of this DNA, as described by Schedl et al. (1993b).
[0259]Any one of the polynucleotides of the invention, including the DNA constructs described herein, may be introduced in an embryonic stem (ES) cell line, preferably a mouse ES cell line. ES cell lines are derived from pluripotent, uncommitted cells of the inner cell mass of pre-implantation blastocysts. Preferred ES cell lines are the following: ES-E14TG2a (ATCC no CRL-1821), ES-D3 (ATCC no CRL1934 and no CRL-11632), YS001 (ATCC no CRL-11776), 36.5 (ATCC no CRL-11116). To maintain ES cells in an uncommitted state, they are cultured in the presence of growth inhibited feeder cells which provide the appropriate signals to preserve this embryonic phenotype and serve as a matrix for ES cell adherence. Preferred feeder cells are primary embryonic fibroblasts that are established from tissue of day 13-day 14 embryos of virtually any mouse strain, that are maintained in culture, such as described by Abbondanzo et al. (1993) and are inhibited in growth by irradiation, such as described by Robertson (1987), or by the presence of an inhibitory concentration of LIF, such as described by Pease and Williams (1990).
[0260]The constructs in the host cells can be used in a conventional manner to produce the gene product encoded by the recombinant sequence.
[0261]Following transformation of a suitable host and growth of the host to an appropriate cell density, the selected promoter is induced by appropriate means, such as temperature shift or chemical induction, and cells are cultivated for an additional period.
[0262]Cells are typically harvested by centrifugation, disrupted by physical or chemical means, and the resulting crude extract retained for further purification.
[0263]Microbial cells employed in the expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents. Such methods are well known to the skilled artisan.
Transgenic Animals
[0264]The terms "transgenic animals" or "host animals" are used herein to designate animals that have their genome genetically and artificially manipulated so as to include one of the nucleic acids according to the invention. Preferred animals are non-human mammals and include those belonging to a genus selected from Mus (e.g. mice), Rattus (e.g. rats) and Oryctolagus (e.g. rabbits) which have their genome artificially and genetically altered by the insertion of a nucleic acid according to the invention. In one embodiment, the invention encompasses non-human host mammals and animals comprising a recombinant vector of the invention.
[0265]The transgenic animals of the invention all include within a plurality of their cells a cloned recombinant or synthetic DNA sequence, more specifically one of the purified or isolated nucleic acids comprising a GlyT1 coding sequence, a GlyT1 regulatory polynucleotide, a polynucleotide construct, or a DNA sequence encoding an antisense polynucleotide such as described in the present specification.
[0266]Generally, a transgenic animal according to the present invention comprises any one of the polynucleotides, the recombinant vectors and the cell hosts described in the present invention. More particularly, the transgenic animals of the present invention can comprise any of the polynucleotides described in the "Genomic Sequences Of the GlyT1 Gene" section, the "GlyT1 cDNA Sequences" section, the "Coding Regions" section, the "Polynucleotide constructs" section, the "Oligonucleotide Probes And Primers" section, the "Recombinant Vectors" section and the "Cell Hosts" section.
[0267]In a first preferred embodiment, these transgenic animals may be good experimental models in order to study the effects of GlyT1 activity, e.g. to study psychological disorders such as schizophrenia or other psychotic disorders. In one such embodiment, transgenic animals are produced in which one or several copies of a polynucleotide encoding any of the present novel GlyT1 proteins has been inserted into the genome.
[0268]In a second preferred embodiment, these transgenic animals may express a desired polypeptide of interest under the control of the regulatory polynucleotides of the GlyT1 gene, leading to good yields in the synthesis of this protein of interest, and eventually a tissue specific expression of this protein of interest.
[0269]The transgenic animals of the invention may be designed according to conventional techniques well known to one skilled in the art. For more details regarding the production of transgenic animals, and specifically transgenic mice, reference may be made to U.S. Pat. No. 4,873,191; 5,464,764; or 5,789,215; these documents being herein incorporated by reference to disclose methods of producing transgenic mice.
[0270]Transgenic animals of the present invention are produced by the application of procedures which result in an animal with a genome that has incorporated exogenous genetic material. The procedure involves obtaining the genetic material, or a portion thereof, which encodes either a GlyT1 coding sequence, a GlyT1 regulatory polynucleotide or a DNA sequence encoding a GlyT1 antisense polynucleotide such as described in the present specification.
[0271]A recombinant polynucleotide of the invention is inserted into an embryonic stem (ES) cell line. The insertion is preferably made using electroporation, such as described by Thomas et al. (1987). The cells subjected to electroporation are screened (e.g. by selection via selectable markers, by PCR or by Southern blot analysis) to find positive cells which have integrated the exogenous recombinant polynucleotide into their genome, preferably via a homologous recombination event. An illustrative positive-negative selection procedure that may be used according to the invention is described by Mansour et al. (1988).
[0272]Then, the positive cells are isolated, cloned and injected into 3.5-day-old blastocysts from mice, such as described by Bradley (1987). The blastocysts are then inserted into a female host animal and allowed to grow to term.
[0273]Alternatively, the positive ES cells are brought into contact with embryos at the 2.5 days old 8-16 cell stage (morulae) such as described by Wood et al. (1993) or by Nagy et al. (1993), the ES cells being internalized to colonize extensively the blastocyst including the cells which will give rise to the germ line.
[0274]The offspring of the female host are tested to determine which animals are transgenic, i.e. include the inserted exogenous DNA sequence, and which are wild-type.
[0275]Thus, the present invention also concerns a transgenic animal containing a nucleic acid, a recombinant expression vector or a recombinant host cell according to the invention.
Recombinant Cell Lines Derived from the Transgenic Animals of the Invention.
[0276]A further object of the invention comprises recombinant host cells obtained from a transgenic animal described herein. In one embodiment the invention encompasses cells derived from non-human host mammals and animals comprising a recombinant vector of the invention or expressing any of the present novel GlyT1 polypeptides.
[0277]Recombinant cell lines may be established in vitro from cells obtained from any tissue of a transgenic animal according to the invention, for example by transfection of primary cell cultures with vectors expressing onc-genes such as SV40 large T antigen, as described by Chou (1989) and Shay et al. (1991).
[0278]Methods for Screening Substances Interacting with a GlyT1 Polypeptide
[0279]For the purpose of the present invention, a ligand means a molecule, such as a protein, a peptide, an antibody or any synthetic chemical compound, capable of binding to a GlyT1 protein or one of its fragments or variants, or of modulating the expression of the polynucleotide coding for GlyT1 or a fragment or variant thereof.
[0280]In the ligand screening method according to the present invention, a biological sample or a defined molecule to be tested as a putative ligand of a GlyT1 protein is brought into contact with the corresponding purified GlyT1 protein, for example the corresponding purified recombinant GlyT1 protein produced by a recombinant cell host as described hereinbefore, in order to form a complex between this protein and the putative ligand molecule to be tested. In any of the herein-described assays, the GlyT1 may be present in a cell or cell membrane during the assay.
[0281]As an illustrative example, to study the interaction of any of the present novel GlyT1 proteins, or a fragment comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID NOs:26-33, or encoded by any of SEQ ID NOs:2-9 or 14-21, with drugs or small molecules, such as molecules generated through combinatorial chemistry approaches, the microdialysis coupled to HPLC method described by Wang et al. (1997) or the affinity capillary electrophoresis method described by Bush et al. (1997), the disclosures of which are incorporated by reference, can be used.
[0282]In further methods, peptides, drugs, fatty acids, lipoproteins, or small molecules which interact with the GlyT1 protein, or a fragment comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID Nos:26-33, or encoded by any of SEQ ID NOs:2-9 or 14-21, may be identified using assays such as the following. The molecule to be tested for binding is labeled with a detectable label, such as a fluorescent, radioactive, or enzymatic tag, and placed in contact with immobilized GlyT1 protein, or a fragment thereof, under conditions which permit specific binding to occur. After removal of non-specifically bound molecules, bound molecules are detected using appropriate means.
[0283]Another object of the present invention comprises methods and kits for the screening of candidate substances that interact with a GlyT1 polypeptide.
[0284]The present invention pertains to methods for screening substances of interest that interact with a GlyT1 protein or one fragment or variant thereof. By their capacity to bind covalently or non-covalently to a GlyT1 protein or to a fragment or variant thereof, these substances or molecules may be advantageously used both in vitro and in vivo.
[0285]In vitro, said interacting molecules may be used as detection means in order to identify the presence of a GlyT1 protein in a sample, preferably a biological sample.
[0286]A method for the screening of a candidate substance comprises the following steps:
[0287]a) providing a polypeptide comprising, consisting essentially of, or consisting of a GlyT1 protein or a fragment comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of any of SEQ ID Nos:26-33, or encoded by any of SEQ ID NOs:2-9 or 14-21;
[0288]b) obtaining a candidate substance;
[0289]c) bringing into contact said polypeptide with said candidate substance;
[0290]d) detecting the complexes formed between said polypeptide and said candidate substance.
[0291]The invention further concerns a kit for the screening of a candidate substance interacting with the GlyT1 polypeptide, wherein said kit comprises:
[0292]a) a GlyT1 protein having an amino acid sequence selected from the group consisting of any of the amino acid sequences of SEQ ID Nos:26-33, or a peptide fragment comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of any of SEQ ID Nos:26-33, or an amino acid sequence encoded by any of SEQ ID NOs:2-9 or 14-21;
[0293]b) optionally means useful to detect the complex formed between the GlyT1 protein or a peptide fragment or a variant thereof and the candidate substance.
[0294]In a preferred embodiment of the kit described above, the detection means comprise monoclonal or polyclonal antibodies directed against the GlyT1 protein or a peptide fragment or a variant thereof.
[0295]Various candidate substances or molecules can be assayed for interaction with a GlyT1 polypeptide. These substances or molecules include, without being limited to, natural or synthetic organic compounds or molecules of biological origin such as polypeptides. When the candidate substance or molecule comprises a polypeptide, this polypeptide may be the resulting expression product of a phage clone belonging to a phage-based random peptide library, or alternatively the polypeptide may be the resulting expression product of a cDNA library cloned in a vector suitable for performing a two-hybrid screening assay.
[0296]The invention also pertains to kits useful for performing the herein-described screening methods. Preferably, such kits comprise a GlyT1 polypeptide or a fragment or a variant thereof, and optionally means useful to detect the complex formed between the GlyT1 polypeptide or its fragment or variant and the candidate substance. In a preferred embodiment the detection means comprise a monoclonal or polyclonal antibody directed against the corresponding GlyT1 polypeptide or a fragment or a variant thereof.
A. Candidate Ligands Obtained from Random Peptide Libraries
[0297]In a particular embodiment of the screening method, the putative ligand is the expression product of a DNA insert contained in a phage vector (Parmley and Smith, 1988). Specifically, random peptide phage libraries are used. The random DNA inserts encode peptides of 8 to 20 amino acids in length (Oldenburg K. R. et al., 1992; Valadon P., et al., 1996; Lucas A. H., 1994; Westerink M. A. J., 1995; Felici F. et al., 1991). According to this particular embodiment, the recombinant phage expressing a protein that binds to the immobilized GlyT1 protein is retained and the complex formed between the GlyT1 protein and the recombinant phage may be subsequently immunoprecipitated by a polyclonal or a monoclonal antibody directed against the GlyT1 protein.
[0298]Once the ligand library in recombinant phages has been constructed, the phage population is brought into contact with the immobilized GlyT1 protein. Then the preparation of complexes is washed in order to remove the non-specifically bound recombinant phages. The phages that bind specifically to the GlyT1 protein are then eluted by a buffer (acid pH) or immunoprecipitated by the monoclonal antibody produced by the hybridoma anti-GlyT1, and this phage population is subsequently amplified by an over-infection of bacteria (for example E. coli). The selection step may be repeated several times, preferably 2-4 times, in order to select the more specific recombinant phage clones. The last step comprises characterizing the peptide produced by the selected recombinant phage clones either by expression in infected bacteria and isolation, expressing the phage insert in another host-vector system, or sequencing the insert contained in the selected recombinant phages.
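As a purely illustrative aid to the repeated selection step described above, the following Python sketch shows why a few rounds of panning can suffice. It assumes a simple two-population model that is not part of the present disclosure: each round recovers a fixed fraction of specific binders and a smaller fixed fraction of non-specific phages, and amplification preserves their proportions. The function name and all numerical values are hypothetical.

```python
def enrich(fraction_specific, recovery_specific, recovery_background, rounds):
    """Return the fraction of specific phages before and after each panning round.

    Assumed model (hypothetical): every round retains `recovery_specific` of the
    specific binders and `recovery_background` of all other phages; amplification
    in bacteria does not change the ratio between the two populations.
    """
    history = [fraction_specific]
    f = fraction_specific
    for _ in range(rounds):
        kept_specific = f * recovery_specific
        kept_background = (1.0 - f) * recovery_background
        f = kept_specific / (kept_specific + kept_background)
        history.append(f)
    return history


if __name__ == "__main__":
    # Hypothetical starting library: one specific binder per million phages.
    for round_number, f in enumerate(enrich(1e-6, 0.1, 1e-4, 4)):
        print(f"round {round_number}: specific fraction = {f:.6f}")
```

Under these assumed recoveries each round multiplies the odds of a specific binder by the ratio of the two recovery values, which is consistent with a preference for 2-4 rounds; real libraries may behave differently.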
B. Candidate Ligands Obtained by Competition Experiments
[0299]Alternatively, peptides, drugs or small molecules which bind to the GlyT1 protein, or a fragment comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of any of SEQ ID NOs:26-33, or encoded by any of SEQ ID NOs:2-9 or 14-21, may be identified in competition experiments. In such assays, the GlyT1 protein, or a fragment thereof, is immobilized to a surface, such as a plastic plate. Increasing amounts of the peptides, drugs or small molecules are placed in contact with the immobilized GlyT1 protein, or a fragment thereof, in the presence of a detectably labeled known GlyT1 protein ligand. For example, the GlyT1 ligand may be detectably labeled with a fluorescent, radioactive, or enzymatic tag. The ability of the test molecule to bind the GlyT1 protein, or a fragment thereof, is determined by measuring the amount of detectably labeled known ligand bound in the presence of the test molecule. A decrease in the amount of known ligand bound to the GlyT1 protein, or a fragment thereof, when the test molecule is present indicates that the test molecule is able to bind to the GlyT1 protein, or a fragment thereof.
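By way of illustration only, the expected decrease in bound labeled ligand can be sketched in Python under the standard one-site competitive binding model. This model, the function names, and every concentration below are assumptions introduced for the example and are not taken from the present specification; real assays may deviate, for example through ligand depletion or multiple binding sites.

```python
def fraction_bound(labeled_conc, labeled_kd, competitor_conc, competitor_ki):
    """Fractional occupancy of the immobilized protein by the labeled ligand.

    One-site competitive model: the competitor raises the apparent Kd of the
    labeled ligand by the factor (1 + [competitor] / Ki).
    """
    apparent_kd = labeled_kd * (1.0 + competitor_conc / competitor_ki)
    return labeled_conc / (labeled_conc + apparent_kd)


def ic50_from_ki(competitor_ki, labeled_conc, labeled_kd):
    """Cheng-Prusoff relation: IC50 = Ki * (1 + [labeled ligand] / Kd)."""
    return competitor_ki * (1.0 + labeled_conc / labeled_kd)


if __name__ == "__main__":
    # Hypothetical values in mol/L: labeled ligand at its Kd, test molecule Ki = 100 nM.
    labeled, kd, ki = 10e-9, 10e-9, 100e-9
    baseline = fraction_bound(labeled, kd, 0.0, ki)
    for competitor in (0.0, 1e-8, 1e-7, 1e-6, 1e-5):
        relative = fraction_bound(labeled, kd, competitor, ki) / baseline
        print(f"[test molecule] = {competitor:.0e} M -> {100 * relative:.1f}% of control")
    print(f"predicted IC50 = {ic50_from_ki(ki, labeled, kd):.1e} M")
```

Increasing amounts of the test molecule lower the signal from the labeled ligand, and the concentration giving half the control signal provides an estimate of the test molecule's affinity under the stated model.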
C. Candidate Ligands Obtained by Affinity Chromatography
[0300]Proteins or other molecules interacting with the GlyT1 protein, or a fragment comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID NOs:26-33, or encoded by any of SEQ ID NOs:2-9 or 14-21, can also be found using affinity columns which contain the GlyT1 protein, or a fragment thereof. The GlyT1 protein, or a fragment thereof, may be attached to the column using conventional techniques including chemical coupling to a suitable column matrix such as agarose, Affi Gel®, or other matrices familiar to those of skill in the art. In some embodiments of this method, the affinity column contains chimeric proteins in which the GlyT1 protein, or a fragment thereof, is fused to glutathione S-transferase (GST). A mixture of cellular proteins or pool of expressed proteins as described above is applied to the affinity column. Proteins or other molecules interacting with the GlyT1 protein, or a fragment thereof, attached to the column can then be isolated and analyzed on 2-D electrophoresis gel as described in Ramunsen et al. (1997), the disclosure of which is incorporated by reference. Alternatively, the proteins retained on the affinity column can be purified by electrophoresis-based methods and sequenced. The same method can be used to isolate antibodies, to screen phage display products, or to screen phage display human antibodies.
D. Candidate Ligands Obtained by Optical Biosensor Methods
[0301]Proteins interacting with the GlyT1 protein, or a fragment comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of any of SEQ ID Nos:26-33, or encoded by any of SEQ ID NOs:2-9 or 14-21, can also be screened by using an Optical Biosensor as described in Edwards and Leatherbarrow (1997) and also in Szabo et al. (1995), the disclosures of which are incorporated herein by reference. This technique permits the detection of interactions between molecules in real time, without the need for labeled molecules. This technique is based on the surface plasmon resonance (SPR) phenomenon. Briefly, the candidate ligand molecule to be tested is attached to a surface (such as a carboxymethyl dextran matrix). A light beam is directed towards the side of the surface that does not contain the sample to be tested and is reflected by said surface. The SPR phenomenon causes a decrease in the intensity of the reflected light at a specific combination of angle and wavelength. The binding of candidate ligand molecules causes a change in the refractive index at the surface, which change is detected as a change in the SPR signal. For screening of candidate ligand molecules or substances that are able to interact with the GlyT1 protein, or a fragment thereof, the GlyT1 protein, or a fragment thereof, is immobilized onto a surface. This surface comprises one side of a cell through which flows the candidate molecule to be assayed. The binding of the candidate molecule to the GlyT1 protein, or a fragment thereof, is detected as a change of the SPR signal. The candidate molecules tested may be proteins, peptides, carbohydrates, lipids, or small molecules generated by combinatorial chemistry. This technique may also be performed by immobilizing eukaryotic or prokaryotic cells or lipid vesicles exhibiting an endogenous or a recombinantly expressed GlyT1 protein at their surface.
[0302]The main advantage of the method is that it allows the determination of the association rate between the GlyT1 protein and molecules interacting with the GlyT1 protein. It is thus possible to select specifically ligand molecules interacting with the GlyT1 protein, or a fragment thereof, through strong or conversely weak association constants.
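For illustration, the way association and dissociation rates relate to a biosensor signal can be sketched in Python using the common 1:1 (Langmuir) interaction model. That model is an assumption made for this example and is not prescribed by the present specification; the function names, rate constants, analyte concentration and response units are all hypothetical.

```python
import math


def equilibrium_dissociation_constant(ka, kd):
    """KD = kd / ka for a 1:1 interaction (ka in 1/(M*s), kd in 1/s)."""
    return kd / ka


def association_response(t, conc, ka, kd, rmax):
    """SPR response during analyte injection, assuming 1:1 binding.

    R(t) = Req * (1 - exp(-(ka*C + kd) * t)), with Req = Rmax * ka*C / (ka*C + kd).
    """
    k_obs = ka * conc + kd
    r_eq = rmax * ka * conc / k_obs
    return r_eq * (1.0 - math.exp(-k_obs * t))


def dissociation_response(t, r0, kd):
    """SPR response after the injection ends: R(t) = R0 * exp(-kd * t)."""
    return r0 * math.exp(-kd * t)


if __name__ == "__main__":
    # Hypothetical candidate: ka = 1e5 1/(M*s), kd = 1e-3 1/s, injected at 10 nM.
    ka, kd, conc, rmax = 1e5, 1e-3, 10e-9, 100.0
    print(f"KD = {equilibrium_dissociation_constant(ka, kd):.1e} M")
    for t in (0, 60, 300, 1800):
        print(f"t = {t:5d} s  response = {association_response(t, conc, ka, kd, rmax):6.2f} RU")
```

In practice such constants are obtained by fitting measured sensorgrams rather than by forward simulation; the sketch only shows how candidates with strong or weak binding would differ in signal shape under the stated model.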
E. Candidate Ligands Obtained Through a Two-Hybrid Screening Assay
[0303]The yeast two-hybrid system is designed to study protein-protein interactions in vivo (Fields and Song, 1989), and relies upon the fusion of a bait protein to the DNA binding domain of the yeast Gal4 protein. This technique is also described in U.S. Pat. No. 5,667,973 and U.S. Pat. No. 5,283,173 (Fields et al.) the technical teachings of both patents being herein incorporated by reference.
[0304]The general procedure of library screening by the two-hybrid assay may be performed as described by Harper et al. (1993) or as described by Cho et al. (1998) or also Fromont-Racine et al. (1997).
[0305]The bait protein or polypeptide comprises, consists essentially of, or consists of a GlyT1 polypeptide or a fragment comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of any of SEQ ID NOs:26-33, or encoded by any of SEQ ID NOs:2-9 or 14-21.
[0306]More precisely, the nucleotide sequence encoding the GlyT1 polypeptide or a fragment or variant thereof is fused to a polynucleotide encoding the DNA binding domain of the GAL4 protein, the fused nucleotide sequence being inserted into a suitable expression vector, for example pAS2 or pM3.
[0307]Then, a human cDNA library is constructed in a specially designed vector, such that the human cDNA insert is fused to a nucleotide sequence in the vector that encodes the transcriptional activation domain of the GAL4 protein. Preferably, the vector used is the pACT vector. The polypeptides encoded by the nucleotide inserts of the human cDNA library are termed "prey" polypeptides.
[0308]A third vector contains a detectable marker gene, such as beta galactosidase gene or CAT gene that is placed under the control of a regulation sequence that is responsive to the binding of a complete Gal4 protein containing both the transcriptional activation domain and the DNA binding domain. For example, the vector pG5EC may be used.
[0309]Two different yeast strains are also used. As an illustrative but non-limiting example, the two different yeast strains may be the following:
[0310]Y190, the genotype of which is (MATa, Leu2-3, 112 ura3-12, trp1-901, his3-Δ200, ade2-101, gal4Δ gal80Δ URA3 GAL-LacZ, LYS GAL-HIS3, cyh);
[0311]Y187, the genotype of which is (MATα gal4 gal80 his3 trp1-901 ade2-101 ura3-52 leu2-3, 112 URA3 GAL-lacZ met-), which is the opposite mating type of Y190.
[0312]Briefly, 20 μg of pAS2/GLYT1 and 20 μg of pACT-cDNA library are co-transformed into yeast strain Y190. The transformants are selected for growth on minimal media lacking histidine, leucine and tryptophan, but containing the histidine synthesis inhibitor 3-AT (50 mM). Positive colonies are screened for beta galactosidase by filter lift assay. The double positive colonies (His+, beta-gal+) are then grown on plates lacking histidine and leucine, but containing tryptophan and cycloheximide (10 mg/ml), to select for loss of pAS2/GLYT1 plasmids but retention of pACT-cDNA library plasmids. The resulting Y190 strains are mated with Y187 strains expressing GLYT1 or non-related control proteins, such as cyclophilin B, lamin, or SNF1, as Gal4 fusions as described by Harper et al. (1993) and by Bram et al. (Bram R J et al., 1993), and screened for beta galactosidase by filter lift assay. Yeast clones that are beta-gal-negative after mating with the control Gal4 fusions are considered false positives.
[0313]In another embodiment of the two-hybrid method according to the invention, interaction between GlyT1 or a fragment or variant thereof and cellular proteins may be assessed using the Matchmaker Two Hybrid System 2 (Catalog No. K1604-1, Clontech). As described in the manual accompanying the Matchmaker Two Hybrid System 2 (Catalog No. K1604-1, Clontech), the disclosure of which is incorporated herein by reference, nucleic acids encoding the GlyT1 protein or a portion thereof are inserted into an expression vector such that they are in frame with DNA encoding the DNA binding domain of the yeast transcriptional activator GAL4. A desired cDNA, preferably human cDNA, is inserted into a second expression vector such that it is in frame with DNA encoding the activation domain of GAL4. The two expression plasmids are transformed into yeast and the yeast are plated on selection medium which selects for expression of selectable markers on each of the expression vectors as well as GAL4 dependent expression of the HIS3 gene. Transformants capable of growing on medium lacking histidine are screened for GAL4 dependent lacZ expression. Those cells which are positive in both the histidine selection and the lacZ assay contain an interaction between GlyT1 and the protein or peptide encoded by the initially selected cDNA insert.
Methods for Identifying Modulators of GlyT1 Activity
[0314]Any of a large number of assays, agonists, and antagonists are known and can be used to assess the activity of any of the herein-described GlyT1 polypeptides. Assays include in vitro, ex vivo, and in vivo assays. For example, assays can be used in which the activity of the transporter is measured in cells (e.g. COS-7 cells, Xenopus oocytes, human embryonic kidney 293 cells) transfected with nucleic acids encoding the transporter, or using tissue homogenates or biological samples that contain cells naturally expressing the transporter (e.g. chondrocytes, placental choriocarcinoma cells, hippocampal pyramidal neurons), and using any of a large number of methods to assess transporter activity, including by detecting signal transduction molecule activity or levels, levels of transcription of genes responsive to transporter activity, etc. Compounds identified as modulators of any of the GlyT1 polypeptides, and compounds found to physically interact with a GlyT1 polypeptide, have a large number of uses, including for the treatment or prevention of a number of neurological and psychological disorders, e.g., disorders related to NMDA receptor signalling, such as schizophrenia.
[0315]The effect of a compound on GlyT1 activity can be assessed in any of a large number of ways, including, but not limited to, by examining glycine transport or uptake (e.g. using whole-cell patch-clamp recordings of hippocampal pyramidal neurons in vitro), synaptic transmission through vertebrate autonomic ganglia, postsynaptic nicotinic acetylcholine receptor (nAChR) activity, N-methyl-D-aspartate receptor (NMDAR) function, any animal model for assessing NMDA receptor activity, e.g. using behavioral assays, or any other assay for assessing glycine transport in cells or in animals.
[0316]Examples of suitable ligands for use in the present assays, including agonists and antagonists, inhibitors or activators, include, but are not limited to, sarcosine (GlyT1 inhibitor), alpha-methylaminoisobutyric acid (MeAIB) (inhibitor of glycine transport), glycine methyl ester, glycine ethyl ester, 2-amino-5-phosphonovaleric acid (inhibitor of glycine transport), 7-chloro-kynurenic acid (inhibitor of glycine transport), doxepin, amitriptyline, N[3-(4'-fluorophenyl)-3-(4'-phenylphenoxy)propyl]sarcosine (NFPS; inhibitor), nortriptyline, as well as any compound structurally related to any of these compounds, or any other compound that interacts with or modulates any of the presently described glycine transporters. Such compounds can either be used as positive or negative controls in the herein-described assays, or can be included in the assay, as the test compound is assessed for its ability to modulate the known effect of a ligand on the transporter. These compounds having known activity on GlyT1 transporters can also preferably be used as "lead compounds" to identify related compounds with potentially enhanced properties, e.g. in terms of activity or absence of side effects.
[0317]As described above, the ability of a compound to alter the binding of a known ligand (e.g. glycine) to any of the herein-described glycine transporters, in vitro, in vivo, or ex vivo, can also be used.
[0318]Methods of assaying glycine transporter activity, and glycine transporter interacting ligands, are described in, inter alia, Horiuchi et al. (2001) PNAS 98(4):1448-53; Tsen et al. (2000) Nat Neurosci 3(2):126-32; Evans et al. (1999) FEBS Lett 463(3):301-6; Barker et al. (1999) J Physiol 514 (Pt 3):795-808; Liu et al. (1994) Biochim Biophys Acta 1194(1):176-84; Kim et al. (1994) Mol Pharmacol 45(4):608-17; Liu et al. (1992) FEBS Lett 305(2):110-4; Bergeron et al. (1998) Proc Natl Acad Sci USA 95(26): 15730-4; Nunez et al. (2000) Br J Pharmacol 129(1):200-6; the entire disclosure of each of which is herein incorporated by reference.
Methods for Inhibiting the Expression of a GlyT1 cDNA
[0319]Other therapeutic compositions according to the present invention comprise advantageously an oligonucleotide fragment of the nucleic sequence of GlyT1 as an antisense tool to inhibit the expression of the corresponding GlyT1 cDNA.
[0320]Preferred methods using antisense polynucleotide according to the present invention are the procedures described by Sczakiel et al. (1995).
[0321]Preferably, the antisense tools are chosen among the polynucleotides (15-200 bp long) that are complementary to the 5' end of the GlyT1 mRNA of interest. In another embodiment, a combination of different antisense polynucleotides complementary to different parts of the desired targeted gene is used.
[0322]Preferred antisense polynucleotides according to the present invention are complementary to a sequence of the mRNAs of GlyT1 that contains either the translation initiation codon ATG or a splicing donor or acceptor site.
[0323]The antisense nucleic acids should have a length and melting temperature sufficient to permit formation of an intracellular duplex having sufficient stability to inhibit the expression of the GlyT1 mRNA in the duplex. Strategies for designing antisense nucleic acids suitable for use in gene therapy are disclosed in Green et al. (1986) and Izant and Weintraub (1984), the disclosures of which are incorporated herein by reference.
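As a non-limiting illustration of the length and melting temperature considerations above, the following Python sketch derives an antisense DNA oligonucleotide complementary to a window spanning an ATG codon and estimates its melting temperature with two widely used rules of thumb (the Wallace rule for oligonucleotides shorter than 14 nucleotides, and a GC-content formula for longer ones). These rules ignore salt, concentration and intracellular conditions, so the values are rough guides only. The example mRNA fragment is hypothetical and is not taken from any SEQ ID NO of the present specification; the function names are likewise introduced here for illustration.

```python
DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def antisense_dna(mrna_segment):
    """Return the DNA reverse complement (5'->3') of an mRNA or cDNA segment."""
    sense = mrna_segment.upper().replace("U", "T")
    return "".join(DNA_COMPLEMENT[base] for base in reversed(sense))


def approx_tm_celsius(oligo):
    """Rough melting temperature estimate for a DNA oligonucleotide.

    Below 14 nt: Wallace rule, Tm = 2*(A+T) + 4*(G+C).
    Otherwise:   Tm = 64.9 + 41*(G+C - 16.4) / N.
    """
    oligo = oligo.upper()
    gc = oligo.count("G") + oligo.count("C")
    at = oligo.count("A") + oligo.count("T")
    n = gc + at
    if n < 14:
        return float(2 * at + 4 * gc)
    return 64.9 + 41.0 * (gc - 16.4) / n


def antisense_over_start_codon(mrna, flank=9):
    """Antisense oligo covering the first ATG plus `flank` bases on each side.

    Note: the first ATG in the string is used, which is only an assumption
    about where translation initiates.
    """
    sense = mrna.upper().replace("U", "T")
    start = sense.find("ATG")
    if start < 0:
        raise ValueError("no ATG codon found in the supplied sequence")
    window = sense[max(0, start - flank): start + 3 + flank]
    return antisense_dna(window)


if __name__ == "__main__":
    hypothetical_mrna = "GGCGCCACCAUGGCUGCAGCUCAAGGACCC"  # illustrative only
    oligo = antisense_over_start_codon(hypothetical_mrna, flank=9)
    print(oligo, len(oligo), f"{approx_tm_celsius(oligo):.1f} C")
```

A longer flank gives a longer oligonucleotide and generally a higher estimated melting temperature, which is one simple way to explore the trade-off described above.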
[0324]In some strategies, antisense molecules are obtained by reversing the orientation of the GlyT1 coding region with respect to a promoter so as to transcribe the opposite strand from that which is normally transcribed in the cell. The antisense molecules may be transcribed using in vitro transcription systems such as those which employ T7 or SP6 polymerase to generate the transcript. Another approach involves transcription of GlyT1 antisense nucleic acids in vivo by operably linking DNA containing the antisense sequence to a promoter in a suitable expression vector.
[0325]Alternatively, suitable antisense strategies are those described by Rossi et al. (1991), in International Application Nos. WO 94/23026, WO 95/04141, WO 92/18522 and in European Patent Application No. EP 0 572 287 A2.
[0326]An alternative to the antisense technology that is used according to the present invention comprises using ribozymes that will bind to a target sequence via their complementary polynucleotide tail and that will cleave the corresponding RNA by hydrolyzing its target site (e.g., "hammerhead ribozymes"). Briefly, the simplified cycle of a hammerhead ribozyme comprises (1) sequence specific binding to the target RNA via complementary antisense sequences; (2) site-specific hydrolysis of the cleavable motif of the target strand; and (3) release of cleavage products, which gives rise to another catalytic cycle. Indeed, the use of long-chain antisense polynucleotides (at least 30 bases long) or ribozymes with long antisense arms is advantageous. A preferred delivery system for antisense ribozymes is achieved by covalently linking these antisense ribozymes to lipophilic groups or by using liposomes as a convenient vector. Preferred antisense ribozymes according to the present invention are prepared as described by Sczakiel et al. (1995), the specific preparation procedures referred to in said article being herein incorporated by reference.
Treatment of Neurological and Psychiatric Disorders
[0327]The present GlyT1 polypeptides, polynucleotides, and modulators thereof, can be used to treat or prevent any of a large number of diseases or conditions. For example, any disease, disorder, or condition associated with an elevated or reduced level of glycine or glycine transporter activity can be treated or prevented by modulating the activity or expression of any of the herein-described polypeptides.
[0328]In one, preferred embodiment, any of the present polypeptides, polynucleotides, or modulators is used to treat or prevent a condition associated with abnormal NMDA receptor activity.
[0329]NMDA receptors have been implicated in a large number of neurological and psychological functions, including memory and learning. For example, as decreased function of NMDA-mediated neurotransmission has been suggested to contribute to the symptoms of schizophrenia (Olney and Farber, Archives General Psychiatry 52: 998-1007 (1996)), agents that inhibit GlyT1 transporters (and thus increase glycine activation of NMDA receptors) can be used to treat schizophrenia or other psychotic conditions. Such inhibitors can also be used to treat dementia-associated disorders, as well as other conditions such as attention deficit disorders and organic brain syndromes. In addition, activators of the transporters (which cause decreased glycine-activation of NMDA receptors) can be used to treat neuronal death associated with stroke or head trauma, as well as neurodegenerative diseases such as Alzheimer's disease, multi-infarct dementia, AIDS dementia, Parkinson's disease, Huntington's disease, or amyotrophic lateral sclerosis.
Pharmaceutical and Physiologically Acceptable Compositions and Administration Thereof
[0330]To treat or prevent any of the herein-described disorders using any of the compounds described herein, the compounds may be prepared utilizing readily available starting materials and employing common synthetic methodologies well-known to those skilled in the art.
[0331]The effective dose of the compound can vary, depending upon factors such as the condition of the patient, the severity of the symptoms of the disorder, and the manner in which the pharmaceutical composition is administered. For human patients, the effective dose of typical compounds generally requires administering the compound in an amount of at least about 1, often at least about 10, and frequently at least about 25 mg/24 hr./patient. For human patients, the effective dose of typical compounds requires administering the compound in an amount which generally does not exceed about 500, often does not exceed about 400, and frequently does not exceed about 300 mg/24 hr./patient. In addition, administration of the effective dose is such that the concentration of the compound within the plasma of the patient normally does not exceed 500 ng/ml, and frequently does not exceed 100 ng/ml.
[0332]The compounds of the present invention can be administered to a patient at dosage levels in the range of about 0.1 to about 1,000 mg per day. For a normal human adult having a body weight of about 70 kilograms, a dosage in the range of about 0.01 to about 100 mg per kilogram of body weight per day is sufficient. The specific dosage used, however, can vary. For example, the dosage can depend on a number of factors including the requirements of the patient, the severity of the condition being treated, and the pharmacological activity of the compound being used. The determination of optimum dosages for a particular patient is well-known to those skilled in the art. One preferred dosage is about 10 mg to about 70 mg per day. In choosing a regimen for patients suffering from psychotic illness it may frequently be necessary to begin with a dosage of from about 30 to about 70 mg per day and, when the condition is under control, to reduce the dosage to as low as from about 1 to about 10 mg per day. The exact dosage will depend upon the mode of administration, the form in which it is administered, the subject to be treated and the body weight of the subject to be treated, and the preference and experience of the physician or veterinarian in charge.
[0333]Dosage levels of the order of from about 0.1 mg to about 140 mg per kilogram of body weight per day are useful in the treatment of the above-indicated conditions (about 0.5 mg to about 7 g per patient per day). The amount of active ingredient that may be combined with the carrier materials to produce a single dosage form will vary depending upon the host treated and the particular mode of administration. Dosage unit forms will generally contain from about 1 mg to about 500 mg of an active ingredient.
[0334]It will be understood, however, that the specific dose level for any particular patient will depend upon a variety of factors including the activity of the specific compound employed, the age, body weight, general health, sex, diet, time of administration, route of administration, and rate of excretion, drug combination and the severity of the particular disease undergoing therapy.
[0335]Preferred compounds useful according to the method of the present invention have the ability to pass across the blood-brain barrier of the patient. As such, such compounds have the ability to enter the central nervous system of the patient. The log P values of typical compounds useful in carrying out the present invention generally are greater than 0, often are greater than about 1, and frequently are greater than about 1.5. The log P values of such typical compounds generally are less than about 4, often are less than about 3.5, and frequently are less than about 3. Log P values provide a measure of the ability of a compound to pass across a diffusion barrier, such as a biological membrane. See, Hansch, et al., J. Med. Chem., Vol. 11, p. 1 (1968). Alternatively, the compositions of the present invention can bypass the blood brain barrier through the use of compositions and methods known in the art for bypassing the blood brain barrier (e.g., U.S. Pat. Nos. 5,686,416; 5,994,392, incorporated by reference in their entireties) or can be injected directly into the brain. Suitable areas include the cerebral cortex, cerebellum, midbrain, brainstem, hypothalamus, spinal cord and ventricular tissue, and areas of the PNS including the carotid body and the adrenal medulla. The compositions can be administered as a bolus or through the use of other methods such as an osmotic pump.
[0336]The compounds of the present invention can be administered to a patient alone or as part of a composition that contains other components such as excipients, diluents, and carriers, all of which are well-known in the art. The compositions can be administered to humans and animals either orally, rectally, parenterally (intravenously, intramuscularly or subcutaneously), intracisternally, intravaginally, intraperitoneally, intravesically, locally (powders, ointments or drops), or as a buccal or nasal spray.
[0337]Compositions suitable for parenteral injection can comprise physiologically acceptable sterile aqueous or nonaqueous solutions, dispersions, suspensions or emulsions, and sterile powders for reconstitution into sterile injectable solutions or dispersions. Examples of suitable aqueous and nonaqueous carriers, diluents, solvents or vehicles include water, ethanol, polyols (propyleneglycol, polyethyleneglycol, glycerol, and the like), suitable mixtures thereof, vegetable oils (such as olive oil) and injectable organic esters such as ethyl oleate. Proper fluidity can be maintained, for example, by the use of a coating such as lecithin, by the maintenance of the required particle size in the case of dispersions and by the use of surfactants.
[0338]These compositions can also contain adjuvants such as preserving, wetting, emulsifying, and dispersing agents. Prevention of the action of microorganisms can be ensured by various antibacterial and antifungal agents, for example, parabens, chlorobutanol, phenol, sorbic acid, and the like. It may also be desirable to include isotonic agents, for example sugars, sodium chloride, and the like. Prolonged absorption of the injectable pharmaceutical form can be brought about by the use of agents delaying absorption, for example, aluminum monostearate and gelatin.
[0339]Solid dosage forms for oral administration include capsules, tablets, pills, powders, and granules. In such solid dosage forms, the active compound is admixed with at least one customary inert excipient (or carrier) such as sodium citrate or dicalcium phosphate or (a) fillers or extenders, as for example, starches, lactose, sucrose, glucose, mannitol, and silicic acid; (b) binders, as for example, carboxymethylcellulose, alginates, gelatin, polyvinylpyrrolidone, sucrose and acacia; (c) humectants, as for example, glycerol; (d) disintegrating agents, as for example, agar-agar, calcium carbonate, potato or tapioca starch, alginic acid, certain complex silicates and sodium carbonate; (e) solution retarders, as for example paraffin; (f) absorption accelerators, as for example, quaternary ammonium compounds; (g) wetting agents, as for example, cetyl alcohol and glycerol monostearate; (h) adsorbents, as for example, kaolin and bentonite; and (i) lubricants, as for example, talc, calcium stearate, magnesium stearate, solid polyethylene glycols, sodium lauryl sulfate, or mixtures thereof. In the case of capsules, tablets, and pills, the dosage forms may also comprise buffering agents.
[0340]Solid compositions of a similar type may also be employed as fillers in soft and hard-filled gelatin capsules using such excipients as lactose or milk sugar as well as high molecular weight polyethylene glycols, and the like. Solid dosage forms such as tablets, dragees, capsules, pills, and granules can be prepared with coatings and shells, such as enteric coatings and others well-known in the art. They may contain opacifying agents and can also be of such composition that they release the active compound or compounds in a certain part of the intestinal tract in a delayed manner. Examples of embedding compositions which can be used are polymeric substances and waxes. The active compounds can also be in micro-encapsulated form, if appropriate, with one or more of the above-mentioned excipients.
[0341]Liquid dosage forms for oral administration include pharmaceutically acceptable emulsions, solutions, suspensions, syrups, and elixirs. In addition to the active compounds, the liquid dosage forms may contain inert diluents commonly used in the art, such as water or other solvents, solubilizing agents and emulsifiers, as for example, ethyl alcohol, isopropyl alcohol, ethyl carbonate, ethyl acetate, benzyl alcohol, benzyl benzoate, propylene glycol, 1,3-butylene glycol, dimethylformamide, oils, in particular, cottonseed oil, groundnut oil, corn germ oil, olive oil, castor oil and sesame oil, glycerol, tetrahydrofurfuryl alcohol, polyethylene glycols and fatty acid esters of sorbitan or mixtures of these substances, and the like. Besides such inert diluents, the composition can also include adjuvants, such as wetting agents, emulsifying and suspending agents, sweetening, flavoring, and perfuming agents.
[0342]Suspensions, in addition to the active compounds, may contain suspending agents, as for example, ethoxylated isostearyl alcohols, polyoxyethylene sorbitol and sorbitan esters, microcrystalline cellulose, aluminum metahydroxide, bentonite, agar-agar and tragacanth, or mixtures of these substances, and the like.
[0343]Compositions for rectal administrations are preferably suppositories which can be prepared by mixing the compounds of the present invention with suitable nonirritating excipients or carriers such as cocoa butter, polyethylene glycol or a suppository wax, which are solid at ordinary temperatures but liquid at body temperature and therefore, melt in the rectum or vaginal cavity and release the active component.
[0344]Dosage forms for topical administration of a compound of this invention include ointments, powders, sprays, and inhalants. The active component is admixed under sterile conditions with a physiologically acceptable carrier and any preservative, buffers, or propellants as may be required. Ophthalmic formulations, eye ointments, powders, and solutions are also contemplated as being within the scope of this invention.
[0345]In addition, the compounds of the present invention can exist in unsolvated as well as solvated forms with pharmaceutically acceptable solvents such as water, ethanol, and the like. In general, the solvated forms are considered equivalent to the unsolvated forms for the purposes of the present invention.
[0346]Aqueous suspensions contain the active materials in admixture with excipients suitable for the manufacture of aqueous suspensions. Such excipients are suspending agents, for example sodium carboxymethylcellulose, methylcellulose, hydroxypropylmethylcellulose, sodium alginate, polyvinylpyrrolidone, gum tragacanth and gum acacia; dispersing or wetting agents may be a naturally-occurring phosphatide, for example, lecithin, or condensation products of an alkylene oxide with fatty acids, for example polyoxyethylene stearate, or condensation products of ethylene oxide with long chain aliphatic alcohols, for example heptadecaethyleneoxycetanol, or condensation products of ethylene oxide with partial esters derived from fatty acids and a hexitol such as polyoxyethylene sorbitol monooleate, or condensation products of ethylene oxide with partial esters derived from fatty acids and hexitol anhydrides, for example polyethylene sorbitan monooleate. The aqueous suspensions may also contain one or more preservatives, for example ethyl, or n-propyl p-hydroxybenzoate, one or more coloring agents, one or more flavoring agents, and one or more sweetening agents, such as sucrose or saccharin. Oily suspensions may be formulated by suspending the active ingredients in a vegetable oil, for example arachis oil, olive oil, sesame oil or coconut oil, or in a mineral oil such as liquid paraffin. The oily suspensions may contain a thickening agent, for example beeswax, hard paraffin or cetyl alcohol. Sweetening agents such as those set forth above, and flavoring agents may be added to provide palatable oral preparations. These compositions may be preserved by the addition of an anti-oxidant such as ascorbic acid.
[0347]Dispersible powders and granules suitable for preparation of an aqueous suspension by the addition of water provide the active ingredient in admixture with a dispersing or wetting agent, suspending agent and one or more preservatives. Suitable dispersing or wetting agents and suspending agents are exemplified by those already mentioned above. Additional excipients, for example sweetening, flavoring and coloring agents, may also be present.
[0348]Pharmaceutical compositions of the invention may also be in the form of oil-in-water emulsions. The oily phase may be a vegetable oil, for example olive oil or arachis oil, or a mineral oil, for example liquid paraffin or mixtures of these. Suitable emulsifying agents may be naturally-occurring gums, for example gum acacia or gum tragacanth, naturally-occurring phosphatides, for example soy bean lecithin, and esters or partial esters derived from fatty acids and hexitol anhydrides, for example sorbitan monooleate, and condensation products of the said partial esters with ethylene oxide, for example polyoxyethylene sorbitan monooleate. The emulsions may also contain sweetening and flavoring agents.
[0349]Syrups and elixirs may be formulated with sweetening agents, for example glycerol, propylene glycol, sorbitol or sucrose. Such formulations may also contain a demulcent, a preservative and flavoring and coloring agents. The pharmaceutical compositions may be in the form of a sterile injectable aqueous or oleaginous suspension. This suspension may be formulated according to the known art using those suitable dispersing or wetting agents and suspending agents which have been mentioned above. The sterile injectable preparation may also be a sterile injectable solution or suspension in a non-toxic parenterally acceptable diluent or solvent, for example as a solution in 1,3-butanediol. Among the acceptable vehicles and solvents that may be employed are water, Ringer's solution and isotonic sodium chloride solution. In addition, sterile, fixed oils are conventionally employed as a solvent or suspending medium. For this purpose any bland fixed oil may be employed including synthetic mono- or diglycerides. In addition, fatty acids such as oleic acid find use in the preparation of injectables.
[0350]The compounds of general formula I may also be administered in the form of suppositories for rectal administration of the drug. These compositions can be prepared by mixing the drug with a suitable non-irritating excipient which is solid at ordinary temperatures but liquid at the rectal temperature and will therefore melt in the rectum to release the drug. Such materials are cocoa butter and polyethylene glycols.
[0351]Compounds of general formula I may be administered parenterally in a sterile medium. The drug, depending on the vehicle and concentration used, can either be suspended or dissolved in the vehicle. Advantageously, adjuvants such as local anesthetics, preservatives and buffering agents can be dissolved in the vehicle.
COMPUTER-RELATED EMBODIMENTS
[0352]As used herein the term "nucleic acid codes of the invention" encompasses the nucleotide sequences comprising, consisting essentially of, or consisting of any one of the following: a) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID NO: 1, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide positions of any of SEQ ID NOs:2-9; b) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of any of SEQ ID NOs:2-9, or the full-length sequence thereof; c) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of any of SEQ ID NOs: 14-21, or the full-length sequence thereof; and d) a nucleotide sequence complementary to any one of the preceding nucleotide sequences. The "nucleic acid codes of the invention" further encompass nucleotide sequences homologous to any of the above-described sequences. Homologous sequences refer to a sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 85%, 80%, or 75% homology to these contiguous spans. Homology may be determined using any method described herein, including BLAST2N with the default parameters or with any modified parameters. Homologous sequences also may include RNA sequences in which uridines replace the thymines in the nucleic acid codes of the invention. It will be appreciated that the nucleic acid codes of the invention can be represented in the traditional single character format (See the inside back cover of Stryer, Lubert. Biochemistry, 3rd edition. W. H. Freeman & Co., New York) or in any other format or code which records the identity of the nucleotides in a sequence.
[0353]As used herein the term "polypeptide codes of the invention" encompasses the polypeptide sequences comprising a contiguous span of at least 6, 8, 10, 12, 15, 20, 25, 30, 40, 50, 100 or more amino acids of any of SEQ ID NOs:26-33, or a sequence encoded by any of SEQ ID NOs:2-9 or 14-21. It will be appreciated that the polypeptide codes of the invention can be represented in the traditional single character format or three letter format (See the inside back cover of Stryer, Lubert. Biochemistry, 3rd edition. W. H. Freeman & Co., New York) or in any other format or code which records the identity of the amino acids in a sequence.
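As a non-limiting illustration, the enumeration of contiguous spans that include a given nucleotide position, and the representation of a code as RNA with uridines in place of thymines, can be sketched in Python as follows. The function names are assumptions introduced for this example, the sequence shown is a toy string rather than any SEQ ID NO, and the sketch uses 0-based positions, whereas sequence listings are conventionally numbered from 1.

```python
# Minimal sketch: contiguous spans of a fixed length that cover one position,
# plus a DNA-to-RNA letter conversion. All names are illustrative.
from typing import Iterator


def spans_covering(seq: str, length: int, position: int) -> Iterator[str]:
    """Yield every contiguous span of `length` that includes 0-based `position`."""
    first = max(0, position - length + 1)
    last = min(position, len(seq) - length)
    for start in range(first, last + 1):
        yield seq[start:start + length]


def to_rna(dna: str) -> str:
    """Represent a DNA code as RNA by replacing thymines with uridines."""
    return dna.upper().replace("T", "U")


toy = "ACGTACGTAC"  # hypothetical sequence for illustration
for span in spans_covering(toy, length=4, position=5):
    print(span, to_rna(span))
```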
[0354]It will be appreciated by those skilled in the art that the nucleic acid codes of the invention and polypeptide codes of the invention can be stored, recorded, and manipulated on any medium which can be read and accessed by a computer. As used herein, the words "recorded" and "stored" refer to a process for storing information on a computer medium. A skilled artisan can readily adopt any of the presently known methods for recording information on a computer readable medium to generate manufactures comprising one or more of the nucleic acid codes of the invention, or one or more of the polypeptide codes of the invention. Another aspect of the present invention is a computer readable medium having recorded thereon at least 2, 5, 10, 15, 20, 25, 30, or 50 nucleic acid codes of the invention. Another aspect of the present invention is a computer readable medium having recorded thereon at least 2, 5, 10, 15, 20, 25, 30, or 50 polypeptide codes of the invention.
[0355]Computer readable media include magnetically readable media, optically readable media, electronically readable media and magnetic/optical media. For example, the computer readable media may be a hard disk, a floppy disk, a magnetic tape, CD-ROM, Digital Versatile Disk (DVD), Random Access Memory (RAM), or Read Only Memory (ROM) as well as other types of media known to those skilled in the art.
[0356]Embodiments of the present invention include systems, particularly computer systems which store and manipulate the sequence information described herein. As used herein, "a computer system" refers to the hardware components, software components, and data storage components used to analyze the nucleotide sequences of the nucleic acid codes of the invention or the amino acid sequences of the polypeptide codes of the invention. In one embodiment, the computer system is a Sun Enterprise 1000 server (Sun Microsystems, Palo Alto, Calif.). The computer system preferably includes a processor for processing, accessing and manipulating the sequence data. The processor can be any well-known type of central processing unit, such as the Pentium III from Intel Corporation, or similar processor from Sun, Motorola, Compaq or International Business Machines.
[0357]Preferably, the computer system is a general purpose system that comprises the processor and one or more internal data storage components for storing data, and one or more data retrieving devices for retrieving the data stored on the data storage components. A skilled artisan can readily appreciate that any one of the currently available computer systems is suitable.
[0358]In one particular embodiment, the computer system includes a processor connected to a bus which is connected to a main memory (preferably implemented as RAM) and one or more internal data storage devices, such as a hard drive and/or other computer readable media having data recorded thereon. In some embodiments, the computer system further includes one or more data retrieving devices for reading the data stored on the internal data storage devices.
[0359]The data retrieving device may represent, for example, a floppy disk drive, a compact disk drive, a magnetic tape drive, etc. In some embodiments, the internal data storage device is a removable computer readable medium such as a floppy disk, a compact disk, a magnetic tape, etc. containing control logic and/or data recorded thereon. The computer system may advantageously include or be programmed by appropriate software for reading the control logic and/or the data from the data storage component once inserted in the data retrieving device.
[0360]The computer system includes a display which is used to display output to a computer user. It should also be noted that the computer system can be linked to other computer systems in a network or wide area network to provide centralized access to the computer system.
[0361]Software for accessing and processing the nucleotide sequences of the nucleic acid codes of the invention or the amino acid sequences of the polypeptide codes of the invention (such as search tools, compare tools, and modeling tools etc.) may reside in main memory during execution.
[0362]In some embodiments, the computer system may further comprise a sequence comparer for comparing the above-described nucleic acid codes of the invention or the polypeptide codes of the invention stored on a computer readable medium to reference nucleotide or polypeptide sequences stored on a computer readable medium. A "sequence comparer" refers to one or more programs which are implemented on the computer system to compare a nucleotide or polypeptide sequence with other nucleotide or polypeptide sequences and/or compounds including but not limited to peptides, peptidomimetics, and chemicals stored within the data storage means. For example, the sequence comparer may compare the nucleotide sequences of nucleic acid codes of the invention or the amino acid sequences of the polypeptide codes of the invention stored on a computer readable medium to reference sequences stored on a computer readable medium to identify homologies, motifs implicated in biological function, or structural motifs. The various sequence comparer programs identified elsewhere in this patent specification are particularly contemplated for use in this aspect of the invention.
[0363]In one embodiment, a process is used for comparing a new nucleotide or protein sequence with a database of sequences in order to determine the homology levels between the new sequence and the sequences in the database. The database of sequences can be a private database stored within the computer system, or a public database such as GENBANK, PIR or SWISSPROT that is available through the Internet.
[0364]The process begins at a start state and then moves to a state wherein the new sequence to be compared is stored to a memory in a computer system. As discussed above, the memory could be any type of memory, including RAM or an internal storage device.
[0365]The process then moves to a state wherein a database of sequences is opened for analysis and comparison. The process then moves to a state wherein the first sequence stored in the database is read into a memory on the computer. A comparison is then performed to determine if the first sequence in the database is the same as the new sequence. It is important to note that this step is not limited to performing an exact comparison between the new sequence and the first sequence in the database. Methods are well known to those of skill in the art for comparing two nucleotide or protein sequences, even if they are not identical. For example, gaps can be introduced into one sequence in order to raise the homology level between the two tested sequences. The parameters that control whether gaps or other features are introduced into a sequence during comparison are normally entered by the user of the computer system.
[0366]Once a comparison of the two sequences has been performed, a determination is made at a decision state whether the two sequences are the same. Of course, the term "same" is not limited to sequences that are absolutely identical. Sequences that are within the homology parameters entered by the user will be marked as "same" in the process.
[0367]If a determination is made that the two sequences are the same, the process moves to a state wherein the name of the sequence from the database is displayed to the user. This state notifies the user that the sequence with the displayed name fulfills the homology constraints that were entered. Once the name of the stored sequence is displayed to the user, the process moves to a decision state wherein a determination is made whether more sequences exist in the database. If no more sequences exist in the database, then the process terminates at an end state. However, if more sequences do exist in the database, then the process moves to a state wherein a pointer is moved to the next sequence in the database so that it can be compared to the new sequence. In this manner, the new sequence is aligned and compared with every sequence in the database.
[0368]It should be noted that if a determination had been made at the decision state that the sequences were not homologous, then the process would move immediately to the decision state in order to determine if any other sequences were available in the database for comparison.
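The database comparison process described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the database is modeled as an in-memory mapping of hypothetical names to toy sequences, the function names are introduced for this example, and the similarity measure is the ratio from the Python standard library class `difflib.SequenceMatcher`, used here only as a simple stand-in for a dedicated homology program such as those enumerated herein.

```python
# Minimal sketch of the loop: read each stored sequence, compare it with the
# new sequence, report names that meet the user-entered threshold, move on.
from difflib import SequenceMatcher
from typing import Dict, List


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; a stand-in measure, not a BLAST score."""
    return SequenceMatcher(None, a.upper(), b.upper(), autojunk=False).ratio()


def scan_database(new_sequence: str, database: Dict[str, str],
                  threshold: float) -> List[str]:
    """Return names of stored sequences deemed the "same" as the new sequence."""
    hits = []
    for name, stored in database.items():      # pointer moves to next sequence
        if similarity(new_sequence, stored) >= threshold:
            hits.append(name)                   # name is reported to the user
    return hits


# Hypothetical toy database for illustration only.
toy_db = {"seqA": "ACGTACGT", "seqB": "TTTTTTTT", "seqC": "ACGTACGA"}
hits = scan_database("ACGTACGT", toy_db, threshold=0.8)
print(hits)
```

Raising the threshold to 1.0 restricts the hits to exact matches, which corresponds to the narrowest reading of "same" in the process above.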
[0369]Accordingly, one aspect of the present invention is a computer system comprising a processor, a data storage device having stored thereon a nucleic acid code of the invention or a polypeptide code of the invention, a data storage device having retrievably stored thereon reference nucleotide sequences or polypeptide sequences to be compared to the nucleic acid code of the invention or polypeptide code of the invention and a sequence comparer for conducting the comparison. The sequence comparer may indicate a homology level between the sequences compared or identify motifs implicated in biological function and structural motifs in the nucleic acid code of the invention and polypeptide codes of the invention or it may identify structural motifs in sequences which are compared to these nucleic acid codes and polypeptide codes. In some embodiments, the data storage device may have stored thereon the sequences of at least 2, 5, 10, 15, 20, 25, 30, or 50 of the nucleic acid codes of the invention or polypeptide codes of the invention.
[0370]Another aspect of the present invention is a method for determining the level of homology between a nucleic acid code of the invention and a reference nucleotide sequence, comprising the steps of reading the nucleic acid code and the reference nucleotide sequence through the use of a computer program which determines homology levels and determining homology between the nucleic acid code and the reference nucleotide sequence with the computer program. The computer program may be any of a number of computer programs for determining homology levels, including those specifically enumerated herein, including BLAST2N with the default parameters or with any modified parameters. The method may be implemented using the computer systems described above. The method may also be performed by reading 2, 5, 10, 15, 20, 25, 30, or 50 of the above described nucleic acid codes of the invention through the use of the computer program and determining homology between the nucleic acid codes and reference nucleotide sequences.
[0371]In another embodiment, a process is carried out in a computer for determining whether two sequences are homologous. The process begins at a start state and then moves to a state wherein a first sequence to be compared is stored to a memory. The second sequence to be compared is then stored in a memory. The process then moves to a state wherein the first character in the first sequence is read and then to a state wherein the first character of the second sequence is read. It should be understood that if the sequence is a nucleotide sequence, then the character would normally be either A, T, C, G or U. If the sequence is a protein sequence, then it should be in the single letter amino acid code so that the first and sequence sequences can be easily compared.
[0372]A determination is then made at a decision state whether the two characters are the same. If they are the same, then the process moves to a state wherein the next characters in the first and second sequences are read. A determination is then made whether the next characters are the same. If they are, then the process continues this loop until two characters are not the same. If a determination is made that the next two characters are not the same, the process moves to a decision state to determine whether there are any more characters in either sequence to read.
[0373]If there are no more characters to read, then the process moves to a state wherein the level of homology between the first and second sequences is displayed to the user. The level of homology is determined by calculating the proportion of characters between the sequences that were the same out of the total number of characters in the first sequence. Thus, if every character in a first nucleotide sequence aligned with every character in a second sequence, the homology level would be 100%.
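By way of illustration only, the character-by-character comparison described above may be sketched as follows. The sketch assumes an ungapped, position-by-position comparison starting at the first character of each sequence; the function name `homology_level` is hypothetical and is not part of the disclosure.

```python
def homology_level(first: str, second: str) -> float:
    """Percent of positions in `first` whose character matches `second`.

    Assumption: sequences are compared position by position without gaps.
    """
    if not first:
        raise ValueError("first sequence must not be empty")
    first, second = first.upper(), second.upper()
    # zip stops at the shorter sequence, so any trailing positions of
    # `first` that have no partner still count in the denominator.
    matches = sum(1 for a, b in zip(first, second) if a == b)
    return 100.0 * matches / len(first)
```

Under these assumptions, two identical sequences yield 100, and a single mismatch in a four-character sequence yields 75.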
[0374]Alternatively, the computer program may be a computer program which compares the nucleotide sequences of the nucleic acid codes of the present invention, to reference nucleotide sequences in order to determine whether the nucleic acid code of the invention differs from a reference nucleic acid sequence at one or more positions. Optionally such a program records the length and identity of inserted, deleted or substituted nucleotides with respect to the sequence of either the reference polynucleotide or the nucleic acid code of the invention. In one embodiment, the computer program may be a program which determines whether the nucleotide sequences of the nucleic acid codes of the invention contain one or more single nucleotide polymorphisms (SNP) with respect to a reference nucleotide sequence. These single nucleotide polymorphisms may each comprise a single base substitution, insertion, or deletion.
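By way of illustration only, a program that records substituted, inserted, or deleted nucleotides relative to a reference may be sketched as follows. The sketch assumes that the two sequences have already been aligned to equal length, with "-" marking a gap; the alignment step itself is not shown, and the function name `find_differences` is hypothetical.

```python
from typing import List, Tuple


def find_differences(query: str, reference: str) -> List[Tuple[int, str, str, str]]:
    """Return (reference_position, kind, reference_base, query_base) tuples.

    Assumption: both inputs are pre-aligned, equal-length strings in which
    '-' marks a gap. Positions are 1-based in the ungapped reference; an
    insertion is reported at the position of the preceding reference base
    (0 if it precedes the first reference base).
    """
    if len(query) != len(reference):
        raise ValueError("aligned sequences must have equal length")
    differences = []
    ref_pos = 0
    for q, r in zip(query.upper(), reference.upper()):
        if r != "-":
            ref_pos += 1
        if q == r:
            continue
        if r == "-":
            kind = "insertion"      # base present only in the query
        elif q == "-":
            kind = "deletion"       # reference base missing from the query
        else:
            kind = "substitution"   # single base change
        differences.append((ref_pos, kind, r, q))
    return differences
```

An empty result indicates that the aligned sequences do not differ at any position.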
[0375]Another aspect of the present invention is a method for determining the level of homology between a polypeptide code of the invention and a reference polypeptide sequence, comprising the steps of reading the polypeptide code of the invention and the reference polypeptide sequence through use of a computer program which determines homology levels and determining homology between the polypeptide code and the reference polypeptide sequence using the computer program.
[0376]Accordingly, another aspect of the present invention is a method for determining whether a nucleic acid code of the invention differs at one or more nucleotides from a reference nucleotide sequence comprising the steps of reading the nucleic acid code and the reference nucleotide sequence through use of a computer program which identifies differences between nucleic acid sequences and identifying differences between the nucleic acid code and the reference nucleotide sequence with the computer program. In some embodiments, the computer program is a program which identifies single nucleotide polymorphisms. The method may be implemented by the computer systems described above and the method described supra. The method may also be performed by reading at least 2, 5, 10, 15, 20, 25, 30, or 50 of the nucleic acid codes of the invention and the reference nucleotide sequences through the use of the computer program and identifying differences between the nucleic acid codes and the reference nucleotide sequences with the computer program.
[0377]In other embodiments the computer based system may further comprise an identifier for identifying features within the nucleotide sequences of the nucleic acid codes of the invention or the amino acid sequences of the polypeptide codes of the invention.
[0378]An "identifier" refers to one or more programs which identify certain features within the above-described nucleotide sequences of the nucleic acid codes of the invention or the amino acid sequences of the polypeptide codes of the invention. In one embodiment, the identifier may comprise a program which identifies an open reading frame in the cDNA codes of the invention.
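By way of illustration only, an identifier that locates open reading frames may be sketched as follows. The sketch assumes a search limited to the three forward reading frames, with an ORF defined as an ATG followed in frame by the first TAA, TAG, or TGA; the function name `find_orfs` and the `min_codons` parameter are hypothetical.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence: str, min_codons: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of ORFs on the forward strand.

    `start` is the 0-based index of the A in ATG; `end` is exclusive and
    includes the stop codon. `min_codons` counts codons before the stop.
    """
    seq = sequence.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # close it and keep scanning
    return orfs
```

A fuller implementation would also scan the reverse complement; that step is omitted here for brevity.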
[0379]In another embodiment, an identifier process is used to detect the presence of a feature in a sequence. The process begins at a start state and then moves to a state wherein a first sequence that is to be checked for features is stored to a memory in the computer system. The process then moves to a state wherein a database of sequence features is opened. Such a database would include a list of each feature's attributes along with the name of the feature. For example, a feature name could be "Initiation Codon" and the attribute would be "ATG". Another example would be the feature name "TAATAA Box" and the feature attribute would be "TAATAA". An example of such a database is produced by the University of Wisconsin Genetics Computer Group (see Worldwide Website: gcg.com).
[0380]Once the database of features is opened, the process moves to a state wherein the first feature is read from the database. A comparison of the attribute of the first feature with the first sequence is then made. A determination is then made at a decision state whether the attribute of the feature was found in the first sequence. If the attribute was found, then the process moves to a state wherein the name of the found feature is displayed to the user.
[0381]The process then moves to a decision state wherein a determination is made whether more features exist in the database. If no more features do exist, then the process terminates at an end state. However, if more features do exist in the database, then the process reads the next sequence feature and loops back to the state wherein the attribute of the next feature is compared against the first sequence.
[0382]It should be noted that if the feature attribute is not found in the first sequence at the decision state, the process moves directly to the decision state in order to determine if any more features exist in the database.
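By way of illustration only, the identifier process described above may be sketched as follows. The feature database is represented here as a simple mapping of feature name to attribute, populated with the two examples given above; the function name `find_features` and the choice to report the first position of each match are assumptions.

```python
from typing import Dict, List, Tuple

# Feature name -> attribute, using the two examples from the text.
FEATURES: Dict[str, str] = {
    "Initiation Codon": "ATG",
    "TAATAA Box": "TAATAA",
}


def find_features(sequence: str, feature_db: Dict[str, str]) -> List[Tuple[str, int]]:
    """Return (feature name, first 0-based position) for each feature found."""
    seq = sequence.upper()
    found = []
    for name, attribute in feature_db.items():   # read the next feature
        position = seq.find(attribute.upper())   # compare attribute to sequence
        if position != -1:                       # decision state: found?
            found.append((name, position))
    return found                                 # end state: no more features
```

Features whose attribute is absent from the sequence are simply skipped, which corresponds to moving directly to the next feature in the database.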
[0383]In another embodiment, the identifier may comprise a molecular modeling program which determines the 3-dimensional structure of the polypeptide codes of the invention. In some embodiments, the molecular modeling program identifies target sequences that are most compatible with profiles representing the structural environments of the residues in known three-dimensional protein structures. (See, e.g., U.S. Pat. No. 5,436,850). In another technique, the known three-dimensional structures of proteins in a given family are superimposed to define the structurally conserved regions in that family. This protein modeling technique also uses the known three-dimensional structure of a homologous protein to approximate the structure of the polypeptide codes of the invention. (See e.g., U.S. Pat. No. 5,557,535). Conventional homology modeling techniques have been used routinely to build models of proteases and antibodies. (Sowdhamini et al., (1997)). Comparative approaches can also be used to develop three-dimensional protein models when the protein of interest has poor sequence identity to template proteins. In some cases, proteins fold into similar three-dimensional structures despite having very weak sequence identities. For example, a number of helical cytokines fold into a similar three-dimensional topology in spite of weak sequence homology.
[0384]The recent development of threading methods now enables the identification of likely folding patterns in a number of situations where the structural relatedness between target and template(s) is not detectable at the sequence level. Hybrid methods may be used, in which fold recognition is performed using Multiple Sequence Threading (MST), structural equivalencies are deduced from the threading output using the distance geometry program DRAGON to construct a low resolution model, and a full-atom representation is constructed using a molecular modeling package such as QUANTA.
[0385]According to this 3-step approach, candidate templates are first identified by using the novel fold recognition algorithm MST, which is capable of performing simultaneous threading of multiple aligned sequences onto one or more 3-D structures. In a second step, the structural equivalencies obtained from the MST output are converted into interresidue distance restraints and fed into the distance geometry program DRAGON, together with auxiliary information obtained from secondary structure predictions. The program combines the restraints in an unbiased manner and rapidly generates a large number of low resolution model conformations. In a third step, these low resolution model conformations are converted into full-atom models and subjected to energy minimization using the molecular modeling package QUANTA. (See e.g., Aszodi et al., (1997)).
[0386]The results of the molecular modeling analysis may then be used in rational drug design techniques to identify agents which modulate the activity of the polypeptide codes of the invention.
[0387]Accordingly, another aspect of the present invention is a method of identifying a feature within the nucleic acid codes of the invention or the polypeptide codes of the invention comprising reading the nucleic acid code(s) or the polypeptide code(s) through the use of a computer program which identifies features therein and identifying features within the nucleic acid code(s) or polypeptide code(s) with the computer program. In one embodiment, the computer program comprises a computer program which identifies open reading frames. In a further embodiment, the computer program identifies structural motifs in a polypeptide sequence. In another embodiment, the computer program comprises a molecular modeling program. The method may be performed by reading a single sequence or at least 2, 5, 10, 15, 20, 25, 30, or 50 of the nucleic acid codes of the invention or the polypeptide codes of the invention through the use of the computer program and identifying features within the nucleic acid codes or polypeptide codes with the computer program.
[0388]The nucleic acid codes of the invention or the polypeptide codes of the invention may be stored and manipulated in a variety of data processor programs in a variety of formats. For example, they may be stored as text in a word processing file, such as Microsoft WORD or WORDPERFECT or as an ASCII file in a variety of database programs familiar to those of skill in the art, such as DB2, SYBASE, or ORACLE. In addition, many computer programs and databases may be used as sequence comparers, identifiers, or sources of reference nucleotide or polypeptide sequences to be compared to the nucleic acid codes of the invention or the polypeptide codes of the invention. The following list is intended not to limit the invention but to provide guidance to programs and databases which are useful with the nucleic acid codes of the invention or the polypeptide codes of the invention. The programs and databases which may be used include, but are not limited to: MacPattern (EMBL), DiscoveryBase (Molecular Applications Group), GeneMine (Molecular Applications Group), Look (Molecular Applications Group), MacLook (Molecular Applications Group), BLAST and BLAST2 (NCBI), BLASTN and BLASTX (Altschul et al., 1990), FASTA (Pearson and Lipman, 1988), FASTDB (Brutlag et al., 1990), Catalyst (Molecular Simulations Inc.), Catalyst/SHAPE (Molecular Simulations Inc.), Cerius2.DBAccess (Molecular Simulations Inc.), HypoGen (Molecular Simulations Inc.), Insight II (Molecular Simulations Inc.), Discover (Molecular Simulations Inc.), CHARMm (Molecular Simulations Inc.), Felix (Molecular Simulations Inc.), DelPhi (Molecular Simulations Inc.), QuanteMM (Molecular Simulations Inc.), Homology (Molecular Simulations Inc.), Modeler (Molecular Simulations Inc.), ISIS (Molecular Simulations Inc.), Quanta/Protein Design (Molecular Simulations Inc.), WebLab (Molecular Simulations Inc.), WebLab Diversity Explorer (Molecular Simulations Inc.), Gene Explorer (Molecular Simulations Inc.), SeqFold (Molecular Simulations Inc.), the EMBL/Swissprotein database, the MDL Available Chemicals Directory database, the MDL Drug Data Report database, the Comprehensive Medicinal Chemistry database, Derwent's World Drug Index database, the BioByteMasterFile database, the Genbank database, and the Genseqn database. Many other programs and databases would be apparent to one of skill in the art given the present disclosure.
[0389]Motifs which may be detected using the above programs include sequences encoding leucine zippers, helix-turn-helix motifs, glycosylation sites, ubiquitination sites, alpha helices, and beta sheets, signal sequences encoding signal peptides which direct the secretion of the encoded proteins, sequences implicated in transcription regulation such as homeoboxes, acidic stretches, enzymatic active sites, substrate binding sites, and enzymatic cleavage sites.
[0390]Throughout this application, various publications, patents and published patent applications are cited. The disclosures of these publications, patents and published patent specifications referenced in this application are hereby incorporated by reference into the present disclosure to more fully describe the state of the art to which this invention pertains.
EXAMPLES
Example 1
DNA Extraction
[0391]Donors were unrelated and healthy. They were sufficiently diverse to be representative of a heterogeneous French population. The DNA from 100 individuals was extracted and tested for the detection of the biallelic markers.
[0392]30 ml of peripheral venous blood were taken from each donor in the presence of EDTA. Cells (pellet) were collected after centrifugation for 10 minutes at 2000 rpm. Red cells were lysed by a lysis solution (50 ml final volume: 10 mM Tris pH7.6; 5 mM MgCl2; 10 mM NaCl). The solution was centrifuged (10 minutes, 2000 rpm) as many times as necessary to eliminate the residual red cells present in the supernatant, after resuspension of the pellet in the lysis solution.
[0393]The pellet of white cells was lysed overnight at 42° C. with 3.7 ml of lysis solution composed of:
[0394]3 ml TE 10-2 (Tris-HCl 10 mM, EDTA 2 mM)/NaCl 0.4 M
[0395]200 μl SDS 10%
[0396]500 μl K-proteinase (2 mg K-proteinase in TE 10-2/NaCl 0.4 M).
[0397]For the extraction of proteins, 1 ml saturated NaCl (6M) (1/3.5 v/v) was added. After vigorous agitation, the solution was centrifuged for 20 minutes at 10000 rpm.
[0398]For the precipitation of DNA, 2 to 3 volumes of 100% ethanol were added to the previous supernatant, and the solution was centrifuged for 30 minutes at 2000 rpm. The DNA solution was rinsed three times with 70% ethanol to eliminate salts, and centrifuged for 20 minutes at 2000 rpm. The pellet was dried at 37° C., and resuspended in 1 ml TE 10-1 or 1 ml water. The DNA concentration was evaluated by measuring the OD at 260 nm (1 unit OD=50 μg/ml DNA).
[0399]To determine the presence of proteins in the DNA solution, the OD 260/OD 280 ratio was determined. Only DNA preparations having an OD 260/OD 280 ratio between 1.8 and 2 were used in the subsequent examples described below.
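By way of illustration only, the concentration estimate and the purity criterion described above may be expressed as follows. The conversion factor (1 OD unit at 260 nm = 50 μg/ml DNA) and the accepted ratio range of 1.8 to 2 are taken from the text; the function names and the `dilution_factor` parameter are hypothetical.

```python
UG_PER_ML_PER_OD260 = 50.0  # from the text: 1 unit OD at 260 nm = 50 ug/ml DNA


def dna_concentration_ug_per_ml(od260: float, dilution_factor: float = 1.0) -> float:
    """Estimate DNA concentration from absorbance at 260 nm.

    `dilution_factor` is an assumption for samples diluted before reading.
    """
    return od260 * UG_PER_ML_PER_OD260 * dilution_factor


def passes_purity_check(od260: float, od280: float,
                        low: float = 1.8, high: float = 2.0) -> bool:
    """True if the OD 260/OD 280 ratio lies within the accepted range."""
    if od280 <= 0:
        raise ValueError("OD 280 must be positive")
    return low <= od260 / od280 <= high
```

For example, an undiluted reading of 0.5 at 260 nm corresponds to an estimated 25 μg/ml of DNA.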
[0400]The pool was constituted by mixing equivalent quantities of DNA from each individual.
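By way of illustration only, the volume of each DNA preparation needed to contribute an equal quantity to the pool may be computed as follows. The sketch assumes that "equivalent quantities" means equal mass and that each concentration was estimated in μg/ml as in the preceding step; the function name and parameters are hypothetical.

```python
from typing import List


def pooling_volumes_ul(concentrations_ug_per_ml: List[float],
                       mass_per_sample_ug: float) -> List[float]:
    """Volume (in microliters) of each sample that contains the target mass."""
    volumes = []
    for conc in concentrations_ug_per_ml:
        if conc <= 0:
            raise ValueError("concentration must be positive")
        # mass / concentration gives ml; multiply by 1000 to obtain ul.
        volumes.append(1000.0 * mass_per_sample_ug / conc)
    return volumes
```

Under these assumptions, contributing 1 μg from samples at 100 μg/ml and 50 μg/ml requires 10 μl and 20 μl respectively.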
Example 2
Amplification of Genomic DNA by PCR
[0401]The amplification of specific genomic sequences of the DNA samples of example 1 was carried out on the pool of DNA obtained previously. In addition, 50 individual samples were similarly amplified.
PCR assays were performed using the following protocol:
TABLE-US-00002
Final volume                                          25 μl
DNA                                                   2 ng/μl
MgCl2                                                 2 mM
dNTP (each)                                           200 μM
primer (each)                                         2.9 ng/μl
Ampli Taq Gold DNA polymerase                         0.05 unit/μl
PCR buffer (10x = 0.1 M TrisHCl pH8.3 0.5M KCl)       1x
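By way of illustration only, the per-reaction amounts implied by the final concentrations in the protocol above may be computed as follows. Only the components expressed per μl are included; the dictionary keys and the function name are hypothetical labels.

```python
from typing import Dict

FINAL_VOLUME_UL = 25.0

# Final concentrations per microliter, as listed in the protocol above.
# Each key names the unit of the resulting per-reaction amount.
CONCENTRATION_PER_UL: Dict[str, float] = {
    "template DNA (ng)": 2.0,
    "primer, each (ng)": 2.9,
    "Ampli Taq Gold DNA polymerase (units)": 0.05,
}


def amounts_per_reaction(per_ul: Dict[str, float], volume_ul: float) -> Dict[str, float]:
    """Multiply each per-microliter concentration by the reaction volume."""
    return {name: conc * volume_ul for name, conc in per_ul.items()}


AMOUNTS = amounts_per_reaction(CONCENTRATION_PER_UL, FINAL_VOLUME_UL)
```

Under these figures a 25 μl reaction contains about 50 ng of template DNA, 72.5 ng of each primer, and 1.25 units of polymerase.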
[0402]Each pair of first primers was designed using the sequence information of the GLYT1 gene disclosed herein and the OSP software (Hillier & Green, 1991). This first pair of primers was about 20 nucleotides in length and had the sequences shown as SEQ ID NOs:36 and 37.
[0403]Primers PU contain the following additional PU 5' sequence: TGTAAAACGACGGCCAGT; primers RP contain the following RP 5' sequence: CAGGAAACAGCTATGACC. The primer containing the additional PU 5' sequence is listed as SEQ ID NO:36. The primer containing the additional RP 5' sequence is listed in SEQ ID NO:37.
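By way of illustration only, the addition of the PU and RP 5' sequences to a target-specific primer may be sketched as follows. The two tail sequences are those given above; the function name `add_tail` is hypothetical, and any target-specific sequence passed to it in this sketch is a placeholder rather than a sequence of the invention.

```python
PU_TAIL = "TGTAAAACGACGGCCAGT"  # additional PU 5' sequence, from the text
RP_TAIL = "CAGGAAACAGCTATGACC"  # additional RP 5' sequence, from the text


def add_tail(tail: str, target_specific: str) -> str:
    """Return the primer written 5' to 3', with the tail at the 5' end."""
    return tail + target_specific.upper()
```

Because sequences are written 5' to 3', placing the tail first in the string corresponds to attaching it at the 5' end of the primer.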
[0404]The synthesis of these primers was performed following the phosphoramidite method, on a GENSET UFPS 24.1 synthesizer.
[0405]DNA amplification was performed on a Genius II thermocycler. After heating at 95° C. for 10 min, 40 cycles were performed. Each cycle comprised: 30 sec at 95° C., 1 min at 54° C., and 30 sec at 72° C. For final elongation, 10 min at 72° C. ended the amplification. The quantities of the amplification products obtained were determined on 96-well microtiter plates, using a fluorometer and Picogreen as the intercalating agent (Molecular Probes).
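By way of illustration only, the cycling program described above may be written down as data and its total programmed hold time computed as follows. The temperatures, durations, and cycle count are taken from the text; the step labels (denaturation, annealing, extension) and all identifiers are assumptions, and ramp times between temperatures are not included.

```python
INITIAL_HEATING_S = 10 * 60      # 95 C for 10 min
CYCLES = 40
CYCLE_STEPS = [                  # (label, temperature in C, seconds)
    ("denaturation", 95, 30),
    ("annealing", 54, 60),
    ("extension", 72, 30),
]
FINAL_ELONGATION_S = 10 * 60     # 72 C for 10 min


def total_hold_time_minutes() -> float:
    """Sum of all programmed hold times, excluding temperature ramps."""
    per_cycle_s = sum(seconds for _, _, seconds in CYCLE_STEPS)
    total_s = INITIAL_HEATING_S + CYCLES * per_cycle_s + FINAL_ELONGATION_S
    return total_s / 60
```

Each cycle holds for 2 minutes, so the 40 cycles account for 80 minutes and the whole program for 100 minutes of hold time.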
[0406]In addition, RT-PCR was used to identify novel cDNAs present in the cells of normal and/or schizophrenic individuals. 8 novel splice variants were identified, and are shown as SEQ ID NOs: 14-21 (nucleotide sequences) and SEQ ID NOs: 26-33 (polypeptide sequences) and diagrammed in FIG. 1. Certain of the novel variants include novel exons, which are presented herein as SEQ ID NOs:2-9.
Example 3
Identification of Biallelic Markers
Sequencing of Amplified Genomic DNA
[0407]The sequencing of the amplified DNA obtained in example 2 was carried out on ABI 377 sequencers. The sequences of the amplification products were determined using automated dideoxy terminator sequencing reactions with a dye terminator cycle sequencing protocol. The products of the sequencing reactions were run on sequencing gels and the sequences were determined using gel image analysis (ABI Prism DNA Sequencing Analysis software (2.1.2 version)).
Example 4
Preparation of Antibody Compositions to the GlyT1 Protein
[0408]Substantially pure protein or polypeptide is isolated from transfected or transformed cells containing an expression vector encoding the GlyT1 protein or a portion thereof. The concentration of protein in the final preparation is adjusted, for example, by concentration on an Amicon filter device, to the level of a few micrograms/ml. Monoclonal or polyclonal antibody to the protein can then be prepared as follows:
A. Monoclonal Antibody Production by Hybridoma Fusion
[0409]Monoclonal antibody to epitopes in the GlyT1 protein or a portion thereof can be prepared from murine hybridomas according to the classical method of Kohler, G. and Milstein, C., (1975) or derivative methods thereof. See, also, Harlow and Lane (1988).
[0410]Briefly, a mouse is repetitively inoculated with a few micrograms of the GlyT1 protein or a portion thereof over a period of a few weeks. The mouse is then sacrificed, and the antibody producing cells of the spleen isolated. The spleen cells are fused by means of polyethylene glycol with mouse myeloma cells, and the excess unfused cells destroyed by growth of the system on selective media comprising aminopterin (HAT media). The successfully fused cells are diluted and aliquots of the dilution placed in wells of a microtiter plate where growth of the culture is continued. Antibody-producing clones are identified by detection of antibody in the supernatant fluid of the wells by immunoassay procedures, such as ELISA, as originally described by Engvall, (1980), and derivative methods thereof. Selected positive clones can be expanded and their monoclonal antibody product harvested for use. Detailed procedures for monoclonal antibody production are described in Davis, L. et al. Basic Methods in Molecular Biology Elsevier, New York. Section 21-2.
B. Polyclonal Antibody Production by Immunization
[0411]Polyclonal antiserum containing antibodies to heterogeneous epitopes in the GlyT1 protein or a portion thereof can be prepared by immunizing a suitable non-human animal with the GlyT1 protein or a portion thereof, which can be unmodified or modified to enhance immunogenicity. A suitable non-human animal, preferably a non-human mammal, is selected, usually a mouse, rat, rabbit, goat, or horse. Alternatively, a crude preparation which has been enriched for GlyT1 concentration can be used to generate antibodies. Such proteins, fragments or preparations are introduced into the non-human mammal in the presence of an appropriate adjuvant (e.g. aluminum hydroxide, RIBI, etc.) which is known in the art. In addition, the protein, fragment or preparation can be pretreated with an agent which will increase antigenicity; such agents are known in the art and include, for example, methylated bovine serum albumin (mBSA), bovine serum albumin (BSA), Hepatitis B surface antigen, and keyhole limpet hemocyanin (KLH). Serum from the immunized animal is collected, treated and tested according to known procedures. If the serum contains polyclonal antibodies to undesired epitopes, the polyclonal antibodies can be purified by immunoaffinity chromatography.
[0412]Effective polyclonal antibody production is affected by many factors related both to the antigen and the host species. Also, host animals vary in response to site of inoculations and dose, with both inadequate and excessive doses of antigen resulting in low titer antisera. Small doses (ng level) of antigen administered at multiple intradermal sites appear to be most reliable. Techniques for producing and processing polyclonal antisera are known in the art, see for example, Mayer and Walker (1987). An effective immunization protocol for rabbits can be found in Vaitukaitis, J. et al. (1971).
[0413]Booster injections can be given at regular intervals, and antiserum harvested when antibody titer thereof, as determined semi-quantitatively, for example, by double immunodiffusion in agar against known concentrations of the antigen, begins to fall. See, for example, Ouchterlony, O. et al., (1973). Plateau concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum (about 12 μM). Affinity of the antisera for the antigen is determined by preparing competitive binding curves, as described, for example, by Fisher, D., (1980).
[0414]Antibody preparations prepared according to either the monoclonal or the polyclonal protocol are useful in quantitative immunoassays which determine concentrations of antigen-bearing substances in biological samples; they are also used semi-quantitatively or qualitatively to identify the presence of antigen in a biological sample. The antibodies may also be used in therapeutic compositions for killing cells expressing the protein or reducing the levels of the protein in the body.
[0415]While the preferred embodiments of the invention have been illustrated and described, it will be appreciated that various changes can be made therein by one skilled in the art without departing from the spirit and scope of the invention.
REFERENCES
[0416]Abbondanzo S J et al., 1993, Methods in Enzymology, Academic Press, New York, pp 803-823 [0417]Ajioka R. S. et al., Am. J. Hum. Genet., 60:1439-1447, 1997 [0418]Altschul et al., 1990, J. Mol. Biol. 215(3):403-410 [0419]Altschul et al., 1993, Nature Genetics 3:266-272 [0420]Altschul et al., 1997, Nuc. Acids Res. 25:3389-3402 [0421]Anton M. et al., 1995, J. Virol., 69: 4600-4606 [0422]Araki K et al. (1995) Proc. Natl. Acad. Sci. USA. 92(1):160-4. [0423]Aszodi et al., Proteins: Structure, Function, and Genetics, Supplement 1:38-42 (1997) [0424]Ausubel et al. (1989) Current Protocols in Molecular Biology, Green Publishing Associates and Wiley Interscience, N.Y. [0425]Baubonis W. (1993) Nucleic Acids Res. 21(9):2025-9. [0426]Beaucage et al., Tetrahedron Lett 1981, 22: 1859-1862 [0427]Bradley A., 1987, Production and analysis of chimeric mice. In: E. J. Robertson (Ed.), Teratocarcinomas and embryonic stem cells: A practical approach. IRL Press, Oxford, pp. 113. [0428]Bram R J et al., 1993, Mol. Cell Biol., 13: 4760-4769 [0429]Brown E L, Belagaje R, Ryan M J, Khorana H G, Methods Enzymol 1979; 68:109-151 [0430]Brutlag et al. Comp. App. Biosci. 6:237-245, 1990 [0431]Bush et al., 1997, J. Chromatogr., 777: 311-328. [0432]Chai H. et al. (1993) Biotechnol. Appl. Biochem. 18:259-273. [0433]Chee et al. (1996) Science, 274:610-614. [0434]Chen and Kwok Nucleic Acids Research 25:347-353 1997 [0435]Chen et al. (1987) Mol. Cell. Biol. 7:2745-2752. [0436]Chen et al. Proc. Natl. Acad, Sci. USA 94/20 10756-10761, 1997 [0437]Cho R J et al., 1998, Proc. Natl. Acad. Sci. USA, 95(7): 3752-3757. [0438]Chou J. Y., 1989, Mol. Endocrinol., 3: 1511-1514. [0439]Clark A. G. (1990) Mol. Biol. Evol. 7:111-122. [0440]Coles R, Caswell R, Rubinsztein D C, Hum Mol Genet 1998; 7:791-800 [0441]Compton J. (1991) Nature. 350(6313):91-92. [0442]Davis L. G., M. D. Dibner, and J. F. Battey, Basic Methods in Molecular Biology, ed., Elsevier Press, NY, 1986 [0443]Dempster et al., (1977) J. R. Stat. 
Soc., 39B:1-38. [0444]Dent D S & Latchman D S (1993) The DNA mobility shift assay. In: Transcription Factors: A Practical Approach (Latchman D S, ed.) pp 1-26. Oxford: IRL Press [0445]Eckner R. et al. (1991) EMBO J. 10:3513-3522. [0446]Edwards et Leatherbarrow, Analytical Biochemistry, 246, 1-6 (1997) [0447]Engvall, E., Meth. Enzymol. 70:419 (1980) [0448]Excoffier L. and Slatkin M. (1995) Mol. Biol. Evol., 12(5): 921-927. [0449]Feldman and Steg, 1996, Medicine/Sciences, synthese, 12:47-55 [0450]Felici F., 1991, J. Mol. Biol., Vol. 222:301-310 [0451]Fields and Song, 1989, Nature, 340: 245-246 [0452]Fisher, D., Chap. 42 in: Manual of Clinical Immunology, 2d Ed. (Rose and Friedman, Eds.) Amer. Soc. For Microbiol., Washington, D.C. (1980) [0453]Flotte et al. (1992) Am. J. Respir. Cell Mol. Biol. 7:349-356. [0454]Fodor et al. (1991) Science 251:767-777. [0455]Fraley et al. (1979) Proc. Natl. Acad. Sci. USA. 76:3348-3352. [0456]Fried M, Crothers D M, Nucleic Acids Res 1981; 9:6505-6525 [0457]Fromont-Racine M. et al., 1997, Nature Genetics, 16(3): 277-282. [0458]Fuller S. A. et al. (1996) Immunology in Current Protocols in Molecular Biology, Ausubel et al. Eds, John Wiley & Sons, Inc., USA. [0459]Furth P. A. et al. (1994) Proc. Natl. Acad. Sci. USA. 91:9302-9306. [0460]Garner M M, Revzin A, Nucleic Acids Res 1981; 9:3047-3060 [0461]Geysen H. Mario et al. 1984. Proc. Natl. Acad. Sci. U.S.A. 81:3998-4002 [0462]Ghosh and Bacchawat, 1991, Targeting of liposomes to hepatocytes, IN: Liver Diseases, Targeted diagnosis and therapy using specific receptors and ligands. Wu et al. Eds., Marcel Dekeker, New York, pp. 87-104. [0463]Gonnet et al., 1992, Science 256:1443-1445 [0464]Gopal (1985) Mol. Cell. Biol., 5:1188-1190. [0465]Gossen M. et al. (1992) Proc. Natl. Acad. Sci. USA. 89:5547-5551. [0466]Gossen M. et al. (1995) Science. 268:1766-1769. [0467]Graham et al. (1973) Virology 52:456-457. [0468]Green et al., Ann. Rev. Biochem. 55:569-597 (1986) [0469]Griffin et al. 
Science 245:967-971 (1989) [0470]Grompe, M. (1993) Nature Genetics. 5:111-117. [0471]Grompe, M. et al. (1989) Proc. Natl. Acad. Sci. U.S.A. 86:5855-5892. [0472]Gu H. et al. (1993) Cell 73:1155-1164. [0473]Gu H. et al. (1994) Science 265:103-106. [0474]Guatelli J C et al. Proc. Natl. Acad. Sci. USA. 35:273-286. [0475]Hacia J G, Brody L C, Chee M S, Fodor S P, Collins F S, Nat Genet. 1996; 14(4):441-447 [0476]Hall L. A. and Smirnov I. P. (1997) Genome Research, 7:378-388. [0477]Hames B. D. and Higgins S. J. (1985) Nucleic Acid Hybridization: A Practical Approach. Hames and Higgins Ed., IRL Press, Oxford. [0478]Harju L, Weber T, Alexandrova L, Lukin M, Ranki M, Jalanko A, Clin Chem 1993; 39(11 Pt 1):2282-2287 [0479]Harland et al. (1985) J. Cell. Biol. 101:1094-1095. [0480]Harlow, E., and D. Lane. 1988. Antibodies A Laboratory Manual. Cold Spring Harbor Laboratory. pp. 53-242 [0481]Harper J W et al., 1993, Cell, 75: 805-816 [0482]Hawley M. E. et al. (1994) Am. J. Phys. Anthropol. 18:104. [0483]Henikoff and Henikoff, 1993, Proteins 17:49-61 [0484]Higgins et al., 1996, Methods Enzymol. 266:383-402 [0485]Hillier L. and Green P. Methods Appl., 1991, 1: 124-8. [0486]Hoess et al. (1986) Nucleic Acids Res, 14:2287-2300. [0487]Huang L. et al. (1996) Cancer Res 56(5):1137-1141. [0488]Huygen et al. (1996) Nature Medicine. 2(8):893-898. [0489]Izant J G, Weintraub H, Cell 1984 April; 36(4):1007-15 [0490]Julan et al. (1992) J. Gen. Virol. 73:3251-3255. [0491]Kanegae Y. et al., Nucl. Acids Res. 23:3816-3821 (1995). [0492]Karlin and Altschul, 1990, Proc. Natl. Acad. Sci. USA 87:2267-2268 [0493]Khoury J. et al., Fundamentals of Genetic Epidemiology, Oxford University Press, NY, 1993 [0494]Kim U-J. et al. (1996) Genomics 34:213-218. [0495]Klein et al. (1987) Nature. 327:70-73. [0496]Kohler, G. and Milstein, C., Nature 256:495 (1975) [0497]Koller et al. (1992) Annu. Rev. Immunol. 10:705-730. 
[0498] Kozal M J, Shah N, Shen N, Yang R, Fucini R, Merigan T C, Richman D D, Morris D, Hubbell E, Chee M, Gingeras T R, Nat Med 1996; 2(7):753-759
[0499] Lander and Schork, Science, 265, 2037-2048, 1994
[0500] Landegren U. et al. (1998) Genome Research, 8:769-776.
[0501] Lange K. (1997) Mathematical and Statistical Methods for Genetic Analysis. Springer, New York.
[0502] Lenhard T. et al. (1996) Gene. 169:187-190.
[0503] Linton M. F. et al. (1993) J. Clin. Invest. 92:3029-3037.
[0504] Liu Z. et al. (1994) Proc. Natl. Acad. Sci. USA. 91: 4528-4262.
[0505] Livak et al., Nature Genetics, 9:341-342, 1995
[0506] Livak K J, Hainer J W, Hum Mutat 1994; 3(4):379-385
[0507] Lockhart et al. Nature Biotechnology 14: 1675-1680, 1996
[0508] Lucas A. H., 1994, In: Development and Clinical Uses of Haemophilus b Conjugate;
[0509] Mansour S. L. et al. (1988) Nature. 336:348-352.
[0510] Marshall R. L. et al. (1994) PCR Methods and Applications. 4:80-84.
[0511] McCormick et al. (1994) Genet. Anal. Tech. Appl. 11:158-164.
[0512] McLaughlin B. A. et al. (1996) Am. J. Hum. Genet. 59:561-569.
[0513] Morton N. E., Am J. Hum. Genet., 7:277-318, 1955
[0514] Muzyczka et al. (1992) Curr. Topics in Micro. and Immunol. 158:97-129.
[0515] Nada S. et al. (1993) Cell 73:1125-1135.
[0516] Nagy A. et al., 1993, Proc. Natl. Acad. Sci. USA, 90: 8424-8428.
[0517] Narang S A, Hsiung H M, Brousseau R, Methods Enzymol 1979; 68:90-98
[0518] Neda et al. (1991) J. Biol. Chem. 266:14143-14146.
[0519] Newton et al. (1989) Nucleic Acids Res. 17:2503-2516.
[0520] Nickerson D. A. et al. (1990) Proc. Natl. Acad. Sci. U.S.A. 87:8923-8927.
[0521] Nicolau C. et al., 1987, Methods Enzymol., 149:157-76.
[0522] Nicolau et al. (1982) Biochim. Biophys. Acta. 721:185-190.
[0523] Nyren P, Pettersson B, Uhlen M, Anal Biochem 1993; 208(1):171-175
[0524] O'Reilly et al. (1992) Baculovirus Expression Vectors. A Laboratory Manual. W. H. Freeman and Co., New York.
[0525] Ohno et al. (1994) Science. 265:781-784.
[0526] Oldenburg K. R. et al., 1992, Proc. Natl. Acad. Sci., 89:5393-5397.
[0527] Orita et al. (1989) Proc. Natl. Acad. Sci. U.S.A. 86: 2776-2770.
[0528] Ott J., Analysis of Human Genetic Linkage, Johns Hopkins University Press, Baltimore, 1991
[0529] Ouchterlony, O. et al., Chap. 19 in: Handbook of Experimental Immunology D. Weir (ed) Blackwell (1973)
[0530] Parmley and Smith, Gene, 1988, 73:305-318
[0531] Pastinen et al., Genome Research 1997; 7:606-614
[0532] Pearson and Lipman, 1988, Proc. Natl. Acad. Sci. USA 85(8):2444-2448
[0533] Pease S. and William R. S., 1990, Exp. Cell. Res., 190: 209-211.
[0534] Perlin et al. (1994) Am. J. Hum. Genet. 55:777-787.
[0535] Peterson et al., 1993, Proc. Natl. Acad. Sci. USA, 90: 7593-7597.
[0536] Pietu et al. Genome Research 6:492-503, 1996
[0537] Potter et al. (1984) Proc. Natl. Acad. Sci. U.S.A. 81(22):7161-7165.
[0538] Ramunsen et al., 1997, Electrophoresis, 18: 588-598.
[0539] Reid L. H. et al. (1990) Proc. Natl. Acad. Sci. U.S.A. 87:4299-4303.
[0540] Risch, N. and Merikangas, K., Science, 273:1516-1517, 1996
[0541] Robertson E., 1987, Embryo-derived stem cell lines. In: E. J. Robertson Ed. Teratocarcinomas and embryonic stem cells. a practical approach. IRL Press, Oxford, pp. 71.
[0542] Rossi et al., Pharmacol. Ther. 50:245-254, (1991)
[0543] Roth J. A. et al. (1996) Nature Medicine. 2(9):985-991.
[0544] Roux et al. (1989) Proc. Natl. Acad. Sci. U.S.A. 86:9079-9083.
[0545] Ruano et al. (1990) Proc. Natl. Acad. Sci. U.S.A. 87:6296-6300.
[0546] Sambrook, J., Fritsch, E. F., and T. Maniatis. (1989) Molecular Cloning. A Laboratory Manual. 2ed. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.
[0547] Samson M, et al. (1996) Nature, 382(6593):722-725.
[0548] Samulski et al. (1989) J. Virol. 63:3822-3828.
[0549] Sanchez-Pescador R. (1988) J. Clin. Microbiol. 26(10):1934-1938.
[0550] Sarkar, G. and Sommer S. S. (1991) Biotechniques.
[0551] Sauer B. et al. (1988) Proc. Natl. Acad. Sci. U.S.A. 85:5166-5170.
[0552] Schaid D. J. et al., Genet. Epidemiol., 13:423-450, 1996
[0553] Schedl A. et al., 1993a, Nature, 362: 258-261.
[0554] Schedl et al., 1993b, Nucleic Acids Res., 21: 4783-4787.
[0555] Schena et al. Science 270:467-470, 1995
[0556] Schena et al., 1996, Proc Natl Acad Sci USA, 93(20):10614-10619.
[0557] Schneider et al. (1997) Arlequin: A Software For Population Genetics Data Analysis. University of Geneva.
[0558] Schwartz and Dayhoff, eds., 1978, Matrices for Detecting Distance Relationships: Atlas of Protein Sequence and Structure, Washington: National Biomedical Research Foundation
[0559] Sczakiel G. et al. (1995) Trends Microbiol. 3(6):213-217.
[0560] Shay J. W. et al., 1991, Biochem. Biophys. Acta, 1072: 1-7.
[0561] Sheffield, V. C. et al. (1991) Proc. Natl. Acad. Sci. U.S.A. 49:699-706.
[0562] Shizuya et al. (1992) Proc. Natl. Acad. Sci. U.S.A. 89:8794-8797.
[0563] Shoemaker D D, et al., Nat Genet. 1996; 14(4):450-456
[0564] Smith (1957) Ann. Hum. Genet. 21:254-276.
[0565] Smith et al. (1983) Mol. Cell. Biol. 3:2156-2165.
[0566] Sosnowski R G, et al., Proc Natl Acad Sci USA 1997; 94:1119-1123
[0567] Sowdhamini et al., Protein Engineering 10:207, 215 (1997)
[0568] Spielmann S, and Ewens W. J., Am. J. Hum. Genet., 62:450-458, 1998
[0569] Spielmann S. et al., Am. J. Hum. Genet., 52:506-516, 1993
[0570] Sternberg N. L. (1992) Trends Genet. 8:1-16.
[0571] Sternberg N. L. (1994) Mamm. Genome. 5:397-404.
[0572] Stryer, L., Biochemistry, 4th edition, 1995
[0573] Syvanen A C, Clin Chim Acta 1994; 226(2):225-236
[0574] Szabo A. et al. Curr Opin Struct Biol 5, 699-705 (1995)
[0575] Tacson et al. (1996) Nature Medicine. 2(8):888-892.
[0576] Te Riele et al. (1990) Nature. 348:649-651.
[0577] Terwilliger J. D. and Ott J., Handbook of Human Genetic Linkage, Johns Hopkins University Press, London, 1994
[0578] Thomas K. R. et al. (1986) Cell. 44:419-428.
[0579] Thomas K. R. et al. (1987) Cell. 51:503-512.
[0580] Thompson et al., 1994, Nucleic Acids Res. 22(2):4673-4680
[0581] Tur-Kaspa et al. (1986) Mol. Cell. Biol. 6:716-718.
[0582] Tyagi et al. (1998) Nature Biotechnology. 16:49-53.
[0583] Urdea M. S. (1988) Nucleic Acids Research. 11:4937-4957.
[0584] Urdea M. S. et al. (1991) Nucleic Acids Symp. Ser. 24:197-200.
[0585] Vaitukaitis, J. et al. J. Clin. Endocrinol. Metab. 33:988-991 (1971)
[0586] Valadon P., et al., 1996, J. Mol. Biol., 261:11-22.
[0587] Van der Lugt et al. (1991) Gene. 105:263-267.
[0588] Vlasak R. et al. (1983) Eur. J. Biochem. 135:123-126.
[0589] Wabiko et al. (1986) DNA 5(4):305-314.
[0590] Walker et al. (1996) Clin. Chem. 42:9-13.
[0591] Wang et al., 1997, Chromatographia, 44: 205-208.
[0592] Weir, B. S. (1996) Genetic Data Analysis II: Methods for Discrete Population Genetic Data, Sinauer Assoc., Inc., Sunderland, Mass., U.S.A.
[0593] Westerink M. A. J., 1995, Proc. Natl. Acad. Sci., 92:4021-4025
[0594] White, M. B. et al. (1992) Genomics. 12:301-306.
[0595] White, M. B. et al. (1997) Genomics. 12:301-306.
[0596] Wong et al. (1980) Gene. 10:87-94.
[0597] Wood S. A. et al., 1993, Proc. Natl. Acad. Sci. USA, 90: 4582-4585.
[0598] Wu and Wu (1987) J. Biol. Chem. 262:4429-4432.
[0599] Wu and Wu (1988) Biochemistry. 27:887-892.
[0600] Wu et al. (1989) Proc. Natl. Acad. Sci. U.S.A. 86:2757.
[0601] Yagi T. et al. (1990) Proc. Natl. Acad. Sci. U.S.A. 87:9918-9922.
[0602] Zhao et al., Am. J. Hum. Genet., 63:225-240, 1998
[0603] Zou Y. R. et al. (1994) Curr. Biol. 4:1099-1103.
Sequence CWU
1
37 1 37889 DNA Homo sapiens misc_feature 1..2000 5' regulatory region 1 ggccccaagc
ccctaggcct gagggcatcc tggaagtctg gcttgcagca ccctacacct 60gcctcccttc
atatggattc cctgagccat gaaaactctc ggaatcttca ctgagtgttc 120ttagcgctca
gtgccctctt ctaggcctgt ccccacccca gcctactgct ggaagggaga 180aggagggagg
ccttttcccc ttcatactct tccttgtccg ccttcctcta cttgggggcc 240tcttcgacat
cctccatctg ccttccctgg ggtggtatcc cttttgatga gccccagcga 300ggaggcaagg
aggaagatcc taaaaaaaac ttatggagga agttacgtcc tggcaggaga 360gatatagagg
gcttggaccc catctgccca gtcccaggaa ttgagcacac acataactac 420acagacctgc
tgcagtgcac acaatcagtc ttccacagat gtacctacac aaatacacaa 480tcggggctca
ggtgtctggg ggcacagaca caacaggttg cctagatcca ggtgtgtgtg 540cacacaactc
acgcaagaaa cacctgcact tggcttgtag gagactaaaa atcccaaaca 600tgctatacct
actaggtttg gggacaaagt aaaagtgaaa ataatctcgc tgataagact 660aaggtgtctt
cccagatctg atcaggagtc caatgtgaat aggagacaac agagagctgt 720ctttgtactt
tggctctgag gaaggtttgt ggaatggacc gcgtgtggcc tgagagagct 780cgcatttcag
ggaatgcctg ggcttgatgc ctgttatctt gttcagtgcc tctgataaat 840taataaactg
gaatctcact caacaattgg tttcacgtga gtttggcatg tgtgcactcg 900cttcttggag
gctgactgtc ccaaggaaaa aaattaaact gtttcctggc cagaaaatgg 960tgaggactca
gggtccaaag tagaaattgc ctctgtgggc tgggcccctt ccctgagcag 1020atgccctact
caggttgggg caaaaaccag ttcctctgac agaactaact ggccactctc 1080catgaaaggt
ggtggctctc acaaggtatg caaatgtggg aggctcattc caaatgccag 1140ccaccactcc
ccagaggcag cctggagcaa tgggctttat atacacacac atgcatttcc 1200ttagcccctt
ctctctgttt cacagaggag gaaaccgagg ctcaccatgc tccttttggc 1260aaagactctt
ttttagctga gattgtaacc caggtctgtc tcaccatgga aggaggaggg 1320atcccaggtt
ggtgtctgca gcccctgtaa caccacctca aaagtagagc tgaccaagcc 1380ggtggtccag
gagggaaatt tgagaaaggg ctaccccgag ccctccagtt ccaaggagta 1440ggacctaggc
cttacccagc aatgcccaca gtctctagcg gctctgctcc aaagggaatg 1500gaatttgctc
tgacttctct ctaggggaaa cgcaggcgca gaaggccagg ggcctggagg 1560gctaactcca
gcctcagccc cccatagtgc cccaggcctt ctggcttcat tccagcctag 1620cccgaagggg
cccacatctg cttgctcaag ggccccagct ggagggcagc gaggcagggg 1680cggagaaggg
gactcctttg tggcgctcac tcccaggggt tcgtggggat caatcatctg 1740aaagctggag
agacttgtaa ttttattccc tgggaggagg agcctgagaa gtcggcacgg 1800gaggagggga
tgggatggag aggggaggac ggccagggga gccgttcggg taggggaggg 1860acttggctag
atctcccaag ccttctccac tcttcaaact ttcaccaaac tcttagcccc 1920cgcctccccg
aaaccaaaac acaacaggct gtggaagagc tgcgagcggc gcggcggggc 1980gggcgaactc
ggcgcggcac agagcctcgg gaggctgatg caactttccc tttaagaaag 2040ccacctgggc
gcaccgcggt gcggacccag cacgcctggg ccgggggctg cagcatggta 2100agggggctgg
gggaagggac gctggggccc gcgcctccgt ctgtcctgcc gtcggtctga 2160gtgttcggcg
gctccgcggg gctgagcccc aggaccagcg tgtgcgtgtg tgtgtgtgtg 2220tgtgtgtgtg
tgtgatcgcg catgtatgcg ggcatctgtg ggtaagagtg tgggtgcacg 2280caccgggatg
tctgtctcgc cgcaaaggtc ggggtaggtg cgcgcgtctg ggtgtcaccg 2340ctgtcaccaa
ggagctgaag cgggagggag agggaagtgg ggggagctcc tgcgtggggt 2400ggcggggggg
cgccagacgc cggttcgatc gggggcggcg ggcgcaggct gtgcgggtag 2460cgccggagcc
agagcgtggc actgtccgtc cgaggagcca gtggttctgc gctctggggg 2520cctgcagggg
tcgacagtcg gccgtgcctg cacccagcgt gcgggctccc gctcgctcca 2580tcgcctctca
gcgaccgcca gcctggctgc ctatctggcc cctccagccg gcctgcctcc 2640ctatctgccc
tcccggtgcc ggttccagtg cccgtgcgcc cgcgggaccc gggaagccgc 2700gcgggagagt
cctggcgcac tcagcctctc tttgtccgcg ctctggccgc ggctttggga 2760agggactgcg
agcgagccgg gagttctctg ggggttgcta ccctgggaag aatccgtgcc 2820ctccccaccc
caaccctcat taccggcaca cctagggcgg ggccgggcgg ccgcaggtga 2880gggcaggggc
ctagcagccc tggagagccc acccccggcc ggcgctccgc acccggcagc 2940cactgcagat
ctcgagcagg cggcggggcc gggccgggcg ggagagtggg gcttgcccgg 3000agggccggga
ggggccggtg ccggcccggg ggcggggccg tggggcgggg acctcggcgc 3060cccggtactt
ttccaccgca ggggccgggg gcggggccgc cgcctccacg ccagcttagg 3120cccacacgtg
gcggctgatg taacgcccgt gcccccacat ctctctcccc gattaaagcg 3180gacgcgcccc
acttcgagga accgggacgc tgcgggcggt gccgctgctc ccagaccgac 3240cgcgcccctg
cccgccggcc ggaggcctcc gcgcctcggg ccggcggata cttaaaggag 3300ccgcaccccc
tttcgggcct gctctccgca agccacaccc cctcgcggat cactcttaaa 3360ggagacggga
cgcttgctcg ctgacagctt ccgcacaaag cgcggtctat ttctccgctg 3420aaaggagcgt
gagggtgcgg ccgttgggaa ataaaggctg cagggcgagg agtggggaaa 3480gagaaaaacc
tgaaaggtta aagactgcca aagtgaggga gcatcagaag gttggagagt 3540agtgggtgag
gggactggtt ctagggacta atgggcgaca ctgagcatgc ccaagtgttt 3600ccaaggaccg
cagagaaaac gcccccttta cttggcctat gcggcgactg cggctgtacc 3660tgctcagttc
ccggctctgg ggacacagaa gaatcacgcc tctgccctgg tagaactctc 3720catctagtgg
ggagccagac ttgggcacaa ggaagcatgc ctcaattgct acgtgtggca 3780agcgtggggg
tacagtagga aggctgacag caacggggag gacctaggaa ggcttcatga 3840agataatagc
attgggatag gttttgagga tgggtaggat acatacaggt ggaagtggag 3900acaagggtgc
tcaaaaaaca tggaacgggt gccggggcaa aggcatgaag caggagccct 3960ctgcaagtgc
tgggtagtct cacgtggcgg cagatggccg caggctgctg acctgggctc 4020ccaggtagaa
ctaagggctg tttcctgctc ccacagctcc tccacttcct ccaacacagt 4080gaatggaaat
tatgcttctt cctctgtcta ctaaactgtg cggtccttca agctaggaaa 4140agagcctgtc
acctttccat cccccgtgag gactccaggg cttggccaag atggaccgtt 4200aaaactgatt
gacctgacag tcctgtctga tactcagcgg tgagactgtc aaggtgggcc 4260agggttctgt
ggctgatgtc tcaacatctg gctcacaagc agctcagacc agcttgtcca 4320agccagccct
cctgtccccg caccaaacct gcagcctcac cagcattcgt cctctgggtc 4380gggcaccggt
ttcctcccac tgcaccagca gaaaccagag actccttctt atccctcccc 4440ctacccagtc
cgattcatca ccaagtctcg ccatgttgcc ttttatattt ctctgaaatc 4500tgtgcacctc
tcctcaccct gttgtcacct tcctaacctg agccccttca tttttggcct 4560ggctccttcc
ttccttcctt ccttctttcc ttccttcctt tctttctttc tctctctctc 4620tttctttttt
ctttttcttt ttcttttgac ggagttttgc tcttgttgcc caggcaatag 4680tgcaatggca
caatcttgac tcactgcaac ctctgtctcc caagttcaag caattctcat 4740gactcagcct
cctgagcagc tgggattaca ggtgcctgcc accacgccca gctaatttcg 4800tatttttagt
agagacaggg tttctccatg tcggtcaggc tggtctcgga ctcctgacct 4860caggtgatcc
acccacctcg gcctcccaaa gtgctgggat tacaggtgtg agccactgcg 4920cccagcctgg
cctggcccat ttctaattga cttcctctct tcaacttcag cctccctcca 4980gtcctttctc
cacactgtat ctagggggat ggtttcaaga tgcacattcc accaatggct 5040cctccttgaa
gttgatcttc tgcttaaaag ctttcaatag gcaccttgcc ctcaagatga 5100agaccaaaat
ccagacccca aaggccctgg ccaggcctct ctgtccactt taatctttga 5160ccatttcccc
ctgactgtct gagctccagc ctcagagcct ttccacatat tgttccattt 5220gcttgactct
ctcttcctct tctcttcctc attattgcca ctttgcctaa ttaacttctt 5280tagaccctta
aacatcactt catcagggaa gccctcccct agtcttacac aaggccaggc 5340ccccctattg
tagagtctca taacacaatc tttgtctttt acacccttta tctcagtagg 5400cagttcttta
ttttttattt tttgagataa gagtttcact ctttttgccc aggctggggt 5460gcaatggcgt
gacctcggct cactgcaacc tctgcctcct ggattcaagt gattctcctg 5520cctcagcctc
ccaagtagtt gggattacag gcgcctacca ccacgcccag ctaattttgt 5580gtttttagta
gagacgggcc ttcaccatgt tagccaggct ggtctcgaac tcctgacctc 5640aggtgatttg
cctgccttgg cctcccaaag tgctaggatt acaggcgtga gccaccactc 5700ctggtctcag
tttgtaattc tttaaatttt agtcattggt ttaactcttg ccccttagtc 5760tgtaagctgc
aggagtgcag atacctttga gttcttggtg cctagcacag gtctggtcca 5820gagcagacct
gcaatcaata ttgtggaata aatgagtcca gctgtggcca gccttaaacg 5880cctgctaagc
aatttaggcc atatcctgtt ggctgtggaa aactgttgga gagaatcagg 5940gatttagggc
tttgggaaga ttaacagatt ggaggggagg aacatgtgga gggcttgtgg 6000aaaggataaa
aattaggcag aaacttcaga acttacgagg aaactactgt tagccccatc 6060ttccagactg
agaaaactaa ggctcaaaga ggttaagtaa cttactcgaa atcacatacc 6120tggtgagtag
ctgagccaga aatccagccc aggcctacca gatcccagag cctgagctct 6180ttatcatgtg
ctaaactgcc tctgtgaccc actactgctt gtcaacagtg gcccacaagt 6240gcttcagggg
catgtgtggg gcctggtgtt gtggatcttc cacacgtaga agagactgga 6300cgccacgggc
aatcagttgg tgttttctga atggacggat gggaccatgt aacaggaatg 6360tcaaaggcag
gagaagcagc aaagctttac acccagctca ttctcctgga agccgccatg 6420aagcagtctt
ttccctccag tccacgaact acctgagggc agggactgca tttgcagtca 6480gaaatgtcag
agtcagagat gtagggtctg aaaacctagt tcaagtcctc actcgagcac 6540ttacaaggaa
ggcatgtgac ctctccgagt ctgtctctct gatggtaata ggaggaattg 6600gaggaattcc
aggataacct gggtgagggc cctctgcagc ctaagttctg agacacagtg 6660tcctcagttt
acttacctga ccagcgggct taataatttc caacctgtga cgtggctttt 6720ttgtggctct
tttttttttt ttagacaatg tcttgctctg tcgcccaggc tggagtacag 6780tggtgtgatc
tcggctcact gcaatctctg cctcccagat tcaagccatt ctcctgtctc 6840agcctcccga
gtagctggga ctacaggcgc ataccaccat acctggttaa tttttatatt 6900tttagtagag
acgggatttc gccatattgg tcaggctggt cttgaactcc tgacctcagg 6960tgatccgccc
gcctcggcct cccaaagtgc tgggattaca gccgtgagcc actgcttcca 7020gctgacatgg
ctctttctaa tgccatctgc tgttattact atcatcattg tcattattag 7080tccttgactg
cagacagatg gctttaagga agtcaaatat agagtttaca cctggagagg 7140gaggcacagg
ggtgagagag gacaaggtgt gatctacttt taagggagca cacccgcgct 7200tcccctactt
gcccccagaa gaccaaagcc atgagcagga acctctttag gaggacaagt 7260tgagccacag
gccagcctgg gtcacatctt gtccccaccc tgccctgaag agaagctctg 7320caggagacca
ggcagccagg gcaaagggcc taaaaatgcc aaggcctgat ttgcataggc 7380aaagagcagc
ctcctcatct ctcctccagc cttccagaga cttcctgcat ttgtaggtga 7440gaaggggccc
ttgacaggtg gctgtgggtg gggcagagtt gcccagagta gccttgggaa 7500cagagttgac
tccctgggaa ggcaagaaag gctccagaaa ggcagagccg aatccttcct 7560ggggtttcct
cccatgagtc aattcttttt tttttttttt gagacggagt cttgctctgt 7620cacccaagct
agagtgcagt cgcgcaatct cggctcactg caacctctgc ctcccaggtt 7680ccagcgattc
ttctgcctca gcctcccgag tagctgggat tacaggcacc tgccaccatg 7740cccagctaat
ttttgtaatt ttagtagaga cagggtttca ccatgttggc caggatggtc 7800tcaaactcct
gagctcaggt aatccacctg ccttggcctc ctaaagtgct gggattacag 7860gtgtgagcca
ccacgtctgg ccaagataac tttttttttt ttttttttga gacggaacct 7920tgctctgtcg
cccaggctgg agtgcaatga ggcaactctt taaggctgtc tgaagatgga 7980tttctcccga
ggcctgtcaa aaactaacct ttcttccgtc ccatctgaca ttaatccagt 8040gtgcaccagc
attgccatga gactgttcct tccacgtggc tgctgccccc cagctcgcca 8100gccccacctt
gccactcaga tgcttcccac aagatcctgg gctggcctgg acccctcatt 8160tcagaggtcc
cgacttcgct atttccctcc atttcaattc aaataacaaa cacatcttgc 8220aagtccagtg
ctgtgctaca tgctgggcac agagggtaaa agaaacacag tcccaagccc 8280tcggggaact
cactgactca tagttgacga acttataagc aaataagtgt catatgaaat 8340aagagtactc
gtagagattt gcacaaaaca gaaaggagga agcaactttt tcattctttt 8400gagagagagg
gtcagaaaag gctgcagaga agaggtgaca tttagcccgg tgcagtgccg 8460aagatttgta
gaatttcacc tggtgaactg ctgtggaggg gacagctggg agtgcaagga 8520ggtgcggctg
cacatgggtg tgtggcaggc ccagaactcg gttcaagtga atgacatgac 8580aacaaaccca
gctgccctgg agcccacact gagtgccacc cagataggaa aggaccatcc 8640tggctgcctg
gctggggtga ggatggggag agtgtgctga tctgagctga ttcccctcct 8700ggggaaaggc
tgaagccatt cacacagacc atgaaggaat tggactccat cccagtttgg 8760tattcatttt
ttatctttta taggattggg catctggaaa gtcaaaaata gacctttgcc 8820cattgggaca
ggaagcctgg agatcaagag atccagggga aaggttgggg ggactgggaa 8880ggagctggat
ctgagggtct tccagaaagg agcaggatat ggacatggcc tggcagaggg 8940cctgggcagg
ctggtggtgg gtggctcggc actacctgtc tggagagccc tgtgggccac 9000acggccaagt
cctcacctca ccctctgctc tctccacagc tcttgagatc tgtggcctga 9060aaggcgctgg
aagcagagcc tgtgagtgtg gtccccgtca ccagagcccc aacccaccgc 9120cgccatggta
ggaaaaggtg ccaaagggat gctggtgagt acagaggcca gactgcggga 9180aggaacggga
gtcccaagcc ctgcccccag gacctcctgc cagctgcttg gaaggcttag 9240gaaggccggc
actgccaggg ctcctgagtg cccagctgac cccggcctgg gcaggtagcc 9300cctcatctag
caggtcagag agcatcaggc tccaagcccg cagagcccgc aaagacctgc 9360cagaaggcag
gaggtgtggg ccccatcctt gtcacagagg tgctccagac agaaggaggc 9420cactcatatg
ctgagaggac agtggacgcg gggctgtgca ggcgccttca gtctgtgctg 9480atcaaggggt
cctccctcag accctgcctg gtttttctcc acttctttgg gtccttctca 9540gtctgaaaag
taacaaagaa cattttgttt tgaatcctct cctgaggcct cccagaaaat 9600gtccctccac
cctttcccca agtcctggct cttctttgcc ctcagcctta gagcacagga 9660gaggtagtgg
tgaggccacg agacacggga gaggtagtgg tgaggccacg agacacagag 9720gggtggatga
gattcttttt atttttttta atttttttga gactaagtct cactctgttg 9780cccaggctag
agtgcagtgg cgtgatctca gctcactgca acctctgccc ctgccgggtt 9840caagcgattc
tcatgcctca gcctcccaaa tagctggggt tacaggcatg agccaccatg 9900cctagctaat
tttgtatttt tagtagagac ggggtttcac catgttggcc aggctggtct 9960caaactcctg
accttaggtg atccacacgc ctcggccttc caaagtgctg agattacagg 10020tgtgagccac
cgcacccggc ctgggtgagg ttcttgaagg agcaagactc gctgatggat 10080ggaccaggtc
agcaccagaa gcaagaacca agaaaatgat gccaaatcct gtcccagagg 10140caatggttgt
cttccttctc ttgagatctt ccctggggac aggctagact ggacggctgc 10200cgggggtggg
gacctgggac tgagcatgga cagtccctcc ctcagcagct cccatccttt 10260caatcccctg
tgttgctttt ctcatttagt gaaagagtca ttggcctggt gtctctactg 10320agccctcccc
aaacgagccc tgtctctcac caaagatttc tgggactcaa agtacctggt 10380ttaaaagaac
aacttgggct gggcagggtg gctcacgcct gtaatcccag cactttggga 10440ggctgaggag
ggcagatcac ttgagatcag gagtttgaga ccagcctgac caacatggtg 10500aaaccttgtc
tctactaaaa atacaaaaat tagccagatg tggtggtgcg cacctgtaat 10560cccagctact
tgggaggctg tggcacgaaa atcacttgaa cccgggttgc agtgaggcaa 10620gcgccactgc
actccagcct gggcaataga gcagggctct gtctcaaaaa aaaaaaaaaa 10680aaaaaaagaa
agaaagaaag aaagaaagaa aaattaaaaa acagcagttt gtgggagaga 10740accggtggtt
ctgctccaag ctcaggatct tctatctggc cctgttctga acctccagca 10800tccaacccct
ggccccaacc ctcaggccag ctctggtcta gccccctact gattgactga 10860ttggagcaag
tctgattgaa ccaagctcct gaaagtattc ctgagattcc tgaaatctgg 10920gtgctcagaa
atgggcccag atggagatct tatgacctga acacttgcta ggggtgaatc 10980ccatctgaga
acccaaacat ggccccacct ttcctggaga agtcttggaa ttgccatggg 11040cttgtgataa
gaagcggggt gtggccctgc cccttcctac catggagatt ctgagcccca 11100tgaaacctct
gagtcctccc tggttgctct ttcttctctg gagtgacctg gaaggactgg 11160gaggagggca
ggcaggagag atgtgagcag cacagagggc atggctccca gtgtcccaga 11220gtccctgcct
gagctgagct tcccggactg tgctctgccg gtcggcccgg ccagcctgtc 11280tatggatggg
aatgctgcct tgggactcct gttcccagca ccaggatgtg cctaaagata 11340gctggggctc
gctctcctcc ccgctctctg ccctttggct tttgcccact gttgctgcgt 11400ttgaggccca
tagcatctgt cccatgggca cccccagtgg ttgcttggag attctattcc 11460tgcttccttt
tcagttctgc tgatgacctt ggagtactcc actctattct tgggttccag 11520tttgggactg
aagccacctc cattcagtgc ttgtctctag acacagcctc tgaggtgcat 11580agggcagctg
gtccctgctg agccccgtct cttctaacac aggcagccca gtcagcacat 11640ttgtgtcttc
tgtgtcttcc tatggctagt aactaaggaa gcttcattct gacagggttc 11700ttctttggat
accctggtag atcagaagag aggatgttaa ggggcgtgtg tgtgagccat 11760gcacttggac
catcccagtt ctccgtggct tctgctgctc tatttgtgcc ctggcatgca 11820tggagcaggc
actggcccct accatgggat cctcacttct atcccccttt acccaggttt 11880gggtgtgagg
acactggccc tccaagtctc ttcagtgtcg gcagacaccc tgtgttatcc 11940ttggagcttc
tgctggatgc tacagccagt gccccatggg taacaagcag cttaccagca 12000cgtgcccagg
cccacatgcc tgctgcacgt gtccctccct cactccagcc cagatggtct 12060ctgccaggcc
agctcagctg cctccctcac gcctcagcag ctcctaggac agggctgtgc 12120gagaacccag
ctggccaggg tgcctccaag cctgggagca gctgggatgc ttgggggcac 12180tagggggtgg
gtggggtggt cctcccgtgg agacctaggg cctcatacac gtgcggagca 12240gggccaccta
ggtctcaaag gacaaggggc tggaggagag ggacccagga caggaggcaa 12300gagagccccc
actagctcag ttttaatcct ccacccacca atgtcacacc atttcccagg 12360ctggccacct
cagtcccctc cctagtgcca ctgcccactc atgcagagtc ttgggcagtt 12420caggccttgg
tgagctttgt tgacctcttc ccatgtctga gcacactgat tgtttctgcc 12480actgggaagc
cttacaatga acaggaacag actcatttgg attagggttc tgggtgccag 12540agtgtccaaa
tccaaggagg ccttgcaagc accacaggcc gccaggggct gccccaccct 12600gacgcgttct
catatgtttt tccctgaggg tttgggcacc tcctgtctgt agcatccgtc 12660ataggacctg
gcacatagtg gggactcacc cagcccaagt gttggcatgt agcaagagct 12720caactagtgc
aggcttggtg cattattggg aggtcaccaa gcacagggtt tggcacacct 12780tggacactcc
cttggtagag ggcctgtccc tgcctgggca ctcacctaac cgaagtctta 12840gcgtgtggca
aatgctcaac tagtgcagac ttggcacaca ttaaacactc ccttagaaga 12900cagcctgtca
ccctgtaggc attcatctag cacactgcca ggtacacagt gggtgctcac 12960ccagctcaga
gctgggcgca cattgggtcc ttacctggca cagaattcga catacccggg 13020gtactcccct
agtacaggac ctagcacacc acaggggctc cccacctggg gctgacacac 13080aatgagtgct
gtctcagcat cggacttggc acacaagagg cactctgctg atcagcatcg 13140gatttggcac
acggtgggca ctctgctgat gcaggttctc acacacagca ggtgcttcca 13200gcacaagatt
tgacatccag taagtgctat cccggctcag ggcctggcac acagtgggca 13260ttctccttag
ataaggccta gtacagtgga ggtgctcttc tagccaggac agcctaagca 13320atgaccttcc
gctgatccta gattagcagc taccccatcc tccgtgaaat gggcaatatc 13380tagatccaga
ggaaaagttc cagccttcct cagcttcctc tctggtcggg ggccctgccc 13440tccctccccc
tccccacttc ctgtctagcc cacctactca accctgtagc ccggcctccc 13500acacagcctc
cctgcagccc tcccccatcc ccggcacttc tcagtcccag ctcagcggga 13560ccttccctcc
agagtcacag gggacaacaa agaccctagc ctccttgccc ctctggaagc 13620atgcccactc
ttattaggca aagtcagagg cgggagctgg gggagtcagc atccctaagt 13680ggggctagag
cagctggatc tcgctaatga ggcctcatta cggccagggc ccagctgggc 13740tgctgccgcc
tcctgtcccc atggccagag ccagggaaag ccgtccagcc atggctgcct 13800gttctgggca
cccggcccag gccagggcag ccagagccca gagcccaccc tcggtgcaag 13860tgagcatttg
ggcatcatgg gtgacgggac cttcctggag ccagtggccc agccagcgcc 13920ttggccctca
gtctgcgccc atgggaagtc aggggagtgt gtctggggac caggcgcatg 13980aggctggctc
agtgagcagc tgagtcggta ggatctgagg ccatccttag gacacttgct 14040tgagccaagg
cctgtcccca tcttccccaa accaacatgg agataataat ggacctacct 14100catgggacag
ctgggaggac ccaatgagtt aagatatggg aagttaactt gtaccatact 14160aggtgagtgc
ccacagggtg atgggctgtc tgtgaagcaa gcgtcccatg tatgctgagc 14220caggacagtg
cctggcccac agcaagcact tactatgtat tttctattac cattattact 14280aatgctcact
agattcaatg attggtgggt ggtttccccg ttttagagat ggagaaactg 14340agatccgaca
tgttaatgga cttgcctaag gttatatatg gtgagaaact gacaaagcta 14400gggtttgaac
ccaggactgt caaactctaa aggctgtgac tttttatctt attttaaaat 14460ttatttttta
ttaatttttt gaaacaaggt ctcactttgt cacccagttt ggagtgcagt 14520ggtgtgatca
tagctcacta caacttccaa cgtctgggct caagcaaaac tccctcctca 14580gtctcctgag
tagctgggac tataggtaca caccaccaca ccttgccagt tttttgtaga 14640gacacggtct
cactatgttg cccaggctgg tcatgaactc ctgacctcaa gcgaccctcc 14700tgccttggct
tcccaaagca ctggcattac aggcctaagc cactgtgccc cacctgtctt 14760tttattttta
tttattatta ttattattat tattattatt attattatta ttttagagag 14820agagttttgc
tcttgctgcc caggctggaa tgcaatggcg cgatctcggc tcactgcaac 14880ctccacctcc
cgggttcaag cgattctccc gcctcagcct cccgagtagc tgggattaca 14940ggcatgtgcc
accacgccca gctaattttg tatttttagt agagatgggg ttggtctcac 15000catgttggtc
aggttggtct caaactcctg accttgtgat ctgccctcct cagcctccca 15060aagtgctggg
attacaggcg tgagccaccg tacctggccc tgtttttaaa tttttaatgt 15120catcctgtgc
ttgacaaaac tccttatgtt ccctacgggg ttgccacact caccaaactc 15180acaggaaagc
tgttcttaca ccagaattca tgcacgtgtc catctgactg atatttcctg 15240aggtcctgtg
gtaagccagg ccttctgctg actgaccagt ttccaaaggg cctgacacat 15300tcctggcacc
ggaatgcata tgcccacctt ccgagccact ggtctgaggt atcttggggt 15360cagcaggaga
ctggccagtg tgtgtggcct cagagaaagg gactggatgg gtctgagtgg 15420catcccacta
cagtagtggt gactatagtc atggtgacga caccactgga cctaccagcc 15480cacgccacgc
acactcccag actagcacct cagcttctgg gcatcacggt gcaaatcctc 15540acgggttctt
attccacagt ccccctttgt ctggactctg ggatgacttg cccccttggc 15600taatgccatc
tttgtccaga cagctgacag cgtggttcac taggccaggg ctttcttgca 15660tggtcagttg
cattctccta ggaagacgcc ccatcaccct ggcagtttgg gagtccccat 15720ctctggttgg
tggggggtgt caggtacagc cctagggctt ggtttctccc agcccgcacc 15780cctccgtgga
gcctgagctg ctctggtggt gccctgtgga ctgtgccttc catcctcctg 15840ccggtcacca
tccagggccc aatggggcag accccgtcac tcatcagaca cccaactacc 15900ctagcaggag
cccacccagt gctaggcctg tgggctcaga cagcgtctgg tcagctccgt 15960ggggggagtg
agccaccctc tcgcctgctc tgccccagtc tgtgtgtgtg tgtgtgtgtg 16020tgtgtgtgtg
tgtgtgtgtt gttccagtta cttgctggga acaagggaaa aacacaacaa 16080accccacaca
ccccactcca gctccggagc acccgtgctg ggctgcatgg ggactggccg 16140gaggggcagg
gccaggggag cgggtaggca gagcttcggg aggagatgag gtgaaagtaa 16200ttgacgctgc
ccagcccggc agtgggagag gcaggggatg cgtcagtgtc gcgctggagc 16260tggcagaggt
gtgaatgagc ggcggagaca cgcgggctgc gatcgctcgc cccaggatgg 16320ccgcggctca
tggacctgtg gccccctctt ccccagaaca ggtcagtcac cttccaggaa 16380gttgagcccc
ctccttggtg ccaggcaggc tgagaggctg ggctgcggat ggggagaggg 16440gctgagtggt
ctgtgccagg gaggggcagg cacagccagc aagcaagaca gacaccagcc 16500aaagtgaccc
agaaagttgc aactgtgcaa aggagtgtag gaagagttag ctgtgtggtg 16560ggggtgaggg
ggaggagagt gggggtgtga gggctccccc agcccaagga cctgctgaga 16620atctgctctg
tgtatcaaag ctgcattgtc ctagctcccg gggagacggg ccggaggtgc 16680ccaggaattg
tactaatcaa tacagcctca ttgtctgtgc cacggcagtt actcctatgg 16740gaagagtggg
tgagggggca ggggaacgtg gtacatggtt tcagtggcaa ggggtggggg 16800aaggggatgg
ggcaagagtc tcagctggga agagagagac cccacagaac attttccagg 16860gtccagggca
agtgcccgca tgggctggga gccggtgtgt cctcatccct ggacgccggg 16920gacgggtgga
gatgtcagcg gatggcagcg agggtgagga gactcgaaag ctacacttct 16980cccctgtagg
acatcctgac ttgaccttca aaaggcaaga gagggaatgt cctctgaggc 17040gatgtgggat
gggtctggga ggaacagtac gcagggctgc cctcaagcct cagtactgct 17100agggtcaaga
accaaagcca ctttccagag tgcctgcctg ctgctagcca gtgagggtag 17160acgccgcccc
cagccgctgc agcacaggac atgaccattg cctccagtct tccctcctgg 17220gcttgtctgg
ggtgggaagc gaggagctgg ccagggtcca cccagggtcc tccatcactc 17280accccagggc
ctggagtgga gcccggcatg cctgggggtg ccagtcatct gtctgcagcc 17340cacaatattg
gcaggtccct tgctgccagt cagggctctg caatggataa gcacaaggcg 17400tattcttgcc
tctctcatct gctatatgaa ggcctggact ctgaaggcac ctggagtcct 17460gagaggtgcc
cagagatggc tctgaggttt gccttgaggc aggaggcatg aatggccact 17520gggggctgct
cggcccgggg gagattttcc catgttctgc ccctccaccc cggccccaga 17580ctctcccata
tccaggtaac tgaggccaga gatcagagct gcagctgggt ctctctgccc 17640ctactcagtg
ccatgtaggt gaaggacaga agattctgtc attgcagggt aggtgctgcc 17700tgtgggccag
gctgggccct gcactgatag gagagaacaa aaactgaggg gtgcctggcc 17760ggaaggaggt
cgtggggtct tctgaggcag ccacaattta ctgtaatggc tgttctgtct 17820agattcccca
ttggggctta gagaaagttg ccagtgatct gatttggctc aaagtcagtc 17880tgggatgtga
ggtcttgccc attctcctcc ataaagtttg tcttggatgg gacttcatcc 17940catgagcaaa
ccctggggct ggagagtcct aaacacaaag cctgctgggt cccagagggg 18000cccgcccatc
cagggcagct ccaggattcc tcagtgtcca cccagagcag gagggggaag 18060ctcacaattc
acaaaaatat ccactggcca ctgatctagg aggcgggagt gtcacaccag 18120agcacagtca
ccgtccaggg tggagagctg ggctccctga gtctgtgcaa cagcaaaagg 18180caggagtgca
tcctggggct ggcccggttc gctctgggag taggctttgt tgctgcattt 18240cctctttacc
tttttctgta ttttctaatt tttttgtgtg tgatagctta ttatgctttt 18300ataatcagga
aaaaaaaggg caaaattgtt tttagtaatc tagaaagcta gtgggaacta 18360atggaatgta
caggggaaag tcagaaatag caaaattcca ggtgagggag agctatggaa 18420tatgagtagc
tacagaggct atctagcctg ggaatctggc agatcgattg aaagtgaggt 18480tatctagtct
atggccatac catcctgaac gcacccgatc ttgtctgata gtgaggttat 18540ctagtctgtc
ccctgctttt gtagacagca gtgctgagat accaagagtg gaagtggctt 18600gcccagggct
ctactggagg agcgggccct agagcaccag ggctctcacc tgcaccggtg 18660tcccgcctct
gccaagccca gggctgagtg ggagatctgg attctgccct cggagctgag 18720cagccttaat
tccaggggct cagccagagc agtaaataag ccaagcagag ggctgtgtaa 18780aggacttatg
taagtgactt tgctcagtgt ggctctccca gctaggcagg gaaggggtgg 18840gaggctgtta
aaccaggaaa gatcataagt ccttgtgcca gggtggacac tcctgtctca 18900atgactggtt
ctaagggtaa gggaacatct gaggggtgtg tgtgtgtgtg tatgggtgta 18960tatgtgtcag
ccagctcacc ccaccccgcc atggaaggaa agcctccatt ccttccatgg 19020ctgggagagc
ttggggacag ggcagggtag tcctgtcctg gagaccacct gctgagccca 19080tggcagtggc
aagtggaggc tgctttgtcc tggcagtggc ctctcagagc tgccaggaag 19140ggtgacaggg
cacatagggg ctgcctaggg agcctgcaga agagggggcc aagccagagg 19200gaaccatgcc
gctatggttt aattccccac tgagatgtca gcaagtgagt gagggtgagc 19260agaggggggc
agatggtggt gatggtggtg ggggttagga gaagcctgga gcagagccca 19320tccatggggc
tgagctggct caggcagacg ggcagccagt ctgagtcaaa tcgtgagcac 19380aaagggattt
ttccgtgggt acctcagctg atgaccactg cagggacagt tcatatcagt 19440gacacagccg
tccacccagc agatcaagaa aaactagagg cttagaatag gaaagattat 19500gcaaggaggg
agaaaaaaag ccatacagta ttgaaaaaga tttagctgct caggagggaa 19560gtcagagggt
ggtccccaga ggagatctgg acaaggaagg gaactagaac acccagaaag 19620tgcctggatt
tttttttgga gagtctttcc tccagtttgc ttggagaaag gtaggcctgc 19680tctgtcgtcc
gctgagagct cagccctccc cagccaagcc acctgagcgt agacagccct 19740ggagttgaag
acagaggcca agcctgggcc agattcccgt ggtcgagcgc cacagtgctg 19800ggacccaggc
ccaagcagaa gtgggtgctg cttctgcaag gatcctctgc atacagcaga 19860cctttagtgt
tgtcagaaac ttcctgccag agccccgtcg ccaaccccgg ttcagccctg 19920tcgcgggggc
tgtggcctcc ctgggttaca gcccgcacgg ctaagctggt tagcacacgg 19980ctcattaacg
gagcagtggc tggctcttga gggctcaggc aggcggctgt ctgtggcacg 20040tggctggaag
catcaccacc ttccccgtac ctccccctgc cagggaagga gcagacttca 20100gggatccaaa
tgtggtccca cttggtcagg gggtatgggg catctgtccc tcatcctgct 20160gggaacagga
gacaggcctt gatcttcaag ccttgggaac ctggatctct gcaaagggcc 20220ttgcccttac
cctgcccatg ggtccatcct ggaccatcag ttgtcccttt tacagctgta 20280actgctgctc
tgtgacccca tccgtgtact gtgctgctgg cacaatcagg cattgtgccc 20340ctaagtaaac
aaccctggga cagaataaaa ctggtattgt agggtcaaca tataggagcc 20400ctatttggct
ggcatttgtt tgggaccaga tagaccaccc ccagcaggcc tcacaccctc 20460caaaaaatgg
gcccttgact ctgtgatctt gccacacctg ccccagtgaa ggccccagga 20520ggtggtgaca
tgctcagaca ggattctgta ggggcatgtc acagagggca cactcaggcc 20580ggggggcacc
ctctaccagg gccaagcatg tcttcaagcc tcttcatcag ggagaccttc 20640cttgacctca
gccctgcttt cctgccttgt cctccttcca tactcttgtg attttgagat 20700gatcccctct
gtggtcaccc tgaagccaca tctaaagatg atgggtgtat acatttcatt 20760gcacccccac
tagatttcca gctccgaatg ggtaggagct ctctgctgaa gtcccacttc 20820ctggcacagt
gcctggcaca tagccagtgt caataaatct gccattgcat gaatgaatca 20880atgaatgaat
gaaagaaggc ttgttcagcc actgatttgc tctgggtgct ccttgaagga 20940aacccatggt
ccctggcact gcacccctgc cctctgtgga gggtgtgagg atcacatctc 21000ggcagggtcg
atgctggatt gagagcacag ggtggctgcc ccacaaaggt gtctgtgctg 21060cagtgcccac
ccatcaggcc ttggtgcctc ctttgcccac attcccacct ggcctatcca 21120aggcgcttgg
ggcccaggga ggccattcat tccgtctgtg tttgttcatc agcatcaggg 21180tctgatagca
tttcacctgc accaggctgc cttgcttatt gaccagagtg actgctgtcg 21240aatatgtctc
tttcaagcgg agggtgaatg tcacatcact ggcatctctc tgaggctacg 21300gggctccatc
tctgatgcca ggtctgtctg tttctctcag gggaatagct cacctctggg 21360aggctgtctc
tggcactaag gtctgtctgt gtttctgttg gtgacactga gcccctgtct 21420ccttagtgag
accccttggg gcccttgggc tctgattaat ttcctgctca ctgtgggagg 21480gaggtggaag
gagcatggca cttattaagg gacttcttta accaagggac ttgcagagcc 21540agttccagct
tttgggcctg gggttccaca aggggctgtc ctcagtgggg cccaggccac 21600cagctcagtc
ctgcaggccc cacattcttc cctggggcaa atccccgcat ctgcctctcc 21660tgaccttcgg
ccccctcccc tgcaggtgac gcttctccct gttcagagat ccttcttcct 21720gccacccttt
tctggagcca ctccctctac ttccctagca gagtctgtcc tcaaagtctg 21780gcatggggcc
tacaactctg gtctccttcc ccaactcatg gcccagcact ccctagccat 21840ggcccaggtc
agctccccag ggcccctccc cagccatggg cagtgtagct agctccccgg 21900accctcccac
ctagagtccc agaatgccct ccccatgtct gtcccggctc agctgctccc 21960ccatttcccc
ccagccctgc ccactctgtg cccccaggaa gaagccttca gctgggggag 22020gggtctaacc
tcccctcccc cgcagtcgcc tgcccagggg cagcgtgctc agcctggtga 22080tgtggtaggc
gacgggtggg ggggctgtct ggacacaggc ccctttgtgc ccagcctgac 22140ttggctgctt
ggtatgaggc tcagctgctc ccagatcctt acccagaggg gcaaggggag 22200gttttggagg
ggcttggggg ctcctggccc tagctccctc ttcagtgtcc cgcctcccgg 22260atgctggtag
cagctcctca ggtgggctgt gcacctggca ctgggtgctg tggggctggt 22320gagggaagct
ctgcttggcg ccctctggca gccactactc tctgcttccc tcctttgcca 22380ggcctgcctg
aagctcctgc cagcagaggg gaagcccggg gaggggtgtt ggaggggcca 22440cggtttccct
gctgggggag gcccagaggg tcaggcatgg agcacaaatg ccttgctgat 22500ggtagcttct
ctcctgtccc cacagaatgg tgctgtgccc agcgaggcca ccaagaggga 22560ccagaacctc
aaacggggca actggggcaa ccagatcgag tttgtactga cgagcgtggg 22620ctatgccgtg
ggcctgggca atgtctggcg cttcccatac ctctgctatc gcaacggggg 22680aggtacccag
tgggcacagg gcagggacca cccacttggg tagagcccac ccaccagggc 22740ctgggccacc
ctctccctgg cacagaccca ctccagataa aggctggcct ctggccctgg 22800cctccttcgc
ccccatccct ggcttaggca ccctttgcct aaggcaggcc accccagacc 22860ctggcagtct
gttgttgggg gctggccctg ccagttggcc ccctttgagg tgaggagagg 22920tgagaaaagg
acctaacccc cccccccccc cgcccacaag gctccctgtt ctattccctg 22980gggttcagca
gcccccctaa atggttcagg gatgttgagg atctcagcaa cttatgcttg 23040tgaaatcgct
tgtttaaacc atcaaatccc acccaagtat ggaggctctg ggacctggat 23100tccactccaa
gcagaattcc tgagagcctc tgtcaagttg aaagacaggc tgggggtggt 23160gtgcctcagc
ccaggctcct gctcaagcct ggcacactcc tccctgggtc ctacccaatc 23220gcgaaggttt
tgctgttgag ttctgagcag gaagcccagg gtgagatccc tccaagccca 23280ctcctgagcc
agacctctcc ctgcctacca ggcgccttca tgttccccta cttcatcatg 23340ctcatcttct
gcgggatccc cctcttcttc atggagctct ccttcggcca gtttgcaagc 23400caggggtgcc
tgggggtctg gaggatcagc cccatgttca aaggtgaggc ctgggcgggg 23460cactcgctca
cacatacaca aacacacaac ggcctctgag ggccgtgcca actcgcccat 23520ctgtacaaac
atggctgacc tctgtagagc cctgatgctg accccaagcc ccacatcaac 23580ggcctaagcc
atggatgcat gctgggaccc atgtttactt ggctgccacc ctcccctcct 23640accacgtgta
cacacaaaac tcctttgcgc caccccccgc caacttaccc ctaccaagac 23700cttgacctgg
ccttcctttg gccctccatc tcagcctgcc ccttgggccc ttagaaatta 23760ttaacctcgg
gccaggcaca gtggctcacg cctgtaatcc cagcactttg ggaggccaag 23820gcaggcagat
cacctgaggt caggagtttg agaccagcct gaccaacatg gagaaacccc 23880atcttggccg
ggcgcggtgg cttacccctg taatcccagc actttgggag gccaaggcag 23940gcggatcaca
aggtcaggag atcaagacca tactggctaa tacggtgaaa ccctgtctct 24000actaaaaata
caaaaacaaa attagcttgg tgtggtggca ggtgcctgca gtcctagcta 24060ctctgggggc
tgaggcggga gaatggcgtg aacccgggag gcagagcttg tagtgagcca 24120agatcatgcc
acagcactcc agcctgggcg acagagggag actccatgtc aaaaaaaaaa 24180aaaaggaaac
cccatctcta ctaaaaatac aaaattatcc agatatggtg gcgcatgcct 24240ctaatcccag
ctactcggga ggctgaggca ggagaatcgc ttgaacctgg gaggcggagg 24300ttgcagtgag
ccgagatcgc gccattgcac tccagcctgg gcaagaaaag caaaaactcc 24360gtctcaaaaa
aaaaaaaaaa aaaaaaaaag gccgggcgcg gtggctcacg cctgtaatcc 24420cagcacttcg
ggaggccgag gcaggcggat cacgaggtca ggagatcaag accatcctgg 24480ctaacacagt
gaaaccccgt ctctactaaa aatacaaaaa aattagccag gcgtggtggt 24540gggcgcctgt
agtcccagct actcggtagg ttgaggcagg agaatggcat gaacccagga 24600ggtggagctt
gcagtgagcc aagattccgc cactgcactc cagcctgggt gtcagagcct 24660gggcaacaga
gctagactcc atctcaaaaa aaaaaaaaga aaagaaaaga aaaaaaagaa 24720attattaacc
tcacctcctc ctgctggatg ctgaggctcc tccccccacc cctcccctcc 24780tgcaggagtg
ggctatggta tgatggtggt gtccacctac atcggcatct actacaatgt 24840ggtcatctgc
atcgccttct actacttctt ctcgtccatg acgcacgtgc tgccctgggc 24900ctactgcaat
aacccctgga acacgcatga ctgcgccggt gtactggacg cctccaacct 24960caccaatggc
tctcggccag ccgccttgcc cagcaacctc tcccacctgc tcaaccacag 25020cctccagagg
accagcccca gcgaggagta ctggaggtga ggcacctgca ggacctgggg 25080tggggggaac
ctggtggcaa ccctgtcccc actgggcaac catctgaaga caaagtaggg 25140tgcctgctgc
aggttcagcc cgctgctggg ctgggagagg gtagaggtga gtgaagagtt 25200taggagaaga
gcctgatgta cagaaagcaa tggaagatgg aggccgggca cggtggctca 25260tgcctgtaat
cccagcactt tgggaggccg aggcgggcag atcacgaggt caggagatcg 25320agaccatcct
ggctaacacg gtgaaaccct gtctctacta aaaatacaaa aagagaaaag 25380aattagccgg
gcgtggtggc aggcacagga gaatggcgtg aacccgggag gcagagcttg 25440caatgagctg
agatcgcgcc actgcacttc agcctgggcg acagagcaag actccatctc 25500aaaaaaaaaa
aaaaaagaaa gaaagcaatg gaagatggaa tgttgagttg caggatcttt 25560acagagtagc
cgtgaggaag gccgtcagaa atccagcaaa ccagccttca gaggagactc 25620aggggaaagg
ggggcagatc ctggggagaa gaagagtgtg ggcaaaggct ctgtcccaga 25680tgagcagagg
ctggccccaa agctgaaggg agcctactct gatgagagta acaggtgtgg 25740ggagaatgca
tgggcttcag gtggcctagg aagtggcact gccttcataa agaacttgcc 25800aggattaggc
ttagacttga tgcagaacaa aagacgccat ggaaggtctc tgagccgggg 25860gtgccgtgat
caagacatcc tggcacattt tgttcagagg cgacagagac cagagtctca 25920ggccatcctg
ctgtgccctc cacccaccag ttggaaattc cctgtgtctt gctggggaac 25980ccctgggcag
agggctttgg gggcatgctg gcaggcctcc accatttatt ccacaacatt 26040cacgcctctg
ctctgccagg ccctgtgtcc ccatggtagg tatgaacagc ccagagcccc 26100tgtcctgagg
agctcacagg gcagtgggga gacagattgc taatcacagt agggcaagtg 26160cctcggtcca
ggtatgtgca ggggtgtcac agagacacaa agaggtagtg cttggtgcta 26220aggggatggg
gcaagaccag gcagtcagag cttcccagag aagggagaga ggaaagtaga 26280gagattggga
accacatgag caggtgagca gaggccagaa ggaggcaaac tggagctggg 26340gttgggagca
ggtgggacgc tgagtgacaa gacagaagag gccagcaggc atcatataac 26400acagggcctc
ccatgccagg ctaagggttt gaggcttttc tccacatgtg atggaaagcc 26460atggctgaaa
tgtacaaggg tctgaggata ccgcaagtca ccacagggcc tctaaaccca 26520ggtccaagta
ctctgaccct gaaggactca gcttccctgt gccaacctct gggctgggct 26580gtagacactc
aaagacaggt aagacagggc cactattctc aaggacccag agcatagaga 26640acacagttgc
tgtcccccag aggggcatgg gcagactcag gcactcagtt accagacagg 26700tcatttcatt
tgtctttcat tttttaattt ttaacatcct aacatttatt ttctttttaa 26760aaaatacacc
tggaactgaa tggacagatg atttcttttt tttttttttt tttttttttg 26820agacagaatc
tcgctctgtt gcccaggctg gagtgcagtg gcacgatctc agctcactgc 26880aacctctgcc
tcccggggta agcgattctc ctgcctcagc ctccctagta gctgggatta 26940caagcacccg
ccaccacgcc tggcttgttt ttgtattttt agtagagaca gggtttcaac 27000atgttggccg
ggctggtctc gaactcctga cctcaggtga tccacccacc tcggcctccc 27060aaagtgctgg
ggttacaggc atgagccacc acacccagct gatttcttga gcacctacta 27120ggtaccaggt
actgttctca gaggggagag agggcaggga ggcagacatt gtcctcacgg 27180agctgccgtc
cgatgggaga acgagcagat attgctgagt gaggcgtttt attacatttg 27240aggcatgatg
tggactggtc aacttgatct agtctggggt gatgaacctt tctggaagct 27300cccagaaaca
caaagccttg tgcaacccca cccccagatg ctccagtggt ggttgtccca 27360gggactgact
gccagagatt cccagccctg ggcccaccca gggcaaagct ctagagattt 27420tatccagtaa
aatcagaggg gaagagcctt caaagaaatc tggctggtgg agccgctgca 27480ctctactccc
cagcgcgggc tggagccagg cctgcctaac tggccactgt cccatgcggg 27540ataccttccc
ggcagcacgg ctccctgctg ccctctcctg gagatccagg gaaatggccc 27600tgaagagagc
aacccaggct ggcttcagca agggaggaaa ggaagcaccc ttattgttga 27660gcacctacta
ggtgccaggc atgttatcta agtagatatt agtatccata ttttcagact 27720cagaaactgc
cgtttagagg ttaaggagct aggcagcagt cgctgtatca gaatctgtgt 27780gtgttgcgcc
cgtctggtgc tgctgttgct gggtgtgccg gggtgaatta ggtgggggtg 27840gctgtgcagg
tgtttagtcc acactctcca atgacccttt ggacagctgc gcgatctcac 27900tggagtgcca
caaaaaccct gttaggtagc ttgtgtaggg attttaaaaa tcaccatttt 27960acaaatgagg
gaagctaatc aaaagtaaga tgaagagact ctaaagcgag agaagctcaa 28020acggcagatt
tctgagccca tgtctgcctc tcacccactc acccttctca ctcagggtcc 28080tcccctccct
cgccaccctc ttcacaatct tgccttccag tcccctgtgc tggtctgata 28140cctgctactg
ggtcctaaac tcgtgaccag gccttctcac ctctgcggtc cctgtggggc 28200ccagctcctg
tcccccaggg tcttcatggc agctctggag tcacacagac tcctcccaaa 28260tcctggctct
gtcgcatatt agctgtggcc ttgggaggtc gttcacagtc atttgtccat 28320gaaattgggc
agtcgtgata cctacctggt tttaaagaca aaatgagata atgcagccta 28380aatcactaag
actgctgata ggagctcagg cagcacaacc acagcccctg tgggacctgt 28440tcctgagggt
cctgttctca aggcactcac cacctggccg gggaaataag atttgtcatg 28500tggagacatt
tttaaagcaa cgcagagtaa ttctggagat gtcatgtgtc aagaggtcac 28560ttgtaaatac
gttgggttcc aaatgtcctt aggctgtttg atctcaaaag catgtggtcc 28620aaaaccaaaa
ccaaacaaaa aacagaatat gatgccctaa aaataaagct agaattcatc 28680atttggtttc
caggctagcc ttttaaactc ttcagtccct aacactgtaa cgctatattc 28740ttgcagtcaa
aaataatata ggccaggtgc ggtggctcat gcctgtaatc ccagcacttt 28800gggaggctga
ggcaggagga ttgcttgagc ccaggcattc gagaccagcc tgggcaatat 28860ggcgatacca
tatctctaca aaaaatacaa aaattagttg ggcatggtgg tgcacacatg 28920tagtcccatc
tactcgggag gctgaggtgg gaggatcgat tgagactggg aggtcgaggc 28980tgcagtgagc
catgattgtg ccattgcact ccagcctggg ctatagagca agaccctgtc 29040tcaataataa
taatataaaa cactacctgt ttagcatata tggttagaca aagtattaaa 29100aaatccttca
ggccaggcgt ggtggctcac gcctgtaatc ccagcacttt gggaggctga 29160ggcaggtggg
tcacctgagg tcaggagttt gagaccagcc tggccaacat ggtgaaatcc 29220ccatctctcc
taaaaataca aaaattagcc cagcgtattg gtgcatacct gtaatcccag 29280ctactcggga
ggctgaggca ggagaatcgc ttgaacctgg gagacggagg ttgcagtgag 29340cccagatcgc
acgaatgcac tccagcctgg gtgacagagc aagactgtgt caaaaaaaaa 29400aaaaatcctt
caaacaagaa atggtttctt tactttgaat gctggtgctg gaggtagagc 29460agttgttctt
tgccctcttc cctgagggct ggggagatgc ccctgcaggc ctgtatgccc 29520agcacttagc
acaggcccgg gcacgtaaat gtctgattga tcaaaagatc acagctggta 29580tgacataggg
tgccagatgg agccacccag aggggagaaa tttgggtcca atcttttttt 29640tttttttttt
tttttttgag attggacttt tgctttgttg cccaggctag agtgcagtgg 29700cacaatctcc
actccctgca acctccacct cctgggttca agcagttctc ctgcctcagc 29760ctcccgagta
gctgggatta caggcacccg ccatcatggc cggctaattt ttgtagagac 29820ggggtttcac
catattggcc aggctggtct tgaactcctg acttcaggtg atccacccac 29880cacggcctcc
caaagtgctg ggattacagg tgtgagccac cacacccagc cagggtccaa 29940tctttactga
gcattccact gcttccctca atttatcaga gatagagcta attggtgcac 30000gtgaaatgct
cactggtggc ttcctggagg aggtgacacc tgacagaatg aggaaaaggc 30060attcctgaag
ggagaagtgt gcaaacaagg acctgggctg agcccaactg gggacaagga 30120gatctcctcg
cccaggaact aggaagacca cccaagcctg gggtgtggcc tccaggagcc 30180cttgaagggc
ctgggaaagg ctgcagagca gtggactgtg ttgggctccc caggggctgc 30240agcaggtgga
ggggactggg ccggccaggg ctgggcccct gaagtagctc ccatgtcaca 30300caaggtggct
gctgggacct catgacaaag gagggtacag cccgggtggg ccctgggcct 30360tgggaatttg
ttgctgggca gcggcggctg gccctcagtg acagagtctc cccttcccag 30420gctgtacgtg
ctgaagctgt cagatgacat tgggaacttt ggggaggtgc ggctgcccct 30480ccttggctgc
ctcggtgtct cctggttggt cgtcttcctc tgcctcatcc gaggggtcaa 30540gtcttcaggg
aaagcaagta cccctcccca gcagggtctg tgcccaccca gaaagaccct 30600gccccctacc
tgggttatgt gtgtattgca gggaggggat gctgaaggga ctccggagcc 30660ggagagggac
cagggggctg gtggtgatta gggagccagg gtggtggtgc ccaggggagg 30720ggagagccca
gaagagtcca tggagccctc ttgtgccagg tggtgtactt cacggccacg 30780ttcccctacg
tggtgctgac cattctgttt gtccgcggag tgaccctgga gggagccttt 30840gacggcatca
tgtactacct aaccccgcag tgggacaaga tcctggaggc caaggtggga 30900agtgagggaa
accggaaggg ggctggggag gggccaggga ggtgcctgcc tctgcccaca 30960gctcctcacc
caccggtcct tccctgcagg tgtggggtga tgctgcctcc cagatcttct 31020actcactggg
ctgcgcgtgg ggaggcctca tcaccatggc ttcctacaac aagttccaca 31080ataactgtta
ccggtgagtg ctccctgctg ggctgaggct gcccccttcc tgcactctct 31140gcagcctgat
ctcagctcca gtgcagccat gggcccacga ggcctccttt cccctcatga 31200tcttgaggtt
ctgggaatac tggttctgga taaagttgtt ggagaatgag gagagaggtt 31260taggcaggag
agagaagaga gatagggaca gtgatgtccc aggggtctgt gaggactcag 31320gcagctcagg
ctccttaaaa atgcccagaa cagaccagaa aaggagttgg gggccaggct 31380cagtgcttca
cgccgtaatc ccagcacttt gagaggccaa ggcgggagga tcacttgagc 31440ccaggaattc
aagacaagcc tgggcagcat agtgagaccc catctttctt tataaaacat 31500aaaaaagaaa
attagctggg tgtggtggca cacacctgta gtcccagcta cttaggaggc 31560tgaggtggga
ggattgcttg agaccaagag gtcaaggcta cagttagctg tgatagcgcc 31620attgcactcc
atcctggaca acagagtgag accctgtctc ccactccaaa aaaaaagagt 31680tgggggtgcc
tttgtaccta gaactttccg tgtctccatg tctcctcttg gcccaaattg 31740gggagaggga
agctgaactc tgttttcctc cccatcaggg acagtgtcat catcagcatc 31800accaactgtg
ccaccagcgt ctatgctggc ttcgtcatct tctccatcct cggcttcatg 31860gccaatcacc
tgggcgtgga tgtgtcccgt gtggcagacc acggccctgg cctggccttc 31920gtggcttacc
ccgaggccct cacactactt cccatctccc cgctgtggtc tctgctcttc 31980ttcttcatgc
ttatcctgct ggggctgggc actcaggtac gaggtgcagg acgtggaccg 32040gaggcttggg
gaagggaggg gacaggaaca aggggcaggt ctccgatctg acagccacat 32100cctgcagttc
tgcctcctgg agacgctggt cacagccatt gtggatgagg tggggaatga 32160gtggatcctg
cagaaaaaga cctatgtgac cttgggcgtg gctgtggctg gcttcctgct 32220gggcatcccc
ctcaccagcc aggtaagagc tgcgtaaggg aagggctggc ccagagttgc 32280caggacgtgc
agggaagggt tggggagccc gggctgcagg cgcctcccac agctgcccgc 32340tgttccccag
gcaggcatct attggctgct gctgatggac aactatgcgg ccagcttctc 32400cttggtggtc
atctcctgca tcatgtgtgt ggccatcatg tacatctacg gtgagcactc 32460gagcctccgg
cctcccgcga cccggccctt ggcccgccca ctctgtcctg cgggtctgct 32520gaccccccat
ctctccaggg caccggaact acttccagga catccagatg atgctgggat 32580tcccaccacc
cctcttcttt cagatctgct ggcgcttcgt ctctcccgcc atcatcttcg 32640tgagttccct
ggccggcccc tcccttcccc cgctgcccct tggccatgtg tccatctctt 32700gggacccagc
ggagggacag tgggagctcc cgcacctggc cggagctcca tacccatgac 32760tctggccttt
gggaggctga gggaggttgg agagagtcag agaggcgtgt ggcgactcag 32820agctgcctgg
gccctggcct caggttagga ggggagtgcc acagccttta tttgactgca 32880gacgttgctc
aggcattcct gcctggcacc cccggggact caggagtcaa tatcgatagc 32940gtgtcagcct
ctgcctgctg gccttatctc ccatccctcc agggagggtg ttccagggcc 33000ttggctatgt
ggtgggaacc agtgttccag cagcagccca aggctggtag accagactgt 33060ggagtcagag
gtcagctctg catgggggca actttgcaag ccttccaggc agtgaggtgg 33120gtggcttggg
aaggggagtt ccaggtttta tgaatattca acagctgaga aattcccatc 33180agggagctga
attggtggaa ggaatccacc atggcaagga agcaggcact ttccacacac 33240cattcccatg
taacccgtga tagagctctc caaggtgtca ttgttgtccc cactttacag 33300atgaggaaac
tgaggcatgg ggatgttaag taattggccc atggtcacac agctaataag 33360aggcagagcc
acaattcaaa tcaggttgtc caattccaag cctgggccag cgccacacta 33420gaaacacctc
tgcaggaggg aggaatggtc caggggccaa aaccgtatga cttggctcct 33480tcctgtttgg
ggtgggattc ggggaaggca ggtattgaca gtgtaatggt cactccacag 33540aggacaggca
aggccacctc tgagtgatag tcagagggct ggcccagtgg cccggccagt 33600gttcagagcc
acactcttgt catgtttaat gcctcggtgg cacagggggc tcagcaccag 33660acttcccctg
tggctcatgt ttctgaataa aggcccgtgg agggctgagg cctctacccg 33720gtaggttgtg
ggatgttccc gagcctggca caggctcccc gctctggcca ctggcccagc 33780tttgaaacag
gaggaacgga agtctcagtg gtgggaggca ggggagggca ggcaccacca 33840ccccagagtg
ccatgccttc tggggctatg gcccaagggg agctcagcct ctgcggctca 33900gagggtctcc
cagcattcag actagattga tgtcctgggg agagaccagg gggcacctgc 33960caagggcagt
ggggaggaag gagacaagca caagaaggat ctgtaggggc aaccttggct 34020gggatctggt
gtctccacaa gattgaggca cagcgggcgg ggctcttcct gtgcccagga 34080agccaggctg
gagttacctt taggggagcc tagagtcctc tccttggggt aactccagcc 34140tggctttctg
ggctgttgtg ttttctctgg gggctttcct ttttcatatt tctgaaaagc 34200tctttttgac
aaactccacc cttctgtcag ctcaaatgag ggatttgtcc acacgtaggt 34260caggatctgt
tccattcagg aaacctgtta taagctccca tcgtgtgcag tgtctccgga 34320gggtccccca
atttggatgt gggtcttgca ttctagaata tggccctgtg ctccaggatg 34380tggctggagg
ctcaggagga gggtgctggg actgcccccc cccttccccc gccaggccct 34440ctgggtggcc
cgccagccac tgcctgtgtt tctgcctcct ttcactcctt ttgttactgc 34500agccacaagt
tttatttcca atctcactca tctctttcca aggaggttcc ctagagcacc 34560gcagcctgac
actcaccatt tctagcccat ggagcttccc caggctgaat gtccctgtct 34620ccacggccat
agacaggcag ggctttgccc taccctcttt ccatcctcct ctgcctctct 34680gcacaaatca
gttcaaatca acacaagaga ctgtggcagg cactgtactg ggcactgcac 34740ccccagagtt
gatgcaattc caggagctct caggtctatg ggaacgttaa accaagtcac 34800tctaagggaa
gcaggtggaa aacgccgtgg ggaggccctg cagaaggaaa tggaggaaat 34860ctgggatggc
tttttgaagg agagggcact gacatagggt ttgacagagg aagatgcaga 34920ggatgattcc
ccagaataaa tgaagagtgg gtgggaaaca ctgggtgtga ctgaaacact 34980gagggcggag
tagtgtggcc ggcaagacaa tttgaggtca cactgccagg tgacagccta 35040aaggcccagg
atgtggcctt acttcagtta gccatgggga gccatagcga gcagaactga 35100ctgattccaa
aggcggcagc agctctactg tgcttggcca gttcaggact gtggcaaaga 35160cagacttctc
ccttccgcag agccttccca ctcaccgcgc tccggccccc ggccctgcct 35220gcttcttcca
cggcccagat gttgacagct gtccctgttc tgccccaaat gcccttttct 35280agtttgggaa
accagagagg aaagggtgct ggtagaggga tggcctgggg tctgggctct 35340gggtcggcct
cacgcccact cccacacctg ccccgtgcag tttattctag ttttcactgt 35400gatccagtac
cagccgatca cctacaacca ctaccagtac ccaggctggg ccgtggccat 35460tggcttcctc
atggctctgt cctccgtcct ctgcatcccc ctctacgcca tgttccggct 35520ctgccgcaca
gacggggaca ccctcctcca ggtgagggca gggacaggct gggtagccag 35580gcaaaatggg
aggggccagc tggaaggggc ttaagggctc cctcctgacg tgccctgccc 35640ctgtgcccca
tgcctacctc atgcagcgtt tgaaaaatgc cacaaagcca agcagagact 35700ggggccctgc
cctcctggag caccggacag ggcgctacgc ccccaccata gccccctctc 35760ctgaggacgg
cttcgaggtc cagccactgc acccggacaa ggcgcagatc cccattgtgg 35820gcagtaatgg
ctccagccgc ctccaggact cccggatatg agcacagctg ccaggggagt 35880ggccccaccc
ccaccccgtg ctcccaccgc agagactggt gaggcagagg ccaggtgtct 35940ctgcctgccc
cctgccacgc cctggccagg acggctgctg tcaccttggt caccactgct 36000agtgcagtca
ttcatgctca tgtccccagt gtttacatgt cctttggatg ccaagatagc 36060agctgggggg
aggggtggga gtggagggtt gctgggaggt ccaaagcact ttggaggggt 36120ctcgggccag
gtccccagca gcctggatgg ctttacgtgg cctccgatac ccttataccc 36180tgccctgagc
tgaggttctg ggtgggcctc cagccccatg actagtgctc ctgccctcag 36240agccgacccc
agcctctgcc aggcacattt ggctattcct ccctaggggc aggatgaagg 36300gctgggggac
tgcccagtgt tacttgtcag gctgtgctgc ccagccctgt ttatctgtgt 36360aattattttt
gtaaacattg tattctctgt ggtcgccacc tcctcgcccc cagcctcggg 36420ttcagtctgt
cttccaggcc tgcttgcacc tcactgggct gctccggggt ctctgcccct 36480cattccaggc
ctggctgtca ggcccagcca gcaggggccc gtgacccagc agcctgccca 36540aagcatttgt
ttctggggga tggggtgggg gctgctccac aggaggtttg agcccagcac 36600cctggggaag
gggaccctgc acgacacccc cttgccctcc ctccatgagg ctaaaggccc 36660agtcttccca
aatgtgctgc cctcgttcat gtgccaaatg gccccagccc acatgcccct 36720cccctctctg
gagtgggagc cccttctgaa gtgtctgaat ccctgaagtg ttcatttgtc 36780cgtcctctgt
gcagtgacag ccccggccaa gccacctcta atcctctgta gcaataacgg 36840tgcgccgccc
atccctgccc attgtgcacc actaggattt taaagtccat agattttaat 36900gaaatttcta
ttcctgtctc tgagcggctg ctgtgctttg tctgggtccc ccaggggaca 36960agagtcaggc
tggaatgaga cctctgtctg ccaggccttt gtggaggcct gggaggagaa 37020aggccaaagg
ctttgatgct tgggaccgat gcccggccac tcagctccag acaccaggga 37080tctggcaagg
gggtggggca agggccagac agaccaacag ccttggggtc ctggcgagag 37140ctcgccaaga
ccagatctga agctggctgg gccaaagcag ctgaggcggc agcggcagac 37200aggtgcgctg
tggcagaagc cagagcctac tcggtgacca gcccttcagg ctggggcttg 37260gggtgctgca
ctggacatgg gtgtggaagc tggcccttcc ctggggctgc actgaagctg 37320ccgtccccag
cggtgggtag ccgactggcc tcgaggtcga aggtagggct ccagaggagg 37380tgcaggcagg
cagcactgga gaggtgcaac cccgcagagg ggaagcgggg gtctgaggcc 37440ctgtgtggag
gaccccaagg gccgctgcct ggacaagaaa gagtaactgt gaggggtggt 37500ggtctgatcc
ttcagtccac cctgccctgt gtctactgtt gtggtttacc tgtggttggg 37560agagacacaa
gaaggcccca cagatgcctg cagctcctgc tccatggcct tcttctgttc 37620ctttagctct
gaaggagaga ttgggctgag acagtgccca gctcttctag ggtgtgtaca 37680agtgtggggt
gggagcagga ggctgggtgt tctgggccct gctccgtgtc ctcacctgcc 37740agggtgggct
ccagggctgc ctcagagggg tggcaaggcc tcaaatactc ctcttccagg 37800cagcgctgca
gggagagctg cgtcagtgcc cagactcaag gcaggggtgg ggcaaggcag 37860gatacagagg
ccaagatccc cagggcagg 37889

SEQ ID NO: 2 (106 nt; DNA; Homo sapiens)
aacattttcc agggtccagg gcaagtgccc gcatgggctg ggagccggtg tgtcctcatc 60
cctggacgcc ggggacgggt ggagatgtca gcggatgaca gcgagg 106

SEQ ID NO: 3 (225 nt; DNA; Homo sapiens)
acagcagtgc tgagatacca agagtggaag tggcttgccc agggctctac tggaggagcg 60
ggccctagag caccagggct ctcacctgca ccggtgtccc gcctctgcca agcccagggc 120
tgagtgggag atctggattc tgccctcgga gctgagcagc cttaattcca ggggctcagc 180
cagagcagta aataagccaa gcagagggct gtgtaaagga cttat 225

SEQ ID NO: 4 (94 nt; DNA; Homo sapiens)
ggtccagggc aagtgcccgc atgggctggg agccggtgtg tcctcatccc tggacgccgg 60
ggacgggtgg agatgtcarc ggatggcarc gagg 94

SEQ ID NO: 5 (155 nt; DNA; Homo sapiens)
ggtccagggc aagtgcccgc atgggctggg agccggtgtg tcctcatccc tggacgccgg 60
ggacgggtgg agatgtcagc ggatggcagc gagggtgagg agactcgaaa gctacacttc 120
tcccctgtag gacatcctga cttgaccttc aaaag 155

SEQ ID NO: 6 (74 nt; DNA; Homo sapiens)
aatggtgctg tgcccagcga ggccaccaag agggaccaga acctcaaacg gggcaactgg 60
ggcaaccaga tcga 74

SEQ ID NO: 7 (71 nt; DNA; Homo sapiens)
aatctcgctc tgttgcccag gctggagtgc agtggcacga tctcagctca ctgcaacctc 60
tgcctcccgg g 71

SEQ ID NO: 8 (96 nt; DNA; Homo sapiens)
gcatctattg gctgctgctg atggacaact atgcggccag cttctccttg gtggtcatct 60
cctgcatcat gtgtgtggcc atcatgtaca tctacg 96

SEQ ID NO: 9 (371 nt; DNA; Homo sapiens)
agccttccca ctcaccgcgc tccggccccc ggccctgcct gcttcttcca cggcccagat 60
gttgacagct gtccctgttc tgccccaaat gcccttttct agtttgggaa accagagagg 120
aaagggtgct ggtagaggga tggcctgggg tctgggctct gggtcggcct cacgcccact 180
cccacacctg ccccgtgcag tttattctag ttttcactgt gatccagtac cagccgatca 240
cctacaacca ctaccagtac ccaggctggg ccgtggccat tggcttcctc atggctctgt 300
cctccgtcct ctgcatcccc ctctacgcca tgttccggct ctgccgcaca gacggggaca 360
ccctcctcca g 371

SEQ ID NO: 10 (2136 nt; DNA; Homo sapiens; 5'UTR 1..182, CDS 183..2084, 3'UTR 2085..2104)
agagcctcgg gaggctgatg
caactttccc tttaagaaag ccacctgggc gcaccgcggt 60gcggacccag cacgcctggg
ccgggggctg cagcatgctc ttgagatctg tggcctgaaa 120ggcgctggaa gcagagcctg
taagtgtggt ccccgtcacc agagccccaa cccaccgccg 180cc atg gta gga aaa ggt
gcc aaa ggg atg ctg aat ggt gct gtg ccc 227Met Val Gly Lys Gly Ala
Lys Gly Met Leu Asn Gly Ala Val Pro1 5 10
15agc gag gcc acc aag agg gac cag aac ctc aaa cgg ggc
aac tgg ggc 275Ser Glu Ala Thr Lys Arg Asp Gln Asn Leu Lys Arg Gly
Asn Trp Gly20 25 30aac cag atc gag ttt
gta ctg acg agc gtg ggc tat gcc gtg ggc ctg 323Asn Gln Ile Glu Phe
Val Leu Thr Ser Val Gly Tyr Ala Val Gly Leu35 40
45ggc aat gtc tgg cgc ttc cca tac ctc tgc tat cgc aac ggg gga
ggc 371Gly Asn Val Trp Arg Phe Pro Tyr Leu Cys Tyr Arg Asn Gly Gly
Gly50 55 60gcc ttc atg ttc ccc tac ttc
atc atg ctc atc ttc tgc ggg atc ccc 419Ala Phe Met Phe Pro Tyr Phe
Ile Met Leu Ile Phe Cys Gly Ile Pro65 70
75ctc ttc ttc atg gag ctc tcc ttc ggc cag ttt gca agc cag ggg tgc
467Leu Phe Phe Met Glu Leu Ser Phe Gly Gln Phe Ala Ser Gln Gly Cys80
85 90 95ctg ggg gtc tgg agg
atc agc ccc atg ttc aaa gga gtg ggc tat ggt 515Leu Gly Val Trp Arg
Ile Ser Pro Met Phe Lys Gly Val Gly Tyr Gly100 105
110atg atg gtg gtg tcc acc tac atc ggc atc tac tac aat gtg gtc
atc 563Met Met Val Val Ser Thr Tyr Ile Gly Ile Tyr Tyr Asn Val Val
Ile115 120 125tgc atc gcc ttc tac tac ttc
ttc tcg tcc atg acg cac gtg ctg ccc 611Cys Ile Ala Phe Tyr Tyr Phe
Phe Ser Ser Met Thr His Val Leu Pro130 135
140tgg gcc tac tgc aat aac ccc tgg aac acg cat gac tgc gcc ggt gta
659Trp Ala Tyr Cys Asn Asn Pro Trp Asn Thr His Asp Cys Ala Gly Val145
150 155ctg gac gcc tcc aac ctc acc aat ggc
tct cgg cca gcc gcc ttg ccc 707Leu Asp Ala Ser Asn Leu Thr Asn Gly
Ser Arg Pro Ala Ala Leu Pro160 165 170
175agc aac ctc tcc cac ctg ctc aac cac agc ctc cag agg acc
agc ccc 755Ser Asn Leu Ser His Leu Leu Asn His Ser Leu Gln Arg Thr
Ser Pro180 185 190agc gag gag tac tgg agg
ctg tac gtg ctg aag ctg tca gat gac att 803Ser Glu Glu Tyr Trp Arg
Leu Tyr Val Leu Lys Leu Ser Asp Asp Ile195 200
205ggg aac ttt ggg gag gtg cgg ctg ccc ctc ctt ggc tgc ctc ggt gtc
851Gly Asn Phe Gly Glu Val Arg Leu Pro Leu Leu Gly Cys Leu Gly Val210
215 220tcc tgg ttg gtc gtc ttc ctc tgc ctc
atc cga ggg gtc aag tct tca 899Ser Trp Leu Val Val Phe Leu Cys Leu
Ile Arg Gly Val Lys Ser Ser225 230 235ggg
aaa gtg gtg tac ttc acg gcc acg ttc ccc tac gtg gtg ctg acc 947Gly
Lys Val Val Tyr Phe Thr Ala Thr Phe Pro Tyr Val Val Leu Thr240
245 250 255att ctg ttt gtc cgc gga
gtg acc ctg gag gga gcc ttt gac ggc atc 995Ile Leu Phe Val Arg Gly
Val Thr Leu Glu Gly Ala Phe Asp Gly Ile260 265
270atg tac tac cta acc ccg cag tgg gac aag atc ctg gag gcc aag gtg
1043Met Tyr Tyr Leu Thr Pro Gln Trp Asp Lys Ile Leu Glu Ala Lys Val275
280 285tgg ggt gat gct gcc tcc cag atc ttc
tac tca ctg gcg tgc gcg tgg 1091Trp Gly Asp Ala Ala Ser Gln Ile Phe
Tyr Ser Leu Ala Cys Ala Trp290 295 300gga
ggc ctc atc acc atg gct tcc tac aac aag ttc cac aat aac tgt 1139Gly
Gly Leu Ile Thr Met Ala Ser Tyr Asn Lys Phe His Asn Asn Cys305
310 315tac cgg gac agt gtc atc atc agc atc acc aac
tgt gcc acc agc gtc 1187Tyr Arg Asp Ser Val Ile Ile Ser Ile Thr Asn
Cys Ala Thr Ser Val320 325 330
335tat gct ggc ttc gtc atc ttc tcc atc ctc ggc ttc atg gcc aat cac
1235Tyr Ala Gly Phe Val Ile Phe Ser Ile Leu Gly Phe Met Ala Asn His340
345 350ctg ggc gtg gat gtg tcc cgt gtg gca
gac cac ggc cct ggc ctg gcc 1283Leu Gly Val Asp Val Ser Arg Val Ala
Asp His Gly Pro Gly Leu Ala355 360 365ttc
gtg gct tac ccc gag gcc ctc aca cta ctt ccc atc tcc ccg ctg 1331Phe
Val Ala Tyr Pro Glu Ala Leu Thr Leu Leu Pro Ile Ser Pro Leu370
375 380tgg tct ctg ctc ttc ttc ttc atg ctt atc ctg
ctg ggg ctg ggc act 1379Trp Ser Leu Leu Phe Phe Phe Met Leu Ile Leu
Leu Gly Leu Gly Thr385 390 395cag ttc tgc
ctc ctg gag acg ctg gtc aca gcc att gtg gat gag gtg 1427Gln Phe Cys
Leu Leu Glu Thr Leu Val Thr Ala Ile Val Asp Glu Val400
405 410 415ggg aat gag tgg atc ctg cag
aaa aag acc tat gtg acc ttg ggc gtg 1475Gly Asn Glu Trp Ile Leu Gln
Lys Lys Thr Tyr Val Thr Leu Gly Val420 425
430gct gtg gct ggc ttc ctg ctg ggc atc ccc ctc acc agc cag gca ggc
1523Ala Val Ala Gly Phe Leu Leu Gly Ile Pro Leu Thr Ser Gln Ala Gly435
440 445atc tat tgg ctg ctg ctg atg gac aac
tat gcg gcc agc ttc tcc ttg 1571Ile Tyr Trp Leu Leu Leu Met Asp Asn
Tyr Ala Ala Ser Phe Ser Leu450 455 460gtg
gtc atc tcc tgc atc atg tgt gtg gcc atc atg tac atc tac ggg 1619Val
Val Ile Ser Cys Ile Met Cys Val Ala Ile Met Tyr Ile Tyr Gly465
470 475cac cgg aac tac ttc cag gac atc cag atg atg
ctg gga ttc cca cca 1667His Arg Asn Tyr Phe Gln Asp Ile Gln Met Met
Leu Gly Phe Pro Pro480 485 490
495ccc ctc ttc ttt cag atc tgc tgg cgc ttc gtc tct ccc gcc atc atc
1715Pro Leu Phe Phe Gln Ile Cys Trp Arg Phe Val Ser Pro Ala Ile Ile500
505 510ttc ttt att cta gtt ttc act gtg atc
cag tac cag ccg atc acc tac 1763Phe Phe Ile Leu Val Phe Thr Val Ile
Gln Tyr Gln Pro Ile Thr Tyr515 520 525aac
cac tac cag tac cca ggc tgg gcc gtg gcc att ggc ttc ctc atg 1811Asn
His Tyr Gln Tyr Pro Gly Trp Ala Val Ala Ile Gly Phe Leu Met530
535 540gct ctg tcc tcc gtc ctc tgc atc ccc ctc tac
gcc atg ttc cgg ctc 1859Ala Leu Ser Ser Val Leu Cys Ile Pro Leu Tyr
Ala Met Phe Arg Leu545 550 555tgc cgc aca
gac ggg gac acc ctc ctc cag cgt ttg aaa aat gcc aca 1907Cys Arg Thr
Asp Gly Asp Thr Leu Leu Gln Arg Leu Lys Asn Ala Thr560
565 570 575aag cca agc aga gac tgg ggc
cct gcc ctc ctg gag cac cgg aca ggg 1955Lys Pro Ser Arg Asp Trp Gly
Pro Ala Leu Leu Glu His Arg Thr Gly580 585
590cgc tac gcc ccc acc ata gcc ccc tct cct gag gac ggc ttc gag gtc
2003Arg Tyr Ala Pro Thr Ile Ala Pro Ser Pro Glu Asp Gly Phe Glu Val595
600 605cag tca ctg cac ccg gac aag gcg cag
atc ccc att gtg ggc agt aat 2051Gln Ser Leu His Pro Asp Lys Ala Gln
Ile Pro Ile Val Gly Ser Asn610 615 620ggc
tcc agc cgc ctc cag gac tcc cgg ata tag cacagctgcc aggggagtgc 2104Gly
Ser Ser Arg Leu Gln Asp Ser Arg Ile *625 630caccccaccc
gtgctccacg agagactgtg ag 2136

SEQ ID NO: 11 (2199 nt; DNA; Homo sapiens; 5'UTR 1..105, CDS 106..2169, 3'UTR 2170..2189)
gggccggggg ctgcagcatg ctcttgagat ctgtggcctg aaaggcgctg gaagcagagc
60ctgtgagtgt ggtccccgtc accagagccc caacccaccg ccgcc atg gta gga aaa
117Met Val Gly Lys1ggt gcc aaa ggg atg ctg gtg acg ctt ctc cct gtt cag
aga tcc ttc 165Gly Ala Lys Gly Met Leu Val Thr Leu Leu Pro Val Gln
Arg Ser Phe5 10 15
20ttc ctg cca ccc ttt tct gga gcc act ccc tct act tcc cta gca gag
213Phe Leu Pro Pro Phe Ser Gly Ala Thr Pro Ser Thr Ser Leu Ala Glu25
30 35tct gtc ctc aaa gtc tgg cat ggg gcc tac
aac tct ggt ctc ctt ccc 261Ser Val Leu Lys Val Trp His Gly Ala Tyr
Asn Ser Gly Leu Leu Pro40 45 50caa ctc
atg gcc cag cac tcc cta gcc atg gcc cag aat ggt gct gtg 309Gln Leu
Met Ala Gln His Ser Leu Ala Met Ala Gln Asn Gly Ala Val55
60 65ccc agc gag gcc acc aag agg gac cag aac ctc aaa
cgg ggc aac tgg 357Pro Ser Glu Ala Thr Lys Arg Asp Gln Asn Leu Lys
Arg Gly Asn Trp70 75 80ggc aac cag atc
gag ttt gta ctg acg agc gtg ggc tat gcc gtg ggc 405Gly Asn Gln Ile
Glu Phe Val Leu Thr Ser Val Gly Tyr Ala Val Gly85 90
95 100ctg ggc aat gtc tgg cgc ttc cca tac
ctc tgc tat cgc aac ggg gga 453Leu Gly Asn Val Trp Arg Phe Pro Tyr
Leu Cys Tyr Arg Asn Gly Gly105 110 115ggc
gcc ttc atg ttc ccc tac ttc atc atg ctc atc ttc tgc ggg atc 501Gly
Ala Phe Met Phe Pro Tyr Phe Ile Met Leu Ile Phe Cys Gly Ile120
125 130ccc ctc ttc ttc atg gag ctc tcc ttc ggc cag
ttt gca agc cag ggg 549Pro Leu Phe Phe Met Glu Leu Ser Phe Gly Gln
Phe Ala Ser Gln Gly135 140 145tgc ctg ggg
gtc tgg agg atc agc ccc atg ttc aaa gga gtg ggc tat 597Cys Leu Gly
Val Trp Arg Ile Ser Pro Met Phe Lys Gly Val Gly Tyr150
155 160ggt atg atg gtg gtg tcc acc tac atc ggc atc tac
tac aat gtg gtc 645Gly Met Met Val Val Ser Thr Tyr Ile Gly Ile Tyr
Tyr Asn Val Val165 170 175
180atc tgc atc gcc ttc tac tac ttc ttc tcg tcc atg acg cac gtg ctg
693Ile Cys Ile Ala Phe Tyr Tyr Phe Phe Ser Ser Met Thr His Val Leu185
190 195ccc tgg gcc tac tgc aat aac ccc tgg
aac acg cat gac tgc gcc ggt 741Pro Trp Ala Tyr Cys Asn Asn Pro Trp
Asn Thr His Asp Cys Ala Gly200 205 210gta
ctg gac gcc tcc aac ctc acc aat ggc tct cgg cca gcc gcc ttg 789Val
Leu Asp Ala Ser Asn Leu Thr Asn Gly Ser Arg Pro Ala Ala Leu215
220 225ccc agc aac ctc tcc cac ctg ctc aac cac agc
ctc cag agg acc agc 837Pro Ser Asn Leu Ser His Leu Leu Asn His Ser
Leu Gln Arg Thr Ser230 235 240ccc agc gag
gag tac tgg agg ctg tac gtg ctg aag ctg tca gat gac 885Pro Ser Glu
Glu Tyr Trp Arg Leu Tyr Val Leu Lys Leu Ser Asp Asp245
250 255 260att ggg aac ttt ggg gag gtg
cgg ctg ccc ctc ctt ggc tgc ctc ggt 933Ile Gly Asn Phe Gly Glu Val
Arg Leu Pro Leu Leu Gly Cys Leu Gly265 270
275gtc tcc tgg ttg gtc gtc ttc ctc tgc ctc atc cga ggg gtc aag tct
981Val Ser Trp Leu Val Val Phe Leu Cys Leu Ile Arg Gly Val Lys Ser280
285 290tca ggg aaa gtg gtg tac ttc acg gcc
acg ttc ccc tac gtg gtg ctg 1029Ser Gly Lys Val Val Tyr Phe Thr Ala
Thr Phe Pro Tyr Val Val Leu295 300 305acc
att ctg ttt gtc cgc gga gtg acc ctg gag gga gcc ttt gac ggc 1077Thr
Ile Leu Phe Val Arg Gly Val Thr Leu Glu Gly Ala Phe Asp Gly310
315 320atc atg tac tac cta acc ccg cag tgg gac aag
atc ctg gag gcc aag 1125Ile Met Tyr Tyr Leu Thr Pro Gln Trp Asp Lys
Ile Leu Glu Ala Lys325 330 335
340gtg tgg ggt gat gct gcc tcc cag atc ttc tac tca ctg gcg tgc gcg
1173Val Trp Gly Asp Ala Ala Ser Gln Ile Phe Tyr Ser Leu Ala Cys Ala345
350 355tgg gga ggc ctc atc acc atg gct tcc
tac aac aag ttc cac aat aac 1221Trp Gly Gly Leu Ile Thr Met Ala Ser
Tyr Asn Lys Phe His Asn Asn360 365 370tgt
tac cgg gac agt gtc atc atc agc atc acc aac tgt gcc acc agc 1269Cys
Tyr Arg Asp Ser Val Ile Ile Ser Ile Thr Asn Cys Ala Thr Ser375
380 385gtc tat gct ggc ttc gtc atc ttc tcc atc ctc
ggc ttc atg gcc aat 1317Val Tyr Ala Gly Phe Val Ile Phe Ser Ile Leu
Gly Phe Met Ala Asn390 395 400cac ctg ggc
gtg gat gtg tcc cgt gtg gca gac cac ggc cct ggc ctg 1365His Leu Gly
Val Asp Val Ser Arg Val Ala Asp His Gly Pro Gly Leu405
410 415 420gcc ttc gtg gct tac ccc gag
gcc ctc aca cta ctt ccc atc tcc ccg 1413Ala Phe Val Ala Tyr Pro Glu
Ala Leu Thr Leu Leu Pro Ile Ser Pro425 430
435ctg tgg tct ctg ctc ttc ttc ttc atg ctt atc ctg ctg ggg ctg ggc
1461Leu Trp Ser Leu Leu Phe Phe Phe Met Leu Ile Leu Leu Gly Leu Gly440
445 450act cag ttc tgc ctc ctg gag acg ctg
ggc aca gcc att gtg gat gag 1509Thr Gln Phe Cys Leu Leu Glu Thr Leu
Gly Thr Ala Ile Val Asp Glu455 460 465gtg
ggg aat gag tgg atc ctg cag aaa aag acc aat atg acc ttg ggg 1557Val
Gly Asn Glu Trp Ile Leu Gln Lys Lys Thr Asn Met Thr Leu Gly470
475 480cgt gct gtg gct ggc ttc ctg ctg ggc atc ccc
ctc acc agc cag gca 1605Arg Ala Val Ala Gly Phe Leu Leu Gly Ile Pro
Leu Thr Ser Gln Ala485 490 495
500ggc atc tat tgg ctg ctg ctg atg gac aac tat gcg gcc agc ttc tcc
1653Gly Ile Tyr Trp Leu Leu Leu Met Asp Asn Tyr Ala Ala Ser Phe Ser505
510 515ttg gtg gtc atc tcc tgc atc atg tgt
gtg gcc atc atg tac atc tac 1701Leu Val Val Ile Ser Cys Ile Met Cys
Val Ala Ile Met Tyr Ile Tyr520 525 530ggg
cac cgg aac tac ttc cag gac atc cag atg atg ctg gga ttc cca 1749Gly
His Arg Asn Tyr Phe Gln Asp Ile Gln Met Met Leu Gly Phe Pro535
540 545cca ccc ctc ttc ttt cag atc tgc tgg cgc ttc
gtc tct ccc gcc atc 1797Pro Pro Leu Phe Phe Gln Ile Cys Trp Arg Phe
Val Ser Pro Ala Ile550 555 560atc ttc ttt
att cta gtt ttc act gtg atc cag tac cag ccg atc acc 1845Ile Phe Phe
Ile Leu Val Phe Thr Val Ile Gln Tyr Gln Pro Ile Thr565
570 575 580tac aac cac tac cag tac cca
ggc tgg gcc gtg gcc att ggc ttc ctc 1893Tyr Asn His Tyr Gln Tyr Pro
Gly Trp Ala Val Ala Ile Gly Phe Leu585 590
595atg gct ctg tcc tcc gtc ctc tgc atc ccc ctc tac gcc atg ttc cgg
1941Met Ala Leu Ser Ser Val Leu Cys Ile Pro Leu Tyr Ala Met Phe Arg600
605 610ctc tgc cgc aca gac ggg gac acc ctc
ctc cag cgt ttg aaa aat gcc 1989Leu Cys Arg Thr Asp Gly Asp Thr Leu
Leu Gln Arg Leu Lys Asn Ala615 620 625aca
aag cca agc aga gac tgg ggc cct gcc ctc ctg gag cac cgg aca 2037Thr
Lys Pro Ser Arg Asp Trp Gly Pro Ala Leu Leu Glu His Arg Thr630
635 640ggg cgc tac gcc ccc acc ata gcc ccc tct cct
gag gac ggc ttc gag 2085Gly Arg Tyr Ala Pro Thr Ile Ala Pro Ser Pro
Glu Asp Gly Phe Glu645 650 655
660gtc cag tca ctg cac ccg gac aag gcg cag atc ccc att gtg ggc agt
2133Val Gln Ser Leu His Pro Asp Lys Ala Gln Ile Pro Ile Val Gly Ser665
670 675aat ggc tca cgc cgc ctc cag gac tcc
cgg ata tag cacagctgcc 2179Asn Gly Ser Arg Arg Leu Gln Asp Ser
Arg Ile *680 685
aggggagtgc cacctctaga 2199

SEQ ID NO: 12 (2202 nt; DNA; Homo sapiens; 5'UTR 1..233, CDS 234..2151, 3'UTR 2152..2171)
gcccacacac cccactccag
ctccggagca cccgtgctgg gctgcatggg gactggccgg 60aggggcaggg ccaggggagc
gggtaggcag agcttcggga ggagatgagg tgaaagtaat 120tgacgctgcc cagcccggca
gtgggagagg caggggatgc gtcagtgtcg cgctggagct 180ggcagaggtg atgagcggcg
gagacacgcg gggctgcgat cgctcgcccc agg atg 236Met1gcc gcg gct cat gga
cct gtg gcc ccc tct tcc cca gaa cag aat ggt 284Ala Ala Ala His Gly
Pro Val Ala Pro Ser Ser Pro Glu Gln Asn Gly5 10
15gct gtg ccc agc gag gcc acc aag agg gac cag aac ctc aaa cgg
ggc 332Ala Val Pro Ser Glu Ala Thr Lys Arg Asp Gln Asn Leu Lys Arg
Gly20 25 30aac tgg ggc aac cag atc gag
ttt gta ctg acg agc gtg ggc tat gcc 380Asn Trp Gly Asn Gln Ile Glu
Phe Val Leu Thr Ser Val Gly Tyr Ala35 40
45gtg ggc ctg ggc aat gtc tgg cgc ttc cca tac ctc tgc tat cgc aac
428Val Gly Leu Gly Asn Val Trp Arg Phe Pro Tyr Leu Cys Tyr Arg Asn50
55 60 65ggg gga ggc gcc ttc
atg ttc ccc tac ttc atc atg ctc atc ttc tgc 476Gly Gly Gly Ala Phe
Met Phe Pro Tyr Phe Ile Met Leu Ile Phe Cys70 75
80ggg atc ccc ctc ttc ttc atg gag ctc tcc ttc ggc cag ttt gca
agc 524Gly Ile Pro Leu Phe Phe Met Glu Leu Ser Phe Gly Gln Phe Ala
Ser85 90 95cag ggg tgc ctg ggg gtc tgg
agg atc agc ccc atg ttc aaa gga gtg 572Gln Gly Cys Leu Gly Val Trp
Arg Ile Ser Pro Met Phe Lys Gly Val100 105
110ggc tat ggt atg atg gtg gtg tcc acc tac atc ggc atc tac tac aat
620Gly Tyr Gly Met Met Val Val Ser Thr Tyr Ile Gly Ile Tyr Tyr Asn115
120 125gtg gtc atc tgc atc gcc ttc tac tac
ttc ttc tcg tcc atg acg cac 668Val Val Ile Cys Ile Ala Phe Tyr Tyr
Phe Phe Ser Ser Met Thr His130 135 140
145gtg ctg ccc tgg gcc tac tgc aat aac ccc tgg aac acg cat
gac tgc 716Val Leu Pro Trp Ala Tyr Cys Asn Asn Pro Trp Asn Thr His
Asp Cys150 155 160gcc ggt gta ctg gac gcc
tcc aac ctc acc aat ggc tct cgg cca gcc 764Ala Gly Val Leu Asp Ala
Ser Asn Leu Thr Asn Gly Ser Arg Pro Ala165 170
175gcc ttg ccc agc aac ctc tcc cac ctg ctc aac cac agc ctc cag agg
812Ala Leu Pro Ser Asn Leu Ser His Leu Leu Asn His Ser Leu Gln Arg180
185 190acc agc ccc agc gag gag tac tgg agg
ctg tac gtg ctg aag ctg tca 860Thr Ser Pro Ser Glu Glu Tyr Trp Arg
Leu Tyr Val Leu Lys Leu Ser195 200 205gat
gac att ggg aac ttt ggg gag gtg cgg ctg ccc ctc ctt ggc tgc 908Asp
Asp Ile Gly Asn Phe Gly Glu Val Arg Leu Pro Leu Leu Gly Cys210
215 220 225ctc ggt gtc tcc tgg ttg
gtc gtc ttc ctc tgc ctc atc cga ggg gtc 956Leu Gly Val Ser Trp Leu
Val Val Phe Leu Cys Leu Ile Arg Gly Val230 235
240aag tct tca ggg aaa gtg gtg tac ttc acg gcc acg ttc ccc tac gtg
1004Lys Ser Ser Gly Lys Val Val Tyr Phe Thr Ala Thr Phe Pro Tyr Val245
250 255gtg ctg acc att ctg ttt gtc cgc gga
gtg acc ctg gag gga gcc ttt 1052Val Leu Thr Ile Leu Phe Val Arg Gly
Val Thr Leu Glu Gly Ala Phe260 265 270gac
ggc atc atg tac tac cta acc ccg cag tgg gac aag atc ctg gag 1100Asp
Gly Ile Met Tyr Tyr Leu Thr Pro Gln Trp Asp Lys Ile Leu Glu275
280 285gcc aag gtg tgg ggt gat gct gcc tcc cag atc
ttc tac tca ctg gcg 1148Ala Lys Val Trp Gly Asp Ala Ala Ser Gln Ile
Phe Tyr Ser Leu Ala290 295 300
305tgc gcg tgg gga ggc ctc atc acc atg gct tcc tac aac aag ttc cac
1196Cys Ala Trp Gly Gly Leu Ile Thr Met Ala Ser Tyr Asn Lys Phe His310
315 320aat aac tgt tac cgg gac agt gtc atc
atc agc atc acc aac tgt gcc 1244Asn Asn Cys Tyr Arg Asp Ser Val Ile
Ile Ser Ile Thr Asn Cys Ala325 330 335acc
agc gtc tat gct ggc ttc gtc atc ttc tcc atc ctc ggc ttc atg 1292Thr
Ser Val Tyr Ala Gly Phe Val Ile Phe Ser Ile Leu Gly Phe Met340
345 350gcc aat cac ctg ggc gtg gat gtg tcc cgt gtg
gca gac cac ggc cct 1340Ala Asn His Leu Gly Val Asp Val Ser Arg Val
Ala Asp His Gly Pro355 360 365ggc ctg gcc
ttc gtg gct tac ccc gag gcc ctc aca cta ctt ccc atc 1388Gly Leu Ala
Phe Val Ala Tyr Pro Glu Ala Leu Thr Leu Leu Pro Ile370
375 380 385tcc ccg ctg tgg tct ctg ctc
ttc ttc ttc atg ctt atc ctg ctg ggg 1436Ser Pro Leu Trp Ser Leu Leu
Phe Phe Phe Met Leu Ile Leu Leu Gly390 395
400ctg ggc act cag ttc tgc ctc ctg gag acg ctg gtc aca gcc att gtg
1484Leu Gly Thr Gln Phe Cys Leu Leu Glu Thr Leu Val Thr Ala Ile Val405
410 415gat gag gtg ggg aat gag tgg atc ctg
cag aaa aag acc tat gtg acc 1532Asp Glu Val Gly Asn Glu Trp Ile Leu
Gln Lys Lys Thr Tyr Val Thr420 425 430ttg
ggc gtg gct gtg gct ggc ttc ctg ctg ggc atc ccc ctc acc agc 1580Leu
Gly Val Ala Val Ala Gly Phe Leu Leu Gly Ile Pro Leu Thr Ser435
440 445cag gca ggc atc tat tgg ctg ctg ctg atg gac
aac tat gcg gcc agc 1628Gln Ala Gly Ile Tyr Trp Leu Leu Leu Met Asp
Asn Tyr Ala Ala Ser450 455 460
465ttc tcc ttg gtg gtc atc tcc tgc atc atg tgt gtg gcc atc atg tac
1676Phe Ser Leu Val Val Ile Ser Cys Ile Met Cys Val Ala Ile Met Tyr470
475 480atc tac ggg cac cgg aac tac ttc cag
gac atc cag atg atg ctg gga 1724Ile Tyr Gly His Arg Asn Tyr Phe Gln
Asp Ile Gln Met Met Leu Gly485 490 495ttc
cca cca ccc ctc ttc ttt cag atc tgc tgg cgc ttc gtc tct ccc 1772Phe
Pro Pro Pro Leu Phe Phe Gln Ile Cys Trp Arg Phe Val Ser Pro500
505 510gcc atc atc ttc ttt att cta gtt ttc act gtg
atc cag tac cag ccg 1820Ala Ile Ile Phe Phe Ile Leu Val Phe Thr Val
Ile Gln Tyr Gln Pro515 520 525atc acc tac
aac cac tac cag tac cca ggc tgg gcc gtg gcc att ggc 1868Ile Thr Tyr
Asn His Tyr Gln Tyr Pro Gly Trp Ala Val Ala Ile Gly530
535 540 545ttc ctc atg gct ctg tcc tcc
gtc ctc tgc atc ccc ctc tac gcc atg 1916Phe Leu Met Ala Leu Ser Ser
Val Leu Cys Ile Pro Leu Tyr Ala Met550 555
560ttc cgg ctc tgc cgc aca gac ggg gac acc ctc ctc cag cgt ttg aaa
1964Phe Arg Leu Cys Arg Thr Asp Gly Asp Thr Leu Leu Gln Arg Leu Lys565
570 575aat gcc aca aag cca agc aga gac tgg
ggc cct gcc ctc ctg gag cac 2012Asn Ala Thr Lys Pro Ser Arg Asp Trp
Gly Pro Ala Leu Leu Glu His580 585 590cgg
aca ggg cgc tac gcc ccc acc ata gcc ccc tct cct gag gac ggc 2060Arg
Thr Gly Arg Tyr Ala Pro Thr Ile Ala Pro Ser Pro Glu Asp Gly595
600 605ttc gag gtc cag tca ctg cac ccg gac aag gcg
cag atc ccc att gtg 2108Phe Glu Val Gln Ser Leu His Pro Asp Lys Ala
Gln Ile Pro Ile Val610 615 620
625ggc agt aat ggc tcc agc cgc ctc cag gac tcc cgg ata tag
2150Gly Ser Asn Gly Ser Ser Arg Leu Gln Asp Ser Arg Ile *630
635cacagctgcc aggggagtgc caccccaccc gtgctccacg agagactgtg ag
2202
SEQ ID NO: 13; length 2364; DNA; Homo sapiens
5'UTR 1..233; CDS 234..2313; 3'UTR 2314..2333
gcccacacac cccactccag ctccggagca cccgtgctgg gctgcatggg gactggccgg
60aggggcaggg ccaggggagc gggtaggcag agcttcggga ggagatgagg tgaaagtaat
120tgacgctgcc cagcccggca gtgggagagg caggggatgc gtcagtgtcg cgctggagct
180ggcagaggtg atgagcggcg gagacacgcg gggctgcgat cgctcgcccc agg atg
236Met1gcc gcg gct cat gga cct gtg gcc ccc tct tcc cca gaa cag gtg acg
284Ala Ala Ala His Gly Pro Val Ala Pro Ser Ser Pro Glu Gln Val Thr5
10 15ctt ctc cct gtt cag aga tcc ttc ttc
ctg cca ccc ttt tct gga gcc 332Leu Leu Pro Val Gln Arg Ser Phe Phe
Leu Pro Pro Phe Ser Gly Ala20 25 30act
ccc tct act tcc cta gca gag tct gtc ctc aaa gtc tgg cat ggg 380Thr
Pro Ser Thr Ser Leu Ala Glu Ser Val Leu Lys Val Trp His Gly35
40 45gcc tac aac tct ggt ctc ctt ccc caa ctc atg
gcc cag cac tcc cta 428Ala Tyr Asn Ser Gly Leu Leu Pro Gln Leu Met
Ala Gln His Ser Leu50 55 60
65gcc atg gcc cag aat ggt gct gtg ccc agc gag gcc acc aag agg gac
476Ala Met Ala Gln Asn Gly Ala Val Pro Ser Glu Ala Thr Lys Arg Asp70
75 80cag aac ctc aaa cgg ggc aac tgg ggc
aac cag atc gag ttt gta ctg 524Gln Asn Leu Lys Arg Gly Asn Trp Gly
Asn Gln Ile Glu Phe Val Leu85 90 95acg
agc gtg ggc tat gcc gtg ggc ctg ggc aat gtc tgg cgc ttc cca 572Thr
Ser Val Gly Tyr Ala Val Gly Leu Gly Asn Val Trp Arg Phe Pro100
105 110tac ctc tgc tat cgc aac ggg gga ggc gcc ttc
atg ttc ccc tac ttc 620Tyr Leu Cys Tyr Arg Asn Gly Gly Gly Ala Phe
Met Phe Pro Tyr Phe115 120 125atc atg ctc
atc ttc tgc ggg atc ccc ctc ttc ttc atg gag ctc tcc 668Ile Met Leu
Ile Phe Cys Gly Ile Pro Leu Phe Phe Met Glu Leu Ser130
135 140 145ttc ggc cag ttt gca agc cag
ggg tgc ctg ggg gtc tgg agg atc agc 716Phe Gly Gln Phe Ala Ser Gln
Gly Cys Leu Gly Val Trp Arg Ile Ser150 155
160ccc atg ttc aaa gga gtg ggc tat ggt atg atg gtg gtg tcc acc tac
764Pro Met Phe Lys Gly Val Gly Tyr Gly Met Met Val Val Ser Thr Tyr165
170 175atc ggc atc tac tac aat gtg gtc atc
tgc atc gcc ttc tac tac ttc 812Ile Gly Ile Tyr Tyr Asn Val Val Ile
Cys Ile Ala Phe Tyr Tyr Phe180 185 190ttc
tcg tcc atg acg cac gtg ctg ccc tgg gcc tac tgc aat aac ccc 860Phe
Ser Ser Met Thr His Val Leu Pro Trp Ala Tyr Cys Asn Asn Pro195
200 205tgg aac acg cat gac tgc gcc ggt gta ctg gac
gcc tcc aac ctc acc 908Trp Asn Thr His Asp Cys Ala Gly Val Leu Asp
Ala Ser Asn Leu Thr210 215 220
225aat ggc tct cgg cca gcc gcc ttg ccc agc aac ctc tcc cac ctg ctc
956Asn Gly Ser Arg Pro Ala Ala Leu Pro Ser Asn Leu Ser His Leu Leu230
235 240aac cac agc ctc cag agg acc agc ccc
agc gag gag tac tgg agg ctg 1004Asn His Ser Leu Gln Arg Thr Ser Pro
Ser Glu Glu Tyr Trp Arg Leu245 250 255tac
gtg ctg aag ctg tca gat gac att ggg aac ttt ggg gag gtg cgg 1052Tyr
Val Leu Lys Leu Ser Asp Asp Ile Gly Asn Phe Gly Glu Val Arg260
265 270ctg ccc ctc ctt ggc tgc ctc ggt gtc tcc tgg
ttg gtc gtc ttc ctc 1100Leu Pro Leu Leu Gly Cys Leu Gly Val Ser Trp
Leu Val Val Phe Leu275 280 285tgc ctc atc
cga ggg gtc aag tct tca ggg aaa gtg gtg tac ttc acg 1148Cys Leu Ile
Arg Gly Val Lys Ser Ser Gly Lys Val Val Tyr Phe Thr290
295 300 305gcc acg ttc ccc tac gtg gtg
ctg acc att ctg ttt gtc cgc gga gtg 1196Ala Thr Phe Pro Tyr Val Val
Leu Thr Ile Leu Phe Val Arg Gly Val310 315
320acc ctg gag gga gcc ttt gac ggc atc atg tac tac cta acc ccg cag
1244Thr Leu Glu Gly Ala Phe Asp Gly Ile Met Tyr Tyr Leu Thr Pro Gln325
330 335tgg gac aag atc ctg gag gcc aag gtg
tgg ggt gat gct gcc tcc cag 1292Trp Asp Lys Ile Leu Glu Ala Lys Val
Trp Gly Asp Ala Ala Ser Gln340 345 350atc
ttc tac tca ctg gcg tgc gcg tgg gga ggc ctc atc acc atg gct 1340Ile
Phe Tyr Ser Leu Ala Cys Ala Trp Gly Gly Leu Ile Thr Met Ala355
360 365tcc tac aac aag ttc cac aat aac tgt tac cgg
gac agt gtc atc atc 1388Ser Tyr Asn Lys Phe His Asn Asn Cys Tyr Arg
Asp Ser Val Ile Ile370 375 380
385agc atc acc aac tgt gcc acc agc gtc tat gct ggc ttc gtc atc ttc
1436Ser Ile Thr Asn Cys Ala Thr Ser Val Tyr Ala Gly Phe Val Ile Phe390
395 400tcc atc ctc ggc ttc atg gcc aat cac
ctg ggc gtg gat gtg tcc cgt 1484Ser Ile Leu Gly Phe Met Ala Asn His
Leu Gly Val Asp Val Ser Arg405 410 415gtg
gca gac cac ggc cct ggc ctg gcc ttc gtg gct tac ccc gag gcc 1532Val
Ala Asp His Gly Pro Gly Leu Ala Phe Val Ala Tyr Pro Glu Ala420
425 430ctc aca cta ctt ccc atc tcc ccg ctg tgg tct
ctg ctc ttc ttc ttc 1580Leu Thr Leu Leu Pro Ile Ser Pro Leu Trp Ser
Leu Leu Phe Phe Phe435 440 445atg ctt atc
ctg ctg ggg ctg ggc act cag ttc tgc ctc ctg gag acg 1628Met Leu Ile
Leu Leu Gly Leu Gly Thr Gln Phe Cys Leu Leu Glu Thr450
455 460 465ctg gtc aca gcc att gtg gat
gag gtg ggg aat gag tgg atc ctg cag 1676Leu Val Thr Ala Ile Val Asp
Glu Val Gly Asn Glu Trp Ile Leu Gln470 475
480aaa aag acc tat gtg acc ttg ggc gtg gct gtg gct ggc ttc ctg ctg
1724Lys Lys Thr Tyr Val Thr Leu Gly Val Ala Val Ala Gly Phe Leu Leu485
490 495ggc atc ccc ctc acc agc cag gca ggc
atc tat tgg ctg ctg ctg atg 1772Gly Ile Pro Leu Thr Ser Gln Ala Gly
Ile Tyr Trp Leu Leu Leu Met500 505 510gac
aac tat gcg gcc agc ttc tcc ttg gtg gtc atc tcc tgc atc atg 1820Asp
Asn Tyr Ala Ala Ser Phe Ser Leu Val Val Ile Ser Cys Ile Met515
520 525tgt gtg gcc atc atg tac atc tac ggg cac cgg
aac tac ttc cag gac 1868Cys Val Ala Ile Met Tyr Ile Tyr Gly His Arg
Asn Tyr Phe Gln Asp530 535 540
545atc cag atg atg ctg gga ttc cca cca ccc ctc ttc ttt cag atc tgc
1916Ile Gln Met Met Leu Gly Phe Pro Pro Pro Leu Phe Phe Gln Ile Cys550
555 560tgg cgc ttc gtc tct ccc gcc atc atc
ttc ttt att cta gtt ttc act 1964Trp Arg Phe Val Ser Pro Ala Ile Ile
Phe Phe Ile Leu Val Phe Thr565 570 575gtg
atc cag tac cag ccg atc acc tac aac cac tac cag tac cca ggc 2012Val
Ile Gln Tyr Gln Pro Ile Thr Tyr Asn His Tyr Gln Tyr Pro Gly580
585 590tgg gcc gtg gcc att ggc ttc ctc atg gct ctg
tcc tcc gtc ctc tgc 2060Trp Ala Val Ala Ile Gly Phe Leu Met Ala Leu
Ser Ser Val Leu Cys595 600 605atc ccc ctc
tac gcc atg ttc cgg ctc tgc cgc aca gac ggg gac acc 2108Ile Pro Leu
Tyr Ala Met Phe Arg Leu Cys Arg Thr Asp Gly Asp Thr610
615 620 625ctc ctc cag cgt ttg aaa aat
gcc aca aag cca agc aga gac tgg ggc 2156Leu Leu Gln Arg Leu Lys Asn
Ala Thr Lys Pro Ser Arg Asp Trp Gly630 635
640cct gcc ctc ctg gag cac cgg aca ggg cgc tac gcc ccc acc ata gcc
2204Pro Ala Leu Leu Glu His Arg Thr Gly Arg Tyr Ala Pro Thr Ile Ala645
650 655ccc tct cct gag gac ggc ttc gag gtc
cag tca ctg cac ccg gac aag 2252Pro Ser Pro Glu Asp Gly Phe Glu Val
Gln Ser Leu His Pro Asp Lys660 665 670gcg
cag atc ccc att gtg ggc agt aat ggc tcc agc cgc ctc cag gac 2300Ala
Gln Ile Pro Ile Val Gly Ser Asn Gly Ser Ser Arg Leu Gln Asp675
680 685tcc cgg ata tag cacagctgcc aggggagtgc
caccccaccc gtgctccacg 2352Ser Arg Ile *690agagactgtg ag
2364
SEQ ID NO: 14; length 2299; DNA; Homo sapiens
5'UTR 1..234; CDS 235..789; 3'UTR 790..2265
cccacacacc ccactccagc
tccggagcac ccgtgctggg ctgcatgggg actggccgga 60ggggcagggc caggggagcg
ggtaggcaga gcttcgggag gagatgaggt gaaagtaatt 120gacgctgccc agcccggcag
tgggagaggc aggggatgcg tcagtgtcgc gctggagctg 180gcagaggtgt gaatgagcgg
cggagacacg cgggctgcga tcgctcgccc cagg atg 237Met1gcc gcg gct cat gga
cct gtg gcc ccc tct tcc cca gaa cag ggt cca 285Ala Ala Ala His Gly
Pro Val Ala Pro Ser Ser Pro Glu Gln Gly Pro5 10
15ggg caa gtg ccc gca tgg gct ggg agc cgg tgt gtc ctc atc cct
gga 333Gly Gln Val Pro Ala Trp Ala Gly Ser Arg Cys Val Leu Ile Pro
Gly20 25 30cgc cgg gga cgg gtg gag atg
tca rcg gat ggc arc gag gaa tgg tgc 381Arg Arg Gly Arg Val Glu Met
Ser Xaa Asp Gly Xaa Glu Glu Trp Cys35 40
45tgt gcc cag cga ggc cac caa gag gga cca gaa cct caa acg ggg caa
429Cys Ala Gln Arg Gly His Gln Glu Gly Pro Glu Pro Gln Thr Gly Gln50
55 60 65ctg ggg caa cca gat
cga gtt tgt act gac gag cgt ggg cta tgc cgt 477Leu Gly Gln Pro Asp
Arg Val Cys Thr Asp Glu Arg Gly Leu Cys Arg70 75
80ggg cct ggg caa tgt ctg gcg ctt ccc ata cct ctg cta tcg caa
cgg 525Gly Pro Gly Gln Cys Leu Ala Leu Pro Ile Pro Leu Leu Ser Gln
Arg85 90 95ggg agg cgc ctt cat gtt ccc
cta ctt cat cat gct cat ctt ctg cgg 573Gly Arg Arg Leu His Val Pro
Leu Leu His His Ala His Leu Leu Arg100 105
110gat ccc cct ctt ctt cat gga gct ctc ctt cgg cca gtt tgc aag cca
621Asp Pro Pro Leu Leu His Gly Ala Leu Leu Arg Pro Val Cys Lys Pro115
120 125ggg gtg cct ggg ggt ctg gag gat cag
ccc cat gtt caa agg agt ggg 669Gly Val Pro Gly Gly Leu Glu Asp Gln
Pro His Val Gln Arg Ser Gly130 135 140
145cta tgg tat gat ggt ggt gtc cac cta cat cgg cat cta cta
caa tgt 717Leu Trp Tyr Asp Gly Gly Val His Leu His Arg His Leu Leu
Gln Cys150 155 160ggt cat ctg cat cgc ctt
cta cta ctt ctt ctc gtc cat gac gca cgt 765Gly His Leu His Arg Leu
Leu Leu Leu Leu Leu Val His Asp Ala Arg165 170
175gct gcc ctg ggc cta ctg caa taa cccctggaac acgcatgact gcgccggtgt
819Ala Ala Leu Gly Leu Leu Gln *180actggacgcc tccaacctca ccaatggctc
tcggccagcc gccttgccca gcaacctctc 879ccacctgctc aaccacagcc tccagaggac
cagccccagc gaggagtact ggaggctgta 939cgtgctgaag ctgtcagatg acattgggaa
ctttggggag gtgcggctgc ccctccttgg 999ctgcctcggt gtctcctggt tggtcgtctt
cctctgcctc atccgagggg tcaagtcttc 1059agggaaagtg gtgtacttca cggccacgtt
cccctacgtg gtgctgacca ttctgtttgt 1119ccgcggagtg accctggagg gagcctttga
cggcatcatg tactacctaa ccccgcagtg 1179ggacaagatc ctggaggcca aggtgtgggg
tgatgctgcc tcccagatct tctactcact 1239gggctgcgcg tggggaggcc tcatcaccat
ggcttcctac aacaagttcc acaataactg 1299ttaccgggac agtgtcatca tcagcatcac
caactgtgcc accagcgtct atgctggctt 1359cgtcatcttc tccatcctcg gcttcatggc
caatcacctg ggcgtggatg tgtcccgtgt 1419ggcagaccac ggccctggcc tggccttcgt
ggcttacccc gaggccctca cactacttcc 1479catctccccg ctgtggtctc tgctcttctt
cttcatgctt atcctgctgg ggctgggcac 1539tcagttctgc ctcctggaga cgctggtcac
agccattgtg gatgaggtgg ggaatgagtg 1599gatcctgcag aaaaagacct atgtgacctt
gggcgtggct gtggctggct tcctgctggg 1659catccccctc accagccagg caggcatcta
ttggctgctg ctgatggaca actatgcggc 1719cagcttctcc ttggtggtca tctcctgcat
catgtgtgtg gccatcatgt acatctacgg 1779gcaccggaac tacttccagg acatccagat
gatgctggga ttcccaccac ccctcttctt 1839tcagatctgc tggcgcttcg tctctcccgc
catcatcttc tttattctag ttttcactgt 1899gatccagtac cagccgatca cctacaacca
ctaccagtac ccaggctggg ccgtggccat 1959tggcttcctc atggctctgt cctccgtcct
ctgcatcccc ctctacgcca tgttccggct 2019ctgccgcaca gacggggaca ccctcctcca
gcgtttgaaa aatgccacaa agccaagcag 2079agactggggc cctgccctcc tggagcaccg
gacagggcgc tacgccccca ccatagcccc 2139ctctcctgag gacggcttcg aggtccagcc
actgcacccg gacaaggcgc agatccccat 2199tgtgggcagt aatggctcca gccgcctcca
ggactcccgg atatgagcac agctgccagg 2259ggagtggccc cacccccacc ccgtgctcca
cgagagactg 2299
SEQ ID NO: 15; length 2129; DNA; Homo sapiens
5'UTR 1..234; CDS 235..612; 3'UTR 613..2088
cccacacacc ccactccagc
tccggagcac ccgtgctggg ctgcatgggg actggccgga 60ggggcagggc caggggagcg
ggtaggcaga gcttcgggag gagatgaggt gaaagtaatt 120gacgctgccc agcccggcag
tgggagaggc aggggatgcg tcagtgtcgc gctggagctg 180gcagaggtgt gaatgagcgg
cggagacacg cgggctgcga tcgctcgccc cagg atg 237Met1gcc gcg gct cat gga
cct gtg gcc ccc tct tcc cca gaa cag aat ggt 285Ala Ala Ala His Gly
Pro Val Ala Pro Ser Ser Pro Glu Gln Asn Gly5 10
15gct gtg ccc agc gag gcc acc aag agg gac cag aac ctc aaa cgg
ggc 333Ala Val Pro Ser Glu Ala Thr Lys Arg Asp Gln Asn Leu Lys Arg
Gly20 25 30aac tgg ggc aac cag atc gag
cgc ctt cat gtt ccc cta ctt cat cat 381Asn Trp Gly Asn Gln Ile Glu
Arg Leu His Val Pro Leu Leu His His35 40
45gct cat ctt ctg cgg gat ccc cct ctt ctt cat gga gct ctc ctt cgg
429Ala His Leu Leu Arg Asp Pro Pro Leu Leu His Gly Ala Leu Leu Arg50
55 60 65cca gtt tgc aag cca
ggg gtg cct ggg ggt ctg gag gat cag ccc cat 477Pro Val Cys Lys Pro
Gly Val Pro Gly Gly Leu Glu Asp Gln Pro His70 75
80gtt caa agg agt ggg cta tgg tat gat ggt ggt gtc cac cta cat
cgg 525Val Gln Arg Ser Gly Leu Trp Tyr Asp Gly Gly Val His Leu His
Arg85 90 95cat cta cta caa tgt ggt cat
ctg cat cgc ctt cta cta ctt ctt ctc 573His Leu Leu Gln Cys Gly His
Leu His Arg Leu Leu Leu Leu Leu Leu100 105
110gtc cat gac gca cgt gct gcc ctg ggc cta ctg caa taa cccctggaac
622Val His Asp Ala Arg Ala Ala Leu Gly Leu Leu Gln *115
120 125acgcatgact gcgccggtgt actggacgcc tccaacctca
ccaatggctc tcggccagcc 682gccttgccca gcaacctctc ccacctgctc aaccacagcc
tccagaggac cagccccagc 742gaggagtact ggaggctgta cgtgctgaag ctgtcagatg
acattgggaa ctttggggag 802gtgcggctgc ccctccttgg ctgcctcggt gtctcctggt
tggtcgtctt cctctgcctc 862atccgagggg tcaagtcttc agggaaagtg gtgtacttca
cggccacgtt cccctacgtg 922gtgctgacca ttctgtttgt ccgcggagtg accctggagg
gagcctttga cggcatcatg 982tactacctaa ccccgcagtg ggacaagatc ctggaggcca
aggtgtgggg tgatgctgcc 1042tcccagatct tctactcact gggctgcgcg tggggaggcc
tcatcaccat ggcttcctac 1102aacaagttcc acaataactg ttaccgggac agtgtcatca
tcagcatcac caactgtgcc 1162accagcgtct atgctggctt cgtcatcttc tccatcctcg
gcttcatggc caatcacctg 1222ggcgtggatg tgtcccgtgt ggcagaccac ggccctggcc
tggccttcgt ggcttacccc 1282gaggccctca cactacttcc catctccccg ctgtggtctc
tgctcttctt cttcatgctt 1342atcctgctgg ggctgggcac tcagttctgc ctcctggaga
cgctggtcac agccattgtg 1402gatgaggtgg ggaatgagtg gatcctgcag aaaaagacct
atgtgacctt gggcgtggct 1462gtggctggct tcctgctggg catccccctc accagccagg
caggcatcta ttggctgctg 1522ctgatggaca actatgcggc cagcttctcc ttggtggtca
tctcctgcat catgtgtgtg 1582gccatcatgt acatctacgg gcaccggaac tacttccagg
acatccagat gatgctggga 1642ttcccaccac ccctcttctt tcagatctgc tggcgcttcg
tctctcccgc catcatcttc 1702tttattctag ttttcactgt gatccagtac cagccgatca
cctacaacca ctaccagtac 1762ccaggctggg ccgtggccat tggcttcctc atggctctgt
cctccgtcct ctgcatcccc 1822ctctacgcca tgttccggct ctgccgcaca gacggggaca
ccctcctcca gcgtttgaaa 1882aatgccacaa agccaagcag agactggggc cctgccctcc
tggagcaccg gacagggcgc 1942tacgccccca ccatagcccc ctctcctgag gacggcttcg
aggtccagcc actgcacccg 2002gacaaggcgc agatccccat tgtgggcagt aatggctcca
gccgcctcca ggactcccgg 2062atatgagcac agctgccagg ggagtggccc cacccccacc
ccgtgctcca cgagagactg 2122tggggct
2129
SEQ ID NO: 16; length 2014; DNA; Homo sapiens
5'UTR 1..234; CDS 235..429; 3'UTR 430..2014
cccacacacc ccactccagc
tccggagcac ccgtgctggg ctgcatgggg actggccgga 60ggggcagggc caggggagcg
ggtaggcaga gcttcgggag gagatgaggt gaaagtaatt 120gacgctgccc agcccggcag
tgggagaggc aggggatgcg tcagtgtcgc gctggagctg 180gcagaggtgt gaatgagcgg
cggagacacg cgggctgcga tcgctcgccc cagg atg 237Met1gcc gcg gct cat gga
cct gtg gcc ccc tct tcc cca gaa cag gcg cct 285Ala Ala Ala His Gly
Pro Val Ala Pro Ser Ser Pro Glu Gln Ala Pro5 10
15tca tgt tcc cct act tca tca tgc tca tct tct gcg gga tcc ccc
tct 333Ser Cys Ser Pro Thr Ser Ser Cys Ser Ser Ser Ala Gly Ser Pro
Ser20 25 30tct tca tgg agc tct cct tcg
gcc agt ttg caa gcc agg ggt gcc tgg 381Ser Ser Trp Ser Ser Pro Ser
Ala Ser Leu Gln Ala Arg Gly Ala Trp35 40
45ggg tct gga gga tca gcc cca tgt tca aag gag tgg gct atg gta tga
429Gly Ser Gly Gly Ser Ala Pro Cys Ser Lys Glu Trp Ala Met Val *50
55 60tggtggtgtc cacctacatc ggcatctact
acaatgtggt catctgcatc gccttctact 489acttcttctc gtccatgacg cacgtgctgc
cctgggccta ctgcaataac ccctggaaca 549cgcatgactg cgccggtgta ctggacgcct
ccaacctcac caatggctct cggccagccg 609ccttgcccag caacctctcc cacctgctca
accacagcct ccagaggacc agccccagcg 669aggagtactg gaggctgtac gtgctgaagc
tgtcagatga cattgggaac tttggggagg 729tgcggctgcc cctccttggc tgcctcggtg
tctcctggtt ggtcgtcttc ctctgcctca 789tccgaggggt caagtcttca gggaaagtgg
tgtacttcac ggccacgttc ccctacgtgg 849tgctgaccat tctgtttgtc cgcggagtga
ccctggaggg agcctttgac ggcatcatgt 909actacctaac cccgcagtgg gacaagatcc
tggaggccaa ggtgtggggt gatgctgcct 969cccagatctt ctactcactg ggctgcgcgt
ggggaggcct catcaccatg gcttcctaca 1029acaagttcca caataactgt taccgggaca
gtgtcatcat cagcatcacc aactgtgcca 1089ccagcgtcta tgctggcttc gtcatcttct
ccatcctcgg cttcatggcc aatcacctgg 1149gcgtggatgt gtcccgtgtg gcagaccacg
gccctggcct ggccttcgtg gcttaccccg 1209aggccctcac actacttccc atctccccgc
tgtggtctct gctcttcttc ttcatgctta 1269tcctgctggg gctgggcact cagttctgcc
tcctggagac gctggtcaca gccattgtgg 1329atgaggtggg gaatgagtgg atcctgcaga
aaaagaccta tgtgaccttg ggcgtggctg 1389tggctggctt cctgctgggc atccccctca
ccagccaggc aggcatctat tggctgctgc 1449tgatggacaa ctatgcggcc agcttctcct
tggtggtcat ctcctgcatc atgtgtgtgg 1509ccatcatgta catctacggg caccggaact
acttccagga catccagatg atgctgggat 1569tcccaccacc cctcttcttt cagatctgct
ggcgcttcgt ctctcccgcc atcatcttct 1629ttattctagt tttcactgtg atccagtacc
agccgatcac ctacaaccac taccagtacc 1689caggctgggc cgtggccatt ggcttcctca
tggctctgtc ctccgtcctc tgcatccccc 1749tctacgccat gttccggctc tgccgcacag
acggggacac cctcctccag cgtttgaaaa 1809atgccacaaa gccaagcaga gactggggcc
ctgccctcct ggagcaccgg acagggcgct 1869acgcccccac catagccccc tctcctgagg
acggcttcga ggtccagcca ctgcacccgg 1929acaaggcgca gatccccatt gtgggcagta
atggctccag ccgcctccag gactcccgga 1989tatgagcaca gctgccaggg gagtg
2014
SEQ ID NO: 17; length 2278; DNA; Homo sapiens
5'UTR 1..234; CDS 235..924; 3'UTR 925..2242
cccacacacc ccactccagc
tccggagcac ccgtgctggg ctgcatgggg actggccgga 60ggggcagggc caggggagcg
ggtaggcaga gcttcgggag gagatgaggt gaaagtaatt 120gacgctgccc agcccggcag
tgggagaggc aggggatgcg tcagtgtcgc gctggagctg 180gcagaggtgt gaatgagcgg
cggagacacg cgggctgcga tcgctcgccc cagg atg 237Met1gcc gcg gct cat gga
cct gtg gcc ccc tct tcc cca gaa cag aat ggt 285Ala Ala Ala His Gly
Pro Val Ala Pro Ser Ser Pro Glu Gln Asn Gly5 10
15gct gtg ccc agc gag gcc acc aag agg gac cag aac ctc aaa cgg
ggc 333Ala Val Pro Ser Glu Ala Thr Lys Arg Asp Gln Asn Leu Lys Arg
Gly20 25 30aac tgg ggc aac cag atc gag
ttt gta ctg acg agc gtg ggc tat gcc 381Asn Trp Gly Asn Gln Ile Glu
Phe Val Leu Thr Ser Val Gly Tyr Ala35 40
45gtg ggc ctg ggc aat gtc tgg cgc ttc cca tac ctc tgc tat cgc aac
429Val Gly Leu Gly Asn Val Trp Arg Phe Pro Tyr Leu Cys Tyr Arg Asn50
55 60 65ggg gga ggc gcc ttc
atg ttc ccc tac ttc atc atg ctc atc ttc tgc 477Gly Gly Gly Ala Phe
Met Phe Pro Tyr Phe Ile Met Leu Ile Phe Cys70 75
80ggg atc ccc ctc ttc ttc atg gag ctc tcc ttc ggc cag ttt gca
agc 525Gly Ile Pro Leu Phe Phe Met Glu Leu Ser Phe Gly Gln Phe Ala
Ser85 90 95cag ggg tgc ctg ggg gtc tgg
agg atc agc ccc atg ttc aaa gga gtg 573Gln Gly Cys Leu Gly Val Trp
Arg Ile Ser Pro Met Phe Lys Gly Val100 105
110ggc tat ggt atg atg gtg gtg tcc acc tac atc ggc atc tac tac aat
621Gly Tyr Gly Met Met Val Val Ser Thr Tyr Ile Gly Ile Tyr Tyr Asn115
120 125gtg gtc atc tgc atc gcc ttc tac tac
ttc ttc tcg tcc atg acg cac 669Val Val Ile Cys Ile Ala Phe Tyr Tyr
Phe Phe Ser Ser Met Thr His130 135 140
145gtg ctg ccc tgg gcc tac tgc aat aac ccc tgg aac acg cat
gac tgc 717Val Leu Pro Trp Ala Tyr Cys Asn Asn Pro Trp Asn Thr His
Asp Cys150 155 160gcc ggt gta ctg gac gcc
tcc aac ctc acc aat ggc tct cgg cca gcc 765Ala Gly Val Leu Asp Ala
Ser Asn Leu Thr Asn Gly Ser Arg Pro Ala165 170
175gcc ttg ccc agc aac ctc tcc cac ctg ctc aac cac agc ctc cag agg
813Ala Leu Pro Ser Asn Leu Ser His Leu Leu Asn His Ser Leu Gln Arg180
185 190acc agc ccc agc gag gag tac tgg aga
atc tcg ctc tgt tgc cca ggc 861Thr Ser Pro Ser Glu Glu Tyr Trp Arg
Ile Ser Leu Cys Cys Pro Gly195 200 205tgg
agt gca gtg gca cga tct cag ctc act gca acc tct gcc tcc cgg 909Trp
Ser Ala Val Ala Arg Ser Gln Leu Thr Ala Thr Ser Ala Ser Arg210
215 220 225ggc tgt acg tgc tga
agctgtcaga tgacattggg aactttgggg aggtgcggct 964Gly Cys Thr Cys
*gcccctcctt ggctgcctcg gtgtctcctg gttggtcgtc ttcctctgcc tcatccgagg
1024ggtcaagtct tcagggaaag tggtgtactt cacggccacg ttcccctacg tggtgctgac
1084cattctgttt gtccgcggag tgaccctgga gggagccttt gacggcatca tgtactacct
1144aaccccgcag tgggacaaga tcctggaggc caaggtgtgg ggtgatgctg cctcccagat
1204cttctactca ctgggctgcg cgtggggagg cctcatcacc atggcttcct acaacaagtt
1264ccacaataac tgttaccggg acagtgtcat catcagcatc accaactgtg ccaccagcgt
1324ctatgctggc ttcgtcatct tctccatcct cggcttcatg gccaatcacc tgggcgtgga
1384tgtgtcccgt gtggcagacc acggccctgg cctggccttc gtggcttacc ccgaggccct
1444cacactactt cccatctccc cgctgtggtc tctgctcttc ttcttcatgc ttatcctgct
1504ggggctgggc actcagttct gcctcctgga gacgctggtc acagccattg tggatgaggt
1564ggggaatgag tggatcctgc agaaaaagac ctatgtgacc ttgggcgtgg ctgtggctgg
1624cttcctgctg ggcatccccc tcaccagcca ggcaggcatc tattggctgc tgctgatgga
1684caactatgcg gccagcttct ccttggtggt catctcctgc atcatgtgtg tggccatcat
1744gtacatctac gggcaccgga actacttcca ggacatccag atgatgctgg gattcccacc
1804acccctcttc tttcagatct gctggcgctt cgtctctccc gccatcatct tctttattct
1864agttttcact gtgatccagt accagccgat cacctacaac cactaccagt acccaggctg
1924ggccgtggcc attggcttcc tcatggctct gtcctccgtc ctctgcatcc ccctctacgc
1984catgttccgg ctctgccgca cagacgggga caccctcctc cagcgtttga aaaatgccac
2044aaagccaagc agagactggg gccctgccct cctggagcac cggacagggc gctacgcccc
2104caccatagcc ccctctcctg aggacggctt cgaggtccag ccactgcacc cggacaaggc
2164gcagatcccc attgtgggca gtaatggctc cagccgcctc caggactccc ggatatgagc
2224acagctgcca ggggagtggc cccaccccca ccccgtgctc cacgagtgac tgtg
2278
SEQ ID NO: 18; length 2358; DNA; Homo sapiens
5'UTR 1..234; CDS 235..519; 3'UTR 520..2322
cccacacacc ccactccagc tccggagcac ccgtgctggg ctgcatgggg actggccgga
60ggggcagggc caggggagcg ggtaggcaga gcttcgggag gagatgaggt gaaagtaatt
120gacgctgccc agcccggcag tgggagaggc aggggatgcg tcagtgtcgc gctggagctg
180gcagaggtgt gaatgagcgg cggagacacg cgggctgcga tcgctcgccc cagg atg
237Met1gcc gcg gct cat gga cct gtg gcc ccc tct tcc cca gaa cag ggt cca
285Ala Ala Ala His Gly Pro Val Ala Pro Ser Ser Pro Glu Gln Gly Pro5
10 15ggg caa gtg ccc gca tgg gct ggg agc
cgg tgt gtc ctc atc cct gga 333Gly Gln Val Pro Ala Trp Ala Gly Ser
Arg Cys Val Leu Ile Pro Gly20 25 30cgc
cgg gga cgg gtg gag atg tca gcg gat ggc agc gag ggt gag gag 381Arg
Arg Gly Arg Val Glu Met Ser Ala Asp Gly Ser Glu Gly Glu Glu35
40 45act cga aag cta cac ttc tcc cct gta gga cat
cct gac ttg acc ttc 429Thr Arg Lys Leu His Phe Ser Pro Val Gly His
Pro Asp Leu Thr Phe50 55 60
65aaa aga atg gtg ctg tgc cca gcg agg cca cca aga ggg acc aga acc
477Lys Arg Met Val Leu Cys Pro Ala Arg Pro Pro Arg Gly Thr Arg Thr70
75 80tca aac ggg gca act ggg gca acc aga
tcg agt ttg tac tga 519Ser Asn Gly Ala Thr Gly Ala Thr Arg
Ser Ser Leu Tyr *85 90cgagcgtggg ctatgccgtg ggcctgggca
atgtctggcg cttcccatac ctctgctatc 579gcaacggggg aggcgccttc atgttcccct
acttcatcat gctcatcttc tgcgggatcc 639ccctcttctt catggagctc tccttcggcc
agtttgcaag ccaggggtgc ctgggggtct 699ggaggatcag ccccatgttc aaaggagtgg
gctatggtat gatggtggtg tccacctaca 759tcggcatcta ctacaatgtg gtcatctgca
tcgccttcta ctacttcttc tcgtccatga 819cgcacgtgct gccctgggcc tactgcaata
acccctggaa cacgcatgac tgcgccggtg 879tactggacgc ctccaacctc accaatggct
ctcggccagc cgccttgccc agcaacctct 939cccacctgct caaccacagc ctccagagga
ccagccccag cgaggagtac tggaggctgt 999acgtgctgaa gctgtcagat gacattggga
actttgggga ggtgcggctg cccctccttg 1059gctgcctcgg tgtctcctgg ttggtcgtct
tcctctgcct catccgaggg gtcaagtctt 1119cagggaaagt ggtgtacttc acggccacgt
tcccctacgt ggtgctgacc attctgtttg 1179tccgcggagt gaccctggag ggagcctttg
acggcatcat gtactaccta accccgcagt 1239gggacaagat cctggaggcc aaggtgtggg
gtgatgctgc ctcccagatc ttctactcac 1299tgggctgcgc gtggggaggc ctcatcacca
tggcttccta caacaagttc cacaataact 1359gttaccggga cagtgtcatc atcagcatca
ccaactgtgc caccagcgtc tatgctggct 1419tcgtcatctt ctccatcctc ggcttcatgg
ccaatcacct gggcgtggat gtgtcccgtg 1479tggcagacca cggccctggc ctggccttcg
tggcttaccc cgaggccctc acactacttc 1539ccatctcccc gctgtggtct ctgctcttct
tcttcatgct tatcctgctg gggctgggca 1599ctcagttctg cctcctggag acgctggtca
cagccattgt ggatgaggtg gggaatgagt 1659ggatcctgca gaaaaagacc tatgtgacct
tgggcgtggc tgtggctggc ttcctgctgg 1719gcatccccct caccagccag gcatctattg
gctgctgctg atggacaact atgcggccag 1779cttctccttg gtggtcatct cctgcatcat
gtgtgtggcc atcatgtaca tctacgggca 1839ccggaactac ttccaggaca tccagatgat
gctgggattc ccaccacccc tcttctttca 1899gatctgctgg cgcttcgtct ctcccgccat
catcttcttt attctagttt tcactgtgat 1959ccagtaccag ccgatcacct acaaccacta
ccagtaccca ggctgggccg tggccattgg 2019cttcctcatg gctctgtcct ccgtcctctg
catccccctc tacgccatgt tccggctctg 2079ccgcacagac ggggacaccc tcctccagcg
tttgaaaaat gccacaaagc caagcagaga 2139ctggggccct gccctcctgg agcaccggac
agggcgctac gcccccacca tagccccctc 2199tcctgaggac ggcttcgagg tccagccact
gcacccggac aaggcgcaga tccccattgt 2259gggcagtaat ggctccagcc gcctccagga
ctcccggata tgagcacagc tgccagggga 2319gtggccccac ccccaccccg tgctccacga
tagactgtg 2358
SEQ ID NO: 19; length 2200; DNA; Homo sapiens
5'UTR 1..234; CDS 235..1605; 3'UTR 1606..2167
cccacacacc ccactccagc
tccggagcac ccgtgctggg ctgcatgggg actggccgga 60ggggcagggc caggggagcg
ggtaggcaga gcttcgggag gagatgaggt gaaagtaatt 120gacgctgccc agcccggcag
tgggagaggc aggggatgcg tcagtgtcgc gctggagctg 180gcagaggtgt gaatgagcgg
cggagacacg cgggctgcga tcgctcgccc cagg atg 237Met1gcc gcg gct cat gga
cct gtg gcc ccc tct tcc cca gaa cag aat ggt 285Ala Ala Ala His Gly
Pro Val Ala Pro Ser Ser Pro Glu Gln Asn Gly5 10
15gct gtg ccc agc gag gcc acc aag agg gac cag aac ctc aaa cgg
ggc 333Ala Val Pro Ser Glu Ala Thr Lys Arg Asp Gln Asn Leu Lys Arg
Gly20 25 30aac tgg ggc aac cag atc gag
ttt gta ctg acg agc gtg ggc tat gcc 381Asn Trp Gly Asn Gln Ile Glu
Phe Val Leu Thr Ser Val Gly Tyr Ala35 40
45gtg ggc ctg ggc aat gtc tgg cgc ttc cca tac ctc tgc tat cgc aac
429Val Gly Leu Gly Asn Val Trp Arg Phe Pro Tyr Leu Cys Tyr Arg Asn50
55 60 65ggg gga ggc gcc ttc
atg ttc ccc tac ttc atc atg ctc atc ttc tgc 477Gly Gly Gly Ala Phe
Met Phe Pro Tyr Phe Ile Met Leu Ile Phe Cys70 75
80ggg atc ccc ctc ttc ttc atg gag ctc tcc ttc ggc cag ttt gca
agc 525Gly Ile Pro Leu Phe Phe Met Glu Leu Ser Phe Gly Gln Phe Ala
Ser85 90 95cag ggg tgc ctg ggg gtc tgg
agg atc agc ccc atg ttc aaa gga gtg 573Gln Gly Cys Leu Gly Val Trp
Arg Ile Ser Pro Met Phe Lys Gly Val100 105
110ggc tat ggt atg atg gtg gtg tcc acc tac atc ggc atc tac tac aat
621Gly Tyr Gly Met Met Val Val Ser Thr Tyr Ile Gly Ile Tyr Tyr Asn115
120 125gtg gtc atc tgc atc gcc ttc tac tac
ttc ttc tcg tcc atg acg cac 669Val Val Ile Cys Ile Ala Phe Tyr Tyr
Phe Phe Ser Ser Met Thr His130 135 140
145gtg ctg ccc tgg gcc tac tgc aat aac ccc tgg aac acg cat
gac tgc 717Val Leu Pro Trp Ala Tyr Cys Asn Asn Pro Trp Asn Thr His
Asp Cys150 155 160gcc ggt gta ctg gac gcc
tcc aac ctc acc aat ggc tct cgg cca gcc 765Ala Gly Val Leu Asp Ala
Ser Asn Leu Thr Asn Gly Ser Arg Pro Ala165 170
175gcc ttg ccc agc aac ctc tcc cac ctg ctc aac cac agc ctc cag agg
813Ala Leu Pro Ser Asn Leu Ser His Leu Leu Asn His Ser Leu Gln Arg180
185 190acc agc ccc agc gag gag tac tgg agg
ctg tac gtg ctg aag ctg tca 861Thr Ser Pro Ser Glu Glu Tyr Trp Arg
Leu Tyr Val Leu Lys Leu Ser195 200 205gat
gac att ggg aac ttt ggg gag gtg cgg ctg ccc ctc ctt ggc tgc 909Asp
Asp Ile Gly Asn Phe Gly Glu Val Arg Leu Pro Leu Leu Gly Cys210
215 220 225ctc ggt gtc tcc tgg ttg
gtc gtc ttc ctc tgc ctc atc cga ggg gtc 957Leu Gly Val Ser Trp Leu
Val Val Phe Leu Cys Leu Ile Arg Gly Val230 235
240aag tct tca ggg aaa gtg gtg tac ttc acg gcc acg ttc ccc tac gtg
1005Lys Ser Ser Gly Lys Val Val Tyr Phe Thr Ala Thr Phe Pro Tyr Val245
250 255gtg ctg acc att ctg ttt gtc cgc gga
gtg acc ctg gag gga gcc ttt 1053Val Leu Thr Ile Leu Phe Val Arg Gly
Val Thr Leu Glu Gly Ala Phe260 265 270gac
ggc atc atg tac tac cta acc ccg cag tgg gac aag atc ctg gag 1101Asp
Gly Ile Met Tyr Tyr Leu Thr Pro Gln Trp Asp Lys Ile Leu Glu275
280 285gcc aag gtg tgg ggt gat gct gcc tcc cag atc
ttc tac tca ctg ggc 1149Ala Lys Val Trp Gly Asp Ala Ala Ser Gln Ile
Phe Tyr Ser Leu Gly290 295 300
305tgc gcg tgg gga ggc ctc atc acc atg gct tcc tac aac aag ttc cac
1197Cys Ala Trp Gly Gly Leu Ile Thr Met Ala Ser Tyr Asn Lys Phe His310
315 320aat aac tgt tac cgg gac agt gtc atc
atc agc atc acc aac tgt gcc 1245Asn Asn Cys Tyr Arg Asp Ser Val Ile
Ile Ser Ile Thr Asn Cys Ala325 330 335acc
agc gtc tat gct ggc ttc gtc atc ttc tcc atc ctc ggc ttc atg 1293Thr
Ser Val Tyr Ala Gly Phe Val Ile Phe Ser Ile Leu Gly Phe Met340
345 350gcc aat cac ctg ggc gtg gat gtg tcc cgt gtg
gca gac cac ggc cct 1341Ala Asn His Leu Gly Val Asp Val Ser Arg Val
Ala Asp His Gly Pro355 360 365ggc ctg gcc
ttc gtg gct tac ccc gag gcc ctc aca cta ctt ccc atc 1389Gly Leu Ala
Phe Val Ala Tyr Pro Glu Ala Leu Thr Leu Leu Pro Ile370
375 380 385tcc ccg ctg tgg tct ctg ctc
ttc ttc ttc atg ctt atc ctg ctg ggg 1437Ser Pro Leu Trp Ser Leu Leu
Phe Phe Phe Met Leu Ile Leu Leu Gly390 395
400ctg ggc act cag ttc tgc ctc ctg gag acg ctg gtc aca gcc att gtg
1485Leu Gly Thr Gln Phe Cys Leu Leu Glu Thr Leu Val Thr Ala Ile Val405
410 415gat gag gtg ggg aat gag tgg atc ctg
cag aaa aag acc tat gtg acc 1533Asp Glu Val Gly Asn Glu Trp Ile Leu
Gln Lys Lys Thr Tyr Val Thr420 425 430ttg
ggc gtg gct gtg gct ggc ttc ctg ctg ggc atc ccc ctc acc agc 1581Leu
Gly Val Ala Val Ala Gly Phe Leu Leu Gly Ile Pro Leu Thr Ser435
440 445cag gca tct att ggc tgc tgc tga tggacaacta
tgcggccagc ttctccttgg 1635Gln Ala Ser Ile Gly Cys Cys *450
455tggtcatctc ctgcatcatg tgtgtggcca tcatgtacat ctacgggcac cggaactact
1695tccaggacat ccagatgatg ctgggattcc caccacccct cttctttcag atctgctggc
1755gcttcgtctc tcccgccatc atcttcttta ttctagtttt cactgtgatc cagtaccagc
1815cgatcaccta caaccactac cagtacccag gctgggccgt ggccattggc ttcctcatgg
1875ctctgtcctc cgtcctctgc atccccctct acgccatgtt ccggctctgc cgcacagacg
1935gggacaccct cctccagcgt ttgaaaaatg ccacaaagcc aagcagagac tggggccctg
1995ccctcctgga gcaccggaca gggcgctacg cccccaccat agccccctct cctgaggacg
2055gcttcgaggt ccagccactg cacccggaca aggcgcagat ccccattgtg ggcagtaatg
2115gctccagccg cctccaggac tcccggatat gagcacagct gccaggggag tggccccacc
2175cccaccccgt gctccacgag agact
2200
SEQ ID NO: 20; length 2406; DNA; Homo sapiens; 5'UTR 1..234; CDS 235..1887; 3'UTR 1888..2371
cccacacacc ccactccagc tccggagcac ccgtgctggg ctgcatgggg actggccgga
60ggggcagggc caggggagcg ggtaggcaga gcttcgggag gagatgaggt gaaagtaatt
120gacgctgccc agcccggcag tgggagaggc aggggatgcg tcagtgtcgc gctggagctg
180gcagaggtgt gaatgagcgg cggagacacg cgggctgcga tcgctcgccc cagg atg
237Met1gcc gcg gct cat gga cct gtg gcc ccc tct tcc cca gaa cag aat ggt
285Ala Ala Ala His Gly Pro Val Ala Pro Ser Ser Pro Glu Gln Asn Gly5
10 15gct gtg ccc agc gag gcc acc aag agg
gac cag aac ctc aaa cgg ggc 333Ala Val Pro Ser Glu Ala Thr Lys Arg
Asp Gln Asn Leu Lys Arg Gly20 25 30aac
tgg ggc aac cag atc gag ttt gta ctg acg agc gtg ggc tat gcc 381Asn
Trp Gly Asn Gln Ile Glu Phe Val Leu Thr Ser Val Gly Tyr Ala35
40 45gtg ggc ctg ggc aat gtc tgg cgc ttc cca tac
ctc tgc tat cgc aac 429Val Gly Leu Gly Asn Val Trp Arg Phe Pro Tyr
Leu Cys Tyr Arg Asn50 55 60
65ggg gga ggc gcc ttc atg ttc ccc tac ttc atc atg ctc atc ttc tgc
477Gly Gly Gly Ala Phe Met Phe Pro Tyr Phe Ile Met Leu Ile Phe Cys70
75 80ggg atc ccc ctc ttc ttc atg gag ctc
tcc ttc ggc cag ttt gca agc 525Gly Ile Pro Leu Phe Phe Met Glu Leu
Ser Phe Gly Gln Phe Ala Ser85 90 95cag
ggg tgc ctg ggg gtc tgg agg atc agc ccc atg ttc aaa gga gtg 573Gln
Gly Cys Leu Gly Val Trp Arg Ile Ser Pro Met Phe Lys Gly Val100
105 110ggc tat ggt atg atg gtg gtg tcc acc tac atc
ggc atc tac tac aat 621Gly Tyr Gly Met Met Val Val Ser Thr Tyr Ile
Gly Ile Tyr Tyr Asn115 120 125gtg gtc atc
tgc atc gcc ttc tac tac ttc ttc tcg tcc atg acg cac 669Val Val Ile
Cys Ile Ala Phe Tyr Tyr Phe Phe Ser Ser Met Thr His130
135 140 145gtg ctg ccc tgg gcc tac tgc
aat aac ccc tgg aac acg cat gac tgc 717Val Leu Pro Trp Ala Tyr Cys
Asn Asn Pro Trp Asn Thr His Asp Cys150 155
160gcc ggt gta ctg gac gcc tcc aac ctc acc aat ggc tct cgg cca gcc
765Ala Gly Val Leu Asp Ala Ser Asn Leu Thr Asn Gly Ser Arg Pro Ala165
170 175gcc ttg ccc agc aac ctc tcc cac ctg
ctc aac cac agc ctc cag agg 813Ala Leu Pro Ser Asn Leu Ser His Leu
Leu Asn His Ser Leu Gln Arg180 185 190acc
agc ccc agc gag gag tac tgg agg ctg tac gtg ctg aag ctg tca 861Thr
Ser Pro Ser Glu Glu Tyr Trp Arg Leu Tyr Val Leu Lys Leu Ser195
200 205gat gac att ggg aac ttt ggg gag gtg cgg ctg
ccc ctc ctt ggc tgc 909Asp Asp Ile Gly Asn Phe Gly Glu Val Arg Leu
Pro Leu Leu Gly Cys210 215 220
225ctc ggt gtc tcc tgg ttg gtc gtc ttc ctc tgc ctc atc cga ggg gtc
957Leu Gly Val Ser Trp Leu Val Val Phe Leu Cys Leu Ile Arg Gly Val230
235 240aag tct tca ggg aaa gtg gtg tac ttc
acg gcc acg ttc ccc tac gtg 1005Lys Ser Ser Gly Lys Val Val Tyr Phe
Thr Ala Thr Phe Pro Tyr Val245 250 255gtg
ctg acc att ctg ttt gtc cgc gga gtg acc ctg gag gga gcc ttt 1053Val
Leu Thr Ile Leu Phe Val Arg Gly Val Thr Leu Glu Gly Ala Phe260
265 270gac ggc atc atg tac tac cta acc ccg cag tgg
gac aag atc ctg gag 1101Asp Gly Ile Met Tyr Tyr Leu Thr Pro Gln Trp
Asp Lys Ile Leu Glu275 280 285gcc aag gtg
tgg ggt gat gct gcc tcc cag atc ttc tac tca ctg ggc 1149Ala Lys Val
Trp Gly Asp Ala Ala Ser Gln Ile Phe Tyr Ser Leu Gly290
295 300 305tgc gcg tgg gga ggc ctc atc
acc atg gct tcc tac aac aag ttc cac 1197Cys Ala Trp Gly Gly Leu Ile
Thr Met Ala Ser Tyr Asn Lys Phe His310 315
320aat aac tgt tac cgg gac agt gtc atc atc agc atc acc aac tgt gcc
1245Asn Asn Cys Tyr Arg Asp Ser Val Ile Ile Ser Ile Thr Asn Cys Ala325
330 335acc agc gtc tat gct ggc ttc gtc atc
ttc tcc atc ctc ggc ttc atg 1293Thr Ser Val Tyr Ala Gly Phe Val Ile
Phe Ser Ile Leu Gly Phe Met340 345 350gcc
aat cac ctg ggc gtg gat gtg tcc cgt gtg gca gac cac ggc cct 1341Ala
Asn His Leu Gly Val Asp Val Ser Arg Val Ala Asp His Gly Pro355
360 365ggc ctg gcc ttc gtg gct tac ccc gag gcc ctc
aca cta ctt ccc atc 1389Gly Leu Ala Phe Val Ala Tyr Pro Glu Ala Leu
Thr Leu Leu Pro Ile370 375 380
385tcc ccg ctg tgg tct ctg ctc ttc ttc ttc atg ctt atc ctg ctg ggg
1437Ser Pro Leu Trp Ser Leu Leu Phe Phe Phe Met Leu Ile Leu Leu Gly390
395 400ctg ggc act cag ttc tgc ctc ctg gag
acg ctg gtc aca gcc att gtg 1485Leu Gly Thr Gln Phe Cys Leu Leu Glu
Thr Leu Val Thr Ala Ile Val405 410 415gat
gag gtg ggg aat gag tgg atc ctg cag aaa aag acc tat gtg acc 1533Asp
Glu Val Gly Asn Glu Trp Ile Leu Gln Lys Lys Thr Tyr Val Thr420
425 430ttg ggc gtg gct gtg gct ggc ttc ctg ctg ggc
atc ccc ctc acc agc 1581Leu Gly Val Ala Val Ala Gly Phe Leu Leu Gly
Ile Pro Leu Thr Ser435 440 445cag gca ggc
atc tat tgg ctg ctg ctg atg gac aac tat gcg gcc agc 1629Gln Ala Gly
Ile Tyr Trp Leu Leu Leu Met Asp Asn Tyr Ala Ala Ser450
455 460 465ttc tcc ttg gtg gtc atc tcc
tgc atc atg tgt gtg gcc atc atg tac 1677Phe Ser Leu Val Val Ile Ser
Cys Ile Met Cys Val Ala Ile Met Tyr470 475
480atc tac ggg cac cgg aac tac ttc cag gac atc cag atg atg ctg gga
1725Ile Tyr Gly His Arg Asn Tyr Phe Gln Asp Ile Gln Met Met Leu Gly485
490 495ttc cca cca ccc ctc ttc ttt cag atc
tgc tgg cgc ttc gtc tct ccc 1773Phe Pro Pro Pro Leu Phe Phe Gln Ile
Cys Trp Arg Phe Val Ser Pro500 505 510gcc
atc atc ttc agc ctt ccc act cac cgc gct ccg gcc ccc ggc cct 1821Ala
Ile Ile Phe Ser Leu Pro Thr His Arg Ala Pro Ala Pro Gly Pro515
520 525gcc tgc ttc ttc cac ggc cca gat gtt gac agc
tgt ccc tgt tct gcc 1869Ala Cys Phe Phe His Gly Pro Asp Val Asp Ser
Cys Pro Cys Ser Ala530 535 540
545cca aat gcc ctt ttc tag tttgggaaac cagagaggaa agggtgctgg
1917Pro Asn Ala Leu Phe *550tagagggatg gcctggggtc tgggctctgg
gtcggcctca cgcccactcc cacacctgcc 1977ccgtgcagtt tattctagtt ttcactgtga
tccagtacca gccgatcacc tacaaccact 2037accagtaccc aggctgggcc gtggccattg
gcttcctcat ggctctgtcc tccgtcctct 2097gcatccccct ctacgccatg ttccggctct
gccgcacaga cggggacacc ctcctccagc 2157gtttgaaaaa tgccacaaag ccaagcagag
actggggccc tgccctcctg gagcaccgga 2217cagggcgcta cgcccccacc atagccccct
ctcctgagga cggcttcgag gtccagccac 2277tgcacccgga caaggcgcag atccccattg
tgggcagtaa tggctccagc cgcctccagg 2337actcccggat atgagcacag ctgccagggg
agtggcccca cccccacccc gtgctccacg 2397agagactgt
2406
SEQ ID NO: 21; length 2310; DNA; Homo sapiens; 5'UTR 1..234; CDS 235..801; 3'UTR 802..2277
cccacacacc ccactccagc
tccggagcac ccgtgctggg ctgcatgggg actggccgga 60ggggcagggc caggggagcg
ggtaggcaga gcttcgggag gagatgaggt gaaagtaatt 120gacgctgccc agcccggcag
tgggagaggc aggggatgcg tcagtgtcgc gctggagctg 180gcagaggtgt gaatgagcgg
cggagacacg cgggctgcga tcgctcgccc cagg atg 237Met1gcc gcg gct cat gga
cct gtg gcc ccc tct tcc cca gaa cag aac att 285Ala Ala Ala His Gly
Pro Val Ala Pro Ser Ser Pro Glu Gln Asn Ile5 10
15ttc cag ggt cca ggg caa gtg ccc gca tgg gct ggg agc cgg tgt
gtc 333Phe Gln Gly Pro Gly Gln Val Pro Ala Trp Ala Gly Ser Arg Cys
Val20 25 30ctc atc cct gga cgc cgg gga
cgg gtg gag atg tca gcg gat gac agc 381Leu Ile Pro Gly Arg Arg Gly
Arg Val Glu Met Ser Ala Asp Asp Ser35 40
45gag gaa tgg tgc tgt gcc cag cga ggc cac caa gag gga cca gaa cct
429Glu Glu Trp Cys Cys Ala Gln Arg Gly His Gln Glu Gly Pro Glu Pro50
55 60 65caa acg ggg caa ctg
ggg caa cca gat cga gtt tgt act gac gag cgt 477Gln Thr Gly Gln Leu
Gly Gln Pro Asp Arg Val Cys Thr Asp Glu Arg70 75
80ggg cta tgc cgt ggg cct ggg caa tgt ctg gcg ctt ccc ata cct
ctg 525Gly Leu Cys Arg Gly Pro Gly Gln Cys Leu Ala Leu Pro Ile Pro
Leu85 90 95cta tcg caa cgg ggg agg cgc
ctt cat gtt ccc cta ctt cat cat gct 573Leu Ser Gln Arg Gly Arg Arg
Leu His Val Pro Leu Leu His His Ala100 105
110cat ctt ctg cgg gat ccc cct ctt ctt cat gga gct ctc ctt cgg cca
621His Leu Leu Arg Asp Pro Pro Leu Leu His Gly Ala Leu Leu Arg Pro115
120 125gtt tgc aag cca ggg gtg cct ggg ggt
ctg gag gat cag ccc cat gtt 669Val Cys Lys Pro Gly Val Pro Gly Gly
Leu Glu Asp Gln Pro His Val130 135 140
145caa agg agt ggg cta tgg tat gat ggt ggt gtc cac cta cat
cgg cat 717Gln Arg Ser Gly Leu Trp Tyr Asp Gly Gly Val His Leu His
Arg His150 155 160cta cta caa tgt ggt cat
ctg cat cgc ctt cta cta ctt ctt ctc gtc 765Leu Leu Gln Cys Gly His
Leu His Arg Leu Leu Leu Leu Leu Leu Val165 170
175cat gac gca cgt gct gcc ctg ggc cta ctg caa taa cccctggaac
811His Asp Ala Arg Ala Ala Leu Gly Leu Leu Gln *180
185acgcatgact gcgccggtgt actggacgcc tccaacctca ccaatggctc tcggccagcc
871gccttgccca gcaacctctc ccacctgctc aaccacagcc tccagaggac cagccccagc
931gaggagtact ggaggctgta cgtgctgaag ctgtcagatg acattgggaa ctttggggag
991gtgcggctgc ccctccttgg ctgcctcggt gtctcctggt tggtcgtctt cctctgcctc
1051atccgagggg tcaagtcttc agggaaagtg gtgtacttca cggccacgtt cccctacgtg
1111gtgctgacca ttctgtttgt ccgcggagtg accctggagg gagcctttga cggcatcatg
1171tactacctaa ccccgcagtg ggacaagatc ctggaggcca aggtgtgggg tgatgctgcc
1231tcccagatct tctactcact gggctgcgcg tggggaggcc tcatcaccat ggcttcctac
1291aacaagttcc acaataactg ttaccgggac agtgtcatca tcagcatcac caactgtgcc
1351accagcgtct atgctggctt cgtcatcttc tccatcctcg gcttcatggc caatcacctg
1411ggcgtggatg tgtcccgtgt ggcagaccac ggccctggcc tggccttcgt ggcttacccc
1471gaggccctca cactacttcc catctccccg ctgtggtctc tgctcttctt cttcatgctt
1531atcctgctgg ggctgggcac tcagttctgc ctcctggaga cgctggtcac agccattgtg
1591gatgaggtgg ggaatgagtg gatcctgcag aaaaagacct atgtgacctt gggcgtggct
1651gtggctggct tcctgctggg catccccctc accagccagg caggcatcta ttggctgctg
1711ctgatggaca actatgcggc cagcttctcc ttggtggtca tctcctgcat catgtgtgtg
1771gccatcatgt acatctacgg gcaccggaac tacttccagg acatccagat gatgctggga
1831ttcccaccac ccctcttctt tcagatctgc tggcgcttcg tctctcccgc catcatcttc
1891tttattctag ttttcactgt gatccagtac cagccgatca cctacaacca ctaccagtac
1951ccaggctggg ccgtggccat tggcttcctc atggctctgt cctccgtcct ctgcatcccc
2011ctctacgcca tgttccggct ctgccgcaca gacggggaca ccctcctcca gcgtttgaaa
2071aatgccacaa agccaagcag agactggggc cctgccctcc tggagcaccg gacagggcgc
2131tacgccccca ccatagcccc ctctcctgag gacggcttcg aggtccagcc actgcacccg
2191gacaaggcgc agatccccat tgtgggcagt aatggctcca gccgcctcca ggactcccgg
2251atatgagcac agctgccagg ggagtggccc cacccccacc ccgtgctcca cgagagact
2310
SEQ ID NO: 22; length 633; PRT; Homo sapiens
Met Val Gly Lys Gly Ala Lys Gly Met Leu Asn
Gly Ala Val Pro Ser1 5 10
15Glu Ala Thr Lys Arg Asp Gln Asn Leu Lys Arg Gly Asn Trp Gly Asn20
25 30Gln Ile Glu Phe Val Leu Thr Ser Val Gly
Tyr Ala Val Gly Leu Gly35 40 45Asn Val
Trp Arg Phe Pro Tyr Leu Cys Tyr Arg Asn Gly Gly Gly Ala50
55 60Phe Met Phe Pro Tyr Phe Ile Met Leu Ile Phe Cys
Gly Ile Pro Leu65 70 75
80Phe Phe Met Glu Leu Ser Phe Gly Gln Phe Ala Ser Gln Gly Cys Leu85
90 95Gly Val Trp Arg Ile Ser Pro Met Phe Lys
Gly Val Gly Tyr Gly Met100 105 110Met Val
Val Ser Thr Tyr Ile Gly Ile Tyr Tyr Asn Val Val Ile Cys115
120 125Ile Ala Phe Tyr Tyr Phe Phe Ser Ser Met Thr His
Val Leu Pro Trp130 135 140Ala Tyr Cys Asn
Asn Pro Trp Asn Thr His Asp Cys Ala Gly Val Leu145 150
155 160Asp Ala Ser Asn Leu Thr Asn Gly Ser
Arg Pro Ala Ala Leu Pro Ser165 170 175Asn
Leu Ser His Leu Leu Asn His Ser Leu Gln Arg Thr Ser Pro Ser180
185 190Glu Glu Tyr Trp Arg Leu Tyr Val Leu Lys Leu
Ser Asp Asp Ile Gly195 200 205Asn Phe Gly
Glu Val Arg Leu Pro Leu Leu Gly Cys Leu Gly Val Ser210
215 220Trp Leu Val Val Phe Leu Cys Leu Ile Arg Gly Val
Lys Ser Ser Gly225 230 235
240Lys Val Val Tyr Phe Thr Ala Thr Phe Pro Tyr Val Val Leu Thr Ile245
250 255Leu Phe Val Arg Gly Val Thr Leu Glu
Gly Ala Phe Asp Gly Ile Met260 265 270Tyr
Tyr Leu Thr Pro Gln Trp Asp Lys Ile Leu Glu Ala Lys Val Trp275
280 285Gly Asp Ala Ala Ser Gln Ile Phe Tyr Ser Leu
Ala Cys Ala Trp Gly290 295 300Gly Leu Ile
Thr Met Ala Ser Tyr Asn Lys Phe His Asn Asn Cys Tyr305
310 315 320Arg Asp Ser Val Ile Ile Ser
Ile Thr Asn Cys Ala Thr Ser Val Tyr325 330
335Ala Gly Phe Val Ile Phe Ser Ile Leu Gly Phe Met Ala Asn His Leu340
345 350Gly Val Asp Val Ser Arg Val Ala Asp
His Gly Pro Gly Leu Ala Phe355 360 365Val
Ala Tyr Pro Glu Ala Leu Thr Leu Leu Pro Ile Ser Pro Leu Trp370
375 380Ser Leu Leu Phe Phe Phe Met Leu Ile Leu Leu
Gly Leu Gly Thr Gln385 390 395
400Phe Cys Leu Leu Glu Thr Leu Val Thr Ala Ile Val Asp Glu Val
Gly405 410 415Asn Glu Trp Ile Leu Gln Lys
Lys Thr Tyr Val Thr Leu Gly Val Ala420 425
430Val Ala Gly Phe Leu Leu Gly Ile Pro Leu Thr Ser Gln Ala Gly Ile435
440 445Tyr Trp Leu Leu Leu Met Asp Asn Tyr
Ala Ala Ser Phe Ser Leu Val450 455 460Val
Ile Ser Cys Ile Met Cys Val Ala Ile Met Tyr Ile Tyr Gly His465
470 475 480Arg Asn Tyr Phe Gln Asp
Ile Gln Met Met Leu Gly Phe Pro Pro Pro485 490
495Leu Phe Phe Gln Ile Cys Trp Arg Phe Val Ser Pro Ala Ile Ile
Phe500 505 510Phe Ile Leu Val Phe Thr Val
Ile Gln Tyr Gln Pro Ile Thr Tyr Asn515 520
525His Tyr Gln Tyr Pro Gly Trp Ala Val Ala Ile Gly Phe Leu Met Ala530
535 540Leu Ser Ser Val Leu Cys Ile Pro Leu
Tyr Ala Met Phe Arg Leu Cys545 550 555
560Arg Thr Asp Gly Asp Thr Leu Leu Gln Arg Leu Lys Asn Ala
Thr Lys565 570 575Pro Ser Arg Asp Trp Gly
Pro Ala Leu Leu Glu His Arg Thr Gly Arg580 585
590Tyr Ala Pro Thr Ile Ala Pro Ser Pro Glu Asp Gly Phe Glu Val
Gln595 600 605Ser Leu His Pro Asp Lys Ala
Gln Ile Pro Ile Val Gly Ser Asn Gly610 615
620Ser Ser Arg Leu Gln Asp Ser Arg Ile625
630
SEQ ID NO: 23; length 687; PRT; Homo sapiens
Met Val Gly Lys Gly Ala Lys Gly Met Leu Val Thr
Leu Leu Pro Val1 5 10
15Gln Arg Ser Phe Phe Leu Pro Pro Phe Ser Gly Ala Thr Pro Ser Thr20
25 30Ser Leu Ala Glu Ser Val Leu Lys Val Trp
His Gly Ala Tyr Asn Ser35 40 45Gly Leu
Leu Pro Gln Leu Met Ala Gln His Ser Leu Ala Met Ala Gln50
55 60Asn Gly Ala Val Pro Ser Glu Ala Thr Lys Arg Asp
Gln Asn Leu Lys65 70 75
80Arg Gly Asn Trp Gly Asn Gln Ile Glu Phe Val Leu Thr Ser Val Gly85
90 95Tyr Ala Val Gly Leu Gly Asn Val Trp Arg
Phe Pro Tyr Leu Cys Tyr100 105 110Arg Asn
Gly Gly Gly Ala Phe Met Phe Pro Tyr Phe Ile Met Leu Ile115
120 125Phe Cys Gly Ile Pro Leu Phe Phe Met Glu Leu Ser
Phe Gly Gln Phe130 135 140Ala Ser Gln Gly
Cys Leu Gly Val Trp Arg Ile Ser Pro Met Phe Lys145 150
155 160Gly Val Gly Tyr Gly Met Met Val Val
Ser Thr Tyr Ile Gly Ile Tyr165 170 175Tyr
Asn Val Val Ile Cys Ile Ala Phe Tyr Tyr Phe Phe Ser Ser Met180
185 190Thr His Val Leu Pro Trp Ala Tyr Cys Asn Asn
Pro Trp Asn Thr His195 200 205Asp Cys Ala
Gly Val Leu Asp Ala Ser Asn Leu Thr Asn Gly Ser Arg210
215 220Pro Ala Ala Leu Pro Ser Asn Leu Ser His Leu Leu
Asn His Ser Leu225 230 235
240Gln Arg Thr Ser Pro Ser Glu Glu Tyr Trp Arg Leu Tyr Val Leu Lys245
250 255Leu Ser Asp Asp Ile Gly Asn Phe Gly
Glu Val Arg Leu Pro Leu Leu260 265 270Gly
Cys Leu Gly Val Ser Trp Leu Val Val Phe Leu Cys Leu Ile Arg275
280 285Gly Val Lys Ser Ser Gly Lys Val Val Tyr Phe
Thr Ala Thr Phe Pro290 295 300Tyr Val Val
Leu Thr Ile Leu Phe Val Arg Gly Val Thr Leu Glu Gly305
310 315 320Ala Phe Asp Gly Ile Met Tyr
Tyr Leu Thr Pro Gln Trp Asp Lys Ile325 330
335Leu Glu Ala Lys Val Trp Gly Asp Ala Ala Ser Gln Ile Phe Tyr Ser340
345 350Leu Ala Cys Ala Trp Gly Gly Leu Ile
Thr Met Ala Ser Tyr Asn Lys355 360 365Phe
His Asn Asn Cys Tyr Arg Asp Ser Val Ile Ile Ser Ile Thr Asn370
375 380Cys Ala Thr Ser Val Tyr Ala Gly Phe Val Ile
Phe Ser Ile Leu Gly385 390 395
400Phe Met Ala Asn His Leu Gly Val Asp Val Ser Arg Val Ala Asp
His405 410 415Gly Pro Gly Leu Ala Phe Val
Ala Tyr Pro Glu Ala Leu Thr Leu Leu420 425
430Pro Ile Ser Pro Leu Trp Ser Leu Leu Phe Phe Phe Met Leu Ile Leu435
440 445Leu Gly Leu Gly Thr Gln Phe Cys Leu
Leu Glu Thr Leu Gly Thr Ala450 455 460Ile
Val Asp Glu Val Gly Asn Glu Trp Ile Leu Gln Lys Lys Thr Asn465
470 475 480Met Thr Leu Gly Arg Ala
Val Ala Gly Phe Leu Leu Gly Ile Pro Leu485 490
495Thr Ser Gln Ala Gly Ile Tyr Trp Leu Leu Leu Met Asp Asn Tyr
Ala500 505 510Ala Ser Phe Ser Leu Val Val
Ile Ser Cys Ile Met Cys Val Ala Ile515 520
525Met Tyr Ile Tyr Gly His Arg Asn Tyr Phe Gln Asp Ile Gln Met Met530
535 540Leu Gly Phe Pro Pro Pro Leu Phe Phe
Gln Ile Cys Trp Arg Phe Val545 550 555
560Ser Pro Ala Ile Ile Phe Phe Ile Leu Val Phe Thr Val Ile
Gln Tyr565 570 575Gln Pro Ile Thr Tyr Asn
His Tyr Gln Tyr Pro Gly Trp Ala Val Ala580 585
590Ile Gly Phe Leu Met Ala Leu Ser Ser Val Leu Cys Ile Pro Leu
Tyr595 600 605Ala Met Phe Arg Leu Cys Arg
Thr Asp Gly Asp Thr Leu Leu Gln Arg610 615
620Leu Lys Asn Ala Thr Lys Pro Ser Arg Asp Trp Gly Pro Ala Leu Leu625
630 635 640Glu His Arg Thr
Gly Arg Tyr Ala Pro Thr Ile Ala Pro Ser Pro Glu645 650
655Asp Gly Phe Glu Val Gln Ser Leu His Pro Asp Lys Ala Gln
Ile Pro660 665 670Ile Val Gly Ser Asn Gly
Ser Arg Arg Leu Gln Asp Ser Arg Ile675 680
685
SEQ ID NO: 24; length 638; PRT; Homo sapiens
Met Ala Ala Ala His Gly Pro Val Ala Pro Ser
Ser Pro Glu Gln Asn1 5 10
15Gly Ala Val Pro Ser Glu Ala Thr Lys Arg Asp Gln Asn Leu Lys Arg20
25 30Gly Asn Trp Gly Asn Gln Ile Glu Phe Val
Leu Thr Ser Val Gly Tyr35 40 45Ala Val
Gly Leu Gly Asn Val Trp Arg Phe Pro Tyr Leu Cys Tyr Arg50
55 60Asn Gly Gly Gly Ala Phe Met Phe Pro Tyr Phe Ile
Met Leu Ile Phe65 70 75
80Cys Gly Ile Pro Leu Phe Phe Met Glu Leu Ser Phe Gly Gln Phe Ala85
90 95Ser Gln Gly Cys Leu Gly Val Trp Arg Ile
Ser Pro Met Phe Lys Gly100 105 110Val Gly
Tyr Gly Met Met Val Val Ser Thr Tyr Ile Gly Ile Tyr Tyr115
120 125Asn Val Val Ile Cys Ile Ala Phe Tyr Tyr Phe Phe
Ser Ser Met Thr130 135 140His Val Leu Pro
Trp Ala Tyr Cys Asn Asn Pro Trp Asn Thr His Asp145 150
155 160Cys Ala Gly Val Leu Asp Ala Ser Asn
Leu Thr Asn Gly Ser Arg Pro165 170 175Ala
Ala Leu Pro Ser Asn Leu Ser His Leu Leu Asn His Ser Leu Gln180
185 190Arg Thr Ser Pro Ser Glu Glu Tyr Trp Arg Leu
Tyr Val Leu Lys Leu195 200 205Ser Asp Asp
Ile Gly Asn Phe Gly Glu Val Arg Leu Pro Leu Leu Gly210
215 220Cys Leu Gly Val Ser Trp Leu Val Val Phe Leu Cys
Leu Ile Arg Gly225 230 235
240Val Lys Ser Ser Gly Lys Val Val Tyr Phe Thr Ala Thr Phe Pro Tyr245
250 255Val Val Leu Thr Ile Leu Phe Val Arg
Gly Val Thr Leu Glu Gly Ala260 265 270Phe
Asp Gly Ile Met Tyr Tyr Leu Thr Pro Gln Trp Asp Lys Ile Leu275
280 285Glu Ala Lys Val Trp Gly Asp Ala Ala Ser Gln
Ile Phe Tyr Ser Leu290 295 300Ala Cys Ala
Trp Gly Gly Leu Ile Thr Met Ala Ser Tyr Asn Lys Phe305
310 315 320His Asn Asn Cys Tyr Arg Asp
Ser Val Ile Ile Ser Ile Thr Asn Cys325 330
335Ala Thr Ser Val Tyr Ala Gly Phe Val Ile Phe Ser Ile Leu Gly Phe340
345 350Met Ala Asn His Leu Gly Val Asp Val
Ser Arg Val Ala Asp His Gly355 360 365Pro
Gly Leu Ala Phe Val Ala Tyr Pro Glu Ala Leu Thr Leu Leu Pro370
375 380Ile Ser Pro Leu Trp Ser Leu Leu Phe Phe Phe
Met Leu Ile Leu Leu385 390 395
400Gly Leu Gly Thr Gln Phe Cys Leu Leu Glu Thr Leu Val Thr Ala
Ile405 410 415Val Asp Glu Val Gly Asn Glu
Trp Ile Leu Gln Lys Lys Thr Tyr Val420 425
430Thr Leu Gly Val Ala Val Ala Gly Phe Leu Leu Gly Ile Pro Leu Thr435
440 445Ser Gln Ala Gly Ile Tyr Trp Leu Leu
Leu Met Asp Asn Tyr Ala Ala450 455 460Ser
Phe Ser Leu Val Val Ile Ser Cys Ile Met Cys Val Ala Ile Met465
470 475 480Tyr Ile Tyr Gly His Arg
Asn Tyr Phe Gln Asp Ile Gln Met Met Leu485 490
495Gly Phe Pro Pro Pro Leu Phe Phe Gln Ile Cys Trp Arg Phe Val
Ser500 505 510Pro Ala Ile Ile Phe Phe Ile
Leu Val Phe Thr Val Ile Gln Tyr Gln515 520
525Pro Ile Thr Tyr Asn His Tyr Gln Tyr Pro Gly Trp Ala Val Ala Ile530
535 540Gly Phe Leu Met Ala Leu Ser Ser Val
Leu Cys Ile Pro Leu Tyr Ala545 550 555
560Met Phe Arg Leu Cys Arg Thr Asp Gly Asp Thr Leu Leu Gln
Arg Leu565 570 575Lys Asn Ala Thr Lys Pro
Ser Arg Asp Trp Gly Pro Ala Leu Leu Glu580 585
590His Arg Thr Gly Arg Tyr Ala Pro Thr Ile Ala Pro Ser Pro Glu
Asp595 600 605Gly Phe Glu Val Gln Ser Leu
His Pro Asp Lys Ala Gln Ile Pro Ile610 615
620Val Gly Ser Asn Gly Ser Ser Arg Leu Gln Asp Ser Arg Ile625
630 635
SEQ ID NO: 25; length 692; PRT; Homo sapiens
Met Ala Ala Ala His
Gly Pro Val Ala Pro Ser Ser Pro Glu Gln Val1 5
10 15Thr Leu Leu Pro Val Gln Arg Ser Phe Phe Leu
Pro Pro Phe Ser Gly20 25 30Ala Thr Pro
Ser Thr Ser Leu Ala Glu Ser Val Leu Lys Val Trp His35 40
45Gly Ala Tyr Asn Ser Gly Leu Leu Pro Gln Leu Met Ala
Gln His Ser50 55 60Leu Ala Met Ala Gln
Asn Gly Ala Val Pro Ser Glu Ala Thr Lys Arg65 70
75 80Asp Gln Asn Leu Lys Arg Gly Asn Trp Gly
Asn Gln Ile Glu Phe Val85 90 95Leu Thr
Ser Val Gly Tyr Ala Val Gly Leu Gly Asn Val Trp Arg Phe100
105 110Pro Tyr Leu Cys Tyr Arg Asn Gly Gly Gly Ala Phe
Met Phe Pro Tyr115 120 125Phe Ile Met Leu
Ile Phe Cys Gly Ile Pro Leu Phe Phe Met Glu Leu130 135
140Ser Phe Gly Gln Phe Ala Ser Gln Gly Cys Leu Gly Val Trp
Arg Ile145 150 155 160Ser
Pro Met Phe Lys Gly Val Gly Tyr Gly Met Met Val Val Ser Thr165
170 175Tyr Ile Gly Ile Tyr Tyr Asn Val Val Ile Cys
Ile Ala Phe Tyr Tyr180 185 190Phe Phe Ser
Ser Met Thr His Val Leu Pro Trp Ala Tyr Cys Asn Asn195
200 205Pro Trp Asn Thr His Asp Cys Ala Gly Val Leu Asp
Ala Ser Asn Leu210 215 220Thr Asn Gly Ser
Arg Pro Ala Ala Leu Pro Ser Asn Leu Ser His Leu225 230
235 240Leu Asn His Ser Leu Gln Arg Thr Ser
Pro Ser Glu Glu Tyr Trp Arg245 250 255Leu
Tyr Val Leu Lys Leu Ser Asp Asp Ile Gly Asn Phe Gly Glu Val260
265 270Arg Leu Pro Leu Leu Gly Cys Leu Gly Val Ser
Trp Leu Val Val Phe275 280 285Leu Cys Leu
Ile Arg Gly Val Lys Ser Ser Gly Lys Val Val Tyr Phe290
295 300Thr Ala Thr Phe Pro Tyr Val Val Leu Thr Ile Leu
Phe Val Arg Gly305 310 315
320Val Thr Leu Glu Gly Ala Phe Asp Gly Ile Met Tyr Tyr Leu Thr Pro325
330 335Gln Trp Asp Lys Ile Leu Glu Ala Lys
Val Trp Gly Asp Ala Ala Ser340 345 350Gln
Ile Phe Tyr Ser Leu Ala Cys Ala Trp Gly Gly Leu Ile Thr Met355
360 365Ala Ser Tyr Asn Lys Phe His Asn Asn Cys Tyr
Arg Asp Ser Val Ile370 375 380Ile Ser Ile
Thr Asn Cys Ala Thr Ser Val Tyr Ala Gly Phe Val Ile385
390 395 400Phe Ser Ile Leu Gly Phe Met
Ala Asn His Leu Gly Val Asp Val Ser405 410
415Arg Val Ala Asp His Gly Pro Gly Leu Ala Phe Val Ala Tyr Pro Glu420
425 430Ala Leu Thr Leu Leu Pro Ile Ser Pro
Leu Trp Ser Leu Leu Phe Phe435 440 445Phe
Met Leu Ile Leu Leu Gly Leu Gly Thr Gln Phe Cys Leu Leu Glu450
455 460Thr Leu Val Thr Ala Ile Val Asp Glu Val Gly
Asn Glu Trp Ile Leu465 470 475
480Gln Lys Lys Thr Tyr Val Thr Leu Gly Val Ala Val Ala Gly Phe
Leu485 490 495Leu Gly Ile Pro Leu Thr Ser
Gln Ala Gly Ile Tyr Trp Leu Leu Leu500 505
510Met Asp Asn Tyr Ala Ala Ser Phe Ser Leu Val Val Ile Ser Cys Ile515
520 525Met Cys Val Ala Ile Met Tyr Ile Tyr
Gly His Arg Asn Tyr Phe Gln530 535 540Asp
Ile Gln Met Met Leu Gly Phe Pro Pro Pro Leu Phe Phe Gln Ile545
550 555 560Cys Trp Arg Phe Val Ser
Pro Ala Ile Ile Phe Phe Ile Leu Val Phe565 570
575Thr Val Ile Gln Tyr Gln Pro Ile Thr Tyr Asn His Tyr Gln Tyr
Pro580 585 590Gly Trp Ala Val Ala Ile Gly
Phe Leu Met Ala Leu Ser Ser Val Leu595 600
605Cys Ile Pro Leu Tyr Ala Met Phe Arg Leu Cys Arg Thr Asp Gly Asp610
615 620Thr Leu Leu Gln Arg Leu Lys Asn Ala
Thr Lys Pro Ser Arg Asp Trp625 630 635
640Gly Pro Ala Leu Leu Glu His Arg Thr Gly Arg Tyr Ala Pro
Thr Ile645 650 655Ala Pro Ser Pro Glu Asp
Gly Phe Glu Val Gln Ser Leu His Pro Asp660 665
670Lys Ala Gln Ile Pro Ile Val Gly Ser Asn Gly Ser Ser Arg Leu
Gln675 680 685Asp Ser Arg
Ile
690
SEQ ID NO: 26; length 184; PRT; Homo sapiens; misc_feature 42: Xaa = A or T
Met Ala Ala Ala
His Gly Pro Val Ala Pro Ser Ser Pro Glu Gln Gly1 5
10 15Pro Gly Gln Val Pro Ala Trp Ala Gly Ser
Arg Cys Val Leu Ile Pro20 25 30Gly Arg
Arg Gly Arg Val Glu Met Ser Xaa Asp Gly Xaa Glu Glu Trp35
40 45Cys Cys Ala Gln Arg Gly His Gln Glu Gly Pro Glu
Pro Gln Thr Gly50 55 60Gln Leu Gly Gln
Pro Asp Arg Val Cys Thr Asp Glu Arg Gly Leu Cys65 70
75 80Arg Gly Pro Gly Gln Cys Leu Ala Leu
Pro Ile Pro Leu Leu Ser Gln85 90 95Arg
Gly Arg Arg Leu His Val Pro Leu Leu His His Ala His Leu Leu100
105 110Arg Asp Pro Pro Leu Leu His Gly Ala Leu Leu
Arg Pro Val Cys Lys115 120 125Pro Gly Val
Pro Gly Gly Leu Glu Asp Gln Pro His Val Gln Arg Ser130
135 140Gly Leu Trp Tyr Asp Gly Gly Val His Leu His Arg
His Leu Leu Gln145 150 155
160Cys Gly His Leu His Arg Leu Leu Leu Leu Leu Leu Val His Asp Ala165
170 175Arg Ala Ala Leu Gly Leu Leu
Gln
180
SEQ ID NO: 27; length 125; PRT; Homo sapiens
Met Ala Ala Ala His Gly Pro Val Ala Pro Ser
Ser Pro Glu Gln Asn1 5 10
15Gly Ala Val Pro Ser Glu Ala Thr Lys Arg Asp Gln Asn Leu Lys Arg20
25 30Gly Asn Trp Gly Asn Gln Ile Glu Arg Leu
His Val Pro Leu Leu His35 40 45His Ala
His Leu Leu Arg Asp Pro Pro Leu Leu His Gly Ala Leu Leu50
55 60Arg Pro Val Cys Lys Pro Gly Val Pro Gly Gly Leu
Glu Asp Gln Pro65 70 75
80His Val Gln Arg Ser Gly Leu Trp Tyr Asp Gly Gly Val His Leu His85
90 95Arg His Leu Leu Gln Cys Gly His Leu His
Arg Leu Leu Leu Leu Leu100 105 110Leu Val
His Asp Ala Arg Ala Ala Leu Gly Leu Leu Gln115 120
125
SEQ ID NO: 28; length 64; PRT; Homo sapiens
Met Ala Ala Ala His Gly Pro Val Ala Pro
Ser Ser Pro Glu Gln Ala1 5 10
15Pro Ser Cys Ser Pro Thr Ser Ser Cys Ser Ser Ser Ala Gly Ser Pro20
25 30Ser Ser Ser Trp Ser Ser Pro Ser Ala
Ser Leu Gln Ala Arg Gly Ala35 40 45Trp
Gly Ser Gly Gly Ser Ala Pro Cys Ser Lys Glu Trp Ala Met Val50
55 60
SEQ ID NO: 29; length 229; PRT; Homo sapiens
Met Ala Ala Ala His Gly
Pro Val Ala Pro Ser Ser Pro Glu Gln Asn1 5
10 15Gly Ala Val Pro Ser Glu Ala Thr Lys Arg Asp Gln
Asn Leu Lys Arg20 25 30Gly Asn Trp Gly
Asn Gln Ile Glu Phe Val Leu Thr Ser Val Gly Tyr35 40
45Ala Val Gly Leu Gly Asn Val Trp Arg Phe Pro Tyr Leu Cys
Tyr Arg50 55 60Asn Gly Gly Gly Ala Phe
Met Phe Pro Tyr Phe Ile Met Leu Ile Phe65 70
75 80Cys Gly Ile Pro Leu Phe Phe Met Glu Leu Ser
Phe Gly Gln Phe Ala85 90 95Ser Gln Gly
Cys Leu Gly Val Trp Arg Ile Ser Pro Met Phe Lys Gly100
105 110Val Gly Tyr Gly Met Met Val Val Ser Thr Tyr Ile
Gly Ile Tyr Tyr115 120 125Asn Val Val Ile
Cys Ile Ala Phe Tyr Tyr Phe Phe Ser Ser Met Thr130 135
140His Val Leu Pro Trp Ala Tyr Cys Asn Asn Pro Trp Asn Thr
His Asp145 150 155 160Cys
Ala Gly Val Leu Asp Ala Ser Asn Leu Thr Asn Gly Ser Arg Pro165
170 175Ala Ala Leu Pro Ser Asn Leu Ser His Leu Leu
Asn His Ser Leu Gln180 185 190Arg Thr Ser
Pro Ser Glu Glu Tyr Trp Arg Ile Ser Leu Cys Cys Pro195
200 205Gly Trp Ser Ala Val Ala Arg Ser Gln Leu Thr Ala
Thr Ser Ala Ser210 215 220Arg Gly Cys Thr
Cys
225
SEQ ID NO: 30; length 94; PRT; Homo sapiens
Met Ala Ala Ala His Gly Pro Val Ala Pro Ser
Ser Pro Glu Gln Gly1 5 10
15Pro Gly Gln Val Pro Ala Trp Ala Gly Ser Arg Cys Val Leu Ile Pro20
25 30Gly Arg Arg Gly Arg Val Glu Met Ser Ala
Asp Gly Ser Glu Gly Glu35 40 45Glu Thr
Arg Lys Leu His Phe Ser Pro Val Gly His Pro Asp Leu Thr50
55 60Phe Lys Arg Met Val Leu Cys Pro Ala Arg Pro Pro
Arg Gly Thr Arg65 70 75
80Thr Ser Asn Gly Ala Thr Gly Ala Thr Arg Ser Ser Leu Tyr85
90
SEQ ID NO: 31; length 456; PRT; Homo sapiens
Met Ala Ala Ala His Gly Pro Val Ala Pro Ser
Ser Pro Glu Gln Asn1 5 10
15Gly Ala Val Pro Ser Glu Ala Thr Lys Arg Asp Gln Asn Leu Lys Arg20
25 30Gly Asn Trp Gly Asn Gln Ile Glu Phe Val
Leu Thr Ser Val Gly Tyr35 40 45Ala Val
Gly Leu Gly Asn Val Trp Arg Phe Pro Tyr Leu Cys Tyr Arg50
55 60Asn Gly Gly Gly Ala Phe Met Phe Pro Tyr Phe Ile
Met Leu Ile Phe65 70 75
80Cys Gly Ile Pro Leu Phe Phe Met Glu Leu Ser Phe Gly Gln Phe Ala85
90 95Ser Gln Gly Cys Leu Gly Val Trp Arg Ile
Ser Pro Met Phe Lys Gly100 105 110Val Gly
Tyr Gly Met Met Val Val Ser Thr Tyr Ile Gly Ile Tyr Tyr115
120 125Asn Val Val Ile Cys Ile Ala Phe Tyr Tyr Phe Phe
Ser Ser Met Thr130 135 140His Val Leu Pro
Trp Ala Tyr Cys Asn Asn Pro Trp Asn Thr His Asp145 150
155 160Cys Ala Gly Val Leu Asp Ala Ser Asn
Leu Thr Asn Gly Ser Arg Pro165 170 175Ala
Ala Leu Pro Ser Asn Leu Ser His Leu Leu Asn His Ser Leu Gln180
185 190Arg Thr Ser Pro Ser Glu Glu Tyr Trp Arg Leu
Tyr Val Leu Lys Leu195 200 205Ser Asp Asp
Ile Gly Asn Phe Gly Glu Val Arg Leu Pro Leu Leu Gly210
215 220Cys Leu Gly Val Ser Trp Leu Val Val Phe Leu Cys
Leu Ile Arg Gly225 230 235
240Val Lys Ser Ser Gly Lys Val Val Tyr Phe Thr Ala Thr Phe Pro Tyr245
250 255Val Val Leu Thr Ile Leu Phe Val Arg
Gly Val Thr Leu Glu Gly Ala260 265 270Phe
Asp Gly Ile Met Tyr Tyr Leu Thr Pro Gln Trp Asp Lys Ile Leu275
280 285Glu Ala Lys Val Trp Gly Asp Ala Ala Ser Gln
Ile Phe Tyr Ser Leu290 295 300Gly Cys Ala
Trp Gly Gly Leu Ile Thr Met Ala Ser Tyr Asn Lys Phe305
310 315 320His Asn Asn Cys Tyr Arg Asp
Ser Val Ile Ile Ser Ile Thr Asn Cys325 330
335Ala Thr Ser Val Tyr Ala Gly Phe Val Ile Phe Ser Ile Leu Gly Phe340
345 350Met Ala Asn His Leu Gly Val Asp Val
Ser Arg Val Ala Asp His Gly355 360 365Pro
Gly Leu Ala Phe Val Ala Tyr Pro Glu Ala Leu Thr Leu Leu Pro370
375 380Ile Ser Pro Leu Trp Ser Leu Leu Phe Phe Phe
Met Leu Ile Leu Leu385 390 395
400Gly Leu Gly Thr Gln Phe Cys Leu Leu Glu Thr Leu Val Thr Ala
Ile405 410 415Val Asp Glu Val Gly Asn Glu
Trp Ile Leu Gln Lys Lys Thr Tyr Val420 425
430Thr Leu Gly Val Ala Val Ala Gly Phe Leu Leu Gly Ile Pro Leu Thr435
440 445Ser Gln Ala Ser Ile Gly Cys Cys450
455
SEQ ID NO: 32; length 550; PRT; Homo sapiens
Met Ala Ala Ala His Gly Pro Val
Ala Pro Ser Ser Pro Glu Gln Asn1 5 10
15Gly Ala Val Pro Ser Glu Ala Thr Lys Arg Asp Gln Asn Leu
Lys Arg20 25 30Gly Asn Trp Gly Asn Gln
Ile Glu Phe Val Leu Thr Ser Val Gly Tyr35 40
45Ala Val Gly Leu Gly Asn Val Trp Arg Phe Pro Tyr Leu Cys Tyr Arg50
55 60Asn Gly Gly Gly Ala Phe Met Phe Pro
Tyr Phe Ile Met Leu Ile Phe65 70 75
80Cys Gly Ile Pro Leu Phe Phe Met Glu Leu Ser Phe Gly Gln
Phe Ala85 90 95Ser Gln Gly Cys Leu Gly
Val Trp Arg Ile Ser Pro Met Phe Lys Gly100 105
110Val Gly Tyr Gly Met Met Val Val Ser Thr Tyr Ile Gly Ile Tyr
Tyr115 120 125Asn Val Val Ile Cys Ile Ala
Phe Tyr Tyr Phe Phe Ser Ser Met Thr130 135
140His Val Leu Pro Trp Ala Tyr Cys Asn Asn Pro Trp Asn Thr His Asp145
150 155 160Cys Ala Gly Val
Leu Asp Ala Ser Asn Leu Thr Asn Gly Ser Arg Pro165 170
175Ala Ala Leu Pro Ser Asn Leu Ser His Leu Leu Asn His Ser
Leu Gln180 185 190Arg Thr Ser Pro Ser Glu
Glu Tyr Trp Arg Leu Tyr Val Leu Lys Leu195 200
205Ser Asp Asp Ile Gly Asn Phe Gly Glu Val Arg Leu Pro Leu Leu
Gly210 215 220Cys Leu Gly Val Ser Trp Leu
Val Val Phe Leu Cys Leu Ile Arg Gly225 230
235 240Val Lys Ser Ser Gly Lys Val Val Tyr Phe Thr Ala
Thr Phe Pro Tyr245 250 255Val Val Leu Thr
Ile Leu Phe Val Arg Gly Val Thr Leu Glu Gly Ala260 265
270Phe Asp Gly Ile Met Tyr Tyr Leu Thr Pro Gln Trp Asp Lys
Ile Leu275 280 285Glu Ala Lys Val Trp Gly
Asp Ala Ala Ser Gln Ile Phe Tyr Ser Leu290 295
300Gly Cys Ala Trp Gly Gly Leu Ile Thr Met Ala Ser Tyr Asn Lys
Phe305 310 315 320His Asn
Asn Cys Tyr Arg Asp Ser Val Ile Ile Ser Ile Thr Asn Cys325
330 335Ala Thr Ser Val Tyr Ala Gly Phe Val Ile Phe Ser
Ile Leu Gly Phe340 345 350Met Ala Asn His
Leu Gly Val Asp Val Ser Arg Val Ala Asp His Gly355 360
365Pro Gly Leu Ala Phe Val Ala Tyr Pro Glu Ala Leu Thr Leu
Leu Pro370 375 380Ile Ser Pro Leu Trp Ser
Leu Leu Phe Phe Phe Met Leu Ile Leu Leu385 390
395 400Gly Leu Gly Thr Gln Phe Cys Leu Leu Glu Thr
Leu Val Thr Ala Ile405 410 415Val Asp Glu
Val Gly Asn Glu Trp Ile Leu Gln Lys Lys Thr Tyr Val420
425 430Thr Leu Gly Val Ala Val Ala Gly Phe Leu Leu Gly
Ile Pro Leu Thr435 440 445Ser Gln Ala Gly
Ile Tyr Trp Leu Leu Leu Met Asp Asn Tyr Ala Ala450 455
460Ser Phe Ser Leu Val Val Ile Ser Cys Ile Met Cys Val Ala
Ile Met465 470 475 480Tyr
Ile Tyr Gly His Arg Asn Tyr Phe Gln Asp Ile Gln Met Met Leu485
490 495Gly Phe Pro Pro Pro Leu Phe Phe Gln Ile Cys
Trp Arg Phe Val Ser500 505 510Pro Ala Ile
Ile Phe Ser Leu Pro Thr His Arg Ala Pro Ala Pro Gly515
520 525Pro Ala Cys Phe Phe His Gly Pro Asp Val Asp Ser
Cys Pro Cys Ser530 535 540Ala Pro Asn Ala
Leu Phe
545 550
SEQ ID NO: 33; length 188; PRT; Homo sapiens
Met Ala Ala Ala His
Gly Pro Val Ala Pro Ser Ser Pro Glu Gln Asn1 5
10 15Ile Phe Gln Gly Pro Gly Gln Val Pro Ala Trp
Ala Gly Ser Arg Cys20 25 30Val Leu Ile
Pro Gly Arg Arg Gly Arg Val Glu Met Ser Ala Asp Asp35 40
45Ser Glu Glu Trp Cys Cys Ala Gln Arg Gly His Gln Glu
Gly Pro Glu50 55 60Pro Gln Thr Gly Gln
Leu Gly Gln Pro Asp Arg Val Cys Thr Asp Glu65 70
75 80Arg Gly Leu Cys Arg Gly Pro Gly Gln Cys
Leu Ala Leu Pro Ile Pro85 90 95Leu Leu
Ser Gln Arg Gly Arg Arg Leu His Val Pro Leu Leu His His100
105 110Ala His Leu Leu Arg Asp Pro Pro Leu Leu His Gly
Ala Leu Leu Arg115 120 125Pro Val Cys Lys
Pro Gly Val Pro Gly Gly Leu Glu Asp Gln Pro His130 135
140Val Gln Arg Ser Gly Leu Trp Tyr Asp Gly Gly Val His Leu
His Arg145 150 155 160His
Leu Leu Gln Cys Gly His Leu His Arg Leu Leu Leu Leu Leu Leu165
170 175Val His Asp Ala Arg Ala Ala Leu Gly Leu Leu
Gln
180 185
SEQ ID NO: 34; length 22; DNA; Artificial Sequence; oligonucleotide SLC6A9LF
gtgctgggct gcatggggac tg 22
SEQ ID NO: 35; length 23; DNA; Artificial Sequence; oligonucleotide SLC6A9LR
cacagtctct cgtggagcac ggg 23
SEQ ID NO: 36; length 18; DNA; Artificial Sequence; sequencing oligonucleotide PrimerPU
tgtaaaacga cggccagt 18
SEQ ID NO: 37; length 18; DNA; Artificial Sequence; sequencing oligonucleotide PrimerRP
caggaaacag ctatgacc 18