Patent application title: CODON OPTIMIZED NUCLEIC ACID ENCODING A RETINITIS PIGMENTOSA GTPASE REGULATOR (RPGR)
Inventors:
Guo-Jie Ye (Gainesville, FL, US)
Jilin Liu (Gainesville, FL, US)
Assignees:
APPLIED GENETIC TECHNOLOGIES CORPORATION
IPC8 Class: AC07K1447FI
Publication date: 2017-06-15
Patent application number: 20170166617
Abstract:
This invention relates generally to a codon optimized nucleic acid
encoding a retinitis pigmentosa GTPase regulator (RPGR) protein. The
nucleic acid has enhanced stability during plasmid production relative to
a wildtype cDNA encoding the RPGR protein. The invention also relates to
expression cassettes, vectors, and host cells comprising the codon
optimized nucleic acid. Methods for preparing a recombinant
adeno-associated (rAAV) expression vector comprising the codon optimized
nucleic acid sequence are also provided. The nucleic acids, expression
cassettes, vectors, and host cells provided may be useful in the large
scale production of rAAV expression vectors for gene therapy
applications.
Claims:
1. A polynucleotide comprising the nucleic acid sequence of SEQ ID NO: 1
encoding a human retinitis pigmentosa GTPase regulator (RPGR) protein.
2. An expression cassette comprising the polynucleotide of claim 1 and an expression control sequence operably linked and heterologous to the nucleic acid sequence.
3. A vector comprising the polynucleotide of claim 1.
4. The vector of claim 3, wherein the vector is a recombinant adeno-associated (rAAV) expression vector.
5. A recombinant herpes simplex virus (rHSV) comprising the polynucleotide of claim 1.
6. A host cell comprising the polynucleotide of claim 1.
7. The host cell of claim 6, wherein the host cell is a mammalian cell.
8. The host cell of claim 6, wherein the host cell is a HeLa cell, a BHK21 cell or a Vero cell.
9. The host cell of claim 6, wherein the host cell is a V27 cell.
10. The expression cassette of claim 2, wherein the expression control sequence is a human interphotoreceptor retinoid-binding protein (IRBP) promoter.
11. The expression cassette of claim 10, wherein the human IRBP promoter comprises a nucleic acid sequence having at least 95% sequence identity to the nucleic acid sequence of SEQ ID NO: 8 and directs preferential expression in rods and cones.
12. The expression cassette of claim 10, wherein the human IRBP promoter comprises the nucleic acid sequence of SEQ ID NO: 8.
13. The polynucleotide of claim 1, wherein the polynucleotide comprises the nucleic acid sequence of SEQ ID NO: 7.
14. A method of producing the rAAV expression vector of claim 4, comprising (a) infecting a host cell with a recombinant herpes simplex virus (rHSV) comprising the nucleic acid sequence of SEQ ID NO: 1; (b) incubating the host cell; and (c) following incubation, collecting rAAV from the host cell of step (b).
15. The method of claim 14, wherein the host cell is a HeLa cell, a BHK21 cell or a Vero cell.
16. The method of claim 14, wherein the rHSV further comprises a human IRBP promoter operably linked to the nucleic acid sequence of SEQ ID NO: 1.
17. The method of claim 16, wherein the human IRBP promoter comprises a nucleic acid sequence having at least 95% sequence identity to the nucleic acid sequence of SEQ ID NO: 8 and directs preferential expression in rods and cones.
18. The method of claim 16, wherein the human IRBP promoter comprises the nucleic acid sequence of SEQ ID NO: 8.
19. The method of claim 14, wherein the rHSV comprises the nucleic acid sequence of SEQ ID NO: 7.
Description:
RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Application No. 61/979,633, filed on Apr. 15, 2014, the entire contents of which are expressly incorporated herein by reference.
SUBMISSION OF SEQUENCE LISTING
[0002] The Sequence Listing associated with this application is filed in electronic format via EFS-Web and hereby incorporated by reference into the specification in its entirety. The name of the text file containing the Sequence Listing is "Sequence Listing" and is 16 kb in size.
FIELD OF THE INVENTION
[0003] This invention relates generally to codon optimized nucleic acid sequences encoding a human retinitis pigmentosa GTPase regulator (RPGR).
BACKGROUND OF THE INVENTION
[0004] Retinitis pigmentosa (RP) is an inherited degenerative disease of the retina that affects approximately one in 3,500 individuals, with an estimated 1.5 million patients worldwide. See Churchill et al., 2013, Invest. Ophthalmol. Vis. Sci. 54(2): 1411-1416. RP is caused by progressive loss of rod and cone photoreceptors, resulting in night blindness followed by loss of visual fields. The disease may result in legal or even complete blindness. Mutations in the retinitis pigmentosa GTPase regulator (RPGR) gene account for greater than 70% of the cases of human X-linked retinitis pigmentosa (XLRP), the most severe subtype of RP. See Beltran et al., 2012, PNAS 109(6): 2132-2137 and Bader et al., 2003, Invest. Ophthalmol. Vis. Sci. 44(4): 1458-1463.
[0005] Alternative splicing of the RPGR gene results in expression of multiple isoforms of the RPGR protein. The mRNA for isoform A contains all 19 exons of the gene, while the mRNA for isoform C contains exons 1 to 15 and a large part of intron 15. Intron 15 is a purine-rich region that contains highly repetitive sequences that code for glutamate and glycine repeats (EEEGEGEGE in human and EEGEGE in mouse). See Vervoort et al., Mutational hot spot within a new RPGR exon in X-linked retinitis pigmentosa. Nat Genet 2000; 25:462-6. Isoform A is constitutively expressed in all tissues, while isoform C, which is also referred to as "ORF15", is the predominant form expressed in the connecting cilium of photoreceptors. See Hong et al., Invest Ophthalmol Vis Sci 2002; 43:3373-82, and Hong et al., Invest Ophthalmol Vis Sci 2003; 44:2413-21.
[0006] A total of 55% of RPGR-related XLRP is caused by mutations in ORF15, all of which result from deletions that lead to truncated proteins. Most of the other cases are caused by mutations in exons 1-13, which can be either missense or nonsense mutations, with a small number caused by mutations in introns or large deletions. No cases have been identified due to mutations in exons 16 to 19.
[0007] Recent studies have demonstrated the potential of gene therapy approaches to treating XLRP caused by mutations in the RPGR gene. For example, Beltran et al. have shown that subretinal injections of adeno-associated virus (AAV) vectors expressing human RPGR increased rod and cone photoreceptor function in a canine model of XLRP.
[0008] However, one of the challenges in large-scale production of AAV vectors for clinical use is that nucleic acid sequences encoding a protein of interest such as RPGR may be unstable, resulting in the accumulation of several mutations and deletions. For example, the RPGR gene contains a region of 1.2 kb called ORF15 near the 3' end of the cDNA that is highly repetitive and GA rich. This region is a mutation "hot spot" in the population. This repetitive region is very unstable during cloning and vector preparation, and the clones obtained generally contain mutations and deletions. These mutations can potentially alter or eliminate RPGR protein function, limiting the use of this protein in gene therapy applications. Therefore, a need exists to identify methods of stabilizing RPGR cDNAs during large-scale production of AAV vectors.
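By way of illustration only, the following minimal Python sketch shows one way the terms "GA rich" and "highly repetitive" could be quantified for a DNA string: the fraction of purine bases (G and A) and the most frequent short subsequence (k-mer). The function names and the toy sequence are hypothetical and are not taken from the Sequence Listing or from any procedure described herein.

```python
from collections import Counter


def purine_fraction(seq: str) -> float:
    """Return the fraction of bases in seq that are purines (G or A)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GA" for base in seq) / len(seq)


def most_common_kmer(seq: str, k: int = 6):
    """Return (kmer, count) for the most frequent overlapping k-mer in seq."""
    seq = seq.upper()
    if len(seq) < k:
        raise ValueError("sequence shorter than k")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return counts.most_common(1)[0]


# Hypothetical toy repeat built from GAG (Glu) and GGA (Gly) codons.
toy_repeat = "GAGGAGGGAGAGGGAGAG" * 4

print(purine_fraction(toy_repeat))   # purine content of the toy repeat
print(most_common_kmer(toy_repeat))  # dominant 6-mer and its count
```

A sequence with a purine fraction near 1 and a few k-mers that dominate the count would be expected to behave like the repetitive region described above, although the thresholds at which instability appears are not specified here.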
SUMMARY OF THE INVENTION
[0009] It has been surprisingly found that the nucleic acid sequence of SEQ ID NO: 1 encoding the human RPGR protein is stable in large scale production of AAV plasmid pTR-IRBP-RPGRsyn. This nucleic acid sequence was developed through codon optimization of the wild type RPGR cDNA. In one aspect, the present invention provides a polynucleotide comprising the nucleic acid sequence of SEQ ID NO: 1 encoding a human RPGR protein.
[0010] In one aspect, the invention features a polynucleotide comprising the nucleic acid sequence of SEQ ID NO: 1 encoding a human retinitis pigmentosa GTPase regulator (RPGR) protein.
[0011] In one embodiment, the invention features an expression cassette comprising the polynucleotide of the above aspect, and an expression control sequence operably linked and heterologous to the nucleic acid sequence.
[0012] In another embodiment, the invention features a vector comprising the polynucleotide of the above aspect. In a further embodiment, the vector is a recombinant adeno-associated (rAAV) expression vector.
[0013] In another embodiment, the invention features a recombinant herpes simplex virus (rHSV) comprising the polynucleotide of any one of the above aspects.
[0014] In another embodiment, the invention features a host cell comprising the polynucleotide of any one of the above aspects. In a related embodiment, the host cell is a mammalian cell. In a further related embodiment, the host cell is a HeLa cell, a BHK21 cell or a Vero cell. In another further embodiment, the host cell is a V27 cell.
[0015] In another embodiment, the expression control sequence is a human interphotoreceptor retinoid-binding protein (IRBP) promoter. In a further related embodiment, the human IRBP promoter comprises a nucleic acid sequence having at least 95% sequence identity to the nucleic acid sequence of SEQ ID NO: 8 and directs preferential expression in rods and cones. In another further embodiment, the human IRBP promoter comprises the nucleic acid sequence of SEQ ID NO: 8.
[0016] In one embodiment, the polynucleotide comprises the nucleic acid sequence of SEQ ID NO: 7.
[0017] The invention also features in another embodiment a method of producing the rAAV expression vector of the above aspect, comprising (a) infecting a host cell with a recombinant herpes simplex virus (rHSV) comprising the nucleic acid sequence of SEQ ID NO: 1; (b) incubating the host cell; and (c) following incubation, collecting rAAV from the host cell of step (b).
[0018] In one embodiment, the host cell is a HeLa cell, a BHK21 cell or a Vero cell.
[0019] In another embodiment, the rHSV further comprises a human IRBP promoter operably linked to the nucleic acid sequence of SEQ ID NO: 1. In a further embodiment, the human IRBP promoter comprises a nucleic acid sequence having at least 95% sequence identity to the nucleic acid sequence of SEQ ID NO: 8 and directs preferential expression in rods and cones. In a further related embodiment, the human IRBP promoter comprises the nucleic acid sequence of SEQ ID NO: 8.
[0020] In another embodiment, the rHSV comprises the nucleic acid sequence of SEQ ID NO: 7.
BRIEF DESCRIPTION OF THE DRAWINGS
[0021] FIGS. 1A-1B show a sequence alignment of codon optimized RPGR cDNA (RPGRsyn; SEQ ID NO: 1) and the wildtype RPGR cDNA (Genbank Accession No. NM_001034853; SEQ ID NO: 5).
[0022] FIG. 2 shows a map of plasmid pUC57-RPGRsyn.
[0023] FIG. 3 shows pUC57-RPGRsyn plasmid DNA clones N5 and N6 prepared by mini-prep and larger scale midi-prep (Midi) and digested with restriction enzymes NotI and PciI. Plasmid DNA from mini-preps was retransformed into SURE2 cells before larger scale production by midi-prep.
[0024] FIG. 4 shows pUC57-RPGRsyn plasmid DNA from mini-preps (mini_2 and mini_3) digested with restriction enzymes NotI and PciI. Plasmid DNA was not detectable in larger scale midi preps (midi_2 and midi_3). Seeding culture was stored at 4 °C overnight and used as the inoculant for larger scale plasmid production.
[0025] FIG. 5 shows a map of AAV proviral plasmid pTR-IRBP-RPGRsyn.
[0026] FIG. 6 shows the restriction maps of pTR-IRBP-RPGRsyn plasmid DNA isolated from transformed bacteria after 4 rounds of serial overnight propagation, along with a control plasmid, pTR-IRBP-CNGB3co. Bacteria transformed with pTR-IRBP-RPGRsyn or pTR-IRBP/GNAT2-hCNGB3co plasmids were grown in medium at 37 °C overnight. The next morning, plasmid DNA was purified from 1.5 mL of overnight culture, and the remaining culture was left at room temperature until late afternoon and then used to inoculate 2 mL of culture medium (1:1000 dilution) for the 2nd round of propagation. The same procedure was followed for the 3rd and 4th rounds of propagation. Plasmid DNA purified from each round was then analyzed by restriction digestion with SmaI to confirm the integrity of the ITR sequence of the plasmid. The restriction maps remained the same for both pTR-IRBP-RPGRsyn and the control plasmid pTR-IRBP-CNGB3co through the 3 rounds of propagation in bacteria. However, the yield was significantly decreased after the 3rd round of propagation, and almost no plasmid restriction fragments were detected after the 4th round of propagation in bacteria.
[0027] FIG. 7 shows the sequence alignment of the consensus sequence of contigs obtained from pTR-IRBP-RPGRsyn plasmid DNA to the reference pTR-IRBP-RPGRsyn sequence.
DETAILED DESCRIPTION OF THE INVENTION
[0028] The invention provides a polynucleotide comprising the nucleic acid sequence of SEQ ID NO: 1 encoding a human retinitis pigmentosa GTPase regulator (RPGR) protein. The nucleic acid sequence has been codon optimized for enhanced stability during vector replication, and may be used, for example, for production of adeno-associated virus (AAV) vectors for gene therapy applications.
[0029] Nucleic acid sequences may be codon optimized to improve stability or heterologous expression in host cells without changing the encoded amino acid sequence. For example, codon optimization may be used to remove sequences that negatively impact gene expression, transcript stability, protein expression or protein stability, such as transcription splice sites, DNA instability motifs, polyadenylation sites, secondary structure, AU-rich RNA elements, secondary ORFs, codon tandem repeats, or long range repeats. Codon optimization may also be used to adjust the G/C content of a sequence of interest.
[0030] A codon consists of a set of three nucleotides and encodes a specific amino acid or results in the termination of translation (i.e. stop codons). The genetic code is redundant in that multiple codons specify the same amino acid, i.e., there are a total of 61 codons encoding 20 amino acids. Codon optimization replaces codons present in a DNA sequence with preferred codons encoding the same amino acid, for example, codons preferred for mammalian expression. Thus, the amino acid sequence is not altered during the process. Codon optimization can be performed using gene optimization software. The codon optimized nucleotide sequence is translated and aligned to the original protein sequence to ensure that no changes were made to the amino acid sequence. For example, the nucleotide sequence of SEQ ID NO: 1 encoding human RPGR is a codon optimized version of the wild type human RPGR nucleotide sequence (Genbank Accession No. NM_001034853, SEQ ID NO: 5). Both SEQ ID NO: 1 and SEQ ID NO: 5 encode the same RPGR protein (SEQ ID NO: 6).
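The verification step described above, in which the codon optimized sequence is translated and compared to the original protein sequence, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: it uses the standard genetic code, the two short coding sequences are hypothetical toy inputs rather than SEQ ID NO: 1 or SEQ ID NO: 5, and it does not represent the gene optimization software actually used.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon; '*' marks a stop codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


# Hypothetical toy sequences: same protein, different synonymous codons.
original = "ATGGAGGAGGGAGAGTAA"
optimized = "ATGGAAGAGGGCGAATGA"

# Count of amino-acid-encoding (sense) codons in the standard code.
sense_codons = sum(1 for aa in CODON_TABLE.values() if aa != "*")

print(sense_codons)
print(translate(original), translate(optimized))
print(translate(original) == translate(optimized))
```

If the two translations differ at any position, the optimization has changed the encoded protein and the candidate sequence would be rejected under the check described above.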
[0031] Methods of codon optimization are known in the art and are described, for example, in U.S. Application Publication No. 2008/0194511 and U.S. Pat. No. 6,114,148.
[0032] The nucleic acid sequences of the present invention can be made as synthetic sequences. Techniques for constructing synthetic nucleic acid sequences are known in the art, and synthetic gene sequences may be purchased from several companies, including DNA 2.0 (Menlo Park, Calif.) and GenScript USA Inc. (Piscataway, N.J.). Alternatively, codon changes can be introduced by standard molecular biology techniques such as site-specific in vitro mutagenesis, PCR, or any other genetic engineering methods known in the art which are suitable for specifically changing a nucleic acid sequence. In vitro mutagenesis protocols are described, for example, in In Vitro Mutagenesis Protocols, Braman, ed., 2002, Humana Press, and in Sankaranarayanan, Protocols in Mutagenesis, 2001, Elsevier Science Ltd.
[0033] The human RPGR gene is located in chromosomal region Xp21.1 and spans 172 kb. Shu et al., 2012, Invest. Ophthalmol. Vis. Sci. 53(7): 3951-3958. There are multiple alternatively spliced transcripts, all of which encode an amino (N)-terminal RCC1-like (RCCL) domain. The RCCL domain is structurally similar to the RCC1 protein, a guanine nucleotide exchange factor for the small guanosine triphosphate-binding protein, Ran. The RPGR gene contains 19 exons (RPGRex1-19), encoding a predicted 90 kDa protein. Exons 2 to 11 encode the RCCL domain, whereas exons 12 to 19 encode a carboxyl (C)-terminal domain rich in acidic residues and ending in an isoprenylation anchorage signal. Mutations found in RPGRex1-19 account for 15% to 20% of XLRP patients, and subsequent studies revealed many more disease-causing mutations within one or more transcripts containing an alternatively spliced C-terminal exon called ORF15 (RPGRORF15). A high frequency of microdeletions, frameshift mutations, and premature stop mutations is found within ORF15.
[0034] In one embodiment, the RPGR cDNA used for codon optimization is the full-length human RPGRORF15 clone, variant C, Genbank Accession No. NM_001034853 (SEQ ID NO: 5). See Vervoort et al., 2000, Nat Genet 25: 462-466. This clone contains exons 1-ORF15 and was generated using three-way ligation by step-wise amplifying exons 1-part of 15b (nucleotides 169-1990) from human lymphocytes and 1991-3627 from human genomic DNA. See Beltran et al., 2012, PNAS 109(6): 2132-2137.
[0035] RPGR is widely expressed and shows a complex expression pattern. See Shu et al., cited above. RPGR transcripts are detected in different tissues, including brain, eye, kidney, lung, and testis in several different species. RPGR protein is detected in retina, trachea, brain, and testis. In human, mouse, and bovine retina, RPGR mainly localizes to photoreceptor connecting cilia, but expression has also been reported in outer segments in some species. RPGR is expressed in the transitional zone of motile cilia and within human and monkey cochlea.
[0036] The invention also provides an expression cassette comprising the nucleic acid sequence of SEQ ID NO: 1 and an expression control sequence operably linked and heterologous to the nucleic acid sequence. The term "expression control sequence" refers to any genetic element (e.g., polynucleotide sequence) that can exert a regulatory effect on the replication or expression (transcription or translation) of the nucleic acid sequence. Common expression control sequences include promoters, polyadenylation signals, transcription termination sequences, upstream regulatory domains, origins of replication, internal ribosome entry sites (IRES), and enhancers.
[0037] An expression control sequence is operably linked with a nucleic acid sequence when the expression control sequence is placed in a functional relationship with the second nucleic acid sequence. For example, a promoter is operably linked to a coding sequence if the promoter affects the expression of the coding sequence. The term operably linked encompasses, for example, an arrangement of an expression control sequence with the nucleic acid sequence to be expressed and optionally further expression control sequences, such as a terminator or enhancer, such that each of the expression control sequences can allow, modify, facilitate or otherwise influence expression of the nucleic acid sequence.
[0038] The term "heterologous" refers to nucleic acid or amino acid sequences that are obtained or derived from different source organisms or from different genes or proteins within the same source organism. For example, an expression control sequence that is not a native expression control sequence of the human RPGR gene is considered to be heterologous to the human RPGR gene. In certain embodiments, the expression control sequence is a promoter that is heterologous to the RPGR gene.
[0039] In a preferred embodiment, the expression control sequence is a human interphotoreceptor retinoid-binding protein (IRBP) promoter. IRBP is a large glycoprotein that is expressed only in the photoreceptor cells of the retina and to a much lesser extent in pinealocytes in the pineal gland in the brain. See Al-Ubaidi et al., 1992, J Cell Biology, 119(6) 1681-1687. The IRBP promoter region is well characterized. For example, Albini et al. (1990, Nucleic Acids Research 18(17): 5181-5187) describe a nucleotide sequence of the human IRBP promoter region (Genbank Accession No. X53044) containing 2818 bp of the 5' untranscribed region (SEQ ID NO: 2). Beltran et al. (cited above) demonstrated that a 235 bp fragment of the human IRBP promoter directed GFP expression in both rods and cones of normal canine retina in a dose- and time-dependent manner. A 1.3 kb fragment of the 5' untranslated region of the human IRBP gene (SEQ ID NO: 3) directed expression of a bacterial reporter gene (chloramphenicol acetyltransferase, CAT) specifically to photoreceptor cells in transgenic mice. See Al Ubaidi et al. 1992, J Cell Biology 119: 1681-1687. Nested deletion analysis of a 1783 bp fragment of the mouse IRBP 5' flanking region indicated that high promoter activity was maintained with a fragment consisting of 70 bp 5' to the transcription start site (SEQ ID NO: 4), but that elements upstream of this 70 bp fragment are required for complete tissue-specific regulation. See Boatright, et al., 1997, Molecular Vision 3: 15.
[0040] In a preferred embodiment, the human IRBP promoter comprises a nucleic acid sequence having at least 95%, 96%, 97%, 98% or 99% sequence identity to SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, or SEQ ID NO: 8. In a further preferred embodiment, the human IRBP promoter comprises SEQ ID NO: 8.
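As an illustrative reading of the "at least 95% sequence identity" language, the short Python sketch below converts the percentage into a minimum number of identical positions for each promoter length listed in the Description of Sequences. It assumes, as a simplification that is not stated herein, that identity is counted over the full length of the reference sequence without gaps; the function name is hypothetical.

```python
def min_identical_positions(ref_len: int, percent: int) -> int:
    """Smallest number of identical positions giving >= percent identity."""
    # Integer ceiling division avoids floating-point rounding issues.
    return -(-percent * ref_len // 100)


# Reference lengths (bp) as listed in the Description of Sequences.
PROMOTER_LENGTHS = {
    "SEQ ID NO: 2": 2818,
    "SEQ ID NO: 3": 1326,
    "SEQ ID NO: 4": 70,
    "SEQ ID NO: 8": 234,
}

for name, length in PROMOTER_LENGTHS.items():
    needed = min_identical_positions(length, 95)
    # Print reference, its length, required matches, and tolerated mismatches.
    print(name, length, needed, length - needed)
```

Under this assumption, a variant of the 234 bp fragment would need at least 223 identical positions, so up to 11 positions could differ.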
[0041] As used herein, the term "vector" refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked, e.g., a plasmid. Preferred vectors are those capable of autonomous replication and/or expression of nucleic acids to which they are linked. An "rAAV vector" is a recombinant vector that includes nucleic acid sequences derived from adeno-associated virus (AAV). Recombinant AAV is produced in vitro by introduction of gene constructs into cells known as producer cells. Recombinant AAV has been studied extensively as a vehicle for gene therapy and for its potential applicability as a treatment for human diseases based on genetic defects. At the clinical level, the rAAV vector has been used in human clinical trials to deliver the cftr gene to cystic fibrosis patients and the Factor IX gene to hemophilia patients (Flotte, et al., 1998, Methods Enzymol 292:717-732; and Wagner et al., 1998, Lancet 351:1702-1703). Systems for production of rAAV employ three elements: 1) a gene cassette containing the gene of interest, 2) a gene cassette containing AAV rep and cap genes and 3) a source of "helper" virus proteins. Methods of producing rAAV are known in the art and are described, for example, in U.S. Pat. No. 7,091,029.
[0042] Production of rAAV vectors for gene therapy is carried out in vitro, using suitable producer cell lines. A preferred cell line is 293, but production of rAAV can be achieved using other cell lines, including but not limited to human or monkey cell lines such as Vero, WI 38 and HeLa, and rodent cells, such as BHK cells, e.g. BHK21.
[0043] In particular embodiments, the rAAV comprises the nucleic acid sequence of SEQ ID NO: 1 encoding the human RPGR protein. The rAAV may further comprise one or more expression control sequences operably linked to the nucleic acid sequence of SEQ ID NO: 1. In a preferred embodiment, the expression control sequence is a human IRBP promoter. In a further preferred embodiment, the human IRBP promoter comprises a nucleic acid sequence having at least 95%, 96%, 97%, 98% or 99% sequence identity to the nucleic acid sequence of SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, or SEQ ID NO: 8 and directs preferential expression in rods and cones. In a particularly preferred embodiment, the human IRBP promoter comprises SEQ ID NO: 8.
[0044] In certain embodiments, the rAAV further comprises an SV40 poly A tail, an SV40 splice donor/splice acceptor (SD/SA) sequence, and a Kozak sequence, each operably linked to the nucleic acid sequence of SEQ ID NO: 1. In a preferred embodiment, the rAAV comprises the nucleic acid sequence of SEQ ID NO: 7.
[0045] One strategy for delivering all of the required elements for rAAV production to the producer cell line involves transfecting the cells with plasmids containing gene cassettes encoding the necessary gene products, as well as infecting the cells with adenovirus (Ad) as the helper virus to provide the helper functions. This system employs plasmids with two different gene cassettes. The first is a proviral plasmid encoding the recombinant DNA to be packaged as rAAV. The second is a plasmid encoding the rep and cap genes. Other DNA viruses, such as Herpes simplex virus type 1 (HSV-1), can be used instead of Ad to provide helper virus gene products needed for rAAV production (Conway et al., 1999, Gene Ther. 6:973-985).
[0046] Another strategy for rAAV production is based on the use of two or more recombinant rHSV-1 viruses to simultaneously co-infect producer cells with all of the components necessary for producing rAAV. This strategy employs at least two different forms of rHSV, each containing a different gene cassette. In addition to supplying the necessary helper functions, each of these rHSV viruses is engineered to deliver different AAV (and other) genes to the producer cells upon infection. The two rHSV forms are referred to as the "rHSV/rc virus" and the "rHSV expression virus." The rHSV/rc virus contains a gene cassette in which the rep and cap genes from AAV are inserted into the HSV genome. The rep genes are responsible for replication and packaging of the rAAV genome in host cells infected with AAV. The cap genes encode proteins that comprise the capsid of the rAAV produced by the infected cells.
[0047] The second recombinant HSV is an "rHSV expression virus." A usual element of an rAAV production system is an expression cassette containing transgene DNA sequences encoding a gene(s) of interest, such as the RPGR gene, along with promoter elements necessary for expression of the gene. In particular embodiments, the rHSV comprises the nucleic acid sequence of SEQ ID NO: 1 encoding the human RPGR protein. Expression vectors engineered for rAAV production are generally constructed with the gene of interest inserted between two AAV-2 inverted terminal repeats (ITRs). The ITRs are responsible for the ability of native AAV to insert its DNA into the genome of host cells upon infection or otherwise persist in the infected cells. The expression cassette is incorporated into the rHSV expression virus described above. This second rHSV virus is used for simultaneous co-infection of the cells along with the rHSV-1/rc virus.
[0048] The terms "recombinant HSV," "rHSV," "rHSV vector," and "rHSV expression vector" refer to isolated, genetically modified forms of herpes simplex virus (HSV) containing heterologous genes incorporated into the viral genome. Methods for production of rHSV are known in the art and are described, for example, by Conway et al. (1999, Gene Ther. 6:973-985); Conway et al. (1997, J Virol 71: 8780-8789) and U.S. Pat. No. 7,037,723.
[0049] In particular embodiments, the rHSV comprises the nucleic acid sequence of SEQ ID NO: 1 encoding the human RPGR protein. The rHSV may further comprise one or more expression control sequences for regulating expression of the nucleic acid sequence of SEQ ID NO: 1, wherein the expression control sequence is operably linked to the nucleic acid sequence of SEQ ID NO: 1. In a preferred embodiment, the expression control sequence is a human IRBP promoter that is operably linked to the nucleic acid sequence of SEQ ID NO: 1. In a further preferred embodiment, the human IRBP promoter comprises a nucleic acid sequence having at least 95%, 96%, 97%, 98% or 99% sequence identity to the nucleic acid sequence of SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, or SEQ ID NO: 8 and directs preferential expression in rods and cones. In a particularly preferred embodiment, the human IRBP promoter comprises SEQ ID NO: 8.
[0050] In certain embodiments of the aforementioned methods, the rHSV further comprises an SV40 poly A tail, an SV40 splice donor/splice acceptor (SD/SA) sequence, and a Kozak sequence, each operably linked to the nucleic acid sequence of SEQ ID NO: 1. In a preferred embodiment, the rHSV comprises the nucleic acid sequence of SEQ ID NO: 7.
[0051] The invention also provides a method of producing an rAAV expression vector comprising a polynucleotide comprising the nucleic acid sequence of SEQ ID NO: 1 encoding a human RPGR protein. In one embodiment, the method comprises (a) infecting a host cell with a recombinant herpes simplex virus (rHSV) comprising the nucleic acid sequence of SEQ ID NO: 1; (b) incubating the host cell; and (c) following incubation, collecting rAAV from the host cell of step (b).
[0052] Methods of producing rAAV expression vectors by infecting a host cell with an rHSV are known in the art and are described for example in U.S. Pat. No. 7,091,029. For example, in one embodiment, the host cells are infected with rHSV by diluting the virus in growth medium such as DMEM and adding the virus to flasks containing the host cells. The host cells may be incubated with the virus for various intervals, for example, 22, 26, 30, 34, or 46 hours. Following the incubation interval, the virus-infected cells may be harvested by pelleting, followed by resuspension in DMEM. Cell-associated rAAV may be collected from the host cells by lysis of the cells using standard techniques involving three rounds of freezing and thawing (See Conway et al., 1999, cited above).
[0053] In particular embodiments, the host cell used for producing an rAAV expression vector in the aforementioned methods is a HeLa cell, a BHK21 cell or a Vero cell.
[0054] The rHSV used in the aforementioned method may further comprise one or more expression control sequences for regulating expression of the nucleic acid sequence of SEQ ID NO: 1 that is operably linked to the nucleic acid sequence of SEQ ID NO: 1. In a preferred embodiment, the expression control sequence is a human IRBP promoter that is operably linked to the nucleic acid sequence of SEQ ID NO: 1. In a further preferred embodiment, the human IRBP promoter comprises a nucleic acid sequence having at least 95%, 96%, 97%, 98% or 99% sequence identity to the nucleic acid sequence of SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, or SEQ ID NO: 8 and directs preferential expression in rods and cones. In a particularly preferred embodiment, the human IRBP promoter comprises SEQ ID NO: 8.
[0055] In certain embodiments of the aforementioned methods, the rHSV further comprises an SV40 poly A tail, an SV40 splice donor/splice acceptor (SD/SA) sequence, and a Kozak sequence, each operably linked to the nucleic acid sequence of SEQ ID NO: 1. In a preferred embodiment, the rHSV comprises the nucleic acid sequence of SEQ ID NO: 7.
Description of Sequences
TABLE-US-00001
[0056]
SEQ ID NO:  Description
1  Codon modified RPGR cDNA
2  Human IRBP promoter, 2818 bp (Albini et al., 1990, Nuc Acid Res 18: 5181-5187). SEQ ID NO: 2 comprises SEQ ID NO: 3 and 4.
3  Human IRBP promoter, 1326 bp (Al Ubaidi et al., 1992, J Cell Biology 119: 1681-1687)
4  Mouse IRBP core promoter region, 70 bp (Boatright et al., 1997, Molecular Vision 3: 15)
5  Wildtype RPGR cDNA, Genbank Accession No. NM_001034853
6  Wildtype RPGR amino acid sequence
7  3871 bp synthesized sequence comprising SEQ ID NO: 1, an SV40 poly A tail, the SV40 SD/SA sequence, Kozak sequence, and restriction sites
8  Human IRBP promoter, 234 bp fragment used in the RPGRsyn expression cassette
[0057] The following examples serve to illustrate certain embodiments and aspects of the present invention and are not to be construed as limiting the scope of the invention.
EXAMPLES
Example 1
Codon Optimization of the RPGR Gene and Evaluation of Plasmid Stability
[0058] A wildtype RPGR cDNA in an AAV plasmid used for AAV manufacturing was found to contain several mutations and deletions in the region from nt 2461 to nt 3057. There were a total of 42 bp of accumulated deletions or substitutions across this region. The plasmid clone was found to be unstable during plasmid propagation in bacteria, and sequence changes were also found in the AAV vector.
[0059] A 3459 bp coding sequence of the RPGR gene, variant C (SEQ ID NO: 5), was codon-optimized at GenScript, Inc. for mammalian expression. Codon optimization was used to select codons of high frequency in mammals, to alter GC content to enhance stability, and to reduce the repetitive nature of the gene. The codon optimized version of the RPGR coding sequence (RPGRsyn; SEQ ID NO: 1) shares 72.1% sequence identity with the original gene (SEQ ID NO: 5). See FIG. 1. The codon optimized gene encodes the same polypeptide as the original gene, i.e., the polypeptide of SEQ ID NO: 6. The RPGRsyn gene was synthesized at GenScript along with an SV40 poly A tail, the SV40 SD/SA sequence, a Kozak sequence, and restriction sites for cloning purposes. The entire 3871 bp synthesized sequence is provided as SEQ ID NO: 7.
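The two properties stated above (the codon optimized gene encodes the same polypeptide as the original gene while differing at the nucleotide level) can be checked in silico. The following Python sketch is illustrative only and is not part of the described method. The function names are hypothetical, identity is computed position by position without alignment (adequate only for equal-length sequences), and the inputs are only the first 60 nt of SEQ ID NO: 1 and SEQ ID NO: 5 as given in the sequence listing, so the identity it reports applies to that fragment and not to the 72.1% value stated for the full-length sequences.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
          "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def clean(seq):
    """Remove whitespace and upper-case a nucleotide string."""
    return "".join(seq.split()).upper()


def translate(seq):
    """Translate a coding sequence codon by codon ('*' marks a stop)."""
    seq = clean(seq)
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def percent_identity(a, b):
    """Position-wise identity of two equal-length sequences (no alignment)."""
    a, b = clean(a), clean(b)
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def gc_content(seq):
    """Percentage of G and C bases in a sequence."""
    seq = clean(seq)
    return 100.0 * sum(ch in "GC" for ch in seq) / len(seq)


# First 60 nt of SEQ ID NO: 1 (RPGRsyn) and SEQ ID NO: 5 (wildtype),
# copied from the sequence listing; fragments only, for illustration.
RPGRSYN_START = ("atgagagagc cagaggagct gatgccagat "
                 "agcggagcag tgtttacctt cggaaagtcc")
WT_START = ("atgagggagc cggaagagct gatgcccgat "
            "tcgggtgctg tgtttacatt tgggaaaagt")

if __name__ == "__main__":
    print("same protein:", translate(RPGRSYN_START) == translate(WT_START))
    print("identity (%):", round(percent_identity(RPGRSYN_START, WT_START), 1))
    print("GC (%):", round(gc_content(RPGRSYN_START), 1),
          round(gc_content(WT_START), 1))
```

Applied to the full-length SEQ ID NO: 1 and SEQ ID NO: 5, the same functions would compare the complete coding sequences; a true alignment tool would be needed if the sequences differed in length.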
[0060] A map of the plasmid containing RPGRsyn (pUC57-RPGRsyn) is shown in FIG. 2. This plasmid propagated stably in bacteria in small scale plasmid production. It also remained stable in larger scale production after being retransformed into SURE2 cells, a bacterial strain used for cloning of the AAV plasmid. See FIG. 3. Clone N5 of the pUC57-RPGRsyn plasmid DNA produced in large scale production was confirmed by DNA sequencing to be identical to the original plasmid. The plasmid yield ranged from very low to none at all if the seeding culture was stored at 4° C. overnight and then used as the inoculum for large scale plasmid production. See FIG. 4. The RPGRsyn cDNA was then released from the pUC57-RPGRsyn plasmid and inserted into a pTR-containing plasmid to generate the AAV proviral plasmid pTR-IRBP-RPGRsyn (FIG. 5). pTR-IRBP-RPGRsyn contains the inverted terminal repeats (ITRs) of AAV2 and the IRBP promoter. Plasmid from large scale production was confirmed to be 100% correct upon DNA sequencing (FIG. 6). To further confirm the stability of pTR-IRBP-RPGRsyn, bacteria transformed with the pTR-IRBP-RPGRsyn or pTR-IRBP/GNAT2-hCNGB3co plasmid were grown in medium at 37° C. overnight. The next morning, plasmid DNA was purified from 1.5 mL of the overnight culture, and the remaining culture was left at room temperature until late afternoon and then used to inoculate 2 mL of culture medium (1:1000 dilution) for the 2nd round of propagation. The same procedure was followed for the 3rd and 4th rounds of propagation. Plasmid DNA purified from each round was then analyzed by restriction digestion with SmaI to confirm the integrity of the ITR sequence of the plasmid. As shown in FIG. 6, the yield of pTR-IRBP-RPGRsyn declined during the serial passages; however, the same pattern was observed for pTR-IRBP-CNGB3co, a plasmid that contains the stable hCNGB3 cDNA. Therefore, the decline in plasmid yield appears to be related to the bacteria themselves or to other features such as the TR, but not to RPGRsyn. As also noted in FIG. 7, the 4.2 kb band containing RPGRsyn remained stable over the passages (the band would become diffuse or smeared if the insert were unstable).
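The SmaI analysis described above relies on comparing the observed band pattern with the fragments predicted from the plasmid sequence. SmaI recognizes CCCGGG and cuts between CCC and GGG, leaving blunt ends. A minimal Python sketch of such a prediction follows. It is offered only as an illustration: the helper names and the toy sequence are hypothetical (the toy sequence is not pTR-IRBP-RPGRsyn), and a recognition site that spans the origin of a circular sequence is not detected.

```python
SMAI_SITE = "CCCGGG"   # SmaI recognition sequence
SMAI_CUT_OFFSET = 3    # blunt cut between CCC and GGG


def smai_cut_positions(seq):
    """Return 0-based cut positions (index of the first base after each cut)."""
    seq = "".join(seq.split()).upper()
    cuts = []
    start = seq.find(SMAI_SITE)
    while start != -1:
        cuts.append(start + SMAI_CUT_OFFSET)
        start = seq.find(SMAI_SITE, start + 1)
    return cuts


def fragment_lengths(seq, circular=True):
    """Predicted fragment sizes after complete digestion, largest first.

    With circular=True the sequence is treated as a plasmid, so the piece
    after the last cut joins the piece before the first cut. An uncut
    sequence is reported as a single molecule of full length.
    """
    seq = "".join(seq.split()).upper()
    cuts = smai_cut_positions(seq)
    if not cuts:
        return [len(seq)]
    inner = [b - a for a, b in zip(cuts, cuts[1:])]
    if circular:
        wrap = len(seq) - cuts[-1] + cuts[0]
        return sorted(inner + [wrap], reverse=True)
    return sorted([cuts[0]] + inner + [len(seq) - cuts[-1]], reverse=True)


# Hypothetical 21 nt toy sequence with two SmaI sites.
TOY_PLASMID = "AAACCCGGGTTTTCCCGGGAA"

if __name__ == "__main__":
    print(smai_cut_positions(TOY_PLASMID))
    print(fragment_lengths(TOY_PLASMID, circular=True))
    print(fragment_lengths(TOY_PLASMID, circular=False))
```

Under these assumptions, a loss or rearrangement of a SmaI site would change the predicted fragment list, which is what a shifted or missing band on the gel would indicate.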
Example 2
Construction of AAV Plasmids and Evaluation in Bacteria
[0061] An AAV plasmid (pTR-IRBP-RPGRsyn) comprising an RPGRsyn expression cassette comprising the IRBP promoter (234 bp), the RPGRsyn cDNA (SEQ ID NO: 1), and an SV40 polyA signal sequence is constructed. This IRBP fragment is contained within the 235 bp fragment used by Beltran et al. in the canine model (see Beltran et al., 2012, PNAS 109(6): 2132-2137). After construction of pTR-IRBP-RPGRsyn, the plasmid is tested for stability in bacteria using the methods described in Example 1.
[0062] Once the stability of pTR-IRBP-RPGRsyn is confirmed, an HSV recombination plasmid comprising the IRBP-RPGRsyn expression cassette (pHSV106-IRBP-RPGRsyn) is constructed. pHSV106-IRBP-RPGRsyn is used for construction of HSV-IRBP-RPGRsyn helper vector for large scale production of the AAV vector AAV-IRBP-RPGRsyn. The rHSV helper viruses are propagated in mammalian cells (V27, an ICP27-complementing Vero cell line). RPGRsyn cDNA is more stable in mammalian cells than in bacteria. This increased stability will eliminate the need for large-scale production of an AAV proviral plasmid containing the RPGRsyn cDNA, which is a reagent required for rAAV production by plasmid transfection methods.
Sequence Listing
Number of SEQ ID NOs: 12
SEQ ID NO: 1 (3459 bp; DNA; Artificial Sequence; Description of Artificial Sequence: Synthetic polynucleotide)
atgagagagc cagaggagct gatgccagat agcggagcag
tgtttacctt cggaaagtcc 60aagttcgcag agaataaccc aggaaagttc tggtttaaaa
acgacgtgcc cgtccacctg 120tcttgtggcg atgagcatag tgccgtggtc actgggaaca
ataagctgta tatgttcggg 180tccaacaatt ggggacagct ggggctggga tccaaatctg
ctatctctaa gccaacctgc 240gtgaaggcac tgaaacccga gaaggtcaaa ctggccgctt
gtggcagaaa ccacactctg 300gtgagcaccg agggcgggaa tgtctatgcc accggaggca
acaatgaggg acagctggga 360ctgggggaca ctgaggaaag gaataccttt cacgtgatct
ccttctttac atctgagcat 420aagatcaagc agctgagcgc cggctccaac acatctgcag
ccctgactga ggacgggcgc 480ctgttcatgt ggggagataa ttcagagggc cagattgggc
tgaaaaacgt gagcaacgtg 540tgcgtgcctc agcaggtgac catcggaaag ccagtcagtt
ggatttcatg tggctactat 600catagcgcct tcgtgaccac agatggcgag ctgtacgtct
ttggggagcc cgaaaacgga 660aaactgggcc tgcctaacca gctgctgggc aatcaccgga
caccccagct ggtgtccgag 720atccctgaaa aagtgatcca ggtcgcctgc gggggagagc
atacagtggt cctgactgag 780aatgccgtgt acaccttcgg actgggccag tttggccagc
tggggctggg aaccttcctg 840tttgagacat ccgaaccaaa agtgatcgag aacattcgcg
accagactat cagctacatt 900tcctgcggag agaatcacac cgcactgatc acagacattg
gcctgatgta tacctttggc 960gatgggcggc acgggaagct gggactgggc ctggagaact
tcactaatca cttcatcccc 1020accctgtgct ctaacttcct gcggttcatc gtgaaactgg
tcgcttgcgg cgggtgtcac 1080atggtggtct tcgctgcacc tcataggggc gtggctaagg
agatcgaatt tgacgagatt 1140aacgatacat gcctgagcgt ggcaactttc ctgccataca
gctccctgac ttctggcaat 1200gtgctgcaga gaaccctgag tgcaaggatg cggagaaggg
agagggaacg ctctcctgac 1260agtttctcaa tgcgacgaac cctgccacct atcgagggga
cactgggact gagtgcctgc 1320ttcctgccta actcagtgtt tccacgatgt agcgagcgga
atctgcagga gtctgtcctg 1380agtgagcagg atctgatgca gccagaggaa cccgactacc
tgctggatga gatgaccaag 1440gaggccgaaa tcgacaactc tagtacagtg gagtccctgg
gcgagactac cgatatcctg 1500aatatgacac acattatgtc actgaacagc aatgagaaga
gtctgaaact gtcaccagtg 1560cagaagcaga agaaacagca gactattggc gagctgactc
aggacaccgc cctgacagag 1620aacgacgata gcgatgagta tgaggaaatg tccgagatga
aggaaggcaa agcttgtaag 1680cagcatgtga gtcaggggat cttcatgaca cagccagcca
caactattga ggctttttca 1740gacgaggaag tggagatccc cgaggaaaaa gagggcgcag
aagattccaa ggggaatgga 1800attgaggaac aggaggtgga agccaacgag gaaaatgtga
aagtccacgg aggcaggaag 1860gagaaaacag aaatcctgtc tgacgatctg actgacaagg
ccgaggtgtc cgaaggcaag 1920gcaaaatctg tcggagaggc agaagacgga ccagagggac
gaggggatgg aacctgcgag 1980gaaggctcaa gcggggctga gcattggcag gacgaggaac
gagagaaggg cgaaaaggat 2040aaaggccgcg gggagatgga acgacctgga gagggcgaaa
aagagctggc agagaaggag 2100gaatggaaga aaagggacgg cgaggaacag gagcagaaag
aaagggagca gggccaccag 2160aaggagcgca accaggagat ggaagagggc ggcgaggaag
agcatggcga gggagaagag 2220gaagagggcg atagagaaga ggaagaggaa aaagaaggcg
aagggaagga ggaaggagag 2280ggcgaggaag tggaaggcga gagggaaaag gaggaaggag
aacggaagaa agaggaaaga 2340gccggcaaag aggaaaaggg cgaggaagag ggcgatcagg
gcgaaggcga ggaggaagag 2400accgagggcc gcggggaaga gaaagaggag ggaggagagg
tggagggcgg agaggtcgaa 2460gagggaaagg gcgagcgcga agaggaagag gaagagggcg
agggcgagga agaagagggc 2520gagggggaag aagaggaggg agagggcgaa gaggaagagg
gggagggaaa gggcgaagag 2580gaaggagagg aaggggaggg agaggaagag ggggaggagg
gcgaggggga aggcgaggag 2640gaagaaggag agggggaagg cgaagaggaa ggcgaggggg
aaggagagga ggaagaaggg 2700gaaggcgaag gcgaagagga gggagaagga gagggggagg
aagaggaagg agaagggaag 2760ggcgaggagg aaggcgaaga gggagagggg gaaggcgagg
aagaggaagg cgagggcgaa 2820ggagaggacg gcgagggcga gggagaagag gaggaagggg
aatgggaagg cgaagaagag 2880gaaggcgaag gcgaaggcga agaagagggc gaaggggagg
gcgaggaggg cgaaggcgaa 2940ggggaggaag aggaaggcga aggagaaggc gaggaagaag
agggagagga ggaaggcgag 3000gaggaaggag agggggagga ggagggagaa ggcgagggcg
aagaagaaga agagggagaa 3060gtggagggcg aagtcgaggg ggaggaggga gaaggggaag
gggaggaaga agagggcgaa 3120gaagaaggcg aggaaagaga aaaagaggga gaaggcgagg
aaaaccggag aaatagggaa 3180gaggaggaag aggaagaggg aaagtaccag gagacaggcg
aagaggaaaa cgagcggcag 3240gatggcgagg aatataagaa agtgagcaag atcaaaggat
ccgtcaagta cggcaagcac 3300aaaacctatc agaagaaaag cgtgaccaac acacagggga
atggaaaaga gcagcgaagt 3360aaaatgcctg tgcagtcaaa acggctgctg aagaatggcc
caagcgggtc taaaaaattc 3420tggaacaatg tcctgccaca ctatctggaa ctgaagtaa
3459
SEQ ID NO: 2 (2818 bp; DNA; Homo sapiens)
gctccttcct gtactgccca
gctccgcttg ctccctgacc atccctgcag cagccctgat 60gtgtcattgt ccccctctta
acctgcgctg cagtgctgca gggctgggct ctggagctgg 120gtctggtcat ttctccttag
atatgtagag gcccaggaaa ggtttggagc ctaagaagcc 180ctaggactcc aggtctccag
ggcagcccca gcctcttgga atgactttcc ctaataccac 240aggggtgttc taatcccagg
cagacccaag ctgcccctca ccaactccta cgtcctcaac 300ttcctttcat aacttctagg
atggaaacac ctaatcctcc agcaatactg aggcttttct 360ccttattctg ttttcccttt
tgaagaagcc aaggctcaga gcagtcgagt cacctaatca 420tggtctcatg tcgcctgatc
aaggtctcat gtcaccttat caagatctca cccactcacc 480tattcagttc tcaccagttc
agttcaggat ggcttctaag ctaccctgca cagctctgcc 540cacaggacat ttgtataagt
gagggggtgc aggccttcca gccccctcca actccaaaac 600tcagccccca agatcaagtg
gactctctga acccaccctg gccctacagt tgtcagggtc 660tggatgggaa gatgtagagc
tctcggcttt cactctgggg acttacccag aacatattct 720cctcatgagc taaggaggct
ggctgccatc ttcctacatc cccccacggc ctgggggcaa 780ggacaccctg gccccctgga
gtctggagaa ctctgaggac agaacttgct cttccacctg 840cttgggcctt acccacagga
gaagcactgc ttctctaccc atgccccatc caactcaggc 900accccaggga cttgcaacag
tctgattttt tctcacgtcc ttcttaaggc tctgggctag 960ccacacaaat caaatcccag
tgataggtcc agacaatcct atcctgaaac tacatcttag 1020taagactcca gggaatcctt
tccccaaaga cagtcttact cctgttctcc ccccaagcct 1080ttctgggcca gaagctttgc
ctggactcaa gcaatggcag acaagtgccc tctgaggaca 1140cggaagtgca tgctcagaac
tgtgattctc caagtggagg cagaggagaa ggcccaggct 1200tcccagcagg gctaaggata
tgcaaggagt gcattcatcc ggaggtgttg gcagcatccc 1260agccccaccc cattctcatc
gtaaatcagg ctcacttcca ttggctgcat acggtggagt 1320gatgtgacca tatgtcactt
gagcattaca caaatcctaa tgagctaaaa atatgtttgt 1380tttagctaat tgacctcttt
ggccttcata aagcagttgg taaacatcct cagataatga 1440tttccaaaga gcagattgtg
ggtctcagct gtgcagagaa agcccacgtc cctgagacca 1500ccttctccag ctgcctactg
aggcacacag gggcgcctgc ctgctgcccg ctcagccaag 1560gcggtgttgc tggagccagc
ttgggacagc tctcccaacg ctctgccctg gccttgcgac 1620cactctctgg gccgtagttg
tctgtctgtt aagtgaggaa agtgcccatc tccagaggca 1680ttcagcggca aagcagggct
tccaggttcc gaccccatag caggacttct tggatttcta 1740cagccagtca gttgcaagca
gcacccatat tatttctata agaagtggca ggagctggga 1800tctgaagagt tcagcagtct
acctttccct gtttcttgtg ctttatgcag tcaggaggaa 1860tgatctggat tccatgtgaa
gcctgggacc acggagaccc aagacttcct gcttgattct 1920ccctgcgaac tgcaggctgt
gggctgagcc ttcaagaagc aggagtcccc tctagccatt 1980aactctcaga gctaacctca
tttgaatggg aacactagtc ctgtgatgtc tggaaggtgg 2040gcgcctctac actccacacc
ctacatggtg gtccagacac atcattccca gcattagaaa 2100gctgtagggg gacccgttct
gttccctgga ggcattaaag ggacatagaa ataaatctca 2160agctctgagg ctgatgccag
cctcagactc agcctctgca ctgtatgggc caattgtagc 2220cccaaggact tcttcttgct
gcacccccta tctgtccaca cctaaaacga tgggcttcta 2280tttagttaca gaactctctg
gcctgttttg ttttgctttg ctttgttttg ttttgttttt 2340ttgttttttt gttttttagc
tatgaaacag aggtaatatc taatacagat aacttaccag 2400taatgagtgc ttcctactta
ctgggtactg ggaagaagtg ctttacacat attttctcat 2460ttaatctaca caataagtaa
ttaagacatt tccctgaggc cacgggagag acagtggcag 2520aacagttctc caaggaggac
ttgcaagtta ataactggac tttgcaaggc tctggtggaa 2580actgtcagct tgtaaaggat
ggagcacagt gtctggcatg tagcaggaac taaaataatg 2640gcagtgatta atgttatgat
atgcagacac aacacagcaa gataagatgc aatgtacctt 2700ctgggtcaaa ccaccctggc
cactcctccc cgatacccag ggttgatgtg cttgaattag 2760acaggattaa aggcttactg
gagctggaag ccttgcccca actcaggagt ttagcccc 2818
SEQ ID NO: 3 (1326 bp; DNA; Homo sapiens)
ctgcctactg aggcacacag gggcgcctgc ctgctgcccg ctcagccaag gcggtgttgc
60tggagccagc ttgggacagc tctcccaacg ctctgccctg gccttgcgac cactctctgg
120gccgtagttg tctgtctgtt aagtgaggaa agtgcccatc tccagaggca ttcagcggca
180aagcagggct tccaggttcc gaccccatag caggacttct tggatttcta cagccagtca
240gttgcaagca gcacccatat tatttctata agaagtggca ggagctggga tctgaagagt
300tcagcagtct acctttccct gtttcttgtg ctttatgcag tcaggaggaa tgatctggat
360tccatgtgaa gcctgggacc acggagaccc aagacttcct gcttgattct ccctgcgaac
420tgcaggctgt gggctgagcc ttcaagaagc aggagtcccc tctagccatt aactctcaga
480gctaacctca tttgaatggg aacactagtc ctgtgatgtc tggaaggtgg gcgcctctac
540actccacacc ctacatggtg gtccagacac atcattccca gcattagaaa gctgtagggg
600gacccgttct gttccctgga ggcattaaag ggacatagaa ataaatctca agctctgagg
660ctgatgccag cctcagactc agcctctgca ctgtatgggc caattgtagc cccaaggact
720tcttcttgct gcacccccta tctgtccaca cctaaaacga tgggcttcta tttagttaca
780gaactctctg gcctgttttg ttttgctttg ctttgttttg ttttgttttt ttgttttttt
840gttttttagc tatgaaacag aggtaatatc taatacagat aacttaccag taatgagtgc
900ttcctactta ctgggtactg ggaagaagtg ctttacacat attttctcat ttaatctaca
960caataagtaa ttaagacatt tccctgaggc cacgggagag acagtggcag aacagttctc
1020caaggaggac ttgcaagtta ataactggac tttgcaaggc tctggtggaa actgtcagct
1080tgtaaaggat ggagcacagt gtctggcatg tagcaggaac taaaataatg gcagtgatta
1140atgttatgat atgcagacac aacacagcaa gataagatgc aatgtacctt ctgggtcaaa
1200ccaccctggc cactcctccc cgatacccag ggttgatgtg cttgaattag acaggattaa
1260aggcttactg gagctggaag ccttgcccca actcaggagt ttagccccag accttctgtc
1320caccag
1326
SEQ ID NO: 4 (70 bp; DNA; Mus musculus)
gcttgaatta gacaggatta aaggcttact ggagctggaa
gccttgcccc aactcaggag 60tttagcccca
70
SEQ ID NO: 5 (3459 bp; DNA; Homo sapiens)
atgagggagc cggaagagct
gatgcccgat tcgggtgctg tgtttacatt tgggaaaagt 60aaatttgctg aaaataatcc
cggtaaattc tggtttaaaa atgatgtccc tgtacatctt 120tcatgtggag atgaacattc
tgctgttgtt accggaaata ataaacttta catgtttggc 180agtaacaact ggggtcagtt
aggattagga tcaaagtcag ccatcagcaa gccaacatgt 240gtcaaagctc taaaacctga
aaaagtgaaa ttagctgcct gtggaaggaa ccacaccctg 300gtgtcaacag aaggaggcaa
tgtatatgca actggtggaa ataatgaagg acagttgggg 360cttggtgaca ccgaagaaag
aaacactttt catgtaatta gcttttttac atccgagcat 420aagattaagc agctgtctgc
tggatctaat acttcagctg ccctaactga ggatggaaga 480ctttttatgt ggggtgacaa
ttccgaaggg caaattggtt taaaaaatgt aagtaatgtc 540tgtgtccctc agcaagtgac
cattgggaaa cctgtctcct ggatctcttg tggatattac 600cattcagctt ttgtaacaac
agatggtgag ctatatgtgt ttggagaacc tgagaatggg 660aagttaggtc ttcccaatca
gctcctgggc aatcacagaa caccccagct ggtgtctgaa 720attccggaga aggtgatcca
agtagcctgt ggtggagagc atactgtggt tctcacggag 780aatgctgtgt atacctttgg
gctgggacaa tttggtcagc tgggtcttgg cacttttctt 840tttgaaactt cagaacccaa
agtcattgag aatattaggg atcaaacaat aagttatatt 900tcttgtggag aaaatcacac
agctttgata acagatatcg gccttatgta tacttttgga 960gatggtcgcc acggaaaatt
aggacttgga ctggagaatt ttaccaatca cttcattcct 1020actttgtgct ctaatttttt
gaggtttata gttaaattgg ttgcttgtgg tggatgtcac 1080atggtagttt ttgctgctcc
tcatcgtggt gtggcaaaag aaattgaatt cgatgaaata 1140aatgatactt gcttatctgt
ggcgactttt ctgccgtata gcagtttaac ctcaggaaat 1200gtactgcaga ggactctatc
agcacgtatg cggcgaagag agagggagag gtctccagat 1260tctttttcaa tgaggagaac
actacctcca atagaaggga ctcttggcct ttctgcttgt 1320tttctcccca attcagtctt
tccacgatgt tctgagagaa acctccaaga gagtgtctta 1380tctgaacagg acctcatgca
gccagaggaa ccagattatt tgctagatga aatgaccaaa 1440gaagcagaga tagataattc
ttcaactgta gaaagccttg gagaaactac tgatatctta 1500aacatgacac acatcatgag
cctgaattcc aatgaaaagt cattaaaatt atcaccagtt 1560cagaaacaaa agaaacaaca
aacaattggg gaactgacgc aggatacagc tcttactgaa 1620aacgatgata gtgatgaata
tgaagaaatg tcagaaatga aagaagggaa agcatgtaaa 1680caacatgtgt cacaagggat
tttcatgacg cagccagcta cgactatcga agcattttca 1740gatgaggaag tagagatccc
agaggagaag gaaggagcag aggattcaaa aggaaatgga 1800atagaggagc aagaggtaga
agcaaatgag gaaaatgtga aggtgcatgg aggaagaaag 1860gagaaaacag agatcctatc
agatgacctt acagacaaag cagaggtgag tgaaggcaag 1920gcaaaatcag tgggagaagc
agaggatggg cctgaaggta gaggggatgg aacctgtgag 1980gaaggtagtt caggagcaga
acactggcaa gatgaggaga gggagaaggg ggagaaagac 2040aagggtagag gagaaatgga
gaggccagga gagggagaga aggaactagc agagaaggaa 2100gaatggaaga agagggatgg
ggaagagcag gagcaaaagg agagggagca gggccatcag 2160aaggaaagaa accaagagat
ggaggaggga ggggaggagg agcatggaga aggagaagaa 2220gaggagggag acagagaaga
ggaagaagag aaggagggag aagggaaaga ggaaggagaa 2280ggggaagaag tggagggaga
acgtgaaaag gaggaaggag agaggaaaaa ggaggaaaga 2340gcggggaagg aggagaaagg
agaggaagaa ggagaccaag gagaggggga agaggaggaa 2400acagagggga gaggggagga
aaaagaggag ggaggggaag tagagggagg ggaagtagag 2460gaggggaaag gagagaggga
agaggaagag gaggagggtg agggggaaga ggaggaaggg 2520gagggggaag aggaggaagg
ggagggggaa gaggaggaag gagaagggaa aggggaggaa 2580gaaggggaag aaggagaagg
ggaggaagaa ggggaggaag gagaagggga gggggaagag 2640gaggaaggag aaggggaggg
agaagaggaa ggagaagggg agggagaaga ggaggaagga 2700gaaggggagg gagaagagga
aggagaaggg gagggagaag aggaggaagg agaagggaaa 2760ggggaggagg aaggagagga
aggagaaggg gagggggaag aggaggaagg agaaggggaa 2820ggggaggatg gagaagggga
gggggaagag gaggaaggag aatgggaggg ggaagaggag 2880gaaggagaag gggaggggga
agaggaagga gaaggggaag gggaggaagg agaaggggag 2940ggggaagagg aggaaggaga
aggggagggg gaagaggagg aaggggaaga agaaggggag 3000gaagaaggag agggagagga
agaaggggag ggagaagggg aggaagaaga ggaaggggaa 3060gtggaagggg aggtggaagg
ggaggaagga gagggggaag gagaggaaga ggaaggagag 3120gaggaaggag aagaaaggga
aaaggagggg gaaggagaag aaaacaggag gaacagagaa 3180gaggaggagg aagaagaggg
gaagtatcag gagacaggcg aagaagagaa tgaaaggcag 3240gatggagagg agtacaaaaa
agtgagcaaa ataaaaggat ctgtgaaata tggcaaacat 3300aaaacatatc aaaaaaagtc
agttactaac acacagggaa atgggaaaga gcagaggtcc 3360aaaatgccag tccagtcaaa
acgactttta aaaaacgggc catcaggttc caaaaagttc 3420tggaataatg tattaccaca
ttacttggaa ttgaagtaa 3459
SEQ ID NO: 6 (1152 aa; PRT; Homo sapiens)
Met Arg Glu Pro Glu Glu Leu Met Pro Asp Ser Gly Ala Val Phe Thr 1
5 10 15 Phe Gly Lys Ser
Lys Phe Ala Glu Asn Asn Pro Gly Lys Phe Trp Phe 20
25 30 Lys Asn Asp Val Pro Val His Leu Ser
Cys Gly Asp Glu His Ser Ala 35 40
45 Val Val Thr Gly Asn Asn Lys Leu Tyr Met Phe Gly Ser Asn
Asn Trp 50 55 60
Gly Gln Leu Gly Leu Gly Ser Lys Ser Ala Ile Ser Lys Pro Thr Cys 65
70 75 80 Val Lys Ala Leu Lys
Pro Glu Lys Val Lys Leu Ala Ala Cys Gly Arg 85
90 95 Asn His Thr Leu Val Ser Thr Glu Gly Gly
Asn Val Tyr Ala Thr Gly 100 105
110 Gly Asn Asn Glu Gly Gln Leu Gly Leu Gly Asp Thr Glu Glu Arg
Asn 115 120 125 Thr
Phe His Val Ile Ser Phe Phe Thr Ser Glu His Lys Ile Lys Gln 130
135 140 Leu Ser Ala Gly Ser Asn
Thr Ser Ala Ala Leu Thr Glu Asp Gly Arg 145 150
155 160 Leu Phe Met Trp Gly Asp Asn Ser Glu Gly Gln
Ile Gly Leu Lys Asn 165 170
175 Val Ser Asn Val Cys Val Pro Gln Gln Val Thr Ile Gly Lys Pro Val
180 185 190 Ser Trp
Ile Ser Cys Gly Tyr Tyr His Ser Ala Phe Val Thr Thr Asp 195
200 205 Gly Glu Leu Tyr Val Phe Gly
Glu Pro Glu Asn Gly Lys Leu Gly Leu 210 215
220 Pro Asn Gln Leu Leu Gly Asn His Arg Thr Pro Gln
Leu Val Ser Glu 225 230 235
240 Ile Pro Glu Lys Val Ile Gln Val Ala Cys Gly Gly Glu His Thr Val
245 250 255 Val Leu Thr
Glu Asn Ala Val Tyr Thr Phe Gly Leu Gly Gln Phe Gly 260
265 270 Gln Leu Gly Leu Gly Thr Phe Leu
Phe Glu Thr Ser Glu Pro Lys Val 275 280
285 Ile Glu Asn Ile Arg Asp Gln Thr Ile Ser Tyr Ile Ser
Cys Gly Glu 290 295 300
Asn His Thr Ala Leu Ile Thr Asp Ile Gly Leu Met Tyr Thr Phe Gly 305
310 315 320 Asp Gly Arg His
Gly Lys Leu Gly Leu Gly Leu Glu Asn Phe Thr Asn 325
330 335 His Phe Ile Pro Thr Leu Cys Ser Asn
Phe Leu Arg Phe Ile Val Lys 340 345
350 Leu Val Ala Cys Gly Gly Cys His Met Val Val Phe Ala Ala
Pro His 355 360 365
Arg Gly Val Ala Lys Glu Ile Glu Phe Asp Glu Ile Asn Asp Thr Cys 370
375 380 Leu Ser Val Ala Thr
Phe Leu Pro Tyr Ser Ser Leu Thr Ser Gly Asn 385 390
395 400 Val Leu Gln Arg Thr Leu Ser Ala Arg Met
Arg Arg Arg Glu Arg Glu 405 410
415 Arg Ser Pro Asp Ser Phe Ser Met Arg Arg Thr Leu Pro Pro Ile
Glu 420 425 430 Gly
Thr Leu Gly Leu Ser Ala Cys Phe Leu Pro Asn Ser Val Phe Pro 435
440 445 Arg Cys Ser Glu Arg Asn
Leu Gln Glu Ser Val Leu Ser Glu Gln Asp 450 455
460 Leu Met Gln Pro Glu Glu Pro Asp Tyr Leu Leu
Asp Glu Met Thr Lys 465 470 475
480 Glu Ala Glu Ile Asp Asn Ser Ser Thr Val Glu Ser Leu Gly Glu Thr
485 490 495 Thr Asp
Ile Leu Asn Met Thr His Ile Met Ser Leu Asn Ser Asn Glu 500
505 510 Lys Ser Leu Lys Leu Ser Pro
Val Gln Lys Gln Lys Lys Gln Gln Thr 515 520
525 Ile Gly Glu Leu Thr Gln Asp Thr Ala Leu Thr Glu
Asn Asp Asp Ser 530 535 540
Asp Glu Tyr Glu Glu Met Ser Glu Met Lys Glu Gly Lys Ala Cys Lys 545
550 555 560 Gln His Val
Ser Gln Gly Ile Phe Met Thr Gln Pro Ala Thr Thr Ile 565
570 575 Glu Ala Phe Ser Asp Glu Glu Val
Glu Ile Pro Glu Glu Lys Glu Gly 580 585
590 Ala Glu Asp Ser Lys Gly Asn Gly Ile Glu Glu Gln Glu
Val Glu Ala 595 600 605
Asn Glu Glu Asn Val Lys Val His Gly Gly Arg Lys Glu Lys Thr Glu 610
615 620 Ile Leu Ser Asp
Asp Leu Thr Asp Lys Ala Glu Val Ser Glu Gly Lys 625 630
635 640 Ala Lys Ser Val Gly Glu Ala Glu Asp
Gly Pro Glu Gly Arg Gly Asp 645 650
655 Gly Thr Cys Glu Glu Gly Ser Ser Gly Ala Glu His Trp Gln
Asp Glu 660 665 670
Glu Arg Glu Lys Gly Glu Lys Asp Lys Gly Arg Gly Glu Met Glu Arg
675 680 685 Pro Gly Glu Gly
Glu Lys Glu Leu Ala Glu Lys Glu Glu Trp Lys Lys 690
695 700 Arg Asp Gly Glu Glu Gln Glu Gln
Lys Glu Arg Glu Gln Gly His Gln 705 710
715 720 Lys Glu Arg Asn Gln Glu Met Glu Glu Gly Gly Glu
Glu Glu His Gly 725 730
735 Glu Gly Glu Glu Glu Glu Gly Asp Arg Glu Glu Glu Glu Glu Lys Glu
740 745 750 Gly Glu Gly
Lys Glu Glu Gly Glu Gly Glu Glu Val Glu Gly Glu Arg 755
760 765 Glu Lys Glu Glu Gly Glu Arg Lys
Lys Glu Glu Arg Ala Gly Lys Glu 770 775
780 Glu Lys Gly Glu Glu Glu Gly Asp Gln Gly Glu Gly Glu
Glu Glu Glu 785 790 795
800 Thr Glu Gly Arg Gly Glu Glu Lys Glu Glu Gly Gly Glu Val Glu Gly
805 810 815 Gly Glu Val Glu
Glu Gly Lys Gly Glu Arg Glu Glu Glu Glu Glu Glu 820
825 830 Gly Glu Gly Glu Glu Glu Glu Gly Glu
Gly Glu Glu Glu Glu Gly Glu 835 840
845 Gly Glu Glu Glu Glu Gly Glu Gly Lys Gly Glu Glu Glu Gly
Glu Glu 850 855 860
Gly Glu Gly Glu Glu Glu Gly Glu Glu Gly Glu Gly Glu Gly Glu Glu 865
870 875 880 Glu Glu Gly Glu Gly
Glu Gly Glu Glu Glu Gly Glu Gly Glu Gly Glu 885
890 895 Glu Glu Glu Gly Glu Gly Glu Gly Glu Glu
Glu Gly Glu Gly Glu Gly 900 905
910 Glu Glu Glu Glu Gly Glu Gly Lys Gly Glu Glu Glu Gly Glu Glu
Gly 915 920 925 Glu
Gly Glu Gly Glu Glu Glu Glu Gly Glu Gly Glu Gly Glu Asp Gly 930
935 940 Glu Gly Glu Gly Glu Glu
Glu Glu Gly Glu Trp Glu Gly Glu Glu Glu 945 950
955 960 Glu Gly Glu Gly Glu Gly Glu Glu Glu Gly Glu
Gly Glu Gly Glu Glu 965 970
975 Gly Glu Gly Glu Gly Glu Glu Glu Glu Gly Glu Gly Glu Gly Glu Glu
980 985 990 Glu Glu
Gly Glu Glu Glu Gly Glu Glu Glu Gly Glu Gly Glu Glu Glu 995
1000 1005 Gly Glu Gly Glu Gly
Glu Glu Glu Glu Glu Gly Glu Val Glu Gly 1010 1015
1020 Glu Val Glu Gly Glu Glu Gly Glu Gly Glu
Gly Glu Glu Glu Glu 1025 1030 1035
Gly Glu Glu Glu Gly Glu Glu Arg Glu Lys Glu Gly Glu Gly Glu
1040 1045 1050 Glu Asn
Arg Arg Asn Arg Glu Glu Glu Glu Glu Glu Glu Gly Lys 1055
1060 1065 Tyr Gln Glu Thr Gly Glu Glu
Glu Asn Glu Arg Gln Asp Gly Glu 1070 1075
1080 Glu Tyr Lys Lys Val Ser Lys Ile Lys Gly Ser Val
Lys Tyr Gly 1085 1090 1095
Lys His Lys Thr Tyr Gln Lys Lys Ser Val Thr Asn Thr Gln Gly 1100
1105 1110 Asn Gly Lys Glu Gln
Arg Ser Lys Met Pro Val Gln Ser Lys Arg 1115 1120
1125 Leu Leu Lys Asn Gly Pro Ser Gly Ser Lys
Lys Phe Trp Asn Asn 1130 1135 1140
Val Leu Pro His Tyr Leu Glu Leu Lys 1145
1150
SEQ ID NO: 7 (3871 bp; DNA; Artificial Sequence; Description of Artificial Sequence: Synthetic polynucleotide)
tctagactcg aggaactgaa aaaccagaaa
gttaactggt aagtttagtc tttttgtctt 60ttatttcagg tcccggatcc ggtggtggtg
caaatcaaag aactgctcct cagtggatgt 120tgcctttact tctaggcctg tacggaagtg
ttacttctgc tctaaaagct gcggaattgt 180acccgcggcc gcgccaccat gagagagcca
gaggagctga tgccagatag cggagcagtg 240tttaccttcg gaaagtccaa gttcgcagag
aataacccag gaaagttctg gtttaaaaac 300gacgtgcccg tccacctgtc ttgtggcgat
gagcatagtg ccgtggtcac tgggaacaat 360aagctgtata tgttcgggtc caacaattgg
ggacagctgg ggctgggatc caaatctgct 420atctctaagc caacctgcgt gaaggcactg
aaacccgaga aggtcaaact ggccgcttgt 480ggcagaaacc acactctggt gagcaccgag
ggcgggaatg tctatgccac cggaggcaac 540aatgagggac agctgggact gggggacact
gaggaaagga atacctttca cgtgatctcc 600ttctttacat ctgagcataa gatcaagcag
ctgagcgccg gctccaacac atctgcagcc 660ctgactgagg acgggcgcct gttcatgtgg
ggagataatt cagagggcca gattgggctg 720aaaaacgtga gcaacgtgtg cgtgcctcag
caggtgacca tcggaaagcc agtcagttgg 780atttcatgtg gctactatca tagcgccttc
gtgaccacag atggcgagct gtacgtcttt 840ggggagcccg aaaacggaaa actgggcctg
cctaaccagc tgctgggcaa tcaccggaca 900ccccagctgg tgtccgagat ccctgaaaaa
gtgatccagg tcgcctgcgg gggagagcat 960acagtggtcc tgactgagaa tgccgtgtac
accttcggac tgggccagtt tggccagctg 1020gggctgggaa ccttcctgtt tgagacatcc
gaaccaaaag tgatcgagaa cattcgcgac 1080cagactatca gctacatttc ctgcggagag
aatcacaccg cactgatcac agacattggc 1140ctgatgtata cctttggcga tgggcggcac
gggaagctgg gactgggcct ggagaacttc 1200actaatcact tcatccccac cctgtgctct
aacttcctgc ggttcatcgt gaaactggtc 1260gcttgcggcg ggtgtcacat ggtggtcttc
gctgcacctc ataggggcgt ggctaaggag 1320atcgaatttg acgagattaa cgatacatgc
ctgagcgtgg caactttcct gccatacagc 1380tccctgactt ctggcaatgt gctgcagaga
accctgagtg caaggatgcg gagaagggag 1440agggaacgct ctcctgacag tttctcaatg
cgacgaaccc tgccacctat cgaggggaca 1500ctgggactga gtgcctgctt cctgcctaac
tcagtgtttc cacgatgtag cgagcggaat 1560ctgcaggagt ctgtcctgag tgagcaggat
ctgatgcagc cagaggaacc cgactacctg 1620ctggatgaga tgaccaagga ggccgaaatc
gacaactcta gtacagtgga gtccctgggc 1680gagactaccg atatcctgaa tatgacacac
attatgtcac tgaacagcaa tgagaagagt 1740ctgaaactgt caccagtgca gaagcagaag
aaacagcaga ctattggcga gctgactcag 1800gacaccgccc tgacagagaa cgacgatagc
gatgagtatg aggaaatgtc cgagatgaag 1860gaaggcaaag cttgtaagca gcatgtgagt
caggggatct tcatgacaca gccagccaca 1920actattgagg ctttttcaga cgaggaagtg
gagatccccg aggaaaaaga gggcgcagaa 1980gattccaagg ggaatggaat tgaggaacag
gaggtggaag ccaacgagga aaatgtgaaa 2040gtccacggag gcaggaagga gaaaacagaa
atcctgtctg acgatctgac tgacaaggcc 2100gaggtgtccg aaggcaaggc aaaatctgtc
ggagaggcag aagacggacc agagggacga 2160ggggatggaa cctgcgagga aggctcaagc
ggggctgagc attggcagga cgaggaacga 2220gagaagggcg aaaaggataa aggccgcggg
gagatggaac gacctggaga gggcgaaaaa 2280gagctggcag agaaggagga atggaagaaa
agggacggcg aggaacagga gcagaaagaa 2340agggagcagg gccaccagaa ggagcgcaac
caggagatgg aagagggcgg cgaggaagag 2400catggcgagg gagaagagga agagggcgat
agagaagagg aagaggaaaa agaaggcgaa 2460gggaaggagg aaggagaggg cgaggaagtg
gaaggcgaga gggaaaagga ggaaggagaa 2520cggaagaaag aggaaagagc cggcaaagag
gaaaagggcg aggaagaggg cgatcagggc 2580gaaggcgagg aggaagagac cgagggccgc
ggggaagaga aagaggaggg aggagaggtg 2640gagggcggag aggtcgaaga gggaaagggc
gagcgcgaag aggaagagga agagggcgag 2700ggcgaggaag aagagggcga gggggaagaa
gaggagggag agggcgaaga ggaagagggg 2760gagggaaagg gcgaagagga aggagaggaa
ggggagggag aggaagaggg ggaggagggc 2820gagggggaag gcgaggagga agaaggagag
ggggaaggcg aagaggaagg cgagggggaa 2880ggagaggagg aagaagggga aggcgaaggc
gaagaggagg gagaaggaga gggggaggaa 2940gaggaaggag aagggaaggg cgaggaggaa
ggcgaagagg gagaggggga aggcgaggaa 3000gaggaaggcg agggcgaagg agaggacggc
gagggcgagg gagaagagga ggaaggggaa 3060tgggaaggcg aagaagagga aggcgaaggc
gaaggcgaag aagagggcga aggggagggc 3120gaggagggcg aaggcgaagg ggaggaagag
gaaggcgaag gagaaggcga ggaagaagag 3180ggagaggagg aaggcgagga ggaaggagag
ggggaggagg agggagaagg cgagggcgaa 3240gaagaagaag agggagaagt ggagggcgaa
gtcgaggggg aggagggaga aggggaaggg 3300gaggaagaag agggcgaaga agaaggcgag
gaaagagaaa aagagggaga aggcgaggaa 3360aaccggagaa atagggaaga ggaggaagag
gaagagggaa agtaccagga gacaggcgaa 3420gaggaaaacg agcggcagga tggcgaggaa
tataagaaag tgagcaagat caaaggatcc 3480gtcaagtacg gcaagcacaa aacctatcag
aagaaaagcg tgaccaacac acaggggaat 3540ggaaaagagc agcgaagtaa aatgcctgtg
cagtcaaaac ggctgctgaa gaatggccca 3600agcgggtcta aaaaattctg gaacaatgtc
ctgccacact atctggaact gaagtaagcg 3660gccgcgcgga tccagacatg ataagataca
ttgatgagtt tggacaaacc acaactagaa 3720tgcagtgaaa aaaatgcttt atttgtgaaa
tttgtgatgc tattgcttta tttgtaacca 3780ttataagctg caataaacaa gttaacaaca
acaattgcat tcattttatg tttcaggttc 3840agggggaggt gtgggaggtt ttttagcatg c
3871
SEQ ID NO: 8 (235 bp; DNA; Homo sapiens)
agcacagtgt
ctggcatgta gcaggaacta aaataatggc agtgattaat gttatgatat 60gcagacacaa
cacagcaaga taagatgcaa tgtaccttct gggtcaaacc accctggcca 120ctcctccccg
atacccaggg ttgatgtgct tgaattagac aggattaaag gcttactgga 180gctggaagcc
ttgccccaac tcaggagttt agccccagac cttctgtcca ccagc 235
SEQ ID NO: 9 (9 aa; PRT; Homo sapiens)
Glu Glu Glu Gly Glu Gly Glu Gly Glu
SEQ ID NO: 10 (6 aa; PRT; Mus musculus)
Glu Glu Gly Glu Gly Glu
SEQ ID NO: 11 (4107 bp; DNA; Artificial Sequence; Description of Artificial Sequence: Synthetic polynucleotide)
agatctgaat tcagcacagt gtctggcatg
tagcaggaac taaaataatg gcagtgatta 60atgttatgat atgcagacac aacacagcaa
gataagatgc aatgtacctt ctgggtcaaa 120ccaccctggc cactcctccc cgatacccag
ggttgatgtg cttgaattag acaggattaa 180aggcttactg gagctggaag ccttgcccca
actcaggagt ttagccccag accttctgtc 240caccagctct agactcgagg aactgaaaaa
ccagaaagtt aactggtaag tttagtcttt 300ttgtctttta tttcaggtcc cggatccggt
ggtggtgcaa atcaaagaac tgctcctcag 360tggatgttgc ctttacttct aggcctgtac
ggaagtgtta cttctgctct aaaagctgcg 420gaattgtacc cgcggccgcg ccaccatgag
agagccagag gagctgatgc cagatagcgg 480agcagtgttt accttcggaa agtccaagtt
cgcagagaat aacccaggaa agttctggtt 540taaaaacgac gtgcccgtcc acctgtcttg
tggcgatgag catagtgccg tggtcactgg 600gaacaataag ctgtatatgt tcgggtccaa
caattgggga cagctggggc tgggatccaa 660atctgctatc tctaagccaa cctgcgtgaa
ggcactgaaa cccgagaagg tcaaactggc 720cgcttgtggc agaaaccaca ctctggtgag
caccgagggc gggaatgtct atgccaccgg 780aggcaacaat gagggacagc tgggactggg
ggacactgag gaaaggaata cctttcacgt 840gatctccttc tttacatctg agcataagat
caagcagctg agcgccggct ccaacacatc 900tgcagccctg actgaggacg ggcgcctgtt
catgtgggga gataattcag agggccagat 960tgggctgaaa aacgtgagca acgtgtgcgt
gcctcagcag gtgaccatcg gaaagccagt 1020cagttggatt tcatgtggct actatcatag
cgccttcgtg accacagatg gcgagctgta 1080cgtctttggg gagcccgaaa acggaaaact
gggcctgcct aaccagctgc tgggcaatca 1140ccggacaccc cagctggtgt ccgagatccc
tgaaaaagtg atccaggtcg cctgcggggg 1200agagcataca gtggtcctga ctgagaatgc
cgtgtacacc ttcggactgg gccagtttgg 1260ccagctgggg ctgggaacct tcctgtttga
gacatccgaa ccaaaagtga tcgagaacat 1320tcgcgaccag actatcagct acatttcctg
cggagagaat cacaccgcac tgatcacaga 1380cattggcctg atgtatacct ttggcgatgg
gcggcacggg aagctgggac tgggcctgga 1440gaacttcact aatcacttca tccccaccct
gtgctctaac ttcctgcggt tcatcgtgaa 1500actggtcgct tgcggcgggt gtcacatggt
ggtcttcgct gcacctcata ggggcgtggc 1560taaggagatc gaatttgacg agattaacga
tacatgcctg agcgtggcaa ctttcctgcc 1620atacagctcc ctgacttctg gcaatgtgct
gcagagaacc ctgagtgcaa ggatgcggag 1680aagggagagg gaacgctctc ctgacagttt
ctcaatgcga cgaaccctgc cacctatcga 1740ggggacactg ggactgagtg cctgcttcct
gcctaactca gtgtttccac gatgtagcga 1800gcggaatctg caggagtctg tcctgagtga
gcaggatctg atgcagccag aggaacccga 1860ctacctgctg gatgagatga ccaaggaggc
cgaaatcgac aactctagta cagtggagtc 1920cctgggcgag actaccgata tcctgaatat
gacacacatt atgtcactga acagcaatga 1980gaagagtctg aaactgtcac cagtgcagaa
gcagaagaaa cagcagacta ttggcgagct 2040gactcaggac accgccctga cagagaacga
cgatagcgat gagtatgagg aaatgtccga 2100gatgaaggaa ggcaaagctt gtaagcagca
tgtgagtcag gggatcttca tgacacagcc 2160agccacaact attgaggctt tttcagacga
ggaagtggag atccccgagg aaaaagaggg 2220cgcagaagat tccaagggga atggaattga
ggaacaggag gtggaagcca acgaggaaaa 2280tgtgaaagtc cacggaggca ggaaggagaa
aacagaaatc ctgtctgacg atctgactga 2340caaggccgag gtgtccgaag gcaaggcaaa
atctgtcgga gaggcagaag acggaccaga 2400gggacgaggg gatggaacct gcgaggaagg
ctcaagcggg gctgagcatt ggcaggacga 2460ggaacgagag aagggcgaaa aggataaagg
ccgcggggag atggaacgac ctggagaggg 2520cgaaaaagag ctggcagaga aggaggaatg
gaagaaaagg gacggcgagg aacaggagca 2580gaaagaaagg gagcagggcc accagaagga
gcgcaaccag gagatggaag agggcggcga 2640ggaagagcat ggcgagggag aagaggaaga
gggcgataga gaagaggaag aggaaaaaga 2700aggcgaaggg aaggaggaag gagagggcga
ggaagtggaa ggcgagaggg aaaaggagga 2760aggagaacgg aagaaagagg aaagagccgg
caaagaggaa aagggcgagg aagagggcga 2820tcagggcgaa ggcgaggagg aagagaccga
gggccgcggg gaagagaaag aggagggagg 2880agaggtggag ggcggagagg tcgaagaggg
aaagggcgag cgcgaagagg aagaggaaga 2940gggcgagggc gaggaagaag agggcgaggg
ggaagaagag gagggagagg gcgaagagga 3000agagggggag ggaaagggcg aagaggaagg
agaggaaggg gagggagagg aagaggggga 3060ggagggcgag ggggaaggcg aggaggaaga
aggagagggg gaaggcgaag aggaaggcga 3120gggggaagga gaggaggaag aaggggaagg
cgaaggcgaa gaggagggag aaggagaggg 3180ggaggaagag gaaggagaag ggaagggcga
ggaggaaggc gaagagggag agggggaagg 3240cgaggaagag gaaggcgagg gcgaaggaga
ggacggcgag ggcgagggag aagaggagga 3300aggggaatgg gaaggcgaag aagaggaagg
cgaaggcgaa ggcgaagaag agggcgaagg 3360ggagggcgag gagggcgaag gcgaagggga
ggaagaggaa ggcgaaggag aaggcgagga 3420agaagaggga gaggaggaag gcgaggagga
aggagagggg gaggaggagg gagaaggcga 3480gggcgaagaa gaagaagagg gagaagtgga
gggcgaagtc gagggggagg agggagaagg 3540ggaaggggag gaagaagagg gcgaagaaga
aggcgaggaa agagaaaaag agggagaagg 3600cgaggaaaac cggagaaata gggaagagga
ggaagaggaa gagggaaagt accaggagac 3660aggcgaagag gaaaacgagc ggcaggatgg
cgaggaatat aagaaagtga gcaagatcaa 3720aggatccgtc aagtacggca agcacaaaac
ctatcagaag aaaagcgtga ccaacacaca 3780ggggaatgga aaagagcagc gaagtaaaat
gcctgtgcag tcaaaacggc tgctgaagaa 3840tggcccaagc gggtctaaaa aattctggaa
caatgtcctg ccacactatc tggaactgaa 3900gtaagcggcc gcgcggatcc agacatgata
agatacattg atgagtttgg acaaaccaca 3960actagaatgc agtgaaaaaa atgctttatt
tgtgaaattt gtgatgctat tgctttattt 4020gtaaccatta taagctgcaa taaacaagtt
aacaacaaca attgcattca ttttatgttt 4080caggttcagg gggaggtgtg ggaggtt
4107
SEQ ID NO: 12, length 4142, DNA, Artificial Sequence
Description of Artificial Sequence: Synthetic polynucleotide
gcagagaggg agtggccaac ctcctagatc tgaattcagc acagtgtctg gcatgtagca
60ggaactaaaa taatggcagt gattaatgtt atgatatgca gacacaacac agcaagataa
120gatgcaatgt accttctggg tcaaaccacc ctggccactc ctccccgata cccagggttg
180atgtgcttga attagacagg attaaaggct tactggagct ggaagccttg ccccaactca
240ggagtttagc cccagacctt ctgtccacca gctctagact cgaggaactg aaaaaccaga
300aagttaactg gtaagtttag tctttttgtc ttttatttca ggtcccggat ccggtggtgg
360tgcaaatcaa agaactgctc ctcagtggat gttgccttta cttctaggcc tgtacggaag
420tgttacttct gctctaaaag ctgcggaatt gtacccgcgg ccgcgccacc atgagagagc
480cagaggagct gatgccagat agcggagcag tgtttacctt cggaaagtcc aagttcgcag
540agaataaccc aggaaagttc tggtttaaaa acgacgtgcc cgtccacctg tcttgtggcg
600atgagcatag tgccgtggtc actgggaaca ataagctgta tatgttcggg tccaacaatt
660ggggacagct ggggctggga tccaaatctg ctatctctaa gccaacctgc gtgaaggcac
720tgaaacccga gaaggtcaaa ctggccgctt gtggcagaaa ccacactctg gtgagcaccg
780agggcgggaa tgtctatgcc accggaggca acaatgaggg acagctggga ctgggggaca
840ctgaggaaag gaataccttt cacgtgatct ccttctttac atctgagcat aagatcaagc
900agctgagcgc cggctccaac acatctgcag ccctgactga ggacgggcgc ctgttcatgt
960ggggagataa ttcagagggc cagattgggc tgaaaaacgt gagcaacgtg tgcgtgcctc
1020agcaggtgac catcggaaag ccagtcagtt ggatttcatg tggctactat catagcgcct
1080tcgtgaccac agatggcgag ctgtacgtct ttggggagcc cgaaaacgga aaactgggcc
1140tgcctaacca gctgctgggc aatcaccgga caccccagct ggtgtccgag atccctgaaa
1200aagtgatcca ggtcgcctgc gggggagagc atacagtggt cctgactgag aatgccgtgt
1260acaccttcgg actgggccag tttggccagc tggggctggg aaccttcctg tttgagacat
1320ccgaaccaaa agtgatcgag aacattcgcg accagactat cagctacatt tcctgcggag
1380agaatcacac cgcactgatc acagacattg gcctgatgta tacctttggc gatgggcggc
1440acgggaagct gggactgggc ctggagaact tcactaatca cttcatcccc accctgtgct
1500ctaacttcct gcggttcatc gtgaaactgg tcgcttgcgg cgggtgtcac atggtggtct
1560tcgctgcacc tcataggggc gtggctaagg agatcgaatt tgacgagatt aacgatacat
1620gcctgagcgt ggcaactttc ctgccataca gctccctgac ttctggcaat gtgctgcaga
1680gaaccctgag tgcaaggatg cggagaaggg agagggaacg ctctcctgac agtttctcaa
1740tgcgacgaac cctgccacct atcgagggga cactgggact gagtgcctgc ttcctgccta
1800actcagtgtt tccacgatgt agcgagcgga atctgcagga gtctgtcctg agtgagcagg
1860atctgatgca gccagaggaa cccgactacc tgctggatga gatgaccaag gaggccgaaa
1920tcgacaactc tagtacagtg gagtccctgg gcgagactac cgatatcctg aatatgacac
1980acattatgtc actgaacagc aatgagaaga gtctgaaact gtcaccagtg cagaagcaga
2040agaaacagca gactattggc gagctgactc aggacaccgc cctgacagag aacgacgata
2100gcgatgagta tgaggaaatg tccgagatga aggaaggcaa agcttgtaag cagcatgtga
2160gtcaggggat cttcatgaca cagccagcca caactattga ggctttttca gacgaggaag
2220tggagatccc cgaggaaaaa gagggcgcag aagattccaa ggggaatgga attgaggaac
2280aggaggtgga agccaacgag gaaaatgtga aagtccacgg aggcaggaag gagaaaacag
2340aaatcctgtc tgacgatctg actgacaagg ccgaggtgtc cgaaggcaag gcaaaatctg
2400tcggagaggc agaagacgga ccagagggac gaggggatgg aacctgcgag gaaggctcaa
2460gcggggctga gcattggcag gacgaggaac gagagaaggg cgaaaaggat aaaggccgcg
2520gggagatgga acgacctgga gagggcgaaa aagagctggc agagaaggag gaatggaaga
2580aaagggacgg cgaggaacag gagcagaaag aaagggagca gggccaccag aaggagcgca
2640accaggagat ggaagagggc ggcgaggaag agcatggcga gggagaagag gaagagggcg
2700atagagaaga ggaagaggaa aaagaaggcg aagggaagga ggaaggagag ggcgaggaag
2760tggaaggcga gagggaaaag gaggaaggag aacggaagaa agaggaaaga gccggcaaag
2820aggaaaaggg cgaggaagag ggcgatcagg gcgaaggcga ggaggaagag accgagggcc
2880gcggggaaga gaaagaggag ggaggagagg tggagggcgg agaggtcgaa gagggaaagg
2940gcgagcgcga agaggaagag gaagagggcg agggcgagga agaagagggc gagggggaag
3000aagaggaggg agagggcgaa gaggaagagg gggagggaaa gggcgaagag gaaggagagg
3060aaggggaggg agaggaagag ggggaggagg gcgaggggga aggcgaggag gaagaaggag
3120agggggaagg cgaagaggaa ggcgaggggg aaggagagga ggaagaaggg gaaggcgaag
3180gcgaagagga gggagaagga gagggggagg aagaggaagg agaagggaag ggcgaggagg
3240aaggcgaaga gggagagggg gaaggcgagg aagaggaagg cgagggcgaa ggagaggacg
3300gcgagggcga gggagaagag gaggaagggg aatgggaagg cgaagaagag gaaggcgaag
3360gcgaaggcga agaagagggc gaaggggagg gcgaggaggg cgaaggcgaa ggggaggaag
3420aggaaggcga aggagaaggc gaggaagaag agggagagga ggaaggcgag gaggaaggag
3480agggggagga ggagggagaa ggcgagggcg aagaagaaga agagggagaa gtggagggcg
3540aagtcgaggg ggaggaggga gaaggggaag gggaggaaga agagggcgaa gaagaaggcg
3600aggaaagaga aaaagaggga gaaggcgagg aaaaccggag aaatagggaa gaggaggaag
3660aggaagaggg aaagtaccag gagacaggcg aagaggaaaa cgagcggcag gatggcgagg
3720aatataagaa agtgagcaag atcaaaggat ccgtcaagta cggcaagcac aaaacctatc
3780agaagaaaag cgtgaccaac acacagggga atggaaaaga gcagcgaagt aaaatgcctg
3840tgcagtcaaa acggctgctg aagaatggcc caagcgggtc taaaaaattc tggaacaatg
3900tcctgccaca ctatctggaa ctgaagtaag cggccgcgcg gatccagaca tgataagata
3960cattgatgag tttggacaaa ccacaactag aatgcagtga aaaaaatgct ttatttgtga
4020aatttgtgat gctattgctt tatttgtaac cattataagc tgcaataaac aagttaacaa
4080caacaattgc attcatttta tgtttcaggt tcagggggag gtgtgggagg ttttttagca
4140tg
4142