Patent application title: NOVEL AAV8 MUTANT CAPSIDS AND COMPOSITIONS CONTAINING SAME

Inventors:  James M. Wilson (Philadelphia, PA, US)  Qiang Wang (Philadelphia, PA, US)
IPC8 Class: AC12N1586FI
USPC Class: 1 1
Class name:
Publication date: 2021-11-04
Patent application number: 20210340569



Abstract:

Provided herein are AAV8 mutant capsids and rAAV comprising the same. In one embodiment, vectors employing the AAV8 mutant capsid show increased transduction in a selected tissue as compared to AAV8.

Claims:

1. An adeno-associated virus comprising a capsid having the sequence of SEQ ID NO: 18 (AAV3G1), SEQ ID NO: 20 (AAV8.T20); or SEQ ID NO: 22 (AAV8.TR1).

2. A nucleic acid encoding the capsid according to claim 1.

3. The AAV according to claim 1, wherein the capsid is encoded by SEQ ID NO: 17, SEQ ID NO: 19 or SEQ ID NO: 21, or a sequence sharing at least 80% identity therewith.

4. A recombinant adeno-associated virus (AAV) comprising an AAV capsid having an amino acid sequence selected from: SEQ ID NO: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32 and 34, further comprising a non-AAV nucleic acid sequence.

5. A nucleic acid molecule comprising a nucleic acid sequence encoding an AAV capsid protein, wherein said nucleic acid sequence is selected from SEQ ID NO: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 and 33.

Description:

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application is a divisional of U.S. patent application Ser. No. 16/093,800, filed Oct. 15, 2018, which is a National Stage Entry under 35 USC 371 of International Patent Application No. PCT/US2017/027392, filed Apr. 13, 2017, which claims the benefit under 35 USC 119(e) of U.S. Provisional Patent Application No. 62/323,389, filed Apr. 15, 2016. These applications are incorporated herein by reference.

INCORPORATION-BY-REFERENCE OF MATERIAL SUBMITTED IN ELECTRONIC FORM

[0002] Applicant hereby incorporates by reference the Sequence Listing material filed in electronic form herewith. This file is labeled "UPN-16-7726PCT_ST25.txt".

BACKGROUND OF THE INVENTION

[0003] Adeno-associated viruses (AAV) hold great promise in human gene therapy and have been widely used to target liver, muscle, heart, brain, eye, kidney and other tissues in various studies due to their ability to provide long-term gene expression and their lack of pathogenicity. AAVs belong to the parvovirus family and each contains a single-stranded DNA genome flanked by two inverted terminal repeats. Dozens of naturally occurring AAV capsids have been reported; their unique capsid structures enable them to recognize and transduce different cell types and organs.

[0004] Since the first trial, which started in 1981, no vector-related toxicity has been reported in clinical trials of adeno-associated virus (AAV) vector-based gene therapy. The ever-accumulating safety record of AAV vectors in clinical trials, combined with demonstrated efficacy, shows that AAV is a good platform to work with. Another attractive feature is that AAV is relatively easy to manipulate, as AAV is a single-stranded DNA virus with a small genome (~4.7 kb) and simple genetic components: inverted terminal repeats (ITRs) and the Rep and Cap genes. Only the ITRs and AAV capsid protein are required in AAV vectors, with the ITRs serving as replication and packaging signals for vector production and the capsid proteins playing a central role by forming capsids to accommodate vector genome DNA, determining tissue tropism and delivering vector genomic DNA into target cells. There have been mainly four ways to obtain AAV capsid genes: isolating AAVs from cultures or tissue samples, AAV directed evolution, shuffling, and rational design.

[0005] AAV8 has been shown to effectively transduce liver and muscle. In addition, AAV8-mediated hFIX gene transfer by a single peripheral-vein infusion consistently leads to long-term expression of the FIX transgene at therapeutic levels without acute or long-lasting toxicity in patients with severe hemophilia B.

[0006] AAV vectors possess many advantages in gene transfer, but there are still some problems to be solved. Thus, more effective AAV vectors are needed.

SUMMARY OF THE INVENTION

[0007] In one aspect, an adeno-associated virus is provided. The virus comprises an AAV8 mutant capsid. In one embodiment, the capsid has the sequence of SEQ ID NO: 18 and is termed AAV3G1. In another embodiment, the capsid has the sequence of SEQ ID NO: 20 and is termed AAV8.T20. In yet another embodiment, the capsid has the sequence of SEQ ID NO: 22 and is termed AAV8.TR1. In another aspect, a nucleic acid encoding a capsid as described herein is provided. In one embodiment, the capsid is encoded by SEQ ID NO: 17 or a sequence sharing at least 95% identity therewith. In another embodiment, the capsid is encoded by SEQ ID NO: 19 or a sequence sharing at least 95% identity therewith. In another embodiment, the capsid is encoded by SEQ ID NO: 21 or a sequence sharing at least 95% identity therewith.

[0008] In another embodiment, the AAV which includes an AAV8 mutant capsid, includes at least a vp3 capsid having a mutation in at least one of the following regions, as compared to native AAV8 (SEQ ID NO: 34): i. aa 263 to 267 (SEQ ID NO: 78); ii. aa 457 to aa 459; iii. aa 455 to aa 459 (SEQ ID NO: 81); or iv. aa 583 to aa 597 (SEQ ID NO: 69). In one embodiment, the AAV having the AAV8 mutant capsid has increased transduction in a target tissue as compared to AAV8. In one embodiment, the target tissue is muscle, liver, lung, airway epithelium, neurons, eye, or heart. In another embodiment, the AAV having the AAV8 mutant capsid has an increased ability to escape AAV neutralizing antibodies as compared to native AAV8.

[0009] In one embodiment, the vp1 and/or vp2 unique regions are derived from a different AAV than the AAV supplying the vp3 unique region (i.e., AAV8). In one embodiment, the AAV supplying the vp1 and vp2 sequences is rh.20. In one embodiment, the rh.20 vp1 sequence is SEQ ID NO: 88.

[0010] In another embodiment, the AAV further includes AAV inverted terminal repeats and a heterologous nucleic acid sequence operably linked to regulatory sequences which direct expression of a product encoded by the heterologous nucleic acid sequence in a target cell.

[0011] In another aspect, a method of transducing a target tissue is provided. In one embodiment, the method includes administering an AAV having a capsid as described herein. In one embodiment, a method of transducing liver tissue is provided, comprising administering an AAV having the AAV3G1 capsid. In another embodiment, a method of transducing muscle tissue is provided, comprising administering an AAV having the AAV3G1 capsid. In yet another embodiment, a method of transducing airway epithelium is provided, comprising administering an AAV having the AAV3G1 or AAV8.T20 capsid. In another embodiment, a method of transducing liver tissue is provided, comprising administering an AAV having the AAV8.TR1 capsid. In yet another embodiment, a method of transducing ocular cells is provided, comprising administering an AAV having the AAV3G1 capsid.

[0012] In yet another aspect, a method of generating a mutant AAV capsid having increased transduction for a target tissue, as compared to the wild type capsid is provided. The method includes performing mutagenesis at the contact region of a neutralizing antibody to the wild type capsid; and performing in vitro selection in the presence of the monoclonal antibody. In one embodiment, the method includes performing an additional mutation at a hypervariable region of the capsid. In another embodiment, the method further includes substituting the vp1 and/or vp2 unique sequences with the vp1 and/or vp2 sequences from a different AAV capsid.

[0013] In another aspect, a method of generating a recombinant adeno-associated virus (AAV) comprising an AAV capsid is provided. In one embodiment, the method includes culturing a host cell containing: (a) a molecule encoding an AAV capsid protein having a mutation in at least one of the following regions, as compared to native AAV8 (SEQ ID NO: 34): i. aa 263 to 267 (SEQ ID NO: 78); ii. aa 457 to aa 459; iii. aa 455 to aa 459 (SEQ ID NO: 81); or iv. aa 583 to aa 597 (SEQ ID NO: 69); (b) a functional rep gene; (c) a minigene comprising AAV inverted terminal repeats (ITRs) and a transgene; and (d) sufficient helper functions to permit packaging of the minigene into the AAV capsid protein.

[0014] In yet another aspect, a recombinant adeno-associated virus (AAV) is provided. In one embodiment, the rAAV includes an AAV capsid having an amino acid sequence selected from: SEQ ID NO: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, and 32. Such capsids are sometimes referred to herein as the "AAV8 mutant capsid(s)". The rAAV further includes a non-AAV nucleic acid sequence. In another aspect, a nucleic acid molecule encoding an AAV capsid sequence is provided. In one embodiment, the nucleic acid sequence is selected from SEQ ID NO: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, and 31.

[0015] In another aspect, an AAV capsid protein is provided. The AAV capsid has a mutation in at least one of the following regions, as compared to native AAV8 (SEQ ID NO: 34): i. aa 263 to 267 (SEQ ID NO: 78); ii. aa 457 to aa 459; iii. aa 455 to aa 459 (SEQ ID NO: 81); or iv. aa 583 to aa 597 (SEQ ID NO: 69). In another aspect, a nucleic acid sequence encoding an AAV capsid as described herein, is provided.

[0016] In yet another aspect, a host cell transfected with an adeno-associated virus as described herein, is provided.

[0017] In another aspect, a composition is provided which includes at least an AAV as described herein and a physiologically compatible carrier, buffer, adjuvant, and/or diluent.

[0018] In yet another aspect, a method of delivering a transgene to a cell is provided. The method includes the step of contacting the cell with an AAV as described herein, wherein said rAAV comprises the transgene.

BRIEF DESCRIPTION OF THE FIGURES

[0019] FIG. 1A provides a map of the plasmid used for AAV mutant library construction.

[0020] FIG. 1B illustrates the selection process of the AAV mutant library construction.

[0021] FIG. 2A is a bar graph demonstrating that mutagenesis at the antibody-capsid contact sites confers Nab resistance in vitro. The HEK 293 cells were infected by AAV8 and mutants carrying CMV.eGFP, mixed with medium (No Ab), antibody ADK8, ADK8/9 or ADK9. The M.O.I. was around 1e4. Two days later, GFP images were taken and analyzed. See Example 2B2.

[0022] FIG. 2B is a scatter plot demonstrating that mutagenesis at the antibody-capsid contact sites confers Nab resistance in vivo. AAV8 mutants were packaged with a TBG.canine F9-WPRE cassette and tested in B6 mice in the presence/absence of antibody ADK8 through i.v. injection. 100 uL of diluted ADK8 was injected i.v. 2 hours prior to vector injection. AAV8 was used as control. Canine F9 level was measured by ELISA from plasma collected 1 week after administration. The percent of F9 from ADK8-present animals relative to ADK8-absent animals and the p value (t-test) are shown above. See Example 2B6.

[0023] FIGS. 3A-3B provide a protein alignment of AAV8, AAV3G1, AAV8.T20 and AAV8.TR1 as described herein.

[0024] FIG. 4A demonstrates that AAV3G1 is resistant to pooled human IVIG (hIVIG), compared to AAV8. AAV8 (filled bar) or AAV3G1 (open bar) carrying a CB7.CI.luciferase cassette was incubated with various dilutions of pooled human IVIG before being applied to Huh7 cells in 96-well plates (M.O.I., ~1e4). Luciferase level was read 72 hours after infection. The x-axis is the dilution fold of hIVIG. The y-axis represents the percentage of luciferase expression compared to the "vector alone" control. The gray dotted line indicates the 50% expression level.

[0025] FIG. 4B demonstrates that all three mutations in AAV3G1 contribute to Nab resistance. AAV8, AAV3G1 and mutants carrying all the combinations of the three mutations comprising AAV3G1 were tested in vitro with human plasmas (4 samples) and anti-AAV8 monkey sera (4 samples). AAV8 and the variants were incubated with diluted sera/plasma (final anti-AAV8 Nab titer in the mix, 1:4) before being applied to Huh7 cells in 96-well plates. Luciferase expression was read 72 hours later and converted to the percentage of the expression level of each "vector alone" control. For each serum/plasma, a ranking number was assigned to each vector according to its residual expression (the ranking number of the highest residual expression was 1 and the lowest was 8). See Example 2C.

[0026] FIG. 5A are photographs of mice injected i.m. with AAV8 or AAV3G1 carrying a CB7.CI.luciferase cassette. Vector was administered into B6 muscle at a dose of 3×10^10 gc/mouse, 4 mice/group. Luciferase activity was monitored 2 weeks and 4 weeks after dosing. These findings demonstrate that, through intramuscular injection, AAV3G1 prefers muscle to liver, compared to AAV8. See Example 2C.

[0027] FIG. 5B are photographs of muscle tissue after i.m. injection of AAV vectors carrying a different transgene cassette from that shown in FIG. 5A. These experiments show similar muscle preference of AAV3G1 in B6 mice. Dose, 1×10^9 gc/animal, 5×10^8 gc/25 uL/leg, both legs. Week 3 after vector injection, muscle section, X-gal staining, the best section of each group, 4×.

[0028] FIG. 5C. I.m. injection of AAV vectors carrying a third transgene cassette, tMCK.human F9, shows similar muscle preference of AAV3G1 in B6 mice. tMCK is a muscle-specific promoter. Dose, 3e10 gc/mouse, 3 mice/group. Plasma and muscle were collected 28 and 30 days after dosing, respectively. Human F9 was measured by ELISA from plasma and muscle lysate. The muscle F9 expression level of AAV3G1 was 11.2-fold that of AAV8. See Example 2B6.

[0029] FIG. 5D. The neutralizing antibody titer of the day 28 plasma shows that the antigenicity of AAV8 and AAV3G1 is different. The plasma samples were from the study of FIG. 5C. See Example 2B6.

[0030] FIG. 6A. Overview of X-gal stained sections from heart, muscle and liver of mice that received AAV8 or AAV3G1 vector. MPS 3A Het mice (B6 background) received 5e11 gc of AAV.CMV.LacZ/mouse, i.v. Tissues were collected 14 days later. Representative muscle sections of each animal at 4×. See Example 2C.

[0031] FIG. 6B. Representative image of in vivo luciferase imaging, to compare AAV8 and AAV3G1 with CB7.CI.ffluciferase transgene cassette, i.v., in B6 mice. Dose, 3e11 gc/mouse, week 2 after vector injection. The left is AAV8; the right is AAV3G1. See Example 2C.

[0032] FIG. 7A. AAV3G1 has higher transduction of mouse airway epithelial cells, and the transduction is improved further by replacing the VP1/2 region with that of rh.20. B6 mice received 1e11 gc/mouse of AAV.CB7.CI.luciferase, i.n. The luciferase activity was monitored 2, 3 and 4 weeks after vector administration. The right panel is a representative image (week 4) of the study. The left panel is the quantification with Living Image® 3.2, normalized by the average value of the AAV8 group at week 2. See Example 2C.

[0033] FIG. 7B. Airway epithelial cell transduction comparison of AAV8, AAV8.T20, AAV9 and AAV6.2. B6 mice received 1e11 gc/mouse of AAV.CB7.CI.luciferase, i.n., 4 mice/vector. The luciferase activity was monitored 1, 2 and 3 weeks after vector administration. Living Image® 3.2 was used for quantification, and values were normalized by the average value of the AAV8 group at week 1. See Example 2C.

[0034] FIG. 8A. The heparin affinity of AAV3G1 is increased. AAV vectors were diluted in DPBS and 2e11 gc of the vector was loaded to Heparin column, followed by washing with DPBS and DPBS with various concentrations of NaCl. Dot blot was performed with PVDF membrane with antibody B1.

[0035] FIG. 8B. The charge reduction in AAV8.TR1 decreases its heparin affinity. Equal gc of AAV8.TR1.TBG.hF9co.WPRE.bGH and AAV3G1.CB7.CI.luciferase.RBG were mixed together in Tris buffer (pH 7.4, 0.01 M), loaded onto heparin column and washed sequentially with various buffers. Fractions were collected during the process: FT+W, flow-through plus wash with Tris buffer, 0.05 M-2.0 M, Tris buffer plus 0.05-2.0 M NaCl. Vector distributions were measured by qPCR with bGH and RBG probes.

[0036] FIG. 8C shows that charge reduction of AAV3G1, resulting in the mutant AAV8.TR1, partially restores liver transduction. B6 mice were administered AAV.TBG.hF9co.WPRE.RBG intravenously at a dose of 1e10 gc/mouse, 5 mice/group. Plasma was collected at weeks 1, 2 and 4 after vector injection and measured by human F9 ELISA.

[0037] FIG. 8D provides results of the in vitro Huh7 Nab assay. Reporter: CB7.CI.ffluciferase; M.O.I. ~1e3. The samples were week 4 plasma from 3 animals of each group of the same study as FIG. 8C.

[0038] FIG. 8E provides the vector genome copy distribution from the mice of FIG. 8C.

[0039] FIG. 9 provides a map of pAAV.DE.0.

[0040] FIG. 10 provides a map of pAAV.DE.1.

[0041] FIG. 11 provides a map of pAAV.DE.1.HVR.I.

[0042] FIG. 12 provides a map of pAAV.DE.1.HVR.IV.

[0043] FIG. 13A is a graph showing human F9 expression (ng/mL) in mice (5 mice/group) injected with AAV.TBG.human F9 at 1e10 gc/mouse, i.v. Plasma was collected 1, 2 and 4 weeks after treatment.

[0044] FIG. 13B is a graph showing neutralizing antibody titer against AAV8 at week 4 in the mice of FIG. 13A. Huh7 cells were used with AAV8.CB7.Luciferase at a final concentration of 1e9 gc/mL. The average of each group is indicated.

[0045] FIG. 14 provides a map of pAAVinvivo.

[0046] FIG. 15 are photographs of male B6 mice, 3 mice/group, injected i.m. with 3e9 or 3e10 gc/mouse, 1 leg/mouse with AAV3G1.tMCK.PI.ffluc.bGH, dd-PCR(PK). Week 1 results are shown. For each figure, the left is AAV8-treated, the right AAV3G1.

DETAILED DESCRIPTION OF THE INVENTION

[0047] Adeno-associated virus (AAV)-based gene therapy is showing increasing promise, stimulated by encouraging results from clinical trials in recent years. Until now, AAV vectors utilizing the AAV8 capsid have shown tremendous potential for in vivo gene delivery, with nearly complete transduction of many tissues in rodents after intravascular infusion. Thus, AAV8 is a logical starting point for designing improved vectors. To advance the platform, provided herein are AAV8 mutants having increased resistance to neutralizing antibodies, yield, expression, or transduction. The methods are directed to use of the AAV to target various tissues and treat various conditions.

[0048] Unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs and by reference to published texts, which provide one skilled in the art with a general guide to many of the terms used in the present application. The following definitions are provided for clarity only and are not intended to limit the claimed invention. As used herein, the terms "a" or "an" refer to one or more; for example, "an ocular cell" is understood to represent one or more ocular cells. As such, the terms "a" (or "an"), "one or more," and "at least one" are used interchangeably herein. As used herein, the term "about" means a variability of 10% from the reference given, unless otherwise specified. While various embodiments in the specification are presented using "comprising" language, under other circumstances, a related embodiment is also intended to be interpreted and described using "consisting of" or "consisting essentially of" language.

[0049] With regard to the following description, it is intended that each of the compositions herein described, is useful, in another embodiment, in the methods of the invention. In addition, it is also intended that each of the compositions herein described as useful in the methods, is, in another embodiment, itself an embodiment of the invention.

[0050] As used herein, the term "target tissue" can refer to any cell or tissue which is intended to be transduced by the subject AAV vector. The term may refer to any one or more of muscle, liver, lung, airway epithelium, neurons, eye (ocular cells), or heart. In one embodiment, the target tissue is liver. In another embodiment, the target tissue is the eye.

[0051] As used herein, the term "ocular cells" refers to any cell in, or associated with the function of, the eye. The term may refer to any one or more of photoreceptor cells, including rod, cone and photosensitive ganglion cells, retinal pigment epithelium (RPE) cells, Mueller cells, bipolar cells, horizontal cells, amacrine cells. In one embodiment, the ocular cells are bipolar cells. In another embodiment, the ocular cells are horizontal cells. In another embodiment, the ocular cells are ganglion cells.

[0052] As used herein, the term "mammalian subject" or "subject" includes any mammal in need of the methods of treatment described herein or prophylaxis, including particularly humans. Other mammals in need of such treatment or prophylaxis include dogs, cats, or other domesticated animals, horses, livestock, laboratory animals, including non-human primates, etc. The subject may be male or female.

[0053] As used herein, the term "host cell" may refer to the packaging cell line in which the rAAV is produced from the plasmid. In the alternative, the term "host cell" may refer to the target cell in which expression of the transgene is desired.

A. THE AAV CAPSID

[0054] A recombinant AAV capsid protein as described herein is characterized by a variable protein 3 (vp3) having a mutation in at least one of the following regions, as compared to the native full length (vp1) AAV8 capsid sequence (SEQ ID NO: 34): i. aa 263 to 267 (SEQ ID NO: 78); ii. aa 457 to aa 459; iii. aa 455 to aa 459 (SEQ ID NO: 81); or iv. aa 583 to aa 597 (SEQ ID NO: 69). An AAV having such a capsid has increased transduction in a target tissue as compared to AAV8. Also encompassed by the invention are nucleic acid sequences encoding the novel AAV, capsids, and fragments thereof which are described herein.

[0055] As used herein, the term "native" refers to the native AAV sequence without mutation in i. aa 263 to 267; ii. aa 457 to aa 459; iii. aa 455 to aa 459; or iv. aa 583 to aa 597 (using AAV8 numbering) of the capsid protein. However it is not intended that only naturally occurring AAV8 be the source of the wild type sequence. Useful herein are non-naturally occurring AAV, including, without limitation, recombinant, modified or altered, shuffled, chimeric, hybrid, evolved, synthetic, artificial, etc., AAV. This includes AAV with mutations in regions of the capsid other than in i. aa 263 to 267; ii. aa 457 to aa 459; iii. aa 455 to aa 459; or iv. aa 583 to aa 597 (using AAV8 numbering), provided they are used as the "starting sequence" for generating the mutant capsid described herein.

[0056] The AAV capsid consists of three overlapping coding sequences, which vary in length due to alternative start codon usage. These variable proteins are referred to as VP1, VP2 and VP3, with VP1 being the longest and VP3 being the shortest. The AAV particle consists of all three capsid proteins at a ratio of ~1:1:10 (VP1:VP2:VP3). VP3, which is contained within VP1 and VP2 at the C-terminus, is the main structural component that builds the particle. The capsid protein can be referred to using several different numbering systems. For convenience, as used herein, the AAV sequences are referred to using VP1 numbering, which starts with aa 1 for the first residue of VP1. However, the capsid proteins described herein include VP1, VP2 and VP3 (used interchangeably herein with vp1, vp2 and vp3) with mutations in the corresponding region of the protein. In AAV8, the variable proteins correspond to VP1 (aa 1 to 738), VP2 (aa 138 to 738), and VP3 (aa 204 to 738) using the numbering of the full length VP1. The amino acid sequence of native AAV8 vp1 is shown in SEQ ID NO: 34.
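
The VP1-based numbering described above can be illustrated with a minimal Python sketch. It assumes the VP1 sequence is held as a plain string and uses only the AAV8 boundaries stated in this paragraph (VP2 from aa 138, VP3 from aa 204, each ending at aa 738). The helper name `subsequence` and the placeholder sequence are hypothetical and are not taken from the Sequence Listing.

```python
# Sketch only: VP1-based (1-based, inclusive) numbering of the AAV8 capsid proteins.
# Boundaries come from the paragraph above; the sequence is a placeholder.

AAV8_VP_BOUNDS = {
    "VP1": (1, 738),
    "VP2": (138, 738),
    "VP3": (204, 738),
}


def subsequence(vp1_seq: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) of a VP1 sequence."""
    if not (1 <= start <= end <= len(vp1_seq)):
        raise ValueError("coordinates fall outside the VP1 sequence")
    return vp1_seq[start - 1:end]


# Placeholder standing in for SEQ ID NO: 34 (this is NOT the real sequence).
placeholder_vp1 = "X" * 738

# Length of each variable protein when sliced out of VP1.
vp_lengths = {
    name: len(subsequence(placeholder_vp1, start, end))
    for name, (start, end) in AAV8_VP_BOUNDS.items()
}
print(vp_lengths)  # {'VP1': 738, 'VP2': 601, 'VP3': 535}
```

Because all coordinates are expressed against VP1, the same residue number (for example aa 583) identifies the same position whether it is discussed in the context of VP1, VP2 or VP3.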

[0057] The AAV capsid contains 9 hypervariable regions (HVR) which show the most sequence divergence throughout AAV isolates. See, Govindasamy et al, J Virol. 2006 December; 80(23):11556-70. Epub 2006 Sep. 13, which is incorporated herein by reference. Thus, when rationally designing new vectors, the HVRs are a rich target. In one embodiment, the AAV capsid has a mutation in the HVRVIII region. In one embodiment, an AAV capsid is provided which has a mutation in aa 583-aa597 as compared to the AAV8 native sequence. In one embodiment, the AAV capsid has an aa 583-597 sequence as shown below in Table 1. Encompassed herein are capsid proteins and rAAV having capsid proteins having vp1, vp2 and/or vp3 sequences which include one of the amino acid sequences shown in Table 1.

TABLE 1: Capsid mutations

SEQ ID NO containing
aa583-597 mutation     aa583 to aa597 mutation
 2                     583ADNLQQQNTAPQIGT597 (SEQ ID NO: 69) -> GDNLQLYNTAPGSVF (SEQ ID NO: 70)
 4                     583ADNLQQQNTAPQIGT597 (SEQ ID NO: 69) -> SDNLQFRNTAPLWSS (SEQ ID NO: 71)
 6                     583ADNLQQQNTAPQIGT597 (SEQ ID NO: 69) -> NDNLQVCNTAPDDVM (SEQ ID NO: 72)
 8                     583ADNLQQQNTAPQIGT597 (SEQ ID NO: 69) -> CDNLQGYNTAPLCVA (SEQ ID NO: 73)
10                     583ADNLQQQNTAPQIGT597 (SEQ ID NO: 69) -> VDNLQFLNTAPAGEA (SEQ ID NO: 74)
12                     583ADNLQQQNTAPQIGT597 (SEQ ID NO: 69) -> LDNLQDGNTAPGACG (SEQ ID NO: 75)
14                     583ADNLQQQNTAPQIGT597 (SEQ ID NO: 69) -> WDNLQSENTAPSETS (SEQ ID NO: 76)
16                     583ADNLQQQNTAPQIGT597 (SEQ ID NO: 69) -> SDNLQSCNTAPFAGA (SEQ ID NO: 77)
18                     583ADNLQQQNTAPQIGT597 (SEQ ID NO: 69) -> GDNLQLYNTAPGSVF (SEQ ID NO: 70)
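
As an illustration of how an entry of Table 1 maps onto VP1 numbering, the following Python sketch applies the first listed substitution (SEQ ID NO: 69 to SEQ ID NO: 70) at aa 583 to aa 597 and reports which positions differ. The function names and the placeholder VP1 string are hypothetical; only the two 15-residue segments are taken from Table 1.

```python
# Sketch only: applying a Table 1 substitution at aa 583-597 (VP1 numbering).

NATIVE_583_597 = "ADNLQQQNTAPQIGT"  # SEQ ID NO: 69 (native AAV8 segment)
MUTANT_583_597 = "GDNLQLYNTAPGSVF"  # SEQ ID NO: 70 (first row of Table 1)


def substitute_region(vp1_seq, start, end, expected_native, replacement):
    """Replace residues start..end (1-based, inclusive) after checking the native segment."""
    if vp1_seq[start - 1:end] != expected_native:
        raise ValueError("sequence does not carry the expected native segment")
    return vp1_seq[:start - 1] + replacement + vp1_seq[end:]


def changed_positions(native, mutant, first_position):
    """List the VP1 positions at which two equal-length segments differ."""
    return [
        first_position + i
        for i, (a, b) in enumerate(zip(native, mutant))
        if a != b
    ]


# Placeholder VP1: only residues 583-597 are real; the flanks are filler.
placeholder_vp1 = "X" * 582 + NATIVE_583_597 + "X" * 141

mutant_vp1 = substitute_region(
    placeholder_vp1, 583, 597, NATIVE_583_597, MUTANT_583_597
)
print(changed_positions(NATIVE_583_597, MUTANT_583_597, 583))
# [583, 588, 589, 594, 595, 596, 597]
```

Checking the native segment before substituting guards against applying AAV8 coordinates to a starting sequence whose numbering is offset.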

[0058] Additional mutations were made at the HVR.I and HVR.IV regions. Thus, in one embodiment, the AAV capsid has a mutation in aa263 to aa267. In one embodiment, the AAV capsid has the mutation 263NGTSG267 (SEQ ID NO: 78)->SGTH (SEQ ID NO: 79). In another embodiment, the AAV capsid has the mutation 263NGTSG267 (SEQ ID NO: 78)->SDTH (SEQ ID NO: 80). Encompassed herein are capsid proteins and rAAV having capsid proteins having vp1, vp2 and/or vp3 sequences which include one of the amino acid sequences of SEQ ID NO: 79 or SEQ ID NO: 80.

[0059] In one embodiment, the AAV capsid has a mutation in aa457 to aa459. In another embodiment, the AAV capsid has a mutation in aa455 to aa459. In one embodiment, the AAV capsid has the mutation 457TAN459->SRP. In one embodiment, the AAV capsid has the mutation 455GGTAN459 (SEQ ID NO: 81)->DGSGL (SEQ ID NO: 82). Encompassed herein are capsid proteins and rAAV having capsid proteins having vp1, vp2 and/or vp3 sequences which include one of the amino acid sequences of SEQ ID NO: 79 or SEQ ID NO: 80.

[0060] In another embodiment, the vp1/vp2 unique regions of the AAV8 capsid (or other AAV capsid described herein) can be replaced with the vp1/vp2 regions from a different capsid. In one embodiment, the vp1/vp2 unique regions are replaced with the vp1/vp2 unique region of rh.20. In AAV8, the vp2 starts at amino acid 138, and the vp3 starts at amino acid 204, using AAV8 vp1 numbering. Thus, in one embodiment, the vp1/2 region of AAV8 (amino acids 1 to 203) is swapped for the corresponding portion (vp1/2) of another capsid. The vp1/2 regions in the swapped capsids may be of the same or different amino acid lengths. For example, in AAVrh.20, the vp1/2 region spans amino acids 1 to 202 of that sequence (SEQ ID NO: 88). See, Limberis et al, Mol Ther. 2009 February; 17(2): 294-301 (which is incorporated herein by reference). In another embodiment, the vp1/vp2 unique regions are replaced with the vp1/vp2 unique region of AAV1, 6, 9, rh.8, rh.10, rh.20, hu.37, rh.2R, rh.43, rh.46, rh.64R1, hu.48R3, or cy.5R4. The vp1/2 regions can be readily determined based on alignments available in the art. See, e.g., WO 2006/110689, which is incorporated herein by reference.
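
The vp1/2 swap described above amounts to joining the donor's residues upstream of its vp3 start to the recipient's vp3 sequence. The following Python sketch shows this under stated assumptions: the sequences are placeholders rather than entries of the Sequence Listing, the function name `swap_vp12_region` is hypothetical, and the vp3 start positions (204 for AAV8; 203 for rh.20, following from its vp1/2 region of aa 1 to 202) are the values given in this paragraph.

```python
# Sketch only: building a chimeric capsid by swapping the vp1/2 unique region.


def swap_vp12_region(recipient_vp1, recipient_vp3_start, donor_vp1, donor_vp3_start):
    """Join the donor vp1/2 region to the recipient vp3 (1-based vp3 start positions)."""
    donor_vp12 = donor_vp1[:donor_vp3_start - 1]              # donor aa 1..(vp3 start - 1)
    recipient_vp3 = recipient_vp1[recipient_vp3_start - 1:]   # recipient vp3 start..end
    return donor_vp12 + recipient_vp3


# Placeholders: letters mark the origin of each region, not real residues.
aav8_placeholder = "A" * 203 + "C" * 535   # AAV8: vp1/2 = aa 1-203, vp3 = aa 204-738
rh20_placeholder = "R" * 202 + "D" * 535   # rh.20: vp1/2 = aa 1-202; vp3 length is arbitrary here

chimera = swap_vp12_region(aav8_placeholder, 204, rh20_placeholder, 203)
print(len(chimera))  # 737
```

Because the two vp1/2 regions differ in length by one residue, positions in the vp3 portion of such a chimera are shifted by one relative to AAV8 vp1 numbering, which is why the regions are defined by alignment rather than by fixed coordinates.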

[0061] The AAV capsid vp1 ORF includes a second ORF, which encodes the AAV assembly-activating protein (AAP). The AAP coding sequence of ORF2 initiates prior to the VP3 coding sequence. The AAV8 AAP native coding sequence is shown in SEQ ID NO: 35. The native AAP amino acid sequence is shown in SEQ ID NO: 36. In one embodiment, the AAV VP1 ORF is mutated to result in an alternative AAP amino acid sequence. Thus, in one embodiment, the AAV vp1 nucleic acid sequence shares at least 95% identity with the native AAV8 coding sequence. In another embodiment, the AAV vp1 nucleic acid sequence includes the ORF2 (AAP coding sequence) shown in SEQ ID NO: 37. In another embodiment, the AAV AAP amino acid sequence is shown in SEQ ID NO: 38. See, Sonntag et al, A viral assembly factor promotes AAV2 capsid formation in the nucleolus, Proc Natl Acad Sci USA. 2010 Jun. 1; 107(22): 10220-10225, which is incorporated herein by reference.

[0062] As shown in the examples below, the inventors have shown that the AAV termed AAV3G1 (also sometimes called AAV8.Triple or Triple) effectively transduces liver, muscle and airway epithelium. In fact, AAV3G1 shows about a 10 fold increase in transduction as compared to native AAV8, both i.m. and i.v., with various transgene cassettes such as CB7.CI.ffluciferase, CMV.LacZ and tMCK.human F9. A further recognized benefit of the AAV3G1 mutant is that it shows resistance to various antisera of monkey and human, as well as human IVIG (at levels 2 to 4 fold that of AAV8, with respect to human IVIG). Further, intranasal administration of AAV3G1 resulted in a transduction efficiency of airway epithelium 2 to 3 fold greater than that of AAV8. Thus, in one embodiment, the AAV capsid has a sequence of AAV3G1, as shown in SEQ ID NO: 18.

[0063] As shown in the examples below, the AAV termed AAV8.T20 transduces airway epithelium at levels approximately 10 fold greater than AAV8. Thus, in one embodiment, the AAV capsid has a sequence of AAV8.T20, as shown in SEQ ID NO: 20.

[0064] As shown in the examples below, the AAV termed AAV8.TR1 effectively transduces liver. Thus, in one embodiment, the AAV capsid has a sequence of AAV8.TR1, as shown in SEQ ID NO: 22.

[0065] In another embodiment, an AAV capsid is provided which has the sequence shown in SEQ ID NO: 2. In another embodiment, an AAV capsid is provided which has the sequence shown in SEQ ID NO: 4. In another embodiment, an AAV capsid is provided which has the sequence shown in SEQ ID NO: 6. In another embodiment, an AAV capsid is provided which has the sequence shown in SEQ ID NO: 8. In another embodiment, an AAV capsid is provided which has the sequence shown in SEQ ID NO: 10. In another embodiment, an AAV capsid is provided which has the sequence shown in SEQ ID NO: 12. In another embodiment, an AAV capsid is provided which has the sequence shown in SEQ ID NO: 14. In another embodiment, an AAV capsid is provided which has the sequence shown in SEQ ID NO: 16. In another embodiment, an AAV capsid is provided which has the sequence shown in SEQ ID NO: 18. In another embodiment, an AAV capsid is provided which has the sequence shown in SEQ ID NO: 20. In another embodiment, an AAV capsid is provided which has the sequence shown in SEQ ID NO: 22. In another embodiment, an AAV capsid is provided which has the sequence shown in SEQ ID NO: 24. In another embodiment, an AAV capsid is provided which has the sequence shown in SEQ ID NO: 26. In another embodiment, an AAV capsid is provided which has the sequence shown in SEQ ID NO: 28. In another embodiment, an AAV capsid is provided which has the sequence shown in SEQ ID NO: 30. In another embodiment, an AAV capsid is provided which has the sequence shown in SEQ ID NO: 32. In another embodiment, the AAV capsid has a vp1, vp2 or vp3 protein as shown in any of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 or 32 (which show the vp1 sequences).

[0066] In another aspect, nucleic acid sequences encoding the AAV viruses, capsids and fragments described herein are provided. Thus, in one embodiment, a nucleic acid encoding SEQ ID NO: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 or 32 is provided. In one embodiment, a nucleic acid encoding the AAV3G1 capsid (SEQ ID NO: 18) is provided. In another embodiment, a nucleic acid encoding the AAV8.T20 capsid (SEQ ID NO: 20) is provided. In another embodiment, a nucleic acid encoding the AAV8.TR1 capsid (SEQ ID NO: 22) is provided. In one embodiment, the nucleic acid sequence encoding AAV3G1 is shown in SEQ ID NO: 17. In one embodiment, the nucleic acid sequence encoding AAV8.T20 is shown in SEQ ID NO: 19. In one embodiment, the nucleic acid sequence encoding AAV8.TR1 is shown in SEQ ID NO: 21. In another embodiment, the nucleic acid sequence encoding the capsid is shown in SEQ ID NO: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29 or 31, or a sequence sharing at least 80% identity with any of these sequences. In another embodiment, the nucleic acid molecule also encodes a functional AAV rep protein.
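
For orientation only, the "at least 80% identity" criterion can be pictured with the simplified Python sketch below. It assumes the two sequences have already been aligned to equal length with "-" marking gaps, and it counts identical columns over all compared columns; this is one common convention and is an assumption here, not a definition drawn from this specification. In practice identity is typically determined with an alignment program. The function names and the short example sequences are hypothetical.

```python
# Sketch only: percent identity over a pre-computed pairwise alignment.


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns with identical residues (gap-only columns skipped)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue  # column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no columns to compare")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the aligned pair shares at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical 10-nucleotide fragments differing at one position.
reference = "ATGGCTGCCG"
candidate = "ATGGCTGCAG"
print(percent_identity(reference, candidate))              # 90.0
print(meets_identity_threshold(reference, candidate, 80))  # True
print(meets_identity_threshold(reference, candidate, 95))  # False
```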

B. rAAV Vectors and Compositions

[0067] In another aspect, described herein are molecules which utilize the AAV capsid sequences described herein, including fragments thereof, for production of viral vectors useful in delivery of a heterologous gene or other nucleic acid sequences to a target cell. In one embodiment, the vectors useful in compositions and methods described herein contain, at a minimum, sequences encoding a selected AAV capsid as described herein, e.g., an AAV3G1, AAV8.T20 or AAV8.TR1 capsid, or a fragment thereof. In another embodiment, useful vectors contain, at a minimum, sequences encoding a selected AAV serotype rep protein, e.g., AAV8 rep protein, or a fragment thereof. Optionally, such vectors may contain both AAV cap and rep proteins. In vectors in which both AAV rep and cap are provided, the AAV rep and AAV cap sequences can both be of one serotype origin, e.g., all AAV8 origin. Alternatively, vectors may be used in which the rep sequences are from an AAV which differs from the wild type AAV providing the cap sequences. In one embodiment, the rep and cap sequences are expressed from separate sources (e.g., separate vectors, or a host cell and a vector). In another embodiment, these rep sequences are fused in frame to cap sequences of a different AAV serotype to form a chimeric AAV vector, such as AAV2/8 described in U.S. Pat. No. 7,282,199, which is incorporated by reference herein. Optionally, the vectors further contain a minigene comprising a selected transgene which is flanked by AAV 5' ITR and AAV 3' ITR. In another embodiment, the AAV is a self-complementary AAV (sc-AAV) (See, US 2012/0141422 which is incorporated herein by reference). Self-complementary vectors package an inverted repeat genome that can fold into dsDNA without the requirement for DNA synthesis or base-pairing between multiple vector genomes. 
Because scAAV have no need to convert the single-stranded DNA (ssDNA) genome into double-stranded DNA (dsDNA) prior to expression, they are more efficient vectors. However, the trade-off for this efficiency is the loss of half the coding capacity of the vector. scAAV are useful for small protein-coding genes (up to ~55 kDa) and any currently available RNA-based therapy.
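
The trade-off described above can be illustrated with a rough size estimate. The sketch below is not part of the described methods. It assumes an average residue mass of about 110 Da to convert a protein mass into a coding length, and it takes the single-stranded packaging capacity as a caller-supplied parameter, because no capacity figure is stated herein. The function names and the example numbers (a 4,700 nt capacity and 700 nt of regulatory sequence) are hypothetical.

```python
AVG_RESIDUE_MASS_DA = 110.0  # assumption: rough average mass of one amino acid residue


def coding_length_nt(protein_kda: float) -> int:
    """Approximate coding sequence length (nt) for a protein of the given mass."""
    residues = round(protein_kda * 1000.0 / AVG_RESIDUE_MASS_DA)
    return 3 * residues + 3  # three nucleotides per codon, plus one stop codon


def fits_self_complementary(cassette_nt: int, ss_capacity_nt: int) -> bool:
    """True if the cassette fits in half of the single-stranded capacity.

    A self-complementary genome packages an inverted repeat, so only about
    half of the packaged length is available for unique sequence.
    """
    return cassette_nt <= ss_capacity_nt // 2


# Example with assumed numbers: a ~55 kDa protein plus 700 nt of regulatory
# sequence, checked against an assumed 4,700 nt single-stranded capacity.
cds_nt = coding_length_nt(55)
fits = fits_self_complementary(cds_nt + 700, ss_capacity_nt=4700)
```

The estimate ignores ITRs and other fixed elements, so it serves only as a first-pass screen.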

[0068] In one aspect, the vectors described herein contain nucleic acid sequences encoding an intact AAV capsid as described herein. In one embodiment, the capsid comprises amino acids 1 to 738 of SEQ ID NO: 18, 20 or 22. In another embodiment, the AAV has a recombinant AAV capsid comprising a mutation in at least one of the following regions, as compared to native AAV8 (SEQ ID NO: 34): i. aa 263 to 267 (SEQ ID NO: 78); ii. aa 457 to aa 459; iii. aa 455 to aa 459 (SEQ ID NO: 81); or iv. aa 583 to aa 597 (SEQ ID NO: 69). In one embodiment, the AAV has increased transduction in a target tissue as compared to AAV8. In one embodiment, the AAV has a mutation which comprises 263NGTSG267 (SEQ ID NO: 78)->SGTH (SEQ ID NO: 79) or 263NGTSG267 (SEQ ID NO: 78)->SDTH (SEQ ID NO: 80). In another embodiment, the AAV has a mutation which comprises 457TAN459->SRP or 455GGTAN459 (SEQ ID NO: 81)->DGSGL (SEQ ID NO: 82). In yet another embodiment, the AAV has a mutation which comprises 583ADNLQQQNTAPQIGT597 (SEQ ID NO: 69)->GDNLQLYNTAPGSVF (SEQ ID NO: 70). In another embodiment, the AAV has the following mutations: 263NGTSG267 (SEQ ID NO: 78)->SGTH (SEQ ID NO: 79), 457TAN459->SRP, and 583ADNLQQQNTAPQIGT597 (SEQ ID NO: 69)->GDNLQLYNTAPGSVF (SEQ ID NO: 70).
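
As a non-limiting illustration, the region substitutions listed above can be applied to a sequence string programmatically. The sketch below uses a placeholder string of "X" residues with only the three native motifs in place. It is not the AAV8 vp1 sequence of SEQ ID NO: 34, and the function name is hypothetical. Because 263NGTSG267->SGTH shortens the sequence by one residue, the edits are applied from the C-terminal end toward the N-terminal end so that the native AAV8 numbering remains valid for each edit.

```python
def apply_region_substitutions(sequence, edits):
    """Apply (start, end, expected, replacement) edits; positions are 1-based, inclusive."""
    ordered = sorted(edits, key=lambda e: e[0])
    # Reject overlapping regions, e.g. aa 457-459 combined with aa 455-459.
    for (_, prev_end, _, _), (next_start, _, _, _) in zip(ordered, ordered[1:]):
        if next_start <= prev_end:
            raise ValueError("overlapping regions cannot be combined")
    result = sequence
    # Work from the C-terminal end so that earlier coordinates stay valid even
    # when a replacement changes the length (NGTSG -> SGTH drops one residue).
    for start, end, expected, replacement in reversed(ordered):
        found = result[start - 1:end]
        if found != expected:
            raise ValueError(f"expected {expected} at {start}-{end}, found {found}")
        result = result[:start - 1] + replacement + result[end:]
    return result


# Placeholder 738-residue string: "X" filler with only the three motifs in place.
# It is NOT the AAV8 vp1 sequence of SEQ ID NO: 34.
placeholder = (
    "X" * 262 + "NGTSG" + "X" * 189 + "TAN" + "X" * 123
    + "ADNLQQQNTAPQIGT" + "X" * 141
)

# The combination of three mutations recited at the end of the paragraph above.
combined = [
    (263, 267, "NGTSG", "SGTH"),
    (457, 459, "TAN", "SRP"),
    (583, 597, "ADNLQQQNTAPQIGT", "GDNLQLYNTAPGSVF"),
]
mutant = apply_region_substitutions(placeholder, combined)
```

Checking the expected native residues before each replacement guards against applying the edits to a sequence with different numbering.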

[0069] In another embodiment, the AAV has a capsid protein in which the VP1/VP2 unique regions have been replaced with the VP1/VP2 unique regions from a capsid different than AAV8. In one embodiment, the VP1/VP2 unique regions are from AAVrh.20. In one embodiment, the rh.20 vp1 sequence is SEQ ID NO: 88.

[0070] Pseudotyped vectors, wherein the capsid of one AAV is replaced with a heterologous capsid protein, are useful herein. For illustrative purposes, AAV vectors utilizing the AAV8 mutant capsids described herein, with AAV2 ITRs are used in the examples described below. See, Mussolino et al, cited above. Unless otherwise specified, the AAV ITRs, and other selected AAV components described herein, may be individually selected from among any AAV serotype, including, without limitation, AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9 or other known and unknown AAV serotypes. In one desirable embodiment, the ITRs of AAV serotype 2 are used. However, ITRs from other suitable serotypes may be selected. These ITRs or other AAV components may be readily isolated using techniques available to those of skill in the art from an AAV serotype. Such AAV may be isolated or obtained from academic, commercial, or public sources (e.g., the American Type Culture Collection, Manassas, Va.). Alternatively, the AAV sequences may be obtained through synthetic or other suitable means by reference to published sequences such as are available in the literature or in databases such as, e.g., GenBank, PubMed, or the like. In one embodiment, the AAV comprises the sequence of SEQ ID NO: 17, which corresponds to the full length DNA coding sequence of AAV3G1. In another embodiment, the AAV comprises the sequence of SEQ ID NO: 19, which corresponds to the full length DNA sequence of AAV8.T20. In another embodiment, the AAV comprises the sequence of SEQ ID NO: 21, which corresponds to the full length DNA sequence of AAV8.TR1.

[0071] The rAAV described herein also comprise a minigene. The minigene is composed of, at a minimum, a heterologous nucleic acid sequence (the transgene), as described below, and its regulatory sequences, and 5' and 3' AAV inverted terminal repeats (ITRs). It is this minigene which is packaged into a capsid protein and delivered to a selected target cell.

[0072] The transgene is a nucleic acid sequence, heterologous to the vector sequences flanking the transgene, which encodes a polypeptide, protein, or other product, of interest. The nucleic acid coding sequence is operatively linked to regulatory components in a manner which permits transgene transcription, translation, and/or expression in a target cell. The heterologous nucleic acid sequence (transgene) can be derived from any organism. The AAV may comprise one or more transgenes.

[0073] The composition of the transgene sequence will depend upon the use to which the resulting vector will be put. For example, one type of transgene sequence includes a reporter sequence, which upon expression produces a detectable signal. Such reporter sequences include, without limitation, DNA sequences encoding β-lactamase, β-galactosidase (LacZ), alkaline phosphatase, thymidine kinase, green fluorescent protein (GFP), enhanced GFP (EGFP), chloramphenicol acetyltransferase (CAT), luciferase, membrane bound proteins including, for example, CD2, CD4, CD8, the influenza hemagglutinin protein, and others well known in the art, to which high affinity antibodies directed thereto exist or can be produced by conventional means, and fusion proteins comprising a membrane bound protein appropriately fused to an antigen tag domain from, among others, hemagglutinin or Myc.

[0074] These coding sequences, when associated with regulatory elements which drive their expression, provide signals detectable by conventional means, including enzymatic, radiographic, colorimetric, fluorescence or other spectrographic assays, fluorescence-activated cell sorting assays and immunological assays, including enzyme linked immunosorbent assay (ELISA), radioimmunoassay (RIA) and immunohistochemistry. For example, where the marker sequence is the LacZ gene, the presence of the vector carrying the signal is detected by assays for beta-galactosidase activity. Where the transgene is green fluorescent protein or luciferase, the vector carrying the signal may be measured visually by color or light production in a luminometer.

[0075] However, desirably, the transgene is a non-marker sequence encoding a product which is useful in biology and medicine, such as proteins, peptides, RNA, enzymes, dominant negative mutants, or catalytic RNAs. Desirable RNA molecules include tRNA, dsRNA, ribosomal RNA, catalytic RNAs, siRNA, small hairpin RNA, trans-splicing RNA, and antisense RNAs. One example of a useful RNA sequence is a sequence which inhibits or extinguishes expression of a targeted nucleic acid sequence in the treated animal. Typically, suitable target sequences include oncologic targets and viral diseases. See, for examples of such targets the oncologic targets and viruses identified below in the section relating to immunogens.

[0076] The transgene may be used to correct or ameliorate gene deficiencies, which may include deficiencies in which normal genes are expressed at less than normal levels or deficiencies in which the functional gene product is not expressed. Alternatively, the transgene may provide a product to a cell which is not natively expressed in the cell type or in the host. A preferred type of transgene sequence encodes a therapeutic protein or polypeptide which is expressed in a host cell. The invention further includes using multiple transgenes. In certain situations, a different transgene may be used to encode each subunit of a protein, or to encode different peptides or proteins. This is desirable when the size of the DNA encoding the protein subunit is large, e.g., for an immunoglobulin, the platelet-derived growth factor, or a dystrophin protein. In order for the cell to produce the multi-subunit protein, a cell is infected with the recombinant virus containing each of the different subunits. Alternatively, different subunits of a protein may be encoded by the same transgene. In this case, a single transgene includes the DNA encoding each of the subunits, with the DNA for each subunit separated by an internal ribosome entry site (IRES). This is desirable when the size of the DNA encoding each of the subunits is small, e.g., the total size of the DNA encoding the subunits and the IRES is less than five kilobases. As an alternative to an IRES, the DNA may be separated by sequences encoding a 2A peptide, which self-cleaves in a post-translational event. See, e.g., M. L. Donnelly, et al, J. Gen. Virol., 78(Pt 1):13-21 (January 1997); Furler, S., et al, Gene Ther., 8(11):864-873 (June 2001); Klump H., et al., Gene Ther., 8(10):811-817 (May 2001). This 2A peptide is significantly smaller than an IRES, making it well suited for use when space is a limiting factor. 
More often, when the transgene is large, consists of multi-subunits, or two transgenes are co-delivered, rAAV carrying the desired transgene(s) or subunits are co-administered to allow them to concatamerize in vivo to form a single vector genome. In such an embodiment, a first AAV may carry an expression cassette which expresses a single transgene and a second AAV may carry an expression cassette which expresses a different transgene for co-expression in the host cell. However, the selected transgene may encode any biologically active product or other product, e.g., a product desirable for study.
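
The size consideration discussed in this paragraph can be sketched as a simple check of whether several subunits, joined by a separating element, stay under the five kilobase figure mentioned above. The function names are hypothetical, and the element lengths used in the example (about 600 nt for an IRES and about 60 nt for a 2A-encoding sequence) are illustrative assumptions rather than values stated herein.

```python
def multicistronic_size_nt(subunit_lengths_nt, separator_nt):
    """Total length of the subunit coding DNA plus one separator between each pair."""
    if not subunit_lengths_nt:
        raise ValueError("at least one subunit is required")
    return sum(subunit_lengths_nt) + separator_nt * (len(subunit_lengths_nt) - 1)


def fits_single_transgene(subunit_lengths_nt, separator_nt, limit_nt=5000):
    """True if the subunits plus separators total less than the size limit."""
    return multicistronic_size_nt(subunit_lengths_nt, separator_nt) < limit_nt


ASSUMED_IRES_NT = 600  # illustrative assumption, not a value from this document
ASSUMED_2A_NT = 60     # illustrative assumption, not a value from this document

# Two hypothetical subunits that exceed the limit with an IRES but not with a 2A sequence.
subunits = [2600, 2000]
with_ires = fits_single_transgene(subunits, ASSUMED_IRES_NT)
with_2a = fits_single_transgene(subunits, ASSUMED_2A_NT)
```

Where neither arrangement fits, the co-administration approach described above, with each rAAV carrying a separate cassette, would be the alternative.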

[0077] Useful therapeutic products encoded by the transgene include hormones and growth and differentiation factors including, without limitation, insulin, glucagon, growth hormone (GH), parathyroid hormone (PTH), growth hormone releasing factor (GRF), follicle stimulating hormone (FSH), luteinizing hormone (LH), human chorionic gonadotropin (hCG), vascular endothelial growth factor (VEGF), angiopoietins, angiostatin, granulocyte colony stimulating factor (GCSF), erythropoietin (EPO), connective tissue growth factor (CTGF), basic fibroblast growth factor (bFGF), acidic fibroblast growth factor (aFGF), epidermal growth factor (EGF), transforming growth factor α (TGFα), platelet-derived growth factor (PDGF), insulin growth factors I and II (IGF-I and IGF-II), any one of the transforming growth factor β superfamily, including TGF β, activins, inhibins, or any of the bone morphogenic proteins (BMP) BMPs 1-15, any one of the heregulin/neuregulin/ARIA/neu differentiation factor (NDF) family of growth factors, nerve growth factor (NGF), brain-derived neurotrophic factor (BDNF), neurotrophins NT-3 and NT-4/5, ciliary neurotrophic factor (CNTF), glial cell line derived neurotrophic factor (GDNF), neurturin, agrin, any one of the family of semaphorins/collapsins, netrin-1 and netrin-2, hepatocyte growth factor (HGF), ephrins, noggin, sonic hedgehog and tyrosine hydroxylase.

[0078] Other useful transgene products include proteins that regulate the immune system including, without limitation, cytokines and lymphokines such as thrombopoietin (TPO), interleukins (IL) IL-1 through IL-25 (including IL-2, IL-4, IL-12, and IL-18), monocyte chemoattractant protein, leukemia inhibitory factor, granulocyte-macrophage colony stimulating factor, Fas ligand, tumor necrosis factors α and β, interferons α, β, and γ, stem cell factor, and flk-2/flt3 ligand. Gene products produced by the immune system are also useful in the invention. These include, without limitation, immunoglobulins IgG, IgM, IgA, IgD and IgE, chimeric immunoglobulins, humanized antibodies, single chain antibodies, T cell receptors, chimeric T cell receptors, single chain T cell receptors, class I and class II MHC molecules, as well as engineered immunoglobulins and MHC molecules. Useful gene products also include complement regulatory proteins such as membrane cofactor protein (MCP), decay accelerating factor (DAF), CR1, CF2 and CD59.

[0079] Still other useful gene products include any one of the receptors for the hormones, growth factors, cytokines, lymphokines, regulatory proteins and immune system proteins. The invention encompasses receptors for cholesterol regulation, including the low density lipoprotein (LDL) receptor, high density lipoprotein (HDL) receptor, the very low density lipoprotein (VLDL) receptor, and the scavenger receptor. The invention also encompasses gene products such as members of the steroid hormone receptor superfamily including glucocorticoid receptors and estrogen receptors, Vitamin D receptors and other nuclear receptors. In addition, useful gene products include transcription factors such as jun, fos, max, mad, serum response factor (SRF), AP-1, AP2, myb, MyoD and myogenin, ETS-box containing proteins, TFE3, E2F, ATF1, ATF2, ATF3, ATF4, ZF5, NFAT, CREB, HNF-4, C/EBP, SP1, CCAAT-box binding proteins, interferon regulation factor (IRF-1), Wilms tumor protein, ETS-binding protein, STAT, GATA-box binding proteins, e.g., GATA-3, and the forkhead family of winged helix proteins.

[0080] Other useful gene products include carbamoyl synthetase I, ornithine transcarbamylase, argininosuccinate synthetase, argininosuccinate lyase, arginase, fumarylacetoacetate hydrolase, phenylalanine hydroxylase, alpha-1 antitrypsin, glucose-6-phosphatase, porphobilinogen deaminase, factor VIII, factor IX, cystathionine beta-synthase, branched chain ketoacid decarboxylase, albumin, isovaleryl-CoA dehydrogenase, propionyl CoA carboxylase, methyl malonyl CoA mutase, glutaryl CoA dehydrogenase, insulin, beta-glucosidase, pyruvate carboxylase, hepatic phosphorylase, phosphorylase kinase, glycine decarboxylase, H-protein, T-protein, a cystic fibrosis transmembrane regulator (CFTR) sequence, and a dystrophin cDNA sequence. Still other useful gene products include enzymes such as may be useful in enzyme replacement therapy, which is useful in a variety of conditions resulting from deficient activity of an enzyme. For example, enzymes that contain mannose-6-phosphate may be utilized in therapies for lysosomal storage diseases (e.g., a suitable gene includes one that encodes β-glucuronidase (GUSB)).

[0081] Other useful gene products include non-naturally occurring polypeptides, such as chimeric or hybrid polypeptides having a non-naturally occurring amino acid sequence containing insertions, deletions or amino acid substitutions. For example, single-chain engineered immunoglobulins could be useful in certain immunocompromised patients. Other types of non-naturally occurring gene sequences include antisense molecules and catalytic nucleic acids, such as ribozymes, which could be used to reduce overexpression of a target.

[0082] Reduction and/or modulation of expression of a gene is particularly desirable for treatment of hyperproliferative conditions characterized by hyperproliferating cells, as are cancers and psoriasis. Target polypeptides include those polypeptides which are produced exclusively or at higher levels in hyperproliferative cells as compared to normal cells. Target antigens include polypeptides encoded by oncogenes such as myb, myc, fyn, and the translocation gene bcr/abl, ras, src, P53, neu, trk and EGFR. In addition to oncogene products as target antigens, target polypeptides for anti-cancer treatments and protective regimens include variable regions of antibodies made by B cell lymphomas and variable regions of T cell receptors of T cell lymphomas which, in some embodiments, are also used as target antigens for autoimmune disease. Other tumor-associated polypeptides can be used as target polypeptides such as polypeptides which are found at higher levels in tumor cells including the polypeptide recognized by monoclonal antibody 17-1A and folate binding polypeptides.

[0083] Other suitable therapeutic polypeptides and proteins include those which may be useful for treating individuals suffering from autoimmune diseases and disorders by conferring a broad based protective immune response against targets that are associated with autoimmunity including cell receptors and cells which produce self-directed antibodies. T cell mediated autoimmune diseases include Rheumatoid arthritis (RA), multiple sclerosis (MS), Sjogren's syndrome, sarcoidosis, insulin dependent diabetes mellitus (IDDM), autoimmune thyroiditis, reactive arthritis, ankylosing spondylitis, scleroderma, polymyositis, dermatomyositis, psoriasis, vasculitis, Wegener's granulomatosis, Crohn's disease and ulcerative colitis. Each of these diseases is characterized by T cell receptors (TCRs) that bind to endogenous antigens and initiate the inflammatory cascade associated with autoimmune diseases.

[0084] Alternatively, or in addition, the vectors of the invention may contain AAV sequences of the invention and a transgene encoding a peptide, polypeptide or protein which induces an immune response to a selected immunogen. For example, immunogens may be selected from a variety of viral families. Examples of viral families against which an immune response would be desirable include the picornavirus family, which includes the genera rhinoviruses, which are responsible for about 50% of cases of the common cold; the genera enteroviruses, which include polioviruses, coxsackieviruses, echoviruses, and human enteroviruses such as hepatitis A virus; and the genera aphthoviruses, which are responsible for foot and mouth diseases, primarily in non-human animals. Within the picornavirus family of viruses, target antigens include the VP1, VP2, VP3, VP4, and VPG. Another viral family includes the calicivirus family, which encompasses the Norwalk group of viruses, which are an important causative agent of epidemic gastroenteritis. Still another viral family desirable for use in targeting antigens for inducing immune responses in humans and non-human animals is the togavirus family, which includes the genera alphavirus, which include Sindbis viruses, Ross River virus, and Venezuelan, Eastern & Western Equine encephalitis, and rubivirus, including Rubella virus. The flaviviridae family includes dengue, yellow fever, Japanese encephalitis, St. Louis encephalitis and tick borne encephalitis viruses. 
Other target antigens may be generated from the Hepatitis C or the coronavirus family, which includes a number of non-human viruses such as infectious bronchitis virus (poultry), porcine transmissible gastroenteric virus (pig), porcine hemagglutinating encephalomyelitis virus (pig), feline infectious peritonitis virus (cats), feline enteric coronavirus (cat), canine coronavirus (dog), and human respiratory coronaviruses, which may cause the common cold and/or non-A, B or C hepatitis. Within the coronavirus family, target antigens include the E1 (also called M or matrix protein), E2 (also called S or Spike protein), E3 (also called HE or hemagglutinin-esterase) glycoprotein (not present in all coronaviruses), or N (nucleocapsid). Still other antigens may be targeted against the rhabdovirus family, which includes the genera vesiculovirus (e.g., Vesicular Stomatitis Virus), and the genus lyssavirus (e.g., rabies). Within the rhabdovirus family, suitable antigens may be derived from the G protein or the N protein. The family filoviridae, which includes hemorrhagic fever viruses such as Marburg and Ebola virus, may be a suitable source of antigens. The paramyxovirus family includes parainfluenza Virus Type 1, parainfluenza Virus Type 3, bovine parainfluenza Virus Type 3, rubulavirus (mumps virus, parainfluenza Virus Type 2, parainfluenza virus Type 4, Newcastle disease virus (chickens)), rinderpest, morbillivirus, which includes measles and canine distemper, and pneumovirus, which includes respiratory syncytial virus. The influenza virus is classified within the family orthomyxovirus and is a suitable source of antigen (e.g., the HA protein, the N1 protein). The bunyavirus family includes the genera bunyavirus (California encephalitis, La Crosse), phlebovirus (Rift Valley Fever), hantavirus (Puumala is a hemorrhagic fever virus), nairovirus (Nairobi sheep disease) and various unassigned bunyaviruses. 
The arenavirus family provides a source of antigens against LCM and Lassa fever virus. The reovirus family includes the genera reovirus, rotavirus (which causes acute gastroenteritis in children), orbiviruses, and coltivirus (Colorado Tick fever, Lebombo (humans), equine encephalosis, blue tongue).

[0085] The retrovirus family includes the sub-family oncovirinae, which encompasses such human and veterinary diseases as feline leukemia virus, HTLVI and HTLVII; lentivirinae (which includes human immunodeficiency virus (HIV), simian immunodeficiency virus (SIV), feline immunodeficiency virus (FIV), and equine infectious anemia virus); and spumavirinae. Between the HIV and SIV, many suitable antigens have been described and can readily be selected. Examples of suitable HIV and SIV antigens include, without limitation, the gag, pol, Vif, Vpx, VPR, Env, Tat and Rev proteins, as well as various fragments thereof. In addition, a variety of modifications to these antigens have been described. Suitable antigens for this purpose are known to those of skill in the art. For example, one may select a sequence encoding the gag, pol, Vif, and Vpr, Env, Tat and Rev, amongst other proteins. See, e.g., the modified gag protein which is described in U.S. Pat. No. 5,972,596. See, also, the HIV and SIV proteins described in D. H. Barouch et al, J. Virol., 75(5):2462-2467 (March 2001), and R. R. Amara, et al, Science, 292:69-74 (6 Apr. 2001). These proteins or subunits thereof may be delivered alone, or in combination via separate vectors or from a single vector.

[0086] The papovavirus family includes the sub-family polyomaviruses (BKV and JCV viruses) and the sub-family papillomavirus (associated with cancers or malignant progression of papilloma). The adenovirus family includes viruses (EX, AD7, ARD, O.B.) which cause respiratory disease and/or enteritis. The parvovirus family includes feline parvovirus (feline enteritis), feline panleucopeniavirus, canine parvovirus, and porcine parvovirus. The herpesvirus family includes the sub-family alphaherpesvirinae, which encompasses the genera simplexvirus (HSVI, HSVII), varicellovirus (pseudorabies, varicella zoster) and the sub-family betaherpesvirinae, which includes the genera cytomegalovirus (HCMV, muromegalovirus) and the sub-family gammaherpesvirinae, which includes the genera lymphocryptovirus, EBV (Burkitt's lymphoma), infectious rhinotracheitis, Marek's disease virus, and rhadinovirus. The poxvirus family includes the sub-family chordopoxvirinae, which encompasses the genera orthopoxvirus (Variola (Smallpox) and Vaccinia (Cowpox)), parapoxvirus, avipoxvirus, capripoxvirus, leporipoxvirus, suipoxvirus, and the sub-family entomopoxvirinae. The hepadnavirus family includes the Hepatitis B virus. One unclassified virus which may be a suitable source of antigens is the Hepatitis delta virus. Still other viral sources may include avian infectious bursal disease virus and porcine respiratory and reproductive syndrome virus. The alphavirus family includes equine arteritis virus and various Encephalitis viruses.

[0087] The present invention may also encompass immunogens which are useful to immunize a human or non-human animal against other pathogens including bacteria, fungi, parasitic microorganisms or multicellular parasites which infect human and non-human vertebrates, or from a cancer cell or tumor cell. Examples of bacterial pathogens include pathogenic gram-positive cocci, which include pneumococci; staphylococci; and streptococci. Pathogenic gram-negative cocci include meningococcus and gonococcus. Pathogenic enteric gram-negative bacilli include enterobacteriaceae; pseudomonas, acinetobacteria and eikenella; melioidosis; salmonella; shigella; haemophilus; moraxella; H. ducreyi (which causes chancroid); Brucella; Francisella tularensis (which causes tularemia); Yersinia (pasteurella); and streptobacillus moniliformis and spirillum. Gram-positive bacilli include Listeria monocytogenes; erysipelothrix rhusiopathiae; Corynebacterium diphtheriae (diphtheria); cholera; B. anthracis (anthrax); donovanosis (granuloma inguinale); and bartonellosis. Diseases caused by pathogenic anaerobic bacteria include tetanus; botulism; other clostridia; tuberculosis; leprosy; and other mycobacteria. Pathogenic spirochetal diseases include syphilis; treponematoses: yaws, pinta and endemic syphilis; and leptospirosis. Other infections caused by higher pathogen bacteria and pathogenic fungi include actinomycosis; nocardiosis; cryptococcosis, blastomycosis, histoplasmosis and coccidioidomycosis; candidiasis, aspergillosis, and mucormycosis; sporotrichosis; paracoccidioidomycosis, petriellidiosis, torulopsosis, mycetoma and chromomycosis; and dermatophytosis. Rickettsial infections include Typhus fever, Rocky Mountain spotted fever, Q fever, and Rickettsialpox. Examples of mycoplasma and chlamydial infections include: Mycoplasma pneumoniae; lymphogranuloma venereum; psittacosis; and perinatal chlamydial infections. 
Pathogenic eukaryotes encompass pathogenic protozoans and helminths and infections produced thereby include: amebiasis; malaria; leishmaniasis; trypanosomiasis; toxoplasmosis; Pneumocystis carinii; Trichans; Toxoplasma gondii; babesiosis; giardiasis; trichinosis; filariasis; schistosomiasis; nematodes; trematodes or flukes; and cestode (tapeworm) infections.

[0088] Many of these organisms and/or toxins produced thereby have been identified by the Centers for Disease Control [(CDC), Department of Health and Human Services, USA], as agents which have potential for use in biological attacks. For example, some of these biological agents include Bacillus anthracis (anthrax), Clostridium botulinum and its toxin (botulism), Yersinia pestis (plague), variola major (smallpox), Francisella tularensis (tularemia), and viral hemorrhagic fever, all of which are currently classified as Category A agents; Coxiella burnetii (Q fever), Brucella species (brucellosis), Burkholderia mallei (glanders), Ricinus communis and its toxin (ricin toxin), Clostridium perfringens and its toxin (epsilon toxin), Staphylococcus species and their toxins (enterotoxin B), all of which are currently classified as Category B agents; and Nipah virus and hantaviruses, which are currently classified as Category C agents. In addition, other organisms, which are so classified or differently classified, may be identified and/or used for such a purpose in the future. It will be readily understood that the viral vectors and other constructs described herein are useful to deliver antigens from these organisms, viruses, their toxins or other by-products, which will prevent and/or treat infection or other adverse reactions with these biological agents.

[0089] Administration of the vectors of the invention to deliver immunogens against the variable region of the T cells elicits an immune response including CTLs to eliminate those T cells. In rheumatoid arthritis (RA), several specific variable regions of T cell receptors (TCRs) which are involved in the disease have been characterized. These TCRs include V-3, V-14, V-17 and Vα-17. Thus, delivery of a nucleic acid sequence that encodes at least one of these polypeptides will elicit an immune response that will target T cells involved in RA. In multiple sclerosis (MS), several specific variable regions of TCRs which are involved in the disease have been characterized. These TCRs include V-7 and Vα-10. Thus, delivery of a nucleic acid sequence that encodes at least one of these polypeptides will elicit an immune response that will target T cells involved in MS. In scleroderma, several specific variable regions of TCRs which are involved in the disease have been characterized. These TCRs include V-6, V-8, V-14 and Vα-16, Vα-3C, Vα-7, Vα-14, Vα-15, Vα-16, Vα-28 and Vα-12. Thus, delivery of a nucleic acid molecule that encodes at least one of these polypeptides will elicit an immune response that will target T cells involved in scleroderma.

[0090] In one desirable embodiment, the transgene is selected to provide optogenetic therapy. In optogenetic therapy, artificial photoreceptors are constructed by gene delivery of light-activated channels or pumps to surviving cell types in the remaining retinal circuit. This is particularly useful for patients who have lost a significant amount of photoreceptor function, but whose bipolar cell circuitry to ganglion cells and optic nerve remains intact. In one embodiment, the heterologous nucleic acid sequence (transgene) is an opsin. The opsin sequence can be derived from any suitable single- or multicellular-organism, including human, algae and bacteria. In one embodiment, the opsin is rhodopsin, photopsin, L/M wavelength (red/green)-opsin, or short wavelength (S) opsin (blue). In another embodiment, the opsin is channelrhodopsin or halorhodopsin.

[0091] In another embodiment, the transgene is selected for use in gene augmentation therapy, i.e., to provide a replacement copy of a gene that is missing or defective. In this embodiment, the transgene may be readily selected by one of skill in the art to provide the necessary replacement gene. In one embodiment, the missing/defective gene is related to an ocular disorder. In another embodiment, the transgene is NYX, GRM6, TRPM1L or GPR179 and the ocular disorder is Congenital Stationary Night Blindness. See, e.g., Zeitz et al, Am J Hum Genet. 2013 Jan. 10; 92(1):67-75. Epub 2012 Dec. 13 which is incorporated herein by reference. In another embodiment, the transgene is RPGR.

[0092] In another embodiment, the transgene is selected for use in gene suppression therapy, i.e., expression of one or more native genes is interrupted or suppressed at transcriptional or translational levels. This can be accomplished using short hairpin RNA (shRNA) or other techniques well known in the art. See, e.g., Sun et al, Int J Cancer. 2010 Feb. 1; 126(3):764-74 and O'Reilly M, et al. Am J Hum Genet. 2007 July; 81(1):127-35, which are incorporated herein by reference. In this embodiment, the transgene may be readily selected by one of skill in the art based upon the gene which is desired to be silenced.

[0093] In another embodiment, the transgene comprises more than one transgene. This may be accomplished using a single vector carrying two or more heterologous sequences, or using two or more AAV each carrying one or more heterologous sequences. In one embodiment, the AAV is used for gene suppression (or knockdown) and gene augmentation co-therapy. In knockdown/augmentation co-therapy, the defective copy of the gene of interest is silenced and a non-mutated copy is supplied. In one embodiment, this is accomplished using two or more co-administered vectors. See, Millington-Ward et al, Molecular Therapy, April 2011, 19(4):642-649 which is incorporated herein by reference. The transgenes may be readily selected by one of skill in the art based on the desired result.

[0094] In another embodiment, the transgene is selected for use in gene correction therapy. This may be accomplished using, e.g., a zinc-finger nuclease (ZFN)-induced DNA double-strand break in conjunction with an exogenous DNA donor substrate. See, e.g., Ellis et al, Gene Therapy (epub January 2012) 20:35-42 which is incorporated herein by reference. The transgenes may be readily selected by one of skill in the art based on the desired result.

[0095] In one embodiment, the capsids described herein are useful in the CRISPR-Cas dual vector system described in U.S. Provisional Patent Application Nos. 61/153,470, 62/183,825, 62/254,225 and 62/287,511, each of which is incorporated herein by reference. The capsids are also useful for delivery of homing endonucleases or other meganucleases.

[0096] In another embodiment, the transgenes useful herein include reporter sequences, which upon expression produce a detectable signal. Such reporter sequences include, without limitation, DNA sequences encoding β-lactamase, β-galactosidase (LacZ), alkaline phosphatase, thymidine kinase, green fluorescent protein (GFP), red fluorescent protein (RFP), chloramphenicol acetyltransferase (CAT), luciferase, membrane bound proteins including, for example, CD2, CD4, CD8, the influenza hemagglutinin protein, and others well known in the art, to which high affinity antibodies directed thereto exist or can be produced by conventional means, and fusion proteins comprising a membrane bound protein appropriately fused to an antigen tag domain from, among others, hemagglutinin or Myc.

[0097] These coding sequences, when associated with regulatory elements which drive their expression, provide signals detectable by conventional means, including enzymatic, radiographic, colorimetric, fluorescence or other spectrographic assays, fluorescence-activated cell sorting assays and immunological assays, including enzyme linked immunosorbent assay (ELISA), radioimmunoassay (RIA) and immunohistochemistry. For example, where the marker sequence is the LacZ gene, the presence of the vector carrying the signal is detected by assays for beta-galactosidase activity. Where the transgene is green fluorescent protein or luciferase, the vector carrying the signal may be measured visually by color or light production in a luminometer.

[0098] Desirably, the transgene encodes a product which is useful in biology and medicine, such as proteins, peptides, RNA, enzymes, or catalytic RNAs. Desirable RNA molecules include shRNA, tRNA, dsRNA, ribosomal RNA, catalytic RNAs, and antisense RNAs. One example of a useful RNA sequence is a sequence which extinguishes expression of a targeted nucleic acid sequence in the treated animal.

[0099] The regulatory sequences include conventional control elements which are operably linked to the transgene in a manner which permits its transcription, translation and/or expression in a cell transfected with the vector or infected with the virus produced as described herein. As used herein, "operably linked" sequences include both expression control sequences that are contiguous with the gene of interest and expression control sequences that act in trans or at a distance to control the gene of interest.

[0100] Expression control sequences include appropriate transcription initiation, termination, promoter and enhancer sequences; efficient RNA processing signals such as splicing and polyadenylation (polyA) signals; sequences that stabilize cytoplasmic mRNA; sequences that enhance translation efficiency (i.e., Kozak consensus sequence); sequences that enhance protein stability; and when desired, sequences that enhance secretion of the encoded product. A great number of expression control sequences, including promoters, are known in the art and may be utilized.

[0101] The regulatory sequences useful in the constructs provided herein may also contain an intron, desirably located between the promoter/enhancer sequence and the gene. One desirable intron sequence is derived from SV-40, and is a 100 bp mini-intron splice donor/splice acceptor referred to as SD-SA. Another suitable sequence includes the woodchuck hepatitis virus post-transcriptional element. (See, e.g., L. Wang and I. Verma, 1999 Proc. Natl. Acad. Sci., USA, 96:3906-3910). PolyA signals may be derived from many suitable species, including, without limitation SV-40, human and bovine.

[0102] Another regulatory component of the rAAV useful in the methods described herein is an internal ribosome entry site (IRES). An IRES sequence, or other suitable systems, may be used to produce more than one polypeptide from a single gene transcript. An IRES (or other suitable sequence) is used to produce a protein that contains more than one polypeptide chain or to express two different proteins from or within the same cell. An exemplary IRES is the poliovirus internal ribosome entry sequence, which supports transgene expression in photoreceptors, RPE and ganglion cells. Preferably, the IRES is located 3' to the transgene in the rAAV vector.

[0103] In one embodiment, the AAV comprises a promoter (or a functional fragment of a promoter). The selection of the promoter to be employed in the rAAV may be made from among a wide number of constitutive or inducible promoters that can express the selected transgene in the desired target cell. In one embodiment, the target cell is an ocular cell. The promoter may be derived from any species, including human. Desirably, in one embodiment, the promoter is "cell specific". The term "cell-specific" means that the particular promoter selected for the recombinant vector can direct expression of the selected transgene in a particular cell tissue. In one embodiment, the promoter is specific for expression of the transgene in muscle cells. In another embodiment, the promoter is specific for expression in lung. In another embodiment, the promoter is specific for expression of the transgene in liver cells. In another embodiment, the promoter is specific for expression of the transgene in airway epithelium. In another embodiment, the promoter is specific for expression of the transgene in neurons. In another embodiment, the promoter is specific for expression of the transgene in heart.

[0104] The expression cassette typically contains a promoter sequence as part of the expression control sequences, e.g., located between the selected 5' ITR sequence and the immunoglobulin construct coding sequence. In one embodiment, expression in liver is desirable. Thus, in one embodiment, a liver-specific promoter is used. Tissue specific promoters, constitutive promoters, regulatable promoters [see, e.g., WO 2011/126808 and WO 2013/04943], or a promoter responsive to physiologic cues may be utilized in the vectors described herein. In another embodiment, expression in muscle is desirable. Thus, in one embodiment, a muscle-specific promoter is used. In one embodiment, the promoter is an MCK based promoter, such as the dMCK (509-bp) or tMCK (720-bp) promoters (see, e.g., Wang et al, Gene Ther. 2008 November; 15(22):1489-99. doi: 10.1038/gt.2008.104. Epub 2008 Jun. 19, which is incorporated herein by reference). Another useful promoter is the SPc5-12 promoter (see Rasowo et al, European Scientific Journal June 2014 edition vol. 10, No. 18, which is incorporated herein by reference). In one embodiment, the promoter is a CMV promoter. In another embodiment, the promoter is a TBG promoter. In another embodiment, a CB7 promoter is used. CB7 is a chicken β-actin promoter with cytomegalovirus enhancer elements. Alternatively, other liver-specific promoters may be used [see, e.g., The Liver Specific Gene Promoter Database, Cold Spring Harbor, rulai.schl.edu/LSPD, alpha 1 anti-trypsin (A1AT); human albumin Miyatake et al., J. Virol., 71:5124 32 (1997), humAlb; and hepatitis B virus core promoter, Sandig et al., Gene Ther., 3:1002 9 (1996)]. Still other suitable promoters include the TTR minimal enhancer/promoter, the alpha-antitrypsin promoter, and LSP (845 nt) (requires intron-less scAAV).

[0105] The promoter(s) can be selected from different sources, e.g., human cytomegalovirus (CMV) immediate-early enhancer/promoter, the SV40 early enhancer/promoter, the JC polyomavirus promoter, myelin basic protein (MBP) or glial fibrillary acidic protein (GFAP) promoters, herpes simplex virus (HSV-1) latency associated promoter (LAP), Rous sarcoma virus (RSV) long terminal repeat (LTR) promoter, neuron-specific promoter (NSE), platelet derived growth factor (PDGF) promoter, hSYN, melanin-concentrating hormone (MCH) promoter, CBA, matrix metalloprotein promoter (MPP), and the chicken beta-actin promoter.

[0106] The expression cassette may contain at least one enhancer, e.g., the CMV enhancer. Still other enhancer elements may include, e.g., an apolipoprotein enhancer, a zebrafish enhancer, a GFAP enhancer element, brain specific enhancers such as described in WO 2013/1555222, and the woodchuck hepatitis virus post-transcriptional regulatory element. Additionally, or alternatively, other elements, e.g., the hybrid human cytomegalovirus (HCMV)-immediate early (IE)-PDGR promoter or other promoter-enhancer elements, may be selected. Other enhancer sequences useful herein include the IRBP enhancer (Nicoud 2007, J Gene Med. 2007 December; 9(12):1015-23), immediate early cytomegalovirus enhancer, one derived from an immunoglobulin gene or SV40 enhancer, the cis-acting element identified in the mouse proximal promoter, etc.

[0107] In addition to a promoter, an expression cassette and/or a vector may contain other appropriate transcription initiation, termination, enhancer sequences, efficient RNA processing signals such as splicing and polyadenylation (polyA) signals; sequences that stabilize cytoplasmic mRNA; sequences that enhance translation efficiency (i.e., Kozak consensus sequence); sequences that enhance protein stability; and when desired, sequences that enhance secretion of the encoded product. A variety of suitable polyA are known. In one example, the polyA is rabbit beta globin, such as the 127 bp rabbit beta-globin polyadenylation signal (GenBank #V00882.1). In other embodiments, an SV40 polyA signal is selected. Still other suitable polyA sequences may be selected. In certain embodiments, an intron is included. One suitable intron is a chicken beta-actin intron. In one embodiment, the intron is 875 bp (GenBank #X00182.1). In another embodiment, a chimeric intron available from Promega is used. However, other suitable introns may be selected. In one embodiment, spacers are included such that the vector genome is approximately the same size as the native AAV vector genome (e.g., between 4.1 and 5.2 kb). In one embodiment, spacers are included such that the vector genome is approximately 4.7 kb. See, Wu et al, Effect of Genome Size on AAV Vector Packaging, Mol Ther. 2010 January; 18(1): 80-86, which is incorporated herein by reference.
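The sizing consideration in the preceding paragraph can be illustrated with a short calculation. The following Python sketch is illustrative only: the function names are assumptions, and every element length in the example cassette is a hypothetical placeholder except the 875 bp intron and the 127 bp polyA mentioned above. It estimates how much spacer sequence would bring a vector genome to about 4.7 kb and checks whether a genome falls within the 4.1 to 5.2 kb range.

```python
def spacer_length_bp(element_lengths_bp, target_bp=4700):
    """Spacer bases needed to reach target_bp (0 if the cassette is already that large)."""
    total = sum(element_lengths_bp.values())
    return max(0, target_bp - total)


def within_packaging_range(total_bp, low_bp=4100, high_bp=5200):
    """True if the genome size lies in the 4.1-5.2 kb range cited in the text."""
    return low_bp <= total_bp <= high_bp


# Example cassette; lengths marked "hypothetical" are placeholders, not values from the text.
cassette = {
    "5' ITR": 145,      # hypothetical
    "promoter": 700,    # hypothetical
    "intron": 875,      # chicken beta-actin intron length given in the text
    "transgene": 1400,  # hypothetical
    "polyA": 127,       # rabbit beta-globin polyA length given in the text
    "3' ITR": 145,      # hypothetical
}

cassette_size = sum(cassette.values())
spacer = spacer_length_bp(cassette)
print(cassette_size, spacer, within_packaging_range(cassette_size + spacer))
```

In this hypothetical case the cassette alone falls below the range, and the computed spacer brings it to the 4.7 kb target.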

[0108] Selection of these and other common vector and regulatory elements is conventional and many such sequences are available. See, e.g., Sambrook et al, and references cited therein at, for example, pages 3.18-3.26 and 16.17-16.27 and Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, New York, 1989. Of course, not all vectors and expression control sequences will function equally well to express all of the transgenes as described herein. However, one of skill in the art may make a selection among these, and other, expression control sequences without departing from the scope of this invention.

[0109] In another embodiment, a method of generating a recombinant adeno-associated virus is provided. A suitable recombinant adeno-associated virus (AAV) is generated by culturing a host cell which contains a nucleic acid sequence encoding an AAV capsid protein as described herein, or fragment thereof; a functional rep gene; a minigene composed of, at a minimum, AAV inverted terminal repeats (ITRs) and a heterologous nucleic acid sequence encoding a desirable transgene; and sufficient helper functions to permit packaging of the minigene into the AAV capsid protein. The components required to be cultured in the host cell to package an AAV minigene in an AAV capsid may be provided to the host cell in trans. Alternatively, any one or more of the required components (e.g., minigene, rep sequences, cap sequences, and/or helper functions) may be provided by a stable host cell which has been engineered to contain one or more of the required components using methods known to those of skill in the art.

[0110] Also provided herein are host cells transfected with an AAV as described herein. Most suitably, such a stable host cell will contain the required component(s) under the control of an inducible promoter. However, the required component(s) may be under the control of a constitutive promoter. Examples of suitable inducible and constitutive promoters are provided herein, in the discussion below of regulatory elements suitable for use with the transgene. In still another alternative, a selected stable host cell may contain selected component(s) under the control of a constitutive promoter and other selected component(s) under the control of one or more inducible promoters. For example, a stable host cell may be generated which is derived from 293 cells (which contain E1 helper functions under the control of a constitutive promoter), but which contains the rep and/or cap proteins under the control of inducible promoters. Still other stable host cells may be generated by one of skill in the art. In another embodiment, the host cell comprises a nucleic acid molecule as described herein.

[0111] The minigene, rep sequences, cap sequences, and helper functions required for producing the rAAV described herein may be delivered to the packaging host cell in the form of any genetic element which transfers the sequences carried thereon. The selected genetic element may be delivered by any suitable method, including those described herein. The methods used to construct any embodiment of this invention are known to those with skill in nucleic acid manipulation and include genetic engineering, recombinant engineering, and synthetic techniques. See, e.g., Sambrook et al, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press, Cold Spring Harbor, N.Y. Similarly, methods of generating rAAV virions are well known and the selection of a suitable method is not a limitation on the present invention. See, e.g., K. Fisher et al, 1993 J. Virol., 70:520-532 and U.S. Pat. No. 5,478,745, among others. These publications are incorporated by reference herein.

[0112] Also provided herein are plasmids for use in producing the vectors described herein. Such plasmids are described in the Examples section.

C. PHARMACEUTICAL COMPOSITIONS AND ADMINISTRATION

[0113] In one embodiment, the recombinant AAV containing the desired transgene and cell-specific promoter for use in the target cells as detailed above is optionally assessed for contamination by conventional methods and then formulated into a pharmaceutical composition intended for administration to a subject in need thereof. Such formulation involves the use of a pharmaceutically and/or physiologically acceptable vehicle or carrier, such as buffered saline or other buffers, e.g., HEPES, to maintain pH at appropriate physiological levels, and, optionally, other medicinal agents, pharmaceutical agents, stabilizing agents, buffers, carriers, adjuvants, diluents, etc. For injection, the carrier will typically be a liquid. Exemplary physiologically acceptable carriers include sterile, pyrogen-free water and sterile, pyrogen-free, phosphate buffered saline. A variety of such known carriers are provided in U.S. Pat. No. 7,629,322, incorporated herein by reference. In one embodiment, the carrier is an isotonic sodium chloride solution. In another embodiment, the carrier is balanced salt solution. In one embodiment, the carrier includes tween. If the virus is to be stored long-term, it may be frozen in the presence of glycerol or Tween20. In another embodiment, the pharmaceutically acceptable carrier comprises a surfactant, such as perfluorooctane (Perfluoron liquid). The vector is formulated in a buffer/carrier suitable for infusion in human subjects. The buffer/carrier should include a component that prevents the rAAV from sticking to the infusion tubing but does not interfere with the rAAV binding activity in vivo.

[0114] In certain embodiments of the methods described herein, the pharmaceutical composition described above is administered to the subject intramuscularly. In other embodiments, the pharmaceutical composition is administered intravenously. Other forms of administration that may be useful in the methods described herein include, but are not limited to, direct delivery to a desired organ (e.g., the eye), including subretinal or intravitreal delivery, oral, inhalation, intranasal, intratracheal, intravenous, intramuscular, subcutaneous, intradermal, and other parenteral routes of administration. Routes of administration may be combined, if desired.

[0115] Furthermore, in certain embodiments it is desirable to perform certain examinations prior to vector administration to identify areas requiring cells to be targeted for therapy. In one embodiment, where delivery to the eye is desired, non-invasive retinal imaging and functional studies are performed to identify areas of specific ocular cells to be targeted for therapy. See, e.g., WO 2014/124282, which is incorporated herein by reference. See also, International Patent Application No. PCT/US2013/022628, which is incorporated herein by reference.

[0116] The composition may be delivered in a volume of from about 0.1 µL to about 10 mL, including all numbers within the range, depending on the size of the area to be treated, the viral titer used, the route of administration, and the desired effect of the method. In one embodiment, the volume is about 50 µL. In another embodiment, the volume is about 70 µL. In another embodiment, the volume is about 100 µL. In another embodiment, the volume is about 125 µL. In another embodiment, the volume is about 150 µL. In another embodiment, the volume is about 175 µL. In yet another embodiment, the volume is about 200 µL. In another embodiment, the volume is about 250 µL. In another embodiment, the volume is about 300 µL. In another embodiment, the volume is about 450 µL. In another embodiment, the volume is about 500 µL. In another embodiment, the volume is about 600 µL. In another embodiment, the volume is about 750 µL. In another embodiment, the volume is about 850 µL. In another embodiment, the volume is about 1000 µL. In another embodiment, the volume is about 1.5 mL. In another embodiment, the volume is about 2 mL. In another embodiment, the volume is about 2.5 mL. In another embodiment, the volume is about 3 mL. In another embodiment, the volume is about 3.5 mL. In another embodiment, the volume is about 4 mL. In another embodiment, the volume is about 5 mL. In another embodiment, the volume is about 5.5 mL. In another embodiment, the volume is about 6 mL. In another embodiment, the volume is about 6.5 mL. In another embodiment, the volume is about 7 mL. In another embodiment, the volume is about 8 mL. In another embodiment, the volume is about 8.5 mL. In another embodiment, the volume is about 9 mL. In another embodiment, the volume is about 9.5 mL. In another embodiment, the volume is about 10 mL.

[0117] An effective concentration of a recombinant adeno-associated virus carrying a nucleic acid sequence encoding the desired transgene under the control of the regulatory sequences desirably ranges from about 10⁷ to about 10¹⁴ vector genomes per milliliter (vg/mL) (also called genome copies/mL (GC/mL)). In one embodiment, the rAAV vector genomes are measured by real-time PCR. In another embodiment, the rAAV vector genomes are measured by digital PCR. See, Lock et al, Absolute determination of single-stranded and self-complementary adeno-associated viral vector genome titers by droplet digital PCR, Hum Gene Ther Methods. 2014 April; 25(2):115-25. doi: 10.1089/hgtb.2013.131. Epub 2014 Feb. 14, which is incorporated herein by reference. In another embodiment, the rAAV infectious units are measured as described in S.K. McLaughlin et al, 1988 J. Virol., 62:1963, which is incorporated herein by reference.

[0118] Preferably, the concentration is from about 1.5×10⁹ vg/mL to about 1.5×10¹³ vg/mL, and more preferably from about 1.5×10⁹ vg/mL to about 1.5×10¹¹ vg/mL. In one embodiment, the effective concentration is about 1.4×10⁸ vg/mL. In one embodiment, the effective concentration is about 3.5×10¹⁰ vg/mL. In another embodiment, the effective concentration is about 5.6×10¹¹ vg/mL. In another embodiment, the effective concentration is about 5.3×10¹² vg/mL. In yet another embodiment, the effective concentration is about 1.5×10¹² vg/mL. In another embodiment, the effective concentration is about 1.5×10¹³ vg/mL. All ranges described herein are inclusive of the endpoints.

[0119] In one embodiment, the dosage is from about 1.5×10⁹ vg/kg of body weight to about 1.5×10¹³ vg/kg, and more preferably from about 1.5×10⁹ vg/kg to about 1.5×10¹¹ vg/kg. In one embodiment, the dosage is about 1.4×10⁸ vg/kg. In one embodiment, the dosage is about 3.5×10¹⁰ vg/kg. In another embodiment, the dosage is about 5.6×10¹¹ vg/kg. In another embodiment, the dosage is about 5.3×10¹² vg/kg. In yet another embodiment, the dosage is about 1.5×10¹² vg/kg. In another embodiment, the dosage is about 1.5×10¹³ vg/kg. In another embodiment, the dosage is about 3.0×10¹³ vg/kg. In another embodiment, the dosage is about 1.0×10¹⁴ vg/kg. All ranges described herein are inclusive of the endpoints.

[0120] In one embodiment, the effective dosage (total genome copies delivered) is from about 10⁷ to 10¹³ vector genomes. In one embodiment, the total dosage is about 10⁸ genome copies. In one embodiment, the total dosage is about 10⁹ genome copies. In one embodiment, the total dosage is about 10¹⁰ genome copies. In one embodiment, the total dosage is about 10¹¹ genome copies. In one embodiment, the total dosage is about 10¹² genome copies. In one embodiment, the total dosage is about 10¹³ genome copies. In one embodiment, the total dosage is about 10¹⁴ genome copies. In one embodiment, the total dosage is about 10¹⁵ genome copies.
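The concentration, volume, weight-based dosage and total genome copies described in the preceding paragraphs are related by simple arithmetic. The following Python sketch is illustrative only: the function names are assumptions, and the 70 kg body weight is a hypothetical example value rather than a figure from the text. It shows the conversions for values drawn from the ranges above.

```python
def total_vector_genomes(conc_vg_per_ml, volume_ul):
    """Total vector genomes delivered = concentration (vg/mL) x volume (mL)."""
    return conc_vg_per_ml * volume_ul / 1000.0


def weight_based_total(dose_vg_per_kg, body_weight_kg):
    """Total vector genomes for a weight-based dosage (vg/kg)."""
    return dose_vg_per_kg * body_weight_kg


def required_volume_ml(total_vg, conc_vg_per_ml):
    """Volume (mL) needed to deliver total_vg at a given concentration."""
    return total_vg / conc_vg_per_ml


# 100 µL of a 1.5e12 vg/mL preparation (both values appear in the text)
local_total = total_vector_genomes(1.5e12, 100)

# 1.5e13 vg/kg (from the text) for a hypothetical 70 kg subject
systemic_total = weight_based_total(1.5e13, 70)

# Volume of a 1.5e13 vg/mL preparation needed to deliver 1e13 total genomes
volume_needed = required_volume_ml(1e13, 1.5e13)

print(f"{local_total:.2e} vg, {systemic_total:.2e} vg, {volume_needed:.2f} mL")
```

Keeping the units explicit in the function and parameter names makes it harder to confuse a per-milliliter concentration with a per-kilogram dosage.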

[0121] It is desirable that the lowest effective concentration of virus be utilized in order to reduce the risk of undesirable effects, such as toxicity. Still other dosages and administration volumes in these ranges may be selected by the attending physician, taking into account the physical state of the subject, preferably human, being treated, the age of the subject, the particular disorder and the degree to which the disorder, if progressive, has developed. Intravenous delivery, for example, may require doses on the order of 1.5×10¹³ vg/kg.

D. METHODS

[0122] As discussed herein, the vectors comprising the AAV8 mutant capsids are capable of transducing target tissues at high levels. Thus, provided herein is a method of delivering a transgene to a liver cell. The method includes contacting the cell with an rAAV having the AAV3G1 capsid, wherein said rAAV comprises the transgene. In another embodiment, the method includes contacting the cell with an rAAV having the AAV8.TR1 capsid, wherein said rAAV comprises the transgene. In another embodiment, the method includes contacting the cell with an rAAV having any capsid described herein, wherein the rAAV comprises the transgene. In another aspect, the use of an rAAV having the AAV3G1 capsid is provided for delivering a transgene to liver. In another aspect, the use of an rAAV having the AAV8.TR1 capsid is provided for delivering a transgene to liver.

[0123] Also provided herein is a method of delivering a transgene to a muscle cell. The method includes contacting the cell with an rAAV having the AAV3G1 capsid, wherein said rAAV comprises the transgene. In another embodiment, the method includes contacting the cell with an rAAV having any capsid described herein, wherein the rAAV comprises the transgene. In another aspect, the use of an rAAV having the AAV3G1 capsid is provided for delivering a transgene to muscle.

[0124] Further, a method of delivering a transgene to the airway epithelium is provided. The method includes contacting the cell with an rAAV having the AAV3G1 capsid, wherein said rAAV comprises the transgene. In another embodiment, the method includes contacting the cell with an rAAV having the AAV8.T20 capsid, wherein said rAAV comprises the transgene. In another embodiment, the method includes contacting the cell with an rAAV having any capsid described herein, wherein the rAAV comprises the transgene. In another aspect, the use of an rAAV having the AAV3G1 capsid is provided for delivering a transgene to airway epithelium. In another aspect, the use of an rAAV having the AAV8.T20 capsid is provided for delivering a transgene to airway epithelium.

[0125] Further, a method of delivering a transgene to ocular cells is provided. The method includes contacting the cell with an rAAV having the AAV3G1 capsid, wherein said rAAV comprises the transgene. In another embodiment, the method includes contacting the cell with an rAAV having any capsid described herein, wherein the rAAV comprises the transgene. In another aspect, the use of an rAAV having the AAV3G1 capsid is provided for delivering a transgene to ocular cells.

[0126] As described in the examples below, in vitro, the AAV3G1 mutant showed resistance to various antisera of monkey and human, as well as human IVIG (at levels 2 to 4 fold that of AAV8, with respect to human IVIG). All three mutations contributed to the observed resistance. In mice, the liver transduction efficiency of AAV3G1 was reduced compared with AAV8, however its muscle transduction was higher than that of AAV8 by approximately 10 fold. In addition, AAV3G1 demonstrated a higher heparin affinity than AAV8. Interestingly, reducing the positive charges of the HVR.IV mutation decreased the vector's heparin affinity while liver transduction was partially restored. Similar to the trend observed in muscle, intranasal administration of AAV3G1 resulted in a transduction efficiency 2 to 3 fold greater than that of AAV8, which was further improved to levels approximately 10 fold greater than AAV8 by swapping the VP1 unique region of AAV3G1 with that of another AAV serotype. These findings are relevant to disease models where high-efficiency intramuscular, ocular or intranasal gene delivery and resistance to pre-existing neutralizing antibodies are desired.

[0127] As shown herein, the capsid described herein (e.g., the AAV3G1, AAV8.T20 or AAV8.TR1 capsid) is, in one embodiment, able to evade neutralization by pre-existing neutralizing antibodies (NAbs) to AAV8. In one embodiment, the rAAV having the capsid described shows at least about a 2 fold increase in resistance to neutralization by an AAV8 neutralizing antibody as compared to native AAV8. In one embodiment, the rAAV having the capsid described shows at least about a 3 fold increase in resistance to neutralization by an AAV8 neutralizing antibody as compared to native AAV8. In one embodiment, the rAAV having the capsid described shows at least about a 4 fold increase in resistance to neutralization by an AAV8 neutralizing antibody as compared to native AAV8. In one embodiment, the rAAV having the capsid described shows at least about a 5 fold increase in resistance to neutralization by an AAV8 neutralizing antibody as compared to native AAV8. In one embodiment, the rAAV having the capsid described shows at least about a 10 fold increase in resistance to neutralization by an AAV8 neutralizing antibody as compared to native AAV8. In one embodiment, the rAAV having the capsid described shows at least about a 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 200, 220, 240, 260 or greater fold increase in resistance to neutralization by an AAV8 neutralizing antibody as compared to native AAV8. Methods of assessing antibody neutralization are known in the art and described herein. See, e.g., Lochrie et al, J Virol., January 2006, 80(2):821-34, which is incorporated herein by reference. In one embodiment, the AAV8 neutralizing antibody is ADK8. See, Gurda et al, J. Virol, 2012 August; 86(15):7739-51. doi: 10.1128/JVI.00218-12. Epub 2012 May 16, which is incorporated herein by reference. In another embodiment, the AAV8 neutralizing antibody is ADK8/9.

[0128] This reduction in neutralization by an AAV8 antibody provides the advantage of escaping pre-existing AAV8 antibodies which may be present in the subject. This is useful in instances where an AAV8 vector was used in treating the subject for a certain condition, and a booster dosage, or a second treatment requiring use of an AAV vector, is required.

[0129] Saturation mutagenesis was performed on the AAV8 hyper-variable region (HVR) VIII guided by antibody-capsid structure information. It was demonstrated that the capsid mutants were capable of escaping AAV8 neutralizing antibodies and maintained liver transduction. Saturation mutagenesis was then performed on the HVR.I and HVR.IV regions, beginning with one of the capsid mutants described above, AAV8.C41, as the backbone, followed by three rounds of in vivo enrichment in mouse liver, resulting in an AAV8 mutant termed AAV3G1 (also called AAV8.Triple or Triple). AAV3G1 showed resistance to various antisera of monkey and human, as well as human IVIG (at levels 2 to 4 fold that of AAV8, with respect to human IVIG). All three mutations contributed to the observed resistance. Unexpectedly, AAV3G1 demonstrated decreased liver transduction efficiency as compared to native AAV8 (~1/6 × AAV8) while its muscle transduction was increased (~10 × AAV8). AAV3G1 demonstrated a higher heparin affinity than AAV8. Reducing the positive charges of the HVR.IV and HVR.I mutations decreased the vector's heparin affinity, accompanied by partially restored liver transduction (the resulting mutant is called AAV8.TR1). Intranasal administration of AAV3G1 resulted in a transduction efficiency 2 to 3 fold greater than that of AAV8. A new mutant, AAV8.T20, was created by swapping the VP1/2-unique region of one of the high transduction members, rh.20, into AAV3G1; i.e., amino acids 1-202 of AAVrh.20 (SEQ ID NO: 88) were swapped in for amino acids 1 to 203 of the AAV3G1 capsid. AAV8.T20's transduction was approximately 10 fold greater than AAV8 in mice by intranasal administration.
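The region swap used to build AAV8.T20 (residues 1-202 of the donor replacing residues 1-203 of the recipient) can be expressed as a coordinate operation on the protein sequences. The following Python sketch is illustrative only: the function name is an assumption, and the two sequences are arbitrary placeholder strings of arbitrary length, not the actual AAVrh.20 or AAV3G1 capsid sequences, which are given in the sequence listing.

```python
def replace_region(recipient, start, end, replacement):
    """Replace residues start..end (1-based, inclusive) of recipient with replacement."""
    if not (1 <= start <= end <= len(recipient)):
        raise ValueError("region is outside the recipient sequence")
    return recipient[:start - 1] + replacement + recipient[end:]


# Placeholder sequences (hypothetical); 'D' marks donor residues, 'R' recipient residues.
donor = "D" * 730
recipient = "R" * 738

# Donor residues 1-202 replace recipient residues 1-203.
chimera = replace_region(recipient, 1, 203, donor[:202])

# Because 202 residues replace 203, the chimera is one residue shorter than the recipient.
print(len(recipient), len(chimera))
```

The same function could express a same-length substitution such as a 15-residue HVR.VIII replacement at positions 583-597, in which case the sequence length is unchanged.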

E. EXAMPLES

Example 1: Study Design

[0130] Several AAV8 mutants (c41, c42, c46, g110, g113, g115 and g117) were generated with mutations in the HVR.VIII region. As discussed in Gurda et al, cited above, the major ADK8 epitope lies in the HVR.VIII region (amino acids 586 to 591 using AAV8 vp1 numbering). Those mutants were tested in vitro for ADK8 resistance and some of them were tested in vivo for ADK8 resistance. See, e.g., Lochrie 2006 cited above.

TABLE-US-00002
Name    Amino acid sequence (583-597)    SEQ ID NO
AAV8    ADNLQQQNTAPQIGT                  69
C41     GDNLQLYNTAPGSVF                  70
C42     SDNLQFRNTAPLWSS                  71
C46     NDNLQVCNTAPDDVM                  72
G110    CDNLQGYNTAPLCVA                  73
G112    VDNLQFLNTAPAGEA                  74
G113    LDNLQDGNTAPGACG                  75
G115    WDNLQSENTAPSETS                  76
G117    SDNLQSCNTAPFAGA                  77

[0131] The mutant c41 was picked as the backbone for further mutagenesis at HVR.I and HVR.IV region. Mutant c41 has the sequence shown in SEQ ID NO: 2 (DNA sequence shown in SEQ ID NO: 1). The c41 amino acid sequence is that of AAV8, with the following mutation in the HVR.VIII region: 583ADNLQQQNTAPQIGT597 (SEQ ID NO: 69)->GDNLQLYNTAPGSVF (SEQ ID NO: 70).
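The substitutions that distinguish a mutant from AAV8 in this segment can be listed by comparing the two 15-residue sequences position by position. The following is a minimal sketch, not part of the described method; the function name `substitutions` and the output notation are illustrative assumptions, while the sequences are those given for SEQ ID NO: 69 to 71.

```python
# HVR.VIII segment, VP1 residues 583-597, as listed for SEQ ID NO: 69-71.
START = 583
AAV8 = "ADNLQQQNTAPQIGT"          # SEQ ID NO: 69
MUTANTS = {
    "C41": "GDNLQLYNTAPGSVF",     # SEQ ID NO: 70
    "C42": "SDNLQFRNTAPLWSS",     # SEQ ID NO: 71
}


def substitutions(parent, mutant, start=START):
    """Return a list of (position, parent_residue, mutant_residue) tuples."""
    if len(parent) != len(mutant):
        raise ValueError("sequences must have the same length")
    return [
        (start + i, p, m)
        for i, (p, m) in enumerate(zip(parent, mutant))
        if p != m
    ]


for name, seq in MUTANTS.items():
    subs = substitutions(AAV8, seq)
    # Print in the common "A583G" style (illustrative formatting choice).
    print(name, " ".join(f"{p}{pos}{m}" for pos, p, m in subs))
```

Under this comparison, both mutants differ from AAV8 at the same seven positions, and the conserved DNLQ and NTAP stretches are left untouched.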

[0132] For HVR.I or HVR.IV mutagenesis, three rounds of in vivo selection were done. HVR.I mutation SGTH and HVR.IV mutation GGSRP were then incorporated into clone c41 backbone to generate AAV3G1. In vitro Nab tests show that AAV3G1 showed some degree of hIVIG resistance; all the three mutations (c41, SGTH and GGSRP) contribute to the resistance. AAV3G1 shows a higher muscle transduction than AAV8, both i.m. and i.v., with various transgene cassettes such as CB7.CI.ffluciferase, CMV.LacZ and tMCK.human F9.

[0133] AAV3G1 also shows higher transduction in murine airway epithelial cells than AAV8. When the VP1/2 region is replaced with that of rh.20, the resulting mutant, AAV8.T20, shows transduction ~10 times that of AAV8. In nasal administration to B6 mice, normalized to AAV8 (100%, CB7.CI.luciferase), AAV3G1 transduced at 375% while AAV8.T20 transduced at 988%.

[0134] AAV3G1 has a higher heparin affinity than AAV8. A new mutant was designed to introduce negatively charged residues in HVR.I and HVR.IV (HVR.I: SGTH→SDTH; HVR.IV: GGSRP is replaced by another mutation, DGSGL (SEQ ID NO: 82), which showed up during the selection process). The resulting mutant, AAV8.TR1, shows decreased heparin affinity and its liver transduction was partially restored. As compared to AAV8 (100%, TBG.human F9), AAV3G1 transduces liver at 18%, while AAV8.TR1 transduces at 52%.
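The charge change behind this design can be illustrated with a rough count of charged residues in the HVR.I and HVR.IV inserts. This is a sketch under a deliberately simplified model that is an assumption here, not taken from the application: K and R count as +1, D and E as -1, and every other residue (including H) as 0 at neutral pH.

```python
# Simplified charge model (illustrative assumption): K/R = +1, D/E = -1,
# all other residues, including histidine, are treated as neutral.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(peptide):
    """Sum the per-residue charges of a peptide under the simplified model."""
    return sum(CHARGE.get(residue, 0) for residue in peptide)


# HVR.I and HVR.IV sequences as given in the text.
VARIANTS = {
    "AAV3G1": {"HVR.I": "SGTH", "HVR.IV": "GGSRP"},
    "AAV8.TR1": {"HVR.I": "SDTH", "HVR.IV": "DGSGL"},
}

for name, regions in VARIANTS.items():
    per_region = {region: net_charge(seq) for region, seq in regions.items()}
    print(name, per_region, "total:", sum(per_region.values()))
```

In this simplified count the two inserts of AAV3G1 carry a net +1 and those of AAV8.TR1 a net -2, which is in line with the direction of the reported change in heparin affinity; the model ignores structural context and actual pKa values.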

Example 2: Materials and Methods

A. Plasmids for Library Construction.

[0135] 1. pAAV.DE.0

[0136] The plasmid pAAV.DE.0 was constructed by placing the following components between the two AAV ITRs: a ZsGreen expression cassette, followed by a CMV promoter, followed by fragment 1883-2207 of the AAV2 genome (NC_001401), followed by restriction sites AarI and SpeI (for inserting the AAV VP1 ORF). pAAV.DE.0 is shown in SEQ ID NO: 39 and FIG. 9.

[0137] 2. pAAV.DE.1

[0138] The plasmid pAAV.DE.1 was based on pAAV.DE.0 with modifications: 1) the NheI fragment was removed; 2) a rabbit beta-globin polyadenylation signal sequence was inserted between the 3' ITR and the SpeI restriction recognition site. pAAV.DE.1 is shown in SEQ ID NO: 40 and FIG. 10.

[0139] 3. pAAV.DE.1.HVR.I

[0140] The plasmid was based on pAAV.DE.1 with the following modifications: 1) the VP1 ORF of AAV8.c41 was inserted between AarI and SpeI; 2) the two BsmBI restriction recognition sites were removed by silent mutagenesis; 3) a small DNA fragment carrying two BsmBI sites at its ends was inserted at the HVR.I region of the AAV8.c41 VP1 ORF to create a cloning site for HVR.I mutagenesis. pAAV.DE.1.HVR.I is shown in SEQ ID NO: 41 and FIG. 11.

[0141] 4. pAAV.DE.1.HVR.IV

[0142] The plasmid was based on pAAV.DE.1 with the following modifications: 1) the VP1 ORF of AAV8.c41 was inserted between AarI and SpeI; 2) the two BsmBI restriction recognition sites were removed by silent mutagenesis; 3) a small DNA fragment carrying two BsmBI sites at its ends was inserted at the HVR.IV region of the AAV8.c41 VP1 ORF to create a cloning site for HVR.IV mutagenesis. pAAV.DE.1.HVR.IV is shown in SEQ ID NO: 42 and FIG. 12.

[0143] 5. pRep

[0144] The plasmid was based on the pAAV2/8 plasmid (SEQ ID NO: 43). The plasmid pAAV2/8 was digested with AfeI, then partially digested with BbsI, end-polished and then self-ligated.

B. Library Construction, Selection and the Generation of AAV3G1, AAV8.T20 and AAV8.TR1.

[0145] 1. HVR.VIII Library

[0146] Three PCRs were set up: PCR1: primer031 (SEQ ID NO: 49), primer032 (SEQ ID NO: 50) and primer009 (SEQ ID NO: 45); PCR2: primer016 (SEQ ID NO: 46) and primer030 (SEQ ID NO: 48), with the plasmid pAAV2/8 as template; PCR3: primer033 (SEQ ID NO: 51) and primer017 (SEQ ID NO: 47), with the plasmid pAAV2/8 as template. Primers are shown in Table 2 below. The three PCR products were purified with the QIAquick PCR Purification Kit (Qiagen), combined, digested with BsmBI (New England Biolabs) and purified again, followed by ligation at 16°C with T4 DNA ligase (Roche). A 428-bp fragment was gel-extracted and ligated with the 6908-bp BsmBI fragment of pAAV2/8. The ligation product served as PCR template with primer.AAV8start and primer AAV8 END nd5R. The PCR product was purified, cloned into pAAV.DE.0 through AarI and SpeI and transformed into Stbl4 (Invitrogen). Plasmid extracted from the overnight culture of the transformation served as the plasmid library of AAV8 HVR.VIII mutagenesis.

[0147] The plasmid library was mixed with helper plasmid (pAdAF6) and pRep, and then transfected into 293 cells by the calcium-phosphate method. Three days after transfection, cell lysate was harvested, re-suspended in DPBS and treated with Benzonase (Merck). The lysate was then spun down to remove debris. The supernatant was the AAV mutagenesis library and was stored at -20°C for further use. Titration was done with real-time PCR.

[0148] 1×10^9 genome copies (gc) of the AAV mutagenesis library were mixed with 0.5 μL of ADK8 (AAV8 Nab titer 1:2560) and brought up to 1 mL with complete medium. The mixture was incubated at 37°C for 30 min, and then applied to the 293 cells (MOI, 1×10^4). Two days later, the cells were split at a ratio of 1:5. Two days later, the cells were transfected with the plasmids pAdAF6 and pRep. Two days later, RNA and genomic DNA were extracted from the cells as templates for RT-PCR or PCR. The PCR primers were primer016 (SEQ ID NO: 46) and primer017 (SEQ ID NO: 47). The PCR product was cloned into Topo vector (Invitrogen) and sequenced. AAV fragments were cut out from the Topo plasmids and cloned into pAAV2/8 at the BsmBI sites to make trans plasmids. Individual trans plasmids were packaged into regular AAV vectors with pAAV.CMV.eGFP as the cis-plasmid for further analysis.

TABLE-US-00003 TABLE 2 Primer list
Name              Sequence                                                                                      SEQ ID NO
primer009         ctacagaggaatacggtatcgtgnnkgataacttgcagnnknnkaacacggctcctnnknnknnknnkgtcaacagccagggggccttac    45
primer016         Tggaccggctgatgaatcct                                                                          46
primer017         Cggtgctgtattgcgtgatg                                                                          47
primer030         ggctcacgtctctgtagccacagggttagtggtt                                                            48
primer031         cggacacgtctcgctacagaggaatacggtatcgtg                                                          49
primer032         ggctcacgtctcggtaaggccccctggctg                                                                50
primer033         cggacacgtctccttacccggtatggtctggcagaa                                                          51
primer035         Cacgcagaatgaaggcacca                                                                          52
primer042         Cacgataccgtattcctctgtagccac                                                                   53
primer084         gctggtttagtgaaccgtcagatcctgcat                                                                54
primer098         Aaggtgcgcgtggaccagaa                                                                          55
primer113         Acaggtactggtcaatcagagg                                                                        56
primer155         caaccacctctacaagcaaatctccnnknnknnknnknnkggagccaccaacgacaacacctact                             57
primer156         agtaggtgttgtcgttggtggaccmnnmnnmnnmnnmnnggagatttgcttgtagaggtggttg                              58
primer157         ctacttgtctcggactcaaacaacannknnknnknnknnkacgcagactctgggcttcagccaa                              59
primer158         ttggctgaagcccagagtctgcgtmnnmnnmnnmnnmnntgttgtttgagtccgagacaagtag                              60
primer159         gatttttggcaaacaaaatgctgccnnknnknnknnknnktacagcgatgtcatgctcaccagcg                             61
primer160         cgctggtgagcatgacatcgctgtamnnmnnmnnmnnmnnggcagcattttgtttgccaaaaatc                             62
primer175         cggtcacgtctcggtcatcaccaccagcacccgaac                                                          63
primer200         gccagtcgtctccgttgtcgttggtggctcc                                                               64
primer201         cggtcacgtctcg cctctgattgaccagtacctgtactacttgtctcggactcaa                                      65
primer202         gccagtcgtaccgccattgtattaggcccaccttggctgaagcccagagtc                                           66
primer.AAV8start  ttaccccacaggaagcacgccacctgcaaatcaggtatggctgccgatggttatcttc                                    67
primer.AAV8end    ctcgttactgccgtgtgggactagttacagattacgggtgaggtaacgggtgcca                                       68
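Several primers in Table 2 come as forward/reverse pairs in which an NNK codon on one strand reads as MNN on the other (K = G or T, M = A or C, N = any base, per the IUPAC ambiguity codes). A minimal sketch of that relationship, using the primer157/primer158 and primer159/primer160 sequences as printed in the table; the helper name `reverse_complement` is an illustrative assumption.

```python
# Complements for the IUPAC symbols that occur in the primer list:
# a/c/g/t, n (any base), k (g or t) and m (a or c).
COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a", "n": "n", "k": "m", "m": "k"}


def reverse_complement(seq):
    """Reverse-complement a lowercase or uppercase IUPAC DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.lower()))


# Sequences copied from Table 2, written as flank + degenerate core + flank.
primer157 = "ctacttgtctcggactcaaacaaca" + "nnk" * 5 + "acgcagactctgggcttcagccaa"
primer158 = "ttggctgaagcccagagtctgcgt" + "mnn" * 5 + "tgttgtttgagtccgagacaagtag"
primer159 = "gatttttggcaaacaaaatgctgcc" + "nnk" * 5 + "tacagcgatgtcatgctcaccagcg"
primer160 = "cgctggtgagcatgacatcgctgta" + "mnn" * 5 + "ggcagcattttgtttgccaaaaatc"

print(reverse_complement(primer157) == primer158)
print(reverse_complement(primer159) == primer160)
```

The same helper can be used to sanity-check other primer pairs after transcription from the sequence listing.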

[0149] 2. In Vitro Nab Assay

[0150] 1×10^9 gc of each AAV mutant carrying an eGFP cassette was mixed with different monoclonal antibodies (ADK8, [Nab]AAV8=1:2560, 0.5 μL/well; ADK8/9, [Nab]AAV8=1:2560, 0.5 μL/well; ADK9, [Nab]AAV8=1:5, 0.5 μL/well; No Ab: medium), brought up to 100 μL with media, incubated at 37°C for 30 minutes and then applied to 293 cells (5×10^4 cells/well seeded one day before infection in a 96-well plate). GFP expression was monitored and quantified with ImageJ. See FIG. 2a.

[0151] 3. HVR.I and HVR.IV Libraries

[0152] Three rounds of selection were performed in vivo. For each round, the AAV libraries were injected into B6 mice, i.v., in the presence of pooled human IVIG (hIVIG).

[0153] For round 1, HVR.I:

[0154] Two fragments were made through PCR with pAAV2/8.c41 as the template and primer098 (SEQ ID NO: 55)+primer156 (SEQ ID NO: 58), primer155 (SEQ ID NO: 57)+primer as the primer sets, respectively. The two fragments were assembled together by PCR with primer098 (SEQ ID NO: 55)+primer.AAV8end (SEQ ID NO: 68). The resulting fragments were then cloned into pAAV.DE.1 through HindIII and SpeI sites as the plasmid libraries for the production of AAV libraries. The library production was similar to HVR.VIII library except that it was purified with iodixanol gradient, the same way as regular AAV vector.

For round 1, HVR.IV:

[0155] The process was very similar to HVR.I except that the primer sets were primer098 (SEQ ID NO: 55)+primer158 (SEQ ID NO: 60), primer157 (SEQ ID NO: 59)+primer.AAV8end (SEQ ID NO: 68).

[0156] The libraries were then injected into mice in the presence of human IVIG, i.v. Two weeks later, liver was harvested. Genomic DNA and RNA were extracted. AAV DNA fragments were retrieved through PCR and cloned into plasmids for new library production.

[0157] Round 2 and round 3 were similar to round 1, except that:

[0158] For HVR.I, primer175 (SEQ ID NO: 63) and primer200 (SEQ ID NO: 64) were used and the cloning vector was pAAV.DE.1.HVR.I; for HVR.IV, primer201 (SEQ ID NO: 65) and primer202 (SEQ ID NO: 66) were used and the cloning vector was pAAV.DE.1.HVR.IV.

[0159] After round 3, genomic DNA was extracted from mouse liver, amplified through PCR and cloned into the trans plasmid backbone for further analysis.

[0160] 4. The Generation of AAV3G1, AAV8.T20 and AAV8.TR1

[0161] The trans plasmid pAAV2/8.Triple was based on pAAV2/8.c41 (SEQ ID NO: 44), in which the HVR.I region was replaced by DNA coding SGTH and the HVR.IV region was replaced by DNA coding GGSRP.

[0162] The trans plasmid pAAV2/8.T20 was based on pAAV2/8.Triple, in which the VP1/2 region was replaced with the corresponding region of AAVrh.20.

[0163] The trans plasmid pAAV2/8.TR was based on pAAV2/8.Triple, in which the HVR.I region was replaced by DNA coding SDTH (SEQ ID NO: 80) and the HVR.IV region was replaced by DNA coding DGSGL (SEQ ID NO: 82).

[0164] 5. AAV Vector Production

[0165] AAV vectors were made according to the method described by Lock, M, Alvira, M, Vandenberghe, L H, Samanta, A, Toelen, J, Debyser, Z, et al. (2010). Rapid, Simple, and Versatile Manufacturing of Recombinant Adeno-Associated Viral Vectors at Scale. Human Gene Therapy 21: 1259-1271.

[0166] 6. ELISA for Canine F9 and Human F9

[0167] The ELISA for measuring canine F9 was described by Wang, L L, Calcedo, R, Nichols, T C, Bellinger, D A, Dillow, A, Verma, I M, et al. (2005). Sustained correction of disease in naive and AAV2-pretreated hemophilia B dogs: AAV2/8-mediated, liver-directed gene therapy. Blood 105: 3079-3086, which is incorporated herein by reference. Briefly, AAV8 mutants were packaged with the TBG.canine F9-WPRE cassette and tested in B6 mice in the presence or absence of antibody ADK8 through i.v. injection. 100 uL of diluted ADK8 was injected i.v. 2 hours prior to vector injection. AAV8 was used as control. The canine F9 level was measured by ELISA from plasma collected 1 week after administration. The F9 level in ADK8-treated animals as a percentage of that in ADK8-naive animals, and the p value (t-test), are shown in FIG. 2b.

[0168] A similar experiment was done using human F9. I.m. injection of AAV vectors carrying a third transgene cassette, tMCK.human F9, shows a similar muscle preference of AAV3G1 in B6 mice. tMCK is a muscle-specific promoter. The dose was 3×10^10 gc/mouse, n=3 mice/group. Plasma and muscle were collected 28 and 30 days after dosing, respectively. Human F9 was measured by ELISA from plasma and muscle lysate. The muscle F9 expression level after transduction with AAV3G1 was 11.2 fold higher than after transduction with AAV8. FIG. 5c. Measurement of the neutralizing antibody titer of the day 28 plasma shows that the antigenicity of AAV8 and AAV3G1 is different. FIG. 5d.

C. In Vitro Nab Assay, with Luciferase as the Reporter Gene

[0169] AAV8, AAV3G1 and mutants carrying all the combinations of the three mutations comprising AAV3G1 were tested in vitro with human plasmas (4 samples) and anti-AAV8 monkey sera (4 samples). Huh7 cells were seeded in 96-well black plates with clear bottom (Corning), 5×10^4 cells/well. Two days later, AAV8 and the variants were diluted in complete medium and incubated with diluted sera/plasma (final anti-AAV8 Nab titer in the mix, 1:4) before being applied to Huh7 cells in 96-well plates. The mixture was incubated at 37°C for 30 minutes before being transferred to the Huh7 plates.

[0170] Luciferase expression was read 72 hours later and converted to the percentage of the expression level of each "vector alone" control. For each serum/plasma, a ranking number was assigned to each vector according to their residual expression (the ranking number of the highest residual expression was 1 and the lowest was 8). FIG. 4b. These data show that all the three mutations in AAV3G1 contribute to Nab resistance.
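The conversion to percent of "vector alone" and the per-serum ranking described above can be sketched as follows. The readings are hypothetical placeholder values, not data from the study, and only three vectors are shown instead of eight; the function names are illustrative, and ties are not given any special treatment in this sketch.

```python
def percent_of_vector_alone(with_serum, vector_alone):
    """Residual expression as a percentage of the matching 'vector alone' well."""
    return 100.0 * with_serum / vector_alone


def rank_vectors(residual_expression):
    """Map vector name -> rank, where rank 1 is the highest residual expression."""
    ordered = sorted(residual_expression, key=residual_expression.get, reverse=True)
    return {vector: rank for rank, vector in enumerate(ordered, start=1)}


# Hypothetical luminescence readings for ONE serum sample:
# vector -> (signal with serum, signal of vector alone).
readings = {
    "AAV8": (50.0, 1000.0),
    "AAV3G1": (400.0, 800.0),
    "AAV8.C41": (150.0, 1000.0),
}

residual = {v: percent_of_vector_alone(s, a) for v, (s, a) in readings.items()}
ranks = rank_vectors(residual)
print(residual)
print(ranks)
```

Repeating this per serum or plasma sample and comparing the rank of each combination mutant across samples is one way to summarize which mutations contribute to resistance.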

[0171] 1. Luciferase Assay, In Vivo

[0172] AAV8 or AAV3G1 carrying the CB7.CI.luciferase cassette was administered intramuscularly into C57BL6 mice at a dose of 3×10^10 gc/mouse, 4 mice/group. Luciferase activity was monitored 2 weeks and 4 weeks after dosing. With intramuscular injection, AAV3G1 prefers muscle to liver, compared to AAV8. FIG. 5a.

[0173] A second experiment was performed in which AAV8 and AAV3G1 vectors carrying a different transgene were administered i.m. in C57BL6 mice at a dose of 1×10^9 gc/animal (5×10^8 gc/25 uL/leg, both legs). Muscle was sectioned and stained with X-gal at week 3 after vector injection; the best section of each group is shown in FIG. 5b (4× magnification). These studies show that i.m. injection of AAV vectors carrying another transgene cassette reveals a similar muscle preference of AAV3G1 in B6 mice.

[0174] MPS 3A Het mice (C57BL6 background) received 5×10^11 gc of AAV.CMV.LacZ/mouse, i.v. Tissues were collected 14 days later. X-gal stained sections from heart, muscle and liver of mice that received AAV8 or AAV3G1 vector were made (data not shown). These studies show that i.v. injection of AAV8.Triple results in increased muscle preference as compared to AAV8. Representative muscle sections of each animal at 4× are shown in FIG. 6a.

[0175] AAV8 and AAV3G1 were compared with the CB7.CI.ffluciferase transgene cassette. B6 mice were injected, i.v., at a dose of 3×10^11 gc/mouse. Two weeks after vector injection, luciferase was imaged. FIG. 6b. The left is AAV8; the right is AAV3G1.

[0176] AAV3G1 has a higher transduction of mouse airway epithelial cells, and the transduction is improved further by replacing the VP1/2 region with that of rh.20. B6 mice received 1×10^11 gc/mouse of AAV.CB7.CI.luciferase, i.n.; 4 mice received each vector. The luciferase activity was monitored 2, 3 and 4 weeks after vector administration. FIG. 7a, right panel, is a representative image (week 4) of the study. The left panel is the quantification with Living Image® 3.2, normalized by the average value of the AAV8 group at week 2.

[0177] Airway epithelial cell transduction comparison of AAV8, AAV8.T20, AAV9 and AAV6.2. B6 mice received 1×10^11 gc/mouse of AAV.CB7.CI.luciferase, i.n., 4 mice/vector. The luciferase activity was monitored 1, 2 and 3 weeks after vector administration. Living Image® 3.2 was used for quantification and the values were normalized by the average value of the AAV8 group at week 1. FIG. 7b.
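The normalization described here (dividing each group's signal by the average of the AAV8 group at the reference week) can be sketched as below. The luminescence values are hypothetical placeholders, and the data layout and function name are assumptions made for illustration only.

```python
from statistics import mean


def normalize_to_reference(signals, reference_group, reference_week):
    """signals maps (group, week) -> list of per-mouse luminescence values.

    Returns (group, week) -> group mean divided by the mean of the
    reference group at the reference week.
    """
    reference = mean(signals[(reference_group, reference_week)])
    return {key: mean(values) / reference for key, values in signals.items()}


# Hypothetical per-mouse values, 4 mice per vector (arbitrary units).
signals = {
    ("AAV8", 1): [90.0, 100.0, 110.0, 100.0],
    ("AAV8.T20", 1): [900.0, 1000.0, 1100.0, 1000.0],
}

relative = normalize_to_reference(signals, "AAV8", 1)
print(relative)
```

With this layout, later time points can be added under the same keys and are expressed relative to the same week-1 AAV8 baseline.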

[0178] Mice were anaesthetized. D-luciferin (Xenogen) was instilled into the mouse nostril at 15 ug/uL, 10 uL/nostril, 20 uL/mouse. Five minutes later, luminescent images were taken by IVIS® Imaging Systems (Xenogen) and quantified with the software Living Image® 3.2.

[0179] 2. Heparin Binding Assay

[0180] AAV vectors were diluted in the desired buffers and loaded onto a HiTrap Heparin HP column (GE Healthcare Life Sciences), pre-equilibrated with vector dilution buffer, by an AKTA™ FPLC System (GE). The column was then washed sequentially with vector dilution buffer and buffers with increasing amounts of sodium chloride. Fractions were collected during the whole process. The dot blot protocol was described by Tenney, R M, Bell, C L, and Wilson, J M (2014). AAV8 capsid variable regions at the two-fold symmetry axis contribute to high liver transduction by mediating nuclear entry and capsid uncoating. Virology 454: 227-236, which is incorporated herein by reference. See FIGS. 8a-8d. Yield for each vector is shown below.

TABLE-US-00004 TABLE 3 Yield table (total gc of purified vector/cell stack. DIY) Transgene cassette AAV types CB7.CI.ffluciferase.RBG LSP. cF9.W TBG.hF9.W tMCK.hF9.W AAV8 4.93E+12 4.65E+13 4.47E+13 1.84E+13 2.07E+13 2.04E+13 2.10E+13 AAV8.C41 1.46E+13 1.69E+13 AAV8.C41.I-SGTH 3.64E+12 5.63E+12 6.64E+12 AAV8.C41.IV-GGSRP 1.40E+13 AAV8.G112 7.14E+12 AAV8.G113 1.93E+13 AAV8.G115 1.86E+13 AAV8.I-SGTH 1.78E+13 AAV8.IV-GGSRP 2.24E+13 AAV8.T20 5.60E+12 AAV8.TR1 4.64E+13 AAV3G1 3.95E+12 2.12E+13 1.63E+13 1.98E+13 8.43E+12 1.04E+13

Example 3: Detailed Studies

[0181] AAV mutant library preparation. A plasmid, termed pAAVinvivo, was used for the library preparation. The plasmid contains a CMV promoter, a partial Rep sequence (AAV2, NC_001401, 1881-2202), the AAV8 VP1 gene and a rabbit beta globin (RBG) polyadenylation signal, flanked by two AAV ITRs (FIG. 14). Saturation mutagenesis was done with primers carrying NNK degenerate codons at the desired sites. Both NNS and NNK cover all 20 amino acids. In terms of human codon usage, NNS scores slightly higher than NNK (FIG. 15A); however, too high a GC content may not be good for PCR and/or virus replication: the average GC% of NNS is 67% while that of NNK is 50% (FIG. 15B). Taken together, NNK was chosen. Two helper plasmids, pAdAF6 (carrying adenovirus components) and pRep (carrying AAV Rep genes), and the plasmid library were transfected into HEK293 cells for AAV library production. The downstream steps utilized AAV vector manufacturing techniques previously described. The plasmid library size was around 1×10^6 to 3×10^7. The yield of AAV libraries was around 1.52×10^11 to 2.56×10^13 gc.
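The amino acid coverage and average GC content quoted for the two degenerate schemes can be reproduced by enumerating their codons against the standard genetic code. This is an illustrative sketch; the names `CODON_TABLE`, `THIRD_BASE` and `scheme_summary` are assumptions, and the codon usage comparison of FIG. 15A is not modeled.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Allowed third-position bases: K = G or T, S = G or C; N = any base.
THIRD_BASE = {"NNK": "GT", "NNS": "GC"}


def scheme_summary(scheme):
    """Enumerate all codons of a degenerate scheme and summarize them."""
    codons = [a + b + c for a, b, c in product("ACGT", "ACGT", THIRD_BASE[scheme])]
    encoded = [CODON_TABLE[codon] for codon in codons]
    gc = sum(base in "GC" for codon in codons for base in codon) / (3 * len(codons))
    return {
        "codons": len(codons),
        "amino_acids": len(set(encoded) - {"*"}),
        "stop_codons": encoded.count("*"),
        "gc_fraction": gc,
    }


for scheme in ("NNK", "NNS"):
    print(scheme, scheme_summary(scheme))
```

Both schemes use 32 codons, reach all 20 amino acids and retain a single stop codon (TAG); the average GC fraction comes out at 0.50 for NNK and about 0.67 for NNS, matching the figures cited in the text.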

[0182] Structure-guided saturation mutagenesis quickly abolished vector neutralization by the antibody. We first picked residues 583, 588, 589 and 594-597 (AAV8 VP1 numbering, SEQ ID NO: 34) for mutagenesis, because they are within the contact region between the monoclonal neutralizing antibody ADK8 and the AAV8 capsid, according to the structure resolved by Gurda et al. After one round of in vitro selection in HEK293 cells in the presence of ADK8, mutants were randomly picked and tested with the Nab assay. The mutation sequences are listed in Table 1. As shown in FIG. 2A, all the mutants were resistant to ADK8 in comparison to AAV8. They also showed resistance to ADK8/9, implying that the epitopes of the two antibodies overlap. One mutant, C42, showed much higher 293 cell transduction than AAV8, probably due to the change of residue 589 to arginine. Huh7 cells showed similar results (data not shown).

[0183] Liver transduction was evaluated in B6 mice. Mice received CB7.CI.eGFP vectors at a dose of 1×10^11 GC/animal, i.v., and liver was harvested two weeks later. The dosage of G112 was 3.5×10^10 per animal. Liver transduction in B6 mice with the CB7.CI.eGFP reporter showed that GFP expression of C41, G110 and G112 was better than that of AAV8; G113 and G115 were roughly equal to AAV8; in contrast to its high 293 cell transduction, C42 expressed less GFP in mouse liver (data not shown).

[0184] The resistance remained in in vivo testing when the LSP.canine F9 transgene cassette was packaged into those AAV8 mutants and administered intravenously into mice 2 hours after ADK8 i.v. injection (FIG. 2B). No mutants showed clear resistance to several AAV8 Nab-positive human plasmas (data not shown), which was expected because those mutants are single-epitope ablated and AAV antisera are likely polyclonal, as demonstrated by the broad neutralizing spectrum of AAV Nab in chimpanzees.

[0185] Further mutagenesis and the generation of AAV3G1. One mutant, C41, showed some resistance to two AAV8 Nab-positive human plasmas when tested in vivo with the CB7.CI.eGFP transgene cassette (data not shown). This mutant was used as the backbone for further mutagenesis. The HVR.I and HVR.IV regions were each picked for the next round of mutagenesis, because protrusions of a protein are likely to be more antigenic. (NNK)5 was loaded into the pAAVinvivo.C41 backbone (pAAVinvivo.C41 is the same as pAAVinvivo with AAV8 VP1 replaced with AAV8.C41 VP1) at positions 263-267 and 455-459, respectively, to make libraries, which then went through three rounds of in vivo selection in mice. For each round, AAV libraries were intravenously injected into mice 2 hours after injection of pooled human intravenous immunoglobulin (hIVIG). AAV sequences were retrieved by PCR from mouse livers two weeks after vector injection and loaded into pAAVinvivo.C41 to make libraries for the next round of selection with an increased amount of hIVIG. After three rounds of selection, SGTH was the only mutant recovered from the highest IVIG group among all PCR-positive animals. Interestingly, it is a three-bp deletion mutant that does not disrupt the ORFs of VP1, VP2, VP3 and the assembly activation protein (the DNA change is: AACGGGACATCGGGA (SEQ ID NO: 83)->TCTGGTACTCAT (SEQ ID NO: 84)). HVR.IV's signal was still diverse, implying that it is conformationally flexible and may not be the dominant epitope in pooled hIVIG. AAV3G1 was generated by combining the three mutations, C41 (HVR.VIII mutation), SGTH (HVR.I mutation) and GGSRP (HVR.IV mutation), in the AAV8 backbone. GGSRP was picked because it showed the highest resistance to hIVIG in the in vitro Nab assay among all HVR.IV mutants tested (data not shown).
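The statement that the HVR.I change is an in-frame three-bp deletion can be checked by translating the two DNA segments with the standard genetic code. The sketch below covers only the capsid (VP) reading frame; the assembly activation protein is read in a different frame, and checking it would need flanking sequence that is not given in this paragraph. The helper names are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string whose length is a multiple of three."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


parent_dna = "AACGGGACATCGGGA"   # SEQ ID NO: 83
selected_dna = "TCTGGTACTCAT"    # SEQ ID NO: 84

print(translate(parent_dna), "->", translate(selected_dna))
print("bases removed:", len(parent_dna) - len(selected_dna))
```

The selected segment translates to SGTH, and the length difference of three bases corresponds to the loss of exactly one codon, so the downstream VP reading frame is preserved.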

[0186] AAV3G1 showed Nab resistance and all the three mutations contributed to the resistance. AAV3G1 showed resistance to hIVIG (FIG. 4A). To figure out each mutation's contribution to the resistance, we made a series of AAV8 mutants plus AAV8 and AAV3G1 to cover all the combinations and tested them with anti-AAV primate sera or plasma. As shown in FIG. 4B, all the three mutations comprising AAV3G1 contributed to Nab resistance.

[0187] The liver transduction of AAV3G1 is down while its muscle transduction is up. We evaluated liver transduction of AAV3G1 in mice with TBG.human F9 (hF9) as the reporter gene. At a dose of 1×10^10 gc/animal, i.v., F9 expressed in plasma was around 18% of that with AAV8 at weeks 1, 2 and 4 after vector administration (FIG. 13A). The neutralizing antibody titer against AAV8 in AAV3G1-injected mice was 12 fold lower than in AAV8-injected animals (FIG. 13B). Consistent with the F9 expression data, the vector genome copy number in liver for AAV3G1 was 20% of that for AAV8. For both treatments, the liver/spleen ratio of vector genome DNA was similar, with AAV3G1 being 285 and AAV8 being 237 (FIG. 8E). We then evaluated muscle transduction of AAV3G1 in mice. Three reporter gene cassettes were used: CB7.CI.luciferase, CMV.LacZ and tMCK.hF9. As shown in FIG. 5a, intramuscular injection of 3e10 gc of CB7.luciferase clearly showed that a large amount of AAV8 vector went to liver, consistent with a previous study; in contrast, for AAV3G1, the muscle transduction was much higher than for AAV8 and a smaller proportion of vector went to liver. Intravenous injection showed similar results (FIG. 6c), as did CMV.LacZ with both i.m. and i.v. injection (FIGS. 5B, 6A) and tMCK.hF9 with i.m. injection (FIG. 5C). For tMCK.hF9 i.m. injection, the F9 level in the muscle lysate from AAV3G1-injected mice was about 10 fold higher than with AAV8; in contrast, the plasma F9 level of the two vectors was similar, consistent with a previous report that muscle is not an ideal tissue for F9 expression. We also measured the Nab in the tMCK.hF9 study. Consistent with the study described previously in the paper, AAV8 Nab in AAV8-injected mice was higher than in AAV3G1-injected mice (around 12 fold), while AAV3G1 Nab in AAV8-injected mice was lower than in AAV3G1-injected mice (around 4 fold) (FIG. 5D). The results show that AAV3G1 has better muscle transduction than AAV8 and indicate that the two capsids are serologically different.

[0188] The heparin affinity of AAV3G1 is increased, and the rational design of reducing its surface charges successfully reduced its heparin affinity and partially restored its murine liver transduction. Liver transduction of AAV3G1 is decreased even though two of its three mutations were identified in three rounds of in vivo selection in mouse liver on the AAV8.C41 backbone. The heparin binding assay showed that the heparin affinity of AAV3G1 is increased (FIG. 8A). Binding to heparin or some other negatively charged macromolecules could cause the vectors to become trapped or captured before they reach hepatocytes. To eliminate heparin binding, we introduced negative charges onto the AAV3G1 capsid by changing SGTH, the HVR.I mutation, to SDTH, and replacing GGSRP, the HVR.IV mutation, with another, negatively charged mutation that showed up during the selection process, DGSGL, resulting in a new mutant, AAV8.TR1. The modifications successfully reduced heparin binding (FIG. 8B), and the liver transduction was partially restored (FIG. 13A). The AAV8 Nab titer was 19 fold lower than in AAV8-treated mice (FIG. 13B). Surprisingly, spleen vector DNA of AAV8.TR1-treated mice was higher than that of AAV8-treated ones (FIG. 8E).

The transduction of AAV3G1 was higher than that of AAV8 in mice through intranasal vector administration, and the rational design of replacing its VP1/2 region with that of rh.20 improved the transduction further. As shown in FIG. 7a, AAV3G1's transduction was higher than that of AAV8 in mice through intranasal administration. A previous comprehensive study showed varied airway transduction among AAVs. By analyzing the data from Table 1 in Limberis, M P et al, (2009). Transduction efficiencies of novel AAV vectors in mouse airway epithelium in vivo and human ciliated airway epithelium in vitro. Mol Ther 17: 294-301, which is incorporated herein by reference, we found that codon 24 is distinct between low score members and high score members of AAV clade E (data not shown), especially between rh.39 and hu.37: the two have only one amino acid difference (A24D) while their scores are quite different (4 vs 13). We reasoned that the VP1/2 region may play some role in AAV airway transduction.

[0189] By replacing the VP1/2 region (1-202) of AAV3G1 with that of rh.20, we created another mutant called AAV8.T20. Indeed, AAV8.T20's transduction was 8-12 fold higher than that of AAV8, approaching the level of AAV9 (FIG. 7B).

Material and Methods

Animal Studies.

[0190] All mice for the study were housed in an Association for Assessment and Accreditation of Laboratory Animal Care-accredited and Public Health Service-assured facility at the University of Pennsylvania. All animal procedures complied with protocols approved by the Institutional Animal Care and Use Committee at the University of Pennsylvania. All mice were bought from the Jackson Laboratory (Bar Harbor, ME). The mice were C57BL/6J mice (male, 6-8 weeks old) unless specifically described.

Plasmid Library Construction.

[0191] The starting plasmid, pAAVinvivo, is shown in FIG. 14. The HVR.VIII mutagenesis library was constructed by PCR with Phusion (Thermo Fisher Scientific, MA) and a degenerate oligo CTACAGAGGAATACGGTATCGTGNNKGATAACTTGCAGNNKNNKAACACGGCTCCTNNKNNKNNKNNKGTCAACAGCCAGGGGGCCTTAC (SEQ ID NO: 85), followed by cloning into pAAVinvivo and transformation into Stbl4 competent cells (Invitrogen, CA) by electroporation. The initial libraries of HVR.I and HVR.IV were constructed in the same way, with the degenerate oligo CAACCACCTCTACAAGCAAATCTCCNNKNNKNNKNNKNNKGGAGCCACCAACGACAACACCTACT (SEQ ID NO: 86) for HVR.I and CTACTTGTCTCGGACTCAAACAACANNKNNKNNKNNKNNKACGCAGACTCTGGGCTTCAGCCAA (SEQ ID NO: 87) for HVR.IV.
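The theoretical diversity encoded by each degenerate oligo follows from the number of NNK codons it carries: 32 codons, and 20 amino acids, per randomized position. The following sketch counts the NNK codons in the three oligos and computes those upper bounds; the function names are illustrative assumptions, and stop-containing variants are ignored in the peptide count.

```python
import re

# Degenerate oligos as given for SEQ ID NO: 85-87 (whitespace removed).
OLIGOS = {
    "HVR.VIII": "CTACAGAGGAATACGGTATCGTGNNKGATAACTTGCAGNNKNNKAACACGGCTCCT"
                "NNKNNKNNKNNKGTCAACAGCCAGGGGGCCTTAC",
    "HVR.I": "CAACCACCTCTACAAGCAAATCTCCNNKNNKNNKNNKNNKGGAGCCACCAACGACAACACCTACT",
    "HVR.IV": "CTACTTGTCTCGGACTCAAACAACANNKNNKNNKNNKNNKACGCAGACTCTGGGCTTCAGCCAA",
}


def nnk_count(oligo):
    """Count NNK codons; the flanking sequence contains no N, so a simple scan works."""
    return len(re.findall("NNK", oligo.upper()))


def theoretical_diversity(n_codons):
    """Return (distinct DNA variants, distinct stop-free peptides) for n NNK codons."""
    return 32 ** n_codons, 20 ** n_codons


for region, oligo in OLIGOS.items():
    n = nnk_count(oligo)
    dna_variants, peptides = theoretical_diversity(n)
    print(f"{region}: {n} NNK codons, {dna_variants:.2e} DNA variants, {peptides:.2e} peptides")
```

By this count the five-codon HVR.I and HVR.IV libraries have about 3.4×10^7 possible DNA variants (3.2×10^6 peptides), whereas the seven-codon HVR.VIII library has about 3.4×10^10, so a plasmid library in the 10^6 to 10^7 range would presumably sample the latter only in part.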

[0192] The cloning plasmid was pAAVinvivo.C41 (pAAVinvivo with AAV8 VP1 replaced with AAV8.C41 VP1). After round one of selection, AAV sequences were retrieved with primers flanked with BsmBI sites and cloned into two new cloning plasmids, constructed on pAAVinvivo.C41 by removing the two endogenous BsmBI sites by silent mutations and then introducing two BsmBI sites flanking HVR.I and HVR.IV, respectively. The competent cells used here were MegaX DH10B™ T1R Electrocomp™ Cells (Invitrogen, CA) instead. The virus libraries were made the same way as regular AAV vector preps.

[0193] AAV Library Production.

[0194] For HVR.VIII, the plasmid library was mixed with pdeltaF6 and pRep and transfected into HEK293 cells with the calcium-phosphate method. Three days after transfection, cell lysate was harvested, re-suspended in DPBS and treated with Benzonase (Merck). The lysate was then spun down to remove debris. The supernatant was the AAV mutagenesis library and was stored at -20°C for further use. For HVR.I and HVR.IV, the libraries were made the same way as regular AAV vectors (see below). Titration was done with real-time PCR.

[0195] Selection.

[0196] HVR.VIII went through one round of in vitro selection. Specifically, 1e9 genome copies (gc) of the AAV mutagenesis library were mixed with 0.5 μL of ADK8 (AAV8 Nab titer 1:2560) and brought up to 1 mL with complete medium. The mixture was incubated at 37°C for 30 min, and then applied to the 293 cells (MOI, ~1e4). Two days later, the cells were split, followed by transfection with the plasmids pAdAF6 and pRep two days later. Two days after the transfection, AAV fragments were retrieved from the cells by PCR, cloned into Topo vector (Invitrogen) for sequencing, and then cloned into trans plasmids to make AAV.CMV.eGFP vector for further analysis.

[0197] HVR.I and HVR.IV went through three rounds of in vivo selection in B6 mice, with a dose of 2.53e10 gc/mouse for HVR.I and 4e10 gc/mouse for HVR.IV, 3 mice/group, i.v. injection. Two hours before library injection, 100 uL of hIVIG diluted with DPBS was injected intravenously. For round one, there was one group of mice for each HVR, with hIVIG titer 1:40; for round two, there were two groups for each HVR, with hIVIG titer 1:40 for group 1 and 1:80 for group 2; for round three, there were three groups for each HVR, with hIVIG titer 1:80 for group 1, 1:160 for group 2 and 1:320 for group 3. Two weeks after vector injection, AAV sequences were retrieved from liver by PCR for the next library construction as described above.

AAV Vector Production.

AAV vectors were made as described by Lock et al, 2010.

[0198] ELISA for canine F9 and human F9. The ELISA for measuring canine F9 was described by Wang et al., 2005. The human F9 ELISA protocol was a modified version of canine F9 ELISA, also developed by Wang et al.

[0199] In vitro Nab assay with eGFP as the reporter gene. 1e9 gc of each AAV mutant carrying an eGFP cassette was mixed with different monoclonal antibodies (ADK8, AAV8 Nab titer 1:2560, 0.5 μL/well; ADK8/9, AAV8 Nab titer 1:2560, 0.5 μL/well; ADK9, AAV8 Nab titer 1:5, 0.5 μL/well), brought up to 100 μL with media, incubated at 37°C for 30 minutes and then applied to 293 cells (5e4 cells/well seeded one day before infection in a 96-well plate). GFP expression was monitored and quantified with ImageJ.

In vitro Nab assay with luciferase as the reporter gene. Huh7 cells were seeded in 96-well black plates with clear bottom (Corning), 5e4 cells/well. Two days later, AAV vectors were diluted in complete medium and then mixed with serum/plasma samples at various dilutions. The mixture was incubated at 37°C for 30 minutes before being transferred to the Huh7 plates. Three days after vector infection, luminescence was read with a Clarity™ Luminescence Microplate Reader (BioTek).

[0200] Luciferase Assay, In Vivo

[0201] For studies with intranasal administration, mice were anaesthetized. D-luciferin (Xenogen) was instilled into the mouse nostrils at 15 ug/uL, 10 uL/nostril, 20 uL/mouse. Five minutes later, luminescent images were taken with an IVIS® Imaging System (Xenogen) and quantified with Living Image® 3.2 software. For other studies, mice were treated the same way except that D-luciferin was given i.p. at 10 uL/gram of mouse body weight and the luminescence was measured 20 minutes after luciferin injection.
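The dosing figures in paragraph [0201] imply a fixed amount of D-luciferin per mouse for the intranasal route and a weight-dependent volume for the i.p. route. The Python sketch below works through that arithmetic; the function names are hypothetical, and the 25 g body weight in the usage lines is an arbitrary example value, not a figure from the study. The luciferin concentration used for i.p. injection is not restated in the text, so only the injection volume is computed for that route.

```python
# Values stated in paragraph [0201].
LUCIFERIN_UG_PER_UL = 15      # intranasal instillation concentration
VOLUME_UL_PER_NOSTRIL = 10
NOSTRILS_PER_MOUSE = 2
IP_VOLUME_UL_PER_GRAM = 10    # i.p. injection volume per gram of body weight


def intranasal_volume_ul() -> int:
    """Total instilled volume per mouse (both nostrils)."""
    return VOLUME_UL_PER_NOSTRIL * NOSTRILS_PER_MOUSE


def intranasal_dose_ug() -> int:
    """Total D-luciferin mass instilled per mouse."""
    return LUCIFERIN_UG_PER_UL * intranasal_volume_ul()


def ip_volume_ul(body_weight_g: float) -> float:
    """Injection volume for the i.p. route at 10 uL per gram."""
    return IP_VOLUME_UL_PER_GRAM * body_weight_g


if __name__ == "__main__":
    print(intranasal_volume_ul(), "uL per mouse intranasally")
    print(intranasal_dose_ug(), "ug D-luciferin per mouse intranasally")
    print(ip_volume_ul(25), "uL i.p. for a hypothetical 25 g mouse")
```

The computed intranasal volume (20 uL) matches the per-mouse volume given in the text.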

[0202] Heparin Binding Assay

[0203] AAV vectors were diluted in the desired buffer (DPBS or Tris buffer) and loaded onto a HiTrap Heparin HP column (GE Healthcare Life Sciences) using an AKTA™ FPLC System (GE). The column was then washed sequentially with vector dilution buffer and with dilution buffer containing increasing amounts of sodium chloride. Fractions were collected throughout the process. The dot blot protocol was described by Tenney et al., 2014.

[0204] Another aspect of this study was replacing the VP1/2 region (1-202) of AAV3G1 with that of rh.20. By combining the data from the study by Limberis et al. (Limberis, M P et al. (2009). Transduction efficiencies of novel AAV vectors in mouse airway epithelium in vivo and human ciliated airway epithelium in vitro. Mol Ther 17: 294-301, which is incorporated herein by reference) with our sequence analysis, we found that codon 24 differs between the high lung transduction members and the low lung transduction members of AAV clade E. Because the amino acids of the 1-202 region of the three highest-transducing Clade E members, rh.64R1, rh.10 and rh.20, are identical, we substituted this region into AAV3G1, leading to further improvement of AAV3G1's nasal transduction.
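The region swap described in paragraph [0204] amounts to taking residues 1-202 from one capsid and the remaining residues from another. The Python sketch below illustrates the operation on short placeholder strings. It assumes that both proteins share the same residue numbering across the swapped region (no insertions or deletions); the function name `swap_n_terminal_region` and the toy sequences are hypothetical and are not actual capsid sequences.

```python
def swap_n_terminal_region(recipient: str, donor: str, end: int = 202) -> str:
    """Return `recipient` with residues 1..end (1-based, inclusive)
    replaced by the corresponding residues of `donor`.

    Assumes both sequences use the same numbering over residues 1..end.
    """
    if end < 1:
        raise ValueError("end must be at least 1")
    if len(recipient) < end or len(donor) < end:
        raise ValueError("both sequences must contain at least `end` residues")
    return donor[:end] + recipient[end:]


if __name__ == "__main__":
    # Toy one-letter sequences standing in for the two capsid proteins.
    recipient = "MAADGYLPDW"   # placeholder for the recipient capsid
    donor = "MTTDGYLPEW"       # placeholder for the donor capsid
    # Swap only the first 3 residues so the effect is visible on toy data.
    print(swap_n_terminal_region(recipient, donor, end=3))
```

With full-length sequences, calling the function with the default `end=202` would correspond to the 1-202 substitution described in the text.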

Example 4: Comparison of AAV8 and AAV3G1 in Muscle

[0205] Male B6 mice, 3 mice/group, were injected i.m. with 3e9 or 3e10 gc/mouse, 1 leg/mouse with AAV3G1.tMCK.PI.ffluc.bGH, dd-PCR(PK), manufactured and titrated by Vector Core. Week 1 results are shown in FIG. 15. For each figure, the left is AAV8-treated, the right AAV3G1.

[0206] A substantial proportion of AAV8 vectors went to the liver even though the vectors were injected intramuscularly, consistent with previous studies, and the transgene was expressed in the liver even when controlled by the muscle-specific promoter tMCK. AAV3G1's muscle transduction is much better than that of AAV8.

Example 5

[0207] Neutralizing antibody titers were determined for AAV8, AAV3G1 and AAV9 using serum from naive NHPs. The results confirm that AAV8 and AAV3G1 are serologically distinct.

TABLE-US-00005
                                   AAV NAb in HEK293 cells^1,2
#    Animal ID   Time Point    AAV8     AAV83G1   AAV9
1    RA2125      Screening     <5       <5        <5
2    RA2145      Screening     <5       <5        <5
3    RA2150      Screening     <5       <5        <5
4    RA2153      Screening     5*       <5        <5
5    RA2152      Screening     <5       <5        <5
6    RA2172      Screening     <5       5*        5*
7    RA2309      Screening     10*      <5        <5
8    RA2334      Screening     <5       <5        <5
9    RA2343      Screening     <5       <5        <5
10   RA1971      Screening     <5       <5        <5
11   RA0549      Screening     <5       <5        <5
12   RA1875      Screening     <5       <5        <5
13   RA0875      Screening     <5       <5        <5
14   RA1915      Screening     <5       <5        <5
15   RA1156      Screening     <5       <5        <5
16   BD957KB     Screening     <5       <5        <5
17   RA0472      Screening     10*      <5        <5
18   RA0760      Screening     >20*     5*        5*
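The screening titers in the table above can be tallied programmatically. The Python sketch below transcribes the table and counts, for each capsid, the animals whose titer is reported as anything other than "<5", and lists the animals whose AAV8 and AAV83G1 results disagree. The names `NAB_TITERS`, `count_detectable` and `discordant_animals` are hypothetical; the meaning of the asterisk is not given in this section, so the sketch only distinguishes "<5" from every other reported value.

```python
CAPSIDS = ("AAV8", "AAV83G1", "AAV9")

# Animal ID -> reported screening titers, in the column order of CAPSIDS.
NAB_TITERS = {
    "RA2125": ("<5", "<5", "<5"),
    "RA2145": ("<5", "<5", "<5"),
    "RA2150": ("<5", "<5", "<5"),
    "RA2153": ("5*", "<5", "<5"),
    "RA2152": ("<5", "<5", "<5"),
    "RA2172": ("<5", "5*", "5*"),
    "RA2309": ("10*", "<5", "<5"),
    "RA2334": ("<5", "<5", "<5"),
    "RA2343": ("<5", "<5", "<5"),
    "RA1971": ("<5", "<5", "<5"),
    "RA0549": ("<5", "<5", "<5"),
    "RA1875": ("<5", "<5", "<5"),
    "RA0875": ("<5", "<5", "<5"),
    "RA1915": ("<5", "<5", "<5"),
    "RA1156": ("<5", "<5", "<5"),
    "BD957KB": ("<5", "<5", "<5"),
    "RA0472": ("10*", "<5", "<5"),
    "RA0760": (">20*", "5*", "5*"),
}


def is_detectable(titer: str) -> bool:
    """True for any reported value other than '<5'."""
    return titer.strip() != "<5"


def count_detectable(capsid: str) -> int:
    """Number of animals with a titer other than '<5' for the given capsid."""
    col = CAPSIDS.index(capsid)
    return sum(is_detectable(row[col]) for row in NAB_TITERS.values())


def discordant_animals(capsid_a: str, capsid_b: str) -> list:
    """Animals detectable for exactly one of the two capsids."""
    a, b = CAPSIDS.index(capsid_a), CAPSIDS.index(capsid_b)
    return [
        animal
        for animal, row in NAB_TITERS.items()
        if is_detectable(row[a]) != is_detectable(row[b])
    ]


if __name__ == "__main__":
    for capsid in CAPSIDS:
        print(capsid, count_detectable(capsid), "of", len(NAB_TITERS))
    print("AAV8 vs AAV83G1 discordant:", discordant_animals("AAV8", "AAV83G1"))
```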

Sequence Listing Free Text

[0208] The following information is provided for sequences containing free text under numeric identifier <223>.

TABLE-US-00006
SEQ ID NO:
(containing free text)   Free text under <223>
1    <223> constructed sequence
2    <223> constructed sequence
3    <223> constructed sequence
4    <223> constructed sequence
5    <223> constructed sequence
6    <223> constructed sequence
7    <223> constructed sequence
8    <223> constructed sequence
9    <223> constructed sequence
10   <223> constructed sequence
11   <223> constructed sequence
12   <223> constructed sequence
13   <223> constructed sequence
14   <223> constructed sequence
15   <223> constructed sequence
16   <223> constructed sequence
17   <223> constructed sequence
18   <223> constructed sequence
19   <223> constructed sequence
20   <223> constructed sequence
21   <223> constructed sequence
22   <223> constructed sequence
23   <223> constructed sequence
24   <223> constructed sequence
25   <223> constructed sequence
26   <223> constructed sequence
27   <223> constructed sequence
28   <223> constructed sequence
29   <223> constructed sequence
30   <223> constructed sequence
31   <223> constructed sequence
32   <223> constructed sequence
33   <223> constructed sequence
34   <223> constructed sequence
35   <223> constructed sequence
36   <223> constructed sequence
37   <223> constructed sequence
38   <223> constructed sequence
39   <223> constructed sequence
40   <223> constructed sequence
41   <223> constructed sequence
42   <223> constructed sequence
43   <223> constructed sequence
44   <223> constructed sequence
45   <223> Constructed sequence
     <220> <221> misc_feature <222> (24)..(25) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (39)..(40) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (42)..(43) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (57)..(58) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (60)..(61) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (63)..(64) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (66)..(67) <223> n is a, c, g, or t
46   <223> Constructed sequence
47   <223> Constructed sequence
48   <223> Constructed sequence
49   <223> Constructed sequence
50   <223> Constructed sequence
51   <223> Constructed sequence
52   <223> Constructed sequence
53   <223> Constructed sequence
54   <223> Constructed sequence
55   <223> Constructed sequence
56   <223> constructed sequence
57   <223> Constructed sequence
     <220> <221> misc_feature <222> (26)..(27) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (29)..(30) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (32)..(33) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (35)..(36) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (38)..(39) <223> n is a, c, g, or t
58   <223> Constructed sequence
     <220> <221> misc_feature <222> (27)..(28) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (30)..(31) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (33)..(34) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (36)..(37) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (39)..(40) <223> n is a, c, g, or t
59   <223> Constructed sequence
     <220> <221> misc_feature <222> (26)..(27) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (29)..(30) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (32)..(33) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (35)..(36) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (38)..(39) <223> n is a, c, g, or t
60   <223> Constructed sequence
     <220> <221> misc_feature <222> (26)..(27) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (29)..(30) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (32)..(33) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (35)..(36) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (38)..(39) <223> n is a, c, g, or t
61   <223> Constructed sequence
     <220> <221> misc_feature <222> (26)..(27) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (29)..(30) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (32)..(33) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (35)..(36) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (38)..(39) <223> n is a, c, g, or t
62   <223> Constructed sequence
     <220> <221> misc_feature <222> (27)..(28) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (30)..(31) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (33)..(34) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (36)..(37) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (39)..(40) <223> n is a, c, g, or t
63   <223> Constructed sequence
64   <223> Constructed sequence
65   <223> Constructed sequence
66   <223> Constructed sequence
67   <223> Constructed sequence
68   <223> Constructed sequence
69   <223> major ADK8 epitope in AAV8 HVR.VIII region
70   <223> mutated c41 ADK8 epitope in AAV8 HVR.VIII region
71   <223> mutated c42 ADK8 epitope in AAV8 HVR.VIII region
72   <223> mutated c46 ADK8 epitope in AAV8 HVR.VIII region
73   <223> mutated g110 ADK8 epitope in AAV8 HVR.VIII region
74   <223> mutated g112 ADK8 epitope in AAV8 HVR.VIII region
75   <223> mutated g113 ADK8 epitope in AAV8 HVR.VIII region
76   <223> mutated g115 ADK8 epitope in AAV8 HVR.VIII region
77   <223> mutated g117 ADK8 epitope in AAV8 HVR.VIII region
78   <223> Constructed sequence
79   <223> Constructed sequence
80   <223> Constructed sequence
81   <223> Constructed sequence
82   <223> Constructed sequence
83   <223> Constructed sequence
84   <223> Constructed sequence
85   <223> Constructed sequence
     <220> <221> misc_feature <222> (24)..(25) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (39)..(40) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (42)..(43) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (57)..(58) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (60)..(61) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (63)..(64) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (66)..(67) <223> n is a, c, g, or t
86   <223> constructed sequence
     <220> <221> misc_feature <222> (26)..(27) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (29)..(30) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (32)..(33) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (35)..(36) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (38)..(39) <223> n is a, c, g, or t
87   <223> Constructed sequence
     <220> <221> misc_feature <222> (26)..(27) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (29)..(30) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (32)..(33) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (35)..(36) <223> n is a, c, g, or t
     <220> <221> misc_feature <222> (38)..(39) <223> n is a, c, g, or t
88   <223> AAV rh.20 capsid protein

[0209] All publications cited in this specification are incorporated herein by reference in their entireties, as is U.S. Provisional Patent Application No. 62/323,389, filed Apr. 15, 2016. Similarly, the SEQ ID NOs which are referenced herein and which appear in the appended Sequence Listing are incorporated by reference. While the invention has been described with reference to particular embodiments, it will be appreciated that modifications can be made without departing from the spirit of the invention. Such modifications are intended to fall within the scope of the appended claims.

Sequence CWU 1

1

8812217DNAArtificial Sequenceconstructed sequence 1atggctgccg atggttatct tccagattgg ctcgaggaca acctctctga gggcattcgc 60gagtggtggg cgctgaaacc tggagccccg aagcccaaag ccaaccagca aaagcaggac 120gacggccggg gtctggtgct tcctggctac aagtacctcg gacccttcaa cggactcgac 180aagggggagc ccgtcaacgc ggcggacgca gcggccctcg agcacgacaa ggcctacgac 240cagcagctgc aggcgggtga caatccgtac ctgcggtata accacgccga cgccgagttt 300caggagcgtc tgcaagaaga tacgtctttt gggggcaacc tcgggcgagc agtcttccag 360gccaagaagc gggttctcga acctctcggt ctggttgagg aaggcgctaa gacggctcct 420ggaaagaaga gaccggtaga gccatcaccc cagcgttctc cagactcctc tacgggcatc 480ggcaagaaag gccaacagcc cgccagaaaa agactcaatt ttggtcagac tggcgactca 540gagtcagttc cagaccctca acctctcgga gaacctccag cagcgccctc tggtgtggga 600cctaatacaa tggctgcagg cggtggcgca ccaatggcag acaataacga aggcgccgac 660ggagtgggta gttcctcggg aaattggcat tgcgattcca catggctggg cgacagagtc 720atcaccacca gcacccgaac ctgggccctg cccacctaca acaaccacct ctacaagcaa 780atctccaacg ggacatcggg aggagccacc aacgacaaca cctacttcgg ctacagcacc 840ccctgggggt attttgactt taacagattc cactgccact tttcaccacg tgactggcag 900cgactcatca acaacaactg gggattccgg cccaagagac tcagcttcaa gctcttcaac 960atccaggtca aggaggtcac gcagaatgaa ggcaccaaga ccatcgccaa taacctcacc 1020agcaccatcc aggtgtttac ggactcggag taccagctgc cgtacgttct cggctctgcc 1080caccagggct gcctgcctcc gttcccggcg gacgtgttca tgattcccca gtacggctac 1140ctaacactca acaacggtag tcaggccgtg ggacgctcct ccttctactg cctggaatac 1200tttccttcgc agatgctgag aaccggcaac aacttccagt ttacttacac cttcgaggac 1260gtgcctttcc acagcagcta cgcccacagc cagagcttgg accggctgat gaatcctctg 1320attgaccagt acctgtacta cttgtctcgg actcaaacaa caggaggcac ggcaaatacg 1380cagactctgg gcttcagcca aggtgggcct aatacaatgg ccaatcaggc aaagaactgg 1440ctgccaggac cctgttaccg ccaacaacgc gtctcaacga caaccgggca aaacaacaat 1500agcaactttg cctggactgc tgggaccaaa taccatctga atggaagaaa ttcattggct 1560aatcctggca tcgctatggc aacacacaaa gacgacgagg agcgtttttt tcccagtaac 1620gggatcctga tttttggcaa acaaaatgct gccagagaca atgcggatta cagcgatgtc 
1680atgctcacca gcgaggaaga aatcaaaacc actaaccctg tggctacaga ggaatacggt 1740atcgtgggtg ataacttgca gttgtataac acggctcctg gttcggtgtt tgtcaacagc 1800cagggggcct tacccggtat ggtctggcag aaccgggacg tgtacctgca gggtcccatc 1860tgggccaaga ttcctcacac ggacggcaac ttccacccgt ctccgctgat gggcggcttt 1920ggcctgaaac atcctccgcc tcagatcctg atcaagaaca cgcctgtacc tgcggatcct 1980ccgaccacct tcaaccagtc aaagctgaac tctttcatca cgcaatacag caccggacag 2040gtcagcgtgg aaattgaatg ggagctgcag aaggaaaaca gcaagcgctg gaaccccgag 2100atccagtaca cctccaacta ctacaaatct acaagtgtgg actttgctgt taatacagaa 2160ggcgtgtact ctgaaccccg ccccattggc acccgttacc tcacccgtaa tctgtaa 22172738PRTArtificial Sequenceconstructed sequence 2Met Ala Ala Asp Gly Tyr Leu Pro Asp Trp Leu Glu Asp Asn Leu Ser1 5 10 15Glu Gly Ile Arg Glu Trp Trp Ala Leu Lys Pro Gly Ala Pro Lys Pro 20 25 30Lys Ala Asn Gln Gln Lys Gln Asp Asp Gly Arg Gly Leu Val Leu Pro 35 40 45Gly Tyr Lys Tyr Leu Gly Pro Phe Asn Gly Leu Asp Lys Gly Glu Pro 50 55 60Val Asn Ala Ala Asp Ala Ala Ala Leu Glu His Asp Lys Ala Tyr Asp65 70 75 80Gln Gln Leu Gln Ala Gly Asp Asn Pro Tyr Leu Arg Tyr Asn His Ala 85 90 95Asp Ala Glu Phe Gln Glu Arg Leu Gln Glu Asp Thr Ser Phe Gly Gly 100 105 110Asn Leu Gly Arg Ala Val Phe Gln Ala Lys Lys Arg Val Leu Glu Pro 115 120 125Leu Gly Leu Val Glu Glu Gly Ala Lys Thr Ala Pro Gly Lys Lys Arg 130 135 140Pro Val Glu Pro Ser Pro Gln Arg Ser Pro Asp Ser Ser Thr Gly Ile145 150 155 160Gly Lys Lys Gly Gln Gln Pro Ala Arg Lys Arg Leu Asn Phe Gly Gln 165 170 175Thr Gly Asp Ser Glu Ser Val Pro Asp Pro Gln Pro Leu Gly Glu Pro 180 185 190Pro Ala Ala Pro Ser Gly Val Gly Pro Asn Thr Met Ala Ala Gly Gly 195 200 205Gly Ala Pro Met Ala Asp Asn Asn Glu Gly Ala Asp Gly Val Gly Ser 210 215 220Ser Ser Gly Asn Trp His Cys Asp Ser Thr Trp Leu Gly Asp Arg Val225 230 235 240Ile Thr Thr Ser Thr Arg Thr Trp Ala Leu Pro Thr Tyr Asn Asn His 245 250 255Leu Tyr Lys Gln Ile Ser Asn Gly Thr Ser Gly Gly Ala Thr Asn Asp 260 265 270Asn Thr Tyr Phe Gly Tyr Ser Thr Pro Trp Gly Tyr Phe Asp 
Phe Asn 275 280 285Arg Phe His Cys His Phe Ser Pro Arg Asp Trp Gln Arg Leu Ile Asn 290 295 300Asn Asn Trp Gly Phe Arg Pro Lys Arg Leu Ser Phe Lys Leu Phe Asn305 310 315 320Ile Gln Val Lys Glu Val Thr Gln Asn Glu Gly Thr Lys Thr Ile Ala 325 330 335Asn Asn Leu Thr Ser Thr Ile Gln Val Phe Thr Asp Ser Glu Tyr Gln 340 345 350Leu Pro Tyr Val Leu Gly Ser Ala His Gln Gly Cys Leu Pro Pro Phe 355 360 365Pro Ala Asp Val Phe Met Ile Pro Gln Tyr Gly Tyr Leu Thr Leu Asn 370 375 380Asn Gly Ser Gln Ala Val Gly Arg Ser Ser Phe Tyr Cys Leu Glu Tyr385 390 395 400Phe Pro Ser Gln Met Leu Arg Thr Gly Asn Asn Phe Gln Phe Thr Tyr 405 410 415Thr Phe Glu Asp Val Pro Phe His Ser Ser Tyr Ala His Ser Gln Ser 420 425 430Leu Asp Arg Leu Met Asn Pro Leu Ile Asp Gln Tyr Leu Tyr Tyr Leu 435 440 445Ser Arg Thr Gln Thr Thr Gly Gly Thr Ala Asn Thr Gln Thr Leu Gly 450 455 460Phe Ser Gln Gly Gly Pro Asn Thr Met Ala Asn Gln Ala Lys Asn Trp465 470 475 480Leu Pro Gly Pro Cys Tyr Arg Gln Gln Arg Val Ser Thr Thr Thr Gly 485 490 495Gln Asn Asn Asn Ser Asn Phe Ala Trp Thr Ala Gly Thr Lys Tyr His 500 505 510Leu Asn Gly Arg Asn Ser Leu Ala Asn Pro Gly Ile Ala Met Ala Thr 515 520 525His Lys Asp Asp Glu Glu Arg Phe Phe Pro Ser Asn Gly Ile Leu Ile 530 535 540Phe Gly Lys Gln Asn Ala Ala Arg Asp Asn Ala Asp Tyr Ser Asp Val545 550 555 560Met Leu Thr Ser Glu Glu Glu Ile Lys Thr Thr Asn Pro Val Ala Thr 565 570 575Glu Glu Tyr Gly Ile Val Gly Asp Asn Leu Gln Leu Tyr Asn Thr Ala 580 585 590Pro Gly Ser Val Phe Val Asn Ser Gln Gly Ala Leu Pro Gly Met Val 595 600 605Trp Gln Asn Arg Asp Val Tyr Leu Gln Gly Pro Ile Trp Ala Lys Ile 610 615 620Pro His Thr Asp Gly Asn Phe His Pro Ser Pro Leu Met Gly Gly Phe625 630 635 640Gly Leu Lys His Pro Pro Pro Gln Ile Leu Ile Lys Asn Thr Pro Val 645 650 655Pro Ala Asp Pro Pro Thr Thr Phe Asn Gln Ser Lys Leu Asn Ser Phe 660 665 670Ile Thr Gln Tyr Ser Thr Gly Gln Val Ser Val Glu Ile Glu Trp Glu 675 680 685Leu Gln Lys Glu Asn Ser Lys Arg Trp Asn Pro Glu Ile Gln Tyr Thr 690 695 700Ser Asn Tyr Tyr 
Lys Ser Thr Ser Val Asp Phe Ala Val Asn Thr Glu705 710 715 720Gly Val Tyr Ser Glu Pro Arg Pro Ile Gly Thr Arg Tyr Leu Thr Arg 725 730 735Asn Leu32217DNAArtificial Sequenceconstructed sequence 3atggctgccg atggttatct tccagattgg ctcgaggaca acctctctga gggcattcgc 60gagtggtggg cgctgaaacc tggagccccg aagcccaaag ccaaccagca aaagcaggac 120gacggccggg gtctggtgct tcctggctac aagtacctcg gacccttcaa cggactcgac 180aagggggagc ccgtcaacgc ggcggacgca gcggccctcg agcacgacaa ggcctacgac 240cagcagctgc aggcgggtga caatccgtac ctgcggtata accacgccga cgccgagttt 300caggagcgtc tgcaagaaga tacgtctttt gggggcaacc tcgggcgagc agtcttccag 360gccaagaagc gggttctcga acctctcggt ctggttgagg aaggcgctaa gacggctcct 420ggaaagaaga gaccggtaga gccatcaccc cagcgttctc cagactcctc tacgggcatc 480ggcaagaaag gccaacagcc cgccagaaaa agactcaatt ttggtcagac tggcgactca 540gagtcagttc cagaccctca acctctcgga gaacctccag cagcgccctc tggtgtggga 600cctaatacaa tggctgcagg cggtggcgca ccaatggcag acaataacga aggcgccgac 660ggagtgggta gttcctcggg aaattggcat tgcgattcca catggctggg cgacagagtc 720atcaccacca gcacccgaac ctgggccctg cccacctaca acaaccacct ctacaagcaa 780atctccaacg ggacatcggg aggagccacc aacgacaaca cctacttcgg ctacagcacc 840ccctgggggt attttgactt taacagattc cactgccact tttcaccacg tgactggcag 900cgactcatca acaacaactg gggattccgg cccaagagac tcagcttcaa gctcttcaac 960atccaggtca aggaggtcac gcagaatgaa ggcaccaaga ccatcgccaa taacctcacc 1020agcaccatcc aggtgtttac ggactcggag taccagctgc cgtacgttct cggctctgcc 1080caccagggct gcctgcctcc gttcccggcg gacgtgttca tgattcccca gtacggctac 1140ctaacactca acaacggtag tcaggccgtg ggacgctcct ccttctactg cctggaatac 1200tttccttcgc agatgctgag aaccggcaac aacttccagt ttacttacac cttcgaggac 1260gtgcctttcc acagcagcta cgcccacagc cagagcttgg accggctgat gaatcctctg 1320attgaccagt acctgtacta cttgtctcgg actcaaacaa caggaggcac ggcaaatacg 1380cagactctgg gcttcagcca aggtgggcct aatacaatgg ccaatcaggc aaagaactgg 1440ctgccaggac cctgttaccg ccaacaacgc gtctcaacga caaccgggca aaacaacaat 1500agcaactttg cctggactgc tgggaccaaa taccatctga atggaagaaa ttcattggct 
1560aatcctggca tcgctatggc aacacacaaa gacgacgagg agcgtttttt tcccagtaac 1620gggatcctga tttttggcaa acaaaatgct gccagagaca atgcggatta cagcgatgtc 1680atgctcacca gcgaggaaga aatcaaaacc actaaccctg tggctacaga ggaatacggt 1740atcgtgtctg ataacttgca gtttcgtaac acggctcctt tgtggtcttc tgtcaacagc 1800cagggggcct tacccggtat ggtctggcag aaccgggacg tgtacctgca gggtcccatc 1860tgggccaaga ttcctcacac ggacggcaac ttccacccgt ctccgctgat gggcggcttt 1920ggcctgaaac atcctccgcc tcagatcctg atcaagaaca cgcctgtacc tgcggatcct 1980ccgaccacct tcaaccagtc aaagctgaac tctttcatca cgcaatacag caccggacag 2040gtcagcgtgg aaattgaatg ggagctgcag aaggaaaaca gcaagcgctg gaaccccgag 2100atccagtaca cctccaacta ctacaaatct acaagtgtgg actttgctgt taatacagaa 2160ggcgtgtact ctgaaccccg ccccattggc acccgttacc tcacccgtaa tctgtaa 22174738PRTArtificial Sequenceconstructed sequence 4Met Ala Ala Asp Gly Tyr Leu Pro Asp Trp Leu Glu Asp Asn Leu Ser1 5 10 15Glu Gly Ile Arg Glu Trp Trp Ala Leu Lys Pro Gly Ala Pro Lys Pro 20 25 30Lys Ala Asn Gln Gln Lys Gln Asp Asp Gly Arg Gly Leu Val Leu Pro 35 40 45Gly Tyr Lys Tyr Leu Gly Pro Phe Asn Gly Leu Asp Lys Gly Glu Pro 50 55 60Val Asn Ala Ala Asp Ala Ala Ala Leu Glu His Asp Lys Ala Tyr Asp65 70 75 80Gln Gln Leu Gln Ala Gly Asp Asn Pro Tyr Leu Arg Tyr Asn His Ala 85 90 95Asp Ala Glu Phe Gln Glu Arg Leu Gln Glu Asp Thr Ser Phe Gly Gly 100 105 110Asn Leu Gly Arg Ala Val Phe Gln Ala Lys Lys Arg Val Leu Glu Pro 115 120 125Leu Gly Leu Val Glu Glu Gly Ala Lys Thr Ala Pro Gly Lys Lys Arg 130 135 140Pro Val Glu Pro Ser Pro Gln Arg Ser Pro Asp Ser Ser Thr Gly Ile145 150 155 160Gly Lys Lys Gly Gln Gln Pro Ala Arg Lys Arg Leu Asn Phe Gly Gln 165 170 175Thr Gly Asp Ser Glu Ser Val Pro Asp Pro Gln Pro Leu Gly Glu Pro 180 185 190Pro Ala Ala Pro Ser Gly Val Gly Pro Asn Thr Met Ala Ala Gly Gly 195 200 205Gly Ala Pro Met Ala Asp Asn Asn Glu Gly Ala Asp Gly Val Gly Ser 210 215 220Ser Ser Gly Asn Trp His Cys Asp Ser Thr Trp Leu Gly Asp Arg Val225 230 235 240Ile Thr Thr Ser Thr Arg Thr Trp Ala Leu Pro Thr Tyr Asn Asn His 
245 250 255Leu Tyr Lys Gln Ile Ser Asn Gly Thr Ser Gly Gly Ala Thr Asn Asp 260 265 270Asn Thr Tyr Phe Gly Tyr Ser Thr Pro Trp Gly Tyr Phe Asp Phe Asn 275 280 285Arg Phe His Cys His Phe Ser Pro Arg Asp Trp Gln Arg Leu Ile Asn 290 295 300Asn Asn Trp Gly Phe Arg Pro Lys Arg Leu Ser Phe Lys Leu Phe Asn305 310 315 320Ile Gln Val Lys Glu Val Thr Gln Asn Glu Gly Thr Lys Thr Ile Ala 325 330 335Asn Asn Leu Thr Ser Thr Ile Gln Val Phe Thr Asp Ser Glu Tyr Gln 340 345 350Leu Pro Tyr Val Leu Gly Ser Ala His Gln Gly Cys Leu Pro Pro Phe 355 360 365Pro Ala Asp Val Phe Met Ile Pro Gln Tyr Gly Tyr Leu Thr Leu Asn 370 375 380Asn Gly Ser Gln Ala Val Gly Arg Ser Ser Phe Tyr Cys Leu Glu Tyr385 390 395 400Phe Pro Ser Gln Met Leu Arg Thr Gly Asn Asn Phe Gln Phe Thr Tyr 405 410 415Thr Phe Glu Asp Val Pro Phe His Ser Ser Tyr Ala His Ser Gln Ser 420 425 430Leu Asp Arg Leu Met Asn Pro Leu Ile Asp Gln Tyr Leu Tyr Tyr Leu 435 440 445Ser Arg Thr Gln Thr Thr Gly Gly Thr Ala Asn Thr Gln Thr Leu Gly 450 455 460Phe Ser Gln Gly Gly Pro Asn Thr Met Ala Asn Gln Ala Lys Asn Trp465 470 475 480Leu Pro Gly Pro Cys Tyr Arg Gln Gln Arg Val Ser Thr Thr Thr Gly 485 490 495Gln Asn Asn Asn Ser Asn Phe Ala Trp Thr Ala Gly Thr Lys Tyr His 500 505 510Leu Asn Gly Arg Asn Ser Leu Ala Asn Pro Gly Ile Ala Met Ala Thr 515 520 525His Lys Asp Asp Glu Glu Arg Phe Phe Pro Ser Asn Gly Ile Leu Ile 530 535 540Phe Gly Lys Gln Asn Ala Ala Arg Asp Asn Ala Asp Tyr Ser Asp Val545 550 555 560Met Leu Thr Ser Glu Glu Glu Ile Lys Thr Thr Asn Pro Val Ala Thr 565 570 575Glu Glu Tyr Gly Ile Val Ser Asp Asn Leu Gln Phe Arg Asn Thr Ala 580 585 590Pro Leu Trp Ser Ser Val Asn Ser Gln Gly Ala Leu Pro Gly Met Val 595 600 605Trp Gln Asn Arg Asp Val Tyr Leu Gln Gly Pro Ile Trp Ala Lys Ile 610 615 620Pro His Thr Asp Gly Asn Phe His Pro Ser Pro Leu Met Gly Gly Phe625 630 635 640Gly Leu Lys His Pro Pro Pro Gln Ile Leu Ile Lys Asn Thr Pro Val 645 650 655Pro Ala Asp Pro Pro Thr Thr Phe Asn Gln Ser Lys Leu Asn Ser Phe 660 665 670Ile Thr Gln Tyr Ser Thr 
Gly Gln Val Ser Val Glu Ile Glu Trp Glu 675 680 685Leu Gln Lys Glu Asn Ser Lys Arg Trp Asn Pro Glu Ile Gln Tyr Thr 690 695 700Ser Asn Tyr Tyr Lys Ser Thr Ser Val Asp Phe Ala Val Asn Thr Glu705 710 715 720Gly Val Tyr Ser Glu Pro Arg Pro Ile Gly Thr Arg Tyr Leu Thr Arg 725 730 735Asn Leu52217DNAArtificial Sequenceconstructed sequence 5atggctgccg atggttatct tccagattgg ctcgaggaca acctctctga gggcattcgc 60gagtggtggg cgctgaaacc tggagccccg aagcccaaag ccaaccagca aaagcaggac 120gacggccggg gtctggtgct tcctggctac aagtacctcg gacccttcaa cggactcgac 180aagggggagc ccgtcaacgc ggcggacgca gcggccctcg agcacgacaa ggcctacgac 240cagcagctgc aggcgggtga caatccgtac ctgcggtata accacgccga cgccgagttt 300caggagcgtc tgcaagaaga tacgtctttt gggggcaacc tcgggcgagc agtcttccag 360gccaagaagc gggttctcga acctctcggt ctggttgagg aaggcgctaa gacggctcct 420ggaaagaaga gaccggtaga gccatcaccc cagcgttctc cagactcctc tacgggcatc 480ggcaagaaag gccaacagcc cgccagaaaa agactcaatt ttggtcagac tggcgactca 540gagtcagttc cagaccctca acctctcgga gaacctccag cagcgccctc tggtgtggga 600cctaatacaa tggctgcagg cggtggcgca ccaatggcag acaataacga aggcgccgac 660ggagtgggta gttcctcggg aaattggcat tgcgattcca catggctggg cgacagagtc 720atcaccacca gcacccgaac ctgggccctg cccacctaca acaaccacct ctacaagcaa 780atctccaacg ggacatcggg aggagccacc aacgacaaca cctacttcgg ctacagcacc 840ccctgggggt attttgactt taacagattc cactgccact tttcaccacg tgactggcag 900cgactcatca acaacaactg gggattccgg cccaagagac tcagcttcaa gctcttcaac 960atccaggtca aggaggtcac gcagaatgaa ggcaccaaga ccatcgccaa taacctcacc 1020agcaccatcc aggtgtttac ggactcggag taccagctgc cgtacgttct cggctctgcc 1080caccagggct gcctgcctcc gttcccggcg gacgtgttca tgattcccca gtacggctac 1140ctaacactca acaacggtag tcaggccgtg ggacgctcct ccttctactg cctggaatac 1200tttccttcgc agatgctgag aaccggcaac aacttccagt ttacttacac cttcgaggac 1260gtgcctttcc acagcagcta cgcccacagc cagagcttgg accggctgat gaatcctctg 1320attgaccagt acctgtacta cttgtctcgg actcaaacaa caggaggcac ggcaaatacg 1380cagactctgg gcttcagcca aggtgggcct aatacaatgg

ccaatcaggc aaagaactgg 1440ctgccaggac cctgttaccg ccaacaacgc gtctcaacga caaccgggca aaacaacaat 1500agcaactttg cctggactgc tgggaccaaa taccatctga atggaagaaa ttcattggct 1560aatcctggca tcgctatggc aacacacaaa gacgacgagg agcgtttttt tcccagtaac 1620gggatcctga tttttggcaa acaaaatgct gccagagaca atgcggatta cagcgatgtc 1680atgctcacca gcgaggaaga aatcaaaacc actaaccctg tggctacaga ggaatacggt 1740atcgtgaatg ataacttgca ggtttgtaac acggctcctg atgatgttat ggtcaacagc 1800cagggggcct tacccggtat ggtctggcag aaccgggacg tgtacctgca gggtcccatc 1860tgggccaaga ttcctcacac ggacggcaac ttccacccgt ctccgctgat gggcggcttt 1920ggcctgaaac atcctccgcc tcagatcctg atcaagaaca cgcctgtacc tgcggatcct 1980ccgaccacct tcaaccagtc aaagctgaac tctttcatca cgcaatacag caccggacag 2040gtcagcgtgg aaattgaatg ggagctgcag aaggaaaaca gcaagcgctg gaaccccgag 2100atccagtaca cctccaacta ctacaaatct acaagtgtgg actttgctgt taatacagaa 2160ggcgtgtact ctgaaccccg ccccattggc acccgttacc tcacccgtaa tctgtaa 22176738PRTArtificial Sequenceconstructed sequence 6Met Ala Ala Asp Gly Tyr Leu Pro Asp Trp Leu Glu Asp Asn Leu Ser1 5 10 15Glu Gly Ile Arg Glu Trp Trp Ala Leu Lys Pro Gly Ala Pro Lys Pro 20 25 30Lys Ala Asn Gln Gln Lys Gln Asp Asp Gly Arg Gly Leu Val Leu Pro 35 40 45Gly Tyr Lys Tyr Leu Gly Pro Phe Asn Gly Leu Asp Lys Gly Glu Pro 50 55 60Val Asn Ala Ala Asp Ala Ala Ala Leu Glu His Asp Lys Ala Tyr Asp65 70 75 80Gln Gln Leu Gln Ala Gly Asp Asn Pro Tyr Leu Arg Tyr Asn His Ala 85 90 95Asp Ala Glu Phe Gln Glu Arg Leu Gln Glu Asp Thr Ser Phe Gly Gly 100 105 110Asn Leu Gly Arg Ala Val Phe Gln Ala Lys Lys Arg Val Leu Glu Pro 115 120 125Leu Gly Leu Val Glu Glu Gly Ala Lys Thr Ala Pro Gly Lys Lys Arg 130 135 140Pro Val Glu Pro Ser Pro Gln Arg Ser Pro Asp Ser Ser Thr Gly Ile145 150 155 160Gly Lys Lys Gly Gln Gln Pro Ala Arg Lys Arg Leu Asn Phe Gly Gln 165 170 175Thr Gly Asp Ser Glu Ser Val Pro Asp Pro Gln Pro Leu Gly Glu Pro 180 185 190Pro Ala Ala Pro Ser Gly Val Gly Pro Asn Thr Met Ala Ala Gly Gly 195 200 205Gly Ala Pro Met Ala Asp Asn Asn Glu Gly Ala Asp Gly Val 
Gly Ser 210 215 220Ser Ser Gly Asn Trp His Cys Asp Ser Thr Trp Leu Gly Asp Arg Val225 230 235 240Ile Thr Thr Ser Thr Arg Thr Trp Ala Leu Pro Thr Tyr Asn Asn His 245 250 255Leu Tyr Lys Gln Ile Ser Asn Gly Thr Ser Gly Gly Ala Thr Asn Asp 260 265 270Asn Thr Tyr Phe Gly Tyr Ser Thr Pro Trp Gly Tyr Phe Asp Phe Asn 275 280 285Arg Phe His Cys His Phe Ser Pro Arg Asp Trp Gln Arg Leu Ile Asn 290 295 300Asn Asn Trp Gly Phe Arg Pro Lys Arg Leu Ser Phe Lys Leu Phe Asn305 310 315 320Ile Gln Val Lys Glu Val Thr Gln Asn Glu Gly Thr Lys Thr Ile Ala 325 330 335Asn Asn Leu Thr Ser Thr Ile Gln Val Phe Thr Asp Ser Glu Tyr Gln 340 345 350Leu Pro Tyr Val Leu Gly Ser Ala His Gln Gly Cys Leu Pro Pro Phe 355 360 365Pro Ala Asp Val Phe Met Ile Pro Gln Tyr Gly Tyr Leu Thr Leu Asn 370 375 380Asn Gly Ser Gln Ala Val Gly Arg Ser Ser Phe Tyr Cys Leu Glu Tyr385 390 395 400Phe Pro Ser Gln Met Leu Arg Thr Gly Asn Asn Phe Gln Phe Thr Tyr 405 410 415Thr Phe Glu Asp Val Pro Phe His Ser Ser Tyr Ala His Ser Gln Ser 420 425 430Leu Asp Arg Leu Met Asn Pro Leu Ile Asp Gln Tyr Leu Tyr Tyr Leu 435 440 445Ser Arg Thr Gln Thr Thr Gly Gly Thr Ala Asn Thr Gln Thr Leu Gly 450 455 460Phe Ser Gln Gly Gly Pro Asn Thr Met Ala Asn Gln Ala Lys Asn Trp465 470 475 480Leu Pro Gly Pro Cys Tyr Arg Gln Gln Arg Val Ser Thr Thr Thr Gly 485 490 495Gln Asn Asn Asn Ser Asn Phe Ala Trp Thr Ala Gly Thr Lys Tyr His 500 505 510Leu Asn Gly Arg Asn Ser Leu Ala Asn Pro Gly Ile Ala Met Ala Thr 515 520 525His Lys Asp Asp Glu Glu Arg Phe Phe Pro Ser Asn Gly Ile Leu Ile 530 535 540Phe Gly Lys Gln Asn Ala Ala Arg Asp Asn Ala Asp Tyr Ser Asp Val545 550 555 560Met Leu Thr Ser Glu Glu Glu Ile Lys Thr Thr Asn Pro Val Ala Thr 565 570 575Glu Glu Tyr Gly Ile Val Asn Asp Asn Leu Gln Val Cys Asn Thr Ala 580 585 590Pro Asp Asp Val Met Val Asn Ser Gln Gly Ala Leu Pro Gly Met Val 595 600 605Trp Gln Asn Arg Asp Val Tyr Leu Gln Gly Pro Ile Trp Ala Lys Ile 610 615 620Pro His Thr Asp Gly Asn Phe His Pro Ser Pro Leu Met Gly Gly Phe625 630 635 640Gly Leu Lys 
His Pro Pro Pro Gln Ile Leu Ile Lys Asn Thr Pro Val 645 650 655Pro Ala Asp Pro Pro Thr Thr Phe Asn Gln Ser Lys Leu Asn Ser Phe 660 665 670Ile Thr Gln Tyr Ser Thr Gly Gln Val Ser Val Glu Ile Glu Trp Glu 675 680 685Leu Gln Lys Glu Asn Ser Lys Arg Trp Asn Pro Glu Ile Gln Tyr Thr 690 695 700Ser Asn Tyr Tyr Lys Ser Thr Ser Val Asp Phe Ala Val Asn Thr Glu705 710 715 720Gly Val Tyr Ser Glu Pro Arg Pro Ile Gly Thr Arg Tyr Leu Thr Arg 725 730 735Asn Leu72217DNAArtificial Sequenceconstructed sequence 7atggctgccg atggttatct tccagattgg ctcgaggaca acctctctga gggcattcgc 60gagtggtggg cgctgaaacc tggagccccg aagcccaaag ccaaccagca aaagcaggac 120gacggccggg gtctggtgct tcctggctac aagtacctcg gacccttcaa cggactcgac 180aagggggagc ccgtcaacgc ggcggacgca gcggccctcg agcacgacaa ggcctacgac 240cagcagctgc aggcgggtga caatccgtac ctgcggtata accacgccga cgccgagttt 300caggagcgtc tgcaagaaga tacgtctttt gggggcaacc tcgggcgagc agtcttccag 360gccaagaagc gggttctcga acctctcggt ctggttgagg aaggcgctaa gacggctcct 420ggaaagaaga gaccggtaga gccatcaccc cagcgttctc cagactcctc tacgggcatc 480ggcaagaaag gccaacagcc cgccagaaaa agactcaatt ttggtcagac tggcgactca 540gagtcagttc cagaccctca acctctcgga gaacctccag cagcgccctc tggtgtggga 600cctaatacaa tggctgcagg cggtggcgca ccaatggcag acaataacga aggcgccgac 660ggagtgggta gttcctcggg aaattggcat tgcgattcca catggctggg cgacagagtc 720atcaccacca gcacccgaac ctgggccctg cccacctaca acaaccacct ctacaagcaa 780atctccaacg ggacatcggg aggagccacc aacgacaaca cctacttcgg ctacagcacc 840ccctgggggt attttgactt taacagattc cactgccact tttcaccacg tgactggcag 900cgactcatca acaacaactg gggattccgg cccaagagac tcagcttcaa gctcttcaac 960atccaggtca aggaggtcac gcagaatgaa ggcaccaaga ccatcgccaa taacctcacc 1020agcaccatcc aggtgtttac ggactcggag taccagctgc cgtacgttct cggctctgcc 1080caccagggct gcctgcctcc gttcccggcg gacgtgttca tgattcccca gtacggctac 1140ctaacactca acaacggtag tcaggccgtg ggacgctcct ccttctactg cctggaatac 1200tttccttcgc agatgctgag aaccggcaac aacttccagt ttacttacac cttcgaggac 1260gtgcctttcc acagcagcta cgcccacagc cagagcttgg 
accggctgat gaatcctctg 1320attgaccagt acctgtacta cttgtctcgg actcaaacaa caggaggcac ggcaaatacg 1380cagactctgg gcttcagcca aggtgggcct aatacaatgg ccaatcaggc aaagaactgg 1440ctgccaggac cctgttaccg ccaacaacgc gtctcaacga caaccgggca aaacaacaat 1500agcaactttg cctggactgc tgggaccaaa taccatctga atggaagaaa ttcattggct 1560aatcctggca tcgctatggc aacacacaaa gacgacgagg agcgtttttt tcccagtaac 1620gggatcctga tttttggcaa acaaaatgct gccagagaca atgcggatta cagcgatgtc 1680atgctcacca gcgaggaaga aatcaaaacc actaaccctg tggctacaga ggaatacggt 1740atcgtgtgtg ataacttgca gggttataac acggctcctc tgtgtgttgc tgtcaacagc 1800cagggggcct tacccggtat ggtctggcag aaccgggacg tgtacctgca gggtcccatc 1860tgggccaaga ttcctcacac ggacggcaac ttccacccgt ctccgctgat gggcggcttt 1920ggcctgaaac atcctccgcc tcagatcctg atcaagaaca cgcctgtacc tgcggatcct 1980ccgaccacct tcaaccagtc aaagctgaac tctttcatca cgcaatacag caccggacag 2040gtcagcgtgg aaattgaatg ggagctgcag aaggaaaaca gcaagcgctg gaaccccgag 2100atccagtaca cctccaacta ctacaaatct acaagtgtgg actttgctgt taatacagaa 2160ggcgtgtact ctgaaccccg ccccattggc acccgttacc tcacccgtaa tctgtaa 22178738PRTArtificial Sequenceconstructed sequence 8Met Ala Ala Asp Gly Tyr Leu Pro Asp Trp Leu Glu Asp Asn Leu Ser1 5 10 15Glu Gly Ile Arg Glu Trp Trp Ala Leu Lys Pro Gly Ala Pro Lys Pro 20 25 30Lys Ala Asn Gln Gln Lys Gln Asp Asp Gly Arg Gly Leu Val Leu Pro 35 40 45Gly Tyr Lys Tyr Leu Gly Pro Phe Asn Gly Leu Asp Lys Gly Glu Pro 50 55 60Val Asn Ala Ala Asp Ala Ala Ala Leu Glu His Asp Lys Ala Tyr Asp65 70 75 80Gln Gln Leu Gln Ala Gly Asp Asn Pro Tyr Leu Arg Tyr Asn His Ala 85 90 95Asp Ala Glu Phe Gln Glu Arg Leu Gln Glu Asp Thr Ser Phe Gly Gly 100 105 110Asn Leu Gly Arg Ala Val Phe Gln Ala Lys Lys Arg Val Leu Glu Pro 115 120 125Leu Gly Leu Val Glu Glu Gly Ala Lys Thr Ala Pro Gly Lys Lys Arg 130 135 140Pro Val Glu Pro Ser Pro Gln Arg Ser Pro Asp Ser Ser Thr Gly Ile145 150 155 160Gly Lys Lys Gly Gln Gln Pro Ala Arg Lys Arg Leu Asn Phe Gly Gln 165 170 175Thr Gly Asp Ser Glu Ser Val Pro Asp Pro Gln Pro Leu Gly Glu Pro 
180 185 190Pro Ala Ala Pro Ser Gly Val Gly Pro Asn Thr Met Ala Ala Gly Gly 195 200 205Gly Ala Pro Met Ala Asp Asn Asn Glu Gly Ala Asp Gly Val Gly Ser 210 215 220Ser Ser Gly Asn Trp His Cys Asp Ser Thr Trp Leu Gly Asp Arg Val225 230 235 240Ile Thr Thr Ser Thr Arg Thr Trp Ala Leu Pro Thr Tyr Asn Asn His 245 250 255Leu Tyr Lys Gln Ile Ser Asn Gly Thr Ser Gly Gly Ala Thr Asn Asp 260 265 270Asn Thr Tyr Phe Gly Tyr Ser Thr Pro Trp Gly Tyr Phe Asp Phe Asn 275 280 285Arg Phe His Cys His Phe Ser Pro Arg Asp Trp Gln Arg Leu Ile Asn 290 295 300Asn Asn Trp Gly Phe Arg Pro Lys Arg Leu Ser Phe Lys Leu Phe Asn305 310 315 320Ile Gln Val Lys Glu Val Thr Gln Asn Glu Gly Thr Lys Thr Ile Ala 325 330 335Asn Asn Leu Thr Ser Thr Ile Gln Val Phe Thr Asp Ser Glu Tyr Gln 340 345 350Leu Pro Tyr Val Leu Gly Ser Ala His Gln Gly Cys Leu Pro Pro Phe 355 360 365Pro Ala Asp Val Phe Met Ile Pro Gln Tyr Gly Tyr Leu Thr Leu Asn 370 375 380Asn Gly Ser Gln Ala Val Gly Arg Ser Ser Phe Tyr Cys Leu Glu Tyr385 390 395 400Phe Pro Ser Gln Met Leu Arg Thr Gly Asn Asn Phe Gln Phe Thr Tyr 405 410 415Thr Phe Glu Asp Val Pro Phe His Ser Ser Tyr Ala His Ser Gln Ser 420 425 430Leu Asp Arg Leu Met Asn Pro Leu Ile Asp Gln Tyr Leu Tyr Tyr Leu 435 440 445Ser Arg Thr Gln Thr Thr Gly Gly Thr Ala Asn Thr Gln Thr Leu Gly 450 455 460Phe Ser Gln Gly Gly Pro Asn Thr Met Ala Asn Gln Ala Lys Asn Trp465 470 475 480Leu Pro Gly Pro Cys Tyr Arg Gln Gln Arg Val Ser Thr Thr Thr Gly 485 490 495Gln Asn Asn Asn Ser Asn Phe Ala Trp Thr Ala Gly Thr Lys Tyr His 500 505 510Leu Asn Gly Arg Asn Ser Leu Ala Asn Pro Gly Ile Ala Met Ala Thr 515 520 525His Lys Asp Asp Glu Glu Arg Phe Phe Pro Ser Asn Gly Ile Leu Ile 530 535 540Phe Gly Lys Gln Asn Ala Ala Arg Asp Asn Ala Asp Tyr Ser Asp Val545 550 555 560Met Leu Thr Ser Glu Glu Glu Ile Lys Thr Thr Asn Pro Val Ala Thr 565 570 575Glu Glu Tyr Gly Ile Val Cys Asp Asn Leu Gln Gly Tyr Asn Thr Ala 580 585 590Pro Leu Cys Val Ala Val Asn Ser Gln Gly Ala Leu Pro Gly Met Val 595 600 605Trp Gln Asn Arg Asp Val 
Tyr Leu Gln Gly Pro Ile Trp Ala Lys Ile 610 615 620Pro His Thr Asp Gly Asn Phe His Pro Ser Pro Leu Met Gly Gly Phe625 630 635 640Gly Leu Lys His Pro Pro Pro Gln Ile Leu Ile Lys Asn Thr Pro Val 645 650 655Pro Ala Asp Pro Pro Thr Thr Phe Asn Gln Ser Lys Leu Asn Ser Phe 660 665 670Ile Thr Gln Tyr Ser Thr Gly Gln Val Ser Val Glu Ile Glu Trp Glu 675 680 685Leu Gln Lys Glu Asn Ser Lys Arg Trp Asn Pro Glu Ile Gln Tyr Thr 690 695 700Ser Asn Tyr Tyr Lys Ser Thr Ser Val Asp Phe Ala Val Asn Thr Glu705 710 715 720Gly Val Tyr Ser Glu Pro Arg Pro Ile Gly Thr Arg Tyr Leu Thr Arg 725 730 735Asn Leu92217DNAArtificial Sequenceconstructed sequence 9atggctgccg atggttatct tccagattgg ctcgaggaca acctctctga gggcattcgc 60gagtggtggg cgctgaaacc tggagccccg aagcccaaag ccaaccagca aaagcaggac 120gacggccggg gtctggtgct tcctggctac aagtacctcg gacccttcaa cggactcgac 180aagggggagc ccgtcaacgc ggcggacgca gcggccctcg agcacgacaa ggcctacgac 240cagcagctgc aggcgggtga caatccgtac ctgcggtata accacgccga cgccgagttt 300caggagcgtc tgcaagaaga tacgtctttt gggggcaacc tcgggcgagc agtcttccag 360gccaagaagc gggttctcga acctctcggt ctggttgagg aaggcgctaa gacggctcct 420ggaaagaaga gaccggtaga gccatcaccc cagcgttctc cagactcctc tacgggcatc 480ggcaagaaag gccaacagcc cgccagaaaa agactcaatt ttggtcagac tggcgactca 540gagtcagttc cagaccctca acctctcgga gaacctccag cagcgccctc tggtgtggga 600cctaatacaa tggctgcagg cggtggcgca ccaatggcag acaataacga aggcgccgac 660ggagtgggta gttcctcggg aaattggcat tgcgattcca catggctggg cgacagagtc 720atcaccacca gcacccgaac ctgggccctg cccacctaca acaaccacct ctacaagcaa 780atctccaacg ggacatcggg aggagccacc aacgacaaca cctacttcgg ctacagcacc 840ccctgggggt attttgactt taacagattc cactgccact tttcaccacg tgactggcag 900cgactcatca acaacaactg gggattccgg cccaagagac tcagcttcaa gctcttcaac 960atccaggtca aggaggtcac gcagaatgaa ggcaccaaga ccatcgccaa taacctcacc 1020agcaccatcc aggtgtttac ggactcggag taccagctgc cgtacgttct cggctctgcc 1080caccagggct gcctgcctcc gttcccggcg gacgtgttca tgattcccca gtacggctac 1140ctaacactca acaacggtag tcaggccgtg ggacgctcct 
ccttctactg cctggaatac 1200tttccttcgc agatgctgag aaccggcaac aacttccagt ttacttacac cttcgaggac 1260gtgcctttcc acagcagcta cgcccacagc cagagcttgg accggctgat gaatcctctg 1320attgaccagt acctgtacta cttgtctcgg actcaaacaa caggaggcac ggcaaatacg 1380cagactctgg gcttcagcca aggtgggcct aatacaatgg ccaatcaggc aaagaactgg 1440ctgccaggac cctgttaccg ccaacaacgc gtctcaacga caaccgggca aaacaacaat 1500agcaactttg cctggactgc tgggaccaaa taccatctga atggaagaaa ttcattggct 1560aatcctggca tcgctatggc aacacacaaa gacgacgagg agcgtttttt tcccagtaac 1620gggatcctga tttttggcaa acaaaatgct gccagagaca atgcggatta cagcgatgtc 1680atgctcacca gcgaggaaga aatcaaaacc actaaccctg tggctacaga ggaatacggt 1740atcgtggttg ataacttgca gtttcttaac acggctcctg ctggtgaggc ggtcaacagc 1800cagggggcct tacccggtat ggtctggcag aaccgggacg tgtacctgca gggtcccatc 1860tgggccaaga ttcctcacac ggacggcaac ttccacccgt ctccgctgat gggcggcttt 1920ggcctgaaac atcctccgcc tcagatcctg atcaagaaca cgcctgtacc tgcggatcct 1980ccgaccacct tcaaccagtc aaagctgaac tctttcatca cgcaatacag caccggacag 2040gtcagcgtgg aaattgaatg ggagctgcag aaggaaaaca gcaagcgctg gaaccccgag 2100atccagtaca cctccaacta ctacaaatct acaagtgtgg actttgctgt taatacagaa 2160ggcgtgtact ctgaaccccg ccccattggc acccgttacc tcacccgtaa tctgtaa 221710738PRTArtificial Sequenceconstructed sequence 10Met Ala Ala Asp Gly Tyr Leu Pro Asp Trp Leu Glu Asp Asn Leu Ser1 5 10 15Glu Gly Ile Arg Glu Trp Trp Ala Leu Lys Pro Gly Ala Pro Lys Pro 20 25 30Lys Ala Asn Gln Gln Lys Gln Asp Asp Gly Arg Gly Leu Val Leu Pro 35 40 45Gly Tyr Lys Tyr Leu Gly Pro Phe Asn Gly Leu Asp Lys Gly Glu Pro 50 55 60Val Asn Ala Ala Asp Ala Ala Ala Leu Glu His Asp Lys Ala Tyr Asp65 70 75 80Gln Gln Leu Gln Ala Gly Asp Asn Pro Tyr Leu Arg Tyr Asn His Ala 85 90 95Asp Ala Glu Phe Gln Glu Arg Leu Gln Glu Asp
Thr Ser Phe Gly Gly 100 105 110Asn Leu Gly Arg Ala Val Phe Gln Ala Lys Lys Arg Val Leu Glu Pro 115 120 125Leu Gly Leu Val Glu Glu Gly Ala Lys Thr Ala Pro Gly Lys Lys Arg 130 135 140Pro Val Glu Pro Ser Pro Gln Arg Ser Pro Asp Ser Ser Thr Gly Ile145 150 155 160Gly Lys Lys Gly Gln Gln Pro Ala Arg Lys Arg Leu Asn Phe Gly Gln 165 170 175Thr Gly Asp Ser Glu Ser Val Pro Asp Pro Gln Pro Leu Gly Glu Pro 180 185 190Pro Ala Ala Pro Ser Gly Val Gly Pro Asn Thr Met Ala Ala Gly Gly 195 200 205Gly Ala Pro Met Ala Asp Asn Asn Glu Gly Ala Asp Gly Val Gly Ser 210 215 220Ser Ser Gly Asn Trp His Cys Asp Ser Thr Trp Leu Gly Asp Arg Val225 230 235 240Ile Thr Thr Ser Thr Arg Thr Trp Ala Leu Pro Thr Tyr Asn Asn His 245 250 255Leu Tyr Lys Gln Ile Ser Asn Gly Thr Ser Gly Gly Ala Thr Asn Asp 260 265 270Asn Thr Tyr Phe Gly Tyr Ser Thr Pro Trp Gly Tyr Phe Asp Phe Asn 275 280 285Arg Phe His Cys His Phe Ser Pro Arg Asp Trp Gln Arg Leu Ile Asn 290 295 300Asn Asn Trp Gly Phe Arg Pro Lys Arg Leu Ser Phe Lys Leu Phe Asn305 310 315 320Ile Gln Val Lys Glu Val Thr Gln Asn Glu Gly Thr Lys Thr Ile Ala 325 330 335Asn Asn Leu Thr Ser Thr Ile Gln Val Phe Thr Asp Ser Glu Tyr Gln 340 345 350Leu Pro Tyr Val Leu Gly Ser Ala His Gln Gly Cys Leu Pro Pro Phe 355 360 365Pro Ala Asp Val Phe Met Ile Pro Gln Tyr Gly Tyr Leu Thr Leu Asn 370 375 380Asn Gly Ser Gln Ala Val Gly Arg Ser Ser Phe Tyr Cys Leu Glu Tyr385 390 395 400Phe Pro Ser Gln Met Leu Arg Thr Gly Asn Asn Phe Gln Phe Thr Tyr 405 410 415Thr Phe Glu Asp Val Pro Phe His Ser Ser Tyr Ala His Ser Gln Ser 420 425 430Leu Asp Arg Leu Met Asn Pro Leu Ile Asp Gln Tyr Leu Tyr Tyr Leu 435 440 445Ser Arg Thr Gln Thr Thr Gly Gly Thr Ala Asn Thr Gln Thr Leu Gly 450 455 460Phe Ser Gln Gly Gly Pro Asn Thr Met Ala Asn Gln Ala Lys Asn Trp465 470 475 480Leu Pro Gly Pro Cys Tyr Arg Gln Gln Arg Val Ser Thr Thr Thr Gly 485 490 495Gln Asn Asn Asn Ser Asn Phe Ala Trp Thr Ala Gly Thr Lys Tyr His 500 505 510Leu Asn Gly Arg Asn Ser Leu Ala Asn Pro Gly Ile Ala Met Ala Thr 515 520 525His 
Lys Asp Asp Glu Glu Arg Phe Phe Pro Ser Asn Gly Ile Leu Ile 530 535 540Phe Gly Lys Gln Asn Ala Ala Arg Asp Asn Ala Asp Tyr Ser Asp Val545 550 555 560Met Leu Thr Ser Glu Glu Glu Ile Lys Thr Thr Asn Pro Val Ala Thr 565 570 575Glu Glu Tyr Gly Ile Val Val Asp Asn Leu Gln Phe Leu Asn Thr Ala 580 585 590Pro Ala Gly Glu Ala Val Asn Ser Gln Gly Ala Leu Pro Gly Met Val 595 600 605Trp Gln Asn Arg Asp Val Tyr Leu Gln Gly Pro Ile Trp Ala Lys Ile 610 615 620Pro His Thr Asp Gly Asn Phe His Pro Ser Pro Leu Met Gly Gly Phe625 630 635 640Gly Leu Lys His Pro Pro Pro Gln Ile Leu Ile Lys Asn Thr Pro Val 645 650 655Pro Ala Asp Pro Pro Thr Thr Phe Asn Gln Ser Lys Leu Asn Ser Phe 660 665 670Ile Thr Gln Tyr Ser Thr Gly Gln Val Ser Val Glu Ile Glu Trp Glu 675 680 685Leu Gln Lys Glu Asn Ser Lys Arg Trp Asn Pro Glu Ile Gln Tyr Thr 690 695 700Ser Asn Tyr Tyr Lys Ser Thr Ser Val Asp Phe Ala Val Asn Thr Glu705 710 715 720Gly Val Tyr Ser Glu Pro Arg Pro Ile Gly Thr Arg Tyr Leu Thr Arg 725 730 735Asn Leu112217DNAArtificial Sequenceconstructed sequence 11atggctgccg atggttatct tccagattgg ctcgaggaca acctctctga gggcattcgc 60gagtggtggg cgctgaaacc tggagccccg aagcccaaag ccaaccagca aaagcaggac 120gacggccggg gtctggtgct tcctggctac aagtacctcg gacccttcaa cggactcgac 180aagggggagc ccgtcaacgc ggcggacgca gcggccctcg agcacgacaa ggcctacgac 240cagcagctgc aggcgggtga caatccgtac ctgcggtata accacgccga cgccgagttt 300caggagcgtc tgcaagaaga tacgtctttt gggggcaacc tcgggcgagc agtcttccag 360gccaagaagc gggttctcga acctctcggt ctggttgagg aaggcgctaa gacggctcct 420ggaaagaaga gaccggtaga gccatcaccc cagcgttctc cagactcctc tacgggcatc 480ggcaagaaag gccaacagcc cgccagaaaa agactcaatt ttggtcagac tggcgactca 540gagtcagttc cagaccctca acctctcgga gaacctccag cagcgccctc tggtgtggga 600cctaatacaa tggctgcagg cggtggcgca ccaatggcag acaataacga aggcgccgac 660ggagtgggta gttcctcggg aaattggcat tgcgattcca catggctggg cgacagagtc 720atcaccacca gcacccgaac ctgggccctg cccacctaca acaaccacct ctacaagcaa 780atctccaacg ggacatcggg aggagccacc aacgacaaca cctacttcgg ctacagcacc 
840ccctgggggt attttgactt taacagattc cactgccact tttcaccacg tgactggcag 900cgactcatca acaacaactg gggattccgg cccaagagac tcagcttcaa gctcttcaac 960atccaggtca aggaggtcac gcagaatgaa ggcaccaaga ccatcgccaa taacctcacc 1020agcaccatcc aggtgtttac ggactcggag taccagctgc cgtacgttct cggctctgcc 1080caccagggct gcctgcctcc gttcccggcg gacgtgttca tgattcccca gtacggctac 1140ctaacactca acaacggtag tcaggccgtg ggacgctcct ccttctactg cctggaatac 1200tttccttcgc agatgctgag aaccggcaac aacttccagt ttacttacac cttcgaggac 1260gtgcctttcc acagcagcta cgcccacagc cagagcttgg accggctgat gaatcctctg 1320attgaccagt acctgtacta cttgtctcgg actcaaacaa caggaggcac ggcaaatacg 1380cagactctgg gcttcagcca aggtgggcct aatacaatgg ccaatcaggc aaagaactgg 1440ctgccaggac cctgttaccg ccaacaacgc gtctcaacga caaccgggca aaacaacaat 1500agcaactttg cctggactgc tgggaccaaa taccatctga atggaagaaa ttcattggct 1560aatcctggca tcgctatggc aacacacaaa gacgacgagg agcgtttttt tcccagtaac 1620gggatcctga tttttggcaa acaaaatgct gccagagaca atgcggatta cagcgatgtc 1680atgctcacca gcgaggaaga aatcaaaacc actaaccctg tggctacaga ggaatacggt 1740atcgtgcttg ataacttgca ggatggtaac acggctcctg gtgcgtgtgg tgtcaacagc 1800cagggggcct tacccggtat ggtctggcag aaccgggacg tgtacctgca gggtcccatc 1860tgggccaaga ttcctcacac ggacggcaac ttccacccgt ctccgctgat gggcggcttt 1920ggcctgaaac atcctccgcc tcagatcctg atcaagaaca cgcctgtacc tgcggatcct 1980ccgaccacct tcaaccagtc aaagctgaac tctttcatca cgcaatacag caccggacag 2040gtcagcgtgg aaattgaatg ggagctgcag aaggaaaaca gcaagcgctg gaaccccgag 2100atccagtaca cctccaacta ctacaaatct acaagtgtgg actttgctgt taatacagaa 2160ggcgtgtact ctgaaccccg ccccattggc acccgttacc tcacccgtaa tctgtaa 221712738PRTArtificial Sequenceconstructed sequence 12Met Ala Ala Asp Gly Tyr Leu Pro Asp Trp Leu Glu Asp Asn Leu Ser1 5 10 15Glu Gly Ile Arg Glu Trp Trp Ala Leu Lys Pro Gly Ala Pro Lys Pro 20 25 30Lys Ala Asn Gln Gln Lys Gln Asp Asp Gly Arg Gly Leu Val Leu Pro 35 40 45Gly Tyr Lys Tyr Leu Gly Pro Phe Asn Gly Leu Asp Lys Gly Glu Pro 50 55 60Val Asn Ala Ala Asp Ala Ala Ala Leu Glu His Asp Lys 
Ala Tyr Asp65 70 75 80Gln Gln Leu Gln Ala Gly Asp Asn Pro Tyr Leu Arg Tyr Asn His Ala 85 90 95Asp Ala Glu Phe Gln Glu Arg Leu Gln Glu Asp Thr Ser Phe Gly Gly 100 105 110Asn Leu Gly Arg Ala Val Phe Gln Ala Lys Lys Arg Val Leu Glu Pro 115 120 125Leu Gly Leu Val Glu Glu Gly Ala Lys Thr Ala Pro Gly Lys Lys Arg 130 135 140Pro Val Glu Pro Ser Pro Gln Arg Ser Pro Asp Ser Ser Thr Gly Ile145 150 155 160Gly Lys Lys Gly Gln Gln Pro Ala Arg Lys Arg Leu Asn Phe Gly Gln 165 170 175Thr Gly Asp Ser Glu Ser Val Pro Asp Pro Gln Pro Leu Gly Glu Pro 180 185 190Pro Ala Ala Pro Ser Gly Val Gly Pro Asn Thr Met Ala Ala Gly Gly 195 200 205Gly Ala Pro Met Ala Asp Asn Asn Glu Gly Ala Asp Gly Val Gly Ser 210 215 220Ser Ser Gly Asn Trp His Cys Asp Ser Thr Trp Leu Gly Asp Arg Val225 230 235 240Ile Thr Thr Ser Thr Arg Thr Trp Ala Leu Pro Thr Tyr Asn Asn His 245 250 255Leu Tyr Lys Gln Ile Ser Asn Gly Thr Ser Gly Gly Ala Thr Asn Asp 260 265 270Asn Thr Tyr Phe Gly Tyr Ser Thr Pro Trp Gly Tyr Phe Asp Phe Asn 275 280 285Arg Phe His Cys His Phe Ser Pro Arg Asp Trp Gln Arg Leu Ile Asn 290 295 300Asn Asn Trp Gly Phe Arg Pro Lys Arg Leu Ser Phe Lys Leu Phe Asn305 310 315 320Ile Gln Val Lys Glu Val Thr Gln Asn Glu Gly Thr Lys Thr Ile Ala 325 330 335Asn Asn Leu Thr Ser Thr Ile Gln Val Phe Thr Asp Ser Glu Tyr Gln 340 345 350Leu Pro Tyr Val Leu Gly Ser Ala His Gln Gly Cys Leu Pro Pro Phe 355 360 365Pro Ala Asp Val Phe Met Ile Pro Gln Tyr Gly Tyr Leu Thr Leu Asn 370 375 380Asn Gly Ser Gln Ala Val Gly Arg Ser Ser Phe Tyr Cys Leu Glu Tyr385 390 395 400Phe Pro Ser Gln Met Leu Arg Thr Gly Asn Asn Phe Gln Phe Thr Tyr 405 410 415Thr Phe Glu Asp Val Pro Phe His Ser Ser Tyr Ala His Ser Gln Ser 420 425 430Leu Asp Arg Leu Met Asn Pro Leu Ile Asp Gln Tyr Leu Tyr Tyr Leu 435 440 445Ser Arg Thr Gln Thr Thr Gly Gly Thr Ala Asn Thr Gln Thr Leu Gly 450 455 460Phe Ser Gln Gly Gly Pro Asn Thr Met Ala Asn Gln Ala Lys Asn Trp465 470 475 480Leu Pro Gly Pro Cys Tyr Arg Gln Gln Arg Val Ser Thr Thr Thr Gly 485 490 495Gln Asn Asn Asn 
Ser Asn Phe Ala Trp Thr Ala Gly Thr Lys Tyr His 500 505 510Leu Asn Gly Arg Asn Ser Leu Ala Asn Pro Gly Ile Ala Met Ala Thr 515 520 525His Lys Asp Asp Glu Glu Arg Phe Phe Pro Ser Asn Gly Ile Leu Ile 530 535 540Phe Gly Lys Gln Asn Ala Ala Arg Asp Asn Ala Asp Tyr Ser Asp Val545 550 555 560Met Leu Thr Ser Glu Glu Glu Ile Lys Thr Thr Asn Pro Val Ala Thr 565 570 575Glu Glu Tyr Gly Ile Val Leu Asp Asn Leu Gln Asp Gly Asn Thr Ala 580 585 590Pro Gly Ala Cys Gly Val Asn Ser Gln Gly Ala Leu Pro Gly Met Val 595 600 605Trp Gln Asn Arg Asp Val Tyr Leu Gln Gly Pro Ile Trp Ala Lys Ile 610 615 620Pro His Thr Asp Gly Asn Phe His Pro Ser Pro Leu Met Gly Gly Phe625 630 635 640Gly Leu Lys His Pro Pro Pro Gln Ile Leu Ile Lys Asn Thr Pro Val 645 650 655Pro Ala Asp Pro Pro Thr Thr Phe Asn Gln Ser Lys Leu Asn Ser Phe 660 665 670Ile Thr Gln Tyr Ser Thr Gly Gln Val Ser Val Glu Ile Glu Trp Glu 675 680 685Leu Gln Lys Glu Asn Ser Lys Arg Trp Asn Pro Glu Ile Gln Tyr Thr 690 695 700Ser Asn Tyr Tyr Lys Ser Thr Ser Val Asp Phe Ala Val Asn Thr Glu705 710 715 720Gly Val Tyr Ser Glu Pro Arg Pro Ile Gly Thr Arg Tyr Leu Thr Arg 725 730 735Asn Leu132217DNAArtificial Sequenceconstructed sequence 13atggctgccg atggttatct tccagattgg ctcgaggaca acctctctga gggcattcgc 60gagtggtggg cgctgaaacc tggagccccg aagcccaaag ccaaccagca aaagcaggac 120gacggccggg gtctggtgct tcctggctac aagtacctcg gacccttcaa cggactcgac 180aagggggagc ccgtcaacgc ggcggacgca gcggccctcg agcacgacaa ggcctacgac 240cagcagctgc aggcgggtga caatccgtac ctgcggtata accacgccga cgccgagttt 300caggagcgtc tgcaagaaga tacgtctttt gggggcaacc tcgggcgagc agtcttccag 360gccaagaagc gggttctcga acctctcggt ctggttgagg aaggcgctaa gacggctcct 420ggaaagaaga gaccggtaga gccatcaccc cagcgttctc cagactcctc tacgggcatc 480ggcaagaaag gccaacagcc cgccagaaaa agactcaatt ttggtcagac tggcgactca 540gagtcagttc cagaccctca acctctcgga gaacctccag cagcgccctc tggtgtggga 600cctaatacaa tggctgcagg cggtggcgca ccaatggcag acaataacga aggcgccgac 660ggagtgggta gttcctcggg aaattggcat tgcgattcca catggctggg cgacagagtc 
720atcaccacca gcacccgaac ctgggccctg cccacctaca acaaccacct ctacaagcaa 780atctccaacg ggacatcggg aggagccacc aacgacaaca cctacttcgg ctacagcacc 840ccctgggggt attttgactt taacagattc cactgccact tttcaccacg tgactggcag 900cgactcatca acaacaactg gggattccgg cccaagagac tcagcttcaa gctcttcaac 960atccaggtca aggaggtcac gcagaatgaa ggcaccaaga ccatcgccaa taacctcacc 1020agcaccatcc aggtgtttac ggactcggag taccagctgc cgtacgttct cggctctgcc 1080caccagggct gcctgcctcc gttcccggcg gacgtgttca tgattcccca gtacggctac 1140ctaacactca acaacggtag tcaggccgtg ggacgctcct ccttctactg cctggaatac 1200tttccttcgc agatgctgag aaccggcaac aacttccagt ttacttacac cttcgaggac 1260gtgcctttcc acagcagcta cgcccacagc cagagcttgg accggctgat gaatcctctg 1320attgaccagt acctgtacta cttgtctcgg actcaaacaa caggaggcac ggcaaatacg 1380cagactctgg gcttcagcca aggtgggcct aatacaatgg ccaatcaggc aaagaactgg 1440ctgccaggac cctgttaccg ccaacaacgc gtctcaacga caaccgggca aaacaacaat 1500agcaactttg cctggactgc tgggaccaaa taccatctga atggaagaaa ttcattggct 1560aatcctggca tcgctatggc aacacacaaa gacgacgagg agcgtttttt tcccagtaac 1620gggatcctga tttttggcaa acaaaatgct gccagagaca atgcggatta cagcgatgtc 1680atgctcacca gcgaggaaga aatcaaaacc actaaccctg tggctacaga ggaatacggt 1740atcgtgtggg ataacttgca gtctgagaac acggctcctt cggagacttc tgtcaacagc 1800cagggggcct tacccggtat ggtctggcag aaccgggacg tgtacctgca gggtcccatc 1860tgggccaaga ttcctcacac ggacggcaac ttccacccgt ctccgctgat gggcggcttt 1920ggcctgaaac atcctccgcc tcagatcctg atcaagaaca cgcctgtacc tgcggatcct 1980ccgaccacct tcaaccagtc aaagctgaac tctttcatca cgcaatacag caccggacag 2040gtcagcgtgg aaattgaatg ggagctgcag aaggaaaaca gcaagcgctg gaaccccgag 2100atccagtaca cctccaacta ctacaaatct acaagtgtgg actttgctgt taatacagaa 2160ggcgtgtact ctgaaccccg ccccattggc acccgttacc tcacccgtaa tctgtaa 221714738PRTArtificial Sequenceconstructed sequence 14Met Ala Ala Asp Gly Tyr Leu Pro Asp Trp Leu Glu Asp Asn Leu Ser1 5 10 15Glu Gly Ile Arg Glu Trp Trp Ala Leu Lys Pro Gly Ala Pro Lys Pro 20 25 30Lys Ala Asn Gln Gln Lys Gln Asp Asp Gly Arg Gly Leu Val Leu 
Pro 35 40 45Gly Tyr Lys Tyr Leu Gly Pro Phe Asn Gly Leu Asp Lys Gly Glu Pro 50 55 60Val Asn Ala Ala Asp Ala Ala Ala Leu Glu His Asp Lys Ala Tyr Asp65 70 75 80Gln Gln Leu Gln Ala Gly Asp Asn Pro Tyr Leu Arg Tyr Asn His Ala 85 90 95Asp Ala Glu Phe Gln Glu Arg Leu Gln Glu Asp Thr Ser Phe Gly Gly 100 105 110Asn Leu Gly Arg Ala Val Phe Gln Ala Lys Lys Arg Val Leu Glu Pro 115 120 125Leu Gly Leu Val Glu Glu Gly Ala Lys Thr Ala Pro Gly Lys Lys Arg 130 135 140Pro Val Glu Pro Ser Pro Gln Arg Ser Pro Asp Ser Ser Thr Gly Ile145 150 155 160Gly Lys Lys Gly Gln Gln Pro Ala Arg Lys Arg Leu Asn Phe Gly Gln 165 170 175Thr Gly Asp Ser Glu Ser Val Pro Asp Pro Gln Pro Leu Gly Glu Pro 180 185 190Pro Ala Ala Pro Ser Gly Val Gly Pro Asn Thr Met Ala Ala Gly Gly 195 200 205Gly Ala Pro Met Ala Asp Asn Asn Glu Gly Ala Asp Gly Val Gly Ser 210 215 220Ser Ser Gly Asn Trp His Cys Asp Ser Thr Trp Leu Gly Asp Arg Val225 230 235 240Ile Thr Thr Ser Thr Arg Thr Trp Ala Leu Pro Thr Tyr Asn Asn His 245 250 255Leu Tyr Lys Gln Ile Ser Asn Gly Thr Ser Gly Gly Ala Thr Asn Asp 260 265 270Asn Thr Tyr Phe Gly Tyr Ser Thr Pro Trp Gly Tyr Phe Asp Phe Asn 275 280 285Arg Phe His Cys His Phe Ser Pro Arg Asp Trp Gln Arg Leu Ile Asn 290 295 300Asn Asn Trp Gly Phe Arg Pro Lys Arg Leu Ser Phe Lys Leu Phe Asn305 310 315 320Ile Gln Val Lys Glu Val Thr Gln Asn Glu Gly Thr Lys Thr Ile Ala 325 330 335Asn Asn Leu Thr
Ser Thr Ile Gln Val Phe Thr Asp Ser Glu Tyr Gln 340 345 350Leu Pro Tyr Val Leu Gly Ser Ala His Gln Gly Cys Leu Pro Pro Phe 355 360 365Pro Ala Asp Val Phe Met Ile Pro Gln Tyr Gly Tyr Leu Thr Leu Asn 370 375 380Asn Gly Ser Gln Ala Val Gly Arg Ser Ser Phe Tyr Cys Leu Glu Tyr385 390 395 400Phe Pro Ser Gln Met Leu Arg Thr Gly Asn Asn Phe Gln Phe Thr Tyr 405 410 415Thr Phe Glu Asp Val Pro Phe His Ser Ser Tyr Ala His Ser Gln Ser 420 425 430Leu Asp Arg Leu Met Asn Pro Leu Ile Asp Gln Tyr Leu Tyr Tyr Leu 435 440 445Ser Arg Thr Gln Thr Thr Gly Gly Thr Ala Asn Thr Gln Thr Leu Gly 450 455 460Phe Ser Gln Gly Gly Pro Asn Thr Met Ala Asn Gln Ala Lys Asn Trp465 470 475 480Leu Pro Gly Pro Cys Tyr Arg Gln Gln Arg Val Ser Thr Thr Thr Gly 485 490 495Gln Asn Asn Asn Ser Asn Phe Ala Trp Thr Ala Gly Thr Lys Tyr His 500 505 510Leu Asn Gly Arg Asn Ser Leu Ala Asn Pro Gly Ile Ala Met Ala Thr 515 520 525His Lys Asp Asp Glu Glu Arg Phe Phe Pro Ser Asn Gly Ile Leu Ile 530 535 540Phe Gly Lys Gln Asn Ala Ala Arg Asp Asn Ala Asp Tyr Ser Asp Val545 550 555 560Met Leu Thr Ser Glu Glu Glu Ile Lys Thr Thr Asn Pro Val Ala Thr 565 570 575Glu Glu Tyr Gly Ile Val Trp Asp Asn Leu Gln Ser Glu Asn Thr Ala 580 585 590Pro Ser Glu Thr Ser Val Asn Ser Gln Gly Ala Leu Pro Gly Met Val 595 600 605Trp Gln Asn Arg Asp Val Tyr Leu Gln Gly Pro Ile Trp Ala Lys Ile 610 615 620Pro His Thr Asp Gly Asn Phe His Pro Ser Pro Leu Met Gly Gly Phe625 630 635 640Gly Leu Lys His Pro Pro Pro Gln Ile Leu Ile Lys Asn Thr Pro Val 645 650 655Pro Ala Asp Pro Pro Thr Thr Phe Asn Gln Ser Lys Leu Asn Ser Phe 660 665 670Ile Thr Gln Tyr Ser Thr Gly Gln Val Ser Val Glu Ile Glu Trp Glu 675 680 685Leu Gln Lys Glu Asn Ser Lys Arg Trp Asn Pro Glu Ile Gln Tyr Thr 690 695 700Ser Asn Tyr Tyr Lys Ser Thr Ser Val Asp Phe Ala Val Asn Thr Glu705 710 715 720Gly Val Tyr Ser Glu Pro Arg Pro Ile Gly Thr Arg Tyr Leu Thr Arg 725 730 735Asn Leu152217DNAArtificial Sequenceconstructed sequence 15atggctgccg atggttatct tccagattgg ctcgaggaca acctctctga gggcattcgc 
60gagtggtggg cgctgaaacc tggagccccg aagcccaaag ccaaccagca aaagcaggac 120gacggccggg gtctggtgct tcctggctac aagtacctcg gacccttcaa cggactcgac 180aagggggagc ccgtcaacgc ggcggacgca gcggccctcg agcacgacaa ggcctacgac 240cagcagctgc aggcgggtga caatccgtac ctgcggtata accacgccga cgccgagttt 300caggagcgtc tgcaagaaga tacgtctttt gggggcaacc tcgggcgagc agtcttccag 360gccaagaagc gggttctcga acctctcggt ctggttgagg aaggcgctaa gacggctcct 420ggaaagaaga gaccggtaga gccatcaccc cagcgttctc cagactcctc tacgggcatc 480ggcaagaaag gccaacagcc cgccagaaaa agactcaatt ttggtcagac tggcgactca 540gagtcagttc cagaccctca acctctcgga gaacctccag cagcgccctc tggtgtggga 600cctaatacaa tggctgcagg cggtggcgca ccaatggcag acaataacga aggcgccgac 660ggagtgggta gttcctcggg aaattggcat tgcgattcca catggctggg cgacagagtc 720atcaccacca gcacccgaac ctgggccctg cccacctaca acaaccacct ctacaagcaa 780atctccaacg ggacatcggg aggagccacc aacgacaaca cctacttcgg ctacagcacc 840ccctgggggt attttgactt taacagattc cactgccact tttcaccacg tgactggcag 900cgactcatca acaacaactg gggattccgg cccaagagac tcagcttcaa gctcttcaac 960atccaggtca aggaggtcac gcagaatgaa ggcaccaaga ccatcgccaa taacctcacc 1020agcaccatcc aggtgtttac ggactcggag taccagctgc cgtacgttct cggctctgcc 1080caccagggct gcctgcctcc gttcccggcg gacgtgttca tgattcccca gtacggctac 1140ctaacactca acaacggtag tcaggccgtg ggacgctcct ccttctactg cctggaatac 1200tttccttcgc agatgctgag aaccggcaac aacttccagt ttacttacac cttcgaggac 1260gtgcctttcc acagcagcta cgcccacagc cagagcttgg accggctgat gaatcctctg 1320attgaccagt acctgtacta cttgtctcgg actcaaacaa caggaggcac ggcaaatacg 1380cagactctgg gcttcagcca aggtgggcct aatacaatgg ccaatcaggc aaagaactgg 1440ctgccaggac cctgttaccg ccaacaacgc gtctcaacga caaccgggca aaacaacaat 1500agcaactttg cctggactgc tgggaccaaa taccatctga atggaagaaa ttcattggct 1560aatcctggca tcgctatggc aacacacaaa gacgacgagg agcgtttttt tcccagtaac 1620gggatcctga tttttggcaa acaaaatgct gccagagaca atgcggatta cagcgatgtc 1680atgctcacca gcgaggaaga aatcaaaacc actaaccctg tggctacaga ggaatacggt 1740atcgtgtctg ataacttgca gtcttgtaac acggctcctt 
ttgcgggtgc ggtcaacagc 1800cagggggcct tacccggtat ggtctggcag aaccgggacg tgtacctgca gggtcccatc 1860tgggccaaga ttcctcacac ggacggcaac ttccacccgt ctccgctgat gggcggcttt 1920ggcctgaaac atcctccgcc tcagatcctg atcaagaaca cgcctgtacc tgcggatcct 1980ccgaccacct tcaaccagtc aaagctgaac tctttcatca cgcaatacag caccggacag 2040gtcagcgtgg aaattgaatg ggagctgcag aaggaaaaca gcaagcgctg gaaccccgag 2100atccagtaca cctccaacta ctacaaatct acaagtgtgg actttgctgt taatacagaa 2160ggcgtgtact ctgaaccccg ccccattggc acccgttacc tcacccgtaa tctgtaa 221716738PRTArtificial Sequenceconstructed sequence 16Met Ala Ala Asp Gly Tyr Leu Pro Asp Trp Leu Glu Asp Asn Leu Ser1 5 10 15Glu Gly Ile Arg Glu Trp Trp Ala Leu Lys Pro Gly Ala Pro Lys Pro 20 25 30Lys Ala Asn Gln Gln Lys Gln Asp Asp Gly Arg Gly Leu Val Leu Pro 35 40 45Gly Tyr Lys Tyr Leu Gly Pro Phe Asn Gly Leu Asp Lys Gly Glu Pro 50 55 60Val Asn Ala Ala Asp Ala Ala Ala Leu Glu His Asp Lys Ala Tyr Asp65 70 75 80Gln Gln Leu Gln Ala Gly Asp Asn Pro Tyr Leu Arg Tyr Asn His Ala 85 90 95Asp Ala Glu Phe Gln Glu Arg Leu Gln Glu Asp Thr Ser Phe Gly Gly 100 105 110Asn Leu Gly Arg Ala Val Phe Gln Ala Lys Lys Arg Val Leu Glu Pro 115 120 125Leu Gly Leu Val Glu Glu Gly Ala Lys Thr Ala Pro Gly Lys Lys Arg 130 135 140Pro Val Glu Pro Ser Pro Gln Arg Ser Pro Asp Ser Ser Thr Gly Ile145 150 155 160Gly Lys Lys Gly Gln Gln Pro Ala Arg Lys Arg Leu Asn Phe Gly Gln 165 170 175Thr Gly Asp Ser Glu Ser Val Pro Asp Pro Gln Pro Leu Gly Glu Pro 180 185 190Pro Ala Ala Pro Ser Gly Val Gly Pro Asn Thr Met Ala Ala Gly Gly 195 200 205Gly Ala Pro Met Ala Asp Asn Asn Glu Gly Ala Asp Gly Val Gly Ser 210 215 220Ser Ser Gly Asn Trp His Cys Asp Ser Thr Trp Leu Gly Asp Arg Val225 230 235 240Ile Thr Thr Ser Thr Arg Thr Trp Ala Leu Pro Thr Tyr Asn Asn His 245 250 255Leu Tyr Lys Gln Ile Ser Asn Gly Thr Ser Gly Gly Ala Thr Asn Asp 260 265 270Asn Thr Tyr Phe Gly Tyr Ser Thr Pro Trp Gly Tyr Phe Asp Phe Asn 275 280 285Arg Phe His Cys His Phe Ser Pro Arg Asp Trp Gln Arg Leu Ile Asn 290 295 300Asn Asn Trp Gly Phe 
Arg Pro Lys Arg Leu Ser Phe Lys Leu Phe Asn305 310 315 320Ile Gln Val Lys Glu Val Thr Gln Asn Glu Gly Thr Lys Thr Ile Ala 325 330 335Asn Asn Leu Thr Ser Thr Ile Gln Val Phe Thr Asp Ser Glu Tyr Gln 340 345 350Leu Pro Tyr Val Leu Gly Ser Ala His Gln Gly Cys Leu Pro Pro Phe 355 360 365Pro Ala Asp Val Phe Met Ile Pro Gln Tyr Gly Tyr Leu Thr Leu Asn 370 375 380Asn Gly Ser Gln Ala Val Gly Arg Ser Ser Phe Tyr Cys Leu Glu Tyr385 390 395 400Phe Pro Ser Gln Met Leu Arg Thr Gly Asn Asn Phe Gln Phe Thr Tyr 405 410 415Thr Phe Glu Asp Val Pro Phe His Ser Ser Tyr Ala His Ser Gln Ser 420 425 430Leu Asp Arg Leu Met Asn Pro Leu Ile Asp Gln Tyr Leu Tyr Tyr Leu 435 440 445Ser Arg Thr Gln Thr Thr Gly Gly Thr Ala Asn Thr Gln Thr Leu Gly 450 455 460Phe Ser Gln Gly Gly Pro Asn Thr Met Ala Asn Gln Ala Lys Asn Trp465 470 475 480Leu Pro Gly Pro Cys Tyr Arg Gln Gln Arg Val Ser Thr Thr Thr Gly 485 490 495Gln Asn Asn Asn Ser Asn Phe Ala Trp Thr Ala Gly Thr Lys Tyr His 500 505 510Leu Asn Gly Arg Asn Ser Leu Ala Asn Pro Gly Ile Ala Met Ala Thr 515 520 525His Lys Asp Asp Glu Glu Arg Phe Phe Pro Ser Asn Gly Ile Leu Ile 530 535 540Phe Gly Lys Gln Asn Ala Ala Arg Asp Asn Ala Asp Tyr Ser Asp Val545 550 555 560Met Leu Thr Ser Glu Glu Glu Ile Lys Thr Thr Asn Pro Val Ala Thr 565 570 575Glu Glu Tyr Gly Ile Val Ser Asp Asn Leu Gln Ser Cys Asn Thr Ala 580 585 590Pro Phe Ala Gly Ala Val Asn Ser Gln Gly Ala Leu Pro Gly Met Val 595 600 605Trp Gln Asn Arg Asp Val Tyr Leu Gln Gly Pro Ile Trp Ala Lys Ile 610 615 620Pro His Thr Asp Gly Asn Phe His Pro Ser Pro Leu Met Gly Gly Phe625 630 635 640Gly Leu Lys His Pro Pro Pro Gln Ile Leu Ile Lys Asn Thr Pro Val 645 650 655Pro Ala Asp Pro Pro Thr Thr Phe Asn Gln Ser Lys Leu Asn Ser Phe 660 665 670Ile Thr Gln Tyr Ser Thr Gly Gln Val Ser Val Glu Ile Glu Trp Glu 675 680 685Leu Gln Lys Glu Asn Ser Lys Arg Trp Asn Pro Glu Ile Gln Tyr Thr 690 695 700Ser Asn Tyr Tyr Lys Ser Thr Ser Val Asp Phe Ala Val Asn Thr Glu705 710 715 720Gly Val Tyr Ser Glu Pro Arg Pro Ile Gly Thr Arg Tyr 
Leu Thr Arg 725 730 735Asn Leu172214DNAArtificial Sequenceconstructed sequence 17atggctgccg atggttatct tccagattgg ctcgaggaca acctctctga gggcattcgc 60gagtggtggg cgctgaaacc tggagccccg aagcccaaag ccaaccagca aaagcaggac 120gacggccggg gtctggtgct tcctggctac aagtacctcg gacccttcaa cggactcgac 180aagggggagc ccgtcaacgc ggcggacgca gcggccctcg agcacgacaa ggcctacgac 240cagcagctgc aggcgggtga caatccgtac ctgcggtata accacgccga cgccgagttt 300caggagcgtc tgcaagaaga tacgtctttt gggggcaacc tcgggcgagc agtcttccag 360gccaagaagc gggttctcga acctctcggt ctggttgagg aaggcgctaa gacggctcct 420ggaaagaaga gaccggtaga gccatcaccc cagcgttctc cagactcctc tacgggcatc 480ggcaagaaag gccaacagcc cgccagaaaa agactcaatt ttggtcagac tggcgactca 540gagtcagttc cagaccctca acctctcgga gaacctccag cagcgccctc tggtgtggga 600cctaatacaa tggctgcagg cggtggcgca ccaatggcag acaataacga aggcgccgac 660ggagtgggta gttcctcggg aaattggcat tgcgattcca catggctggg cgacagagtc 720atcaccacca gcacccgaac ctgggccctg cccacctaca acaaccacct ctacaagcaa 780atctcctctg gtactcatgg agccaccaac gacaacacct acttcggcta cagcaccccc 840tgggggtatt ttgactttaa cagattccac tgccactttt caccacgtga ctggcagcga 900ctcatcaaca acaactgggg attccggccc aagagactca gcttcaagct cttcaacatc 960caggtcaagg aggtcacgca gaatgaaggc accaagacca tcgccaataa cctcaccagc 1020accatccagg tgtttacgga ctcggagtac cagctgccgt acgttctcgg ctctgcccac 1080cagggctgcc tgcctccgtt cccggcggac gtgttcatga ttccccagta cggctaccta 1140acactcaaca acggtagtca ggccgtggga cgctcctcct tctactgcct ggaatacttt 1200ccttcgcaga tgctgagaac cggcaacaac ttccagttta cttacacctt cgaggacgtg 1260cctttccaca gcagctacgc ccacagccag agcttggacc ggctgatgaa tcctctgatt 1320gaccagtacc tgtactactt gtctcggact caaacaacag gtgggagtag gcctacgcag 1380actctgggct tcagccaagg tgggcctaat acaatggcca atcaggcaaa gaactggctg 1440ccaggaccct gttaccgcca acaacgcgtc tcaacgacaa ccgggcaaaa caacaatagc 1500aactttgcct ggactgctgg gaccaaatac catctgaatg gaagaaattc attggctaat 1560cctggcatcg ctatggcaac acacaaagac gacgaggagc gtttttttcc cagtaacggg 1620atcctgattt ttggcaaaca aaatgctgcc agagacaatg 
cggattacag cgatgtcatg 1680ctcaccagcg aggaagaaat caaaaccact aaccctgtgg ctacagagga atacggtatc 1740gtgggtgata acttgcagtt gtataacacg gctcctggtt cggtgtttgt caacagccag 1800ggggccttac ccggtatggt ctggcagaac cgggacgtgt acctgcaggg tcccatctgg 1860gccaagattc ctcacacgga cggcaacttc cacccgtctc cgctgatggg cggctttggc 1920ctgaaacatc ctccgcctca gatcctgatc aagaacacgc ctgtacctgc ggatcctccg 1980accaccttca accagtcaaa gctgaactct ttcatcacgc aatacagcac cggacaggtc 2040agcgtggaaa ttgaatggga gctgcagaag gaaaacagca agcgctggaa ccccgagatc 2100cagtacacct ccaactacta caaatctaca agtgtggact ttgctgttaa tacagaaggc 2160gtgtactctg aaccccgccc cattggcacc cgttacctca cccgtaatct gtaa 221418737PRTArtificial Sequenceconstructed sequence 18Met Ala Ala Asp Gly Tyr Leu Pro Asp Trp Leu Glu Asp Asn Leu Ser1 5 10 15Glu Gly Ile Arg Glu Trp Trp Ala Leu Lys Pro Gly Ala Pro Lys Pro 20 25 30Lys Ala Asn Gln Gln Lys Gln Asp Asp Gly Arg Gly Leu Val Leu Pro 35 40 45Gly Tyr Lys Tyr Leu Gly Pro Phe Asn Gly Leu Asp Lys Gly Glu Pro 50 55 60Val Asn Ala Ala Asp Ala Ala Ala Leu Glu His Asp Lys Ala Tyr Asp65 70 75 80Gln Gln Leu Gln Ala Gly Asp Asn Pro Tyr Leu Arg Tyr Asn His Ala 85 90 95Asp Ala Glu Phe Gln Glu Arg Leu Gln Glu Asp Thr Ser Phe Gly Gly 100 105 110Asn Leu Gly Arg Ala Val Phe Gln Ala Lys Lys Arg Val Leu Glu Pro 115 120 125Leu Gly Leu Val Glu Glu Gly Ala Lys Thr Ala Pro Gly Lys Lys Arg 130 135 140Pro Val Glu Pro Ser Pro Gln Arg Ser Pro Asp Ser Ser Thr Gly Ile145 150 155 160Gly Lys Lys Gly Gln Gln Pro Ala Arg Lys Arg Leu Asn Phe Gly Gln 165 170 175Thr Gly Asp Ser Glu Ser Val Pro Asp Pro Gln Pro Leu Gly Glu Pro 180 185 190Pro Ala Ala Pro Ser Gly Val Gly Pro Asn Thr Met Ala Ala Gly Gly 195 200 205Gly Ala Pro Met Ala Asp Asn Asn Glu Gly Ala Asp Gly Val Gly Ser 210 215 220Ser Ser Gly Asn Trp His Cys Asp Ser Thr Trp Leu Gly Asp Arg Val225 230 235 240Ile Thr Thr Ser Thr Arg Thr Trp Ala Leu Pro Thr Tyr Asn Asn His 245 250 255Leu Tyr Lys Gln Ile Ser Ser Gly Thr His Gly Ala Thr Asn Asp Asn 260 265 270Thr Tyr Phe Gly Tyr Ser Thr Pro 
Trp Gly Tyr Phe Asp Phe Asn Arg 275 280 285Phe His Cys His Phe Ser Pro Arg Asp Trp Gln Arg Leu Ile Asn Asn 290 295 300Asn Trp Gly Phe Arg Pro Lys Arg Leu Ser Phe Lys Leu Phe Asn Ile305 310 315 320Gln Val Lys Glu Val Thr Gln Asn Glu Gly Thr Lys Thr Ile Ala Asn 325 330 335Asn Leu Thr Ser Thr Ile Gln Val Phe Thr Asp Ser Glu Tyr Gln Leu 340 345 350Pro Tyr Val Leu Gly Ser Ala His Gln Gly Cys Leu Pro Pro Phe Pro 355 360 365Ala Asp Val Phe Met Ile Pro Gln Tyr Gly Tyr Leu Thr Leu Asn Asn 370 375 380Gly Ser Gln Ala Val Gly Arg Ser Ser Phe Tyr Cys Leu Glu Tyr Phe385 390 395 400Pro Ser Gln Met Leu Arg Thr Gly Asn Asn Phe Gln Phe Thr Tyr Thr 405 410 415Phe Glu Asp Val Pro Phe His Ser Ser Tyr Ala His Ser Gln Ser Leu 420 425 430Asp Arg Leu Met Asn Pro Leu Ile Asp Gln Tyr Leu Tyr Tyr Leu Ser 435 440 445Arg Thr Gln Thr Thr Gly Gly Ser Arg Pro Thr Gln Thr Leu Gly Phe 450 455 460Ser Gln Gly Gly Pro Asn Thr Met Ala Asn Gln Ala Lys Asn Trp Leu465 470 475 480Pro Gly Pro Cys Tyr Arg Gln Gln Arg Val Ser Thr Thr Thr Gly Gln 485 490 495Asn Asn Asn Ser Asn Phe Ala Trp Thr Ala Gly Thr Lys Tyr His Leu 500 505 510Asn Gly Arg Asn Ser Leu Ala Asn Pro Gly Ile Ala Met Ala Thr His 515 520 525Lys Asp Asp Glu Glu Arg Phe Phe Pro Ser Asn Gly Ile Leu Ile Phe 530 535 540Gly Lys Gln Asn Ala Ala Arg Asp Asn Ala Asp Tyr Ser Asp Val Met545 550 555 560Leu Thr Ser Glu Glu Glu Ile Lys Thr Thr Asn Pro Val Ala Thr Glu 565 570
575Glu Tyr Gly Ile Val Gly Asp Asn Leu Gln Leu Tyr Asn Thr Ala Pro 580 585 590Gly Ser Val Phe Val Asn Ser Gln Gly Ala Leu Pro Gly Met Val Trp 595 600 605Gln Asn Arg Asp Val Tyr Leu Gln Gly Pro Ile Trp Ala Lys Ile Pro 610 615 620His Thr Asp Gly Asn Phe His Pro Ser Pro Leu Met Gly Gly Phe Gly625 630 635 640Leu Lys His Pro Pro Pro Gln Ile Leu Ile Lys Asn Thr Pro Val Pro 645 650 655Ala Asp Pro Pro Thr Thr Phe Asn Gln Ser Lys Leu Asn Ser Phe Ile 660 665 670Thr Gln Tyr Ser Thr Gly Gln Val Ser Val Glu Ile Glu Trp Glu Leu 675 680 685Gln Lys Glu Asn Ser Lys Arg Trp Asn Pro Glu Ile Gln Tyr Thr Ser 690 695 700Asn Tyr Tyr Lys Ser Thr Ser Val Asp Phe Ala Val Asn Thr Glu Gly705 710 715 720Val Tyr Ser Glu Pro Arg Pro Ile Gly Thr Arg Tyr Leu Thr Arg Asn 725 730 735Leu192214DNAArtificial Sequenceconstructed sequence 19atggctgccg atggttatct tccagattgg ctcgaggaca acctctctga gggcattcgc 60gagtggtggg acttgaaacc tggagccccg aaacccaaag ccaaccagca aaagcaggac 120gacggccggg gtctggtgct tcctggctac aagtacctcg gacccttcaa cggactcgac 180aagggggagc ccgtcaacgc ggcggacgca gcggccctcg agcacgacaa ggcctacgac 240cagcagctca aagcgggtga caatccgtac ctgcggtata accacgccga cgccgagttt 300caggagcgtc tgcaagaaga tacgtctttt gggggcaacc tcgggcgagc agtcttccag 360gccaagaagc gggttctcga acctctcggt ctggttgagg aaggcgctaa gacggctcct 420ggaaagaaga gaccggtaga gccatcaccc cagcgttctc cagactcctc tacgggcatc 480ggcaagacag gccagcagcc cgcgaaaaag agactcaact ttgggcagac tggcgactca 540gagtcagtgc ccgaccctca accaatcgga gaaccccccg caggcccctc tggtctggga 600tctggtacaa tggctgcagg cggtggcgca ccaatggcag acaataacga aggcgccgac 660ggagtgggta gttcctcggg aaattggcat tgcgattcca catggctggg cgacagagtc 720atcaccacca gcacccgaac ctgggccctg cccacctaca acaaccacct ctacaagcaa 780atctcctctg gtactcatgg agccaccaac gacaacacct acttcggcta cagcaccccc 840tgggggtatt ttgactttaa cagattccac tgccactttt caccacgtga ctggcagcga 900ctcatcaaca acaactgggg attccggccc aagagactca gcttcaagct cttcaacatc 960caggtcaagg aggtcacgca gaatgaaggc accaagacca tcgccaataa cctcaccagc 1020accatccagg 
tgtttacgga ctcggagtac cagctgccgt acgttctcgg ctctgcccac 1080cagggctgcc tgcctccgtt cccggcggac gtgttcatga ttccccagta cggctaccta 1140acactcaaca acggtagtca ggccgtggga cgctcctcct tctactgcct ggaatacttt 1200ccttcgcaga tgctgagaac cggcaacaac ttccagttta cttacacctt cgaggacgtg 1260cctttccaca gcagctacgc ccacagccag agcttggacc ggctgatgaa tcctctgatt 1320gaccagtacc tgtactactt gtctcggact caaacaacag gtgggagtag gcctacgcag 1380actctgggct tcagccaagg tgggcctaat acaatggcca atcaggcaaa gaactggctg 1440ccaggaccct gttaccgcca acaacgcgtc tcaacgacaa ccgggcaaaa caacaatagc 1500aactttgcct ggactgctgg gaccaaatac catctgaatg gaagaaattc attggctaat 1560cctggcatcg ctatggcaac acacaaagac gacgaggagc gtttttttcc cagtaacggg 1620atcctgattt ttggcaaaca aaatgctgcc agagacaatg cggattacag cgatgtcatg 1680ctcaccagcg aggaagaaat caaaaccact aaccctgtgg ctacagagga atacggtatc 1740gtgggtgata acttgcagtt gtataacacg gctcctggtt cggtgtttgt caacagccag 1800ggggccttac ccggtatggt ctggcagaac cgggacgtgt acctgcaggg tcccatctgg 1860gccaagattc ctcacacgga cggcaacttc cacccgtctc cgctgatggg cggctttggc 1920ctgaaacatc ctccgcctca gatcctgatc aagaacacgc ctgtacctgc ggatcctccg 1980accaccttca accagtcaaa gctgaactct ttcatcacgc aatacagcac cggacaggtc 2040agcgtggaaa ttgaatggga gctgcagaag gaaaacagca agcgctggaa ccccgagatc 2100cagtacacct ccaactacta caaatctaca agtgtggact ttgctgttaa tacagaaggc 2160gtgtactctg aaccccgccc cattggcacc cgttacctca cccgtaatct gtaa 221420737PRTArtificial Sequenceconstructed sequence 20Met Ala Ala Asp Gly Tyr Leu Pro Asp Trp Leu Glu Asp Asn Leu Ser1 5 10 15Glu Gly Ile Arg Glu Trp Trp Asp Leu Lys Pro Gly Ala Pro Lys Pro 20 25 30Lys Ala Asn Gln Gln Lys Gln Asp Asp Gly Arg Gly Leu Val Leu Pro 35 40 45Gly Tyr Lys Tyr Leu Gly Pro Phe Asn Gly Leu Asp Lys Gly Glu Pro 50 55 60Val Asn Ala Ala Asp Ala Ala Ala Leu Glu His Asp Lys Ala Tyr Asp65 70 75 80Gln Gln Leu Lys Ala Gly Asp Asn Pro Tyr Leu Arg Tyr Asn His Ala 85 90 95Asp Ala Glu Phe Gln Glu Arg Leu Gln Glu Asp Thr Ser Phe Gly Gly 100 105 110Asn Leu Gly Arg Ala Val Phe Gln Ala Lys Lys Arg Val Leu 
Glu Pro 115 120 125Leu Gly Leu Val Glu Glu Gly Ala Lys Thr Ala Pro Gly Lys Lys Arg 130 135 140Pro Val Glu Pro Ser Pro Gln Arg Ser Pro Asp Ser Ser Thr Gly Ile145 150 155 160Gly Lys Thr Gly Gln Gln Pro Ala Lys Lys Arg Leu Asn Phe Gly Gln 165 170 175Thr Gly Asp Ser Glu Ser Val Pro Asp Pro Gln Pro Ile Gly Glu Pro 180 185 190Pro Ala Gly Pro Ser Gly Leu Gly Ser Gly Thr Met Ala Ala Gly Gly 195 200 205Gly Ala Pro Met Ala Asp Asn Asn Glu Gly Ala Asp Gly Val Gly Ser 210 215 220Ser Ser Gly Asn Trp His Cys Asp Ser Thr Trp Leu Gly Asp Arg Val225 230 235 240Ile Thr Thr Ser Thr Arg Thr Trp Ala Leu Pro Thr Tyr Asn Asn His 245 250 255Leu Tyr Lys Gln Ile Ser Ser Gly Thr His Gly Ala Thr Asn Asp Asn 260 265 270Thr Tyr Phe Gly Tyr Ser Thr Pro Trp Gly Tyr Phe Asp Phe Asn Arg 275 280 285Phe His Cys His Phe Ser Pro Arg Asp Trp Gln Arg Leu Ile Asn Asn 290 295 300Asn Trp Gly Phe Arg Pro Lys Arg Leu Ser Phe Lys Leu Phe Asn Ile305 310 315 320Gln Val Lys Glu Val Thr Gln Asn Glu Gly Thr Lys Thr Ile Ala Asn 325 330 335Asn Leu Thr Ser Thr Ile Gln Val Phe Thr Asp Ser Glu Tyr Gln Leu 340 345 350Pro Tyr Val Leu Gly Ser Ala His Gln Gly Cys Leu Pro Pro Phe Pro 355 360 365Ala Asp Val Phe Met Ile Pro Gln Tyr Gly Tyr Leu Thr Leu Asn Asn 370 375 380Gly Ser Gln Ala Val Gly Arg Ser Ser Phe Tyr Cys Leu Glu Tyr Phe385 390 395 400Pro Ser Gln Met Leu Arg Thr Gly Asn Asn Phe Gln Phe Thr Tyr Thr 405 410 415Phe Glu Asp Val Pro Phe His Ser Ser Tyr Ala His Ser Gln Ser Leu 420 425 430Asp Arg Leu Met Asn Pro Leu Ile Asp Gln Tyr Leu Tyr Tyr Leu Ser 435 440 445Arg Thr Gln Thr Thr Gly Gly Ser Arg Pro Thr Gln Thr Leu Gly Phe 450 455 460Ser Gln Gly Gly Pro Asn Thr Met Ala Asn Gln Ala Lys Asn Trp Leu465 470 475 480Pro Gly Pro Cys Tyr Arg Gln Gln Arg Val Ser Thr Thr Thr Gly Gln 485 490 495Asn Asn Asn Ser Asn Phe Ala Trp Thr Ala Gly Thr Lys Tyr His Leu 500 505 510Asn Gly Arg Asn Ser Leu Ala Asn Pro Gly Ile Ala Met Ala Thr His 515 520 525Lys Asp Asp Glu Glu Arg Phe Phe Pro Ser Asn Gly Ile Leu Ile Phe 530 535 540Gly Lys Gln Asn 
Ala Ala Arg Asp Asn Ala Asp Tyr Ser Asp Val Met545 550 555 560Leu Thr Ser Glu Glu Glu Ile Lys Thr Thr Asn Pro Val Ala Thr Glu 565 570 575Glu Tyr Gly Ile Val Gly Asp Asn Leu Gln Leu Tyr Asn Thr Ala Pro 580 585 590Gly Ser Val Phe Val Asn Ser Gln Gly Ala Leu Pro Gly Met Val Trp 595 600 605Gln Asn Arg Asp Val Tyr Leu Gln Gly Pro Ile Trp Ala Lys Ile Pro 610 615 620His Thr Asp Gly Asn Phe His Pro Ser Pro Leu Met Gly Gly Phe Gly625 630 635 640Leu Lys His Pro Pro Pro Gln Ile Leu Ile Lys Asn Thr Pro Val Pro 645 650 655Ala Asp Pro Pro Thr Thr Phe Asn Gln Ser Lys Leu Asn Ser Phe Ile 660 665 670Thr Gln Tyr Ser Thr Gly Gln Val Ser Val Glu Ile Glu Trp Glu Leu 675 680 685Gln Lys Glu Asn Ser Lys Arg Trp Asn Pro Glu Ile Gln Tyr Thr Ser 690 695 700Asn Tyr Tyr Lys Ser Thr Ser Val Asp Phe Ala Val Asn Thr Glu Gly705 710 715 720Val Tyr Ser Glu Pro Arg Pro Ile Gly Thr Arg Tyr Leu Thr Arg Asn 725 730 735Leu212214DNAArtificial Sequenceconstructed sequence 21atggctgccg atggttatct tccagattgg ctcgaggaca acctctctga gggcattcgc 60gagtggtggg cgctgaaacc tggagccccg aagcccaaag ccaaccagca aaagcaggac 120gacggccggg gtctggtgct tcctggctac aagtacctcg gacccttcaa cggactcgac 180aagggggagc ccgtcaacgc ggcggacgca gcggccctcg agcacgacaa ggcctacgac 240cagcagctgc aggcgggtga caatccgtac ctgcggtata accacgccga cgccgagttt 300caggagcgtc tgcaagaaga tacgtctttt gggggcaacc tcgggcgagc agtcttccag 360gccaagaagc gggttctcga acctctcggt ctggttgagg aaggcgctaa gacggctcct 420ggaaagaaga gaccggtaga gccatcaccc cagcgttctc cagactcctc tacgggcatc 480ggcaagaaag gccaacagcc cgccagaaaa agactcaatt ttggtcagac tggcgactca 540gagtcagttc cagaccctca acctctcgga gaacctccag cagcgccctc tggtgtggga 600cctaatacaa tggctgcagg cggtggcgca ccaatggcag acaataacga aggcgccgac 660ggagtgggta gttcctcggg aaattggcat tgcgattcca catggctggg cgacagagtc 720atcaccacca gcacccgaac ctgggccctg cccacctaca acaaccacct ctacaagcaa 780atctcctctg atactcatgg agccaccaac gacaacacct acttcggcta cagcaccccc 840tgggggtatt ttgactttaa cagattccac tgccactttt caccacgtga ctggcagcga 900ctcatcaaca 
acaactgggg attccggccc aagagactca gcttcaagct cttcaacatc 960caggtcaagg aggtcacgca gaatgaaggc accaagacca tcgccaataa cctcaccagc 1020accatccagg tgtttacgga ctcggagtac cagctgccgt acgttctcgg ctctgcccac 1080cagggctgcc tgcctccgtt cccggcggac gtgttcatga ttccccagta cggctaccta 1140acactcaaca acggtagtca ggccgtggga cgctcctcct tctactgcct ggaatacttt 1200ccttcgcaga tgctgagaac cggcaacaac ttccagttta cttacacctt cgaggacgtg 1260cctttccaca gcagctacgc ccacagccag agcttggacc ggctgatgaa tcctctgatt 1320gaccagtacc tgtactactt gtctcggact caaacaacag atgggtctgg gctgacgcag 1380actctgggct tcagccaagg tgggcctaat acaatggcca atcaggcaaa gaactggctg 1440ccaggaccct gttaccgcca acaacgcgtc tcaacgacaa ccgggcaaaa caacaatagc 1500aactttgcct ggactgctgg gaccaaatac catctgaatg gaagaaattc attggctaat 1560cctggcatcg ctatggcaac acacaaagac gacgaggagc gtttttttcc cagtaacggg 1620atcctgattt ttggcaaaca aaatgctgcc agagacaatg cggattacag cgatgtcatg 1680ctcaccagcg aggaagaaat caaaaccact aaccctgtgg ctacagagga atacggtatc 1740gtgggtgata acttgcagtt gtataacacg gctcctggtt cggtgtttgt caacagccag 1800ggggccttac ccggtatggt ctggcagaac cgggacgtgt acctgcaggg tcccatctgg 1860gccaagattc ctcacacgga cggcaacttc cacccgtctc cgctgatggg cggctttggc 1920ctgaaacatc ctccgcctca gatcctgatc aagaacacgc ctgtacctgc ggatcctccg 1980accaccttca accagtcaaa gctgaactct ttcatcacgc aatacagcac cggacaggtc 2040agcgtggaaa ttgaatggga gctgcagaag gaaaacagca agcgctggaa ccccgagatc 2100cagtacacct ccaactacta caaatctaca agtgtggact ttgctgttaa tacagaaggc 2160gtgtactctg aaccccgccc cattggcacc cgttacctca cccgtaatct gtaa 221422737PRTArtificial Sequenceconstructed sequence 22Met Ala Ala Asp Gly Tyr Leu Pro Asp Trp Leu Glu Asp Asn Leu Ser1 5 10 15Glu Gly Ile Arg Glu Trp Trp Ala Leu Lys Pro Gly Ala Pro Lys Pro 20 25 30Lys Ala Asn Gln Gln Lys Gln Asp Asp Gly Arg Gly Leu Val Leu Pro 35 40 45Gly Tyr Lys Tyr Leu Gly Pro Phe Asn Gly Leu Asp Lys Gly Glu Pro 50 55 60Val Asn Ala Ala Asp Ala Ala Ala Leu Glu His Asp Lys Ala Tyr Asp65 70 75 80Gln Gln Leu Gln Ala Gly Asp Asn Pro Tyr Leu Arg Tyr Asn His Ala 
85 90 95Asp Ala Glu Phe Gln Glu Arg Leu Gln Glu Asp Thr Ser Phe Gly Gly 100 105 110Asn Leu Gly Arg Ala Val Phe Gln Ala Lys Lys Arg Val Leu Glu Pro 115 120 125Leu Gly Leu Val Glu Glu Gly Ala Lys Thr Ala Pro Gly Lys Lys Arg 130 135 140Pro Val Glu Pro Ser Pro Gln Arg Ser Pro Asp Ser Ser Thr Gly Ile145 150 155 160Gly Lys Lys Gly Gln Gln Pro Ala Arg Lys Arg Leu Asn Phe Gly Gln 165 170 175Thr Gly Asp Ser Glu Ser Val Pro Asp Pro Gln Pro Leu Gly Glu Pro 180 185 190Pro Ala Ala Pro Ser Gly Val Gly Pro Asn Thr Met Ala Ala Gly Gly 195 200 205Gly Ala Pro Met Ala Asp Asn Asn Glu Gly Ala Asp Gly Val Gly Ser 210 215 220Ser Ser Gly Asn Trp His Cys Asp Ser Thr Trp Leu Gly Asp Arg Val225 230 235 240Ile Thr Thr Ser Thr Arg Thr Trp Ala Leu Pro Thr Tyr Asn Asn His 245 250 255Leu Tyr Lys Gln Ile Ser Ser Asp Thr His Gly Ala Thr Asn Asp Asn 260 265 270Thr Tyr Phe Gly Tyr Ser Thr Pro Trp Gly Tyr Phe Asp Phe Asn Arg 275 280 285Phe His Cys His Phe Ser Pro Arg Asp Trp Gln Arg Leu Ile Asn Asn 290 295 300Asn Trp Gly Phe Arg Pro Lys Arg Leu Ser Phe Lys Leu Phe Asn Ile305 310 315 320Gln Val Lys Glu Val Thr Gln Asn Glu Gly Thr Lys Thr Ile Ala Asn 325 330 335Asn Leu Thr Ser Thr Ile Gln Val Phe Thr Asp Ser Glu Tyr Gln Leu 340 345 350Pro Tyr Val Leu Gly Ser Ala His Gln Gly Cys Leu Pro Pro Phe Pro 355 360 365Ala Asp Val Phe Met Ile Pro Gln Tyr Gly Tyr Leu Thr Leu Asn Asn 370 375 380Gly Ser Gln Ala Val Gly Arg Ser Ser Phe Tyr Cys Leu Glu Tyr Phe385 390 395 400Pro Ser Gln Met Leu Arg Thr Gly Asn Asn Phe Gln Phe Thr Tyr Thr 405 410 415Phe Glu Asp Val Pro Phe His Ser Ser Tyr Ala His Ser Gln Ser Leu 420 425 430Asp Arg Leu Met Asn Pro Leu Ile Asp Gln Tyr Leu Tyr Tyr Leu Ser 435 440 445Arg Thr Gln Thr Thr Asp Gly Ser Gly Leu Thr Gln Thr Leu Gly Phe 450 455 460Ser Gln Gly Gly Pro Asn Thr Met Ala Asn Gln Ala Lys Asn Trp Leu465 470 475 480Pro Gly Pro Cys Tyr Arg Gln Gln Arg Val Ser Thr Thr Thr Gly Gln 485 490 495Asn Asn Asn Ser Asn Phe Ala Trp Thr Ala Gly Thr Lys Tyr His Leu 500 505 510Asn Gly Arg Asn Ser Leu 
Ala Asn Pro Gly Ile Ala Met Ala Thr His 515 520 525Lys Asp Asp Glu Glu Arg Phe Phe Pro Ser Asn Gly Ile Leu Ile Phe 530 535 540Gly Lys Gln Asn Ala Ala Arg Asp Asn Ala Asp Tyr Ser Asp Val Met545 550 555 560Leu Thr Ser Glu Glu Glu Ile Lys Thr Thr Asn Pro Val Ala Thr Glu 565 570 575Glu Tyr Gly Ile Val Gly Asp Asn Leu Gln Leu Tyr Asn Thr Ala Pro 580 585 590Gly Ser Val Phe Val Asn Ser Gln Gly Ala Leu Pro Gly Met Val Trp 595 600 605Gln Asn Arg Asp Val Tyr Leu Gln Gly Pro Ile Trp Ala Lys Ile Pro 610 615 620His Thr Asp Gly Asn Phe His Pro Ser Pro Leu Met Gly Gly Phe Gly625 630 635 640Leu Lys His Pro Pro Pro Gln Ile Leu Ile Lys Asn Thr Pro Val Pro 645 650 655Ala Asp Pro Pro Thr Thr Phe Asn Gln Ser Lys Leu Asn Ser Phe Ile 660 665 670Thr Gln Tyr Ser Thr Gly Gln Val Ser Val Glu Ile Glu Trp Glu Leu 675 680 685Gln Lys Glu Asn Ser Lys Arg Trp Asn Pro Glu Ile Gln Tyr Thr Ser 690 695 700Asn Tyr Tyr Lys Ser Thr Ser Val Asp Phe Ala Val Asn Thr Glu Gly705 710 715 720Val Tyr Ser Glu Pro Arg Pro Ile Gly Thr Arg Tyr Leu Thr Arg Asn 725 730 735Leu232214DNAArtificial Sequenceconstructed sequence 23atggctgccg atggttatct tccagattgg ctcgaggaca acctctctga gggcattcgc 60gagtggtggg cgctgaaacc tggagccccg aagcccaaag ccaaccagca aaagcaggac 120gacggccggg gtctggtgct tcctggctac aagtacctcg gacccttcaa cggactcgac 180aagggggagc ccgtcaacgc ggcggacgca gcggccctcg agcacgacaa ggcctacgac 240cagcagctgc aggcgggtga caatccgtac ctgcggtata accacgccga cgccgagttt 300caggagcgtc tgcaagaaga tacgtctttt gggggcaacc tcgggcgagc agtcttccag 360gccaagaagc gggttctcga acctctcggt
ctggttgagg aaggcgctaa gacggctcct 420ggaaagaaga gaccggtaga gccatcaccc cagcgttctc cagactcctc tacgggcatc 480ggcaagaaag gccaacagcc cgccagaaaa agactcaatt ttggtcagac tggcgactca 540gagtcagttc cagaccctca acctctcgga gaacctccag cagcgccctc tggtgtggga 600cctaatacaa tggctgcagg cggtggcgca ccaatggcag acaataacga aggcgccgac 660ggagtgggta gttcctcggg aaattggcat tgcgattcca catggctggg cgacagagtc 720atcaccacca gcacccgaac ctgggccctg cccacctaca acaaccacct ctacaagcaa 780atctcctctg gtactcatgg agccaccaac gacaacacct acttcggcta cagcaccccc 840tgggggtatt ttgactttaa cagattccac tgccactttt caccacgtga ctggcagcga 900ctcatcaaca acaactgggg attccggccc aagagactca gcttcaagct cttcaacatc 960caggtcaagg aggtcacgca gaatgaaggc accaagacca tcgccaataa cctcaccagc 1020accatccagg tgtttacgga ctcggagtac cagctgccgt acgttctcgg ctctgcccac 1080cagggctgcc tgcctccgtt cccggcggac gtgttcatga ttccccagta cggctaccta 1140acactcaaca acggtagtca ggccgtggga cgctcctcct tctactgcct ggaatacttt 1200ccttcgcaga tgctgagaac cggcaacaac ttccagttta cttacacctt cgaggacgtg 1260cctttccaca gcagctacgc ccacagccag agcttggacc ggctgatgaa tcctctgatt 1320gaccagtacc tgtactactt gtctcggact caaacaacag gtgggagtag gcctacgcag 1380actctgggct tcagccaagg tgggcctaat acaatggcca atcaggcaaa gaactggctg 1440ccaggaccct gttaccgcca acaacgcgtc tcaacgacaa ccgggcaaaa caacaatagc 1500aactttgcct ggactgctgg gaccaaatac catctgaatg gaagaaattc attggctaat 1560cctggcatcg ctatggcaac acacaaagac gacgaggagc gtttttttcc cagtaacggg 1620atcctgattt ttggcaaaca aaatgctgcc agagacaatg cggattacag cgatgtcatg 1680ctcaccagcg aggaagaaat caaaaccact aaccctgtgg ctacagagga atacggtatc 1740gtggcagata acttgcagca gcaaaacacg gctcctcaaa ttggaactgt caacagccag 1800ggggccttac ccggtatggt ctggcagaac cgggacgtgt acctgcaggg tcccatctgg 1860gccaagattc ctcacacgga cggcaacttc cacccgtctc cgctgatggg cggctttggc 1920ctgaaacatc ctccgcctca gatcctgatc aagaacacgc ctgtacctgc ggatcctccg 1980accaccttca accagtcaaa gctgaactct ttcatcacgc aatacagcac cggacaggtc 2040agcgtggaaa ttgaatggga gctgcagaag gaaaacagca agcgctggaa ccccgagatc 2100cagtacacct 
ccaactacta caaatctaca agtgtggact ttgctgttaa tacagaaggc 2160gtgtactctg aaccccgccc cattggcacc cgttacctca cccgtaatct gtaa 221424737PRTArtificial Sequenceconstructed sequence 24Met Ala Ala Asp Gly Tyr Leu Pro Asp Trp Leu Glu Asp Asn Leu Ser1 5 10 15Glu Gly Ile Arg Glu Trp Trp Ala Leu Lys Pro Gly Ala Pro Lys Pro 20 25 30Lys Ala Asn Gln Gln Lys Gln Asp Asp Gly Arg Gly Leu Val Leu Pro 35 40 45Gly Tyr Lys Tyr Leu Gly Pro Phe Asn Gly Leu Asp Lys Gly Glu Pro 50 55 60Val Asn Ala Ala Asp Ala Ala Ala Leu Glu His Asp Lys Ala Tyr Asp65 70 75 80Gln Gln Leu Gln Ala Gly Asp Asn Pro Tyr Leu Arg Tyr Asn His Ala 85 90 95Asp Ala Glu Phe Gln Glu Arg Leu Gln Glu Asp Thr Ser Phe Gly Gly 100 105 110Asn Leu Gly Arg Ala Val Phe Gln Ala Lys Lys Arg Val Leu Glu Pro 115 120 125Leu Gly Leu Val Glu Glu Gly Ala Lys Thr Ala Pro Gly Lys Lys Arg 130 135 140Pro Val Glu Pro Ser Pro Gln Arg Ser Pro Asp Ser Ser Thr Gly Ile145 150 155 160Gly Lys Lys Gly Gln Gln Pro Ala Arg Lys Arg Leu Asn Phe Gly Gln 165 170 175Thr Gly Asp Ser Glu Ser Val Pro Asp Pro Gln Pro Leu Gly Glu Pro 180 185 190Pro Ala Ala Pro Ser Gly Val Gly Pro Asn Thr Met Ala Ala Gly Gly 195 200 205Gly Ala Pro Met Ala Asp Asn Asn Glu Gly Ala Asp Gly Val Gly Ser 210 215 220Ser Ser Gly Asn Trp His Cys Asp Ser Thr Trp Leu Gly Asp Arg Val225 230 235 240Ile Thr Thr Ser Thr Arg Thr Trp Ala Leu Pro Thr Tyr Asn Asn His 245 250 255Leu Tyr Lys Gln Ile Ser Ser Gly Thr His Gly Ala Thr Asn Asp Asn 260 265 270Thr Tyr Phe Gly Tyr Ser Thr Pro Trp Gly Tyr Phe Asp Phe Asn Arg 275 280 285Phe His Cys His Phe Ser Pro Arg Asp Trp Gln Arg Leu Ile Asn Asn 290 295 300Asn Trp Gly Phe Arg Pro Lys Arg Leu Ser Phe Lys Leu Phe Asn Ile305 310 315 320Gln Val Lys Glu Val Thr Gln Asn Glu Gly Thr Lys Thr Ile Ala Asn 325 330 335Asn Leu Thr Ser Thr Ile Gln Val Phe Thr Asp Ser Glu Tyr Gln Leu 340 345 350Pro Tyr Val Leu Gly Ser Ala His Gln Gly Cys Leu Pro Pro Phe Pro 355 360 365Ala Asp Val Phe Met Ile Pro Gln Tyr Gly Tyr Leu Thr Leu Asn Asn 370 375 380Gly Ser Gln Ala Val Gly Arg Ser 
Ser Phe Tyr Cys Leu Glu Tyr Phe385 390 395 400Pro Ser Gln Met Leu Arg Thr Gly Asn Asn Phe Gln Phe Thr Tyr Thr 405 410 415Phe Glu Asp Val Pro Phe His Ser Ser Tyr Ala His Ser Gln Ser Leu 420 425 430Asp Arg Leu Met Asn Pro Leu Ile Asp Gln Tyr Leu Tyr Tyr Leu Ser 435 440 445Arg Thr Gln Thr Thr Gly Gly Ser Arg Pro Thr Gln Thr Leu Gly Phe 450 455 460Ser Gln Gly Gly Pro Asn Thr Met Ala Asn Gln Ala Lys Asn Trp Leu465 470 475 480Pro Gly Pro Cys Tyr Arg Gln Gln Arg Val Ser Thr Thr Thr Gly Gln 485 490 495Asn Asn Asn Ser Asn Phe Ala Trp Thr Ala Gly Thr Lys Tyr His Leu 500 505 510Asn Gly Arg Asn Ser Leu Ala Asn Pro Gly Ile Ala Met Ala Thr His 515 520 525Lys Asp Asp Glu Glu Arg Phe Phe Pro Ser Asn Gly Ile Leu Ile Phe 530 535 540Gly Lys Gln Asn Ala Ala Arg Asp Asn Ala Asp Tyr Ser Asp Val Met545 550 555 560Leu Thr Ser Glu Glu Glu Ile Lys Thr Thr Asn Pro Val Ala Thr Glu 565 570 575Glu Tyr Gly Ile Val Ala Asp Asn Leu Gln Gln Gln Asn Thr Ala Pro 580 585 590Gln Ile Gly Thr Val Asn Ser Gln Gly Ala Leu Pro Gly Met Val Trp 595 600 605Gln Asn Arg Asp Val Tyr Leu Gln Gly Pro Ile Trp Ala Lys Ile Pro 610 615 620His Thr Asp Gly Asn Phe His Pro Ser Pro Leu Met Gly Gly Phe Gly625 630 635 640Leu Lys His Pro Pro Pro Gln Ile Leu Ile Lys Asn Thr Pro Val Pro 645 650 655Ala Asp Pro Pro Thr Thr Phe Asn Gln Ser Lys Leu Asn Ser Phe Ile 660 665 670Thr Gln Tyr Ser Thr Gly Gln Val Ser Val Glu Ile Glu Trp Glu Leu 675 680 685Gln Lys Glu Asn Ser Lys Arg Trp Asn Pro Glu Ile Gln Tyr Thr Ser 690 695 700Asn Tyr Tyr Lys Ser Thr Ser Val Asp Phe Ala Val Asn Thr Glu Gly705 710 715 720Val Tyr Ser Glu Pro Arg Pro Ile Gly Thr Arg Tyr Leu Thr Arg Asn 725 730 735Leu252214DNAArtificial Sequenceconstructed sequence 25atggctgccg atggttatct tccagattgg ctcgaggaca acctctctga gggcattcgc 60gagtggtggg cgctgaaacc tggagccccg aagcccaaag ccaaccagca aaagcaggac 120gacggccggg gtctggtgct tcctggctac aagtacctcg gacccttcaa cggactcgac 180aagggggagc ccgtcaacgc ggcggacgca gcggccctcg agcacgacaa ggcctacgac 240cagcagctgc aggcgggtga caatccgtac 
ctgcggtata accacgccga cgccgagttt 300caggagcgtc tgcaagaaga tacgtctttt gggggcaacc tcgggcgagc agtcttccag 360gccaagaagc gggttctcga acctctcggt ctggttgagg aaggcgctaa gacggctcct 420ggaaagaaga gaccggtaga gccatcaccc cagcgttctc cagactcctc tacgggcatc 480ggcaagaaag gccaacagcc cgccagaaaa agactcaatt ttggtcagac tggcgactca 540gagtcagttc cagaccctca acctctcgga gaacctccag cagcgccctc tggtgtggga 600cctaatacaa tggctgcagg cggtggcgca ccaatggcag acaataacga aggcgccgac 660ggagtgggta gttcctcggg aaattggcat tgcgattcca catggctggg cgacagagtc 720atcaccacca gcacccgaac ctgggccctg cccacctaca acaaccacct ctacaagcaa 780atctcctctg gtactcatgg agccaccaac gacaacacct acttcggcta cagcaccccc 840tgggggtatt ttgactttaa cagattccac tgccactttt caccacgtga ctggcagcga 900ctcatcaaca acaactgggg attccggccc aagagactca gcttcaagct cttcaacatc 960caggtcaagg aggtcacgca gaatgaaggc accaagacca tcgccaataa cctcaccagc 1020accatccagg tgtttacgga ctcggagtac cagctgccgt acgttctcgg ctctgcccac 1080cagggctgcc tgcctccgtt cccggcggac gtgttcatga ttccccagta cggctaccta 1140acactcaaca acggtagtca ggccgtggga cgctcctcct tctactgcct ggaatacttt 1200ccttcgcaga tgctgagaac cggcaacaac ttccagttta cttacacctt cgaggacgtg 1260cctttccaca gcagctacgc ccacagccag agcttggacc ggctgatgaa tcctctgatt 1320gaccagtacc tgtactactt gtctcggact caaacaacag gaggcacggc aaatacgcag 1380actctgggct tcagccaagg tgggcctaat acaatggcca atcaggcaaa gaactggctg 1440ccaggaccct gttaccgcca acaacgcgtc tcaacgacaa ccgggcaaaa caacaatagc 1500aactttgcct ggactgctgg gaccaaatac catctgaatg gaagaaattc attggctaat 1560cctggcatcg ctatggcaac acacaaagac gacgaggagc gtttttttcc cagtaacggg 1620atcctgattt ttggcaaaca aaatgctgcc agagacaatg cggattacag cgatgtcatg 1680ctcaccagcg aggaagaaat caaaaccact aaccctgtgg ctacagagga atacggtatc 1740gtgggtgata acttgcagtt gtataacacg gctcctggtt cggtgtttgt caacagccag 1800ggggccttac ccggtatggt ctggcagaac cgggacgtgt acctgcaggg tcccatctgg 1860gccaagattc ctcacacgga cggcaacttc cacccgtctc cgctgatggg cggctttggc 1920ctgaaacatc ctccgcctca gatcctgatc aagaacacgc ctgtacctgc ggatcctccg 1980accaccttca 
accagtcaaa gctgaactct ttcatcacgc aatacagcac cggacaggtc 2040agcgtggaaa ttgaatggga gctgcagaag gaaaacagca agcgctggaa ccccgagatc 2100cagtacacct ccaactacta caaatctaca agtgtggact ttgctgttaa tacagaaggc 2160gtgtactctg aaccccgccc cattggcacc cgttacctca cccgtaatct gtaa 221426737PRTArtificial Sequenceconstructed sequence 26Met Ala Ala Asp Gly Tyr Leu Pro Asp Trp Leu Glu Asp Asn Leu Ser1 5 10 15Glu Gly Ile Arg Glu Trp Trp Ala Leu Lys Pro Gly Ala Pro Lys Pro 20 25 30Lys Ala Asn Gln Gln Lys Gln Asp Asp Gly Arg Gly Leu Val Leu Pro 35 40 45Gly Tyr Lys Tyr Leu Gly Pro Phe Asn Gly Leu Asp Lys Gly Glu Pro 50 55 60Val Asn Ala Ala Asp Ala Ala Ala Leu Glu His Asp Lys Ala Tyr Asp65 70 75 80Gln Gln Leu Gln Ala Gly Asp Asn Pro Tyr Leu Arg Tyr Asn His Ala 85 90 95Asp Ala Glu Phe Gln Glu Arg Leu Gln Glu Asp Thr Ser Phe Gly Gly 100 105 110Asn Leu Gly Arg Ala Val Phe Gln Ala Lys Lys Arg Val Leu Glu Pro 115 120 125Leu Gly Leu Val Glu Glu Gly Ala Lys Thr Ala Pro Gly Lys Lys Arg 130 135 140Pro Val Glu Pro Ser Pro Gln Arg Ser Pro Asp Ser Ser Thr Gly Ile145 150 155 160Gly Lys Lys Gly Gln Gln Pro Ala Arg Lys Arg Leu Asn Phe Gly Gln 165 170 175Thr Gly Asp Ser Glu Ser Val Pro Asp Pro Gln Pro Leu Gly Glu Pro 180 185 190Pro Ala Ala Pro Ser Gly Val Gly Pro Asn Thr Met Ala Ala Gly Gly 195 200 205Gly Ala Pro Met Ala Asp Asn Asn Glu Gly Ala Asp Gly Val Gly Ser 210 215 220Ser Ser Gly Asn Trp His Cys Asp Ser Thr Trp Leu Gly Asp Arg Val225 230 235 240Ile Thr Thr Ser Thr Arg Thr Trp Ala Leu Pro Thr Tyr Asn Asn His 245 250 255Leu Tyr Lys Gln Ile Ser Ser Gly Thr His Gly Ala Thr Asn Asp Asn 260 265 270Thr Tyr Phe Gly Tyr Ser Thr Pro Trp Gly Tyr Phe Asp Phe Asn Arg 275 280 285Phe His Cys His Phe Ser Pro Arg Asp Trp Gln Arg Leu Ile Asn Asn 290 295 300Asn Trp Gly Phe Arg Pro Lys Arg Leu Ser Phe Lys Leu Phe Asn Ile305 310 315 320Gln Val Lys Glu Val Thr Gln Asn Glu Gly Thr Lys Thr Ile Ala Asn 325 330 335Asn Leu Thr Ser Thr Ile Gln Val Phe Thr Asp Ser Glu Tyr Gln Leu 340 345 350Pro Tyr Val Leu Gly Ser Ala His Gln Gly Cys 
Leu Pro Pro Phe Pro 355 360 365Ala Asp Val Phe Met Ile Pro Gln Tyr Gly Tyr Leu Thr Leu Asn Asn 370 375 380Gly Ser Gln Ala Val Gly Arg Ser Ser Phe Tyr Cys Leu Glu Tyr Phe385 390 395 400Pro Ser Gln Met Leu Arg Thr Gly Asn Asn Phe Gln Phe Thr Tyr Thr 405 410 415Phe Glu Asp Val Pro Phe His Ser Ser Tyr Ala His Ser Gln Ser Leu 420 425 430Asp Arg Leu Met Asn Pro Leu Ile Asp Gln Tyr Leu Tyr Tyr Leu Ser 435 440 445Arg Thr Gln Thr Thr Gly Gly Thr Ala Asn Thr Gln Thr Leu Gly Phe 450 455 460Ser Gln Gly Gly Pro Asn Thr Met Ala Asn Gln Ala Lys Asn Trp Leu465 470 475 480Pro Gly Pro Cys Tyr Arg Gln Gln Arg Val Ser Thr Thr Thr Gly Gln 485 490 495Asn Asn Asn Ser Asn Phe Ala Trp Thr Ala Gly Thr Lys Tyr His Leu 500 505 510Asn Gly Arg Asn Ser Leu Ala Asn Pro Gly Ile Ala Met Ala Thr His 515 520 525Lys Asp Asp Glu Glu Arg Phe Phe Pro Ser Asn Gly Ile Leu Ile Phe 530 535 540Gly Lys Gln Asn Ala Ala Arg Asp Asn Ala Asp Tyr Ser Asp Val Met545 550 555 560Leu Thr Ser Glu Glu Glu Ile Lys Thr Thr Asn Pro Val Ala Thr Glu 565 570 575Glu Tyr Gly Ile Val Gly Asp Asn Leu Gln Leu Tyr Asn Thr Ala Pro 580 585 590Gly Ser Val Phe Val Asn Ser Gln Gly Ala Leu Pro Gly Met Val Trp 595 600 605Gln Asn Arg Asp Val Tyr Leu Gln Gly Pro Ile Trp Ala Lys Ile Pro 610 615 620His Thr Asp Gly Asn Phe His Pro Ser Pro Leu Met Gly Gly Phe Gly625 630 635 640Leu Lys His Pro Pro Pro Gln Ile Leu Ile Lys Asn Thr Pro Val Pro 645 650 655Ala Asp Pro Pro Thr Thr Phe Asn Gln Ser Lys Leu Asn Ser Phe Ile 660 665 670Thr Gln Tyr Ser Thr Gly Gln Val Ser Val Glu Ile Glu Trp Glu Leu 675 680 685Gln Lys Glu Asn Ser Lys Arg Trp Asn Pro Glu Ile Gln Tyr Thr Ser 690 695 700Asn Tyr Tyr Lys Ser Thr Ser Val Asp Phe Ala Val Asn Thr Glu Gly705 710 715 720Val Tyr Ser Glu Pro Arg Pro Ile Gly Thr Arg Tyr Leu Thr Arg Asn 725 730 735Leu272214DNAArtificial Sequenceconstructed sequence 27atggctgccg atggttatct tccagattgg ctcgaggaca acctctctga gggcattcgc 60gagtggtggg cgctgaaacc tggagccccg aagcccaaag ccaaccagca aaagcaggac 120gacggccggg gtctggtgct tcctggctac 
aagtacctcg gacccttcaa cggactcgac 180aagggggagc ccgtcaacgc ggcggacgca gcggccctcg agcacgacaa ggcctacgac 240cagcagctgc aggcgggtga caatccgtac ctgcggtata accacgccga cgccgagttt 300caggagcgtc tgcaagaaga tacgtctttt gggggcaacc tcgggcgagc agtcttccag 360gccaagaagc gggttctcga acctctcggt ctggttgagg aaggcgctaa gacggctcct 420ggaaagaaga gaccggtaga gccatcaccc cagcgttctc cagactcctc tacgggcatc 480ggcaagaaag gccaacagcc cgccagaaaa agactcaatt ttggtcagac tggcgactca 540gagtcagttc cagaccctca acctctcgga gaacctccag cagcgccctc tggtgtggga 600cctaatacaa tggctgcagg cggtggcgca ccaatggcag acaataacga aggcgccgac 660ggagtgggta gttcctcggg aaattggcat tgcgattcca catggctggg cgacagagtc 720atcaccacca gcacccgaac ctgggccctg cccacctaca acaaccacct ctacaagcaa 780atctcctctg gtactcatgg agccaccaac gacaacacct acttcggcta cagcaccccc 840tgggggtatt ttgactttaa cagattccac tgccactttt caccacgtga ctggcagcga 900ctcatcaaca acaactgggg attccggccc aagagactca gcttcaagct cttcaacatc 960caggtcaagg aggtcacgca gaatgaaggc accaagacca tcgccaataa cctcaccagc 1020accatccagg tgtttacgga ctcggagtac cagctgccgt acgttctcgg ctctgcccac 1080cagggctgcc tgcctccgtt cccggcggac gtgttcatga ttccccagta cggctaccta 1140acactcaaca acggtagtca ggccgtggga cgctcctcct tctactgcct ggaatacttt 1200ccttcgcaga tgctgagaac cggcaacaac ttccagttta cttacacctt cgaggacgtg 1260cctttccaca gcagctacgc ccacagccag agcttggacc ggctgatgaa tcctctgatt 1320gaccagtacc tgtactactt gtctcggact caaacaacag gaggcacggc aaatacgcag 1380actctgggct tcagccaagg tgggcctaat acaatggcca atcaggcaaa gaactggctg 1440ccaggaccct gttaccgcca acaacgcgtc tcaacgacaa ccgggcaaaa caacaatagc 1500aactttgcct ggactgctgg gaccaaatac catctgaatg gaagaaattc attggctaat 1560cctggcatcg ctatggcaac acacaaagac gacgaggagc gtttttttcc cagtaacggg 1620atcctgattt ttggcaaaca aaatgctgcc agagacaatg cggattacag cgatgtcatg 1680ctcaccagcg aggaagaaat caaaaccact aaccctgtgg ctacagagga atacggtatc 1740gtggcagata acttgcagca gcaaaacacg gctcctcaaa ttggaactgt caacagccag 1800ggggccttac ccggtatggt ctggcagaac
cgggacgtgt acctgcaggg tcccatctgg 1860gccaagattc ctcacacgga cggcaacttc cacccgtctc cgctgatggg cggctttggc 1920ctgaaacatc ctccgcctca gatcctgatc aagaacacgc ctgtacctgc ggatcctccg 1980accaccttca accagtcaaa gctgaactct ttcatcacgc aatacagcac cggacaggtc 2040agcgtggaaa ttgaatggga gctgcagaag gaaaacagca agcgctggaa ccccgagatc 2100cagtacacct ccaactacta caaatctaca agtgtggact ttgctgttaa tacagaaggc 2160gtgtactctg aaccccgccc cattggcacc cgttacctca cccgtaatct gtaa 221428737PRTArtificial Sequenceconstructed sequence 28Met Ala Ala Asp Gly Tyr Leu Pro Asp Trp Leu Glu Asp Asn Leu Ser1 5 10 15Glu Gly Ile Arg Glu Trp Trp Ala Leu Lys Pro Gly Ala Pro Lys Pro 20 25 30Lys Ala Asn Gln Gln Lys Gln Asp Asp Gly Arg Gly Leu Val Leu Pro 35 40 45Gly Tyr Lys Tyr Leu Gly Pro Phe Asn Gly Leu Asp Lys Gly Glu Pro 50 55 60Val Asn Ala Ala Asp Ala Ala Ala Leu Glu His Asp Lys Ala Tyr Asp65 70 75 80Gln Gln Leu Gln Ala Gly Asp Asn Pro Tyr Leu Arg Tyr Asn His Ala 85 90 95Asp Ala Glu Phe Gln Glu Arg Leu Gln Glu Asp Thr Ser Phe Gly Gly 100 105 110Asn Leu Gly Arg Ala Val Phe Gln Ala Lys Lys Arg Val Leu Glu Pro 115 120 125Leu Gly Leu Val Glu Glu Gly Ala Lys Thr Ala Pro Gly Lys Lys Arg 130 135 140Pro Val Glu Pro Ser Pro Gln Arg Ser Pro Asp Ser Ser Thr Gly Ile145 150 155 160Gly Lys Lys Gly Gln Gln Pro Ala Arg Lys Arg Leu Asn Phe Gly Gln 165 170 175Thr Gly Asp Ser Glu Ser Val Pro Asp Pro Gln Pro Leu Gly Glu Pro 180 185 190Pro Ala Ala Pro Ser Gly Val Gly Pro Asn Thr Met Ala Ala Gly Gly 195 200 205Gly Ala Pro Met Ala Asp Asn Asn Glu Gly Ala Asp Gly Val Gly Ser 210 215 220Ser Ser Gly Asn Trp His Cys Asp Ser Thr Trp Leu Gly Asp Arg Val225 230 235 240Ile Thr Thr Ser Thr Arg Thr Trp Ala Leu Pro Thr Tyr Asn Asn His 245 250 255Leu Tyr Lys Gln Ile Ser Ser Gly Thr His Gly Ala Thr Asn Asp Asn 260 265 270Thr Tyr Phe Gly Tyr Ser Thr Pro Trp Gly Tyr Phe Asp Phe Asn Arg 275 280 285Phe His Cys His Phe Ser Pro Arg Asp Trp Gln Arg Leu Ile Asn Asn 290 295 300Asn Trp Gly Phe Arg Pro Lys Arg Leu Ser Phe Lys Leu Phe Asn Ile305 310 315 320Gln 
Val Lys Glu Val Thr Gln Asn Glu Gly Thr Lys Thr Ile Ala Asn 325 330 335Asn Leu Thr Ser Thr Ile Gln Val Phe Thr Asp Ser Glu Tyr Gln Leu 340 345 350Pro Tyr Val Leu Gly Ser Ala His Gln Gly Cys Leu Pro Pro Phe Pro 355 360 365Ala Asp Val Phe Met Ile Pro Gln Tyr Gly Tyr Leu Thr Leu Asn Asn 370 375 380Gly Ser Gln Ala Val Gly Arg Ser Ser Phe Tyr Cys Leu Glu Tyr Phe385 390 395 400Pro Ser Gln Met Leu Arg Thr Gly Asn Asn Phe Gln Phe Thr Tyr Thr 405 410 415Phe Glu Asp Val Pro Phe His Ser Ser Tyr Ala His Ser Gln Ser Leu 420 425 430Asp Arg Leu Met Asn Pro Leu Ile Asp Gln Tyr Leu Tyr Tyr Leu Ser 435 440 445Arg Thr Gln Thr Thr Gly Gly Thr Ala Asn Thr Gln Thr Leu Gly Phe 450 455 460Ser Gln Gly Gly Pro Asn Thr Met Ala Asn Gln Ala Lys Asn Trp Leu465 470 475 480Pro Gly Pro Cys Tyr Arg Gln Gln Arg Val Ser Thr Thr Thr Gly Gln 485 490 495Asn Asn Asn Ser Asn Phe Ala Trp Thr Ala Gly Thr Lys Tyr His Leu 500 505 510Asn Gly Arg Asn Ser Leu Ala Asn Pro Gly Ile Ala Met Ala Thr His 515 520 525Lys Asp Asp Glu Glu Arg Phe Phe Pro Ser Asn Gly Ile Leu Ile Phe 530 535 540Gly Lys Gln Asn Ala Ala Arg Asp Asn Ala Asp Tyr Ser Asp Val Met545 550 555 560Leu Thr Ser Glu Glu Glu Ile Lys Thr Thr Asn Pro Val Ala Thr Glu 565 570 575Glu Tyr Gly Ile Val Ala Asp Asn Leu Gln Gln Gln Asn Thr Ala Pro 580 585 590Gln Ile Gly Thr Val Asn Ser Gln Gly Ala Leu Pro Gly Met Val Trp 595 600 605Gln Asn Arg Asp Val Tyr Leu Gln Gly Pro Ile Trp Ala Lys Ile Pro 610 615 620His Thr Asp Gly Asn Phe His Pro Ser Pro Leu Met Gly Gly Phe Gly625 630 635 640Leu Lys His Pro Pro Pro Gln Ile Leu Ile Lys Asn Thr Pro Val Pro 645 650 655Ala Asp Pro Pro Thr Thr Phe Asn Gln Ser Lys Leu Asn Ser Phe Ile 660 665 670Thr Gln Tyr Ser Thr Gly Gln Val Ser Val Glu Ile Glu Trp Glu Leu 675 680 685Gln Lys Glu Asn Ser Lys Arg Trp Asn Pro Glu Ile Gln Tyr Thr Ser 690 695 700Asn Tyr Tyr Lys Ser Thr Ser Val Asp Phe Ala Val Asn Thr Glu Gly705 710 715 720Val Tyr Ser Glu Pro Arg Pro Ile Gly Thr Arg Tyr Leu Thr Arg Asn 725 730 735Leu292217DNAArtificial 
Sequenceconstructed sequence 29atggctgccg atggttatct tccagattgg ctcgaggaca acctctctga gggcattcgc 60gagtggtggg cgctgaaacc tggagccccg aagcccaaag ccaaccagca aaagcaggac 120gacggccggg gtctggtgct tcctggctac aagtacctcg gacccttcaa cggactcgac 180aagggggagc ccgtcaacgc ggcggacgca gcggccctcg agcacgacaa ggcctacgac 240cagcagctgc aggcgggtga caatccgtac ctgcggtata accacgccga cgccgagttt 300caggagcgtc tgcaagaaga tacgtctttt gggggcaacc tcgggcgagc agtcttccag 360gccaagaagc gggttctcga acctctcggt ctggttgagg aaggcgctaa gacggctcct 420ggaaagaaga gaccggtaga gccatcaccc cagcgttctc cagactcctc tacgggcatc 480ggcaagaaag gccaacagcc cgccagaaaa agactcaatt ttggtcagac tggcgactca 540gagtcagttc cagaccctca acctctcgga gaacctccag cagcgccctc tggtgtggga 600cctaatacaa tggctgcagg cggtggcgca ccaatggcag acaataacga aggcgccgac 660ggagtgggta gttcctcggg aaattggcat tgcgattcca catggctggg cgacagagtc 720atcaccacca gcacccgaac ctgggccctg cccacctaca acaaccacct ctacaagcaa 780atctccaacg ggacatcggg aggagccacc aacgacaaca cctacttcgg ctacagcacc 840ccctgggggt attttgactt taacagattc cactgccact tttcaccacg tgactggcag 900cgactcatca acaacaactg gggattccgg cccaagagac tcagcttcaa gctcttcaac 960atccaggtca aggaggtcac gcagaatgaa ggcaccaaga ccatcgccaa taacctcacc 1020agcaccatcc aggtgtttac ggactcggag taccagctgc cgtacgttct cggctctgcc 1080caccagggct gcctgcctcc gttcccggcg gacgtgttca tgattcccca gtacggctac 1140ctaacactca acaacggtag tcaggccgtg ggacgctcct ccttctactg cctggaatac 1200tttccttcgc agatgctgag aaccggcaac aacttccagt ttacttacac cttcgaggac 1260gtgcctttcc acagcagcta cgcccacagc cagagcttgg accggctgat gaatcctctg 1320attgaccagt acctgtacta cttgtctcgg actcaaacaa caggtgggag taggcctacg 1380cagactctgg gcttcagcca aggtgggcct aatacaatgg ccaatcaggc aaagaactgg 1440ctgccaggac cctgttaccg ccaacaacgc gtctcaacga caaccgggca aaacaacaat 1500agcaactttg cctggactgc tgggaccaaa taccatctga atggaagaaa ttcattggct 1560aatcctggca tcgctatggc aacacacaaa gacgacgagg agcgtttttt tcccagtaac 1620gggatcctga tttttggcaa acaaaatgct gccagagaca atgcggatta cagcgatgtc 1680atgctcacca gcgaggaaga 
aatcaaaacc actaaccctg tggctacaga ggaatacggt 1740atcgtggcag ataacttgca gcagcaaaac acggctcctc aaattggaac tgtcaacagc 1800cagggggcct tacccggtat ggtctggcag aaccgggacg tgtacctgca gggtcccatc 1860tgggccaaga ttcctcacac ggacggcaac ttccacccgt ctccgctgat gggcggcttt 1920ggcctgaaac atcctccgcc tcagatcctg atcaagaaca cgcctgtacc tgcggatcct 1980ccgaccacct tcaaccagtc aaagctgaac tctttcatca cgcaatacag caccggacag 2040gtcagcgtgg aaattgaatg ggagctgcag aaggaaaaca gcaagcgctg gaaccccgag 2100atccagtaca cctccaacta ctacaaatct acaagtgtgg actttgctgt taatacagaa 2160ggcgtgtact ctgaaccccg ccccattggc acccgttacc tcacccgtaa tctgtaa 221730738PRTArtificial Sequenceconstructed sequence 30Met Ala Ala Asp Gly Tyr Leu Pro Asp Trp Leu Glu Asp Asn Leu Ser1 5 10 15Glu Gly Ile Arg Glu Trp Trp Ala Leu Lys Pro Gly Ala Pro Lys Pro 20 25 30Lys Ala Asn Gln Gln Lys Gln Asp Asp Gly Arg Gly Leu Val Leu Pro 35 40 45Gly Tyr Lys Tyr Leu Gly Pro Phe Asn Gly Leu Asp Lys Gly Glu Pro 50 55 60Val Asn Ala Ala Asp Ala Ala Ala Leu Glu His Asp Lys Ala Tyr Asp65 70 75 80Gln Gln Leu Gln Ala Gly Asp Asn Pro Tyr Leu Arg Tyr Asn His Ala 85 90 95Asp Ala Glu Phe Gln Glu Arg Leu Gln Glu Asp Thr Ser Phe Gly Gly 100 105 110Asn Leu Gly Arg Ala Val Phe Gln Ala Lys Lys Arg Val Leu Glu Pro 115 120 125Leu Gly Leu Val Glu Glu Gly Ala Lys Thr Ala Pro Gly Lys Lys Arg 130 135 140Pro Val Glu Pro Ser Pro Gln Arg Ser Pro Asp Ser Ser Thr Gly Ile145 150 155 160Gly Lys Lys Gly Gln Gln Pro Ala Arg Lys Arg Leu Asn Phe Gly Gln 165 170 175Thr Gly Asp Ser Glu Ser Val Pro Asp Pro Gln Pro Leu Gly Glu Pro 180 185 190Pro Ala Ala Pro Ser Gly Val Gly Pro Asn Thr Met Ala Ala Gly Gly 195 200 205Gly Ala Pro Met Ala Asp Asn Asn Glu Gly Ala Asp Gly Val Gly Ser 210 215 220Ser Ser Gly Asn Trp His Cys Asp Ser Thr Trp Leu Gly Asp Arg Val225 230 235 240Ile Thr Thr Ser Thr Arg Thr Trp Ala Leu Pro Thr Tyr Asn Asn His 245 250 255Leu Tyr Lys Gln Ile Ser Asn Gly Thr Ser Gly Gly Ala Thr Asn Asp 260 265 270Asn Thr Tyr Phe Gly Tyr Ser Thr Pro Trp Gly Tyr Phe Asp Phe Asn 275 280 285Arg 
Phe His Cys His Phe Ser Pro Arg Asp Trp Gln Arg Leu Ile Asn 290 295 300Asn Asn Trp Gly Phe Arg Pro Lys Arg Leu Ser Phe Lys Leu Phe Asn305 310 315 320Ile Gln Val Lys Glu Val Thr Gln Asn Glu Gly Thr Lys Thr Ile Ala 325 330 335Asn Asn Leu Thr Ser Thr Ile Gln Val Phe Thr Asp Ser Glu Tyr Gln 340 345 350Leu Pro Tyr Val Leu Gly Ser Ala His Gln Gly Cys Leu Pro Pro Phe 355 360 365Pro Ala Asp Val Phe Met Ile Pro Gln Tyr Gly Tyr Leu Thr Leu Asn 370 375 380Asn Gly Ser Gln Ala Val Gly Arg Ser Ser Phe Tyr Cys Leu Glu Tyr385 390 395 400Phe Pro Ser Gln Met Leu Arg Thr Gly Asn Asn Phe Gln Phe Thr Tyr 405 410 415Thr Phe Glu Asp Val Pro Phe His Ser Ser Tyr Ala His Ser Gln Ser 420 425 430Leu Asp Arg Leu Met Asn Pro Leu Ile Asp Gln Tyr Leu Tyr Tyr Leu 435 440 445Ser Arg Thr Gln Thr Thr Gly Gly Ser Arg Pro Thr Gln Thr Leu Gly 450 455 460Phe Ser Gln Gly Gly Pro Asn Thr Met Ala Asn Gln Ala Lys Asn Trp465 470 475 480Leu Pro Gly Pro Cys Tyr Arg Gln Gln Arg Val Ser Thr Thr Thr Gly 485 490 495Gln Asn Asn Asn Ser Asn Phe Ala Trp Thr Ala Gly Thr Lys Tyr His 500 505 510Leu Asn Gly Arg Asn Ser Leu Ala Asn Pro Gly Ile Ala Met Ala Thr 515 520 525His Lys Asp Asp Glu Glu Arg Phe Phe Pro Ser Asn Gly Ile Leu Ile 530 535 540Phe Gly Lys Gln Asn Ala Ala Arg Asp Asn Ala Asp Tyr Ser Asp Val545 550 555 560Met Leu Thr Ser Glu Glu Glu Ile Lys Thr Thr Asn Pro Val Ala Thr 565 570 575Glu Glu Tyr Gly Ile Val Ala Asp Asn Leu Gln Gln Gln Asn Thr Ala 580 585 590Pro Gln Ile Gly Thr Val Asn Ser Gln Gly Ala Leu Pro Gly Met Val 595 600 605Trp Gln Asn Arg Asp Val Tyr Leu Gln Gly Pro Ile Trp Ala Lys Ile 610 615 620Pro His Thr Asp Gly Asn Phe His Pro Ser Pro Leu Met Gly Gly Phe625 630 635 640Gly Leu Lys His Pro Pro Pro Gln Ile Leu Ile Lys Asn Thr Pro Val 645 650 655Pro Ala Asp Pro Pro Thr Thr Phe Asn Gln Ser Lys Leu Asn Ser Phe 660 665 670Ile Thr Gln Tyr Ser Thr Gly Gln Val Ser Val Glu Ile Glu Trp Glu 675 680 685Leu Gln Lys Glu Asn Ser Lys Arg Trp Asn Pro Glu Ile Gln Tyr Thr 690 695 700Ser Asn Tyr Tyr Lys Ser Thr Ser Val 
Asp Phe Ala Val Asn Thr Glu705 710 715 720Gly Val Tyr Ser Glu Pro Arg Pro Ile Gly Thr Arg Tyr Leu Thr Arg 725 730 735Asn Leu312217DNAArtificial Sequenceconstructed sequence 31atggctgccg atggttatct tccagattgg ctcgaggaca acctctctga gggcattcgc 60gagtggtggg cgctgaaacc tggagccccg aagcccaaag ccaaccagca aaagcaggac 120gacggccggg gtctggtgct tcctggctac aagtacctcg gacccttcaa cggactcgac 180aagggggagc ccgtcaacgc ggcggacgca gcggccctcg agcacgacaa ggcctacgac 240cagcagctgc aggcgggtga caatccgtac ctgcggtata accacgccga cgccgagttt 300caggagcgtc tgcaagaaga tacgtctttt gggggcaacc tcgggcgagc agtcttccag 360gccaagaagc gggttctcga acctctcggt ctggttgagg aaggcgctaa gacggctcct 420ggaaagaaga gaccggtaga gccatcaccc cagcgttctc cagactcctc tacgggcatc 480ggcaagaaag gccaacagcc cgccagaaaa agactcaatt ttggtcagac tggcgactca 540gagtcagttc cagaccctca acctctcgga gaacctccag cagcgccctc tggtgtggga 600cctaatacaa tggctgcagg cggtggcgca ccaatggcag acaataacga aggcgccgac 660ggagtgggta gttcctcggg aaattggcat tgcgattcca catggctggg cgacagagtc 720atcaccacca gcacccgaac ctgggccctg cccacctaca acaaccacct ctacaagcaa 780atctccaacg ggacatcggg aggagccacc aacgacaaca cctacttcgg ctacagcacc 840ccctgggggt attttgactt taacagattc cactgccact tttcaccacg tgactggcag 900cgactcatca acaacaactg gggattccgg cccaagagac tcagcttcaa gctcttcaac 960atccaggtca aggaggtcac gcagaatgaa ggcaccaaga ccatcgccaa taacctcacc 1020agcaccatcc aggtgtttac ggactcggag taccagctgc cgtacgttct cggctctgcc 1080caccagggct gcctgcctcc gttcccggcg gacgtgttca tgattcccca gtacggctac 1140ctaacactca acaacggtag tcaggccgtg ggacgctcct ccttctactg cctggaatac 1200tttccttcgc agatgctgag aaccggcaac aacttccagt ttacttacac cttcgaggac 1260gtgcctttcc acagcagcta cgcccacagc cagagcttgg accggctgat gaatcctctg 1320attgaccagt acctgtacta cttgtctcgg actcaaacaa caggtgggag taggcctacg 1380cagactctgg gcttcagcca aggtgggcct aatacaatgg ccaatcaggc aaagaactgg 1440ctgccaggac cctgttaccg ccaacaacgc gtctcaacga caaccgggca aaacaacaat 1500agcaactttg cctggactgc tgggaccaaa taccatctga atggaagaaa ttcattggct 1560aatcctggca tcgctatggc 
aacacacaaa gacgacgagg agcgtttttt tcccagtaac 1620gggatcctga tttttggcaa acaaaatgct gccagagaca atgcggatta cagcgatgtc 1680atgctcacca gcgaggaaga aatcaaaacc actaaccctg tggctacaga ggaatacggt 1740atcgtgggtg ataacttgca gttgtataac acggctcctg gttcggtgtt tgtcaacagc 1800cagggggcct tacccggtat ggtctggcag aaccgggacg tgtacctgca gggtcccatc 1860tgggccaaga ttcctcacac ggacggcaac ttccacccgt ctccgctgat gggcggcttt 1920ggcctgaaac atcctccgcc tcagatcctg atcaagaaca cgcctgtacc tgcggatcct 1980ccgaccacct tcaaccagtc aaagctgaac tctttcatca cgcaatacag caccggacag 2040gtcagcgtgg aaattgaatg ggagctgcag aaggaaaaca gcaagcgctg gaaccccgag 2100atccagtaca cctccaacta ctacaaatct acaagtgtgg actttgctgt taatacagaa 2160ggcgtgtact ctgaaccccg ccccattggc acccgttacc tcacccgtaa tctgtaa 221732738PRTArtificial Sequenceconstructed sequence 32Met Ala Ala Asp Gly Tyr Leu Pro Asp Trp Leu Glu Asp Asn Leu Ser1 5 10 15Glu Gly Ile Arg Glu Trp Trp Ala Leu Lys Pro Gly Ala Pro Lys Pro 20 25 30Lys Ala Asn Gln Gln Lys Gln Asp Asp Gly Arg Gly Leu Val Leu Pro 35 40 45Gly Tyr Lys Tyr Leu Gly Pro Phe Asn Gly Leu Asp Lys Gly Glu Pro 50 55 60Val Asn Ala Ala Asp Ala Ala Ala Leu Glu His Asp Lys Ala Tyr Asp65 70 75 80Gln Gln Leu Gln Ala Gly Asp Asn Pro Tyr Leu Arg Tyr Asn His Ala 85 90 95Asp Ala Glu Phe Gln Glu Arg Leu Gln Glu Asp Thr Ser Phe Gly Gly 100 105 110Asn Leu Gly Arg Ala Val Phe Gln Ala Lys Lys Arg Val Leu Glu Pro 115 120 125Leu Gly Leu Val Glu Glu Gly Ala Lys Thr Ala Pro Gly Lys Lys Arg 130 135 140Pro Val Glu Pro Ser Pro Gln Arg Ser Pro Asp Ser Ser Thr Gly Ile145 150 155 160Gly Lys Lys Gly Gln Gln Pro
Ala Arg Lys Arg Leu Asn Phe Gly Gln 165 170 175Thr Gly Asp Ser Glu Ser Val Pro Asp Pro Gln Pro Leu Gly Glu Pro 180 185 190Pro Ala Ala Pro Ser Gly Val Gly Pro Asn Thr Met Ala Ala Gly Gly 195 200 205Gly Ala Pro Met Ala Asp Asn Asn Glu Gly Ala Asp Gly Val Gly Ser 210 215 220Ser Ser Gly Asn Trp His Cys Asp Ser Thr Trp Leu Gly Asp Arg Val225 230 235 240Ile Thr Thr Ser Thr Arg Thr Trp Ala Leu Pro Thr Tyr Asn Asn His 245 250 255Leu Tyr Lys Gln Ile Ser Asn Gly Thr Ser Gly Gly Ala Thr Asn Asp 260 265 270Asn Thr Tyr Phe Gly Tyr Ser Thr Pro Trp Gly Tyr Phe Asp Phe Asn 275 280 285Arg Phe His Cys His Phe Ser Pro Arg Asp Trp Gln Arg Leu Ile Asn 290 295 300Asn Asn Trp Gly Phe Arg Pro Lys Arg Leu Ser Phe Lys Leu Phe Asn305 310 315 320Ile Gln Val Lys Glu Val Thr Gln Asn Glu Gly Thr Lys Thr Ile Ala 325 330 335Asn Asn Leu Thr Ser Thr Ile Gln Val Phe Thr Asp Ser Glu Tyr Gln 340 345 350Leu Pro Tyr Val Leu Gly Ser Ala His Gln Gly Cys Leu Pro Pro Phe 355 360 365Pro Ala Asp Val Phe Met Ile Pro Gln Tyr Gly Tyr Leu Thr Leu Asn 370 375 380Asn Gly Ser Gln Ala Val Gly Arg Ser Ser Phe Tyr Cys Leu Glu Tyr385 390 395 400Phe Pro Ser Gln Met Leu Arg Thr Gly Asn Asn Phe Gln Phe Thr Tyr 405 410 415Thr Phe Glu Asp Val Pro Phe His Ser Ser Tyr Ala His Ser Gln Ser 420 425 430Leu Asp Arg Leu Met Asn Pro Leu Ile Asp Gln Tyr Leu Tyr Tyr Leu 435 440 445Ser Arg Thr Gln Thr Thr Gly Gly Ser Arg Pro Thr Gln Thr Leu Gly 450 455 460Phe Ser Gln Gly Gly Pro Asn Thr Met Ala Asn Gln Ala Lys Asn Trp465 470 475 480Leu Pro Gly Pro Cys Tyr Arg Gln Gln Arg Val Ser Thr Thr Thr Gly 485 490 495Gln Asn Asn Asn Ser Asn Phe Ala Trp Thr Ala Gly Thr Lys Tyr His 500 505 510Leu Asn Gly Arg Asn Ser Leu Ala Asn Pro Gly Ile Ala Met Ala Thr 515 520 525His Lys Asp Asp Glu Glu Arg Phe Phe Pro Ser Asn Gly Ile Leu Ile 530 535 540Phe Gly Lys Gln Asn Ala Ala Arg Asp Asn Ala Asp Tyr Ser Asp Val545 550 555 560Met Leu Thr Ser Glu Glu Glu Ile Lys Thr Thr Asn Pro Val Ala Thr 565 570 575Glu Glu Tyr Gly Ile Val Gly Asp Asn Leu Gln Leu Tyr Asn Thr 
Ala 580 585 590Pro Gly Ser Val Phe Val Asn Ser Gln Gly Ala Leu Pro Gly Met Val 595 600 605Trp Gln Asn Arg Asp Val Tyr Leu Gln Gly Pro Ile Trp Ala Lys Ile 610 615 620Pro His Thr Asp Gly Asn Phe His Pro Ser Pro Leu Met Gly Gly Phe625 630 635 640Gly Leu Lys His Pro Pro Pro Gln Ile Leu Ile Lys Asn Thr Pro Val 645 650 655Pro Ala Asp Pro Pro Thr Thr Phe Asn Gln Ser Lys Leu Asn Ser Phe 660 665 670Ile Thr Gln Tyr Ser Thr Gly Gln Val Ser Val Glu Ile Glu Trp Glu 675 680 685Leu Gln Lys Glu Asn Ser Lys Arg Trp Asn Pro Glu Ile Gln Tyr Thr 690 695 700Ser Asn Tyr Tyr Lys Ser Thr Ser Val Asp Phe Ala Val Asn Thr Glu705 710 715 720Gly Val Tyr Ser Glu Pro Arg Pro Ile Gly Thr Arg Tyr Leu Thr Arg 725 730 735Asn Leu332217DNAArtificial Sequenceconstructed sequence 33atggctgccg atggttatct tccagattgg ctcgaggaca acctctctga gggcattcgc 60gagtggtggg cgctgaaacc tggagccccg aagcccaaag ccaaccagca aaagcaggac 120gacggccggg gtctggtgct tcctggctac aagtacctcg gacccttcaa cggactcgac 180aagggggagc ccgtcaacgc ggcggacgca gcggccctcg agcacgacaa ggcctacgac 240cagcagctgc aggcgggtga caatccgtac ctgcggtata accacgccga cgccgagttt 300caggagcgtc tgcaagaaga tacgtctttt gggggcaacc tcgggcgagc agtcttccag 360gccaagaagc gggttctcga acctctcggt ctggttgagg aaggcgctaa gacggctcct 420ggaaagaaga gaccggtaga gccatcaccc cagcgttctc cagactcctc tacgggcatc 480ggcaagaaag gccaacagcc cgccagaaaa agactcaatt ttggtcagac tggcgactca 540gagtcagttc cagaccctca acctctcgga gaacctccag cagcgccctc tggtgtggga 600cctaatacaa tggctgcagg cggtggcgca ccaatggcag acaataacga aggcgccgac 660ggagtgggta gttcctcggg aaattggcat tgcgattcca catggctggg cgacagagtc 720atcaccacca gcacccgaac ctgggccctg cccacctaca acaaccacct ctacaagcaa 780atctccaacg ggacatcggg aggagccacc aacgacaaca cctacttcgg ctacagcacc 840ccctgggggt attttgactt taacagattc cactgccact tttcaccacg tgactggcag 900cgactcatca acaacaactg gggattccgg cccaagagac tcagcttcaa gctcttcaac 960atccaggtca aggaggtcac gcagaatgaa ggcaccaaga ccatcgccaa taacctcacc 1020agcaccatcc aggtgtttac ggactcggag taccagctgc cgtacgttct cggctctgcc 
1080caccagggct gcctgcctcc gttcccggcg gacgtgttca tgattcccca gtacggctac 1140ctaacactca acaacggtag tcaggccgtg ggacgctcct ccttctactg cctggaatac 1200tttccttcgc agatgctgag aaccggcaac aacttccagt ttacttacac cttcgaggac 1260gtgcctttcc acagcagcta cgcccacagc cagagcttgg accggctgat gaatcctctg 1320attgaccagt acctgtacta cttgtctcgg actcaaacaa caggaggcac ggcaaatacg 1380cagactctgg gcttcagcca aggtgggcct aatacaatgg ccaatcaggc aaagaactgg 1440ctgccaggac cctgttaccg ccaacaacgc gtctcaacga caaccgggca aaacaacaat 1500agcaactttg cctggactgc tgggaccaaa taccatctga atggaagaaa ttcattggct 1560aatcctggca tcgctatggc aacacacaaa gacgacgagg agcgtttttt tcccagtaac 1620gggatcctga tttttggcaa acaaaatgct gccagagaca atgcggatta cagcgatgtc 1680atgctcacca gcgaggaaga aatcaaaacc actaaccctg tggctacaga ggaatacggt 1740atcgtggcag ataacttgca gcagcaaaac acggctcctc aaattggaac tgtcaacagc 1800cagggggcct tacccggtat ggtctggcag aaccgggacg tgtacctgca gggtcccatc 1860tgggccaaga ttcctcacac ggacggcaac ttccacccgt ctccgctgat gggcggcttt 1920ggcctgaaac atcctccgcc tcagatcctg atcaagaaca cgcctgtacc tgcggatcct 1980ccgaccacct tcaaccagtc aaagctgaac tctttcatca cgcaatacag caccggacag 2040gtcagcgtgg aaattgaatg ggagctgcag aaggaaaaca gcaagcgctg gaaccccgag 2100atccagtaca cctccaacta ctacaaatct acaagtgtgg actttgctgt taatacagaa 2160ggcgtgtact ctgaaccccg ccccattggc acccgttacc tcacccgtaa tctgtaa 221734738PRTArtificial Sequenceconstructed sequence 34Met Ala Ala Asp Gly Tyr Leu Pro Asp Trp Leu Glu Asp Asn Leu Ser1 5 10 15Glu Gly Ile Arg Glu Trp Trp Ala Leu Lys Pro Gly Ala Pro Lys Pro 20 25 30Lys Ala Asn Gln Gln Lys Gln Asp Asp Gly Arg Gly Leu Val Leu Pro 35 40 45Gly Tyr Lys Tyr Leu Gly Pro Phe Asn Gly Leu Asp Lys Gly Glu Pro 50 55 60Val Asn Ala Ala Asp Ala Ala Ala Leu Glu His Asp Lys Ala Tyr Asp65 70 75 80Gln Gln Leu Gln Ala Gly Asp Asn Pro Tyr Leu Arg Tyr Asn His Ala 85 90 95Asp Ala Glu Phe Gln Glu Arg Leu Gln Glu Asp Thr Ser Phe Gly Gly 100 105 110Asn Leu Gly Arg Ala Val Phe Gln Ala Lys Lys Arg Val Leu Glu Pro 115 120 125Leu Gly Leu Val Glu Glu Gly Ala 
Lys Thr Ala Pro Gly Lys Lys Arg 130 135 140Pro Val Glu Pro Ser Pro Gln Arg Ser Pro Asp Ser Ser Thr Gly Ile145 150 155 160Gly Lys Lys Gly Gln Gln Pro Ala Arg Lys Arg Leu Asn Phe Gly Gln 165 170 175Thr Gly Asp Ser Glu Ser Val Pro Asp Pro Gln Pro Leu Gly Glu Pro 180 185 190Pro Ala Ala Pro Ser Gly Val Gly Pro Asn Thr Met Ala Ala Gly Gly 195 200 205Gly Ala Pro Met Ala Asp Asn Asn Glu Gly Ala Asp Gly Val Gly Ser 210 215 220Ser Ser Gly Asn Trp His Cys Asp Ser Thr Trp Leu Gly Asp Arg Val225 230 235 240Ile Thr Thr Ser Thr Arg Thr Trp Ala Leu Pro Thr Tyr Asn Asn His 245 250 255Leu Tyr Lys Gln Ile Ser Asn Gly Thr Ser Gly Gly Ala Thr Asn Asp 260 265 270Asn Thr Tyr Phe Gly Tyr Ser Thr Pro Trp Gly Tyr Phe Asp Phe Asn 275 280 285Arg Phe His Cys His Phe Ser Pro Arg Asp Trp Gln Arg Leu Ile Asn 290 295 300Asn Asn Trp Gly Phe Arg Pro Lys Arg Leu Ser Phe Lys Leu Phe Asn305 310 315 320Ile Gln Val Lys Glu Val Thr Gln Asn Glu Gly Thr Lys Thr Ile Ala 325 330 335Asn Asn Leu Thr Ser Thr Ile Gln Val Phe Thr Asp Ser Glu Tyr Gln 340 345 350Leu Pro Tyr Val Leu Gly Ser Ala His Gln Gly Cys Leu Pro Pro Phe 355 360 365Pro Ala Asp Val Phe Met Ile Pro Gln Tyr Gly Tyr Leu Thr Leu Asn 370 375 380Asn Gly Ser Gln Ala Val Gly Arg Ser Ser Phe Tyr Cys Leu Glu Tyr385 390 395 400Phe Pro Ser Gln Met Leu Arg Thr Gly Asn Asn Phe Gln Phe Thr Tyr 405 410 415Thr Phe Glu Asp Val Pro Phe His Ser Ser Tyr Ala His Ser Gln Ser 420 425 430Leu Asp Arg Leu Met Asn Pro Leu Ile Asp Gln Tyr Leu Tyr Tyr Leu 435 440 445Ser Arg Thr Gln Thr Thr Gly Gly Thr Ala Asn Thr Gln Thr Leu Gly 450 455 460Phe Ser Gln Gly Gly Pro Asn Thr Met Ala Asn Gln Ala Lys Asn Trp465 470 475 480Leu Pro Gly Pro Cys Tyr Arg Gln Gln Arg Val Ser Thr Thr Thr Gly 485 490 495Gln Asn Asn Asn Ser Asn Phe Ala Trp Thr Ala Gly Thr Lys Tyr His 500 505 510Leu Asn Gly Arg Asn Ser Leu Ala Asn Pro Gly Ile Ala Met Ala Thr 515 520 525His Lys Asp Asp Glu Glu Arg Phe Phe Pro Ser Asn Gly Ile Leu Ile 530 535 540Phe Gly Lys Gln Asn Ala Ala Arg Asp Asn Ala Asp Tyr Ser Asp Val545 
550 555 560Met Leu Thr Ser Glu Glu Glu Ile Lys Thr Thr Asn Pro Val Ala Thr 565 570 575Glu Glu Tyr Gly Ile Val Ala Asp Asn Leu Gln Gln Gln Asn Thr Ala 580 585 590Pro Gln Ile Gly Thr Val Asn Ser Gln Gly Ala Leu Pro Gly Met Val 595 600 605Trp Gln Asn Arg Asp Val Tyr Leu Gln Gly Pro Ile Trp Ala Lys Ile 610 615 620Pro His Thr Asp Gly Asn Phe His Pro Ser Pro Leu Met Gly Gly Phe625 630 635 640Gly Leu Lys His Pro Pro Pro Gln Ile Leu Ile Lys Asn Thr Pro Val 645 650 655Pro Ala Asp Pro Pro Thr Thr Phe Asn Gln Ser Lys Leu Asn Ser Phe 660 665 670Ile Thr Gln Tyr Ser Thr Gly Gln Val Ser Val Glu Ile Glu Trp Glu 675 680 685Leu Gln Lys Glu Asn Ser Lys Arg Trp Asn Pro Glu Ile Gln Tyr Thr 690 695 700Ser Asn Tyr Tyr Lys Ser Thr Ser Val Asp Phe Ala Val Asn Thr Glu705 710 715 720Gly Val Tyr Ser Glu Pro Arg Pro Ile Gly Thr Arg Tyr Leu Thr Arg 725 730 735Asn Leu35594DNAArtificial Sequenceconstructed sequence 35ctggcgactc agagtcagtt ccagaccctc aacctctcgg agaacctcca gcagcgccct 60ctggtgtggg acctaataca atggctgcag gcggtggcgc accaatggca gacaataacg 120aaggcgccga cggagtgggt agttcctcgg gaaattggca ttgcgattcc acatggctgg 180gcgacagagt catcaccacc agcacccgaa cctgggccct gcccacctac aacaaccacc 240tctacaagca aatctccaac gggacatcgg gaggagccac caacgacaac acctacttcg 300gctacagcac cccctggggg tattttgact ttaacagatt ccactgccac ttttcaccac 360gtgactggca gcgactcatc aacaacaact ggggattccg gcccaagaga ctcagcttca 420agctcttcaa catccaggtc aaggaggtca cgcagaatga aggcaccaag accatcgcca 480ataacctcac cagcaccatc caggtgttta cggactcgga gtaccagctg ccgtacgttc 540tcggctctgc ccaccagggc tgcctgcctc cgttcccggc ggacgtgttc atga 59436197PRTArtificial Sequenceconstructed sequence 36Leu Ala Thr Gln Ser Gln Phe Gln Thr Leu Asn Leu Ser Glu Asn Leu1 5 10 15Gln Gln Arg Pro Leu Val Trp Asp Leu Ile Gln Trp Leu Gln Ala Val 20 25 30Ala His Gln Trp Gln Thr Ile Thr Lys Ala Pro Thr Glu Trp Val Val 35 40 45Pro Arg Glu Ile Gly Ile Ala Ile Pro His Gly Trp Ala Thr Glu Ser 50 55 60Ser Pro Pro Ala Pro Glu Pro Gly Pro Cys Pro Pro Thr Thr Thr Thr65 70 75 80Ser 
Thr Ser Lys Ser Pro Thr Gly His Arg Glu Glu Pro Pro Thr Thr 85 90 95Thr Pro Thr Ser Ala Thr Ala Pro Pro Gly Gly Ile Leu Thr Leu Thr 100 105 110Asp Ser Thr Ala Thr Phe His His Val Thr Gly Ser Asp Ser Ser Thr 115 120 125Thr Thr Gly Asp Ser Gly Pro Arg Asp Ser Ala Ser Ser Ser Ser Thr 130 135 140Ser Arg Ser Arg Arg Ser Arg Arg Met Lys Ala Pro Arg Pro Ser Pro145 150 155 160Ile Thr Ser Pro Ala Pro Ser Arg Cys Leu Arg Thr Arg Ser Thr Ser 165 170 175Cys Arg Thr Phe Ser Ala Leu Pro Thr Arg Ala Ala Cys Leu Arg Ser 180 185 190Arg Arg Thr Cys Ser 19537591DNAArtificial Sequenceconstructed sequence 37ctggcgactc agagtcagtt ccagaccctc aacctctcgg agaacctcca gcagcgccct 60ctggtgtggg acctaataca atggctgcag gcggtggcgc accaatggca gacaataacg 120aaggcgccga cggagtgggt agttcctcgg gaaattggca ttgcgattcc acatggctgg 180gcgacagagt catcaccacc agcacccgaa cctgggccct gcccacctac aacaaccacc 240tctacaagca aatctcctct ggtactcatg gagccaccaa cgacaacacc tacttcggct 300acagcacccc ctgggggtat tttgacttta acagattcca ctgccacttt tcaccacgtg 360actggcagcg actcatcaac aacaactggg gattccggcc caagagactc agcttcaagc 420tcttcaacat ccaggtcaag gaggtcacgc agaatgaagg caccaagacc atcgccaata 480acctcaccag caccatccag gtgtttacgg actcggagta ccagctgccg tacgttctcg 540gctctgccca ccagggctgc ctgcctccgt tcccggcgga cgtgttcatg a 59138196PRTArtificial Sequenceconstructed sequence 38Leu Ala Thr Gln Ser Gln Phe Gln Thr Leu Asn Leu Ser Glu Asn Leu1 5 10 15Gln Gln Arg Pro Leu Val Trp Asp Leu Ile Gln Trp Leu Gln Ala Val 20 25 30Ala His Gln Trp Gln Thr Ile Thr Lys Ala Pro Thr Glu Trp Val Val 35 40 45Pro Arg Glu Ile Gly Ile Ala Ile Pro His Gly Trp Ala Thr Glu Ser 50 55 60Ser Pro Pro Ala Pro Glu Pro Gly Pro Cys Pro Pro Thr Thr Thr Thr65 70 75 80Ser Thr Ser Lys Ser Pro Leu Val Leu Met Glu Pro Pro Thr Thr Thr 85 90 95Pro Thr Ser Ala Thr Ala Pro Pro Gly Gly Ile Leu Thr Leu Thr Asp 100 105 110Ser Thr Ala Thr Phe His His Val Thr Gly Ser Asp Ser Ser Thr Thr 115 120 125Thr Gly Asp Ser Gly Pro Arg Asp Ser Ala Ser Ser Ser Ser Thr Ser 130 135 140Arg Ser Arg Arg 
Ser Arg Arg Met Lys Ala Pro Arg Pro Ser Pro Ile145 150 155 160Thr Ser Pro Ala Pro Ser Arg Cys Leu Arg Thr Arg Ser Thr Ser Cys 165 170 175Arg Thr Phe Ser Ala Leu Pro Thr Arg Ala Ala Cys Leu Arg Ser Arg 180 185 190Arg Thr Cys Ser 195395500DNAArtificial Sequenceconstructed sequence 39ctgcgcgctc gctcgctcac tgaggccgcc cgggcaaagc ccgggcgtcg ggcgaccttt 60ggtcgcccgg cctcagtgag cgagcgagcg cgcagagagg gagtggccaa ctccatcact 120aggggttcct tgtagttaat gattaacccg ccatgctact tatctacgta gccatgctct 180aggaagatcg gaattcgccc ttaagctagc aaaaaccaac acacagatcc aatgaaaata 240aggatctttt atttctagat tagggcaagg cggagccgga ggcgatggcg tgctcggtca 300ggtgccactt ctggttcttg gcgtcgctgc ggtcctcgcg ggtcagcttg tgctggatga 360agtgccagtc gggcatcttg cggggcacgg acttggcctt gtacacggtg tcgaactggc 420agcgcaagcg gccaccgtcc ttcagcagca ggtacatgct cacgtcgccc ttcaagatgc 480cctgcttggg cacggggatg atcttctcgc aggagggctc ccagttgtcg gtcatcttct 540tcatcacggg gccgtcggcg gggaagttca cgccgtagaa cttggactcg tggtacatgc 600agttctcctc cacgctcacg gtgatgtcgg cgttgcagat gcacacggcg ccgtcctcga 660acaggaagga gcggtcccag gtgtagccgg cggggcagga gttcttgaag tagtcgacga 720tgtcctgggg gtactcggtg aacacgcggt tgccgtacat gaaggcggcg gacaagatgt 780cctcggcgaa gggcaagggg ccgccctcca ccacgcacag gttgatggcc tgcttgccct 840tgaaggggta gccgatgccc tcgccggtga tcacgaactt gtggccgtcc acgcagccct 900ccatgcggta cttcatggtc atctccttgg
tcaggccgtg cttggactgg gccatggtgg 960ctctagatcg aaaggcccgg agatgaggaa gaggagaaca gcgcggcaga cgtgcgcttt 1020tgaagcgtgc agaatgccgg gcctccggag gaccttcggg cgcccgcccc gcccctgagc 1080ccgcccctga gcccgccccc ggacccaccc cttcccagcc tctgagccca gaaagcgaag 1140gagcaaagct gctattggcc gctgccccaa aggcctaccc gcttccattg ctcagcggtg 1200ctgtccatct gcacgagact agctagtgag acgtgctact tccatttgtc acgtcctgca 1260cgacgcgagc tgcggggcgg gggggaactt cctgactagg ggaggagtag aaggtggcgc 1320gaaggggcca ccaaagaacg gagccggttg gcgcctaccg gtggatgtgg aatgtgtgcg 1380aggccagagg ccacttgtgt agcgccaagt gcccagcggg gctgctaaag cgcatgctcc 1440agactgcctt gggaaaagcg cctcccctac ccggtagcta gctagttatt aatagtaatc 1500aattacgggg tcattagttc atagcccata tatggagttc cgcgttacat aacttacggt 1560aaatggcccg cctggctgac cgcccaacga cccccgccca ttgacgtcaa taatgacgta 1620tgttcccata gtaacgccaa tagggacttt ccattgacgt caatgggtgg agtatttacg 1680gtaaactgcc cacttggcag tacatcaagt gtatcatatg ccaagtacgc cccctattga 1740cgtcaatgac ggtaaatggc ccgcctggca ttatgcccag tacatgacct tatgggactt 1800tcctacttgg cagtacatct acgtattagt catcgctatt accatggtga tgcggttttg 1860gcagtacatc aatgggcgtg gatagcggtt tgactcacgg ggatttccaa gtctccaccc 1920cattgacgtc aatgggagtt tgttttggca ccaaaatcaa cgggactttc caaaatgtcg 1980taacaactcc gccccattga cgcaaatggg cggtaggcgt gtacggtggg aggtctatat 2040aagcagagct ggtttagtga accgtcagat cctgcatgaa gcttcgatca actacgcaga 2100caggtaccaa aacaaatgtt ctcgtcacgt gggcatgaat ctgatgctgt ttccctgcag 2160acaatgcgag agaatgaatc agaattcaaa tatctgcttc actcacggac agaaagactg 2220tttagagtgc tttcccgtgt cagaatctca acccgtttct gtcgtcaaaa aggcgtatca 2280gaaactgtgc tacattcatc atatcatggg aaaggtgcca gacgcttgca ctgcctgcga 2340tctggtcaat gtggatttgg atgactgcat ctttgaacaa taaatgattt aaatcaggta 2400tggcaggtgc taagtactag ttaatcaata aaccggacat tcgaaaggct gcggtcgaac 2460gcatgctggg gactcgagtt aagggcgaat tcccgataag gatcttccta gagcatggct 2520acgtagataa gtagcatggc gggttaatca ttaactacaa ggaaccccta gtgatggagt 2580tggccactcc ctctctgcgc gctcgctcgc tcactgaggc cgggcgacca aaggtcgccc 
2640gacgcccggg ctttgcccgg gcggcctcag tgagcgagcg agcgcgcagc cttaattaac 2700ctaattcact ggccgtcgtt ttacaacgtc gtgactggga aaaccctggc gttacccaac 2760ttaatcgcct tgcagcacat ccccctttcg ccagctggcg taatagcgaa gaggcccgca 2820ccgatcgccc ttcccaacag ttgcgcagcc tgaatggcga atgggacgcg ccctgtagcg 2880gcgcattaag cgcggcgggt gtggtggtta cgcgcagcgt gaccgctaca cttgccagcg 2940ccctagcgcc cgctcctttc gctttcttcc cttcctttct cgccacgttc gccggctttc 3000cccgtcaagc tctaaatcgg gggctccctt tagggttccg atttagtgct ttacggcacc 3060tcgaccccaa aaaacttgat tagggtgatg gttcacgtag tgggccatcg ccccgataga 3120cggtttttcg ccctttgacg ctggagttca cgttcctcaa tagtggactc ttgttccaaa 3180ctggaacaac actcaaccct atctcggtct attcttttga tttataaggg atttttccga 3240tttcggccta ttggttaaaa aatgagctga tttaacaaaa atttaacgcg aattttaaca 3300aaatattaac gtttataatt tcaggtggca tctttcgggg aaatgtgcgc ggaaccccta 3360tttgtttatt tttctaaata cattcaaata tgtatccgct catgagacaa taaccctgat 3420aaatgcttca ataatattga aaaaggaaga gtatgagtat tcaacatttc cgtgtcgccc 3480ttattccctt ttttgcggca ttttgccttc ctgtttttgc tcacccagaa acgctggtga 3540aagtaaaaga tgctgaagat cagttgggtg cacgagtggg ttacatcgaa ctggatctca 3600atagtggtaa gatccttgag agttttcgcc ccgaagaacg ttttccaatg atgagcactt 3660ttaaagttct gctatgtggc gcggtattat cccgtattga cgccgggcaa gagcaactcg 3720gtcgccgcat acactattct cagaatgact tggttgagta ctcaccagtc acagaaaagc 3780atcttacgga tggcatgaca gtaagagaat tatgcagtgc tgccataacc atgagtgata 3840acactgcggc caacttactt ctgacaacga tcggaggacc gaaggagcta accgcttttt 3900tgcacaacat gggggatcat gtaactcgcc ttgatcgttg ggaaccggag ctgaatgaag 3960ccataccaaa cgacgagcgt gacaccacga tgcctgtagt aatggtaaca acgttgcgca 4020aactattaac tggcgaacta cttactctag cttcccggca acaattaata gactggatgg 4080aggcggataa agttgcagga ccacttctgc gctcggccct tccggctggc tggtttattg 4140ctgataaatc tggagccggt gagcgtgggt ctcgcggtat cattgcagca ctggggccag 4200atggtaagcc ctcccgtatc gtagttatct acacgacggg gagtcaggca actatggatg 4260aacgaaatag acagatcgct gagataggtg cctcactgat taagcattgg taactgtcag 4320accaagttta ctcatatata ctttagattg 
atttaaaact tcatttttaa tttaaaagga 4380tctaggtgaa gatccttttt gataatctca tgaccaaaat cccttaacgt gagttttcgt 4440tccactgagc gtcagacccc gtagaaaaga tcaaaggatc ttcttgagat cctttttttc 4500tgcgcgtaat ctgctgcttg caaacaaaaa aaccaccgct accagcggtg gtttgtttgc 4560cggatcaaga gctaccaact ctttttccga aggtaactgg cttcagcaga gcgcagatac 4620caaatactgt ccttctagtg tagccgtagt taggccacca cttcaagaac tctgtagcac 4680cgcctacata cctcgctctg ctaatcctgt taccagtggc tgctgccagt ggcgataagt 4740cgtgtcttac cgggttggac tcaagacgat agttaccgga taaggcgcag cggtcgggct 4800gaacgggggg ttcgtgcaca cagcccagct tggagcgaac gacctacacc gaactgagat 4860acctacagcg tgagctatga gaaagcgcca cgcttcccga agggagaaag gcggacaggt 4920atccggtaag cggcagggtc ggaacaggag agcgcacgag ggagcttcca gggggaaacg 4980cctggtatct ttatagtcct gtcgggtttc gccacctctg acttgagcgt cgatttttgt 5040gatgctcgtc aggggggcgg agcctatgga aaaacgccag caacgcggcc tttttacggt 5100tcctggcctt ttgctgcggt tttgctcaca tgttctttcc tgcgttatcc cctgattctg 5160tggataaccg tattaccgcc tttgagtgag ctgataccgc tcgccgcagc cgaacgaccg 5220agcgcagcga gtcagtgagc gaggaagcgg aagagcgccc aatacgcaaa ccgcctctcc 5280ccgcgcgttg gccgattcat taatgcagct ggcacgacag gtttcccgac tggaaagcgg 5340gcagtgagcg caacgcaatt aatgtgagtt agctcactca ttaggcaccc caggctttac 5400actttatgct tccggctcgt atgttgtgtg gaattgtgag cggataacaa tttcacacag 5460gaaacagcta tgaccatgat tacgccagat ttaattaagg 5500404365DNAArtificial Sequenceconstructed sequence 40ctgcgcgctc gctcgctcac tgaggccgcc cgggcaaagc ccgggcgtcg ggcgaccttt 60ggtcgcccgg cctcagtgag cgagcgagcg cgcagagagg gagtggccaa ctccatcact 120aggggttcct tgtagttaat gattaacccg ccatgctact tatctacgta gccatgctct 180aggaagatcg gaattcgccc ttaagctagc tagttattaa tagtaatcaa ttacggggtc 240attagttcat agcccatata tggagttccg cgttacataa cttacggtaa atggcccgcc 300tggctgaccg cccaacgacc cccgcccatt gacgtcaata atgacgtatg ttcccatagt 360aacgccaata gggactttcc attgacgtca atgggtggag tatttacggt aaactgccca 420cttggcagta catcaagtgt atcatatgcc aagtacgccc cctattgacg tcaatgacgg 480taaatggccc gcctggcatt atgcccagta catgacctta 
tgggactttc ctacttggca 540gtacatctac gtattagtca tcgctattac catggtgatg cggttttggc agtacatcaa 600tgggcgtgga tagcggtttg actcacgggg atttccaagt ctccacccca ttgacgtcaa 660tgggagtttg ttttggcacc aaaatcaacg ggactttcca aaatgtcgta acaactccgc 720cccattgacg caaatgggcg gtaggcgtgt acggtgggag gtctatataa gcagagctgg 780tttagtgaac cgtcagatcc tgcatgaagc ttcgatcaac tacgcagaca ggtaccaaaa 840caaatgttct cgtcacgtgg gcatgaatct gatgctgttt ccctgcagac aatgcgagag 900aatgaatcag aattcaaata tctgcttcac tcacggacag aaagactgtt tagagtgctt 960tcccgtgtca gaatctcaac ccgtttctgt cgtcaaaaag gcgtatcaga aactgtgcta 1020cattcatcat atcatgggaa aggtgccaga cgcttgcact gcctgcgatc tggtcaatgt 1080ggatttggat gactgcatct ttgaacaata aatgatttaa atcaggtatg gcaggtgcta 1140actagtgatc cgatcttttt ccctctgcca aaaattatgg ggacatcatg aagccccttg 1200agcatctgac ttctggctaa taaaggaaat ttattttcat tgcaatagtg tgttggaatt 1260ttttgtgtct ctcactcgga tctagttaat caataaaccg gacattcgaa aggctgcggt 1320cgaacgcatg ctggggactc gagttaaggg cgaattcccg attaggatct tcctagagca 1380tggctacgta gataagtagc atggcgggtt aatcattaac tacaaggaac ccctagtgat 1440ggagttggcc actccctctc tgcgcgctcg ctcgctcact gaggccgggc gaccaaaggt 1500cgcccgacgc ccgggctttg cccgggcggc ctcagtgagc gagcgagcgc gcagccttaa 1560ttaacctaat tcactggccg tcgttttaca acgtcgtgac tgggaaaacc ctggcgttac 1620ccaacttaat cgccttgcag cacatccccc tttcgccagc tggcgtaata gcgaagaggc 1680ccgcaccgat cgcccttccc aacagttgcg cagcctgaat ggcgaatggg acgcgccctg 1740tagcggcgca ttaagcgcgg cgggtgtggt ggttacgcgc agcgtgaccg ctacacttgc 1800cagcgcccta gcgcccgctc ctttcgcttt cttcccttcc tttctcgcca cgttcgccgg 1860ctttccccgt caagctctaa atcgggggct ccctttaggg ttccgattta gtgctttacg 1920gcacctcgac cccaaaaaac ttgattaggg tgatggttca cgtagtgggc catcgccccg 1980atagacggtt tttcgccctt tgacgctgga gttcacgttc ctcaatagtg gactcttgtt 2040ccaaactgga acaacactca accctatctc ggtctattct tttgatttat aagggatttt 2100tccgatttcg gcctattggt taaaaaatga gctgatttaa caaaaattta acgcgaattt 2160taacaaaata ttaacgttta taatttcagg tggcatcttt cggggaaatg tgcgcggaac 2220ccctatttgt ttatttttct 
aaatacattc aaatatgtat ccgctcatga gacaataacc 2280ctgataaatg cttcaataat attgaaaaag gaagagtatg agtattcaac atttccgtgt 2340cgcccttatt cccttttttg cggcattttg ccttcctgtt tttgctcacc cagaaacgct 2400ggtgaaagta aaagatgctg aagatcagtt gggtgcacga gtgggttaca tcgaactgga 2460tctcaatagt ggtaagatcc ttgagagttt tcgccccgaa gaacgttttc caatgatgag 2520cacttttaaa gttctgctat gtggcgcggt attatcccgt attgacgccg ggcaagagca 2580actcggtcgc cgcatacact attctcagaa tgacttggtt gagtactcac cagtcacaga 2640aaagcatctt acggatggca tgacagtaag agaattatgc agtgctgcca taaccatgag 2700tgataacact gcggccaact tacttctgac aacgatcgga ggaccgaagg agctaaccgc 2760ttttttgcac aacatggggg atcatgtaac tcgccttgat cgttgggaac cggagctgaa 2820tgaagccata ccaaacgacg agcgtgacac cacgatgcct gtagtaatgg taacaacgtt 2880gcgcaaacta ttaactggcg aactacttac tctagcttcc cggcaacaat taatagactg 2940gatggaggcg gataaagttg caggaccact tctgcgctcg gcccttccgg ctggctggtt 3000tattgctgat aaatctggag ccggtgagcg tgggtctcgc ggtatcattg cagcactggg 3060gccagatggt aagccctccc gtatcgtagt tatctacacg acggggagtc aggcaactat 3120ggatgaacga aatagacaga tcgctgagat aggtgcctca ctgattaagc attggtaact 3180gtcagaccaa gtttactcat atatacttta gattgattta aaacttcatt tttaatttaa 3240aaggatctag gtgaagatcc tttttgataa tctcatgacc aaaatccctt aacgtgagtt 3300ttcgttccac tgagcgtcag accccgtaga aaagatcaaa ggatcttctt gagatccttt 3360ttttctgcgc gtaatctgct gcttgcaaac aaaaaaacca ccgctaccag cggtggtttg 3420tttgccggat caagagctac caactctttt tccgaaggta actggcttca gcagagcgca 3480gataccaaat actgtccttc tagtgtagcc gtagttaggc caccacttca agaactctgt 3540agcaccgcct acatacctcg ctctgctaat cctgttacca gtggctgctg ccagtggcga 3600taagtcgtgt cttaccgggt tggactcaag acgatagtta ccggataagg cgcagcggtc 3660gggctgaacg gggggttcgt gcacacagcc cagcttggag cgaacgacct acaccgaact 3720gagataccta cagcgtgagc tatgagaaag cgccacgctt cccgaaggga gaaaggcgga 3780caggtatccg gtaagcggca gggtcggaac aggagagcgc acgagggagc ttccaggggg 3840aaacgcctgg tatctttata gtcctgtcgg gtttcgccac ctctgacttg agcgtcgatt 3900tttgtgatgc tcgtcagggg ggcggagcct atggaaaaac gccagcaacg 
cggccttttt 3960acggttcctg gccttttgct gcggttttgc tcacatgttc tttcctgcgt tatcccctga 4020ttctgtggat aaccgtatta ccgcctttga gtgagctgat accgctcgcc gcagccgaac 4080gaccgagcgc agcgagtcag tgagcgagga agcggaagag cgcccaatac gcaaaccgcc 4140tctccccgcg cgttggccga ttcattaatg cagctggcac gacaggtttc ccgactggaa 4200agcgggcagt gagcgcaacg caattaatgt gagttagctc actcattagg caccccaggc 4260tttacacttt atgcttccgg ctcgtatgtt gtgtggaatt gtgagcggat aacaatttca 4320cacaggaaac agctatgacc atgattacgc cagatttaat taagg 4365416627DNAArtificial Sequenceconstructed sequence 41ctgcgcgctc gctcgctcac tgaggccgcc cgggcaaagc ccgggcgtcg ggcgaccttt 60ggtcgcccgg cctcagtgag cgagcgagcg cgcagagagg gagtggccaa ctccatcact 120aggggttcct tgtagttaat gattaacccg ccatgctact tatctacgta gccatgctct 180aggaagatcg gaattcgccc ttaagctagc tagttattaa tagtaatcaa ttacggggtc 240attagttcat agcccatata tggagttccg cgttacataa cttacggtaa atggcccgcc 300tggctgaccg cccaacgacc cccgcccatt gacgtcaata atgacgtatg ttcccatagt 360aacgccaata gggactttcc attgacgtca atgggtggag tatttacggt aaactgccca 420cttggcagta catcaagtgt atcatatgcc aagtacgccc cctattgacg tcaatgacgg 480taaatggccc gcctggcatt atgcccagta catgacctta tgggactttc ctacttggca 540gtacatctac gtattagtca tcgctattac catggtgatg cggttttggc agtacatcaa 600tgggcgtgga tagcggtttg actcacgggg atttccaagt ctccacccca ttgacgtcaa 660tgggagtttg ttttggcacc aaaatcaacg ggactttcca aaatgtcgta acaactccgc 720cccattgacg caaatgggcg gtaggcgtgt acggtgggag gtctatataa gcagagctgg 780tttagtgaac cgtcagatcc tgcatgaagc ttcgatcaac tacgcagaca ggtaccaaaa 840caaatgttct cgtcacgtgg gcatgaatct gatgctgttt ccctgcagac aatgcgagag 900aatgaatcag aattcaaata tctgcttcac tcacggacag aaagactgtt tagagtgctt 960tcccgtgtca gaatctcaac ccgtttctgt cgtcaaaaag gcgtatcaga aactgtgcta 1020cattcatcat atcatgggaa aggtgccaga cgcttgcact gcctgcgatc tggtcaatgt 1080ggatttggat gactgcatct ttgaacaata aatgatttaa atcaggtatg gctgccgatg 1140gttatcttcc agattggctc gaggacaacc tctctgaggg cattcgcgag tggtgggcgc 1200tgaaacctgg agccccgaag cccaaagcca accagcaaaa gcaggacgac ggccggggtc 
1260tggtgcttcc tggctacaag tacctcggac ccttcaacgg actcgacaag ggggagcccg 1320tcaacgcggc ggacgcagcg gccctcgagc acgacaaggc ctacgaccag cagctgcagg 1380cgggtgacaa tccgtacctg cggtataacc acgccgacgc cgagtttcag gagcgtctgc 1440aagaagatac gtcttttggg ggcaacctcg ggcgagcagt cttccaggcc aagaagcggg 1500ttctcgaacc tctcggtctg gttgaggaag gcgctaagac ggctcctgga aagaagagac 1560cggtagagcc atcaccccag cgttctccag actcctctac gggcatcggc aagaaaggcc 1620aacagcccgc cagaaaaaga ctcaattttg gtcagactgg cgactcagag tcagttccag 1680accctcaacc tctcggagaa cctccagcag cgccctctgg tgtgggacct aatacaatgg 1740ctgcaggcgg tggcgcacca atggcagaca ataacgaagg cgccgacgga gtgggtagtt 1800cctcgggaaa ttggcattgc gattccacat ggctgggcga cagagtcagg agacgcgcac 1860agatgcgtaa ggagaaaata ccgcatcagg cgccattcgc cattcaggct gcgcaactgt 1920tgggaagggc gatcggtgcg ggcctcttcg ctattacgcc agctggcgaa agggggatgt 1980gctgcaaggc gattcgtctc gcaacaccta cttcggctac agcaccccct gggggtattt 2040tgactttaac agattccact gccacttttc accacgtgac tggcagcgac tcatcaacaa 2100caactgggga ttccggccca agagactcag cttcaagctc ttcaacatcc aggtcaagga 2160ggtcacgcag aatgaaggca ccaagaccat cgccaataac ctcaccagca ccatccaggt 2220gtttacggac tcggagtacc agctgccgta cgttctcggc tctgcccacc agggctgcct 2280gcctccgttc ccggcggacg tgttcatgat tccccagtac ggctacctaa cactcaacaa 2340cggtagtcag gccgtgggac gctcctcctt ctactgcctg gaatactttc cttcgcagat 2400gctgagaacc ggcaacaact tccagtttac ttacaccttc gaggacgtgc ctttccacag 2460cagctacgcc cacagccaga gcttggaccg gctgatgaat cctctgattg accagtacct 2520gtactacttg tctcggactc aaacaacagg aggcacggca aatacgcaga ctctgggctt 2580cagccaaggt gggcctaata caatggccaa tcaggcaaag aactggctgc caggaccctg 2640ttaccgccaa caacgcgtgt caacgacaac cgggcaaaac aacaatagca actttgcctg 2700gactgctggg accaaatacc atctgaatgg aagaaattca ttggctaatc ctggcatcgc 2760tatggcaaca cacaaagacg acgaggagcg tttttttccc agtaacggga tcctgatttt 2820tggcaaacaa aatgctgcca gagacaatgc ggattacagc gatgtcatgc tcaccagcga 2880ggaagaaatc aaaaccacta accctgtggc tacagaggaa tacggtatcg tgggtgataa 2940cttgcagttg tataacacgg ctcctggttc 
ggtgtttgtc aacagccagg gggccttacc 3000cggtatggtc tggcagaacc gggacgtgta cctgcagggt cccatctggg ccaagattcc 3060tcacacggac ggcaacttcc acccgtcccc gctgatgggc ggctttggcc tgaaacatcc 3120tccgcctcag atcctgatca agaacacgcc tgtacctgcg gatcctccga ccaccttcaa 3180ccagtcaaag ctgaactctt tcatcacgca atacagcacc ggacaggtca gcgtggaaat 3240tgaatgggag ctgcagaagg aaaacagcaa gcgctggaac cccgagatcc agtacacctc 3300caactactac aaatctacaa gtgtggactt tgctgttaat acagaaggcg tgtactctga 3360accccgcccc attggcaccc gttacctcac ccgtaatctg taactagtga tccgatcttt 3420ttccctctgc caaaaattat ggggacatca tgaagcccct tgagcatctg acttctggct 3480aataaaggaa atttattttc attgcaatag tgtgttggaa ttttttgtgt ctctcactcg 3540gatctagtta atcaataaac cggacattcg aaaggctgcg gtcgaacgca tgctggggac 3600tcgagttaag ggcgaattcc cgattaggat cttcctagag catggctacg tagataagta 3660gcatggcggg ttaatcatta actacaagga acccctagtg atggagttgg ccactccctc 3720tctgcgcgct cgctcgctca ctgaggccgg gcgaccaaag gtcgcccgac gcccgggctt 3780tgcccgggcg gcctcagtga gcgagcgagc gcgcagcctt aattaaccta attcactggc 3840cgtcgtttta caacgtcgtg actgggaaaa ccctggcgtt acccaactta atcgccttgc 3900agcacatccc cctttcgcca gctggcgtaa tagcgaagag gcccgcaccg atcgcccttc 3960ccaacagttg cgcagcctga atggcgaatg ggacgcgccc tgtagcggcg cattaagcgc 4020ggcgggtgtg gtggttacgc gcagcgtgac cgctacactt gccagcgccc tagcgcccgc 4080tcctttcgct ttcttccctt cctttctcgc cacgttcgcc ggctttcccc gtcaagctct 4140aaatcggggg ctccctttag ggttccgatt tagtgcttta cggcacctcg accccaaaaa 4200acttgattag ggtgatggtt cacgtagtgg gccatcgccc cgatagacgg tttttcgccc 4260tttgacgctg gagttcacgt tcctcaatag tggactcttg ttccaaactg gaacaacact 4320caaccctatc tcggtctatt cttttgattt ataagggatt tttccgattt cggcctattg 4380gttaaaaaat gagctgattt aacaaaaatt taacgcgaat tttaacaaaa tattaacgtt 4440tataatttca ggtggcatct ttcggggaaa tgtgcgcgga acccctattt gtttattttt 4500ctaaatacat tcaaatatgt atccgctcat gagacaataa ccctgataaa tgcttcaata 4560atattgaaaa aggaagagta tgagtattca acatttccgt gtcgccctta ttcccttttt 4620tgcggcattt tgccttcctg tttttgctca cccagaaacg ctggtgaaag taaaagatgc 
4680tgaagatcag ttgggtgcac gagtgggtta catcgaactg gatctcaata gtggtaagat 4740ccttgagagt tttcgccccg aagaacgttt tccaatgatg agcactttta aagttctgct 4800atgtggcgcg gtattatccc gtattgacgc cgggcaagag caactcggtc gccgcataca 4860ctattctcag aatgacttgg ttgagtactc accagtcaca gaaaagcatc ttacggatgg 4920catgacagta agagaattat gcagtgctgc cataaccatg agtgataaca ctgcggccaa 4980cttacttctg acaacgatcg gaggaccgaa ggagctaacc gcttttttgc acaacatggg 5040ggatcatgta actcgccttg atcgttggga accggagctg aatgaagcca taccaaacga 5100cgagcgtgac accacgatgc ctgtagtaat ggtaacaacg ttgcgcaaac tattaactgg 5160cgaactactt actctagctt cccggcaaca attaatagac tggatggagg cggataaagt 5220tgcaggacca cttctgcgct cggcccttcc ggctggctgg tttattgctg ataaatctgg 5280agccggtgag cgtgggtctc gcggtatcat tgcagcactg gggccagatg gtaagccctc 5340ccgtatcgta gttatctaca cgacggggag tcaggcaact atggatgaac gaaatagaca 5400gatcgctgag ataggtgcct cactgattaa gcattggtaa ctgtcagacc aagtttactc 5460atatatactt tagattgatt taaaacttca tttttaattt aaaaggatct aggtgaagat 5520cctttttgat aatctcatga ccaaaatccc ttaacgtgag ttttcgttcc actgagcgtc 5580agaccccgta gaaaagatca aaggatcttc ttgagatcct ttttttctgc gcgtaatctg 5640ctgcttgcaa acaaaaaaac caccgctacc agcggtggtt tgtttgccgg atcaagagct 5700accaactctt tttccgaagg taactggctt cagcagagcg cagataccaa atactgtcct 5760tctagtgtag ccgtagttag gccaccactt caagaactct gtagcaccgc ctacatacct 5820cgctctgcta atcctgttac cagtggctgc tgccagtggc gataagtcgt gtcttaccgg 5880gttggactca agacgatagt taccggataa ggcgcagcgg tcgggctgaa cggggggttc 5940gtgcacacag cccagcttgg agcgaacgac

ctacaccgaa ctgagatacc tacagcgtga 6000gctatgagaa agcgccacgc ttcccgaagg gagaaaggcg gacaggtatc cggtaagcgg 6060cagggtcgga acaggagagc gcacgaggga gcttccaggg ggaaacgcct ggtatcttta 6120tagtcctgtc gggtttcgcc acctctgact tgagcgtcga tttttgtgat gctcgtcagg 6180ggggcggagc ctatggaaaa acgccagcaa cgcggccttt ttacggttcc tggccttttg 6240ctgcggtttt gctcacatgt tctttcctgc gttatcccct gattctgtgg ataaccgtat 6300taccgccttt gagtgagctg ataccgctcg ccgcagccga acgaccgagc gcagcgagtc 6360agtgagcgag gaagcggaag agcgcccaat acgcaaaccg cctctccccg cgcgttggcc 6420gattcattaa tgcagctggc acgacaggtt tcccgactgg aaagcgggca gtgagcgcaa 6480cgcaattaat gtgagttagc tcactcatta ggcaccccag gctttacact ttatgcttcc 6540ggctcgtatg ttgtgtggaa ttgtgagcgg ataacaattt cacacaggaa acagctatga 6600ccatgattac gccagattta attaagg 6627426622DNAArtificial Sequenceconstructed sequence 42ctgcgcgctc gctcgctcac tgaggccgcc cgggcaaagc ccgggcgtcg ggcgaccttt 60ggtcgcccgg cctcagtgag cgagcgagcg cgcagagagg gagtggccaa ctccatcact 120aggggttcct tgtagttaat gattaacccg ccatgctact tatctacgta gccatgctct 180aggaagatcg gaattcgccc ttaagctagc tagttattaa tagtaatcaa ttacggggtc 240attagttcat agcccatata tggagttccg cgttacataa cttacggtaa atggcccgcc 300tggctgaccg cccaacgacc cccgcccatt gacgtcaata atgacgtatg ttcccatagt 360aacgccaata gggactttcc attgacgtca atgggtggag tatttacggt aaactgccca 420cttggcagta catcaagtgt atcatatgcc aagtacgccc cctattgacg tcaatgacgg 480taaatggccc gcctggcatt atgcccagta catgacctta tgggactttc ctacttggca 540gtacatctac gtattagtca tcgctattac catggtgatg cggttttggc agtacatcaa 600tgggcgtgga tagcggtttg actcacgggg atttccaagt ctccacccca ttgacgtcaa 660tgggagtttg ttttggcacc aaaatcaacg ggactttcca aaatgtcgta acaactccgc 720cccattgacg caaatgggcg gtaggcgtgt acggtgggag gtctatataa gcagagctgg 780tttagtgaac cgtcagatcc tgcatgaagc ttcgatcaac tacgcagaca ggtaccaaaa 840caaatgttct cgtcacgtgg gcatgaatct gatgctgttt ccctgcagac aatgcgagag 900aatgaatcag aattcaaata tctgcttcac tcacggacag aaagactgtt tagagtgctt 960tcccgtgtca gaatctcaac ccgtttctgt cgtcaaaaag gcgtatcaga aactgtgcta 
1020cattcatcat atcatgggaa aggtgccaga cgcttgcact gcctgcgatc tggtcaatgt 1080ggatttggat gactgcatct ttgaacaata aatgatttaa atcaggtatg gctgccgatg 1140gttatcttcc agattggctc gaggacaacc tctctgaggg cattcgcgag tggtgggcgc 1200tgaaacctgg agccccgaag cccaaagcca accagcaaaa gcaggacgac ggccggggtc 1260tggtgcttcc tggctacaag tacctcggac ccttcaacgg actcgacaag ggggagcccg 1320tcaacgcggc ggacgcagcg gccctcgagc acgacaaggc ctacgaccag cagctgcagg 1380cgggtgacaa tccgtacctg cggtataacc acgccgacgc cgagtttcag gagcgtctgc 1440aagaagatac gtcttttggg ggcaacctcg ggcgagcagt cttccaggcc aagaagcggg 1500ttctcgaacc tctcggtctg gttgaggaag gcgctaagac ggctcctgga aagaagagac 1560cggtagagcc atcaccccag cgttctccag actcctctac gggcatcggc aagaaaggcc 1620aacagcccgc cagaaaaaga ctcaattttg gtcagactgg cgactcagag tcagttccag 1680accctcaacc tctcggagaa cctccagcag cgccctctgg tgtgggacct aatacaatgg 1740ctgcaggcgg tggcgcacca atggcagaca ataacgaagg cgccgacgga gtgggtagtt 1800cctcgggaaa ttggcattgc gattccacat ggctgggcga cagagtcatc accaccagca 1860cccgaacctg ggccctgccc acctacaaca accacctcta caagcaaatc tccaacggga 1920catcgggagg agccaccaac gacaacacct acttcggcta cagcaccccc tgggggtatt 1980ttgactttaa cagattccac tgccactttt caccacgtga ctggcagcga ctcatcaaca 2040acaactgggg attccggccc aagagactca gcttcaagct cttcaacatc caggtcaagg 2100aggtcacgca gaatgaaggc accaagacca tcgccaataa cctcaccagc accatccagg 2160tgtttacgga ctcggagtac cagctgccgt acgttctcgg ctctgcccac cagggctgcc 2220tgcctccgtt cccggcggac gtgttcatga ttccccagta cggctaccta acactcaaca 2280acggtagtca ggccgtggga cgctcctcct tctactgcct ggaatacttt ccttcgcaga 2340tgctgagaac cggcaacaac ttccagttta cttacacctt cgaggacgtg cctttccaca 2400gcagctacgc ccacagccag agcttggacc ggctgatgaa tcctcggaga cgcgcacaga 2460tgcgtaagga gaaaataccg catcaggcgc cattcgccat tcaggctgcg caactgttgg 2520gaagggcgat cggtgcgggc ctcttcgcta ttacgccagc tggcgaaagg gggatgtgct 2580gcaaggcgat tcgtctcgtg gccaatcagg caaagaactg gctgccagga ccctgttacc 2640gccaacaacg cgtgtcaacg acaaccgggc aaaacaacaa tagcaacttt gcctggactg 2700ctgggaccaa ataccatctg aatggaagaa 
attcattggc taatcctggc atcgctatgg 2760caacacacaa agacgacgag gagcgttttt ttcccagtaa cgggatcctg atttttggca 2820aacaaaatgc tgccagagac aatgcggatt acagcgatgt catgctcacc agcgaggaag 2880aaatcaaaac cactaaccct gtggctacag aggaatacgg tatcgtgggt gataacttgc 2940agttgtataa cacggctcct ggttcggtgt ttgtcaacag ccagggggcc ttacccggta 3000tggtctggca gaaccgggac gtgtacctgc agggtcccat ctgggccaag attcctcaca 3060cggacggcaa cttccacccg tccccgctga tgggcggctt tggcctgaaa catcctccgc 3120ctcagatcct gatcaagaac acgcctgtac ctgcggatcc tccgaccacc ttcaaccagt 3180caaagctgaa ctctttcatc acgcaataca gcaccggaca ggtcagcgtg gaaattgaat 3240gggagctgca gaaggaaaac agcaagcgct ggaaccccga gatccagtac acctccaact 3300actacaaatc tacaagtgtg gactttgctg ttaatacaga aggcgtgtac tctgaacccc 3360gccccattgg cacccgttac ctcacccgta atctgtaact agtgatccga tctttttccc 3420tctgccaaaa attatgggga catcatgaag ccccttgagc atctgacttc tggctaataa 3480aggaaattta ttttcattgc aatagtgtgt tggaattttt tgtgtctctc actcggatct 3540agttaatcaa taaaccggac attcgaaagg ctgcggtcga acgcatgctg gggactcgag 3600ttaagggcga attcccgatt aggatcttcc tagagcatgg ctacgtagat aagtagcatg 3660gcgggttaat cattaactac aaggaacccc tagtgatgga gttggccact ccctctctgc 3720gcgctcgctc gctcactgag gccgggcgac caaaggtcgc ccgacgcccg ggctttgccc 3780gggcggcctc agtgagcgag cgagcgcgca gccttaatta acctaattca ctggccgtcg 3840ttttacaacg tcgtgactgg gaaaaccctg gcgttaccca acttaatcgc cttgcagcac 3900atcccccttt cgccagctgg cgtaatagcg aagaggcccg caccgatcgc ccttcccaac 3960agttgcgcag cctgaatggc gaatgggacg cgccctgtag cggcgcatta agcgcggcgg 4020gtgtggtggt tacgcgcagc gtgaccgcta cacttgccag cgccctagcg cccgctcctt 4080tcgctttctt cccttccttt ctcgccacgt tcgccggctt tccccgtcaa gctctaaatc 4140gggggctccc tttagggttc cgatttagtg ctttacggca cctcgacccc aaaaaacttg 4200attagggtga tggttcacgt agtgggccat cgccccgata gacggttttt cgccctttga 4260cgctggagtt cacgttcctc aatagtggac tcttgttcca aactggaaca acactcaacc 4320ctatctcggt ctattctttt gatttataag ggatttttcc gatttcggcc tattggttaa 4380aaaatgagct gatttaacaa aaatttaacg cgaattttaa caaaatatta acgtttataa 
4440tttcaggtgg catctttcgg ggaaatgtgc gcggaacccc tatttgttta tttttctaaa 4500tacattcaaa tatgtatccg ctcatgagac aataaccctg ataaatgctt caataatatt 4560gaaaaaggaa gagtatgagt attcaacatt tccgtgtcgc ccttattccc ttttttgcgg 4620cattttgcct tcctgttttt gctcacccag aaacgctggt gaaagtaaaa gatgctgaag 4680atcagttggg tgcacgagtg ggttacatcg aactggatct caatagtggt aagatccttg 4740agagttttcg ccccgaagaa cgttttccaa tgatgagcac ttttaaagtt ctgctatgtg 4800gcgcggtatt atcccgtatt gacgccgggc aagagcaact cggtcgccgc atacactatt 4860ctcagaatga cttggttgag tactcaccag tcacagaaaa gcatcttacg gatggcatga 4920cagtaagaga attatgcagt gctgccataa ccatgagtga taacactgcg gccaacttac 4980ttctgacaac gatcggagga ccgaaggagc taaccgcttt tttgcacaac atgggggatc 5040atgtaactcg ccttgatcgt tgggaaccgg agctgaatga agccatacca aacgacgagc 5100gtgacaccac gatgcctgta gtaatggtaa caacgttgcg caaactatta actggcgaac 5160tacttactct agcttcccgg caacaattaa tagactggat ggaggcggat aaagttgcag 5220gaccacttct gcgctcggcc cttccggctg gctggtttat tgctgataaa tctggagccg 5280gtgagcgtgg gtctcgcggt atcattgcag cactggggcc agatggtaag ccctcccgta 5340tcgtagttat ctacacgacg gggagtcagg caactatgga tgaacgaaat agacagatcg 5400ctgagatagg tgcctcactg attaagcatt ggtaactgtc agaccaagtt tactcatata 5460tactttagat tgatttaaaa cttcattttt aatttaaaag gatctaggtg aagatccttt 5520ttgataatct catgaccaaa atcccttaac gtgagttttc gttccactga gcgtcagacc 5580ccgtagaaaa gatcaaagga tcttcttgag atcctttttt tctgcgcgta atctgctgct 5640tgcaaacaaa aaaaccaccg ctaccagcgg tggtttgttt gccggatcaa gagctaccaa 5700ctctttttcc gaaggtaact ggcttcagca gagcgcagat accaaatact gtccttctag 5760tgtagccgta gttaggccac cacttcaaga actctgtagc accgcctaca tacctcgctc 5820tgctaatcct gttaccagtg gctgctgcca gtggcgataa gtcgtgtctt accgggttgg 5880actcaagacg atagttaccg gataaggcgc agcggtcggg ctgaacgggg ggttcgtgca 5940cacagcccag cttggagcga acgacctaca ccgaactgag atacctacag cgtgagctat 6000gagaaagcgc cacgcttccc gaagggagaa aggcggacag gtatccggta agcggcaggg 6060tcggaacagg agagcgcacg agggagcttc cagggggaaa cgcctggtat ctttatagtc 6120ctgtcgggtt tcgccacctc tgacttgagc 
gtcgattttt gtgatgctcg tcaggggggc 6180ggagcctatg gaaaaacgcc agcaacgcgg cctttttacg gttcctggcc ttttgctgcg 6240gttttgctca catgttcttt cctgcgttat cccctgattc tgtggataac cgtattaccg 6300cctttgagtg agctgatacc gctcgccgca gccgaacgac cgagcgcagc gagtcagtga 6360gcgaggaagc ggaagagcgc ccaatacgca aaccgcctct ccccgcgcgt tggccgattc 6420attaatgcag ctggcacgac aggtttcccg actggaaagc gggcagtgag cgcaacgcaa 6480ttaatgtgag ttagctcact cattaggcac cccaggcttt acactttatg cttccggctc 6540gtatgttgtg tggaattgtg agcggataac aatttcacac aggaaacagc tatgaccatg 6600attacgccag atttaattaa gg 6622437336DNAArtificial Sequenceconstructed sequence 43atgccggggt tttacgagat tgtgattaag gtccccagcg accttgacga gcatctgccc 60ggcatttctg acagctttgt gaactgggtg gccgagaagg aatgggagtt gccgccagat 120tctgacatgg atctgaatct gattgagcag gcacccctga ccgtggccga gaagctgcag 180cgcgactttc tgacggaatg gcgccgtgtg agtaaggccc cggaggctct tttctttgtg 240caatttgaga agggagagag ctacttccac atgcacgtgc tcgtggaaac caccggggtg 300aaatccatgg ttttgggacg tttcctgagt cagattcgcg aaaaactgat tcagagaatt 360taccgcggga tcgagccgac tttgccaaac tggttcgcgg tcacaaagac cagaaatggc 420gccggaggcg ggaacaaggt ggtggatgag tgctacatcc ccaattactt gctccccaaa 480acccagcctg agctccagtg ggcgtggact aatatggaac agtatttaag cgcctgtttg 540aatctcacgg agcgtaaacg gttggtggcg cagcatctga cgcacgtgtc gcagacgcag 600gagcagaaca aagagaatca gaatcccaat tctgatgcgc cggtgatcag atcaaaaact 660tcagccaggt acatggagct ggtcgggtgg ctcgtggaca aggggattac ctcggagaag 720cagtggatcc aggaggacca ggcctcatac atctccttca atgcggcctc caactcgcgg 780tcccaaatca aggctgcctt ggacaatgcg ggaaagatta tgagcctgac taaaaccgcc 840cccgactacc tggtgggcca gcagcccgtg gaggacattt ccagcaatcg gatttataaa 900attttggaac taaacgggta cgatccccaa tatgcggctt ccgtctttct gggatgggcc 960acgaaaaagt tcggcaagag gaacaccatc tggctgtttg ggcctgcaac taccgggaag 1020accaacatcg cggaggccat agcccacact gtgcccttct acgggtgcgt aaactggacc 1080aatgagaact ttcccttcaa cgactgtgtc gacaagatgg tgatctggtg ggaggagggg 1140aagatgaccg ccaaggtcgt ggagtcggcc aaagccattc tcggaggaag caaggtgcgc 
1200gtggaccaga aatgcaagtc ctcggcccag atagacccga ctcccgtgat cgtcacctcc 1260aacaccaaca tgtgcgccgt gattgacggg aactcaacga ccttcgaaca ccagcagccg 1320ttgcaagacc ggatgttcaa atttgaactc acccgccgtc tggatcatga ctttgggaag 1380gtcaccaagc aggaagtcaa agactttttc cggtgggcaa aggatcacgt ggttgaggtg 1440gagcatgaat tctacgtcaa aaagggtgga gccaagaaaa gacccgcccc cagtgacgca 1500gatataagtg agcccaaacg ggtgcgcgag tcagttgcgc agccatcgac gtcagacgcg 1560gaagcttcga tcaactacgc agacaggtac caaaacaaat gttctcgtca cgtgggcatg 1620aatctgatgc tgtttccctg cagacaatgc gagagaatga atcagaattc aaatatctgc 1680ttcactcacg gacagaaaga ctgtttagag tgctttcccg tgtcagaatc tcaacccgtt 1740tctgtcgtca aaaaggcgta tcagaaactg tgctacattc atcatatcat gggaaaggtg 1800ccagacgctt gcactgcctg cgatctggtc aatgtggatt tggatgactg catctttgaa 1860caataaatga tttaaatcag gtatggctgc cgatggttat cttccagatt ggctcgagga 1920caacctctct gagggcattc gcgagtggtg ggcgctgaaa cctggagccc cgaagcccaa 1980agccaaccag caaaagcagg acgacggccg gggtctggtg cttcctggct acaagtacct 2040cggacccttc aacggactcg acaaggggga gcccgtcaac gcggcggacg cagcggccct 2100cgagcacgac aaggcctacg accagcagct gcaggcgggt gacaatccgt acctgcggta 2160taaccacgcc gacgccgagt ttcaggagcg tctgcaagaa gatacgtctt ttgggggcaa 2220cctcgggcga gcagtcttcc aggccaagaa gcgggttctc gaacctctcg gtctggttga 2280ggaaggcgct aagacggctc ctggaaagaa gagaccggta gagccatcac cccagcgttc 2340tccagactcc tctacgggca tcggcaagaa aggccaacag cccgccagaa aaagactcaa 2400ttttggtcag actggcgact cagagtcagt tccagaccct caacctctcg gagaacctcc 2460agcagcgccc tctggtgtgg gacctaatac aatggctgca ggcggtggcg caccaatggc 2520agacaataac gaaggcgccg acggagtggg tagttcctcg ggaaattggc attgcgattc 2580cacatggctg ggcgacagag tcatcaccac cagcacccga acctgggccc tgcccaccta 2640caacaaccac ctctacaagc aaatctccaa cgggacatcg ggaggagcca ccaacgacaa 2700cacctacttc ggctacagca ccccctgggg gtattttgac tttaacagat tccactgcca 2760cttttcacca cgtgactggc agcgactcat caacaacaac tggggattcc ggcccaagag 2820actcagcttc aagctcttca acatccaggt caaggaggtc acgcagaatg aaggcaccaa 2880gaccatcgcc aataacctca ccagcaccat 
ccaggtgttt acggactcgg agtaccagct 2940gccgtacgtt ctcggctctg cccaccaggg ctgcctgcct ccgttcccgg cggacgtgtt 3000catgattccc cagtacggct acctaacact caacaacggt agtcaggccg tgggacgctc 3060ctccttctac tgcctggaat actttccttc gcagatgctg agaaccggca acaacttcca 3120gtttacttac accttcgagg acgtgccttt ccacagcagc tacgcccaca gccagagctt 3180ggaccggctg atgaatcctc tgattgacca gtacctgtac tacttgtctc ggactcaaac 3240aacaggaggc acggcaaata cgcagactct gggcttcagc caaggtgggc ctaatacaat 3300ggccaatcag gcaaagaact ggctgccagg accctgttac cgccaacaac gcgtctcaac 3360gacaaccggg caaaacaaca atagcaactt tgcctggact gctgggacca aataccatct 3420gaatggaaga aattcattgg ctaatcctgg catcgctatg gcaacacaca aagacgacga 3480ggagcgtttt tttcccagta acgggatcct gatttttggc aaacaaaatg ctgccagaga 3540caatgcggat tacagcgatg tcatgctcac cagcgaggaa gaaatcaaaa ccactaaccc 3600tgtggctaca gaggaatacg gtatcgtggc agataacttg cagcagcaaa acacggctcc 3660tcaaattgga actgtcaaca gccagggggc cttacccggt atggtctggc agaaccggga 3720cgtgtacctg cagggtccca tctgggccaa gattcctcac acggacggca acttccaccc 3780gtctccgctg atgggcggct ttggcctgaa acatcctccg cctcagatcc tgatcaagaa 3840cacgcctgta cctgcggatc ctccgaccac cttcaaccag tcaaagctga actctttcat 3900cacgcaatac agcaccggac aggtcagcgt ggaaattgaa tgggagctgc agaaggaaaa 3960cagcaagcgc tggaaccccg agatccagta cacctccaac tactacaaat ctacaagtgt 4020ggactttgct gttaatacag aaggcgtgta ctctgaaccc cgccccattg gcacccgtta 4080cctcacccgt aatctgtaat tgcctgttaa tcaataaacc ggttgattcg tttcagttga 4140actttggtct ctgcgaaggg cgaattcgtt taaacctgca ggactagagg tcctgtatta 4200gaggtcacgt gagtgttttg cgacattttg cgacaccatg tggtcacgct gggtatttaa 4260gcccgagtga gcacgcaggg tctccatttt gaagcgggag gtttgaacgc gcagccgcca 4320agccgaattc tgcagatatc catcacactg gcggccgctc gactagagcg gccgccaccg 4380cggtggagct ccagcttttg ttccctttag tgagggttaa ttgcgcgctt ggcgtaatca 4440tggtcatagc tgtttcctgt gtgaaattgt tatccgctca caattccaca caacatacga 4500gccggaagca taaagtgtaa agcctggggt gcctaatgag tgagctaact cacattaatt 4560gcgttgcgct cactgcccgc tttccagtcg ggaaacctgt cgtgccagct gcattaatga 
4620atcggccaac gcgcggggag aggcggtttg cgtattgggc gctcttccgc ttcctcgctc 4680actgactcgc tgcgctcggt cgttcggctg cggcgagcgg tatcagctca ctcaaaggcg 4740gtaatacggt tatccacaga atcaggggat aacgcaggaa agaacatgtg agcaaaaggc 4800cagcaaaagg ccaggaaccg taaaaaggcc gcgttgctgg cgtttttcca taggctccgc 4860ccccctgacg agcatcacaa aaatcgacgc tcaagtcaga ggtggcgaaa cccgacagga 4920ctataaagat accaggcgtt tccccctgga agctccctcg tgcgctctcc tgttccgacc 4980ctgccgctta ccggatacct gtccgccttt ctcccttcgg gaagcgtggc gctttctcat 5040agctcacgct gtaggtatct cagttcggtg taggtcgttc gctccaagct gggctgtgtg 5100cacgaacccc ccgttcagcc cgaccgctgc gccttatccg gtaactatcg tcttgagtcc 5160aacccggtaa gacacgactt atcgccactg gcagcagcca ctggtaacag gattagcaga 5220gcgaggtatg taggcggtgc tacagagttc ttgaagtggt ggcctaacta cggctacact 5280agaagaacag tatttggtat ctgcgctctg ctgaagccag ttaccttcgg aaaaagagtt 5340ggtagctctt gatccggcaa acaaaccacc gctggtagcg gtggtttttt tgtttgcaag 5400cagcagatta cgcgcagaaa aaaaggatct caagaagatc ctttgatctt ttctacgggg 5460tctgacgctc agtggaacga aaactcacgt taagggattt tggtcatgag attatcaaaa 5520aggatcttca cctagatcct tttaaattaa aaatgaagtt ttaaatcaat ctaaagtata 5580tatgagtaaa cttggtctga cagttaccaa tgcttaatca gtgaggcacc tatctcagcg 5640atctgtctat ttcgttcatc catagttgcc tgactccccg tcgtgtagat aactacgata 5700cgggagggct taccatctgg ccccagtgct gcaatgatac cgcgagaccc acgctcaccg 5760gctccagatt tatcagcaat aaaccagcca gccggaaggg ccgagcgcag aagtggtcct 5820gcaactttat ccgcctccat ccagtctatt aattgttgcc gggaagctag agtaagtagt 5880tcgccagtta atagtttgcg caacgttgtt gccattgcta caggcatcgt ggtgtcacgc 5940tcgtcgtttg gtatggcttc attcagctcc ggttcccaac gatcaaggcg agttacatga 6000tcccccatgt tgtgcaaaaa agcggttagc tccttcggtc ctccgatcgt tgtcagaagt 6060aagttggccg cagtgttatc actcatggtt atggcagcac tgcataattc tcttactgtc 6120atgccatccg taagatgctt ttctgtgact ggtgagtact caaccaagtc attctgagaa 6180tagtgtatgc ggcgaccgag ttgctcttgc ccggcgtcaa tacgggataa taccgcgcca 6240catagcagaa ctttaaaagt gctcatcatt ggaaaacgtt cttcggggcg aaaactctca 6300aggatcttac cgctgttgag atccagttcg 
atgtaaccca ctcgtgcacc caactgatct 6360tcagcatctt ttactttcac cagcgtttct gggtgagcaa aaacaggaag gcaaaatgcc 6420gcaaaaaagg gaataagggc gacacggaaa tgttgaatac tcatactctt cctttttcaa 6480tattattgaa gcatttatca gggttattgt ctcatgagcg gatacatatt tgaatgtatt 6540tagaaaaata aacaaatagg ggttccgcgc acatttcccc gaaaagtgcc acctaaattg 6600taagcgttaa tattttgtta aaattcgcgt taaatttttg ttaaatcagc tcatttttta 6660accaataggc cgaaatcggc aaaatccctt ataaatcaaa agaatagacc gagatagggt 6720tgagtgttgt tccagtttgg aacaagagtc cactattaaa gaacgtggac tccaacgtca 6780aagggcgaaa aaccgtctat cagggcgatg gcccactacg tgaaccatca ccctaatcaa 6840gttttttggg gtcgaggtgc cgtaaagcac taaatcggaa ccctaaaggg agcccccgat 6900ttagagcttg acggggaaag ccggcgaacg tggcgagaaa ggaagggaag aaagcgaaag 6960gagcgggcgc tagggcgctg gcaagtgtag cggtcacgct gcgcgtaacc accacacccg 7020ccgcgcttaa tgcgccgcta cagggcgcgt cccattcgcc attcaggctg cgcaactgtt 7080gggaagggcg atcggtgcgg gcctcttcgc tattacgcca gctggcgaaa gggggatgtg 7140ctgcaaggcg attaagttgg gtaacgccag ggttttccca gtcacgacgt tgtaaaacga 7200cggccagtga gcgcgcgtaa tacgactcac tatagggcga attgggtacc gggccccccc 7260tcgatcgagg tcgacggtat cgggggagct cgcagggtct ccattttgaa gcgggaggtt 7320tgaacgcgca gccgcc 7336447336DNAArtificial Sequenceconstructed sequence 44atgccggggt tttacgagat tgtgattaag gtccccagcg accttgacga gcatctgccc 60ggcatttctg acagctttgt gaactgggtg gccgagaagg aatgggagtt gccgccagat 120tctgacatgg atctgaatct gattgagcag

gcacccctga ccgtggccga gaagctgcag 180cgcgactttc tgacggaatg gcgccgtgtg agtaaggccc cggaggctct tttctttgtg 240caatttgaga agggagagag ctacttccac atgcacgtgc tcgtggaaac caccggggtg 300aaatccatgg ttttgggacg tttcctgagt cagattcgcg aaaaactgat tcagagaatt 360taccgcggga tcgagccgac tttgccaaac tggttcgcgg tcacaaagac cagaaatggc 420gccggaggcg ggaacaaggt ggtggatgag tgctacatcc ccaattactt gctccccaaa 480acccagcctg agctccagtg ggcgtggact aatatggaac agtatttaag cgcctgtttg 540aatctcacgg agcgtaaacg gttggtggcg cagcatctga cgcacgtgtc gcagacgcag 600gagcagaaca aagagaatca gaatcccaat tctgatgcgc cggtgatcag atcaaaaact 660tcagccaggt acatggagct ggtcgggtgg ctcgtggaca aggggattac ctcggagaag 720cagtggatcc aggaggacca ggcctcatac atctccttca atgcggcctc caactcgcgg 780tcccaaatca aggctgcctt ggacaatgcg ggaaagatta tgagcctgac taaaaccgcc 840cccgactacc tggtgggcca gcagcccgtg gaggacattt ccagcaatcg gatttataaa 900attttggaac taaacgggta cgatccccaa tatgcggctt ccgtctttct gggatgggcc 960acgaaaaagt tcggcaagag gaacaccatc tggctgtttg ggcctgcaac taccgggaag 1020accaacatcg cggaggccat agcccacact gtgcccttct acgggtgcgt aaactggacc 1080aatgagaact ttcccttcaa cgactgtgtc gacaagatgg tgatctggtg ggaggagggg 1140aagatgaccg ccaaggtcgt ggagtcggcc aaagccattc tcggaggaag caaggtgcgc 1200gtggaccaga aatgcaagtc ctcggcccag atagacccga ctcccgtgat cgtcacctcc 1260aacaccaaca tgtgcgccgt gattgacggg aactcaacga ccttcgaaca ccagcagccg 1320ttgcaagacc ggatgttcaa atttgaactc acccgccgtc tggatcatga ctttgggaag 1380gtcaccaagc aggaagtcaa agactttttc cggtgggcaa aggatcacgt ggttgaggtg 1440gagcatgaat tctacgtcaa aaagggtgga gccaagaaaa gacccgcccc cagtgacgca 1500gatataagtg agcccaaacg ggtgcgcgag tcagttgcgc agccatcgac gtcagacgcg 1560gaagcttcga tcaactacgc agacaggtac caaaacaaat gttctcgtca cgtgggcatg 1620aatctgatgc tgtttccctg cagacaatgc gagagaatga atcagaattc aaatatctgc 1680ttcactcacg gacagaaaga ctgtttagag tgctttcccg tgtcagaatc tcaacccgtt 1740tctgtcgtca aaaaggcgta tcagaaactg tgctacattc atcatatcat gggaaaggtg 1800ccagacgctt gcactgcctg cgatctggtc aatgtggatt tggatgactg catctttgaa 1860caataaatga 
tttaaatcag gtatggctgc cgatggttat cttccagatt ggctcgagga 1920caacctctct gagggcattc gcgagtggtg ggcgctgaaa cctggagccc cgaagcccaa 1980agccaaccag caaaagcagg acgacggccg gggtctggtg cttcctggct acaagtacct 2040cggacccttc aacggactcg acaaggggga gcccgtcaac gcggcggacg cagcggccct 2100cgagcacgac aaggcctacg accagcagct gcaggcgggt gacaatccgt acctgcggta 2160taaccacgcc gacgccgagt ttcaggagcg tctgcaagaa gatacgtctt ttgggggcaa 2220cctcgggcga gcagtcttcc aggccaagaa gcgggttctc gaacctctcg gtctggttga 2280ggaaggcgct aagacggctc ctggaaagaa gagaccggta gagccatcac cccagcgttc 2340tccagactcc tctacgggca tcggcaagaa aggccaacag cccgccagaa aaagactcaa 2400ttttggtcag actggcgact cagagtcagt tccagaccct caacctctcg gagaacctcc 2460agcagcgccc tctggtgtgg gacctaatac aatggctgca ggcggtggcg caccaatggc 2520agacaataac gaaggcgccg acggagtggg tagttcctcg ggaaattggc attgcgattc 2580cacatggctg ggcgacagag tcatcaccac cagcacccga acctgggccc tgcccaccta 2640caacaaccac ctctacaagc aaatctccaa cgggacatcg ggaggagcca ccaacgacaa 2700cacctacttc ggctacagca ccccctgggg gtattttgac tttaacagat tccactgcca 2760cttttcacca cgtgactggc agcgactcat caacaacaac tggggattcc ggcccaagag 2820actcagcttc aagctcttca acatccaggt caaggaggtc acgcagaatg aaggcaccaa 2880gaccatcgcc aataacctca ccagcaccat ccaggtgttt acggactcgg agtaccagct 2940gccgtacgtt ctcggctctg cccaccaggg ctgcctgcct ccgttcccgg cggacgtgtt 3000catgattccc cagtacggct acctaacact caacaacggt agtcaggccg tgggacgctc 3060ctccttctac tgcctggaat actttccttc gcagatgctg agaaccggca acaacttcca 3120gtttacttac accttcgagg acgtgccttt ccacagcagc tacgcccaca gccagagctt 3180ggaccggctg atgaatcctc tgattgacca gtacctgtac tacttgtctc ggactcaaac 3240aacaggaggc acggcaaata cgcagactct gggcttcagc caaggtgggc ctaatacaat 3300ggccaatcag gcaaagaact ggctgccagg accctgttac cgccaacaac gcgtctcaac 3360gacaaccggg caaaacaaca atagcaactt tgcctggact gctgggacca aataccatct 3420gaatggaaga aattcattgg ctaatcctgg catcgctatg gcaacacaca aagacgacga 3480ggagcgtttt tttcccagta acgggatcct gatttttggc aaacaaaatg ctgccagaga 3540caatgcggat tacagcgatg tcatgctcac cagcgaggaa 
gaaatcaaaa ccactaaccc 3600tgtggctaca gaggaatacg gtatcgtggg tgataacttg cagttgtata acacggctcc 3660tggttcggtg tttgtcaaca gccagggggc cttacccggt atggtctggc agaaccggga 3720cgtgtacctg cagggtccca tctgggccaa gattcctcac acggacggca acttccaccc 3780gtctccgctg atgggcggct ttggcctgaa acatcctccg cctcagatcc tgatcaagaa 3840cacgcctgta cctgcggatc ctccgaccac cttcaaccag tcaaagctga actctttcat 3900cacgcaatac agcaccggac aggtcagcgt ggaaattgaa tgggagctgc agaaggaaaa 3960cagcaagcgc tggaaccccg agatccagta cacctccaac tactacaaat ctacaagtgt 4020ggactttgct gttaatacag aaggcgtgta ctctgaaccc cgccccattg gcacccgtta 4080cctcacccgt aatctgtaat tgcctgttaa tcaataaacc ggttgattcg tttcagttga 4140actttggtct ctgcgaaggg cgaattcgtt taaacctgca ggactagagg tcctgtatta 4200gaggtcacgt gagtgttttg cgacattttg cgacaccatg tggtcacgct gggtatttaa 4260gcccgagtga gcacgcaggg tctccatttt gaagcgggag gtttgaacgc gcagccgcca 4320agccgaattc tgcagatatc catcacactg gcggccgctc gactagagcg gccgccaccg 4380cggtggagct ccagcttttg ttccctttag tgagggttaa ttgcgcgctt ggcgtaatca 4440tggtcatagc tgtttcctgt gtgaaattgt tatccgctca caattccaca caacatacga 4500gccggaagca taaagtgtaa agcctggggt gcctaatgag tgagctaact cacattaatt 4560gcgttgcgct cactgcccgc tttccagtcg ggaaacctgt cgtgccagct gcattaatga 4620atcggccaac gcgcggggag aggcggtttg cgtattgggc gctcttccgc ttcctcgctc 4680actgactcgc tgcgctcggt cgttcggctg cggcgagcgg tatcagctca ctcaaaggcg 4740gtaatacggt tatccacaga atcaggggat aacgcaggaa agaacatgtg agcaaaaggc 4800cagcaaaagg ccaggaaccg taaaaaggcc gcgttgctgg cgtttttcca taggctccgc 4860ccccctgacg agcatcacaa aaatcgacgc tcaagtcaga ggtggcgaaa cccgacagga 4920ctataaagat accaggcgtt tccccctgga agctccctcg tgcgctctcc tgttccgacc 4980ctgccgctta ccggatacct gtccgccttt ctcccttcgg gaagcgtggc gctttctcat 5040agctcacgct gtaggtatct cagttcggtg taggtcgttc gctccaagct gggctgtgtg 5100cacgaacccc ccgttcagcc cgaccgctgc gccttatccg gtaactatcg tcttgagtcc 5160aacccggtaa gacacgactt atcgccactg gcagcagcca ctggtaacag gattagcaga 5220gcgaggtatg taggcggtgc tacagagttc ttgaagtggt ggcctaacta cggctacact 5280agaagaacag 
tatttggtat ctgcgctctg ctgaagccag ttaccttcgg aaaaagagtt 5340ggtagctctt gatccggcaa acaaaccacc gctggtagcg gtggtttttt tgtttgcaag 5400cagcagatta cgcgcagaaa aaaaggatct caagaagatc ctttgatctt ttctacgggg 5460tctgacgctc agtggaacga aaactcacgt taagggattt tggtcatgag attatcaaaa 5520aggatcttca cctagatcct tttaaattaa aaatgaagtt ttaaatcaat ctaaagtata 5580tatgagtaaa cttggtctga cagttaccaa tgcttaatca gtgaggcacc tatctcagcg 5640atctgtctat ttcgttcatc catagttgcc tgactccccg tcgtgtagat aactacgata 5700cgggagggct taccatctgg ccccagtgct gcaatgatac cgcgagaccc acgctcaccg 5760gctccagatt tatcagcaat aaaccagcca gccggaaggg ccgagcgcag aagtggtcct 5820gcaactttat ccgcctccat ccagtctatt aattgttgcc gggaagctag agtaagtagt 5880tcgccagtta atagtttgcg caacgttgtt gccattgcta caggcatcgt ggtgtcacgc 5940tcgtcgtttg gtatggcttc attcagctcc ggttcccaac gatcaaggcg agttacatga 6000tcccccatgt tgtgcaaaaa agcggttagc tccttcggtc ctccgatcgt tgtcagaagt 6060aagttggccg cagtgttatc actcatggtt atggcagcac tgcataattc tcttactgtc 6120atgccatccg taagatgctt ttctgtgact ggtgagtact caaccaagtc attctgagaa 6180tagtgtatgc ggcgaccgag ttgctcttgc ccggcgtcaa tacgggataa taccgcgcca 6240catagcagaa ctttaaaagt gctcatcatt ggaaaacgtt cttcggggcg aaaactctca 6300aggatcttac cgctgttgag atccagttcg atgtaaccca ctcgtgcacc caactgatct 6360tcagcatctt ttactttcac cagcgtttct gggtgagcaa aaacaggaag gcaaaatgcc 6420gcaaaaaagg gaataagggc gacacggaaa tgttgaatac tcatactctt cctttttcaa 6480tattattgaa gcatttatca gggttattgt ctcatgagcg gatacatatt tgaatgtatt 6540tagaaaaata aacaaatagg ggttccgcgc acatttcccc gaaaagtgcc acctaaattg 6600taagcgttaa tattttgtta aaattcgcgt taaatttttg ttaaatcagc tcatttttta 6660accaataggc cgaaatcggc aaaatccctt ataaatcaaa agaatagacc gagatagggt 6720tgagtgttgt tccagtttgg aacaagagtc cactattaaa gaacgtggac tccaacgtca 6780aagggcgaaa aaccgtctat cagggcgatg gcccactacg tgaaccatca ccctaatcaa 6840gttttttggg gtcgaggtgc cgtaaagcac taaatcggaa ccctaaaggg agcccccgat 6900ttagagcttg acggggaaag ccggcgaacg tggcgagaaa ggaagggaag aaagcgaaag 6960gagcgggcgc tagggcgctg gcaagtgtag cggtcacgct 
gcgcgtaacc accacacccg 7020
ccgcgcttaa tgcgccgcta cagggcgcgt cccattcgcc attcaggctg cgcaactgtt 7080
gggaagggcg atcggtgcgg gcctcttcgc tattacgcca gctggcgaaa gggggatgtg 7140
ctgcaaggcg attaagttgg gtaacgccag ggttttccca gtcacgacgt tgtaaaacga 7200
cggccagtga gcgcgcgtaa tacgactcac tatagggcga attgggtacc gggccccccc 7260
tcgatcgagg tcgacggtat cgggggagct cgcagggtct ccattttgaa gcgggaggtt 7320
tgaacgcgca gccgcc 7336

SEQ ID NO: 45 (length 90, DNA, Artificial Sequence; Constructed sequence)
misc_feature at (24)..(25), (39)..(40), (42)..(43), (57)..(58), (60)..(61), (63)..(64), (66)..(67): n is a, c, g, or t
ctacagagga atacggtatc gtgnnkgata acttgcagnn knnkaacacg gctcctnnkn 60
nknnknnkgt caacagccag ggggccttac 90

SEQ ID NO: 46 (length 20, DNA, Artificial Sequence; Constructed sequence)
tggaccggct gatgaatcct 20

SEQ ID NO: 47 (length 20, DNA, Artificial Sequence; Constructed sequence)
cggtgctgta ttgcgtgatg 20

SEQ ID NO: 48 (length 34, DNA, Artificial Sequence; Constructed sequence)
ggctcacgtc tctgtagcca cagggttagt ggtt 34

SEQ ID NO: 49 (length 36, DNA, Artificial Sequence; Constructed sequence)
cggacacgtc tcgctacaga ggaatacggt atcgtg 36

SEQ ID NO: 50 (length 30, DNA, Artificial Sequence; Constructed sequence)
ggctcacgtc tcggtaaggc cccctggctg 30

SEQ ID NO: 51 (length 36, DNA, Artificial Sequence; Constructed sequence)
cggacacgtc tccttacccg gtatggtctg gcagaa 36

SEQ ID NO: 52 (length 20, DNA, Artificial Sequence; Constructed sequence)
cacgcagaat gaaggcacca 20

SEQ ID NO: 53 (length 27, DNA, Artificial Sequence; Constructed sequence)
cacgataccg tattcctctg tagccac 27

SEQ ID NO: 54 (length 30, DNA, Artificial Sequence; Constructed sequence)
gctggtttag tgaaccgtca gatcctgcat 30

SEQ ID NO: 55 (length 20, DNA, Artificial Sequence; Constructed sequence)
aaggtgcgcg tggaccagaa 20

SEQ ID NO: 56 (length 22, DNA, Artificial Sequence; Constructed sequence)
acaggtactg gtcaatcaga gg 22

SEQ ID NO: 57 (length 65, DNA, Artificial Sequence; Constructed sequence)
misc_feature at (26)..(27), (29)..(30), (32)..(33), (35)..(36), (38)..(39): n is a, c, g, or t
caaccacctc tacaagcaaa tctccnnknn knnknnknnk ggagccacca acgacaacac 60
ctact 65

SEQ ID NO: 58 (length 65, DNA, Artificial Sequence; Constructed sequence)
misc_feature at (27)..(28), (30)..(31), (33)..(34), (36)..(37), (39)..(40): n is a, c, g, or t
agtaggtgtt gtcgttggtg gctccmnnmn nmnnmnnmnn ggagatttgc ttgtagaggt 60
ggttg 65

SEQ ID NO: 59 (length 64, DNA, Artificial Sequence; Constructed sequence)
misc_feature at (26)..(27), (29)..(30), (32)..(33), (35)..(36), (38)..(39): n is a, c, g, or t
ctacttgtct cggactcaaa caacannknn knnknnknnk acgcagactc tgggcttcag 60
ccaa 64

SEQ ID NO: 60 (length 64, DNA, Artificial Sequence; Constructed sequence)
misc_feature at (26)..(27), (29)..(30), (32)..(33), (35)..(36), (38)..(39): n is a, c, g, or t
ttggctgaag cccagagtct gcgtmnnmnn mnnmnnmnnt gttgtttgag tccgagacaa 60
gtag 64

SEQ ID NO: 61 (length 65, DNA, Artificial Sequence; Constructed sequence)
misc_feature at (26)..(27), (29)..(30), (32)..(33), (35)..(36), (38)..(39): n is a, c, g, or t
gatttttggc aaacaaaatg ctgccnnknn knnknnknnk tacagcgatg tcatgctcac 60
cagcg 65

SEQ ID NO: 62 (length 65, DNA, Artificial Sequence; Constructed sequence)
misc_feature at (27)..(28), (30)..(31), (33)..(34), (36)..(37), (39)..(40): n is a, c, g, or t
cgctggtgag catgacatcg ctgtamnnmn nmnnmnnmnn ggcagcattt tgtttgccaa 60
aaatc 65

SEQ ID NO: 63 (length 36, DNA, Artificial Sequence; Constructed sequence)
cggtcacgtc tcggtcatca ccaccagcac ccgaac 36

SEQ ID NO: 64 (length 31, DNA, Artificial Sequence; Constructed sequence)
gccagtcgtc tccgttgtcg ttggtggctc c 31

SEQ ID NO: 65 (length 55, DNA, Artificial Sequence; Constructed sequence)
cggtcacgtc tcgcctctga ttgaccagta cctgtactac ttgtctcgga ctcaa 55

SEQ ID NO: 66 (length 52, DNA, Artificial Sequence; Constructed sequence)
gccagtcgtc tccgccattg tattaggccc accttggctg aagcccagag tc 52

SEQ ID NO: 67 (length 58, DNA, Artificial Sequence; Constructed sequence)
ttaccccaca ggaagcacgc cacctgcaaa tcaggtatgg ctgccgatgg ttatcttc 58

SEQ ID NO: 68 (length 56, DNA, Artificial Sequence; Constructed sequence)
ctcgttctct gccgtgtggg actagttaca gattacgggt gaggtaacgg gtgcca 56

SEQ ID NO: 69 (length 15, PRT, Unknown; major ADK8 epitope in AAV8 HVR.VIII region)
Ala Asp Asn Leu Gln Gln Gln Asn Thr Ala Pro Gln Ile Gly Thr
1               5                   10                  15

SEQ ID NO: 70 (length 15, PRT, Unknown; mutated c41 ADK8 epitope in AAV8 HVR.VIII region)
Gly Asp Asn Leu Gln Leu Tyr Asn Thr Ala Pro Gly Ser Val Phe
1               5                   10                  15

SEQ ID NO: 71 (length 15, PRT, Unknown; mutated c42 ADK8 epitope in AAV8 HVR.VIII region)
Ser Asp Asn Leu Gln Phe Arg Asn Thr Ala Pro Leu Trp Ser Ser
1               5                   10                  15

SEQ ID NO: 72 (length 15, PRT, Unknown; mutated c46 ADK8 epitope in AAV8 HVR.VIII region)
Asn Asp Asn Leu Gln Val Cys Asn Thr Ala Pro Asp Asp Val Met
1               5                   10                  15

SEQ ID NO: 73 (length 15, PRT, Unknown; mutated g110 ADK8 epitope in AAV8 HVR.VIII region)
Cys Asp Asn Leu Gln Gly Tyr Asn Thr Ala Pro Leu Cys Val Ala
1               5                   10                  15

SEQ ID NO: 74 (length 15, PRT, Unknown; mutated g112 ADK8 epitope in AAV8 HVR.VIII region)
Val Asp Asn Leu Gln Phe Leu Asn Thr Ala Pro Ala Gly Glu Ala
1               5                   10                  15

SEQ ID NO: 75 (length 15, PRT, Unknown; mutated g113 ADK8 epitope in AAV8 HVR.VIII region)
Leu Asp Asn Leu Gln Asp Gly Asn Thr Ala Pro Gly Ala Cys Gly
1               5                   10                  15

SEQ ID NO: 76 (length 15, PRT, Unknown; mutated g115 ADK8 epitope in AAV8 HVR.VIII region)
Trp Asp Asn Leu Gln Ser Glu Asn Thr Ala Pro Ser Glu Thr Ser
1               5                   10                  15

SEQ ID NO: 77 (length 15, PRT, Unknown; mutated g117 ADK8 epitope in AAV8 HVR.VIII region)
Ser Asp Asn Leu Gln Ser Cys Asn Thr Ala Pro Phe Ala Gly Ala
1               5                   10                  15

SEQ ID NO: 78 (length 5, PRT, Artificial Sequence; Constructed sequence)
Asn Gly Thr Ser Gly
1               5

SEQ ID NO: 79 (length 4, PRT, Artificial Sequence; Constructed sequence)
Ser Gly Thr His
1

SEQ ID NO: 80 (length 4, PRT, Artificial Sequence; Constructed sequence)
Ser Asp Thr His
1

SEQ ID NO: 81 (length 5, PRT, Artificial Sequence; Constructed sequence)
Gly Gly Thr Ala Asn
1               5

SEQ ID NO: 82 (length 5, PRT, Artificial Sequence; Constructed sequence)
Asp Gly Ser Gly Leu
1               5

SEQ ID NO: 83 (length 15, DNA, Artificial Sequence; Constructed sequence)
aacgggacat cggga 15

SEQ ID NO: 84 (length 12, DNA, Artificial Sequence; Constructed sequence)
tctggtactc at 12

SEQ ID NO: 85 (length 90, DNA, Artificial Sequence; Constructed sequence)
misc_feature at (24)..(25), (39)..(40), (42)..(43), (57)..(58), (60)..(61), (63)..(64), (66)..(67): n is a, c, g, or t
ctacagagga atacggtatc gtgnnkgata acttgcagnn knnkaacacg gctcctnnkn 60
nknnknnkgt caacagccag ggggccttac 90

SEQ ID NO: 86 (length 65, DNA, Artificial Sequence; Constructed sequence)
misc_feature at (26)..(27), (29)..(30), (32)..(33), (35)..(36), (38)..(39): n is a, c, g, or t
caaccacctc tacaagcaaa tctccnnknn knnknnknnk ggagccacca acgacaacac 60
ctact 65

SEQ ID NO: 87 (length 64, DNA, Artificial Sequence; Constructed sequence)
misc_feature at (26)..(27), (29)..(30), (32)..(33), (35)..(36), (38)..(39): n is a, c, g, or t
ctacttgtct cggactcaaa caacannknn knnknnknnk acgcagactc tgggcttcag 60
ccaa 64

SEQ ID NO: 88 (length 738, PRT, Unknown; AAV rh.20 capsid protein)
Met Ala Ala Asp Gly Tyr Leu Pro Asp Trp Leu Glu Asp Asn Leu Ser
1               5                   10                  15
Glu Gly Ile Arg Glu Trp Trp Asp Leu Lys Pro Gly Ala Pro Lys Pro
            20                  25                  30
Lys Ala Asn Gln Gln Lys Gln Asp Asp Gly Arg Gly Leu Val Leu Pro
        35                  40                  45
Gly Tyr Lys Tyr Leu Gly Pro Phe Asn Gly Leu Asp Lys Gly Glu Pro
    50                  55                  60
Val Asn Ala Ala Asp Ala Ala Ala Leu Glu His Asp Lys Ala Tyr Asp
65                  70                  75                  80
Gln Gln Leu Lys Ala Gly Asp Asn Pro Tyr Leu Arg Tyr Asn His Ala
                85                  90                  95
Asp Ala Glu Phe Gln Glu Arg Leu Gln Glu Asp Thr Ser Phe Gly Gly
            100                 105                 110
Asn Leu Gly Arg Ala Val Phe Gln Ala Lys Lys Arg Val Leu Glu Pro
        115                 120                 125
Leu Gly Leu Val Glu Glu Gly Ala Lys Thr Ala Pro Gly Lys Lys Arg
    130                 135                 140
Pro Val Glu Pro Ser Pro Gln Arg Ser Pro Asp Ser Ser Thr Gly Ile
145                 150                 155                 160
Gly Lys Thr Gly Gln Gln Pro Ala Lys Lys Arg Leu Asn Phe Gly Gln
                165                 170                 175
Thr Gly Asp Ser Glu Ser Val Pro Asp Pro Gln Pro Ile Gly Glu Pro
            180                 185                 190
Pro Ala Gly Pro Ser Gly Leu Gly Ser Gly Thr Met Ala Ala Gly Gly
        195                 200                 205
Gly Ala Pro Met Ala Asp Asn Asn Glu Gly Ala Asp Gly Val Gly Ser
    210                 215                 220
Ser Ser Gly Asn Trp His Cys Asp Ser Thr Trp Leu Gly Asp Arg Val
225                 230                 235                 240
Ile Thr Thr Ser Thr Arg Thr Trp Ala Leu Pro Thr Tyr Asn Asn His
                245                 250                 255
Leu Tyr Lys Gln Ile Ser Asn Gly Thr Ser Gly Gly Ser Thr Asn Asp
            260                 265                 270
Asn Thr Tyr Phe Gly Tyr Ser Thr Pro Trp Gly Tyr Phe Asp Phe Asn
        275                 280                 285
Arg Phe His Cys His Phe Ser Pro Arg Asp Trp Gln Arg Leu Ile Asn
    290                 295                 300
Asn Asn Trp Gly Phe Arg Pro Lys Arg Leu Asn Phe Lys Leu Phe Asn
305                 310                 315                 320
Ile Gln Val Lys Glu Val Thr Gln Asn Glu Gly Thr Lys Thr Ile Ala
                325                 330                 335
Asn Asn Leu Thr Ser Thr Ile Gln Val Phe Thr Asp Ser Glu Tyr Gln
            340                 345                 350
Leu Pro Tyr Val Leu Gly Ser Ala His Gln Gly Cys Leu Pro Pro Phe
        355                 360                 365
Pro Ala Asp Val Phe Met Ile Pro Gln Tyr Gly Tyr Leu Thr Leu Asn
    370                 375                 380
Asn Gly Ser Gln Ala Val Gly Arg Ser Ser Phe Tyr Cys Leu Glu Tyr
385                 390                 395                 400
Phe Pro Ser Gln Met Leu Arg Thr Gly Asn Asn Phe Glu Phe Ser Tyr
                405                 410                 415
Gln Phe Glu Asp Val Pro Phe His Ser Ser Tyr Ala His Ser Gln Ser
            420                 425                 430
Leu Asp Arg Leu Met Asn Pro Leu Ile Asp Gln Tyr Leu Tyr Tyr Leu
        435                 440                 445
Ser Arg Thr Gln Ser Thr Gly Gly Thr Ala Gly Thr Gln Gln Leu Leu
    450                 455                 460
Phe Ser Gln Ala Gly Pro Asn Asn Met Ser Ala Gln Ala Lys Asn Trp
465                 470                 475                 480
Leu Pro Gly Pro Cys Tyr Arg Gln Gln Arg Val Ser Thr Thr Leu Ser
                485                 490                 495
Gln Asn Asn Asn Ser Asn Phe Ala Trp Thr Gly Ala Thr Lys Tyr His
            500                 505                 510
Leu Asn Gly Arg Asp Ser Leu Val Asn Pro Gly Val Ala Met Ala Thr
        515                 520                 525
His Lys Asp Asp Glu Glu Arg Phe Phe Pro Ser Ser Gly Val Leu Met
    530                 535                 540
Phe Gly Lys Gln Gly Ala Gly Lys Asp Asn Val Asp Tyr Ser Ser Val
545                 550                 555                 560
Met Leu Thr Ser Glu Glu Glu Ile Lys Thr Thr Asn Pro Val Ala Thr
                565                 570                 575
Glu Gln Tyr Gly Val Val Ala Asp Asn Leu Gln Gln Gln Asn Ala Ala
            580                 585                 590
Pro Ile Val Gly Ala Val Asn Ser Gln Gly Ala Leu Pro Gly Met Val
        595                 600                 605
Trp Gln Asn Arg Asp Val Tyr Leu Gln Gly Pro Ile Trp Ala Lys Ile
    610                 615                 620
Pro His Thr Asp Gly Asn Phe His Pro Ser Pro Leu Met Gly Gly Phe
625                 630                 635                 640
Gly Leu Lys His Pro Pro Pro Gln Ile Leu Ile Lys Asn Thr Pro Val
                645                 650                 655
Pro Ala Asp Pro Pro Thr Thr Phe Ser Gln Ala Lys Leu Ala Ser Phe
            660                 665                 670
Ile Thr Gln Tyr Ser Thr Gly Gln Val Ser Val Glu Ile Glu Trp Glu
        675                 680                 685
Leu Gln Lys Glu Asn Ser Lys Arg Trp Asn Pro Glu Ile Gln Tyr Thr
    690                 695                 700
Ser Asn Tyr Tyr Lys Ser Thr Asn Val Asp Phe Ala Val Asn Thr Glu
705                 710                 715                 720
Gly Thr Tyr Ser Glu Pro Arg Pro Ile Gly Thr Arg Tyr Leu Thr Arg
                725                 730                 735
Asn Leu


