Patent application title: USE OF PROTEINS AND PEPTIDES ENCODED BY THE GENOME OF A NOVEL SARS-ASSOCIATED CORONAVIRUS STRAIN
Inventors:
Sylvie Van Der Werf (Gif-Sur-Yvette, FR)
Nicolas Escriou (Paris, FR)
Bernadette Crescenzo-Chaigne (Neuilly-Sur-Seine, FR)
Jean-Claude Manuguerra (Paris, FR)
Frederik Kunst (Paris, FR)
Benoît Callendret (Nanterre, FR)
Jean-Michel Betton (Paris, FR)
Valérie Lorin (Montrouge, FR)
Sylvie Gerbaud (Saint-Maur-Des-Fosses, FR)
Ana Maria Burguiere (Clamart, FR)
Saliha Azebi (Vitry-Sur-Seine, FR)
Pierre Charneau (Paris, FR)
Frédéric Tangy (Les Lilas, FR)
Chantal Combredet (Paris, FR)
Jean-François Delagneau (La Celle Saint Cloud, FR)
Monique Martin (Chatenay Malabry, FR)
IPC8 Class: AA61K39215FI
USPC Class:
4241861
Class name: Antigen, epitope, or other immunospecific immunoeffector (e.g., immunospecific vaccine, immunospecific stimulator of cell-mediated immunity, immunospecific tolerogen, immunospecific immunosuppressor, etc.) amino acid sequence disclosed in whole or in part; or conjugate, complex, or fusion protein or fusion polypeptide including the same disclosed amino acid sequence derived from virus
Publication date: 2012-04-05
Patent application number: 20120082693
Abstract:
The invention relates to the use of proteins and peptides coded by the
genome of the isolated or purified strain of severe acute respiratory
syndrome (SARS)-associated coronavirus, resulting from sample reference
number 031589 and, in particular, to the use of protein S and the
derivative antibodies thereof as diagnostic reagents and as a vaccine.
Claims:
1. An isolated and purified protein or polypeptide, characterized in that
it is the S protein having the sequence SEQ ID No: 3, its ectodomain or a
fragment of its ectodomain.
2. The protein or polypeptide as claimed in claim 1, characterized in that it consists of the amino acids corresponding to positions 1 to 1193 of the amino acid sequence of the S protein.
3. The protein or polypeptide as claimed in claim 1, characterized in that it consists of the amino acids corresponding to positions 14 to 1193 of the amino acid sequence of the S protein.
4. The isolated protein or polypeptide as claimed in claim 1, characterized in that it consists of the amino acids corresponding to positions 475 to 1193 of the amino acid sequence of the S protein.
5. A nucleic acid encoding a protein or a polypeptide as claimed in any one of claims 1 to 4.
6. The nucleic acid as claimed in claim 5, characterized in that it comprises the sequence encoding SEQ ID No: 5 or the sequence encoding SEQ ID No: 6.
7. A recombinant expression vector, characterized in that it encodes a protein or a polypeptide as claimed in any one of claims 1 to 4.
8. The recombinant expression vector as claimed in claim 7, characterized in that it is chosen from the vectors contained in the following bacterial strains, deposited at the Collection Nationale de Cultures de Microorganismes (CNCM), 25 rue du Docteur Roux, 75724 Paris Cedex 15: a) strain No. I-3118, deposited on Oct. 23, 2003, b) strain No. I-3019, deposited on May 12, 2003, c) strain No. I-3020, deposited on May 12, 2003, d) strain No. I-3059, deposited on Jun. 20, 2003, e) strain No. I-3323, deposited on Nov. 22, 2004, f) strain No. I-3324, deposited on Nov. 22, 2004, g) strain No. I-3326, deposited on Dec. 1, 2004, h) strain No. I-3327, deposited on Dec. 1, 2004, i) strain No. I-3332, deposited on Dec. 1, 2004, j) strain No. I-3333, deposited on Dec. 1, 2004, k) strain No. I-3334, deposited on Dec. 1, 2004, l) strain No. I-3335, deposited on Dec. 1, 2004, m) strain No. I-3336, deposited on Dec. 1, 2004, n) strain No. I-3337, deposited on Dec. 1, 2004, o) strain No. I-3338, deposited on Dec. 2, 2004, p) strain No. I-3339, deposited on Dec. 2, 2004, q) strain No. I-3340, deposited on Dec. 2, 2004, and r) strain No. I-3341, deposited on Dec. 2, 2004.
9. A nucleic acid containing a synthetic gene allowing optimized expression of the S protein in eukaryotic cells, characterized in that it possesses the sequence SEQ ID No: 140.
10. An expression vector containing a nucleic acid as claimed in claim 9, characterized in that it is contained in the bacterial strain deposited at the CNCM, on Dec. 1, 2004, under the No. I-3333.
11. The expression vector as claimed in claim 7 or claim 9, characterized in that it is a viral vector, in the form of a viral particle or in the form of a recombinant genome.
12. The vector as claimed in claim 11, characterized in that it is a recombinant viral particle or a recombinant viral genome capable of being obtained by transfecting a plasmid according to paragraphs g), h) or k) to r) of claim 8, into an appropriate cellular system.
13. A lentiviral vector encoding a polypeptide as claimed in any one of claims 1 to 4.
14. A recombinant measles virus encoding a polypeptide as claimed in any one of claims 1 to 4.
15. A recombinant vaccinia virus encoding a polypeptide as claimed in any one of claims 1 to 4.
16. The use of a vector according to paragraphs d) to p) of claim 8, or of a vector as claimed in claim 10, for the production, in a eukaryotic system, of the SARS-associated coronavirus S protein or of a fragment of this protein.
17. A method for producing the S protein in a eukaryotic system, comprising a step of transfecting eukaryotic cells in culture with a vector chosen from the vectors contained in the bacterial strains mentioned in paragraphs d) to p) of claim 8, or in claim 10.
18. A genetically modified eukaryotic cell expressing a protein or a polypeptide as claimed in any one of claims 1 to 4.
19. The cell as claimed in claim 18, capable of being obtained by transfection with any one of the vectors mentioned in paragraphs k) to n) of claim 8.
20. The cell as claimed in claim 19, characterized in that it is the cell FRhK4-Ssol-30, deposited at the CNCM on Nov. 22, 2004, under the No. I-3325.
21. A monoclonal antibody recognizing the native S protein of a SARS-associated coronavirus.
22. The use of a protein or a polypeptide as claimed in any one of claims 1 to 4, or of an antibody as claimed in claim 21, for detecting a SARS-associated coronavirus infection, from a biological sample.
23. A method for detecting a SARS-associated coronavirus, from a biological sample, characterized in that the detection is carried out by ELISA using the recombinant S protein or its ectodomain, or a fragment of its ectodomain, expressed in a eukaryotic system.
24. The method of detection as claimed in claim 23, additionally comprising a step of detection by ELISA using the recombinant N protein.
25. The method as claimed in claim 23 or 24, characterized in that it is a double epitope ELISA method, and in that the serum to be tested is mixed with the visualizing antigen, said mixture then being brought into contact with the antigen attached to a solid support.
26. An immune complex formed of a monoclonal antibody or antibody fragment as claimed in claim 21, and of a SARS-associated coronavirus protein or peptide.
27. An immune complex formed of a protein or a polypeptide as claimed in any one of claims 1 to 4, and of an antibody directed specifically against an epitope of the SARS-associated coronavirus.
28. A SARS-associated coronavirus detection kit or box, characterized in that it comprises at least one reagent selected from the group consisting of: a protein or polypeptide as claimed in any one of claims 1 to 4, a nucleic acid as claimed in either of claims 5 and 6, a cell as claimed in any one of claims 18 to 20, or an antibody as claimed in claim 21.
29. An immunogenic and/or vaccine composition, characterized in that it comprises a recombinant protein or polypeptide as claimed in any one of claims 1 to 4, obtained in a eukaryotic expression system.
30. An immunogenic and/or vaccine composition, characterized in that it comprises a recombinant vector or virus as claimed in any one of claims 7, 8, and 10 to 15.
31. A nucleic acid insert of viral origin, characterized in that it is contained in any one of the strains mentioned in paragraphs a) to h) and k) to r) of claim 8.
Description:
[0001] The present invention relates to a novel strain of severe acute
respiratory syndrome (SARS)-associated coronavirus derived from a sample
recorded under No. 031589 and collected in Hanoi (Vietnam), to nucleic
acid molecules derived from its genome, to the proteins and peptides
encoded by said nucleic acid molecules and to their applications, in
particular as diagnostic reagents and/or as vaccine.
[0002] Coronavirus is a virus containing single-stranded RNA, of positive polarity, of approximately 30 kilobases which replicates in the cytoplasm of the host cells; the 5' end of the genome has a capped structure and the 3' end contains a polyA tail. This virus is enveloped and comprises, at its surface, peplomeric structures called spicules.
[0003] The genome comprises the following open reading frames or ORFs, from its 5' end to its 3' end: ORF1a and ORF1b corresponding to the proteins of the transcription-replication complex, and ORF-S, ORF-E, ORF-M and ORF-N corresponding to the structural proteins S, E, M and N. It also comprises ORFs corresponding to proteins of unknown function encoded by: the region situated between ORF-S and ORF-E and overlapping the latter, the region situated between ORF-M and ORF-N, and the region included in ORF-N.
[0004] The S protein is a membrane glycoprotein (200-220 kDa) which exists in the form of spicules or spikes emerging from the surface of the viral envelope. It is responsible for the attachment of the virus to the receptors of the host cell and for inducing the fusion of the viral envelope with the cell membrane.
[0005] The small envelope protein (E), also called sM (small membrane), which is a nonglycosylated transmembrane protein of about 10 kDa, is the protein present in the smallest quantity in the virion. It plays a powerful role in the coronavirus budding process which occurs at the level of the intermediate compartment in the endoplasmic reticulum and the Golgi apparatus.
[0006] The M protein or matrix protein (25-30 kDa) is a more abundant membrane glycoprotein which is integrated into the viral particle by an M/E interaction, whereas the incorporation of S into the particles is directed by an S/M interaction. It appears to be important for the viral maturation of coronaviruses and for the determination of the site where the viral particles are assembled.
[0007] The N protein or nucleocapsid protein (45-50 kDa) which is the most conserved among the coronavirus structural proteins is necessary for encapsidating the genomic RNA and then for directing its incorporation into the virion. This protein is probably also involved in the replication of the RNA.
[0008] When the host cell is infected, the reading frame (ORF) situated at the 5' end of the viral genome is translated into a polyprotein which is cleaved by the viral proteases and then releases several nonstructural proteins such as the RNA-dependent RNA polymerase (Rep) and the ATPase helicase (Hel). These two proteins are involved in the replication of the viral genome and in the generation of transcripts which are used in the synthesis of the viral proteins. The mechanisms by which these subgenomic mRNAs are produced are not completely understood; however, recent findings indicate that the sequences for regulation of transcription at the 5' end of each gene represent signals which regulate the discontinuous transcription of the subgenomic mRNAs.
[0009] The proteins of the viral membrane (S, E and M proteins) are inserted into the intermediate compartment, whereas the replicated RNA (+strand) is assembled with the N (nucleocapsid) protein. This protein-RNA complex then combines with the M protein contained in the membranes of the endoplasmic reticulum and the viral particles form when the nucleocapsid complex buds into the endoplasmic reticulum. The virus then migrates across the Golgi complex and eventually leaves the cell, for example by exocytosis. The site of attachment of the virus to the host cell is at the level of the S protein.
[0010] Coronaviruses are responsible for 15 to 30% of colds in humans and for respiratory and digestive infections in animals, especially cats (FIPV: Feline infectious peritonitis virus), poultry (IBV: Avian infectious bronchitis virus), mice (MHV: Mouse hepatitis virus), pigs (TGEV: Transmissible gastroenteritis virus, PEDV: Porcine Epidemic diarrhea virus, PRCoV: Porcine Respiratory Coronavirus, HEV: Hemagglutinating encephalomyelitis Virus) and bovines (BCoV: Bovine coronavirus).
[0011] In general, each coronavirus affects only one species; in immunocompetent individuals, the infection induces optionally neutralizing antibodies and cell immunity, capable of destroying the infected cells.
[0012] An epidemic of atypical pneumonia, called severe acute respiratory syndrome (SARS), spread to various countries (Vietnam, Hong Kong, Singapore, Thailand and Canada) during the first quarter of 2003, from an initial focus which appeared in China in the last quarter of 2002. The severity of this disease is such that its mortality rate is about 3 to 6%. Numerous laboratories worldwide have been working to determine the causative agent of this disease.
[0013] In March 2003, a new coronavirus (SARS-CoV or SARS virus) was isolated, in association with cases of severe acute respiratory syndrome (T. G. KSIAZEK et al., The New England Journal of Medicine, 2003, 348, 1319-1330; C. DROSTEN et al., The New England Journal of Medicine, 2003, 348, 1967-1976; Peiris et al., Lancet, 2003, 361, 1319).
[0014] Genomic sequences of this new coronavirus have thus been obtained, in particular those of the Tor2 isolate (Toronto isolate, Genbank accession No. AY274119.3 and A. MARRA et al., Science, May 1, 2003, 300, 1399-1404) and the Urbani isolate (Genbank accession No. AY278741 and A. ROTA et al., Science, 2003, 300, 1394-1399).
[0015] The organization of the genome is comparable with that of other known coronaviruses, thus making it possible to confirm that SARS-CoV belongs to the Coronaviridae family; open reading frames ORF1a and 1b and open reading frames corresponding to the S, E, M and N proteins, and to proteins encoded by: the region situated between ORF-S and ORF-E (ORF3), the region situated between ORF-S and ORF-E and overlapping ORF-E (ORF4), the region situated between ORF-M and ORF-N (ORF7 to ORF11) and the region corresponding to ORF-N (ORF13 and ORF14), have in particular been identified.
[0016] Seven differences have been identified between the sequences of the Tor2 and Urbani isolates; 3 correspond to silent mutations (c/t at position 16622 and a/g at position 19064 of ORF1b, t/c at position 24872 of ORF-S) and 4 modify the amino acid sequence of respectively: the proteins encoded by ORF1a (c/t at position 7919 corresponding to the A/V mutation), the S protein (g/t at position 23220 corresponding to the A/S mutation), the protein encoded by ORF3 (a/g at position 25298 corresponding to the R/G mutation) and the M protein (t/c at position 26857 corresponding to the S/P mutation).
[0017] In addition, phylogenetic analysis shows that SARS-CoV is distant from other coronaviruses and that it did not appear by mutation of human respiratory coronaviruses nor by recombination between known coronaviruses (for a review, see Holmes, J. C. I., 2003, 111, 1605-1609).
[0018] The determination and the taking into account of new variants are important for the development of reagents for the detection and diagnosis of SARS which are sufficiently sensitive and specific, and immunogenic compositions capable of protecting populations against epidemics of SARS.
[0019] The inventors have now identified another strain of SARS-associated coronavirus which is distinguishable from the Tor2 and Urbani isolates.
[0020] The subject of the present invention is therefore an isolated or purified strain of severe acute respiratory syndrome-associated human coronavirus, characterized in that its genome has, in the form of complementary DNA, a serine codon at position 23220-23222 of the gene for the S protein or a glycine codon at position 25298-25300 of the gene for ORF3, and an alanine codon at position 7918-7920 of ORF1a or a serine codon at position 26857-26859 of the gene for the M protein, said positions being indicated in terms of reference to the Genbank sequence AY274119.3.
[0021] According to an advantageous embodiment of said strain, the DNA equivalent of its genome has a sequence corresponding to the sequence SEQ ID No: 1; this coronavirus strain is derived from the sample collected from the bronchoalveolar washings from a patient suffering from SARS, recorded under the No. 031589 and collected at the Hanoi (Vietnam) French hospital.
[0022] In accordance with the invention, said sequence SEQ ID No: 1 is that of the deoxyribonucleic acid corresponding to the ribonucleic acid molecule of the genome of the isolated coronavirus strain as defined above.
[0023] The sequence SEQ ID No: 1 is distinguishable from the Genbank sequence AY274119.3 (Tor2 isolate) in that it possesses the following mutations:
[0024] g/t at position 23220; the alanine codon (gct) at position 577 of the amino acid sequence of the Tor2 S protein is replaced by a serine codon (tct),
[0025] a/g at position 25298; the arginine codon (aga) at position 11 of the amino acid sequence of the protein encoded by the Tor2 ORF3 is replaced by a glycine codon (gga).
[0026] In addition, the sequence SEQ ID No: 1 is distinguishable from the Genbank sequence AY278741 (Urbani isolate) in that it possesses the following mutations:
[0027] t/c at position 7919; the valine codon (gtt) in position 2552 of the amino acid sequence of the protein encoded by ORF1a is replaced by an alanine codon (gct),
[0028] t/c at position 16622: this mutation does not modify the amino acid sequence of the proteins encoded by ORF1b (silent mutation),
[0029] g/a at position 19064: this mutation does not modify the amino acid sequence of the proteins encoded by ORF1b (silent mutation),
[0030] c/t at position 24872: this mutation does not modify the amino acid sequence of the S protein, and
[0031] c/t at position 26857: the proline codon (ccc) at position 154 of the amino acid sequence of the M protein is replaced by a serine codon (tcc).
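By way of illustration only, the four amino acid changes listed above can be checked against the standard genetic code with a short Python sketch. The codon table is deliberately partial (it holds only the codons named in the text), and the names CODON_TO_AA, CODON_CHANGES, describe and differing_offsets are hypothetical helpers introduced for this example; they are not part of the disclosure.

```python
# Partial standard genetic code: only the codons named in the text.
CODON_TO_AA = {
    "gct": "Ala", "tct": "Ser", "aga": "Arg", "gga": "Gly",
    "gtt": "Val", "ccc": "Pro", "tcc": "Ser",
}

# (genome position, reference isolate, reference codon, codon in SEQ ID No: 1)
CODON_CHANGES = [
    (23220, "Tor2", "gct", "tct"),    # S protein
    (25298, "Tor2", "aga", "gga"),    # ORF3
    (7919, "Urbani", "gtt", "gct"),   # ORF1a
    (26857, "Urbani", "ccc", "tcc"),  # M protein
]


def describe(change):
    """Summarize one codon change as an amino acid substitution."""
    position, isolate, ref_codon, new_codon = change
    return f"{position} ({isolate}): {CODON_TO_AA[ref_codon]} -> {CODON_TO_AA[new_codon]}"


def differing_offsets(ref_codon, new_codon):
    """Return the 0-based offsets within the codon at which the two codons differ."""
    return [i for i, (a, b) in enumerate(zip(ref_codon, new_codon)) if a != b]


for change in CODON_CHANGES:
    print(describe(change), "| changed codon offset(s):", differing_offsets(change[2], change[3]))
```

Each change is a single-nucleotide substitution. The gtt/gct change falls on the second base of the codon, which is consistent with the mutated position 7919 lying in the middle of the codon at positions 7918-7920.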
[0032] Unless otherwise stated, the positions of the nucleotide and peptide sequences are indicated with reference to the Genbank sequence AY274119.3.
[0033] The subject of the present invention is also an isolated or purified polynucleotide, characterized in that its sequence is that of the genome of the isolated coronavirus strain as defined above.
[0034] According to an advantageous embodiment of said polynucleotide, it has the sequence SEQ ID No: 1.
[0035] The subject of the present invention is also an isolated or purified polynucleotide, characterized in that its sequence hybridizes under high stringency conditions with the sequence of the polynucleotide as defined above.
[0036] The terms "isolated or purified" mean modified "by the hand of humans" from the natural state; in other words if an object exists in nature, it is said to be isolated or purified if it is modified or extracted from its natural environment or both. For example, a polynucleotide or a protein/peptide naturally present in a living organism is neither isolated nor purified; on the other hand, the same polynucleotide or protein/peptide separated from coexisting molecules in its natural environment, obtained by cloning, amplification and/or chemical synthesis is isolated for the purposes of the present invention. Furthermore, a polynucleotide or a protein/peptide which is introduced into an organism by transformation, genetic manipulation or by any other method, is "isolated" even if it is present in said organism. The term purified as used in the present invention means that the proteins/peptides according to the invention are essentially free of association with the other proteins or polypeptides, as is for example the product purified from the culture of recombinant host cells or the product purified from a nonrecombinant source.
[0037] For the purposes of the present invention, high stringency hybridization conditions are understood to mean temperature and ionic strength conditions chosen such that they make it possible to maintain the specific and selective hybridization between complementary polynucleotides.
[0038] By way of illustration, high stringency conditions for the purposes of defining the above polynucleotides are advantageously the following: the DNA-DNA or DNA-RNA hybridization is performed in two steps: (1) prehybridization at 42° C. for 3 hours in phosphate buffer (20 mM, pH 7.5) containing 5×SSC (1×SSC corresponds to a 0.15 M NaCl+0.015 M sodium citrate solution), 50% formamide, 7% sodium dodecyl sulfate (SDS), 10×Denhardt's, 5% dextran sulfate and 1% salmon sperm DNA; (2) hybridization for 20 hours at 42° C. followed by 2 washings of 20 minutes at 20° C. in 2×SSC+2% SDS, 1 washing of 20 minutes at 20° C. in 0.1×SSC+0.1% SDS. The final washing is performed in 0.1×SSC+0.1% SDS for 30 minutes at 60° C.
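As a purely arithmetical aid, the salt concentrations implied by the SSC strengths quoted above can be derived from the stated definition of 1×SSC (0.15 M NaCl + 0.015 M sodium citrate). The following Python sketch assumes simple linear dilution; the function name ssc_molarity is hypothetical.

```python
# 1x SSC as defined in the text: 0.15 M NaCl + 0.015 M sodium citrate.
NACL_1X_M = 0.15
CITRATE_1X_M = 0.015


def ssc_molarity(factor):
    """Return (NaCl, sodium citrate) molar concentrations for a given SSC strength."""
    # Rounding only removes floating-point noise from the multiplication.
    return (round(factor * NACL_1X_M, 4), round(factor * CITRATE_1X_M, 5))


# Strengths used in the prehybridization buffer and in the washing steps.
for factor in (5, 2, 0.1):
    nacl, citrate = ssc_molarity(factor)
    print(f"{factor}x SSC: {nacl} M NaCl + {citrate} M sodium citrate")
```

The final 0.1×SSC wash at 60° C. is thus the lowest-salt and highest-temperature step of the procedure.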
[0039] The subject of the present invention is also a representative fragment of the polynucleotide as defined above, characterized in that it is capable of being obtained either by the use of restriction enzymes whose recognition and cleavage sites are present in said polynucleotide as defined above, or by amplification with the aid of oligonucleotide primers specific for said polynucleotide as defined above, or by transcription in vitro, or by chemical synthesis.
[0040] According to an advantageous embodiment of said fragment, it is selected from the group consisting of: the cDNA corresponding to at least one open reading frame (ORF) chosen from: ORF1a, ORF1b, ORF-S, ORF-E, ORF-M, ORF-N, ORF3, ORF4, ORF7 to ORF11, ORF13 and ORF14 and the cDNA corresponding to the noncoding 5' or 3' ends of said polynucleotide.
[0041] According to an advantageous feature of this embodiment, said fragment has a sequence selected from the group consisting of:
[0042] the sequences SEQ ID NO: 2 and 4 representing the cDNA corresponding to the ORF-S which encodes the S protein,
[0043] the sequences SEQ ID NO: 13 and 15 representing the cDNA corresponding to the ORF-E which encodes the E protein,
[0044] the sequences SEQ ID NO: 16 and 18 representing the cDNA corresponding to the ORF-M which encodes the M protein,
[0045] the sequences SEQ ID NO: 36 and 38 representing the cDNA corresponding to the ORF-N which encodes the N protein,
[0046] the sequences representing the cDNA corresponding respectively: to ORF1a and ORF1b (ORF1ab, SEQ ID NO: 31), to ORF3 and ORF4 (SEQ ID NO: 7, 8), to ORF7 to 11 (SEQ ID NO: 19, 20) to ORF13 (SEQ ID NO: 32) and to ORF14 (SEQ ID NO: 34), and
[0047] the sequences representing the cDNAs corresponding respectively to the noncoding 5' (SEQ ID NO: 39 and 72) and 3' (SEQ ID NO: 40, 73) ends of said polynucleotide.
[0048] The subject of the present invention is also a cDNA fragment encoding the S protein, as defined above, characterized in that it has a sequence selected from the group consisting of the sequences SEQ ID NO: 5 and 6 (Sa and Sb fragments).
[0049] The subject of the present invention is also a cDNA fragment corresponding to ORF1a and ORF1b as defined above, characterized in that it has a sequence selected from the group consisting of the sequences SEQ ID NO: 41 to 54 (L0 to L12 fragments).
[0050] The subject of the present invention is also a polynucleotide fragment as defined above, characterized in that it has at least 15 consecutive bases or base pairs of the sequence of the genome of said strain including at least one of those situated in position 7919, 16622, 19064, 23220, 24872, 25298 and 26857. Preferably this is a fragment of 20 to 2500 bases or base pairs, preferably from 20 to 400.
[0051] According to an advantageous embodiment of said fragment, it includes at least one pair of bases or base pairs corresponding to the following positions: 7919 and 23220, 7919 and 25298, 16622 and 23220, 19064 and 23220, 16622 and 25298, 19064 and 25298, 23220 and 24872, 23220 and 26857, 24872 and 25298, 25298 and 26857.
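The fragment criterion above (at least 15 consecutive bases including at least one of the variant positions) may be illustrated with the following Python sketch. It assumes 1-based inclusive genome coordinates and takes 7919 as the ORF1a position, as in paragraph [0016]; the names VARIANT_POSITIONS, variant_positions_in and is_discriminating_fragment are hypothetical and serve only this example.

```python
# Variant positions distinguishing the strain from Tor2 and Urbani,
# with reference to Genbank sequence AY274119.3 (1-based coordinates).
VARIANT_POSITIONS = (7919, 16622, 19064, 23220, 24872, 25298, 26857)
MIN_FRAGMENT_LENGTH = 15  # minimum number of consecutive bases stated in the text


def variant_positions_in(start, end):
    """Return the variant positions covered by the fragment [start, end] (inclusive)."""
    return [p for p in VARIANT_POSITIONS if start <= p <= end]


def is_discriminating_fragment(start, end):
    """True if the fragment is long enough and covers at least one variant position."""
    length = end - start + 1
    return length >= MIN_FRAGMENT_LENGTH and bool(variant_positions_in(start, end))


# Hypothetical fragment coordinates, chosen only to exercise the check.
print(is_discriminating_fragment(23210, 23229))  # 20 bases around position 23220
print(variant_positions_in(24860, 25300))        # fragment spanning two variant positions
print(is_discriminating_fragment(28507, 28774))  # region containing no variant position
```

A fragment covering two of the listed positions, such as 24872 and 25298, corresponds to one of the position pairs enumerated in the preceding paragraph.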
[0052] The subject of the present invention is also primers of at least 18 bases capable of amplifying a fragment of the genome of a SARS-associated coronavirus or of the DNA equivalent thereof.
[0053] According to an embodiment of said primers, they are selected from the group consisting of:
[0054] the pair of primers No. 1 corresponding respectively to positions 28507 to 28522 (sense primer, SEQ ID NO: 60) and 28774 to 28759 (antisense primer, SEQ ID NO: 61) of the sequence of the polynucleotide as defined above,
[0055] the pair of primers No. 2 corresponding respectively to positions 28375 to 28390 (sense primer, SEQ ID NO: 62) and 28702 to 28687 (antisense primer, SEQ ID NO: 63) of the sequence of the polynucleotide as defined above, and
[0056] the pair of primers consisting of the primers SEQ ID Nos: 55 and 56.
[0057] The subject of the present invention is also a probe capable of detecting the presence of the genome of a SARS-associated coronavirus or of a fragment thereof, characterized in that it is selected from the group consisting of: the fragments as defined above and the fragments corresponding to the following positions of the polynucleotide sequence as defined above: 28561 to 28586, 28588 to 28608, 28541 to 28563 and 28565 to 28589 (SEQ ID NO: 64 to 67).
[0058] The probes and primers according to the invention may be labeled directly or indirectly with a radioactive or nonradioactive compound by methods well known to persons skilled in the art so as to obtain a detectable and/or quantifiable signal. Among the radioactive isotopes used, there may be mentioned 32P, 33P, 35S, 3H or 125I. The nonradioactive entities are selected from ligands such as biotin, avidin, streptavidin, digoxigenin, haptens, dyes, luminescent agents such as radioluminescent, chemiluminescent, bioluminescent, fluorescent and phosphorescent agents.
[0059] The invention encompasses the labeled probes and primers derived from the preceding sequences.
[0060] Such probes and primers are useful for the diagnosis of infection by a SARS-associated coronavirus.
[0061] The subject of the present invention is also a method for the detection of a SARS-associated coronavirus, from a biological sample, which method is characterized in that it comprises at least:
(a) the extraction of nucleic acids present in said biological sample, (b) the amplification of a fragment of ORF-N by RT-PCR with the aid of a pair of primers as defined above, and (c) the detection, by any appropriate means, of the amplification products obtained in (b).
[0062] The amplification products (amplicons) in (b) are 268 bp for the pair of primers No. 1 and 328 bp for the pair of primers No. 2.
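The amplicon sizes stated above follow directly from the primer coordinates given for the pairs of primers No. 1 and No. 2. A minimal Python sketch, assuming 1-based inclusive genome coordinates and using the hypothetical helper amplicon_length:

```python
def amplicon_length(sense_5prime, antisense_5prime):
    """Length in bp of the product bounded by the 5' ends of the two primers (inclusive)."""
    return antisense_5prime - sense_5prime + 1


# 5' end of the sense primer and 5' end of the antisense primer, as given in the text.
PRIMER_PAIRS = {
    "pair No. 1": (28507, 28774),
    "pair No. 2": (28375, 28702),
}

for name, (sense, antisense) in PRIMER_PAIRS.items():
    print(f"{name}: {amplicon_length(sense, antisense)} bp")
```

The computed lengths agree with the 268 bp and 328 bp amplicons indicated for the two pairs of primers.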
[0063] According to an advantageous embodiment of said method, the step (c) of detection is carried out with the aid of at least one probe corresponding to positions 28561 to 28586, 28588 to 28608, 28541 to 28563 and 28565 to 28589 of the sequence of the polynucleotide as defined above.
[0064] Preferably, the SARS-associated coronavirus genome is detected and optionally quantified by PCR in real time with the aid of the pair of primers No. 2 and probes corresponding to positions 28541 to 28563 and 28565 to 28589 labeled with different compounds, in particular different fluorescent agents.
[0065] The real time RT-PCR which uses this pair of primers and this probe is very sensitive since it makes it possible to detect 10^2 copies of RNA, and down to 10 copies of RNA; it is in addition reliable and reproducible.
[0066] The invention encompasses the single-stranded, double-stranded and triple-stranded polydeoxyribonucleotides and polyribonucleotides corresponding to the sequence of the genome of the isolated strain of coronavirus and its fragments as defined above, and to their sense or antisense complementary sequences, in particular the RNAs and cDNAs corresponding to the sequence of the genome and of its fragments as defined above.
[0067] The present invention also encompasses the amplification fragments obtained with the aid of primers specific for the genome of the purified or isolated strain as defined above, in particular with the aid of primers or pairs of primers as defined above, the restriction fragments formed by or comprising the sequence of fragments as defined above, the fragments obtained by transcription in vitro from a vector containing the sequence SEQ ID NO: 1 or a fragment as defined above, and fragments obtained by chemical synthesis. Examples of restriction fragments are deduced from the restriction map of the sequence SEQ ID NO: 1 illustrated by FIG. 13. In accordance with the invention, said fragments are either in the form of isolated fragments, or in the form of mixtures of fragments. The invention also encompasses fragments modified, in relation to the preceding ones, by removal or addition of nucleotides in a proportion of about 15%, relative to the length of the above fragments and/or modified in terms of the nature of the nucleotides, as long as the modified nucleotide fragments retain a capacity for hybridization with the genomic or antigenomic RNA sequences of the isolate as defined above.
[0068] The nucleic acid molecules according to the invention are obtained by conventional methods, known per se, following standard protocols such as those described in Current Protocols in Molecular Biology (Frederick M. AUSUBEL, 2000, Wiley and Sons Inc., Library of Congress, USA). For example, they may be obtained by amplification of a nucleic sequence by PCR or RT-PCR or alternatively by total or partial chemical synthesis.
[0069] The subject of the present invention is also a DNA or RNA chip or filter, characterized in that it comprises at least one polynucleotide or one of its fragments as defined above.
[0070] The DNA or RNA chips or filters according to the invention are prepared by conventional methods, known per se, such as for example chemical or electrochemical grafting of oligonucleotides on a glass or nylon support.
[0071] The subject of the present invention is also a recombinant cloning and/or expression vector, in particular a plasmid, a virus, a viral vector or a phage comprising a nucleic acid fragment as defined above. Preferably, said recombinant vector is an expression vector in which said nucleic acid fragment is placed under the control of appropriate elements for regulating transcription and translation. In addition, said vector may comprise sequences (tags) fused in phase with the 5' and/or 3' end of said insert, which are useful for the immobilization and/or detection and/or purification of the protein expressed from said vector.
[0072] These vectors are constructed and introduced into host cells by conventional recombinant DNA and genetic engineering methods which are known per se. Numerous vectors into which a nucleic acid molecule of interest may be inserted in order to introduce it and to maintain it in a host cell are known per se; the choice of an appropriate vector depends on the use envisaged for this vector (for example replication of the sequence of interest, expression of this sequence, maintenance of the sequence in extrachromosomal form or alternatively integration into the chromosomal material of the host), and on the nature of the host cell.
[0073] In accordance with the invention, said plasmid is selected in particular from the following plasmids:
[0074] the plasmid, called SARS-S, contained in the bacterial strain deposited under the No. I-3059, on Jun. 20, 2003, at the Collection Nationale de Cultures de Microorganismes, 25 rue du Docteur Roux, 75724 Paris Cedex 15; it contains the cDNA sequence encoding the S protein of the SARS-CoV strain derived from the sample recorded under the No. 031589, said sequence corresponding to the nucleotides at positions 21406 to 25348 (SEQ ID NO: 4), with reference to the Genbank sequence AY274119.3,
[0075] the plasmid, called SARS-S1, contained in the bacterial strain deposited under the No. I-3020, on May 12, 2003, at the Collection Nationale de Cultures de Microorganismes, 25 rue du Docteur Roux, 75724 Paris Cedex 15; it contains a 5' fragment of the cDNA sequence encoding the S protein of the SARS-CoV strain derived from the sample recorded under the No. 031589, as defined above, said fragment corresponding to the nucleotides at positions 21406 to 23454 (SEQ ID NO: 5), with reference to the Genbank sequence AY274119.3 Tor2,
[0076] the plasmid, called SARS-S2, contained in the bacterial strain deposited under the No. I-3019, on May 12, 2003, at the Collection Nationale de Cultures de Microorganismes, 25 rue du Docteur Roux, 75724 Paris Cedex 15; it contains a 3' fragment of the cDNA sequence encoding the S protein of the SARS-CoV strain derived from the sample recorded under the No. 031589, as defined above, said fragment corresponding to the nucleotides at positions 23322 to 25348 (SEQ ID NO: 6), with reference to the Genbank sequence accession No. AY274119.3,
[0077] the plasmid, called SARS-SE, contained in the bacterial strain deposited under the No. I-3126, on Nov. 13, 2003, at the Collection Nationale de Cultures de Microorganismes, 25 rue du Docteur Roux, 75724 Paris Cedex 15; it contains the cDNA corresponding to the region situated between ORF-S and ORF-E and overlapping ORF-E of the SARS-CoV strain derived from the sample recorded under the No. 031589, as defined above, said region corresponding to the nucleotides at positions 25110 to 26244 (SEQ ID NO: 8), with reference to the Genbank sequence accession No. AY274119.3,
[0078] the plasmid, called SARS-E, contained in the bacterial strain deposited under the No. I-3046, on May 28, 2003, at the Collection Nationale de Cultures de Microorganismes, 25 rue du Docteur Roux, 75724 Paris Cedex 15; it contains the cDNA sequence encoding the E protein of the SARS-CoV strain derived from the sample recorded under the No. 031589, as defined above, said sequence corresponding to the nucleotides at positions 26082 to 26413 (SEQ ID NO: 15), with reference to the Genbank sequence accession No. AY274119.3,
[0079] the plasmid, called SARS-M, contained in the bacterial strain deposited under the No. I-3047, on May 28, 2003, at the Collection Nationale de Cultures de Microorganismes, 25 rue du Docteur Roux, 75724 Paris Cedex 15; it contains the cDNA sequence encoding the M protein of the SARS-CoV strain derived from the sample recorded under the No. 031589, as defined above, said sequence corresponding to the nucleotides at positions 26330 to 27098 (SEQ ID NO: 18), with reference to the Genbank sequence accession No. AY274119.3,
[0080] the plasmid, called SARS-MN, contained in the bacterial strain deposited under the No. I-3125, on Nov. 13, 2003, at the Collection Nationale de Cultures de Microorganismes, 25 rue du Docteur Roux, 75724 Paris Cedex 15; it contains the cDNA sequence corresponding to the region situated between ORF-M and ORF-N of the SARS-CoV strain derived from the sample recorded under the No. 031589 and collected in Hanoi, as defined above, said sequence corresponding to the nucleotides at positions 26977 to 28218 (SEQ ID NO: 20), with reference to the Genbank sequence accession No. AY274119.3,
[0081] the plasmid, called SARS-N, contained in the bacterial strain deposited under the No. I-3048, on Jun. 5, 2003, at the Collection Nationale de Cultures de Microorganismes, 25 rue du Docteur Roux, 75724 Paris Cedex 15; it contains the cDNA encoding the N protein of the SARS-CoV strain derived from the sample recorded under the No. 031589, as defined above, said sequence corresponding to the nucleotides at positions 28054 to 29430 (SEQ ID NO: 38), with reference to the Genbank sequence accession No. AY274119.3; thus, this plasmid comprises an insert of sequence SEQ ID NO: 38 and is contained in a bacterial strain which was deposited under the No. I-3048, on Jun. 5, 2003, at the Collection Nationale de Cultures de Microorganismes, 25 rue du Docteur Roux, 75724 Paris Cedex 15,
[0082] the plasmid, called SARS-5'NC, contained in the bacterial strain deposited under the No. I-3124, on Nov. 7, 2003, at the Collection Nationale de Cultures de Microorganismes, 25 rue du Docteur Roux, 75724 Paris Cedex 15; it contains the cDNA corresponding to the noncoding 5' end of the genome of the SARS-CoV strain derived from the sample recorded under the No. 031589, as defined above, said sequence corresponding to the nucleotides at positions 1 to 204 (SEQ ID NO: 39), with reference to the Genbank sequence accession No. AY274119.3,
[0083] the plasmid, called SARS-3'NC, contained in the bacterial strain deposited under the No. I-3123, on Nov. 7, 2003, at the Collection Nationale de Cultures de Microorganismes, 25 rue du Docteur Roux, 75724 Paris Cedex 15; it contains the cDNA sequence corresponding to the noncoding 3' end of the genome of the SARS-CoV strain derived from the sample recorded under the No. 031589, as defined above, said sequence corresponding to the nucleotides at positions 28933 to 29727 (SEQ ID NO: 40), with reference to the Genbank sequence accession No. AY274119.3, and ending with a series of A nucleotides,
[0084] the expression plasmid, called pIV2.3N, containing a cDNA fragment encoding a C-terminal fusion of the N protein (SEQ ID NO: 37) with a polyhistidine tag,
[0085] the expression plasmid, called pIV2.3SC, containing a cDNA fragment encoding a C-terminal fusion of the fragment corresponding to positions 475 to 1193 of the amino acid sequence of the S protein (SEQ ID NO: 3) with a polyhistidine tag,
[0086] the expression plasmid, called pIV2.3SL, containing a cDNA fragment encoding a C-terminal fusion of the fragment corresponding to positions 14 to 1193 of the amino acid sequence of the S protein (SEQ ID NO: 3) with a polyhistidine tag,
[0087] the expression plasmid, called pIV2.4N, containing a cDNA fragment encoding an N-terminal fusion of the N protein (SEQ ID NO: 37) with a polyhistidine tag,
[0088] the expression plasmid, called pIV2.4SC or pIV2.4S1, containing an insert encoding an N-terminal fusion of the fragment corresponding to positions 475 to 1193 of the amino acid sequence of the S protein (SEQ ID NO: 3) with a polyhistidine tag, and
[0089] the expression plasmid, called pIV2.4SL, containing a cDNA fragment encoding an N-terminal fusion of the fragment corresponding to positions 14 to 1193 of the amino acid sequence of the S protein (SEQ ID NO: 3) with a polyhistidine tag.
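The nucleotide coordinates quoted for the deposited plasmids can be cross-checked with simple arithmetic. The following minimal Python sketch is an illustration only and not part of the described subject matter. It tabulates the inclusive positions given above (numbering of Genbank AY274119.3), derives the length of each cloned region and computes the overlap between the SARS-S1 and SARS-S2 fragments. The names INSERTS, insert_length and overlap are hypothetical helpers introduced for this example.

```python
# Inclusive nucleotide coordinates as quoted in the text
# (numbering relative to Genbank AY274119.3).
INSERTS = {
    "SARS-S": (21406, 25348),
    "SARS-S1": (21406, 23454),
    "SARS-S2": (23322, 25348),
    "SARS-SE": (25110, 26244),
    "SARS-E": (26082, 26413),
    "SARS-M": (26330, 27098),
    "SARS-MN": (26977, 28218),
    "SARS-N": (28054, 29430),
    "SARS-5'NC": (1, 204),
    "SARS-3'NC": (28933, 29727),
}


def insert_length(name):
    """Length in nucleotides of a cloned region (both ends inclusive)."""
    start, end = INSERTS[name]
    return end - start + 1


def overlap(a, b):
    """Number of nucleotides shared by two cloned regions (0 if disjoint)."""
    start = max(INSERTS[a][0], INSERTS[b][0])
    end = min(INSERTS[a][1], INSERTS[b][1])
    return max(0, end - start + 1)


if __name__ == "__main__":
    for name in INSERTS:
        print(f"{name}: {insert_length(name)} nt")
    print("S1/S2 overlap:", overlap("SARS-S1", "SARS-S2"), "nt")
```

With these coordinates, the SARS-S1 and SARS-S2 fragments share a stretch of nucleotides and together span exactly the region cloned in SARS-S.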
[0090] According to an advantageous feature of the expression plasmid as defined above, it is contained in a bacterial strain which was deposited under the No. I-3117, on Oct. 23, 2003, at the Collection Nationale de Cultures de Microorganismes, 25 rue du Docteur Roux, 75724 Paris Cedex 15.
[0091] According to another advantageous feature of the expression plasmid as defined above, it is contained in a bacterial strain which was deposited under the No. I-3118, on Oct. 23, 2003, at the Collection Nationale de Cultures de Microorganismes, 25 rue du Docteur Roux, 75724 Paris Cedex 15.
[0092] According to another feature of the expression plasmid as defined above, it is contained in a bacterial strain which was deposited at the CNCM, 25 rue du Docteur Roux, 75724 Paris Cedex 15 under the following numbers:
[0093] a) strain No. I-3118, deposited on Oct. 23, 2003,
[0094] b) strain No. I-3019, deposited on May 12, 2003,
[0095] c) strain No. I-3020, deposited on May 12, 2003,
[0096] d) strain No. I-3059, deposited on Jun. 20, 2003,
[0097] e) strain No. I-3323, deposited on Nov. 22, 2004,
[0098] f) strain No. I-3324, deposited on Nov. 22, 2004,
[0099] g) strain No. I-3326, deposited on Dec. 1, 2004,
[0100] h) strain No. I-3327, deposited on Dec. 1, 2004,
[0101] i) strain No. I-3332, deposited on Dec. 1, 2004,
[0102] j) strain No. I-3333, deposited on Dec. 1, 2004,
[0103] k) strain No. I-3334, deposited on Dec. 1, 2004,
[0104] l) strain No. I-3335, deposited on Dec. 1, 2004,
[0105] m) strain No. I-3336, deposited on Dec. 1, 2004,
[0106] n) strain No. I-3337, deposited on Dec. 1, 2004,
[0107] o) strain No. I-3338, deposited on Dec. 2, 2004,
[0108] p) strain No. I-3339, deposited on Dec. 2, 2004,
[0109] q) strain No. I-3340, deposited on Dec. 2, 2004,
[0110] r) strain No. I-3341, deposited on Dec. 2, 2004.
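Several later paragraphs refer back to this list by letter ranges such as "e) to r)" or "k) to n)". As a reading aid only, the minimal Python sketch below maps each letter to its CNCM deposit number as listed above and expands a letter range into the corresponding numbers. The names DEPOSITS and strains_for are hypothetical and introduced purely for illustration.

```python
import string

# CNCM deposit numbers in the order of items a) to r) of the list above.
NUMBERS = [
    "I-3118", "I-3019", "I-3020", "I-3059", "I-3323", "I-3324",
    "I-3326", "I-3327", "I-3332", "I-3333", "I-3334", "I-3335",
    "I-3336", "I-3337", "I-3338", "I-3339", "I-3340", "I-3341",
]

# zip() stops at the shorter sequence, so only letters a..r are mapped.
DEPOSITS = dict(zip(string.ascii_lowercase, NUMBERS))


def strains_for(first, last):
    """Return the deposit numbers for list items first) to last), inclusive."""
    letters = string.ascii_lowercase
    i, j = letters.index(first), letters.index(last)
    return [DEPOSITS[c] for c in letters[i:j + 1]]


if __name__ == "__main__":
    print("k) to n):", strains_for("k", "n"))
    print("e) to r):", strains_for("e", "r"))
```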
[0111] The subject of the present invention is also a nucleic acid insert of viral origin, characterized in that it is contained in any of the strains as defined above in a)-r).
[0112] The subject of the present invention is also a nucleic acid containing a synthetic gene allowing optimized expression of the S protein in eukaryotic cells, characterized in that it possesses the sequence SEQ ID NO: 140.
[0113] The subject of the present invention is also an expression vector containing a nucleic acid containing a synthetic gene allowing optimized expression of the S protein, which vector is contained in the bacterial strain deposited at the CNCM, on Dec. 1, 2004, under the No. I-3333.
[0114] According to one embodiment of said expression vector, it is a viral vector, in the form of a viral particle or in the form of a recombinant genome.
[0115] According to an advantageous feature of this embodiment, this is a recombinant viral particle or a recombinant viral genome capable of being obtained by transfection of a plasmid according to paragraphs g), h) and k) to r) as defined above, in an appropriate cellular system, that is to say, for example, cells transfected with one or more other plasmids intended to transcomplement certain functions of the virus that are deleted in the vector and that are necessary for the formation of the viral particles.
[0116] The expression "S protein family" is understood here to mean the complete S protein, its ectodomain and fragments of this ectodomain which are preferably produced in a eukaryotic system.
[0117] The subject of the present invention is also a lentiviral vector encoding a polypeptide of the S protein family, as defined above.
[0118] The subject of the present invention is also a recombinant measles virus encoding a polypeptide of the S protein family, as defined above.
[0119] The subject of the present invention is also a recombinant vaccinia virus encoding a polypeptide of the S protein family, as defined above.
[0120] The subject of the present invention is also the use of a vector according to paragraphs e) to r) as defined above, or of a vector containing a synthetic gene for the S protein, as defined above, for the production, in a eukaryotic system, of the SARS-associated coronavirus S protein or of a fragment of this protein.
[0121] The subject of the present invention is also a method for producing the S protein in a eukaryotic system, comprising a step of transfecting eukaryotic cells in culture with a vector chosen from the vectors contained in the bacterial strains mentioned in paragraphs e) to r) above or a vector containing a synthetic gene allowing optimized expression of the S protein.
[0122] The subject of the present invention is also a cDNA library characterized in that it comprises fragments as defined above, in particular amplification fragments or restriction fragments, cloned into a recombinant vector, in particular an expression vector (expression library).
[0123] The subject of the present invention is also cells, in particular prokaryotic cells, modified by a recombinant vector as defined above.
[0124] The subject of the present invention is also a genetically modified eukaryotic cell expressing a protein or a polypeptide as defined above. Quite obviously, the term "genetically modified eukaryotic cell" does not denote a cell modified with a wild-type virus.
[0125] According to an advantageous embodiment of said cell, it is capable of being obtained by transfection with any of the vectors mentioned in paragraphs k) to n) above.
[0126] According to an advantageous feature of this embodiment, this is the cell FRhK4-Ssol-30, deposited at the CNCM on Nov. 22, 2004, under the No. I-3325.
[0127] The recombinant vectors as defined above and the cells transformed with said expression vectors are advantageously used for the production of the corresponding proteins and peptides. The expression libraries derived from said vectors, and the cells transformed with said expression libraries are advantageously used to identify the immunogenic epitopes (B and T epitopes) of the SARS-associated coronavirus proteins.
[0128] The subject of the present invention is also the purified or isolated proteins and peptides, characterized in that they are encoded by the polynucleotide or one of its fragments as defined above.
[0129] According to an advantageous embodiment of the invention, said protein is selected from the group consisting of:
[0130] the S protein having the sequence SEQ ID NO: 3 or its ectodomain,
[0131] the E protein having the sequence SEQ ID NO: 14,
[0132] the M protein having the sequence SEQ ID NO: 17,
[0133] the N protein having the sequence SEQ ID NO: 37, and
[0134] the proteins encoded by the ORFs ORF1a, ORF1b, ORF3, ORF4, ORF7 to ORF11, ORF13 and ORF14, having the respective sequences SEQ ID NO: 74, 75, 10, 12, 22, 24, 26, 28, 30, 33 and 35.
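The pairing between ORFs and sequence identifiers in the last item is positional ("respective"). The short Python sketch below makes that pairing explicit, on the assumption that the identifiers are listed in the same order as the ORFs. The names ORFS, SEQ_IDS and ORF_TO_SEQ_ID are hypothetical and used only for this illustration.

```python
# ORFs in the order given in the text; "ORF7 to ORF11" is expanded explicitly.
ORFS = (
    ["ORF1a", "ORF1b", "ORF3", "ORF4"]
    + [f"ORF{n}" for n in range(7, 12)]
    + ["ORF13", "ORF14"]
)

# Sequence identifiers in the order given in the text.
SEQ_IDS = [74, 75, 10, 12, 22, 24, 26, 28, 30, 33, 35]

# Assumption: "respective" means the two lists pair up position by position.
if len(ORFS) != len(SEQ_IDS):
    raise ValueError("ORF list and SEQ ID list differ in length")
ORF_TO_SEQ_ID = dict(zip(ORFS, SEQ_IDS))

# Structural proteins and their identifiers, as stated in the same list.
STRUCTURAL = {"S": 3, "E": 14, "M": 17, "N": 37}

if __name__ == "__main__":
    for orf, seq_id in ORF_TO_SEQ_ID.items():
        print(f"{orf}: SEQ ID NO: {seq_id}")
```

The two lists have the same number of entries, which is consistent with a one-to-one reading.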
[0135] The terms "ectodomain of the S protein" and "soluble form of the S protein" will be used interchangeably below.
[0136] According to an advantageous embodiment of the invention, said polypeptide consists of the amino acids corresponding to positions 1 to 1193 of the amino acid sequence of the S protein.
[0137] According to another advantageous embodiment of the invention, said peptide is selected from the group consisting of:
a) the peptides corresponding to positions 14 to 1193 and 475 to 1193 of the amino acid sequence of the S protein,
b) the peptides corresponding to positions 2 to 14 (SEQ ID NO: 69) and 100 to 221 of the amino acid sequence of the M protein; these peptides correspond respectively to the ectodomain and to the endodomain of the M protein,
c) the peptides corresponding to positions 1 to 12 (SEQ ID NO: 70) and 53 to 76 (SEQ ID NO: 71) of the amino acid sequence of the E protein; these peptides correspond respectively to the ectodomain and to the C-terminal end of the E protein, and
d) the peptides of 5 to 50 consecutive amino acids, preferably of 10 to 30 amino acids, included in, or partially or completely overlapping, the sequence of the peptides as defined in a), b) or c).
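As an illustration of the position ranges above, the following Python sketch computes the length of each named peptide and enumerates the windows of a given size that lie entirely within a range. It is a minimal sketch under stated assumptions: it works on residue positions only (no sequence data are reproduced here), it covers only the "included" case of item d) and not peptides that extend beyond a range, and the names PEPTIDES, peptide_length and windows are hypothetical.

```python
# (protein, first residue, last residue), positions inclusive, as quoted in a) to c).
PEPTIDES = [
    ("S", 14, 1193),
    ("S", 475, 1193),
    ("M", 2, 14),
    ("M", 100, 221),
    ("E", 1, 12),
    ("E", 53, 76),
]


def peptide_length(first, last):
    """Number of residues in a peptide spanning first..last inclusive."""
    return last - first + 1


def windows(first, last, size):
    """Yield (start, end) for every run of `size` consecutive residues
    lying entirely within first..last."""
    for start in range(first, last - size + 2):
        yield (start, start + size - 1)


if __name__ == "__main__":
    for protein, first, last in PEPTIDES:
        print(f"{protein} {first}-{last}: {peptide_length(first, last)} residues")
    # 10-residue peptides contained in the M protein range 2 to 14
    print(list(windows(2, 14, 10)))
```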
[0138] The subject of the present invention is also a peptide, characterized in that it has a sequence of 7 to 50 amino acids including an amino acid residue selected from the group consisting of:
[0139] the alanine situated at position 2552 of the amino acid sequence of the protein encoded by ORF1a,
[0140] the serine situated at position 577 of the amino acid sequence of the S protein of the SARS-CoV strain as defined above,
[0141] the glycine at position 11 of the amino acid sequence of the protein encoded by ORF3 of the SARS-CoV strain as defined above, and
[0142] the serine at position 154 of the amino acid sequence of the M protein of the SARS-CoV strain as defined above.
[0143] The subject of the present invention is also a polyclonal or monoclonal antibody or antibody fragment which can be obtained by immunization of an animal with a recombinant vector as defined above, a cDNA library as defined above or alternatively a protein or a peptide as defined above, characterized in that it binds to at least one of the proteins encoded by SARS-CoV as defined above.
[0144] The invention encompasses the polyclonal antibodies, the monoclonal antibodies, the chimeric antibodies such as the humanized antibodies, and fragments thereof (Fab, Fv, scFv).
[0145] A subject of the present invention is also a hybridoma producing a monoclonal antibody against the N protein, characterized in that it is chosen from the following hybridomas:
[0146] the hybridoma producing the monoclonal antibody 87, deposited at the CNCM on Dec. 1, 2004 under the number I-3328,
[0147] the hybridoma producing the monoclonal antibody 86, deposited at the CNCM on Dec. 1, 2004 under the number I-3329,
[0148] the hybridoma producing the monoclonal antibody 57, deposited at the CNCM on Dec. 1, 2004 under the number I-3330, and
[0149] the hybridoma producing the monoclonal antibody 156, deposited at the CNCM on Dec. 1, 2004 under the number I-3331.
[0150] The subject of the present invention is also a polyclonal or monoclonal antibody or antibody fragment directed against the N protein, characterized in that it is produced by a hybridoma as defined above.
[0151] For the purposes of the present invention, the expression chimeric antibody is understood to mean, in relation to an antibody of a particular animal species or of a particular class of antibody, an antibody comprising all or part of a heavy chain and/or of a light chain of an antibody of another animal species or of another class of antibody.
[0152] For the purposes of the present invention, the expression humanized antibody is understood to mean a human immunoglobulin in which the residues of the CDRs (Complementarity Determining Regions) which form the antigen-binding site are replaced by those of a nonhuman monoclonal antibody possessing the desired specificity, affinity or activity. Compared with the nonhuman antibodies, the humanized antibodies are less immunogenic and possess a prolonged half-life in humans because they possess only a small proportion of nonhuman sequences, given that practically all the residues of the FR (Framework) regions and of the constant (Fc) region of these antibodies are those of a consensus sequence of human immunoglobulins.
[0153] A subject of the present invention is also a protein chip or filter, characterized in that it comprises a protein, a peptide or alternatively an antibody as defined above.
[0154] The protein chips according to the invention are prepared by conventional methods known per se. Among the appropriate supports on which proteins may be immobilized, there may be mentioned those made of plastic or glass, in particular in the form of microplates.
[0155] The subject of the present invention is also reagents derived from the isolated strain of SARS-associated coronavirus, derived from the sample recorded under the No. 031589, which are useful for the study and diagnosis of the infection caused by a SARS-associated coronavirus, said reagents being selected from the group consisting of:
[0156] (a) a pair of primers, a probe or a DNA chip as defined above,
[0157] (b) a recombinant vector or a modified cell as defined above,
[0158] (c) an isolated coronavirus strain or a polynucleotide as defined above,
[0159] (d) a protein or a peptide as defined above,
[0160] (e) an antibody or an antibody fragment as defined above, and
[0161] (f) a protein chip as defined above.
[0162] These various reagents are prepared and used according to conventional molecular biology and immunology techniques following standard protocols such as those described in Current Protocols in Molecular Biology (Frederick M. Ausubel, 2000, Wiley and Sons Inc., Library of Congress, USA), in Current Protocols in Immunology (John E. Coligan, 2000, Wiley and Sons Inc., Library of Congress, USA) and in Antibodies: A Laboratory Manual (E. Harlow and D. Lane, Cold Spring Harbor Laboratory, 1988).
[0163] The nucleic acid fragments according to the invention are prepared and used according to conventional techniques as defined above. The peptides and proteins according to the invention are prepared by recombinant DNA techniques, known to persons skilled in the art, in particular with the aid of the recombinant vectors as defined above. Alternatively, the peptides according to the invention may be prepared by conventional techniques of solid or liquid phase synthesis, known to persons skilled in the art.
[0164] The polyclonal antibodies are prepared by immunizing an appropriate animal with a protein or a peptide as defined above, optionally coupled to KLH or to albumin and/or combined with an appropriate adjuvant such as (complete or incomplete) Freund's adjuvant or aluminum hydroxide; after obtaining a satisfactory antibody titer, the antibodies are harvested by collecting serum from the immunized animals and enriched with IgG by precipitation, according to conventional techniques, and then the IgGs specific for the SARS-CoV proteins are optionally purified by affinity chromatography on an appropriate column to which said peptide or said protein is attached, as defined above, so as to obtain a monospecific IgG preparation.
[0165] The monoclonal antibodies are produced from hybridomas obtained by fusion of B lymphocytes from an animal immunized with a protein or a peptide as defined above with myelomas, according to the Kohler and Milstein technique (Nature, 1975, 256, 495-497); the hybridomas are cultured in vitro, in particular in fermenters, or produced in vivo, in the form of ascites; alternatively, said monoclonal antibodies are produced by genetic engineering as described in U.S. Pat. No. 4,816,567.
[0166] The humanized antibodies are produced by general methods such as those described in International application WO 98/45332.
[0167] The antibody fragments are produced from the cloned VH and VL regions, from the mRNAs of hybridomas or splenic lymphocytes of an immunized mouse; for example, the Fv, scFv or Fab fragments are expressed at the surface of filamentous phages according to the Winter and Milstein technique (Nature, 1991, 349, 293-299); after several selection steps, the antibody fragments specific for the antigen are isolated and expressed in an appropriate expression system, by conventional techniques for cloning and expression of recombinant DNA.
[0168] The antibodies or fragments thereof as defined above are purified by conventional techniques known to persons skilled in the art, such as affinity chromatography.
[0169] The subject of the present invention is additionally the use of a product selected from the group consisting of: a pair of primers, a probe, a DNA chip, a recombinant vector, a modified cell, an isolated coronavirus strain, a polynucleotide, a protein or a peptide, an antibody or an antibody fragment and a protein chip as defined above, for the preparation of a reagent for the detection and optionally genotyping/serotyping of a SARS-associated coronavirus.
[0170] The proteins and peptides according to the invention, which are capable of being recognized by and/or of inducing the production of antibodies specific for the SARS-associated coronavirus, are useful for the diagnosis of infection with such a coronavirus; the infection is detected by an appropriate technique, in particular EIA, ELISA, RIA or immunofluorescence, in a biological sample collected from an individual who may be infected.
[0171] According to an advantageous feature of said use, said proteins are selected from the group consisting of the S, E, M and/or N proteins and the peptides as defined above.
[0172] The S, E, M and/or N proteins and the peptides derived from these proteins as defined above, for example the N protein, are used for the indirect diagnosis of a SARS-associated coronavirus infection (serological diagnosis; detection of an antibody specific for SARS-CoV), in particular by an immunoenzymatic method (ELISA).
[0173] The antibodies and antibody fragments according to the invention, in particular those directed against the S, E, M and/or N proteins and the derived peptides as defined above, are useful for the direct diagnosis of a SARS-associated coronavirus infection; the detection of the protein(s) of SARS-CoV is carried out by an appropriate technique, in particular EIA, ELISA, RIA or immunofluorescence, in a biological sample collected from an individual who may be infected.
[0174] The subject of the present invention is also a method for the detection of a SARS-associated coronavirus, from a biological sample, which method is characterized in that it comprises at least:
[0175] (a) bringing said biological sample into contact with at least one antibody or one antibody fragment, one protein, one peptide or alternatively one protein or peptide chip or filter as defined above, and
[0176] (b) visualizing by any appropriate means antigen-antibody complexes formed in (a), for example by EIA, ELISA, RIA, or by immunofluorescence.
[0177] According to one advantageous embodiment of said method, step (a) comprises:
[0178] (a1) bringing said biological sample into contact with at least a first antibody or an antibody fragment which is attached to an appropriate support, in particular a microplate,
[0179] (a2) washing the solid phase, and
[0180] (a3) adding at least a second antibody or an antibody fragment, different from the first, said antibody or antibody fragment being optionally appropriately labeled.
[0181] This method, which makes it possible to capture the viral particles present in the biological sample, is also called immunocapture method.
[0182] For example:
[0183] step (a1) is carried out with at least a first monoclonal or polyclonal antibody or a fragment thereof, directed against the S, M and/or E protein, and/or a peptide corresponding to the ectodomain of one of these proteins (M2-14 or E1-12 peptides),
[0184] step (a3) is carried out with at least one antibody or an antibody fragment directed against another epitope of the same protein or preferably against another protein, preferably against an inner protein such as the N nucleoprotein or the endodomain of the E or M protein; more preferably still, these are antibodies or antibody fragments directed against the N protein, which is very abundant in the viral particle; when an antibody or an antibody fragment directed against an inner protein (N) or against the endodomain of the E or M proteins is used, said antibody is incubated in the presence of detergent, such as Tween 20 for example, at concentrations of the order of 0.1%,
[0185] step (b) for visualizing the antigen-antibody complexes formed is carried out either directly, with the aid of a second antibody labeled for example with biotin or an appropriate enzyme such as peroxidase or alkaline phosphatase, or indirectly, with the aid of an anti-immunoglobulin serum labeled as above. The complexes thus formed are visualized with the aid of an appropriate substrate.
[0186] According to a preferred embodiment of this aspect of the invention, the biological sample is mixed with the visualizing monoclonal antibody prior to its being brought into contact with the capture monoclonal antibodies. Where appropriate, the serum-visualizing antibody mixture is incubated for at least 10 minutes at room temperature before being applied to the plate.
[0187] The subject of the present invention is also an immunocapture test intended to detect an infection by the SARS-associated coronavirus by detecting the native nucleoprotein (N protein), in particular characterized in that the antibody used for the capture of the native viral nucleoprotein is a monoclonal antibody specific for the central region and/or for a conformational epitope.
[0188] According to one embodiment of said test, the antibody used for the capture of the N protein is the monoclonal antibody mAb87, produced by the hybridoma deposited at the CNCM on Dec. 1, 2004 under the number I-3328.
[0189] According to another embodiment of said immunocapture test, the antibody used for the capture of the N protein is the monoclonal antibody mAb86, produced by the hybridoma deposited at the CNCM on Dec. 1, 2004 under the number I-3329.
[0190] According to another embodiment of said immunocapture test, the monoclonal antibodies mAb86 and mAb87 are used for the capture of the N protein.
[0191] In the immunocapture tests according to the invention, it is possible to use, for visualizing the N protein, the monoclonal antibody mAb57, produced by the hybridoma deposited at the CNCM on Dec. 1, 2004 under the number I-3330, said antibody being conjugated with a visualizing molecule or particle.
[0192] In accordance with said immunocapture test, a combination of the antibodies mAb57 and mAb87, conjugated with a visualizing molecule or particle, is used for the visualization of the N protein.
[0193] A visualizing molecule may be a radioactive atom, a dye, a fluorescent molecule, a fluorophore or an enzyme; a visualizing particle may be, for example, colloidal gold, a magnetic particle or a latex bead.
[0194] The subject of the present invention is also a reagent for detecting a SARS-associated coronavirus, characterized in that it is selected from the group consisting of:
[0195] (a) a pair of primers or a probe as defined above,
[0196] (b) a recombinant vector as defined above or a modified cell as defined above,
[0197] (c) an isolated coronavirus strain as defined above or a polynucleotide as defined above,
[0198] (d) an antibody or an antibody fragment as defined above,
[0199] (e) a combination of antibodies comprising the monoclonal antibodies mAb86 and/or mAb87, and the monoclonal antibody mAb57, as defined above, and
[0200] (f) a chip or a filter as defined above.
[0201] The subject of the present invention is also a method for the detection of a SARS-associated coronavirus infection, from a biological sample, by indirect IgG ELISA using the N protein, which method is characterized in that the plates are sensitized with an N protein solution at a concentration of between 0.5 and 4 μg/ml, preferably 2 μg/ml, in a 10 mM PBS buffer, pH 7.2, containing phenol red at 0.25 ml/l.
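The coating concentration stated above lends itself to a worked dilution using the usual relation C1 x V1 = C2 x V2. The Python sketch below is a hedged illustration only: the stock concentration (1000 μg/ml) and the final volume (10 ml) are assumptions chosen for the example and do not come from the text, and the function names coating_dilution and in_stated_range are hypothetical.

```python
# Concentration range stated in the text for sensitizing the plates (in ug/ml).
MIN_UG_PER_ML = 0.5
MAX_UG_PER_ML = 4.0


def in_stated_range(conc_ug_per_ml):
    """True if the concentration lies within the 0.5 to 4 ug/ml range."""
    return MIN_UG_PER_ML <= conc_ug_per_ml <= MAX_UG_PER_ML


def coating_dilution(stock_ug_per_ml, target_ug_per_ml, final_volume_ml):
    """Return (stock volume, buffer volume) in ml, from C1*V1 = C2*V2."""
    if not 0 < target_ug_per_ml <= stock_ug_per_ml:
        raise ValueError("target must be positive and not exceed the stock")
    stock_volume = target_ug_per_ml * final_volume_ml / stock_ug_per_ml
    return stock_volume, final_volume_ml - stock_volume


if __name__ == "__main__":
    # Assumed values for illustration: 1000 ug/ml stock, 10 ml of coating solution.
    stock_ml, buffer_ml = coating_dilution(1000.0, 2.0, 10.0)
    print(f"stock: {stock_ml * 1000:.0f} ul, buffer: {buffer_ml:.2f} ml")
```

Under these assumed values, 20 μl of stock made up to 10 ml with buffer gives the preferred 2 μg/ml.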
[0202] The subject of the present invention is additionally a method for the detection of a SARS-associated coronavirus infection, from a biological sample, by double epitope ELISA, characterized in that the serum to be tested is mixed with the visualizing antigen, said mixture then being brought into contact with the antigen attached to a solid support.
[0203] According to one variant of the tests for detecting SARS-associated coronaviruses, these tests combine an ELISA using the N protein, and another ELISA using the S protein, as described below.
[0204] The subject of the present invention is also an immune complex formed of a polyclonal or monoclonal antibody or antibody fragment as defined above, and of a SARS-associated coronavirus protein or peptide.
[0205] The subject of the present invention is additionally a SARS-associated coronavirus detection kit, characterized in that it comprises at least one reagent selected from the group consisting of: a pair of primers, a probe, a DNA or RNA chip, a recombinant vector, a modified cell, an isolated coronavirus strain, a polynucleotide, a protein or a peptide, an antibody, and a protein chip as defined above.
[0206] The subject of the present invention is additionally an immunogenic composition, characterized in that it comprises at least one product selected from the group consisting of:
[0207] a) a protein or a peptide as defined above,
[0208] b) a polynucleotide of the DNA or RNA type or one of its representative fragments as defined above, having a sequence chosen from:
[0209] (i) the sequence SEQ ID NO: 1 or its RNA equivalent,
[0210] (ii) the sequence hybridizing under high stringency conditions with the sequence SEQ ID NO: 1,
[0211] (iii) the sequence complementary to the sequence SEQ ID NO: 1 or to the sequence hybridizing under high stringency conditions with the sequence SEQ ID NO: 1,
[0212] (iv) the nucleotide sequence of a representative fragment of the polynucleotide as defined in (i), (ii) or (iii), and
[0213] (v) the sequence as defined in (i), (ii), (iii) or (iv), modified,
[0214] c) a recombinant expression vector comprising a polynucleotide as defined in b), and
[0215] d) a cDNA library as defined above,
said immunogenic composition being capable of inducing protective humoral or cellular immunity specific for the SARS-associated coronavirus, in particular the production of an antibody directed against a specific epitope of the SARS-associated coronavirus.
[0216] The proteins and peptides as defined above, in particular the S, M, E and/or N proteins and the derived peptides, and the nucleic acid (DNA or RNA) molecules encoding said proteins or said peptides are good candidate vaccines and may be used in immunogenic compositions for the production of a vaccine against the SARS-associated coronavirus.
[0217] According to an advantageous embodiment of the compositions according to the invention, they additionally contain at least one pharmaceutically acceptable vehicle and optionally carrier substances and/or adjuvants.
[0218] The pharmaceutically acceptable vehicles, the carrier substances and the adjuvants are those conventionally used.
[0219] The adjuvants are advantageously chosen from the group consisting of oily emulsions, saponin, mineral substances, bacterial extracts, aluminum hydroxide and squalene.
[0220] The carrier substances are advantageously selected from the group consisting of unilamellar liposomes, multilamellar liposomes, micelles of saponin or solid microspheres of a saccharide or auriferous nature.
[0221] The compositions according to the invention are administered by the systemic route, in particular by the intramuscular or subcutaneous route, or alternatively by the local route, in particular the nasal (aerosol) route.
[0222] The subject of the present invention is also the use of an isolated or purified protein or peptide having a sequence selected from the group consisting of the sequences SEQ ID NO: 3, 10, 12, 14, 17, 22, 24, 26, 28, 30, 33, 35, 37, 69, 70, 71, 74 and 75 to form an immune complex with an antibody specifically directed against an epitope of the SARS-associated coronavirus.
[0223] The subject of the present invention is also an immune complex consisting of an isolated or purified protein or peptide having a sequence selected from the group consisting of the sequences SEQ ID NO: 3, 10, 12, 14, 17, 22, 24, 26, 28, 30, 33, 35, 37, 69, 70, 71, 74 and 75, and of an antibody specifically directed against an epitope of the SARS-associated coronavirus.
[0224] The subject of the present invention is also the use of an isolated or purified protein or peptide having a sequence selected from the group consisting of the sequences SEQ ID NO: 3, 10, 12, 14, 17, 22, 24, 26, 28, 30, 33, 35, 37, 69, 70, 71, 74 and 75 to induce the production of an antibody capable of specifically recognizing an epitope of the SARS-associated coronavirus.
[0225] The subject of the present invention is also the use of an isolated or purified polynucleotide having a sequence selected from the group consisting of the sequences SEQ ID NO: 1, 2, 4, 7, 8, 13, 15, 16, 18, 19, 20, 31, 36 and 38 to induce the production of an antibody directed against the protein encoded by said polynucleotide and capable of specifically recognizing an epitope of the SARS-associated coronavirus.
[0226] The subject of the present invention is also monoclonal antibodies recognizing the native S protein of a SARS-associated coronavirus.
[0227] The subject of the present invention is also the use of a protein or a polypeptide of the S protein family, as defined above, or of an antibody recognizing the native S protein, as defined above, to detect an infection by a SARS-associated coronavirus, in a biological sample.
[0228] The subject of the present invention is also a method for detecting an infection by a SARS-associated coronavirus, in a biological sample, characterized in that the detection is carried out by ELISA using the recombinant S protein, expressed in a eukaryotic system.
[0229] According to an advantageous embodiment of said method, it is a double epitope ELISA method, and the serum to be tested is mixed with the visualizing antigen, said mixture then being brought into contact with the antigen attached to a solid support.
[0230] The subject of the present invention is also an immune complex consisting of a monoclonal antibody or antibody fragment recognizing the native S protein, and of a protein or a peptide of the SARS-associated coronavirus.
[0231] The subject of the present invention is also an immune complex consisting of a protein or a polypeptide of the S protein family, as defined above, and of an antibody specifically directed against an epitope of the SARS-associated coronavirus.
[0232] The subject of the present invention is additionally a SARS-associated coronavirus detection kit or box, characterized in that it comprises at least one reagent selected from the group consisting of: a protein or polypeptide of the S protein family, as defined above, a nucleic acid encoding a protein or peptide of the S protein family, as defined above, a cell expressing a protein or polypeptide of the S protein family, as defined above, or an antibody recognizing the native S protein of a SARS-associated coronavirus.
[0233] The subject of the present invention is an immunogenic and/or vaccine composition, characterized in that it comprises a polypeptide or a recombinant protein of the S protein family, as defined above, obtained in a eukaryotic expression system.
[0234] The subject of the present invention is also an immunogenic and/or vaccine composition, characterized in that it comprises a vector or recombinant virus, expressing a protein or a polypeptide of the S protein family, as defined above.
[0235] In addition to the preceding features, the invention further comprises other features, which will emerge from the description which follows, which refers to examples of use of the polynucleotide representing the genome of the SARS-CoV strain derived from the sample recorded under the number 031589, and derived cDNA fragments which are the subject of the present invention, and to Table I presenting the sequence listing:
TABLE I: Sequence listing
Identification number | Sequence | Position of the cDNA with reference to Genbank AY274119.3 | Deposit number at the CNCM of the corresponding plasmid
SEQ ID NO: 1 | genome of the strain derived from the sample 031589 | -- | --
SEQ ID NO: 2 | ORF-S* | 21406-25348 | --
SEQ ID NO: 3 | S protein | -- | --
SEQ ID NO: 4 | ORF-S** | 21406-25348 | I-3059
SEQ ID NO: 5 | Sa fragment | 21406-23454 | I-3020
SEQ ID NO: 6 | Sb fragment | 23322-25348 | I-3019
SEQ ID NO: 7 | ORF-3 + ORF-4* | 25110-26244 | --
SEQ ID NO: 8 | ORF-3 + ORF-4** | 25110-26244 | I-3126
SEQ ID NO: 9 | ORF3 | -- | --
SEQ ID NO: 10 | ORF-3 protein | -- | --
SEQ ID NO: 11 | ORF4 | -- | --
SEQ ID NO: 12 | ORF-4 protein | -- | --
SEQ ID NO: 13 | ORF-E* | 26082-26413 | --
SEQ ID NO: 14 | E protein | -- | --
SEQ ID NO: 15 | ORF-E** | 26082-26413 | I-3046
SEQ ID NO: 16 | ORF-M* | 26330-27098 | --
SEQ ID NO: 17 | M protein | -- | --
SEQ ID NO: 18 | ORF-M** | 26330-27098 | I-3047
SEQ ID NO: 19 | ORF7 to 11* | 26977-28218 | --
SEQ ID NO: 20 | ORF7 to 11** | 26977-28218 | I-3125
SEQ ID NO: 21 | ORF7 | -- | --
SEQ ID NO: 22 | ORF7 protein | -- | --
SEQ ID NO: 23 | ORF8 | -- | --
SEQ ID NO: 24 | ORF8 protein | -- | --
SEQ ID NO: 25 | ORF9 | -- | --
SEQ ID NO: 26 | ORF9 protein | -- | --
SEQ ID NO: 27 | ORF10 | -- | --
SEQ ID NO: 28 | ORF10 protein | -- | --
SEQ ID NO: 29 | ORF11 | -- | --
SEQ ID NO: 30 | ORF11 protein | -- | --
SEQ ID NO: 31 | ORF1ab | 265-21485 | --
SEQ ID NO: 32 | ORF13 | 28130-28426 | --
SEQ ID NO: 33 | ORF13 protein | -- | --
SEQ ID NO: 34 | ORF14 | -- | --
SEQ ID NO: 35 | ORF14 protein | 28583-28795 | --
SEQ ID NO: 36 | ORF-N* | 28054-29430 |
SEQ ID NO: 37 | N protein | -- | --
SEQ ID NO: 38 | ORF-N** | 28054-29430 | I-3048
SEQ ID NO: 39 | noncoding 5'** | 1-204 | I-3124
SEQ ID NO: 40 | noncoding 3'** | 28933-29727 | I-3123
SEQ ID NO: 41 | ORF1ab Fragment L0 | 30-500 | --
SEQ ID NO: 42 | Fragment L1 | 211-2260 | --
SEQ ID NO: 43 | Fragment L2 | 2136-4187 | --
SEQ ID NO: 44 | Fragment L3 | 3892-5344 | --
SEQ ID NO: 45 | Fragment L4b | 4932-6043 | --
SEQ ID NO: 46 | Fragment L4 | 5305-7318 | --
SEQ ID NO: 47 | Fragment L5 | 7275-9176 | --
SEQ ID NO: 48 | Fragment L6 | 9032-11086 | --
SEQ ID NO: 49 | Fragment L7 | 10298-12982 | --
SEQ ID NO: 50 | Fragment L8 | 12815-14854 | --
SEQ ID NO: 51 | Fragment L9 | 14745-16646 | --
SEQ ID NO: 52 | Fragment L10 | 16514-18590 | --
SEQ ID NO: 53 | Fragment L11 | 18500-20602 | --
SEQ ID NO: 54 | Fragment L12 | 20319-22224 | --
SEQ ID NO: 55 | Sense N primer | -- | --
SEQ ID NO: 56 | Antisense N primer | -- | --
SEQ ID NO: 57 | Sense SC primer | -- | --
SEQ ID NO: 58 | Sense SL primer | -- | --
SEQ ID NO: 59 | Antisense SC and SL primer | -- | --
SEQ ID NO: 60 | Sense primer series 1 | 28507-28522 | --
SEQ ID NO: 61 | Antisense primer series 1 | 28774-28759 |
SEQ ID NO: 62 | Sense primer series 2 | 28375-28390 | --
SEQ ID NO: 63 | Antisense primer series 2 | 28702-28687 | --
SEQ ID NO: 64 | Probe 1/series 1 | 28561-28586 | --
SEQ ID NO: 65 | Probe 2/series 1 | 28588-28608 | --
SEQ ID NO: 66 | Probe 1/series 2 | 28541-28563 | --
SEQ ID NO: 67 | Probe 2/series 2 | 28565-28589 | --
SEQ ID NO: 68 | Anchor primer 14T | |
SEQ ID NO: 69 | Peptide M2-14 | -- | --
SEQ ID NO: 70 | Peptide E1-12 | -- | --
SEQ ID NO: 71 | Peptide E53-76 | -- | --
SEQ ID NO: 72 | Noncoding 5'* | 1-204 | --
SEQ ID NO: 73 | Noncoding 3'* | 28933-29727 | --
SEQ ID NO: 74 | ORF1a protein | -- | --
SEQ ID NO: 75 | ORF1b protein | -- | --
SEQ ID NO: 76-139 | Primers | |
SEQ ID NO: 140 | Pseudogene of S | |
SEQ ID NO: 141-148 | Primers | |
SEQ ID NO: 149 | Aa1-13 of S | |
SEQ ID NO: 150 | Polypeptide | |
SEQ ID NO: 151-158 | Primers | |
*PCR amplification product (amplicon)
**Insert cloned into the plasmid deposited at the CNCM
and to the appended drawings in which:
FIG. 1 illustrates Western-blot analysis of the expression in vitro of the recombinant proteins N, SC and SL from the expression vectors pIVEX. Lane 1: pIV2.3N. Lane 2: pIV2.3SC. Lane 3: pIV2.3SL. Lane 4: pIV2.4N. Lane 5: pIV2.4S1 or pIV2.4SC. Lane 6: pIV2.4SL. The expression of the GFP protein expressed from the same vector is used as a control.
FIG. 2 illustrates the analysis, by polyacrylamide gel electrophoresis under denaturing conditions (SDS-PAGE) and staining with Coomassie blue, of the expression in vivo of the N protein from the expression vectors pIVEX. The E. coli BL21(DE3)pDIA17 strain transformed with the recombinant vectors pIVEX is cultured at 30° C. in LB medium, in the presence or in the absence of inducer (IPTG 1 mM). Lane 1: pIV2.3N. Lane 2: pIV2.4N.
FIG. 3 illustrates the analysis, by polyacrylamide gel electrophoresis under denaturing conditions (SDS-PAGE) and staining with Coomassie blue, of the expression in vivo of the SL and SC polypeptides from the expression vectors pIVEX. The E. coli BL21(DE3)pDIA17 strain transformed with the recombinant vectors pIVEX is cultured at 30° C. in LB medium, in the presence or in the absence of inducer (IPTG 1 mM). Lane 1: pIV2.3SC. Lane 2: pIV2.3SL. Lane 3: pIV2.4S1. Lane 4: pIV2.4SL.
FIG. 4 illustrates the antigenic activity of the recombinant N, SL and SC proteins produced in the E. coli BL21(DE3)pDIA17 strain transformed with the recombinant vectors pIVEX. A: electrophoresis (SDS-PAGE) of the bacterial lysates. B and C: Western-blot with the sera, obtained from the same patient infected with SARS-CoV, collected 8 days (B: serum M12) and 29 days (C: serum M13) respectively after the onset of the SARS symptoms. Lane 1: pIV2.3N. Lane 2: pIV2.4N. Lane 3: pIV2.3SC. Lane 4: pIV2.4S1. Lane 5: pIV2.3SL. Lane 6: pIV2.4SL.
FIG. 5 illustrates the purification on an Ni-NTA agarose column of the recombinant N protein produced in the E. coli BL21(DE3)pDIA17 strain from the vector pIV2.3N. Lane 1: total bacterial extract. Lane 2: soluble extract. Lane 3: insoluble extract. Lane 4: extract deposited on the Ni-NTA column. Lane 5: unbound proteins. Lane 6: fractions of peak 1. Lane 7: fractions of peak 2.
FIG. 6 illustrates the purification of the recombinant SC protein from the inclusion bodies produced in the E. coli BL21(DE3)pDIA17 strain transformed with pIV2.4S1. A. Treatment with Triton X-100 (2%): Lane 1: total bacterial extract. Lane 2: soluble extract. Lane 3: insoluble extract. Lane 4: supernatant after treatment with Triton X-100 (2%).
Lanes 5 and 6: pellet after treatment with Triton X-100 (2%). B: Treatment with 4M, 5M, 6M and 7M urea of the soluble and insoluble extracts.
FIG. 7 represents the immunoblot produced with the aid of a lysate of cells infected with SARS-CoV and a serum from a patient suffering from an atypical pneumopathy.
FIG. 8 represents immunoblots produced with the aid of a lysate of cells infected with SARS-CoV and rabbit immunosera specific for the nucleoprotein N (A) and for the spike protein S (B). I.S.: immune serum. p.i.: preimmune serum. The anti-N immune serum was used at 1/50 000 and the anti-S immune serum at 1/10 000.
FIG. 9 illustrates the ELISA reactivity of the rabbit monospecific polyclonal sera directed against the N protein or the short fragment of the S protein (SC), toward the corresponding recombinant proteins used for immunization. A: rabbits P13097, P13081 and P13031 immunized with the purified recombinant N protein. B: rabbits P11135, P13042 and P14001 immunized with a preparation of inclusion bodies corresponding to the short fragment of the S protein (SC). I.S.: immune serum. p.i.: preimmune serum.
FIG. 10 illustrates the ELISA reactivity of the purified recombinant N protein, toward sera from patients suffering from atypical pneumonia caused by SARS-CoV. FIG. 10A: ELISA plates prepared with the N protein at the concentration of 4 μg/ml and 2 μg/ml. FIG. 10B: ELISA plate prepared with the N protein at the concentration of 1 μg/ml. The sera designated A, B, D, E, F, G, H correspond to those of Table IV.
FIG. 11 illustrates the amplification by RT-PCR of decreasing quantities of synthetic RNA of the SARS-CoV N gene (10⁷ to 1 copy), with the aid of pairs of primers No. 1 (N/+/28507, N/-/28774) (A) and No. 2 (N/+/28375, N/-/28702) (B). T: amplification performed in the absence of RNA. MW: DNA marker.
FIG. 12 illustrates the amplification by RT-PCR in real time of synthetic RNA for the SARS-CoV N gene: decreasing quantities of synthetic RNA as replicates (repli.; lanes 16 to 29) and of viral RNA diluted 1/20 × 10⁻⁴ (lane 32) were amplified by RT-PCR in real time with the aid of the kit "Light Cycler RNA Amplification Kit Hybridization Probes" and pairs of primers and probes of the No. 2 series, under the conditions described in Example 8.
FIG. 13 (FIGS. 13.1 to 13.7) represents the restriction map of the sequence SEQ ID NO: 1 corresponding to the DNA equivalent of the genome of the SARS-CoV strain derived from the sample recorded under the number 031589.
FIG. 14 shows the result of the SARS serology test by indirect N ELISA (1st series of sera tested).
FIG. 15 shows the result of the SARS serology test by indirect N ELISA (2nd series of sera tested).
FIG. 16 presents the result of the SARS serology test by double epitope N ELISA (1st series of sera tested).
FIG. 17 shows the result of the SARS serology test by double epitope N ELISA (2nd series of sera tested).
FIG. 18 illustrates the test of reactivity of the anti-N monoclonal antibodies by ELISA on the native nucleoprotein N of SARS-CoV. The antibodies were tested in the form of hybridoma culture supernatants by indirect ELISA using an irradiated lysate of VeroE6 cells infected with SARS-CoV as antigen (SARS lysate curves). A negative control for reactivity is performed for each antibody on a lysate of uninfected VeroE6 cells (negative lysate curves). Several monoclonal antibodies of known specificity were used as negative control antibodies: para1-3 directed against the antigens of the parainfluenza viruses type 1-3 (Bio-Rad) and influenza B directed against the antigens of the influenza virus type B (Bio-Rad).
FIG. 19 illustrates the test of reactivity of the anti-N of SARS-CoV monoclonal antibodies by ELISA on the native antigens of the human coronavirus 229E (HCoV-229E).
The antibodies were tested in the form of hybridoma culture supernatants by an indirect ELISA test using a lysate of MRC-5 cells infected with the human coronavirus 229E as antigen (229E lysate curves). A negative control for immunoreactivity was performed for each antibody on a lysate of noninfected MRC-5 cells (negative lysate curves). The monoclonal antibody 5-11H.6 directed against the S protein of the human coronavirus 229E (Sizun et al. 1998, J. Virol. Met. 72: 145-152) is used as positive control antibody. The antibodies para1-3 directed against the antigens of the parainfluenza virus type 1-3 (Bio-Rad) and influenza B directed against the antigens of the influenza virus type B (Bio-Rad) were added to the panel of monoclonal antibodies tested.
FIG. 20 shows a test of reactivity of the anti-N of SARS-CoV monoclonal antibodies by Western blotting on the denatured native nucleoprotein N of SARS-CoV. A lysate of VeroE6 cells infected with SARS-CoV was prepared in the loading buffer according to Laemmli and caused to migrate in a 12% SDS polyacrylamide gel and then the proteins were transferred onto PVDF membrane. The anti-N monoclonal antibodies tested were used for the immunoassay at the concentration of 0.05 μg/ml. The visualization is carried out with anti-mouse IgG(H + L) antibodies coupled to peroxidase (NA931V, Amersham) and the ECL+ system. Two monoclonal antibodies were used as negative controls for reactivity: influenza B directed against the antigens of the influenza virus type B (Bio-Rad) and para1-3 directed against the antigens of the parainfluenza virus type 1-3 (Bio-Rad).
FIG. 21 presents the plasmids for expression in mammalian cells of the SARS-CoV S protein. The cDNA for the SARS-CoV S was inserted between the BamH1 and Xho1 sites of the expression plasmid pcDNA3.1(+) (Clontech) in order to obtain the plasmid pcDNA-S and between the Nhe1 and Xho1 sites of the expression plasmid pCI (Promega) in order to obtain the plasmid pCI-S.
The WPRE and CTE sequences were inserted into each of the two plasmids pcDNA-S and pCI-S between the Xho1 and Xba1 sites in order to obtain the plasmids pcDNA-S-CTE, pcDNA-S-WPRE, pCI-S-CTE and pCI-S-WPRE, respectively.
SP: signal peptide predicted (aa 1-13) with the software SignalP v2.0 (Nielsen et al., 1997, Protein Engineering, 10: 1-6)
TM: transmembrane region predicted (aa 1196-1218) with the software TMHMM v2.0 (Sonnhammer et al., 1998, Proc. of Sixth Int. Conf. on Intelligent Systems for Molecular Biology, pp. 175-182, AAAI Press). It should be noted that the amino acids W1194 and P1195 are possibly part of the transmembrane region with the respective probabilities of 0.13 and 0.42.
P-CMV: cytomegalovirus immediate/early promoter
BGH pA: polyadenylation signal of the bovine growth hormone gene
SV40 late pA: SV40 virus late polyadenylation signal
SD/SA: splice donor and acceptor sites
WPRE: sequences of the "Woodchuck Hepatitis Virus posttranscriptional regulatory element" of the woodchuck hepatitis virus
CTE: sequences of the "constitutive transport element" of the Mason-Pfizer simian retrovirus
FIG. 22 illustrates the expression of the S protein after transfection of VeroE6 cells. Cellular extracts were prepared 48 hours after transfection of VeroE6 cells with the plasmids pcDNA, pcDNA-S, pCI and pCI-S. Cellular extracts were also prepared 18 hours after infection with the recombinant vaccinia virus VV-TF7.3 and transfection with the plasmids pcDNA or pcDNA-S. As a control, extracts of VeroE6 cells were prepared 8 hours after infection with SARS-CoV at a multiplicity of infection of 3. The extracts were separated on an 8% SDS acrylamide gel and analyzed by Western blotting with the aid of an anti-S rabbit polyclonal antibody and an anti-rabbit IgG(H + L) polyclonal antibody coupled to peroxidase (NA934V, Amersham). A molecular mass ladder (kDa) is presented in the figure.
SARS-CoV: extract of VeroE6 cells infected with SARS-CoV
Mock: control extract of noninfected cells
FIG. 23 illustrates the effect of the CTE and WPRE sequences on the expression of the S protein after transfection of VeroE6 and 293T cells. Cellular extracts were prepared 48 hours after transfection of VeroE6 cells (A) or 293T cells (B) with the plasmids pcDNA, pcDNA-S, pcDNA-S-CTE, pcDNA-S-WPRE, pCI-S, pCI-S-CTE and pCI-S-WPRE, separated on 8% SDS polyacrylamide gel and analyzed by Western blotting with the aid of an anti-S rabbit polyclonal antibody and an anti-rabbit IgG(H + L) polyclonal antibody coupled to peroxidase (NA934V, Amersham). A molecular mass ladder (kDa) is presented in the figure.
SARS-CoV: extract of VeroE6 cells prepared 8 hours after infection with SARS-CoV at a multiplicity of infection of 3.
Mock: control extract of noninfected VeroE6 cells
FIG. 24 presents defective lentiviral vectors with central DNA flap for the expression of SARS-CoV S. The cDNA for the SARS-CoV S protein was cloned in the form of a BamH1-Xho1 fragment into the plasmid pTRIPΔU3-CMV containing a defective lentiviral vector TRIP with central DNA flap (Sirven et al., 2001, Mol. Ther., 3: 438-448) in order to obtain the plasmid pTRIP-S. The optimum expression cassettes consisting of the CMV virus immediate/early promoter, a splice signal, cDNA for S and either of the posttranscriptional signals CTE or WPRE were substituted for the cassette EF1α-EGFP of the defective lentiviral expression vector with central DNA flap TRIPΔU3-EF1α (Sirven et al., 2001, Mol. Ther., 3: 438-448) in order to obtain the plasmids pTRIP-SD/SA-S-CTE and pTRIP-SD/SA-S-WPRE.
SP: signal peptide
TM: transmembrane region
P-CMV: cytomegalovirus immediate/early promoter
P-EF1α: EF1α gene promoter
SD/SA: splice donor and acceptor sites
WPRE: sequences of the "Woodchuck Hepatitis Virus posttranscriptional regulatory element" of the woodchuck hepatitis virus
CTE: sequences of the "constitutive transport element" of the Mason-Pfizer simian retrovirus
LTR: long terminal repeat
ΔU3: LTR deleted for the "promoter/enhancer" sequences
cPPT: "polypurine tract cis-active sequence"
CTS: "central termination sequence"
FIG. 25 shows the Western-blot analysis of the expression of the SARS-CoV S by cell lines transduced with the lentiviral vectors TRIP-SD/SA-S-WPRE and TRIP-SD/SA-S-CTE. Cellular extracts were prepared from established lines FrhK4-S-CTE and FrhK4-S-WPRE after transduction with the lentiviral vectors TRIP-SD/SA-S-CTE and TRIP-SD/SA-S-WPRE respectively.
They were separated on an 8% SDS acrylamide gel and analyzed by Western blotting with the aid of an anti-S rabbit polyclonal antibody and an anti-rabbit IgG(H + L) conjugate coupled to peroxidase. A molecular mass ladder (kDa) is presented in the figure.
T-: control extract of FRhK-4 cells
T+: extract of FRhK-4 cells prepared 24 hours after infection with SARS-CoV at a multiplicity of infection of 3.
FIG. 26 relates to the analysis of the expression of Ssol polypeptide by cell lines transduced with the lentiviral vectors TRIP-SD/SA-Ssol-WPRE and TRIP-SD/SA-Ssol-CTE. The secretion of the Ssol polypeptide was determined in the supernatant of a series of cell clones isolated after transduction of FRhK-4 cells with the lentiviral vectors TRIP-SD/SA-Ssol-WPRE and TRIP-SD/SA-Ssol-CTE. 5 μl of supernatant, diluted 1/2 in loading buffer according to Laemmli, were analyzed by Western blotting, visualized with an anti-FLAG monoclonal antibody (M2, Sigma) and an anti-mouse IgG(H + L) conjugate coupled to peroxidase.
T-: supernatant of the parental FRhK-4 line.
T+: supernatant of BHK cells infected with a recombinant vaccinia virus expressing the Ssol polypeptide.
The solid arrow indicates the Ssol polypeptide, while the empty arrow indicates a cross reaction with a protein of cellular origin.
FIG. 27 shows the results relating to the analysis of the purified Ssol polypeptide.
A. 8, 2, 0.5 and 0.125 μg of recombinant Ssol polypeptide purified by anti-FLAG affinity chromatography and gel filtration (G75) were separated on 8% SDS polyacrylamide gel. The Ssol polypeptide and variable quantities of molecular mass markers (MM) were visualized by staining with silver nitrate (Gelcode SilverSNAP stain kit II, Pierce).
B. Standard markers for analysis by SELDI-TOF mass spectrometry. IgG: bovine IgG of MM 147300. ConA: conalbumin of MM 77490. HRP: horseradish peroxidase analyzed as a control and of MM 43240.
C. Analysis by mass spectrometry (SELDI-TOF) of the recombinant Ssol polypeptide.
The peaks A and B correspond to the singly and doubly charged Ssol polypeptide.
D. Sequencing of the N-terminal end of the recombinant Ssol polypeptide. 5 Edman degradation cycles in liquid phase were carried out on an ABI494 sequencer (Applied Biosystems).
FIG. 28 illustrates the influence of a splicing signal and of the CTE and WPRE sequences on the efficacy of the gene immunization with the aid of plasmid DNA encoding the SARS-CoV S.
A. Groups of 7 BALB/c mice were immunized twice at 4 weeks' interval with the aid of 50 μg of plasmid DNA of pCI, pcDNA-S, pCI-S, pcDNA-N and pCI-HA.
B. Groups of 6 BALB/c mice were immunized twice at 4 weeks' interval with the aid of 2 μg, 10 μg or 50 μg of plasmid DNA of pCI, pCI-S, pCI-S-CTE and pCI-S-WPRE.
The immune sera collected 3 weeks after the second immunization were analyzed by indirect ELISA using a lysate of VeroE6 cells infected with SARS-CoV as antigen. The anti-SARS-CoV antibody titers are calculated as the reciprocal of the dilution producing a specific OD of 0.5 after visualization with an anti-mouse IgG polyclonal antibody coupled to peroxidase (NA931V, Amersham) and TMB (KPL).
FIG. 29 shows the seroneutralization of the infectivity of SARS-CoV with the antibodies induced in mice after gene immunization with the aid of plasmid DNA encoding SARS-CoV S. Pools of immune sera collected 3 weeks after the second immunization were prepared for each of the groups of experiments described in FIG. 28 and evaluated for their capacity to seroneutralize the infectivity of 100 TCID50 of SARS-CoV on FRhK-4 cells. 4 points are produced for each of the 2-fold dilutions tested from 1/20. The seroneutralizing titer is calculated according to the Reed and Muench method as the reciprocal of the dilution neutralizing the infectivity of 2 wells out of 4.
A. Groups of BALB/c mice immunized twice at 4 weeks' interval with the aid of 50 μg of plasmid DNA of pCI, pcDNA-S, pCI-S, pcDNA-N and pCI-HA. □: preimmune serum. ■: immune serum.
B. Groups of BALB/c mice immunized twice at 4 weeks' interval with the aid of 2 μg, 10 μg or 50 μg of plasmid DNA of pCI, pCI-S, pCI-S-CTE and pCI-S-WPRE.
FIG. 30 illustrates the immunoreactivity of the recombinant Ssol polypeptide toward sera from patients suffering from SARS. The reactivity of sera from patients was analyzed by indirect ELISA test against solid phases prepared with the aid of the purified recombinant Ssol polypeptide. The antibodies from patients reacting with the solid phase at a dilution of 1/400 are visualized with an anti-human IgG(H + L) polyclonal antibody coupled to peroxidase (Amersham NA933V) and TMB plus H2O2 (KPL). The sera of probable SARS cases are identified by a National Reference Center for Influenza Viruses serial number and by the initials of the patient and the number of days elapsed since the onset of symptoms, where appropriate. The TV sera are control sera from subjects which were collected in France before the SARS epidemic which occurred in 2003.
FIG. 31 shows the induction of antibodies directed against SARS-CoV after immunization with the recombinant Ssol polypeptide. Two groups of 6 mice were immunized at 3 weeks' interval with 10 μg of recombinant Ssol polypeptide (Ssol group) adjuvanted with aluminum hydroxide or, as a control, with adjuvant alone (mock group). Three successive immunizations were performed and the immune sera were collected 3 weeks after each of the three immunizations (IS1, IS2, IS3). The immune sera were analyzed per pool for each of the 2 groups by indirect ELISA using a lysate of VeroE6 cells infected with SARS-CoV as antigen. The anti-SARS-CoV antibody titers are calculated as the reciprocal of the dilution producing a specific OD of 0.5 after visualization with an anti-mouse IgG polyclonal antibody coupled to peroxidase (Amersham) and TMB (KPL).
FIG. 32 presents the nucleotide alignment of the sequences of the synthetic gene 040530 with the sequence of the wild-type gene of the SARS-CoV isolate 031589. I-3059 corresponds to nucleotides 21406-25348 of the SARS-CoV isolate 031589 deposited at the C.N.C.M. under the number I-3059 (SEQ ID NO: 4, plasmid pSARS-S). S-040530 is the sequence of the synthetic gene 040530.
FIG. 33 illustrates the use of a synthetic gene for the expression of the SARS-CoV S. Cellular extracts prepared 48 hours after transfection of VeroE6 cells (A) or 293T cells (B) with the plasmids pCI, pCI-S, pCI-S-CTE, pCI-S-WPRE and pCI-Ssynth were separated on 8% SDS acrylamide gel and analyzed by Western blotting with the aid of an anti-S rabbit polyclonal antibody and an anti-rabbit IgG(H + L) polyclonal antibody coupled to peroxidase (NA934V, Amersham). The Western blot is visualized by luminescence (ECL+, Amersham) and acquisition on a digital imaging device (FluorS, BioRad). The levels of expression of the S protein were measured by quantifying the 2 predominant bands identified on the image.
FIG. 34 presents a diagram for the construction of recombinant vaccinia viruses VV-TG-S, VV-TG-Ssol, VV-TN-S and VV-TN-Ssol.
A. The cDNAs for the S protein and the Ssol polypeptide of SARS-CoV were inserted between the BamH1 and Sma1 sites of the transfer plasmid pTG186 in order to obtain the plasmids pTG-S and pTG-Ssol.
B. The sequences of the synthetic promoter 480 were then substituted for those of the 7.5 promoter by exchange of the Nde1-Pst1 fragments of the plasmids pTG186poly, pTG-S and pTG-Ssol in order to obtain the transfer plasmids pTN480, pTN-S and pTN-Ssol.
C. Sequence of the synthetic promoter 480 as contained between the Nde1 and Pst1 sites of the transfer plasmids of the pTN series. An Asc1 site was inserted in order to facilitate subsequent handling. The restriction sites and the promoter sequence are underlined.
D.
The recombinant vaccinia viruses are obtained by double homologous recombination in vivo between the TK cassette of the transfer plasmids of the pTG and pTN series and the TK gene of the Copenhagen strain of the vaccinia virus.
SP: signal peptide predicted (aa 1-13) with the software SignalP v2.0 (Nielsen et al., 1997, Protein Engineering, 10: 1-6)
TM: transmembrane region predicted (aa 1196-1218) with the software TMHMM v2.0 (Sonnhammer et al., 1998, Proc. of Sixth Int. Conf. on Intelligent Systems for Molecular Biology, pp. 175-182, AAAI Press). It should be noted that the amino acids W1194 and P1195 possibly form part of the transmembrane region with respective probabilities of 0.13 and 0.42.
TK-L, TK-R: left- and right-hand parts of the vaccinia virus thymidine kinase gene
MCS: multiple cloning site
PE: early promoter
PL: late promoter
PL synth: synthetic late promoter 480
FIG. 35 illustrates the expression of the S protein by recombinant vaccinia viruses, analyzed by Western blotting. Cellular extracts were prepared 18 hours after infection of CV1 cells with the recombinant vaccinia viruses VV-TG, VV-TG-S and VV-TN-S at an M.O.I. of 2 (A). As a control, extracts of VeroE6 cells were prepared 8 hours after infection with SARS-CoV at a multiplicity of infection of 2. Cellular extracts were also prepared 18 hours after infection of CV1 cells with the recombinant vaccinia viruses VV-TG-S, VV-TG-Ssol, VV-TN, VV-TN-S and VV-TN-Ssol (B). They were separated on 8% SDS acrylamide gels and analyzed by Western blotting with the aid of an anti-S rabbit polyclonal antibody and an anti-rabbit IgG(H + L) polyclonal antibody coupled to peroxidase (NA934V, Amersham). "1 μl" and "10 μl" indicate the quantities of cellular extracts deposited on the gel. A molecular mass ladder (kDa) is presented in the figure.
SARS-CoV: extract of VeroE6 cells infected with SARS-CoV
Mock: control extract of noninfected cells
FIG. 36 shows the result of a Western-blot analysis of the secretion of the Ssol polypeptide by the recombinant vaccinia viruses.
A. Supernatants of CV1 cells infected with the recombinant vaccinia virus VV-TN, various clones of the VV-TN-Ssol virus and with the viruses VV-TG-Ssol or VV-TN-Sflag were harvested 18 hours after infection of CV1 cells at an M.O.I. of 2.
B. Supernatants of 293T, FRhK-4, BHK-21 and CV1 cells infected in duplicate (1, 2) with the recombinant vaccinia virus VV-TN-Ssol at an M.O.I. of 2 were harvested 18 hours after infection. The supernatant of CV1 cells infected with the virus VV-TN was also harvested as a control (M).
All the supernatants were separated on 8% SDS acrylamide gel according to Laemmli and analyzed by Western blotting with the aid of an anti-FLAG mouse monoclonal antibody and an anti-mouse IgG(H + L) polyclonal antibody coupled to peroxidase (NA931V, Amersham) (A) or with the aid of an anti-S rabbit polyclonal antibody and an anti-rabbit IgG(H + L) polyclonal antibody coupled to peroxidase (NA934V, Amersham) (B). A molecular mass ladder (kDa) is presented in the figure.
FIG. 37 shows the analysis of the purified Ssol polypeptide on SDS polyacrylamide gel. 10, 5 and 2 μl of recombinant Ssol polypeptide purified by anti-FLAG affinity chromatography were separated on 4 to 15% gradient SDS polyacrylamide gel. The Ssol polypeptide and variable quantities of molecular mass markers (MM) were visualized by staining with silver nitrate (Gelcode SilverSNAP stain kit II, Pierce).
FIG. 38 illustrates the immunoreactivity of the recombinant Ssol polypeptide produced by the recombinant vaccinia virus VV-TN-Ssol toward sera of patients suffering from SARS. The reactivity of sera from patients was analyzed by indirect ELISA test against solid phases prepared with the aid of the purified recombinant Ssol polypeptide. The antibodies from patients reacting with the solid phase at a dilution of 1/100 and 1/400 are visualized with an anti-human IgG(H + L) polyclonal antibody coupled to peroxidase (Amersham NA933V) and TMB plus H2O2 (KPL). The sera of probable SARS cases are identified by a National Reference Center for Influenza Virus serial number and by the initials of the patient and the number of days elapsed since the onset of symptoms, where appropriate. The TV sera are control sera from subjects which were collected in France before the SARS epidemic which occurred in 2003.
FIG. 39 shows the anti-SARS-CoV antibody response in mice after immunization with the recombinant vaccinia viruses. Groups of 7 BALB/c mice were immunized by the i.v. route twice at 4 weeks' interval with 10⁶ pfu of recombinant vaccinia viruses VV-TG, VV-TG-HA, VV-TG-S, VV-TG-Ssol, VV-TN, VV-TN-S, VV-TN-Ssol.
A. Pools of immune sera collected 3 weeks after each of the two immunizations were prepared for each of the groups and were analyzed by indirect ELISA using a lysate of VeroE6 cells infected with SARS-CoV as antigen. The anti-SARS-CoV antibody titers are calculated as the reciprocal of the dilution producing a specific OD of 0.5 after visualization with an anti-mouse IgG polyclonal antibody coupled to peroxidase (NA931V, Amersham) and TMB (KPL).
B. The pools of immune sera were evaluated for their capacity to seroneutralize the infectivity of 100 TCID50 of SARS-CoV on FRhK-4 cells. 4 points are produced for each of the 2-fold dilutions tested from 1/20. The seroneutralizing titer is calculated according to the Reed and Muench method as the reciprocal of the dilution neutralizing the infectivity of 2 wells out of 4.
FIG. 40 describes the construction of the recombinant viruses MVSchw2-SARS-S and MVSchw2-SARS-Ssol.
A. The measles vector is a complete genome of the Schwarz vaccine strain of the measles virus (MV) into which an additional transcription unit has been introduced (Combredet, 2003, Journal of Virology, 77: 11546-11554).
The expression of the additional open reading frames (ORF) is controlled by cis-acting elements necessary for the transcription, for the formation of the cap and for the polyadenylation of the transgene which were copied from the elements present at the N/P junction. 2 different vectors allow the insertion between the P (phosphoprotein) and M (matrix) genes on the one hand and the H (hemagglutinin) and L (polymerase) genes on the other hand. B. The recombinant genomes MVSchw2-SARS-S and MVSchw2-SARS-Ssol of the measles virus were constructed by inserting the ORFs of the S protein and of the Ssol polypeptide into an additional transcription unit located between the P and M genes of the vector. The various genes of the measles virus (MV) are indicated: N (nucleoprotein), PVC (V/C phosphoprotein and protein), M (matrix), F (fusion), H (hemagglutinin), L (polymerase). T7 = T7 RNA polymerase promoter, hh = hammerhead ribozyme, T7t = T7 phage RNA polymerase terminator sequence, δ = ribozyme of the hepatitis δ virus, (2), (3) = additional transcription units (ATU). Size of the MV genome: 15 894 nt. SP: signal peptide TM: transmembrane region FLAG: FLAG tag FIG. 41 illustrates the expression of the S protein by the recombinant measles viruses, analyzed by Western blotting. Cytoplasmic extracts were prepared after infection of Vero cells by different passages of the viruses MVSchw2-SARS-S and MVSchw2-SARS-Ssol and the wild-type virus MWSchw as control. Cellular extracts in loading buffer according to Laemmli were also prepared 8 hours after infection of VeroE6 cells with SARS-CoV at a multiplicity of infection of 3. They were separated on 8% SDS acrylamide gel and analyzed by Western blotting with the aid of an anti-S rabbit polyclonal antibody and an anti-rabbit IgG(H + L) polyclonal antibody coupled to peroxidase (NA934V, Amersham). A molecular mass ladder (kDa) is presented in the figure. 
Pn: nth passage of the virus after coculture of 293-3-46 and Vero cells SARS-CoV: extract of VeroE6 cells infected with SARS-CoV Mock: control extract of noninfected VeroE6 cells FIG. 42 shows the expression of the S protein by the recombinant measles viruses, analyzed by immunofluorescence Vero cells in monolayers on glass slides were infected with the wild-type virus MWSchw (A) or the viruses MVSchw2-SARS-S (B) and MVSchw2-SARS-Ssol (C). When the syncytia have reached 30 to 40% confluence (A., B.) or 90-100% (C), the cells were fixed, permeabilized and labeled with anti-SARS-CoV rabbit polyclonal antibodies and an anti-rabbit IgG(H + L) conjugate coupled to FITC (Jackson). FIG. 43 illustrates the Western-blot analysis of the immunoreactivity of rabbit sera directed against the peptides E1-12, E53-76 and M2-14. The rabbit 20047 was immunized with the peptide E1-12 coupled to KLH. The rabbits 22234 and 22240 were immunized with the peptide E53-76 coupled to KLH. The rabbits 20013 and 20080 were immunized with the peptide M2-14 coupled to KLH. The immune sera were analyzed by Western blotting with the aid of extracts of cells infected with SARS-CoV (B) or with the aid of extracts of cells infected with a recombinant vaccinia virus expressing the protein E (A) or M (C) of the SARS-CoV 031589 isolate. The immunoblots were visualized with the aid of an anti-rabbit IgG(H + L) conjugate coupled to peroxidase (NA934V, Amersham).
[0236] The position of the E and M proteins is indicated by an arrow.
[0237] A molecular mass ladder (kDa) is presented in the figure.
[0238] It should be understood, however, that these examples are given solely by way of illustration of the subject of the invention, and do not constitute in any manner a limitation thereto.
EXAMPLE 1
Cloning and Sequencing of the Genome of the SARS-CoV Strain Derived from the Sample Recorded Under the Number 031589
[0239] The RNA of the SARS-CoV strain was extracted from the sample of bronchoalveolar washing recorded under the number 031589, performed on a patient at the Hanoi (Vietnam) French hospital suffering from SARS.
[0240] The isolated RNA was used as template to amplify the cDNAs corresponding to the various open reading frames of the genome (ORF1a, ORF1b, ORF-S, ORF-E, ORF-M, ORF-N (including ORF-13 and ORF-14), ORF3, ORF4, ORF7 to ORF11), and to the noncoding 5' and 3' ends. The sequences of the primers and of the probes used for the amplification/detection were defined based on the available SARS-CoV nucleotide sequence.
[0241] In the text which follows, the primers and the probes are identified by: the letter S, followed by a letter which indicates the corresponding region of the genome (L for the 5' end including ORF1a and ORF1b; S, M and N for ORF-S, ORF-M, ORF-N, SE and MN for the corresponding intergene regions), and then optionally by Fn, Rn, with n between 1 and 6 corresponding to the primers used for the nested PCR (F1+R1 pair for the first amplification, F2+R2 pair for the second amplification, and the like), and then by /+/ or /-/ corresponding to a sense or antisense primer and finally by the positions of the primers with reference to the Genbank sequence AY274119.3; for the sense and antisense S and N primers and the other sense primers only, when a single position is indicated, it corresponds to that of the 5' end of a probe or of a primer of about 20 bases; for the antisense primers other than the S and N primers, when a single position is indicated, it corresponds to that of the 3' end of a probe or of a primer of about 20 bases.
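The naming convention described above can be illustrated with a short parser. This is only a sketch: the function name `parse_primer` and the field names are hypothetical, and the treatment of names that carry no region letter (for example S/F1/+/21350-21372, used later for ORF-S) is an assumption about how the convention is applied in practice.

```python
import re
from typing import NamedTuple, Optional


class Primer(NamedTuple):
    region: Optional[str]  # e.g. "S", "E", "M", "N", "L", "SE", "MN"; None if omitted
    nested: Optional[str]  # e.g. "F1" or "R2" for nested-PCR primers; None otherwise
    strand: str            # "+" for sense, "-" for antisense
    start: int             # first position given in the name
    end: Optional[int]     # second position, when a range is given


def parse_primer(name: str) -> Primer:
    """Split a primer name such as 'S/E/F1/+/26051-26070' into its fields."""
    parts = name.split("/")
    if len(parts) < 3 or parts[0] != "S":
        raise ValueError(f"unexpected primer name: {name!r}")
    strand, positions = parts[-2], parts[-1]
    if strand not in ("+", "-"):
        raise ValueError(f"missing strand marker in: {name!r}")
    region = nested = None
    for token in parts[1:-2]:
        if re.fullmatch(r"[FR]\d", token):
            nested = token      # nested-PCR designation (Fn or Rn)
        else:
            region = token      # genome region letter(s)
    first, _, second = positions.partition("-")
    return Primer(region, nested, strand, int(first), int(second) if second else None)


print(parse_primer("S/E/F1/+/26051-26070"))
print(parse_primer("S/S/-/24209"))
```

Keeping the region and the nested-PCR designation as separate optional fields lets the same function handle sequencing primers (one position, no Fn/Rn) and amplification primers (a position range).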
[0242] The amplification products thus generated were sequenced with the aid of specific primers in order to determine the complete sequence of the genome of the SARS-CoV strain derived from the sample recorded under the number 031589. These amplification products, with the exception of those corresponding to ORF1a and ORF1b, were then cloned into expression vectors in order to produce the corresponding viral proteins and the antibodies directed against these proteins, in particular by DNA-based immunization.
1. Extraction of the RNAs
[0243] The RNAs were extracted with the aid of the QIAamp viral RNA extraction mini kit (QIAGEN) according to the manufacturer's recommendations. More specifically: 140 μl of the sample and 560 μl of AVL buffer were vigorously mixed for 15 seconds, incubated for 10 minutes at room temperature and then briefly centrifuged at maximum speed. 560 μl of 100% ethanol were added to the supernatant and the mixture thus obtained was very vigorously stirred for 15 sec. 630 μl of the mixture were then deposited on the column.
[0244] The column was placed on a 2 ml tube, centrifuged for 1 min at 8000 rpm, and then the remainder of the preceding mixture was deposited on the same column, centrifuged again, for 1 min at 8000 rpm, and the column was transferred over a clean 2 ml tube. Next, 500 μl of AW1 buffer were added to the column, and then the column was centrifuged for 1 min at 8000 rpm and the eluate was discarded. 500 μl of AW2 buffer were added to the column which was then centrifuged for 3 min at 14 000 rpm and transferred onto a 1.5 ml tube. Finally, 60 μl of AVE buffer were added to the column which was incubated for 1 to 2 min at room temperature and then centrifuged for 1 min at 8000 rpm. The eluate corresponding to the purified RNA was recovered and frozen at -20° C.
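The two successive column loadings follow from the volumes given above: sample, AVL buffer and ethanol together amount to twice the 630 μl applied per centrifugation. A minimal sketch of that arithmetic (variable names are illustrative only, and liquid lost during the brief centrifugation is ignored):

```python
import math

sample_ul = 140      # clinical sample
avl_buffer_ul = 560  # AVL lysis buffer
ethanol_ul = 560     # 100% ethanol
load_ul = 630        # volume deposited on the column per centrifugation

lysate_ul = sample_ul + avl_buffer_ul + ethanol_ul
loads = math.ceil(lysate_ul / load_ul)

print(f"{lysate_ul} ul of lysate, applied in {loads} loads of up to {load_ul} ul")
```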
2. Amplification, Sequencing and Cloning of the cDNAs
2.1) cDNA Encoding the S Protein
[0245] The RNAs extracted from the sample were subjected to reverse transcription with the aid of random sequence hexameric oligonucleotides (pdN6), so as to produce cDNA fragments.
[0246] The sequence encoding the SARS-CoV S glycoprotein was amplified in the form of two overlapping DNA fragments: 5' fragment (SARS-Sa, SEQ ID NO: 5) and 3' fragment (SARS-Sb, SEQ ID NO: 6), by carrying out two successive amplifications with the aid of nested primers. The amplicons thus obtained were sequenced, cloned into the plasmid vector PCR2.1-TOPO® (INVITROGEN), and then the sequence of the cloned cDNAs was determined.
a) Cloning and Sequencing of the Sa and Sb Fragments
[0247] a.1) Synthesis of the cDNA
[0248] The reaction mixture containing: RNA (5 μl), H2O for injection (3.5 μl), 5× reverse transcriptase buffer (4 μl), 5 mM dNTP (2 μl), pdN6 100 μg/ml (4 μl), RNasin 40 IU/μl (0.5 μl) and reverse transcriptase AMV-RT, 10 IU/μl, PROMEGA (1 μl) was incubated in a thermocycler under the following conditions: 45 min at 42° C., 15 min at 55° C., 5 min at 95° C., and then the cDNA obtained was kept at +4° C.
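The composition above can be checked with the usual dilution relation (stock concentration × volume added / total volume). The sketch below is illustrative only; the dictionary keys and the helper `final_conc` are hypothetical names, and the computed final concentrations are derived values, not figures stated in the text.

```python
from fractions import Fraction as F

# component -> (stock concentration or None, volume in ul)
mix = {
    "RNA": (None, F("5")),
    "H2O": (None, F("3.5")),
    "RT buffer (x)": (F(5), F("4")),
    "dNTP (mM)": (F(5), F("2")),
    "pdN6 (ug/ml)": (F(100), F("4")),
    "RNasin (IU/ul)": (F(40), F("0.5")),
    "AMV-RT (IU/ul)": (F(10), F("1")),
}

total_ul = sum(volume for _, volume in mix.values())


def final_conc(name):
    """Final concentration of a component once diluted in the full mixture."""
    stock, volume = mix[name]
    return stock * volume / total_ul


print("total volume (ul):", total_ul)
for name, (stock, _) in mix.items():
    if stock is not None:
        print(name, "->", float(final_conc(name)))
```

Exact fractions are used instead of floats so that the 5× buffer can be compared with a 1× final concentration without rounding concerns.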
a.2) First PCR Amplification
[0249] The 5' and 3' ends of the S gene were respectively amplified with the pairs of primers S/F1/+/21350-21372 and S/R1/-/23518-23498, S/F3/+/23258-23277 and S/R3/-/25382-25363. The 50 μl reaction mixture containing: cDNA (2 μl), 50 μM primers (0.5 μl), 10× buffer (5 μl), 5 mM dNTP (2 μl), Taq Expand High Fidelity, Roche (0.75 μl) and H2O (39.75 μl) was amplified in a thermocycler, under the following conditions: an initial step of denaturation at 94° C. for 2 min was followed by 40 cycles comprising: a step of denaturation at 94° C. for 30 sec, a step of annealing at 55° C. for 30 sec and then a step of extension at 72° C. for 2 min 30 sec, with 10 sec of additional extension at each cycle, and then a final step of extension at 72° C. for 5 min.
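Because the extension step lengthens at each cycle, the programmed run time is not simply 40 times the first cycle. The sketch below totals the hold times for the program described above. It rests on two assumptions that the text does not settle: the 10 sec increment is taken to apply from the second cycle onward, and ramping between temperatures is ignored. The function name `program_seconds` is hypothetical.

```python
def program_seconds(initial, cycles, denature, anneal, extend, increment, final):
    """Sum of programmed hold times in seconds (ramp times not included).

    Assumption: cycle k (counting from 0) extends for extend + k * increment,
    i.e. the increment first takes effect in the second cycle.
    """
    total = initial + final
    for k in range(cycles):
        total += denature + anneal + extend + k * increment
    return total


first_pcr = program_seconds(
    initial=120,    # 2 min at 94 C
    cycles=40,
    denature=30,    # 30 sec at 94 C
    anneal=30,      # 30 sec at 55 C
    extend=150,     # 2 min 30 sec at 72 C
    increment=10,   # additional extension per cycle
    final=300,      # 5 min at 72 C
)
hours, remainder = divmod(first_pcr, 3600)
print(f"{first_pcr} s of holds, i.e. {hours} h {remainder // 60} min")
```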
a.3) Second PCR Amplification
[0250] The products of the first PCR amplification (5' and 3' amplicons) were subjected to a second PCR amplification step (nested PCR) under conditions identical to those of the first amplification, with the pairs of primers S/F2/+/21406-21426 and S/R2/-/23454-23435 and S/F4/+/23322-23341 and S/R4/-/25348-25329, respectively for the 5' amplicon and the 3' amplicon.
a.4) Cloning and Sequencing of the Sa and Sb Fragments
[0251] The Sa (5' end) and Sb (3' end) amplicons thus obtained were purified with the aid of the QIAquick PCR purification kit (QIAGEN), following the manufacturer's instructions, and then they were cloned into the vector PCR2.1-TOPO (Invitrogen kit), to give the plasmids called SARS-S1 and SARS-S2.
[0252] The DNA of the Sa and Sb clones was isolated and then the corresponding insert was sequenced with the aid of the Big Dye kit, Applied Biosystems® and universal primers M13 forward and M13 reverse, and primers: S/S/+/21867, S/S/+/22353, S/S/+/22811, S/S/+/23754, S/S/+/24207, S/S/+/24699, S/S/+/24348, S/S/-/24209, S/S/-/23630, S/S/-/23038, S/S/-/22454, S/S/-/21815, S/S/-/24784, S/S/+/21556, S/S/+/23130 and S/S/+/24465 following the manufacturer's instructions; the sequences of the Sa and Sb fragments thus obtained correspond to the sequences SEQ ID NO: 5 and SEQ ID NO: 6 in the sequence listing appended as an annex.
[0253] The plasmid, called SARS-S1, was deposited under the No. I-3020, on May 12, 2003, at the Collection Nationale de Cultures de Microorganismes, 25 rue du Docteur Roux, 75724 Paris Cedex 15; it contains a 5' fragment of the sequence of the S gene of the SARS-CoV strain derived from the sample recorded under the No. 031589, as defined above, said fragment called Sa corresponding to the nucleotides at positions 21406 to 23454 (SEQ ID NO: 5), with reference to the Genbank sequence AY274119.3 Tor2.
[0254] The plasmid, called TOP10F'-SARS-S2, was deposited under the No. I-3019, on May 12, 2003, at the Collection Nationale de Cultures de Microorganismes, 25 rue du Docteur Roux, 75724 Paris Cedex 15; it contains a 3' fragment of the sequence of the S gene of the SARS-CoV strain derived from the sample recorded under the No. 031589, as defined above, said fragment called Sb corresponding to the nucleotides at positions 23322 to 25348 (SEQ ID NO: 6), with reference to the Genbank sequence accession No. AY274119.3.
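The Sa and Sb fragments are described as overlapping, and the shared region can be read directly from the deposited coordinates. The following sketch assumes that the quoted positions are inclusive at both ends; the helper names are hypothetical.

```python
def span_length(start, end):
    """Number of nucleotides in an interval with inclusive end points."""
    return end - start + 1


def overlap(a, b):
    """Intersection of two inclusive intervals, or None if they are disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


sa = (21406, 23454)  # Sa fragment, positions relative to AY274119.3
sb = (23322, 25348)  # Sb fragment, positions relative to AY274119.3

shared = overlap(sa, sb)
print("overlap:", shared, "length:", span_length(*shared))
```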
b) Cloning and Sequencing of the Complete cDNA (SARS-S Clone of 4 kb)
[0255] The complete S cDNA was obtained from the abovementioned clones SARS-S1 and SARS-S2, in the following manner:
1) A PCR amplification reaction was carried out on a SARS-S2 clone in the presence of the above-mentioned primer S/R4/-/25348-25329 and of the primer S/S/+/24696-24715: an amplicon of 633 bp was obtained.
2) Another PCR amplification reaction was carried out on another SARS-S2 clone, in the presence of the primers S/F4/+/23322-23341 mentioned above and S/S/-/24803-24784: an amplicon of 1481 bp was obtained.
[0256] The amplification reaction was carried out under the conditions as defined above for the amplification of the Sa and Sb fragments, with the exception that 30 amplification cycles comprising a step of denaturation at 94° C. for 20 sec and a step of extension at 72° C. for 2 min 30 sec were carried out.
3) The 2 amplicons (633 bp and 1481 bp) were purified under the conditions as defined above for the Sa and Sb fragments.
4) Another PCR amplification reaction with the aid of the abovementioned primers S/F4/+/23322-23341 and S/R4/-/25348-25329 was carried out on the purified amplicons obtained in 3). The amplification reaction was carried out under the conditions as defined above for the amplification of the Sa and Sb fragments, except that 30 amplification cycles were performed.
[0257] The 2026 bp amplicon thus obtained was purified, cloned into the vector PCR2.1-TOPO and then sequenced as above, with the aid of the primers as defined above for the Sa and Sb fragments. The clone thus obtained was called clone 3'.
5) The clone SARS-S1 obtained above and the clone 3' were digested with EcoR I; the bands of about 2 kb thus obtained were gel purified and then amplified by PCR with the abovementioned primers S/F2/+/21406-21426 and S/R4/-/25348-25329. The amplification reaction was carried out under the conditions as defined above for the amplification of the Sa and Sb fragments, except that 30 amplification cycles were performed. The amplicon of about 4 kb was purified and sequenced. It was then cloned into the vector PCR2.1-TOPO in order to give the plasmid, called SARS-S, and the insert obtained in this plasmid was sequenced as above, with the aid of the primers as defined above for the Sa and Sb fragments. The cDNA sequences of the insert and of the amplicon encoding the S protein correspond respectively to the sequences SEQ ID NO: 4 and SEQ ID NO: 2 in the sequence listing appended as an annex; they encode the S protein (SEQ ID NO: 3).
[0258] The sequence of the amplicon corresponding to the cDNA encoding the S protein of the SARS-CoV strain derived from the sample No. 031589 has the following two mutations compared with the corresponding sequences of respectively the Tor2 and Urbani isolates, the positions of the mutations being indicated with reference to the complete sequence of the genome of the Tor2 isolate (Genbank AY274119.3):
[0259] g/t in position 23220: the alanine codon (gct) in position 577 of the amino acid sequence of the S protein of Tor2 is replaced with a serine codon (tct),
[0260] c/t in position 24872: this mutation does not modify the amino acid sequence of the S protein.
The plasmid, called SARS-S, was deposited under the No. I-3059, on Jun. 20, 2003, at the Collection Nationale de Cultures de Microorganismes, 25 rue du Docteur Roux, 75724 Paris Cedex 15; it contains the cDNA sequence encoding the S protein of the SARS-CoV strain derived from the sample recorded under the No. 031589, said sequence corresponding to the nucleotides at positions 21406 to 25348 (SEQ ID NO: 4), with reference to the Genbank sequence AY274119.3.
2.2) cDNA Encoding the M and E Proteins
[0261] The RNAs derived from the sample 031589, extracted as above, were subjected to a reverse transcription, combined, during the same step (Titan One Step RT-PCR® kit, Roche), with a PCR amplification reaction, with the aid of the pairs of primers:
[0262] S/E/F1/+/26051-26070 and S/E/R1/-/26455-26436 in order to amplify ORF-E, and
[0263] S/M/F1/+/26225-26244 and S/M/R1/-/27148-27129 in order to amplify ORF-M.
[0264] A first reaction mixture containing: 8.6 μl of H2O for injection, 1 μl of dNTP (5 mM), 0.2 μl of each of the primers (50 μM), 1.25 μl of DTT (100 mM) and 0.25 μl of RNAsin (40 IU/μl) was combined with a second reaction mixture containing: 1 μl of RNA, 7 μl of H2O for injection, 5 μl of 5×RT-PCR buffer and 0.5 μl of enzyme mixture and the combined mixtures were incubated in a thermocycler under the following conditions: 30 min at 42° C., 10 min at 55° C., 2 min at 94° C. followed by 40 cycles comprising a step of denaturation at 94° C. for 10 sec, a step of annealing at 55° C. for 30 sec and a step of extension at 68° C. for 45 sec, with 3 sec increment per cycle and finally a step of terminal extension at 68° C. for 7 min.
[0265] The amplification products thus obtained (M and E amplicons) were subjected to a second PCR amplification (nested PCR) using the Expand High-Fi® kit (Roche), with the aid of the pairs of primers:
[0266] S/E/F2/+/26082-26101 and S/E/R2/-/26413-26394 for the amplicon E, and
[0267] S/M/F2/+/26330-26350 and S/M/R2/-/27098-27078 for the amplicon M.
[0268] The reaction mixture containing: 2 μl of the product of the first PCR, 39.25 μl of H2O for injection, 5 μl of 10× buffer containing MgCl2, 2 μl of dNTP (5 mM), 0.5 μl of each of the primers (50 μM) and 0.75 μl of enzyme mixture was incubated in a thermocycler under the following conditions: a step of denaturation at 94° C. for 2 min was followed by 30 cycles comprising a step of denaturation at 94° C. for 15 sec, a step of annealing at 60° C. for 30 sec and a step of extension at 72° C. for 45 sec, with 3 sec increment per cycle, and finally a step of terminal extension at 72° C. for 7 min. The amplification products obtained corresponding to the cDNAs encoding the E and M proteins were sequenced as above, with the aid of the primers: S/E/F2/+/26082 and S/E/R2/-/26394, S/M/F2/+/26330, S/M/R2/-/27078 cited above and the primers S/M/+/26636-26655 and S/M/-/26567-26548. They were then cloned, as above, in order to give the plasmids called SARS-E and SARS-M. The DNA of these clones was then isolated and sequenced with the aid of the universal primers M13 forward and M13 reverse and the primers S/M/+/26636 and S/M/-/26548 mentioned above.
[0269] The sequence of the amplicon representing the cDNA encoding the E protein (SEQ ID NO: 13) of the SARS-CoV strain derived from the sample No. 031589 does not contain differences in relation to the corresponding sequences of the isolates AY274119.3-Tor2 and AY278741-Urbani. The sequence of the E protein of the SARS-CoV 031589 strain corresponds to the sequence SEQ ID NO: 14 in the sequence listing appended as an annex.
[0270] The plasmid, called SARS-E, was deposited under the No. I-3046, on May 28, 2003, at the Collection Nationale de Cultures de Microorganismes, 25 rue du Docteur Roux, 75724 Paris Cedex 15; it contains the cDNA sequence encoding the E protein of the SARS-CoV strain derived from the sample recorded under the No. 031589, as defined above, said sequence corresponding to the nucleotides at positions 26082 to 26413 (SEQ ID NO: 15), with reference to the Genbank sequence accession No. AY274119.3.
[0271] The sequence of the amplicon representing the cDNA encoding M (SEQ ID NO: 16) from the SARS-CoV strain derived from the sample No. 031589 does not contain differences in relation to the corresponding sequence of the isolate AY274119.3-Tor2. By contrast, at position 26857, the isolate AY278741-Urbani contains a c and the sequence of the SARS-CoV strain derived from the sample recorded under the No. 031589 contains a t. This mutation results in a modification of the amino acid sequence of the corresponding protein: at position 154, a proline (AY278741-Urbani) is changed to serine in the SARS-CoV strain derived from the sample recorded under the No. 031589. The sequence of the M protein of the SARS-CoV strain derived from the sample recorded under the No. 031589 corresponds to the sequence SEQ ID NO: 17 in the sequence listing appended as an annex.
[0272] The plasmid, called SARS-M, was deposited under the No. I-3047, on May 28, 2003, at the Collection Nationale de Cultures de Microorganismes, 25 rue du Docteur Roux, 75724 Paris Cedex 15; it contains the cDNA sequence encoding the M protein of the SARS-CoV strain derived from the sample recorded under the No. 031589, as defined above; said sequence corresponding to the nucleotides at positions 26330 to 27098 (SEQ ID NO: 18), with reference to the Genbank sequence accession No. AY274119.3.
2.3) cDNA Corresponding to ORF3, ORF4, ORF7 to ORF11
[0273] The same amplification, cloning and sequencing strategy was used to obtain the cDNA fragments corresponding respectively to the following ORFs: ORF3, ORF4, ORF7, ORF8, ORF9, ORF10 and ORF11. The pairs of primers used for the first amplification are:
[0274] ORF3 and ORF4: S/SE/F1/+/25069-25088 and S/SE/R1/-/26300-26281
[0275] ORF7 to ORF11: S/MN/F1/+/26898-26917 and S/MN/R1/-/28287-28266
[0276] The pairs of primers used for the second amplification are:
[0277] ORF3 and ORF4: S/SE/F2/+/25110-25129 and S/SE/R2/-/26244-26225
[0278] ORF7 to ORF11: S/MN/F2/+/26977-26996 and S/MN/R2/-/28218-28199
[0279] The conditions for the first amplification (RT-PCR) are the following: 45 min at 42° C., 10 min at 55° C., 2 min at 94° C. followed by 40 cycles comprising a step of denaturation at 94° C. for 15 sec, a step of annealing at 58° C. for 30 sec and a step of extension at 68° C. for 1 min, with 5 sec increment per cycle and finally a step of terminal extension at 68° C. for 7 min.
The conditions for the nested PCR are the following: a step of denaturation at 94° C. for 2 min was followed by 40 cycles comprising a step of denaturation at 94° C. for 20 sec, a step of annealing at 58° C. for 30 sec and a step of extension at 72° C. for 50 sec, with 4 sec increment per cycle and finally a step of terminal extension at 72° C. for 7 min.
[0280] The amplification products obtained corresponding to the cDNAs containing respectively ORF3 and 4 and ORF7 to 11 were sequenced with the aid of the primers: S/SE/+/25363, S/SE/+/25835, S/SE/-/25494, S/SE/-/25875, S/MN/+/27839, S/MN/+/27409, S/MN/-/27836, S/MN/-/27799 and cloned as above for the other ORFs, to give the plasmids called SARS-SE and SARS-MN. The DNA of these clones was isolated and sequenced with the aid of these same primers and of the universal primers M13 sense and M13 antisense.
[0281] The sequence of the amplicon representing the cDNA of the region containing ORF3 and ORF4 (SEQ ID NO: 7) of the SARS-CoV strain derived from the sample No. 031589 contains a nucleotide difference in relation to the corresponding sequence of the isolate AY274119-Tor2. This mutation at position 25298 results in a modification of the amino acid sequence of the corresponding protein (ORF3): at position 11, an arginine (AY274119-Tor2) is changed to glycine in the SARS-CoV strain derived from the sample No. 031589. By contrast, no mutation was identified in relation to the corresponding sequence of the isolate AY278741-Urbani. The sequences of ORF3 and 4 of the SARS-CoV strain derived from the sample No. 031589 correspond respectively to the sequences SEQ ID NO: 10 and 12 in the sequence listing appended as an annex.
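The residue numbers quoted for the coding changes in ORF-S, ORF-M and ORF3 can be cross-checked against their genome positions by simple codon arithmetic. The sketch below is illustrative only. The ORF start coordinates are not given in this text: they are assumed here from the annotation of the Genbank entry AY274119.3 and should be verified against that record before being relied on. The function name `codon_of` is hypothetical.

```python
def codon_of(genome_pos, orf_start):
    """Return (codon number, position within the codon), both 1-based."""
    offset = genome_pos - orf_start
    if offset < 0:
        raise ValueError("position lies upstream of the ORF start")
    return offset // 3 + 1, offset % 3 + 1


# ASSUMPTION: first nucleotide of each start codon, taken from the
# AY274119.3 annotation rather than from the present description.
ASSUMED_ORF_START = {"S": 21492, "M": 26398, "ORF3": 25268}

# (ORF, genome position of the change, residue number quoted in the text)
reported = [("S", 23220, 577), ("M", 26857, 154), ("ORF3", 25298, 11)]

for orf, position, residue in reported:
    codon, base = codon_of(position, ASSUMED_ORF_START[orf])
    status = "agrees" if codon == residue else "differs"
    print(f"{orf}: position {position} -> codon {codon}, base {base} ({status})")

# The second change in ORF-S (position 24872) is reported as silent.
print("S 24872 ->", codon_of(24872, ASSUMED_ORF_START["S"]))
```

Under these assumed start coordinates the three amino acid changes fall on the first base of their codon, while the silent change in ORF-S falls on a third base, which is where synonymous substitutions are most common.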
[0282] The plasmid, called SARS-SE, was deposited under the No. I-3126, on Nov. 13, 2003, at the Collection Nationale de Cultures de Microorganismes, 25 rue du Docteur Roux, 75724 Paris Cedex 15; it contains the cDNA corresponding to the region situated between ORF-S and ORF-E and overlapping ORF-E of the SARS-CoV strain derived from the sample recorded under the No. 031589, as defined above, said region corresponding to the nucleotides at positions 25110 to 26244 (SEQ ID NO: 8), with reference to the Genbank sequence accession No. AY274119.3.
[0283] The sequence of the amplicon representing the cDNA corresponding to the region containing ORF7 to ORF11 (SEQ ID NO: 19) of the SARS-CoV strain derived from the sample No. 031589 does not contain differences in relation to the corresponding sequences of the isolates AY274119-Tor2 and AY278741-Urbani. The sequences of ORF7 to 11 of the SARS-CoV strain derived from the sample No. 031589 correspond respectively to the sequences SEQ ID NO: 22, 24, 26, 28 and 30 in the sequence listing appended as an annex.
[0284] The plasmid, called SARS-MN, was deposited under the No. I-3125, on Nov. 13, 2003, at the Collection Nationale de Cultures de Microorganismes, 25 rue du Docteur Roux, 75724 Paris Cedex 15; it contains the cDNA sequence corresponding to the region situated between ORF-M and ORF-N of the SARS-CoV strain derived from the sample recorded under the No. 031589 and collected in Hanoi, as defined above, said sequence corresponding to the nucleotides at positions 26977 to 28218 (SEQ ID NO: 20), with reference to the Genbank sequence accession No. AY274119.3.
2.4) cDNA Encoding the N Protein and Including ORF13 and ORF14
[0286] The cDNA was synthesized and amplified as described above for the fragments Sa and Sb. More specifically, the reaction mixture containing: 5 μl of RNA, 5 μl of H2O for injection, 4 μl of 5× reverse transcriptase buffer, 2 μl of dNTP (5 mM), 2 μl of oligo 20T (5 μM), 0.5 μl of RNasin (40 IU/μl) and 1.5 μl of AMV-RT (10 IU/μl Promega) was incubated in a thermocycler under the following conditions: 45 min at 42° C., 15 min at 55° C., 5 min at 95° C., and it was then kept at +4° C.
[0287] A first PCR amplification was performed with the pair of primers S/N/F3/+/28023 and S/N/R3/-/29480.
[0288] The reaction mixture as above for the amplification of the S1 and S2 fragments was incubated in a thermocycler, under the following conditions: an initial step of denaturation at 94° C. for 2 min was followed by 40 cycles comprising a step of denaturation at 94° C. for 20 sec, a step of annealing at 55° C. for 30 sec and then a step of extension at 72° C. for 1 min 30 sec with 10 sec of additional extension at each cycle, and then a final step of extension at 72° C. for 5 min.
[0289] The amplicon obtained at the first PCR amplification was subjected to a second PCR amplification step (nested PCR) with the pair of primers S/N/F4/+/28054 and S/N/R4/-/29430 under conditions identical to those of the first amplification.
[0290] The amplification product obtained, corresponding to the cDNA encoding the N protein of the SARS-CoV strain derived from the sample No. 031589, was sequenced with the aid of the primers: S/N/F4/+/28054, S/N/R4/-/29430, S/N/+/28468, S/N/+/28918 and S/N/-/28607 and cloned as above for the other ORFs, to give the plasmid called SARS-N. The DNA of these clones was isolated and sequenced with the aid of the universal primers M13 sense and M13 antisense, and the primers S/N/+/28468, S/N/+/28918 and S/N/-/28607.
[0291] The sequence of the amplicon representing the cDNA corresponding to ORF-N and including ORF13 and ORF14 (SEQ ID NO: 36) of the SARS-CoV strain derived from the sample No. 031589 does not contain differences in relation to the corresponding sequences of the isolates AY274119.3-Tor2 and AY278741-Urbani. The sequence of the N protein of the SARS-CoV strain derived from the sample No. 031589 corresponds to the sequence SEQ ID NO: 37 in the sequence listing appended as an annex.
[0292] The sequences of ORF13 and 14 of the SARS-CoV strain derived from the sample No. 031589 correspond respectively to the sequences SEQ ID NO: 32 and 34 in the sequence listing appended as an annex.
[0293] The plasmid, called SARS-N, was deposited under the No. I-3048, on Jun. 5, 2003, at the Collection Nationale de Cultures de Microorganismes, 25 rue du Docteur Roux, 75724 Paris Cedex 15; it contains the cDNA encoding the N protein of the SARS-CoV strain derived from the sample recorded under the No. 031589, as defined above, said sequence corresponding to the nucleotides at positions 28054 to 29430 (SEQ ID NO: 38), with reference to the Genbank sequence accession No. AY274119.3.
2.5) Noncoding 5' and 3' Ends
a) Noncoding 5' End (5'NC)
[0294] a1) Synthesis of the cDNA
[0295] The RNAs derived from the sample 031589, extracted as above, were subjected to reverse transcription under the following conditions:
[0296] The RNA (15 μl) and the primer S/L/-/443 (3 μl at the concentration of 5 μM) were incubated for 10 min at 75° C.
[0297] Next, the 5× reverse transcriptase buffer (6 μl, INVITROGEN), 10 mM dNTP (1 μl), 0.1 M DTT (3 μl) were added and the mixture was incubated at 50° C. for 3 min.
[0298] Finally, the reverse transcriptase (3 μl of Superscript®, INVITROGEN) was added to the preceding mixture which was incubated at 50° C. for 1 h 30 min and then at 90° C. for 2 min.
[0299] The cDNA thus obtained was purified with the aid of the QIAquick PCR purification kit (QIAGEN), according to the manufacturer's recommendations.
b1) Terminal Transferase Reaction (TdT)
[0300] The cDNA (10 μl) is incubated for 2 min at 100° C., stored in ice, and the following are then added: H2O (2.5 μl), 5× TdT buffer (4 μl, AMERSHAM), 5 mM dATP (2 μl) and TdT (1.5 μl, AMERSHAM). The mixture thus obtained is incubated for 45 min at 37° C. and then for 2 min at 65° C.
[0301] The product obtained is amplified by a first PCR reaction with the aid of the primers: S/L/-/225-206 and anchor 14T: 5'-AGATGAATTCGGTACCTTTTTTTTTTTTTT-3' (SEQ ID NO: 68). The amplification conditions are the following: an initial step of denaturation at 94° C. for 2 min is followed by 10 cycles comprising a step of denaturation at 94° C. for 10 sec, a step of annealing at 45° C. for 30 sec and then a step of extension at 72° C. for 30 sec and then by 30 cycles comprising a step of denaturation at 94° C. for 10 sec, a step of annealing at 50° C. for 30 sec and then a step of extension at 72° C. for 30 sec, and then a final step of extension at 72° C. for 5 min.
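The role of the anchor primer can be made explicit with a few lines of code: its 3' stretch of thymidines is what pairs with the poly(A) tail added to the cDNA by the terminal transferase in the presence of dATP. The sketch below builds the anchor from its 16-nt 5' part plus 14 T, assuming the tail is exactly 14 thymidines as the name "14T" suggests. It also locates the GAATTC (EcoRI) and GGTACC (KpnI) recognition sequences present in the 5' part; the text does not say whether these sites were used for cloning.

```python
# Anchor primer, 5'->3'; the 14-T tail length is assumed from the name "14T".
ANCHOR_14T = "AGATGAATTCGGTACC" + "T" * 14

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Sequence able to pair with a 14-nt poly(A) tail added by TdT with dATP.
pairs_with_tail = reverse_complement("A" * 14)
tail_matches = ANCHOR_14T.endswith(pairs_with_tail)

# Restriction enzyme recognition sequences found in the 5' part of the anchor.
sites = {"EcoRI": "GAATTC", "KpnI": "GGTACC"}
found = {name: ANCHOR_14T.find(site) for name, site in sites.items()}

print("length:", len(ANCHOR_14T))
print("3' end pairs with poly(A):", tail_matches)
print("site offsets (0-based):", found)
```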
[0302] The product of the first PCR amplification was subjected to a second amplification step with the aid of the primers: S/L/-/204-185 and anchor 14T mentioned above under conditions identical to those of the first amplification. The amplicon thus obtained was purified, sequenced with the aid of the primer S/L/-/182-163 and it was then cloned as above for the different ORFs, to give the plasmid called SARS-5'NC. The DNA of this clone was isolated and sequenced with the aid of the universal primers M13 sense and M13 antisense and the primer S/L/-/182-163 mentioned above.
[0303] The amplicon representing the cDNA corresponding to the 5'NC end of the SARS-CoV strain derived from the sample recorded under the No. 031589 corresponds to the sequence SEQ ID NO: 72 in the sequence listing appended as an annex; this sequence does not contain differences in relation to the corresponding sequences of the isolates AY274119.3-Tor2 and AY278741-Urbani.
[0304] The plasmid, called SARS-5'NC, was deposited under the No. I-3124, on Nov. 7, 2003, at the Collection Nationale de Cultures de Microorganismes, 25 rue du Docteur Roux, 75724 Paris Cedex 15; it contains the cDNA corresponding to the noncoding 5' end of the genome of the SARS-CoV strain derived from the sample recorded under the No. 031589, as defined above, said sequence corresponding to the nucleotides at positions 1 to 204 (SEQ ID NO: 39), with reference to the Genbank sequence accession No. AY274119.3.
b) Noncoding 3' End (3'NC)
[0305] a1) Synthesis of the cDNA
[0306] The RNAs derived from the sample 031589, extracted as above, were subjected to reverse transcription, according to the following protocol: the reaction mixture containing: RNA (5 μl), H2O (5 μl), 5× reverse transcriptase buffer (4 μl), 5 mM dNTP (2 μl), 5 μM Oligo 20T (2 μl), 40 U/μl RNasin (0.5 μl) and 10 IU/μl RT-AMV (1.5 μl, PROMEGA) was incubated in a thermocycler, under the following conditions: 45 min at 42° C., 15 min at 55° C., 5 min at 95° C., and it was then kept at +4° C.
[0307] The cDNA obtained was amplified by a first PCR reaction with the aid of the primers S/N/+/28468-28487 and anchor 14T mentioned above. The amplification conditions are the following: an initial step of denaturation at 94° C. for 2 min is followed by 10 cycles comprising a step of denaturation at 94° C. for 20 sec, a step of annealing at 45° C. for 30 sec and then a step of extension at 72° C. for 50 sec and then 30 cycles comprising a step of denaturation at 94° C. for 20 sec, a step of annealing at 50° C. for 30 sec and then a step of extension at 72° C. for 50 sec, and then a final step of extension at 72° C. for 5 min.
[0308] The product of the first PCR amplification was subjected to a second amplification step with the aid of the primers S/N/+/28933-28952 and anchor 14T mentioned above, under conditions identical to those of the first amplification. The amplicon thus obtained was purified, sequenced with the aid of the primer S/N/+/29257-29278 and cloned as above for the different ORFs, to give the plasmid called SARS-3'NC. The DNA of this clone was isolated and sequenced with the aid of the universal primers M13 sense and M13 antisense and the primer S/N/+/29257-29278 mentioned above.
[0309] The amplicon representing the cDNA corresponding to the 3'NC end of the SARS-CoV strain derived from the sample recorded under the No. 031589 corresponds to the sequence SEQ ID NO: 73 in the sequence listing appended as an annex; this sequence does not contain differences in relation to the corresponding sequences of the isolates AY274119.3-Tor2 and AY278741-Urbani.
[0310] The plasmid called SARS-3'NC was deposited under the No. I-3123 on Nov. 7, 2003, at the Collection Nationale de Cultures de Microorganismes, 25 rue du Docteur Roux, 75724 Paris Cedex 15; it contains the cDNA sequence corresponding to the noncoding 3' end of the genome of the SARS-CoV strain derived from the sample recorded under the No. 031589, as defined above; said sequence, which corresponds to that situated between the nucleotides at positions 28933 and 29727 (SEQ ID NO: 40), with reference to the Genbank sequence accession No. AY274119.3, ends with a series of nucleotides a.
2.6) ORF1a and ORF1b
[0311] The amplification of the 5' region containing ORF1a and ORF1b of the SARS-CoV genome derived from the sample 031589 was performed by carrying out RT-PCR reactions followed by nested PCRs according to the same principles as those described above for the other ORFs. The amplified fragments overlap over several tens of bases, thus allowing computer reconstruction of the complete sequence of this part of the genome. On average, the amplified fragments are about two kilobases long.
[0312] 14 overlapping fragments, called L0 to L12, were thus amplified with the aid of the following primers:
TABLE-US-00002 TABLE II Primers used for the amplification of the 5' region (ORF1a and ORF1b)
Fragment | RT-PCR sense primer | RT-PCR antisense primer | Nested PCR sense primer | Nested PCR antisense primer | Region amplified and sequenced (does not include the primers)
L0 | S/L0/F1/+30 | S/L0/R1/-481 | - | - | 50-480
L1 | S/L1/F1/+147 | S/L1/R1/-2336 | S/L1/F2/+211 | S/L1/R2/-2241 | 231-2240
L2 | S/L2/F1/+2033 | S/L2/R1/-4192 | S/L2/F2/+2136 | S/L2/R2/-4168 | 2156-4157
L3 | S/L3bis/F1/+3850 | S/L3bis/R1/-5365 | S/L3bis/F2/+3892 | S/L3bis/R2/-5325 | 3913-5324
L4b | S/L4b/F1/+4878 | S/L4b/R1/-6061 | S/L4b/F2/+4932 | S/L4b/R2/-6024 | 4952-6023
L4 | S/L4/F1/+5272 | S/L4/R1/-7392 | S/L4/F2/+5305 | S/L4/R2/-7323 | 5325-7318
L5 | S/L5/F1/+7111 | S/L5/R1/-9253 | S/L5/F2/+7275 | S/L5/R2/-9157 | 7296-9156
L6 | S/L6/F1/+8975 | S/L6/R1/-11151 | S/L6/F2/+9032 | S/L6/R2/-11067 | 9053-11066
L7 | S/L7/F1/+10883 | S/L7/R1/-13050 | S/L7/F2/+10928 | S/L7/R2/-12963 | 10928-12962
L8 | S/L8/F1/+12690 | S/L8/R1/-14857 | S/L8/F2/+12815 | S/L8/R2/-14835 | 12835-14834
L9 | S/L9/F1/+14688 | S/L9/R1/-16678 | S/L9/F2/+14745 | S/L9/R2/-16625 | 14765-16624
L10 | S/L10/F1/+16451 | S/L10/R1/-18594 | S/L10/F2/+16514 | S/L10/R2/-18571 | 16534-18570
L11 | S/L11/F1/+18441 | S/L11/R1/-20612 | S/L11/F2/+18500 | S/L11/R2/-20583 | 18521-20582
L12 | S/L12/F1/+20279 | S/L12/R1/-22229 | S/L12/F2/+20319 | S/L12/R2/-22206 | 20338-22205
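The computer reconstruction mentioned above relies on each sequenced region overlapping the next one. As an illustrative sketch only (the function names `overlaps` and `is_contiguous` are hypothetical and are not part of the described method), the overlaps can be checked from the region coordinates listed in Table II:

```python
# Sequenced regions from Table II (nucleotide positions, primers excluded),
# listed in the order in which they appear in the table.
REGIONS = [
    ("L0", 50, 480), ("L1", 231, 2240), ("L2", 2156, 4157),
    ("L3", 3913, 5324), ("L4b", 4952, 6023), ("L4", 5325, 7318),
    ("L5", 7296, 9156), ("L6", 9053, 11066), ("L7", 10928, 12962),
    ("L8", 12835, 14834), ("L9", 14765, 16624), ("L10", 16534, 18570),
    ("L11", 18521, 20582), ("L12", 20338, 22205),
]


def overlaps(regions):
    """Return (left, right, shared_bases) for each pair of consecutive regions."""
    result = []
    for (name_a, _, end_a), (name_b, start_b, _) in zip(regions, regions[1:]):
        # Positions are inclusive, hence the +1.
        result.append((name_a, name_b, end_a - start_b + 1))
    return result


def is_contiguous(regions):
    """True when every consecutive pair shares at least one base."""
    return all(shared > 0 for _, _, shared in overlaps(regions))


if __name__ == "__main__":
    for left, right, shared in overlaps(REGIONS):
        print(f"{left}/{right}: {shared} shared bases")
    print("contiguous:", is_contiguous(REGIONS))
```

With these coordinates the shortest overlap is the one between L4 and L5, and the regions tile positions 50 to 22205 without a gap.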
[0313] All the fragments were amplified under the following conditions, except fragment L0 which was amplified as described above for ORF-M:
[0314] RT-PCR: 30 min at 42° C., 15 min at 55° C., 2 min at 94° C., and then the cDNA obtained is amplified under the following conditions: 40 cycles comprising: a step of denaturation at 94° C. for 15 sec, a step of annealing at 58° C. for 30 sec and then a step of extension at 68° C. for 1 min 30 sec, with 5 sec of additional extension at each cycle, and then a final step of extension at 68° C. for 7 min.
[0315] Nested PCR: An initial step of denaturation at 94° C. for 2 min is followed by 35 cycles comprising: a step of denaturation at 94° C. for 15 sec, a step of annealing at 60° C. for 30 sec and then a step of extension at 72° C. for 1 min 30 sec, with 5 sec of additional extension at each cycle, and then a final step of extension at 72° C. for 7 min.
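The effect of the 5 sec of additional extension at each cycle can be made concrete with a short calculation. The sketch below is illustrative only; it assumes that the first cycle uses the nominal extension time of 1 min 30 sec and that the increment is applied from the second cycle onward (the text does not specify this), and it ignores temperature ramp times. The function names are hypothetical.

```python
def extension_seconds(cycle, base=90, increment=5):
    """Extension time of a given cycle (1-based).

    Assumption: cycle 1 uses the base time and each later cycle adds `increment`.
    """
    return base + increment * (cycle - 1)


def cycling_seconds(cycles, denature=15, anneal=30, base=90, increment=5):
    """Total hold time of the cycling stage, ramp times excluded."""
    return sum(
        denature + anneal + extension_seconds(c, base, increment)
        for c in range(1, cycles + 1)
    )


if __name__ == "__main__":
    # RT-PCR stage: 40 cycles; nested PCR stage: 35 cycles.
    print("RT-PCR last-cycle extension (s):", extension_seconds(40))
    print("RT-PCR cycling time (s):", cycling_seconds(40))
    print("Nested PCR last-cycle extension (s):", extension_seconds(35))
    print("Nested PCR cycling time (s):", cycling_seconds(35))
```

Under these assumptions the extension step grows from 90 sec to 285 sec over the 40 RT-PCR cycles, which is why the cycling stage lasts well over two hours before the final extension.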
[0316] The amplification products were sequenced with the aid of the primers defined in table III below:
TABLE-US-00003 TABLE III Primers used for the sequencing of the 5' region (ORF1a and ORF1b)
Names | Sequences (SEQ ID NO: 76 to 139)
S/L3/+/4932 | 5'-CCACACACAGCTTGTGGATA-3'
S/L4/+/6401 | 5'-CCGAAGTTGTAGGCAATGTC-3'
S/L4/+/6984 | 5'-TTTGGTGCTCCTTCTTATTG-3'
S/L4/-/6817 | 5'-CCGGCATCCAAACATAATTT-3'
S/L5/-/7633 | 5'-TGGTCAGTAGGGTTGATTGG-3'
S/L5/-/8127 | 5'-CATCCTTTGTGTCAACATCG-3'
S/L5/-/8633 | 5'-GTCACGAGTGACACCATCCT-3'
S/L5/+/7839 | 5'-ATGCGACGAGTCTGCTTCTA-3'
S/L5/+/8785 | 5'-TTCATAGTGCCTGGCTTACC-3'
S/L5/+/8255 | 5'-ATCTTGGCGCATGTATTGAC-3'
S/L6/-/9422 | 5'-TGCATTAGCAGCAACAACAT-3'
S/L6/-/9966 | 5'-TCTGCAGAACAGCAGAAGTG-3'
S/L6/-/10542 | 5'-CCTGTGCAGTTTGTCTGTCA-3'
S/L6/+/10677 | 5'-CCTTGTGGCAATGAAGTACA-3'
S/L6/+/10106 | 5'-ATGTCATTTGCACAGCAGAA-3'
S/L6/+/9571 | 5'-CTTCAATGGTTTGCCATGTT-3'
S/L7/-/11271 | 5'-TGCGAGCTGTCATGAGAATA-3'
S/L7/-/11801 | 5'-AACCGAGAGCAGTACCACAG-3'
S/L7/-/12383 | 5'-TTTGGCTGCTGTAGTCAATG-3'
S/L7/+/12640 | 5'-CTACGACAGATGTCCTGTGC-3'
S/L7/+/12088 | 5'-GAGCAGGCTGTAGCTAATGG-3'
S/L7/+/11551 | 5'-TTAGGCTATTGTTGCTGCTG-3'
S/L8/-/13160 | 5'-CAGACAACATGAAGCACCAC-3'
S/L8/-/13704 | 5'-CGCTGACGTGATATATGTGG-3'
S/L8/-/14284 | 5'-TGCACAATGAAGGATACACC-3'
S/L8/+/14453 | 5'-ACATAGCTCGCGTCTCAGTT-3'
S/L8/+/13968 | 5'-GGCATTGTAGGCGTACTGAC-3'
S/L8/+/13401 | 5'-GTTTGCGGTGTAAGTGCAG-3'
S/L9/-/15098 | 5'-TAGTGGCGGCTATTGACTTC-3'
S/L9/-/15677 | 5'-CTAAACCTTGAGCCGCATAG-3'
S/L9/-/16247 | 5'-CATGGTCATAGCAGCACTTG-3'
S/L9/+/16323 | 5'-CCAGGTTGTGATGTCACTGAT-3'
S/L9/+/15858 | 5'-CCTTACCCAGATCCATCAAG-3'
S/L9/+/15288 | 5'-CGCAAACATAACACTTGCTG-3'
S/L10/-/16914 | 5'-AGTGTTGGGTACAAGCCAGT-3'
S/L10/-/17466 | 5'-GTTCCAAGGAACATGTCTGG-3'
S/L10/-/18022 | 5'-AGGTGCCTGTGTAGGATGAA-3'
S/L10/+/18245 | 5'-GGGCTGTCATGCAACTAGAG-3'
S/L10/+/17663 | 5'-TCTTACACGCAATCCTGCTT-3'
S/L10/+/17061 | 5'-TACCCATCTGCTCGCATAGT-3'
S/L11/-/18877 | 5'-GCAAGCAGAATTAACCCTCA-3'
S/L11/-/19396 | 5'-AGCACCACCTAAATTGCATC-3'
S/L11/-/20002 | 5'-TGGTCCCTTTGAAGGTGTTA-3'
S/L11/+/20245 | 5'-TCGAACACATCGTTTATGGA-3'
S/L11/+/19611 | 5'-GAAGCACCTGTTTCCATCAT-3'
S/L11/+/19021 | 5'-ACGATGCTCAGCCATGTAGT-3'
SARS/L1/F3/+800 | 5'-GAGGTGCAGTCACTCGCTAT-3'
SARS/L1/F4/+1391 | 5'-CAGAGATTGGACCTGAGCAT-3'
SARS/L1/F5/+1925 | 5'-CAGCAAACCACTCAATTCCT-3'
SARS/L1/R3/-1674 | 5'-AAATGATGGCAACCTCTTCA-3'
SARS/L1/R4/-1107 | 5'-CACGTGGTTGAATGACTTTG-3'
SARS/L1/R5/-520 | 5'-ATTTCTGCAACCAGCTCAAC-3'
SARS/L2/F3/+2664 | 5'-CGCATTGTCTCCTGGTTTAC-3'
SARS/L2/F4/+3232 | 5'-GAGATTGAGCCAGAACCAGA-3'
SARS/L2/F5/+3746 | 5'-ATGAGCAGGTTGTCATGGAT-3'
SARS/L2/R3/-3579 | 5'-CTGCCTTAAGAAGCTGGATG-3'
SARS/L2/R4/-2991 | 5'-TTTCTTCACCAGCATCATCA-3'
SARS/L2/R5/-2529 | 5'-CACCGTTCTTGAGAACAACC-3'
SARS/L3/F3/+4708 | 5'-TCTTTGGCTGGCTCTTACAG-3'
SARS/L3/F4/+5305 | 5'-GCTGGTGATGCTGCTAACTT-3'
SARS/L3/F5/+5822 | 5'-CCATCAAGCCTGTGTCGTAT-3'
SARS/L3/R3/-5610 | 5'-CAGGTGGTGCAGACATCATA-3'
SARS/L3/R4/-4988 | 5'-AACATCAGCACCATCCAAGT-3'
SARS/L3/R5/-4437 | 5'-ATCGGACACCATAGTCAACG-3'
[0317] The sequences of the fragments L0 to L12 of the SARS-CoV strain derived from the sample recorded under the No. 031589 correspond respectively to the sequences SEQ ID NO: 41 to SEQ ID NO: 54 in the sequence listing appended as an annex. Among these sequences, only that corresponding to fragment L5 contains a nucleotide difference in relation to the corresponding sequence of the isolate AY278741-Urbani. This t/c mutation at position 7919 results in a modification of the amino acid sequence of the corresponding protein, encoded by ORF1a: at position 2552, a valine (gtt codon; AY278741) is changed to alanine (gct codon) in the SARS-CoV strain 031589. By contrast, no mutation was identified in relation to the corresponding sequence of the isolate AY274119.3-Tor2. The other fragments do not exhibit differences in relation to the corresponding sequences of the isolates Tor2 and Urbani.
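The relationship between the nucleotide position 7919 and the amino acid position 2552 can be illustrated with a short calculation. The sketch below is not part of the described method; it assumes that ORF1a starts at nucleotide 265 of the reference genome, a value taken here from the public annotation of the Tor2 isolate and not stated in the present text, and the function names are hypothetical.

```python
# Assumption (not stated in the text): first nucleotide of the ORF1a start
# codon in the reference genome numbering.
ORF1A_START = 265

# Only the two codons discussed in the text are needed for this illustration.
CODON_TO_AA = {"gtt": "V", "gct": "A"}


def locate_in_orf(genome_pos, orf_start=ORF1A_START):
    """Return (codon_number, position_in_codon), both 1-based."""
    offset = genome_pos - orf_start
    return offset // 3 + 1, offset % 3 + 1


def substitute(codon, pos_in_codon, base):
    """Return the codon with the base at the 1-based position replaced."""
    return codon[: pos_in_codon - 1] + base + codon[pos_in_codon:]


if __name__ == "__main__":
    codon_number, pos_in_codon = locate_in_orf(7919)
    mutated = substitute("gtt", pos_in_codon, "c")
    print(codon_number, pos_in_codon)  # codon 2552, second base
    print(CODON_TO_AA["gtt"], "->", CODON_TO_AA[mutated])  # V -> A
```

Under this assumption, position 7919 falls on the second base of codon 2552, which is consistent with the gtt to gct change (valine to alanine) reported above.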
EXAMPLE 2
Production and Purification of the Recombinant N and S Proteins of the SARS-CoV Strain Derived from the Sample Recorded Under the Number 031589
[0318] The entire N protein and two polypeptide fragments of the S protein of the SARS-CoV strain derived from the sample recorded under the number 031589 were produced in E. coli, in the form of fusion proteins comprising an N- or C-terminal polyhistidine tag. In the two S polypeptides, the N- and C-terminal hydrophobic sequences of the S protein (signal peptide: positions 1 to 13 and transmembrane helix: positions 1196 to 1218) were deleted whereas the helix (positions 565 to 687) and the two motifs of the coiled-coil type (positions 895 to 980 and 1155 to 1186) of the S protein were preserved. These two polypeptides consist of: a long fragment (SL) corresponding to positions 14 to 1193 of the amino acid sequence of the S protein and a short fragment (SC) corresponding to positions 475 to 1193 of the amino acid sequence of the S protein.
1) Cloning of the cDNAs N, SL and SC into the Expression Vectors pIVEX2.3 and pIVEX2.4
[0319] The cDNAs corresponding to the N protein and to the SL and SC fragments were amplified by PCR under standard conditions, with the aid of the DNA polymerase Platinum Pfx® (INVITROGEN). The plasmids SARS-N and SARS-S were used as template and the following oligonucleotides as primers:
TABLE-US-00004
(N sense, SEQ ID NO: 55) 5'-CCCATATGTCTGATAATGGACCCCAATCAAAC-3'
(N antisense, SEQ ID NO: 56) 5'-CCCCCGGGTGCCTGAGTTGAATCAGCAGAAGC-3'
(SC sense, SEQ ID NO: 57) 5'-CCCATATGAGTGACCTTGACCGGTGCACCAC-3'
(SL sense, SEQ ID NO: 58) 5'-CCCATATGAAACCTTGCACCCCACCTGCTC-3'
(SC and SL antisense, SEQ ID NO: 29) 5'-CCCCCGGGTTTAATATATTGCTCATATTTTCCC-3'.
[0320] The sense primers introduce an NdeI site (underlined) while the antisense primers introduce an XmaI or SmaI site (underlined). The 3 amplification products were column purified (QIAquick PCR Purification kit, QIAGEN) and cloned into an appropriate vector. The plasmid DNA purified from the 3 constructs (QIAFilter Midi Plasmid kit, QIAGEN) was verified by sequencing and digested with the enzymes NdeI and XmaI. The 3 fragments corresponding to the cDNAs N, SL and SC were purified on agarose gel and then inserted into the plasmids pIVEX2.3MCS(C-terminal polyhistidine tag) and pIVEX2.4d (N-terminal polyhistidine tag) digested beforehand with the same enzymes. After verification of the constructs, the 6 expression vectors thus obtained (pIV2.3N, pIV2.3SC, pIV2.3SL, pIV2.4N, pIV2.4SC also called pIV2.4S1, pIV2.4SL) were then used, on the one hand to test the expression of the proteins in vitro, and on the other hand to transform the bacterial strain BL21(DE3)pDIA17 (NOVAGEN). These constructs encode proteins whose expected molecular mass is the following: pIV2.3N (47174 Da), pIV2.3SC (82897 Da), pIV2.3SL (132056 Da), pIV2.4N (48996 Da), pIV2.4S1 (81076 Da) and pIV2.4SL (133877 Da). Bacteria transformed with pIV2.3N were deposited at the CNCM on Oct. 23, 2003, under the number I-3117, and bacteria transformed with pIV2.4S1 were deposited at the CNCM on Oct. 23, 2003, under the number I-3118.
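Because the underlining of the restriction sites is not visible in plain text, the sites can be located programmatically. The following sketch is illustrative only: it uses the standard recognition sequences of NdeI (CATATG) and of XmaI/SmaI (CCCGGG), takes the primer sequences and labels exactly as listed above, and uses hypothetical helper names.

```python
NDEI_SITE = "CATATG"       # NdeI recognition sequence
XMAI_SMAI_SITE = "CCCGGG"  # shared by XmaI and SmaI

# Primer sequences as listed above (written 5' to 3').
PRIMERS = {
    "N sense": "CCCATATGTCTGATAATGGACCCCAATCAAAC",
    "N antisense": "CCCCCGGGTGCCTGAGTTGAATCAGCAGAAGC",
    "SC sense": "CCCATATGAGTGACCTTGACCGGTGCACCAC",
    "SL sense": "CCCATATGAAACCTTGCACCCCACCTGCTC",
    "SC and SL antisense": "CCCCCGGGTTTAATATATTGCTCATATTTTCCC",
}


def expected_site(primer_name):
    """Sense primers carry the NdeI site, antisense primers the XmaI/SmaI site."""
    return XMAI_SMAI_SITE if "antisense" in primer_name else NDEI_SITE


def locate_sites(primers):
    """Map each primer name to the 0-based index of its site (-1 if absent)."""
    return {name: seq.find(expected_site(name)) for name, seq in primers.items()}


if __name__ == "__main__":
    for name, index in locate_sites(PRIMERS).items():
        print(f"{name}: {expected_site(name)} at index {index}")
```

In each primer the site begins at the third nucleotide, that is, immediately after a two-base 5' extension.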
2) Analysis of the Expression of the Recombinant Proteins In Vitro and In Vivo
[0326] The expression of recombinant proteins from the 6 recombinant vectors was first tested in an in vitro system (RTS100, Roche). The proteins produced in vitro, after incubation of the recombinant vectors pIVEX for 4 h at 30° C., in the RTS100 system, were analyzed by Western blotting with the aid of an anti-(his)6 antibody coupled to peroxidase. The result of expression in vitro (FIG. 1) shows that only the N protein is expressed in large quantities, regardless of the position, N- or C-terminal, of the polyhistidine tag. In a second step, the expression of the N and S proteins was tested in vivo at 30° C. in LB medium in the presence or in the absence of inducer (1 mM IPTG). The N protein is very well produced in this bacterial system (FIG. 2) and is found mainly in the soluble fraction after lysis of the bacteria. By contrast, the long version of S (SL) is very weakly produced and is completely insoluble (FIG. 3). The short version (SC) also exhibits a very weak solubility, but an expression level that is much higher than that of the long version. Moreover, the construct SC fused with a polyhistidine tag at the C-terminal position has a smaller size than that expected. An immunodetection experiment with an anti-polyhistidine antibody has shown that this construct was incomplete. In conclusion, the two constructs, pIV2.3N and pIV2.4S1, which express respectively the entire N protein fused with the C-terminal polyhistidine tag and the short S protein fused with the N-terminal polyhistidine tag, were selected in order to produce the two proteins in a large quantity so as to purify them. The plasmids pIV2.3N and pIV2.4S1 were deposited respectively under the No. I-3117 and I-3118 at the CNCM, 25 rue du Docteur Roux, 75724 PARIS 15, on Oct. 23, 2003.
3) Analysis of the Antigenic Activity of the Recombinant Proteins
[0327] The antigenic activity of the N, SL and SC proteins was tested by Western blotting with the aid of two serum samples, obtained from the same patient infected with SARS-CoV, collected 8 days (M12) and 29 days (M13) after the onset of the SARS symptoms. The experimental protocol is as described in example 3. The results illustrated by FIG. 4 show (i) the seroconversion of the patient, and (ii) that the N protein possesses a higher antigenic reactivity than the short S protein.
4) Purification of the N Protein from pIV2.3N
[0328] Several experiments for purifying the N protein, produced from the vector pIV2.3N, were carried out according to the following protocol. The bacteria BL21(DE3)pDIA17, transformed with the expression vector pIV2.3N, were cultured at 30° C. in 1 liter of culture medium containing 0.1 mg/ml of ampicillin, and induced with 1 mM IPTG when the cell density equivalent to A600=0.8 was reached (about 3 hours). After 2 hours of culture in the presence of inducer, the cells were recovered by centrifugation (10 min at 5000 rpm), resuspended in the lysis buffer (50 mM NaH2PO4, 0.3 M NaCl, 20 mM imidazole, pH 8, containing the mixture of protease inhibitors Complete®, Roche), and lysed with the French press (12 000 psi). After centrifugation of the bacterial lysate (15 min at 12 000 rpm), the supernatant (50 ml) was loaded at a flow rate of 1 ml/min on a metal chelation column (15 ml) (Ni-NTA superflow, Qiagen), equilibrated with the lysis buffer. After washing the column with 200 ml of lysis buffer, the N protein was eluted with an imidazole gradient (20→250 mM) in 10 column volumes. The fractions containing the N protein were pooled and analyzed by polyacrylamide gel electrophoresis under denaturing conditions followed by staining with Coomassie blue. The results illustrated by FIG. 5 show that the protocol used makes it possible to purify the N protein with a very satisfactory homogeneity (95%) and a mean yield of 15 mg of protein per liter of culture.
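The elution gradient can be expressed as a function of the eluted volume. The sketch below is illustrative only and assumes that the gradient is linear, which the text does not state; the column volume (15 ml), the gradient length (10 column volumes) and the imidazole concentrations (20 and 250 mM) are taken from the protocol above, and the function name is hypothetical.

```python
COLUMN_VOLUME_ML = 15.0      # Ni-NTA column volume given in the protocol
GRADIENT_COLUMN_VOLUMES = 10
START_MM, END_MM = 20.0, 250.0


def imidazole_mM(eluted_ml):
    """Imidazole concentration after `eluted_ml` of gradient.

    Assumption: the gradient is linear over its whole length.
    """
    total_ml = COLUMN_VOLUME_ML * GRADIENT_COLUMN_VOLUMES
    if not 0.0 <= eluted_ml <= total_ml:
        raise ValueError("volume outside the gradient")
    return START_MM + (END_MM - START_MM) * eluted_ml / total_ml


if __name__ == "__main__":
    for volume in (0, 37.5, 75, 112.5, 150):
        print(f"{volume:6.1f} ml -> {imidazole_mM(volume):6.1f} mM imidazole")
```

Under the linearity assumption the gradient spans 150 ml, and the imidazole concentration rises by about 1.5 mM per milliliter eluted.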
5) Purification of the SC Protein from pIV2.4SC (pIV2.4S1)
[0329] The protocol followed for purifying the short S protein is very different from that described above because the protein is highly aggregated in the bacterial system (inclusion bodies). The bacteria BL21(DE3)pDIA17, transformed with the expression vector pIV2.4S1, were cultured at 30° C. in 1 liter of culture medium containing 0.1 mg/ml of ampicillin, and induced with 1 mM IPTG when the cell density equivalent to A600=0.8 is reached (about 3 hours). After 2 hours of culture in the presence of inducer, the cells were recovered by centrifugation (10 min at 5000 rpm), resuspended in the lysis buffer (0.1 M Tris-HCl, 1 mM EDTA, pH 7.5), and lysed with the French press (1200 psi). After centrifugation of the bacterial lysate (15 min at 12 000 rpm), the pellet was resuspended in 25 ml of lysis buffer containing 2% Triton X100 and 10 mM β-mercaptoethanol, and then centrifuged for 20 min at 12 000 rpm. The pellet was resuspended in 10 mM Tris-HCl buffer containing 7 M urea, and gently stirred for 30 min at room temperature. This final washing of the inclusion bodies with 7 M urea is necessary in order to remove most of the E. coli membrane proteins which co-sediment with the aggregated SC protein. After a final centrifugation for 20 min at 12 000 rpm, the final pellet is resuspended in the 10 mM Tris-HCl buffer. The electrophoretic analysis of this preparation (FIG. 6) shows that the short S protein may be purified with a satisfactory homogeneity (about 90%) from the inclusion bodies (insoluble extract).
EXAMPLE 3
Immunodominance of the N Protein
[0330] The reactivity of the antibodies present in the serum of patients suffering from atypical pneumopathy caused by the SARS-associated coronavirus (SARS-CoV), toward the various proteins of this virus, was analyzed by Western blotting under the conditions described below.
1) Materials
[0331] a) Lysate of Cells Infected with SARS-CoV
[0332] Vero E6 cells (2×10⁶) were infected with SARS-CoV (isolate recorded under the number FFM/MA104) at a multiplicity of infection (M.O.I.) of 10⁻¹ or 10⁻² and then incubated in DMEM medium containing 2% FCS, at 35° C. in an atmosphere containing 5% CO2. 48 hours later, the cell monolayer was washed with PBS and then lysed with 500 μl of loading buffer prepared according to Laemmli and containing β-mercaptoethanol. The samples were then boiled for 10 minutes and then sonicated 3 times for 20 seconds.
b) Antibodies
[0333] b1) Serum from a Patient Suffering from Atypical Pneumopathy
[0334] The serum designated by a reference at the National Reference Center for Influenza Viruses (Northern region) under the No. 20033168 is that from a French patient suffering from atypical pneumopathy caused by SARS-CoV collected on day 38 after the onset of the symptoms; the diagnosis of SARS-CoV infection was performed by nested RT-PCR and quantitative PCR.
b2) Monospecific Rabbit Polyclonal Sera Directed Against the N Protein or the S Protein
[0335] The sera are those produced from the recombinant N and SC proteins (example 2), according to the immunization protocol described in example 4; they are the rabbit P13097 serum (anti-N serum) and the rabbit P11135 serum (anti-S serum).
2) Method
[0336] 20 μl of lysate of cells infected with SARS-CoV at M.O.I. values of 10⁻¹ and 10⁻² and, as a control, 20 μl of a lysate of noninfected cells (mock) were separated on 10% SDS polyacrylamide gel and then transferred onto a nitrocellulose membrane. After blocking in a solution of PBS/5% milk/0.1% Tween and washing in PBS/0.1% Tween, this membrane was incubated overnight at 4° C. with: (i) the immune serum No. 20033168 diluted 1/300, 1/1000 and 1/3000 in the buffer PBS/1% BSA/0.1% Tween, (ii) the rabbit P13097 serum (anti-N serum) diluted 1/50 000 in the same buffer and (iii) the rabbit P11135 serum (anti-S serum) diluted 1/10 000 in the same buffer. After washing in PBS/Tween, a secondary incubation was performed with the aid of either sheep polyclonal antibodies directed against the heavy and light chains of human immunoglobulin G and coupled with peroxidase (NA933V, Amersham), or donkey polyclonal antibodies directed against the heavy and light chains of rabbit immunoglobulin G and coupled with peroxidase (NA934V, Amersham). The bound antibodies were visualized with the aid of the ECL+ kit (Amersham) and of Hyperfilm MP autoradiography films (Amersham). A molecular mass ladder (kDa) is presented in the figure.
3) Results
[0337] FIG. 7 shows that three polypeptides of apparent molecular mass 35, 55 and 200 kDa are specifically detected in the extracts of cells infected with SARS-CoV.
[0338] In order to identify these polypeptides, two other immunoblots (FIG. 8) were prepared on the same samples and under the same conditions with rabbit polyclonal antibodies specific for the nucleoprotein N (rabbit P13097, FIG. 8A) and for the spike protein S (rabbit P11135, FIG. 8B). This experiment shows that the 200 kDa polypeptide corresponds to the SARS-CoV spike glycoprotein S, that the 55 kDa polypeptide corresponds to the nucleoprotein N while the 35 kDa polypeptide probably represents a truncated or degraded form of N.
[0339] The data presented in FIG. 7 therefore show that the serum 20033168 reacts strongly with N and much more weakly with the SARS-CoV S, since the 35 and 55 kDa polypeptides are visualized in the form of intense bands for 1/300, 1/1000 and 1/3000 dilutions of the immune serum whereas the 200 kDa polypeptide is only weakly visualized for a dilution of 1/300. It is also possible to note that no other SARS-CoV polypeptide is detected for dilutions greater than 1/300 of the serum.
[0340] This experiment indicates that the antibody response specific for the SARS-CoV N dominates the antibody responses specific for the other SARS-CoV polypeptides and in particular the antibody response directed against the S glycoprotein. It indicates an immunodominance of the nucleoprotein N during human infections with SARS-CoV.
EXAMPLE 4
Preparation of Monospecific Polyclonal Antibodies Directed Against the SARS-Associated Coronavirus (SARS-CoV) N and S Proteins
1) Materials and Method
[0341] Three rabbits (P13097, P13081, P13031) were immunized with the purified recombinant polypeptide corresponding to the entire nucleoprotein (N), prepared according to the protocol described in example 2. After a first injection of 0.35 mg per rabbit of protein emulsified in complete Freund's adjuvant (intradermal route), the animals received 3 booster injections at 3 and then 4 weeks' interval, of 0.35 mg of recombinant protein emulsified in incomplete Freund's adjuvant.
[0342] Three rabbits (P11135, P13042, P14001) were immunized with the recombinant polypeptide corresponding to the short fragment of the S protein (SC) produced as described in example 2. As this polypeptide is found mainly in the form of inclusion bodies in the bacterial cytoplasm, the animals received 4 intradermal injections at 3-4 weeks' interval of a preparation of inclusion bodies corresponding to 0.5 mg of recombinant protein emulsified in incomplete Freund's adjuvant. The first 3 injections were made with a preparation of inclusion bodies prepared according to the protocol described in example 2, while the fourth injection was made with a preparation of inclusion bodies which were prepared according to the protocol described in example 2 and then purified on sucrose gradient and washed in 2% Triton X100.
[0343] For each rabbit, a preimmune (p.i.) serum was prepared before the first immunization and an immune serum (I.S.) 5 weeks after the fourth immunization.
[0344] In a first instance, the reactivity of the sera was analyzed by ELISA test on preparations of recombinant proteins similar to those used for the immunizations; the ELISA tests were carried out according to the protocol and with the reagents as described in example 6.
[0345] In a second instance, the reactivity of the sera was analyzed by preparing an immunoblot (Western blot) of a lysate of cells infected with SARS-CoV, according to the protocol as described in example 3.
2) Results
[0346] The ELISA tests (FIG. 9) demonstrate that the preparations of recombinant N protein and of inclusion bodies of the short fragment of the S protein (SC) are immunogenic in animals and that the titer of the immune sera is high (more than 1/25 000).
[0347] The immunoblot (FIG. 8) shows that the rabbit P13097 immune serum recognizes two polypeptides present in the lysates of cells infected with SARS-CoV: a polypeptide whose apparent molecular mass (50-55 kDa based on experiments) is compatible with that of the nucleoprotein N (422 residues, predicted molecular mass of 46 kDa) and a polypeptide of 35 kDa, which probably represents a truncated or degraded form of N.
[0348] This experiment also shows that the rabbit P11135 serum mainly recognizes a polypeptide whose apparent molecular mass (180-220 kDa based on experiments) is compatible with a glycosylated form of S (1255 residues, nonglycosylated polypeptide chain of 139 kDa), as well as lighter polypeptides, which probably represent truncated and/or nonglycosylated forms of S.
[0349] In conclusion, all these experiments demonstrate that the recombinant polypeptides expressed in E. coli and corresponding to the SARS-CoV N and S proteins make it possible to induce, in animals, polyclonal antibodies capable of recognizing the native forms of these proteins.
EXAMPLE 5
Preparation of Monospecific Polyclonal Antibodies Directed Against the SARS-Associated Coronavirus (SARS-CoV) M and E Proteins
1) Analysis of the Structure of the M and E Proteins
a) E Protein
[0350] The structure of the SARS-CoV E protein (76 amino acids) was analyzed in silico with the aid of various software packages such as signalP v1.1, NetNGlyc 1.0, TMHMM 1.0 and 2.0 (Krogh et al., 2001, J. Mol. Biol., 305(3):567-580) or alternatively TOPPRED (von Heijne, 1992, J. Mol. Biol. 225, 487-494). The analysis shows that this nonglycosylated polypeptide is a type 1 membrane protein containing a single transmembrane helix (aa 12-34 according to TMHMM), in which the majority of the hydrophilic domain (42 residues) is located at the C-terminal end and probably inside the viral particle (endodomain). The topology predicted by TMHMM is inverted between version 1.0 (N-ter is external) and version 2.0 (N-ter is internal), but other algorithms, in particular TOPPRED and THUMBUP (Zhou and Zhou, 2003, Protein Science 12:1547-1555), confirm an external location of the N-terminal end of E.
b) M Protein
[0351] A similar analysis carried out on the SARS-CoV M protein (221 amino acids) shows that this polypeptide does not possess a signal peptide (according to the software signalP v1.1) but has three transmembrane domains (residues 15-37, 50-72, 77-99 according to TMHMM 2.0) and a large hydrophilic domain (aa 100-221) located inside the viral particle (endodomain). It is probably glycosylated on the asparagine at position 4 (according to NetNGlyc 1.0).
[0352] Thus, in agreement with the experimental data known for the other coronaviruses, it is notable that both the M and E proteins exhibit endodomains corresponding to the majority of the polypeptide and ectodomains that are very small in size.
[0353] The ectodomain of E probably corresponds to residues 1 to 11 or 1 to 12 of the protein: MYSFVSEETGT(L), SEQ ID NO: 70. Indeed, the probability associated with the transmembrane location of residue 12 is intermediate (0.56 according to TMHMM 2.0).
[0354] The ectodomain of M probably corresponds to residues 2 to 14 of the protein: ADNGTITVEELKQ, SEQ ID NO: 69. Indeed, the N-terminal methionine of M is very probably cleaved from the mature polypeptide because the residue at position 2 is an alanine (Varshavsky, 1996, 93:12142-12149).
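As a rough illustration of the glycosylation prediction reported for M, the following sketch scans the first 14 residues of M (the initiator methionine followed by the ADNGTITVEELKQ sequence given above) for the consensus N-glycosylation motif N-X-S/T, where X is any residue except proline. It is only a textbook motif search, not the NetNGlyc algorithm, and the function name `find_sequons` is hypothetical.

```python
import re

# Residues 1-14 of M: initiator methionine + ectodomain sequence (SEQ ID NO: 69).
M_NTERM = "M" + "ADNGTITVEELKQ"


def find_sequons(seq: str) -> list[int]:
    """Return 1-based positions of Asn residues in N-X-[S/T] motifs (X != Pro)."""
    # The lookahead lets overlapping motifs be reported as well.
    return [m.start() + 1 for m in re.finditer(r"(?=N[^P][ST])", seq)]


if __name__ == "__main__":
    print(find_sequons(M_NTERM))  # [4]
```

Under this simple rule, the only candidate site in the N-terminal region is the asparagine at position 4 (motif N-G-T), in line with the prediction cited in the text.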
[0355] Moreover, the analysis of the hydrophobicity (Kyte & Doolittle, Hopp & Woods) of the E protein demonstrates that the C-terminal end of the endodomain of E is hydrophilic and therefore probably exposed at the surface of this domain. Thus, a synthetic peptide corresponding to this end is a good immunogenic candidate for inducing, in animals, antibodies directed against the endodomain of E. Consequently, a peptide corresponding to the 24 C-terminal residues of E was synthesized.
2) Preparation of Antibodies Directed Against the Ectodomain of the M and E Proteins and the Endodomain of the E Protein
[0356] The peptides M2-14 (ADNGTITVEELKQ, SEQ ID NO: 69), E1-12 (MYSFVSEETGTL, SEQ ID NO: 70) and E53-76 (KPTVYVYSRV KNLNSSEGVP DLLV, SEQ ID NO: 71) were synthesized by Neosystem. They were coupled with KLH (Keyhole Limpet Hemocyanin) with the aid of MBS (m-maleimido-benzoyl-N-hydroxysuccinimide ester) via a cysteine added during the synthesis either at the N-terminus of the peptide (case for E53-76) or at the C-terminus (case of M2-14 and E1-12).
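The peptide names encode residue ranges, so the listed sequences can be checked against them, and the cysteine added for KLH coupling can be placed at the stated terminus. The sketch below is illustrative only: the dictionary layout and function names are assumptions, while the sequences and termini are those given in the paragraph above.

```python
# name: (sequence, first residue, last residue, terminus carrying the added Cys)
PEPTIDES = {
    "M2-14": ("ADNGTITVEELKQ", 2, 14, "C"),
    "E1-12": ("MYSFVSEETGTL", 1, 12, "C"),
    "E53-76": ("KPTVYVYSRVKNLNSSEGVPDLLV", 53, 76, "N"),
}


def length_matches(seq: str, first: int, last: int) -> bool:
    """True when the sequence length equals the size of the residue range."""
    return len(seq) == last - first + 1


def with_coupling_cys(seq: str, terminus: str) -> str:
    """Add the cysteine used for MBS coupling at the N- or C-terminus."""
    return "C" + seq if terminus == "N" else seq + "C"


if __name__ == "__main__":
    for name, (seq, first, last, terminus) in PEPTIDES.items():
        print(name, length_matches(seq, first, last), with_coupling_cys(seq, terminus))
```

All three sequences have the length implied by their names (13, 12 and 24 residues).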
[0357] Two rabbits were immunized with each of the conjugates, according to the following immunization protocol: after a first injection of 0.5 mg of peptide coupled with KLH and emulsified in complete Freund's adjuvant (intradermal route), the animals received 2 to 4 booster injections, at intervals of 3 or 4 weeks, of 0.25 mg of peptide coupled to KLH and emulsified in incomplete Freund's adjuvant.
[0358] For each rabbit, a preimmune (p.i.) serum was prepared before the first immunization and an immune serum (I.S.) was prepared 3 to 5 weeks after the booster injections.
[0359] The reactivity of the sera was analyzed by Western blotting with the aid of extracts of cells infected with SARS-CoV (FIG. 43B) or with the aid of extracts of cells infected with a recombinant vaccinia virus expressing the protein E (VV-TG-E, FIG. 43A) or M (VV-TN-M, FIG. 43C) of the SARS-CoV 031589 isolate.
[0360] The immune sera of the rabbits 22234 and 22240, immunized with the conjugate KLH-E53-76, recognize a polypeptide of about 9 to 10 kD, which is present in the extracts of cells infected with SARS-CoV but absent from the extracts of noninfected cells (FIG. 43B). The apparent mass of this polypeptide is compatible with the predicted mass of the E protein, which is 8.4 kD. Similarly, the immune serum of the rabbit 20047, immunized with the conjugate KLH-E1-12, recognizes a polypeptide present in the extracts of cells infected with the VV-TG-E virus, whose apparent molar mass is compatible with that of the E protein (FIG. 43A).
[0361] The immune serum of the rabbits 20013 and 20080, immunized with the conjugate KLH-M2-14, recognizes a polypeptide present in the extracts of cells infected with the VV-TN-M virus (FIG. 43C), whose apparent molar mass (about 18 kD) is compatible with that of the glycoprotein M, which is 25.1 kD and has a high iso-electric point (9.1 for the naked polypeptide).
[0362] These results demonstrate that the peptides E1-12 and E53-76, on the one hand, and the peptide M2-14, on the other hand, make it possible to induce, in animals, polyclonal antibodies capable of recognizing the native forms of the SARS-CoV E and M proteins, respectively.
EXAMPLE 6
Analysis of the ELISA Reactivity of the Recombinant N Protein Toward Sera from Patients Suffering from SARS
1) Materials
[0363] The antigen used to prepare the solid phases is the purified recombinant nucleoprotein N prepared according to the protocol described in example 2.
[0364] The sera to be tested (table IV) were chosen on the basis of the results of analysis of their reactivity by immunofluorescence (IF-SARS titer), toward cells infected with SARS-CoV.
TABLE-US-00006 TABLE IV Sera tested by ELISA
Reference   Serum No.   Type of serum    Date of the serum***    IF-SARS titer
3050        A           Control          na*                     nt**
3048        B           Control          na                      nt
033168      D           Patient-1 SARS   Apr. 27, 2003 (D38)     320
033397      E           Patient-1 SARS   May 11, 2003 (D52)      320
032632      F           Patient-2 SARS   Mar. 21, 2003 (D17)     2500
032791      G           Patient-3 SARS   Apr. 04, 2003 (D3)      <40
033258      H           Patient-3 SARS   Apr. 28, 2003 (D27)     160
*na: not applicable. **nt: not tested. ***the day numbers (D) given with the dates correspond to the number of days after the onset of the SARS symptoms.
2) Method
[0365] The N protein (100 μl) diluted at various concentrations in 0.1 M carbonate buffer, pH 9.6 (1, 2 or 4 μg/ml) is distributed into the wells of ELISA plates, and then the plates are incubated overnight at room temperature. The plates are washed with PBS-Tween buffer and then saturated with PBS-skimmed milk-sucrose (5%) buffer. The test sera (100 μl), diluted beforehand (1/50, 1/100, 1/200, 1/400, 1/800, 1/1600 and 1/3200), are added and then the plates are incubated for 1 h at 37° C. After 3 washings, the peroxidase-labeled anti-human IgG conjugate (reference 209-035-098, JACKSON) diluted 1/18 000 is added and then the plates are incubated for 1 h at 37° C. After 4 washings, the chromogen (TMB) and the substrate (H2O2) are added and the plates are incubated for 30 min at room temperature, protected from light. The reaction is then stopped and the absorbance at 450 nm is measured with the aid of an automated reader.
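The serum dilutions used in this method form a two-fold series, and the amount of antigen deposited per well follows directly from the coating concentration and volume. A minimal sketch of both calculations is given below; the function names are hypothetical and nothing beyond the figures stated in the method is assumed.

```python
def twofold_dilutions(first_denominator: int, count: int) -> list[int]:
    """Denominators of a two-fold serial dilution (1/50, 1/100, ...)."""
    return [first_denominator * 2**i for i in range(count)]


def coated_mass_ug(conc_ug_per_ml: float, volume_ul: float) -> float:
    """Mass of antigen deposited in one well, in micrograms."""
    return conc_ug_per_ml * volume_ul / 1000.0


if __name__ == "__main__":
    print(twofold_dilutions(50, 7))
    for conc in (1, 2, 4):
        print(conc, "ug/ml ->", coated_mass_ug(conc, 100), "ug per well")
```

With 100 μl per well, the three coating concentrations correspond to 0.1, 0.2 and 0.4 μg of N protein per well.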
3) Results
[0366] The ELISA tests (FIG. 10) demonstrate that the recombinant N protein preparation is specifically recognized by the antibodies of sera from patients suffering from SARS collected in the late phase of the infection (≥17 days after the onset of the symptoms) whereas it is not significantly recognized by the antibodies of a patient's serum collected in the early phase of the infection (3 days after the onset of the symptoms) or by control sera from subjects not suffering from SARS.
EXAMPLE 7
ELISA Tests Prepared for a Very Specific and Sensitive Detection of a SARS-Associated Coronavirus Infection, from Sera of Patients
1) Indirect ELISA IgG Test
a) Reagents
Preparation of the Plates
[0367] The plates are sensitized with a solution of N protein at 2 μg/ml in a 10 mM PBS buffer, pH 7.2, phenol red at 0.25 ml/l. 100 μl of solution are deposited in the wells and left to incubate at room temperature overnight. Saturation is obtained by prewashing in 10 mM PBS/0.1% Tween buffer, followed by washing with a saturation solution PBS, 25% milk/sucrose.
Diluent Sera
[0368] Buffer 0.48 g/l TRIS, 10 mM PBS, 3.7 g/l EDTA, 15% v/v milk, pH 6.7
Diluent Conjugate
[0369] Citrate buffer (15 g/l), 0.5% Tween, 25% bovine serum, 12% NaCl, 6% v/v skimmed milk pH 6.5
Conjugate
[0370] 50× anti-human IgG conjugate, marketed by Bio-Rad: Platelia H. pylori kit ref 72778
Other Solutions:
[0371] Washing solution R2, solutions for visualization with TMB (R8 diluent, R9 chromogen, R10 stopping solution): reagents marketed by Bio-Rad (e.g.: Platelia pylori kit, ref 72778)
b) Procedure
[0372] Dilute the sera 1/200 in the sample diluent
Distribute 100 μl/well
Incubation 1 h at 37° C.
[0373] 3 washings in 10× WASHING solution R2 diluted beforehand 10-fold in demineralized water (i.e., 1× washing solution)
Distribute 100 μl of conjugate (50× conjugate to be diluted immediately before use in the diluent conjugate provided)
Incubation 1 h at 37° C.
[0374] 4 washings in 1× washing solution
Distribute 200 μl/well of visualization solution (to be diluted immediately before use, e.g.: 1 ml of R9 in 10 ml of R8)
Incubation for 30 min at room temperature in the dark
Stop the reaction with 100 μl/well of R10
READING at 450/620 nm
[0375] The results can be interpreted by taking a THRESHOLD serum giving a response above which the sera tested would be considered as positive. This serum is chosen and diluted so as to give a significantly higher signal than the background noise.
2) DOUBLE EPITOPE ELISA Test
a) Reagents
Preparation of the Plates
[0376] The plates are sensitized with a solution of N protein at 1 μg/ml in a 10 mM PBS buffer, pH 7.2, phenol red at 0.25 ml/l. 100 μl of solution are deposited in the wells and left to incubate at room temperature overnight. Saturation is obtained by prewashing in 10 mM PBS/0.1% Tween buffer, followed by washing with a saturation solution 10 mM PBS, 25% (V/V) milk.
Diluent Sera and Conjugate
[0377] Buffer 50 mM TRIS saline, pH 8, 2% milk
Conjugate
[0378] This is the purified recombinant N protein coupled with peroxidase according to the Nakane protocol (Nakane P. K. and Kawaoi A.; (1974): Peroxidase-labeled antibody, a new method of conjugation. The Journal of Histochemistry and Cytochemistry Vol. 22, No. 23, pp. 1084-1091), in respective molar ratios 1/2. This ProtN POD conjugate is used at a concentration of 2 μg/ml in serum/conjugate diluent.
Other Solutions:
[0379] Washing solution R2, solutions for visualization with TMB (R8 diluent, R9 chromogen, R10 stopping solution): reagents marketed by Bio-Rad (e.g.: Platelia pylori kit, ref 72778).
b) Procedure
1st Step in "Predilution" Plate
[0380] Dilute each serum 1/5 in the predilution plate (48 μl of diluent+12 μl of serum).
[0381] After having diluted all the sera, distribute 60 μl of conjugate.
[0382] Where appropriate, the serum+conjugate mix is left to incubate.
2nd Step in "Reaction" Plate
[0383] Transfer 100 μl of mixture/well into the reaction plate
[0384] Incubation 1 h at 37° C.
[0385] 5 washings in 10× WASHING solution R2 diluted 10-fold beforehand in demineralized water (→1× washing solution)
[0386] Distribute 200 μl/well of visualization solution (to be diluted immediately before use, e.g.: 1 ml of R9 in 10 ml of R8)
[0387] Incubation 30 min at room temperature and protected from light
[0388] Stop the reaction with 100 μl/well of R10
[0389] READING at 450/620 nm
[0390] As for the indirect ELISA test, the results can be interpreted using a "threshold value" serum. Any serum having a response greater than that of the threshold value serum will be considered as positive.
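The interpretation rule stated for both tests (a serum is positive when its response exceeds that of the threshold serum) can be sketched as follows. The absorbance values are invented for illustration, and subtracting the 620 nm reading from the 450 nm reading is an assumption about how the dual-wavelength reading is used; the text only specifies "READING at 450/620 nm".

```python
def corrected_od(a450: float, a620: float) -> float:
    """Assumed dual-wavelength correction: signal at 450 nm minus reference at 620 nm."""
    return a450 - a620


def classify(sample: tuple[float, float], threshold: tuple[float, float]) -> str:
    """Compare a serum with the threshold serum run on the same plate."""
    return "positive" if corrected_od(*sample) > corrected_od(*threshold) else "negative"


if __name__ == "__main__":
    threshold_serum = (0.350, 0.050)  # hypothetical (A450, A620) readings
    sera = {"serum A": (1.820, 0.060), "serum B": (0.120, 0.045)}  # hypothetical
    for name, reading in sera.items():
        print(name, classify(reading, threshold_serum))
```

Because the threshold serum is measured on each plate, the rule is relative to that plate's signal and does not depend on a fixed absorbance cutoff.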
3) Results
[0391] The sera of patients classified as probable cases of SARS from the French hospital of Hanoi, Vietnam or in relation with the French hospital of Hanoi (JYK) were analyzed using the indirect IgG-N test and the double epitope N test.
[0392] The results of the indirect IgG-N test (FIGS. 14 and 15) and double epitope N test (FIGS. 16 and 17) show an excellent correlation between them and with an indirect ELISA test comparing the reactivity of the sera toward a lysate of VeroE6 cells infected or not infected with SARS-CoV (ELISA-SARS-CoV lysate; see table V below). All the sera collected 12 days or more after the onset of the symptoms were found to be positive, including in patients for whom it had not been possible to document the SARS-CoV virus infection by analyzing respiratory samples by RT-PCR, probably because of a sample being collected too late during the infection (≥D12). In the case of the patient TTH for whom a nasal sample collected on D7 was found to be negative by RT-PCR, the quality of the sample may be in question.
[0393] Some sera were found to be negative whereas the presence of SARS-CoV was detected by RT-PCR. They are in all cases early sera collected less than 10 days after the onset of the symptoms (e.g.: serum #032637). In the case of a patient PTTH (serum #032673), only a suspicion of SARS was raised at the time the samples were collected.
[0394] In conclusion, the indirect IgG-N and N-double epitope serological tests make it possible to document the SARS-CoV infection in all the patients for the sera collected 12 days or more after the onset of the symptoms.
TABLE-US-00007 TABLE V Results of the ELISA tests
Sample Num   Patient   Day   PCR-SARS (1)       ELISA SARS-CoV lysate (2)   IgG-N (2nd series)   2Xepitope (2nd series)
033168       JYK       38    POS                +++                         >5000                NT
033597       JYK       74    POS                NT                          ≈5000                NT
032552       VTT       8     NEG D3&D8&D12      NEG                         <200                 <5
032544       CTP       16    NEG D16&D20        ++                          >5000                >>20
032546       CJF       15    NEG D15&D19        ++                          >5000                >>20
032548       PTL       17    NEG D17&D21        ++                          >5000                >>20
032550       NTH       17    NEG D17&D21        ++                          >5000                >>20
032553       VTT       8     NEG D3&D8&D12      NEG                         <200                 <5
032554       NTBV      4     POS                NEG                         <200                 <5
032555       NTBV      4     POS                NEG                         <200
032564       NTP       15    POS                ++                          >5000                >>20
032629       NVH       4     POS                NEG                         <200                 <5
032631       BTTX      9     POS                NEG                         <200                 <5
032635       NHH       4     POS                NEG                         <200                 <5
032537       NHB       10    POS                NEG                         <200                 <5
032642       BTTX      9     POS                NEG                         <200                 <5
032643       LTDH      1     POS                NEG                         <200                 <5
032644       NTBV      4     POS                NEG                         <200                 <5
032646       TTH       12    NEG D7&D12&D16     ++                          >5000                >>20
032647       DTH       17    NEG D17&D21        ++                          >5000                >>20
032648       NNT       15    NEG D15&D19        ++                          >5000                >>20
032649       PTH       17    NEG D17&D21        ++                          >5000                >>20
032672       LVV       16    NEG D16&D20        +                           >5000                >>20
032673       PTTH      NA    NEG                NEG                         <200                 <5
032674       PNB       17    NEG D17&D21        ++                          >5000                >>20
032682       VTH       12    NEG D12&D16        ++                          >5000                >>20
032683       DTV       17    NEG D17&D21        +                           >1000                >>20
Remarks:
(1): The RT-PCR analyses were carried out by nested RT-PCR BNI, LC Artus and LC-N on nasal or pharyngeal swabs; POS means that at least one sample was found to be positive in this patient.
(2): The reactivity of the sera in the ELISA test using a lysate of cells infected with SARS-CoV was classified as very highly reactive (+++), highly reactive (++), reactive (+) and negative according to the OD value obtained at the dilutions tested.
EXAMPLE 8
Detection of SARS-Associated Coronavirus (SARS-CoV) by RT-PCR
[0395] 1) Development of Real Time RT-PCR Conditions with the Aid of Primers Specific for the Gene for the Nucleocapsid Protein--"Light Cycler N" Test
a) Design of the Primers and Probes
[0396] The primers and probes were designed from the sequence of the genome of the SARS-CoV strain derived from the sample recorded under the number 031589, with the aid of the programme "Light Cycler Probe Design (Roche)". Thus, the following two series of primers and probes were selected:
TABLE-US-00008
series 1 (SEQ ID NO: 60, 61, 64, 65):
sense primer: N/+/28507: 5'-GGC ATC GTA TGG GTT G-3' [28507-28522]
antisense primer: N/-/28774: 5'-CAG TTT CAC CAC CTC C-3' [28774-28759]
probe 1: 5'-GGC ACC CGC AAT CCT AAT AAC AAT GC-fluorescein 3' [28561-28586]
probe 2: 5' Red705-GCC ACC GTG CTA CAA CTT CCT-phosphate 3' [28588-28608]
series 2 (SEQ ID NO: 62, 63, 66, 67):
sense primer: N/+/28375: 5'-GGC TAC TAC CGA AGA G-3' [28375-28390]
antisense primer: N/-/28702: 5'-AAT TAC CGC GAC TAC G-3' [28702-28687]
probe 1: SARS/N/FL: 5'-ATA CAC CCA AAG ACC ACA TTG GC-fluorescein 3' [28541-28563]
probe 2: SARS/N/LC705: 5' Red705-CCC GCA ATC CTA ATA ACA ATG CTG C-phosphate 3' [28565-28589]
b) Analysis of the Efficacy of the Two Primer Pairs
[0397] In order to test the respective efficacy of the two pairs of primers, an RT-PCR amplification was carried out on a synthetic RNA corresponding to nucleotides 28054-29430 of the genome of the SARS-CoV strain derived from the sample recorded under the number 031589 and containing the sequence of the N gene.
[0398] More specifically:
[0399] This synthetic RNA was prepared by in vitro transcription, with the aid of the T7 phage RNA polymerase, of a DNA template obtained by linearization of the plasmid SRAS-N with the enzyme BamHI. After eliminating the DNA template by digestion with the aid of DNase I, the synthetic RNAs are purified by a phenol-chloroform extraction, followed by two successive precipitations in ammonium acetate and isopropanol. They are then quantified by measuring the absorbance at 260 nm and their quality is checked by the ratio of the absorbances at 260 and 280 nm and by agarose gel electrophoresis. Thus, the concentration of the synthetic RNA preparation used for these studies is 1.6 mg/ml, which corresponds to 2.1×10^15 copies/ml of RNA.
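The conversion from mass concentration to copy number can be reproduced with a short calculation. The sketch below assumes an average mass of 340 g/mol per nucleotide of single-stranded RNA and a transcript covering exactly nucleotides 28054-29430 (any vector-derived nucleotides are ignored); both are assumptions, and the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole
AVG_NT_MASS = 340.0       # g/mol per nucleotide; assumed average for single-stranded RNA


def rna_copies_per_ml(conc_mg_per_ml: float, first_nt: int, last_nt: int) -> float:
    """Convert an RNA mass concentration into copies per ml."""
    length_nt = last_nt - first_nt + 1          # 1377 nt for 28054-29430
    molar_mass = length_nt * AVG_NT_MASS        # g/mol
    moles_per_ml = conc_mg_per_ml * 1e-3 / molar_mass
    return moles_per_ml * AVOGADRO


if __name__ == "__main__":
    print(f"{rna_copies_per_ml(1.6, 28054, 29430):.1e} copies/ml")  # 2.1e+15 copies/ml
```

Under these assumptions, 1.6 mg/ml of a 1377-nucleotide RNA gives about 2.1×10^15 copies/ml, in agreement with the value stated in the text.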
[0400] Decreasing quantities of synthetic RNA were amplified by RT-PCR with the aid of the "Superscript® One-Step RT-PCR with Platinum® Taq" kit and the pairs of primers No. 1 (N/+/28507, N/-/28774) (FIG. 11A) and No. 2 (N/+/28375, N/-/28702) (FIG. 11B), according to the supplier's instructions. The amplification conditions used are the following: the cDNA was synthesized by incubation for 30 min at 45° C., 15 min at 55° C. and then 2 min at 94° C. and it was then amplified by 5 cycles comprising: a step of denaturation at 94° C. for 15 sec, a step of annealing at 45° C. for 30 sec and then a step of extension at 72° C. for 30 sec, followed by 35 cycles comprising: a step of denaturation at 94° C. for 15 sec, a step of annealing at 55° C. for 30 sec and then a step of extension at 72° C. for 30 sec, with 2 sec of additional extension at each cycle, and a final step of extension at 72° C. for 5 min. The amplification products obtained were then kept at 10° C.
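To give an idea of the duration of this program, the hold times can be summed as in the sketch below. It assumes that the 2 sec extension increment starts at the second of the 35 cycles and it ignores the time the thermocycler spends ramping between temperatures, so the result is a lower bound; the function name is hypothetical.

```python
def program_hold_seconds(increment_s: int = 2) -> int:
    """Sum of the programmed hold times of the one-step RT-PCR, in seconds."""
    rt_and_activation = (30 + 15 + 2) * 60          # 45 C, 55 C, 94 C holds
    first_block = 5 * (15 + 30 + 30)                # 5 cycles, annealing at 45 C
    # 35 cycles, annealing at 55 C; extension grows by increment_s per cycle
    # (assumed: no extra time on the first of these cycles).
    second_block = sum(15 + 30 + 30 + increment_s * cycle for cycle in range(35))
    final_extension = 5 * 60
    return rt_and_activation + first_block + second_block + final_extension


if __name__ == "__main__":
    total = program_hold_seconds()
    print(total, "s, i.e. about", round(total / 60), "min")  # 7310 s, i.e. about 122 min
```

By the last cycle the incremental extension has lengthened the 72° C. step from 30 sec to 98 sec under this assumption.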
[0401] The results presented in FIG. 11 show that the pair of primers No. 2 (N/+/28375, N/-/28702) makes it possible to detect as few as 10 copies of RNA (band of weak intensity) or 10^2 copies (band of good intensity), against 10^4 copies for the pair of primers No. 1 (N/+/28507, N/-/28774). The amplicons are respectively 268 bp (pair 1) and 328 bp (pair 2).
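The amplicon sizes follow from the genomic coordinates embedded in the primer names, which give the position of the 5' end of each primer. A minimal check, with hypothetical names for the dictionary and function:

```python
# 5' genomic positions of the sense and antisense primers of each pair.
PRIMER_PAIRS = {
    "pair 1": (28507, 28774),  # N/+/28507, N/-/28774
    "pair 2": (28375, 28702),  # N/+/28375, N/-/28702
}


def amplicon_length(sense_5prime: int, antisense_5prime: int) -> int:
    """Length in bp of the product delimited by the two primer 5' ends (inclusive)."""
    return antisense_5prime - sense_5prime + 1


if __name__ == "__main__":
    for name, (sense, antisense) in PRIMER_PAIRS.items():
        print(name, amplicon_length(sense, antisense), "bp")
```

This gives 268 bp for pair 1 and 328 bp for pair 2, matching the sizes stated above.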
c) Development of Real Time RT-PCR
[0402] A real time RT-PCR was developed with the aid of the pair of primers No. 2 and of the pair of probes consisting of SRAS/N/FL and SRAS/N/LC705 (FIG. 2).
[0403] The amplification was carried out on a LightCycler® (Roche) with the aid of the "Light Cycler RNA Amplification Kit Hybridization Probes" kit (reference 2 015 145, Roche) under the following optimized conditions. A reaction mixture containing: H2O (6.8 μl), 25 mM MgCl2 (0.8 μl, 4 μM Mg2+ final), 5× reaction mixture (4 μl), 3 μM probe SRAS/N/FL (0.5 μl, 0.075 μM final), 3 μM probe SRAS/N/LC705 (0.5 μl, 0.075 μM final), 10 μM primer N/+/28375 (1 μl, 0.5 μM final), 10 μM primer N/-/28702 (1 μl, 0.5 μM final), enzyme mixture (0.4 μl) and sample (viral RNA, 5 μl) was amplified according to the following program:
[0404] Reverse transcription: 50° C., 10:00 min; analysis mode: none
[0405] Denaturation: 95° C., 30 sec, ×1; analysis mode: none
[0406] Amplification (×45 cycles): 95° C., 2 sec
[0407] 50° C., 15 sec; analysis mode: quantification*
[0408] 72° C., 13 sec; thermal ramp 2.0° C./sec
[0409] Annealing: 40° C., 30 sec, ×1; analysis mode: none
[0410] The fluorescence is measured at the end of the annealing and at each cycle (in SINGLE mode).
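The final probe and primer concentrations quoted for this reaction mixture can be checked from the stock concentrations and volumes (C1·V1 = C2·V2). The sketch below uses only the volumes listed above; the dictionary keys and function name are hypothetical labels.

```python
# Volumes (in microliters) of the components of one reaction, as listed in the text.
VOLUMES_UL = {
    "H2O": 6.8,
    "MgCl2 25 mM": 0.8,
    "5x reaction mixture": 4.0,
    "probe FL 3 uM": 0.5,
    "probe LC705 3 uM": 0.5,
    "primer N/+/28375 10 uM": 1.0,
    "primer N/-/28702 10 uM": 1.0,
    "enzyme mixture": 0.4,
    "sample RNA": 5.0,
}


def final_concentration(stock: float, volume_ul: float, total_ul: float) -> float:
    """Concentration after dilution into the full reaction (same unit as stock)."""
    return stock * volume_ul / total_ul


if __name__ == "__main__":
    total = sum(VOLUMES_UL.values())
    print(f"total volume: {total:.1f} ul")
    print(f"probe:  {final_concentration(3, 0.5, total):.3f} uM")
    print(f"primer: {final_concentration(10, 1.0, total):.2f} uM")
```

The components add up to a 20 μl reaction, which yields 0.075 μM for each probe and 0.5 μM for each primer, as stated.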
[0411] The results presented in FIG. 12 show that this real time RT-PCR is very sensitive since it makes it possible to detect 10^2 copies of synthetic RNA in 100% of the 5 samples analyzed (29/29 samples in 8 experiments) and up to 10 copies of RNA in 100% of the 5 samples analyzed (40/45 samples in 8 experiments). It also shows that this RT-PCR makes it possible to detect the presence of the SARS-CoV genome in a sample and to quantify the number of genomes present. By way of example, the viral RNA of a SARS-CoV stock cultured on Vero E6 cells was extracted with the aid of the "QIAamp viral RNA extraction" kit (Qiagen), diluted to 0.05×10-14 and analyzed by real time RT-PCR according to the protocol described above; the analysis presented in FIG. 12 shows that this virus stock contains 6.5×10^9 genome-equivalents/ml (geq/ml), which is closely comparable to the 1.0×10^10 geq/ml value measured with the aid of the "RealArt® HPA-Coronavirus LC RT PCR Reagents" kit marketed by Artus.
2) Development of Nested RT-PCR Conditions Targeting the Gene for RNA Polymerase--"CDC (Centers for Disease Control and Prevention)/IP Nested RT-PCR" Test
a) Extraction of the Viral RNA
[0412] Clinical sample: QIAamp viral RNA Mini Kit (QIAGEN) according to the manufacturer's instructions, or an equivalent technique. The RNA is eluted in a volume of 60 μl.
b) "SNE/SAR" Nested RT-PCR
First Step: "SNE" Coupled RT-PCR
[0413] The Invitrogen "Superscript® One-Step RT-PCR with Platinum® Taq" kit was used, but the "Titan" kit from Roche Boehringer can be used in its place with similar results.
Oligonucleotides:
TABLE-US-00009 [0414]
SNE-S1    5' GGT TGG GAT TAT CCA AAA TGT GA 3'
SNE-AS1   5' GCA TCA TCA GAA AGA ATC ATC ATG 3'
→ Expected size: 440 bp
1. Prepare a mix:
TABLE-US-00010 [0415]
H2O                    6.5 μl
Reaction mix 2X        12.5 μl
Oligo SNE-S1 50 μM     0.2 μl
Oligo SNE-AS1 50 μM    0.2 μl
RNAsin 40 U/μl         0.12 μl
RT/Platinum Taq mix    0.5 μl
2. To 20 μl of the mix, add 5 μl of RNA and carry out the amplification on a thermocycler (ABI 9600 conditions):
TABLE-US-00011
2.1.  45° C.  30 min.
      55° C.  15 min.
      94° C.  2 min.
2.2.  94° C.  15 sec.
      45° C.  30 sec.
      72° C.  30 sec.                  ×5 cycles
2.3.  94° C.  15 sec.
      55° C.  30 sec.
      72° C.  30 sec. + 2 sec./cycle   ×35 cycles
2.4.  72° C.  5 min.
2.5.  10° C.  ∞
Storage at +4° C.
[0416] The RNAsin (N2511/N2515) from Promega was used as RNase inhibitors.
[0417] Synthetic RNAs served as positive controls: 10^3, 10^2 and 10 copies of synthetic RNA RSNE were amplified in each experiment.
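When several samples and controls are run together, the first-step mix listed above is usually prepared as a single master mix. The sketch below scales the per-reaction volumes to a given number of reactions; the 10% pipetting overage and the function name are assumptions made for illustration and are not part of the protocol described.

```python
# Per-reaction volumes (microliters) of the "SNE" coupled RT-PCR mix.
MIX_PER_REACTION_UL = {
    "H2O": 6.5,
    "Reaction mix 2X": 12.5,
    "Oligo SNE-S1 50 uM": 0.2,
    "Oligo SNE-AS1 50 uM": 0.2,
    "RNAsin 40 U/ul": 0.12,
    "RT/Platinum Taq mix": 0.5,
}


def scale_mix(per_reaction: dict[str, float], n_reactions: int,
              overage: float = 0.10) -> dict[str, float]:
    """Volumes for n reactions plus a fractional overage (assumed 10% by default)."""
    factor = n_reactions * (1.0 + overage)
    return {name: round(volume * factor, 2) for name, volume in per_reaction.items()}


if __name__ == "__main__":
    # Example: 5 clinical samples + 3 positive controls + 1 negative control.
    for name, volume in scale_mix(MIX_PER_REACTION_UL, 9).items():
        print(f"{name:22s} {volume:7.2f} ul")
```

The per-reaction volumes add up to about 20 μl, consistent with the instruction to add 5 μl of RNA to 20 μl of the mix.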
Second Step: "SAR" Nested PCR
Oligonucleotides:
TABLE-US-00012 [0418]
SAR1-S    5' CCT CTC TTG TTC TTG CTC GCA 3'
SAR1-AS   5' TAT AGT GAG CCG CCA CAC ATG 3'
→ Expected size: 121 bp
1. Prepare a mix:
TABLE-US-00013 [0419]
H2O                    35.8 μl
Taq buffer 10X         5 μl
MgCl2 25 mM            4 μl
Mix dNTPs 5 mM         2 μl
Oligo SAR1-S 50 μM     0.5 μl
Oligo SAR1-AS 50 μM    0.5 μl
Taq DNA pol 5 U/μl     0.25 μl
[0420] AmpliTaq DNA Pol from Applied Biosystems was used (10× buffer without MgCl2, ref 27216601).
2. To 48 μl of the mix, add 2 μl of the product from the first PCR and carry out the amplification (ABI 9600 conditions):
TABLE-US-00014
2.1.  94° C.  2 min.
2.2.  94° C.  30 sec.
      45° C.  45 sec.
      72° C.  30 sec.                  ×5 cycles
2.3.  94° C.  30 sec.
      55° C.  30 sec.
      72° C.  30 sec. + 1 sec./cycle   ×35 cycles
2.4.  72° C.  5 min.
2.5.  10° C.  ∞
3. Analyze 10 μl of the reaction product on "low-melting" gel (Seakem GTG type) containing 3% agarose.
[0421] The sensitivity of the nested test is routinely, under the conditions described, 10 copies of RNA.
4. The fragments can then be purified on QIAquick PCR kit (QIAGEN) and sequenced with the oligos SAR1-S and SAR1-AS.
3) Detection of the SARS-CoV RNA by PCR from Respiratory Samples
a) First Comparative Study
[0422] A comparative study was carried out on a series of respiratory samples received by the National Reference Center for the Influenza Virus (Northern region) and likely to contain SARS-CoV. To do this, the RNA was extracted from the samples with the aid of the "QIAamp viral RNA extraction" kit (Qiagen) and analyzed by real time RT-PCR, on the one hand with the aid of the pairs of primers and probes of the No. 2 series under the conditions described above, and on the other hand with the aid of the "LightCycler SARS-CoV quantification kit" marketed by Roche (reference 03 604 438). The results are summarized in table VI below. They show that 18 of the 26 samples are negative and 5 of the 26 samples are positive with both tests, while one sample is positive with the Roche kit alone and two with the "series 2" N reagents alone. Additionally, for 3 samples (20032701, 20032712, 20032714) the quantities of RNA detected are markedly higher with the reagents (probes and primers) of the No. 2 series. These results indicate that the "series 2" N primers and probes are more sensitive for the detection of the SARS-CoV genome in biological samples than those of the kit currently available.
TABLE-US-00015 TABLE VI Real time RT-PCR analysis of the RNAs extracted from a series of samples from 5 patients with the aid of the pairs of primers and probes of the No. 2 series ("series 2" N) or of the kit "Lightcycler SARS-CoV quantification kit" (Roche).
Sample No.     Patient   Type of sample        ROCHE KIT   "Series 2" N
20033082       K         nasal                 NEG         NEG
20033083       K         pharyngeal            NEG         NEG
20033086       K         nasal                 NEG         NEG
20033087       K         pharyngeal            NEG         NEG
20032802       M         nasal                 NEG         NEG
20032803       M         expectoration         NEG         NEG
20032806       M         nasal or pharyngeal   NEG         NEG
20031746ARN2   C         pharyngeal            NEG         NEG
20032711       C         nasal or pharyngeal   39          NEG
20032910       B         nasal                 NEG         NEG
20032911       B         pharyngeal            NEG         NEG
20033356       V         expectoration         NEG         NEG
20033357       V         expectoration         NEG         NEG
20031725       K         endotracheal asp.     NEG         150
20032657       K         endotracheal asp.     NEG         NEG
20032698       K         endotracheal asp.     NEG         NEG
20032720       K         endotracheal asp.     3           5
20033074       K         stools                115         257
20032701       M         pharyngeal            443         1676
20032702       M         expectoration         NEG         249
20031747ARN2   C         pharyngeal            NEG         NEG
20032712       C         unknown               634         6914
20032714       C         pharyngeal            17          223
20032800       B         nasal                 NEG         NEG
20033353       V         nasal                 NEG         NEG
20033384       V         nasal                 NEG         NEG
The type of sample is indicated as well as the number of copies of viral genome measured in each of the two tests. NEG: negative RT-PCR.
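The agreement between the two tests reported in this first comparative study can be recounted directly from table VI. In the sketch below the copy numbers are transcribed from the table, with None standing for NEG; the variable and function names are hypothetical.

```python
from collections import Counter

# (sample No., Roche kit, "series 2" N); None = negative RT-PCR.
TABLE_VI = [
    ("20033082", None, None), ("20033083", None, None), ("20033086", None, None),
    ("20033087", None, None), ("20032802", None, None), ("20032803", None, None),
    ("20032806", None, None), ("20031746ARN2", None, None), ("20032711", 39, None),
    ("20032910", None, None), ("20032911", None, None), ("20033356", None, None),
    ("20033357", None, None), ("20031725", None, 150), ("20032657", None, None),
    ("20032698", None, None), ("20032720", 3, 5), ("20033074", 115, 257),
    ("20032701", 443, 1676), ("20032702", None, 249), ("20031747ARN2", None, None),
    ("20032712", 634, 6914), ("20032714", 17, 223), ("20032800", None, None),
    ("20033353", None, None), ("20033384", None, None),
]


def concordance(rows):
    """Count samples by the combination of results in the two tests."""
    labels = {(False, False): "both negative", (True, True): "both positive",
              (True, False): "Roche only", (False, True): "series 2 only"}
    return Counter(labels[(roche is not None, n2 is not None)] for _, roche, n2 in rows)


if __name__ == "__main__":
    for label, count in concordance(TABLE_VI).items():
        print(label, count)
```

The counts are 18 samples negative in both tests, 5 positive in both, 1 positive with the Roche kit only and 2 positive with the "series 2" N reagents only.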
b) Second Comparative Study
[0423] The performance of various nested RT-PCR and real time RT-PCR methods was then compared for 121 respiratory samples from possible cases of SARS at the French hospital in Hanoi, Vietnam, taken between the 4th and the 17th day after the onset of the symptoms. Among these samples, 14 were found to be positive during a first test using the nested RT-PCR method targeting ORF1b (encoding replicase) as described initially by the Bernhard Nocht Institute (BNI nested RT-PCR). Information relating to this test is available on the internet, at the address http://www15.bni-hamburg.de/bni2/neu2/getfile.acgi?area_engl=diagnostics&pid=4112.
[0424] The various tests compared in this study are:
[0425] the quantitative RT-PCR method according to the invention, with the "series 2" N primers and probes described above (LightCycler N column),
[0426] the nested RT-PCR test targeting the RNA polymerase gene described above, developed by the CDC, BNI and Institut Pasteur (CDC/IP nested RT-PCR),
[0427] the ARTUS kit with the reference "HPA Corona LC RT-PCR Kit #5601-02", which is a real time RT-PCR test targeting the ORF1b gene,
[0428] the BNI nested RT-PCR test, also targeting the RNA polymerase gene mentioned above.
[0429] The inventors observed:
1) an inter-test variability for the same technique, linked to the degradation of the RNA preparation during repeated thawing, in particular for the samples containing the lowest quantities of RNA,
2) a reduced sensitivity of the CDC/IP nested RT-PCR compared with the BNI nested RT-PCR, and
3) a comparable sensitivity of the quantitative RT-PCR test according to the invention (LightCycler N) compared with the Artus LightCycler (LC) test.
[0430] These results, which are presented in table VII below, show that the quantitative RT-PCR test according to the invention constitutes an excellent addition--or an alternative--to the tests currently available. Indeed, the SARS-linked coronavirus is an emergent virus which is capable of changing rapidly. In particular, the gene for the RNA polymerase of the SARS-linked coronavirus, which is targeted in most of the tests currently available, can recombine with that of other coronaviruses not linked to SARS. The use of a test targeting this gene exclusively could then lead to the production of false-negatives.
[0431] The quantitative RT-PCR test according to the invention does not target the same genomic region as the ARTUS kit since it targets the gene encoding the N protein. By carrying out a diagnostic test targeting two different genes of the SARS-linked coronavirus, it can therefore be hoped to avoid false-negative type results which could be due to the genetic evolution of the virus.
[0432] Furthermore, it appears particularly advantageous to target the gene for the nucleocapsid protein, since this gene is very stable owing to the high selection pressure linked to the strong structural constraints on this protein.
TABLE-US-00016 TABLE VII Comparison of various methods of analysis by gene amplification, from 121 samples of probable cases of SARS at the French hospital in Hanoi, Vietnam (epidemic 2003)
NRC No.       Sample type (1)   Sample collection day   Patient   CDC/IP nested RT-PCR   BNI nested RT-PCR   Artus Light Cycler kit   Light Cycler N (IP)
107 samples   N and P                                             Negative               Negative            Negative                 Negative
032529        P                 10                      NHB       Negative               Positive            Negative                 Negative
032530        N                 10                      NHB       Positive               Positive            3.10E+01                 4.20E+01
032531        P                 7                       LP        Positive               Positive            7.70E+00                 3.10E+00
032534        N                 15                      BND       Positive               Positive            1.60E+00                 Negative
032600        P                 4                       NHH       Negative               Positive            Negative                 1.30E+02
032612        P                 17                      NTS       Negative               Positive            Negative                 Negative
032688        P                 9                       BTX       Positive               Positive            Negative                 Negative
032689        N                 4                       NVH       Positive               Positive            1.20E+01                 2.30E+02
032690        P                 4                       NVH       Negative               Positive            1.60E+00                 Negative
032727        P                 8                       NVH       Positive               Positive            2.30E+02                 4.00E+02
032728        N                 8                       NVH       Positive               Positive            1.10E+03                 1.60E+04
032729        P                 14                      NHB       Positive               Positive            5.90E+00                 3.40E+01
032730        N                 14                      NHB       Positive               Positive            1.30E+02                 4.80E+02
032741        P                 8                       NHH       Positive               Positive            2.10E+02                 1.30E+02
positives                                                         10                     14                  10                       9
fraction detected from the 14 positives                           71.4%                  100.0%              71.4%                    64.3%
(1) P = pharyngeal swab, N = nasal swab
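The summary rows of table VII (number of positives and fraction detected per method) can be recomputed from the 14 positive samples. In the sketch below, True/False stand for Positive/Negative in the nested tests and None stands for Negative in the quantitative tests; the names used are hypothetical.

```python
# (NRC No., CDC/IP nested, BNI nested, Artus LC kit copies, LightCycler N copies)
POSITIVE_SAMPLES = [
    ("032529", False, True, None, None),
    ("032530", True, True, 3.10e1, 4.20e1),
    ("032531", True, True, 7.70e0, 3.10e0),
    ("032534", True, True, 1.60e0, None),
    ("032600", False, True, None, 1.30e2),
    ("032612", False, True, None, None),
    ("032688", True, True, None, None),
    ("032689", True, True, 1.20e1, 2.30e2),
    ("032690", False, True, 1.60e0, None),
    ("032727", True, True, 2.30e2, 4.00e2),
    ("032728", True, True, 1.10e3, 1.60e4),
    ("032729", True, True, 5.90e0, 3.40e1),
    ("032730", True, True, 1.30e2, 4.80e2),
    ("032741", True, True, 2.10e2, 1.30e2),
]
METHODS = ["CDC/IP nested", "BNI nested", "Artus LC kit", "LightCycler N"]


def detection_summary(rows):
    """Return {method: (number of positives, percent of all listed samples)}."""
    summary = {}
    for column, method in enumerate(METHODS, start=1):
        hits = sum(1 for row in rows if row[column] not in (None, False))
        summary[method] = (hits, round(100.0 * hits / len(rows), 1))
    return summary


if __name__ == "__main__":
    for method, (hits, percent) in detection_summary(POSITIVE_SAMPLES).items():
        print(f"{method:15s} {hits:2d}  {percent:5.1f}%")
```

The recomputed values (10, 14, 10 and 9 positives, i.e. 71.4%, 100.0%, 71.4% and 64.3%) agree with the last two rows of the table.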
EXAMPLE 9
Production and Characterization of Monoclonal Antibodies Directed Against the N Protein
[0433] BALB/c mice were immunized with the purified recombinant N protein and their spleen cells were fused with an appropriate murine myeloma according to the technique of Köhler and Milstein.
[0434] Nineteen anti-N antibody secreting hybridomas were preselected and their immunoreactivities determined. These antibodies do indeed recognize the recombinant N protein (in ELISA) with variable intensities, and the natural viral N protein in ELISA and/or in Western blotting. FIGS. 18 to 20 show the results of these tests for 15 of these 19 monoclonal antibodies.
[0435] The highly reactive clones 12, 17, 28, 57, 72, 76, 86, 87, 98, 103, 146, 156, 166, 170, 199, 212, 218, 219 and 222 were subcloned. Specificity studies were carried out with the appropriate tools in order to determine the epitopes recognized and verify the absence of reactivity toward other human coronaviruses and certain respiratory viruses.
[0436] Epitope mapping studies (performed on spot membrane with the aid of overlapping peptides of 15 aa) and additional studies performed on the natural N protein in Western blotting revealed the existence of 4 groups of monoclonal antibodies:
1. Monoclonal antibodies specific for a major linear epitope at the N-terminal position (position 75-81, sequence: INTNSGP).
[0437] The representative of this group is antibody 156. The hybridoma producing this antibody was deposited at the Collection Nationale de Cultures de Microorganismes (CNCM) of the Institut Pasteur (Paris, France) on Dec. 1, 2004, under the number I-3331. This same epitope is also recognized by a rabbit serum (anti-N polyclonal) obtained by conventional immunization with the aid of this same N protein.
2. Monoclonal antibodies specific for a major linear epitope located in a central position (position 217-224, sequence: ETALALL); the representatives of this group are the monoclonal antibodies 87 and 166. The hybridoma producing antibody 87 was deposited at the CNCM on Dec. 1, 2004, under the number I-3328.
3. Monoclonal antibodies specific for a major linear epitope located at the C-terminal position (position 403-408, sequence: DFSRQL); the representatives of this group are the antibodies 28, 57 and 143. The hybridoma producing antibody 57 was deposited at the CNCM on Dec. 1, 2004, under the number I-3330.
4. Monoclonal antibodies specific for a discontinuous conformational epitope. This group of antibodies does not recognize any of the peptides spanning the sequence of the N protein, but reacts strongly with the non-denatured natural protein. The representative of this final group is the antibody 86. The hybridoma producing this antibody was deposited at the CNCM on Dec. 1, 2004, under the number I-3329.
Table VIII below summarizes the epitope mapping results obtained:
TABLE-US-00017 TABLE VIII Epitope mapping of the monoclonal antibodies
Antibody  Epitope         Position        Region
28        DFSRQL Q        403 . . . 408   C-Ter.
143       DFSRQL Q
76        DFSRQL Q
57        DFSRQL Q
          FFGMS RI        315 . . . 319
146       LPQRQ           383 . . . 387
166       ETALALLLL       217 . . . 224   central
87        ETALALL         217 . . . 224
156       INTNSGP         75 . . . 81     N-Ter.
86        Conformational
212       Conformational
170       Conformational
[0438] In addition, as illustrated in particular in FIGS. 18 and 19, these antibodies exhibit no reactivity in ELISA and/or in Western blotting toward the N protein of the human coronavirus 229E.
EXAMPLE 10
Combinations of the Monoclonal Antibodies for the Development of a Sensitive Immunocapture Test Specific for the Viral N Antigen in the Serum or Biological Fluids of Patients Infected with the SARS-CoV Virus
[0439] The antibodies listed below were selected because of their very specific properties for an additional capture and detection study of the viral N protein, in the serum of the subjects or patients.
[0440] These antibodies were produced as ascites in mice, purified by affinity chromatography and used alone or in combination, as capture antibodies and as signal antibodies.
[0441] List of the antibodies selected:
[0442] Ab anti-C-ter region (No. 28, 57, 143)
[0443] Ab anti-central region (No. 87, 166)
[0444] Ab anti-N-ter region (No. 156)
[0445] Ab anti-discontinuous conformational epitope (No. 86)
1) Preparation of the Reagents:
a) Immunocapture ELISA Plates
[0446] The plates are sensitized with the antibody solutions at 5 μg/ml in 0.1 M carbonate buffer, pH 9.6. The (monovalent or plurivalent) solutions are deposited in a volume of 100 μl in the wells and incubated overnight at room temperature. These plates are then washed with PBS buffer (10 mM, pH 7.4, supplemented with 0.1% Tween 20) and then saturated with a PBS solution supplemented with 0.3% BSA and 5% sucrose. The plates are then dried and packaged in a bag in the presence of a desiccant. They are ready to use.
b) Conjugates
[0447] The purified antibodies were coupled with peroxidase according to the Nakane protocol (Nakane et al., 1974, J. Histochem. Cytochem., vol. 22, pp. 1084-1091) in a ratio of one molecule of IgG per 3 molecules of peroxidase. These conjugates were purified by exclusion chromatography and stored concentrated (concentration between 1 and 2 mg/ml) in the presence of 50% glycerol and at -20° C. For use in the assays, they are diluted to a final concentration of 1 or 2 μg/ml in PBS buffer (pH 7.4) supplemented with 1% BSA.
c) Other Reagents
[0448] Human sera negative for all the serum markers for the HIV, HBV, HCV and HTLV viruses
[0449] Pool of negative human sera supplemented with 0.5% Triton X 100
[0450] Inactivated viral Ag: viral culture supernatant inactivated by irradiation, with inactivation verified after placing in culture on sensitive cells; titer of the suspension before inactivation about 10⁷ infectious particles per ml or alternatively about 5×10⁹ physical viral particles per ml of antigen
[0451] The Ag samples diluted in negative human serum: these samples were prepared by diluting 1:100 and then by 5-fold serial dilution.
[0452] These noninfectious samples mimic human samples thought to contain low to very low concentrations of viral nucleoprotein N. Such samples are not available for routine work.
[0453] Washing solution R2, solution for visualization TMB R8, chromogen R9 and stop solution R10 are the generic reagents marketed by Bio-Rad in its ELISA kits (e.g.: Platelia pylori kit ref. 72778).
2) Procedure
[0454] The samples of human sera overloaded with inactivated viral Ag are distributed in an amount of 100 μl per well, directly in the ready-to-use sensitized plates, and then incubated for 1 hour at 37° C. (Bio-Rad IPS incubation).
[0455] The material not bound to the solid phase is removed by 3 washings (washing with dilute R2 solution, automatic LP 35 washer).
[0456] The appropriate conjugates, diluted to the final concentration of 1 or 2 μg/ml, are distributed in an amount of 100 μl per well and the plates are again incubated for one hour at 37° C. (IPS incubation).
[0457] The excess conjugate is removed by 4 successive washings (dilute R2 solution--LP 35 washer).
[0458] The presence of conjugate attached to the plates is visualized after adding 100 μl of visualization solution prepared before use (1 ml of R9 and 10 ml of R8) and after incubation for 30 minutes, at room temperature and protected from light.
[0459] The enzymatic reaction is finally blocked by adding 100 μl of R10 reagent (1 N H2SO4) to all the wells.
[0460] The reading is carried out with the aid of an appropriate microplate reader at double wavelength (450/620 nm).
[0461] The results can be interpreted by using, as provisional threshold value, the mean of at least two negative controls multiplied by a factor of 2 or alternatively the mean of 100 negative sera supplemented with an increment corresponding to 6 SD (standard deviation calculated on the 100 individual measurements).
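The two provisional threshold rules described above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, the optical densities passed to them would be the user's own measurements, and the use of the sample standard deviation (rather than the population standard deviation) is an assumption, since the text does not specify it.

```python
from statistics import mean, stdev


def cutoff_from_controls(negative_ods, factor=2.0):
    """Threshold = mean of at least two negative controls times a factor of 2."""
    if len(negative_ods) < 2:
        raise ValueError("at least two negative controls are required")
    return factor * mean(negative_ods)


def cutoff_from_panel(panel_ods, n_sd=6.0):
    """Threshold = mean of a negative serum panel plus 6 standard deviations.

    Assumption: the sample standard deviation is used.
    """
    return mean(panel_ods) + n_sd * stdev(panel_ods)


def is_positive(od, cutoff):
    """A well is scored positive when its OD is at or above the threshold."""
    return od >= cutoff
```

For example, with two hypothetical negative controls read at 0.10 and 0.12, `cutoff_from_controls([0.10, 0.12])` gives a threshold of about 0.22.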
3) Results
[0462] Various capture antibody and signal antibody combinations were tested based on the properties of the antibodies selected, and avoiding the combinations of antibodies specific for the same epitopes in solid phase and as conjugates.
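The rule of avoiding antibodies specific for the same epitope on the solid phase and in the conjugate can be illustrated with a short Python sketch. It enumerates only pairs of single antibodies from the list of selected antibodies; the mixtures actually used on the solid phase (for example Ab 166+87) are not modeled, and the variable names are illustrative.

```python
from itertools import permutations

# Epitope group of each selected antibody, as given in the list of
# antibodies selected for the immunocapture study.
epitope_group = {
    "28": "C-ter", "57": "C-ter", "143": "C-ter",
    "87": "central", "166": "central",
    "156": "N-ter",
    "86": "conformational",
}

# Ordered (capture, conjugate) pairs in which the two antibodies
# recognize different epitope groups.
allowed_pairs = [
    (capture, conjugate)
    for capture, conjugate in permutations(epitope_group, 2)
    if epitope_group[capture] != epitope_group[conjugate]
]

print(len(allowed_pairs))  # 34 ordered pairs out of 42
```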
[0463] The best results were obtained with the first 4 combinations listed below; the combination G/87 is listed for comparison. These results are reproduced in table IX below.
1. Combination F/28
[0464] Solid phase (Ab 166+87 central region): conjugate antibody 28 (C-ter)
2. Combination G/28
[0465] Solid phase (Ab 86--conformational epitope): conjugate antibody 28 (C-ter)
3. Combination H/28
[0466] Solid phase (Ab 86, 166 and 87 central region and conformational epitope): conjugate antibody 28 (C-ter)
4. Combination H/28+87
[0467] Solid phase (Ab 86, 166 and 87 central region and conformational epitope): mixed conjugate antibodies 28 (C-ter) and 87 (central)
5. Combination G/87
[0468] Solid phase (Ab 86--conformational epitope): conjugate antibody 87 (central region)
[0469] The first 4 combinations exhibit equivalent and reproducible performance levels, greater than those of the other combinations used (such as for example the combination G/87). Of course, in these combinations, a monoclonal antibody may be replaced with another antibody recognizing the same epitope. Thus, the following variants may be mentioned:
6. Variant of the combination F/28
Solid phase (Ab 87 only): conjugate antibody 57 (C-ter)
7. Variant of the combination G/28
Solid phase (Ab 86--conformational epitope): conjugate antibody 57 (C-ter)
8. Variant of the combination H/28
Solid phase (Ab 86 and 87 central region and conformational epitope): conjugate antibody 57 (C-ter)
9. Variant of the combination H/28+87
Solid phase (Ab 86 and 87 central region and conformational epitope): mixed conjugate antibodies 57 (C-ter) and 87 (central)
TABLE-US-00018 TABLE IX Test of immunoreactivity of the anti-SARS-CoV nucleoprotein Abs: optical densities measured with each combination of antibodies according to the dilutions of the inactivated viral antigen.
No.  Dilution    F/28    G/28    G/87    H/28    H/28 + 87
0    1/100       5       5       3.495   3.900   5
1    1/500       3.795   3.814   1.379   3.702   3.804
2    1/2 500     2.815   2.950   0.275   3.268   2.680
3    1/12 500    0.987   1.038   0.135   1.374   0.865
4    1/62 500    0.404   0.348   0.125   0.480   0.328
5    1/312 500   0.285   0.211   0.123   0.240   0.215
6    Control     0.210   0.200   0.098   0.186   0.156
7    Control     0.269   0.153   0.104   0.193   0.202
[0470] The detection limit for these 4 experimental trials corresponds to the antigen dilution in negative serum 1:62 500. A rapid extrapolation suggests the detection of less than 10³ infectious particles per ml of sera.
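The extrapolation can be made explicit with a few lines of Python. The sketch assumes the titers stated for the antigen suspension before inactivation (about 10⁷ infectious particles per ml, or about 5×10⁹ physical particles per ml) and the dilution scheme described for the samples (1:100 followed by 5-fold serial dilutions); the variable names are illustrative.

```python
# Titers of the viral antigen suspension before inactivation (per ml).
stock_infectious_per_ml = 1e7
stock_physical_per_ml = 5e9

# 1:100 first dilution, then 5-fold serial dilutions (rows 0 to 5 of table IX).
dilutions = [100 * 5 ** step for step in range(6)]

# Last dilution reported as detected.
limit_dilution = 62_500

infectious_at_limit = stock_infectious_per_ml / limit_dilution
physical_at_limit = stock_physical_per_ml / limit_dilution

print(dilutions)            # [100, 500, 2500, 12500, 62500, 312500]
print(infectious_at_limit)  # 160.0 infectious particles per ml
print(physical_at_limit)    # 80000.0 physical particles per ml
```

At the 1:62 500 dilution the sample therefore corresponds to about 160 infectious particles per ml, which is below 10³.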
[0471] From this study, it is evident that the most appropriate antibodies for the capture of the native viral nucleoprotein are the antibodies specific for the central region and/or for a conformational epitope, both being antibodies also selected for their high affinity for the native antigen.
[0472] Having determined the best antibodies for the composition of the solid phase, the antibodies to be selected as a priority for the detection of the antigens attached to the solid phase are the complementary antibodies specific for a dominant epitope in the C-ter region. The use of any other complementary antibody specific for epitopes located in the N-ter region of the protein leads to average or poor results.
EXAMPLE 11
Eukaryotic Expression Systems for the SARS-Associated Coronavirus (SARS-CoV) Spicule (S) Protein
1) Optimization of the Conditions for Expression of the SARS-CoV S in Mammalian Cells
[0473] The conditions for transient expression of the SARS-CoV spicule (S) protein were optimized in mammalian cells (293T, VeroE6).
[0474] For that, a DNA fragment containing the cDNA for SARS-CoV S was amplified by PCR with the aid of the oligonucleotides 5'-ATAGGATCCA CCATGTTTAT TTTCTTATTA TTTCTTACTC TCACT-3' and 5'-ATACTCGAGTT ATGTGTAATG TAATTTGACA CCCTTG-3' from the plasmid pSARS-S (C.N.C.M. No. I-3059) and then inserted between the BamH1 and Xho1 sites of the plasmid pTRIPΔU3-CMV containing a lentiviral vector TRIP (Sirven et al., 2001, Mol. Ther., 3: 438-448) in order to obtain the plasmid pTRIP-S. The BamH1 and Xho1 fragment containing the cDNA for S was then subcloned between BamH1 and Xho1 of the eukaryotic expression plasmid pcDNA3.1(+) (Clontech) in order to obtain the plasmid pcDNA-S. The Nhe1 and Xho1 fragment containing the cDNA for S was then subcloned between the corresponding sites of the expression plasmid pCI (Promega) in order to obtain the plasmid pCI-S. The WPRE sequences of the woodchuck hepatitis virus ("Woodchuck Hepatitis Virus posttranscriptional regulatory element") and the CTE sequences ("constitutive transport element") of the Mason-Pfizer simian retrovirus were inserted into each of the two plasmids pcDNA-S and pCI-S between the Xho1 and Xba1 sites in order to obtain respectively the plasmids pcDNA-S-CTE, pcDNA-S-WPRE, pCI-S-CTE and pCI-S-WPRE (FIG. 21). The plasmid pCI-S-WPRE was deposited at the CNCM, on Nov. 22, 2004, under the number I-3323. All the inserts were sequenced with the aid of a BigDye Terminator v1.1 kit (Applied Biosystems) and an automated sequencer ABI377.
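As an illustration of the cloning design, the following Python sketch checks that the two oligonucleotides carry the restriction sites used for insertion. It relies only on the primer sequences given above and on the standard recognition sequences of BamH1 (GGATCC) and Xho1 (CTCGAG); the helper function and variable names are hypothetical.

```python
# Standard recognition sequences of the two restriction enzymes.
BAMH1_SITE = "GGATCC"
XHO1_SITE = "CTCGAG"

# Primer sequences copied from the text (spaces removed).
forward = "ATAGGATCCA CCATGTTTAT TTTCTTATTA TTTCTTACTC TCACT".replace(" ", "")
reverse = "ATACTCGAGTT ATGTGTAATG TAATTTGACA CCCTTG".replace(" ", "")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Forward primer: BamH1 site, then ACC and the ATG initiation codon.
forward_ok = BAMH1_SITE + "ACCATG" in forward

# Reverse primer, read on the sense strand: a TAA triplet directly
# upstream of the Xho1 site.
reverse_sense = reverse_complement(reverse)
reverse_ok = "TAA" + XHO1_SITE in reverse_sense

print(forward_ok, reverse_ok)
```

Both checks succeed for the sequences as printed, which is consistent with a BamH1-Xho1 fragment beginning at the initiation codon and ending at a stop codon.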
[0475] The capacity of the plasmid constructs to direct the expression of SARS-CoV S in mammalian cells was assessed after transfection of VeroE6 cells (FIG. 22). In this experiment, monolayers of 5×10⁵ VeroE6 cells in 35 mm Petri dishes were transfected with 2 μg of plasmids pcDNA (as control), pcDNA-S, pCI and pCI-S and 6 μl of Fugene6 reagent according to the manufacturer's instructions (Roche). After 48 hours of incubation at 37° C. and under 5% CO2, cellular extracts were prepared in loading buffer according to Laemmli, separated on 8% SDS polyacrylamide gel, and then transferred onto a PVDF membrane (BioRad). The detection of this immunoblot (Western blot) was carried out with the aid of an anti-S rabbit polyclonal serum (immune serum from the rabbit P11135: cf. example 4 above) and donkey polyclonal antibodies directed against rabbit IgGs and coupled with peroxidase (NA934V, Amersham). The bound antibodies were visualized by luminescence with the aid of the ECL+ kit (Amersham) and autoradiography films Hyperfilm MP (Amersham).
[0476] This experiment (FIG. 22) shows that the plasmid pcDNA-S does not direct the expression of SARS-CoV S at detectable levels whereas the plasmid pCI-S allows a weak expression, close to the limit of detection, which may be detected when the film is overexposed. Similar results were obtained when the expression of S was sought by immunofluorescence (data not shown). This failure to detect effective expression of S cannot be attributed to the detection techniques used since the S protein can be detected at the expected size (180 kDa) in an extract of cells infected with SARS-CoV or in an extract of VeroE6 cells infected with the recombinant vaccinia virus VV-TF7.3 and transfected with the plasmid pcDNA-S. In this latter experiment, the virus VV-TF7.3 expresses the RNA polymerase of the T7 phage and allows the cytoplasmic transcription of an uncapped RNA capable of being efficiently translated. This experiment suggests that the expression defects described above are due to an intrinsic inability of the cDNA for S to be efficiently expressed when the step for transcription to messenger RNA is carried out at the nuclear level.
[0477] In a second experiment, the effect of the CTE and WPRE signals on the expression of S was assessed after transfection of VeroE6 (FIG. 23A) and 293T (FIG. 23B) cells and according to a protocol similar to that described above. Whereas the expression of S cannot be detected after transfection of the plasmids pcDNA-S-CTE and pcDNA-S-WPRE derived from pcDNA-S, the insertion of the WPRE and CTE signals greatly improves the expression of S in the context of the expression plasmid pCI-S.
[0478] To quantify this result, a second series of experiments was carried out in which the immunoblot is quantitatively visualized by luminescence and acquisition on a digital imaging device (Fluor S, BioRad). The analysis of the results obtained with the QuantityOne v4.2.3 software (BioRad) shows that the WPRE and CTE sequences increase the expression of S by a factor of 20 to 42 and 10 to 26, respectively, in VeroE6 cells (table X). In 293T cells (table X), the effect of the CTE sequence is more moderate (4 to 5 times) whereas that of the WPRE sequence remains high (13 to 28 times).
TABLE-US-00019 TABLE X Quantitative analysis of the effect of the CTE and WPRE signals on the expression of SARS-CoV S:
Plasmid      cell     exp. 1       exp. 2
pCI          VeroE6   0.0          0.0
pCI-S        VeroE6   1.0 ± 0.1    1.0
pCI-S-CTE    VeroE6   9.8 ± 0.9    26.4
pCI-S-WPRE   VeroE6   20.1 ± 2.0   42.3
pCI          293T     0.0          0.0
pCI-S        293T     1.0          1.0
pCI-S-CTE    293T     4.6          4.0
pCI-S-WPRE   293T     27.6         12.8
Cellular extracts were prepared 48 hours after transfection of VeroE6 or 293T cells with the plasmids pCI, pCI-S, pCI-S-CTE and pCI-S-WPRE and analyzed by Western blotting as described in the legend to FIG. 22. The Western blot is visualized by luminescence (ECL+, Amersham) and acquisition on a digital imaging device (FluorS, BioRad). The expression levels are indicated according to an arbitrary scale where the value of 1 represents the level measured after transfection of the plasmid pCI-S. Two independent experiments were carried out for each of the two cell types. In experiment 1 on VeroE6 cells, the transfections were carried out in duplicate and the results are indicated in the form of the mean and standard deviation values for the expression levels measured.
[0479] In summary, all these results show that the expression, in mammalian cells, of the cDNA for the SARS-CoV S under the control of the RNA polymerase II promoter sequences requires, to be efficient, the presence of a splice signal and of either of the sequences WPRE and CTE.
2) Production of Stable Lines Allowing the Expression of SARS-CoV S
[0480] The cDNA for the SARS-CoV S protein was cloned in the form of a BamH1-Xho1 fragment into the plasmid pTRIPΔU3-CMV containing a defective lentiviral vector TRIP with central DNA flap (Sirven et al., 2001, Mol. Ther., 3: 438-448) in order to obtain the plasmid pTRIP-S (FIG. 24). Transient cotransfection according to Zennou et al. (2000, Cell, 101: 173-185) of this plasmid, of an encapsidation plasmid (p8.2) and of a plasmid for expression of the VSV envelope glycoprotein G (pHCMV-G) in 293T cells allowed the preparation of retroviral pseudoparticles containing the vector TRIP-S and pseudotyped with the envelope protein G. These pseudotyped TRIP-S vectors were used to transduce 293T and FRhK-4 cells: no expression of the S protein could be detected by Western blotting and immunofluorescence in the transduced cells (data not presented).
[0481] The optimum expression cassettes consisting of the CMV virus immediate/early promoter, a splice signal, cDNA for S and either of the posttranscriptional signals WPRE or CTE described above were then substituted for the EF1α-EGFP cassette of the defective lentiviral expression vector with central DNA flap TRIPΔU3-EF1α (Sirven et al., 2001, Mol. Ther., 3: 438-448) (FIG. 25). These substitutions were carried out by a series of successive subclonings of the S expression cassettes which were excised from the plasmids pCI-S-CTE (BglII-Apa1) or respectively pCI-S-WPRE (BglII-Sal1) and then inserted between the Mlu1 and Kpn1 sites or respectively Mlu1 and Xho1 sites of the plasmid TRIPΔU3-EF1α in order to obtain the plasmids pTRIP-SD/SA-S-CTE and pTRIP-SD/SA-S-WPRE, deposited at the CNCM, on Dec. 1, 2004, under the numbers I-3336 and I-3334, respectively. Pseudotyped vectors were produced according to Zennou et al. (2000, Cell, 101: 173-185) and used to transduce 293T cells (10 000 cells) and FRhK-4 cells (15 000 cells) according to a series of 5 successive transduction cycles with a quantity of vectors corresponding to 25 ng (TRIP-SD/SA-S-CTE) or 22 ng (TRIP-SD/SA-S-WPRE) of p24 per cycle.
[0482] The transduced cells were cloned by limiting dilution and a series of clones were qualitatively analyzed for the expression of SARS-CoV S by immunofluorescence (data not shown), and then quantitatively by Western blotting (FIG. 25) with the aid of an anti-S rabbit polyclonal serum. The results presented in FIG. 25 show that clones 2 and 15 of FRhK4-S-CTE cells transduced with TRIP-SD/SA-S-CTE and clones 4, 9 and 12 of FRhK4-S-WPRE cells transduced with TRIP-SD/SA-S-WPRE allow the expression of the SARS-CoV S at respectively low or moderate levels if they are compared to those which can be observed during infection with SARS-CoV.
[0483] In summary, the vectors TRIP-SD/SA-S-CTE and TRIP-SD/SA-S-WPRE allow the production of stable clones of FRhK-4 cells and similarly 293T cells expressing SARS-CoV S, whereas the assays carried out with the "parent" vector TRIP-S remained unsuccessful, which demonstrates the need for a splice signal and for either of the sequences CTE and WPRE for the production of stable cell clones expressing the S protein.
[0484] In addition, these modifications of the vector TRIP (insertion of a splice signal and of a posttranscriptional signal like CTE and WPRE) could prove advantageous for improving the expression of other cDNAs than that for S.
3) Production of Stable Lines Allowing the Expression of a Soluble Form of SARS-CoV S. Purification of this Recombinant Antigen.
[0485] A cDNA encoding a soluble form of the S protein (Ssol) was obtained by fusing the sequences encoding the ectodomain of the protein (amino acids 1 to 1193) with those of a tag (FLAG:DYKDDDDK) via a BspE1 linker encoding the SG dipeptide. Practically, in order to obtain the plasmid pcDNA-Ssol, a DNA fragment encoding the ectodomain of SARS-CoV S was amplified by PCR with the aid of the oligonucleotides 5'-ATAGGATCCA CCATGTTTAT TTTCTTATTA TTTCTTACTC TCACT-3' and 5'-ACCTCCGGAT TTAATATATT GCTCATATTT TCCCAA-3' from the plasmid pcDNA-S, and then inserted between the unique BamH1 and BspE1 sites of a modified eukaryotic expression plasmid pcDNA3.1(+) (Clontech) containing the tag sequence FLAG between its BamH1 and Xho1 sites:
TABLE-US-00020
// GGATCC . . . nnn . . . TCC GGA GAT TAT AAA GAT GAC GAC GAT AAA TAA CTCGAG //
   BamH1                  S   G   D   Y   K   D   D   D   D   K   ter Xho1
[0486] The Nhe1-Xho1 and BamH1-Xho1 fragments, containing the cDNA for S, were then excised from the plasmid pcDNA-Ssol, and subcloned between the corresponding sites of the plasmid pTRIP-SD/SA-S-CTE and of the plasmid pTRIP-SD/SA-S-WPRE, respectively, in order to obtain the plasmids pTRIP-SD/SA-Ssol-CTE and pTRIP-SD/SA-Ssol-WPRE, deposited at the CNCM, on Dec. 1, 2004, under the numbers I-3337 and I-3335, respectively.
[0487] Pseudotyped vectors were produced according to Zennou et al. (2000, Cell, 101:173-185) and used to transduce FRhK-4 cells (15 000 cells) according to a series of 5 successive transduction cycles with a quantity of vector corresponding to 24 ng (TRIP-SD/SA-Ssol-CTE) or 40 ng (TRIP-SD/SA-Ssol-WPRE) of p24 per cycle. The transduced cells were cloned by limiting dilution and a series of 16 clones transduced with TRIP-SD/SA-Ssol-CTE and of 15 clones with TRIP-SD/SA-Ssol-WPRE were analyzed for the expression of the Ssol polypeptide by Western blotting visualized with an anti-FLAG monoclonal antibody (FIG. 26 and data not presented), and by capture ELISA specific for the Ssol polypeptide which was developed for this purpose (table XI and data not presented). Part of the process for selecting the best secretory clones is shown in FIG. 26. Capture ELISA is based on the use of solid phases coated with polyclonal antibodies of rabbits immunized with purified and inactivated SARS-CoV. These solid phases allow the capture of the Ssol polypeptide secreted into the cellular supernatants, whose presence is then visualized with a series of steps successively involving the attachment of an anti-FLAG monoclonal antibody (M2, SIGMA), of anti-mouse IgG(H+L) biotinylated rabbit polyclonal antibodies (Jackson) and of a streptavidin-peroxidase conjugate (Amersham) and then the addition of chromogen and substrate (TMB+H2O2, KPL).
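The reading frame of the linker and tag shown in this diagram can be verified by translating the nucleotide sequence with the standard genetic code. The Python sketch below is illustrative; the compact codon table and the function name are not part of the original disclosure.

```python
# Standard genetic code, built in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame DNA sequence, one letter per codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# BspE1 linker (TCC GGA) followed by the FLAG tag and the stop codon.
linker_and_tag = "TCC GGA GAT TAT AAA GAT GAC GAC GAT AAA TAA".replace(" ", "")

print(translate(linker_and_tag))  # SGDYKDDDDK*
```

The translation gives the SG dipeptide encoded by the BspE1 site (TCCGGA), followed by the FLAG sequence DYKDDDDK and a stop codon.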
TABLE-US-00021 TABLE XI Analysis of the expression of the Ssol polypeptide by cell lines transduced with the lentiviral vectors TRIP-SD/SA-Ssol-WPRE and TRIP-SD/SA-Ssol-CTE.
Vector                 Clone    OD (450 nm)
Control                --       0.031
TRIP-SD/SA-Ssol-CTE    CTE2     0.547
                       CTE3     0.668
                       CTE9     0.171
                       CTE12    0.208
                       CTE13    0.133
TRIP-SD/SA-Ssol-WPRE   WPRE1    0.061
                       WPRE10   0.134
The secretion of the Ssol polypeptide was assessed in the supernatant of a series of cell clones isolated after transduction of FRhK-4 cells with the lentiviral vectors TRIP-SD/SA-Ssol-WPRE and TRIP-SD/SA-Ssol-CTE. The supernatants diluted 1/50 were analyzed by a capture ELISA test specific for SARS-CoV S.
[0488] The cell line secreting the highest quantities of Ssol polypeptide in the culture supernatant is the FRhK4-Ssol-CTE3 line. It was subjected to a second series of 5 cycles of transduction with the vector TRIP-SD/SA-Ssol-CTE under conditions similar to those described above and then cloned. The subclone secreting the highest quantities of Ssol was selected by a combination of Western blot and capture ELISA analysis: it is the subclone FRhK4-Ssol-30, which was deposited at the CNCM, on Nov. 22, 2004, under the number I-3325.
[0489] The FRhK4-Ssol-30 line allows the quantitative production and purification of the recombinant Ssol polypeptide. In a typical experiment where the experimental conditions for growth, production and purification were optimized, the cells of the FRhK4-Ssol-30 line are inoculated in standard culture medium (pyruvate-free DMEM containing 4.5 g/l of glucose and supplemented with 5% FCS, 100 U/ml of penicillin and 100 μg/ml of streptomycin) in the form of a subconfluent monolayer (1 million cells per 100 cm² in 20 ml of medium). At confluence, the standard medium is replaced with the secretion medium where the quantity of FCS is reduced to 0.5% and the quantity of medium reduced to 16 ml per 100 cm². The culture supernatant is removed after 4 to 5 days of incubation at 35° C. and under 5% CO2. The recombinant polypeptide Ssol is purified from the supernatant by the succession of steps of filtration on 0.1 μm polyethersulfone (PES) membrane, concentration by ultrafiltration on a PES membrane with a 50 kD cut-off, affinity chromatography on anti-FLAG matrix with elution with a solution of FLAG peptide (DYKDDDDK) at 100 μg/ml in TBS (50 mM tris, pH 7.4, 150 mM NaCl) and then gel filtration chromatography in TBS on Sephadex G-75 beads (Pharmacia). The concentration of the purified recombinant Ssol polypeptide was determined by micro-BCA test (Pierce) and then its biochemical characteristics analyzed.
[0490] Analysis by 8% SDS acrylamide gel stained with silver nitrate demonstrates a predominant polypeptide whose molecular mass is about 180 kD and whose degree of purity may be evaluated at 98% (FIG. 27A). Two main peaks are detected by SELDI-TOF mass spectrometry (Ciphergen): they correspond to single and double charged forms of a predominant polypeptide whose molecular mass is thus determined at 182.6±3.7 kD (FIGS. 27B and C). After transfer onto Prosorb membrane and rinsing in 0.1% TFA, the N-terminal end of the Ssol polypeptide was sequenced in liquid phase by Edman degradation on 5 residues (ABI494, Applied Biosystems) and determined as being SDLDR (FIG. 27D). This demonstrates that the signal peptide located at the N-terminal end of the SARS-CoV S protein, composed of aa 1 to 13 (MFIFLLFLTLTSG) according to an analysis carried out with the software signalP v2.0 (Nielsen et al., 1997, Protein Engineering, 10:1-6), is cleaved from the mature Ssol polypeptide. The recombinant Ssol polypeptide therefore consists of amino acids 14 to 1193 of the SARS-CoV S protein fused at the C-terminus with a sequence SGDYKDDDDK containing the sequence of the FLAG tag (DYKDDDDK). The difference between the theoretical molar mass of the naked Ssol polypeptide (132.0 kD) and the real molar mass of the mature polypeptide (182.6 kD) suggests that the Ssol polypeptide is glycosylated.
[0491] A preparation of purified Ssol polypeptide, whose protein concentration was determined by micro-BCA test, makes it possible to prepare a calibration series in order to measure, with the aid of the capture ELISA test described above, the concentrations of Ssol present in the culture supernatants and to review the characteristics of the secretory lines. According to this test, the FRhK4-Ssol-CTE3 line secretes 4 to 6 μg/ml of polypeptide Ssol while the FRhK4-Ssol-30 line secretes 9 to 13 μg/ml of Ssol after 4 to 5 days of culture at confluence. In addition, the purification scheme presented above makes it possible routinely to purify from 1 to 2 mg of Ssol polypeptide per liter of culture supernatant.
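The figures given in this paragraph can be combined in a short worked calculation. The Python sketch below derives the length of the mature polypeptide from the stated residue range and tag, and the mass attributable to modifications from the two stated molar masses; attributing the whole difference to glycosylation is the interpretation suggested in the text, not a measurement, and the variable names are illustrative.

```python
signal_peptide = "MFIFLLFLTLTSG"   # aa 1 to 13, cleaved from the mature form
first_residue, last_residue = 14, 1193
tag = "SGDYKDDDDK"                 # SG linker followed by the FLAG tag

# Number of residues in the mature recombinant Ssol polypeptide.
mature_length = (last_residue - first_residue + 1) + len(tag)

theoretical_kd = 132.0   # naked polypeptide, as stated in the text
measured_kd = 182.6      # SELDI-TOF measurement, as stated in the text

# Mass not accounted for by the polypeptide chain, and its share of the total.
extra_kd = round(measured_kd - theoretical_kd, 1)
extra_percent = round(100 * (measured_kd - theoretical_kd) / measured_kd, 1)

print(mature_length)   # 1190 residues
print(extra_kd)        # 50.6 kD
print(extra_percent)   # 27.7 (% of the measured mass)
```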
EXAMPLE 12
Gene Immunization Involving the SARS-Associated Coronavirus (SARS-CoV) Spicule (S) Protein
[0492] The effect of a splice signal and of the posttranscriptional signals WPRE and CTE was analyzed after gene immunization of BALB/c mice (FIG. 28).
[0493] For that, BALB/c mice were immunized at intervals of 4 weeks by injecting into the tibialis anterior a saline solution of 50 μg of plasmid DNA of pcDNA-S and pCI-S and, as a control, 50 μg of plasmid DNA of pcDNA-N (directing the expression of SARS-CoV N) or of pCI-HA (directing the expression of the HA of the influenza virus A/PR/8/34), and the immune sera were collected 3 weeks after the 2nd injection. The presence of antibodies directed against the SARS-CoV S was assessed by indirect ELISA using as antigen a lysate of VeroE6 cells infected with SARS-CoV and, as a control, a lysate of noninfected VeroE6 cells. The anti-SARS-CoV antibody titers (TI) are calculated as the reciprocal of the dilution producing a specific OD of 0.5 (difference between OD measured on a lysate of infected cells and OD measured on a lysate of noninfected cells) after visualization with an anti-mouse IgG polyclonal antibody coupled with peroxidase (NA931V, Amersham) and TMB supplemented with H2O2 (KPL) (FIG. 28A).
[0494] Under these conditions, the expression plasmid pcDNA-S only allows the induction of low antibody titers directed against SARS-CoV S in 3 mice out of 6 (LOG10(TI)=1.9±0.6) whereas the plasmid pcDNA-N allows the induction of anti-N antibodies at high titers (LOG10(TI)=3.9±0.3) in all the animals, and the control plasmids (pCI, pCI-HA) do not result in any detectable antibody (LOG10(TI)<1.7). The plasmid pCI-S equipped with a splice signal allows the induction of antibodies at high titers (LOG10(TI)=3.7±0.2), which are approximately 60 times higher than those observed after injection of the plasmid pcDNA-S (p<10⁻⁵).
[0495] The efficiency of the posttranscriptional signals was studied by carrying out a dose-response study of the anti-S antibody titers induced in the BALB/c mouse as a function of the quantity of plasmid DNA used as immunogen (2 μg, 10 μg and 50 μg). This study (FIG. 28B) demonstrates that the posttranscriptional signal WPRE greatly improves the efficiency of gene immunization when small doses of DNA are used (p<10⁻⁵ for a dose of 2 μg of DNA and p<10⁻² for a dose of 10 μg), whereas the effect of the CTE signal remains marginal (p=0.34 for a dose of 2 μg of DNA).
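The titer definition used here (reciprocal of the dilution producing a specific OD of 0.5) can be sketched in Python. The text does not state how the dilution is interpolated between two measured points; the sketch assumes a linear interpolation of the specific OD against the logarithm of the dilution, and the function name and the example values are hypothetical.

```python
import math


def endpoint_titer(reciprocal_dilutions, specific_ods, threshold=0.5):
    """Reciprocal dilution at which the specific OD falls to the threshold.

    specific_ods are OD(infected lysate) minus OD(noninfected lysate).
    Assumption: linear interpolation of OD against log10(dilution).
    Returns None when the threshold is never crossed.
    """
    points = sorted(zip(reciprocal_dilutions, specific_ods))
    for (d1, od1), (d2, od2) in zip(points, points[1:]):
        if od1 >= threshold > od2:
            fraction = (od1 - threshold) / (od1 - od2)
            log_titer = math.log10(d1) + fraction * (
                math.log10(d2) - math.log10(d1)
            )
            return 10 ** log_titer
    return None


# Hypothetical serum: specific OD 1.0 at 1/100 and 0.0 at 1/1000.
titer = endpoint_titer([100, 1000], [1.0, 0.0])
print(round(math.log10(titer), 1))  # 2.5
```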
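The approximately 60-fold ratio follows directly from the difference between the two mean log titers, as the following minimal Python check illustrates (variable names are illustrative).

```python
log_titer_pci_s = 3.7     # mean LOG10(TI) after injection of pCI-S
log_titer_pcdna_s = 1.9   # mean LOG10(TI) after injection of pcDNA-S

# A difference of 1.8 log10 units corresponds to a ratio of 10**1.8.
fold_increase = 10 ** (log_titer_pci_s - log_titer_pcdna_s)

print(round(fold_increase))  # 63
```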
[0496] Finally, the antibodies induced in mice after gene immunization neutralize the infectivity of SARS-CoV in vitro (FIGS. 29A and 29B) at titers which are consistent with the titers measured by ELISA.
[0497] In summary, the use of a splice signal and of the posttranscriptional signal WPRE of the woodchuck hepatitis virus considerably improves the induction of neutralizing antibodies directed against SARS-CoV after gene immunization with the aid of plasmid DNA directing the expression of the cDNA for SARS-CoV S.
EXAMPLE 13
Diagnostic Applications of the S Protein
[0498] The ELISA reactivity of the recombinant Ssol polypeptide was analyzed with respect to sera from patients suffering from SARS.
[0499] The sera from probable cases of SARS tested were chosen on the basis of the results (positive or negative) of analysis of their specific reactivity toward the native antigens of SARS-CoV by immunofluorescence test on VeroE6 cells infected with SARS-CoV and/or by indirect ELISA test using as antigen a lysate of VeroE6 cells infected with SARS-CoV. The sera of these patients are identified by a serial number of the National Reference Center for Influenza Viruses and by the initials of the patient and the number of days elapsed since the onset of the symptoms. All the sera of probable cases (cf. Table XII) recognize the native antigens of SARS-CoV, with the exception of the serum 032552 of the patient VTT for whom infection with SARS-CoV could not be confirmed by RT-PCR performed on respiratory samples of days 3, 8 and 12. A panel of control sera was used as control (TV sera): they are sera collected in France before the SARS epidemic that occurred in 2003.
TABLE-US-00022 TABLE XII Sera of probable cases of SARS
Serum    Patient   Sample collection day
031724   JYK       7
033168   JYK       38
033597   JYK       74
032632   NTM       17
032634   THA       15
032541   PHV       10
032542   NIH       17
032552   VTT       8
032633   PTU       16
032791   JLB       3
033258   JLB       27
032703   JCM       8
033153   JCM       29
[0500] Solid phases sensitized with the recombinant Ssol polypeptide were prepared by adsorption of a solution of purified Ssol polypeptide at 2 μg/ml in PBS in the wells of an ELISA plate, and then the plates are incubated overnight at 4° C. and washed with PBS-Tween buffer (PBS, 0.1% Tween 20). After saturating the ELISA plates with a solution of PBS-10% skimmed milk (weight/volume) and washing in PBS-Tween, the sera to be tested (100 μl) are diluted 1/400 in PBS skimmed milk-Tween buffer (PBS, 3% skimmed milk, 0.1% Tween) and then added to the wells of the sensitized ELISA plate. The plates are incubated for 1 h at 37° C. After 3 washings with PBS-Tween buffer, the anti-human IgG conjugate labeled with peroxidase (ref. NA933V, Amersham) diluted 1/4000 in PBS-skimmed milk-Tween buffer is added, and then the plates are incubated for 1 hour at 37° C. After 6 washings with PBS-Tween buffer, the chromogen (TMB) and the substrate (H2O2) are added and the plates are incubated for 10 minutes protected from light. The reaction is stopped by adding a 1 N H3PO4 solution, and then the absorbance is measured at 450 nm with a reference at 620 nm.
[0501] The ELISA tests (FIG. 30) demonstrate that the recombinant Ssol polypeptide is specifically recognized by the serum antibodies of patients suffering from SARS collected at the medium or late phase of infection (≧10 days after the onset of the symptoms) whereas it is not significantly recognized by the serum antibodies of 2 patients (JLB and JCM) collected in the early phase of infection (3 to 8 days after the onset of the symptoms) or by control sera of subjects not suffering from SARS. The serum antibodies of patients JLB and JCM show a seroconversion between days 3 and 27 for the first and 8 and 29 for the second after the onset of the symptoms, which confirms the specificity of the reactivity of these sera toward the Ssol polypeptide.
[0502] In conclusion, these results demonstrate that the recombinant Ssol polypeptide may be used as an antigen for the development of an ELISA test for serological diagnosis of infection with SARS-CoV.
EXAMPLE 14
Vaccine Applications of the Recombinant Soluble S Protein
[0503] The immunogenicity of the recombinant Ssol polypeptide was studied in mice.
[0504] For that, a group of 6 mice was immunized at 3 weeks' interval with 10 μg of recombinant Ssol polypeptide adjuvanted with 1 mg of aluminum hydroxide (Alu-gel-S, Serva) diluted in PBS. Three successive immunizations were performed and the immune sera were collected 3 weeks after each of the immunizations (IS1, IS2, IS3). As a control, a group of mice (mock group) received aluminum hydroxide alone according to the same protocol.
[0505] The immune sera were analyzed per pool for each of the 2 groups by indirect ELISA using a lysate of VeroE6 cells infected with SARS-CoV as antigen and as a control a lysate of noninfected VeroE6 cells. The anti-SARS-CoV antibody titers are calculated as the reciprocal of the dilution producing a specific OD of 0.5 after visualization with an anti-mouse IgG(H+L) polyclonal antibody coupled with peroxidase (NA931V, Amersham) and TMB supplemented with H2O2 (KPL). This analysis (FIG. 31) shows that the immunization with the Ssol polypeptide induces in mice, from the first immunization, antibodies directed against the native form of the SARS-CoV spicule protein present in the lysate of infected VeroE6 cells. After 2 then 3 immunizations, the anti-S antibody titers become very high.
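The titer definition used in this paragraph (reciprocal of the dilution producing a specific OD of 0.5) can be illustrated with a minimal sketch. It assumes that the specific OD is the OD on the infected-cell lysate minus the OD on the noninfected-cell lysate, and that the endpoint is interpolated linearly on a log10 dilution scale; neither assumption is stated in the text. The function names and all OD values are hypothetical.

```python
import math

def specific_od(od_infected, od_noninfected):
    # Assumption: specific signal = infected-lysate OD minus control-lysate OD.
    return [a - b for a, b in zip(od_infected, od_noninfected)]

def endpoint_titer(reciprocal_dilutions, spec_od, cutoff=0.5):
    """Reciprocal dilution at which the specific OD falls to `cutoff`.

    reciprocal_dilutions must be in increasing order (e.g. 100, 400, 1600).
    Interpolation is linear in log10(reciprocal dilution); this is an
    assumption, the source does not describe the interpolation used.
    Returns None if the curve never crosses the cutoff in the tested range.
    """
    points = list(zip(reciprocal_dilutions, spec_od))
    for (d_lo, od_lo), (d_hi, od_hi) in zip(points, points[1:]):
        if od_lo >= cutoff > od_hi:
            frac = (od_lo - cutoff) / (od_lo - od_hi)
            log_titer = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_titer
    return None

# Hypothetical readings for one serum pool (not data from the study).
dilutions = [100, 400, 1600]
od_inf = [1.60, 1.00, 0.15]
od_ctl = [0.10, 0.10, 0.05]
titer = endpoint_titer(dilutions, specific_od(od_inf, od_ctl))
print(round(titer))
```

A pool whose specific OD stays above or below the cutoff over the whole range returns None, which corresponds to a titer reported only as a bound.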
[0506] The immune sera were analyzed per pool for each of the two groups for their capacity to seroneutralize the infectivity of SARS-CoV. 4 points of seroneutralization on FRhK-4 cells (100 TCID50 of SARS-CoV) are produced for each of the 2-fold dilutions tested from 1/20. The seroneutralizing titer is calculated according to the Reed and Muench method as the reciprocal of the dilution neutralizing the infectivity of 2 wells out of 4. This analysis shows that the antibodies induced in mice by the Ssol polypeptide are neutralizing: the titers observed are very high after 2 and then 3 immunizations (greater than 2560 and 5120 respectively, table XIII).
TABLE XIII: Induction of antibodies directed against SARS-CoV after immunization with the recombinant Ssol polypeptide
Group   Sera   Neutralizing Ab
Mock    pi     <20
Mock    IS1    <20
Mock    IS2    <20
Mock    IS3    <20
Ssol    pi     <20
Ssol    IS1    57
Ssol    IS2    >2560
Ssol    IS3    >5120
The immune sera were analyzed per pool for each of the two groups for their capacity to seroneutralize the infectivity of 100 TCID50 of SARS-CoV on FRhK-4 cells. 4 points are produced for each of the 2-fold dilutions tested from 1/20. The seroneutralizing titer is calculated according to the Reed and Muench method as the reciprocal of the dilution neutralizing the infectivity of 2 wells out of 4.
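The 50% endpoint calculation referred to above (reciprocal of the dilution neutralizing 2 wells out of 4) can be sketched as follows. This is a generic Reed and Muench cumulative calculation written for illustration, not the inventors' worksheet; the function name and the well counts in the example are hypothetical.

```python
import math

def reed_muench_titer(reciprocal_dilutions, protected, wells=4):
    """50% neutralization endpoint by the Reed and Muench cumulative method.

    reciprocal_dilutions: increasing values, e.g. 20, 40, 80 (2-fold series).
    protected: number of wells (out of `wells`) with no infection per dilution.
    Returns None when the 50% point lies outside the tested range
    (reported as "<20" or ">max" in practice).
    """
    n = len(reciprocal_dilutions)
    unprotected = [wells - p for p in protected]
    # A well protected at a high dilution would also be protected at lower ones,
    # so protected wells accumulate from the most dilute end ...
    cum_prot = [sum(protected[i:]) for i in range(n)]
    # ... and unprotected wells accumulate from the most concentrated end.
    cum_unprot = [sum(unprotected[:i + 1]) for i in range(n)]
    pct = [100.0 * cp / (cp + cu) for cp, cu in zip(cum_prot, cum_unprot)]
    for i in range(n - 1):
        if pct[i] >= 50.0 > pct[i + 1]:
            # Proportionate distance between the two bracketing dilutions.
            pd = (pct[i] - 50.0) / (pct[i] - pct[i + 1])
            step = math.log10(reciprocal_dilutions[i + 1] / reciprocal_dilutions[i])
            return 10 ** (math.log10(reciprocal_dilutions[i]) + pd * step)
    return None

# Hypothetical plate: 4 wells per dilution, 2-fold dilutions from 1/20.
dilutions = [20, 40, 80, 160, 320]
protected_wells = [4, 4, 3, 1, 0]
print(reed_muench_titer(dilutions, protected_wells))
```

When every well is protected at the highest dilution tested, the function returns None, which matches the way Table XIII reports such sera as greater than the last dilution.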
[0507] The neutralizing titers observed in mice immunized with the Ssol polypeptide reach levels far greater than the titers observed by Yang et al. in mice (2004, Nature, 428:561-564) and those observed by Buchholz in the hamster (2004, PNAS 101:9804-9809) which protect respectively mice and hamsters from infection with SARS-CoV. It is therefore probable that the neutralizing antibodies induced in mice after immunization with the Ssol polypeptide protect these animals against infection with SARS-CoV.
EXAMPLE 15
Optimized Synthetic Gene for the Expression in Mammalian Cells of the SARS-Associated Coronavirus (SARS-CoV) Spicule (S) Protein
1) Design of the Synthetic Gene
[0508] A synthetic gene encoding the SARS-CoV spicule protein was designed from the gene of the isolate 031589 (plasmid pSARS-S, C.N.C.M. No. I-3059) so as to allow high levels of expression in mammalian cells and in particular in cells of human origin.
[0509] For that:
[0510] the use of codons of the wild-type gene of the isolate 031589 was modified so as to become close to the bias observed in humans and to improve the efficiency of translation of the corresponding mRNA;
[0511] the overall GC content of the gene was increased so as to extend the half-life of the corresponding mRNA;
[0512] the optionally cryptic motifs capable of interfering with an efficient expression of the gene were deleted (splice donor and acceptor sites, polyadenylation signals, sequences very rich (>80%) or very low (<30%) in GC, repeat sequences, sequences involved in the formation of secondary RNA structures, TATA boxes);
[0513] a second STOP codon was added to allow efficient termination of translation.
[0514] In addition, CpG motifs were introduced into the gene so as to increase its immunogenicity as DNA vaccine. In order to facilitate the manipulation of the synthetic gene, two BamH1 and Xho1 restriction sites were placed on either side of the open reading frame of the S protein, and the BamH1, Xho1, Nhe1, Kpn1, BspE1 and Sal1 restriction sites were avoided in the synthetic gene.
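Some of the design constraints listed above (GC content, absence of the named restriction sites inside the open reading frame, a doubled STOP codon) lend themselves to simple automated checks. The sketch below is illustrative only: the recognition sequences are the standard ones for these enzymes, but the helper names and the toy sequence are hypothetical and are not taken from SEQ ID No: 140.

```python
# Standard recognition sequences of the enzymes named in the text (all palindromic,
# so scanning one strand is sufficient).
RESTRICTION_SITES = {
    "BamH1": "GGATCC",
    "Xho1": "CTCGAG",
    "Nhe1": "GCTAGC",
    "Kpn1": "GGTACC",
    "BspE1": "TCCGGA",
    "Sal1": "GTCGAC",
}
STOP_CODONS = {"TAA", "TAG", "TGA"}

def gc_content(seq):
    """Fraction of G and C residues in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def internal_sites(orf, sites=RESTRICTION_SITES):
    """Map each enzyme to the 0-based positions of its site inside `orf`."""
    orf = orf.upper()
    hits = {}
    for name, site in sites.items():
        positions = [i for i in range(len(orf) - len(site) + 1) if orf.startswith(site, i)]
        if positions:
            hits[name] = positions
    return hits

def ends_with_two_stops(orf):
    """True if the last two codons of `orf` are both STOP codons."""
    orf = orf.upper()
    return orf[-6:-3] in STOP_CODONS and orf[-3:] in STOP_CODONS

# Hypothetical toy sequence that deliberately contains a BamH1 site.
toy_orf = "ATGTTCATCTTCCTGGGATCCAAGTAATGA"
print(gc_content(toy_orf), internal_sites(toy_orf), ends_with_two_stops(toy_orf))
```

A designed gene that meets the stated constraints would return an empty dictionary from `internal_sites` for its open reading frame, with the BamH1 and Xho1 sites present only in the flanking sequence.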
[0515] The sequence of the synthetic gene designed (gene 040530) is given in SEQ ID No: 140.
[0516] An alignment of the synthetic gene 040530 with the sequence of the wild-type gene of the isolate 031589 of SARS-CoV deposited at the C.N.C.M. under the number I-3059 (SEQ ID No: 4, plasmid pSARS-S) is presented in FIG. 32.
2) Plasmid Constructs
[0517] The synthetic gene SEQ ID No: 140 was assembled from synthetic oligonucleotides and cloned between the Kpn1 and Sac1 sites of the plasmid pUC-Kana in order to give the plasmid 040530pUC-Kana. The nucleotide sequence of the insert of the plasmid 040530pUC-Kana was verified by automated sequencing (Applied).
[0518] A Kpn1-Xho1 fragment containing the synthetic gene 040530 was excised from the plasmid 040530pUC-Kana and subcloned between the Nhe1 and Xho1 sites of the expression plasmid pCI (Promega) in order to obtain the plasmid pCI-SSYNTH, deposited at the CNCM on Dec. 1, 2004, under the number I-3333.
[0519] A synthetic gene encoding the soluble form of the S protein was then obtained by fusing the synthetic sequences encoding the ectodomain of the S protein (amino acids 1 to 1193) with those of the tag (FLAG: DYKDDDDK) via a BspE1 linker encoding the dipeptide SG. In practice, a DNA fragment encoding the ectodomain of the SARS-CoV S was amplified by PCR with the aid of the oligonucleotides 5'-ACTAGCTAGC GGATCCACCATGTTCATCTT CCTG-3' and 5'-AGTATCCGGAC TTG ATGTACT GCTCGTACTTGC-3' from the plasmid 040530pUC-Kana, digested with Nhe1 and BspE1 and then inserted between the unique Nhe1 and BspE1 sites of the plasmid pCI-Ssol, to give the plasmid pCI-SCUBE, deposited at the CNCM on Dec. 1, 2004, under the number I-3332. (The plasmids pCI-Ssol, pCI-Ssol-CTE, and pCI-Ssol-WPRE (deposited at the CNCM, on Nov. 22, 2004, under the number I-3324) had been previously obtained by subcloning the Kpn1-Xho1 fragment excised from the plasmid pcDNA-Ssol (see technical note of DI 2004-106) between the Nhe1 and Xho1 sites of the plasmids pCI, pCI-S-CTE and pCI-S-WPRE respectively.)
[0520] The plasmids pCI-Scube and pCI-Ssol encode the same recombinant Ssol polypeptide.
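The layout of the Ssol fusion described in this section can be checked with a short sketch: the BspE1 recognition sequence read as two codons gives the SG dipeptide, both PCR primers carry the restriction sites used for cloning, and the polypeptide length follows from its three parts. The two-entry codon table is a subset of the standard genetic code; the variable and function names are hypothetical.

```python
# Two codons of the standard genetic code are enough to read the BspE1 site.
CODONS = {"TCC": "S", "GGA": "G"}

def translate(dna):
    """Translate an in-frame DNA string using the (partial) codon table above."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))

BSPE1_SITE = "TCCGGA"
NHE1_SITE = "GCTAGC"
BAMH1_SITE = "GGATCC"
FLAG_TAG = "DYKDDDDK"

# Primers as printed in the text, with the spacing removed.
forward_primer = "ACTAGCTAGC GGATCCACCATGTTCATCTT CCTG".replace(" ", "")
reverse_primer = "AGTATCCGGAC TTG ATGTACT GCTCGTACTTGC".replace(" ", "")

linker = translate(BSPE1_SITE)            # dipeptide encoded by the BspE1 site
ectodomain_length = 1193                  # amino acids 1 to 1193 of S
ssol_length = ectodomain_length + len(linker) + len(FLAG_TAG)

print(linker, ssol_length)
print(NHE1_SITE in forward_primer, BAMH1_SITE in forward_primer, BSPE1_SITE in reverse_primer)
```

The length computed here counts the primary translation product including the first 13 residues; whether those residues are removed on secretion is not addressed by this sketch.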
3) Results
[0521] The capacity of the synthetic gene encoding the S protein to efficiently direct the expression of the SARS-CoV S in mammalian cells was compared with that of the wild-type gene after transient transfection of primate cells (VeroE6) and of human cells (293T).
[0522] In the experiment presented in FIG. 33 and in table XIV, monolayers of 5×10^5 VeroE6 cells or 7×10^5 293T cells in 35 mm Petri dishes were transfected with 2 μg of plasmids pCI (as control), pCI-S, pCI-S-CTE, pCI-S-WPRE and pCI-Ssynth and 6 μl of Fugene6 reagent according to the manufacturer's instructions (Roche). After 48 hours of incubation at 37° C. and under 5% CO2, cell extracts were prepared in loading buffer according to Laemmli, separated on 8% SDS polyacrylamide gel and then transferred onto a PVDF membrane (BioRad). The detection of this immunoblot (Western blot) was carried out with the aid of an anti-S rabbit polyclonal serum (immune serum of the rabbit P11135: cf. example 4 above) and of donkey polyclonal antibodies directed against rabbit IgGs and coupled with peroxidase (NA934V, Amersham). The immunoblot was quantitatively visualized by luminescence with the aid of the ECL+ kit (Amersham) and acquisition on a digital imaging device (Fluor S, BioRad).
[0523] The analysis of the results obtained with the software QuantityOne v4.2.3 (BioRad) shows that in this experiment, the plasmid pCI-Ssynth allows the transient expression of the S protein at high levels in the VeroE6 and 293T cells, whereas the plasmid pCI-S does not make it possible to induce expression at sufficient levels to be detected. The expression levels observed are of the order of twice as high as those observed with the plasmid pCI-S-WPRE.
TABLE XIV: Use of a synthetic gene for the expression of the SARS-CoV S
Plasmid       VeroE6   293T
pCI           0.0      0.0
pCI-S         ≦0.1     ≦0.1
pCI-S-CTE     0.5      ≦0.1
pCI-S-WPRE    1.0      1.0
pCI-Ssynth    1.8      1.9
Cell extracts prepared 48 hours after transfection of VeroE6 or 293T cells with the plasmids pCI, pCI-S, pCI-S-CTE, pCI-S-WPRE and pCI-Ssynth were separated on 8% SDS acrylamide gel and analyzed by Western blotting with the aid of an anti-S rabbit polyclonal antibody and an anti-rabbit IgG(H+L) polyclonal antibody coupled with peroxidase (NA934V, Amersham). The Western blot is visualized by luminescence (ECL+, Amersham) and acquisition on a digital imaging device (Fluor S, BioRad). The expression levels of the S protein were measured by quantifying the two predominant bands identified on the image (see FIG. 33) and are indicated according to an arbitrary scale where the value 1 represents the level measured after transfection of the plasmid pCI-S-WPRE.
[0524] In a second instance, the capacity of the synthetic gene Scube to efficiently direct the synthesis and the secretion of the Ssol polypeptide by mammalian cells was compared with that of the wild-type gene after transient transfection of hamster cells (BHK-21) and of human cells (293T).
[0525] In the experiment presented in table XV, monolayers of 6×10^5 BHK-21 cells and 7×10^5 293T cells in 35 mm Petri dishes were transfected with 2 μg of plasmids pCI (as control), pCI-Ssol, pCI-Ssol-CTE, pCI-Ssol-WPRE and pCI-Scube and 6 μl of Fugene6 reagent according to the manufacturer's instructions (Roche). After 48 hours of incubation at 37° C. and under 5% CO2, the cellular supernatants were collected and quantitatively analyzed for the secretion of the Ssol polypeptide by a capture ELISA test specific for the Ssol polypeptide.
[0526] Analysis of the results shows that, in this experiment, the plasmid pCI-Scube allows the expression of the Ssol polypeptide at levels 8 times (BHK-21 cells) to 20 times (293T cells) higher than the plasmid pCI-Ssol.
[0527] The levels of expression observed are of the order of twice (293T cells) to 5 times (BHK-21 cells) as high as those observed with the plasmid pCI-Ssol-WPRE.
TABLE XV: Use of a synthetic gene for the expression of the Ssol polypeptide
Plasmid          BHK (ng/ml)   293T (ng/ml)
pCI              <20           <20
pCI-Ssol         <20           56 ± 10
pCI-Ssol-CTE     <20           63 ± 8
pCI-Ssol-WPRE    28 ± 1        531 ± 15
pCI-Scube        152 ± 6       1140 ± 20
The supernatants were harvested 48 hours after transfection of BHK or 293T cells with the plasmids pCI, pCI-Ssol, pCI-Ssol-CTE, pCI-Ssol-WPRE and pCI-Scube and quantitatively analyzed for the secretion of the Ssol polypeptide by an ELISA test specific for the Ssol polypeptide. The transfections were carried out in duplicate and the results are presented in the form of means and standard deviations of the concentrations of Ssol polypeptide (ng/ml) measured in the supernatants.
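The fold increases quoted in paragraphs [0526] and [0527] follow directly from the mean concentrations in Table XV, as the short calculation below shows. The only assumption is the handling of the "<20" entry for pCI-Ssol in BHK cells: the detection limit of 20 ng/ml is used as an upper bound, so the corresponding ratio is a lower bound. The dictionary layout and names are hypothetical.

```python
# Mean Ssol concentrations (ng/ml) from Table XV; "<20" entries are represented
# by the detection limit and flagged as upper bounds.
DETECTION_LIMIT = 20.0
means = {
    "BHK":  {"pCI-Ssol": DETECTION_LIMIT, "pCI-Ssol-WPRE": 28.0,  "pCI-Scube": 152.0},
    "293T": {"pCI-Ssol": 56.0,            "pCI-Ssol-WPRE": 531.0, "pCI-Scube": 1140.0},
}

def fold(cell_line, reference):
    """Ratio of pCI-Scube expression to that of a reference plasmid."""
    return means[cell_line]["pCI-Scube"] / means[cell_line][reference]

for line in ("BHK", "293T"):
    print(line,
          "vs pCI-Ssol:", round(fold(line, "pCI-Ssol"), 1),
          "vs pCI-Ssol-WPRE:", round(fold(line, "pCI-Ssol-WPRE"), 1))
```

The BHK ratio against pCI-Ssol is at least 7.6 because the reference value is below the detection limit, which is consistent with the "8 times" quoted in the text.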
[0528] In summary, these results show that the expression, in mammalian cells, of the synthetic gene 040530 encoding SARS-CoV S under the control of RNA polymerase II promoter sequences is much more efficient than that of the wild-type gene of the 031589 isolate. This expression is even more efficient than that directed by the wild-type gene in the presence of the WPRE sequences of the woodchuck hepatitis virus.
4) Applications
[0529] The synthetic gene 040530 encoding SARS-CoV S, or its Scube variant encoding the polypeptide Ssol, can advantageously replace the wild-type gene in numerous applications where the expression of S is necessary at high levels, in particular in order to:
[0530] improve the efficiency of gene immunization with plasmids of the pCI-Ssynth or even pCI-Ssynth-CTE or pCI-Ssynth-WPRE type;
[0531] establish novel cell lines expressing higher quantities of the S protein or of the Ssol polypeptide with the aid of recombinant lentiviral vectors carrying the Ssynth gene or the Scube gene respectively;
[0532] improve the immunogenicity of the recombinant lentiviral vectors allowing the expression of the S protein or of the Ssol polypeptide;
[0533] improve the immunogenicity of live vectors allowing the expression of the S protein or of the Ssol polypeptide, such as recombinant vaccinia viruses or recombinant measles viruses (see examples 16 and 17 below).
EXAMPLE 16
Expression of the SARS-Associated Coronavirus (SARS-CoV) Spicule (S) Protein with the Aid of Recombinant Vaccinia Viruses
Vaccine Application
Application to the Production of a Soluble Form of the Spicule (S) Protein and Design of a Serological Test for SARS
1) Introduction
[0534] The aim of this example is to evaluate the capacity of recombinant vaccinia viruses (VV) expressing various SARS-associated coronavirus (SARS-CoV) antigens to constitute novel vaccine candidates against SARS and a means of producing recombinant antigens in mammalian cells.
[0535] For that, the inventors focused on the SARS-CoV spicule (S) protein which makes it possible to induce, after gene immunization in animals, antibodies neutralizing the infectivity of SARS-CoV, and a soluble and secreted form of this protein, the Ssol polypeptide, which is composed of the ectodomain (aa 1-1193) of S fused at its C-ter end with a tag FLAG (DYKDDDDK) via a BspE1 linker encoding the SG dipeptide. This Ssol polypeptide exhibits an antigenicity similar to that of the S protein and allows, after injection into mice in the form of a purified protein adjuvanted with aluminum hydroxide, the induction of high neutralizing antibody titers against SARS-CoV.
[0536] The various forms of the S gene were placed under the control of the promoter of the 7.5K gene and then introduced into the thymidine kinase (TK) locus of the Copenhagen strain of the vaccinia virus by double homologous recombination in vivo. In order to improve the immunogenicity of the recombinant vaccinia viruses, a synthetic late promoter was chosen in place of the 7.5K promoter, in order to increase the production of S and Ssol during the late phases of the viral cycle.
[0537] After having isolated the recombinant vaccinia viruses and verified their capacity to express the SARS-CoV S antigen, their capacity to induce in mice an immune response against SARS was tested. After having purified the Ssol antigen from the supernatant of infected cells, an ELISA test for serodiagnosis of SARS was designed, and its efficiency was evaluated with the aid of sera from probable cases of SARS.
2) Construction of the Recombinant Viruses
[0538] Recombinant vaccinia viruses directing the expression of the S glycoprotein of the 031589 isolate of SARS-CoV and of a soluble and secreted form of this protein, the Ssol polypeptide, under the control of the 7.5K promoter were obtained. With the aim of increasing the levels of expression of S and Ssol, recombinant viruses in which the cDNAs for S and for Ssol are placed under the control of a late synthetic promoter were also obtained.
[0539] The plasmid pTG186poly is a transfer plasmid for the construction of recombinant vaccinia viruses (Kieny, 1986, Biotechnology, 4:790-795). As such, it contains the VV thymidine kinase gene into which the promoter of the 7.5K gene has been inserted followed by a multiple cloning site allowing the insertion of heterologous genes (FIG. 34A). The promoter of the 7.5K gene in fact contains a tandem of two promoter sequences that are respectively active during the early (PE) and late (PL) phases of the vaccinia virus replication cycle. The BamH1-Xho1 fragments were excised from the plasmids pTRIP-S and pcDNA-Ssol respectively and inserted between the BamH1 and Sma1 sites of the plasmid pTG186poly in order to give the plasmids pTG-S and pTG-Ssol (FIG. 34A). The plasmids pTG-S and pTG-Ssol were deposited at the CNCM, on Dec. 2, 2004, under the numbers I-3338 and I-3339, respectively.
[0540] The plasmids pTN480, pTN-S and pTN-Ssol were obtained from the plasmids pTG186poly, pTG-S and pTG-Ssol respectively, by substituting the Nde1-Pst1 fragment containing the 7.5K promoter by a DNA fragment containing the synthetic late promoter 480, which was obtained by hybridization of the oligonucleotides 5'-TATGAGCTTT TTTTTTTTTT TTTTTTTGGC ATATAAATAG ACTCGGCGCG CCATCTGCA-3' and 5'-GATGGCGCGCCGAGTCTATT TATATGCCAA AAAAAAAAAA AAAAAAAAGC TCA-3' (FIG. 34B). The insert was sequenced with the aid of a BigDye Terminator v1.1 kit (Applied Biosystems) and an automated sequencer ABI377. The sequence of the late synthetic promoter 480 as cloned into the transfer plasmids of the pTN series is indicated in FIG. 34C. The plasmids pTN-S and pTN-Ssol were deposited at the CNCM, on Dec. 2, 2004, under the numbers I-3340 and I-3341, respectively.
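The hybridization of the two oligonucleotides given above can be checked computationally. The sketch below takes the sequences as printed, confirms that the second oligonucleotide is fully complementary to an internal stretch of the first, and reports the single-stranded ends left on the first strand. Reading those ends as compatible with Nde1-cut and Pst1-cut vector ends is an interpretation based on the standard cleavage patterns of these enzymes, not a statement from the text; the names are hypothetical.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

# Oligonucleotides as printed in the text, with the spacing removed.
top = ("TATGAGCTTT TTTTTTTTTT TTTTTTTGGC ATATAAATAG "
       "ACTCGGCGCG CCATCTGCA").replace(" ", "")
bottom = ("GATGGCGCGCCGAGTCTATT TATATGCCAA AAAAAAAAAA "
          "AAAAAAAAGC TCA").replace(" ", "")

# The double-stranded core is the part of `top` paired with `bottom`.
core = revcomp(bottom)
start = top.find(core)                          # -1 would mean the oligos do not anneal
five_prime_overhang = top[:start]               # unpaired bases at the 5' end of `top`
three_prime_overhang = top[start + len(core):]  # unpaired bases at the 3' end of `top`

print(len(top), len(bottom), start)
print(five_prime_overhang, three_prime_overhang)
```

The annealed duplex therefore carries a 2-nucleotide 5' extension and a 4-nucleotide 3' extension on the first strand, which is what a fragment cloned between an Nde1 site and a Pst1 site would be expected to need.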
[0541] The recombinant vaccinia viruses were obtained by double homologous recombination in vivo between the TK cassette of the transfer plasmids of the series pTG and pTN and the TK gene of the Copenhagen strain of the vaccinia virus according to a procedure described by Kieny et al. (1984, Nature, 312:163-166). Briefly, CV-1 cells are transfected with the aid of DOTAP (Roche) with genomic DNA of the Copenhagen strain of the vaccinia virus and each of the transfer plasmids of the pTG and pTN series described above, and then superinfected with the helper vaccinia virus VV-ts7 for 24 hours at 33° C. The helper virus is counter-selected by incubation at 40° C. for 2 days and then the recombinant viruses (TK- phenotype) are selected by two cloning cycles under agar medium on 143B tk- cells in the presence of BUdR (25 μg/ml). The 6 viruses VV-TG, VV-TG-S, VV-TG-Ssol, VV-TN, VV-TN-S, and VV-TN-Ssol are respectively obtained with the aid of the transfer plasmids pTG186poly, pTG-S, pTG-Ssol, pTN480, pTN-S, pTN-Ssol. The viruses VV-TG and VV-TN do not express any heterologous gene and were used as TK- controls in the experiments. The preparations of recombinant viruses were performed on monolayers of CV-1 or BHK-21 cells and the titer in plaque forming units (p.f.u.) was determined on CV-1 cells according to Earl and Moss (1998, Current Protocols in Molecular Biology, 16.16.1-16.16.13).
3) Characterization of the Recombinant Viruses
[0542] The expression of the transgenes encoding the S protein and the Ssol polypeptide was assessed by Western blotting.
[0543] Monolayers of CV-1 cells were infected at a multiplicity of 2 with various recombinant vaccinia viruses VV-TG, VV-TG-S, VV-TG-Ssol, VV-TN, VV-TN-S and VV-TN-Ssol. After 18 hours of incubation at 37° C. and under 5% CO2, cellular extracts were prepared in loading buffer according to Laemmli, separated on 8% SDS polyacrylamide gel and then transferred onto a PVDF membrane (BioRad). The detection of this immunoblot (Western blot) was performed with the aid of an anti-S rabbit polyclonal serum (immune serum from the rabbit P11135: cf. example 4) and donkey polyclonal antibodies directed against rabbit IgGs and coupled with peroxidase (NA934V, Amersham). The bound antibodies were visualized by luminescence with the aid of the ECL+ kit (Amersham) and autoradiography films Hyperfilm MP (Amersham).
[0544] As shown in FIG. 35A, the recombinant virus VV-TN-S directs the expression of the S protein at levels which are comparable to those which can be observed 8 h after infection with SARS-CoV but which are much higher than those which can be observed after infection with VV-TG-S. In a second experiment (FIG. 35B), the analysis of variable quantities of cellular extracts shows that the levels of expression observed after infection with viruses of the TN series (VV-TN-S and VV-TN-Ssol) are about 10 times as high as those observed with the viruses of the TG series (VV-TG-S and VV-TG-Ssol, respectively). In addition, the Ssol polypeptide is secreted into the supernatant of CV-1 cells infected with the VV-TN-Ssol virus more efficiently than in the supernatant of cells infected with VV-TG-Ssol (FIG. 36A). In this experiment, the VV-TN-Sflag virus was used as a control because it expresses the membrane form of the S protein fused at its C-ter end with the FLAG tag. The Sflag protein is not detected in the supernatant of cells infected with VV-TN-Sflag, demonstrating that the Ssol polypeptide is indeed actively secreted after infection with VV-TN-Ssol.
[0545] These results demonstrate that the recombinant vaccinia viruses are indeed carriers of the transgenes and allow the expression of the SARS glycoprotein in its membrane form (S) or in a soluble and secreted form (Ssol). The vaccinia viruses carrying the synthetic promoter 480 allow the expression of S and the secretion of Ssol at levels much higher than the viruses carrying the promoter of the 7.5K gene.
4) Application to the Production of a Soluble Form of SARS-CoV S. Purification of this Recombinant Antigen and Diagnostic Applications
[0546] The BHK-21 line is the cell line which secretes the highest quantities of Ssol polypeptide after infection with the VV-TN-Ssol virus among the lines tested (BHK-21, CV-1, 293T and FRhK-4, FIG. 36B); it allows the quantitative production and purification of the recombinant Ssol polypeptide. In a typical experiment where the experimental conditions for infection, production and purification were optimized, the BHK-21 cells are inoculated in standard culture medium (pyruvate-free DMEM containing 4.5 g/l of glucose and supplemented with 5% TPB, 5% FCS, 100 U/ml of penicillin and 100 μg/ml of streptomycin) in the form of a subconfluent monolayer (10 million cells for each 100 cm^2 in 25 ml of medium). After 24 h of incubation at 37° C. under 5% CO2, the cells are infected at an M.O.I. of 0.03 and the standard medium is replaced with the secretion medium, in which the quantity of FCS is reduced to 0.5% and the TPB is omitted. The culture supernatant is removed after 2.5 days of incubation at 35° C. and under 5% CO2 and the vaccinia virus is inactivated by addition of Triton X-100 (0.1%). After filtration on 0.1 μm polyethersulfone (PES) membrane, the recombinant Ssol polypeptide is purified by affinity chromatography on an anti-FLAG matrix with elution with a solution of FLAG peptide (DYKDDDDK) at 100 μg/ml in TBS (50 mM Tris, pH 7.4, 150 mM NaCl).
[0547] The analysis by 8% SDS acrylamide gel stained with silver nitrate identified a predominant polypeptide whose molecular mass is about 180 kD and whose degree of purity is greater than 90% (FIG. 37). The concentration of the purified Ssol recombinant polypeptide was determined by comparison with molecular mass markers and estimated at 24 ng/μl.
[0548] This purified Ssol polypeptide preparation makes it possible to produce a calibration series in order to measure, with the aid of a capture ELISA test, the Ssol concentrations present in the culture supernatants. According to this test, the BHK-21 line secretes about 1 μg/ml of Ssol polypeptide under the production conditions described above. In addition, the purification scheme presented makes it possible to purify of the order of 160 μg of Ssol polypeptide per liter of culture supernatant.
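The production figures quoted in this section can be related to one another with a short worked calculation: the viral inoculum implied by the stated M.O.I. and seeding density, and the purification recovery implied by the secreted and purified quantities. The calculation assumes that the cell number at infection equals the number seeded (10 million per 100 cm^2), which ignores growth during the 24 h preceding infection; the variable names are hypothetical.

```python
# Figures taken from the text.
cells_per_100_cm2 = 10_000_000      # cells seeded per 100 cm^2
moi = 0.03                          # p.f.u. per cell
secreted_ug_per_ml = 1.0            # Ssol in the supernatant (capture ELISA)
purified_ug_per_liter = 160.0       # Ssol recovered after anti-FLAG chromatography

# Inoculum per 100 cm^2, assuming the cell count is unchanged at infection.
inoculum_pfu = moi * cells_per_100_cm2

# Recovery of the purification scheme, as a fraction of the secreted amount.
secreted_ug_per_liter = secreted_ug_per_ml * 1000.0
recovery = purified_ug_per_liter / secreted_ug_per_liter

print(f"inoculum: {inoculum_pfu:.0f} p.f.u. per 100 cm^2")
print(f"recovery: {recovery:.0%}")
```

Under these assumptions the recovery is of the order of one sixth of the secreted polypeptide, which gives a rough idea of the culture volume needed for a given amount of purified antigen.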
[0549] The ELISA reactivity of the recombinant Ssol polypeptide was analyzed toward sera from patients suffering from SARS.
[0550] The sera of probable cases of SARS tested were chosen on the basis of the results (positive or negative) of analysis of their specific reactivity toward the native antigens of SARS-CoV by immunofluorescence test on VeroE6 cells infected with SARS-CoV and/or by indirect ELISA test using, as antigen, a lysate of VeroE6 cells infected with SARS-CoV. The sera of these patients are identified by a serial number of the National Reference Center for Influenza Viruses and by the patient's initials and the number of days elapsed since the onset of the symptoms. All the sera of probable cases (cf. table XVI) recognize the native antigens of SARS-CoV with the exception of the serum 032552 of the patient VTT, for which infection with SARS-CoV could not be confirmed by RT-PCR performed on respiratory samples of days 3, 8 and 12. A panel of control sera was used as control (TV sera): they are sera collected in France before the SARS epidemic which occurred in 2003.
TABLE XVI: Sera of probable cases of SARS
Serum     Patient   Sample collection day
033168    JYK       38
033597    JYK       74
032632    NTM       17
032634    THA       15
032541    PHV       10
032542    NIH       17
032552    VTT       8
032633    PTU       16
[0551] Solid phases sensitized with the recombinant Ssol polypeptide were prepared by adsorption of a solution of purified Ssol polypeptide at 4 μg/ml in PBS in the wells of an ELISA plate. The plates are incubated overnight at 4° C. and then washed with PBS-Tween buffer (PBS, 0.1% Tween 20). After washing with PBS-Tween, the sera to be tested (100 μl) are diluted 1/100 and 1/400 in PBS-skimmed milk-Tween buffer (PBS, 3% skimmed milk, 0.1% Tween) and then added to the wells of the sensitized ELISA plate. The plates are then incubated for 1 h at 37° C. After 3 washings with PBS-Tween buffer, the anti-human IgG conjugate labeled with peroxidase (ref. NA933V, Amersham) diluted 1/4000 in PBS-skimmed milk-Tween buffer is added and then the plates are incubated for one hour at 37° C. After 6 washings with PBS-Tween buffer, the chromogen (TMB) and the substrate (H2O2) are added and the plates are incubated for 10 minutes protected from light. The reaction is stopped by adding a 1M solution of H3PO4 and then the absorbance is measured at 450 nm with a reference at 620 nm.
[0552] The ELISA tests (FIG. 38) demonstrate that the recombinant Ssol polypeptide is specifically recognized by the serum antibodies of patients suffering from SARS, collected at the middle or late phase of infection (≧10 days after the onset of the symptoms), whereas it is not significantly recognized by the serum antibodies of the control sera of subjects not suffering from SARS.
[0553] In conclusion, these results demonstrate that the recombinant Ssol polypeptide can be purified from the supernatant of mammalian cells infected with the recombinant vaccinia virus VV-TN-Ssol and can be used as antigen for developing an ELISA test for serological diagnosis of infection with SARS-CoV.
5) Vaccine Applications
[0554] The immunogenicity of the recombinant vaccinia viruses was studied in mice.
[0555] For that, groups of 7 BALB/c mice were immunized by the i.v. route twice at 4 weeks' interval with 10^6 p.f.u. of recombinant vaccinia viruses VV-TG, VV-TG-S, VV-TG-Ssol, VV-TN, VV-TN-S and VV-TN-Ssol and, as a control, VV-TG-HA which directs the expression of hemagglutinin of the A/PR/8/34 strain of the influenza virus. The immune sera were collected 3 weeks after each of the immunizations (IS1, IS2).
[0556] The immune sera were analyzed per pool for each of the groups by indirect ELISA using a lysate of VeroE6 cells infected with SARS-CoV as antigen and, as control, a lysate of noninfected VeroE6 cells. The anti-SARS-CoV antibody titers (TI) are calculated as the reciprocal of the dilution producing a specific OD of 0.5 after visualization with an anti-mouse IgG(H+L) polyclonal antibody coupled with peroxidase (NA931V, Amersham) and TMB supplemented with H2O2 (KPL). This analysis (FIG. 39A) shows that immunization with the viruses VV-TG-S and VV-TN-S induces in mice, from the first immunization, antibodies directed against the native form of the SARS-CoV spicule protein present in the lysate of infected VeroE6 cells. The responses induced by the VV-TN-S virus are higher than those induced by the VV-TG-S virus after the first (TI=740 and TI=270 respectively) and the second (TI=3230 and TI=600 respectively) immunization. The VV-TN-Ssol virus induces high anti-SARS-CoV antibody titers after two immunizations (TI=640), whereas the virus VV-TG-Ssol induces a response at the detection limit (TI=40).
[0557] The immune sera were analyzed per pool for each of the groups for their capacity to seroneutralize the infectivity of SARS-CoV. 4 seroneutralization points on FRhK-4 cells (100 TCID50 of SARS-CoV) are produced for each of the 2-fold dilutions tested from 1/20. The seroneutralizing titer is calculated according to the Reed and Muench method as the reciprocal of the dilution neutralizing the infectivity of 2 wells out of 4. This analysis shows that the antibodies induced in mice by the vaccinia viruses expressing the S protein or the Ssol polypeptide are neutralizing and that the viruses with synthetic promoters are more efficient immunogens than the viruses carrying the 7.5K promoter: the highest titers (640) are observed after 2 immunizations with the virus VV-TN-S (FIG. 39B).
[0558] The protective power of the neutralizing antibodies induced in mice after immunization with the recombinant vaccinia viruses is evaluated with the aid of a challenge infection with SARS-CoV.
6) Other Applications
[0559] Third generation recombinant vaccinia viruses are constructed by substituting the wild-type sequences of the S and Ssol genes by synthetic genes optimized for the expression in mammalian cells, described above. These recombinant vaccinia viruses are capable of expressing larger quantities of S and Ssol antigens and therefore of exhibiting increased immunogenicity.
[0560] The recombinant vaccinia virus VV-TN-Ssol can be used for the quantitative production and purification of the Ssol antigen for diagnostic (serology by ELISA) and vaccine (subunit vaccine) applications.
EXAMPLE 17
Recombinant Measles Virus Expressing the SARS-Associated Coronavirus (SARS-CoV) Spicule (S) Protein. Vaccine Applications.
1) Introduction
[0561] The measles vaccine (MV) induces a lasting protective immunity in humans after a single injection (Hilleman, 2002, Vaccine, 20: 651-665). The protection conferred is very robust and is based on the induction of an antibody response and of a CD4 and CD8 cell response. The MV genome is very stable and no reversion of the vaccine strains to virulence has ever been observed. The measles virus belongs to the genus Morbillivirus of the Paramyxoviridae family; it is an enveloped virus whose genome is a 16 kb single-stranded RNA of negative polarity (FIG. 40A) and whose exclusively cytoplasmic replication cycle excludes any possibility of integration into the genome of the host. The measles vaccine is thus one of the most effective and one of the safest live vaccines used in the human population. Frederic Tangy's team recently developed an expression vector on the basis of the Schwarz strain of the measles virus, which is the safest attenuated strain and the most widely used in humans as vaccine against measles. This vaccine strain may be isolated from an infectious molecular clone while preserving its immunogenicity in primates and in mice that are sensitive to the infection. It constitutes, after insertion of additional transcription units, a vector for the expression of heterologous sequences (Combredet, 2003, J. Virol. 77: 11546-11554). In addition, a recombinant MV Schwarz expressing the envelope glycoprotein of the West Nile virus (WNV) induces an effective and lasting antibody response which protects mice from a lethal challenge infection with WNV (Despres et al., 2004, J. Infect. Dis., in press). All these characteristics make the attenuated Schwarz strain of the measles virus an extremely promising candidate vector for the construction of novel recombinant live vaccines.
[0562] The aim of this example is to evaluate the capacity of recombinant measles viruses (MV) expressing various SARS-associated coronavirus (SARS-CoV) antigens to constitute novel candidate vaccines against SARS.
[0563] The inventors focused on the SARS-CoV spicule (S) protein, which makes it possible to induce, after gene immunization in animals, antibodies neutralizing the infectivity of SARS-CoV, and on a soluble and secreted form of this protein, the Ssol polypeptide, which is composed of the ectodomain (aa 1-1193) of S fused at its C-terminal end with a FLAG tag (DYKDDDDK) via a BspE1 linker encoding the SG dipeptide. This Ssol polypeptide exhibits an antigenicity similar to that of the S protein and allows, after injection into mice in the form of a purified protein adjuvanted with aluminum hydroxide, the induction of high neutralizing antibody titers against SARS-CoV.
[0564] The various forms of the S gene were introduced in the form of an additional transcription unit between the P (phosphoprotein) and M (matrix) genes into the cDNA of the Schwarz strain of MV previously described (Combredet, 2003, J. Virol. 77: 11546-11554; EP application No. 02291551.6 of Jun. 20, 2002, and EP application No. 02291550.8 of Jun. 20, 2002). After having isolated the recombinant viruses MVSchw2-SARS-S and MVSchw2-SARS-Ssol and checked their capacity to express the SARS-CoV S antigen, their capacity to induce a protective immune response against SARS in mice and then in monkeys was tested.
2) Construction of the Recombinant Viruses
[0565] The plasmid pTM-MVSchw-ATU2 (FIG. 40B) contains an infectious cDNA corresponding to the antigenome of the Schwarz vaccine strain of the measles virus (MV) into which an additional transcription unit (ATU) has been introduced between the P (phosphoprotein) and M (matrix) genes (Combredet, 2003, Journal of Virology, 77: 11546-11554). Recombinant genomes MVSchw2-SARS-S and MVSchw2-SARS-Ssol of the measles virus were constructed by inserting ORFs of the S protein and of the Ssol polypeptide into the additional transcription unit of the MVSchw-ATU2 vector.
[0566] For that, a DNA fragment containing the SARS-CoV S cDNA was amplified by PCR with the aid of the oligonucleotides 5'-ATACGTACGA CCATGTTTAT TTTCTTATTA TTTCTTACTC TCACT-3' and 5'-ATAGCGCGCT CATTATGTGT AATGTAATTT GACACCCTTG-3' using the plasmid pcDNA-S as template and then inserted into the plasmid pCR®2.1-TOPO (Invitrogen) in order to obtain the plasmid pTOPO-S-MV. The two oligonucleotides used contain restriction sites BsiW1 and BssHII, so as to allow subsequent insertion into the measles vector, and were designed so as to generate a sequence of 3774 nt including the codons for initiation and termination, so as to observe the rule of 6 which stipulates that the length of the genome of a measles virus must be divisible by 6 (Calain & Roux, 1993, J. Virol., 67: 4822-4830; Schneider et al., 1997, Virology, 227: 314-322). The insert was sequenced with the aid of a BigDye Terminator v1.1 kit (Applied Biosystems) and an automated sequencer ABI377.
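The primer design described above can be checked with a minimal sketch. It assumes the standard recognition sequences CGTACG (BsiW1) and GCGCGC (BssHII), which the text names but does not spell out. The helper names `reverse_complement` and `obeys_rule_of_six` are illustrative only and are not part of the described method.

```python
# Recognition sequences assumed for the two enzymes named in the text.
BSIWI_SITE = "CGTACG"
BSSHII_SITE = "GCGCGC"

# Primers copied from the paragraph above, with the spacing removed.
FORWARD = "ATACGTACGACCATGTTTATTTTCTTATTATTTCTTACTCTCACT"
REVERSE = "ATAGCGCGCTCATTATGTGTAATGTAATTTGACACCCTTG"


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def obeys_rule_of_six(length_nt):
    """True when a length in nucleotides is a multiple of 6."""
    return length_nt % 6 == 0


# Position of each restriction site inside its primer (0-based).
bsiwi_pos = FORWARD.find(BSIWI_SITE)
bsshii_pos = REVERSE.find(BSSHII_SITE)

# Position of the initiation codon in the forward primer.
start_pos = FORWARD.find("ATG")

# The six bases that follow the BssHII site in the reverse primer,
# read on the coding strand, carry the termination codons.
stop_region = reverse_complement(REVERSE[bsshii_pos + 6:bsshii_pos + 12])

print("BsiW1 site at", bsiwi_pos, "ATG at", start_pos)
print("BssHII site at", bsshii_pos, "termination region", stop_region)
print("3774 nt obeys the rule of 6:", obeys_rule_of_six(3774))
```

The check only confirms that the sites, the initiation codon and the termination codons are present in the primers as written, and that the stated insert length of 3774 nt is a multiple of 6. It does not model the amplification itself.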
[0567] To express a soluble and secreted form of SARS-CoV S, a plasmid containing the cDNA of the Ssol polypeptide corresponding to the ectodomain (aa 1-1193) of SARS-CoV S fused at its C-terminal end with the sequence of a FLAG tag (DYKDDDDK) via a BspE1 linker encoding the SG dipeptide was then obtained. For that, a DNA fragment was amplified with the aid of the oligonucleotides 5'-CCATTTCAAC AATTTGGCCG-3' and 5'-ATAGGATCCG CGCGCTCATT ATTTATCGTC GTCATCTTTA TAATC-3' from the plasmid pcDNA-Ssol and then inserted into the plasmid pTOPO-S-MV between the Sal1 and BamH1 sites in order to obtain the plasmid pTOPO-S-MV-SF. The sequence generated is 3618 nt long between the BsiW1 and BssHII sites and observes the rule of 6. The insert was sequenced as indicated above.
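As an illustrative check of the reverse primer given above, the following sketch reads the bases that follow the BssHII site on the coding strand and translates them. The recognition sequences GCGCGC (BssHII) and GGATCC (BamH1) are assumed standard sites. The codon table is deliberately partial and covers only the codons that occur in this stretch. The function names are hypothetical.

```python
# Reverse primer copied from the paragraph above, with the spacing removed.
REVERSE_SSOL = "ATAGGATCCGCGCGCTCATTATTTATCGTCGTCATCTTTATAATC"

BAMHI_SITE = "GGATCC"   # assumed standard recognition sequence
BSSHII_SITE = "GCGCGC"  # assumed standard recognition sequence

# Partial codon table: only the codons needed for this stretch.
CODONS = {
    "GAT": "D", "GAC": "D",
    "TAT": "Y",
    "AAA": "K",
    "TAA": "*", "TGA": "*",
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def translate(seq):
    """Translate a coding-strand sequence codon by codon using CODONS."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq), 3))


bamhi_pos = REVERSE_SSOL.find(BAMHI_SITE)
bsshii_pos = REVERSE_SSOL.find(BSSHII_SITE)

# Everything 3' of the BssHII site in the primer, read on the coding strand.
coding_tail = reverse_complement(REVERSE_SSOL[bsshii_pos + 6:])
peptide = translate(coding_tail)

print("BamH1 site at", bamhi_pos, "BssHII site at", bsshii_pos)
print("coding strand:", coding_tail)
print("translation:", peptide)
print("3618 nt obeys the rule of 6:", 3618 % 6 == 0)
```

Under these assumptions the translated tail reads as the FLAG tag followed by two termination codons, which agrees with the construct described in the text.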
[0568] The BsiW1-BssHII fragments containing the cDNAs for the S protein and the Ssol polypeptide were then excised by digestion of the plasmids pTOPO-S-MV and pTOPO-S-MV-SF and then subcloned between the corresponding sites of the plasmid pTM-MVSchw-ATU2 in order to give the plasmids pTM-MVSchw2-SARS-S and pTM-MVSchw2-SARS-Ssol (FIG. 40B). These two plasmids were deposited at the C.N.C.M. on Dec. 1, 2004, under the numbers I-3326 and I-3327, respectively.
[0569] The recombinant measles viruses corresponding to the plasmids pTM-MVSchw2-SARS-S and pTM-MVSchw2-SARS-Ssol were obtained by reverse genetics according to the system based on the use of a helper cell line, described by Radecke et al. (1995, EMBO J., 14: 5773-5784) and modified by Parks et al. (1999, J. Virol., 73: 3560-3566). Briefly, the helper cells 293-3-46 are transfected according to the calcium phosphate method with 5 μg of the plasmids pTM-MVSchw2-SARS-S or pTM-MVSchw2-SARS-Ssol and 0.02 μg of the plasmid pEMC-La directing the expression of the MV L polymerase (gift from M. A. Billeter). After incubating overnight at 37° C., a heat shock is produced for 2 hours at 43° C. and the transfected cells are transferred onto a monolayer of Vero cells. For each of the two plasmids, syncytia appeared after 2 to 3 days of coculture and were transferred successively onto monolayers of Vero cells at 70% confluence in 35 mm Petri dishes and then in 25 and 75 cm² flasks. When the syncytia have reached 80-90% confluence, the cells are recovered with the aid of a scraper and then frozen and thawed once. After low-speed centrifugation, the supernatant containing the virus is stored in aliquots at -80° C. The titers of the recombinant viruses MVSchw2-SARS-S and MVSchw2-SARS-Ssol were determined by limiting dilution on Vero cells, and the titer, expressed as the dose infecting 50% of the wells (TCID50), was calculated according to the Kärber method.
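The text names the Kärber method but does not give the formula. The sketch below uses the commonly cited Spearman-Kärber form, which is an assumption about the exact variant applied. The dilution series and well counts in the example are hypothetical and are not data from this document.

```python
def karber_log10_tcid50(log10_first_dilution, log10_step, positive_fractions):
    """Spearman-Karber estimate of log10(TCID50) per inoculum volume.

    log10_first_dilution: log10 of the reciprocal of the least dilute
        sample tested (1.0 for a 10^-1 dilution).
    log10_step: log10 of the dilution factor between rows (1.0 for 10-fold).
    positive_fractions: fraction of infected wells at each dilution, ordered
        from least to most dilute. The series should start at 1.0 and end
        at 0.0 for the estimate to be meaningful.
    """
    if not positive_fractions:
        raise ValueError("at least one dilution is required")
    if any(p < 0.0 or p > 1.0 for p in positive_fractions):
        raise ValueError("fractions must lie between 0 and 1")
    return (log10_first_dilution
            - log10_step / 2.0
            + log10_step * sum(positive_fractions))


# Hypothetical plate: 10-fold dilutions from 10^-1 to 10^-5.
# All wells are infected down to 10^-3, half at 10^-4, none at 10^-5.
example = karber_log10_tcid50(1.0, 1.0, [1.0, 1.0, 1.0, 0.5, 0.0])
print("log10 TCID50 per inoculum volume:", example)
```

In this hypothetical plate half of the wells are infected at the 10^-4 dilution, so the estimate falls at that dilution. The value must still be divided by the inoculum volume per well to express a titer per ml.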
3) Characterization of the Recombinant Viruses
[0570] The expression of the transgenes encoding the S protein and the Ssol polypeptide was assessed by Western blotting and immunofluorescence.
[0571] Monolayers of Vero cells in T-25 flasks were infected at a multiplicity of infection of 0.05 by various passages of the two viruses MVSchw2-SARS-S and MVSchw2-SARS-Ssol and the wild-type virus MVSchw as a control. When the syncytia had reached 80 to 90% confluence, cytoplasmic extracts were prepared in an extraction buffer (150 mM NaCl, 50 mM Tris-HCl, pH 7.2, 1% Triton X-100, 0.1% SDS, 1% DOC) and then diluted in loading buffer according to Laemmli, separated on 8% SDS polyacrylamide gel and transferred onto a PVDF membrane (BioRad). The detection of this immunoblot (Western blot) was carried out with the aid of an anti-S rabbit polyclonal serum (immune serum of the rabbit P11135: cf. example 4 above) and donkey polyclonal antibodies directed against rabbit IgGs and coupled with peroxidase (NA934V, Amersham). The bound antibodies were visualized by luminescence with the aid of the ECL+ kit (Amersham) and Hyperfilm MP autoradiography films (Amersham).
[0572] Vero cells in monolayers on glass slides were infected with the two viruses MVSchw2-SARS-S and MVSchw2-SARS-Ssol and the wild-type virus MVSchw as a control at multiplicities of infection of 0.05. When the syncytia had reached 90 to 100% (MVSchw2-SARS-Ssol virus) or 30 to 40% (MVSchw2-SARS-S, MVSchw) confluence, the cells were fixed in a 4% PBS-PFA solution, permeabilized with a PBS solution containing 0.2% Triton and then labeled with polyclonal antibodies from rabbits hyperimmunized with purified and inactivated SARS-CoV virions and with an anti-rabbit IgG(H+L) goat antibody conjugate coupled with FITC (Jackson).
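Both infections above use a multiplicity of infection of 0.05. As a worked illustration of the arithmetic only, the sketch below computes the stock volume needed for a given multiplicity. The cell count and stock titer are hypothetical, and treating one TCID50 as one infectious unit is a simplifying assumption that the text does not state.

```python
def inoculum_volume_ml(moi, cell_count, titer_per_ml):
    """Volume of virus stock (ml) that delivers the requested multiplicity.

    moi: infectious units per cell (0.05 in the text).
    cell_count: number of cells in the monolayer (hypothetical here).
    titer_per_ml: infectious units per ml of stock (hypothetical here).
    """
    if titer_per_ml <= 0:
        raise ValueError("titer must be positive")
    if moi < 0 or cell_count < 0:
        raise ValueError("moi and cell_count must be non-negative")
    return moi * cell_count / titer_per_ml


# Hypothetical flask of 2 million cells and a stock at 10^6 units per ml.
volume = inoculum_volume_ml(0.05, 2_000_000, 1_000_000)
print("stock volume in ml:", round(volume, 3))
```

With these hypothetical figures the flask would receive about 0.1 ml of stock. Actual volumes depend on the measured titer of each virus passage.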
[0573] As shown in FIGS. 41 and 42, the recombinant viruses MVSchw2-SARS-S and MVSchw2-SARS-Ssol direct the expression of the S protein and the Ssol polypeptide respectively at levels comparable to those which can be observed 8 h after infection with SARS-CoV. The expression of these polypeptides is stable after 3 passages of the recombinant viruses in cell culture. These results demonstrate that the recombinant measles viruses are indeed carriers of the transgenes and allow the expression of the SARS glycoprotein in its membrane form (S) or in a soluble form (Ssol). The Ssol polypeptide is expected to be secreted by cells infected with the MVSchw2-SARS-Ssol virus as is the case when this same polypeptide is expressed in mammalian cells after transient transfection of the corresponding sequences (cf. example 11 above).
4) Applications
[0574] Having shown that the viruses MVSchw2-SARS-S and MVSchw2-SARS-Ssol allow the expression of the SARS-CoV S, their capacity to induce a protective immune response against SARS-CoV in CD46+/- IFN-αβR-/- mice, which are susceptible to infection by MV, is evaluated. The antibodies of the immunized mice are evaluated by ELISA against the native antigens of SARS-CoV and for their capacity to neutralize the infectivity of SARS-CoV in vitro, using the methodologies described above. The protective power of the response will be evaluated by measuring the reduction in the pulmonary viral load 2 days after a nonlethal challenge infection with SARS-CoV.
[0575] Second generation recombinant measles viruses are constructed by substituting the wild-type sequences of the S and Ssol genes by synthetic genes optimized for expression in mammalian cells, described in example 15 above. These recombinant measles viruses are capable of expressing larger quantities of the S and Ssol antigens and therefore of exhibiting increased immunogenicity.
[0576] Alternatively, the wild-type or synthetic genes encoding the S protein or the Ssol polypeptide may be inserted into the measles vector MVSchw-ATU3 in the form of an additional transcription unit located between the H and L genes, and then the recombinant viruses produced and characterized in a similar manner. This insertion is capable of generating recombinant viruses possessing different characteristics (multiplication of the virus, level of expression of the transgene) and possibly an improved immunogenicity compared with those obtained after insertion of the transgenes between the P and M genes.
[0577] The recombinant measles virus MVSchw2-SARS-Ssol may be used for the quantitative production and the purification of the Ssol antigen for diagnostic and vaccine applications.
EXAMPLE 18
Other Applications Linked to the S Protein
[0578] a) The lentiviral vectors allowing the expression of S or Ssol (or even of fragments of S) can constitute a recombinant vaccine against SARS-CoV, to be used in human or veterinary prophylaxis. In order to demonstrate the feasibility of such a vaccine, the immunogenicity of the recombinant lentiviral vectors TRIP-SD/SA-S-WPRE and TRIP-SD/SA-Ssol-WPRE is studied in mice.
b) Monoclonal antibodies are produced with the aid of the recombinant Ssol polypeptide. According to the results presented in example 14 above, these antibodies, or at least the majority of them, will recognize the native form of the SARS-CoV S and will be suitable for diagnostic and/or prophylactic applications.
c) A serological test for SARS is developed with the Ssol polypeptide used as antigen and the double epitope methodology.
Sequence CWU
1
158129746DNACORONAVIRUS 1atattaggtt tttacctacc caggaaaagc caaccaacct
cgatctcttg tagatctgtt 60ctctaaacga actttaaaat ctgtgtagct gtcgctcggc
tgcatgccta gtgcacctac 120gcagtataaa caataataaa ttttactgtc gttgacaaga
aacgagtaac tcgtccctct 180tctgcagact gcttacggtt tcgtccgtgt tgcagtcgat
catcagcata cctaggtttc 240gtccgggtgt gaccgaaagg taagatggag agccttgttc
ttggtgtcaa cgagaaaaca 300cacgtccaac tcagtttgcc tgtccttcag gttagagacg
tgctagtgcg tggcttcggg 360gactctgtgg aagaggccct atcggaggca cgtgaacacc
tcaaaaatgg cacttgtggt 420ctagtagagc tggaaaaagg cgtactgccc cagcttgaac
agccctatgt gttcattaaa 480cgttctgatg ccttaagcac caatcacggc cacaaggtcg
ttgagctggt tgcagaaatg 540gacggcattc agtacggtcg tagcggtata acactgggag
tactcgtgcc acatgtgggc 600gaaaccccaa ttgcataccg caatgttctt cttcgtaaga
acggtaataa gggagccggt 660ggtcatagct atggcatcga tctaaagtct tatgacttag
gtgacgagct tggcactgat 720cccattgaag attatgaaca aaactggaac actaagcatg
gcagtggtgc actccgtgaa 780ctcactcgtg agctcaatgg aggtgcagtc actcgctatg
tcgacaacaa tttctgtggc 840ccagatgggt accctcttga ttgcatcaaa gattttctcg
cacgcgcggg caagtcaatg 900tgcactcttt ccgaacaact tgattacatc gagtcgaaga
gaggtgtcta ctgctgccgt 960gaccatgagc atgaaattgc ctggttcact gagcgctctg
ataagagcta cgagcaccag 1020acacccttcg aaattaagag tgccaagaaa tttgacactt
tcaaagggga atgcccaaag 1080tttgtgtttc ctcttaactc aaaagtcaaa gtcattcaac
cacgtgttga aaagaaaaag 1140actgagggtt tcatggggcg tatacgctct gtgtaccctg
ttgcatctcc acaggagtgt 1200aacaatatgc acttgtctac cttgatgaaa tgtaatcatt
gcgatgaagt ttcatggcag 1260acgtgcgact ttctgaaagc cacttgtgaa cattgtggca
ctgaaaattt agttattgaa 1320ggacctacta catgtgggta cctacctact aatgctgtag
tgaaaatgcc atgtcctgcc 1380tgtcaagacc cagagattgg acctgagcat agtgttgcag
attatcacaa ccactcaaac 1440attgaaactc gactccgcaa gggaggtagg actagatgtt
ttggaggctg tgtgtttgcc 1500tatgttggct gctataataa gcgtgcctac tgggttcctc
gtgctagtgc tgatattggc 1560tcaggccata ctggcattac tggtgacaat gtggagacct
tgaatgagga tctccttgag 1620atactgagtc gtgaacgtgt taacattaac attgttggcg
attttcattt gaatgaagag 1680gttgccatca ttttggcatc tttctctgct tctacaagtg
cctttattga cactataaag 1740agtcttgatt acaagtcttt caaaaccatt gttgagtcct
gcggtaacta taaagttacc 1800aagggaaagc ccgtaaaagg tgcttggaac attggacaac
agagatcagt tttaacacca 1860ctgtgtggtt ttccctcaca ggctgctggt gttatcagat
caatttttgc gcgcacactt 1920gatgcagcaa accactcaat tcctgatttg caaagagcag
ctgtcaccat acttgatggt 1980atttctgaac agtcattacg tcttgtcgac gccatggttt
atacttcaga cctgctcacc 2040aacagtgtca ttattatggc atatgtaact ggtggtcttg
tacaacagac ttctcagtgg 2100ttgtctaatc ttttgggcac tactgttgaa aaactcaggc
ctatctttga atggattgag 2160gcgaaactta gtgcaggagt tgaatttctc aaggatgctt
gggagattct caaatttctc 2220attacaggtg tttttgacat cgtcaagggt caaatacagg
ttgcttcaga taacatcaag 2280gattgtgtaa aatgcttcat tgatgttgtt aacaaggcac
tcgaaatgtg cattgatcaa 2340gtcactatcg ctggcgcaaa gttgcgatca ctcaacttag
gtgaagtctt catcgctcaa 2400agcaagggac tttaccgtca gtgtatacgt ggcaaggagc
agctgcaact actcatgcct 2460cttaaggcac caaaagaagt aacctttctt gaaggtgatt
cacatgacac agtacttacc 2520tctgaggagg ttgttctcaa gaacggtgaa ctcgaagcac
tcgagacgcc cgttgatagc 2580ttcacaaatg gagctatcgt tggcacacca gtctgtgtaa
atggcctcat gctcttagag 2640attaaggaca aagaacaata ctgcgcattg tctcctggtt
tactggctac aaacaatgtc 2700tttcgcttaa aagggggtgc accaattaaa ggtgtaacct
ttggagaaga tactgtttgg 2760gaagttcaag gttacaagaa tgtgagaatc acatttgagc
ttgatgaacg tgttgacaaa 2820gtgcttaatg aaaagtgctc tgtctacact gttgaatccg
gtaccgaagt tactgagttt 2880gcatgtgttg tagcagaggc tgttgtgaag actttacaac
cagtttctga tctccttacc 2940aacatgggta ttgatcttga tgagtggagt gtagctacat
tctacttatt tgatgatgct 3000ggtgaagaaa acttttcatc acgtatgtat tgttcctttt
accctccaga tgaggaagaa 3060gaggacgatg cagagtgtga ggaagaagaa attgatgaaa
cctgtgaaca tgagtacggt 3120acagaggatg attatcaagg tctccctctg gaatttggtg
cctcagctga aacagttcga 3180gttgaggaag aagaagagga agactggctg gatgatacta
ctgagcaatc agagattgag 3240ccagaaccag aacctacacc tgaagaacca gttaatcagt
ttactggtta tttaaaactt 3300actgacaatg ttgccattaa atgtgttgac atcgttaagg
aggcacaaag tgctaatcct 3360atggtgattg taaatgctgc taacatacac ctgaaacatg
gtggtggtgt agcaggtgca 3420ctcaacaagg caaccaatgg tgccatgcaa aaggagagtg
atgattacat taagctaaat 3480ggccctctta cagtaggagg gtcttgtttg ctttctggac
ataatcttgc taagaagtgt 3540ctgcatgttg ttggacctaa cctaaatgca ggtgaggaca
tccagcttct taaggcagca 3600tatgaaaatt tcaattcaca ggacatctta cttgcaccat
tgttgtcagc aggcatattt 3660ggtgctaaac cacttcagtc tttacaagtg tgcgtgcaga
cggttcgtac acaggtttat 3720attgcagtca atgacaaagc tctttatgag caggttgtca
tggattatct tgataacctg 3780aagcctagag tggaagcacc taaacaagag gagccaccaa
acacagaaga ttccaaaact 3840gaggagaaat ctgtcgtaca gaagcctgtc gatgtgaagc
caaaaattaa ggcctgcatt 3900gatgaggtta ccacaacact ggaagaaact aagtttctta
ccaataagtt actcttgttt 3960gctgatatca atggtaagct ttaccatgat tctcagaaca
tgcttagagg tgaagatatg 4020tctttccttg agaaggatgc accttacatg gtaggtgatg
ttatcactag tggtgatatc 4080acttgtgttg taataccctc caaaaaggct ggtggcacta
ctgagatgct ctcaagagct 4140ttgaagaaag tgccagttga tgagtatata accacgtacc
ctggacaagg atgtgctggt 4200tatacacttg aggaagctaa gactgctctt aagaaatgca
aatctgcatt ttatgtacta 4260ccttcagaag cacctaatgc taaggaagag attctaggaa
ctgtatcctg gaatttgaga 4320gaaatgcttg ctcatgctga agagacaaga aaattaatgc
ctatatgcat ggatgttaga 4380gccataatgg caaccatcca acgtaagtat aaaggaatta
aaattcaaga gggcatcgtt 4440gactatggtg tccgattctt cttttatact agtaaagagc
ctgtagcttc tattattacg 4500aagctgaact ctctaaatga gccgcttgtc acaatgccaa
ttggttatgt gacacatggt 4560tttaatcttg aagaggctgc gcgctgtatg cgttctctta
aagctcctgc cgtagtgtca 4620gtatcatcac cagatgctgt tactacatat aatggatacc
tcacttcgtc atcaaagaca 4680tctgaggagc actttgtaga aacagtttct ttggctggct
cttacagaga ttggtcctat 4740tcaggacagc gtacagagtt aggtgttgaa tttcttaagc
gtggtgacaa aattgtgtac 4800cacactctgg agagccccgt cgagtttcat cttgacggtg
aggttctttc acttgacaaa 4860ctaaagagtc tcttatccct gcgggaggtt aagactataa
aagtgttcac aactgtggac 4920aacactaatc tccacacaca gcttgtggat atgtctatga
catatggaca gcagtttggt 4980ccaacatact tggatggtgc tgatgttaca aaaattaaac
ctcatgtaaa tcatgagggt 5040aagactttct ttgtactacc tagtgatgac acactacgta
gtgaagcttt cgagtactac 5100catactcttg atgagagttt tcttggtagg tacatgtctg
ctttaaacca cacaaagaaa 5160tggaaatttc ctcaagttgg tggtttaact tcaattaaat
gggctgataa caattgttat 5220ttgtctagtg ttttattagc acttcaacag cttgaagtca
aattcaatgc accagcactt 5280caagaggctt attatagagc ccgtgctggt gatgctgcta
acttttgtgc actcatactc 5340gcttacagta ataaaactgt tggcgagctt ggtgatgtca
gagaaactat gacccatctt 5400ctacagcatg ctaatttgga atctgcaaag cgagttctta
atgtggtgtg taaacattgt 5460ggtcagaaaa ctactacctt aacgggtgta gaagctgtga
tgtatatggg tactctatct 5520tatgataatc ttaagacagg tgtttccatt ccatgtgtgt
gtggtcgtga tgctacacaa 5580tatctagtac aacaagagtc ttcttttgtt atgatgtctg
caccacctgc tgagtataaa 5640ttacagcaag gtacattctt atgtgcgaat gagtacactg
gtaactatca gtgtggtcat 5700tacactcata taactgctaa ggagaccctc tatcgtattg
acggagctca ccttacaaag 5760atgtcagagt acaaaggacc agtgactgat gttttctaca
aggaaacatc ttacactaca 5820accatcaagc ctgtgtcgta taaactcgat ggagttactt
acacagagat tgaaccaaaa 5880ttggatgggt attataaaaa ggataatgct tactatacag
agcagcctat agaccttgta 5940ccaactcaac cattaccaaa tgcgagtttt gataatttca
aactcacatg ttctaacaca 6000aaatttgctg atgatttaaa tcaaatgaca ggcttcacaa
agccagcttc acgagagcta 6060tctgtcacat tcttcccaga cttgaatggc gatgtagtgg
ctattgacta tagacactat 6120tcagcgagtt tcaagaaagg tgctaaatta ctgcataagc
caattgtttg gcacattaac 6180caggctacaa ccaagacaac gttcaaacca aacacttggt
gtttacgttg tctttggagt 6240acaaagccag tagatacttc aaattcattt gaagttctgg
cagtagaaga cacacaagga 6300atggacaatc ttgcttgtga aagtcaacaa cccacctctg
aagaagtagt ggaaaatcct 6360accatacaga aggaagtcat agagtgtgac gtgaaaacta
ccgaagttgt aggcaatgtc 6420atacttaaac catcagatga aggtgttaaa gtaacacaag
agttaggtca tgaggatctt 6480atggctgctt atgtggaaaa cacaagcatt accattaaga
aacctaatga gctttcacta 6540gccttaggtt taaaaacaat tgccactcat ggtattgctg
caattaatag tgttccttgg 6600agtaaaattt tggcttatgt caaaccattc ttaggacaag
cagcaattac aacatcaaat 6660tgcgctaaga gattagcaca acgtgtgttt aacaattata
tgccttatgt gtttacatta 6720ttgttccaat tgtgtacttt tactaaaagt accaattcta
gaattagagc ttcactacct 6780acaactattg ctaaaaatag tgttaagagt gttgctaaat
tatgtttgga tgccggcatt 6840aattatgtga agtcacccaa attttctaaa ttgttcacaa
tcgctatgtg gctattgttg 6900ttaagtattt gcttaggttc tctaatctgt gtaactgctg
cttttggtgt actcttatct 6960aattttggtg ctccttctta ttgtaatggc gttagagaat
tgtatcttaa ttcgtctaac 7020gttactacta tggatttctg tgaaggttct tttccttgca
gcatttgttt aagtggatta 7080gactcccttg attcttatcc agctcttgaa accattcagg
tgacgatttc atcgtacaag 7140ctagacttga caattttagg tctggccgct gagtgggttt
tggcatatat gttgttcaca 7200aaattctttt atttattagg tctttcagct ataatgcagg
tgttctttgg ctattttgct 7260agtcatttca tcagcaattc ttggctcatg tggtttatca
ttagtattgt acaaatggca 7320cccgtttctg caatggttag gatgtacatc ttctttgctt
ctttctacta catatggaag 7380agctatgttc atatcatgga tggttgcacc tcttcgactt
gcatgatgtg ctataagcgc 7440aatcgtgcca cacgcgttga gtgtacaact attgttaatg
gcatgaagag atctttctat 7500gtctatgcaa atggaggccg tggcttctgc aagactcaca
attggaattg tctcaattgt 7560gacacatttt gcactggtag tacattcatt agtgatgaag
ttgctcgtga tttgtcactc 7620cagtttaaaa gaccaatcaa ccctactgac cagtcatcgt
atattgttga tagtgttgct 7680gtgaaaaatg gcgcgcttca cctctacttt gacaaggctg
gtcaaaagac ctatgagaga 7740catccgctct cccattttgt caatttagac aatttgagag
ctaacaacac taaaggttca 7800ctgcctatta atgtcatagt ttttgatggc aagtccaaat
gcgacgagtc tgcttctaag 7860tctgcttctg tgtactacag tcagctgatg tgccaaccta
ttctgttgct tgaccaagct 7920cttgtatcag acgttggaga tagtactgaa gtttccgtta
agatgtttga tgcttatgtc 7980gacacctttt cagcaacttt tagtgttcct atggaaaaac
ttaaggcact tgttgctaca 8040gctcacagcg agttagcaaa gggtgtagct ttagatggtg
tcctttctac attcgtgtca 8100gctgcccgac aaggtgttgt tgataccgat gttgacacaa
aggatgttat tgaatgtctc 8160aaactttcac atcactctga cttagaagtg acaggtgaca
gttgtaacaa tttcatgctc 8220acctataata aggttgaaaa catgacgccc agagatcttg
gcgcatgtat tgactgtaat 8280gcaaggcata tcaatgccca agtagcaaaa agtcacaatg
tttcactcat ctggaatgta 8340aaagactaca tgtctttatc tgaacagctg cgtaaacaaa
ttcgtagtgc tgccaagaag 8400aacaacatac cttttagact aacttgtgct acaactagac
aggttgtcaa tgtcataact 8460actaaaatct cactcaaggg tggtaagatt gttagtactt
gttttaaact tatgcttaag 8520gccacattat tgtgcgttct tgctgcattg gtttgttata
tcgttatgcc agtacataca 8580ttgtcaatcc atgatggtta cacaaatgaa atcattggtt
acaaagccat tcaggatggt 8640gtcactcgtg acatcatttc tactgatgat tgttttgcaa
ataaacatgc tggttttgac 8700gcatggttta gccagcgtgg tggttcatac aaaaatgaca
aaagctgccc tgtagtagct 8760gctatcatta caagagagat tggtttcata gtgcctggct
taccgggtac tgtgctgaga 8820gcaatcaatg gtgacttctt gcattttcta cctcgtgttt
ttagtgctgt tggcaacatt 8880tgctacacac cttccaaact cattgagtat agtgattttg
ctacctctgc ttgcgttctt 8940gctgctgagt gtacaatttt taaggatgct atgggcaaac
ctgtgccata ttgttatgac 9000actaatttgc tagagggttc tatttcttat agtgagcttc
gtccagacac tcgttatgtg 9060cttatggatg gttccatcat acagtttcct aacacttacc
tggagggttc tgttagagta 9120gtaacaactt ttgatgctga gtactgtaga catggtacat
gcgaaaggtc agaagtaggt 9180atttgcctat ctaccagtgg tagatgggtt cttaataatg
agcattacag agctctatca 9240ggagttttct gtggtgttga tgcgatgaat ctcatagcta
acatctttac tcctcttgtg 9300caacctgtgg gtgctttaga tgtgtctgct tcagtagtgg
ctggtggtat tattgccata 9360ttggtgactt gtgctgccta ctactttatg aaattcagac
gtgtttttgg tgagtacaac 9420catgttgttg ctgctaatgc acttttgttt ttgatgtctt
tcactatact ctgtctggta 9480ccagcttaca gctttctgcc gggagtctac tcagtctttt
acttgtactt gacattctat 9540ttcaccaatg atgtttcatt cttggctcac cttcaatggt
ttgccatgtt ttctcctatt 9600gtgccttttt ggataacagc aatctatgta ttctgtattt
ctctgaagca ctgccattgg 9660ttctttaaca actatcttag gaaaagagtc atgtttaatg
gagttacatt tagtaccttc 9720gaggaggctg ctttgtgtac ctttttgctc aacaaggaaa
tgtacctaaa attgcgtagc 9780gagacactgt tgccacttac acagtataac aggtatcttg
ctctatataa caagtacaag 9840tatttcagtg gagccttaga tactaccagc tatcgtgaag
cagcttgctg ccacttagca 9900aaggctctaa atgactttag caactcaggt gctgatgttc
tctaccaacc accacagaca 9960tcaatcactt ctgctgttct gcagagtggt tttaggaaaa
tggcattccc gtcaggcaaa 10020gttgaagggt gcatggtaca agtaacctgt ggaactacaa
ctcttaatgg attgtggttg 10080gatgacacag tatactgtcc aagacatgtc atttgcacag
cagaagacat gcttaatcct 10140aactatgaag atctgctcat tcgcaaatcc aaccatagct
ttcttgttca ggctggcaat 10200gttcaacttc gtgttattgg ccattctatg caaaattgtc
tgcttaggct taaagttgat 10260acttctaacc ctaagacacc caagtataaa tttgtccgta
tccaacctgg tcaaacattt 10320tcagttctag catgctacaa tggttcacca tctggtgttt
atcagtgtgc catgagacct 10380aatcatacca ttaaaggttc tttccttaat ggatcatgtg
gtagtgttgg ttttaacatt 10440gattatgatt gcgtgtcttt ctgctatatg catcatatgg
agcttccaac aggagtacac 10500gctggtactg acttagaagg taaattctat ggtccatttg
ttgacagaca aactgcacag 10560gctgcaggta cagacacaac cataacatta aatgttttgg
catggctgta tgctgctgtt 10620atcaatggtg ataggtggtt tcttaataga ttcaccacta
ctttgaatga ctttaacctt 10680gtggcaatga agtacaacta tgaacctttg acacaagatc
atgttgacat attgggacct 10740ctttctgctc aaacaggaat tgccgtctta gatatgtgtg
ctgctttgaa agagctgctg 10800cagaatggta tgaatggtcg tactatcctt ggtagcacta
ttttagaaga tgagtttaca 10860ccatttgatg ttgttagaca atgctctggt gttaccttcc
aaggtaagtt caagaaaatt 10920gttaagggca ctcatcattg gatgctttta actttcttga
catcactatt gattcttgtt 10980caaagtacac agtggtcact gtttttcttt gtttacgaga
atgctttctt gccatttact 11040cttggtatta tggcaattgc tgcatgtgct atgctgcttg
ttaagcataa gcacgcattc 11100ttgtgcttgt ttctgttacc ttctcttgca acagttgctt
actttaatat ggtctacatg 11160cctgctagct gggtgatgcg tatcatgaca tggcttgaat
tggctgacac tagcttgtct 11220ggttataggc ttaaggattg tgttatgtat gcttcagctt
tagttttgct tattctcatg 11280acagctcgca ctgtttatga tgatgctgct agacgtgttt
ggacactgat gaatgtcatt 11340acacttgttt acaaagtcta ctatggtaat gctttagatc
aagctatttc catgtgggcc 11400ttagttattt ctgtaacctc taactattct ggtgtcgtta
cgactatcat gtttttagct 11460agagctatag tgtttgtgtg tgttgagtat tacccattgt
tatttattac tggcaacacc 11520ttacagtgta tcatgcttgt ttattgtttc ttaggctatt
gttgctgctg ctactttggc 11580cttttctgtt tactcaaccg ttacttcagg cttactcttg
gtgtttatga ctacttggtc 11640tctacacaag aatttaggta tatgaactcc caggggcttt
tgcctcctaa gagtagtatt 11700gatgctttca agcttaacat taagttgttg ggtattggag
gtaaaccatg tatcaaggtt 11760gctactgtac agtctaaaat gtctgacgta aagtgcacat
ctgtggtact gctctcggtt 11820cttcaacaac ttagagtaga gtcatcttct aaattgtggg
cacaatgtgt acaactccac 11880aatgatattc ttcttgcaaa agacacaact gaagctttcg
agaagatggt ttctcttttg 11940tctgttttgc tatccatgca gggtgctgta gacattaata
ggttgtgcga ggaaatgctc 12000gataaccgtg ctactcttca ggctattgct tcagaattta
gttctttacc atcatatgcc 12060gcttatgcca ctgcccagga ggcctatgag caggctgtag
ctaatggtga ttctgaagtc 12120gttctcaaaa agttaaagaa atctttgaat gtggctaaat
ctgagtttga ccgtgatgct 12180gccatgcaac gcaagttgga aaagatggca gatcaggcta
tgacccaaat gtacaaacag 12240gcaagatctg aggacaagag ggcaaaagta actagtgcta
tgcaaacaat gctcttcact 12300atgcttagga agcttgataa tgatgcactt aacaacatta
tcaacaatgc gcgtgatggt 12360tgtgttccac tcaacatcat accattgact acagcagcca
aactcatggt tgttgtccct 12420gattatggta cctacaagaa cacttgtgat ggtaacacct
ttacatatgc atctgcactc 12480tgggaaatcc agcaagttgt tgatgcggat agcaagattg
ttcaacttag tgaaattaac 12540atggacaatt caccaaattt ggcttggcct cttattgtta
cagctctaag agccaactca 12600gctgttaaac tacagaataa tgaactgagt ccagtagcac
tacgacagat gtcctgtgcg 12660gctggtacca cacaaacagc ttgtactgat gacaatgcac
ttgcctacta taacaattcg 12720aagggaggta ggtttgtgct ggcattacta tcagaccacc
aagatctcaa atgggctaga 12780ttccctaaga gtgatggtac aggtacaatt tacacagaac
tggaaccacc ttgtaggttt 12840gttacagaca caccaaaagg gcctaaagtg aaatacttgt
acttcatcaa aggcttaaac 12900aacctaaata gaggtatggt gctgggcagt ttagctgcta
cagtacgtct tcaggctgga 12960aatgctacag aagtacctgc caattcaact gtgctttcct
tctgtgcttt tgcagtagac 13020cctgctaaag catataagga ttacctagca agtggaggac
aaccaatcac caactgtgtg 13080aagatgttgt gtacacacac tggtacagga caggcaatta
ctgtaacacc agaagctaac 13140atggaccaag agtcctttgg tggtgcttca tgttgtctgt
attgtagatg ccacattgac 13200catccaaatc ctaaaggatt ctgtgacttg aaaggtaagt
acgtccaaat acctaccact 13260tgtgctaatg acccagtggg ttttacactt agaaacacag
tctgtaccgt ctgcggaatg 13320tggaaaggtt atggctgtag ttgtgaccaa ctccgcgaac
ccttgatgca gtctgcggat 13380gcatcaacgt ttttaaacgg gtttgcggtg taagtgcagc
ccgtcttaca ccgtgcggca 13440caggcactag tactgatgtc gtctacaggg cttttgatat
ttacaacgaa aaagttgctg 13500gttttgcaaa gttcctaaaa actaattgct gtcgcttcca
ggagaaggat gaggaaggca 13560atttattaga ctcttacttt gtagttaaga ggcatactat
gtctaactac caacatgaag 13620agactattta taacttggtt aaagattgtc cagcggttgc
tgtccatgac tttttcaagt 13680ttagagtaga tggtgacatg gtaccacata tatcacgtca
gcgtctaact aaatacacaa 13740tggctgattt agtctatgct ctacgtcatt ttgatgaggg
taattgtgat acattaaaag 13800aaatactcgt cacatacaat tgctgtgatg atgattattt
caataagaag gattggtatg 13860acttcgtaga gaatcctgac atcttacgcg tatatgctaa
cttaggtgag cgtgtacgcc 13920aatcattatt aaagactgta caattctgcg atgctatgcg
tgatgcaggc attgtaggcg 13980tactgacatt agataatcag gatcttaatg ggaactggta
cgatttcggt gatttcgtac 14040aagtagcacc aggctgcgga gttcctattg tggattcata
ttactcattg ctgatgccca 14100tcctcacttt gactagggca ttggctgctg agtcccatat
ggatgctgat ctcgcaaaac 14160cacttattaa gtgggatttg ctgaaatatg attttacgga
agagagactt tgtctcttcg 14220accgttattt taaatattgg gaccagacat accatcccaa
ttgtattaac tgtttggatg 14280ataggtgtat ccttcattgt gcaaacttta atgtgttatt
ttctactgtg tttccaccta 14340caagttttgg accactagta agaaaaatat ttgtagatgg
tgttcctttt gttgtttcaa 14400ctggatacca ttttcgtgag ttaggagtcg tacataatca
ggatgtaaac ttacatagct 14460cgcgtctcag tttcaaggaa cttttagtgt atgctgctga
tccagctatg catgcagctt 14520ctggcaattt attgctagat aaacgcacta catgcttttc
agtagctgca ctaacaaaca 14580atgttgcttt tcaaactgtc aaacccggta attttaataa
agacttttat gactttgctg 14640tgtctaaagg tttctttaag gaaggaagtt ctgttgaact
aaaacacttc ttctttgctc 14700aggatggcaa cgctgctatc agtgattatg actattatcg
ttataatctg ccaacaatgt 14760gtgatatcag acaactccta ttcgtagttg aagttgttga
taaatacttt gattgttacg 14820atggtggctg tattaatgcc aaccaagtaa tcgttaacaa
tctggataaa tcagctggtt 14880tcccatttaa taaatggggt aaggctagac tttattatga
ctcaatgagt tatgaggatc 14940aagatgcact tttcgcgtat actaagcgta atgtcatccc
tactataact caaatgaatc 15000ttaagtatgc cattagtgca aagaatagag ctcgcaccgt
agctggtgtc tctatctgta 15060gtactatgac aaatagacag tttcatcaga aattattgaa
gtcaatagcc gccactagag 15120gagctactgt ggtaattgga acaagcaagt tttacggtgg
ctggcataat atgttaaaaa 15180ctgtttacag tgatgtagaa actccacacc ttatgggttg
ggattatcca aaatgtgaca 15240gagccatgcc taacatgctt aggataatgg cctctcttgt
tcttgctcgc aaacataaca 15300cttgctgtaa cttatcacac cgtttctaca ggttagctaa
cgagtgtgcg caagtattaa 15360gtgagatggt catgtgtggc ggctcactat atgttaaacc
aggtggaaca tcatccggtg 15420atgctacaac tgcttatgct aatagtgtct ttaacatttg
tcaagctgtt acagccaatg 15480taaatgcact tctttcaact gatggtaata agatagctga
caagtatgtc cgcaatctac 15540aacacaggct ctatgagtgt ctctatagaa atagggatgt
tgatcatgaa ttcgtggatg 15600agttttacgc ttacctgcgt aaacatttct ccatgatgat
tctttctgat gatgccgttg 15660tgtgctataa cagtaactat gcggctcaag gtttagtagc
tagcattaag aactttaagg 15720cagttcttta ttatcaaaat aatgtgttca tgtctgaggc
aaaatgttgg actgagactg 15780accttactaa aggacctcac gaattttgct cacagcatac
aatgctagtt aaacaaggag 15840atgattacgt gtacctgcct tacccagatc catcaagaat
attaggcgca ggctgttttg 15900tcgatgatat tgtcaaaaca gatggtacac ttatgattga
aaggttcgtg tcactggcta 15960ttgatgctta cccacttaca aaacatccta atcaggagta
tgctgatgtc tttcacttgt 16020atttacaata cattagaaag ttacatgatg agcttactgg
ccacatgttg gacatgtatt 16080ccgtaatgct aactaatgat aacacctcac ggtactggga
acctgagttt tatgaggcta 16140tgtacacacc acatacagtc ttgcaggctg taggtgcttg
tgtattgtgc aattcacaga 16200cttcacttcg ttgcggtgcc tgtattagga gaccattcct
atgttgcaag tgctgctatg 16260accatgtcat ttcaacatca cacaaattag tgttgtctgt
taatccctat gtttgcaatg 16320ccccaggttg tgatgtcact gatgtgacac aactgtatct
aggaggtatg agctattatt 16380gcaagtcaca taagcctccc attagttttc cattatgtgc
taatggtcag gtttttggtt 16440tatacaaaaa cacatgtgta ggcagtgaca atgtcactga
cttcaatgcg atagcaacat 16500gtgattggac taatgctggc gattacatac ttgccaacac
ttgtactgag agactcaagc 16560ttttcgcagc agaaacgctc aaagccactg aggaaacatt
taagctgtca tatggtattg 16620ccactgtacg cgaagtactc tctgacagag aattgcatct
ttcatgggag gttggaaaac 16680ctagaccacc attgaacaga aactatgtct ttactggtta
ccgtgtaact aaaaatagta 16740aagtacagat tggagagtac acctttgaaa aaggtgacta
tggtgatgct gttgtgtaca 16800gaggtactac gacatacaag ttgaatgttg gtgattactt
tgtgttgaca tctcacactg 16860taatgccact tagtgcacct actctagtgc cacaagagca
ctatgtgaga attactggct 16920tgtacccaac actcaacatc tcagatgagt tttctagcaa
tgttgcaaat tatcaaaagg 16980tcggcatgca aaagtactct acactccaag gaccacctgg
tactggtaag agtcattttg 17040ccatcggact tgctctctat tacccatctg ctcgcatagt
gtatacggca tgctctcatg 17100cagctgttga tgccctatgt gaaaaggcat taaaatattt
gcccatagat aaatgtagta 17160gaatcatacc tgcgcgtgcg cgcgtagagt gttttgataa
attcaaagtg aattcaacac 17220tagaacagta tgttttctgc actgtaaatg cattgccaga
aacaactgct gacattgtag 17280tctttgatga aatctctatg gctactaatt atgacttgag
tgttgtcaat gctagacttc 17340gtgcaaaaca ctacgtctat attggcgatc ctgctcaatt
accagccccc cgcacattgc 17400tgactaaagg cacactagaa ccagaatatt ttaattcagt
gtgcagactt atgaaaacaa 17460taggtccaga catgttcctt ggaacttgtc gccgttgtcc
tgctgaaatt gttgacactg 17520tgagtgcttt agtttatgac aataagctaa aagcacacaa
ggataagtca gctcaatgct 17580tcaaaatgtt ctacaaaggt gttattacac atgatgtttc
atctgcaatc aacagacctc 17640aaataggcgt tgtaagagaa tttcttacac gcaatcctgc
ttggagaaaa gctgttttta 17700tctcacctta taattcacag aacgctgtag cttcaaaaat
cttaggattg cctacgcaga 17760ctgttgattc atcacagggt tctgaatatg actatgtcat
attcacacaa actactgaaa 17820cagcacactc ttgtaatgtc aaccgcttca atgtggctat
cacaagggca aaaattggca 17880ttttgtgcat aatgtctgat agagatcttt atgacaaact
gcaatttaca agtctagaaa 17940taccacgtcg caatgtggct acattacaag cagaaaatgt
aactggactt tttaaggact 18000gtagtaagat cattactggt cttcatccta cacaggcacc
tacacacctc agcgttgata 18060taaagttcaa gactgaagga ttatgtgttg acataccagg
cataccaaag gacatgacct 18120accgtagact catctctatg atgggtttca aaatgaatta
ccaagtcaat ggttacccta 18180atatgtttat cacccgcgaa gaagctattc gtcacgttcg
tgcgtggatt ggctttgatg 18240tagagggctg tcatgcaact agagatgctg tgggtactaa
cctacctctc cagctaggat 18300tttctacagg tgttaactta gtagctgtac cgactggtta
tgttgacact gaaaataaca 18360cagaattcac cagagttaat gcaaaacctc caccaggtga
ccagtttaaa catcttatac 18420cactcatgta taaaggcttg ccctggaatg tagtgcgtat
taagatagta caaatgctca 18480gtgatacact gaaaggattg tcagacagag tcgtgttcgt
cctttgggcg catggctttg 18540agcttacatc aatgaagtac tttgtcaaga ttggacctga
aagaacgtgt tgtctgtgtg 18600acaaacgtgc aacttgcttt tctacttcat cagatactta
tgcctgctgg aatcattctg 18660tgggttttga ctatgtctat aacccattta tgattgatgt
tcagcagtgg ggctttacgg 18720gtaaccttca gagtaaccat gaccaacatt gccaggtaca
tggaaatgca catgtggcta 18780gttgtgatgc tatcatgact agatgtttag cagtccatga
gtgctttgtt aagcgcgttg 18840attggtctgt tgaataccct attataggag atgaactgag
ggttaattct gcttgcagaa 18900aagtacaaca catggttgtg aagtctgcat tgcttgctga
taagtttcca gttcttcatg 18960acattggaaa tccaaaggct atcaagtgtg tgcctcaggc
tgaagtagaa tggaagttct 19020acgatgctca gccatgtagt gacaaagctt acaaaataga
ggaactcttc tattcttatg 19080ctacacatca cgataaattc actgatggtg tttgtttgtt
ttggaattgt aacgttgatc 19140gttacccagc caatgcaatt gtgtgtaggt ttgacacaag
agtcttgtca aacttgaact 19200taccaggctg tgatggtggt agtttgtatg tgaataagca
tgcattccac actccagctt 19260tcgataaaag tgcatttact aatttaaagc aattgccttt
cttttactat tctgatagtc 19320cttgtgagtc tcatggcaaa caagtagtgt cggatattga
ttatgttcca ctcaaatctg 19380ctacgtgtat tacacgatgc aatttaggtg gtgctgtttg
cagacaccat gcaaatgagt 19440accgacagta cttggatgca tataatatga tgatttctgc
tggatttagc ctatggattt 19500acaaacaatt tgatacttat aacctgtgga atacatttac
caggttacag agtttagaaa 19560atgtggctta taatgttgtt aataaaggac actttgatgg
acacgccggc gaagcacctg 19620tttccatcat taataatgct gtttacacaa aggtagatgg
tattgatgtg gagatctttg 19680aaaataagac aacacttcct gttaatgttg catttgagct
ttgggctaag cgtaacatta 19740aaccagtgcc agagattaag atactcaata atttgggtgt
tgatatcgct gctaatactg 19800taatctggga ctacaaaaga gaagccccag cacatgtatc
tacaataggt gtctgcacaa 19860tgactgacat tgccaagaaa cctactgaga gtgcttgttc
ttcacttact gtcttgtttg 19920atggtagagt ggaaggacag gtagaccttt ttagaaacgc
ccgtaatggt gttttaataa 19980cagaaggttc agtcaaaggt ctaacacctt caaagggacc
agcacaagct agcgtcaatg 20040gagtcacatt aattggagaa tcagtaaaaa cacagtttaa
ctactttaag aaagtagacg 20100gcattattca acagttgcct gaaacctact ttactcagag
cagagactta gaggatttta 20160agcccagatc acaaatggaa actgactttc tcgagctcgc
tatggatgaa ttcatacagc 20220gatataagct cgagggctat gccttcgaac acatcgttta
tggagatttc agtcatggac 20280aacttggcgg tcttcattta atgataggct tagccaagcg
ctcacaagat tcaccactta 20340aattagagga ttttatccct atggacagca cagtgaaaaa
ttacttcata acagatgcgc 20400aaacaggttc atcaaaatgt gtgtgttctg tgattgatct
tttacttgat gactttgtcg 20460agataataaa gtcacaagat ttgtcagtga tttcaaaagt
ggtcaaggtt acaattgact 20520atgctgaaat ttcattcatg ctttggtgta aggatggaca
tgttgaaacc ttctacccaa 20580aactacaagc aagtcaagcg tggcaaccag gtgttgcgat
gcctaacttg tacaagatgc 20640aaagaatgct tcttgaaaag tgtgaccttc agaattatgg
tgaaaatgct gttataccaa 20700aaggaataat gatgaatgtc gcaaagtata ctcaactgtg
tcaatactta aatacactta 20760ctttagctgt accctacaac atgagagtta ttcactttgg
tgctggctct gataaaggag 20820ttgcaccagg tacagctgtg ctcagacaat ggttgccaac
tggcacacta cttgtcgatt 20880cagatcttaa tgacttcgtc tccgacgcag attctacttt
aattggagac tgtgcaacag 20940tacatacggc taataaatgg gaccttatta ttagcgatat
gtatgaccct aggaccaaac 21000atgtgacaaa agagaatgac tctaaagaag ggtttttcac
ttatctgtgt ggatttataa 21060agcaaaaact agccctgggt ggttctatag ctgtaaagat
aacagagcat tcttggaatg 21120ctgaccttta caagcttatg ggccatttct catggtggac
agcttttgtt acaaatgtaa 21180atgcatcatc atcggaagca tttttaattg gggctaacta
tcttggcaag ccgaaggaac 21240aaattgatgg ctataccatg catgctaact acattttctg
gaggaacaca aatcctatcc 21300agttgtcttc ctattcactc tttgacatga gcaaatttcc
tcttaaatta agaggaactg 21360ctgtaatgtc tcttaaggag aatcaaatca atgatatgat
ttattctctt ctggaaaaag 21420gtaggcttat cattagagaa aacaacagag ttgtggtttc
aagtgatatt cttgttaaca 21480actaaacgaa catgtttatt ttcttattat ttcttactct
cactagtggt agtgaccttg 21540accggtgcac cacttttgat gatgttcaag ctcctaatta
cactcaacat acttcatcta 21600tgaggggggt ttactatcct gatgaaattt ttagatcaga
cactctttat ttaactcagg 21660atttatttct tccattttat tctaatgtta cagggtttca
tactattaat catacgtttg 21720gcaaccctgt catacctttt aaggatggta tttattttgc
tgccacagag aaatcaaatg 21780ttgtccgtgg ttgggttttt ggttctacca tgaacaacaa
gtcacagtcg gtgattatta 21840ttaacaattc tactaatgtt gttatacgag catgtaactt
tgaattgtgt gacaaccctt 21900tctttgctgt ttctaaaccc atgggtacac agacacatac
tatgatattc gataatgcat 21960ttaattgcac tttcgagtac atatctgatg ccttttcgct
tgatgtttca gaaaagtcag 22020gtaattttaa acacttacga gagtttgtgt ttaaaaataa
agatgggttt ctctatgttt 22080ataagggcta tcaacctata gatgtagttc gtgatctacc
ttctggtttt aacactttga 22140aacctatttt taagttgcct cttggtatta acattacaaa
ttttagagcc attcttacag 22200ccttttcacc tgctcaagac atttggggca cgtcagctgc
agcctatttt gttggctatt 22260taaagccaac tacatttatg ctcaagtatg atgaaaatgg
tacaatcaca gatgctgttg 22320attgttctca aaatccactt gctgaactca aatgctctgt
taagagcttt gagattgaca 22380aaggaattta ccagacctct aatttcaggg ttgttccctc
aggagatgtt gtgagattcc 22440ctaatattac aaacttgtgt ccttttggag aggtttttaa
tgctactaaa ttcccttctg 22500tctatgcatg ggagagaaaa aaaatttcta attgtgttgc
tgattactct gtgctctaca 22560actcaacatt tttttcaacc tttaagtgct atggcgtttc
tgccactaag ttgaatgatc 22620tttgcttctc caatgtctat gcagattctt ttgtagtcaa
gggagatgat gtaagacaaa 22680tagcgccagg acaaactggt gttattgctg attataatta
taaattgcca gatgatttca 22740tgggttgtgt ccttgcttgg aatactagga acattgatgc
tacttcaact ggtaattata 22800attataaata taggtatctt agacatggca agcttaggcc
ctttgagaga gacatatcta 22860atgtgccttt ctcccctgat ggcaaacctt gcaccccacc
tgctcttaat tgttattggc 22920cattaaatga ttatggtttt tacaccacta ctggcattgg
ctaccaacct tacagagttg 22980tagtactttc ttttgaactt ttaaatgcac cggccacggt
ttgtggacca aaattatcca 23040ctgaccttat taagaaccag tgtgtcaatt ttaattttaa
tggactcact ggtactggtg 23100tgttaactcc ttcttcaaag agatttcaac catttcaaca
atttggccgt gatgtttctg 23160atttcactga ttccgttcga gatcctaaaa catctgaaat
attagacatt tcaccttgct 23220cttttggggg tgtaagtgta attacacctg gaacaaatgc
ttcatctgaa gttgctgttc 23280tatatcaaga tgttaactgc actgatgttt ctacagcaat
tcatgcagat caactcacac 23340cagcttggcg catatattct actggaaaca atgtattcca
gactcaagca ggctgtctta 23400taggagctga gcatgtcgac acttcttatg agtgcgacat
tcctattgga gctggcattt 23460gtgctagtta ccatacagtt tctttattac gtagtactag
ccaaaaatct attgtggctt 23520atactatgtc tttaggtgct gatagttcaa ttgcttactc
taataacacc attgctatac 23580ctactaactt ttcaattagc attactacag aagtaatgcc
tgtttctatg gctaaaacct 23640ccgtagattg taatatgtac atctgcggag attctactga
atgtgctaat ttgcttctcc 23700aatatggtag cttttgcaca caactaaatc gtgcactctc
aggtattgct gctgaacagg 23760atcgcaacac acgtgaagtg ttcgctcaag tcaaacaaat
gtacaaaacc ccaactttga 23820aatattttgg tggttttaat ttttcacaaa tattacctga
ccctctaaag ccaactaaga 23880ggtcttttat tgaggacttg ctctttaata aggtgacact
cgctgatgct ggcttcatga 23940agcaatatgg cgaatgccta ggtgatatta atgctagaga
tctcatttgt gcgcagaagt 24000tcaatggact tacagtgttg ccacctctgc tcactgatga
tatgattgct gcctacactg 24060ctgctctagt tagtggtact gccactgctg gatggacatt
tggtgctggc gctgctcttc 24120aaataccttt tgctatgcaa atggcatata ggttcaatgg
cattggagtt acccaaaatg 24180ttctctatga gaaccaaaaa caaatcgcca accaatttaa
caaggcgatt agtcaaattc 24240aagaatcact tacaacaaca tcaactgcat tgggcaagct
gcaagacgtt gttaaccaga 24300atgctcaagc attaaacaca cttgttaaac aacttagctc
taattttggt gcaatttcaa 24360gtgtgctaaa tgatatcctt tcgcgacttg ataaagtcga
ggcggaggta caaattgaca 24420ggttaattac aggcagactt caaagccttc aaacctatgt
aacacaacaa ctaatcaggg 24480ctgctgaaat cagggcttct gctaatcttg ctgctactaa
aatgtctgag tgtgttcttg 24540gacaatcaaa aagagttgac ttttgtggaa agggctacca
ccttatgtcc ttcccacaag 24600cagccccgca tggtgttgtc ttcctacatg tcacgtatgt
gccatcccag gagaggaact 24660tcaccacagc gccagcaatt tgtcatgaag gcaaagcata
cttccctcgt gaaggtgttt 24720ttgtgtttaa tggcacttct tggtttatta cacagaggaa
cttcttttct ccacaaataa 24780ttactacaga caatacattt gtctcaggaa attgtgatgt
cgttattggc atcattaaca 24840acacagttta tgatcctctg caacctgagc ttgactcatt
caaagaagag ctggacaagt 24900acttcaaaaa tcatacatca ccagatgttg atcttggcga
catttcaggc attaacgctt 24960ctgtcgtcaa cattcaaaaa gaaattgacc gcctcaatga
ggtcgctaaa aatttaaatg 25020aatcactcat tgaccttcaa gaattgggaa aatatgagca
atatattaaa tggccttggt 25080atgtttggct cggcttcatt gctggactaa ttgccatcgt
catggttaca atcttgcttt 25140gttgcatgac tagttgttgc agttgcctca agggtgcatg
ctcttgtggt tcttgctgca 25200agtttgatga ggatgactct gagccagttc tcaagggtgt
caaattacat tacacataaa 25260cgaacttatg gatttgttta tgagattttt tactcttgga
tcaattactg cacagccagt 25320aaaaattgac aatgcttctc ctgcaagtac tgttcatgct
acagcaacga taccgctaca 25380agcctcactc cctttcggat ggcttgttat tggcgttgca
tttcttgctg tttttcagag 25440cgctaccaaa ataattgcgc tcaataaaag atggcagcta
gccctttata agggcttcca 25500gttcatttgc aatttactgc tgctatttgt taccatctat
tcacatcttt tgcttgtcgc 25560tgcaggtatg gaggcgcaat ttttgtacct ctatgccttg
atatattttc tacaatgcat 25620caacgcatgt agaattatta tgagatgttg gctttgttgg
aagtgcaaat ccaagaaccc 25680attactttat gatgccaact actttgtttg ctggcacaca
cataactatg actactgtat 25740accatataac agtgtcacag atacaattgt cgttactgaa
ggtgacggca tttcaacacc 25800aaaactcaaa gaagactacc aaattggtgg ttattctgag
gataggcact caggtgttaa 25860agactatgtc gttgtacatg gctatttcac cgaagtttac
taccagcttg agtctacaca 25920aattactaca gacactggta ttgaaaatgc tacattcttc
atctttaaca agcttgttaa 25980agacccaccg aatgtgcaaa tacacacaat cgacggctct
tcaggagttg ctaatccagc 26040aatggatcca atttatgatg agccgacgac gactactagc
gtgcctttgt aagcacaaga 26100aagtgagtac gaacttatgt actcattcgt ttcggaagaa
acaggtacgt taatagttaa 26160tagcgtactt ctttttcttg ctttcgtggt attcttgcta
gtcacactag ccatccttac 26220tgcgcttcga ttgtgtgcgt actgctgcaa tattgttaac
gtgagtttag taaaaccaac 26280ggtttacgtc tactcgcgtg ttaaaaatct gaactcttct
gaaggagttc ctgatcttct 26340ggtctaaacg aactaactat tattattatt ctgtttggaa
ctttaacatt gcttatcatg 26400gcagacaacg gtactattac cgttgaggag cttaaacaac
tcctggaaca atggaaccta 26460gtaataggtt tcctattcct agcctggatt atgttactac
aatttgccta ttctaatcgg 26520aacaggtttt tgtacataat aaagcttgtt ttcctctggc
tcttgtggcc agtaacactt 26580gcttgttttg tgcttgctgc tgtctacaga attaattggg
tgactggcgg gattgcgatt 26640gcaatggctt gtattgtagg cttgatgtgg cttagctact
tcgttgcttc cttcaggctg 26700tttgctcgta cccgctcaat gtggtcattc aacccagaaa
caaacattct tctcaatgtg 26760cctctccggg ggacaattgt gaccagaccg ctcatggaaa
gtgaacttgt cattggtgct 26820gtgatcattc gtggtcactt gcgaatggcc ggacactccc
tagggcgctg tgacattaag 26880gacctgccaa aagagatcac tgtggctaca tcacgaacgc
tttcttatta caaattagga 26940gcgtcgcagc gtgtaggcac tgattcaggt tttgctgcat
acaaccgcta ccgtattgga 27000aactataaat taaatacaga ccacgccggt agcaacgaca
atattgcttt gctagtacag 27060taagtgacaa cagatgtttc atcttgttga cttccaggtt
acaatagcag agatattgat 27120tatcattatg aggactttca ggattgctat ttggaatctt
gacgttataa taagttcaat 27180agtgagacaa ttatttaagc ctctaactaa gaagaattat
tcggagttag atgatgaaga 27240acctatggag ttagattatc cataaaacga acatgaaaat
tattctcttc ctgacattga 27300ttgtatttac atcttgcgag ctatatcact atcaggagtg
tgttagaggt acgactgtac 27360tactaaaaga accttgccca tcaggaacat acgagggcaa
ttcaccattt caccctcttg 27420ctgacaataa atttgcacta acttgcacta gcacacactt
tgcttttgct tgtgctgacg 27480gtactcgaca tacctatcag ctgcgtgcaa gatcagtttc
accaaaactt ttcatcagac 27540aagaggaggt tcaacaagag ctctactcgc cactttttct
cattgttgct gctctagtat 27600ttttaatact ttgcttcacc attaagagaa agacagaatg
aatgagctca ctttaattga 27660cttctatttg tgctttttag cctttctgct attccttgtt
ttaataatgc ttattatatt 27720ttggttttca ctcgaaatcc aggatctaga agaaccttgt
accaaagtct aaacgaacat 27780gaaacttctc attgttttga cttgtatttc tctatgcagt
tgcatatgca ctgtagtaca 27840gcgctgtgca tctaataaac ctcatgtgct tgaagatcct
tgtaaggtac aacactaggg 27900gtaatactta tagcactgct tggctttgtg ctctaggaaa
ggttttacct tttcatagat 27960ggcacactat ggttcaaaca tgcacaccta atgttactat
caactgtcaa gatccagctg 28020gtggtgcgct tatagctagg tgttggtacc ttcatgaagg
tcaccaaact gctgcattta 28080gagacgtact tgttgtttta aataaacgaa caaattaaaa
tgtctgataa tggaccccaa 28140tcaaaccaac gtagtgcccc ccgcattaca tttggtggac
ccacagattc aactgacaat 28200aaccagaatg gaggacgcaa tggggcaagg ccaaaacagc
gccgacccca aggtttaccc 28260aataatactg cgtcttggtt cacagctctc actcagcatg
gcaaggagga acttagattc 28320cctcgaggcc agggcgttcc aatcaacacc aatagtggtc
cagatgacca aattggctac 28380taccgaagag ctacccgacg agttcgtggt ggtgacggca
aaatgaaaga gctcagcccc 28440agatggtact tctattacct aggaactggc ccagaagctt
cacttcccta cggcgctaac 28500aaagaaggca tcgtatgggt tgcaactgag ggagccttga
atacacccaa agaccacatt 28560ggcacccgca atcctaataa caatgctgcc accgtgctac
aacttcctca aggaacaaca 28620ttgccaaaag gcttctacgc agagggaagc agaggcggca
gtcaagcctc ttctcgctcc 28680tcatcacgta gtcgcggtaa ttcaagaaat tcaactcctg
gcagcagtag gggaaattct 28740cctgctcgaa tggctagcgg aggtggtgaa actgccctcg
cgctattgct gctagacaga 28800ttgaaccagc ttgagagcaa agtttctggt aaaggccaac
aacaacaagg ccaaactgtc 28860actaagaaat ctgctgctga ggcatctaaa aagcctcgcc
aaaaacgtac tgccacaaaa 28920cagtacaacg tcactcaagc atttgggaga cgtggtccag
aacaaaccca aggaaatttc 28980ggggaccaag acctaatcag acaaggaact gattacaaac
attggccgca aattgcacaa 29040tttgctccaa gtgcctctgc attctttgga atgtcacgca
ttggcatgga agtcacacct 29100tcgggaacat ggctgactta tcatggagcc attaaattgg
atgacaaaga tccacaattc 29160aaagacaacg tcatactgct gaacaagcac attgacgcat
acaaaacatt cccaccaaca 29220gagcctaaaa aggacaaaaa gaaaaagact gatgaagctc
agcctttgcc gcagagacaa 29280aagaagcagc ccactgtgac tcttcttcct gcggctgaca
tggatgattt ctccagacaa 29340cttcaaaatt ccatgagtgg agcttctgct gattcaactc
aggcataaac actcatgatg 29400accacacaag gcagatgggc tatgtaaacg ttttcgcaat
tccgtttacg atacatagtc 29460tactcttgtg cagaatgaat tctcgtaact aaacagcaca
agtaggttta gttaacttta 29520atctcacata gcaatcttta atcaatgtgt aacattaggg
aggacttgaa agagccacca 29580cattttcatc gaggccacgc ggagtacgat cgagggtaca
gtgaataatg ctagggagag 29640ctgcctatat ggaagagccc taatgtgtaa aattaatttt
agtagtgcta tccccatgtg 29700attttaatag cttcttagga gaatgacaaa aaaaaaaaaa
aaaaaa 29746
2 3945 DNA CORONAVIRUS CDS (89)..(3853)
2 ttctcttctg gaaaaaggta ggcttatcat tagagaaaac aacagagttg tggtttcaag
60tgatattctt gttaacaact aaacgaac atg ttt att ttc tta tta ttt ctt
112 Met Phe Ile Phe Leu Leu Phe Leu
1 5act ctc act agt ggt agt gac ctt
gac cgg tgc acc act ttt gat gat 160Thr Leu Thr Ser Gly Ser Asp Leu
Asp Arg Cys Thr Thr Phe Asp Asp 10 15
20gtt caa gct cct aat tac act caa cat act tca tct atg agg ggg gtt
208Val Gln Ala Pro Asn Tyr Thr Gln His Thr Ser Ser Met Arg Gly Val25
30 35 40tac tat cct gat gaa
att ttt aga tca gac act ctt tat tta act cag 256Tyr Tyr Pro Asp Glu
Ile Phe Arg Ser Asp Thr Leu Tyr Leu Thr Gln 45
50 55gat tta ttt ctt cca ttt tat tct aat gtt aca
ggg ttt cat act att 304Asp Leu Phe Leu Pro Phe Tyr Ser Asn Val Thr
Gly Phe His Thr Ile 60 65
70aat cat acg ttt ggc aac cct gtc ata cct ttt aag gat ggt att tat
352Asn His Thr Phe Gly Asn Pro Val Ile Pro Phe Lys Asp Gly Ile Tyr
75 80 85ttt gct gcc aca gag aaa tca aat
gtt gtc cgt ggt tgg gtt ttt ggt 400Phe Ala Ala Thr Glu Lys Ser Asn
Val Val Arg Gly Trp Val Phe Gly 90 95
100tct acc atg aac aac aag tca cag tcg gtg att att att aac aat tct
448Ser Thr Met Asn Asn Lys Ser Gln Ser Val Ile Ile Ile Asn Asn Ser105
110 115 120act aat gtt gtt
ata cga gca tgt aac ttt gaa ttg tgt gac aac cct 496Thr Asn Val Val
Ile Arg Ala Cys Asn Phe Glu Leu Cys Asp Asn Pro 125
130 135ttc ttt gct gtt tct aaa ccc atg ggt aca
cag aca cat act atg ata 544Phe Phe Ala Val Ser Lys Pro Met Gly Thr
Gln Thr His Thr Met Ile 140 145
150ttc gat aat gca ttt aat tgc act ttc gag tac ata tct gat gcc ttt
592Phe Asp Asn Ala Phe Asn Cys Thr Phe Glu Tyr Ile Ser Asp Ala Phe
155 160 165tcg ctt gat gtt tca gaa aag
tca ggt aat ttt aaa cac tta cga gag 640Ser Leu Asp Val Ser Glu Lys
Ser Gly Asn Phe Lys His Leu Arg Glu 170 175
180ttt gtg ttt aaa aat aaa gat ggg ttt ctc tat gtt tat aag ggc tat
688Phe Val Phe Lys Asn Lys Asp Gly Phe Leu Tyr Val Tyr Lys Gly Tyr185
190 195 200caa cct ata gat
gta gtt cgt gat cta cct tct ggt ttt aac act ttg 736Gln Pro Ile Asp
Val Val Arg Asp Leu Pro Ser Gly Phe Asn Thr Leu 205
210 215aaa cct att ttt aag ttg cct ctt ggt att
aac att aca aat ttt aga 784Lys Pro Ile Phe Lys Leu Pro Leu Gly Ile
Asn Ile Thr Asn Phe Arg 220 225
230gcc att ctt aca gcc ttt tca cct gct caa gac att tgg ggc acg tca
832Ala Ile Leu Thr Ala Phe Ser Pro Ala Gln Asp Ile Trp Gly Thr Ser
235 240 245gct gca gcc tat ttt gtt ggc
tat tta aag cca act aca ttt atg ctc 880Ala Ala Ala Tyr Phe Val Gly
Tyr Leu Lys Pro Thr Thr Phe Met Leu 250 255
260aag tat gat gaa aat ggt aca atc aca gat gct gtt gat tgt tct caa
928Lys Tyr Asp Glu Asn Gly Thr Ile Thr Asp Ala Val Asp Cys Ser Gln265
270 275 280aat cca ctt gct
gaa ctc aaa tgc tct gtt aag agc ttt gag att gac 976Asn Pro Leu Ala
Glu Leu Lys Cys Ser Val Lys Ser Phe Glu Ile Asp 285
290 295aaa gga att tac cag acc tct aat ttc agg
gtt gtt ccc tca gga gat 1024Lys Gly Ile Tyr Gln Thr Ser Asn Phe Arg
Val Val Pro Ser Gly Asp 300 305
310gtt gtg aga ttc cct aat att aca aac ttg tgt cct ttt gga gag gtt
1072Val Val Arg Phe Pro Asn Ile Thr Asn Leu Cys Pro Phe Gly Glu Val
315 320 325ttt aat gct act aaa ttc cct
tct gtc tat gca tgg gag aga aaa aaa 1120Phe Asn Ala Thr Lys Phe Pro
Ser Val Tyr Ala Trp Glu Arg Lys Lys 330 335
340att tct aat tgt gtt gct gat tac tct gtg ctc tac aac tca aca ttt
1168Ile Ser Asn Cys Val Ala Asp Tyr Ser Val Leu Tyr Asn Ser Thr Phe345
350 355 360ttt tca acc ttt
aag tgc tat ggc gtt tct gcc act aag ttg aat gat 1216Phe Ser Thr Phe
Lys Cys Tyr Gly Val Ser Ala Thr Lys Leu Asn Asp 365
370 375ctt tgc ttc tcc aat gtc tat gca gat tct
ttt gta gtc aag gga gat 1264Leu Cys Phe Ser Asn Val Tyr Ala Asp Ser
Phe Val Val Lys Gly Asp 380 385
390gat gta aga caa ata gcg cca gga caa act ggt gtt att gct gat tat
1312Asp Val Arg Gln Ile Ala Pro Gly Gln Thr Gly Val Ile Ala Asp Tyr
395 400 405aat tat aaa ttg cca gat gat
ttc atg ggt tgt gtc ctt gct tgg aat 1360Asn Tyr Lys Leu Pro Asp Asp
Phe Met Gly Cys Val Leu Ala Trp Asn 410 415
420act agg aac att gat gct act tca act ggt aat tat aat tat aaa tat
1408Thr Arg Asn Ile Asp Ala Thr Ser Thr Gly Asn Tyr Asn Tyr Lys Tyr425
430 435 440agg tat ctt aga
cat ggc aag ctt agg ccc ttt gag aga gac ata tct 1456Arg Tyr Leu Arg
His Gly Lys Leu Arg Pro Phe Glu Arg Asp Ile Ser 445
450 455aat gtg cct ttc tcc cct gat ggc aaa cct
tgc acc cca cct gct ctt 1504Asn Val Pro Phe Ser Pro Asp Gly Lys Pro
Cys Thr Pro Pro Ala Leu 460 465
470aat tgt tat tgg cca tta aat gat tat ggt ttt tac acc act act ggc
1552Asn Cys Tyr Trp Pro Leu Asn Asp Tyr Gly Phe Tyr Thr Thr Thr Gly
475 480 485att ggc tac caa cct tac aga
gtt gta gta ctt tct ttt gaa ctt tta 1600Ile Gly Tyr Gln Pro Tyr Arg
Val Val Val Leu Ser Phe Glu Leu Leu 490 495
500aat gca ccg gcc acg gtt tgt gga cca aaa tta tcc act gac ctt att
1648Asn Ala Pro Ala Thr Val Cys Gly Pro Lys Leu Ser Thr Asp Leu Ile505
510 515 520aag aac cag tgt
gtc aat ttt aat ttt aat gga ctc act ggt act ggt 1696Lys Asn Gln Cys
Val Asn Phe Asn Phe Asn Gly Leu Thr Gly Thr Gly 525
530 535gtg tta act cct tct tca aag aga ttt caa
cca ttt caa caa ttt ggc 1744Val Leu Thr Pro Ser Ser Lys Arg Phe Gln
Pro Phe Gln Gln Phe Gly 540 545
550cgt gat gtt tct gat ttc act gat tcc gtt cga gat cct aaa aca tct
1792Arg Asp Val Ser Asp Phe Thr Asp Ser Val Arg Asp Pro Lys Thr Ser
555 560 565gaa ata tta gac att tca cct
tgc tct ttt ggg ggt gta agt gta att 1840Glu Ile Leu Asp Ile Ser Pro
Cys Ser Phe Gly Gly Val Ser Val Ile 570 575
580aca cct gga aca aat gct tca tct gaa gtt gct gtt cta tat caa gat
1888Thr Pro Gly Thr Asn Ala Ser Ser Glu Val Ala Val Leu Tyr Gln Asp585
590 595 600gtt aac tgc act
gat gtt tct aca gca att cat gca gat caa ctc aca 1936Val Asn Cys Thr
Asp Val Ser Thr Ala Ile His Ala Asp Gln Leu Thr 605
610 615cca gct tgg cgc ata tat tct act gga aac
aat gta ttc cag act caa 1984Pro Ala Trp Arg Ile Tyr Ser Thr Gly Asn
Asn Val Phe Gln Thr Gln 620 625
630gca ggc tgt ctt ata gga gct gag cat gtc gac act tct tat gag tgc
2032Ala Gly Cys Leu Ile Gly Ala Glu His Val Asp Thr Ser Tyr Glu Cys
635 640 645gac att cct att gga gct ggc
att tgt gct agt tac cat aca gtt tct 2080Asp Ile Pro Ile Gly Ala Gly
Ile Cys Ala Ser Tyr His Thr Val Ser 650 655
660tta tta cgt agt act agc caa aaa tct att gtg gct tat act atg tct
2128Leu Leu Arg Ser Thr Ser Gln Lys Ser Ile Val Ala Tyr Thr Met Ser665
670 675 680tta ggt gct gat
agt tca att gct tac tct aat aac acc att gct ata 2176Leu Gly Ala Asp
Ser Ser Ile Ala Tyr Ser Asn Asn Thr Ile Ala Ile 685
690 695cct act aac ttt tca att agc att act aca
gaa gta atg cct gtt tct 2224Pro Thr Asn Phe Ser Ile Ser Ile Thr Thr
Glu Val Met Pro Val Ser 700 705
710atg gct aaa acc tcc gta gat tgt aat atg tac atc tgc gga gat tct
2272Met Ala Lys Thr Ser Val Asp Cys Asn Met Tyr Ile Cys Gly Asp Ser
715 720 725act gaa tgt gct aat ttg ctt
ctc caa tat ggt agc ttt tgc aca caa 2320Thr Glu Cys Ala Asn Leu Leu
Leu Gln Tyr Gly Ser Phe Cys Thr Gln 730 735
740cta aat cgt gca ctc tca ggt att gct gct gaa cag gat cgc aac aca
2368Leu Asn Arg Ala Leu Ser Gly Ile Ala Ala Glu Gln Asp Arg Asn Thr745
750 755 760cgt gaa gtg ttc
gct caa gtc aaa caa atg tac aaa acc cca act ttg 2416Arg Glu Val Phe
Ala Gln Val Lys Gln Met Tyr Lys Thr Pro Thr Leu 765
770 775aaa tat ttt ggt ggt ttt aat ttt tca caa
ata tta cct gac cct cta 2464Lys Tyr Phe Gly Gly Phe Asn Phe Ser Gln
Ile Leu Pro Asp Pro Leu 780 785
790aag cca act aag agg tct ttt att gag gac ttg ctc ttt aat aag gtg
2512Lys Pro Thr Lys Arg Ser Phe Ile Glu Asp Leu Leu Phe Asn Lys Val
795 800 805aca ctc gct gat gct ggc ttc
atg aag caa tat ggc gaa tgc cta ggt 2560Thr Leu Ala Asp Ala Gly Phe
Met Lys Gln Tyr Gly Glu Cys Leu Gly 810 815
820gat att aat gct aga gat ctc att tgt gcg cag aag ttc aat gga ctt
2608Asp Ile Asn Ala Arg Asp Leu Ile Cys Ala Gln Lys Phe Asn Gly Leu825
830 835 840aca gtg ttg cca
cct ctg ctc act gat gat atg att gct gcc tac act 2656Thr Val Leu Pro
Pro Leu Leu Thr Asp Asp Met Ile Ala Ala Tyr Thr 845
850 855gct gct cta gtt agt ggt act gcc act gct
gga tgg aca ttt ggt gct 2704Ala Ala Leu Val Ser Gly Thr Ala Thr Ala
Gly Trp Thr Phe Gly Ala 860 865
870ggc gct gct ctt caa ata cct ttt gct atg caa atg gca tat agg ttc
2752Gly Ala Ala Leu Gln Ile Pro Phe Ala Met Gln Met Ala Tyr Arg Phe
875 880 885aat ggc att gga gtt acc caa
aat gtt ctc tat gag aac caa aaa caa 2800Asn Gly Ile Gly Val Thr Gln
Asn Val Leu Tyr Glu Asn Gln Lys Gln 890 895
900atc gcc aac caa ttt aac aag gcg att agt caa att caa gaa tca ctt
2848Ile Ala Asn Gln Phe Asn Lys Ala Ile Ser Gln Ile Gln Glu Ser Leu905
910 915 920aca aca aca tca
act gca ttg ggc aag ctg caa gac gtt gtt aac cag 2896Thr Thr Thr Ser
Thr Ala Leu Gly Lys Leu Gln Asp Val Val Asn Gln 925
930 935aat gct caa gca tta aac aca ctt gtt aaa
caa ctt agc tct aat ttt 2944Asn Ala Gln Ala Leu Asn Thr Leu Val Lys
Gln Leu Ser Ser Asn Phe 940 945
950ggt gca att tca agt gtg cta aat gat atc ctt tcg cga ctt gat aaa
2992Gly Ala Ile Ser Ser Val Leu Asn Asp Ile Leu Ser Arg Leu Asp Lys
955 960 965gtc gag gcg gag gta caa att
gac agg tta att aca ggc aga ctt caa 3040Val Glu Ala Glu Val Gln Ile
Asp Arg Leu Ile Thr Gly Arg Leu Gln 970 975
980agc ctt caa acc tat gta aca caa caa cta atc agg gct gct gaa atc
3088Ser Leu Gln Thr Tyr Val Thr Gln Gln Leu Ile Arg Ala Ala Glu Ile985
990 995 1000agg gct tct
gct aat ctt gct gct act aaa atg tct gag tgt gtt 3133Arg Ala Ser
Ala Asn Leu Ala Ala Thr Lys Met Ser Glu Cys Val 1005
1010 1015ctt gga caa tca aaa aga gtt gac ttt
tgt gga aag ggc tac cac 3178Leu Gly Gln Ser Lys Arg Val Asp Phe
Cys Gly Lys Gly Tyr His 1020 1025
1030ctt atg tcc ttc cca caa gca gcc ccg cat ggt gtt gtc ttc cta
3223Leu Met Ser Phe Pro Gln Ala Ala Pro His Gly Val Val Phe Leu
1035 1040 1045cat gtc acg
tat gtg cca tcc cag gag agg aac ttc acc aca gcg 3268His Val Thr
Tyr Val Pro Ser Gln Glu Arg Asn Phe Thr Thr Ala 1050
1055 1060cca gca att tgt cat gaa ggc aaa gca
tac ttc cct cgt gaa ggt 3313Pro Ala Ile Cys His Glu Gly Lys Ala
Tyr Phe Pro Arg Glu Gly 1065 1070
1075gtt ttt gtg ttt aat ggc act tct tgg ttt att aca cag agg aac
3358Val Phe Val Phe Asn Gly Thr Ser Trp Phe Ile Thr Gln Arg Asn
1080 1085 1090ttc ttt tct
cca caa ata att act aca gac aat aca ttt gtc tca 3403Phe Phe Ser
Pro Gln Ile Ile Thr Thr Asp Asn Thr Phe Val Ser 1095
1100 1105gga aat tgt gat gtc gtt att ggc atc
att aac aac aca gtt tat 3448Gly Asn Cys Asp Val Val Ile Gly Ile
Ile Asn Asn Thr Val Tyr 1110 1115
1120gat cct ctg caa cct gag ctt gac tca ttc aaa gaa gag ctg gac
3493Asp Pro Leu Gln Pro Glu Leu Asp Ser Phe Lys Glu Glu Leu Asp
1125 1130 1135aag tac ttc
aaa aat cat aca tca cca gat gtt gat ctt ggc gac 3538Lys Tyr Phe
Lys Asn His Thr Ser Pro Asp Val Asp Leu Gly Asp 1140
1145 1150att tca ggc att aac gct tct gtc gtc
aac att caa aaa gaa att 3583Ile Ser Gly Ile Asn Ala Ser Val Val
Asn Ile Gln Lys Glu Ile 1155 1160
1165gac cgc ctc aat gag gtc gct aaa aat tta aat gaa tca ctc att
3628Asp Arg Leu Asn Glu Val Ala Lys Asn Leu Asn Glu Ser Leu Ile
1170 1175 1180gac ctt caa
gaa ttg gga aaa tat gag caa tat att aaa tgg cct 3673Asp Leu Gln
Glu Leu Gly Lys Tyr Glu Gln Tyr Ile Lys Trp Pro 1185
1190 1195tgg tat gtt tgg ctc ggc ttc att gct
gga cta att gcc atc gtc 3718Trp Tyr Val Trp Leu Gly Phe Ile Ala
Gly Leu Ile Ala Ile Val 1200 1205
1210atg gtt aca atc ttg ctt tgt tgc atg act agt tgt tgc agt tgc
3763Met Val Thr Ile Leu Leu Cys Cys Met Thr Ser Cys Cys Ser Cys
1215 1220 1225ctc aag ggt
gca tgc tct tgt ggt tct tgc tgc aag ttt gat gag 3808Leu Lys Gly
Ala Cys Ser Cys Gly Ser Cys Cys Lys Phe Asp Glu 1230
1235 1240gat gac tct gag cca gtt ctc aag ggt
gtc aaa tta cat tac aca 3853Asp Asp Ser Glu Pro Val Leu Lys Gly
Val Lys Leu His Tyr Thr 1245 1250
1255taaacgaact tatggatttg tttatgagat tttttactct tggatcaatt
actgcacagc 3913cagtaaaaat tgacaatgct tctcctgcaa gt
3945
3 1255 PRT CORONAVIRUS
3 Met Phe Ile Phe Leu Leu Phe Leu Thr
Leu Thr Ser Gly Ser Asp Leu1 5 10
15Asp Arg Cys Thr Thr Phe Asp Asp Val Gln Ala Pro Asn Tyr Thr
Gln 20 25 30His Thr Ser Ser
Met Arg Gly Val Tyr Tyr Pro Asp Glu Ile Phe Arg 35
40 45Ser Asp Thr Leu Tyr Leu Thr Gln Asp Leu Phe Leu
Pro Phe Tyr Ser 50 55 60Asn Val Thr
Gly Phe His Thr Ile Asn His Thr Phe Gly Asn Pro Val65 70
75 80Ile Pro Phe Lys Asp Gly Ile Tyr
Phe Ala Ala Thr Glu Lys Ser Asn 85 90
95Val Val Arg Gly Trp Val Phe Gly Ser Thr Met Asn Asn Lys
Ser Gln 100 105 110Ser Val Ile
Ile Ile Asn Asn Ser Thr Asn Val Val Ile Arg Ala Cys 115
120 125Asn Phe Glu Leu Cys Asp Asn Pro Phe Phe Ala
Val Ser Lys Pro Met 130 135 140Gly Thr
Gln Thr His Thr Met Ile Phe Asp Asn Ala Phe Asn Cys Thr145
150 155 160Phe Glu Tyr Ile Ser Asp Ala
Phe Ser Leu Asp Val Ser Glu Lys Ser 165
170 175Gly Asn Phe Lys His Leu Arg Glu Phe Val Phe Lys
Asn Lys Asp Gly 180 185 190Phe
Leu Tyr Val Tyr Lys Gly Tyr Gln Pro Ile Asp Val Val Arg Asp 195
200 205Leu Pro Ser Gly Phe Asn Thr Leu Lys
Pro Ile Phe Lys Leu Pro Leu 210 215
220Gly Ile Asn Ile Thr Asn Phe Arg Ala Ile Leu Thr Ala Phe Ser Pro225
230 235 240Ala Gln Asp Ile
Trp Gly Thr Ser Ala Ala Ala Tyr Phe Val Gly Tyr 245
250 255Leu Lys Pro Thr Thr Phe Met Leu Lys Tyr
Asp Glu Asn Gly Thr Ile 260 265
270Thr Asp Ala Val Asp Cys Ser Gln Asn Pro Leu Ala Glu Leu Lys Cys
275 280 285Ser Val Lys Ser Phe Glu Ile
Asp Lys Gly Ile Tyr Gln Thr Ser Asn 290 295
300Phe Arg Val Val Pro Ser Gly Asp Val Val Arg Phe Pro Asn Ile
Thr305 310 315 320Asn Leu
Cys Pro Phe Gly Glu Val Phe Asn Ala Thr Lys Phe Pro Ser
325 330 335Val Tyr Ala Trp Glu Arg Lys
Lys Ile Ser Asn Cys Val Ala Asp Tyr 340 345
350Ser Val Leu Tyr Asn Ser Thr Phe Phe Ser Thr Phe Lys Cys
Tyr Gly 355 360 365Val Ser Ala Thr
Lys Leu Asn Asp Leu Cys Phe Ser Asn Val Tyr Ala 370
375 380Asp Ser Phe Val Val Lys Gly Asp Asp Val Arg Gln
Ile Ala Pro Gly385 390 395
400Gln Thr Gly Val Ile Ala Asp Tyr Asn Tyr Lys Leu Pro Asp Asp Phe
405 410 415Met Gly Cys Val Leu
Ala Trp Asn Thr Arg Asn Ile Asp Ala Thr Ser 420
425 430Thr Gly Asn Tyr Asn Tyr Lys Tyr Arg Tyr Leu Arg
His Gly Lys Leu 435 440 445Arg Pro
Phe Glu Arg Asp Ile Ser Asn Val Pro Phe Ser Pro Asp Gly 450
455 460Lys Pro Cys Thr Pro Pro Ala Leu Asn Cys Tyr
Trp Pro Leu Asn Asp465 470 475
480Tyr Gly Phe Tyr Thr Thr Thr Gly Ile Gly Tyr Gln Pro Tyr Arg Val
485 490 495Val Val Leu Ser
Phe Glu Leu Leu Asn Ala Pro Ala Thr Val Cys Gly 500
505 510Pro Lys Leu Ser Thr Asp Leu Ile Lys Asn Gln
Cys Val Asn Phe Asn 515 520 525Phe
Asn Gly Leu Thr Gly Thr Gly Val Leu Thr Pro Ser Ser Lys Arg 530
535 540Phe Gln Pro Phe Gln Gln Phe Gly Arg Asp
Val Ser Asp Phe Thr Asp545 550 555
560Ser Val Arg Asp Pro Lys Thr Ser Glu Ile Leu Asp Ile Ser Pro
Cys 565 570 575Ser Phe Gly
Gly Val Ser Val Ile Thr Pro Gly Thr Asn Ala Ser Ser 580
585 590Glu Val Ala Val Leu Tyr Gln Asp Val Asn
Cys Thr Asp Val Ser Thr 595 600
605Ala Ile His Ala Asp Gln Leu Thr Pro Ala Trp Arg Ile Tyr Ser Thr 610
615 620Gly Asn Asn Val Phe Gln Thr Gln
Ala Gly Cys Leu Ile Gly Ala Glu625 630
635 640His Val Asp Thr Ser Tyr Glu Cys Asp Ile Pro Ile
Gly Ala Gly Ile 645 650
655Cys Ala Ser Tyr His Thr Val Ser Leu Leu Arg Ser Thr Ser Gln Lys
660 665 670Ser Ile Val Ala Tyr Thr
Met Ser Leu Gly Ala Asp Ser Ser Ile Ala 675 680
685Tyr Ser Asn Asn Thr Ile Ala Ile Pro Thr Asn Phe Ser Ile
Ser Ile 690 695 700Thr Thr Glu Val Met
Pro Val Ser Met Ala Lys Thr Ser Val Asp Cys705 710
715 720Asn Met Tyr Ile Cys Gly Asp Ser Thr Glu
Cys Ala Asn Leu Leu Leu 725 730
735Gln Tyr Gly Ser Phe Cys Thr Gln Leu Asn Arg Ala Leu Ser Gly Ile
740 745 750Ala Ala Glu Gln Asp
Arg Asn Thr Arg Glu Val Phe Ala Gln Val Lys 755
760 765Gln Met Tyr Lys Thr Pro Thr Leu Lys Tyr Phe Gly
Gly Phe Asn Phe 770 775 780Ser Gln Ile
Leu Pro Asp Pro Leu Lys Pro Thr Lys Arg Ser Phe Ile785
790 795 800Glu Asp Leu Leu Phe Asn Lys
Val Thr Leu Ala Asp Ala Gly Phe Met 805
810 815Lys Gln Tyr Gly Glu Cys Leu Gly Asp Ile Asn Ala
Arg Asp Leu Ile 820 825 830Cys
Ala Gln Lys Phe Asn Gly Leu Thr Val Leu Pro Pro Leu Leu Thr 835
840 845Asp Asp Met Ile Ala Ala Tyr Thr Ala
Ala Leu Val Ser Gly Thr Ala 850 855
860Thr Ala Gly Trp Thr Phe Gly Ala Gly Ala Ala Leu Gln Ile Pro Phe865
870 875 880Ala Met Gln Met
Ala Tyr Arg Phe Asn Gly Ile Gly Val Thr Gln Asn 885
890 895Val Leu Tyr Glu Asn Gln Lys Gln Ile Ala
Asn Gln Phe Asn Lys Ala 900 905
910Ile Ser Gln Ile Gln Glu Ser Leu Thr Thr Thr Ser Thr Ala Leu Gly
915 920 925Lys Leu Gln Asp Val Val Asn
Gln Asn Ala Gln Ala Leu Asn Thr Leu 930 935
940Val Lys Gln Leu Ser Ser Asn Phe Gly Ala Ile Ser Ser Val Leu
Asn945 950 955 960Asp Ile
Leu Ser Arg Leu Asp Lys Val Glu Ala Glu Val Gln Ile Asp
965 970 975Arg Leu Ile Thr Gly Arg Leu
Gln Ser Leu Gln Thr Tyr Val Thr Gln 980 985
990Gln Leu Ile Arg Ala Ala Glu Ile Arg Ala Ser Ala Asn Leu
Ala Ala 995 1000 1005Thr Lys Met
Ser Glu Cys Val Leu Gly Gln Ser Lys Arg Val Asp 1010
1015 1020Phe Cys Gly Lys Gly Tyr His Leu Met Ser Phe
Pro Gln Ala Ala 1025 1030 1035Pro His
Gly Val Val Phe Leu His Val Thr Tyr Val Pro Ser Gln 1040
1045 1050Glu Arg Asn Phe Thr Thr Ala Pro Ala Ile
Cys His Glu Gly Lys 1055 1060 1065Ala
Tyr Phe Pro Arg Glu Gly Val Phe Val Phe Asn Gly Thr Ser 1070
1075 1080Trp Phe Ile Thr Gln Arg Asn Phe Phe
Ser Pro Gln Ile Ile Thr 1085 1090
1095Thr Asp Asn Thr Phe Val Ser Gly Asn Cys Asp Val Val Ile Gly
1100 1105 1110Ile Ile Asn Asn Thr Val
Tyr Asp Pro Leu Gln Pro Glu Leu Asp 1115 1120
1125Ser Phe Lys Glu Glu Leu Asp Lys Tyr Phe Lys Asn His Thr
Ser 1130 1135 1140Pro Asp Val Asp Leu
Gly Asp Ile Ser Gly Ile Asn Ala Ser Val 1145 1150
1155Val Asn Ile Gln Lys Glu Ile Asp Arg Leu Asn Glu Val
Ala Lys 1160 1165 1170Asn Leu Asn Glu
Ser Leu Ile Asp Leu Gln Glu Leu Gly Lys Tyr 1175
1180 1185Glu Gln Tyr Ile Lys Trp Pro Trp Tyr Val Trp
Leu Gly Phe Ile 1190 1195 1200Ala Gly
Leu Ile Ala Ile Val Met Val Thr Ile Leu Leu Cys Cys 1205
1210 1215Met Thr Ser Cys Cys Ser Cys Leu Lys Gly
Ala Cys Ser Cys Gly 1220 1225 1230Ser
Cys Cys Lys Phe Asp Glu Asp Asp Ser Glu Pro Val Leu Lys 1235
1240 1245Gly Val Lys Leu His Tyr Thr 1250
1255
4 3943 DNA CORONAVIRUS
4 ctcttctgga aaaaggtagg cttatcatta
gagaaaacaa cagagttgtg gtttcaagtg 60atattcttgt taacaactaa acgaacatgt
ttattttctt attatttctt actctcacta 120gtggtagtga ccttgaccgg tgcaccactt
ttgatgatgt tcaagctcct aattacactc 180aacatacttc atctatgagg ggggtttact
atcctgatga aatttttaga tcagacactc 240tttatttaac tcaggattta tttcttccat
tttattctaa tgttacaggg tttcatacta 300ttaatcatac gtttggcaac cctgtcatac
cttttaagga tggtatttat tttgctgcca 360cagagaaatc aaatgttgtc cgtggttggg
tttttggttc taccatgaac aacaagtcac 420agtcggtgat tattattaac aattctacta
atgttgttat acgagcatgt aactttgaat 480tgtgtgacaa ccctttcttt gctgtttcta
aacccatggg tacacagaca catactatga 540tattcgataa tgcatttaat tgcactttcg
agtacatatc tgatgccttt tcgcttgatg 600tttcagaaaa gtcaggtaat tttaaacact
tacgagagtt tgtgtttaaa aataaagatg 660ggtttctcta tgtttataag ggctatcaac
ctatagatgt agttcgtgat ctaccttctg 720gttttaacac tttgaaacct atttttaagt
tgcctcttgg tattaacatt acaaatttta 780gagccattct tacagccttt tcacctgctc
aagacatttg gggcacgtca gctgcagcct 840attttgttgg ctatttaaag ccaactacat
ttatgctcaa gtatgatgaa aatggtacaa 900tcacagatgc tgttgattgt tctcaaaatc
cacttgctga actcaaatgc tctgttaaga 960gctttgagat tgacaaagga atttaccaga
cctctaattt cagggttgtt ccctcaggag 1020atgttgtgag attccctaat attacaaact
tgtgtccttt tggagaggtt tttaatgcta 1080ctaaattccc ttctgtctat gcatgggaga
gaaaaaaaat ttctaattgt gttgctgatt 1140actctgtgct ctacaactca acattttttt
caacctttaa gtgctatggc gtttctgcca 1200ctaagttgaa tgatctttgc ttctccaatg
tctatgcaga ttcttttgta gtcaagggag 1260atgatgtaag acaaatagcg ccaggacaaa
ctggtgttat tgctgattat aattataaat 1320tgccagatga tttcatgggt tgtgtccttg
cttggaatac taggaacatt gatgctactt 1380caactggtaa ttataattat aaatataggt
atcttagaca tggcaagctt aggccctttg 1440agagagacat atctaatgtg cctttctccc
ctgatggcaa accttgcacc ccacctgctc 1500ttaattgtta ttggccatta aatgattatg
gtttttacac cactactggc attggctacc 1560aaccttacag agttgtagta ctttcttttg
aacttttaaa tgcaccggcc acggtttgtg 1620gaccaaaatt atccactgac cttattaaga
accagtgtgt caattttaat tttaatggac 1680tcactggtac tggtgtgtta actccttctt
caaagagatt tcaaccattt caacaatttg 1740gccgtgatgt ctctgatttc actgattccg
ttcgagatcc taaaacatct gaaatattag 1800acatttcacc ttgctctttt gggggtgtaa
gtgtaattac acctggaaca aatgcttcat 1860ctgaagttgc tgttctatat caagatgtta
actgcactga tgtttctaca gcaatccatg 1920cagatcaact cacaccagct tggcgcatat
attctactgg aaacaatgta ttccagactc 1980aagcaggctg tcttatagga gctgagcatg
tcgacacttc ttatgagtgc gacattccta 2040ttggagctgg catttgtgct agttaccata
cagtttcttt attacgtagt actagccaaa 2100aatctattgt ggcttatact atgtctttag
gtgctgatag ttcaattgct tactctaata 2160acaccattgc tatacctact aacttttcaa
ttagcattac tacagaagta atgcctgttt 2220ctatggctaa aacctccgta gattgtaata
tgtacatctg cggagattct actgaatgtg 2280ctaatttgct tctccaatat ggtagctttt
gcacacaact aaatcgtgca ctctcaggta 2340ttgctgctga acaggatcgc aacacacgtg
aagtgttcgc tcaagtcaaa caaatgtaca 2400aaaccccaac tttgaaatat tttggtggtt
ttaatttttc acaaatatta cctgaccctc 2460taaagccaac taagaggtct tttattgagg
acttgctctt taataaggtg acactcgctg 2520atgctggctt catgaagcaa tatggcgaat
gcctaggtga tattaatgct agagatctca 2580tttgtgcgca gaagttcaat gggcttacag
tgttgccacc tctgctcact gatgatatga 2640ttgctgccta cactgctgct ctagttagtg
gtactgccac tgctggatgg acatttggtg 2700ctggcgctgc tcttcaaata ccttttgcta
tgcaaatggc atataggttc aatggcattg 2760gagttaccca aaatgttctc tatgagaacc
aaaaacaaat cgccaaccaa tttaacaagg 2820cgattagtca aattcaagaa tcacttacaa
caacatcaac tgcattgggc aagctgcaag 2880acgttgttaa ccagaatgct caagcattaa
acacacttgt taaacaactt agctctaatt 2940ttggtgcaat ttcaagtgtg ctaaatgata
tcctttcgcg acttgataaa gtcgaggcgg 3000aggtacaaat tgacaggcta attacaggca
gacttcaaag ccttcaaacc tatgtaacac 3060aacaactaat cagggctgct gaaatcaggg
cttctgctaa tcttgctgct actaaaatgt 3120ctgagtgtgt tcttggacaa tcaaaaagag
ttgacttttg tggaaagggc taccacctta 3180tgtccttccc acaagcagcc ccgcatggtg
ttgtcttcct acatgtcacg tatgtgccat 3240cccaggagag gaacttcacc acagcgccag
caatttgtca tgaaggcaaa gcatacttcc 3300ctcgtgaagg tgtttttgtg tttaatggca
cttcttggtt tattacacag aggaacttct 3360tttctccaca aataattact acagacaata
catttgtctc aggaaattgt gatgtcgtta 3420ttggcatcat taacaacaca gtttatgatc
ctctgcaacc tgagcttgac tcattcaaag 3480aagagctgga caagtacttc aaaaatcata
catcaccaga tgttgatctt ggcgacattt 3540caggcattaa cgcttctgtc gtcaacattc
aaaaagaaat tgaccgcctc aatgaggtcg 3600ctaaaaattt aaatgaatca ctcattgacc
ttcaagaatt gggaaaatat gagcaatata 3660ttaaatggcc ttggtatgtt tggctcggct
tcattgctgg actaattgcc atcgtcatgg 3720ttacaatctt gctttgttgc atgactagtt
gttgcagttg cctcaagggt gcatgctctt 3780gtggttcttg ctgcaagttt gatgaggatg
actctgagcc agttctcaag ggtgtcaaat 3840tacattacac ataaacgaac ttatggattt
gtttatgaga ttttttactc ttggatcaat 3900tactgcacag ccagtaaaaa ttgacaatgc
ttctcctgca agt 3943
SEQ ID NO: 5; LENGTH: 2049; TYPE: DNA; ORGANISM: CORONAVIRUS; SEQUENCE: 5
ctcttctgga
aaaaggtagg cttatcatta gagaaaacaa cagagttgtg gtttcaagtg 60atattcttgt
taacaactaa acgaacatgt ttattttctt attatttctt actctcacta 120gtggtagtga
ccttgaccgg tgcaccactt ttgatgatgt tcaagctcct aattacactc 180aacatacttc
atctatgagg ggggtttact atcctgatga aatttttaga tcagacactc 240tttatttaac
tcaggattta tttcttccat tttattctaa tgttacaggg tttcatacta 300ttaatcatac
gtttggcaac cctgtcatac cttttaagga tggtatttat tttgctgcca 360cagagaaatc
aaatgttgtc cgtggttggg tttttggttc taccatgaac aacaagtcac 420agtcggtgat
tattattaac aattctacta atgttgttat acgagcatgt aactttgaat 480tgtgtgacaa
ccctttcttt gctgtttcta aacccatggg tacacagaca catactatga 540tattcgataa
tgcatttaat tgcactttcg agtacatatc tgatgccttt tcgcttgatg 600tttcagaaaa
gtcaggtaat tttaaacact tacgagagtt tgtgtttaaa aataaagatg 660ggtttctcta
tgtttataag ggctatcaac ctatagatgt agttcgtgat ctaccttctg 720gttttaacac
tttgaaacct atttttaagt tgcctcttgg tattaacatt acaaatttta 780gagccattct
tacagccttt tcacctgctc aagacatttg gggcacgtca gctgcagcct 840attttgttgg
ctatttaaag ccaactacat ttatgctcaa gtatgatgaa aatggtacaa 900tcacagatgc
tgttgattgt tctcaaaatc cacttgctga actcaaatgc tctgttaaga 960gctttgagat
tgacaaagga atttaccaga cctctaattt cagggttgtt ccctcaggag 1020atgttgtgag
attccctaat attacaaact tgtgtccttt tggagaggtt tttaatgcta 1080ctaaattccc
ttctgtctat gcatgggaga gaaaaaaaat ttctaattgt gttgctgatt 1140actctgtgct
ctacaactca acattttttt caacctttaa gtgctatggc gtttctgcca 1200ctaagttgaa
tgatctttgc ttctccaatg tctatgcaga ttcttttgta gtcaagggag 1260atgatgtaag
acaaatagcg ccaggacaaa ctggtgttat tgctgattat aattataaat 1320tgccagatga
tttcatgggt tgtgtccttg cttggaatac taggaacatt gatgctactt 1380caactggtaa
ttataattat aaatataggt atcttagaca tggcaagctt aggccctttg 1440agagagacat
atctaatgtg cctttctccc ctgatggcaa accttgcacc ccacctgctc 1500ttaattgtta
ttggccatta aatgattatg gtttttacac cactactggc attggctacc 1560aaccttacag
agttgtagta ctttcttttg aacttttaaa tgcaccggcc acggtttgtg 1620gaccaaaatt
atccactgac cttattaaga accagtgtgt caattttaat tttaatggac 1680tcactggtac
tggtgtgtta actccttctt caaagagatt tcaaccattt caacaatttg 1740gccgtgatgt
ctctgatttc actgattccg ttcgagatcc taaaacatct gaaatattag 1800acatttcacc
ttgctctttt gggggtgtaa gtgtaattac acctggaaca aatgcttcat 1860ctgaagttgc
tgttctatat caagatgtta actgcactga tgtttctaca gcaatccatg 1920cagatcaact
cacaccagct tggcgcatat attctactgg aaacaatgta ttccagactc 1980aagcaggctg
tcttatagga gctgagcatg tcgacacttc ttatgagtgc gacattccta 2040ttggagctg
2049
SEQ ID NO: 6; LENGTH: 2027; TYPE: DNA; ORGANISM: CORONAVIRUS; SEQUENCE: 6
catgcagatc aactcacacc agcttggcgc atatattcta
ctggaaacaa tgtattccag 60actcaagcag gctgtcttat aggagctgag catgtcgaca
cttcttatga gtgcgacatt 120cctattggag ctggcatttg tgctagttac catacagttt
ctttattacg tagtactagc 180caaaaatcta ttgtggctta tactatgtct ttaggtgctg
atagttcaat tgcttactct 240aataacacca ttgctatacc tactaacttt tcaattagca
ttactacaga agtaatgcct 300gtttctatgg ctaaaacctc cgtagattgt aatatgtaca
tctgcggaga ttctactgaa 360tgtgctaatt tgcttctcca atatggtagc ttttgcacac
aactaaatcg tgcactctca 420ggtattgctg ctgaacagga tcgcaacaca cgtgaagtgt
tcgctcaagt caaacaaatg 480tacaaaaccc caactttgaa atattttggt ggttttaatt
tttcacaaat attacctgac 540cctctaaagc caactaagag gtcttttatt gaggacttgc
tctttaataa ggtgacactc 600gctgatgctg gcttcatgaa gcaatatggc gaatgcctag
gtgatattaa tgctagagat 660ctcatttgtg cgcagaagtt caatgggctt acagtgttgc
cacctctgct cactgatgat 720atgattgctg cctacactgc tgctctagtt agtggtactg
ccactgctgg atggacattt 780ggtgctggcg ctgctcttca aatacctttt gctatgcaaa
tggcatatag gttcaatggc 840attggagtta cccaaaatgt tctctatgag aaccaaaaac
aaatcgccaa ccaatttaac 900aaggcgatta gtcaaattca agaatcactt acaacaacat
caactgcatt gggcaagctg 960caagacgttg ttaaccagaa tgctcaagca ttaaacacac
ttgttaaaca acttagctct 1020aattttggtg caatttcaag tgtgctaaat gatatccttt
cgcgacttga taaagtcgag 1080gcggaggtac aaattgacag gttaattaca ggcagacttc
aaagccttca aacctatgta 1140acacaacaac taatcagggc tgctgaaatc agggcttctg
ctaatcttgc tgctactaaa 1200atgtctgagt gtgttcttgg acaatcaaaa agagttgact
tttgtggaaa gggctaccac 1260cttatgtcct tcccacaagc agccccgcat ggtgttgtct
tcctacatgt cacgtatgtg 1320ccatcccagg agaggaactt caccacagcg ccagcaattt
gtcatgaagg caaagcatac 1380ttccctcgtg aaggtgtttt tgtgtttaat ggcacttctt
ggtttattac acagaggaac 1440ttcttttctc cacaaataat tactacagac aatacatttg
tctcaggaaa ttgtgatgtc 1500gttattggcg tcattaacaa cacagtttat gatcctctgc
aacctgagct tgactcattc 1560aaagaagagc tggacaagta cttcaaaaat catacatcac
cagatgttga tcttggcgac 1620atttcaggca ttaacgcttc tgtcgtcaac attcaaaaag
aaattgaccg cctcaatgag 1680gtcgctaaaa atttaaatga atcactcatt gaccttcaag
aattgggaaa atatgagcaa 1740tatattaaat ggccttggta tgtttggctc ggcttcattg
ctggactaat tgccatcgtc 1800atggttacaa tcttgctttg ttgcatgact agttgttgca
gttgcctcaa gggtgcatgc 1860tcttgtggtt cttgctgcaa gtttgatgag gatgactctg
agccagttct caagggtgtc 1920aaattacatt acacataaac gaacttatgg atttgtttat
gagatttttt actcttggat 1980caattactgc acagccagta aaaattgaca atgcttctcc
tgcaagt 2027
SEQ ID NO: 7; LENGTH: 1096; TYPE: DNA; ORGANISM: CORONAVIRUS; SEQUENCE: 7
tcttgctttg ttgcatgact
agttgttgca gttgcctcaa gggtgcatgc tcttgtggtt 60cttgctgcaa gtttgatgag
gatgactctg agccagttct caagggtgtc aaattacatt 120acacataaac gaacttatgg
atttgtttat gagatttttt actcttggat caattactgc 180acagccagta aaaattgaca
atgcttctcc tgcaagtact gttcatgcta cagcaacgat 240accgctacaa gcctcactcc
ctttcggatg gcttgttatt ggcgttgcat ttcttgctgt 300ttttcagagc gctaccaaaa
taattgcgct caataaaaga tggcagctag ccctttataa 360gggcttccag ttcatttgca
atttactgct gctatttgtt accatctatt cacatctttt 420gcttgtcgct gcaggtatgg
aggcgcaatt tttgtacctc tatgccttga tatattttct 480acaatgcatc aacgcatgta
gaattattat gagatgttgg ctttgttgga agtgcaaatc 540caagaaccca ttactttatg
atgccaacta ctttgtttgc tggcacacac ataactatga 600ctactgtata ccatataaca
gtgtcacaga tacaattgtc gttactgaag gtgacggcat 660ttcaacacca aaactcaaag
aagactacca aattggtggt tattctgagg ataggcactc 720aggtgttaaa gactatgtcg
ttgtacatgg ctatttcacc gaagtttact accagcttga 780gtctacacaa attactacag
acactggtat tgaaaatgct acattcttca tctttaacaa 840gcttgttaaa gacccaccga
atgtgcaaat acacacaatc gacggctctt caggagttgc 900taatccagca atggatccaa
tttatgatga gccgacgacg actactagcg tgcctttgta 960agcacaagaa agtgagtacg
aacttatgta ctcattcgtt tcggaagaaa caggtacgtt 1020aatagttaat agcgtacttc
tttttcttgc tttcgtggta ttcttgctag tcacactagc 1080catccttact gcgctt
1096
SEQ ID NO: 8; LENGTH: 1135; TYPE: DNA; ORGANISM: CORONAVIRUS; SEQUENCE: 8
attgccatcg tcatggttac aatcttgctt tgttgcatga ctagttgttg cagttgcctc
60aagggtgcat gctcttgtgg ttcttgctgc aagtttgatg aggatgactc tgagccagtt
120ctcaagggtg tcaaattaca ttacacataa acgaacttat ggatttgttt atgagatttt
180ttactcttgg atcaattact gcacagccag taaaaattga caatgcttct cctgcaagta
240ctgttcatgc tacagcaacg ataccgctac aagcctcact ccctttcgga tggcttgtta
300ttggcgttgc atttcttgct gtttttcaga gcgctaccaa aataattgcg ctcaataaaa
360gatggcagct agccctttat aagggcttcc agttcatttg caatttactg ctgctatttg
420ttaccatcta ttcacatctt ttgcttgtcg ctgcaggtat ggaggcgcaa tttttgtacc
480tctatgcctt gatatatttt ctacaatgca tcaacgcatg tagaattatt atgagatgtt
540ggctttgttg gaagtgcaaa tccaagaacc cattacttta tgatgccaac tactttgttt
600gctggcacac acataactat gactactgta taccatataa cagtgtcaca gatacaattg
660tcgttactga aggtgacggc atttcaacac caaaactcaa agaagactac caaattggtg
720gttattctga ggataggcac tcaggtgtta aagactatgt cgttgtacat ggctatttca
780ccgaagttta ctaccagctt gagtctacac aaattactac agacactggt attgaaaatg
840ctacattctt catctttaac aagcttgtta aagacccacc gaatgtgcaa atacacacaa
900tcgacggctc ttcaggagtt gctaatccag caatggatcc aatttatgat gagccgacga
960cgactactag cgtgcctttg taagcacaag aaagtgagta cgaacttatg tactcattcg
1020tttcggaaga aacaggtacg ttaatagtta atagcgtact tctttttctt gctttcgtgg
1080tattcttgct agtcacacta gccatcctta ctgcgcttcg attgtgtgcg tactg
1135
SEQ ID NO: 9; LENGTH: 1096; TYPE: DNA; ORGANISM: CORONAVIRUS; FEATURE: CDS, LOCATION: (137)..(958); SEQUENCE: 9
tcttgctttg ttgcatgact agttgttgca
gttgcctcaa gggtgcatgc tcttgtggtt 60cttgctgcaa gtttgatgag gatgactctg
agccagttct caagggtgtc aaattacatt 120acacataaac gaactt atg gat ttg ttt
atg aga ttt ttt act ctt gga tca 172 Met Asp Leu Phe
Met Arg Phe Phe Thr Leu Gly Ser 1 5
10att act gca cag cca gta aaa att gac aat gct tct cct gca agt
act 220Ile Thr Ala Gln Pro Val Lys Ile Asp Asn Ala Ser Pro Ala Ser
Thr 15 20 25gtt cat gct aca gca
acg ata ccg cta caa gcc tca ctc cct ttc gga 268Val His Ala Thr Ala
Thr Ile Pro Leu Gln Ala Ser Leu Pro Phe Gly 30 35
40tgg ctt gtt att ggc gtt gca ttt ctt gct gtt ttt cag agc
gct acc 316Trp Leu Val Ile Gly Val Ala Phe Leu Ala Val Phe Gln Ser
Ala Thr45 50 55 60aaa
ata att gcg ctc aat aaa aga tgg cag cta gcc ctt tat aag ggc 364Lys
Ile Ile Ala Leu Asn Lys Arg Trp Gln Leu Ala Leu Tyr Lys Gly
65 70 75ttc cag ttc att tgc aat tta
ctg ctg cta ttt gtt acc atc tat tca 412Phe Gln Phe Ile Cys Asn Leu
Leu Leu Leu Phe Val Thr Ile Tyr Ser 80 85
90cat ctt ttg ctt gtc gct gca ggt atg gag gcg caa ttt ttg
tac ctc 460His Leu Leu Leu Val Ala Ala Gly Met Glu Ala Gln Phe Leu
Tyr Leu 95 100 105tat gcc ttg ata
tat ttt cta caa tgc atc aac gca tgt aga att att 508Tyr Ala Leu Ile
Tyr Phe Leu Gln Cys Ile Asn Ala Cys Arg Ile Ile 110
115 120atg aga tgt tgg ctt tgt tgg aag tgc aaa tcc aag
aac cca tta ctt 556Met Arg Cys Trp Leu Cys Trp Lys Cys Lys Ser Lys
Asn Pro Leu Leu125 130 135
140tat gat gcc aac tac ttt gtt tgc tgg cac aca cat aac tat gac tac
604Tyr Asp Ala Asn Tyr Phe Val Cys Trp His Thr His Asn Tyr Asp Tyr
145 150 155tgt ata cca tat aac
agt gtc aca gat aca att gtc gtt act gaa ggt 652Cys Ile Pro Tyr Asn
Ser Val Thr Asp Thr Ile Val Val Thr Glu Gly 160
165 170gac ggc att tca aca cca aaa ctc aaa gaa gac tac
caa att ggt ggt 700Asp Gly Ile Ser Thr Pro Lys Leu Lys Glu Asp Tyr
Gln Ile Gly Gly 175 180 185tat tct
gag gat agg cac tca ggt gtt aaa gac tat gtc gtt gta cat 748Tyr Ser
Glu Asp Arg His Ser Gly Val Lys Asp Tyr Val Val Val His 190
195 200ggc tat ttc acc gaa gtt tac tac cag ctt gag
tct aca caa att act 796Gly Tyr Phe Thr Glu Val Tyr Tyr Gln Leu Glu
Ser Thr Gln Ile Thr205 210 215
220aca gac act ggt att gaa aat gct aca ttc ttc atc ttt aac aag ctt
844Thr Asp Thr Gly Ile Glu Asn Ala Thr Phe Phe Ile Phe Asn Lys Leu
225 230 235gtt aaa gac cca ccg
aat gtg caa ata cac aca atc gac ggc tct tca 892Val Lys Asp Pro Pro
Asn Val Gln Ile His Thr Ile Asp Gly Ser Ser 240
245 250gga gtt gct aat cca gca atg gat cca att tat gat
gag ccg acg acg 940Gly Val Ala Asn Pro Ala Met Asp Pro Ile Tyr Asp
Glu Pro Thr Thr 255 260 265act act
agc gtg cct ttg taagcacaag aaagtgagta cgaacttatg 988Thr Thr
Ser Val Pro Leu 270tactcattcg tttcggaaga aacaggtacg ttaatagtta
atagcgtact tctttttctt 1048gctttcgtgg tattcttgct agtcacacta gccatcctta
ctgcgctt 1096
SEQ ID NO: 10; LENGTH: 274; TYPE: PRT; ORGANISM: CORONAVIRUS; SEQUENCE: 10
Met Asp Leu Phe Met Arg
Phe Phe Thr Leu Gly Ser Ile Thr Ala Gln1 5
10 15Pro Val Lys Ile Asp Asn Ala Ser Pro Ala Ser Thr
Val His Ala Thr 20 25 30Ala
Thr Ile Pro Leu Gln Ala Ser Leu Pro Phe Gly Trp Leu Val Ile 35
40 45Gly Val Ala Phe Leu Ala Val Phe Gln
Ser Ala Thr Lys Ile Ile Ala 50 55
60Leu Asn Lys Arg Trp Gln Leu Ala Leu Tyr Lys Gly Phe Gln Phe Ile65
70 75 80Cys Asn Leu Leu Leu
Leu Phe Val Thr Ile Tyr Ser His Leu Leu Leu 85
90 95Val Ala Ala Gly Met Glu Ala Gln Phe Leu Tyr
Leu Tyr Ala Leu Ile 100 105
110Tyr Phe Leu Gln Cys Ile Asn Ala Cys Arg Ile Ile Met Arg Cys Trp
115 120 125Leu Cys Trp Lys Cys Lys Ser
Lys Asn Pro Leu Leu Tyr Asp Ala Asn 130 135
140Tyr Phe Val Cys Trp His Thr His Asn Tyr Asp Tyr Cys Ile Pro
Tyr145 150 155 160Asn Ser
Val Thr Asp Thr Ile Val Val Thr Glu Gly Asp Gly Ile Ser
165 170 175Thr Pro Lys Leu Lys Glu Asp
Tyr Gln Ile Gly Gly Tyr Ser Glu Asp 180 185
190Arg His Ser Gly Val Lys Asp Tyr Val Val Val His Gly Tyr
Phe Thr 195 200 205Glu Val Tyr Tyr
Gln Leu Glu Ser Thr Gln Ile Thr Thr Asp Thr Gly 210
215 220Ile Glu Asn Ala Thr Phe Phe Ile Phe Asn Lys Leu
Val Lys Asp Pro225 230 235
240Pro Asn Val Gln Ile His Thr Ile Asp Gly Ser Ser Gly Val Ala Asn
245 250 255Pro Ala Met Asp Pro
Ile Tyr Asp Glu Pro Thr Thr Thr Thr Ser Val 260
265 270
Pro Leu
SEQ ID NO: 11; LENGTH: 1096; TYPE: DNA; ORGANISM: CORONAVIRUS; FEATURE: CDS, LOCATION: (558)..(1019); SEQUENCE: 11
tcttgctttg ttgcatgact agttgttgca gttgcctcaa gggtgcatgc tcttgtggtt
60cttgctgcaa gtttgatgag gatgactctg agccagttct caagggtgtc aaattacatt
120acacataaac gaacttatgg atttgtttat gagatttttt actcttggat caattactgc
180acagccagta aaaattgaca atgcttctcc tgcaagtact gttcatgcta cagcaacgat
240accgctacaa gcctcactcc ctttcggatg gcttgttatt ggcgttgcat ttcttgctgt
300ttttcagagc gctaccaaaa taattgcgct caataaaaga tggcagctag ccctttataa
360gggcttccag ttcatttgca atttactgct gctatttgtt accatctatt cacatctttt
420gcttgtcgct gcaggtatgg aggcgcaatt tttgtacctc tatgccttga tatattttct
480acaatgcatc aacgcatgta gaattattat gagatgttgg ctttgttgga agtgcaaatc
540caagaaccca ttacttt atg atg cca act act ttg ttt gct ggc aca cac
590 Met Met Pro Thr Thr Leu Phe Ala Gly Thr His
1 5 10ata act atg act act gta
tac cat ata aca gtg tca cag ata caa ttg 638Ile Thr Met Thr Thr Val
Tyr His Ile Thr Val Ser Gln Ile Gln Leu 15 20
25tcg tta ctg aag gtg acg gca ttt caa cac caa aac tca
aag aag act 686Ser Leu Leu Lys Val Thr Ala Phe Gln His Gln Asn Ser
Lys Lys Thr 30 35 40acc aaa ttg
gtg gtt att ctg agg ata ggc act cag gtg tta aag act 734Thr Lys Leu
Val Val Ile Leu Arg Ile Gly Thr Gln Val Leu Lys Thr 45
50 55atg tcg ttg tac atg gct att tca ccg aag ttt act
acc agc ttg agt 782Met Ser Leu Tyr Met Ala Ile Ser Pro Lys Phe Thr
Thr Ser Leu Ser60 65 70
75cta cac aaa tta cta cag aca ctg gta ttg aaa atg cta cat tct tca
830Leu His Lys Leu Leu Gln Thr Leu Val Leu Lys Met Leu His Ser Ser
80 85 90tct tta aca agc ttg tta
aag acc cac cga atg tgc aaa tac aca caa 878Ser Leu Thr Ser Leu Leu
Lys Thr His Arg Met Cys Lys Tyr Thr Gln 95
100 105tcg acg gct ctt cag gag ttg cta atc cag caa tgg
atc caa ttt atg 926Ser Thr Ala Leu Gln Glu Leu Leu Ile Gln Gln Trp
Ile Gln Phe Met 110 115 120atg agc
cga cga cga cta cta gcg tgc ctt tgt aag cac aag aaa gtg 974Met Ser
Arg Arg Arg Leu Leu Ala Cys Leu Cys Lys His Lys Lys Val 125
130 135agt acg aac tta tgt act cat tcg ttt cgg aag
aaa cag gta cgt 1019Ser Thr Asn Leu Cys Thr His Ser Phe Arg Lys
Lys Gln Val Arg140 145 150taatagttaa
tagcgtactt ctttttcttg ctttcgtggt attcttgcta gtcacactag 1079ccatccttac
tgcgctt
1096
SEQ ID NO: 12; LENGTH: 154; TYPE: PRT; ORGANISM: CORONAVIRUS; SEQUENCE: 12
Met Met Pro Thr Thr Leu Phe Ala Gly Thr His Ile
Thr Met Thr Thr1 5 10
15Val Tyr His Ile Thr Val Ser Gln Ile Gln Leu Ser Leu Leu Lys Val
20 25 30Thr Ala Phe Gln His Gln Asn
Ser Lys Lys Thr Thr Lys Leu Val Val 35 40
45Ile Leu Arg Ile Gly Thr Gln Val Leu Lys Thr Met Ser Leu Tyr
Met 50 55 60Ala Ile Ser Pro Lys Phe
Thr Thr Ser Leu Ser Leu His Lys Leu Leu65 70
75 80Gln Thr Leu Val Leu Lys Met Leu His Ser Ser
Ser Leu Thr Ser Leu 85 90
95Leu Lys Thr His Arg Met Cys Lys Tyr Thr Gln Ser Thr Ala Leu Gln
100 105 110Glu Leu Leu Ile Gln Gln
Trp Ile Gln Phe Met Met Ser Arg Arg Arg 115 120
125Leu Leu Ala Cys Leu Cys Lys His Lys Lys Val Ser Thr Asn
Leu Cys 130 135 140Thr His Ser Phe Arg
Lys Lys Gln Val Arg145
150
SEQ ID NO: 13; LENGTH: 332; TYPE: DNA; ORGANISM: CORONAVIRUS; FEATURE: CDS, LOCATION: (36)..(263); SEQUENCE: 13
tgcctttgta agcacaagaa agtgagtacg
aactt atg tac tca ttc gtt tcg 53
Met Tyr Ser Phe Val Ser 1
5gaa gaa aca ggt acg tta ata gtt aat agc gta ctt ctt ttt ctt gct
101Glu Glu Thr Gly Thr Leu Ile Val Asn Ser Val Leu Leu Phe Leu Ala
10 15 20ttc gtg gta ttc ttg cta
gtc aca cta gcc atc ctt act gcg ctt cga 149Phe Val Val Phe Leu Leu
Val Thr Leu Ala Ile Leu Thr Ala Leu Arg 25 30
35ttg tgt gcg tac tgc tgc aat att gtt aac gtg agt tta gta
aaa cca 197Leu Cys Ala Tyr Cys Cys Asn Ile Val Asn Val Ser Leu Val
Lys Pro 40 45 50acg gtt tac gtc tac
tcg cgt gtt aaa aat ctg aac tct tct gaa gga 245Thr Val Tyr Val Tyr
Ser Arg Val Lys Asn Leu Asn Ser Ser Glu Gly55 60
65 70gtt cct gat ctt ctg gtc taaacgaact
aactattatt attattctgt 293Val Pro Asp Leu Leu Val
75ttggaacttt aacattgctt atcatggcag acaacggta
332
SEQ ID NO: 14; LENGTH: 76; TYPE: PRT; ORGANISM: CORONAVIRUS; SEQUENCE: 14
Met Tyr Ser Phe Val Ser Glu Glu Thr Gly Thr Leu
Ile Val Asn Ser1 5 10
15Val Leu Leu Phe Leu Ala Phe Val Val Phe Leu Leu Val Thr Leu Ala
20 25 30Ile Leu Thr Ala Leu Arg Leu
Cys Ala Tyr Cys Cys Asn Ile Val Asn 35 40
45Val Ser Leu Val Lys Pro Thr Val Tyr Val Tyr Ser Arg Val Lys
Asn 50 55 60Leu Asn Ser Ser Glu Gly
Val Pro Asp Leu Leu Val65 70
75
SEQ ID NO: 15; LENGTH: 332; TYPE: DNA; ORGANISM: CORONAVIRUS; SEQUENCE: 15
tgcctttgta agcacaagaa agtgagtacg aacttatgta
ctcattcgtt tcggaagaaa 60caggtacgtt aatagttaat agcgtacttc tttttcttgc
tttcgtggta ttcttgctag 120tcacactagc catccttact gcgcttcgat tgtgtgcgta
ctgctgcaat attgttaacg 180tgagtttagt aaaaccaacg gtttacgtct actcgcgtgt
taaaaatctg aactcttctg 240aaggagttcc tgatcttctg gtctaaacga actaactatt
attattattc tgtttggaac 300tttaacattg cttatcatgg cagacaacgg ta
332
SEQ ID NO: 16; LENGTH: 708; TYPE: DNA; ORGANISM: CORONAVIRUS; FEATURE: CDS, LOCATION: (41)..(703); SEQUENCE: 16
tattattatt
attctgtttg gaactttaac attgcttatc atg gca gac aac ggt 55
Met Ala Asp Asn Gly
1 5act att acc gtt gag gag ctt aaa caa
ctc ctg gaa caa tgg aac cta 103Thr Ile Thr Val Glu Glu Leu Lys Gln
Leu Leu Glu Gln Trp Asn Leu 10 15
20gta ata ggt ttc cta ttc cta gcc tgg att atg tta cta caa ttt
gcc 151Val Ile Gly Phe Leu Phe Leu Ala Trp Ile Met Leu Leu Gln Phe
Ala 25 30 35tat tct aat cgg
aac agg ttt ttg tac ata ata aag ctt gtt ttc ctc 199Tyr Ser Asn Arg
Asn Arg Phe Leu Tyr Ile Ile Lys Leu Val Phe Leu 40
45 50tgg ctc ttg tgg cca gta aca ctt gct tgt ttt gtg
ctt gct gct gtc 247Trp Leu Leu Trp Pro Val Thr Leu Ala Cys Phe Val
Leu Ala Ala Val 55 60 65tac aga att
aat tgg gtg act ggc ggg att gcg att gca atg gct tgt 295Tyr Arg Ile
Asn Trp Val Thr Gly Gly Ile Ala Ile Ala Met Ala Cys70 75
80 85att gta ggc ttg atg tgg ctt agc
tac ttc gtt gct tcc ttc agg ctg 343Ile Val Gly Leu Met Trp Leu Ser
Tyr Phe Val Ala Ser Phe Arg Leu 90 95
100ttt gct cgt acc cgc tca atg tgg tca ttc aac cca gaa aca
aac att 391Phe Ala Arg Thr Arg Ser Met Trp Ser Phe Asn Pro Glu Thr
Asn Ile 105 110 115ctt ctc aat
gtg cct ctc cgg ggg aca att gtg acc aga ccg ctc atg 439Leu Leu Asn
Val Pro Leu Arg Gly Thr Ile Val Thr Arg Pro Leu Met 120
125 130gaa agt gaa ctt gtc att ggt gct gtg atc att
cgt ggt cac ttg cga 487Glu Ser Glu Leu Val Ile Gly Ala Val Ile Ile
Arg Gly His Leu Arg 135 140 145atg gcc
gga cac tcc cta ggg cgc tgt gac att aag gac ctg cca aaa 535Met Ala
Gly His Ser Leu Gly Arg Cys Asp Ile Lys Asp Leu Pro Lys150
155 160 165gag atc act gtg gct aca tca
cga acg ctt tct tat tac aaa tta gga 583Glu Ile Thr Val Ala Thr Ser
Arg Thr Leu Ser Tyr Tyr Lys Leu Gly 170
175 180gcg tcg cag cgt gta ggc act gat tca ggt ttt gct
gca tac aac cgc 631Ala Ser Gln Arg Val Gly Thr Asp Ser Gly Phe Ala
Ala Tyr Asn Arg 185 190 195tac
cgt att gga aac tat aaa tta aat aca gac cac gcc ggt agc aac 679Tyr
Arg Ile Gly Asn Tyr Lys Leu Asn Thr Asp His Ala Gly Ser Asn 200
205 210gac aat att gct ttg cta gta cag taagt
708Asp Asn Ile Ala Leu Leu Val Gln 215
220
SEQ ID NO: 17; LENGTH: 221; TYPE: PRT; ORGANISM: CORONAVIRUS; SEQUENCE: 17
Met Ala Asp Asn Gly Thr Ile Thr
Val Glu Glu Leu Lys Gln Leu Leu1 5 10
15Glu Gln Trp Asn Leu Val Ile Gly Phe Leu Phe Leu Ala Trp
Ile Met 20 25 30Leu Leu Gln
Phe Ala Tyr Ser Asn Arg Asn Arg Phe Leu Tyr Ile Ile 35
40 45Lys Leu Val Phe Leu Trp Leu Leu Trp Pro Val
Thr Leu Ala Cys Phe 50 55 60Val Leu
Ala Ala Val Tyr Arg Ile Asn Trp Val Thr Gly Gly Ile Ala65
70 75 80Ile Ala Met Ala Cys Ile Val
Gly Leu Met Trp Leu Ser Tyr Phe Val 85 90
95Ala Ser Phe Arg Leu Phe Ala Arg Thr Arg Ser Met Trp
Ser Phe Asn 100 105 110Pro Glu
Thr Asn Ile Leu Leu Asn Val Pro Leu Arg Gly Thr Ile Val 115
120 125Thr Arg Pro Leu Met Glu Ser Glu Leu Val
Ile Gly Ala Val Ile Ile 130 135 140Arg
Gly His Leu Arg Met Ala Gly His Ser Leu Gly Arg Cys Asp Ile145
150 155 160Lys Asp Leu Pro Lys Glu
Ile Thr Val Ala Thr Ser Arg Thr Leu Ser 165
170 175Tyr Tyr Lys Leu Gly Ala Ser Gln Arg Val Gly Thr
Asp Ser Gly Phe 180 185 190Ala
Ala Tyr Asn Arg Tyr Arg Ile Gly Asn Tyr Lys Leu Asn Thr Asp 195
200 205His Ala Gly Ser Asn Asp Asn Ile Ala
Leu Leu Val Gln 210 215
220
SEQ ID NO: 18; LENGTH: 769; TYPE: DNA; ORGANISM: CORONAVIRUS; SEQUENCE: 18
cctgatcttc tggtctaaac gaactaacta ttattattat
tctgtttgga actttaacat 60tgcttatcat ggcagacaac ggtactatta ccgttgagga
gcttaaacaa ctcctggaac 120aatggaacct agtaataggt ttcctattcc tagcctggat
tatgttacta caatttgcct 180attctaatcg gaacaggttt ttgtacataa taaagcttgt
tttcctctgg ctcttgtggc 240cagtaacact tgcttgtttt gtgcttgctg ctgtctacag
aattaattgg gtgactggcg 300ggattgcgat tgcaatggct tgtattgtag gcttgatgtg
gcttagctac ttcgttgctt 360ccttcaggct gtttgctcgt acccgctcaa tgtggtcatt
caacccagaa acaaacattc 420ttctcaatgt gcctctccgg gggacaattg tgaccagacc
gctcatggaa agtgaacttg 480tcattggtgc tgtgatcatt cgtggtcact tgcgaatggc
cggacactcc ctagggcgct 540gtgacattaa ggacctgcca aaagagatca ctgtggctac
atcacgaacg ctttcttatt 600acaaattagg agcgtcgcag cgtgtaggca ctgattcagg
ttttgctgca tacaaccgct 660accgtattgg aaactataaa ttaaatacag accacgccgg
tagcaacgac aatattgctt 720tgctagtaca gtaagtgaca acagatgttt catcttgttg
acttccagg 769
SEQ ID NO: 19; LENGTH: 1231; TYPE: DNA; ORGANISM: CORONAVIRUS; SEQUENCE: 19
taccgtattg gaaactataa
attaaataca gaccacgccg gtagcaacga caatattgct 60ttgctagtac agtaagtgac
aacagatgtt tcatcttgtt gacttccagg ttacaatagc 120agagatattg attatcatta
tgaggacttt caggattgct atttggaatc ttgacgttat 180aataagttca atagtgagac
aattatttaa gcctctaact aagaagaatt attcggagtt 240agatgatgaa gaacctatgg
agttagatta tccataaaac gaacatgaaa attattctct 300tcctgacatt gattgtattt
acatcttgcg agctatatca ctatcaggag tgtgttagag 360gtacgactgt actactaaaa
gaaccttgcc catcaggaac atacgagggc aattcaccat 420ttcaccctct tgctgacaat
aaatttgcac taacttgcac tagcacacac tttgcttttg 480cttgtgctga cggtactcga
catacctatc agctgcgtgc aagatcagtt tcaccaaaac 540ttttcatcag acaagaggag
gttcaacaag agctctactc gccacttttt ctcattgttg 600ctgctctagt atttttaata
ctttgcttca ccattaagag aaagacagaa tgaatgagct 660cactttaatt gacttctatt
tgtgcttttt agcctttctg ctattccttg ttttaataat 720gcttattata ttttggtttt
cactcgaaat ccaggatcta gaagaacctt gtaccaaagt 780ctaaacgaac atgaaacttc
tcattgtttt gacttgtatt tctctatgca gttgcatatg 840cactgtagta cagcgctgtg
catctaataa acctcatgtg cttgaagatc cttgtaaggt 900acaacactag gggtaatact
tatagcactg cttggctttg tgctctagga aaggttttac 960cttttcatag atggcacact
atggttcaaa catgcacacc taatgttact atcaactgtc 1020aagatccagc tggtggtgcg
cttatagcta ggtgttggta ccttcatgaa ggtcaccaaa 1080ctgctgcatt tagagacgta
cttgttgttt taaataaacg aacaaattaa aatgtctgat 1140aatggacccc aatcaaacca
acgtagtgcc ccccgcatta catttggtgg acccacagat 1200tcaactgaca ataaccagaa
tggaggacgc a 1231
SEQ ID NO: 20; LENGTH: 1242; TYPE: DNA; ORGANISM: CORONAVIRUS; SEQUENCE: 20
gcatacaacc gctaccgtat tggaaactat aaattaaata cagaccacgc cggtagcaac
60gacaatattg ctttgctagt acagtaagtg acaacagatg tttcatcttg ttgacttcca
120ggttacaata gcagagatat tgattatcat tatgaggact ttcaggattg ctatttggaa
180tcttgacgtt ataataagtt caatagtgag acagttattt aagcctctaa ctaagaagaa
240ttattcggag ttagatgatg aagaacctat ggagttagat tatccataaa acgaacatga
300aaattattct cttcctgaca ttgattgtat ttacatcttg cgagctatat cactatcagg
360agtgtgttag aggtacgact gtactactaa aagaaccttg cccatcagga acatacgagg
420gcaattcacc atttcaccct cttgctgaca ataaatttgc actaacttgc actagcacac
480actttgcttt tgcttgtgct gacggtactc gacataccta tcagctgcgt gcaagatcag
540tttcaccaaa acttttcatc agacaagagg aggttcaaca agagctctac tcgccacttt
600ttctcattgt tgctgctcta gtatttttaa tactttgctt caccattaag agaaagacag
660aatgaatgag ctcactttaa ttgacttcta tttgtgcttt ttagcctttc tgctattcct
720tgttttaata atgcttatta tattttggtt ttcactcgaa atccaggatc tagaagaacc
780ttgtaccaaa gtctaaacga acatgaaact tctcattgtt ttgacttgta tttctctatg
840cagttgcata tgcactgtag tacagcgctg tgcatctaat aaacctcatg tgcttgaaga
900tccttgtaag gtacaacact aggggtaata cttatagcac tgcttggctt tgtgctctag
960gaaaggtttt accttttcat agatggcaca ctatggttca aacatgcaca cctaatgtta
1020ctatcaactg tcaagatcca gctggtggtg cgcttatagc taggtgttgg taccttcatg
1080aaggtcacca aactgctgca tttagagacg tacttgttgt tttaaataaa cgaacgaatt
1140aaaatgtctg ataatggacc ccaatcaaac caacgtagtg ccccccgcat tacatttggt
1200ggacccacag attcaactga caataaccag aatggaggac gc
1242
SEQ ID NO: 21; LENGTH: 1231; TYPE: DNA; ORGANISM: CORONAVIRUS; FEATURE: CDS, LOCATION: (86)..(274); SEQUENCE: 21
taccgtattg gaaactataa attaaataca
gaccacgccg gtagcaacga caatattgct 60ttgctagtac agtaagtgac aacag atg
ttt cat ctt gtt gac ttc cag gtt 112 Met
Phe His Leu Val Asp Phe Gln Val 1
5aca ata gca gag ata ttg att atc att atg agg act ttc agg att gct
160Thr Ile Ala Glu Ile Leu Ile Ile Ile Met Arg Thr Phe Arg Ile Ala10
15 20 25att tgg aat ctt gac
gtt ata ata agt tca ata gtg aga caa tta ttt 208Ile Trp Asn Leu Asp
Val Ile Ile Ser Ser Ile Val Arg Gln Leu Phe 30
35 40aag cct cta act aag aag aat tat tcg gag tta
gat gat gaa gaa cct 256Lys Pro Leu Thr Lys Lys Asn Tyr Ser Glu Leu
Asp Asp Glu Glu Pro 45 50
55atg gag tta gat tat cca taaaacgaac atgaaaatta ttctcttcct
304Met Glu Leu Asp Tyr Pro 60gacattgatt gtatttacat cttgcgagct
atatcactat caggagtgtg ttagaggtac 364gactgtacta ctaaaagaac cttgcccatc
aggaacatac gagggcaatt caccatttca 424ccctcttgct gacaataaat ttgcactaac
ttgcactagc acacactttg cttttgcttg 484tgctgacggt actcgacata cctatcagct
gcgtgcaaga tcagtttcac caaaactttt 544catcagacaa gaggaggttc aacaagagct
ctactcgcca ctttttctca ttgttgctgc 604tctagtattt ttaatacttt gcttcaccat
taagagaaag acagaatgaa tgagctcact 664ttaattgact tctatttgtg ctttttagcc
tttctgctat tccttgtttt aataatgctt 724attatatttt ggttttcact cgaaatccag
gatctagaag aaccttgtac caaagtctaa 784acgaacatga aacttctcat tgttttgact
tgtatttctc tatgcagttg catatgcact 844gtagtacagc gctgtgcatc taataaacct
catgtgcttg aagatccttg taaggtacaa 904cactaggggt aatacttata gcactgcttg
gctttgtgct ctaggaaagg ttttaccttt 964tcatagatgg cacactatgg ttcaaacatg
cacacctaat gttactatca actgtcaaga 1024tccagctggt ggtgcgctta tagctaggtg
ttggtacctt catgaaggtc accaaactgc 1084tgcatttaga gacgtacttg ttgttttaaa
taaacgaaca aattaaaatg tctgataatg 1144gaccccaatc aaaccaacgt agtgcccccc
gcattacatt tggtggaccc acagattcaa 1204ctgacaataa ccagaatgga ggacgca
1231
SEQ ID NO: 22; LENGTH: 63; TYPE: PRT; ORGANISM: CORONAVIRUS; SEQUENCE: 22
Met Phe His Leu
Val Asp Phe Gln Val Thr Ile Ala Glu Ile Leu Ile1 5
10 15Ile Ile Met Arg Thr Phe Arg Ile Ala Ile
Trp Asn Leu Asp Val Ile 20 25
30Ile Ser Ser Ile Val Arg Gln Leu Phe Lys Pro Leu Thr Lys Lys Asn
35 40 45Tyr Ser Glu Leu Asp Asp Glu Glu
Pro Met Glu Leu Asp Tyr Pro 50 55
60
SEQ ID NO: 23; LENGTH: 1231; TYPE: DNA; ORGANISM: CORONAVIRUS; FEATURE: CDS, LOCATION: (285)..(650); SEQUENCE: 23
taccgtattg gaaactataa attaaataca
gaccacgccg gtagcaacga caatattgct 60ttgctagtac agtaagtgac aacagatgtt
tcatcttgtt gacttccagg ttacaatagc 120agagatattg attatcatta tgaggacttt
caggattgct atttggaatc ttgacgttat 180aataagttca atagtgagac aattatttaa
gcctctaact aagaagaatt attcggagtt 240agatgatgaa gaacctatgg agttagatta
tccataaaac gaac atg aaa att att 296
Met Lys Ile Ile
1ctc ttc ctg aca ttg att gta ttt aca tct tgc gag cta tat cac tat
344Leu Phe Leu Thr Leu Ile Val Phe Thr Ser Cys Glu Leu Tyr His Tyr5
10 15 20cag gag tgt gtt aga
ggt acg act gta cta cta aaa gaa cct tgc cca 392Gln Glu Cys Val Arg
Gly Thr Thr Val Leu Leu Lys Glu Pro Cys Pro 25
30 35tca gga aca tac gag ggc aat tca cca ttt cac
cct ctt gct gac aat 440Ser Gly Thr Tyr Glu Gly Asn Ser Pro Phe His
Pro Leu Ala Asp Asn 40 45
50aaa ttt gca cta act tgc act agc aca cac ttt gct ttt gct tgt gct
488Lys Phe Ala Leu Thr Cys Thr Ser Thr His Phe Ala Phe Ala Cys Ala
55 60 65gac ggt act cga cat acc tat cag
ctg cgt gca aga tca gtt tca cca 536Asp Gly Thr Arg His Thr Tyr Gln
Leu Arg Ala Arg Ser Val Ser Pro 70 75
80aaa ctt ttc atc aga caa gag gag gtt caa caa gag ctc tac tcg cca
584Lys Leu Phe Ile Arg Gln Glu Glu Val Gln Gln Glu Leu Tyr Ser Pro85
90 95 100ctt ttt ctc att
gtt gct gct cta gta ttt tta ata ctt tgc ttc acc 632Leu Phe Leu Ile
Val Ala Ala Leu Val Phe Leu Ile Leu Cys Phe Thr 105
110 115att aag aga aag aca gaa tgaatgagct
cactttaatt gacttctatt 680Ile Lys Arg Lys Thr Glu
120tgtgcttttt agcctttctg ctattccttg ttttaataat gcttattata ttttggtttt
740cactcgaaat ccaggatcta gaagaacctt gtaccaaagt ctaaacgaac atgaaacttc
800tcattgtttt gacttgtatt tctctatgca gttgcatatg cactgtagta cagcgctgtg
860catctaataa acctcatgtg cttgaagatc cttgtaaggt acaacactag gggtaatact
920tatagcactg cttggctttg tgctctagga aaggttttac cttttcatag atggcacact
980atggttcaaa catgcacacc taatgttact atcaactgtc aagatccagc tggtggtgcg
1040cttatagcta ggtgttggta ccttcatgaa ggtcaccaaa ctgctgcatt tagagacgta
1100cttgttgttt taaataaacg aacaaattaa aatgtctgat aatggacccc aatcaaacca
1160acgtagtgcc ccccgcatta catttggtgg acccacagat tcaactgaca ataaccagaa
1220tggaggacgc a
1231
SEQ ID NO: 24; LENGTH: 122; TYPE: PRT; ORGANISM: CORONAVIRUS; SEQUENCE: 24
Met Lys Ile Ile Leu Phe Leu Thr Leu Ile Val Phe
Thr Ser Cys Glu1 5 10
15Leu Tyr His Tyr Gln Glu Cys Val Arg Gly Thr Thr Val Leu Leu Lys
20 25 30Glu Pro Cys Pro Ser Gly Thr
Tyr Glu Gly Asn Ser Pro Phe His Pro 35 40
45Leu Ala Asp Asn Lys Phe Ala Leu Thr Cys Thr Ser Thr His Phe
Ala 50 55 60Phe Ala Cys Ala Asp Gly
Thr Arg His Thr Tyr Gln Leu Arg Ala Arg65 70
75 80Ser Val Ser Pro Lys Leu Phe Ile Arg Gln Glu
Glu Val Gln Gln Glu 85 90
95Leu Tyr Ser Pro Leu Phe Leu Ile Val Ala Ala Leu Val Phe Leu Ile
100 105 110Leu Cys Phe Thr Ile Lys
Arg Lys Thr Glu 115
120
SEQ ID NO: 25; LENGTH: 1231; TYPE: DNA; ORGANISM: CORONAVIRUS; FEATURE: CDS, LOCATION: (650)..(781); SEQUENCE: 25
taccgtattg gaaactataa attaaataca
gaccacgccg gtagcaacga caatattgct 60ttgctagtac agtaagtgac aacagatgtt
tcatcttgtt gacttccagg ttacaatagc 120agagatattg attatcatta tgaggacttt
caggattgct atttggaatc ttgacgttat 180aataagttca atagtgagac aattatttaa
gcctctaact aagaagaatt attcggagtt 240agatgatgaa gaacctatgg agttagatta
tccataaaac gaacatgaaa attattctct 300tcctgacatt gattgtattt acatcttgcg
agctatatca ctatcaggag tgtgttagag 360gtacgactgt actactaaaa gaaccttgcc
catcaggaac atacgagggc aattcaccat 420ttcaccctct tgctgacaat aaatttgcac
taacttgcac tagcacacac tttgcttttg 480cttgtgctga cggtactcga catacctatc
agctgcgtgc aagatcagtt tcaccaaaac 540ttttcatcag acaagaggag gttcaacaag
agctctactc gccacttttt ctcattgttg 600ctgctctagt atttttaata ctttgcttca
ccattaagag aaagacaga atg aat gag 658
Met Asn Glu
1ctc act tta att gac ttc tat ttg tgc ttt tta gcc ttt ctg cta
ttc 706Leu Thr Leu Ile Asp Phe Tyr Leu Cys Phe Leu Ala Phe Leu Leu
Phe 5 10 15ctt gtt tta ata atg ctt
att ata ttt tgg ttt tca ctc gaa atc cag 754Leu Val Leu Ile Met Leu
Ile Ile Phe Trp Phe Ser Leu Glu Ile Gln20 25
30 35gat cta gaa gaa cct tgt acc aaa gtc taaacgaaca
tgaaacttct 801Asp Leu Glu Glu Pro Cys Thr Lys Val
40cattgttttg acttgtattt ctctatgcag ttgcatatgc actgtagtac agcgctgtgc
861atctaataaa cctcatgtgc ttgaagatcc ttgtaaggta caacactagg ggtaatactt
921atagcactgc ttggctttgt gctctaggaa aggttttacc ttttcataga tggcacacta
981tggttcaaac atgcacacct aatgttacta tcaactgtca agatccagct ggtggtgcgc
1041ttatagctag gtgttggtac cttcatgaag gtcaccaaac tgctgcattt agagacgtac
1101ttgttgtttt aaataaacga acaaattaaa atgtctgata atggacccca atcaaaccaa
1161cgtagtgccc cccgcattac atttggtgga cccacagatt caactgacaa taaccagaat
1221ggaggacgca
1231

SEQ ID No: 26
Length: 44
Type: PRT
Organism: CORONAVIRUS
Sequence 26:
Met Asn Glu Leu Thr Leu Ile Asp Phe Tyr Leu Cys Phe Leu Ala Phe
1               5                   10                  15
Leu Leu Phe Leu Val Leu Ile Met Leu Ile Ile Phe Trp Phe Ser Leu
            20                  25                  30
Glu Ile Gln Asp Leu Glu Glu Pro Cys Thr Lys Val
        35                  40

SEQ ID No: 27
Length: 1231
Type: DNA
Organism: CORONAVIRUS
Feature: CDS, location (791)..(907)
Sequence 27:
taccgtattg gaaactataa attaaataca
gaccacgccg gtagcaacga caatattgct 60ttgctagtac agtaagtgac aacagatgtt
tcatcttgtt gacttccagg ttacaatagc 120agagatattg attatcatta tgaggacttt
caggattgct atttggaatc ttgacgttat 180aataagttca atagtgagac aattatttaa
gcctctaact aagaagaatt attcggagtt 240agatgatgaa gaacctatgg agttagatta
tccataaaac gaacatgaaa attattctct 300tcctgacatt gattgtattt acatcttgcg
agctatatca ctatcaggag tgtgttagag 360gtacgactgt actactaaaa gaaccttgcc
catcaggaac atacgagggc aattcaccat 420ttcaccctct tgctgacaat aaatttgcac
taacttgcac tagcacacac tttgcttttg 480cttgtgctga cggtactcga catacctatc
agctgcgtgc aagatcagtt tcaccaaaac 540ttttcatcag acaagaggag gttcaacaag
agctctactc gccacttttt ctcattgttg 600ctgctctagt atttttaata ctttgcttca
ccattaagag aaagacagaa tgaatgagct 660cactttaatt gacttctatt tgtgcttttt
agcctttctg ctattccttg ttttaataat 720gcttattata ttttggtttt cactcgaaat
ccaggatcta gaagaacctt gtaccaaagt 780ctaaacgaac atg aaa ctt ctc att gtt
ttg act tgt att tct cta tgc 829 Met Lys Leu Leu Ile Val
Leu Thr Cys Ile Ser Leu Cys 1 5
10agt tgc ata tgc act gta gta cag cgc tgt gca tct aat aaa cct cat
877Ser Cys Ile Cys Thr Val Val Gln Arg Cys Ala Ser Asn Lys Pro His 15
20 25gtg ctt gaa gat cct tgt aag gta caa
cac taggggtaat acttatagca 927Val Leu Glu Asp Pro Cys Lys Val Gln
His30 35ctgcttggct ttgtgctcta ggaaaggttt taccttttca
tagatggcac actatggttc 987aaacatgcac acctaatgtt actatcaact gtcaagatcc
agctggtggt gcgcttatag 1047ctaggtgttg gtaccttcat gaaggtcacc aaactgctgc
atttagagac gtacttgttg 1107ttttaaataa acgaacaaat taaaatgtct gataatggac
cccaatcaaa ccaacgtagt 1167gccccccgca ttacatttgg tggacccaca gattcaactg
acaataacca gaatggagga 1227cgca
1231

SEQ ID No: 28
Length: 39
Type: PRT
Organism: CORONAVIRUS
Sequence 28:
Met Lys Leu Leu Ile Val Leu Thr Cys Ile Ser Leu Cys Ser Cys Ile
1               5                   10                  15
Cys Thr Val Val Gln Arg Cys Ala Ser Asn Lys Pro His Val Leu Glu
            20                  25                  30
Asp Pro Cys Lys Val Gln His
        35

SEQ ID No: 29
Length: 1231
Type: DNA
Organism: CORONAVIRUS
Feature: CDS, location (876)..(1127)
Sequence 29:
taccgtattg gaaactataa attaaataca gaccacgccg gtagcaacga caatattgct
60ttgctagtac agtaagtgac aacagatgtt tcatcttgtt gacttccagg ttacaatagc
120agagatattg attatcatta tgaggacttt caggattgct atttggaatc ttgacgttat
180aataagttca atagtgagac aattatttaa gcctctaact aagaagaatt attcggagtt
240agatgatgaa gaacctatgg agttagatta tccataaaac gaacatgaaa attattctct
300tcctgacatt gattgtattt acatcttgcg agctatatca ctatcaggag tgtgttagag
360gtacgactgt actactaaaa gaaccttgcc catcaggaac atacgagggc aattcaccat
420ttcaccctct tgctgacaat aaatttgcac taacttgcac tagcacacac tttgcttttg
480cttgtgctga cggtactcga catacctatc agctgcgtgc aagatcagtt tcaccaaaac
540ttttcatcag acaagaggag gttcaacaag agctctactc gccacttttt ctcattgttg
600ctgctctagt atttttaata ctttgcttca ccattaagag aaagacagaa tgaatgagct
660cactttaatt gacttctatt tgtgcttttt agcctttctg ctattccttg ttttaataat
720gcttattata ttttggtttt cactcgaaat ccaggatcta gaagaacctt gtaccaaagt
780ctaaacgaac atgaaacttc tcattgtttt gacttgtatt tctctatgca gttgcatatg
840cactgtagta cagcgctgtg catctaataa acctc atg tgc ttg aag atc ctt
893 Met Cys Leu Lys Ile Leu
1 5gta agg tac aac act agg
ggt aat act tat agc act gct tgg ctt tgt 941Val Arg Tyr Asn Thr Arg
Gly Asn Thr Tyr Ser Thr Ala Trp Leu Cys 10 15
20gct cta gga aag gtt tta cct ttt cat aga tgg cac act
atg gtt caa 989Ala Leu Gly Lys Val Leu Pro Phe His Arg Trp His Thr
Met Val Gln 25 30 35aca tgc aca
cct aat gtt act atc aac tgt caa gat cca gct ggt ggt 1037Thr Cys Thr
Pro Asn Val Thr Ile Asn Cys Gln Asp Pro Ala Gly Gly 40
45 50gcg ctt ata gct agg tgt tgg tac ctt cat gaa ggt
cac caa act gct 1085Ala Leu Ile Ala Arg Cys Trp Tyr Leu His Glu Gly
His Gln Thr Ala55 60 65
70gca ttt aga gac gta ctt gtt gtt tta aat aaa cga aca aat
1127Ala Phe Arg Asp Val Leu Val Val Leu Asn Lys Arg Thr Asn
75 80taaaatgtct gataatggac cccaatcaaa ccaacgtagt
gccccccgca ttacatttgg 1187tggacccaca gattcaactg acaataacca gaatggagga
cgca 1231

SEQ ID No: 30
Length: 84
Type: PRT
Organism: CORONAVIRUS
Sequence 30:
Met Cys Leu Lys Ile Leu Val Arg Tyr Asn Thr Arg Gly Asn Thr Tyr
1               5                   10                  15
Ser Thr Ala Trp Leu Cys Ala Leu Gly Lys Val Leu Pro Phe His Arg
            20                  25                  30
Trp His Thr Met Val Gln Thr Cys Thr Pro Asn Val Thr Ile Asn Cys
        35                  40                  45
Gln Asp Pro Ala Gly Gly Ala Leu Ile Ala Arg Cys Trp Tyr Leu His
    50                  55                  60
Glu Gly His Gln Thr Ala Ala Phe Arg Asp Val Leu Val Val Leu Asn
65                  70                  75                  80
Lys Arg Thr Asn

SEQ ID No: 31
Length: 21221
Type: DNA
Organism: CORONAVIRUS
Sequence 31:
atggagagcc ttgttcttgg tgtcaacgag aaaacacacg
tccaactcag tttgcctgtc 60cttcaggtta gagacgtgct agtgcgtggc ttcggggact
ctgtggaaga ggccctatcg 120gaggcacgtg aacacctcaa aaatggcact tgtggtctag
tagagctgga aaaaggcgta 180ctgccccagc ttgaacagcc ctatgtgttc attaaacgtt
ctgatgcctt aagcaccaat 240cacggccaca aggtcgttga gctggttgca gaaatggacg
gcattcagta cggtcgtagc 300ggtataacac tgggagtact cgtgccacat gtgggcgaaa
ccccaattgc ataccgcaat 360gttcttcttc gtaagaacgg taataaggga gccggtggtc
atagctatgg catcgatcta 420aagtcttatg acttaggtga cgagcttggc actgatccca
ttgaagatta tgaacaaaac 480tggaacacta agcatggcag tggtgcactc cgtgaactca
ctcgtgagct caatggaggt 540gcagtcactc gctatgtcga caacaatttc tgtggcccag
atgggtaccc tcttgattgc 600atcaaagatt ttctcgcacg cgcgggcaag tcaatgtgca
ctctttccga acaacttgat 660tacatcgagt cgaagagagg tgtctactgc tgccgtgacc
atgagcatga aattgcctgg 720ttcactgagc gctctgataa gagctacgag caccagacac
ccttcgaaat taagagtgcc 780aagaaatttg acactttcaa aggggaatgc ccaaagtttg
tgtttcctct taactcaaaa 840gtcaaagtca ttcaaccacg tgttgaaaag aaaaagactg
agggtttcat ggggcgtata 900cgctctgtgt accctgttgc atctccacag gagtgtaaca
atatgcactt gtctaccttg 960atgaaatgta atcattgcga tgaagtttca tggcagacgt
gcgactttct gaaagccact 1020tgtgaacatt gtggcactga aaatttagtt attgaaggac
ctactacatg tgggtaccta 1080cctactaatg ctgtagtgaa aatgccatgt cctgcctgtc
aagacccaga gattggacct 1140gagcatagtg ttgcagatta tcacaaccac tcaaacattg
aaactcgact ccgcaaggga 1200ggtaggacta gatgttttgg aggctgtgtg tttgcctatg
ttggctgcta taataagcgt 1260gcctactggg ttcctcgtgc tagtgctgat attggctcag
gccatactgg cattactggt 1320gacaatgtgg agaccttgaa tgaggatctc cttgagatac
tgagtcgtga acgtgttaac 1380attaacattg ttggcgattt tcatttgaat gaagaggttg
ccatcatttt ggcatctttc 1440tctgcttcta caagtgcctt tattgacact ataaagagtc
ttgattacaa gtctttcaaa 1500accattgttg agtcctgcgg taactataaa gttaccaagg
gaaagcccgt aaaaggtgct 1560tggaacattg gacaacagag atcagtttta acaccactgt
gtggttttcc ctcacaggct 1620gctggtgtta tcagatcaat ttttgcgcgc acacttgatg
cagcaaacca ctcaattcct 1680gatttgcaaa gagcagctgt caccatactt gatggtattt
ctgaacagtc attacgtctt 1740gtcgacgcca tggtttatac ttcagacctg ctcaccaaca
gtgtcattat tatggcatat 1800gtaactggtg gtcttgtaca acagacttct cagtggttgt
ctaatctttt gggcactact 1860gttgaaaaac tcaggcctat ctttgaatgg attgaggcga
aacttagtgc aggagttgaa 1920tttctcaagg atgcttggga gattctcaaa tttctcatta
caggtgtttt tgacatcgtc 1980aagggtcaaa tacaggttgc ttcagataac atcaaggatt
gtgtaaaatg cttcattgat 2040gttgttaaca aggcactcga aatgtgcatt gatcaagtca
ctatcgctgg cgcaaagttg 2100cgatcactca acttaggtga agtcttcatc gctcaaagca
agggacttta ccgtcagtgt 2160atacgtggca aggagcagct gcaactactc atgcctctta
aggcaccaaa agaagtaacc 2220tttcttgaag gtgattcaca tgacacagta cttacctctg
aggaggttgt tctcaagaac 2280ggtgaactcg aagcactcga gacgcccgtt gatagcttca
caaatggagc tatcgttggc 2340acaccagtct gtgtaaatgg cctcatgctc ttagagatta
aggacaaaga acaatactgc 2400gcattgtctc ctggtttact ggctacaaac aatgtctttc
gcttaaaagg gggtgcacca 2460attaaaggtg taacctttgg agaagatact gtttgggaag
ttcaaggtta caagaatgtg 2520agaatcacat ttgagcttga tgaacgtgtt gacaaagtgc
ttaatgaaaa gtgctctgtc 2580tacactgttg aatccggtac cgaagttact gagtttgcat
gtgttgtagc agaggctgtt 2640gtgaagactt tacaaccagt ttctgatctc cttaccaaca
tgggtattga tcttgatgag 2700tggagtgtag ctacattcta cttatttgat gatgctggtg
aagaaaactt ttcatcacgt 2760atgtattgtt ccttttaccc tccagatgag gaagaagagg
acgatgcaga gtgtgaggaa 2820gaagaaattg atgaaacctg tgaacatgag tacggtacag
aggatgatta tcaaggtctc 2880cctctggaat ttggtgcctc agctgaaaca gttcgagttg
aggaagaaga agaggaagac 2940tggctggatg atactactga gcaatcagag attgagccag
aaccagaacc tacacctgaa 3000gaaccagtta atcagtttac tggttattta aaacttactg
acaatgttgc cattaaatgt 3060gttgacatcg ttaaggaggc acaaagtgct aatcctatgg
tgattgtaaa tgctgctaac 3120atacacctga aacatggtgg tggtgtagca ggtgcactca
acaaggcaac caatggtgcc 3180atgcaaaagg agagtgatga ttacattaag ctaaatggcc
ctcttacagt aggagggtct 3240tgtttgcttt ctggacataa tcttgctaag aagtgtctgc
atgttgttgg acctaaccta 3300aatgcaggtg aggacatcca gcttcttaag gcagcatatg
aaaatttcaa ttcacaggac 3360atcttacttg caccattgtt gtcagcaggc atatttggtg
ctaaaccact tcagtcttta 3420caagtgtgcg tgcagacggt tcgtacacag gtttatattg
cagtcaatga caaagctctt 3480tatgagcagg ttgtcatgga ttatcttgat aacctgaagc
ctagagtgga agcacctaaa 3540caagaggagc caccaaacac agaagattcc aaaactgagg
agaaatctgt cgtacagaag 3600cctgtcgatg tgaagccaaa aattaaggcc tgcattgatg
aggttaccac aacactggaa 3660gaaactaagt ttcttaccaa taagttactc ttgtttgctg
atatcaatgg taagctttac 3720catgattctc agaacatgct tagaggtgaa gatatgtctt
tccttgagaa ggatgcacct 3780tacatggtag gtgatgttat cactagtggt gatatcactt
gtgttgtaat accctccaaa 3840aaggctggtg gcactactga gatgctctca agagctttga
agaaagtgcc agttgatgag 3900tatataacca cgtaccctgg acaaggatgt gctggttata
cacttgagga agctaagact 3960gctcttaaga aatgcaaatc tgcattttat gtactacctt
cagaagcacc taatgctaag 4020gaagagattc taggaactgt atcctggaat ttgagagaaa
tgcttgctca tgctgaagag 4080acaagaaaat taatgcctat atgcatggat gttagagcca
taatggcaac catccaacgt 4140aagtataaag gaattaaaat tcaagagggc atcgttgact
atggtgtccg attcttcttt 4200tatactagta aagagcctgt agcttctatt attacgaagc
tgaactctct aaatgagccg 4260cttgtcacaa tgccaattgg ttatgtgaca catggtttta
atcttgaaga ggctgcgcgc 4320tgtatgcgtt ctcttaaagc tcctgccgta gtgtcagtat
catcaccaga tgctgttact 4380acatataatg gatacctcac ttcgtcatca aagacatctg
aggagcactt tgtagaaaca 4440gtttctttgg ctggctctta cagagattgg tcctattcag
gacagcgtac agagttaggt 4500gttgaatttc ttaagcgtgg tgacaaaatt gtgtaccaca
ctctggagag ccccgtcgag 4560tttcatcttg acggtgaggt tctttcactt gacaaactaa
agagtctctt atccctgcgg 4620gaggttaaga ctataaaagt gttcacaact gtggacaaca
ctaatctcca cacacagctt 4680gtggatatgt ctatgacata tggacagcag tttggtccaa
catacttgga tggtgctgat 4740gttacaaaaa ttaaacctca tgtaaatcat gagggtaaga
ctttctttgt actacctagt 4800gatgacacac tacgtagtga agctttcgag tactaccata
ctcttgatga gagttttctt 4860ggtaggtaca tgtctgcttt aaaccacaca aagaaatgga
aatttcctca agttggtggt 4920ttaacttcaa ttaaatgggc tgataacaat tgttatttgt
ctagtgtttt attagcactt 4980caacagcttg aagtcaaatt caatgcacca gcacttcaag
aggcttatta tagagcccgt 5040gctggtgatg ctgctaactt ttgtgcactc atactcgctt
acagtaataa aactgttggc 5100gagcttggtg atgtcagaga aactatgacc catcttctac
agcatgctaa tttggaatct 5160gcaaagcgag ttcttaatgt ggtgtgtaaa cattgtggtc
agaaaactac taccttaacg 5220ggtgtagaag ctgtgatgta tatgggtact ctatcttatg
ataatcttaa gacaggtgtt 5280tccattccat gtgtgtgtgg tcgtgatgct acacaatatc
tagtacaaca agagtcttct 5340tttgttatga tgtctgcacc acctgctgag tataaattac
agcaaggtac attcttatgt 5400gcgaatgagt acactggtaa ctatcagtgt ggtcattaca
ctcatataac tgctaaggag 5460accctctatc gtattgacgg agctcacctt acaaagatgt
cagagtacaa aggaccagtg 5520actgatgttt tctacaagga aacatcttac actacaacca
tcaagcctgt gtcgtataaa 5580ctcgatggag ttacttacac agagattgaa ccaaaattgg
atgggtatta taaaaaggat 5640aatgcttact atacagagca gcctatagac cttgtaccaa
ctcaaccatt accaaatgcg 5700agttttgata atttcaaact cacatgttct aacacaaaat
ttgctgatga tttaaatcaa 5760atgacaggct tcacaaagcc agcttcacga gagctatctg
tcacattctt cccagacttg 5820aatggcgatg tagtggctat tgactataga cactattcag
cgagtttcaa gaaaggtgct 5880aaattactgc ataagccaat tgtttggcac attaaccagg
ctacaaccaa gacaacgttc 5940aaaccaaaca cttggtgttt acgttgtctt tggagtacaa
agccagtaga tacttcaaat 6000tcatttgaag ttctggcagt agaagacaca caaggaatgg
acaatcttgc ttgtgaaagt 6060caacaaccca cctctgaaga agtagtggaa aatcctacca
tacagaagga agtcatagag 6120tgtgacgtga aaactaccga agttgtaggc aatgtcatac
ttaaaccatc agatgaaggt 6180gttaaagtaa cacaagagtt aggtcatgag gatcttatgg
ctgcttatgt ggaaaacaca 6240agcattacca ttaagaaacc taatgagctt tcactagcct
taggtttaaa aacaattgcc 6300actcatggta ttgctgcaat taatagtgtt ccttggagta
aaattttggc ttatgtcaaa 6360ccattcttag gacaagcagc aattacaaca tcaaattgcg
ctaagagatt agcacaacgt 6420gtgtttaaca attatatgcc ttatgtgttt acattattgt
tccaattgtg tacttttact 6480aaaagtacca attctagaat tagagcttca ctacctacaa
ctattgctaa aaatagtgtt 6540aagagtgttg ctaaattatg tttggatgcc ggcattaatt
atgtgaagtc acccaaattt 6600tctaaattgt tcacaatcgc tatgtggcta ttgttgttaa
gtatttgctt aggttctcta 6660atctgtgtaa ctgctgcttt tggtgtactc ttatctaatt
ttggtgctcc ttcttattgt 6720aatggcgtta gagaattgta tcttaattcg tctaacgtta
ctactatgga tttctgtgaa 6780ggttcttttc cttgcagcat ttgtttaagt ggattagact
cccttgattc ttatccagct 6840cttgaaacca ttcaggtgac gatttcatcg tacaagctag
acttgacaat tttaggtctg 6900gccgctgagt gggttttggc atatatgttg ttcacaaaat
tcttttattt attaggtctt 6960tcagctataa tgcaggtgtt ctttggctat tttgctagtc
atttcatcag caattcttgg 7020ctcatgtggt ttatcattag tattgtacaa atggcacccg
tttctgcaat ggttaggatg 7080tacatcttct ttgcttcttt ctactacata tggaagagct
atgttcatat catggatggt 7140tgcacctctt cgacttgcat gatgtgctat aagcgcaatc
gtgccacacg cgttgagtgt 7200acaactattg ttaatggcat gaagagatct ttctatgtct
atgcaaatgg aggccgtggc 7260ttctgcaaga ctcacaattg gaattgtctc aattgtgaca
cattttgcac tggtagtaca 7320ttcattagtg atgaagttgc tcgtgatttg tcactccagt
ttaaaagacc aatcaaccct 7380actgaccagt catcgtatat tgttgatagt gttgctgtga
aaaatggcgc gcttcacctc 7440tactttgaca aggctggtca aaagacctat gagagacatc
cgctctccca ttttgtcaat 7500ttagacaatt tgagagctaa caacactaaa ggttcactgc
ctattaatgt catagttttt 7560gatggcaagt ccaaatgcga cgagtctgct tctaagtctg
cttctgtgta ctacagtcag 7620ctgatgtgcc aacctattct gttgcttgac caagctcttg
tatcagacgt tggagatagt 7680actgaagttt ccgttaagat gtttgatgct tatgtcgaca
ccttttcagc aacttttagt 7740gttcctatgg aaaaacttaa ggcacttgtt gctacagctc
acagcgagtt agcaaagggt 7800gtagctttag atggtgtcct ttctacattc gtgtcagctg
cccgacaagg tgttgttgat 7860accgatgttg acacaaagga tgttattgaa tgtctcaaac
tttcacatca ctctgactta 7920gaagtgacag gtgacagttg taacaatttc atgctcacct
ataataaggt tgaaaacatg 7980acgcccagag atcttggcgc atgtattgac tgtaatgcaa
ggcatatcaa tgcccaagta 8040gcaaaaagtc acaatgtttc actcatctgg aatgtaaaag
actacatgtc tttatctgaa 8100cagctgcgta aacaaattcg tagtgctgcc aagaagaaca
acataccttt tagactaact 8160tgtgctacaa ctagacaggt tgtcaatgtc ataactacta
aaatctcact caagggtggt 8220aagattgtta gtacttgttt taaacttatg cttaaggcca
cattattgtg cgttcttgct 8280gcattggttt gttatatcgt tatgccagta catacattgt
caatccatga tggttacaca 8340aatgaaatca ttggttacaa agccattcag gatggtgtca
ctcgtgacat catttctact 8400gatgattgtt ttgcaaataa acatgctggt tttgacgcat
ggtttagcca gcgtggtggt 8460tcatacaaaa atgacaaaag ctgccctgta gtagctgcta
tcattacaag agagattggt 8520ttcatagtgc ctggcttacc gggtactgtg ctgagagcaa
tcaatggtga cttcttgcat 8580tttctacctc gtgtttttag tgctgttggc aacatttgct
acacaccttc caaactcatt 8640gagtatagtg attttgctac ctctgcttgc gttcttgctg
ctgagtgtac aatttttaag 8700gatgctatgg gcaaacctgt gccatattgt tatgacacta
atttgctaga gggttctatt 8760tcttatagtg agcttcgtcc agacactcgt tatgtgctta
tggatggttc catcatacag 8820tttcctaaca cttacctgga gggttctgtt agagtagtaa
caacttttga tgctgagtac 8880tgtagacatg gtacatgcga aaggtcagaa gtaggtattt
gcctatctac cagtggtaga 8940tgggttctta ataatgagca ttacagagct ctatcaggag
ttttctgtgg tgttgatgcg 9000atgaatctca tagctaacat ctttactcct cttgtgcaac
ctgtgggtgc tttagatgtg 9060tctgcttcag tagtggctgg tggtattatt gccatattgg
tgacttgtgc tgcctactac 9120tttatgaaat tcagacgtgt ttttggtgag tacaaccatg
ttgttgctgc taatgcactt 9180ttgtttttga tgtctttcac tatactctgt ctggtaccag
cttacagctt tctgccggga 9240gtctactcag tcttttactt gtacttgaca ttctatttca
ccaatgatgt ttcattcttg 9300gctcaccttc aatggtttgc catgttttct cctattgtgc
ctttttggat aacagcaatc 9360tatgtattct gtatttctct gaagcactgc cattggttct
ttaacaacta tcttaggaaa 9420agagtcatgt ttaatggagt tacatttagt accttcgagg
aggctgcttt gtgtaccttt 9480ttgctcaaca aggaaatgta cctaaaattg cgtagcgaga
cactgttgcc acttacacag 9540tataacaggt atcttgctct atataacaag tacaagtatt
tcagtggagc cttagatact 9600accagctatc gtgaagcagc ttgctgccac ttagcaaagg
ctctaaatga ctttagcaac 9660tcaggtgctg atgttctcta ccaaccacca cagacatcaa
tcacttctgc tgttctgcag 9720agtggtttta ggaaaatggc attcccgtca ggcaaagttg
aagggtgcat ggtacaagta 9780acctgtggaa ctacaactct taatggattg tggttggatg
acacagtata ctgtccaaga 9840catgtcattt gcacagcaga agacatgctt aatcctaact
atgaagatct gctcattcgc 9900aaatccaacc atagctttct tgttcaggct ggcaatgttc
aacttcgtgt tattggccat 9960tctatgcaaa attgtctgct taggcttaaa gttgatactt
ctaaccctaa gacacccaag 10020tataaatttg tccgtatcca acctggtcaa acattttcag
ttctagcatg ctacaatggt 10080tcaccatctg gtgtttatca gtgtgccatg agacctaatc
ataccattaa aggttctttc 10140cttaatggat catgtggtag tgttggtttt aacattgatt
atgattgcgt gtctttctgc 10200tatatgcatc atatggagct tccaacagga gtacacgctg
gtactgactt agaaggtaaa 10260ttctatggtc catttgttga cagacaaact gcacaggctg
caggtacaga cacaaccata 10320acattaaatg ttttggcatg gctgtatgct gctgttatca
atggtgatag gtggtttctt 10380aatagattca ccactacttt gaatgacttt aaccttgtgg
caatgaagta caactatgaa 10440cctttgacac aagatcatgt tgacatattg ggacctcttt
ctgctcaaac aggaattgcc 10500gtcttagata tgtgtgctgc tttgaaagag ctgctgcaga
atggtatgaa tggtcgtact 10560atccttggta gcactatttt agaagatgag tttacaccat
ttgatgttgt tagacaatgc 10620tctggtgtta ccttccaagg taagttcaag aaaattgtta
agggcactca tcattggatg 10680cttttaactt tcttgacatc actattgatt cttgttcaaa
gtacacagtg gtcactgttt 10740ttctttgttt acgagaatgc tttcttgcca tttactcttg
gtattatggc aattgctgca 10800tgtgctatgc tgcttgttaa gcataagcac gcattcttgt
gcttgtttct gttaccttct 10860cttgcaacag ttgcttactt taatatggtc tacatgcctg
ctagctgggt gatgcgtatc 10920atgacatggc ttgaattggc tgacactagc ttgtctggtt
ataggcttaa ggattgtgtt 10980atgtatgctt cagctttagt tttgcttatt ctcatgacag
ctcgcactgt ttatgatgat 11040gctgctagac gtgtttggac actgatgaat gtcattacac
ttgtttacaa agtctactat 11100ggtaatgctt tagatcaagc tatttccatg tgggccttag
ttatttctgt aacctctaac 11160tattctggtg tcgttacgac tatcatgttt ttagctagag
ctatagtgtt tgtgtgtgtt 11220gagtattacc cattgttatt tattactggc aacaccttac
agtgtatcat gcttgtttat 11280tgtttcttag gctattgttg ctgctgctac tttggccttt
tctgtttact caaccgttac 11340ttcaggctta ctcttggtgt ttatgactac ttggtctcta
cacaagaatt taggtatatg 11400aactcccagg ggcttttgcc tcctaagagt agtattgatg
ctttcaagct taacattaag 11460ttgttgggta ttggaggtaa accatgtatc aaggttgcta
ctgtacagtc taaaatgtct 11520gacgtaaagt gcacatctgt ggtactgctc tcggttcttc
aacaacttag agtagagtca 11580tcttctaaat tgtgggcaca atgtgtacaa ctccacaatg
atattcttct tgcaaaagac 11640acaactgaag ctttcgagaa gatggtttct cttttgtctg
ttttgctatc catgcagggt 11700gctgtagaca ttaataggtt gtgcgaggaa atgctcgata
accgtgctac tcttcaggct 11760attgcttcag aatttagttc tttaccatca tatgccgctt
atgccactgc ccaggaggcc 11820tatgagcagg ctgtagctaa tggtgattct gaagtcgttc
tcaaaaagtt aaagaaatct 11880ttgaatgtgg ctaaatctga gtttgaccgt gatgctgcca
tgcaacgcaa gttggaaaag 11940atggcagatc aggctatgac ccaaatgtac aaacaggcaa
gatctgagga caagagggca 12000aaagtaacta gtgctatgca aacaatgctc ttcactatgc
ttaggaagct tgataatgat 12060gcacttaaca acattatcaa caatgcgcgt gatggttgtg
ttccactcaa catcatacca 12120ttgactacag cagccaaact catggttgtt gtccctgatt
atggtaccta caagaacact 12180tgtgatggta acacctttac atatgcatct gcactctggg
aaatccagca agttgttgat 12240gcggatagca agattgttca acttagtgaa attaacatgg
acaattcacc aaatttggct 12300tggcctctta ttgttacagc tctaagagcc aactcagctg
ttaaactaca gaataatgaa 12360ctgagtccag tagcactacg acagatgtcc tgtgcggctg
gtaccacaca aacagcttgt 12420actgatgaca atgcacttgc ctactataac aattcgaagg
gaggtaggtt tgtgctggca 12480ttactatcag accaccaaga tctcaaatgg gctagattcc
ctaagagtga tggtacaggt 12540acaatttaca cagaactgga accaccttgt aggtttgtta
cagacacacc aaaagggcct 12600aaagtgaaat acttgtactt catcaaaggc ttaaacaacc
taaatagagg tatggtgctg 12660ggcagtttag ctgctacagt acgtcttcag gctggaaatg
ctacagaagt acctgccaat 12720tcaactgtgc tttccttctg tgcttttgca gtagaccctg
ctaaagcata taaggattac 12780ctagcaagtg gaggacaacc aatcaccaac tgtgtgaaga
tgttgtgtac acacactggt 12840acaggacagg caattactgt aacaccagaa gctaacatgg
accaagagtc ctttggtggt 12900gcttcatgtt gtctgtattg tagatgccac attgaccatc
caaatcctaa aggattctgt 12960gacttgaaag gtaagtacgt ccaaatacct accacttgtg
ctaatgaccc agtgggtttt 13020acacttagaa acacagtctg taccgtctgc ggaatgtgga
aaggttatgg ctgtagttgt 13080gaccaactcc gcgaaccctt gatgcagtct gcggatgcat
caacgttttt aaacgggttt 13140gcggtgtaag tgcagcccgt cttacaccgt gcggcacagg
cactagtact gatgtcgtct 13200acagggcttt tgatatttac aacgaaaaag ttgctggttt
tgcaaagttc ctaaaaacta 13260attgctgtcg cttccaggag aaggatgagg aaggcaattt
attagactct tactttgtag 13320ttaagaggca tactatgtct aactaccaac atgaagagac
tatttataac ttggttaaag 13380attgtccagc ggttgctgtc catgactttt tcaagtttag
agtagatggt gacatggtac 13440cacatatatc acgtcagcgt ctaactaaat acacaatggc
tgatttagtc tatgctctac 13500gtcattttga tgagggtaat tgtgatacat taaaagaaat
actcgtcaca tacaattgct 13560gtgatgatga ttatttcaat aagaaggatt ggtatgactt
cgtagagaat cctgacatct 13620tacgcgtata tgctaactta ggtgagcgtg tacgccaatc
attattaaag actgtacaat 13680tctgcgatgc tatgcgtgat gcaggcattg taggcgtact
gacattagat aatcaggatc 13740ttaatgggaa ctggtacgat ttcggtgatt tcgtacaagt
agcaccaggc tgcggagttc 13800ctattgtgga ttcatattac tcattgctga tgcccatcct
cactttgact agggcattgg 13860ctgctgagtc ccatatggat gctgatctcg caaaaccact
tattaagtgg gatttgctga 13920aatatgattt tacggaagag agactttgtc tcttcgaccg
ttattttaaa tattgggacc 13980agacatacca tcccaattgt attaactgtt tggatgatag
gtgtatcctt cattgtgcaa 14040actttaatgt gttattttct actgtgtttc cacctacaag
ttttggacca ctagtaagaa 14100aaatatttgt agatggtgtt ccttttgttg tttcaactgg
ataccatttt cgtgagttag 14160gagtcgtaca taatcaggat gtaaacttac atagctcgcg
tctcagtttc aaggaacttt 14220tagtgtatgc tgctgatcca gctatgcatg cagcttctgg
caatttattg ctagataaac 14280gcactacatg cttttcagta gctgcactaa caaacaatgt
tgcttttcaa actgtcaaac 14340ccggtaattt taataaagac ttttatgact ttgctgtgtc
taaaggtttc tttaaggaag 14400gaagttctgt tgaactaaaa cacttcttct ttgctcagga
tggcaacgct gctatcagtg 14460attatgacta ttatcgttat aatctgccaa caatgtgtga
tatcagacaa ctcctattcg 14520tagttgaagt tgttgataaa tactttgatt gttacgatgg
tggctgtatt aatgccaacc 14580aagtaatcgt taacaatctg gataaatcag ctggtttccc
atttaataaa tggggtaagg 14640ctagacttta ttatgactca atgagttatg aggatcaaga
tgcacttttc gcgtatacta 14700agcgtaatgt catccctact ataactcaaa tgaatcttaa
gtatgccatt agtgcaaaga 14760atagagctcg caccgtagct ggtgtctcta tctgtagtac
tatgacaaat agacagtttc 14820atcagaaatt attgaagtca atagccgcca ctagaggagc
tactgtggta attggaacaa 14880gcaagtttta cggtggctgg cataatatgt taaaaactgt
ttacagtgat gtagaaactc 14940cacaccttat gggttgggat tatccaaaat gtgacagagc
catgcctaac atgcttagga 15000taatggcctc tcttgttctt gctcgcaaac ataacacttg
ctgtaactta tcacaccgtt 15060tctacaggtt agctaacgag tgtgcgcaag tattaagtga
gatggtcatg tgtggcggct 15120cactatatgt taaaccaggt ggaacatcat ccggtgatgc
tacaactgct tatgctaata 15180gtgtctttaa catttgtcaa gctgttacag ccaatgtaaa
tgcacttctt tcaactgatg 15240gtaataagat agctgacaag tatgtccgca atctacaaca
caggctctat gagtgtctct 15300atagaaatag ggatgttgat catgaattcg tggatgagtt
ttacgcttac ctgcgtaaac 15360atttctccat gatgattctt tctgatgatg ccgttgtgtg
ctataacagt aactatgcgg 15420ctcaaggttt agtagctagc attaagaact ttaaggcagt
tctttattat caaaataatg 15480tgttcatgtc tgaggcaaaa tgttggactg agactgacct
tactaaagga cctcacgaat 15540tttgctcaca gcatacaatg ctagttaaac aaggagatga
ttacgtgtac ctgccttacc 15600cagatccatc aagaatatta ggcgcaggct gttttgtcga
tgatattgtc aaaacagatg 15660gtacacttat gattgaaagg ttcgtgtcac tggctattga
tgcttaccca cttacaaaac 15720atcctaatca ggagtatgct gatgtctttc acttgtattt
acaatacatt agaaagttac 15780atgatgagct tactggccac atgttggaca tgtattccgt
aatgctaact aatgataaca 15840cctcacggta ctgggaacct gagttttatg aggctatgta
cacaccacat acagtcttgc 15900aggctgtagg tgcttgtgta ttgtgcaatt cacagacttc
acttcgttgc ggtgcctgta 15960ttaggagacc attcctatgt tgcaagtgct gctatgacca
tgtcatttca acatcacaca 16020aattagtgtt gtctgttaat ccctatgttt gcaatgcccc
aggttgtgat gtcactgatg 16080tgacacaact gtatctagga ggtatgagct attattgcaa
gtcacataag cctcccatta 16140gttttccatt atgtgctaat ggtcaggttt ttggtttata
caaaaacaca tgtgtaggca 16200gtgacaatgt cactgacttc aatgcgatag caacatgtga
ttggactaat gctggcgatt 16260acatacttgc caacacttgt actgagagac tcaagctttt
cgcagcagaa acgctcaaag 16320ccactgagga aacatttaag ctgtcatatg gtattgccac
tgtacgcgaa gtactctctg 16380acagagaatt gcatctttca tgggaggttg gaaaacctag
accaccattg aacagaaact 16440atgtctttac tggttaccgt gtaactaaaa atagtaaagt
acagattgga gagtacacct 16500ttgaaaaagg tgactatggt gatgctgttg tgtacagagg
tactacgaca tacaagttga 16560atgttggtga ttactttgtg ttgacatctc acactgtaat
gccacttagt gcacctactc 16620tagtgccaca agagcactat gtgagaatta ctggcttgta
cccaacactc aacatctcag 16680atgagttttc tagcaatgtt gcaaattatc aaaaggtcgg
catgcaaaag tactctacac 16740tccaaggacc acctggtact ggtaagagtc attttgccat
cggacttgct ctctattacc 16800catctgctcg catagtgtat acggcatgct ctcatgcagc
tgttgatgcc ctatgtgaaa 16860aggcattaaa atatttgccc atagataaat gtagtagaat
catacctgcg cgtgcgcgcg 16920tagagtgttt tgataaattc aaagtgaatt caacactaga
acagtatgtt ttctgcactg 16980taaatgcatt gccagaaaca actgctgaca ttgtagtctt
tgatgaaatc tctatggcta 17040ctaattatga cttgagtgtt gtcaatgcta gacttcgtgc
aaaacactac gtctatattg 17100gcgatcctgc tcaattacca gccccccgca cattgctgac
taaaggcaca ctagaaccag 17160aatattttaa ttcagtgtgc agacttatga aaacaatagg
tccagacatg ttccttggaa 17220cttgtcgccg ttgtcctgct gaaattgttg acactgtgag
tgctttagtt tatgacaata 17280agctaaaagc acacaaggat aagtcagctc aatgcttcaa
aatgttctac aaaggtgtta 17340ttacacatga tgtttcatct gcaatcaaca gacctcaaat
aggcgttgta agagaatttc 17400ttacacgcaa tcctgcttgg agaaaagctg tttttatctc
accttataat tcacagaacg 17460ctgtagcttc aaaaatctta ggattgccta cgcagactgt
tgattcatca cagggttctg 17520aatatgacta tgtcatattc acacaaacta ctgaaacagc
acactcttgt aatgtcaacc 17580gcttcaatgt ggctatcaca agggcaaaaa ttggcatttt
gtgcataatg tctgatagag 17640atctttatga caaactgcaa tttacaagtc tagaaatacc
acgtcgcaat gtggctacat 17700tacaagcaga aaatgtaact ggacttttta aggactgtag
taagatcatt actggtcttc 17760atcctacaca ggcacctaca cacctcagcg ttgatataaa
gttcaagact gaaggattat 17820gtgttgacat accaggcata ccaaaggaca tgacctaccg
tagactcatc tctatgatgg 17880gtttcaaaat gaattaccaa gtcaatggtt accctaatat
gtttatcacc cgcgaagaag 17940ctattcgtca cgttcgtgcg tggattggct ttgatgtaga
gggctgtcat gcaactagag 18000atgctgtggg tactaaccta cctctccagc taggattttc
tacaggtgtt aacttagtag 18060ctgtaccgac tggttatgtt gacactgaaa ataacacaga
attcaccaga gttaatgcaa 18120aacctccacc aggtgaccag tttaaacatc ttataccact
catgtataaa ggcttgccct 18180ggaatgtagt gcgtattaag atagtacaaa tgctcagtga
tacactgaaa ggattgtcag 18240acagagtcgt gttcgtcctt tgggcgcatg gctttgagct
tacatcaatg aagtactttg 18300tcaagattgg acctgaaaga acgtgttgtc tgtgtgacaa
acgtgcaact tgcttttcta 18360cttcatcaga tacttatgcc tgctggaatc attctgtggg
ttttgactat gtctataacc 18420catttatgat tgatgttcag cagtggggct ttacgggtaa
ccttcagagt aaccatgacc 18480aacattgcca ggtacatgga aatgcacatg tggctagttg
tgatgctatc atgactagat 18540gtttagcagt ccatgagtgc tttgttaagc gcgttgattg
gtctgttgaa taccctatta 18600taggagatga actgagggtt aattctgctt gcagaaaagt
acaacacatg gttgtgaagt 18660ctgcattgct tgctgataag tttccagttc ttcatgacat
tggaaatcca aaggctatca 18720agtgtgtgcc tcaggctgaa gtagaatgga agttctacga
tgctcagcca tgtagtgaca 18780aagcttacaa aatagaggaa ctcttctatt cttatgctac
acatcacgat aaattcactg 18840atggtgtttg tttgttttgg aattgtaacg ttgatcgtta
cccagccaat gcaattgtgt 18900gtaggtttga cacaagagtc ttgtcaaact tgaacttacc
aggctgtgat ggtggtagtt 18960tgtatgtgaa taagcatgca ttccacactc cagctttcga
taaaagtgca tttactaatt 19020taaagcaatt gcctttcttt tactattctg atagtccttg
tgagtctcat ggcaaacaag 19080tagtgtcgga tattgattat gttccactca aatctgctac
gtgtattaca cgatgcaatt 19140taggtggtgc tgtttgcaga caccatgcaa atgagtaccg
acagtacttg gatgcatata 19200atatgatgat ttctgctgga tttagcctat ggatttacaa
acaatttgat acttataacc 19260tgtggaatac atttaccagg ttacagagtt tagaaaatgt
ggcttataat gttgttaata 19320aaggacactt tgatggacac gccggcgaag cacctgtttc
catcattaat aatgctgttt 19380acacaaaggt agatggtatt gatgtggaga tctttgaaaa
taagacaaca cttcctgtta 19440atgttgcatt tgagctttgg gctaagcgta acattaaacc
agtgccagag attaagatac 19500tcaataattt gggtgttgat atcgctgcta atactgtaat
ctgggactac aaaagagaag 19560ccccagcaca tgtatctaca ataggtgtct gcacaatgac
tgacattgcc aagaaaccta 19620ctgagagtgc ttgttcttca cttactgtct tgtttgatgg
tagagtggaa ggacaggtag 19680acctttttag aaacgcccgt aatggtgttt taataacaga
aggttcagtc aaaggtctaa 19740caccttcaaa gggaccagca caagctagcg tcaatggagt
cacattaatt ggagaatcag 19800taaaaacaca gtttaactac tttaagaaag tagacggcat
tattcaacag ttgcctgaaa 19860cctactttac tcagagcaga gacttagagg attttaagcc
cagatcacaa atggaaactg 19920actttctcga gctcgctatg gatgaattca tacagcgata
taagctcgag ggctatgcct 19980tcgaacacat cgtttatgga gatttcagtc atggacaact
tggcggtctt catttaatga 20040taggcttagc caagcgctca caagattcac cacttaaatt
agaggatttt atccctatgg 20100acagcacagt gaaaaattac ttcataacag atgcgcaaac
aggttcatca aaatgtgtgt 20160gttctgtgat tgatctttta cttgatgact ttgtcgagat
aataaagtca caagatttgt 20220cagtgatttc aaaagtggtc aaggttacaa ttgactatgc
tgaaatttca ttcatgcttt 20280ggtgtaagga tggacatgtt gaaaccttct acccaaaact
acaagcaagt caagcgtggc 20340aaccaggtgt tgcgatgcct aacttgtaca agatgcaaag
aatgcttctt gaaaagtgtg 20400accttcagaa ttatggtgaa aatgctgtta taccaaaagg
aataatgatg aatgtcgcaa 20460agtatactca actgtgtcaa tacttaaata cacttacttt
agctgtaccc tacaacatga 20520gagttattca ctttggtgct ggctctgata aaggagttgc
accaggtaca gctgtgctca 20580gacaatggtt gccaactggc acactacttg tcgattcaga
tcttaatgac ttcgtctccg 20640acgcagattc tactttaatt ggagactgtg caacagtaca
tacggctaat aaatgggacc 20700ttattattag cgatatgtat gaccctagga ccaaacatgt
gacaaaagag aatgactcta 20760aagaagggtt tttcacttat ctgtgtggat ttataaagca
aaaactagcc ctgggtggtt 20820ctatagctgt aaagataaca gagcattctt ggaatgctga
cctttacaag cttatgggcc 20880atttctcatg gtggacagct tttgttacaa atgtaaatgc
atcatcatcg gaagcatttt 20940taattggggc taactatctt ggcaagccga aggaacaaat
tgatggctat accatgcatg 21000ctaactacat tttctggagg aacacaaatc ctatccagtt
gtcttcctat tcactctttg 21060acatgagcaa atttcctctt aaattaagag gaactgctgt
aatgtctctt aaggagaatc 21120aaatcaatga tatgatttat tctcttctgg aaaaaggtag
gcttatcatt agagaaaaca 21180acagagttgt ggtttcaagt gatattcttg ttaacaacta a
21221

SEQ ID No: 32
Length: 297
Type: DNA
Organism: CORONAVIRUS
Sequence 32:
atggacccca atcaaaccaa cgtagtgccc cccgcattac atttggtgga cccacagatt 60
caactgacaa taaccagaat ggaggacgca atggggcaag gccaaaacag cgccgacccc 120
aaggtttacc caataatact gcgtcttggt tcacagctct cactcagcat ggcaaggagg 180
aacttagatt ccctcgaggc cagggcgttc caatcaacac caatagtggt ccagatgacc 240
aaattggcta ctaccgaaga gctacccgac gagttcgtgg tggtgacggc aaaatga 297

SEQ ID No: 33
Length: 98
Type: PRT
Organism: CORONAVIRUS
Sequence 33:
Met Asp Pro Asn Gln Thr Asn Val Val Pro Pro Ala Leu His Leu Val
1               5                   10                  15
Asp Pro Gln Ile Gln Leu Thr Ile Thr Arg Met Glu Asp Ala Met Gly
            20                  25                  30
Gln Gly Gln Asn Ser Ala Asp Pro Lys Val Tyr Pro Ile Ile Leu Arg
        35                  40                  45
Leu Gly Ser Gln Leu Ser Leu Ser Met Ala Arg Arg Asn Leu Asp Ser
    50                  55                  60
Leu Glu Ala Arg Ala Phe Gln Ser Thr Pro Ile Val Val Gln Met Thr
65                  70                  75                  80
Lys Leu Ala Thr Thr Glu Glu Leu Pro Asp Glu Phe Val Val Val Thr
                85                  90                  95
Ala Lys

SEQ ID No: 34
Length: 213
Type: DNA
Organism: CORONAVIRUS
Sequence 34:
atgctgccac cgtgctacaa cttcctcaag gaacaacatt gccaaaaggc ttctacgcag 60
agggaagcag aggcggcagt caagcctctt ctcgctcctc atcacgtagt cgcggtaatt 120
caagaaattc aactcctggc agcagtaggg gaaattctcc tgctcgaatg gctagcggag 180
gtggtgaaac tgccctcgcg ctattgctgc tag 213

SEQ ID No: 35
Length: 70
Type: PRT
Organism: CORONAVIRUS
Sequence 35:
Met Leu Pro Pro Cys Tyr Asn Phe Leu Lys Glu Gln His Cys Gln Lys
1               5                   10                  15
Ala Ser Thr Gln Arg Glu Ala Glu Ala Ala Val Lys Pro Leu Leu Ala
            20                  25                  30
Pro His His Val Val Ala Val Ile Gln Glu Ile Gln Leu Leu Ala Ala
        35                  40                  45
Val Gly Glu Ile Leu Leu Leu Glu Trp Leu Ala Glu Val Val Lys Leu
    50                  55                  60
Pro Ser Arg Tyr Cys Cys
65                  70

SEQ ID No: 36
Length: 1377
Type: DNA
Organism: CORONAVIRUS
Feature: CDS, location (67)..(1335)
Sequence 36:
atgaaggtca ccaaactgct gcatttagag
acgtacttgt tgttttaaat aaacgaacaa 60

attaaa atg tct gat aat gga ccc caa tca aac caa cgt agt gcc ccc  108
       Met Ser Asp Asn Gly Pro Gln Ser Asn Gln Arg Ser Ala Pro
       1               5                   10

cgc att aca ttt ggt gga ccc aca gat tca act gac aat aac cag aat  156
Arg Ile Thr Phe Gly Gly Pro Thr Asp Ser Thr Asp Asn Asn Gln Asn
15                  20                  25                  30

gga gga cgc aat ggg gca agg cca aaa cag cgc cga ccc caa ggt tta  204
Gly Gly Arg Asn Gly Ala Arg Pro Lys Gln Arg Arg Pro Gln Gly Leu
                35                  40                  45

ccc aat aat act gcg tct tgg ttc aca gct ctc act cag cat ggc aag  252
Pro Asn Asn Thr Ala Ser Trp Phe Thr Ala Leu Thr Gln His Gly Lys
            50                  55                  60

gag gaa ctt aga ttc cct cga ggc cag ggc gtt cca atc aac acc aat  300
Glu Glu Leu Arg Phe Pro Arg Gly Gln Gly Val Pro Ile Asn Thr Asn
        65                  70                  75

agt ggt cca gat gac caa att ggc tac tac cga aga gct acc cga cga  348
Ser Gly Pro Asp Asp Gln Ile Gly Tyr Tyr Arg Arg Ala Thr Arg Arg
    80                  85                  90

gtt cgt ggt ggt gac ggc aaa atg aaa gag ctc agc ccc aga tgg tac  396
Val Arg Gly Gly Asp Gly Lys Met Lys Glu Leu Ser Pro Arg Trp Tyr
95                  100                 105                 110

ttc tat tac cta gga act ggc cca gaa gct tca ctt ccc tac ggc gct  444
Phe Tyr Tyr Leu Gly Thr Gly Pro Glu Ala Ser Leu Pro Tyr Gly Ala
                115                 120                 125

aac aaa gaa ggc atc gta tgg gtt gca act gag gga gcc ttg aat aca  492
Asn Lys Glu Gly Ile Val Trp Val Ala Thr Glu Gly Ala Leu Asn Thr
            130                 135                 140

ccc aaa gac cac att ggc acc cgc aat cct aat aac aat gct gcc acc  540
Pro Lys Asp His Ile Gly Thr Arg Asn Pro Asn Asn Asn Ala Ala Thr
        145                 150                 155

gtg cta caa ctt cct caa gga aca aca ttg cca aaa ggc ttc tac gca  588
Val Leu Gln Leu Pro Gln Gly Thr Thr Leu Pro Lys Gly Phe Tyr Ala
    160                 165                 170

gag gga agc aga ggc ggc agt caa gcc tct tct cgc tcc tca tca cgt  636
Glu Gly Ser Arg Gly Gly Ser Gln Ala Ser Ser Arg Ser Ser Ser Arg
175                 180                 185                 190

agt cgc ggt aat tca aga aat tca act cct ggc agc agt agg gga aat  684
Ser Arg Gly Asn Ser Arg Asn Ser Thr Pro Gly Ser Ser Arg Gly Asn
                195                 200                 205

tct cct gct cga atg gct agc gga ggt ggt gaa act gcc ctc gcg cta  732
Ser Pro Ala Arg Met Ala Ser Gly Gly Gly Glu Thr Ala Leu Ala Leu
            210                 215                 220

ttg ctg cta gac aga ttg aac cag ctt gag agc aaa gtt tct ggt aaa  780
Leu Leu Leu Asp Arg Leu Asn Gln Leu Glu Ser Lys Val Ser Gly Lys
        225                 230                 235

ggc caa caa caa caa ggc caa act gtc act aag aaa tct gct gct gag  828
Gly Gln Gln Gln Gln Gly Gln Thr Val Thr Lys Lys Ser Ala Ala Glu
    240                 245                 250

gca tct aaa aag cct cgc caa aaa cgt act gcc aca aaa cag tac aac  876
Ala Ser Lys Lys Pro Arg Gln Lys Arg Thr Ala Thr Lys Gln Tyr Asn
255                 260                 265                 270

gtc act caa gca ttt ggg aga cgt ggt cca gaa caa acc caa gga aat  924
Val Thr Gln Ala Phe Gly Arg Arg Gly Pro Glu Gln Thr Gln Gly Asn
                275                 280                 285

ttc ggg gac caa gac cta atc aga caa gga act gat tac aaa cat tgg  972
Phe Gly Asp Gln Asp Leu Ile Arg Gln Gly Thr Asp Tyr Lys His Trp
            290                 295                 300

ccg caa att gca caa ttt gct cca agt gcc tct gca ttc ttt gga atg  1020
Pro Gln Ile Ala Gln Phe Ala Pro Ser Ala Ser Ala Phe Phe Gly Met
        305                 310                 315

tca cgc att ggc atg gaa gtc aca cct tcg gga aca tgg ctg act tat  1068
Ser Arg Ile Gly Met Glu Val Thr Pro Ser Gly Thr Trp Leu Thr Tyr
    320                 325                 330

cat gga gcc att aaa ttg gat gac aaa gat cca caa ttc aaa gac aac  1116
His Gly Ala Ile Lys Leu Asp Asp Lys Asp Pro Gln Phe Lys Asp Asn
335                 340                 345                 350

gtc ata ctg ctg aac aag cac att gac gca tac aaa aca ttc cca cca  1164
Val Ile Leu Leu Asn Lys His Ile Asp Ala Tyr Lys Thr Phe Pro Pro
                355                 360                 365

aca gag cct aaa aag gac aaa aag aaa aag act gat gaa gct cag cct  1212
Thr Glu Pro Lys Lys Asp Lys Lys Lys Lys Thr Asp Glu Ala Gln Pro
            370                 375                 380

ttg ccg cag aga caa aag aag cag ccc act gtg act ctt ctt cct gcg  1260
Leu Pro Gln Arg Gln Lys Lys Gln Pro Thr Val Thr Leu Leu Pro Ala
        385                 390                 395

gct gac atg gat gat ttc tcc aga caa ctt caa aat tcc atg agt gga  1308
Ala Asp Met Asp Asp Phe Ser Arg Gln Leu Gln Asn Ser Met Ser Gly
    400                 405                 410

gct tct gct gat tca act cag gca taa acactcatga tgaccacaca  1355
Ala Ser Ala Asp Ser Thr Gln Ala
415                 420

aggcagatgg gctatgtaaa cg
1377

SEQ ID No: 37 (length: 422; type: PRT; organism: CORONAVIRUS)
Met Ser Asp Asn Gly Pro Gln Ser Asn Gln Arg Ser Ala Pro Arg Ile
1               5                   10                  15
Thr Phe Gly Gly Pro Thr Asp Ser Thr Asp Asn Asn Gln Asn Gly Gly
            20                  25                  30
Arg Asn Gly Ala Arg Pro Lys Gln Arg Arg Pro Gln Gly Leu Pro Asn
        35                  40                  45
Asn Thr Ala Ser Trp Phe Thr Ala Leu Thr Gln His Gly Lys Glu Glu
    50                  55                  60
Leu Arg Phe Pro Arg Gly Gln Gly Val Pro Ile Asn Thr Asn Ser Gly
65                  70                  75                  80
Pro Asp Asp Gln Ile Gly Tyr Tyr Arg Arg Ala Thr Arg Arg Val Arg
                85                  90                  95
Gly Gly Asp Gly Lys Met Lys Glu Leu Ser Pro Arg Trp Tyr Phe Tyr
            100                 105                 110
Tyr Leu Gly Thr Gly Pro Glu Ala Ser Leu Pro Tyr Gly Ala Asn Lys
        115                 120                 125
Glu Gly Ile Val Trp Val Ala Thr Glu Gly Ala Leu Asn Thr Pro Lys
    130                 135                 140
Asp His Ile Gly Thr Arg Asn Pro Asn Asn Asn Ala Ala Thr Val Leu
145                 150                 155                 160
Gln Leu Pro Gln Gly Thr Thr Leu Pro Lys Gly Phe Tyr Ala Glu Gly
                165                 170                 175
Ser Arg Gly Gly Ser Gln Ala Ser Ser Arg Ser Ser Ser Arg Ser Arg
            180                 185                 190
Gly Asn Ser Arg Asn Ser Thr Pro Gly Ser Ser Arg Gly Asn Ser Pro
        195                 200                 205
Ala Arg Met Ala Ser Gly Gly Gly Glu Thr Ala Leu Ala Leu Leu Leu
    210                 215                 220
Leu Asp Arg Leu Asn Gln Leu Glu Ser Lys Val Ser Gly Lys Gly Gln
225                 230                 235                 240
Gln Gln Gln Gly Gln Thr Val Thr Lys Lys Ser Ala Ala Glu Ala Ser
                245                 250                 255
Lys Lys Pro Arg Gln Lys Arg Thr Ala Thr Lys Gln Tyr Asn Val Thr
            260                 265                 270
Gln Ala Phe Gly Arg Arg Gly Pro Glu Gln Thr Gln Gly Asn Phe Gly
        275                 280                 285
Asp Gln Asp Leu Ile Arg Gln Gly Thr Asp Tyr Lys His Trp Pro Gln
    290                 295                 300
Ile Ala Gln Phe Ala Pro Ser Ala Ser Ala Phe Phe Gly Met Ser Arg
305                 310                 315                 320
Ile Gly Met Glu Val Thr Pro Ser Gly Thr Trp Leu Thr Tyr His Gly
                325                 330                 335
Ala Ile Lys Leu Asp Asp Lys Asp Pro Gln Phe Lys Asp Asn Val Ile
            340                 345                 350
Leu Leu Asn Lys His Ile Asp Ala Tyr Lys Thr Phe Pro Pro Thr Glu
        355                 360                 365
Pro Lys Lys Asp Lys Lys Lys Lys Thr Asp Glu Ala Gln Pro Leu Pro
    370                 375                 380
Gln Arg Gln Lys Lys Gln Pro Thr Val Thr Leu Leu Pro Ala Ala Asp
385                 390                 395                 400
Met Asp Asp Phe Ser Arg Gln Leu Gln Asn Ser Met Ser Gly Ala Ser
                405                 410                 415
Ala Asp Ser Thr Gln Ala
            420

SEQ ID No: 38 (length: 1377; type: DNA; organism: CORONAVIRUS)
atgaaggtca ccaaactgct gcatttagag acgtacttgt
tgttttaaat aaacgaacaa 60attaaaatgt ctgataatgg accccaatca aaccaacgta
gtgccccccg cattacattt 120ggtggaccca cagattcaac tgacaataac cagaatggag
gacgcaatgg ggcaaggcca 180aaacagcgcc gaccccaagg tttacccaat aatactgcgt
cttggttcac agctctcact 240cagcatggca aggaggaact tagattccct cgaggccagg
gcgttccaat caacaccaat 300agtggtccag atgaccaaat tggctactac cgaagagcta
cccgacgagt tcgtggtggt 360gacggcaaaa tgaaagagct cagccccaga tggtacttct
attacctagg aactggccca 420gaagcttcac ttccctacgg cgctaacaaa gaaggcatcg
tatgggttgc aactgaggga 480gccttgaata cacccaaaga ccacattggc acccgcaatc
ctaataacaa tgctgccacc 540gtgctacaac ttcctcaagg aacaacattg ccaaaaggct
tctacgcaga gggaagcaga 600ggcggcagtc aagcctcttc tcgctcctca tcacgtagtc
gcggtaattc aagaaattca 660actcctggca gcagtagggg aaattctcct gctcgaatgg
ctagcggagg tggtgaaact 720gccctcgcgc tattgctgct agacagattg aaccagcttg
agagcaaagt ttctggtaaa 780ggccaacaac aacaaggcca aactgtcact aagaaatctg
ctgctgaggc atctaaaaag 840cctcgccaaa aacgtactgc cacaaaacag tacaacgtca
ctcaagcatt tgggagacgt 900ggtccagaac aaacccaagg aaatttcggg gaccaagacc
taatcagaca aggaactgat 960tacaaacatt ggccgcaaat tgcacaattt gctccaagtg
cctctgcatt ctttggaatg 1020tcacgcattg gcatggaagt cacaccttcg ggaacatggc
tgacttatca tggagccatt 1080aaattggatg acaaagatcc acaattcaaa gacaacgtca
tactgctgaa caagcacatt 1140gacgcataca aaacattccc accaacagag cctaaaaagg
acaaaaagaa aaagactgat 1200gaagctcagc ctttgccgca gagacaaaag aagcagccca
ctgtgactct tcttcctgcg 1260gctgacatgg atgatttctc cagacaactt caaaattcca
tgagtggagc ttctgctgat 1320tcaactcagg cataaacact catgatgacc acacaaggca
gatgggctat gtaaacg 1377

SEQ ID No: 39 (length: 204; type: DNA; organism: CORONAVIRUS)
atattaggtt tttacctacc
caggaaaagc caaccaacct cgatctcttg tagatctgtt 60ctctaaacga actttaaaat
ctgtgtagct gtcgctcggc tgcatgccta gtgcacctac 120gcagtataaa caataataaa
ttttactgtc gttgacaaga aacgagtaac tcgtccctct 180tctgcagact gcttacggtt
tcgt 204

SEQ ID No: 40 (length: 809; type: DNA; organism: CORONAVIRUS)
actcaagcat ttgggagacg tggtccagaa caaacccaag gaaatttcgg ggaccaagac
60ctaatcagac aaggaactga ttacaaacat tggccgcaaa ttgcacaatt tgctccaagt
120gcctctgcat tctttggaat gtcacgcatt ggcatggaag tcacaccttc gggaacatgg
180ctgacttatc atggagccat taaattggat gacaaagatc cacaattcaa agacaacgtc
240atactgctga acaagcacat tgacgcatac aaaacattcc caccaacaga gcctaaaaag
300gacaaaaaga aaaagactga tgaagctcag cctttgccgc agagacaaaa gaagcagccc
360actgtgactc ttcttcctgc ggctgacatg gatgatttct ccagacaact tcaaaattcc
420atgagtggag cttctgctga ttcaactcag gcataaacac tcatgatgac cacacaaggc
480agatgggcta tgtaaacgtt ttcgcaattc cgtttacgat acatagtcta ctcttgtgca
540gaatgaattc tcgtaactaa acagcacaag taggtttagt taactttaat ctcacatagc
600aatctttaat caatgtgtaa cattagggag gacttgaaag agccaccaca ttttcatcga
660ggccacgcgg agtacgatcg agggtacagt gaataatgct agggagagct gcctatatgg
720aagagcccta atgtgtaaaa ttaattttag tagtgctatc cccatgtgat tttaatagct
780tcttaggaga atgacaaaaa aaaaaaaaa
809

SEQ ID No: 41 (length: 448; type: DNA; organism: CORONAVIRUS)
aatgaacaca tagggctgtt caagctgggg cagtacgcct
ttttccagct ctactagacc 60acaagtgcca tttttgaggt gttcacgtgc ctccgatagg
gcctcttcca cagagtcccc 120gaagccacgc actagcacgt ctctaacctg aaggacaggc
aaactgagtt ggacgtgtgt 180tttctcgttg acaccaagaa caaggctctc catcttacct
ttcggtcaca cccggacgaa 240acctaggtat gctgatgatc gactgcaaca cggacgaaac
cgtaagcagt ctgcagaaga 300gggacgagtt actcgtttct tgtcaacgac agtaaaattt
attattgttt atactgcgta 360ggtgcactag gcatgcagcc gagcgacagc tacacagatt
ttaaagttcg tttagagaac 420agatctacaa gagatcgagg ttggttgg
448

SEQ ID No: 42 (length: 2033; type: DNA; organism: CORONAVIRUS)
atacctaggt ttcgtccggg
tgtgaccgaa aggtaagatg gagagccttg ttcttggtgt 60caacgagaaa acacacgtcc
aactcagttt gcctgtcctt caggttagag acgtgctagt 120gcgtggcttc ggggactctg
tggaagaggc cctatcggag gcacgtgaac acctcaaaaa 180tggcacttgt ggtctagtag
agctggaaaa aggcgtactg ccccagcttg aacagcccta 240tgtgttcatt aaacgttctg
atgccttaag caccaatcac ggccacaagg tcgttgagct 300ggttgcagaa atggacggca
ttcagtacgg tcgtagcggt ataacactgg gagtactcgt 360gccacatgtg ggcgaaaccc
caattgcata ccgcaatgtt cttcttcgta agaacggtaa 420taagggagcc ggtggtcata
gctatggcat cgatctaaag tcttatgact taggtgacga 480gcttggcact gatcccattg
aagattatga acaaaactgg aacactaagc atggcagtgg 540tgcactccgt gaactcactc
gtgagctcaa tggaggtgca gtcactcgct atgtcgacaa 600caatttctgt ggcccagatg
ggtaccctct tgattgcatc aaagattttc tcgcacgcgc 660gggcaagtca atgtgcactc
tttccgaaca acttgattac atcgagtcga agagaggtgt 720ctactgctgc cgtgaccatg
agcatgaaat tgcctggttc actgagcgct ctgataagag 780ctacgagcac cagacaccct
tcgaaattaa gagtgccaag aaatttgaca ctttcaaagg 840ggaatgccca aagtttgtgt
ttcctcttaa ctcaaaagtc aaagtcattc aaccacgtgt 900tgaaaagaaa aagactgagg
gtttcatggg gcgtatacgc tctgtgtacc ctgttgcatc 960tccacaggag tgtaacaata
tgcacttgtc taccttgatg aaatgtaatc attgcgatga 1020agtttcatgg cagacgtgcg
actttctgaa agccacttgt gaacattgtg gcactgaaaa 1080tttagttatt gaaggaccta
ctacatgtgg gtacctacct actaatgctg tagtgaaaat 1140gccatgtcct gcctgtcaag
acccagagat tggacctgag catagtgttg cagattatca 1200caaccactca aacattgaaa
ctcgactccg caagggaggt aggactagat gttttggagg 1260ctgtgtgttt gcctatgttg
gctgctataa taagcgtgcc tactgggttc ctcgtgctag 1320tgctgatatt ggctcaggcc
atactggcat tactggtgac aatgtggaga ccttgaatga 1380ggatctcctt gagatactga
gtcgtgaacg tgttaacatt aacattgttg gcgattttca 1440tttgaatgaa gaggttgcca
tcattttggc atctttctct gcttctacaa gtgcctttat 1500tgacactata aagagtcttg
attacaagtc tttcaaaacc attgttgagt cctgcggtaa 1560ctataaagtt accaagggaa
agcccgtaaa aggtgcttgg aacattggac aacagagatc 1620agttttaaca ccactgtgtg
gttttccctc acaggctgct ggtgttatca gatcaatttt 1680tgcgcgcaca cttgatgcag
caaaccactc aattcctgat ttgcaaagag cagctgtcac 1740catacttgat ggtatttctg
aacagtcatt acgtcttgtc gacgccatgg tttatacttc 1800agacctgctc accaacagtg
tcattattat ggcatatgta actggtggtc ttgtacaaca 1860gacttctcag tggttgtcta
atcttttggg cactactgtt gaaaaactca ggcctatctt 1920tgaatggatt gaggcgaaac
ttagtgcagg agttgaattt ctcaaggatg cttgggagat 1980tctcaaattt ctcattacag
gtgtttttga catcgtcaag ggtcaaatac agg 2033

SEQ ID No: 43 (length: 2018; type: DNA; organism: CORONAVIRUS)
ggattgaggc gaaacttagt gcaggagttg aatttctcaa ggatgcttgg gagattctca
60aatttctcat tacaggtgtt tttgacatcg tcaagggtca aatacaggtt gcttcagata
120acatcaagga ttgtgtaaaa tgcttcattg atgttgttaa caaggcactc gaaatgtgca
180ttgatcaagt cactatcgct ggcgcaaagt tgcgatcact caacttaggt gaagtcttca
240tcgctcaaag caagggactt taccgtcagt gtatacgtgg caaggagcag ctgcaactac
300tcatgcctct taaggcacca aaagaagtaa cctttcttga aggtgattca catgacacag
360tacttacctc tgaggaggtt gttctcaaga acggtgaact cgaagcactc gagacgcccg
420ttgatagctt cacaaatgga gctatcgttg gcacaccagt ctgtgtaaat ggcctcatgc
480tcttagagat taaggacaaa gaacaatact gcgcattgtc tcctggttta ctggctacaa
540acaatgtctt tcgcttaaaa gggggtgcac caattaaagg tgtaaccttt ggagaagata
600ctgtttggga agttcaaggt tacaagaatg tgagaatcac atttgagctt gatgaacgtg
660ttgacaaagt gcttaatgaa aagtgctctg tctacactgt tgaatccggt accgaagtta
720ctgagtttgc atgtgttgta gcagaggctg ttgtgaagac tttacaacca gtttctgatc
780tccttaccaa catgggtatt gatcttgatg agtggagtgt agctacattc tacttatttg
840atgatgctgg tgaagaaaac ttttcatcac gtatgtattg ttccttttac cctccagatg
900aggaagaaga ggacgatgca gagtgtgagg aagaagaaat tgatgaaacc tgtgaacatg
960agtacggtac agaggatgat tatcaaggtc tccctctgga atttggtgcc tcagctgaaa
1020cagttcgagt tgaggaagaa gaagaggaag actggctgga tgatactact gagcaatcag
1080agattgagcc agaaccagaa cctacacctg aagaaccagt taatcagttt actggttatt
1140taaaacttac tgacaatgtt gccattaaat gtgttgacat cgttaaggag gcacaaagtg
1200ctaatcctat ggtgattgta aatgctgcta acatacacct gaaacatggt ggtggtgtag
1260caggtgcact caacaaggca accaatggtg ccatgcaaaa ggagagtgat gattacatta
1320agctaaatgg ccctcttaca gtaggagggt cttgtttgct ttctggacat aatcttgcta
1380agaagtgtct gcatgttgtt ggacctaacc taaatgcagg tgaggacatc cagcttctta
1440aggcagcata tgaaaatttc aattcacagg acatcttact tgcaccattg ttgtcagcag
1500gcatatttgg tgctaaacca cttcagtctt tacaagtgtg cgtgcagacg gttcgtacac
1560aggtttatat tgcagtcaat gacaaagctc tttatgagca ggttgtcatg gattatcttg
1620ataacctgaa gcctagagtg gaagcaccta aacaagagga gccaccaaac acagaagatt
1680ccaaaactga ggagaaatct gtcgtacaga agcctgtcga tgtgaagcca aaaattaagg
1740cctgcattga tgaggttacc acaacactgg aagaaactaa gtttcttacc aataagttac
1800tcttgtttgc tgatatcaat ggtaagcttt accatgattc tcagaacatg cttagaggtg
1860aagatatgtc tttccttgag aaggatgcac cttacatggt aggtgatgtt atcactagtg
1920gtgatatcac ttgtgttgta ataccctcca aaaaggctgg tggcactact gagatgctct
1980caagagcttt gaagaaagtg ccagttgatg agtatata
2018

SEQ ID No: 44 (length: 1442; type: DNA; organism: CORONAVIRUS)
ttgatgaggt taccacaaca ctggaagaaa ctaagtttct
taccaataag ttactcttgt 60ttgctgatat caatggtaag ctttaccatg attctcagaa
catgcttaga ggtgaagata 120tgtctttcct tgagaaggat gcaccttaca tggtaggtga
tgttatcact agtggtgata 180tcacttgtgt tgtaataccc tccaaaaagg ctggtggcac
tactgagatg ctctcaagag 240ctttgaagaa agtgccagtt gatgagtata taaccacgta
ccctggacaa ggatgtgctg 300gttatacact tgaggaagct aagactgctc ttaagaaatg
caaatctgca ttttatgtac 360taccttcaga agcacctaat gctaaggaag agattctagg
aactgtatcc tggaatttga 420gagaaatgct tgctcatgct gaagagacaa gaaaattaat
gcctatatgc atggatgtta 480gagccataat ggcaaccatc caacgtaagt ataaaggaat
taaaattcaa gagggcatcg 540ttgactatgg tgtccgattc ttcttttata ctagtaaaga
gcctgtagct tctattatta 600cgaagctgaa ctctctaaat gagccgcttg tcacaatgcc
aattggttat gtgacacatg 660gttttaatct tgaagaggct gcgcgctgta tgcgttctct
taaagctcct gccgtagtgt 720cagtatcatc accagatgct gttactacat ataatggata
cctcacttcg tcatcaaaga 780catctgagga gcactttgta gaaacagttt ctttggctgg
ctcttacaga gattggtcct 840attcaggaca gcgtacagag ttaggtgttg aatttcttaa
gcgtggtgac aaaattgtgt 900accacactct ggagagcccc gtcgagtttc atcttgacgg
tgaggttctt tcacttgaca 960aactaaagag tctcttatcc ctgcgggagg ttaagactat
aaaagtgttc acaactgtgg 1020acaacactaa tctccacaca cagcttgtgg atatgtctat
gacatatgga cagcagtttg 1080gtccaacata cttggatggt gctgatgtta caaaaattaa
acctcatgta aatcatgagg 1140gtaagacttt ctttgtacta cctagtgatg acacactacg
tagtgaagct ttcgagtact 1200accatactct tgatgagagt tttcttggta ggtacatgtc
tgctttaaac cacacaaaga 1260aatggaaatt tcctcaagtt ggtggtttaa cttcaattaa
atgggctgat aacaattgtt 1320atttgtctag tgttttatta gcacttcaac agcttgaagt
caaattcaat gcaccagcac 1380ttcaagaggc ttattataga gcccgtgctg gtgatgctgc
taacttttgt gcactcatac 1440tc
1442

SEQ ID No: 45 (length: 1050; type: DNA; organism: CORONAVIRUS)
atatgtctat gacatatgga
cagcagtttg gtccaacata cttggatggt gctgatgtta 60caaaaattaa acctcatgta
aatcatgagg gtaagacttt ctttgtacta cctagtgatg 120acacactacg tagtgaagct
ttcgagtact accatactct tgatgagagt tttcttggta 180ggtacatgtc tgctttaaac
cacacaaaga aatggaaatt tcctcaagtt ggtggtttaa 240cttcaattaa atgggctgat
aacaattgtt atttgtctag tgttttatta gcacttcaac 300agcttgaagt caaattcaat
gcaccagcac ttcaagaggc ttattataga gcccgtgctg 360gtgatgctgc taacttttgt
gcactcatac tcgcttacag taataaaact gttggcgagc 420ttggtgatgt cagagaaact
atgacccatc ttctacagca tgctaatttg gaatctgcaa 480agcgagttct taatgtggtg
tgtaaacatt gtggtcagaa aactactacc ttaacgggtg 540tagaagctgt gatgtatatg
ggtactctat cttatgataa tcttaagaca ggtgtttcca 600ttccatgtgt gtgtggtcgt
gatgctacac aatatctagt acaacaagag tcttcttttg 660ttatgatgtc tgcaccacct
gctgagtata aattacagca aggtacattc ttatgtgcga 720atgagtacac tggtaactat
cagtgtggtc attacactca tataactgct aaggagaccc 780tctatcgtat tgacggagct
caccttacaa agatgtcaga gtacaaagga ccagtgactg 840atgttttcta caaggaaaca
tcttacacta caaccatcaa gcctgtgtcg tataaactcg 900atggagttac ttacacagag
attgaaccaa aattggatgg gtattataaa aaggataatg 960cttactatac agagcagcct
atagaccttg taccaactca accattacca aatgcgagtt 1020ttgataattt caaactcaca
tgttctaaca 1050

SEQ ID No: 46 (length: 1995; type: DNA; organism: CORONAVIRUS)
tttgtgcact catactcgct tacagtaata aaactgttgg cgagcttggt gatgtcagag
60aaactatgac ccatcttcta cagcatgcta atttggaatc tgcaaagcga gttcttaatg
120tggtgtgtaa acattgtggt cagaaaacta ctaccttaac gggtgtagaa gctgtgatgt
180atatgggtac tctatcttat gataatctta agacaggtgt ttccattcca tgtgtgtgtg
240gtcgtgatgc tacacaatat ctagtacaac aagagtcttc ttttgttatg atgtctgcac
300cacctgctga gtataaatta cagcaaggta cattcttatg tgcgaatgag tacactggta
360actatcagtg tggtcattac actcatataa ctgctaagga gaccctctat cgtattgacg
420gagctcacct tacaaagatg tcagagtaca aaggaccagt gactgatgtt ttctacaagg
480aaacatctta cactacaacc atcaagcctg tgtcgtataa actcgatgga gttacttaca
540cagagattga accaaaattg gatgggtatt ataaaaagga taatgcttac tatacagagc
600agcctataga ccttgtacca actcaaccat taccaaatgc gagttttgat aatttcaaac
660tcacatgttc taacacaaaa tttgctgatg atttaaatca aatgacaggc ttcacaaagc
720cagcttcacg agagctatct gtcacattct tcccagactt gaatggcgat gtagtggcta
780ttgactatag acactattca gcgagtttca agaaaggtgc taaattactg cataagccaa
840ttgtttggca cattaaccag gctacaacca agacaacgtt caaaccaaac acttggtgtt
900tacgttgtct ttggagtaca aagccagtag atacttcaaa ttcatttgaa gttctggcag
960tagaagacac acaaggaatg gacaatcttg cttgtgaaag tcaacaaccc acctctgaag
1020aagtagtgga aaatcctacc atacagaagg aagtcataga gtgtgacgtg aaaactaccg
1080aagttgtagg caatgtcata cttaaaccat cagatgaagg tgttaaagta acacaagagt
1140taggtcatga ggatcttatg gctgcttatg tggaaaacac aagcattacc attaagaaac
1200ctaatgagct ttcactagcc ttaggtttaa aaacaattgc cactcatggt attgctgcaa
1260ttaatagtgt tccttggagt aaaattttgg cttatgtcaa accattctta ggacaagcag
1320caattacaac atcaaattgc gctaagagat tagcacaacg tgtgtttaac aattatatgc
1380cttatgtgtt tacattattg ttccaattgt gtacttttac taaaagtacc aattctagaa
1440ttagagcttc actacctaca actattgcta aaaatagtgt taagagtgtt gctaaattat
1500gtttggatgc cggcattaat tatgtgaagt cacccaaatt ttctaaattg ttcacaatcg
1560ctatgtggct attgttgtta agtatttgct taggttctct aatctgtgta actgctgctt
1620ttggtgtact cttatctaat tttggtgctc cttcttattg taatggcgtt agagaattgt
1680atcttaattc gtctaacgtt actactatgg atttctgtga aggttctttt ccttgcagca
1740tttgtttaag tggattagac tcccttgatt cttatccagc tcttgaaacc attcaggtga
1800cgatttcatc gtacaagcta gacttgacaa ttttaggtct ggccgctgag tgggttttgg
1860catatatgtt gttcacaaaa ttcttttatt tattaggtct ttcagctata atgcaggtgt
1920tctttggcta ttttgctagt catttcatca gcaattcttg gctcatgtgg tttatcatta
1980gtattgtaca aatgg
1995

SEQ ID No: 47 (length: 1884; type: DNA; organism: CORONAVIRUS)
aattcttggc tcatgtggtt tatcattagt attgtacaaa
tggcacccgt ttctgcaatg 60gttaggatgt acatcttctt tgcttctttc tactacatat
ggaagagcta tgttcatatc 120atggatggtt gcacctcttc gacttgcatg atgtgctata
agcgcaatcg tgccacacgc 180gttgagtgta caactattgt taatggcatg aagagatctt
tctatgtcta tgcaaatgga 240ggccgtggct tctgcaagac tcacaattgg aattgtctca
attgtgacac attttgcact 300ggtagtacat tcattagtga tgaagttgct cgtgatttgt
cactccagtt taaaagacca 360atcaacccta ctgaccagtc atcgtatatt gttgatagtg
ttgctgtgaa aaatggcgcg 420cttcacctct actttgacaa ggctggtcaa aagacctatg
agagacatcc gctctcccat 480tttgtcaatt tagacaattt gagagctaac aacactaaag
gttcactgcc tattaatgtc 540atagtttttg atggcaagtc caaatgcgac gagtctgctt
ctaagtctgc ttctgtgtac 600tacagtcagc tgatgtgcca acctattctg ttgcttgacc
aagctcttgt atcagacgtt 660ggagatagta ctgaagtttc cgttaagatg tttgatgctt
atgtcgacac cttttcagca 720acttttagtg ttcctatgga aaaacttaag gcacttgttg
ctacagctca cagcgagtta 780gcaaagggtg tagctttaga tggtgtcctt tctacattcg
tgtcagctgc ccgacaaggt 840gttgttgata ccgatgttga cacaaaggat gttattgaat
gtctcaaact ttcacatcac 900tctgacttag aagtgacagg tgacagttgt aacaatttca
tgctcaccta taataaggtt 960gaaaacatga cgcccagaga tcttggcgca tgtattgact
gtaatgcaag gcatatcaat 1020gcccaagtag caaaaagtca caatgtttca ctcatctgga
atgtaaaaga ctacatgtct 1080ttatctgaac agctgcgtaa acaaattcgt agtgctgcca
agaagaacaa catacctttt 1140agactaactt gtgctacaac tagacaggtt gtcaatgtca
taactactaa aatctcactc 1200aagggtggta agattgttag tacttgtttt aaacttatgc
ttaaggccac attattgtgc 1260gttcttgctg cattggtttg ttatatcgtt atgccagtac
atacattgtc aatccatgat 1320ggttacacaa atgaaatcat tggttacaaa gccattcagg
atggtgtcac tcgtgacatc 1380atttctactg atgattgttt tgcaaataaa catgctggtt
ttgacgcatg gtttagccag 1440cgtggtggtt catacaaaaa tgacaaaagc tgccctgtag
tagctgctat cattacaaga 1500gagattggtt tcatagtgcc tggcttaccg ggtactgtgc
tgagagcaat caatggtgac 1560ttcttgcatt ttctacctcg tgtttttagt gctgttggca
acatttgcta cacaccttcc 1620aaactcattg agtatagtga ttttgctacc tctgcttgcg
ttcttgctgc tgagtgtaca 1680atttttaagg atgctatggg caaacctgtg ccatattgtt
atgacactaa tttgctagag 1740ggttctattt cttatagtga gcttcgtcca gacactcgtt
atgtgcttat ggatggttcc 1800atcatacagt ttcctaacac ttacctggag ggttctgtta
gagtagtaac aacttttgat 1860gctgagtact gtagacatgg taca
1884

SEQ ID No: 48 (length: 2020; type: DNA; organism: CORONAVIRUS)
cactcgttat gtgcttatgg
atggttccat catacagttt cctaacactt acctggaggg 60ttctgttaga gtagtaacaa
cttttgatgc tgagtactgt agacatggta catgcgaaag 120gtcagaagta ggtatttgcc
tatctaccag tggtagatgg gttcttaata atgagcatta 180cagagctcta tcaggagttt
tctgtggtgt tgatgcgatg aatctcatag ctaacatctt 240tactcctctt gtgcaacctg
tgggtgcttt agatgtgtct gcttcagtag tggctggtgg 300tattattgcc atattggtga
cttgtgctgc ctactacttt atgaaattca gacgtgtttt 360tggtgagtac aaccatgttg
ttgctgctaa tgcacttttg tttttgatgt ctttcactat 420actctgtctg gtaccagctt
acagctttct gccgggagtc tactcagtct tttacttgta 480cttgacattc tatttcacca
atgatgtttc attcttggct caccttcaat ggtttgccat 540gttttctcct attgtgcctt
tttggataac agcaatctat gtattctgta tttctctgaa 600gcactgccat tggttcttta
acaactatct taggaaaaga gtcatgttta atggagttac 660atttagtacc ttcgaggagg
ctgctttgtg tacctttttg ctcaacaagg aaatgtacct 720aaaattgcgt agcgagacac
tgttgccact tacacagtat aacaggtatc ttgctctata 780taacaagtac aagtatttca
gtggagcctt agatactacc agctatcgtg aagcagcttg 840ctgccactta gcaaaggctc
taaatgactt tagcaactca ggtgctgatg ttctctacca 900accaccacag acatcaatca
cttctgctgt tctgcagagt ggttttagga aaatggcatt 960cccgtcaggc aaagttgaag
ggtgcatggt acaagtaacc tgtggaacta caactcttaa 1020tggattgtgg ttggatgaca
cagtatactg tccaagacat gtcatttgca cagcagaaga 1080catgcttaat cctaactatg
aagatctgct cattcgcaaa tccaaccata gctttcttgt 1140tcaggctggc aatgttcaac
ttcgtgttat tggccattct atgcaaaatt gtctgcttag 1200gcttaaagtt gatacttcta
accctaagac acccaagtat aaatttgtcc gtatccaacc 1260tggtcaaaca ttttcagttc
tagcatgcta caatggttca ccatctggtg tttatcagtg 1320tgccatgaga cctaatcata
ccattaaagg ttctttcctt aatggatcat gtggtagtgt 1380tggttttaac attgattatg
attgcgtgtc tttctgctat atgcatcata tggagcttcc 1440aacaggagta cacgctggta
ctgacttaga aggtaaattc tatggtccat ttgttgacag 1500acaaactgca caggctgcag
gtacagacac aaccataaca ttaaatgttt tggcatggct 1560gtatgctgct gttatcaatg
gtgataggtg gtttcttaat agattcacca ctactttgaa 1620tgactttaac cttgtggcaa
tgaagtacaa ctatgaacct ttgacacaag atcatgttga 1680catattggga cctctttctg
ctcaaacagg aattgccgtc ttagatatgt gtgctgcttt 1740gaaagagctg ctgcagaatg
gtatgaatgg tcgtactatc cttggtagca ctattttaga 1800agatgagttt acaccatttg
atgttgttag acaatgctct ggtgttacct tccaaggtaa 1860gttcaagaaa attgttaagg
gcactcatca ttggatgctt ttaactttct tgacatcact 1920attgattctt gttcaaagta
cacagtggtc actgtttttc tttgtttacg agaatgcttt 1980cttgccattt actcttggta
ttatggcaat tgctgcatgt 2020

SEQ ID No: 49 (length: 2040; type: DNA; organism: CORONAVIRUS)
agcatttcca gcctgaagac gtactgtagc agctaaactg cccagcacca tacctctatt
60taggttgttt aagcctttga tgaagtacaa gtatttcact ttaggccctt ttggtgtgtc
120tgtaacaaac ctacaaggtg gttccagttc tgtgtaaatt gtacctgtac catcactctt
180agggaatcta gcccatttga gatcttggtg gtctgatagt aatgccagca caaacctacc
240tcccttcgaa ttgttatagt aggcaagtgc attgtcatca gtacaagctg tttgtgtggt
300accagccgca caggacatct gtcgtagtgc tactggactc agttcattat tctgtagttt
360aacagctgag ttggctctta gagctgtaac aataagaggc caagccaaat ttggtgaatt
420gtccatgtta atttcactaa gttgaacaat cttgctatcc gcatcaacaa cttgctggat
480ttcccagagt gcagatgcat atgtaaaggt gttaccatca caagtgttct tgtaggtacc
540ataatcaggg acaacaacca tgagtttggc tgctgtagtc aatggtatga tgttgagtgg
600aacacaacca tcacgcgcat tgttgataat gttgttaagt gcatcattat caagcttcct
660aagcatagtg aagagcattg tttgcatagc actagttact tttgccctct tgtcctcaga
720tcttgcctgt ttgtacattt gggtcatagc ctgatctgcc atcttttcca acttgcgttg
780catggcagca tcacggtcaa actcagattt agccacattc aaagatttct ttaacttttt
840gagaacgact tcagaatcac cattagctac agcctgctca taggcctcct gggcagtggc
900ataagcggca tatgatggta aagaactaaa ttctgaagca atagcctgaa gagtagcacg
960gttatcgagc atttcctcgc acaacctatt aatgtctaca gcaccctgca tggatagcaa
1020aacagacaaa agagaaacca tcttctcgaa agcttcagtt gtgtcttttg caagaagaat
1080atcattgtgg agttgtacac attgtgccca caatttagaa gatgactcta ctctaagttg
1140ttgaagaacc gagagcagta ccacagatgt gcactttacg tcagacattt tagactgtac
1200agtagcaacc ttgatacatg gtttacctcc aatacccaac aacttaatgt taagcttgaa
1260agcatcaata ctactcttag gaggcaaaag cccctgggag ttcatatacc taaattcttg
1320tgtagagacc aagtagtcat aaacaccaag agtaagcctg aagtaacggt tgagtaaaca
1380gaaaaggcca aagtagcagc agcaacaata gcctaagaaa caataaacaa gcatgataca
1440ctgtaaggtg ttgccagtaa taaataacaa tgggtaatac tcaacacaca caaacactat
1500agctctagct aaaaacatga tagtcgtaac gacaccagaa tagttagagg ttacagaaat
1560aactaaggcc cacatggaaa tagcttgatc taaagcatta ccatagtaga ctttgtaaac
1620aagtgtaatg acattcatca gtgtccaaac acgtctagca gcatcatcat aaacagtgcg
1680agctgtcatg agaataagca aaactaaagc tgaagcatac ataacacaat ccttaagcct
1740ataaccagac aagctagtgt cagccaattc aagccatgtc atgatacgca tcacccagct
1800agcaggcatg tagaccatat taaagtaagc aactgttgca agagaaggta acagaaacaa
1860gcacaagaat gcgtgcttat gcttaacaag cagcatagca catgcagcaa ttgccataat
1920accaagagta aatggcaaga aagcattctc gtaaacaaag aaaaacagtg accactgtgt
1980actttgaaca agaatcaata gtgatgtcaa gaaagttaaa agcatccaat gatgagtgca
2040

SEQ ID No: 50 (length: 2012; type: DNA; organism: CORONAVIRUS)
cttgtaggtt tgttacagac acaccaaaag ggcctaaagt
gaaatacttg tacttcatca 60aaggcttaaa caacctaaat agaggtatgg tgctgggcag
tttagctgct acagtacgtc 120ttcaggctgg aaatgctaca gaagtacctg ccaattcaac
tgtgctttcc ttctgtgctt 180ttgcagtaga ccctgctaaa gcatataagg attacctagc
aagtggagga caaccaatca 240ccaactgtgt gaagatgttg tgtacacaca ctggtacagg
acaggcaatt actgtaacac 300cagaagctaa catggaccaa gagtcctttg gtggtgcttc
atgttgtctg tattgtagat 360gccacattga ccatccaaat cctaaaggat tctgtgactt
gaaaggtaag tacgtccaaa 420tacctaccac ttgtgctaat gacccagtgg gttttacact
tagaaacaca gtctgtaccg 480tctgcggaat gtggaaaggt tatggctgta gttgtgacca
actccgcgaa cccttgatgc 540agtctgcgga tgcatcaacg tttttaaacg ggtttgcggt
gtaagtgcag cccgtcttac 600accgtgcggc acaggcacta gtactgatgt cgtctacagg
gcttttgata tttacaacga 660aaaagttgct ggttttgcaa agttcctaaa aactaattgc
tgtcgcttcc aggagaagga 720tgaggaaggc aatttattag actcttactt tgtagttaag
aggcatacta tgtctaacta 780ccaacatgaa gagactattt ataacttggt taaagattgt
ccagcggttg ctgtccatga 840ctttttcaag tttagagtag atggtgacat ggtaccacat
atatcacgtc agcgtctaac 900taaatacaca atggctgatt tagtctatgc tctacgtcat
tttgatgagg gtaattgtga 960tacattaaaa gaaatactcg tcacatacaa ttgctgtgat
gatgattatt tcaataagaa 1020ggattggtat gacttcgtag agaatcctga catcttacgc
gtatatgcta acttaggtga 1080gcgtgtacgc caatcattat taaagactgt acaattctgc
gatgctatgc gtgatgcagg 1140cattgtaggc gtactgacat tagataatca ggatcttaat
gggaactggt acgatttcgg 1200tgatttcgta caagtagcac caggctgcgg agttcctatt
gtggattcat attactcatt 1260gctgatgccc atcctcactt tgactagggc attggctgct
gagtcccata tggatgctga 1320tctcgcaaaa ccacttatta agtgggattt gctgaaatat
gattttacgg aagagagact 1380ttgtctcttc gaccgttatt ttaaatattg ggaccagaca
taccatccca attgtattaa 1440ctgtttggat gataggtgta tccttcattg tgcaaacttt
aatgtgttat tttctactgt 1500gtttccacct acaagttttg gaccactagt aagaaaaata
tttgtagatg gtgttccttt 1560tgttgtttca actggatacc attttcgtga gttaggagtc
gtacataatc aggatgtaaa 1620cttacatagc tcgcgtctca gtttcaagga acttttagtg
tatgctgctg atccagctat 1680gcatgcagct tctggcaatt tattgctaga taaacgcact
acatgctttt cagtagctgc 1740actaacaaac aatgttgctt ttcaaactgt caaacccggt
aattttaata aagactttta 1800tgactttgct gtgtctaaag gtttctttaa ggaaggaagt
tctgttgaac taaaacactt 1860cttctttgct caggatggca acgctgctat cagtgattat
gactattatc gttataatct 1920gccaacaatg tgtgatatca gacaactcct attcgtagtt
gaagttgttg ataaatactt 1980tgattgttac gatggtggct gtattaatgc ca
2012

SEQ ID No: 51 (length: 1877; type: DNA; organism: CORONAVIRUS)
gtacttcgcg tacagtggca
ataccatatg acagcttaaa tgtttcctca gtggctttga 60gcgtttctgc tgcgaaaagc
ttgagtctct cagtacaagt gttggcaagt atgtaatcgc 120cagcattagt ccaatcacat
gttgctatcg cattgaagtc agtgacattg tcactgccta 180cacatgtgtt tttgtataaa
ccaaaaacct gaccattagc acataatgga aaactaatgg 240gaggcttatg tgacttgcaa
taatagctca tacctcctag atacagttgt gtcacatcag 300tgacatcaca acctggggca
ttgcaaacat agggattaac agacaacact aatttgtgtg 360atgttgaaat gacatggtca
tagcagcact tgcaacatag gaatggtctc ctaatacagg 420caccgcaacg aagtgaagtc
tgtgaattgc acaatacaca agcacctaca gcctgcaaga 480ctgtatgtgg tgtgtacata
gcctcataaa actcaggttc ccagtaccgt gaggtgttat 540cattagttag cattacggaa
tacatgtcca acatgtggcc agtaagctca tcatgtaact 600ttctaatgta ttgtaaatac
aagtgaaaga catcagcata ctcctgatta ggatgttttg 660taagtgggta agcatcaata
gccagtgaca cgaacctttc aatcataagt gtaccatctg 720ttttgacaat atcatcgaca
aaacagcctg cgcctaatat tcttgatgga tctgggtaag 780gcaggtacac gtaatcatct
ccttgtttaa ctagcattgt atgctgtgag caaaattcgt 840gaggtccttt agtaaggtca
gtctcagtcc aacattttgc ctcagacatg aacacattat 900tttgataata aagaactgcc
ttaaagttct taatgctagc tactaaacct tgagccgcat 960agttactgtt atagcacaca
acggcatcat cagaaagaat catcatggag aaatgtttac 1020gcaggtaagc gtaaaactca
tccacgaatt catgatcaac atccctattt ctatagagac 1080actcatagag cctgtgttgt
agattgcgga catacttgtc agctatctta ttaccatcag 1140ttgaaagaag tgcatttaca
ttggctgtaa cagcttgaca aatgttaaag acactattag 1200cataagcagt tgtagcatca
ccggatgatg ttccacctgg tttaacatat agtgagccgc 1260cacacatgac catctcactt
aatacttgcg cacactcgtt agctaacctg tagaaacggt 1320gtgataagtt acagcaagtg
ttatgtttgc gagcaagaac aagagaggcc attatcctaa 1380gcatgttagg catggctctg
tcacattttg gataatccca acccataagg tgtggagttt 1440ctacatcact gtaaacagtt
tttaacatat tatgccagcc accgtaaaac ttgcttgttc 1500caattaccac agtagctcct
ctagtggcgg ctattgactt caataatttc tgatgaaact 1560gtctatttgt catagtacta
cagatagaga caccagctac ggtgcgagct ctattctttg 1620cactaatggc atacttaaga
ttcatttgag ttatagtagg gatgacatta cgcttagtat 1680acgcgaaaag tgcatcttga
tcctcataac tcattgagtc ataataaagt ctagccttac 1740cccatttatt aaatgggaaa
ccagctgatt tatccagatt gttaacgatt acttggttgg 1800cattaataca gccaccatcg
taacaatcaa agtatttatc aacaacttca actacgaata 1860ggagttgtct gatatca
1877

SEQ ID No: 52 (length: 2051; type: DNA; organism: CORONAVIRUS)
tcaggtccaa tcttgacaaa gtacttcatt gatgtaagct caaagccatg cgcccaaagg
60acgaacacga ctctgtctga caatcctttc agtgtatcac tgagcatttg tactatctta
120atacgcacta cattccaggg caagccttta tacatgagtg gtataagatg tttaaactgg
180tcacctggtg gaggttttgc attaactctg gtgaattctg tgttattttc agtgtcaaca
240taaccagtcg gtacagctac taagttaaca cctgtagaaa atcctagctg gagaggtagg
300ttagtaccca cagcatctct agttgcatga cagccctcta catcaaagcc aatccacgca
360cgaacgtgac gaatagcttc ttcgcgggtg ataaacatat tagggtaacc attgacttgg
420taattcattt tgaaacccat catagagatg agtctacggt aggtcatgtc ctttggtatg
480cctggtatgt caacacataa tccttcagtc ttgaacttta tatcaacgct gaggtgtgta
540ggtgcctgtg taggatgaag accagtaatg atcttactac agtccttaaa aagtccagtt
600acattttctg cttgtaatgt agccacattg cgacgtggta tttctagact tgtaaattgc
660agtttgtcat aaagatctct atcagacatt atgcacaaaa tgccaatttt tgcccttgtg
720atagccacat tgaagcggtt gacattacaa gagtgtgctg tttcagtagt ttgtgtgaat
780atgacatagt catattcaga accctgtgat gaatcaacag tctgcgtagg caatcctaag
840atttttgaag ctacagcgtt ctgtgaatta taaggtgaga taaaaacagc ttttctccaa
900gcaggattgc gtgtaagaaa ttctcttaca acgcctattt gaggtctgtt gattgcagat
960gaaacatcat gtgtaataac acctttgtag aacattttga agcattgagc tgacttatcc
1020ttgtgtgctt ttagcttatt gtcataaact aaagcactca cagtgtcaac aatttcagca
1080ggacaacggc gacaagttcc aaggaacatg tctggaccta ttgttttcat aagtctgcac
1140actgaattaa aatattctgg ttctagtgtg cctttagtca gcaatgtgcg gggggctggt
1200aattgagcag gatcgccaat atagacgtag tgttttgcac gaagtctagc attgacaaca
1260ctcaagtcat aattagtagc catagagatt tcatcaaaga ctacaatgtc agcagttgtt
1320tctggcaatg catttacagt gcagaaaaca tactgttcta gtgttgaatt cactttgaat
1380ttatcaaaac actctacgcg cgcacgcgca ggtatgattc tactacattt atctatgggc
1440aaatatttta atgccttttc acatagggca tcaacagctg catgagagca tgccgtatac
1500actatgcgag cagatgggta atagagagca agtccgatgg caaaatgact cttaccagta
1560ccaggtggtc cttggagtgt agagtacttt tgcatgccga ccttttgata atttgcaaca
1620ttgctagaaa actcatctga gatgttgagt gttgggtaca agccagtaat tctcacatag
1680tgctcttgtg gcactagagt aggtgcacta agtggcatta cagtgtgaga tgtcaacaca
1740aagtaatcac caacattcaa cttgtatgtc gtagtacctc tgtacacaac agcatcacca
1800tagtcacctt tttcaaaggt gtactctcca atctgtactt tactattttt agttacacgg
1860taaccagtaa agacatagtt tctgttcaat ggtggtctag gttttccaac ctcccatgaa
1920agatgcaatt ctctgtcaga gagtacttcg cgtacagtgg caataccata tgacagctta
1980aatgtttcct cagtggcttt gagcgtttct gctgcgaaaa gcttgagtct ctcagtacaa
2040gtgttggcaa g
2051
53 2075 DNA CORONAVIRUS
53 tgcttgtagt tttgggtaga aggtttcaac atgtccatcc
ttacaccaaa gcatgaatga 60aatttcagca tagtcaattg taaccttgac cacttttgaa
atcactgaca aatcttgtga 120ctttattatc tcgacaaagt catcaagtaa aagatcaatc
acagaacaca cacattttga 180tgaacctgtt tgcgcatctg ttatgaagta atttttcact
gtgctgtcca tagggataaa 240atcctctaat ttaagtggtg aatcttgtga gcgcttggct
aagcctatca ttaaatgaag 300accgccaagt tgtccatgac tgaaatctcc ataaacgatg
tgttcgaagg catagccctc 360gagcttatat cgctgtatga attcatccat agcgagctcg
agaaagtcag tttccatttg 420tgatctgggc ttaaaatcct ctaagtctct gctctgagta
aagtaggttt caggcaactg 480ttgaataatg ccgtctactt tcttaaagta gttaaactgt
gtttttactg attctccaat 540taatgtgact ccattgacgc tagcttgtgc tggtcccttt
gaaggtgtta gacctttgac 600tgaaccttct gttattaaaa caccattacg ggcgtttcta
aaaaggtcta cctgtccttc 660cactctacca tcaaacaaga cagtaagtga agaacaagca
ctctcagtag gtttcttggc 720aatgtcagtc attgtgcaga cacctattgt agatacatgt
gctggggctt ctcttttgta 780gtcccagatt acagtattag cagcgatatc aacacccaaa
ttattgagta tcttaatctc 840tggcactggt ttaatgttac gcttagccca aagctcaaat
gcaacattaa caggaagtgt 900tgtcttattt tcaaagatct ccacatcaat accatctacc
tttgtgtaaa cagcattatt 960aatgatggaa acaggtgctt cgccggcgtg tccatcaaag
tgtcctttat taacaacatt 1020ataagccaca ttttctaaac tctgtaacct ggtaaatgta
ttccacaggt tataagtatc 1080aaattgtttg taaatccata ggctaaatcc agcagaaatc
atcatattat atgcatccaa 1140gtactgtcgg tactcatttg catggtgtct gcaaacagca
ccacctaaat tgcatcgtgt 1200aatacacgta gcagatttga gtggaacata atcaatatcc
gacactactt gtttgccatg 1260agactcacaa ggactatcag aatagtaaaa gaaaggcaat
tgctttaaat tagtaaatgc 1320acttttatcg aaagctggag tgtggaatgc atgcttattc
acatacaaac taccaccatc 1380acagcctggt aagttcaagt ttgacaagac tcttgtgtca
aacctacaca caattgcatt 1440ggctgggtaa cgatcaacgt tacaattcca aaacaaacaa
acaccatcag tgaatttatc 1500gtgatgtgta gcataagaat agaagagttc ctctattttg
taagctttgt cactacatgg 1560ctgagcatcg tagaacttcc attctacttc agcctgaggc
acacacttga tagcctttgg 1620atttccaatg tcatgaagaa ctggaaactt atcagcaagc
aatgcagact tcacaaccat 1680gtgttgtact tttctgcaag cagaattaac cctcagttca
tctcctataa tagggtattc 1740aacagaccaa tcaacgcgct taacaaagca ctcatggact
gctaaacatc tagtcatgat 1800agcatcacaa ctagccacat gtgcatttcc atgtacctgg
caatgttggt catggttact 1860ctgaaggtta cccgtaaagc cccactgctg aacatcaatc
ataaatgggt tatagacata 1920gtcaaaaccc acagaatgat tccagcaggc ataagtatct
gatgaagtag aaaagcaagt 1980tgcacgtttg tcacacagac aacacgttct ttcaggtcca
atcttgacaa agtacttcat 2040tgatgtaagc tcaaagccat gcgcccaaag gacga
2075
54 1891 DNA CORONAVIRUS
54 aagattcacc acttaaatta
gaggatttta tccctatgga cagcacagtg aaaaattact 60tcataacaga tgcgcaaaca
ggttcatcaa aatgtgtgtg ttctgtgatt gatcttttac 120ttgatgactt tgtcgagata
ataaagtcac aagatttgtc agtgatttca aaagtggtca 180aggttacaat tgactatgct
gaaatttcat tcatgctttg gtgtaaggat ggacatgttg 240aaaccttcta cccaaaacta
caagcaagtc aagcgtggca accaggtgtt gcgatgccta 300acttgtacaa gatgcaaaga
atgcttcttg aaaagtgtga ccttcagaat tatggtgaaa 360atgctgttat accaaaagga
ataatgatga atgtcgcaaa gtatactcaa ctgtgtcaat 420acttaaatac acttacttta
gctgtaccct acaacatgag agttattcac tttggtgctg 480gctctgataa aggagttgca
ccaggtacag ctgtgctcag acaatggttg ccaactggca 540cactacttgt cgattcagat
cttaatgact tcgtctccga cgcagattct actttaattg 600gagactgtgc aacagtacat
acggctaata aatgggacct tattattagc gatatgtatg 660accctaggac caaacatgtg
acaaaagaga atgactctaa agaagggttt ttcacttatc 720tgtgtggatt tataaagcaa
aaactagccc tgggtggttc tatagctgta aagataacag 780agcattcttg gaatgctgac
ctttacaagc ttatgggcca tttctcatgg tggacagctt 840ttgttacaaa tgtaaatgca
tcatcatcgg aagcattttt aattggggct aactatcttg 900gcaagccgaa ggaacaaatt
gatggctata ccatgcatgc taactacatt ttctggagga 960acacaaatcc tatccagttg
tcttcctatt cactctttga catgagcaaa tttcctctta 1020aattaagagg aactgctgta
atgtctctta aggagaatca aatcaatgat atgatttatt 1080ctcttctgga aaaaggtagg
cttatcatta gagaaaacaa cagagttgtg gtttcaagtg 1140atattcttgt taacaactaa
acgaacatgt ttattttctt attatttctt actctcacta 1200gtggtagtga ccttgaccgg
tgcaccactt ttgatgatgt tcaagctcct aattacactc 1260aacatacttc atctatgagg
ggggtttact atcctgatga aatttttaga tcagacactc 1320tttatttaac tcaggattta
tttcttccat tttattctaa tgttacaggg tttcatacta 1380ttaatcatac gtttggcaac
cctgtcatac cttttaagga tggtatttat tttgctgcca 1440cagagaaatc aaatgttgtc
cgtggttggg tttttggttc taccatgaac aacaagtcac 1500agtcggtgat tattattaac
aattctacta atgttgttat acgagcatgt aactttgaat 1560tgtgtgacaa ccctttcttt
gctgtttcta aacccatggg tacacagaca catactatga 1620tattcgataa tgcatttaat
tgcactttcg agtacatatc tgatgccttt tcgcttgatg 1680tttcagaaaa gtcaggtaat
tttaaacact tacgagagtt tgtgtttaaa aataaagatg 1740ggtttctcta tgtttataag
ggctatcaac ctatagatgt agttcgtgat ctaccttctg 1800gttttaacac tttgaaacct
atttttaagt tgcctcttgg tattaacatt acaaatttta 1860gagccattct tacagccttt
tcacctgctc a 1891
55 32 DNA artificial sequence N sense primer
55 cccatatgtc tgataatgga ccccaatcaa ac 32
56 32 DNA artificial sequence N antisense primer
56 cccccgggtg cctgagttga atcagcagaa gc 32
57 31 DNA artificial sequence Sc sense primer
57 cccatatgag tgaccttgac cggtgcacca c 31
58 30 DNA artificial sequence SL sense primer
58 cccatatgaa accttgcacc ccacctgctc 30
59 33 DNA Sc and SL antisense primer
59 cccccgggtt taatatattg ctcatatttt ccc 33
60 16 DNA Sense set 1 primer
60 ggcatcgtat gggttg 16
61 16 DNA Antisense set 2 (28774-28759) primer
61 cagtttcacc acctcc 16
62 16 DNA Sense set 2 (28375-28390) primer
62 ggctactacc gaagag 16
63 16 DNA Antisense set 2 (28702-28687) primer
63 aattaccgcg actacg 16
64 26 DNA Probe 1/set 1 (28561-28586)
64 ggcacccgca atcctaataa caatgc 26
65 21 DNA Probe 2/set 1 (28588-28608)
65 gccaccgtgc tacaacttcc t 21
66 23 DNA Probe 1/set 2/probe N/FL (28541-28563)
66 atacacccaa agaccacatt ggc 23
67 25 DNA Probe 2/set 2/probe SARS/N/LC705 (28565-28589)
67 cccgcaatcc taataacaat gctgc 25
68 30 DNA artificial sequence Anchor primer 14T
68 agatgaattc ggtacctttt tttttttttt 30
69 13 PRT artificial sequence M2-14 peptide
69 Ala Asp Asn Gly Thr Ile Thr Val Glu Glu Leu Lys Gln
1 5 10
70 12 PRT artificial sequence E1-12 peptide
70 Met Tyr Ser Phe Val Ser Glu Glu Thr Gly Thr Leu
1 5 10
71 24 PRT artificial sequence E53-72 peptide
71 Lys Pro Thr Val Tyr Val Tyr Ser Arg Val Lys Asn Leu Asn Ser Ser
1 5 10 15
Glu Gly Val Pro Asp Leu Leu Val
20
72 153 DNA CORONAVIRUS
72 gatattaggt ttttacctac ccaggaaaag ccaaccaacc tcgatctctt gtagatctgt 60
tctctaaacg aactttaaaa tctgtgtagc tgtcgctcgg ctgcatgcct agtgcaccta 120
cgcagtataa acaataataa attttactgt cgt 153
73 410 DNA CORONAVIRUS
73 ttctccagac aacttcaaaa ttccatgagt ggagcttctg ctgattcaac tcaggcataa 60
acactcatga tgaccacaca aggcagatgg gctatgtaaa cgttttcgca attccgttta 120
cgatacatag tctactcttg tgcagaatga attctcgtaa ctaaacagca caagtaggtt 180
tagttaactt taatctcaca tagcaatctt taatcaatgt gtaacattag ggaggacttg 240
aaagagccac cacattttca tcgaggccac gcggagtacg atcgagggta cagtgaataa 300
tgctagggag agctgcctat atggaagagc cctaatgtgt aaaattaatt ttagtagtgc 360
tatccccatg tgattttaat agcttcttag gagaatgaca aaaaaaaaaa 410
74 4382 PRT CORONAVIRUS
74 Met Glu Ser Leu Val Leu Gly Val Asn Glu Lys Thr His Val Gln Leu1
5 10 15Ser Leu Pro Val Leu Gln
Val Arg Asp Val Leu Val Arg Gly Phe Gly 20 25
30Asp Ser Val Glu Glu Ala Leu Ser Glu Ala Arg Glu His
Leu Lys Asn 35 40 45Gly Thr Cys
Gly Leu Val Glu Leu Glu Lys Gly Val Leu Pro Gln Leu 50
55 60Glu Gln Pro Tyr Val Phe Ile Lys Arg Ser Asp Ala
Leu Ser Thr Asn65 70 75
80His Gly His Lys Val Val Glu Leu Val Ala Glu Met Asp Gly Ile Gln
85 90 95Tyr Gly Arg Ser Gly Ile
Thr Leu Gly Val Leu Val Pro His Val Gly 100
105 110Glu Thr Pro Ile Ala Tyr Arg Asn Val Leu Leu Arg
Lys Asn Gly Asn 115 120 125Lys Gly
Ala Gly Gly His Ser Tyr Gly Ile Asp Leu Lys Ser Tyr Asp 130
135 140Leu Gly Asp Glu Leu Gly Thr Asp Pro Ile Glu
Asp Tyr Glu Gln Asn145 150 155
160Trp Asn Thr Lys His Gly Ser Gly Ala Leu Arg Glu Leu Thr Arg Glu
165 170 175Leu Asn Gly Gly
Ala Val Thr Arg Tyr Val Asp Asn Asn Phe Cys Gly 180
185 190Pro Asp Gly Tyr Pro Leu Asp Cys Ile Lys Asp
Phe Leu Ala Arg Ala 195 200 205Gly
Lys Ser Met Cys Thr Leu Ser Glu Gln Leu Asp Tyr Ile Glu Ser 210
215 220Lys Arg Gly Val Tyr Cys Cys Arg Asp His
Glu His Glu Ile Ala Trp225 230 235
240Phe Thr Glu Arg Ser Asp Lys Ser Tyr Glu His Gln Thr Pro Phe
Glu 245 250 255Ile Lys Ser
Ala Lys Lys Phe Asp Thr Phe Lys Gly Glu Cys Pro Lys 260
265 270Phe Val Phe Pro Leu Asn Ser Lys Val Lys
Val Ile Gln Pro Arg Val 275 280
285Glu Lys Lys Lys Thr Glu Gly Phe Met Gly Arg Ile Arg Ser Val Tyr 290
295 300Pro Val Ala Ser Pro Gln Glu Cys
Asn Asn Met His Leu Ser Thr Leu305 310
315 320Met Lys Cys Asn His Cys Asp Glu Val Ser Trp Gln
Thr Cys Asp Phe 325 330
335Leu Lys Ala Thr Cys Glu His Cys Gly Thr Glu Asn Leu Val Ile Glu
340 345 350Gly Pro Thr Thr Cys Gly
Tyr Leu Pro Thr Asn Ala Val Val Lys Met 355 360
365Pro Cys Pro Ala Cys Gln Asp Pro Glu Ile Gly Pro Glu His
Ser Val 370 375 380Ala Asp Tyr His Asn
His Ser Asn Ile Glu Thr Arg Leu Arg Lys Gly385 390
395 400Gly Arg Thr Arg Cys Phe Gly Gly Cys Val
Phe Ala Tyr Val Gly Cys 405 410
415Tyr Asn Lys Arg Ala Tyr Trp Val Pro Arg Ala Ser Ala Asp Ile Gly
420 425 430Ser Gly His Thr Gly
Ile Thr Gly Asp Asn Val Glu Thr Leu Asn Glu 435
440 445Asp Leu Leu Glu Ile Leu Ser Arg Glu Arg Val Asn
Ile Asn Ile Val 450 455 460Gly Asp Phe
His Leu Asn Glu Glu Val Ala Ile Ile Leu Ala Ser Phe465
470 475 480Ser Ala Ser Thr Ser Ala Phe
Ile Asp Thr Ile Lys Ser Leu Asp Tyr 485
490 495Lys Ser Phe Lys Thr Ile Val Glu Ser Cys Gly Asn
Tyr Lys Val Thr 500 505 510Lys
Gly Lys Pro Val Lys Gly Ala Trp Asn Ile Gly Gln Gln Arg Ser 515
520 525Val Leu Thr Pro Leu Cys Gly Phe Pro
Ser Gln Ala Ala Gly Val Ile 530 535
540Arg Ser Ile Phe Ala Arg Thr Leu Asp Ala Ala Asn His Ser Ile Pro545
550 555 560Asp Leu Gln Arg
Ala Ala Val Thr Ile Leu Asp Gly Ile Ser Glu Gln 565
570 575Ser Leu Arg Leu Val Asp Ala Met Val Tyr
Thr Ser Asp Leu Leu Thr 580 585
590Asn Ser Val Ile Ile Met Ala Tyr Val Thr Gly Gly Leu Val Gln Gln
595 600 605Thr Ser Gln Trp Leu Ser Asn
Leu Leu Gly Thr Thr Val Glu Lys Leu 610 615
620Arg Pro Ile Phe Glu Trp Ile Glu Ala Lys Leu Ser Ala Gly Val
Glu625 630 635 640Phe Leu
Lys Asp Ala Trp Glu Ile Leu Lys Phe Leu Ile Thr Gly Val
645 650 655Phe Asp Ile Val Lys Gly Gln
Ile Gln Val Ala Ser Asp Asn Ile Lys 660 665
670Asp Cys Val Lys Cys Phe Ile Asp Val Val Asn Lys Ala Leu
Glu Met 675 680 685Cys Ile Asp Gln
Val Thr Ile Ala Gly Ala Lys Leu Arg Ser Leu Asn 690
695 700Leu Gly Glu Val Phe Ile Ala Gln Ser Lys Gly Leu
Tyr Arg Gln Cys705 710 715
720Ile Arg Gly Lys Glu Gln Leu Gln Leu Leu Met Pro Leu Lys Ala Pro
725 730 735Lys Glu Val Thr Phe
Leu Glu Gly Asp Ser His Asp Thr Val Leu Thr 740
745 750Ser Glu Glu Val Val Leu Lys Asn Gly Glu Leu Glu
Ala Leu Glu Thr 755 760 765Pro Val
Asp Ser Phe Thr Asn Gly Ala Ile Val Gly Thr Pro Val Cys 770
775 780Val Asn Gly Leu Met Leu Leu Glu Ile Lys Asp
Lys Glu Gln Tyr Cys785 790 795
800Ala Leu Ser Pro Gly Leu Leu Ala Thr Asn Asn Val Phe Arg Leu Lys
805 810 815Gly Gly Ala Pro
Ile Lys Gly Val Thr Phe Gly Glu Asp Thr Val Trp 820
825 830Glu Val Gln Gly Tyr Lys Asn Val Arg Ile Thr
Phe Glu Leu Asp Glu 835 840 845Arg
Val Asp Lys Val Leu Asn Glu Lys Cys Ser Val Tyr Thr Val Glu 850
855 860Ser Gly Thr Glu Val Thr Glu Phe Ala Cys
Val Val Ala Glu Ala Val865 870 875
880Val Lys Thr Leu Gln Pro Val Ser Asp Leu Leu Thr Asn Met Gly
Ile 885 890 895Asp Leu Asp
Glu Trp Ser Val Ala Thr Phe Tyr Leu Phe Asp Asp Ala 900
905 910Gly Glu Glu Asn Phe Ser Ser Arg Met Tyr
Cys Ser Phe Tyr Pro Pro 915 920
925Asp Glu Glu Glu Glu Asp Asp Ala Glu Cys Glu Glu Glu Glu Ile Asp 930
935 940Glu Thr Cys Glu His Glu Tyr Gly
Thr Glu Asp Asp Tyr Gln Gly Leu945 950
955 960Pro Leu Glu Phe Gly Ala Ser Ala Glu Thr Val Arg
Val Glu Glu Glu 965 970
975Glu Glu Glu Asp Trp Leu Asp Asp Thr Thr Glu Gln Ser Glu Ile Glu
980 985 990Pro Glu Pro Glu Pro Thr
Pro Glu Glu Pro Val Asn Gln Phe Thr Gly 995 1000
1005Tyr Leu Lys Leu Thr Asp Asn Val Ala Ile Lys Cys Val
Asp Ile 1010 1015 1020Val Lys Glu Ala
Gln Ser Ala Asn Pro Met Val Ile Val Asn Ala 1025
1030 1035Ala Asn Ile His Leu Lys His Gly Gly Gly Val
Ala Gly Ala Leu 1040 1045 1050Asn Lys
Ala Thr Asn Gly Ala Met Gln Lys Glu Ser Asp Asp Tyr 1055
1060 1065Ile Lys Leu Asn Gly Pro Leu Thr Val Gly
Gly Ser Cys Leu Leu 1070 1075 1080Ser
Gly His Asn Leu Ala Lys Lys Cys Leu His Val Val Gly Pro 1085
1090 1095Asn Leu Asn Ala Gly Glu Asp Ile Gln
Leu Leu Lys Ala Ala Tyr 1100 1105
1110Glu Asn Phe Asn Ser Gln Asp Ile Leu Leu Ala Pro Leu Leu Ser
1115 1120 1125Ala Gly Ile Phe Gly Ala
Lys Pro Leu Gln Ser Leu Gln Val Cys 1130 1135
1140Val Gln Thr Val Arg Thr Gln Val Tyr Ile Ala Val Asn Asp
Lys 1145 1150 1155Ala Leu Tyr Glu Gln
Val Val Met Asp Tyr Leu Asp Asn Leu Lys 1160 1165
1170Pro Arg Val Glu Ala Pro Lys Gln Glu Glu Pro Pro Asn
Thr Glu 1175 1180 1185Asp Ser Lys Thr
Glu Glu Lys Ser Val Val Gln Lys Pro Val Asp 1190
1195 1200Val Lys Pro Lys Ile Lys Ala Cys Ile Asp Glu
Val Thr Thr Thr 1205 1210 1215Leu Glu
Glu Thr Lys Phe Leu Thr Asn Lys Leu Leu Leu Phe Ala 1220
1225 1230Asp Ile Asn Gly Lys Leu Tyr His Asp Ser
Gln Asn Met Leu Arg 1235 1240 1245Gly
Glu Asp Met Ser Phe Leu Glu Lys Asp Ala Pro Tyr Met Val 1250
1255 1260Gly Asp Val Ile Thr Ser Gly Asp Ile
Thr Cys Val Val Ile Pro 1265 1270
1275Ser Lys Lys Ala Gly Gly Thr Thr Glu Met Leu Ser Arg Ala Leu
1280 1285 1290Lys Lys Val Pro Val Asp
Glu Tyr Ile Thr Thr Tyr Pro Gly Gln 1295 1300
1305Gly Cys Ala Gly Tyr Thr Leu Glu Glu Ala Lys Thr Ala Leu
Lys 1310 1315 1320Lys Cys Lys Ser Ala
Phe Tyr Val Leu Pro Ser Glu Ala Pro Asn 1325 1330
1335Ala Lys Glu Glu Ile Leu Gly Thr Val Ser Trp Asn Leu
Arg Glu 1340 1345 1350Met Leu Ala His
Ala Glu Glu Thr Arg Lys Leu Met Pro Ile Cys 1355
1360 1365Met Asp Val Arg Ala Ile Met Ala Thr Ile Gln
Arg Lys Tyr Lys 1370 1375 1380Gly Ile
Lys Ile Gln Glu Gly Ile Val Asp Tyr Gly Val Arg Phe 1385
1390 1395Phe Phe Tyr Thr Ser Lys Glu Pro Val Ala
Ser Ile Ile Thr Lys 1400 1405 1410Leu
Asn Ser Leu Asn Glu Pro Leu Val Thr Met Pro Ile Gly Tyr 1415
1420 1425Val Thr His Gly Phe Asn Leu Glu Glu
Ala Ala Arg Cys Met Arg 1430 1435
1440Ser Leu Lys Ala Pro Ala Val Val Ser Val Ser Ser Pro Asp Ala
1445 1450 1455Val Thr Thr Tyr Asn Gly
Tyr Leu Thr Ser Ser Ser Lys Thr Ser 1460 1465
1470Glu Glu His Phe Val Glu Thr Val Ser Leu Ala Gly Ser Tyr
Arg 1475 1480 1485Asp Trp Ser Tyr Ser
Gly Gln Arg Thr Glu Leu Gly Val Glu Phe 1490 1495
1500Leu Lys Arg Gly Asp Lys Ile Val Tyr His Thr Leu Glu
Ser Pro 1505 1510 1515Val Glu Phe His
Leu Asp Gly Glu Val Leu Ser Leu Asp Lys Leu 1520
1525 1530Lys Ser Leu Leu Ser Leu Arg Glu Val Lys Thr
Ile Lys Val Phe 1535 1540 1545Thr Thr
Val Asp Asn Thr Asn Leu His Thr Gln Leu Val Asp Met 1550
1555 1560Ser Met Thr Tyr Gly Gln Gln Phe Gly Pro
Thr Tyr Leu Asp Gly 1565 1570 1575Ala
Asp Val Thr Lys Ile Lys Pro His Val Asn His Glu Gly Lys 1580
1585 1590Thr Phe Phe Val Leu Pro Ser Asp Asp
Thr Leu Arg Ser Glu Ala 1595 1600
1605Phe Glu Tyr Tyr His Thr Leu Asp Glu Ser Phe Leu Gly Arg Tyr
1610 1615 1620Met Ser Ala Leu Asn His
Thr Lys Lys Trp Lys Phe Pro Gln Val 1625 1630
1635Gly Gly Leu Thr Ser Ile Lys Trp Ala Asp Asn Asn Cys Tyr
Leu 1640 1645 1650Ser Ser Val Leu Leu
Ala Leu Gln Gln Leu Glu Val Lys Phe Asn 1655 1660
1665Ala Pro Ala Leu Gln Glu Ala Tyr Tyr Arg Ala Arg Ala
Gly Asp 1670 1675 1680Ala Ala Asn Phe
Cys Ala Leu Ile Leu Ala Tyr Ser Asn Lys Thr 1685
1690 1695Val Gly Glu Leu Gly Asp Val Arg Glu Thr Met
Thr His Leu Leu 1700 1705 1710Gln His
Ala Asn Leu Glu Ser Ala Lys Arg Val Leu Asn Val Val 1715
1720 1725Cys Lys His Cys Gly Gln Lys Thr Thr Thr
Leu Thr Gly Val Glu 1730 1735 1740Ala
Val Met Tyr Met Gly Thr Leu Ser Tyr Asp Asn Leu Lys Thr 1745
1750 1755Gly Val Ser Ile Pro Cys Val Cys Gly
Arg Asp Ala Thr Gln Tyr 1760 1765
1770Leu Val Gln Gln Glu Ser Ser Phe Val Met Met Ser Ala Pro Pro
1775 1780 1785Ala Glu Tyr Lys Leu Gln
Gln Gly Thr Phe Leu Cys Ala Asn Glu 1790 1795
1800Tyr Thr Gly Asn Tyr Gln Cys Gly His Tyr Thr His Ile Thr
Ala 1805 1810 1815Lys Glu Thr Leu Tyr
Arg Ile Asp Gly Ala His Leu Thr Lys Met 1820 1825
1830Ser Glu Tyr Lys Gly Pro Val Thr Asp Val Phe Tyr Lys
Glu Thr 1835 1840 1845Ser Tyr Thr Thr
Thr Ile Lys Pro Val Ser Tyr Lys Leu Asp Gly 1850
1855 1860Val Thr Tyr Thr Glu Ile Glu Pro Lys Leu Asp
Gly Tyr Tyr Lys 1865 1870 1875Lys Asp
Asn Ala Tyr Tyr Thr Glu Gln Pro Ile Asp Leu Val Pro 1880
1885 1890Thr Gln Pro Leu Pro Asn Ala Ser Phe Asp
Asn Phe Lys Leu Thr 1895 1900 1905Cys
Ser Asn Thr Lys Phe Ala Asp Asp Leu Asn Gln Met Thr Gly 1910
1915 1920Phe Thr Lys Pro Ala Ser Arg Glu Leu
Ser Val Thr Phe Phe Pro 1925 1930
1935Asp Leu Asn Gly Asp Val Val Ala Ile Asp Tyr Arg His Tyr Ser
1940 1945 1950Ala Ser Phe Lys Lys Gly
Ala Lys Leu Leu His Lys Pro Ile Val 1955 1960
1965Trp His Ile Asn Gln Ala Thr Thr Lys Thr Thr Phe Lys Pro
Asn 1970 1975 1980Thr Trp Cys Leu Arg
Cys Leu Trp Ser Thr Lys Pro Val Asp Thr 1985 1990
1995Ser Asn Ser Phe Glu Val Leu Ala Val Glu Asp Thr Gln
Gly Met 2000 2005 2010Asp Asn Leu Ala
Cys Glu Ser Gln Gln Pro Thr Ser Glu Glu Val 2015
2020 2025Val Glu Asn Pro Thr Ile Gln Lys Glu Val Ile
Glu Cys Asp Val 2030 2035 2040Lys Thr
Thr Glu Val Val Gly Asn Val Ile Leu Lys Pro Ser Asp 2045
2050 2055Glu Gly Val Lys Val Thr Gln Glu Leu Gly
His Glu Asp Leu Met 2060 2065 2070Ala
Ala Tyr Val Glu Asn Thr Ser Ile Thr Ile Lys Lys Pro Asn 2075
2080 2085Glu Leu Ser Leu Ala Leu Gly Leu Lys
Thr Ile Ala Thr His Gly 2090 2095
2100Ile Ala Ala Ile Asn Ser Val Pro Trp Ser Lys Ile Leu Ala Tyr
2105 2110 2115Val Lys Pro Phe Leu Gly
Gln Ala Ala Ile Thr Thr Ser Asn Cys 2120 2125
2130Ala Lys Arg Leu Ala Gln Arg Val Phe Asn Asn Tyr Met Pro
Tyr 2135 2140 2145Val Phe Thr Leu Leu
Phe Gln Leu Cys Thr Phe Thr Lys Ser Thr 2150 2155
2160Asn Ser Arg Ile Arg Ala Ser Leu Pro Thr Thr Ile Ala
Lys Asn 2165 2170 2175Ser Val Lys Ser
Val Ala Lys Leu Cys Leu Asp Ala Gly Ile Asn 2180
2185 2190Tyr Val Lys Ser Pro Lys Phe Ser Lys Leu Phe
Thr Ile Ala Met 2195 2200 2205Trp Leu
Leu Leu Leu Ser Ile Cys Leu Gly Ser Leu Ile Cys Val 2210
2215 2220Thr Ala Ala Phe Gly Val Leu Leu Ser Asn
Phe Gly Ala Pro Ser 2225 2230 2235Tyr
Cys Asn Gly Val Arg Glu Leu Tyr Leu Asn Ser Ser Asn Val 2240
2245 2250Thr Thr Met Asp Phe Cys Glu Gly Ser
Phe Pro Cys Ser Ile Cys 2255 2260
2265Leu Ser Gly Leu Asp Ser Leu Asp Ser Tyr Pro Ala Leu Glu Thr
2270 2275 2280Ile Gln Val Thr Ile Ser
Ser Tyr Lys Leu Asp Leu Thr Ile Leu 2285 2290
2295Gly Leu Ala Ala Glu Trp Val Leu Ala Tyr Met Leu Phe Thr
Lys 2300 2305 2310Phe Phe Tyr Leu Leu
Gly Leu Ser Ala Ile Met Gln Val Phe Phe 2315 2320
2325Gly Tyr Phe Ala Ser His Phe Ile Ser Asn Ser Trp Leu
Met Trp 2330 2335 2340Phe Ile Ile Ser
Ile Val Gln Met Ala Pro Val Ser Ala Met Val 2345
2350 2355Arg Met Tyr Ile Phe Phe Ala Ser Phe Tyr Tyr
Ile Trp Lys Ser 2360 2365 2370Tyr Val
His Ile Met Asp Gly Cys Thr Ser Ser Thr Cys Met Met 2375
2380 2385Cys Tyr Lys Arg Asn Arg Ala Thr Arg Val
Glu Cys Thr Thr Ile 2390 2395 2400Val
Asn Gly Met Lys Arg Ser Phe Tyr Val Tyr Ala Asn Gly Gly 2405
2410 2415Arg Gly Phe Cys Lys Thr His Asn Trp
Asn Cys Leu Asn Cys Asp 2420 2425
2430Thr Phe Cys Thr Gly Ser Thr Phe Ile Ser Asp Glu Val Ala Arg
2435 2440 2445Asp Leu Ser Leu Gln Phe
Lys Arg Pro Ile Asn Pro Thr Asp Gln 2450 2455
2460Ser Ser Tyr Ile Val Asp Ser Val Ala Val Lys Asn Gly Ala
Leu 2465 2470 2475His Leu Tyr Phe Asp
Lys Ala Gly Gln Lys Thr Tyr Glu Arg His 2480 2485
2490Pro Leu Ser His Phe Val Asn Leu Asp Asn Leu Arg Ala
Asn Asn 2495 2500 2505Thr Lys Gly Ser
Leu Pro Ile Asn Val Ile Val Phe Asp Gly Lys 2510
2515 2520Ser Lys Cys Asp Glu Ser Ala Ser Lys Ser Ala
Ser Val Tyr Tyr 2525 2530 2535Ser Gln
Leu Met Cys Gln Pro Ile Leu Leu Leu Asp Gln Ala Leu 2540
2545 2550Val Ser Asp Val Gly Asp Ser Thr Glu Val
Ser Val Lys Met Phe 2555 2560 2565Asp
Ala Tyr Val Asp Thr Phe Ser Ala Thr Phe Ser Val Pro Met 2570
2575 2580Glu Lys Leu Lys Ala Leu Val Ala Thr
Ala His Ser Glu Leu Ala 2585 2590
2595Lys Gly Val Ala Leu Asp Gly Val Leu Ser Thr Phe Val Ser Ala
2600 2605 2610Ala Arg Gln Gly Val Val
Asp Thr Asp Val Asp Thr Lys Asp Val 2615 2620
2625Ile Glu Cys Leu Lys Leu Ser His His Ser Asp Leu Glu Val
Thr 2630 2635 2640Gly Asp Ser Cys Asn
Asn Phe Met Leu Thr Tyr Asn Lys Val Glu 2645 2650
2655Asn Met Thr Pro Arg Asp Leu Gly Ala Cys Ile Asp Cys
Asn Ala 2660 2665 2670Arg His Ile Asn
Ala Gln Val Ala Lys Ser His Asn Val Ser Leu 2675
2680 2685Ile Trp Asn Val Lys Asp Tyr Met Ser Leu Ser
Glu Gln Leu Arg 2690 2695 2700Lys Gln
Ile Arg Ser Ala Ala Lys Lys Asn Asn Ile Pro Phe Arg 2705
2710 2715Leu Thr Cys Ala Thr Thr Arg Gln Val Val
Asn Val Ile Thr Thr 2720 2725 2730Lys
Ile Ser Leu Lys Gly Gly Lys Ile Val Ser Thr Cys Phe Lys 2735
2740 2745Leu Met Leu Lys Ala Thr Leu Leu Cys
Val Leu Ala Ala Leu Val 2750 2755
2760Cys Tyr Ile Val Met Pro Val His Thr Leu Ser Ile His Asp Gly
2765 2770 2775Tyr Thr Asn Glu Ile Ile
Gly Tyr Lys Ala Ile Gln Asp Gly Val 2780 2785
2790Thr Arg Asp Ile Ile Ser Thr Asp Asp Cys Phe Ala Asn Lys
His 2795 2800 2805Ala Gly Phe Asp Ala
Trp Phe Ser Gln Arg Gly Gly Ser Tyr Lys 2810 2815
2820Asn Asp Lys Ser Cys Pro Val Val Ala Ala Ile Ile Thr
Arg Glu 2825 2830 2835Ile Gly Phe Ile
Val Pro Gly Leu Pro Gly Thr Val Leu Arg Ala 2840
2845 2850Ile Asn Gly Asp Phe Leu His Phe Leu Pro Arg
Val Phe Ser Ala 2855 2860 2865Val Gly
Asn Ile Cys Tyr Thr Pro Ser Lys Leu Ile Glu Tyr Ser 2870
2875 2880Asp Phe Ala Thr Ser Ala Cys Val Leu Ala
Ala Glu Cys Thr Ile 2885 2890 2895Phe
Lys Asp Ala Met Gly Lys Pro Val Pro Tyr Cys Tyr Asp Thr 2900
2905 2910Asn Leu Leu Glu Gly Ser Ile Ser Tyr
Ser Glu Leu Arg Pro Asp 2915 2920
2925Thr Arg Tyr Val Leu Met Asp Gly Ser Ile Ile Gln Phe Pro Asn
2930 2935 2940Thr Tyr Leu Glu Gly Ser
Val Arg Val Val Thr Thr Phe Asp Ala 2945 2950
2955Glu Tyr Cys Arg His Gly Thr Cys Glu Arg Ser Glu Val Gly
Ile 2960 2965 2970Cys Leu Ser Thr Ser
Gly Arg Trp Val Leu Asn Asn Glu His Tyr 2975 2980
2985Arg Ala Leu Ser Gly Val Phe Cys Gly Val Asp Ala Met
Asn Leu 2990 2995 3000Ile Ala Asn Ile
Phe Thr Pro Leu Val Gln Pro Val Gly Ala Leu 3005
3010 3015Asp Val Ser Ala Ser Val Val Ala Gly Gly Ile
Ile Ala Ile Leu 3020 3025 3030Val Thr
Cys Ala Ala Tyr Tyr Phe Met Lys Phe Arg Arg Val Phe 3035
3040 3045Gly Glu Tyr Asn His Val Val Ala Ala Asn
Ala Leu Leu Phe Leu 3050 3055 3060Met
Ser Phe Thr Ile Leu Cys Leu Val Pro Ala Tyr Ser Phe Leu 3065
3070 3075Pro Gly Val Tyr Ser Val Phe Tyr Leu
Tyr Leu Thr Phe Tyr Phe 3080 3085
3090Thr Asn Asp Val Ser Phe Leu Ala His Leu Gln Trp Phe Ala Met
3095 3100 3105Phe Ser Pro Ile Val Pro
Phe Trp Ile Thr Ala Ile Tyr Val Phe 3110 3115
3120Cys Ile Ser Leu Lys His Cys His Trp Phe Phe Asn Asn Tyr
Leu 3125 3130 3135Arg Lys Arg Val Met
Phe Asn Gly Val Thr Phe Ser Thr Phe Glu 3140 3145
3150Glu Ala Ala Leu Cys Thr Phe Leu Leu Asn Lys Glu Met
Tyr Leu 3155 3160 3165Lys Leu Arg Ser
Glu Thr Leu Leu Pro Leu Thr Gln Tyr Asn Arg 3170
3175 3180Tyr Leu Ala Leu Tyr Asn Lys Tyr Lys Tyr Phe
Ser Gly Ala Leu 3185 3190 3195Asp Thr
Thr Ser Tyr Arg Glu Ala Ala Cys Cys His Leu Ala Lys 3200
3205 3210Ala Leu Asn Asp Phe Ser Asn Ser Gly Ala
Asp Val Leu Tyr Gln 3215 3220 3225Pro
Pro Gln Thr Ser Ile Thr Ser Ala Val Leu Gln Ser Gly Phe 3230
3235 3240Arg Lys Met Ala Phe Pro Ser Gly Lys
Val Glu Gly Cys Met Val 3245 3250
3255Gln Val Thr Cys Gly Thr Thr Thr Leu Asn Gly Leu Trp Leu Asp
3260 3265 3270Asp Thr Val Tyr Cys Pro
Arg His Val Ile Cys Thr Ala Glu Asp 3275 3280
3285Met Leu Asn Pro Asn Tyr Glu Asp Leu Leu Ile Arg Lys Ser
Asn 3290 3295 3300His Ser Phe Leu Val
Gln Ala Gly Asn Val Gln Leu Arg Val Ile 3305 3310
3315Gly His Ser Met Gln Asn Cys Leu Leu Arg Leu Lys Val
Asp Thr 3320 3325 3330Ser Asn Pro Lys
Thr Pro Lys Tyr Lys Phe Val Arg Ile Gln Pro 3335
3340 3345Gly Gln Thr Phe Ser Val Leu Ala Cys Tyr Asn
Gly Ser Pro Ser 3350 3355 3360Gly Val
Tyr Gln Cys Ala Met Arg Pro Asn His Thr Ile Lys Gly 3365
3370 3375Ser Phe Leu Asn Gly Ser Cys Gly Ser Val
Gly Phe Asn Ile Asp 3380 3385 3390Tyr
Asp Cys Val Ser Phe Cys Tyr Met His His Met Glu Leu Pro 3395
3400 3405Thr Gly Val His Ala Gly Thr Asp Leu
Glu Gly Lys Phe Tyr Gly 3410 3415
3420Pro Phe Val Asp Arg Gln Thr Ala Gln Ala Ala Gly Thr Asp Thr
3425 3430 3435Thr Ile Thr Leu Asn Val
Leu Ala Trp Leu Tyr Ala Ala Val Ile 3440 3445
3450Asn Gly Asp Arg Trp Phe Leu Asn Arg Phe Thr Thr Thr Leu
Asn 3455 3460 3465Asp Phe Asn Leu Val
Ala Met Lys Tyr Asn Tyr Glu Pro Leu Thr 3470 3475
3480Gln Asp His Val Asp Ile Leu Gly Pro Leu Ser Ala Gln
Thr Gly 3485 3490 3495Ile Ala Val Leu
Asp Met Cys Ala Ala Leu Lys Glu Leu Leu Gln 3500
3505 3510Asn Gly Met Asn Gly Arg Thr Ile Leu Gly Ser
Thr Ile Leu Glu 3515 3520 3525Asp Glu
Phe Thr Pro Phe Asp Val Val Arg Gln Cys Ser Gly Val 3530
3535 3540Thr Phe Gln Gly Lys Phe Lys Lys Ile Val
Lys Gly Thr His His 3545 3550 3555Trp
Met Leu Leu Thr Phe Leu Thr Ser Leu Leu Ile Leu Val Gln 3560
3565 3570Ser Thr Gln Trp Ser Leu Phe Phe Phe
Val Tyr Glu Asn Ala Phe 3575 3580
3585Leu Pro Phe Thr Leu Gly Ile Met Ala Ile Ala Ala Cys Ala Met
3590 3595 3600Leu Leu Val Lys His Lys
His Ala Phe Leu Cys Leu Phe Leu Leu 3605 3610
3615Pro Ser Leu Ala Thr Val Ala Tyr Phe Asn Met Val Tyr Met
Pro 3620 3625 3630Ala Ser Trp Val Met
Arg Ile Met Thr Trp Leu Glu Leu Ala Asp 3635 3640
3645Thr Ser Leu Ser Gly Tyr Arg Leu Lys Asp Cys Val Met
Tyr Ala 3650 3655 3660Ser Ala Leu Val
Leu Leu Ile Leu Met Thr Ala Arg Thr Val Tyr 3665
3670 3675Asp Asp Ala Ala Arg Arg Val Trp Thr Leu Met
Asn Val Ile Thr 3680 3685 3690Leu Val
Tyr Lys Val Tyr Tyr Gly Asn Ala Leu Asp Gln Ala Ile 3695
3700 3705Ser Met Trp Ala Leu Val Ile Ser Val Thr
Ser Asn Tyr Ser Gly 3710 3715 3720Val
Val Thr Thr Ile Met Phe Leu Ala Arg Ala Ile Val Phe Val 3725
3730 3735Cys Val Glu Tyr Tyr Pro Leu Leu Phe
Ile Thr Gly Asn Thr Leu 3740 3745
3750Gln Cys Ile Met Leu Val Tyr Cys Phe Leu Gly Tyr Cys Cys Cys
3755 3760 3765Cys Tyr Phe Gly Leu Phe
Cys Leu Leu Asn Arg Tyr Phe Arg Leu 3770 3775
3780Thr Leu Gly Val Tyr Asp Tyr Leu Val Ser Thr Gln Glu Phe
Arg 3785 3790 3795Tyr Met Asn Ser Gln
Gly Leu Leu Pro Pro Lys Ser Ser Ile Asp 3800 3805
3810Ala Phe Lys Leu Asn Ile Lys Leu Leu Gly Ile Gly Gly
Lys Pro 3815 3820 3825Cys Ile Lys Val
Ala Thr Val Gln Ser Lys Met Ser Asp Val Lys 3830
3835 3840Cys Thr Ser Val Val Leu Leu Ser Val Leu Gln
Gln Leu Arg Val 3845 3850 3855Glu Ser
Ser Ser Lys Leu Trp Ala Gln Cys Val Gln Leu His Asn 3860
3865 3870Asp Ile Leu Leu Ala Lys Asp Thr Thr Glu
Ala Phe Glu Lys Met 3875 3880 3885Val
Ser Leu Leu Ser Val Leu Leu Ser Met Gln Gly Ala Val Asp 3890
3895 3900Ile Asn Arg Leu Cys Glu Glu Met Leu
Asp Asn Arg Ala Thr Leu 3905 3910
3915Gln Ala Ile Ala Ser Glu Phe Ser Ser Leu Pro Ser Tyr Ala Ala
3920 3925 3930Tyr Ala Thr Ala Gln Glu
Ala Tyr Glu Gln Ala Val Ala Asn Gly 3935 3940
3945Asp Ser Glu Val Val Leu Lys Lys Leu Lys Lys Ser Leu Asn
Val 3950 3955 3960Ala Lys Ser Glu Phe
Asp Arg Asp Ala Ala Met Gln Arg Lys Leu 3965 3970
3975Glu Lys Met Ala Asp Gln Ala Met Thr Gln Met Tyr Lys
Gln Ala 3980 3985 3990Arg Ser Glu Asp
Lys Arg Ala Lys Val Thr Ser Ala Met Gln Thr 3995
4000 4005Met Leu Phe Thr Met Leu Arg Lys Leu Asp Asn
Asp Ala Leu Asn 4010 4015 4020Asn Ile
Ile Asn Asn Ala Arg Asp Gly Cys Val Pro Leu Asn Ile 4025
4030 4035Ile Pro Leu Thr Thr Ala Ala Lys Leu Met
Val Val Val Pro Asp 4040 4045 4050Tyr
Gly Thr Tyr Lys Asn Thr Cys Asp Gly Asn Thr Phe Thr Tyr 4055
4060 4065Ala Ser Ala Leu Trp Glu Ile Gln Gln
Val Val Asp Ala Asp Ser 4070 4075
4080Lys Ile Val Gln Leu Ser Glu Ile Asn Met Asp Asn Ser Pro Asn
4085 4090 4095Leu Ala Trp Pro Leu Ile
Val Thr Ala Leu Arg Ala Asn Ser Ala 4100 4105
4110Val Lys Leu Gln Asn Asn Glu Leu Ser Pro Val Ala Leu Arg
Gln 4115 4120 4125Met Ser Cys Ala Ala
Gly Thr Thr Gln Thr Ala Cys Thr Asp Asp 4130 4135
4140Asn Ala Leu Ala Tyr Tyr Asn Asn Ser Lys Gly Gly Arg
Phe Val 4145 4150 4155Leu Ala Leu Leu
Ser Asp His Gln Asp Leu Lys Trp Ala Arg Phe 4160
4165 4170Pro Lys Ser Asp Gly Thr Gly Thr Ile Tyr Thr
Glu Leu Glu Pro 4175 4180 4185Pro Cys
Arg Phe Val Thr Asp Thr Pro Lys Gly Pro Lys Val Lys 4190
4195 4200Tyr Leu Tyr Phe Ile Lys Gly Leu Asn Asn
Leu Asn Arg Gly Met 4205 4210 4215Val
Leu Gly Ser Leu Ala Ala Thr Val Arg Leu Gln Ala Gly Asn 4220
4225 4230Ala Thr Glu Val Pro Ala Asn Ser Thr
Val Leu Ser Phe Cys Ala 4235 4240
4245Phe Ala Val Asp Pro Ala Lys Ala Tyr Lys Asp Tyr Leu Ala Ser
4250 4255 4260Gly Gly Gln Pro Ile Thr
Asn Cys Val Lys Met Leu Cys Thr His 4265 4270
4275Thr Gly Thr Gly Gln Ala Ile Thr Val Thr Pro Glu Ala Asn
Met 4280 4285 4290Asp Gln Glu Ser Phe
Gly Gly Ala Ser Cys Cys Leu Tyr Cys Arg 4295 4300
4305Cys His Ile Asp His Pro Asn Pro Lys Gly Phe Cys Asp
Leu Lys 4310 4315 4320Gly Lys Tyr Val
Gln Ile Pro Thr Thr Cys Ala Asn Asp Pro Val 4325
4330 4335Gly Phe Thr Leu Arg Asn Thr Val Cys Thr Val
Cys Gly Met Trp 4340 4345 4350Lys Gly
Tyr Gly Cys Ser Cys Asp Gln Leu Arg Glu Pro Leu Met 4355
4360 4365Gln Ser Ala Asp Ala Ser Thr Phe Leu Asn
Gly Phe Ala Val 4370 4375
4380
75 2695 PRT CORONAVIRUS
75 Arg Val Cys Gly Val Ser Ala Ala Arg Leu Thr
Pro Cys Gly Thr Gly1 5 10
15Thr Ser Thr Asp Val Val Tyr Arg Ala Phe Asp Ile Tyr Asn Glu Lys
20 25 30Val Ala Gly Phe Ala Lys Phe
Leu Lys Thr Asn Cys Cys Arg Phe Gln 35 40
45Glu Lys Asp Glu Glu Gly Asn Leu Leu Asp Ser Tyr Phe Val Val
Lys 50 55 60Arg His Thr Met Ser Asn
Tyr Gln His Glu Glu Thr Ile Tyr Asn Leu65 70
75 80Val Lys Asp Cys Pro Ala Val Ala Val His Asp
Phe Phe Lys Phe Arg 85 90
95Val Asp Gly Asp Met Val Pro His Ile Ser Arg Gln Arg Leu Thr Lys
100 105 110Tyr Thr Met Ala Asp Leu
Val Tyr Ala Leu Arg His Phe Asp Glu Gly 115 120
125Asn Cys Asp Thr Leu Lys Glu Ile Leu Val Thr Tyr Asn Cys
Cys Asp 130 135 140Asp Asp Tyr Phe Asn
Lys Lys Asp Trp Tyr Asp Phe Val Glu Asn Pro145 150
155 160Asp Ile Leu Arg Val Tyr Ala Asn Leu Gly
Glu Arg Val Arg Gln Ser 165 170
175Leu Leu Lys Thr Val Gln Phe Cys Asp Ala Met Arg Asp Ala Gly Ile
180 185 190Val Gly Val Leu Thr
Leu Asp Asn Gln Asp Leu Asn Gly Asn Trp Tyr 195
200 205Asp Phe Gly Asp Phe Val Gln Val Ala Pro Gly Cys
Gly Val Pro Ile 210 215 220Val Asp Ser
Tyr Tyr Ser Leu Leu Met Pro Ile Leu Thr Leu Thr Arg225
230 235 240Ala Leu Ala Ala Glu Ser His
Met Asp Ala Asp Leu Ala Lys Pro Leu 245
250 255Ile Lys Trp Asp Leu Leu Lys Tyr Asp Phe Thr Glu
Glu Arg Leu Cys 260 265 270Leu
Phe Asp Arg Tyr Phe Lys Tyr Trp Asp Gln Thr Tyr His Pro Asn 275
280 285Cys Ile Asn Cys Leu Asp Asp Arg Cys
Ile Leu His Cys Ala Asn Phe 290 295
300Asn Val Leu Phe Ser Thr Val Phe Pro Pro Thr Ser Phe Gly Pro Leu305
310 315 320Val Arg Lys Ile
Phe Val Asp Gly Val Pro Phe Val Val Ser Thr Gly 325
330 335Tyr His Phe Arg Glu Leu Gly Val Val His
Asn Gln Asp Val Asn Leu 340 345
350His Ser Ser Arg Leu Ser Phe Lys Glu Leu Leu Val Tyr Ala Ala Asp
355 360 365Pro Ala Met His Ala Ala Ser
Gly Asn Leu Leu Leu Asp Lys Arg Thr 370 375
380Thr Cys Phe Ser Val Ala Ala Leu Thr Asn Asn Val Ala Phe Gln
Thr385 390 395 400Val Lys
Pro Gly Asn Phe Asn Lys Asp Phe Tyr Asp Phe Ala Val Ser
405 410 415Lys Gly Phe Phe Lys Glu Gly
Ser Ser Val Glu Leu Lys His Phe Phe 420 425
430Phe Ala Gln Asp Gly Asn Ala Ala Ile Ser Asp Tyr Asp Tyr
Tyr Arg 435 440 445Tyr Asn Leu Pro
Thr Met Cys Asp Ile Arg Gln Leu Leu Phe Val Val 450
455 460Glu Val Val Asp Lys Tyr Phe Asp Cys Tyr Asp Gly
Gly Cys Ile Asn465 470 475
480Ala Asn Gln Val Ile Val Asn Asn Leu Asp Lys Ser Ala Gly Phe Pro
485 490 495Phe Asn Lys Trp Gly
Lys Ala Arg Leu Tyr Tyr Asp Ser Met Ser Tyr 500
505 510Glu Asp Gln Asp Ala Leu Phe Ala Tyr Thr Lys Arg
Asn Val Ile Pro 515 520 525Thr Ile
Thr Gln Met Asn Leu Lys Tyr Ala Ile Ser Ala Lys Asn Arg 530
535 540Ala Arg Thr Val Ala Gly Val Ser Ile Cys Ser
Thr Met Thr Asn Arg545 550 555
560Gln Phe His Gln Lys Leu Leu Lys Ser Ile Ala Ala Thr Arg Gly Ala
565 570 575Thr Val Val Ile
Gly Thr Ser Lys Phe Tyr Gly Gly Trp His Asn Met 580
585 590Leu Lys Thr Val Tyr Ser Asp Val Glu Thr Pro
His Leu Met Gly Trp 595 600 605Asp
Tyr Pro Lys Cys Asp Arg Ala Met Pro Asn Met Leu Arg Ile Met 610
615 620Ala Ser Leu Val Leu Ala Arg Lys His Asn
Thr Cys Cys Asn Leu Ser625 630 635
640His Arg Phe Tyr Arg Leu Ala Asn Glu Cys Ala Gln Val Leu Ser
Glu 645 650 655Met Val Met
Cys Gly Gly Ser Leu Tyr Val Lys Pro Gly Gly Thr Ser 660
665 670Ser Gly Asp Ala Thr Thr Ala Tyr Ala Asn
Ser Val Phe Asn Ile Cys 675 680
685Gln Ala Val Thr Ala Asn Val Asn Ala Leu Leu Ser Thr Asp Gly Asn 690
695 700Lys Ile Ala Asp Lys Tyr Val Arg
Asn Leu Gln His Arg Leu Tyr Glu705 710
715 720Cys Leu Tyr Arg Asn Arg Asp Val Asp His Glu Phe
Val Asp Glu Phe 725 730
735Tyr Ala Tyr Leu Arg Lys His Phe Ser Met Met Ile Leu Ser Asp Asp
740 745 750Ala Val Val Cys Tyr Asn
Ser Asn Tyr Ala Ala Gln Gly Leu Val Ala 755 760
765Ser Ile Lys Asn Phe Lys Ala Val Leu Tyr Tyr Gln Asn Asn
Val Phe 770 775 780Met Ser Glu Ala Lys
Cys Trp Thr Glu Thr Asp Leu Thr Lys Gly Pro785 790
795 800His Glu Phe Cys Ser Gln His Thr Met Leu
Val Lys Gln Gly Asp Asp 805 810
815Tyr Val Tyr Leu Pro Tyr Pro Asp Pro Ser Arg Ile Leu Gly Ala Gly
820 825 830Cys Phe Val Asp Asp
Ile Val Lys Thr Asp Gly Thr Leu Met Ile Glu 835
840 845Arg Phe Val Ser Leu Ala Ile Asp Ala Tyr Pro Leu
Thr Lys His Pro 850 855 860Asn Gln Glu
Tyr Ala Asp Val Phe His Leu Tyr Leu Gln Tyr Ile Arg865
870 875 880Lys Leu His Asp Glu Leu Thr
Gly His Met Leu Asp Met Tyr Ser Val 885
890 895Met Leu Thr Asn Asp Asn Thr Ser Arg Tyr Trp Glu
Pro Glu Phe Tyr 900 905 910Glu
Ala Met Tyr Thr Pro His Thr Val Leu Gln Ala Val Gly Ala Cys 915
920 925Val Leu Cys Asn Ser Gln Thr Ser Leu
Arg Cys Gly Ala Cys Ile Arg 930 935
940Arg Pro Phe Leu Cys Cys Lys Cys Cys Tyr Asp His Val Ile Ser Thr945
950 955 960Ser His Lys Leu
Val Leu Ser Val Asn Pro Tyr Val Cys Asn Ala Pro 965
970 975Gly Cys Asp Val Thr Asp Val Thr Gln Leu
Tyr Leu Gly Gly Met Ser 980 985
990Tyr Tyr Cys Lys Ser His Lys Pro Pro Ile Ser Phe Pro Leu Cys Ala
995 1000 1005Asn Gly Gln Val Phe Gly Leu
Tyr Lys Asn Thr Cys Val Gly Ser 1010 1015
1020Asp Asn Val Thr Asp Phe Asn Ala Ile Ala Thr Cys Asp Trp Thr
1025 1030 1035Asn Ala Gly Asp Tyr Ile
Leu Ala Asn Thr Cys Thr Glu Arg Leu 1040 1045
1050Lys Leu Phe Ala Ala Glu Thr Leu Lys Ala Thr Glu Glu Thr
Phe 1055 1060 1065Lys Leu Ser Tyr Gly
Ile Ala Thr Val Arg Glu Val Leu Ser Asp 1070 1075
1080Arg Glu Leu His Leu Ser Trp Glu Val Gly Lys Pro Arg
Pro Pro 1085 1090 1095Leu Asn Arg Asn
Tyr Val Phe Thr Gly Tyr Arg Val Thr Lys Asn 1100
1105 1110Ser Lys Val Gln Ile Gly Glu Tyr Thr Phe Glu
Lys Gly Asp Tyr 1115 1120 1125Gly Asp
Ala Val Val Tyr Arg Gly Thr Thr Thr Tyr Lys Leu Asn 1130
1135 1140Val Gly Asp Tyr Phe Val Leu Thr Ser His
Thr Val Met Pro Leu 1145 1150 1155Ser
Ala Pro Thr Leu Val Pro Gln Glu His Tyr Val Arg Ile Thr 1160
1165 1170Gly Leu Tyr Pro Thr Leu Asn Ile Ser
Asp Glu Phe Ser Ser Asn 1175 1180
1185Val Ala Asn Tyr Gln Lys Val Gly Met Gln Lys Tyr Ser Thr Leu
1190 1195 1200Gln Gly Pro Pro Gly Thr
Gly Lys Ser His Phe Ala Ile Gly Leu 1205 1210
1215Ala Leu Tyr Tyr Pro Ser Ala Arg Ile Val Tyr Thr Ala Cys
Ser 1220 1225 1230His Ala Ala Val Asp
Ala Leu Cys Glu Lys Ala Leu Lys Tyr Leu 1235 1240
1245Pro Ile Asp Lys Cys Ser Arg Ile Ile Pro Ala Arg Ala
Arg Val 1250 1255 1260Glu Cys Phe Asp
Lys Phe Lys Val Asn Ser Thr Leu Glu Gln Tyr 1265
1270 1275Val Phe Cys Thr Val Asn Ala Leu Pro Glu Thr
Thr Ala Asp Ile 1280 1285 1290Val Val
Phe Asp Glu Ile Ser Met Ala Thr Asn Tyr Asp Leu Ser 1295
1300 1305Val Val Asn Ala Arg Leu Arg Ala Lys His
Tyr Val Tyr Ile Gly 1310 1315 1320Asp
Pro Ala Gln Leu Pro Ala Pro Arg Thr Leu Leu Thr Lys Gly 1325
1330 1335Thr Leu Glu Pro Glu Tyr Phe Asn Ser
Val Cys Arg Leu Met Lys 1340 1345
1350Thr Ile Gly Pro Asp Met Phe Leu Gly Thr Cys Arg Arg Cys Pro
1355 1360 1365Ala Glu Ile Val Asp Thr
Val Ser Ala Leu Val Tyr Asp Asn Lys 1370 1375
1380Leu Lys Ala His Lys Asp Lys Ser Ala Gln Cys Phe Lys Met
Phe 1385 1390 1395Tyr Lys Gly Val Ile
Thr His Asp Val Ser Ser Ala Ile Asn Arg 1400 1405
1410Pro Gln Ile Gly Val Val Arg Glu Phe Leu Thr Arg Asn
Pro Ala 1415 1420 1425Trp Arg Lys Ala
Val Phe Ile Ser Pro Tyr Asn Ser Gln Asn Ala 1430
1435 1440Val Ala Ser Lys Ile Leu Gly Leu Pro Thr Gln
Thr Val Asp Ser 1445 1450 1455Ser Gln
Gly Ser Glu Tyr Asp Tyr Val Ile Phe Thr Gln Thr Thr 1460
1465 1470Glu Thr Ala His Ser Cys Asn Val Asn Arg
Phe Asn Val Ala Ile 1475 1480 1485Thr
Arg Ala Lys Ile Gly Ile Leu Cys Ile Met Ser Asp Arg Asp 1490
1495 1500Leu Tyr Asp Lys Leu Gln Phe Thr Ser
Leu Glu Ile Pro Arg Arg 1505 1510
1515Asn Val Ala Thr Leu Gln Ala Glu Asn Val Thr Gly Leu Phe Lys
1520 1525 1530Asp Cys Ser Lys Ile Ile
Thr Gly Leu His Pro Thr Gln Ala Pro 1535 1540
1545Thr His Leu Ser Val Asp Ile Lys Phe Lys Thr Glu Gly Leu
Cys 1550 1555 1560Val Asp Ile Pro Gly
Ile Pro Lys Asp Met Thr Tyr Arg Arg Leu 1565 1570
1575Ile Ser Met Met Gly Phe Lys Met Asn Tyr Gln Val Asn
Gly Tyr 1580 1585 1590Pro Asn Met Phe
Ile Thr Arg Glu Glu Ala Ile Arg His Val Arg 1595
1600 1605Ala Trp Ile Gly Phe Asp Val Glu Gly Cys His
Ala Thr Arg Asp 1610 1615 1620Ala Val
Gly Thr Asn Leu Pro Leu Gln Leu Gly Phe Ser Thr Gly 1625
1630 1635Val Asn Leu Val Ala Val Pro Thr Gly Tyr
Val Asp Thr Glu Asn 1640 1645 1650Asn
Thr Glu Phe Thr Arg Val Asn Ala Lys Pro Pro Pro Gly Asp 1655
1660 1665Gln Phe Lys His Leu Ile Pro Leu Met
Tyr Lys Gly Leu Pro Trp 1670 1675
1680Asn Val Val Arg Ile Lys Ile Val Gln Met Leu Ser Asp Thr Leu
1685 1690 1695Lys Gly Leu Ser Asp Arg
Val Val Phe Val Leu Trp Ala His Gly 1700 1705
1710Phe Glu Leu Thr Ser Met Lys Tyr Phe Val Lys Ile Gly Pro
Glu 1715 1720 1725Arg Thr Cys Cys Leu
Cys Asp Lys Arg Ala Thr Cys Phe Ser Thr 1730 1735
1740Ser Ser Asp Thr Tyr Ala Cys Trp Asn His Ser Val Gly
Phe Asp 1745 1750 1755Tyr Val Tyr Asn
Pro Phe Met Ile Asp Val Gln Gln Trp Gly Phe 1760
1765 1770Thr Gly Asn Leu Gln Ser Asn His Asp Gln His
Cys Gln Val His 1775 1780 1785Gly Asn
Ala His Val Ala Ser Cys Asp Ala Ile Met Thr Arg Cys 1790
1795 1800Leu Ala Val His Glu Cys Phe Val Lys Arg
Val Asp Trp Ser Val 1805 1810 1815Glu
Tyr Pro Ile Ile Gly Asp Glu Leu Arg Val Asn Ser Ala Cys 1820
1825 1830Arg Lys Val Gln His Met Val Val Lys
Ser Ala Leu Leu Ala Asp 1835 1840
1845Lys Phe Pro Val Leu His Asp Ile Gly Asn Pro Lys Ala Ile Lys
1850 1855 1860Cys Val Pro Gln Ala Glu
Val Glu Trp Lys Phe Tyr Asp Ala Gln 1865 1870
1875Pro Cys Ser Asp Lys Ala Tyr Lys Ile Glu Glu Leu Phe Tyr
Ser 1880 1885 1890Tyr Ala Thr His His
Asp Lys Phe Thr Asp Gly Val Cys Leu Phe 1895 1900
1905Trp Asn Cys Asn Val Asp Arg Tyr Pro Ala Asn Ala Ile
Val Cys 1910 1915 1920Arg Phe Asp Thr
Arg Val Leu Ser Asn Leu Asn Leu Pro Gly Cys 1925
1930 1935Asp Gly Gly Ser Leu Tyr Val Asn Lys His Ala
Phe His Thr Pro 1940 1945 1950Ala Phe
Asp Lys Ser Ala Phe Thr Asn Leu Lys Gln Leu Pro Phe 1955
1960 1965Phe Tyr Tyr Ser Asp Ser Pro Cys Glu Ser
His Gly Lys Gln Val 1970 1975 1980Val
Ser Asp Ile Asp Tyr Val Pro Leu Lys Ser Ala Thr Cys Ile 1985
1990 1995Thr Arg Cys Asn Leu Gly Gly Ala Val
Cys Arg His His Ala Asn 2000 2005
2010Glu Tyr Arg Gln Tyr Leu Asp Ala Tyr Asn Met Met Ile Ser Ala
2015 2020 2025Gly Phe Ser Leu Trp Ile
Tyr Lys Gln Phe Asp Thr Tyr Asn Leu 2030 2035
2040Trp Asn Thr Phe Thr Arg Leu Gln Ser Leu Glu Asn Val Ala
Tyr 2045 2050 2055Asn Val Val Asn Lys
Gly His Phe Asp Gly His Ala Gly Glu Ala 2060 2065
2070Pro Val Ser Ile Ile Asn Asn Ala Val Tyr Thr Lys Val
Asp Gly 2075 2080 2085Ile Asp Val Glu
Ile Phe Glu Asn Lys Thr Thr Leu Pro Val Asn 2090
2095 2100Val Ala Phe Glu Leu Trp Ala Lys Arg Asn Ile
Lys Pro Val Pro 2105 2110 2115Glu Ile
Lys Ile Leu Asn Asn Leu Gly Val Asp Ile Ala Ala Asn 2120
2125 2130Thr Val Ile Trp Asp Tyr Lys Arg Glu Ala
Pro Ala His Val Ser 2135 2140 2145Thr
Ile Gly Val Cys Thr Met Thr Asp Ile Ala Lys Lys Pro Thr 2150
2155 2160Glu Ser Ala Cys Ser Ser Leu Thr Val
Leu Phe Asp Gly Arg Val 2165 2170
2175Glu Gly Gln Val Asp Leu Phe Arg Asn Ala Arg Asn Gly Val Leu
2180 2185 2190Ile Thr Glu Gly Ser Val
Lys Gly Leu Thr Pro Ser Lys Gly Pro 2195 2200
2205Ala Gln Ala Ser Val Asn Gly Val Thr Leu Ile Gly Glu Ser
Val 2210 2215 2220Lys Thr Gln Phe Asn
Tyr Phe Lys Lys Val Asp Gly Ile Ile Gln 2225 2230
2235Gln Leu Pro Glu Thr Tyr Phe Thr Gln Ser Arg Asp Leu
Glu Asp 2240 2245 2250Phe Lys Pro Arg
Ser Gln Met Glu Thr Asp Phe Leu Glu Leu Ala 2255
2260 2265Met Asp Glu Phe Ile Gln Arg Tyr Lys Leu Glu
Gly Tyr Ala Phe 2270 2275 2280Glu His
Ile Val Tyr Gly Asp Phe Ser His Gly Gln Leu Gly Gly 2285
2290 2295Leu His Leu Met Ile Gly Leu Ala Lys Arg
Ser Gln Asp Ser Pro 2300 2305 2310Leu
Lys Leu Glu Asp Phe Ile Pro Met Asp Ser Thr Val Lys Asn 2315
2320 2325Tyr Phe Ile Thr Asp Ala Gln Thr Gly
Ser Ser Lys Cys Val Cys 2330 2335
2340Ser Val Ile Asp Leu Leu Leu Asp Asp Phe Val Glu Ile Ile Lys
2345 2350 2355Ser Gln Asp Leu Ser Val
Ile Ser Lys Val Val Lys Val Thr Ile 2360 2365
2370Asp Tyr Ala Glu Ile Ser Phe Met Leu Trp Cys Lys Asp Gly
His 2375 2380 2385Val Glu Thr Phe Tyr
Pro Lys Leu Gln Ala Ser Gln Ala Trp Gln 2390 2395
2400Pro Gly Val Ala Met Pro Asn Leu Tyr Lys Met Gln Arg
Met Leu 2405 2410 2415Leu Glu Lys Cys
Asp Leu Gln Asn Tyr Gly Glu Asn Ala Val Ile 2420
2425 2430Pro Lys Gly Ile Met Met Asn Val Ala Lys Tyr
Thr Gln Leu Cys 2435 2440 2445Gln Tyr
Leu Asn Thr Leu Thr Leu Ala Val Pro Tyr Asn Met Arg 2450
2455 2460Val Ile His Phe Gly Ala Gly Ser Asp Lys
Gly Val Ala Pro Gly 2465 2470 2475Thr
Ala Val Leu Arg Gln Trp Leu Pro Thr Gly Thr Leu Leu Val 2480
2485 2490Asp Ser Asp Leu Asn Asp Phe Val Ser
Asp Ala Asp Ser Thr Leu 2495 2500
2505Ile Gly Asp Cys Ala Thr Val His Thr Ala Asn Lys Trp Asp Leu
2510 2515 2520Ile Ile Ser Asp Met Tyr
Asp Pro Arg Thr Lys His Val Thr Lys 2525 2530
2535Glu Asn Asp Ser Lys Glu Gly Phe Phe Thr Tyr Leu Cys Gly
Phe 2540 2545 2550Ile Lys Gln Lys Leu
Ala Leu Gly Gly Ser Ile Ala Val Lys Ile 2555 2560
2565Thr Glu His Ser Trp Asn Ala Asp Leu Tyr Lys Leu Met
Gly His 2570 2575 2580Phe Ser Trp Trp
Thr Ala Phe Val Thr Asn Val Asn Ala Ser Ser 2585
2590 2595Ser Glu Ala Phe Leu Ile Gly Ala Asn Tyr Leu
Gly Lys Pro Lys 2600 2605 2610Glu Gln
Ile Asp Gly Tyr Thr Met His Ala Asn Tyr Ile Phe Trp 2615
2620 2625Arg Asn Thr Asn Pro Ile Gln Leu Ser Ser
Tyr Ser Leu Phe Asp 2630 2635 2640Met
Ser Lys Phe Pro Leu Lys Leu Arg Gly Thr Ala Val Met Ser 2645
2650 2655Leu Lys Glu Asn Gln Ile Asn Asp Met
Ile Tyr Ser Leu Leu Glu 2660 2665
2670Lys Gly Arg Leu Ile Ile Arg Glu Asn Asn Arg Val Val Val Ser
2675 2680 2685Ser Asp Ile Leu Val Asn
Asn 2690 26957620DNAArtificial sequenceS/L3/+/4932
primer 76ccacacacag cttgtggata
207720DNAArtificial sequenceS/L4/+/6401 primer 77ccgaagttgt
aggcaatgtc
207820DNAArtificial sequenceS/L4/+/6964 primer 78tttggtgctc cttcttattg
207920DNAArtificial
sequenceS/L4/-/6817 primer 79ccggcatcca aacataattt
208020DNAArtificial sequenceS/L5/-/7633 primer
80tggtcagtag ggttgattgg
208120DNAArtificial sequenceS/L5/-/8127 primer 81catcctttgt gtcaacatcg
208220DNAArtificial
sequenceS/L5/-/8633 primer 82gtcacgagtg acaccatcct
208320DNAArtificial sequenceS/L5/+/7839 primer
83atgcgacgag tctgcttcta
208420DNAArtificial sequenceS/L5/+/8785 primer 84ttcatagtgc ctggcttacc
208520DNAArtificial
sequenceS/L5/+/8255 primer 85atcttggcgc atgtattgac
208620DNAArtificial sequenceS/L6/-/9422 primer
86tgcattagca gcaacaacat
208720DNAArtificial sequenceS/L6/-/9966 primer 87tctgcagaac agcagaagtg
208820DNAArtificial
sequenceS/L6/-/10542 primer 88cctgtgcagt ttgtctgtca
208920DNAArtificial sequenceS/L6/+/10677 primer
89ccttgtggca atgaagtaca
209020DNAArtificial sequenceS/L6/+/10106 primer 90atgtcatttg cacagcagaa
209120DNAArtificial
sequenceS/L6/+/9571 primer 91cttcaatggt ttgccatgtt
209220DNAArtificial sequenceS/L7/-/11271 primer
92tgcgagctgt catgagaata
209320DNAArtificial sequenceS/L7/-/11801 primer 93aaccgagagc agtaccacag
209420DNAArtificial
sequenceS/L7/-/12383 primer 94tttggctgct gtagtcaatg
209520DNAArtificial sequenceS/L7/+/12640 primer
95ctacgacaga tgtcctgtgc
209620DNAArtificial sequenceS/L7/+/12088 primer 96gagcaggctg tagctaatgg
209720DNAArtificial
sequenceS/L7/+/11551 primer 97ttaggctatt gttgctgctg
209820DNAArtificial sequenceS/L8/-/13160 primer
98cagacaacat gaagcaccac
209920DNAArtificial sequenceS/L8/-/13704 primer 99cgctgacgtg atatatgtgg
2010020DNAArtificial
sequenceS/L8/-/14284 primer 100tgcacaatga aggatacacc
2010120DNAArtificial sequenceS/L8/+/14453
primer 101acatagctcg cgtctcagtt
2010220DNAArtificial sequenceS/L8/+/13968 primer 102ggcattgtag
gcgtactgac
2010319DNAArtificial sequenceS/L8/+/13401 primer 103gtttgcggtg taagtgcag
1910420DNAArtificial
sequenceS/L9/-/15098 primer 104tagtggcggc tattgacttc
2010520DNAArtificial sequenceS/L9/-/15677
primer 105ctaaaccttg agccgcatag
2010620DNAArtificial sequenceS/L9/-/16247 primer 106catggtcata
gcagcacttg
2010721DNAArtificial sequenceS/L9/+/16323 primer 107ccaggttgtg atgtcactga
t 2110820DNAArtificial
sequenceS/L9/+/15858 primer 108ccttacccag atccatcaag
2010920DNAArtificial sequenceS/L9/+/15288
primer 109cgcaaacata acacttgctg
2011020DNAArtificial sequenceS/L10/-/16914 primer 110agtgttgggt
acaagccagt
2011120DNAArtificial sequenceS/L10/-/17466 primer 111gttccaagga
acatgtctgg
2011220DNAArtificial sequenceS/L10/-/18022 primer 112aggtgcctgt
gtaggatgaa
2011320DNAArtificial sequenceS/L10/+/18245 primer 113gggctgtcat
gcaactagag
2011420DNAArtificial sequenceS/L10/+/17663 primer 114tcttacacgc
aatcctgctt
2011520DNAArtificial sequenceS/L10/+/17061 primer 115tacccatctg
ctcgcatagt
2011620DNAArtificial sequenceS/L11/-/18877 primer 116gcaagcagaa
ttaaccctca
2011720DNAArtificial sequenceS/L11/-/19396 primer 117agcaccacct
aaattgcatc
2011820DNAArtificial sequenceS/L11/-/20002 primer 118tggtcccttt
gaaggtgtta
2011920DNAArtificial sequenceS/L11/+/20245 primer 119tcgaacacat
cgtttatgga
2012020DNAArtificial sequenceS/L11/+/19611 primer 120gaagcacctg
tttccatcat
2012120DNAArtificial sequenceS/L11/+/19021 primer 121acgatgctca
gccatgtagt
2012220DNAArtificial sequenceSARS/L1/F3/+/800 primer 122gaggtgcagt
cactcgctat
2012320DNAArtificial sequenceSARS/L1/F4/+/1391 primer 123cagagattgg
acctgagcat
2012420DNAArtificial sequenceSARS/L1/F5/+/1925 primer 124cagcaaacca
ctcaattcct
2012520DNAArtificial sequenceSARS/L1/R3/-/1674 primer 125aaatgatggc
aacctcttca
2012620DNAArtificial sequenceSARS/L1/R4/-/1107 primer 126cacgtggttg
aatgactttg
2012720DNAArtificial sequenceSARS/L1/R5/-/520 primer 127atttctgcaa
ccagctcaac
2012820DNAArtificial sequenceSARS/L2/F3/+/2664 primer 128cgcattgtct
cctggtttac
2012920DNAArtificial sequenceSARS/L2/F4/+/3232 primer 129gagattgagc
cagaaccaga
2013020DNAArtificial sequenceSARS/L2/F5/+/3746 primer 130atgagcaggt
tgtcatggat
2013120DNAArtificial sequenceSARS/L2/R3/-/3579 primer 131ctgccttaag
aagctggatg
2013220DNAArtificial sequenceSARS/L2/R4/-/2991 primer 132tttcttcacc
agcatcatca
2013320DNAArtificial sequenceSARS/L2/R5/-/2529 primer 133caccgttctt
gagaacaacc
2013420DNAArtificial sequenceSARS/L3/F3/+/4708 primer 134tctttggctg
gctcttacag
2013520DNAArtificial sequenceSARS/L3/F4/+/5305 primer 135gctggtgatg
ctgctaactt
2013620DNAArtificial sequenceSARS/L3/F5/+/5822 primer 136ccatcaagcc
tgtgtcgtat
2013720DNAArtificial sequenceSARS/L3/R3/-/5610 primer 137caggtggtgc
agacatcata
2013820DNAArtificial sequenceSARS/L3/R4/-/4988 primer 138aacatcagca
ccatccaagt
2013920DNAArtificial sequenceSARS/L3/R5/-/4437 primer 139atcggacacc
atagtcaacg
201407788DNAArtificial sequencesynthetic S gene 140tcaatattgg ccattagcca
tattattcat tggttatata gcataaatca atattggcta 60ttggccattg catacgttgt
atctatatca taatatgtac atttatattg gctcatgtcc 120aatatgaccg ccatgttggc
attgattatt gactagttat taatagtaat caattacggg 180gtcattagtt catagcccat
atatggagtt ccgcgttaca taacttacgg taaatggccc 240gcctggctga ccgcccaacg
acccccgccc attgacgtca ataatgacgt atgttcccat 300agtaacgcca atagggactt
tccattgacg tcaatgggtg gagtatttac ggtaaactgc 360ccacttggca gtacatcaag
tgtatcatat gccaagtccg ccccctattg acgtcaatga 420cggtaaatgg cccgcctggc
attatgccca gtacatgacc ttacgggact ttcctacttg 480gcagtacatc tacgtattag
tcatcgctat taccatggtg atgcggtttt ggcagtacac 540caatgggcgt ggatagcggt
ttgactcacg gggatttcca agtctccacc ccattgacgt 600caatgggagt ttgttttggc
accaaaatca acgggacttt ccaaaatgtc gtaataaccc 660cgccccgttg acgcaaatgg
gcggtaggcg tgtacggtgg gaggtctata taagcagagc 720tcgtttagtg aaccgtcaga
tcactagaag ctttattgcg gtagtttatc acagttaaat 780tgctaacgca gtcagtgctt
ctgacacaac agtctcgaac ttaagctgca gaagttggtc 840gtgaggcact gggcaggtaa
gtatcaaggt tacaagacag gtttaaggag accaatagaa 900actgggcttg tcgagacaga
gaagactctt gcgtttctga taggcaccta ttggtcttac 960tgacatccac tttgcctttc
tctccacagg tgtccactcc cagttcaatt acagctctta 1020aggctagagt acttaatacg
actcactata ggctagcgga tccaccatgt tcatcttcct 1080gctgttcctg accctgacca
gcggcagcga cctggaccgg tgcaccacct tcgacgacgt 1140gcaggccccc aactacaccc
agcacaccag cagcatgcgg ggcgtgtact accccgacga 1200gatctttcgg agcgacaccc
tgtacctgac ccaggacctg ttcctgccct tctacagcaa 1260cgtgaccggc ttccacacca
tcaaccacac cttcggcaac cccgtgatcc ccttcaagga 1320cggcatctac ttcgccgcca
ccgagaagag caacgtggtg cggggctggg tgttcggcag 1380caccatgaac aacaagagcc
agagcgtgat catcatcaac aacagcacca acgtggtgat 1440ccgggcctgc aacttcgagc
tgtgcgacaa ccccttcttc gccgtgtcca aacccatggg 1500cacccagacc cacaccatga
tcttcgacaa cgccttcaac tgcaccttcg agtacatcag 1560cgacgccttc agcctggacg
tgagcgagaa gagcggcaac ttcaagcacc tgcgggagtt 1620cgtgttcaag aacaaggacg
gcttcctgta cgtgtacaag ggctaccagc ccatcgacgt 1680ggtgagagac ctgcccagcg
gcttcaacac cctgaagccc atcttcaagc tgcccctggg 1740catcaacatc accaacttcc
gggccatcct gaccgccttt agccctgccc aggacatctg 1800gggcaccagc gccgccgcct
acttcgtggg ctacctgaag cctaccacct tcatgctgaa 1860gtacgacgag aacggcacca
tcaccgacgc cgtggactgc agccagaacc ccctggccga 1920gctgaagtgc agcgtgaaga
gcttcgagat cgacaagggc atctaccaga ccagcaactt 1980cagagtggtg cctagcggcg
atgtggtgcg gttccccaat atcaccaacc tgtgcccctt 2040cggcgaagtg ttcaacgcca
ccaagttccc cagcgtgtac gcctgggagc ggaagaagat 2100cagcaactgc gtggccgact
acagcgtgct gtacaactcc accttcttca gcaccttcaa 2160gtgctacggc gtgagcgcca
ccaagctgaa cgacctgtgc ttcagcaacg tgtacgccga 2220cagcttcgtg gtgaagggcg
acgacgtgag acagatcgcc cctggccaga ccggcgtgat 2280cgccgactac aactacaagc
tgcccgacga cttcatgggc tgcgtgctgg cctggaacac 2340ccggaacatc gacgccacaa
gcaccggcaa ctacaattac aagtaccgct acctgcggca 2400cggcaagctg cggcccttcg
agcgggacat ctccaacgtg cccttcagcc ccgacggcaa 2460gccctgcacc ccccctgccc
tgaactgcta ctggcccctg aacgactacg gcttctacac 2520caccaccggc atcggctatc
agccctacag agtggtggtg ctgagcttcg agctgctgaa 2580cgcccctgcc accgtgtgcg
gccccaagct gagcaccgac ctgatcaaga accagtgcgt 2640gaacttcaac ttcaacggcc
tgaccggcac cggcgtgctg acccccagca gcaagcgctt 2700ccagcccttc cagcagttcg
gccgggatgt gagcgacttc accgacagcg tgcgggaccc 2760caagaccagc gagatcctgg
acatcagccc ctgcagcttc ggcggcgtgt ccgtgatcac 2820ccccggcacc aacgccagca
gcgaagtggc cgtgctgtac caggacgtga actgcaccga 2880cgtgagcacc gccatccacg
ccgaccagct gacccccgcc tggcggatct acagcaccgg 2940gaacaacgtg ttccagaccc
aggccggctg cctgatcggc gccgagcacg tggacaccag 3000ctacgagtgc gacatcccca
ttggcgccgg aatctgcgcc agctaccaca ccgtgagcct 3060gctgcggagc accagccaga
agtccatcgt ggcctacacc atgagcctgg gcgccgacag 3120cagcatcgcc tacagcaaca
acaccatcgc catccccacc aacttcagca tctccatcac 3180caccgaagtg atgcccgtga
gcatggccaa gacaagcgtg gattgcaaca tgtacatctg 3240cggcgacagc accgagtgcg
ccaacctgct gctgcagtac ggcagcttct gcacccagct 3300gaaccgggcc ctgagcggca
tcgccgccga gcaggaccgg aacaccagag aagtgttcgc 3360ccaagtgaag cagatgtata
agacccccac cctgaagtac ttcgggggct tcaacttctc 3420tcagatcctg cccgaccctc
tgaagcccac caagcgctcc ttcatcgagg acctgctgtt 3480caacaaagtg accctggccg
acgccggctt tatgaagcag tacggcgagt gcctgggcga 3540catcaacgcc cgggacctga
tctgcgccca gaagtttaac gggctgaccg tgctgccccc 3600cctgctgacc gacgacatga
tcgccgccta tacagccgcc ctggtgagcg gcaccgccac 3660cgccggctgg accttcggag
ccggagccgc cctgcagatc cccttcgcca tgcagatggc 3720ctaccggttc aacggcatcg
gcgtgaccca gaacgtgctg tacgagaacc agaagcagat 3780cgccaaccag ttcaacaagg
ccatcagcca gatccaggag agcctgacca caaccagcac 3840cgccctgggc aagctgcagg
acgtggtgaa ccagaacgcc caggccctga acaccctggt 3900gaagcagctg agcagcaact
tcggcgccat cagctctgtg ctgaacgaca tcctgagcag 3960gctggacaaa gtggaggccg
aagtgcagat cgaccggctg atcaccggac gcctgcagtc 4020cctgcagacc tacgtgaccc
agcagctgat cagagccgcc gagatccggg ccagcgccaa 4080tctggccgcc accaagatga
gcgagtgcgt gctgggccag agcaagagag tggacttctg 4140cggcaagggc tatcacctga
tgagcttccc ccaggccgcc ccccacggcg tggtgttcct 4200gcacgtgacc tacgtgccta
gccaggagcg gaacttcacc accgccccag ccatctgcca 4260cgagggcaag gcctacttcc
cccgggaggg cgtgttcgtg tttaacggca ccagctggtt 4320catcacccag cgcaacttct
tcagccccca gatcatcacc acagacaaca ccttcgtgtc 4380cggcaactgt gatgtggtga
tcggcatcat caataacacc gtgtacgacc ccctgcagcc 4440cgagctggac agcttcaagg
aggagctgga caaatacttc aagaaccaca cctcccccga 4500cgtggacctg ggcgatatca
gcggcatcaa cgcctccgtg gtgaacatcc agaaggagat 4560cgacagactg aacgaagtgg
ccaagaacct gaacgagagc ctgatcgacc tgcaggagct 4620gggcaagtac gagcagtaca
tcaagtggcc ctggtacgtg tggctgggct tcatcgccgg 4680cctgatcgcc atcgtgatgg
tgaccatcct gctgtgctgc atgaccagct gctgtagctg 4740cctgaaaggc gcctgcagct
gtggcagctg ctgcaagttc gacgaggacg acagcgagcc 4800cgtgctgaag ggcgtgaagc
tgcactacac ctgataactc gagaattcac gcgtggtacc 4860tctagagtcg acccgggcgg
ccgcttcgag cagacatgat aagatacatt gatgagtttg 4920gacaaaccac aactagaatg
cagtgaaaaa aatgctttat ttgtgaaatt tgtgatgcta 4980ttgctttatt tgtaaccatt
ataagctgca ataaacaagt taacaacaac aattgcattc 5040attttatgtt tcaggttcag
ggggagatgt gggaggtttt ttaaagcaag taaaacctct 5100acaaatgtgg taaaatcgat
aaggatccgg gctggcgtaa tagcgaagag gcccgcaccg 5160atcgcccttc ccaacagttg
cgcagcctga atggcgaatg gacgcgccct gtagcggcgc 5220attaagcgcg gcgggtgtgg
tggttacgcg cagcgtgacc gctacacttg ccagcgccct 5280agcgcccgct cctttcgctt
tcttcccttc ctttctcgcc acgttcgccg gctttccccg 5340tcaagctcta aatcgggggc
tccctttagg gttccgattt agagctttac ggcacctcga 5400ccgcaaaaaa cttgatttgg
gtgatggttc acgtagtggg ccatcgccct gatagacggt 5460ttttcgccct ttgacgttgg
agtccacgtt ctttaatagt ggactcttgt tccaaactgg 5520aacaacactc aaccctatct
cggtctattc ttttgattta taagggattt tgccgatttc 5580ggcctattgg ttaaaaaatg
agctgattta acaaatattt aacgcgaatt ttaacaaaat 5640attaacgttt acaatttcgc
ctgatgcggt attttctcct tacgcatctg tgcggtattt 5700cacaccgcat atggtgcact
ctcagtacaa tctgctctga tgccgcatag ttaagccagc 5760cccgacaccc gccaacaccc
gctgacgcgc cctgacgggc ttgtctgctc ccggcatccg 5820cttacagaca agctgtgacc
gtctccggga gctgcatgtg tcagaggttt tcaccgtcat 5880caccgaaacg cgcgagacga
aagggcctcg tgatacgcct atttttatag gttaatgtca 5940tgataataat ggtttcttag
acgtcaggtg gcacttttcg gggaaatgtg cgcggaaccc 6000ctatttgttt atttttctaa
atacattcaa atatgtatcc gctcatgaga caataaccct 6060gataaatgct tcaataatat
tgaaaaagga agagtatgag tattcaacat ttccgtgtcg 6120cccttattcc cttttttgcg
gcattttgcc ttcctgtttt tgctcaccca gaaacgctgg 6180tgaaagtaaa agatgctgaa
gatcagttgg gtgcacgagt gggttacatc gaactggatc 6240tcaacagcgg taagatcctt
gagagttttc gccccgaaga acgttttcca atgatgagca 6300cttttaaagt tctgctatgt
ggcgcggtat tatcccgtat tgacgccggg caagagcaac 6360tcggtcgccg catacactat
tctcagaatg acttggttga gtactcacca gtcacagaaa 6420agcatcttac ggatggcatg
acagtaagag aattatgcag tgctgccata accatgagtg 6480ataacactgc ggccaactta
cttctgacaa cgatcggagg accgaaggag ctaaccgctt 6540ttttgcacaa catgggggat
catgtaactc gccttgatcg ttgggaaccg gagctgaatg 6600aagccatacc aaacgacgag
cgtgacacca cgatgcctgt agcaatggca acaacgttgc 6660gcaaactatt aactggcgaa
ctacttactc tagcttcccg gcaacaatta atagactgga 6720tggaggcgga taaagttgca
ggaccacttc tgcgctcggc ccttccggct ggctggttta 6780ttgctgataa atctggagcc
ggtgagcgtg ggtctcgcgg tatcattgca gcactggggc 6840cagatggtaa gccctcccgt
atcgtagtta tctacacgac ggggagtcag gcaactatgg 6900atgaacgaaa tagacagatc
gctgagatag gtgcctcact gattaagcat tggtaactgt 6960cagaccaagt ttactcatat
atactttaga ttgatttaaa acttcatttt taatttaaaa 7020ggatctaggt gaagatcctt
tttgataatc tcatgaccaa aatcccttaa cgtgagtttt 7080cgttccactg agcgtcagac
cccgtagaaa agatcaaagg atcttcttga gatccttttt 7140ttctgcgcgt aatctgctgc
ttgcaaacaa aaaaaccacc gctaccagcg gtggtttgtt 7200tgccggatca agagctacca
actctttttc cgaaggtaac tggcttcagc agagcgcaga 7260taccaaatac tgtccttcta
gtgtagccgt agttaggcca ccacttcaag aactctgtag 7320caccgcctac atacctcgct
ctgctaatcc tgttaccagt ggctgctgcc agtggcgata 7380agtcgtgtct taccgggttg
gactcaagac gatagttacc ggataaggcg cagcggtcgg 7440gctgaacggg gggttcgtgc
acacagccca gcttggagcg aacgacctac accgaactga 7500gatacctaca gcgtgagcta
tgagaaagcg ccacgcttcc cgaagggaga aaggcggaca 7560ggtatccggt aagcggcagg
gtcggaacag gagagcgcac gagggagctt ccagggggaa 7620acgcctggta tctttatagt
cctgtcgggt ttcgccacct ctgacttgag cgtcgatttt 7680tgtgatgctc gtcagggggg
cggagcctat ggaaaaacgc cagcaacgcg gcctttttac 7740ggttcctggc cttttgctgg
ccttttgctc acatggctcg acagatct 778814123DNAArtificial
sequenceSNE-S1 primer 141ggttgggatt atccaaaatg tga
2314224DNAArtificial sequenceSNE-AS1 primer
142gcatcatcag aaagaatcat catg
2414321DNAArtificial sequenceSAR1-S primer 143cctctcttgt tcttgctcgc a
2114421DNAArtificial
sequenceSAR1-AS primer 144tatagtgagc cgccacacat g
2114545DNAArtificial sequencePCR primer
145ataggatcca ccatgtttat tttcttatta tttcttactc tcact
4514637DNAArtificial sequencePCR primer 146atactcgagt tatgtgtaat
gtaatttgac acccttg 3714745DNAArtificial
sequencePCR primer 147ataggatcca ccatgtttat tttcttatta tttcttactc tcact
4514836DNAArtificial sequencePCR primer 148acctccggat
ttaatatatt gctcatattt tcccaa
3614913PRTArtificial sequenceN-terminal end of SARS-CoV S protein
(amino acids 1 to 13) 149Met Phe Ile Phe Leu Leu Phe Leu Thr Leu Thr Ser
Gly1 5 1015010PRTArtificial
sequenceoligopeptide 150Ser Gly Asp Tyr Lys Asp Asp Asp Asp Lys1
5 1015134DNAArtificial sequencePCR primer
151actagctagc ggatccacca tgttcatctt cctg
3415233DNAArtificial sequencePCR primer 152agtatccgga cttgatgtac
tgctcgtact tgc 3315359DNAArtificial
sequenceoligonucleotide 153tatgagcttt tttttttttt tttttttggc atataaatag
actcggcgcg ccatctgca 5915453DNAArtificial sequenceoligonucleotide
154gatggcgcgc cgagtctatt tatatgccaa aaaaaaaaaa aaaaaaaagc tca
5315545DNAArtificial sequencePCR primer 155atacgtacga ccatgtttat
tttcttatta tttcttactc tcact 4515640DNAArtificial
sequencePCR primer 156atagcgcgct cattatgtgt aatgtaattt gacacccttg
4015720DNAArtificial sequencePCR primer 157ccatttcaac
aatttggccg
2015845DNAArtificial sequencePCR primer 158ataggatccg cgcgctcatt
atttatcgtc gtcatcttta taatc 45