CN109265562B

CN109265562B - Nicking enzyme and application thereof in genome base replacement

Info

Publication number: CN109265562B
Application number: CN201811122909.9A
Authority: CN
Inventors: 杨进孝; 杨永星; 吕欣欣; 赵思; 冯峰
Original assignee: Beijing Academy of Agriculture and Forestry Sciences
Current assignee: Beijing Academy of Agriculture and Forestry Sciences
Priority date: 2018-09-26
Filing date: 2018-09-26
Publication date: 2021-03-30
Anticipated expiration: 2038-09-26
Also published as: CN109265562A

Abstract

The invention discloses a nicking enzyme and its application in genome base replacement. The invention fuses the nicking enzyme HypaCas9n with PmCDA1 and UGI for the first time to construct a base editing system, and finds that, compared with SpCas9n & PmCDA1 & UGI, HypaCas9n & PmCDA1 & UGI can reduce off-target efficiency without substantially affecting C-to-T base substitution efficiency.

Description

Nicking enzyme and application thereof in genome base replacement

Technical Field

The invention relates to a nicking enzyme and application thereof in genome base replacement.

Background

The CRISPR-Cas9 (clustered regularly interspaced short palindromic repeats and CRISPR-associated protein 9) technology has developed into a powerful genome editing tool and is widely applied in many tissues and cells. The CRISPR/Cas9 protein-RNA complex is directed to the target site by a guide RNA and cleaves it to generate a DNA double-strand break (DSB), after which the organism initiates a DNA repair mechanism to repair the DSB. There are generally two repair mechanisms: non-homologous end joining (NHEJ) and homology-directed repair (HDR). NHEJ usually accounts for the majority of repair events, so random indels (insertions or deletions) arise far more often than precise repair. For precise base substitution, the use of HDR is greatly limited by its low efficiency and its need for a DNA template.

In 2016, the laboratories of David Liu and Akihiko Kondo independently reported two different types of cytosine base editors (CBEs). The principle is that a single C (cytosine) base is edited directly by a cytidine deaminase, without generating a DSB or initiating HDR repair, which greatly improves the efficiency of C-to-T (thymine) base editing. PmCDA1 (an activation-induced cytidine deaminase (AID) ortholog from sea lamprey) is one of the cytidine deaminases used. Among the PmCDA1 editors tested, the SpCas9n (D10A) & PmCDA1 & UGI (uracil DNA glycosylase inhibitor) base editing system showed a higher average mutation rate, firstly because UGI inhibits UDG (uracil DNA glycosylase) from excising U (uracil) from DNA, and secondly because SpCas9n (D10A) nicks the non-edited strand, which induces the eukaryotic mismatch repair mechanism or the long-patch BER (base excision repair) mechanism and favors repair of the U:G mismatch to U:A. SpCas9n (D10A), together with PmCDA1, is directed to the target site by the sgRNA; PmCDA1 catalyzes deamination of C on the unpaired single-stranded DNA to give U; U is paired with A (adenine) through DNA repair; and finally T is paired with A through DNA replication, thereby achieving the C-to-T conversion.

Genome editing with SpCas9 in plants has a certain off-target effect, and base editing with the fusion of SpCas9n (D10A) and PmCDA1, that is, with the SpCas9n (D10A) & PmCDA1 & UGI base editing system, may carry the same potential off-target risk. Although plants differ from animals in that off-target sites can be removed by genetic segregation in later generations, some potential off-target sites may be unknown and are therefore difficult to remove purposefully. Reducing off-target effects is thus a long-standing technical goal in plants as well.

Disclosure of Invention

The invention aims to provide a nicking enzyme and application thereof in genome base replacement.

The invention provides a fusion protein comprising a nicking enzyme, the cytidine deaminase PmCDA1 and the uracil DNA glycosylase inhibitor UGI; the nicking enzyme is shown as amino acids 1-1423 from the N-terminus of sequence 13 in the sequence listing.

The cytidine deaminase PmCDA1 is shown as amino acids 1521-1728 from the N-terminus of sequence 13 in the sequence listing.

The uracil DNA glycosylase inhibitor UGI is shown as amino acids 1736-1833 from the N-terminus of sequence 13 in the sequence listing.

The invention also protects the coding gene of the fusion protein.

The coding gene of the fusion protein may specifically be as shown at positions 1721-7222 from the 5' end of sequence 2 in the sequence listing (positions 1721-5989 encode the nicking enzyme, positions 6281-6904 encode the cytidine deaminase PmCDA1, and positions 6926-7222 encode the uracil DNA glycosylase inhibitor UGI).
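The nucleotide coordinates on sequence 2 and the amino acid coordinates on sequence 13 can be cross-checked with simple codon arithmetic. The following is a minimal sketch, assuming the coding gene is read in a single frame starting at position 1721; the function name `nt_range_to_aa_range` is hypothetical and introduced only for illustration.

```python
# Coordinates quoted in the text: 1-based, inclusive, on sequence 2.
CDS_START = 1721  # first nucleotide of the fusion-protein coding gene


def nt_range_to_aa_range(nt_start, nt_end, cds_start=CDS_START):
    """Map a nucleotide range to the numbers of the codons it spans."""
    offset = nt_start - cds_start
    length = nt_end - nt_start + 1
    if offset % 3 or length % 3:
        raise ValueError("range is not aligned to the reading frame")
    first = offset // 3 + 1
    last = first + length // 3 - 1
    return first, last


domains = {
    "nicking enzyme": (1721, 5989),
    "PmCDA1": (6281, 6904),
    "UGI": (6926, 7222),
}

for name, (start, end) in domains.items():
    print(name, nt_range_to_aa_range(start, end))
```

Under this assumption the nicking enzyme and PmCDA1 ranges map to codons 1-1423 and 1521-1728, in agreement with the amino acid positions given for sequence 13. The UGI range spans codons 1736-1834, one more than the stated 1736-1833; this would be consistent with the final codon being a stop codon, although that is an inference rather than something the text states.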

The invention also provides a base editing system comprising the fusion protein.

The system also includes a sgRNA.

The nucleotide sequence of the sgRNA is shown as positions 571-646 from the 5' end of sequence 1 in the sequence listing.

The invention also protects a recombinant expression vector, an expression cassette, a recombinant cell or a recombinant bacterium for expressing the base editing system. The expression cassette for expressing the fusion protein may specifically be expression cassette A. The expression cassette for expressing the sgRNA may specifically be expression cassette B.

The invention also provides a recombinant expression vector for genome base replacement, which comprises an expression cassette A and an expression cassette B; the expression cassette A expresses the fusion protein; the expression cassette B comprises n elements B; each element B comprises an sgRNA and a target sequence; the recombinant expression vector can target n different target sequences for base substitution.

The element B also includes a pre-tRNA. The nucleotide sequence of the pre-tRNA is shown as positions 474-550 from the 5' end of sequence 1 in the sequence listing. From the 5' end, the element B consists, in order, of the nucleotide sequences of the pre-tRNA, the target sequence and the sgRNA.

In the expression cassette A, a promoter A drives expression of the coding gene of the nicking enzyme, the coding gene of the cytidine deaminase PmCDA1 and the coding gene of the uracil DNA glycosylase inhibitor UGI. From the 5' end, the expression cassette A comprises, in order, the promoter A, the coding gene of the nicking enzyme, the coding gene of the cytidine deaminase PmCDA1, the coding gene of the uracil DNA glycosylase inhibitor UGI and a terminator A. The promoter A may be the OsUbq3 promoter. The nucleotide sequence of the OsUbq3 promoter is shown as positions 1-1714 from the 5' end of sequence 2 in the sequence listing. The terminator A may be the CaMV35S terminator. The nucleotide sequence of the CaMV35S terminator is shown as positions 7229-7423 from the 5' end of sequence 2 in the sequence listing. The coding gene of the nicking enzyme is shown as positions 1721-5989 from the 5' end of sequence 2 in the sequence listing. The coding gene of the cytidine deaminase PmCDA1 is shown as positions 6281-6904 from the 5' end of sequence 2 in the sequence listing. The coding gene of the uracil DNA glycosylase inhibitor UGI is shown as positions 6926-7222 from the 5' end of sequence 2 in the sequence listing. The expression cassette A may specifically be as shown in sequence 2 in the sequence listing.

In any of the expression cassettes B, a promoter B drives expression of the element B. From the 5' end, the expression cassette B comprises, in order, the promoter B, the element B and a terminator B. The promoter B may specifically be the OsU3 promoter. The nucleotide sequence of the OsU3 promoter is shown as positions 131-467 from the 5' end of sequence 1 in the sequence listing. The terminator B may specifically be the OsU3 terminator. The nucleotide sequence of the OsU3 terminator is shown as positions 993-1283 from the 5' end of sequence 1 in the sequence listing. When the target sequences are as shown in Table 1, the expression cassette B may specifically be as shown at positions 131-1283 from the 5' end of sequence 1 in the sequence listing.

When the target sequences are as shown in Table 1, any of the above recombinant expression vectors may specifically be the circular plasmid obtained by replacing positions 1290-8712 from the 5' end of sequence 1 in the sequence listing with sequence 2 in the sequence listing.

The invention also provides a method for replacing the base of the plant genome, which comprises the following steps: the base substitution of the plant genome is accomplished using any of the base editing systems described above.

The invention also provides a method for replacing the base of the plant genome, which comprises the following steps: the recombinant expression vector described above is introduced into a target plant to achieve base substitution of a plant genome.

The invention also protects the nicking enzyme, which is shown as amino acids 1-1423 from the N-terminus of sequence 13 in the sequence listing.

The invention also protects the use of the fusion protein, the base editing system, any of the recombinant expression vectors, expression cassettes, recombinant cells or recombinant bacteria described above, or the nicking enzyme in base replacement of plant genomes.

Any of the base substitutions described above is a substitution of base C to T.

Any of the plants described above may specifically be rice, more specifically may be japonica rice.

The invention fuses the nicking enzyme HypaCas9n with PmCDA1 and UGI for the first time to construct a base editing system, and finds that, compared with SpCas9n & PmCDA1 & UGI, HypaCas9n & PmCDA1 & UGI can reduce off-target efficiency without substantially affecting C-to-T base substitution efficiency.

Drawings

FIG. 1 shows the C-to-T base substitution efficiency of SpCas9n & PmCDA1 & UGI and HypaCas9n & PmCDA1 & UGI.

FIG. 2 shows the off-target efficiency of SpCas9n & PmCDA1 & UGI and HypaCas9n & PmCDA1 & UGI.

Detailed Description

The following examples are given to facilitate a better understanding of the invention, but do not limit the invention. The experimental procedures in the following examples are conventional unless otherwise specified. The test materials used in the following examples were purchased from a conventional biochemical reagent store unless otherwise specified.

Nipponbare rice: reference: Effects of sodium nitroprusside and its photolysis products on the growth of Nipponbare rice seedlings and the expression of 5 hormone marker genes [J]. Journal of Henan Normal University (Natural Science Edition), 2017(2): 48-52. Publicly available from the Beijing Academy of Agriculture and Forestry Sciences.

The target genes, target names and sequences in the examples below are shown in Table 1.

TABLE 1

Target gene	Target name	Target sequence
OsALS	CS650	cgcgtccatggagatccacc
OsCDC48	CS651	gaccagccagcgtctggcgc
OsNRT1.1B	CS652	cggcgacggcgagcaagtgg

Example 1. C-to-T base substitution efficiency of the HypaCas9n & PmCDA1 & UGI system

I. Construction of genome editing vectors

1. SpCas9n & PmCDA1 & UGI vector: the artificially synthesized circular plasmid shown in sequence 1 in the sequence listing.

Sequence 1 in the sequence listing comprises the following three expression cassettes:

Positions 131-1283 from the 5' end of sequence 1 are expression cassette I, in which positions 131-467 are the nucleotide sequence of the OsU3 promoter, positions 474-550 are a pre-tRNA nucleotide sequence, positions 551-570 are the CS650 target nucleotide sequence, positions 571-646 are an sgRNA nucleotide sequence, positions 647-723 are a pre-tRNA nucleotide sequence, positions 724-743 are the CS651 target nucleotide sequence, positions 744-819 are an sgRNA nucleotide sequence, positions 820-896 are a pre-tRNA nucleotide sequence, positions 897-916 are the CS652 target nucleotide sequence, positions 917-992 are an sgRNA nucleotide sequence, and positions 993-1283 are the nucleotide sequence of the OsU3 terminator.
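The repeating pre-tRNA, target and sgRNA units of expression cassette I can be laid out programmatically. The sketch below assumes the unit lengths implied by the coordinates above (77 nt pre-tRNA, 20 nt target, 76 nt sgRNA) and that the units are contiguous; `array_layout` is a hypothetical helper, not part of the original disclosure.

```python
# Unit lengths derived from the quoted coordinates, e.g. 474-550 is 77 nt.
UNIT = (("pre-tRNA", 77), ("target", 20), ("sgRNA", 76))


def array_layout(start, n_targets, unit=UNIT):
    """Return (label, first, last) tuples, 1-based inclusive, for n elements."""
    layout = []
    pos = start
    for i in range(1, n_targets + 1):
        for name, length in unit:
            layout.append((f"{name} {i}", pos, pos + length - 1))
            pos += length
    return layout


for label, first, last in array_layout(start=474, n_targets=3):
    print(f"{label}: {first}-{last}")
```

With a start position of 474 and three targets, the computed ranges reproduce the coordinates listed for expression cassette I, ending with the third sgRNA at positions 917-992, immediately before the OsU3 terminator.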

Positions 1290-8712 from the 5' end of sequence 1 are expression cassette II, in which positions 1290-3003 are the nucleotide sequence of the OsUbq3 promoter, positions 3010-7278 are the nucleotide sequence of SpCas9n (without a stop codon), positions 7570-8193 are the nucleotide sequence of PmCDA1, positions 8215-8511 are the nucleotide sequence of UGI, and positions 8518-8712 are the nucleotide sequence of the CaMV35S terminator.

Positions 8787-12064 from the 5' end of sequence 1 are expression cassette III, in which positions 8787-10779 are the nucleotide sequence of the ZmUbi1 promoter, positions 10786-11811 are the nucleotide sequence of the hygromycin resistance gene, and positions 11812-12064 are the nucleotide sequence of the Nos terminator.

2. HypaCas9n & PmCDA1 & UGI vector: the circular plasmid obtained by replacing positions 1290-8712 (expression cassette II) from the 5' end of sequence 1 in the sequence listing with sequence 2 in the sequence listing. The HypaCas9n & PmCDA1 & UGI vector differs from the SpCas9n & PmCDA1 & UGI vector only in that the nucleotide sequence of SpCas9n (without a stop codon) is replaced with the nucleotide sequence of HypaCas9n (without a stop codon).

In sequence 2 in the sequence listing, positions 1-1714 from the 5' end are the nucleotide sequence of the OsUbq3 promoter, positions 1721-5989 are the nucleotide sequence of HypaCas9n (without a stop codon), positions 6281-6904 are the nucleotide sequence of PmCDA1, positions 6926-7222 are the nucleotide sequence of UGI, and positions 7229-7423 are the nucleotide sequence of the CaMV35S terminator.
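Because sequence 2 replaces positions 1290-8712 of sequence 1, each feature in sequence 2 should sit at a fixed offset from its counterpart in expression cassette II. The following sketch checks this, assuming a constant offset of 1289 (position 1 of sequence 2 corresponding to position 1290 of sequence 1); the dictionary and function names are illustrative only.

```python
OFFSET = 1290 - 1  # position 1 of sequence 2 sits at position 1290 of sequence 1

# Feature coordinates as quoted in the text (1-based, inclusive).
seq2_features = {
    "OsUbq3 promoter": (1, 1714),
    "Cas9n coding sequence": (1721, 5989),
    "PmCDA1": (6281, 6904),
    "UGI": (6926, 7222),
    "CaMV35S terminator": (7229, 7423),
}
seq1_cassette_ii = {
    "OsUbq3 promoter": (1290, 3003),
    "Cas9n coding sequence": (3010, 7278),
    "PmCDA1": (7570, 8193),
    "UGI": (8215, 8511),
    "CaMV35S terminator": (8518, 8712),
}


def to_seq1(feature_range, offset=OFFSET):
    """Shift a sequence 2 range into sequence 1 coordinates."""
    start, end = feature_range
    return start + offset, end + offset


for name, rng in seq2_features.items():
    shifted = to_seq1(rng)
    status = "matches" if shifted == seq1_cassette_ii[name] else "differs"
    print(f"{name}: {rng} -> {shifted} ({status})")
```

All five features line up under this offset, which is consistent with the statement that the two vectors differ only in the Cas9n coding sequence and that the SpCas9n and HypaCas9n coding sequences have the same length.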

II. Gene editing in rice callus

The SpCas9n & PmCDA1 & UGI vector and the HypaCas9n & PmCDA1 & UGI vector constructed in the first part were each processed according to the following steps 1-5:

1. Introduce the vector into Agrobacterium LBA4404 (Diego, Shanghai, CAT #: AC1030) to obtain recombinant bacteria, and culture the recombinant bacteria in YEP medium to obtain an infection solution with a bacterial OD600nm of 0.2.

2. Select Nipponbare rice seeds, dehusk, sterilize and wash them, place them evenly on N6 medium, and culture at 28 ℃ in the dark for 4-6 weeks to induce callus formation.

3. Soak the rice callus obtained in step 2 in the infection solution obtained in step 1 for 10 min, transfer the callus onto a culture dish containing two layers of filter paper, and culture for 3 days at 25 ℃ in the dark (the medium is N6 medium containing 100 mg/L timentin); then culture the callus on selection medium (N6 medium containing 50 mg/L hygromycin, pH 5.7) at 28 ℃ in the dark for 2 weeks, and transfer it to freshly prepared selection medium for a further 2 weeks of selection to obtain resistant callus.

4. Extract genomic DNA from the resistant callus obtained in step 3 and perform PCR amplification with a primer pair consisting of primer F (5'-attatgtagcttgtgcgtttcg-3') and primer R (5'-gatgaagagcttatcgacgt-3'); subject the amplification product to agarose gel electrophoresis. If the electrophoretogram shows a band of 1150 bp, the corresponding callus is a positive resistant callus.

5. Using the DNA of the positive resistant calli obtained in step 4 (15 selected per vector) as template, perform PCR amplification with the CS650 target primers (CS650-F and CS650-R), the CS651 target primers (CS651-F and CS651-R) and the CS652 target primers (CS652-F and CS652-R), respectively, and sequence the amplification products.

CS650-F: 5'-taagaaccaccagcgacacc-3';

CS650-R: 5'-ggtaattgtgcttggtgatggag-3';

CS651-F: 5'-acatcgagatggagaagcgg-3';

CS651-R: 5'-ccatgctccaatcgatgaatac-3';

CS652-F: 5'-ttacgaactttataactttgtcgg-3';

CS652-R: 5'-atggaggcgatgaggaagac-3'.
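The PCR screen in step 4 predicts a band of a fixed size from a forward and a reverse primer. A minimal in-silico sketch of how an expected amplicon length follows from primer positions is given below. The template here is a short made-up string built only for demonstration (it is not the vector sequence), and `amplicon_length` is a hypothetical helper that requires exact primer matches.

```python
PRIMER_F = "attatgtagcttgtgcgtttcg"  # primer F from step 4
PRIMER_R = "gatgaagagcttatcgacgt"    # primer R from step 4


def reverse_complement(seq):
    """Reverse complement of a DNA string (a, c, g, t only)."""
    return seq.translate(str.maketrans("acgtACGT", "tgcaTGCA"))[::-1]


def amplicon_length(template, fwd, rev):
    """Length of the product bounded by fwd and rev, or None if a primer is absent."""
    t = template.lower()
    start = t.find(fwd.lower())
    if start == -1:
        return None
    rev_site = reverse_complement(rev.lower())
    end = t.find(rev_site, start)
    if end == -1:
        return None
    return end + len(rev_site) - start


# Toy template (an assumption for illustration): primer F site, 50 nt spacer,
# then the site that primer R anneals to on the opposite strand.
toy_template = "gg" + PRIMER_F + "a" * 50 + reverse_complement(PRIMER_R) + "cc"
print(amplicon_length(toy_template, PRIMER_F, PRIMER_R))
```

Applied to the full vector sequence instead of the toy template, the same calculation would give the expected band size for the positive-callus screen, provided both primers match the template exactly.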

For the CS650 target, the C-to-T base substitution site is shown in upper case: cgTgtccatggagatccacc;

for the CS651 target, the C-to-T base substitution sites are shown in upper case: gaTTagccagcgtctggcgc;

for the CS652 target, the C-to-T base substitution site is shown in upper case: cggTgacggcgagcaagtgg.
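The substituted positions can be recovered by comparing each edited sequence above with the corresponding target sequence in Table 1. The sketch below does this with positions counted from the 5' end of the 20-nt target; `c_to_t_positions` is a hypothetical helper, and it rejects any difference other than C to T.

```python
TARGETS = {
    "CS650": ("cgcgtccatggagatccacc", "cgTgtccatggagatccacc"),
    "CS651": ("gaccagccagcgtctggcgc", "gaTTagccagcgtctggcgc"),
    "CS652": ("cggcgacggcgagcaagtgg", "cggTgacggcgagcaagtgg"),
}


def c_to_t_positions(reference, edited):
    """Return 1-based positions where reference C reads as T in edited."""
    if len(reference) != len(edited):
        raise ValueError("sequences must have equal length")
    positions = []
    for i, (ref, obs) in enumerate(zip(reference.lower(), edited.lower()), start=1):
        if ref == obs:
            continue
        if ref == "c" and obs == "t":
            positions.append(i)
        else:
            raise ValueError(f"unexpected change at position {i}: {ref}>{obs}")
    return positions


for name, (reference, edited) in TARGETS.items():
    print(name, c_to_t_positions(reference, edited))
```

For the three targets this gives position 3 for CS650, positions 3 and 4 for CS651, and position 4 for CS652.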

For each of the three targets, the number of positive resistant calli with a C-to-T base substitution was counted; the base substitution efficiency is the proportion of the 15 positive resistant calli in which a C-to-T base substitution occurred.
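The efficiency defined above is a simple proportion. As a sketch, with entirely hypothetical per-callus calls (the counts below are invented for illustration and are not results from this document):

```python
def substitution_efficiency(edited_flags):
    """Fraction of calli in which a C-to-T substitution was detected."""
    if not edited_flags:
        raise ValueError("no calli supplied")
    return sum(edited_flags) / len(edited_flags)


# Hypothetical example: 6 of 15 positive resistant calli carry the edit.
hypothetical_calls = [True] * 6 + [False] * 9
print(f"{substitution_efficiency(hypothetical_calls):.1%}")
```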

The results are shown in FIG. 1. They show that the C-to-T base substitution efficiency of HypaCas9n & PmCDA1 & UGI is comparable to, or slightly lower than, that of SpCas9n & PmCDA1 & UGI.

Example 2. Off-target effect of HypaCas9n & PmCDA1 & UGI

I. Construction of genome editing vectors

SpCas9n & PmCDA1 & UGI-T1: the circular plasmid obtained by replacing positions 551-916 from the 5' end of sequence 1 in the sequence listing with sequence 3 in the sequence listing.

SpCas9n & PmCDA1 & UGI-T2: the circular plasmid obtained by replacing positions 551-916 from the 5' end of sequence 1 in the sequence listing with sequence 4 in the sequence listing.

SpCas9n & PmCDA1 & UGI-T3: the circular plasmid obtained by replacing positions 551-916 from the 5' end of sequence 1 in the sequence listing with sequence 5 in the sequence listing.

SpCas9n & PmCDA1 & UGI-T4: the circular plasmid obtained by replacing positions 551-916 from the 5' end of sequence 1 in the sequence listing with sequence 6 in the sequence listing.

SpCas9n & PmCDA1 & UGI-T5: the circular plasmid obtained by replacing positions 551-916 from the 5' end of sequence 1 in the sequence listing with sequence 7 in the sequence listing.

SpCas9n & PmCDA1 & UGI-T6: the circular plasmid obtained by replacing positions 551-916 from the 5' end of sequence 1 in the sequence listing with sequence 8 in the sequence listing.

SpCas9n & PmCDA1 & UGI-T7: the circular plasmid obtained by replacing positions 551-916 from the 5' end of sequence 1 in the sequence listing with sequence 9 in the sequence listing.

SpCas9n & PmCDA1 & UGI-T8: the circular plasmid obtained by replacing positions 551-916 from the 5' end of sequence 1 in the sequence listing with sequence 10 in the sequence listing.

SpCas9n & PmCDA1 & UGI-T9: the circular plasmid obtained by replacing positions 551-916 from the 5' end of sequence 1 in the sequence listing with sequence 11 in the sequence listing.

SpCas9n & PmCDA1 & UGI-T10: the circular plasmid obtained by replacing positions 551-916 from the 5' end of sequence 1 in the sequence listing with sequence 12 in the sequence listing.

HypaCas9n & PmCDA1 & UGI-T1: the circular plasmid obtained by replacing positions 551-916 from the 5' end of the HypaCas9n & PmCDA1 & UGI vector with sequence 3 in the sequence listing.

HypaCas9n & PmCDA1 & UGI-T2: the circular plasmid obtained by replacing positions 551-916 from the 5' end of the HypaCas9n & PmCDA1 & UGI vector with sequence 4 in the sequence listing.

HypaCas9n & PmCDA1 & UGI-T3: the circular plasmid obtained by replacing positions 551-916 from the 5' end of the HypaCas9n & PmCDA1 & UGI vector with sequence 5 in the sequence listing.

HypaCas9n & PmCDA1 & UGI-T4: the circular plasmid obtained by replacing positions 551-916 from the 5' end of the HypaCas9n & PmCDA1 & UGI vector with sequence 6 in the sequence listing.

HypaCas9n & PmCDA1 & UGI-T5: the circular plasmid obtained by replacing positions 551-916 from the 5' end of the HypaCas9n & PmCDA1 & UGI vector with sequence 7 in the sequence listing.

HypaCas9n & PmCDA1 & UGI-T6: the circular plasmid obtained by replacing positions 551-916 from the 5' end of the HypaCas9n & PmCDA1 & UGI vector with sequence 8 in the sequence listing.

HypaCas9n & PmCDA1 & UGI-T7: the circular plasmid obtained by replacing positions 551-916 from the 5' end of the HypaCas9n & PmCDA1 & UGI vector with sequence 9 in the sequence listing.

HypaCas9n & PmCDA1 & UGI-T8: the circular plasmid obtained by replacing positions 551-916 from the 5' end of the HypaCas9n & PmCDA1 & UGI vector with sequence 10 in the sequence listing.

HypaCas9n & PmCDA1 & UGI-T9: the circular plasmid obtained by replacing positions 551-916 from the 5' end of the HypaCas9n & PmCDA1 & UGI vector with sequence 11 in the sequence listing.

HypaCas9n & PmCDA1 & UGI-T10: the circular plasmid obtained by replacing positions 551-916 from the 5' end of the HypaCas9n & PmCDA1 & UGI vector with sequence 12 in the sequence listing.
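Each T1-T10 vector above is defined by the same operation: a 1-based, inclusive region of the parent plasmid is swapped for another sequence. A minimal sketch of that operation on a plain string follows; `replace_region` is a hypothetical helper, and the short plasmid string is a toy stand-in rather than sequence 1.

```python
def replace_region(plasmid, start, end, insert):
    """Replace positions start..end (1-based, inclusive) of plasmid with insert."""
    if not 1 <= start <= end <= len(plasmid):
        raise ValueError("region lies outside the plasmid sequence")
    return plasmid[:start - 1] + insert + plasmid[end:]


# Toy example: swap positions 5-8 of a 12-nt string for a 2-nt insert.
toy_plasmid = "aaaaccccgggg"
print(replace_region(toy_plasmid, 5, 8, "tt"))
```

In the terms used above, a T-series vector would correspond to calling `replace_region` on the parent vector sequence with `start=551`, `end=916` and the relevant one of sequences 3-12 as the insert. The replacement need not have the same length as the region it replaces.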

II. Gene editing in rice callus

1. Each vector obtained in the first part was processed according to steps 1-4 of the second part of Example 1 to obtain positive resistant callus.

2. For each vector, 8 pieces were randomly selected from the positive resistant callus obtained in step 1 and their DNA was pooled. A first round of PCR amplification was performed with the CS652 target primers (CS652-F and CS652-R); using the first-round PCR product as template, forward and reverse barcodes were added to the ends of the PCR product to construct a library, which was sequenced on an Illumina HiSeq 2500 high-throughput sequencing platform at a depth of 10000X (Shijiazhuang Boryddi Biotechnology Co., Ltd.). C-to-T base substitutions and indels were detected in the target region; any C-to-T base substitution or indel in the target region was counted as an off-target mutation, and the off-target efficiency is the proportion of mutant reads among total reads.
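The off-target efficiency defined in step 2 can be sketched as a read-classification problem. The example below is a simplification under stated assumptions: each read is assumed to be already trimmed to the 20-nt CS652 target window, a length change is treated as an indel, and the reads are invented for illustration. A real analysis would align reads to the amplicon first.

```python
TARGET = "cggcgacggcgagcaagtgg"  # CS652 target sequence from Table 1


def is_mutant(read_window, reference=TARGET):
    """True if the window shows an indel (length change) or any C-to-T change."""
    read = read_window.lower()
    ref = reference.lower()
    if len(read) != len(ref):
        return True
    return any(r == "c" and q == "t" for r, q in zip(ref, read))


def off_target_efficiency(read_windows):
    """Proportion of mutant reads among all reads."""
    if not read_windows:
        raise ValueError("no reads supplied")
    return sum(is_mutant(w) for w in read_windows) / len(read_windows)


# Hypothetical reads: 8 unedited, 1 with a C-to-T change, 1 with a 1-nt deletion.
hypothetical_reads = (
    [TARGET] * 8
    + ["cggtgacggcgagcaagtgg"]
    + ["cggcgacggcgagcaagtg"]
)
print(off_target_efficiency(hypothetical_reads))
```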

The results are shown in FIG. 2. They show that off-target editing can occur when the 2 bp at the 5' end differ; the off-target efficiency of HypaCas9n & PmCDA1 & UGI was about 4-fold lower than that of SpCas9n & PmCDA1 & UGI.

The above results indicate that HypaCas9 can be converted into a nicking enzyme (HypaCas9n) by introducing the D10A mutation and fused with PmCDA1 and UGI to construct a base editing system. Compared with SpCas9n & PmCDA1 & UGI, HypaCas9n & PmCDA1 & UGI can reduce off-target efficiency without substantially affecting C-to-T base substitution efficiency.

Sequence listing

<110> Beijing Academy of Agriculture and Forestry Sciences

<120> a nicking enzyme and its use in genome base substitution

<160> 13

<170> SIPOSequenceListing 1.0

<210> 1

<211> 18476

<212> DNA

<213> Artificial Sequence

<400> 1

ggtggcagga tatattgtgg tgtaaacatg gcactagcct caccgtcttc gcagacgagg 60

ccgctaagtc gcagctacgc tctcaacggc actgactagg tagtttaaac gtgcacttaa 120

ttaaggtacc gaagcaactt aaagttatca ggcatgcatg gatcttggag gaatcagatg 180

tgcagtcagg gaccatagca caagacaggc gtcttctact ggtgctacca gcaaatgctg 240

gaagccggga acactgggta cgttggaaac cacgtgatgt gaagaagtaa gataaactgt 300

aggagaaaag catttcgtag tgggccatga agcctttcag gacatgtatt gcagtatggg 360

ccggcccatt acgcaattgg acgacaacaa agactagtat tagtaccacc tcggctatcc 420

acatagatca aagctgattt aaaagagttg tgcagatgat ccgtggcgga tccaacaaag 480

caccagtggt ctagtggtag aatagtaccc tgccacggta cagacccggg ttcgattccc 540

ggctggtgca cgcgtccatg gagatccacc gttttagagc tagaaatagc aagttaaaat 600

aaggctagtc cgttatcaac ttgaaaaagt ggcaccgagt cggtgcaaca aagcaccagt 660

ggtctagtgg tagaatagta ccctgccacg gtacagaccc gggttcgatt cccggctggt 720

gcagaccagc cagcgtctgg cgcgttttag agctagaaat agcaagttaa aataaggcta 780

gtccgttatc aacttgaaaa agtggcaccg agtcggtgca acaaagcacc agtggtctag 840

tggtagaata gtaccctgcc acggtacaga cccgggttcg attcccggct ggtgcacggc 900

gacggcgagc aagtgggttt tagagctaga aatagcaagt taaaataagg ctagtccgtt 960

atcaacttga aaaagtggca ccgagtcggt gctttttttt ttcgttttgc attgagtttt 1020

ctccgtcgca tgtttgcagt tttattttcc gttttgcatt gaaatttctc cgtctcatgt 1080

ttgcagcgtg ttcaaaaagt acgcagctgt atttcactta tttacggcgc cacattttca 1140

tgccgtttgt gccaactatc ccgagctagt gaatacagct tggcttcaca caacactggt 1200

gacccgctga cctgctcgta cctcgtaccg tcgtacggca cagcatttgg aattaaaggg 1260

tgtgatcgat actgcttgct gctaagctta caaattcggg tcaaggcgga agccagcgcg 1320

ccaccccacg tcagcaaata cggaggcgcg gggttgacgg cgtcacccgg tcctaacggc 1380

gaccaacaaa ccagccagaa gaaattacag taaaaaaaaa gtaaattgca ctttgatcca 1440

ccttttatta cctaagtctc aatttggatc acccttaaac ctatcttttc aatttgggcc 1500

gggttgtggt ttggactacc atgaacaact tttcgtcatg tctaacttcc ctttcagcaa 1560

acatatgaac catatataga ggagatcggc cgtatactag agctgatgtg tttaaggtcg 1620

ttgattgcac gagaaaaaaa aatccaaatc gcaacaatag caaatttatc tggttcaaag 1680

tgaaaagata tgtttaaagg tagtccaaag taaaacttat agataataaa atgtggtcca 1740

aagcgtaatt cactcaaaaa aaatcaacga gacgtgtacc aaacggagac aaacggcatc 1800

ttctcgaaat ttcccaaccg ctcgctcgcc cgcctcgtct tcccggaaac cgcggtggtt 1860

tcagcgtggc ggattctcca agcagacgga gacgtcacgg cacgggactc ctcccaccac 1920

ccaaccgcca taaataccag ccccctcatc tcctctcctc gcatcagctc cacccccgaa 1980

aaatttctcc ccaatctcgc gaggctctcg tcgtcgaatc gaatcctctc gcgtcctcaa 2040

ggtacgctgc ttctcctctc ctcgcttcgt ttcgattcga tttcggacgg gtgaggttgt 2100

tttgttgcta gatccgattg gtggttaggg ttgtcgatgt gattatcgtg agatgtttag 2160

gggttgtaga tctgatggtt gtgatttggg cacggttggt tcgataggtg gaatcgtggt 2220

taggttttgg gattggatgt tggttctgat gattgggggg aatttttacg gttagatgaa 2280

ttgttggatg attcgattgg ggaaatcggt gtagatctgt tggggaattg tggaactagt 2340

catgcctgag tgattggtgc gatttgtagc gtgttccatc ttgtaggcct tgttgcgagc 2400

atgttcagat ctactgttcc gctcttgatt gagttattgg tgccatgggt tggtgcaaac 2460

acaggcttta atatgttata tctgttttgt gtttgatgta gatctgtagg gtagttcttc 2520

ttagacatgg ttcaattatg tagcttgtgc gtttcgattt gatttcatat gttcacagat 2580

tagataatga tgaactcttt taattaattg tcaatggtaa ataggaagtc ttgtcgctat 2640

atctgtcata atgatctcat gttactatct gccagtaatt tatgctaaga actatattag 2700

aatatcatgt tacaatctgt agtaatatca tgttacaatc tgtagttcat ctatataatc 2760

tattgtggta atttcttttt actatctgtg tgaagattat tgccactagt tcattctact 2820

tatttctgaa gttcaggata cgtgtgctgt tactacctat ctgaatacat gtgtgatgtg 2880

cctgttacta tctttttgaa tacatgtatg ttctgttgga atatgtttgc tgtttgatcc 2940

gttgttgtgt ccttaatctt gtgctagttc ttaccctatc tgtttggtga ttatttcttg 3000

cagtacgtaa tggactacaa ggaccacgac ggggattaca aagaccacga catagactac 3060

aaggatgacg atgacaaaat ggcaccgaag aaaaaaagga aggtcggaat ccatggcgtt 3120

ccagctgccg ataagaaata ttccatcgga ctcgccattg gcacgaatag cgtcggatgg 3180

gctgttatta ctgatgagta caaagttccg tctaagaagt tcaaggtgct gggcaacaca 3240

gaccgccaca gcataaagaa aaatctcatc ggtgcactcc ttttcgatag tggggagact 3300

gcagaagcga caagattgaa aaggactgcg agaaggcgct atacacggcg taagaataga 3360

atctgctacc ttcaggagat tttctctaac gaaatggcta aggtcgatga cagtttcttt 3420

catagacttg aggaatcgtt cttggttgag gaggataaga aacatgagag gcacccgata 3480

tttggaaaca tcgtggatga ggtcgcatat catgaaaagt accccacaat ctaccacctg 3540

agaaagaaac tcgttgattc caccgacaaa gcggatttga gactcatcta cctcgctctt 3600

gcccatatga taaagttccg cggacacttt ctgatcgagg gcgacctcaa ccctgataat 3660

agcgacgtcg ataagctctt catccagttg gttcaaacct acaatcagct ctttgaggaa 3720

aacccaatta atgctagtgg agtggatgca aaagcgatac tgtcggccag actctccaag 3780

agcagaaggt tggagaacct gatcgctcaa cttcctggag aaaagaaaaa cggtcttttt 3840

gggaatttga ttgccttgtc tctgggcctc acaccaaact tcaagtcaaa ttttgacctc 3900

gctgaggatg ccaaacttca gttgtctaag gatacctatg atgacgatct tgacaatttg 3960

ctggcacaaa ttggcgacca gtacgcggat ctgttcctcg cagcgaagaa tctgagtgat 4020

gctattctcc tttcggacat actcagggtt aacactgaga tcacaaaagc acctttgagt 4080

gcgtcgatga ttaagcgcta tgatgaacat caccaagacc tcactttgct gaaggccctt 4140

gtgcggcagc aattgccaga gaagtacaaa gaaatcttct ttgaccaatc taagaacgga 4200

tacgctggct atattgatgg aggagcttct caggaggaat tctataagtt tatcaaacct 4260

atacttgaga agatggatgg tacagaggaa ctccttgtta aattgaacag agaagatttg 4320

ctgcgcaagc aacggacctt tgacaacgga tcaattccgc atcagataca cctcggcgag 4380

cttcatgcca tccttcgccg gcaggaagat ttctacccct ttttgaagga caaccgcgag 4440

aagatagaaa aaatccttac gttccggatt ccttactatg tgggtccatt ggcaaggggg 4500

aattcccgct ttgcgtggat gactcggaaa agcgaggaaa ctatcacacc gtggaacttc 4560

gaggaagttg tggacaaggg agcttctgcc caatcattca ttgagaggat gactaacttc 4620

gataagaacc tgccgaacga gaaagttctc cccaagcact ccctccttta cgagtatttc 4680

accgtgtata acgaacttac gaaggttaaa tacgtgactg agggtatgag gaagccagca 4740

ttcttgagcg gggaacaaaa gaaagcgatt gttgatttgc tgtttaaaac taatcgcaag 4800

gtgacagtca agcagctcaa agaggattat ttcaagaaaa ttgaatgttt cgactctgtg 4860

gagatatcag gagtcgaaga taggtttaac gcttcccttg gcacatacca tgacctcctt 4920

aagatcatta aggacaaaga tttcctggat aacgaggaaa atgaggacat cctcgaagat 4980

attgttctta ccttgacgct gtttgaggat cgcgaaatga tcgaggaacg gcttaagacg 5040

tatgctcact tgttcgacga taaggttatg aagcagctca agcgtagaag gtacactgga 5100

tggggccgtc tgtctagaaa gctcatcaac ggaatacgtg ataaacaaag tggcaagaca 5160

attttggatt ttctgaagtc ggacggattc gccaacagaa attttatgca gctgattcat 5220

gacgatagtc tcaccttcaa agaggacata cagaaggctc aagtgagtgg tcaaggggat 5280

tcgctgcatg aacacatcgc aaacctcgcg ggttcaccgg ccataaagaa aggaatcctt 5340

caaactgtta aggtcgttga tgagttggtt aaagtgatgg gtaggcacaa gcccgaaaac 5400

atagtgatcg agatggctcg cgaaaatcag actacacaaa aagggcagaa gaactctcgc 5460

gagcggatga aaaggattga ggaaggaatc aaggaactgg gctcacagat tctcaaagag 5520

catccagtcg aaaacacaca gctgcaaaat gagaagctct atctttacta tctccaaaat 5580

ggccgggaca tgtatgttga tcaggagctt gacatcaacc gtttgtccga ctatgatgtg 5640

gaccacattg tcccgcaatc tttccttaag gacgattcaa tcgataataa ggtgttgacc 5700

cggagcgata aaaaccgtgg aaagtctgac aatgtccctt cagaggaagt ggttaagaag 5760

atgaagaact actggagaca attgctgaat gcaaaactga tcacacagag aaagttcgac 5820

aacctcacca aagcagagag aggtgggctc agtgaacttg ataaagcggg cttcattaag 5880

cgtcagctcg ttgagactag acagatcacg aagcatgtcg cgcagatttt ggattcgcgg 5940

atgaacacga agtacgacga gaatgataaa ctgatacgtg aagtcaaggt tatcactctt 6000

aagtccaaat tggtgagcga tttcagaaag gacttccaat tctataaggt cagggagatc 6060

aacaattatc atcacgctca cgatgcctac cttaatgctg ttgtggggac cgcccttatt 6120

aagaaatacc ctaaattgga gtctgaattc gtttacgggg attataaggt ctacgacgtt 6180

aggaaaatga tagctaagag tgagcaggag atcggtaaag caactgcgaa gtatttcttt 6240

tactcgaaca tcatgaattt ctttaagacc gagataacgc tggcaaatgg cgaaattaga 6300

aagaggcctc tcatagagac taacggtgag acaggggaaa tcgtctggga taagggtagg 6360

gactttgcga cagtgcgcaa ggtcctctct atgccgcaag ttaatattgt gaagaaaacc 6420

gaggtgcaga cgggaggctt ctccaaggaa agcatacttc ccaaacggaa ctctgataag 6480

ttgatcgctc gtaagaaaga ttgggaccct aagaaatatg gtgggttcga ttccccaact 6540

gttgcttaca gcgtgctggt cgttgccaag gtcgagaagg gtaaatccaa gaaactcaaa 6600

agcgttaagg aactccttgg gattactatc atggagagat cttcattcga aaagaatcct 6660

atcgactttc ttgaggccaa aggatataag gaagttaaga aagatctgat aatcaaactc 6720

ccaaagtact cattgtttga gctggaaaac ggcaggaagc gcatgcttgc ttccgccgga 6780

gagttgcaga aagggaacga gttggctctg ccttctaagt atgttaactt cctctatctt 6840

gcctctcatt acgagaagct caaaggctca ccagaggaca acgaacagaa acaacttttt 6900

gtcgagcaac ataagcacta tttggatgag attatagaac agatcagtga attctcgaaa 6960

agggttatcc ttgcagatgc gaatcttgac aaggtgttgt ctgcatacaa caaacataga 7020

gataagccga tcagggagca agcggaaaat atcattcacc tcttcactct tacaaacttg 7080

ggtgctcccg ctgccttcaa gtattttgat accacgattg accggaaacg ttacacctca 7140

acgaaggagg tgctggatgc caccctcatc caccaatcta ttaccggact ctacgagact 7200

agaatcgatc tctcacagct cggcggggat aaaagaccag cagcgacgaa aaaggcagga 7260

caggctaaga agaagaaaga gctcggagga ggaggcacgg gaggaggagg ctccgccgag 7320

tatgtgcgcg cgctcttcga cttcaacggc aatgacgagg aggatctccc tttcaagaag 7380

ggcgacatcc tccgcatccg cgataagccg gaggagcagt ggtggaacgc agaggactcc 7440

gagggcaagc ggggcatgat cctggtgcca tacgtcgaga agtacagcgg cgattacaag 7500

gaccacgatg gcgactacaa ggatcatgac atcgattaca aggacgatga cgataagtcc 7560

ggcgtcgaca tgacggacgc ggagtatgtg cgcatccacg agaagctcga tatctacacc 7620

ttcaagaagc agttcttcaa caataagaag tcggtgtccc atcggtgcta cgtcctcttc 7680

gagctgaagc gcaggggaga gcgccgcgcc tgcttctggg gctacgcggt gaataagccg 7740

cagtcaggca cagagcgcgg catccacgcc gagatcttct cgatccggaa ggtcgaggag 7800

tacctccgcg acaacccagg ccagttcacg atcaattggt actccagctg gtccccttgc 7860

gcagattgcg cagagaagat cctcgagtgg tacaaccagg agctgagggg caatggccat 7920

accctcaaga tctgggcctg caagctgtac tacgagaaga acgcgaggaa tcagatcggc 7980

ctctggaacc tgcgggataa tggcgtgggc ctcaacgtga tggtgtccga gcactaccag 8040

tgctgccgca agatcttcat ccagtcctcc cacaatcagc tgaacgagaa taggtggctc 8100

gaaaagaccc tgaagcgcgc cgagaagtgg aggagcgagc tgtctatcat gatccaggtc 8160

aagatcctgc acaccacaaa gtcaccggcg gtgggcggcg gcggcagcga attctccggc 8220

ggcagcacga acctcagcga catcatcgag aaggagacag gcaagcagct cgtgatccag 8280

gagtctatcc tcatgctgcc tgaggaggtg gaggaggtca tcggcaacaa gccggagtcc 8340

gatatcctcg tgcacaccgc ctacgacgag tcgacagatg agaatgtcat gctcctgacc 8400

tccgacgcac cagagtacaa gccatgggcg ctcgtgatcc aggattccaa cggcgagaat 8460

aagatcaaga tgctgtctgg cggctccccg aagaagaagc gcaaggtcta gactagtctg 8520

aaatcaccag tctctctcta caaatctatc tctctctata ataatgtgtg agtagttccc 8580

agataaggga attagggttc ttatagggtt tcgctcatgt gttgagcata taagaaaccc 8640

ttagtatgta tttgtatttg taaaatactt ctatcaataa aatttctaat tcctaaaacc 8700

aaaatccagt ggggcgcccg acctgtactc gcgaaggtta acttacagag agtgtccggg 8760

cgcgcctggt ggatcgtccg cctaggctgc agtgcagcgt gacccggtcg tgcccctctc 8820

tagagataat gagcattgca tgtctaagtt ataaaaaatt accacatatt ttttttgtca 8880

cacttgtttg aagtgcagtt tatctatctt tatacatata tttaaacttt actctacgaa 8940

taatataatc tatagtacta caataatatc agtgttttag agaatcatat aaatgaacag 9000

ttagacatgg tctaaaggac aattgagtat tttgacaaca ggactctaca gttttatctt 9060

tttagtgtgc atgtgttctc cttttttttt gcaaatagct tcacctatat aatacttcat 9120

ccattttatt agtacatcca tttagggttt agggttaatg gtttttatag actaattttt 9180

ttagtacatc tattttattc tattttagcc tctaaattaa gaaaactaaa actctatttt 9240

agttttttta tttaataatt tagatataaa atagaataaa ataaagtgac taaaaattaa 9300

acaaataccc tttaagaaat taaaaaaact aaggaaacat ttttcttgtt tcgagtagat 9360

aatgccagcc tgttaaacgc cgtcgacgag tctaacggac accaaccagc gaaccagcag 9420

cgtcgcgtcg ggccaagcga agcagacggc acggcatctc tgtcgctgcc tctggacccc 9480

tctcgagagt tccgctccac cgttggactt gctccgctgt cggcatccag aaattgcgtg 9540

gcggagcggc agacgtgagc cggcacggca ggcggcctcc tcctcctctc acggcaccgg 9600

cagctacggg ggattccttt cccaccgctc cttcgctttc ccttcctcgc ccgccgtaat 9660

aaatagacac cccctccaca ccctctttcc ccaacctcgt gttgttcgga gcgcacacac 9720

acacaaccag atctccccca aatccacccg tcggcacctc cgcttcaagg tacgccgctc 9780

gtcctccccc cccccccctc tctaccttct ctagatcggc gttccggtcc atggttaggg 9840

cccggtagtt ctacttctgt tcatgtttgt gttagatccg tgtttgtgtt agatccgtgc 9900

tgctagcgtt cgtacacgga tgcgacctgt acgtcagaca cgttctgatt gctaacttgc 9960

cagtgtttct ctttggggaa tcctgggatg gctctagccg ttccgcagac gggatcgatt 10020

tcatgatttt ttttgtttcg ttgcataggg tttggtttgc ccttttcctt tatttcaata 10080

tatgccgtgc acttgtttgt cgggtcatct tttcatgctt ttttttgtct tggttgtgat 10140

gatgtggtct ggttgggcgg tcgttctaga tcggagtaga attctgtttc aaactacctg 10200

gtggatttat taattttgga tctgtatgtg tgtgccatac atattcatag ttacgaattg 10260

aagatgatgg atggaaatat cgatctagga taggtataca tgttgatgcg ggttttactg 10320

atgcatatac agagatgctt tttgttcgct tggttgtgat gatgtggtgt ggttgggcgg 10380

tcgttcattc gttctagatc ggagtagaat actgtttcaa actacctggt gtatttatta 10440

attttggaac tgtatgtgtg tgtcatacat cttcatagtt acgagtttaa gatggatgga 10500

aatatcgatc taggataggt atacatgttg atgtgggttt tactgatgca tatacatgat 10560

ggcatatgca gcatctattc atatgctcta accttgagta cctatctatt ataataaaca 10620

agtatgtttt ataattattt tgatcttgat atacttggat gatggcatat gcagcagcta 10680

tatgtggatt tttttagccc tgccttcata cgctatttat ttgcttggta ctgtttcttt 10740

tgtcgatgct caccctgttg tttggtgtta cttctgcagg agctcatgaa aaagcctgaa 10800

ctcaccgcga cgtctgtcga gaagtttctg atcgaaaagt tcgacagcgt ctccgacctg 10860

atgcagctct cggagggcga agaatctcgt gctttcagct tcgatgtagg agggcgtgga 10920

tatgtcctgc gggtaaatag ctgcgccgat ggtttctaca aagatcgtta tgtttatcgg 10980

cactttgcat cggccgcgct cccgattccg gaagtgcttg acattgggga gtttagcgag 11040

agcctgacct attgcatctc ccgccgttca cagggtgtca cgttgcaaga cctgcctgaa 11100

accgaactgc ccgctgttct acaaccggtc gcggaggcta tggatgcgat cgctgcggcc 11160

gatcttagcc agacgagcgg gttcggccca ttcggaccgc aaggaatcgg tcaatacact 11220

acatggcgtg atttcatatg cgcgattgct gatccccatg tgtatcactg gcaaactgtg 11280

atggacgaca ccgtcagtgc gtccgtcgcg caggctctcg atgagctgat gctttgggcc 11340

gaggactgcc ccgaagtccg gcacctcgtg cacgcggatt tcggctccaa caatgtcctg 11400

acggacaatg gccgcataac agcggtcatt gactggagcg aggcgatgtt cggggattcc 11460

caatacgagg tcgccaacat cttcttctgg aggccgtggt tggcttgtat ggagcagcag 11520

acgcgctact tcgagcggag gcatccggag cttgcaggat cgccacgact ccgggcgtat 11580

atgctccgca ttggtcttga ccaactctat cagagcttgg ttgacggcaa tttcgatgat 11640

gcagcttggg cgcagggtcg atgcgacgca atcgtccgat ccggagccgg gactgtcggg 11700

cgtacacaaa tcgcccgcag aagcgcggcc gtctggaccg atggctgtgt agaagtactc 11760

gccgatagtg gaaaccgacg ccccagcact cgtccgaggg caaagaaata ggatcgttca 11820

aacatttggc aataaagttt cttaagattg aatcctgttg ccggtcttgc gatgattatc 11880

atataatttc tgttgaatta cgttaagcat gtaataatta acatgtaatg catgacgtta 11940

tttatgagat gggtttttat gattagagtc ccgcaattat acatttaata cgcgatagaa 12000

aacaaaatat agcgcgcaaa ctaggataaa ttatcgcgcg cggtgtcatc tatgttacta 12060

gatctgtagc cctgcaggac gcgtttaatt aagtgcacgc ggccgcctac ttagtcaaga 12120

gcctcgcacg cgactgtcac gcggccagga tcgcctcgtg agcctcgcaa tctgtaccta 12180

gtgtttaaac tatcagtgtt tgacaggata tattggcggg taaacctaag agaaaagagc 12240

gtttattaga ataacggata tttaaaaggg cgtgaaaagg tttatccgtt cgtccatttg 12300

tatgtgcatg ccaaccacag ggttcccctc gggatcaaag tactttgatc caacccctcc 12360

gctgctatag tgcagtcggc ttctgacgtt cagtgcagcc gtcttctgaa aacgacatgt 12420

cgcacaagtc ctaagttacg cgacaggctg ccgccctgcc cttttcctgg cgttttcttg 12480

tcgcgtgttt tagtcgcata aagtagaata cttgcgacta gaaccggaga cattacgcca 12540

tgaacaagag cgccgccgct ggcctgctgg gctatgcccg cgtcagcacc gacgaccagg 12600

acttgaccaa ccaacgggcc gaactgcacg cggccggctg caccaagctg ttttccgaga 12660

agatcaccgg caccaggcgc gaccgcccgg agctggccag gatgcttgac cacctacgcc 12720

ctggcgacgt tgtgacagtg accaggctag accgcctggc ccgcagcacc cgcgacctac 12780

tggacattgc cgagcgcatc caggaggccg gcgcgggcct gcgtagcctg gcagagccgt 12840

gggccgacac caccacgccg gccggccgca tggtgttgac cgtgttcgcc ggcattgccg 12900

agttcgagcg ttccctaatc atcgaccgca cccggagcgg gcgcgaggcc gccaaggccc 12960

gaggcgtgaa gtttggcccc cgccctaccc tcaccccggc acagatcgcg cacgcccgcg 13020

agctgatcga ccaggaaggc cgcaccgtga aagaggcggc tgcactgctt ggcgtgcatc 13080

gctcgaccct gtaccgcgca cttgagcgca gcgaggaagt gacgcccacc gaggccaggc 13140

ggcgcggtgc cttccgtgag gacgcattga ccgaggccga cgccctggcg gccgccgaga 13200

atgaacgcca agaggaacaa gcatgaaacc gcaccaggac ggccaggacg aaccgttttt 13260

cattaccgaa gagatcgagg cggagatgat cgcggccggg tacgtgttcg agccgcccgc 13320

gcacgtctca accgtgcggc tgcatgaaat cctggccggt ttgtctgatg ccaagctggc 13380

ggcctggccg gccagcttgg ccgctgaaga aaccgagcgc cgccgtctaa aaaggtgatg 13440

tgtatttgag taaaacagct tgcgtcatgc ggtcgctgcg tatatgatgc gatgagtaaa 13500

taaacaaata cgcaagggga acgcatgaag gttatcgctg tacttaacca gaaaggcggg 13560

tcaggcaaga cgaccatcgc aacccatcta gcccgcgccc tgcaactcgc cggggccgat 13620

gttctgttag tcgattccga tccccagggc agtgcccgcg attgggcggc cgtgcgggaa 13680

gatcaaccgc taaccgttgt cggcatcgac cgcccgacga ttgaccgcga cgtgaaggcc 13740

atcggccggc gcgacttcgt agtgatcgac ggagcgcccc aggcggcgga cttggctgtg 13800

tccgcgatca aggcagccga cttcgtgctg attccggtgc agccaagccc ttacgacata 13860

tgggccaccg ccgacctggt ggagctggtt aagcagcgca ttgaggtcac ggatggaagg 13920

ctacaagcgg cctttgtcgt gtcgcgggcg atcaaaggca cgcgcatcgg cggtgaggtt 13980

gccgaggcgc tggccgggta cgagctgccc attcttgagt cccgtatcac gcagcgcgtg 14040

agctacccag gcactgccgc cgccggcaca accgttcttg aatcagaacc cgagggcgac 14100

gctgcccgcg aggtccaggc gctggccgct gaaattaaat caaaactcat ttgagttaat 14160

gaggtaaaga gaaaatgagc aaaagcacaa acacgctaag tgccggccgt ccgagcgcac 14220

gcagcagcaa ggctgcaacg ttggccagcc tggcagacac gccagccatg aagcgggtca 14280

actttcagtt gccggcggag gatcacacca agctgaagat gtacgcggta cgccaaggca 14340

agaccattac cgagctgcta tctgaataca tcgcgcagct accagagtaa atgagcaaat 14400

gaataaatga gtagatgaat tttagcggct aaaggaggcg gcatggaaaa tcaagaacaa 14460

ccaggcaccg acgccgtgga atgccccatg tgtggaggaa cgggcggttg gccaggcgta 14520

agcggctggg ttgtctgccg gccctgcaat ggcactggaa cccccaagcc cgaggaatcg 14580

gcgtgacggt cgcaaaccat ccggcccggt acaaatcggc gcggcgctgg gtgatgacct 14640

ggtggagaag ttgaaggccg cgcaggccgc ccagcggcaa cgcatcgagg cagaagcacg 14700

ccccggtgaa tcgtggcaag cggccgctga tcgaatccgc aaagaatccc ggcaaccgcc 14760

ggcagccggt gcgccgtcga ttaggaagcc gcccaagggc gacgagcaac cagatttttt 14820

cgttccgatg ctctatgacg tgggcacccg cgatagtcgc agcatcatgg acgtggccgt 14880

tttccgtctg tcgaagcgtg accgacgagc tggcgaggtg atccgctacg agcttccaga 14940

cgggcacgta gaggtttccg cagggccggc cggcatggcc agtgtgtggg attacgacct 15000

ggtactgatg gcggtttccc atctaaccga atccatgaac cgataccggg aagggaaggg 15060

agacaagccc ggccgcgtgt tccgtccaca cgttgcggac gtactcaagt tctgccggcg 15120

agccgatggc ggaaagcaga aagacgacct ggtagaaacc tgcattcggt taaacaccac 15180

gcacgttgcc atgcagcgta cgaagaaggc caagaacggc cgcctggtga cggtatccga 15240

gggtgaagcc ttgattagcc gctacaagat cgtaaagagc gaaaccgggc ggccggagta 15300

catcgagatc gagctagctg attggatgta ccgcgagatc acagaaggca agaacccgga 15360

cgtgctgacg gttcaccccg attacttttt gatcgatccc ggcatcggcc gttttctcta 15420

ccgcctggca cgccgcgccg caggcaaggc agaagccaga tggttgttca agacgatcta 15480

cgaacgcagt ggcagcgccg gagagttcaa gaagttctgt ttcaccgtgc gcaagctgat 15540

cgggtcaaat gacctgccgg agtacgattt gaaggaggag gcggggcagg ctggcccgat 15600

cctagtcatg cgctaccgca acctgatcga gggcgaagca tccgccggtt cctaatgtac 15660

ggagcagatg ctagggcaaa ttgccctagc aggggaaaaa ggtcgaaaag gtctctttcc 15720

tgtggatagc acgtacattg ggaacccaaa gccgtacatt gggaaccgga acccgtacat 15780

tgggaaccca aagccgtaca ttgggaaccg gtcacacatg taagtgactg atataaaaga 15840

gaaaaaaggc gatttttccg cctaaaactc tttaaaactt attaaaactc ttaaaacccg 15900

cctggcctgt gcataactgt ctggccagcg cacagccgaa gagctgcaaa aagcgcctac 15960

ccttcggtcg ctgcgctccc tacgccccgc cgcttcgcgt cggcctatcg cggccgctgg 16020

ccgctcaaaa atggctggcc tacggccagg caatctacca gggcgcggac aagccgcgcc 16080

gtcgccactc gaccgccggc gcccacatca aggcaccctg cctcgcgcgt ttcggtgatg 16140

acggtgaaaa cctctgacac atgcagctcc cggagacggt cacagcttgt ctgtaagcgg 16200

atgccgggag cagacaagcc cgtcagggcg cgtcagcggg tgttggcggg tgtcggggcg 16260

cagccatgac ccagtcacgt agcgatagcg gagtgtatac tggcttaact atgcggcatc 16320

agagcagatt gtactgagag tgcaccatat gcggtgtgaa ataccgcaca gatgcgtaag 16380

gagaaaatac cgcatcaggc gctcttccgc ttcctcgctc actgactcgc tgcgctcggt 16440

cgttcggctg cggcgagcgg tatcagctca ctcaaaggcg gtaatacggt tatccacaga 16500

atcaggggat aacgcaggaa agaacatgtg agcaaaaggc cagcaaaagg ccaggaaccg 16560

taaaaaggcc gcgttgctgg cgtttttcca taggctccgc ccccctgacg agcatcacaa 16620

aaatcgacgc tcaagtcaga ggtggcgaaa cccgacagga ctataaagat accaggcgtt 16680

tccccctgga agctccctcg tgcgctctcc tgttccgacc ctgccgctta ccggatacct 16740

gtccgccttt ctcccttcgg gaagcgtggc gctttctcat agctcacgct gtaggtatct 16800

cagttcggtg taggtcgttc gctccaagct gggctgtgtg cacgaacccc ccgttcagcc 16860

cgaccgctgc gccttatccg gtaactatcg tcttgagtcc aacccggtaa gacacgactt 16920

atcgccactg gcagcagcca ctggtaacag gattagcaga gcgaggtatg taggcggtgc 16980

tacagagttc ttgaagtggt ggcctaacta cggctacact agaaggacag tatttggtat 17040

ctgcgctctg ctgaagccag ttaccttcgg aaaaagagtt ggtagctctt gatccggcaa 17100

acaaaccacc gctggtagcg gtggtttttt tgtttgcaag cagcagatta cgcgcagaaa 17160

aaaaggatct caagaagatc ctttgatctt ttctacgggg tctgacgctc agtggaacga 17220

aaactcacgt taagggattt tggtcatgca ttctaggtac taaaacaatt catccagtaa 17280

aatataatat tttattttct cccaatcagg cttgatcccc agtaagtcaa aaaatagctc 17340

gacatactgt tcttccccga tatcctccct gatcgaccgg acgcagaagg caatgtcata 17400

ccacttgtcc gccctgccgc ttctcccaag atcaataaag ccacttactt tgccatcttt 17460

cacaaagatg ttgctgtctc ccaggtcgcc gtgggaaaag acaagttcct cttcgggctt 17520

ttccgtcttt aaaaaatcat acagctcgcg cggatcttta aatggagtgt cttcttccca 17580

gttttcgcaa tccacatcgg ccagatcgtt attcagtaag taatccaatt cggctaagcg 17640

gctgtctaag ctattcgtat agggacaatc cgatatgtcg atggagtgaa agagcctgat 17700

gcactccgca tacagctcga taatcttttc agggctttgt tcatcttcat actcttccga 17760

gcaaaggacg ccatcggcct cactcatgag cagattgctc cagccatcat gccgttcaaa 17820

gtgcaggacc tttggaacag gcagctttcc ttccagccat agcatcatgt ccttttcccg 17880

ttccacatca taggtggtcc ctttataccg gctgtccgtc atttttaaat ataggttttc 17940

attttctccc accagcttat ataccttagc aggagacatt ccttccgtat cttttacgca 18000

gcggtatttt tcgatcagtt ttttcaattc cggtgatatt ctcattttag ccatttatta 18060

tttccttcct cttttctaca gtatttaaag ataccccaag aagctaatta taacaagacg 18120

aactccaatt cactgttcct tgcattctaa aaccttaaat accagaaaac agctttttca 18180

aagttgtttt caaagttggc gtataacata gtatcgacgg agccgatttt gaaaccgcgg 18240

tgatcacagg cagcaacgct ctgtcatcgt tacaatcaac atgctaccct ccgcgagatc 18300

atccgtgttt caaacccggc agcttagttg ccgttcttcc gaatagcatc ggtaacatga 18360

gcaaagtctg ccgccttaca acggctctcc cgctgacgcc gtcccggact gatgggctgc 18420

ctgtatcgag tggtgatttt gtgccgagct gccggtcggg gagctgttgg ctggct 18476

<210> 2

<211> 7423

<212> DNA

<213> Artificial Sequence

<400> 2

acaaattcgg gtcaaggcgg aagccagcgc gccaccccac gtcagcaaat acggaggcgc 60

ggggttgacg gcgtcacccg gtcctaacgg cgaccaacaa accagccaga agaaattaca 120

gtaaaaaaaa agtaaattgc actttgatcc accttttatt acctaagtct caatttggat 180

cacccttaaa cctatctttt caatttgggc cgggttgtgg tttggactac catgaacaac 240

ttttcgtcat gtctaacttc cctttcagca aacatatgaa ccatatatag aggagatcgg 300

ccgtatacta gagctgatgt gtttaaggtc gttgattgca cgagaaaaaa aaatccaaat 360

cgcaacaata gcaaatttat ctggttcaaa gtgaaaagat atgtttaaag gtagtccaaa 420

gtaaaactta tagataataa aatgtggtcc aaagcgtaat tcactcaaaa aaaatcaacg 480

agacgtgtac caaacggaga caaacggcat cttctcgaaa tttcccaacc gctcgctcgc 540

ccgcctcgtc ttcccggaaa ccgcggtggt ttcagcgtgg cggattctcc aagcagacgg 600

agacgtcacg gcacgggact cctcccacca cccaaccgcc ataaatacca gccccctcat 660

ctcctctcct cgcatcagct ccacccccga aaaatttctc cccaatctcg cgaggctctc 720

gtcgtcgaat cgaatcctct cgcgtcctca aggtacgctg cttctcctct cctcgcttcg 780

tttcgattcg atttcggacg ggtgaggttg ttttgttgct agatccgatt ggtggttagg 840

gttgtcgatg tgattatcgt gagatgttta ggggttgtag atctgatggt tgtgatttgg 900

gcacggttgg ttcgataggt ggaatcgtgg ttaggttttg ggattggatg ttggttctga 960

tgattggggg gaatttttac ggttagatga attgttggat gattcgattg gggaaatcgg 1020

tgtagatctg ttggggaatt gtggaactag tcatgcctga gtgattggtg cgatttgtag 1080

cgtgttccat cttgtaggcc ttgttgcgag catgttcaga tctactgttc cgctcttgat 1140

tgagttattg gtgccatggg ttggtgcaaa cacaggcttt aatatgttat atctgttttg 1200

tgtttgatgt agatctgtag ggtagttctt cttagacatg gttcaattat gtagcttgtg 1260

cgtttcgatt tgatttcata tgttcacaga ttagataatg atgaactctt ttaattaatt 1320

gtcaatggta aataggaagt cttgtcgcta tatctgtcat aatgatctca tgttactatc 1380

tgccagtaat ttatgctaag aactatatta gaatatcatg ttacaatctg tagtaatatc 1440

atgttacaat ctgtagttca tctatataat ctattgtggt aatttctttt tactatctgt 1500

gtgaagatta ttgccactag ttcattctac ttatttctga agttcaggat acgtgtgctg 1560

ttactaccta tctgaataca tgtgtgatgt gcctgttact atctttttga atacatgtat 1620

gttctgttgg aatatgtttg ctgtttgatc cgttgttgtg tccttaatct tgtgctagtt 1680

cttaccctat ctgtttggtg attatttctt gcagtacgta atggactaca aggaccacga 1740

cggggattac aaagaccacg acatagacta caaggatgac gatgacaaaa tggcaccgaa 1800

gaaaaaaagg aaggtcggaa tccatggcgt tccagctgcc gataagaaat attccatcgg 1860

actcgccatt ggcacgaata gcgtcggatg ggctgttatt actgatgagt acaaagttcc 1920

gtctaagaag ttcaaggtgc tgggcaacac agaccgccac agcataaaga aaaatctcat 1980

cggtgcactc cttttcgata gtggggagac tgcagaagcg acaagattga aaaggactgc 2040

gagaaggcgc tatacacggc gtaagaatag aatctgctac cttcaggaga ttttctctaa 2100

cgaaatggct aaggtcgatg acagtttctt tcatagactt gaggaatcgt tcttggttga 2160

ggaggataag aaacatgaga ggcacccgat atttggaaac atcgtggatg aggtcgcata 2220

tcatgaaaag taccccacaa tctaccacct gagaaagaaa ctcgttgatt ccaccgacaa 2280

agcggatttg agactcatct acctcgctct tgcccatatg ataaagttcc gcggacactt 2340

tctgatcgag ggcgacctca accctgataa tagcgacgtc gataagctct tcatccagtt 2400

ggttcaaacc tacaatcagc tctttgagga aaacccaatt aatgctagtg gagtggatgc 2460

aaaagcgata ctgtcggcca gactctccaa gagcagaagg ttggagaacc tgatcgctca 2520

acttcctgga gaaaagaaaa acggtctttt tgggaatttg attgccttgt ctctgggcct 2580

cacaccaaac ttcaagtcaa attttgacct cgctgaggat gccaaacttc agttgtctaa 2640

ggatacctat gatgacgatc ttgacaattt gctggcacaa attggcgacc agtacgcgga 2700

tctgttcctc gcagcgaaga atctgagtga tgctattctc ctttcggaca tactcagggt 2760

taacactgag atcacaaaag cacctttgag tgcgtcgatg attaagcgct atgatgaaca 2820

tcaccaagac ctcactttgc tgaaggccct tgtgcggcag caattgccag agaagtacaa 2880

agaaatcttc tttgaccaat ctaagaacgg atacgctggc tatattgatg gaggagcttc 2940

tcaggaggaa ttctataagt ttatcaaacc tatacttgag aagatggatg gtacagagga 3000

actccttgtt aaattgaaca gagaagattt gctgcgcaag caacggacct ttgacaacgg 3060

atcaattccg catcagatac acctcggcga gcttcatgcc atccttcgcc ggcaggaaga 3120

tttctacccc tttttgaagg acaaccgcga gaagatagaa aaaatcctta cgttccggat 3180

tccttactat gtgggtccat tggcaagggg gaattcccgc tttgcgtgga tgactcggaa 3240

aagcgaggaa actatcacac cgtggaactt cgaggaagtt gtggacaagg gagcttctgc 3300

ccaatcattc attgagagga tgactaactt cgataagaac ctgccgaacg agaaagttct 3360

ccccaagcac tccctccttt acgagtattt caccgtgtat aacgaactta cgaaggttaa 3420

atacgtgact gagggtatga ggaagccagc attcttgagc ggggaacaaa agaaagcgat 3480

tgttgatttg ctgtttaaaa ctaatcgcaa ggtgacagtc aagcagctca aagaggatta 3540

tttcaagaaa attgaatgtt tcgactctgt ggagatatca ggagtcgaag ataggtttaa 3600

cgcttccctt ggcacatacc atgacctcct taagatcatt aaggacaaag atttcctgga 3660

taacgaggaa aatgaggaca tcctcgaaga tattgttctt accttgacgc tgtttgagga 3720

tcgcgaaatg atcgaggaac ggcttaagac gtatgctcac ttgttcgacg ataaggttat 3780

gaagcagctc aagcgtagaa ggtacactgg atggggccgt ctgtctagaa agctcatcaa 3840

cggaatacgt gataaacaaa gtggcaagac aattttggat tttctgaagt cggacggatt 3900

cgccaacaga gcttttgcgg cactgattgc tgacgatagt ctcaccttca aagaggacat 3960

acagaaggct caagtgagtg gtcaagggga ttcgctgcat gaacacatcg caaacctcgc 4020

gggttcaccg gccataaaga aaggaatcct tcaaactgtt aaggtcgttg atgagttggt 4080

taaagtgatg ggtaggcaca agcccgaaaa catagtgatc gagatggctc gcgaaaatca 4140

gactacacaa aaagggcaga agaactctcg cgagcggatg aaaaggattg aggaaggaat 4200

caaggaactg ggctcacaga ttctcaaaga gcatccagtc gaaaacacac agctgcaaaa 4260

tgagaagctc tatctttact atctccaaaa tggccgggac atgtatgttg atcaggagct 4320

tgacatcaac cgtttgtccg actatgatgt ggaccacatt gtcccgcaat ctttccttaa 4380

ggacgattca atcgataata aggtgttgac ccggagcgat aaaaaccgtg gaaagtctga 4440

caatgtccct tcagaggaag tggttaagaa gatgaagaac tactggagac aattgctgaa 4500

tgcaaaactg atcacacaga gaaagttcga caacctcacc aaagcagaga gaggtgggct 4560

cagtgaactt gataaagcgg gcttcattaa gcgtcagctc gttgagacta gacagatcac 4620

gaagcatgtc gcgcagattt tggattcgcg gatgaacacg aagtacgacg agaatgataa 4680

actgatacgt gaagtcaagg ttatcactct taagtccaaa ttggtgagcg atttcagaaa 4740

ggacttccaa ttctataagg tcagggagat caacaattat catcacgctc acgatgccta 4800

ccttaatgct gttgtgggga ccgcccttat taagaaatac cctaaattgg agtctgaatt 4860

cgtttacggg gattataagg tctacgacgt taggaaaatg atagctaaga gtgagcagga 4920

gatcggtaaa gcaactgcga agtatttctt ttactcgaac atcatgaatt tctttaagac 4980

cgagataacg ctggcaaatg gcgaaattag aaagaggcct ctcatagaga ctaacggtga 5040

gacaggggaa atcgtctggg ataagggtag ggactttgcg acagtgcgca aggtcctctc 5100

tatgccgcaa gttaatattg tgaagaaaac cgaggtgcag acgggaggct tctccaagga 5160

aagcatactt cccaaacgga actctgataa gttgatcgct cgtaagaaag attgggaccc 5220

taagaaatat ggtgggttcg attccccaac tgttgcttac agcgtgctgg tcgttgccaa 5280

ggtcgagaag ggtaaatcca agaaactcaa aagcgttaag gaactccttg ggattactat 5340

catggagaga tcttcattcg aaaagaatcc tatcgacttt cttgaggcca aaggatataa 5400

ggaagttaag aaagatctga taatcaaact cccaaagtac tcattgtttg agctggaaaa 5460

cggcaggaag cgcatgcttg cttccgccgg agagttgcag aaagggaacg agttggctct 5520

gccttctaag tatgttaact tcctctatct tgcctctcat tacgagaagc tcaaaggctc 5580

accagaggac aacgaacaga aacaactttt tgtcgagcaa cataagcact atttggatga 5640

gattatagaa cagatcagtg aattctcgaa aagggttatc cttgcagatg cgaatcttga 5700

caaggtgttg tctgcataca acaaacatag agataagccg atcagggagc aagcggaaaa 5760

tatcattcac ctcttcactc ttacaaactt gggtgctccc gctgccttca agtattttga 5820

taccacgatt gaccggaaac gttacacctc aacgaaggag gtgctggatg ccaccctcat 5880

ccaccaatct attaccggac tctacgagac tagaatcgat ctctcacagc tcggcgggga 5940

taaaagacca gcagcgacga aaaaggcagg acaggctaag aagaagaaag agctcggagg 6000

aggaggcacg ggaggaggag gctccgccga gtatgtgcgc gcgctcttcg acttcaacgg 6060

caatgacgag gaggatctcc ctttcaagaa gggcgacatc ctccgcatcc gcgataagcc 6120

ggaggagcag tggtggaacg cagaggactc cgagggcaag cggggcatga tcctggtgcc 6180

atacgtcgag aagtacagcg gcgattacaa ggaccacgat ggcgactaca aggatcatga 6240

catcgattac aaggacgatg acgataagtc cggcgtcgac atgacggacg cggagtatgt 6300

gcgcatccac gagaagctcg atatctacac cttcaagaag cagttcttca acaataagaa 6360

gtcggtgtcc catcggtgct acgtcctctt cgagctgaag cgcaggggag agcgccgcgc 6420

ctgcttctgg ggctacgcgg tgaataagcc gcagtcaggc acagagcgcg gcatccacgc 6480

cgagatcttc tcgatccgga aggtcgagga gtacctccgc gacaacccag gccagttcac 6540

gatcaattgg tactccagct ggtccccttg cgcagattgc gcagagaaga tcctcgagtg 6600

gtacaaccag gagctgaggg gcaatggcca taccctcaag atctgggcct gcaagctgta 6660

ctacgagaag aacgcgagga atcagatcgg cctctggaac ctgcgggata atggcgtggg 6720

cctcaacgtg atggtgtccg agcactacca gtgctgccgc aagatcttca tccagtcctc 6780

ccacaatcag ctgaacgaga ataggtggct cgaaaagacc ctgaagcgcg ccgagaagtg 6840

gaggagcgag ctgtctatca tgatccaggt caagatcctg cacaccacaa agtcaccggc 6900

ggtgggcggc ggcggcagcg aattctccgg cggcagcacg aacctcagcg acatcatcga 6960

gaaggagaca ggcaagcagc tcgtgatcca ggagtctatc ctcatgctgc ctgaggaggt 7020

ggaggaggtc atcggcaaca agccggagtc cgatatcctc gtgcacaccg cctacgacga 7080

gtcgacagat gagaatgtca tgctcctgac ctccgacgca ccagagtaca agccatgggc 7140

gctcgtgatc caggattcca acggcgagaa taagatcaag atgctgtctg gcggctcccc 7200

gaagaagaag cgcaaggtct agactagtct gaaatcacca gtctctctct acaaatctat 7260

ctctctctat aataatgtgt gagtagttcc cagataaggg aattagggtt cttatagggt 7320

ttcgctcatg tgttgagcat ataagaaacc cttagtatgt atttgtattt gtaaaatact 7380

tctatcaata aaatttctaa ttcctaaaac caaaatccag tgg 7423

<210> 3

<211> 20

<212> DNA

<213> Artificial Sequence

<400> 3

tagcgacggc gagcaagtgg 20

<210> 4

<211> 20

<212> DNA

<213> Artificial Sequence

<400> 4

cgatgacggc gagcaagtgg 20

<210> 5

<211> 20

<212> DNA

<213> Artificial Sequence

<400> 5

cggcagcggc gagcaagtgg 20

<210> 6

<211> 20

<212> DNA

<213> Artificial Sequence

<400> 6

cggcgatagc gagcaagtgg 20

<210> 7

<211> 20

<212> DNA

<213> Artificial Sequence

<400> 7

cggcgacgat gagcaagtgg 20

<210> 8

<211> 20

<212> DNA

<213> Artificial Sequence

<400> 8

cggcgacggc aggcaagtgg 20

<210> 9

<211> 20

<212> DNA

<213> Artificial Sequence

<400> 9

cggcgacggc gaataagtgg 20

<210> 10

<211> 20

<212> DNA

<213> Artificial Sequence

<400> 10

cggcgacggc gagcgggtgg 20

<210> 11

<211> 20

<212> DNA

<213> Artificial Sequence

<400> 11

cggcgacggc gagcaaacgg 20

<210> 12

<211> 20

<212> DNA

<213> Artificial Sequence

<400> 12

cggcgacggc gagcaagtaa 20
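
The ten 20-nt sequences of SEQ ID NO: 3 to 12 are closely related, and a short script makes the relationship easier to see. The following is a minimal sketch, not part of the patent: it assumes that the per-column majority base across the ten sequences is a useful reference (this "consensus" is an inference, since this part of the listing does not state a reference sequence), and the function names `column_consensus` and `mismatch_positions` are illustrative.

```python
from collections import Counter

# SEQ ID NO: 3-12 as printed in the sequence listing (spaces removed).
TARGETS = {
    3: "tagcgacggcgagcaagtgg",
    4: "cgatgacggcgagcaagtgg",
    5: "cggcagcggcgagcaagtgg",
    6: "cggcgatagcgagcaagtgg",
    7: "cggcgacgatgagcaagtgg",
    8: "cggcgacggcaggcaagtgg",
    9: "cggcgacggcgaataagtgg",
    10: "cggcgacggcgagcgggtgg",
    11: "cggcgacggcgagcaaacgg",
    12: "cggcgacggcgagcaagtaa",
}


def column_consensus(seqs):
    """Return the most common base in each column of equal-length sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def mismatch_positions(seq, ref):
    """Return the 1-based positions at which seq differs from ref."""
    return [i for i, (a, b) in enumerate(zip(seq, ref), start=1) if a != b]


# Assumption: the majority base per column serves as the reference sequence.
consensus = column_consensus(TARGETS.values())
report = {sid: mismatch_positions(seq, consensus) for sid, seq in TARGETS.items()}

print("inferred consensus:", consensus)
for sid, positions in report.items():
    print(f"SEQ ID NO: {sid} differs from the consensus at positions {positions}")
```

Under that assumption, each sequence differs from the inferred consensus at exactly two adjacent positions, and the mismatch pair moves from positions 1-2 (SEQ ID NO: 3) to positions 19-20 (SEQ ID NO: 12).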

<210> 13

<211> 1833

<212> PRT

<213> Artificial Sequence

<400> 13

Met Asp Tyr Lys Asp His Asp Gly Asp Tyr Lys Asp His Asp Ile Asp

1 5 10 15

Tyr Lys Asp Asp Asp Asp Lys Met Ala Pro Lys Lys Lys Arg Lys Val

20 25 30

Gly Ile His Gly Val Pro Ala Ala Asp Lys Lys Tyr Ser Ile Gly Leu

35 40 45

Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu Tyr

50 55 60

Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg His

65 70 75 80

Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly Glu

85 90 95

Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr Thr

100 105 110

Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn Glu

115 120 125

Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser Phe

130 135 140

Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly Asn

145 150 155 160

Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr His

165 170 175

Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg Leu

180 185 190

Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe Leu

195 200 205

Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu Phe

210 215 220

Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro Ile

225 230 235 240

Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu Ser

245 250 255

Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu Lys

260 265 270

Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu Thr

275 280 285

Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu Gln

290 295 300

Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala Gln

305 310 315 320

Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu Ser

325 330 335

Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile Thr

340 345 350

Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His His

355 360 365

Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro Glu

370 375 380

Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala Gly

385 390 395 400

Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile Lys

405 410 415

Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys Leu

420 425 430

Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly Ser

435 440 445

Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu Arg Arg

450 455 460

Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile Glu

465 470 475 480

Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala Arg

485 490 495

Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr Ile

500 505 510

Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala Gln

515 520 525

Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn Glu

530 535 540

Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val Tyr

545 550 555 560

Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys Pro

565 570 575

Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu Phe

580 585 590

Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr Phe

595 600 605

Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu Asp

610 615 620

Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile Ile

625 630 635 640

Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu Glu

645 650 655

Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile Glu

660 665 670

Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met Lys

675 680 685

Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg Lys

690 695 700

Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu Asp

705 710 715 720

Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Ala Phe Ala Ala Leu Ile

725 730 735

Ala Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln Val

740 745 750

Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu Ala Gly

755 760 765

Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val Asp

770 775 780

Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile Val Ile

785 790 795 800

Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn Ser

805 810 815

Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly Ser

820 825 830

Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln Asn Glu

835 840 845

Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr Val Asp

850 855 860

Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp His Ile

865 870 875 880

Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile Asp Asn Lys Val Leu

885 890 895

Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser Asp Asn Val Pro Ser Glu

900 905 910

Glu Val Val Lys Lys Met Lys Asn Tyr Trp Arg Gln Leu Leu Asn Ala

915 920 925

Lys Leu Ile Thr Gln Arg Lys Phe Asp Asn Leu Thr Lys Ala Glu Arg

930 935 940

Gly Gly Leu Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg Gln Leu

945 950 955 960

Val Glu Thr Arg Gln Ile Thr Lys His Val Ala Gln Ile Leu Asp Ser

965 970 975

Arg Met Asn Thr Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val

980 985 990

Lys Val Ile Thr Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys Asp

995 1000 1005

Phe Gln Phe Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His Ala His

1010 1015 1020

Asp Ala Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys Lys Tyr

1025 1030 1035 1040

Pro Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp

1045 1050 1055

Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr

1060 1065 1070

Ala Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu

1075 1080 1085

Ile Thr Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr

1090 1095 1100

Asn Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala

1105 1110 1115 1120

Thr Val Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys

1125 1130 1135

Thr Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys

1140 1145 1150

Arg Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro Lys

1155 1160 1165

Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val Leu Val

1170 1175 1180

Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser Val Lys

1185 1190 1195 1200

Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe Glu Lys Asn

1205 1210 1215

Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp

1220 1225 1230

Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn Gly

1235 1240 1245

Arg Lys Arg Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly Asn Glu

1250 1255 1260

Leu Ala Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala Ser His

1265 1270 1275 1280

Tyr Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln Leu

1285 1290 1295

Phe Val Glu Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu Gln Ile

1300 1305 1310

Ser Glu Phe Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu Asp Lys

1315 1320 1325

Val Leu Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln

1330 1335 1340

Ala Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro

1345 1350 1355 1360

Ala Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr

1365 1370 1375

Ser Thr Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr

1380 1385 1390

Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp Lys

1395 1400 1405

Arg Pro Ala Ala Thr Lys Lys Ala Gly Gln Ala Lys Lys Lys Lys Glu

1410 1415 1420

Leu Gly Gly Gly Gly Thr Gly Gly Gly Gly Ser Ala Glu Tyr Val Arg

1425 1430 1435 1440

Ala Leu Phe Asp Phe Asn Gly Asn Asp Glu Glu Asp Leu Pro Phe Lys

1445 1450 1455

Lys Gly Asp Ile Leu Arg Ile Arg Asp Lys Pro Glu Glu Gln Trp Trp

1460 1465 1470

Asn Ala Glu Asp Ser Glu Gly Lys Arg Gly Met Ile Leu Val Pro Tyr

1475 1480 1485

Val Glu Lys Tyr Ser Gly Asp Tyr Lys Asp His Asp Gly Asp Tyr Lys

1490 1495 1500

Asp His Asp Ile Asp Tyr Lys Asp Asp Asp Asp Lys Ser Gly Val Asp

1505 1510 1515 1520

Met Thr Asp Ala Glu Tyr Val Arg Ile His Glu Lys Leu Asp Ile Tyr

1525 1530 1535

Thr Phe Lys Lys Gln Phe Phe Asn Asn Lys Lys Ser Val Ser His Arg

1540 1545 1550

Cys Tyr Val Leu Phe Glu Leu Lys Arg Arg Gly Glu Arg Arg Ala Cys

1555 1560 1565

Phe Trp Gly Tyr Ala Val Asn Lys Pro Gln Ser Gly Thr Glu Arg Gly

1570 1575 1580

Ile His Ala Glu Ile Phe Ser Ile Arg Lys Val Glu Glu Tyr Leu Arg

1585 1590 1595 1600

Asp Asn Pro Gly Gln Phe Thr Ile Asn Trp Tyr Ser Ser Trp Ser Pro

1605 1610 1615

Cys Ala Asp Cys Ala Glu Lys Ile Leu Glu Trp Tyr Asn Gln Glu Leu

1620 1625 1630

Arg Gly Asn Gly His Thr Leu Lys Ile Trp Ala Cys Lys Leu Tyr Tyr

1635 1640 1645

Glu Lys Asn Ala Arg Asn Gln Ile Gly Leu Trp Asn Leu Arg Asp Asn

1650 1655 1660

Gly Val Gly Leu Asn Val Met Val Ser Glu His Tyr Gln Cys Cys Arg

1665 1670 1675 1680

Lys Ile Phe Ile Gln Ser Ser His Asn Gln Leu Asn Glu Asn Arg Trp

1685 1690 1695

Leu Glu Lys Thr Leu Lys Arg Ala Glu Lys Trp Arg Ser Glu Leu Ser

1700 1705 1710

Ile Met Ile Gln Val Lys Ile Leu His Thr Thr Lys Ser Pro Ala Val

1715 1720 1725

Gly Gly Gly Gly Ser Glu Phe Ser Gly Gly Ser Thr Asn Leu Ser Asp

1730 1735 1740

Ile Ile Glu Lys Glu Thr Gly Lys Gln Leu Val Ile Gln Glu Ser Ile

1745 1750 1755 1760

Leu Met Leu Pro Glu Glu Val Glu Glu Val Ile Gly Asn Lys Pro Glu

1765 1770 1775

Ser Asp Ile Leu Val His Thr Ala Tyr Asp Glu Ser Thr Asp Glu Asn

1780 1785 1790

Val Met Leu Leu Thr Ser Asp Ala Pro Glu Tyr Lys Pro Trp Ala Leu

1795 1800 1805

Val Ile Gln Asp Ser Asn Gly Glu Asn Lys Ile Lys Met Leu Ser Gly

1810 1815 1820

Gly Ser Pro Lys Lys Lys Arg Lys Val

1825 1830
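The listing above gives residues in three-letter code with position markers on separate lines, while the claims refer to residue ranges such as amino acids 1 to 1423. The following is a minimal sketch, not part of the patent, of how such a listing could be converted to a one-letter string and sliced by 1-based inclusive positions. The helper names `parse_listing` and `subsequence` are hypothetical, and the excerpt is only the final line of the listing, so positions within it are local to the excerpt rather than positions in the full sequence.

```python
# Illustrative sketch only; helper names are hypothetical and not from the patent.

# Standard three-letter to one-letter amino acid codes (20 common residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def parse_listing(text):
    """Convert three-letter residue tokens to a one-letter string.

    Purely numeric tokens (the position markers printed under each
    residue line) are skipped. Unknown tokens raise KeyError.
    """
    residues = []
    for token in text.split():
        if token.isdigit():
            continue  # position marker, not a residue
        residues.append(THREE_TO_ONE[token])
    return "".join(residues)


def subsequence(seq, start, end):
    """Return residues start..end, using 1-based inclusive positions."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("range outside the sequence")
    return seq[start - 1:end]


# Final residue line of the listing above, with its position-marker line.
excerpt = """
Gly Ser Pro Lys Lys Lys Arg Lys Val

1825 1830
"""

tail = parse_listing(excerpt)
print(tail)                     # GSPKKKRKV
print(subsequence(tail, 3, 9))  # PKKKRKV (positions local to the excerpt)
```

Applied to a complete listing, `subsequence(full_sequence, 1, 1423)` would return the range that the claims assign to the nicking enzyme, assuming the full listing is parsed in order from residue 1.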

Claims

1. A fusion protein consisting of a nicking enzyme, the cytidine deaminase PmCDA1 and the uracil DNA glycosylase inhibitor UGI; the nicking enzyme, the cytidine deaminase PmCDA1 and the uracil DNA glycosylase inhibitor UGI are arranged in that order from the N-terminus of the fusion protein; the amino acid sequence of the nicking enzyme is shown as amino acids 1 to 1423 from the N-terminus of sequence 13 in the sequence table.

2. A gene encoding the fusion protein of claim 1.

3. A base editing system comprising the fusion protein of claim 1.

4. The base editing system according to claim 3, wherein the system further comprises an sgRNA.

5. A recombinant expression vector, expression cassette or recombinant bacterium for expressing the base editing system of claim 3 or 4.

6. A recombinant expression vector for genome base substitution, comprising an expression cassette A and an expression cassette B; the expression cassette A expresses the fusion protein of claim 1; the expression cassette B comprises n elements b; each element b comprises an sgRNA and a target sequence; the recombinant expression vector can target n different target sequences for base substitution.

7. A method of base substitution in a plant genome, comprising the step of: accomplishing base substitution of the plant genome using the base editing system of claim 3 or 4.

8. A method of base substitution in a plant genome, comprising the step of: introducing the recombinant expression vector of claim 6 into a plant of interest to effect base substitution of the plant genome.

9. A nicking enzyme, the amino acid sequence of which is shown as amino acids 1 to 1423 from the N-terminus of sequence 13 in the sequence table.

10. Use of the fusion protein of claim 1, the base editing system of claim 3 or 4, the recombinant expression vector, expression cassette or recombinant bacterium of claim 5, the recombinant expression vector of claim 6, or the nicking enzyme of claim 9 for base substitution in a plant genome; the base substitution is a substitution of base C with T.