JPH0358518B2

Info

Publication number: JPH0358518B2
Application number: JP57184917A
Authority: JP
Inventors: Yoji Sugiura
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1982-10-20
Filing date: 1982-10-20
Publication date: 1991-09-05
Also published as: JPS5974595A

Description

[Detailed description of the invention]

The present invention relates to a speech synthesis device and aims to improve the quality of the synthesized speech signal.

In general, the quality of a speech signal (word, phrase or utterance) synthesized by joining and editing phoneme pieces, that is, words, syllables or even shorter speech segments, can be said to be determined by how the joints between the phoneme pieces, the constituent units of the speech, are processed. For example, an abrupt change in the waveform at a joint, that is, a waveform discontinuity, causes harmonic noise, lowers the S/N ratio of the synthesized sound and reduces its intelligibility. It is also known that fluctuations in the pitch frequency, the fundamental frequency of vocal-cord vibration, degrade the naturalness of synthesized speech. Human hearing is extremely sensitive to changes in pitch frequency (the detection limit is said to be 0.1%), so if the pitch frequencies of the joined phoneme pieces are discontinuous, the synthesized speech becomes unnatural and hard to listen to.

The present invention makes it possible to obtain high-quality synthesized speech by recognizing the waveform patterns of the phoneme pieces and joining the pieces in a natural way. The phoneme-piece waveforms may be cut out of natural speech, for example one pitch interval at a time, or may be synthesized one piece at a time by a separate speech synthesizer. The present invention sets out a method of joining relatively short phoneme pieces, specifically pieces several tens of milliseconds long, without waveform discontinuity or pitch-frequency fluctuation at the joints. Adjacent phoneme pieces this short should have similar waveforms at least near their joint, so the joints can be connected smoothly by slightly correcting the time axis of each piece. For the joints of the phoneme pieces to be joined, the invention captures the degree of waveform similarity in the form of signal levels and, on that basis, applies an appropriate time correction to the time axis of each piece.

The invention is described in detail below, taking a speech time-axis conversion device as a specific embodiment.

FIG. 1 is a block diagram illustrating a conventional time-axis expansion device. In the figure, terminal 1 is the audio input terminal, 2 is the output terminal, and 3 and 4 are N-bit analog shift registers such as BBDs.
5 is a low-pass filter (LPF). 6, 7, 8 and 9 are analog switches that switch the audio signal passing from input terminal 1 through analog shift register 3 or 4 and LPF 5 to output terminal 2. These analog switches are turned on and off, as shown in the figure, by the Q and Q̄ outputs of a frequency divider circuit 11 that divides the output of write clock circuit 10 of analog shift registers 3 and 4 by 2mN (m is explained below).

Analog shift registers 3 and 4 are write-clocked alternately through OR gates 14 and 15 by AND gates 12 and 13, which combine clock circuit 10 with the Q and Q̄ outputs of divider 11. They are read-clocked alternately through the same OR gates 14 and 15 by AND gates 17 and 18, which combine read clock circuit 16 with the Q and Q̄ outputs of divider 11. Suppose, for example, that an audio signal whose time axis has been compressed by a factor of m (m > 1) is applied to the input terminal (such a compressed signal is obtained, for example, by playing a tape recorder at m times the recording speed). While the Q output of divider 11 is 1, this signal is written into analog shift register 4 through analog switch 8. Because the shift register has N bits, once mN samples of the input signal have been shifted in, the register holds the last N of those mN samples; the Q output of divider 11 then inverts to 0 and turns switch 8 off. At the same time the Q̄ output becomes 1, switch 6 is turned on, and writing to analog shift register 3 proceeds in the same way. As is clear from the figure, analog shift register 4 is meanwhile clocked by read clock circuit 16 and read out through switch 9, which is likewise controlled by the Q̄ output. Register 4 is thus read while register 3 is being written; when Q and Q̄ invert again, register 4 is written and register 3 is read.

Let f₁ be the clock frequency of write clock circuit 10 and f₂ that of read clock circuit 16. If the clock frequencies are chosen so that

$$ f_1 / f_2 = m \tag{1} $$

the time axis is expanded by a factor of m, and the compressed speech applied to audio input terminal 1 appears at output terminal 2 with its time axis restored. The read clock frequency f₂ is of course chosen to satisfy the Nyquist sampling theorem for the required output audio bandwidth.

In this conventional device, the timing at which the phoneme pieces output alternately from analog shift registers 3 and 4 are joined is fixed automatically, every mN/f₁ seconds, by the output of divider 11, which divides write clock 10 by 2mN. As shown in FIG. 2, discontinuous waveform changes and pitch-frequency fluctuations therefore occur at the joints between phoneme pieces. As noted above, such discontinuities in waveform and pitch markedly degrade sound quality and intelligibility.

To remedy this drawback of the conventional device, the applicant previously proposed the technique described in Japanese Patent Application No. 56-94802 (filed June 18, 1981).
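Before the prior application is described, the fixed-timing splice of the conventional device of FIG. 1 can be illustrated with a short simulation. The sketch below is an illustration only and is not taken from the patent: the function name `pingpong_expand`, the test tone and all parameter values are assumptions. Each block of m·N compressed samples is reduced to its last N samples, as the N-stage register would hold them, and the kept blocks are played back to back.

```python
import math


def pingpong_expand(samples, n_stages, m):
    """Keep the last n_stages samples of every block of m * n_stages inputs.

    Returns the concatenated output and the output indices at which a new
    block begins (the joints between phoneme pieces).
    """
    block = m * n_stages
    out = []
    joints = []
    for start in range(0, len(samples) - block + 1, block):
        if out:
            joints.append(len(out))
        out.extend(samples[start + block - n_stages:start + block])
    return out, joints


# Hypothetical values: N = 50 stages, compression ratio m = 2,
# and a test tone with a period of 37 samples.
N_STAGES, M_RATIO, PERIOD = 50, 2, 37
tone = [math.sin(2 * math.pi * i / PERIOD)
        for i in range(10 * M_RATIO * N_STAGES)]
expanded, joints = pingpong_expand(tone, N_STAGES, M_RATIO)

# Largest sample-to-sample step inside the tone versus at the joints.
normal_step = max(abs(tone[i + 1] - tone[i]) for i in range(len(tone) - 1))
joint_step = max(abs(expanded[j] - expanded[j - 1]) for j in joints)
print(len(expanded), len(joints), joint_step > normal_step)
```

Because the block length is unrelated to the tone period, the step across a joint is much larger than any step inside the tone, which is the discontinuity the description attributes to the fixed splice timing.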
The technique proposed in that prior application is described first, with reference to the block diagram of FIG. 3. In the figure, 101 is the audio signal input terminal, 102 is the audio signal output terminal, and 103 is an analog-to-digital conversion circuit (hereinafter A/D) that converts the audio signal into digital data. 104 is a random access memory (hereinafter RAM) with 2^A bytes of storage. When its control input terminal LT₃ is at logic level "0", the digital value applied to data input terminals I₁ to Id (I₁ being the least significant) is stored at the address given by address input terminals A₁ to Aa (A₁ being the least significant). When LT₃ is at logic level "1", the contents of the address given by A₁ to Aa are output to data output terminals O₁ to Od.

106 and 108 are clock generation circuits. The output fR of clock generation circuit 106 is supplied through OR gate 120 to clock input terminal T of read counter 107, advancing the output of address control circuit 107, which consists of this read counter. Address control circuit 107 is an A-bit counter whose initial value is set by the output of arithmetic control circuit 105. The initial value is set as follows.

First, arithmetic control circuit 105 applies a pulse to the clear input terminal GL of read counter 107 to clear its output. It then applies, from its SC (Set Counter) terminal to an input of OR gate 120, as many pulses as the value to be loaded, thereby setting the initial value of read counter 107. The initial value is set once every predetermined number of counts of the output fR of clock generation circuit 106. The output of read counter 107 at that moment therefore equals the value loaded in the previous cycle plus that predetermined number, so it is enough to supply to clock input terminal T of read counter 107, through OR gate 120, a number of clocks equal to the new initial value minus this current value. In that case the read counter need not be cleared.
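The pulse-count arithmetic for loading the read counter without clearing it can be sketched as follows. This is a minimal illustration, not the circuit itself: the function names and the numeric values are hypothetical, and handling a target below the current count by wrapping modulo 2^A is an assumption (an up-counter can only reach a smaller value by overflowing).

```python
def pulses_to_preset(current, target, bits):
    """Clock pulses needed to step an up-counter from current to target.

    The difference is taken modulo 2**bits because the counter wraps.
    """
    return (target - current) % (1 << bits)


def step_counter(value, pulses, bits):
    """Apply the given number of clock pulses to a bits-wide up-counter."""
    return (value + pulses) % (1 << bits)


A_BITS = 10               # hypothetical counter width A
previous_initial = 1000   # value loaded in the previous cycle
counted = 64              # fR clocks counted since then
current = step_counter(previous_initial, counted, A_BITS)

target = 300              # new initial value to be loaded
pulses = pulses_to_preset(current, target, A_BITS)
print(current, pulses, step_counter(current, pulses, A_BITS))
```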
The stepping of read counter 107 by arithmetic control circuit 105 described above must be carried out while the output fR of clock generation circuit 106 is at logic level "0".

If the setting is also to be possible while fR is at logic level "1", an AND gate 121 is placed at the fR input of OR gate 120, as shown in FIG. 4. fR is supplied to one input of AND gate 121, an output terminal of arithmetic control circuit 105 is connected to the other input, and the output of AND gate 121 is connected to the input of OR gate 120. When arithmetic control circuit 105 inhibits AND gate 121 through this input, the initial value of read counter 107 can be set whether fR is "0" or "1".

The initial value of read counter 107 can likewise be set using the output fH of a clock generation circuit 123, as shown in FIG. 5. Here fH is a clock of much higher frequency than fR; it is connected to one input of AND gate 122 and to an input of arithmetic control circuit 105. To set the initial value, arithmetic control circuit 105 applies logic level "0" to the input of AND gate 121 and logic level "1" to the input of AND gate 122. Once the predetermined number of pulses from clock circuit 123 has been counted, it returns the input of AND gate 121 to "1" and that of AND gate 122 to "0", which completes the initialization. It is clear that the same result is obtained if the read counter is a presettable counter and the initial value is preset directly.

After the initial value has been set in this way, the read counter divides fR. Y₁ is the least significant bit of the read counter outputs Y₁ to Ya.

Clock generation circuit 108 provides the write clock timing for RAM 104. Its output fw is supplied to clock input terminal T of an A-bit frequency divider circuit 109 and steps the divider outputs W₁ to Wa (W₁ being the least significant) in sequence. 110 is a switching circuit: when its control input LT₁ is at logic level "1" it passes the outputs W₁ to Wa of divider 109, and when LT₁ is at logic level "0" it passes the output of read counter 107, to the address inputs A₁ to Aa of RAM 104. 114 and 116 are inverters, 115 is an AND gate and 117 is a NAND gate. R₁, R₂ and R₃ are resistors and C₁, C₂ and C₃ are capacitors; R₁ and C₁, R₂ and C₂, and R₃ and C₃ each form an integrating circuit. Their time constants τ₁, τ₂ and τ₃ are all much shorter than the period of the write clock fw and are chosen so that τ₁ > τ₃ > τ₂. As shown in FIG. 6, the output of AND gate 115 (b in the figure) goes to logic level "1" at the rising edge of fw (a in the figure) and falls once capacitor C₁ has charged with time constant τ₁. The output of NAND gate 117 (c in the figure) falls slightly after the rising edge of fw and rises again before the output of AND gate 115 falls.

111 is a latch circuit: while its control input terminal LT₂ is at logic level "0" it passes its input to its output, and when LT₂ is "1" it holds the data present at the rising edge. 112 is a digital-to-analog conversion circuit (hereinafter D/A) that converts digital values into analog values. 113 is a low-pass filter that removes sampling noise from the D/A-converted audio signal. 130 is a NAND gate whose inputs are the output of AND gate 115 and an output of arithmetic control circuit 105 and whose output is connected to the LT₂ input of latch circuit 111. While it is setting the initial value of read counter 107, arithmetic control circuit 105 outputs logic level "0" to NAND gate 130, so that latch circuit 111 does not pass its input to its output during the transient in which the counter is being loaded.

With this configuration the audio signal applied to the input terminal is converted to a digital value by A/D 103 and stored in RAM 104 at the period of write clock fw. That is, while the output of AND gate 115 is "1", the address inputs A₁ to Aa of RAM 104 receive the output of divider 109, control input terminal LT₃ becomes "0", and the output of A/D 103 is stored. Because divider 109 steps once per period of fw, the RAM addresses at which the sampled audio signal is stored are consecutive.
Address 2^A, however, wraps around to 0. The audio signal sampled under write clock fw and stored as digital values in RAM 104 is read out under read clock fR, converted by D/A 112 and reproduced as an analog audio signal. The ratio of write clock fw to read clock fR is the time-axis conversion ratio.

The read counter is stepped at the period of read clock fR, so the address from which RAM 104 is read advances at the period of fR. Latch circuit 111 is provided so that the contents of a wrong address are not read out while RAM 104 is being written; reading of RAM 104 goes on at all times except during writing.

In the present invention, the write clock fw described above may be replaced by the following arrangement: (a) the output of the clock generation circuit is connected to A/D 103 as its conversion-start clock, so that A/D 103 converts on this clock and issues an end-of-conversion signal (EOC) each time a conversion finishes; (b) this EOC is used in place of write clock fw of FIG. 3. It is clear that this arrangement works in the same way as described above. Since the RAM is then written only after A/D conversion of the audio signal has finished, data from an unfinished conversion is never written to the RAM, and the control clock timing is simpler to reason about.

As discussed for the conventional example of FIG. 1, the prior-application technique of FIG. 3 applies a time correction at the joints of the phoneme pieces to be connected; this is done by arithmetic control circuit 105, which may be a ROM-programmed processing unit (CPU). FIG. 7 shows the operation of arithmetic control circuit 105. Each processing cycle is the period in which N read clocks are counted. In what follows, positions along the time axis t are expressed in units of write clock fw. Of the N samples of the phoneme piece read out in [processing cycle 2], the last M samples are stored during [processing cycle 1] under write clock fw. (M + r) samples are then taken in from the start of [processing cycle 2], and the point K of highest correlation between these and the M stored samples is computed, as described later. Because the samples beginning K positions after the start of [processing cycle 2] correlate strongly with the M stored samples, the output of read counter 107 is initialized, at the start of [processing cycle 3], to the value that divider 109 output (K + r) samples after the start of [processing cycle 2]. The waveform samples read out across the joint between [processing cycle 2] and [processing cycle 3] then continue smoothly. The M samples beginning (K + N) write clocks after the start of [processing cycle 2] are the last M samples read out in [processing cycle 3]; they are stored for computing the joint with the next processing cycle. Repeating this operation every processing cycle joins the waveforms smoothly.

The computation of the high-correlation connection point K is as follows. FIGS. 8a and 8b show, respectively, the M samples at the rear end of the preceding phoneme piece written in [processing cycle 1] of FIG. 7 and the (M + r) samples at the front end of the following phoneme piece at the start of [processing cycle 2]. Let the rear-end samples of the preceding piece be (Xp) (p = 1, 2, …, M) and the front-end samples of the following piece be (Yp) (p = 1, 2, …, M + r). Both are obtained by sampling the output of A/D 103 with write clock fw. The similarity of the phoneme pieces is best evaluated by the squared error e_k² between (Xp) and (Yp):

$$ e_k^2 = \frac{1}{M}\sum_{p=1}^{M}\left(\frac{X_p-\bar{X}}{\sigma_X}-\frac{Y_{p+k}-\bar{Y}}{\sigma_Y}\right)^2 \tag{2} $$

where

$$ \bar{X} = \frac{1}{M}\sum_{p=1}^{M} X_p, \qquad \bar{Y} = \frac{1}{M}\sum_{p=1}^{M} Y_p, $$

σ_X and σ_Y are the standard deviations of (Xp) and (Yp) (their defining formulas appear only as images in the original and are not reproduced here), and k = 0, 1, 2, …, r − 1. Equation (2) expresses the similarity obtained when (Yp) is shifted by k samples and superimposed on the sampled waveform (Xp).

In practice, however, evaluating equation (2) takes an enormous number of computation steps, and doing so in a short time (within a few tens of milliseconds) would require a high-performance computer. Equation (2) is designed to measure the correlation of two waveforms of different amplitude and level: it normalizes each waveform by its standard deviation (σ_X, σ_Y) and computes the error as the sum of squares of the deviations from the mean levels (X̄, Ȳ). In the speech synthesis device of FIG. 3, the phoneme pieces being handled are waveforms close together in time, so their amplitudes and levels can be assumed to be similar to begin with. The difference between the two waveforms may then be computed, instead of by equation (2), as

$$ e_k^2 = \frac{1}{M}\sum_{p=1}^{M}\left(X_p - Y_{p+k}\right)^2 \tag{3} $$

Moreover, all that is needed here is the timing at which the two waveforms are most similar, so equation (3) can in turn be replaced by

$$ e_k = \sum_{p=1}^{M}\left|X_p - Y_{p+k}\right| \tag{4} $$

Here only the most significant bit of the A/D converter output may be used for (Xp) and (Yp+k); alternatively, the polarity of the input signal about its AC zero-crossing point may be used. In either case (Xp) and (Yp+k) each take the value [1] or [0]. Equation (4) is thus the integral of the absolute differences between corresponding sample values, and the connection timing is determined by finding the k that minimizes it.

To keep the computation time as short as possible, the device of FIG. 3 may compute, in place of equation (4),

$$ g_k = \sum_{p=1}^{M}\left(X_p \oplus Y_{p+k}\right) \tag{5} $$

In equation (5), (Xp) and (Yp+k) are the most significant bit data of the A/D converter, each [1] or [0]. The symbol ⊕ denotes exclusive OR: Xp ⊕ Yp+k is [0] when (Xp) and (Yp+k) are both [1] or both [0], and [1] otherwise. The similarity between the binary samples (Xp) at the rear end of the preceding phoneme piece and the binary samples (Yp) at the front end of the following phoneme piece is therefore given by g_k, and the connection timing is determined by finding the K that minimizes g_k. That is, arithmetic control circuit 105 computes g_k for each of k = 0, 1, …, r − 1 and selects the k for which it is smallest.
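The searches of equations (4) and (5) can be sketched in a few lines. The code below is an illustrative sketch under stated assumptions, not the patented circuit: the function names, the synthetic waveform and the values of M and r are hypothetical, indices are 0-based (so `y[p + k]` plays the role of Y with subscript p+k), and the sign of each sample stands in for the most significant bit.

```python
import math


def abs_diff_error(x, y, k):
    """Equation (4): sum of |X_p - Y_(p+k)| over the M samples of x."""
    return sum(abs(xp - y[p + k]) for p, xp in enumerate(x))


def xor_count(x_bits, y_bits, k):
    """Equation (5): number of positions where X_p and Y_(p+k) differ."""
    return sum(xb ^ y_bits[p + k] for p, xb in enumerate(x_bits))


def best_shift(x, y, r, cost):
    """Return the k in 0..r-1 that minimizes cost(x, y, k)."""
    return min(range(r), key=lambda k: cost(x, y, k))


def wave(i, period=20):
    """Hypothetical pitch-periodic waveform (fundamental plus one harmonic)."""
    theta = 2 * math.pi * (i + 0.5) / period
    return math.sin(theta) + 0.5 * math.sin(2 * theta)


M, R = 20, 20
x = [wave(i) for i in range(7, 7 + M)]   # rear end of the preceding piece
y = [wave(i) for i in range(0, M + R)]   # front end of the following piece

x_bits = [1 if v >= 0 else 0 for v in x]
y_bits = [1 if v >= 0 else 0 for v in y]

k_abs = best_shift(x, y, R, abs_diff_error)
k_xor = best_shift(x_bits, y_bits, R, xor_count)
print(k_abs, k_xor)
```

In this synthetic case both measures select the same shift, which is the point of replacing (4) by the cheaper one-bit measure (5); on real speech the two can differ.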
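That is, as shown in FIG. 8, the M samples at the rear end of the preceding phoneme piece are superimposed with the least error on the part of the following phoneme piece that begins k samples after its start.

In the prior-application speech synthesis device described above, the waveform connection error equals the sampling interval of the sample sequences (Xp) and (Yp) (let the sampling period be τ). The connection error could be reduced by shortening the sampling interval, but M·τ must span roughly one period of the pitch frequency of the input analog signal, and r·τ must span at least one period of that pitch frequency. Shortening the sampling interval of (Xp) and (Yp) therefore increases the amount of data in the sample sequences correspondingly, and with it the time needed for computation. Put another way, the faster the computation can be done, the smaller the waveform connection error and the higher the quality of the reproduced sound.

In the present invention, therefore, a band filter that limits the passband of the input analog signal is additionally provided; the band-limited input analog signal is converted into a binary signal, and this binary signal is sampled to obtain the sample sequences (Xp) and (Yp). Specifically, the signal path (d₁) to (d_o) connecting A/D conversion circuit 103 and arithmetic control circuit 105 in FIG. 3 is removed; instead, signal input terminal 101 is connected to input terminal 201 of FIG. 9, and output terminal 202 of FIG. 9 is connected to arithmetic control circuit 105 of FIG. 3.

In FIG. 9, the input analog signal is applied to terminal 201. 203 is a band filter with a characteristic that passes the fundamental frequency component of the input analog signal, 204 is an amplifier circuit and 205 is a hysteresis circuit.

With the configuration of FIG. 9, a binary signal that follows the fundamental frequency of the input analog signal is obtained. Sampling it gives the sample sequences (Xp) (p = 1, 2, …, M) for the rear end of the preceding phoneme piece and (Yp) (p = 1, 2, …, M + r) for the front end of the following phoneme piece.

Hysteresis circuit 205 is included in FIG. 9 so that the binary signal does not respond to small excursions of the band-limited input analog signal.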
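The effect of the hysteresis stage can be sketched in software as a comparator with two thresholds. This is an assumption-laden illustration, not the circuit of FIG. 9: band filter 203 and amplifier 204 are not modelled, and the function names, thresholds and test signal are hypothetical.

```python
import math


def schmitt_binarize(signal, threshold, initial=0):
    """Binarize with hysteresis.

    The output switches to 1 only when the input rises above +threshold and
    back to 0 only when it falls below -threshold; smaller excursions are
    ignored. threshold = 0 gives a plain zero-crossing comparator.
    """
    state = initial
    out = []
    for v in signal:
        if state == 0 and v > threshold:
            state = 1
        elif state == 1 and v < -threshold:
            state = 0
        out.append(state)
    return out


def count_transitions(bits):
    """Number of polarity reversals in a binary sequence."""
    return sum(1 for a, b in zip(bits, bits[1:]) if a != b)


# Hypothetical input: a fundamental with a 40-sample period plus a small
# fast ripple standing in for residual components after band limiting.
signal = [math.sin(2 * math.pi * i / 40) + 0.15 * math.sin(2 * math.pi * i / 3)
          for i in range(200)]

plain = schmitt_binarize(signal, 0.0)
with_hysteresis = schmitt_binarize(signal, 0.3)
print(count_transitions(plain), count_transitions(with_hysteresis))
```

With hysteresis the binary signal reverses once per half period of the fundamental, which keeps the number of polarity-reversal points, and hence the number of candidate alignments to test, small.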
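The sample sequences (Xp) and (Yp) obtained in this way are evaluated on the basis of equation (4) above; both are obtained by binarizing portions of the input analog signal that are close together in time. Let l_M be the index of the polarity-reversal point in the sample sequence (Xp), and let l₁, l₂, … be the indices of the samples of (Yp) at which the same polarity reversal occurs. Aligning the polarity-reversal points of (Xp) and (Yp), that is, pairing (X_lM) with (Y_l₁), (X_lM) with (Y_l₂), and so on, the quantity

$$ g_{l_i} = \sum_{p=1}^{M}\left(X_p \oplus Y_{\,l_i - l_M + p - 1}\right) \tag{6} $$

is computed, and among the l_i in the range

$$ l_M \le l_i \le r + l_M \tag{7} $$

the l_i that minimizes g_li in equation (6) is found.

As before, the similarity between the binary samples (Xp) at the rear end of the preceding phoneme piece and the binary samples (Yp) at the front end of the following phoneme piece is given by g_li, and the connection timing is determined by finding the l_i that minimizes it.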
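The reduced search of equations (6) and (7) can be sketched as follows. This is a hedged illustration rather than the patented procedure: the function names and the test pattern are hypothetical, indices are 0-based (so the candidate shift is simply l_i − l_M and the offset convention of equation (6) is not reproduced literally), and "the same polarity reversal" is taken to mean a rising edge.

```python
def rising_edges(bits):
    """Indices i at which bits[i - 1] == 0 and bits[i] == 1."""
    return [i for i in range(1, len(bits)) if bits[i - 1] == 0 and bits[i] == 1]


def xor_count(x_bits, y_bits, shift):
    """Number of positions where x_bits[p] and y_bits[p + shift] differ."""
    return sum(xb ^ y_bits[p + shift] for p, xb in enumerate(x_bits))


def aligned_best_shift(x_bits, y_bits, r):
    """Evaluate the XOR count only at shifts that align a rising edge of
    y_bits with the first rising edge of x_bits.

    Returns (best_shift, evaluations), or (None, 0) if nothing can be aligned.
    """
    x_edges = rising_edges(x_bits)
    if not x_edges:
        return None, 0
    l_m = x_edges[0]
    candidates = [l_i - l_m for l_i in rising_edges(y_bits)
                  if 0 <= l_i - l_m < r
                  and l_i - l_m + len(x_bits) <= len(y_bits)]
    if not candidates:
        return None, 0
    best = min(candidates, key=lambda s: xor_count(x_bits, y_bits, s))
    return best, len(candidates)


def square(i, period=20):
    """Hypothetical binarized pitch signal: half a period high, half low."""
    return 1 if (i % period) < period // 2 else 0


M, R = 20, 20
x_bits = [square(i) for i in range(7, 7 + M)]
y_bits = [square(i) for i in range(0, M + R)]

shift, evaluations = aligned_best_shift(x_bits, y_bits, R)
print(shift, evaluations)
```

Here one evaluation replaces the r evaluations an exhaustive search over every shift would need, which is the saving the description goes on to claim.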
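That is, the M samples at the rear end of the preceding phoneme piece are superimposed with the least error on the part of the following phoneme piece that begins (l_i − l_M) samples after its start. Whereas the computation of equation (5) had to be carried out r times, that of equation (6) need only be carried out as many times as the polarity of the band-limited input analog signal reverses within the range satisfying equation (7), so far fewer evaluations are needed. Put another way, the sampling interval of the sample sequences (Xp) and (Yp) can be shortened, which reduces the connection error and improves the quality of the synthesized sound.

As described above, arithmetic control circuit 105 obtains the sample sequences (Xp) and (Yp) by sampling, with write clock fw output by clock generation circuit 108, the digital values into which A/D 103 converts the audio signal applied to input terminal 101. The timing at which (Xp) and (Yp) are taken in is indicated in every case by the value of the outputs (W₁ to Wa) of frequency divider circuit 109.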
Arithmetic control circuit 105 also counts the read clock output by clock generation circuit 106; when N clocks have been counted, it sets the initial value of read counter 107 and the next processing cycle begins. The value loaded into the read counter is the (k) obtained from the computation on (Xp) and (Yp) plus the value indicated by the frequency divider circuit when (Yp) was taken in.

In the description above, the address from which RAM 104 is read is supplied by address control circuit 107 consisting of the read counter, and the waveform connection timing is set by initializing read counter 107 to a predetermined value. Needless to say, the same operation is obtained by connecting an adder (or subtractor) circuit to the output of read counter 107 and adding (or subtracting) a predetermined value in that circuit.

As described, the present invention provides a time-axis conversion circuit in which smooth joints are obtained through the action of arithmetic control circuit 105, and it therefore yields synthesized sound free of the joint waveform discontinuities and pitch-frequency fluctuations of the conventional device.
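The description treats loading the read counter and adding an offset to a free-running counter as interchangeable ways of setting the read address. A minimal sketch of that equivalence follows; the function names and values are hypothetical, and modulo-2^A wrap-around of the address is assumed.

```python
def address_with_preset(preset, ticks, bits):
    """Read address when the counter was loaded with preset and then clocked."""
    return (preset + ticks) % (1 << bits)


def address_with_adder(counter, offset, bits):
    """Read address when a fixed offset is added to a free-running counter."""
    return (counter + offset) % (1 << bits)


A_BITS = 10        # hypothetical address width A
free_running = 40  # counter value at the start of the processing cycle
target = 300       # address at which reading should resume
offset = (target - free_running) % (1 << A_BITS)

same = all(
    address_with_preset(target, t, A_BITS)
    == address_with_adder((free_running + t) % (1 << A_BITS), offset, A_BITS)
    for t in range(2 * (1 << A_BITS))
)
print(offset, same)
```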
That is, as shown in FIG. 8, the error is smallest when the M samples at the rear end of the preceding phoneme are superimposed on the succeeding phoneme starting from the point shifted by k from its beginning.

The waveform connection error in the speech synthesizer of the prior application described above is set by the sampling interval of the sample sequences (Xp) and (Yp) (the sampling period is τ). The connection error can therefore be reduced by shortening the sampling interval. However, M·τ must cover approximately one period of the pitch frequency of the input analog signal, and r·τ must cover at least one period of that pitch frequency, so shortening the sampling interval increases the amount of data in the sample sequences (Xp) and (Yp) accordingly and lengthens the arithmetic processing time. In other words, if the arithmetic processing can be performed at high speed, waveform connection errors are reduced accordingly and high-quality reproduced sound is obtained.

Therefore, in the present invention, a band filter is provided to limit the pass band of the input analog signal; the band-limited input analog signal is converted into a binary signal, and the binary signal is sampled to obtain the sample sequences (Xp) and (Yp). That is, in the present invention, the signal path (d₁) (dₒ) connecting the A/D conversion circuit 103 and the arithmetic control circuit 105 in FIG. 3 is deleted; instead, the signal input terminal 101 is connected to the input terminal 201 of FIG. 9, and the output terminal 202 of FIG. 9 is connected to the arithmetic control circuit 105 of FIG. 3.

In FIG. 9, the input analog signal is applied to terminal 201; 203 is a bandpass filter that passes the fundamental frequency component of the input analog signal, 204 is an amplifier circuit, and 205 is a hysteresis circuit. With the configuration of FIG. 9, a binary signal that follows the fundamental frequency of the input analog signal is obtained, and sampling it yields the sample sequence (Xp) (p=1, 2, ..., M) at the rear end of the preceding phoneme and the sample sequence (Yp) (p=1, 2, ..., M+r) at the front end of the succeeding phoneme. The hysteresis circuit 205 is provided in FIG. 9 to prevent the binary signal from responding to minute displacements of the band-limited input analog signal.
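The role of the hysteresis circuit 205 can be illustrated in software. The following is only a minimal sketch, assuming the input samples have already been band-limited; the function name `binarize_with_hysteresis` and the thresholds `upper` and `lower` are hypothetical and do not appear in the specification.

```python
def binarize_with_hysteresis(samples, upper, lower, initial=0):
    """Two-threshold comparator (software analogue of a hysteresis circuit).

    The output switches to 1 only when a sample rises above `upper` and
    back to 0 only when a sample falls below `lower`, so small wiggles
    between the two thresholds do not toggle the binary signal.
    All names and threshold values here are illustrative assumptions.
    """
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    state = initial
    out = []
    for s in samples:
        if state == 0 and s > upper:
            state = 1
        elif state == 1 and s < lower:
            state = 0
        out.append(state)
    return out


# A band-limited waveform with small ripples around zero (made-up values).
wave = [0.0, 0.05, -0.05, 0.3, 0.05, -0.05, -0.3, -0.05, 0.05]
with_hysteresis = binarize_with_hysteresis(wave, upper=0.1, lower=-0.1)
plain_comparator = binarize_with_hysteresis(wave, upper=0.0, lower=0.0)
```

With both thresholds at zero the comparator toggles on every small ripple, whereas the two-threshold version changes state only on the large swings, which is the behavior the text attributes to circuit 205.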
The calculation based on the above equation (4) is performed on the sample sequences (Xp) and (Yp) obtained in this way. Here, (Xp) and (Yp) are obtained by converting temporally adjacent portions of the input analog signal into a binary signal. Let the polarity reversal point of the sample sequence (Xp) be the l_M-th sample, and let the samples of (Yp) at the corresponding polarity reversal points be the l_1-th, l_2-th, ... samples. Aligning (X_{l_M}) with (Y_{l_1}), (X_{l_M}) with (Y_{l_2}), and so on, so that the polarity reversal points of (Xp) and (Yp) coincide, equation (6) is calculated:

$$g_{l_i} = \sum_{p=1}^{M} \left( X_p \oplus Y_{l_i - l_M + p - 1} \right) \qquad (6)$$

$$l_M \le l_i \le r + l_M \qquad (7)$$

Here ⊕ is the operation on binary samples described above. Among the (l_i) in the range of equation (7), the (l_i) that minimizes (g_{l_i}) of equation (6) is found. As before, the similarity between the binary-signal sampling data (Xp) at the rear end of the preceding phoneme and the binary-signal sampling data (Yp) at the front end of the following phoneme is given by (g_{l_i}), and the connection timing is determined by finding the l_i that minimizes (g_{l_i}).
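The search restricted to polarity reversal points can be sketched as follows. This is an illustration under stated assumptions, not the circuit of the specification: the binary samples are Python lists of 0 and 1, the similarity is taken to be a count of mismatching samples (a sum of exclusive ORs), only 0-to-1 reversals are used, and indices are 0-based, so the index offset differs from the 1-based notation of equation (6). The function names are hypothetical.

```python
def rising_edges(bits):
    """Indices i at which the binary signal flips from 0 to 1."""
    return [i for i in range(1, len(bits)) if bits[i - 1] == 0 and bits[i] == 1]


def mismatch_count(x, y, shift):
    """Number of positions p where x[p] differs from y[p + shift]."""
    return sum(xp ^ y[p + shift] for p, xp in enumerate(x))


def best_shift_at_reversals(x, y):
    """Evaluate the mismatch only at shifts that align a reversal of y
    with the first reversal of x, and return the best such shift.

    x has M samples, y has M + r samples; shifts outside 0..r are skipped.
    Returns None when no candidate reversal exists (an assumed convention).
    """
    r = len(y) - len(x)
    if r < 0:
        raise ValueError("y must be at least as long as x")
    x_edges = rising_edges(x)
    if not x_edges:
        return None
    l_m = x_edges[0]
    candidates = [l_i - l_m for l_i in rising_edges(y) if l_m <= l_i <= l_m + r]
    if not candidates:
        return None
    return min(candidates, key=lambda s: mismatch_count(x, y, s))


# Made-up data: M = 8, r = 6.
x = [0, 1, 1, 1, 0, 0, 0, 0]
y = [0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
shift = best_shift_at_reversals(x, y)
```

In this example only two candidate shifts (0 and 6) are evaluated instead of all seven shifts from 0 to r, which is the saving in computation that the text describes.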
In other words, the error is smallest when the M samples at the rear end of the preceding phoneme are superimposed on the front end of the succeeding phoneme starting from the point shifted by (l_i − l_M) from its beginning. Whereas the calculation of equation (5) had to be performed r times, the calculation of equation (6) need only be performed as many times as the polarity of the band-limited input analog signal reverses within the range satisfying equation (7), so the number of calculations is reduced. The sampling interval of the sample sequences (Xp) and (Yp) can therefore be shortened, the connection error can be reduced, and the quality of the synthesized sound can be improved.

As explained above, the arithmetic control circuit 105 obtains the sample sequences (Xp) and (Yp) by sampling, with the write clock fw output by the clock generating circuit 108, the digital values into which the A/D conversion circuit 103 has converted the audio signal applied to the input terminal 101. The timing at which the sample sequences (Xp) and (Yp) are taken in is indicated by the value of the outputs (W₁ to Wa) of the frequency divider circuit 109. Further, when the arithmetic control circuit 105 has counted N clocks of the clock generation circuit 10, it sets the read counter 107 to an initial value and enters the next processing cycle. The value used to initialize the read counter is the value indicated by the frequency divider circuit when (Xp) and (Yp) were taken in, plus the (k) obtained by the calculation.

In the above description, the address for reading out the contents of the RAM 104 is given by the address control circuit 107, which consists of a read counter, and the waveform connection timing is set by initializing the read counter 107 to a predetermined value. It goes without saying, however, that the same operation can be achieved by connecting an addition (or subtraction) circuit to the output of the read counter 107 and adding (or subtracting) a predetermined value in that circuit.

As described above, the present invention provides a time base conversion circuit in which smooth connection points are obtained through the operation of the arithmetic control circuit 105, so that a synthesized sound free of the connection-waveform discontinuities and pitch-frequency fluctuations that occur in conventional devices can be obtained.
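The stated equivalence between presetting the read counter and adding an offset at its output can be checked with a small model. This is a sketch only: the wrap-around at `ram_size`, the function names, and the numeric values are assumptions made for illustration and are not taken from the specification.

```python
def addresses_by_preset(start, offset, count, ram_size):
    """Read addresses when the counter itself is initialized to start + offset."""
    counter = (start + offset) % ram_size
    out = []
    for _ in range(count):
        out.append(counter)
        counter = (counter + 1) % ram_size  # stepped once per read clock
    return out


def addresses_by_adder(start, offset, count, ram_size):
    """Read addresses when the counter starts at `start` and an adder
    applies `offset` to the counter output on every read."""
    counter = start % ram_size
    out = []
    for _ in range(count):
        out.append((counter + offset) % ram_size)
        counter = (counter + 1) % ram_size
    return out


# Hypothetical values: capture position 250, best shift 6, 256-word memory.
preset = addresses_by_preset(250, 6, 8, 256)
adder = addresses_by_adder(250, 6, 8, 256)
```

Both models produce the same address sequence, which is why either arrangement can set the waveform connection timing.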

[Brief explanation of drawings]

FIG. 1 is a block diagram of a conventional speech synthesis device. FIG. 2 is a diagram showing the characteristics of the conventional device. FIG. 3 is a block diagram showing the configuration of the speech synthesis device of the prior application of the present invention. FIGS. 4 and 5 are circuit diagrams showing configuration examples of the main parts used when initializing the read counter 107 of FIG. 3. FIG. 6 is a time chart explaining the outputs of the gates 115 and 117 of the device of FIG. 3. FIG. 7 is a time chart explaining the operation of the arithmetic control circuit 105 of the device of FIG. 3. FIG. 8 is a waveform diagram of the sample strings (Xp) and (Yp), consisting of M samples of the preceding phoneme and (M+r) samples of the succeeding phoneme. FIG. 9 is a block circuit diagram of the main part of the speech synthesis device of the present invention.

101, 201...Signal input terminal, 102, 202...Signal output terminal, 103...Analog-digital conversion circuit, 104...Random access memory, 105...Arithmetic control circuit, 106...Clock circuit that generates a read clock, 107...Address control circuit (read counter), 108...Clock circuit that generates a write clock, 110...Switching circuit, 111...Latch circuit, 112...Digital-analog conversion circuit, 113...Low-pass filter, 203...Band filter, 204...Amplifier circuit, 205...Hysteresis circuit.

Claims

[Claims]

1. A speech synthesis device that performs editing and synthesis using phoneme segments extracted from an analog speech waveform, comprising:
(a) analog-to-digital conversion means for converting an analog input signal into a digital signal;
(b) digital storage means for storing the output of the conversion means in accordance with a first clock;
(c) address control means for controlling the address from which the stored contents of the digital storage means are read;
(d) a bandpass filter that passes the fundamental frequency component of the analog input signal and blocks harmonic components;
(e) binary signal conversion means for converting the analog input signal band-limited by the bandpass filter into a binary signal;
(f) means for sampling, in response to the first clock, the binary signal converted from the analog input signal by the binary signal conversion means in the vicinity of the rear end of the preceding phoneme and in the vicinity of the front end of the succeeding phoneme, performing a similarity calculation while shifting the two sample strings relative to each other, and calculating the correspondence between the two sample strings at the point where the similarity is highest; and
(g) digital-to-analog conversion means for converting the digital signal read from the digital storage means into an analog signal and reproducing an analog audio signal;
characterized in that the address control means is stepped by a second clock and designates the address from which the stored contents of the digital storage means are read.

2. The speech synthesis device according to item 1, wherein the binary signal conversion means includes a hysteresis circuit, and the hysteresis circuit converts the signal into a binary signal that does not respond to minute fluctuations in the analog input signal band-limited by the bandpass filter.