JPH02275996A - Word recognition system

Info

Publication number: JPH02275996A
Application number: JP1098376A
Authority: JP
Inventors: Kazuhiko Okashita; 和彦岡下; Shingo Nishimura; 新吾西村; Masayuki Unno; 海野　雅幸; Masashi Miyagawa; 宮川　正志
Original assignee: Sekisui Chemical Co Ltd
Current assignee: Sekisui Chemical Co Ltd
Priority date: 1989-04-18
Filing date: 1989-04-18
Publication date: 1990-11-09

Abstract

PURPOSE: To secure a high recognition rate and enable easy real-time processing by using the time variation of the frequency characteristics of speech as the input to a neural network. CONSTITUTION: The time variation of the frequency characteristics of the speech, obtained through a bandpass filter 10 and an averaging circuit 15, is used as the input to the neural network 20. Because the 'time variation of the frequency characteristics of speech' is used, the preprocessing for obtaining the input is simplified and takes little time. Further, the arithmetic processing of the whole neural network 20 is simple and fast in principle. Consequently, a high recognition rate is secured and real-time processing is easily performed.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application] The present invention relates to a word recognition method suitable for recognizing a word from input speech in electric locks, online terminals for IC cards, and the like.

[Prior Art] A conventional word recognition method, as described in, for example, Japanese Patent Publication No. 63-4200 and Japanese Patent Application Laid-open No. 62-220998, follows the procedure below.

(1) Extract features of the word contained in the input speech.

(2) Calculate the distance between each standard pattern, extracted in advance in the same manner as in (1), and the features extracted in (1).

(3) From the calculation results, determine the word of the standard pattern with the smallest distance to be the word of the input speech.
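The conventional procedure above amounts to nearest-pattern matching. The following is a minimal sketch of that idea, not the method of the cited publications: the Euclidean distance, the function names, and the two-dimensional feature vectors are all assumptions chosen for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize_by_template(features, standard_patterns):
    """Return the word whose standard pattern is closest to `features`."""
    return min(standard_patterns,
               key=lambda word: euclidean(features, standard_patterns[word]))


# Hypothetical standard patterns (step 2 assumes they were extracted in advance).
standard_patterns = {
    "door": [0.9, 0.1],
    "curtain": [0.2, 0.8],
}

# Hypothetical features extracted from the input speech (step 1).
print(recognize_by_template([0.8, 0.2], standard_patterns))  # door
```

Because the decision depends entirely on stored patterns, any drift in the speaker's voice after the patterns were recorded shifts the distances, which is the deterioration the next section describes.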

[Problems to be Solved by the Invention] However, the conventional word recognition method described above has the following problems.

(1) The recognition rate deteriorates as time passes after the standard patterns are created.

(2) Real-time processing is difficult. That is, to ensure a recognition rate above a certain level with the conventional word recognition method, complex features must be used, but extracting complex features requires a complex processing device and a long processing time.

An object of the present invention is to obtain a word recognition method that ensures a high recognition rate and can easily be executed in real time.

[Means for Solving the Problems] The present invention as set forth in claim 1 is a word recognition method that uses a neural network to recognize a word from input speech, in which temporal changes in the frequency characteristics of the speech are used as input to the neural network.

In the present invention as set forth in claim 2, temporal changes in the average frequency characteristics of the speech within fixed time intervals are used as input to the neural network.

In the present invention as set forth in claim 3, the neural network is a hierarchical neural network.

[Function] The present invention as set forth in claim 1 provides the following effects (1) to (5).

(1) The recognition rate deteriorates very little over time. This is presumed to be because the neural network can adopt a structure that is not easily affected by variations in speech caused by differences in the time of utterance.

(2) Because "temporal changes in the frequency characteristics of the speech" are used as input to the neural network, the preprocessing needed to obtain the input is simpler than conventional complex feature extraction, and the time required for this preprocessing is short.

(3) In principle, the arithmetic processing of a neural network as a whole is simple and fast.

(4) In principle, the units that make up a neural network operate independently, so parallel arithmetic processing is possible. The arithmetic processing is therefore fast.

(5) Owing to the effects above, word recognition can easily be performed in real time without relying on a complicated processing device.

Further, the present invention as set forth in claim 2 provides the following effect (6) in addition to the effects described above.

(6) Because "temporal changes in the average frequency characteristics of the speech within fixed time intervals" are used as input to the neural network, the processing in the neural network is simple and the time required for it is even shorter.

Further, the present invention as set forth in claim 3 provides the following effect (7) in addition to the effects described above.

(7) For hierarchical neural networks, a simple learning algorithm (backpropagation), described later, is already established, so a neural network that achieves a high recognition rate can be formed easily.

[Example] Fig. 1 is a schematic diagram showing an example of a word recognition system to which the present invention is applied, Fig. 2 is a schematic diagram showing input speech, Fig. 3 is a schematic diagram showing the output of the bandpass filter, Fig. 4 is a schematic diagram showing neural networks, Fig. 5 is a schematic diagram showing a hierarchical neural network, and Fig. 6 is a schematic diagram showing the structure of a unit.

Before the specific embodiment of the present invention is described, the configuration of neural networks and the learning algorithm are explained.

(1) By structure, neural networks can be broadly divided into two types: the hierarchical network shown in Fig. 4(A) and the interconnected network shown in Fig. 4(B). The present invention may be configured with either type, but the hierarchical network is more useful because a simple learning algorithm, described later, has been established for it.

(2) Network structure. As shown in Fig. 5, a hierarchical network has a layered structure consisting of an input layer, an intermediate layer, and an output layer.

Each layer consists of one or more units. The only connections are forward connections from the input layer to the intermediate layer to the output layer; there are no connections within a layer.

(3) Unit structure. As shown in Fig. 6, a unit is a model of a neuron in the brain and has a simple structure. It receives inputs from other units, sums them, transforms the sum according to a fixed rule (a transformation function), and outputs the result. Each connection to another unit carries a variable weight that represents the strength of the connection.
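The unit described above can be sketched in a few lines. The text does not name the transformation function, so the sigmoid used here is an assumption, as are the function names and the sample values.

```python
import math


def sigmoid(x):
    """Assumed transformation function; the source only says 'a fixed rule'."""
    return 1.0 / (1.0 + math.exp(-x))


def unit_output(inputs, weights, transform=sigmoid):
    """Weight each incoming value, sum the results, and transform the sum."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return transform(total)


# Two hypothetical inputs from other units with hypothetical connection weights.
# The weighted sum is 0.4 * 1.0 + (-0.8) * 0.5 = 0.0, and sigmoid(0.0) = 0.5.
print(unit_output([1.0, 0.5], [0.4, -0.8]))  # 0.5
```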

(4) Learning (backpropagation). Training a network means bringing its actual output closer to the target value (the desired output). In general, learning is performed by changing the transformation function and the weights of each unit shown in Fig. 6.

As the learning algorithm, for example, the backpropagation described in Rumelhart, D. E., McClelland, J. L., and the PDP Research Group, Parallel Distributed Processing, the MIT Press, 1986 (Reference 2) can be used.

A specific embodiment of the present invention is described below. The recognition system 1 of this embodiment consists of an n-channel bandpass filter 10, an averaging circuit 15, a neural network 20, and a determination circuit 30 connected together (see Fig. 1).

(A) The learning words were the five words "shomei" (lighting), "air conditioner", "curtain", "television", and "door", and the input words were the same five words: "shomei", "air conditioner", "curtain", "television", and "door".

(B) Preprocessing

(1) As shown in Fig. 2, each input utterance (one of the five words) is divided into four blocks of equal duration.

(2) As shown in Fig. 1, the speech waveform of each block is passed through the bandpass filter 10 with multiple (n) channels (n = 8 in this embodiment), and frequency characteristics such as those shown in Figs. 3(A) to 3(D) are obtained for each block, that is, for each fixed time interval.

At this time, the output of the bandpass filter 10 is averaged for each block by the averaging circuit 15.
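The block splitting and averaging described above can be sketched as follows. This is an illustration under stated assumptions: the bandpass filtering itself is taken as already done, each channel's output is assumed to arrive as a list of samples of equal length, and averaging the absolute value is an assumed choice, since the text does not specify how the averaging circuit 15 operates. The function name is hypothetical.

```python
def block_average_features(channel_outputs, n_blocks=4):
    """Average each bandpass channel over equal-duration blocks.

    channel_outputs: list of n lists, one per channel, all the same length
    (at least n_blocks samples). Returns n_blocks * n values, ordered block
    by block and, within a block, channel by channel.
    """
    n_samples = len(channel_outputs[0])
    features = []
    for block in range(n_blocks):
        start = block * n_samples // n_blocks
        end = (block + 1) * n_samples // n_blocks
        for channel in channel_outputs:
            segment = channel[start:end]
            features.append(sum(abs(v) for v in segment) / len(segment))
    return features


# Hypothetical data: 8 channels, 8 samples each; channel c holds the constant c.
channels = [[c] * 8 for c in range(8)]
features = block_average_features(channels)
print(len(features))  # 32, i.e. 4 blocks x 8 channels
```

With n = 8 channels and 4 blocks this yields the 32 values that the embodiment feeds to the input layer.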

(C) Processing and determination by the neural network

(1) As shown in Fig. 1, the result of the preprocessing (the output of the bandpass filter 10 for each block) is input to the three-layer hierarchical neural network 20. The input layer 21 consists of 4 × n units (32 units, since n = 8 in this embodiment), corresponding to the 4 blocks and n channels of the preprocessing. The output layer 22 provides, for each of the five words, two units corresponding to the registered word and to the other words, for a total of 10 units.
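The layer sizes above can be made concrete with a small forward-pass sketch. The input and output sizes (32 and 10) come from the text; the intermediate layer size of 16, the sigmoid transformation function, the random initial weights, and all names are assumptions, since the source does not state them.

```python
import math
import random


def sigmoid(x):
    """Assumed transformation function for every unit."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights):
    """weights[j][i] is the weight of the connection from input i to unit j."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def forward(features, w_hidden, w_output):
    """Input layer -> intermediate layer -> output layer, forward only."""
    return layer_forward(layer_forward(features, w_hidden), w_output)


N_BLOCKS, N_CHANNELS, N_WORDS = 4, 8, 5
N_INPUT = N_BLOCKS * N_CHANNELS   # 32 input units
N_HIDDEN = 16                     # assumption: not given in the source
N_OUTPUT = 2 * N_WORDS            # 10 output units, two per word

rng = random.Random(0)            # fixed seed so the sketch is repeatable
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUT)]
            for _ in range(N_HIDDEN)]
w_output = [[rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]
            for _ in range(N_OUTPUT)]

outputs = forward([0.5] * N_INPUT, w_hidden, w_output)
print(len(outputs))  # 10
```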

(2) The output of the neural network 20 is input to the determination circuit 30, which recognizes the word of the current input speech. In practicing the present invention, however, the output of the neural network 20 need not be judged mechanically by something like the determination circuit 30; it may instead be judged by a person who reads the output of the neural network 20.
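The source does not say how the determination circuit 30 turns the 10 outputs into a word. One plausible rule, shown here purely as an assumption, is to compare the "registered word" unit with the "other words" unit of each pair and pick the word with the largest margin. The unit ordering, the function name, and the sample output values are hypothetical.

```python
WORDS = ["shomei", "air conditioner", "curtain", "television", "door"]


def decide(outputs, words=WORDS):
    """Pick the word whose 'registered' unit most exceeds its 'other' unit.

    Assumes outputs holds two values per word, ordered (registered, other).
    """
    margins = [outputs[2 * k] - outputs[2 * k + 1] for k in range(len(words))]
    best = max(range(len(words)), key=lambda k: margins[k])
    return words[best]


# Hypothetical network output in which the third pair favours its registered word.
example = [0.1, 0.9, 0.2, 0.8, 0.95, 0.05, 0.1, 0.9, 0.3, 0.7]
print(decide(example))  # curtain
```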

(3) Using the backpropagation learning algorithm described above, the network is trained 1,000 times, until the error of the output with respect to the input converges to a certain level, to build a network that can guarantee a certain recognition rate. The network was trained so that the unit corresponding to each registered word outputs "1" and the units corresponding to the other words output "0".
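The training step above can be illustrated with a small backpropagation loop. This is a sketch under several assumptions, not the embodiment's implementation: sigmoid units without bias terms, a squared-error objective, a learning rate of 0.5, 8 intermediate units, random stand-in feature vectors instead of real speech, and a target layout in which the spoken word's pair is (1, 0) and every other pair is (0, 1). Only the 32 inputs, 10 outputs, and 1,000 training passes come from the text.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_h, w_o):
    """Return intermediate-layer and output-layer activations."""
    h = [sigmoid(sum(w * v for w, v in zip(row, x))) for row in w_h]
    o = [sigmoid(sum(w * v for w, v in zip(row, h))) for row in w_o]
    return h, o


def train_step(x, target, w_h, w_o, rate=0.5):
    """One backpropagation update; returns the squared error before it."""
    h, o = forward(x, w_h, w_o)
    # Error terms: (target - output) times the sigmoid derivative.
    d_o = [(t - y) * y * (1.0 - y) for t, y in zip(target, o)]
    # Propagate the output error terms back through the output weights.
    d_h = [hj * (1.0 - hj) * sum(d_o[k] * w_o[k][j] for k in range(len(w_o)))
           for j, hj in enumerate(h)]
    for k, row in enumerate(w_o):
        for j in range(len(row)):
            row[j] += rate * d_o[k] * h[j]
    for j, row in enumerate(w_h):
        for i in range(len(row)):
            row[i] += rate * d_h[j] * x[i]
    return sum((t - y) ** 2 for t, y in zip(target, o))


def target_for(word_index, n_words=5):
    """Assumed layout: (registered, other) per word, 1/0 for the spoken word."""
    target = []
    for k in range(n_words):
        target += [1.0, 0.0] if k == word_index else [0.0, 1.0]
    return target


rng = random.Random(0)
n_in, n_hidden, n_words = 32, 8, 5
w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
w_o = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
       for _ in range(2 * n_words)]
# Stand-in feature vectors, one per word; real inputs would come from (B).
patterns = [[rng.random() for _ in range(n_in)] for _ in range(n_words)]

errors = []
for epoch in range(1000):
    errors.append(sum(train_step(patterns[k], target_for(k), w_h, w_o)
                      for k in range(n_words)))
print(errors[0] > errors[-1])  # the total error shrinks as training proceeds
```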

(D) Experiment. A word recognition experiment was carried out using the recognition system 1 described above.

The input speech consisted of the five words learned by backpropagation ("shomei", "air conditioner", "curtain", "television", and "door").

(a) Recognition rate. The recognition rate was found to be 100%.

(b) Processing speed. The processing speed (the time required to recognize one spoken word) was within 1 second, confirming that processing is extremely fast.

That is, in the recognition system 1, the recognition rate is extremely high, as result (a) shows.

Further, in the recognition system 1, as result (b) shows, speaker recognition processing can be performed quickly without a complicated processing device and can easily be performed in real time.

In the recognition system 1 described above, the averaging circuit 15 is used so that temporal changes in the "average frequency characteristics over a fixed time interval" of the speech serve as input to the neural network 20. In practicing the present invention, however, simply the "temporal changes in the frequency characteristics of the speech" may be used as input to the neural network.

[Effects of the Invention] As described above, according to the present invention, a word recognition method that ensures a high recognition rate and can easily be executed in real time can be obtained.

[Brief explanation of drawings]

Fig. 1 is a schematic diagram showing an example of a word recognition system to which the present invention is applied, Fig. 2 is a schematic diagram showing input speech, Fig. 3 is a schematic diagram showing the output of the bandpass filter, Fig. 4 is a schematic diagram showing neural networks, Fig. 5 is a schematic diagram showing a hierarchical neural network, and Fig. 6 is a schematic diagram showing the structure of a unit.

DESCRIPTION OF SYMBOLS: 1: recognition system; 10: bandpass filter; 15: averaging circuit; 20: neural network; 21: input layer; 22: output layer; 30: determination circuit (word recognition section).

Patent applicant: Sekisui Chemical Co., Ltd. Representative: Kaoru Hirota

Claims

[Claims]

(1) A word recognition method that uses a neural network to recognize a word from input speech, and uses temporal changes in the frequency characteristics of the speech as input to the neural network.

(2) The word recognition method according to claim 1, wherein temporal changes in average frequency characteristics of speech within a certain period of time are used as input to the neural network.

(3) The word recognition method according to claim 1 or 2, wherein the neural network is a hierarchical neural network.