JP2001188551A

JP2001188551A - Device and method for information processing and recording medium

Info

Publication number: JP2001188551A
Application number: JP37377799A
Authority: JP
Inventors: Atsuo Hiroe; 厚夫廣江; Hironaga Tsutsumi; 洪長包; Hideki Kishi; 秀樹岸; Masanori Omote; 雅則表; Kazuhiko Tajima; 和彦田島; Masatoshi Takeda; 正資武田
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1999-12-28
Filing date: 1999-12-28
Publication date: 2001-07-10

Abstract

PROBLEM TO BE SOLVED: To appropriately generate a response sentence in response to a user's utterance. SOLUTION: When the user speaks a prescribed command, a robot recognizes the spoken command and acts according to the recognition result. For example, the robot utters a response sentence in response to the user's utterance. In doing so, the robot chooses among words such as 'this', 'that', and 'it' according to the positional relationship between the robot and the user, the robot and an object, or the user and the object. When the object is near the robot, the robot refers to it with a word indicating something nearby, such as 'this box' or 'this'; when the object is far away, the robot refers to it with a word indicating something distant, such as 'that box' or 'that'.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an information processing apparatus and method and to a recording medium, and more particularly to an information processing apparatus and method, and a recording medium, capable of appropriately generating a response sentence in response to a user's utterance.

[0002]

[Description of the Related Art] Robots that use speech recognition technology to recognize a command spoken by a user and take various actions based on the recognition result are being put into practical use. For example, such a robot utters a response sentence (message) in response to the user's utterance.

[0003]

[Problem to Be Solved by the Invention] However, the response sentence uttered by a robot is determined uniquely by the content of the given dialogue. It is not generated based on, for example, the positional relationship between the robot and the user, or between the robot and an object referred to in the dialogue. As a result, the dialogue between the robot and the user becomes unnatural.

[0004] For example, suppose the robot utters a response sentence to confirm the content of the user's utterance, and the user says "Bring that" (「あれを持って来い」) about an object that is far from the user. The robot then utters "Do you mean that?" (「あれですか」) regardless of whether the object is far from or near the robot. If the object is far from the robot, this utterance is appropriate. If the object is near the robot, however, uttering "Do you mean this?" (「これですか」) makes for a more natural dialogue.

[0005] The present invention has been made in view of this situation, and enables a response sentence to be generated in accordance with a positional relationship.

[0006]

[Means for Solving the Problem] The information processing apparatus according to claim 1 comprises: acquisition means for acquiring a positional relationship between a robot and an object referred to in an utterance, between the robot and a user, or between the user and the object; selection means for selecting, when the response sentence to be generated includes a word whose usage varies with the positional relationship, the word corresponding to the positional relationship acquired by the acquisition means; and generation means for generating the response sentence using the word selected by the selection means.

[0007] The positional relationship can be the distance between the robot and the object, between the robot and the user, or between the user and the object, or the direction in which the robot, the user, or the object is located or is facing.

[0008] When the utterance includes a word that was chosen according to the positional relationship, estimation means for estimating the positional relationship from that word can further be provided. The acquisition means can then execute processing for detecting the user or the object within the range specified by the positional relationship estimated by the estimation means, and acquire the positional relationship based on the result of that detection.

[0009] When the response sentence is output as speech, the generation means can adjust the content of the response sentence based on the positional relationship acquired by the acquisition means.

[0010] The information processing method according to claim 5 comprises: an acquisition step of acquiring a positional relationship between a robot and an object referred to in an utterance, between the robot and a user, or between the user and the object; a selection step of selecting, when the response sentence to be generated includes a word whose usage varies with the positional relationship, the word corresponding to the positional relationship acquired in the acquisition step; and a generation step of generating the response sentence using the word selected in the selection step.

[0011] The program on the recording medium according to claim 6 comprises: an acquisition step of acquiring a positional relationship between a robot and an object referred to in an utterance, between the robot and a user, or between the user and the object; a selection step of selecting, when the response sentence to be generated includes a word whose usage varies with the positional relationship, the word corresponding to the positional relationship acquired in the acquisition step; and a generation step of generating the response sentence using the word selected in the selection step.

[0012] In the information processing apparatus according to claim 1, the information processing method according to claim 5, and the program on the recording medium according to claim 6, a positional relationship between a robot and an object referred to in an utterance, between the robot and a user, or between the user and the object is acquired. When the response sentence to be generated includes a word whose usage varies with the positional relationship, the word corresponding to the acquired positional relationship is selected, and the response sentence is generated using the selected word.

[0013]

[Embodiments of the Invention] FIG. 1 shows an example of the use of a robot to which the present invention is applied. The robot, the user, and the object (a box in this example) are each located at an arbitrary position within a predetermined area (for example, a room). The robot and the user can move within that area.

[0014] When the user utters a predetermined command, the robot performs speech recognition on the utterance and acts based on the recognition result. For example, the robot utters a predetermined response sentence in response to the user's utterance. In doing so, the robot chooses appropriately among words such as "this" (「これ」), "that" (「あれ」), and "it" (「それ」) according to the positional relationship between the robot and the user, the robot and the object, or the user and the object. For example, when the object is near the robot, it is referred to in the response sentence by a word indicating something nearby, such as "this box" or "this". When the object is far away, it is referred to by a word indicating something distant, such as "that box" or "that". In the following, words such as "this", "that", and "it", whose choice depends on the positional relationship with the dialogue partner or with the object under discussion, are referred to as distance/direction-dependent words when they need not be distinguished individually.
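The word choice described in this paragraph can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `choose_demonstrative` and the 1.0 m threshold are assumptions made for the example, and only the near/far distinction ("this" versus "that") is modelled.

```python
from typing import Optional

# Assumed cut-off between "near" and "far"; the source does not give a single threshold.
NEAR_THRESHOLD_M = 1.0


def choose_demonstrative(distance_m: float,
                         noun: Optional[str] = None,
                         threshold_m: float = NEAR_THRESHOLD_M) -> str:
    """Pick a distance/direction-dependent word for an object at distance_m from the robot."""
    near = distance_m <= threshold_m
    word = "this" if near else "that"
    # With a noun the word acts as a determiner ("this box"); without one it stands alone.
    return f"{word} {noun}" if noun else word
```

For example, `choose_demonstrative(0.4, "box")` yields "this box", while `choose_demonstrative(3.0)` yields "that".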

[0015] FIG. 2 shows an example of the external configuration of the robot of FIG. 1, and FIG. 3 shows an example of its electrical configuration. The robot is constructed by connecting leg units 3A, 3B, 3C, and 3D to the front, rear, left, and right of a body unit 2, and by connecting a head unit 4 and a tail unit 5 to the front end and the rear end of the body unit 2, respectively.

[0016] The tail unit 5 extends from a base portion 5B provided on the upper surface of the body unit 2 so that it can bend or swing with two degrees of freedom.

[0017] The body unit 2 houses a controller 10 that controls the entire robot, a battery 11 that serves as the robot's power source, an internal sensor unit 14 consisting of a battery sensor 12 and a heat sensor 13, and other components.

[0018] The head unit 4 has, at respective predetermined positions, a microphone 15 corresponding to the "ears", a CCD (Charge Coupled Device) camera 16 corresponding to the "eyes", a touch sensor 17 corresponding to the sense of touch, a speaker 18 corresponding to the "mouth", and so on. The head unit 4 also has a GPS (Global Positioning System) receiver 19, a gyrocompass 20, a geomagnetic sensor 21, and an ultrasonic sensor 22, each arranged at a predetermined position.

[0019] As shown in FIG. 3, actuators 3AA1 to 3AAK, 3BA1 to 3BAK, 3CA1 to 3CAK, 3DA1 to 3DAK, 4A1 to 4AL, 5A1, and 5A2 are provided at the joints of the leg units 3A to 3D, at the connections between each of the leg units 3A to 3D and the body unit 2, at the connection between the head unit 4 and the body unit 2, and at the connection between the tail unit 5 and the body unit 2. Each connection can thereby rotate with a predetermined number of degrees of freedom.

[0020] The microphone 15 in the head unit 4 picks up surrounding sound, including utterances from the user, and sends the resulting audio signal to the controller 10. The CCD camera 16 captures images of the surroundings and sends the resulting image signal to the controller 10.

[0021] The touch sensor 17 is provided, for example, on the upper part of the head unit 4. It detects the pressure applied by physical actions from the user such as "stroking" or "hitting", and sends the detection result to the controller 10 as a pressure detection signal.

[0022] The GPS receiver 19 receives radio waves transmitted from predetermined satellites and sends them to the controller 10. The gyrocompass 20 and the geomagnetic sensor 21 detect the direction in which the head unit 4 is facing and send the detection results to the controller 10. The ultrasonic sensor 22 emits ultrasonic waves, receives their reflections, and sends them to the controller 10. In the following, the signals sent to the controller 10 from the GPS receiver 19 through the ultrasonic sensor 22 are collectively referred to as position information signals when they need not be distinguished individually.

[0023] The battery sensor 12 in the body unit 2 detects the remaining charge of the battery 11 and sends the detection result to the controller 10 as a remaining battery level detection signal. The heat sensor 13 detects heat inside the robot and sends the detection result to the controller 10 as a heat detection signal.

[0024] The controller 10 contains a CPU (Central Processing Unit) 10A, a memory 10B, and other components. It performs various kinds of processing by having the CPU 10A execute a control program stored in the memory 10B.

[0025] That is, based on the audio signal, image signal, pressure detection signal, position information signals, remaining battery level detection signal, and heat detection signal supplied from the microphone 15, CCD camera 16, touch sensor 17, GPS receiver 19, gyrocompass 20, geomagnetic sensor 21, ultrasonic sensor 22, battery sensor 12, and heat sensor 13, the controller 10 determines the surrounding conditions and whether there has been a command from the user, an approach from the user, and so on.

[0026] Furthermore, based on this determination and other information, the controller 10 decides on the next action and, based on that decision, drives whichever of the actuators 3AA1 to 3AAK, 3BA1 to 3BAK, 3CA1 to 3CAK, 3DA1 to 3DAK, 4A1 to 4AL, 5A1, and 5A2 are needed. This causes the robot to perform actions such as swinging the head unit 4 up, down, left, and right, moving the tail unit 5, or driving the leg units 3A to 3D to walk.

[0027] The controller 10 also generates synthesized speech (synthesized speech corresponding to a response sentence) as necessary and supplies it to the speaker 18 for output.

[0028] In this way, the robot can act autonomously based on the surrounding conditions and other information.

[0029] In this specification, the present invention is described using a dog-shaped robot such as that shown in FIGS. 1 and 2 as an example, but the invention can of course also be applied to robots of other forms (for example, a human shape).

[0030] FIG. 4 shows an example of the functional configuration of the controller 10 of FIG. 3. The functional configuration shown in FIG. 4 is realized by the CPU 10A executing the control program stored in the memory 10B.

[0031] The input processing unit 51 receives the audio signal, image signal, pressure detection signal, position information signals, and so on from the microphone 15, CCD camera 16, touch sensor 17, GPS receiver 19, gyrocompass 20, geomagnetic sensor 21, ultrasonic sensor 22, and other devices, and sends them to the control unit 50.

[0032] The speech recognition/analysis unit 52 performs predetermined speech recognition and language analysis on the audio signal from the microphone 15, input via the input processing unit 51. It converts the audio signal into a form that the dialogue management unit 56 can process (text, a parse tree, or a dialogue frame) and sends the result to the control unit 50.

[0033] The image recognition unit 53 performs image recognition on the image signal captured by the CCD camera 16 and input via the input processing unit 51. When this processing detects, for example, "a rectangular parallelepiped object", "a fingertip", or "a human face", the image recognition unit 53 notifies the control unit 50 of image recognition results such as "there is a box", "the direction in which the fingertip is pointing", or "the size of the face".

[0034] Using the audio signal and position information signals input via the input processing unit 51 and the image recognition results from the image recognition unit 53, the position detection unit 54 detects, for example, the positions of the robot, the user, and the object, the distances between them, and the directions in which the robot and the user are facing, and sends the detection results to the control unit 50.

[0035] Based on the detected positions of the robot, the user, or the object, the position detection unit 54 also detects the positional relationship that governs the choice of distance/direction-dependent words (hereinafter called the near/far state) and notifies the control unit 50 of the result. For example, it detects whether the near/far state is one in which a distance/direction-dependent word indicating something far away should be used (hereinafter, the far state) or one in which a distance/direction-dependent word indicating something nearby should be used (hereinafter, the near state).

[0036] The user identification unit 55 extracts a voiceprint from the audio signal input via the input processing unit 51, or extracts feature values from the face image recognized by the image recognition unit 53. It identifies the user based on the extracted voiceprint or feature values and notifies the control unit 50 of the identification result.

[0037] The dialogue management unit 56 decides what action the robot should take based on the signals input via the input processing unit 51, the passage of time, and so on, and notifies the control unit 50 of the decided action as action command information. For example, when the robot is to utter a response sentence, the dialogue management unit 56 generates the response sentence using the appropriate distance/direction-dependent word for the near/far state detected by the position detection unit 54, and notifies the control unit 50 of it together with information for moving the robot's head, limbs, and so on. For example, the control unit 50 is notified of action command information such as "say 'No' and shake the head from side to side".

[0038] When the user's utterance, after speech recognition and language analysis by the speech recognition/analysis unit 52, contains a distance/direction-dependent word, the dialogue management unit 56 also detects (determines), based on that word, the near/far state between, for example, the robot and the user, the robot and the object, or the user and the object.

[0039] When the action command information generated by the dialogue management unit 56 includes a command to utter a response sentence, the output generation unit 57 generates synthesized speech corresponding to that response sentence (for example, "No") and outputs it to the audio output unit 58. The output generation unit 57 also generates a motion pattern (for example, a motion pattern for the robot to shake its head from side to side) based on the action command information and sends it to the motion control unit 59.

[0040] The audio output unit 58 outputs the synthesized speech generated by the output generation unit 57 from the speaker 18. The robot thereby utters the response sentence, for example "No".

[0041] Based on the motion pattern generated by the output generation unit 57, the motion control unit 59 generates control signals for driving the actuators 3AA1 to 5A1 and 5A2 and sends them to those actuators. The actuators 3AA1 to 5A1 and 5A2 are then driven according to the control signals, so that the robot acts autonomously, for example by shaking its head from side to side.

[0042] The control unit 50 controls each of the units described above.

[0043] Next, the processing by which the position detection unit 54 detects the positions of the robot, the user, and the object, the distances between them, and the orientations of the robot and the user is described.

[0044] First, the procedure for detecting the position of the robot is described. The position detection unit 54 calculates the latitude and longitude of the robot's current position from the radio waves from predetermined satellites that are received by the GPS receiver 19 and input via the input processing unit 51, and thereby detects the robot's absolute position.

[0045] The position detection unit 54 also detects the robot's relative position within a predetermined area (for example, a room) using the reflections, input via the input processing unit 51, of the ultrasonic waves emitted by the ultrasonic sensor 22 and reflected by objects within that area.

[0046] Next, the processing for detecting the direction in which the robot (for example, the front end of the head unit 4) is facing is described. The position detection unit 54 detects the direction (bearing) in which the robot is facing based on the detection results of the gyrocompass 20 and the geomagnetic sensor 21, input via the input processing unit 51.

[0047] Next, the processing for detecting the position of the user (as seen from the robot) is described. The position detection unit 54 detects the user's relative position using the reflections of the ultrasonic waves emitted by the ultrasonic sensor 22 and reflected by the object that the user identification unit 55 has identified as the user.

[0048] When the user is producing sound (for example, when the user is speaking), the position detection unit 54 also detects the direction in which the user (the sound source) is located, based on differences in volume, phase, timbre, and so on among the audio signals from the microphone 15 (in this case, a plurality of microphones 15), input via the input processing unit 51. That is, the position detection unit 54 can also use the identification result from the user identification unit 55 to determine whether the sound source is the user.
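One common way to turn a phase (arrival-time) difference between two microphones into a direction is the far-field relation sin(theta) = c * dt / d, where d is the microphone spacing and c the speed of sound. The sketch below illustrates that relation only. The source does not state which formula the robot uses, and the function name, the two-microphone layout, and the 343 m/s value are assumptions.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees Celsius


def bearing_from_delay(delay_s: float, mic_spacing_m: float,
                       speed_m_s: float = SPEED_OF_SOUND_M_S) -> float:
    """Estimate the source bearing in degrees (0 = straight ahead) from an arrival-time difference."""
    if mic_spacing_m <= 0:
        raise ValueError("microphone spacing must be positive")
    ratio = speed_m_s * delay_s / mic_spacing_m
    # Clamp so that measurement noise cannot push the value outside asin's domain.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

A delay of zero gives a bearing of 0 degrees (source directly ahead), and the sign of the delay indicates on which side of the microphone pair the source lies.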

[0049] Next, the processing for detecting the direction in which the user (for example, the user's face) is facing, as seen from the robot, is described. The position detection unit 54 extracts feature values from the face image processed by the image recognition unit 53 and detects the direction in which the user is facing based on the extracted feature values.

[0050] The position detection unit 54 also detects the displacement of the user's image (in which direction and by how much it has moved) in the images processed by the image recognition unit 53, and takes the direction of movement as the direction in which the user is facing.

[0051] If neither the user's face image nor the direction of movement can be obtained, the user is assumed to be facing the robot, and the user's orientation can be derived from the user's position obtained by the method described above.

[0052] Next, the processing for detecting the distance between the robot and the user is described. The position detection unit 54 detects the distance between the robot and the user using the reflections of the ultrasonic waves emitted by the ultrasonic sensor 22 and reflected by the object that the user identification unit 55 has identified as the user.
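Ultrasonic ranging is usually done by timing the echo: the wave travels to the target and back, so the one-way distance is half the round-trip time multiplied by the speed of sound. The sketch below assumes that time-of-flight approach, which the source does not spell out. The function name and the 343 m/s default are illustrative.

```python
def echo_distance_m(round_trip_s: float, speed_m_s: float = 343.0) -> float:
    """Convert an ultrasonic echo's round-trip time into a one-way distance in metres."""
    if round_trip_s < 0:
        raise ValueError("round-trip time cannot be negative")
    # The pulse covers the robot-to-target distance twice, hence the division by 2.
    return speed_m_s * round_trip_s / 2.0
```

With the assumed speed of sound, a 10 ms round trip corresponds to a target about 1.7 m away.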

[0053] The position detection unit 54 also detects parallax from a plurality of images of the user captured from different angles by the CCD camera 16 (in this case, a plurality of CCD cameras 16), and detects the distance between the robot and the user based on that parallax.
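For two parallel cameras, the standard pinhole stereo relation gives depth as Z = f * B / d, where f is the focal length in pixels, B the baseline between the cameras, and d the disparity (parallax) in pixels. The sketch below applies that textbook relation. The camera parameters in the usage note are made-up illustrative values, and the source does not say that the robot uses exactly this model.

```python
def stereo_distance_m(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth of a point seen by two parallel cameras, from its disparity in pixels."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    # Larger disparity means the point is closer to the cameras.
    return focal_px * baseline_m / disparity_px
```

With a hypothetical focal length of 500 px and a 6 cm baseline, a disparity of 15 px corresponds to a distance of about 2 m.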

[0054] Furthermore, when the user's face has been captured by the CCD camera 16, the position detection unit 54 detects the size of the user's face from the image processed by the image recognition unit 53 and detects the distance between the robot and the user based on that size.
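A simple way to relate apparent face size to distance is the pinhole camera model: an object of real width W that appears w pixels wide lies at roughly Z = f * W / w, with f the focal length in pixels. The sketch below uses that model. The 0.16 m average face width and the function name are assumptions for illustration and do not come from the source.

```python
ASSUMED_FACE_WIDTH_M = 0.16  # illustrative average adult face width, not from the source


def distance_from_face_width(focal_px: float, face_width_px: float,
                             real_face_width_m: float = ASSUMED_FACE_WIDTH_M) -> float:
    """Estimate camera-to-face distance from the face's apparent width in the image."""
    if face_width_px <= 0:
        raise ValueError("face width in pixels must be positive")
    # A face that appears smaller in the image is farther away.
    return focal_px * real_face_width_m / face_width_px
```

The estimate is only as good as the assumed real face width, so it is best treated as a coarse near/far cue rather than a precise measurement.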

[0055] Next, the processing for detecting the position of the object is described. The position detection unit 54 detects the relative position of the object using the reflections of the ultrasonic waves emitted by the ultrasonic sensor 22 and reflected by the object that the image recognition unit 53 has identified as the target object.

[0056] When the CCD camera 16 has captured the user's finger pointing at the object, the position detection unit 54 detects the direction in which the finger is pointing from the image processed by the image recognition unit 53, and thereby detects the direction in which the object is located.

[0057] Next, the processing for detecting the distance between the robot and the object is described. The position detection unit 54 detects the distance between the robot and the object using the reflections of the ultrasonic waves emitted by the ultrasonic sensor 22 and reflected by the object that the image recognition unit 53 has recognized as the target object.

【００５８】The position detection unit 54 also detects the parallax from multiple images of the object captured from different angles by the CCD camera 16 (in this case, a plurality of CCD cameras 16), and detects the distance to the object based on that parallax.
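
The parallax-based distance detection above can be illustrated with the standard relation for a rectified stereo pair, Z = f * B / d. This is only a sketch under assumptions: the text does not state which formula or camera geometry is used, and the function name, the focal length, the baseline, and the disparity values are all hypothetical.

```python
def depth_from_disparity(focal_length_px: float, baseline_m: float,
                         disparity_px: float) -> float:
    """Distance to a point seen by two parallel cameras (pinhole model).

    focal_length_px: focal length in pixels (assumed value, not from the text)
    baseline_m:      spacing between the two cameras in metres (assumed)
    disparity_px:    horizontal shift of the object between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical numbers: 700 px focal length, 6 cm baseline, 21 px disparity.
distance_m = depth_from_disparity(700.0, 0.06, 21.0)
```

A larger disparity means a closer object, so the estimate becomes less precise as the object moves away and the disparity shrinks.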

【００５９】The processing for detecting the distance between the user and the object is described next. This distance is obtained from the position of the user and the position of the object, each detected by the corresponding method described above.

【００６０】Next, the processing by which the position detection unit 54 detects (determines), from the robot's point of view, the near/far state between the robot and the user or between the robot and the object is described. The determination is based on the positions of the robot, the user, and the object detected by the methods described above.

【００６１】As shown in FIG. 5, the position detection unit 54 takes the point corresponding to the position of the robot (in the example of FIG. 5, the center of the head unit 4) as the origin, sets up in, for example, the memory 10B a two-dimensional space defined by the y-axis (the vertical axis in the figure) and the z-axis (the horizontal axis in the figure) passing through that origin, and forms (draws) the perspective determination pattern A in that space. In this case, the perspective determination pattern A is the region enclosed by an outline that passes through the point on the positive z-axis corresponding to a position, for example, 2 m in front of the robot, the points on the positive and negative y-axis corresponding to positions, for example, 1 m to the left and right of the robot, and the point on the negative z-axis corresponding to a position, for example, 0.5 m behind the robot.

【００６２】The position detection unit 54 converts the detected position of the user or the object into a position (coordinates) in the two-dimensional space of FIG. 5 and determines whether those coordinates lie within the perspective determination pattern A. If they do, it determines that the robot and the user, or the robot and the object, are in the near state; if they do not, it determines that they are in the far state.
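
The inside/outside test for the perspective determination pattern A might be sketched as follows. The text fixes only four points on the outline (2 m ahead, 1 m to each side, 0.5 m behind); the shape between them is an assumption here, modelled as two half-ellipses that share the 1 m side reach. The names `in_pattern_a` and `robot_side_state` are hypothetical.

```python
FRONT_M = 2.0  # reach in front of the robot (+z), example value from the text
SIDE_M = 1.0   # reach to the left and right (+/-y), example value from the text
REAR_M = 0.5   # reach behind the robot (-z), example value from the text


def in_pattern_a(y: float, z: float) -> bool:
    """True if (y, z), in robot-centred coordinates, lies inside pattern A.

    Assumed outline: a half-ellipse in front of the robot joined to a
    flatter half-ellipse behind it.
    """
    z_reach = FRONT_M if z >= 0.0 else REAR_M
    return (y / SIDE_M) ** 2 + (z / z_reach) ** 2 <= 1.0


def robot_side_state(y: float, z: float) -> str:
    """Near/far state between the robot and a user or object at (y, z)."""
    return "near" if in_pattern_a(y, z) else "far"
```

With this outline a user 1.5 m straight ahead is near, while a user 0.6 m directly behind the robot is already far, which reflects the front-heavy shape of the pattern.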

【００６３】For example, as shown in FIG. 6, when the user is close to the robot (within 2 m in front of the robot), the coordinates of the detected user position lie inside the perspective determination pattern A, so the robot and the user are determined to be in the near state. On the other hand, as shown in FIG. 7, when the user is far from the robot (2 m or more in front of the robot), the coordinates of the detected user position lie outside the perspective determination pattern A, so the robot and the user are determined to be in the far state.

【００６４】For this determination, a perceptron (a neural network that determines whether a position is inside or outside the perspective determination pattern A) may instead be prepared, trained in advance on the correspondence between the positions (coordinates) of the user and the object relative to the robot and the near or far state, and then used for the determination.

【００６５】The above describes the case where the perspective determination pattern A is used to determine, from the robot's point of view, the near/far state between the robot and the user or between the robot and the object. The position detection unit 54 can additionally form, in the two-dimensional space of FIG. 5 (the space defined by the y-axis and z-axis passing through the origin at the robot's position), a circular perspective determination pattern B centered on the point corresponding to the user's position, as shown in FIG. 8, and use it to determine the near/far state between the user and the object as seen from the robot ("as seen from the robot, is the object near the user?"). In this case the perspective determination pattern B is a circle centered on the user because, as seen from the robot, the range that counts as near or far for the user does not depend on the direction the user is facing. In other words, the shape is independent of orientation.

【００６６】For example, as shown in FIG. 9, when the coordinates of the object lie in the region where the perspective determination patterns A and B overlap (range 1 in the figure), the robot and the object, and the user and the object, are each determined to be in the near state as seen from the robot. On the other hand, as shown in FIG. 10, when the perspective determination patterns A and B do not overlap anywhere, there is no case in which the robot and the object and the user and the object are both determined to be in the near state as seen from the robot.

【００６７】When the coordinates of the object lie within the perspective determination pattern A (range 2 in FIG. 10), or within the part of the perspective determination pattern A that does not overlap the perspective determination pattern B (range 2 in FIG. 9), the robot and the object are determined to be in the near state and the user and the object in the far state, as seen from the robot.

【００６８】When the coordinates of the object lie within the part of the perspective determination pattern B that does not overlap the perspective determination pattern A (range 3 in FIG. 9), or within the perspective determination pattern B (range 3 in FIG. 10), the robot and the object are determined to be in the far state and the user and the object in the near state, as seen from the robot. Furthermore, when the coordinates of the object lie in the region included in neither the perspective determination pattern A nor the perspective determination pattern B (range 4 in FIGS. 9 and 10), the robot and the object, and the user and the object, are each determined to be in the far state as seen from the robot.
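
The four ranges of FIGS. 9 and 10 amount to combining two membership tests, one for each pattern. The sketch below assumes a half-ellipse outline for pattern A and a 1 m radius for the circular pattern B; neither the exact outline nor the radius is given in the text, and the function names are hypothetical.

```python
import math


def in_pattern_a(y: float, z: float) -> bool:
    """Robot-centred pattern A (assumed outline: two joined half-ellipses)."""
    z_reach = 2.0 if z >= 0.0 else 0.5
    return (y / 1.0) ** 2 + (z / z_reach) ** 2 <= 1.0


def in_pattern_b(y: float, z: float, user_y: float, user_z: float,
                 radius_m: float = 1.0) -> bool:
    """Circular pattern B centred on the user (radius is an assumption)."""
    return math.hypot(y - user_y, z - user_z) <= radius_m


def classify_range(obj: tuple, user: tuple) -> int:
    """Return the range number (1 to 4) for an object position.

    1: near robot and near user    2: near robot, far from user
    3: far from robot, near user   4: far from both
    """
    near_robot = in_pattern_a(*obj)
    near_user = in_pattern_b(*obj, *user)
    if near_robot and near_user:
        return 1
    if near_robot:
        return 2
    if near_user:
        return 3
    return 4
```

When the user stands far enough away that the two patterns do not overlap, as in FIG. 10, `classify_range` simply never returns 1, which matches the statement that no position is then near both.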

【００６９】The determination method described with reference to FIGS. 5, 8, 9, and 10 determines the near/far state as seen from the robot. However, as shown in FIGS. 11 and 12, the position detection unit 54 can also determine the near/far state between the user and the object, or between the robot and the object, as seen from the user, by swapping the roles of the perspective determination patterns A and B.

【００７０】In this case, as shown in FIG. 11, the position detection unit 54 takes the position (coordinates) of the robot as the origin and forms the perspective determination pattern B, centered on the origin, in the two-dimensional space defined by the y-axis and z-axis passing through that origin. Then, as shown in FIG. 12, it forms the perspective determination pattern A centered on the position (coordinates) of the user.

【００７１】For example, as shown in FIG. 13, when the coordinates of the object lie in the region where the perspective determination patterns A and B overlap (range 1 in the figure), the user and the object, and the robot and the object, are each determined to be in the near state as seen from the user. On the other hand, as shown in FIG. 14, when the perspective determination patterns A and B do not overlap anywhere, there is no case in which the user and the object and the robot and the object are both determined to be in the near state as seen from the user. Ranges 1 to 4 in FIGS. 13 and 14 correspond to those in FIGS. 9 and 10, and the determination results when the coordinates of the object lie in ranges 2 to 4 are the same as for FIGS. 9 and 10, so their description is omitted.

【００７２】The above describes methods that determine the near/far state using the perspective determination pattern A or B, but the state can also be determined simply from the actual distance. For example, when the distance between the robot and the object or the user is less than 2 m, the state can be determined to be near, and when it is 2 m or more, far.

【００７３】Furthermore, when the near/far state is determined from the actual distance in this way, it is also possible to treat the distance between the robot and the object, or between the robot and the user, as a single variable, calculate a degree for the near case and a degree for the far case, and take whichever degree is larger as the state.

【００７４】For example, the function drawn as the solid line in FIG. 15 is prepared as the degree of nearness, and the function drawn as the dotted line in FIG. 15 (degree of farness = 1 - degree of nearness) as the degree of farness. The solid line is a function whose value is 1 for distances from 0 m to 2 m, 0 for distances of 5 m or more, and decreases linearly from 1 to 0 as the distance increases from 2 m to 5 m.

【００７５】The dotted line, conversely, is a function whose value is 0 for distances from 0 m to 2 m, 1 for distances of 5 m or more, and increases linearly from 0 to 1 as the distance increases from 2 m to 5 m.

【００７６】When the distance is 3 m, the degree of nearness (solid line) is 2/3 and the degree of farness (dotted line) is 1/3; since the degree of nearness is greater than the degree of farness, the state is determined to be near. When the distance is 4 m, the degree of nearness (solid line) is 1/3 and the degree of farness (dotted line) is 2/3; since the degree of nearness is less than the degree of farness, the state is determined to be far.
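
The two degree functions of FIG. 15 and the comparison between them can be written out directly from the values given above (1 up to 2 m, 0 from 5 m, linear in between). The function names are hypothetical, and the handling of an exact tie at 3.5 m, which the text does not specify, is an assumption (treated here as far).

```python
NEAR_LIMIT_M = 2.0  # nearness is 1 up to this distance (from the text)
FAR_LIMIT_M = 5.0   # nearness is 0 from this distance on (from the text)


def nearness(distance_m: float) -> float:
    """Degree of nearness: the solid line of FIG. 15."""
    if distance_m <= NEAR_LIMIT_M:
        return 1.0
    if distance_m >= FAR_LIMIT_M:
        return 0.0
    return (FAR_LIMIT_M - distance_m) / (FAR_LIMIT_M - NEAR_LIMIT_M)


def farness(distance_m: float) -> float:
    """Degree of farness: the dotted line, 1 minus the degree of nearness."""
    return 1.0 - nearness(distance_m)


def near_far_state(distance_m: float) -> str:
    """Pick the state with the larger degree (a tie counts as far: assumption)."""
    return "near" if nearness(distance_m) > farness(distance_m) else "far"
```

At 3 m the degrees are 2/3 and 1/3, and at 4 m they are 1/3 and 2/3, so the function reproduces the two worked cases above.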

【００７７】When the degree of nearness or farness is calculated in this way, variables other than distance, such as the orientation of the robot or the user, can also be used. This makes it possible for the range determined to be near to be wide in front and narrow behind, as with the perspective determination pattern A (for example, FIG. 5).

【００７８】Next, the dialogue management unit 56 is described in detail. FIG. 16 shows a configuration example of the part of the dialogue management unit 56 that generates response sentences and determines near/far states.

【００７９】The dialogue processing unit 111 receives, via the control unit 50, the user's utterance (for example, as text) that has undergone speech recognition and language analysis by the speech recognition/analysis unit 52, together with the near/far determination result from the position detection unit 54. The dialogue processing unit 111 generates a response sentence corresponding to the content of the input utterance by referring to the dialogue rule table 112 and the position/word correspondence table 113. When the response sentence contains a distance/direction-dependent word, the dialogue processing unit 111 generates it using the distance/direction-dependent word appropriate to the near/far determination result from the position detection unit 54.

【００８０】The dialogue rule table 112 stores, organized as a dialogue history, the signals input via the input processing unit 51, the speech recognition results from the speech recognition/analysis unit 52, the image recognition results from the image recognition unit 53, and so on.

【００８１】The distance/word correspondence table 113 contains a distance/calling word correspondence table 121 (FIG. 17) and a distance/kosoado word correspondence table 122 (FIG. 18). The distance/calling word correspondence table 121 lists, for the near state and the far state as seen from the robot, the words used to call out to another party, ねえ (nee) or おーい (ooi) (hereinafter referred to as calling words).

【００８２】For example, as shown in FIG. 6, when the user and the robot are in the near state and one calls out to the other, a calling word for a nearby party, such as ねえ (nee, "hey"), is used. On the other hand, as shown in FIG. 7, when the robot and the user are in the far state and one calls out to the other, a calling word for a distant party, such as おーい (ooi, "hello there"), is used. That is, the distance/calling word correspondence table 121 of FIG. 17 gives the correspondence between the positional relationship (near/far state) with the party being called and the calling word appropriate to that state.

【００８３】The distance/kosoado word correspondence table 122 has four column items (column items 1 to 4), one for each combination of the near/far state between the robot and the object and the near/far state between the user and the object, and two row items: row item 1 for the kosoado words used in the robot's speech and row item 2 for the kosoado words used in the user's speech. Kosoado words are Japanese demonstratives, such as これ (kore, "this"), あれ (are, "that over there"), and それ (sore, "that"), that stand for a given referent.

【００８４】The data area at column item 1, the combination in which the robot and the object and the user and the object are each in the near state, and row item 1, the kosoado words used in the robot's speech, holds kosoado words that refer to something nearby, such as これ (kore), こいつ (koitsu), この (kono), and ここの (kokono). The data area at column item 1 and row item 2, the kosoado words used in the user's speech, holds the same kosoado words as the data area at column item 1 and row item 1.

【００８５】As shown in FIG. 19, when the robot and the object and the user and the object are each in the near state as seen from the robot, the robot and the user both normally refer to the object (a box in this example) with a kosoado word for something nearby, such as この (kono, "this"). That is, the data areas at column item 1 and row items 1 and 2 of the distance/kosoado word correspondence table 122 hold the kosoado words used in the near/far state shown in FIG. 19.

【００８６】The data area at column item 2, the combination in which the robot and the object are in the near state and the user and the object are in the far state, and row item 1 holds kosoado words that refer to something nearby, such as これ (kore), こいつ (koitsu), この (kono), and ここの (kokono). The data area at column item 2 and row item 2 holds kosoado words that refer to something far away, such as それ (sore), そいつ (soitsu), その (sono), and そこの (sokono).

【００８７】As shown in FIG. 20, when the robot and the object are in the near state and the user and the object are in the far state as seen from the robot, the robot normally refers to the object with a kosoado word for something nearby, such as この (kono, "this"), while the user refers to it with a kosoado word for something far away, such as その (sono, "that"). That is, the data areas at column item 2 and row items 1 and 2 hold the kosoado words used in the near/far state shown in FIG. 20.

【００８８】The data area at column item 3, the combination in which the robot and the object are in the far state and the user and the object are in the near state, and row item 1 holds kosoado words that refer to something far away, such as それ (sore), そいつ (soitsu), その (sono), and そこの (sokono). The data area at column item 3 and row item 2 holds kosoado words that refer to something nearby, such as これ (kore), こいつ (koitsu), この (kono), and ここの (kokono).

【００８９】As shown in FIG. 21, when the robot and the object are in the far state and the user and the object are in the near state as seen from the robot, the robot normally refers to the object with a kosoado word for something far away, such as その (sono, "that"), while the user refers to it with a kosoado word for something nearby, such as この (kono, "this"). That is, the data areas at column item 3 and row items 1 and 2 hold the kosoado words used in the near/far state shown in FIG. 21.

【００９０】The data areas at column item 4, the combination in which the robot and the object and the user and the object are each in the far state, and row items 1 and 2 each hold other kosoado words that refer to something far away, such as あれ (are), あいつ (aitsu), あの (ano), and あそこの (asokono).

【００９１】As shown in FIG. 22, when the robot and the object and the user and the object are each in the far state as seen from the robot, the robot and the user both normally refer to the object with a kosoado word for something far away, such as あの (ano, "that over there"). That is, the data areas at column item 4 and row items 1 and 2 hold the kosoado words used in the near/far state shown in FIG. 22.

【００９２】As described above, the calling words in the distance/calling word correspondence table 121 and the kosoado words in the distance/kosoado word correspondence table 122 are words whose choice depends on the positional relationship with the conversation partner or the object. That is, they are kinds of distance/direction-dependent words.

【００９３】That is, the dialogue processing unit 111 selects the appropriate distance/direction-dependent word from the position/word correspondence table 113 (the distance/calling word correspondence table 121 and the distance/kosoado word correspondence table 122) according to the near/far determination result from the position detection unit 54, and generates the response sentence.
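
The two correspondence tables and the word selection can be pictured as plain lookups keyed by near/far state. The words and their column placement follow the description of FIGS. 17 and 18 above; the data layout, the tuple ordering, and the names `CALLING_WORDS`, `KOSOADO`, `robot_adnominal`, and `confirm_object` are hypothetical and are not the structure disclosed for table 113.

```python
# Table 121 (FIG. 17): calling word by near/far state between robot and user.
CALLING_WORDS = {"near": "ねえ", "far": "おーい"}

# Word groups as listed in the text (pronoun, colloquial, adnominal, locative).
KO = ("これ", "こいつ", "この", "ここの")
SO = ("それ", "そいつ", "その", "そこの")
A_ = ("あれ", "あいつ", "あの", "あそこの")

# Table 122 (FIG. 18): key = (robot-object state, user-object state),
# value = (words for the robot's speech, words for the user's speech).
KOSOADO = {
    ("near", "near"): (KO, KO),  # column item 1
    ("near", "far"): (KO, SO),   # column item 2
    ("far", "near"): (SO, KO),   # column item 3
    ("far", "far"): (A_, A_),    # column item 4
}


def robot_adnominal(robot_object: str, user_object: str) -> str:
    """Adnominal form (この / その / あの) for the robot's own speech."""
    robot_words, _user_words = KOSOADO[(robot_object, user_object)]
    return robot_words[2]


def confirm_object(noun: str, robot_object: str, user_object: str) -> str:
    """Build a confirmation sentence such as 'この箱ですね'."""
    return f"{robot_adnominal(robot_object, user_object)}{noun}ですね"
```

For the situation of FIG. 20 (object near the robot, far from the user), `confirm_object("箱", "near", "far")` yields the robot's reply used in the running example, この箱ですね.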

【００９４】Conversely, when the input utterance contains a distance/direction-dependent word, the dialogue processing unit 111 refers to the position/word correspondence table 113, finds the near/far state corresponding to that word, and thereby determines, for example, the near/far state between the robot and the user, the robot and the object, or the user and the object.
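
The reverse lookup, from a word in the user's utterance back to a near/far state, can be sketched as a search over the user's row of the kosoado table. The layout and the name `states_for_user_word` are hypothetical. Returning a list is a design choice of this sketch: in the table as described, the user's nearby-object words appear under two column items, and the text does not say how such a case is resolved.

```python
KO = {"これ", "こいつ", "この", "ここの"}
SO = {"それ", "そいつ", "その", "そこの"}
AR = {"あれ", "あいつ", "あの", "あそこの"}

# Row item 2 (user's speech) of table 122, keyed by
# (robot-object state, user-object state).
USER_WORDS = {
    ("near", "near"): KO,  # column item 1
    ("near", "far"): SO,   # column item 2
    ("far", "near"): KO,   # column item 3
    ("far", "far"): AR,    # column item 4
}


def states_for_user_word(word: str) -> list:
    """Return every (robot-object, user-object) state consistent with the word."""
    return [state for state, words in USER_WORDS.items() if word in words]
```

A user who says その therefore implies a single combination, object near the robot and far from the user, whereas この leaves the robot-object state open.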

【００９５】Next, the procedure by which the robot utters a response sentence is described with reference to the flowchart of FIG. 23. The example used is the situation of FIG. 20, in which the robot and the object are in the near state and the user and the object are in the far state: the user says その箱の前に座れ ("Sit in front of that box"), and in response the robot says この箱ですね ("This box, right?") and then sits down in front of the box.

【００９６】In step S1, the position detection unit 54 starts detecting the position and orientation of the robot by the methods described above and, based on the detected position and orientation, forms the perspective determination pattern B centered on the robot's position, as shown in FIG. 11.

【００９７】In step S2, the position detection unit 54 waits until an audio signal is input from the microphone 15, that is, until the user speaks. When the user speaks and the audio signal is input, the process proceeds to step S3.

【００９８】In step S3, the position detection unit 54 detects the position and the orientation of the user who spoke, each by the corresponding method described above, and, based on the detected position and orientation, forms the perspective determination pattern A centered on the user's position, as shown in FIG. 12.

【００９９】At this time, the output generation unit 57 generates an operation pattern that drives the necessary actuators among the actuators 3AA1 to 5A1 and 5A2 so that, for example, the CCD camera 16, which corresponds to the robot's "eyes", can capture the face of the user who spoke, and outputs the pattern to the operation control unit 59. Based on this pattern, the operation control unit 59 controls the necessary actuators among the actuators 3AA1 to 5A1 and 5A2 and points the CCD camera 16 at the user. The image of the user who spoke is thereby captured by the CCD camera 16.

【０１００】In step S4, the user identification unit 55 identifies the user who spoke. Specifically, the user identification unit 55 extracts a voiceprint from the audio signal input via the input processing unit 51, extracts feature values from the image of the user processed by the image recognition unit 53, and identifies the user based on the extracted voiceprint or feature values.

【０１０１】Next, in step S5, the speech recognition/analysis unit 52 performs speech recognition and language analysis on the audio signal corresponding to the user's utterance input in step S2.

【０１０２】In step S6, the dialogue management unit 56 determines whether the user's utterance recognized by the speech recognition/analysis unit 52 in step S5 contains a distance/direction-dependent word (a calling word or a kosoado word). If it does, the process proceeds to step S7. In this example the user's utterance contains the distance/direction-dependent word その (sono, "that"), so the process proceeds to step S7.

【０１０３】In step S7, the position and orientation of the thing indicated by the distance/direction-dependent word are detected. Specifically, the dialogue management unit 56 (dialogue processing unit 111) first refers to the distance/word correspondence table 113 and finds the near/far state corresponding to the distance/direction-dependent word contained in the utterance. In this example the word used in the user's utterance is その (sono), so within the distance/word correspondence table 113, the distance/kosoado word correspondence table 122 is consulted. Row item 2 contains その at column item 2, so the states given by column item 2 are obtained: the near state between the robot and the object and the far state between the user and the object.

【０１０４】Next, the position detection unit 54 detects the position of the thing indicated by the distance/direction-dependent word, for example by the method described above. It searches within the region of the perspective determination pattern B centered on the robot's position, formed in step S1, or of the perspective determination pattern A centered on the user's position, that is specified by the near/far states between the robot and the object and between the user and the object found by the dialogue management unit 56.

【０１０５】In this example (FIG. 20), the robot and the object are in the near state and the user and the object are in the far state, so the position (coordinates) of the object lies in the part of the perspective determination pattern B that the perspective determination pattern A does not overlap in the example of FIG. 13, and within the perspective determination pattern B in the example of FIG. 14. The ultrasonic sensor 22, for example, then emits ultrasonic waves toward the region specified in this way, and the processing for detecting the object is performed. Narrowing the search to a specific region in this way allows the position of the object to be detected more efficiently.

【０１０６】When it is determined in step S6 that the speech recognition result (the user's utterance) contains no distance/direction-dependent word, or after the position of the object has been detected in step S7, the process proceeds to step S8, where the processing for acting in response to the user's utterance is executed.

【０１０７】The details of the processing in step S8 are shown in the flowchart of FIG. 24.

[0108] In step S21, the dialog management unit 56 generates action command information corresponding to the user's utterance input in step S2. If this action command information includes a command to utter a response sentence containing a distance/direction-dependent word, then at this point the part of the response sentence where that word will be used holds only information indicating that a distance/direction-dependent word is to be used (for example, an abstracted word). In this example, action command information meaning "utter 'It is the <distance/direction-dependent word> box, isn't it?' and sit in front of the box" is generated.

[0109] In step S22, the dialog management unit 56 determines whether the action command information generated in step S21 includes a command to utter a response sentence and whether that response sentence uses a distance/direction-dependent word (for example, whether the response sentence contains an abstracted word indicating the use of a distance/direction-dependent word). If both conditions hold, the process proceeds to step S23. In this example, the process proceeds to step S23.

[0110] In step S23, a distance/direction-dependent word is selected. Specifically, the position detection unit 54 first forms, as shown in FIG. 5, perspective determination pattern A centered on the position of the robot detected in step S1. It then determines whether the position (coordinates) of the object detected in step S7 lies within perspective determination pattern A, and thereby determines the near/far state between the robot and the object. In this example the object (the box) is near the robot, so its coordinates are determined to lie within perspective determination pattern A, and the robot and the object are determined to be near each other.

[0111] Next, the dialog management unit 56 (dialog processing unit 111) selects, from the distance/word correspondence table 113, the distance/direction-dependent word to be used in the response sentence generated in step S21, according to the near/far state determined by the position detection unit 54. In this example, the distance/demonstrative (ko-so-a-do) word correspondence table 122, which is part of the distance/word correspondence table 113, is consulted. The word 「この」 ("this"), contained in the data area at row item 1 for column items 1 and 2, which indicate that the robot and the object are near each other, is selected.

[0112] In step S24, the dialog management unit 56 inserts the distance/direction-dependent word selected in step S23 into the part of the response sentence, in the action command information generated in step S21, where a distance/direction-dependent word is to be used, and thereby completes the response sentence. In this example, a response sentence meaning 「この箱ですか」 ("Is it this box?") is generated.
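Steps S21 to S24 amount to filling a placeholder in a response template with a word looked up from the near/far states. The sketch below is one possible reading, not the patented implementation. The placeholder syntax, the circular near/far test, and the radius are assumptions. Only the 「この」 entries for the "robot near" columns are stated in the text; the 「その」 and 「あの」 entries are assumed from ordinary Japanese usage.

```python
import math

PLACEHOLDER = "{demonstrative}"  # hypothetical marker for the abstracted word

# Keys: (robot-object state, user-object state), in the order of column items 1 to 4.
DEMONSTRATIVE_TABLE = {
    ("near", "near"): "この",  # stated in the text (column item 1)
    ("near", "far"): "この",   # stated in the text (column item 2)
    ("far", "near"): "その",   # assumption
    ("far", "far"): "あの",    # assumption
}

def near_far_state(center, target, radius):
    """Crisp near/far decision using an assumed circular pattern."""
    return "near" if math.dist(center, target) <= radius else "far"

def complete_response(template, robot_pos, user_pos, object_pos, radius=3.0):
    """Fill the placeholder (S23, S24); return the template unchanged if it has none (S22)."""
    if PLACEHOLDER not in template:
        return template
    key = (near_far_state(robot_pos, object_pos, radius),
           near_far_state(user_pos, object_pos, radius))
    return template.replace(PLACEHOLDER, DEMONSTRATIVE_TABLE[key])

# Box 1 m from the robot and 4 m from the user, as in the FIG. 20 situation.
sentence = complete_response("{demonstrative}箱ですか", (0.0, 0.0), (5.0, 0.0), (1.0, 0.0))
```

Keeping the abstracted word in the template until the last moment means the same action command can be reused after the robot or the user has moved.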

[0113] When it is not determined in step S22 that the action command information includes a command to utter a response sentence that uses a distance/direction-dependent word, or after the response sentence has been completed in step S24, the process proceeds to step S25. There the robot acts on the basis of the action command information generated in step S21, or of the action command information including the command to utter the response sentence completed in step S24. Specifically, the output generation unit 57 generates an operation pattern from this action command information and sends it to the operation control unit 59. The operation control unit 59 generates, from the operation pattern, control signals for driving the actuators 3AA1 through 5A1 and 5A2 and sends them to those actuators. The actuators 3AA1 through 5A1 and 5A2 are driven according to the control signals, and in this example the robot walks up to the object and sits down.

[0114] This processing then ends, and the flow proceeds to step S9 in FIG. 23.

[0115] In step S9, it is determined whether further utterances are being input. If so, the process returns to step S3 and the subsequent processing is executed. If it is determined that no utterance is being input, the process ends.

[0116] In this way, appropriate distance/direction-dependent words are used according to the near/far state between the robot and the user, the robot and the object, or the user and the object, so that a natural dialogue takes place between the robot and the user.

[0117] The above description takes the case of a single object as an example, but the invention can also be applied when there are multiple objects, as shown in FIG. 25 or FIG. 26. In such a case, when the user refers to the object as 「あの箱」 ("that box over there"), the robot can say, for example, "There are several things that 'that' could refer to" or "Is it the right one or the left one?" The robot can also move to one of the objects, as shown in FIG. 26, and say "Is it this one?"

[0118] In the above, the appropriate demonstrative changes as the robot moves, and the method of the present invention handles this correctly. At the time of FIG. 25 the user refers to the object as 「あの箱」 ("that box over there"), and at the time of FIG. 26 as 「それ」 ("it", or 「その箱」, "that box", and the like). By using the near/far state discrimination patterns (FIGS. 11 and 12) and the distance/demonstrative word correspondence table 122 (FIG. 18), the robot can recognize that 「それ」 at the time of FIG. 26 denotes the same thing as 「あの箱」 at the time of FIG. 25.

[0119] In step S7 described above, the object is searched for within the range specified by the determined near/far state. If the distance/direction-dependent word used by the user is not appropriate, for example if the user refers to the object as 「その箱」 ("that box") even though the object is near the user, the object is not detected. In that case the robot can, for example, say "I cannot find anything that matches," or move close to something located outside the range corresponding to the determined near/far state and say "Do you mean this?" The robot can also judge that the user is lying (for example, teasing the robot) and act accordingly. Furthermore, when failures to detect the object continue, the range of perspective determination pattern B shown in FIG. 11 or of perspective determination pattern A shown in FIG. 12 can be enlarged or reduced so that the distance/direction-dependent words the user uses fit the actual near/far state.
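The range adaptation mentioned at the end of this paragraph could look something like the following sketch. The text does not say how much the range changes or after how many failures, so `step` and `patience` are assumptions, the pattern is treated as a single radius, and only enlargement is shown; reduction would be symmetric.

```python
class PatternRange:
    """Tracks consecutive detection failures and widens an assumed circular range."""

    def __init__(self, radius, step=0.5, patience=2):
        self.radius = radius      # current radius of the pattern, in metres
        self.step = step          # hypothetical enlargement per adjustment
        self.patience = patience  # hypothetical number of failures tolerated
        self.failures = 0

    def report(self, detected):
        """Record one detection attempt and return the radius to use next."""
        if detected:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.patience:
                self.radius += self.step
                self.failures = 0
        return self.radius
```

For example, `PatternRange(3.0)` keeps a 3.0 m radius after one failure and moves to 3.5 m after a second consecutive failure.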

[0120] The above description takes as an example the case where the near/far state is determined using the perspective determination patterns and an appropriate demonstrative is selected from the distance/demonstrative word correspondence table 112 on the basis of that determination. As described with reference to FIG. 15, however, an appropriate demonstrative can also be selected using the degree of nearness or the degree of farness.

[0121] For example, suppose the distance between the robot and the object is 3 m and the distance between the user and the object is 4 m. According to FIG. 15, the degree of farness between the robot and the object (the value of the dotted line at a distance of 3 m) is 1/3, and the degree of nearness (the value of the solid line at 3 m) is 2/3. The degree of farness between the user and the object (the value of the dotted line at 4 m) is 2/3, and the degree of nearness (the value of the solid line at 4 m) is 1/3. These values are multiplied together in the combinations corresponding to the column items of the distance/demonstrative word correspondence table 122. The products are as follows.

[0122]
2/3 (degree of nearness, robot and object) × 1/3 (degree of nearness, user and object) = 2/9 (result for the combination of column item 1)
2/3 (degree of nearness, robot and object) × 2/3 (degree of farness, user and object) = 4/9 (result for the combination of column item 2)
1/3 (degree of farness, robot and object) × 1/3 (degree of nearness, user and object) = 1/9 (result for the combination of column item 3)
1/3 (degree of farness, robot and object) × 2/3 (degree of farness, user and object) = 2/9 (result for the combination of column item 4)

[0123] The column item of the distance/demonstrative word correspondence table 122 whose combination of near/far states yields the largest of these products is then identified (column item 2 in this example), and the demonstrative shown in the corresponding data area is selected.
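The degree-based selection of paragraphs [0121] to [0123] can be reproduced numerically. The sketch assumes a linear membership function that is fully "near" at 2 m and fully "far" at 5 m; FIG. 15 is not reproduced here, and this shape is chosen only because it matches the two quoted points (2/3 at 3 m, 1/3 at 4 m). As before, only the 「この」 entries are stated in the text and the other table entries are assumptions.

```python
from fractions import Fraction

def near_degree(distance_m, full_near=2, full_far=5):
    """Degree of nearness; an assumed linear ramp from 1 at 2 m down to 0 at 5 m."""
    d = Fraction(distance_m)
    if d <= full_near:
        return Fraction(1)
    if d >= full_far:
        return Fraction(0)
    return (full_far - d) / (full_far - full_near)

def far_degree(distance_m):
    """Degree of farness, taken as the complement of the degree of nearness."""
    return 1 - near_degree(distance_m)

# Column items 1 to 4; entries 3 and 4 are assumptions, not taken from the text.
DEMONSTRATIVES = {1: "この", 2: "この", 3: "その", 4: "あの"}

def score_columns(robot_dist_m, user_dist_m):
    """Multiply the robot-object and user-object degrees for each column item."""
    rn, rf = near_degree(robot_dist_m), far_degree(robot_dist_m)
    un, uf = near_degree(user_dist_m), far_degree(user_dist_m)
    return {1: rn * un, 2: rn * uf, 3: rf * un, 4: rf * uf}

def select_demonstrative(robot_dist_m, user_dist_m):
    """Return the column item with the largest product and its demonstrative."""
    scores = score_columns(robot_dist_m, user_dist_m)
    best = max(scores, key=scores.get)
    return best, DEMONSTRATIVES[best]

scores = score_columns(3, 4)           # products 2/9, 4/9, 1/9, 2/9
column, word = select_demonstrative(3, 4)
```

Using `Fraction` keeps the products exact, so they can be compared directly with the values listed in paragraph [0122].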

[0124] The above description takes as an example the case where distance/direction-dependent words are selected appropriately on the basis of the near/far state. In addition, when the robot and the user are far apart, for example, the robot can refrain from uttering personal information such as the user's telephone number, address, or age. That is, the information to be uttered can also be selected appropriately on the basis of the near/far state.
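A minimal sketch of this content filtering follows, assuming the response content is held as a dictionary of labelled items and the near/far state is already available as the string "near" or "far". The field names and data structure are hypothetical.

```python
# Fields treated as personal information, following the examples in the text.
PRIVATE_FIELDS = {"phone_number", "address", "age"}

def utterable_items(items, robot_user_state):
    """Drop personal fields when the robot and the user are far apart."""
    if robot_user_state == "far":
        return {k: v for k, v in items.items() if k not in PRIVATE_FIELDS}
    return dict(items)

profile = {"name": "Taro", "age": 30, "address": "Tokyo"}  # made-up example data
spoken_when_far = utterable_items(profile, "far")
```

The rationale implied by the text is that an utterance addressed to a distant user has to be loud enough for others to overhear.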

[0125] In the above, distance/direction-dependent words (calling words and demonstratives) whose usage depends on the positional relationship among the robot, the user, and the object are selected appropriately. Words such as 「行く」 ("go") and 「来る」 ("come"), whose usage depends on the positional relationship among the robot, a destination, and the user, can also be selected appropriately.

[0126] The above description takes as an example the case where the response sentence is output as speech, but a display unit may additionally be provided so that the response sentence is displayed.

[0127] The series of processes described above can be implemented in hardware, but can also be implemented in software. When the series of processes is implemented in software, a program constituting the software is installed on a computer and executed by the computer, whereby the robot described above is functionally realized.

[0128] FIG. 27 is a block diagram showing the configuration of one embodiment of a computer 501 that functions as the robot described above. An input/output interface 516 is connected to a CPU 511 via a bus 515. When the user inputs a command through the input/output interface 516 from an input unit 518 consisting of a keyboard, a mouse, and the like, the CPU 511 loads a program into a RAM (Random Access Memory) 513 and executes it. The program is stored on a recording medium such as a ROM (Read Only Memory) 512, a hard disk 514, or a magnetic disk 531, an optical disk 532, a magneto-optical disk 533, or a semiconductor memory 534 mounted in a drive 520. The various processes described above (for example, the processes shown in the flowchart of FIG. 23 or the flowchart of FIG. 24) are thereby performed. The CPU 511 also outputs the processing results, as necessary, via the input/output interface 516 to a display unit 517 consisting of an LCD (Liquid Crystal Display) or the like. The program can be stored in advance on the hard disk 514 or in the ROM 512 and provided to the user together with the computer 501; provided as packaged media such as the magnetic disk 531, the optical disk 532, the magneto-optical disk 533, or the semiconductor memory 534; or delivered to the hard disk 514 from a satellite, a network, or the like via a communication unit 519.

[0129] In this specification, the steps describing the program provided by the recording medium include not only processes performed in time series in the described order but also processes that are executed in parallel or individually without necessarily being processed in time series.

【０１３０】[0130]

[Effects of the Invention] According to the information processing apparatus of claim 1, the information processing method of claim 5, and the program on the recording medium of claim 6, a word whose usage depends on the positional relationship is selected in accordance with the positional relationship between the robot and the object referred to in the utterance, between the robot and the user, or between the user and the object. A more appropriate response sentence can therefore be generated in response to the user's utterance.

[Brief description of the drawings]

FIG. 1 is a diagram showing a usage example of a robot to which the present invention is applied.

FIG. 2 is a diagram showing an example of the external configuration of the robot of FIG. 1.

FIG. 3 is a block diagram showing an example of the internal configuration of the robot of FIG. 1.

FIG. 4 is a block diagram showing an example of the functional configuration of the controller 10 of FIG. 3.

FIG. 5 is a diagram illustrating perspective determination pattern A.

FIG. 6 is a diagram showing the positional relationship between the robot and the user.

FIG. 7 is another diagram showing the positional relationship between the robot and the user.

FIG. 8 is a diagram illustrating perspective determination pattern B.

FIG. 9 is a diagram illustrating a usage example of perspective determination pattern A and perspective determination pattern B.

FIG. 10 is another diagram illustrating a usage example of perspective determination pattern A and perspective determination pattern B.

FIG. 11 is another diagram illustrating perspective determination pattern B.

FIG. 12 is another diagram illustrating perspective determination pattern A.

FIG. 13 is another diagram illustrating a usage example of perspective determination pattern A and perspective determination pattern B.

FIG. 14 is another diagram illustrating a usage example of perspective determination pattern A and perspective determination pattern B.

FIG. 15 is a diagram showing the calculated degrees of nearness and farness.

FIG. 16 is a diagram showing a configuration example of the dialog management unit 56 of FIG. 4.

FIG. 17 is a diagram showing the distance/calling word correspondence table 121.

FIG. 18 is a diagram showing the distance/demonstrative word correspondence table 122.

FIG. 19 is a diagram showing the positional relationship among the robot, the user, and the object.

FIG. 20 is another diagram showing the positional relationship among the robot, the user, and the object.

FIG. 21 is another diagram showing the positional relationship among the robot, the user, and the object.

FIG. 22 is another diagram showing the positional relationship among the robot, the user, and the object.

FIG. 23 is a flowchart showing the processing procedure when the robot acts in response to the user's dialogue.

FIG. 24 is a flowchart illustrating details of the processing in step S8 of FIG. 23.

FIG. 25 is another diagram showing the positional relationship among the robot, the user, and the object.

FIG. 26 is another diagram showing the positional relationship among the robot, the user, and the object.

FIG. 27 is a block diagram showing a configuration example of the computer 501.

[Explanation of symbols]

10 controller, 10A CPU, 10B memory, 15 microphone, 16 CCD camera, 17 touch sensor, 18 speaker, 19 GPS receiving unit, 20 gyrocompass, 21 geomagnetic sensor, 22 ultrasonic sensor, 51 input processing unit, 52 speech recognition/analysis unit, 53 image recognition unit, 54 position detection unit, 55 user identification unit, 56 dialog management unit, 57 output generation unit, 58 audio output unit, 59 operation control unit

Continued from the front page

(51) Int.Cl.7, identification symbol, FI, theme code (reference): G10L 15/00 G10L 3/00 545C 15/22 551H 13/04 571T 5/02 J

(72) Inventor: Hideki Kishi, c/o Sony Corporation, 6-7-35 Kita-Shinagawa, Shinagawa-ku, Tokyo
(72) Inventor: Masanori Omote, c/o Sony Corporation, 6-7-35 Kita-Shinagawa, Shinagawa-ku, Tokyo
(72) Inventor: Kazuhiko Tajima, c/o Sony Corporation, 6-7-35 Kita-Shinagawa, Shinagawa-ku, Tokyo
(72) Inventor: Masatoshi Takeda, c/o Sony Corporation, 6-7-35 Kita-Shinagawa, Shinagawa-ku, Tokyo

F-term (reference): 3F059 AA00 BA00 BB06 DA05 DC00 DC08 DD00 DD18 FC00 5B091 AA15 AB15 BA02 BA16 BA19 CA12 CA21 CB01 CB12 CB32 CD15 DA03 5D015 AA03 HH04 KK02 LL00 5D045 AA20 AB11 AB30 9A001 BB04 BB06 DZ15 HH17 HH18 HH19 HH20 HH34 HZ05 HZ10 KZ62

Claims

[Claims]

1. An information processing apparatus that generates a response sentence to be output by a robot as a response to a user's utterance, comprising: acquiring means for acquiring a positional relationship between the robot and an object referred to in the utterance, between the robot and the user, or between the user and the object; selecting means for selecting, when the response sentence to be generated contains a word whose usage depends on the positional relationship, the word corresponding to the positional relationship acquired by the acquiring means; and generating means for generating the response sentence using the word selected by the selecting means.

2. The information processing apparatus according to claim 1, wherein the positional relationship is a relationship based on the distance between the robot and the object, between the robot and the user, or between the user and the object, or on the direction or orientation in which the robot, the user, or the object is located.

3. The information processing apparatus according to claim 1, further comprising estimating means for estimating, when the utterance contains a word whose usage depends on the positional relationship, the positional relationship on the basis of that word, wherein the acquiring means executes a process of detecting the user or the object within a range specified by the positional relationship estimated by the estimating means, and acquires the positional relationship on the basis of the result of that detection process.

4. The information processing apparatus according to claim 1, wherein, when the response sentence is output as speech, the generating means adjusts the content of the response sentence on the basis of the positional relationship acquired by the acquiring means.

5. An information processing method for an information processing apparatus that generates a response sentence to be output by a robot as a response to a user's utterance, the method comprising: an acquiring step of acquiring a positional relationship between the robot and an object referred to in the utterance, between the robot and the user, or between the user and the object; a selecting step of selecting, when the response sentence to be generated contains a word whose usage depends on the positional relationship, the word corresponding to the positional relationship acquired in the acquiring step; and a generating step of generating the response sentence using the word selected in the selecting step.

6. A recording medium on which is recorded a computer-readable program for information processing that generates a response sentence to be output by a robot as a response to a user's utterance, the program causing a computer to execute processing comprising: an acquiring step of acquiring a positional relationship between the robot and an object referred to in the utterance, between the robot and the user, or between the user and the object; a selecting step of selecting, when the response sentence to be generated contains a word whose usage depends on the positional relationship, the word corresponding to the positional relationship acquired in the acquiring step; and a generating step of generating the response sentence using the word selected in the selecting step.