CN107305768B - Error-prone character calibration method in voice interaction - Google Patents


Info

Publication number
CN107305768B
Authority
CN
China
Prior art keywords
similarity
sentence
corrected
character
place name
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201610248440.8A
Other languages
Chinese (zh)
Other versions
CN107305768A (en
Inventor
黄亦睿
刘功申
苏波
刘春梅
李建华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN201610248440.8A
Publication of CN107305768A
Application granted
Publication of CN107305768B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1815 Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/226 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
    • G10L2015/228 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a method for calibrating error-prone characters in voice interaction, comprising three steps: context recognition, automatic error correction based on restricted semantics, and manual error correction based on semantic feedback. By interacting with the user through voice, the method senses and identifies the topic context; within the resulting restricted semantic range it uses named entity recognition to automatically correct entities with specific meaning, and it supports further correction using additional semantics obtained from manual feedback. This yields higher input efficiency and a more convenient error correction mode than existing speech recognition software.

Description

Error-prone character calibration method in voice interaction
Technical Field
The invention relates to the calibration of error-prone characters, in particular to a method for calibrating error-prone characters in voice interaction, and more particularly to a practical calibration scheme that applies natural language understanding methods to the calibration and correction of error-prone characters in voice interaction.
Background
As a new approach to human-machine interaction, voice interaction has been widely adopted in recent years. First, speech recognition technology has advanced: moving from Hidden Markov Models (HMM) and Gaussian Mixture Models (GMM) to today's Deep Neural Network (DNN) models has greatly reduced the error rate of speech recognition systems. Second, the usage habits of smart-device users have not yet become fixed, so new technologies such as voice interaction are readily accepted by the public. Third, the rapid development of cloud computing and the mobile internet has produced large amounts of new corpus resources, which further promote the development of speech recognition technology.
In many scenarios voice interaction has greater practical value and better matches human interaction habits. However, voice input is inevitably affected by environmental noise and fading channels, which often produces erroneous results. In addition, Chinese contains a large number of homophones and near-homophones, so the machine cannot always recognize the user's voice input accurately, and wrong characters appear easily in the recognition result. In other words, the accuracy of speech recognition has not yet reached the desired level, and speech recognition technology still needs breakthroughs in many respects.
A search of prior art documents found Chinese patent application No. CN201210584746.2, publication No. CN103021412A, which describes a "speech recognition method and system" that: performs speech recognition on a voice signal input by a user to obtain a recognition result and the speech segment corresponding to each character in that result; receives error correction information separately input by the user and generates an error correction string; determines, according to the error correction string, the speech segment in the user's voice signal where the recognition error occurred; determines, from the speech segment corresponding to each character in the recognition result, the string in the recognition result that corresponds to the misrecognized segment as the erroneous string; and replaces the erroneous string with the error correction string. This technique corrects erroneous strings, but the error correction string must be entered after pressing a special key, or by other means such as pinyin or handwriting. In voice input mode the user can only repeat the previously input content in order to correct the misrecognition, and if the user says a word that the system has not recorded, the scheme cannot correct it.
Chinese patent application No. CN201310589827.6, publication No. CN103680505A, also describes a "speech recognition method and system". The method continuously receives a recording input; performs speech recognition on the recording with a small-vocabulary speech recognition network to check whether it contains preset keywords; and, if it does, recognizes the recording that follows the keywords with a large-vocabulary speech recognition network to obtain the recognition result. This technique addresses recognition accuracy when monitoring for commands over long periods and transitions smoothly from the small-vocabulary network to the normal recognition stage, that is, the large-vocabulary network. However, it does not optimize the large-vocabulary network, for example through semantic enhancement in a restricted context, and it does not mention any technique for calibrating error-prone characters.
Disclosure of Invention
In view of the defects in the prior art, the object of the invention is to provide a method for calibrating error-prone characters in voice interaction. The invention builds a practical error-prone character calibration system on top of an existing speech recognition API (Application Programming Interface). Through voice interaction with the user, the system senses and identifies the topic context, so that within a restricted semantic range it automatically corrects entities with specific meaning using named entity recognition, and it corrects remaining errors using additional semantics obtained through manual feedback. Compared with existing speech recognition software, this achieves higher input efficiency and a more convenient way of correcting wrong characters.
The invention provides a method for calibrating error-prone characters in voice interaction, which comprises the following steps:
A context recognition step: creating a separate context knowledge base for each domain, where constructing a context knowledge base comprises: first, obtaining related documents through a search engine according to the keywords of the domain, to serve as the corpus of the domain; then, obtaining the core words of the domain according to semantic knowledge, and clustering according to the core words to obtain example sentences of the domain, thereby constructing the context knowledge base.
Preferably, in the context recognition step, the context is judged according to the context similarity between the text sentence and the different domains in the context knowledge base, and this judgment serves as the precondition for automatic error correction. The context similarity is computed as follows:
S1: count the number of occurrences of each word in the text sentence A and express the counts as a vector;
S2: using the cosine similarity formula, compute the cosine of the angle between the vector of the text sentence A and the vector of each example sentence B in the context $C_i$, and take this cosine as the vector-based word-form similarity;
S3: convert all words of the text sentence A into pinyin, count the number of occurrences of each distinct pinyin sequence in A and express the counts as a vector, then compute the cosine of the angle between the pinyin vector of A and the pinyin vector of each example sentence B in the context $C_i$ to obtain the vector-based pinyin similarity;
S4: compute the sentence similarity between the text sentence A and each example sentence B by assigning different weights to the pinyin similarity and the word-form similarity, and take the maximum of these values as the sentence similarity between A and the context $C_i$;
S5: compute the core-word matching rate between the text sentence A and the context $C_i$, that is, the number of core words of $C_i$ contained in A as a percentage of the number of all words in A;
S6: compute the context similarity between the text sentence A and the context $C_i$ by assigning different weights to the sentence similarity and the core-word matching rate;
S7: compute the smoothed context similarity $\mathrm{SmoothContextSim}(A, C_i)$ between the text sentence A and the context $C_i$ based on the preceding sentences:
$$\mathrm{SmoothContextSim}(A, C_i) = \lambda_1 \cdot \mathrm{ContextSim}(A_{-2}, C_i) + \lambda_2 \cdot \mathrm{ContextSim}(A_{-1}, C_i) + \lambda_3 \cdot \mathrm{ContextSim}(A, C_i)$$
$$\lambda_1 + \lambda_2 + \lambda_3 = 1, \qquad \lambda_1 \le \lambda_2 \le \lambda_3$$
where $A$, $A_{-1}$ and $A_{-2}$ denote the current text sentence, the first sentence before it, and the second sentence before it, respectively; $\lambda_1$, $\lambda_2$ and $\lambda_3$ are constants; and $\mathrm{ContextSim}(X, Y)$ denotes the context similarity between the text sentence X and the context Y.
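Steps S1 to S6 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: sentences are assumed to arrive already segmented, each as a hypothetical `(words, pinyins)` pair; the two weights `w_pinyin` and `w_sentence` are placeholders, since the text only says that different weights are assigned; and the matching rate counts every word of A that is a core word.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine of the angle between two sparse count vectors (S2, S3)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def sentence_sim(a, b, w_pinyin=0.5):
    """S4: weighted mix of pinyin similarity and word-form similarity.

    a and b are (words, pinyins) pairs; w_pinyin is an assumed weight.
    """
    form = cosine(Counter(a[0]), Counter(b[0]))    # S1 + S2
    pinyin = cosine(Counter(a[1]), Counter(b[1]))  # S3
    return w_pinyin * pinyin + (1 - w_pinyin) * form

def context_sim(a, examples, core_words, w_sentence=0.5):
    """S4 to S6: context similarity of sentence a to one context C_i."""
    # S4: best sentence similarity over the example sentences of C_i.
    best = max((sentence_sim(a, b) for b in examples), default=0.0)
    # S5: share of the words of a that are core words of C_i.
    words = a[0]
    match_rate = sum(w in core_words for w in words) / len(words) if words else 0.0
    # S6: weighted combination (w_sentence is an assumed weight).
    return w_sentence * best + (1 - w_sentence) * match_rate

# Illustrative data only: one navigation-style sentence and one core word.
a = (["去", "东川", "路"], ["qu", "dongchuan", "lu"])
print(context_sim(a, [a], {"路"}))
```

Pinyin conversion and word segmentation are left outside the sketch on purpose, because the text does not name the tools used for them.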
Preferably, the method further comprises the following steps:
An automatic error correction step based on restricted semantics: obtaining the place name to be corrected from the text sentence input by the user's voice, and correcting it.
Preferably, the automatic error correction based on restricted semantics comprises:
a text sentence reading step: reading in the text sentence P input by the user's voice, where $P = p_1 p_2 \ldots p_i \ldots p_n$, $p_i$ denotes the i-th Chinese character in the text sentence, and n denotes the length of the text sentence;
a place name extraction step: scanning P and matching against the place name matching rules to obtain the place name to be corrected;
an error correction step: performing short-text similarity matching between the place name to be corrected and all place names in the place name library, and taking the most similar place name as the correct place name after error checking and correction.
Preferably, the place name matching rules include any one of the following rules:
Rule one: if $W_l$ belongs to the set of left boundary words, $W_r$ belongs to the set of right boundary words, and the length $W_p.\mathrm{len}$ of $W_p$ (its number of characters) is greater than 1, then $W_p$ is identified as the place name to be corrected;
Rule two: if $W_l$ belongs to the set of left boundary words and $W_r$ belongs to the set of place name suffixes, then the string $W_p W_r$ formed by $W_p$ and $W_r$ is identified as the place name to be corrected;
Rule three: if $W_l$ belongs to the set of place name suffixes, $W_r$ belongs to the set of right boundary words, and the length of $W_p$ is greater than 1, then $W_p$ is identified as the place name to be corrected;
Rule four: if $W_l$ belongs to the set of place name suffixes and $W_r$ belongs to the set of place name suffixes, then the string $W_p W_r$ formed by $W_p$ and $W_r$ is identified as the place name to be corrected;
where $W_l$ is the word before the word to be corrected, $W_p$ is the word to be corrected, and $W_r$ is the word after the word to be corrected.
Preferably, in the automatic error correction step based on restricted semantics, the short-text similarity matching is computed with a weighted longest common subsequence algorithm, defined as follows: given a similarity function between any two elements of the two sequences, find the common subsequence of the two sequences whose sum of similarities is largest, where the similarity function is the pinyin similarity between two pinyin syllables.
Preferably, the pinyin similarity is obtained as follows: the similarity of the initials and the similarity of the finals of the two pinyin syllables are computed separately, and easily confused sounds are each assigned a corresponding similarity.
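One common way to write a weighted longest common subsequence as a dynamic program is shown below. It is a sketch under assumptions rather than the formula of the invention: the sequences are written $x_1 \ldots x_n$ and $y_1 \ldots y_m$, $W[i,j]$ is the largest similarity sum obtainable from the first $i$ elements of one sequence and the first $j$ elements of the other, and $\mathrm{sim}$ stands for the pinyin similarity function.

```latex
W[i,j] =
\begin{cases}
0, & i = 0 \text{ or } j = 0,\\[4pt]
\max\bigl(W[i-1,j],\; W[i,j-1],\; W[i-1,j-1] + \mathrm{sim}(x_i, y_j)\bigr), & \text{otherwise.}
\end{cases}
```

If $\mathrm{sim}$ returns 1 for identical elements and 0 otherwise, this reduces to the length of the ordinary longest common subsequence.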
Preferably, the method further comprises the following steps:
A manual error correction step based on semantic feedback: correcting errors according to a correcting sentence input by voice, where the correcting sentence takes one of the following forms:
First form: "Modify: character A is the character C of word B";
Second form: "Modify: the N-th character A is the character C of word B";
where character A and character C are the same character, called the indicator character, and word B is an idiom or phrase containing that character, called the correcting word;
the pinyin of the indicator character is the same as the pinyin of the wrong character in the input text and the same as the pinyin of the correct character in the correcting word;
according to the indicator character, the correct character is extracted from the correcting word and used as the replacement character.
Compared with the prior art, the invention has the following beneficial effects:
First, the three-stage calibration technique for error-prone characters can be widely applied to various speech recognition systems and voice interaction devices; the stages can be used together, or separately to strengthen the correction of error-prone characters in a single respect.
Second, the context recognition function can be applied to general-purpose voice input systems: it recognizes the context from what the user has input and raises the weight of words belonging to that context, which improves recognition accuracy.
Third, the automatic error correction function for the voice-driven vehicle navigation context improves the recognition accuracy of command entities such as road names and place names, reduces how often drivers must interact with and correct the navigation device, and improves driving safety.
Fourth, the error correction function based on manual semantic feedback suits scenarios involving long sessions and large amounts of text input: natural, fluent spoken commands correct previously entered information. This matches Chinese language habits and allows purely voice-based text input with no additional clicks.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a basic framework diagram of the present invention.
FIG. 2 is a schematic diagram of the overall calibration process of the present invention.
FIG. 3 is a flow chart illustrating the context recognition process of the present invention.
FIG. 4 is a schematic diagram of an automatic error correction process according to the present invention.
FIG. 5 is a schematic diagram of a manual error correction process according to the present invention.
Detailed Description
The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the invention, but do not limit it in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the concept of the invention, all of which fall within the scope of protection of the invention.
The invention provides a set of techniques for calibrating error-prone characters in voice interaction. It applies natural language understanding methods to the calibration and correction of error-prone characters and implements a comprehensive calibration system for voice interaction. The system provides the following functions:
First, semantic enhancement based on context. In a number of specific contexts, the system senses and identifies the topic context by analyzing the user's voice input, and understands the user's interaction needs.
Second, automatic error correction based on restricted semantics. By restricting the context of the voice interaction, the system uses language features to improve speech recognition accuracy in a targeted way within the semantically restricted context.
Third, manual semantic enhancement based on voice interaction. The user actively interacts with the system by voice, supplying additional semantics that enhance the important words in the interaction, guiding the computer to understand the user's intention accurately and respond accordingly.
Specifically, the invention builds a practical system for calibrating error-prone characters in voice interaction on top of an existing speech recognition API. Through voice interaction with the user, the system senses and identifies the topic context, so that within a restricted semantic range it automatically corrects entities with specific meaning using named entity recognition, and it corrects remaining errors using additional semantics obtained through manual feedback. Compared with existing speech recognition software, this achieves higher input efficiency and a more convenient way of correcting wrong characters. FIG. 1 depicts the basic framework of the invention and FIG. 2 depicts the overall calibration flow.
The invention provides a method for calibrating error-prone characters in voice interaction, which comprises the following steps:
First, the context recognition step
The precondition for recognizing a context is a context knowledge base for each domain. The knowledge base for a domain is constructed as follows: first, a large number of related documents are obtained through a search engine according to the keywords of the selected domain, and these serve as the corpus of the domain. Then the core words of the domain are obtained manually according to semantic knowledge, and example sentences of the domain are obtained by manual clustering around the core words, which yields the context knowledge base.
In the context recognition step, the context is judged mainly according to the context similarity between the text sentence and the different domains in the context knowledge base, and this judgment serves as the precondition for automatic error correction.
The specific algorithm of the context similarity is as follows:
S1: count the number of occurrences of each word in the text sentence A and express the counts as a vector;
S2: using the cosine similarity formula, compute the cosine of the angle between the vector of the text sentence A and the vector of each example sentence B in the context $C_i$, and take this cosine as the vector-based word-form similarity;
S3: convert all words of the text sentence A into pinyin, count the number of occurrences of each distinct pinyin sequence in A and express the counts as a vector, then compute the cosine of the angle between the pinyin vector of A and the pinyin vector of each example sentence B in the context $C_i$ to obtain the vector-based pinyin similarity;
S4: compute the sentence similarity between the text sentence A and each example sentence B by assigning different weights to the pinyin similarity and the word-form similarity, and take the maximum of these values as the sentence similarity between A and the context $C_i$;
S5: compute the core-word matching rate between the text sentence A and the context $C_i$, that is, the number of core words of $C_i$ contained in A as a percentage of the number of all words in A;
S6: compute the context similarity between the text sentence A and the context $C_i$ by assigning different weights to the sentence similarity and the core-word matching rate;
S7: compute the smoothed context similarity $\mathrm{SmoothContextSim}(A, C_i)$ between the text sentence A and the context $C_i$ based on the preceding sentences:
$$\mathrm{SmoothContextSim}(A, C_i) = \lambda_1 \cdot \mathrm{ContextSim}(A_{-2}, C_i) + \lambda_2 \cdot \mathrm{ContextSim}(A_{-1}, C_i) + \lambda_3 \cdot \mathrm{ContextSim}(A, C_i)$$
$$\lambda_1 + \lambda_2 + \lambda_3 = 1, \qquad \lambda_1 \le \lambda_2 \le \lambda_3$$
where $A$, $A_{-1}$ and $A_{-2}$ denote the current text sentence, the first sentence before it, and the second sentence before it, respectively; $\lambda_1$, $\lambda_2$ and $\lambda_3$ are constants; and $\mathrm{ContextSim}(X, Y)$ denotes the context similarity between the text sentence X and the context Y.
In the tests of the invention, $\lambda_1 = 0.1$, $\lambda_2 = 0.2$ and $\lambda_3 = 0.7$ were selected. FIG. 3 shows the general flow of context recognition.
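The smoothing in S7 is a fixed weighted average over the last three sentences, which the following sketch makes concrete with the tested weights 0.1, 0.2 and 0.7. The function names, the layout of the `history` dictionary, the example scores, and the choice of selecting the context with the highest smoothed score are all assumptions made for illustration; the text does not say how the smoothed value is turned into a decision.

```python
def smooth_context_sim(sims, lambdas=(0.1, 0.2, 0.7)):
    """S7: smoothed context similarity for one context C_i.

    sims = (ContextSim(A_-2, C_i), ContextSim(A_-1, C_i), ContextSim(A, C_i)).
    """
    l1, l2, l3 = lambdas
    # Constraints stated for the constants: they sum to 1 and are non-decreasing.
    if abs(l1 + l2 + l3 - 1.0) > 1e-9 or not (l1 <= l2 <= l3):
        raise ValueError("lambdas must sum to 1 and satisfy l1 <= l2 <= l3")
    return sum(l * s for l, s in zip(lambdas, sims))

def pick_context(history, lambdas=(0.1, 0.2, 0.7)):
    """Assumed decision rule: choose the context with the highest smoothed score.

    history maps a context name to its three most recent ContextSim values.
    """
    return max(history, key=lambda c: smooth_context_sim(history[c], lambdas))

# Made-up scores for two hypothetical contexts.
history = {
    "navigation": (0.2, 0.5, 0.9),
    "weather": (0.8, 0.6, 0.3),
}
print(pick_context(history))
```

Because the largest weight falls on the current sentence, a topic change shows up quickly, while the two earlier sentences damp a single noisy recognition result.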
Second, the automatic error correction step based on restricted semantics
The invention preferably applies the voice interaction scenario to a car navigation system; therefore, in a preferred embodiment of the invention, the corpus is a cell word bank storing correct road names, place names and organization names.
First, based on an analysis of place name composition and context rules in the vehicle navigation system, the invention defines the following sets:
Set of place name suffixes PlaceTailWord: such as "city", "county", "road", "district", "village", and the like.
Set of left boundary words LeftBorderWord: such as "to", "from", "at", "distance", "close", etc.
Set of right boundary words RightBorderWord: such as "near", "around", "beside", etc.
$\mathrm{AsPlace}(S)$ denotes that S is identified as a place name to be corrected.
The string formed by $W_l$, $W_p$ and $W_r$ is written $W_l W_p W_r$, where $W_l$ is the word before the word to be corrected, $W_p$ is the word to be corrected, and $W_r$ is the word after the word to be corrected.
The place name matching rules are defined as follows:
Rule one: if $W_l$ belongs to the set of left boundary words, $W_r$ belongs to the set of right boundary words, and the length $W_p.\mathrm{len}$ of $W_p$ (its number of characters) is greater than 1, then $W_p$ is identified as the place name to be corrected;
that is, $(W_l \in \mathrm{LeftBorderWord}) \;\&\&\; (W_r \in \mathrm{RightBorderWord}) \;\&\&\; (W_p.\mathrm{len} > 1) \rightarrow \mathrm{AsPlace}(W_p)$
Rule two: if $W_l$ belongs to the set of left boundary words and $W_r$ belongs to the set of place name suffixes, then the string $W_p W_r$ formed by $W_p$ and $W_r$ is identified as the place name to be corrected;
that is, $(W_l \in \mathrm{LeftBorderWord}) \;\&\&\; (W_r \in \mathrm{PlaceTailWord}) \rightarrow \mathrm{AsPlace}(W_p W_r)$
Rule three: if $W_l$ belongs to the set of place name suffixes, $W_r$ belongs to the set of right boundary words, and the length of $W_p$ is greater than 1, then $W_p$ is identified as the place name to be corrected;
that is, $(W_l \in \mathrm{PlaceTailWord}) \;\&\&\; (W_r \in \mathrm{RightBorderWord}) \;\&\&\; (W_p.\mathrm{len} > 1) \rightarrow \mathrm{AsPlace}(W_p)$
Rule four: if $W_l$ belongs to the set of place name suffixes and $W_r$ belongs to the set of place name suffixes, then the string $W_p W_r$ formed by $W_p$ and $W_r$ is identified as the place name to be corrected;
that is, $(W_l \in \mathrm{PlaceTailWord}) \;\&\&\; (W_r \in \mathrm{PlaceTailWord}) \rightarrow \mathrm{AsPlace}(W_p W_r)$
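The four rules translate directly into a small matcher, sketched here over word tokens for readability. The set contents are the English renderings listed above, the rules are assumed to be tried in order one to four, and the function names `match_place` and `scan` are hypothetical. Concatenation uses no separator because Chinese text has none, so the English placeholders run together in the output.

```python
LEFT_BORDER = {"to", "from", "at", "distance", "close"}
RIGHT_BORDER = {"near", "around", "beside"}
PLACE_TAIL = {"city", "county", "road", "district", "village"}

def match_place(w_l, w_p, w_r):
    """Return the string to be corrected for one (W_l, W_p, W_r) window, or None."""
    if w_l in LEFT_BORDER and w_r in RIGHT_BORDER and len(w_p) > 1:  # rule one
        return w_p
    if w_l in LEFT_BORDER and w_r in PLACE_TAIL:                     # rule two
        return w_p + w_r
    if w_l in PLACE_TAIL and w_r in RIGHT_BORDER and len(w_p) > 1:   # rule three
        return w_p
    if w_l in PLACE_TAIL and w_r in PLACE_TAIL:                      # rule four
        return w_p + w_r
    return None

def scan(tokens):
    """Slide a three-token window over the sentence and collect every match."""
    found = []
    for i in range(1, len(tokens) - 1):
        hit = match_place(tokens[i - 1], tokens[i], tokens[i + 1])
        if hit is not None:
            found.append(hit)
    return found

# Illustrative tokens only.
print(scan(["go", "to", "Dongchuan", "road"]))
print(scan(["from", "Minhang", "near"]))
```

Each candidate returned by `scan` would then be passed to the similarity matching against the place name library.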
Named entity recognition is built on the word segmentation result: once a word is segmented incorrectly, the accuracy of named entity recognition drops sharply. To avoid recognition errors caused by word segmentation, the text is split into individual characters, and named entity recognition is performed with the character as the unit.
The specific algorithm is as follows:
a text sentence reading step: reading in the text sentence P input by the user's voice, where $P = p_1 p_2 \ldots p_i \ldots p_n$, $p_i$ denotes the i-th Chinese character in the text sentence, and n denotes the length of the text sentence;
a place name extraction step: scanning P and matching against the place name matching rules to obtain the place name to be corrected;
an error correction step: performing short-text similarity matching between the place name to be corrected and all place names in the place name library, and taking the most similar place name as the correct place name after error checking and correction.
In the automatic error correction stage, the invention mainly uses the place names in a library of common place names to calibrate and confirm the speech recognition result. In other words, the place name to be corrected, extracted according to the rules, is compared as a short text with the place names in the library, and the identical or most similar place name replaces it, achieving error checking and correction.
In the automatic error correction step based on restricted semantics, the short-text similarity matching is computed with a weighted longest common subsequence algorithm, defined as follows: given a similarity function between any two elements of the two sequences, find the common subsequence of the two sequences whose sum of similarities is largest, where the similarity function is the pinyin similarity between two pinyin syllables.
The short-text comparison algorithm works with pinyin as the unit. Because the initial and the final of a pinyin syllable differ greatly in structure, the similarity is computed separately for the initial and for the final. For two different pinyin syllables, identical initials or identical finals contribute a similarity of 0.5; similar initials or finals (such as flat versus retroflex sounds, or front versus back nasal sounds) contribute a similarity of 0.25.
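A minimal sketch of this pinyin similarity follows. Several details are assumptions layered on the description: the scores for the initial and the final are added, so identical syllables score 1.0; the confusable pairs are limited to the two categories named above (z/zh, c/ch, s/sh and an/ang, en/eng, in/ing); syllables are toneless; and "y" and "w" are treated as initials when splitting.

```python
# Two-letter initials come first so that "zh" is not split as "z" + "h...".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")

# Assumed confusable pairs: flat vs retroflex initials, front vs back nasal finals.
SIMILAR_INITIALS = {frozenset(p) for p in (("z", "zh"), ("c", "ch"), ("s", "sh"))}
SIMILAR_FINALS = {frozenset(p) for p in (("an", "ang"), ("en", "eng"), ("in", "ing"))}

def split_pinyin(syllable):
    """Split a toneless pinyin syllable into (initial, final)."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable  # zero-initial syllable such as "an"

def part_sim(x, y, similar_pairs):
    """0.5 for identical parts, 0.25 for confusable parts, otherwise 0."""
    if x == y:
        return 0.5
    if frozenset((x, y)) in similar_pairs:
        return 0.25
    return 0.0

def sim_py(p, q):
    """Pinyin similarity: initial score plus final score (assumed additive)."""
    (i1, f1), (i2, f2) = split_pinyin(p), split_pinyin(q)
    return part_sim(i1, i2, SIMILAR_INITIALS) + part_sim(f1, f2, SIMILAR_FINALS)

print(sim_py("zhang", "zang"))   # similar initial, identical final
print(sim_py("shan", "shang"))   # identical initial, similar final
```

The pair tables are the natural place to extend the sketch with further confusable sounds, which the description leaves open with "etc.".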
On this basis, the method uses the weighted longest common subsequence algorithm: it computes the pinyin similarity between the candidate place name A and a place name B in the common place name library character by character, and finds their weighted longest common subsequence by dynamic programming.
Let the two-dimensional array $\mathrm{WLCS}[i, j]$ denote the weighted longest common subsequence of the part of string $A = a_0 a_1 \ldots a_n$ before its i-th character and the part of string $B = b_0 b_1 \ldots b_m$ before its j-th character. Then
[Equation: recurrence defining WLCS[i, j]; image BDA0000970286530000091]
where $0 \le i \le n$ and $0 \le j \le m$. $\mathrm{SimPY}(a_i, b_j)$ is the pinyin similarity between the i-th character of string A and the j-th character of string B, computed by the pinyin similarity method above.
The similarity $\mathrm{SimWLCS}(A, B)$ of strings A and B is computed by the following formula:
[Equation: SimWLCS(A, B) in terms of WLCS(A, B) and MaxLen(A, B); image BDA0000970286530000092]
where $\mathrm{WLCS}(A, B)$ is the sum of the similarities over the corresponding positions of the weighted longest common subsequence of strings A and B, and $\mathrm{MaxLen}(A, B)$ is the larger of the lengths of strings A and B.
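The dynamic program and the library lookup can be sketched as follows. Two points are assumptions, because the original equations are not available in this text: the recurrence is the usual three-way maximum for a weighted longest common subsequence, and SimWLCS is taken to be WLCS(A, B) divided by MaxLen(A, B). The similarity function is passed in as a parameter, and the example uses a hypothetical exact-match function in place of the pinyin similarity.

```python
def wlcs(a, b, sim):
    """Largest sum of pairwise similarities over any common subsequence of a and b."""
    n, m = len(a), len(b)
    table = [[0.0] * (m + 1) for _ in range(n + 1)]  # row 0 and column 0 stay 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            table[i][j] = max(
                table[i - 1][j],                               # skip a[i-1]
                table[i][j - 1],                               # skip b[j-1]
                table[i - 1][j - 1] + sim(a[i - 1], b[j - 1])  # pair them
            )
    return table[n][m]

def sim_wlcs(a, b, sim):
    """Assumed normalisation: WLCS(A, B) / MaxLen(A, B)."""
    longest = max(len(a), len(b))
    return wlcs(a, b, sim) / longest if longest else 0.0

def best_match(candidate, library, sim):
    """Return the library entry most similar to the candidate place name."""
    return max(library, key=lambda name: sim_wlcs(candidate, name, sim))

# Stand-in similarity for the demo; the method itself uses pinyin similarity.
exact = lambda x, y: 1.0 if x == y else 0.0

# Place names as sequences of toneless pinyin syllables (illustrative data).
library = [["dong", "fang", "lu"], ["dong", "chuan", "lu"]]
print(best_match(["dong", "chuan", "lu"], library, exact))
```

Dividing by the longer length penalises library entries that are much longer or shorter than the candidate, so a short name cannot score highly merely by being contained in a long one.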
Thirdly, artificial error correction based on semantic feedback
The basic mode of the voice interaction scheme with manual semantic feedback is that the speech recognition system continuously receives the user's speech, recognizes it and processes it. Normally the user enters text by voice. When the user believes that a character has been recognized incorrectly, the user can correct it by voice with a simple correction sentence pattern such as "Modify: Wu is the Wu of Kontangwu". The system automatically recognizes this input as a correction sentence pattern, enters its feedback and correction process, extracts the correction information from the sentence and corrects the corresponding wrong character in the preceding text. If other characters are wrong, the user can repeat the feedback process until satisfied and then continue with subsequent entry. Text entered before that point is then treated as confirmed by the user by default and no longer accepts correction.
Specifically, when the user enters a text sentence by voice and the entered text differs from what the user expected, the user can go on to speak a correction sentence pattern, which takes one of two forms:
First form: "Modify: A is the C of B."
Second form: "Modify: the N-th A is the C of B."
Here A and C are the same character, called the "indicator character"; B is an idiom or phrase containing that character, called the "correcting word". Under normal conditions the pinyin of the indicator character is the same as the pinyin of the wrong character in the entered text and of the correct character in the correcting word. The indicator character thus links the wrong character to the corrected character: according to the indicator character, the correct character (the "corrected character") is extracted from the correcting word, the wrong character is located in the text, and the wrong character is replaced with the corrected character. For example:
User voice input: "I am Huang Yi Rui" (where "Yi" is the character meaning "also").
Speech recognition result: "I am Huang Yi Rui" (with "Yi" recognized as the homophonous character meaning "one").
Here the user regards the character "one" (yi) as a recognition error. The user can then speak the correction sentence "Modify: yi is the yi of bu yi le hu". In this sentence the character "yi" in parts A and C is the indicator character and the idiom "bu yi le hu" is the correcting word. The system starts the error correction procedure and replaces the wrong character "one" with the character "also", so that the correct result "I am Huang Yi Rui" is shown on the screen.
When the entered text contains several characters with the same pronunciation, the system cannot tell which of them should be corrected, so the user needs to state explicitly which occurrence is wrong. For example, to correct the second character that has the same pronunciation as the indicator character, the user can use the second form of the correction sentence pattern, "the second A is the C of B": the number given in the N part pinpoints the position of the wrong character and avoids the confusion caused by multiple homophones.
In a correction sentence, the indicator character serves two purposes. On the one hand, its pinyin is used to find the matching Chinese character in the correcting word, which becomes the corrected character. On the other hand, its pinyin is used to find the position of the corresponding character in the preceding text, which is then replaced with the corrected character, completing the correction of the erroneous text.
The specific steps for finding the corrected character are as follows:
Step (1): Convert the correction sentence into a pinyin sequence and segment it at the keywords to obtain the indicator characters and the correcting word. For example, "yi is the yi of bu yi le hu" is converted into the pinyin sequence [yi shi bu yi le hu de yi]; segmenting at the keywords "shi" and "de" gives indicator character A as "yi", correcting word B as [bu yi le hu] and indicator character C as "yi".
Step (2): Check whether indicator characters A and C are the same and, if so, find the position of the indicator character in the correcting word. Here the position index of "yi" in [bu yi le hu] is 2 (counting from 1).
Step (3): From the result of matching against the specialized knowledge base or from the API recognition result, use this position to obtain the Chinese character in the correcting word that corresponds to the pinyin of the indicator character, and take it as the corrected character. Here this is the character "yi" meaning "also".
Step (4): Using the pinyin of the wrong character, find its position in the preceding sentence and replace it with the corrected character, thereby correcting the wrong character.
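Steps (1) to (4) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the Chinese characters are an assumed reading of the "yi" example, the small `PINYIN` table stands in for a real pinyin converter, the function names are hypothetical, and the correction sentence is assumed to have had its leading "Modify" keyword already removed.

```python
# Hypothetical character-to-pinyin table; a real system would use a pinyin library.
PINYIN = {
    "我": "wo", "叫": "jiao", "黄": "huang", "一": "yi", "亦": "yi", "睿": "rui",
    "是": "shi", "不": "bu", "乐": "le", "乎": "hu", "的": "de",
}


def to_pinyin(text):
    """Convert each character to pinyin (unknown characters are kept as-is)."""
    return [PINYIN.get(ch, ch) for ch in text]


def extract_corrected_char(sentence):
    """Steps (1)-(3): segment 'A shi B de C' and pick the matching character of B."""
    py = to_pinyin(sentence)
    shi = py.index("shi")                        # first "shi" keyword
    de = len(py) - 1 - py[::-1].index("de")      # last "de" keyword
    a, b, c = py[:shi], py[shi + 1:de], py[de + 1:]
    if a != c or len(a) != 1 or a[0] not in b:
        return None
    pos = b.index(a[0])                          # 0-based position inside B
    return sentence[shi + 1 + pos]


def apply_correction(text, sentence, nth=1):
    """Step (4): replace the nth character of text sharing the corrected pinyin."""
    corrected = extract_corrected_char(sentence)
    if corrected is None:
        return text
    target = PINYIN[corrected]
    chars, seen = list(text), 0
    for i, ch in enumerate(chars):
        if PINYIN.get(ch) == target:
            seen += 1
            if seen == nth:
                chars[i] = corrected
                break
    return "".join(chars)


print(apply_correction("我叫黄一睿", "一是不亦乐乎的一"))
```

The `nth` parameter corresponds to the N part of the second correction form. Because matching is done on pinyin, the sketch works even when the indicator character itself was recognized as the wrong homophone.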
In voice feedback and correction, people usually describe the corrected character with words that are not easily confused with others, such as common words, idioms, names of famous people, or set phrases used specifically to describe Chinese characters.
Chinese has many proper nouns in which each character abbreviates some word; for example, the characters of "program" can be described through words such as "written program" and "course program".
Chinese also has a common practice of describing a character by its radicals or components. This is often used for surnames and for characters that do not easily form words, such as "grass-headed yellow" and "ancient chinese fiddle".
The following table lists several cases describing Chinese characters:
Table 1: Several ways of describing Chinese characters
Figure BDA0000970286530000111
For idioms, celebrity names and common words, existing speech recognition APIs recognize the phrase correctly, so an accurate corrected character can be obtained. Phrases that describe the shape of a Chinese character, however, are not common words, and existing speech recognition APIs cannot recognize all of them correctly. To address this, the invention introduces a specialized knowledge base based on idioms to improve the recognition accuracy of such phrases.
Each record of the knowledge base represents a correcting word that belongs to the category of common error-prone words, and stores the mapping between the proper noun and its pinyin, for example:
li zao zhang: the Zhang ("chapter") written with the components "stand" and "morning"
gong chang zhang: the Zhang written with the components "bow" and "long"
After the user input has been recognized with the speech recognition API and the correction sentence pattern has been obtained, the system extracts the Chinese characters of the correcting-word part, converts them into a pinyin sequence, and searches the knowledge base for a matching pinyin sequence. If a match is found, the Chinese phrase corresponding to that pinyin sequence replaces the original recognition result and becomes the new correcting-word part. If the user's correcting word cannot be matched in the local knowledge base, the system extracts the corrected character from the original API recognition result. For example, if the user's correction sentence is "Qu is the Qu of fringed pink", the local knowledge base has no corresponding record, and the API recognizes "fringed pink" accurately, the system can correct the wrong character with "Qu".
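The lookup with fallback described above might be sketched as follows. The record contents, the Chinese characters, the misrecognized input and the function name are all assumptions made for illustration; the patent text only states that records map pinyin sequences to character-describing phrases.

```python
# Hypothetical knowledge base: pinyin sequence -> character-describing phrase.
KNOWLEDGE_BASE = {
    ("li", "zao", "zhang"): "立早章",
    ("gong", "chang", "zhang"): "弓长张",
}


def resolve_correcting_word(api_text, api_pinyin):
    """Prefer the knowledge-base phrase with the same pinyin; otherwise
    fall back to the original API recognition result."""
    return KNOWLEDGE_BASE.get(tuple(api_pinyin), api_text)


# The API is assumed to have misheard the phrase as a homophonous common word.
print(resolve_correcting_word("工厂张", ["gong", "chang", "zhang"]))
# No record exists for this phrase, so the API result is kept unchanged.
print(resolve_correcting_word("瞿麦", ["qu", "mai"]))
```

Keying the table on the pinyin sequence rather than on characters is what lets a homophonous misrecognition still reach the intended record.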
In speech recognition, the user's accent or background noise can make the recognition result differ from what the user intended, particularly when a single character is entered. Even if the user articulates a single character clearly, it is hard to recognize as the character actually spoken, because of accent and because a single character has no context words to help. Typical confusions are "niu" (ox) with "liu", "hu" with "fu", flat-tongue versus retroflex initials, and front versus back nasal finals. In the invention the indicator part is the result of single-character recognition. Following the extraction process above, if the correcting-word part is recognized correctly but the indicator part is recognized as a common fuzzy-sound counterpart, for example "Liu of milk (niu nai)", then searching the pinyin sequence [niu, nai] with the pinyin liu finds no match. Fuzzy sounds therefore need to be added to the search to improve the correction success rate.
Taking the "Liu of milk" example above, we construct the fuzzy-sound array [liu, niu] for the pinyin liu and search the pinyin sequence [niu, nai] with each element of the array in turn. When a pinyin has several fuzzy sounds, such as zhen, the fuzzy-sound array is ordered by similarity to the original sound, i.e. [zhen, zen, zheng, zeng]. The system traverses the array in order and looks for a match in the pinyin sequence.
Using fuzzy sounds improves the success rate of extracting the corrected character from the correcting word. Likewise, when the correction is applied to the text, i.e. when the wrong character is located and replaced, fuzzy matching is also needed to find the wrong character. Specifically, the pinyin of the correct character is expanded into a fuzzy-sound array, each element of the array is searched for in the pinyin sequence of the preceding text, and the Chinese character found is replaced.
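A possible way to build and use such a fuzzy-sound array is sketched below. The confusion pairs and the ordering rule (original first, then a changed initial, then a changed final, then both) are assumptions chosen so that the two examples in the text, [liu, niu] and [zhen, zen, zheng, zeng], are reproduced. The patent does not list its actual pairs, and all names here are hypothetical.

```python
# Assumed confusion pairs for initials and for nasal finals.
INITIAL_PAIRS = [("zh", "z"), ("ch", "c"), ("sh", "s"), ("l", "n"), ("h", "f")]
FINAL_PAIRS = [("eng", "en"), ("ing", "in"), ("ang", "an")]


def split_syllable(py):
    """Split a non-empty pinyin syllable into (initial, final)."""
    for ini in ("zh", "ch", "sh"):
        if py.startswith(ini):
            return ini, py[2:]
    if py[0] in "bpmfdtnlgkhjqxrzcsyw":
        return py[0], py[1:]
    return "", py


def swap_initial(ini):
    for a, b in INITIAL_PAIRS:
        if ini == a:
            return b
        if ini == b:
            return a
    return None


def swap_final(fin):
    for back, front in FINAL_PAIRS:
        if fin.endswith(back):
            return fin[:-len(back)] + front
        if fin.endswith(front):
            return fin[:-len(front)] + back
    return None


def fuzzy_array(py):
    """Original sound first, then variants in assumed order of similarity."""
    ini, fin = split_syllable(py)
    alt_ini, alt_fin = swap_initial(ini), swap_final(fin)
    out = [py]
    if alt_ini is not None:
        out.append(alt_ini + fin)
    if alt_fin is not None:
        out.append(ini + alt_fin)
    if alt_ini is not None and alt_fin is not None:
        out.append(alt_ini + alt_fin)
    return out


def find_fuzzy(py, sequence):
    """Return the index of the first fuzzy variant found in sequence, else None."""
    for candidate in fuzzy_array(py):
        if candidate in sequence:
            return sequence.index(candidate)
    return None


print(fuzzy_array("zhen"), find_fuzzy("liu", ["niu", "nai"]))
```

The same `find_fuzzy` helper can serve both searches mentioned above: locating the indicator sound inside the correcting word, and locating the wrong character in the preceding text.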
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (6)

1. A method for calibrating error-prone characters in voice interaction, characterized by comprising the following steps:
a context recognition step: creating a separate contextual knowledge base for each domain, wherein constructing a contextual knowledge base comprises: first, obtaining related documents through a search engine according to the keywords of the domain to serve as the corpus of the domain; then, obtaining the core words of the domain according to semantic knowledge, and clustering according to the core words to obtain example sentences of the domain, thereby constructing the contextual knowledge base;
in the context recognition step, the context is determined from the similarity between the text sentence and the contexts of the different domains in the contextual knowledge base, and this determination serves as the premise for automatic error correction; the context similarity is computed as follows:
S1: counting the number of occurrences of each word in text sentence A and expressing the counts as a vector;
S2: using the cosine similarity formula, calculating the cosine of the angle between the vector of text sentence A and the vector of each example sentence B in context Ci, and taking this cosine as the vector-based word-form similarity;
S3: converting all words of text sentence A into pinyin, counting the number of occurrences of each distinct pinyin sequence in text sentence A and expressing the counts as a vector, then calculating the cosine of the angle between the pinyin vector of text sentence A and the pinyin vector of each example sentence B in context Ci to obtain the vector-based pinyin similarity;
S4: calculating the sentence similarity between text sentence A and each example sentence B by assigning different weights to the pinyin similarity and the word-form similarity, and taking the largest of these values as the sentence similarity between text sentence A and context Ci;
S5: calculating the core-word matching rate of text sentence A and context Ci, namely the number of core words of context Ci contained in text sentence A as a percentage of the number of all words in text sentence A;
S6: calculating the context similarity of text sentence A and context Ci by assigning different weights to the sentence similarity and the core-word matching rate;
S7: calculating the smoothed context similarity SmoothContextSim(A, Ci) of text sentence A and context Ci based on the preceding text:
$$\mathrm{SmoothContextSim}(A, C_i) = \lambda_1 \cdot \mathrm{ContextSim}(A_{-2}, C_i) + \lambda_2 \cdot \mathrm{ContextSim}(A_{-1}, C_i) + \lambda_3 \cdot \mathrm{ContextSim}(A, C_i)$$
$$\lambda_1 + \lambda_2 + \lambda_3 = 1$$
$$\lambda_1 \le \lambda_2 \le \lambda_3$$
wherein $A$, $A_{-1}$ and $A_{-2}$ respectively denote the current text sentence, the first sentence before the current text sentence and the second sentence before the current text sentence; $\lambda_1$, $\lambda_2$ and $\lambda_3$ are constants; and ContextSim(X, Y) denotes the context similarity of text sentence X and context Y;
the method for calibrating error-prone characters in voice interaction further comprises:
an automatic error correction step based on limited semantics: obtaining the place name to be corrected from the text sentence entered by the user's voice, and checking and correcting the place name to be corrected.
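For illustration only, steps S1 to S7 might be sketched in Python as follows. The weight values, the function names and the tokenised inputs are assumptions; the claim fixes only the structure of the computation (cosine similarity on count vectors, weighted combinations, a maximum over example sentences, and smoothing over the two preceding sentences).

```python
import math
from collections import Counter


def cosine(tokens_a, tokens_b):
    """S1-S3: cosine of the angle between two occurrence-count vectors."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[k] * cb[k] for k in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def context_sim(words_a, pinyin_a, examples, core_words,
                w_pinyin=0.5, w_sentence=0.5):
    """S4-S6 with assumed weights; examples is a list of (words, pinyin) pairs."""
    best = max(
        (w_pinyin * cosine(pinyin_a, pb) + (1 - w_pinyin) * cosine(words_a, wb)
         for wb, pb in examples),
        default=0.0,
    )
    match_rate = (sum(1 for w in words_a if w in core_words) / len(words_a)
                  if words_a else 0.0)
    return w_sentence * best + (1 - w_sentence) * match_rate


def smooth_context_sim(sims, lambdas=(0.2, 0.3, 0.5)):
    """S7: sims = (ContextSim(A-2, Ci), ContextSim(A-1, Ci), ContextSim(A, Ci)).
    The lambdas are assumed values that sum to 1 and are non-decreasing."""
    return sum(l * s for l, s in zip(lambdas, sims))


words = ["book", "ticket"]
print(context_sim(words, words, [(words, words)], {"ticket"}))
```

English tokens are used in the usage line only to keep the sketch self-contained; in the setting of the claim the word vector would hold segmented Chinese words and the pinyin vector their pinyin forms.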
2. The method for calibrating error-prone characters in voice interaction according to claim 1, wherein the automatic error correction step based on limited semantics comprises:
a text sentence reading step: reading in a text sentence P entered by the user's voice, where P = p1p2...pi...pn, pi denotes the i-th Chinese character in the text sentence, and n denotes the length of the text sentence;
a step of obtaining the place name to be corrected: scanning P and matching according to a place name matching rule to obtain the place name to be corrected;
an error correction step: performing short-text similarity matching between the place name to be corrected and all place names in the place name library to obtain the place name most similar to the place name to be corrected, and taking that place name as the correct place name after error checking and correction.
3. The method of claim 2, wherein the place name matching rule comprises any one of the following rules:
rule one: if Wl belongs to the set of left boundary words, Wr belongs to the set of right boundary words, and the number of characters of Wp (WpLen) is greater than 1, then Wp is identified as the place name to be corrected;
rule two: if Wl belongs to the set of left boundary words and Wr belongs to the set of place name suffixes, then the character string formed by Wp and Wr
Figure FDA0002381584340000021
is identified as the place name to be corrected;
rule three: if Wl belongs to the set of place name suffixes, Wr belongs to the set of right boundary words, and the number of characters of Wp is greater than 1, then Wp is identified as the place name to be corrected;
rule four: if Wl belongs to the set of place name suffixes and Wr belongs to the set of place name suffixes, then the character string formed by Wp and Wr
Figure FDA0002381584340000022
is identified as the place name to be corrected;
wherein Wl is the word preceding the word to be corrected, Wp is the word to be corrected, and Wr is the word following the word to be corrected.
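The four rules can be condensed into a short Python sketch. The boundary-word and suffix sets below are invented examples, the function name is hypothetical, the string "formed by Wp and Wr" is assumed to be their concatenation (the formula images are not reproduced here), and checking the suffix rules before the boundary rules is an arbitrary choice that the claim does not specify.

```python
def place_name_candidate(wl, wp, wr, left_set, right_set, suffix_set):
    """Return the place name to be corrected, or None if no rule applies."""
    # Every rule requires Wl to be a left boundary word or a place-name suffix.
    if wl not in left_set and wl not in suffix_set:
        return None
    if wr in suffix_set:                      # rules two and four
        return wp + wr
    if wr in right_set and len(wp) > 1:       # rules one and three
        return wp
    return None


# Hypothetical word sets for illustration only.
LEFT = {"去", "到"}
RIGHT = {"的", "怎么"}
SUFFIX = {"市", "省", "路"}

print(place_name_candidate("去", "上海", "市", LEFT, RIGHT, SUFFIX))
```

The returned candidate would then be passed to the short-text similarity matching of claim 2 against the place name library.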
4. The method for calibrating error-prone characters in voice interaction according to claim 1, wherein in the automatic error correction step based on limited semantics, a weighted longest common subsequence algorithm is used to compute the short-text similarity matching; the weighted longest common subsequence algorithm is as follows: a similarity function is defined between any two elements of the two sequences, and the common subsequence of the two sequences with the largest sum of similarities is sought, wherein the similarity function is defined as the pinyin similarity between two pinyin syllables.
5. The method of claim 4, wherein the pinyin similarity is computed as follows: the similarity of the initials and the similarity of the finals of the two pinyin syllables are calculated separately, and corresponding similarity values are assigned to cases of syllable confusion.
6. The method for calibrating error-prone characters in voice interaction according to claim 1, further comprising:
a manual error correction step based on semantic feedback: correcting errors according to a correction sentence pattern entered by voice; wherein the forms of the correction sentence pattern include:
a first form: "Modify: character A is the character C of word B";
a second form: "Modify: the N-th character A is the character C of word B";
wherein character A and character C are the same character, denoted the indicator character; word B is an idiom or phrase containing character A and character C, denoted the correcting word;
the pinyin of the indicator character is the same as the pinyin of the wrong character in the entered text and the same as the pinyin of the correct character in the correcting word;
and, according to the indicator character, the correct character is extracted from the correcting word as the corrected character used for replacement.
CN201610248440.8A 2016-04-20 2016-04-20 Error-prone character calibration method in voice interaction Expired - Fee Related CN107305768B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610248440.8A CN107305768B (en) 2016-04-20 2016-04-20 Error-prone character calibration method in voice interaction

Publications (2)

Publication Number Publication Date
CN107305768A CN107305768A (en) 2017-10-31
CN107305768B true CN107305768B (en) 2020-06-12

Family

ID=60152309

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610248440.8A Expired - Fee Related CN107305768B (en) 2016-04-20 2016-04-20 Error-prone character calibration method in voice interaction

Country Status (1)

Country Link
CN (1) CN107305768B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200612