CN113113019A - Voice library generating system and method - Google Patents


Info

Publication number
CN113113019A
Authority
CN
China
Prior art keywords
voice
module
server
data
instruction
Prior art date
Legal status
Pending
Application number
CN202110328947.5A
Other languages
Chinese (zh)
Inventor
尤文杰
邬锡敏
Current Assignee
Shanghai Hongzhen Information Science & Technology Co ltd
Original Assignee
Shanghai Hongzhen Information Science & Technology Co ltd
Priority date
2021-03-27
Filing date
2021-03-27
Publication date
2021-07-13
Application filed by Shanghai Hongzhen Information Science & Technology Co ltd
Priority to CN202110328947.5A
Publication of CN113113019A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225 Feedback of the input speech
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a voice library generating system and method in the technical field of voice library systems. The system comprises a client, a voice recording system, a server and an instruction output end. The voice recording system is connected to the server; it collects voice data and transmits them to the server, which compares and stores the data. The client is also connected to the server and inputs voice instructions into it. After the server compares a voice instruction with the stored data, the result is output through the instruction output end, completing the output of the voice instruction.

Description

Voice library generating system and method
Technical Field
The invention relates to the technical field of voice library systems, in particular to a voice library generating system and a voice library generating method.
Background
With the development of voice recognition technology, digital equipment and multimedia technology, voice endpoint detection has matured. Voice endpoint detection locates the speech segments within a continuous signal, and it can be combined with automatic speech recognition systems and voiceprint recognition systems. For multiple devices to be given instructions directly by voice, however, voice libraries need further improvement.
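Voice endpoint detection, as described above, is often introduced with a short-time energy detector. The sketch below is a minimal illustration of that textbook technique, not the method of this application; the frame length, the energy threshold and the function names `frame_energies` and `detect_endpoints` are assumptions chosen for the example.

```python
from typing import List, Tuple


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Short-time energy (mean squared amplitude) of consecutive, non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def detect_endpoints(samples: List[float], frame_len: int = 4,
                     threshold: float = 0.01) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of runs of frames whose energy reaches the threshold.

    frame_len and threshold are illustrative defaults, not tuned values.
    """
    segments: List[Tuple[int, int]] = []
    seg_start = None
    for i, energy in enumerate(frame_energies(samples, frame_len)):
        if energy >= threshold and seg_start is None:
            seg_start = i * frame_len                      # speech onset
        elif energy < threshold and seg_start is not None:
            segments.append((seg_start, i * frame_len))    # speech offset
            seg_start = None
    if seg_start is not None:                              # speech lasts to the last full frame
        segments.append((seg_start, len(samples) // frame_len * frame_len))
    return segments


# 8 silent samples, 8 samples of a +/-0.5 tone, then 8 silent samples
signal = [0.0] * 8 + [0.5, -0.5] * 4 + [0.0] * 8
print(detect_endpoints(signal))  # [(8, 16)]
```

Practical detectors usually add overlapping windows, adaptive thresholds and smoothing so that short pauses inside a word do not split a segment.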
Disclosure of Invention
The embodiment of the invention provides a system and a method for generating a voice library, which aim to solve the technical problem that the voice library in the prior art needs to be further improved.
The embodiments of the invention adopt the following technical scheme: a voice library generating system comprises a client, a voice recording system, a server and an instruction output end. The voice recording system is connected to the server; voice data are collected by the voice recording system and transmitted to the server, which compares and stores the data. The client is connected to the server and inputs voice instructions into it. After the server compares a voice instruction, the result is output through the instruction output end, completing the output of the voice instruction.
Further, the server consists of a voice matching classification module, a voice data repository, a voice receiving module and a voice comparison module. The voice recording system is connected to the voice matching classification module, which classifies the voice data recorded by the voice recording system according to their corresponding instructions; the classified voice data are transmitted to the voice data repository for storage. A voice instruction sent by the client enters the voice database through the voice receiving module, and the voice comparison module compares it with the voice data in the voice data repository, so that the matching instruction is output through the instruction output end.
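The flow through the server's modules can be sketched as follows. This is a hypothetical illustration: it assumes each recording has already been reduced to a fixed-length feature vector (the application does not specify a feature type) and uses nearest-neighbour cosine similarity as a stand-in for the unspecified comparison performed by the voice comparison module. The class and method names are invented for the example.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class VoiceServer:
    """Hypothetical sketch of the server modules named in the text."""

    def __init__(self) -> None:
        # Voice data repository: instruction label -> stored feature vectors.
        self.repository: Dict[str, List[Vector]] = defaultdict(list)

    def classify_and_store(self, label: str, features: Vector) -> None:
        """Voice matching classification module: file a recording under its instruction."""
        self.repository[label].append(features)

    def receive(self, features: Vector) -> Optional[str]:
        """Voice receiving module: pass the client's instruction to the comparison module."""
        return self.compare(features)

    def compare(self, features: Vector) -> Optional[str]:
        """Voice comparison module: label of the most similar stored example, None if empty."""
        best_label, best_score = None, -1.0
        for label, examples in self.repository.items():
            for example in examples:
                score = cosine(features, example)
                if score > best_score:
                    best_label, best_score = label, score
        return best_label


server = VoiceServer()
server.classify_and_store("lights_on", [1.0, 0.0])
server.classify_and_store("lights_off", [0.0, 1.0])
print(server.receive([0.9, 0.1]))  # lights_on
```

The returned label is what the instruction output end would then emit. A real system would also need a minimum similarity so that an instruction unlike anything stored is not forced onto the nearest label.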
Further, the server is also provided with an invalid voice library, to which both the voice matching classification module and the voice data repository are connected. Recordings that the voice matching classification module cannot recognize are transmitted to the invalid voice library, so invalid voice data are held there rather than in the voice data repository. This reduces the space that invalid voice data occupy in the voice data repository, and an administrator can periodically check the voice data in the invalid voice library for debugging.
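One way to picture the invalid voice library is as a second store that receives whatever the classifier cannot place. The sketch assumes, which the application does not state, that the voice matching classification module reports a label and a confidence score, and it treats a missing label or a score below a hypothetical `min_confidence` of 0.6 as unrecognized.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Recording:
    recording_id: str
    label: Optional[str]   # instruction assigned by the classifier, None if it found none
    confidence: float      # assumed classifier confidence in [0, 1]


@dataclass
class VoiceStores:
    """Hypothetical server-side stores: the voice data repository and the invalid voice library."""
    repository: Dict[str, List[Recording]] = field(default_factory=dict)
    invalid_library: List[Recording] = field(default_factory=list)

    def route(self, rec: Recording, min_confidence: float = 0.6) -> str:
        """File recognized recordings by label; send everything else to the invalid library."""
        if rec.label is None or rec.confidence < min_confidence:
            self.invalid_library.append(rec)
            return "invalid"
        self.repository.setdefault(rec.label, []).append(rec)
        return "repository"

    def review_invalid(self) -> List[str]:
        """IDs an administrator would see during a periodic review for debugging."""
        return [rec.recording_id for rec in self.invalid_library]


stores = VoiceStores()
stores.route(Recording("r1", "lights_on", 0.9))
stores.route(Recording("r2", None, 0.0))
stores.route(Recording("r3", "lights_off", 0.3))
print(stores.review_invalid())  # ['r2', 'r3']
```

Keeping low-confidence recordings out of the repository means they cannot later be matched against client instructions, which is the space and quality benefit the text describes.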
Further, an error feedback module is provided in the server. When the sound data output for the client do not match what the client intended, the client can report this through the error feedback module, which makes it easier to improve the server according to the client's requirements.
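The error feedback module could be as simple as a log of mismatches that is summarized later. In the sketch below, pairing the output instruction with the instruction the client intended, and the `most_confused` summary, are assumptions made for illustration; the application does not describe the feedback format.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ErrorFeedbackModule:
    """Hypothetical feedback log of (output instruction, intended instruction) mismatches."""
    reports: List[Tuple[str, str]] = field(default_factory=list)

    def report(self, output_instruction: str, intended_instruction: str) -> None:
        # Only mismatches are recorded; a correct output needs no feedback.
        if output_instruction != intended_instruction:
            self.reports.append((output_instruction, intended_instruction))

    def most_confused(self, n: int = 3) -> List[Tuple[Tuple[str, str], int]]:
        """Most frequent mismatches, a starting point for improving the server."""
        return Counter(self.reports).most_common(n)


feedback = ErrorFeedbackModule()
feedback.report("lights_on", "lights_off")
feedback.report("lights_on", "lights_off")
feedback.report("fan_on", "fan_off")
feedback.report("fan_on", "fan_on")   # correct output, ignored
print(feedback.most_confused(1))  # [(('lights_on', 'lights_off'), 2)]
```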
Further, the voice recording system comprises a task deployment module and a plurality of recording ends. The task deployment module deploys tasks to the recording ends, each recording end records voice data according to the tasks deployed to it, and the recorded voice data are transmitted to the server for centralized processing and storage, so that voice data are collected and learned.
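Task deployment to several recording ends can be illustrated with a round-robin assignment of prompts. The prompts, end identifiers and the `takes_per_prompt` parameter are hypothetical; the application does not say how tasks are divided among recording ends.

```python
from itertools import cycle
from typing import Dict, List


def deploy_tasks(prompts: List[str], recording_ends: List[str],
                 takes_per_prompt: int = 1) -> Dict[str, List[str]]:
    """Hand out each prompt takes_per_prompt times, cycling through the recording ends."""
    if not recording_ends:
        raise ValueError("at least one recording end is required")
    plan: Dict[str, List[str]] = {end: [] for end in recording_ends}
    next_end = cycle(recording_ends)
    for prompt in prompts:
        for _ in range(takes_per_prompt):
            plan[next(next_end)].append(prompt)
    return plan


print(deploy_tasks(["turn on the light", "turn off the light", "open the window"],
                   ["end_A", "end_B"]))
# {'end_A': ['turn on the light', 'open the window'], 'end_B': ['turn off the light']}
```

With two recording ends and `takes_per_prompt=2`, each end receives every prompt once, so every instruction is recorded at two different ends, which helps a library cover more than one speaker per instruction.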
Further, each recording end is provided with a network uploading module, through which the sound data recorded at that end are transmitted to the server. This can greatly improve the recording efficiency of the recording ends and allows sound data to be recorded from different places.
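The network uploading module's job of moving a recording from a recording end to the server can be sketched as packaging the audio with its task metadata and a checksum. The JSON field names and the use of a SHA-256 checksum are assumptions; the application does not specify a transport or message format, and the actual network call is omitted.

```python
import base64
import hashlib
import json


def build_upload(end_id: str, task_prompt: str, audio: bytes) -> str:
    """Recording end: package one recording as a JSON message (field names are hypothetical)."""
    return json.dumps({
        "recording_end": end_id,
        "prompt": task_prompt,
        "sha256": hashlib.sha256(audio).hexdigest(),
        "audio_b64": base64.b64encode(audio).decode("ascii"),
    })


def accept_upload(message: str) -> bytes:
    """Server: decode the audio and reject the upload if the checksum does not match."""
    payload = json.loads(message)
    audio = base64.b64decode(payload["audio_b64"])
    if hashlib.sha256(audio).hexdigest() != payload["sha256"]:
        raise ValueError("corrupted upload")
    return audio


message = build_upload("end_A", "turn on the light", b"\x00\x01\x02")
print(accept_upload(message))  # b'\x00\x01\x02'
```

Carrying the prompt with the audio lets the server know which instruction a recording was made for when it classifies and stores it.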
A voice library generation method comprises the following steps:
S1: The task deployment module deploys tasks to the plurality of recording ends.
S2: Each recording end records the specified voice data according to the tasks deployed by the task deployment module and uploads the voice data to the server through the network uploading module.
S3: The voice matching classification module matches and classifies the sound data, transmits the classified sound data to the voice data repository for storage, and transmits unrecognized voice to the invalid voice library.
S4: The client transmits the instruction sound through the voice receiving module to the voice comparison module, which compares the incoming sound with the sound data in the voice data repository to obtain a matching instruction.
S5: The instruction output end outputs the instruction sound.
S6: When the sound data output for the client do not match what the client intended, feedback is given through the error feedback module.
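Steps S1 to S6 can be strung together in a single toy run. This sketch makes several assumptions not found in the application: recordings are represented as feature vectors, a `record` callback stands in for both recording and network upload (S2), a `None` result marks a recording the classifier cannot recognize, matching uses cosine similarity with a hypothetical `min_score` of 0.8, and the client's intended instruction is passed in directly so that S6 can be shown.

```python
import math
from itertools import cycle
from typing import Callable, Dict, List, Optional, Sequence

Features = Sequence[float]


def cosine(a: Features, b: Features) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def run_method(prompts: List[str], ends: List[str],
               record: Callable[[str, str], Optional[Features]],
               client_features: Features, intended: str,
               min_score: float = 0.8) -> Dict[str, object]:
    # S1: deploy prompts to the recording ends round-robin.
    plan: Dict[str, List[str]] = {end: [] for end in ends}
    for prompt, end in zip(prompts, cycle(ends)):
        plan[end].append(prompt)
    # S2: each end records its prompts; record() stands in for recording plus upload.
    uploads = [(p, record(end, p)) for end, assigned in plan.items() for p in assigned]
    # S3: file recognized recordings by prompt; None goes to the invalid voice library.
    repository: Dict[str, List[Features]] = {}
    invalid_library: List[str] = []
    for prompt, feats in uploads:
        if feats is None:
            invalid_library.append(prompt)
        else:
            repository.setdefault(prompt, []).append(feats)
    # S4: compare the client's instruction with every stored example.
    matched, best = None, min_score
    for label, examples in repository.items():
        for example in examples:
            score = cosine(client_features, example)
            if score >= best:
                matched, best = label, score
    # S5: the instruction output end emits `matched` (None if nothing reached min_score).
    # S6: log feedback when the output differs from what the client intended.
    feedback = [] if matched == intended else [(matched, intended)]
    return {"plan": plan, "invalid": invalid_library,
            "output": matched, "feedback": feedback}


vectors = {"lights on": [1.0, 0.0], "lights off": [0.0, 1.0]}
result = run_method(["lights on", "lights off", "mumble"], ["end_A", "end_B"],
                    lambda end, prompt: vectors.get(prompt), [0.95, 0.05],
                    intended="lights on")
print(result["output"], result["invalid"])  # lights on ['mumble']
```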
At least one of the technical schemes adopted in the embodiments of the invention can achieve the following beneficial effects:
First, the voice recording system collects voice data and transmits them to the server, which compares and stores the data. The client inputs voice instructions into the server: each voice instruction sent by the client enters the voice database through the voice receiving module, the voice comparison module compares it with the voice data in the voice data repository, and the matching instruction is output through the instruction output end.
Second, an invalid voice library is provided in the system, and both the voice matching classification module and the voice data repository are connected to it. Recordings that the voice matching classification module cannot recognize are transmitted to the invalid voice library, so invalid voice data are kept out of the voice data repository. This reduces the space they occupy there, and an administrator can periodically check the voice data in the invalid voice library for debugging.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a system architecture diagram of the present invention;
FIG. 2 is an architecture diagram of the voice recording system of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the specific embodiments of the present invention and the accompanying drawings. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The technical solutions provided by the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
A voice library generating system comprises a client, a voice recording system, a server and an instruction output end. The voice recording system is connected to the server; voice data are collected by the voice recording system and transmitted to the server, which compares and stores the data. The client is connected to the server and inputs voice instructions into it. After the server compares a voice instruction, the result is output through the instruction output end, completing the output of the voice instruction.
Preferably, the server consists of a voice matching classification module, a voice data repository, a voice receiving module and a voice comparison module. The voice recording system is connected to the voice matching classification module, which classifies the voice data recorded by the voice recording system according to their corresponding instructions; the classified voice data are transmitted to the voice data repository for storage. A voice instruction sent by the client enters the voice database through the voice receiving module, and the voice comparison module compares it with the voice data in the voice data repository, so that the matching instruction is output through the instruction output end.
Preferably, the server is further provided with an invalid voice library, to which both the voice matching classification module and the voice data repository are connected. Recordings that the voice matching classification module cannot recognize are transmitted to the invalid voice library, so invalid voice data are held there rather than in the voice data repository. This reduces the space that invalid voice data occupy in the voice data repository, and an administrator can periodically check and debug the voice data in the invalid voice library.
Preferably, the server is provided with an error feedback module. When the sound data output for the client do not match what the client intended, feedback can be given through the error feedback module, so that the server can be improved according to the client's requirements.
Preferably, the voice recording system consists of a task deployment module and a plurality of recording ends. The task deployment module deploys tasks to the recording ends, each recording end records voice data according to the tasks deployed to it, and the recorded voice data are transmitted to the server for centralized processing and storage, so that voice data are collected and learned.
Preferably, each recording end is provided with a network uploading module, through which the sound data recorded at that end are transmitted to the server. This can greatly improve the recording efficiency of the recording ends and allows sound data to be recorded from different places.
A voice library generation method comprises the following steps:
S1: The task deployment module deploys tasks to the plurality of recording ends.
S2: Each recording end records the specified voice data according to the tasks deployed by the task deployment module and uploads the voice data to the server through the network uploading module.
S3: The voice matching classification module matches and classifies the sound data, transmits the classified sound data to the voice data repository for storage, and transmits unrecognized voice to the invalid voice library.
S4: The client transmits the instruction sound through the voice receiving module to the voice comparison module, which compares the incoming sound with the sound data in the voice data repository to obtain a matching instruction.
S5: The instruction output end outputs the instruction sound.
S6: When the sound data output for the client do not match what the client intended, feedback is given through the error feedback module.
The above description is only an example of the present invention, and is not intended to limit the present invention. Various modifications and alterations to this invention will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (7)

1. A speech library generation system, comprising a client, a voice recording system, a server and an instruction output end, wherein the voice recording system is connected to the server; voice data are collected by the voice recording system and transmitted to the server, which compares and stores the data; the client is connected to the server and inputs voice instructions into the server; and each voice instruction, after being compared by the server, is output through the instruction output end, thereby completing voice instruction output.
2. The speech library generation system according to claim 1, wherein the server comprises a voice matching classification module, a voice data repository, a voice receiving module and a voice comparison module; the voice recording system is connected to the voice matching classification module, which correspondingly classifies the voice data and instructions recorded by the voice recording system; the classified voice data are transmitted to the voice data repository for storage; a voice instruction sent by the client is input into the voice database through the voice receiving module; and the voice comparison module compares the voice instruction input by the client with the voice data in the voice data repository, so that the matching instruction is output through the instruction output end.
3. The speech library generation system according to claim 1, wherein the server is further provided with an invalid voice library; the voice matching classification module and the voice data repository are connected to the invalid voice library; unrecognizable voices recorded in the voice matching classification module are transmitted to the invalid voice library, so that invalid voice data are held in the invalid voice library rather than in the voice data repository, reducing the space occupied by invalid voice data in the voice data repository; and an administrator can periodically check the voice data in the invalid voice library for debugging.
4. The speech library generation system according to claim 1, wherein the server is provided with an error feedback module, and when the sound data output for the client do not match what the client intended, feedback can be given through the error feedback module, so that the server can be improved according to the client's requirements.
5. The speech library generation system according to claim 1, wherein the voice recording system consists of a task deployment module and a plurality of recording ends; tasks are deployed to the plurality of recording ends through the task deployment module; the recording ends record voice data according to the tasks deployed by the task deployment module; and the recorded voice data are transmitted to the server for centralized processing and storage, so that the voice data are collected and learned.
6. The speech library generation system according to claim 1, wherein a network uploading module is arranged at the recording end, and the sound data recorded by the recording end are transmitted to the server through the network uploading module, so that the recording efficiency of the recording end can be greatly improved and sound data can be recorded from different places.
7. A method for the speech library generation system according to any one of claims 1 to 6, comprising the following steps:
S1: the task deployment module deploys tasks to the plurality of recording ends;
S2: each recording end records the specified voice data according to the tasks deployed by the task deployment module and uploads the voice data to the server through the network uploading module;
S3: the voice matching classification module matches and classifies the sound data, transmits the classified sound data to the voice data repository for storage, and transmits unrecognized voice to the invalid voice library;
S4: the client transmits the instruction sound through the voice receiving module to the voice comparison module, which compares the incoming sound with the sound data in the voice data repository to obtain a matching instruction;
S5: the instruction output end outputs the instruction sound;
S6: when the sound data output for the client do not match what the client intended, feedback is given through the error feedback module.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110328947.5A CN113113019A (en) 2021-03-27 2021-03-27 Voice library generating system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110328947.5A CN113113019A (en) 2021-03-27 2021-03-27 Voice library generating system and method

Publications (1)

Publication Number Publication Date
CN113113019A (en) 2021-07-13

Family

ID=76712393

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110328947.5A Pending CN113113019A (en) 2021-03-27 2021-03-27 Voice library generating system and method

Country Status (1)

Country Link
CN (1) CN113113019A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101000767A (en) * 2006-01-09 2007-07-18 杭州世导科技有限公司 Speech recognition equipment and method
US20090210221A1 (en) * 2008-02-20 2009-08-20 Shin-Ichi Isobe Communication system for building speech database for speech synthesis, relay device therefor, and relay method therefor
CN101847406A (en) * 2010-05-18 2010-09-29 中国农业大学 Speech recognition query method and system
CN102708858A (en) * 2012-06-27 2012-10-03 厦门思德电子科技有限公司 Voice bank realization voice recognition system and method based on organizing way
CN203456091U (en) * 2013-04-03 2014-02-26 中金数据系统有限公司 Construction system of speech corpus
CN103927006A (en) * 2014-04-08 2014-07-16 弗徕威智能机器人科技(上海)有限公司 Robot based information interaction system and method
CN105206260A (en) * 2015-08-31 2015-12-30 努比亚技术有限公司 Terminal voice broadcasting method, device and terminal voice operation method
CN109102807A (en) * 2018-10-18 2018-12-28 珠海格力电器股份有限公司 Personalized voice database creation system, voice recognition control system and terminal
CN109389969A (en) * 2018-10-29 2019-02-26 百度在线网络技术(北京)有限公司 Corpus optimization method and device
CN109471931A (en) * 2018-11-22 2019-03-15 平安科技(深圳)有限公司 Corpus collection method, device, computer equipment and storage medium
CN109801628A (en) * 2019-02-11 2019-05-24 龙马智芯(珠海横琴)科技有限公司 Corpus collection method, apparatus and system

Non-Patent Citations (1)

Title
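王楠: "Application of Corpora in Pharmaceutical English Vocabulary Teaching" (语料库在药学英语词汇教学中的应用), Journal of Hubei University of Science and Technology (《湖北科技学院学报》) *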

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20210713