CN110600036A

CN110600036A - Conference picture switching device and method based on voice recognition

Info

Publication number: CN110600036A
Application number: CN201910907963.2A
Authority: CN
Inventors: Chen Honghao (陈洪浩); Feng Wenlan (冯文澜)
Original assignee: Suirui Technology Group Co Ltd
Current assignee: Suirui Technology Group Co Ltd
Priority date: 2019-09-24
Filing date: 2019-09-24
Publication date: 2019-12-20

Abstract

The invention discloses a conference picture switching device and method based on voice recognition. The method comprises the following steps: step one, generating a voice recognition library based on sign-in information and platform address book information; step two, performing conference voice recognition based on the voice recognition library to find a matching result; step three, performing semantic analysis on the matching result; and step four, switching pictures according to the semantic analysis result. The method makes conference picture switching more intelligent and gives participants an experience closer to a real face-to-face meeting. Because a semantic recognition function is added, the participant the user wants to watch can be determined accurately, and specific voice commands can be supported, which improves the conference experience.

Description

Conference picture switching device and method based on voice recognition

Technical Field

The present invention relates to the field of wireless communication, and more particularly, to a conference screen switching device and method based on voice recognition.

Background

During a video conference, and especially a multi-party conference, the displayed picture often has to be switched between participants to keep the conference effective.

In the prior art, conference pictures are switched either manually or by voice excitation (voice-activated switching). Manual switching requires a leader or another member to request a video switch, after which an operator performs it; the operator must first search for the picture to switch to, so efficiency is low and the experience is poor. With voice excitation, when a party speaks, the corresponding video switch is performed once the voice is detected. In practice, however, if the called participant has temporarily stepped away or did not hear the call, switching cannot be triggered; the misjudgment rate is high; active switching is not supported, so the other participants can only wait and keep asking. The effect and the experience are both poor.

The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.

Disclosure of Invention

The invention aims to provide a conference picture switching device and method based on voice recognition that make conference picture switching more intelligent and closer to the experience of a real face-to-face meeting. The device and method can accurately determine which participant the user wants to watch and can also support specific voice commands, thereby improving the conference experience.

In order to achieve the above object, the present invention provides a conference picture switching apparatus based on voice recognition, including: a voice recognition library generating module, which generates a voice recognition library based on the check-in information and the platform address book information; a voice recognition module, which performs conference voice recognition based on the voice recognition library to find a matching result; a semantic analysis module, which performs semantic analysis on the matching result of the voice recognition module; and a picture switching module, which switches pictures according to the semantic analysis result.

The invention also provides a conference picture switching method based on voice recognition, comprising the following steps: step one, generating a voice recognition library based on sign-in information and platform address book information; step two, performing conference voice recognition based on the voice recognition library to find a matching result; step three, performing semantic analysis on the matching result; and step four, switching pictures according to the semantic analysis result.

In a preferred embodiment, step one specifically includes: a sign-in step, a step of acquiring information on the participating members from the platform, and a step of generating a voice recognition detection list.

In a preferred embodiment, the platform address book information includes names, nicknames, and remark names (aliases assigned by other users).

In a preferred embodiment, the check-in information is acquired by face recognition, manual check-in, card-swipe check-in, or automatic terminal check-in.

In a preferred embodiment, step two specifically includes: a voice recognition step, a step of matching against the voice recognition library, and a match judgment step.

In a preferred embodiment, step three specifically includes: semantic learning and editing, semantic analysis and record generation, and judging whether the semantic scene matches.

In a preferred embodiment, the semantic analysis determines whether the main clause of a sentence calls a person, merely talks about a person, or commands an operation.

In a preferred embodiment, step four specifically includes: a step of checking the display policy and a step of switching the conference picture.

In a preferred embodiment, the display policy for switching the picture is as follows: the picture in which the person occupies a larger proportion of the frame is displayed preferentially; if the proportions are comparable, the picture showing the person's face from the front is displayed preferentially.
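The priority rule can be expressed as a small selection function. The sketch below is a minimal Python illustration, not the patent's implementation: the `CameraPicture` record, its `person_ratio` and `frontal` fields, and the 0.05 tolerance that decides when two proportions count as "comparable" are all assumptions, since the text does not say how the proportion or the frontal view is measured.

```python
from dataclasses import dataclass


@dataclass
class CameraPicture:
    """One candidate camera picture of the called person (hypothetical record)."""
    source: str           # e.g. "room1-system-cam", "laptop-cam" (illustrative labels)
    person_ratio: float   # assumed: fraction of the frame occupied by the person, 0.0 to 1.0
    frontal: bool         # assumed: True if the person's face is seen from the front


def pick_picture(pictures: list[CameraPicture], tolerance: float = 0.05) -> CameraPicture:
    """Prefer the largest person ratio; among comparable ratios, prefer a frontal view."""
    if not pictures:
        raise ValueError("no candidate pictures")
    best_ratio = max(p.person_ratio for p in pictures)
    # Candidates whose ratio is within the assumed tolerance of the best one.
    comparable = [p for p in pictures if best_ratio - p.person_ratio <= tolerance]
    # Frontal views rank first (True > False); ties fall back to the larger ratio.
    return max(comparable, key=lambda p: (p.frontal, p.person_ratio))
```

With a laptop camera filling 45% of the frame and a phone camera filling 43% with a frontal view, the two are comparable under the assumed tolerance, so the frontal phone picture is chosen; narrowing `tolerance` makes the proportion rule dominate more often.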

Compared with the prior art, the conference picture switching method based on voice recognition integrates the sign-in information and the address book information into the voice recognition library, which makes conference picture switching more intelligent and closer to a real face-to-face meeting. In addition, the semantic recognition function allows the participant the user wants to watch to be determined accurately and allows specific voice commands to be added, improving the conference experience.

Drawings

Fig. 1 is a flowchart of a conference screen switching method based on voice recognition according to an embodiment of the present invention.

Detailed Description

The following detailed description of the present invention is provided in conjunction with the accompanying drawings, but it should be understood that the scope of the present invention is not limited to the specific embodiments.

Throughout the specification and claims, unless explicitly stated otherwise, the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated element or component but not the exclusion of any other element or component.

The conference picture switching device based on voice recognition according to the preferred embodiment of the present invention comprises a voice recognition library generation module, a voice recognition module, a semantic analysis module, and a picture switching module. The voice recognition library generation module generates a voice recognition library based on the sign-in information and the platform address book information; the voice recognition module performs conference voice recognition based on the voice recognition library to find a matching result; the semantic analysis module performs semantic analysis on the matching result of the voice recognition module; and the picture switching module switches pictures according to the semantic analysis result.

As shown in Fig. 1, the main flow of the switching method of the conference picture switching device based on voice recognition according to the preferred embodiment is: recognize a name by voice → find the terminal (or conference room system) corresponding to the matched name → switch to the camera picture of that terminal (or conference room system). The method specifically comprises the following steps:
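The flow above (name → terminal → camera picture) can be sketched as a chain of pluggable stages, one per module. This is a hedged Python illustration: `SwitchPipeline`, `on_audio`, and the callable signatures are hypothetical, and the library generation module is assumed to run beforehand (at sign-in time), so it appears here only through the `match` and `lookup` callables it would feed.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SwitchPipeline:
    """Hypothetical wiring of the per-utterance modules; each field is a stand-in."""
    recognize: Callable[[bytes], str]          # voice recognition: audio -> text
    match: Callable[[str], Optional[str]]      # library matching: text -> name or None
    is_call: Callable[[str, str], bool]        # semantic analysis: is the name a call?
    lookup: Callable[[str], Optional[str]]     # query table: name -> terminal ID
    switch: Callable[[str], None]              # picture switching module

    def on_audio(self, audio: bytes) -> Optional[str]:
        """Return the terminal switched to, or None if nothing was switched."""
        text = self.recognize(audio)
        name = self.match(text)
        if name is None or not self.is_call(text, name):
            return None                        # no matched name, or only a mention
        terminal = self.lookup(name)
        if terminal is not None:
            self.switch(terminal)
        return terminal
```

For instance, with stubs where `match` finds "Li Si" and `is_call` accepts a name that opens the utterance, `on_audio(b"Li Si, what do you think?")` switches to Li Si's terminal, while `on_audio(b"I agree with Li Si")` switches nothing.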

Step one: generate a voice recognition library based on the sign-in information and the platform address book information.

This step mainly involves the platform address book and the sign-in function. The platform address book information includes the names and nicknames of the participants and their remark names. The check-in information is acquired by face recognition, manual check-in, card-swipe check-in, or automatic terminal check-in, and is used to determine information such as which members are present in each conference room and where each conference room is located. The conference picture switching method combines this information to form the voice recognition library and a query table.
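A minimal sketch of step one under an assumed data model: `Contact`, `CheckIn`, `build_library`, and all field names are hypothetical. It assumes that only checked-in members enter the library (one reading of the "voice recognition detection list") and that the query table keys terminals both by member and by conference room location, so that name calls and location commands can both be resolved.

```python
from dataclasses import dataclass, field


@dataclass
class Contact:
    """One platform address book entry (hypothetical schema)."""
    user_id: str
    name: str
    nicknames: list[str] = field(default_factory=list)
    remark_names: list[str] = field(default_factory=list)


@dataclass
class CheckIn:
    """One sign-in record, however obtained (face, manual, card, terminal)."""
    user_id: str
    terminal_id: str   # terminal or conference room system the member joined from
    location: str      # where that conference room is, e.g. "Beijing"


def _add(table: dict[str, list[str]], key: str, value: str) -> None:
    values = table.setdefault(key, [])
    if value not in values:            # avoid listing one terminal twice for a key
        values.append(value)


def build_library(contacts: list[Contact], check_ins: list[CheckIn]):
    """Return (library, terminals_by_user, terminals_by_location).

    library maps every spoken alias of a checked-in member to that member's ID.
    """
    present = {c.user_id for c in check_ins}
    library: dict[str, str] = {}
    for contact in contacts:
        if contact.user_id not in present:
            continue                   # assumed: only members present in the conference
        for alias in [contact.name, *contact.nicknames, *contact.remark_names]:
            library[alias] = contact.user_id
    terminals_by_user: dict[str, list[str]] = {}
    terminals_by_location: dict[str, list[str]] = {}
    for record in check_ins:
        _add(terminals_by_user, record.user_id, record.terminal_id)
        _add(terminals_by_location, record.location, record.terminal_id)
    return library, terminals_by_user, terminals_by_location
```

Keeping a list of terminals per member reflects the case described later, where one person is visible to several cameras at once and the display policy must choose among them.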

Step two: perform conference voice recognition based on the voice recognition library to find a matching result.

For example: "…… the rest of the time is handed over to Zhang San. Zhang San!" (that is, a call in which a name from the library is matched).
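The matching part of step two can be sketched as a left-to-right scan of the recognized text for any alias in the library. This Python sketch assumes the speech recognizer has already produced text; `find_matches` is a hypothetical name, and the longest-alias-first rule is an added assumption so that a short alias inside a longer one is not double-counted. Every occurrence is returned, because each matched name is handed on to semantic analysis.

```python
def find_matches(transcript: str, library: dict[str, str]) -> list[tuple[int, str, str]]:
    """Return (position, alias, user_id) for each alias occurrence, left to right.

    Longer aliases are tried first, so "Zhang Sanfeng" is not also reported
    as "Zhang San" when both are in the library.
    """
    # Skip empty aliases: they would match everywhere without advancing the scan.
    aliases = sorted((a for a in library if a), key=len, reverse=True)
    matches: list[tuple[int, str, str]] = []
    i = 0
    while i < len(transcript):
        for alias in aliases:
            if transcript.startswith(alias, i):
                matches.append((i, alias, library[alias]))
                i += len(alias)        # continue scanning after the matched alias
                break
        else:
            i += 1                     # no alias starts here; move one character on
    return matches
```

Plain substring matching suits Chinese transcripts, which have no spaces between words; for English transcripts a word-boundary check would avoid matching a name inside a longer word.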

Step three: perform semantic analysis on the matching result.

The semantic analysis determines whether the main clause of a sentence calls a person, talks about a person, or commands an operation. If the sentence is determined to be a call instruction, the designated conference site is found in the query table and the picture is switched. For example, in "…… the rest of the time is handed over to Zhang San. Zhang San!", the name "Zhang San" appears twice. Both occurrences are recognized during speech recognition, and because each is a recognized name, both are passed to the semantic analysis step, which decides whether each is a call instruction using the pause durations before and after the name, the coherence of the sentence, or other existing techniques.

During the conference the switching is imperceptible. For example, if Zhang San wants Li Si to give an opinion, he only needs to say "Li Si, what do you think?"; the picture then switches to Li Si, everyone sees Li Si's video while waiting for Li Si's answer, and the experience is closer to a real face-to-face meeting. Because the conference picture switching method adds a semantic recognition function, it can not only accurately determine the participant the user wants to watch but also support specific voice operations (command operations), such as switching to the Beijing conference site.
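The text leaves the call-versus-mention decision to pause durations, coherence, or other existing techniques. The sketch below shows only the pause-based option, plus a naive pattern for one command ("switch to <location>"). The 0.3 s threshold `PAUSE_S`, the function names, and the English command phrasing are assumptions for illustration; a real system might instead use a trained intent classifier.

```python
import re
from typing import Optional

PAUSE_S = 0.3  # hypothetical threshold (seconds): a pause this long sets a name apart


def classify_mention(pause_before: float, pause_after: float) -> str:
    """Return "call" if the name is isolated by pauses on both sides, else "mention".

    A name embedded in fluent speech ("handed over to Zhang San") reads as a
    mention; a name spoken on its own ("... Zhang San!") reads as a call.
    """
    if pause_before >= PAUSE_S and pause_after >= PAUSE_S:
        return "call"
    return "mention"


def parse_switch_command(text: str, locations: list[str]) -> Optional[str]:
    """Return the location named in a "switch to <location>" command, if any."""
    for loc in locations:
        pattern = r"\bswitch to (the )?" + re.escape(loc) + r"\b"
        if re.search(pattern, text, re.IGNORECASE):
            return loc
    return None
```

In the Zhang San example, the first occurrence runs on fluently from "handed over to" (short pauses, so a mention) while the second stands alone (a call), so only the second triggers a switch. Callers would pass `float("inf")` as the pause before a name that opens the utterance.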

Step four: switch the picture according to the semantic analysis result.

The picture is switched according to a display policy (for example, with a single window the picture is switched directly; with multiple windows, the picture in the large window, or in the top-ranked window, is switched). For example, Zhang San may have signed in to meeting room 1 while also joining the meeting from a laptop and a mobile phone in that room. Two or more cameras are then pointed at Zhang San, such as the meeting room 1 system cameras (a meeting room system may use more than two cameras) and the laptop terminal camera. In this case a priority algorithm can be applied: for instance, the picture in which the person occupies the larger proportion of the frame is displayed first (in this example, that is certainly the laptop terminal camera), and if the proportions are about the same, the picture showing the person's face from the front is displayed first. The method can also be combined with voice excitation to select the source with the shortest sound pickup distance.
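The window part of the display policy can be sketched as choosing which window receives the new picture. This assumes one reading of the examples above (single window: switch it directly; multiple windows: use the large window, otherwise the top-ranked one); the `Window` record and `target_window` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Window:
    """One video window in the current layout (hypothetical record)."""
    window_id: str
    is_large: bool = False   # the enlarged main window in a multi-window layout
    rank: int = 0            # display order; 0 is the top-ranked window


def target_window(windows: list[Window]) -> Optional[Window]:
    """Pick the window that receives the switched-in picture (assumed policy)."""
    if not windows:
        return None
    if len(windows) == 1:                 # single window: switch it directly
        return windows[0]
    large = [w for w in windows if w.is_large]
    if large:                             # multiple windows: prefer the large window
        return min(large, key=lambda w: w.rank)
    return min(windows, key=lambda w: w.rank)   # otherwise the top-ranked window
```

Separating "which window" from "which camera" keeps the two halves of the display policy independent, so either rule can change without touching the other.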

In conclusion, the conference picture switching method based on voice recognition integrates the sign-in information and the address book information into the voice recognition library, making conference picture switching more intelligent and closer to a real face-to-face meeting. The added semantic recognition function allows the participant the user wants to watch to be determined accurately and allows specific voice operations to be added, improving the conference experience.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The foregoing descriptions of specific exemplary embodiments of the present invention have been presented for purposes of illustration and description. It is not intended to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain certain principles of the invention and its practical application to enable one skilled in the art to make and use various exemplary embodiments of the invention and various alternatives and modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.

Claims

1. A conference screen switching apparatus based on voice recognition, comprising:

a voice recognition library generating module, which generates a voice recognition library based on check-in information and platform address book information;

a voice recognition module, which performs conference voice recognition based on the voice recognition library to find a matching result;

a semantic analysis module, which performs semantic analysis on the matching result of the voice recognition module; and

a picture switching module, which switches pictures according to the semantic analysis result.

2. A switching method of the conference screen switching apparatus based on voice recognition according to claim 1, comprising the steps of:

step one, generating a voice recognition library based on sign-in information and platform address book information;

step two, performing conference voice recognition based on the voice recognition library to find a matching result;

step three, performing semantic analysis on the matching result; and

step four, switching pictures according to the semantic analysis result.

3. The conference screen switching method based on speech recognition as claimed in claim 2, wherein step one specifically comprises: a sign-in step, a step of acquiring information on the participating members from the platform, and a step of generating a voice recognition detection list.

4. The conference screen switching method based on voice recognition according to claim 2, wherein the platform address book information includes names, nicknames, and remark names (aliases assigned by other users).

5. The conference screen switching method based on voice recognition according to claim 2, wherein the acquisition mode of the check-in information includes: face recognition, manual check-in, card swiping check-in and terminal automatic check-in.

6. The conference screen switching method based on speech recognition according to claim 3, wherein step two specifically comprises: a voice recognition step, a step of matching against the voice recognition library, and a match judgment step.

7. The conference screen switching method based on speech recognition according to claim 6, wherein step three specifically comprises: semantic learning and editing, semantic analysis and record generation, and judging whether the semantic scene matches.

8. The method as claimed in claim 2, wherein the semantic analysis determines whether a main sentence calls a person, talks about a person, or commands an operation.

9. The conference screen switching method based on speech recognition according to claim 7, wherein step four specifically comprises: a step of checking the display policy and a step of switching the conference picture.

10. The conference screen switching method based on speech recognition according to claim 2, wherein the display policy for switching the picture is: the picture in which the person occupies a larger proportion of the frame is displayed preferentially, and if the proportions are comparable, the picture showing the person's face from the front is displayed preferentially.