CN110600036A - Conference picture switching device and method based on voice recognition
- Publication number
- CN110600036A (application CN201910907963.2A)
- Authority
- CN
- China
- Prior art keywords
- voice recognition
- conference
- method based
- semantic analysis
- switching
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
The invention discloses a conference picture switching device and method based on voice recognition. The method comprises the following steps: step one, generating a voice recognition library based on check-in information and platform address book information; step two, performing conference voice recognition based on the voice recognition library to find a matching result; step three, performing semantic analysis on the matching result; and step four, switching pictures according to the semantic analysis result. The method makes conference picture switching more intelligent and brings the experience closer to that of a face-to-face meeting. Because a semantic recognition function is added, the method can accurately determine which participant the user wants to see and can also support specific voice operations, improving the conference experience.
Description
Technical Field
The present invention relates to the field of wireless communication, and more particularly, to a conference screen switching device and method based on voice recognition.
Background
During a video conference, especially a multi-party conference, the picture of the conference participant being shown often needs to be switched to ensure a better conference effect.
In the prior art, conference pictures are switched either manually or by voice excitation. With manual switching, a leader or another member must first request the switch, and an operator then performs it; because the operator has to search for the target conference picture during the operation, efficiency is low and the experience is poor. With voice excitation, the picture is switched to a party after that party speaks and the voice is detected. In practice, however, a participant who is called on may have stepped away temporarily or may not have heard the call, in which case the switching function is never triggered. The misjudgment rate is also high, and active switching is not supported: the other participants can only wait and keep asking, so both the result and the experience are poor.
The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.
Disclosure of Invention
The invention aims to provide a conference picture switching device and method based on voice recognition that make conference picture switching more intelligent and bring the experience closer to that of a face-to-face meeting. The invention can accurately determine which participant the user wants to see and can also support specific voice operations, thereby improving the conference experience.
In order to achieve the above object, the present invention provides a conference picture switching device based on voice recognition, including: a voice recognition library generating module, which generates a voice recognition library based on check-in information and platform address book information; a voice recognition module, which performs conference voice recognition based on the voice recognition library to find a matching result; a semantic analysis module, which performs semantic analysis on the matching result of the voice recognition module; and a picture switching module, which switches pictures according to the semantic analysis result.
The invention also provides a conference picture switching method based on voice recognition, which comprises the following steps: step one, generating a voice recognition library based on check-in information and platform address book information; step two, performing conference voice recognition based on the voice recognition library to find a matching result; step three, performing semantic analysis on the matching result; and step four, switching pictures according to the semantic analysis result.
In a preferred embodiment, step one specifically includes: a check-in step, a step of acquiring information on the participating members from the platform, and a step of generating a voice recognition detection list.
In a preferred embodiment, the platform address book information includes: names, nicknames, and remark names assigned by other users.
In a preferred embodiment, the check-in information is acquired by: face recognition, manual check-in, card-swipe check-in, or automatic terminal check-in.
In a preferred embodiment, step two specifically includes: a voice recognition step, a matching step based on the voice recognition library, and a matching judgment step.
In a preferred embodiment, step three specifically includes: a semantic learning and editing step, a semantic analysis and record generation step, and a step of judging whether the semantics conform to a scene.
In a preferred embodiment, the semantic analysis analyzes whether the main clause calls on a person, talks about a person, or commands an operation.
In a preferred embodiment, step four specifically includes: a step of checking the display strategy and a step of switching the conference picture.
In a preferred embodiment, the display strategy for switching the picture is as follows: the picture in which the person occupies the larger proportion of the frame is displayed preferentially, and if the proportions are comparable, the picture showing the person's front face is displayed preferentially.
Compared with the prior art, the conference picture switching method based on voice recognition integrates the check-in information and the address book information into the voice recognition library, which makes conference picture switching more intelligent and brings the experience closer to that of a face-to-face meeting. Because a semantic recognition function is added, the method can accurately determine which participant the user wants to see and can also support specific voice operations, improving the conference experience.
Drawings
Fig. 1 is a flowchart of a conference screen switching method based on voice recognition according to an embodiment of the present invention.
Detailed Description
The following detailed description of the present invention is provided in conjunction with the accompanying drawings, but it should be understood that the scope of the present invention is not limited to the specific embodiments.
Throughout the specification and claims, unless explicitly stated otherwise, the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated element or component but not the exclusion of any other element or component.
The conference picture switching device based on voice recognition according to the preferred embodiment of the present invention comprises a voice recognition library generating module, a voice recognition module, a semantic analysis module and a picture switching module. The voice recognition library generating module generates a voice recognition library based on the check-in information and the platform address book information; the voice recognition module performs conference voice recognition based on the voice recognition library to find a matching result; the semantic analysis module performs semantic analysis on the matching result of the voice recognition module; and the picture switching module switches pictures according to the semantic analysis result.
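The way the four modules hand results to one another can be sketched as follows. This is only an illustrative outline, not the patented implementation: the class name `ConferencePictureSwitcher`, its method names, and the convention that each module is injected as a plain callable are all assumptions made for the example.

```python
class ConferencePictureSwitcher:
    """Chains the four modules; each one is injected as a plain callable.

    All names here are hypothetical and chosen only for illustration.
    """

    def __init__(self, build_library, recognize, analyze, switch):
        self.build_library = build_library  # voice recognition library generating module
        self.recognize = recognize          # voice recognition module
        self.analyze = analyze              # semantic analysis module
        self.switch = switch                # picture switching module
        self.library = None

    def start(self, check_ins, address_book):
        # Step one: build the library once check-in data is available.
        self.library = self.build_library(check_ins, address_book)

    def on_speech(self, utterance):
        """Run steps two to four for one utterance; return True if a switch happened."""
        match = self.recognize(utterance, self.library)
        if match is None:
            return False  # no name or command found in the utterance
        target = self.analyze(utterance, match)
        if target is None:
            return False  # a name was matched but it was not a call or command
        self.switch(target)
        return True
```

Injecting the modules as callables keeps the sketch independent of any particular speech recognition engine or conferencing platform.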
As shown in fig. 1, the main flow of the switching method of the conference picture switching device based on voice recognition according to the preferred embodiment of the present invention is as follows: recognize a name by voice, match the name to its corresponding terminal (or conference room system), and switch to the camera picture of that terminal (or conference room system). The method specifically comprises the following steps:
First, a voice recognition library is generated based on check-in information and platform address book information.
This step mainly involves the platform address book and the check-in function. The platform address book information includes the names, nicknames and remark names of the participants. The check-in information is acquired by face recognition, manual check-in, card-swipe check-in or automatic terminal check-in, and is used to determine information such as which members are present in a conference room and where that conference room is located. The conference picture switching method combines this information to form a voice recognition library and a query table.
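A minimal sketch of this step is given below, assuming the address book and the check-in records are already available as simple records. The `Contact` and `CheckIn` types, their field names, and the function `build_recognition_library` are hypothetical; the patent does not specify a data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    """One address book entry (field names are assumptions)."""
    user_id: str
    name: str
    nicknames: tuple = ()
    remark_names: tuple = ()


@dataclass(frozen=True)
class CheckIn:
    """One check-in record (field names are assumptions)."""
    user_id: str
    venue: str   # e.g. "meeting room 1"
    method: str  # e.g. "face", "manual", "card" or "terminal"


def build_recognition_library(contacts, check_ins):
    """Return (phrases, query_table).

    phrases: every name, nickname and remark name of a checked-in member.
    query_table: maps each of those phrases to the venues where the member checked in.
    """
    by_id = {c.user_id: c for c in contacts}
    query_table = {}
    for record in check_ins:
        contact = by_id.get(record.user_id)
        if contact is None:
            continue  # checked in, but not present in the address book
        for alias in (contact.name, *contact.nicknames, *contact.remark_names):
            venues = query_table.setdefault(alias, [])
            if record.venue not in venues:
                venues.append(record.venue)
    # Longer phrases first, so a full name is tried before a shorter nickname.
    phrases = sorted(query_table, key=len, reverse=True)
    return phrases, query_table
```

Only members who have actually checked in end up in the library, which keeps the list of phrases to listen for short.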
Second, conference voice recognition is performed based on the voice recognition library to find a matching result.
For example, in the utterance "... the floor is next handed to Zhang III. Zhang III?", the name "Zhang III" is found in the voice recognition library (that is, this is a calling process in which a match is found).
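The matching step can be illustrated with plain substring search over the recognized text. This sketch assumes the speech has already been transcribed by some recognition engine (not shown), and the function name `find_name_matches` is hypothetical.

```python
def find_name_matches(transcript, phrases):
    """Return every occurrence of a library phrase as (offset, phrase), in spoken order."""
    matches = []
    for phrase in phrases:
        start = transcript.find(phrase)
        while start != -1:
            matches.append((start, phrase))
            # Continue searching after this occurrence to catch repeated names.
            start = transcript.find(phrase, start + len(phrase))
    return sorted(matches)
```

Every occurrence is reported, including repeated ones, because deciding which occurrence is an actual call is left to the semantic analysis step.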
Third, semantic analysis is performed on the matching result.
Semantic analysis means analyzing whether the main clause calls on a person, talks about a person, or commands an operation. If the semantics are determined to be a call instruction, the designated meeting place is found according to the query table and the picture is switched. For example, in "... the floor is next handed to Zhang III. Zhang III?", the name "Zhang III" appears twice and both occurrences are recognized by voice recognition. Because a name has been recognized, both occurrences are passed to the semantic analysis step, which determines whether each one is a call instruction according to the pauses before and after it, the coherence of the sentence, or other prior-art means. During the conference, the switching is seamless. For example, if Zhang III wants Li IV to give an opinion, Zhang III only needs to say "Li IV, what do you think?"; the picture then switches to Li IV, and everyone can see Li IV's video picture while waiting for the answer, which is closer to the experience of a face-to-face meeting. Because the conference picture switching method based on voice recognition adds a semantic recognition function, it can not only accurately determine which participant the user wants to see but also support specific voice operations (command operations), such as "switch to the Beijing venue".
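The three-way decision described above (call, mention, or command) can be sketched with a simple rule-based classifier. This is an assumption-laden illustration only: the pause threshold, the command prefix, and the `NameMatch` fields are invented for the example, and the patent leaves the actual analysis technique open ("pauses, coherence, or other prior-art means").

```python
from dataclasses import dataclass
from enum import Enum


class Intent(Enum):
    CALL = "call"        # the speaker is calling on the person
    MENTION = "mention"  # the speaker is only talking about the person
    COMMAND = "command"  # the speaker is commanding an operation


@dataclass(frozen=True)
class NameMatch:
    """A matched phrase with pause timing (hypothetical fields)."""
    phrase: str
    pause_before: float  # seconds of silence before the phrase
    pause_after: float   # seconds of silence after the phrase


COMMAND_PREFIX = "switch to "  # hypothetical trigger phrase
PAUSE_THRESHOLD = 0.4          # hypothetical threshold, in seconds


def classify(utterance, match):
    """Classify one matched phrase using the pauses around it."""
    if utterance.strip().lower().startswith(COMMAND_PREFIX):
        return Intent.COMMAND
    # A name set off by pauses on both sides is treated as a call.
    if match.pause_before >= PAUSE_THRESHOLD and match.pause_after >= PAUSE_THRESHOLD:
        return Intent.CALL
    return Intent.MENTION
```

With these rules, the first "Zhang III" in the example utterance, spoken inside a running sentence, is classified as a mention, while the second, isolated by pauses, is classified as a call. A production system would likely replace the fixed threshold with a trained model.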
Finally, the picture is switched according to the semantic analysis result.
The picture is switched based on a display strategy (for example, with a single window the picture is switched directly; with multiple windows the switch is made in the large window or in the window ranked first). For example, Zhang III may check in to meeting room 1 and at the same time join the conference from a notebook computer and a mobile phone in meeting room 1. In that case more than two cameras may be pointed at Zhang III, such as the meeting room 1 system cameras (a conference room system may have more than two cameras) and the notebook terminal camera. A priority algorithm can then be applied: for example, the picture in which the person occupies the larger proportion of the frame is displayed preferentially (in the example above, this would be the notebook terminal camera), and if the proportions are about the same, the picture showing the front face is displayed preferentially. The method can also be combined with voice excitation to select the device with the shortest sound pickup distance.
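The priority rule for choosing among several cameras can be sketched as below. The `CameraView` type, its fields, and the tolerance used to decide when two proportions are "about the same" are assumptions for illustration; the patent does not give numeric values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraView:
    """One candidate camera picture (hypothetical fields)."""
    camera_id: str
    person_ratio: float  # fraction of the frame occupied by the person, 0.0 to 1.0
    frontal: bool        # True if the person's front face is visible


RATIO_TOLERANCE = 0.05  # hypothetical: ratios this close count as "about the same"


def pick_view(views):
    """Prefer the largest person ratio; among comparable ratios, prefer a frontal view."""
    if not views:
        raise ValueError("no camera views available")
    best_ratio = max(v.person_ratio for v in views)
    comparable = [v for v in views if best_ratio - v.person_ratio <= RATIO_TOLERANCE]
    # Frontal views win among comparable candidates; ratio breaks remaining ties.
    return max(comparable, key=lambda v: (v.frontal, v.person_ratio))
```

A pickup-distance criterion from voice excitation, as mentioned above, could be added as a further element of the sort key.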
In conclusion, the conference picture switching method based on voice recognition integrates the check-in information and the address book information into the voice recognition library, which makes conference picture switching more intelligent and brings the experience closer to that of a face-to-face meeting. Because a semantic recognition function is added, the method can accurately determine which participant the user wants to see and can also support specific voice operations, improving the conference experience.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing descriptions of specific exemplary embodiments of the present invention have been presented for purposes of illustration and description. It is not intended to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain certain principles of the invention and its practical application to enable one skilled in the art to make and use various exemplary embodiments of the invention and various alternatives and modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.
Claims (10)
1. A conference screen switching apparatus based on voice recognition, comprising:
a voice recognition library generating module, which generates a voice recognition library based on check-in information and platform address book information;
a voice recognition module, which performs conference voice recognition based on the voice recognition library to find a matching result;
a semantic analysis module, which performs semantic analysis on the matching result of the voice recognition module; and
a picture switching module, which switches pictures according to the semantic analysis result.
2. The switching method of the conference screen switching apparatus based on the voice recognition as claimed in claim 1, comprising the steps of:
step one, generating a voice recognition library based on check-in information and platform address book information;
step two, performing conference voice recognition based on the voice recognition library to find a matching result;
step three, performing semantic analysis on the matching result; and
step four, switching pictures according to the semantic analysis result.
3. The conference screen switching method based on voice recognition according to claim 2, wherein step one specifically comprises: a check-in step, a step of acquiring information on the participating members from the platform, and a step of generating a voice recognition detection list.
4. The conference screen switching method based on voice recognition according to claim 2, wherein the platform address book information includes: names, nicknames, and remark names assigned by other users.
5. The conference screen switching method based on voice recognition according to claim 2, wherein the check-in information is acquired by: face recognition, manual check-in, card-swipe check-in, or automatic terminal check-in.
6. The conference screen switching method based on voice recognition according to claim 3, wherein step two specifically comprises: a voice recognition step, a matching step based on the voice recognition library, and a matching judgment step.
7. The conference screen switching method based on voice recognition according to claim 6, wherein step three specifically comprises: a semantic learning and editing step, a semantic analysis and record generation step, and a step of judging whether the semantics conform to a scene.
8. The conference screen switching method based on voice recognition according to claim 2, wherein the semantic analysis analyzes whether the main clause calls on a person, talks about a person, or commands an operation.
9. The conference screen switching method based on voice recognition according to claim 7, wherein step four specifically comprises: a step of checking the display strategy and a step of switching the conference picture.
10. The conference screen switching method based on voice recognition according to claim 2, wherein the display strategy for switching the picture is: the picture in which the person occupies the larger proportion of the frame is displayed preferentially, and if the proportions are comparable, the picture showing the person's front face is displayed preferentially.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910907963.2A CN110600036A (en) | 2019-09-24 | 2019-09-24 | Conference picture switching device and method based on voice recognition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910907963.2A CN110600036A (en) | 2019-09-24 | 2019-09-24 | Conference picture switching device and method based on voice recognition |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110600036A (en) | 2019-12-20 |
Family
ID=68862930
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910907963.2A (Pending; published as CN110600036A) | 2019-09-24 | 2019-09-24 | Conference picture switching device and method based on voice recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110600036A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111405232B (en) * | 2020-03-05 | 2021-08-06 | 深圳震有科技股份有限公司 | Video conference speaker picture switching processing method and device, equipment and medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101510990A (en) * | 2009-02-27 | 2009-08-19 | 深圳华为通信技术有限公司 | Method and system for processing remote presentation conference user signal |
CN102131071A (en) * | 2010-01-18 | 2011-07-20 | 华为终端有限公司 | Method and device for video screen switching |
CN102638671A (en) * | 2011-02-15 | 2012-08-15 | 华为终端有限公司 | Method and device for processing conference information in video conference |
CN105608754A (en) * | 2014-11-12 | 2016-05-25 | 中兴通讯股份有限公司 | Video conference signing method, video conference signing apparatus and video conference signing system |
CN106231259A (en) * | 2016-07-29 | 2016-12-14 | 北京小米移动软件有限公司 | The display packing of monitored picture, video player and server |
CN107277427A (en) * | 2017-05-16 | 2017-10-20 | 广州视源电子科技股份有限公司 | Method and device for automatically selecting camera picture and audio/video system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9064160B2 (en) | Meeting room participant recogniser | |
US20220254158A1 (en) | Learning situation analysis method, electronic device, and storage medium | |
US10334002B2 (en) | Communication device and method | |
EP3125154A1 (en) | Photo sharing method and device | |
CN107644646B (en) | Voice processing method and device for voice processing | |
CN112653902B (en) | Speaker recognition method and device and electronic equipment | |
US20160359941A1 (en) | Automated video editing based on activity in video conference | |
CN111258528B (en) | Voice user interface display method and conference terminal | |
WO2020119032A1 (en) | Biometric feature-based sound source tracking method, apparatus, device, and storage medium | |
US9641801B2 (en) | Method, apparatus, and system for presenting communication information in video communication | |
KR102453084B1 (en) | Electronic apparatus and method for controlling thereof | |
CN106331293A (en) | Incoming call information processing method and device | |
CN110769189B (en) | Video conference switching method and device and readable storage medium | |
CN112991553A (en) | Information display method and device, electronic equipment and storage medium | |
KR20140078258A (en) | Apparatus and method for controlling mobile device by conversation recognition, and apparatus for providing information by conversation recognition during a meeting | |
CN104991910A (en) | Album creation method and apparatus | |
CN117897930A (en) | Streaming data processing for hybrid online conferencing | |
CN110673811B (en) | Panoramic picture display method and device based on sound information positioning and storage medium | |
CN110351513B (en) | Court trial recording method and device, computer equipment and storage medium | |
CN106326804B (en) | Recording control method and device | |
CN110600036A (en) | Conference picture switching device and method based on voice recognition | |
CN112312039A (en) | Audio and video information acquisition method, device, equipment and storage medium | |
CN114240342A (en) | Conference control method and device | |
CN116193179A (en) | Conference recording method, terminal equipment and conference recording system | |
CN117768597B (en) | Guide broadcasting method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191220 |