CN113709545A - Video processing method and device, computer equipment and storage medium
- Publication number: CN113709545A
- Application number: CN202110395901.5A
- Authority: CN (China)
- Prior art keywords: target object, video, video picture, target, highlighting
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/431—Generation of visual interfaces for content selection or interaction; Content or additional data rendering
- H04N21/4312—Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/472—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
Abstract
The application relates to a video processing method and apparatus, a computer device, and a storage medium. The method includes: capturing and displaying a video picture in response to a trigger operation for capturing video; when an object specifying operation for specifying a target object in the video picture is triggered, highlighting the target object at a position in the video picture that matches the target object; and when the target object meets a highlight ending condition after being highlighted, canceling the highlighting of the target object in the video picture. With this method, viewers can understand the target object more intuitively and video information is conveyed more efficiently; since the video needs neither manual post-editing nor processing with auxiliary materials, video processing efficiency and user experience are both improved.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a video processing method and apparatus, a computer device, and a storage medium.
Background
With the rapid development of Internet technology and smart devices, a variety of video generation technologies have emerged, and video platforms have sprung up accordingly. Live videos, short videos, and the like have brought new ways of socializing and shopping online, greatly improving the convenience of people's lives.
At present, when generating a video or broadcasting it live, besides the main video material, the video is generally post-processed by manual editing or by adding auxiliary materials provided by the system, in order to enrich the information the video conveys and improve the viewing experience. However, both approaches require complicated user operations and are time-consuming and labor-intensive; video processing is therefore inefficient, and the information the video conveys is limited by what the user can operate.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a video processing method, apparatus, computer device, and storage medium that can improve the efficiency of processing video and enrich the video information conveyed.
A method of processing video, the method comprising:
capturing and displaying a video picture in response to a trigger operation for capturing video;
when an object specifying operation for specifying a target object in the video picture is triggered, highlighting the target object at a position in the video picture that matches the target object;
canceling the highlighting of the target object in the video picture when a highlight ending condition is met after the target object is highlighted.
In one embodiment, the highlighting the target object at a position in the video picture matching the target object when an object specifying operation for specifying a target object in the video picture is triggered includes: when a target object matching the voice captured synchronously with the video picture exists in the video picture, highlighting the target object at the position in the video picture that matches the target object.
In one embodiment, the highlighting the target object at a position in the video picture matching the target object when an object specifying operation for specifying a target object in the video picture is triggered includes: in response to a trigger operation for delineating a target object in the video picture, highlighting the target object at the position in the video picture that matches the target object.
In one embodiment, the highlighting the target object at the position in the video picture matching the target object comprises: displaying, in an enlarged manner at the position in the video picture that matches the target object, a target delineation area including the target object, wherein the target delineation area is an area cut out from the video picture that includes the target object.
In one embodiment, the highlighting the target object at the position in the video picture matching the target object comprises: gradually enlarging, over a preset duration, a target delineation area including the target object at the position in the video picture that matches the target object, wherein the target delineation area is an area cut out from the video picture that includes the target object.
In one embodiment, the highlighting the target object at the position in the video picture matching the target object comprises: displaying, in an enlarged manner, a target delineation area including the target object at the position in the video picture that matches the target object, and synchronously playing a special-effect audio for highlighting the target object, wherein the target delineation area is an area cut out from the video picture that includes the target object.
In one embodiment, the highlighting the target object at the position in the video picture matching the target object comprises: continuously drawing a stroke along the edge of the target object in the video picture until a closed outline is formed, then ending the drawing and displaying the closed outline.
In one embodiment, the highlighting the target object at the position in the video picture matching the target object comprises: when the target object moves in the video picture, highlighting the target object at a location in the video picture that varies with the movement of the target object in the video picture.
In one embodiment, the highlighting the target object at the position in the video picture matching the target object comprises: displaying, at the position in the video picture that matches the target object, a virtual decoration material for highlighting the target object.
In one embodiment, canceling the highlighting of the target object in the video picture when a highlight ending condition is satisfied after the target object is highlighted comprises: when the duration for which the target object has been highlighted is greater than a preset threshold, canceling the highlighting of the target object in the video picture.
In one embodiment, canceling the highlighting of the target object in the video picture when a highlight ending condition is satisfied after the target object is highlighted comprises: when the duration for which the target object has been highlighted is less than a preset threshold but the target object has moved out of the video picture, canceling the highlighting of the target object in the video picture.
In one embodiment, the target object is a first target object, and when an object specification operation for specifying a target object in the video picture is triggered, highlighting the target object at a position in the video picture that matches the target object includes: when an object designation operation for designating a first target object in the video picture is triggered, highlighting the first target object at a position in the video picture matching the first target object, and when an object designation operation for designating a second target object in the video picture is triggered, continuing highlighting the first target object in the video picture and highlighting the second target object in the video picture.
In one embodiment, the target object is a first target object, and when an object specification operation for specifying a target object in the video picture is triggered, highlighting the target object at a position in the video picture that matches the target object includes: when an object specifying operation for specifying a first target object in the video picture is triggered, highlighting the first target object at a position in the video picture matching the first target object, and when an object specifying operation for specifying a second target object in the video picture is triggered, canceling the highlighting of the first target object in the video picture and highlighting the second target object in the video picture.
In one embodiment, the trigger operation for capturing the video is a trigger operation for starting a live video broadcast, the video picture is a live video picture, and the method further includes: transmitting the live video picture to a user terminal; when the target object is highlighted in the live video picture, highlighting, by the user terminal, the target object in a video playing interface; and when the highlighting of the target object is canceled in the live video picture, canceling, by the user terminal, the highlighting of the target object in the video playing interface.
In one embodiment, the capturing and displaying a video picture in response to a trigger operation for capturing video includes: displaying a live video picture in response to a trigger operation for starting a live video broadcast; and the highlighting the target object at a position in the video picture that matches the target object when an object specifying operation for specifying a target object in the video picture is triggered includes: when target commodity information matching the live commentary voice captured synchronously with the live video picture exists in the live video picture, highlighting the target commodity information at a position in the live video picture that matches the target commodity information.
In one embodiment, the method further comprises: obtaining a target video in response to a trigger operation for ending video capture; playing the target video and displaying a video playing interface in response to a trigger operation for playing the target video; when a video picture including the target object is played, obtaining the voice captured synchronously with the video picture; and comparing the sound spectrum image corresponding to the voice with preset sound spectrum images corresponding to various objects, and highlighting the target object at the position in the video playing interface that matches the target object.
In one embodiment, the method further comprises: acquiring the voice captured synchronously with the video picture; comparing the sound spectrum image corresponding to the voice with preset sound spectrum images corresponding to various objects, and determining the object matching the voice according to the comparison result; performing object recognition on the video picture to identify the objects it contains; and matching the object matching the voice against the recognized objects, and when the matching succeeds, determining that a target object matching the voice captured synchronously with the video picture exists in the video picture.
An apparatus for processing video, the apparatus comprising:
the response module is used for capturing and displaying a video picture in response to a trigger operation for capturing video;
the first display module is used for highlighting a target object at a position in the video picture that matches the target object when an object specifying operation for specifying the target object in the video picture is triggered;
and the second display module is used for canceling the highlighting of the target object in the video picture when the target object meets a highlight ending condition after being highlighted.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
capturing and displaying a video picture in response to a trigger operation for capturing video;
when an object specifying operation for specifying a target object in the video picture is triggered, highlighting the target object at a position in the video picture that matches the target object;
canceling the highlighting of the target object in the video picture when a highlight ending condition is met after the target object is highlighted.
A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, performs the steps of:
capturing and displaying a video picture in response to a trigger operation for capturing video;
when an object specifying operation for specifying a target object in the video picture is triggered, highlighting the target object at a position in the video picture that matches the target object;
canceling the highlighting of the target object in the video picture when a highlight ending condition is met after the target object is highlighted.
A computer program comprising computer instructions stored in a computer readable storage medium, the computer instructions being read from the computer readable storage medium by a processor of a computer device, the processor executing the computer instructions, causing the computer device to perform the steps of the method of processing video described above.
According to the above video processing method and apparatus, computer device, and storage medium, during video capture, when an object specifying operation for specifying a target object in the video picture is triggered, the target object is automatically highlighted in the video picture until a highlight ending condition is met, after which the highlighting is canceled and the normal video picture is restored. Viewers can thus understand the target object more intuitively, and video information is conveyed more efficiently; since no manual post-editing or auxiliary materials are needed to process the video, video processing efficiency and user experience are improved as well.
A method of processing video, the method comprising:
playing the video and displaying a video playing interface in response to a trigger operation for playing the video;
when an object specifying operation for specifying a target object in a played video picture is triggered, highlighting the target object at a position in the video playing interface that matches the target object;
and when the target object meets a highlight ending condition after being highlighted, canceling the highlighting of the target object in the video playing interface.
An apparatus for processing video, the apparatus comprising:
the response module is used for playing the video and displaying a video playing interface in response to a trigger operation for playing the video;
the first display module is used for highlighting the target object at a position in the video playing interface that matches the target object when an object specifying operation for specifying a target object in a played video picture is triggered;
and the second display module is used for canceling the highlighting of the target object in the video playing interface when a highlight ending condition is met after the target object is highlighted.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
playing the video and displaying a video playing interface in response to a trigger operation for playing the video;
when an object specifying operation for specifying a target object in a played video picture is triggered, highlighting the target object at a position in the video playing interface that matches the target object;
and when the target object meets a highlight ending condition after being highlighted, canceling the highlighting of the target object in the video playing interface.
A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, performs the steps of:
playing the video and displaying a video playing interface in response to a trigger operation for playing the video;
when an object specifying operation for specifying a target object in a played video picture is triggered, highlighting the target object at a position in the video playing interface that matches the target object;
and when the target object meets a highlight ending condition after being highlighted, canceling the highlighting of the target object in the video playing interface.
A computer program comprising computer instructions stored in a computer readable storage medium, the computer instructions being read from the computer readable storage medium by a processor of a computer device, the processor executing the computer instructions, causing the computer device to perform the steps of the method of processing video described above.
According to the above video processing method and apparatus, computer device, and storage medium, during video playback, when an object specifying operation for specifying a target object in the video is triggered, the target object is automatically highlighted in the video playing interface until a highlight ending condition is met, after which the highlighting is canceled and normal playback is restored. Viewers can thus understand the target object more intuitively while watching, and video information is conveyed more efficiently; since no manual post-editing or auxiliary materials are needed to process the video, video processing efficiency and user experience are improved as well.
Drawings
FIG. 1 is a diagram of an exemplary video processing system;
FIG. 2 is a flow diagram that illustrates a method for processing video, according to one embodiment;
FIG. 3 is a flowchart illustrating a video processing method according to another embodiment;
FIG. 4 is a flow diagram that illustrates highlighting of a target object in a video frame, under an embodiment;
FIG. 5 is a flowchart illustrating a video processing method according to another embodiment;
FIG. 6 is a flow diagram illustrating a process for displaying a target object in a video frame in an enlarged manner, according to an embodiment;
FIG. 7 is a diagram illustrating an interface of a video frame according to an embodiment;
FIG. 8 is a diagram illustrating the delineation of a target region by a gesture in one embodiment;
FIG. 9 is a flowchart showing a video processing method according to still another embodiment;
FIG. 10 is a flow diagram of a method for processing video in accordance with one illustrative embodiment;
FIG. 11 is a block diagram showing a configuration of a video processing apparatus according to one embodiment;
FIG. 12 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The video processing method provided by this application can be applied to the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network, and the server 104 communicates with the user terminal 106 via a network. In one embodiment, the terminal 102 captures and displays a video picture in response to a trigger operation for capturing video; when an object specifying operation for specifying a target object in the video picture is triggered, the terminal highlights the target object at the position in the video picture that matches the target object; and when the target object meets the highlight ending condition after being highlighted, the terminal cancels the highlighting of the target object in the video picture.
In one embodiment, the terminal 102 may transmit the video frames to the server 104 in real time, and the server 104 pushes the video frames to each user terminal 106 in real time, so that each user terminal 106 can view the video frames captured by the terminal 102 in real time; when the terminal 102 highlights the target object at a position in the video frame matching the target object, the user terminal 106 highlights the target object in the video frame synchronously with the terminal 102.
Further, in one embodiment, the terminal 102 may obtain a target video in response to a trigger operation for ending video capture, send the target video to the server 104, and push it through the server 104 to each user terminal 106, which then plays the target video and displays a video playing interface; when playback reaches a video picture including the target object, the target object is highlighted at the position in the video playing interface that matches the target object. That is, for a target object highlighted during video capture, when the user terminal 106 plays the resulting target video and reaches a video picture including that target object, the target object is highlighted again.
The terminal 102 and the user terminal 106 may each be, but are not limited to, a personal computer, a notebook computer, a smart phone, a tablet computer, a desktop computer, a smart speaker, a smart watch, or a portable wearable device. The server 104 may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The server 104 may also be a blockchain node in a blockchain network; for example, the video data collected by the terminal 102 may be stored on the blockchain node.
In a specific application scenario, the terminal 102 is a smart device used by a user who shoots a video, and runs an application supporting a video shooting function. After the user enters the application's interaction interface, a camera on the terminal 102 is invoked through the interface to shoot the video and display the video picture. The user can introduce or explain the shot object while shooting, or one person can shoot while another explains; when a target object matching the commentary exists in the shot video picture, the target object is highlighted at a position in the video picture that matches it. It will be appreciated that the person giving the commentary may or may not appear in the video picture.
In another specific application scenario, the terminal 102 is a smart device used by a user who plays a video, and runs an application supporting a video playing function. After entering the application, the user selects a video to play from the interaction interface, and the video picture and voice are played synchronously; when a target object matching the current voice exists in the played video picture, the target object is highlighted at a position in the video picture that matches it, so that the user pays more attention to the target object, the video is more engaging, and the user experience is better.
Alternatively, the application running on the terminal 102 may be a native application running on the basis of an operating system on the terminal 102, or may be a sub-application running on the basis of an environment provided by the native application, such as an applet.
In one embodiment, as shown in FIG. 2, a video processing method is provided. The method is described here as applied to a computer device (e.g., the terminal 102) in FIG. 1, and includes the following steps:
Step 202: capture and display a video picture in response to a trigger operation for capturing video.
The trigger operation of capturing the video may be a trigger operation of initiating video shooting or video live broadcasting. The trigger operation may be a single-click operation, a double-click operation, or a slide operation triggered in the user interaction interface. The video picture is a real-time picture for presenting video content, and the video picture can fill the screen of the terminal or only occupy a part of the screen of the terminal. The video picture may be a real-time picture when the video is shot, for example, the video picture may be a picture shot by a user in real time when the video is recorded, or may be a live video picture collected by a live broadcast device when the user performs live video broadcast.
In some embodiments, the video frame may further include a control for processing the video, and when the user needs to perform some video processing on the acquired video frame, the terminal may acquire a trigger operation of the user for the control to process the acquired video frame. For example, the terminal may obtain a trigger operation of a user for a video beautification control, and beautify video content in the acquired video picture. The terminal can acquire the triggering operation of the user for the audio adding control and add background music to the acquired video picture. The terminal can acquire the triggering operation of the user for the video material adding control, and add materials, such as stickers, animation effects and the like, into the acquired video pictures.
In one embodiment, the video processing method may be applied to a video content generation application. In particular, when a user desires to convey information to a viewer in a manner that generates a video, the video may be generated by the video content generation application. The terminal can display a user interaction interface of the video content generation application, and starts to acquire and display video pictures after video shooting is started in the user interaction interface.
In one embodiment, the video processing method can be applied to an online shopping application. Specifically, when a user needs to introduce a commodity to a viewer in a live video mode, the commodity can be live video through the online shopping application. The terminal can enter a live video interaction interface in the online shopping application, obtain live video operation triggered by a user in the live video interaction interface, and collect and display live video pictures of the commodity after the live video is started.
In some embodiments, the video processing method may also be applied to other video live broadcast applications, such as an educational video live broadcast application, a sporting event video live broadcast application, an online shopping video live broadcast application, and the like. For live broadcasting in different scenes, the content in the collected video pictures is different.
Step 204: when an object specifying operation for specifying a target object in the video picture is triggered, highlight the target object at the position in the video picture that matches the target object.
The target object is any object that can be presented in the video picture, and may be any object in daily life. For example, in the daily video shooting scenario, the target object may be a simulated yellow duck, a car, a lipstick, and a skirt. In addition, the target object may also be description information related to some object, for example, in the context of live online shopping, the target object may be a product pattern appearing in a promotion poster of a product, or a product name, a product pattern appearing in an online purchasing interface of a product, and the like.
The position in the video frame that matches the target object may be a position within a preset range centered on the target object, for example within a preset range with the target object as its geometric center, or a position close to the target object. Further, the position is attached to the target object, so that it moves as the target object moves in the video picture.
The object specifying operation is a trigger operation for specifying a target object in the video picture; it may be recognized automatically by the terminal or recognized from a trigger operation by the user. In one embodiment, the operation is recognized automatically: for example, when the terminal detects that a target object matching the voice captured synchronously with the video picture exists in the captured video picture, the object specifying operation for specifying that target object is triggered. In one embodiment, the operation is recognized from a user trigger: for example, when the terminal detects a triggered operation for delineating a target object during video capture, the object specifying operation for specifying that target object is triggered.
Specifically, in the video acquisition process, when the terminal detects that an object designation operation for designating a target object in a video picture is triggered, the target object is highlighted at a position matched with the target object in the video picture. Optionally, in the process of capturing the video, when an object specifying operation for specifying a target object in the video picture is not triggered, the video picture is normally displayed, and each object of the video picture is normally displayed.
It can be understood that highlighting a target object in the video picture differs from displaying it normally: when displayed normally, the target object is shown at the size and/or style in which it was originally captured, without further processing of the object or the video content; when highlighted, the target object stands out in the video picture and attracts attention. The manner in which the target object is highlighted is described below.
Step 206: cancel the highlighting of the target object in the video picture when the target object meets the highlight ending condition after being highlighted.
The highlight ending condition is a trigger condition for canceling the highlighting of a target object that is in the highlighted state. The highlight ending condition may include at least one trigger condition, and when it includes multiple trigger conditions, there may be a priority among them. For example, the highlight ending condition may be at least one of: the duration for which the target object has been highlighted exceeds a preset threshold, the target object moves out of the video picture, the target object is occluded in the video picture, or an object specifying operation for another target object in the video picture is triggered. Optionally, when the highlight ending condition includes multiple trigger conditions, it is sufficient for any one of them to be satisfied.
Specifically, when the terminal detects an object specifying operation for specifying a target object in the video picture, it highlights the target object at the matching position in the video picture, and the highlighting continues until the terminal detects that the highlight ending condition is met, at which point the highlighting is canceled and the normal video picture is restored. Optionally, as long as the highlight ending condition is not met, the terminal continues to highlight the target object at the position in the video picture that matches it.
In one embodiment, canceling the highlighting restores the target object to the state in which it was normally shown in the video picture before it was highlighted.
In one embodiment, when the duration for which the target object has been highlighted is less than a preset threshold, the terminal continues to highlight the target object in the video picture; when the duration is greater than the preset threshold, the highlighting of the target object in the video picture is canceled.
The preset threshold can be set according to actual needs, for example, 1 to 5 seconds. For instance, when the target object has been highlighted in the video picture for less than 3 seconds, it continues to be highlighted; once more than 3 seconds have elapsed since highlighting began, the target object is no longer highlighted and is displayed at its originally captured style and size.
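As an illustration of this timing rule, the following is a minimal sketch; HighlightState and its members are hypothetical names for illustration, not taken from the patent.

```java
// Minimal sketch of the duration check; all names here are illustrative.
class HighlightState {
    static final long PRESET_THRESHOLD_MS = 3_000; // e.g. a 3-second threshold
    long highlightStartMs = -1;                    // set when highlighting begins

    // Called once per rendered frame while a target object is highlighted;
    // returns true when the highlighting should be canceled.
    boolean shouldCancel(long nowMs) {
        return highlightStartMs >= 0
                && nowMs - highlightStartMs > PRESET_THRESHOLD_MS;
    }
}
```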
In some embodiments, in order to highlight the target object, a feature parameter matrix for transforming the target object needs to be obtained, where each feature parameter in the feature parameter matrix is a parameter corresponding to a different transformation manner. For example, the characteristic parameter matrix includes parameters corresponding to respective transformation modes such as translation, scaling, rotation, tilting, perspective and the like of the target object. The terminal can perform at least one of translation, scaling, rotation, inclination, perspective and other transformation processing on the target object according to the characteristic parameter matrix so as to realize the highlight display of the target object.
In one embodiment, the characteristic parameter matrix may be illustrated as follows:
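In the standard 3 x 3 form used for two-dimensional affine and perspective transformations (a plausible layout; the exact arrangement is an assumption here), the matrix can be written as:

$$M = \begin{bmatrix} s_x & k_x & t_x \\ k_y & s_y & t_y \\ p_0 & p_1 & p_2 \end{bmatrix}$$

where $s_x, s_y$ are the abscissa and ordinate scaling parameters, $k_x, k_y$ the abscissa and ordinate tilt parameters, $t_x, t_y$ the abscissa and ordinate translation parameters, and $p_0, p_1, p_2$ the first, second, and third perspective parameters.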
The rotation angle corresponding to the rotation transformation can be determined from the abscissa scaling parameter, the ordinate scaling parameter, the abscissa tilt parameter, and the ordinate tilt parameter; the scaling ratio corresponding to the scaling transformation can be determined from the abscissa scaling parameter and the ordinate scaling parameter; the translation distance corresponding to the translation transformation can be determined from the abscissa translation parameter and the ordinate translation parameter; the degree of tilt corresponding to the tilt transformation can be determined jointly from the abscissa tilt parameter and the ordinate tilt parameter; and the transformation parameters corresponding to the three-dimensional perspective transformation can be determined jointly from the first, second, and third perspective parameters. Three-dimensional perspective transformation, also called projection transformation, is the process of projecting object coordinates in three-dimensional space onto screen coordinates.
In one embodiment, the feature parameters corresponding to the respective transformation modes may be fixedly set, that is, the feature transformation parameter matrix is fixed. Of course, the feature transformation parameter matrix may also be adaptively changed according to the current target object.
Optionally, when the feature transformation parameter matrix is fixed, after the terminal obtains the position of the target object in the video picture, it may use a TextureView component (a component that controls how the video picture is displayed on the screen) and call the setTransform method of the TextureView component; after the feature transformation parameter matrix is passed in directly, i.e., setTransform(matrix), the coordinates of the pixels at the position of the target object are transformed accordingly, thereby rendering the target object highlighted in the video picture.
In an embodiment, the feature transformation parameter matrix may also adapt to the current target object, that is, the terminal needs to calculate the value of each parameter in the matrix. For example, the highlighted size of a target object may be a fixed preset size, so that every highlighted target object appears at the same size in the video frame; since different target objects start from different original sizes, the feature transformation parameter matrices that produce this same highlighted size differ from object to object.
Specifically, the terminal needs to determine the original pattern of the target object in the video frame and calculate the value of each parameter in the feature transformation parameter matrix from the original pattern and the preset highlighted pattern. For example, the terminal determines the original size of the target object in the video frame and calculates the abscissa and ordinate scaling parameters from the original size and the preset highlighted size; it calculates the abscissa and ordinate tilt parameters from the original angle of the target object in the video frame and the preset angle required when highlighted; and it calculates the abscissa and ordinate translation parameters from the original position of the target object in the video frame and the preset position when highlighted.
After the terminal obtains the feature transformation parameter matrix required for highlighting the target object in the video picture, it calls the setTransform method of the TextureView component and passes in the matrix; the coordinates of the pixels at the position of the target object are then transformed accordingly, thereby rendering the target object highlighted in the video picture.
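A minimal sketch of this on Android, assuming the detected original region and the preset highlighted region are available as RectF values; the helper below is an illustrative sketch, not the patent's implementation:

```java
import android.graphics.Matrix;
import android.graphics.RectF;
import android.view.TextureView;

// Sketch: derive the scaling and translation parameters that map the object's
// original region onto the preset highlighted region, then apply them.
// The original/preset rectangles are assumed inputs for illustration.
class HighlightTransform {
    static void highlight(TextureView textureView, RectF original, RectF preset) {
        Matrix matrix = new Matrix();
        // Fills in the scale and translate entries of the feature parameter
        // matrix (rotation, tilt and perspective stay at identity here).
        matrix.setRectToRect(original, preset, Matrix.ScaleToFit.CENTER);
        // Transform the displayed frame so that the target object's region
        // lands on the preset highlighted region.
        textureView.setTransform(matrix);
        textureView.invalidate();
    }
}
```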
In some embodiments, the terminal may also render the highlighted target object using a SurfaceView component (a view component used to draw images or animations). During video capture, capturing and displaying the video picture runs as the main thread, while highlighting the target object runs as a sub-thread: the main thread obtains the video frame data collected by the camera, the region where the target object is located is video-encoded using MediaCodec hardware encoding to obtain encoded video data, and the encoded data of that region is then rendered by the SurfaceView component in the manner corresponding to the highlight. Rendering video data through the SurfaceView component does not share a thread with the whole video picture; it can be drawn independently on a sub-thread without occupying main-thread resources.
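A simplified sketch of the sub-thread drawing pattern described above; the MediaCodec encoding step is omitted, and regionFrame is assumed to already hold the pixels of the region where the target object is located:

```java
import android.graphics.Bitmap;
import android.graphics.Canvas;
import android.view.SurfaceHolder;
import android.view.SurfaceView;

// Sketch only: draws the target object's region on a worker thread through
// the SurfaceView, so the main capture/display thread is not blocked.
class HighlightRenderer {
    static void drawHighlightOnWorker(SurfaceView overlay, Bitmap regionFrame) {
        new Thread(() -> {
            SurfaceHolder holder = overlay.getHolder();
            Canvas canvas = holder.lockCanvas(); // may be null if not ready
            if (canvas == null) {
                return;
            }
            try {
                canvas.drawBitmap(regionFrame, 0f, 0f, null);
            } finally {
                holder.unlockCanvasAndPost(canvas); // push to the screen
            }
        }, "highlight-render").start();
    }
}
```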
In some embodiments, the terminal may further determine the proportion of the video picture occupied by the target object, determine from this proportion the scaling parameters (abscissa and ordinate) for highlighting the target object, and re-render the pixels where the target object is located according to those parameters. For example, for a target object occupying a large part of the video picture, the terminal may shrink it and display it at a fixed or different position to highlight it; for a target object occupying a small part of the video picture, the terminal may enlarge it. The terminal can obtain the preset highlighted size of the target object and determine the scaling parameters from the object's proportion of the video picture and that preset size.
According to this video processing method, during video capture, when an object specifying operation for specifying a target object in the video picture is triggered, the target object is automatically highlighted in the video picture until a highlight ending condition is met, after which the highlighting is canceled and the normal video picture is restored. Viewers can thus understand the target object more intuitively, and video information is conveyed more efficiently; since no manual post-editing or auxiliary materials are needed, video processing efficiency and user experience are improved as well.
In one embodiment, the object specifying operation is an operation automatically recognized by the terminal, for example, when the terminal detects that a target object matched with a voice captured in synchronization with a video picture exists in the captured video picture, the object specifying operation for specifying the target object is triggered. That is, as shown in fig. 3, the video processing method includes:
and step 302, collecting and displaying a video picture in response to the triggering operation of the collected video.
And step 304, when a target object matched with the voice synchronously collected with the video picture exists in the video picture, highlighting the target object at a position matched with the target object in the video picture.
And step 306, canceling the highlighting of the target object in the video picture when the target object meets the highlighting ending condition after being highlighted.
The voice captured synchronously with the video picture is the voice collected while the terminal captures the video. In some embodiments, it may be the voice of a speaker on the scene, such as a user introducing a food item while filming it, or a host explaining a product during a live broadcast. In some embodiments, it may also be speech played by a device on the scene, such as a robot explaining an object or a voice guide played by a smart device.
Specifically, in the process of acquiring the video picture by the terminal, when a target object matched with the voice appears in the video picture, the terminal determines the position of the target object in the video picture, and highlights the target object at the position. The highlighted target object is more obvious in the video picture, the attention of the user can be attracted, the capability of transmitting information to the user by the video is improved, and the video content is richer.
In one embodiment, in the process of acquiring a video picture, the terminal automatically identifies a target object matched with the synchronously acquired voice, automatically identifies an object in the video picture, matches the identified object with the target object, and highlights the target object in the video picture when the matching is successful.
FIG. 4 is a flow diagram that illustrates a process for highlighting a target object in a video frame, in one embodiment. Referring to FIG. 4, while the terminal captures the video picture, the person filming gives a commentary; when the terminal determines that an object mentioned by the voice exists in the video picture, that object is highlighted, otherwise the video picture is displayed normally.
In an embodiment, the video processing method further includes: acquiring the voice captured synchronously with the video picture; comparing the sound spectrum image corresponding to the voice with preset sound spectrum images corresponding to various objects, and determining the object matching the voice according to the comparison result; performing object recognition on the video picture to identify the objects it contains; and matching the object matching the voice against the recognized objects, and when the matching succeeds, determining that a target object matching the voice captured synchronously with the video picture exists in the video picture.
The sound spectrum image is the sound waveform produced when the name of an object is spoken; different object names correspond to different sound spectrum images. The terminal captures voice synchronously while capturing the video picture, compares the sound spectrum image of that voice with the preset sound spectrum images corresponding to various objects, and determines from the comparison result which object the voice mentions. For example, when the comparison result shows that the similarity between spectrum images exceeds a threshold, the object mentioned by the voice can be determined. Meanwhile, the terminal performs object recognition on the captured video picture, and when the recognized objects include the object mentioned by the voice, that object is highlighted in the video picture.
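A minimal sketch of the comparison step, assuming each sound spectrum image has already been reduced to a fixed-length feature vector; the cosine-similarity measure and the 0.8 threshold are illustrative assumptions, not taken from the patent:

```java
import java.util.Map;

// Illustrative only: the patent does not specify the similarity measure.
class VoiceMatcher {
    static double cosineSimilarity(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the name of the preset object whose stored spectrum best matches
    // the captured voice spectrum, or null when nothing exceeds the threshold.
    static String matchVoice(float[] voiceSpectrum, Map<String, float[]> presets) {
        String best = null;
        double bestScore = 0.8; // assumed similarity threshold
        for (Map.Entry<String, float[]> e : presets.entrySet()) {
            double score = cosineSimilarity(voiceSpectrum, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }
}
```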
In one embodiment, the terminal may store the sound spectrum images corresponding to various objects in advance, associating each object name with its spectrum images. For example, the name "little yellow duck" can be spoken in Mandarin, the sound spectrum image of that speech obtained, and the image associated with "little yellow duck". Of course, each object may have several sound spectrum images, for example one spoken in Mandarin, one spoken in a dialect, and one spoken in English.
In one embodiment, the terminal may store images corresponding to various types of objects in advance, and associate the object name with the image corresponding to the object. The images corresponding to the object may include images from different perspectives, under different scenes. For example, by taking pictures of the 'little yellow duck' at different angles, including pictures of front view, side view, back view, top view, bottom view and other special angles, the taken pictures are associated with the 'little yellow duck' and then stored.
Optionally, the preset sound spectrum images of various objects and the images corresponding to various objects can be stored on the server. The terminal sends the captured video picture and voice to the server; the server compares the video picture with the stored object images to recognize the objects in the picture, extracts the sound spectrum image corresponding to the voice and compares it with the stored sound spectrum images to identify the object the voice mentions, matches the two results, and returns the matching result to the terminal.
In one embodiment, the terminal may also perform voice recognition on voice collected synchronously with the video picture, and convert the voice into text, so as to determine the name of the object mentioned by the voice, and meanwhile, the terminal performs object recognition on the collected video picture, identifies an object in the video picture, and highlights the object mentioned by the voice in the video picture when the identified object includes the name of the object mentioned by the voice.
In one embodiment, the terminal may perform object detection on each captured video frame, identifying the locations where objects exist and the corresponding object names; an object's location in the frame may be represented by coordinates. Optionally, the terminal may use a pre-trained neural-network object detection model for this. The terminal also performs voice recognition on the voice collected during capture, converts it into text, matches the object names extracted from the video picture against that text, and triggers the object specifying operation for an object whose name matches.
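A sketch of this matching step; DetectedObject and the method names are placeholders standing in for the outputs of the detection model and the speech recognizer mentioned above:

```java
import java.util.List;

// Placeholder type for one detection result: the recognized object name plus
// the coordinates of its bounding box in the video frame.
record DetectedObject(String name, float left, float top, float right, float bottom) {}

class SpokenObjectFinder {
    // Triggers the object specifying operation when any detected object's name
    // appears in the text transcribed from the synchronously captured voice.
    static DetectedObject findSpokenObject(List<DetectedObject> detections,
                                           String transcript) {
        for (DetectedObject obj : detections) {
            if (transcript.contains(obj.name())) {
                return obj; // this object's position drives the highlight
            }
        }
        return null; // nothing matched; the video picture is shown normally
    }
}
```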
In one embodiment, to reduce the resource consumption of identifying the target object in every captured frame, the terminal may identify the target object matching the voice only when voice is captured, and perform object detection only on the video pictures captured synchronously with that voice. For example, if during the 10th to 13th seconds of video capture the user says "Look! There is a little yellow duck", the terminal only needs to perform object detection on the video pictures captured during those seconds to determine whether the little yellow duck is present, avoiding the cost of detecting objects in every captured frame.
In some embodiments, the terminal may send the captured video picture and the synchronously captured voice to the server; the server performs object recognition on the video picture, performs voice recognition on the voice to identify the object it mentions, and matches the two. When an object recognized in the video picture matches the object mentioned by the voice, the match succeeds, and the server returns the object's name and position in the video picture to the terminal.
In one embodiment, the terminal may synchronously display text content corresponding to the captured voice in the video picture, and the text content may be displayed in the video picture in the form of subtitles, barrage or bubble subtitles.
For example, when the user narrates while shooting, the terminal can automatically recognize the text content corresponding to the voice captured synchronously with the video picture and display it in the video picture. There is no need to edit the video and add subtitles manually in post-production: subtitles can be generated and displayed directly during shooting, which improves video processing efficiency.
In this embodiment, during video capture, when the target object mentioned by the voice exists in the video picture, the target object is automatically highlighted in the video picture. This improves the transmission efficiency of video information; since no manual editing is needed in post-production and no auxiliary material is needed to process the video, video processing efficiency and user experience are improved.
In one embodiment, the object specifying operation is recognized by the terminal from a trigger operation of the user; for example, when the terminal detects a trigger operation by which the user delineates a target object during video capture, the object specifying operation for specifying that target object is triggered. That is, as shown in fig. 5, the video processing method includes:
and 502, responding to the triggering operation of the video acquisition, and acquiring and displaying a video picture.
And 504, in response to the trigger operation for defining the target object in the video picture, highlighting the target object at the position matched with the target object in the video picture.
And step 506, canceling the highlighting of the target object in the video picture when the target object meets the highlighting ending condition after being highlighted.
The triggering operation for delineating the target object in the video picture is an operation for delineating the target object triggered in the video picture, and the triggering operation can be any one of a click operation, a sliding operation or a double-click operation. The operation may delineate a target region, the target region comprising the target object.
In an application scene, regardless of whether the current voice mentions a target object in the video picture, the user can manually circle a target object in the video picture at will; when the terminal detects this trigger operation, it highlights the target object at the position in the video picture matched with the target object.
In an application scenario, when a user finds that a target object mentioned by voice exists in a video picture but cannot be successfully identified, the user can define a target area in the video picture in a manually assisted manner, wherein the target area comprises the target object, so that the target object in the target area can be highlighted when the target area is highlighted.
Specifically, the terminal monitors the gesture track triggered by the user on the screen and records the coordinates of the contact points of that track, thereby determining the target area in the video picture. For example, when the terminal detects that the user has drawn a circle around the little yellow duck in the video picture with a finger, the terminal can display the circled area enlarged, so that the little yellow duck is shown enlarged.
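A minimal sketch of this step, assuming the platform's touch events have already been collected into a list of contact coordinates (the event plumbing itself is platform-specific and assumed here):

def gesture_track_to_region(track_points):
    """track_points: [(x, y), ...] screen coordinates of the finger contact
    recorded while the user circles an object. Returns an axis-aligned
    bounding box (x, y, w, h) that encloses the delineated area."""
    xs = [p[0] for p in track_points]
    ys = [p[1] for p in track_points]
    x0, y0, x1, y1 = min(xs), min(ys), max(xs), max(ys)
    return (x0, y0, x1 - x0, y1 - y0)

# Example: a rough circle drawn around an object on screen.
circle = [(100, 90), (160, 80), (190, 130), (150, 180), (95, 140)]
print(gesture_track_to_region(circle))  # -> (95, 80, 95, 100)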
FIG. 6 is a diagram illustrating a target area being delineated by a gesture in one embodiment. Referring to part (a) of FIG. 6, the duckling mentioned by the voice exists in the video picture and the corresponding subtitle is displayed, but the duckling cannot be identified automatically. Referring to part (b) of FIG. 6, the user can delineate the duckling with a touch operation, after which the duckling is displayed enlarged, as shown in part (c) of FIG. 6.
In one embodiment, step 504 may also be: and responding to the trigger operation for defining the target area in the video acquisition interface, and highlighting the target area corresponding to the trigger operation in the video picture.
The triggering operation for delineating the target area is an operation for delineating the target area, which is triggered in the video picture, and the triggering operation can be any one of a click operation, a sliding operation or a double-click operation. The target area may be any one of the areas in the video picture. For example, the target area may be an area including a target object, or may be a background area, a blank area, or an area where some sundries are located in the video frame. The shape of the target area may be circular, rectangular or other irregular shape.
Optionally, when the user wants to highlight any one object in the video frame, regardless of whether the object is mentioned in the voice captured synchronously with the video frame, the user may trigger a trigger operation for delineating the target area on the screen, and at this time, the terminal highlights the target area delineated by the user, for example, an enlarged display.
In this embodiment, during video capture, when the terminal detects a trigger operation by which the user circles a target object in the video picture, that target object is automatically highlighted in the video picture. Highlighting is thus likewise achieved during shooting: the video needs no manual editing in post-production and no processing with auxiliary material, which improves video processing efficiency and user experience.
In one embodiment, highlighting the target object at a location in the video frame that matches the target object includes: and at the position matched with the target object in the video picture, magnifying and showing a target delineation area comprising the target object, wherein the target delineation area is an area comprising the target object and is cut out from the video picture.
The target delineation area is an area where the target object is located, and may be a circular area including the target object in the video picture, a rectangular area including the target object in the video picture, or an area formed by a boundary bounding box of the target object.
Specifically, after the terminal cuts the target delineation area out of the video picture, it displays the area enlarged at the position of the target object, so that the enlarged area covers the target object that would otherwise be displayed in its original captured style in the video picture.
Alternatively, the terminal may perform target detection on the video picture, and when the target object is detected, cut the target delineation area out of the video picture according to the bounding-box coordinates of the detected target object in the video picture.
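A sketch of the crop-and-enlarge step, under the assumption that frames are OpenCV-style numpy arrays and that the enlarged patch still fits inside the frame; the 1.6x factor is an arbitrary example value:

import cv2

def overlay_enlarged_region(frame, box, scale=1.6):
    """Crop the target delineation area given by box=(x, y, w, h), enlarge it,
    and paste it back centered on the object so it covers the original pixels.
    Assumes the enlarged patch fits within the frame."""
    x, y, w, h = box
    crop = frame[y:y + h, x:x + w]
    big = cv2.resize(crop, (int(w * scale), int(h * scale)),
                     interpolation=cv2.INTER_LINEAR)
    bh, bw = big.shape[:2]
    cx, cy = x + w // 2, y + h // 2           # keep the enlarged patch centered
    x0 = max(0, min(cx - bw // 2, frame.shape[1] - bw))
    y0 = max(0, min(cy - bh // 2, frame.shape[0] - bh))
    frame[y0:y0 + bh, x0:x0 + bw] = big
    return frame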
Optionally, when the duration of the enlarged display of the target delineation region including the target object is greater than a preset threshold, the terminal cancels the highlighted display of the target delineation region in the video picture, and the video picture resumes normal display.
Fig. 7 is a flowchart illustrating a video processing method according to an embodiment. Referring to fig. 7, when the terminal collects a video picture or video live broadcast, collects the video picture and the sound explained by the side, and when the terminal judges that an object mentioned by voice exists in the video picture, the object is displayed in the video picture in an amplified manner, otherwise, the target object is normally displayed in the video picture.
FIG. 8 is a diagram illustrating a video frame according to an embodiment. Referring to part (a) of fig. 8, when no voice is captured when a video frame is captured or content mentioned in the voice does not appear in the video frame, the entire video frame is normally displayed, and an object in the video frame is displayed according to the original captured style or size, for example, a "little yellow duck" in the video frame is normally displayed. Referring to part (b) of FIG. 8, when there is a little yellow duck in the captured video picture, and someone speaks "you see, there is a little yellow duck! And if so, amplifying and displaying the target delineation area surrounding the little yellow duck in the video picture, and displaying the caption corresponding to the voice.
In this embodiment, through the object that the automatic recognition pronunciation mentioned, the object in the automatic recognition video picture, after the object that the pronunciation was mentioned matches successfully with the target object in the video picture, the local area including the target object is enlargied automatically, shows this target object prominently, can let the more audio-visual understanding video content of viewer, promotes user experience.
In one embodiment, the terminal may directly and gradually enlarge and display the target delineation area including the target object at the position matched with the target object in the video picture. That is, highlighting the target object at a position in the video frame that matches the target object includes: and at the position matched with the target object in the video picture, gradually magnifying and displaying a target delineation area including the target object within a preset time length, wherein the target delineation area is an area including the target object and is intercepted from the video picture.
Specifically, after the terminal captures the target delineation region from the video image, the target delineation region may be gradually enlarged from an original size within a preset time period, and continuously displayed in the video image in an enlarged manner after being enlarged to a certain extent. For example, after the terminal recognizes the target object from the voice, the terminal gradually enlarges the target delineation area including the target object within 1 second, the enlarged target delineation area is continuously displayed in the video picture, when the duration of the continuous enlarged display of the target object is more than 3 seconds, the enlarged display of the target object in the video picture is cancelled, and the normal display of the target object in the video picture is recovered.
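The timing of this gradual enlargement can be sketched as a simple scale schedule; the 1-second ramp, 3-second hold, and 1.6x target mirror the example values in the text and are otherwise assumptions:

def zoom_scale_at(t, ramp=1.0, hold=3.0, target=1.6):
    """t: seconds since the target object was recognized.
    Returns the display scale to apply to the target delineation area."""
    if t < ramp:                       # gradual enlargement phase
        return 1.0 + (target - 1.0) * (t / ramp)
    if t < ramp + hold:                # continuous enlarged display
        return target
    return 1.0                         # highlight cancelled, normal display

for t in (0.0, 0.5, 1.0, 2.5, 4.5):
    print(t, round(zoom_scale_at(t), 2))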
In this embodiment, the target object matched with the voice in the video picture is highlighted with a gradually enlarging animation effect, so that viewers can understand the video content more intuitively, which improves user experience.
In one embodiment, highlighting the target object at a location in the video frame that matches the target object includes: and amplifying and displaying a target delineation area including the target object in the video picture at a position matched with the target object, and synchronously playing special effect audio for highlighting the target object, wherein the target delineation area is an area including the target object and is cut from the video picture.
The special effect audio for highlighting the target object may be a default special effect audio, or may be set individually according to user requirements, and a duration of the special effect audio may be consistent with a duration of the amplified display of the target object, or may be slightly shorter than the duration of the amplified display of the target object, for example, may be 1 to 3 seconds. Specifically, when the terminal displays the target delineation area including the target object in the video picture in an enlarged manner, a section of special-effect audio is also played synchronously, and the special-effect audio and the target delineation area displayed in the enlarged manner are part of the video together, so that the target object is displayed prominently.
Optionally, when the terminal displays the target delineation region including the target object enlarged in the video picture, it can synchronously play at least one of an animation special effect and an audio special effect. Optionally, the terminal can also vibrate at the same time to give the user haptic feedback, which makes the video interaction more engaging for the user.
In the embodiment, the special effect audio for highlighting the target object is played synchronously while the target delineation area including the target object is displayed in an enlarged manner, so that the target object is more highlighted in the video picture, and the information conveyed to the user by the video is richer.
In one embodiment, highlighting the target object at a position in the video picture that matches the target object includes: in the video picture, continuously drawing a stroke along the edge of the target object until a closed stroke is formed, then ending the drawing and displaying the closed stroke.
The stroke is dynamic, continuous prompt information: it is displayed while being drawn along the edge of the target object. When a target object matched with the voice exists in the video picture, the terminal starts continuous drawing from a point on the edge of the target object and displays the closed stroke once drawing ends. The stroke may use any color that stands out in the video picture.
Optionally, after the drawing of the closed stroke is finished, the terminal continuously displays the closed stroke in the video picture until the duration of the continuous display is greater than a preset threshold, and cancels the display of the closed stroke.
Alternatively, the terminal may not display the process of drawing the stroke along the edge of the target object, but directly display the stroke of the target object in the video picture.
Optionally, the terminal may perform denoising and contrast enhancement on the video image to obtain a preprocessed image, then, an edge detection algorithm is used to identify edge pixel points belonging to the target object in the image, and the terminal performs delineation along the edge pixel points of the target object by using a preset color until a closed delineation of the target object is formed.
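A sketch of this stroke pipeline in Python with OpenCV; Canny is used here as one possible edge detector and the yellow stroke color is an example value, since the disclosure does not name a specific algorithm or color:

# Sketch: denoise, boost contrast, find edges, then draw them in a
# highlight color. Canny is an assumed choice of edge detector.
import cv2

def stroke_target(frame, box, color=(0, 255, 255)):
    x, y, w, h = box                      # limit edge search to the object
    roi = frame[y:y + h, x:x + w]
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)          # denoise
    gray = cv2.equalizeHist(gray)                     # contrast enhancement
    edges = cv2.Canny(gray, 80, 160)                  # edge pixel points
    roi[edges > 0] = color                            # draw the stroke in place
    return frame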
In this embodiment, to highlight the target object in the video picture, a stroke is drawn dynamically and continuously along the edge of the target object and presented to the user as a stroke animation, which helps the user clearly identify the object currently being mentioned.
In one embodiment, highlighting the target object at a location in the video frame that matches the target object includes: when the target object moves in the video picture, the target object is highlighted at a position in the video picture that changes as the target object moves in the video picture.
Specifically, when the target object moves in the video picture, the terminal may track the target object in the video picture and highlight the tracked target object.
Optionally, after the target object stops moving in the video picture, when the duration of the target object being highlighted is less than a preset threshold, displaying the target object that is continuously highlighted in the video picture, and when the duration is greater than the preset threshold, canceling the highlighting of the target object in the video picture, and resuming the normal display of the target object in the video picture.
Optionally, when the target object stops moving while the duration of its highlighting is still less than the preset threshold, the terminal keeps highlighting it. For example, for a target object moving in the video picture, the terminal may display it enlarged; when the target object stops moving and the enlarged display has lasted less than the preset threshold, the terminal continues to display it enlarged until the threshold is exceeded, after which the target object in the video picture resumes normal display.
Alternatively, the terminal may track a target object continuously appearing in the video picture by using a target detection model based on a neural network, determine a position of the target object when the target object moves in the video picture by using the target detection model, and highlight the tracked target object at the position.
In some embodiments, the terminal may perform object identification only on the video picture in which the target object first appears, and then track the target object in subsequent video pictures. Specifically, after determining the key pixel points of the target object in the video picture in which it first appears, the terminal finds, in each subsequent video picture, the pixel points with the smallest Euclidean distance to those key pixel points, thereby tracking the target object and obtaining its movement trajectory; for example, the movement trajectory can be determined from the change in position of the key pixel points across video pictures. The key pixel points of the target object are pixel points that reflect its key features, for example pixel points belonging to the target object with large local contrast or large local feature differences. After the target object is highlighted with highlight content in the video picture in which it first appears, the highlight content is re-rendered and displayed in subsequent video pictures according to the movement trajectory of the target object.
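A minimal sketch of that nearest-neighbour matching step, assuming candidate points in the next frame come from some upstream feature extractor (assumed here, not specified by the disclosure):

import numpy as np

def track_keypoints(prev_points, candidates):
    """prev_points, candidates: arrays of shape (N, 2) and (M, 2).
    Returns, for every previous key point, the closest candidate point
    by Euclidean distance."""
    prev = np.asarray(prev_points, dtype=float)
    cand = np.asarray(candidates, dtype=float)
    # Pairwise Euclidean distances, shape (N, M).
    d = np.linalg.norm(prev[:, None, :] - cand[None, :, :], axis=2)
    return cand[d.argmin(axis=1)]

prev = [(100, 120), (110, 125)]
nxt = [(140, 90), (103, 122), (112, 128)]
print(track_keypoints(prev, nxt))   # both points moved slightly right and down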
In addition, the size of the same target object may change from frame to frame: when the terminal's camera zooms in or the terminal moves closer to the target object, the target object occupies more of the video picture, and when the camera zooms out or the terminal moves away, it occupies less. The terminal can therefore also calculate a scaling parameter for the target object in each video picture from the proportion of the picture that the target object's area occupies. The terminal may obtain a preset size for the highlighted target object, determine the scaling parameter from the target object's proportion in the video picture and that preset size, determine the target object's position from its movement trajectory, and then re-render and display the highlight content of the target object according to the scaling parameter.
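A sketch of that scaling calculation; the preset occupation ratio of 0.25 is an assumed configuration value used purely for illustration:

import math

def highlight_scale(object_area, frame_area, preset_ratio=0.25):
    """object_area / frame_area is the object's occupation ratio this frame.
    Returns the zoom factor that makes the highlighted object occupy roughly
    preset_ratio of the frame (area scales with the square of the zoom)."""
    ratio = object_area / frame_area
    return math.sqrt(preset_ratio / ratio)

# Camera pulls back and the duck shrinks from 10% to 4% of the frame:
print(round(highlight_scale(0.10 * 1e6, 1e6), 2))  # ~1.58
print(round(highlight_scale(0.04 * 1e6, 1e6), 2))  # 2.5, highlight zooms harder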
In this embodiment, when the target object moves in the video image, the position for highlighting the target object also moves, so that the target object is dynamically and continuously highlighted in the video image.
In one embodiment, highlighting the target object at a location in the video frame that matches the target object includes: and displaying the virtual decoration material for highlighting the target object at the position matched with the target object in the video picture.
The virtual decoration material for highlighting the target object can be a material matched with the style of the target object, can also be a random decoration material, can be a default decoration material, and can also be set in a personalized way according to the needs of a user. The virtual decoration material may be a static sticker, a dynamic sticker, a bubble or arrow, or the like.
Optionally, the virtual decoration material may be displayed in a style surrounding the target object, may also be displayed in a style partially covering the target object, and may also be displayed close to the target object. Optionally, the style of the virtual decoration material can be displayed according to the user-defined requirement.
In this embodiment, the virtual decoration material is displayed in the video picture close to the target object matched with the voice, so that the user can be helped to notice the target object in the video picture, and meanwhile, the interest and the user experience in the video acquisition process are improved.
In one embodiment, the method further comprises: and when the time length of the highlighted target object is less than a preset threshold value and the target object is moved out of the video picture, the highlighted target object in the video picture is canceled.
The object specifying operation that triggers highlighting of the target object, whether triggered automatically or by a trigger operation of the user, requires that the target object exist in the video picture. Therefore, the target object is highlighted in the video picture only while a target object matching the voice exists in the video picture.
In this embodiment, even if the duration for which the target object has been highlighted is less than the threshold, once the target object leaves the video picture there is no need to continue highlighting it. For example, the target object may move within the video picture and gradually move out of it, be removed from the scene, or leave the frame as the shooting lens moves; in all these cases the target object no longer exists in the video picture, so its highlighting is cancelled.
In one embodiment, the target object is a first target object, and the method further includes: when object designation operation of a first target object in a designated video picture is triggered, highlighting the first target object at a position in the video picture matched with the first target object, when the highlighted duration of the first target object is less than a preset threshold value and when object designation operation of a second target object in the designated video picture is triggered, continuing highlighting the first target object in the video picture, and highlighting the second target object in the video picture.
The object specifying operations of the first target object and the second target object may be of the same operation type: for example, both may be operations recognized automatically by the terminal, or both may be operations recognized by the terminal from a trigger operation of the user. They may also be of different operation types: for example, the object specifying operation of the first target object may be recognized automatically by the terminal from the voice, while that of the second target object is a detected user operation delineating the second target object.
Optionally, when the highlighted duration of the first target object is smaller than a preset threshold and a second target object matched with the current voice synchronously acquired from the video picture exists in the video picture, the first target object is continuously highlighted in the video picture, and the second target object is highlighted in the video picture.
Optionally, a time length of the first target object being highlighted is preset with a threshold, and when a trigger operation for delineating a second target object in the video image is detected, the first target object is continuously highlighted in the video image, and the second target object is highlighted at a position in the video image, where the position is matched with the second target object.
In this embodiment, the first target object is the object recognized first. When a second target object, that is, another target object, is recognized from the captured speech while the duration for which the first target object has been highlighted is still less than the preset threshold, the first target object continues to be highlighted in the video picture and the second target object is highlighted at the same time, until the duration for which the first target object has been highlighted exceeds the preset threshold, at which point its highlighting is cancelled.
For example, suppose the preset threshold is 3 seconds and the first target object is the "little yellow duck" in the video picture. Within the 3 seconds during which the "little yellow duck" is displayed enlarged, if a "water cup" in the video picture is recognized from the voice, the terminal displays the "water cup" enlarged at the same time. When the enlarged display of the "little yellow duck" has lasted longer than 3 seconds, it returns to normal display; when the enlarged display of the "water cup" has lasted longer than 3 seconds, the whole video picture returns to normal display.
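The timing rule of this embodiment can be sketched as a small bookkeeping structure; the 3-second threshold follows the example above, and the name-keyed dictionary is an implementation assumption:

class HighlightManager:
    def __init__(self, threshold=3.0):
        self.threshold = threshold
        self.active = {}                       # object name -> start time

    def designate(self, name, now):
        self.active[name] = now                # start (or restart) highlighting

    def visible(self, now):
        # Drop highlights whose duration exceeded the threshold.
        self.active = {n: t for n, t in self.active.items()
                       if now - t <= self.threshold}
        return list(self.active)

m = HighlightManager()
m.designate("little yellow duck", now=0.0)
m.designate("water cup", now=2.0)
print(m.visible(2.5))   # ['little yellow duck', 'water cup']
print(m.visible(3.5))   # ['water cup'] - the duck's 3 s have elapsed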
In one embodiment, the target object is a first target object, and the method further includes: when an object designation operation for designating a first target object in a video picture is triggered, highlighting the first target object at a position in the video picture matched with the first target object, canceling the highlighting of the first target object in the video picture when the highlighted duration of the first target object is less than a preset threshold and triggering the object designation operation for designating a second target object in the video picture, and highlighting the second target object in the video picture.
Optionally, when the highlighted duration of the first target object is less than a preset threshold and a second target object matched with the current voice synchronously acquired from the video picture exists in the video picture, the highlighted presentation of the first target object is cancelled in the video picture, and the second target object is highlighted in the video picture.
Optionally, a time length of the first target object being highlighted is preset with a threshold, and when a trigger operation for delineating a second target object in the video picture is detected, the highlighting of the first target object in the video picture is cancelled, and the second target object is highlighted at a position in the video picture matched with the second target object.
In this embodiment, the first target object is the object recognized first. When a second target object, that is, another target object, is recognized from the captured speech while the duration for which the first target object has been highlighted is still less than the preset threshold, the first target object is no longer highlighted in the video picture: its highlighting is cancelled and the second target object is highlighted instead, until the duration for which the second target object has been highlighted exceeds the preset threshold and its highlighting is cancelled as well.
For example, suppose the preset threshold is 3 seconds and the first target object is the "little yellow duck" in the video picture. Within the 3 seconds during which the "little yellow duck" is displayed enlarged, if a "water cup" in the video picture is recognized from the voice, for example because the photographer says "water cup" at that moment, the terminal preferentially displays the "water cup" enlarged and hides the enlarged "little yellow duck", that is, restores the "little yellow duck" to normal display. The whole video picture returns to normal display once the enlarged display of the "water cup" has lasted longer than 3 seconds.
In one embodiment, the trigger operation for capturing the video is a trigger operation for starting live video, and the video picture is a live video picture, and the method further includes: transmitting the video live broadcast picture to a user terminal; when the target object is highlighted in the video live broadcast picture, the user terminal highlights the target object in the video playing interface; and when the target object is not highlighted in the video live broadcast picture, the user terminal cancels the highlighting of the target object in the video playing interface.
In a live video scene, the captured video pictures can be transmitted to the user terminal in real time. When the target object is highlighted in the live video picture, it is also highlighted in the video picture transmitted to the user terminal; when the highlighting is cancelled in the live video picture, it is synchronously cancelled in the transmitted video picture as well. In other words, the video effects seen on the anchor side and on the viewer side of the live broadcast are consistent. Therefore, when the anchor explains or describes a target object, highlighting that object in the video picture improves the interaction between the anchor and the audience and improves the video's ability to convey information to the user.
In one embodiment, step 202, capturing and displaying a video picture in response to a trigger operation of capturing a video, includes: displaying a live video picture in response to a trigger operation for starting a live video broadcast. Step 204, when an object specifying operation for specifying a target object in the video picture is triggered, highlighting the target object at a position in the video picture matched with the target object, includes: when target commodity information matched with the live commentary voice captured synchronously with the live video picture exists in the live video picture, highlighting the target commodity information at a position in the live video picture matched with the target commodity information.
The target commodity information may be a picture including the target commodity, or may be the target commodity itself. In a live online shopping scene, when a collected video picture comprises target commodity information and live commentary voice synchronously collected with the video picture also refers to the target commodity information, the target commodity information is highlighted at a position matched with the target commodity information in the live video picture. Thus, the user who watches the live introduction of the commodity can further know the object and the content introduced by the live narration voice through the highlighted content.
Optionally, step 204 may also be: and when the trigger operation for delineating the target commodity information in the live video frame is detected, the target commodity information is highlighted and displayed at the position matched with the target commodity information in the live video frame.
That is to say, in the live broadcast process, the anchor can manually circle the target commodity information in the live video picture according to the wish of the anchor, so that the target commodity information is highlighted in the live video picture, and the anchor can be assisted to accurately convey the information related to the commodity in the live broadcast process.
In one embodiment, the method further comprises: responding to the trigger operation of ending video acquisition to obtain a target video; responding to the trigger operation of playing the target video, playing the target video and displaying a video playing interface; when the video playing interface is played to a video picture comprising the target object, the target object is highlighted at a position matched with the target object in the video playing interface.
The triggering operation for ending the video acquisition can be automatically triggered after the video acquisition time reaches a preset time, or can be the triggering operation for manually ending the video acquisition by a user. Specifically, after the target video is obtained after the video capture is finished, when the target video is played to a video screen including the target object, the target object will be highlighted in the video playing interface as in the capture process.
Optionally, the terminal may record the time point at which the highlighting of the target object starts; when the target video is played, the target object is highlighted from that time point until the highlighting duration exceeds the preset threshold, at which point the highlighting is cancelled. For example, it is recorded that the "little yellow duck" starts being displayed enlarged at 00:01; during playback the enlarged "little yellow duck" is shown from 00:01 to 00:03, and after 00:03 the enlargement is hidden and the "little yellow duck" is displayed normally in the video picture.
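A sketch of replaying such recorded time points; the 2-second display window matches the 00:01 to 00:03 example, and the schedule format is an assumption:

def highlights_at(playhead, schedule, threshold=2.0):
    """schedule: list of (start_second, object_name) recorded during capture.
    Returns the object names to highlight at the current playhead."""
    return [name for start, name in schedule
            if start <= playhead < start + threshold]

schedule = [(1.0, "little yellow duck")]
print(highlights_at(0.5, schedule))  # [] - normal display
print(highlights_at(2.0, schedule))  # ['little yellow duck'] - enlarged
print(highlights_at(3.5, schedule))  # [] - enlargement hidden after 00:03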
As shown in fig. 9, a video processing method is provided, which is described by taking the method as an example applied to the terminal 102 in fig. 1, and includes the following steps:
and step 902, responding to the triggering operation of playing the video, playing the video and displaying a video playing interface.
And step 904, when an object specifying operation for specifying a target object in the played video picture is triggered, highlighting the target object at a position in the video playing interface matched with the target object.
And step 906, canceling the highlighting of the target object in the video playing interface when the target object meets the highlighting ending condition after being highlighted.
In one embodiment, step 904, when an object specifying operation for specifying a target object in a played video frame is triggered, performing a highlight display on the target object at a position in a video playing interface, where the position matches the target object, includes: and when a target object matched with the voice synchronized with the video picture exists in the video picture, highlighting the target object at a position matched with the target object in the video playing interface.
In one embodiment, highlighting the target object at a location in the video playback interface that matches the target object includes: and at the position matched with the target object in the video playing interface, magnifying and displaying a target delineation area comprising the target object, wherein the target delineation area is an area comprising the target object and is cut from the video picture.
In one embodiment, highlighting the target object at a location in the video playback interface that matches the target object includes: and gradually amplifying and displaying a target delineation area including the target object within a preset time at a position matched with the target object in the video playing interface, wherein the target delineation area is an area including the target object and is intercepted from the video picture.
In one embodiment, highlighting the target object at a location in the video playback interface that matches the target object includes: and at the position matched with the target object in the video playing interface, magnifying and displaying a target delineation area comprising the target object, and synchronously playing special effect audio for highlighting the target object, wherein the target delineation area is an area which is cut from the video picture and comprises the target object.
In one embodiment, highlighting the target object at a location in the video playback interface that matches the target object includes: and in the video playing interface, continuously drawing and tracing along the edge of the target object until a closed tracing is formed, ending drawing and displaying the closed tracing.
In one embodiment, highlighting the target object at a location in the video playback interface that matches the target object includes: when the target object moves in the video picture, the target object is highlighted at a position in the video playing interface, wherein the position changes along with the movement of the target object in the video picture.
In one embodiment, highlighting the target object at a location in the video playback interface that matches the target object includes: and displaying the virtual decoration material for highlighting the target object at a position matched with the target object in the video playing interface.
In one embodiment, canceling the highlighting of the target object in the video playing interface when the target object meets the highlighting ending condition after being highlighted comprises: and when the time length of the highlighted target object is greater than a preset threshold value, the highlighted target object in the video picture is cancelled.
In one embodiment, canceling the highlighting of the target object in the video playing interface when the target object meets the highlighting ending condition after being highlighted comprises: and when the time length of the highlighted target object is less than a preset threshold value and the target object is moved out of the video picture, the highlighted target object in the video picture is canceled.
In one embodiment, the target object is a first target object, and when an object specifying operation for specifying the target object in the played video picture is triggered, highlighting the target object at a position in the video playing interface, where the position matches the target object, includes: when the object designation operation of a first target object in a designated video picture is triggered, highlighting the first target object at a position in the video playing interface matched with the first target object, and when the object designation operation of a second target object in the designated video picture is triggered, continuing highlighting the first target object in the video playing interface and highlighting the second target object in the video playing interface.
In one embodiment, the target object is a first target object, and when an object specifying operation for specifying the target object in the played video picture is triggered, highlighting the target object at a position in the video playing interface, where the position matches the target object, includes: when the object designation operation of a first target object in a designated video picture is triggered, highlighting the first target object at a position in the video playing interface matched with the first target object, and when the object designation operation of a second target object in the designated video picture is triggered, canceling the highlighting of the first target object in the video playing interface and highlighting the second target object in the video playing interface.
For the specific embodiment of the method, reference is made to the foregoing description, which is not repeated herein.
According to the video processing method, during video playback, when the object specifying operation for specifying a target object in the video picture is triggered, the target object is automatically highlighted in the video playing interface; once the highlight ending condition is met after the highlighting, the highlighting of the target object is cancelled in the video playing interface and normal playback of the video picture resumes. Viewers can thus understand the target object more intuitively during playback, and the transmission efficiency of video information is improved. Since the video needs no manual editing in post-production and no processing with auxiliary material, video processing efficiency and user experience are improved.
Fig. 10 is a schematic flow chart of a video processing method in a specific embodiment. Referring to part (a) of fig. 10, the terminal first registers objects: for each object it inputs an image and stores the object name in association with the object image in the cloud server, and it likewise inputs the object's voice, obtains the corresponding sound spectrum image, and stores the object name in association with that sound spectrum image in the cloud server. Then, referring to part (b) of fig. 10, after the terminal starts to capture video, it sends each video picture and the voice captured synchronously with it to the cloud server, which recognizes both; the text corresponding to the voice is displayed in the video picture. If no object mentioned by the voice is recognized, or no object is recognized in the video picture, the video picture is displayed normally. When both an object mentioned by the voice and an object in the video picture are recognized, the two recognized objects are matched: if the match succeeds, the object is displayed enlarged in the video picture; if not, the video picture is displayed normally.
It should be understood that, although the steps in the above flowcharts are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not performed in a strict order and may be performed in other orders. Moreover, at least some of the steps in the above flowcharts may comprise multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are not necessarily performed in sequence, but may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 11, there is provided a video processing apparatus 1100, which may be a part of a computer device using a software module or a hardware module, or a combination of the two, and specifically includes: a response module 1102, a first display module 1104, and a second display module 1106, wherein:
a response module 1102, configured to collect and display a video picture in response to a trigger operation for collecting a video;
a first display module 1104, configured to, when an object specifying operation that specifies a target object in a video frame is triggered, highlight the target object at a position in the video frame that matches the target object;
a second showing module 1106, configured to cancel the highlighting of the target object in the video frame when the target object meets the highlighting ending condition after being highlighted.
In one embodiment, the first presenting module 1104 is further configured to highlight the target object at a position in the video picture matching the target object when the target object matching the voice captured in synchronization with the video picture exists in the video picture.
In one embodiment, the first rendering module 1104 is further configured to highlight the target object in the video frame at a location in the video frame that matches the target object in response to a triggering operation for delineating the target object in the video frame.
In one embodiment, the first presenting module 1104 is further configured to present a target delineation region including the target object in the video picture in an enlarged manner at a position matched with the target object, the target delineation region being a region including the target object and cut out from the video picture.
In one embodiment, the first presenting module 1104 is further configured to present a target delineation area including the target object in the video frame in a gradually enlarged manner within a preset time period at a position in the video frame matching the target object, where the target delineation area is an area including the target object and cut out from the video frame.
In one embodiment, the first presentation module 1104 is further configured to enlarge and present a target delineation region including the target object in the video screen at a position matching the target object, and to synchronously play the special effect audio for highlighting the target object, the target delineation region being a region including the target object cut from the video screen.
In one embodiment, the first presentation module 1104 is further configured to, in the video picture, continuously draw a stroke along the edge of the target object until a closed stroke is formed, then end the drawing and display the closed stroke.
In one embodiment, the first presentation module 1104 is further configured to highlight the target object at a position in the video frame that varies with the movement of the target object in the video frame as the target object moves in the video frame.
In one embodiment, the first presentation module 1104 is further configured to present virtual decoration material for highlighting the target object at a location in the video frame that matches the target object.
In one embodiment, the second presentation module 1106 is further configured to cancel the highlighting of the target object in the video frame when the duration of the highlighting of the target object is greater than a preset threshold.
In one embodiment, the second displaying module 1106 is further configured to cancel the highlighting of the target object in the video frame when the duration of the highlighting of the target object is less than a preset threshold and when the target object moves out of the video frame.
In one embodiment, the target object is a first target object, and the first display module 1104 is further configured to highlight the first target object at a position in the video frame that matches the first target object when an object designation operation that designates the first target object in the video frame is triggered, and continue to highlight the first target object in the video frame and highlight the second target object in the video frame when an object designation operation that designates the second target object in the video frame is triggered.
In one embodiment, the target object is a first target object, the first showing module 1104 is further configured to highlight the first target object at a position in the video frame that matches the first target object when an object specifying operation is triggered to specify the first target object in the video frame, and cancel the highlighting of the first target object in the video frame and highlight the second target object in the video frame when an object specifying operation is triggered to specify the second target object in the video frame.
In one embodiment, the trigger operation of capturing the video is a trigger operation of starting live video, and the video picture is a live video picture, the apparatus further includes:
the transmission module is used for transmitting the video live broadcast picture to the user terminal;
the live video picture transmitted to the user terminal is used for highlighting the target object in the video playing interface when the target object is highlighted in the live video picture, and for canceling the highlighting of the target object in the video playing interface when the target object is not highlighted in the live video picture.
In one embodiment, the response module 1102 is further configured to display a live video frame in response to a trigger operation for starting a live video; the first display module 1104 is further configured to, when there is target commodity information matched with the live commentary voice synchronously acquired from the video live broadcast picture in the video live broadcast picture, prominently display the target commodity information at a position matched with the target commodity information in the video live broadcast picture.
In one embodiment, the apparatus further comprises a third presentation module for presenting, in the video frame, text material corresponding to speech captured in synchronization with the video frame.
In one embodiment, the apparatus further includes a video generation module, configured to obtain a target video in response to a trigger operation for ending video capture, and a video playing module, configured to play the target video and display a video playing interface in response to a trigger operation for playing the target video; when playback reaches a video picture including the target object, the target object is highlighted at a position in the video playing interface matched with the target object.
In one embodiment, the response module 1102 is further configured to, in response to a trigger operation of capturing a video, capture a video frame and display the video frame in the video capture interface; the first display module 1104 is further configured to highlight a target area corresponding to a trigger operation in a video picture in response to the trigger operation for delineating the target area in the video capture interface.
In the above embodiment, the first presenting module 1104 is further configured to highlight the target area including the target object in the video picture when the target area includes the target object matched with the voice captured synchronously with the video picture.
In one embodiment, the apparatus further comprises a matching module for obtaining the voice collected synchronously with the video picture; comparing the voice spectrum image corresponding to the voice with preset voice spectrum images corresponding to various objects, and determining the object matched with the voice according to the comparison result; carrying out object identification on the video picture to identify an object contained in the video picture; and matching the object matched with the voice with the recognized object, and determining that a target object matched with the voice synchronously acquired with the video picture exists in the video picture when the matching is successful.
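A toy sketch of the spectrum-comparison idea in this matching module; a real system would likely use a learned audio embedding, and the plain spectrogram correlation below is purely illustrative (the window sizes, the Hann window, and the random demo data are all assumptions):

import numpy as np

def spectrogram(signal, frame=256, hop=128):
    """Magnitude spectrogram of a 1-D numpy signal: windowed short-time FFT."""
    windows = [signal[i:i + frame] * np.hanning(frame)
               for i in range(0, len(signal) - frame, hop)]
    return np.abs(np.fft.rfft(windows, axis=1))

def best_matching_object(voice, templates):
    """templates: dict of object name -> reference spectrogram.
    Returns the object whose stored spectrum correlates best with the voice."""
    spec = spectrogram(voice)
    def score(ref):
        n = min(len(spec), len(ref))
        a, b = spec[:n].ravel(), ref[:n].ravel()
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
    return max(templates, key=lambda name: score(templates[name]))

rng = np.random.default_rng(0)   # random stand-in data for illustration only
templates = {"duck": spectrogram(rng.standard_normal(4000)),
             "cup": spectrogram(rng.standard_normal(4000))}
print(best_matching_object(rng.standard_normal(4000), templates))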
With the video processing apparatus 1100 above, during video capture, the target object matched with the voice captured synchronously with the video picture is recognized automatically, and when that target object exists in the video picture it is automatically highlighted; once the highlighting duration exceeds the preset threshold, the highlighting is cancelled and the normal video picture is restored. Viewers can thus understand the target object more intuitively, and the transmission efficiency of video information is improved. Since the video needs no manual editing in post-production and no processing with auxiliary material, video processing efficiency and user experience are improved.
In one embodiment, an apparatus for processing video is provided, which may be a part of a computer device using a software module or a hardware module, or a combination of the two modules, and specifically includes: a response module, a first display module, and a second display module, wherein:
the response module is used for responding to the triggering operation of playing the video, playing the video and displaying a video playing interface;
the first display module is used for highlighting the target object at a position matched with the target object in the video playing interface when the object designation operation of the target object in the video picture designated to be played is triggered;
and the second display module is used for canceling the highlight display of the target object in the video playing interface when the target object meets the highlight display ending condition after the highlight display.
With the above video processing apparatus, during video playback, when the object specifying operation for specifying a target object in the video picture is triggered, the target object is automatically highlighted in the video playing interface; once the highlight ending condition is met after the highlighting, the highlighting is cancelled in the video playing interface and normal playback of the video picture resumes. Viewers can thus understand the target object more intuitively during playback, and the transmission efficiency of video information is improved. Since the video needs no manual editing in post-production and no processing with auxiliary material, video processing efficiency and user experience are improved.
For specific limitations of the video processing apparatus, reference may be made to the above limitations of the video processing method, which is not described herein again. The respective modules in the video processing apparatus may be wholly or partially implemented by software, hardware, and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 12. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a method of processing video. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 12 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In an embodiment, there is further provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer-readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps in the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, or optical memory. Volatile memory may include random access memory (RAM) or an external cache. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; nevertheless, any combination of them that involves no contradiction should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and while their description is relatively specific and detailed, they are not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art may make several variations and modifications without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be subject to the appended claims.
Claims (15)
1. A method for processing video, the method comprising:
capturing and displaying a video picture in response to a trigger operation for capturing a video;
when an object designation operation designating a target object in the video picture is triggered, highlighting the target object at a position in the video picture that matches the target object;
canceling the highlighting of the target object in the video picture when a highlighting end condition is met after the target object is highlighted.
2. The method according to claim 1, wherein, when the object designation operation designating the target object in the video picture is triggered, highlighting the target object at the position in the video picture that matches the target object comprises:
when a target object matching voice captured synchronously with the video picture exists in the video picture, highlighting the target object at a position in the video picture that matches the target object.
3. The method according to claim 1, wherein, when the object designation operation designating the target object in the video picture is triggered, highlighting the target object at the position in the video picture that matches the target object comprises:
in response to a trigger operation for delineating a target object in the video picture, highlighting the target object at a position in the video picture that matches the target object.
4. The method of claim 1, wherein highlighting the target object at the position in the video picture that matches the target object comprises:
magnifying and displaying, at the position in the video picture that matches the target object, a target delineation area comprising the target object, wherein the target delineation area is an area cropped from the video picture that comprises the target object.
5. The method of claim 1, wherein highlighting the target object at the position in the video picture that matches the target object comprises:
gradually magnifying and displaying, within a preset time length, a target delineation area comprising the target object at the position in the video picture that matches the target object, wherein the target delineation area is an area cropped from the video picture that comprises the target object.
6. The method of claim 1, wherein highlighting the target object at the position in the video picture that matches the target object comprises:
magnifying and displaying, at the position in the video picture that matches the target object, a target delineation area comprising the target object, and synchronously playing special-effect audio that highlights the target object, wherein the target delineation area is an area cropped from the video picture that comprises the target object.
7. The method according to claim 1, wherein canceling the highlighting of the target object in the video picture when the highlighting end condition is met after the target object is highlighted comprises:
canceling the highlighting of the target object in the video picture when the duration for which the target object has been highlighted exceeds a preset threshold.
8. The method according to claim 1, wherein canceling the highlighting of the target object in the video picture when the highlighting end condition is met after the target object is highlighted comprises:
canceling the highlighting of the target object in the video picture when the duration for which the target object has been highlighted is less than a preset threshold and the target object moves out of the video picture.
9. The method according to claim 1, wherein the target object is a first target object, and wherein, when the object designation operation designating the target object in the video picture is triggered, highlighting the target object at the position in the video picture that matches the target object comprises:
when an object designation operation designating the first target object in the video picture is triggered, highlighting the first target object at a position in the video picture that matches the first target object; and, when the duration for which the first target object has been highlighted is less than a preset threshold and an object designation operation designating a second target object in the video picture is triggered, continuing to highlight the first target object in the video picture while highlighting the second target object in the video picture.
10. The method according to claim 1, wherein the target object is a first target object, and wherein, when the object designation operation designating the target object in the video picture is triggered, highlighting the target object at the position in the video picture that matches the target object comprises:
when an object designation operation designating the first target object in the video picture is triggered, highlighting the first target object at a position in the video picture that matches the first target object; and, when the duration for which the first target object has been highlighted is less than a preset threshold and an object designation operation designating a second target object in the video picture is triggered, canceling the highlighting of the first target object in the video picture and highlighting the second target object in the video picture.
11. The method of claim 1, wherein the trigger operation for capturing the video is a trigger operation for starting a live video broadcast and the video picture is a live video picture, and wherein the method further comprises:
transmitting the live video picture to a user terminal,
wherein, when the target object is highlighted in the live video picture, the user terminal highlights the target object in a video playing interface; and
when the target object is not highlighted in the live video picture, the user terminal cancels the highlighting of the target object in the video playing interface.
12. The method of claim 1, wherein capturing and displaying a video picture in response to the trigger operation for capturing a video comprises:
displaying a live video picture in response to a trigger operation for starting a live video broadcast;
and wherein, when the object designation operation designating the target object in the video picture is triggered, highlighting the target object at the position in the video picture that matches the target object comprises:
when target commodity information matching live commentary voice captured synchronously with the live video picture exists in the live video picture, highlighting the target commodity information at a position in the live video picture that matches the target commodity information.
13. The method of claim 1, further comprising:
obtaining a target video in response to a trigger operation for ending video capture;
playing the target video in response to a trigger operation for playing the target video; and
when a video picture comprising the target object is played, highlighting the target object at a position that matches the target object.
14. The method according to any one of claims 1 to 13, further comprising:
obtaining voice captured synchronously with the video picture;
comparing a voice spectrum image corresponding to the voice with preset voice spectrum images corresponding to respective objects, and determining the object matching the voice according to the comparison result;
performing object recognition on the video picture to recognize the objects contained in the video picture; and
matching the object that matches the voice against the recognized objects, and triggering the object designation operation designating the target object in the video picture when the matching succeeds.
15. A method for processing video, the method comprising:
playing a video and displaying a video playing interface in response to a trigger operation for playing the video;
when an object designation operation designating a target object in a played video picture is triggered, highlighting the target object at a position in the video playing interface that matches the target object; and
canceling the highlighting of the target object in the video playing interface when a highlighting end condition is met after the target object is highlighted.
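Claims 4 to 6 recite cropping a target delineation area around the object and magnifying it in place, optionally growing it gradually over a preset time length. By way of non-limiting illustration, the following OpenCV sketch shows one way that step could look; the bounding-box format, scale factor, and progress parameter are assumptions made for illustration, not values from the claims.

```python
import cv2

def magnify_delineation_area(frame, box, progress, max_scale=2.0):
    """Crop the area in `box` (x, y, w, h) and paste it back magnified.

    `progress` in [0, 1] models the gradual zoom of claim 5: 0 shows the
    area at its original size, 1 at `max_scale` times that size.
    """
    x, y, w, h = box
    scale = 1.0 + (max_scale - 1.0) * progress
    area = frame[y:y + h, x:x + w]
    zoomed = cv2.resize(area, None, fx=scale, fy=scale,
                        interpolation=cv2.INTER_LINEAR)
    zh, zw = zoomed.shape[:2]
    # Centre the magnified crop on the original area, clipped to the frame.
    cx, cy = x + w // 2, y + h // 2
    x0, y0 = max(cx - zw // 2, 0), max(cy - zh // 2, 0)
    x1, y1 = min(x0 + zw, frame.shape[1]), min(y0 + zh, frame.shape[0])
    out = frame.copy()
    out[y0:y1, x0:x1] = zoomed[:y1 - y0, :x1 - x0]
    return out
```

Driving `progress` from 0 to 1 across the preset time length would give the gradual magnification of claim 5; the synchronized special-effect audio of claim 6 belongs to the audio pipeline and is omitted here.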
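Claim 14 triggers the object designation operation by comparing a spectrum image of the captured voice against preset per-object spectrum images and intersecting the result with the objects recognized in the picture. The sketch below is one hedged reading of that pipeline: spectrum extraction, the template table, and the object recognizer are assumed to be supplied elsewhere, and normalised correlation is only a placeholder for the comparison the claim leaves unspecified.

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.8  # assumed tunable constant

def match_voice_to_object(voice_spectrum, templates):
    """Return the label of the best-matching preset spectrum image, if any.

    `templates` maps object labels to reference spectrum images with the
    same shape as `voice_spectrum`. Normalised correlation stands in for
    whatever comparison the patented method actually uses.
    """
    best_label, best_score = None, SIMILARITY_THRESHOLD
    v = voice_spectrum.ravel().astype(float)
    v = (v - v.mean()) / (v.std() + 1e-8)
    for label, ref in templates.items():
        r = ref.ravel().astype(float)
        r = (r - r.mean()) / (r.std() + 1e-8)
        score = float(np.dot(v, r)) / v.size  # correlation in [-1, 1]
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def maybe_trigger_designation(voice_spectrum, templates, recognized_objects):
    """Trigger the object designation operation when voice and vision agree."""
    label = match_voice_to_object(voice_spectrum, templates)
    if label is not None and label in recognized_objects:
        return label  # caller highlights this object as the target
    return None
```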
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110395901.5A CN113709545A (en) | 2021-04-13 | 2021-04-13 | Video processing method and device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113709545A true CN113709545A (en) | 2021-11-26 |
Family
ID=78647991
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110395901.5A Pending CN113709545A (en) | 2021-04-13 | 2021-04-13 | Video processing method and device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113709545A (en) |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102360187A (en) * | 2011-05-25 | 2012-02-22 | 吉林大学 | Chinese speech control system and method with mutually interrelated spectrograms for driver |
CN104766608A (en) * | 2014-01-07 | 2015-07-08 | 深圳市中兴微电子技术有限公司 | Voice control method and voice control device |
CN106558306A (en) * | 2015-09-28 | 2017-04-05 | 广东新信通信息系统服务有限公司 | Method for voice recognition, device and equipment |
CN105573702A (en) * | 2015-12-16 | 2016-05-11 | 广州视睿电子科技有限公司 | Method and system for synchronizing movement and scaling of remote annotation |
US20200272309A1 (en) * | 2018-01-18 | 2020-08-27 | Tencent Technology (Shenzhen) Company Limited | Additional object display method and apparatus, computer device, and storage medium |
CN111243011A (en) * | 2018-11-29 | 2020-06-05 | 北京市商汤科技开发有限公司 | Key point detection method and device, electronic equipment and storage medium |
CN112118395A (en) * | 2020-04-23 | 2020-12-22 | 中兴通讯股份有限公司 | Video processing method, terminal and computer readable storage medium |
CN111556332A (en) * | 2020-05-22 | 2020-08-18 | 咪咕文化科技有限公司 | Live broadcast method, electronic device and readable storage medium |
CN112261424A (en) * | 2020-10-19 | 2021-01-22 | 北京字节跳动网络技术有限公司 | Image processing method, image processing device, electronic equipment and computer readable storage medium |
CN112261428A (en) * | 2020-10-20 | 2021-01-22 | 北京字节跳动网络技术有限公司 | Picture display method and device, electronic equipment and computer readable medium |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023109842A1 (en) * | 2021-12-14 | 2023-06-22 | 北京字跳网络技术有限公司 | Image presentation method and apparatus, and electronic device and storage medium |
WO2024001677A1 (en) * | 2022-06-30 | 2024-01-04 | 腾讯科技(深圳)有限公司 | Page display method and apparatus, computer device, storage medium and program product |
CN116415010A (en) * | 2023-06-09 | 2023-07-11 | 北京达佳互联信息技术有限公司 | Information display method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11094131B2 (en) | Augmented reality apparatus and method | |
CN104796781B (en) | Video clip extracting method and device | |
CN108965982B (en) | Video recording method and device, electronic equipment and readable storage medium | |
WO2022001593A1 (en) | Video generation method and apparatus, storage medium and computer device | |
US10971188B2 (en) | Apparatus and method for editing content | |
US10170157B2 (en) | Method and apparatus for finding and using video portions that are relevant to adjacent still images | |
KR20160087222A (en) | Method and Appratus For Creating Photo Story based on Visual Context Analysis of Digital Contents | |
CN113709545A (en) | Video processing method and device, computer equipment and storage medium | |
US11900683B2 (en) | Setting ad breakpoints in a video within a messaging system | |
CN113099297B (en) | Method and device for generating click video, electronic equipment and storage medium | |
US11792491B2 (en) | Inserting ads into a video within a messaging system | |
US20240089531A1 (en) | Selecting ads for a video within a messaging system | |
WO2018177134A1 (en) | Method for processing user-generated content, storage medium and terminal | |
JP5850188B2 (en) | Image display system | |
CN113989424A (en) | Three-dimensional virtual image generation method and device and electronic equipment | |
CN116363725A (en) | Portrait tracking method and system for display device, display device and storage medium | |
CN114285988A (en) | Display method, display device, electronic equipment and storage medium | |
CN110662103B (en) | Multimedia object reconstruction method and device, electronic equipment and readable storage medium | |
CN115225756A (en) | Method for determining target object, shooting method and device | |
CN113873135A (en) | Image obtaining method and device, electronic equipment and storage medium | |
CN117395462A (en) | Method and device for generating media content, electronic equipment and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20211126 |