CN116958406A - Face three-dimensional reconstruction method and device, electronic equipment and storage medium

Face three-dimensional reconstruction method and device, electronic equipment and storage medium

Info

Publication number
CN116958406A
CN116958406A (application CN202310253330.0A)
Authority
CN
China
Prior art keywords
face
dimensional
face image
image
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310253330.0A
Other languages
Chinese (zh)
Inventor
陈人望
曹玮剑
汪铖杰
张振宇
葛志鹏
丁中干
赵艳丹
王福东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202310253330.0A
Publication of CN116958406A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T15/00: 3D [Three Dimensional] image rendering
    • G06T15/10: Geometric effects
    • G06T15/20: Perspective computation
    • G06T15/205: Image-based rendering
    • G06T7/00: Image analysis
    • G06T7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/30: Subject of image; Context of image processing
    • G06T2207/30196: Human being; Person
    • G06T2207/30201: Face
    • G06T2207/30244: Camera pose
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application discloses a face three-dimensional reconstruction method and device, an electronic device, and a storage medium. The method includes: acquiring a plurality of face images of a target face corresponding to a plurality of shooting angles of view; determining the field angle corresponding to each face image in the plurality of face images; rendering an initial three-dimensional face model into two-dimensional images based on the field angle corresponding to each face image, to obtain a rendered face image corresponding to each face image; and adjusting the face parameters of the initial three-dimensional face model based on each face image and its corresponding rendered face image until a preset convergence condition is met, to obtain a target three-dimensional reconstructed face. The application uses the actual field angle of each face image, together with the information from multiple viewing angles, to assist three-dimensional face reconstruction based on a three-dimensional face model. This not only yields a more accurate target three-dimensional reconstructed face, but also places fewer restrictions on the illumination environment in which the face images are shot, improving the stability of face reconstruction.

Description

Face three-dimensional reconstruction method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and apparatus for three-dimensional reconstruction of a face, an electronic device, and a storage medium.
Background
Face three-dimensional (3D) reconstruction refers to reconstructing a 3D model of a face from one or more two-dimensional (2D) images.
In the related art, a two-dimensional face image is typically reconstructed based on a three-dimensional morphable face model (3D Morphable Model, 3DMM) to obtain a corresponding three-dimensional face. However, a three-dimensional face reconstructed with the 3DMM in the related art differs considerably from the actual face, and the ambiguity in going from a 2D image to 3D space cannot be well eliminated, resulting in poor reconstruction accuracy.
Disclosure of Invention
In order to solve the problems in the prior art, the embodiments of the present application provide a face three-dimensional reconstruction method and device, an electronic device, and a storage medium. The technical solution is as follows:
in one aspect, a method for three-dimensional reconstruction of a face is provided, the method comprising:
acquiring a plurality of face images of a target face corresponding to a plurality of shooting angles of view;
determining a field angle corresponding to each face image in the plurality of face images;
rendering an initial three-dimensional face model into two-dimensional images based on the field angle corresponding to each face image, respectively, to obtain a rendered face image corresponding to each face image;
and adjusting face parameters of the initial three-dimensional face model based on each face image and its corresponding rendered face image until a preset convergence condition is met, to obtain a target three-dimensional reconstructed face.
In another aspect, a three-dimensional face reconstruction device is provided, the device including:
the face image acquisition module is used for acquiring a plurality of face images of a target face corresponding to a plurality of shooting angles of view;
the view angle determining module is used for determining the view angle corresponding to each face image in the plurality of face images;
the rendering module is used for respectively rendering the initial three-dimensional face model into two-dimensional images based on the view angles corresponding to each face image to obtain rendered face images corresponding to each face image;
and the face parameter optimization module is used for adjusting the face parameters of the initial three-dimensional face model based on each face image and the rendered face image corresponding to each face image until the preset convergence condition is met, so as to obtain the target three-dimensional reconstruction face.
In an exemplary embodiment, the rendering module includes:
the first acquisition module is used for acquiring initial face parameters;
the model initialization module is used for constructing an initial three-dimensional face model based on the initial face parameters and the basis vectors corresponding to the face parameters; the basis vectors are vectors obtained based on a preset face data set and used for representing basic attributes of the face;
The second acquisition module is used for acquiring initial attitude parameters and initial illumination parameters corresponding to each face image;
and the rendering sub-module is used for respectively rendering the initial three-dimensional face model into a two-dimensional image based on the field angle, the initial posture parameter and the initial illumination parameter corresponding to each face image to obtain a rendered face image corresponding to each face image.
In an exemplary embodiment, the face parameter optimization module includes:
the rendering loss determination module is used for determining rendering loss based on pixel differences between each face image and the corresponding rendering face image;
the key point loss determining module is used for determining a key point loss based on the image position difference between a first face key point in each face image and a second face key point in the corresponding rendered face image; the second face key point is the pixel point at which the vertex on the initial three-dimensional face model corresponding to the first face key point is projected into the rendered face image;
a comprehensive loss determination module for determining a comprehensive loss based at least on the rendering loss and the keypoint loss;
and the parameter adjustment module is used for adjusting the face parameters of the initial three-dimensional face model, the initial posture parameters and the initial illumination parameters based on the comprehensive loss until a preset convergence condition is reached, so as to obtain the target three-dimensional reconstruction face.
In an exemplary embodiment, the face parameters include shape parameters and texture parameters, and the comprehensive loss determination module includes:
a regularization loss determination module for determining a regularization loss based on the shape parameter and the texture parameter;
the weight determining module is used for determining regularization loss weight, rendering loss weight and key point loss weight;
and the weighted summation module is used for carrying out weighted summation on the regularization loss, the rendering loss and the key point loss based on the regularization loss weight, the rendering loss weight and the key point loss weight to obtain comprehensive loss.
In an exemplary embodiment, the face parameters further include an expression parameter, and the regularization loss determination module is specifically configured to determine regularization loss based on the shape parameter, the texture parameter, and the expression parameter.
In an exemplary embodiment, the view angle determining module includes:
the third acquisition module is used for acquiring the lens focal length and the image sensor size corresponding to each face image;
and the view angle calculation module is used for determining the view angle corresponding to each face image based on the ratio of the size of the image sensor corresponding to each face image to the focal length of the lens.
In an exemplary embodiment, the plurality of face images includes face images of the target face in different illumination environments.
In another aspect, an electronic device is provided, including a processor and a memory, where the memory stores at least one instruction or at least one program, where the at least one instruction or the at least one program is loaded and executed by the processor to implement the face three-dimensional reconstruction method of any one of the above aspects.
In another aspect, a computer readable storage medium having stored therein at least one instruction or at least one program loaded and executed by a processor to implement a face three-dimensional reconstruction method as in any of the above aspects is provided.
In another aspect, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the electronic device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the electronic device performs the face three-dimensional reconstruction method of any one of the above aspects.
According to the embodiments of the application, a plurality of face images of the target face corresponding to a plurality of shooting angles of view are obtained, and the field angle corresponding to each face image is determined. The initial three-dimensional face model is rendered into two-dimensional images based on the field angle corresponding to each face image to obtain a rendered face image corresponding to each face image, and the face parameters of the initial three-dimensional face model are adjusted based on each face image and its corresponding rendered face image to obtain the final target three-dimensional reconstructed face. The actual field angle of each face image and the information from multiple viewing angles are thus used to assist the three-dimensional reconstruction of the face based on the three-dimensional face model; this not only yields a more accurate target three-dimensional reconstructed face, but also places fewer restrictions on the illumination environment in which the face images are shot, improving the stability of face reconstruction.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic illustration of an implementation environment provided by an embodiment of the present application;
fig. 2 is a schematic flow chart of a face three-dimensional reconstruction method according to an embodiment of the present application;
fig. 3 is an example of face image corresponding shooting information provided by an embodiment of the present application;
fig. 4 is a schematic flow chart of another face three-dimensional reconstruction method according to an embodiment of the present application;
FIG. 5 is an example of the loss involved in three-dimensional reconstruction of a face provided by an embodiment of the present application;
fig. 6 is a block diagram of a face three-dimensional reconstruction device according to an embodiment of the present application;
fig. 7 is a block diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It will be appreciated that in the specific embodiments of the present application, related data such as user information is involved, and when the above embodiments of the present application are applied to specific products or technologies, user permissions or consents need to be obtained, and the collection, use and processing of related data need to comply with related laws and regulations and standards of related countries and regions.
The terms involved in the embodiments of the present application are explained below.
3DMM: the model is called 3D Morphable Model, namely a three-dimensional deformable face model, is a general three-dimensional face model, and represents a face by using fixed points. The 3DMM may include a shape base vector representing a shape attribute of a face, a texture base vector representing a texture attribute of the face, and an expression base vector representing an expression of the face.
Fov: the Field of view, or Field angle, refers to the angle formed by the visible area of the camera along the vertical/horizontal axis.
Referring to fig. 1, a schematic diagram of an implementation environment provided by an embodiment of the present application is shown, where the implementation environment includes a terminal 110 and a server 120, and communication between the terminal 110 and the server 120 may be through a wired or wireless network connection.
Terminal 110 includes, but is not limited to, a cell phone, a computer, an intelligent voice interaction device, an intelligent home appliance, a vehicle-mounted terminal, an aircraft, and the like. The terminal 110 is installed with client software having a face image processing function, such as an application (App), which may be a stand-alone application or a sub-program within an application. In the embodiment of the application, the face image processing function may specifically be a function of reconstructing a corresponding three-dimensional face based on a two-dimensional face image.
The server 120 may provide a background service for an application in the terminal 110, which may specifically be an image data processing service. The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs (Content Delivery Network, content delivery networks), basic cloud computing services such as big data and artificial intelligent platforms, and the like.
In an exemplary embodiment, the terminal 110 and the server 120 may be node devices in a blockchain system, and may share the acquired and generated information to other node devices in the blockchain system, so as to implement information sharing between multiple node devices. The plurality of node devices in the blockchain system can be configured with the same blockchain, the blockchain consists of a plurality of blocks, and the blocks adjacent to each other in front and back have an association relationship, so that the data in any block can be detected through the next block when being tampered, thereby avoiding the data in the blockchain from being tampered, and ensuring the safety and reliability of the data in the blockchain.
The embodiment of the application can be applied to various scenes, including but not limited to cloud technology, artificial intelligence, intelligent transportation, auxiliary driving and the like.
Among these, artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and extend human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence. Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision.
The artificial intelligence technology is a comprehensive subject, and relates to the technology with wide fields, namely the technology with a hardware level and the technology with a software level. Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning, automatic driving, intelligent traffic and other directions.
Computer Vision (CV) is the science of studying how to make machines "see": more specifically, it uses cameras and computers, instead of human eyes, to perform machine vision tasks such as recognizing and measuring targets, and further performs graphics processing so that the computer produces images better suited for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies, attempting to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision technologies typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, autonomous driving, intelligent transportation, and the like, as well as common biometric technologies such as face recognition and fingerprint recognition.
Three-dimensional face reconstruction is one of the important research directions in computer vision. At present, a two-dimensional face image is generally reconstructed based on a three-dimensional morphable face model (3D Morphable Model, 3DMM) to obtain a corresponding three-dimensional face. However, the three-dimensional faces reconstructed at present differ greatly from actual faces, the ambiguity from the 2D image to 3D space cannot be well eliminated, and reconstruction accuracy is poor.
In the course of developing the technical solution of the application, the inventors found that, in the related art, a preset fixed field angle is used for rendering when reconstructing a face with the 3DMM; this preset fixed field angle differs from the actual field angle of the two-dimensional face image to be reconstructed, so the reconstructed three-dimensional face differs greatly from the actual face.
Based on the above, the embodiments of the application provide a face three-dimensional reconstruction method. The method obtains a plurality of face images of a target face corresponding to a plurality of shooting angles of view and determines the field angle corresponding to each face image; the initial three-dimensional face model is then rendered into two-dimensional images based on the field angle corresponding to each face image to obtain a rendered face image corresponding to each face image, and the face parameters of the initial three-dimensional face model are adjusted based on each face image and its corresponding rendered face image to obtain the final target three-dimensional reconstructed face. The actual field angle of each face image and the information from multiple viewing angles thereby assist the three-dimensional reconstruction of the face based on the three-dimensional face model, which not only yields a more accurate target three-dimensional reconstructed face, but also places fewer restrictions on how the face images are shot and improves the stability of face reconstruction.
Referring to fig. 2, a flow chart of a face three-dimensional reconstruction method according to an embodiment of the present application is shown; the execution subject of the method may be an electronic device, which may be a terminal or a server. It is noted that the present specification provides the method operation steps as described in the embodiments or flowcharts, but more or fewer operation steps may be included based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one of many possible execution orders and does not represent the only order of execution. In an actual system or product, the methods illustrated in the embodiments or figures may be performed sequentially or in parallel (e.g., in a parallel-processor or multi-threaded processing environment). As shown in fig. 2, the method may include:
s201, a plurality of face images of the target face corresponding to a plurality of shooting visual angles are obtained.
The target face is a face to be reconstructed, and each face image is an image obtained by shooting the target face under a shooting view angle. It will be appreciated that one or several face images may be taken at each viewing angle.
The plurality of photographing angles may be photographing angles of the image capturing device set based on actual needs, and different photographing angles may photograph different poses of the target face, such as a front face, a left side face, a right side face, a look-up, a look-down, and the like.
It should be noted that the multiple face images of the target face may be captured by the same image capturing device at multiple photographing angles. Taking a selfie scene as an example, the user may use an image capturing device (such as a terminal device equipped with a camera) to photograph his or her own face from different photographing angles, thereby obtaining 2-3 selfie face images of the user at different photographing angles.
The plurality of face images of the target face may also be images of the target face captured by different image capturing apparatuses at a plurality of capturing angles, for example, the image capturing apparatus 1 captures a face image p1 of the target face at a capturing angle a, and the image capturing apparatus 2 captures a face image p2 and a face image p3 of the target face at a capturing angle b and a capturing angle c, respectively.
The plurality of face images corresponding to the plurality of photographing angles of the target face may be images captured in real time by a camera on the terminal device. In the case where the execution subject of the embodiment of the application is the terminal device, the terminal device may obtain the plurality of face images after collecting, in real time, the plurality of face images corresponding to the plurality of photographing angles of the target face with its camera. In the case where the execution subject is a background server, after the terminal device collects the plurality of face images in real time with its camera, it may send a face three-dimensional reconstruction request carrying the plurality of face images to the background server; after receiving the request, the background server parses it to obtain the plurality of face images corresponding to the plurality of photographing angles of the target face.
The plurality of face images of the target face corresponding to the plurality of shooting angles may be target face images stored in a history image library of a target object to which the target face belongs, and the history image library may be an image library of a terminal local or cloud. In a specific implementation, the target object can select a plurality of face images with different shooting angles from the historical image library thereof, and then initiate three-dimensional reconstruction of the face based on the selected face images with different shooting angles.
For example, the plurality of face images of the target face corresponding to the plurality of photographing angles may be obtained by photographing the target face in different illumination environments.
S203, determining a field angle corresponding to each face image in the plurality of face images.
The angle of view corresponding to each face image is the angle of view Fov adopted by the camera when the image is captured.
The size of the field angle determines the field of view of an optical instrument: with the lens of the optical instrument as the vertex, the angle formed by the two edges of the maximum range through which the object image of the measured target can pass is called the field angle. The field angle is related to the focal length of the lens and the size of the image sensor. Assuming the same lens focal length, the larger the image sensor, the larger the field angle; conversely, assuming the same image sensor size, the larger the lens focal length, the smaller the field angle. The lens focal length represents the distance from the lens to the focused image on the image sensor.
In an exemplary embodiment, the determining of the field angle corresponding to each of the plurality of face images may be: acquiring the lens focal length and the image sensor size corresponding to each face image, and then determining the field angle corresponding to each face image based on the ratio of the image sensor size corresponding to each face image to the lens focal length.
Illustratively, the field angle Fov corresponding to each face image may be calculated based on the following formula (1):

Fov = 2·arctan( ccdw / (2·f_cam) )    (1)

where f_cam represents the focal length of the lens used for shooting the face image, and ccdw represents the size of the image sensor used for shooting the face image.
By utilizing the lens focal length and the image sensor size employed at the time of photographing each face image, an independent angle of view Fov for each face image can be obtained.
In a specific implementation, the size of the image sensor may be found based on the model of the image capturing device that captures the face image, for example, the size of the image sensor corresponding to the camera model "Canon EIS 400D DIGITAL" is 22.4mm.
In practical applications, each face image may carry shooting information, which includes the lens focal length used at shooting time and related parameters of the image capturing device (such as its model and exposure time). Fig. 3 shows an example of the shooting information corresponding to a certain face image. As can be seen from fig. 3, the camera model used to shoot the face image is "Canon EIS 400D DIGITAL", so the corresponding image sensor size can be determined to be 22.4mm; with continued reference to fig. 3, the lens focal length used to shoot the face image can be determined to be 50 mm, and the field angle corresponding to the face image can then be determined based on the foregoing formula (1).
In the above embodiment, the actual field angle corresponding to each face image during shooting can be rapidly and accurately determined by the ratio of the size of the image sensor corresponding to each face image to the focal length of the lens.
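As a non-authoritative illustration, assuming formula (1) takes the standard form Fov = 2·arctan(ccdw / (2·f_cam)), the computation may be sketched in Python as follows (the function name and the degree conversion are illustrative, not from the patent):

```python
import math

def field_of_view_deg(sensor_width_mm: float, focal_length_mm: float) -> float:
    # Formula (1): field angle from the sensor-size / focal-length ratio.
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_length_mm)))

# Values from the shooting-information example above: 22.4 mm sensor, 50 mm lens.
fov = field_of_view_deg(22.4, 50.0)
print(round(fov, 1))  # about 25.3 degrees
```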
And S205, respectively rendering the initial three-dimensional face model into two-dimensional images based on the view angles corresponding to each face image to obtain rendered face images corresponding to each face image.
The initial three-dimensional face model is obtained by initializing a standard three-dimensional face model, and the standard three-dimensional face model is represented based on a preset face data set. Specifically, the standard three-dimensional face model may be a 3DMM, which consists of a mesh; each dimension of the 3DMM coefficients controls a local change of the face. The mesh may be a triangle mesh, composed of vertices in three-dimensional space and triangular patches spanning groups of three vertices; in addition to position coordinates, each vertex may carry information such as color and normal.
Based on the 3DMM, any face may be represented as a weighted combination of all preset face models (assuming m preset face models) in the preset face data set, as in the following formula (2):

S_model = S̄ + Σ_{i=1..m} α_i·s_i,    T_model = T̄ + Σ_{i=1..m} β_i·t_i    (2)

where S_model represents the three-dimensional face shape; S̄ represents the average of the shapes of all preset face models in the preset face data set; s_i represents a shape basis vector determined based on the preset face data set; T_model represents the three-dimensional face texture; T̄ represents the average of the textures of all preset face models in the preset face data set; t_i represents a texture basis vector determined based on the preset face data set; and α_i, β_i are face parameters.
It should be noted that the expression of any three-dimensional face may further be represented by introducing the expression basis vectors of the 3DMM into the weighted combination of formula (2) above.
From the above representation of any face, the quantities to be solved for when solving a three-dimensional face based on the 3DMM are the face parameters (coefficients such as the above α_i and β_i). Based on this, as shown in fig. 4, step S205 may include, when implemented:
s401, acquiring initial face parameters.
The initial face parameters can be set arbitrarily based on practical experience, and the face parameters are optimized and adjusted continuously in the following steps.
S403, constructing an initial three-dimensional face model based on the initial face parameters and the basis vectors corresponding to the face parameters.
The base vector is a vector obtained based on a preset face data set and used for representing basic attributes of the face. The basic attributes of the face include shape, texture, expression and the like.
Corresponding to formula (2), the face parameters include the shape parameters α_i and the texture parameters β_i; the basis vectors corresponding to the shape parameters α_i are the shape basis vectors s_i, and the basis vectors corresponding to the texture parameters β_i are the texture basis vectors t_i. An initial three-dimensional face model can thus be constructed using formula (2).
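As an illustrative sketch of the construction in formula (2) (the vertex count and basis dimension below are hypothetical, not taken from the patent):

```python
import numpy as np

# Hypothetical sizes: V vertices, m preset face models / basis vectors.
V, m = 1000, 80
mean_shape, mean_tex = np.zeros(3 * V), np.zeros(3 * V)               # S̄ and T̄
shape_basis, tex_basis = np.zeros((3 * V, m)), np.zeros((3 * V, m))   # s_i and t_i

def build_face_model(alpha: np.ndarray, beta: np.ndarray):
    # Formula (2): average face plus weighted shape/texture basis vectors.
    s_model = mean_shape + shape_basis @ alpha   # three-dimensional face shape
    t_model = mean_tex + tex_basis @ beta        # three-dimensional face texture
    return s_model, t_model

# S401/S403: initial face parameters, e.g. all zeros, giving the average face.
alpha0, beta0 = np.zeros(m), np.zeros(m)
s0, t0 = build_face_model(alpha0, beta0)
```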
S405, acquiring initial attitude parameters and initial illumination parameters corresponding to each face image.
The initial pose parameters and initial illumination parameters corresponding to each face image are the initialized pose parameters and initialized illumination parameters, which can be set based on practical experience; the initial pose parameters and initial illumination parameters of different face images may be the same or different, and both are continuously optimized and adjusted in later steps. The pose parameters are used to rotate and translate the model during rendering, and the illumination parameters provide the illumination information during rendering.
And S407, respectively rendering the initial three-dimensional face model into a two-dimensional image based on the view angle, the initial attitude parameter and the initial illumination parameter corresponding to each face image to obtain a rendered face image corresponding to each face image.
Wherein, the corresponding field angle of each face image can be used for providing corresponding field range information in the rendering process.
Specifically, a differentiable renderer may be used to map the initial three-dimensional face model to a two-dimensional image using the field angle, initial pose parameters, and initial illumination parameters corresponding to each face image, obtaining the rendered face image corresponding to each face image.
S207, based on each face image and the rendering face image corresponding to each face image, adjusting face parameters of the initial three-dimensional face model until the preset convergence condition is met, and obtaining the target three-dimensional reconstruction face.
Specifically, the face parameters of the initial three-dimensional face model are adjusted based on each face image and its corresponding rendered face image to obtain a three-dimensional face model corresponding to the adjusted face parameters; steps S205 to S207 are then executed iteratively for optimization. When the preset convergence condition is met, the adjustment of the face parameters ends, and the three-dimensional face model corresponding to the final face parameters is the target three-dimensional reconstructed face.
In practical applications, the iterative optimization of the face parameters may use a gradient descent algorithm. At each iteration, the adjusted face parameters are first used to reconstruct the current three-dimensional face model; the current three-dimensional face model and the field angle corresponding to each face image are then input into a differentiable renderer, which renders the current three-dimensional face model into two-dimensional images based on the field angle corresponding to each face image, obtaining the current rendered face image corresponding to each face image. The back-propagation gradient is then computed based on each face image and its current rendered face image, and the face parameters are adjusted based on this gradient; of course, other parameters that need adjustment, such as the pose parameters and illumination parameters, may be adjusted as well.
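The optimization loop might look like the following PyTorch sketch. Here render_stub is a hypothetical stand-in for the differentiable renderer, the parameter dimensions (80-dimensional shape/texture, 6-dimensional pose, 9-dimensional illumination) are assumptions, and only the pixel term of the loss is shown:

```python
import torch

def render_stub(alpha, beta, pose, light, fov):
    # Placeholder for the differentiable renderer; a real one would rasterize
    # the current 3DMM mesh under the given pose, lighting and field angle.
    return torch.tanh(alpha.mean() + beta.mean() + pose.mean()
                      + light.mean() + fov) * torch.ones(64, 64, 3)

def reconstruct(face_images, fovs, n_iters=200, lr=1e-2):
    n = len(face_images)
    alpha = torch.zeros(80, requires_grad=True)   # shared shape parameters
    beta = torch.zeros(80, requires_grad=True)    # shared texture parameters
    poses = [torch.zeros(6, requires_grad=True) for _ in range(n)]   # per image
    lights = [torch.zeros(9, requires_grad=True) for _ in range(n)]  # per image
    opt = torch.optim.Adam([alpha, beta] + poses + lights, lr=lr)
    for _ in range(n_iters):   # convergence here: preset iteration count
        opt.zero_grad()
        loss = torch.zeros(())
        for i, img in enumerate(face_images):
            rendered = render_stub(alpha, beta, poses[i], lights[i], fovs[i])
            loss = loss + ((rendered - img) ** 2).mean()
        loss.backward()
        opt.step()
    return alpha, beta

# Usage with dummy data: three "face images" at three field angles.
imgs = [torch.rand(64, 64, 3) for _ in range(3)]
alpha, beta = reconstruct(imgs, fovs=[25.3, 30.0, 28.0], n_iters=5)
```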
In some exemplary embodiments, with continued reference to fig. 4, the step S207 may include, when implemented:
s409, determining a rendering loss based on a pixel difference between each face image and the corresponding rendered face image.
Wherein the rendering penalty is used to make the rendered face image as close as possible to its corresponding face image.
In a specific implementation, the face region in each face image and the face region in its corresponding rendered face image may be determined first, and the rendering loss is then determined based on the difference between the pixels in the two face regions. Illustratively, the rendering loss may be calculated by the following formula (3):

L_Render = (1/N)·Σ_{i=1..N} ‖ M_i ⊙ (I_i − I_Render,i) ‖    (3)

where L_Render represents the rendering loss; I_i represents the i-th face image of the target face; I_Render,i represents the rendered face image corresponding to the i-th face image; M_i represents the face region in the i-th face image; N represents the total number of face images of the target face; and ‖·‖ denotes the norm.
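A minimal sketch of formula (3), assuming the face-region mask M_i is applied element-wise and using illustrative tensor shapes:

```python
import torch

def render_loss(images, rendered, masks):
    # Formula (3): masked pixel difference, averaged over the N face images.
    # images/rendered: (H, W, 3) tensors; masks: (H, W, 1) binary face regions.
    per_image = [torch.norm(m * (i - r))
                 for i, r, m in zip(images, rendered, masks)]
    return torch.stack(per_image).mean()
```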
S411, determining a key point loss based on the image position difference between the first face key point in each face image and the second face key point in the corresponding rendered face image.
The second face key points are the pixel points at which the vertices on the initial three-dimensional face model that correspond to the first face key points are projected into the rendered face image. A first face key point is any one of a plurality of face key points.
The face key points are key points on the target face, and exemplary face key points may include key points corresponding to eyes, key points corresponding to nose, key points corresponding to mouth, key points corresponding to eyebrows, key points corresponding to chin, and the like.
In a specific implementation, the face key points in each face image can be obtained by detecting the face key points of the input face image through a face key point detection network. The face key point detection network may be any neural network with face key point detection capability, such as a convolutional neural network.
The key point loss is used to make the image position of each second face key point in the rendered face image as close as possible to that of the corresponding first face key point in the face image. Illustratively, the key point loss may be calculated by the following formula (4):

L_Landmark = (1/N)·Σ_{i=1..N} (1/m)·Σ_{j=1..m} ‖ p_2d,i,j − p_3d,i,j ‖    (4)

where L_Landmark represents the key point loss; p_2d,i,j represents the image position (e.g., the coordinates in the face image) of the j-th face key point in the i-th face image of the target face; p_3d,i,j represents the image position (the coordinates in the rendered face image) to which the vertex of the three-dimensional face model corresponding to the j-th face key point in the i-th face image is projected; m represents the total number of face key points; N represents the total number of face images of the target face; and ‖·‖ denotes the norm.
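Formula (4) might be implemented as in the following sketch, assuming the key points are stacked into (N, m, 2) coordinate tensors (the shapes are assumptions, not from the patent):

```python
import torch

def landmark_loss(p2d: torch.Tensor, p3d: torch.Tensor) -> torch.Tensor:
    # Formula (4): distance between detected key points (p2d) and projected
    # model vertices (p3d), averaged over N images and m key points.
    return torch.norm(p2d - p3d, dim=-1).mean()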
And S413, determining comprehensive loss at least based on the rendering loss and the key point loss.
Specifically, the rendering loss and the keypoint loss may be weighted and summed to obtain a composite loss.
And S415, adjusting face parameters, initial posture parameters and initial illumination parameters of the initial three-dimensional face model based on the comprehensive loss until a preset convergence condition is reached, and obtaining the target three-dimensional reconstruction face.
Specifically, the back propagation gradient can be calculated based on the comprehensive loss, and then the face parameter, the initial attitude parameter and the initial illumination parameter can be adjusted based on the back propagation gradient, and iterative optimization is performed based on each adjusted parameter until a preset convergence condition is reached, so that a three-dimensional face model determined by the face parameter, namely the target three-dimensional reconstructed face, is obtained.
The preset convergence condition may be that the number of iterations reaches a preset iteration count, which may be set based on practical experience: generally, when the number of face images is large, a larger iteration count may be set, and conversely, when the number of face images is small, a smaller iteration count may be set. Of course, the preset convergence condition may also be that the comprehensive loss reaches a preset loss threshold, or that the difference between the comprehensive losses of two adjacent iterations reaches a preset loss difference. A sketch of these alternative checks follows below.
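As a hedged illustration, the alternative convergence conditions described above could be checked as follows (the helper name and all threshold values are hypothetical):

```python
def converged(step, losses, max_steps=200, loss_eps=1e-4, delta_eps=1e-6):
    # The three preset convergence conditions described above: a preset
    # iteration count, a loss threshold, or a small change between iterations.
    if step >= max_steps:
        return True
    if losses and losses[-1] <= loss_eps:
        return True
    if len(losses) >= 2 and abs(losses[-1] - losses[-2]) <= delta_eps:
        return True
    return False
```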
In practical applications, in order to make the reconstructed three-dimensional face approach the average face, a regularization term may be added to the face parameters. Further, in the case where the face parameters include shape parameters and texture parameters, as shown in fig. 5, the comprehensive loss may include, in addition to the rendering loss and the key point loss, a regularization term loss corresponding to the face parameters (i.e., the shape parameters and texture parameters). Step S413 then includes: determining a regularization loss based on the shape parameters and the texture parameters; determining a regularization loss weight, a rendering loss weight, and a key point loss weight; and performing a weighted summation of the regularization loss, the rendering loss, and the key point loss based on these weights to obtain the comprehensive loss.
Illustratively, the regularization loss may be calculated by the following formula (5):

L_Reg = w_shape·Σ_{k=1..N_shape} (α_k / σ_shape,k)² + w_tex·Σ_{k=1..N_tex} (β_k / σ_tex,k)²    (5)

where L_Reg represents the regularization loss; N_shape is the dimension of the shape basis vectors in the 3DMM; α_k is the value of the k-th dimension of the shape parameters; σ_shape,k is the variance value corresponding to the k-th dimension of the shape basis vectors; N_tex represents the dimension of the texture basis vectors in the 3DMM; β_k is the value of the k-th dimension of the texture parameters; σ_tex,k is the variance value corresponding to the k-th dimension of the texture basis vectors; and w_shape and w_tex represent the weights corresponding to the shape parameters and texture parameters, respectively, which can be set based on actual needs, e.g., w_shape = 1.0, w_tex = 0.8.
Further, the comprehensive loss can be expressed as the following formula (6):

L = w_Render·L_Render + w_Landmark·L_Landmark + w_Reg·L_Reg    (6)

where w_Render is the rendering loss weight, w_Landmark is the key point loss weight, and w_Reg is the regularization loss weight. w_Render adjusts the contribution of the rendering loss L_Render, w_Landmark adjusts the contribution of the key point loss, and w_Reg adjusts the contribution of the regularization loss; all of them can be set based on actual needs.
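A minimal sketch combining formulas (5) and (6); only w_shape = 1.0 and w_tex = 0.8 come from the text, and the remaining default weights are assumptions:

```python
import torch

def regularization_loss(alpha, beta, sigma_shape, sigma_tex,
                        w_shape=1.0, w_tex=0.8):
    # Formula (5): pull the parameters toward the average face, scaling each
    # dimension by the corresponding basis-vector variance.
    return (w_shape * ((alpha / sigma_shape) ** 2).sum()
            + w_tex * ((beta / sigma_tex) ** 2).sum())

def total_loss(l_render, l_landmark, l_reg,
               w_render=1.0, w_landmark=1.0, w_reg=1.0):
    # Formula (6): weighted sum of rendering, key point and regularization losses.
    return w_render * l_render + w_landmark * l_landmark + w_reg * l_reg

# Example: 80-dimensional shape/texture parameters with unit variances.
l_reg = regularization_loss(torch.zeros(80), torch.zeros(80),
                            torch.ones(80), torch.ones(80))
```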
In practical applications, the 3DMM may further include expression basis vectors when representing a three-dimensional face, and the face parameters may accordingly further include expression parameters. Therefore, in order to obtain an accurate three-dimensional reconstructed face when expressive images are mixed into the input face images, the regularization loss may be determined based on the shape parameters, the texture parameters, and the expression parameters.
In a specific implementation, a regularization loss term corresponding to the expression parameters, namely w_exp·Σ_{k=1..N_exp} (γ_k / σ_exp,k)², may be introduced into the regularization loss of formula (5), where N_exp represents the dimension of the expression basis vectors in the 3DMM; γ_k is the value of the k-th dimension of the expression parameters; σ_exp,k is the variance value corresponding to the k-th dimension of the expression basis vectors; and w_exp represents the weight corresponding to the expression parameters; for example, with w_shape = 1.0 and w_tex = 0.8, w_exp may be set to 1.7e-3.
As can be seen from the above technical solutions, in the embodiments of the application, the actual field angle at shooting time is calculated for each of the face images corresponding to the plurality of different photographing angles, and the three-dimensional face corresponding to the target face is then reconstructed using the actual field angles and the multi-view information, so a more accurate three-dimensional face can be obtained; that is, the proportions of the facial features of a three-dimensional face reconstructed with the embodiments of the application better match the real face. In addition, the embodiments of the application make the whole 3DMM-based three-dimensional face reconstruction process more stable: even if the plurality of face images are face images of the target face in different illumination environments, an accurate three-dimensional face can still be reconstructed.
Corresponding to the face three-dimensional reconstruction method provided by the above embodiments, the embodiment of the present application further provides a face three-dimensional reconstruction device, and since the face three-dimensional reconstruction device provided by the embodiment of the present application corresponds to the face three-dimensional reconstruction method provided by the above embodiments, the implementation of the face three-dimensional reconstruction method is also applicable to the face three-dimensional reconstruction device provided by the embodiment, and will not be described in detail in the embodiment.
Referring to fig. 6, a schematic structural diagram of a three-dimensional face reconstruction device according to an embodiment of the present application is shown, where the device has a function of implementing the three-dimensional face reconstruction method in the above method embodiment, and the function may be implemented by hardware or implemented by executing corresponding software by hardware. As shown in fig. 6, the face three-dimensional reconstruction apparatus 600 may include:
a face image obtaining module 610, configured to obtain a plurality of face images corresponding to a plurality of shooting angles of view of a target face;
a view angle determining module 620, configured to determine a view angle corresponding to each of the face images;
the rendering module 630 is configured to render the initial three-dimensional face model into two-dimensional images based on the view angles corresponding to each face image, so as to obtain rendered face images corresponding to each face image;
the face parameter optimization module 640 is configured to adjust face parameters of the initial three-dimensional face model based on each face image and a rendered face image corresponding to each face image until a preset convergence condition is met, so as to obtain a target three-dimensional reconstructed face.
In an exemplary embodiment, the rendering module 630 includes:
the first acquisition module is used for acquiring initial face parameters;
The model initialization module is used for constructing an initial three-dimensional face model based on the initial face parameters and the basis vectors corresponding to the face parameters; the basis vectors are vectors obtained based on a preset face data set and used for representing basic attributes of the face;
the second acquisition module is used for acquiring initial attitude parameters and initial illumination parameters corresponding to each face image;
the rendering sub-module is used for respectively rendering the initial three-dimensional face model into a two-dimensional image based on the field angle, the initial posture parameter and the initial illumination parameter corresponding to each face image to obtain a rendered face image corresponding to each face image.
In an exemplary embodiment, the face parameter optimization module 640 includes:
the rendering loss determination module is used for determining rendering loss based on pixel differences between each face image and the corresponding rendering face image;
the key point loss determining module is used for determining a key point loss based on the image position difference between a first face key point in each face image and a second face key point in the corresponding rendered face image; the second face key point is the pixel point at which the vertex on the initial three-dimensional face model corresponding to the first face key point is projected into the rendered face image;
A comprehensive loss determination module for determining a comprehensive loss based at least on the rendering loss and the keypoint loss;
and the parameter adjustment module is used for adjusting the face parameters of the initial three-dimensional face model, the initial posture parameters and the initial illumination parameters based on the comprehensive loss until a preset convergence condition is reached, so as to obtain the target three-dimensional reconstruction face.
In an exemplary embodiment, the face parameters include shape parameters and texture parameters, and the comprehensive loss determination module includes:
a regularization loss determination module for determining a regularization loss based on the shape parameter and the texture parameter;
the weight determining module is used for determining regularization loss weight, rendering loss weight and key point loss weight;
and the weighted summation module is used for carrying out weighted summation on the regularization loss, the rendering loss and the key point loss based on the regularization loss weight, the rendering loss weight and the key point loss weight to obtain comprehensive loss.
In an exemplary embodiment, the face parameters further include an expression parameter, and the regularization loss determination module is specifically configured to determine regularization loss based on the shape parameter, the texture parameter, and the expression parameter.
In one exemplary embodiment, the view angle determination module 620 includes:
the third acquisition module is used for acquiring the lens focal length and the image sensor size corresponding to each face image;
and the view angle calculation module is used for determining the view angle corresponding to each face image based on the ratio of the size of the image sensor corresponding to each face image to the focal length of the lens.
In one exemplary embodiment, the plurality of face images includes face images of the target face in different lighting environments.
It should be noted that, in the apparatus provided in the foregoing embodiment, when implementing the functions thereof, only the division of the foregoing functional modules is used as an example, in practical application, the foregoing functional allocation may be implemented by different functional modules, that is, the internal structure of the device is divided into different functional modules, so as to implement all or part of the functions described above. In addition, the apparatus and the method embodiments provided in the foregoing embodiments belong to the same concept, and specific implementation processes of the apparatus and the method embodiments are detailed in the method embodiments and are not repeated herein.
The embodiment of the application provides an electronic device, including a processor and a memory, where the memory stores at least one instruction or at least one program, and the at least one instruction or the at least one program is loaded and executed by the processor to implement any face three-dimensional reconstruction method provided by the above method embodiments.
The memory may be used to store software programs and modules, and the processor performs various functional applications and data processing by running the software programs and modules stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, application programs required for functions, and the like, and the data storage area may store data created according to the use of the device, etc. In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage devices. Accordingly, the memory may also include a memory controller to provide the processor with access to the memory.
The method embodiments provided by the embodiments of the present application may be performed in a computer terminal, a server, or a similar computing device, i.e., the electronic device may include a computer terminal, a server, or a similar computing device. Taking the operation on a terminal as an example, fig. 7 is a block diagram of a hardware structure of an electronic device for operating a face three-dimensional reconstruction method according to an embodiment of the present application, specifically:
The terminal can include an RF (Radio Frequency) circuit 710, a memory 720 including one or more computer-readable storage media, an input unit 730, a display unit 740, a sensor 750, an audio circuit 760, a WiFi (Wireless Fidelity) module 770, a processor 780 including one or more processing cores, and a power supply 790, among other components. It will be appreciated by those skilled in the art that the terminal structure shown in fig. 7 is not limiting of the terminal and may include more or fewer components than shown, combine certain components, or use a different arrangement of components. Wherein:
The RF circuit 710 may be used to receive and transmit signals while sending and receiving messages or during a call; in particular, it delivers downlink information received from a base station to one or more processors 780 for processing, and transmits uplink data to the base station. Typically, the RF circuit 710 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a Subscriber Identity Module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like. In addition, the RF circuit 710 may also communicate with networks and other terminals through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile Communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), email, SMS (Short Message Service), and the like.
The memory 720 may be used to store software programs and modules, and the processor 780 performs various functional applications and data processing by executing the software programs and modules stored in the memory 720. The memory 720 may mainly include a program storage area and a data storage area: the program storage area may store an operating system, the application programs required for the functions, and the like, while the data storage area may store data created according to the use of the terminal. In addition, the memory 720 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 720 may also include a memory controller to provide the processor 780 and the input unit 730 with access to the memory 720.
The input unit 730 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. In particular, the input unit 730 may include a touch-sensitive surface 731 and other input devices 732. The touch-sensitive surface 731, also referred to as a touch display screen or touch pad, may collect touch operations performed by a user on or near it (for example, operations performed on or near the touch-sensitive surface 731 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection apparatus according to a preset program. Optionally, the touch-sensitive surface 731 may comprise two parts: a touch detection device and a touch controller. The touch detection device detects the position of the user's touch, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, and sends them to the processor 780; it can also receive commands from the processor 780 and execute them. In addition, the touch-sensitive surface 731 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch-sensitive surface 731, the input unit 730 may also include other input devices 732. In particular, the other input devices 732 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, a joystick, and the like.
The display unit 740 may be used to display information input by the user or provided to the user, as well as the various graphical user interfaces of the terminal, which may be composed of graphics, text, icons, video, and any combination thereof. The display unit 740 may include a display panel 741, which may optionally be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like. Further, the touch-sensitive surface 731 may overlay the display panel 741; upon detecting a touch operation on or near it, the touch-sensitive surface 731 passes the operation to the processor 780 to determine the type of touch event, and the processor 780 then provides a corresponding visual output on the display panel 741 based on that type. The touch-sensitive surface 731 and the display panel 741 may be two separate components implementing the input and output functions, but in some embodiments the touch-sensitive surface 731 may be integrated with the display panel 741 to implement both.
The terminal may also include at least one sensor 750, such as a light sensor, a motion sensor, and other sensors. In particular, the light sensor may include an ambient light sensor, which may adjust the brightness of the display panel 741 according to the brightness of the ambient light, and a proximity sensor, which may turn off the display panel 741 and/or the backlight when the terminal is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in all directions (generally along three axes) and can detect the magnitude and direction of gravity when stationary; it can be used for applications that recognize the terminal's attitude (such as landscape/portrait switching, related games, and magnetometer attitude calibration) and for vibration-recognition functions (such as a pedometer or tap detection). Other sensors that may be configured for the terminal, such as a gyroscope, barometer, hygrometer, thermometer, and infrared sensor, are not described in detail here.
The audio circuit 760, a speaker 761, and a microphone 762 may provide an audio interface between the user and the terminal. The audio circuit 760 may convert received audio data into an electrical signal and transmit it to the speaker 761, which converts it into a sound signal for output; conversely, the microphone 762 converts collected sound signals into electrical signals, which the audio circuit 760 receives and converts into audio data. After being processed by the processor 780, the audio data may be transmitted via the RF circuit 710 to, for example, another terminal, or output to the memory 720 for further processing. The audio circuit 760 may also include an earphone jack to provide communication between a peripheral earphone and the terminal.
WiFi is a short-range wireless transmission technology. Through the WiFi module 770, the terminal can help the user send and receive e-mail, browse web pages, access streaming media, and the like, providing the user with wireless broadband Internet access. Although fig. 7 shows the WiFi module 770, it is understood that it is not an essential part of the terminal and may be omitted as needed without changing the essence of the invention.
The processor 780 is the control center of the terminal. It connects the various parts of the terminal using various interfaces and lines, and performs the terminal's functions and processes data by running or executing the software programs and/or modules stored in the memory 720 and calling the data stored in the memory 720. Optionally, the processor 780 may include one or more processing cores; preferably, the processor 780 may integrate an application processor, which mainly handles the operating system, user interfaces, applications, and the like, and a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor may also not be integrated into the processor 780.
The terminal also includes a power supply 790 (such as a battery) for powering the various components. Preferably, the power supply is logically connected to the processor 780 through a power management system, so that charging, discharging, and power consumption are managed through the power management system. The power supply 790 may also include one or more components such as a DC or AC power source, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.
Although not shown, the terminal may further include a camera, a Bluetooth module, and the like, which are not described here. Specifically, in this embodiment, the terminal further includes a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the one or more processors. The one or more programs include instructions for performing the face three-dimensional reconstruction method provided by the method embodiments.
Embodiments of the present application also provide a computer-readable storage medium, which may be disposed in an electronic device and stores at least one instruction or at least one program related to implementing a face three-dimensional reconstruction method. The at least one instruction or the at least one program is loaded and executed by a processor to implement any of the face three-dimensional reconstruction methods provided in the foregoing method embodiments.
Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of an electronic device reads the computer instructions from the computer-readable storage medium and executes them, causing the electronic device to perform any of the face three-dimensional reconstruction methods provided in the foregoing method embodiments.
Alternatively, in this embodiment, the storage medium may include, but is not limited to: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium capable of storing program code.
It should be noted that the ordering of the embodiments of the present application is for description only and does not imply any ranking of their merits. The foregoing describes specific embodiments of this specification; other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible and may be advantageous.
The embodiments in this specification are described in a progressive manner; identical or similar parts of the embodiments can be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the device embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant details, refer to the description of the method embodiments.
Those skilled in the art will understand that all or part of the steps for implementing the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.
The foregoing describes preferred embodiments of the application and is not intended to limit the application; any modifications, equivalent replacements, and improvements made within the spirit and principles of the application are intended to be included within the scope of the application.

Claims (11)

1. A method for three-dimensional reconstruction of a human face, the method comprising:
acquiring a plurality of face images of a target face captured from a plurality of shooting angles of view;
determining a field of view corresponding to each face image in the plurality of face images;
respectively rendering an initial three-dimensional face model into two-dimensional images based on the field of view corresponding to each face image, to obtain a rendered face image corresponding to each face image;
and adjusting face parameters of the initial three-dimensional face model based on each face image and the rendered face image corresponding to each face image until a preset convergence condition is satisfied, to obtain a target three-dimensional reconstructed face.
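Read as an algorithm, claim 1 is a render-and-compare fitting loop. The minimal PyTorch sketch below shows only that control flow, with a stand-in linear "renderer" and a plain pixel loss; every name, the stub renderer, and the convergence test are illustrative assumptions, not the patent's implementation (the actual loss terms are detailed in claims 3 to 5).

```python
import torch

torch.manual_seed(0)
n_params, n_pixels = 16, 64
render_map = torch.randn(n_pixels, n_params)             # stub differentiable "renderer"
face_images = [torch.randn(n_pixels) for _ in range(3)]  # stand-ins for the captured photos

params = torch.zeros(n_params, requires_grad=True)       # face parameters to adjust
opt = torch.optim.Adam([params], lr=0.1)

prev = float("inf")
for step in range(500):
    # Render the current model for each image and compare with the photo.
    loss = sum(((render_map @ params) - img).pow(2).mean()
               for img in face_images) / len(face_images)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if abs(prev - loss.item()) < 1e-6:  # a stand-in "preset convergence condition"
        break
    prev = loss.item()
```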
2. The method according to claim 1, wherein the respectively rendering the initial three-dimensional face model into two-dimensional images based on the field of view corresponding to each face image to obtain the rendered face image corresponding to each face image includes:
acquiring initial face parameters;
constructing the initial three-dimensional face model based on the initial face parameters and basis vectors corresponding to the face parameters, each basis vector being obtained from a preset face data set and representing a basic attribute of the face;
acquiring initial pose parameters and initial illumination parameters corresponding to each face image;
and rendering the initial three-dimensional face model into a two-dimensional image based on the field of view, the initial pose parameters, and the initial illumination parameters corresponding to each face image, to obtain the rendered face image corresponding to each face image.
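For illustration only: the construction in claim 2 matches the usual linear morphable-model recipe, a mean face plus basis vectors weighted by the face parameters. The sketch below uses NumPy with invented dimensions, and random arrays stand in for the mean and basis that would come from the preset face data set.

```python
import numpy as np

# Hypothetical sizes; the patent does not fix the model dimensions.
N_VERTS, N_SHAPE, N_TEX = 5000, 80, 80

rng = np.random.default_rng(0)
mean_shape = rng.normal(size=3 * N_VERTS)              # stand-in for the dataset mean shape
shape_basis = rng.normal(size=(3 * N_VERTS, N_SHAPE))  # stand-in for the shape basis vectors
mean_tex = rng.normal(size=3 * N_VERTS)                # stand-in for the mean texture
tex_basis = rng.normal(size=(3 * N_VERTS, N_TEX))      # stand-in for the texture basis vectors

def build_face(shape_params: np.ndarray, tex_params: np.ndarray):
    # Linear morphable model: mean + basis @ parameters.
    verts = (mean_shape + shape_basis @ shape_params).reshape(N_VERTS, 3)
    colors = (mean_tex + tex_basis @ tex_params).reshape(N_VERTS, 3)
    return verts, colors

# The initial model: all parameters zero, i.e. the mean face.
verts, colors = build_face(np.zeros(N_SHAPE), np.zeros(N_TEX))
```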
3. The method according to claim 2, wherein the adjusting the face parameters of the initial three-dimensional face model based on each face image and the rendered face image corresponding to each face image until the preset convergence condition is satisfied, to obtain the target three-dimensional reconstructed face, includes:
determining a rendering loss based on the pixel differences between each face image and the corresponding rendered face image;
determining a key point loss based on the image-position differences between first face key points in each face image and second face key points in the corresponding rendered face image, the second face key points being the pixel points obtained by projecting onto the rendered face image the vertices of the initial three-dimensional face model that correspond to the first face key points;
determining a composite loss based at least on the rendering loss and the key point loss;
and adjusting the face parameters of the initial three-dimensional face model, the initial pose parameters, and the initial illumination parameters based on the composite loss until the preset convergence condition is reached, to obtain the target three-dimensional reconstructed face.
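A minimal sketch of the two data terms in claim 3, again assuming PyTorch. The choice of an L1 norm for pixels and a squared norm for key points is ours; the claim fixes neither.

```python
import torch

def rendering_loss(face_img: torch.Tensor, rendered_img: torch.Tensor) -> torch.Tensor:
    # Per-pixel difference between the captured and rendered images
    # (L1 here; the norm is an assumption).
    return (face_img - rendered_img).abs().mean()

def key_point_loss(detected_kpts: torch.Tensor, projected_kpts: torch.Tensor) -> torch.Tensor:
    # Mean squared image-position difference between the detected 2-D key
    # points and the projections of the corresponding model vertices.
    return ((detected_kpts - projected_kpts) ** 2).sum(dim=-1).mean()

# Toy usage with random stand-ins for one 256x256 image and 68 key points.
img, ren = torch.rand(3, 256, 256), torch.rand(3, 256, 256)
kp_det, kp_proj = torch.rand(68, 2), torch.rand(68, 2)
print(rendering_loss(img, ren).item(), key_point_loss(kp_det, kp_proj).item())
```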
4. The method according to claim 3, wherein the face parameters include shape parameters and texture parameters, and the determining a composite loss based at least on the rendering loss and the key point loss includes:
determining a regularization loss based on the shape parameters and the texture parameters;
determining a regularization loss weight, a rendering loss weight, and a key point loss weight;
and performing a weighted summation of the regularization loss, the rendering loss, and the key point loss based on the regularization loss weight, the rendering loss weight, and the key point loss weight, to obtain the composite loss.
5. The method of claim 4, wherein the face parameters further include expression parameters, and the determining a regularization loss based on the shape parameters and the texture parameters includes:
determining the regularization loss based on the shape parameters, the texture parameters, and the expression parameters.
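Continuing the sketch, the regularization and weighted summation of claims 4 and 5 might look as follows; the squared-norm prior and the weight values are assumptions, since the claims leave both open.

```python
from typing import Optional

import torch

def regularization_loss(shape_p: torch.Tensor, tex_p: torch.Tensor,
                        expr_p: Optional[torch.Tensor] = None) -> torch.Tensor:
    # Squared-norm prior keeping parameters near the model mean (assumed form).
    loss = (shape_p ** 2).sum() + (tex_p ** 2).sum()
    if expr_p is not None:  # claim-5 variant adds the expression parameters
        loss = loss + (expr_p ** 2).sum()
    return loss

# Hypothetical weights; the patent leaves their values open.
W_REG, W_RENDER, W_KPT = 1e-4, 1.0, 0.5

def composite_loss(l_render: torch.Tensor, l_kpt: torch.Tensor,
                   shape_p: torch.Tensor, tex_p: torch.Tensor,
                   expr_p: Optional[torch.Tensor] = None) -> torch.Tensor:
    # Weighted summation of the three loss terms (claim 4).
    return (W_RENDER * l_render + W_KPT * l_kpt
            + W_REG * regularization_loss(shape_p, tex_p, expr_p))
```

A small regularization weight is typical, so the prior discourages implausible faces without overriding the image evidence.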
6. The method according to any one of claims 1 to 5, wherein the determining a field of view corresponding to each face image in the plurality of face images includes:
acquiring the lens focal length and the image sensor size corresponding to each face image;
and determining the field of view corresponding to each face image based on the ratio of the image sensor size corresponding to each face image to the lens focal length.
7. The method of any one of claims 1 to 5, wherein the plurality of face images comprises face images of the target face in different lighting environments.
8. A three-dimensional reconstruction device for a human face, the device comprising:
the face image acquisition module is used for acquiring a plurality of face images of a target face captured from a plurality of shooting angles of view;
the field-of-view determining module is used for determining the field of view corresponding to each face image in the plurality of face images;
the rendering module is used for respectively rendering an initial three-dimensional face model into two-dimensional images based on the field of view corresponding to each face image, to obtain a rendered face image corresponding to each face image;
and the face parameter optimization module is used for adjusting face parameters of the initial three-dimensional face model based on each face image and the rendered face image corresponding to each face image until a preset convergence condition is satisfied, to obtain a target three-dimensional reconstructed face.
9. An electronic device comprising a processor and a memory, wherein the memory stores at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by the processor to implement the face three-dimensional reconstruction method of any one of claims 1-7.
10. A computer readable storage medium having stored therein at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by a processor to implement the face three-dimensional reconstruction method of any one of claims 1 to 7.
11. A computer program, characterized in that the computer program, when being executed by a processor, implements the face three-dimensional reconstruction method according to any one of claims 1 to 7.
CN202310253330.0A 2023-03-09 2023-03-09 Face three-dimensional reconstruction method and device, electronic equipment and storage medium Pending CN116958406A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310253330.0A CN116958406A (en) 2023-03-09 2023-03-09 Face three-dimensional reconstruction method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116958406A (en) 2023-10-27

Family

ID=88453667

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310253330.0A Pending CN116958406A (en) 2023-03-09 2023-03-09 Face three-dimensional reconstruction method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116958406A (en)


Legal Events

Date Code Title Description
PB01 Publication