WO2024145935A1

WO2024145935A1 - Point cloud encoding method and apparatus, point cloud decoding method and apparatus, device, and storage medium

Info

Publication number: WO2024145935A1
Application number: PCT/CN2023/071072
Authority: WO
Inventors: 孙泽星
Original assignee: Oppo广东移动通信有限公司
Priority date: 2023-01-06
Filing date: 2023-01-06
Publication date: 2024-07-11

Abstract

The present application provides a point cloud encoding method and apparatus, a point cloud decoding method and apparatus, a device, and a storage medium. The point cloud encoding/decoding method comprises: determining N prediction nodes of a current node in a prediction reference frame of a current frame to be encoded/decoded, and performing predictive encoding/decoding on coordinate information of points in the current node on the basis of geometry encoding/decoding information of points in the N prediction nodes. That is to say, in embodiments of the present application, a node is subjected to direct encoding/decoding DCM for optimization, the time-domain correlation between the adjacent frames is considered, and geometry information of the prediction nodes in the prediction reference frame is used to perform predictive encoding/decoding on geometry information of points in a node to be subjected to IDCM (i.e., the current node); the time-domain correlation between the adjacent frames is considered, so that the encoding/decoding efficiency of the geometry information of a point cloud is further improved.

Description

Point cloud encoding and decoding method, device, equipment and storage medium

Technical Field

The present application relates to the field of point cloud technology, and in particular to a point cloud encoding and decoding method, device, equipment and storage medium.

Background technique

The surface of the object is collected by the acquisition device to form point cloud data, which includes hundreds of thousands or even more points. In the video production process, the point cloud data is transmitted between the point cloud encoding device and the point cloud decoding device in the form of point cloud media files. However, such a large number of points brings challenges to transmission, so the point cloud encoding device needs to compress the point cloud data before transmission.

Point cloud compression is also called point cloud encoding. In the point cloud encoding process, for points that are isolated in the geometric space, the use of infer direct coding (IDCM) can greatly reduce the complexity. When the direct coding method is used to encode and decode the current node, the geometric information of the point in the current node is directly encoded. However, when encoding the geometric information of the point in the current node, the inter-frame information is not considered, which reduces the encoding and decoding performance of the point cloud.

Summary of the invention

The embodiments of the present application provide a point cloud encoding and decoding method, apparatus, device and storage medium, which take into account inter-frame information when encoding the geometric information of the node midpoint, thereby improving the encoding and decoding performance of the point cloud.

In a first aspect, an embodiment of the present application provides a point cloud decoding method, comprising:

In a prediction reference frame of a current frame to be decoded, determining N prediction nodes of a current node, wherein the current node is a node to be decoded in the current frame to be decoded, and N is a positive integer;

Based on the geometric decoding information of the N predicted nodes, the coordinate information of the midpoint of the current node is predicted and decoded.

In a second aspect, the present application provides a point cloud encoding method, comprising:

In a prediction reference frame of a current frame to be encoded, determining N prediction nodes of a current node, wherein the current node is a node to be encoded in the current frame to be encoded, and N is a positive integer;

Based on the geometric coding information of the N predicted nodes, the coordinate information of the midpoint of the current node is predicted and coded.

In a third aspect, the present application provides a point cloud decoding device for executing the method in the first aspect or its respective implementations. Specifically, the device includes a functional unit for executing the method in the first aspect or its respective implementations.

In a fourth aspect, the present application provides a point cloud encoding device for executing the method in the second aspect or its respective implementations. Specifically, the device includes a functional unit for executing the method in the second aspect or its respective implementations.

In a fifth aspect, a point cloud decoder is provided, comprising a processor and a memory. The memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory to execute the method in the first aspect or its implementation manners.

In a sixth aspect, a point cloud encoder is provided, comprising a processor and a memory. The memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory to execute the method in the second aspect or its respective implementations.

In a seventh aspect, a point cloud encoding and decoding system is provided, comprising a point cloud encoder and a point cloud decoder. The point cloud decoder is used to execute the method in the first aspect or its respective implementations, and the point cloud encoder is used to execute the method in the second aspect or its respective implementations.

In an eighth aspect, a chip is provided for implementing the method in any one of the first to second aspects or their respective implementations. Specifically, the chip includes: a processor for calling and running a computer program from a memory, so that a device equipped with the chip executes the method in any one of the first to second aspects or their respective implementations.

In a ninth aspect, a computer-readable storage medium is provided for storing a computer program, wherein the computer program enables a computer to execute the method of any one of the first to second aspects or any of their implementations.

In a tenth aspect, a computer program product is provided, comprising computer program instructions, which enable a computer to execute the method in any one of the first to second aspects or their respective implementations.

In an eleventh aspect, a computer program is provided, which, when executed on a computer, enables the computer to execute the method in any one of the first to second aspects or in each of their implementations.

In a twelfth aspect, a code stream is provided, which is generated based on the method of the second aspect. Optionally, the code stream includes at least one of the first parameter and the second parameter.

Based on the above technical solution, when decoding the current node in the current coded frame, in the predicted reference frame of the current frame to be coded, the N predicted nodes of the current node are determined, and based on the geometric coding and decoding information of the midpoints of the N predicted nodes, the coordinate information of the midpoint of the current node is predicted and coded. In other words, the embodiment of the present application optimizes the node when performing DCM direct coding and decoding, and predicts and codes the geometric information of the midpoint of the IDCM node (i.e., the current node) of the to-be-coded node by considering the correlation in the time domain between adjacent frames, and further improves the coding and decoding efficiency of the geometric information of the point cloud by considering the correlation in the time domain between adjacent frames.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG1A is a schematic diagram of a point cloud;

Figure 1B is a partial enlarged view of the point cloud;

FIG2 is a schematic diagram of six viewing angles of a point cloud image;

FIG3 is a schematic block diagram of a point cloud encoding and decoding system according to an embodiment of the present application;

FIG4A is a schematic block diagram of a point cloud encoder provided in an embodiment of the present application;

FIG4B is a schematic block diagram of a point cloud decoder provided in an embodiment of the present application;

FIG5A is a schematic plan view;

FIG5B is a schematic diagram of node coding sequence;

FIG5C is a schematic diagram of a plane logo;

FIG5D is a schematic diagram of a sibling node;

FIG5E is a schematic diagram of the intersection of a laser radar and a node;

FIG5F is a schematic diagram of neighborhood nodes at the same division depth and the same coordinates;

FIG5G is a schematic diagram of a neighboring node when the node is located at a lower plane position of the parent node;

FIG5H is a schematic diagram of a neighboring node when the node is located at a high plane position of the parent node;

FIG5I is a schematic diagram of predictive coding of planar position information of a laser radar point cloud;

FIG6A is a schematic diagram of IDCM encoding;

FIG6B is a schematic diagram of coordinate transformation of a point cloud acquired by a rotating laser radar;

FIG6C is a schematic diagram of predictive coding in the X or Y axis direction;

FIG6D is a schematic diagram showing the angle of the X or Y plane predicted by the horizontal azimuth angle;

FIG6E is a schematic diagram of predictive coding of the X or Y axis;

7A to 7C are schematic diagrams of geometric information encoding based on triangular facets;

FIG8A is a schematic diagram of an encoding framework of AVS;

FIG8B is a schematic diagram of a decoding framework of AVS;

FIG9A is a schematic diagram of reference nodes selected by each sub-node;

FIG9B is a schematic diagram of four groups of reference neighbor nodes of the current node;

FIG9C is a schematic diagram showing that sub-blocks correspond to 6 adjacent parent blocks respectively;

FIG9D is a schematic diagram of 18 neighboring blocks and their Morton sequence numbers used by the current block to be encoded;

FIG9E is a simplified prediction tree diagram;

FIG10 is a schematic diagram of a point cloud decoding method flow chart provided in an embodiment of the present application;

FIG11 is a schematic diagram of octree partitioning;

FIG12 is a schematic diagram of a prediction node;

FIG13 is a schematic diagram of a domain node;

FIG14 is a schematic diagram of corresponding nodes of a domain node;

FIG15A is a schematic diagram of a prediction node of a current node in a prediction reference frame;

FIG15B is a schematic diagram of the prediction nodes of the current node in two prediction reference frames;

FIG16A is a schematic diagram of IDCM encoding;

FIG16B is a schematic diagram of IDCM decoding;

FIG17 is a schematic diagram of a point cloud encoding method flow chart provided by an embodiment of the present application;

FIG18 is a schematic block diagram of a point cloud decoding device provided in an embodiment of the present application;

FIG19 is a schematic block diagram of a point cloud encoding device provided in an embodiment of the present application;

FIG20 is a schematic block diagram of an electronic device provided in an embodiment of the present application;

Figure 21 is a schematic block diagram of the point cloud encoding and decoding system provided in an embodiment of the present application.

Detailed ways

The present application can be applied to the field of point cloud upsampling technology, for example, can be applied to the field of point cloud compression technology.

In order to facilitate understanding of the embodiments of the present application, the relevant concepts involved in the embodiments of the present application are briefly introduced as follows:

Point cloud refers to a set of irregularly distributed discrete points in space that express the spatial structure and surface properties of a three-dimensional object or three-dimensional scene. Figure 1A is a schematic diagram of a three-dimensional point cloud image, and Figure 1B is a partial enlarged view of Figure 1A. It can be seen from Figures 1A and 1B that the point cloud surface is composed of densely distributed points.

Two-dimensional images have information expressed at each pixel point, and the distribution is regular, so there is no need to record its position information; however, the distribution of points in the point cloud in three-dimensional space is random and irregular, so it is necessary to record the position of each point in space to fully express a point cloud. Similar to two-dimensional images, each position has corresponding attribute information during the acquisition process.

Point cloud data is a specific record form of point cloud. Points in the point cloud may include the location information of the point and the attribute information of the point. For example, the location information of the point may be the three-dimensional coordinate information of the point. The location information of the point may also be called the geometric information of the point. For example, the attribute information of the point may include color information, reflectance information, normal vector information, etc. Color information reflects the color of an object, and reflectance information reflects the surface material of an object. The color information may be information in any color space. For example, the color information may be (RGB). For another example, the color information may be information about brightness and chromaticity (YcbCr, YUV). For example, Y represents brightness (Luma), Cb (U) represents blue color difference, Cr (V) represents red, and U and V represent chromaticity (Chroma) for describing color difference information. For example, according to the point cloud obtained by the laser measurement principle, the points in the point cloud may include the three-dimensional coordinate information of the point and the laser reflection intensity (reflectance) of the point. For another example, according to the point cloud obtained by the photogrammetry principle, the points in the point cloud may include the three-dimensional coordinate information of the point and the color information of the point. For another example, a point cloud is obtained by combining the principles of laser measurement and photogrammetry. The points in the point cloud may include the three-dimensional coordinate information of the point, the laser reflection intensity (reflectance) of the point, and the color information of the point. FIG2 shows a point cloud image, where FIG2 shows six viewing angles of the point cloud image. Table 1 shows the point cloud data storage format composed of a file header information part and a data part:

Table 1

In Table 1, the header information includes the data format, data representation type, the total number of point cloud points, and the content represented by the point cloud. For example, the point cloud in this example is in the ".ply" format, represented by ASCII code, with a total number of 207242 points, and each point has three-dimensional position information XYZ and three-dimensional color information RGB.

Point clouds can flexibly and conveniently express the spatial structure and surface properties of three-dimensional objects or scenes. Point clouds are obtained by directly sampling real objects, so they can provide a strong sense of reality while ensuring accuracy. Therefore, they are widely used, including virtual reality games, computer-aided design, geographic information systems, automatic navigation systems, digital cultural heritage, free viewpoint broadcasting, three-dimensional immersive remote presentation, and three-dimensional reconstruction of biological tissues and organs.

Point cloud data can be obtained by at least one of the following ways: (1) computer equipment generation. Computer equipment can generate point cloud data based on virtual three-dimensional objects and virtual three-dimensional scenes. (2) 3D (3-Dimension) laser scanning acquisition. 3D laser scanning can be used to obtain point cloud data of static real-world three-dimensional objects or three-dimensional scenes, and millions of point cloud data can be obtained per second; (3) 3D photogrammetry acquisition. The visual scene of the real world is collected by 3D photography equipment (i.e., a group of cameras or camera equipment with multiple lenses and sensors) to obtain point cloud data of the visual scene of the real world. 3D photography can be used to obtain point cloud data of dynamic real-world three-dimensional objects or three-dimensional scenes. (4) Point cloud data of biological tissues and organs can be obtained by medical equipment. In the medical field, point cloud data of biological tissues and organs can be obtained by medical equipment such as magnetic resonance imaging (MRI), computed tomography (CT), and electromagnetic positioning information.

Point clouds can be divided into dense point clouds and sparse point clouds according to the way they are acquired.

Point clouds are divided into the following types according to the time series of the data:

The first type of static point cloud: the object is stationary, and the device that obtains the point cloud is also stationary;

The second type of dynamic point cloud: the object is moving, but the device that obtains the point cloud is stationary;

The third type of dynamic point cloud acquisition: the device that acquires the point cloud is moving.

Point clouds can be divided into two categories according to their uses:

Category 1: Machine perception point cloud, which can be used in autonomous navigation systems, real-time inspection systems, geographic information systems, visual sorting robots, disaster relief robots, etc.

Category 2: Point cloud perceived by the human eye, which can be used in point cloud application scenarios such as digital cultural heritage, free viewpoint broadcasting, 3D immersive communication, and 3D immersive interaction.

The above point cloud acquisition technology reduces the cost and time of point cloud data acquisition and improves the accuracy of data. The change in the point cloud data acquisition method makes it possible to acquire a large amount of point cloud data. With the growth of application demand, the processing of massive 3D point cloud data encounters bottlenecks of storage space and transmission bandwidth.

Taking a point cloud video with a frame rate of 30fps (frames per second) as an example, the number of points in each point cloud frame is 700,000, and each point has coordinate information xyz (float) and color information RGB (uchar). The data volume of a 10s point cloud video is approximately 0.7 million X (4ByteX3+1ByteX3) X 30fpsX10s = 3.15GB, while the YUV sampling format is 4:2:0, and the frame rate is 24fps. The data volume of a 1280X720 two-dimensional video in 10s is approximately 1280X720X12bitX24framesX10s≈0.33GB, and the data volume of a 10s two-view 3D video is approximately 0.33X2 = 0.66GB. It can be seen that the data volume of a point cloud video far exceeds that of a two-dimensional video and a three-dimensional video of the same length. Therefore, in order to better realize data management, save server storage space, and reduce the transmission traffic and transmission time between the server and the client, point cloud compression has become a key issue in promoting the development of the point cloud industry.

The following is an introduction to the relevant knowledge of point cloud encoding and decoding.

FIG3 is a schematic block diagram of a point cloud encoding and decoding system involved in an embodiment of the present application. It should be noted that FIG3 is only an example, and the point cloud encoding and decoding system of the embodiment of the present application includes but is not limited to that shown in FIG3. As shown in FIG3, the point cloud encoding and decoding system 100 includes an encoding device 110 and a decoding device 120. The encoding device is used to encode (which can be understood as compression) the point cloud data to generate a code stream, and transmit the code stream to the decoding device. The decoding device decodes the code stream generated by the encoding device to obtain decoded point cloud data.

The encoding device 110 of the embodiment of the present application can be understood as a device with a point cloud encoding function, and the decoding device 120 can be understood as a device with a point cloud decoding function, that is, the embodiment of the present application includes a wider range of devices for the encoding device 110 and the decoding device 120, such as smartphones, desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, televisions, cameras, display devices, digital media players, point cloud game consoles, vehicle-mounted computers, etc.

In some embodiments, the encoding device 110 may transmit the encoded point cloud data (such as a code stream) to the decoding device 120 via the channel 130. The channel 130 may include one or more media and/or devices capable of transmitting the encoded point cloud data from the encoding device 110 to the decoding device 120.

In one example, the channel 130 includes one or more communication media that enable the encoding device 110 to transmit the encoded point cloud data directly to the decoding device 120 in real time. In this example, the encoding device 110 can modulate the encoded point cloud data according to the communication standard and transmit the modulated point cloud data to the decoding device 120. The communication medium includes a wireless communication medium, such as a radio frequency spectrum, and optionally, the communication medium may also include a wired communication medium, such as one or more physical transmission lines.

In another example, the channel 130 includes a storage medium, which can store the point cloud data encoded by the encoding device 110. The storage medium includes a variety of locally accessible data storage media, such as optical disks, DVDs, flash memories, etc. In this example, the decoding device 120 can obtain the encoded point cloud data from the storage medium.

In another example, the channel 130 may include a storage server that can store the point cloud data encoded by the encoding device 110. In this example, the decoding device 120 can download the stored encoded point cloud data from the storage server. Optionally, the storage server can store the encoded point cloud data and transmit the encoded point cloud data to the decoding device 120, such as a web server (e.g., for a website), a file transfer protocol (FTP) server, etc.

In some embodiments, the encoding device 110 includes a point cloud encoder 112 and an output interface 113. The output interface 113 may include a modulator/demodulator (modem) and/or a transmitter.

In some embodiments, the encoding device 110 may further include a point cloud source 111 in addition to the point cloud encoder 112 and the input interface 113 .

The point cloud source 111 may include at least one of a point cloud acquisition device (e.g., a scanner), a point cloud archive, a point cloud input interface, and a computer graphics system, wherein the point cloud input interface is used to receive point cloud data from a point cloud content provider, and the computer graphics system is used to generate point cloud data.

The point cloud encoder 112 encodes the point cloud data from the point cloud source 111 to generate a code stream. The point cloud encoder 112 transmits the encoded point cloud data directly to the decoding device 120 via the output interface 113. The encoded point cloud data can also be stored in a storage medium or a storage server for subsequent reading by the decoding device 120.

In some embodiments, the decoding device 120 includes an input interface 121 and a point cloud decoder 122 .

In some embodiments, the decoding device 120 may further include a display device 123 in addition to the input interface 121 and the point cloud decoder 122 .

The input interface 121 includes a receiver and/or a modem. The input interface 121 can receive the encoded point cloud data through the channel 130 .

The point cloud decoder 122 is used to decode the encoded point cloud data to obtain decoded point cloud data, and transmit the decoded point cloud data to the display device 123.

The decoded point cloud data is displayed on the display device 123. The display device 123 may be integrated with the decoding device 120 or may be external to the decoding device 120. The display device 123 may include a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or other types of display devices.

In addition, Figure 3 is only an example, and the technical solution of the embodiment of the present application is not limited to Figure 3. For example, the technology of the present application can also be applied to unilateral point cloud encoding or unilateral point cloud decoding.

The current point cloud encoder can adopt two point cloud compression coding technology routes proposed by the International Standards Organization Moving Picture Experts Group (MPEG), namely Video-based Point Cloud Compression (VPCC) and Geometry-based Point Cloud Compression (GPCC). VPCC projects the three-dimensional point cloud into two dimensions and uses the existing two-dimensional coding tools to encode the projected two-dimensional image. GPCC uses a hierarchical structure to divide the point cloud into multiple units step by step, and encodes the entire point cloud by encoding the division process.

The following uses the GPCC encoding and decoding framework as an example to explain the point cloud encoder and point cloud decoder applicable to the embodiments of the present application.

FIG. 4A is a schematic block diagram of a point cloud encoder provided in an embodiment of the present application.

From the above, we can know that the points in the point cloud can include the location information of the points and the attribute information of the points. Therefore, the encoding of the points in the point cloud mainly includes location encoding and attribute encoding. In some examples, the location information of the points in the point cloud is also called geometric information, and the corresponding location encoding of the points in the point cloud can also be called geometric encoding.

In the GPCC coding framework, the geometric information of the point cloud and the corresponding attribute information are encoded separately.

As shown in FIG. 4A below, the current geometric coding and decoding of G-PCC can be divided into octree-based geometric coding and decoding and prediction tree-based geometric coding and decoding.

The process of position coding includes: preprocessing the points in the point cloud, such as coordinate transformation, quantization, and removal of duplicate points; then, geometric coding the preprocessed point cloud, such as constructing an octree, or constructing a prediction tree, and geometric coding based on the constructed octree or prediction tree to form a geometric code stream. At the same time, based on the position information output by the constructed octree or prediction tree, the position information of each point in the point cloud data is reconstructed to obtain the reconstructed value of the position information of each point.

The attribute encoding process includes: given the reconstruction information of the input point cloud position information and the original value of the attribute information, selecting one of the three prediction modes for point cloud prediction, quantizing the predicted result, and performing arithmetic coding to form an attribute code stream.

As shown in Figure 4A, position encoding can be achieved by the following units:

Coordinate transformation (Tanmsform coordinates) unit 201, voxel (Voxelize) unit 202, octree partition (Analyze octree) unit 203, geometry reconstruction (Reconstruct geometry) unit 204, arithmetic encoding (Arithmetic enconde) unit 205, surface fitting unit (Analyze surface approximation) 206 and prediction tree construction unit 207.

The coordinate conversion unit 201 can be used to convert the world coordinates of the point in the point cloud into relative coordinates. For example, the geometric coordinates of the point are respectively subtracted from the minimum value of the xyz coordinate axis, which is equivalent to a DC removal operation, so as to realize the conversion of the coordinates of the point in the point cloud from the world coordinates to the relative coordinates.

The voxel unit 202 is also called a quantize and remove points unit, which can reduce the number of coordinates by quantization; after quantization, originally different points may be assigned the same coordinates, based on which, duplicate points can be deleted by deduplication operation; for example, multiple clouds with the same quantized position and different attribute information can be merged into one cloud by attribute conversion. In some embodiments of the present application, the voxel unit 202 is an optional unit module.

The octree division unit 203 may use an octree encoding method to encode the position information of the quantized points. For example, the point cloud is divided in the form of an octree, so that the position of the point can correspond to the position of the octree one by one, and the position of the point in the octree is counted and its flag is recorded as 1 to perform geometric encoding.

In some embodiments, in the process of geometric information encoding based on triangle soup (trisoup), the point cloud is also divided into octrees by the octree division unit 203. However, different from the geometric information encoding based on the octree, the trisoup does not need to divide the point cloud into unit cubes with a side length of 1X1X1 step by step, but stops dividing when the block (sub-block) has a side length of W. Based on the surface formed by the distribution of the point cloud in each block, at most twelve vertices (intersections) generated by the surface and the twelve edges of the block are obtained, and the intersections are surface fitted by the surface fitting unit 206, and the fitted intersections are geometrically encoded.

The prediction tree construction unit 207 can use the prediction tree encoding method to encode the position information of the quantized points. For example, the point cloud is divided in the form of a prediction tree, so that the position of the point can correspond to the position of the node in the prediction tree one by one. By counting the positions of the points in the prediction tree, different prediction modes are selected to predict the geometric position information of the node to obtain the prediction residual, and the geometric prediction residual is quantized using the quantization parameter. Finally, through continuous iteration, the prediction residual of the prediction tree node position information, the prediction tree structure and the quantization parameter are encoded to generate a binary code stream.

The geometric reconstruction unit 204 can perform position reconstruction based on the position information output by the octree division unit 203 or the intersection points fitted by the surface fitting unit 206 to obtain the reconstructed value of the position information of each point in the point cloud data. Alternatively, the position reconstruction can be performed based on the position information output by the prediction tree construction unit 207 to obtain the reconstructed value of the position information of each point in the point cloud data.

The arithmetic coding unit 205 can use entropy coding to perform arithmetic coding on the position information output by the octree analysis unit 203 or the intersection points fitted by the surface fitting unit 206, or the geometric prediction residual values output by the prediction tree construction unit 207 to generate a geometric code stream; the geometric code stream can also be called a geometry bitstream.

Attribute encoding can be achieved through the following units:

A color conversion (Transform colors) unit 210, a recoloring (Transfer attributes) unit 211, a Region Adaptive Hierarchical Transform (RAHT) unit 212, a Generate LOD (Generate LOD) unit 213, a lifting (lifting transform) unit 214, a Quantize coefficients (Quantize coefficients) unit 215 and an arithmetic coding unit 216.

It should be noted that the point cloud encoder 200 may include more, fewer, or different functional components than those shown in FIG. 4A .

The color conversion unit 210 may be used to convert the RGB color space of the points in the point cloud into a YCbCr format or other formats.

The recoloring unit 211 recolors the color information using the reconstructed geometric information so that the uncoded attribute information corresponds to the reconstructed geometric information.

After the original value of the attribute information of the point is converted by the recoloring unit 211, any transformation unit can be selected to transform the points in the point cloud. The transformation unit may include: RAHT transformation 212 and lifting (lifting transform) unit 214. Among them, the lifting transformation depends on generating a level of detail (LOD).

Any of the RAHT transformation and the lifting transformation can be understood as being used to predict the attribute information of a point in a point cloud to obtain a predicted value of the attribute information of the point, and then obtain a residual value of the attribute information of the point based on the predicted value of the attribute information of the point. For example, the residual value of the attribute information of the point can be the original value of the attribute information of the point minus the predicted value of the attribute information of the point.

In one embodiment of the present application, the process of generating LOD by the LOD generating unit includes: obtaining the Euclidean distance between points according to the position information of the points in the point cloud; and dividing the points into different detail expression layers according to the Euclidean distance. In one embodiment, the Euclidean distances can be sorted and the Euclidean distances in different ranges can be divided into different detail expression layers. For example, a point can be randomly selected as the first detail expression layer. Then the Euclidean distances between the remaining points and the point are calculated, and the points whose Euclidean distances meet the first threshold requirement are classified as the second detail expression layer. The centroid of the points in the second detail expression layer is obtained, and the Euclidean distances between the points other than the first and second detail expression layers and the centroid are calculated, and the points whose Euclidean distances meet the second threshold are classified as the third detail expression layer. By analogy, all points are classified into the detail expression layer. By adjusting the threshold of the Euclidean distance, the number of points in each LOD layer can be increased. It should be understood that the LOD division method can also be adopted in other ways, and the present application does not limit this.

It should be noted that the point cloud may be directly divided into one or more detail expression layers, or the point cloud may be first divided into a plurality of point cloud slices, and then each point cloud slice may be divided into one or more LOD layers.

For example, the point cloud can be divided into multiple point cloud blocks, and the number of points in each point cloud block can be between 550,000 and 1.1 million. Each point cloud block can be regarded as a separate point cloud. Each point cloud block can be divided into multiple detail expression layers, and each detail expression layer includes multiple points. In one embodiment, the detail expression layer can be divided according to the Euclidean distance between points.

The quantization unit 215 may be used to quantize the residual value of the attribute information of the point. For example, if the quantization unit 215 is connected to the RAHT transformation unit 212, the quantization unit 215 may be used to quantize the residual value of the attribute information of the point output by the RAHT transformation unit 212.

The arithmetic coding unit 216 may use zero run length coding to perform entropy coding on the residual value of the attribute information of the point to obtain an attribute code stream. The attribute code stream may be bit stream information.

Figure 4B is a schematic block diagram of a point cloud decoder provided in an embodiment of the present application.

As shown in Fig. 4B, the decoder 300 can obtain the point cloud code stream from the encoding device, and obtain the position information and attribute information of the points in the point cloud by parsing the code. The decoding of the point cloud includes position decoding and attribute decoding.

The process of position decoding includes: performing arithmetic decoding on the geometric code stream; merging after building the octree, reconstructing the position information of the point to obtain the reconstructed information of the point position information; performing coordinate transformation on the reconstructed information of the point position information to obtain the point position information. The point position information can also be called the geometric information of the point.

The attribute decoding process includes: obtaining the residual value of the attribute information of the point in the point cloud by parsing the attribute code stream; obtaining the residual value of the attribute information of the point after dequantization by dequantizing the residual value of the attribute information of the point; based on the reconstruction information of the point position information obtained in the position decoding process, selecting one of the following RAHT inverse transform and lifting inverse transform to predict the point cloud to obtain the predicted value, and adding the predicted value to the residual value to obtain the reconstructed value of the attribute information of the point; performing color space inverse conversion on the reconstructed value of the attribute information of the point to obtain the decoded point cloud.

As shown in FIG4B , position decoding can be achieved by the following units:

Arithmetic decoding unit 301, octree reconstruction (synthesize octree) unit 302, surface reconstruction unit (Synthesize suface approximation) 303, geometry reconstruction (Reconstruct geometry) unit 304, inverse transform coordinates (inverse transform coordinates) unit 305 and prediction tree reconstruction unit 306.

Attribute encoding can be achieved through the following units:

Arithmetic decoding unit 310, inverse quantize unit 311, RAHT inverse transform unit 312, generate LOD unit 313, inverse lifting unit 314 and inverse trasform colors unit 315.

It should be noted that decompression is the inverse process of compression. Similarly, the functions of each unit in the decoder 300 can refer to the functions of the corresponding units in the encoder 200. In addition, the point cloud decoder 300 may include more, fewer or different functional components than those in FIG. 4B.

For example, the decoder 300 may divide the point cloud into multiple LODs according to the Euclidean distance between points in the point cloud; then, the attribute information of the points in the LODs is decoded in sequence; for example, the number of zeros (zero_cnt) in the zero-run encoding technique is calculated to decode the residual based on zero_cnt; then, the decoding framework 200 may perform inverse quantization based on the decoded residual value, and obtain the reconstruction value of the point cloud based on the addition of the inverse quantized residual value and the predicted value of the current point, until all point clouds are decoded. The current point will be used as the nearest point of the subsequent LOD point, and the attribute information of the subsequent point will be predicted using the reconstruction value of the current point.

The above is the basic process of the point cloud codec based on the GPCC codec framework. With the development of technology, some modules or steps of the framework or process may be optimized. This application is applicable to the basic process of the point cloud codec based on the GPCC codec framework, but is not limited to the framework and process.

The following introduces octree-based geometric coding and prediction tree-based geometric coding.

Octree-based geometric encoding includes: first, coordinate transformation of geometric information so that all point clouds are contained in a bounding box. Then quantization is performed. This step of quantization mainly plays a role of scaling. Due to quantization rounding, the geometric information of some points is the same. Whether to remove duplicate points is determined based on parameters. The process of quantization and removal of duplicate points is also called voxelization. Next, the bounding box is continuously divided into trees (octree/quadtree/binary tree) in the order of breadth-first traversal, and the placeholder code of each node is encoded. In an implicit geometric division method, the bounding box of the point cloud is first calculated.

Assume that the bounding box of _dx > _dy > _dz corresponds to a cuboid. During geometric partitioning, binary tree partitioning will be performed based on the x-axis to obtain two child nodes. When the condition of _dx = _dy > _dz is met, quadtree partitioning will be performed based on the x-axis and y-axis to obtain four child nodes. When the condition of _dx = _dy = _dz is finally met, octree partitioning will be performed until the leaf node obtained by partitioning is a 1x1x1 unit cube. The partitioning will be stopped, and the points in the leaf node will be encoded to generate a binary code stream. In the process of binary tree/quadtree/octree partitioning, two parameters are introduced: K and M. Parameter K indicates the maximum number of binary tree/quadtree partitioning before octree partitioning; parameter M is used to indicate that the minimum block side length corresponding to binary tree/quadtree partitioning is ^2M . At the same time, K and M must meet the following conditions: Assuming d _max = max(d _x , _dy , d _z ), d _min = min(d _x , _dy , d _z ), parameter K satisfies: K＞＝d _max -d _min ; parameter M satisfies: M＞＝d _min . The reason why parameters K and M meet the above conditions is that in the process of geometric implicit partitioning of G-PCC, the priority of partitioning is binary tree, quadtree and octree. When the node block size does not meet the conditions of binary tree/quadtree, the node will be partitioned into octree until it is partitioned into the smallest unit of leaf node 1X1X1.

The octree-based geometric information encoding mode can effectively encode the geometric information of the point cloud by utilizing the correlation between adjacent points in space. However, for some relatively flat nodes or nodes with planar characteristics, the encoding efficiency of the point cloud geometric information can be further improved by using plane coding.

For example, as shown in FIG5A , the (a) series belongs to the low plane position in the Z-axis direction, and the (b) series belongs to the high plane position in the Z-axis direction. Taking (a) as an example, it can be seen that the four occupied subnodes in the current node are all located in the low plane position of the current node in the Z-axis direction, so it can be considered that the current node belongs to a Z plane and is a low plane in the Z-axis direction. Similarly, (b) indicates that the occupied subnodes in the current node are located in the high plane position of the current node in the Z-axis direction.

Taking (a) as an example, the efficiency of octree coding and plane coding is compared. As shown in Figure 5B, if the octree coding method is used for (a) in Figure 1, the placeholder information of the current node is represented as: 11001100. However, if the plane coding method is used, first, an identifier needs to be encoded to indicate that the current node is a plane in the Z-axis direction. Secondly, if the current node is a plane in the Z-axis direction, the plane position of the current node needs to be represented. Secondly, only the placeholder information of the low plane node in the Z-axis direction needs to be encoded (that is, the placeholder information of the four child nodes 0246). Therefore, based on the plane coding method, only 6 bits need to be encoded to encode the current node, which can reduce the representation of 2 bits compared to the original octree coding. Based on this analysis, plane coding has a more obvious coding efficiency than octree coding. Therefore, for an occupied node, if a plane encoding method is used for encoding in a certain dimension, as shown in FIG5C , firstly, the plane identification (planarMode) and plane position (PlanePos) information of the current node in the dimension need to be represented, and secondly, the occupancy information of the current node is encoded based on the plane information of the current node. It should be noted that: PlaneMode _i (i＝0,1,2): 0 represents that the current node is not a plane in the i-axis direction. When the node is a plane in the i-axis direction, PlanePosition _i : 0 represents that the current node is a plane in the i-axis direction, and the plane position is a low plane, and 1 represents that the current node is a high plane in the i-axis direction. Exemplarily, i＝0 represents the X-axis, i＝1 represents the Y-axis, and i＝2 represents the Z-axis.

The following is a detailed introduction to the current G-PCC standard for determining whether a node meets the plane coding conditions and predictive coding of the node plane identification and plane position information when the node meets the plane coding conditions.

Currently, there are three types of judgment conditions in G-PCC to determine whether a node meets the plane coding criteria. The following describes them one by one:

The first type: judge based on the plane probability of the node in each dimension.

First, determine the local area density (local_node_density) of the current node and the probability Prob(i) of the current node in each dimension.

When the local area density of the node is less than the threshold Th (Th = 3), the plane probability Prob (i) of the current node in three dimensions is compared with the thresholds Th0, Th1 and Th2, where Th0 < Th1 < Th2 (Th0 = 0.6, Th1 = 0.77, Th2 = 0.88). Eligible _i (i = 0, 1, 2) is used below to indicate whether plane coding is enabled in each dimension, where the judgment process of Eligible _i is shown in formula (1). For example, if Eligible _i > = threshold, it means that plane coding is enabled in the i-th dimension:

Eligible _i = Prob(i)> = threshold (1)

It should be noted that the threshold is adaptively changed. For example, when Prob(0)>Prob(1)>Prob(2), the threshold value is as shown in formula (2):

Eligible ₀ = Prob(0) > = Th0

Eligible ₁ = Prob(1) > = Th1

Eligible ₂ = Prob(2) > = Th2 (2)

The following describes the update process of local_node_density and the update of Prob(i).

In one example, Prob(i) is updated by the following formula (3):

Prob(i) _new = (L x Prob(i) + δ(coded node)) / L + 1 (3)

Where L=255, when the coded node is a plane, it is 1, otherwise it is 0.

In one example, local_node_density is updated by the following formula (4):

local_node_density _new = local_node_density + 4*numSiblings (4)

Among them, local_node_density is initialized to 4, numSiblings is the number of siblings of the node, as shown in Figure 5D, the current node is the left node, the right node is the sibling of the current node, then the number of siblings of the current node is 5 (including itself).

The second method: Determine whether the current layer nodes meet the plane coding requirements based on the point cloud density of the current layer.

The density of the points in the current layer is used to determine whether to perform plane coding on the nodes in the current layer. Assuming that the number of points in the current point cloud to be coded is pointCount, the number of points reconstructed after IDCM coding is numPointCountRecon, and because the octree is coded in the order of breadth-first traversal, the number of nodes to be coded in the current layer can be obtained as nodeCount. It is assumed that planarEligibleKOctreeDepth is used to indicate whether the current layer starts plane coding. The judgment process of planarEligibleKOctreeDepth is shown in formula (5):

planarEligibleKOctreeDepth＝(pointCount-numPointCountRecon)<nodeCount*1.3 (5)

When planarEligibleKOctreeDepth is true, all nodes in the current layer are plane coded; otherwise, no plane coding is performed and only octree coding is used.

The third method is to determine whether the current node meets the plane coding requirements based on the acquisition parameters of the lidar point cloud.

As shown in Figure 5E, it can be seen that the large cube node on the top is traversed by two lasers at the same time, so the current node is not a plane in the vertical direction of the Z axis, and the small cube node on the bottom is small enough that it cannot be traversed by two nodes at the same time, so it may be a plane. Therefore, based on the number of lasers corresponding to the current node, it can be judged whether the current node meets the plane coding.

The following will introduce the predictive coding of plane identification information and plane position information for nodes that currently meet the plane coding conditions.

1. Predictive Coding of Plane Marking Information

Currently, three contexts are used to encode the plane identification information, that is, the plane representation in each dimension is separately designed in context.

The encoding of the planar position information of non-lidar point clouds and lidar point clouds is introduced separately below.

1) Coding of non-lidar point cloud planar position information

1. Predictive coding of planar position information.

The plane position information is predictively coded based on the following information:

(1) Using the occupancy information of neighboring nodes, the plane position information of the current node is predicted to be three elements: predicted as a low plane, predicted as a high plane, and unpredictable;

(2) The spatial distance between the nodes at the same partition depth and the same coordinates as the current node and the current node is “close” or “far”;

(3) The plane position of the node at the same partition depth and the same coordinates as the current node;

(4) Coordinate dimension (i=0, 1, 2).

As shown in Figure 5F, the current node to be encoded is the left node, then the neighboring node is searched for as the right node at the same octree partition depth level and the same vertical coordinate, the distance between the two nodes is judged as "near" and "far", and the plane position of the reference node is used.

In one example, as shown in FIG5G , the black node is the current node. If the current node is located at the lower plane of the parent node, the plane position of the current node is determined in the following manner:

a) If any of the child nodes 4 to 7 of the oblique line node is occupied, and all the dot nodes are not occupied, it is very likely that there is a plane in the current node, and the plane is at a lower position.

b) If the child nodes 4 to 7 of the oblique line node are not occupied, and any dot node is occupied, it is very likely that there is a plane in the current node, and the plane is at a higher position.

c) If the child nodes 4 to 7 of the oblique line node are all empty nodes and the dot nodes are all empty nodes, the plane position cannot be inferred and is therefore marked as unknown.

If any of the children 4 to 7 of the dashed node are occupied and any of the dotted nodes are occupied, the plane position cannot be inferred and is therefore marked as unknown.

In another example, as shown in FIG5H , the black node is the current node. If the node is at a high plane position of the parent node, the plane position of the current node is determined in the following manner:

a) If any of the child nodes 4 to 7 of the dot node is occupied, and the dashed node is not occupied, it is very likely that there is a plane in the current node, and the plane position is lower.

b) If the child nodes 4 to 7 of the dot node are not occupied, but the oblique line node is occupied, it is very likely that there is a plane in the current node, and the plane position is relatively high.

c) If the child nodes 4 to 7 of the dot node are all unoccupied, and the slash node is unoccupied, the plane position cannot be inferred, so it is marked as unknown.

d) If one of the child nodes 4-7 of the dot node is occupied, and the slash node is occupied, the plane position cannot be inferred and is therefore marked as unknown.

2) Coding of planar position information of laser radar point cloud

Figure 5I is the predictive coding of the plane position information of the laser radar point cloud. The plane position of the current node is predicted by using the laser radar acquisition parameters, and the position is quantized into four intervals by using the position where the current node intersects with the laser ray, and finally used as the context of the plane position of the current node. The specific calculation process is as follows: Assuming that the coordinates of the laser radar are (x _Lidar , y _Lidar , z _Lidar ), and the geometric coordinates of the current point are (x, y, z), first calculate the vertical tangent value tanθ of the current point relative to the laser radar. The calculation process is shown in formula (6):

Since each laser has a certain offset angle relative to the laser radar, the relative tangent value tanθ _corr,L of the current node relative to the laser is calculated. The specific calculation process is shown in formula (7):

Finally, the corrected tangent value of the current node is used to predict the plane position of the current node. Specifically, assuming that the tangent value of the lower boundary of the current node is tan(θ bottom ), and the tangent value of the upper boundary is tan(θ top ), the plane position is quantized into 4 quantization intervals according to tanθ _corr,L , which is the context of the plane position.

However, the octree-based geometric information coding mode only has an efficient compression rate for points with correlation in space. For points in isolated positions in the geometric space, the use of the direct coding model (DCM) can greatly reduce the complexity. For all nodes in the octree, the use of DCM is not indicated by the flag information, but is inferred by the parent node and neighbor information of the current node. There are three ways to determine whether the current node is eligible for DCM coding, as shown in Figure 6A:

(1) The current node has no sibling child nodes, that is, the parent node of the current node has only one child node, and the parent node of the parent node of the current node has only two occupied child nodes, that is, the current node has at most one neighbor node.

(2) The parent node of the current node has only one child node, the current node. At the same time, the six neighbor nodes that share a face with the current node are also empty nodes.

(3) The number of sibling nodes of the current node is greater than 1.

If the current node does not have the DCM coding qualification, it will be divided into octrees. If it has the DCM coding qualification, the number of points contained in the node will be further determined. When the number of points is less than the threshold 2, the node will be DCM-encoded, otherwise the octree division will continue. When the DCM coding mode is applied, it is first necessary to encode whether the current node is a true isolated point, that is, IDCM_flag. When IDCM_flag is true, the current node is encoded using DCM, otherwise it is still encoded using octrees. When the current node meets the DCM coding requirements, it is necessary to encode the DCM coding mode of the current node. There are currently two DCM modes: 1: only one point exists (or multiple points, but they are repeated points); 2: contains two points. Finally, it is necessary to encode the geometric information of each point. Assuming that the side length of the node is ^2d , d bits are required to encode each component of the geometric coordinates of the node, and the bit information is directly encoded into the bit stream. It should be noted here that when encoding the lidar point cloud, the three-dimensional coordinate information is predictively encoded by using the lidar acquisition parameters, so as to further improve the coding efficiency of the geometric information.

Next, the IDCM encoding process is introduced in detail:

When the current node satisfies the direct coding mode (DCM), the number of points numPoints of the current node is first encoded, and the number of points of the current node is encoded according to different DirectModes, including the following methods:

1. If the current node does not meet the requirements of the DCM node, exit directly (that is, the number of points is greater than 2 points and is not a duplicate point).

2. If the number of points numPonts in the current node is less than or equal to 2, the encoding process is as follows:

1) First, encode whether the numPonts of the current node is greater than 1;

2) If the current node has only one point and the geometry coding environment is geometry lossless coding, it is necessary to encode that the second point of the current node is not a duplicate point.

3. If the number of points numPonts contained in the current node is greater than 2, the encoding process is as follows:

1) First, encode the numPonts of the current node to be less than or equal to 1;

2) Secondly, encode whether the second point of the current node is a repeated point, and then encode whether the number of repeated points of the current node is greater than 1. When the number of repeated points is greater than 1, it is necessary to perform exponential Columbus decoding on the remaining number of repeated points.

After encoding the number of points in the current node, the coordinate information of the points contained in the current node is encoded. The following will introduce the lidar point cloud and the human eye point cloud separately.

Point cloud for human eyes

1) If the current node contains only one point, the geometric information of the point in three dimensions will be directly encoded (Bypass coding).

2) If the current node contains two points, the priority coded coordinate axis dirextAxis will be obtained by using the geometric coordinates of the points. It should be noted that the currently compared coordinate axes only include the x and y axes, not the z axis. Assuming that the geometric coordinates of the current node are nodePos, the priority coded coordinate axis is determined by the method shown in formula (8):

dirextAxis=! (nodePos[0]<nodePos[1]) (8)

That is to say, the axis with the smaller node coordinate geometric position is used as the coordinate axis dirextAxis for priority encoding.

Secondly, the geometry information of the dirextAxis coordinate axis of the priority coding is first encoded as follows, assuming that the bit depth of the geometry to be encoded corresponding to the priority coding axis is nodeSizeLog2, and assuming that the coordinates of the two points are pointPos[0] and pointPos[1] respectively:

After encoding the priority encoding axis dirextAxis, the geometric coordinates of the current point are directly encoded. Assuming that the remaining encoding bit depth of each point is nodeSizeLog2, the specific encoding process is as follows:

for(int axisIdx＝0；axisIdx<3；++axisIdx)

for(int mask＝(1<<nodeSizeLog2[axisIdx])>>1；mask；mask>>1)

encodePosBit(！！(pointPos[axisIdx]&mask));

For LiDAR point clouds

1) If the current node contains two points, the priority coded coordinate axis dirextAxis will be obtained by using the geometric coordinates of the points. Assuming that the geometric coordinates of the current node are nodePos, the priority coded coordinate axis is determined by the method shown in formula (9):

dirextAxis=! (nodePos[0]<nodePos[1]) (9)

That is to say, the axis with the smaller node coordinate geometric position is used as the coordinate axis dirextAxis for priority encoding. It should be noted here that the currently compared coordinate axes only include the x and y axes, but not the z axis.

After encoding the priority encoding axis dirextAxis, the geometric coordinates of the current point are encoded.

Since the laser radar point cloud can obtain the acquisition parameters of the laser radar point cloud, the geometric coordinate information of the current node can be predicted, so as to further improve the efficiency of the geometric information encoding of the point cloud. Similarly, the geometric information nodePos of the current node is first used to obtain a directly encoded main axis direction, and then the geometric information of the encoded direction is used to predict the geometric information of another dimension. Also, assuming that the axis direction of the direct encoding is directAxis, and assuming that the bit depth to be encoded in the direct encoding is nodeSizeLog2, the encoding method is as follows:

for(int mask＝(1<<nodeSizeLog2)>>1；mask；mask>>1)

encodePosBit(！！(pointPos[directAxis]&mask));

It should be noted here that all geometric accuracy information in the directAxis direction will be encoded here.

After encoding all the precision of the directAxis coordinate direction, the LaserIdx corresponding to the current point, i.e. pointLaserIdx in Figure 6B, will be calculated first, and the LaserIdx of the current node, i.e. nodeLaserIdx, will be calculated. Then the LaserIdx of the node, i.e. nodeLaserIdx, will be used to predictively encode the LaserIdx of the point, i.e. pointLaserIdx. The calculation method of the LaserIdx of the node or point is as follows:

Assume that the geometric coordinates of the point are pointPos, the starting coordinates of the laser ray are LidarOrigin, and the number of Lasers is LaserNum, the tangent value of each Laser is tanθ _i , and the offset position of each Laser in the vertical direction is _Zi , then:

After calculating the LaserIdx of the current point, the LaserIdx of the current node is first used to predict the pointLaserIdx of the point. After encoding the LaserIdx of the current point, the three-dimensional geometric information of the current point is predictively encoded using the acquisition parameters of the laser radar.

The specific algorithm is shown in FIG6C . First, the LaserIdx corresponding to the current point is used to obtain the corresponding predicted value of the horizontal azimuth angle, that is,

Secondly, the node geometry information corresponding to the current point is used to obtain the horizontal azimuth angle corresponding to the node

Among them, the horizontal azimuth

The calculation method between the node geometry information is shown in formula (10), assuming that the geometry coordinates of the node are nodePos:

By using the acquisition parameters of the laser radar, the number of rotation points numPoints of each Laser can be obtained, which represents the number of points obtained when each laser ray rotates one circle. The rotation angular velocity deltaPhi of each Laser can then be calculated using the number of rotation points of each Laser, as shown in formula (11):

As shown in FIG6D , using the horizontal azimuth angle of the node

And the horizontal azimuth of the previous Laser code point corresponding to the current point

Calculate the predicted horizontal azimuth angle corresponding to the current point

The specific calculation formula is shown in formula (12):

Finally, as shown in FIG6E , by using the predicted value of the horizontal azimuth

and the low plane horizontal azimuth of the current node

and the horizontal azimuth of the high plane

To predict the geometric information of the current node. The details are as follows:

After encoding the LaserIdx of the point, the LaserIdx corresponding to the current point will be used to predict the Z-axis direction of the current point. That is, the depth information radius of the cylindrical coordinate system is calculated by using the x and y information of the current point. Then, the tangent value of the current point and the vertical value are obtained by using the laser LaserIdx of the current point. Then, the predicted value of the Z-axis direction of the current point, namely Z_pred, can be obtained:

Finally, Z_pred is used to predict the geometric information of the current point in the Z-axis direction to obtain the prediction residual Z_res, and finally Z_res is encoded.

It should be noted that when nodes are divided into leaf nodes, in the case of geometric lossless coding, the number of repeated points in the leaf nodes needs to be encoded. Finally, the placeholder information of all nodes is encoded to generate a binary code stream. In addition, G-PCC currently introduces a plane coding mode. In the process of geometric division, it will determine whether the child nodes of the current node are in the same plane. If the child nodes of the current node meet the conditions of the same plane, the child nodes of the current node will be represented by the plane.

In the octree-based geometric decoding, the decoding end follows the order of breadth-first traversal. Before decoding the placeholder information of each node, it will first use the reconstructed geometric information to determine whether the current node is to be plane decoded or IDCM decoded. If the current node meets the conditions for plane decoding, the plane identification and plane position information of the current node will be decoded first, and then the placeholder information of the current node will be decoded based on the plane information; if the current node meets the conditions for IDCM decoding, it will first decode whether the current node is a real IDCM node. If it is a real IDCM decoding, it will continue to parse the DCM decoding mode of the current node, and then the number of points in the current DCM node can be obtained, and finally the geometric information of each point will be decoded. For nodes that do not meet neither plane decoding nor DCM decoding, the placeholder information of the current node will be decoded. By continuously parsing in this way, the placeholder code of each node is obtained, and the nodes are continuously divided in turn until the division is stopped when the unit cube of 1X1X1 is obtained, the number of points contained in each leaf node is obtained by parsing, and finally the geometric reconstructed point cloud information is restored.

The following is a detailed introduction to the IDCM decoding process:

The same process as encoding, first use the prior information to decide whether the node starts IDCM, that is, the starting conditions of IDCM are as follows:

(3) The number of sibling nodes of the current node is greater than 1.

When a node meets the conditions for DCM encoding, first decode whether the current node is a real DCM node, that is, IDCM_flag. When IDCM_flag is true, the current node adopts DCM encoding, otherwise it still adopts octree encoding.

Secondly, decode the number of points numPoints of the current node. The specific decoding method is as follows:

1) First, decode whether the numPonts of the current node is greater than 1;

2) If the numPonts of the current node obtained by decoding is greater than 1, continue decoding to see if the second point is a duplicate point. If the second point is not a duplicate point, it can be implicitly inferred that the second type that satisfies the DCM mode contains only two points.

3) If the numPonts of the current node obtained by decoding is less than or equal to 1, continue to decode whether the second point is a repeated point. If the second point is not a repeated point, it can be implicitly inferred that the second type that satisfies the DCM mode contains only one point; if the second point obtained by decoding is a repeated point, it can be inferred that the third type that satisfies the DCM mode contains multiple points, but they are all repeated points, then continue to decode whether the number of repeated points is greater than 1 (entropy decoding). If it is greater than 1, continue to decode the number of remaining repeated points (decoding using exponential Columbus).

If the current node does not meet the requirements of the DCM node, that is, the number of points is greater than 2 points and it is not a duplicate point, exit directly.

After decoding the number of points in the current node, the coordinate information of the points contained in the current node is decoded. The following will introduce the lidar point cloud and the human eye point cloud separately.

Point cloud for human eyes

1) If the current node contains only one point, the geometric information of the point in three dimensions will be directly decoded (Bypass coding);

2) If the current node contains two points, the priority decoding coordinate axis dirextAxis will be obtained by using the geometric coordinates of the points. It should be noted that the coordinate axes currently compared only include the x and y axes, not the z axis. Assuming that the geometric coordinates of the current node are nodePos, the method shown in formula (13) is used to determine the priority encoding coordinate axis:

dirextAxis=! (nodePos[0]<nodePos[1]) (13)

That is to say, the axis with the smaller node coordinate geometric position is used as the coordinate axis dirextAxis for priority decoding.

Secondly, the geometry information of the dirextAxis coordinate axis to be decoded is first decoded as follows, assuming that the bit depth of the geometry to be decoded corresponding to the axis to be decoded is nodeSizeLog2, and assuming that the coordinates of the two points are pointPos[0] and pointPos[1] respectively:

After decoding the priority decoding axis dirextAxis, the geometric coordinates of the current point are directly decoded. Assuming that the remaining encoding bit depth of each point is nodeSizeLog2, the specific decoding process is as follows, assuming that the coordinate information of the point is pointPos:

For LiDAR point clouds

1) If the current node contains two points, the priority decoding coordinate axis dirextAxis will be obtained by using the geometric coordinates of the points. Assuming that the geometric coordinates of the current node are nodePos, the priority encoding coordinate axis is determined by the method shown in formula (14):

dirextAxis=! (nodePos[0]<nodePos[1]) (14)

That is to say, the axis with the smaller node coordinate geometric position is used as the coordinate axis dirextAxis for priority decoding. It should be noted here that the currently compared coordinate axes only include the x and y axes, but not the z axis.

Secondly, the geometry information of the dirextAxis coordinate axis that is encoded first is decoded as follows, assuming that the bit depth of the geometry to be encoded corresponding to the axis that is decoded first is nodeSizeLog2, and assuming that the coordinates of the two points are pointPos[0] and pointPos[1] respectively:

After decoding the priority decoding axis dirextAxis, the geometric coordinates of the current point are decoded.

Similarly, first use the current node's geometry information nodePos to get a direct decoding main axis direction, and then use the geometry information of the decoded direction to decode the geometry information of another dimension. Also assuming that the axis direction of direct decoding is directAxis, and assuming that the bit depth to be decoded in direct decoding is nodeSizeLog2, the decoding method is as follows:

It should be noted here that all geometric accuracy information in the directAxis direction will be decoded here.

After decoding all the precisions of the directAxis coordinate direction, the LaserIdx of the current node, i.e., nodeLaserIdx, is calculated first. Then, the LaserIdx of the node, i.e., nodeLaserIdx, is used to predict and decode the LaserIdx of the point, i.e., pointLaserIdx. The calculation method of the LaserIdx of the node or point is the same as that of the encoder. Finally, the LaserIdx of the current point and the predicted residual information of the LaserIdx of the node are decoded to obtain ResLaserIdx. The calculation formula is shown in Formula 15:

PointLaserIdx＝nodeLaserIdx+ResLaserIdx (15)

After decoding the LaserIdx of the current point, the three-dimensional geometric information of the current point is predicted and decoded using the acquisition parameters of the laser radar.

Specifically, as shown in FIG6B , the LaserIdx corresponding to the current point is first used to obtain the corresponding predicted value of the horizontal azimuth, that is,

Among them, it is assumed that the geometric coordinates of the node are nodePos, and the horizontal azimuth is

The calculation method between the node geometry information is shown in formula (16):

By using the acquisition parameters of the laser radar, the number of rotation points numPoints of each Laser can be obtained, which represents the number of points obtained when each laser ray rotates one circle. The rotation angular velocity deltaPhi of each Laser can then be calculated using the number of rotation points of each Laser, as shown in formula (17):

Next, as shown in FIG6D , the horizontal azimuth angle of the node is used

The predicted value of the horizontal azimuth angle. The calculation method is shown in formula (18):

Finally, by using the predicted value of the horizontal azimuth

and the low plane horizontal azimuth of the current node

and the horizontal azimuth of the high plane

After decoding the LaserIdx of the point, the Z-axis direction of the current point will be predicted and decoded using the LaserIdx corresponding to the current point. That is, the depth information radius of the cylindrical coordinate system is calculated by using the x and y information of the current point. Then, the tangent value of the current point and the vertical offset are obtained using the laser LaserIdx of the current point. Then, the predicted value of the Z-axis direction of the current point, namely Z_pred, can be obtained:

Finally, the decoded Z_res and Z_pred are used to reconstruct and restore the geometric information of the current point in the Z-axis direction.

In the geometric information coding framework based on trisoup (triangle soup, triangle patch set), geometric division must also be performed first, but different from the geometric information coding based on binary tree/quadtree/octree, this method does not need to divide the point cloud into unit cubes with a side length of 1x1x1 step by step, but stops dividing when the block (sub-block) has a side length of W. Based on the surface formed by the distribution of the point cloud in each block, at most twelve vertices (intersection points) generated by the surface and the twelve edges of the block are obtained. The vertex coordinates of each block are encoded in turn to generate a binary code stream.

When reconstructing the geometric information of the point cloud based on trisoup, the vertex coordinates are first decoded to complete the reconstruction of the triangle facets at the decoding end. The process is shown in Figures 7A to 7C. There are three vertices (v1, v2, v3) in the block shown in Figure 7A. The triangle facet set formed by these three vertices in a certain order is called triangle soup, i.e., trisoup, as shown in Figure 7B. After that, sampling is performed on the triangle facet set, and the obtained sampling points are used as the reconstructed point cloud in the block, as shown in Figure 7C.

The geometric coding based on the prediction tree includes: first, sorting the input point cloud. The currently used sorting methods include unordered, Morton order, azimuth order and radial distance order. At the encoding end, the prediction tree structure is established by using two different methods, including: KD-Tree (high-latency slow mode) and using the laser radar calibration information to divide each point into different Lasers, and establish a prediction structure according to different Lasers (low-latency fast mode). Next, based on the structure of the prediction tree, traverse each node in the prediction tree, predict the geometric position information of the node by selecting different prediction modes to obtain the prediction residual, and quantize the geometric prediction residual using the quantization parameter. Finally, through continuous iteration, the prediction residual of the prediction tree node position information, the prediction tree structure and the quantization parameters are encoded to generate a binary code stream.

Based on the geometric decoding of the prediction tree, the decoding end reconstructs the prediction tree structure by continuously parsing the bit stream, and then obtains the geometric position prediction residual information and quantization parameters of each prediction node through parsing, and dequantizes the prediction residual to recover the reconstructed geometric position information of each node, and finally completes the geometric reconstruction of the decoding end.

After the geometric encoding is completed, the geometric information is reconstructed. At present, attribute encoding is mainly performed on color information. First, the color information is converted from the RGB color space to the YUV color space. Then, the point cloud is recolored using the reconstructed geometric information so that the unencoded attribute information corresponds to the reconstructed geometric information. In color information encoding, there are two main transformation methods. One is the distance-based lifting transformation that relies on LOD (Level of Detail) division, and the other is to directly perform RAHT (Region Adaptive Hierarchal Transform) transformation. Both methods will convert color information from the spatial domain to the frequency domain, obtain high-frequency coefficients and low-frequency coefficients through transformation, and finally quantize and encode the coefficients to generate a binary code stream.

When using geometric information to predict attribute information, Morton codes can be used to search for nearest neighbors. The Morton code corresponding to each point in the point cloud can be obtained from the geometric coordinates of the point. The specific method for calculating the Morton code is described as follows. For each component of the three-dimensional coordinate represented by a d-bit binary number, its three components can be expressed as formula (19):

in,

The highest bits of x, y, and z are

To the lowest position

The corresponding binary value. The Morton code M is x, y, z, starting from the highest bit, arranged in sequence

To the lowest bit, the calculation formula of M is shown in the following formula (20):

in,

The highest bit of M

To the lowest position

After obtaining the Morton code M of each point in the point cloud, the points in the point cloud are arranged in order from small to large Morton codes, and the weight w of each point is set to 1.

There are 4 general test conditions for GPCC:

Condition 1: The geometric position is limitedly lossy and the attributes are lossy;

Condition 2: The geometric position is lossless, but the attributes are lossy;

Condition 3: The geometric position is lossless, and the attributes are limitedly lossy;

Condition 4: The geometric position and attributes are lossless.

The general test sequences include Cat1A, Cat1B, Cat3-fused, and Cat3-frame. The Cat2-frame point cloud only contains reflectance attribute information, the Cat1A and Cat1B point clouds only contain color attribute information, and the Cat3-fused point cloud contains both color and reflectance attribute information.

There are two technical routes of GPCC, which are distinguished by the algorithm used for geometric compression, and are divided into octree coding branch and prediction tree coding branch.

Among them, in the octree coding branch, at the encoding end, the bounding box is divided into sub-cubes in sequence, and the non-empty (containing points in the point cloud) sub-cubes are continued to be divided until the leaf node obtained by division is a 1X1X1 unit cube. In the case of geometric lossless coding, it is necessary to encode the number of points contained in the leaf node, and finally complete the encoding of the geometric octree to generate a binary code stream. At the decoding end, the decoding end obtains the placeholder code of each node by continuous parsing in the order of breadth-first traversal, and continuously divides the nodes in sequence until the division is a 1x1x1 unit cube. In the case of geometric lossless decoding, it is necessary to parse the number of points contained in each leaf node, and finally restore the geometric reconstructed point cloud information.

In the prediction tree coding branch, the prediction tree structure is established at the encoding end by using two different methods, including: KD-Tree (high-latency slow mode) and using the laser radar calibration information to divide each point into different lasers and establish a prediction structure according to different lasers (low-latency fast mode). Next, based on the structure of the prediction tree, each node in the prediction tree is traversed, and the geometric position information of the node is predicted by selecting different prediction modes to obtain the prediction residual, and the geometric prediction residual is quantized using the quantization parameter. Finally, through continuous iteration, the prediction residual of the prediction tree node position information, the prediction tree structure, and the quantization parameters are encoded to generate a binary code stream. At the decoding end, the decoding end reconstructs the prediction tree structure by continuously parsing the code stream, and then obtains the geometric position prediction residual information and quantization parameters of each prediction node through parsing, and dequantizes the prediction residual to restore the reconstructed geometric position information of each node, and finally completes the geometric reconstruction of the decoding end.

The AVS encoding and decoding framework is introduced below.

In the point cloud AVS encoder framework, the geometric information of the point cloud and the attribute information corresponding to each point are encoded separately.

FIG8A is a schematic diagram of the encoding framework of AVS, and FIG8B is a schematic diagram of the decoding framework of AVS. As shown in FIG8A, the geometric information is firstly transformed into coordinates so that all point clouds are contained in a bounding box. Before the preprocessing process, it is determined whether to divide the entire point cloud sequence into multiple slices according to the parameter configuration, and each divided slice is treated as a single independent point cloud serial processing. The preprocessing process includes quantization and removal of duplicate points. Quantization mainly plays a role in scaling. Due to the quantization rounding, the geometric information of some points is the same, and whether to remove duplicate points is determined according to the parameters. Next, the bounding box is divided in the order of breadth-first traversal (octree/quadtree/binary tree), and the placeholder code of each node is encoded.

As shown in FIG8B , in the octree-based geometric coding framework, the bounding box is divided into sub-cubes in sequence, and the non-empty (containing points in the point cloud) sub-cubes are divided until the leaf nodes obtained by the division are 1x1x1 unit cubes, and then the division is stopped. Then, in the case of geometric lossless coding, the number of points contained in the leaf nodes is encoded, and finally the encoding of the geometric octree is completed to generate a binary code stream. In the octree-based geometric decoding process, the decoding end obtains the placeholder code of each node by continuously parsing in the order of breadth-first traversal, and continuously divides the nodes in sequence until the division is stopped when the 1x1x1 unit cube is obtained, and the number of points contained in each leaf node is parsed to finally restore the geometric reconstructed point cloud information.

There are two encoding methods in the current AVS geometric coding, one is octree coding and the other is prediction tree coding.

Octree coding: If octree coding is used, there are two context coding models. Context model one is used for cat1-A and cat2 point cloud sequences; context model two is used for cat1-B and cat3 sequences.

The following is an introduction to Model 1.

Model 1 below includes the sub-layer neighbor prediction of the current point and the neighbor prediction of the current point layer.

1) Sub-layer neighbor prediction of the current point

Under the octree breadth-first traversal division method, the neighbor information that can be obtained when encoding the child node of the current point includes the neighbor child nodes in the three directions of left, front and bottom. The context model of the child node layer is designed as follows: For the child node layer to be encoded, find the occupancy of the three coplanar, three colinear, and one co-point nodes in the left, front and bottom direction of the same layer as the child node to be encoded, and the node in the negative direction of the dimension with the shortest node side length, which is two node side lengths away from the current child node to be encoded. Taking the node with the shortest side length in the X dimension as an example, the reference node selected by each child node is shown in Figure 9A. The dotted box node is the current node, the gray node is the current child node to be encoded, and the solid box node is the reference node selected by each child node.

Among them, the occupancy of the 3 coplanar nodes, 3 colinear nodes, and the node at the negative direction of the dimension with the shortest node side length and two node side lengths away from the current sub-node to be encoded is considered in detail. There are 2 ⁷ = 128 cases for the occupancy of these 7 nodes. If not all are unoccupied, there are 2 ⁷ -1 = 127 cases, and 1 context is assigned to each case; if all of these 7 nodes are unoccupied, the occupancy of the common neighbor nodes is considered. There are 2 possibilities for the common neighbor: occupied or unoccupied. A separate context is assigned to the case where the common neighbor node is occupied. If the common neighbor is also unoccupied, the occupancy of the neighbors at the current node layer to be described next is considered. That is, the neighbors at the sub-node layer to be encoded correspond to a total of 127 + 2-1 = 128 contexts.

2) Neighbor prediction of the current node layer

If the eight reference nodes in the same layer of the subnode to be encoded are not occupied, the occupancy of the four groups of neighbors in the current node layer as shown in Figure 9B is considered, where the dotted frame node is the current node and the solid frame node is the neighbor node.

For the current node layer, the context is determined as follows:

1. First consider the three coplanar neighbors to the upper right of the current node. There are 2 ³ = 8 possible occupancy situations of the three coplanar neighbors to the upper right of the current node. For the cases where all of them are not occupied, a context is assigned to each. Considering that the child node to be encoded is located at the position of the current node, this group of neighbor nodes provides a total of (8-1) × 8 = 56 contexts. If the three coplanar neighbors to the upper right of the current point are not occupied, then continue to consider the remaining three groups of neighbors at the current node layer.

2. Consider the distance between the most recently occupied node and the current node.

The specific correspondence between neighbor node distribution and distance is shown in Table 2.

Table 2 Correspondence between the current node layer occupancy and distance

当前节点层占位情况Current node layer occupancy	距离distance
左前下共面邻居占据或右上后共线邻居占据The left front and bottom coplanar neighbors occupy or the right top and back colinear neighbors occupy	11
左前下共面邻居、右上后共线邻居都不占据且左前下共线邻居占据The left front and bottom coplanar neighbors and the right top and back collinear neighbors are not occupied, and the left front and bottom collinear neighbors are occupied	22
当前节点层的四组邻居都不占据None of the four neighbor groups at the current node level are occupied	33

From Table 2, we can see that the distance has 3 values. One context is assigned to each of the 3 values, and considering the position of the sub-node to be encoded at the current node, there are 3×8=24 contexts in total.

So far, this set of context models has allocated a total of 128+56+24=208 contexts.

The following is an introduction to context model 2.

This method uses a two-layer context reference relationship configuration, as shown in formula (21), the first layer is the occupancy of the encoded adjacent blocks of the parent node of the current sub-block to be encoded (i.e., ctxIdxParent), and the second layer is the occupancy of the adjacent encoded blocks at the same depth as the current sub-block to be encoded (i.e., ctxIdxChild).

First, for each sub-block to be encoded, the ctxIdxChild of the second layer is as shown in formula (22), ^where _Ci1 represents the sub-block with the current sub-block.

Occupancy of the three coded sub-blocks with a distance of 1.

idx＝LUT[ctxIdxParent][ctxIdxChild] (21)

Secondly, for the first layer ctxIdxParent, for the relative positions of different sub-blocks, the adjacent parent blocks that are coplanar and colinear with them are found by table lookup, and the ctxIdxParent is calculated according to the occupancy status according to formula (23). Figure 9C is a schematic diagram of sub-blocks corresponding to 6 adjacent parent blocks. As shown in Figure 9C, each sub-graph shows the relative position relationship of the 6 adjacent parent blocks found by the i-th sub-block, including 3 coplanar parent blocks (P _i,0 ,P _i,1 ,P _i,2 ) and 3 colinear parent blocks (P _i,3 ,P _i,4 ,P _i,5 ). The position relationship between each sub-block and the adjacent parent block is obtained by Table 2. The numbers in Table 2 correspond to the Morton numbers in Figure 6. This method takes into account the different sub-block positions and the geometric center rotation symmetry. FIG9D is a schematic diagram of the 18 neighboring blocks and their Morton sequence numbers used by the current block to be encoded. It can be seen from FIG9D that, with the current block as the center, this method has a larger receptive field and can use up to 18 neighboring parent blocks that have been encoded around it. The method used in formula (3) is the combination of the occupancy of the three coplanar parent blocks and the sum of the number of occupancy of the three colinear parent blocks.

Therefore, the number of contexts used in this method is at most 2 ³ ×2 ⁵ =256.

Table 3 shows the relationship between child block i and its adjacent parent block j. The numbers in the table correspond to the Morton sequence numbers in Figure 9D.

table 3

Prediction tree coding: If prediction tree coding is used, first, the geometric information of the point cloud is used at the encoding end to perform Morton code sorting, and then the geometric information of the point cloud is predicted and encoded using KD-Tree, which is similar to a single chain structure that predicts and encodes the geometric information of the child node by using the parent node. As shown in Figure 9E, the prediction tree adopts a single chain structure: except for the only leaf node, each tree node has only one child node. Except for the root node, which is predicted by the default value, other nodes are provided with geometric prediction values by their parent nodes.

During the multitree geometry coding process, the isolated point direct coding mode is effective when the current block satisfies the following three conditions at the same time:

1. The isolated point direct coding mode identifier in the geometry header information is 1;

2. The current block contains only one point cloud data point;

3. The sum of the number of Morton code bits to be encoded for the points in the current block is greater than twice the number of directions that have not reached the minimum side length.

This branch is entered when all three conditions above are met. A flag is introduced to indicate whether the current node uses the isolated point coding mode. The flag uses a context for entropy coding. If the flag is True, the isolated point mode is used to directly encode the geometric coordinates of the point, and the octree division is terminated. If the flag is False, the occupancy code is encoded and the octree division continues.

In certain cases, this flag can be inferred to be False and not encoded. If the parent block of the current block already allows the use of isolated point coding mode, and the current block is the only child node of the parent block, then the current block must not contain isolated points. Therefore, under this condition, the bits for encoding the flag can be omitted.

After encoding the flag bit, since the current block contains only one point cloud point, the uncoded bits of the Morton code corresponding to the geometric coordinates of the point cloud point are directly encoded. The specific encoding process is as follows:

Assuming that the remaining encoding bit depth of the point is nodeSizeLog2, the specific encoding process is as follows:

for(int axisIdx＝0；axisIdx<3；++axisIdx)

for(int mask＝(1<<nodeSizeLog2[axisIdx])>>1；mask；mask>>1)

encodePosBit(！！(pointPos[axisIdx]&mask));

After the geometric encoding is completed, the geometric information is reconstructed. At present, attribute encoding is mainly performed on color and reflectivity information. As shown in Figure 8A, the encoding end first determines whether to perform color space conversion. If color space conversion is performed, the color information is converted from RGB color space to YUV color space. Then, the reconstructed point cloud is recolored using the original point cloud so that the unencoded attribute information corresponds to the reconstructed geometric information. In color information encoding, it is divided into two modules: attribute prediction and attribute transformation. The attribute prediction process is as follows: first, the point cloud is reordered, and then differential prediction is performed. There are two reordering methods: Morton reordering and Hilbert reordering. For cat1A sequence and cat2 sequence, Hilbert reordering is performed on them; for cat1B sequence and cat3 sequence, Morton reordering is performed on them. The attribute prediction of the sorted point cloud is performed using a differential method, and finally the prediction residual is quantized and entropy encoded to generate a binary code stream. The attribute transformation process is as follows: first, wavelet transform is performed on the point cloud attributes and the transform coefficients are quantized; secondly, the attribute reconstruction value is obtained through inverse quantization and inverse wavelet transform; then the difference between the original attribute and the attribute reconstruction value is calculated to obtain the attribute residual and quantize it; finally, the quantized transform coefficients and attribute residuals are entropy encoded to generate a binary code stream.

The general test conditions of AVS PCC are introduced below.

There are 4 general test conditions for AVS:

Condition 2: The geometric position is lossless, but the attributes are lossy;

Condition 4: The geometric position and attributes are lossless.

The general test sequences include five categories: Cat1A, Cat1B, Cat1C, Cat2-frame and Cat3. Cat1A and Cat2-frame point clouds only contain reflectance attribute information, Cat1B and Cat3 point clouds only contain color attribute information, and Cat1B point cloud contains both color and reflectance attribute information.

Technical routes: There are 2 types in total, distinguished by the algorithm used for attribute compression.

Technical route 1: prediction branch, attribute compression adopts the method based on intra-frame prediction:

At the encoding end, the points in the point cloud are processed in a certain order (the original acquisition order of the point cloud, the Morton order, the Hilbert order, etc.), and the prediction algorithm is first used to obtain the attribute prediction value, and the attribute residual is obtained according to the attribute value and the attribute prediction value. Then, the attribute residual is quantized to generate a quantized residual, and finally the quantized residual is encoded;

At the decoding end, the points in the point cloud are processed in a certain order (the original acquisition order of the point cloud, Morton order, Hilbert order, etc.). The prediction algorithm is first used to obtain the attribute prediction value, and then the decoding is performed to obtain the quantized residual. The quantized residual is then dequantized, and finally the attribute reconstruction value is obtained based on the attribute prediction value and the dequantized residual.

Technical route 2: Prediction transform branch - limited resources, attribute compression uses a method based on intra-frame prediction and DCT transform. When encoding the quantized transform coefficients, there is a maximum point number X (such as 4096), that is, at most every X points are encoded as a group:

At the encoding end, the points in the point cloud are processed in a certain order (the original acquisition order of the point cloud, the Morton order, the Hilbert order, etc.), and the entire point cloud is first divided into several small groups with a maximum length of Y (such as 2), and then these small groups are combined into several large groups (the number of points in each large group does not exceed X, such as 4096), and then the prediction algorithm is used to obtain the attribute prediction value, and the attribute residual is obtained according to the attribute value and the attribute prediction value. The attribute residual is transformed by DCT in small groups to generate transformation coefficients, and then the transformation coefficients are quantized to generate quantized transformation coefficients, and finally the quantized transformation coefficients are encoded in large groups;

At the decoding end, the points in the point cloud are processed in a certain order (the original acquisition order of the point cloud, Morton order, Hilbert order, etc.). First, the entire point cloud is divided into several small groups with a maximum length of Y (such as 2), and then these small groups are combined into several large groups (the number of points in each large group does not exceed X, such as 4096). The quantized transform coefficients are decoded in large groups, and then the prediction algorithm is used to obtain the attribute prediction value. The quantized transform coefficients are dequantized and inversely transformed in small groups. Finally, the attribute reconstruction value is obtained based on the attribute prediction value and the dequantized and inversely transformed coefficients.

Technical route 3: Prediction transform branch - resources are not limited. Attribute compression adopts a method based on intra-frame prediction and DCT transform. When encoding the quantized transform coefficients, there is no limit on the maximum number of points X, that is, all coefficients are encoded together:

At the encoding end, the points in the point cloud are processed in a certain order (the original acquisition order of the point cloud, the Morton order, the Hilbert order, etc.). First, the entire point cloud is divided into several small groups with a maximum length of Y (such as 2). Then, the prediction algorithm is used to obtain the attribute prediction value. The attribute residual is obtained according to the attribute value and the attribute prediction value. The attribute residual is transformed by DCT in small groups to generate transformation coefficients. The transformation coefficients are quantized to generate quantized transformation coefficients. Finally, the quantized transformation coefficients of the entire point cloud are encoded.

At the decoding end, the points in the point cloud are processed in a certain order (the original acquisition order of the point cloud, Morton order, Hilbert order, etc.). First, the entire point cloud is divided into several small groups with a maximum length of Y (such as 2), and the quantized transformation coefficients of the entire point cloud are obtained by decoding. Then, the prediction algorithm is used to obtain the attribute prediction value, and then the quantized transformation coefficients are dequantized and inversely transformed in groups. Finally, the attribute reconstruction value is obtained based on the attribute prediction value and the dequantized and inversely transformed coefficients.

Technical route 4: Multi-layer transformation branch, attribute compression adopts a method based on multi-layer wavelet transform:

At the encoding end, the entire point cloud is subjected to multi-layer wavelet transform to generate transform coefficients, which are then quantized to generate quantized transform coefficients, and finally the quantized transform coefficients of the entire point cloud are encoded;

At the decoding end, decoding obtains the quantized transform coefficients of the entire point cloud, and then dequantizes and inversely transforms the quantized transform coefficients to obtain attribute reconstruction values.

When directly encoding the current node, the encoder encodes the position information of the midpoint of the current node after determining that the current node is eligible for direct encoding and decoding. However, when encoding the geometric information of the midpoint of the current node, inter-frame information is not considered, thereby reducing the encoding and decoding performance of the point cloud.

In order to solve the above technical problems, when encoding and decoding the current node in the current decoding frame, the embodiment of the present application determines N predicted nodes of the current node in the predicted reference frame of the current frame to be encoded and decoded, and predicts and decodes the coordinate information of the midpoint of the current node based on the geometric encoding and decoding information of the midpoints of the N predicted nodes. In other words, the embodiment of the present application optimizes the node when performing DCM direct encoding and decoding, and predicts and decodes the geometric information of the midpoint of the IDCM node (i.e., the current node) of the to-be-encoded node by considering the correlation in the time domain between adjacent frames, and further improves the efficiency of geometric information encoding and decoding of the point cloud by considering the correlation in the time domain between adjacent frames.

The point cloud encoding and decoding method involved in the embodiments of the present application is introduced below in conjunction with specific embodiments.

First, taking the decoding end as an example, the point cloud decoding method provided in the embodiment of the present application is introduced.

Fig. 10 is a schematic diagram of a point cloud decoding method according to an embodiment of the present application. The point cloud decoding method according to the embodiment of the present application can be implemented by the point cloud decoding device or point cloud decoder shown in Fig. 3 or Fig. 4B or Fig. 8B.

As shown in FIG10 , the point cloud decoding method of the embodiment of the present application includes:

S101 . Determine N prediction nodes of a current node in a prediction reference frame of a current frame to be decoded.

The current node is the node to be decoded in the current frame to be decoded.

As can be seen from the above, the point cloud includes geometric information and attribute information, and the decoding of the point cloud includes geometric decoding and attribute decoding. The embodiment of the present application relates to geometric decoding of point clouds.

In some embodiments, the geometric information of the point cloud is also referred to as the position information of the point cloud. Therefore, the geometric decoding of the point cloud is also referred to as the position decoding of the point cloud.

In the octree-based encoding method, the encoding end constructs the octree structure of the point cloud based on the geometric information of the point cloud. As shown in Figure 11, the point cloud is enclosed by the smallest cuboid. The enclosing box is first divided into octrees to obtain 8 nodes. The occupied nodes among the 8 nodes, that is, the nodes including the points, continue to be divided into octrees, and so on, until the division is to the voxel level, for example, to a 1X1X1 cube. The point cloud octree structure obtained by such division includes multiple layers of nodes, for example, N layers. When encoding, the placeholder information of each layer is encoded layer by layer until the voxel-level leaf nodes of the last layer are encoded. That is to say, in octree encoding, the point cloud is divided through the octree, and finally the points in the point cloud are divided into the voxel-level leaf nodes of the octree. The encoding of the point cloud is achieved by encoding the entire octree.

Correspondingly, the decoding end first decodes the geometric code stream of the point cloud to obtain the placeholder information of the root node of the octree of the point cloud, and based on the placeholder information of the root node, determines the child nodes included in the root node, that is, the nodes included in the second layer of the octree. Then, the geometric code stream is decoded to obtain the placeholder information of each node in the second layer, and based on the placeholder information of each node, determines the nodes included in the third layer of the octree, and so on.

However, the octree-based geometric information encoding mode has an efficient compression rate for correlated points in space, and for points in isolated positions in the geometric space, the use of direct encoding can greatly reduce the complexity and improve the encoding and decoding efficiency.

Since the direct encoding method directly encodes the geometric information of the points included in the node, if the number of points included in the node is large, the compression effect is poor when the direct encoding method is used. Therefore, for the nodes in the octree, before direct encoding, first determine whether the node can be encoded using the direct encoding method. If it is determined that the node can be encoded using the direct encoding method, the direct encoding method is used to directly encode the geometric information of the points included in the node. If it is determined that the node cannot be encoded using the direct encoding method, the octree method is continued to be used to divide the node.

Specifically, the encoding end first determines whether the node is qualified for direct encoding. If the node is qualified for direct encoding, it determines whether the number of points of the node is less than or equal to the preset threshold. If the number of points of the node is less than or equal to the preset threshold, it is determined that the node can be decoded by direct encoding. Then, the number of points included in the node and the geometric information of each point are encoded into the bitstream. Correspondingly, after determining that the node is qualified for direct decoding, the decoding end decodes the bitstream, obtains the number of points of the node and the geometric information of each point, and implements geometric decoding of the node.

Currently, when predictive coding is performed on the position information of the midpoint of the current node, inter-frame information is not considered, resulting in low coding performance of the point cloud.

In order to solve the above problems, in an embodiment of the present application, the decoding end predicts and decodes the position information of the midpoint of the current node based on the inter-frame information corresponding to the current node, thereby improving the decoding efficiency and decoding performance of the point cloud.

Specifically, the decoding end first determines N prediction nodes of the current node in the prediction reference frame of the current frame to be decoded.

It should be noted that the current frame to be decoded is a point cloud frame. In some embodiments, the current frame to be decoded is also referred to as the current frame or the current point cloud frame or the point cloud frame to be decoded. The current node can be understood as any non-leaf node in the current frame to be decoded. In other words, the current node is not a leaf node in the octree corresponding to the current frame to be decoded, that is, the current node is any middle node of the octree, and the current node is not a non-empty node, that is, it includes at least 1 point.

In the embodiment of the present application, when decoding the current node in the current frame to be decoded, the decoding end first determines the prediction reference frame of the current frame to be decoded, and determines N prediction nodes of the current node in the prediction reference frame. For example, FIG12 shows a prediction node of the current node in the prediction reference frame.

It should be noted that the embodiment of the present application does not limit the number of prediction reference frames of the current frame to be decoded. For example, the current frame to be decoded has one prediction reference frame, or the current frame to be decoded has multiple prediction reference frames. At the same time, the embodiment of the present application does not limit the number N of prediction nodes of the current node, which is determined according to actual needs.

The embodiment of the present application does not limit the specific method of determining the prediction reference frame of the current frame to be decoded.

In some embodiments, one or several decoded frames before the current frame to be decoded are determined as prediction reference frames of the current frame to be decoded.

For example, if the current frame to be decoded is a P frame, the inter-frame reference frame of the P frame includes the previous frame of the P frame (ie, the forward frame). Therefore, the previous frame of the current frame to be decoded (ie, the forward frame) can be determined as the predicted reference frame of the current frame to be decoded.

For another example, if the current frame to be decoded is a B frame, the inter-frame reference frames of the B frame include the previous frame of the P frame (i.e., the forward frame) and the next frame of the P frame (i.e., the backward frame). Therefore, the previous frame of the current frame to be decoded (i.e., the forward frame) can be determined as the predicted reference frame of the current frame to be decoded.

In some embodiments, one or several decoded frames following the current frame to be decoded are determined as prediction reference frames of the current frame to be decoded.

For example, if the current frame to be decoded is a B frame, the frame following the current frame to be decoded may be determined as a prediction reference frame of the current frame to be decoded.

In some embodiments, one or several decoded frames before the current frame to be decoded, and one or several decoded frames after the current frame to be decoded, are determined as prediction reference frames of the current frame to be decoded.

For example, if the current frame to be decoded is a B frame, the previous frame and the next frame of the current frame to be decoded may be determined as prediction reference frames of the current frame to be decoded. In this case, the current frame to be decoded has two prediction reference frames.

Taking the current frame to be decoded including K prediction reference frames as an example, the specific process of determining N prediction nodes of the current node in the prediction reference frame of the current frame to be decoded in S101-A above is introduced.

In some embodiments, the decoding end selects at least one prediction reference frame from the K prediction reference frames based on the placeholder information of the node in the current frame to be decoded and the placeholder information of the node in each of the K prediction reference frames, and then searches for the prediction node of the current node in the at least one prediction reference frame. For example, at least one prediction reference frame whose placeholder information of the node is closest to the placeholder information of the node in the current frame to be decoded is selected from the K prediction reference frames, and then searches for the prediction node of the current node in the at least one prediction reference frame.

In some embodiments, the decoding end may determine N predicted nodes of the current node through the following steps S101-A1 and S101-A2:

S101-A1, for a k-th prediction reference frame among K prediction reference frames, determining at least one prediction node of a current node in the k-th prediction reference frame, where k is a positive integer less than or equal to K, and K is a positive integer;

S101-A2. Determine N prediction nodes of the current node based on at least one prediction node of the current node in K prediction reference frames.

In this embodiment, the decoding end determines at least one prediction node of the current node from each of the K prediction reference frames, and finally summarizes at least one prediction node in each of the K prediction reference frames to obtain N prediction nodes of the current node.

Among them, the process of the decoding end determining at least one prediction point of the current node in each of the K prediction reference frames is the same. For the convenience of description, the kth prediction reference frame among the K prediction reference frames is taken as an example for explanation.

The specific process of determining at least one prediction node of the current node in the kth prediction reference frame in the above S101-A1 is introduced below.

The embodiment of the present application does not limit the specific manner in which the decoding end determines at least one prediction node of the current node in the kth prediction reference frame.

Method 1: In the kth prediction reference frame, a prediction node of the current node is determined. For example, a node in the kth prediction reference frame having the same division depth as the current node is determined as the prediction node of the current node.

For example, assuming that the current node is located at the third layer of the octree of the current frame to be decoded, the nodes located at the third layer of the octree in the kth prediction reference frame can be obtained, and then the prediction node of the current node can be determined from these nodes.

In one example, if the number of prediction nodes of the current node in the kth prediction reference frame is 1, then among the points at which the kth prediction reference frame and the current node are at the same division depth, a node whose occupancy information is the smallest different from that of the current node can be selected, recorded as node 1, and node 1 is determined as a prediction node of the current node in the kth prediction reference frame.

In another example, if the number of prediction nodes of the current node in the kth prediction reference frame is greater than 1, the node 1 determined above and at least one domain node of node 1 in the kth prediction reference frame, such as at least one domain node that is coplanar, colinear, or co-point with node 1, are determined as the prediction nodes of the current node in the kth prediction reference frame.

Mode 2, in the above S101-A1, determining at least one prediction node of the current node in the kth prediction reference frame includes the following steps S101-A11 to S101-A13:

S101-A11, in the current frame to be decoded, determine M domain nodes of the current node, the M domain nodes include the current node, and M is a positive integer;

S101-A12, for the i-th domain node among the M domain nodes, determine the corresponding node of the i-th domain node in the k-th prediction reference frame, where i is a positive integer less than or equal to M;

S101-A13. Determine at least one prediction node of the current node in the kth prediction reference frame based on the corresponding nodes of the M domain nodes in the kth prediction reference frame.

In this implementation, before determining at least one prediction node of the current node in the kth prediction reference frame, the decoding end first determines M domain nodes of the current node in the current frame to be decoded, and the M domain nodes include the current node itself.

It should be noted that in the embodiment of the present application, there is no limitation on the specific method of determining the M domain nodes of the current node.

In one example, the M domain nodes of the current node include at least one domain node among the domain nodes that are coplanar, colinear, and co-pointed with the current node in the current frame to be decoded. As shown in FIG13 , the current node includes 6 coplanar nodes, 12 colinear nodes, and 8 co-pointed nodes.

In another example, the M domain nodes of the current node may include other nodes within the reference neighborhood in addition to at least one domain node in the current frame to be decoded that is coplanar, colinear, and co-point with the current node. This embodiment of the present application does not impose any restrictions on this.

Based on the above steps, the decoding end determines the M domain nodes of the current node in the current frame to be decoded, and then determines the corresponding node of each of the M domain nodes in the kth prediction reference frame, and then determines at least one prediction node of the current node in the kth prediction reference frame based on the corresponding nodes of the M domain nodes in the kth prediction reference frame.

The embodiment of the present application does not limit the specific implementation method of S101-A13.

In a possible implementation, at least one corresponding node is selected from the corresponding nodes of the M domain nodes in the k-th prediction reference frame as at least one prediction node of the current node in the k-th prediction reference frame. For example, at least one corresponding node whose placeholder information has the smallest difference with the placeholder information of the current node is selected from the corresponding nodes of the M domain nodes in the k-th prediction reference frame as at least one prediction node of the current node in the k-th prediction reference frame. The method of determining the difference between the placeholder information of the corresponding node and the placeholder information of the current node can refer to the above-mentioned process of determining the difference in placeholder information, for example, performing an XOR operation on the placeholder information of the corresponding node and the placeholder information of the current node, and using the XOR operation result as the difference between the placeholder information of the corresponding node and the placeholder information of the current node.

In another possible implementation, the decoding end determines the corresponding nodes of the M domain nodes in the kth prediction reference frame as at least one prediction node of the current node in the kth prediction reference frame. For example, the M domain nodes each have a corresponding node in the kth prediction reference frame, and then there are M corresponding nodes, and these M corresponding nodes are determined as the prediction nodes of the current node in the kth prediction reference frame, and there are M prediction nodes in total.

The above describes the process of determining at least one prediction node of the current node in the kth prediction reference frame. In this way, the decoding end can use the same method as above to determine at least one prediction node of the current node in each of the K prediction reference frames.

For example, if the current frame to be decoded is a P frame, the K predicted reference frames include the forward frame of the current frame to be decoded. At this point, the decoding end can determine at least one predicted node of the current node in the forward frame based on the above steps. Exemplarily, as shown in FIG15A , it is assumed that the current node includes three domain nodes, which are respectively recorded as node 11, node 12 (current node) and node 13. These three domain nodes correspond to a corresponding node in the forward frame, respectively recorded as node 21, node 22 and node 23, and then node 21, node 22 and node 23 are determined as the three predicted nodes of the current node in the forward frame, or 1 or 2 nodes are selected from node 21, node 22 and node 23 to be determined as 1 or 2 predicted nodes of the current node in the forward frame.

For another example, if the current frame to be decoded is a B frame, the K prediction reference frames include the forward frame and the backward frame of the current frame to be decoded. At this time, the decoding end can determine at least one prediction node of the current node in the forward frame, and at least one prediction node of the current node in the backward frame based on the above steps. Exemplarily, as shown in FIG15B, it is assumed that the current node includes three domain nodes, which are respectively recorded as

nodes

11, 12, and 13. These three domain nodes correspond to a corresponding node in the forward frame, which are recorded as

nodes

21, 22, and 23, respectively. These three domain nodes correspond to a corresponding node in the backward frame, which are recorded as

nodes

41, 42, and 43. In this way, the decoding end can determine

nodes

21, 22, and 23 as the three prediction nodes of the current node in the forward frame, or select 1 or 2 nodes from

nodes

21, 22, and 23 to determine as 1 or 2 prediction nodes of the current node in the forward frame. Similarly, the decoding end can determine node 41, node 42 and node 43 as three prediction nodes of the current node in the backward frame, or select one or two nodes from node 41, node 42 and node 43 as one or two prediction nodes of the current node in the backward frame.

After the decoding end determines at least one prediction node of the current node in each of the K prediction reference frames, it executes the above step S101-B, that is, based on at least one prediction node of the current node in the K prediction reference frames, determines N prediction nodes of the current node.

In an example, at least one prediction node of the current node in K prediction reference frames is determined as N prediction nodes of the current node.

For example, K=2, that is, the K prediction reference frames include the first prediction reference frame and the second prediction reference frame. Assuming that the current node has 2 prediction nodes in the first prediction reference frame and the current node has 3 prediction nodes in the second prediction reference frame, it can be determined that the current node has 5 prediction nodes, and N=5 at this time.

In another example, N prediction nodes of the current node are screened out from at least one prediction node of the current node in K prediction reference frames.

Continuing with the above example, assume that K=2, that is, the K prediction reference frames include the first prediction reference frame and the second prediction reference frame. Assume that the current node has 2 prediction nodes in the first prediction reference frame, and the current node has 3 prediction nodes in the second prediction reference frame. From these 5 prediction nodes, select 3 prediction nodes as the final prediction nodes of the current node. For example, from these 5 prediction nodes, select the 3 prediction nodes whose placeholder information has the smallest difference with the placeholder information of the current node, and determine them as the final prediction nodes of the current node.

In the second method, after the decoding end determines the M domain nodes of the current node in the current frame to be decoded, it determines the corresponding node of each of the M domain nodes in the kth prediction reference frame, and then determines at least one prediction point of the current node in the kth prediction reference frame based on the corresponding node of each of the M domain nodes.

Mode 3, in the above S101-A1, determining at least one prediction node of the current node in the kth prediction reference frame includes the following steps S101-B11 to S101-B13:

S101-B11, determining a corresponding node of the current node in the kth prediction reference frame;

S101-B12, determining at least one domain node of the corresponding node;

S101-B13. Determine at least one domain node as at least one prediction node of the current node in the kth prediction reference frame.

In this method 3, for each of the K prediction reference frames, the decoding end first determines the corresponding node of the current node in each prediction reference frame. For example, the corresponding node 1 of the current node in the prediction reference frame 1 is determined, and the corresponding node 2 of the current node in the prediction reference frame 2 is determined. Next, the decoding end determines at least one domain node of each corresponding node. For example, at least one domain node of the corresponding node 1 is determined in the prediction reference frame 1, and at least one domain node of the corresponding node 2 is determined in the prediction reference frame 2. In this way, at least one domain node of the corresponding node 1 in the prediction reference frame 1 can be determined as at least one prediction node of the current node in the prediction reference frame 1, and at least one domain node of the corresponding node 2 in the prediction reference frame 2 can be determined as at least one prediction node of the current node in the prediction reference frame 2.

The process of determining the corresponding node of the i-th domain node in the k-th prediction reference frame in S101-A12 of the second method is basically the same as the process of determining the corresponding node of the current node in the k-th prediction reference frame in S101-B11 of the above-mentioned third method. For the convenience of description, the above-mentioned i-th domain node and the current node are recorded as the i-th node, and the specific process of determining the corresponding node of the i-th node in the k-th prediction reference frame is introduced below.

The decoding end determines the corresponding node of the i-th node in the k-th prediction reference frame in at least the following ways:

Method 1: Determine a node in the k-th prediction reference frame that has the same division depth as the i-th node as the corresponding node of the i-th node.

For example, assuming that the i-th node is located at the third layer of the octree of the current frame to be decoded, the nodes located at the third layer of the octree in the k-th prediction reference frame can be obtained, and then the corresponding node of the i-th node can be determined from these nodes. For example, among the points where the k-th prediction reference frame and the i-th node are at the same division depth, a node whose placeholder information has the smallest difference with the placeholder information of the i-th node is selected and determined as the corresponding node of the i-th node in the k-th prediction reference frame.

Mode 2: The above S101-A12 and S101-B11 include the following steps:

S101-A121, in the current frame to be decoded, determine the parent node of the i-th node as the i-th parent node;

S101-A122, determining a matching node of the i-th parent node in the k-th prediction reference frame as the i-th matching node;

S101-A123, determine one of the child nodes of the i matching nodes as the corresponding node of the i-th node in the k-th prediction reference frame.

In the method 2, for the i-th node, the decoding end determines the parent node of the i-th node in the current frame to be decoded, and then determines the matching node of the parent node of the i-th prediction domain node in the k-th prediction reference frame. For the convenience of description, the parent node of the i-th node is recorded as the i-th parent node, and the matching node of the parent node of the i-th node in the k-th prediction reference frame is determined as the i-th matching node. Then, a child node of the child node of the i-th matching node is determined as the corresponding node of the i-th node in the k-th prediction reference frame, thereby accurately determining the corresponding node of the i-th node in the k-th prediction reference frame.

The specific process of determining the matching node of the i-th parent node in the k-th prediction reference frame in the above S101-A122 is introduced below.

The embodiment of the present application does not limit the specific manner in which the decoding end determines the matching node of the i-th parent node in the k-th prediction reference frame.

In some embodiments, the division depth of the i-th parent node in the current frame to be decoded is determined, for example, the i-th parent node is in the second layer of the octree of the current frame to be decoded. In this way, the decoding end can determine one of the nodes in the k-th prediction reference frame that have the same division depth as the i-th parent node as the matching node of the i-th parent node in the k-th prediction reference frame. For example, one of the nodes in the second layer in the k-th prediction reference frame is determined as the matching node of the i-th parent node in the k-th prediction reference frame.

In some embodiments, the decoding end determines the matching node of the i-th parent node in the k-th prediction reference frame based on the placeholder information of the i-th parent node. Specifically, since the placeholder information of the i-th parent node in the current frame to be decoded has been decoded, and the placeholder information of each node in the k-th prediction reference frame has also been decoded, the decoding end can search for the matching node of the i-th parent node in the k-th prediction reference frame based on the placeholder information of the i-th parent node.

For example, a node whose placeholder information in the k-th prediction reference frame has the smallest difference with the placeholder information of the i-th parent node is determined as a matching node of the i-th parent node in the k-th prediction reference frame.

For example, assuming that the placeholder information of the i-th parent node is 11001101, the node whose placeholder information is the smallest different from the placeholder information 11001101 is searched in the k-th prediction reference frame. Specifically, the decoding end performs an XOR operation on the placeholder information of the i-th parent node and the placeholder information of each node in the k-th prediction reference frame, and determines the node with the smallest XOR operation result in the k-th prediction reference frame as the matching node of the i-th parent node in the k-th prediction reference frame.

For example, assuming that the occupancy information of node 1 in the kth predicted reference frame is 10001101, 11001101 and 10001101 are XORed, where the first bit of 11001101 and the first bit of 10001101 are both 1, so the XOR result of the first bit of the two is 0, the second bit of 11001101 is different from the second bit of 10001111, so the XOR result of the second bit of the two is 1, and so on, the XOR result of 11001101 and 10001111 is 0+1+0+0+0+0+1+0=2. According to this method, the decoding end can determine the XOR operation result of the occupancy information of the i-th parent node and the occupancy information of each node in the k-th prediction reference frame, and then determine the node in the k-th prediction reference frame with the smallest XOR operation with the occupancy information of the i-th parent node as the matching node of the i-th parent node in the k-th prediction reference frame.

Based on the above steps, the decoding end can determine the matching node of the i-th parent node in the k-th prediction reference frame. For the convenience of description, this matching node is recorded as the i-th matching node.

Next, the decoding end determines one of the child nodes of the i-th matching node as the corresponding node of the i-th domain node in the k-th prediction reference frame.

For example, the decoding end determines a default child node among the child nodes included in the i-th matching node as the corresponding node of the i-th node in the k-th prediction reference frame. Assume that the first child node of the i-th matching node is determined as the corresponding node of the i-th node in the k-th prediction reference frame.

For another example, the decoding end determines the first sequence number of the i-th node in the child nodes included in the parent node; the child node with the first sequence number in the child nodes of the i-th matching node is determined as the corresponding node of the i-th node in the k-th prediction reference frame. Exemplarily, as shown in FIG14, the i-th node is the second child node of the i-th parent node, and the first sequence number is 2. In this way, the second child node of the i-th matching node can be determined as the corresponding node of the i-th node.

The above describes the process of determining the corresponding node of the i-th domain node in the M domain nodes in the k-th prediction reference frame, and determining the corresponding node of the current node in the k-th prediction reference frame. In this way, the decoding end can use the second or third method to determine the N prediction nodes of the current node in the prediction reference frame.

Based on the above steps, the decoding end determines N prediction nodes of the current node in the prediction reference frame of the current frame to be decoded, and then performs the following step S102.

S102. Based on the geometric decoding information of the midpoints of the N predicted nodes, predict and decode the position information of the midpoint of the current node.

Due to the correlation between adjacent frames of the point cloud, the embodiment of the present application refers to the relevant information between frames when predicting the position information of the midpoint of the current node based on the correlation between adjacent frames of the point cloud. Specifically, the WeChat information of the midpoint of the current node is predictively encoded based on the geometric decoding information of the midpoints of the N predicted nodes of the current node, thereby improving the encoding and decoding efficiency and encoding and decoding performance of the point cloud.

In one example, as shown in FIG16A , the process of the encoder directly encoding the current node includes: determining whether the current node is qualified for direct encoding, and if it is determined that the current node is qualified for direct decoding, setting IDCMEligible to true. Next, determining whether the number of points included in the current node is less than a preset threshold, and if it is less than the preset threshold, determining to encode the current node in a direct encoding manner, that is, directly encoding the number of points of the current node and the geometric information of the midpoint of the current node.

Correspondingly, when the decoding end decodes the current node, as shown in FIG16B , the decoding end first determines whether the current node is eligible for direct decoding, and if the decoding end determines that the current node is eligible for direct decoding, it sets IDCMEligible to true. Next, the geometric information of the midpoint of the current node is decoded.

It should be noted that, in the embodiment of the present application, predictive decoding of the coordinate information of the midpoint of the current node is performed based on the geometric decoding information of the N predicted nodes, which can be understood as using the geometric decoding information of the N predicted nodes as context to predictively decode the coordinate information of the midpoint of the current node. For example, the decoding end determines the index of the context model based on the geometric decoding information of the N predicted nodes, and then determines the target context model from the preset multiple context models based on the index of the context model, and uses the context model to predictively decode the coordinate information of the midpoint of the current node.

In an embodiment of the present application, based on the geometric decoding information of N predicted nodes, the process of predicting and decoding the coordinate information of each point in the current node is basically the same. For the sake of ease of description, the predicting and decoding of the coordinate information of the current point in the current node is taken as an example to illustrate.

In some embodiments, the above S102 includes the following steps:

S102-A, determining an index of a context model based on geometric decoding information of N prediction nodes;

S102-B, determining a context model based on an index of the context model;

S102-C. Use the context model to predict and decode the coordinate information of the current point in the current node.

In an embodiment of the present application, multiple context models, for example, Q context models, are set for the decoding process of the coordinate information. The embodiment of the present application does not limit the specific number of context models corresponding to the coordinate information, as long as Q is greater than 1. That is to say, in an embodiment of the present application, at least one optimal context model is selected from two context models to predict and decode the coordinate information of the current point in the current node, so as to improve the decoding efficiency of the coordinate information of the current point.

Exemplarily, the coordinate information corresponds to multiple context models as shown in Table 4:

索引 index		上下文模型Context Model
00	上下文模型A Context Model A
11	上下文模型BContext Model B
………	………

In this way, the decoding end determines the index of the context model based on the geometric decoding information of the N prediction nodes. Then, based on the index of the context model, a context model is selected from the context models corresponding to Table 4 to predict and decode the coordinate information of the current point in the current node.

In the embodiment of the present application, the geometric decoding information of the prediction node can be understood as any information involved in the geometric decoding process of the prediction node, such as the number of points included in the prediction node, the placeholder information of the prediction node, the decoding method of the prediction node, the geometric information of the midpoint of the prediction node, etc.

In some embodiments, the geometric decoding information of the prediction node includes direct decoding information of the prediction node and/or coordinate information of a point in the prediction node, wherein the direct decoding information of the prediction node is used to indicate whether the prediction node meets the conditions for decoding in a direct decoding manner.

Based on this, the above S102-A includes the following step S102-A1:

S102-A1. Determine a first context index based on direct decoding information of the N prediction nodes, and/or determine a second context index based on coordinate information of midpoints of the N prediction nodes.

Correspondingly, the above S102-B includes the following steps S102-B:

S102-B1. Select a context model from a plurality of preset context models based on the first context index and/or the second context index.

In this embodiment, if the geometric decoding information of the prediction node includes direct decoding information of the prediction node and/or coordinate information of the midpoint of the prediction node, the decoding end can determine the first context index based on the direct decoding information of the N prediction nodes, and/or determine the second context index based on the position information of the midpoint of the N prediction nodes, and then select the final context model from the preset multiple context models based on the first context index and/or the second context index.

It can be seen that in this embodiment, the decoding end determines the context model in the following ways, including but not limited to:

In one possible implementation, if the geometric decoding information of the prediction node includes direct decoding information of the prediction node, the process of determining the context model may be to determine a first context index based on the direct decoding information of N prediction nodes, and then based on the first context index, select a final context model from a plurality of preset context models to decode the coordinate information of the current point.

For example, the decoding end selects the final context model from the context models shown in Table 4 based on the first context index.

In another possible implementation, if the geometric decoding information of the prediction node includes the coordinate information of the midpoint of the prediction node, the process of determining the context model may be to determine the second context index based on the coordinate information of the midpoints of N prediction nodes, and then based on the second context index, select the final context model from the preset multiple context models to decode the coordinate information of the current point.

For example, the decoding end selects the final context model from the context models shown in Table 4 based on the second context index.

In another possible implementation, if the geometric decoding information of the prediction node includes direct decoding information of the prediction node and coordinate information of the midpoint of the prediction node, the process of determining the context model may be to determine a first context index based on the direct decoding information of N prediction nodes, and then determine a second context index based on the first context index and the coordinate information of the midpoint of the N prediction nodes, and then select a final context model from a plurality of preset context models based on the second context index, the first context index and the second context index to decode the coordinate information of the current point.

Exemplarily, the correspondence between the first context index, the second context index and the context model is shown in Table 5:

table 5

	第二上下文索引1 Second context index 1	第二上下文索引2 Second context index 2	第二上下文索引3 Second context index 3	………
第一上下文索引1 First context index 1	上下文模型11 Context Model 11	上下文模型12 Context Model 12	上下文模型13 Context Model 13	………
第一上下文索引2 First context index 2	上下文模型21 Context Model 21	上下文模型22 Context Model 22	上下文模型23 Context Model 23	………
第一上下文索引3 First context index 3	上下文模型31 Context Model 31	上下文模型32Context Model 32	上下文模型33 Context Model 33	………
………	………	………	………	………

In this mode 3, the decoding end determines the first context index based on the direct decoding information of the N prediction nodes, and determines the second context index based on the coordinate information of the midpoints of the N prediction nodes, and then checks the above table 5 to obtain the final context model. For example, the decoding end determines that the first context index is the first context index 2 based on the direct decoding information of the N prediction nodes, and determines that the second context index is the second context index 3 based on the coordinate information of the midpoints of the N prediction nodes. In this way, by checking Table 5, the final context model can be obtained as context model 23, and then the decoding end uses context model 23 to decode the coordinate information of the current point.

The specific process of determining the first context index based on the direct decoding information of the N prediction nodes in S102-A1 is introduced below.

In the embodiment of the present application, the decoding end determines the first context index in the following ways, but is not limited to:

Method 1: the above S102-A1 includes the following steps S102-A1-11 and S102-A1-12:

S102-A1-11. For any prediction node among the N prediction nodes, determine a first value corresponding to the prediction node based on direct decoding information of the prediction node.

In this manner, for each of the N prediction nodes, a first value corresponding to the prediction node is determined based on direct decoding information of the prediction node, and finally a first context index is determined based on the first values corresponding to the N prediction nodes.

The following describes the process of determining the first value corresponding to the prediction node.

As can be seen from the above, the direct decoding information of the prediction node is used to indicate whether the prediction node meets the conditions for decoding in the direct decoding manner. The embodiment of the present application does not limit the specific content of the direct decoding information.

In some embodiments, the direct decoding information includes the number of points included in the prediction node. In this way, the first value corresponding to the prediction node can be determined based on the number of points included in the prediction node.

In one example, in the GPCC framework, when the number of points included in the prediction node is greater than or equal to 2, the first value corresponding to the prediction node is determined to be 1, and if the number of points included in the prediction node is less than 2, the first value corresponding to the prediction node is determined to be 0. In the AVS framework, when the number of points included in the prediction node is greater than or equal to 1, the first value corresponding to the prediction node is determined to be 1, and if the number of points included in the prediction node is less than 1, the first value corresponding to the prediction node is determined to be 0.

In another example, the number of points included in the prediction node is determined as the first value corresponding to the prediction node. For example, when the prediction node includes 2 points, the first value corresponding to the prediction node is determined to be 2.

In some embodiments, the direct decoding information of the prediction node includes the direct decoding mode of the prediction node. In this case, the above S102-A1-11 includes: numbering the direct decoding mode of the prediction node and determining the first value corresponding to the prediction node.

For example, in the GPCC framework, if the direct decoding mode of the prediction node is mode 0, the first value corresponding to the prediction node is determined to be 0. If the direct decoding mode of the prediction node is mode 1, the first value corresponding to the prediction node is determined to be 1. If the direct decoding mode of the prediction node is mode 2, the first value corresponding to the prediction node is determined to be 2.

For another example, in the AVS framework, if the direct decoding mode of the prediction node is mode 0, the first value corresponding to the prediction node is determined to be 0. If the direct decoding mode of the prediction node is mode 1, the first value corresponding to the prediction node is determined to be 1.

Based on the above steps, after the decoding end determines the first value corresponding to each of the N prediction nodes, the following step S102-A1-12 is executed.

S102-A1-12. Determine a first context index based on first numerical values corresponding to the N prediction nodes.

After the decoding end determines the first values corresponding to the N prediction nodes based on the above steps, it determines the first context index based on the first values corresponding to the N prediction nodes.

Wherein, determining the first context index based on the first values corresponding to the N prediction nodes includes at least the following implementation methods:

Method 1: Determine the average value of the sum of the first numerical values corresponding to the N prediction nodes as the first context index.

Mode 2, S102-A1-12 includes the following steps S102-A1-121 to S102-A1-123:

S102-A1-121, determine a first weight corresponding to the prediction node;

S102-A1-122, based on the first weight, weighting the first values corresponding to the N prediction nodes to obtain a first weighted prediction value;

S102-A1-123. Determine a first context index based on the first weighted prediction value.

In method 2, if the current node includes multiple prediction nodes, i.e., N prediction nodes, when determining the first context index based on the first numerical values corresponding to the N prediction nodes, a weight, i.e., the first weight, can be determined for each of the N prediction nodes. In this way, the first numerical values corresponding to each prediction node can be weighted based on the first weight of each prediction node, and then the first context index can be determined based on the final weighted result, thereby improving the accuracy of determining the first context index based on the geometric decoding information of the N prediction nodes.

The embodiment of the present application does not limit the determination of the first weights corresponding to the N prediction nodes.

In some embodiments, the first weight corresponding to each prediction node in the above-mentioned N prediction nodes is a preset value. As can be seen from the above, the above-mentioned N prediction nodes are determined based on the M domain nodes of the current node. Assuming that prediction node 1 is the prediction node corresponding to domain node 1, if domain node 1 is a coplanar node of the current node, the first weight of prediction node 1 is the preset weight 1, if domain node 1 is a colinear node of the current node, the first weight of prediction node 1 is the preset weight 2, and if domain node 1 is a co-point node of the current node, the first weight of prediction node 1 is the preset weight 3.

In some embodiments, for each of the N prediction nodes, a first weight corresponding to the prediction node is determined based on the distance between the domain node corresponding to the prediction node and the current node. For example, the smaller the distance between the domain node and the current node, the stronger the inter-frame correlation between the prediction node corresponding to the domain node and the current node, and thus the greater the first weight of the prediction node.

For example, taking prediction node 1 among N prediction nodes as an example, assuming that prediction node 1 is the corresponding point of domain node 1 among M domain nodes of the current node in the prediction reference frame, the first weight of prediction node 1 can be determined based on the distance between domain node 1 and the current node. For example, the inverse of the distance between domain node 1 and the current node is determined as the first weight of prediction node 1.

In one example, if domain node 1 is a coplanar node of the current node, the first weight of the predicted node 1 is 1; if domain node 1 is a colinear node of the current node, the first weight of the predicted node 1 is a preset weight.

If domain node 1 is a common node of the current node, the first weight of predicted node 1 is the preset weight

In one example, if domain node 1 is a coplanar node of the current node, the first weight of predicted node 1 is

If domain node 1 is a collinear node of the current node, the first weight of predicted node 1 is the preset weight

In some embodiments, based on the above steps, after the weight corresponding to each prediction node in the N prediction nodes is determined, the weight is normalized, and the normalized weight is used as the final first weight of the prediction node.

The embodiment of the present application does not limit the specific method of obtaining the first weighted prediction value by weighting the first numerical values corresponding to N prediction nodes based on the first weight.

In one example, based on the first weight, a weighted average is performed on the first numerical values corresponding to the N prediction nodes to obtain a first weighted prediction value.

In another example, based on the first weight, a weighted sum is performed on the first numerical values corresponding to the N prediction nodes to obtain a first weighted prediction value.

After determining the first weighted prediction value based on the method steps, the first context index is determined based on the first weighted prediction value, that is, the above S102-A1-123 includes at least the following examples:

Example 1: Determine the first weighted prediction value as the first context index.

Example 2: Determine the weighted prediction value range in which the first weighted prediction value is located, and determine the index corresponding to the range as the first context index.

The above is an introduction to the process in which the decoding end performs weighted processing on N prediction nodes to obtain the first context index.

In some embodiments, the decoding end may also adopt the following method 2 to determine the first context index.

Method 2: if K is greater than 1, determine the second weighted prediction value corresponding to each of the K prediction reference frames, and then determine the first context index based on the second weighted prediction values corresponding to the K prediction reference frames. At this time, the above S102-A1 includes the following steps S102-A1-21 to S102-A1-23:

S102-A1-21. For a j-th prediction reference frame among the K prediction reference frames, determine a first value corresponding to the prediction node in the j-th prediction reference frame based on direct decoding information of the prediction node of the current node in the j-th prediction reference frame, where j is a positive integer less than or equal to K.

S102-A1-22, determining a first weight corresponding to the prediction node, and performing weighted processing on the first value corresponding to the prediction node in the j-th prediction reference frame based on the first weight, to obtain a second weighted prediction value corresponding to the j-th prediction reference frame;

S102-A1-23. Determine a first context index based on second weighted prediction values corresponding to K prediction reference frames.

In the second method, when determining the first context index, each of the K prediction reference frames is considered as separate context information. Specifically, the direct decoding information of the prediction node included in each of the K prediction reference frames is determined, and the second weighted prediction value corresponding to each prediction reference frame is determined, and then based on the second weighted prediction value corresponding to each prediction reference frame, the first context index is determined to achieve accurate selection of the first context index, thereby improving the decoding efficiency of the point cloud.

In the embodiment of the present application, the specific method in which the decoding end determines the second weighted prediction value corresponding to each of the K prediction reference frames is the same. For the sake of ease of description, the jth prediction reference frame among the K prediction reference frames is taken as an example for explanation.

In an embodiment of the present application, the current node includes at least one prediction node in the jth prediction reference frame, so that the first value of the at least one prediction node is determined based on the direct decoding information of the at least one prediction node in the jth prediction reference frame.

For example, the j-th prediction reference frame includes two prediction nodes of the current node, which are respectively recorded as prediction node 1 and prediction node 2, and then the first value of prediction node 1 is determined based on the direct decoding information of prediction node 1, and the first value of prediction node 2 is determined based on the direct decoding information of prediction node 2. The process of determining the first value corresponding to the prediction node based on the direct decoding information of the prediction node can refer to the description of the above embodiment, and the first value corresponding to the prediction node is determined based on the direct decoding mode of the prediction node, for example, the number (0, 1 or 2) of the direct decoding mode of the prediction node is determined as the first value corresponding to the prediction node.

After the decoding end determines the first value of at least one prediction node included in the j-th prediction reference frame, it determines the first weight corresponding to the at least one prediction node, and weightedly processes the first value corresponding to the at least one prediction node based on the first weight to obtain a second weighted prediction value corresponding to the j-th prediction reference frame.

In one example, based on the first weight, a weighted average is performed on the first values corresponding to the prediction nodes in the j-th prediction reference frame to obtain a second weighted prediction value corresponding to the j-th prediction reference frame.

In another example, based on the first weight, a weighted sum is performed on the first numerical values corresponding to the prediction nodes in the j-th prediction reference frame to obtain a second weighted prediction value corresponding to the j-th prediction reference frame.

The process of determining the first weight may refer to the description of the above embodiment and will not be repeated here.

The above introduces the process of determining the second weighted prediction value corresponding to the j-th prediction reference frame among the K prediction reference frames. The second weighted prediction values corresponding to other prediction reference frames among the K prediction reference frames are determined in accordance with the method corresponding to the j-th prediction reference frame.

After the decoding end determines the second weighted prediction value corresponding to each of the K prediction reference frames, it executes the above step S102-A1-23.

The present application does not limit the specific method of determining the first context index based on the second weighted prediction value corresponding to K prediction reference frames.

In some embodiments, the decoding end determines an average value of the second weighted prediction values corresponding to the K prediction reference frames as the first context index.

In some embodiments, the decoding end determines second weights corresponding to K predicted reference frames, and performs weighted processing on second weighted prediction values corresponding to the K predicted reference frames based on the second weights to obtain a first context index.

In this embodiment, the decoding end first determines the second weight corresponding to each of the K prediction reference frames. The embodiment of the present application does not limit the determination of the second weight corresponding to each of the K prediction reference frames.

In some embodiments, the second weight corresponding to each of the K predicted reference frames is a preset value. As can be seen from the above, the K predicted reference frames are forward frames and/or backward frames of the current frame to be decoded. Assuming that predicted reference frame 1 is the forward frame of the current frame to be decoded, the second weight corresponding to predicted reference frame 1 is the preset weight 1. If predicted reference frame 1 is the backward frame of the current frame to be decoded, the second weight corresponding to predicted reference frame 1 is the preset weight 2.

In some embodiments, based on the time difference between the predicted reference frame and the current frame to be decoded, the second weight corresponding to the predicted reference frame is determined. In an embodiment of the present application, each point cloud includes time information, and the time information may be the time when the point cloud acquisition device acquires the point cloud of the frame. Based on this, if the time difference between the predicted reference frame and the current frame to be decoded is smaller, the inter-frame correlation between the predicted reference frame and the current frame to be decoded is stronger, and thus the second weight corresponding to the predicted reference frame is larger. For example, the inverse of the time difference between the predicted reference frame and the current frame to be decoded can be determined as the second weight corresponding to the predicted reference frame.

After determining the second weight corresponding to each of the K prediction reference frames, weighted processing is performed on the second weighted prediction values respectively corresponding to the K prediction reference frames based on the second weight to obtain a first context index.

For example, assuming K=2, for example, the current frame to be decoded includes 2 prediction reference frames, and these 2 prediction reference frames include the forward frame and backward frame of the current frame to be decoded. Assuming that the second weight corresponding to the forward frame is W1, and the second weight corresponding to the backward frame is W2, based on W1 and W2, the second weighted prediction value corresponding to the forward frame and the second weighted prediction value corresponding to the backward frame are weighted to obtain the first context index.

In one example, based on the second weight, weighted averaging is performed on the second weighted prediction values corresponding to the K prediction reference frames to obtain the first context index.

In another example, based on the second weight, the second weighted prediction values corresponding to the K prediction reference frames are weighted summed to obtain the first context index.

The above describes the process of determining the first context index at the decoding end.

The following introduces the process of determining the second context index at the decoding end.

From the above process of determining the prediction node, it can be seen that each of the N prediction nodes includes one point or multiple points. If each of the N prediction nodes includes one point, the second context index is determined using one point included in each prediction node.

In some embodiments, if the prediction node includes multiple points, a point is selected from the multiple points to determine the second context index. In this case, the above S102-A1 includes the following steps S102-A1-31 and S102-A1-32:

S102-A1-31, for any prediction node among the N prediction nodes, select a first point corresponding to the current point of the current node from the points included in the prediction node;

S102-A1-32. Determine a second context index based on the coordinate information of the first point included in the N prediction nodes.

For example, assuming that N prediction nodes include prediction node 1 and prediction node 2, where prediction node 1 includes point 1 and point 2, and prediction node 2 includes point 3, point 4, and point 5, then one point is selected as the first point from point 1 and point 2 included in prediction node 1, and one point is selected as the first point from point 3, point 4, and point 5 included in prediction node 2. In this way, the geometric information of the current point can be determined based on the geometric information of the first point in prediction node 1 and the first point in prediction node 2.

The embodiment of the present application does not limit the specific method of selecting the first point corresponding to the current point of the current node from the points included in the prediction node.

In a possible implementation, the points in the prediction node that are in the same order as the current point are determined as the first point corresponding to the current point. For example, assuming that the current point is the second point in the current node, point 2 in prediction node 1 can be determined as the first point corresponding to the current point, and point 4 in prediction node 2 can be determined as the first point corresponding to the current point. For another example, if the prediction node includes only one point, the point included in the prediction node is determined as the first point corresponding to the current point.

In one possible implementation, if the encoder selects the first point corresponding to the current point from the points included in the prediction node based on the rate-distortion cost (or approximate cost), the encoder writes the identification information of the first point in the prediction node into the bitstream, so that the decoder obtains the first point in the prediction node by decoding the bitstream.

For each of the N prediction nodes, the decoding end determines the first point corresponding to the current point in each prediction node based on the above method, and then executes the above step S102-A1-32.

In the embodiment of the present application, the decoding end encodes the coordinate information of the current point on different coordinate axes respectively. Based on, the above S102-A1-32 includes the following step S102-A1-321:

S102-A1-321. Determine a second context index corresponding to the i-th coordinate axis based on the coordinate information of the first point included in the N prediction nodes on the i-th coordinate axis.

The above-mentioned i-th coordinate axis can be the X-axis, the Y-axis or the Z-axis, and this embodiment of the present application does not limit this.

In some embodiments, if the above point cloud is a lidar point cloud, then it can be seen from the above that the i-th coordinate axis is the X-axis or the Y-axis.

In some embodiments, if the point cloud is a point cloud facing the human eye, it can be known from the above that the i-th coordinate axis can be any one of the X-axis, Y-axis or Z-axis.

In an embodiment of the present application, when the decoding end decodes the coordinate information of the current point on the i-th coordinate axis, the second context index corresponding to the i-th coordinate axis is determined based on the coordinate information of the first point on the i-th coordinate axis included in the N prediction nodes. In this way, the context model corresponding to the i-th coordinate axis can be selected from multiple context models based on the first context index and/or the second context index corresponding to the i-th coordinate axis, and then the context model corresponding to the i-th coordinate axis is used to predict and decode the coordinate information of the current point on the i-th coordinate axis. For example, the decoding end determines the second context index corresponding to the X-axis based on the coordinate information of the first point on the X-axis included in the N prediction nodes, and selects the context model corresponding to the X-axis from multiple context models based on the first context index and/or the second context index corresponding to the X-axis, and then uses the context model corresponding to the X-axis to predict and decode the coordinate information of the current point on the X-axis to obtain the X coordinate value of the current point. For another example, the decoding end determines the second context index corresponding to the Y-axis based on the coordinate information of the first point on the Y-axis included in the N prediction nodes, and based on the first context index and/or the second context index corresponding to the Y-axis, selects the context model corresponding to the Y-axis from multiple context models, and then uses the context model corresponding to the Y-axis to predict and decode the coordinate information of the current point on the Y-axis to obtain the Y coordinate value of the current point.

The following introduces a process of determining the second context index corresponding to the i-th coordinate axis at the decoding end based on the coordinate information of the first point included in the N prediction nodes on the i-th coordinate axis.

In the embodiment of the present application, the implementation methods of the above S102-A1-321 include but are not limited to the following:

Method 1: weight the first points included in the N prediction nodes, and determine the second context index corresponding to the i-th coordinate axis based on the weighted coordinate information. At this time, the above S102-A1-321 includes the following steps S102-A1-321-11 to S102-A1-321-13:

S102-A1-321-11, determine a first weight corresponding to the prediction node;

S102-A1-321-12, based on the first weight, weighting the coordinate information of the first point included in the N prediction nodes to obtain a first weighted point;

S102-A1-321-13. Determine a second context index corresponding to the i-th coordinate axis based on the coordinate information of the first weighted point on the i-th coordinate axis.

In the first mode, if the current node includes multiple prediction nodes, that is, N prediction nodes, when determining the second context index corresponding to the i-th coordinate axis based on the coordinate information of the first point included in the N prediction nodes, a weight, that is, the first weight, can be determined for each of the N prediction nodes. In this way, based on the first weight of each prediction node, the coordinate information of the first point included in each prediction node can be weighted to obtain a first weighted point, and then the second context index corresponding to the i-th coordinate axis can be determined based on the coordinate information of the first weighted point on the i-th coordinate axis, thereby improving the accuracy of decoding the current point based on the geometric decoding information of the N prediction nodes.

The process of determining the first weights corresponding to the N prediction nodes respectively in the embodiment of the present application can refer to the description of the above embodiment, which will not be repeated here.

After the decoding end determines the first weight corresponding to each of the N prediction nodes, based on the first weight, the coordinate information of the first point included in the N prediction nodes is weighted to obtain a first weighted point.

The embodiment of the present application does not limit the specific method of obtaining the first weighted point by weighted processing the coordinate information of the first point included in the N prediction nodes based on the first weight.

In one example, based on the first weight, the coordinate information of the first point included in the N prediction nodes is weighted averaged to obtain a first weighted point.

Based on the method steps, after the first weighted point is determined, the second context index corresponding to the i-th coordinate axis is determined based on the coordinate information of the first weighted point on the i-th coordinate axis.

As can be seen from the above, the first weighted point is obtained by weighting the first point in the N prediction nodes, wherein the values of each bit of the first point after the prediction node have only two results, which are 0 or 1. Therefore, in some embodiments, the values of each bit of the first weighted point obtained by weighting the first point in the N prediction nodes are also 0 or 1. In this way, when decoding the i-th bit of the current point on the i-th coordinate axis, the second context index corresponding to the i-th bit on the i-th coordinate axis is determined based on the value of the i-th bit of the first weighted point on the i-th coordinate axis. For example, if the value of the i-th bit of the first weighted point on the i-th coordinate axis is 0, the second context index corresponding to the i-th bit on the i-th coordinate axis is determined to be 0. For another example, if the value of the i-th bit of the first weighted point on the i-th coordinate axis is 1, the second context index corresponding to the i-th bit on the i-th coordinate axis is determined to be 1. Finally, the decoding end determines the context model corresponding to the i-th bit on the i-th coordinate axis based on the first context index and/or the second context index corresponding to the i-th bit on the i-th coordinate axis, and uses the context model to predict and decode the value of the i-th bit of the current point on the i-th coordinate axis.

In addition to determining the second context index based on the above-mentioned method 1, the decoding end may also determine the second context index through the following method 2.

Method 2: if K is greater than 1, weight the first point included in the prediction node in each of the K prediction reference frames, and determine the second context index corresponding to the i-th coordinate axis based on the weighted coordinate information. At this time, the above S102-B includes the following steps S102-B21 to S102-B23:

S102-A1-321-21. For a j-th prediction reference frame among the K prediction reference frames, determine a first weight corresponding to a prediction node in the j-th prediction reference frame;

S102-A1-321-22, based on the first weight, weighting the coordinate information of the first point included in the prediction node in the j-th prediction reference frame to obtain a second weighted point corresponding to the j-th prediction reference frame, where j is a positive integer less than or equal to K;

S102-A1-321-23. Determine a second context index corresponding to the i-th coordinate axis based on the second weighted points corresponding to the K predicted reference frames.

In the second method, when determining the geometric information of the current point, each of the K prediction reference frames is considered separately. Specifically, the coordinate information of the first point in the prediction node of each of the K prediction reference frames is determined, and the second weighted point corresponding to each prediction reference frame is determined, and then based on the coordinate information of the second weighted point corresponding to each prediction reference frame, the second context index corresponding to the i-th coordinate axis is determined, and the second context index is accurately predicted, thereby improving the decoding efficiency of the point cloud.

In the embodiment of the present application, the specific method in which the decoding end determines the second weighted point corresponding to each of the K prediction reference frames is the same. For ease of description, the jth prediction reference frame among the K prediction reference frames is used as an example for illustration.

In an embodiment of the present application, the current node includes at least one prediction node in the jth prediction reference frame, so that based on the coordinate information of the first point included in the at least one prediction node in the jth prediction reference frame, the second weighted point corresponding to the jth prediction reference frame is determined.

For example, the j-th prediction reference frame includes two prediction nodes of the current node, which are respectively recorded as prediction node 1 and prediction node 2, and then the geometric information of the first point included in prediction node 1 and the coordinate information of the first point included in prediction node 2 are weighted to obtain the second weighted point corresponding to the j-th prediction reference frame.

Before the decoder performs weighted processing on the geometric information of the first point included in the prediction node in the jth prediction reference frame, it is necessary to first determine the first weight corresponding to each prediction node in the jth prediction reference frame. The process of determining the first weight can refer to the description of the above embodiment and will not be repeated here.

Next, the decoding end performs weighted processing on the coordinate information of the first point included in the prediction node in the j-th prediction reference frame based on the first weight, to obtain a second weighted point corresponding to the j-th prediction reference frame.

In one example, based on the first weight, weighted averaging is performed on the coordinate information of the first point included in the prediction node in the j-th prediction reference frame to obtain a second weighted point corresponding to the j-th prediction reference frame.

The above introduces the process of determining the second weighted point corresponding to the j-th prediction reference frame among the K prediction reference frames. The second weighted points corresponding to other prediction reference frames among the K prediction reference frames can be determined by referring to the method corresponding to the j-th prediction reference frame.

After the decoding end determines the second weighted point corresponding to each of the K predicted reference frames, it executes the above step S102-A1-321-23.

The present application does not limit the specific method of determining the second context index corresponding to the i-th coordinate axis based on the second weighted points corresponding to the K prediction reference frames.

In some embodiments, the decoding end determines an average value of coordinate information of second weighted points corresponding to K predicted reference frames on the i-th coordinate axis, and determines a second context index corresponding to the i-th coordinate axis based on the average value.

In some embodiments, the above S102-A1-321-23 includes the following steps S102-A1-321-231 to S102-A1-321-233:

S102-A1-321-231, determine second weights corresponding to K prediction reference frames;

S102-B232, weighting the geometric information of the second weighted points corresponding to the K prediction reference frames based on the second weight to obtain a third weighted point;

S102-B233. Determine a second context index corresponding to the i-th coordinate axis based on the coordinate information of the third weighted point on the i-th coordinate axis.

In this embodiment, the decoding end can refer to the method of the above embodiment to determine the second weight corresponding to each of the K prediction reference frames. Then, based on the second weight, the coordinate information of the second weighted points corresponding to the K prediction reference frames is weighted to obtain a third weighted point.

For example, assuming K=2, for example, the current frame to be decoded includes 2 prediction reference frames, and these 2 prediction reference frames include the forward frame and backward frame of the current frame to be decoded. Assume that the second weight corresponding to the forward frame is W1, and the second weight corresponding to the backward frame is W2. Based on W1 and W2, the geometric information of the second weighted point corresponding to the forward frame and the geometric information of the second weighted point corresponding to the backward frame are weighted to obtain the geometric information of the third weighted point.

In one example, based on the second weight, weighted averaging is performed on the geometric information of the second weighted points respectively corresponding to the K prediction reference frames to obtain coordinate information of the third weighted point.

After determining the geometric information of the third weighted point based on the above steps, the decoding end determines the second context index corresponding to the i-th coordinate axis based on the coordinate information of the third weighted point on the i-th coordinate axis.

As can be seen from the above, the third weighted point is obtained by weighting the first point in the prediction node in each prediction reference frame, wherein the value of each bit of the first point after the prediction node has only two results, which is 0 or 1. Therefore, in some embodiments, the value of each bit of the first weighted point obtained by weighting the first point in the N prediction nodes is also 0 or 1. In this way, when the i-th bit of the current point on the i-th coordinate axis is decoded, the second context index corresponding to the i-th bit on the i-th coordinate axis is determined based on the value of the i-th bit of the third weighted point on the i-th coordinate axis. For example, if the value of the i-th bit of the third weighted point on the i-th coordinate axis is 0, the second context index corresponding to the i-th bit on the i-th coordinate axis is determined to be 0. For another example, if the value of the i-th bit of the third weighted point on the i-th coordinate axis is 1, the second context index corresponding to the i-th bit on the i-th coordinate axis is determined to be 1. Finally, the decoding end determines the context model corresponding to the i-th bit on the i-th coordinate axis based on the first context index and/or the second context index corresponding to the i-th bit on the i-th coordinate axis, and uses the context model to predict and decode the value of the i-th bit of the current point on the i-th coordinate axis.

After the decoding end determines the first context index and/or the second context index based on the above steps, it determines the context model based on the first context index and/or the second context index, and uses the context model to decode the coordinate information of the current point.

In an example, assuming that the DCM mode information of the prediction node is PredDCMode, and the number of points contained in the prediction node is PredNumPoints, and assuming that the geometric information of the first point in the prediction node is predPointPos. Assume that the decoder uses the IDCM mode of the prediction node and the geometric information of the point in the prediction node to predict and decode the geometric information of the current point, that is, the geometric decoding information of the prediction node used by the decoder includes the following two types:

1) IDCM mode of the prediction node;

2) The geometric information of the midpoint of the prediction node (i.e., the first point), that is, the bit information (0 or 1) corresponding to the accuracy of the midpoint of the prediction node.

For example, in the GPCC framework, the IDCM mode of the prediction node includes PredDCMode(0,1,2). In the AVS framework, the IDCM of the prediction node

Modes include PredDCMode(0,1).

Assuming that the number of points in the current node is numPoints, and the geometric information of each point is PointPos, and the bit precision depth to be encoded is nodeSizeLog2, the decoding process of the geometric information of each point in the current node is as follows:

Through the above decoding process, the geometric information of each point in the current node can be obtained. Wherein, ctx1 is the first context index, and ctx2 is the second context index.

The point cloud decoding method provided by the embodiment of the present application, when decoding the current node in the current decoding frame, determines N predicted nodes of the current node in the predicted reference frame of the current frame to be decoded, and predicts and decodes the coordinate information of the midpoint of the current node based on the geometric decoding information of the midpoints of these N predicted nodes. In other words, the embodiment of the present application optimizes the direct DCM decoding of the node, and predicts and decodes the geometric information of the midpoint of the IDCM node (i.e., the current node) of the to-be-decoded node by considering the correlation in the time domain between adjacent frames, and further improves the efficiency of geometric information decoding of the point cloud by considering the correlation in the time domain between adjacent frames.

The above takes the decoding end as an example to introduce in detail the point cloud decoding method provided in the embodiment of the present application. The following takes the encoding end as an example to introduce the point cloud encoding method provided in the embodiment of the present application.

Fig. 17 is a schematic diagram of a point cloud coding method according to an embodiment of the present application. The point cloud coding method according to the embodiment of the present application can be implemented by the point cloud coding device shown in Fig. 3 or Fig. 4A or Fig. 8A.

As shown in FIG. 17 , the point cloud encoding method of the embodiment of the present application includes:

S201. Determine N prediction nodes of a current node in a prediction reference frame of a current frame to be encoded.

The current node is the node to be encoded in the current frame to be encoded.

As can be seen from the above, the point cloud includes geometric information and attribute information, and the encoding of the point cloud includes geometric encoding and attribute encoding. The embodiment of the present application relates to geometric encoding of point clouds.

In some embodiments, the geometric information of the point cloud is also referred to as the position information of the point cloud, and therefore, the geometric encoding of the point cloud is also referred to as the position encoding of the point cloud.

However, the octree-based geometric information encoding mode has an efficient compression rate for correlated points in space, and for points in isolated positions in the geometric space, the use of direct encoding can greatly reduce complexity and improve encoding efficiency.

Specifically, the encoding end first determines whether the node is qualified for direct encoding. If the node is qualified for direct encoding, it determines whether the number of points of the node is less than or equal to a preset threshold. If the number of points of the node is less than or equal to the preset threshold, it determines that the node can be encoded using direct encoding. Then, the number of points included in the node and the geometric information of each point are encoded into the bitstream.

In order to solve the above problems, in an embodiment of the present application, the encoding end predictively encodes the position information of the midpoint of the current node based on the inter-frame information corresponding to the current node, thereby improving the encoding efficiency and encoding performance of the point cloud.

Specifically, the encoder first determines N prediction nodes of the current node in the prediction reference frame of the current frame to be encoded.

It should be noted that the current frame to be encoded is a point cloud frame. In some embodiments, the current frame to be encoded is also referred to as the current frame or the current point cloud frame or the current point cloud frame to be encoded. The current node can be understood as any non-leaf node in the current frame to be encoded. In other words, the current node is not a leaf node in the octree corresponding to the current frame to be encoded, that is, the current node is an arbitrary middle node of the octree, and the current node is not a non-empty node, that is, it includes at least 1 point.

In the embodiment of the present application, when encoding the current node in the current frame to be encoded, the encoder first determines the prediction reference frame of the current frame to be encoded, and determines N prediction nodes of the current node in the prediction reference frame. For example, FIG12 shows a prediction node of the current node in the prediction reference frame.

It should be noted that the embodiment of the present application does not limit the number of prediction reference frames of the current frame to be encoded, for example, the current frame to be encoded has one prediction reference frame, or the current frame to be encoded has multiple prediction reference frames. At the same time, the embodiment of the present application does not limit the number N of prediction nodes of the current node, which is determined according to actual needs.

The embodiment of the present application does not limit the specific method of determining the prediction reference frame of the current frame to be encoded.

In some embodiments, one or several encoded frames before the current frame to be encoded are determined as prediction reference frames for the current frame to be encoded.

For example, if the current frame to be encoded is a P frame, the inter-frame reference frame of the P frame includes the previous frame of the P frame (ie, the forward frame). Therefore, the previous frame of the current frame to be encoded (ie, the forward frame) can be determined as the prediction reference frame of the current frame to be encoded.

For another example, if the current frame to be encoded is a B frame, the inter-frame reference frames of the B frame include the previous frame of the P frame (i.e., the forward frame) and the next frame of the P frame (i.e., the backward frame). Therefore, the previous frame of the current frame to be encoded (i.e., the forward frame) can be determined as the predicted reference frame of the current frame to be encoded.

In some embodiments, one or several encoded frames following the current frame to be encoded are determined as prediction reference frames of the current frame to be encoded.

For example, if the current frame to be encoded is a B frame, the frame following the current frame to be encoded may be determined as a prediction reference frame of the current frame to be encoded.

In some embodiments, one or several encoded frames before the current frame to be encoded, and one or several encoded frames after the current frame to be encoded, are determined as prediction reference frames of the current frame to be encoded.

For example, if the current frame to be encoded is a B frame, the previous frame and the next frame of the current frame to be encoded may be determined as prediction reference frames of the current frame to be encoded. In this case, the current frame to be encoded has two prediction reference frames.

Taking the current frame to be encoded including K prediction reference frames as an example, the specific process of determining N prediction nodes of the current node in the prediction reference frame of the current frame to be encoded in S201-A above is introduced.

In some embodiments, the encoder selects at least one prediction reference frame from the K prediction reference frames based on the placeholder information of the node in the current frame to be encoded and the placeholder information of the node in each of the K prediction reference frames, and then searches for the prediction node of the current node in the at least one prediction reference frame. For example, at least one prediction reference frame whose placeholder information of the node is closest to the placeholder information of the node in the current frame to be encoded is selected from the K prediction reference frames, and then searches for the prediction node of the current node in the at least one prediction reference frame.

In some embodiments, the encoder may determine N prediction nodes of the current node through the following steps S201-A1 and S201-A2:

S201-A1, for a k-th prediction reference frame among K prediction reference frames, determining at least one prediction node of a current node in the k-th prediction reference frame, where k is a positive integer less than or equal to K, and K is a positive integer;

S201-A2: Determine N prediction nodes of the current node based on at least one prediction node of the current node in K prediction reference frames.

In this embodiment, the encoding end determines at least one prediction node of the current node from each of the K prediction reference frames, and finally summarizes at least one prediction node in each of the K prediction reference frames to obtain N prediction nodes of the current node.

Among them, the process of the encoding end determining at least one prediction point of the current node in each of the K prediction reference frames is the same. For the sake of ease of description, the kth prediction reference frame among the K prediction reference frames is taken as an example for explanation.

The specific process of determining at least one prediction node of the current node in the kth prediction reference frame in the above S201-A1 is introduced below.

The embodiment of the present application does not limit the specific manner in which the encoder determines at least one prediction node of the current node in the kth prediction reference frame.

Mode 2, in the above S201-A1, determining at least one prediction node of the current node in the kth prediction reference frame includes the following steps S201-A11 to S201-A13:

S201-A11, in the current frame to be encoded, determine M domain nodes of the current node, the M domain nodes include the current node, and M is a positive integer;

S201-A12, for the i-th domain node among the M domain nodes, determine the corresponding node of the i-th domain node in the k-th prediction reference frame, where i is a positive integer less than or equal to M;

S201-A13. Determine at least one prediction node of the current node in the kth prediction reference frame based on the corresponding nodes of the M domain nodes in the kth prediction reference frame.

In this implementation, before determining at least one prediction node of the current node in the kth prediction reference frame, the encoder first determines M domain nodes of the current node in the current frame to be encoded, and the M domain nodes include the current node itself.

In one example, the M domain nodes of the current node include at least one domain node among the domain nodes that are coplanar, colinear, and co-pointed with the current node in the current frame to be encoded. As shown in FIG13 , the current node includes 6 coplanar nodes, 12 colinear nodes, and 8 co-pointed nodes.

In another example, the M domain nodes of the current node may include other nodes within the reference neighborhood in addition to at least one domain node in the current frame to be encoded that is coplanar, colinear, and co-point with the current node. This embodiment of the present application does not impose any restrictions on this.

Based on the above steps, the encoding end determines the M domain nodes of the current node in the current frame to be encoded, and then determines the corresponding node of each of the M domain nodes in the kth prediction reference frame, and then determines at least one prediction node of the current node in the kth prediction reference frame based on the corresponding nodes of the M domain nodes in the kth prediction reference frame.

The embodiment of the present application does not limit the specific implementation method of S201-A13.

In another possible implementation, the encoder determines the corresponding nodes of the M domain nodes in the kth prediction reference frame as at least one prediction node of the current node in the kth prediction reference frame. For example, the M domain nodes each have a corresponding node in the kth prediction reference frame, and thus there are M corresponding nodes. The M corresponding nodes are determined as the prediction nodes of the current node in the kth prediction reference frame, and there are M prediction nodes in total.

The above describes the process of determining at least one prediction node of the current node in the kth prediction reference frame. In this way, the encoder can use the same method as above to determine at least one prediction node of the current node in each of the K prediction reference frames.

After the encoder determines at least one prediction node of the current node in each of the K prediction reference frames, it executes the above step S201-B, that is, determines N prediction nodes of the current node based on at least one prediction node of the current node in the K prediction reference frames.

In the second method, after the encoding end determines the M domain nodes of the current node in the current frame to be encoded, it determines the corresponding node of each of the M domain nodes in the kth prediction reference frame, and then determines at least one prediction point of the current node in the kth prediction reference frame based on the corresponding node of each of the M domain nodes.

Mode 3, in the above S201-A1, determining at least one prediction node of the current node in the kth prediction reference frame includes the following steps S201-B11 to S201-B13:

S201-B11, determining the corresponding node of the current node in the kth prediction reference frame;

S201-B12, determining at least one domain node of the corresponding node;

S201-B13. Determine at least one domain node as at least one prediction node of the current node in the kth prediction reference frame.

In this method 3, for each of the K prediction reference frames, the encoder first determines the corresponding node of the current node in each prediction reference frame. For example, the corresponding node 1 of the current node in the prediction reference frame 1 is determined, and the corresponding node 2 of the current node in the prediction reference frame 2 is determined. Next, the encoder determines at least one domain node of each corresponding node. For example, at least one domain node of the corresponding node 1 is determined in the prediction reference frame 1, and at least one domain node of the corresponding node 2 is determined in the prediction reference frame 2. In this way, at least one domain node of the corresponding node 1 in the prediction reference frame 1 can be determined as at least one prediction node of the current node in the prediction reference frame 1, and at least one domain node of the corresponding node 2 in the prediction reference frame 2 can be determined as at least one prediction node of the current node in the prediction reference frame 2.

The process of determining the corresponding node of the i-th domain node in the k-th prediction reference frame in S201-A12 of the second method is basically the same as the process of determining the corresponding node of the current node in the k-th prediction reference frame in S201-B11 of the above-mentioned third method. For the convenience of description, the above-mentioned i-th domain node and the current node are recorded as the i-th node, and the specific process of determining the corresponding node of the i-th node in the k-th prediction reference frame is introduced below.

The encoder determines the corresponding node of the i-th node in the k-th prediction reference frame in at least the following ways:

Mode 2: The above S201-A12 and S201-B11 include the following steps:

S201-A121, in the current frame to be encoded, determine the parent node of the i-th node as the i-th parent node;

S201-A122, determining a matching node of the i-th parent node in the k-th prediction reference frame as the i-th matching node;

S201-A123. Determine one of the child nodes of the i matching nodes as the corresponding node of the i-th node in the k-th prediction reference frame.

In the method 2, for the i-th node, the encoding end determines the parent node of the i-th node in the current frame to be encoded, and then determines the matching node of the parent node of the i-th prediction domain node in the k-th prediction reference frame. For the convenience of description, the parent node of the i-th node is recorded as the i-th parent node, and the matching node of the parent node of the i-th node in the k-th prediction reference frame is determined as the i-th matching node. Then, a child node of the child node of the i-th matching node is determined as the corresponding node of the i-th node in the k-th prediction reference frame, thereby accurately determining the corresponding node of the i-th node in the k-th prediction reference frame.

The specific process of determining the matching node of the i-th parent node in the k-th prediction reference frame in the above S201-A122 is introduced below.

The embodiment of the present application does not limit the specific manner in which the encoder determines the matching node of the i-th parent node in the k-th prediction reference frame.

In some embodiments, the division depth of the i-th parent node in the current frame to be encoded is determined, for example, the i-th parent node is in the second layer of the octree of the current frame to be encoded. In this way, the encoder can determine one of the nodes in the k-th prediction reference frame that have the same division depth as the i-th parent node as the matching node of the i-th parent node in the k-th prediction reference frame. For example, one of the nodes in the second layer in the k-th prediction reference frame is determined as the matching node of the i-th parent node in the k-th prediction reference frame.

In some embodiments, the encoder determines the matching node of the i-th parent node in the k-th prediction reference frame based on the placeholder information of the i-th parent node. Specifically, since the placeholder information of the i-th parent node in the current frame to be encoded has been encoded, and the placeholder information of each node in the k-th prediction reference frame has also been encoded, the encoder can search for the matching node of the i-th parent node in the k-th prediction reference frame based on the placeholder information of the i-th parent node.

For example, assuming that the placeholder information of the i-th parent node is 11001101, the node whose placeholder information is the smallest different from the placeholder information 11001101 is searched in the k-th prediction reference frame. Specifically, the encoder performs an XOR operation on the placeholder information of the i-th parent node and the placeholder information of each node in the k-th prediction reference frame, and determines the node with the smallest XOR operation result in the k-th prediction reference frame as the matching node of the i-th parent node in the k-th prediction reference frame.

Based on the above steps, the encoder can determine the matching node of the i-th parent node in the k-th prediction reference frame. For ease of description, the matching node is recorded as the i-th matching node.

Next, the encoder determines one of the child nodes of the i-th matching node as the corresponding node of the i-th domain node in the k-th prediction reference frame.

For example, the encoder determines a default child node among the child nodes included in the i-th matching node as the corresponding node of the i-th node in the k-th prediction reference frame. Assume that the first child node of the i-th matching node is determined as the corresponding node of the i-th node in the k-th prediction reference frame.

For another example, the encoder determines the first sequence number of the i-th node in the child nodes included in the parent node; the child node with the first sequence number in the child nodes of the i-th matching node is determined as the corresponding node of the i-th node in the k-th prediction reference frame. Exemplarily, as shown in FIG14, the i-th node is the second child node of the i-th parent node, and the first sequence number is 2. In this way, the second child node of the i-th matching node can be determined as the corresponding node of the i-th node.

The above describes the process of determining the corresponding node of the i-th domain node in the M domain nodes in the k-th prediction reference frame, and determining the corresponding node of the current node in the k-th prediction reference frame. In this way, the encoder can use the second or third method to determine the N prediction nodes of the current node in the prediction reference frame.

Based on the above steps, the encoder determines N prediction nodes of the current node in the prediction reference frame of the current frame to be encoded, and then performs the following step S202.

S202. Based on the geometric coding information of the midpoints of the N predicted nodes, predictive coding is performed on the position information of the midpoint of the current node.

Due to the correlation between adjacent frames of the point cloud, the embodiment of the present application refers to the relevant information between frames when predicting the position information of the midpoint of the current node based on the correlation between adjacent frames of the point cloud. Specifically, the WeChat information of the midpoint of the current node is predictively encoded based on the geometric encoding information of the midpoints of the N predicted nodes of the current node, thereby improving the encoding efficiency and encoding performance of the point cloud.

It should be noted that, in the embodiment of the present application, the coordinate information of the midpoint of the current node is predictively encoded based on the geometric encoding information of the N prediction nodes, which can be understood as using the geometric encoding information of the N prediction nodes as context to predictively encode the coordinate information of the midpoint of the current node. For example, the encoding end determines the index of the context model based on the geometric encoding information of the N prediction nodes, and then determines the target context model from the preset multiple context models based on the index of the context model, and uses the context model to predictively encode the coordinate information of the midpoint of the current node.

In an embodiment of the present application, based on the geometric coding information of N prediction nodes, the process of predictive coding the coordinate information of each point in the current node is basically the same. For the sake of ease of description, the predictive coding of the coordinate information of the current point in the current node is taken as an example to illustrate.

In some embodiments, the above S202 includes the following steps:

S202-A, determining the index of the context model based on the geometric coding information of the N prediction nodes;

S202-B, determining a context model based on an index of the context model;

S202-C. Use the context model to predictively encode the coordinate information of the current point in the current node.

In the embodiment of the present application, multiple context models are set for the encoding process of the coordinate information, for example, Q context models. The embodiment of the present application does not limit the specific number of context models corresponding to the coordinate information, as long as Q is greater than 1. That is to say, in the embodiment of the present application, at least one optimal context model is selected from two context models, and the coordinate information of the current point in the current node is predictively encoded to improve the encoding efficiency of the coordinate information of the current point.

In the embodiment of the present application, the geometric coding information of the prediction node can be understood as any information involved in the geometric coding process of the prediction node, such as the number of points included in the prediction node, the placeholder information of the prediction node, the coding method of the prediction node, the geometric information of the midpoint of the prediction node, etc.

In some embodiments, the geometric coding information of the prediction node includes direct coding information of the prediction node and/or coordinate information of a midpoint of the prediction node, wherein the direct coding information of the prediction node is used to indicate whether the prediction node meets the conditions for encoding in a direct coding manner.

Based on this, the above S202-A includes the following step S202-A1:

S202-A1. Determine a first context index based on direct encoding information of the N prediction nodes, and/or determine a second context index based on coordinate information of midpoints of the N prediction nodes.

Correspondingly, the above S202-B includes the following steps S202-B:

S202-B1. Select a context model from a plurality of preset context models based on the first context index and/or the second context index.

In this embodiment, if the geometric coding information of the prediction node includes the direct coding information of the prediction node and/or the coordinate information of the midpoint of the prediction node, the encoding end can determine the first context index based on the direct coding information of the N prediction nodes, and/or determine the second context index based on the position information of the midpoint of the N prediction nodes, and then select the final context model from the preset multiple context models based on the first context index and/or the second context index.

It can be seen that in this embodiment, the encoding end determines the context model in the following ways, including but not limited to:

In one possible implementation, if the geometric coding information of the prediction node includes direct coding information of the prediction node, the process of determining the context model may be to determine a first context index based on the direct coding information of N prediction nodes, and then based on the first context index, select a final context model from a plurality of preset context models to encode the coordinate information of the current point.

For example, the encoder selects the final context model from the context models shown in Table 4 based on the first context index.

In another possible implementation, if the geometric coding information of the prediction node includes the coordinate information of the midpoint of the prediction node, the process of determining the context model may be to determine the second context index based on the coordinate information of the midpoints of N prediction nodes, and then based on the second context index, select the final context model from the preset multiple context models to encode the coordinate information of the current point.

For example, the encoder selects the final context model from the context models shown in Table 4 based on the second context index.

In another possible implementation, if the geometric coding information of the prediction node includes the direct coding information of the prediction node and the coordinate information of the midpoint of the prediction node, the process of determining the context model may be to determine a first context index based on the direct coding information of N prediction nodes, and then determine a second context index based on the first context index and the coordinate information of the midpoint of the N prediction nodes, and then select a final context model from a plurality of preset context models based on the second context index, the first context index and the second context index to encode the coordinate information of the current point.

Exemplarily, the correspondence between the first context index, the second context index and the context model is shown in Table 5.

In this mode 3, the encoder determines the first context index based on the direct coding information of the N prediction nodes, and determines the second context index based on the coordinate information of the midpoints of the N prediction nodes, and then checks the above table 5 to obtain the final context model. For example, the encoder determines that the first context index is the first context index 2 based on the direct coding information of the N prediction nodes, and determines that the second context index is the second context index 3 based on the coordinate information of the midpoints of the N prediction nodes. By checking Table 5, the final context model can be obtained as context model 23, and then the encoder uses context model 23 to encode the coordinate information of the current point.

The specific process of determining the first context index based on the direct encoding information of the N prediction nodes in S202-A1 is introduced below.

In the embodiment of the present application, the encoding end determines the first context index in the following ways, but is not limited to:

Method 1: the above S202-A1 includes the following steps S202-A1-11 and S202-A1-12:

S202-A1-11. For any prediction node among the N prediction nodes, determine a first numerical value corresponding to the prediction node based on direct encoding information of the prediction node.

In this manner, for each of the N prediction nodes, a first numerical value corresponding to the prediction node is determined based on direct encoding information of the prediction node, and finally a first context index is determined based on the first numerical values corresponding to the N prediction nodes.

As can be seen from the above, the direct coding information of the prediction node is used to indicate whether the prediction node meets the conditions for encoding in the direct coding manner. The embodiment of the present application does not limit the specific content of the direct coding information.

In some embodiments, the direct encoding information includes the number of points included in the prediction node. In this way, the first value corresponding to the prediction node can be determined based on the number of points included in the prediction node.

In some embodiments, the direct encoding information of the prediction node includes the direct encoding mode of the prediction node. In this case, the above S202-A1-11 includes: numbering the direct encoding mode of the prediction node to determine the first value corresponding to the prediction node.

For example, in the GPCC framework, if the direct encoding mode of the prediction node is mode 0, the first value corresponding to the prediction node is determined to be 0. If the direct encoding mode of the prediction node is mode 1, the first value corresponding to the prediction node is determined to be 1. If the direct encoding mode of the prediction node is mode 2, the first value corresponding to the prediction node is determined to be 2.

Based on the above steps, after the encoder determines the first value corresponding to each of the N prediction nodes, it executes the following step S202-A1-12.

S202-A1-12. Determine a first context index based on first numerical values corresponding to the N prediction nodes.

After determining the first values corresponding to the N prediction nodes based on the above steps, the encoder determines a first context index based on the first values corresponding to the N prediction nodes.

Mode 2, S202-A1-12 includes the following steps S202-A1-121 to S202-A1-123:

S202-A1-121, determine a first weight corresponding to the prediction node;

S202-A1-122, based on the first weight, weighting the first values corresponding to the N prediction nodes to obtain a first weighted prediction value;

S202-A1-123. Determine a first context index based on the first weighted prediction value.

In method 2, if the current node includes multiple prediction nodes, i.e., N prediction nodes, when determining the first context index based on the first numerical values corresponding to the N prediction nodes, a weight, i.e., the first weight, can be determined for each of the N prediction nodes. In this way, the first numerical values corresponding to each prediction node can be weighted based on the first weight of each prediction node, and then the first context index can be determined based on the final weighted result, thereby improving the accuracy of determining the first context index based on the geometric coding information of the N prediction nodes.

After determining the first weighted prediction value based on the method steps, the first context index is determined based on the first weighted prediction value, that is, the above S202-A1-123 includes at least the following examples:

The above is an introduction to the process in which the encoder performs weighted processing on N prediction nodes to obtain the first context index.

In some embodiments, the encoding end may also adopt the following method 2 to determine the first context index.

Method 2: if K is greater than 1, determine the second weighted prediction value corresponding to each of the K prediction reference frames, and then determine the first context index based on the second weighted prediction values corresponding to the K prediction reference frames. At this time, the above S202-A1 includes the following steps S202-A1-21 to S202-A1-23:

S202-A1-21. For a j-th prediction reference frame among the K prediction reference frames, determine a first value corresponding to the prediction node in the j-th prediction reference frame based on direct encoding information of the prediction node of the current node in the j-th prediction reference frame, where j is a positive integer less than or equal to K.

S202-A1-22, determining a first weight corresponding to the prediction node, and performing weighted processing on the first value corresponding to the prediction node in the j-th prediction reference frame based on the first weight, to obtain a second weighted prediction value corresponding to the j-th prediction reference frame;

S202-A1-23. Determine a first context index based on the second weighted prediction values corresponding to the K prediction reference frames.

In the second method, when determining the first context index, each of the K prediction reference frames is considered as separate context information. Specifically, the direct coding information of the prediction node included in each of the K prediction reference frames is determined, and the second weighted prediction value corresponding to each prediction reference frame is determined, and then based on the second weighted prediction value corresponding to each prediction reference frame, the first context index is determined to achieve accurate selection of the first context index, thereby improving the coding efficiency of the point cloud.

In the embodiment of the present application, the specific method in which the encoding end determines the second weighted prediction value corresponding to each of the K prediction reference frames is the same. For the sake of ease of description, the jth prediction reference frame among the K prediction reference frames is used as an example for illustration.

In an embodiment of the present application, the current node includes at least one prediction node in the jth prediction reference frame, so that the first value of the at least one prediction node is determined based on the direct encoding information of the at least one prediction node in the jth prediction reference frame.

For example, the j-th prediction reference frame includes two prediction nodes of the current node, which are respectively recorded as prediction node 1 and prediction node 2, and then the first value of prediction node 1 is determined based on the direct coding information of prediction node 1, and the first value of prediction node 2 is determined based on the direct coding information of prediction node 2. The process of determining the first value corresponding to the prediction node based on the direct coding information of the prediction node can refer to the description of the above embodiment, and the first value corresponding to the prediction node is determined based on the direct coding mode of the prediction node, for example, the number (0, 1 or 2) of the direct coding mode of the prediction node is determined as the first value corresponding to the prediction node.

After the encoding end determines the first value of at least one prediction node included in the j-th prediction reference frame, it determines the first weight corresponding to the at least one prediction node, and weightedly processes the first value corresponding to the at least one prediction node based on the first weight to obtain a second weighted prediction value corresponding to the j-th prediction reference frame.

After the encoding end determines the second weighted prediction value corresponding to each of the K prediction reference frames, it executes the above step S202-A1-23.

In some embodiments, the encoding end determines an average value of the second weighted prediction values corresponding to the K prediction reference frames as the first context index.

In some embodiments, the encoding end determines second weights corresponding to K prediction reference frames, and performs weighted processing on second weighted prediction values corresponding to the K prediction reference frames based on the second weights to obtain a first context index.

In this embodiment, the encoding end first determines the second weight corresponding to each of the K prediction reference frames. The embodiment of the present application does not limit the determination of the second weight corresponding to each of the K prediction reference frames.

In some embodiments, the second weight corresponding to each of the K prediction reference frames is a preset value. As can be seen from the above, the K prediction reference frames are forward frames and/or backward frames of the current frame to be encoded. Assuming that prediction reference frame 1 is the forward frame of the current frame to be encoded, the second weight corresponding to prediction reference frame 1 is the preset weight 1. If prediction reference frame 1 is the backward frame of the current frame to be encoded, the second weight corresponding to prediction reference frame 1 is the preset weight 2.

In some embodiments, based on the time difference between the predicted reference frame and the current frame to be encoded, the second weight corresponding to the predicted reference frame is determined. In an embodiment of the present application, each point cloud includes time information, and the time information may be the time when the point cloud acquisition device acquires the point cloud of the frame. Based on this, if the time difference between the predicted reference frame and the current frame to be encoded is smaller, the inter-frame correlation between the predicted reference frame and the current frame to be encoded is stronger, and thus the second weight corresponding to the predicted reference frame is larger. For example, the inverse of the time difference between the predicted reference frame and the current frame to be encoded can be determined as the second weight corresponding to the predicted reference frame.

For example, assuming K=2, for example, the current frame to be encoded includes 2 prediction reference frames, and these 2 prediction reference frames include the forward frame and the backward frame of the current frame to be encoded. Assuming that the second weight corresponding to the forward frame is W1, and the second weight corresponding to the backward frame is W2, based on W1 and W2, the second weighted prediction value corresponding to the forward frame and the second weighted prediction value corresponding to the backward frame are weighted to obtain the first context index.

The above describes the process of determining the first context index at the encoding end.

The following introduces the process of determining the second context index at the encoding end.

In some embodiments, if the prediction node includes multiple points, a point is selected from the multiple points to determine the second context index. At this time, the above S202-A1 includes the following steps S202-A1-31 and S202-A1-32:

S202-A1-31, for any prediction node among the N prediction nodes, select a first point corresponding to the current point of the current node from the points included in the prediction node;

S202-A1-32. Determine a second context index based on the coordinate information of the first point included in the N prediction nodes.

In one possible implementation, if the encoder selects the first point corresponding to the current point from the points included in the prediction node based on the rate-distortion cost (or approximate cost), the encoder writes the identification information of the first point in the prediction node into the bitstream, so that the encoder obtains the first point in the prediction node by encoding the bitstream.

For each of the N prediction nodes, the encoder determines the first point corresponding to the current point in each prediction node based on the above method, and then executes the above step S202-A1-32.

In the embodiment of the present application, the encoding end encodes the coordinate information of the current point on different coordinate axes respectively. Based on, the above S202-A1-32 includes the following step S202-A1-321:

S202-A1-321. Determine a second context index corresponding to the i-th coordinate axis based on the coordinate information of the first point included in the N prediction nodes on the i-th coordinate axis.

In an embodiment of the present application, when the encoding end encodes the coordinate information of the current point on the i-th coordinate axis, the second context index corresponding to the i-th coordinate axis is determined based on the coordinate information of the first point on the i-th coordinate axis included in the N prediction nodes. In this way, the context model corresponding to the i-th coordinate axis can be selected from multiple context models based on the first context index and/or the second context index corresponding to the i-th coordinate axis, and then the context model corresponding to the i-th coordinate axis is used to predict the coordinate information of the current point on the i-th coordinate axis. For example, the encoding end determines the second context index corresponding to the X-axis based on the coordinate information of the first point on the X-axis included in the N prediction nodes, and selects the context model corresponding to the X-axis from multiple context models based on the first context index and/or the second context index corresponding to the X-axis, and then uses the context model corresponding to the X-axis to predict the coordinate information of the current point on the X-axis to obtain the X coordinate value of the current point. For another example, the encoding end determines the second context index corresponding to the Y-axis based on the coordinate information of the first point on the Y-axis included in the N prediction nodes, and based on the first context index and/or the second context index corresponding to the Y-axis, selects the context model corresponding to the Y-axis from multiple context models, and then uses the context model corresponding to the Y-axis to predict and encode the coordinate information of the current point on the Y-axis to obtain the Y coordinate value of the current point.

The following introduces a process in which the encoder determines the second context index corresponding to the i-th coordinate axis based on the coordinate information of the first point included in the N prediction nodes on the i-th coordinate axis.

In the embodiment of the present application, the implementation methods of the above S202-A1-321 include but are not limited to the following:

Method 1: weight the first points included in the N prediction nodes, and determine the second context index corresponding to the i-th coordinate axis based on the weighted coordinate information. At this time, the above S202-A1-321 includes the following steps S202-A1-321-11 to S202-A1-321-13:

S202-A1-321-11. Determine a first weight corresponding to the prediction node;

S202-A1-321-12, based on the first weight, weighting the coordinate information of the first point included in the N prediction nodes to obtain a first weighted point;

S202-A1-321-13. Determine a second context index corresponding to the i-th coordinate axis based on the coordinate information of the first weighted point on the i-th coordinate axis.

In the first mode, if the current node includes multiple prediction nodes, that is, N prediction nodes, when determining the second context index corresponding to the i-th coordinate axis based on the coordinate information of the first point included in the N prediction nodes, a weight, that is, the first weight, can be determined for each of the N prediction nodes. In this way, based on the first weight of each prediction node, the coordinate information of the first point included in each prediction node can be weighted to obtain a first weighted point, and then the second context index corresponding to the i-th coordinate axis can be determined based on the coordinate information of the first weighted point on the i-th coordinate axis, thereby improving the accuracy of encoding the current point based on the geometric encoding information of the N prediction nodes.

After the encoder determines the first weight corresponding to each of the N prediction nodes, the encoder performs weighted processing on the coordinate information of the first point included in the N prediction nodes based on the first weight to obtain a first weighted point.

As can be seen from the above, the first weighted point is obtained by weighting the first point in the N prediction nodes, wherein the values of each bit of the first point after the prediction node have only two results, which are 0 or 1. Therefore, in some embodiments, the values of each bit of the first weighted point obtained by weighting the first point in the N prediction nodes are also 0 or 1. In this way, when the i-th bit of the current point on the i-th coordinate axis is encoded, the second context index corresponding to the i-th bit on the i-th coordinate axis is determined based on the value of the i-th bit of the first weighted point on the i-th coordinate axis. For example, if the value of the i-th bit of the first weighted point on the i-th coordinate axis is 0, the second context index corresponding to the i-th bit on the i-th coordinate axis is determined to be 0. For another example, if the value of the i-th bit of the first weighted point on the i-th coordinate axis is 1, the second context index corresponding to the i-th bit on the i-th coordinate axis is determined to be 1. Finally, the encoding end determines the context model corresponding to the i-th bit on the i-th coordinate axis based on the first context index and/or the second context index corresponding to the i-th bit on the i-th coordinate axis, and uses the context model to predict the value of the i-th bit of the current point on the i-th coordinate axis.

In addition to determining the second context index based on the above-mentioned method 1, the encoder may also determine the second context index through the following method 2.

Method 2: if K is greater than 1, weight the first point included in the prediction node in each of the K prediction reference frames, and determine the second context index corresponding to the i-th coordinate axis based on the weighted coordinate information. At this time, the above S202-B includes the following steps S202-B21 to S202-B23:

S202-A1-321-21. For a j-th prediction reference frame among the K prediction reference frames, determine a first weight corresponding to a prediction node in the j-th prediction reference frame;

S202-A1-321-22, based on the first weight, weighting the coordinate information of the first point included in the prediction node in the j-th prediction reference frame to obtain a second weighted point corresponding to the j-th prediction reference frame, where j is a positive integer less than or equal to K;

S202-A1-321-23. Determine a second context index corresponding to the i-th coordinate axis based on the second weighted points corresponding to the K predicted reference frames.

In the second method, when determining the geometric information of the current point, each of the K prediction reference frames is considered separately. Specifically, the coordinate information of the first point in the prediction node of each prediction reference frame in the K prediction reference frames is determined, and the second weighted point corresponding to each prediction reference frame is determined, and then based on the coordinate information of the second weighted point corresponding to each prediction reference frame, the second context index corresponding to the i-th coordinate axis is determined, and the second context index is accurately predicted, thereby improving the coding efficiency of the point cloud.

In the embodiment of the present application, the specific method in which the encoding end determines the second weighted point corresponding to each of the K prediction reference frames is the same. For ease of description, the jth prediction reference frame among the K prediction reference frames is used as an example for illustration.

Before the encoder performs weighted processing on the geometric information of the first point included in the prediction node in the jth prediction reference frame, it is necessary to first determine the first weight corresponding to each prediction node in the jth prediction reference frame. The process of determining the first weight can refer to the description of the above embodiment and will not be repeated here.

Next, the encoder performs weighted processing on the coordinate information of the first point included in the prediction node in the j-th prediction reference frame based on the first weight to obtain a second weighted point corresponding to the j-th prediction reference frame.

After the encoding end determines the second weighted point corresponding to each of the K predicted reference frames, it executes the above-mentioned step S202-A1-321-23.

In some embodiments, the encoding end determines an average value of coordinate information of second weighted points corresponding to K predicted reference frames on the i-th coordinate axis, and determines a second context index corresponding to the i-th coordinate axis based on the average value.

In some embodiments, the above S202-A1-321-23 includes the following steps S202-A1-321-231 to S202-A1-321-233:

S202-A1-321-231, determine second weights corresponding to K prediction reference frames;

S202-B232, weighting the geometric information of the second weighted points corresponding to the K prediction reference frames based on the second weight to obtain a third weighted point;

S202-B233. Determine a second context index corresponding to the i-th coordinate axis based on the coordinate information of the third weighted point on the i-th coordinate axis.

In this embodiment, the encoding end may refer to the method of the above embodiment to determine the second weight corresponding to each of the K prediction reference frames. Then, based on the second weight, weighted processing is performed on the coordinate information of the second weighted points corresponding to the K prediction reference frames to obtain a third weighted point.

For example, assuming K=2, for example, the current frame to be encoded includes 2 prediction reference frames, and these 2 prediction reference frames include the forward frame and backward frame of the current frame to be encoded. Assuming that the second weight corresponding to the forward frame is W1, and the second weight corresponding to the backward frame is W2, based on W1 and W2, the geometric information of the second weighted point corresponding to the forward frame and the geometric information of the second weighted point corresponding to the backward frame are weighted to obtain the geometric information of the third weighted point.

After determining the geometric information of the third weighted point based on the above steps, the encoder determines the second context index corresponding to the i-th coordinate axis based on the coordinate information of the third weighted point on the i-th coordinate axis.

As can be seen from the above, the third weighted point is obtained by weighting the first point in the prediction node in each prediction reference frame, wherein the value of each bit of the first point after the prediction node has only two results, which is 0 or 1. Therefore, in some embodiments, the value of each bit of the first weighted point obtained by weighting the first point in the N prediction nodes is also 0 or 1. In this way, when the i-th bit of the current point on the i-th coordinate axis is encoded, the second context index corresponding to the i-th bit on the i-th coordinate axis is determined based on the value of the i-th bit of the third weighted point on the i-th coordinate axis. For example, if the value of the i-th bit of the third weighted point on the i-th coordinate axis is 0, the second context index corresponding to the i-th bit on the i-th coordinate axis is determined to be 0. For another example, if the value of the i-th bit of the third weighted point on the i-th coordinate axis is 1, the second context index corresponding to the i-th bit on the i-th coordinate axis is determined to be 1. Finally, the encoding end determines the context model corresponding to the i-th bit on the i-th coordinate axis based on the first context index and/or the second context index corresponding to the i-th bit on the i-th coordinate axis, and uses the context model to predict the value of the i-th bit of the current point on the i-th coordinate axis.

After the encoding end determines the first context index and/or the second context index based on the above steps, it determines the context model based on the first context index and/or the second context index, and uses the context model to encode the coordinate information of the current point.

In an example, assuming that the DCM mode information of the prediction node is PredDCMode, and the number of points contained in the prediction node is PredNumPoints, and assuming that the geometric information of the first point in the prediction node is predPointPos. Assume that the encoder uses the IDCM mode of the prediction node and the geometric information of the point in the prediction node to predict the geometric information of the current point, that is, the geometric coding information of the prediction node used by the encoder includes the following two types:

1) IDCM mode of the prediction node;

2) The geometric information of the midpoint of the prediction node (i.e., the first point), that is, the bit information (0 or 1) corresponding to the precision of the midpoint of the prediction node.

For example, in the GPCC framework, the IDCM mode of the prediction node includes PredDCMode(0,1,2). In the AVS framework, the IDCM mode of the prediction node includes PredDCMode(0,1).

Assuming that the number of points in the current node is numPoints, and the geometric information of each point is PointPos, and the bit precision depth to be encoded is nodeSizeLog2, the geometric information encoding process of each point in the current node is as follows:

Through the above encoding process, the geometric information of each point in the current node can be obtained. Among them, ctx1 is the first context index, and ctx2 is the second context index.

The point cloud encoding method provided by the embodiment of the present application, when encoding the current node in the current encoding frame, determines N predicted nodes of the current node in the predicted reference frame of the current frame to be encoded, and predictively encodes the coordinate information of the midpoint of the current node based on the geometric encoding information of the midpoints of these N predicted nodes. In other words, the embodiment of the present application optimizes the direct DCM encoding of the node, and predictively encodes the geometric information of the midpoint of the IDCM node (i.e., the current node) of the to-be-encoded node by considering the correlation in the time domain between adjacent frames, using the geometric information of the predicted node in the predicted reference frame, and further improves the efficiency of the geometric information encoding of the point cloud by considering the correlation in the time domain between adjacent frames.

It should be understood that Figures 10 to 17 are merely examples of the present application and should not be construed as limitations to the present application.

The preferred embodiments of the present application are described in detail above in conjunction with the accompanying drawings. However, the present application is not limited to the specific details in the above embodiments. Within the technical concept of the present application, the technical solution of the present application can be subjected to a variety of simple modifications, and these simple modifications all belong to the protection scope of the present application. For example, the various specific technical features described in the above specific embodiments can be combined in any suitable manner without contradiction. In order to avoid unnecessary repetition, the present application will not further explain various possible combinations. For another example, the various different embodiments of the present application can also be arbitrarily combined, as long as they do not violate the ideas of the present application, they should also be regarded as the contents disclosed in the present application.

It should also be understood that in the various method embodiments of the present application, the size of the sequence number of each process does not mean the order of execution, and the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiment of the present application. In addition, in the embodiment of the present application, the term "and/or" is merely a description of the association relationship of associated objects, indicating that three relationships may exist. Specifically, A and/or B can represent: A exists alone, A and B exist at the same time, and B exists alone. In addition, the character "/" in this article generally indicates that the objects associated before and after are in an "or" relationship.

The above text, in combination with Figures 10 to 17 , describes in detail a method embodiment of the present application. The following text, in combination with Figures 18 to 19 , describes in detail a device embodiment of the present application.

Figure 18 is a schematic block diagram of the point cloud decoding device provided in an embodiment of the present application.

As shown in FIG. 18 , the point cloud decoding device 10 may include:

A determination unit 11 is used to determine N prediction nodes of a current node in a prediction reference frame of a current frame to be decoded, wherein the current node is a node to be decoded in the current frame to be decoded, and N is a positive integer;

The decoding unit 12 is used to predict and decode the coordinate information of the midpoint of the current node based on the geometric decoding information of the N predicted nodes.

In some embodiments, the current frame to be decoded includes K prediction reference frames, and the determination unit 11 is specifically used to determine at least one prediction node of the current node in the kth prediction reference frame among the K prediction reference frames, where k is a positive integer less than or equal to K, and K is a positive integer; based on at least one prediction node of the current node in the K prediction reference frames, determine N prediction nodes of the current node.

In some embodiments, the determination unit 11 is specifically used to determine M domain nodes of the current node in the current frame to be decoded, where the M domain nodes include the current node, and M is a positive integer; for the i-th domain node among the M domain nodes, determine the corresponding node of the i-th domain node in the k-th prediction reference frame, where i is a positive integer less than or equal to M; based on the corresponding nodes of the M domain nodes in the k-th prediction reference frame, determine at least one prediction node of the current node in the k-th prediction reference frame.

In some embodiments, the determination unit 11 is specifically used to determine the corresponding node of the current node in the kth prediction reference frame; determine at least one domain node of the corresponding node; and determine the at least one domain node as at least one prediction node of the current node in the kth prediction reference frame.

In some embodiments, the determination unit 11 is specifically used to determine the parent node of the i-th node in the current frame to be decoded, as the i-th parent node, the i-th node being the i-th domain node or the current node; determine the matching node of the i-th parent node in the k-th prediction reference frame as the i-th matching node; and determine one of the child nodes of the i-matching node as the corresponding node of the i-th node in the k-th prediction reference frame.

In some embodiments, the determination unit 11 is specifically configured to determine a matching node of the i-th parent node in the k-th prediction reference frame based on the placeholder information of the i-th parent node.

In some embodiments, the determination unit 11 is specifically used to determine the node whose occupancy information in the kth prediction reference frame has the smallest difference with the occupancy information of the i-th parent node as the matching node of the i-th parent node in the k-th prediction reference frame.

In some embodiments, the determination unit 11 is specifically used to determine the first serial number of the i-th node among the child nodes included in the parent node; and determine the child node with the first serial number among the child nodes of the i-th matching node as the corresponding node of the i-th node in the k-th prediction reference frame.

In some embodiments, the determination unit 11 is specifically configured to determine corresponding nodes of the M domain nodes in the k-th prediction reference frame as at least one prediction node of the current node in the k-th prediction reference frame.

In some embodiments, the determination unit 11 is specifically configured to determine at least one prediction node of the current node in the K prediction reference frames as N prediction nodes of the current node.

In some embodiments, if the current frame to be decoded is a P frame, the K prediction reference frames include a forward frame of the current frame to be decoded.

In some embodiments, if the current frame to be decoded is a B frame, the K prediction reference frames include a forward frame and a backward frame of the current frame to be decoded.

In some embodiments, the prediction unit 12 is specifically used to determine the index of the context model based on the geometric decoding information of the N prediction nodes; determine the context model based on the index of the context model; and use the context model to predict and decode the coordinate information of the current point in the current node.

In some embodiments, the geometric decoding information of the prediction node includes direct decoding information of the prediction node and/or position information of the midpoint of the prediction node, and the direct decoding information is used to indicate whether the prediction node meets the conditions for decoding by direct decoding. The prediction unit 12 is specifically used to determine a first context index based on the direct decoding information of the N prediction nodes, and/or determine a second context index based on the coordinate information of the midpoint of the N prediction nodes; based on the first context index and/or the second context index, select the context model from a plurality of preset context models.

In some embodiments, the prediction unit 12 is specifically used to determine, for any prediction node among the N prediction nodes, a first numerical value corresponding to the prediction node based on direct decoding information of the prediction node; and determine the first context index based on the first numerical values corresponding to the N prediction nodes.

In some embodiments, the prediction unit 12 is specifically configured to determine the first value corresponding to the prediction node by using the number of the direct decoding mode of the prediction node.

In some embodiments, the prediction unit 12 is specifically used to determine a first weight corresponding to the prediction node; based on the first weight, weighted processing is performed on the first numerical values corresponding to the N prediction nodes to obtain a first weighted prediction value; based on the first weighted prediction value, the first context index is determined.

In some embodiments, the prediction unit 12 is specifically used to determine, for the j-th prediction reference frame among the K prediction reference frames, a first numerical value corresponding to the prediction node in the j-th prediction reference frame based on direct decoding information of the prediction node of the current node in the j-th prediction reference frame, where j is a positive integer less than or equal to K; determine a first weight corresponding to the prediction node, and weightedly process the first numerical value corresponding to the prediction node in the j-th prediction reference frame based on the first weight to obtain a second weighted prediction value corresponding to the j-th prediction reference frame; and determine the first context index based on the second weighted prediction values corresponding to the K prediction reference frames.

In some embodiments, the prediction unit 12 is specifically used to determine second weights corresponding to the K prediction reference frames; based on the second weights, weighted processing is performed on second weighted prediction values corresponding to the K prediction reference frames respectively to obtain the first context index.

In some embodiments, the prediction unit 12 is specifically used to select, for any prediction node among the N prediction nodes, a first point corresponding to the current point in the current node from the points included in the prediction node; and determine the second context index based on the coordinate information of the first point included in the N prediction nodes.

In some embodiments, the prediction unit 12 is specifically used to determine the second context index corresponding to the i-th coordinate axis based on the coordinate information of the first point included in the N prediction nodes on the i-th coordinate axis, where the i-th coordinate axis is the X-coordinate axis, the Y-coordinate axis or the Z-coordinate axis; based on the first context index and/or the second context index corresponding to the i-th coordinate axis, select the context model corresponding to the i-th coordinate axis from the multiple context models; and use the context model corresponding to the i-th coordinate axis to predict and decode the coordinate information of the current point on the i-th coordinate axis.

In some embodiments, the prediction unit 12 is specifically used to determine a first weight corresponding to the prediction node; based on the first weight, weighted processing is performed on the coordinate information of the first point included in the N prediction nodes to obtain a first weighted point; based on the coordinate information of the first weighted point on the i-th coordinate axis, the second context index corresponding to the i-th coordinate axis is determined.

In some embodiments, if K is greater than 1, the prediction unit 12 is specifically used to determine a first weight corresponding to a prediction node in the j-th prediction reference frame based on the j-th prediction reference frame among the K prediction reference frames; based on the first weight, weighted processing is performed on the coordinate information of the first point included in the prediction node in the j-th prediction reference frame to obtain a second weighted point corresponding to the j-th prediction reference frame, where j is a positive integer less than or equal to K; based on the second weighted points corresponding to the K prediction reference frames, a second context index corresponding to the i-th coordinate axis is determined.

In some embodiments, the prediction unit 12 is specifically used to determine the second weights corresponding to the K prediction reference frames; based on the second weights, weighted processing is performed on the coordinate information of the second weighted points corresponding to the K prediction reference frames to obtain a third weighted point; based on the coordinate information of the third weighted point on the i-th coordinate axis, determine the second context index corresponding to the i-th coordinate axis.

In some embodiments, the prediction unit 12 is specifically configured to determine a first weight corresponding to the prediction node based on a distance between a domain node corresponding to the prediction node and the current node.

In some embodiments, the prediction unit 12 is specifically used to determine the second weight corresponding to the predicted reference frame based on the time difference between the predicted reference frame and the current frame to be decoded.

It should be understood that the device embodiment and the method embodiment may correspond to each other, and similar descriptions may refer to the method embodiment. To avoid repetition, no further description is given here. Specifically, the point cloud decoding device 10 shown in FIG. 18 may correspond to the corresponding subject in the point cloud decoding method of the embodiment of the present application, and the aforementioned and other operations and/or functions of each unit in the point cloud decoding device 10 are respectively for implementing the corresponding processes in the point cloud decoding method, and for the sake of brevity, no further description is given here.

Figure 19 is a schematic block diagram of the point cloud encoding device provided in an embodiment of the present application.

As shown in FIG. 19 , the point cloud encoding device 20 includes:

The determination unit 21 is specifically configured to determine N prediction nodes of a current node in a prediction reference frame of a current frame to be encoded, wherein the current node is a node to be encoded in the current frame to be encoded, and N is a positive integer;

The encoding unit 22 is used to perform predictive encoding on the coordinate information of the midpoint of the current node based on the geometric encoding information of the N predicted nodes.

In some embodiments, the current frame to be encoded includes K prediction reference frames, and the determination unit 21 is specifically used to determine at least one prediction node of the current node in the kth prediction reference frame among the K prediction reference frames, where k is a positive integer less than or equal to K, and K is a positive integer; based on at least one prediction node of the current node in the K prediction reference frames, determine N prediction nodes of the current node.

In some embodiments, the determination unit 21 is specifically used to determine M domain nodes of the current node in the current frame to be encoded, where the M domain nodes include the current node, and M is a positive integer; for the i-th domain node among the M domain nodes, determine the corresponding node of the i-th domain node in the k-th prediction reference frame, where i is a positive integer less than or equal to M; based on the corresponding nodes of the M domain nodes in the k-th prediction reference frame, determine at least one prediction node of the current node in the k-th prediction reference frame.

In some embodiments, the determination unit 21 is specifically used to determine the corresponding node of the current node in the kth prediction reference frame; determine at least one domain node of the corresponding node; and determine the at least one domain node as at least one prediction node of the current node in the kth prediction reference frame.

In some embodiments, the determination unit 21 is specifically used to determine the parent node of the i-th node in the current frame to be encoded, as the i-th parent node, the i-th node being the i-th domain node or the current node; determine the matching node of the i-th parent node in the k-th prediction reference frame as the i-th matching node; and determine one of the child nodes of the i-matching node as the corresponding node of the i-th node in the k-th prediction reference frame.

In some embodiments, the determination unit 21 is specifically configured to determine a matching node of the i-th parent node in the k-th prediction reference frame based on the placeholder information of the i-th parent node.

In some embodiments, the determination unit 21 is specifically configured to determine the node whose placeholder information in the kth prediction reference frame has the smallest difference with the placeholder information of the i-th parent node as the matching node of the i-th parent node in the k-th prediction reference frame.

In some embodiments, the determination unit 21 is specifically used to determine the first serial number of the i-th node among the child nodes included in the parent node; and determine the child node with the first serial number among the child nodes of the i-th matching node as the corresponding node of the i-th node in the k-th prediction reference frame.

In some embodiments, the determination unit 21 is specifically configured to determine corresponding nodes of the M domain nodes in the k-th prediction reference frame as at least one prediction node of the current node in the k-th prediction reference frame.

In some embodiments, the determination unit 21 is specifically configured to determine at least one prediction node of the current node in the K prediction reference frames as N prediction nodes of the current node.

In some embodiments, if the current frame to be encoded is a P frame, the K prediction reference frames include a forward frame of the current frame to be encoded.

In some embodiments, if the current frame to be encoded is a B frame, the K prediction reference frames include a forward frame and a backward frame of the current frame to be encoded.

In some embodiments, the encoding unit 22 is specifically used to determine the index of the context model based on the geometric encoding information of the N prediction nodes; determine the context model based on the index of the context model; and use the context model to predict and encode the coordinate information of the current point in the current node.

In some embodiments, the geometric coding information of the prediction node includes direct coding information of the prediction node and/or position information of the midpoint of the prediction node, and the direct coding information is used to indicate whether the prediction node meets the conditions for encoding in a direct coding manner. The encoding unit 22 is specifically used to determine a first context index based on the direct coding information of the N prediction nodes, and/or determine a second context index based on the coordinate information of the midpoints of the N prediction nodes; based on the first context index and/or the second context index, select the context model from a plurality of preset context models.

In some embodiments, the encoding unit 22 is specifically used to determine, for any prediction node among the N prediction nodes, a first numerical value corresponding to the prediction node based on direct encoding information of the prediction node; and determine the first context index based on the first numerical values corresponding to the N prediction nodes.

In some embodiments, the direct encoding information includes a direct encoding mode of the prediction node, and the encoding unit 22 is specifically configured to determine a first value corresponding to the prediction node by serializing the direct encoding mode of the prediction node.

In some embodiments, the encoding unit 22 is specifically used to determine a first weight corresponding to the prediction node; based on the first weight, weighted processing is performed on the first numerical values corresponding to the N prediction nodes to obtain a first weighted prediction value; based on the first weighted prediction value, the first context index is determined.

In some embodiments, the encoding unit 22 is specifically used to determine, for the j-th prediction reference frame among the K prediction reference frames, a first numerical value corresponding to the prediction node in the j-th prediction reference frame based on direct encoding information of the prediction node of the current node in the j-th prediction reference frame, where j is a positive integer less than or equal to K; determine a first weight corresponding to the prediction node, and weightedly process the first numerical value corresponding to the prediction node in the j-th prediction reference frame based on the first weight to obtain a second weighted prediction value corresponding to the j-th prediction reference frame; and determine the first context index based on the second weighted prediction values corresponding to the K prediction reference frames.

In some embodiments, the encoding unit 22 is specifically used to determine the second weights corresponding to the K prediction reference frames; based on the second weights, weighted processing is performed on the second weighted prediction values corresponding to the K prediction reference frames respectively to obtain the first context index.

In some embodiments, the encoding unit 22 is specifically used to select, for any prediction node among the N prediction nodes, a first point corresponding to the current point in the current node from the points included in the prediction node; and determine the second context index based on the coordinate information of the first point included in the N prediction nodes.

In some embodiments, the encoding unit 22 is specifically used to determine the second context index corresponding to the i-th coordinate axis based on the coordinate information of the first point included in the N prediction nodes on the i-th coordinate axis, where the i-th coordinate axis is the X-coordinate axis, the Y-coordinate axis or the Z-coordinate axis; based on the first context index and/or the second context index corresponding to the i-th coordinate axis, select the context model corresponding to the i-th coordinate axis from the multiple context models; and use the context model corresponding to the i-th coordinate axis to predict and encode the coordinate information of the current point on the i-th coordinate axis.

In some embodiments, the encoding unit 22 is specifically used to determine a first weight corresponding to the prediction node; based on the first weight, weighted processing is performed on the coordinate information of the first point included in the N prediction nodes to obtain a first weighted point; based on the coordinate information of the first weighted point on the i-th coordinate axis, the second context index corresponding to the i-th coordinate axis is determined.

In some embodiments, the encoding unit 22 is specifically used to determine, for the j-th prediction reference frame among the K prediction reference frames, a first weight corresponding to a prediction node in the j-th prediction reference frame; based on the first weight, weighted processing is performed on the coordinate information of the first point included in the prediction node in the j-th prediction reference frame to obtain a second weighted point corresponding to the j-th prediction reference frame, where j is a positive integer less than or equal to K; based on the second weighted points corresponding to the K prediction reference frames, a second context index corresponding to the i-th coordinate axis is determined.

In some embodiments, the encoding unit 22 is specifically used to determine a second weight corresponding to the K predicted reference frames; based on the second weight, weighted processing is performed on the coordinate information of the second weighted point corresponding to the K predicted reference frames to obtain a third weighted point; based on the coordinate information of the third weighted point on the i-th coordinate axis, the second context index corresponding to the i-th coordinate axis is determined.

In some embodiments, the encoding unit 22 is specifically configured to determine a first weight corresponding to the prediction node based on a distance between a domain node corresponding to the prediction node and the current node.

In some embodiments, the encoding unit 22 is specifically configured to determine a second weight corresponding to the predicted reference frame based on a time difference between the predicted reference frame and the current frame to be encoded.

It should be understood that the device embodiment and the method embodiment may correspond to each other, and similar descriptions may refer to the method embodiment. To avoid repetition, it will not be repeated here. Specifically, the point cloud coding device 20 shown in Figure 19 may correspond to the corresponding subject in the point cloud coding method of the embodiment of the present application, and the aforementioned and other operations and/or functions of each unit in the point cloud coding device 20 are respectively for implementing the corresponding processes in the point cloud coding method. For the sake of brevity, they will not be repeated here.

The above describes the device and system of the embodiment of the present application from the perspective of the functional unit in conjunction with the accompanying drawings. It should be understood that the functional unit can be implemented in hardware form, can be implemented by instructions in software form, and can also be implemented by a combination of hardware and software units. Specifically, the steps of the method embodiment in the embodiment of the present application can be completed by the hardware integrated logic circuit and/or software form instructions in the processor, and the steps of the method disclosed in the embodiment of the present application can be directly embodied as a hardware decoding processor to perform, or a combination of hardware and software units in the decoding processor to perform. Optionally, the software unit can be located in a mature storage medium in the field such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, a register, etc. The storage medium is located in a memory, and the processor reads the information in the memory, and completes the steps in the above method embodiment in conjunction with its hardware.

FIG. 20 is a schematic block diagram of an electronic device provided in an embodiment of the present application.

As shown in FIG. 20 , the electronic device 30 may be a point cloud decoding device or a point cloud encoding device as described in the embodiment of the present application, and the electronic device 30 may include:

The memory 33 and the processor 32, the memory 33 is used to store the computer program 34 and transmit the program code 34 to the processor 32. In other words, the processor 32 can call and run the computer program 34 from the memory 33 to implement the method in the embodiment of the present application.

For example, the processor 32 may be configured to execute the steps in the method 200 according to the instructions in the computer program 34 .

In some embodiments of the present application, the processor 32 may include but is not limited to:

General-purpose processor, digital signal processor (DSP), application-specific integrated circuit (ASIC), field programmable gate array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.

In some embodiments of the present application, the memory 33 includes but is not limited to:

Volatile memory and/or non-volatile memory. Among them, the non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM) or flash memory. The volatile memory can be random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link DRAM (SLDRAM) and direct RAM bus random access memory (Direct Rambus RAM, DR RAM).

In some embodiments of the present application, the computer program 34 may be divided into one or more units, which are stored in the memory 33 and executed by the processor 32 to complete the method provided by the present application. The one or more units may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer program 34 in the electronic device 30.

As shown in FIG. 20 , the electronic device 30 may further include:

The transceiver 33 may be connected to the processor 32 or the memory 33 .

The processor 32 may control the transceiver 33 to communicate with other devices, specifically, to send information or data to other devices, or to receive information or data sent by other devices. The transceiver 33 may include a transmitter and a receiver. The transceiver 33 may further include an antenna, and the number of antennas may be one or more.

It should be understood that the various components in the electronic device 30 are connected via a bus system, wherein the bus system includes not only a data bus but also a power bus, a control bus and a status signal bus.

As shown in Figure 21, the point cloud encoding and decoding system 40 may include: a point cloud encoder 41 and a point cloud decoder 42, wherein the point cloud encoder 41 is used to execute the point cloud encoding method involved in the embodiment of the present application, and the point cloud decoder 42 is used to execute the point cloud decoding method involved in the embodiment of the present application.

The present application also provides a code stream, which is generated according to the above encoding method.

The present application also provides a computer storage medium on which a computer program is stored, and when the computer program is executed by a computer, the computer can perform the method of the above method embodiment. In other words, the present application embodiment also provides a computer program product containing instructions, and when the instructions are executed by a computer, the computer can perform the method of the above method embodiment.

When software is used for implementation, it can be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the process or function according to the embodiment of the present application is generated in whole or in part. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices. The computer instructions can be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions can be transmitted from a website site, computer, server or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (digital subscriber line, DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) mode to another website site, computer, server or data center. The computer-readable storage medium can be any available medium that a computer can access or a data storage device such as a server or data center that includes one or more available media integrations. The available medium can be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital video disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)), etc.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Professional and technical personnel can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of this application.

In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices and methods can be implemented in other ways. For example, the device embodiments described above are only schematic. For example, the division of the unit is only a logical function division. There may be other division methods in actual implementation, such as multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed. Another point is that the mutual coupling or direct coupling or communication connection shown or discussed can be through some interfaces, indirect coupling or communication connection of devices or units, which can be electrical, mechanical or other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the scheme of this embodiment. For example, each functional unit in each embodiment of the present application may be integrated into a processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.

The above contents are only specific implementation methods of the present application, but the protection scope of the present application is not limited thereto. Any technician familiar with the technical field can easily think of changes or substitutions within the technical scope disclosed in the present application, which should be included in the protection scope of the present application. Therefore, the protection scope of the present application should be based on the protection scope of the claims.

Claims

A point cloud decoding method, characterized by comprising:

In a prediction reference frame of a current frame to be decoded, determining N prediction nodes of a current node, wherein the current node is a node to be decoded in the current frame to be decoded, and N is a positive integer;

Based on the geometric decoding information of the N predicted nodes, the coordinate information of the midpoint of the current node is predicted and decoded.
The method according to claim 1, characterized in that the current frame to be decoded includes K prediction reference frames, and determining N prediction nodes of the current node in the prediction reference frame of the current frame to be decoded comprises:

For a k-th prediction reference frame among the K prediction reference frames, determining at least one prediction node of the current node in the k-th prediction reference frame, where k is a positive integer less than or equal to K, and K is a positive integer;

Based on at least one prediction node of the current node in the K prediction reference frames, N prediction nodes of the current node are determined.
The method according to claim 2, characterized in that the determining of at least one prediction node of the current node in the k-th prediction reference frame comprises:

In the current frame to be decoded, determine M domain nodes of the current node, the M domain nodes include the current node, and M is a positive integer;

For an i-th domain node among the M domain nodes, determine a corresponding node of the i-th domain node in the k-th prediction reference frame, where i is a positive integer less than or equal to M;

Based on the corresponding nodes of the M domain nodes in the k-th prediction reference frame, at least one prediction node of the current node in the k-th prediction reference frame is determined.
The method according to claim 2, characterized in that the determining of at least one prediction node of the current node in the k-th prediction reference frame comprises:

Determine a corresponding node of the current node in the k-th prediction reference frame;

Determining at least one domain node of the corresponding node;

The at least one domain node is determined as at least one prediction node of the current node in the k-th prediction reference frame.
The method according to claim 3 or 4, characterized in that the method further comprises:

In the current frame to be decoded, determine the parent node of the ith node as the ith parent node, the ith node being the ith domain node or the current node;

Determine a matching node of the i-th parent node in the k-th prediction reference frame as the i-th matching node;

One of the child nodes of the i matching nodes is determined as the corresponding node of the i-th node in the k-th prediction reference frame.
The method according to claim 5, characterized in that the determining the matching node of the i-th parent node in the k-th prediction reference frame comprises:

Based on the placeholder information of the i-th parent node, a matching node of the i-th parent node in the k-th prediction reference frame is determined.
The method according to claim 6, characterized in that the determining, based on the placeholder information of the i-th parent node, a matching node of the i-th parent node in the k-th prediction reference frame comprises:

A node whose placeholder information in the k-th prediction reference frame has the smallest difference with the placeholder information of the i-th parent node is determined as a matching node of the i-th parent node in the k-th prediction reference frame.
The method according to claim 5, characterized in that the step of determining one of the child nodes of the i matching nodes as the corresponding node of the i-th node in the k-th prediction reference frame comprises:

Determine the first sequence number of the i-th node among the child nodes included in the parent node;

The child node with the first sequence number among the child nodes of the i-th matching node is determined as the corresponding node of the i-th node in the k-th prediction reference frame.
The method according to claim 3, characterized in that the determining at least one prediction node of the current node in the kth prediction reference frame based on the corresponding nodes of the M domain nodes in the kth prediction reference frame comprises:

The corresponding nodes of the M domain nodes in the k-th prediction reference frame are determined as at least one prediction node of the current node in the k-th prediction reference frame.
The method according to claim 2, characterized in that the determining the N prediction nodes of the current node based on at least one prediction node of the current node in the K prediction reference frames comprises:

At least one prediction node of the current node in the K prediction reference frames is determined as N prediction nodes of the current node.
The method according to claim 2 is characterized in that if the current frame to be decoded is a P frame, the K prediction reference frames include a forward frame of the current frame to be decoded.
The method according to claim 2 is characterized in that, if the current frame to be decoded is a B frame, the K prediction reference frames include a forward frame and a backward frame of the current frame to be decoded.
The method according to any one of claims 2 to 12, characterized in that the predictive decoding of the coordinate information of the midpoint of the current node based on the geometric decoding information of the N predicted nodes comprises:

Determining an index of a context model based on the geometric decoding information of the N prediction nodes;

Determining the context model based on the index of the context model;

Using the context model, the coordinate information of the current point in the current node is predicted and decoded.
The method according to claim 13 is characterized in that the geometric decoding information of the prediction node includes direct decoding information of the prediction node and/or position information of a midpoint of the prediction node, and the direct decoding information is used to indicate whether the prediction node satisfies a condition for decoding in a direct decoding manner, and determining the index of the context model based on the geometric decoding information of the N prediction nodes comprises:

Determine a first context index based on direct decoding information of the N prediction nodes, and/or determine a second context index based on coordinate information of midpoints of the N prediction nodes;

The selecting the context model based on the index of the context model comprises:

Based on the first context index and/or the second context index, the context model is selected from a plurality of preset context models.
The method according to claim 14, characterized in that the determining the first context index based on the direct decoding information of the N prediction nodes comprises:

For any prediction node among the N prediction nodes, determining a first value corresponding to the prediction node based on direct decoding information of the prediction node;

The first context index is determined based on first values corresponding to the N prediction nodes.
The method according to claim 15, characterized in that the direct decoding information includes a direct decoding mode of the prediction node, and determining the first value corresponding to the prediction node based on the direct decoding information of the prediction node comprises:

The direct decoding mode number of the prediction node is used to determine a first value corresponding to the prediction node.
The method according to claim 15, characterized in that the determining the first context index based on the first values corresponding to the N prediction nodes comprises:

Determining a first weight corresponding to the prediction node;

Based on the first weight, weighting the first values corresponding to the N prediction nodes to obtain a first weighted prediction value;

Based on the first weighted prediction value, the first context index is determined.
The method according to claim 14, characterized in that, if the K is greater than 1, the determining the first context index based on the direct decoding information of the N prediction nodes comprises:

For a j-th prediction reference frame among the K prediction reference frames, determining a first value corresponding to the prediction node in the j-th prediction reference frame based on direct decoding information of the prediction node of the current node in the j-th prediction reference frame, where j is a positive integer less than or equal to K;

Determine a first weight corresponding to the prediction node, and perform weighted processing on a first value corresponding to the prediction node in the j-th prediction reference frame based on the first weight to obtain a second weighted prediction value corresponding to the j-th prediction reference frame;

The first context index is determined based on second weighted prediction values corresponding to the K prediction reference frames.
The method according to claim 18, characterized in that the determining the first context index based on the second weighted prediction values corresponding to the K prediction reference frames comprises:

Determine second weights corresponding to the K prediction reference frames;

Based on the second weight, weighted processing is performed on the second weighted prediction values respectively corresponding to the K prediction reference frames to obtain the first context index.
The method according to claim 14, characterized in that the determining the second context index based on the coordinate information of the midpoints of the N prediction nodes comprises:

For any prediction node among the N prediction nodes, selecting a first point corresponding to a current point in the current node from the points included in the prediction node;

The second context index is determined based on coordinate information of a first point included in the N prediction nodes.
The method according to claim 20, characterized in that the determining the second context index based on the coordinate information of the first point included in the N prediction nodes comprises:

Determine, based on coordinate information of a first point included in the N prediction nodes on an i-th coordinate axis, a second context index corresponding to the i-th coordinate axis, where the i-th coordinate axis is an X-coordinate axis, a Y-coordinate axis, or a Z-coordinate axis;

The selecting the context model from a plurality of preset context models based on the first context index and/or the second context index includes:

Based on the first context index and/or the second context index corresponding to the i-th coordinate axis, selecting a context model corresponding to the i-th coordinate axis from the multiple context models;

The using the context model to predict and decode the coordinate information of the current point in the current node includes:

Use the context model corresponding to the i-th coordinate axis to predict and decode the coordinate information of the current point on the i-th coordinate axis.
The method according to claim 21, characterized in that the determining, based on the coordinate information of the first point included in the N prediction nodes on the i-th coordinate axis, the second context index corresponding to the i-th coordinate axis comprises:

Determining a first weight corresponding to the prediction node;

Based on the first weight, weighted processing is performed on the coordinate information of the first point included in the N prediction nodes to obtain a first weighted point;

Based on the coordinate information of the first weighted point on the i-th coordinate axis, a second context index corresponding to the i-th coordinate axis is determined.
The method according to claim 21, characterized in that if K is greater than 1, determining the second context index corresponding to the i-th coordinate axis based on the coordinate information of the first point included in the N prediction nodes on the i-th coordinate axis includes:

For a j-th prediction reference frame among the K prediction reference frames, determining a first weight corresponding to a prediction node in the j-th prediction reference frame;

Based on the first weight, weighted processing is performed on the coordinate information of the first point included in the prediction node in the j-th prediction reference frame to obtain a second weighted point corresponding to the j-th prediction reference frame, where j is a positive integer less than or equal to K;

Based on the second weighted points corresponding to the K prediction reference frames, a second context index corresponding to the i-th coordinate axis is determined.
The method according to claim 23, characterized in that the determining the second context index corresponding to the i-th coordinate axis based on the coordinate information of the second weighted points corresponding to the K prediction reference frames comprises:

Determine second weights corresponding to the K prediction reference frames;

performing weighted processing on the coordinate information of the second weighted points corresponding to the K prediction reference frames based on the second weight to obtain a third weighted point;

Based on the coordinate information of the third weighted point on the i-th coordinate axis, a second context index corresponding to the i-th coordinate axis is determined.
The method according to claim 17, 18, 22 or 23, characterized in that determining the first weight corresponding to the prediction node comprises:

Based on the distance between the domain node corresponding to the prediction node and the current node, a first weight corresponding to the prediction node is determined.
The method according to claim 19 or 24, characterized in that the determining the second weights corresponding to the K prediction reference frames comprises:

Based on the time difference between the predicted reference frame and the current frame to be decoded, a second weight corresponding to the predicted reference frame is determined.
A point cloud encoding method, characterized by comprising:

In a prediction reference frame of a current frame to be encoded, determining N prediction nodes of a current node, wherein the current node is a node to be encoded in the current frame to be encoded, and N is a positive integer;

Based on the geometric coding information of the N predicted nodes, the coordinate information of the midpoint of the current node is predicted and coded.
The method according to claim 27, characterized in that the current frame to be encoded includes K prediction reference frames, and determining N prediction nodes of the current node in the prediction reference frame of the current frame to be encoded comprises:

For a k-th prediction reference frame among the K prediction reference frames, determining at least one prediction node of the current node in the k-th prediction reference frame, where k is a positive integer less than or equal to K, and K is a positive integer;

Based on at least one prediction node of the current node in the K prediction reference frames, N prediction nodes of the current node are determined.
The method according to claim 28, characterized in that the determining of at least one prediction node of the current node in the k-th prediction reference frame comprises:

In the current frame to be encoded, determine M domain nodes of the current node, the M domain nodes include the current node, and M is a positive integer;

For an i-th domain node among the M domain nodes, determine a corresponding node of the i-th domain node in the k-th prediction reference frame, where i is a positive integer less than or equal to M;

Based on the corresponding nodes of the M domain nodes in the k-th prediction reference frame, at least one prediction node of the current node in the k-th prediction reference frame is determined.
The method according to claim 28, characterized in that the determining of at least one prediction node of the current node in the k-th prediction reference frame comprises:

Determine a corresponding node of the current node in the k-th prediction reference frame;

Determining at least one domain node of the corresponding node;

The at least one domain node is determined as at least one prediction node of the current node in the k-th prediction reference frame.
The method according to claim 29 or 30, characterized in that the method further comprises:

In the current frame to be encoded, determine the parent node of the ith node as the ith parent node, the ith node being the ith domain node or the current node;

Determine a matching node of the i-th parent node in the k-th prediction reference frame as the i-th matching node;

One of the child nodes of the i matching nodes is determined as the corresponding node of the i-th node in the k-th prediction reference frame.
The method according to claim 31, characterized in that the determining the matching node of the i-th parent node in the k-th prediction reference frame comprises:

Based on the placeholder information of the i-th parent node, a matching node of the i-th parent node in the k-th prediction reference frame is determined.
The method according to claim 32, characterized in that the determining, based on the placeholder information of the i-th parent node, a matching node of the i-th parent node in the k-th prediction reference frame comprises:

A node whose placeholder information in the k-th prediction reference frame has the smallest difference with the placeholder information of the i-th parent node is determined as a matching node of the i-th parent node in the k-th prediction reference frame.
The method according to claim 31, characterized in that the step of determining one of the child nodes of the i matching nodes as the corresponding node of the i-th node in the k-th prediction reference frame comprises:

Determine the first sequence number of the i-th node among the child nodes included in the parent node;

The child node with the first sequence number among the child nodes of the i-th matching node is determined as the corresponding node of the i-th node in the k-th prediction reference frame.
The method according to claim 29, characterized in that the determining at least one prediction node of the current node in the kth prediction reference frame based on the corresponding nodes of the M domain nodes in the kth prediction reference frame comprises:

The corresponding nodes of the M domain nodes in the k-th prediction reference frame are determined as at least one prediction node of the current node in the k-th prediction reference frame.
The method according to claim 28, characterized in that the determining the N prediction nodes of the current node based on at least one prediction node of the current node in the K prediction reference frames comprises:

At least one prediction node of the current node in the K prediction reference frames is determined as N prediction nodes of the current node.
The method according to claim 28 is characterized in that if the current frame to be encoded is a P frame, the K prediction reference frames include a forward frame of the current frame to be encoded.
The method according to claim 28 is characterized in that if the current frame to be encoded is a B frame, the K prediction reference frames include a forward frame and a backward frame of the current frame to be encoded.
The method according to any one of claims 28 to 38, characterized in that the predictive coding of the coordinate information of the midpoint of the current node based on the geometric coding information of the N predicted nodes comprises:

Determining an index of a context model based on the geometric coding information of the N prediction nodes;

Determining the context model based on the index of the context model;

Using the context model, the coordinate information of the current point in the current node is predictively encoded.
The method according to claim 39 is characterized in that the geometric coding information of the prediction node includes direct coding information of the prediction node and/or position information of a midpoint of the prediction node, the direct coding information is used to indicate whether the prediction node satisfies a condition for encoding in a direct coding manner, and the determining the index of the context model based on the geometric coding information of the N prediction nodes comprises:

Determine a first context index based on direct encoding information of the N prediction nodes, and/or determine a second context index based on coordinate information of midpoints of the N prediction nodes;

The selecting the context model based on the index of the context model comprises:

Based on the first context index and/or the second context index, the context model is selected from a plurality of preset context models.
The method according to claim 40, characterized in that the determining the first context index based on the direct encoding information of the N prediction nodes comprises:

For any prediction node among the N prediction nodes, determining a first value corresponding to the prediction node based on direct encoding information of the prediction node;

The first context index is determined based on first values corresponding to the N prediction nodes.
The method according to claim 41, characterized in that the direct encoding information includes a direct encoding mode of the prediction node, and determining the first value corresponding to the prediction node based on the direct encoding information of the prediction node comprises:

The direct coding mode number of the prediction node is used to determine a first value corresponding to the prediction node.
The method according to claim 41, characterized in that the determining the first context index based on the first values corresponding to the N prediction nodes comprises:

Determining a first weight corresponding to the prediction node;

Based on the first weight, weighting the first values corresponding to the N prediction nodes to obtain a first weighted prediction value;

Based on the first weighted prediction value, the first context index is determined.
The method according to claim 40, characterized in that if the K is greater than 1, then the determining the first context index based on the direct encoding information of the N prediction nodes comprises:

For a j-th prediction reference frame among the K prediction reference frames, determining a first value corresponding to the prediction node in the j-th prediction reference frame based on direct encoding information of the prediction node of the current node in the j-th prediction reference frame, where j is a positive integer less than or equal to K;

Determine a first weight corresponding to the prediction node, and perform weighted processing on a first value corresponding to the prediction node in the j-th prediction reference frame based on the first weight to obtain a second weighted prediction value corresponding to the j-th prediction reference frame;

The first context index is determined based on second weighted prediction values corresponding to the K prediction reference frames.
The method according to claim 44, characterized in that the determining the first context index based on the second weighted prediction values corresponding to the K prediction reference frames comprises:

Determine second weights corresponding to the K prediction reference frames;

Based on the second weight, weighted processing is performed on the second weighted prediction values respectively corresponding to the K prediction reference frames to obtain the first context index.
The method according to claim 40, characterized in that the determining the second context index based on the coordinate information of the midpoints of the N prediction nodes comprises:

For any prediction node among the N prediction nodes, selecting a first point corresponding to a current point in the current node from the points included in the prediction node;

The second context index is determined based on coordinate information of a first point included in the N prediction nodes.
The method according to claim 46, characterized in that the determining the second context index based on the coordinate information of the first point included in the N prediction nodes comprises:

Determine, based on coordinate information of a first point included in the N prediction nodes on an i-th coordinate axis, a second context index corresponding to the i-th coordinate axis, where the i-th coordinate axis is an X-coordinate axis, a Y-coordinate axis, or a Z-coordinate axis;

The selecting the context model from a plurality of preset context models based on the first context index and/or the second context index includes:

Based on the first context index and/or the second context index corresponding to the i-th coordinate axis, selecting a context model corresponding to the i-th coordinate axis from the multiple context models;

The using the context model to predictively encode the coordinate information of the current point in the current node includes:

The context model corresponding to the i-th coordinate axis is used to predictively encode the coordinate information of the current point on the i-th coordinate axis.
The method according to claim 47, characterized in that the determining the second context index corresponding to the i-th coordinate axis based on the coordinate information of the first point included in the N prediction nodes on the i-th coordinate axis comprises:

Determining a first weight corresponding to the prediction node;

Based on the first weight, weighted processing is performed on the coordinate information of the first point included in the N prediction nodes to obtain a first weighted point;

Based on the coordinate information of the first weighted point on the i-th coordinate axis, a second context index corresponding to the i-th coordinate axis is determined.
The method according to claim 47, characterized in that if K is greater than 1, determining the second context index corresponding to the i-th coordinate axis based on the coordinate information of the first point included in the N prediction nodes on the i-th coordinate axis includes:

For a j-th prediction reference frame among the K prediction reference frames, determining a first weight corresponding to a prediction node in the j-th prediction reference frame;

Based on the first weight, weighted processing is performed on the coordinate information of the first point included in the prediction node in the j-th prediction reference frame to obtain a second weighted point corresponding to the j-th prediction reference frame, where j is a positive integer less than or equal to K;

Based on the second weighted points corresponding to the K prediction reference frames, a second context index corresponding to the i-th coordinate axis is determined.
The method according to claim 49, characterized in that the determining the second context index corresponding to the i-th coordinate axis based on the coordinate information of the second weighted points corresponding to the K prediction reference frames comprises:

Determine second weights corresponding to the K prediction reference frames;

performing weighted processing on the coordinate information of the second weighted points corresponding to the K prediction reference frames based on the second weight to obtain a third weighted point;

Based on the coordinate information of the third weighted point on the i-th coordinate axis, a second context index corresponding to the i-th coordinate axis is determined.
The method according to claim 43, 44, 48 or 49, characterized in that determining the first weight corresponding to the prediction node comprises:

Based on the distance between the domain node corresponding to the prediction node and the current node, a first weight corresponding to the prediction node is determined.
The method according to claim 45 or 50, characterized in that the determining the second weights corresponding to the K prediction reference frames comprises:

Based on the time difference between the predicted reference frame and the current frame to be encoded, a second weight corresponding to the predicted reference frame is determined.
A point cloud decoding device, characterized by comprising:

A determination unit, configured to determine N prediction nodes of a current node in a prediction reference frame of a current frame to be decoded, wherein the current node is a node to be decoded in the current frame to be decoded, and N is a positive integer;

A decoding unit is used to predict and decode the coordinate information of the midpoint of the current node based on the geometric decoding information of the N predicted nodes.
A point cloud encoding device, characterized by comprising:

A determination unit, specifically configured to determine N prediction nodes of a current node in a prediction reference frame of a current frame to be encoded, wherein the current node is a node to be encoded in the current frame to be encoded, and N is a positive integer;

The encoding unit is used to predict the coordinate information of the midpoint of the current node based on the geometric encoding information of the N prediction nodes.
An electronic device, characterized in that it comprises: a processor and a memory;

The memory is used to store computer programs;

The processor is used to call and run the computer program stored in the memory to perform the method according to any one of claims 1 to 26 or 27 to 52.
A computer-readable storage medium, characterized in that it is used to store a computer program, wherein the computer program enables a computer to execute the method according to any one of claims 1 to 26 or 27 to 52.