
An Inverse Procedural Modeling Pipeline for SVBRDF Maps

Published: 04 January 2022

Abstract

Procedural modeling is now the de facto standard of material modeling in industry. Procedural models can be edited and are easily extended, unlike pixel-based representations of captured materials. In this article, we present a semi-automatic pipeline for general material proceduralization. Given Spatially Varying Bidirectional Reflectance Distribution Functions (SVBRDFs) represented as sets of pixel maps, our pipeline decomposes them into a tree of sub-materials whose spatial distributions are encoded by their associated mask maps. This semi-automatic decomposition of material maps progresses hierarchically, driven by our new spectrum-aware material matting and instance-based decomposition methods. Each decomposed sub-material is proceduralized by a novel multi-layer noise model to capture local variations at different scales. Spatial distributions of these sub-materials are modeled either by a by-example inverse synthesis method recovering Point Process Texture Basis Functions (PPTBF) [30] or via random sampling. To reconstruct procedural material maps, we propose a differentiable rendering-based optimization that recomposes all generated procedures together to maximize the similarity between our procedural models and the input material pixel maps. We evaluate our pipeline on a variety of synthetic and real materials. We demonstrate our method’s capacity to process a wide range of material types, eliminating the need for artist-designed material graphs required in previous work [38, 53]. As fully procedural models, our results expand to arbitrary resolution and enable high-level user control of appearance.

1 Introduction

The appearance of a scene is defined by the interaction between lighting, geometries, and materials. Materials define the way light is scattered and absorbed on and within geometries. Despite progress in material authoring tools, creating a realistic material remains time-consuming, even for expert artists. To reproduce existing materials, the material acquisition field defines methods for extracting properties through measurements sampling the material reflectance at different light and camera positions [22]. Leveraging the recent progress in lightweight material acquisition, we propose the first method aiming at generating a procedural material representation, allowing higher-level editability as well as arbitrary scale and resolution. Examples of the application and editing of procedural representations are shown in Figure 1.
Procedural modeling is now common in professional material modeling with tools such as Blender or Substance Designer [1]. Procedural models are often represented as graphs of operations, defining sub-elements of the targeted material. Each of these operations is procedural, allowing infinite resolution or scale and high-level control through modification of parameters.
Recent methods for simplified material acquisition [19, 15, 45, 16, 26] target the reconstruction of analytic material maps, which compactly represent materials, but lack the editability and resolution/scale increase possibilities offered by procedural representations.
Inverse procedural modeling of materials focuses on the creation of a procedural material representation from images, or in our case, from an existing set of analytic material maps. We postulate that this ill-posed challenge requires division of a material into meaningful components that can be represented as procedures. Recently, several methods [38, 53] took a first step by proposing different inverse modeling frameworks to select a procedural graph among existing models and optimize the node parameters to best match an input texture.
In this article, we target the challenging task of creating entirely new procedural models from a set of SVBRDF pixel maps as input, eliminating the need for pre-existing artist-designed procedural models. We propose inverting the material modeling process in a sequential way by hierarchically breaking down the material into several sub-materials, then fitting their spatial distributions and material properties with procedures.
Given a material, we hierarchically decompose it into a tree of atomic sub-materials. The different sub-materials are segmented to encourage uniform statistical variation through either a user-guided, Fourier spectrum-aware KNN matting approach or an automatic instance-based segmentation algorithm. Segmented regions provide information about local texture statistics and the properties of different sub-materials. We represent the global spatial distributions of sub-materials with mask maps. Each component of the tree is then automatically converted to a full procedural model. We present a multi-layer procedural noise model based on random phase noise [23] to model the appearance of sub-materials, capturing local variations at multiple scales. To procedurally model global spatial distributions, we propose an optimization-based inverse modeling method based on a procedural texture basis function [30] as well as random sampling to convert mask maps to procedures. With the proceduralized sub-materials and their procedural mask maps organized in a tree, we build a small material graph by adding optimizable operators. The material graph is optimized in a differentiable fashion based on a rendering loss to better match the input material.
Our material representation relies solely on procedural components, enabling high-level editing and arbitrary scale/resolution. We use an analytic representation as input to reduce the uncertainty inherent in single-picture material acquisition. This allows our method to seamlessly benefit from progress in lightweight material acquisition methods [17, 26].
We show applications of our pipeline on a wide range of materials and define a new taxonomy for material decomposition complexity, highlighting the existing challenges of this ill-posed task. In summary, this article presents a novel research direction to create a procedural representation of general analytical materials without relying on pre-existing material graphs. Specifically, we present the following contributions:
We present the first pipeline for semi-automatic generation of procedural representations of materials. As such, our method does not rely on a pre-existing library of material node graphs.
We propose a new spectrum-aware, hierarchical segmentation method for guided material segmentation.
We define procedures and their corresponding inverse approaches to proceduralize sub-materials and their distributions.
We offer a differentiable rendering-based optimization routine to match our procedural material to the input material during reconstruction.
We define a procedural representation that allows for complex multi-scale edits and arbitrary scale and/or resolution.
Our implementation will be publicly available.

2 Related Work

2.1 Material Acquisition

Material acquisition seeks to recover the reflectance properties of existing surfaces or objects. Our method complements this body of work, since acquired materials are used as input for our pipeline to create procedural representations. While material acquisition has challenged researchers for decades, as discussed in the excellent survey by Guarnera et al. [29], significant progress has been achieved in recent years by leveraging deep learning. In particular, different methods were proposed to acquire materials from one [44, 45, 46, 15] or multiple [16, 26, 33, 10, 18] pictures of surfaces or objects. While our approach also targets the creation of a material representation, we do not seek to recover material properties from captured sample(s), but rather to proceduralize an existing material. This difference not only allows our method to benefit from material acquisition and its progress, but also to handle any existing analytical material.

2.2 Texture Synthesis

In our method, we leverage texture synthesis, which creates new textures with larger scale or higher resolution from an exemplar. Several surveys provide a comprehensive overview of example-based texture synthesis [61, 2, 51]. Texture synthesis methods can be classified into three families: patch re-arrangement methods, which arrange patches available in the original texture to synthesize a new texture [21, 20, 39]; statistics-based methods, which estimate statistics of the original texture and transfer them to a new texture [34, 23, 24, 28, 25, 35]; and neural-network-based methods, which reproduce texture or material appearance using machine learning [58, 9, 63, 54, 37, 36, 48]. We use the second family of methods to reproduce the local variations of our decomposed atomic components’ sub-materials. Alone, these methods can reproduce micro-structures well but fail for larger-scale patterns; they also limit the scope of possible edits. By making use of progressive decomposition, extending them in a multi-layer fashion, and incorporating an optimizable post-processing step, our method can represent complex SVBRDFs as procedural models with arbitrary scale and resolution as well as high-level editability.

2.3 Procedural Modeling

Procedural models allow artists to retain editability and produce arbitrary scale and resolution. Specialized procedural models have been proposed but are limited to specific materials such as wood or leather and mostly stationary materials [32]. A more generic option lies in design software allowing artists to combine procedures to create a new material (see Figure 2). While this procedural representation provides great editability and versatility, it remains time-consuming and challenging, even for expert artists, to reproduce an existing or imagined material.
Fig. 1.
Fig. 1. On the left, we show a virtual scene textured only with procedural materials generated from pixel map exemplars with our method. On the right, we show examples of the editing made possible by our procedural representation on the wall and floor. We change the brick color and make it deeper and smoother. We also modify the pavement color variation and its regularity, and show broken tiles. These edits are made solely through the parameters made available by our method. All of the input exemplars used to generate the procedural models are presented in the supplemental material.
Fig. 2.
Fig. 2. A simple example of a brick material created using node graphs in Substance Designer [1]. This material graph is composed of a set of procedures that generate brick patterns and add details on top of them. The designed material is fully procedural and thus can be further edited or expanded to any resolution.
Recent methods explore inverse procedural modeling of materials, aiming at the reproduction of a given material appearance using a procedural model. Hu et al. [38] and Shi et al. (MATch) [53] both propose frameworks that leverage existing Substance graph procedural models and infer their parameters to match the appearance of input textures. To do so, Hu et al. rely on neural networks trained for each individual material graph, while Shi et al. rely on gradient descent and a differentiable version of the Substance Engine. Both of these methods, however, work on the premise of known pre-existing procedural node graphs, manually created by artists. This approach requires a large, sufficiently expressive dataset of procedural graphs to be available and non-trivial search methods to find the closest ones among hundreds of options. As opposed to these methods, our proposed pipeline does not require predefined material graphs and relies only on a few user scribbles, which typically require about two minutes of interaction.
A key component of our inverse procedural modeling pipeline is the procedural representation of the structure of materials, that is, the global spatial distributions of decomposed sub-materials, which we represent using a set of binary mask maps. To preserve the procedural aspect of our results, we need to represent these masks procedurally. Rosenberger et al. [52] propose a shape synthesis method to generate layered control maps, but it is limited to unstructured shapes with fractal-like boundaries. Alternatively, we could use an L-system [56], but that approach requires predefined grammars. Instead, we make procedural generation of structured mask maps possible by proposing a by-example inverse Point Process Texture Basis Functions (PPTBF) [30] modeling approach and leveraging random sampling.

2.4 Image Segmentation and Matting

Image segmentation aims at separating an image into different regions, based on some specific criteria. In our material proceduralization framework, segmentation is used for partitioning the given material into several sub-materials. Image segmentation has been extensively studied in the past decades [40], with recent methods leveraging deep learning [47]. Close to our challenge are the recent works of Cimpoi et al. and Bell et al. [8, 14], which semantically split natural scenes into their different components (such as wood, plastic, etc.). However, these methods focus on the segmentation of complete scene images and use context [8] to better recognize materials, which is not available in our material exemplars.
Most existing segmentation methods do not perform well in our context, as they result in significant error on the boundaries between two sub-materials. We therefore use image matting, originally developed to segment background and foreground with fuzzy boundaries. We found alpha matting to better represent the transition between sub-materials in an SVBRDF and to preserve good quality boundaries, even after thresholding.
Many methods for image matting suggest affinity-based solutions, solving a Laplacian clustering problem [43, 12, 4]. Recent work [13, 62] attempts to learn alpha maps directly using deep neural networks, but is limited to two-layer (foreground and background) alpha matting due to available training data. Aksoy et al. [3] proposed an improved affinity-based method by combining deep semantic features with an affinity Laplacian. While it supports multiple layers, it is geared toward natural scene images and requires the user to provide the number of layers. Our material decomposition approach is similar to the one adopted by Lawrence et al. [41]: We aim to decompose the input spatially varying material maps into regions of similar sub-materials. The approach by Lawrence et al. was further developed by AppWand [49], AppProp [5], and Material Matting [42], which segment measured or analytical material images to make them easy to edit. However, in practice, these approaches seem to handle limited normal variations. More importantly, they do not target the creation of fully procedural representations. They treat the materials as SVBRDF images, limiting the editing of results to pixelwise (rather than parametric) modifications of the segmentation and/or uniform (rather than multi-scale) material editing. Beyond these editing limits, these methods do not provide for the generation of textures of arbitrary spatial extent.
We choose to build upon KNN matting [12] by proposing a spectrum-sensitive affinity-based image matting method. KNN matting supports multiple layers and enables user control to conveniently define layers of interest, which are useful features for our material segmentation purpose. The aforementioned material segmentation approaches [49, 5, 42] could produce interesting segmentation results, but we chose KNN matting because of its generality, the availability of its implementation, and its efficient runtime.

3 Overview

Given a set of SVBRDF maps, we want to generate a procedural representation of the material. Our inverse procedural modeling pipeline starts by hierarchically decomposing input SVBRDF maps into multiple sub-materials (Section 4) organized as a tree structure. Each sub-material represents a statistically similar region and its local variation, while the associated segmentation masks encode the global spatial variations of these sub-materials. The tree structure provides a layered relationship between different sub-materials. We traverse the material tree to convert each component to its procedural counterpart. We represent sub-materials with a multi-layer procedural noise model (Section 5.1) and represent mask maps procedurally using Point Process Texture Basis Functions (PPTBF) [30] and random sampling (Section 5.2). After proceduralization, we compose these procedural representations and use a differentiable rendering-based optimization to match the appearance of the input material (Section 5.3). We show an overview of our method in Figure 3.
Fig. 3.
Fig. 3. Our procedural modeling pipeline allows a user to guide the segmentation of an input material to automatically generate a matching, fully procedural material model. On the left, we show the procedural tree generated by our method, and on the right, we show the proceduralization steps taken for each sub-material. We represent the spatial variation using segmentation masks and Point Process Texture Basis Functions to create a procedural version through a query-and-optimization method. We represent the sub-material parameters with a multi-scale noise matching operation after reparameterizing normals to height values. Finally, we use differentiable rendering to compose all sub-materials into a complete procedural material, rendered here lit from the top. Please see our supplemental material for a graph visualization of a few results.
In this article, we demonstrate our method on common physically based material parameters, which can be easily acquired using recent methods [17]: albedo maps, normal maps, and roughness maps. As normal maps encode vectors, their channels represent 3D directions that are difficult to proceduralize. Instead of directly working in normal map space, we convert normals to a height map using Poisson reconstruction [50], which produces a better proceduralization. Figure 4 provides an example of this reconstruction. Our approach can easily handle additional gray-scale or color maps.
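As a concrete illustration of this conversion, the sketch below integrates the gradient field implied by a normal map with an FFT-based Poisson solve (the Frankot–Chellappa method). The paper cites [50] for its Poisson reconstruction, so treat this as an illustrative stand-in; sign conventions depend on the normal-map format.

```python
import numpy as np

def height_from_normals(normals):
    """Recover a height map from a unit normal map of shape (H, W, 3)
    by integrating its gradient field with an FFT-based Poisson solve
    (Frankot-Chellappa). Illustrative stand-in for the Poisson
    reconstruction [50] used in the paper."""
    nx, ny, nz = normals[..., 0], normals[..., 1], normals[..., 2]
    nz = np.clip(nz, 1e-4, None)        # avoid division by zero
    p, q = -nx / nz, -ny / nz           # height derivatives dh/dx, dh/dy

    h, w = p.shape
    u, v = np.meshgrid(np.fft.fftfreq(w) * 2 * np.pi,
                       np.fft.fftfreq(h) * 2 * np.pi)
    P, Q = np.fft.fft2(p), np.fft.fft2(q)
    denom = u**2 + v**2
    denom[0, 0] = 1.0                   # avoid 0/0 at the DC term
    H = (-1j * u * P - 1j * v * Q) / denom
    H[0, 0] = 0.0                       # the height offset is arbitrary

    height = np.real(np.fft.ifft2(H))
    # normalize to [0, 1], matching the convention in Figure 4
    return (height - height.min()) / (np.ptp(height) + 1e-8)
```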
Fig. 4.
Fig. 4. Reconstruction of a height map from a normal map. Values in the height map are normalized between 0 and 1. The Normal map* is computed from the reconstructed height map by a Sobel operator. While the recomputed Normal map* shows slightly less fine-grain detail, the general structures are well preserved.

4 SVBRDF Decomposition

The first step of our pipeline is a hierarchical decomposition of the input SVBRDF maps into multiple sub-materials in a semi-automatic way. Although it is possible to spatially decompose SVBRDF maps into multiple sub-parts in one step, we propose an iterative decomposition into a material hierarchy, allowing the encoding of a layered relationship between sub-materials for proceduralization. Specifically, given a set of SVBRDF maps, we decompose it into multiple sub-materials using a spectrum-aware matting algorithm (Section 4.1), which generates mask maps for each segmented sub-material. For each masked sub-material, we let the user decide whether to further decompose it using either our matting algorithm or a lightweight instance-based decomposition algorithm (Section 4.2). Using this process, users can define the elements they consider important in the texture and iterate with new sub-divisions until no salient sub-material is left, creating a tree structure of decomposed sub-materials.

4.1 Spectrum-aware SVBRDF Decomposition

We first consider how to decompose a set of SVBRDF maps into sub-materials. As classical segmentation [40] does not allow for smooth boundaries, we use alpha matting [12]. Inspired by KNN matting, our algorithm allows users to draw a few strokes to conveniently indicate different regions of interest.
As an affinity-based algorithm, KNN matting [12] relies on a feature vector \(X(i)\) for each pixel \(i\) of the image to compute an affinity matrix. For SVBRDF maps, traditionally used features are albedo color (r, g, b), height (h), roughness (\(\alpha\)), and position (x,y):
\begin{equation} X(i)=(r(i), g(i), b(i), h(i), \alpha (i), x(i), y(i)). \end{equation}
(1)
We propose an additional feature, based on the noise spectrum, enforcing the statistical uniformity expected from decomposed sub-materials. To distinguish between differences in local noise, we take the noise Fourier spectrum into account. Notice that although position features \((x, y)\) are taken into account, we do not enforce spatial continuity of the regions: we sample multiple non-local neighborhoods with different weights of \((x, y)\) to explore non-locality (as done in the original KNN matting).
We estimate the local spectrum at each pixel \(i\) using Welch’s method [60] and reduce its dimensionality to 3 using Principal Component Analysis (PCA). We therefore compute the affinity matrix and matting using a feature vector \(X(i)\) composed of albedo color, height, roughness, position, and our spectrum estimation.
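A minimal sketch of this feature construction is shown below, under simplifying assumptions: a plain windowed periodogram on a patch grid stands in for Welch's overlapped averaging, and the grid features are nearest-neighbor upsampled to per-pixel resolution. The patch size and stride are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

def spectrum_features(gray, patch=32, stride=8):
    """Per-pixel spectral features for the matting affinity. A windowed
    periodogram on a patch grid stands in for Welch's method; the 3-D
    PCA reduction follows Section 4.1."""
    h, w = gray.shape
    win = np.outer(np.hanning(patch), np.hanning(patch))
    specs = []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            tile = gray[y:y+patch, x:x+patch] * win
            specs.append((np.abs(np.fft.fft2(tile))**2).ravel())
    specs = PCA(n_components=3).fit_transform(np.log1p(np.array(specs)))
    ny, nx = (h - patch) // stride + 1, (w - patch) // stride + 1
    grid = specs.reshape(ny, nx, 3)
    # nearest-neighbor upsample the patch grid back to image resolution
    yi = np.clip(np.arange(h) // stride, 0, ny - 1)
    xi = np.clip(np.arange(w) // stride, 0, nx - 1)
    return grid[yi][:, xi]

def matting_features(albedo, height, rough, spec):
    """Assemble X(i) from Equation (1) plus the spectrum feature."""
    h, w = height.shape
    ys, xs = np.mgrid[0:h, 0:w] / max(h, w)   # normalized positions
    return np.concatenate([albedo, height[..., None], rough[..., None],
                           xs[..., None], ys[..., None], spec], axis=-1)
```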
Alpha matting results in multiple alpha maps that we process to generate binary mask maps: each pixel is assigned to the layer for which it has the highest alpha value. The thresholded alpha matting better represents the transition between sub-materials than direct segmentation methods. Figure 5 shows a comparison between the decomposition results with and without spectrum features on example textures. In these two images, color and position features do not provide enough hints to separate two layers, because they are similar in color but differ in noise spectra. More decomposition results on material maps with user scribbles can be found in our supplemental document.
Fig. 5.
Fig. 5. Image decomposition with and without spectrum features. The first image is a synthetic image, while the second image is a real-world texture. User scribbles are visualized as an additional layer superimposed over the input image, where blue and green scribbles indicate different matting layers. Matting results with spectrum features (bottom row) are better than the results without spectrum features (top row), because color features here cannot provide sufficient information to separate the two layers. Mask maps generated using spectrum features are also more contiguous and less fragmented.
Once a material is decomposed, we provide the option to further process each sub-material using the same matting algorithm with the newly generated mask map(s) as an additional constraint.

4.2 Instance-based Decomposition

As an alternative to a progressive decomposition into sub-materials, we provide a lightweight solution for instance-based decomposition. This is particularly useful for repeating materials such as a tile wall composed of different types of tiles (the input shown in Figure 6). Rather than manually segmenting each different tile, we frame this as an instance detection and clustering problem, using—in this example—the mask map of segmented tiles to extract each tile instance. Different instances correspond to disconnected regions in the mask map segmented from the previous layer in the decomposed material tree. We then scale each instance to the same size based on its bounding box. For each instance, we estimate its color histogram and local spectrum as features and build a feature matrix for agglomerative clustering, as shown in Figure 6. The clustering result allows us to extract the different types of tiles, their frequency, and their sub-materials, so that we can assign them similar frequencies in the procedural model.
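A sketch of this instance clustering is given below, assuming each disconnected mask region is one instance; the histogram bin count, normalized patch size, and cluster count are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage
from skimage.transform import resize
from sklearn.cluster import AgglomerativeClustering

def cluster_instances(mask, albedo, n_clusters=3, size=64):
    """Instance-based decomposition sketch: disconnected mask regions
    become instances, each described by a color histogram and a power
    spectrum, then grouped by agglomerative clustering."""
    labels, n = ndimage.label(mask)
    feats = []
    for i in range(1, n + 1):
        ys, xs = np.where(labels == i)
        crop = albedo[ys.min():ys.max()+1, xs.min():xs.max()+1]
        crop = resize(crop, (size, size), anti_aliasing=True)  # same-size instances
        hist = np.concatenate([np.histogram(crop[..., c], bins=16,
                                            range=(0, 1), density=True)[0]
                               for c in range(3)])             # color histogram
        spec = np.log1p(np.abs(np.fft.fft2(crop.mean(-1)))**2) # local spectrum
        feats.append(np.concatenate([hist, spec.ravel()]))
    clusters = AgglomerativeClustering(n_clusters=n_clusters) \
        .fit_predict(np.array(feats))
    label_map = np.zeros_like(labels)   # cluster id per pixel, 0 = background
    for inst in range(1, n + 1):
        label_map[labels == inst] = clusters[inst - 1] + 1
    return label_map
```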
Fig. 6.
Fig. 6. Instance-based decomposition for sub-materials. Instead of manually segmenting each sub-material, we extract instances in the input texture and build a feature vector for each of them. Agglomerative clustering is then performed on the resulting feature matrix, yielding multiple clusters of similar sub-materials. Their spatial distribution is visualized as a label map, where different colors indicate different clusters. Finally, we fit a different procedural model for each cluster.

5 Material Proceduralization and Recomposition

After decomposing the input SVBRDF maps into a tree of sub-materials and mask maps, we traverse the tree and convert each component into a procedural version layer-by-layer. Finally, we introduce optimizable parameters during the final recomposition phase to best match the input SVBRDF.

5.1 Multi-layer Procedural Noise Model

In this section, we describe the conversion of the segmented sub-materials into procedural models. Each leaf node in our tree is a sub-material, while the intermediate nodes store the mask maps. Each sub-material is represented by a set of masked SVBRDF maps. We use a multi-layer procedural noise model to proceduralize their texture appearance. As our images are masked and incomplete, the spectrum of the entire image is unavailable, preventing the direct fitting of procedural noise models using noise synthesis methods, e.g., Galerne et al. [23, 24], Gilet et al. [28], and Heitz and Neyret [35]. While Guingo et al. [31] propose estimating local spectra in the valid regions only, their approach cannot capture the global variation of the noise textures, e.g., when the scale of the randomness is comparable to the scale of the full image. We leverage a similar sliding window approach but propose a multi-layer procedural model—shown in Algorithm 1—to deal with incomplete images. Given an input masked image, we decompose it into several noise layers using a progressive filtering strategy. For each layer, we filter our input image with a Gaussian kernel. The kernel size of the filter becomes larger as the number of layers grows, aiming at capturing larger-scale spatial variation. For each filtered layer, we estimate its local spectrum and convert it to a procedural noise model similar to Guingo et al. [31]. Finally, our algorithm uses the mean value of the filtered image \(I\) as the base color \(C\) and extracts the final noise layer \(N \leftarrow I-C\).
This last layer, however, represents the lowest frequency of the input image, preventing the use of sliding window spectrum estimation. As each sub-material is masked, the full image spectrum is unavailable. We therefore inpaint the missing data [57, 7]. To reduce artifacts introduced by this step, we apply a set of Gabor noises as a basis to approximate the power spectrum of the inpainted image [24]. This last step and our use of multi-layer noise allow our method to produce a fully procedural representation of the sub-material’s properties. Figure 7 shows an example of our multi-layer decomposition on a masked colored texture image, where each layer captures different levels of detail from the input image and the last layer shows the global variation of the noise image.
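The progressive filtering at the core of this model can be sketched as follows for a single-channel map (each map, or PCA channel, is processed independently). The sigma schedule is an assumption, and the per-layer spectrum fit [31] and the inpainting of the final layer are omitted.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def multilayer_decompose(image, mask, sigmas=(1, 4, 16)):
    """Progressive filtering sketch for the multi-layer noise model:
    each pass smooths the masked image with a growing Gaussian kernel,
    and the difference between successive passes is one noise layer.
    Returns the noise layers plus the base color C."""
    mask = mask.astype(float)
    prev = image * mask
    layers = []
    for s in sigmas:
        # normalized convolution, so masked-out pixels do not bleed in
        smooth = gaussian_filter(image * mask, s) / \
                 (gaussian_filter(mask, s) + 1e-8) * mask
        layers.append(prev - smooth)    # detail band removed by this pass
        prev = smooth
    base = prev[mask > 0].mean()        # base color C
    layers.append(prev - base * mask)   # final layer N = I - C (lowest frequency)
    return layers, base
```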
Fig. 7.
Fig. 7. Our multi-layer procedural noise model on a colored texture image. Black regions show masked, unavailable regions. The colors of the noise images are contrast-enhanced for visualization. The synthetic noise (bottom row) is procedural noise estimated from the filtered noise (middle row), and the synthetic texture (middle image in the top row) is computed by adding all the synthetic procedural noise layers together with the base color.
Using the procedural noise \(p_i\) fitted to each layer \(i\) and the base color \(C\), we reconstruct the original texture as \(\sum_{i=1}^{n} p_i + C\), where \(n\) is the number of noise layers. We further improve the results of this reconstruction for SVBRDF maps by introducing a differentiable rendering-based optimization scheme, described in Section 5.3. Our approach provides a fully procedural representation and can reproduce fine-grained details and yield a smoother noise texture than single-layer methods (see Figure 8). Furthermore, our multi-layer approach gives users more control over the level of detail.
Fig. 8.
Fig. 8. Comparison between multi-layer and single-layer noise on colored texture images. Compared to a single-layer approach, our multi-layer method reconstructs a smoother texture image and captures more fine-grained details, while the single-layer approach generates apparent repetitive and “dirty” patterns. Textures are segmented by our spectrum-aware matting method, fitted, and recomposed to generate the results.

5.1.1 Images with Multiple Channels.

Synthesizing colored images, such as albedo maps, channel by channel in RGB could result in poor color mixing. Similar to previous approaches [34, 23, 31], given a multi-channel noise texture, we synthesize it in a PCA color space. We project the image from the original RGB space to the PCA space, synthesize each channel independently, and project the result back to the original RGB space. We then match the histogram of the synthesized image to the input image as a post-processing step to ensure a matching color distribution.
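A minimal sketch of this decorrelated synthesis is shown below; synth_channel is a placeholder for the per-channel noise model, and the histogram matching uses scikit-image (channel_axis requires scikit-image >= 0.19).

```python
import numpy as np
from sklearn.decomposition import PCA
from skimage.exposure import match_histograms

def synthesize_color(texture, synth_channel):
    """Synthesize a colored texture in a decorrelated PCA color space,
    then histogram-match the result to the exemplar. synth_channel is
    a hypothetical per-channel synthesis function."""
    h, w, _ = texture.shape
    pca = PCA(n_components=3).fit(texture.reshape(-1, 3))
    chans = pca.transform(texture.reshape(-1, 3)).reshape(h, w, 3)
    synth = np.stack([synth_channel(chans[..., c]) for c in range(3)], axis=-1)
    rgb = pca.inverse_transform(synth.reshape(-1, 3)).reshape(h, w, 3)
    return match_histograms(rgb, texture, channel_axis=-1)
```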

5.2 Procedural Mask Synthesis

While we represent the sub-material local variations using our multi-layer noise model, we use mask maps to represent the global spatial distributions of the sub-materials. Proceduralizing these mask maps allows us to reach a fully procedural representation of the material. With it, we can easily edit, resample, and extend the resolution of the spatial distribution. Similar to sub-material proceduralization, mask proceduralization is done recursively during traversal of the material tree. We model mask maps with two methods: (1) Point Process Texture Basis Functions (PPTBFs) [30] for binary mask maps decomposed by our matting algorithm (Section 4.1), and (2) random sampling for decomposed instances (Section 4.2).

5.2.1 Inverse Mask Fitting by PPTBF.

As introduced in Guehl et al. [30], PPTBFs are defined by the sparse convolution of randomly sampled 2D points \(\mathbf {x_i}\) with a kernel function being the product between a visual feature \(f\) and a blending window \(w\):
\begin{equation} PPTBF_k(\mathbf {x})=\sum _{\mathbf {x_i} \in \mathcal {N}_k(\mathbf {x})} f(\mathbf {x}-\mathbf {x_i}) w(\mathbf {x}-\mathbf {x_i}), \end{equation}
(2)
where \(\mathcal {N}_k(\mathbf {x})=\lbrace \mathbf {x_1},\ldots ,\mathbf {x_k}\rbrace\) describes the \(k\) closest sample points around \(\mathbf {x}\). According to Equation (2), the behavior of PPTBF depends on the 2D spatial point distribution of \(\mathbf {x_i}\), the visual feature function \(f\), and the blending window function \(w\), which are in turn controlled by intuitive parameters such as kernel size, degree of smoothing, and so on, to produce a continuous scalar field. After applying a threshold, we use PPTBF to model binary mask maps segmented by our matting algorithm. In this section, we first describe the fitting of mask maps at the top level of the material tree, where the mask covers the entire image, and then discuss fitting masks lower in the hierarchy, where they contain missing regions.
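Before turning to the fitting, the forward evaluation of Equation (2) can be transcribed directly, as in the sketch below; the Gaussian callables standing in for the feature and window functions are illustrative assumptions, whereas the real PPTBF parameterizes them with kernel size, smoothing, and point-process choices [30].

```python
import numpy as np
from scipy.spatial import cKDTree

def pptbf(points, xy, k, feature, window):
    """Evaluate Equation (2): a sparse convolution over the k nearest
    point-process samples, with a kernel that is the product of a
    visual feature f and a blending window w."""
    _, idx = cKDTree(points).query(xy, k=k)   # N_k(x): k closest samples
    diff = xy[:, None, :] - points[idx]       # x - x_i, shape (N, k, 2)
    return (feature(diff) * window(diff)).sum(axis=1)

# usage: Gaussian feature and window over a random point process,
# thresholded to a binary mask as in Section 5.2
pts = np.random.rand(256, 2)
grid = np.stack(np.meshgrid(np.linspace(0, 1, 128),
                            np.linspace(0, 1, 128)), -1).reshape(-1, 2)
field = pptbf(pts, grid, k=4,
              feature=lambda d: np.exp(-40 * (d**2).sum(-1)),
              window=lambda d: np.exp(-10 * (d**2).sum(-1)))
mask = field.reshape(128, 128) > np.median(field)
```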
As PPTBF is a forward generation process, Guehl et al. [30] did not focus on by-example modeling. The authors suggested an approach that queries a precomputed PPTBF database and then optimizes parameters, but did not provide details. We therefore experimented with different feature representations and optimization routines and here discuss the techniques we chose. The detailed implementation of our query-and-optimization method can be found in our released code.
Query: As PPTBFs are not fully differentiable and partly rely on discrete parameters, starting from a good initialization is particularly important for efficient optimization. We first query a database, uniformly covering the variations allowed by PPTBFs, to retrieve the nearest-neighbor parameters of our input mask. We use the mask image database provided in Guehl et al. [30]. The database contains 450K images and was generated by non-regular sampling with three different thresholds for binarization. To measure the similarity between the input mask and pre-sampled mask maps in the database, we choose Local Binary Patterns (LBP) to encode local statistics, the Gram matrix of pre-trained VGG19 [55] deep features for global statistics, and the Fourier power spectrum for regularity. We weight and concatenate these three features to build a high-dimensional vector acting as a descriptor for binary mask maps. We precompute such a vector for each mask map in the database and reduce its dimensionality to 512 with PCA for a more robust representation. The storage of the processed database is 7.81 GB in total. We build an acceleration structure for fast nearest-neighbor search based on the \(L_2\) distance between feature vectors. The precomputation takes around 36 hours, but each query then requires less than 1 second.
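A sketch of this descriptor is shown below; the feature weights, VGG layer cut, and LBP settings are assumptions (weights='DEFAULT' requires torchvision >= 0.13), and the PCA reduction to 512 dimensions over the whole database is omitted.

```python
import numpy as np
import torch, torchvision
from skimage.feature import local_binary_pattern

# early VGG19 layers for Gram statistics (the layer cut is an assumption)
vgg = torchvision.models.vgg19(weights='DEFAULT').features[:9].eval()

def mask_descriptor(mask, w=(1.0, 1.0, 1.0)):
    """Weighted concatenation of an LBP histogram (local statistics),
    a VGG19 Gram matrix (global statistics), and the Fourier power
    spectrum (regularity) for the database query."""
    lbp = local_binary_pattern(mask.astype(float), P=8, R=1, method='uniform')
    lbp_hist = np.histogram(lbp, bins=10, range=(0, 10), density=True)[0]
    with torch.no_grad():
        x = torch.from_numpy(mask).float()[None, None].repeat(1, 3, 1, 1)
        f = vgg(x).flatten(2).squeeze(0)               # (C, H*W) features
        gram = (f @ f.T / f.shape[1]).numpy().ravel()  # Gram matrix
    spec = np.log1p(np.abs(np.fft.fft2(mask))**2).ravel()
    return np.concatenate([w[0] * lbp_hist, w[1] * gram, w[2] * spec])
```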
Optimization: After we retrieve a pre-sampled binary mask map from the database, we use its parameters to initialize our optimization algorithm to better match the input mask. Since PPTBF contains both continuous and discrete parameters, we apply coordinate descent, optimizing the continuous and discrete ones alternately. As evaluation of PPTBF is expensive and not easily differentiable, we apply gradient-free approaches to avoid costly finite differencing. Our implementation for PPTBF optimization is CPU-based. For continuous parameters, we apply the Powell method (SciPy [59]), and for discrete parameters, we adopt a Bayesian optimization method with a Gaussian Process (GPyOpt [6]).
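The alternation can be sketched as below; for brevity, an exhaustive sweep over each discrete coordinate stands in for the paper's Bayesian optimization (GPyOpt), and loss is any scalar mismatch between the rendered PPTBF mask and the input mask.

```python
import numpy as np
from scipy.optimize import minimize

def fit_pptbf(loss, cont0, disc0, disc_domain, rounds=3):
    """Coordinate descent over PPTBF parameters: gradient-free Powell
    steps on the continuous block alternate with a per-coordinate sweep
    over the discrete block (a stand-in for Bayesian optimization)."""
    cont, disc = np.asarray(cont0, float), list(disc0)
    for _ in range(rounds):
        # continuous block: Powell method, no gradients needed
        cont = minimize(lambda c: loss(c, disc), cont, method='Powell').x
        # discrete block: sweep each coordinate over its allowed values
        for i, choices in enumerate(disc_domain):
            disc[i] = min(choices,
                          key=lambda v: loss(cont, disc[:i] + [v] + disc[i+1:]))
    return cont, disc
```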
As Figure 9 shows, our query results provide an initial guess about the general structure, but it can be hard to find a perfect match in the database due to sparse sampling. Our optimization step can correct the parameters and match the structure of the input image.
Fig. 9.
Fig. 9. Examples of our procedural mask synthesis method. Given an input binary mask map (left) segmented by our spectrum-aware matting method, we first query a similar mask map (middle) from a database and use it as an initialization of our alternating optimization algorithm to correct and match its structure to the input (right).

5.2.2 Incomplete Mask Maps.

Unlike masks at the top layer of our tree, masks for sub-materials lower in the hierarchy are themselves masked by higher-level masks. Processed naively, this leads to poor procedural reconstruction of the sub-material distribution.
To fit incomplete mask maps, we propose two solutions. First, we adjust our optimization by computing losses between the input mask and the PPTBF output only for unmasked pixels. Second, we propose proceduralizing an inpainted version—using PatchMatch [7]—of incomplete mask maps. However, features in the mask map alone are not sufficient to guide inpainting. Instead, we inpaint the sub-material and compute the mask map on it, as shown in Figure 10. Because PatchMatch inpainting is a sample-based method, this second approach works best for stochastic distributions.
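The first solution amounts to restricting the fitting loss to observed pixels, as in this small sketch:

```python
import numpy as np

def masked_mask_loss(pred, target, valid):
    """Mismatch between the PPTBF output and the input mask, computed
    only on pixels the parent mask marks as observed (Section 5.2.2)."""
    v = valid.astype(bool)
    return np.mean((pred[v].astype(float) - target[v].astype(float))**2)
```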
Fig. 10.
Fig. 10. Mask inpainting. We apply a hierarchical segmentation approach. We first segment the chipped area, leaving the yellow paint and the spaces between the planks (Masked Input). If we directly segmented the masked input, then the result (Incomplete Mask) would be difficult to fit procedurally. We need to separate the influence of the parent mask (chipped area) from the sub-parts to separate (planks from the spaces in between). To do so, we use inpainting. Directly inpainting the incomplete mask fails, as binary masks cannot provide sufficient hints to guide the inpainting (Directly Inpainted Mask). Instead, we inpaint the input image using PatchMatch (Inpainted Input), on which we compute the segmentation, producing a better mask (Segmented Mask).

5.2.3 Random Sampling.

When mask maps are generated using our instance-based decomposition solution, we can model their distributions by random sampling. Suppose we have \(n\) mask maps, each representing one type of segmented instance. We count the number of instances in each mask map and estimate their probability of occurrence. During mask synthesis, we randomly—following the estimated distribution—assign a label between 1 and \(n\) to instances in the procedural version of the mask map. The corresponding sub-materials are then synthesized in the labeled regions.
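This amounts to one categorical draw per instance, sketched below:

```python
import numpy as np

def assign_instance_labels(counts, n_instances, seed=None):
    """Estimate occurrence probabilities from per-type instance counts
    (one count per mask map), then draw a type label in 1..n for each
    instance of the synthesized mask map."""
    rng = np.random.default_rng(seed)
    probs = np.asarray(counts, float) / np.sum(counts)
    return rng.choice(len(counts), size=n_instances, p=probs) + 1
```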

5.3 Recomposition

Finally, using our generated tree of procedural noise maps and masks, we compose them into an output SVBRDF to reproduce the appearance of the original SVBRDF inputs. To relate this approach to a classical Substance Designer [1] pipeline, our procedural noise maps and binary masks function as generator nodes, which do not rely on existing artist-designed graphs.
To better match the original SVBRDF inputs, we add optimizable operators to control the appearance of procedural noise and binary maps. Given a noise map \(I\), we modify its appearance by \(G(I*\alpha +\delta , \sigma)\), where \(G\) is a Gaussian filter, \(\alpha\) controls the intensity, \(\delta\) biases the noise, and \(\sigma\) is the standard deviation of the Gaussian filter. For a binary mask map \(M\) that represents the distribution of sub-materials, we model the smooth transition between the boundaries of sub-materials by \(G(M, \sigma)\). The parameters \(\alpha\), \(\delta\), and \(\sigma\) are optimizable for each Gaussian filter per noise or mask map. In each leaf node of our tree, the noise maps are multiplied by their corresponding mask maps and linearly combined to reconstruct the SVBRDF maps for a procedural sub-material. Procedural sub-materials from different layers are recursively computed and aggregated in a bottom-up fashion to build the final output SVBRDF maps.
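One way to make \(G(I*\alpha+\delta, \sigma)\) end-to-end optimizable in PyTorch is to build the Gaussian kernel from \(\sigma\) itself, so gradients reach all three parameters; the fixed kernel size is an assumption. The mask operator \(G(M, \sigma)\) is the same module with \(\alpha=1\), \(\delta=0\).

```python
import torch
import torch.nn.functional as F

class NoiseFilter(torch.nn.Module):
    """Optimizable operator G(I * alpha + delta, sigma): learnable
    intensity, bias, and a differentiable separable Gaussian blur."""
    def __init__(self, ksize=11):
        super().__init__()
        self.alpha = torch.nn.Parameter(torch.tensor(1.0))  # intensity
        self.delta = torch.nn.Parameter(torch.tensor(0.0))  # bias
        self.sigma = torch.nn.Parameter(torch.tensor(1.0))  # blur width
        self.register_buffer('taps', torch.arange(ksize).float() - ksize // 2)

    def forward(self, img):                       # img: (B, C, H, W)
        c = img.shape[1]
        g = torch.exp(-0.5 * (self.taps / self.sigma.clamp(min=1e-3))**2)
        g = g / g.sum()                           # normalized 1-D kernel
        kx = g.view(1, 1, 1, -1).repeat(c, 1, 1, 1)
        ky = g.view(1, 1, -1, 1).repeat(c, 1, 1, 1)
        out = img * self.alpha + self.delta
        out = F.conv2d(out, kx, padding=(0, g.numel() // 2), groups=c)
        return F.conv2d(out, ky, padding=(g.numel() // 2, 0), groups=c)
```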
These optimizable operators, together with our reconstructed noise maps and binary maps, build a small optimizable material graph. We optimize this material graph using a differentiable rendering-based optimization routine. The reconstructed SVBRDF maps and the input SVBRDF maps are rendered using a Cook-Torrance shading model with a GGX distribution under randomly sampled lighting configurations. Considering that the structure of the procedural SVBRDF maps might not align perfectly with the original inputs, we define the loss as a combination of a style loss and an SSIM loss. The style loss is defined as the \(L_1\) difference between Gram matrices computed over VGG [55] features of the renderings, similar to the style transfer literature [27]. The SSIM loss is computed as the \(L_1\) difference between the structural similarity (SSIM) indices [64] of the renderings. The full loss is written as
\begin{equation} L =\sum \Vert GM(I_i)-GM(I^*_i)\Vert _1 + \beta \Vert SSIM(I_i) - SSIM(I^*_i)\Vert _1, \end{equation}
(3)
where \(GM\) is the Gram matrix and SSIM is an operator computing the structural similarity index; \(I_i\) and \(I^*_i\) are the input/procedural albedo map, normal map (computed from the height map), roughness map, and their renderings; \(\beta\) balances the weight between the style term and the SSIM term. In Figure 11, we show an example with and without this optimization process. Often, in real data, local texture appearance cannot be well represented by Gaussian noise models, resulting in fitting errors and artifacts when generating procedural models (third row in Figure 11). This is particularly problematic for normal/height map modeling. Our optimization helps refine the parameters of our procedural graph, significantly improving the results, even when the exemplar violates our assumption of locally uniform appearance, as in Figure 11.
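A sketch of the loss of Equation (3) is given below. The VGG layer selection and the weight beta are assumptions, and ssim_index is a hypothetical helper returning SSIM index maps (e.g., from a library such as pytorch-msssim), following the paper's formulation of the SSIM term.

```python
import torch, torchvision

# frozen VGG19 trunk for style features; the layer picks are assumptions
vgg = torchvision.models.vgg19(weights='DEFAULT').features[:27].eval()
for p in vgg.parameters():
    p.requires_grad_(False)
STYLE_LAYERS = {3, 8, 17, 26}   # relu1_2, relu2_2, relu3_4, relu4_4

def gram_feats(img):
    """Gram matrices of VGG19 features for the style term of Eq. (3)."""
    grams, x = [], img
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in STYLE_LAYERS:
            f = x.flatten(2)                        # (B, C, H*W)
            grams.append(f @ f.transpose(1, 2) / f.shape[-1])
    return grams

def render_loss(render, render_ref, ssim_index, beta=0.1):
    """Style + SSIM loss between procedural and input renderings.
    ssim_index is a hypothetical SSIM-map helper; beta is an assumed
    weight for the SSIM term."""
    style = sum(torch.abs(g - gr).mean()
                for g, gr in zip(gram_feats(render), gram_feats(render_ref)))
    structural = torch.abs(ssim_index(render) - ssim_index(render_ref)).mean()
    return style + beta * structural
```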
Fig. 11.
Fig. 11. Validation of our optimization-based recomposition. In this case, the “input” material maps are generated using Deschaintre et al. [17] from natural image data where lighting is not completely removed from the material maps. Additionally, local variations of the different material maps cannot be well represented with simple Gaussian noise, especially in the normals. Directly fitting these local textures using our multi-layer noise model leads to large fitting errors and visual artifacts (third row). Using our differentiable optimization approach, our method automatically refines the procedural parameters of our output during the recomposition step (second row). Thanks to this optimization, we improve the stability of our method for arbitrary inputs. For better visualization, we directly use the KNN-segmented mask rather than a procedural counterpart.

5.4 Final Procedural Representation

Our final procedural representation of material maps is the equivalent of a small material graph with (i) procedural noise maps, (ii) procedural mask maps, and (iii) optimizable operators, similar to Figure 2. In Shi et al. [53]’s terminology, each procedural noise and mask model is a generator, while optimizable operators serve as filters. As a fully procedural model, the editable parameters of our small material graph are (i) the parameters of our generators, e.g., noise models and mask models, and (ii) the parameters in the filter nodes. Editing can be done on individual maps or jointly on all of them. The procedural model can then generate the material at arbitrary resolution and scale.

6 Results

We demonstrate our pipeline with a variety of materials as inputs. Each material is defined by a set of SVBRDF maps including albedo map, normal map, and roughness map—these are the maps our input materials use, but our method can adapt to any additional or different gray-scale or color maps.
Our pipeline is implemented in Python and partly relies on MATLAB. Differentiable optimization of our material graphs (Section 5.3) is implemented in PyTorch. We adopt an L-BFGS-B optimizer to optimize our material graph with a learning rate of 0.005. It takes around 200 steps—2 minutes on an Nvidia RTX 2070 Super GPU with an Intel Core i7-9700 CPU—to converge. A complete inverse material modeling process takes less than 5 minutes of user interaction and about 20 minutes of computation, depending on the complexity of the input material. The typical computation time is 2 minutes for spectrum-aware matting; 13 minutes for the procedural mask query (1 s) and optimization (depending on initialization quality); 3 minutes for multi-layer noise modeling; and 2 minutes for optimization-based recomposition. We provide these times for reference but highlight that our current implementation is not optimized.
Figure 12 shows our inverse procedural modeling results on different materials. For each example, the inputs are the albedo, normal, and roughness maps only. We decompose these material maps using our hierarchical segmentation methods and visualize the computed labeled mask maps. Given the material maps and computed hierarchical mask maps, we generate procedural materials and output albedo, normal, and roughness maps (Section 5). The results show that our pipeline can reproduce a variety of stochastic and regular materials well. We see that our method recovers both large-scale patterns and fine-grained details, thanks to our sub-material decomposition approach. The general structure and texture appearance are not perfectly registered with the original input, because our model is a procedural approximation of it. We aim at reproducing its appearance rather than a pixel-perfect match [15], allowing us to preserve the material’s global appearance while sampling a new realization of it or editing it, as shown in Figure 16.
Fig. 12.
Fig. 12. Results of our method for different materials. We show that our pipeline can proceduralize a variety of materials and reproduce their global structures as well as local texture appearance. All of our results are entirely procedural. TL: Top lighting; SL: Side lighting. Please see our supplemental material for more results.

6.1 Natural Images as Input

Although our pipeline was designed to take material maps as input, it can also work on natural and flash images, benefiting from existing material acquisition methods [15, 16, 17, 26, 33]. Given a set of captured images, we first apply a material acquisition method to generate material maps and then use the generated SVBRDF maps with our method. Figure 13 shows examples with picture(s) as input. We convert the input natural image(s) to SVBRDF maps using state-of-the-art material acquisition methods [16, 17, 33]. As opposed to “artist-designed” SVBRDF maps or manually post-processed SVBRDF maps, automatically generated SVBRDF maps are not perfectly clean. Different maps can be noisy and exhibit shading and color variations due to incomplete lighting removal. Normal maps in particular can represent strong height variations even within a single sub-material. This makes it challenging to recover an exact match to the input image. Furthermore, irregularities often appear in acquired SVBRDFs that should be regular. We show in Figure 13 that our pipeline is capable of matching the overall appearance of acquired SVBRDFs while also enforcing better regularity.
Fig. 13.
Fig. 13. Picture(s) as input. We apply our method on SVBRDF maps captured from single or multiple flash-lit image(s) [16, 17, 33], showing the diversity of material sources we can handle. Our procedural results enable further creation, provide more regularity, and remove the baked-in lighting and irregularities caused by the few-image capture method. The bottom two examples were segmented using the instance segmentation approach, showing that it can also handle irregular patterns. CTL: Central Top light; TSL: Top Side light. Please see our supplemental material for more results.

6.2 Application

6.2.1 Procedural Material Editing.

Once proceduralized using our method, materials can be further edited. Figure 16 shows examples of operations that we can apply to our results. We can freely edit the way mask maps and noise models are combined, the large-scale structures (e.g., patterns and distributions), and fine-scale details (e.g., fine normals and roughness). Once generated, our procedural model allows users to get interactive feedback on each of their edits. As opposed to Shi et al. [53] and Hu et al. [38], we do not rely on pre-defined material graphs, allowing artists to use our method to generate a small, tune-able, and extendable material graph to start a new design. Furthermore, Hu et al. [38] rely on a style transfer–like post-processing step to better match the input texture appearance, limiting the editability of their final results.

6.2.2 High-resolution Material Generation.

As a fully procedural model, a material converted by our pipeline can be expanded to arbitrary size. Global structures can be reproduced and extended using the procedural mask map with PPTBF and random sampling, while the appearance of sub-materials can be losslessly synthesized to higher resolution using procedural noise models. Figure 14 shows examples of high-resolution material synthesis. The resolution of the resulting material and rendered images is 2,048 \(\times\) 1,024. We also show an example of material super-resolution in Figure 15, where the resolution of the input SVBRDF is 512 \(\times\) 512, and we double its size in each dimension by procedurally upsampling its mask maps. This operation does not affect the material’s global scale, providing a super-resolved material rather than a larger-scale one, as demonstrated in Figure 14. More high-resolution materials generated by our method are shown in the supplemental material.
Fig. 14.
Fig. 14. High-resolution material synthesis. We show the input material in the bottom right corner and render a high-resolution (2,048 \(\times\) 1,024) version generated with our method, lit from the top. By applying our pipeline, we convert these pixel-based material maps to a procedural representation, thus enabling arbitrary scale and resolution expansion without repetition.
Fig. 15.
Fig. 15. Material super-resolution. We show that our model can generate higher-resolution textures without changing their scale. This is achieved by procedurally upscaling only the mask maps, which control the global spatial distributions of sub-materials.
Fig. 16.
Fig. 16. Procedural material editing. Our procedural material model provides editability to users. In the top example, we edit the optimizable filter parameters to change the (1) roughness, (2) normal, and (3) color of the material sequentially; for the second example, we edit parameters in the (1) mask generators (first two edits) to change spatial arrangement and distribution of sub-materials; and (2) noise generators (last edit) to change the fine-scale normals on each tile; we edit all parameters for the third example, where (1) high-frequency shininess is removed; (2) global structures are changed (last two edits); and (3) normals are enhanced, leading to a completely different leathered brick material. Please zoom in to see fine-scale controls.

6.3 Comparison with Prior Work

We compare our results to previous work in inverse procedural modeling and in texture synthesis. Since we take the output of a material acquisition method as input to our pipeline, we do not compare to material acquisition methods.

6.3.1 Inverse Procedural Material Modeling.

We compare our approach with state-of-the-art inverse procedural material modeling frameworks [38, 53] in Figures 18 and 19. Both are built on a collection of pre-defined material graphs and rely on model selection with parameter estimation. While Hu et al. directly apply a neural network to predict parameters, Shi et al. (MATch) optimize parameters by end-to-end differentiable rendering. We use the model selection scheme of Hu et al. to choose input Substance models for both methods. Despite not using pre-existing material graphs, we show that our pipeline reproduces material maps with regular and stochastic patterns well, e.g., brick and stucco, similar to the examples shown in Hu et al. [38]. Hu et al. additionally propose a post-processing step (see Figure 18, third column) to enrich the details, but this step is done on the rendered images (instead of material maps) and loses the procedural aspect.
Direct comparison to MATch [53] is difficult, because the optimization framework of MATch strongly depends on the quality of initialization. Indeed, the MATch framework cannot optimize discrete parameters. This limitation requires the scale of the structured target to be the same as that of the selected procedural material. The user is then required to manually fine-tune discrete parameters in Substance Designer. Additionally, only a subset of the Substance Engine was made differentiable in MATch, limiting the pool of compatible procedural graphs. In Figure 19 and the supplemental material, we attempt to provide good initializations and discrete parameters for MATch.

6.3.2 Texture Synthesis.

We also compare our approach with texture synthesis methods, as they share our method’s ability to extend resolution. We experiment with several state-of-the-art example-based texture synthesis methods and generalize them to accept material maps as input. We stack all material maps together to build a high-dimensional texture map where each texel encodes albedo values, normal directions, and roughness values. Loss functions are computed on each material map and averaged. Because generalizing self-tuning texture optimization [39] to multi-channel material maps is not trivial, we run its algorithm separately on each material map. Figure 17 shows these comparisons, in which the input resolution is 300 \(\times\) 300 and the output resolution is 512 \(\times\) 512. We see that our method is capable of extending both structured and stochastic materials and does not suffer from artifacts seen in traditional texture synthesis approaches, such as structural errors. Finally, and most significantly, compared with these other texture synthesis approaches, our method enables editing and fast synthesis at larger resolution without necessarily augmenting the scale.
Fig. 17.
Fig. 17. Comparison of our method with example-based texture synthesis methods on SVBRDF maps. We generalize these methods to process multi-channel SVBRDF maps. We show our method; InGAN [54]; Non-stationary Texture Synthesis by Adversarial Expansion [63]; Self-tuning Texture Optimization [39]; Image Quilting [20]. Images shown here are rendered by GGX shading model.
Fig. 18.
Fig. 18. Comparison of our method to Hu et al. [38]. The second column shows the procedural results predicted by Hu et al. [38], while the third column is their style-augmented results (non-procedural and with no editability). In contrast to their method, our pipeline generates fully procedural materials without a pre-existing material graph as an auxiliary input. The images are rendered using Blender with diffuse reflectance to match Hu et al. [38].
Fig. 19.
Fig. 19. Comparison of our method to MATch [53]. We use Hu et al. [38]’s framework to select a close Substance model (Default), and use MATch to optimize its appearance. As the MATch framework does not handle discrete parameters and requires good initialization, it can generate poor output if the initialization and the discrete parameters are not hand-tuned. This can give poor results for bricks, as shown in the supplemental material. Materials are rendered using the GGX shading model.

7 Limitations

7.1 Texture Taxonomy for Proceduralization

To characterize the limitations of our work, we describe the space of possible materials we attempt to model. We consider materials as collections of elements. The space of different element collections has four dimensions defining the complexity of proceduralizing a material, illustrated in Figure 20. The first axis is the number of different material element types in the texture. Both the types of elements and their spatial distribution relative to one another need to be modeled. The second axis is the nature of the spatial distribution: whether the elements of the texture are sparsely or densely positioned, with overlap for example. Dense spatial distributions are harder to segment using masks. The third axis is the complexity of the contours of the spatial mask: the more complex they are, the more difficult it is for PPTBF or any procedural approach to represent them faithfully. The fourth axis is the texture within each element: whether it can easily be represented as noise or is more structured and semantically meaningful.
Fig. 20.
Fig. 20. Illustration of the four dimensions of variation of element collections in the space of material textures that we model.
Our general pipeline, which segments the SVBRDF and then estimates parameters for each element type, applies to the complete space we have described. However, our current implementation is limited to the less complex end of each of the last three axes. Our segmentation method is limited in that it cannot appropriately segment dense, overlapping materials. As stated, PPTBF performs well with irregular simple contours but not with complex contours. Finally, for the fourth axis, we have not developed a method to proceduralize semantically meaningful elements such as the bunny.

7.2 Specific Limitation Examples

Our results show that our pipeline is able to proceduralize different types of materials with multiple SVBRDF maps, e.g., albedo, normal, and roughness. In Figure 21, we show failure cases that result from the general limitations we just described. The combination of complex contours and a semantically meaningful structure results in the failure shown in the top row of Figure 21. The complex contours form a “flower” shape that is not preserved by the PPTBF mask. The presence of a sub-material that is itself composed of a densely packed set of elements results in the failure shown in the bottom row of Figure 21. Neither a noise pattern nor our segmentation method can capture the arrangement of the small pebbles forming the fill material between the square elements. Finer segmentation could produce better results, but the overlapping partial shapes would still be difficult to match using PPTBF. In some cases, such as the second example of Figure 17, the retrieved procedural mask has small differences in the contours compared to the original. Adding a loss more sensitive to contours could improve such cases.
Fig. 21.
Fig. 21. Failure cases on a semantically shaped pattern and a densely overlapping material. The input images are a captured image (first row) and an arbitrary texture (second row). The “Ground Truth” render is generated using the results of Deschaintre et al. [17]. The first row shows a case where our PPTBF-based procedural mask map fails to reproduce the structure of the segmented mask map, missing the semantic “flower” shape. In the second row, one of the sub-materials contains dense, small, overlapping pebbles that can neither be well reproduced by our multi-layer noise model nor easily segmented. As a result, individual pebbles cannot be distinguished in our rendered result. Please zoom in to see details.
Finally, as our procedural material model is built upon multi-scale Gaussian noise models and Gaussian filters, it poorly reproduces highly structured variations, e.g., the extremely strong and directional normal variations seen in Figure 22. In these cases, the segmentation of the height map is also ambiguous, leading to a less faithful procedural reconstruction. A short sketch after Figure 22 illustrates why Gaussian noise models are blind to this kind of structure.
Fig. 22. Failure cases on material exemplars with highly structured variations. We show additional failure cases with extremely structured normal variations, which can neither be (1) identified by segmentation nor (2) recovered by noise models and Gaussian kernels in the transition regions of adjacent sub-materials. Insets on the normal map show the segmentation map and our procedural map. For better visualization, the first material is rendered with a central light, while the second is rendered with a top-side light.
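The root of this limitation can be shown in a few lines of NumPy. A random-phase (Gaussian) noise model, in the spirit of random phase textures [23], reproduces an exemplar's amplitude spectrum but discards its phase, which is precisely where oriented, structured detail lives. This is a didactic sketch, not our multi-layer noise model; taking the real part of the inverse FFT is a common shortcut here.

```python
import numpy as np

def random_phase_noise(exemplar: np.ndarray, seed: int = 0) -> np.ndarray:
    """Noise with the exemplar's amplitude spectrum but random phase."""
    rng = np.random.default_rng(seed)
    mean = exemplar.mean()
    # Keep second-order statistics (power spectrum) of the exemplar...
    amplitude = np.abs(np.fft.fft2(exemplar - mean))
    # ...but replace its phase with uniform random values.
    phase = rng.uniform(0.0, 2.0 * np.pi, exemplar.shape)
    synthesized = np.fft.ifft2(amplitude * np.exp(1j * phase))
    return np.real(synthesized) + mean  # aligned structure is lost

# On a strongly directional pattern (e.g., parallel grooves in a normal
# map), the output has the right frequency content but no aligned features.
```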

8 Future Work

In this work, we propose the first complete pipeline for inverse procedural modeling of general materials and highlight the challenges of each step. We describe here directions for future work to enable better proceduralization.
Modeling. New generative approaches such as deep texture or material generation [37, 36] could help better reproduce complex material appearances that cannot simply be represented by noise models, at the cost of some control.
Segmentation. Our pipeline relies on user input, allowing user control and specification of the artistically important elements in the material. Nevertheless, a fully automatic segmentation would be faster and would allow quick proceduralization of large numbers of materials. Current methods [13, 62, 8, 11] are not geared toward material segmentation and fail on the materials we consider.
Recomposition. Complex details and patterns might be lost during procedural modeling, as our method relies on noise and mask fitting. Introducing advanced differentiable filters and generators, such as those in Substance Designer and MATch [53], to modify the appearance of procedural noise maps and binary maps would make it possible to represent a wider range of appearances, further narrowing the gap between the input and the generated procedural materials. A sketch of one such differentiable filter follows.
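As an example of what such a differentiable filter could look like, the sketch below (assuming PyTorch) implements a levels-style remap with a learnable black point, white point, and gamma, fitted by gradient descent during recomposition. The class and parameter names are hypothetical; this is not the Substance Designer or MATch implementation.

```python
import torch

class DifferentiableLevels(torch.nn.Module):
    """Differentiable black/white point and gamma adjustment."""
    def __init__(self):
        super().__init__()
        self.black = torch.nn.Parameter(torch.tensor(0.0))
        self.white = torch.nn.Parameter(torch.tensor(1.0))
        self.log_gamma = torch.nn.Parameter(torch.tensor(0.0))

    def forward(self, x):
        g = torch.exp(self.log_gamma)  # parameterize gamma to stay positive
        x = (x - self.black) / (self.white - self.black + 1e-6)
        return x.clamp(1e-6, 1.0) ** g

# Placeholder tensors standing in for a noise map and a target sub-material.
noise, target = torch.rand(1, 1, 256, 256), torch.rand(1, 1, 256, 256)
levels = DifferentiableLevels()
opt = torch.optim.Adam(levels.parameters(), lr=0.01)
for _ in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.l1_loss(levels(noise), target)
    loss.backward()
    opt.step()
```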

9 Conclusion

We present the first pipeline for semi-automatic material proceduralization. Given a set of input material maps, our pipeline decomposes them into a tree of sub-materials and corresponding binary mask maps. We model the local appearance of sub-materials with procedural noise models and proceduralize the binary mask maps to reproduce the global distribution of sub-materials.
Compared with previous work [38, 53], our pipeline does not rely on any predefined material graphs. Combined with state-of-the-art material acquisition methods, our approach enables convenient creation of high-quality spatially varying procedural materials. We take a first step toward general material proceduralization and hope that our work and the challenges it highlights will inspire future research.

References

[1]
Adobe. 2021. Substance designer. Retrieved from https://www.substance3d.com/.
[2]
Adib Akl, Charles Yaacoub, Marc Donias, Jean-Pierre Da Costa, and Christian Germain. 2018. A survey of exemplar-based texture synthesis methods. Comput. Vis. Image Underst. 172 (2018), 12–24.
[3]
Yağiz Aksoy, Tae-Hyun Oh, Sylvain Paris, Marc Pollefeys, and Wojciech Matusik. 2018. Semantic soft segmentation. ACM Trans. Graph. 37, 4 (2018).
[4]
Yagiz Aksoy, Tunc Ozan Aydin, and Marc Pollefeys. 2017. Designing effective inter-pixel information flow for natural image matting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 29–37.
[5]
Xiaobo An and Fabio Pellacini. 2008. AppProp: All-pairs appearance-space edit propagation. ACM Trans. Graph. 27, 3 (Aug. 2008), 1–9. DOI:https://doi.org/10.1145/1360612.1360639
[6]
The GPyOpt authors. 2016. GPyOpt: A Bayesian optimization framework in Python. Retrieved from http://github.com/SheffieldML/GPyOpt.
[7]
Connelly Barnes, Eli Shechtman, Adam Finkelstein, and Dan B. Goldman. 2009. PatchMatch: A randomized correspondence algorithm for structural image editing. ACM Trans. Graph. 28, 3 (July 2009). DOI:https://doi.org/10.1145/1531326.1531330
[8]
Sean Bell, Paul Upchurch, Noah Snavely, and Kavita Bala. 2015. Material recognition in the wild with the materials in context database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 3479–3487.
[9]
Urs Bergmann, Nikolay Jetchev, and Roland Vollgraf. 2017. Learning texture manifolds with the periodic spatial GAN. In Proceedings of the 34th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 70), Doina Precup and Yee Whye Teh (Eds.). PMLR, 469–477. Retrieved from http://proceedings.mlr.press/v70/bergmann17a.html.
[10]
Mark Boss, Varun Jampani, Kihwan Kim, Hendrik P. A. Lensch, and Jan Kautz. 2020. Two-shot spatially varying BRDF and shape estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 3982–3991.
[11]
Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. 2018. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV). 801–818.
[12]
Qifeng Chen, Dingzeyu Li, and Chi-Keung Tang. 2013. KNN matting. IEEE Trans. Pattern Anal. Mach. Intell. 35, 9 (2013), 2175–2188.
[13]
Donghyeon Cho, Yu-Wing Tai, and Inso Kweon. 2016. Natural image matting using deep convolutional neural networks. In Proceedings of European Conference on Computer Vision (ECCV). Springer, 626–643.
[14]
Mircea Cimpoi, Subhransu Maji, Iasonas Kokkinos, and Andrea Vedaldi. 2016. Deep filter banks for texture recognition, description, and segmentation. Int. J. Comput. Vis. 118, 1 (2016), 65–94.
[15]
Valentin Deschaintre, Miika Aittala, Frédo Durand, George Drettakis, and Adrien Bousseau. 2018. Single-image SVBRDF capture with a rendering-aware deep network. ACM Trans. Graph. 37, 4 (Aug. 2018).
[16]
Valentin Deschaintre, Miika Aittala, Frédo Durand, George Drettakis, and Adrien Bousseau. 2019. Flexible SVBRDF capture with a multi-image deep network. Comput. Graph. Forum 38, 4 (July 2019), 1–13.
[17]
Valentin Deschaintre, George Drettakis, and Adrien Bousseau. 2020. Guided fine-tuning for large-scale material transfer. Comput. Graph. Forum 39, 4 (2020), 91–105. Retrieved from http://www-sop.inria.fr/reves/Basilic/2020/DDB20.
[18]
Valentin Deschaintre, Yiming Lin, and Abhijeet Ghosh. 2021. Deep polarization imaging for 3D shape and SVBRDF acquisition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[19]
Yue Dong. 2019. Deep appearance modeling: A survey. Vis. Inform. 3, 2 (2019), 59–68.
[20]
Alexei A. Efros and William T. Freeman. 2001. Image quilting for texture synthesis and transfer. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH’01). Association for Computing Machinery, New York, NY, 341–346. DOI:https://doi.org/10.1145/383259.383296
[21]
Alexei A. Efros and Thomas K. Leung. 1999. Texture synthesis by non-parametric sampling. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Vol. 2. IEEE, 1033–1038.
[22]
Sing Choong Foo. 1997. A Gonioreflectometer for Measuring the Bidirectional Reflectance of Material for Use in Illumination Computation. Doctoral Dissertation, Cornell University.
[23]
B. Galerne, Y. Gousseau, and J. Morel. 2011. Random phase textures: Theory and synthesis. IEEE Trans. Image Process. 20, 1 (2011), 257–267.
[24]
Bruno Galerne, Ares Lagae, Sylvain Lefebvre, and George Drettakis. 2012. Gabor noise by example. ACM Trans. Graph. 31, 4 (July 2012). DOI:DOI:https://doi.org/10.1145/2185520.2185569
[25]
B. Galerne, A. Leclaire, and L. Moisan. 2017. Texton noise. Comput. Graph. Forum 36, 8 (2017), 205–218. DOI:https://doi.org/10.1111/cgf.13073
[26]
Duan Gao, Xiao Li, Yue Dong, Pieter Peers, Kun Xu, and Xin Tong. 2019. Deep inverse rendering for high-resolution SVBRDF estimation from an arbitrary number of images. ACM Trans. Graph. 38, 4 (July 2019). DOI:https://doi.org/10.1145/3306346.3323042
[27]
Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. 2016. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2414–2423.
[28]
Guillaume Gilet, Basile Sauvage, Kenneth Vanhoey, Jean-Michel Dischler, and Djamchid Ghazanfarpour. 2014. Local random-phase noise for procedural texturing. ACM Trans. Graph. 33, 6 (Nov. 2014). DOI:https://doi.org/10.1145/2661229.2661249
[29]
Dar’ya Guarnera, Giuseppe Claudio Guarnera, Abhijeet Ghosh, Cornelia Denk, and Mashhuda Glencross. 2016. BRDF representation and acquisition. Comput. Graph. Forum 35, 2 (2016), 625–650.
[30]
Pascal Guehl, Remi Allègre, Jean-Michel Dischler, Bedrich Benes, and Eric Galin. 2020. Semi-procedural textures using point process texture basis functions. Comput. Graph. Forum 39, 4 (2020), 159–171.
[31]
Geoffrey Guingo, Basile Sauvage, Jean-Michel Dischler, and Marie-Paule Cani. 2017. Bi-layer textures: A model for synthesis and deformation of composite textures. Comput. Graph. Forum 36, 4 (2017), 111–122. DOI:https://doi.org/10.1111/cgf.13229
[32]
Yu Guo, Miloš Hašan, Lingqi Yan, and Shuang Zhao. 2020. A Bayesian inference framework for procedural material parameter estimation. Comput. Graph. Forum 39, 7 (2020), 255–266.
[33]
Yu Guo, Cameron Smith, Miloš Hašan, Kalyan Sunkavalli, and Shuang Zhao. 2020. MaterialGAN: Reflectance capture using a generative SVBRDF model. ACM Trans. Graph. 39, 6 (Nov. 2020). DOI:https://doi.org/10.1145/3414685.3417779
[34]
David J. Heeger and James R. Bergen. 1995. Pyramid-based texture analysis/synthesis. In Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH). 229–238.
[35]
Eric Heitz and Fabrice Neyret. 2018. High-performance by-example noise using a histogram-preserving blending operator. Proc. ACM Comput. Graph. Interact. Tech. 1, 2 (Aug. 2018). DOI:https://doi.org/10.1145/3233304
[36]
Philipp Henzler, Valentin Deschaintre, Niloy J. Mitra, and Tobias Ritschel. 2021. Generative modelling of BRDF textures from flash images. ACM Trans. Graph. (Proc. SIGGRAPH Asia) 40, 6 (2021).
[37]
Philipp Henzler, Niloy J. Mitra, and Tobias Ritschel. 2020. Learning a neural 3D texture space from 2D exemplars. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 8356–8364.
[38]
Yiwei Hu, Julie Dorsey, and Holly Rushmeier. 2019. A novel framework for inverse procedural texture modeling. ACM Trans. Graph. 38, 6 (Nov. 2019). DOI:https://doi.org/10.1145/3355089.3356516
[39]
Alexandre Kaspar, Boris Neubert, Dani Lischinski, Mark Pauly, and Johannes Kopf. 2015. Self tuning texture optimization. Comput. Graph. Forum 34, 2 (May 2015), 349–359. DOI:https://doi.org/10.1111/cgf.12565
[40]
Muhammad Waseem Khan. 2014. A survey: Image segmentation techniques. Int. J. Fut. Comput. Commun. 3, 2 (2014), 89.
[41]
Jason Lawrence, Aner Ben-Artzi, Christopher DeCoro, Wojciech Matusik, Hanspeter Pfister, Ravi Ramamoorthi, and Szymon Rusinkiewicz. 2006. Inverse shade trees for non-parametric material representation and editing. ACM Trans. Graph. (Proc. SIGGRAPH) 25, 3 (July 2006).
[42]
Daniel Lepage and Jason Lawrence. 2011. Material matting. ACM Trans. Graph. 30, 6 (Dec. 2011), 1–10. DOI:https://doi.org/10.1145/2070781.2024178
[43]
Anat Levin, Alex Rav-Acha, and Dani Lischinski. 2008. Spectral matting. IEEE Trans. Pattern Anal. Mach. Intell. 30, 10 (2008), 1699–1712.
[44]
Xiao Li, Yue Dong, Pieter Peers, and Xin Tong. 2017. Modeling surface appearance from a single photograph using self-augmented convolutional neural networks. ACM Trans. Graph. 36, 4 (2017).
[45]
Zhengqin Li, Kalyan Sunkavalli, and Manmohan Chandraker. 2018. Materials for masses: SVBRDF acquisition with a single mobile phone image. In Proceedings of the European Conference on Computer Vision (ECCV). 72–87.
[46]
Zhengqin Li, Zexiang Xu, Ravi Ramamoorthi, Kalyan Sunkavalli, and Manmohan Chandraker. 2018. Learning to reconstruct shape and spatially varying reflectance from a single image. ACM Trans. Graph. 37, 6 (Dec. 2018).
[47]
Shervin Minaee, Yuri Boykov, Fatih Porikli, Antonio Plaza, Nasser Kehtarnavaz, and Demetri Terzopoulos. 2020. Image segmentation using deep learning: A survey. arXiv:2001.05566 [cs.CV]
[48]
Eyvind Niklasson, Alexander Mordvintsev, Ettore Randazzo, and Michael Levin. 2021. Self-organising textures. Distill (2021).
[49]
Fabio Pellacini and Jason Lawrence. 2007. AppWand: Editing measured materials using appearance-driven optimization. ACM Trans. Graph. 26, 3 (July 2007), 54–es. DOI:https://doi.org/10.1145/1276377.1276444
[50]
Patrick Pérez, Michel Gangnet, and Andrew Blake. 2003. Poisson image editing. ACM Trans. Graph. 22, 3 (July 2003), 313–318. DOI:https://doi.org/10.1145/882262.882269
[51]
Lara Raad, Axel Davy, Agnès Desolneux, and Jean-Michel Morel. 2018. A survey of exemplar-based texture synthesis. Ann. Math. Sci. Applic. 3, 1 (2018), 89–148.
[52]
Amir Rosenberger, Daniel Cohen-Or, and Dani Lischinski. 2009. Layered shape synthesis: Automatic generation of control maps for non-stationary textures. ACM Trans. Graph. 28, 5 (Dec. 2009). DOI:https://doi.org/10.1145/1618452.1618453
[53]
Liang Shi, Beichen Li, Miloš Hašan, Kalyan Sunkavalli, Tamy Boubekeur, Radomir Mech, and Wojciech Matusik. 2020. MATch: Differentiable material graphs for procedural material capture. ACM Trans. Graph. 39, 6 (Dec. 2020).
[54]
Assaf Shocher, Shai Bagon, Phillip Isola, and Michal Irani. 2019. InGAN: Capturing and retargeting the “DNA” of a natural image. In Proceedings of the IEEE International Conference on Computer Vision (ICCV).
[55]
Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In Proceedings of the 3rd International Conference on Learning Representations. Retrieved from http://arxiv.org/abs/1409.1556.
[56]
Ondrej Št’ava, Bedrich Beneš, Radomir Měch, Daniel G. Aliaga, and Peter Krištof. 2010. Inverse procedural modeling by automatic generation of L-systems. In Computer Graphics Forum, Vol. 29. Wiley Online Library, 665–674.
[57]
Alexandru Telea. 2004. An image inpainting technique based on the fast marching method. J. Graph. Tools 9, 1 (2004), 23–34.
[58]
Dmitry Ulyanov, Vadim Lebedev, Andrea Vedaldi, and Victor Lempitsky. 2016. Texture networks: Feed-forward synthesis of textures and stylized images. In Proceedings of the 33rd International Conference on International Conference on Machine Learning. JMLR.org, 1349–1357.
[59]
Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, Stéfan J. van der Walt, Matthew Brett, Joshua Wilson, K. Jarrod Millman, Nikolay Mayorov, Andrew R. J. Nelson, Eric Jones, Robert Kern, Eric Larson, C. J. Carey, İlhan Polat, Yu Feng, Eric W. Moore, Jake VanderPlas, Denis Laxalde, Josef Perktold, Robert Cimrman, Ian Henriksen, E. A. Quintero, Charles R. Harris, Anne M. Archibald, Antônio H. Ribeiro, Fabian Pedregosa, Paul van Mulbregt, and SciPy 1.0 Contributors. 2020. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nat. Meth. 17 (2020), 261–272.
[60]
P. Welch. 1967. The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms. IEEE Trans. Aud. Electroacoust. 15, 2 (1967), 70–73.
[61]
Li-Yi Wei, Sylvain Lefebvre, Vivek Kwatra, and Greg Turk. 2009. State of the art in example-based texture synthesis. In Eurographics 2009 - State of the Art Reports, M. Pauly and G. Greiner (Eds.). The Eurographics Association.
[62]
Ning Xu, Brian Price, Scott Cohen, and Thomas Huang. 2017. Deep image matting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2970–2979.
[63]
Yang Zhou, Zhen Zhu, Xiang Bai, Dani Lischinski, Daniel Cohen-Or, and Hui Huang. 2018. Non-stationary texture synthesis by adversarial expansion. ACM Trans. Graph. (Proc. SIGGRAPH) 37, 4 (2018).
[64]
Zhou Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. 2004. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 13, 4 (2004), 600–612. DOI:https://doi.org/10.1109/TIP.2003.819861
