License: CC BY 4.0
arXiv:2402.01466v1 [cs.CV] 02 Feb 2024

Scaled 360 layouts:
Revisiting non-central panoramas

Bruno Berenguel-Baeta
Instituto de Investigacion en Ingenieria de Aragon
Department of Computer Science and Systems Engineering
University of Zaragoza
Zaragoza, Spain
[email protected]
Jesus Bermudez-Cameo
Instituto de Investigacion en Ingenieria de Aragon
Department of Computer Science and Systems Engineering
University of Zaragoza
Zaragoza, Spain
[email protected]
Jose J. Guerrero
Instituto de Investigacion en Ingenieria de Aragon
Department of Computer Science and Systems Engineering
University of Zaragoza
Zaragoza, Spain
[email protected]
Corresponding author.
Abstract

From a non-central panorama, 3D lines can be recovered by geometric reasoning. However, their sensitivity to noise and the complex geometric modeling required have led to these panoramas being little investigated. In this work we present a novel approach for 3D layout recovery of indoor environments using single non-central panoramas. We obtain the boundaries of the structural lines of the room from a non-central panorama using deep learning and exploit the properties of non-central projection systems in a new geometrical processing to recover the scaled layout. We solve the problem for Manhattan environments, handling occlusions, and also for Atlanta environments in a unified method. The experiments performed improve on the state-of-the-art methods for 3D layout recovery from a single panorama. Our approach is the first work using deep learning with non-central panoramas and the first to recover the scale of single-panorama layouts.

A final version of this article can be found at https://doi.org/10.1109/CVPRW53098.2021.00410

Keywords Omnidirectional Vision · 3D Vision · Non-central Cameras · Layout recovery · Scene understanding

1 Introduction

Layout recovery and 3D understanding of indoor environments is a hot topic in computer vision research Zou et al. (2021). Most recent approaches for layout recovery use different neural network architectures to recover the structural elements of an indoor environment. In this context, omnidirectional images have the important advantage that the shape of a whole room can be retrieved from a single image. The state of the art offers different approaches to handle the heavy distortions that omnidirectional representations introduce. Dula-Net Yang et al. (2019) extracts a perspective view of the ceiling from the equirectangular image and extrudes the layout using a dual-branched architecture. Similarly, but with a different network architecture, AtlantaNet Pintore et al. (2020) obtains the floor plan by dividing the panorama into two separate perspective views, of the ceiling and of the floor, in order to adjust the height of the room. Taking a different approach, Corners for Layouts (CFL) Fernandez-Labrador et al. (2020) and HorizonNet Sun et al. (2019) aim to extract the boundaries of the structural lines and the corners of the room directly from the equirectangular panorama, obtaining an up-to-scale layout in a post-processing step.

In this paper we propose a new method for layout recovery from single panoramas. We revisit non-central panoramas Li et al. (2004) in order to obtain scaled 3D lines Bermudez-Cameo et al. (2017) and vertical planes from a single image. Like the equirectangular panorama, the non-central circular panorama provides 360-degree information about the environment, but the image distortion in non-central panoramas includes subtle differences that allow geometric 3D reasoning Perdigoto and Araujo (2012). This characteristic is a clear advantage over equirectangular panoramas, since it allows the scale of the environment to be recovered without any prior knowledge. Fig. 1 shows an equirectangular panorama and a non-central panorama of the same scene. Even though both images look similar and both allow the shape of the room to be recovered, only the non-central circular panorama allows the scale to be recovered without any assumption.

Figure 1: Central (top-left) and non-central (bottom-left) panoramas have a similar appearance, but there are subtle differences in favor of the latter if we want to obtain 3D information including the scale. On the right, the scaled layout obtained with our solution from a non-central panorama in an Atlanta environment.
Figure 2: Pipeline of the proposed method. The non-central circular panorama is processed by the fine-tuned network. The network provides the pixel information of the structural lines and a per-column probability of a wall-wall intersection. Then the proposed geometric pipeline, including the new solvers, gives the final scaled layout.

In our proposal, we adapt the neural network architecture of HorizonNet Sun et al. (2019) to non-central circular panoramas for the extraction of the structural lines of indoor environments. In addition, we propose a geometric pipeline with two new linear solvers that jointly obtain the room height and the location of the vertical walls for global layout extraction under the Manhattan and Atlanta world assumptions. Our experiments show that our proposal improves on state-of-the-art solutions and is the first to obtain the scale of the layout without using additional assumptions.

2 Proposed method

In order to recover the layout of a room from a single non-central circular panorama, we propose a new pipeline, shown in Fig. 2. We use a neural network as a line extractor from the image to reconstruct the scaled layout in a new geometrical processing that includes two new linear solvers for non-central projection systems.

2.1 Neural network as line extractor

For the first part of our pipeline, we propose a neural network as a boundary extractor for structural lines. In classical approaches, lines are extracted with the Hough transform and vanishing points, followed by a hypothesis generation-verification algorithm Zhang et al. (2014). This approach is costly in time and computational resources. Neural networks, on the other hand, have proven that they can extract patterns from images with high accuracy in a short time and, therefore, current approaches for layout recovery rely on them.

Even though many state-of-the-art networks can handle omnidirectional panoramas, none of them has considered non-central systems. We propose to adapt the existing network architecture of HorizonNet Sun et al. (2019) to handle non-central panoramas. The main advantage of this architecture is that it handles the information of the panorama column by column. This is particularly interesting for the non-central circular panorama, since it is locally a central projection system in each column.
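To illustrate why each column behaves as a central system, the following is a minimal sketch of a back-projection model for a non-central circular panorama. It is an assumption made for illustration and is not taken from this paper: the function name `pixel_to_ray`, the pixel-centre convention, the equirectangular-like mapping of rows to elevation and the placement of the optical centre on a horizontal circle of radius `radius` are all hypothetical choices.

```python
import numpy as np

def pixel_to_ray(col, row, width, height, radius):
    """Back-project a pixel to a Plücker ray (direction, moment).

    Assumed model: every image column shares one optical centre, located on a
    horizontal circle of the given radius at the azimuth of that column.
    """
    phi = 2.0 * np.pi * (col + 0.5) / width - np.pi       # azimuth of the column
    theta = np.pi / 2.0 - np.pi * (row + 0.5) / height    # elevation of the row
    centre = radius * np.array([np.cos(phi), np.sin(phi), 0.0])
    xi = np.array([np.cos(theta) * np.cos(phi),
                   np.cos(theta) * np.sin(phi),
                   np.sin(theta)])
    xi_bar = np.cross(centre, xi)                          # moment of the ray
    return xi, xi_bar
```

Under this assumed model, all pixels of one column share the same optical centre, and setting `radius` to zero makes every moment vanish, which corresponds to a central panorama.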

In order to adapt the network to the distortions of non-central panoramas, we have fine-tuned it. However, since non-central projection systems are little used in the research community, there is no data-set available. To overcome this difficulty, we have generated a data-set of non-central circular panoramas with ground truth information for layout recovery in synthetic indoor environments. This data-set is composed of around 650 different layouts, with 6 to 10 walls each, and more than 2500 images. The data-set will be available upon request.

2.2 Geometric solvers

In the second part of our pipeline, we take the pixel information provided by the network and reconstruct the scaled layout of the room. For that purpose, we derive two new geometric solvers to jointly obtain the room height and vertical walls location for Manhattan or Atlanta world assumptions.

We define a wall as a set of two parallel lines contained in a vertical plane (see Fig. 3). Let $\mathbf{L}=(\mathbf{l}^{T},\mathbf{\bar{l}}^{T})^{T}$ and $\mathbf{M}=(\mathbf{m}^{T},\mathbf{\bar{m}}^{T})^{T}$ be the ceiling and floor lines defined in Plücker coordinates Pottmann and Wallner (2009) and $\{\mathbf{e}_{1},\mathbf{e}_{2},\mathbf{e}_{3}\}$ an orthonormal basis attached to the vertical wall. We define the closest points of the lines to the acquisition system, $\mathbf{x}_{L}$ and $\mathbf{x}_{M}$, with $h_{c}$ and $h_{f}$, the distances from the acquisition system to the ceiling and floor planes, and $d$, the distance to the wall plane, such that $\mathbf{x}_{L}=d\mathbf{e}_{2}+h_{c}\mathbf{e}_{3}$ and $\mathbf{x}_{M}=d\mathbf{e}_{2}+h_{f}\mathbf{e}_{3}$. Notice that with this description we can parameterize the Plücker coordinates of the lines as $\mathbf{l}=\mathbf{m}=\mathbf{e}_{1}$, $\mathbf{\bar{l}}=\mathbf{x}_{L}\times\mathbf{l}=h_{c}\mathbf{e}_{2}-d\mathbf{e}_{3}$, $\mathbf{\bar{m}}=\mathbf{x}_{M}\times\mathbf{m}=h_{f}\mathbf{e}_{2}-d\mathbf{e}_{3}$. We also define the projecting rays that intersect the ceiling and floor lines as $\bm{\Xi}=(\bm{\xi}^{T},\bm{\bar{\xi}}^{T})^{T}$ and $\bm{X}=(\bm{\chi}^{T},\bm{\bar{\chi}}^{T})^{T}$ respectively.

$$\mathrm{side}(\bm{\Xi},\mathbf{L})=\bm{\xi}^{T}\left(h_{c}\mathbf{e}_{2}-d\mathbf{e}_{3}\right)+\bm{\bar{\xi}}^{T}\mathbf{e}_{1}=0 \tag{1}$$
$$\mathrm{side}(\bm{X},\mathbf{M})=\bm{\chi}^{T}\left(h_{f}\mathbf{e}_{2}-d\mathbf{e}_{3}\right)+\bm{\bar{\chi}}^{T}\mathbf{e}_{1}=0 \tag{2}$$
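The side operator in (1) and (2) can be checked numerically. The snippet below is a small sketch with hypothetical values (a wall frame aligned with the world axes, heights and distances in meters, and an arbitrary off-center optical center); the helper names `plucker_from_point_dir` and `side` are illustrative and do not come from the paper.

```python
import numpy as np

def plucker_from_point_dir(point, direction):
    """Plücker coordinates (direction, moment) of the line through `point`."""
    p = np.asarray(point, dtype=float)
    d = np.asarray(direction, dtype=float)
    return d, np.cross(p, d)

def side(ray, line):
    """Reciprocal product of two lines; it is zero when they intersect."""
    xi, xi_bar = ray
    l, l_bar = line
    return float(xi @ l_bar + xi_bar @ l)

# Hypothetical wall: frame aligned with the world axes.
e1, e2, e3 = np.eye(3)
h_c, d = 1.4, 2.0
x_L = d * e2 + h_c * e3                        # closest point of the ceiling line
ceiling = plucker_from_point_dir(x_L, e1)      # l = e1, l_bar = h_c*e2 - d*e3

# One ray that hits the ceiling line and one that passes above it.
centre = np.array([0.3, 0.1, 0.0])
target = x_L + 0.7 * e1
hit = plucker_from_point_dir(centre, target - centre)
miss = plucker_from_point_dir(centre, target + 0.5 * e3 - centre)

print(side(hit, ceiling), side(miss, ceiling))
```

The first value printed is zero up to rounding, while the second is clearly non-zero, which is the property that the solvers exploit.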

Given the projecting rays, which are provided by the output of the neural network, we aim to obtain the 3D lines that define each wall in the environment. The relation between the projecting rays and the lines is given by their intersection, defined in equations (1) and (2). This is, in general, a non-linear problem that is difficult to tackle directly. However, we propose two new DLT-like approaches that allow the layout to be computed as the solution of a linear problem.

$$\bar{\xi}_{1}u_{x}+\bar{\xi}_{2}u_{y}-\xi_{1}v_{y}+\xi_{2}v_{x}-d\xi_{3}=0 \tag{3}$$
$$\bar{\chi}_{1}u_{x}+\bar{\chi}_{2}u_{y}-\chi_{1}w_{y}+\chi_{2}w_{x}-d\chi_{3}=0 \tag{4}$$

Let the main direction of a wall be horizontal and described by the vector $\mathbf{u}=\left(u_{x},u_{y}\right)^{T}$ such that $\mathbf{l}=\mathbf{m}=(u_{x},u_{y},0)^{T}$. We can define the vectors $\mathbf{v}=h_{c}\mathbf{u}$ and $\mathbf{w}=h_{f}\mathbf{u}$ such that expressions (1) and (2) become linear, obtaining a set of expressions, (3) and (4), that depend on the unknown homogeneous wall vector $\mathbf{W}=\left(\mathbf{u}^{T},\mathbf{v}^{T},\mathbf{w}^{T},d\right)^{T}$.
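The linearization of (1) can be written out term by term. The expansion below is a sketch that assumes a vertical axis $\mathbf{e}_{3}=(0,0,1)^{T}$ and the sign convention $\mathbf{e}_{2}=(-u_{y},u_{x},0)^{T}$ for the wall normal, which is the same perpendicular that appears as $\mathbf{u}_{\perp}$ in the Manhattan formulation; with the opposite sign for $\mathbf{e}_{2}$, the two terms in $\mathbf{v}$ change sign together.

```latex
\begin{aligned}
\bm{\xi}^{T}\left(h_{c}\mathbf{e}_{2}-d\mathbf{e}_{3}\right)+\bm{\bar{\xi}}^{T}\mathbf{e}_{1}
  &= h_{c}\left(-\xi_{1}u_{y}+\xi_{2}u_{x}\right)-d\,\xi_{3}
     +\bar{\xi}_{1}u_{x}+\bar{\xi}_{2}u_{y} \\
  &= \bar{\xi}_{1}u_{x}+\bar{\xi}_{2}u_{y}-\xi_{1}v_{y}+\xi_{2}v_{x}-d\,\xi_{3}=0,
  \qquad \mathbf{v}=h_{c}\mathbf{u}.
\end{aligned}
```

The product $h_{c}\mathbf{u}$, which is bilinear in the unknowns, is absorbed into the new unknown $\mathbf{v}$, so every ray contributes one equation that is linear in the entries of $\mathbf{W}$. The floor equation follows in the same way with $\bm{X}$, $h_{f}$ and $\mathbf{w}$.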

$$\lambda(\mathbf{v}_{1}-h_{c}\mathbf{u}_{1})=h_{c}\mathbf{u}_{0}-\mathbf{v}_{0} \tag{5}$$
$$\lambda(\mathbf{w}_{1}-h_{f}\mathbf{u}_{1})=h_{f}\mathbf{u}_{0}-\mathbf{w}_{0} \tag{6}$$

However, in this linear system $A\mathbf{W}=0$, $\mathbf{u}$, $\mathbf{v}$ and $\mathbf{w}$ are independent variables that are not constrained to be parallel. In order to impose the parallelism of these vectors, we compute the null space of the system with a Singular Value Decomposition (SVD), obtaining a parametric solution that is a linear combination of the singular vectors with a set of parameters $\lambda_{i}$. Two horizontal lines contained in a vertical plane have 4 degrees of freedom. A minimal solution would need 2 rays for each line of the wall, describing the null space with three singular vectors and two parameters $\lambda_{1}$ and $\lambda_{2}$. By solving a system of two quadratic equations for $\lambda_{1}$ and $\lambda_{2}$ (with resultants, action matrices or as a polynomial eigenvalue problem Kukelova et al. (2011)) we obtain a set of 4 different solutions, among which the valid one must be identified. Since the network provides sufficiently robust information, instead of the minimal solution we propose to solve the over-determined case (with a minimum of 3 rays lying on each line) with a linear combination involving two singular vectors and a single parameter $\lambda$ (such that $\mathbf{W}=\mathbf{W}_{0}+\lambda\mathbf{W}_{1}$), obtaining two uncoupled quadratic equations, (5) and (6). These equations provide two solutions, of which only one sets the ceiling line above the floor line.
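The over-determined single-wall case can be sketched numerically. The code below is one possible reading of the procedure and not the authors' implementation: the rows follow the expansion of (1) and (2) under the assumed convention $\mathbf{e}_{2}=(-u_{y},u_{x},0)^{T}$, the synthetic rays and all function names are hypothetical, and the rule used to choose among candidate values of $\lambda$ (ceiling above floor first, then the smallest violation of the floor parallelism) is an assumption. The value $\lambda=0$ is added as a fallback candidate.

```python
import numpy as np

def cross2(a, b):
    """Scalar 2D cross product; it vanishes when a and b are parallel."""
    return a[0] * b[1] - a[1] * b[0]

def synthetic_wall_system(angle, h_c, h_f, d, radius=0.5, n_rays=6):
    """Stack one row per ray for a wall with known parameters (test data only)."""
    u = np.array([np.cos(angle), np.sin(angle)])
    e1 = np.array([u[0], u[1], 0.0])
    e2 = np.array([-u[1], u[0], 0.0])   # assumed sign convention for the normal
    e3 = np.array([0.0, 0.0, 1.0])
    rows = []
    for phi, t in zip(np.linspace(0.2, 1.4, n_rays), np.linspace(-1.0, 1.0, n_rays)):
        centre = radius * np.array([np.cos(phi), np.sin(phi), 0.0])
        for h, is_ceiling in ((h_c, True), (h_f, False)):
            xi = d * e2 + h * e3 + t * e1 - centre   # ray direction
            xi_bar = np.cross(centre, xi)            # ray moment
            height = [xi[1], -xi[0]]
            v_part = height if is_ceiling else [0.0, 0.0]
            w_part = [0.0, 0.0] if is_ceiling else height
            rows.append([xi_bar[0], xi_bar[1], *v_part, *w_part, -xi[2]])
    return np.array(rows)

def split(W):
    """Return the blocks u, v, w of W = (u, v, w, d)."""
    return W[0:2], W[2:4], W[4:6]

def solve_wall(A):
    """Impose u || v inside the span of the two weakest right singular vectors."""
    _, _, vt = np.linalg.svd(A)
    W0, W1 = vt[-1], vt[-2]
    (u0, v0, _), (u1, v1, _) = split(W0), split(W1)
    # cross2(u0 + lam*u1, v0 + lam*v1) = 0 expands to a quadratic in lam.
    coeffs = [cross2(u1, v1), cross2(u0, v1) + cross2(u1, v0), cross2(u0, v0)]
    lams = [r.real for r in np.roots(coeffs) if abs(r.imag) < 1e-9] + [0.0]

    def score(lam):
        u, v, w = split(W0 + lam * W1)
        ceiling_above = (v @ u) > (w @ u)
        floor_misfit = abs(cross2(u, w)) / (np.linalg.norm(u) * np.linalg.norm(w))
        return (not ceiling_above, floor_misfit)

    W = W0 + min(lams, key=score) * W1
    u, v, w = split(W)
    n2 = u @ u
    return u / np.sqrt(n2), (v @ u) / n2, (w @ u) / n2, W[6] / np.sqrt(n2)

A = synthetic_wall_system(angle=0.4, h_c=1.4, h_f=-1.2, d=2.0)
u_est, h_c_est, h_f_est, d_est = solve_wall(A)
```

The overall sign of a null-space vector is arbitrary, so the recovered direction and distance are defined up to a joint sign change, while the heights are not affected.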

Figure 3: Rays and wall parameter definition. The parameters are: wall reference system $\{\mathbf{e}_{1},\mathbf{e}_{2},\mathbf{e}_{3}\}$; $\bm{\Xi}$ and $\bm{X}$ define the projecting rays; $(\mathbf{l},\mathbf{\bar{l}})$ and $(\mathbf{m},\mathbf{\bar{m}})$ are the ceiling and floor lines that define the wall; $\mathbf{x}_{L}$ and $\mathbf{x}_{M}$ define the closest points of the lines to the origin; $h_{c}$, $h_{f}$ and $d$ are the ceiling height, the floor height and the distance to the wall respectively.

Notice that with the Manhattan world assumption there is a set of walls sharing the wall direction $\mathbf{u}=\left(u_{x},u_{y}\right)^{T}$, while the complementary set of walls shares the orthogonal direction vector $\mathbf{u}_{\perp}=\left(-u_{y},u_{x}\right)^{T}$. Since all the walls share the ceiling height $h_{c}$ and the floor height $h_{f}$, we extend the DLT-like fitting to the whole set of walls by computing the null space of $\mathsf{A}\mathfrak{L}_{M}=0$, where $\mathfrak{L}_{M}=\left(\mathbf{u}^{T},\mathbf{v}^{T},\mathbf{w}^{T},d_{1},\dots,d_{N}\right)^{T}$ is the layout vector, $N$ is the number of walls and the matrix $\mathsf{A}$ is filled with relations (3) and (4). The same reasoning as in the case of a single wall can be used to enforce parallelism among $\mathbf{u}$, $\mathbf{v}$ and $\mathbf{w}$.
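As an illustration of how the matrix for the whole layout could be assembled, the sketch below builds one row per ray for a rectangular room and checks that the true layout vector lies in the null space. It assumes the convention in which the normal of a wall with direction $(a,b)$ is $(-b,a)$; the function `manhattan_row`, the room dimensions and the synthetic rays are hypothetical.

```python
import numpy as np

def manhattan_row(xi, xi_bar, wall, n_walls, is_ceiling, perpendicular):
    """Row acting on (ux, uy, vx, vy, wx, wy, d_1, ..., d_N) for one ray."""
    row = np.zeros(6 + n_walls)
    if perpendicular:            # wall direction is u_perp = (-uy, ux)
        row[0:2] = [xi_bar[1], -xi_bar[0]]
        height = [-xi[0], -xi[1]]
    else:                        # wall direction is u = (ux, uy)
        row[0:2] = [xi_bar[0], xi_bar[1]]
        height = [xi[1], -xi[0]]
    start = 2 if is_ceiling else 4
    row[start:start + 2] = height
    row[6 + wall] = -xi[2]
    return row

# Hypothetical rectangular room; signed distances along each wall normal.
angle, h_c, h_f = 0.3, 1.4, -1.2
dists = [2.0, 1.5, -2.5, -1.0]
u = np.array([np.cos(angle), np.sin(angle)])

rows = []
for i, d in enumerate(dists):
    perp = i % 2 == 1
    e1 = np.array([-u[1], u[0], 0.0]) if perp else np.array([u[0], u[1], 0.0])
    e2 = np.array([-e1[1], e1[0], 0.0])
    for phi, t in zip(np.linspace(0.0, 1.0, 3) + i, np.linspace(-1.0, 1.0, 3)):
        centre = 0.5 * np.array([np.cos(phi), np.sin(phi), 0.0])
        for h, is_ceiling in ((h_c, True), (h_f, False)):
            xi = d * e2 + np.array([0.0, 0.0, h]) + t * e1 - centre
            xi_bar = np.cross(centre, xi)
            rows.append(manhattan_row(xi, xi_bar, i, len(dists), is_ceiling, perp))

A = np.array(rows)
layout = np.concatenate([u, h_c * u, h_f * u, dists])
residual = A @ layout            # zero up to rounding for noise-free rays
```

All walls contribute to the same six leading unknowns, which is what couples the heights across the room, while each wall only touches its own distance entry.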

$$\bar{\xi}^{\prime}_{1}+h_{c}\xi^{\prime}_{2}-d\xi^{\prime}_{3}=0 \tag{7}$$
$$\bar{\chi}^{\prime}_{1}+h_{f}\chi^{\prime}_{2}-d\chi^{\prime}_{3}=0 \tag{8}$$

For the Atlanta world assumption each wall can have a different horizontal direction, so we have to extract each wall independently. Notice that in this case we are not imposing that $h_{c}$ and $h_{f}$ are common to all the walls. However, if the direction of each wall is known (for example, from extracting each wall independently), we can derive a new solution for the whole layout. Assuming that the wall directions are known, we can express the projecting rays of each wall in its own local reference system, and then equations (1) and (2) become (7) and (8) respectively, where $\bm{\Xi}^{\prime}$ and $\bm{X}^{\prime}$ are the projecting rays in each wall reference system. Then, we can solve the null space of a system of linear equations $\mathsf{A}\mathfrak{L}_{A}=\mathbf{0}$ with $\mathfrak{L}_{A}=\left(1,h_{c},h_{f},d_{1},\dots,d_{N}\right)^{T}$, where $\mathsf{A}$ is composed from equations (7) and (8). The main advantage of this second approach is that it can be used for Manhattan as well as Atlanta world environments whenever the layout has a single ceiling height and a single floor height.
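The second solver can be sketched in a few lines once the wall directions are known. The example below is illustrative only: the wall angles, heights, distances and the synthetic rays are hypothetical, the local frame is assumed to be a rotation about the vertical axis of the acquisition system, and the function names are not from the paper.

```python
import numpy as np

def wall_rotation(angle):
    """Rows are the wall axes e1, e2, e3 expressed in the camera frame."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])

def synthetic_rays(angle, h_c, h_f, d, radius=0.5):
    """Rays (direction, moment, is_ceiling) that hit a known wall (test data)."""
    e1, e2, e3 = wall_rotation(angle)
    rays = []
    for phi, t in zip(np.linspace(0.1, 1.3, 4), np.linspace(-1.0, 1.0, 4)):
        centre = radius * np.array([np.cos(phi), np.sin(phi), 0.0])
        for h, is_ceiling in ((h_c, True), (h_f, False)):
            xi = d * e2 + h * e3 + t * e1 - centre
            rays.append((xi, np.cross(centre, xi), is_ceiling))
    return rays

def atlanta_rows(rays_per_wall, angles):
    """Rows of equations (7)-(8) acting on (1, h_c, h_f, d_1, ..., d_N)."""
    n = len(angles)
    rows = []
    for i, (rays, angle) in enumerate(zip(rays_per_wall, angles)):
        R = wall_rotation(angle)
        for xi, xi_bar, is_ceiling in rays:
            xi_l, xi_bar_l = R @ xi, R @ xi_bar     # ray in the wall frame
            row = np.zeros(3 + n)
            row[0] = xi_bar_l[0]
            row[1 if is_ceiling else 2] = xi_l[1]
            row[3 + i] = -xi_l[2]
            rows.append(row)
    return np.array(rows)

def solve_atlanta(A):
    """Null-space vector scaled so that its first entry equals one."""
    _, _, vt = np.linalg.svd(A)
    layout = vt[-1] / vt[-1][0]
    return layout[1], layout[2], layout[3:]

angles = [0.0, 0.9, 2.0]
true_dists = [2.0, 1.5, 3.0]
rays_per_wall = [synthetic_rays(a, 1.4, -1.2, d) for a, d in zip(angles, true_dists)]
h_c_est, h_f_est, d_est = solve_atlanta(atlanta_rows(rays_per_wall, angles))
```

Because the first entry of the layout vector is fixed to one, no quadratic constraint is needed in this formulation; the scale of the null-space vector is resolved by a single division.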

3 Experiments

Figure 4: Examples of 3D reconstruction with the proposed method. We show the non-central panorama and the 3D reconstruction. The green wire frame is the real 3D layout of the room.
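Table 1: Comparison of different methods of 3D layout recovery. For the 3D IoU columns, higher is better; for CEN and CE, smaller is better.

Manhattan World assumption

| Method | 3D IoU (u2s) | 3D IoU | CEN | CE |
|---|---|---|---|---|
| CFL Fernandez-Labrador et al. (2020) | 78.87 | - | 0.75 | - |
| HorizonNet Sun et al. (2019) | 82.66 | - | 0.69 | - |
| AtlantaNet Pintore et al. (2020) | 83.94 | - | 0.71 | - |
| Ours | 93.88 | 86.18 | 0.787 | 0.223 |

Atlanta World assumption

| Method | 3D IoU (u2s) | 3D IoU | CEN | CE |
|---|---|---|---|---|
| HorizonNet Sun et al. (2019) | 73.53 | - | - | - |
| AtlantaNet Pintore et al. (2020) | 80.01 | - | - | - |
| Ours | 91.67 | 76.17 | 1.335 | 0.513 |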

We have performed a set of experiments in order to evaluate our proposal and compare it with the state-of-the-art methods. The metrics used for the comparison are: 3D IoU, which refers to the 3D intersection over union of the predicted layout and the ground truth; 3D IoU (u2s), which refers to the up-to-scale intersection over union of the layout; CEN, which refers to the Corner Error Normalized, computed as the L2 distance of the corners divided by the diagonal of the layout; and CE, which refers to the Corner Error, computed as the L2 distance of the corners in meters.
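The two corner metrics can be sketched as follows. Several details are assumptions, since the text does not fix them: corners are taken to be matched by index, the error is averaged over corners, and the "diagonal of the layout" is taken as the diagonal of the bounding box of the ground-truth corners. The function names and the example room are hypothetical.

```python
import numpy as np

def corner_error(pred, gt):
    """Mean L2 distance between matched corners, in the units of the input."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return float(np.linalg.norm(pred - gt, axis=1).mean())

def corner_error_normalized(pred, gt):
    """Corner error divided by the bounding-box diagonal of the ground truth."""
    gt = np.asarray(gt, dtype=float)
    diagonal = np.linalg.norm(gt.max(axis=0) - gt.min(axis=0))
    return corner_error(pred, gt) / diagonal

# Hypothetical 3 m x 4 m x 2.5 m room and a prediction shifted by 10 cm.
gt_corners = np.array([[x, y, z] for x in (0.0, 3.0)
                       for y in (0.0, 4.0) for z in (0.0, 2.5)])
pred_corners = gt_corners + np.array([0.1, 0.0, 0.0])

ce = corner_error(pred_corners, gt_corners)
cen = corner_error_normalized(pred_corners, gt_corners)
```

CE keeps the metric units of the reconstruction and therefore only makes sense for scaled layouts, whereas CEN is unaffected by a common rescaling of prediction and ground truth.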

The comparison with state-of-the-art methods is not completely fair. The datasets used for the different methods are different, so the results can depend on the dataset used and not only on the method. Besides, our proposal only uses the image information in order to recover the 3D layout with the scale while the rest of the methods in the state of the art provide up-to-scale measures, relying on some measure in the environment for the 3D reconstruction, e.g. the camera height. Nevertheless, a summary of this comparison is shown in Table 1.

These results show that our proposal outperforms the state-of-the-art methods under the Manhattan as well as the Atlanta world assumption. In addition, our method recovers the scale of the layout without any prior assumption. Some examples of our results are shown in Fig. 4, where different layouts are tested. They show that our method can handle quite challenging layouts under different illumination conditions and world assumptions.

4 Conclusions

In this paper we have proposed two new solvers for indoor layout recovery from a single non-central circular panorama. We use a neural network for extracting the edges of structural lines from a non-central projection system and geometrically process the output in order to recover the 3D information of the layout. Our experiments show that our approach with non-central circular panoramas has better performance than state-of-the-art methods for Manhattan and Atlanta environments. In addition, our method can extract the scale of the room without any prior knowledge.

References

  • Zou et al. [2021] C. Zou, JW. Su, CH. Peng, A. Colburn, Q. Shan, P. Wonka, HK. Chu, and D. Hoiem. Manhattan room layout reconstruction from a single 360 image: A comparative study of state-of-the-art methods. IJCV, 2021.
  • Yang et al. [2019] ST. Yang, FE. Wang, CH. Peng, P. Wonka, M. Sun, and HK. Chu. Dula-net: A dual-projection network for estimating room layouts from a single rgb panorama. In CVPR. IEEE, 2019.
  • Pintore et al. [2020] G. Pintore, M. Agus, and E. Gobbetti. Atlantanet: Inferring the 3d indoor layout from a single 360 image beyond the manhattan world assumption. In ECCV. Springer, 2020.
  • Fernandez-Labrador et al. [2020] C. Fernandez-Labrador, JM. Facil, A. Perez-Yus, C. Demonceaux, J. Civera, and JJ. Guerrero. Corners for layout: End-to-end layout recovery from 360 images. IEEE Robotics and Automation Letters, 2020.
  • Sun et al. [2019] C. Sun, CW. Hsiao, M. Sun, and HT. Chen. Horizonnet: Learning room layout with 1d representation and pano stretch data augmentation. In CVPR. IEEE, 2019.
  • Li et al. [2004] Y. Li, HY. Shum, CK. Tang, and R. Szeliski. Stereo reconstruction from multiperspective panoramas. PAMI, 2004.
  • Bermudez-Cameo et al. [2017] J. Bermudez-Cameo, O. Saurer, G. Lopez-Nicolas, JJ. Guerrero, and M. Pollefeys. Exploiting line metric reconstruction from non-central circular panoramas. Pattern Recognition Letters, 2017.
  • Perdigoto and Araujo [2012] L. Perdigoto and H. Araujo. Reconstruction of 3d lines from a single axial catadioptric image using cross-ratio. In ICPR. IEEE, 2012.
  • Zhang et al. [2014] Y. Zhang, S. Song, P. Tan, and J. Xiao. Panocontext: A whole-room 3d context model for panoramic scene understanding. In ECCV. Springer, 2014.
  • Pottmann and Wallner [2009] H. Pottmann and J. Wallner. Computational line geometry. Springer Science & Business Media, 2009.
  • Kukelova et al. [2011] Z. Kukelova, M. Bujnak, and T. Pajdla. Polynomial eigenvalue solutions to minimal problems in computer vision. PAMI, 2011.

ACKNOWLEDGMENT

This work was supported by RTI2018-096903-B-100 (AEI/ FEDER, UE).