JP7538862B2

JP7538862B2 - Creating an Arbitrary View

Info

Publication number: JP7538862B2
Application number: JP2022525977A
Authority: JP
Inventors: Clarence Chui; Manu Parmar; Brook Aaron Seaton; Himanshu Jain
Original assignee: Outward Inc
Current assignee: Outward Inc
Priority date: 2019-11-08
Filing date: 2020-11-05
Publication date: 2024-08-22
Anticipated expiration: 2040-11-05
Also published as: WO2021092229A1; EP4055567A4; JP2022553844A; JP2024113035A; KR20220076514A; EP4055567A1

Description

CROSS-REFERENCE TO OTHER APPLICATIONS
This application is a continuation-in-part of U.S. Patent Application No. 16/171,221, entitled "ARBITRARY VIEW GENERATION," filed October 25, 2018, which is a continuation of U.S. Patent Application No. 15/721,421, entitled "ARBITRARY VIEW GENERATION," filed September 29, 2017 (now U.S. Patent No. 10,163,249), which is a continuation-in-part of U.S. Patent Application No. 15/081,553, entitled "ARBITRARY VIEW GENERATION," filed March 25, 2016 (now U.S. Patent No. 9,996,914), all of which are incorporated herein by reference for all purposes. U.S. Patent Application No. 15/721,421 (now U.S. Patent No. 10,163,249) further claims priority to U.S. Provisional Patent Application No. 62/541,607, entitled "FAST RENDERING OF ASSEMBLED SCENES," filed August 4, 2017, which is incorporated herein by reference for all purposes.

This application claims priority to U.S. Provisional Patent Application No. 62/933,254, entitled "QUANTIZED PERSPECTIVE CAMERA VIEWS," filed November 8, 2019, which is incorporated herein by reference for all purposes.

Existing rendering techniques face a trade-off between the competing goals of quality and speed. High-quality rendering requires significant processing resources and time. Slow rendering techniques, however, are unacceptable in many applications, such as interactive, real-time applications, which typically favor lower-quality but faster rendering techniques. For example, real-time graphics applications commonly use rasterization, which sacrifices quality for relatively fast rendering. Thus, improved techniques that compromise neither quality nor speed significantly are needed.

Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.

FIG. 1 is a high-level block diagram illustrating an embodiment of a system for generating an arbitrary view of a scene.

FIG. 2 illustrates an example of a database asset.

FIG. 3 is a flow chart illustrating an embodiment of a process for generating an arbitrary perspective.

FIGS. 4A-4N illustrate examples of an embodiment of an application in which independent objects are combined to generate an ensemble or composite object.

FIG. 5 is a flow chart illustrating an embodiment of a process for generating an arbitrary ensemble view.

The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer-readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or as a specific component that is manufactured to perform the task. As used herein, the term "processor" refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.

A detailed description of one or more embodiments of the invention is provided below, along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims, and the invention encompasses numerous alternatives, modifications, and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example, and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.

Techniques for generating arbitrary views of a scene are disclosed. The examples described herein deliver high-definition output with very low processing or computational overhead, effectively eliminating the difficult trade-off between rendering speed and quality. The disclosed techniques are especially useful for generating high-quality output very quickly in interactive, real-time graphics applications. Such applications depend on presenting the desired high-quality output substantially instantaneously in response to, and in accordance with, user manipulation of a presented interactive view or scene.

FIG. 1 is a high-level block diagram illustrating an embodiment of a system 100 for generating an arbitrary view of a scene. As depicted, arbitrary view generator 102 receives a request for an arbitrary view as input 104. It generates the requested view based on existing database assets 106 and provides the generated view as output 108 in response to the input request. In various embodiments, arbitrary view generator 102 may comprise a processor such as a central processing unit (CPU) or a graphics processing unit (GPU). The configuration of system 100 depicted in FIG. 1 is provided for the purpose of explanation. In general, system 100 may comprise any other appropriate number and/or configuration of interconnected components that provide the described functionality. For example, in other embodiments:
- arbitrary view generator 102 may comprise a different configuration of internal components 110-116;
- arbitrary view generator 102 may comprise a plurality of parallel physical and/or virtual processors;
- database 106 may comprise a plurality of networked databases or a cloud of assets.
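The data flow through generator 102 can be illustrated by modeling its four internal components as pluggable stages. This is a minimal Python sketch under assumed interfaces. The class name `ArbitraryViewGenerator`, its fields, and the representation of an image as a list of rows with `None` for unfilled pixels are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence

# Hypothetical image type: rows of pixel values, None marks an unfilled pixel.
Image = List[List[Optional[float]]]


@dataclass
class ArbitraryViewGenerator:
    """Illustrative wiring of the four engines of arbitrary view generator 102."""
    retrieve: Callable[[Any], Sequence[Any]]     # asset management engine 110
    transform: Callable[[Any, Any], Image]       # perspective transformation engine 112
    merge: Callable[[Sequence[Image]], Image]    # merge engine 114
    interpolate: Callable[[Image], Image]        # interpolation engine 116

    def generate(self, request: Any) -> Image:
        existing = self.retrieve(request)                        # existing views from database 106
        warped = [self.transform(v, request) for v in existing]  # re-project into requested perspective
        return self.interpolate(self.merge(warped))              # merge, then fill any remaining holes
```

Each engine is an ordinary callable, so a different internal configuration can be substituted without changing `generate`. One example would be a `transform` that dispatches work to parallel processors.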

Arbitrary view request 104 comprises a request for an arbitrary perspective of a scene. In some embodiments, the requested perspective of the scene does not already exist in asset database 106, which includes other perspectives or viewpoints of the scene. In various embodiments, arbitrary view request 104 may be received from a process or a user. For example, input 104 may be received from a user interface in response to user manipulation of a presented scene or a portion thereof, such as user manipulation of the camera viewpoint of a presented scene. As another example, arbitrary view request 104 may be received in response to the specification of a path of movement or travel within a virtual environment, such as a fly-through of a scene. In some embodiments, the possible arbitrary views of a scene that can be requested are at least in part constrained. For example, a user may not be able to move the camera viewpoint of a presented interactive scene to any random position. Instead, the user is constrained to certain positions or perspectives of the scene.

Database 106 stores a plurality of views of each stored asset. In this context, an asset refers to a specific scene whose specification is stored in database 106 as a plurality of views. In various embodiments, a scene may comprise a single object, a plurality of objects, or a rich virtual environment. Specifically, database 106 stores a plurality of images corresponding to different perspectives or viewpoints of each asset. The images stored in database 106 comprise high-quality photographs or photorealistic renderings. Such high-definition or high-resolution images may be captured or rendered during offline processes, or obtained from external sources, before being entered into database 106. In some embodiments, corresponding camera characteristics are stored with each image in database 106. That is, camera attributes such as relative location or position, orientation, rotation, depth information, focal length, aperture, and zoom level are stored with each image. Camera optical information such as shutter speed and exposure may also be stored with each image in database 106.
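The per-image camera metadata described above might be organized as follows. This sketch is hypothetical. The field names, the use of a 3x3 rotation matrix for orientation, and the in-memory dictionary standing in for database 106 are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class CameraInfo:
    """Camera attributes stored alongside each image (field names are illustrative)."""
    position: Vec3                         # relative location of the camera
    rotation: Tuple[Vec3, Vec3, Vec3]      # orientation as a 3x3 camera-to-world rotation matrix
    focal_length: float
    aperture: Optional[float] = None
    zoom: Optional[float] = None
    shutter_speed: Optional[float] = None  # optional optical information
    exposure: Optional[float] = None


@dataclass
class StoredView:
    image: List[List[float]]               # pixels of the photograph or photorealistic rendering
    depth: List[List[Optional[float]]]     # per-pixel depth information
    camera: CameraInfo


@dataclass
class AssetDatabase:
    """In-memory stand-in for database 106: asset id -> list of stored views."""
    assets: Dict[str, List[StoredView]] = field(default_factory=dict)

    def add_view(self, asset_id: str, view: StoredView) -> None:
        self.assets.setdefault(asset_id, []).append(view)

    def views(self, asset_id: str) -> List[StoredView]:
        return list(self.assets.get(asset_id, []))
```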

In various embodiments, any number of different perspectives of an asset may be stored in database 106. FIG. 2 illustrates an example of a database asset. In the given example, 73 views corresponding to different angles around a chair object are captured or rendered and stored in database 106. The views may be captured, for example, by rotating a camera around the chair or by rotating the chair in front of a camera. Relative object and camera location and orientation information is stored with each generated image. FIG. 2 specifically illustrates views of a scene comprising a single object. Database 106 may also store specifications of scenes comprising a plurality of objects or a rich virtual environment. In such cases, multiple views corresponding to different locations or positions in a scene or three-dimensional space are captured or rendered and stored, along with corresponding camera information, in database 106. In general, the images stored in database 106 may be two- or three-dimensional and may comprise stills or frames of an animation or video sequence.
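Capturing views by rotating a camera around the chair can be approximated by placing cameras on a ring around the object, each aimed at its center. This sketch assumes evenly spaced positions on a horizontal circle and an object centered at the origin. The patent does not state how its 73 views are distributed, and `orbit_cameras` is a hypothetical helper.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def orbit_cameras(n: int, radius: float, height: float = 0.0) -> List[Tuple[Vec3, Vec3]]:
    """Return n (position, unit view direction) pairs evenly spaced on a
    horizontal ring, each camera aimed at an object centered at the origin."""
    poses = []
    for k in range(n):
        theta = 2.0 * math.pi * k / n
        pos = (radius * math.cos(theta), radius * math.sin(theta), height)
        norm = math.sqrt(sum(c * c for c in pos))
        direction = tuple(-c / norm for c in pos)  # points from the camera toward the origin
        poses.append((pos, direction))
    return poses
```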

In response to a request 104 for an arbitrary view of a scene that does not already exist in database 106, arbitrary view generator 102 generates the requested arbitrary view from a plurality of other existing views of the scene stored in database 106. In the example configuration of FIG. 1, asset management engine 110 of arbitrary view generator 102 manages database 106. For example, asset management engine 110 may facilitate the storage and retrieval of data in database 106. In response to a request 104 for an arbitrary view of a scene, asset management engine 110 identifies and obtains a plurality of other existing views of the scene from database 106. In some embodiments, asset management engine 110 retrieves all existing views of the scene from database 106. Alternatively, asset management engine 110 may select and retrieve a subset of the existing views, for example, those closest to the requested arbitrary view. In such cases, asset management engine 110 is configured to intelligently select the subset of existing views from which pixels may be harvested to generate the requested arbitrary view. In various embodiments, the plurality of existing views may be retrieved together by asset management engine 110 or retrieved as needed by other components of arbitrary view generator 102.
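Asset management engine 110 could rank existing views by closeness to a request using a combined position-and-direction score. The score used here is an assumption for illustration: the Euclidean distance between camera positions plus a weighted angle between viewing directions. The names `Camera`, `view_distance`, and `closest_views` are also assumptions. The patent only states that the closest views may be selected.

```python
import math
from typing import List, NamedTuple, Tuple

Vec3 = Tuple[float, float, float]


class Camera(NamedTuple):
    position: Vec3
    direction: Vec3  # viewing direction (need not be normalized)


def _angle_between(a: Vec3, b: Vec3) -> float:
    """Angle in radians between two direction vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return math.acos(max(-1.0, min(1.0, dot / norms)))  # clamp guards against rounding error


def view_distance(a: Camera, b: Camera, angle_weight: float = 1.0) -> float:
    """Hypothetical closeness score: position gap plus weighted direction gap."""
    return math.dist(a.position, b.position) + angle_weight * _angle_between(a.direction, b.direction)


def closest_views(existing: List[Camera], requested: Camera, k: int = 3) -> List[int]:
    """Indices of the k existing views closest to the requested view, nearest first."""
    ranked = sorted(range(len(existing)), key=lambda i: view_distance(existing[i], requested))
    return ranked[:k]
```

The `angle_weight` parameter trades off how much a change in viewing direction counts relative to a change in camera position. A suitable value would depend on the scale of the scene.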

Perspective transformation engine 112 of arbitrary view generator 102 transforms the perspective of each existing view retrieved by asset management engine 110 into the perspective of the requested arbitrary view. As described above, precise camera information is known and stored with each image in database 106. Thus, a change of perspective from an existing view to the requested arbitrary view comprises a simple geometric mapping or transformation. In various embodiments, perspective transformation engine 112 may employ any one or more appropriate mathematical techniques to transform the perspective of an existing view into the perspective of an arbitrary view. When the requested view is not identical to any existing view, transforming an existing view into the perspective of the arbitrary view leaves at least some unmapped or missing pixels. These are pixels at angles or positions that appear in the arbitrary view but are not present in the existing view.
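Camera pose and per-pixel depth are stored with each image, so one common way to realize this geometric mapping is depth-based forward warping with a pinhole camera model. The sketch below makes several assumptions not stated in the patent:
- Poses are camera-to-world pairs `(R, t)`.
- Both cameras share a single focal length `f`, in pixels.
- The principal point is at the image center.
- Each re-projected point is rounded to the nearest pixel.

Destination pixels that no source pixel lands on stay `None`. These correspond to the unmapped or missing pixels described above.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def _matvec(m: Mat3, v: Vec3) -> Vec3:
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))


def _transpose(m: Mat3) -> List[List[float]]:
    return [[m[c][r] for c in range(3)] for r in range(3)]


def warp_view(color, depth, f: float, src_R: Mat3, src_t: Vec3, dst_R: Mat3, dst_t: Vec3):
    """Forward-warp one existing view into the requested camera.

    Poses are camera-to-world (X_world = R @ X_cam + t); both cameras share
    focal length f (in pixels) and an image-center principal point.
    Returns (color, depth) of the requested view; None marks unmapped pixels.
    """
    h, w = len(color), len(color[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    dst_R_inv = _transpose(dst_R)  # the inverse of a rotation matrix is its transpose
    out_c: List[List[Optional[float]]] = [[None] * w for _ in range(h)]
    out_z: List[List[Optional[float]]] = [[None] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            z = depth[v][u]
            if z is None or z <= 0:
                continue
            # Back-project pixel (u, v) at depth z into source-camera coordinates.
            x_cam = ((u - cx) * z / f, (v - cy) * z / f, z)
            x_world = tuple(a + b for a, b in zip(_matvec(src_R, x_cam), src_t))
            x_dst = _matvec(dst_R_inv, tuple(a - b for a, b in zip(x_world, dst_t)))
            if x_dst[2] <= 0:
                continue  # the point lies behind the requested camera
            u2 = round(f * x_dst[0] / x_dst[2] + cx)
            v2 = round(f * x_dst[1] / x_dst[2] + cy)
            if 0 <= u2 < w and 0 <= v2 < h:
                # Z-buffer test: keep the surface nearest to the requested camera.
                if out_z[v2][u2] is None or x_dst[2] < out_z[v2][u2]:
                    out_z[v2][u2] = x_dst[2]
                    out_c[v2][u2] = color[v][u]
    return out_c, out_z
```

When several source pixels land on the same destination pixel, the z-buffer test keeps the nearest surface. The returned depth can be reused when views or objects are later combined.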

Pixel information from a single perspective-transformed existing view cannot populate all pixels of a different view. In many cases, however, most if not all pixels of the requested arbitrary view can be harvested from a plurality of perspective-transformed existing views. Merge engine 114 of arbitrary view generator 102 combines pixels from a plurality of perspective-transformed existing views to generate the requested arbitrary view. Ideally, all pixels of the arbitrary view are harvested from existing views. This may be possible, for example, when a sufficiently diverse set of existing views or perspectives is available for the asset under consideration and/or when the requested perspective is not too different from the existing perspectives.

Any appropriate technique may be used to combine or merge pixels from a plurality of perspective-transformed existing views to generate the requested arbitrary view. The iterative procedure of one embodiment runs as follows:
- The existing view closest to the requested arbitrary view is selected and retrieved from database 106 and transformed into the perspective of the requested arbitrary view. Pixels are harvested from this perspective-transformed first view and used to populate the corresponding pixels of the requested arbitrary view.
- To populate pixels that could not be obtained from the first view, a second existing view that includes at least some of these remaining pixels is selected and retrieved from database 106 and transformed into the perspective of the requested arbitrary view. The pixels that the first view could not supply are then harvested from this perspective-transformed second view and used to populate the corresponding pixels of the requested arbitrary view.
- This process may be repeated with any number of additional existing views. It ends when all pixels of the requested arbitrary view have been populated and/or when all existing views have been exhausted or a prescribed threshold number of existing views has been used.
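The iterative fill procedure above can be sketched as follows. The sketch assumes that the perspective-transformed views are already sorted nearest-first and that `None` marks a missing pixel. `merge_views` and its `max_views` threshold are hypothetical names. The described embodiment chooses each subsequent view according to which remaining pixels it covers. This simplified version does not do that and instead just walks the pre-sorted list.

```python
from typing import List, Optional, Sequence, Tuple

Image = List[List[Optional[float]]]


def merge_views(warped: Sequence[Image], max_views: Optional[int] = None) -> Tuple[Image, int]:
    """Populate the requested view from perspective-transformed existing views.

    `warped` is assumed to be ordered nearest-first; each later view only fills
    pixels that earlier views left empty (None). Returns the merged image and
    the number of views actually consulted.
    """
    h, w = len(warped[0]), len(warped[0][0])
    out: Image = [[None] * w for _ in range(h)]
    used = 0
    for view in warped[:max_views]:  # a slice ending at None covers all views
        if all(p is not None for row in out for p in row):
            break  # every pixel is already populated
        used += 1
        for v in range(h):
            for u in range(w):
                if out[v][u] is None and view[v][u] is not None:
                    out[v][u] = view[v][u]
    return out, used
```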

In some embodiments, the requested arbitrary view may include some pixels that could not be obtained from any existing view. In such cases, interpolation engine 116 is configured to populate all remaining pixels of the requested arbitrary view. In various embodiments, interpolation engine 116 may use any one or more appropriate interpolation techniques, such as linear interpolation or nearest-neighbor interpolation, to generate these unpopulated pixels. Interpolating pixels introduces averaging or smoothing. Some interpolation does not significantly affect overall image quality, but excessive interpolation may introduce unacceptable blurriness, so interpolation should be used sparingly. As described above, interpolation is avoided entirely when all pixels of the requested arbitrary view can be obtained from existing views. It is introduced only when the requested arbitrary view includes pixels that cannot be obtained from any view. In general, the amount of interpolation needed depends on three factors: the number of existing views available, the diversity of their perspectives, and/or how much the perspective of the arbitrary view differs from those of the existing views.
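A simple way to fill the leftover holes is to repeatedly replace each missing pixel with the mean of its already-filled 4-connected neighbors. This spreads known values inward and acts like linear interpolation across one-pixel gaps. The neighbor-averaging scheme is an illustrative choice, not necessarily the interpolation the patent uses. The function also returns the number of interpolated pixels, since interpolation is meant to be used sparingly and a caller may want to track it.

```python
from typing import List, Optional, Tuple

Image = List[List[Optional[float]]]


def fill_holes(img: Image) -> Tuple[Image, int]:
    """Fill None pixels by averaging filled 4-neighbors, pass by pass.

    Returns the filled image and how many pixels were interpolated. Pixels
    with no reachable filled neighbor (e.g. an all-empty image) stay None.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    filled = 0
    while True:
        holes = [(v, u) for v in range(h) for u in range(w) if out[v][u] is None]
        if not holes:
            return out, filled
        updates = {}
        for v, u in holes:
            nbrs = [out[y][x] for y, x in ((v - 1, u), (v + 1, u), (v, u - 1), (v, u + 1))
                    if 0 <= y < h and 0 <= x < w and out[y][x] is not None]
            if nbrs:
                updates[(v, u)] = sum(nbrs) / len(nbrs)
        if not updates:
            return out, filled  # nothing left that can be reached
        for (v, u), value in updates.items():  # apply after the pass so it is order-independent
            out[v][u] = value
        filled += len(updates)
```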

In the example of FIG. 2, 73 views around a chair object are stored as existing views of the chair. An arbitrary view around the chair object that differs from all of the stored views may be generated using a plurality of these existing views, preferably with minimal interpolation, if any. However, generating and storing such an exhaustive set of existing views may not be efficient or desirable. In some cases, a sufficiently small number of existing views that covers a sufficiently diverse set of perspectives may instead be generated and stored. For example, the 73 views of the chair object may be reduced to a small set of a few views around the chair object.
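An exhaustive set such as the 73 chair views could be reduced to a handful that still covers diverse perspectives by greedy farthest-point selection on the camera azimuth. This technique is swapped in for illustration and is not described in the patent. The sketch assumes that views differ only by azimuth, given in degrees, and `select_diverse` is a hypothetical helper.

```python
from typing import List, Sequence


def circular_distance(a: float, b: float) -> float:
    """Smallest angular difference in degrees between two azimuths."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def select_diverse(azimuths: Sequence[float], k: int) -> List[int]:
    """Greedy farthest-point selection of k view indices, starting from index 0."""
    if not azimuths or k <= 0:
        return []
    chosen = [0]
    while len(chosen) < min(k, len(azimuths)):
        remaining = [i for i in range(len(azimuths)) if i not in chosen]
        # Pick the view whose nearest already-chosen view is farthest away.
        best = max(remaining,
                   key=lambda i: min(circular_distance(azimuths[i], azimuths[j]) for j in chosen))
        chosen.append(best)
    return chosen
```

For example, from 36 views spaced 10 degrees apart, asking for four returns the views at 0, 180, 90, and 270 degrees.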

As mentioned above, in some embodiments, the possible arbitrary views that can be requested may be at least in part constrained. For example, a user may be restricted to moving a virtual camera associated with an interactive scene only to certain positions. In the example of FIG. 2, the possible arbitrary views that can be requested may be limited to arbitrary positions around the chair object. They may exclude, for example, arbitrary positions under the chair object, because insufficient pixel data exists for the bottom of the chair. Such constraints on the allowed arbitrary views ensure that arbitrary view generator 102 can generate a requested arbitrary view from existing data.
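Such a constraint can be expressed as bounds on the camera's spherical coordinates around the object. Requests outside the bounds are clamped to the nearest allowed position. The `ViewConstraints` class, its default limits, and the choice to clamp rather than reject are assumptions made for this sketch. A zero minimum elevation models the rule that views from under the chair may not be requested.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ViewConstraints:
    """Hypothetical bounds on requestable camera positions around an object."""
    min_elevation_deg: float = 0.0   # e.g. forbid views from below the chair
    max_elevation_deg: float = 80.0
    min_radius: float = 1.0
    max_radius: float = 10.0

    def clamp(self, azimuth_deg: float, elevation_deg: float, radius: float) -> Tuple[float, float, float]:
        """Map a requested spherical camera position to the nearest allowed one."""
        az = azimuth_deg % 360.0  # any azimuth around the object is allowed
        el = min(max(elevation_deg, self.min_elevation_deg), self.max_elevation_deg)
        r = min(max(radius, self.min_radius), self.max_radius)
        return az, el, r

    def allows(self, azimuth_deg: float, elevation_deg: float, radius: float) -> bool:
        """True if the request needs no clamping in elevation or radius."""
        return self.clamp(azimuth_deg, elevation_deg, radius)[1:] == (elevation_deg, radius)
```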

Arbitrary view generator 102 generates and outputs the requested arbitrary view 108 in response to the input arbitrary view request 104. The generated arbitrary view 108 is built from pixels of the existing views used to generate it, so its resolution or quality is the same as or comparable to theirs. Thus, using high-definition existing views results in high-definition output in most cases. In some embodiments, the generated arbitrary view 108 is stored in database 106 with the other existing views of the associated scene. It may then be used to generate other arbitrary views of the scene in response to future requests. When input 104 comprises a request for a view that already exists in database 106, the requested view does not need to be generated from other views as described above. Instead, it is retrieved with a simple database lookup and presented directly as output 108.
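The lookup-then-generate behavior, including storing generated views for later requests, can be sketched as a small cache. `ViewStore` is hypothetical. So is the idea of rounding camera parameters to decide when two requests refer to the same view. The view generator itself is passed in as a callable.

```python
from typing import Callable, Dict, Tuple

View = object  # placeholder for whatever image type is used


class ViewStore:
    """Hypothetical wrapper around database 106: exact matches are returned by
    lookup; otherwise the view is generated and stored for future requests."""

    def __init__(self, generate: Callable[[Tuple[float, ...]], View], decimals: int = 3):
        self._views: Dict[Tuple[float, ...], View] = {}
        self._generate = generate
        self._decimals = decimals  # rounding used to decide when two requests match
        self.generated = 0         # how many views had to be synthesized

    def _key(self, camera: Tuple[float, ...]) -> Tuple[float, ...]:
        return tuple(round(x, self._decimals) for x in camera)

    def add_existing(self, camera: Tuple[float, ...], view: View) -> None:
        self._views[self._key(camera)] = view

    def get(self, camera: Tuple[float, ...]) -> View:
        key = self._key(camera)
        if key not in self._views:  # not an existing view: synthesize and keep it
            self._views[key] = self._generate(camera)
            self.generated += 1
        return self._views[key]
```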

Arbitrary view generator 102 may also be configured to generate an arbitrary ensemble view using the described techniques. That is, input 104 may comprise a request to combine a plurality of objects into a single custom view. In such cases, the techniques described above are performed for each of the plurality of objects, and the results are combined into a single consolidated, or ensemble, view comprising the plurality of objects. Specifically:
- asset management engine 110 selects and retrieves existing views of each of the plurality of objects from database 106;
- perspective transformation engine 112 transforms the existing views into the perspective of the requested view;
- merge engine 114 uses pixels from the perspective-transformed existing views to populate the corresponding pixels of the requested ensemble view;
- interpolation engine 116 interpolates any remaining unpopulated pixels of the ensemble view.

In some embodiments, the requested ensemble view may comprise a perspective that already exists for one or more of the objects in the ensemble. In such cases, the existing view of the object asset that corresponds to the requested perspective is used directly to populate the pixels corresponding to that object in the ensemble view, instead of first generating the requested perspective from other existing views of the object.

As an example of an arbitrary ensemble view comprising a plurality of objects, consider the chair object of FIG. 2 and a separately photographed or rendered table object. The chair object and the table object may be combined using the disclosed techniques to generate a single ensemble view of both objects. Thus, separately captured or rendered images or views of each of a plurality of objects can be consistently combined to generate a scene that comprises the plurality of objects and has a desired perspective. As described above, the depth information of each existing view is known. The perspective transformation of each existing view includes a depth transformation, which allows the plurality of objects to be positioned appropriately relative to one another in the ensemble view.
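Each perspective-transformed view carries depth, so the per-object views can be composited with a standard z-buffer in which nearer surfaces occlude farther ones. This sketch assumes that each object's view has already been warped into the ensemble perspective. Each view is given as a `(color, depth)` pair, with `None` where the object does not appear, and `composite` is a hypothetical name.

```python
from typing import List, Optional, Sequence, Tuple

# (color, depth) of one object's view in the ensemble perspective; None = object absent.
Layer = Tuple[List[List[Optional[object]]], List[List[Optional[float]]]]


def composite(layers: Sequence[Layer]) -> List[List[Optional[object]]]:
    """Z-buffer composite of per-object views warped to a common perspective."""
    h, w = len(layers[0][0]), len(layers[0][0][0])
    color: List[List[Optional[object]]] = [[None] * w for _ in range(h)]
    nearest = [[float("inf")] * w for _ in range(h)]
    for layer_color, layer_depth in layers:
        for v in range(h):
            for u in range(w):
                c, z = layer_color[v][u], layer_depth[v][u]
                if c is not None and z is not None and z < nearest[v][u]:
                    nearest[v][u] = z  # this object is the frontmost so far at (u, v)
                    color[v][u] = c
    return color
```

The result does not depend on the order of the layers, because depth alone decides which object is visible at each pixel.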

Generating an arbitrary ensemble view is not limited to combining a plurality of single objects into a custom view. Scenes with a plurality of objects, or a plurality of rich virtual environments, may be combined into a custom ensemble view in the same way. For example, a plurality of separately and independently generated virtual environments may be combined into an ensemble view with a desired perspective. These environments may come from different content sources and have different existing individual perspectives. Thus, in general, arbitrary view generator 102 may be configured to consistently combine or reconcile a plurality of independent assets, possibly with different existing views, into an ensemble view with a desired, possibly arbitrary, perspective. Because all combined assets are normalized to the same perspective, the resulting ensemble view is perfectly harmonized. The possible arbitrary perspectives of the ensemble view may be constrained by the existing views of the individual assets available for generating it.

FIG. 3 is a flow chart illustrating an embodiment of a process for generating an arbitrary perspective. Process 300 may be employed, for example, by arbitrary view generator 102 of FIG. 1. In various embodiments, process 300 may be used to generate an arbitrary view of a given asset or an arbitrary ensemble view.

Process 300 starts at step 302, at which a request for an arbitrary perspective is received. In some embodiments, the request received at step 302 may comprise a request for an arbitrary perspective of a given scene that differs from any existing available perspective of the scene. In such cases, for example, the arbitrary perspective request may be received in response to a requested change in the perspective of a presented view of the scene. Such a change in perspective may be prompted by changing or manipulating a virtual camera associated with the scene, such as by panning the camera, changing the focal length, or changing the zoom level. Alternatively, in some embodiments, the request received at step 302 may comprise a request for an arbitrary ensemble view. As one example, such a request may be received from an application that allows a plurality of independent objects to be selected and provides a consolidated, perspective-corrected ensemble view of the selected objects.

At step 304, a plurality of existing images from which to generate at least a portion of the requested arbitrary perspective is retrieved from one or more associated asset databases. The retrieved images may be associated with a given asset if the request received at step 302 comprises a request for an arbitrary perspective of that asset, or with a plurality of assets if the request comprises a request for an arbitrary ensemble view.

At step 306, each of the plurality of existing images retrieved at step 304, which have different perspectives, is transformed into the arbitrary perspective requested at step 302. Each existing image retrieved at step 304 includes associated perspective information. The perspective of each image is defined by the camera characteristics associated with generating that image, such as relative position, orientation, rotation, angle, depth, focal length, aperture, zoom level, and lighting information. Because complete camera information is known for each image, the perspective transformation of step 306 comprises simple mathematical operations. In some embodiments, step 306 optionally also includes an optical transformation so that all images are consistently normalized to the same desired lighting conditions.
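Because the camera parameters of every existing image are known, the perspective transformation of step 306 can be illustrated as a per-pixel reprojection. The following is a minimal sketch that assumes a simple pinhole camera model with a known per-pixel depth. The `PinholeCamera` record and all helper names are hypothetical and do not come from the source. A pixel of an existing image is lifted to a 3D point and then projected into the camera of the requested perspective.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]
Mat3 = tuple[Vec3, Vec3, Vec3]


@dataclass
class PinholeCamera:
    """Hypothetical per-image camera record: intrinsics plus world-to-camera pose."""
    fx: float
    fy: float
    cx: float
    cy: float
    rotation: Mat3     # R, rows of a world-to-camera rotation matrix
    translation: Vec3  # t, so that X_cam = R @ X_world + t


def mat_vec(m: Mat3, v: Vec3) -> Vec3:
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))


def transpose(m: Mat3) -> Mat3:
    return tuple(tuple(m[j][i] for j in range(3)) for i in range(3))


def unproject(cam: PinholeCamera, u: float, v: float, depth: float) -> Vec3:
    """Lift pixel (u, v) with camera-space depth z to a world-space point."""
    x_cam = ((u - cam.cx) / cam.fx * depth, (v - cam.cy) / cam.fy * depth, depth)
    shifted = tuple(x_cam[i] - cam.translation[i] for i in range(3))
    return mat_vec(transpose(cam.rotation), shifted)  # X_world = R^T (X_cam - t)


def project(cam: PinholeCamera, point: Vec3) -> Vec3:
    """Project a world-space point to (u, v, depth) in the given camera."""
    x, y, z = (a + b for a, b in zip(mat_vec(cam.rotation, point), cam.translation))
    return (cam.fx * x / z + cam.cx, cam.fy * y / z + cam.cy, z)


def reproject(src: PinholeCamera, dst: PinholeCamera, u: float, v: float, depth: float) -> Vec3:
    """Move one pixel of an existing image into the requested perspective."""
    return project(dst, unproject(src, u, v, depth))
```

Consider two cameras with identical intrinsics, where the target camera is shifted one unit along +x (that is, `translation = (-1, 0, 0)`). The centre pixel of the source image at depth 2 then lands at column `cx - fx / 2` in the target image, and its depth is unchanged.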

At step 308, at least a portion of an image having the arbitrary perspective requested at step 302 is populated with pixels harvested from the perspective-transformed existing images. That is, pixels from a plurality of perspective-corrected existing images are used to generate an image having the requested arbitrary perspective.

At step 310, it is determined whether the generated image having the requested arbitrary perspective is complete. If it is determined at step 310 that the generated image is not complete, it is determined at step 312 whether any more existing images are available from which to obtain the remaining unpopulated pixels of the generated image. If it is determined at step 312 that more existing images are available, one or more additional existing images are retrieved at step 314, and process 300 continues at step 306.

If it is determined at step 310 that the generated image having the requested arbitrary perspective is not complete and it is determined at step 312 that no more existing images are available, all remaining unpopulated pixels of the generated image are interpolated at step 316. Any one or more appropriate interpolation techniques may be used at step 316.

If it is determined at step 310 that the generated image having the requested arbitrary perspective is complete, or after all remaining unpopulated pixels are interpolated at step 316, the generated image having the requested arbitrary perspective is output at step 318. Process 300 subsequently ends.
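Steps 306 through 318 of process 300 can be sketched as a populate-then-fill loop. This sketch assumes that each existing image has already been reprojected into the requested camera as a sequence of `(u, v, depth, color)` samples. Two further choices are illustrative only: overlapping samples are resolved by keeping the nearest surface, and holes are filled by averaging neighbouring pixels. The source leaves the interpolation technique open. The function names are hypothetical.

```python
def interpolate_holes(image):
    """Step 316 (one possible choice): fill empty pixels by averaging filled 4-neighbours."""
    height, width = len(image), len(image[0])
    while any(p is None for row in image for p in row):
        updates = {}
        for y in range(height):
            for x in range(width):
                if image[y][x] is not None:
                    continue
                filled = [image[ny][nx]
                          for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1))
                          if 0 <= nx < width and 0 <= ny < height and image[ny][nx] is not None]
                if filled:
                    updates[(x, y)] = tuple(sum(c[i] for c in filled) / len(filled) for i in range(3))
        if not updates:  # nothing left to propagate from, e.g. an entirely empty image
            break
        for (x, y), color in updates.items():
            image[y][x] = color
    return image


def generate_view(width, height, transformed_views):
    """Steps 306-318 in miniature. Each view is an iterable of (u, v, depth, color)
    samples that have already been reprojected into the requested camera."""
    image = [[None] * width for _ in range(height)]
    zbuf = [[float("inf")] * width for _ in range(height)]
    for view in transformed_views:  # steps 312/314: take the next available image
        for u, v, depth, color in view:  # step 308: harvest pixels
            x, y = round(u), round(v)
            if 0 <= x < width and 0 <= y < height and depth < zbuf[y][x]:
                zbuf[y][x] = depth  # nearest surface wins (assumed depth test)
                image[y][x] = color
        if all(p is not None for row in image for p in row):  # step 310
            return image  # step 318: complete without interpolation
    return interpolate_holes(image)  # step 316, then output (step 318)
```

If `transformed_views` is a lazy generator, no further existing images are retrieved once the image is complete. This matches the early exit from step 310 to step 318.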

As described, the disclosed techniques may be used to generate an arbitrary perspective based on other existing perspectives. Because camera information is preserved with each existing perspective, different existing perspectives can be normalized to a common desired perspective. A resulting image having the desired perspective can be constructed by harvesting pixels from the perspective-transformed existing images. The processing associated with generating an arbitrary perspective using the disclosed techniques is not only fast and nearly instantaneous but also yields high-quality output. This makes the disclosed techniques particularly powerful for interactive, real-time graphics applications.

The disclosed techniques further describe generating an arbitrary ensemble view comprising a plurality of objects using available images or views of each of the objects. As described, perspective transformation and/or normalization allow the pixels comprising separately captured or rendered images or views of the objects to be consistently combined into a desired arbitrary ensemble view.

In some embodiments, it may be desirable to first construct or assemble a scene or ensemble view by selecting and arranging the content to be included in it. In some such cases, a plurality of objects may be stacked or combined like building blocks to create a composite object comprising the scene or ensemble view. As one example, consider an interactive application in which a plurality of independent objects is selected and appropriately positioned, for example on a canvas, to create a scene or ensemble view. Such an interactive application may comprise, for example, a visualization or modeling application. In such applications, arbitrary views of the objects cannot be used to build the scene or ensemble view because of the perspective distortion that results from the associated focal lengths. Instead, prescribed object views that are substantially free of perspective distortion are used, as described next.

In some embodiments, orthographic views of objects are used to model or define a scene or ensemble view comprising a plurality of independent objects. An orthographic view comprises a parallel projection. It is approximated by a (virtual) camera that has a relatively long focal length and is positioned far from the subject relative to the subject's size, so that the light rays or projection lines are substantially parallel. An orthographic view has no depth or a fixed depth and therefore exhibits little or no perspective distortion. Thus, orthographic views of objects may be used much like building blocks when specifying an ensemble scene or composite object. After an ensemble scene comprising any combination of objects has been specified or defined using such orthographic views, the scene or its objects may be transformed into any desired camera perspective using the arbitrary view generation techniques described above with respect to FIGS. 1-3.
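The following minimal sketch contrasts the two projections. It assumes an orthographic view that stores a per-pixel depth and uses a fixed world-units-per-pixel scale, together with a pinhole target camera looking along +z. All names and parameters are hypothetical. In the orthographic view, two points with the same lateral offset fall on the same column regardless of their depth, which is why such views stack cleanly. The perspective projection separates the same two points by depth, recovering the depth cue that the orthographic view gives up.

```python
def ortho_to_point(u, v, depth, units_per_px, cx, cy):
    """Orthographic pixel -> 3D point: x and y scale linearly with pixel offset, depth is stored."""
    return ((u - cx) * units_per_px, (v - cy) * units_per_px, depth)


def ortho_project(point, units_per_px, cx, cy):
    """Parallel projection: the image position ignores depth entirely."""
    x, y, _ = point
    return (x / units_per_px + cx, y / units_per_px + cy)


def perspective_project(point, focal_px, cx, cy, camera_distance):
    """Pinhole camera placed camera_distance behind the origin, looking along +z."""
    x, y, z = point
    zc = z + camera_distance  # depth of the point in front of the camera
    return (focal_px * x / zc + cx, focal_px * y / zc + cy)


near = ortho_to_point(150, 100, 0.0, 0.01, 100, 100)
far = ortho_to_point(150, 100, 1.0, 0.01, 100, 100)
print(ortho_project(near, 0.01, 100, 100) == ortho_project(far, 0.01, 100, 100))
print(perspective_project(near, 1000, 100, 100, 4.0), perspective_project(far, 1000, 100, 100, 4.0))
```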

In some embodiments, the plurality of views of an asset stored in database 106 of system 100 of FIG. 1 includes one or more orthographic views of the asset. Such orthographic views may be captured (e.g., photographed or scanned) or rendered from a three-dimensional polygon mesh model. Alternatively, orthographic views may be generated from other views of the asset available in database 106 according to the arbitrary view generation techniques described above with respect to FIGS. 1-3.

FIGS. 4A-4N illustrate example embodiments of an application in which independent objects are combined to generate an ensemble or composite object or scene. Specifically, FIGS. 4A-4N illustrate an example of a furniture assembly application in which various independent sofa components are combined to generate different modular sofa configurations.

FIG. 4A is an example of perspective views of three independent sofa components: a one-seater with a left arm, an armless loveseat, and a chaise longue with a right arm. Each of the perspective views in the example of FIG. 4A has a focal length of 25 mm. As can be seen, the resulting perspective distortion prevents the components from being stacked adjacent to one another (i.e., placed side by side), which may be desired when assembling a modular sofa configuration from the components.

FIG. 4B illustrates an example of orthographic views of the same three components as FIG. 4A. As depicted, the orthographic views of the objects are modular, or block-like, and suitable for stacking or placing side by side. However, depth information is substantially lost in the orthographic views. Depth differences are visible in FIG. 4A, particularly for the chaise longue, whereas in the orthographic views all three components appear to have the same depth.

FIG. 4C illustrates an example of combining the orthographic views of the three components of FIG. 4B to define a composite object. That is, FIG. 4C illustrates generating an orthographic view of a modular sofa by placing the orthographic views of the three components of FIG. 4B side by side. As depicted in FIG. 4C, the bounding boxes of the orthographic views of the three sofa components fit snugly against one another to form the orthographic view of the modular sofa. In other words, orthographic views of the components facilitate user-friendly manipulation and precise placement of the components within the scene.
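Because orthographic views carry no perspective distortion, the side-by-side placement of FIG. 4C reduces to abutting bounding boxes. The sketch below assumes axis-aligned boxes that share a common baseline and are measured in scene units. The `OrthoView` record, the `(name, x_min, y_min, x_max, y_max)` layout, and the example widths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OrthoView:
    name: str
    width: float   # bounding-box width of the orthographic view, in scene units
    height: float


def stack_side_by_side(views, origin_x=0.0, baseline_y=0.0):
    """Place orthographic component views left to right so their bounding boxes abut."""
    placements = []
    x = origin_x
    for view in views:
        placements.append((view.name, x, baseline_y, x + view.width, baseline_y + view.height))
        x += view.width  # the next box starts exactly where this one ends
    return placements


sofa = stack_side_by_side([OrthoView("one_seater_left_arm", 90, 80),
                           OrthoView("armless_loveseat", 160, 80),
                           OrthoView("chaise_right_arm", 170, 80)])
```

In the resulting `sofa` list, each component's right edge equals the next component's left edge. The combined orthographic view is therefore gap-free before it is transformed into a camera perspective.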

FIGS. 4D and 4E each illustrate an example of transforming the orthographic view of the composite object of FIG. 4C into an arbitrary camera perspective using the arbitrary view generation techniques described above with respect to FIGS. 1-3. That is, in each of the examples of FIGS. 4D and 4E, the orthographic view of the composite object has been transformed into a normal camera perspective that accurately depicts depth. As depicted, the depth of the chaise longue relative to the one-seater and the loveseat, which was lost in the orthographic view, is visible in the perspective views of FIGS. 4D and 4E.

FIGS. 4F, 4G, and 4H illustrate examples of sets of orthographic views of the one-seater with a left arm, the armless loveseat, and the chaise longue with a right arm, respectively. As previously described, any number of different views or perspectives of an asset may be stored in database 106 of system 100 of FIG. 1. Each of the sets of FIGS. 4F-4H comprises 25 orthographic views corresponding to different angles around the asset. These views are separately captured or rendered and stored in database 106, and any arbitrary view of any combination of the objects may be generated from them. In a furniture assembly application, for example, top views may be useful for floor placement, while front views may be useful for wall placement. In some embodiments, to maintain a more compact reference dataset, only a prescribed number of orthographic views of an asset is stored in database 106, and any arbitrary view of the asset may be generated from them.

FIGS. 4I-4N illustrate various examples of generating arbitrary views or perspectives of arbitrary combinations of objects. Specifically, each of FIGS. 4I-4N illustrates generating an arbitrary perspective or view of a modular sofa comprising a plurality of independent sofa objects or components. Each arbitrary view may be generated, for example, using the arbitrary view generation techniques described above with respect to FIGS. 1-3. One or more orthographic (or other) views of the objects that make up the ensemble view or composite object are transformed into the arbitrary view, pixels are harvested to populate it, and any remaining missing pixels are interpolated if needed.

As previously described, each image or view of an asset in database 106 may be stored with corresponding metadata, such as relative object and camera position and orientation information as well as lighting information. The metadata may be generated when rendering a view from a three-dimensional polygon mesh model of an asset, when imaging or scanning the asset (in which case depth and/or surface normal data may be estimated), or by a combination of both.

A given view or image of an asset comprises a pixel intensity value (e.g., an RGB value) for each pixel of the image, as well as various metadata parameters associated with each pixel. In some embodiments, one or more of a pixel's red, green, and blue (RGB) channels or values may be used to encode pixel metadata. The pixel metadata may include, for example, information about the relative location or position (e.g., x, y, and z coordinate values) of the point in three-dimensional space that projects to that pixel. The pixel metadata may also include information about the surface normal vector at that location (e.g., the angles it makes with the x, y, and z axes). Moreover, the pixel metadata may include texture mapping coordinates (e.g., u and v coordinate values). In such cases, the actual pixel value at a point is determined by reading the RGB value at the corresponding coordinates of a texture image.
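The per-pixel metadata described above can be modelled as a small record. The sketch below makes three assumptions beyond the source. First, normals are packed into 8-bit RGB with the widely used `n * 0.5 + 0.5` mapping; the source only says RGB channels may carry metadata, without specifying an encoding. Second, texture coordinates lie in [0, 1]. Third, the texture is sampled by nearest-neighbour lookup. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PixelMetadata:
    position: tuple[float, float, float]  # 3D point that projects to this pixel
    normal: tuple[float, float, float]    # unit surface normal at that point
    uv: tuple[float, float]               # texture mapping coordinates in [0, 1]


def encode_normal_rgb(normal):
    """Pack a unit normal into 8-bit RGB with the common n * 0.5 + 0.5 mapping."""
    return tuple(round((c * 0.5 + 0.5) * 255) for c in normal)


def decode_normal_rgb(rgb):
    """Inverse of encode_normal_rgb, accurate to within about 1/255 per component."""
    return tuple(c / 255 * 2.0 - 1.0 for c in rgb)


def pixel_color(meta: PixelMetadata, texture):
    """Nearest-neighbour texture lookup; texture is a list of rows of RGB tuples."""
    h, w = len(texture), len(texture[0])
    x = min(int(meta.uv[0] * w), w - 1)
    y = min(int(meta.uv[1] * h), h - 1)
    return texture[y][x]
```

Because the colour is looked up through `uv` rather than stored directly, the same metadata renders differently when the referenced texture image changes.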

Surface normal vectors facilitate modifying or varying the lighting of a generated arbitrary view or scene. More specifically, relighting a scene comprises scaling pixel values based on how well the surface normal vectors of the pixels match the direction of a newly added, removed, or otherwise altered light source. This match may be quantified, at least in part, by the dot product of the light source direction and the pixel normal vector. Specifying pixel values using texture mapping coordinates facilitates modifying or varying the texture of a generated arbitrary view or scene, or a portion thereof. More specifically, the texture can be changed simply by swapping or replacing a referenced texture image with another texture image of the same dimensions.
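The source states that relighting scales pixel values based, at least in part, on the dot product of the light direction and the pixel normal. It also states that a texture can be changed by swapping in another texture image of the same dimensions. The sketch below illustrates both operations. Clamping the dot product at zero (Lambertian-style shading) and clamping colours to [0, 1] are assumptions, as are all function names.

```python
import math


def relight(base_color, normal, light_dir, intensity=1.0):
    """Scale a pixel's colour by n . l, the cosine between its normal and the light.

    Clamping n . l at zero (surfaces facing away receive no light) and clamping
    the result to [0, 1] are assumed Lambertian-style choices.
    """
    length = math.sqrt(sum(c * c for c in light_dir))
    light = [c / length for c in light_dir]
    n_dot_l = max(0.0, sum(n * l for n, l in zip(normal, light)))
    return tuple(min(1.0, c * intensity * n_dot_l) for c in base_color)


def swap_texture(textures, key, new_texture):
    """Replace a referenced texture image with another of identical dimensions."""
    old = textures[key]
    if len(new_texture) != len(old) or len(new_texture[0]) != len(old[0]):
        raise ValueError("replacement texture must have the same dimensions")
    textures[key] = new_texture
```

Both operations touch only per-pixel values or a texture reference, with no geometry re-rendered. This is consistent with the low-cost lookup character of the technique.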

The disclosed arbitrary view generation techniques are effectively based on relatively low-cost perspective transformations and/or lookup operations. An arbitrary (ensemble) view may be generated simply by selecting the correct pixels and appropriately populating the generated arbitrary view with them. In some cases, pixel values may optionally be scaled, for example when lighting is adjusted. The low storage and processing overhead of the disclosed techniques facilitates fast, real-time, or on-demand generation of arbitrary views of complex scenes, with quality comparable to that of the high-definition reference views from which they are generated.

As described above, in some embodiments, assembling an ensemble or composite object or scene comprises specifying the plurality of object assets that make up the ensemble using orthographic views. Orthographic views facilitate precise placement and alignment of the objects or assets in the ensemble scene. The orthographic view of the ensemble scene may then be transformed into any arbitrary camera perspective, for example to generate any desired or requested perspective. Transforming an ensemble view into a given camera perspective may comprise individually transforming each of the objects or assets that make up the ensemble scene into that perspective using the techniques described above. The techniques described above for generating an arbitrary ensemble view are relatively efficient. Even greater efficiency may nonetheless be desirable in certain applications, such as those that provide users with an interactive, real-time experience. In such applications it is advantageous to generate output almost instantly, or at least very quickly, with a latency penalty that is barely perceptible to the end user.

In some embodiments, further gains in efficiency may be achieved, at least in part, by eliminating the processing associated with transforming most of the objects or assets that make up an ensemble scene (e.g., their orthographic or other views) into a given arbitrary perspective. Instead, for each such object or asset, the output ensemble view or image representing the given perspective uses the available existing view closest or most similar to that perspective for the object's position and orientation in the ensemble scene. In most cases, the resulting output ensemble view is not completely perspective-correct. It does, however, provide a reasonable approximation that is acceptable for many applications and that is generated with significantly lower latency than a completely perspective-correct output. The generation of such an approximation of an arbitrary ensemble view for an arbitrary camera pose, using a maximally quantized subset of the pre-existing reference views of one or more of the objects or assets that make up the ensemble, is described in further detail next.

FIG. 5 is a high-level flow chart illustrating an embodiment of a process for generating an arbitrary ensemble view. In some embodiments, process 500 is used to efficiently generate an output image of an ensemble scene. The generation is based, at least in part, on appropriately combining or compositing the single best-matching existing view of each of at least one or more (if not most or all) of the objects or assets that make up the ensemble scene.

Process 500 starts at step 502, at which a request for a given perspective of an ensemble scene is received. The requested perspective comprises a camera perspective selected or otherwise specified for the ensemble scene and, in general, may comprise any arbitrary view. In this context, an arbitrary view comprises any desired view or perspective of a scene whose specification or camera pose is not known in advance of the request. The ensemble scene comprises a composite view of a plurality of independent objects or assets. In general, the specification of an independent object or asset comprises a set of existing reference images or views of that object or asset, having different camera perspectives, together with corresponding metadata. One or more of these images or views may be used to generate or specify the portion of the ensemble scene associated with the object or asset. In some embodiments, the request of step 502 is received from an interactive mobile or web-based application. Such an application facilitates manipulating the camera angle or camera pose in the ensemble scene space and/or arranging a plurality of objects or assets to create a composite or ensemble scene. For example, the request may be received from a visualization, modeling, or augmented reality (AR) application. In some embodiments, the request of step 502 is received with respect to an orthographic view of the ensemble scene, because orthographic views make the objects or assets of the ensemble scene easier to manipulate, place, and align.

At step 504, the closest or most similar matching existing reference image or view is selected for each of at least a subset of the one or more objects or assets that make up the ensemble scene. Step 504 may be performed sequentially and/or in parallel for the individual or independent objects or assets that make up the ensemble scene. In some embodiments, only one (i.e., a single) existing reference image or view is selected for an object or asset. This is the view that best matches the requested perspective for the object's or asset's given pose in the ensemble scene space. The ensemble scene space comprises an ensemble scene coordinate system whose origin is defined in an appropriate manner, such as at the center (e.g., the centroid) of the ensemble scene. At step 504, the position and orientation, or pose, of the object or asset with respect to the ensemble scene coordinate system is first determined. This pose is then translated, transformed, or otherwise correlated into an equivalent pose in the individual coordinate system associated with the existing reference images or views of the object or asset, so that the closest matching reference can be selected. Thus, simple camera-metric computations of relatively low computational complexity, based on the requested perspective and the relative object or asset poses in the ensemble scene, suffice to select the closest matching existing reference image or view at step 504.
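The pose correlation and closest-view selection of step 504 can be sketched with a few vector operations. The sketch rests on several assumptions. Each reference view is labelled by the direction, in the object's own frame, from the object toward the camera that produced it, expressed as azimuth and elevation in degrees with y up. An object's orientation in the ensemble scene is a yaw about +y only. "Closest" means the smallest angle between viewing directions. The source does not fix any of these conventions, and all names are hypothetical.

```python
import math


def direction(azimuth_deg, elevation_deg):
    """Unit vector for a viewing direction; y is up, azimuth measured in the x-z plane from +z."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.sin(az), math.sin(el), math.cos(el) * math.cos(az))


def to_object_frame(vec, object_yaw_deg):
    """Undo the object's yaw (rotation about +y) to express a world vector in object coordinates."""
    a = math.radians(-object_yaw_deg)
    x, y, z = vec
    return (x * math.cos(a) + z * math.sin(a), y, -x * math.sin(a) + z * math.cos(a))


def angle_between(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


def select_reference_view(camera_pos, object_pos, object_yaw_deg, reference_views):
    """Return (view_id, angular_error_deg) of the existing view closest to the requested one.

    reference_views maps a view id to the (azimuth, elevation), in the object's own
    frame, of the camera from which that view was captured or rendered.
    """
    offset = tuple(c - o for c, o in zip(camera_pos, object_pos))  # object -> camera
    length = math.sqrt(sum(c * c for c in offset))
    local_dir = to_object_frame(tuple(c / length for c in offset), object_yaw_deg)
    best = min(reference_views,
               key=lambda vid: angle_between(local_dir, direction(*reference_views[vid])))
    return best, angle_between(local_dir, direction(*reference_views[best]))
```

Because the direction is computed from each object's own position, two identical objects at different places in the ensemble scene may legitimately select different reference views for the same camera.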

One or more criteria and/or thresholds may be defined for determining or identifying the closest matching existing reference image or view of an object or asset. In some cases, an existing reference image or view is selected at step 504 only if one or more such thresholds are satisfied. Ideally, an exact match is found and selected at step 504. In some cases, however, one or more selection criteria and/or thresholds cannot be satisfied. This may happen if the available set of existing reference images is too incomplete (e.g., the available existing reference images or views of the object or asset differ substantially from the requested perspective) or if no reference images or views of the object or asset are available at all. In some such cases, the closest matching placeholder or ghost image or view of the object or asset is selected at step 504 instead. Such a placeholder image or view represents the shape of the object or asset but lacks its other attributes, such as texture and optical properties. In some embodiments, a set of placeholder images is generated and stored for each unique object shape using rendering techniques of relatively low computational complexity. The set spans a sufficiently dense set of possible views around the object or asset (e.g., angles covering 360° around it). Placeholders are used when a fully rendered version of the object or asset is unavailable or deviates unacceptably from the requested perspective.
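The threshold test and placeholder fallback can be sketched as follows. The sketch assumes that the requested view and every stored view are expressed as `(azimuth, elevation)` directions in degrees in the object's own frame, and that the single selection criterion is a maximum angular error. The source allows one or more unspecified criteria and/or thresholds. The placeholder set in the usage example is likewise a hypothetical dense ring of shape-only views.

```python
import math


def angular_distance(a, b):
    """Great-circle angle in degrees between two (azimuth, elevation) directions."""
    az1, el1, az2, el2 = map(math.radians, (*a, *b))
    cos_d = math.sin(el1) * math.sin(el2) + math.cos(el1) * math.cos(el2) * math.cos(az1 - az2)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_d))))


def choose_view(requested, rendered_views, placeholder_views, max_error_deg):
    """Use the closest fully rendered view if it is within max_error_deg of the request;
    otherwise fall back to the closest placeholder (shape-only) view."""
    if rendered_views:
        vid = min(rendered_views, key=lambda k: angular_distance(requested, rendered_views[k]))
        if angular_distance(requested, rendered_views[vid]) <= max_error_deg:
            return "rendered", vid
    vid = min(placeholder_views, key=lambda k: angular_distance(requested, placeholder_views[k]))
    return "placeholder", vid


ghosts = {f"ghost{a}": (float(a), 0.0) for a in range(0, 360, 5)}  # dense ring of placeholders
print(choose_view((45.0, 0.0), {"front": (0.0, 0.0), "side": (90.0, 0.0)}, ghosts, 15.0))
```

With only front and side views rendered, a 45° request exceeds the assumed 15° tolerance. The object is therefore represented by its nearest placeholder, which preserves its shape and pose but not its texture.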

In step 506, an output image of the ensemble scene is generated for the requested predefined perspective, at least in part, by appropriately combining or compositing the closest matching existing reference images or views, selected in step 504, of the objects or assets that make up the ensemble scene. Step 506 may include appropriately scaling or resizing the closest matching existing reference images or views selected for the objects or assets, and/or determining the location or position within the ensemble view at which to paste or composite each of them. In most cases, the generated output image of the ensemble scene closely approximates the requested predefined perspective. Its perspective is not completely accurate, however, because most objects or assets in the ensemble scene are represented in the output image by their closest available existing pose rather than being rendered or generated exactly. That is, in most cases these objects or assets will not have the requested predefined perspective in the output image unless a perfect match is found among the available existing images or views.
Although the vanishing points of such objects or assets do not all converge to the same point in the output image, the objects or assets are in most cases offset or tilted by a small enough amount (e.g., a few degrees) to trick the human visual system into perceiving the output image as having largely correct perspective.
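The scaling, positioning, and compositing of step 506 might look like the following sketch. It assumes each selected image carries an alpha channel, that the scene layout already supplies a pixel position, scale factor, and camera distance for every asset, and that nearer assets should occlude farther ones (painter's algorithm). `Placement`, `resize_nearest`, `over`, and `composite` are hypothetical names; a production system would more likely use an image library with filtered resampling.

```python
from dataclasses import dataclass
from typing import List, Tuple

Pixel = Tuple[float, float, float, float]  # straight (non-premultiplied) RGBA, each in [0, 1]
Image = List[List[Pixel]]                  # list of rows


def resize_nearest(img: Image, scale: float) -> Image:
    """Nearest-neighbor rescale of an asset's selected reference image."""
    h, w = len(img), len(img[0])
    nh, nw = max(1, round(h * scale)), max(1, round(w * scale))
    return [[img[min(h - 1, int(y / scale))][min(w - 1, int(x / scale))]
             for x in range(nw)] for y in range(nh)]


def over(src: Pixel, dst: Pixel) -> Pixel:
    """Porter-Duff "source over destination" for straight alpha."""
    sa, da = src[3], dst[3]
    out_a = sa + da * (1.0 - sa)
    if out_a == 0.0:
        return (0.0, 0.0, 0.0, 0.0)
    r, g, b = ((s * sa + d * da * (1.0 - sa)) / out_a for s, d in zip(src[:3], dst[:3]))
    return (r, g, b, out_a)


@dataclass
class Placement:
    """Hypothetical layout record for one asset in the ensemble view."""
    image: Image   # the single existing image selected for the asset
    x: int         # left column of the pasted image in the output
    y: int         # top row of the pasted image in the output
    scale: float   # resize factor for the selected image
    depth: float   # distance from the camera; larger means farther away


def composite(background: Image, placements: List[Placement]) -> Image:
    """Paste assets back to front onto a copy of the background, clipping at the edges."""
    out = [row[:] for row in background]
    for item in sorted(placements, key=lambda q: q.depth, reverse=True):
        for dy, row in enumerate(resize_nearest(item.image, item.scale)):
            for dx, px in enumerate(row):
                oy, ox = item.y + dy, item.x + dx
                if 0 <= oy < len(out) and 0 <= ox < len(out[oy]):
                    out[oy][ox] = over(px, out[oy][ox])
    return out
```

Sorting by depth is one simple way to resolve overlaps between independently selected asset images; it ignores interpenetrating geometry, which the approximate, per-asset nature of the selected views cannot represent anyway.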

Consistency in the output image of the ensemble scene is further promoted by generating at least some parts of the ensemble scene in a globally consistent manner, which makes it easier for a human to interpret the output image as substantially visually accurate. For example, one or more objects or assets in the ensemble scene, and/or surfaces (flat or otherwise), structural elements, global features, etc., of the ensemble scene, may be rendered or generated to be perspective accurate, i.e., to have the requested predefined perspective rather than an approximation of it. If the ensemble scene includes a space such as a room, for instance, the walls, ceiling, floor, rugs, wall hangings, etc., may be generated using the camera pose of the requested perspective and therefore be accurately represented in the output image of the ensemble scene generated in step 506.
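To illustrate why surfaces generated from the requested camera pose are perspective accurate, the sketch below projects world points through a single pinhole camera. It assumes a look-at camera pose, world +Z up, and illustrative focal-length and principal-point values in pixels; `project_point` and its parameters are hypothetical. Because every wall and floor vertex passes through the same projection, parallel edges such as the two sides of a floor converge to a common vanishing point, which is not guaranteed for approximated asset images.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _normalize(v: Vec3) -> Vec3:
    n = math.sqrt(_dot(v, v))
    if n == 0.0:
        raise ValueError("degenerate vector")
    return (v[0] / n, v[1] / n, v[2] / n)


def project_point(point: Vec3, eye: Vec3, target: Vec3,
                  focal_px: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a world point for a camera at `eye` looking at `target`.
    World +Z is assumed to be up; image y grows downward."""
    forward = _normalize(_sub(target, eye))
    right = _normalize(_cross(forward, (0.0, 0.0, 1.0)))  # fails if looking straight up or down
    up = _cross(right, forward)
    d = _sub(point, eye)
    x, y, z = _dot(d, right), _dot(d, up), _dot(d, forward)
    if z <= 0.0:
        raise ValueError("point is behind the camera")
    return (cx + focal_px * x / z, cy - focal_px * y / z)


if __name__ == "__main__":
    # Hypothetical requested camera: eye height 1.5 m, looking horizontally along +Y.
    camera = dict(eye=(0.0, 0.0, 1.5), target=(0.0, 10.0, 1.5),
                  focal_px=500.0, cx=320.0, cy=240.0)
    floor = [(-2.0, 2.0, 0.0), (2.0, 2.0, 0.0), (2.0, 8.0, 0.0), (-2.0, 8.0, 0.0)]
    # Image-space polygon to fill with the floor texture at the exact requested perspective.
    print([project_point(p, **camera) for p in floor])
```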
Furthermore, the output image of the ensemble scene may have a global lighting position that affects all parts of the scene in a similarly consistent manner, for example, when the scene is relit using available metadata such as surface normal vectors. Thus, by generating some parts of the ensemble view in a global, perspective-correct manner and representing most of the independent objects that make up the ensemble view by best approximations, the technique often produces output that is nearly indistinguishable from a fully perspective-accurate version. Some skew may be visible in some cases, but this may still be acceptable for applications that do not require a completely accurate view, such as mood board or space/room planning applications, in which a designer or user benefits from seeing the ensemble of objects or assets regardless of perspective accuracy. That said, as the repository or database of available existing images or views grows over time, the disclosed technique generates output that represents the requested predefined perspective increasingly accurately. In the optimal case, an exact match is found for every object or asset, and these matches are used to generate an output image that actually has the requested predefined perspective rather than an approximation of it.
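A global relighting pass of the kind described above could be sketched as a simple Lambertian model in which one light direction is shared by every pixel of the composited image. The sketch assumes that per-pixel unit surface normals are available as metadata for both the asset images and the generated surfaces, and that a constant ambient term is acceptable; `relight` and its parameters are hypothetical, and a real renderer would likely use a richer lighting model.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
RGB = Tuple[float, float, float]


def relight(albedo: List[List[RGB]], normals: List[List[Vec3]],
            light_dir: Vec3, ambient: float = 0.2) -> List[List[RGB]]:
    """Shade every pixel with the same directional light (Lambertian model).

    albedo:    unlit color per pixel, values in [0, 1]
    normals:   unit surface normal per pixel (the assumed metadata)
    light_dir: direction from surfaces toward the light, shared by the whole scene
    ambient:   fraction of light reaching surfaces that face away from the light
    """
    length = math.sqrt(sum(c * c for c in light_dir))
    if length == 0.0:
        raise ValueError("light_dir must be non-zero")
    light = tuple(c / length for c in light_dir)
    out = []
    for albedo_row, normal_row in zip(albedo, normals):
        row = []
        for color, normal in zip(albedo_row, normal_row):
            # Cosine falloff, clamped so back-facing pixels receive only ambient light.
            lambert = max(0.0, sum(n * l for n, l in zip(normal, light)))
            k = ambient + (1.0 - ambient) * lambert
            row.append(tuple(min(1.0, c * k) for c in color))
        out.append(row)
    return out
```

Because the same `light` vector is applied to asset images and generated surfaces alike, shading cues stay consistent across the scene even though the assets themselves come from independently rendered reference images.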

Although the foregoing embodiments have been described in some detail for ease of understanding, the invention is not limited to the details provided. There are many alternative ways of implementing the invention. The disclosed embodiments are illustrative and not restrictive.
[Application Example 1] A method, comprising:
receiving a request for a predefined perspective of an ensemble scene comprising a plurality of assets; and
generating an output image of the ensemble scene that approximates the requested predefined perspective based at least in part on combining a single existing image of each of at least a portion of the plurality of assets.
[Application Example 2] The method of Application Example 1, wherein the request is received for an orthographic view of the ensemble scene.
[Application Example 3] The method of Application Example 2, wherein the orthographic view of the ensemble scene comprises a combined orthographic view of the plurality of assets.
[Application Example 4] The method of Application Example 1, further comprising selecting the single existing image of each of the at least a portion of the plurality of assets.
[Application Example 5] The method of Application Example 4, wherein the selecting comprises selecting an exact match to the requested predefined perspective.
[Application Example 6] The method of Application Example 4, wherein the selecting comprises selecting a closest or most similar available match to the requested predefined perspective.
[Application Example 7] The method of Application Example 4, wherein the selecting comprises selecting based on a pose of an associated asset in the ensemble scene.
[Application Example 8] The method of Application Example 4, wherein the selecting comprises selecting a rotated existing image of an associated asset.
[Application Example 9] The method of Application Example 4, wherein the selecting comprises selecting a closest or most similar available match to the requested predefined perspective based on a pose of an associated asset in the ensemble scene.
[Application Example 10] The method of Application Example 1, wherein generating the output image of the ensemble scene comprises scaling the single existing image of one or more assets of the portion of assets.
[Application Example 11] The method of Application Example 1, wherein generating the output image of the ensemble scene comprises resizing the single existing image of one or more assets of the portion of assets.
[Application Example 12] The method of Application Example 1, wherein generating the output image of the ensemble scene comprises determining a location in the ensemble scene at which to include the single existing image of each of at least the portion of assets.
[Application Example 13] The method of Application Example 1, wherein the combining comprises compositing.
[Application Example 14] The method of Application Example 1, wherein generating the output image of the ensemble scene comprises generating a view of at least one asset of the plurality of assets that has the requested predefined perspective.
[Application Example 15] The method of Application Example 14, wherein the view is generated using a plurality of existing images of the at least one asset.
[Application Example 16] The method of Application Example 1, wherein generating the output image of the ensemble scene comprises generating at least one portion of the ensemble scene to have the requested predefined perspective.
[Application Example 17] The method of Application Example 16, wherein the at least one portion comprises a surface of the ensemble scene.
[Application Example 18] The method of Application Example 16, wherein the at least one portion comprises a structural element of the ensemble scene.
[Application Example 19] The method of Application Example 16, wherein the at least one portion comprises a global feature of the ensemble scene.
[Application Example 20] The method of Application Example 1, further comprising globally relighting the generated output image of the ensemble scene.
[Application Example 21] The method of Application Example 1, wherein the output image comprises a frame of a video sequence.
[Application Example 22] A system, comprising:
a processor configured to:
receive a request for a predefined perspective of an ensemble scene comprising a plurality of assets; and
generate an output image of the ensemble scene that approximates the requested predefined perspective based at least in part on combining a single existing image of each of at least a portion of the plurality of assets; and
a memory coupled to the processor and configured to provide the processor with instructions.
[Application Example 23] A computer program product embodied in a non-transitory computer-readable storage medium and comprising computer instructions for:
receiving a request for a predefined perspective of an ensemble scene comprising a plurality of assets; and
generating an output image of the ensemble scene that approximates the requested predefined perspective based at least in part on combining a single existing image of each of at least a portion of the plurality of assets.

Claims

1. A method, comprising:
receiving a request for a predefined perspective of an ensemble scene comprising a plurality of assets; and
generating an output image of the ensemble scene that approximates, but is not entirely accurate to, the requested predefined perspective, based at least in part on combining a single existing image of each of at least a portion of the plurality of assets that are offset or tilted by several degrees;
wherein the single existing image of each of the at least a portion of the plurality of assets does not have the requested predefined perspective.

2. The method of claim 1, wherein the request is received for an orthographic view of the ensemble scene.

3. The method of claim 2, wherein the orthographic view of the ensemble scene comprises a combined orthographic view of the plurality of assets.

4. The method of claim 1, further comprising selecting the single existing image of each of the at least a portion of the plurality of assets.

5. The method of claim 4, wherein the selecting comprises selecting a closest or most similar available match to the requested predefined perspective.

6. The method of claim 4, wherein the selecting comprises selecting based on a pose of an associated asset in the ensemble scene.

7. The method of claim 4, wherein the selecting comprises selecting a rotated existing image of an associated asset.

8. The method of claim 4, wherein the selecting comprises selecting a closest or most similar available match to the requested predefined perspective based on a pose of an associated asset in the ensemble scene.

9. The method of claim 1, wherein generating the output image of the ensemble scene comprises scaling the single existing image of one or more assets of the portion of assets.

10. The method of claim 1, wherein generating the output image of the ensemble scene comprises resizing the single existing image of one or more assets of the portion of assets.

11. The method of claim 1, wherein generating the output image of the ensemble scene comprises determining a location in the ensemble scene at which to include the single existing image of each of at least the portion of assets.

12. The method of claim 1, wherein the combining comprises compositing.

13. The method of claim 1, wherein generating the output image of the ensemble scene comprises generating a view of at least one asset of the plurality of assets that has the requested predefined perspective.

14. The method of claim 13, wherein the view is generated using a plurality of existing images of the at least one asset.

15. The method of claim 1, wherein generating the output image of the ensemble scene comprises generating at least one portion of the ensemble scene to have the requested predefined perspective.

16. The method of claim 15, wherein the at least one portion comprises a surface of the ensemble scene.

17. The method of claim 15, wherein the at least one portion comprises a structural element of the ensemble scene.

18. The method of claim 15, wherein the at least one portion comprises a global feature of the ensemble scene.

19. The method of claim 1, further comprising globally relighting the generated output image of the ensemble scene.

20. The method of claim 1, wherein the output image comprises a frame of a video sequence.

21. A system, comprising:
a processor configured to:
receive a request for a predefined perspective of an ensemble scene comprising a plurality of assets; and
generate an output image of the ensemble scene based at least in part on combining a single existing image of each of at least a portion of the plurality of assets that are offset or tilted by several degrees, the output image approximating the requested predefined perspective but not being completely accurate in perspective; and
a memory coupled to the processor and configured to provide the processor with instructions;
wherein the single existing image of each of the at least a portion of the plurality of assets does not have the requested predefined perspective.

22. A computer program product embodied in a non-transitory computer-readable storage medium and comprising computer instructions for:
receiving a request for a predefined perspective of an ensemble scene comprising a plurality of assets; and
generating an output image of the ensemble scene based at least in part on combining a single existing image of each of at least a portion of the plurality of assets that are offset or tilted by several degrees, the output image approximating the requested predefined perspective but not being entirely accurate in perspective;
wherein the single existing image of each of the at least a portion of the plurality of assets does not have the requested predefined perspective.