1.1. Augmented Reality and Our Motivation
Augmented reality (AR) technology is a revolutionary way to make things more efficient and attractive by overlaying 3D virtual objects onto the real world in real time. AR may seem like something straight out of a science fiction movie, but it is here today and available on many current information devices. AR also represents a particular form of human–computer interaction (HCI). The initial concept of AR is to replace parts of reality with additional information, which can be computer-generated and otherwise hidden from the human senses. Azuma in 1997 [1] defined AR as technologies that combine real-world objects with virtual 3D objects, creating interactivity between the real and the virtual in real time, even though the virtual objects cannot be perceived directly. AR has been used extensively in many systems, such as military operations in urban terrain and air forces [2], architecture systems [3], and games [4].
Figure 1 shows an example of an AR application for tourism. Here, a computer-generated information label augments the real scene of the Eiffel Tower and is overlaid on part of the actual scene.
Creating an effective AR experience requires various tools, such as graphics rendering tools, tracking and registration tools, and various display or interaction techniques. When creating AR displays [1], one faces numerous problems, such as the need to avoid confusing visuals, the need to reduce processing complexity, and the limited number of allowed virtual items. One central problem, however, is determining the position and orientation of computer-generated objects so that they can be aligned accurately with the physical objects in the real world. In many existing applications, graphical content is placed on pre-defined markers, as these provide a convenient way to detect the encoded contents and calculate the camera pose. For example, an image tag system such as BazAR [5,6] uses natural (colour) pictures as markers. The camera position relative to the marker is calculated to blend the virtual information into the real-world environment. This relationship is described by the pinhole camera model [7], and computing it is known as pose estimation in computer vision.
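To illustrate the pinhole camera model referred to above, the sketch below projects the four corners of a flat square marker into pixel coordinates, given a camera intrinsic matrix K and a marker-to-camera pose (R, t). It is not the method of BazAR or of this paper: the focal length, principal point, marker size, and pose are hypothetical values, and lens distortion is ignored. Pose estimation solves the inverse problem, recovering R and t from the detected corner pixels (for instance with OpenCV's `solvePnP`).

```python
"""Pinhole projection of a square marker's corners (hypothetical values)."""


def matvec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def project(point_3d, K, R, t):
    """Project a 3D point given in marker coordinates to pixel coordinates.

    x_cam = R * X + t            (marker frame -> camera frame)
    [u, v, w] = K * x_cam        (camera frame -> homogeneous image coords)
    pixel = (u / w, v / w)       (perspective division)
    """
    x_cam = [a + b for a, b in zip(matvec(R, point_3d), t)]
    if x_cam[2] <= 0:
        raise ValueError("point is behind the camera")
    u, v, w = matvec(K, x_cam)
    return (u / w, v / w)


# Hypothetical intrinsics: focal length 800 px, principal point (320, 240).
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]

# Hypothetical pose: marker parallel to the image plane, 0.5 m in front.
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.5]

# Corners of a 10 cm square marker centred at the marker origin (z = 0).
half = 0.05
corners = [(-half, -half, 0.0), (half, -half, 0.0),
           (half, half, 0.0), (-half, half, 0.0)]

for corner in corners:
    print(corner, "->", project(corner, K, R, t))
```

With these values the marker appears as a 160-pixel square centred on the principal point; moving the marker further away (larger t[2]) shrinks its image proportionally, which is exactly the depth cue that pose estimation exploits.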
For robust and unambiguous applications [8], black and white markers with thick borders are often used [9,10,11]. They include at least three types: (1) template markers, (2) barcode markers, and (3) circular markers (Figure 2). These markers are made up of a white or light-coloured padding surrounded by a thick black or dark-coloured border, together with a high-contrast pattern: a template figure, or a square or circular 2D barcode. The pattern is what makes each marker unique. The black border of a marker is recognised, tracked, and used to calculate the marker's position in 3D space. Some newer fiducial marker designs that combine the payload with the structure of the tag, such as [12,13], are still collections of black and white squares or dots.
Both template and bar-code tags have their pros and cons. Template tags may contain meaningful pictures of the objects they represent, such as the flying eagle in Figure 2. This type of marker can be identified using feature matching techniques, by comparing its appearance with other markers stored in a database. However, such a system must be trained sufficiently for proper template matching; moreover, template recognition can be unreliable due to undesired similarity between template markers [14]. Consequently, the number of different templates in such a system must be kept small to obtain reliable matching results.
On the other hand, bar-code and circular markers encode data as "0" or "1" by arranging the marker region into many black and white bars or cells. Examples are CyberCode [9], Bokode [15], and AprilTag [16]. Decoding techniques are then used to recover the encoded data. Bar-codes are relatively easy to detect and recognise using various feature detection technologies [17]. However, these markers display no useful information to the users, so it is difficult to know which marker represents which virtual object just by looking at the black and white patterns themselves.
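As a rough illustration of how the black and white cells of a bar-code marker can be read as a binary number, the sketch below takes a hypothetical 4×4 grid of already-thresholded cells (1 = black, 0 = white), reads it row by row into an integer, and looks that integer up in a table of known markers. Because the camera may see the marker in any orientation, all four 90° rotations are tried. The grid size, bit order, marker name, and lookup rule are assumptions made for this example; they are not the encoding used by CyberCode, Bokode, or AprilTag.

```python
from typing import Dict, List, Optional, Tuple

Grid = List[List[int]]  # square grid of thresholded cells: 1 = black, 0 = white


def rotate90(grid: Grid) -> Grid:
    """Rotate a square grid 90 degrees clockwise."""
    n = len(grid)
    return [[grid[n - 1 - c][r] for c in range(n)] for r in range(n)]


def grid_to_int(grid: Grid) -> int:
    """Read the cells row by row, most significant bit first."""
    value = 0
    for row in grid:
        for cell in row:
            value = (value << 1) | cell
    return value


def identify(grid: Grid, known: Dict[int, str]) -> Optional[Tuple[str, int]]:
    """Match an observed grid against known marker codes.

    Returns (marker name, number of clockwise quarter turns applied to the
    observed grid to obtain the stored pattern), or None if nothing matches.
    """
    current = grid
    for turns in range(4):
        code = grid_to_int(current)
        if code in known:
            return known[code], turns
        current = rotate90(current)
    return None


# Hypothetical stored pattern; it is asymmetric, so its rotation is unambiguous.
PATTERN = [[1, 0, 0, 0],
           [1, 1, 0, 0],
           [0, 0, 1, 0],
           [0, 0, 0, 0]]
KNOWN = {grid_to_int(PATTERN): "cube"}  # hypothetical marker name

# Simulate the camera seeing the marker rotated by 270 degrees clockwise.
observed = rotate90(rotate90(rotate90(PATTERN)))
print(identify(observed, KNOWN))
```

A single table lookup per rotation is what keeps the matching cost low compared with comparing images; its weakness, noted in the text, is that the cell pattern itself tells the user nothing about the associated virtual object.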
Our goal is not only to address the issues of current marker types but also to exploit their advantages. In this paper, we introduce a new AR marker that optically hides a multi-level barcode [18] inside a colour picture. The details are described in the remainder of this article.
1.2. The Rise of Augmented Reality and Its Applications
Since 1990, AR has been used in many different applications. It is now believed to be the next step in how people collect, process, and interact with information [19]. For example, in health care, patients who suddenly collapse can be saved more quickly using AED4EU, designed by Lucien Engelen from the University Nijmegen Medical Centre, The Netherlands [20]. The application lets users add the places where automated external defibrillators (AEDs) are located; when needed, others can then display the exact location of the nearest AEDs on their phones.
Another example is ultrasound imaging [21], where physicians can use AR to see directly inside a patient and accurately position the surgical site [22]. The surgical team can then view the imaging data in real time while performing the procedure. In 2007, Christoph Bichlmeier and his team introduced an AR system for intuitive, real-time viewing of the patient's deep-seated anatomy [23]. They also proposed that the system could be integrated with surgical tools to render volumetric computerised tomography (CT) or magnetic resonance imaging (MRI) data directly, without delay.
AR has also become popular for gaming. ARQuake is an indoor/outdoor AR first-person game [24] that allows players to shoot down virtual demons while moving in the physical world, using a large head-mounted display to interact with virtual objects. However, such a setup is impractical now that people favour handy, lightweight accessories. Pokémon GO is a representative AR-based game that has captivated many people around the world [25]. Players use their mobile devices to catch virtual Pokémon characters in the real world and interact with other users via Wi-Fi or 3G/4G networks.
1.3. Current Technical Pros and Cons of Augmented Reality
Creating an effective AR experience requires different tools, such as tracking, registration (for aligning the virtual objects with the real scene), and rendering (for displaying the virtual information). These tools are easy to implement with today's technologies. However, some long-standing drawbacks of AR remain a concern, such as system processing complexity and information orientation. The main principle behind marker-based AR is to find the target (either a pictorial marker or a bar-code marker) and orient the digital information on the detected target.
Pictorial markers, as shown in Figure 3, are often used because they are convenient for detecting and displaying content. They are also more meaningful to users, especially younger ones. However, this approach requires image registration, and recognition can become unreliable due to undesired similarity between images [14]. Consequently, the method is used mostly in AR applications with small datasets, such as Magic books [26], an application designed for children. Processing complexity becomes an issue if this method is applied to AR applications with large datasets, such as the AR Chemistry application shown in Figure 4. Such chemistry education applications usually include an extensive dataset containing information on hundreds of chemical elements and compounds. For this reason, this type of application works best with barcode markers.
As stated, bar-code markers are more popular in larger, robust, and unambiguous applications. These markers normally consist of black and white data cells surrounded by a thick dark-coloured border. The data cells encode a binary number that makes each marker unique, and the binary number decoded from a marker is compared with the binary numbers stored in the dataset. Compared with pictorial markers, this method minimises system processing complexity, because matching reduces to comparing short binary numbers rather than images. Another advantage is that, besides encoding information, some of the binary bits can be used for error detection and correction. Hamming codes [27] and Reed–Solomon codes [28] are two methods often used for this purpose; examples of marker systems using them are ALVAR and ARTag [29]. Despite these advantages over pictorial markers, the data cells are meaningless to users unless pictorial and/or explanatory information is added alongside.
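To make the error-correction idea concrete, the following minimal sketch implements a generic textbook Hamming(7,4) code: 4 data bits are protected by 3 parity bits, which is enough to locate and flip any single misread cell. ALVAR and ARTag use their own code parameters and cell layouts, which are not reproduced here; the function names are hypothetical.

```python
from typing import List, Tuple


def hamming74_encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into a 7-bit Hamming codeword.

    Codeword layout (1-based positions): p1 p2 d1 p4 d2 d3 d4, where
    parity bit p_k covers every position whose index has bit k set.
    """
    if len(data) != 4 or any(b not in (0, 1) for b in data):
        raise ValueError("expected 4 bits")
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(code: List[int]) -> Tuple[List[int], int]:
    """Correct at most one flipped bit.

    Returns (4 data bits, 1-based position of the corrected bit or 0).
    """
    if len(code) != 7:
        raise ValueError("expected 7 bits")
    c = [0] + list(code)  # pad so c[1]..c[7] match the 1-based positions
    s1 = c[1] ^ c[3] ^ c[5] ^ c[7]
    s2 = c[2] ^ c[3] ^ c[6] ^ c[7]
    s4 = c[4] ^ c[5] ^ c[6] ^ c[7]
    syndrome = s1 + 2 * s2 + 4 * s4  # position of the flipped bit, 0 if none
    if syndrome:
        c[syndrome] ^= 1
    return [c[3], c[5], c[6], c[7]], syndrome


# Example: a marker cell at position 5 is misread (e.g. due to glare).
sent = hamming74_encode([1, 0, 1, 1])
received = list(sent)
received[4] ^= 1  # flip 1-based position 5
print(sent, received, hamming74_decode(received))
```

The trade-off is capacity: 3 of every 7 cells carry no payload. Hamming(7,4) also cannot handle two errors in one block (it will silently miscorrect), which is one reason some systems prefer Reed–Solomon codes for larger payloads.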
Both pictorial and bar-code markers have their advantages and disadvantages, as summarised in Table 1. The pictorial approach has an appearance that is meaningful to users, but system processing complexity and reliability can be problematic. Bar-code markers overcome these problems and facilitate error detection and correction; however, they provide no useful information to users and thus appear unattractive.
1.4. Related and Previous Works
Some digital watermarking systems attempt to solve the aforementioned AR issues. Visual Syncar [30] is one example: the Japanese telecommunications company NTT (Tokyo, Japan) embeds digital watermarks as time codes into video sequences. A smart device reads these time codes and displays the augmented virtual graphics in sync with the live content on the screen. This is an ingenious approach, but it can only be used for images displayed on a computer or TV screen.
CyberCode [9] is another visual tag used in AR environments; it places a 2D barcode next to a graphic on a business card or in other physical spaces. The problem is that if the barcode is too small, it cannot easily be read by a general webcam; if it is too large, it becomes distracting and leaves little space for presenting the pictorial or textual information of the marker.
Another way to address the issue is to combine a pictorial marker and a barcode marker in a single AR tag. For instance, Alvarez [31] proposed placing a 2D barcode in the centre of the tag and using the border to present the meaning of the tag as text (Figure 5a). This marker looks distracting because of its strong colours, and the vertical text is not easily readable.
The Qualcomm Vuforia marker (Figure 5) is a more refined AR marker. It can store a few bytes of binary code in black and white dots placed next to its black border, while the central region displays the pictorial content. This concept works well in some simple applications. However, the barcode is still quite obvious and distracting. Moreover, the inner images must have lighter backgrounds so that they do not interfere with the detection of the binary codes.
We tackle the problem with a different, novel approach: we use graphic patterns of the original image to encode the binary information. As a result, no obvious binary barcode appears on the AR tag. The pictorial region itself holds the barcode, so the barcode has the same size as the picture, and the tag can therefore be read by a general colour camera. We briefly introduced the initial designs in short conference papers [33,34,35]. In this manuscript, the proposed idea, design, and implementation are covered in much more detail.