Research into visual perception ultimately affects display design. Advances in display technology affect, in turn, our study of perception. Although this statement is too general to provoke controversy, this paper presents a real-life example that may prompt display engineers to make greater use of basic knowledge of visual perception, and encourage those who study perception to track leading-edge display technology more closely. Our real-life example deals with an ancient problem, the moon illusion: why does the horizon moon appear so large while the elevated moon looks so small? This was a puzzle for many centuries. Physical explanations, such as refraction by the atmosphere, are incorrect. The difference in apparent size may be classified as a misperception, so the answer must lie in the general principles of visual perception. The factors underlying the moon illusion must be the same factors as those that enable us to perceive the sizes of ordinary objects in visual space. Progress toward solving the problem has been irregular, since methods for actually measuring the illusion under a wide range of conditions were lacking. An advance in display technology made possible a serious and methodologically controlled study of the illusion. This technology was the first heads-up display. In this paper we describe how the heads-up display concept made it possible to test several competing theories of the moon illusion, and how it led to an explanation that stood for nearly 40 years. We also consider the criticisms of that explanation and how the optics of the heads-up display also played a role in providing data for the critics. Finally, we describe our own advance on the original methodology. This advance was motivated by previously unrelated principles of space perception. We used a stereoscopic heads-up display to test alternative hypotheses about the illusion and to discriminate between two classes of mutually contradictory theories. At its core, the explanation for the moon illusion has implications for the design of virtual reality displays. How do we scale disparity at great distances to reflect depth between points at those distances? We conjecture that one yardstick involved in that scaling is provided by oculomotor cues operating at near distances. Without such a yardstick it is not possible to account for depth at long distances. As we shall explain, size and depth constancy should both fail in virtual reality displays where all of the visual information is optically in one plane. We suggest ways to study this problem, and also means by which displays may be designed to present information at different optical distances.
The standard model of visual processing is based on the selective properties of linear spatial filters tuned to different orientations and radial frequencies. This standard model is well suited for the description of a wide range of phenomena in vision, but it is not clear whether the whole range of basic properties of early vision is entirely within the model's explanatory scope. Here we suggest that there exists a basic selective processing property in early vision which is definitely outside the explanatory scope of the standard model: the selectivity for intrinsically 2D signals. This property has already been observed in the classical experiments of Hubel and Wiesel, and has more recently been found in more complex form in the extra-classical receptive field properties of various visual neurons. We show here that this selectivity cannot be described within the framework of linear spatial filtering for reasons which lie at the heart of the theory of linear systems: the restriction of such systems to OR-combinations of their intrinsically 1D eigenfunctions. We present a general nonlinear framework for the modeling of i2D-selective systems which is based on AND-like combinations of frequency components, and which is closely related to the Wiener-Volterra representation of nonlinear systems. To our knowledge, i2D-selectivity is the only non-standard property for which such a theoretical framework yet exists. The framework enables the combination of the nonlinear i2D-selectivity with other basic selectivities of visual neurons, for example with simple- and complex-like properties, and thus makes it possible to construct models for the variety of neurophysiological observations on i2D-selective processing in visual neurons. As an insight of general interest for the recent discussion on second-order properties in early vision, the framework reveals the existence of extended equivalence classes in which nonlinear schemes can have very dissimilar structural properties and nevertheless lead to identical input-output relations. Finally, there is a close relation between i2D-selectivity and the higher-order statistical redundancies in natural images.
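To make the AND-combination idea concrete, the following minimal sketch (ours, not the authors' framework; all names and parameter values are hypothetical) multiplies the quadrature energies of two oriented linear filters. The product responds only where both frequency components are present, as at a corner or crossing, and is silent on any intrinsically 1D grating or edge, a behavior no single linear filter can produce:

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_pair(size, freq, theta, sigma):
    """Even/odd (quadrature) Gabor kernels at orientation theta (radians)."""
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)
    u = x * np.cos(theta) + y * np.sin(theta)
    env = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return env * np.cos(2 * np.pi * freq * u), env * np.sin(2 * np.pi * freq * u)

def oriented_energy(img, freq, theta, sigma=4.0, size=21):
    """Phase-invariant energy of a linear oriented filter pair."""
    even, odd = gabor_pair(size, freq, theta, sigma)
    return fftconvolve(img, even, "same")**2 + fftconvolve(img, odd, "same")**2

def i2d_response(img, freq=0.1):
    """AND-combination of two frequency components: large only where
    horizontal AND vertical energy coexist (an intrinsically 2D signal)."""
    return oriented_energy(img, freq, 0.0) * oriented_energy(img, freq, np.pi / 2)
```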
Blur is an intrinsic property of the retinal image that can vary substantially in natural viewing. We examined how processes of contrast adaptation might adjust the visual system to regulate the perception of blur. Observers viewed a blurred or sharpened image for 2-5 minutes, and then judged the apparent focus of a series of 0.5-sec test images interleaved with 6 sec of readaptation. A 2AFC staircase procedure was used to vary the amplitude spectrum of successive tests to find the image that appeared in focus. Adapting to a blurred image causes a physically focused image to appear too sharp. Opposite after-effects occur for sharpened adapting images. Pronounced biases were observed over a wide range of magnitudes of adapting blur, and were similar for different types of blur. After-effects were also similar for different classes of images but were generally weaker when the adapting and test stimuli were different images, showing that the adaptation is not adjusting simply to blur per se. These adaptive adjustments may strongly influence the perception of blur in normal vision and how it changes with refractive errors.
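The stimulus manipulation described here can be pictured with a short sketch, assuming (as the abstract suggests) that blur and sharpening are implemented by tilting the slope of the image amplitude spectrum while preserving phase; the function name and DC handling are ours:

```python
import numpy as np

def tilt_spectrum(img, d_alpha):
    """Blur (d_alpha < 0) or sharpen (d_alpha > 0) an image by tilting the
    slope of its amplitude spectrum while preserving phase."""
    F = np.fft.fft2(img)
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    f = np.hypot(fx, fy)
    f[0, 0] = 1.0                      # leave the DC term unchanged
    return np.real(np.fft.ifft2(F * f**d_alpha))
```

A 2AFC staircase would then adjust d_alpha of the test image until the observer's "too blurred" and "too sharp" responses balance.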
The study of subjective visual quality, and the development of computational quality metrics, require accurate and meaningful measurement of visual impairment. A natural unit for impairment is the JND. In many cases, what is required is a measure of an impairment scale, that is, the growth of the subjective impairment, in JNDs, as some physical parameter is increased.
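One common way to construct such a scale, sketched here under the assumption that discrimination thresholds have been measured along the parameter axis (a Fechnerian accumulation; the names are hypothetical, not the paper's method):

```python
def jnd_scale(param_grid, jnd_at):
    """Accumulate an impairment scale in JNDs.

    param_grid : increasing values of the physical parameter
    jnd_at(p)  : measured discrimination threshold (delta-p for 1 JND) at p
    Returns the impairment, in JNDs, at each grid value.
    """
    scale = [0.0]
    for p0, p1 in zip(param_grid, param_grid[1:]):
        scale.append(scale[-1] + (p1 - p0) / jnd_at(p0))
    return scale
```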
One fundamental problem in predicting the subjective quality of a degraded video is that the perceived quality depends on the properties of the video itself, that is, on the context of the degradation. In this paper, we present the results of a series of experiments designed to measure the detection thresholds and annoyance values of small regions of MPEG-2 artifacts inserted into mostly uncorrupted video sequences. In previous work, we found that the detection threshold contains much, but not all, of the information needed to remove the dependence of quality on the context. In this paper, we report the results of two experiments. In one experiment, we varied the type of MPEG-2 artifacts inserted into the test sequences. In the other, we varied the location, size, and duration of the corrupted regions in the test sequences. From each set of data, we estimated detection thresholds and fitted the parameters of a quality function. The experimental results demonstrated that, under a wide set of test conditions, the detection threshold is still very useful for the estimation of quality as context varies. In fact, the detection threshold was the only factor necessary to model the changes in the quality function parameters with artifact type. The experimental data showed that the detection threshold and the quality function parameters do depend on the size and duration of the degraded region. However, the effects of size and duration are minor relative to the effect of artifact location.
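A hedged sketch of how a detection threshold can be folded into a quality function: artifact strength is first expressed in threshold units, and annoyance grows as a power of the suprathreshold excess. The functional form and parameters below are illustrative, not the fitted model from the paper:

```python
def annoyance(strength, threshold, gain=1.0, power=1.5):
    """Hypothetical annoyance function: annoyance grows as a power of
    artifact strength expressed in units above detection threshold."""
    s = strength / threshold          # threshold-normalized strength
    return gain * max(0.0, s - 1.0) ** power
```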
A subjective quality evaluation was performed to quantify viewer responses to visual defects that appear in low bit rate video at full and reduced frame rates. The stimuli were eight sequences compressed by three motion-compensated encoders - Sorenson Video, H.263+ and a wavelet-based coder - operating at five bit/frame rate combinations. The stimulus sequences exhibited obvious coding artifacts whose nature differed across the three coders. The subjective evaluation was performed using the Single Stimulus Continuous Quality Evaluation method of ITU-R Rec. BT.500-8. Viewers watched concatenated coded test sequences and continuously registered the perceived quality using a slider device. Data from 19 viewers were collected. An analysis of their responses to the presence of various artifacts across the range of possible coding conditions and content is presented. The effects of blockiness and blurriness on perceived quality are examined. The effects of changes in frame rate on perceived quality are found to be related to the nature of the motion in the sequence.
Traditional visual quality metrics measure fidelity instead of quality, even though fidelity, i.e., the accuracy of the reproduction of the original on the display, is just one of the many factors determining the overall perceived quality. In this paper, the addition of image appeal attributes is investigated in order to bridge this gap. Sharpness and colorfulness are identified as two such attributes and are quantified by means of an isotropic local contrast measure and the distribution of chroma, respectively. The benefits of using these attributes are demonstrated with the help of data from subjective experiments.
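The two attributes can be approximated with simple stand-ins. The formulas below are plausible instantiations (gradient-based local contrast, and mean-plus-spread of chroma in a crude opponent space), not the exact measures used in the paper:

```python
import numpy as np

def sharpness(gray):
    """Crude isotropic local-contrast measure: mean gradient magnitude."""
    gy, gx = np.gradient(gray.astype(float))
    return np.mean(np.hypot(gx, gy))

def colorfulness(rgb):
    """Chroma-distribution measure in a simple opponent space."""
    r, g, b = [rgb[..., i].astype(float) for i in range(3)]
    rg, yb = r - g, 0.5 * (r + g) - b          # opponent axes
    chroma = np.hypot(rg, yb)
    return chroma.mean() + chroma.std()        # mean plus spread of chroma
```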
This paper presents a lossy, wavelet-based approach for the compression of digital angiogram video. An analysis of the high-frequency sub-bands of a wavelet decomposition of an angiogram image reveals significantly sized regions containing no diagnostically important information. The encoding of the high-frequency sub-band wavelet coefficients in such regions proves to be burdensome, yet if they are simply removed, the coefficients are notable by their absence. This paper models these wavelet coefficients using a texture modeling approach. This is performed only in regions considered diagnostically unimportant, with diagnostically important regions encoded as normal. The procedure significantly reduces the bit rate of diagnostically unimportant areas of the image without a perceptible loss of image quality. The effectiveness of the algorithm at different bit rates is assessed by a consultant cardiologist, with the key aim of identifying any degradation in the diagnostic content of the images.
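A minimal sketch of the substitution idea, assuming the simplest possible texture model (a zero-mean Gaussian per region, so only its standard deviation needs to be coded); the actual coder presumably uses a richer model:

```python
import numpy as np

def model_and_resynthesize(coeffs, unimportant_mask, rng=np.random.default_rng(0)):
    """Replace masked high-frequency wavelet coefficients with samples from a
    zero-mean Gaussian fitted to them; only sigma needs to be transmitted."""
    sigma = coeffs[unimportant_mask].std()
    out = coeffs.copy()
    out[unimportant_mask] = rng.normal(0.0, sigma, size=unimportant_mask.sum())
    return out, sigma
```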
A perceptual wavelet coder was developed to satisfy a wide range of requirements, from high-end commercial applications where no distortion can be tolerated, to Internet and wireless applications, where bandwidth is highly restricted. A perceptual model was designed and incorporated in the compression scheme to allow the encoder to allocate bits for each subband so as to minimize the overall perceptual distortion. The perceptual model takes into account contrast sensitivity in each frequency subband and the local background, at the desired perceptual quality. The coder achieves scalability by multi-rate quantization and entropy coding. The results of the perceptual wavelet encoder were compared to those of JPEG and a wavelet encoder without a perceptual model. In addition to PSNR, a quality assessment model based on a visual discrimination model was used to evaluate the compressed images. Results demonstrated better performance of the perceptual wavelet encoder than of the other two encoders. The codec is asymmetric in that the decoder does not need to have knowledge of the perceptual model.
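A hedged sketch of perceptually weighted bit allocation: if each additional bit roughly quarters a subband's mean-squared error (6 dB per bit), a greedy loop that always funds the subband with the largest CSF-weighted distortion approximates the optimal allocation. The names and the 6 dB/bit assumption are ours, not the paper's scheme:

```python
import heapq

def allocate_bits(subband_energy, csf_weight, total_bits):
    """Greedily assign bits; each bit quarters the subband's weighted MSE."""
    heap = [(-e * w, i) for i, (e, w) in enumerate(zip(subband_energy, csf_weight))]
    heapq.heapify(heap)
    bits = [0] * len(subband_energy)
    for _ in range(total_bits):
        d, i = heapq.heappop(heap)        # subband with largest distortion
        bits[i] += 1
        heapq.heappush(heap, (d / 4.0, i))  # distortion after one more bit
    return bits
```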
The cortex transform provides a meaningful representation of images in terms of the responses of cortical cells. It is based on experimental results from human vision research. The multiple orientations obtained in the expansion are of interest for image analysis applications. In image coders, quantization can exploit psychovisual properties to a large extent. This transform belongs to a group of overcomplete transforms, a property that has worked against their use in coding applications. However, the inherent redundancy of overcomplete representations can be exploited to increase the robustness of the code. Multiple description coding of overcomplete expansions has been reported to confer more graceful degradation on partial reconstructions in the event of channel erasures. This paper proposes a coding strategy based on orientation transforms that yield perceptually meaningful coefficients. The coding budget is reduced by sampling and quantization. The remaining redundancy is used to provide robustness. In addition, the descriptions can be organized to allow progressive reconstruction. The tradeoff between quantization strength, perceived quality, redundancy and robustness can be incorporated in the design of the coder.
We examine perceptual metrics and use them to evaluate the quality of still image coders. We show that mean-squared-error-based metrics fail to predict image quality when one compares artifacts generated by different types of image coders. We consider three different types of coders: JPEG, the Safranek-Johnston perceptual subband coder (PIC), and the Said-Pearlman SPIHT algorithm with perceptually weighted subband quantization, based on the Watson et al. visual thresholds. We show that incorporating perceptual weighting in the SPIHT algorithm results in significant improvement in visual quality. The metrics we consider are based on the same image decompositions as the corresponding compression algorithms. Such metrics are computationally efficient and considerably simpler than more elaborate metrics. However, since each of the metrics is used for the optimization of a coder, one expects that they would be biased towards that coder. We use the metrics to evaluate the performance of the compression techniques for a wide range of bit rates. Our experiments indicate that the PIC metric provides the best correlation with subjective evaluations. It predicts that at very low bit rates the SPIHT algorithm and the 8 by 8 PIC coder perform the best, while at high bit rates the 4 by 4 PIC coder is the best. More importantly, we show that the relative algorithm performance depends on image content, with the subband and DCT coders performing best for images with a lot of high frequency content, and the wavelet coders performing best for smoother images.
Halftones are intended to produce the illusion of continuous images from binary output states, so the visibility of undesired halftone textures is an essential quality factor of halftone patterns. We propose a metric to predict the visibility of color halftone textures. The metric utilizes the human visual threshold function and contrast sensitivity functions of luminance and chrominance. The threshold is related to the average background luminance level by the de Vries-Rose law. An iterative approach was used to determine the distance at which the visual error just exceeds the visual threshold. This distance is the metric's prediction of the critical distance at which a human observer can just discriminate the textures. To verify the metric, the texture visibility was determined experimentally in a psychophysical experiment. The halftone stimuli were presented on an SGI monitor. Starting from an initial distance, where the halftone images appeared as continuous color patches, the subject walked toward the monitor and found the distance where he or she could just discriminate the spatial changes caused by the textures. The distances determined by the experiment and those predicted by the metric were then compared, and a good correlation was achieved. The results show that the metric is able to predict the visibility over a wide range of texture characteristics.
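The iterative search for the critical distance can be sketched as follows, assuming a caller-supplied predicate that applies the CSF and the de Vries-Rose threshold at a given distance. Bisection here stands in for whatever iteration the authors used, and the constant in dvr_threshold is a placeholder, not a calibrated value:

```python
def dvr_threshold(l_background, k=0.01):
    """de Vries-Rose law: threshold grows as sqrt of background luminance."""
    return k * l_background ** 0.5

def critical_distance(visible_at, d_near=0.1, d_far=10.0, tol=1e-3):
    """Bisect for the viewing distance (m) where the halftone texture error
    just equals the visual threshold. visible_at(d) -> True while the error
    still exceeds threshold; assumed monotone decreasing with distance."""
    assert visible_at(d_near) and not visible_at(d_far)
    while d_far - d_near > tol:
        mid = 0.5 * (d_near + d_far)
        if visible_at(mid):
            d_near = mid
        else:
            d_far = mid
    return 0.5 * (d_near + d_far)
```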
In order to automate the image evaluation task, an engineering model for predicting the visual differences of color images is developed. The present CVDP consists of a color appearance model, a set of contrast sensitivity functions, the modified cortex transform, and a multichannel interaction model for masking effects. Based on a pixel-by-pixel difference metric similar to the CIELAB color difference, the predictions of the simplified CVDP are found to correlate fairly well with the psychophysical test results over 51 pairs of natural images, with some detection failures. These failures can be eliminated by including additional image quality metrics: the clarity in the shadow and highlight areas and the graininess in the mid-tone areas. The modified model is found to be able to identify 55 percent of those visually indistinguishable image pairs. The preliminary results using the complete CVDP for selected image pairs indicate that the effects of masking introduce only small changes to the results of the simplified CVDP.
Masking of color targets was measured for fixed pattern noises made of all additive combinations of white/black, red/green, and blue/yellow noise. Results are compared with the predictions of a cone-contrast-based masking model with and without cross-channel masking. The model without cross-channel masking performed very well.
Keynote Session: Human and Computer Vision in Electronic Imaging
The perception of objects is a well-developed field, but the perception of materials has been studied rather little. This is surprising given how important materials are for humans, and how important they must become for intelligent robots. We may learn something by looking at other fields in which material appearance is recognized as important. Classical artists were highly skilled at generating convincing materials. The simulation of material appearance is a topic of great importance in 3D computer graphics. Some fields, such as mineralogy, use the concept of a 'habit' which is a combination of shape and texture, and which may be used for characterizing certain objects or materials. We have recently taken steps toward material recognition by machines, using techniques derived from the domain of texture analysis.
Gloss is a visual attribute which, like color, provides qualitative information about surrounding objects. The relevant physical quantity for gloss measurement is the BRDF, which characterizes the geometrical distribution of the light reflected by the sample. We hypothesize that the light reflected from a glossy sample splits into two parts: volume diffusion and surface reflection. We assume that the surface of the sample consists of tiny mirror facets whose orientation distribution is specific to the surface microstructure. Each facet is a source of specular reflection. With this hypothesis, we have calculated a theoretical model of the BRDF according to the Fresnel laws. We have applied the model to black and white painted samples of various gloss indices. In addition, we have acquired real measurements of the BRDF of the same samples by using a new device which gives the complete measurement of the luminance geometrical distribution over the hemisphere. The comparison of the measurements with the theoretical model shows that, for this series of samples, the facet hypothesis is reliable and makes it possible to model the real BRDF in all directions.
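The facet hypothesis can be sketched in Torrance-Sparrow style: a diffuse term for volume diffusion plus a specular lobe obtained by weighting a Gaussian density of facet tilts with Fresnel reflectance. This is an illustrative in-plane version with hypothetical parameters, not the authors' exact derivation:

```python
import numpy as np

def fresnel_unpolarized(cos_i, n=1.5):
    """Fresnel reflectance of a dielectric surface for unpolarized light."""
    sin_t = np.sqrt(np.maximum(0.0, 1.0 - cos_i**2)) / n   # Snell's law
    cos_t = np.sqrt(1.0 - sin_t**2)
    rs = ((cos_i - n * cos_t) / (cos_i + n * cos_t)) ** 2
    rp = ((n * cos_i - cos_t) / (n * cos_i + cos_t)) ** 2
    return 0.5 * (rs + rp)

def facet_brdf(theta_i, theta_v, rho_d=0.3, sigma=0.1, n=1.5):
    """In-plane BRDF under the facet hypothesis (angles from the mean normal;
    specular direction is theta_v == theta_i): a diffuse volume term plus a
    specular lobe from a Gaussian density of mirror-facet tilts."""
    alpha = 0.5 * (theta_v - theta_i)              # facet tilt serving this pair
    cos_local = np.cos(0.5 * (theta_v + theta_i))  # local incidence on that facet
    density = np.exp(-alpha**2 / (2.0 * sigma**2))
    spec = fresnel_unpolarized(cos_local, n) * density
    spec /= 4.0 * np.cos(theta_i) * np.cos(theta_v)
    return rho_d / np.pi + spec
```

A small sigma concentrates the lobe around the specular angle (high gloss); a large sigma spreads it toward a matte appearance.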
Texture as a surface representation is the subject of a wide body of computer vision and computer graphics literature. While texture is always associated with a form of repetition in the image, the repeating quantity may vary. The texture may be a color or albedo variation as in a checkerboard, a paisley print or zebra stripes. Very often in real-world scenes, texture is instead due to a surface height variation, e.g. pebbles, gravel, foliage and any rough surface. Such surfaces are referred to here as 3D textured surfaces. Standard texture recognition algorithms are not appropriate for 3D textured surfaces because the appearance of these surfaces changes in a complex manner with viewing direction and illumination direction. Recent methods have been developed for recognition of 3D textured surfaces using a database of surfaces observed under varied imaging parameters. One of these methods is based on 3D textons obtained using K-means clustering of multiscale feature vectors. Another method uses eigen-analysis originally developed for appearance-based object recognition. In this work we develop a hybrid approach that employs both feature grouping and dimensionality reduction. The method is tested using the Columbia-Utrecht texture database and provides excellent recognition rates. The method is compared with existing recognition methods for 3D textured surfaces. A direct comparison is facilitated by empirical recognition rates from the same texture data set. The current method has key advantages over existing methods, including requiring less prior information about both the training and novel images.
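A compressed sketch of the texton pipeline referred to above, with a deliberately tiny derivative filter bank standing in for the multiscale one and with function names of our own choosing: pooled per-pixel filter responses are clustered with K-means, and each surface is then represented by its texton-label histogram:

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.cluster import KMeans

def filter_responses(img, sigmas=(1, 2, 4)):
    """Per-pixel multiscale derivative features (a stand-in filter bank)."""
    feats = [gaussian_filter(img, s, order=o)
             for s in sigmas for o in ((0, 1), (1, 0))]
    return np.stack(feats, axis=-1).reshape(-1, len(feats))

def fit_textons(training_images, k=64):
    """Cluster pooled filter responses into a texton vocabulary."""
    X = np.vstack([filter_responses(im) for im in training_images])
    return KMeans(n_clusters=k, n_init=4, random_state=0).fit(X)

def texton_signature(img, kmeans):
    """Texton-label histogram: the surface's signature for matching."""
    labels = kmeans.predict(filter_responses(img))
    hist = np.bincount(labels, minlength=kmeans.n_clusters).astype(float)
    return hist / hist.sum()
```

Recognition then compares a novel image's signature against stored training signatures, e.g. by chi-square or L1 histogram distance.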
The reflectance characteristics of materials are known to have visible effects on image shading functions. The present study quantified the light scattering distributions from roughened glass and correlated these with the microscopic surface topography. Samples were 11 silica microscope slides, roughened with commercial diamond pastes with particle sizes ranging from 4 to 67 μm. In-plane and out-of-plane scattering measurements were made with laser light incident at 45 degrees. In-plane scattering distributions for samples illuminated at 633 nm varied significantly in their shape, but all curves were roughly centered on the specular angles. For in-plane scattering distributions at 514 nm, disappearance of the specular components was observed for samples whose RMS roughness just exceeded the wavelength of light. The results indicate that small changes in microscopic surface roughness can produce large changes in scattering. Evidently, surface microstructure has a pronounced effect on macroscopic appearance.
Physical surfaces such as metal, plastic, and paper possess different optical qualities that lead to different characteristics in images. We have found that humans can effectively estimate certain surface reflectance properties from a single image without knowledge of illumination. We develop a machine vision system to perform similar reflectance estimation tasks automatically. The problem of estimating reflectance from single images under unknown, complex illumination proves highly under-constrained due to the variety of potential reflectances and illuminations. Our solution relies on statistical regularities in the spatial structure of real-world illumination. These regularities translate into predictable relationships between surface reflectance and certain statistical features of the image. We determine these relationships using machine learning techniques. Our algorithms do not depend on color or polarization; they apply even to monochromatic imagery. An ability to estimate reflectance under uncontrolled illumination will further efforts to recognize materials and surface properties, to capture computer graphics models from photographs, and to generalize classical motion and stereo algorithms so that they can handle non-Lambertian surfaces.
The automated detection of humans in computer vision as well as the realistic rendering of people in computer graphics necessitates a better understanding of human skin reflectance. Prior vision and graphics research on this topic has primarily focused on images acquired with conventional color cameras. Although tri-color skin data is prevalent, it does not provide adequate information for explaining skin color or for discriminating between human skin and dyes designed to mimic human skin color. A better understanding of skin reflectance can be achieved through spectrographic analysis. Previous work in this field has largely been undertaken in the medical domain and focuses on the detection of pathology. Our work concentrates on the impact of skin reflectance on the image formation process. In our radiometric facility we measure the light reflected from the skin using a high resolution, high accuracy spectrograph under precisely calibrated illumination conditions. This paper presents observations from the first body of data gathered at this facility. From the measurements collected thus far, we have observed population-independent factors of skin reflectance. We show how these factors can be exploited in skin recognition. Finally, we provide a biological explanation for the existence of a distinguishing pattern in human skin reflectance.
Skin color reproduction becomes increasingly important with the recent progress in various imaging systems. In this paper, based on subjective experiments, correlation maps are analyzed between the appearance of Japanese facial images and the amounts of melanin and hemoglobin components in the facial skin. Facial color images were taken with a digital still camera. The spatial distributions of the melanin and hemoglobin components in the facial color image were separated by independent component analysis of skin colors. The separated components were synthesized to simulate various facial color images by changing the quantities of the two separated pigments. The synthesized images were evaluated subjectively by comparison with the original facial images. From the analysis of the correlation maps, we could identify the visual or psychological terms that relate closely to how the melanin component influences the appearance of the facial color image.
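The separation step can be sketched as follows, assuming (as in Tsumura-style analyses) that pigment densities mix approximately linearly in the optical-density (-log reflectance) domain. Which ICA source corresponds to melanin versus hemoglobin must be identified afterwards from its color direction, since ICA leaves order and sign arbitrary; the function names are ours:

```python
import numpy as np
from sklearn.decomposition import FastICA

def separate_pigments(skin_rgb):
    """Separate two pigment-density components from skin pixels via ICA in
    the optical-density domain. skin_rgb: (N, 3) values in (0, 1]."""
    density = -np.log(np.clip(skin_rgb, 1e-4, 1.0))
    ica = FastICA(n_components=2, random_state=0)
    sources = ica.fit_transform(density)   # (N, 2) pigment-like components
    return sources, ica

def resynthesize(sources, ica, gain_1=1.0, gain_2=1.0):
    """Rescale the two pigment components and map back to RGB."""
    scaled = sources * np.array([gain_1, gain_2])
    return np.exp(-ica.inverse_transform(scaled))
```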
Current vision systems are designed to perform in clear weather. Needless to say, in any outdoor application there is no escape from 'bad' weather. Ultimately, computer vision systems must include mechanisms that enable them to function in the presence of haze, fog, rain, hail and snow. We begin by studying the visual manifestations of different weather conditions. For this, we draw on what is already known about atmospheric optics, and identify effects caused by bad weather that can be turned to our advantage. Since the atmosphere modulates the information carried from a scene point to the observer, it can be viewed as a mechanism of visual information coding. We exploit two fundamental scattering models and develop methods for recovering pertinent scene properties, such as 3D structure, from one or two images taken under poor weather conditions. Next, we model the chromatic effects of atmospheric scattering and verify them for fog and haze. Then, based on this chromatic model, we derive several geometric constraints on scene color changes caused by varying atmospheric conditions. Finally, using these constraints, we develop algorithms for computing fog or haze color, segmenting depth, extracting 3D structure, and recovering 'clear day' scene colors from two or more images taken under different but unknown weather conditions.
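The two scattering mechanisms combine in the standard attenuation-plus-airlight (Koschmieder-style) image model, and two images of the same scene under different weather densities constrain depth. The sketch below assumes the airlight radiance A and the scattering coefficients are known; the paper's contribution is precisely to derive constraints that apply when the weather conditions are unknown:

```python
import numpy as np

def observed_radiance(J, depth, beta, airlight):
    """I = J*exp(-beta*d) + A*(1 - exp(-beta*d)): attenuation plus airlight."""
    t = np.exp(-beta * depth)
    return J * t + airlight * (1.0 - t)

def depth_from_two_weathers(I1, I2, A, beta1, beta2):
    """Per-pixel depth from two images under different scattering coefficients
    but the same airlight radiance A: (I1-A)/(I2-A) = exp((beta2-beta1)*d),
    so the clear-day radiance J cancels."""
    r = np.log((I1 - A) / (I2 - A))
    return r / (beta2 - beta1)
```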
We consider the flatland or 2D properties of the light field generated when a homogeneous convex curved surface reflects a distant illumination field. Besides being of considerable theoretical interest, this problem has applications in computer vision and graphics - for instance, in determining lighting and bidirectional reflectance distribution functions (BRDFs), in rendering environment maps, and in image-based rendering. We demonstrate that the integral for the reflected light transforms to a simple product of coefficients in Fourier space. Thus, the operation of rendering can be viewed in simple signal processing terms as a filtering operation that convolves the incident illumination with the BRDF. This analysis leads to a number of interesting observations for computer graphics, computer vision, and visual perception.
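The stated identity is easy to verify numerically: in flatland, the reflected light as a function of orientation is a circular convolution of the illumination with the BRDF, so their Fourier coefficients simply multiply. This check is ours, using an arbitrary even test lobe:

```python
import numpy as np

n = 360
theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
L = 1.0 + 0.5 * np.cos(theta) + 0.2 * np.cos(3 * theta)   # incident illumination
rho = np.exp(np.cos(theta))                               # BRDF-like even lobe
d_theta = 2.0 * np.pi / n

# Direct convolution integral: B(k) = sum_s L(s) * rho(k - s) * dtheta
idx = np.arange(n)
B_direct = np.array([np.sum(L * rho[(k - idx) % n]) for k in idx]) * d_theta

# Frequency domain: the same result as a product of Fourier coefficients
B_fourier = np.real(np.fft.ifft(np.fft.fft(L) * np.fft.fft(rho))) * d_theta

print(np.allclose(B_direct, B_fourier))   # True
```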
How do we perceive the lightness of a region of an achromatic curved surface containing a glossy highlight? We manipulated the diffuse and specular reflectance components of computer-generated displays of ellipsoid surfaces. Each contained a pattern of patches on a background of uniform reflectance. Observers indicated whether the diffuse component of a test patch's reflectance appeared higher or lower than that of the other patches, which had identical reflectances. Factors affecting performance included the position of the highlight relative to the test patch, and the degree of similarity of the reflectance of the patch pattern to that of the background. The results demonstrate that observers can discriminate the diffuse component of reflectance even when the luminance in a region is perturbed by a highlight. Performance improves if the background's diffuse reflectance is close to that of the standard patches, and also if it is higher, rather than lower, than that of the standard patches. The results suggest that visual mechanisms sensitive to isophotes compare intensities in neighboring regions to assess the degree of gloss. We conclude with a demonstration of induced glossiness, whereby a given block of image intensities can appear shiny or dull depending on manipulations of intensity values in other parts of the image.
In this paper we introduce a new model of surface appearance that is based on quantitative studies of gloss perception. We use image synthesis techniques to conduct experiments that explore the relationships between the physical dimensions of glossy reflectance and the perceptual dimensions of glossy appearance. The product of these experiments is a psychophysically based model of surface gloss, with dimensions that are both physically and perceptually meaningful and scales that reflect our sensitivity to gloss variations. We demonstrate that the model can be used to describe and control the appearance of glossy surfaces in synthesized images, allowing prediction of gloss matches and quantification of gloss differences. This work represents some initial steps toward developing psychophysical models of the goniometric aspects of surface appearance to complement widely used colorimetric models.
Shading is an important shape cue. Theories of 'shape from shading' assume that the shading is due to collimated beams irradiating an opaque, smooth Lambertian surface. Many objects are not at all opaque, though. In the case of translucent objects, photons penetrate the surface and enter the volume of the object, perhaps to re-emerge from the surface at another location. In such cases there can be no 'shading' proper. In the limit of very strong scattering these materials approach opaque Lambertian surfaces; in the limit of very weak scattering they approach transparent objects such as glass or water. A general theory of 'shading' for translucent objects is not available. We study the optical properties for a number of geometries. In simple cases the scattering problem can be solved, and we obtain models of 'shading' of translucent materials that are distinct from the opaque Lambertian case. In more general cases one needs to make certain approximations. We show how to develop rules of thumb for generic cases. Such rules are likely candidates for models of the human visual perception of wrinkles in human skin or articulations of cumulus clouds.
Observers routinely perceive 3D pictorial spaces when looking at 2D photographs. If an object is photographed in different poses, the photographs are different and so are the pictorial spaces. Observers can easily identify corresponding points in photographs of a single object in different poses. This is perhaps surprising, since no algorithm can presently do this except when extreme constraints are met. In this study we find correspondences and subsequently probe the pictorial surface attitude at corresponding points. Since we can fit a surface to a dense field of surface attitude samples, we obtain two surfaces in pictorial space that correspond to the two poses of the object. We explore the relation between these two surfaces. In Euclidean space the surfaces of an object in different poses are related through an isometry. Since pictorial space has a non-Euclidean structure, however, the empirical correspondence is not an isometry. The results allow us to draw conclusions concerning the geometrical structure of pictorial space. The results are of practical importance because many scenes are routinely documented through a sequence of photographs taken from different vantage points.
Geometric objects are often represented by many millions of triangles or polygons, which limits the ease with which they can be transmitted and displayed electronically. This has led to the development of many algorithms for simplifying geometric models, and to the recognition that metrics are required to evaluate their success. The goal is to create computer graphic renderings of the object that do not appear degraded to a human observer. The perceptual evaluation of simplified objects is a new topic. One approach has been to use image-based metrics to predict the perceived degradation of simplified 3D models. Since 2D images of 3D objects can have significantly different perceived quality depending on the direction of the illumination, 2D measures of image quality may not adequately capture the perceived quality of 3D objects. To address this question, we conducted experiments in which we explicitly compared the perceived quality of animated 3D objects and their corresponding 2D still image projections. Our results suggest that 2D judgements do not provide a good predictor of 3D image quality, and identify a need to develop 'object quality metrics.'
Visual Attention: Eye Movements and Region of Interest
Eye movements are an important aspect of human visual behavior. The temporal and space-variant nature of sampling a visual scene requires frequent attentional gaze shifts, saccades, to fixate different parts of an image. Fixations are often directed towards the most informative regions in the visual scene. We introduce a model, and its simulation, that can select such regions based on prior knowledge of similar scenes. Having representations of scene categories as probabilistic combinations of hypothetical objects, i.e., prototypical regions with certain properties, it is possible to assess the likely contribution of each image region to the successive recognition process. The regions are obtained by segmenting low-resolution images using the normalized cut algorithm. Based on low-level features, such as average color, size, and position, regions are clustered into a small set of hypothetical objects. Using conditional probabilities for each object given the scene category, the model can then predict the informative value of the corresponding region and initiate a sequential spatial information-gathering step analogous to an eye movement saccade to a new fixation. The article demonstrates how the initial hypothesis determines the next region of interest to visit and how these scene hypotheses are affected by sequentially visiting each new image region.
Studies of visual attention and eye movements have shown that people generally attend to only a few areas in typical scenes. These areas are commonly referred to as regions of interest (ROIs). When scenes are viewed with the same context and motivation, these ROIs are often highly correlated amongst different people, motivating the development of computational models of visual attention. This paper describes a novel model of visual attention designed to provide an accurate and robust prediction of a viewer's locus of attention across a wide range of typical video content. The model has been calibrated and verified using data gathered in an experiment in which the eye movements of 24 viewers were recorded while viewing material from a large database of still and video scenes. Certain characteristics of the scene content, such as moving objects, people, foreground and centrally-located objects, were found to exert a strong influence on viewers' attention. The results of comparing model predictions to experimental data demonstrate a strong correlation between the predicted ROIs and viewers' fixations.
We explore the way in which people look at images of different semantic categories and directly relate those results to computational approaches for automatic image classification. Our hypothesis is that the eye movements of human observers differ for images of different semantic categories, and that this information can be effectively used in automatic content-based classifiers. First, we present eye tracking experiments that show the variation in eye movements across different individuals for images of five different categories: handshakes, crowds, landscapes, main object in uncluttered background, and miscellaneous. The eye tracking results suggest that similar viewing patterns occur when different subjects view different images in the same semantic category. Using these results, we examine how empirical data obtained from eye tracking experiments across different semantic categories can be integrated with existing computational frameworks, or used to construct new ones. In particular, we examine the Visual Apprentice, a system in which image classifiers are learned from user input as the user defines a multiple-level object definition hierarchy based on an object and its parts and labels examples for specific classes. The resulting classifiers are applied to automatically classify new images. Although many eye tracking experiments have been performed, to our knowledge this is the first study that specifically compares eye movements across categories, and that links category-specific eye tracking results to automatic image classification techniques.
Late 19th-century Gestalt psychologists rebelled against the tenets of traditional epistemology and the newly defined psychophysics. They introduced powerful new ideas, supported by a myriad of experiments showing that the whole is not the sum of the parts. These appearance experiments include Simultaneous Contrast, Gelb's doorway, and Benary's Cross. The traditional Gestalt explanations of these experiments call for cognitive top-down inferences that are difficult to describe as detailed computer models.
Theories of human color constancy have been based on experiments with relatively simple laboratory stimuli. Even recent 'nearly natural' stimuli are optically much simpler than natural visual environments. I review here some of the complexity of natural visual environments. I argue that several kinds of optical structure exploited by theories of human color constancy may not occur in most natural scenes. Continued progress in color constancy research will require better descriptions of natural visual environments and of human color constancy performance within them. Both pose large challenges.
A growing body of evidence suggests that the brain computes lightness in a two-stage process that involves (1) an early neural encoding of contrast at the locations of luminance borders in the visual image, and (2) a subsequent filling-in of the lightnesses of the regions lying between the borders. I will review evidence that supports this theory and present a computational model of lightness based on filling-in by a spatially spreading cortical diffusion mechanism. The behavior of the model will be illustrated by showing how it quantitatively accounts for the lightness matching data of Rudd and Arrington. The model's performance will be compared with that of other theories of lightness, including retinex theory; a modified version of retinex theory that assumes edge integration with a falloff in the spatial weighting of edge information with distance; lightness anchoring based on the highest-luminance rule; and the BCS/FCS filling-in model developed by Grossberg and his colleagues.
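A one-dimensional sketch of the 'edge integration with spatial falloff' variant mentioned above (the decay constant and exponential weighting form are hypothetical): log-luminance edge ratios along the path to the target are summed, each attenuated by its distance from the target. With decay=0 it reduces to plain retinex path integration:

```python
import numpy as np

def lightness(luminance, target_idx, decay=0.1):
    """Distance-weighted edge integration along a 1-D luminance profile."""
    log_lum = np.log(np.asarray(luminance, dtype=float))
    edges = np.diff(log_lum)                 # log luminance ratio at each border
    pos = np.arange(len(edges)) + 0.5        # border locations
    keep = pos < target_idx                  # edges on the path to the target
    w = np.exp(-decay * (target_idx - pos[keep]))  # falloff with distance
    return log_lum[0] + np.sum(w * edges[keep])
```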
How well-focused an image appears can be strongly influenced by the surrounding context. A blurred surround can cause a central image to appear too sharp, while sharpened surrounds can induce blur. We examined some spatial properties and stimulus selectivities of this 'simultaneous blur contrast.' Observers adjusted the focus of a central test image by a 2AFC staircase procedure that varied the slope of the image amplitude spectrum. The test was surrounded by 8 identical images with biased spectra, presented concurrently with the test for 0.5 sec on a uniform gray background. Contrast effects were comparable in magnitude for image sizes ranging from 1 deg to 4 deg of visual angle, but were stronger for tests that were viewed in the periphery rather than fixated directly. Consistent biases were found for different types of grayscale images, including natural images, filtered noise, and simple edges. However, the effects were weaker when surrounds and tests were drawn from different images, or differed in contrast polarity or color, and thus do not depend on blur or on average spatial-frequency content per se. These induction effects may in part reflect a manifestation of selective contrast gain control.
We differentiate a cognitive branch of the visual system from a sensorimotor branch with the Roelofs effect, a perception that a target's position is biased in the direction opposite the offset of a surrounding frame. When a small fixed target is presented inside a frame that is offset to one side, normal humans perceive the target to be deviated in the direction opposite the frame's offset. They can still jab the target accurately, however, even though it is perceptually mislocalized. This dissociation indicates that motor coordinates are coded in a 'sensorimotor', possibly dorsal, pathway containing visual information that can be inconsistent with perceived information in a 'cognitive', possibly ventral, pathway. Lack of a Roelofs effect indicates use of information in the sensorimotor pathway, independent of perception. We ask whether the sensorimotor pathway can handle a transformation of target position, in an anti-jabbing task analogous to anti-saccade tasks: the observer jabs the position symmetrically opposite the target's position, relative to the midline of the head. The target appeared 1 deg left or right of the midline. Observers were to jab the symmetrically opposite position as soon as the target disappeared. The result was a large and consistent Roelofs effect for an open-loop motor task, indicating that information from the cognitive pathway must be used to perform this task.
The experiments reported here were designed to address two aims. The first was to determine the sufficiency of head-generated motion parallax, when present in isolation, for the control of natural prehensile movements. The second was to assess the consequences of providing enhanced parallax information for prehension. Enhanced parallax was created by changing the spatial extent of the movement of a camera relative to the extent of the teleoperator's head movements. The gain ranged from 0.5 to 4. The scene was viewed for 2 sec before reaches were made in open-loop conditions. Results showed clearly that information from motion parallax is sufficient to support reliable and accurate motor movements. The enhanced information led to predictable distortions in perceived size and distance and corresponding alterations in the transport and grip components. The results suggest that the provision of parallax information is beneficial for tasks requiring the recovery of metric depth information. However, if enhanced parallax is used, which facilitates performance in a range of perceptual tasks, re-calibration of the relative motion information is necessary to prevent size/distance distortions.
Monitor characterization has taken on new importance for non-professional users, who are not usually equipped to make photometric measurements. Our purpose was to examine some of the visual judgements used in characterization schemes that have been proposed for web users. We studied adjusting brightness to set the black level, banding effects due to digitization, gamma estimation in the light and in the dark, and a color-matching task in the light, on a desktop CRT and a laptop LCD. Observers demonstrated the sensitivity of the visual system for comparative judgements in black-level adjustment, banding visibility, and gamma estimation. The results of the color-matching task were ambiguous. In the brightness adjustment task, the action of the adjustment was not as presumed; however, perceptual judgements were as expected under the actual conditions. When the gamma estimates of observers were compared to photometric measurements, problems with the definition of gamma were identified. Information about absolute light levels that would be important for characterizing a display, given the shortcomings of gamma in measuring apparent contrast, is not measurable by eye alone. The LCD was not studied as extensively as the CRT because of viewing-angle problems, and its transfer function did not follow a power law, rendering gamma estimation meaningless.
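As an illustration of the kind of comparative judgement such schemes rely on, here is a minimal sketch of visual gamma estimation by dither matching, assuming the display follows L = (v/v_max)^γ and that a 50% black/white dither fuses to half of maximum luminance; the function is ours, not the paper's tool.

```python
# Sketch: infer gamma from the gray level an observer matches to a
# 50% black/white dither pattern.
import math

def gamma_from_match(v_match, v_max=255):
    """Return the gamma implied by a gray level matching a 50% dither."""
    return math.log(0.5) / math.log(v_match / v_max)

# Example: a match at digital value 186 implies gamma of about 2.2.
```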
We conducted a perceptual image preference experiment over the web to find out (1) whether typical computer users have significant variations in their display gamma settings, and (2) if so, whether the gamma settings have a significant perceptual effect on the appearance of images in their web browsers. The digital image renderings used had been found to have preferred tone characteristics in a previous lab-controlled experiment. They were rendered with 4 different gamma settings. The subjects were asked to view the images over the web, with their own computer equipment and web browsers. The subjects made pair-wise subjective preference judgements on which rendering they liked best for each image. Each subject's display gamma setting was estimated using a 'gamma estimator' tool, implemented as a Java applet. The results indicated that (1) the users' gamma settings, as estimated in the experiment, span a wide range from about 1.8 to about 3.0; and (2) the subjects preferred images that were rendered with a 'correct' gamma value matching their display setting. Subjects disliked images rendered with a gamma value not matching their display's. This indicates that display gamma estimation is a perceptually significant factor in web image optimization.
We have developed a fast perceptual method for evaluating color scales for data visualization that uses a monochrome photographic image of a human face as a test pattern. We conducted an experiment in which we applied various color scales to a photographic image of a face and asked observers to rate the naturalness of each image. We found a very strong correlation between the perceived naturalness of the images and the luminance monotonicity of the color scales. Since color scales with monotonic luminance profiles are widely recommended and used for visualizing continuous scalar data, we conclude that using a human face as a test pattern provides a quick, simple method for evaluating such color scales in Internet environments.
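A minimal sketch of the monotonicity property the experiment found to predict perceived naturalness; using Rec. 709 luma weights as a stand-in for photometric luminance is our assumption.

```python
# Sketch: test whether a color scale's luminance is monotonic.
import numpy as np

def is_luminance_monotonic(cmap_rgb):
    """cmap_rgb: (N, 3) array of RGB values in [0, 1].
    True if luminance rises (or falls) monotonically along the scale."""
    y = cmap_rgb @ np.array([0.2126, 0.7152, 0.0722])  # Rec. 709 luma
    dy = np.diff(y)
    return bool(np.all(dy >= 0) or np.all(dy <= 0))
```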
Current image retrieval systems compare images based on low-level primitives, such as color, color layout, texture, and shape. However, recent psychophysical experiments show that human observers primarily use high-level semantic descriptors and categories to judge image similarity. To model these high-level descriptors in terms of low-level primitives, we use hierarchical clustering to segment the psychophysically determined image similarity space into semantically meaningful categories. We then conduct a series of psychophysical experiments to evaluate the perceptual salience of these categories. For each category we investigate the correlation with low-level pictorial features to identify semantically relevant features, their organization, and distribution. Our findings suggest a new semantically based image similarity model.
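The clustering step might look like the following sketch, assuming the similarity judgements have been converted to a condensed pairwise dissimilarity vector; SciPy's agglomerative routines stand in for whatever implementation the authors used.

```python
# Sketch: cut a hierarchical clustering of the similarity space into a
# fixed number of candidate semantic categories.
from scipy.cluster.hierarchy import linkage, fcluster

def semantic_categories(dissimilarity, n_categories):
    """dissimilarity: condensed pairwise distance vector over images."""
    Z = linkage(dissimilarity, method='average')   # agglomerative tree
    return fcluster(Z, t=n_categories, criterion='maxclust')
```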
Edges are of fundamental importance in the analysis of images, and of course in the field of image quality. To incorporate the edge information as coded by the HVS in a vector quantization scheme, we have developed a classification strategy to separate edge vectors from non-edge vectors. This strategy allows the generation of different sets of codewords of different sizes for each kind of vector. For each of the 'edge' sets, the final size is perceptually tuned. Finally, when an image is encoded, its associated edge map is generated. The appropriate 'edge' set is then selected according to the amount of edge content present in the image. The size of the second set, for non-edge vectors, is then chosen to respect the required compression rate. Statistical measures and psychophysical experiments have been performed to judge the quality of the reconstructed images.
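A hedged sketch of the classification stage: image blocks ('vectors') are split into edge and non-edge classes before quantization. The gradient-energy test and the threshold below are placeholders for the paper's perceptually tuned classifier.

```python
# Sketch: separate edge vectors from non-edge vectors by gradient energy.
import numpy as np

def split_edge_vectors(blocks, threshold):
    """blocks: (N, h, w) array of image blocks. Returns (edge, non_edge)."""
    gy = np.abs(np.diff(blocks, axis=1)).mean(axis=(1, 2))
    gx = np.abs(np.diff(blocks, axis=2)).mean(axis=(1, 2))
    energy = gx + gy                     # crude edge-content measure
    edge_mask = energy > threshold
    return blocks[edge_mask], blocks[~edge_mask]
```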
Digital visual data has been rapidly becoming more and more ubiquitous, so content-based techniques to perform indexing are imperative. This paper outlines research geared towards an image retrieval system that identifies shapes in images and classifies them into appropriate categories. The system functions on pre-processed, segmented images, extracting component regions on the basis of color. This is followed by shape analysis using invariant moments, with perceptual considerations made on the basis of subjective testing. The subjectivity is incorporated into the system via a set of thresholds whose strictness can be manipulated by the user. The implementation, employing SQL and DB2, provided an 84 percent placement rate for the various images investigated.
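The moment-based shape analysis could be sketched as follows using OpenCV's Hu moments, which are invariant to translation, scale, and rotation; the log compression is a common convention, not necessarily the authors'.

```python
# Sketch: invariant-moment descriptor for one segmented region.
import cv2
import numpy as np

def hu_descriptor(region_mask):
    """region_mask: binary uint8 image of one segmented region."""
    m = cv2.moments(region_mask, binaryImage=True)
    hu = cv2.HuMoments(m).flatten()          # seven Hu invariants
    # log transform compresses the large dynamic range of the moments
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)
```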
We present a computational approach to main subject detection, which provides a measure of saliency or importance for the different regions associated with different subjects in an image with unconstrained scene content. It is built primarily upon selected image semantics, with low-level vision features also contributing to the decision. The algorithm consists of region segmentation, perceptual grouping, feature extraction, and probabilistic reasoning. To accommodate the inherent ambiguity in the problem as reflected by the ground truth, we have developed a novel training mechanism for Bayes nets based on fractional frequency counting. Using a set of images spanning the 'photo space', experimental results have shown the promise of our approach in that most of the regions that independent observers ranked as the main subject are also labeled as such by our system. In addition, our approach lends itself to performance-scalable configurations within the Bayes net-based framework. Different applications have different degrees of tolerance to performance degradation and speed penalties, and computing a full set of features may not be practical for time-critical applications. We have designed the algorithm to run under three configurations, without reorganization or retraining of the network.
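The fractional-counting idea can be illustrated in a few lines: each training region contributes its observer-derived belief mass to the conditional probability table rather than a hard 0/1 count. Variable names are illustrative, not from the paper.

```python
# Sketch: fractional frequency counting for one Bayes-net CPT entry.
from collections import defaultdict

def fractional_cpt(samples):
    """samples: iterable of (feature_value, belief) pairs, where belief
    in [0, 1] is the fraction of observers who called the region the
    main subject. Returns P(main subject | feature_value)."""
    mass = defaultdict(float)
    total = defaultdict(float)
    for feature_value, belief in samples:
        mass[feature_value] += belief        # fractional count
        total[feature_value] += 1.0
    return {f: mass[f] / total[f] for f in total}
```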
The combination of the increased size of digital image databases and the increased frequency with which non-specialists access these databases raises the question of the efficacy of visual search and retrieval tools. We hypothesize that the use of color harmony has the potential to improve image-search efficiency. We describe an image-retrieval algorithm that relies on a color harmony model. This model, built on Munsell hue, value, and chroma contrast, is used to divide the image database into clusters that can be individually searched. To test the efficacy of the algorithm, it is compared to existing algorithms developed by Niblack et al. and Feldman et al. A second study that utilizes the image query system in a retail application is also described.
The HAL 9000 computer, the inimitable star of the classic Kubrick and Clarke film '2001: A Space Odyssey,' displayed image understanding capabilities vastly beyond today's computer systems. HAL could not only instantly recognize who he was interacting with, but he could also lip read, judge the aesthetics of visual sketches, recognize emotions subtly expressed by scientists on board the ship, and respond to these emotions in an adaptive personalized way. Of course, HAL also had capabilities that we might not want to give to machines, like the ability to terminate life support or otherwise take the lives of people. This presentation highlights recent research in giving machines certain affective abilities that aim to make them more intelligent, shows examples of some of these systems, and describes the role that affective abilities may play in future human-computer interaction.
In a series of experiments, observers' cognitive and psychophysiological responses to pictorial stimuli were evaluated. In the first experiment, subjects viewed a set of randomly presented images. After each presentation, they rated the image on a number of cognitive scales. In the second experiment, images producing certain physiological effects - deactivating, neutral, or activating - were individually selected based on the results of the first experiment and shown to the subjects again. Psychophysiological measurements included electrocardiogram, hand temperature, muscle tension, eye movements, blood oxygen, respiration, and galvanic skin response. Our results indicate that the images produced significant emotional changes according to both verbal and physiological assessment. The changes were in agreement with the predictions derived from the metric that we developed in a number of cases that exceeded the chance level. The direction of the changes corresponded to previous findings reported elsewhere.
Several international research teams are currently developing artificial human vision systems that have the potential to restore some visual faculties to blind persons. Given the significant advancements from these teams, it is conceivable that the implantation of a safe and useful prosthesis will occur soon, perhaps in the next 2-4 years. It is thus timely to suggest and demonstrate methods to increase the information content of such artificial vision systems. Several ideas are suggested in this paper, such as brightness modulation, range indication, importance mapping, and the delivery of supplementary information, which will do much towards providing visual information comparable to that obtained via a normally functioning human eye but at far lower information rates. This paper briefly describes the framework of artificial vision systems and outlines basic considerations of digital image processing as applied to them. We describe the poor quality of the anticipated images produced by these artificial vision systems and the need to enhance the images to allow increased scene understanding. Several techniques are identified which could enhance the information content of the images. We then describe our own research in this area, which aims to determine the performance envelope of useful low-quality images associated with artificial vision systems. Our subjective assessment studies using representative test patterns have investigated how much information, and what types of information, are needed to recognize or perceive a scene. This testing has served to identify the most informative image processing operations, which lead to better understanding of picture content.
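To make the 'poor quality of anticipated images' concrete, here is a sketch that previews an image as a coarse phosphene grid with a few brightness levels; the grid size and level count are assumptions for illustration only.

```python
# Sketch: preview an image as a low-resolution, few-level phosphene grid.
import numpy as np

def phosphene_preview(img, grid=(25, 25), levels=4):
    """img: 2D float array in [0, 1]. Returns a grid-sized preview."""
    gh, gw = grid
    h, w = img.shape
    # average-pool the image onto the electrode grid
    pooled = img[:h - h % gh, :w - w % gw]
    pooled = pooled.reshape(gh, h // gh, gw, w // gw).mean(axis=(1, 3))
    # quantize to the few brightness levels an implant could signal
    return np.round(pooled * (levels - 1)) / (levels - 1)
```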
Emerging broadband communication systems promise a future of multimedia telephony. The addition of visual information during telephone conversations, for example, would be most beneficial to people with impaired hearing, for whom it is useful for speech reading, and it can be driven by the existing narrowband communication systems used for the speech signal. A Hidden Markov Model (HMM)-based visual speech synthesizer is designed to improve speech understanding. The key elements in the application of HMMs to this problem are: a) the decomposition of the overall modeling task into key stages; and b) the judicious determination of the components of the observation vector for each stage. The main contribution of this paper is the development of a novel correlation HMM model that is able to integrate independently trained acoustic and visual HMMs for speech-to-visual synthesis. This model allows increased flexibility in choosing model topologies for the acoustic and visual HMMs. It also reduces the amount of required training data compared to early integration modeling techniques. Results from objective and subjective analysis show that an HMM correlation model can significantly decrease audio-visual synchronization errors and increase speech understanding.
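A much-simplified stand-in for the correlation idea: train the acoustic and visual HMMs independently, then estimate a state-to-state correspondence from frame-aligned training data. The use of hmmlearn and the co-occurrence table are our assumptions, not the paper's formulation.

```python
# Sketch: independently trained HMMs plus a learned state correspondence.
import numpy as np
from hmmlearn import hmm

def train_correlation(acoustic_frames, visual_frames, n_states=8):
    """Frames are (n_samples, n_features) arrays, aligned frame by frame."""
    a = hmm.GaussianHMM(n_components=n_states).fit(acoustic_frames)
    v = hmm.GaussianHMM(n_components=n_states).fit(visual_frames)
    sa = a.predict(acoustic_frames)        # per-frame acoustic states
    sv = v.predict(visual_frames)          # per-frame visual states
    # co-occurrence table approximating P(visual state | acoustic state)
    table = np.zeros((n_states, n_states))
    for i, j in zip(sa, sv):
        table[i, j] += 1
    table /= np.maximum(table.sum(axis=1, keepdims=True), 1e-12)
    return a, v, table
```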
What makes a graphic image a good visualization? Why is one visualization better than another? Why are 3D visualizations better than 2D visualizations in some cases but not others? How do the size of the display, color, contrast level, brightness, and frame rate affect the usability of the visualization, and how do these 'physical' quantities affect the type and amount of information that can be extracted from the visualization by the user? These are just a few of the questions that a multi-disciplinary effort at the NRL is trying to answer. By combining visualization experts, physicists, and cognitive scientists, we are trying to understand the cognitive processes carried out in the minds of scientists at the time they perform a visual analysis of their data. The results from this project are being used both for the design of visualization methodologies and for basic cognitive work. In this paper, we present a general description of our project and a brief discussion of the results obtained in trying to understand why 3D visualizations are sometimes better than 2D, as most attempts at studying this problem have resulted in theories that are either too vague or under-specified, or not informative across different contexts.
A variety of developments in twentieth century painting have expanded the depiction of space beyond the direct representation of optical space. This paper analyzes some of the artistic explorations of spatial concepts, giving particular attention to: (1) spatial composition, (2) spatial density and optical impressions, and (3) the deconstruction of visual space.
This paper proposes a cognitive model in which people begin to search for pictures by using semantic content and find the right picture by judging whether its visual content is a proper visualization of the semantics desired. The essential point is that human search is not just a process of matching computation on visual features but rather a process of visualizing the known semantic content. For people to search electronic images in the way they do manually in the model, we suggest that querying be a semantic-driven process like design. A query-by-design paradigm is proposed, in the sense that what you design is what you find. Unlike query-by-example, query-by-design allows users to specify the semantic content through an iterative and incremental interaction process, so that a retrieval can start with association and identification of the given semantic content and be refined as further visual cues become available. An experimental image retrieval system, Kuafu, is under development using the query-by-design paradigm, and an iconic language is adopted.
A major factor that seriously hampers the use of binocular displays is visual discomfort. We have experimentally determined the level of (dis)comfort experienced by 24 subjects for short presentations of a wide range of binocular image imperfections. The image manipulations are representative of commonly encountered optical errors, imperfect filters, and stereoscopic disparities. The results show that nearly all binocular asymmetries seriously affect visual comfort if present in a large enough amount. Threshold values for the onset of discomfort are estimated from the data. The database collected in the study should allow a more accurate prediction of visual comfort from the specification of a binocular viewing system than has so far been possible.
An all-encompassing goal of our research is to develop an extra high quality imaging system that is able to convey a high-level artistic impression faithfully. We have defined such a high-level artistic impression as a 'high order sensation', and we suppose that the high order sensation is expressed by a combination of psychological factors that can be described by plural assessment words. In order to pursue the quality factors that are important for the reproduction of the high order sensation, we have focused on the image quality evaluation of extra high quality images using assessment words that take the high order sensation into account. In this paper, we obtain the hierarchical structure between the collected assessment words and the principles of European painting, based on the conveyance model of the high order sensation, and we determine a key assessment word, 'plasticity', which is able to evaluate the reproduction of the high order sensation more accurately. The results of subjective assessment experiments using the prototype of the developed extra high quality imaging system show that the key assessment word 'plasticity' is the most appropriate assessment word for evaluating the image quality of extra high quality images quasi-quantitatively.
The goal of our research is to enable artists to interact with the world of their imagination; to create art by moving, molding, and shaping virtual forms as if by their own bodies. We show a print that represents a scene painted into virtual reality and tell how the scene and the subsequent print were done. There will be some discussion of the meaning of the work, the intent of the artist and the relationship between the art and the technology used to create it. The print seen here is both a picture of an exotic place and the artistic fruit of research into how to embody the user in virtual reality. Our approach is based on the premise that embodiment is an appropriate direction for developing tools to facilitate artistic expression. It is also a premise of this research that the constraints requisite to creating art will also result in tools that serve the visualization community in general.
The perception of spatio-temporal patterns is a fundamental part of visual cognition. In order to understand more about the principles behind these biological processes, we are analyzing and modeling the representation of spatio-temporal structures on different levels of abstraction. For the low-level processing of motion information, we have argued for the existence of a spatio-temporal memory in early vision. The basic properties of this structure are reflected in a neural network model that is currently under development. Here we discuss the major architectural features of this network, which is based on Kohonen's SOMs. In order to enable the representation, processing, and prediction of spatio-temporal patterns on different levels of granularity and abstraction, the SOMs are organized in a hierarchical manner. The model has the advantage of a 'self-teaching' learning algorithm, and it stores temporal information through local feedback in each computational layer. The constraints for the neural modeling and the data set for training the neural network are obtained from psychophysical experiments in which human subjects' abilities for dealing with spatio-temporal information are investigated.
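The building block of such a network is the standard Kohonen update, sketched below; the grid topology, learning rate, and neighborhood width are illustrative assumptions rather than the authors' settings.

```python
# Sketch: one self-organizing-map update step.
import numpy as np

def som_step(weights, grid_xy, x, lr=0.1, sigma=1.0):
    """weights: (N, d) unit weight vectors; grid_xy: (N, 2) grid positions;
    x: (d,) input sample. Updates weights in place and returns them."""
    bmu = np.argmin(((weights - x) ** 2).sum(axis=1))   # best-matching unit
    d2 = ((grid_xy - grid_xy[bmu]) ** 2).sum(axis=1)    # squared grid distance
    h = np.exp(-d2 / (2 * sigma ** 2))                  # neighborhood kernel
    weights += lr * h[:, None] * (x - weights)
    return weights
```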
Different stereoscopic effects based on 100 percent binocular luminance contrast have been described previously: the 'sieve' effect, the 'binocular lustre' effect, the 'floating' effect, and the 'rivaldepth' effect. By means of a dichoptic set-up, we have measured the detection thresholds for these different effects as a function of binocular luminance contrast. Psychometric data have been recorded using a Yes-No paradigm, a spatial 2AFC paradigm, and a temporal 2AFC paradigm. Our results show that even at small contrasts all these stereoscopic effects are perceived. We have noticed an increase of the detection thresholds in the following order: 'sieve', 'binocular lustre', 'rivaldepth', and 'floating' effect. Two groups have been distinguished.
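Detection thresholds from data like these are conventionally read off a fitted psychometric function; below is a sketch assuming a Weibull form and a 2AFC guess rate of 0.5, which may differ from the criterion the authors actually used.

```python
# Sketch: Weibull psychometric fit for a 2AFC detection threshold.
import numpy as np
from scipy.optimize import curve_fit

def weibull(c, alpha, beta, guess=0.5):
    """Proportion correct as a function of contrast c."""
    return guess + (1 - guess) * (1 - np.exp(-(c / alpha) ** beta))

def fit_threshold(contrasts, prop_correct):
    (alpha, beta), _ = curve_fit(weibull, contrasts, prop_correct,
                                 p0=[np.median(contrasts), 2.0])
    return alpha   # contrast at ~82% correct for this parameterization
```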
Perception of the 3D shape of a smoothly curving surface can be facilitated or impeded by the use of different surface texture patterns. In this paper we report the results of a series of experiments intended to provide insight into how to select or design an appropriate texture for shape representation in computer graphics. In these experiments, we examine the effect of the presence and direction of luminance texture pattern anisotropy on the accuracy of observers' judgements of 3D surface shape. Our stimuli consist of complicated, smoothly curving level surfaces from a typical volumetric dataset, across which we have generated four different texture patterns via 3D line integral convolution: one isotropic and three anisotropic, with the anisotropic patterns oriented across the surface either in a single uniform direction, in a coherently varying direction, or in the first principal direction at every surface point. Observers indicated shape judgements by manipulating an array of local probes so that their circular bases appeared to lie in the tangent plane to the surface at the probe's center, and the perpendicular extensions appeared to point in the direction of the local surface normal. Stimuli were displayed as binocularly viewed flat images in the first trials, and in stereo during the second trials. Under flat viewing, performance was found to be better for the isotropic pattern and for the anisotropic pattern that followed the first principal direction than for the other two anisotropic patterns. Our results are consistent with the hypothesis that texture pattern anisotropy impedes surface shape perception when the direction of the anisotropy does not locally follow the direction of greatest normal curvature.
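For readers unfamiliar with line integral convolution, the following bare-bones 2D sketch conveys the idea: a noise field is averaged along streamlines of a direction field. The paper's 3D variant works analogously per voxel; the fixed unit step and streamline length here are illustrative.

```python
# Sketch: 2D line integral convolution with fixed unit steps.
import numpy as np

def lic(noise, vx, vy, length=10):
    """noise, vx, vy: (h, w) arrays; vx, vy define the direction field."""
    h, w = noise.shape
    out = np.zeros_like(noise)
    for sign in (1.0, -1.0):                 # trace both directions
        y, x = np.mgrid[0:h, 0:w].astype(float)
        for _ in range(length):
            yi = np.clip(y.round().astype(int), 0, h - 1)
            xi = np.clip(x.round().astype(int), 0, w - 1)
            out += noise[yi, xi]             # accumulate along streamline
            x += sign * vx[yi, xi]           # step along the field
            y += sign * vy[yi, xi]
    return out / (2 * length)
```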
Keynote Session: Human and Computer Vision in Electronic Imaging
Image-based rendering and modeling has led to major advances in computer graphics. Using images as input has enabled the creation of more realistic and convincing computer-generated images. Many of the current image-based rendering algorithms, however, use primarily geometric information. The next logical step is to incorporate more photometric information, including lighting and material properties. Material properties are particularly important, since understanding the optical properties of materials - or appearance models - requires interdisciplinary research involving expertise in both vision and graphics. In this paper, we review the current work and suggest directions for future research.