It is our great pleasure to welcome you to the 1st ACM International Workshop on Multimodal Pervasive Video Analysis (MPVA 2010).
Thanks to the confluence of urban monitoring applications and portable cameras carried by humans everywhere, video acquisition, processing, and storage systems have become an integral part of the fabric of today's life. Besides the classical applications such as security and surveillance, cameras have been considered for creating novel applications based on the notions of smart homes, ambient intelligence, human-computer interaction, social networks, ambient assisted living, smart seminar rooms, building emergency management, assistive technologies, and many more. Cameras are mounted at fixed locations in the urban infrastructure, or are moved around by humans or vehicles. Fusion of data between the cameras is explored as a means to enhance the interpretation task or to add confidence to the monitoring results. Such fusion can occur across different spatio-temporal levels and between subsets of fixed and mobile cameras as per the needs of the application and the availability of valuable information from each camera.
Application of multimedia, computer vision, and pattern recognition research based on combined fixed and mobile cameras can hence find novel areas of exploration. Detection, recognition, tracking, re-identification, pose/posture estimation, activity analysis, multimodal signal fusion, and other techniques applied in individual, local, or global settings are example technologies that offer new opportunities for defining multimodal pervasive video processing frameworks.
This workshop aims to act as a forum for sharing new techniques and applications based on pervasive video analysis for researchers, developers, and practitioners from academia and industry. Addressing new challenges related to processing of distributed observations with a network of cameras and applications based on joint video analysis between fixed and mobile cameras will be the subjects of interest. Techniques and applications based on fixed or mobile camera systems will also be considered. This workshop is intended to bring together the successful series of the Video Surveillance and Sensor Networks (VSSN) workshops, held until 2006 in conjunction with the ACM Multimedia conference, and the workshop on Vision Networks for Behavior Analysis, held in conjunction with ACM Multimedia 2008. This new workshop inherits from the above-mentioned workshops the interests of their communities, but shifts the focus to cover higher-level topics and applications under the common framework of "multimodal pervasive video analysis", hence aiming to adapt to the evolving directions of interest in the field, and reaching out to other research communities with overlapping interests.
The call for papers attracted 17 submissions from Asia, Europe, and North America. The program committee accepted 11 papers that cover a wide variety of topics within MPVA.
One important goal of this workshop is to bring together several communities. The connection between them is given by the computer vision algorithms and the multimodal framework employed in a wide variety of applications where explicit or implicit analysis of, or interaction with, humans is necessary.
Proceeding Downloads
Image-based indoor positioning system: fast image matching using omnidirectional panoramic images
In this paper, we developed an image-based indoor localization system using omnidirectional panoramic images to which location information is added. By the combination of the robust image matching by PCA-SIFT and fast nearest neighbor search algorithm ...
Real time multiple people tracking and pose estimation
In this paper we present a combined probability estimation approach to detect and track multiple people for pose estimation at the same time. It can deal with partial and total occlusion between persons by adding torso appearance to the tracker. ...
Video narrative authoring with motion inpainting
Storytelling and narrative creation are recent interests in the areas of interactive media designs. Instead of using virtual reality-based 3-D models, we propose a system which uses video technologies to generate video story from existing avatars and ...
Efficient person identification using active cameras in a smartroom
Identifying people is an important task in a Smartroom environment. Active cameras are well suited for the task as they provide high resolution images at almost any location in the room. Since active cameras only observe a small part of the field of ...
Learning local features for age estimation on real-life faces
In this paper, we investigate age estimation on real-life faces acquired in unconstrained conditions. This is a challenging but relatively understudied problem, with interesting applications in many areas (e.g., visual surveillance). We use the large ...
Spatial-temporal understanding of urban scenes through large camera network
Outdoor surveillance cameras have become prevalent as part of the urban infrastructure, and provided a good data source for studying urban dynamics. In this work, we provide a spatial-temporal analysis of 8 weeks of video data collected from the large ...
3D gesture recognition applying long short-term memory and contextual knowledge in a CAVE
- Dejan Arsić,
- Luis Roalter,
- Martin Wöllmer,
- Florian Eyben,
- Björn Schuller,
- Moritz Kaiser,
- Matthias Kranz,
- Gerhard Rigoll
Virtual reality applications are emerging into various regions of research and entertainment. Although visual and acoustic capabilities are already quite impressive, a wide range of users still criticizes the user interface. Frequently complex and very ...
Space speaks: towards socially and personality aware visual surveillance
There is a complex unwritten code which regulates human interactions. In this paper we present a camera based monitoring system that explores the relationship between proxemics, visual attention and personality traits during interaction. People's ...
Modeling and recognition of complex multi-person interactions in video
In this paper, we focus on the problem of searching for complex activities involving multiple, interacting objects in video. We examine the dynamics of formation and dispersal of groups as well as their interactions with other groups and individuals. In ...
Learning human pose in crowd
In a crowded public space, body and head pose can provide useful information for understanding human behaviours and intentions. In this paper, we propose a novel framework for locating people and inferring their body and head poses. Human detection and ...
Video topic modelling with behavioural segmentation
Topic models such as Latent Dirichlet Allocation (LDA) are used extensively for modelling multi-object behaviour and anomaly detection in busy scenes. However, existing topic models suffer from the sensitivity problem, where they are unable to detect ...
- Proceedings of the 1st ACM international workshop on Multimodal pervasive video analysis