DOI: 10.1145/1027933.1027976
Article

MacVisSTA: a system for multimodal analysis

Published: 13 October 2004

Abstract

The study of embodied communication requires access to multiple data sources, such as multistream video and audio, and to various derived data and metadata such as gesture, head, posture, facial expression, and gaze information. The common element that runs through these data is the co-temporality of the multiple modes of behavior. In this paper, we present the multimedia Visualization for Situated Temporal Analysis (MacVisSTA) system for the analysis of multimodal human communication through video, audio, speech transcriptions, and gesture and head orientation data. The system uses a multiple linked representation strategy in which different representations are linked by the current time focus. In this framework, the multiple display components associated with the disparate data types are kept in synchrony, each component serving both as a controller of the system and as a display. Hence the user is able to analyze and manipulate the data from different analytical viewpoints (e.g., through the time-synchronized speech transcription or through motion segments of interest). MacVisSTA supports analysis of the synchronized data at varying timescales. It provides an annotation interface that permits users to code the data into 'music-score' objects, and to make and organize multimedia observations about the data. Hence MacVisSTA integrates flexible visualization with annotation within a single framework. An XML database manager has been created for storage and search of annotation data. We compare the system with other existing annotation tools with respect to functionality and interface design. The software runs on Macintosh OS X computer systems.
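The "multiple linked representation" strategy described in the abstract amounts to an observer pattern: a single shared time focus notifies every registered display component when it moves, and any component may itself move the focus, so each one acts as both controller and display. The following is a minimal reader's sketch of that idea in Python; the names (TimeFocus, View, seek, on_time_changed) are illustrative assumptions, not identifiers from MacVisSTA itself, which was a Macintosh OS X application.

    # Reader's sketch of the linked-representation pattern; all names are
    # hypothetical and do not come from MacVisSTA's actual source.

    class TimeFocus:
        """Shared current-time focus that keeps linked views in synchrony."""

        def __init__(self) -> None:
            self._time = 0.0
            self._views = []

        def register(self, view) -> None:
            self._views.append(view)

        def set_time(self, t: float, source=None) -> None:
            self._time = t
            # Notify every linked view except the one that initiated the change.
            for view in self._views:
                if view is not source:
                    view.on_time_changed(t)


    class View:
        """A display component (e.g. video, transcript, or motion-trace panel)."""

        def __init__(self, name: str, focus: TimeFocus) -> None:
            self.name = name
            self.focus = focus
            focus.register(self)

        def seek(self, t: float) -> None:
            # Acting as a controller: move the shared focus.
            self.focus.set_time(t, source=self)

        def on_time_changed(self, t: float) -> None:
            # Acting as a display: redraw at the new time.
            print(f"{self.name}: now showing t={t:.2f}s")


    focus = TimeFocus()
    video = View("video", focus)
    transcript = View("transcript", focus)
    # Clicking a word in the transcript repositions the video (and any other view).
    transcript.seek(12.5)

Excluding the initiating component from the notification loop avoids feedback cycles when, say, scrubbing the video also repositions the transcript; the same notification hook is a natural place to drive annotation displays at varying timescales.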



Published In

ICMI '04: Proceedings of the 6th international conference on Multimodal interfaces
October 2004
368 pages
ISBN: 1581139950
DOI: 10.1145/1027933

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. embodied communication
  2. flexible visualization and annotation
  3. gesture
  4. multimodal interaction
  5. multiple linked representation

Acceptance Rates

Overall acceptance rate: 453 of 1,080 submissions, 42%
