2011, ACM International Conference Proceeding Series

An RSS-feed Auditory Aggregator Using Earcons

Athina Bikaki
School of Science & Technology, Hellenic Open University
Parodos Aristotelous 18, 262 22 Patras, Greece
+30 697 6119757
[email protected]

Andreas Floros
Dept. of Audiovisual Arts, Ionian University
Plateia Tsirigoti 7, 49100 Corfu, Greece
+30 26610 87725
[email protected]

ABSTRACT
In this work we present a data sonification framework based on parallel/concurrent earcon representations for monitoring stock-market information in real time. The information under consideration is conveyed through the well-known Really Simple Syndication (RSS) feed Internet mechanism and includes both text and numeric values, which are converted to speech and earcons using existing speech synthesis techniques and sonic design guidelines. Due to the characteristics of the considered application, particular emphasis is placed on information representation concurrency, achieved mainly through sound source spatialization techniques and different timbre characteristics. Spatial positioning of sound sources is performed through typical binaural processing and reproduction. A number of systematic subjective assessments have shown that the overall perceptual efficiency and sonic representation accuracy fulfill the application requirements, provided that the users are appropriately trained prior to using the proposed RSS-feed auditory aggregator.
Categories and Subject Descriptors
J.5 [Arts and Humanities]: Music – data sonification, earcons, sonic design

General Terms
Algorithms, Design.

Keywords
RSS-feed sonification, earcons, auditory displays.

1. INTRODUCTION
Data sonification represents a widely-employed approach for representing information originating from everyday-life activities through appropriately designed auditory displays. Typical application fields include the enhancement of visual representation systems [1], as well as exclusively non-visual systems for data representation [2]-[3]. Sonification takes advantage of fundamental human auditory features, including (for example) the high dynamic range of human hearing and the ability to accurately localize sound source positions in three-dimensional (3D) space, allowing a significant reduction of the visual input impact on overall human perception.

On the other hand, the rapid growth of the information available over the Internet has led to the development of multiple means and protocols for (real- or non-real-time) data delivery over packet-based (IP) networks. Moreover, the adoption of multimedia networking technologies has increased the number of potential formats for information delivery and representation. A typical, widely-employed protocol for information delivery over the Internet is the so-called Really Simple Syndication (RSS), a relatively simple and lightweight XML format designed for sharing Web content among targeted subscribers [4]. RSS-enabled web sites encapsulate transmitted data in information channels (termed RSS-feeds), which are accessed by remote clients/aggregators. These clients are very frequently responsible for information representation, using mainly visual output means. Typical content transmitted through RSS includes news and announcements, calendars, search results, etc. [5], and is formed and finally represented as text; a minimal sketch of how such a feed is fetched and parsed is given below.
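To make the feed mechanism concrete, the following minimal sketch (Python, standard library only) fetches an RSS 2.0 document and extracts the title and description of each item; the feed URL is a placeholder invented for the example, not an actual channel.

    # Minimal sketch of RSS-feed retrieval and parsing; the URL is a
    # placeholder for illustration, not an actual feed.
    import urllib.request
    import xml.etree.ElementTree as ET

    FEED_URL = "http://example.com/stocks.rss"  # hypothetical feed

    def fetch_items(url):
        """Download an RSS 2.0 document and yield (title, description) pairs."""
        with urllib.request.urlopen(url) as response:
            tree = ET.parse(response)
        # RSS 2.0 nests <item> elements inside <channel> inside <rss>.
        for item in tree.getroot().iter("item"):
            yield (item.findtext("title", default=""),
                   item.findtext("description", default=""))

    if __name__ == "__main__":
        for title, description in fetch_items(FEED_URL):
            print(title, "->", description)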
In this work, we aim to extend the concept of RSS-based information delivery and aggregation using sonification. More specifically, we propose an application framework which transparently parses a number of RSS-feeds and represents them through an appropriately shaped spatial auditory display. The apparent advantage of this approach is that the perception of the RSS-feed content becomes an "eyes-free", non-visual process, while, in addition, parallel and comparative web content representation becomes feasible. We subjectively assess the performance of the overall sonification-enabled RSS system for a typical and very common application scenario: concurrent news and stock market monitoring. This scenario was chosen because it incorporates multiple parallel and real-time information transmissions, allowing the evaluation of the overall functionality of the proposed system.

The rest of the paper is organized as follows: the next Section provides a detailed overview of different sonification techniques already proposed in the literature, with emphasis on earcons, the sonification technique particularly considered in this work. Details regarding the implementation of the proposed sonification-enabled RSS-feed aggregator are analyzed in Section 3, followed by the subjective assessment of the performance of the overall system in Section 4. The last Section concludes the work and indicates a number of issues that should be further investigated in the future.

2. SONIFICATION THROUGH EARCONS: AN OVERVIEW
In general, the term sonification defines the process of using non-speech audio to convey information [6]. It differs in concept from audification, which refers in particular to the process of treating the data values under representation as the amplitude values of the sonic waveform. The fundamental implementation aspect of sonification is the auditory display [7], which incorporates all the necessary sound signal components for achieving the desired acoustic perception. With the continuous evolution of three-dimensional (3D) sound coding and reproduction formats, the information volume that can be conveyed through an auditory display is significantly increased, since it can be reproduced and perceived in parallel/concurrently [8].

A number of different sonic design approaches for data sonification have already appeared in the literature and are widely employed in many application fields, including non-visual human-machine interfaces for blind and visually impaired users. Starting from simple alarm signals, one can additionally identify parameter mapping techniques [9], where the monitored data values are directly mapped to specific sonic attributes (such as pitch, vibrato strength or speed, brightness, etc.), producing a kind of "sonic" scatter plot. Musical sonification extends this concept by providing additional mapping parameters drawn from organized musical structure [10], such as tempo, rhythm or even orchestration, hence adding aesthetic characteristics to the final sonic representation. Moreover, significant enhancements can be achieved with Model-Based Sonification (MBS) [11], which aims to introduce dynamic characteristics into the connection between the monitored data and the acoustic representation, allowing the production of sounds driven and controlled specifically by user actions.

Auditory icons [12] also represent a common sonification technique, widely employed in everyday applications. An auditory icon is the direct equivalent of a visual icon. It mainly relies on everyday sounds and is targeted at providing a recognizable sonification mapping based on the specific type of data, rather than mapping the data values themselves to sound. Earcons, on the other hand, are defined as "nonverbal audio messages used in the user-computer interface to provide information to the user about some computer object, operation or interaction" [13]. Under this perspective, earcons are constructed using specific sonic motives employed as fundamental sound building blocks that are combined to form the complete sonic event. The variable characteristics of sonic motives may include rhythm, pitch, intensity, timbre and register [14]. An exception to this earcon structure principle are spearcons, which are derived from speech signals sufficiently compressed in time so as to sound not like speech, but like an earcon [15].

Hence, due to their variable and controlled structure, earcons are generally flexible and can additionally be organized in hierarchical families, a significant feature for representing information organized in a tree format. Moreover, concurrent (or parallel) earcons may extend the presentation bandwidth of the auditory display using sonic reproduction in a temporally overlapping fashion, provided that specific design guidelines are followed [16]. However, despite their flexibility, earcons suffer from the lack of a consistent and meaningful relation to their referent context. To overcome this, targeted users have to learn this relation in advance and train themselves.

3. IMPLEMENTATION ASPECTS
3.1 Design requirements
As mentioned in the introductory Section, the aim of this work is to define, develop and assess a framework for RSS-feed sonification. Given that the web content distributed through RSS covers a variety of information types, the following initial requirements were considered during the design phase of the RSS-feed auditory aggregator:

• The complete system should be able to support multiple types of information conveyed through corresponding RSS-feeds.
• Data transmissions through an RSS-feed are asynchronous. Hence, multiple data-streams can arrive at the RSS aggregator simultaneously. The probability of concurrent reception increases with the generation rate of the transmitted data.
• In order to avoid information losses imposed by auditory masking phenomena during concurrent RSS-feed transmissions, a sonification strategy that supports parallel sonic events should be selected and designed.
• Periodic or very frequent data transmissions impose short durations on the sonic events used to create the complete auditory display.
• Sonification concurrency also addresses the very first requirement of supporting multiple types and formats of RSS-transmitted information.

Given the above initial assumptions, the earcon sonification strategy was selected for the purposes of this work, since it fulfills the above list of initial requirements. Moreover, in order to develop a realistic test application scenario, we considered the following use case: the subscribed RSS-feeds included two types of information: a) stock market data values, delivered as text but translated to numeric values by the RSS-aggregator system, and b) announcements and news related to stock market transactions, received as plain text. A sketch of how such feed items might be routed to the two sonification paths follows.
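The sketch below illustrates the routing step implied by this use case; the item layout for stock quotes ("SYMBOL value" in the description field) and the helper names are assumptions made for the example.

    # Hypothetical routing of parsed RSS items to the two sonification paths.
    # The quote field layout ("SYMBOL value") is an assumption for
    # illustration only.
    from dataclasses import dataclass

    @dataclass
    class FeedItem:
        title: str
        description: str

    def route(item: FeedItem):
        """Return ('earcon', symbol, value) for numeric quotes,
        or ('speech', text) for plain-text news."""
        parts = item.description.split()
        # A quote item is assumed to look like: "MSFT 27.43"
        if len(parts) == 2:
            symbol, raw_value = parts
            try:
                return ("earcon", symbol, float(raw_value))
            except ValueError:
                pass
        # Everything else is treated as news text for text-to-speech.
        return ("speech", item.title + ". " + item.description)

    print(route(FeedItem("Quote", "MSFT 27.43")))            # earcon path
    print(route(FeedItem("News", "Markets rallied today")))  # speech path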
3.2 System Architecture
The general architecture of the RSS-feed auditory aggregator is illustrated in Figure 1. We hereby assume that the proposed sonification-enabled RSS-feed aggregator is subscribed to N different RSS-feeds. Depending on their type, the parsed information is organized into M information streams (where M ≤ N, a value that exclusively depends on the type of the received data). Currently, this information categorization is performed manually by the user during the initial subscription to a specific feed. However, more advanced automated methods could be employed in the future, taking into account the RSS-feed identity as well as the transmitted content itself.

Data-stream clustering information is transmitted to the Earcon Design Engine, which is responsible for producing the appropriate earcons in real-time, taking into account the type of each information category (a process illustrated in Figure 1 as a dotted arrow). Real-time earcon construction is preferable to alternative approaches (i.e. a pre-recorded set of earcons), due to the high amount of storage capacity that would be required to support a large variety of transmitted RSS information.

Figure 1. Diagram of the RSS-feed auditory aggregator general architecture (web → N subscribed RSS-feeds → XML Parser → M information streams, M ≤ N → Earcon Design Engine → Binaural Processing → Auditory Display Synthesis → audio output (L/R); clustering information controls the Earcon Design Engine, yielding M concurrent/spatialized earcons)

The binaural processing module is used to introduce spatialization into the final auditory display and to provide a robust and efficient (in terms of acoustic perception) means of concurrency during sonification. The choice of binaural technology was based a) on the localization accuracy it achieves (especially when using equalized headphones; however, audio reproduction through a loudspeaker pair is feasible if cross-talk cancellation pre-processing techniques are applied), b) on the simplicity of the reproduction system setup, and c) on the portability that can be supported, a factor that addresses the mobility frequently required by the end user. Finally, the derived spatialized earcon signals are forwarded to the Auditory Display Synthesis module, which is responsible for mixing the corresponding binaural signals and reproducing the complete auditory display.

Regarding RSS-feed subscription handling, a categorized list of different RSS feeds was offered by the system. By selecting a category, the list was filtered accordingly and the corresponding RSS feeds were displayed to the user. The user was also able to define RSS-feed clusters (e.g. stock quotes), a process that corresponds to the information categorization procedure described previously and is sketched below.
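A rough sketch of the manual categorization step follows; the feed URLs and category labels are invented, and the mapping simply reproduces the N-feeds-to-M-streams grouping described above.

    # Hypothetical grouping of N subscribed feeds into M information streams.
    # Feed URLs and category labels are invented for illustration.
    from collections import defaultdict

    # The user assigns a category to each feed at subscription time.
    subscriptions = {
        "http://example.com/msft.rss": "stock-values",
        "http://example.com/orcl.rss": "stock-values",
        "http://example.com/market-news.rss": "news",
    }

    def cluster(subs):
        """Return a mapping of category -> list of feed URLs (M <= N streams)."""
        streams = defaultdict(list)
        for url, category in subs.items():
            streams[category].append(url)
        return dict(streams)

    streams = cluster(subscriptions)
    print(len(subscriptions), "feeds ->", len(streams), "streams:", streams)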
Figure 2 illustrates the particular implementation details and the data flow followed within the scope of the application scenario described previously. Both the announcements and the stock value data were received from RSS-feed subscriptions to the www.nasdaq.com web site. The stock market news feeds were converted to speech using text-to-speech (TTS) techniques. The synthesized speech signal was processed by the binaural processing sub-system to create a virtual speech source in the 3D space of the final auditory display. The control parameters of the speech synthesis and of the voice virtual-source binaural processing appear in Figure 3. The stock value data, on the other hand, were converted to earcons, following the procedure described in detail in the next Section.

Figure 2. Schematic description of the proposed implementation

Binaural signal processing was performed using the CSound audio synthesis programming framework [17]. The selection of CSound was based on the spatialization simplicity required by the developed application: for the purposes of this work, virtual sound sources could be placed at discrete, pre-defined positions with different elevation and azimuth values. Moreover, the relatively short duration of the earcon and speech signals relaxed the requirements for real-time processing and binaural rendering, especially for a sampling rate of 44.1 kHz. Higher sampling rate values were found to induce unacceptable reproduction delays caused by the more intensive calculations; however, this does not impose any quality issues on the overall implementation.

Figure 3. Speech synthesis parameterization options

3.3 Earcons Design
As mentioned in the previous Sections, stock market values were sonified using earcons. Towards a robust and efficient earcon sonic design, we followed the basic design principles introduced in [14] and [16]. Hence, we mapped different numeric RSS data feeds to different musical instruments (timbres). Additionally, concurrency in the representation of numeric and text data was implemented using sound source spatialization (realized again through binaural processing of the earcons). In [16], this approach was found to achieve better concurrent/parallel earcon identification scores, compared to earcons derived using non-spatial information only.

The musical instruments employed were selected to be members of a symphonic orchestra, providing an adequate variety of available timbres. The same approach was followed for defining their spatial positions, which followed a typical symphonic orchestra layout. More specifically, we considered instruments from different categories (i.e. strings, woodwind, brass, etc.), in order to allow a maximum amount of perceptual timbre discrimination. The different combinations of instruments in the orchestra allowed us to investigate how to produce a well-sounding chord of a certain tone quality and acquire a uniform structure and the requisite power. Using this information, we tried to balance and correct the "colour" of the final sound. Additionally, during sound source spatialization, we placed stock information categorized in the same stock cluster (defined by the user, see the previous Section) in the same orchestra spatial section (i.e. left, right, front-left, front-right, etc.), so that grouping of stock market information was supported. A simple sketch of this spatial assignment is given below.
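The sketch below shows one way such a section-to-position assignment could be expressed; the azimuth values and cluster names are assumptions, and a constant-power stereo pan stands in for the HRTF-based binaural rendering actually performed in CSound.

    # Hypothetical assignment of stock clusters to orchestra spatial sections.
    # Azimuths (degrees, 0 = front, negative = left) are illustrative only,
    # and a constant-power pan stands in for true binaural (HRTF) rendering.
    import math

    SECTION_AZIMUTH = {          # assumed symphonic-orchestra-like layout
        "front-left": -30.0,
        "front-right": 30.0,
        "left": -70.0,
        "right": 70.0,
    }

    def pan_gains(azimuth_deg, span_deg=90.0):
        """Constant-power left/right gains for a source at the given azimuth."""
        # Map the azimuth onto [0, pi/2]: hard left -> 0, hard right -> pi/2.
        theta = (azimuth_deg / span_deg + 1.0) * math.pi / 4.0
        return math.cos(theta), math.sin(theta)

    cluster_section = {"oil-stocks": "front-left", "tech-stocks": "front-right"}

    for cluster, section in cluster_section.items():
        left, right = pan_gains(SECTION_AZIMUTH[section])
        print(f"{cluster}: section={section}, L={left:.2f}, R={right:.2f}")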
For earcon construction, we kept the rhythm and tempo fixed, but varied the note durations within one measure and, consequently, the number of notes in the same time interval. Small note lengths are better avoided, as suggested in [14], since they might not be noticed. We therefore considered note lengths always greater than 0.4 seconds. We defined a numerical scale of stock value changes (SVCs) and assigned different ranges to different note values in the measure, as well as to the number of notes, as displayed in Table 1. The numerical scale was built using historical data, based on the price changes of a specific stock category (i.e. oil companies' stock data). Moreover, we applied sound amplitude variations to convey the volatility of each stock quote. For example, if a stock value increased, the corresponding SVC was positive and was mapped to a gradually increasing sound gain, and vice versa. In the case of speech reproduction, we used two different voices (one male and one female).

Table 1. Stock value changes numerical scale and earcon structure

    Stock Value Change (%)    Note Values in the Measure (musical note symbols in the original)
    |SVC| ≥ 1.5
    0.8 < |SVC| < 1.5
    0.2 < |SVC| ≤ 0.8
    |SVC| ≤ 0.2

We generally tried to keep the earcon structure as simple as possible, so that it would be easier for a human user to understand, learn and memorize. To improve the identification of concurrently presented earcons, we also followed Brewster's guidelines [14] and added an extra silent time interval between them. Additionally, the user was allowed to re-define a number of the default earcon design parameters (see Figure 4), in order to create optimized personal earcon profiles. However, only specific parameters could be adjusted by the user, in order to ensure that the basic design principles were not violated. A sketch of the resulting mapping from SVC values to earcon parameters follows.

Figure 4. Earcons design parameterization options
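In the spirit of Table 1, the following sketch selects a note count from the |SVC| range and applies a rising or falling gain ramp according to the sign of the change; only the range boundaries come from the paper, while the note counts and gain endpoints are assumptions.

    # Hypothetical SVC-to-earcon mapping in the spirit of Table 1. The note
    # counts per measure and the gain ramp endpoints are illustrative
    # assumptions; only the |SVC| range boundaries are taken from the paper.

    def notes_per_measure(svc: float) -> int:
        """Map |SVC| ranges from Table 1 to an (assumed) number of notes."""
        magnitude = abs(svc)
        if magnitude >= 1.5:
            return 4
        if magnitude > 0.8:
            return 3
        if magnitude > 0.2:
            return 2
        return 1

    def gain_envelope(svc: float, n_notes: int):
        """Rising gains for positive SVC, falling for negative (linear ramp)."""
        lo, hi = 0.5, 1.0                      # assumed gain endpoints
        step = (hi - lo) / max(n_notes - 1, 1)
        ramp = [lo + i * step for i in range(n_notes)]
        return ramp if svc >= 0 else list(reversed(ramp))

    for svc in (1.7, -0.5, 0.1):
        n = notes_per_measure(svc)
        print(f"SVC={svc:+.1f}% -> {n} note(s), gains={gain_envelope(svc, n)}")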
4. ASSESSMENT AND RESULTS
The performance of the RSS-feed auditory aggregator was assessed through a sequence of subjective tests. During these tests, ten participants (six men and four women, aged between 32 and 75) used the system under six different initial test setups. These test sequences included different numbers and types of active/received RSS-feeds, as well as different earcon design and voice parameters. The third column of Table 2 summarizes these test conditions.

The participants were requested to fill in a questionnaire suitable for measuring the average perceptual score during each test case. Prior to that, detailed guidelines explaining the earcon design rules were provided to all human subjects, who were allowed to train themselves in practice by listening to sonifications of known stock values. It should also be noted that the majority of the participants responded that they did not have any kind of systematic music education and that they were not music experts. The score range was defined within the interval from 0 (low perceptual accuracy) to 5 (high perceptual accuracy). With the term perceptual accuracy, we hereby define the difference measure between the real information parsed by the RSS aggregator and the information actually perceived by the user: the smaller this difference is (measured in percentage units), the higher the obtained perceptual accuracy. One possible scoring scheme is sketched at the end of this Section.

A summary of the results/average scores obtained is displayed in Table 2.

Table 2. Subjective tests results summary

    Test ID  RSS Feeds  Setup Properties                                                                Average perceptual score
    1        3          Speech-speech-speech; same gain and rate, different voices                      2.5
    2        3          Earcon-earcon-earcon; same pitch, different timbre, gain, rhythm                4.0
    3        3          Earcon-earcon-earcon; same timbre, different pitch, gain, rhythm                3.9
    4        4          Earcon-speech-earcon-speech; same pitch, volume and rate,                       5.0
                        different timbre, gain, rhythm and voices
    5        6          Earcon-earcon-speech-earcon-earcon-speech; partially same pitch,                3.9
                        same gain and rate, partially different timbre, different gain,
                        rhythm and voices
    6        6          Earcon-speech-earcon-speech-earcon-speech; same pitch, volume                   4.1
                        and rate, different timbre, gain, rhythm and voices

In general, the following conclusions were drawn:

• The subject's ability to simultaneously comprehend a speech stream and understand other related sound data decreases as the number of transmitted RSS channels increases.
• Alternating male and female voices increases the ability of the user to comprehend and easily pay attention to different text RSS-streams/announcements.
• The use of different musical instruments for earcon sonic construction resulted in better perceptual segregation than the simplistic use of different pitches.
• Variations in tempo and envelope/amplitude dynamics additionally and significantly contribute to optimizing the overall measured perception.
• Components that have similar attributes are more likely to be grouped together. Hence, similar news and data are better positioned in the same direction (proximity), so that objects close to each other are grouped together. This trend was successfully identified in the sequence of experiments performed.

We additionally performed two further, more focused subjective tests to evaluate the system efficiency. Both tests represented daily stock value changes as well as stock news information. In each earcon we encoded the stock quote name, the stock price trend and a range of its value change, while each speech stream carried the stock news text. For each test, we carried out a training session followed by an evaluation session with seven participants. In the short training session we ensured that the participants had learned the sounds, and we then proceeded to the evaluation of the system. During this evaluation, the participants read an instruction sheet describing the system setup and were asked to indicate the trend and the daily value change range of each stock quote, considering the ranges that appear in Table 1. The score range was again defined within the interval from 0 (low evaluation accuracy) to 5 (high evaluation accuracy). A summary of the evaluation test results is presented in Table 3.

Table 3. Evaluation tests results summary

    Test #  Feeds  Setup Properties                             Average score
    1       4      Earcon-speech-earcon-speech                  5
    2       6      Earcon-speech-earcon-speech-earcon-speech    3.8

The results confirmed that grouping the information in the layout described in the previous Section helped the participants to correctly identify stock price information. From the evaluation experiment we verified that the layout of the four different RSS feeds, grouped as an earcon-speech pair for each stock, was the most robust. Increasing the number of stocks caused a slight decrease in identification performance.
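The perceptual-accuracy measure defined above can be made concrete with a small sketch; the linear mapping from mean percentage difference to the 0-5 scale is our assumption, as the paper does not specify the exact conversion.

    # Hypothetical perceptual-accuracy scoring. The linear mapping from mean
    # percentage difference to the 0-5 scale is an assumption for
    # illustration; reference values are assumed to be nonzero.

    def perceptual_score(real, perceived, worst_pct=100.0):
        """Mean |difference| in percent, mapped linearly to a 0-5 score."""
        diffs = [abs(r - p) / abs(r) * 100.0 for r, p in zip(real, perceived)]
        mean_diff = sum(diffs) / len(diffs)
        accuracy = max(0.0, 1.0 - mean_diff / worst_pct)   # 1.0 = perfect
        return 5.0 * accuracy

    real_svcs = [1.7, -0.5, 0.1]        # parsed by the aggregator
    perceived_svcs = [1.5, -0.4, 0.1]   # reported by a listener
    print(round(perceptual_score(real_svcs, perceived_svcs), 2))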
5. CONCLUSIONS
Data sonification represents an alternative means of eyes-free information representation in a wide range of everyday-life applications. Starting from auditory displays targeted at blind or visually-impaired users and extending to cases where visual focus cannot be maintained, there is a large number of alternative sonification techniques with different characteristics and perceptual performance.

In this work we introduced the concept of a novel RSS-feed auditory aggregator, based on one of the most interesting realizations of sonification: earcons. The RSS protocol represents a widely-employed framework for information delivery over the Internet. Multiple information formats can be supported, limited only by the state of the art in multimedia technologies applied to packet-based networks. The proposed RSS-feed auditory aggregator takes advantage of the fundamental properties of earcons (such as support for parallel/concurrent reproduction using 3D audio spatialization and timbre variation) to achieve an efficient representation of multiple concurrent RSS streams with different content (i.e. organized as simple text or as discrete data values). The blending of these techniques enables a wide range of innovative applications for both mobile and stationary home use. A few examples of interesting applications that can be envisioned for the RSS-feed auditory aggregator are: real-time monitoring of different stock information (open/close, high/low) and trade volume, real-time monitoring of different financial instruments (currencies, bonds, mutual funds, etc.), real-time monitoring of a selected stock portfolio, sonifying and monitoring the composition of a stock market index, etc.

The realization of a typical application scenario and a sequence of subjective tests have shown that the proposed RSS-feed sonification approach achieves adequate performance in terms of the perceptual accuracy of the transmitted content. Within the context of this work, the overall system lacks support for an automated procedure handling an arbitrary number of RSS-feed subscriptions and their direct mapping to earcon design cues. However, the authors intend to investigate this topic in the future, towards a robust and optimized system for RSS-feed and real-time information sonification.

6. REFERENCES
[1] Brewster, S.A. 2002. Overcoming the lack of screen space on mobile computers. Personal and Ubiquitous Computing 6, 3, 188–205.
[2] Edwards, K., Mynatt, E.D. and Stockton, K. 1995. Access to graphical interfaces for blind users. Interactions 2, 1, 56–67.
[3] Brown, L. and Brewster, S.A. 2003. Drawing by ear: Interpreting sonified line graphs. In Proceedings of the International Conference on Auditory Display (Boston, Massachusetts, July 6-9, 2003). ICAD'03, 152–156.
[4] The RSS 2.0 specification, RSS Advisory Board, 2009. http://www.rssboard.org/rss-specification
[5] RSS Tutorial for Content Publishers and Webmasters. http://www.mnot.net/rss/tutorial/
[6] Kramer, G. 1998. Sonification Report: Status of the Field and Research Agenda. NSF Sonification White Paper.
[7] Kramer, G. 1994. An introduction to auditory display. In Kramer, G. (Ed.), Auditory Display: Sonification, Audification, and Auditory Interfaces. Reading, MA, Addison-Wesley.
[8] Frauenberger, C., Putz, V. and Höldrich, R. 2004. Spatial auditory displays: A study on the use of virtual audio environments as interfaces for users with visual disabilities. In Proceedings of the 7th International Conference on Digital Audio Effects (Naples, Italy, October 5-8, 2004). DAFx'04.
[9] Ben-Tal, O., Berger, J., Cook, B., Daniels, M. and Scavone, G. 2002. SonART: The Sonification Application Research Toolbox. In Proceedings of the 8th International Conference on Auditory Display (Kyoto, Japan, 2002). ICAD'02.
[10] Lodha, S.K., Beahan, J., Heppe, T., Joseph, A. and Zane-Ulman, B. 1997. MUSE: A Musical Data Sonification Toolkit. In Proceedings of the 4th International Conference on Auditory Display (Palo Alto, California, November 2-5, 1997). ICAD'97.
[11] Hermann, T. and Ritter, H. 1999. Listen to your data: Model-based sonification for data analysis. In Advances in Intelligent Computing and Multimedia Systems (Baden-Baden, Germany), G.E. Lasker, Ed., 189–194.
[12] Gaver, W. 1986. Auditory icons: Using sound in computer interfaces. Human-Computer Interaction 2, 2, 167–177.
[13] Blattner, M.M., Sumikawa, D.A. and Greenberg, R.M. 1989. Earcons and icons: Their structure and common design principles. ACM SIGCHI Bulletin 21, 1 (July 1989), 123–124. DOI = http://dx.doi.org/10.1145/67880.1046599.
[14] Brewster, S.A., Wright, P.C. and Edwards, A.D.N. 1995. Experimentally derived guidelines for the creation of earcons. In Proceedings of the HCI'95 Conference (29 August - 1 September 1995).
[15] Dingler, T., Lindsay, J. and Walker, B.N. 2008. Learnability of sound cues for environmental features: Auditory icons, earcons, spearcons and speech. In Proceedings of the 14th International Conference on Auditory Display (Paris, France, June 24-28, 2008). ICAD'08.
[16] McGookin, D.K. and Brewster, S.A. 2004. Empirically derived guidelines for the presentation of concurrent earcons. In Proceedings of HCI 2004 (Leeds, UK, September 6-10, 2004).
[17] The CSound acoustic compiler official web site: http://www.csounds.com/