Corresponding author: Imran Ashraf

EmoWrite: A Sentiment Analysis-Based Thought to Text Conversion - A Validation Study

Imran Raza 1, Syed Asad Hussain 1, Muhammad Hasan Jamal 1, Isabel de la Torre Diez 2, Carmen Lili Rodriguez Velasco 3,4,5, Jose Manuel Brenosa 3,6,7, and Imran Ashraf 8,*

1 Department of Computer Science, COMSATS University Islamabad, Lahore Campus, Lahore, Pakistan.
2 Department of Signal Theory, Communications and Telematics Engineering, University of Valladolid, Paseo de Belen, 15, 47011 Valladolid, Spain.
3 Universidad Europea del Atlantico, Isabel Torres 21, 39011 Santander, Spain.
4 Universidad Internacional Iberoamericana, Campeche 24560, Mexico.
5 Fundacion Universitaria Internacional de Colombia, Bogota, Colombia.
6 Universidad Internacional Iberoamericana, Arecibo, Puerto Rico 00613, USA.
7 Universidade Internacional do Cuanza, Cuito, Bie, Angola.
8 Department of Information and Communication Engineering, Yeungnam University, Gyeongsan 38541, South Korea ([email protected])
Abstract

Objective: This study introduces “EmoWrite,” a novel brain-computer interface (BCI) system that addresses the limitations of existing BCI-based systems. Specifically, it aims to improve typing speed, accuracy, and user convenience, and to bring emotional state capture and sentiment analysis into BCI-based text entry.

Method: The method involves the development and implementation of EmoWrite, utilizing a user-centric Recurrent Neural Network (RNN) for thought-to-text conversion. The system incorporates visual feedback and introduces a dynamic keyboard with a contextually adaptive character appearance. Comprehensive evaluation and comparison against existing approaches are conducted, considering various metrics such as accuracy, typing speed, sentiment analysis, emotional state capturing, and user interface latency. The data required for this experiment was obtained from a total of 72 volunteers (40 male and 32 female) aged between 18 and 40.

Results: EmoWrite achieves notable results, including a typing speed of 6.6 Words Per Minute (WPM) and 31.9 Characters Per Minute (CPM) with a high accuracy rate of 90.36%. It excels in capturing emotional states, surpassing other systems with an Information Transfer Rate (ITR) of 87.55 bits/min for commands and 72.52 bits/min for letters. It also offers an intuitive user interface with a low latency of 2.685 seconds.

Conclusion: The introduction of EmoWrite represents a significant stride towards enhancing BCI usability and emotional integration. The findings indicate that EmoWrite shows promising potential in enhancing communication methods, with future implications for individuals with motor disabilities.

keywords:
Brain-computer interface, electroencephalogram, recurrent neural network, thought-to-text conversion

1 Introduction

Mind-reading systems were once fiction, but with the advancement of technology they are becoming a reality and helping physically challenged people perform tasks such as controlling a wheelchair, a robotic arm, or a cursor. People with a physical disability or speech obstruction are not disabled; rather, they are differently abled, because they may be impaired in one or two abilities while their other capabilities can be more precise and accurate than those of a healthy person [1]. Paralytic patients in particular may not have an appropriate communication medium to convey their feelings, yet their mental activity can be put to precise use if it is utilized efficiently. A large body of research aims not only at assisting rehabilitation but also at making such patients self-reliant [2]. Scientists utilize brain signals, usually acquired through electroencephalogram (EEG), by extracting useful information from them. These signals are used in many domains of daily life and especially in the medical domain, for monitoring alertness, coma or death, and brain damage, controlling anesthesia depth, tracking brain development, testing drugs, and monitoring sleep disorders [2]. EEG signals are also used to address speech impediments and remove the communication barriers of paralytic patients by converting their thoughts (silent speech) to text.

Two methods are used in the literature to decode brain signals. The first decodes brain signals directly into words, while the second uses an intermediate output device for converting thought to text. Decoding words directly from the brain is not yet practical, because only a limited number of words can be interpreted at a time given the additional training, computation power, and resources required. Little is known about whether the brain generates the same signals when perceiving similar words, so this research area has not yet matured. For example, [3] decodes only five words (“Alpha”, “Bravo”, “Charlie”, “Delta”, and “Echo”), while [4] decodes only five characters (a, e, i, o, t). The second method needs a medium: an interface containing characters or words that can be selected with the help of brain signals. Character selection can be based on two mechanisms: a virtual keyboard, or Visual Evoked Potential (VEP)/Steady-State Visual Evoked Potential (SSVEP) [5]. The virtual keyboard uses raw data or built-in functions of the headset (left, right, up, down) or motor imagery (imagining movements of the hands or feet) [2], whereas attention-based systems (VEP/SSVEP) rely on the user focusing on a flickering stimulus to select characters [5]. The performance of these systems depends on their speed, accuracy, and usability. Moreover, people using these systems cannot express their feelings accurately, because it is difficult to find a word that matches one’s mood; incorporating the emotional state along with other commands from the brain will therefore help in better utilization of these systems.

Deep learning approaches can make significant contributions in this regard. For example, the study [uppal2023enhancing] introduced a technique to predict brain strokes with high accuracy. A multilayer perceptron (MLP) was built on brain stroke data and trained with multiple optimizers, including adaptive moment estimation with maximum, root mean squared propagation (RMSProp), and the adaptive learning rate method. Experimental results indicate that the MLP model combined with the RMSProp optimizer performed best, achieving a training accuracy of 95.8% and a testing accuracy of 94.9%. Another possible avenue is self-supervised learning, where large unlabeled datasets are used for model training. Research on self-supervised learning remains limited; the study [abdulrazzaq2024consequential] discusses its applications from the perspective of industrial engineering and medicine. The authors identify the key possibilities for prediction in these fields using self-supervised learning. The analysis is carried out in the context of medical staff predicting patient ailments more efficiently, without relying on traditional numerical models that require a lot of computation, time, storage, and data annotation effort. Similarly, functional near-infrared spectroscopy (fNIRS) has been used for BCI tasks. The study [zafar2023metaheuristic] introduces a method for selecting important features for fNIRS-based BCI applications. Temporal statistical features, such as the mean, slope, maximum, skewness, and kurtosis, were calculated from all channels to create a training vector. Seven optimization algorithms were tested for their ability to classify data using a k-nearest neighbor cost function: particle swarm optimization, cuckoo search optimization, the firefly algorithm, the bat algorithm, flower pollination optimization, whale optimization, and grey wolf optimization (GWO). The method was tested on an online dataset of motor imagery (MI) and mental arithmetic (MA) tasks from 29 healthy subjects. The results showed that using the features selected by these optimization algorithms significantly improved classification accuracy compared to using all available features.

The proposed system, “EmoWrite,” integrates a dynamic and personalized graphical user interface with the capability to predict contextually relevant words based on the individual’s mood. Its pivotal innovation is the ability to monitor the emotional states of users and help them articulate their emotions in words. Because the differences among emotional classes are subtle, precise modeling is imperative. These emotional classes also vary from person to person, so person-specific emotional class detection is a significant challenge that must be resolved. The classification of brain signals, pivotal for identifying emotional classes, demands meticulous training. The adaptive arrangement of characters on the keyboard is designed to streamline character selection and enable swift typing. Furthermore, the character set arrangement adapts to the user’s unique typing style and contextual cues, enhancing the efficiency of communication for differently abled patients. For signal acquisition, the Emotiv Epoc+ headset, equipped with 14 EEG sensing channels, is employed. EmoWrite implements established classification techniques from the existing literature, which keeps training efficient. Because facial expressions deteriorate over time in paralytic patients due to decreased or absent usage, the proposed system combines emotion detection with the use of facial expressions. This dual functionality has potential benefits for the rehabilitation process. The contributions of this paper are as follows:

  • Introduces BCI-driven solutions with the potential to support individuals with severe disabilities, focusing on initial validation and future applications for those with paralysis and speech impairments.

  • Translates inner speech into text using a dynamic keyboard featuring context-adaptive character displays.

  • Presents an innovative character arrangement on the keyboard, streamlining character selection and enhancing typing speed for users.

  • Enables sentiment-guided thought-to-text conversion and recommendations, a novel feature not previously documented in existing literature.

The rest of the paper is organized as follows. Section 2 describes the related work. Section 3 discusses the proposed scheme for data acquisition and data processing. Section 4 describes the real-time experimentation and results followed by the conclusion in Section 6 that reveals the potential of EmoWrite to convert silent speech to text.

2 Related Work

Research on BCI-based thought-to-text conversion, aimed at enabling communication for paralytic patients, has been carried out in the past and is still ongoing. One method of converting thoughts into text uses a graphical user interface (GUI) consisting of numbers, letters, or special characters displayed in a certain order on a virtual keyboard.

Table 1: Comparison of Related Work
Feature columns: Virtual Keyboards (Flickering, Simple); Attention Based (SSVEP, VEP, Eye Gaze, Attention Level); Keyboard Layout (Static, Dynamic); Action Selection (Raw Data, Built-in Functions, Emotional State).
Study | Ref. | Accuracy | CPM/WPM
Zhang et al. (2018) | [2] | 95.53% | 6.67 CPM
Gupta et al. (2019) | [8] | 74.95% | N/A
Masud et al. (2017) | [16] | 87.50% | N/A
Chen et al. (2015) | [17] | N/A | 12 WPM
Cecotti (2011) | [5] | N/A | 5-10 CPM (P300), 7.34 CPM (SSVEP), 5 CPM (motor)
Chen et al. (2014) | [18] | 80-90% | 6.5 WPM
Spüler et al. (2012) | [19] | 96% | 9 WPM
Higger et al. (2016) | [20] | 94% | N/A
Akce et al. (2014) | [21] | N/A | 11.93 CPM
Cecotti (2016) | [22] | N/A | 9.3 CPM
Ben-Ami et al. (2019) | [23] | 25% | N/A
Alomari et al. (2014) | [6] | N/A | N/A
Hayet et al. (2019) | [7] | N/A | N/A
Williamson et al. (2009) | [9] | N/A | 7 CPM
Wang et al. (2018) | [4] | 31% | N/A
Jarosiewicz et al. (2015) | [24] | N/A | 12 CPM
Pandarinath et al. (2017) | [10] | N/A | 36 CPM (QWERTY), 39 CPM (OPTI-II), 13.5 CPM (alphabetic)
Arijit et al. (2013) | [25] | N/A | N/A
Topal et al. (2012) | [11] | N/A | N/A
Pathirana et al. (2018) | [12] | N/A | 6.61 CPM
Andi et al. (2018) | [13] | 59.20% | N/A
Birbaumer et al. (2000) | [14] | N/A | N/A
George et al. (2014) | [26] | N/A | N/A
Mackenzie et al. (2010) | [15] | 99% | 5.11 WPM
Morooka et al. (2018) | [27] | 79.90% | N/A
EmoWrite | | 90.36% | 6.58 WPM, 31.92 CPM

The brain signals are then used to control the selection of any desired character from this virtual keyboard. The major selection methods used in previous systems can be categorized as 1) attention-based control such as Visual Evoked Potential (VEP) or Steady-State Visual Evoked Potential (SSVEP), and 2) raw data or built-in functions of the headset to control the cursor or targeted area on the screen. Virtual keyboards are divided into single or multiple layers with static or dynamic keys, and their design directly influences the performance of the system. A wide literature survey was conducted to identify the different action selection methodologies, character arrangements, and virtual keyboard designs.

Major challenges in decoding EEG signals include low signal-to-noise ratio, time consumption, and accuracy. To overcome these challenges, [2] uses a novel hybrid deep learning approach based on Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) to convert thoughts to text. EEG signals are used to control the cursor of a personal computer in [6]. Features from EEG signals are extracted with a 64-channel NeuroSky headset using a discrete wavelet transform and are classified using machine learning algorithms such as Support Vector Machine (SVM) and Neural Networks (NNs). The authors of [7] describe a two-layer hierarchical keyboard layout that works with motor imagery signals (left hand raised, right hand raised, nodding up, and nodding down).

The authors of [8] use EEG signals to enhance a written sentence with detected emotions by inserting words into it. A Long Short-Term Memory (LSTM) network-based language modeling framework verifies sentence correctness by ranking the generated suggestions. ‘Hex-o-spell’, a tilt-based gestural text entry system, is introduced by [9]; its letters are arranged in six hexagonal boxes that are rearranged after every transition to save time. The authors of [4] show how letters of the English alphabet are decoded using EEG phase information, acquiring EEG signals with the 64-channel actiCHamp system from Brain Products. Five letters (a, e, l, o, t) are chosen, and brain signals are recorded for each letter. Results showed that accuracy increased to 31% and the time improved to 200 ms, and that most of the decoded information lies within the period of 100 to 600 ms. A high-performance intracortical BCI for communication, which provides point-and-click control of the computer, is described by [10]. The 2D cursor movement is translated by the ReFIT Kalman Filter (RKF), and the selection is translated by a Hidden Markov Model (HMM).

Efficiency issues of the virtual keyboard are discussed by [11]. The authors suggest that there should be at least one level of hierarchy for better usability, while higher efficiency is achieved with a matrix-shaped keyboard. A novel virtual keyboard design is introduced by [12], using built-in functions of Emotiv Insight to navigate through the interface and move the selected area. The characters are arranged circularly to utilize screen space efficiently. The captions of the keys change dynamically according to the previously entered characters, using a predictive system.

Figure 1: Basic Flow of Proposed System

Disabled people are limited in their activities and need a medium that translates their brain signals so that they can interact with people around them. The authors of [13] use raw EEG data from the Emotiv Epoc to translate thoughts to text, to be implemented in SMS to ease communication. A Thought Translation Device (TTD) that uses Slow Cortical Potential (SCP) to select characters or words is introduced by [14]. SCP is used because its learning rules are well known and its basics are well understood. Cognitive performance drops when the user produces positive SCP, while performance and learning improve with negative SCP. An 8-channel EEG amplifier records brain activity, and visual feedback from the EEG is updated every 63 ms. The work in [15] presents a scanning ambiguous keyboard that takes input from the user through one key or switch. The layout contains a letter section at the top and a word section (candidate list) at the bottom. The focus is transferred between the letter section and the word section with the space key. The letters are highlighted in sequence, and the user selects one by triggering the input.

Table 1 shows the major parameters used to assess the performance of thought-to-text systems. Most existing schemes have implemented simple keyboards rather than visually evoked or flashing characters to achieve high accuracy. Moreover, keyboard traversal can be controlled easily with simple built-in functions or raw data, instead of attention-based systems with flashing or flickering characters. This is because attention-based systems require users to dwell on the desired character for a certain amount of time, which also adds training overhead. Furthermore, these systems have not incorporated the emotional state of the patients, which can be integrated to provide an efficient and personalized thought-to-text conversion system.

3 Methods

A communication medium plays an important role in human-to-human (H2H) and human-to-machine (H2M) interaction. BCI aids paralytic patients who cannot communicate by providing solutions for H2H and H2M interaction. Existing BCI-based work in this domain, especially on thought-to-text conversion, is limited in efficiency, accuracy, and number of words per minute. To date, the maximum achieved with a non-invasive technique is 12 WPM [17].

Considering all the discussed challenges, EmoWrite integrates a dynamic keyboard with a circular arrangement of keys. The traversal in the proposed keyboard is controlled by mapping brain commands with facial expressions and using the built-in functions of the headset. It also predicts the next helping verb which is displayed on the right side of the screen. Moreover, the next word prediction is emotion-based as well as personalized. Emotion-based predictions of words assist paralytic patients by efficiently converting their thoughts to text. Furthermore, the machine learning algorithm keeps on retraining itself after a specific interval to predict only the latest and up-to-date words. Additionally, integrating the emotional states of patients with machine learning techniques enhances the performance and productivity of the system.

Implementation of EmoWrite aims at reducing the typing delay, increasing accuracy, typing speed, and convenience of the interface. The signals from the brain are acquired through EEG and decoded to convert thoughts to text after extracting information by processing the signals. The extracted information from the brain signals is then classified and mapped to the mental commands (e.g., thinking of left or right direction) or facial expressions (e.g., eye blink, frown, etc.) to perform specific tasks. Emotion state detection has been integrated with machine learning for better productivity of the system. The personalized dynamic arrangement of characters on the screen uses a language model (character sequence pair) and a machine-learning algorithm to show only the desired characters on the screen. Finally, the user gets visual feedback through the typed text shown on the screen and the machine learning algorithm also gets feedback from the user to help update its weights for future predictions. EmoWrite is comprised of the following four modules: 1) Data Acquisition 2) Data Processing 3) Basic Cognition, and 4) Communication Interface. The basic flow of the proposed system is shown in Figure 1.

3.1 Data Acquisition

The primary step in any BCI application is to gather data in the form of brain signals. The process of signal acquisition is performed with different techniques as discussed earlier. In this study, the non-invasive technique is employed, which is riskless and easy to handle. It includes collecting brain signals from the surface of the scalp with the help of an EEG headset. Different versions of dry and wet electrodes-based headsets including Emotiv Insight, Emotiv Epoc+, NeuroSky, MindWave, etc. are available. This study deploys a wet electrodes-based 14-channel Emotiv Epoc+ headset.

The data for this experiment was collected from 72 volunteers (55.6% male and 44.4% female), with a mean age of 29 years and a standard deviation of 6.5 years, at the Advanced Communication Networks Lab at COMSATS University Islamabad, Lahore Campus. None of the participants had any previous BCI experience. Before the data collection process, informed consent was obtained from all participants and the study protocol was approved and supervised by the Ethics Committee of COMSATS University Islamabad, Lahore Campus. All experiments were performed under relevant guidelines and regulations. The duration of the study was two years from 2019 to 2021.

3.2 Data Processing

The acquired brain signals comprise data on multiple mental activities, but EmoWrite focuses only on data regarding emotional states, mental commands, and facial expressions. To extract meaningful information, the data acquired through the EEG headset is processed using the built-in pre-processing and classification techniques of Emotiv applications such as Emotiv BCI (https://www.emotiv.com/emotiv-bci/), Emotiv PRO (https://www.emotiv.com/emotivpro/), and EmotivBrainViz (https://www.emotiv.com/emotiv-brainviz/).

Figure 2: Keyboard Interface. (a) First Set of Characters. (b) Character set after selection of ‘t’.

3.2.1 Emotion Detection

The proposed EmoWrite system detects emotions using the performance metrics of the Cortex API (https://emotiv.gitbook.io/emotivpro/data_streams/performance-metrics), which furnish crucial insights into a user’s cognitive state through six key metrics: stress, engagement, interest, excitement, focus, and relaxation. These metrics collectively contribute to the identification of four distinct emotional classes, namely happiness, sadness, anger, and calm. Once the emotional class is detected, a correlation finder module identifies a collection of emotion-related words from multiple datasets that have been annotated with emotions [8], [28]. To ensure data cleanliness, the dataset undergoes preprocessing, which involves the removal of extraneous spaces and symbols. Next, a separate Recurrent Neural Network (RNN) model is trained for each emotional class to produce emotion-based predictions. The selection of contextualized words depends on the inferred emotional state, and the chosen words are then integrated into the list of predictive words. This methodology yields an emotion-sensitive word prediction system, enhancing the accuracy and contextuality of the predicted words.

3.2.2 User-centric Machine Learning Algorithm

A machine learning algorithm is used along with emotions to predict the next word. EmoWrite employs a Recurrent Neural Network (RNN) for predicting contextualized words. RNNs have been reported to be highly efficient for this kind of task [29]; they allow consistent refinement of the system while requiring less feature engineering, which is a time-consuming task. They also adapt effectively to new data and have parallel processing abilities. An RNN has an advantage over other neural networks in that its inputs are treated as dependent on each other: it keeps track of relations with previous words, which helps in anticipating the preferred output. Being an online algorithm, it updates itself after a specific period when new data is entered by the user.
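The dependence on previous words described above can be written down for a generic recurrent layer. The text does not specify the architecture EmoWrite uses, so the following is only the standard (Elman-style) formulation, given as a point of reference; the symbols $x_t$ (encoding of the word at step $t$), $h_t$ (hidden state), $\hat{y}_t$ (predicted distribution over the next word), and the weight matrices $W_{xh}$, $W_{hh}$, $W_{hy}$ with biases $b_h$, $b_y$ are assumptions introduced here for illustration.

$$h_t = \tanh\left(W_{xh}\, x_t + W_{hh}\, h_{t-1} + b_h\right), \qquad \hat{y}_t = \mathrm{softmax}\left(W_{hy}\, h_t + b_y\right)$$

Because $h_{t-1}$ enters the computation of $h_t$, the prediction at step $t$ depends on all words entered before it, which is the property the paragraph above relies on.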

3.2.3 AutoComplete

The autocomplete feature of the EmoWrite system predicts the word on a character-to-character basis. For this purpose, four different datasets are created, each containing words related to an emotional class. The words are predicted from the respective dataset as per the user’s emotional state.
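A minimal sketch of how emotion-dependent autocompletion could look is given below. It is an illustration under stated assumptions, not the EmoWrite implementation: the word lists in `EMOTION_WORDS` are hypothetical stand-ins for the four emotion datasets, and the function name `autocomplete` and its `limit` parameter are invented for this example.

```python
from typing import Dict, List

# Hypothetical per-emotion vocabularies (assumption): the four datasets
# described in the text are not reproduced here.
EMOTION_WORDS: Dict[str, List[str]] = {
    "happiness": ["happy", "hope", "glad", "great", "grateful"],
    "sadness": ["sad", "sorry", "hurt", "hopeless", "gloomy"],
    "anger": ["angry", "annoyed", "hate", "furious", "fed"],
    "calm": ["calm", "content", "gentle", "fine", "good"],
}


def autocomplete(prefix: str, emotion: str, limit: int = 3) -> List[str]:
    """Return up to `limit` words for `emotion` that start with `prefix`.

    Words are returned in the order they appear in the emotion's list;
    an unknown emotion yields an empty list.
    """
    words = EMOTION_WORDS.get(emotion, [])
    prefix = prefix.lower()
    return [word for word in words if word.startswith(prefix)][:limit]
```

For instance, with these illustrative lists, typing "h" while the detected class is happiness would suggest "happy" and "hope", whereas the same prefix under sadness would suggest "hurt" and "hopeless".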

3.2.4 Communication Interface

The communication interface comprises a GUI through which a user interacts with the system by selecting words or characters from the GUI through mapped brain signals. The dynamic arrangement of the virtual keyboard uses a character sequence pair to display the next set of characters depending on the last text entered. The RNN model is also used here to display characters according to the user’s typing style.

3.2.5 Feedback

Usually, feedback is given through a visual or auditory stimulus. EmoWrite uses visual feedback: the user sees the previously written text displayed at the top of the screen. This process provides continuous learning, by storing the latest written text in the database and then feeding it to the machine learning algorithm to update the model. The latest available model is always used for prediction, which keeps predictions coherent and helps the user operate the system effectively and adopt it without difficulty.

3.2.6 Interface Arrangement

To address the GUI problems of existing systems, a circular dynamic keyboard is designed to reduce traversing time, use screen space efficiently, and reduce the distance between characters [11]. Only a limited number of characters is shown on the virtual keyboard at a time, and the next set of characters to appear depends on the previously accessed character. The next set is chosen using the character sequence pair model and the machine learning algorithm, where the character sequence pair model estimates the probability of occurrence of character pairs, i.e., how likely each character is to follow the previously typed character. This probability is calculated using approximately 3.2 million characters from seven English-language novels (http://homepages.math.uic.edu/~leon/mcs425s08/handouts/char_freq2.pdf). Initially, the most frequently used characters appear on the screen. For example, the first set of characters that appears on the keys is shown in Figure 2(a). After the character ‘t’ is selected, the next set of characters, which depends on the typed ‘t’, appears on the screen, as shown in Figure 2(b).
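The character sequence pair idea can be sketched as a simple bigram count over a text corpus. The snippet below is a hedged illustration only: the helper names `build_pair_counts` and `next_characters`, the tie-breaking rule, and the default of six keys per refresh are assumptions made for this example, and any corpus passed in is a stand-in for the novel-based statistics cited above.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def build_pair_counts(corpus: str) -> Dict[str, Counter]:
    """Count how often each letter is followed by each other letter."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    text = corpus.lower()
    for prev, nxt in zip(text, text[1:]):
        # Only letter-to-letter transitions are counted; spaces and
        # punctuation break the pair.
        if prev.isalpha() and nxt.isalpha():
            counts[prev][nxt] += 1
    return counts


def next_characters(counts: Dict[str, Counter], last_char: str, k: int = 6) -> List[str]:
    """Return the k letters most likely to follow `last_char`.

    Ties are broken alphabetically so the layout is deterministic
    (an assumption; the text does not specify tie handling).
    """
    followers = counts.get(last_char.lower(), Counter())
    ranked = sorted(followers.items(), key=lambda item: (-item[1], item[0]))
    return [char for char, _ in ranked[:k]]
```

As a usage example, `next_characters(build_pair_counts("the then that this to"), "t", k=2)` ranks "h" ahead of "o", mirroring how the key labels would change after ‘t’ is selected.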

3.2.7 Emotiv Commands for Interface Navigation

Control commands from the brain are mapped to traversal actions in the interface. The commands used to control the interface are shown in Table 2. Some mental states, emotional classes, and facial expressions are mapped to basic functionalities for navigating the interface. Mental states control the navigation direction, facial expressions perform selection, and motor imagery movements transfer focus from one section to another.

Table 2: List of Emotiv Commands for Interface Navigation
Category | Command | Action
Mental State | Left | Left movement
Mental State | Right | Right movement
Mental State | Pull | Up movement
Mental State | Push | Down movement
Facial Expression | Smile | Selection
Motor Imagery | Look Right | Focus shifts toward the Helping Verb section
Motor Imagery | Look Left | Focus shifts toward the Prediction section
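The mapping in Table 2 can be expressed as a small lookup table. The sketch below is illustrative: the category and command pairs follow Table 2, but the action labels (such as `move_up`), the normalization of names, and the function `action_for` are assumptions introduced here and are not part of any Emotiv API.

```python
from typing import Dict, Optional, Tuple

# (category, command) pairs follow Table 2; the action labels on the
# right-hand side are hypothetical names chosen for this example.
COMMAND_ACTIONS: Dict[Tuple[str, str], str] = {
    ("mental_state", "left"): "move_left",
    ("mental_state", "right"): "move_right",
    ("mental_state", "pull"): "move_up",
    ("mental_state", "push"): "move_down",
    ("facial_expression", "smile"): "select",
    ("motor_imagery", "look_right"): "focus_helping_verbs",
    ("motor_imagery", "look_left"): "focus_predictions",
}


def _normalize(name: str) -> str:
    """Lower-case a label and replace spaces with underscores."""
    return name.strip().lower().replace(" ", "_")


def action_for(category: str, command: str) -> Optional[str]:
    """Look up the interface action for a detected command, or None if unmapped."""
    return COMMAND_ACTIONS.get((_normalize(category), _normalize(command)))
```

Keeping the mapping in data rather than in branching code makes it straightforward to remap a command for a user who cannot reliably produce a given expression.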

3.2.8 Conversion of Thought to Text

To convert thought to text, the mental commands are detected through the brain signals $I_s$ and compared with the given trained signals $T_s = \{left,\ right,\ up,\ down\}$. $I_{s_i}(t)$ is the input signal of the $i^{th}$ channel at time $t$ which will be compared with the trained signals.

$$M(t) = \sum_{i=1}^{14} \frac{I_{s_i}(t)}{T_s} \tag{1}$$

So, the mental command $M(t)$ at time $t$ can be measured by dividing the extracted signal at time $t$ by the trained data, as illustrated in Equation 1, where $i$ indexes the 14 channels. The detected command will be:

$$D_c = M_{\propto}(t), \qquad \propto > 0.80 \tag{2}$$

Here, $\propto$ is the threshold level or the confidence level. It should be greater than 0.80 for accurate detection of brain signals. The desired command changes the focus of the keyboard. Initially, the focus will be on the center of the keyboard i.e., the space key but after the detection of a mental command from Equation 2, the focus will change accordingly. $B(D_c)$ in Equation 3 is the button in the direction $D_c$.

$$B(D_c) = \begin{cases} 1 & D_c \in T_s \\ 0 & D_c \notin T_s \end{cases} \tag{3}$$

The “1” value means the focused key has a yellow color. In this way, the user can change focus on any key of the keyboard. After changing the focus, the focused character can be selected using Equation 4, where f𝑓fitalic_f is the user’s facial expression, chosen from the set FE={Blink,Wink,Surprise,Frown,Smile,Clench,F_{E}\ =\{Blink,\ Wink,\ Surprise,\ Frown,\ Smile,\ Clench,italic_F start_POSTSUBSCRIPT italic_E end_POSTSUBSCRIPT = { italic_B italic_l italic_i italic_n italic_k , italic_W italic_i italic_n italic_k , italic_S italic_u italic_r italic_p italic_r italic_i italic_s italic_e , italic_F italic_r italic_o italic_w italic_n , italic_S italic_m italic_i italic_l italic_e , italic_C italic_l italic_e italic_n italic_c italic_h , Laugh,Smirk}\ Laugh,\ Smirk\}italic_L italic_a italic_u italic_g italic_h , italic_S italic_m italic_i italic_r italic_k } and Fosubscript𝐹𝑜F_{o}italic_F start_POSTSUBSCRIPT italic_o end_POSTSUBSCRIPT is the frequency of occurrence defined as set Fo={once,twice,thrice}subscript𝐹𝑜𝑜𝑛𝑐𝑒𝑡𝑤𝑖𝑐𝑒𝑡𝑟𝑖𝑐𝑒F_{o}=\{once,\ twice,\ thrice\}italic_F start_POSTSUBSCRIPT italic_o end_POSTSUBSCRIPT = { italic_o italic_n italic_c italic_e , italic_t italic_w italic_i italic_c italic_e , italic_t italic_h italic_r italic_i italic_c italic_e }.

$$S_L(D_c) = \begin{cases} 1 & f = \mathrm{Blink},\ f \in F_E \land F_o = twice \\ 0 & f \neq \mathrm{Blink} \end{cases} \tag{4}$$

Equation 4 implies that the selection will occur only if the facial expression $f \in F_E$ is a blink and it is done twice. The “1” value here means the selection of a character while “0” means no selection. The selected character/label $S_L$ will be written to the text area $T$.
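The focus and selection rules of Equations 3 and 4 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, $T_s$ is assumed here to be a set of valid directions, and the frequency of occurrence is assumed to arrive as an integer count.

```python
# Hypothetical sketch of Equations 3 and 4; all names are illustrative only.
FACIAL_EXPRESSIONS = {"Blink", "Wink", "Surprise", "Frown",
                      "Smile", "Clench", "Laugh", "Smirk"}  # the set F_E


def is_focused(direction, target_set):
    """Equation 3: return 1 if the detected direction D_c is in T_s, else 0."""
    return 1 if direction in target_set else 0


def is_selected(expression, occurrences):
    """Equation 4: return 1 only for a blink performed twice, else 0."""
    if expression not in FACIAL_EXPRESSIONS:
        return 0
    return 1 if (expression == "Blink" and occurrences == 2) else 0


def type_if_selected(text, focused_label, expression, occurrences):
    """Append the focused label S_L to the text area T when it is selected."""
    if is_selected(expression, occurrences):
        return text + focused_label
    return text
```

For example, `type_if_selected("h", "e", "Blink", 2)` returns `"he"`, while a single blink or any other expression leaves the text unchanged.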

$U = \{e, t, a, o, i, n, s, r, h, l, d, c, u, m, f, p, g, w, y, b, v, k, x, j, q, z\}$ represents the set of letters ordered by their frequency of occurrence, so the circular keyboard initially contains the first 6 characters from the set $U$. The labels on the keys can be represented by a matrix ${Dis}_k$ having two rows and four columns, where the rows represent the number of circles and the columns represent the number of keys in each circle.

$${Dis}_k = \begin{vmatrix} e & t & a & o \\ i & n & \leftarrow & more \end{vmatrix} \tag{5}$$

Initially, the characters with the highest frequency are displayed on the screen, as shown in Equation 5. After a character is selected using Equation 4, the next set of characters that appears on the screen depends on the likelihood of each letter occurring after $S_L$. This likelihood is determined using Equation 6, which calculates the probabilities of all letters following the typed character ($x$).

$$PNextChar(x) = \prod_{i=1}^{26} \frac{P(x_i)}{P(x)} \qquad x \equiv S_L, \quad x_i \in U \tag{6}$$

Suppose the selected label $S_L$ is ‘$e$’; then the next character probabilities will be $PNextChar(e) = \{a(0.01), b(0.0023), \ldots, z(0.00)\}$ and the display matrix ${Dis}_e$ will be:

$${Dis}_e = \begin{vmatrix} r & d & s & n \\ a & t & \leftarrow & more \end{vmatrix}$$

Here, the most probable set of characters following ‘$e$’ is displayed first. If the desired character is not among the displayed characters, the user can select the key labeled ‘$more$’, and the next most probable set is then displayed.
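The paging behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `display_page` is hypothetical, the page size of six characters is inferred from the six character keys in Equation 5, and the probabilities are toy values, not figures from the study.

```python
# Hypothetical sketch: page through next-character candidates, six per screen.
KEYS_PER_PAGE = 6  # Equation 5 shows six characters plus the "<-" and "more" keys


def display_page(next_char_probs, page=0):
    """Return the labels for one keyboard page, most probable characters first.

    next_char_probs maps each character to its probability of following the
    selected label S_L; page 0 is shown first and each press of "more"
    advances to the next page.
    """
    ranked = sorted(next_char_probs, key=lambda c: (-next_char_probs[c], c))
    start = page * KEYS_PER_PAGE
    return ranked[start:start + KEYS_PER_PAGE] + ["<-", "more"]


# Toy probabilities (illustrative values, not taken from the paper).
toy_probs = {"r": 0.20, "d": 0.15, "s": 0.14, "n": 0.12, "a": 0.10,
             "t": 0.09, "l": 0.08, "c": 0.06, "m": 0.04, "x": 0.02}
first_page = display_page(toy_probs)           # six most probable characters
second_page = display_page(toy_probs, page=1)  # shown after pressing "more"
```

Sorting once and slicing by page keeps the backspace and ‘more’ keys in a fixed position on every page, which is one plausible way to keep the layout predictable for the user.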

Given the singular and plural helping-verb sets $Sin = \{is, am, was, has, the\}$ and $Plu = \{are, were, have, a, the\}$, the prediction of the helping verb $H_v$ depends on the written text $T$. Suppose $S$ and $P$ are the sets of singular and plural words, respectively; then the helping verb is predicted using Equation 7.

$$H_v(T) = \begin{cases} x & T \in S,\ x \in Sin \\ y & T \in P,\ y \in Plu \end{cases} \tag{7}$$
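A small sketch of Equation 7 is given below. The sets `SIN` and `PLU` copy the sets listed in the text, whereas the word lists standing in for $S$ and $P$ and the function name are assumptions made purely for illustration.

```python
# Hypothetical sketch of Equation 7; the word lists S and P are toy examples.
SIN = ["is", "am", "was", "has", "the"]       # Sin, as listed in the text
PLU = ["are", "were", "have", "a", "the"]     # Plu, as listed in the text

SINGULAR_WORDS = {"he", "she", "it", "patient"}   # S (illustrative)
PLURAL_WORDS = {"they", "we", "patients"}         # P (illustrative)


def helping_verb_candidates(last_word):
    """Return the candidate set from which the helping verb H_v is drawn."""
    word = last_word.lower()
    if word in SINGULAR_WORDS:
        return SIN
    if word in PLURAL_WORDS:
        return PLU
    return []  # Equation 7 does not define this case; no suggestion is made
```

Here `helping_verb_candidates("She")` returns the singular set and `helping_verb_candidates("they")` the plural set.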
Figure 3: Total time required to type the 10 words using QWERTY and EmoWrite keyboards

The word prediction depends on the context and the emotional state of the user. The emotional state of the user depends on the valence $E_V$ and arousal $E_A$. Valence is the negativity or positivity of an emotion, which can be measured by comparing hemispherical activation, and arousal is the activation level of the brain.

$$\begin{matrix} E_A(t_f - t_i) \rightarrow high, low \\ E_V(t_f - t_i) \rightarrow positive, negative \\ D_E = E_A + E_V \end{matrix} \tag{8}$$

$t_i$ and $t_f$ are the initial and final times, respectively, and the detected emotion $D_E$ can be one of four categories, $D_E = \{happiness, sadness, anger, calm\}$. A Recurrent Neural Network (RNN) is used to predict the next word, since it is good at learning sequential and temporal data and also learns word-level features. The word prediction is based on the previously written words of an $n$-word sentence. Equation 9 gives the probability of observing a sentence.

$$P(m_1, \ldots, m_n) = \prod_{j=1}^{n} P(m_j \mid m_1, \ldots, m_{j-1}) \tag{9}$$

First, the sequence of sentences is converted into a sequence of words $w$, where $w_i$ represents a single word. Each word is represented as a vector with as many elements as the vocabulary size $V_s$, and the sequence of words becomes a matrix that is given as input to the RNN. Three parameters $X$, $Y$, and $Z$ are used, which represent the input to a layer, the output of a layer, and the output towards the next state, respectively. The equations of the RNN are:

$$s_t = \tanh(X w_t + Z s_{t-1}) \tag{10}$$
$$O_t = \mathrm{softmax}(Y s_t) \tag{11}$$

Here, $s_t$ is the state at time $t$ and $O_t$ is the output at time $t$. Using a hidden layer of size $H = 100$, we have $w_t \in \mathbb{R}^{8000}$, $O_t \in \mathbb{R}^{8000}$, $s_t \in \mathbb{R}^{100}$, $X \in \mathbb{R}^{100 \times 8000}$, $Y \in \mathbb{R}^{8000 \times 100}$, and $Z \in \mathbb{R}^{100 \times 100}$.
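The forward step in Equations 10 and 11 can be sketched with NumPy using the dimensions stated above. This sketch assumes that the terms inside the activations are matrix-vector products (which the stated dimensions suggest) and that each word is one-hot encoded. The weights are random and untrained, and `rnn_step` is a hypothetical helper name.

```python
import numpy as np

VOCAB_SIZE = 8000   # dimension of w_t and O_t in the text
HIDDEN_SIZE = 100   # H in the text

rng = np.random.default_rng(0)
# Randomly initialised weights, for illustration only (untrained).
X = rng.normal(scale=0.01, size=(HIDDEN_SIZE, VOCAB_SIZE))   # input -> state
Z = rng.normal(scale=0.01, size=(HIDDEN_SIZE, HIDDEN_SIZE))  # state -> state
Y = rng.normal(scale=0.01, size=(VOCAB_SIZE, HIDDEN_SIZE))   # state -> output


def softmax(v):
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(v - np.max(v))
    return e / e.sum()


def rnn_step(word_index, prev_state):
    """One forward step: Equations 10 and 11 for a one-hot input word."""
    w_t = np.zeros(VOCAB_SIZE)
    w_t[word_index] = 1.0                      # one-hot encoding of the word
    s_t = np.tanh(X @ w_t + Z @ prev_state)    # Equation 10
    o_t = softmax(Y @ s_t)                     # Equation 11
    return s_t, o_t


state = np.zeros(HIDDEN_SIZE)
for idx in [12, 345, 6789]:                    # arbitrary word indices
    state, probs = rnn_step(idx, state)
predicted_index = int(np.argmax(probs))        # highest-probability next word
```

Because the weights are untrained, the predicted index is meaningless here; the point of the sketch is only that the shapes compose as described and that the output is a probability distribution over the vocabulary.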

First, we apply forward propagation, which predicts the word probabilities and returns a state as output. We then select the word with the highest probability. After predicting the word, we calculate the loss to check how far the prediction is from the correct word. The loss should be minimal and can be calculated using Equation 12, which gives the loss of the prediction $O$ with respect to the true label $t$ over the words in the text (training example) $W$. The greater the difference between the output and the true label, the greater the loss.

$$L(t, O) = -\frac{1}{W} \sum_{n \in W} t_n \log O_n \tag{12}$$
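The loss in Equation 12 can be illustrated with a short sketch. It assumes one-hot true labels, so that each term $t_n \log O_n$ reduces to the log-probability assigned to the correct word. The function name and the toy distributions are hypothetical.

```python
import math


def cross_entropy_loss(true_indices, predicted_probs):
    """Average negative log-probability assigned to the true words (Equation 12).

    true_indices[n] is the index of the correct word at position n and
    predicted_probs[n] is the predicted distribution O_n at that position.
    With one-hot labels t_n, the product t_n * log(O_n) reduces to the log of
    the probability given to the correct word.
    """
    total = 0.0
    for true_index, probs in zip(true_indices, predicted_probs):
        total += math.log(probs[true_index])
    return -total / len(true_indices)


# Toy example with a three-word vocabulary (illustrative values).
confident = cross_entropy_loss([0, 2], [[0.9, 0.05, 0.05], [0.1, 0.1, 0.8]])
uncertain = cross_entropy_loss([0, 2], [[0.4, 0.3, 0.3], [0.3, 0.4, 0.3]])
```

In this toy example `confident` is smaller than `uncertain`, matching the statement that the loss grows as the output moves away from the true label.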

Let $RNN_W$ be the words predicted by the RNN and $H$, $S$, $A$, and $C$ be the sets that include words from the $happiness$, $sadness$, $anger$, and $calm$ classes, respectively. Then the emotion-based word prediction using Equations 8-11 is:

$$EmoPred(D_E, O_t) = \begin{cases} p & D_E = happy,\ p \in (H \cap \mathrm{RNN}_W) \\ q & D_E = sad,\ q \in (S \cap \mathrm{RNN}_W) \\ r & D_E = angry,\ r \in (A \cap \mathrm{RNN}_W) \\ s & D_E = calm,\ s \in (C \cap \mathrm{RNN}_W) \end{cases} \tag{13}$$
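Equations 8 and 13 together can be sketched as a filter over the RNN candidates. The mapping from arousal and valence labels to the four emotions below follows the common quadrant convention and is an assumption, since the text does not spell it out. The lexicons are toy stand-ins for the sets $H$, $S$, $A$, and $C$, and all function names are hypothetical.

```python
# Hypothetical sketch of Equations 8 and 13; lexicons and mapping are assumptions.
EMOTION_BY_STATE = {
    ("high", "positive"): "happy",
    ("high", "negative"): "angry",
    ("low", "negative"): "sad",
    ("low", "positive"): "calm",
}

EMOTION_LEXICON = {   # toy stand-ins for the sets H, S, A, and C
    "happy": {"awesome", "love", "great"},
    "sad": {"tired", "terrible", "alone"},
    "angry": {"ughhh", "horribly", "hate"},
    "calm": {"fine", "quiet", "okay"},
}


def detect_emotion(arousal, valence):
    """Combine arousal and valence labels into one emotion D_E (Equation 8)."""
    return EMOTION_BY_STATE[(arousal, valence)]


def emo_pred(emotion, rnn_words):
    """Keep the RNN candidates that belong to the detected emotion's word set.

    rnn_words is assumed to be ordered by decreasing probability, and that
    order is preserved in the result (Equation 13).
    """
    lexicon = EMOTION_LEXICON[emotion]
    return [word for word in rnn_words if word in lexicon]


candidates = ["awesome", "terrible", "fine", "great", "hate"]
suggestions = emo_pred(detect_emotion("high", "positive"), candidates)
```

With these toy inputs, `suggestions` is `["awesome", "great"]`: the intersection of the RNN candidates with the word set for the detected emotion, in the RNN's original ranking.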

4 Results

To assess the efficiency and productivity of EmoWrite, the system is evaluated on the following parameters: accuracy, ease of use, and words per minute. To check these parameters, the following experiments are performed.

Figure 4: Average time required to type a word over 10 trials using EmoWrite keyboard

4.1 Keyboard Efficiency

To evaluate the efficiency of the dynamic keyboard, we use two keyboards in the experiment: (1) a QWERTY keyboard with scan-through keys, and (2) a dynamic keyboard (EmoWrite). There are two rounds in this experiment. In the first round, the participants are asked to write 10 words (number, could, who, down, then, which, these, water, long, and about) using each keyboard, one by one. The starting time, when the user starts thinking of the command, and the ending time, when the participant finishes writing the word, are noted, and the total time taken is calculated for each word. In the second round, the participants are asked to write a whole sentence, in this case “the Brain-computer interface helps in processing brain signals”, using each keyboard, and the total time is again calculated. Figure 3 shows the total time required to type these words using both keyboards. EmoWrite takes less time than the QWERTY keyboard with scan-through keys for every word.

We also determined the time taken to type a complete sentence using the QWERTY and EmoWrite keyboards. On average, it took the participants 7 minutes and 5 seconds to type the sentence using the QWERTY keyboard, while it took only 2 minutes and 20 seconds to type the same sentence using EmoWrite.

Moreover, another experiment is performed to check the efficiency of the system that comprises 10 trials. In each trial, the participants are asked to write 10 words of 3 to 8-character length with the help of brain signals. The time to type each word and time per character is recorded. Time per character is calculated by dividing the time to type each word by the total number of characters. Then, the average time to write a word and a character is calculated. Figure 4 shows the average time to type a word over 10 trials. It is observed that the average time to type a word gradually decreases with each trial as the participants become more familiar and trained with the system in generating specific brain signals efficiently.

4.2 Words per Minute

To compare the performance of EmoWrite with existing approaches, we calculate the total number of words typed per minute (WPM) and characters typed per minute (CPM). For this experiment, the participants are asked to write the following sentences with the help of brain signals: “if you watch the hills in London you will realize what torture it is”, “it is so annoying when she starts typing on her computer in the middle of the night”, “sucks not being able to take days off from work”, “her hotel is restricting how the accounts are done adds a bit more pressure”, and “I was thinking about how excited I am for you guys to move”. These sentences comprise 68, 83, 47, 77, and 57 characters, respectively. The participants are timed for one minute. After one minute, they are asked to stop writing, and the total numbers of words and characters typed per minute are recorded. Table 3 compares the CPM and WPM of EmoWrite with previously proposed systems. The systems of Pandarinath et al. 10 and Hayet et al. 7 implement next-word prediction, the system of Pathirana et al. 12 implements next-character prediction, and the remaining systems have no prediction features. EmoWrite outperforms the existing systems with the highest CPM and WPM.

Table 3: Comparison of CPM and WPM of EmoWrite with existing studies
| Reference | CPM | WPM |
|---|---|---|
| Zhang et al. 2 | 6.7 | 1 |
| Alomari et al. 6 | 7.0 | 1 |
| Pandarinath et al. 10 | 6.6 | 1 |
| Hayet et al. 7 | 12.0 | 2 |
| Pathirana et al. 12 | 25.0 | 5.1 |
| EmoWrite | 31.9 | 6.6 |

4.3 Information Transfer Rate

We calculate the information transfer rate (ITR) for each word and each command, i.e., the number of bits conveyed by the actions performed divided by the total time taken. ITR for commands and letters is calculated using Equations 14-15, where $N_c$ is the total number of possible commands, $C_N$ is the number of commands required to write an $N$-letter word, $N_l$ is the total number of characters in the keyboard, $L_N$ is the total number of letters in a word, and $t$ is the total time.

$${ITR}_c = \log_2(N_c) \cdot C_N / t \tag{14}$$
$${ITR}_l = \log_2(N_l) \cdot L_N / t \tag{15}$$
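Equations 14 and 15 share the same form, so a single helper can illustrate both. In this sketch the elapsed time is assumed to be measured in seconds and converted to minutes so that the result is in bits/min. The function name and all numeric inputs are hypothetical and are not measurements from the study.

```python
import math


def itr_bits_per_minute(num_options, num_actions, seconds):
    """Equations 14 and 15: log2(N) * count / t, with t converted to minutes."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return math.log2(num_options) * num_actions / (seconds / 60.0)


# Illustrative values only (not measurements from the study).
itr_commands = itr_bits_per_minute(num_options=4, num_actions=10, seconds=20.0)
itr_letters = itr_bits_per_minute(num_options=32, num_actions=4, seconds=20.0)
```

For instance, 10 commands chosen among 4 options in 20 seconds carry 2 bits each, giving about 60 bits/min; 4 letters chosen among 32 keys in the same time carry 5 bits each and give the same rate.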

Table 4 shows the ITR for commands and letters. The average ITR is 87.55 bits/min for commands and 72.52 bits/min for letters. These rates are higher than those of Cecotti 22, where the ITR is 62.71 bits/min for commands and 50.14 bits/min for letters.

Table 4: Comparison of ITR in bits/minutes of EmoWrite with existing study
| Ref. | Commands | Letters |
|---|---|---|
| Cecotti 22 | 62.71 | 50.14 |
| EmoWrite | 87.55 | 72.52 |

4.4 Accuracy and Latency

To evaluate the accuracy of the EmoWrite system, we determine the difference between the intended target and the observed target. The intended target is the character the participants aim to select, while the observed target is the character actually selected based on brain signals, which may differ because of errors. In the experiment, participants are instructed to speak aloud what they intend to write, and the system records the selected character. Each written character is then marked as either correct or incorrect. The accuracy is computed as the ratio of correctly written characters to the total number of characters attempted. The resulting accuracy of the proposed EmoWrite system is 90.36%.
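The accuracy measure described above can be sketched as follows. The function name and the toy trial are hypothetical, and the sketch assumes that exactly one observed character is recorded for each intended character.

```python
def character_accuracy(intended, observed):
    """Percentage of attempted characters whose observed selection matches."""
    if len(intended) != len(observed):
        raise ValueError("one observed selection is needed per intended character")
    if not intended:
        raise ValueError("no characters were attempted")
    correct = sum(1 for i, o in zip(intended, observed) if i == o)
    return 100.0 * correct / len(intended)


# Toy trial: the participant intends "water" and the system records "wates".
accuracy = character_accuracy("water", "wates")
```

In this toy trial four of the five characters match, so `accuracy` is 80.0.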

Furthermore, to check the latency of EmoWrite, we measure the mean and standard deviation of delay (which is the time taken by the user when he/she starts thinking of a command until he/she performs it). The user is shown a certain command (left, right, up, or down), and the time is noted until the user successfully achieves that target. The mean delay of EmoWrite is 2.685 seconds, which is less than the delay of the system proposed by 30 which has a mean delay of 3 seconds.

4.5 Participant Stress Monitoring during System Interaction

To assess participant responses to the system, we monitored stress levels through facial expressions. If a participant exhibited signs of stress, such as frowning, it indicated potential difficulty with the system. When stress reached a predefined threshold, a notification was displayed. Analysis revealed that fewer than 20% of measurements indicated signs of stress, suggesting the system is generally well-received and does not induce significant stress. This approach provides valuable insights into participant reactions. However, we acknowledge that additional methods could further enhance our understanding of user experience.

4.6 Effectiveness of Integrating Emotion-based Prediction

The novelty of this system is its emotion-based prediction. To check its effectiveness, we conduct an experiment consisting of two rounds. In the first round, the participants are asked to write some simple sentences without observing their emotional state. The sentences used in the first round are “I am tired of my job”, “I don’t like this world”, “That was awesome”, “I love this world”, and “I have an infection”. In the second round, the participants are asked to write these sentences again, but with the integration of their emotional states. For this, the participants are first shown a video depicting an emotional state. The difference is observed in the word predictions for each emotional state, i.e., happiness, anger, sadness, and calm. Figure 5 shows the difference in the average time required to type the sentences with and without emotional states. Emotion-based prediction is more effective because it suggests emotion-related words such as “horribly”, “ughhh”, “awesome”, and “terrible”, which reduces the time required to type.

Figure 5: Average time required to type sentences with and without incorporating emotional states.

5 Discussion

This paper introduces EmoWrite, a system designed to aid paralytic individuals in converting their thoughts into text using a sentiment analysis approach. EmoWrite leverages EEG signals, facial expressions, and emotional states to facilitate effective communication. The system has demonstrated promising results in terms of accuracy and typing speed. However, the study acknowledges certain limitations and areas for improvement:

5.1 Training Time

One of the primary limitations is the extensive training time required for participants to effectively use EmoWrite. Each user must undergo a substantial training period to achieve a high level of proficiency with the system. This training involves learning to control the interface with mental commands and facial expressions, as well as adapting to the personalized predictive text system.

5.2 Failure Cases

The analysis of failure cases in this study is based on the accuracy of the results. Instances where the system makes incorrect predictions are evaluated to understand the underlying causes and improve the system’s robustness. These errors can be attributed to factors such as signal noise, user fatigue, and variations in individual brain signal patterns.

5.3 Generalizability

While EmoWrite has been tailored to suit individual users by incorporating their emotional states and typing patterns, its generalizability to a broader population with varying types of paralysis and cognitive conditions remains an area for further research.

5.4 User Interface

The dynamic and adaptive nature of the virtual keyboard in EmoWrite is a significant advancement, but the interface’s complexity can pose a challenge to new users. Simplifying the interface without compromising functionality could enhance user experience and reduce the learning curve.

5.5 Future Improvements

Future iterations of EmoWrite could benefit from several enhancements to further its effectiveness. Incorporating advanced machine learning algorithms could improve the system’s accuracy and reduce prediction errors. Additionally, exploring alternative EEG headsets with higher signal resolution and reduced noise may enhance the system’s overall performance and reliability.

Moreover, a direct measure of cognitive load should be integrated into the system’s evaluation to provide a more comprehensive understanding of usability. Improvements in user training protocols and refining the system’s accuracy and usability will also be key focus areas.

In conclusion, EmoWrite represents a significant advancement in assistive communication technologies for individuals with severe disabilities. While the current version demonstrates innovative potential through its integration of sentiment analysis with thought-to-text conversion, addressing its limitations through ongoing research and development will be crucial in maximizing its impact and enhancing the quality of life for users.

6 Conclusion

In summary, we have presented a pioneering approach to converting silent speech into text, revolutionizing interaction for individuals with paralysis. Our proposed solution leverages brain signals to establish a controlled interface, empowering paralytic patients to engage with the world. Key components of this interface include a dynamic circular keyboard, word prediction, and a segment dedicated to aiding verbs. Recognizing the critical impact of keyboard design on typing speed, we employ a circular layout to minimize traversal delays. The arrangement of characters dynamically displays a limited set on the screen, curtailing delay and enhancing typing speed. Integrating machine learning algorithms, we capture user writing patterns, facilitating predictive words, and aiding verb suggestions. Notably, emotion-driven predictions further streamline the user experience, enabling the auto-completion of entire words based on context. Our approach’s efficacy was verified through tests involving novice users, assessing parameters like Words Per Minute (WPM), user-friendliness, and system accuracy. The system’s standout features encompass the dynamic character arrangement, emotion-enhanced word predictions, and user-specific contextual character display. EmoWrite yields an impressive 90.36% accuracy in thought-to-text conversion, achieving 6.58 words and 31.92 characters per minute. Importantly, Information Transfer Rates (ITR) for commands and letters stand at 87.55 and 72.52 bits/min respectively, accompanied by a latency of 2.685 seconds. These results collectively underscore the potency of our innovative system, setting new benchmarks in enhancing communication efficiency and usability for individuals with motor disabilities.

Conflicting interests

“The authors have no conflict of interests.”

Ethics Declarations

“Prior to study initiation, written informed consent was obtained from all participants and the study protocol was approved by the Ethics Committee of COMSATS University Islamabad, Lahore Campus. All experiments were performed in accordance with relevant guidelines and regulations. Informed consent was obtained from all participants for publication of identifying information/images in an online open-access publication.”

Guarantor

“Not applicable.”

Contributorship

“Not applicable.”
