

2008, Cognitive Systems Research

Cognitive Systems Research 9 (2008) 237–251
www.elsevier.com/locate/cogsys
DOI: 10.1016/j.cogsys.2008.01.002

Using hidden Markov model to uncover processing states from eye movements in information search tasks

Action editor: Rajiv Khosla

Jaana Simola a,*, Jarkko Salojärvi b, Ilpo Kojo c

a Humanities Laboratory, Centre for Language and Literature, Lund University, S-22100 Lund, Sweden
b Adaptive Informatics Research Centre, Department of Information and Computer Science, Helsinki University of Technology, P.O. Box 5400, FI-02015 TKK, Finland
c Center for Knowledge and Innovation Research, Helsinki School of Economics, P.O. Box 1210, FI-00101 Helsinki, Finland

Received 2 September 2007; accepted 19 January 2008
Available online 14 April 2008

Abstract

We study how processing states alternate during information search tasks.
Inference is carried out with a discriminative hidden Markov model (dHMM) learned from eye movement data, measured in an experiment consisting of three task types: (i) simple word search, (ii) finding a sentence that answers a question, and (iii) choosing the subjectively most interesting title from a list of ten titles. The results show that eye movements contain the information necessary for determining the task type. After training, the dHMM predicted the task for test data with 60.2% accuracy (pure chance 33.3%). The word search and subjective interest conditions were easier to predict than the question–answer condition. The dHMM that best fitted our data segmented each task type into three hidden states. The three processing states were identified by comparing the parameters of the dHMM states to the literature on eye movement research. A scanning type of eye behavior was observed at the beginning of the tasks. Next, participants tended to shift to states reflecting a reading type of eye movements, and finally they ended the tasks in states which we termed the decision states.
© 2008 Elsevier B.V. All rights reserved.

Keywords: Eye movements; Computational models; Hidden Markov model; Information search; Scanning; Reading; Decision process

1. Introduction

Eye movements are commonly used as indicators of online reading processes because of their sensitivity to word characteristics. Empirical evidence supports this eye–mind link assumption: longer eye fixations have been observed with misspelled words, less common words, or words that are unpredictable from their context (Rayner, 1998; Rayner & Pollatsek, 1989). However, reading studies typically concentrate on the microprocesses of reading, such as how word features determine when and where the eyes move. Moreover, their analysis of eye movement data is often based on linear models that fail to consider eye movements as time series data and therefore do not account for variations within a task.

Our contribution is to analyze the whole sequence of fixations and saccadic eye movements to gain insight into how processing alternates during the reading task. In other words, we adopt the reverse inference approach and try to infer the hidden cognitive states from observable eye movement behavior (see Poldrack (2006) for a discussion of the possible benefits and pitfalls of this approach within neuroimaging research). The relationship between eye movements and cognitive states is modeled with a discriminative hidden Markov model (dHMM). In our application, we use the dHMM to map the changes in the statistical patterns of eye movements to changes of the hidden states of the model as participants proceed in information search tasks. A hypothesis on the cognitive states corresponding to the hidden states can then be made by comparing the parameters of the hidden states (for example fixation durations and saccade lengths) to the literature on eye movement research where the cognitive state is known. The states discovered by our model suggest that processing alternates over the course of the tasks, even when the abstractness of the searched topics varies.

The results can be used in practical applications. Earlier, Hyrskykari, Majaranta, Aaltonen, and Räihä (2000, 2003) used the fact that fixations are longer during processing difficulties to develop an interactive dictionary that gives translation aid when it detects reading difficulties. However, detecting changes in processing states makes it possible to develop more advanced applications.

* Corresponding author. E-mail addresses: jaana.simola@helsinki.fi (J. Simola), jarkko.salojarvi@tkk.fi (J. Salojärvi), kojo@hse.fi (I. Kojo).
For example, a proactive information retrieval application could search for more documents on a specific topic after detecting eye movements that indicate careful processing when a person is reading about that topic (see Puolamäki, Salojärvi, Savia, Simola, & Kaski (2005) for a feasibility study). The goal of the present article is to show that the prerequisites for implementing such techniques exist.

Previously, Carver (1990) argued that readers use different processes in order to better accomplish their goals. They change their ongoing process either because of instructions or because of the difficulty of the text. Carver distinguishes five basic processes based on variations in reading rate, that is, the number of words covered per unit of reading time (i.e. words per minute, wpm). The suggested processes are called scanning, skimming, 'rauding', learning and memorizing. Scanning is performed at 600 wpm and is used while the reader is searching for a particular word in a text. Another rapid and selective process is skimming (450 wpm), which is used in situations where the reader tries to get an overview of the content without reading through the entire text. 'Rauding' (300 wpm) corresponds to normal reading, in which the reader looks at each consecutive word of a text to comprehend the content. Learning is slow (200 wpm) and is used for knowledge acquisition. Memorizing is the slowest process (138 wpm) and involves continuous checks to determine whether the ideas encountered might be remembered later. According to Carver, these represent different cognitive processes, and he suggests that readers shift between them in a manner similar to drivers shifting gears. He also suggests that skilled readers vary their reading processes more than poor readers. The eye movement results indicate that when participants switched up, for example from the 'rauding' to the skimming process, the mean fixation durations decreased together with the mean number of fixations and regressions (i.e. fixations back to previously read text), while the length of forward saccades increased. Switching down, on the other hand, resulted in more regressions, longer fixation durations, and shorter saccade lengths. Carver suggests that the primary factor influencing reading rate is the selected reading process. Minor within-process variations result from the difficulty of the text and from individual differences, such as age, practice or cognitive speed. Previous research also indicates between-individual differences in reading strategies (Hyönä, Lorch, & Kaakinen, 2002).

1.1. Models of eye movement control during reading

Computational models of eye movement control during reading have been successful in explaining how various perceptual, cognitive and motor processes determine when and where saccades are initiated during reading. The current controversy is whether attention in reading is allocated serially to one word at a time, as suggested by the E-Z Reader model (Pollatsek, Reichle, & Rayner, 2006; Reichle, Pollatsek, & Rayner, 2006), or whether attention is spatially distributed so that several words are processed at the same time. This parallel hypothesis is supported, for example, by the SWIFT (Richter, Engbert, & Kliegl, 2006), Glenmore (Reilly & Radach, 2006) and Competition/Interaction (Yang, 2006) models. (For a review of the computational models of reading, see Cognitive Systems Research, 2006, 7, pp. 1–96.) However, these models are limited in their ability to consider variations in higher level reading processes. The models mentioned above construct very specific hypotheses on the reading process and thus use tailored parameter values developed in accordance with what is previously known about human vision, such as the size of the visual span and the variability in saccade and fixation metrics, as well as word recognition processes such as the time for lexical access.
Instead of fixing model parameters manually, the model parameters can also be learned from the data. The general idea is that the information required for constructing a model, for example the best model structure or the best parameter values, is learned from the empirical data. To avoid overfitting, the data is split into two subsets: a training and a test data set (see e.g. Hastie, Tibshirani, & Friedman (2001)). The best model and its parameters are selected using the training data, and then its generalization capability (i.e. how well the model fits new data) is tested using the test data. Feng (2006) has applied a similar approach for modeling age-related differences in reading eye movements.

1.2. Purpose of the study

Our goal is to investigate how processing changes as the participants proceed in three types of information search tasks: simple word search, question–answer task, and finding the subjectively most interesting topic. For this purpose, we combine experimentation with data-driven modeling using a discriminative hidden Markov model (dHMM). As a time series model it is well suited for our purposes because it provides a more comprehensive description of the eye movement pattern than basic summary statistics such as average fixation duration. To capture the relationship between language processing and eye movements, we model the observed time series of fixations and saccades by assuming latent states that serve as indicators of a cognitive system that switches between different states of processing. We assume that in each processing state the statistical properties of the eye movement patterns are different. The best model topology, that is, the number of hidden states, is found by comparing several possible model topologies with cross-validation, and choosing the one that best explains unseen data.
We also compare the parameter values of the model to what is previously known about reading and performance in other cognitive tasks. This information is used to make inferences about processing during the tasks. Our approach is not committed to any particular processing theory. Therefore many of the theoretical issues discussed in eye movement models of reading (Pollatsek et al., 2006; Reichle et al., 2006; Reilly & Radach, 2006; Richter et al., 2006), such as the parafoveal preview and parafoveal-on-foveal effects, do not concern our model. Instead, the dHMM applied here describes how eye movement behavior varies during a single trial, and the states uncovered by the dHMM can be seen as hypotheses about the ongoing processes, based on the statistical regularities of the eye movement data.

2. Data collection

2.1. Participants

Eye movement data were collected from ten volunteers (6 female). The age range was 23–29 years, mean age 25.7 years (SD = 1.9). They had normal or corrected-to-normal vision, and all of them were native speakers of Finnish. Participants filled in a written consent before the experiment.

2.2. Procedure

Our tasks represented single online information search episodes in which the user inspects listings returned by a search engine in order to find a topic of her interest. The task types were selected to fit the possible practical implementation, a proactive information retrieval application. The task of the participants was to find a target from a list of ten titles. The level of complexity of the searched topics was varied by having three different task types:

1. Word search (W): The task is to find a word from the list.
2. Question–answer (A): A question is presented and the task is to find an answer to the question from the list.
3. True interest (I): The participants are instructed to search for the most interesting title in the list.

The trial structure was similar across the tasks (Fig. 1).
First, the assignment was presented: the participants saw a sentence instructing them to find either a word (W), an answer to a question (A), or the most interesting sentence (I), according to the condition. After the assignment, a list of sentences was presented, and the participants were instructed to view the list until they had found the relevant line. Eye movements were recorded during this period. After finding the relevant line, they pressed 'enter' and were shown the same sentences with line numbers. They then typed the number corresponding to the line they had chosen. Before the experiment, participants read the instructions and practiced each of the tasks. Each participant conducted a total of 150 assignments. The experiment was divided into 10 blocks, with 15 assignments in each block. Each task type was presented five times within a block. The presentation order of the blocks and of the assignments within them was randomized.

2.3. Stimulus material

The text material consisted of 500 online newspaper titles, revised into grammatical sentences. The maximum length of the sentences was 80 characters. On average, there were 5.8 words per sentence, and the mean word length was 9.9 characters. The sentences were divided into 50 lists of 10 sentences. To control for the effects of previous topic knowledge, the sentences were selected to represent three general topics: Finnish homeland news (20 trials), foreign news (20 trials) and business and finance news (10 trials). The texts were written in Finnish, and a 30-point Arial font was used. The average character height was 0.9 degrees and the average character width 0.5 degrees at the viewing distance of about 60 cm. For the word search condition, fifty words were chosen as target words. The positions of the targets in the sentences were balanced, i.e., the words appeared equally often as the first, second, third or fourth word of the sentences.
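The character sizes above are reported in degrees of visual angle, which follow from physical size and viewing distance via θ = 2·arctan(s / 2d). As a rough check, a sketch (the physical character height of about 0.94 cm is our back-computed assumption, not a value from the text):

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by an object of size_cm
    viewed from distance_cm: theta = 2 * atan(size / (2 * distance))."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

# A character ~0.94 cm tall viewed from 60 cm subtends about 0.9 degrees,
# consistent with the reported average character height.
height_deg = visual_angle_deg(0.94, 60.0)
```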
For the question–answer condition, we prepared 50 questions, which were validated with a pilot test including five participants. We modified the questions and sentences, and tested them again with three new participants. Their answers agreed in 74% of the trials. The actual experiments were conducted with the modified questions and sentences. In the word search and question–answer conditions, the locations of the correct lines were balanced so that the answers appeared equally often in all 10 sentence lines. For the true interest condition, no additional stimulus preparations were needed.

To emphasize the differences between tasks and to minimize stimulus-driven factors on processing, the same stimuli were presented in all three task types. In order to control for the possible effects of repetition, a set of analyses was carried out with repeated measures ANOVAs. We found no significant effect of presenting the same stimulus three times during the experiment on the number of fixations (F(2,18) = 2.86, n.s.), average fixation durations (F(2,18) = .18, n.s.) or saccade lengths (F(2,18) = 1.00, n.s.) in an assignment. Therefore we did not have to consider the effect of stimulus repetition in our modeling work.

Fig. 1. An example stimulus presenting a question–answer task. The sentences are translated. The solid time line represents the time slot when the participants were instructed to find the relevant line and their eye movements were recorded. Participants proceeded in a self-paced manner, and the next trial began immediately after they typed in the line number corresponding to the selected line.

2.4. Apparatus

The stimuli were presented on a 17 in. TFT display with a screen resolution of 1280 × 1024 pixels. The display was located on a table at the eye level of the participants, at a distance of approximately 60 cm. In order to maintain the life-likeness of our setup, no chin or forehead rests were used for stabilizing the heads of the participants. Eye movements were recorded by a Tobii 1750 remote eye tracking system with a spatial accuracy of 0.5°. The screen coordinates of both eyes were collected from each participant at a 50 Hz sampling rate. The eye tracking system was calibrated between the experimental blocks using a set of 16 calibration points shown one at a time.

2.5. Preprocessing

Fixations were computed from the data using a window-based algorithm by Tobii. Visualizations of the measured gaze coordinates were used to choose fixation window parameters for further analysis. Based on the visual inspections we selected three candidate parameter setups: (i) a 20 pixel window with a minimum fixation duration of 40 ms, (ii) a 40 pixel window with an 80 ms fixation duration, and (iii) a 20 pixel window with a 100 ms fixation duration. Blinks were left out from the raw data by the Tobii software; otherwise no editing of the eye movement data was carried out. The best fixation window parameters were determined using the logistic regression model (see Sections 3.1 and 3.4.1) and a 40-fold cross-validation (see Section 3.5) of the data. The procedure produced 40 perplexity values for left-out data with each of the fixation window parameter combinations. For the Tobii 1750 eye tracker, the fixation window parameters that resulted in the best classification accuracy (p < .05, Wilcoxon signed rank test) of the left-out data sets were the 40 pixel window (corresponding to approx. 3.2 letter spaces) with a minimum fixation duration of 80 ms.

3. Modeling

The total data consisted of 1456 eye movement trajectories, that is, fixation–saccade sequences measured from each assignment. Forty-four trials were missing because no eye movements were obtained, for example due to double key presses by the participants. The total data were randomly split into a training set of 971 trajectories and a test set of 485 trajectories. Throughout the analysis, we used a data-driven approach: the data was used for making decisions on the different modeling questions. The best model topology was selected by using cross-validation with the training data. The parameters of the best model were then learned using the full training data, and the generalization capability, i.e., how well the model fits unseen data, was tested with the test set. The reason for using test data is that with increasing model complexity, that is, with an increasing number of parameters, the model will fit the training data more and more accurately. At some point this turns into overfitting, where increasing the model complexity decreases the model performance on unseen data while the performance on the training data set continues to increase.
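The random training/test split described above can be sketched as follows; the 1456/971/485 sizes are the paper's, while the function name and fixed seed are arbitrary choices of ours:

```python
import random

def train_test_split(trajectories, n_train, seed=0):
    """Shuffle the trajectories with a fixed seed and split them into
    a training set of n_train items and a test set of the rest."""
    rng = random.Random(seed)
    shuffled = list(trajectories)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# The paper's split: 1456 trajectories -> 971 training, 485 test.
train, test = train_test_split(range(1456), 971)
```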
3.1. Logistic regression

In our experiment the ground truth for a given eye movement trajectory, that is, the information about the task type, was always available. Suitable models for such data belong to the general category of supervised or discriminative models. The simplest discriminative model is logistic regression (see Hastie et al., 2001), which predicts the probability of a class (task type) conditional on covariates (the associated data) and parameters. The covariates are assumed to be given, that is, no uncertainty is associated with their values. The model is optimized by maximizing the conditional likelihood. However, logistic regression cannot model time series data. A common approach is to compute some form of statistics from the time series and then use these as covariates. We used logistic regression as a simple classifier to obtain baseline results for the HMM, and for selecting the best fixation window parameters.

3.2. Hidden Markov models

To analyze the fixation–saccade sequence as a time series we used a hidden Markov model, which is commonly used for analyzing sequential data such as speech (see e.g. Rabiner (1989) for an introduction to HMMs). HMMs belong to the general category of generative joint density models, which attempt to describe the full process of how the data is created, that is, they do not use covariates. Whereas fully discriminative models concentrate only on separating different classes, and thus provide no physical interpretation of the parameter values, the parameters of a joint density model can be associated with the data, giving an insight into the underlying process, assuming that the model describes the data accurately enough. HMMs are optimized by maximizing the log-likelihood, $\log p(C, X \mid \Theta)$, of the data $C \cup X$, given the model and its parameters $\Theta$. Here $X$ is the observation sequence (eye movement trajectory) associated with class $C$, the task type.

HMMs are applied in cases where the statistical properties of the signal change over time. The model explains these changes by a switch of a hidden (unobservable, latent) state $s$ within the model. The total number $S$ of hidden states can be learned from the data, for example by cross-validation. Each of the states has an associated observation distribution $p(x \mid \theta_s)$, from which the data is generated. The parameters $\theta_s$ can be different for each state (e.g. Gaussian distributions having different means and standard deviations). The changes in the distributions of the observations are thus associated with transitions between hidden states. The transitions are probabilistic, and defined by a transition matrix $B$. We assume a first-order Markov property for the transitions, that is, we assume probabilities of the form $p(s(t+1) \mid s(t))$; the transition to the next state $s(t+1)$ depends only on the current state $s(t)$. Pieters, Rosbergen, and Wedel (1999) showed that eye movements follow this property. Additionally, this restricts the number of parameters in the model, making modeling computationally more efficient. A full definition of HMMs requires one more set of parameters, $p(s),\ s = 1, \ldots, S$, which is the probability of initiating the time sequence at state $s$. An example topology of an HMM is illustrated in Fig. 2.

For a time series $x_{1,\ldots,T}$ of observations, the full likelihood of the HMM is then

$$p(x_{1,\ldots,T} \mid \Theta) = \sum_{\mathcal{S}} p(s(1))\, p(x(1) \mid s(1)) \prod_{t=2}^{T} p(x(t) \mid s(t))\, p(s(t) \mid s(t-1)), \quad (1)$$

where $\mathcal{S}$ denotes all "paths" through the model, that is, all $S^T$ combinations of hidden states for a sequence of length $T$, and $x(t)$ is the measured observation vector at time $t$.

Fig. 2. The transition probabilities and topology of the discriminative hidden Markov model. Hidden states are denoted by circles, transitions among hidden states by arrows, along with their probabilities. The beginning of the sequence is denoted by π. The capital letters on the right denote the sections of the HMM that were assigned to each of the tasks (W = word search, A = question–answer, I = true interest); small letters within the hidden states denote the names of the hidden states (s = scanning, r = reading, d = decision).

Maximum likelihood parameter values of the HMMs are obtained with the Baum–Welch (BW) algorithm, a special case of the Expectation–Maximization (EM) algorithm, which can be proven to converge to a local optimum. Fast computation of the most probable path (hidden state sequence) through the model, given a new data sequence, is obtained with the Viterbi algorithm.

Previously, Liechty, Pieters, and Wedel (2003) applied hidden Markov models to study two states of covert attention, local and global attention. They showed that viewers switched between the attention states while exploring print advertisements in magazines. The local visual attention state was characterized by short saccades, whereas in the global attention state longer saccades were common. In another line of research, Salojärvi, Puolamäki, and Kaski (2005b) showed that the perceived relevance of a text could be predicted from eye movements in an information search task.

3.3. Discriminative hidden Markov models

A generative model can be converted to a discriminative model by optimizing the conditional likelihood of the model, $\log p(C \mid X, \Theta)$, obtained from the generative model via the Bayes formula. Compared to a fully discriminative model (such as logistic regression), the converted model still has the benefits of a generative model, such as easier interpretation of the model parameters (see Salojärvi, Puolamäki, & Kaski (2005c) for a description of the differences).
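The two core HMM computations named above can be sketched compactly: the forward recursion evaluates the likelihood of Eq. (1) without enumerating all S^T paths, and the Viterbi recursion replaces its sum with a max to recover the single most probable hidden-state sequence. A toy sketch with discrete observations; the initial, transition and emission values are made up for illustration, not the fitted model's:

```python
import math

def forward_likelihood(pi, A, B, obs):
    """p(obs | model) via the forward algorithm.
    pi[s]: initial probabilities, A[r][s]: transition probabilities,
    B[s][o]: emission probabilities for discrete observation symbols."""
    alpha = [pi[s] * B[s][obs[0]] for s in range(len(pi))]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * A[r][s] for r in range(len(pi))) * B[s][o]
                 for s in range(len(pi))]
    return sum(alpha)

def viterbi_path(pi, A, B, obs):
    """Most probable hidden-state sequence for obs (log domain)."""
    S = range(len(pi))
    delta = [math.log(pi[s] * B[s][obs[0]]) for s in S]
    backptrs = []
    for o in obs[1:]:
        step, new = [], []
        for s in S:
            best = max(S, key=lambda r: delta[r] + math.log(A[r][s]))
            new.append(delta[best] + math.log(A[best][s]) + math.log(B[s][o]))
            step.append(best)
        delta = new
        backptrs.append(step)
    path = [max(S, key=lambda s: delta[s])]
    for step in reversed(backptrs):
        path.append(step[path[-1]])
    return path[::-1]

# Toy two-state model (values invented for illustration).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
```

Summing `forward_likelihood` over every possible observation sequence of a fixed length yields 1, a useful sanity check that the recursion matches Eq. (1).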
Discriminative training of HMMs is carried out by assigning a set of "correct" hidden states S_c in the model to always correspond to a certain class c, and then maximizing the likelihood of the state sequences that go through the "correct" states for the training data, versus all the other possible state sequences S in the model (Povey, Woodland, & Gales, 2003; Schlüter & Macherey, 1998). The parameters of a discriminative HMM (dHMM) are optimized with a discriminative EM (DEM) algorithm, which is a modification of the original BW algorithm (the derivation of the algorithm is given in Salojärvi, Puolamäki, & Kaski (2005a)).

3.4. Feature extraction

3.4.1. Features for logistic regression model

The logistic regression was used as a baseline for the HMM. It uses averaged features derived from the fixation–saccade time sequence, that is, it obtains the same information as the HMM. The features were:

(1) Length of the sequence (number of fixations).
(2) Mean fixation duration (in ms).
(3) Standard deviation of fixation duration.
(4) Mean saccade length (in pixels).
(5) Standard deviation of saccade length.

3.4.2. Features for hidden Markov model

For the time series model, four features of each fixation were computed from the eye movement trajectory, that is, from the raw fixation–saccade data from each assignment. The features are listed below with the corresponding modeling distribution (the distributions denoted by p(x|s) in Eq. (1)) reported in parentheses. See e.g. Gelman, Carlin, Stern, and Rubin (2003) for the parametric forms of the distributions.

(1) Logarithm of fixation duration in milliseconds (one-dimensional Gaussian).
(2) Logarithm of outgoing saccade length in pixels (one-dimensional Gaussian).
(3) Outgoing saccade direction (quantized to four different directions) plus a fifth value indicating that the trial had ended (multinomial).
(4) Indicator variable of whether there have been previous fixations on the currently fixated word (binomial).

In the literature (e.g. Reichle et al. (2006)), a gamma distribution has often been used for modeling fixation durations, because its skewed shape resembles the data. There are two alternative implementations. In the first, the data sequence is indexed by time, and the hidden state sequences are thus directly mapped onto fixation durations (Liechty et al., 2003); therefore the probability of staying in state s must follow a gamma distribution. However, in ordinary HMMs this probability follows an exponential rather than a gamma distribution, and therefore a semi-hidden Markov model needs to be implemented, in which the transition probabilities depend on the time spent in the current hidden state. We applied the second alternative: we constructed an HMM that emitted the fixation durations, changing the time scale of the HMM into fixation counts. Instead of having an HMM that is in state s for the time t ... t + τ, we now have an HMM that is in state s for fixation i, which has the duration τ. We then make a simplifying assumption by modeling the logarithm of fixation durations with a Gaussian. Further work could include extending this model to a mixture of two log-normal distributions, since this has been found to work well for reading fixations (Carpenter & McDonald, 2007).

The saccade lengths were quantified in pixels and were not converted to more conventional measures, such as characters or degrees, during the computations, because the conversions would have added noise to the data (since the Tobii 1750 allows free head movement). Saccade lengths were computed from the raw 50 Hz gaze data as the distance between the gaze location at the end of the previous fixation and the beginning of the current fixation. The spatial accuracy of the eye tracker was 0.5°, corresponding to approximately 12 pixels.
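The five summary covariates of Section 3.4.1 are plain statistics of the per-trial fixation–saccade sequence. A minimal sketch of the feature extraction (the function name is ours, and whether the paper used population or sample standard deviation is not stated; population SD is assumed here):

```python
import math

def summary_features(fix_durations_ms, saccade_lengths_px):
    """Compute the five per-trial covariates of Section 3.4.1:
    sequence length, mean and SD of fixation duration, and
    mean and SD of saccade length."""
    def mean(v):
        return sum(v) / len(v)
    def sd(v):  # population standard deviation (an assumption)
        m = mean(v)
        return math.sqrt(sum((x - m) ** 2 for x in v) / len(v))
    return [float(len(fix_durations_ms)),
            mean(fix_durations_ms), sd(fix_durations_ms),
            mean(saccade_lengths_px), sd(saccade_lengths_px)]
```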
For saccade quantization, each fixation was first mapped to the closest word in the preprocessing stage. The outgoing saccade direction was then encoded with an indicator variable that can take five different values: 1 – saccade forward on the current line of text, 2 – saccade upwards from the current line, 3 – saccade backwards on the current line, 4 – saccade downwards from the current line, and 5 – ending the assignment.

3.5. Model selection

When choosing the fixation window parameters or the number of hidden states of the HMM, an n-fold cross-validation with the training data was carried out. In this procedure, the training set is divided into n non-overlapping subsets, and each of the subsets is in turn left out as a validation data set. Training is carried out using the other n − 1 subsets, and the generalization capability of the model is then tested with the validation set. The procedure is carried out for all alternative modeling configurations. The method produces n paired measures of goodness of model fit, calculated from the validation data, allowing us to test the out-of-sample performance of the model configurations. The reason for using cross-validation is to avoid overfitting, i.e., choosing too complex a model. Alternative methods for model selection include the computationally much heavier bootstrap method (Efron & Tibshirani, 1993), or information theoretic criteria (Akaike, 1974; Schwartz, 1978). The latter, however, are not theoretically justified in the case of HMMs; see e.g. Robertson, Kirshner, and Smyth (2004) and the references therein.

Goodness of the model was measured in two ways: in terms of classification accuracy and perplexity. Classification accuracy is the number of correctly predicted task types divided by the total number of tasks.
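The n-fold scheme and the perplexity measure used here can be sketched together: the folds partition the training set so that each trial is validated exactly once, and perplexity exponentiates the negated average log-likelihood of the correct task types, so 1 is perfect and 3 matches random guessing over three balanced classes. A minimal sketch (function names are ours):

```python
import math

def n_fold_splits(items, n):
    """Yield (train, validation) pairs for n-fold cross-validation;
    each item appears in exactly one validation fold."""
    folds = [items[i::n] for i in range(n)]
    for i in range(n):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]

def perplexity(log_likelihoods):
    """Perplexity as the exp of the negated mean per-sequence
    log-likelihood log p(c_i | x_i, theta) of the correct task types."""
    return math.exp(-sum(log_likelihoods) / len(log_likelihoods))
```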
However, for relatively small data sets, the classification accuracy is a noisy measure, since each sample can be assigned to only one class. A better measure is therefore the perplexity of the test data set, which measures the confidence in the predictions of the classifier. It is defined as a function of the average of the log-likelihoods L_i of the N_s test data sequences, denoted formally by

perp = exp( −(1/N_s) Σ_{i=1}^{N_s} L_i ),   L_i = log p(c_i | x^i_{1,…,T_i}, θ),   (2)

where x^i_{1,…,T_i} denotes the ith sequence of observations of length T_i, and c_i is the type of task i. N_s is the number of sequences, and θ the model parameters. The best possible perplexity is 1, where the correct task type is predicted with a probability of 1. On the other hand, a perplexity of 3 corresponds to random guessing with a probability of 1/3 for each of the task types. In our data analysis, the class distribution was not equal within the training and test sets. This was mainly due to the random split of the data, and in part due to missing eye movement trajectories. If these are taken into account, the random perplexity for the test set is 3.01. If the perplexity is greater than this, the model is doing worse than random guessing. In the worst case, where the classifier gives a (close to) zero probability for the correct class, the perplexity is restricted to a maximum value of 10^22.

4. Results

4.1. Logistic regression

The results of the logistic regression are reported in Table 1. The perplexity of the test set was 2.42 with a classification accuracy of 59.8%.

4.2. Discriminative hidden Markov model

All modeling with HMMs was carried out in a data-driven fashion. The topology of the HMM was fully connected, that is, transitions between all states were possible. All parameter values were learned from data by maximizing the conditional likelihood. The number of hidden states in the dHMM was determined with a 6-fold cross-validation.
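Eq. (2) amounts to exponentiating the negative mean log-probability assigned to the correct task type; a minimal sketch (array names are ours):

```python
import numpy as np

def perplexity(class_posteriors, true_classes):
    """perp = exp(-(1/N_s) * sum_i log p(c_i | x_i)), as in Eq. (2).
    class_posteriors: (N_s, n_classes) predicted class probabilities.
    true_classes: (N_s,) integer class labels."""
    p = np.asarray(class_posteriors, dtype=float)
    c = np.asarray(true_classes)
    log_liks = np.log(p[np.arange(len(c)), c])
    return float(np.exp(-log_liks.mean()))
```

A classifier that always outputs 1/3 for each of three task types obtains a perplexity of exactly 3, while a perfectly confident and correct classifier obtains 1.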
The different hidden state configurations that were tried out were S ∈ {2-2-2, 2-2-3, 2-3-3, 3-3-3, 3-3-4, 3-4-4, 4-4-4}, corresponding to the number of hidden states used for modeling the word search, question–answer and true interest conditions, respectively. The scheme for increasing the number of hidden states in the HMM was decided after observing that the eye movement trajectories were usually longest in the true interest condition, followed by the question–answer condition. The number of hidden states was decided as in Robertson et al. (2004) by comparing the means of the perplexities of the validation sets. The decrease of out-of-sample perplexities started to level off when the number of hidden states was nine, suggesting that this is the optimal number of hidden states. Since the variance of conditional maximum likelihood estimates is larger than that of maximum likelihood estimates (Nádas, 1983), we additionally compared the paired perplexity values for the eight, nine, and ten hidden state configurations with a Wilcoxon signed rank test. The difference between the 8-state and 9-state models was statistically significant (p < .05), whereas the difference between the 9-state and 10-state models was not. Since the data does not support the preference of a 10-state model over a 9-state model, the less complex model should be preferred. The model with nine hidden states is obtained also when using a majority vote-based model selection scheme (Miloslavsky & van der Laan, 2002).

Table 1
Confusion matrix from the test data, showing the number of assignments classified by the logistic regression into the three task types (columns) versus their true task type (rows)

                        Prediction
True         W (66.2%)   A (45.3%)   I (60.0%)
W (77.2%)       139          23          18
A (28.3%)        55          43          54
I (70.6%)        16          29         108

The diagonal contains the number of correctly predicted assignments. The percentages (in parentheses) denote row- and column-wise classification accuracies. The row-wise accuracy shows the percentage of correctly predicted assignments for the given task type; the column-wise accuracy shows the percentage of correctly predicted task types, given the prediction.

The 9-state HMM achieved a perplexity of 2.32 and a classification accuracy of 60.2% for the test data. The confusion matrix of the dHMM is reported in Table 2. Both the logistic regression and the dHMM could separate the two extremes, word search and true interest, but predicting the question–answer tasks is difficult. One possible reason is that some of the question–answer assignments were easier than others. The search behavior in easy assignments may have resembled the fixation patterns in the word search task (in cases where the question can be answered with one word), whereas difficult question–answer assignments were confused with the task of indicating subjective interest.

4.2.1. Comparing the classification accuracies and perplexities

If the time series of the eye movement data contains information about the task type, the dHMM should perform better than the logistic regression model using averaged features. The perplexity of the test set for the dHMM was 2.32, whereas logistic regression achieved a perplexity of 2.42. The dHMM was significantly better than logistic regression (p < .01, comparison of perplexities with a Wilcoxon signed rank test). The time series of the eye movements therefore contained relevant information for determining the task type.

4.3. Interpreting HMM parameters

Proper interpretation of the parameters of a discriminatively trained joint density model (e.g., a dHMM) is still somewhat of an open question. Based on asymptotic analysis (with infinite data), the following can be said.
Ordinary maximum likelihood training of a joint density model minimizes the Kullback–Leibler divergence (Cover & Thomas, 1991) between the data and the model. This can be seen by considering the data to be generated from a "true", however unknown, model with model parameters θ̃. In practice the model is always an approximation of the "truth", and therefore the model will not fit perfectly to the data (if it were perfect, it should predict all unseen data perfectly). This incorrectness causes a bias in the obtained model parameters θ. Discriminative training, on the other hand, maximizes the conditional likelihood, which minimizes the Kullback–Leibler divergence between a subset of variables in the data and the model. As a result, this subset (here the task types) is modeled as well as possible. A tradeoff is that the other variables of the data are modeled more inaccurately. However, in the asymptotic case with an infinite amount of data, and where the "true" model is within our model family, the parameters are the same as those obtained from maximum likelihood. In the case of an incorrect model, by inspecting the gradient of the conditional likelihood (proof omitted), it can be shown that the conditional maximum likelihood and the maximum likelihood estimates are close to each other (and asymptotically the same) when (i) the model is close to the true model or (ii) the class predictions of the model are accurate, but the particular parameters do not help in discriminating between the classes. In these cases, the parameters can be interpreted as in an ordinary joint likelihood model.

Table 2
Confusion matrix showing the number of assignments classified by the discriminative HMM into the three task types (columns) vs. their true task type (rows)

                        Prediction
True         W (70.0%)   A (50.0%)   I (57.5%)
W (78.9%)       142          22          16
A (35.5%)        43          54          55
I (62.8%)        18          39          96

The percentages (in parentheses) denote row- and column-wise classification accuracies.
From this point of view, a straightforward way of interpreting parameter values is therefore to report and compare the parameter values from conditional and ordinary maximum likelihood. If the values are the same, the data does not contain additional information that could be used for more accurate prediction of the task type. On the other hand, if the two parameter estimates differ, it implies that the variables they model help in predicting the task type, and that their modeling assumptions are incorrect. This fact can be used for checking and revising the model. The revised model has to be checked afterwards with new data. In our experiment the parameters of the discriminative and joint density HMMs (Table 3) are roughly the same, suggesting that our model uses the information that eye movements contain on task types fairly well. The greatest discrepancy between the parameter values follows from the log-Gaussian approximation of the fixation duration distributions, which was to be expected (as discussed in Section 3.4.2). The difference between the two parameter estimates also shows that the fixation durations are important in predicting the task type. We next discuss the modeling results for each set of parameters of the HMM. The analysis is carried out with the conditional maximum likelihood parameters; the maximum likelihood parameters can be analysed in a similar manner, with approximately similar results.

4.3.1. Observation distributions and hidden states

The discriminative hidden Markov model that best fitted our data segmented each task type into three states (Fig. 2). The parameter values of the dHMM (Table 3) exhibited relatively similar eye behavior in the three hidden states for each of the task types. Next, we compared the parameter values to the literature on reading and other cognitive tasks, and designated the states to describe the processing features that were reflected in the eye movement behavior.
Table 3
Discriminative HMM parameter values for the scanning, reading and decision states for each task type (corresponding maximum likelihood estimates in parentheses)

                                   Scanning              Reading               Decision

Probability of beginning the task
  Word search                      32% (17)              16% (15)              0%
  Question–answer                  20% (21)              10% (12)              0%
  True interest                    15% (17)               7% (8)               0%

Word search observations
  Fixation duration (ms)           134 (125) [100, 180]  199 (187) [140, 284]  171 (219) [92, 320]
  Saccade length (pix)             166 (155) [68, 409]   132 (159) [67, 259]   132 (120) [54, 319]
  Saccade direction
    Forward                        31% (34)              61% (53)              39% (22)
    Upward                         22% (21)               6% (9)                6% (3)
    Backward                       19% (16)              15% (15)              20% (36)
    Downward                       28% (28)              18% (23)              17% (2)
    End assignment                  1% (0)                0%                   18% (37)
  Previous fixations = true        27% (28)              24% (26)              86% (88)

Question–answer observations
  Fixation duration (ms)           134 (129) [99, 182]   205 (204) [141, 299]  177 (173) [96, 323]
  Saccade length (pix)             160 (156) [60, 422]   133 (141) [74, 239]   137 (133) [48, 391]
  Saccade direction
    Forward                        37% (39)              63% (63)              33% (35)
    Upward                         21% (20)               5% (5)               14% (15)
    Backward                       16% (14)              12% (12)              27% (26)
    Downward                       27% (26)              20% (20)              10% (12)
    End assignment                  0%                    0%                   16% (11)
  Previous fixations = true        23% (25)              24% (15)              78% (64)

True interest observations
  Fixation duration (ms)           134 (125) [97, 184]   200 (196) [138, 291]  176 (169) [95, 326]
  Saccade length (pix)             160 (165) [57, 452]   128 (131) [73, 226]   133 (135) [48, 365]
  Saccade direction
    Forward                        41% (43)              61% (61)              37% (38)
    Upward                         21% (19)               7% (7)               15% (16)
    Backward                       13% (11)              13% (14)              30% (25)
    Downward                       26% (26)              19% (18)              11% (14)
    End assignment                  0%                    0%                    7% (8)
  Previous fixations = true        28% (25)              26% (21)              86% (83)

In saccade lengths, 160 pixels approximates 13 letters. Where applicable, the interval [μ − σ, μ + σ] around the mean μ is given in brackets (67% of the probability mass is within this interval).

With a combined probability of 67% (Table 3 and Fig.
2), participants began the assignments from states which we termed scanning, because the parameters suggested rather long saccades with no clear directional preference (i.e., almost random), and fewer saccades towards previously fixated areas. The fixation durations were relatively short (approximately 135 ms), which is in accordance with previous results indicating shorter fixations in association with easier tasks (Rayner, 1998). On average, participants spent 2.8 s scanning (Table 4).

The second set of states was labeled as reading, because they were characterized by frequent forward saccades (over 60% probability) with an average fixation duration of about 200 ms, also typical for reading. The percentage of backward saccades was 12–15%, corresponding to previous findings suggesting that in normal reading about 10–15% of saccades are regressions (Rayner, 1998). The average saccade length was 10.3–10.7 letters (128–133 pixels), which corresponds to the average length of a word (9.9 characters) plus a space between words.

Frequent forward and backward saccades were typical for the third and final states (Table 3). The percentage of backward saccades (20–30%) was twice the amount usually observed in reading. Saccade lengths were approximately 10.7 letters (133 pixels), corresponding to the length of a word, and the saccades occurred within the same line (with 75% probability). The fixations landed on previously fixated words with 78–86% probability. On average, the fixation durations (175 ms) were shorter than in the reading states. This is possibly due to the fact that participants were mostly fixating on words which they had recently seen, and therefore lexical access took less time. We termed the third states
as decision states, because the features indicated a lot of rereading of the previously seen lines. Almost without exception, participants ended the assignments while they were in the third states. This pattern is visible in Fig. 4. Shimojo, Simion, Shimojo, and Scheier (2003) have reported similar results in the context of preference decisions made for faces. They also showed that participants tended to look more often at the target they chose just before they made their decisions.

Table 4
Expected dwell times and standard deviations in the scanning, reading and decision states, plus times before and after reaching the decision state, along with the mean percentages of prevalence of the states

                         W                       A                        I
                    Mean       Stdev.      Mean       Stdev.       Mean        Stdev.
Total T           4.1 ± 0.4  3.1 ± 0.5   8.5 ± 1.2  7.1 ± 1.8   11.6 ± 1.1   6.7 ± 0.9
T in scanning     2.2 ± 0.3  1.8 ± 0.4   2.8 ± 0.4  2.3 ± 0.3    3.4 ± 0.4   2.4 ± 0.2
T in reading      4.3 ± 0.7  3.2 ± 0.7   6.1 ± 1.0  5.1 ± 0.8    6.2 ± 0.8   5.2 ± 0.6
T in decision     0.7 ± 0.1  0.8 ± 0.4   1.4 ± 0.4  2.9 ± 1.8    1.8 ± 0.3   2.0 ± 0.4
T to decision     3.4 ± 0.4  2.8 ± 0.5   6.1 ± 0.7  4.5 ± 0.6    8.0 ± 0.7   4.6 ± 0.7
T after decision  0.8 ± 0.2  1.0 ± 0.4   2.5 ± 0.7  4.8 ± 1.6    3.6 ± 0.8   5.1 ± 1.0
% in scanning     51 ± 6     40 ± 2      47 ± 6     40 ± 2       47 ± 6      40 ± 2
% in reading      33 ± 6     41 ± 2      38 ± 6     41 ± 2       38 ± 6      41 ± 2
% in decision     16 ± 2     15 ± 2      15 ± 2     15 ± 2       15 ± 2      15 ± 2

Values are computed from the observation trajectories, which were segmented using the Viterbi algorithm on the dHMM. Capital letters denote the tasks (W = word search, A = question–answer, I = true interest), and units are in seconds. Error estimates (±) are 95% confidence intervals, obtained with a bootstrap method with 400 replicate data sets.

One potential concern regarding the comparisons of parameters with previous reading studies, for example those reviewed by Rayner (1998), is that the participants may have varied their processing states also in the reviewed tasks.
However, as brought out by Hyönä et al. (2002), in many reading studies factors such as global reading strategies have been treated as a nuisance, and their influence is minimized by studying reading under simplified conditions (i.e. using brief and simple texts for very simple purposes). Therefore it is likely that previous results mostly reflect rather 'pure' types of processes.

4.3.2. Transition probabilities

The transition probabilities of the dHMM are shown in Fig. 2. Within-state transitions indicate that participants continued in the same processing state for several steps (i.e., fixations), indicating that the associated cognitive processes operate on time scales longer than one fixation. Similarly, previous research suggests that the ongoing processes are not reset after every saccade, but their influence survives across saccades (Yang & McConkie, 2005). An estimate of these time scales was next obtained with the dHMM.

4.3.2.1. Method. The most probable state sequence for each eye movement trajectory was computed by applying the Viterbi algorithm to the learned HMM. The means and standard deviations of the process durations (Table 4) were computed from the data using the state segmentation obtained from the dHMM. The mean is the average time spent in a state, and the standard deviation describes how the time varies in individual cases. An error for the two estimates, i.e., how accurate the estimates are given our (finite) data sample, is obtained with a bootstrap method (Efron & Tibshirani, 1993). We generated 400 replicate (bootstrap) data sets by sampling from the original data with replacement. For each of the replicate data sets a bootstrap estimate was computed (e.g. the mean). The error is then the standard deviation of the 400 bootstrap estimates, computed with respect to the original estimate.

4.3.2.2. Results. Table 4 shows that the times spent in each of the states did not differ considerably across the task conditions.
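The bootstrap error estimate described in the Method above can be sketched as follows (the values fed to it would be, e.g., per-trial dwell times from the Viterbi segmentation; names are ours):

```python
import numpy as np

def bootstrap_error(values, n_boot=400, seed=0):
    """Error of the mean: the standard deviation of n_boot bootstrap
    replicate means, computed with respect to the original estimate."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    boot_means = np.array([
        rng.choice(values, size=len(values), replace=True).mean()
        for _ in range(n_boot)
    ])
    return float(np.sqrt(np.mean((boot_means - values.mean()) ** 2)))
```

For a degenerate sample with no variation the error is zero; for a variable sample it approximates the standard error of the mean.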
On average, participants spent more time in the scanning and reading states than in the decision states. The decision times were two times longer for the question–answer and subjective interest conditions than for the word search condition, where the assignment was ended approximately 1 s after reaching the decision state. This corresponds to the duration of making the decision, because the participants did not go back to the scanning or reading states, unlike in the other conditions. Also, the time to reach the decision state increased with the task complexity.

4.3.3. Transitions between states

Fig. 2 shows that in the word search condition, transitions from the decision state are rare, with only 1% probability, whereas in the question–answer condition these transitions occur with 5% probability and in the subjective interest condition with 14% probability. In the word search and question–answer conditions, participants switched more often from scanning to decision (with 80% probability) than to reading (20% probability). This can be seen from Fig. 2 by comparing the associated transition probabilities (8% vs. 2%). From reading, they shifted to the decision state. In word search, this probability was 92% (11% vs. 1%), and in the question–answer condition 55% (6% vs. 5%). In the true interest condition, there was a strong tendency to switch from decision to reading, with 86% probability (12% vs. 2%).

4.3.4. Eye movement trajectories

When combining the most probable (Viterbi) path through the hidden Markov model with the interpretations of the hidden states, it is possible to make hypotheses on the switches of the cognitive states during an assignment. An interesting further study would be to map these switches to text contents. Fig. 3 shows example trajectories for the task types, plotted on the screen coordinates (stimulus words are not plotted for clarity).

Fig. 3. Examples of eye movement trajectories in the experiment. The HMM states along the most probable paths are denoted by distinct marker symbols: state 1 (scanning), state 2 (reading) and state 3 (decision). See text for interpretations of the states. The beginning of the trajectory is marked with a circle; the ending with two concentric circles. W: word search. A: question–answer. I: true interest.

It appears that when the participant closes in on the relevant line, the decision state is adopted. In the word search condition, the trajectories indicate mostly scanning, whereas in the question–answer condition the lines are read word by word, but the state of processing varies, depending on whether the line is relevant for the task or not.

4.3.5. Average behavior

Drawing summaries from the plots shown in Fig. 3 is difficult. Instead, it is easier to find common patterns by inspecting the mean behavior of the conditions.

4.3.5.1. Method. Computing average behavior from our time series data is not straightforward, because the time sequences have different lengths and the observations are probabilities. We first computed the a posteriori probabilities of being in state s at time t, given the observations x_{1,…,T} and model parameters θ, that is, γ_t(s) = p(s_t | x_{1,…,T}, θ). The probabilities can be computed with a forward–backward algorithm. The probabilities were then converted to their natural parameters (by h_{γ_t(s)} = log γ_t(s), thus mapping the probabilities to real values). Next, the sequences were normalized to the same length by resampling them to the same length as the longest sequence (Gallinari, 1998). After that, the values were mapped back to probabilities using the inverse mapping γ_t(s) = exp{h_{γ_t(s)}} / Σ_i exp{h_{γ_t(i)}}. A simple assumption is that for each time instance t, the probabilities are emitted from a Dirichlet distribution with parameters α(t).
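The normalization scheme just described — log-map, resample to a common length, map back through the inverse (softmax) mapping — can be sketched as follows (the linear interpolation used for resampling is our assumption, and the names are ours):

```python
import numpy as np

def normalize_length(gamma, target_len):
    """Resample a (T, n_states) sequence of posterior probabilities to
    target_len steps: map to log space, interpolate each state trajectory,
    and map back with a softmax so each row again sums to one."""
    gamma = np.asarray(gamma, dtype=float)
    T, n_states = gamma.shape
    log_g = np.log(gamma + 1e-12)                # natural parameters
    t_old = np.linspace(0.0, 1.0, T)
    t_new = np.linspace(0.0, 1.0, target_len)
    log_r = np.stack([np.interp(t_new, t_old, log_g[:, s])
                      for s in range(n_states)], axis=1)
    e = np.exp(log_r - log_r.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)
```

After this step, the resampled sequences share a common time axis, so a per-time-instance mean (or a Dirichlet fit) across trajectories is well defined.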
The parameters can be estimated using the maximum likelihood criteria (see Minka (2000) for update formulas), after which the mean and standard deviation of the Dirichlet distribution can be computed (see e.g. Gelman et al. (2003)).

4.3.5.2. Results. The mean behavior along with its standard deviation is plotted in Fig. 4. In the word search condition, participants began the assignment from the scanning state with a probability of 70%. There was a slight tendency to be in the reading state before switching to the final decision state. For the question–answer and subjective interest conditions the strategies were similar, although less emphasized. Participants began the tasks almost equally often from the scanning and reading states. In the middle of the task performance the reading state was slightly more common, and towards the end the decision state was very common. In general, the results suggested that before shifting to the decision states participants adopted different strategies. This was also visible in the standard deviations, which were larger in the beginning and in the middle of the tasks than in the end.

Fig. 4. Average probability (y-axis) of being in state s. The horizontal axis is the normalized sequence length. Top row: word search (W). Center: question–answer (A). Bottom: true interest (I). The plots show the mean probability (± one standard deviation; 66% confidence interval) of being in a given HMM state as a function of time. Left column: scanning state, middle column: reading state, right column: decision state.

5. Discussion

In this paper, we applied a reverse inference approach with the aim of making hypotheses on hidden cognitive states in an experiment resembling everyday information search tasks. Our setup differs from traditional research methods in psychology, where controlled experiments are designed to find out what happens in eye movements when cognitive processes are manipulated. Instead, we designed a less controlled experiment, and then applied advanced statistical modeling, a hidden Markov model, to make inferences about cognitive processing during the tasks (see Feng (2003) for a discussion of the benefits of the data-driven approach).

Our model suggests that participants shifted their eye movement behavior as they proceeded in the tasks. They typically began the assignments from a set of states reflecting a scanning type of behavior (see Fig. 4 and Table 3). The scan paths indicated long saccades with no directional preference, accompanied by rather short fixations. Additionally, the fixations tended to land on previously unfixated areas of the text. The second set of states was labeled as reading because they contained frequent forward saccades, and the distance covered by saccades mostly corresponded to an average word length. Also the mean fixation durations (200 ms) and the amount of regressions (about 13%) were in accordance with previous research findings on reading (Rayner, 1998). The characteristics of the third set of states suggested a more careful analysis of sentences, possibly deciding whether the sentence is the correct answer to a given task. This was indicated by the fact that the participants ended the assignments while they were in the decision states. The saccades landed almost always on the previously seen lines and were directed either forward or backward. The distance covered by saccades was about the length of an average word. Our results support and complement the modeling work by Liechty et al. (2003), who used eye movement data to identify two states of visual attention in an advertisement viewing task.
As an extension to their approach, our model includes experimental manipulations of the search tasks. Although we used literal tasks, our processing states shared similarities with their findings. The scanning state shared features with their global processing state, both being characterized by long saccades and rather short fixations. Short saccades and long fixations were typical of their attentive processing state. In our study, the empirical data supported segmenting the attentive state into two processes, i.e. the reading and the decision processes, suggesting a finer structure. Besides their behavioral relevance, the labels given to the hidden states are suggestive, and can be used as hypotheses about the underlying processes. The hypotheses can be tested by collecting additional data with known processing states, for example by selecting tasks that emphasize pure visual scanning or naturalistic reading, to empirically validate the parameters of the suspected processes. With the setup presented here, it is also possible to make more specific hypotheses by constraining the dHMM structure. For example, some of the overlapping processes across the three tasks could have been linked in the HMM training. For mutually exclusive processes, the probability of being in one state at a certain time would be either one or zero. However, the probabilities suggested by our model were somewhere between one and zero (see Fig. 2), indicating that the states are not mutually exclusive but rather reflect mixtures of ongoing processes that are optimal for the performance. This is in accordance with experimental and theoretical evidence suggesting that reading eye movements are generated through multiple competing processes rather than one homogeneous mechanism (Findlay & Walker, 1999).
In addition, a considerable proportion of the variation in eye movements can be attributed to random fluctuations in the oculomotor system (Feng, 2006). Also, McConkie and Yang (2003) and Yang and McConkie (2005) have shown that a considerable amount (even 50%) of saccades during reading are executed by a basic mechanism that repetitively produces saccades without direct cognitive control.

Our model was able to predict the task types with an accuracy of 60.2%, which is 27 percentage points above pure chance (33.3% for three classes). We did not expect much better accuracy. First, we used all data in the modeling, including participants with noisier eye movement signals. Second, the tasks were not very controlled; instead, the instructions allowed participants to freely choose their own search strategies. Third, the 50 Hz sampling rate of the Tobii 1750 eye tracker quantized the fixation durations to 20 ms intervals. With a higher temporal resolution the model might have been able to predict the tasks more accurately, since more information would have been available. The classification accuracy could also be improved by giving word-level features, such as word frequencies and word lengths, as input to the model. This can be implemented for example by using an IOHMM model (Bengio, 1996; Bengio & Frasconi, 1999). Currently, the only additional information (besides the eye movement data) given to our model was the task type of the learning data. Despite the moderate classification accuracy, the model parameters appeared behaviorally relevant when compared to previous results about reading.

5.1. Relation to other models

The model applied here, the dHMM, makes it possible to study cognitive control across fixations, since the eye movements are inspected as a time series instead of summary measures, such as average fixation duration.
Since the HMM is designed for reverse inference tasks, it differs from traditional computational models in psychology, which are models of forward inference: they attempt to describe how perceptual and cognitive processes drive eye movements, whereas our model tries to draw conclusions about cognition given the eye movements. According to the visuo-oculomotor research tradition, non-cognitive factors, such as the landing position of the eyes on a word, mainly determine when and where the eyes move. Furthermore, Vitu, O'Regan, Inhoff, and Topolski (1995) showed that eye movements varied little from normal reading when participants were pretending to read z-strings (however, see Rayner & Fischer (1996)). Similar results were also shown by McConkie and Yang (2003) and Yang and McConkie (2005). The strategy-tactics model (O'Regan, 1990, 1992) suggests that, based on their expectations about the difficulty of the forthcoming task, readers can adopt either careful or risky global strategies that coarsely influence fixation times and saccade lengths. O'Regan claims that predetermined oculomotor strategies are important in defining the global characteristics of eye movement behavior in reading. In our tasks, the question presented prior to the sentence lists most probably primes expectations and adjusts certain strategies for the forthcoming performance. Also, the states discovered by the dHMM showed similar features across the task types. Therefore, it is possible that an oculomotor strategy optimized for the given tasks could explain the variations in processing states. Other theories have emphasized the role of cognitive control on eye movements. For example, Just and Carpenter (1980) have proposed that eye movements act as direct pointers indicating which word is being processed and for how long.
Also, computational models of reading eye movements, such as E-Z Reader (Reichle et al., 2006; Pollatsek et al., 2006), are based on the assumption that fixation durations, word skipping or regressing are determined by lexical processes. However, the current discussions on the cognitive control theory focus on the decisions of when and where the next saccade is initiated within a single fixation. In contrast, strategic control across fixations has until recently been treated only marginally. In our tasks, the participants could have adjusted their processing states on a moment-to-moment basis according to the current task demands, as proposed by Carver (1990). The finding that the task types differed in the transition sequences between the processing states could support the cognitive control theory. For example, in the question–answer and subjective interest conditions, participants switched more often from the decision state back to the reading state, whereas in the word search condition the sequence was more straightforward, starting from the scanning state and ending in the decision state.

5.2. Future directions

As discussed above, both cognitive and oculomotor theories can explain our results. Therefore further studies, for example combining fMRI and eye tracking, could provide valuable information about the activities that correlate with the processing states reflected in eye movement patterns. For instance, emphasized simultaneous activation in language areas could support the cognitive control theory, whereas stronger correlations with motor activities would indicate that the strategies are determined by oculomotor factors. In spite of the controversial views about the basis of the processes driving eye movements, our results are useful in practical applications. The finding that eye movement patterns differ when different processing demands are
encountered can be used to develop an interactive information search application that learns and adapts to users' goals and intentions. For example, by examining which parts of a search engine's results are read in different states, such as the reading or decision states, it is possible to infer the intentions and interests of the user. On the basis of this information, the system could provide more material that is of potential interest to the user. However, further studies are needed to determine how this kind of proactivity on the system's side can be made most beneficial to users.

For future research, more detailed experiments need to be designed, allowing deeper examination of the findings presented here. For example, it would be of interest to study to what extent the processing states generalize to other cognitive tasks and how individuals differ in switching between processing states.

Acknowledgements

This work was supported by the Academy of Finland, decisions no. 202211 and 202209, Helsingin Sanomain 100-vuotissäätiö, NordForsk and Jenny ja Antti Wihurin Rahasto. Parts of this paper were completed while the first author was employed by the Low Temperature Laboratory at Helsinki University of Technology. Therefore, this work was also supported by the Sigrid Jusélius Foundation and the Academy of Finland National Programme of Excellence 2006–2011. The authors would like to thank Jarkko Venna, Kai Puolamäki, Jukka Hyönä, Samuel Kaski, Kenneth Holmqvist and Erik D. Reichle, together with the anonymous reviewers, for valuable comments and discussions on the manuscript.

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723.
Bengio, Y. (1996). Input/output HMMs for sequence processing. IEEE Transactions on Neural Networks, 7(5), 1231–1249.
Bengio, Y., & Frasconi, P. (1999). Markovian models for sequential data. Neural Computing Surveys, 2, 129–162.
Carpenter, R. H. S., & McDonald, S. A. (2007). LATER predicts saccade latency distributions in reading. Experimental Brain Research, 177(2), 176–183.
Carver, R. (1990). Reading rate: A review of research and theory. San Diego, CA: Academic Press.
Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.
Efron, B., & Tibshirani, R. (1993). An introduction to the bootstrap. New York: Chapman and Hall.
Feng, G. (2003). From eye movement to cognition: Toward a general framework of inference. Comment on Liechty et al., 2003. Psychometrika, 68, 551–556.
Feng, G. (2006). Eye movements as time-series random variables: A stochastic model of eye movement control in reading. Cognitive Systems Research, 7, 70–95.
Findlay, J. M., & Walker, R. (1999). A model of saccade generation based on parallel processing and competitive inhibition. Behavioral and Brain Sciences, 22, 661–721.
Gallinari, P. (1998). Predictive models for sequence modelling, application to speech and character recognition. In C. L. Giles & M. Gori (Eds.), Adaptive processing of sequences and data structures: International summer school on neural networks. Lecture notes in computer science (Vol. 1387, pp. 418–434). Berlin, Germany: Springer-Verlag.
Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2003). Bayesian data analysis (2nd ed.). Boca Raton, FL: Chapman and Hall/CRC.
Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning. New York: Springer.
Hyönä, J., Lorch, R., & Kaakinen, J. (2002). Individual differences in reading to summarize expository text: Evidence from eye fixation patterns. Journal of Educational Psychology, 94, 44–55.
Hyrskykari, A., Majaranta, P., Aaltonen, A., & Räihä, K.-J. (2000). Design issues of iDict: A gaze-assisted translation aid. In Proceedings of eye tracking research and applications (ETRA 2000) (pp. 9–14). ACM Press.
Hyrskykari, A., Majaranta, P., & Räihä, K.-J. (2003). Proactive response to eye movements. In G. W. M. Rauterberg, M. Menozzi, & J. Wesson (Eds.), INTERACT'03. IOS Press.
Just, M., & Carpenter, P. (1980). A theory of reading: From eye fixations to comprehension. Psychological Review, 87(4), 329–354.
Liechty, J., Pieters, R., & Wedel, M. (2003). Global and local covert visual attention: Evidence from a Bayesian hidden Markov model. Psychometrika, 68, 519–541.
McConkie, G. W., & Yang, S.-N. (2003). How cognition affects eye movements during reading. In J. Hyönä, R. Radach, & H. Deubel (Eds.), The mind's eye: Cognitive and applied aspects of eye movement research (pp. 413–427). Amsterdam, The Netherlands: Elsevier.
Miloslavsky, M., & van der Laan, M. J. (2002). Fitting of mixtures with unspecified number of components using cross validation distance estimate. Computational Statistics and Data Analysis, 41, 413–428.
Minka, T. (2000). Estimating a Dirichlet distribution. Unpublished manuscript, available on the Web.
Nádas, A. (1983). A decision theoretic formulation of a training problem in speech recognition and a comparison of training by unconditional versus conditional maximum likelihood. IEEE Transactions on Acoustics, Speech, and Signal Processing, 31(4), 814–817.
O'Regan, J. K. (1990). Eye movements and reading. In E. Kowler (Ed.), Eye movements and their role in visual and cognitive processes (pp. 395–453). Amsterdam, The Netherlands: Elsevier.
O'Regan, J. K. (1992). Optimal viewing position in words and the strategy-tactics theory of eye movements in reading. In K. Rayner (Ed.), Eye movements and visual cognition: Scene perception and reading (pp. 333–354). New York: Springer-Verlag.
Pieters, R., Rosbergen, E., & Wedel, M. (1999). Visual attention to repeated print advertising: A test of scanpath theory. Journal of Marketing Research, 36, 424–438.
Poldrack, R. A. (2006). Can cognitive processes be inferred from neuroimaging data? Trends in Cognitive Sciences, 10, 59–63.
Pollatsek, A., Reichle, E. D., & Rayner, K. (2006). Tests of the E-Z Reader model: Exploring the interface between cognition and eye-movement control. Cognitive Psychology, 52, 1–56.
Povey, D., Woodland, P., & Gales, M. (2003). Discriminative MAP for acoustic model adaptation. In IEEE international conference on acoustics, speech, and signal processing, 2003. Proceedings (ICASSP'03) (Vol. 1, pp. 312–315).
Puolamäki, K., Salojärvi, J., Savia, E., Simola, J., & Kaski, S. (2005). Combining eye movements and collaborative filtering for proactive information retrieval. In G. Marchionini, A. Moffat, J. Tait, R. Baeza-Yates, & N. Ziviani (Eds.), SIGIR'05: Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval (pp. 146–153). New York, NY, USA: ACM Press.
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257–286.
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124(3), 372–422.
Rayner, K., & Fischer, M. H. (1996). Mindless reading revisited: Eye movements during reading and scanning are different. Perception and Psychophysics, 58, 734–747.
Rayner, K., & Pollatsek, A. (1989). The psychology of reading. New Jersey, USA: Prentice-Hall.
Reichle, E. D., Pollatsek, A., & Rayner, K. (2006). E-Z Reader: A cognitive-control, serial-attention model of eye-movement behavior during reading. Cognitive Systems Research, 7, 4–22.
Reilly, R. G., & Radach, R. (2006). Some empirical tests of an interactive activation model of eye movement control in reading. Cognitive Systems Research, 7, 34–55.
Richter, E., Engbert, R., & Kliegl, R. (2006). Current advances in SWIFT. Cognitive Systems Research, 7, 23–33.
Robertson, A. W., Kirshner, S., & Smyth, P. (2004). Downscaling of daily rainfall occurrence over northeast Brazil using a hidden Markov model. Journal of Climate, 17(22), 4407–4424.
Salojärvi, J., Puolamäki, K., & Kaski, S. (2005a). Expectation maximization algorithms for conditional likelihoods. In L. D. Raedt & S. Wrobel (Eds.), Proceedings of the 22nd international conference on machine learning (ICML-2005) (pp. 753–760). New York, USA: ACM Press.
Salojärvi, J., Puolamäki, K., & Kaski, S. (2005b). Implicit relevance feedback from eye movements. In W. Duch, J. Kacprzyk, E. Oja, & S. Zadrozny (Eds.), Artificial neural networks: Biological inspirations – ICANN 2005. Lecture notes in computer science (Vol. 3696, pp. 513–518). Berlin, Germany: Springer-Verlag.
Salojärvi, J., Puolamäki, K., & Kaski, S. (2005c). On discriminative joint density modeling. In J. Gama, R. Camacho, P. Brazdil, A. Jorge, & L. Torgo (Eds.), Machine learning: ECML 2005. Lecture notes in artificial intelligence (Vol. 3720, pp. 341–352). Berlin, Germany: Springer-Verlag.
Schlüter, R., & Macherey, W. (1998). Comparison of discriminative training criteria. In Proceedings of the ICASSP'98 (pp. 493–496).
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6(2), 461–464.
Shimojo, S., Simion, C., Shimojo, E., & Scheier, C. (2003). Gaze bias both reflects and influences preference. Nature Neuroscience, 6(12), 1317–1322.
Vitu, F., O'Regan, K., Inhoff, A. W., & Topolski, R. (1995). Mindless reading: Eye-movement characteristics are similar in scanning letter strings and reading texts. Perception and Psychophysics, 57, 352–364.
Yang, S.-N. (2006). An oculomotor-based model of eye movements in reading: The competition/interaction model. Cognitive Systems Research, 7, 56–69.
Yang, S.-N., & McConkie, G. W. (2005). New directions in theories of eye-movement control during reading. In G. Underwood (Ed.), Cognitive processes in eye guidance (pp. 105–130). Great Britain: Oxford University Press.