1 Introduction

Place cells are principal neurons in the hippocampus which respond maximally when the animal is in a specific location in an environment. They were discovered in the rat hippocampus by O’Keefe & Dostrovsky in 1971 (O’Keefe and Dostrovsky 1971; O’Keefe and Nadel 1978) and have been investigated in numerous studies (for reviews see Eichenbaum et al. 1999; Hölscher 2003). Place fields (PF) form from environmental cues and play an important role in spatial navigation. Cells with properties similar to those of rat place cells have also been found in humans using extracellular recordings from epileptic children (Ekstrom et al. 2003). Thus, the formation of PFs and their influence on navigation remain an important experimental and theoretical question. In particular, little is known about how different sensory cues contribute to PF formation and spatial navigation. The goal of the first part of this study is therefore to investigate how PFs are formed under visual as well as olfactory influences. In the second part, we address the question of how PFs can be used in navigation, and compare this to olfactory navigation based on self-laid scent marks.

1.1 PF formation and their relations to other hippocampal subsystems

Different models have been proposed for hippocampal place cell formation, including Gaussian functions (O’Keefe and Burgess 1996; Touretzky and Redish 1996; Hartley et al. 2000; Foster et al. 2000), the back-propagation algorithm (Shapiro and Hetherington 1993), auto-associative memory (Recce and Harris 1996), competitive learning (Sharp 1991; Brown and Sharp 1995), a neural architecture based on landmark recognition (Gaussier et al. 2002), neuronal plasticity (Arleo and Gerstner 2000; Arleo et al. 2004; Strösslin et al. 2005; Sheynikhovich et al. 2005; Krichmar et al. 2005), independent component analysis (Takács and Lőrincz 2006; Franzius et al. 2007), self-organizing maps (Chokshi et al. 2003; Ollington and Vamplew 2004) and Kalman filters (Bousquet et al. 1998; Balakrishnan et al. 1999). None of these, however, addresses the question of how multiple sensory inputs might affect PF formation. Experiments with rodents demonstrate that visual cues play an important role in the control of place cells (Muller and Kubie 1987; Knierim et al. 1995; Collett et al. 1986; O’Keefe and Speakman 1987; Maaswinkel and Whishaw 1999; Dudchenko 2001). On the other hand, in the absence of visual cues rats can rely on other cues such as olfactory, auditory or somatosensory stimuli (Hill and Best 1981; Carvell and Simons 1990; Maaswinkel and Whishaw 1999; Wallace et al. 2002a). Thus, it seems reasonable to also consider the influence of such cues on the formation of PFs. This view is supported by the observation that PFs become unstable when olfactory cues are removed, suggesting that olfactory cues are important for the formation and stability of PFs (Markus et al. 1994; Save et al. 2000).

Other types of cells related to hippocampal place cells and spatial navigation are head direction cells and grid cells. Head direction cells are found in many brain areas, including the postsubiculum, the thalamus, the lateral mammillary nucleus, the dorsal tegmental nucleus, and the striatum (Taube et al. 1990a, b; Muller et al. 1996; Knierim et al. 1998). Head direction cells respond maximally when the animal’s head is oriented in a preferred direction in the horizontal plane. Like place cells, head direction cells are under the control of distal stimuli and have different preferred directions in different environments. Experimental data suggest that the head direction cell system may orient the place cell system (Jeffery and O’Keefe 1999; Calton et al. 2003; Yoganarasimha and Knierim 2005).

Grid cells are found in the entorhinal cortex (Hafting et al. 2005; Sargolini et al. 2006; Barry et al. 2007). Like place cells, grid cells fire strongly when an animal is in specific locations in an environment, but they differ from place cells in that they have multi-peak firing fields which are organized into a hexagonal grid. It has been suggested that grid cells may make associations between places and events, which are needed for the formation of memories (Hafting et al. 2005).

1.2 Navigation guided by PFs and other influences

Many experimental studies have been performed on goal directed learning in rodents (Barnes et al. 1980; Morris 1984; Prados and Trobalon 1998; Lavenex and Schenk 1998; Maaswinkel and Whishaw 1999; Wallace et al. 2002a; Etienne and Jeffery 2004; Jeffery et al. 2003; Hines and Whishaw 2005). Navigation models based on place cells usually address goal learning by using reinforcement learning algorithms (Arleo and Gerstner 2000; Arleo et al. 2004; Strösslin et al. 2005; Sheynikhovich et al. 2005; Krichmar et al. 2005), where the place cell representation is based on a combination of visual information and information provided by head direction cells or path integration.

Path integration was considered by many researchers as evidence for an additional mechanism when navigating in the absence of visual cues (for a review see Etienne and Jeffery 2004). Experimental data suggests that grid cells may be related to the path integration system (Hafting et al. 2005; Sargolini et al. 2006; McNaughton et al. 2006). However, Save et al. (2000) have shown that path integration alone is not sufficient to maintain stable receptive fields of place cells when rats navigate in the dark. Without additional cues, path integration leads to an accumulation of errors in direction and distance, and it thus needs to be reset through position information from stable cues (Etienne et al. 1996, 2004). In the study of Strösslin et al. (2005) the authors claim that their model is able to work in the dark based on self-motion cues (visual cues together with path integration were used), yet it is unclear how the model can succeed if visual cues used for recalibration are not available while navigating for a longer time in the dark.

Thus, for navigation in natural environments it seems reasonable to consider other sensory inputs, and it is known from the literature that rodents can form spatial representations based on olfactory cues and use this information for spatial orientation and navigation (Tomlinson and Johnston 1991; Lavenex and Schenk 1995, 1996, 1998). Experiments show that rats can track odors or self-generated scent marks to find a food source (Wallace et al. 2002a, 2003). To accommodate these findings, we propose a novel navigation mechanism based on self-marking by odor patches combined with a Q-learning algorithm based on (multi-sensory formed) place cells in order to improve spatial navigation.

Studies show that rats use visual and/or olfactory cues when available, and that such allothetic cues dominate over path integration information (idiothetic components) (Maaswinkel and Whishaw 1999; Whishaw et al. 2001). Therefore, the focus of the current study is on place cell formation and spatial navigation in a cue-rich, illuminated environment, where path integration would be extraneous.

Another interesting consideration concerns the question how navigation is affected by remapping. It is known that PFs change very quickly when the rat is confronted with a new environment and that many PFs will re-obtain their former properties as soon as the animal returns to the initial environment (Muller and Kubie 1987; Wilson and McNaughton 1993; Shapiro et al. 1997; Tanila et al. 1997; Knierim et al. 1995, 1998). It is, however, an unresolved question how remapping affects navigation and navigation (re-)learning (Jeffery et al. 2003).

1.3 Specific questions addressed

In this study, we concentrate on the impact of olfactory cues on place cell formation and on goal navigation learning in different environments. We focus on the following three questions:

  1) What is the contribution of olfactory cues to the formation of place cells and goal navigation?

  2) Can goal navigation based on place cells be improved by additional navigation mechanisms?

  3) How does the remapping of PFs influence goal navigation when switching between different environments?

The paper is organized as follows. First we describe the sensory inputs and the model system. Then we present different goal navigation strategies and thereafter we show the results of place cell analysis, and a comparison of the presented navigation algorithms. Finally, we discuss our results and relate them to other studies and biological data.

2 Methods

2.1 Sensory inputs

We use a square box with dimensions of 10000×10000 points where the walls of the arena are marked by different landmarks (see Fig. 1(a)). Visual and olfactory cues are used as allothetic inputs to the place cells in our model. As visual input, we use the perpendicular distances from the rat’s position to all four walls, similar to many other models which use distances to walls or landmarks (Sharp 1991; Recce and Harris 1996; O’Keefe and Burgess 1996; Touretzky and Redish 1996; Hartley et al. 2000; Ollington and Vamplew 2004). Let us define the visual input by \(v_{x,y}^k\), where x and y denote the position in the environment and k = 1...4 indexes the visual inputs related to the four walls of the arena. In our model the rat has a view-field of 180 degrees (real rats have a wider field of view), which means that the rat can see only the walls which are ahead, but cannot see what is behind. The distance to a non-visible wall is predicted by taking the last estimate of the distance to that wall from when it was visible. This can be described by the following recurrent equation:

$$v^j_{x,y}(t) = v^j_{x,y}(t-1),$$

where j denotes the index of the non-visible wall, and t denotes the time in steps. Note that if the rat is moving along a linear trajectory away from a non-visible wall then the error of the estimate of this wall accumulates over time. The estimate is re-calibrated as soon as the wall becomes visible again.
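The hold-last-estimate rule above can be illustrated with a short Python sketch. This is not the implementation used in this study. The function names, the wall ordering (left, right, bottom, top) and the visibility test (a wall counts as visible when the direction toward it lies within 90 degrees of the heading) are assumptions made for illustration only.

```python
import math

L = 10000  # arena side length in points (from the text)

# Unit vectors pointing from the rat toward the left, right, bottom and top walls.
# The ordering of the walls is an assumption of this sketch.
WALL_DIRECTIONS = [(-1.0, 0.0), (1.0, 0.0), (0.0, -1.0), (0.0, 1.0)]


def wall_distances(x, y):
    """Perpendicular distances to the left, right, bottom and top walls."""
    return [x, L - x, y, L - y]


def update_visual_input(prev_v, x, y, heading):
    """Return v(t): fresh distances for visible walls, v(t-1) for hidden ones.

    Assumption: a wall is visible when the direction toward it has a positive
    projection onto the heading (a 180-degree view field).
    """
    hx, hy = math.cos(heading), math.sin(heading)
    true_d = wall_distances(x, y)
    v = []
    for k, (wx, wy) in enumerate(WALL_DIRECTIONS):
        visible = hx * wx + hy * wy > 1e-9
        v.append(true_d[k] if visible else prev_v[k])
    return v
```

For example, a rat that moves east from the arena center keeps its old estimate of the distance to the left wall, so the error of that estimate grows with every step until the rat turns around and the wall becomes visible again.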

Fig. 1

Environmental and neuronal setup of the system. (a) Image of square arena with landmarks. Perpendicular distances from rat’s position (gray dot) to all four walls of square arena are used as visual stimuli. (b) Examples of odors used as olfactory stimuli to the rat. Five examples (Ex. 1–Ex. 5) are shown where each box represents a different odor coming from a different location in the environment. (c) A simple feed-forward network with sensory inputs x at the input layer, connection weights w and place cells (PC) at the output layer. (d) Distribution of initial weights of the neural network (c)

We also use four different odors as an additional input to the place cells. Five examples of odors are shown in Fig. 1(b), where each box represents a different odor with a different source location in the environment. We model our odors at the ground level (2D space) by the following Gaussian functions:

$$\begin{array}{rll} o^k_{x,y} &=& e^{-\left( \frac{\left[a(x-s^k_x+\xi_x^k)\right]^2}{2\sigma_y^2} + \frac{\left[a(y-s^k_y+\xi_y^k)\right]^2}{2\sigma_x^2}\right)},\\ \sigma_x &=& 15+a x+5\sin(0.1 a x),\\ \sigma_y &=& 15+a y-5\sin(0.1 a y), \end{array}$$

where x and y denote the position in the environment, k = 1...4 is the index of the odor source, and a = 0.01 is the scaling factor. The variables \(s^k_{x,y}\) denote the coordinates of the center (maximum intensity) of the odor source and are given as follows: \(s^1_{x,y}=[100,100]\), \(s^2_{x,y}=[9900,100]\), \(s^3_{x,y}=[9900,9900]\) and \(s^4_{x,y}=[100,9900]\). Values \(\xi_x^k\) and \(\xi_y^k\) are randomly drawn from a Gaussian distribution with zero mean and a standard deviation of 100. Note that here we model static odors that do not change during different runs of the same experiment but differ across experiments. The rat can smell the odors locally, and it does not sense the direction of the odor source. Noise is also added to the visual sensory inputs, assuming that the rat makes larger errors in the estimation of long distances. Similarly, the rat makes larger errors in estimating odors with low intensity and smaller errors for odors with high intensity. This is given by the following equations:

$$\begin{array}{rll}V^k_{x,y} &=& \left[v^k_{x,y} + 0.03v^k_{x,y}\eta_v^k\right]/L,\\\\ O^k_{x,y} &=& \left[o^k_{x,y} + 0.03(1-o^k_{x,y}) \eta_o^k\right]/M^k, \end{array}$$

where \(\eta_v^k\) and \(\eta_o^k\) are random values from a uniform distribution within the interval [-1;1]. Note that both visual and olfactory inputs are normalized and bounded within the interval [0;1], where L = 10000 points is the size of the environment, and \(M^k = \max_{x,y} \ o^k_{x,y}\) is the maximal intensity of the k-th odor source.
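As an illustration of the odor model and the sensory noise, the following Python sketch evaluates the odor intensity at a given position and applies the noise and normalization steps. It is a sketch under stated assumptions, not the code used in this study: the function names are hypothetical, the normalization constant \(M^k\) is passed in as a parameter instead of being computed over the whole arena, and the odor noise is taken to scale with (1 − o), following the verbal description that weak odors are estimated less precisely.

```python
import math
import random

A = 0.01    # scaling factor a (from the text)
L = 10000   # arena size in points (from the text)
SOURCES = [(100, 100), (9900, 100), (9900, 9900), (100, 9900)]  # s^1 ... s^4


def odor_intensity(x, y, source, xi=(0.0, 0.0)):
    """Odor intensity o^k at (x, y) for one source, with source jitter xi."""
    sigma_x = 15 + A * x + 5 * math.sin(0.1 * A * x)
    sigma_y = 15 + A * y - 5 * math.sin(0.1 * A * y)
    dx = A * (x - source[0] + xi[0])
    dy = A * (y - source[1] + xi[1])
    # As in the equation, the x term is divided by sigma_y and the y term by sigma_x.
    return math.exp(-(dx**2 / (2 * sigma_y**2) + dy**2 / (2 * sigma_x**2)))


def noisy_visual(v, rng):
    """Normalized visual input V: noise grows with the distance v."""
    return (v + 0.03 * v * rng.uniform(-1, 1)) / L


def noisy_odor(o, rng, m=1.0):
    """Normalized odor input O: noise grows as the intensity o decreases.

    Assumption: m stands in for M^k, the maximal intensity of the source.
    """
    return (o + 0.03 * (1 - o) * rng.uniform(-1, 1)) / m
```

With zero jitter, the intensity equals 1 exactly at the source location and decays with distance, and an odor at full intensity is sensed without noise.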

2.2 Place cell model

We model place cells by using a simple feed-forward network with an input and an output layer as shown in Fig. 1(c). At the input layer we have sensory inputs \(X: [V_{x,y}^k,O_{x,y}^k]\) received from visual and olfactory stimuli. Here we have a fully-connected network where every neuron in the input layer is connected to every neuron in the output layer via connection weights \(W^i = [w_{i,1} \ldots w_{i,n}]\), where i = 1...N, N = 500 is the total number of place cells and n is the number of sensory inputs (n = 4 if only visual cues are used and n = 8 if both visual and olfactory cues are used). Weights are initialized randomly by a function \(f_z\):

$$f_z=\left(1+e^{\frac{z-m}{2\sigma^2}}\right)^{-1},$$

where z is a random number from a uniform distribution within the interval [0;1], m = 0.5 and σ = 0.2. The distribution of initial weights is plotted in Fig. 1(d). We have chosen such a distribution for the reason that if the weights are initialized according to a uniform distribution then all PF centers are located around the center of the environment and we do not obtain PFs close to the walls of the environment. In our model weights are basis vectors, which are used to compute firing rates of place cells (see equation below) where we start with random initialization of basis vectors. By employing competitive learning, cells become tuned to a specific input, which leads to the spatial selectivity of the place cells.
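The effect of this initialization can be checked with a few lines of Python. The sketch below is illustrative only and the function names are hypothetical. It shows that uniform samples z are pushed toward the ends of the interval [0;1], which is what places initial PF centers near the walls as well as in the middle of the arena.

```python
import math
import random

M_CENTER = 0.5  # m (from the text)
SIGMA = 0.2     # sigma (from the text)


def initial_weight(z):
    """Map a uniform sample z in [0, 1] to an initial weight f_z."""
    return 1.0 / (1.0 + math.exp((z - M_CENTER) / (2 * SIGMA**2)))


def init_weights(n_cells, n_inputs, rng):
    """Draw an n_cells x n_inputs weight matrix as a list of lists."""
    return [[initial_weight(rng.random()) for _ in range(n_inputs)]
            for _ in range(n_cells)]
```

A call such as `init_weights(500, 8, random.Random(0))` would give one weight vector per place cell for the combined visual and olfactory case.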

The firing rate of place cell i is expressed by a Gaussian function (similar to O’Keefe and Burgess 1996; Hartley et al. 2000) and is computed as follows:

$$r_t^i=e^{- \frac{ \left[ \frac{1}{n} ||X_t-W^i_t|| \right]^2 }{2\sigma_f^2}},$$

where \(\sigma_f = 0.07\) defines the width of the PF, n is the dimension of the input space, and the norm is the Euclidean distance. Weights of our neural network are modified according to a winner-takes-all mechanism where we change only the weights of the best matching unit \(\beta_t\):

$$\beta_t= \arg \min\nolimits_{i} \big|\big|X_t-W^i_t\big|\big|.$$

Weights of the winner neuron \(\beta_t\) are changed according to the following equation:

$$W_{t+1}^{\beta_t}=W_t^{\beta_t} + \mu \big(X_t-W_t^{\beta_t}\big),$$

where 0 < μ ≪ 1 is the rate factor.
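The firing-rate computation and the winner-takes-all update can be sketched in Python as follows. This is a minimal illustration of the three equations above, with hypothetical function names and plain lists in place of an optimized array representation.

```python
import math

SIGMA_F = 0.07  # width of the PF (from the text)


def distance(x, w):
    """Euclidean distance between an input vector and a weight vector."""
    return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))


def firing_rate(x, w):
    """Gaussian firing rate of one place cell for input x and weights w."""
    n = len(x)
    return math.exp(-((distance(x, w) / n) ** 2) / (2 * SIGMA_F**2))


def competitive_step(x, weights, mu=0.01):
    """Move only the best matching unit toward x and return its index."""
    winner = min(range(len(weights)), key=lambda i: distance(x, weights[i]))
    weights[winner] = [wi + mu * (xi - wi)
                       for xi, wi in zip(x, weights[winner])]
    return winner
```

Repeated calls to `competitive_step` along an exploration trajectory gradually tune each winning cell to the inputs it wins, which is the source of the spatial selectivity described above.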

2.3 Navigation strategies

2.3.1 Closed loop context

Before presenting the details of navigation strategies, we stress that we are dealing with a closed loop system (Fig. 2(a)). We create place cells from allothetic visual and olfactory cues. Place cells are connected to motor neurons, which produce certain motor actions. The rat has to learn appropriate motor actions, which eventually lead to the food source. As a consequence, sensory inputs as well as place cells are affected whenever the rat navigates in the environment, thus closing the loop as shown in Fig. 2(a).

Fig. 2

(a) Schematic diagram of the closed loop scenario. (b) Environmental setup of the goal navigation task. We used a discrete square arena with dimensions of 10000×10000 points and a goal (food source) with dimensions of 2000×2000 points. The starting position of the rat was 1000 points from both left and bottom walls, whereas the location of the food source was 3000 points from the left wall and 2000 points from the upper wall. (c) Neuronal setup of our rat’s navigation system. Each place cell in the network is connected to eight motor neurons (eight directions). The rat makes a movement to the direction which has the strongest connection between place cells and motor neurons for eight directions averaged over all cells, which are firing at the present location. The rat makes a random movement whenever the connection weights are zero at the present location

2.3.2 Goal navigation task

The rat has to learn to navigate from its home location to the goal, i.e. the food source. The rat can use the allothetic visual and olfactory cues described above but it cannot see or smell the food source (similar to the Morris water-maze task; Morris 1984). The rat gets a reward only when it approaches the goal location. The setup for such a spatial task is shown in Fig. 2(b). We use the same discrete environment (square box) as described above, where we have different landmarks on all four walls (see Fig. 1(a)). The home location of the rat is in the bottom-left corner, 1000 points from both walls, and is marked by a gray dot. The dimensions of the food source, marked by a square, are 2000×2000 points and it is located 3000 points from the left wall and 2000 points from the upper wall. At the beginning, the rat explores the environment randomly and finds the goal just by chance (dashed line), whereas after a few learning runs the rat finds a more or less direct path to the food source. Whenever the rat finds the food location we start a new run from the start position (home location). A maximum number of 200 steps is allowed for one run with a step size in the range of 400-600 points. In our model, during the first run the rat finds the goal within less than 200 steps in most cases (80%), so the rat has enough time to find the goal even when navigating randomly. Another reason for the 200 step limit is related to the frustration phenomenon observed in animals, where creatures return to “home-base” if the goal is not found within an expected time (Eilam and Golani 1989; Whishaw et al. 2001; Wallace et al. 2002b; Hines and Whishaw 2005; Nemati and Whishaw 2007).

2.3.3 Q-learning with function approximation

As a first approach we apply reinforcement learning (Sutton and Barto 1998) as used by other studies on hippocampus-based navigation (Arleo and Gerstner 2000; Arleo et al. 2004; Foster et al. 2000; Strösslin et al. 2005; Krichmar et al. 2005). Here we employ a version of Q-learning with function approximation similar to Reynolds (2002). The algorithm is implemented by a two layer neural network (see Fig. 2(c)) where we have place cells as inputs to the network. The place cells are connected to motor neurons representing eight directional cells: north (N), north-east (NE), east (E), south-east (SE), south (S), south-west (SW), west (W) and north-west (NW). The actual direction of movement is determined by the maximum Q-value of the eight possible directions averaged over all cells, which are firing at the present location, with additional noise. For example the horizontal movements W or E are given by the following simple equations:

$$\begin{array}{rll} \Delta x &=& \pm(\Delta s+b \cdot \eta_x), \\ \Delta y &=& b \cdot \eta_y, \end{array}$$

where Δs = 500 is the step size, \(\eta_x\) and \(\eta_y\) are random values from a uniform distribution within the interval [-1;1], and b = 100 is the amplitude of the noise. Here we use the minus sign for the W direction and the plus sign for the E direction. Similarly, for SW or NE we have:

$$\begin{array}{rll} \Delta x &=& \pm\left(\frac{\Delta s}{\sqrt{2}}+b \cdot \eta_x\right), \\ \Delta y &=& \pm\left(\frac{\Delta s}{\sqrt{2}}+b \cdot \eta_y\right), \end{array}$$

and the equivalent for the other directions. The rat makes a random movement whenever Q-values are zero at the present location. In this case, the rat keeps the direction of the movement with a probability of \(1 - p_r\), whereas with probability \(p_r = 0.25\) it randomly takes a new direction. When Q-values are non-zero we use a usual RL strategy with exploration and exploitation, where the direction of the movement is chosen according to the learned Q-values most of the time (exploitation probability \(1 - p_e\)), and a random move is made with exploration probability \(p_e = 0.1\).
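The movement and action-selection rules can be summarized in a short Python sketch. It is a simplified illustration and not the code used in this study: the names `displacement` and `choose_direction` are hypothetical, the Q-values are passed in as a dictionary keyed by direction, and independent noise samples are assumed for the x and y components of every step.

```python
import math
import random

DELTA_S = 500  # step size (from the text)
B = 100        # noise amplitude b (from the text)
P_R = 0.25     # probability of a new random direction when Q-values are zero
P_E = 0.1      # exploration probability when Q-values are non-zero

# Sign of the x and y component for each of the eight motor directions.
DIRECTIONS = {"N": (0, 1), "NE": (1, 1), "E": (1, 0), "SE": (1, -1),
              "S": (0, -1), "SW": (-1, -1), "W": (-1, 0), "NW": (-1, 1)}


def displacement(direction, rng):
    """Noisy step (dx, dy) for one of the eight directions."""
    sx, sy = DIRECTIONS[direction]
    base = DELTA_S / math.sqrt(2) if sx and sy else DELTA_S
    eta_x, eta_y = rng.uniform(-1, 1), rng.uniform(-1, 1)
    dx = sx * (base + B * eta_x) if sx else B * eta_x
    dy = sy * (base + B * eta_y) if sy else B * eta_y
    return dx, dy


def choose_direction(q_values, previous, rng):
    """Pick a direction from Q-values (dict: direction name -> Q-value)."""
    names = list(DIRECTIONS)
    if all(q == 0 for q in q_values.values()):
        # No knowledge yet: mostly keep going, sometimes turn randomly.
        return rng.choice(names) if rng.random() < P_R else previous
    if rng.random() < P_E:
        return rng.choice(names)              # exploration
    return max(q_values, key=q_values.get)    # exploitation
```

For the cardinal directions this yields step lengths of 400 to 600 points along the direction of travel, matching the step size range stated for the goal navigation task.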

As mentioned before, the learning mechanism from place cells to motor cells is a version of Q-learning with function approximation. Let us define our basis functions \(\Phi_i\) as a function of the firing rate \(r_t^i\) of the place cell i at the time step t:

$$ \Phi_i(r_t)= \left\{\begin{array}{ll} 1 &{\kern6pt} \mathrm{if}~r_t^i>0.5,\\ 0 &{\kern6pt} \mathrm{otherwise}. \end{array} \right. $$

Here, i = 1...N, where N = 500 is the total number of place cells. Note that we discretize the space representation provided by the place cells prior to the goal-navigation learning in order to reduce the amount of noise in the PF system, since low firing rates give larger errors in position estimation compared to the real position of the rat in the environment. By using binary cells we still get different PF sizes and we preserve the directionality of place cells.

We define the action-value function by the following equation:

$$ Q(r_t,a_t)=\frac{\sum_i \Theta_{i,a_t} \Phi_i(r_t)}{\sum_i \Phi_i(r_t)}, $$

where \(\Theta_{i,a}\) is the weight from the i-th place cell to the motor action a. In the given equation we sum over all basis functions, but at a specific location within the environment only a specific subset of basis functions will be non-zero. We use an averaging Q-learning rule according to Reynolds (2002) where we update weights \(\Theta_{i,a_t}\) of the actually taken action \(a_t\) at the time step t according to the following learning rule:

$$ \Theta_{i,a_t}=\Theta_{i,a_t}+ \alpha\left(R_{t+1}+\gamma \max\nolimits_a Q(r_{t+1},a_{t+1})-\Theta_{i,a_t}\right)\Phi_i(r_t), $$

where α = 0.7 is the learning rate, γ = 0.7 is the discount factor and R is a reward. We define our reward function \(R_t\) by

$$ R_t= \left\{\begin{array}{ll} 1 &{\kern6pt} \mathrm{if~the~rat~has~found~the~goal},\\ 0 &{\kern6pt} \mathrm{otherwise}. \end{array} \right. $$
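A compact Python sketch of the binary basis functions, the averaged action-value function and the update rule is given below. It is meant as an illustration of the equations, not as the implementation used in this study. The function names are hypothetical, actions are indexed 0 to 7, and returning Q = 0 when no cell is active is an assumption made to avoid a division by zero.

```python
ALPHA = 0.7    # learning rate (from the text)
GAMMA = 0.7    # discount factor (from the text)
N_ACTIONS = 8  # eight motor directions


def basis(rates):
    """Binary basis functions: 1 for cells firing above 0.5, else 0."""
    return [1 if r > 0.5 else 0 for r in rates]


def q_value(theta, phi, action):
    """Average of the weights theta[i][action] over all active cells."""
    active = sum(phi)
    if active == 0:
        return 0.0  # assumption: no active cell means no learned value
    return sum(theta[i][action] * phi[i] for i in range(len(phi))) / active


def q_update(theta, phi, action, reward, phi_next):
    """Update the weights of the taken action for all active cells."""
    best_next = max(q_value(theta, phi_next, a) for a in range(N_ACTIONS))
    target = reward + GAMMA * best_next
    for i, is_active in enumerate(phi):
        if is_active:
            theta[i][action] += ALPHA * (target - theta[i][action])
```

Starting from zero weights, a single rewarded step sets the weights of the cells active before the step to α = 0.7 for the taken action, and later unrewarded steps into that region propagate a discounted value backwards.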

2.3.4 Self-marking navigation

The second approach in our study is to use navigation based on self-generated odor marks, where the rat follows the self-laid scent marks to find the food source. The rat always explores the environment randomly by keeping the direction of the movement whenever it does not smell anything locally. Note that the rat can smell only within a given radius of 600 points, which corresponds to the maximum step size. At the beginning, the rat finds the food source by moving randomly and marks it by a small amount of scent. In the next run/runs, when the rat approaches the previously laid scent mark within a distance at which the rat can smell it, the rat will mark its location and then will go directly to the perceived scent mark and remark it again by another small amount of scent. The whole navigational process can be defined as follows. The rat marks the location of the food source or remarks the current location if it smells another scent mark/marks ahead by

$$ u_{t+1}^{x,y}=u_{t}^{x,y}+\Delta u, $$

where u defines the self-laid odor patches in the environment, x,y define coordinates of the position within the environment and Δu = 0.005. The locations which have strong smell, i.e. \(u_{t}^{x,y}=1\), are not remarked any more. The rat goes directly to the location \(l_{t}^{x,y}\) marked by scent mark which has the strongest smell according to

$$ l_{t}^{x,y}=\arg \max\nolimits_{x,y} u_{t}^{x,y}, $$

otherwise it makes a random movement as explained above. It is worth noting that the given method propagates scent-marks backwards from the location of the reward as in reinforcement learning, but here we do not have predefined features. Instead, we create them “on the fly”, and we do not directly memorize action values associated to states, where a state is defined by the rat’s position in the environment x,y. In our model self-laid scent marks are modeled by little “drops” which are less intense relative to the environmental odors which may have very strong odor sources and diffuse within the environment. Self-generated odor marks can be smelled and distinguished by the rat only locally within a relatively small radius (in our case within one step size).
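The marking and following rules can be illustrated by the Python sketch below. It rests on several simplifying assumptions that are not part of the model description: scent marks are stored on a hypothetical grid with a cell size of 100 points, the rat heads for the center of the strongest marked cell within the smelling radius, the mark under the rat itself is ignored when searching for another mark, and the restriction to marks lying ahead of the rat is omitted. All function names are hypothetical.

```python
import math

SMELL_RADIUS = 600  # smelling radius in points (from the text)
DELTA_U = 0.005     # amount of scent per marking (from the text)
CELL = 100          # hypothetical grid resolution used to store marks


def cell_of(x, y):
    """Grid cell that contains the position (x, y)."""
    return (int(x // CELL), int(y // CELL))


def deposit(marks, x, y):
    """Add scent at (x, y) unless the location is already saturated."""
    key = cell_of(x, y)
    if marks.get(key, 0.0) < 1.0:
        marks[key] = min(1.0, marks.get(key, 0.0) + DELTA_U)


def strongest_mark_in_range(marks, x, y):
    """Return (mx, my, u) of the strongest other mark in range, or None."""
    own, best = cell_of(x, y), None
    for (cx, cy), u in marks.items():
        if (cx, cy) == own:
            continue  # assumption: only marks other than the rat's own count
        mx, my = (cx + 0.5) * CELL, (cy + 0.5) * CELL
        in_range = math.hypot(mx - x, my - y) <= SMELL_RADIUS
        if in_range and (best is None or u > best[2]):
            best = (mx, my, u)
    return best


def self_marking_step(marks, x, y):
    """Mark the current location and return the target, or None if no scent."""
    target = strongest_mark_in_range(marks, x, y)
    if target is None:
        return None  # the caller falls back to a random movement
    deposit(marks, x, y)
    return target[0], target[1]
```

In such a sketch the food location would be marked with `deposit` whenever the goal is reached, so that marks spread backwards from the goal over successive runs.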

2.3.5 Combining Q-learning with self-marking navigation

The third and the last approach is a combination of the two previously described methods. In this case the rat marks the location only if it smells another scent mark/marks and the normalized maximum Q-value at this location obtained by using the first method has reached a given threshold of λ = 1.5:

$$ u_{t+1}^{x,y}= \left\{\begin{array}{ll} u_{t}^{x,y}+\Delta u &{\kern6pt} \mathrm{if}~\frac{\max_a Q(r_{t},a_t)}{\frac{1}{8}\sum_a Q(r_{t},a_t)}>\lambda,\\[4pt] u_{t}^{x,y} &{\kern6pt} \mathrm{otherwise}. \end{array} \right. $$

The action in the combined strategy is taken by the following rule. If the rat does not smell any scent mark within the given radius then it takes an action according to the Q-values, otherwise the rat follows the scent gradient. By using this type of navigation the rat develops Q-values and lays scent marks at the same time.
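The gating of scent marking by the Q-values can be written down in a few lines of Python. The sketch below is illustrative and uses hypothetical function names. Treating a location whose Q-values are all zero as below threshold, and keeping the saturation limit of 1 for the scent intensity, are assumptions of this sketch.

```python
LAMBDA = 1.5     # threshold on the normalized maximum Q-value (from the text)
DELTA_U = 0.005  # amount of scent per marking (from the text)


def normalized_max_q(q_values):
    """Maximum Q-value divided by the mean over the eight actions."""
    mean_q = sum(q_values) / len(q_values)
    # Assumption: with all Q-values at zero the ratio is defined as 0.
    return max(q_values) / mean_q if mean_q > 0 else 0.0


def updated_mark(u, q_values, smells_other_mark):
    """Scent intensity at the current location after one step."""
    if smells_other_mark and u < 1.0 and normalized_max_q(q_values) > LAMBDA:
        return min(1.0, u + DELTA_U)
    return u


def select_mode(smells_other_mark):
    """Follow the scent when a mark is in range, else act on Q-values."""
    return "follow_scent" if smells_other_mark else "use_q_values"
```

A location where one direction clearly dominates, for example Q-values of 0.7 for one action and 0 for the others, has a normalized maximum of 8 and is therefore marked, whereas a location with equal Q-values for all actions has a ratio of 1 and is left unmarked.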

2.4 Remapping and navigation

It is known from the literature that PFs can change in firing rate, position, shape, or turn on/off when the animal is exposed to different environments, a phenomenon which is called remapping (Muller and Kubie 1987; Wilson and McNaughton 1993; Shapiro et al. 1997; Tanila et al. 1997; Knierim et al. 1995; Knierim et al. 1998). Fundamental changes occur within 5-10 minutes of exploration in a new environment, whereas the firing rate can change even within the first second (Wilson and McNaughton 1993). In this study we also investigate how remapping of place cells affects goal navigation task when the rat switches between different environments. We compare different navigation strategies with respect to change of environmental cues, as well as to a change of the goal location.

To look at the remapping of place cells, we first let the rat explore randomly the whole environment “A” for 5000 time steps. Environment “A” contains visual and olfactory cues as shown in Fig. 3, as already used in the previously described experiments. Afterward the rat is exposed to another environment, “B”, for 5000 time steps (see panels a and b). In our model we use the same visual landmarks and the same odors for both environments “A” and “B”. In order to change the environment we switch the landmarks and change the locations of odor sources. Landmarks are used by the rat in order to distinguish between the four walls and to estimate distance to them. When we switch landmarks the rat gets different estimates of distances to the walls marked by the same landmark when being at the same position in the environments “A” and “B”. The rat also gets different odor intensity at the same position in the environment “A” compared to the environment “B”. After exploration in the environment “B” the rat was moved back to the familiar environment “A”.

Fig. 3

(a) Images of different environmental setups. Landmarks are switched in the environment “B” as compared to the original environment “A” whereas in the environment “C” allothetic cues as well as the location of the goal are changed. (b) Change of olfactory cues. The locations of odor sources are changed in the environment “B/C” as compared to the environment “A”

To compare Q-learning based on PFs obtained from combined visual and olfactory stimuli with the combination of Q-learning with the navigation based on self-generated odor marks we perform different sets of experiments. In the first set of experiments, we switch between two environments “A” and “B”, changing only environmental cues and keeping the location of goal unchanged (see Fig. 3(a)). In the second set of experiments, we switch between the environment “A” and “C”, and in “C” the environmental cues as well as the location of the food source are changed.

3 Results

3.1 Place cell analysis

Examples of PFs after random exploration over 5000 time steps are presented in Fig. 4. PFs obtained when using visual or olfactory cues alone are shown in panels a and b. PFs obtained from both visual and olfactory cues are shown in panel c. Here we show only selected PFs which have a maximum firing rate r > 0.5. Resulting PFs are localized, can differ in size and firing rate, and are similar to real PFs. For examples of PFs obtained from the rodent hippocampus see Wilson and McNaughton (1993), O’Keefe (1999).

Fig. 4

Examples of PFs (100 out of a total of 500 cells). (a) PFs obtained when using visual cues alone. (b) PFs obtained when using olfactory cues alone. (c) PFs obtained when using both, visual and olfactory, cues. Selected PFs with the maximum firing rate r > 0.5 are shown for each case

The distribution of firing rates is shown in Fig. 5(b), where we have fewer cells with a high firing rate than cells with a low firing rate, which resembles experimental data (Hartley et al. 2000). Some of the cells which are silent in a specific environment become active when moved to the other environment (see Fig. 11(a)). PF centers from a single experiment (location of maximum firing rate within the field) are shown in Fig. 5(a), where circles represent centers of PFs with a low firing rate (r ≤ 0.5) and dots those with a high firing rate (r > 0.5). We observed that cells with low firing rate are distributed around the center of the environment (similar to Gaussian distribution, panel c) whereas cells with high firing rate are evenly distributed within the whole environment (see panel d). The latter cells will drive the learning in the goal navigation task (see Section 3.3).

Fig. 5

(a) Distribution of PF centers within the environment from single experiment. Dots denote centers of PFs with maximum firing rate r > 0.5 whereas circles denote centers of fields with maximum firing rate r ≤ 0.5. (b) Distribution of maximum firing rates r of 500 cells; average and standard deviation (SD) for 100 experiments. (c, d) Distribution of x and y position of place cell centers with maximum firing rate r ≤ 0.5 (c) and r > 0.5 (d); average and standard deviation (SD) for 100 experiments. (e) Example of the rat’s trajectory when the rat explores the environment randomly. (f) Percentage of omnidirectional cells before and after learning (rate factor μ = 0.01). The average together with confidence intervals (95%) is shown in 20 experiments. (g) Connection weights between input neurons and place cells (see Fig. 1(c)) depending on the rate factor μ

Before looking at the comparison of goal navigation strategies we would like to investigate the contribution of the olfactory input to place cell formation. This influence can be assessed by measuring the directionality of place cells. For this investigation, we let the rat to explore the environment randomly as shown in Fig. 5(e) for 5000 time steps (development phase). For comparison we used a relatively low rate factor (μ = 0.01) to develop connection weights between an input and an output layer (see Fig. 1(c)), because weights oscillate and do not converge when a high rate factor (μ = 0.1) is used, and this does not lead to the final stabilization of place cells. For comparison of weight development for different rate factors see Fig. 5(g). After the development phase we let the rat move in the environment for another 5000 time steps to create test data. To evaluate the directionality of place cells we looked at the locations which had been passed by the rat in different directions. We say that a cell is omnidirectional, i.e. independent of the movement direction, if at a given location the cell fires with its highest firing rate regardless of crossing the location in different directions. Averaged results of 20 experiments are presented in Fig. 5(f) where we compare the directionality of place cells obtained from visual cues alone with that obtained from both visual and olfactory stimuli. The white bars show the control case, with place cell directionality before the development phase (i.e. before learning). We can see that we obtain more omnidirectional cells when we use combined stimuli compared to visual stimuli alone and more omnidirectional cells develop during the development phase compared to control case. The improvement in omni-directionality when using olfactory cues can be explained by the fact that perception of olfactory cues is direction independent whereas perception of visual cues depends on local views. 
Note that the view-field influences the directionality of PFs: the larger the view-field, the fewer directional cells are obtained. Since rats do not have an omnidirectional view, visual information alone would still yield more directional cells than combined stimuli (visual and olfactory cues) or olfactory cues alone. Our results on place cell directionality are qualitatively similar to the experimental data of Battaglia et al. (2004). For further discussion of place cell directionality see Section 4.
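The omnidirectionality criterion described above can be illustrated with a short Python sketch. This is only one possible reading of the criterion, not the analysis code used in this study: the sample format, the discretization of locations and headings, the tolerance `tol` and the minimum number of headings `min_headings` are all assumptions introduced for illustration.

```python
from collections import defaultdict

def omnidirectional_fraction(samples, tol=0.1, min_headings=2):
    """Fraction of cells whose peak response does not depend on heading.

    samples: iterable of (cell_id, location, heading, rate) tuples taken
    from the test phase; location and heading are assumed to be discretized.
    """
    # Strongest response of each cell at each location, per heading.
    by_cell_loc = defaultdict(dict)
    for cell, loc, heading, rate in samples:
        rates = by_cell_loc[(cell, loc)]
        rates[heading] = max(rate, rates.get(heading, 0.0))

    # For each cell keep the multiply-crossed location with the highest rate.
    best = {}
    for (cell, loc), rates in by_cell_loc.items():
        if len(rates) < min_headings:
            continue  # location crossed in too few directions to judge
        peak, low = max(rates.values()), min(rates.values())
        if peak > 0.0 and (cell not in best or peak > best[cell][0]):
            best[cell] = (peak, low)

    if not best:
        return 0.0
    # Assumed criterion: the weakest heading stays within tol of the peak.
    omni = sum(1 for peak, low in best.values() if peak - low <= tol * peak)
    return omni / len(best)
```

With this reading, a cell that fires at nearly the same rate when its field center is crossed heading north and heading east counts as omnidirectional, while a cell that fires strongly in only one of the two headings does not.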

3.2 Goal navigation

3.2.1 Comparison of different navigation strategies

Before presenting a statistical analysis of the different navigation strategies, we compare them using examples of single experiments. An example of navigation by Q-learning based on place cells formed from combined visual and olfactory cues is presented in Fig. 6(a) and (b). Trajectories of the rat’s paths obtained from 30 runs are shown in panel a, and the number of steps needed to reach the goal versus the number of runs is plotted in panel b. The rat found a more or less straight path to the goal after seven trials. Results for self-marking navigation are shown in Fig. 6(c–e). Trajectories of the rat’s paths obtained from 60 runs are presented in panel c. The environment with self-laid scent marks (marked as dots) is shown in panel d, where the dot’s size is proportional to the strength of the scent mark. The rat follows the scent gradient to find the food source. The number of steps needed to reach the goal versus the number of runs is plotted in panel e; after 56 runs the rat had generated a trail of scent marks leading from the home location to the food source (see the last four trajectories in panel c). One example of navigation with the combined strategy is shown in Fig. 6(f–h), where trajectories of the rat’s paths obtained from 30 runs are presented in panel f and the number of steps needed to reach the goal versus the number of runs in panel h. In this experiment the rat found a more or less straight path to the food source after only five runs. In this example, scent marks (panel g) are laid only along the way to the food source, whereas in the previous example of self-marking navigation the scent marks (see Fig. 6(d)) are spread widely through the environment.

Fig. 6

Results from single experiments using different navigation strategies to find a goal. (a, b) Q-learning based on place cells obtained from visual and olfactory cues. (a) Trajectories of the rat’s paths from 30 runs. (b) Number of steps needed to reach the goal versus number of runs. (c–e) Self-marking navigation based on scent marks. (c) Trajectories of the rat’s paths from 60 runs. (d) Environment with self-laid scent marks (marked by dots), where larger dots represent stronger scent; the rat follows the trail of scent marks to find the goal. (e) Number of steps needed to reach the goal versus number of runs. (f–h) Q-learning based on place cells obtained from visual and olfactory cues combined with self-marking navigation. (f) Trajectories of the rat’s paths from 30 runs. (g) Environment with self-laid scent marks (marked by dots). (h) Number of steps needed to reach the goal versus number of runs

Results obtained from single experiments using different navigation strategies to find a goal from a random start position are presented in Fig. 7, where in every trial the rat was placed randomly within the environment. In panel a we show results from Q-learning navigation based on place cells formed from both visual and olfactory input. A vector field representation of the learned Q-values after 100 runs is shown, where each vector represents the cumulative direction of movement from the corresponding location. The vector field was calculated according to the following procedure. A 20×20 grid was used to define specific points in the environment. For each intersection point of the grid, the subset of place cells that fire at that point was found. Average Q-values for the eight directions were calculated over this subset of place cells. The resulting movement direction vector for each intersection point was then computed from these average Q-values. In panel b we show the resulting map of self-laid scent marks (marked by dots) from self-marking navigation after 200 runs. Here we use more runs since self-marking navigation converges more slowly than Q-learning (see Fig. 9(b)). When starting from random positions, the rat creates a tree-like map of scent marks, in which it chooses the closest branch and then follows the gradient of scent marks leading to the goal. Results of combined navigation are presented in panel c, where we show the vector field of learned actions (left) and the corresponding map of scent marks (right) after 100 runs. As expected, we obtained results similar to those of self-marking and Q-learning navigation (see panels a and b). In general, we observed that when starting from the same location the rat creates one main trail of scent marks, whereas when starting from a random location the rat creates tree-like structures of scent marks with several main branches. 
Also, the rat creates more scent marks when using pure self-marking navigation compared to the combined strategy.
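The vector field procedure described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the code used for Fig. 7: the firing threshold `rate_threshold`, the ordering of the eight directions, and the choice to form the resulting vector as the Q-weighted sum of the eight unit direction vectors are assumptions, and `rate_fn` is a hypothetical callback that returns the firing rates of all place cells at a given position.

```python
import math

# Eight movement directions in 45-degree steps (assumed ordering, starting east).
DIRECTIONS = [(math.cos(k * math.pi / 4), math.sin(k * math.pi / 4))
              for k in range(8)]

def direction_vector(rates, q_values, rate_threshold=0.5):
    """Movement vector (dx, dy) for one grid point.

    rates: firing rate of every place cell at this grid point.
    q_values: q_values[i][k] is the Q-value of cell i for direction k.
    """
    # Subset of place cells that fire at this point (threshold is assumed).
    active = [i for i, r in enumerate(rates) if r > rate_threshold]
    if not active:
        return (0.0, 0.0)
    # Average Q-value per direction over the active subset.
    mean_q = [sum(q_values[i][k] for i in active) / len(active)
              for k in range(8)]
    # Assumed combination rule: Q-weighted sum of unit direction vectors.
    dx = sum(q * d[0] for q, d in zip(mean_q, DIRECTIONS))
    dy = sum(q * d[1] for q, d in zip(mean_q, DIRECTIONS))
    return (dx, dy)

def vector_field(rate_fn, q_values, arena_size=1.0, n=20):
    """Evaluate direction_vector on an n-by-n grid covering the arena."""
    step = arena_size / (n - 1)
    return [[direction_vector(rate_fn(i * step, j * step), q_values)
             for i in range(n)]
            for j in range(n)]
```

Grid points at which no cell exceeds the threshold receive a zero vector, which would appear as an empty location in the plotted field.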

Fig. 7

Results from single experiments using different navigation strategies to find a goal from a random start position. (a) Q-learning: vector field representation of learned actions. (b) Self-marking navigation: environment with self-laid scent marks (marked by dots). (c) Combined navigation: vector field representation of learned actions (left) and corresponding self-laid scent marks (right)

We also investigated the performance of self-marking navigation in an environment with multiple targets. For this experiment we used an environment with two food sources as shown in Fig. 8(a), where in one case the rat always started to search for food from the same start position (home location) and in the other case the rat was placed at a random position. Results of a single experiment for self-marking navigation when always starting from the home location are shown in Fig. 8(b), where we show a map of self-laid scent marks after 200 runs. In the beginning the rat back-propagates scent marks from both goal locations, whereas at the end it creates a stronger trail of scent marks which leads to only one of the two food sources (see left and right sub-panels). When starting from a random location (panel c), the rat creates a map of scent marks with a tree-like structure similar to the case with one food source (see Fig. 7(b)). Here we obtain two trees of scent marks, each leading to one food source. Results of combined navigation are presented in Fig. 8(d, e). As expected, when starting from the home location (panel d), the rat marks only one route. Note that, as opposed to self-marking navigation (panel b), the rat back-propagates scent marks from only one of the two food sources. Results for combined navigation when starting from a random location are shown in panel e, where we show the vector field of learned actions (left sub-panel) and the corresponding map of scent marks (right sub-panel). As opposed to self-marking navigation (panel c), the rat creates only one tree of scent marks, and all direction vectors point to the marked food source. This is because in combined navigation the rat marks only the locations where Q-values are relatively high. As soon as the rat finds one of the two goals, it goes to that goal location more often and propagates scent marks backwards (similarly to the results presented in panel d). 
We also observed that if one of the two food sources is located significantly closer to the home location than the other, the rat in most cases finds the closer food source. This is because the rat propagates scent marks backwards from the food sources to the home location, and scent marks from the closer food source reach the home location earlier than those from the food source which is further away. In general, we observed that the rat learns a unique route which leads to one of the two targets; only in the case of pure self-marking navigation when starting from a random location does the rat create routes to both targets.

Fig. 8

Results from single experiments using self-marking navigation and combined navigation in the environment with two targets. (a) Environmental setup of the goal navigation task. We use a discrete square arena with dimensions of 10000×10000 points and two food sources with dimensions of 1500×1500 points (small squares). The starting position of the rat is 1000 points from both the left and the bottom wall. The first food source is located 1000 points from both the left and the upper wall, whereas the second food source is located 1000 points from the right wall and 2500 points from the bottom wall. (b, c) Self-marking navigation: self-laid scent marks obtained for the same start position (b) and for a random start position (c). (d, e) Combined navigation: self-laid scent marks obtained for the same start position (d); vector field of learned Q-actions (left) and self-laid scent marks (right) obtained for a random start position (e)

In the following we statistically determine the effectiveness of different stimuli for the goal navigation task and compare the previously described navigation strategies. The task for the rat was to find a route from the home location to the food source as shown in Fig. 2(b). Fig. 9 shows results for four cases: VQ) place cells based on visual cues alone are used for goal navigation by Q-learning; VOQ) as in case VQ, but the cells are created from a combination of visual and olfactory cues; S) self-marking navigation based on odor patches, where the rat follows self-laid scent marks to find a food source; VOQS) combined navigation, where the rat marks its location only if the Q-values (obtained by VOQ) at this location have reached a given threshold (for details see Section 2). The average number of steps needed to find the goal versus the number of runs, obtained from 200 experiments, is shown for each case in panel b. We obtained faster convergence when both visual and olfactory cues were used than with visual stimuli alone (see VQ, VOQ). This can be explained by the observation that cells formed from combined stimuli are less directional than those formed from visual cues alone. Note that if all place cells in the system are directional, actions must be learned for every movement direction of the animal at every location in the environment. For instance, if the rat learns the direction to the goal from a specific location with a certain movement direction (e.g. north), then it will not know the direction to the goal from the same location when crossing it with a different movement direction (e.g. east), since the same place cells will not fire when moving along this different direction. 
With an omnidirectional place cell system, actions are learned for a specific location independently of the movement direction of the animal (the same actions for all movement directions at that location), which makes learning faster. Self-marking navigation alone (S) converges much more slowly than Q-learning based on PCs obtained from combined stimuli (VOQ), whereas the combination of self-marking navigation with Q-learning (VOQS) is faster than Q-learning alone (VOQ). Note that the number of steps needed to reach the goal when using Q-learning (VQ/VOQ) is larger on average than that for self-marking navigation (S) or the combined method (VOQS). This is because we use an RL strategy with exploration and exploitation, where the rat tries random directions hoping to find a better path. This sometimes leads to a loss of track and long trajectories, which on average shifts the curve up. In self-marking navigation or with the combined method the rat no longer explores the environment, as it now follows self-laid scent marks. We also compared self-marking navigation (S) with the combined method (VOQS) in a task where, after the spatial task had been learned, the self-generated marks were “cleaned” (i.e. u(x,y) = 0). Results are presented in Fig. 9(c). As expected, the rat has to relearn the path to the goal from scratch when using self-marking navigation alone, whereas the combined strategy allows the rat to use learned Q-values (in other words, to navigate using allothetic visual and olfactory cues) whenever self-generated scent marks are no longer available, and it then re-marks the path. The small peak with a decay after “cleaning” (see case VOQS) is a result of the previously discussed exploratory behavior.
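The marking condition of the combined strategy (VOQS) can be written as a one-line test. The following Python sketch is illustrative only: normalizing by the largest Q-value learned so far (`q_global_max`) and the threshold value 0.5 are assumptions, since the exact normalization and threshold are specified in Section 2 and not repeated here.

```python
def should_mark(q_at_location, q_global_max, threshold=0.5):
    """Decide whether the rat lays a scent mark at its current location.

    q_at_location: Q-values of the eight actions at the current location.
    q_global_max: largest Q-value learned so far (assumed normalizer).
    threshold: hypothetical marking threshold on the normalized maximum.
    """
    if q_global_max <= 0.0:
        return False  # nothing has been learned yet, so no marking
    return max(q_at_location) / q_global_max >= threshold
```

Under this rule no marks are laid before the first reward has been propagated into the Q-values, which is consistent with scent marks appearing only along the learned route in the combined strategy.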

Fig. 9

(a) Four cases of different navigational strategies. VQ: place cells obtained from visual cues alone are used for goal navigation by using Q-learning. VOQ: similar to the case VQ, but here place cells are obtained from both visual and olfactory cues. S: Self-marking navigation (no place cells) where the rat follows self-generated marks to find a goal. VOQS: Combined navigation where the rat marks the location only if the Q-value (obtained from the VOQ) has reached a given threshold. (b, c) Comparison of different goal navigation strategies. The average number of steps needed to find the goal is plotted versus the number of runs in 200 experiments. The vertical bars show the standard error of the mean (SEM). (c) Comparison between the cases S and VOQS (see panel a) where the self-generated marks were “cleaned” after run 75

3.2.2 Hierarchical input preference in spatial navigation

In the combined strategy presented above, scent trails are used by the rat to find a goal after learning. However, this kind of strategy is inconsistent with biological findings. Maaswinkel and Whishaw (1999) showed that rats use visual cues for spatial navigation if they are available. If visual cues are not available, the rats rely on self-generated odor cues. To address this problem we modified our combined navigation strategy by adding hierarchical input preference to the model. At the beginning the rat uses both environmental cues and self-marking cues (combined strategy) in order to speed up learning as described above. This differs from the previous version in that the rat stops laying and following scent marks as soon as the trail of scent marks reaches the home location, whereas Q-values are still left modifiable. Furthermore, the rat prefers environmental cues (i.e. navigation based on Q-values) if they are available; if not, the rat follows previously generated scent marks. We use the combined strategy (Q-learning with self-marking navigation) for learning because it makes learning faster, and only later use the hierarchical input preference for navigation. During learning, Q-values as well as odor marks are generated. Initially the development of the Q-values dominates learning and guides the placing of the odor patches, since the rat lays a scent mark only if the normalized maximum Q-value at this location has reached a given threshold. Since we associate the Q-system with landmarks, the model is, due to this Q-dominance, compatible with Maaswinkel and Whishaw (1999) during learning as well. Note that starting with the hierarchical input preference from the beginning would lead to slower convergence, since the rat would learn the route based on landmarks alone (without self-generated odor marks), which would reproduce the results obtained with the Q-learning algorithm alone. 
After learning the model allows distinguishing between different input preferences. To demonstrate such a behavior we have performed two different experiments. In the first experiment we flipped the self-generated scent marks after learning along the diagonal of the box in a way that the scent trail does not lead to the goal anymore (see left and right panels in Fig. 10(a, b)), where environmental cues were left unaffected. In the second experiment we removed all environmental cues (visual and olfactory) after learning and left scent trail unaffected. Two examples of single results from the first experiment are shown in Fig. 10(a, b), where in the left sub-panel we show the scent trail and the rat’s trajectory at the end of learning and in the right sub-panel we show three trajectories of consecutive runs after scent marks were flipped. We found that the rat takes a correct route to the goal using environmental cues. We also noticed that the route is along the trail of scent marks that were produced during learning, which means that the rat has created two similar representations of route to the goal, where one is based on environmental cues and the other based on self-laid scent marks. After learning, the rat prefers environmental cues, so the rat’s performance remains unaffected when we flip scent marks. Statistics for 200 experiments are presented in panel c. We show the average number of steps needed to find a goal versus number of runs, where after 49 runs we flipped the scent trail. This analysis shows that the rat finds a path using combined navigation after approximately 20 runs, on average. After learning, the rat switches to the navigation based on environmental cues, and we observe an upwards curve shift due to the exploration and exploitation strategy of the Q-learning. As expected, the rat’s performance is not affected after scent marks were flipped since the rat prefers environmental cues after learning. 
Statistics for the second experiment are presented in Fig. 10(d), where we can see that as soon as environmental cues are unavailable (i.e. removed) the rat follows the trail of scent marks which leads to the food source. The lack of exploration in this case leads to the noise-free flat line after run 49. Our modified model captures properties of hierarchical input preference similar to those observed in animals (Maaswinkel and Whishaw 1999). For further discussion and the relation to biological data see Section 4.
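The hierarchical input preference described in this subsection amounts to a simple decision rule, sketched below in Python. The function and argument names are hypothetical, and the final fallback to random exploration when neither cue type is available is an assumption that is not tested in the experiments above.

```python
def select_strategy(trail_reached_home, env_cues_available, scent_available):
    """Choose which cue system drives navigation on the current run.

    trail_reached_home: True once the scent trail extends to the home
    location, i.e. the learning phase of the combined strategy is over.
    """
    if not trail_reached_home:
        # Learning phase: Q-learning combined with self-marking navigation.
        return "combined"
    if env_cues_available:
        # After learning, environmental cues (Q-values) are preferred.
        return "q_values"
    if scent_available:
        # Without environmental cues, fall back to the self-laid trail.
        return "scent_trail"
    return "explore"  # assumed fallback when no cue is available
```

In the first experiment (flipped scent marks) the rule returns "q_values" after learning, so the flipped trail is ignored; in the second experiment (environmental cues removed) it returns "scent_trail".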

Fig. 10

(a-c) Navigation results when self-generated marks were flipped after run 49. (a, b) Results of single experiments: self-generated marks and the rat’s trajectory at the end of learning (left), and flipped self-generated marks and the rat’s trajectories in three consecutive runs after the scent marks were flipped (right). (c) The average number of steps needed to find the goal is plotted versus the number of runs in 200 experiments. The vertical bars show the standard error of the mean (SEM). (d) Navigation results when environmental cues were removed after run 49. The average number of steps together with SEM is plotted versus the number of runs in 200 experiments

3.3 Remapping

3.3.1 Remapping of PFs

The resulting PFs of a remapping experiment when switching between environments “A” and “B” are shown in Fig. 11(a), with the same selected 100 of the total 500 place cells shown for each case. As expected, PFs can change their firing rate, position or shape, or turn on or off. Note that there are also cells whose properties do not change between the two environments. The average distribution of the change in maximal firing rates of PFs between environments “A” and “B” in 100 experiments is shown in Fig. 11(b). Note that we show the change in firing rates of PFs only for cells with maximum firing rate r > 0.5, which are the cells that actually drive Q-learning. Positive values mean that a cell increased its firing rate or turned on when the rat was moved from environment “A” to “B”; negative values mean that it decreased its firing rate or turned off. The distribution of changes in the positions of PFs (only those with maximum firing rate r > 0.5) is presented in Fig. 11(c), where we plot the average distance between PF centers (given by the location of maximal firing in the PF) in environment “A” versus “B”. As expected, place cells display their original fields when the rat is returned from “B” back to “A” (see Fig. 11(a)).
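The two remapping measures plotted in Fig. 11(b, c) can be illustrated with the following Python sketch. It is a hedged illustration, not the analysis code of this study: the input format is hypothetical, and it is an assumption that the rate change is reported for cells exceeding r > 0.5 in at least one environment, while the center distance is reported only for cells exceeding it in both.

```python
import math

def remapping_stats(fields_a, fields_b, r_min=0.5):
    """Per-cell remapping measures between environments A and B.

    fields_a, fields_b: lists of (max_rate, (cx, cy)) in the same cell order,
    where (cx, cy) is the location of maximal firing in that environment.
    """
    rate_changes, center_shifts = [], []
    for (ra, ca), (rb, cb) in zip(fields_a, fields_b):
        if max(ra, rb) > r_min:
            # Positive: the cell increased its rate or turned on in B.
            rate_changes.append(rb - ra)
        if ra > r_min and rb > r_min:
            # A field center exists in both environments, so a shift is defined.
            center_shifts.append(math.hypot(cb[0] - ca[0], cb[1] - ca[1]))
    return rate_changes, center_shifts
```

A cell that fires maximally in “A” and is silent in “B” contributes a rate change of -1, and a cell that is silent in “A” and fires maximally in “B” contributes +1, matching the convention of Fig. 11(b).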

Fig. 11

(a) Remapping of PFs from environment “A” to “B” and from environment “B” back to “A”. The same selected cells (100 of the total 500) are presented in all three cases. (b) Average difference between the maximum firing rates of PFs in environments “A” and “B”, together with the standard deviation (SD), plotted for 100 experiments. -1 means that the cell stopped firing when switched to the other environment and +1 means that the cell was off in environment “A” but turned on when moved to environment “B”. (c) Average distance between the centers of PFs in environments “A” and “B”, together with SD, plotted for 100 experiments. (d-g) Comparison of goal navigation strategies with respect to different environmental setups: (d, e) only the environment changed; (f, g) the environment and the location of the goal changed (see Fig. 3(a, b)). The average number of steps needed to find the goal is plotted versus the number of runs in 200 experiments. The vertical bars show the standard error of the mean (SEM). Cases VOQ and VOQS are as explained in Fig. 9(a). Control: the same as case VOQ, but learning starts with random Q-values in environments “A” and “B”, whereas in case VOQ we initialize Q-values to zero only at the very beginning and do not reset them while switching between the environments

3.3.2 Remapping and goal navigation

In this subsection we present results on spatial navigation with respect to the remapping of PFs when switching between two different environments. For the environmental setup see Fig. 3. The results of goal navigation while switching between environments “A” and “B” are shown in Fig. 11(d–g), where the average number of steps needed to find the food source is plotted versus the number of runs for 200 experiments. Navigation results obtained by using Q-learning based on PCs obtained from visual and olfactory stimuli (VOQ) are presented in panel d, and results of the combined method (VOQS) are shown in panel e. Note that here we used the combined strategy without hierarchical input preference, i.e. the rat would still follow a scent trail after learning. With both navigation strategies the rat can learn to find the goal in the two environments “A” and “B” as long as the location of the food source is the same in both environments, and it goes directly to the goal after returning to the previous environment. It is worth noting that in our model we do not introduce unfamiliar cues to the rat in the new environment; we just “fool” the rat by switching visual cues and changing the position and shape of olfactory cues. That is why we also observe that the rat uses some information (i.e. learned Q-values) from the previous environment and does not have to relearn from scratch when moved to the new environment. In panel d, for comparison, we show the control case, where in environments “A” and “B” we initialize Q-values randomly from a uniform distribution within the interval [0;1]. The results for goal navigation while switching between environments “A” and “C” (where the location of the goal is also changed) for the cases VOQ and VOQS are presented in Fig. 11(f, g), respectively. Here we found that the rat has to relearn the food location every time, even when returned to the previously visited environment. 
However, by employing the combined strategy (see panel g), the rat can easily find the food source in both environments even if the location of the goal is changed, because the rat just follows the trail of scent marks. Note that if we used the combined strategy with hierarchical input preference we would obtain results similar to the case VOQ (see panel f), since after learning the rat would prefer environmental cues and navigate according to Q-values. In general, we observed that the rat can learn both environments when the location of the goal is unchanged but has to relearn the route when both the environmental cues and the location of the goal change. For further discussion of the remapping results see Section 4.

4 Discussion

In the following we compare our place cell model and goal navigation strategies with other approaches. We also discuss our results in relation to biological data.

A starting point for this study was experimental data which show that olfactory cues play an important role for the stability of PFs (Markus et al. 1994; Save et al. 2000) and the navigation of rodents (Tomlinson and Johnston 1991; Lavenex and Schenk 1995, 1996, 1998; Wallace et al. 2002a, 2003). We have, for the first time to our knowledge, implemented an odor-supported place cell model and applied it to goal navigation learning. Based on self-marking behavior in rodents (Harley and Martin 1999), we proposed a novel navigation mechanism which allows better performance in goal directed navigation. We predict that the use of environmental odor cues improves the omnidirectionality of place cells, which as a consequence results in faster goal directed learning, whereas the use of self-generated scent marks results in even faster learning and could serve as additional information for path finding when environmental cues are not available.

4.1 Place cell model

We modeled place cells from visual and olfactory cues using a feed-forward network based on radial basis functions (RBF). Here we used an abstract model excluding interactions between hippocampal layers. This is justified as we did not focus on the place cell model itself but rather on the contribution of sensory inputs to the formation of place cells and on the utilization of place cells in spatial navigation. Our approach is similar to the model of O’Keefe and Burgess (1996) or Hartley et al. (2000), but we use n-dimensional RBFs instead of calculating the thresholded sum of the Gaussian tuning curves of the rat’s distance from each box wall (O’Keefe and Burgess 1996). Our model differs from the augmented model of Hartley et al. (2000), where the firing rate of a place cell is modeled as the thresholded sum of boundary vector cells (BVCs). The response of a BVC is the product of two Gaussian tuning curves, where one is a function of the distance from the rat to the wall and the second is a function of the rat’s head direction (Hartley et al. 2000). In these models, the amplitude and the width of the PF depend on the distance to the wall: the larger the distance, the lower the amplitude and the broader the field, and vice versa. In our model we keep the width of the PF, σ_f, fixed, and the obtained PFs vary in shape and amplitude because of the combination of different sensory inputs (see Fig. 4(c)). We use a winner-takes-all mechanism for PF formation, which means that we do not change the weights of neighboring neurons as in self-organizing map (SOM) approaches (Chokshi et al. 2003; Ollington and Vamplew 2004), as there are no obvious topographical relations between the positions of the PFs and the anatomical locations of the place cells relative to each other within the hippocampus (O’Keefe 1999).
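To make the combination of a fixed-width RBF response with winner-takes-all learning concrete, the following Python sketch shows a generic version of both steps. It is a textbook-style illustration under stated assumptions and not the exact equations of our model (which are given in Section 2): the Gaussian form, the competitive update rule and the value of `sigma_f` are assumptions, while μ = 0.01 is the rate factor quoted in Section 3.1.

```python
import math

def rbf_rate(x, w, sigma_f):
    """Gaussian RBF response of one cell to the n-dimensional input x."""
    d2 = sum((xi - wi) ** 2 for xi, wi in zip(x, w))
    return math.exp(-d2 / (2.0 * sigma_f ** 2))

def wta_step(x, weights, sigma_f=0.5, mu=0.01):
    """One winner-takes-all learning step (generic competitive learning).

    x: sensory input vector (e.g. visual and olfactory features).
    weights: list of weight vectors, one per place cell; updated in place.
    Returns the index of the winning cell and all firing rates.
    """
    rates = [rbf_rate(x, w, sigma_f) for w in weights]
    winner = max(range(len(weights)), key=rates.__getitem__)
    # Only the winner moves toward the input; neighbors stay unchanged,
    # in contrast to a self-organizing map.
    weights[winner] = [wi + mu * (xi - wi)
                       for xi, wi in zip(x, weights[winner])]
    return winner, rates
```

Because only the winning cell is updated, no topographic ordering of the cells is imposed, which is the design choice motivated above.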

In several studies (Arleo and Gerstner 2000; Arleo et al. 2004; Sheynikhovich et al. 2005; Strösslin et al. 2005) self-motion cues have been used as an additional input to hippocampus to create place cells. The disadvantage of self-motion cues is that path integration leads to an accumulation of errors in direction and distance, and needs to be re-calibrated according to position estimation from stable cues (Etienne et al. 1996, 2004). Save et al. (2000) have shown that path integration alone is insufficient to maintain the stability of PFs. If visual or olfactory sensory cues are available then these cues dominate over path integration information (Maaswinkel and Whishaw 1999; Whishaw et al. 2001). In contrast to other models we use odor cues as an additional input to form place cells. For the sake of simplicity we model static odors. Models of dynamic odors are quite complex and include many parameters (Boeker et al. 2000). By using static odors we ignore odor patch development, and effects that might be induced by changes of odors in time. Here we concentrate only on an odor function as a reference cue that is sensed unambiguously by the rat, as opposed to visual cues, which might be mismatched, misinterpreted or not seen at all. Obtained PFs capture similar properties to those that were found in the rats’ hippocampus (Muller and Kubie 1987; Muller et al. 1994; Wilson and McNaughton 1993; O’Keefe 1999).

Place cells tend to be less directional when the rat navigates in an open environment than when it is forced to move along a specific direction (McNaughton et al. 1983; Muller et al. 1994; Markus et al. 1995). These properties have also been captured by the models of Sharp (1991) and Brunel and Trullier (1998). In this study, we have investigated the contribution of olfactory input to the directionality of place cells. From our analysis, we found that if olfactory cues are available for the formation of place cells, more omnidirectional fields develop. This agrees with observations of PFs by Battaglia et al. (2004) on cue-rich and cue-poor linear tracks. In that study, the proportion of omnidirectional cells among all spatially selective cells was ≈ 43% in a cue-rich environment vs. ≈ 30% in a cue-poor environment. We obtained more omnidirectional cells because cells tend to be more directional in eight-arm mazes or T-mazes than in open environments (Muller et al. 1994; Markus et al. 1995). Our results support the notion that place cell directionality should influence goal directed behavior, as we obtained better performance in a goal navigation task when using place cells formed from combined visual and olfactory stimuli than when using place cells formed from visual cues alone.

4.2 Goal navigation learning

In the second part of our study we presented different navigation strategies and compared them in a goal navigation task and in a remapping situation. Goal navigation based on place cells has previously been addressed by implementing reinforcement learning algorithms (Arleo and Gerstner 2000; Arleo et al. 2004; Foster et al. 2000; Strösslin et al. 2005; Sheynikhovich et al. 2005; Krichmar et al. 2005). We presented a new navigation mechanism that combines Q-learning with navigation based on self-generated odor patches in order to achieve better performance in goal directed navigation. Our approach differs from that of Russell (1995), who developed a robotic system where the robot is able to lay an odor trail on the ground and to follow the trail afterward. In his approach the robot does not use odor marking to find a goal, whereas in our approach the rat lays scent marks in order to find a goal and to create a trail which leads to the food source. The proposed mechanism, based on self-marking, propagates scent marks backwards from the location of the reward as in reinforcement learning, but here we do not have predefined features; rather, we create them “on the fly”, and we do not directly memorize action values associated with states. The mechanism of RBF-like features created on-line in action learning was used in several other studies (Kretchmar and Anderson 1997; Atkeson et al. 1997). The method of updating odor marks resembles a TD(0) approach with function approximation (Sutton and Barto 1998), where the weights towards the value function are increased if the following states have high values. The update rule in our study, however, is different from the one used in TD: updates of odor marks are made by a fixed amount, based on the binary decision whether some odor is sensed at the current location or not.
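The difference to a TD update can be illustrated with a small Python sketch of the fixed-increment marking rule. This is one simplified reading of the mechanism, not the rule as specified in Section 2: the grid-cell representation of locations, the choice to strengthen the mark at the previous location when odor is sensed at the current one, the increment `delta` and the cap `u_max` are assumptions introduced for illustration.

```python
def update_marks(u, path, food_cells, delta=0.1, u_max=1.0):
    """Apply the fixed-increment marking rule along one run.

    u: dict mapping a location to its scent-mark strength u(x, y).
    path: sequence of locations visited in this run, in order.
    food_cells: set of locations at which food odor is sensed.
    """
    for prev, cur in zip(path, path[1:]):
        # Binary decision: is any odor (food or scent mark) sensed here?
        odor_sensed = cur in food_cells or u.get(cur, 0.0) > 0.0
        if odor_sensed:
            # Fixed-size increment, independent of how strong the odor is.
            u[prev] = min(u_max, u.get(prev, 0.0) + delta)
    return u
```

Repeating the same path A, B, C, G (with food at G) three times marks C after the first run, B after the second and A after the third, so the trail grows backwards from the reward while the mark strength increases toward the goal. Unlike TD(0), the size of the update does not depend on the value of the successor location.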

Experimental data show that rats perform better in cue-rich environments than in cue-poor environments. Barnes et al. (1980) showed that if all of the extra-maze cues surrounding a circular maze were removed, rats made many more errors finding a goal location. Morris (1984) demonstrated that rats performed worse when he obscured some of the cues around the water maze by pulling the curtains 1/4 of the way around. When he obscured all of the extra-maze cues by pulling the curtains fully around, the rats performed very badly. Prados and Trobalon (1998) showed that rats could learn the platform location in a water maze if 4 or 2 extra-maze cues were available, but they were much worse if only 1 cue was present. We addressed these findings by testing the performance of our model rat with and without olfactory input, where we observed that the model rat performed significantly better with both visual and olfactory cues than with visual stimuli alone.

The experiments of Maaswinkel and Whishaw (1999) suggest that rats have a hierarchical preference in using sensory cues. In their experiments, rats ignored distortion in self-motion cues when they were moved to a new starting position, and ignored distortion in odor cues (scent marks) when the apparatus was rotated, suggesting that visual cues dominate over other cues whenever they are available. However, when blindfolded, the rats still performed well, suggesting that they were using odor cues when available, and path integration when odor cues were disrupted. To address these findings we modified our combined navigation strategy by adding an input preference component, where the rat uses both environmental and self-generated cues for learning. After learning, the rat prefers environmental cues if they are available and uses self-generated olfactory cues when visual cues are not available. By using such a modified strategy, we have demonstrated that the model rat achieves faster goal-directed learning and shows unaffected performance when environmental cues are changed. This is supported by the finding that rats can find a goal when the scent trail is distorted or removed, or can find the route to the goal using self-laid odor cues when environmental cues are unavailable.
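
The input preference described above can be sketched as a simple priority rule. This is an illustrative assumption rather than the model's actual implementation: the function name `select_action`, its arguments, and the fallback to exploration when no cue is available are hypothetical.

```python
def select_action(visual_q, scent_strength, visual_available):
    """Choose an action index, preferring environmental cues over scent marks.

    visual_q:         action values learned from environmental (visual) cues,
                      or None if nothing has been learned
    scent_strength:   scent intensity sensed in each candidate direction
    visual_available: whether environmental cues can currently be perceived
    """
    if visual_available and visual_q is not None:
        # Highest priority: environmental cues.
        return "environmental", visual_q.index(max(visual_q))
    if any(s > 0.0 for s in scent_strength):
        # Second priority: follow the strongest self-laid scent mark.
        return "scent", scent_strength.index(max(scent_strength))
    # No usable cue: fall back to exploration (assumed behaviour).
    return "explore", None


with_vision = select_action([0.1, 0.9], [1.0, 0.0], visual_available=True)
without_vision = select_action([0.1, 0.9], [1.0, 0.0], visual_available=False)
```

In this sketch the two representations are kept separately, so removing one input does not alter what was learned through the other.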

4.3 Remapping and goal navigation

The results for goal navigation with respect to remapping of place cells show that the rat can learn to find a goal in two environments, “A” and “B”, by using Q-learning or combined navigation when the location of the goal is unchanged but environmental cues are switched. Note that the rat can learn both environments only as long as different, partially overlapping subsets of place cells fire in environments “A” and “B”, i.e. most of the cells that do not fire in environment “A” fire in environment “B”. In the case of cue rotation the rat would need to relearn the task each time if the location of the goal is not rotated together with the landmarks, because the same subset of place cells would be used in both environments. This is equivalent to leaving the environment the same but changing the location of the goal. Similarly, in the Morris water-maze experiment (Morris 1981) the rat has to relearn the location of the platform whenever it is moved to another location. When the environments are substantially different and the cells remap, the rat in our experiments can easily find the food source in both environments by employing the combined strategy, even if the location of the goal is changed, because it can use the trail of scent marks.
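
Why distinct subsets of active place cells protect earlier learning, whereas a shared subset forces relearning, can be illustrated with a toy example. The sketch below is a hypothetical simplification and not our simulation code: it uses a single decision point, two actions, a delta-rule update of action values that are linear in the active cells, and arbitrarily chosen cell indices and parameters.

```python
def new_weights(n_cells=7, n_actions=2):
    """One weight per (place cell, action) pair, initialised to zero."""
    return [[0.0] * n_actions for _ in range(n_cells)]


def action_values(weights, active_cells):
    """Q(a) is the sum of the weights of the currently active cells."""
    n_actions = len(weights[0])
    return [sum(weights[c][a] for c in active_cells) for a in range(n_actions)]


def train(weights, active_cells, good_action, steps=50, lr=0.2):
    """Delta rule: move Q(good_action) towards 1 and all other Q(a) towards 0."""
    for _ in range(steps):
        q = action_values(weights, active_cells)
        for a, q_a in enumerate(q):
            target = 1.0 if a == good_action else 0.0
            for c in active_cells:
                weights[c][a] += lr * (target - q_a) / len(active_cells)
    return weights


def best_action(weights, active_cells):
    q = action_values(weights, active_cells)
    return q.index(max(q))


# Remapping: A and B recruit different, partially overlapping cells (cell 3 is shared).
cells_a, cells_b = [0, 1, 2, 3], [3, 4, 5, 6]
w = new_weights()
train(w, cells_a, good_action=0)
train(w, cells_b, good_action=1)
remapped_keeps_a = best_action(w, cells_a) == 0    # learning in A survives

# Cue rotation: the same cells are active, but a different action is required.
w = new_weights()
train(w, cells_a, good_action=0)
train(w, cells_a, good_action=1)
same_cells_keeps_a = best_action(w, cells_a) == 0  # learning in A is overwritten
```

With mostly distinct cell subsets, training in “B” changes only the shared weights and leaves the preference learned in “A” intact; with identical subsets, the second task overwrites the first.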

Our model predicts that the remapping of PFs would disrupt a previously learned route to a goal. The closest empirical data addressing this prediction is a study by Jeffery et al. (2003), who examined the relationship between remapping and performance of a spatial navigation task. In their experiment, rats were trained to search for a food source in a black box, and subsequently tested in a white box. Jeffery et al. (2003) found that place cells remapped between the two boxes, and although the rats were slightly worse in the second environment, they still performed well. This finding suggests that, although the place cells may encode spatial contexts, they do not directly guide behavior. One difference between the experimental situation of Jeffery et al. (2003) and that of the current model is that in the experimental situation there were no landmarks within the square apparatus. Instead, rats relied on spatial landmarks - posters on the curtains surrounding the apparatus - for orientation. So, in the Jeffery et al. (2003) experiment, unlike in our model, cues outside the immediate environment were the only way in which the animal could distinguish the correct corner. The results of Yoganarasimha and Knierim (2005) suggest that head direction cells are influenced by distal landmarks, whereas some place cells are influenced by local landmarks. Thus it may be that the Jeffery et al. (2003) task was one that could not be solved using place cells, because without local cues within the square there was no way of distinguishing one corner of the apparatus from another. Rats may have used a non-place cell representation - such as the head direction cell system - to solve the task. Had there been local cues inside the square enclosure and no cues outside the enclosure, a stronger link between remapping and disrupted navigation might have been observed. An acknowledged difficulty with this account, however, is that Jeffery et al. (2003) also show that this task is impaired by lesions of the hippocampus.

4.4 Predictions and suggested experiments

Present experimental studies on spatial learning in cue-rich versus cue-poor environments are still based on visual cues alone (Barnes et al. 1980; Morris 1984; Prados and Trobalon 1998). They also test the performance of the rat only after learning. It would thus be interesting to test whether real animals learn the task faster in environments with additional olfactory cues than with visual stimuli alone, as our model predicts.

Experiments on self-marking behavior during learning would be useful to support or refute the proposed setup and the hypothesis that self-marking behavior speeds up learning.

In the Jeffery et al. (2003) experiment on place cell remapping and goal navigation, it may be that the task was one that could not be solved using place cells, because without local cues within the square there was no way of distinguishing one corner of the apparatus from another. It would be interesting to conduct further experiments to test whether remapping of place cells influences goal-directed learning, as our model predicts.

By using a combined strategy with hierarchical input preference the model rat creates two representations of the route to the goal: one is based on environmental cues while the other is based on self-generated scent marks. Our model predicts that in the case of remapping, when the goal is at different locations in the two environments, the rat would fail when moved back to the previous environment since it would prefer environmental cues. We hypothesize that the rat could use the scent trail in the next trial after it fails to find the goal using environmental cues. Experiments to test this hypothesis would also be of great interest.